UBot Underground

Simple Scrape - First Attempt



Hi,

I am just getting into UBot and trying to scrape some data from my company intranet. In this example, I want to navigate to a page, change the date drop-down, scrape the number of contacts, change the drop-down to the next date, scrape the number of contacts again, and continue through all available drop-down dates.

 

So far I have been able to navigate to the page and change the drop-down, but I cannot seem to scrape the data that I need. Please see the image of the data I'm trying to scrape (https://www.screencast.com/t/fLFBVxj0L).

 

I've tried right click and scrape

navigate("http://www.myintranetsite.com/manage/report.htm?wldid=1579490","Wait")
change dropdown(<value="CURRENT">,"04/2017")
set(#var0,$scrape attribute(<innertext=w"*">,"value"),"Global")

 

But that gives me lots of useless data in the debugger :(

 

I've tried this https://www.screencast.com/t/LRFJBvjTDx

navigate("http://www.myintranetsite.com/manage/report.htm?wldid=1579490","Wait")

change dropdown(<value="CURRENT">,"04/2017")
set(#var0,$scrape attribute(<innertext=92>,"value"),"Global")

 

and I get zero data in the debugger.

 

I'm sure I'm missing something very basic.  Any guidance would be very much appreciated.

 

Thanks very much!
Chris

 

 

 

 

 

 


I haven't checked the site, but I have looked at the picture you uploaded. Here are a few things:

 

 

set(#var0,$scrape attribute(<innertext=92>,"value"),"Global")

 

This code looks for an attribute called value; it does not read the innertext inside the tag.

 

The HTML for this element is <span>92</span>

 

For the above code to work, the HTML would need to be <span value="SpanValue">92</span>

 

The result of your code would then be "SpanValue", since that is the value of the value attribute on the element whose innertext is 92.

 

Your code above needs to be:

 

set(#var0,$scrape attribute(<innertext=92>,"innertext"),"Global")

 

This code only scrapes when the innertext is 92, and it matches any element with an innertext of 92, so it needs a better selector.
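
To make the attribute-versus-innertext distinction concrete outside UBot, here is a minimal Python sketch. Python is used only for illustration; this is not UBot code, and the names SpanInspector and inspect_span are made up for this example. It parses the two span variants mentioned above with the standard library's html.parser and reports what a "value" scrape and an "innertext" scrape would each see.

```python
from html.parser import HTMLParser


class SpanInspector(HTMLParser):
    """Records the 'value' attribute and the inner text of a <span>."""

    def __init__(self):
        super().__init__()
        self.in_span = False
        self.value_attribute = None  # roughly what scraping "value" returns
        self.inner_text = ""         # roughly what scraping "innertext" returns

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.in_span = True
            # attrs is a list of (name, value) pairs; a missing key gives None
            self.value_attribute = dict(attrs).get("value")

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_span = False

    def handle_data(self, data):
        # Only text that sits between <span> and </span> is collected
        if self.in_span:
            self.inner_text += data


def inspect_span(html):
    """Return (value attribute, inner text) for the span in the given HTML."""
    parser = SpanInspector()
    parser.feed(html)
    parser.close()
    return parser.value_attribute, parser.inner_text


plain = inspect_span("<span>92</span>")
with_value = inspect_span('<span value="SpanValue">92</span>')
print(plain)       # (None, '92')
print(with_value)  # ('SpanValue', '92')
```

The first span has no value attribute at all, which matches the empty result in the debugger; the number only shows up as inner text.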

 

 

This should work. UBot can be pretty difficult when a page's elements are not set up well, so this case might be tricky in UBot.

 

In the code, I selected the id of the parent element, two levels above the 92 value, and used some functions to remove the whitespace. For this type of thing, as you were doing in the dev tools, you can right-click on the blue area and choose "copy element" (CSS) or "copy xpath" to get a path to that exact element. Dan and I both have free plugins for this. Here's an example that simply uses that right-click in the web inspector to get a path to the element; the plugins make this easier, of course. I've changed the HTML slightly below, but it should work for your page too.

load html("<div id=\"total_contacts\">
 <p class=\"stat\">
 <span>92</span>
 </p>
 </div>")
alert($trim($strip tags($scrape attribute(<id="total_contacts">,"outerhtml"))))
define $CSS Selector(#CSS Path, #Attribute To Scrape) {
    return($eval("document.querySelector(\"{#CSS Path}\").{#Attribute To Scrape}"))
}
alert($CSS Selector("#total_contacts > .stat >span", "innerText"))
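
As a rough illustration of what the first alert does (scrape the outer HTML of the parent, strip the tags, trim the whitespace), here is the same idea sketched in Python. This is not UBot code: strip_tags is a hypothetical helper written for this example, and the regular expression is a naive tag remover that is only adequate for simple markup like the sample above.

```python
import re

# Same sample markup as in the load html() call above
html = """<div id="total_contacts">
 <p class="stat">
 <span>92</span>
 </p>
 </div>"""


def strip_tags(markup):
    """Remove anything that looks like <...>; naive, fine for this sample."""
    return re.sub(r"<[^>]*>", "", markup)


# Stripping the tags leaves the number surrounded by newlines and spaces,
# so a final trim is needed to get a clean value.
text = strip_tags(html).strip()
print(text)  # 92
```

Without the trim step the value would still carry the line breaks and indentation from the markup, which is why the UBot line wraps $strip tags in $trim.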


OK. First of all, thank you both for taking the time to respond/help. I really appreciate it.   Since I'm so new, I'm struggling to get my mind around deliter's tactic, but I'm going to have to try to find the plugin referenced and play around with it. I'm sure it will help me solve a lot of problems as I work through this project. 

 

In this case, I may be close to a solution for this basic first step, using $element offset (thanks Pash!) and here is where I am.

 

#1 I was able to create a stripped down version of this intranet page for testing purposes http://www.bazoogle3.com/testscrape/ 

#2 I plan on loading multiple pages like this, and scraping whatever number is in the upper leftmost position (92 on this example page), into a list

#3 Here is what I have so far:

 

navigate("http://www.bazoogle3.com/testscrape/","Wait")
wait(5)
set(#var0,"{$scrape attribute($element offset(<class="stat">,0),"innertext")}{$replace(#var0," CONTACTS",$nothing)}","Global")

 

So I'm able to scrape "92 CONTACTS". I then try to add a replace of " CONTACTS" so that I'm left with only "92", BUT the result is "92 CONTACTS92".

 

I'm SO close, but can't seem to figure out how to remove the "92 CONTACTS" and end up with only "92" as my variable result.

 

Any additional insight would be appreciated.

 

Thanks!
Chris

Edited by christojuan

Ah Wait! Just after I posted I tried this:

 

navigate("http://www.bazoogle3.com/testscrape/","Wait")
wait(5)
set(#var0,$scrape attribute($element offset(<class="stat">,0),"innertext"),"Global")
set(#var0a,$replace(#var0," CONTACTS",$nothing),"Global")

 

And it worked!!! :) Is that an efficient (or the most efficient) way to do what I was trying to do?

 

Any further insight/feedback would be appreciated.
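
For comparison, here is the cleanup logic sketched in Python (for illustration only; this is not UBot code and the function names are made up). It shows the working two-step version, a regex alternative that keeps only the digits, and one plausible reading of why the one-line attempt produced "92 CONTACTS92". That last part rests on an assumption: that #var0 still held "92 CONTACTS" from an earlier run when the replace was evaluated.

```python
import re


def contacts_by_replace(scraped):
    """Two-step approach: take the scraped text, then remove the label."""
    return scraped.replace(" CONTACTS", "").strip()


def contacts_by_regex(scraped):
    """Alternative: keep the first run of digits, whatever the label says."""
    match = re.search(r"\d+", scraped)
    return match.group(0) if match else ""


print(contacts_by_replace("92 CONTACTS"))  # 92
print(contacts_by_regex("92 Contacts"))    # 92

# One-line attempt, modeled under the assumption stated above: the fresh
# scrape is concatenated with a cleaned copy of the OLD variable value,
# instead of the cleanup being applied to the fresh scrape itself.
previous_var0 = "92 CONTACTS"  # assumption: left over from an earlier run
fresh_scrape = "92 CONTACTS"
one_liner = fresh_scrape + previous_var0.replace(" CONTACTS", "")
print(one_liner)  # 92 CONTACTS92
```

The digit-only version does not depend on the exact label text or its capitalization, which may matter when the same step is repeated across several pages.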
