UBot Underground

Scrape Help Needed



We need your help, expert UBot users.

We would like to scrape the website URLs of all 821 companies shown on the site below:

http://launchpoint.marketo.com/?show=all

 

As the screenshots show, for each company you have to click on it and scroll down a bit; off to the right is a "Website" link which, when clicked, takes you to the company's website.

What would be the best way to scrape the websites of all 821 companies on this page?

[Attached screenshots]

 

 

Thank you for any help; we greatly appreciate your time.


Edited by firefox29

Here is one way:

navigate("http://launchpoint.marketo.com/?show=all","Wait")
clear list(%urls)
add list to list(%urls,$list from text($plugin function("XpathPlugin.dll", "$Generic Xpath Parser", $document text, "//li/a", "href", ""),$new line),"Delete","Global")

You need the free XPath plugin by Bot Factory: http://network.ubotstudio.com/store/product/free-xpath-plugin/

 

You can do it with $scrape attribute too, but I think the above solution is best.

 


Regards,
CD
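For readers outside UBot: the XPath `//li/a` with the `href` attribute selects the `href` of every `<a>` whose direct parent is an `<li>`. A minimal Python sketch of that same selection, using only the standard library's `html.parser`, is below. It assumes reasonably well-formed markup, and the helper names and sample HTML are made up for illustration rather than taken from the Marketo page.

```python
from html.parser import HTMLParser

# Elements that never have children, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LiAnchorHrefs(HTMLParser):
    """Collect the href of every <a> whose direct parent is <li> (XPath //li/a/@href)."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open non-void tags, outermost first
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Only anchors sitting directly inside an <li> count, as in //li/a.
        if tag == "a" and self.stack and self.stack[-1] == "li":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray end tags are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def li_a_hrefs(html):
    """Hypothetical helper: return all //li/a hrefs in document order."""
    parser = LiAnchorHrefs()
    parser.feed(html)
    parser.close()
    return parser.hrefs
```

On a real page, `//li/a` may also match navigation and footer links, so expect to filter the result before visiting anything.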


Edit: it seems I'm too slow today :P but I'll leave it here anyway.

 

You will need to add them to a list or table or something, but this should get you pretty far:

navigate("http://launchpoint.marketo.com/?show=all","Wait")
wait for browser event("Everything Loaded","")
wait for element(<id="logo-footer">,"","Appear")
wait(2)
clear list(%containers)
add list to list(%containers,$scrape attribute(<class=w"*mod-*">,"innerhtml"),"Don\'t Delete","Global")
loop($list total(%containers)) {
    set(#url,$find regular expression($next list item(%containers),"(?<=href=\\\").*?(?=\\\")"),"Global")
    navigate(#url,"Wait")
    set(#website,$scrape attribute(<class="WebsiteURL">,"fullhref"),"Global")
}
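The script above works in two passes: grab each result container's inner HTML, pull the first `href` out with the lookbehind regex, then visit that detail page and read the link with class `WebsiteURL`. Below is a rough Python equivalent of that flow that also collects the websites into a list. It is a sketch under assumptions: the page-fetching function is injected (so it runs offline), `urljoin` is used to resolve relative links (roughly the job UBot's `fullhref` attribute does here), and the function names are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Same pattern as the UBot script: everything between href=" and the next quote.
# Like the original, it ignores single-quoted href='...' attributes.
HREF_RE = re.compile(r'(?<=href=").*?(?=")')


def first_href(container_html):
    """Return the first double-quoted href in a container's inner HTML, or None."""
    match = HREF_RE.search(container_html)
    return match.group(0) if match else None


class WebsiteLinkFinder(HTMLParser):
    """Remember the href of the first tag whose class list includes 'WebsiteURL'."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if self.href is None and "WebsiteURL" in classes and attr.get("href"):
            self.href = attr["href"]


def scrape_websites(listing_url, containers, fetch):
    """Visit each container's detail page and collect its website link.

    containers: inner HTML of each result card on the listing page.
    fetch: hypothetical callable mapping a URL to that page's HTML.
    """
    websites = []
    for container_html in containers:
        detail_path = first_href(container_html)
        if detail_path is None:
            continue  # container without a link: nothing to visit
        # Resolve relative links against the listing page URL.
        detail_url = urljoin(listing_url, detail_path)
        finder = WebsiteLinkFinder()
        finder.feed(fetch(detail_url))
        finder.close()
        if finder.href:
            websites.append(urljoin(detail_url, finder.href))
    return websites
```

With live pages, `fetch` could be something like `lambda u: urllib.request.urlopen(u).read().decode("utf-8", "replace")`; injecting it keeps the parsing logic testable on saved HTML.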

HelloInsomnia,

 

We have another site we would like to do this for too. We were wondering how you found the "containers" <class=*mod-*>. We've tried designating the scrape from all over the page and can't find the same starting point that's listed under your "Advanced Element Editor". Any help would be greatly appreciated. Also, are there any training sites you would suggest so that we can build our skills with UBot?

 

 


Any time you have a result set like this, you want to scrape the outer container and then go inside it to grab the information. That is what I did there. For training, I have a site launching soon, and I have free YouTube videos here: https://www.youtube.com/channel/UC9YQ2-tCEZRgqj0KvhlOa3g
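The "outer container first, then look inside" approach can be sketched in Python too. This hypothetical `ContainerScraper` treats any element whose whole `class` attribute matches a shell-style wildcard (here `*mod-*`, roughly mirroring the `<class=w"*mod-*">` selector in the script above) as a container, and records every link found anywhere inside it, so each result's data stays grouped together. It assumes every non-void tag is properly closed; messier real-world markup may need a more forgiving parser.

```python
from fnmatch import fnmatchcase
from html.parser import HTMLParser

# Elements that never have children (and never get an end tag).
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ContainerScraper(HTMLParser):
    """Group links by outer container.

    An element whose class attribute matches a wildcard pattern (e.g. "*mod-*")
    opens a container; every <a href> inside it, at any depth, is recorded for
    that container. Assumes non-void tags are always closed.
    """

    def __init__(self, class_pattern):
        super().__init__()
        self.class_pattern = class_pattern
        self.depth = 0                # number of currently open non-void tags
        self.container_depth = None   # depth just outside the open container
        self.containers = []          # one list of hrefs per container

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # Step 1: find the outer container (nested matches are ignored).
        if (self.container_depth is None and tag not in VOID_TAGS
                and fnmatchcase(attr.get("class") or "", self.class_pattern)):
            self.container_depth = self.depth
            self.containers.append([])
        # Step 2: while inside a container, grab the details we want.
        if self.container_depth is not None and tag == "a" and attr.get("href"):
            self.containers[-1].append(attr["href"])
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        self.depth -= 1
        # Back at the depth where the container opened: it is now closed.
        if self.container_depth is not None and self.depth == self.container_depth:
            self.container_depth = None


def scrape_containers(html, class_pattern):
    """Hypothetical helper: return one list of hrefs per matching container."""
    parser = ContainerScraper(class_pattern)
    parser.feed(html)
    parser.close()
    return parser.containers
```

Because links are grouped per container, a link that sits outside every result card (a site menu, for example) is simply skipped instead of being mixed into the results.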
