UBot Underground

Scraping Google Problems



So I'm new to UBot, but I'm learning through the tutorials and trial and error. I went through the Google scraping video tutorial (which is very old), and obviously Google's markup is no longer the same.

 

The video uses the class on the A tag to scrape the URLs, but Google no longer puts a class on the A tag.

 

I've been able to easily scrape Bing by grabbing the innertext of the "cite" tags.

 

Google, however, is formatted like this:

<a href="https://theURLweWant.com/terms-of-service" ping="/url?sa=t&source=web&rct=j&url=https://theURLweWant.com/terms-of-service&ved=2ahUKEwi1hOCV4eboAhVhhHIEHXAzCRcQFjAHegQICBAB"><br><h3 class="LC20lb DKV0Md">The Title Tag of The Website</h3><div class="TbwUpd NJjxre"><cite class="iUh30 bc tjvcx">URLShortened<span class="eipWBe"> › terms-of-service</span></cite></div></a>

So how would I pull out the href URL?

 

Thanks!

Edited by HelloInsomnia
Add code tags so the url isnt truncated

The easiest way is to use XPath. There is a free plugin with an XPath parser here: https://network.ubotstudio.com/forum/index.php/topic/20002-free-heopas-custom-plugin-thread-lock-sqlite-thread-variables-email-ini-clipboard/

 

I know XPath is just one more thing to learn, but it's worth it, considering many sites will be tricky (at best) to parse using the built-in selectors.

 

One thing you can do in Chrome is right click -> Inspect on the element you want. Be sure it's highlighted in the element inspector in dev tools, and then right click -> Copy -> Copy XPath. This gives you the XPath for that one particular element. In this case you probably want a list of every result link, so it wouldn't help much, but sometimes that can work.
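To illustrate why a copied XPath tends to match only one element, here is a rough sketch in Python (not UBot, purely for illustration). The page markup is a made-up, well-formed stand-in, and the positional path is only similar in spirit to what "Copy XPath" produces. Python's ElementTree supports just a subset of XPath, so the paths are written relative to the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page: three result blocks under <body>.
PAGE = """<html><body>
<div class="r"><a href="https://example.com/one">One</a></div>
<div class="r"><a href="https://example.com/two">Two</a></div>
<div class="r"><a href="https://example.com/three">Three</a></div>
</body></html>"""

root = ET.fromstring(PAGE)

# Positional path, in the style of a copied XPath: pins down the second div only.
copied_style = root.findall("./body/div[2]/a")

# Attribute-based path: matches the link in every div with class "r".
general = root.findall(".//div[@class='r']/a")

single = [a.get("href") for a in copied_style]
every = [a.get("href") for a in general]
print(single)
print(every)
```

The positional path returns one link, while the class-based path returns all three, which is the shape you want when building a list.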

 

Here is a pure UBot example:

add list to list(%urls,$scrape attribute($element child(<class="r">),"href"),"Delete","Global")

And using XPath instead:

add list to list(%urls,$plugin function("HeopasCustom.dll", "$Heopas Xpath Parser", $document text, "//div[@class=\'r\']/a", "href", ""),"Delete","Global")
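If it helps to see what that XPath expression selects outside of UBot, here is a minimal Python sketch that applies the same idea (every "a" directly inside a div with class "r", then read its href). The sample markup and the `extract_hrefs` helper are hypothetical, and the markup is deliberately well-formed; a real Google results page is not valid XML and would need an HTML-tolerant parser, so treat this only as a way to test the expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a results page.
SAMPLE = """<html><body>
<div class="r"><a href="https://example.com/terms-of-service"><h3>First title</h3></a></div>
<div class="r"><a href="https://example.org/about"><h3>Second title</h3></a></div>
<div class="other"><a href="https://example.net/ignored">Not a result</a></div>
</body></html>"""


def extract_hrefs(markup):
    """Return the href of every <a> that is a direct child of <div class="r">."""
    root = ET.fromstring(markup)
    # ".//div[@class='r']/a" is the ElementTree form of "//div[@class='r']/a".
    links = root.findall(".//div[@class='r']/a")
    return [a.get("href") for a in links if a.get("href")]


urls = extract_hrefs(SAMPLE)
print(urls)
```

The link inside the div with class "other" is skipped, which mirrors how the class filter keeps unrelated anchors out of the list.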
