ShorePatrol, posted April 14, 2020 (edited)

So I'm new to UBot, but I'm learning through the tutorials and by trial and error. I went through the Google scraping video tutorial, which is very old, and Google's results page has obviously changed since then. The video uses the class on the A tag to scrape the URLs, but Google no longer puts a class on the A tag.

I've been able to scrape Bing easily by grabbing the innertext of the "cite" tags. Google, however, is formatted like this:

    <a href="https://theURLweWant.com/terms-of-service" ping="/url?sa=t&source=web&rct=j&url=https://theURLweWant.com/terms-of-service&ved=2ahUKEwi1hOCV4eboAhVhhHIEHXAzCRcQFjAHegQICBAB"><br><h3 class="LC20lb DKV0Md">The Title Tag of The Website</h3><div class="TbwUpd NJjxre"><cite class="iUh30 bc tjvcx">URLShortened<span class="eipWBe"> › terms-of-service</span></cite></div></a>

So how would I pull out the href URL? Thanks!

Edited April 14, 2020 by HelloInsomnia: added code tags so the URL isn't truncated.
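For readers working outside UBot, the same idea can be sketched in Python with the standard library's `html.parser`: since the link itself has no class to select on, collect the `href` of every `<a>` tag that wraps an `<h3>` title. This is a minimal sketch assuming the anchor-containing-h3 structure from the markup in the question; Google's markup changes often, so the `h3` heuristic, the class and function names, and the sample HTML are all illustrative assumptions, not a guaranteed selector.

```python
from html.parser import HTMLParser


class ResultLinkParser(HTMLParser):
    """Collects the href of each <a> that contains an <h3> (assumed result-title structure)."""

    def __init__(self):
        super().__init__()
        self._open_hrefs = []  # stack of hrefs for currently open <a> tags (None if missing)
        self._seen_h3 = []     # parallel stack: has this <a> contained an <h3> yet?
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._open_hrefs.append(dict(attrs).get("href"))
            self._seen_h3.append(False)
        elif tag == "h3" and self._seen_h3:
            # Mark the innermost open <a> as a search-result link.
            self._seen_h3[-1] = True

    def handle_endtag(self, tag):
        if tag == "a" and self._open_hrefs:
            href = self._open_hrefs.pop()
            # Keep the URL only if this anchor wrapped an <h3> title.
            if self._seen_h3.pop() and href:
                self.urls.append(href)


def extract_result_urls(html):
    parser = ResultLinkParser()
    parser.feed(html)
    parser.close()
    return parser.urls


# Hypothetical sample modeled on the markup in the question, plus one non-result link.
sample = (
    '<a href="https://theURLweWant.com/terms-of-service" ping="/url?sa=t">'
    '<br><h3 class="LC20lb DKV0Md">The Title Tag of The Website</h3>'
    '<div class="TbwUpd NJjxre"><cite class="iUh30 bc tjvcx">URLShortened'
    '<span class="eipWBe"> › terms-of-service</span></cite></div></a>'
    '<a href="https://example.com/nav">Not a result</a>'
)
print(extract_result_urls(sample))  # prints ['https://theURLweWant.com/terms-of-service']
```

Anchoring on the `<h3>` rather than on a class means the sketch keeps working if Google renames its obfuscated classes, but it would break if the title stopped being an `<h3>` inside the link.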
HelloInsomnia, posted April 14, 2020

The easiest way is to use XPath. There is a free plugin with an XPath parser here: https://network.ubotstudio.com/forum/index.php/topic/20002-free-heopas-custom-plugin-thread-lock-sqlite-thread-variables-email-ini-clipboard/

I know XPath is one more thing to learn, but it's worth it, since many sites are tricky (at best) to parse with the built-in selectors.

One thing you can do in Chrome is right-click the element you want and choose Inspect. Make sure it's highlighted in the Elements panel of DevTools, then right-click it there and choose Copy -> Copy XPath. This gives you the XPath for that one particular element. In this case you probably want a list of every result URL, so a path to a single element won't help much, but sometimes that approach works.

Here is a pure UBot example:

    add list to list(%urls,$scrape attribute($element child(<class="r">),"href"),"Delete","Global")

And using XPath instead:

    add list to list(%urls,$plugin function("HeopasCustom.dll", "$Heopas Xpath Parser", $document text, "//div[@class=\'r\']/a", "href", ""),"Delete","Global")
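The XPath expression from the plugin example, `//div[@class='r']/a`, can also be tried outside UBot. A minimal Python sketch uses the standard library's `xml.etree.ElementTree`, which supports only a limited XPath subset (child steps, `.//` for "any descendant", and `[@attr='value']` predicates) and only parses well-formed XML; real search-result pages would normally need an HTML-tolerant parser such as lxml. The `class="r"` wrapper comes from the post and may no longer match Google's current markup; the sample document and the `scrape_hrefs` name are hypothetical, and its arguments only loosely mirror the plugin call's document / XPath / attribute parameters.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample; ElementTree cannot parse arbitrary (non-XML) HTML.
SAMPLE = """
<html><body>
  <div class="g"><div class="r"><a href="https://example.com/one"><h3>One</h3></a></div></div>
  <div class="g"><div class="r"><a href="https://example.com/two"><h3>Two</h3></a></div></div>
  <div class="nav"><a href="https://example.com/next">Next</a></div>
</body></html>
"""


def scrape_hrefs(document, path=".//div[@class='r']/a", attribute="href"):
    """Return `attribute` of every element matching an ElementTree XPath-subset path."""
    root = ET.fromstring(document.strip())
    # findall understands the limited subset; './/' plays the role of '//' in full XPath.
    return [el.get(attribute) for el in root.findall(path) if el.get(attribute) is not None]


print(scrape_hrefs(SAMPLE))  # prints ['https://example.com/one', 'https://example.com/two']
```

The link in the `nav` div is skipped because its parent does not have `class="r"`, which is the same filtering the XPath in the UBot example relies on.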