UBot Underground

Scraping Attributes Inside Of Another Attribute



Hi,

 

I'm trying to scrape a website that shows a lot of ads around the listings. I'm only trying to scrape the organic listings. The way the site is set up it has different classes for the ads and the organic listings. 

 

So what I am trying to do is select the "organic" class, then select all of the "business" classes that are all inside of the "organic" class and save them to a list. Is this possible with UBot?

 

P.S. The "business" class exists in the ads classes as well, so I need to do it this way I think.
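
For anyone who wants to check the logic outside UBot, here is a minimal Python sketch of the selection described above: take only the "business" elements that sit inside an "organic" element. This is not UBot code. The class names, the sample HTML, and the helper names (`ScopedScraper`, `scrape_scoped`) are made up for illustration, and it assumes reasonably well-formed HTML.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ScopedScraper(HTMLParser):
    """Collect the text of child_class elements nested inside parent_class elements."""

    def __init__(self, parent_class, child_class):
        super().__init__()
        self.parent_class = parent_class
        self.child_class = child_class
        self.stack = []            # class set of every currently open element
        self.results = []
        self._buffer = None        # text pieces of the match being collected
        self._buffer_depth = None  # stack depth at which that match was opened

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        # Is any already-open ancestor tagged with the parent class?
        inside_parent = any(self.parent_class in c for c in self.stack)
        self.stack.append(classes)
        if self._buffer is None and inside_parent and self.child_class in classes:
            self._buffer = []
            self._buffer_depth = len(self.stack)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        # Closing the matched element itself: store its text.
        if self._buffer is not None and len(self.stack) == self._buffer_depth:
            self.results.append("".join(self._buffer).strip())
            self._buffer = None
        self.stack.pop()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def scrape_scoped(html, parent_class, child_class):
    parser = ScopedScraper(parent_class, child_class)
    parser.feed(html)
    parser.close()
    return parser.results


SAMPLE = """
<div class="ads"><a class="business" href="#">Ad One</a></div>
<div class="organic"><a class="business" href="#">Organic One</a></div>
<div class="ads"><a class="business" href="#">Ad Two</a></div>
<div class="organic"><a class="business" href="#">Organic Two</a></div>
"""

print(scrape_scoped(SAMPLE, "organic", "business"))
```

The point of the stack is that a "business" element only counts when one of its open ancestors carries the "organic" class, which is why the ad listings are ignored even though they use the same class.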


You may be able to use a child element selector inside a scrape attribute for this. Otherwise, you could scrape the organic class and then pull the businesses out with a regular expression. It's hard to say without seeing the code, but one of those will likely work.


No, that would not work: scrape attribute functions cannot be nested. The "AND" selector may work if you input both classes, although I've yet to really understand how the AND selector works.

 

You could use the child element selector too, but in my experience it can be hit or miss.

Try this:

 

It's hard to say without seeing the site, but the code below should empty the entire ads class, so every business class inside the ads class is removed from the page. The only business classes left are then the ones inside the organic class.

 

change attribute(<class="adsListingHere">,"innerhtml","")
add list to list(%business,$scrape attribute(<class="businessClass">,"innertext"),"Delete","Global")


Here's example code:

load html("<div class=\"ads\">
all content in this class will disappear
<a class=\"businessListings\" href=\"www.businessOne.com\">www.businessOne.com</a>
</div>
<br>
<div class=\"OrganicBusiness\">
all content in this class will be scraped
<a class=\"businessListings\" href=\"www.businessTwo.com\">www.businessTwo.com</a>
</div>
<br>
<div class=\"ads\">
all content in this class will disappear
<a class=\"businessListings\" href=\"www.businessThree.com\">www.businessThree.com</a>
</div>
<br>
<div class=\"OrganicBusiness\">
all content in this class will be scraped
<a class=\"businessListings\" href=\"www.businessFour.com\">www.businessFour.com</a>
</div>
<br>
")
wait for browser event("Everything Loaded","")
change attribute(<class="ads">,"innerhtml","")
add list to list(%organicCompanies,$scrape attribute(<class="businessListings">,"innertext"),"Delete","Global")
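
The same "drop the ads, then scrape whatever is left" idea can be checked outside UBot with a short Python sketch. This is not UBot code and not how UBot works internally. The helper names (`SkipAdsScraper`, `scrape_without`) are hypothetical, the sample HTML mirrors the example above, and it assumes well-formed HTML.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they are not depth-counted.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class SkipAdsScraper(HTMLParser):
    """Ignore everything inside skip_class elements, collect text of target_class elements."""

    def __init__(self, skip_class, target_class):
        super().__init__()
        self.skip_class = skip_class
        self.target_class = target_class
        self.skip_depth = 0    # > 0 while inside a skipped (ads) element
        self.target_depth = 0  # > 0 while inside a target element
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_depth or self.skip_class in classes:
            self.skip_depth += 1
        elif self.target_depth or self.target_class in classes:
            if self.target_depth == 0:
                self.results.append("")  # start a new scraped entry
            self.target_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1
        elif self.target_depth:
            self.target_depth -= 1

    def handle_data(self, data):
        if self.target_depth and not self.skip_depth:
            self.results[-1] += data


def scrape_without(html, skip_class, target_class):
    parser = SkipAdsScraper(skip_class, target_class)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.results]


SAMPLE = """
<div class="ads">all content in this class will disappear
<a class="businessListings" href="www.businessOne.com">www.businessOne.com</a></div>
<br>
<div class="OrganicBusiness">all content in this class will be scraped
<a class="businessListings" href="www.businessTwo.com">www.businessTwo.com</a></div>
<br>
<div class="ads">all content in this class will disappear
<a class="businessListings" href="www.businessThree.com">www.businessThree.com</a></div>
<br>
<div class="OrganicBusiness">all content in this class will be scraped
<a class="businessListings" href="www.businessFour.com">www.businessFour.com</a></div>
"""

print(scrape_without(SAMPLE, "ads", "businessListings"))
```

Skipping the ads works here for the same reason emptying them works in the browser: once the ad containers contribute nothing, every remaining "businessListings" element must belong to an organic listing.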


Hmmm... I still can't get this to work. If I use change attribute, seemingly nothing happens. 

 

EDIT: I got it to work using deliter's suggestion. However, I had to use 'innertext' instead of 'innerhtml' in change attribute.

Edited by awesome sauce
