How to scrape inner text of attribute based on ID

Rob JH · August 15, 2013

Hi Guys,

How do I go about scraping just the values from the below div?

So I want to end up with:

peterjeffry86

jeffrypeter456

pjeffry27

Obviously the suggestions are going to change, so I think I need a regex, though I just can't figure out how to get going with regex in UBot.

<div id="username-suggestions" class="username-suggestions" style="display: block;">Available: <a href="">peterjeffry86</a><a href="">jeffrypeter456</a><a href="">pjeffry27</a></div>

Any help appreciated.

LoWrIdErTJ - BotGuru · August 15, 2013

Are the href="" attributes really empty, or do they have links?

Are there other links on the result page?

LoWrIdErTJ - BotGuru · August 15, 2013

Here are three ways you can do it:

load html("<div id=\"username-suggestions\" class=\"username-suggestions\" style=\"display: block;\">Available: <a href=\"\">peterjeffry86</a><a href=\"\">jeffrypeter456</a><a href=\"\">pjeffry27</a></div>")
clear list(%names)
add list to list(%names, $list from text($page scrape("<a href=\"\">", "</a>"), $new line), "Delete", "Global")
comment("

OR

")
clear list(%names)
add list to list(%names, $list from text($find regular expression($document text, "(?=<a href=\"\">).*?(?=</a>)"), $new line), "Delete", "Global")
comment("

OR

")
set(#var, $scrape attribute(<id="username-suggestions">, "innerhtml"), "Global")
clear list(%names)
add list to list(%names, $list from text($find regular expression(#var, "(?=<a href=\"\">).*?(?=</a>)"), $new line), "Delete", "Global")
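For comparison outside UBot, the idea behind the third approach (narrow down to the element by its ID first, then pull out the link text) can be sketched in Python with the standard library's html.parser. This is only an illustration under assumptions: the class name SuggestionParser and its fields are hypothetical, and nothing here is part of UBot.

```python
from html.parser import HTMLParser


class SuggestionParser(HTMLParser):
    """Collect the text of every <a> inside the <div> with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.div_depth = 0    # greater than 0 while inside the target div
        self.in_link = False  # True while inside an <a> within that div
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.div_depth > 0:
                self.div_depth += 1  # nested div inside the target
            elif dict(attrs).get("id") == self.target_id:
                self.div_depth = 1   # entered the target div
        elif tag == "a" and self.div_depth > 0:
            self.in_link = True
            self.names.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.names[-1] += data


html = ('<div id="username-suggestions" class="username-suggestions" '
        'style="display: block;">Available: <a href="">peterjeffry86</a>'
        '<a href="">jeffrypeter456</a><a href="">pjeffry27</a></div>')

parser = SuggestionParser("username-suggestions")
parser.feed(html)
names = parser.names
print(names)
```

Because only links inside the target div are collected, other empty hrefs elsewhere on the page would not end up in the list.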

positivity13 · August 15, 2013

Take the time to learn regex; it's the nuts once it clicks. It took me a little while, but I'm so glad I took the time. I use it to scrape most things now.

TJ, doesn't the last bit of the regex need the < sign, i.e. (?<=</a>)?

LoWrIdErTJ - BotGuru · August 15, 2013

Nope, it should work as is.

positivity13 · August 15, 2013

Apologies, my bad

Rob JH · August 15, 2013

Great, thanks BotGuru for the in-depth answer. I do know regex in general; I just could not get it going in UBot. I'll read up on the docs when I get the time, as it is something I need to do, though I have only been using UBot for a couple of weeks.

In answer to your question: yes, the hrefs are like that; that's the exact code from the page. The reasoning behind the scrape is to get the page to do the hard work of creating a username, as I think that leaves the smallest footprint when signing up, rather than trying to put a username together yourself from #firstname, #lastname, #dob, etc. There are just so many variations you would have to do to hide the footprint.

Rob JH · August 20, 2013

I found that the bottom two suggestions you gave were also showing the href before each name in the list. In the end, I realised I could just get away with this, based on your first suggestion:

clear list(%SuggestedUsernames)
add list to list(%SuggestedUsernames, $list from text($page scrape("<a href=\"\">", "</a>"), $new line), "Delete", "Global")

That works because the page did not have any other empty hrefs. I would still like to know why the regex was bringing back the href before the name and not just the username, if possible, for future reference when I need regex.
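On the question of why the regex returned the href as well: the pattern in the second and third suggestions opens with a lookahead, (?=<a href="">). A lookahead only checks that the tag comes next without consuming it, so the match starts at the tag and includes it. A lookbehind, (?<=<a href="">), checks that the tag sits immediately before the match, so the match starts after it. The sketch below shows the difference using Python's re module; the pattern uses only standard lookaround syntax, so it should carry over to UBot's $find regular expression, but that is an assumption and has not been tested in UBot here.

```python
import re

html = ('<div id="username-suggestions" class="username-suggestions" '
        'style="display: block;">Available: <a href="">peterjeffry86</a>'
        '<a href="">jeffrypeter456</a><a href="">pjeffry27</a></div>')

# Lookahead at the start: the tag is asserted but not consumed,
# so each match begins at the tag and includes it.
with_lookahead = re.findall(r'(?=<a href="">).*?(?=</a>)', html)

# Lookbehind at the start: the tag must sit just before the match,
# so each match begins after the tag and holds only the username.
with_lookbehind = re.findall(r'(?<=<a href="">).*?(?=</a>)', html)

print(with_lookahead)
print(with_lookbehind)
```

The closing (?=</a>) is correct as a lookahead, since the closing tag comes after the text being matched; only the opening assertion needs the < sign.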

solstudioim · December 3, 2014

This thread is really damn helpful. Thanks everyone!
