UBot Underground

How to scrape inner text of attribute based on ID



Hi Guys,

 

How do I go about scraping just the values from the below div?

 

So I want to end up with:

 

peterjeffry86

jeffrypeter456

pjeffry27

 

Obviously the suggestions are going to change, so I think I need a regex? I just can't figure out how to get going with regex in UBot.

 

<div id="username-suggestions" class="username-suggestions" style="display: block;">Available: <a href="">peterjeffry86</a><a href="">jeffrypeter456</a><a href="">pjeffry27</a></div>

 

Any help appreciated.

 

 


Here are 3 ways you can do it:

 

 

 

load html("<div id=\"username-suggestions\" class=\"username-suggestions\" style=\"display: block;\">Available: <a href=\"\">peterjeffry86</a><a href=\"\">jeffrypeter456</a><a href=\"\">pjeffry27</a></div>")
clear list(%names)
add list to list(%names,$list from text($page scrape("<a href=\"\">","</a>"),$new line),"Delete","Global")
comment("OR")
clear list(%names)
add list to list(%names,$list from text($find regular expression($document text,"(?=<a href=\"\">).*?(?=</a>)"),$new line),"Delete","Global")
comment("OR")
set(#var,$scrape attribute(<id="username-suggestions">,"innerhtml"),"Global")
clear list(%names)
add list to list(%names,$list from text($find regular expression(#var,"(?=<a href=\"\">).*?(?=</a>)"),$new line),"Delete","Global")
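For readers who want to try the idea behind the third option outside UBot, here is a rough Python analogue: narrow the scrape to the element with the given ID first, then collect the link text inside it. This is only a sketch using Python's standard html.parser module, not UBot code; the class name SuggestionParser, its fields, and the extra link placed outside the div are made up for illustration.

```python
from html.parser import HTMLParser


class SuggestionParser(HTMLParser):
    """Collects the text of <a> tags inside the <div> with a given id.

    Hypothetical helper for illustration; not part of UBot.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.div_depth = 0      # > 0 while inside the target div
        self.in_link = False    # True while inside an <a> within that div
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            # Enter the target div, or track nested divs once inside it.
            if self.div_depth or dict(attrs).get("id") == self.target_id:
                self.div_depth += 1
        elif tag == "a" and self.div_depth:
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link and data.strip():
            self.names.append(data.strip())


# The div from the question, preceded by an invented link that should be ignored.
page = ('<a href="">not-a-suggestion</a>'
        '<div id="username-suggestions" class="username-suggestions" '
        'style="display: block;">Available: <a href="">peterjeffry86</a>'
        '<a href="">jeffrypeter456</a><a href="">pjeffry27</a></div>')

parser = SuggestionParser("username-suggestions")
parser.feed(page)
parser.close()
print(parser.names)
```

Scoping by ID first means other empty hrefs elsewhere on the page would not end up in the list.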

 

 

 


Take the time to learn regex, it's the nuts once it clicks. It took me a little while, but I'm so glad I took the time. I use it to scrape most things now.

 

 

 

Tj - doesn't the last bit of the regex need the < sign? I.e. (?<=</a>)

 

 

Nope, it should work as is.
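The question about the trailing part can be checked outside UBot. The sketch below uses Python's re module purely as a stand-in, on the assumption that UBot's regex engine handles lookarounds the same way; the simplified `\w+` name pattern and the variable names are illustrative and not taken from the posts above. A trailing lookahead `(?=</a>)` requires `</a>` to come next without including it, whereas a trailing lookbehind `(?<=</a>)` would require the match itself to end in `</a>`.

```python
import re

html = ('<div id="username-suggestions" class="username-suggestions" '
        'style="display: block;">Available: <a href="">peterjeffry86</a>'
        '<a href="">jeffrypeter456</a><a href="">pjeffry27</a></div>')

# Trailing lookahead: the name must be followed by </a>, which is not consumed.
with_lookahead = re.findall(r'\w+(?=</a>)', html)

# Trailing lookbehind: the text just before the end of the match must be </a>.
# A run of word characters can never end in ">", so nothing matches here.
with_lookbehind = re.findall(r'\w+(?<=</a>)', html)

print(with_lookahead)
print(with_lookbehind)
```

So for the end of the pattern, the version without the `<` is the one that stops just before the closing tag.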


Great, thanks BotGuru for the in-depth answer. I do know regex in general, I just could not get it going in UBot. I'll read up on the docs when I get the time, as it is something I need to do, though I have only been using UBot a couple of weeks.

And in answer to your question: yes, the hrefs are like that, that's the exact code from the page. The reasoning behind the scrape is to get the page to do the hard work of creating a username, as I think that leaves the smallest footprint when signing up, rather than trying to put a username together yourself from #firstname, #lastname, #dob etc. There are just so many variations you would have to do to hide the footprint.


I found that the bottom 2 suggestions you gave were also showing the href before the names in the list. In the end I realised I could just get away with this, based on your first suggestion:

 

clear list(%SuggestedUsernames)
add list to list(%SuggestedUsernames,$list from text($page scrape("<a href=\"\">","</a>"),$new line),"Delete","Global")

 

That works because the page did not have any other empty hrefs. I could still do with knowing why the regex was bringing back the href before the username and not just the username, if possible please, for future reference when I need regex.
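A likely explanation, sketched here with Python's re module as a stand-in (assuming UBot's regex engine treats lookarounds the same way): the pattern posted earlier opens with a lookahead, `(?=<a href="">)`, which only checks that the tag comes next and does not skip over it, so `.*?` starts matching at the `<` and the tag ends up in the result. Opening with a lookbehind, `(?<=<a href="">)`, starts the match just after the tag instead. The variable names below are illustrative only.

```python
import re

html = ('<div id="username-suggestions" class="username-suggestions" '
        'style="display: block;">Available: <a href="">peterjeffry86</a>'
        '<a href="">jeffrypeter456</a><a href="">pjeffry27</a></div>')

# Leading lookahead, as in the posted pattern: the match begins AT the tag,
# so the tag text is part of every result.
posted = re.findall(r'(?=<a href="">).*?(?=</a>)', html)

# Leading lookbehind: the match begins just AFTER the tag,
# so only the username is returned.
adjusted = re.findall(r'(?<=<a href="">).*?(?=</a>)', html)

print(posted)
print(adjusted)
```

If the same holds in UBot, changing only the first group from `(?=` to `(?<=` in the $find regular expression call should return just the usernames.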
