UBot Underground

Regex In The Advanced Element Editor


Solved by HelloInsomnia


I can't figure this out for the life of me. I am trying to scrape the result stats from Google and output just the page number and result count.

 

The HTML is:

<div id="resultStats">Page 8 of about 18,800,000 results<nobr> (0.39 seconds) </nobr></div>

 

Now, I can easily grab all of the info with just a wildcard:

 

"Page 8 of about 18,800,000 results<nobr> (0.39 seconds)"

 

But all I want is "Page 8 of about 18,800,000 results".

 

So I fired up HelloInsomnia's Regex Builder (awesome tool!) and got this output:

(?<=\"\>).+?(?=\<nobr\>)

 

Which should work for my purposes. However, I am at a loss as to where to enter it! I select "Regular Expression" and "outerhtml" in the attributes dropdown, then replace the text in the field between that and the match dropdown, so it looks like this:

<outerhtml=r"(?<=\\\"\\>).+?(?=\\<nobr\\>)">

Nothing happens!

 

Then I tried replacing only the part of that field that the regular expression should match, and got:

<outerhtml=r"<div id=\"resultStats\"<outerhtml=r\"(?<=\\\\\\\"\\\\>).+?(?=\\\\<nobr\\\\>)\"> (0.39 seconds) </nobr></div>">

 

And still nothing!

 

What am I doing wrong?

 

I have tried innerhtml, outerhtml, innertext, etc. Nothing works!

 


loop(#gspr) {
    if($search page(#surl)) {
        then {
            set(#pagenum,$scrape attribute(<outerhtml=r"(?<=\\<div\\ id\\=\\\"resultStats\\\"\\>).+?(?=\\<nobr\\>\\ \\(0\\.39\\ seconds\\)\\&nbsp\\;\\<\\/nobr\\>\\<\\/div\\>)">,"outerhtml"),"Global")
            alert("Url is found! {#pagenum}")
            stop script
        }
        else {
            click(<title="Next page">,"Left Click","No")
            wait($rand(3,7))
            if($exists(<title="Next page">)) {
                then {
                }
                else {
                    alert("Last Page Reached!")
                    stop script
                }
            }
        }
    }
}


I don't understand what I am doing wrong, and looking at the code now, I am even more confused. I tried replacing the whole statement in the Advanced Element Editor field next to the attributes dropdown, just as I would place a star somewhere in that text if I were using a wildcard. I used this generated regex to replace the whole line:

(?<=\<div\ id\=\"resultStats\"\>).+?(?=\<nobr\>\ \(0\.39\ seconds\)\&nbsp\;\<\/nobr\>\<\/div\>)

 

Still not working. Could it be a bug? I don't know :(
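For comparison, here is a Python sketch of the generated pattern above. Python's `re` module stands in for UBot's regex engine, so this illustrates the regex itself, not UBot. The generated pattern escapes every punctuation character and hard-codes the surrounding text, including the `(0.39 seconds)` timing and an `&nbsp;` before `</nobr>`. The raw HTML containing `&nbsp;` is inferred from the pattern, and the `0.41 seconds` page is a made-up example. A pattern built that way can only match the exact page it was generated from, while a generic pattern keeps working.

```python
import re

# Builder-generated pattern quoted in the post above, copied as raw strings.
GENERATED = (r'(?<=\<div\ id\=\"resultStats\"\>).+?'
             r'(?=\<nobr\>\ \(0\.39\ seconds\)\&nbsp\;\<\/nobr\>\<\/div\>)')
# Generic pattern suggested later in the thread.
GENERIC = r'Page \d+ of about [\d,]+ results'

# Assumed raw HTML: the generated pattern implies an &nbsp; before </nobr>.
page_a = ('<div id="resultStats">Page 8 of about 18,800,000 results'
          '<nobr> (0.39 seconds)&nbsp;</nobr></div>')
# Hypothetical later page: only the search timing changed.
page_b = page_a.replace('0.39', '0.41')


def first_match(pattern, text):
    """Return the first match of pattern in text, or None if there is none."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


for name, html_text in (('page_a', page_a), ('page_b', page_b)):
    print(name, first_match(GENERATED, html_text), '|', first_match(GENERIC, html_text))
```

On `page_b` the generated pattern finds nothing because its lookahead insists on `0.39`, while the generic pattern still returns the stats text.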


Try this:

set(#Data,$find regular expression("Page 8 of about 18,800,000 results (0.39 seconds)
Page 80 of about 800,000 results (0.39 seconds)
Page 889 of about 100,000 results (0.39 seconds)","Page \\d+ of about [\\d,]+ results"),"Global")
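To see what that pattern picks out, here is the same search sketched in Python. This assumes the doubled backslashes in UBot's code view (`\\d`) stand for single backslashes in the actual pattern, so the Python raw string uses `\d`. It is an illustration with Python's `re`, not UBot's implementation; `re.findall` returns every match in the sample.

```python
import re

# Sample text from the post above: three result-stat lines.
sample = """Page 8 of about 18,800,000 results (0.39 seconds)
Page 80 of about 800,000 results (0.39 seconds)
Page 889 of about 100,000 results (0.39 seconds)"""

# \d+ is the page number; [\d,]+ is the result count with thousands separators.
PATTERN = r"Page \d+ of about [\d,]+ results"

matches = re.findall(PATTERN, sample)
print(matches)

# Adding capture groups pulls out just the two numbers the question asked for.
pairs = [(int(page), int(count.replace(",", "")))
         for page, count in re.findall(r"Page (\d+) of about ([\d,]+) results", sample)]
print(pairs)  # [(8, 18800000), (80, 800000), (889, 100000)]
```

The pattern never mentions the `(0.39 seconds)` part, so it does not care what the timing is.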

 


I tried to use what you posted in the way I was building this script, but it crashed UBot :(

then {
    set(#pagenum,$scrape attribute(<outerhtml=r"Page \\d+ of about [\\d,]+ results">,"outerhtml"),"Global")
    alert("Url is found! {#pagenum}")
    stop script
}

It also returned 11 results, oddly enough.

 

This confuses me further, because I have no idea what I am doing wrong in the scrape attribute element editor. I believe that is where the problem is, but I just can't understand how or why.


loop(#gspr) {
    if($search page(#surl)) {
        then {
            set(#pagenum,$scrape attribute(<id="resultStats">,"innertext"),"Global")
            set(#pagenum,$find regular expression(#pagenum,"Page \\d+ of about [\\d,]+ results"),"Global")
            alert("Url is found! {#pagenum}")
            stop script
        }
        else {
            click(<title="Next page">,"Left Click","No")
            wait($rand(3,7))
            if($exists(<title="Next page">)) {
                then {
                }
                else {
                    alert("Last Page Reached!")
                    stop script
                }
            }
        }
    }
}
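The fix above works in two steps: scrape the element's `innertext`, then run the regex on that plain text. A rough Python equivalent follows. The standard library's `html.parser` stands in for `$scrape attribute(<id="resultStats">,"innertext")`, and the `inner_text` helper is hypothetical: it only handles simple markup like this snippet.

```python
import re
from html.parser import HTMLParser


class InnerTextById(HTMLParser):
    """Collect the text inside the element whose id attribute equals target_id.

    Simplified: assumes every start tag inside the target has a matching end
    tag (true for the resultStats markup, not for void tags like <br>).
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # > 0 while the parser is inside the target element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def inner_text(page_html, element_id):
    """Hypothetical helper: rough equivalent of scraping "innertext" by id."""
    parser = InnerTextById(element_id)
    parser.feed(page_html)
    parser.close()
    return "".join(parser.parts)


page_html = ('<div id="resultStats">Page 8 of about 18,800,000 results'
             '<nobr> (0.39 seconds) </nobr></div>')

# Step 1: plain text of the element, with the <nobr> tags stripped.
stats = inner_text(page_html, "resultStats")
# Step 2: the same regex as in the script above, applied to that text.
result = re.search(r"Page \d+ of about [\d,]+ results", stats).group(0)
print(result)
```

Splitting it this way keeps the element lookup simple (an id) and leaves all the pattern work to the regex step.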

That works, thank you. But I still don't understand what I did wrong with my initial regex. Why did that regex work perfectly in the builder but not in UBot? Why was the regex output from Regex Builder so complicated, and the correct ones so simple? Why does "find regular expression" get what I need when the Advanced Element Editor does not? Why couldn't I just enter the proper regex in the Advanced Element Editor? When I did, it returned 11 results and hung UBot, or crashed it.

2 weeks later...


Anyone? I'm looking to understand what went wrong here and need help...


I use the software "RegexBuddy".

I am using HelloInsomnia's "Regex Builder", which seems great; the regex worked in that program, but not in UBot. That was my question above: I need help with the specific questions from my earlier posts (8 and 9)... :(

Solution


 

If you use your original regex but first scrape the element you intended, it should do what you want (I believe). For example:

set(#pagenum,$find regular expression($scrape attribute(<id="resultStats">,"outerhtml"),"(?<=\\\"\\>).+?(?=\\<nobr\\>)"),"Global")

This first scrapes the outerhtml of that element, giving you something like:

 

<div id="resultStats">About 3,510,000 results<nobr> (0.49 seconds) </nobr></div>

 

The regex is then applied to that text and gives you a result such as "About 3,510,000 results".
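In Python terms (Python's `re` standing in for UBot's regex engine), the two-step approach looks like this: first get the element's outerhtml as an ordinary string, then apply the original lookaround pattern to that string. The HTML string below is the example output quoted in this post.

```python
import re

# Stand-in for the scraped outerhtml of <id="resultStats"> (example from the post).
outer_html = ('<div id="resultStats">About 3,510,000 results'
              '<nobr> (0.49 seconds) </nobr></div>')

# Original Regex Builder pattern: text after '">' up to, but not including, '<nobr>'.
LOOKAROUND = r'(?<=\"\>).+?(?=\<nobr\>)'

match = re.search(LOOKAROUND, outer_html)
result = match.group(0) if match else None
print(result)  # About 3,510,000 results
```

The lookbehind and lookahead are not part of the match, which is why the result contains only the text between `">` and `<nobr>`.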

 

You were trying to use regex to find that container first, which isn't usually necessary. I won't go over that here because I don't want this to get long and confusing, but if you ever need it, feel free to PM me or ask here on the forums and I am sure somebody can help.
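The thread never settles why putting the regex into the element editor returned 11 results. One plausible explanation, offered as an assumption rather than documented UBot behavior: if the editor tests the pattern as a substring search against each element's outerhtml, then every ancestor of the `resultStats` div (its parent containers, `body`, `html`) contains the same text and matches as well. The Python sketch below models a small, made-up page as nested elements to show the effect; the `wrap` helper and the page structure are hypothetical.

```python
import re

# Hypothetical, simplified page: the resultStats div nested in a few containers.
stats_div = ('<div id="resultStats">Page 8 of about 18,800,000 results'
             '<nobr> (0.39 seconds) </nobr></div>')


def wrap(tag, inner):
    """Return the outerhtml of a <tag> element wrapping inner."""
    return f"<{tag}>{inner}</{tag}>"


# outerhtml of every element, from the stats div outward to <html>.
elements = [stats_div]
for tag in ("div", "div", "body", "html"):
    elements.append(wrap(tag, elements[-1]))

PATTERN = r"Page \d+ of about [\d,]+ results"

# Assumed selector behaviour: an element matches if the pattern occurs
# anywhere in its outerhtml (substring search).
search_hits = [e for e in elements if re.search(PATTERN, e)]
# Contrast: requiring the pattern to match the whole outerhtml selects nothing.
full_hits = [e for e in elements if re.fullmatch(PATTERN, e)]

print(len(elements), len(search_hits), len(full_hits))  # prints: 5 5 0
```

Under that assumption, a real Google results page with more nesting would yield even more matching elements, each carrying a large outerhtml string.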


Oh, I think I get it. So the code I was working on in the advanced element editor, was going against the found info itself? (As the element was already found by the selection)

 

So entering the regex just overrode that, which caused confusion and kept it from working as I intended. And that is why people were using "find regular expression" instead of doing it in the Advanced Element Editor?

 

Is this a correct assessment of where I messed up?

 

If so, then using the Advanced Element Editor was the wrong approach, and it should be avoided in situations like this. Correct?

 

I guess regex is only ever needed there when there is no other viable way to find the element?



 

If you want to use regex on a bit of text, use find regular expression; that is probably what you will use it for most of the time. For the element editor a wildcard is generally enough, so you will probably never have to use actual regex there.
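To contrast the two tools: the wildcard in the element editor only has to identify the element, and the regex work happens afterwards on the scraped text. The sketch below uses Python's `fnmatch` as a stand-in for the editor's `*` wildcard. Treating the wildcard as a whole-value match is an assumption about UBot, not something confirmed in this thread, and the candidate elements are made up.

```python
import fnmatch
import re

# Candidate elements' outerhtml (hypothetical, simplified page).
stats_div = ('<div id="resultStats">Page 8 of about 18,800,000 results'
             '<nobr> (0.39 seconds) </nobr></div>')
candidates = [stats_div, f"<div>{stats_div}</div>", f"<body>{stats_div}</body>"]

# Step 1 (assumed whole-value wildcard match): only the element whose
# outerhtml starts with the resultStats div tag is selected.
WILDCARD = '<div id="resultStats">*'
selected = [c for c in candidates if fnmatch.fnmatchcase(c, WILDCARD)]

# Step 2: ordinary regex extraction on the selected element's text.
result = re.search(r"Page \d+ of about [\d,]+ results", selected[0]).group(0)
print(len(selected), result)
```

The wildcard narrows things down to one element, and find regular expression (here `re.search`) does the extraction.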

