Regex In The Advanced Element Editor

steelersfan · September 8, 2016

I can't figure this out for the life of me. I am trying to scrape the page results stats from Google and output just the page number and results count.

The HTML is:

<div id="resultStats">Page 8 of about 18,800,000 results<nobr> (0.39 seconds) </nobr></div>

Now I can easily grab all of the info with just a wildcard:

"Page 8 of about 18,800,000 results<nobr> (0.39 seconds)"

But all I want is "Page 8 of about 18,800,000 results".

So I fired up HelloInsomnia's Regex Builder (awesome tool!) and got this output:

(?<=\"\>).+?(?=\<nobr\>)

Which should work for my purposes. However, I am at a loss as to where to input it! I select "Regular Expression" and "outerhtml" in the attributes dropdown, then replace the text in the area between that and the match dropdown. So it looks like this:

<outerhtml=r"(?<=\\\"\\>).+?(?=\\<nobr\\>)">

Nothing happens!

Then I tried to only replace the part of that field that the regular expression should replace and get:

<outerhtml=r"<div id=\"resultStats\"<outerhtml=r\"(?<=\\\\\\\"\\\\>).+?(?=\\\\<nobr\\\\>)\"> (0.39 seconds) </nobr></div>">

And still nothing!

What am I doing wrong?

I have tried innerhtml, outerhtml, innertext, etc. Nothing works!

pash · September 8, 2016

Post your code.

steelersfan · September 8, 2016



loop(#gspr) {
    if($search page(#surl)) {
        then {
            set(#pagenum,$scrape attribute(<outerhtml=r"(?<=\\<div\\ id\\=\\\"resultStats\\\"\\>).+?(?=\\<nobr\\>\\ \\(0\\.39\\ seconds\\)\\&nbsp\\;\\<\\/nobr\\>\\<\\/div\\>)">,"outerhtml"),"Global")
            alert("Url is found! {#pagenum}")
            stop script
        }
        else {
            click(<title="Next page">,"Left Click","No")
            wait($rand(3,7))
            if($exists(<title="Next page">)) {
                then {
                }
                else {
                    alert("Last Page Reached!")
                    stop script
                }
            }
        }
    }
}

steelersfan · September 8, 2016

I don't understand what I am doing wrong, and looking at the code now, I am even more confused. I tried to replace the whole statement in the Advanced Element Editor field next to the attributes dropdown, just as I would place a star somewhere in that text if I were using a wildcard. I used this generated regex to replace the whole line:

(?<=\<div\ id\=\"resultStats\"\>).+?(?=\<nobr\>\ \(0\.39\ seconds\)\&nbsp\;\<\/nobr\>\<\/div\>)

Still not working. Could it be a bug? I don't know.

pash · September 8, 2016

Try:

set(#Data,$find regular expression("Page 8 of about 18,800,000 results (0.39 seconds)
Page 80 of about 800,000 results (0.39 seconds)
Page 889 of about 100,000 results (0.39 seconds)","Page \\d+ of about [\\d,]+ results"),"Global")
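For readers who want to check this pattern outside UBot, here is a minimal Python sketch. It is not UBot code: Python's `re` module stands in for `$find regular expression`, and the sample text is the three lines from the post above. The pattern is the same, written with single backslashes because a Python raw string does not need them doubled.

```python
import re

# Sample text copied from the post above: three result-stats lines.
text = (
    "Page 8 of about 18,800,000 results (0.39 seconds)\n"
    "Page 80 of about 800,000 results (0.39 seconds)\n"
    "Page 889 of about 100,000 results (0.39 seconds)"
)

# \d+ matches the page number; [\d,]+ matches a count with thousands separators.
PATTERN = re.compile(r"Page \d+ of about [\d,]+ results")

# findall returns every non-overlapping match, one per input line here.
matches = PATTERN.findall(text)
for m in matches:
    print(m)
```

Each match stops at "results", so the timing suffix is left out without any lookarounds.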

steelersfan · September 8, 2016

I tried to use what you posted, in the way I was writing this script, but it crashed UBot:

then {
    set(#pagenum,$scrape attribute(<outerhtml=r"Page \\d+ of about [\\d,]+ results">,"outerhtml"),"Global")
    alert("Url is found! {#pagenum}")
    stop script
}

It returned 11 results as well, oddly enough.

It confuses me further, because I have no idea what I am doing wrong with the scrape attribute element editor. I believe that is the problem, but just can't understand how or why.

darryl561 · September 8, 2016



loop(#gspr) {
    if($search page(#surl)) {
        then {
            set(#pagenum,$scrape attribute(<id="resultStats">,"innertext"),"Global")
            set(#pagenum,$find regular expression(#pagenum,"Page \\d+ of about [\\d,]+ results"),"Global")
            alert("Url is found! {#pagenum}")
            stop script
        }
        else {
            click(<title="Next page">,"Left Click","No")
            wait($rand(3,7))
            if($exists(<title="Next page">)) {
                then {
                }
                else {
                    alert("Last Page Reached!")
                    stop script
                }
            }
        }
    }
}

steelersfan · September 8, 2016

That works, thank you. But I still don't understand what I did wrong with my initial regex. Why did that regex work perfectly in the builder, but not in ubot? Why was my regex output from regex builder so complicated, and the correct ones so simple? And why does "find regular expression" work to get what I need, but the advanced element editor does not? Why couldn't I use just the advanced element editor to input the proper regex? When I did, it returned 11 results and held ubot up, or crashed it.

steelersfan · September 19, 2016

Anyone? Looking to understand what went wrong here, need help...

pash · September 19, 2016

I use the software "RegexBuddy".

steelersfan · September 19, 2016

I am using HelloInsomnia's "Regex Builder", which seems great. The regex worked in that program, but not in ubot. That was my question above, and I need help with the specific questions I asked in posts 8 and 9.

HelloInsomnia · September 19, 2016

If you use your original regex but first scrape the element you intended, then it should do what you want (I believe). For example:

set(#pagenum,$find regular expression($scrape attribute(<id="resultStats">,"outerhtml"),"(?<=\\\"\\>).+?(?=\\<nobr\\>)"),"Global")

This scrapes the outerhtml of that element first, giving you something like:

#pagenum: <div id="resultStats">About 3,510,000 results<nobr> (0.49 seconds) </nobr></div>

Then the regex applies to that text and gives you a result such as: "About 3,510,000 results"

You were trying to use regex to first find that container, which isn't usually necessary. I won't go over that here because I don't want this to get long and confusing, but if you ever need it, feel free to PM me or ask here on the forums and I am sure somebody can help.
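The two-step idea (get the element's outerhtml first, then run the regex on that text) can be sketched in Python. This is only an illustration, not UBot code: the `outer_html` string is the sample quoted in this post, and Python's `re.search` stands in for `$find regular expression`. The pattern is the original builder regex with the unnecessary backslashes removed.

```python
import re

# Step 1 (stand-in for the scrape): the outerhtml of the element, as quoted above.
outer_html = (
    '<div id="resultStats">About 3,510,000 results'
    "<nobr> (0.49 seconds) </nobr></div>"
)

# Step 2: the lookbehind anchors just after `">` and the lookahead stops
# before `<nobr>`, so neither delimiter ends up in the match.
LOOKAROUND = re.compile(r'(?<=">).+?(?=<nobr>)')

match = LOOKAROUND.search(outer_html)
result = match.group(0) if match else ""
print(result)
```

Because the regex only ever sees the text of one element, it does not need to describe the rest of the page.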

steelersfan · September 20, 2016

Oh, I think I get it. So the code I was working on in the advanced element editor, was going against the found info itself? (As the element was already found by the selection)

So inputting the regex just overrode that, which caused the confusion and kept it from working as I intended. And that is why people were using "find regular expression" instead of doing it in the advanced element editor?

Is this a correct assessment of where I messed up?

If so, then my use of the advanced element editor was unfounded and should be avoided in such situations. Correct?

I guess that regex is only ever needed there, when there is no other viable way for the element to be found?

HelloInsomnia · September 20, 2016

If you want to use regex on a bit of text, you would use find regular expression; that is probably what you will use regex for most of the time. Generally a wildcard is enough for the element editor, so you will probably never have to use actual regex there.
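The difference between selecting an element with a regex and extracting text with one can be sketched in Python. This rests on an assumption the thread does not confirm: that a regex in the element editor is tested against the outerhtml of every element on the page. The wrapper elements below are hypothetical, made up for the illustration. If the assumption holds, every ancestor of the stats element also matches, which would be one plausible reason a selector regex returns many results instead of one.

```python
import re

# The real element from the first post.
stats = (
    '<div id="resultStats">Page 8 of about 18,800,000 results'
    "<nobr> (0.39 seconds) </nobr></div>"
)

# Hypothetical ancestors: each one's outerhtml contains the stats element.
wrapper = f'<div id="wrapper">{stats}</div>'
body = f"<body>{wrapper}</body>"
html = f"<html>{body}</html>"
nobr = "<nobr> (0.39 seconds) </nobr>"

elements = {
    "html": html,
    "body": body,
    "div#wrapper": wrapper,
    "div#resultStats": stats,
    "nobr": nobr,
}

PATTERN = re.compile(r"Page \d+ of about [\d,]+ results")

# Assumed selector behaviour: keep every element whose outerhtml matches.
matching = [name for name, outer in elements.items() if PATTERN.search(outer)]
print(matching)
```

Selecting by `id` first and running the regex on the scraped text avoids this, because only one element is ever considered.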

steelersfan · September 21, 2016

Thank you, HelloInsomnia! I understand where I went wrong now, a very important lesson. Hopefully it helps others to understand the use of regex and the advanced element editor better.
