steelersfan, posted September 8, 2016:

I can't figure this out for the life of me. I am trying to scrape the result stats from Google and output just the page number and the results count. The HTML is:

<div id="resultStats">Page 8 of about 18,800,000 results<nobr> (0.39 seconds) </nobr></div>

I can easily grab all of the info with just a wildcard, which gives "Page 8 of about 18,800,000 results<nobr> (0.39 seconds)". But all I want is "Page 8 of about 18,800,000 results". So I fired up HelloInsomnia's Regex Builder (awesome tool!) and got this output:

(?<=\"\>).+?(?=\<nobr\>)

That should work for my purposes. However, I am at a loss as to where to put it! I select "Regular Expression" and "outerhtml" in the attributes dropdown, then replace the text in the field between that and the match dropdown, so it looks like this:

<outerhtml=r"(?<=\\\"\\>).+?(?=\\<nobr\\>)">

Nothing happens! Then I tried replacing only the part of that field that the regular expression should match, and got:

<outerhtml=r"<div id=\"resultStats\"<outerhtml=r\"(?<=\\\\\\\"\\\\>).+?(?=\\\\<nobr\\\\>)\"> (0.39 seconds) </nobr></div>">

Still nothing! What am I doing wrong? I have tried outerhtml, innerhtml, innertext, and so on. Nothing works!
pash, posted September 8, 2016:

Post your code.
steelersfan, posted September 8, 2016:

loop(#gspr) {
    if($search page(#surl)) {
        then {
            set(#pagenum,$scrape attribute(<outerhtml=r"(?<=\\<div\\ id\\=\\\"resultStats\\\"\\>).+?(?=\\<nobr\\>\\ \\(0\\.39\\ seconds\\)\\ \\;\\<\\/nobr\\>\\<\\/div\\>)">,"outerhtml"),"Global")
            alert("Url is found! {#pagenum}")
            stop script
        }
        else {
            click(<title="Next page">,"Left Click","No")
            wait($rand(3,7))
            if($exists(<title="Next page">)) {
                then {
                }
                else {
                    alert("Last Page Reached!")
                    stop script
                }
            }
        }
    }
}
steelersfan, posted September 8, 2016:

I don't understand what I am doing wrong, and looking at the code now, I am even more confused. I tried to replace the whole statement in the Advanced Element Editor field next to the attributes dropdown, just as I would place a star somewhere in that text if I were using a wildcard. I used this generated regex to replace the whole line:

(?<=\<div\ id\=\"resultStats\"\>).+?(?=\<nobr\>\ \(0\.39\ seconds\)\ \;\<\/nobr\>\<\/div\>)

Still not working. Could it be a bug? I don't know.
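Separately from where the regex is entered, a generated pattern that spells out "(0.39 seconds)" can only match pages that report exactly that timing. The sketch below illustrates this in Python rather than UBot. The two patterns are simplified stand-ins for the generated one, and `page_b` is a made-up second page; none of this is taken from the Regex Builder itself.

```python
import re

# Simplified stand-in for the generated pattern: the lookahead pins the timing to 0.39.
hardcoded = re.compile(r'(?<=<div id="resultStats">).+?(?=<nobr> \(0\.39 seconds\))')
# Looser variant that only requires the <nobr> tag to follow.
general = re.compile(r'(?<=<div id="resultStats">).+?(?=<nobr>)')

# page_a is the HTML from the first post; page_b is hypothetical (different page, different timing).
page_a = '<div id="resultStats">Page 8 of about 18,800,000 results<nobr> (0.39 seconds) </nobr></div>'
page_b = '<div id="resultStats">Page 9 of about 18,800,000 results<nobr> (0.42 seconds) </nobr></div>'


def first_match(pattern, html):
    """Return the first match of pattern in html, or None if there is none."""
    m = pattern.search(html)
    return m.group(0) if m else None


print(first_match(hardcoded, page_a))  # matches: timing is 0.39
print(first_match(hardcoded, page_b))  # None: timing differs
print(first_match(general, page_b))    # matches regardless of timing
```

Building the pattern from the parts of the markup that stay constant (the id and the tag names) keeps it usable across result pages.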
pash, posted September 8, 2016:

Try:

set(#Data,$find regular expression("Page 8 of about 18,800,000 results (0.39 seconds) Page 80 of about 800,000 results (0.39 seconds) Page 889 of about 100,000 results (0.39 seconds)","Page \\d+ of about [\\d,]+ results"),"Global")
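The pattern in this suggestion can be checked outside UBot. Here is a minimal Python sketch using the same sample text; Python's `re` module is used only for illustration, and UBot's own regex engine may differ in details. Note that the doubled backslashes in the UBot code (`\\d`) correspond to a single backslash in the regex itself.

```python
import re

# Sample text from the post above: three result-stat strings run together.
text = (
    "Page 8 of about 18,800,000 results (0.39 seconds) "
    "Page 80 of about 800,000 results (0.39 seconds) "
    "Page 889 of about 100,000 results (0.39 seconds)"
)

# \d+ matches the page number; [\d,]+ matches a count with thousands separators.
PATTERN = re.compile(r"Page \d+ of about [\d,]+ results")

matches = PATTERN.findall(text)
for m in matches:
    print(m)
```

Each match stops at "results", so the timing in parentheses is never part of the output.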
steelersfan, posted September 8, 2016:

I tried to use what you posted in the way I was writing this script, but it crashed UBot:

then {
    set(#pagenum,$scrape attribute(<outerhtml=r"Page \\d+ of about [\\d,]+ results">,"outerhtml"),"Global")
    alert("Url is found! {#pagenum}")
    stop script
}

Oddly enough, it also returned 11 results. That confuses me further, because I have no idea what I am doing wrong with the scrape attribute element editor. I believe that is where the problem lies, but I can't understand how or why.
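One plausible reading of the 11 results, offered as an assumption that the thread does not confirm: a regex placed on the outerhtml attribute selects every element whose outer HTML contains a match, and that includes every ancestor of the target div. The Python sketch below imitates this on a hypothetical, heavily simplified page. It uses `xml.etree.ElementTree` as a stand-in for the browser DOM and says nothing certain about how UBot evaluates selectors internally.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified page; a real results page has far more nesting.
PAGE = (
    '<html><body><div id="main">'
    '<div id="resultStats">Page 8 of about 18,800,000 results'
    '<nobr> (0.39 seconds) </nobr></div>'
    '</div></body></html>'
)

pattern = re.compile(r"Page \d+ of about [\d,]+ results")
root = ET.fromstring(PAGE)

# Serialize each element (its "outer HTML") and test the regex against it.
hits = [
    (el.tag, el.get("id"))
    for el in root.iter()
    if pattern.search(ET.tostring(el, encoding="unicode"))
]
print(len(hits), hits)
```

In this toy page the target div and all three of its ancestors match, while the nobr element does not. Selecting the element by its id first and applying the regex to the scraped text avoids the ambiguity.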
darryl561, posted September 8, 2016:

loop(#gspr) {
    if($search page(#surl)) {
        then {
            set(#pagenum,$scrape attribute(<id="resultStats">,"innertext"),"Global")
            set(#pagenum,$find regular expression(#pagenum,"Page \\d+ of about [\\d,]+ results"),"Global")
            alert("Url is found! {#pagenum}")
            stop script
        }
        else {
            click(<title="Next page">,"Left Click","No")
            wait($rand(3,7))
            if($exists(<title="Next page">)) {
                then {
                }
                else {
                    alert("Last Page Reached!")
                    stop script
                }
            }
        }
    }
}
steelersfan, posted September 8, 2016:

That works, thank you. But I still don't understand what I did wrong with my initial regex.

Why did that regex work perfectly in the builder, but not in UBot? Why was the output from the Regex Builder so complicated, when the correct patterns are so simple? And why does "find regular expression" get what I need, while the advanced element editor does not? Why couldn't I just enter the proper regex in the advanced element editor? When I did, it returned 11 results and either held UBot up or crashed it.
steelersfan, posted September 19, 2016:

Anyone? I'm looking to understand what went wrong here and need help with the questions in my previous post.
pash, posted September 19, 2016:

I use the software "RegexBuddy".
steelersfan, posted September 19, 2016:

I am using HelloInsomnia's "Regex Builder", which seems great. The regex worked in that program, but not in UBot. That was my question above; I need help with the specific questions from my earlier posts.
HelloInsomnia, posted September 19, 2016 (marked as the solution):

If you use your original regex but first scrape the element you intended, then I believe it will do what you want. For example:

set(#pagenum,$find regular expression($scrape attribute(<id="resultStats">,"outerhtml"),"(?<=\\\"\\>).+?(?=\\<nobr\\>)"),"Global")

This scrapes the outerhtml of that element first, giving you something like:

#pagenum: <div id="resultStats">About 3,510,000 results<nobr> (0.49 seconds) </nobr></div>

Then the regex is applied to that text and gives you a result such as:

"About 3,510,000 results"

You were trying to use the regex to find the container itself, which usually isn't necessary. I won't go over that here because I don't want this to get long and confusing, but if you ever need it, feel free to PM me or ask here on the forums, and I am sure somebody can help.
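The second step of this approach, applying the lookaround regex to text that has already been scraped, can be sketched in Python. The `outer_html` string below is the example output from the post and stands in for whatever `$scrape attribute` returns. Python's `re` module is used only for illustration, so UBot's engine may behave differently in edge cases.

```python
import re

# Example outerhtml from the post above, standing in for the scraped value.
outer_html = '<div id="resultStats">About 3,510,000 results<nobr> (0.49 seconds) </nobr></div>'

# The lookbehind anchors just after `">`; the lookahead stops before `<nobr>`.
simple = re.compile(r'(?<=">).+?(?=<nobr>)')
# The builder's output escapes every symbol; in Python these escapes are harmless.
escaped = re.compile(r'(?<=\"\>).+?(?=\<nobr\>)')

match = simple.search(outer_html)
result = match.group(0) if match else None
print(result)

# Both spellings find the same text.
escaped_match = escaped.search(outer_html)
print(escaped_match.group(0) == result)
```

The heavily escaped form and the plain form are the same pattern here, which is why the builder's output looks more complicated than it needs to be.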
steelersfan, posted September 20, 2016:

Oh, I think I get it. So the regex I was entering in the advanced element editor was being used to find the element itself, even though the element had already been found by the selection? Entering the regex there just overrode that selection, which caused the confusion and kept it from working as I intended. And that is why people were using "find regular expression" instead of doing it in the advanced element editor?

Is this a correct assessment of where I messed up? If so, then using the advanced element editor for this was a mistake and should be avoided in such situations. Correct? I guess regex is only ever needed there when there is no other viable way to find the element?
HelloInsomnia, posted September 20, 2016:

If you want to use regex on a bit of text, you would use find regular expression. That is probably what you will use regex for most of the time. A wildcard is generally enough for the element editor, so you will probably never have to use an actual regex there.
steelersfan, posted September 21, 2016:

Thank you, HelloInsomnia! I understand where I went wrong now, and it was a very important lesson. Hopefully this helps others better understand how to use regex and the advanced element editor.