wealthtutorsllc, posted May 2, 2019 (edited)

I'm trying to scrape only a Google URL. My regex works in an online tester, but for some reason it does not match in UBot Studio.

This is my regex:

https:\/\/www\.google\.com\/recaptcha\/api2\/payload[^"]+

This is what the HTML looks like:

<img class="rc-image-tile-44" src="https://www.google.com/recaptcha/api2/payload?p=06AOLTBLTa-vvrnIWo7i1_Eix6gNYTpUzPqU12orIsTHtCreitubGC47NpV5W65VenezqAvBB9phZcGYE6v9OYJ4I3s8nroGgeAOWQbVj7Weydh5KaDmwq73-QYKgZkttOzh-8LLq-ef4dOfOgxPBxA-694yvaaFvUTCjUVq7FDrR0eKhx3KV9jXM&k=6LfYMygUAAAAAM1da_u97ejUiRNeG_b2opEPAKkv" style="top:0%; left: -300%">

I copied the HTML into the regex tester, so it is exactly how it appears on the website. The weird thing is that when I remove "payload" from the pattern, I get all the other Google links on the page, so for some reason the "payload" part is not matching. I have no idea what it could be. I'm not that great with regex, so any help would be appreciated.

Edited May 2, 2019 by HelloInsomnia: formatted code so you can view the entire link
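For comparison outside UBot, the same pattern can be checked against the quoted tag with Python's `re` module. This is only a sanity check of the regex itself, not UBot code; the URL is the one from the post.

```python
import re

# The src URL from the quoted <img> tag, copied from the post.
URL = (
    "https://www.google.com/recaptcha/api2/payload?p=06AOLTBLTa-vvrnIWo7i1_Eix6gNYTpUzPqU12orIsTHtCreitubGC47NpV5W65VenezqAvBB9phZcGYE6v9OYJ4I3s8nroGgeAOWQbVj7Weydh5KaDmwq73-QYKgZkttOzh-8LLq-ef4dOfOgxPBxA-694yvaaFvUTCjUVq7FDrR0eKhx3KV9jXM"
    "&k=6LfYMygUAAAAAM1da_u97ejUiRNeG_b2opEPAKkv"
)
HTML = f'<img class="rc-image-tile-44" src="{URL}" style="top:0%; left: -300%">'

# The pattern from the post. A raw string keeps each backslash literal;
# "\/" simply matches "/", so those escapes are harmless but optional here.
# [^"]+ consumes everything up to the closing quote of the src attribute.
PATTERN = re.compile(r'https:\/\/www\.google\.com\/recaptcha\/api2\/payload[^"]+')

matches = PATTERN.findall(HTML)
print(matches == [URL])  # True
```

If the pattern matches the copied markup like this but finds nothing in the scraper, the text being searched is probably not the markup that was copied.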
HelloInsomnia, posted May 2, 2019

I just tried it and it appears to be working. Give this a try:

set(#img,$find regular expression("<img class=\"rc-image-tile-44\" src=\"https://www.google.com/recaptcha/api2/payload?p=06AOLTBLTa-vvrnIWo7i1_Eix6gNYTpUzPqU12orIsTHtCreitubGC47NpV5W65VenezqAvBB9phZcGYE6v9OYJ4I3s8nroGgeAOWQbVj7Weydh5KaDmwq73-QYKgZkttOzh-8LLq-ef4dOfOgxPBxA-694yvaaFvUTCjUVq7FDrR0eKhx3KV9jXM&k=6LfYMygUAAAAAM1da_u97ejUiRNeG_b2opEPAKkv\" style=\"top:0%; left: -300%\">","https:\\/\\/www\\.google\\.com\\/recaptcha\\/api2\\/payload[^\"]+"),"Global")
wealthtutorsllc (author), posted May 2, 2019

It still doesn't match anything. Is it possible that Google is using other characters that look similar? This is the page I'm trying to scrape the image from: https://www.pof.com/register.aspx?id=1

I couldn't even get it with XPath. I've been struggling with this for a couple of days.
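One quick way to test the lookalike-character theory is to list every non-ASCII character in the scraped text. The sketch below is Python rather than UBot, and the "spoofed" string is a hypothetical example built for illustration, not text taken from the page.

```python
import unicodedata
from typing import List, Tuple


def non_ascii_chars(text: str) -> List[Tuple[int, str, str]]:
    """Return (index, char, Unicode name) for every non-ASCII character in text."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# Hypothetical samples: the second swaps the Latin "o" at index 13
# for a Cyrillic "о" (U+043E), which looks identical in most fonts.
clean = "https://www.google.com/recaptcha/api2/payload"
spoofed = "https://www.g\u043eogle.com/recaptcha/api2/payload"

print(non_ascii_chars(clean))    # []
print(non_ascii_chars(spoofed))  # [(13, 'о', 'CYRILLIC SMALL LETTER O')]
```

An empty list for the scraped text rules out lookalike characters, which points the search toward where the text comes from instead.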
wealthtutorsllc (author), posted May 2, 2019

I just figured it out. The content was inside an iframe, and apparently the document text doesn't include what's inside an iframe.
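The iframe explanation can be illustrated with a minimal Python sketch: a regex run over the parent page's HTML finds no payload URL, because the image tiles live in the iframe's own document. The HTML snippets, URLs, and tokens below are trimmed placeholders, and the `frame_docs` dict is a hypothetical stand-in for actually loading each iframe's src (or switching the browser into the frame).

```python
import re
from html.parser import HTMLParser

PAYLOAD_RE = re.compile(r'https://www\.google\.com/recaptcha/api2/payload[^"]+')


class IframeSrcCollector(HTMLParser):
    """Collects the src attribute of every <iframe> in a document."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


# Placeholder documents: the parent page only embeds the frame,
# while the <img> tile is in the frame's separate document.
parent_html = '<form><iframe src="https://www.google.com/recaptcha/api2/bframe?k=SITEKEY"></iframe></form>'
frame_docs = {
    "https://www.google.com/recaptcha/api2/bframe?k=SITEKEY":
        '<img class="rc-image-tile-44" src="https://www.google.com/recaptcha/api2/payload?p=TOKEN&k=SITEKEY">',
}

# Searching the parent's text finds nothing: the frame content is not in it.
print(PAYLOAD_RE.findall(parent_html))  # []

# Searching each frame's own document finds the image URL.
collector = IframeSrcCollector()
collector.feed(parent_html)
for src in collector.srcs:
    print(PAYLOAD_RE.findall(frame_docs[src]))
```

The design point is that each frame is a separate document, so any text, regex, or XPath search has to run against the frame's document rather than the parent page.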