chameleon, posted December 9, 2014
I'm pretty new to Ubot Studio and am attempting to scrape Google SERPs using the following code (I'm sure this is familiar to most of you):
add list to list(%scraped urls,$scrape attribute($element child(<class="r">),"href"),"Delete","Global")
Typically, the href in each class="r" result was the URL of that SERP result, but with some user agent / IP combinations the hrefs on the SERP page are instead the following unencoded string:
Old: http://www.kohls.com/catalog/barbie-dolls-doll-houses-toys.jsp
New: http://www.google.com/url?url=http://www.kohls.com/catalog/barbie-dolls-doll-houses-toys.jsp%3FCN%3D4294732270%2B4294719501%2B4294719592&rct=j&frm=1&q=&esrc=s&sa=U&ei=QyGHVNOHDs7hoAT_14C4Aw&ved=0CFYQFjAL&usg=AFQjCNGb8RP1e5sUaZxBWxCwZ9aSQd8WSg
If I click this new link from within Ubot's browser, the redirect does not occur. I have tried clicking it both with commands and manually from within the Ubot browser. If I paste the same link into a browser outside of Ubot, the redirect works fine. So I'm assuming the issue is that the Ubot browser does not handle this type of unencoded link the way other modern browsers do, but perhaps I'm wrong? Is anyone else seeing this? Thanks
HelloInsomnia, posted December 9, 2014
You can easily replace the first part of the URL (the Google redirect prefix) with nothing. Then, for the last part of the URL, use a replace regular expression with this pattern: http://rubular.com/r/86ANfgxwtM
That should leave you with the bare URL.
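For anyone doing this cleanup outside of Ubot, here is a minimal sketch of the same idea in Python. Instead of two regex replacements it reads the target out of the `url` query parameter shown in the example link above. The function name `unwrap_google_redirect` is made up for illustration, and the sketch assumes the redirect always has the `/url?url=...` shape seen in this thread.

```python
from urllib.parse import urlparse, parse_qs


def unwrap_google_redirect(href):
    """Return the target of a google.com/url redirect, or href unchanged."""
    parts = urlparse(href)
    host = parts.hostname or ""
    is_google = host == "google.com" or host.endswith(".google.com")
    if is_google and parts.path == "/url":
        # parse_qs splits the query on "&" and percent-decodes each value.
        params = parse_qs(parts.query)
        # Assumption: the target sits in the "url" parameter, as in the thread's example.
        if params.get("url"):
            return params["url"][0]
    return href
```

Hrefs that are already direct links pass through unchanged, so the function can be applied to every scraped result without checking first.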
chameleon, posted December 9, 2014
But I don't want to just navigate to that URL, or just record the position. I'd like Google to see it as a click on the SERP page, so I need to be able to click that link and have it take me to the page. Cool regex site you linked to, btw. That will come in handy.
HelloInsomnia, posted December 9, 2014
You said that if you paste the link into a browser outside of Ubot it works fine. I would grab that browser's user agent from here: http://whatsmyuseragent.com/ and use it in Ubot. Then you can start trying others to determine which ones work with the redirect, add them all to a list, and randomly set one with each search.
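The "keep a list and pick one at random per search" step could look roughly like this in Python. It is only a sketch: `pick_user_agent` is a hypothetical helper, and the strings in the usage note are placeholders rather than real user agents.

```python
import random


def pick_user_agent(working_agents):
    """Pick one user agent at random from the ones known to work."""
    if not working_agents:
        # Nothing has been verified yet, so there is nothing safe to pick.
        raise ValueError("no working user agents collected yet")
    return random.choice(working_agents)
```

Calling `pick_user_agent(["placeholder-agent-1", "placeholder-agent-2"])` before each search returns one of the two entries.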
chameleon, posted December 9, 2014
Thanks HelloInsomnia, good idea. I guess I will have to weed out the user agents from my list that return the Google redirect URL instead of the real URL of the site. I assume that Ubot browser limitations will prevent me from using SERPs with the non-encoded redirect URL at all.
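The weeding-out step could be sketched in Python as follows, assuming the hrefs scraped under each user agent have already been collected into a dictionary. Both function names and the dictionary layout are assumptions made for illustration, and the redirect check only recognizes the `google.com/url` form seen in this thread.

```python
from urllib.parse import urlparse


def is_google_redirect(href):
    """True if href points at google.com/url rather than at the result site."""
    parts = urlparse(href)
    host = parts.hostname or ""
    is_google = host == "google.com" or host.endswith(".google.com")
    return is_google and parts.path == "/url"


def agents_with_direct_links(hrefs_by_agent):
    """Keep the user agents whose scraped hrefs were all direct links."""
    return [
        agent
        for agent, hrefs in hrefs_by_agent.items()
        # An agent with no scraped hrefs is unverified, so it is dropped too.
        if hrefs and not any(is_google_redirect(h) for h in hrefs)
    ]
```

The surviving list is what would feed the random pick suggested earlier in the thread.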