troyinmogi, posted December 18, 2010:
Hi, I'm fairly new here, so forgive me if I'm making some newbie mistakes. I was having a hard time even getting the Google Search button to click, but after the Choose By Attribute I put a 2 second delay to allow Google time to display that "Press Enter to Search" message, and that did the trick. Problem one solved.
Now I'm trying to issue a "site:mydomain.com" command and scrape the results. I want to (1) scrape the returned results count at the top, just under the search box, and (2) scrape the list of URLs. Right now I'm just trying to get it working for page 1, but ideally I want it to paginate through all the results and scrape them all. I can't get either one to work. I've followed the tutorial for Scrape_Chosen_Attribute with the Ezine example and even duplicated its success. I'm using that type of approach, but am having no luck whatsoever. Any help would be greatly appreciated.
Also, when scraping just a single item of data, is there a way to scrape it straight to a variable? Or do you have to use a list even if it is a list of one? Thanks
MiriamMB, posted December 18, 2010:
Choose the results, add a wildcard for the number and another for the seconds, and then set the scrape to a variable. If you want to watch the results in the UI, you can watch the variable with the UI stat monitor. If you want to scrape to a variable, just set a variable with the set command and place the scrape chosen attribute or the page scrape within the set command under content. I have attached an image.
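For readers who want to see the "wildcard for the number and the seconds" idea outside UBot, here is a minimal Python sketch. It is not MiriamMB's script. The sample text `About 1,230 results (0.23 seconds)` and the names `STATS_RE` and `parse_result_stats` are assumptions made for illustration; the real wording of Google's stats line may differ and changes over time.

```python
import re

# Hypothetical sample of the stats line under the search box (assumed wording).
SAMPLE_STATS = "About 1,230 results (0.23 seconds)"

# The two capture groups play the role of the two wildcards:
# one for the result count, one for the elapsed seconds.
STATS_RE = re.compile(r"([\d,]+)\s+results?\s+\(([\d.]+)\s+seconds\)")


def parse_result_stats(text):
    """Return (count, seconds) scraped from the stats line, or None if absent."""
    match = STATS_RE.search(text)
    if match is None:
        return None
    count = int(match.group(1).replace(",", ""))  # drop thousands separators
    seconds = float(match.group(2))
    return count, seconds


if __name__ == "__main__":
    # A single scraped value goes straight into a variable, no list needed.
    stats = parse_result_stats(SAMPLE_STATS)
    print(stats)
```

The function returns a single tuple rather than a list, which mirrors the advice above about scraping one item straight into a variable.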
MiriamMB, posted December 18, 2010:
As for scraping Google, I was having a hard time, so I tried some regular expressions and it seems to work. However, you might want to read up on regular expressions a bit, just so you know how to move things around in case you hit a block. Hopefully someone can show you a much easier and simpler way to get the links you want. Here is my script: regex scraping google.ubot
Abs*, posted December 21, 2010:
Hi, I've attached a Google scraper. Not sure if it will help, but I hope it does. One issue that I'm facing, though, is that a few of my customers who also have a Google scraper are not able to scrape the SERPs with it. Any idea? Thanks
Attachment: GoogleScraper.ubot
Guerrilla, posted December 21, 2010:
To scrape links on a Google SERP page, select this outerhtml wildcard:
<A class=l *>*</A>
To scrape the page links to subsequent results, use this outerhtml wildcard:
<A class=fl href="/search?q=*>*</A>
PS. Set the advanced search options so 100 results display per page, and turn Google Instant off.
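To make the two wildcards above concrete, here is a rough Python sketch that treats each `*` as "match anything, as little as possible" and pulls the `href` out of every matching element. This is only an approximation of how a wildcard scrape behaves, not UBot's actual matching logic. The helper names and `SAMPLE_HTML` are hypothetical, and the markup is a made-up stand-in for a 2010-era results page.

```python
import re


def wildcard_to_regex(pattern):
    """Turn a '*' wildcard pattern into a non-greedy, case-insensitive regex."""
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*?".join(literal_parts), re.IGNORECASE | re.DOTALL)


# Pulls the quoted href value out of one matched element.
HREF_RE = re.compile(r'href="([^"]*)"', re.IGNORECASE)


def scrape_hrefs(html, wildcard):
    """Return the href of every element in html that matches the wildcard."""
    hrefs = []
    for element in wildcard_to_regex(wildcard).findall(html):
        found = HREF_RE.search(element)
        if found:
            hrefs.append(found.group(1))
    return hrefs


# The two wildcards quoted in the post above.
RESULT_WILDCARD = "<A class=l *>*</A>"
PAGER_WILDCARD = '<A class=fl href="/search?q=*>*</A>'

# Hypothetical markup, invented for this example.
SAMPLE_HTML = (
    '<A class=l href="http://example.com/page-one">Page one</A>'
    '<A class=l href="http://example.com/page-two">Page two</A>'
    '<A class=fl href="/search?q=site:example.com&start=10">2</A>'
)

if __name__ == "__main__":
    print(scrape_hrefs(SAMPLE_HTML, RESULT_WILDCARD))
    print(scrape_hrefs(SAMPLE_HTML, PAGER_WILDCARD))
```

The first call collects the result links and the second collects the pager links, which a paginating scraper would then visit in turn.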
Abs*, posted December 21, 2010:
Hi, yes, I was using advanced search before, but many users were not turning off Google Instant, so I changed it. The thing that baffles me is that I have used the same wildcard as you noted in the example above, and it works great for me and many others. I was troubleshooting with a user in India today, and for some reason it just wouldn't scrape the results. Could you test the script and see if it works for you? Thanks, abs
UBotBuddy, posted December 21, 2010:
Just so you know (if you didn't), Google changes its result pages more than any other pages. This causes more bot failures for me than anything else. IMHO Google only believes in automation on their side of the fence and not ours. "Do as I say and not as I do!" is Google's mantra.
Abs*, posted December 21, 2010:
lol, I agree. Do you think that different IP ranges get different ids etc.? I just can't figure out why it hasn't caused an issue for me for over 4 months and still continues to work like a charm, yet for a few users it's not working.
Also, Buddy, I have a quick question if you don't mind. When I'm coding with IF commands, in the past I have been doing the following:
If (choose by attribute xxx) Then click selected
I noticed that after the upgrade some systems are not working well with this, but when I coded it like this:
If (choose by attribute xxx) Then (choose by attribute xxx) click selected
it solved the issue. I have coded the majority of my bots according to the first example. I need to get an update out, and I'm terrified that I will be hit with thousands of support queries. Do you know if this has been noted before, or can I assume it was a glitch?
UBotBuddy, posted December 21, 2010:
abs, I cannot say, as I don't code the way you do. I typically search and then choose by attribute:
IF (SEARCH) THEN (Choose by attribute & Click) ELSE (Choose by attribute etc.)
Abs*, posted December 21, 2010:
Hmm, ok. Well, it was working like a charm, and still does for me, and it also allowed me not to duplicate the if choose by attribute. The problem that I have is that my bot is big and keeps crashing while I try to change it. I mean, it's taken me like 1 hour just to remove the branding and add a splash page. Thanks
UBotBuddy, posted December 21, 2010:
Wow! That's a big bot if it takes that long.
Abs*, posted December 21, 2010:
Well, it's not that it's taking long to save. It's just that every time I try to save, it crashes, and saving has become hit and miss now. The bot is autobacklinkbomb.com. Thanks
Guerrilla, posted December 21, 2010:
If you can show me the HTML source code that doesn't scrape, then I can see if I can work it out. It should be as simple as setting up an if statement that detects what format the results page is in.
meter, posted December 21, 2010:
I've coded a Google scraper over the past couple of days. I noticed there are 3 different kinds of SERP pages it returns, depending on which datacenter you query. Get all 3 down and your scraper should start working fine.
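One way to picture the "detect the format, then scrape" approach suggested in the last two posts is a list of candidate patterns tried in order until one yields links. The Python sketch below shows that idea only. The thread never says what the three layouts look like, so the three patterns, their labels, and the sample markup are all hypothetical placeholders, not the real Google variants.

```python
import re

# Hypothetical patterns, one per assumed page layout. The real three
# layouts are not described in the thread; replace these with patterns
# taken from the HTML source that actually fails to scrape.
CANDIDATE_PATTERNS = [
    ("class-l", re.compile(r'<a class=l [^>]*href="([^"]*)"', re.IGNORECASE)),
    ("quoted-class-l",
     re.compile(r'<a [^>]*class="l"[^>]*href="([^"]*)"', re.IGNORECASE)),
    ("h3-wrapped",
     re.compile(r'<h3[^>]*>\s*<a [^>]*href="([^"]*)"', re.IGNORECASE)),
]


def scrape_links(html):
    """Try each known layout in turn; return (layout_name, links)."""
    for name, pattern in CANDIDATE_PATTERNS:
        links = pattern.findall(html)
        if links:  # the "if statement" that detects the page format
            return name, links
    return None, []  # unknown layout: log the page source and add a pattern


if __name__ == "__main__":
    page = '<h3 class="r"><a href="http://example.com/b">B</a></h3>'
    print(scrape_links(page))
```

Returning the layout name along with the links makes it easier to tell which variant a given user is being served when a scrape fails for them but not for you.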
Abs*, posted December 24, 2010:
Hi, any idea what the 3 different kinds of pages are, or how I can access the different datacenters to check? The current code that I'm using is the following:
<A class=l*</A>
Thanks
UBotBuddy, posted December 24, 2010:
Also, as someone mentioned in another thread, I agree that Google probably has varying datacenter code for rendering their search results. That's probably why mimicking other people's versions is so difficult.
Guerrilla, posted December 24, 2010:
You could try the "google global" Firefox plugin for getting different countries' results pages. That might help you see some variations.