moulinier, posted January 20, 2010

Hi everybody,

I bought UBot a few days ago and am now trying to create my first bot based on what I learned from the tutorial videos. However, I am a bit lost. What I would like to accomplish is the following:

1) Navigating to www.google.com and searching for a keyword I typed in. - I solved this, as it is very easy.
2) Getting the bot to collect the first 30 organic search results while ignoring Google Shopping results as well as Google AdWords ads. - I did not find a solution for this.
3) Getting the following data from each of these websites:
   a) URL
   b) Meta Title
   c) Meta Description
   d) Meta Keywords
4) Putting this information in a CSV or any other file that I can use with Excel. The data for each website should be in one row, with "URL" in the first column, "Meta Title" in the second column, etc.

Could anybody point me in the right direction?

Many thanks to everybody and happy botting,
Josef
1nspire, posted January 20, 2010

> 2) Getting the bot to collect the first 30 organic search results while ignoring Google Shopping results as well as Google AdWords ads. - I did not find a solution for this.

Typically with Google you have the title, the description, and then the site in each result. You want to scrape the URL between the <cite></cite> tags.

> 3) Getting the following data from each of these websites: a) URL b) Meta Title c) Meta Description d) Meta Keywords

Just set up a loop that goes through each URL and, when the page loads, use "add to list" and insert the document constants.

> 4) Putting this information in a CSV or any other file that I can use with Excel. The data for each website should be in one row, with "URL" in the first column, "Meta Title" in the second column, etc.

Watch this video to help you understand the scraping and CSV creation. Give it your best shot, and in a day or so, if you are still having trouble, let us know.

Honestly, the best way to learn UBot is to watch and follow the videos and to study bot source. There is a bunch of bot source floating around here.
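For anyone who wants to see the "loop over each URL, grab the meta data, write one row per site" idea outside UBot, here is a minimal Python sketch. It is not how UBot does it internally; the names `MetaParser`, `meta_row`, and `rows_to_csv` are made up for illustration, and it assumes you already have each page's HTML as a string.

```python
import csv
import io
from html.parser import HTMLParser


class MetaParser(HTMLParser):
    """Collects the <title> text plus the description and keywords meta tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {"description": "", "keywords": ""}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attrs = dict(attrs)
            # Meta names are matched case-insensitively ("Description" == "description").
            name = (attrs.get("name") or "").lower()
            if name in self.meta:
                self.meta[name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def meta_row(url, html):
    """Return [url, title, description, keywords] for one page (hypothetical helper)."""
    parser = MetaParser()
    parser.feed(html)
    return [url, parser.title.strip(), parser.meta["description"], parser.meta["keywords"]]


def rows_to_csv(rows):
    """Return CSV text: a header line followed by one row per site."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["URL", "Meta Title", "Meta Description", "Meta Keywords"])
    writer.writerows(rows)
    return buf.getvalue()
```

Pages without a given meta tag simply get an empty cell, so every row keeps the same four columns and opens cleanly in Excel.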
webautomationlab, posted January 20, 2010

This is exactly what I have been working on. Right now I have two bots. One scrapes the SERP. The other scrapes the head attributes and URL, as you want. The issue is that the second scraper (keywords, description, title, URL) doesn't navigate consistently over a large block of URLs. Until that is resolved, you will be limited in how large a sample you can scrape. I'm considering lowering mine to 20 results because doing 100 is not stable.
1nspire, posted January 20, 2010

Here is a bot I made. I only tested it on 20 results max, but I am using a delay instead of a wait in the nav. It's set for 3 seconds, but if you are having trouble over large lists you may need to up the delay.

Oh, and I was wrong about the <cite> tag. I am scraping the <A class=l*</A> area as a wildcard.

Attachment: meta_harvester.ubot
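As a rough illustration of what the <A class=l*</A> wildcard is picking out, here is a small Python sketch that collects the href of every anchor carrying the class "l". It assumes the result-link markup described in this thread (which Google may change at any time), and `ResultLinkParser` and `organic_links` are hypothetical names, not part of UBot or of the attached bot.

```python
from html.parser import HTMLParser


class ResultLinkParser(HTMLParser):
    """Collects href values from <a> tags that carry a given CSS class."""

    def __init__(self, css_class="l"):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":  # HTMLParser lowercases tag names, so <A> matches too
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        href = attrs.get("href")
        if self.css_class in classes and href:
            self.links.append(href)


def organic_links(html, limit=30):
    """Return up to `limit` links whose anchor has class "l" (assumed markup)."""
    parser = ResultLinkParser()
    parser.feed(html)
    return parser.links[:limit]
```

Anchors without that class (ads, shopping boxes, navigation) are skipped, which is the same effect the wildcard scrape is after.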
Natureboy, posted January 20, 2010

> Here is a bot I made. I only tested it on 20 results max, but I am using a delay instead of a wait in the nav. It's set for 3 seconds, but if you are having trouble over large lists you may need to up the delay. Oh, and I was wrong about the <cite> tag. I am scraping the <A class=l*</A> area as a wildcard.

That's what I was going to say... those delays help out in ways you wouldn't believe. The bot flies through commands so fast that it's already on to the next thing before the last page has finished loading, or whatever.
webautomationlab, posted January 20, 2010

It would be handy to be able to dial down the playback speed. It's one of the few features I miss from iMacros.
webautomationlab, posted January 20, 2010

> Here is a bot I made. I only tested it on 20 results max, but I am using a delay instead of a wait in the nav. It's set for 3 seconds, but if you are having trouble over large lists you may need to up the delay. Oh, and I was wrong about the <cite> tag. I am scraping the <A class=l*</A> area as a wildcard.

I'm going to use your bot and see if it will do my job. Thanks. A lot. +Rep

One note: if you scrape PDF links in Google, they will throw nasty errors when you try to get the meta information. I manually removed PDF links between bot 1 and bot 2. Some sort of URL checking would need to be added if something like this were used heavily.
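The kind of URL checking meant here could look roughly like the Python sketch below. It only looks at the path portion of the URL, so a PDF served from a URL that does not end in ".pdf" would still slip through; `looks_like_pdf` and `drop_pdf_links` are hypothetical helper names.

```python
from urllib.parse import urlparse


def looks_like_pdf(url):
    """True if the URL's path ends in .pdf (query string and fragment are ignored)."""
    return urlparse(url).path.lower().endswith(".pdf")


def drop_pdf_links(urls):
    """Return the URLs that do not look like PDF links, keeping their order."""
    return [url for url in urls if not looks_like_pdf(url)]
```

Running the scraped list through a filter like this between the SERP bot and the meta bot would replace the manual clean-up step.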
tooltrainer, posted January 21, 2010

> It would be handy to be able to dial down the playback speed. It's one of the few features I miss from iMacros.

This is another reason I like to use "wait for" instead of a timed delay, wait finish, etc. It doesn't work in every instance, but it does in the vast majority. You can even use "wait for" followed by a timed delay to get just a hair of extra pause, but never too much. I hate having a bot sit there when I can clearly see it's ready to move on! LOL

Jonathan
webautomationlab, posted January 21, 2010

> This is another reason I like to use "wait for" instead of a timed delay, wait finish, etc. It doesn't work in every instance, but it does in the vast majority. You can even use "wait for" followed by a timed delay to get just a hair of extra pause, but never too much. I hate having a bot sit there when I can clearly see it's ready to move on! LOL
> Jonathan

Right, but when I'm hitting a list scraped from Google, there is nothing consistent to wait for. I could do an IF, EITHER with a bunch of WAIT FORs, I suppose. I don't know if it is UBot or IE, but surfing is not robust. I can't feed it 500 sites, go out for the night, and reasonably expect it to get through even 250 of them without locking up.
tooltrainer, posted January 21, 2010

When I know there could be one of several possible things on the next page, I've had good luck using a while loop with the 'either' eval and a bunch of search page nodes in the eval. Don't know if that'll help you at all, though...

Jonathan
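Outside UBot, the same "loop until any one of several things shows up" pattern might be sketched in Python like this. `get_html` stands in for whatever returns the current page source (an assumption, not a UBot call), and the timeout is there so the loop cannot hang forever on a page that never shows any of the markers.

```python
import time


def wait_for_any(get_html, markers, timeout=10.0, interval=0.25):
    """Poll get_html() until one of `markers` appears in the page source.

    Returns the first marker found, or None if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = get_html()
        for marker in markers:
            if marker in html:
                return marker
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

A caller can branch on the returned marker to decide what kind of page loaded, or skip the URL when the result is None.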
moulinier (author), posted January 21, 2010

Dear 1nspire,

Many thanks for your helpful post and especially for the bot. I will start working on it immediately.

All the best to you,
Josef
webautomationlab, posted January 21, 2010

> When I know there could be one of several possible things on the next page, I've had good luck using a while loop with the 'either' eval and a bunch of search page nodes in the eval. Don't know if that'll help you at all, though...
> Jonathan

It helps a little. It just seems like a lot of extra coding work to do what should be a reasonable expectation out of the box. We should be able to nav across a variety of pages (from the top 100 in the Google SERPs, no less) with a slight delay between each, without freezing up and without requiring nodes and nodes of error checking.
tooltrainer, posted January 21, 2010

Yep, gotta agree with you there. It should work reliably without all the extra code bloat. I just like finding solutions even when things don't behave like I want 'em to.

Jonathan
greencat, posted January 23, 2010

One thing I'm starting to find works quite reliably is waiting for "</body>". 99% (maybe even 99.999%) of HTML pages should have it, and it's a good indication that everything worth scraping has loaded, as </body> tends to come at the end of a page's code.
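As a rough Python equivalent of that check, assuming you can get the page source as a string, something like the following would do. `has_closing_body` is a hypothetical name; the match is case-insensitive and tolerates whitespace before the ">", since real-world pages are not always tidy.

```python
import re

# Matches </body>, </BODY>, and variants with whitespace such as "</body >".
_CLOSING_BODY = re.compile(r"</body\s*>", re.IGNORECASE)


def has_closing_body(html):
    """True once the page source contains a closing body tag."""
    return _CLOSING_BODY.search(html) is not None
```

Content injected by scripts after the page loads can still arrive later than </body>, so a short extra delay may be needed on such pages.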