turbolapp (Guest), posted October 27, 2009:
When a search yields many pages of results, what's the best way to scrape the results all the way through to the last page? Something with looping, I would imagine, but what do I set the looping parameter to?
webautomationlab, posted October 27, 2009:
I've been thinking this over for 15 minutes. Here's what I'm thinking:
NODE1: SET variable #loops = 100 or 500 (an upper limit on the possible number of pages)
NODE2: LOOP #loops times, and at the end of each pass run an IF command:
  IF
    NOT
      (SEARCH PAGE for the "next" link or some other indicator of another page)
    THEN SET #loops = 0
If you try this, I'd be interested to know how it works out.
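The NODE1/NODE2 plan above (a loop with a generous upper limit that exits early once the page has no "next" indicator) can be sketched in Python for comparison. This is an illustration, not UBot code: the page contents, `fetch_page`, and the `/biz/` link pattern are hypothetical stand-ins for the real site and browser. Python's `break` plays the role of `SET #loops = 0`, since a `for` loop's range cannot be shrunk while it runs.

```python
import re

# Hypothetical stand-in for a paged search result: one HTML string per page.
# Every page except the last contains a "Next" link.
FAKE_PAGES = [
    '<a href="/biz/1">A</a> <a href="?p=2">Next</a>',
    '<a href="/biz/2">B</a> <a href="?p=3">Next</a>',
    '<a href="/biz/3">C</a>',
]

MAX_PAGES = 500  # upper limit on possible pages, playing the role of #loops


def fetch_page(page_number: int) -> str:
    """Hypothetical page loader; a real bot would navigate the browser here."""
    return FAKE_PAGES[page_number - 1]


def scrape_all_pages() -> list[str]:
    urls = []
    for page_number in range(1, MAX_PAGES + 1):
        html = fetch_page(page_number)
        # Scrape this page: collect the (hypothetical) business links.
        urls.extend(re.findall(r'href="(/biz/[^"]+)"', html))
        # End of each pass: IF NOT (the page shows a "Next" link) ...
        if "Next" not in html:
            break  # ... THEN leave the loop early, like SET #loops = 0
    return urls
```

Calling `scrape_all_pages()` collects the links from all three fake pages and stops after the third, well before the 500-page limit is reached.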
turbolapp (Guest), posted October 28, 2009:
Damn, it's good to see you here in the forum, Guerilla. I know your smarts are going to come in handy with Seth's monstrosity. ;D
It took me all morning, but I figured out something that worked (and I'm proud of myself that I was thinking along the same lines as you). I did:
loop 200
  if not (outerhtml "Next" wildcard)
  then stop script
It's not pretty, but it gets the job done.
webautomationlab, posted October 28, 2009:
Brilliant! ;D
turbolapp (Guest), posted October 30, 2009:
Well, crap. I'm stuck again. My method worked as a standalone script, but now that I've put that loop inside a bigger loop, when it reaches the end of the small loop it stops the whole script (big loop included), as you'd expect. My challenge is: how do I get the small loop to stop when it's done, so that the big loop knows to resume? The loops are making me loopy, I swear.
I did a little video capture of what it does, to try to illustrate. At the end of the actual script you'll see:
Then Set next state $Next list Item
That (obviously) doesn't work. I need a "Then" that will send it back up to the bigger loop (and populate it with the next state: AK, etc.). The big loop fills in the states at the top; the smaller loop goes through all the results for that state and grabs all the business URLs.
http://screencast.com/t/pAHEx3jdq
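A minimal Python sketch of the problem described above, assuming that "stop script" behaves like leaving the whole routine (modeled here with `return`), whereas what's wanted is to leave only the small loop (modeled with `break`). The states, page contents, `fetch`, and `business_urls` are hypothetical stand-ins, not UBot commands.

```python
import re

# Hypothetical per-state search results; every page but the last has "Next".
FAKE_SITE = {
    "AK": ['<a href="/biz/ak1">A</a> Next', '<a href="/biz/ak2">B</a>'],
    "AL": ['<a href="/biz/al1">C</a>'],
}
MAX_PAGES = 200  # upper limit for the small (inner) loop


def fetch(state: str, page_number: int) -> str:
    """Hypothetical loader for one results page of one state."""
    return FAKE_SITE[state][page_number - 1]


def business_urls(html: str) -> list[str]:
    """Hypothetical scraping step: pull the business links off one page."""
    return re.findall(r'href="(/biz/[^"]+)"', html)


def scrape_with_stop_script(states: list[str]) -> list[str]:
    urls = []
    for state in states:                             # big loop
        for page_number in range(1, MAX_PAGES + 1):  # small loop
            html = fetch(state, page_number)
            urls.extend(business_urls(html))
            if "Next" not in html:
                return urls  # like "stop script": ends BOTH loops
    return urls


def scrape_with_break(states: list[str]) -> list[str]:
    urls = []
    for state in states:                             # big loop
        for page_number in range(1, MAX_PAGES + 1):  # small loop
            html = fetch(state, page_number)
            urls.extend(business_urls(html))
            if "Next" not in html:
                break  # ends only the small loop; the big loop moves on
    return urls
```

With `["AK", "AL"]`, the first version stops after Alaska's last page and never reaches AL, while the second collects the links for both states.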
webautomationlab, posted October 30, 2009:
It's hard to see in the video. Could you screen-cap the bot nodes? It might take a couple of screenshots to do it. You can PM them to me if you don't want to post them publicly.
turbolapp (Guest), posted October 30, 2009:
http://img11.imageshack.us/img11/4427/13399860.png
http://img62.imageshack.us/img62/2851/30497271.png
http://img5.imageshack.us/img5/3423/1030200951326pm.png
http://img21.imageshack.us/img21/8964/28244506.png
http://img4.imageshack.us/img4/1399/38922199.png
http://img4.imageshack.us/img4/4716/70113723.png
webautomationlab, posted October 30, 2009:
OK, so it goes to a state page, and then it is supposed to execute the scraping loop. When the scraping loop is done, you want it to loop to the next state.
I would save this bot under a new name before trying what I suggest, so I don't wreck your work.
You've got IF NOT, THEN SET. But if there is no NEXT, you just want it to loop. You don't need to tell it the next state, because the loop should automatically move on to the next state in the list.
Can you try IF NOT, THEN DELAY 5 seconds?
turbolapp (Guest), posted October 31, 2009:
A filler! I like how you're thinking! But alas, I tried it, and no good. The script just hangs. (There are no errors, but the Stop button is still active, so I'd say it's still waiting for the next command and doesn't treat the delay as a command.) Too bad, because I liked that outside-the-box thinking. Any other ideas?
Seth Turin, posted October 31, 2009:
I'm gonna get back to the OP here. The answer to how to go through a list of pages like you described is to use the while loop. A while loop is a loop with a conditional, so you could say "while search page 'next >>', do some stuff" or whatever. This will loop as long as the text "next >>" appears on the page, and once it doesn't anymore, the loop ends and the script resumes with whatever comes after it.
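Seth's while-loop suggestion can be sketched in Python under similar assumptions: `FAKE_SITE`, `business_urls`, and the `/biz/` link pattern are hypothetical, and advancing `page_index` stands in for clicking the "next >>" link. Because the loop's own condition ends it, control falls through to the next statement, so it nests cleanly inside the big loop over states.

```python
import re

# Hypothetical stand-in for the site: each state's search results as a list
# of page HTML strings. Every page except the last shows a "Next >>" link.
FAKE_SITE = {
    "AK": [
        '<a href="/biz/ak1">A</a> <a href="?p=2">Next >></a>',
        '<a href="/biz/ak2">B</a> <a href="?p=3">Next >></a>',
        '<a href="/biz/ak3">C</a>',
    ],
    "AL": ['<a href="/biz/al1">D</a>'],
}


def business_urls(html: str) -> list[str]:
    """Hypothetical scraping step: pull the business links off one page."""
    return re.findall(r'href="(/biz/[^"]+)"', html)


def scrape_state(state: str) -> list[str]:
    pages = FAKE_SITE[state]
    page_index = 0
    html = pages[page_index]  # land on the first results page
    urls = business_urls(html)
    # While the page still contains "Next >>", go to the next page and scrape it.
    while "Next >>" in html:
        page_index += 1  # stands in for clicking "Next >>"
        html = pages[page_index]
        urls.extend(business_urls(html))
    # The condition failed, so the loop is over and execution continues here.
    return urls


def scrape_all_states(states: list[str]) -> list[str]:
    all_urls = []
    for state in states:  # the big loop resumes after each while loop finishes
        all_urls.extend(scrape_state(state))
    return all_urls
```

For example, `scrape_all_states(["AK", "AL"])` walks all three Alaska pages, then the single Alabama page, with no "stop script" needed.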
turbolapp (Guest), posted October 31, 2009:
OK, that appears to be working. The script has been running for an hour now with no errors. Thanks, everyone!
sweetman, posted December 23, 2009:
I just wanted to say this is an excellent suggestion. Thanks for posting!
slash30, posted December 23, 2009:
Agreed. I've been avoiding doing this until I figured out a good way to do it. Great info.
Aaron Nimocks, posted December 23, 2009:
The latest video tutorial on my site (linked in my sig) shows how I do it. I use this approach on every site where I have to go through pages of results.