Chris M 55 Posted June 18, 2013

I'm trying to learn how to scrape until the bot has no more pages to scrape, or at least until it is 'x' pages deep. Here is the code I have so far, but I don't yet understand how to make it click through the remaining pages (or 'x' pages) and keep adding titles.

clear cookies
navigate("http://www.copyblogger.com/blog/", "Wait")
wait for browser event("Everything Loaded", "")
clear list(%titles)
add list to list(%titles, $scrape attribute(<h2,class="entry-title">, "innertext"), "Delete", "Global")

Any suggestions?
Legend 181 Posted June 18, 2013

Here's one way:

clear cookies
ui text box("# of Pages:", #pages)
navigate("http://www.copyblogger.com/blog/", "Wait")
wait for browser event("Everything Loaded", "")
clear list(%titles)
loop(#pages) {
    add list to list(%titles, $scrape attribute(<h2,class="entry-title">, "innertext"), "Delete", "Global")
    if($exists(<innertext="Next Page»">)) {
        then {
            click(<innertext="Next Page»">, "Left Click", "No")
            wait(2)
        }
    }
}
danoctav 7 Posted June 18, 2013

Solution nr. 1:

navigate("http://www.copyblogger.com/blog/", "Wait")
wait for browser event("Everything Loaded", "")
clear list(%titles)
loop while($exists(<innertext="Next Page»">)) {
    add list to list(%titles, $scrape attribute(<class="entry-title">, "innertext"), "Delete", "Global")
    click(<innertext="Next Page»">, "Left Click", "No")
    wait for browser event("Everything Loaded", "")
    wait(3)
}
add list to list(%titles, $scrape attribute(<class="entry-title">, "innertext"), "Delete", "Global")

If Next Page» exists, click it, and repeat until it no longer exists. The last page has no Next Page» link, so the loop exits before scraping it; the extra add list to list after the loop picks up that final page.
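The two stop conditions asked about in this thread (no more Next Page link, or a fixed number of pages) can be combined in one loop. The following is a minimal Python sketch of that logic only, not UBot code: `scrape_titles`, `fetch`, and the `FAKE_SITE` data are hypothetical names invented for illustration, and `fetch` stands in for whatever actually loads a page and returns its titles plus the next-page link (or `None` on the last page).

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# A page is modeled as (titles, next_url); next_url is None on the last page.
Page = Tuple[List[str], Optional[str]]


def scrape_titles(start_url: str,
                  fetch: Callable[[str], Page],
                  max_pages: Optional[int] = None) -> List[str]:
    """Collect titles until there is no next page or max_pages is reached."""
    titles: List[str] = []
    seen: Set[str] = set()
    url: Optional[str] = start_url
    visited = 0
    while url is not None and (max_pages is None or visited < max_pages):
        page_titles, next_url = fetch(url)
        visited += 1
        # Scrape the current page first, so the last page is not skipped.
        for title in page_titles:
            if title not in seen:  # same effect as the "Delete" duplicates option
                seen.add(title)
                titles.append(title)
        url = next_url  # None ends the loop after the final page
    return titles


# Hypothetical three-page site standing in for a real browser session.
FAKE_SITE: Dict[str, Page] = {
    "page1": (["A", "B"], "page2"),
    "page2": (["C"], "page3"),
    "page3": (["D", "A"], None),
}

all_titles = scrape_titles("page1", FAKE_SITE.__getitem__)
first_two = scrape_titles("page1", FAKE_SITE.__getitem__, max_pages=2)
```

Scraping before checking for the next link is the design choice that matters here: the loop condition is evaluated after the current page has already been collected, so the final page is included even though it has no link to follow.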
danoctav 7 Posted June 18, 2013

Solution nr. 2:

ui text box("Nr. of pages:", #nr_of_pages)
clear list(%titles)
set(#i, 1, "Global")
loop(#nr_of_pages) {
    navigate("http://www.copyblogger.com/blog/page/{#i}/", "Wait")
    wait for browser event("Everything Loaded", "")
    wait(3)
    add list to list(%titles, $scrape attribute(<class="entry-title">, "innertext"), "Delete", "Global")
    increment(#i)
}

For this particular case, this approach also works. If you want to scrape the titles from all pages without entering a number yourself, scrape the maximum page number (<193> in this case) and use it as the total nr_of_pages.
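The page-number approach comes down to two small steps: find the highest page number shown in the pager, then build one URL per page. Here is a hedged Python sketch of just that arithmetic, not UBot code. The function names and the sample pager labels are assumptions made for illustration; only the `/page/N/` URL pattern comes from the post above.

```python
from typing import List


def last_page_number(pager_labels: List[str]) -> int:
    """Return the largest numeric label in a pager, or 1 if none is numeric."""
    numbers = [int(label) for label in pager_labels if label.strip().isdigit()]
    return max(numbers) if numbers else 1


def page_urls(base_url: str, page_count: int) -> List[str]:
    """Build one listing URL per page number, starting at page 1."""
    return [f"{base_url}page/{n}/" for n in range(1, page_count + 1)]


# Hypothetical pager text as it might be scraped from the first page.
labels = ["1", "2", "3", "…", "193", "Next Page»"]
total = last_page_number(labels)
urls = page_urls("http://www.copyblogger.com/blog/", 3)
```

Passing `total` instead of `3` to `page_urls` would cover every page, which replaces the manual "Nr. of pages" prompt.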
Chris M 55 Posted June 18, 2013

Thank you guys, this really opened my eyes to the possibilities.