roguehat 0 Posted November 5, 2009

I'm trying to scrape, say, 3 pages. I'm able to scrape the first page and have UBot navigate to the second page, but this is where I run into issues. I can't get it to scrape the second page and save it in the same text file as the first page. Do I need to make separate save files for each page I have it navigate to and scrape? Any ideas will help, thanks.
hypex 2 Posted November 5, 2009

Hi roguehat. Once you have gone through the three pages (looping, manually, etc.), I would add all of the scraped contents to one variable, then at the end of the 3 pages write that variable to a text file. For example:

list variable: mydata
page one -> scrape -> add_to_list -> mydata
page two -> scrape -> add_to_list -> mydata
page three -> scrape -> add_to_list -> mydata
save to file -> path -> sub mydata
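For readers not using UBot's visual editor, the accumulate-then-save pattern above can be sketched in Python. This is only an illustration of the idea, not UBot itself; `scrape_page` and the sample page strings are hypothetical stand-ins for whatever per-page scraping you actually do.

```python
def scrape_page(html):
    # Hypothetical scraper: keep every line that looks like a result.
    return [line for line in html.splitlines() if line.startswith("result:")]

# Stand-ins for the three pages UBot would navigate to.
pages = [
    "result: alpha\nnoise",
    "result: beta",
    "noise\nresult: gamma",
]

mydata = []                           # one list shared across all pages
for page in pages:
    mydata.extend(scrape_page(page))  # "add_to_list -> mydata"

# Write the combined results once, after all pages are scraped.
with open("mydata.txt", "w") as f:
    f.write("\n".join(mydata))
```

The key point is that the list lives outside the loop, so nothing is lost between pages, and the file is written exactly once at the end.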
roguehat 0 Posted November 5, 2009 (Author)

Thanks, I'm on my way now.
1nspire_ 0 Posted November 16, 2009

I am trying to do something similar. I scraped a URL list and now I am looping through the list. I want to scrape some data on each URL and save it all to one file. Currently all I am saving is the data from the last visited URL. How can I have UBot not overwrite, but add to, an existing list?
bluegoat 24 Posted November 17, 2009

Try something like this to append a line to a file: http://img693.imageshack.us/img693/976/append.jpg
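In case the screenshot above goes offline, the general idea of "append instead of overwrite" looks like this in Python (a sketch, not UBot's own syntax): open the file in append mode so each loop iteration adds to the end of the file rather than replacing its contents. The filename and strings here are made up for the example.

```python
def append_line(path, line):
    # "a" opens for appending, creating the file if it does not exist.
    with open(path, "a") as f:
        f.write(line + "\n")

# Each call adds a line; nothing already in the file is lost.
append_line("results.txt", "data from url 1")
append_line("results.txt", "data from url 2")
```

Opening with mode "w" instead would truncate the file on every iteration, which is exactly the "only the last URL's data survives" symptom described above.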
roguehat 0 Posted November 17, 2009 (Author)

I'm not following. What is the second part of the example? I see saving the file like usual, but what is the second part, "{1}{2}{3}"?

Edit: Thanks EMP, you got me sorted out. Coffee on me.
1nspire_ 0 Posted November 18, 2009

My solution to this issue: I scraped the data in the main loop that loops through the pages, saved outside of the conditional "while" loop, and then cleared the list. I modified the GooglePPC scraper that nevele wrote to illustrate the changes. You can download the source bot here. It's not as robust as the script nevele wrote, but it gets the job done.
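The scrape-append-clear pattern described in this post can be sketched in Python as follows. Again, `scrape` and the URLs are hypothetical placeholders standing in for the real per-URL scraping; the point is the order of operations inside the loop.

```python
def scrape(url):
    # Hypothetical: pretend each URL yields two items.
    return [f"{url}: item{i}" for i in range(2)]

urls = ["http://example.com/1", "http://example.com/2"]

batch = []
for url in urls:
    batch.extend(scrape(url))           # scrape inside the loop
    with open("scraped.txt", "a") as f: # append, don't overwrite
        f.write("\n".join(batch) + "\n")
    batch.clear()                       # reset the list for the next URL
```

Clearing the list after each save is what prevents earlier URLs' data from being written twice, while append mode prevents later URLs from wiping out earlier ones.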