Abs* 12 — Posted July 10, 2010

Hi guys, I'm having a little difficulty scraping URLs and saving them in the format I'd like. An example URL I'm working with is hxxp://www.warezforum.info/music/1592425-world-dance-hits-2010-a.html. If you go to that page you'll see that all the URLs I need are inside the Code: box. I can scrape each URL on the page with no issues; the problem I'm having is cleaning them up. When I scrape and save to file, it also captures the word Code: and then a blank line.

Another URL is hxxp://www.warezforum.info/tv-shows/1591961-friday-night-lights-s04e10-hdtv-xvid-lol.html. This page is a little trickier: inside the Code: box there are also line breaks and headings for each type of link. What I want to do is scrape all of the links and nothing else, so basically everything that starts with http://.

Just wondering if anyone could give me a hand. Because of the line breaks and spacing I'm not able to control where each link is saved in the CSV file. I've also tried using "{1}" (where {1} is the link URL) when adding it to the CSV, but it still won't retain the form. I've given an example image below to show what I mean. In the CSV file I have two columns: the first is for the thread URL, which is scraped when I navigate to the page, and the second is for the links. However, due to the spaces and line breaks, the links end up in the first column alongside the thread URL, and usually just the word Code: appears in the links column. Any help would be great, thanks.

http://www.bigseotechniques.com/scraperimage.GIF
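The filtering described above (keep only the lines that start with http://, discard the Code: label, blank lines, and headings, then write thread URL and link as two CSV columns) can be sketched in Python. The sample text, thread URL, and filename here are hypothetical placeholders, not taken from the actual pages:

```python
import csv

# Hypothetical raw scrape of a Code: box, including the label,
# a heading, and blank lines that need to be discarded.
raw_scrape = """Code:

Single Links
http://example.com/file1.avi
http://example.com/file2.avi
"""

thread_url = "http://forum.example.com/thread-123.html"  # hypothetical

# Keep only lines that are actual links.
links = [line.strip() for line in raw_scrape.splitlines()
         if line.strip().startswith("http://")]

# Two-column CSV: thread URL in column 1, one link per row in column 2.
with open("links.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for link in links:
        writer.writerow([thread_url, link])
```

Filtering on the http:// prefix sidesteps the whole cleanup problem: headings and blank lines never make it into the output, so the columns stay aligned.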
pftg4 102 — Posted July 11, 2010

OK, try this. It works fine for the first URL you gave in the post. If you need more help, PM me (if it works, of course). Thx

Pftg4Urls.ubot
Abs* 12 — Posted July 11, 2010 (Author)

Hi, thanks a lot. It worked great for the first one but not the second. You've given me a great idea with the scrape page command: there's so much more control using it, and it didn't even strike me to use it. Instead I've been using scrape chosen attribute. I really like the way you've managed to remove the lines. I'm going through it but really can't figure the entire process out. Would you mind walking me through the coding you've done, especially the part where you're using the set and replace commands? Thanks
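For anyone following along, the set/replace pattern being asked about (normalize the scraped blob before splitting it into links) amounts to something like the following Python sketch. The scraped text is a made-up example; the actual .ubot steps are in the attached file:

```python
import re

# Hypothetical scraped blob: the Code: label plus links separated
# by Windows-style line breaks.
scraped = "Code:\r\n\r\nhttp://example.com/a.rar\r\nhttp://example.com/b.rar\r\n"

# "replace" step 1: drop the Code: label.
cleaned = scraped.replace("Code:", "")

# "replace" step 2: collapse every run of whitespace/line breaks into a
# single space, so link positions become predictable.
cleaned = re.sub(r"\s+", " ", cleaned).strip()

# "set" step: store the normalized result and split it into a clean list.
links = cleaned.split(" ")
```

The key idea is to do the replacements on the whole blob first, and only split into individual links at the very end, so stray blank lines can't produce empty entries.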
Abs* 12 — Posted July 11, 2010 (Author)

Hi, I've managed to get it working so that it scrapes all the URLs and leaves out the word Code:. One issue I'm still having is with saving to CSV. I have two columns, one for the thread URL and the other for the download links. The problem now is that when I save to CSV it leaves too many line breaks. I've attached two images: one showing how I'm scraping, and the other showing the populated CSV file. Thanks

http://www.bigseotechniques.com/scraperimage1.GIF
http://www.bigseotechniques.com/csvscreenshot.GIF
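The extra line breaks in the CSV usually come from newline characters still attached to the scraped fields. A minimal Python sketch of the fix (strip each field before writing, with hypothetical URLs standing in for the real data):

```python
import csv

thread_url = "http://forum.example.com/thread-123.html"  # hypothetical

# Links scraped with line breaks still attached; written as-is they
# would insert blank rows and push fields into the wrong column.
dirty_links = ["http://example.com/a.avi\r\n", "\nhttp://example.com/b.avi\n"]

# Strip whitespace from every field before writing, so each row is
# exactly two columns with no embedded newlines.
rows = [[thread_url, link.strip()] for link in dirty_links]

with open("output.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```

Since none of the cleaned fields contain a newline, the file ends up with exactly one row per link and the thread URL and download link stay in their own columns.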
musiclover2010 0 — Posted September 9, 2010

Thanks a lot for sharing the instructions. I appreciate the help.