Bob The Builder 62 · Posted August 15, 2011
What is the best way to handle this? I have an account with a provider who supplies elite Google proxies twice a day, but he only delivers them as posts in a private forum and will not provide a text file or anything else. I wrote a bot that logs in, finds all posts from the last 12 hours, and displays them. Now I want to scan the page and scrape every occurrence of IP:PORT on it, for example: 10.10.10.10:8080. How do I scrape each occurrence, one by one, into a list variable from the currently loaded page? This bot will run automatically every 12 hours.
Gogetta 263 · Posted August 15, 2011
Using regex along with the document text, you can scrape pretty much any website. Combine it with sockets and threads and you can scrape a ton of proxies in no time. I added an example below. Good luck!
Proxy Scraper Example.ubot
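The regex approach Gogetta describes can be sketched outside of UBot as well. Below is a minimal Python illustration: scan the raw page text for IP:PORT pairs with a regular expression. The sample `page_text` is made up for demonstration; in the real bot it would be the loaded page's document text.

```python
import re

# Made-up sample of what the scraped forum page text might look like
page_text = """
Fresh elite proxies posted today:
10.10.10.10:8080
203.0.113.7:3128 (checked)
inline mention: 198.51.100.24:80 works too
"""

# IPv4 address followed by a colon and a 1-5 digit port
PROXY_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}:\d{1,5}\b")

def scrape_proxies(text):
    """Return every IP:PORT occurrence found in the text, in order."""
    return PROXY_RE.findall(text)

print(scrape_proxies(page_text))
```

Note that `findall` returns every match, including repeats, which matters later in this thread when duplicate counts come up.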
Bob The Builder 62 · Posted August 15, 2011 (Author)
Thanks! I will check this out and see if I can make it work.
Bob The Builder 62 · Posted August 15, 2011 (Author)
Thanks, this worked (to some extent, see below) in v3, but in v4 (which is where I am coding this script) it doesn't work. I think find regular expression doesn't work yet or something; I posted about it in the v4 section. I also removed the list functionality completely and just put the find regular expression into the save to file command as content, and that worked. But I am getting very weird results. If I use your code as is, it finds five proxies on this test site I am using: http://atomintersoft...oxy/proxy-list/ With my modified code, where I put your find regular expression node directly into the save to file "content" field, I find 15. I am not sure what is going on.
Bob The Builder 62 · Posted August 15, 2011 (Author)
Never mind, it looks like my version just isn't removing duplicates from the file.