UBot Underground

scrape IP:PORT notation from page, store in list



What is the best way to handle this?

 

I have an account with a provider who posts elite Google proxies twice a day, but he only delivers them as posts in a private forum and will not provide a text file or anything else.

 

I wrote a bot that logs in, finds all posts in the last 12 hours and displays them.

 

Now I want to scan the page and scrape every occurrence of IP:PORT notation.

 

For example: 10.10.10.10:8080

 

How do I scrape each occurrence one by one into a list variable from the current loaded page?

 

This bot will run automatically every 12 hours.


Using regex along with the document text, you can scrape pretty much any website. Add sockets and threads and you can scrape a ton of proxies in no time. I added an example below. Good luck!
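
For anyone who wants to see the idea spelled out: the original attachment was a UBot script and isn't reproduced here, so below is a minimal Python sketch of the same regex-over-page-text approach. The sample page text and the exact pattern are illustrative assumptions, not the poster's code.

import re

# Hypothetical sample of the loaded page text; in UBot this would come
# from the document text of the current page.
page_text = """
Fresh proxies for today:
10.10.10.10:8080 and 192.168.1.5:3128
Duplicate entry: 10.10.10.10:8080
"""

# Match IP:PORT pairs such as 10.10.10.10:8080.
# The \b anchors keep the match from starting or ending mid-number.
proxy_pattern = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}:\d{2,5}\b")

proxies = proxy_pattern.findall(page_text)
print(proxies)  # ['10.10.10.10:8080', '192.168.1.5:3128', '10.10.10.10:8080']

Each match can then be appended to a list (and deduplicated) before saving to a file.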

 

Thanks! I will check this out and see if I can make it work.


Thanks, this worked (to some extent, see below) in v3, but in v4 (which is where I am coding this script) it doesn't work. I think find regular expression doesn't work yet or something; I posted about it in the v4 section.

 

I also removed the list functionality completely and just put the find regular expression into the save to file command as the content, and that worked.

 

But I am having very weird results.

 

If I use your code as is, it finds five proxies on this test site I am using:

http://atomintersoft...oxy/proxy-list/

 

With my modified code, where I put your find regular expression node directly into the save to file "content" field, I find 15.

 

I am not sure what is going on.


nm, it looks like it just isn't removing duplicates in the file.
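
For anyone hitting the same count mismatch: the list version deduplicates while the straight save-to-file version does not, so deduplicating the matches before saving should make both counts agree. A minimal Python sketch of that step (not the actual UBot script; the sample data is made up):

# Hypothetical matches scraped from the page; note the repeat.
matches = [
    "10.10.10.10:8080",
    "192.168.1.5:3128",
    "10.10.10.10:8080",
]

# dict.fromkeys removes duplicates while preserving the original order.
unique_proxies = list(dict.fromkeys(matches))

print(len(matches), "raw matches ->", len(unique_proxies), "unique proxies")
# 3 raw matches -> 2 unique proxies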

