UBot Underground


Hi guys,

 

I'm building a bot that will scrape a Google results page and put all of the links on the page into a list, and then let the user provide a list of their URLs and keywords.

 

I want the bot to compare the lists (the scraped URLs against the user-provided URLs) to see #1, whether any of the user's URLs are in the scraped list, and #2, the position of each of those URLs within that list.

 

I just can't seem to get a grasp on it.

 

So if the google search creates a list like this:

 

1- domain.com

2- otherdoman.com

3- moredomain.com

4- dodoman.com

5- roroman.com

 

and the user has provided a list of domains like this:

1- udomain1.com

2- dodoman.com

3- udomain3.com

 

Then the bot should find dodoman.com and note that it is currently in the #4 position.

 

Can anyone help me get on track with this?


All you need is a function that returns the position of a URL within a list of URLs.

 

Build that function first, then pass each of the user's URLs into it as a parameter, one at a time. That's one way to do it.
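Here's a minimal sketch of that idea in Python rather than UBot script, just to show the logic. It assumes the scraped results are already in a list in ranking order and that a "match" means the two strings are identical; `find_position` is a hypothetical name, not a UBot command.

```python
def find_position(url, scraped_urls):
    """Return the 1-based position of url in scraped_urls, or None if it is absent."""
    try:
        # list.index() counts from 0; search rankings are counted from 1
        return scraped_urls.index(url) + 1
    except ValueError:
        # url is not in the scraped list at all
        return None


scraped = ["domain.com", "otherdoman.com", "moredomain.com", "dodoman.com", "roroman.com"]
user_urls = ["udomain1.com", "dodoman.com", "udomain3.com"]

# Pass each of the user's URLs into the function in turn
for url in user_urls:
    position = find_position(url, scraped)
    if position is not None:
        print(f"{url} found at position #{position}")  # dodoman.com found at position #4
```

With the example lists from the first post, only `dodoman.com` is reported, at #4.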


GoGetta, you're a Rock Star and I appreciate the help!

 

It's not working for me as is, but I'll probably be able to tweak it a bit to get it moving in my direction. I honestly didn't know where to even start on it.

 

Thanks again.


GoGetta, if you happen to come back to this page, could you explain the code below?

 

add list to list(%list_b, $find regular expression(#list_b, "[a-zA-Z0-9\\-]*\\.[a-zA-Z]\{2,4\}"), "Delete", "Global")
 

This appears to be the sticking point: nothing is being added to list b.


Yeah, that regex was there to match only the root domain when comparing against the current item in list a. I'll be the first to admit that I'm not too good with regex. I used it because I wasn't sure whether you wanted to match a domain even when the URL was a subpage. If that doesn't matter, you don't need regex for this.
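If subpages should count as matches, another option (a swapped-in technique, not what the attached .ubot file does) is to reduce every URL to its host name before comparing. A rough Python sketch using the standard `urllib.parse` module; treating `www.example.com` and `example.com` as the same site is an assumption, and `root_domain` / `find_domain_position` are hypothetical helpers.

```python
from urllib.parse import urlparse


def root_domain(url):
    """Reduce a URL such as 'http://www.dodoman.com/page' to 'dodoman.com'."""
    # urlparse() only fills in the host when a scheme is present,
    # so add one to bare entries such as 'dodoman.com/page'
    if "://" not in url:
        url = "http://" + url
    # .hostname is lower-cased and has any ':port' removed (None if missing)
    host = urlparse(url).hostname or ""
    # Assumption: 'www.' in front of a domain doesn't make it a different site
    if host.startswith("www."):
        host = host[4:]
    return host


def find_domain_position(user_url, scraped_urls):
    """Return the 1-based position of the first scraped URL on the same domain, or None."""
    target = root_domain(user_url)
    for position, scraped in enumerate(scraped_urls, start=1):
        if root_domain(scraped) == target:
            return position
    return None


scraped = ["http://domain.com/", "http://www.otherdoman.com/about", "http://dodoman.com/page"]
print(find_domain_position("dodoman.com", scraped))  # 3
```

This compares whole host names, so a subpage such as `dodoman.com/page` still counts as `dodoman.com`, while `sub.dodoman.com` would not.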

 

But take a look at this.

 

http://www.regular-expressions.info/wordboundaries.html
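Why word boundaries matter here: with the example domains, a plain "contains" check for `doman.com` also hits inside `dodoman.com` and `otherdoman.com`. A quick illustration in Python's `re`, where `\b` marks a boundary between a word character and a non-word character; `matches` is a hypothetical helper, and UBot's own regex engine isn't being exercised here.

```python
import re


def matches(domain, text):
    """True if domain appears in text as a whole word, not inside a longer name."""
    # re.escape() turns the '.' in the domain into a literal dot
    return re.search(r"\b" + re.escape(domain) + r"\b", text) is not None


print("doman.com" in "dodoman.com")                       # True: plain substring check
print(matches("doman.com", "dodoman.com"))                # False: no boundary before 'doman'
print(matches("dodoman.com", "http://dodoman.com/page"))  # True: '/' on both sides
```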

 

When I tested it with the example you provided above, it worked.

 

Here it is again without regex, but list b can't contain any subpages, or it won't match the current item from list a.

match lists.ubot
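Without regex, the comparison comes down to exact string equality between the two lists, which is why subpages break it. A small Python sketch of that idea, assuming list a holds the scraped results in ranking order and list b the user's entries; the names only mirror the thread's "list a" and "list b" and this is not a translation of the attached file.

```python
list_a = ["domain.com", "otherdoman.com", "moredomain.com", "dodoman.com", "roroman.com"]
list_b = ["udomain1.com", "dodoman.com", "udomain3.com", "dodoman.com/page"]

# Record each scraped entry that also appears verbatim in list b, with its 1-based rank
found = {}
for position, scraped in enumerate(list_a, start=1):
    # 'not in found' keeps the first (best) rank if a domain shows up twice
    if scraped in list_b and scraped not in found:
        found[scraped] = position

print(found)  # {'dodoman.com': 4}
```

`dodoman.com/page` is never reported, because it is not character-for-character identical to `dodoman.com`.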


Give this a try:

 

Edit: improved it a bit.

http\:\/\/(www\.|)[\-\.\;\:\%\&\=\+\$\,\w+@]+[a-zA-Z\.]{2,4}+(\/[\-\.\#\?\;\:\%\&\=\+\$\,\w+@]+|)

I don't have UBot open, though, so while it should work, let me know if it doesn't.


Didn't work for me, but I might just be doing it wrong. I pasted your string directly into the find-regular-expression box and got an error.

 

I can't begin to tell you guys how much I appreciate all of the help you're giving.

Okay, here it is in a UBot-friendly form:

http\:\/\/[\-\+\.\;\:\%\&\=\$\,\w\@]+[a-zA-Z\.]{2,4}(\/[\-\+\.\#\?\;\:\%\&\=\$\,\w\@]+|)
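For anyone who wants to try the pattern outside UBot, Python's `re` module accepts the same escapes. A sketch that pulls every URL out of a block of text and numbers them in order; the sample text is made up, and `finditer()` with `group(0)` is used because `findall()` would return only the optional path group in parentheses.

```python
import re

# The UBot-friendly pattern from the post above, unchanged, as a raw string
URL_PATTERN = re.compile(
    r"http\:\/\/[\-\+\.\;\:\%\&\=\$\,\w\@]+[a-zA-Z\.]{2,4}(\/[\-\+\.\#\?\;\:\%\&\=\$\,\w\@]+|)"
)

# Hypothetical scraped text; a real results page will look different
page = "1 http://domain.com 2 http://otherdoman.com/about 3 http://dodoman.com/page"

# group(0) is the whole match; findall() would only give back the capture group
urls = [m.group(0) for m in URL_PATTERN.finditer(page)]
for position, url in enumerate(urls, start=1):
    print(position, url)
```

Note that the character class includes `.`, so a URL followed directly by a full stop in running text would pick up that trailing dot.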

Wow! Perfect and smooth!

Thanks so much for all of the help. You guys went above and beyond, and I deeply appreciate it.

 

GoGetta, thanks so much.

HelloInsomnia, thank you, man. This was really kicking my butt.
