Scrape Wildcard from %list or from #variable

whoami · March 4, 2014

Hello,

Let's suppose I have this %list or #variable:

//www.domain.com/folder/anything1
//www.domain.com/folder/anything2
//www.domain.com/folder/anything3
//www.domain.com/folder/anything4
//www.domain.com/folder/anything5
/something-different
/something-else
http://nothing.com/regular

I need to scrape only the links that start with //www.domain.com/folder/

and remove the other links.

What would be the easiest way to achieve this?

Thanks guys!

Bot-Factory · March 4, 2014

Hi.

You can do that with regex, but the pattern depends on how the links are structured.

If the ones you want to keep always start with //www, you could use:

set(#tmp1, "//www.domain.com/folder/anything1
//www.domain.com/folder/anything2
//www.domain.com/folder/anything3
//www.domain.com/folder/anything4
//www.domain.com/folder/anything5
/something-different
/something-else
http://nothing.com/regular", "Global")
set(#tmp2, $find regular expression(#tmp1, "//www.+"), "Global")

If the part you want to match is different, you need to modify the regex accordingly, of course.
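For readers who want to try the pattern outside UBot, here is a minimal Python sketch of the same idea. It is only an illustration: the names `links_text` and `find_links` are made up for this example, and UBot's regex engine may differ from Python's `re` in details. The second pattern is an assumption about what a stricter filter for the //www.domain.com/folder/ prefix could look like, with the dots escaped so they match literal dots.

```python
import re

# The same sample data as in the post above, one link per line.
links_text = """//www.domain.com/folder/anything1
//www.domain.com/folder/anything2
//www.domain.com/folder/anything3
//www.domain.com/folder/anything4
//www.domain.com/folder/anything5
/something-different
/something-else
http://nothing.com/regular"""


def find_links(text, pattern=r"//www.+"):
    """Return every match of pattern in text (hypothetical helper)."""
    # "." does not match a newline by default, so each match stops
    # at the end of its own line.
    return re.findall(pattern, text)


# Pattern from the post: anything starting with //www
matches = find_links(links_text)

# Stricter variant (an assumption, not from the post): only the
# //www.domain.com/folder/ prefix, with literal dots escaped.
strict = find_links(links_text, r"//www\.domain\.com/folder/.+")

for link in strict:
    print(link)
```

With this sample data both patterns return the same five links. The stricter one only matters if the list could also contain other addresses that start with //www.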

Cheers

Dan

whoami · March 5, 2014

Thanks a lot!

So from now on, + is what extends the match in a regex.

Could I also have used something like "+//www.+"?

Bot-Factory · March 5, 2014

The . means any character.

The + just means one or more of the preceding item.
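A small Python sketch of those two rules, again only as an illustration and not UBot code. The sample strings are made up. It also checks the "+//www.+" idea from the question: in Python's `re` a leading + has nothing before it to repeat, so the pattern is rejected. Other regex engines may report this differently.

```python
import re

# "." matches any single character except a newline (default mode).
dot_hits = re.findall(r"a.c", "abc a-c ac")

# "+" repeats the item right before it one or more times.
plus_hits = re.findall(r"ab+", "a ab abbb")

# A leading "+" has no preceding item to repeat, so Python's re
# refuses to compile the pattern.
try:
    re.compile(r"+//www.+")
    leading_plus_ok = True
except re.error:
    leading_plus_ok = False

print(dot_hits, plus_hits, leading_plus_ok)
```

So in "//www.+" the + applies to the . in front of it: "one or more of any character", which is what carries the match to the end of the line.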

If you are new to regex I highly recommend checking out:

http://www.ubotstudio.com/forum/index.php?/topic/15905-sell-learn-regular-expressions-video-course-2-hours-of-content/

Dan
