UBot Underground

Trying To Scrape A Page's Contact Link


So I'm trying to create a bot that scrapes the hrefs and then uses a regular expression to grab any href that contains the word "contact". I created the regex in RegexBuddy, which is really good, but the regex doesn't work in UBot.

clear all data
navigate("http://keywestwatertours.com/","Wait")
wait for browser event("Everything Loaded","")
set(#links,$page scrape("<a href=\"","\">"),"Global")
set(#links2,$find regular expression(#links,"(^[-a-zA-Z0-9]*contact\\.[a-zA-Z0-9]*)|(^[-a-zA-Z0-9]*contact[-a-zA-Z0-9]*.[a-zA-Z0-9]*)"),"Global")

The regex I created should grab anything like:

contact.html
yesimacontactpage.php
contact-us.asp
etc.

See snapshot.

Thanks for all your help, as always :)

post-9583-0-48532100-1444795607_thumb.png


Hi.

This site only returns one matching link:

contact.html

Your regex could look like:
set(#links2,$find regular expression(#links,".*contact.*"),"Global")
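
For anyone who wants to check that pattern outside UBot, here is a minimal Python sketch of the same idea. The list of hrefs is made up for illustration (only contact.html is confirmed for this site), and Python's `re` module is not necessarily the engine UBot uses, so treat it as a rough check rather than a guarantee.

```python
import re

# Hypothetical hrefs, one per line, standing in for the scraped #links value.
links = "index.html\ntours.html\ncontact.html\nyesimacontactpage.php\ncontact-us.asp"

# ".*contact.*" matches every whole line that contains "contact",
# because "." does not cross line breaks by default.
contact_links = re.findall(r".*contact.*", links)

print(contact_links)
```

Since the pattern has no anchors or character classes, it does not care what comes before or after "contact" on the line.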

 

 

And if you want the full URLs rather than just the relative ones, you can use:

set(#links,$scrape attribute(<tagname="a">,"fullhref"),"Global")
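
To show what "full" versus "short" means here, this small Python sketch resolves a relative href against the page address. It only illustrates the idea using the standard library; it is not how UBot builds `fullhref` internally.

```python
from urllib.parse import urljoin

# Page that was navigated to, and a relative href as it appears in the HTML.
base_url = "http://keywestwatertours.com/"
relative_href = "contact.html"

# Resolve the relative href against the page URL to get an absolute URL.
full_href = urljoin(base_url, relative_href)

print(full_href)
```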

 

Cheers

Dan


Thanks Dan. That worked great. Your way seems a lot cleaner and easier :).

I only added those extra pages in RegexBuddy to illustrate how my regex should work. I should have mentioned in the post that the page would only return one contact link.

My only question is: why didn't the regex I created with RegexBuddy work in UBot?

Hi.

There are a lot of different regex engines out there, and depending on which one RegexBuddy emulates and which one UBot uses, the supported features and syntax can differ.

If you take a look at:

https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines

 

You will see that there are really a LOT of differences. 
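
One difference worth testing, offered as a guess rather than a diagnosis, is how `^` behaves. In many engines `^` matches only at the very start of the input unless a multiline option is switched on, and a tester can have that option enabled when the bot does not. The Python sketch below uses a simplified version of the original pattern (with the dot escaped) and a made-up list of hrefs; whether UBot behaves the same way is an assumption to verify.

```python
import re

# Hypothetical scraped hrefs, one per line; the contact links are not on line 1.
links = "index.html\ncontact.html\ncontact-us.asp"

# Simplified form of the original pattern: anchored with "^", dot escaped.
pattern = r"^[-a-zA-Z0-9]*contact[-a-zA-Z0-9]*\.[a-zA-Z0-9]*"

# Default mode: "^" matches only at the start of the whole string.
without_multiline = re.findall(pattern, links)

# Multiline mode: "^" also matches at the start of every line.
with_multiline = re.findall(pattern, links, re.MULTILINE)

print(without_multiline)
print(with_multiline)
```

If the pattern only works in the tester with a multiline option ticked, dropping the `^` anchor (as the simpler `.*contact.*` does) sidesteps the issue.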

 

Dan

RegexBuddy gives me the option to switch to .NET, Java, Perl, etc. Is that the engine you mean?

Yes, that's what I was referring to. But I can't tell you which one UBot is actually using.

It's something you can experiment with, though.

 

Dan
