UBot Underground

scrape data and clean it up?



Hey everyone,

 

My name is Shannon Herod and I am brand new. I've actually never done any sort of programming before in my life and this is my first day with UBot.

 

I have a question that may be pretty easy to answer.

 

I am trying to scrape data from a table that has a lot of data in it. There are no identifying characters around the data I want to scrape, and if I use the page scrape function I get a lot of miscellaneous code that I do not need.

 

I have attached an image that shows the exact data that I'm trying to scrape…

 

 

Basically, what I'm trying to do is scrape the data in an organized format so that I can put it in a list and recall the information in a logical manner.

 

I.e. I want to pull the name, e-mail, and address separately.

 

Hopefully this makes sense.

 

Any help would be greatly appreciated.

 

Talk soon,

 

Shannon Herod

post-2772-0-13243200-1302034007_thumb.png


Hi Shannon, with that kind of page to scrape, my only option is to use regex. It's not recommended for newbies simply because it's very complicated at first, but it is a powerful tool when most of your bots are made to scrape.

 

Go to this page http://ubotstudio.com/forum/index.php?/topic/6489-regex-101-and-beyond/ for a regex introduction.

 

Here is the sample I created, whois.ubot, for you to study.
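For anyone who cannot open the .ubot file, here is a minimal sketch of the regex idea in Python rather than UBot. It is not the contents of whois.ubot. The sample text, the pattern, and the function name `extract_emails` are all assumptions made up for illustration, and the pattern is a deliberately simple one that will not cover every valid address.

```python
import re

# Hypothetical scraped text; the real page markup is not shown in this thread.
SAMPLE = (
    'Registrant: John Doe '
    '<a href="mailto:john@example.com">john@example.com</a> '
    '123 Main St'
)

# Simple email pattern (an assumption, not a full RFC-compliant matcher):
# a local part, an @ sign, then a domain with at least one dot.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_emails(text):
    """Return the unique email addresses found in text, in first-seen order."""
    matches = EMAIL_RE.findall(text)
    # The address appears twice (in the href and in the link text),
    # so drop duplicates while keeping the original order.
    return list(dict.fromkeys(matches))


print(extract_emails(SAMPLE))
```

The point is that a regex matches the shape of the data itself, so it does not need identifying characters around it.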

 

Regards



Oh boy! My head is hurting just from looking at that :-)


Shannon,

 

Let's not try to make your head explode over regex.

 

It can be done without it. Check out my bot.

 

The 1st node clears my list.

The 2nd node will actually scrape the data you circled in your image.

 

You will note that I am using the same variable name "var". I am simply overwriting the contents with a better version as I modify the info.

 

In the 3rd node I replace every run of 6 blank spaces with a new line. Simple enough.

 

In the 4th node I also change these two characters, ">, into a new line, because the email address will be messed up if I leave them in.

 

In the 5th node I get rid of the HTML that makes the email address a hyperlink.

 

The 6th node actually makes a list so that I can get rid of two unwanted lines: the first line (position 0), which is a blank line, and the second instance of my domain name that makes up the email address (position 2).

 

The 7th node removes the item at position 2.

 

The 8th node removes the item at position 0.

 

The 9th and final node saves the list to "whois.txt" on your desktop.
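If it helps to see the nine nodes as text, here is a rough Python sketch of the same sequence. It is not the .ubot file itself. The sample string, the exact HTML being stripped, and the names `clean_whois` and `save_list` are assumptions for illustration, since the real page source is not shown here.

```python
from pathlib import Path

SIX_SPACES = " " * 6

# Hypothetical scraped text standing in for the circled data (node 2).
SAMPLE_RAW = (
    SIX_SPACES + "John Doe"
    + SIX_SPACES + '<a href="mailto:john@example.com">john@example.com</a>'
    + SIX_SPACES + "123 Main St"
)


def clean_whois(raw):
    """Mirror nodes 3 to 8: overwrite one variable, then trim the list."""
    var = raw.replace(SIX_SPACES, "\n")      # node 3: 6 spaces -> new line
    var = var.replace('">', "\n")            # node 4: "> -> new line
    # node 5: strip the leftover link HTML (assumed to look like this)
    var = var.replace('<a href="mailto:', "").replace("</a>", "")
    items = var.split("\n")                  # node 6: make a list
    del items[2]                             # node 7: duplicate email line
    del items[0]                             # node 8: leading blank line
    return items


def save_list(items, path):
    """Node 9: write one item per line to the given file."""
    Path(path).write_text("\n".join(items) + "\n", encoding="utf-8")


print(clean_whois(SAMPLE_RAW))
```

Position 2 is removed before position 0 on purpose: deleting position 0 first would shift every later item up by one, and position 2 would then point at the wrong line. To save the result, something like `save_list(clean_whois(SAMPLE_RAW), "whois.txt")` would do.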

 

I'll take a +1 if you like it. lol

BotBuddys-whois-scrape.ubot


Awwwwwwwww Shut up! I'm blushing!!!!! :wub:

 

On a side note... I am the guy in the picture. Ironically enough, my wife, who is the beautiful woman in the picture, is also named Shannon. Yes, Shannon and Shannon.

 

Talk soon,

 

Shannon
