Just Started......small Issues!

itexspert · December 18, 2014

Hey there guys, I have a bit of a problem: I have literally never used regex, so I never had to use it inside Ubot, and I need a little help to determine the best way to scrape information.

So I am doing a "test ride" on this particular example.

Check the attached image first:

http://www.ripoffreport.com/r/Routes-Car-and-Truck-Rental-Calgary/Calgary-Alberta/Routes-Car-and-Truck-Rental-Calgary-FRAUD-CHARGING-MY-CREEDIT-CARD-WITHOUT-PERMISSION-1193004

I am trying to scrape the phone number, website, and category from this page with regex.

The thing is, the phone number sometimes looks different; check the link below:

http://www.ripoffreport.com/r/UHAUL/internet/UHAUL-JEFF-vermonts-regional-manager-uhauls-website-said-they-had-a-trailor-avaiable-a-1189474

Those are my issues. I tried wildcarding it a while back, but after a few loops it produced errors. I don't even have code to show you, since I never had to use regex inside Ubot before. I have to learn it, though, and all the examples I found were confusing to me, so I was hoping you could help me with this simple example: both the regex itself and how to import it into Ubot.

(I have no issues with Logic commands)

I simply don't know a lot about regex!

If anyone has the time to look at my example, I would greatly appreciate the help!

EDIT: OK, I just checked again, and it looks like I was a lot younger when I first tried to scrape. This is my new code, and it seems to work!

But I would still like to know how to do this with regex, the procedure!

set(#Category,$scrape attribute(<outerhtml=w"<li> Category: <a href=\"http://www.ripoffreport.com/*\">*</a> </li>">,"innertext"),"Global")
set(#Web,$scrape attribute(<outerhtml=w"<li>Web: <a href=\"*\" rel=\"nofollow\" target=\"_blank\">*</a></li>">,"innertext"),"Global")
set(#Phone,$scrape attribute(<outerhtml=w"<li>Phone: *</li>">,"innertext"),"Global")
set(#adresa,$scrape attribute(<class="address">,"innertext"),"Global")

Gogetta · December 18, 2014

This regex code should work for scraping about 90% of the time.

set(#Category, $find regular expression($find regular expression($document text, "(?<=Category:</strong>).*?(?=/a>)"), "(?<=>).*?(?=<)"), "Global")
alert(#Category)
set(#Web, $find regular expression($document text, "(?<=Web:</strong> <a href=\").*?(?=\")"), "Global")
alert(#Web)
set(#Phone, $find regular expression($document text, "(?<=Phone:</strong> ).*?(?=<)"), "Global")
alert(#Phone)
set(#adresa, $trim($replace regular expression($replace regular expression($find regular expression($find regular expression($replace($document text, $new line, $nothing), "(?<=companyBullet\" style=\";background-position:0px 3px;padding-left:9px\"> <span><strong>).*?(?=</td> <td> <ul>)"), "(?<=</span> </div> <span>).*?(?=</span> </div> )"), "<[^>]*>", $nothing), "\\s\{2,\}", " ")), "Global")
alert(#adresa)
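
For anyone who wants to try the lookbehind/lookahead idea outside Ubot first, here is a minimal sketch in Python using the same three patterns. `(?<=X)` means "the match must be preceded by X" and `(?=Y)` means "the match must be followed by Y", so only the text in between is returned. The HTML fragment and the `find_first` helper are made up for illustration; the real page markup may differ, and Ubot's `$find regular expression` may return more than the first match.

```python
import re

# Hypothetical HTML fragment, shaped the way the patterns above expect it.
# The real page markup may differ.
SAMPLE_HTML = (
    '<li><strong>Category:</strong> '
    '<a href="http://www.ripoffreport.com/example-category">Car Rental</a></li>'
    '<li><strong>Web:</strong> '
    '<a href="http://example.com" rel="nofollow" target="_blank">example.com</a></li>'
    '<li><strong>Phone:</strong> 555-0100</li>'
)


def find_first(pattern, text):
    """Return the first match of pattern in text, or "" if there is none.

    Illustrative helper, not a Ubot function.
    """
    match = re.search(pattern, text, re.DOTALL)
    return match.group(0) if match else ""


# Category takes two passes: first cut out everything between
# "Category:</strong>" and "/a>", then keep only the text between ">" and "<".
category_chunk = find_first(r"(?<=Category:</strong>).*?(?=/a>)", SAMPLE_HTML)
category = find_first(r"(?<=>).*?(?=<)", category_chunk)

# Web: everything between the opening quote of href and the next quote.
web = find_first(r'(?<=Web:</strong> <a href=").*?(?=")', SAMPLE_HTML)

# Phone: everything after "Phone:</strong> " up to the next tag, so the
# format of the number itself does not matter.
phone = find_first(r"(?<=Phone:</strong> ).*?(?=<)", SAMPLE_HTML)

print(category)
print(web)
print(phone)
```

Because the phone pattern anchors on the label and the next `<` rather than on digits, it does not care whether the number is written with dashes, dots, or brackets.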

Edited December 18, 2014 by Gogetta
Added additional examples for Website, Category, and Address.
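
The address example chains several cleanup steps: remove the line breaks, strip every HTML tag, collapse runs of whitespace, and trim the ends. A rough Python sketch of just that cleanup part follows, assuming the address block has already been cut out of the page. The markup in `RAW_ADDRESS` and the `clean_fragment` name are invented for illustration.

```python
import re

# Hypothetical address fragment; the real block on the page may look different.
RAW_ADDRESS = (
    "<span>123 Main St</span>\n"
    "  <span>Calgary,   Alberta</span>\n"
    "  <span>Canada</span>  "
)


def clean_fragment(html):
    """Flatten an HTML fragment into one line of plain text."""
    flat = html.replace("\n", "")                 # remove line breaks
    no_tags = re.sub(r"<[^>]*>", "", flat)        # strip every tag
    collapsed = re.sub(r"\s{2,}", " ", no_tags)   # 2+ whitespace chars -> 1 space
    return collapsed.strip()                      # trim both ends


print(clean_fragment(RAW_ADDRESS))
```

One thing to watch for: because tags are replaced with nothing, two pieces of text separated only by tags (no spaces between them) end up glued together. Replacing tags with a single space instead avoids that.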

itexspert · December 18, 2014

EDIT: Thanks for adding more examples, Gogetta. I still have a lot to learn about this area of Ubot, and you were very helpful.

Thank you kindly. I hope I will be able to help you too some day!
