
Just Started......small Issues!

regex issue isolate text image attached example

2 replies to this topic

#1 itexspert


Posted 18 December 2014 - 06:04 AM

Hey guys, I have a bit of a problem: I have never used regex before, inside UBot or anywhere else, so I need a little help working out the best way to scrape some information.

 

So I am doing a "test ride" on this particular example. Check the attached image first:

 

http://www.ripoffrep...MISSION-1193004

 

I am trying to scrape the phone number, website and category with regex.

The thing is, the phone number sometimes looks different. Check the link below:

 

 http://www.ripoffrep...iable-a-1189474

 

Those are my issues. I tried wildcarding it a while back, but after a few loops it produced errors. I don't have any regex code to show you, since I never had to use regex inside UBot before, but I need to learn it, and all the examples I found were confusing to me. So I was hoping you could help me with this simple example: how to write the regex and how to use it in UBot.

(I have no issues with logic commands; I simply don't know much about regex.)

If anyone has the time to look at my example, I would greatly appreciate the help!

 

 

EDIT: OK, I just checked again, and it looks like I was a lot younger when I first tried to scrape this. Here is my new code, and it seems to work!

But I would still like to know how to do this with regex, and what the procedure is.

 

 

set(#Category,$scrape attribute(<outerhtml=w"<li> <strong>Category:</strong> <a href=\"http://www.ripoffreport.com/*\">*</a> </li>">,"innertext"),"Global")
set(#Web,$scrape attribute(<outerhtml=w"<li><strong>Web:</strong> <a href=\"*\" rel=\"nofollow\" target=\"_blank\">*</a></li>">,"innertext"),"Global")
set(#Phone,$scrape attribute(<outerhtml=w"<li><strong>Phone:</strong> *</li>">,"innertext"),"Global")
set(#adresa,$scrape attribute(<class="address">,"innertext"),"Global")




#2 Gogetta


Posted 18 December 2014 - 07:11 AM

This regex code should work for scraping these fields about 90% of the time.

set(#Category, $find regular expression($find regular expression($document text, "(?<=Category:</strong>).*?(?=/a>)"), "(?<=>).*?(?=<)"), "Global")
alert(#Category)
set(#Web, $find regular expression($document text, "(?<=Web:</strong> <a href=\").*?(?=\")"), "Global")
alert(#Web)
set(#Phone, $find regular expression($document text, "(?<=Phone:</strong> ).*?(?=<)"), "Global")
alert(#Phone)
set(#adresa, $trim($replace regular expression($replace regular expression($find regular expression($find regular expression($replace($document text, $new line, $nothing), "(?<=companyBullet\" style=\";background-position:0px 3px;padding-left:9px\"> <span><strong>).*?(?=</td> <td> <ul>)"), "(?<=</span> </div> <span>).*?(?=</span> </div> )"), "<[^>]*>", $nothing), "\\s\{2,\}", " ")), "Global")
alert(#adresa)
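
For anyone who wants to see what these patterns do outside UBot, here is a minimal sketch in Python. It is only an illustration: the sample HTML is made up to resemble the markup the wildcard selectors above expect, the helper `first_match` is a hypothetical name, and Python's `re` module stands in for UBot's regex engine. The patterns are the same ones used above. Each has the shape `(?<=LEFT).*?(?=RIGHT)`: the lookbehind `(?<=...)` requires the text on the left without including it in the result, `.*?` grabs as little as possible, and the lookahead `(?=...)` stops the match just before the text on the right.

```python
import re

# Hypothetical sample markup, shaped like the list items on the scraped page.
html = (
    '<li> <strong>Category:</strong> '
    '<a href="http://www.ripoffreport.com/example">Example Category</a> </li>'
    '<li><strong>Web:</strong> '
    '<a href="http://example.com" rel="nofollow" target="_blank">example.com</a></li>'
    '<li><strong>Phone:</strong> 555-0100</li>'
)

def first_match(pattern, text):
    """Return the first match of pattern in text, or an empty string."""
    m = re.search(pattern, text)
    return m.group(0) if m else ""

# Category takes two passes, like the nested $find regular expression call:
# pass 1 isolates everything between "Category:</strong>" and "/a>",
# pass 2 takes the text between the first ">" and the next "<" in that piece.
anchor = first_match(r'(?<=Category:</strong>).*?(?=/a>)', html)
category = first_match(r'(?<=>).*?(?=<)', anchor)

# Web: everything after the opening href quote, up to the closing quote.
web = first_match(r'(?<=Web:</strong> <a href=").*?(?=")', html)

# Phone: everything after the label, up to the next tag. The lookahead does
# not care what the number looks like, so differently formatted numbers
# still match as long as the label markup is the same.
phone = first_match(r'(?<=Phone:</strong> ).*?(?=<)', html)

print(category)
print(web)
print(phone)
```

The same idea carries over to other fields: pick a stable piece of markup to the left of the value, a stable piece to the right, and put a lazy `.*?` between the two lookarounds.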

Edited by Gogetta, 18 December 2014 - 09:51 AM.
Added additional examples for Website, Category, and Address.


#3 itexspert


Posted 18 December 2014 - 07:57 AM

EDIT: Thanks for adding more examples, Gogetta. I still have a lot to learn about this area of UBot, and you were very helpful.

Thank you kindly. I hope I can help you too some day!






