itexspert 47 Posted December 18, 2014

Hey guys, I have a bit of a problem. I have literally never used regex, inside Ubot or anywhere else, so I need a little help working out the best way to scrape some information.

I am doing a "test ride" on this particular example (check the attached image first):
http://www.ripoffreport.com/r/Routes-Car-and-Truck-Rental-Calgary/Calgary-Alberta/Routes-Car-and-Truck-Rental-Calgary-FRAUD-CHARGING-MY-CREEDIT-CARD-WITHOUT-PERMISSION-1193004

I am trying to scrape the Phone Number, Website, and Category with regex. The thing is, the Phone Number sometimes looks different; see this link:
http://www.ripoffreport.com/r/UHAUL/internet/UHAUL-JEFF-vermonts-regional-manager-uhauls-website-said-they-had-a-trailor-avaiable-a-1189474

Those are my issues. I tried wildcarding it a while back, but after a few loops it produced errors. I don't have any regex code to show you, since I have never had to use regex inside Ubot before, but I need to learn it and all the examples I found were confusing. I was hoping you could help me with the regex for this simple example and with how to bring it into Ubot (I have no issues with logic commands). I simply don't know much about regex! If anyone has the time to look at my example, I would greatly appreciate the help.

EDIT: OK, I just checked again, and it looks like I was a lot younger when I first tried to scrape this. Here is my new code, and it seems to work! I would still like to know more about how to do this in regex, the procedure.
set(#Category,$scrape attribute(<outerhtml=w"<li> <strong>Category:</strong> <a href=\"http://www.ripoffreport.com/*\">*</a> </li>">,"innertext"),"Global")
set(#Web,$scrape attribute(<outerhtml=w"<li><strong>Web:</strong> <a href=\"*\" rel=\"nofollow\" target=\"_blank\">*</a></li>">,"innertext"),"Global")
set(#Phone,$scrape attribute(<outerhtml=w"<li><strong>Phone:</strong> *</li>">,"innertext"),"Global")
set(#adresa,$scrape attribute(<class="address">,"innertext"),"Global")
Gogetta 263 Posted December 18, 2014 (edited)

This regex code should work for scraping these fields about 90% of the time.

set(#Category, $find regular expression($find regular expression($document text, "(?<=Category:</strong>).*?(?=/a>)"), "(?<=>).*?(?=<)"), "Global")
alert(#Category)
set(#Web, $find regular expression($document text, "(?<=Web:</strong> <a href=\").*?(?=\")"), "Global")
alert(#Web)
set(#Phone, $find regular expression($document text, "(?<=Phone:</strong> ).*?(?=<)"), "Global")
alert(#Phone)
set(#adresa, $trim($replace regular expression($replace regular expression($find regular expression($find regular expression($replace($document text, $new line, $nothing), "(?<=companyBullet\" style=\";background-position:0px 3px;padding-left:9px\"> <span><strong>).*?(?=</td> <td> <ul>)"), "(?<=</span> </div> <span>).*?(?=</span> </div> )"), "<[^>]*>", $nothing), "\\s\{2,\}", " ")), "Global")
alert(#adresa)

Edited December 18, 2014 by Gogetta: Added additional examples for Website, Category, and Address.
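For anyone who wants to see the procedure outside Ubot, here is a minimal Python sketch of the same lookbehind/lookahead idea used in the snippets above. It is not Ubot code, and it is only an illustration under assumptions: the sample markup, the field values, and the helper names (`find_first`, `scrape_fields`, `clean_text`) are made up, modeled on the patterns quoted in this thread, and the real page markup may differ.

```python
import re

# Hypothetical sample markup, modeled on the wildcard patterns in the first post.
SAMPLE_HTML = (
    '<li> <strong>Category:</strong> '
    '<a href="http://www.ripoffreport.com/example-category">Car Rentals</a> </li>'
    '<li><strong>Web:</strong> '
    '<a href="http://www.example.com" rel="nofollow" target="_blank">www.example.com</a></li>'
    '<li><strong>Phone:</strong> 555-0100</li>'
)


def find_first(pattern, text):
    """Return the first match of pattern in text (trimmed), or '' if none."""
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(0).strip() if match else ""


def scrape_fields(html):
    """Extract category, web, and phone with lookbehind/lookahead patterns."""
    # Step 1: everything between "Category:</strong>" and the closing "/a>".
    category_chunk = find_first(r"(?<=Category:</strong>).*?(?=/a>)", html)
    # Step 2: inside that chunk, the text between a ">" and the next "<".
    category = find_first(r"(?<=>).*?(?=<)", category_chunk)
    # The href value: starts right after the opening quote, stops at the next quote.
    web = find_first(r'(?<=Web:</strong> <a href=").*?(?=")', html)
    # Whatever sits between the label and the next tag, in any phone format.
    phone = find_first(r"(?<=Phone:</strong> ).*?(?=<)", html)
    return {"category": category, "web": web, "phone": phone}


def clean_text(fragment):
    """Strip HTML tags, then collapse runs of whitespace (as in the address step)."""
    without_tags = re.sub(r"<[^>]*>", "", fragment)
    return re.sub(r"\s{2,}", " ", without_tags).strip()


if __name__ == "__main__":
    print(scrape_fields(SAMPLE_HTML))
    print(clean_text("<span>123 Example St</span>  <br>\n  Calgary,   Alberta"))
```

The pieces to remember: `(?<=X)` means "the match must be preceded by X", `(?=Y)` means "the match must be followed by Y", and `.*?` takes as little text as possible between them. Neither X nor Y ends up in the result, which is why no extra trimming of the labels is needed. Because the phone pattern simply takes everything between the label and the next tag, it should not matter which format the number is written in.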
itexspert 47 Posted December 18, 2014 Author

EDIT: Thanks for adding more examples, Gogetta. I still have a lot to learn about this area of Ubot, and you were very helpful. Thank you kindly; I hope I can help you too some day!