sk8rjess, posted February 18, 2016 (edited)

I can't seem to get regex working properly in UBot with a statement I know for sure is correct. What am I doing wrong in UBot?

Let's say I'm trying to search the following:

<tr><td class="gr" align="CENTER" valign="TOP" colspan="2"><font size="+2" class="plus2"><b>Ellison Shoji Onizuka</b></font></td></tr>

My regex, which works outside of UBot, is the following:

class="plus2"><b>(\w+).*<\/b>

In theory this will select only "Ellison". I'm using the "find regular expression" function. I thought the quotes might be throwing it off, so I escaped those, and it still didn't work. Am I missing something in UBot?

(Edited February 18, 2016 by sk8rjess)
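A likely catch: the pattern does match, but the match is the whole span from `class="plus2"><b>` to `</b>`; "Ellison" only lives in capture group 1. Assuming UBot's $find regular expression returns the full match text rather than a capture group (an assumption, not confirmed in this thread), you would get the whole span back. A minimal Python sketch (Python's `re` treats these constructs the same way .NET does) shows the difference between the full match and group 1:

```python
import re

html = '<tr><td class="gr" align="CENTER" valign="TOP" colspan="2"><font size="+2" class="plus2"><b>Ellison Shoji Onizuka</b></font></td></tr>'

# The original pattern; \/ is harmless in Python and simply matches "/".
pattern = r'class="plus2"><b>(\w+).*<\/b>'

m = re.search(pattern, html)
whole_match = m.group(0)  # what a "return the whole match" function hands back
first_name = m.group(1)   # the capture group, holding only the first word

print(whole_match)  # class="plus2"><b>Ellison Shoji Onizuka</b>
print(first_name)   # Ellison
```

So the regex itself is fine; the question is only which part of the match the calling function returns.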
pash, posted February 18, 2016

alert($find regular expression("<tr><td class=\"gr\" align=\"CENTER\" valign=\"TOP\" colspan=\"2\"><font size=\"+2\" class=\"plus2\"><b>Ellison Shoji Onizuka</b></font></td></tr>","class=\"plus2\"><b>.*?<\\/b>"))

alert($find regular expression("<tr><td class=\"gr\" align=\"CENTER\" valign=\"TOP\" colspan=\"2\"><font size=\"+2\" class=\"plus2\"><b>Ellison Shoji Onizuka</b></font></td></tr>","(?<=class=\"plus2\"><b>).*?(?=<\\/b>)"))
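The difference between pash's two patterns is the lookarounds: the first still consumes the markup, while `(?<=...)` and `(?=...)` only assert that the markup is there, so the match is just the text between the tags. A Python sketch of both (the UBot string escapes `\"` and `\\/` are dropped, since Python raw strings don't need them):

```python
import re

html = '<tr><td class="gr" align="CENTER" valign="TOP" colspan="2"><font size="+2" class="plus2"><b>Ellison Shoji Onizuka</b></font></td></tr>'

# First pattern: a lazy match that still includes the surrounding markup.
with_markup = re.search(r'class="plus2"><b>.*?</b>', html).group(0)

# Second pattern: lookbehind/lookahead check the markup without consuming it.
name_only = re.search(r'(?<=class="plus2"><b>).*?(?=</b>)', html).group(0)

print(with_markup)  # class="plus2"><b>Ellison Shoji Onizuka</b>
print(name_only)    # Ellison Shoji Onizuka
```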
sk8rjess, posted February 18, 2016 (edited)

I wasn't able to get that expression to work either. After some further modification, the following worked properly:

(?<=class=\"plus2\"><b>)(\w+)?(?=.*<\/b>)

Thanks for the push in the right direction, Pash!

(Edited February 18, 2016 by sk8rjess)
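Why this version returns only the first name: the lookbehind requires the markup without including it in the match, `\w+` stops at the first space, and the lookahead only checks that a `</b>` follows somewhere later. A Python sketch of the same pattern, with the UBot-level `\"` escapes removed and `<\/b>` written as `</b>`:

```python
import re

html = '<tr><td class="gr" align="CENTER" valign="TOP" colspan="2"><font size="+2" class="plus2"><b>Ellison Shoji Onizuka</b></font></td></tr>'

# (?<=class="plus2"><b>)  markup must precede the match, but is not consumed
# (\w+)?                  word characters, stopping at the first space
# (?=.*</b>)              a closing </b> must appear later on the line
pattern = r'(?<=class="plus2"><b>)(\w+)?(?=.*</b>)'

# finditer yields every match; here the lookbehind succeeds at only one position.
matches = [m.group(0) for m in re.finditer(pattern, html)]
print(matches)  # ['Ellison']
```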
deliter, posted February 18, 2016

Just to let you know, assuming you have no other programming experience and are a newbie (your regex is already better than mine): try to use regex only when you really need it. This would get your text, since <b>Ellison Shoji Onizuka</b> is a child of the class "plus2":

load html("<tr><td class=\"gr\" align=\"CENTER\" valign=\"TOP\" colspan=\"2\"><font size=\"+2\" class=\"plus2\"><b>Ellison Shoji Onizuka</b></font></td></tr>")
alert($scrape attribute($element child(<class="plus2">), "innertext"))
sk8rjess, posted February 18, 2016

Maybe new in the sense of Python, but not entirely either of the above; I'm a web dev. Regex is a very powerful tool that is used in almost all languages for good reason. The example you provided would also return the full name rather than only the first name. Why use regex only when needed? Not saying you're wrong, just curious about your justification.
deliter, posted February 18, 2016

Hey, I guessed you were not new to this, as your expression is complex; I mentioned that. But really I post to show newbies the easy way rather than bog them down with unnecessary complexities. That is the point of UBot after all, and newbies benefit from these threads.

If you google "parsing HTML with regular expressions", the people who really know what they are talking about don't do it. Also, the browser already parses the HTML, and UBot gives you the tools to get what you need natively; regex just makes things complicated. Since the above code already targets the needed string, a regular expression for the first name becomes much easier:

alert($find regular expression($scrape attribute($element child(<class="plus2">), "innertext"), "^\\w+"))

I will be releasing a CSS Selector Suite of tools to the forum very shortly, as there are much easier and more effective ways of parsing HTML than regex, such as XPath, CSS selectors, and UBot's own matching tools. With my plugin the code would be:

alert($find regular expression($plugin function("myDLL.dll", "Deliter CSS Child Elements Selector", $document text, ".plus2", "TextContent"), "^\\w+"))
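The "let a parser find the element, then apply a tiny regex" idea can be sketched outside UBot with Python's standard-library `html.parser`. This is only an analogous stand-in, not UBot's $element child / $scrape attribute; the class name `plus2` comes from the thread, while `ClassTextCollector` and `inner_text_of_class` are hypothetical helper names:

```python
import re
from html.parser import HTMLParser

# Elements that never get a closing tag; skipped so the depth counter stays balanced.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextCollector(HTMLParser):
    """Hypothetical helper: collects text inside every element whose class list contains `cls`."""

    def __init__(self, cls):
        super().__init__()
        self.cls = cls
        self.depth = 0  # > 0 while inside a matching element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested child of the target element
        elif self.cls in (dict(attrs).get("class") or "").split():
            self.depth = 1   # entered a matching element

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def inner_text_of_class(html, cls):
    parser = ClassTextCollector(cls)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


html = '<tr><td class="gr" align="CENTER" valign="TOP" colspan="2"><font size="+2" class="plus2"><b>Ellison Shoji Onizuka</b></font></td></tr>'
full_name = inner_text_of_class(html, "plus2")     # "Ellison Shoji Onizuka"
first_name = re.match(r"\w+", full_name).group(0)  # "Ellison"
print(full_name, "|", first_name)
```

The regex step shrinks to `\w+` because the parser has already isolated the text; the page-specific part is reduced to the class name.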
sk8rjess, posted February 18, 2016

I wasn't taking it as an insult, just clarifying! I wasn't trying to come across as defensive. Very good to know, and thanks for adding the information. What I'm learning about UBot is that instead of having to code individual functions by hand, as I'm used to, there are already pre-made ones, as well as 1,000 different ways to approach something!
deliter, posted February 18, 2016

No, I didn't mean to come across as defensive either; I suppose I was just a bit flustered at having to justify myself.

The truth is my solution above is practically one-size-fits-all: if I need the innertext of a child of a class, the above code should always work, whereas with regex you need to write a unique expression for every attribute you want to scrape from every single website.
Code Docta (Nick C.), posted February 19, 2016

You can see this thread as well; the same should apply:

http://network.ubotstudio.com/forum/index.php/topic/19087-how-to-use-regular-expression-to-captureparse-the-string-between-2-strings/

You need to make sure you are using the .NET flavor of regex.

CD
Code Docta (Nick C.), posted February 19, 2016

This works (tested it):

alert($find regular expression("<tr><td class=\"gr\" align=\"CENTER\" valign=\"TOP\" colspan=\"2\"><font size=\"+2\" class=\"plus2\"><b>Ellison Shoji Onizuka</b></font></td></tr>","(?<=<b>).*?(?=\\s)"))
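Nick's pattern takes everything after `<b>` up to the first whitespace, so it doesn't depend on knowing the name. A Python sketch, including one caveat: with a one-word name (a hypothetical input, not from the thread) there is no whitespace for the lazy `.*?` to stop at, so nothing matches:

```python
import re

pattern = r"(?<=<b>).*?(?=\s)"  # text right after <b>, up to the first whitespace

html = '<tr><td class="gr" align="CENTER" valign="TOP" colspan="2"><font size="+2" class="plus2"><b>Ellison Shoji Onizuka</b></font></td></tr>'
first = re.search(pattern, html)
print(first.group(0))  # Ellison

# Hypothetical single-word name: no whitespace anywhere after <b>, so no match.
single = "<tr><td><b>Ellison</b></td></tr>"
print(re.search(pattern, single))  # None
```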
sk8rjess, posted February 19, 2016

Since I don't see a need to start another thread for something semi-related: I was writing an expression to grab the second word. I had it working successfully when I realized that no matter what I put in UBot, it constantly added a new line to my result. Does anyone see why? You can see I've stripped it down to a basic expression that grabs anything:

add item to list(%guestInfo,$find regular expression(#middleName,".*"),"Don\'t Delete","Global")

I can put a standard string in place of the expression finder and it works just fine.
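One possible explanation, offered as a hypothesis rather than a confirmed diagnosis (the thread itself ends up blaming a UBot glitch): `.*` can also match the empty string, so a "find all matches" pass over a value like "Shoji" yields "Shoji" and then an empty match at the end of the input. If the function returns every match joined by newlines (an assumption about UBot's behavior), that empty match appears as a trailing line break. A Python sketch, where `middle_name` is a hypothetical stand-in for #middleName:

```python
import re

middle_name = "Shoji"  # hypothetical value standing in for #middleName

# .* matches "Shoji", then matches again (empty) at the end of the string.
all_matches = re.findall(r".*", middle_name)
joined = "\n".join(all_matches)  # simulates "all matches, one per line"
print(all_matches)   # ['Shoji', '']
print(repr(joined))  # 'Shoji\n'

# .+ needs at least one character, so the empty match disappears.
plus_matches = re.findall(r".+", middle_name)
print(plus_matches)  # ['Shoji']
```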
sk8rjess, posted February 19, 2016

And Nick, thank you for adding your input! I can confirm the above DOES indeed work. I would have had to modify it slightly, as I don't actually know the provided name; I was just using it as an example. Always good to have more approaches to a solution, though!
Learjet, posted February 19, 2016

Hey Sk8, UBot is a little quirky with regex. I can get an expression perfect in EditPad and sometimes it won't work in UBot. I've found that once I have an expression working in EditPad or another tool, I can open the built-in regex editor and tweak it to UBot's liking. Just a tip, and welcome to the forum, my friend!

Peace,
LJ
sk8rjess, posted February 19, 2016

So I've learned! I can't tell you how many properly working statements I've had that don't work in UBot. I didn't know there was a built-in editor; I should have looked for it! Thanks, I'll start testing in there. I still can't figure out why it's adding a line break, though.
sk8rjess, posted February 19, 2016

This must have been a bug. I kept getting script errors (they listed all the HTML from my page, so I can't give any more info, sorry) that made me restart UBot before anything would work. After another restart, everything was functioning properly.
Code Docta (Nick C.), posted February 19, 2016

np, this should absolutely work:

set(#string,"<tr><td class=\"gr\" align=\"CENTER\" valign=\"TOP\" colspan=\"2\"><font size=\"+2\" class=\"plus2\"><b>Ellison Shoji Onizuka</b></font></td></tr>","Global")
alert($find regular expression(#srting,"(?<=<b>).*?(?=</b>)"))
clear list(%break down)
comment("the delimiter is a space")
add list to list(%break down,$list from text($find regular expression(#srting,"(?<=<b>).*?(?=</b>)")," "),"Delete","Global")
alert($list item(%break down,0))
alert($list item(%break down,1))
alert($list item(%break down,2))

Unfortunately it is not finding the regex. Some bug, but I know better: it works in Regex Hero and is .NET regex. I restarted too. Try it in yours; I tried in version 5.9.17.

CD
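The extract-then-split approach above can be cross-checked in Python to confirm the pattern itself is fine (a sketch; `string` and `break_down` simply mirror the UBot names). Note that the UBot script sets `#string` but reads `#srting`; a mismatched variable name like that would hand the regex an empty input, which would explain a "no match":

```python
import re

string = '<tr><td class="gr" align="CENTER" valign="TOP" colspan="2"><font size="+2" class="plus2"><b>Ellison Shoji Onizuka</b></font></td></tr>'

# Text strictly between <b> and </b>; the lookarounds keep the tags out of the match.
full_name = re.search(r"(?<=<b>).*?(?=</b>)", string).group(0)

# Split on a single space, mirroring $list from text(..., " ").
break_down = full_name.split(" ")
print(break_down[0])  # Ellison
print(break_down[1])  # Shoji  (the "second word" asked about earlier)
print(break_down[2])  # Onizuka
```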
sk8rjess, posted February 19, 2016 (edited)

I've already solved this one using my previously posted solution, and for me it works flawlessly; thank you, though! I'm still running into bugs with UBot. Constant crashes get old... I'd guess they are memory leaks.

(Edited February 19, 2016 by sk8rjess)
Bill, posted February 20, 2016

I renamed the variable so the names match (the script sets #string but reads #srting), and it worked fine.