UBot Underground

Am I Doing Regex In Ubot Wrong?



I can't seem to get regex working properly in UBot with an expression I know for sure is correct. What am I doing wrong?

 

Let's say I'm trying to search the following:

<tr><td class="gr" align="CENTER" valign="TOP" colspan="2"><font size="+2" class="plus2"><b>Ellison Shoji Onizuka</b></font></td></tr>

My regex, which works outside of UBot, is the following:

class="plus2"><b>(\w+).*<\/b>

In theory this will select only "Ellison". I'm using the "find regular expression" function.

I even thought the quotes might be throwing it off, so I escaped those, and it still didn't work.

Am I missing something in UBot?

Edited by sk8rjess


alert($find regular expression("<tr><td class=\"gr\" align=\"CENTER\" valign=\"TOP\" colspan=\"2\"><font size=\"+2\" class=\"plus2\"><b>Ellison Shoji Onizuka</b></font></td></tr>","class=\"plus2\"><b>.*?<\\/b>"))


alert($find regular expression("<tr><td class=\"gr\" align=\"CENTER\" valign=\"TOP\" colspan=\"2\"><font size=\"+2\" class=\"plus2\"><b>Ellison Shoji Onizuka</b></font></td></tr>","(?<=class=\"plus2\"><b>).*?(?=<\\/b>)"))

I wasn't able to get that expression to work either.

Upon some further modification, the following worked properly:

(?<=class=\"plus2\"><b>)(\w+)?(?=.*<\/b>)

Thanks for the push in the right direction, Pash!
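
One plausible reading of why the first pattern seemed to fail while the lookaround version works: a capture group only narrows what group 1 holds, not what the overall match is. The sketch below uses Python's `re` module as a stand-in (an assumption; UBot uses .NET regex, though these particular constructs behave the same way in both). It also assumes, without confirmation from this thread, that `$find regular expression` returns the whole match rather than group 1.

```python
import re

html = ('<tr><td class="gr" align="CENTER" valign="TOP" colspan="2">'
        '<font size="+2" class="plus2"><b>Ellison Shoji Onizuka</b></font></td></tr>')

# Original pattern: the first name is captured in group 1,
# but the overall match still spans from class="plus2" to </b>.
grouped = re.search(r'class="plus2"><b>(\w+).*</b>', html)
whole_match = grouped.group(0)
first_group = grouped.group(1)

# Lookaround version: the surrounding markup is asserted but not consumed,
# so the overall match is the first name by itself.
lookaround = re.search(r'(?<=class="plus2"><b>)(\w+)?(?=.*</b>)', html)
lookaround_match = lookaround.group(0)

print(whole_match)
print(first_group)
print(lookaround_match)
```

If the tool only hands back the overall match, moving the markup into a lookbehind and lookahead is the usual way to get just the name.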

Edited by sk8rjess

Just to let you know, in case you have no other programming experience and are a newbie (though your regex is already better than mine):

Try to use regex only when you really need it. Since <b>Ellison Shoji Onizuka</b> is a child of the element with class "plus2", this would get your text:

load html("<tr><td class=\"gr\" align=\"CENTER\" valign=\"TOP\" colspan=\"2\"><font size=\"+2\" class=\"plus2\"><b>Ellison Shoji Onizuka</b></font></td></tr>")
alert($scrape attribute($element child(<class="plus2">), "innertext"))


Maybe new to the sense of Python, but not entirely either of the above. I'm a web dev.
Regex is a very powerful tool that is used in almost all languages for good reason. The example you provided would also return the full name rather than only the first name.

Why only use regex when needed? Not saying you're wrong, just curious about your justification.


Hey,

I guessed you were not new to this, as your expression is complex, and I mentioned that. But really I post to point newbies to the easy way rather than have them bogged down with unnecessary complexities. That is the point of UBot after all, and newbies benefit from these threads.

If you google parsing HTML with regular expressions, you'll find that the people who really know what they are talking about don't do it. The browser already parses the HTML, and UBot gives you the tools to get what you need natively; regex just makes things complicated.

Since the above code targets the needed string, a regular expression for the first name becomes much easier:

 alert($find regular expression($scrape attribute($element child(<class="plus2">), "innertext"), "^\\w+"))
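
The same "parse first, then apply a tiny regex" idea can be sketched outside UBot. The Python below is only an illustration under stated assumptions: `ClassTextExtractor` and `inner_text` are hypothetical helpers written for this example (they are not UBot functions), and the depth counting is deliberately simple, so void tags such as `<br>` inside the target element would confuse it.

```python
import re
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text inside the first element carrying a given class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # nesting depth inside the target element (0 = outside)
        self.found = False  # True once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.found:
            return
        if self.depth:
            self.depth += 1
        elif self.class_name in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            self.found = self.depth == 0

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def inner_text(html, class_name):
    """Hypothetical helper: text content of the first element with class_name."""
    parser = ClassTextExtractor(class_name)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


html = ('<tr><td class="gr" align="CENTER" valign="TOP" colspan="2">'
        '<font size="+2" class="plus2"><b>Ellison Shoji Onizuka</b></font></td></tr>')

full_name = inner_text(html, "plus2")
# Once the markup is gone, the first word is all the regex has to find.
first_name = re.search(r"^\w+", full_name).group(0)

print(full_name)
print(first_name)
```

The regex no longer needs to know anything about the surrounding tags, which is the point being made above.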

 

I will be releasing a CSS Selector Suite of tools to the forum very shortly, as there are much easier and more effective ways of parsing HTML than regex, such as XPath, CSS selectors, and UBot's matching tools.

 

With my plugin the code would be:

 alert($find regular expression($plugin function("myDLL.dll", "Deliter CSS Child Elements Selector", $document text, ".plus2", "TextContent"), "^\\w+"))


I wasn't taking it as an insult, just clarifying! Wasn't trying to come across as defensive. :)

Very good to know! Thanks for adding the information. 

What I'm learning about ubot is that instead of me having to code individual functions by hand as I'm used to, there are already pre-made ones as well as 1,000 different ways to approach something!


No, I didn't mean to come across as defensive either. I suppose I was just a bit flustered at having to justify myself.

The truth is my solution above is practically one size fits all: if I need the innertext of a child of a class, the above code should always work, whereas with regex you need to write a unique expression for every attribute you want scraped from every single website.


You can see this thread as well; the same should apply:

http://network.ubotstudio.com/forum/index.php/topic/19087-how-to-use-regular-expression-to-captureparse-the-string-between-2-strings/

You need to make sure you are using the .NET flavor of regex.

 

CD


this works

 

alert($find regular expression("<tr><td class=\"gr\" align=\"CENTER\" valign=\"TOP\" colspan=\"2\"><font size=\"+2\" class=\"plus2\"><b>Ellison Shoji Onizuka</b></font></td></tr>","(?<=<b>).*?(?=\\s)"))

 

tested it


Since I don't see a need to start another thread for something semi-related: I was writing an expression to grab the second word. I had it working when I realized that no matter what I put in UBot, it constantly added a new line to my result. Does anyone see why? As you can see, I've stripped it down to a basic expression that grabs anything.

 

add item to list(%guestInfo,$find regular expression(#middleName,".*"),"Don\'t Delete","Global")
I can put a standard string in place of the expression finder and it works just fine. 
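
One possible explanation, offered as a guess rather than a confirmed diagnosis: `.*` is allowed to match zero characters, so a "find all matches" search reports the whole string and then one more empty match at the very end. The Python sketch below shows that behavior with the `re` module (standing in for .NET regex). The value `"Shoji"` is a hypothetical stand-in for `#middleName`, and joining the matches with line breaks is an assumption about how the results get combined.

```python
import re

middle_name = "Shoji"  # hypothetical value standing in for #middleName

# ".*" can match zero characters, so after consuming the whole string
# it matches once more, as an empty string, at the very end.
star_matches = re.findall(r".*", middle_name)

# ".+" needs at least one character, so the empty trailing match disappears.
plus_matches = re.findall(r".+", middle_name)

# Assumption: the matches are joined with line breaks, which would turn
# the extra empty match into a trailing newline.
joined = "\n".join(star_matches)

print(star_matches)
print(plus_matches)
print(repr(joined))
```

If that is what is happening, swapping `.*` for `.+` (or any pattern that cannot match an empty string) would be worth trying.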

And Nick, thank you for adding your input! I can confirm the above DOES indeed work. I would have had to modify it slightly, as I don't actually know the name in advance; I was just using that one as an example :) Always good to have more approaches to a solution, though!


Hey Sk8,

 

UBot is a little quirky with regex. I can get an expression perfect in Edit Pad and sometimes it won't work in UBot. I've found that once I have an expression working in Edit Pad or another tool, I can open up the built-in regex editor and tweak it to UBot's liking.

 

Just a tip and welcome to the forum my friend!

 

Peace,

LJ


So I've learned! I can't tell you how many properly working expressions I've had that don't work in UBot. I didn't know there was a built-in editor; I should have looked for it! Thanks, I'll start testing in there. I still can't figure out why it's adding a line break, though.


This must have been a bug. I kept getting script errors (they listed all the HTML from my page, so I can't give any more info, sorry), which made me restart UBot before anything would work. After another restart everything was functioning properly.


No problem.

This should absolutely work:

 

set(#string,"<tr><td class=\"gr\" align=\"CENTER\" valign=\"TOP\" colspan=\"2\"><font size=\"+2\" class=\"plus2\"><b>Ellison Shoji Onizuka</b></font></td></tr>","Global")
alert($find regular expression(#string,"(?<=<b>).*?(?=</b>)"))
clear list(%break down)
comment("the delimiter is a space")
add list to list(%break down,$list from text($find regular expression(#string,"(?<=<b>).*?(?=</b>)")," "),"Delete","Global")
alert($list item(%break down,0))
alert($list item(%break down,1))
alert($list item(%break down,2))
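
For comparison, here is the same "grab everything between the tags, then split on spaces" approach as a Python sketch. Python's `re` is only a stand-in for .NET regex here (an assumption, though this lookbehind and lookahead behave the same in both), so it says something about the pattern and nothing about UBot itself.

```python
import re

html = ('<tr><td class="gr" align="CENTER" valign="TOP" colspan="2">'
        '<font size="+2" class="plus2"><b>Ellison Shoji Onizuka</b></font></td></tr>')

# Match only the text between <b> and </b>; the tags are asserted, not consumed.
full_name = re.search(r"(?<=<b>).*?(?=</b>)", html).group(0)

# The delimiter is a space, as in the list-building step above.
parts = full_name.split(" ")

print(full_name)
print(parts)
```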

 

Unfortunately, it is not finding the regex. Some bug, but I know better: the expression works in Regex Hero, which is .NET regex.

I restarted too. Try it in yours; I tried in version 5.9.17.

 

 

CD


I've already solved this one by using my previously posted solution and for me it works flawlessly, thank you though!

 

I'm still running into bugs with UBot. Constant crashes get old. I'd guess they are memory leaks.

Edited by sk8rjess