UBot Underground

Regex Checks Out In Rubular And Regex101 But Not In Ubot :(



Hello, I'm just starting my first bot with UBot Standard 5.9.43. I'm trying to use a regular expression and have run into a problem. I'm hoping someone with experience can shed some light on a solution.

I have a regular expression that should work, but it returns no results :(

Here is the expression. It is meant to capture the date/time in the cell that follows a three-digit cell:
(?<=<td>\d{3}<\/td>\n<td>)([0-9]{1,2}\/[0-9]{1,2}\/[0-9]{1,4}\s[0-9]{1,2}:[0-9]{1,2}\s[A-Z]{1,2})

and this is the sample HTML I'm running it against:
<tr class="odd">
<td>108</td>
<td>02/27/2016 12:00 AM</td>
<td>14939</td>
<td>Local Review</td>
<td>REVIEW</td>
<td>2482237</td>
<td></td>
<td>1230544</td>
<td>1</td>
<td>Y</td>
<td>2331137</td></tr>
<tr class="even">
<td>876</td>
<td>03/14/2016 12:00 AM</td>
<td>14937</td>
<td>Performance Review</td>
<td>REVIEW</td>
<td>2482237</td>
<td></td>
<td>1230544</td>
<td>1</td>
<td>Y</td>
<td>2347798</td></tr>

The expression works in the regex checking tool https://regex101.com, but NOT in the UBot Regex Editor, and it produces zero results when I run a UBot script :(
https://www.screencast.com/t/E8ynQw843a. It also works in the Rubular regex checking tool.
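For reference, here is a minimal Python sketch that reproduces what regex101 and Rubular report for this pattern and sample. Python's `re` is a different engine from the one UBot uses, so this only confirms that the pattern itself is valid and matches text with plain `\n` line endings; `find_dates` is a hypothetical helper name.

```python
import re

# Sample rows copied from the post, with plain "\n" line endings
# (the same text that was pasted into regex101 / Rubular).
SAMPLE = """<tr class="odd">
<td>108</td>
<td>02/27/2016 12:00 AM</td>
<td>14939</td>
<td>Local Review</td>
<td>REVIEW</td>
<td>2482237</td>
<td></td>
<td>1230544</td>
<td>1</td>
<td>Y</td>
<td>2331137</td></tr>
<tr class="even">
<td>876</td>
<td>03/14/2016 12:00 AM</td>
<td>14937</td>
<td>Performance Review</td>
<td>REVIEW</td>
<td>2482237</td>
<td></td>
<td>1230544</td>
<td>1</td>
<td>Y</td>
<td>2347798</td></tr>"""

# The pattern from the post, unchanged. (?<=...) is a positive lookbehind:
# it requires "<td>NNN</td>\n<td>" immediately before the date but does not
# include that text in the match.
PATTERN = (
    r"(?<=<td>\d{3}<\/td>\n<td>)"
    r"([0-9]{1,2}\/[0-9]{1,2}\/[0-9]{1,4}\s[0-9]{1,2}:[0-9]{1,2}\s[A-Z]{1,2})"
)


def find_dates(text):
    """Return every date/time string preceded by a three-digit <td> cell."""
    return re.findall(PATTERN, text)


print(find_dates(SAMPLE))
# ['02/27/2016 12:00 AM', '03/14/2016 12:00 AM']
```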

 

Oddly, if I remove the positive lookbehind (the "(?<=...)" part), it works in the UBot editor. But it seems it should work WITH the lookbehind, since the syntax is valid, as regex101.com
(https://www.screencast.com/t/vTUiLDN4mt) and Rubular confirm.
>>using this regex: ([0-9]{1,2}\/[0-9]{1,2}\/[0-9]{1,4}\s[0-9]{1,2}:[0-9]{1,2}\s[A-Z]{1,2})
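One hypothesis consistent with this behavior (an assumption, not confirmed in the thread): the lookbehind hard-codes a single `\n` between `</td>` and `<td>`. If the text handed to the regex uses Windows `\r\n` line endings, for example because it was pasted into a Windows text box, the lookbehind can never match, while the version without it still finds both dates. A minimal Python sketch of that effect, using a trimmed copy of the sample:

```python
import re

# Trimmed copy of the sample: the first two cells of each row only.
ROWS_LF = (
    '<tr class="odd">\n<td>108</td>\n<td>02/27/2016 12:00 AM</td>\n'
    '<tr class="even">\n<td>876</td>\n<td>03/14/2016 12:00 AM</td>\n'
)

# Assumption for illustration: the same text with Windows "\r\n" line endings.
ROWS_CRLF = ROWS_LF.replace("\n", "\r\n")

DATE = r"[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{1,4}\s[0-9]{1,2}:[0-9]{1,2}\s[A-Z]{1,2}"

# The post's pattern with and without the lookbehind that demands "\n".
WITH_LOOKBEHIND = r"(?<=<td>\d{3}<\/td>\n<td>)(" + DATE + ")"
WITHOUT_LOOKBEHIND = "(" + DATE + ")"

for label, text in (("LF", ROWS_LF), ("CRLF", ROWS_CRLF)):
    with_lb = len(re.findall(WITH_LOOKBEHIND, text))
    without_lb = len(re.findall(WITHOUT_LOOKBEHIND, text))
    print(f"{label}: with lookbehind={with_lb}, without={without_lb}")
# LF: with lookbehind=2, without=2
# CRLF: with lookbehind=0, without=2
```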

Can anyone please help? Or is this a bug that could be addressed?

Thanks very much!
Chris

Edited by christojuan

Works fine for me here in UBot 5.9.37:

alert($find regular expression("<tr class=\"odd\">
<td>108</td>
<td>02/27/2016 12:00 AM</td>
<td>14939</td>
<td>Local Review</td>
<td>REVIEW</td>
<td>2482237</td>
<td></td>
<td>1230544</td>
<td>1</td>
<td>Y</td>
<td>2331137</td></tr>
<tr class=\"even\">
<td>876</td>
<td>03/14/2016 12:00 AM</td>
<td>14937</td>
<td>Performance Review</td>
<td>REVIEW</td>
<td>2482237</td>
<td></td>
<td>1230544</td>
<td>1</td>
<td>Y</td>
<td>2347798</td></tr>","(?<=<td>\\d\{3\}<\\/td>\\n<td>)([0-9]\{1,2\}\\/[0-9]\{1,2\}\\/[0-9]\{1,4\}\\s[0-9]\{1,2\}:[0-9]\{1,2\}\\s[A-Z]\{1,2\})"))

What version?

Did you report this in the tracker?

 

Regards,
Nick


Some users are reporting inconsistencies with regex in the latest UBot, 5.9.44.

So if you can reproduce this consistently, it is very important to demonstrate it in the bug tracker.

 

You must contact support to get access to the tracker.

 

Regards,

CD


Hey Nick,

Thanks VERY much for the quick response and test.

I updated the post above with some better screenshots and noted that my version is 5.9.43. I also tried it on 5.9.37 and reproduced the issue.

I ran your test in 5.9.43 and it worked perfectly (https://www.screencast.com/t/shj14kHUnFx2). That gave me hope, but it also left me puzzled about why my script fails.

 

So I tried putting the exact same HTML into an .html file and used the navigate command and the $document text function (https://www.screencast.com/t/W2ELE7RW6IPu).

The result was empty :(

https://www.screencast.com/t/Op24gHLDiSj
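One way to narrow this down: the regex only sees whatever `$document text` returns, which need not match the .html file character for character. Standards-compliant HTML parsers ignore `<tr>` and `<td>` tags that appear outside a `<table>`, and the lookbehind also depends on the exact line endings. A small Python sketch (the `describe_text` helper is hypothetical, not part of UBot), assuming you save the `$document text` output to a text file and inspect it:

```python
# Hypothetical helper (not part of UBot): summarize features of the text the
# regex actually receives, e.g. after saving $document text to a file.
def describe_text(text):
    lowered = text.lower()
    crlf = text.count("\r\n")
    return {
        "crlf_line_endings": crlf,                   # Windows-style "\r\n"
        "lf_only_line_endings": text.count("\n") - crlf,
        "td_open_tags": lowered.count("<td"),        # 0 means no <td> tags survived
        "tbody_open_tags": lowered.count("<tbody"),  # parsers insert <tbody> in tables
    }


print(describe_text("<table><tbody><tr>\r\n<td>108</td>\n</tr></tbody></table>"))
# {'crlf_line_endings': 1, 'lf_only_line_endings': 1, 'td_open_tags': 1, 'tbody_open_tags': 1}
```

When reading the saved file, open it with `open(path, newline="")` so Python does not translate `\r\n` into `\n` on the way in.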

 

I did submit a tracker issue, http://tracker.ubotstudio.com/issues/1184, referencing this post.

 

I'd appreciate any additional insights you might have.

 

Thanks again!

Chris


I think the HTML may not be loading the way you think it is. I got similar results using just the load html command, but I fixed it by wrapping the rows in table tags, so maybe give that a try:
 

<table>
<tr class="odd">
<td>108</td>
<td>02/27/2016 12:00 AM</td>
<td>14939</td>
<td>Local Review</td>
<td>REVIEW</td>
<td>2482237</td>
<td></td>
<td>1230544</td>
<td>1</td>
<td>Y</td>
<td>2331137</td></tr>
<tr class="even">
<td>876</td>
<td>03/14/2016 12:00 AM</td>
<td>14937</td>
<td>Performance Review</td>
<td>REVIEW</td>
<td>2482237</td>
<td></td>
<td>1230544</td>
<td>1</td>
<td>Y</td>
<td>2347798</td></tr>
</table>

Hi - thanks for looking at that.
I am actually scraping a page on a password-protected corporate intranet. For testing I just did a "Save As" and have been working with the attached page. Since I cannot control what the actual page source looks like, I think I'm still stuck :(
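Since the page source can't be changed, one option is to make the pattern tolerate whatever sits between the cells. A minimal Python sketch of that idea, offered as an assumption-based alternative rather than a confirmed fix: replace the lookbehind with an ordinary match plus a capture group, allow any whitespace (`\s*`) between `</td>` and the next `<td>`, and allow attributes on the `<td>` tags. How UBot's `$find regular expression` treats capture groups isn't covered in this thread, so in UBot you might need to strip the `<td>` tags from the result afterwards.

```python
import re

# Hypothetical alternative to the lookbehind: match the three-digit cell,
# allow any whitespace (LF, CRLF, tabs, spaces) before the next cell, and
# capture only the date. <td[^>]*> also tolerates attributes on the cell.
DATE_AFTER_ID = re.compile(
    r"<td[^>]*>\d{3}</td>\s*<td[^>]*>"
    r"([0-9]{1,2}/[0-9]{1,2}/[0-9]{1,4}\s[0-9]{1,2}:[0-9]{1,2}\s[A-Z]{1,2})"
    r"</td>"
)


def find_dates(text):
    """Return the captured date strings; findall yields group 1 only."""
    return DATE_AFTER_ID.findall(text)


lf_row = "<td>108</td>\n<td>02/27/2016 12:00 AM</td>"
crlf_indented_row = '<td>876</td>\r\n\t<td class="date">03/14/2016 12:00 AM</td>'
print(find_dates(lf_row + "\n" + crlf_indented_row))
# ['02/27/2016 12:00 AM', '03/14/2016 12:00 AM']
```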

 

Do you possibly have any other thoughts? I'm so lost on this :(

 

 

I've attached an example that I have been testing with (a zip with the .html and supporting files, so it displays as it does on the intranet).

 

btw, your Traffic Titan software looks awesome. I have this product, but it's a bit challenging to work with. I'd love to dig into your source once I get the hang of UBot.

 

Thanks for any help/guidance you can provide.

 

Chris


It looks like the data is already in a table, so you can just scrape it with the code below and then loop through the result to get whatever data you need:

scrape table(<class="displayTag report">,&report)
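For readers without `scrape table`, here is a rough Python sketch of the same idea (a hypothetical stand-in, not UBot's implementation): walk the `<tr>`/`<td>` tags with the standard library's `html.parser` and collect each row's cell text, after which the date is simply the second cell of each row. No regex is involved, so the line endings and whitespace between cells don't matter. The class and function names are illustrative.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row.

    A rough stand-in for a "scrape table" step, not UBot's implementation.
    Assumes every <td> has a closing </td>, as in the sample.
    Header rows built from <th> cells come out as empty lists.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per <tr>
        self._cell = None  # text pieces of the <td> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._cell = []

    def handle_data(self, data):
        # Whitespace between tags arrives here too; keep it only inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def scrape_rows(html):
    """Return a list of rows, each a list of stripped cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


SAMPLE_TABLE = (
    '<table>\r\n<tr class="odd">\r\n<td>108</td>\r\n<td>02/27/2016 12:00 AM</td>'
    '\r\n<td></td></tr>\r\n<tr class="even">\r\n<td>876</td>\r\n'
    '<td>03/14/2016 12:00 AM</td></tr>\r\n</table>'
)
rows = scrape_rows(SAMPLE_TABLE)
print([row[1] for row in rows if len(row) > 1])
# ['02/27/2016 12:00 AM', '03/14/2016 12:00 AM']
```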

Thank you! Unfortunately I only have the Standard version, so that option isn't available to me. I'm 99% sure I'll be upgrading at some point, but first I need to prove the value (to myself) by building some bots.

 

I do still think there is some kind of regex bug related to the lookbehind assertion. If I can get that solved, I can still successfully scrape this table.
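If you want to keep a lookbehind, one option is to make it tolerate either line ending. .NET's regex engine supports variable-length lookbehind, so if UBot hands patterns to .NET (an assumption; the thread doesn't say which engine UBot uses), something like `(?<=<td>\d{3}</td>\s*<td>)` may be worth trying, escaped for a UBot string the way Nick's example escapes it. Python's `re` only allows fixed-width lookbehind, so this minimal sketch instead offers two fixed-width lookbehinds, one for `\n` and one for `\r\n`:

```python
import re

DATE = r"[0-9]{1,2}/[0-9]{1,2}/[0-9]{1,4}\s[0-9]{1,2}:[0-9]{1,2}\s[A-Z]{1,2}"

# Assumption for illustration: cells are separated by exactly "\n" or "\r\n".
# Python's re requires each lookbehind to be fixed width, so the two line
# endings get two separate lookbehinds joined by an alternation.
EITHER_LINE_ENDING = re.compile(
    r"(?:(?<=<td>\d{3}</td>\n<td>)|(?<=<td>\d{3}</td>\r\n<td>))(" + DATE + ")"
)


def find_dates(text):
    """Return dates that follow a three-digit cell and an LF or CRLF break."""
    return EITHER_LINE_ENDING.findall(text)


print(find_dates(
    "<td>108</td>\n<td>02/27/2016 12:00 AM</td>\n"
    "<td>876</td>\r\n<td>03/14/2016 12:00 AM</td>"
))
# ['02/27/2016 12:00 AM', '03/14/2016 12:00 AM']
```

Unlike a `\s*` version, this still fails if the page indents its cells with tabs or spaces, so it is only a fit when line endings are the sole difference.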

 

Input from both you and Coda Nick has been greatly appreciated.

