UBot Underground

Stupid, Stupid Regex...



I humbly come to you all asking for your assistance with this stinking regex (which is what I THINK I need to use).

 

I have this content that I scrape:

 

We discover apples most below and XfgtsX new information and trot yesterday to wolves.

Scouring the XgyrtX world for tasty and dog, we XiretX give fellow.

This scurry we floor - come take one XhhueoX with us.

 

 

 

I have a bit of script that will capture, save and replace the text UP TO the first 'X' (We discover apples most below and ) with ABCDEF.

 

        set(#aTemp,$find regular expression(#content,"^([\\s\\S]*?)(?=X)"),"Global")
        add item to list(%aContent,#aTemp,"Don\'t Delete","Global")
        set(#content,$replace regular expression(#content,"^([\\s\\S]*?)(?=X)","ABCDEF"),"Global")
        set(#aCount,1,"Global")
        set(#count,2,"Global")
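
For anyone following along outside UBot, here is a minimal sketch of what that first-segment pattern does. It uses Python's `re` module as a stand-in (an assumption for illustration; UBot uses the .NET engine, though these particular constructs are common to both), and the variable names are made up for this example.

```python
import re

TEXT = (
    "We discover apples most below and XfgtsX new information and trot yesterday to wolves.\n\n"
    "Scouring the XgyrtX world for tasty and dog, we XiretX give fellow.\n\n"
    "This scurry we floor - come take one XhhueoX with us."
)

# [\s\S]*? lazily matches any characters, newlines included;
# (?=X) is a lookahead, so the match stops just before the first X
# without consuming it.
FIRST_SEGMENT = re.compile(r"^([\s\S]*?)(?=X)")

# The text that would be saved to the list.
first = FIRST_SEGMENT.search(TEXT).group(1)

# The content after the first segment has been swapped for the placeholder.
rest = FIRST_SEGMENT.sub("ABCDEF", TEXT, count=1)
```

Because the pattern is anchored with `^`, it can only ever match at the very start of the text, which is why a different pattern is needed for the later segments.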

 

The next thing I need to do is repeat it with everything BETWEEN the next 2 'X' and the next 2 and so on. I would end up with, in the sample above:

 

ABCDEFXfgtsXABCDEFXgyrtXABCDEFXiretXABCDEFXhhueoXABCDEF

 

(Later I'll be replacing the ABCDEF with a currently unknown value and the X's with curly brackets)

 

The regex that I have been trying to use is (?:[\s\S]*?X+){#count}([\s\S]*?)(?=X) where #count = the nth occurrence of X.

 

Regardless of how I try to escape characters, I simply can't get it to work properly - despite it doing exactly what I want it to at regex101.com

 

Here's the current code:

 

loop while($comparison(#aTemp,"!=",$nothing)) {
    set(#aTemp,$find regular expression(#content,"(?:[\\s\\S]*?X+)\{{#count}\}([\\s\\S]*?)(?=X)"),"Global")
    add item to list(%aContent,#aTemp,"Don\'t Delete","Global")
    set(#content,$replace regular expression(#content,"(?:[\\s\\S]*?X+)\{{#count}\}([\\s\\S]*?)(?=X)","ABCDEF"),"Global")
    set(#aCount,$add(#aCount,1),"Global")
    set(#count,$add(#count,2),"Global")
}
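
One thing worth checking, separate from the escaping: a regex replace normally swaps out the whole match, not just the captured group. The sketch below uses Python's `re` as a stand-in for UBot's .NET engine (an assumption; `nth_gap` is a hypothetical helper) to show that the non-capturing prefix `(?:[\s\S]*?X+){n}` is still part of the match, so the match starts at the beginning of the text.

```python
import re

TEXT = (
    "We discover apples most below and XfgtsX new information and trot yesterday to wolves.\n\n"
    "Scouring the XgyrtX world for tasty and dog, we XiretX give fellow.\n\n"
    "This scurry we floor - come take one XhhueoX with us."
)

def nth_gap(text, count):
    # Skip `count` runs of X, then lazily capture everything up to the next X.
    pattern = r"(?:[\s\S]*?X+){%d}([\s\S]*?)(?=X)" % count
    return re.search(pattern, text)

m = nth_gap(TEXT, 2)
captured = m.group(1)    # only the text between the 2nd and 3rd X
match_start = m.start()  # where the whole match begins

# Replace only the captured group by slicing around it,
# leaving the skipped prefix untouched.
replaced = TEXT[:m.start(1)] + "ABCDEF" + TEXT[m.end(1):]
```

If UBot's find and replace functions operate on the whole match the way Python's do, the find would return the prefix along with the gap and the replace would wipe out everything from the start of the content. Replacing only the group, as in the last line, avoids that.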

 

PLEASE help!!

 

Thanks!

Edited by jeepinjeff

Hi,

 

This is one approach...

 

This is not a solution for all cases, but it will point you in the right direction.

 

set(#text,"We discover apples most below and XfgtsX new information and trot yesterday to wolves.

Scouring the XgyrtX world for tasty and dog, we XiretX give fellow.

This scurry we floor - come take one XhhueoX with us.","Global")
clear list(%split)
add list to list(%split,$list from text($find regular expression(#text,"(X.*?X)"),$new line),"Delete","Global")
clear list(%replsce X)
add list to list(%replsce X,$list from text($replace(%split,"X",$nothing),$new line),"Delete","Global")
clear list(%add curlies)
set(#OPEN CURLY,"\{","Global")
set(#CLOSED CURLY,"\}","Global")
loop($list total(%replsce X)) {
    add item to list(%add curlies,"ABCDEF{#OPEN CURLY}{$next list item(%replsce X)}{#CLOSED CURLY}","Don\'t Delete","Global")
}
add item to list(%add curlies,"ABCDEF","Don\'t Delete","Global")
set(#string,$text from list(%add curlies,""),"Global")
alert($text from list(%add curlies,""))
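
The same steps can be sketched in Python for comparison (a stand-in for illustration, not UBot code; the variable names are made up): find every X...X token, strip the X markers, and wrap each name in curly brackets after the placeholder.

```python
import re

TEXT = (
    "We discover apples most below and XfgtsX new information and trot yesterday to wolves.\n\n"
    "Scouring the XgyrtX world for tasty and dog, we XiretX give fellow.\n\n"
    "This scurry we floor - come take one XhhueoX with us."
)

# Grab each X...X token, mirroring the (X.*?X) pattern in the script.
tokens = re.findall(r"X.*?X", TEXT)

# Strip the X markers to leave the bare names.
names = [token.replace("X", "") for token in tokens]

# Placeholder before each token, plus one trailing placeholder.
result = "".join("ABCDEF{" + name + "}" for name in names) + "ABCDEF"
```

Note that this throws away the text between the tokens, which is fine only if that text does not need to be kept.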

 

That should give you more ideas.

 

Regex is badass, but it's not the solution for everything.

UBot regex is .NET style and works one line at a time.

 

I use this when it gets tough for me.

 

http://regexhero.net/tester/

 

Hope this helps,

 

CD

remove XdfsX.ubot


Nick,

 

Thanks, man. Being new to this stuff, it took me a bit to navigate my way through your suggestion. One thing that I forgot to explain in the original post is that the values that AREN'T X...X (e.g. "We discover apples most below and_" OR "_world for tasty and dog, we_") need to be saved so they can be re-associated with ABCDEF, in order, at a later time. So, I'm working on using your script: after extracting the X...X values, I replace them with a pipe and then add the content to a list, delimited by the pipe.
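
A minimal sketch of that pipe idea, written in Python as a stand-in for the UBot steps (an assumption for illustration; it also assumes the scraped text never contains a pipe or a stray capital X of its own):

```python
import re

TEXT = (
    "We discover apples most below and XfgtsX new information and trot yesterday to wolves.\n\n"
    "Scouring the XgyrtX world for tasty and dog, we XiretX give fellow.\n\n"
    "This scurry we floor - come take one XhhueoX with us."
)

TOKEN = r"X[^X]*X"

# Step 1: pull out the X...X values, in order.
tokens = re.findall(TOKEN, TEXT)

# Step 2: swap each X...X value for a pipe.
piped = re.sub(TOKEN, "|", TEXT)

# Step 3: split on the pipe to get the in-between text, in order.
fillers = piped.split("|")

# The two lists interleave back into the original text.
rebuilt = "".join(f + t for f, t in zip(fillers, tokens + [""]))
```

Since both lists keep their order, filler number n always sits just before token number n, which is what allows the pieces to be re-associated later.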

 

For my first programming/coding/Ubot project, I may have bitten off more than I can chew - I'm already dealing with php and SQL scripts as well as this "F-word-from-a-sweet-old-lady"-inducing regex.  It seems SO useful, but so insanely complicated and fickle.

 

Something I don't get is how I can run the same expression on different .NET regex testers and end up with different results. As my father would have said, "Enough to piss off the Pope!"

 

Thanks again!


Maybe try splitting on X:

 

set(#text,"We discover apples most below and XfgtsX new information and trot yesterday to wolves.

Scouring the XgyrtX world for tasty and dog, we XiretX give fellow.

This scurry we floor - come take one XhhueoX with us.","Global")
clear list(%cc)
add list to list(%cc,$list from text(#text,"X"),"Delete","Global")
alert($list item(%cc,0))
alert($list item(%cc,1))

 

Then pick out what you want with list item.

Without knowing the whole story, it is hard to understand what you really need.

Feel free to PM me.
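
To show why splitting on X is enough here, this is a small Python sketch (a stand-in, not UBot code, and assuming X appears only as a marker): after the split, the pieces alternate between in-between text and token names.

```python
TEXT = (
    "We discover apples most below and XfgtsX new information and trot yesterday to wolves.\n\n"
    "Scouring the XgyrtX world for tasty and dog, we XiretX give fellow.\n\n"
    "This scurry we floor - come take one XhhueoX with us."
)

parts = TEXT.split("X")

# Even positions (0, 2, 4, ...) hold the text between tokens.
fillers = parts[0::2]

# Odd positions (1, 3, 5, ...) hold the token names without their X markers.
tokens = parts[1::2]
```

With the sample text that gives nine pieces: five chunks of in-between text and four token names.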

CD


I'm trying to read between the lines a little here, so hopefully this isn't far off, but I think what you want is to replace these occurrences of X (insert randomness here) X with spintax?
 
So the output you may want would look like this:

We discover apples most below and {forge|discover|embrace} new information and trot yesterday to wolves.

Scouring the {cruel|nasty|beautiful|amazing} world for tasty and dog, we {humbly|may} give fellow.

This scurry we floor - come take one {sip|last drink|bite} with us.

Let me know if I'm on the right track here, because if so I can probably help. Ignore the colorization of the words above; it's just because I wrapped code tags around it.
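
If that is the goal, a rough sketch of the replacement could look like this. It is in Python rather than UBot, and the `SPINTAX` lookup table is hypothetical: the thread never says which token maps to which options, so the pairing below is only a guess based on the sample output.

```python
import re

TEXT = (
    "We discover apples most below and XfgtsX new information and trot yesterday to wolves.\n\n"
    "Scouring the XgyrtX world for tasty and dog, we XiretX give fellow.\n\n"
    "This scurry we floor - come take one XhhueoX with us."
)

# Hypothetical mapping from token name to spintax options (an assumption).
SPINTAX = {
    "fgts": ["forge", "discover", "embrace"],
    "gyrt": ["cruel", "nasty", "beautiful", "amazing"],
    "iret": ["humbly", "may"],
    "hhueo": ["sip", "last drink", "bite"],
}

def to_spintax(match):
    # Look up the name between the X markers and build {a|b|c}.
    options = SPINTAX[match.group(1)]
    return "{" + "|".join(options) + "}"

spun = re.sub(r"X([^X]*)X", to_spintax, TEXT)
```

A token missing from the table would raise a `KeyError` here, so a real script would want a fallback for unknown tokens.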
