jeepinjeff - Posted February 11, 2016 (edited)

I humbly come to you all asking for your assistance with this stinking regex (which is what I THINK I need to use).

I have this content that I scrape:

We discover apples most below and XfgtsX new information and trot yesterday to wolves.Scouring the XgyrtX world for tasty and dog, we XiretX give fellow.This scurry we floor - come take one XhhueoX with us.

I have a bit of script that will capture, save and replace the text UP TO the first 'X' (We discover apples most below and ) with ABCDEF.

set(#aTemp, $find regular expression(#content, "^([\\s\\S]*?)(?=X)"), "Global")
add item to list(%aContent, #aTemp, "Don\'t Delete", "Global")
set(#content, $replace regular expression(#content, "^([\\s\\S]*?)(?=X)", "ABCDEF"), "Global")
set(#aCount, 1, "Global")
set(#count, 2, "Global")

The next thing I need to do is repeat it with everything BETWEEN the next 2 'X' and the next 2 and so on. I would end up with, in the sample above:

ABCDEFXfgtsXABCDEFXgyrtXABCDEFXiretXABCDEFXhhueoXABCDEF

(Later I'll be replacing the ABCDEF with a currently unknown value and the X's with curly brackets.)

The regex that I have been trying to use is

(?:[\s\S]*?X+){#count}([\s\S]*?)(?=X)

where #count = the nth occurrence of X. Regardless of how I try to escape characters, I simply can't get it to work properly, despite it doing exactly what I want it to at regex101.com.

Here's the current code:

loop while($comparison(#aTemp, "!=", $nothing)) {
    set(#aTemp, $find regular expression(#content, "(?:[\\s\\S]*?X+)\{{#count}\}([\\s\\S]*?)(?=X)"), "Global")
    add item to list(%aContent, #aTemp, "Don\'t Delete", "Global")
    set(#content, $replace regular expression(#content, "(?:[\\s\\S]*?X+)\{{#count}\}([\\s\\S]*?)(?=X)", "ABCDEF"), "Global")
    set(#aCount, $add(#aCount, 1), "Global")
    set(#count, $add(#count, 2), "Global")
}

PLEASE help!! Thanks!

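A side note on the loop above: one possible reason it behaves differently from regex101 is that a regex replace normally swaps out the whole match, while regex101 highlights capture group 1 separately. The Python sketch below shows the difference using the poster's pattern on a shortened sample. Whether UBot's $replace regular expression works the same way is an assumption, and the names `sample`, `pattern`, `whole`, `group1` and `replaced` are made up for illustration.

```python
import re

# Shortened stand-in for the scraped content (hypothetical sample).
sample = "one XaX two XbX three"

# The poster's pattern with #count fixed at 2.
pattern = r"(?:[\s\S]*?X+){2}([\s\S]*?)(?=X)"

match = re.search(pattern, sample)
whole = match.group(0)   # everything the pattern consumed, prefix included
group1 = match.group(1)  # only the segment between the marker pairs

# A plain replace substitutes the whole match, not just group 1.
replaced = re.sub(pattern, "ABCDEF", sample, count=1)

print(repr(whole))     # 'one XaX two '
print(repr(group1))    # ' two '
print(replaced)        # ABCDEFXbX three
```

If UBot also replaces the whole match, each pass would swallow the X...X markers in front of the target segment, which would account for results that differ from the group 1 highlight on regex101.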
Code Docta (Nick C.) - Posted February 11, 2016

Hi,

This is one approach... This is not a solution for all cases, but it will head you in the right direction.

set(#text,"We discover apples most below and XfgtsX new information and trot yesterday to wolves.Scouring the XgyrtX world for tasty and dog, we XiretX give fellow.This scurry we floor - come take one XhhueoX with us.","Global")
clear list(%split)
add list to list(%split,$list from text($find regular expression(#text,"(X.*?X)"),$new line),"Delete","Global")
clear list(%replsce X)
add list to list(%replsce X,$list from text($replace(%split,"X",$nothing),$new line),"Delete","Global")
clear list(%add curlies)
set(#OPEN CURLY,"\{","Global")
set(#CLOSED CURLY,"\}","Global")
loop($list total(%replsce X)) {
    add item to list(%add curlies,"ABCDEF{#OPEN CURLY}{$next list item(%replsce X)}{#CLOSED CURLY}","Don\'t Delete","Global")
}
add item to list(%add curlies,"ABCDEF","Don\'t Delete","Global")
set(#string,$text from list(%add curlies,""),"Global")
alert($text from list(%add curlies,""))

This should give you more ideas.

Regex is badass, but it is not the solution for everything. UBot regex is one line at a time and .NET style.

I use this when it gets tough for me: http://regexhero.net/tester/

HTHelps,
CD

Attachment: remove XdfsX.ubot

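For readers who do not have UBot at hand, the same idea (find every X...X marker, strip the X's, and rebuild the string with ABCDEF and curly brackets) can be sketched in Python. This is only an illustration of the logic in the script above, not a translation of UBot's behavior, and it uses a shortened version of the sample text. The names `text`, `tokens`, `parts` and `result` are chosen here for the example.

```python
import re

# First half of the sample text from the thread (shortened for the example).
text = ("We discover apples most below and XfgtsX new information and trot "
        "yesterday to wolves.Scouring the XgyrtX world for tasty and dog")

# Capture what sits between each pair of X markers, dropping the X's.
tokens = re.findall(r"X(.*?)X", text)

# Rebuild: placeholder, then the token in curly brackets, ending on a placeholder.
parts = ["ABCDEF{" + token + "}" for token in tokens]
parts.append("ABCDEF")
result = "".join(parts)

print(tokens)  # ['fgts', 'gyrt']
print(result)  # ABCDEF{fgts}ABCDEF{gyrt}ABCDEF
```

Like the script it mirrors, this discards the text outside the markers, so it only covers the placeholder side of the problem.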
jeepinjeff (Author) - Posted February 11, 2016

Nick,

Thanks, man. Being new to this stuff, it took me a bit to navigate my way through your suggestion.

One thing that I forgot to explain in the original post is that the values that AREN'T X...X (e.g. "We discover apples most below and_" or "_world for tasty and dog, we_") need to be saved so they can be re-associated with ABCDEF, in order, at a later time.

So, I'm working on using your script and, after extracting the X...X values, replacing them with a pipe and then adding the content to a list, delimited by the pipe.

For my first programming/coding/UBot project, I may have bitten off more than I can chew. I'm already dealing with PHP and SQL scripts as well as this "F-word-from-a-sweet-old-lady"-inducing regex. It seems SO useful, but so insanely complicated and fickle. Something I don't get is how I can run the same expression on different .NET regex testers and end up with different results. As my father would have said, "Enough to piss off the Pope!"

Thanks again!

Code Docta (Nick C.) - Posted February 12, 2016

Maybe try splitting on X:

set(#text,"We discover apples most below and XfgtsX new information and trot yesterday to wolves.Scouring the XgyrtX world for tasty and dog, we XiretX give fellow.This scurry we floor - come take one XhhueoX with us.","Global")
clear list(%cc)
add list to list(%cc,$list from text(#text,"X"),"Delete","Global")
alert($list item(%cc,0))
alert($list item(%cc,1))

and pick out what you want with list item.

Without knowing the whole story it is hard to understand what you really need. Feel free to PM me.

CD

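The split-on-X suggestion has a handy property: as long as every marker is a matched X...X pair and no other capital X appears in the text, the pieces at even positions are the text outside the markers and the pieces at odd positions are the marker contents. The Python sketch below shows that idea on the sample from the thread. It illustrates the logic only and is not UBot code; `pieces`, `outside`, `tokens`, `templated` and `restored` are names picked for the example.

```python
# Sample text from the thread.
text = ("We discover apples most below and XfgtsX new information and trot "
        "yesterday to wolves.Scouring the XgyrtX world for tasty and dog, we "
        "XiretX give fellow.This scurry we floor - come take one XhhueoX with us.")

pieces = text.split("X")

outside = pieces[0::2]  # text outside the markers, saved in order
tokens = pieces[1::2]   # marker contents without the X's

# Placeholder version in the shape the original post asks for.
templated = "ABCDEF" + "".join("X" + token + "XABCDEF" for token in tokens)

# Later step: put the saved segments back and swap the X's for curly brackets.
restored = outside[0] + "".join(
    "{" + token + "}" + segment for token, segment in zip(tokens, outside[1:])
)

print(templated)
print(restored)
```

Keeping `outside` as an ordered list is what lets each ABCDEF be re-associated with its original segment later, which is the requirement added in the follow-up post.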
HelloInsomnia - Posted February 12, 2016

I'm trying to read between the lines a little bit here, so hopefully this isn't far off from what you want, but I think what you want is to replace these occurrences of X (insert randomness here) X with spintax?

So the output you may want would look like this:

We discover apples most below and {forge|discover|embrace} new information and trot yesterday to wolves. Scouring the {cruel|nasty|beautiful|amazing} world for tasty and dog, we {humbly|may} give fellow. This scurry we floor - come take one {sip|last drink|bite} with us.

Let me know if I'm on the right track here, because if so I can probably help.

Ignore the colorization of the words above; it's just because I wrapped code tags around it.