steelersfan, posted November 7, 2016
I am trying to figure out how to strip unneeded text from a file using regex, and I am totally lost! I have been trying to use helloinsomnia's Regex Builder to make up for my utter lack of regex understanding, but I can't get it to work even with the help of that program. I am obviously doing something very wrong. I am simply trying to remove text from the following list:

<option value="am">Amharic (አማርኛ)</option>
<option value="ar">Arabic (العربية)</option>
<option value="hy">Armenian (հայերեն)</option>
<option value="bn">Bengali (বাংলা)</option>
<option value="bg">Bulgarian (български)</option>
<option value="my">Burmese (ဗမာ)</option>
<option value="ckb">Central Kurdish (کوردیی ناوەندی)</option>
<option value="zh">Chinese (中文)</option>
<option value="da">Danish (dansk)</option>
<option value="dv">Divehi (Divehi)</option>
<option value="nl">Dutch (Nederlands)</option>
<option value="en">English (English)</option>
<option value="et">Estonian (eesti)</option>
<option value="fi">Finnish (suomi)</option>
<option value="fr">French (français)</option>
<option value="ka">Georgian (ქართული)</option>
<option value="de">German (Deutsch)</option>
<option value="el">Greek (Ελληνικά)</option>
<option value="gu">Gujarati (ગુજરાતી)</option>
<option value="ht">Haitian Creole (Haitian Creole)</option>
<option value="he">Hebrew (עברית)</option>
<option value="hi">Hindi (हिन्दी)</option>
<option value="hu">Hungarian (magyar)</option>
<option value="is">Icelandic (íslenska)</option>
<option value="id">Indonesian (Indonesia)</option>
<option value="it">Italian (italiano)</option>
<option value="ja">Japanese (日本語)</option>
<option value="kn">Kannada (ಕನ್ನಡ)</option>
<option value="km">Khmer (ខ្មែរ)</option>
<option value="ko">Korean (한국어)</option>
<option value="lo">Lao (ລາວ)</option>
<option value="lv">Latvian (latviešu)</option>
<option value="lt">Lithuanian (lietuvių)</option>
<option value="ml">Malayalam (മലയാളം)</option>
<option value="mr">Marathi (मराठी)</option>
<option value="ne">Nepali (नेपाली)</option>
<option value="no">Norwegian (norsk)</option>
<option value="or">Oriya (ଓଡ଼ିଆ)</option>
<option value="ps">Pashto (پښتو)</option>
<option value="fa">Persian (فارسی)</option>
<option value="pl">Polish (polski)</option>
<option value="pt">Portuguese (português)</option>
<option value="pa">Punjabi (ਪੰਜਾਬੀ)</option>
<option value="ro">Romanian (română)</option>
<option value="ru">Russian (русский)</option>
<option value="sr">Serbian (српски)</option>
<option value="sd">Sindhi (سنڌي)</option>
<option value="si">Sinhala (සිංහල)</option>
<option value="sl">Slovenian (slovenščina)</option>
<option value="es">Spanish (español)</option>
<option value="sv">Swedish (svenska)</option>
<option value="tl">Tagalog (Tagalog)</option>
<option value="ta">Tamil (தமிழ்)</option>
<option value="te">Telugu (తెలుగు)</option>
<option value="th">Thai (ไทย)</option>
<option value="bo">Tibetan (བོད་སྐད་)</option>
<option value="tr">Turkish (Türkçe)</option>
<option value="ur">Urdu (اردو)</option>
<option value="ug">Uyghur (ئۇيغۇرچە)</option>
<option value="vi">Vietnamese (Tiếng Việt)</option>

I want to remove everything except the actual language names from the file. I used the Regex Builder tool to generate this regex:

(?<=\<option\ value\=\"it\"\>)[\w\s\'\"\.\-\,\;\:\&\!\?]*?(?=\ \(italiano\)\<\/option\>)

or

(?<=\<option\ value\=\"it\"\>)[\w\s\'\"\.\-\,\;\:\&\!\?]*?(?=\ \(*\)\<\/option\>)

I put a wildcard in place of "italiano" so it would grab everything. I thought that might be the problem, but it does not work either way!

I am also confused about how to go about this. Do I use "replace regular expression" or "remove text from list"? Or perhaps a set command with find regular expression followed by a replace, or an if statement that replaces the text with nothing, etc.? At this point I am thoroughly confused... What could I be doing wrong? Please help!
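Here is why the second pattern cannot work. In regex syntax, `*` is a quantifier, not a file-style wildcard, so `\(*` means "zero or more literal ( characters". The lookahead therefore succeeds only if a `)` follows directly after those optional parentheses, and that never happens in this list. The minimal sketch below uses Python's `re` purely for illustration; escaping inside the bot script's strings may differ. It compares the pattern as posted with a version where the `*` is replaced by `[^)]*`, meaning "any run of characters other than )":

```python
import re

# One line from the sample list in the question.
line = '<option value="it">Italian (italiano)</option>'

# Pattern as posted, with "*" standing in for "italiano".
# \(* means "zero or more literal ( characters", so the lookahead needs
# a space, optional '(' characters, then ')</option>' right after the name.
broken = r'(?<=\<option\ value\=\"it\"\>)[\w\s\'\"\.\-\,\;\:\&\!\?]*?(?=\ \(*\)\<\/option\>)'

# Same pattern with [^)]* ("anything except )") inside the parentheses.
fixed = r'(?<=\<option\ value\=\"it\"\>)[\w\s\'\"\.\-\,\;\:\&\!\?]*?(?=\ \([^)]*\)\<\/option\>)'

print(re.findall(broken, line))  # []
print(re.findall(fixed, line))   # ['Italian']
```

Even after this fix, the lookbehind pins the match to `value="it"`, so the pattern only ever finds Italian. A general pattern has to drop the language code from the lookbehind.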
pash, posted November 7, 2016
Not sure if this is what you want:

    alert($replace regular expression(#Debug," \\(.*?\\)",""))
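pash's pattern is the "replace" approach: delete what you don't want. The sketch below shows it in Python, which is assumed here only for illustration. The doubled backslashes in the post appear to be string escaping in the bot script; a Python raw string does not need them.

The first substitution removes the parenthesized native name. The `<option>` tags are still there afterwards, so a second substitution (an addition, not part of the post) strips anything shaped like a tag:

```python
import re

sample = ('<option value="en">English (English)</option>\n'
          '<option value="fr">French (français)</option>')

# Step 1 (pash's pattern): remove " (" ... ")" lazily, i.e. the native name.
# "." does not match newlines by default, so each removal stays on its own line.
no_native = re.sub(r' \(.*?\)', '', sample)

# Step 2 (added): remove anything that looks like a tag, leaving only text.
names_only = re.sub(r'<[^>]+>', '', no_native)

print(no_native)
print(names_only.splitlines())  # ['English', 'French']
```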
steelersfan (author), posted November 7, 2016
I am trying to make a simple bot that removes useless things from text files. For this example, I want to turn this:

    <option value="en">English (English)</option>

into this:

    English

Beyond this example, though, I want to understand the best practice for finding and replacing things in a text file. What is the proper way to remove unwanted text from a text file and return only the wanted text to a list?
luis carlos, posted November 7, 2016

    (?<=">).*?(?= \()
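This is the "find" approach: instead of deleting what you don't want, match only what you do want. The sketch below runs the same regex, unchanged, through Python's `re.findall` on three lines of the sample list (Python is used only for illustration).

Because `.*?` accepts any character up to the first " (", multi-word names such as "Haitian Creole" come through intact. The native name after the " (" is never touched.

```python
import re

html = '''<option value="ht">Haitian Creole (Haitian Creole)</option>
<option value="it">Italian (italiano)</option>
<option value="vi">Vietnamese (Tiếng Việt)</option>'''

# (?<=">)  start right after the '">' that closes the value attribute
# .*?      take as few characters as possible...
# (?= \()  ...up to the " (" that begins the native name
names = re.findall(r'(?<=">).*?(?= \()', html)
print(names)  # ['Haitian Creole', 'Italian', 'Vietnamese']
```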
Code Docta (Nick C.), posted November 7, 2016
Quoting luis carlos: (?<=">).*?(?= \()

I would agree. Replace is just one way; Find is another. It is hard not to get stuck in one frame of mind.
steelersfan (author), posted November 8, 2016
I ended up answering my own first question through some trial and error. I had indeed messed up the regex itself (Lord, I hate regex!). I then found a fairly simple way to extract exactly what I needed from that specific text file.

Answer:

    ui open file("File To Clean:",#file to clean)
    clear list(%clean this file)
    add list to list(%clean this file,$list from file(#file to clean),"Delete","Global")
    wait(1)
    set(#test,$find regular expression(%clean this file,"(?<=\\>)[a-zA-Z]*?(?=\\ \\()"),"Global")
    load html(#test)

However, I am still wondering whether there is a way to extract the readable text from ANY text file filled with HTML or other markup. Is that even possible, and is it worth trying? I just want to build a handy bot that I can use to clean and process text files, for example by removing extra spaces and blank lines.
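On the general question: a regex tuned to one file's layout tends to be brittle. For example, the letters-only class `[a-zA-Z]*?` in the answer above cannot span a space. It therefore silently skips "Central Kurdish" and "Haitian Creole".

For arbitrary HTML, a parser is usually more robust than a regex. Below is a minimal Python sketch using the standard library's `html.parser`. The `TextExtractor` class and `extract_text` helper are hypothetical names, and skipping `<script>`/`<style>` contents is an assumption about what counts as "writing":

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text between tags, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text that is not inside a script or style block.
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.chunks


sample = ('<style>p {color: red}</style>'
          '<option value="ht">Haitian Creole (Haitian Creole)</option>'
          '<option value="zh">Chinese (中文)</option>')
print(extract_text(sample))
```

The result is one string per text node. A file-specific step, such as the " (" removal from earlier in the thread, can then run on those strings instead of on raw markup.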
steelersfan (author), posted November 10, 2016
I am trying to create a bot that does all (or at least as many as possible) of the text operations that can be done here: http://textmechanic.com/
I am just wondering whether that is even possible, or worth trying.
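Many textmechanic.com-style operations reduce to simple line-by-line transformations. Below is a minimal Python sketch of a few of them: trimming, collapsing repeated spaces, dropping blank lines, removing duplicates and sorting. The `clean_lines` function and its keyword options are hypothetical; a real tool would probably expose each step separately.

```python
import re


def clean_lines(text, *, dedupe=True, sort=False):
    """Hypothetical helper bundling a few common text-cleanup steps."""
    cleaned = []
    seen = set()
    for line in text.splitlines():
        # Collapse runs of spaces/tabs inside the line and trim both ends.
        line = re.sub(r'[ \t]+', ' ', line).strip()
        if not line:  # drop blank or whitespace-only lines
            continue
        if dedupe:
            if line in seen:  # keep only the first occurrence
                continue
            seen.add(line)
        cleaned.append(line)
    if sort:
        cleaned.sort(key=str.casefold)  # case-insensitive alphabetical order
    return cleaned


raw = "  French  \n\n\tGerman\nFrench\n   \nArabic   (العربية)\n"
print(clean_lines(raw))             # ['French', 'German', 'Arabic (العربية)']
print(clean_lines(raw, sort=True))  # ['Arabic (العربية)', 'French', 'German']
```

Deduplicating by "first occurrence wins" keeps the original order unless sorting is requested explicitly.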
pash, posted November 10, 2016
Quoting steelersfan: "I am just wondering if it is even possible, or worth trying?"

It's possible, but I think it will be easier if you use plugins.
steelersfan (author), posted November 11, 2016
Quoting pash: "It's possible, but I think it will be easier if you use plugins."

Which particular plugin would do the trick? Feel free to promote your own; anything will do. Thanks!