UBot Underground

I am trying to figure out how to strip unneeded text from a file using regex, and I am totally lost! I have been using helloinsomnia's Regex Builder to make up for my lack of regex understanding, but I cannot get it to work even with that program's help. I am obviously doing something very wrong.

 

I am simply trying to remove text in the following list:

              <option value="am">Amharic (አማርኛ)</option>
              <option value="ar">Arabic (العربية)</option>
              <option value="hy">Armenian (հայերեն)</option>
              <option value="bn">Bengali (বাংলা)</option>
              <option value="bg">Bulgarian (български)</option>
              <option value="my">Burmese (ဗမာ)</option>
              <option value="ckb">Central Kurdish (کوردیی ناوەندی)</option>
              <option value="zh">Chinese (中文)</option>
              <option value="da">Danish (dansk)</option>
              <option value="dv">Divehi (Divehi)</option>
              <option value="nl">Dutch (Nederlands)</option>
              <option value="en">English (English)</option>
              <option value="et">Estonian (eesti)</option>
              <option value="fi">Finnish (suomi)</option>
              <option value="fr">French (français)</option>
              <option value="ka">Georgian (ქართული)</option>
              <option value="de">German (Deutsch)</option>
              <option value="el">Greek (Ελληνικά)</option>
              <option value="gu">Gujarati (ગુજરાતી)</option>
              <option value="ht">Haitian Creole (Haitian Creole)</option>
              <option value="he">Hebrew (עברית)</option>
              <option value="hi">Hindi (हिन्दी)</option>
              <option value="hu">Hungarian (magyar)</option>
              <option value="is">Icelandic (íslenska)</option>
              <option value="id">Indonesian (Indonesia)</option>
              <option value="it">Italian (italiano)</option>
              <option value="ja">Japanese (日本語)</option>
              <option value="kn">Kannada (ಕನ್ನಡ)</option>
              <option value="km">Khmer (ខ្មែរ)</option>
              <option value="ko">Korean (한국어)</option>
              <option value="lo">Lao (ລາວ)</option>
              <option value="lv">Latvian (latviešu)</option>
              <option value="lt">Lithuanian (lietuvių)</option>
              <option value="ml">Malayalam (മലയാളം)</option>
              <option value="mr">Marathi (मराठी)</option>
              <option value="ne">Nepali (नेपाली)</option>
              <option value="no">Norwegian (norsk)</option>
              <option value="or">Oriya (ଓଡ଼ିଆ)</option>
              <option value="ps">Pashto (پښتو)</option>
              <option value="fa">Persian (فارسی)</option>
              <option value="pl">Polish (polski)</option>
              <option value="pt">Portuguese (português)</option>
              <option value="pa">Punjabi (ਪੰਜਾਬੀ)</option>
              <option value="ro">Romanian (română)</option>
              <option value="ru">Russian (русский)</option>
              <option value="sr">Serbian (српски)</option>
              <option value="sd">Sindhi (سنڌي)</option>
              <option value="si">Sinhala (සිංහල)</option>
              <option value="sl">Slovenian (slovenščina)</option>
              <option value="es">Spanish (español)</option>
              <option value="sv">Swedish (svenska)</option>
              <option value="tl">Tagalog (Tagalog)</option>
              <option value="ta">Tamil (தமிழ்)</option>
              <option value="te">Telugu (తెలుగు)</option>
              <option value="th">Thai (ไทย)</option>
              <option value="bo">Tibetan (བོད་སྐད་)</option>
              <option value="tr">Turkish (Türkçe)</option>
              <option value="ur">Urdu (اردو)</option>
              <option value="ug">Uyghur (ئۇيغۇرچە)</option>
              <option value="vi">Vietnamese (Tiếng Việt)</option>

I want to remove everything except the actual language names from the file, and have used the Regex Builder tool to generate this regex:

(?<=\<option\ value\=\"it\"\>)[\w\s\'\"\.\-\,\;\:\&\!\?]*?(?=\ \(italiano\)\<\/option\>)

or

(?<=\<option\ value\=\"it\"\>)[\w\s\'\"\.\-\,\;\:\&\!\?]*?(?=\ \(*\)\<\/option\>)

In the second version I put a wildcard in place of "italiano" so that it would match every line, since I thought the literal text might be the problem, but it does not work either way!
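One likely reason the second pattern fails is that `\(*\)` is not a wildcard: in regex syntax it means "zero or more literal `(` followed by a literal `)`". The lookbehind also hard-codes `value="it"`, so at best it can match the Italian line only. Below is a minimal Python sketch of that behaviour, not UBot code. The names `broken` and `fixed` are illustrative, the character class is shortened for readability, and the `fixed` pattern is only one possible rewrite that uses a capture group instead of lookarounds.

```python
import re

line = '<option value="it">Italian (italiano)</option>'

# "\(*\)" matches zero or more "(" and then ")"; it never consumes "italiano".
broken = re.compile(r'(?<=<option value="it">)[\w\s]*?(?= \(*\)</option>)')

# '[^"]+' accepts any language code, "[^)]*" accepts any native name,
# and the capture group (.*?) holds the part we actually want.
fixed = re.compile(r'<option value="[^"]+">(.*?) \([^)]*\)</option>')

broken_result = broken.search(line)          # no match on this line
fixed_result = fixed.search(line).group(1)   # the text before " (...)"

print(broken_result)
print(fixed_result)
```

The capture-group form avoids lookbehind entirely, which tends to be easier to debug than a pair of long lookarounds.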

 

I am also confused about how to go about this: do I use "replace regular expression" or "remove text from list"? Or perhaps a set command with find regular expression followed by a replace, or an if statement that replaces the match with nothing? At this point, I am thoroughly confused...

 

What could I be doing wrong? Please help! :(


I am trying to make a simple bot that can remove useless things from text files.

 

Turning this:  <option value="en">English (English)</option>

 

Into this:  English

 

That is just this one example, though. I want to understand the best practice for finding and replacing things in a text file.

 

What is the proper method to remove unwanted text from a text file and return only the wanted text to a list?
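In general regex terms there are two ways to get from the raw lines to the wanted text: find the wanted part and collect it into a list, or replace the unwanted part and keep what is left. The Python sketch below shows both, assuming every line follows the `<option value="..">Name (native)</option>` shape from the example. It is not UBot code, and `OPTION`, `extract_names`, and `strip_markup` are hypothetical names chosen for illustration.

```python
import re

# One pattern serves both approaches; the capture group is the language name.
OPTION = re.compile(r'<option value="[^"]*">(.*?) \([^)]*\)</option>')


def extract_names(text):
    """Find approach: return only the captured names as a list."""
    return OPTION.findall(text)


def strip_markup(text):
    """Replace approach: rewrite each matching tag as just its name."""
    return OPTION.sub(r"\1", text)


sample = (
    '  <option value="en">English (English)</option>\n'
    '  <option value="ht">Haitian Creole (Haitian Creole)</option>\n'
    '  <option value="zh">Chinese (中文)</option>'
)

print(extract_names(sample))
print(strip_markup(sample))
```

The find approach is usually the better fit when the goal is a clean list, because anything that does not match is simply left out. The replace approach keeps the surrounding text, including the indentation, untouched.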


So I ended up answering my own first question with some trial and error. It was indeed the regex itself that I had messed up (Lord, I hate regex!), and I found a pretty simple way to extract exactly what I needed from that specific text file.

 

Answer:

ui open file("File To Clean:",#file to clean)
clear list(%clean this file)
add list to list(%clean this file,$list from file(#file to clean),"Delete","Global")
wait(1)
set(#test,$find regular expression(%clean this file,"(?<=\\>)[a-zA-Z]*?(?=\\ \\()"),"Global")
load html(#test)
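One thing to watch with the pattern in this answer: the class `[a-zA-Z]` does not include a space, so two-word names such as "Haitian Creole" and "Central Kurdish" would be skipped. The Python sketch below checks this, assuming the doubled backslashes in the UBot string reduce to single ones, so the effective pattern is `(?<=\>)[a-zA-Z]*?(?=\ \()`. The names `posted`, `widened`, and `names` are illustrative, and adding a space to the class is only one possible fix.

```python
import re

lines = [
    '<option value="en">English (English)</option>',
    '<option value="ht">Haitian Creole (Haitian Creole)</option>',
    '<option value="ckb">Central Kurdish (کوردیی ناوەندی)</option>',
]

# Pattern as posted (unescaped): letters only between ">" and " (".
posted = re.compile(r"(?<=>)[a-zA-Z]*?(?= \()")

# Same pattern with a space added to the class so multi-word names match.
widened = re.compile(r"(?<=>)[a-zA-Z ]*?(?= \()")


def names(pattern, rows):
    """Apply the pattern line by line and collect every match."""
    found = []
    for row in rows:
        found.extend(pattern.findall(row))
    return found


posted_names = names(posted, lines)
widened_names = names(widened, lines)

print(posted_names)
print(widened_names)
```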

However, I am still wondering whether there is a way to extract the text from ANY file filled with HTML or whatever. Is it even possible, and is it worth trying to do? I just want to build a handy bot that I can use to clean and process text files: removing extra spaces, blank lines, etc.
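For the general case, a parser tends to be more reliable than a regex for pulling text out of arbitrary HTML, and the whitespace cleanup can then be done line by line. The following is a minimal Python sketch using only the standard library, not a UBot solution. `TextExtractor` and `clean_text` are hypothetical names, and the sketch keeps all text nodes, including script and style contents if a page has any.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def clean_text(raw):
    """Strip tags, collapse runs of whitespace, and drop blank lines."""
    parser = TextExtractor()
    parser.feed(raw)
    parser.close()  # flush any text still buffered by the parser
    text = "".join(parser.chunks)
    # split() with no arguments collapses spaces and tabs within each line.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return [line for line in lines if line]


raw = (
    '  <option value="en">English (English)</option>\n'
    "\n"
    "  <p>Hello   <b>world</b></p>\n"
)

print(clean_text(raw))
```

The result is a list of cleaned lines, which can be filtered further with a regex such as the ones discussed above.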


I am trying to create a bot that does all (or at least as many as possible) of the things with text that can be done here: http://textmechanic.com/

 

I am just wondering if it is even possible, or worth trying?

It's possible, but I think it will be easier if you use plugins.

