Grabbing three parts of a string with regexp

Anonym · January 8, 2013

I seem not to get this to work. I have a file that contains some info.

WEB 193798 2013-01-01 C A NOT AYT SID 2013-07-21 NOTAYT77 SAQUA 0 15726 15726
WEB 3293813 2013-01-01 C A NOT LPA PDI 2013-02-13 NOTLPA3B ICA 0 6855 1652

...and some code that is supposed to grab three columns of data. The columns are nos. 2, 13 and 14 (as seen above), but for some reason the code below doesn't work.

    add list to list(%bookingNo, $find regular expression(#tmp, "\\tWEB\\s\\t\\t([0-9]\{6,7\})\\t[0-9][0-9][0-9][0-9]\\-[0-9][0-9]\\-[0-9][0-9]\\t[a-zA-Z]+\\t[a-zA-Z]+\\t[a-zA-Z]\{3\}\\t[a-zA-Z]\{3\}\\t[a-zA-Z]\{3\}\\t[0-9][0-9][0-9][0-9]\\-[0-9][0-9]\\-[0-9][0-9]\\t\\t[a-zA-Z]\{6\}[0-9]\{2\}\\t[a-zA-Z]+\\t\\t[0-9]\{1\}\\t[0-9]+\\t[0-9]+"), "Don\'t Delete", "Global")
    add list to list(%num1, $find regular expression(#tmp, "\\tWEB\\s\\t\\t[0-9]\{6,7\}\\t[0-9][0-9][0-9][0-9]\\-[0-9][0-9]\\-[0-9][0-9]\\t[a-zA-Z]+\\t[a-zA-Z]+\\t[a-zA-Z]\{3\}\\t[a-zA-Z]\{3\}\\t[a-zA-Z]\{3\}\\t[0-9][0-9][0-9][0-9]\\-[0-9][0-9]\\-[0-9][0-9]\\t\\t[a-zA-Z]\{6\}[0-9]\{2\}\\t[a-zA-Z]+\\t\\t[0-9]\{1\}\\t([0-9]+)\\t[0-9]+"), "Don\'t Delete", "Global")
    add list to list(%num2, $find regular expression(#tmp, "\\tWEB\\s\\t\\t[0-9]\{6,7\}\\t[0-9][0-9][0-9][0-9]\\-[0-9][0-9]\\-[0-9][0-9]\\t[a-zA-Z]+\\t[a-zA-Z]+\\t[a-zA-Z]\{3\}\\t[a-zA-Z]\{3\}\\t[a-zA-Z]\{3\}\\t[0-9][0-9][0-9][0-9]\\-[0-9][0-9]\\-[0-9][0-9]\\t\\t[a-zA-Z]\{6\}[0-9]\{2\}\\t[a-zA-Z]+\\t\\t[0-9]\{1\}\\t[0-9]+\\t([0-9]+)"), "Don\'t Delete", "Global")

#tmp is set to a string for each row of the file, so for the first row, #tmp equals:

" WEB 193798 2013-01-01 C A GOT AYT SID 2013-07-21 GOTAYT77 SAQUA 0 15726 15726"

Any ideas what the error is?

Thanks!

VaultBoss · January 8, 2013

Why don't you add each row of data to a TMP list and extract from it only the columns you want?

Seems to be quite easy to do so, IMHO...

    set(#var_INP_TMP, "WEB         193798    2013-01-01    C    A    NOT    AYT    SID    2013-07-21        NOTAYT77    SAQUA        0    15726    15726", "Global")
    add list to list(%lst_INP_TMP, $list from text(#var_INP_TMP, " "), "Delete", "Global")
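
For anyone who wants to try the splitting idea outside UBot, here is a minimal Python sketch (Python, not UBot script). It assumes the fields are separated by any run of spaces or tabs and that no field contains whitespace itself. The helper name `pick_columns` and the exact tab/space layout of the sample row are made up for illustration.

```python
def pick_columns(row, wanted=(2, 13, 14)):
    """Split a row on any run of whitespace and return the 1-based columns in `wanted`."""
    # str.split() with no argument splits on runs of spaces/tabs and drops empty strings.
    fields = row.split()
    return [fields[i - 1] for i in wanted]


# Sample row with an assumed mix of tabs and spaces between the columns.
row = " WEB\t193798  2013-01-01 C A NOT AYT SID 2013-07-21\t\tNOTAYT77 SAQUA  0 15726 15726"
print(pick_columns(row))  # ['193798', '15726', '15726']
```

Because the split keeps every field in place, columns 13 and 14 stay addressable by position even when they hold the same value.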

Anonym · January 9, 2013

It's not as easy as having a space as delimiter; sometimes it's a combination of tabs and spaces, just tabs or just spaces. So if we go back to the question, is there an obvious error in my code that causes the erroneous result? Despite using the regexps, they return #tmp in all cases.

VaultBoss · January 9, 2013

No matter how many spaces there are, for each space there will be a new line added, BUT as the new item is added with the Delete Duplicates option, in fact there will be no extra blank lines added...
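
A small Python model of that behaviour (Python, not UBot script) may help when checking the result. It assumes that the Delete Duplicates option removes every repeated item while keeping the original order, which is an assumption about UBot and not something confirmed in this thread. Under that assumption the repeated blanks collapse, but so does any value that appears twice in a row of data, such as the two 15726 entries in the first sample row.

```python
# First sample row, with a few double spaces added to mimic uneven spacing (assumed layout).
row = "WEB  193798 2013-01-01 C A NOT AYT SID 2013-07-21  NOTAYT77 SAQUA  0 15726 15726"

# Splitting on a single space yields an empty item for every extra space.
items = row.split(" ")

# Order-preserving duplicate removal, standing in for a "delete duplicates" list option.
deduped = list(dict.fromkeys(items))

print(len(items), len(deduped))
print(deduped)
```

In this model the columns after a removed duplicate shift position, so it is worth verifying the list length before reading columns 13 and 14 by index.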

If you really want to dig very deep into this, you could always use $replace regular expression for each row and change:

\s (basically ANY space)
\t (TAB)
\n (new line)

into a single space " " or whatever else you want...
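
As a rough sketch of that normalisation step, here is the same idea in Python's `re` module (not UBot's `$replace regular expression`). The function name `normalize_whitespace` and the tab/space layout of the sample row are assumptions for illustration.

```python
import re


def normalize_whitespace(row):
    """Collapse every run of spaces, tabs or newlines into one space and trim the ends."""
    return re.sub(r"\s+", " ", row).strip()


# Sample row with an assumed mix of tabs, spaces and a trailing newline.
row = " WEB\t\t193798\t2013-01-01  C A NOT AYT SID 2013-07-21\t\tNOTAYT77 SAQUA\t\t0 15726 15726\n"
clean = normalize_whitespace(row)
fields = clean.split(" ")  # a single space is now a safe delimiter
print(clean)
print(len(fields))
```

Note that `\s` already covers tabs and newlines, so one replacement pass is enough.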

So, did you try my code in actual production environment to see if it works?

VaultBoss · January 9, 2013

So if we go back to the question, is there an obvious error in my code that causes the erroneous result? Despite using the regexps, they return #tmp in all cases.

Well, it looks like you basically coded the whole row in your regex (didn't quite check if it's accurate, but presumably...) so it is just normal to return all the matching data (all columns).

In order to select separate columns from that, you need a different regex for each variable.

Despite using different variable names in separate SET commands for each column that you want extracted, the code you posted (regex) is almost the same for each of them, so obviously, there will be no column separation (more or less).

The regex is going to give you the MATCH the way you use it, not the exclusion. So basically, even when you tried to slightly alter the regex for each variable, to reflect the column you want, I suspect you're rather trying to 'eliminate' the columns, instead of selecting (matching) them.
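
The difference between the whole match and a captured part can be sketched in Python's `re` module (not UBot's `$find regular expression`; whether UBot exposes capture groups is not confirmed in this thread). The pattern below is an illustrative rewrite, not the original poster's: it uses `\s+` between columns so any mix of tabs and spaces matches, and it loosens column 10 to letters and digits because `NOTLPA3B` in the second sample row does not fit "six letters plus two digits". The names `ROW` and `extract` are hypothetical.

```python
import re

ROW = re.compile(
    r"^\s*WEB\s+"
    r"(\d{6,7})\s+"              # column 2: booking number (captured)
    r"\d{4}-\d{2}-\d{2}\s+"      # column 3: date
    r"[A-Za-z]+\s+[A-Za-z]+\s+"  # columns 4-5
    r"(?:[A-Za-z]{3}\s+){3}"     # columns 6-8: three 3-letter codes
    r"\d{4}-\d{2}-\d{2}\s+"      # column 9: date
    r"[A-Za-z0-9]+\s+"           # column 10: letters and digits
    r"[A-Za-z]+\s+"              # column 11
    r"\d\s+"                     # column 12: single digit
    r"(\d+)\s+"                  # column 13 (captured)
    r"(\d+)\s*$"                 # column 14 (captured)
)


def extract(row):
    """Return (column 2, column 13, column 14) or None if the row does not match."""
    m = ROW.match(row)
    return m.groups() if m else None


row = " WEB 193798 2013-01-01 C A GOT AYT SID 2013-07-21 GOTAYT77 SAQUA 0 15726 15726"
print(ROW.match(row).group(0))  # the whole match: the entire row
print(extract(row))             # only the three captured columns
```

Here `group(0)` is the full row, which mirrors the "returns #tmp" symptom, while the parenthesised groups hold only the wanted columns.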

Hope this helps...

Anonym · January 10, 2013

Thanks!

You are sooo right that it's a lot easier to use replace regexp. Good idea! Thanks!

VaultBoss · January 10, 2013

You're welcome! Glad I could point you in the right direction. I only came up with the idea; YOU will do the heavy work.

However, if you liked my contribution, always feel free to hit the ✔ LIKE THIS button at the bottom-right corner of the specific post. Thanks!
