UBot Underground

Grabbing three parts of a string with regexp



I can't get this to work. I have a file that contains some data.
 

 

    WEB         193798    2013-01-01    C    A    NOT    AYT    SID    2013-07-21        NOTAYT77    SAQUA        0    15726    15726
    WEB         3293813    2013-01-01    C    A    NOT    LPA    PDI    2013-02-13        NOTLPA3B    ICA        0    6855    1652
 

 

 


...and some code that is supposed to grab three columns of data. The columns are numbers 2, 13 and 14 (as seen above), but for some reason the code below doesn't work.
 

 

    add list to list(%bookingNo, $find regular expression(#tmp, "\\tWEB\\s\\t\\t([0-9]\{6,7\})\\t[0-9][0-9][0-9][0-9]\\-[0-9][0-9]\\-[0-9][0-9]\\t[a-zA-Z]+\\t[a-zA-Z]+\\t[a-zA-Z]\{3\}\\t[a-zA-Z]\{3\}\\t[a-zA-Z]\{3\}\\t[0-9][0-9][0-9][0-9]\\-[0-9][0-9]\\-[0-9][0-9]\\t\\t[a-zA-Z]\{6\}[0-9]\{2\}\\t[a-zA-Z]+\\t\\t[0-9]\{1\}\\t[0-9]+\\t[0-9]+"), "Don\'t Delete", "Global")
    add list to list(%num1, $find regular expression(#tmp, "\\tWEB\\s\\t\\t[0-9]\{6,7\}\\t[0-9][0-9][0-9][0-9]\\-[0-9][0-9]\\-[0-9][0-9]\\t[a-zA-Z]+\\t[a-zA-Z]+\\t[a-zA-Z]\{3\}\\t[a-zA-Z]\{3\}\\t[a-zA-Z]\{3\}\\t[0-9][0-9][0-9][0-9]\\-[0-9][0-9]\\-[0-9][0-9]\\t\\t[a-zA-Z]\{6\}[0-9]\{2\}\\t[a-zA-Z]+\\t\\t[0-9]\{1\}\\t([0-9]+)\\t[0-9]+"), "Don\'t Delete", "Global")
    add list to list(%num2, $find regular expression(#tmp, "\\tWEB\\s\\t\\t[0-9]\{6,7\}\\t[0-9][0-9][0-9][0-9]\\-[0-9][0-9]\\-[0-9][0-9]\\t[a-zA-Z]+\\t[a-zA-Z]+\\t[a-zA-Z]\{3\}\\t[a-zA-Z]\{3\}\\t[a-zA-Z]\{3\}\\t[0-9][0-9][0-9][0-9]\\-[0-9][0-9]\\-[0-9][0-9]\\t\\t[a-zA-Z]\{6\}[0-9]\{2\}\\t[a-zA-Z]+\\t\\t[0-9]\{1\}\\t[0-9]+\\t([0-9]+)"), "Don\'t Delete", "Global")
 

 

 

#tmp is set to a string for each row of the file, so for the first row, #tmp equals:
 

 

"    WEB         193798    2013-01-01    C    A    NOT    AYT    SID    2013-07-21        NOTAYT77    SAQUA        0    15726    15726"
 

 

 


Any ideas what the error is?


Thanks!
 


Why don't you add each row of data to a TMP list and extract from it only the columns you want?

 

Seems to be quite easy to do so, IMHO...

 

set(#var_INP_TMP, "WEB         193798    2013-01-01    C    A    NOT    AYT    SID    2013-07-21        NOTAYT77    SAQUA        0    15726    15726", "Global")
add list to list(%lst_INP_TMP, $list from text(#var_INP_TMP, " "), "Delete", "Global")

It's not as easy as using a space as the delimiter. Sometimes it's a combination of tabs and spaces, sometimes just tabs, sometimes just spaces. So if we go back to the question, is there an obvious error in my code that causes the erroneous result? Even with the regexps in place, every call returns the whole of #tmp.


No matter how many spaces there are, each space produces a new list item, BUT because the items are added with the Delete Duplicates option, no extra blank items end up in the list...
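
One caveat worth checking before relying on de-duplication: it also removes repeated values, and in the first sample row the last two columns are both 15726. Below is a rough Python model of the idea, not UBot code. It assumes that `$list from text` splits on every single space and that the "Delete" option keeps the first occurrence of each item; neither assumption is verified against UBot here.

```python
# Row copied from the set(...) line above; the runs of spaces are what matter.
row = "WEB         193798    2013-01-01    C    A    NOT    AYT    SID    2013-07-21        NOTAYT77    SAQUA        0    15726    15726"

# Splitting on ONE space yields an empty item for every extra space.
items = row.split(" ")

# Order-preserving de-duplication (assumed stand-in for the "Delete" option).
deduped = list(dict.fromkeys(items))

print("blank items before:", items.count(""))
print("blank items after: ", deduped.count(""))
print("15726 before:", items.count("15726"))
print("15726 after: ", deduped.count("15726"))
```

If UBot behaves like this model, the blanks collapse as described, but the two identical numbers collapse as well, so the column positions shift for that row.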

 

If you really want to dig very deep into this, you could always use $replace regular expression for each row and change:

  • \s (any whitespace character, which already covers spaces, tabs and new lines)
  • \t (TAB)
  • \n (new line)

into a single space " " or whatever else you want...
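
The normalize-then-split idea can be sketched in Python as follows. This is only an illustration of the technique, not UBot code; `pick_columns` and its `wanted` parameter are made-up names, and the two rows are the sample rows from the first post.

```python
import re

# Sample rows from the first post (whitespace kept as it was posted).
rows = [
    "    WEB         193798    2013-01-01    C    A    NOT    AYT    SID    2013-07-21        NOTAYT77    SAQUA        0    15726    15726",
    "    WEB         3293813    2013-01-01    C    A    NOT    LPA    PDI    2013-02-13        NOTLPA3B    ICA        0    6855    1652",
]


def pick_columns(row, wanted=(2, 13, 14)):
    """Collapse every run of whitespace to one space, split, and
    return the requested columns (1-based, like in the question)."""
    normalized = re.sub(r"\s+", " ", row).strip()
    fields = normalized.split(" ")
    return [fields[i - 1] for i in wanted]


for row in rows:
    print(pick_columns(row))
# ['193798', '15726', '15726']
# ['3293813', '6855', '1652']
```

Because the whitespace runs are collapsed before splitting, it does not matter whether a gap is tabs, spaces, or a mix, and no de-duplication is needed, so repeated values keep their positions.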

 

So, did you try my code in an actual production environment to see if it works?

So if we go back to the question, is there an obvious error in my code that causes the erroneous result? Even with the regexps in place, every call returns the whole of #tmp.

 

Well, it looks like you basically encoded the whole row in your regex (I didn't check whether it's accurate, but presumably it is), so it is only normal that it returns all the matching data (all columns).
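
To illustrate the difference between the whole match and a parenthesised group, here is a small Python sketch. It is not UBot code, and the pattern is a shortened, hypothetical version of the one in the question. The assumption (consistent with what is reported in this thread, but not verified) is that `$find regular expression` hands back the whole match rather than the group.

```python
import re

# First sample row from the question.
row = "    WEB         193798    2013-01-01    C    A    NOT    AYT    SID    2013-07-21        NOTAYT77    SAQUA        0    15726    15726"

# Shortened illustrative pattern: a capture group around column 2,
# followed by the date column so the match spans more than the group.
pattern = re.compile(r"WEB\s+([0-9]{6,7})\s+[0-9]{4}-[0-9]{2}-[0-9]{2}")

m = pattern.search(row)
whole_match = m.group(0)  # everything the pattern consumed
column_two = m.group(1)   # only the parenthesised part

print(repr(whole_match))
print(repr(column_two))
```

A tool that returns only the whole match will return everything the pattern covers, no matter where the parentheses are placed.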

 

In order to select separate columns from that, you need different regex for each variable.

 

Although you use a different variable name for each column you want extracted, the regex you posted is almost the same in each command, so obviously there will be no column separation (more or less).

 

The regex gives you what it MATCHES, not what it leaves out. So even though you slightly altered the regex for each variable to reflect the column you want, I suspect you are trying to 'eliminate' the other columns instead of selecting (matching) the one you need.
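
One way to make the match itself be just the wanted column is to move the surrounding context into lookaheads, which are checked but not included in the match. The sketch below is Python, not UBot code, and the three patterns are illustrative assumptions built from the sample rows (a 6 or 7 digit number followed by a date, and the last two numbers on the line). Whether they fit every row of the real file is not something this thread can confirm.

```python
import re

# Sample rows from the question.
rows = [
    "    WEB         193798    2013-01-01    C    A    NOT    AYT    SID    2013-07-21        NOTAYT77    SAQUA        0    15726    15726",
    "    WEB         3293813    2013-01-01    C    A    NOT    LPA    PDI    2013-02-13        NOTLPA3B    ICA        0    6855    1652",
]

# Column 2: 6-7 digits that are followed by whitespace and a date.
BOOKING_NO = re.compile(r"[0-9]{6,7}(?=\s+[0-9]{4}-[0-9]{2}-[0-9]{2})")
# Column 13: digits followed by exactly one more number before end of line.
NUM1 = re.compile(r"[0-9]+(?=\s+[0-9]+\s*$)")
# Column 14: the digits at the very end of the line.
NUM2 = re.compile(r"[0-9]+(?=\s*$)")


def extract(row):
    """Return (booking number, num1, num2) as whole matches, no groups."""
    return tuple(p.search(row).group(0) for p in (BOOKING_NO, NUM1, NUM2))


for row in rows:
    print(extract(row))
```

Each pattern now matches only the column itself, so a function that returns the whole match returns exactly that column.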

 

Hope this helps...

Thanks!

 

You are sooo right that it's a lot easier to use replace regexp. Good idea! Thanks!

 

You're welcome!  Glad I could point you in the right direction.  I only came up with the idea, YOU will do the heavy work :P

 

However, if you liked my contribution, always feel free to hit the LIKE THIS button at the bottom-right corner of the specific post.  Thanks!
