#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2013
    Posts
    4
    Rep Power
    0

    Regex exclude word in string without numbers


    Hello,

    I've got a list of filenames of my Acronis backup, e.g.
    txt_KW10.tib
    txt_KW102.tib
    txt_KW103.tib
    txt_KW11.tib
    txt_KW112.tib
    txt_KW113.tib

    Now I want to find the files that match the current calendar week, and I use this regex:
    Code:
    ^[a-zA-Z_]{1,}_KW11\d{0,2}\.tib$
    The txt_KW11x.tib files are then renamed without the "_KWxx" part, which gives me a new list of files:
    txt.tib
    txt2.tib
    txt3.tib
    txt_KW10.tib
    txt_KW102.tib
    txt_KW103.tib

    After the backup, the files should be renamed back to include "_KWxx". To do that, I want to find the files that do not contain "_KW" but otherwise have the same name format. I thought about a negative lookahead and something like this:
    Code:
    ^[a-zA-Z_]{1,}(?!_KW)\d{2,4}\.tib$
    But that doesn't match my filenames at all. Then I found this
    Code:
    ^((?!_KW).)*$
    and tried to extend with my string definitions:
    Code:
    ^((?!_KW).)*\d{2,4}\.tib$
    But I want to allow only 2-4 digits before ".tib", and this pattern allows an unlimited number of digits, because the "." in the first group also matches digits.
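A quick check of the tempered pattern above, sketched with Python's `re` module (an assumption, since the thread has not named a language at this point; lookaheads and `\d` behave the same way here in Perl and .NET). The name `txt123456.tib` is made up for illustration. It shows that `\d{2,4}` only constrains the last digits before ".tib", while the group in front can absorb any further digits.

```python
import re

# Pattern from the post: the group may consume any character
# as long as "_KW" does not start at that position.
pattern = re.compile(r"^((?!_KW).)*\d{2,4}\.tib$")

# "txt123456.tib" is a hypothetical name with more than 4 digits.
names = ["txt11.tib", "txt123456.tib", "txt_KW11.tib", "txt.tib"]
results = {name: bool(pattern.search(name)) for name in names}

for name, matched in results.items():
    print(name, matched)
```

The six-digit name is accepted because the first group takes "txt1234" and leaves "56" for `\d{2,4}`.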

    Isn't there a simple inversion, e.g. the filename begins with any non-digit characters, MUST NOT be followed by "_KW", and after that there should be 2-4 digits and then ".tib"?

    I would be glad if someone knows a solution!
    Thanks in advance.
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    833
    Rep Power
    496
    I did not really understand your whole process, but going by your last sentence, maybe something like this:

    - begins with any characters but not digit characters: use a negated character class: ^[^\d]
    - MUST NOT be followed by "_KW": possibly a zero-width negative look-ahead assertion, if your language supports it: (?!_KW)
    - after that there should be 2-4 digits and then ".tib": \d{2,4}\.tib
  4. #3
    Originally Posted by Laurent_R
    I did not really understand your whole process, but going by your last sentence, maybe something like this:

    - begins with any characters but not digit characters: use a negated character class: ^[^\d]
    - MUST NOT be followed by "_KW": possibly a zero-width negative look-ahead assertion, if your language supports it: (?!_KW)
    - after that there should be 2-4 digits and then ".tib": \d{2,4}\.tib
    Would this be the string?
    ^[^\d](?!_KW)\d{2,4}\.tib$

    That doesn't match
    txt11.tib

    Any other idea?
  6. #4
    Of course it will not match, because it looks for a single non-digit followed immediately by 2 to 4 digits. Actually, the negative look-ahead assertion does not really make sense here.
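To illustrate the point about the look-ahead, here is a small sketch in Python's `re` module (an assumption; the thread itself uses Perl and PowerShell). It compares the pattern from the previous post with the same pattern minus `(?!_KW)`. The names `t11.tib` and `t_KW11.tib` are made up for the comparison. Since the character after the single non-digit has to be a digit anyway, it can never be the start of "_KW", so both patterns should give the same answers.

```python
import re

with_lookahead = re.compile(r"^[^\d](?!_KW)\d{2,4}\.tib$")
without_lookahead = re.compile(r"^[^\d]\d{2,4}\.tib$")

# "t11.tib" and "t_KW11.tib" are hypothetical test names.
names = ["txt11.tib", "t11.tib", "t_KW11.tib"]
comparison = [
    (name, bool(with_lookahead.search(name)), bool(without_lookahead.search(name)))
    for name in names
]

for name, a, b in comparison:
    print(name, a, b)
```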

    The problem is that your description of what you want is insufficient.

    Maybe you want this (session under the Perl debugger):

    Code:
      DB<1>  $_ ='txt11.tib';
    
      DB<2>  print "true" if /^[^\d](?!_KW)\w{2}\d{2,4}\.tib$/
    true
    So, this works. But the problem is that I described the string as "a non-digit, followed by 2 word characters, followed by 2 to 4 digits, etc.". This is not your original description. When describing regexes, you need to be very precise about what you need.
  8. #5
    If the string contains "_KX" instead of "_KW", it does not match, even though the first part is not followed by "_KW", e.g.
    txt_KX11.tib

    Do I need to allow underscore in the first part?

    What do you mean is not precise in my description:
    Filename begins with any characters but not digit characters, MUST NOT be followed by "_KW", and after that there should be 2-4 digits and then ".tib"
  10. #6
    Because, as I said, I defined the expression as "a non-digit, followed by 2 word characters, followed by 2 to 4 digits", and now you have more characters (six) before the 2 to 4 digits. You really need to specify exactly what you need, especially how many non-digits there are at the beginning, how many characters come before the characters that must differ from "_KW", etc.

    For example,

    Code:
     /^[^\d]{3}(?!_KW)\w{2,5}\d{2,4}\.tib$/
    will match "txt_KX11.tib" but not "txt_KW11.tib", but I am still not sure this is actually what you need.
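The claim can be checked with a short sketch in Python's `re` module (an assumption; the pattern itself is unchanged from the post). It also tries `txt11.tib` from the earlier post, which this stricter pattern appears to reject because `\w{2,5}` uses up the two digits.

```python
import re

pattern = re.compile(r"^[^\d]{3}(?!_KW)\w{2,5}\d{2,4}\.tib$")

names = ["txt_KX11.tib", "txt_KW11.tib", "txt11.tib"]
matches = {name: bool(pattern.search(name)) for name in names}

for name, matched in matches.items():
    print(name, matched)
```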

    Actually, you did not say which language you are working in, but, if possible in your context, the simplest approach would be to use two regexes in succession: one to exclude filenames containing "_KW", and one to match the rest of what you need, applied only if the first regex did not exclude the string.
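A minimal sketch of the two-regex idea, written in Python's `re` module as an assumption (the same two steps should carry over to other regex engines). The helper name `is_renamed_backup` is hypothetical, and the second pattern is only one reading of the description in the thread: one or more non-digits, then 2-4 digits, then ".tib".

```python
import re

def is_renamed_backup(name: str) -> bool:
    """Hypothetical helper: no "_KW" anywhere, then non-digits + 2-4 digits + ".tib"."""
    # Step 1: exclude every name that contains "_KW".
    if re.search(r"_KW", name):
        return False
    # Step 2: check the shape of the remaining names.
    return re.search(r"^[^\d]+\d{2,4}\.tib$", name) is not None

names = ["txt11.tib", "txt_KX11.tib", "txt_KW11.tib", "txt2.tib"]
accepted = [n for n in names if is_renamed_backup(n)]
print(accepted)
```

Splitting the test this way avoids the backtracking surprises of a single pattern with a look-ahead, at the cost of two passes over each name.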
  12. #7
    The number of non-digit characters at the beginning is at least 1. Therefore I would specify {1,} after the first part:
    Code:
    ^[^\d]{1,}(?!_KW)\w{2,5}\d{2,4}\.tib$
    But then it matches considerably more.
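One likely reason for the extra matches is backtracking: with `{1,}` the engine may shorten the leading non-digit run until the look-ahead no longer sits directly in front of "_KW". The sketch below uses Python's `re` module (an assumption) and adds two capture groups, which are not in the original pattern, purely to show where the split ends up.

```python
import re

fixed = re.compile(r"^[^\d]{3}(?!_KW)\w{2,5}\d{2,4}\.tib$")
# Same pattern with {1,}; the capture groups are added only for inspection.
open_ended = re.compile(r"^([^\d]{1,})(?!_KW)(\w{2,5})\d{2,4}\.tib$")

name = "txt_KW11.tib"
fixed_match = fixed.search(name)
open_match = open_ended.search(name)

print(fixed_match)          # no match with exactly 3 leading non-digits
print(open_match.groups())  # how the open-ended pattern split the name
```

Here the first group ends up as "txt_", so the look-ahead is tested in front of "KW11" rather than "_KW11" and passes.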

    I'm programming in PowerShell with the -match operator, e.g.:
    Code:
    if ("txt_KW22.tib" -match "^[^\d]{3}(?!_KW)\w{2,5}\d{2,4}\.tib$") {...}
    But thanks for the idea of using two regexes! That way I should be able to manage it.
