November 10th, 2013, 04:43 AM
-
Regex exclude word in string without numbers
Hello,
I've got a list of filenames of my Acronis backup, e.g.
txt_KW10.tib
txt_KW102.tib
txt_KW103.tib
txt_KW11.tib
txt_KW112.tib
txt_KW113.tib
Now I want to find the files that match the current calendar week, using this regex:
Code:
^[a-zA-Z_]{1,}_KW11\d{0,2}\.tib$
The txt_KW11x.tib files are then renamed without the "_KWxx" part, which gives me a new list of files:
txt.tib
txt2.tib
txt3.tib
txt_KW10.tib
txt_KW102.tib
txt_KW103.tib
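The selection and rename step described above can be sketched as follows. This is only an illustration in Python rather than PowerShell; the pattern uses constructs that Python's `re` module and the .NET engine behind `-match` handle the same way. The list `files`, the variable `week` and the helper `strip_week` are names made up for the example, and nothing is renamed on disk.

```python
import re

# File names from the post; only strings are handled, no files are touched.
files = [
    "txt_KW10.tib", "txt_KW102.tib", "txt_KW103.tib",
    "txt_KW11.tib", "txt_KW112.tib", "txt_KW113.tib",
]

week = "11"  # current calendar week, hard-coded for the example (assumption)
pattern = re.compile(r"^[a-zA-Z_]+_KW" + week + r"\d{0,2}\.tib$")


def strip_week(name):
    """Hypothetical helper: drop the first "_KW<week>" tag from a name."""
    return name.replace("_KW" + week, "", 1)


selected = [name for name in files if pattern.match(name)]
renamed = [strip_week(name) for name in selected]

print(selected)  # the three KW11 files
print(renamed)   # the same names without "_KW11"
```

With the list above this selects the three KW11 files and yields txt.tib, txt2.tib and txt3.tib, as in the post.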
After the backup the files should be renamed back to include "_KWxx". For that I want to find the files that have the same name pattern but no "_KW". I thought about a negative lookahead and tried something like this:
Code:
^[a-zA-Z_]{1,}(?!_KW)\d{2,4}\.tib$
But that doesn't match my filenames at all. Then I found this
and tried to extend it with my own string definitions:
Code:
^((?!_KW).)*\d{2,4}\.tib$
But I want to allow only 2-4 digits in the place where "_KW" must not appear, and with this pattern the name could contain an unlimited number of digits.
Isn't there a simple inversion, e.g. the filename begins with any characters except digits, MUST NOT be followed by "_KW", then there should be 2-4 digits and then ".tib"?
I would be glad if someone knows a solution!
Thanks in advance.
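For reference, the reason the last pattern accepts any number of digits is that the dot inside `((?!_KW).)*` also matches digits. One possible tightening, shown here as a sketch and not as a confirmed solution, is to replace the dot with `[^\d]`. The example is in Python for illustration (the constructs behave the same in .NET), and the test names other than the poster's are invented.

```python
import re

# Pattern from the post: the dot may also consume digits.
original = re.compile(r"^((?!_KW).)*\d{2,4}\.tib$")

# Assumed tightening: only non-digits may precede the 2-4 digits.
stricter = re.compile(r"^((?!_KW)[^\d])*\d{2,4}\.tib$")

names = ["txt11.tib", "txt123456.tib", "txt_KX11.tib", "txt_KW11.tib"]

# Map each name to (matches original, matches stricter).
results = {
    name: (bool(original.match(name)), bool(stricter.match(name)))
    for name in names
}

for name, (loose, strict) in results.items():
    print(f"{name:16} original={loose!s:5} stricter={strict}")
```

Both patterns reject txt_KW11.tib, but only the stricter one rejects txt123456.tib. Note that names such as txt.tib or txt2.tib have fewer than two digits and match neither pattern.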
November 15th, 2013, 04:14 PM
-
I did not really understand your whole process, but going by your last sentence, maybe something like this:
- begins with any characters but not digit characters: use a negated character class: ^[^\d]
- MUST NOT be followed by "_KW": possibly a zero-width negative look-ahead assertion, if your language has one: (?!_KW)
- after that there should be 2-4 digits and then ".tib": \d{2,4}\.tib
November 16th, 2013, 10:42 AM
-
Originally Posted by Laurent_R
I did not really understand your whole process, but going by your last sentence, maybe something like this:
- begins with any characters but not digit characters: use a negated character class: ^[^\d]
- MUST NOT be followed by "_KW": possibly a zero-width negative look-ahead assertion, if your language has one: (?!_KW)
- after that there should be 2-4 digits and then ".tib": \d{2,4}\.tib
Would this be the string?
^[^\d](?!_KW)\d{2,4}\.tib$
That doesn't match
txt11.tib
Any other idea?
November 16th, 2013, 04:49 PM
-
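Of course it will not match, because it looks for a single non-digit immediately followed by 2 to 4 digits. Actually, the negative look-ahead assertion does not really make sense here: the next character has to be a digit anyway, so it can never be the start of "_KW".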
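A small check makes this visible. The sketch below is in Python for illustration (the same constructs exist in Perl and .NET) and compares the pattern with and without the look-ahead; the list `names` is invented for the example.

```python
import re

with_lookahead = re.compile(r"^[^\d](?!_KW)\d{2,4}\.tib$")
without_lookahead = re.compile(r"^[^\d]\d{2,4}\.tib$")

# Illustrative names, not taken from a real directory.
names = ["txt11.tib", "t11.tib", "t_KW11.tib", "x1234.tib"]

# Map each name to (matches with look-ahead, matches without it).
outcome = {
    name: (bool(with_lookahead.match(name)), bool(without_lookahead.match(name)))
    for name in names
}

for name, (a, b) in outcome.items():
    print(f"{name:12} with={a!s:5} without={b}")
```

Both patterns give the same answer for every name: only a single non-digit followed directly by 2 to 4 digits is accepted, so txt11.tib fails with or without the look-ahead.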
The problem is that your description of what you want is not precise enough.
Maybe you want this (a session under the Perl debugger):
Code:
DB<1> $_ = 'txt11.tib';
DB<2> print "true" if /^[^\d](?!_KW)\w{2}\d{2,4}\.tib$/
true
So, this works. But the problem is that I have now described the string as "a non-digit, followed by 2 word characters, followed by 2 to 4 digits, etc.". This is not your original description. When writing regexes, you need to be very precise about what you need.
November 20th, 2013, 04:01 PM
-
If the string contains "_KX" instead of "_KW", it does not match, even though the first part is not followed by "_KW", e.g.
txt_KX11.tib
Do I need to allow an underscore in the first part?
What exactly is not precise in my description?
The filename begins with any characters except digits, MUST NOT be followed by "_KW", then there should be 2-4 digits and then ".tib".
November 21st, 2013, 08:00 AM
-
Because, as I said, I defined the expression as "a non-digit, followed by 2 word characters, followed by 2 to 4 digits", and now you have more characters (six) before the 2 to 4 digits. You really need to specify exactly what you need, especially how many non-digits there are at the beginning, how many characters come before the characters that must be different from "_KW", etc.
For example,
Code:
/^[^\d]{3}(?!_KW)\w{2,5}\d{2,4}\.tib$/
will match "txt_KX11.tib" but not "txt_KW11.tib", but I am still not sure this is actually what you need.
Also, you did not say which language you are working in, but if it is possible in your context, the simplest approach would be to use two regexes in succession: one to exclude filenames containing "_KW", and one to match the rest of what you need, applied only if the first regex did not exclude the string.
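The two-regex idea could look roughly like this. It is a sketch in Python for illustration only; the second pattern assumes the requirement is "one or more non-digits, then 2 to 4 digits, then .tib", which has not been confirmed, and `is_candidate` is a made-up name.

```python
import re

# Step 1: anything containing "_KW" is excluded.
has_week_tag = re.compile(r"_KW")

# Step 2 (assumed requirement): non-digits, then 2-4 digits, then ".tib".
wanted = re.compile(r"^[^\d]+\d{2,4}\.tib$")


def is_candidate(name):
    """Hypothetical helper: True if the name has no "_KW" and fits step 2."""
    if has_week_tag.search(name):
        return False
    return wanted.match(name) is not None


for name in ["txt11.tib", "txt_KX11.tib", "txt_KW11.tib", "txt_KW102.tib"]:
    print(name, is_candidate(name))
```

Keeping the exclusion separate means the second pattern needs no look-ahead at all, so it stays easy to adjust once the exact requirement is clear.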
November 23rd, 2013, 12:46 PM
-
The number of non-digit characters at the beginning is at least 1. Therefore I would specify {1,} after the first part:
Code:
^[^\d]{1,}(?!_KW)\w{2,5}\d{2,4}\.tib$
But then it matches considerably more than it should.
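The extra matches come from backtracking: with a variable-length first part, the engine can shorten it until the look-ahead no longer sits directly in front of "_KW". The sketch below (Python for illustration, capture groups added only to show the split) demonstrates this, together with one assumed alternative that checks for "_KW" once at the start of the string.

```python
import re

# The pattern from above, with groups added to show how the name is split.
loose = re.compile(r"^([^\d]+)(?!_KW)(\w{2,5})(\d{2,4})\.tib$")

# Assumed alternative: reject "_KW" anywhere, checked once at the start.
anchored = re.compile(r"^(?!.*_KW)[^\d]+\d{2,4}\.tib$")

m = loose.match("txt_KW11.tib")
split = m.groups() if m else None
print(split)  # the first part backs off to "txt_", so "_KW" is never tested

anchored_results = {
    name: bool(anchored.match(name))
    for name in ["txt_KW11.tib", "txt_KX11.tib", "txt11.tib"]
}
print(anchored_results)
```

Here the loose pattern accepts txt_KW11.tib by splitting it into "txt_", "KW" and "11", while the anchored variant rejects it and still accepts txt_KX11.tib and txt11.tib.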
I'm programming in PowerShell with the -match operator, e.g.:
Code:
if ('txt_KW22.tib' -match '^[^\d]{3}(?!_KW)\w{2,5}\d{2,4}\.tib$') {...}
But thanks for the idea of using two regexes! With that I should be able to manage it.