### Thread: Another RegEx file path question....

1. No Profile Picture
Registered User
Devshed Newbie (0 - 499 posts)

Join Date
Jul 2010
Posts
26
Rep Power
0

#### SOLVED Another RegEx file path question....

So what gives in this code? Why is there no match?

Code:
import re

file = "D:\path\to\file"

regEx = re.compile(r'(%s)' % file)
sea = regEx.search("D:\path\to\file").group()

if sea:
    print(sea)
else:
    print("doesn't match")
Last edited by dogdaynoon; June 20th, 2013 at 12:21 AM. Reason: sovled
2. Registered User
Or here is another piece:
Code:
file2 = "ab\\\\c"

RegEx = re.compile(r'%s' % file2)
sea = RegEx.search("ab\c")
The above matches.
3. No Profile Picture
Contributing User
Devshed Newbie (0 - 499 posts)

Join Date
Oct 2012
Posts
194
Rep Power
3
Your problem is string escaping. In regular strings "\" is a special character (e.g. "\n" is a newline, "\t" is a tab, etc.). In your path, "\t" and "\f" are turned into a tab and a form feed before re ever sees them.

In regular expressions it is also a special character. A raw string stops Python from interpreting the backslashes, but the re module still treats backslash as special when it reads the pattern.

For example:
Code:
import re

regEx = re.compile(r'\w+')
sea = regEx.search("friday")

if sea:
    print(sea.group())
else:
    print("doesn't match")
Result:
Code:
>>>
friday
>>>
So even in a raw-string regular expression, you still need to escape a backslash to match it literally.
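To make the two layers concrete, here is a small sketch (the variable names are illustrative and not from the thread). Python processes the string literal first, and the regex engine then interprets whatever characters are left.

```python
import re

# Layer 1: the string literal. A normal literal turns "\t" into a tab;
# a raw literal keeps the backslash and the "t" as two characters.
normal = "a\tb"
raw = r"a\tb"
print(len(normal), len(raw))  # 3 4

# Layer 2: the regex engine. Given the raw text a\tb as a pattern, re
# reads "\t" as "match a tab", so it matches the 3-character string.
print(bool(re.search(raw, normal)))  # True

# To match a literal backslash, the pattern itself must contain two
# backslashes, which a raw literal spells as r"\\".
literal_backslash = re.compile(r"a\\tb")
print(bool(literal_backslash.search(raw)))     # True
print(bool(literal_backslash.search(normal)))  # False
```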

This for instance works:
Code:
import re

myfile = r"D:\\path\\to\\file"

regEx = re.compile(r'(%s)' % myfile)
sea = regEx.search(r"D:\path\to\file")

if sea:
    print(sea.group())
else:
    print("doesn't match")
However, the much better solution is to not use backslashes in file paths. Either use a regular "/" (yes, this still works on Windows), or when possible use os.path.join().

-Mek
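An option not mentioned in the thread is the standard library's `re.escape`, which escapes regex metacharacters (backslash included) so a string can be used as a literal pattern. A minimal sketch, assuming the path is written as a raw literal; the sentence being searched is made up for illustration:

```python
import re

# Raw literal, so the string holds single backslashes exactly as typed.
path = r"D:\path\to\file"

# re.escape backslash-escapes regex metacharacters, including "\" itself,
# so the resulting pattern matches the path text literally.
pattern = re.compile("(%s)" % re.escape(path))

match = pattern.search(r"copied D:\path\to\file yesterday")
if match:
    print(match.group())
else:
    print("doesn't match")
```

This avoids doubling the backslashes by hand, and the same call works for paths that come from variables rather than literals.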
4. Registered User
Thank you so much.
5. Registered User
Okay, here is what I have now...
Code:
import os
import re
from functions import *

#declare the lists we will be working with
fullpaths = []
full_xmlpaths = []

#regex for checking xml file extension
xml_ext = re.compile(r'(\.xml)$')

#create lists of all the file names

for (root, dirs, files) in os.walk("testDir"):
    if files:
        for file in files:
            file_fullpath = os.path.abspath(os.path.join(root, file))
            is_xml = xml_ext.search(file_fullpath)
            if is_xml:
                file_fullpath = file_fullpath.partition(".")
                file_fullpath = file_fullpath[0]
                full_xmlpaths.append(file_fullpath)
            else:
                fullpaths.append(file_fullpath)

#compare 2 lists and if xml file matches any other file type name ie... path\file.xml will match path\file.asp then return match.

for file in full_xmlpaths:
    i = 0
    if i < len(fullpaths):
        regEx = re.compile(r'(%s)' % file)
        match = regEx.search(fullpaths[i])
        if match:
            match = match.group()
            print(file + " : " + match)
        else:
            i += 1
I have built the lists using os.path.join, but I still get no match.
Do you see where I have gone wrong?
I split the XML file path at the first "." because I wanted
that list to reflect everything except the file extension.
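A few likely causes, offered as a reading of the code above rather than a confirmed diagnosis: on Windows, os.path.join and os.path.abspath produce backslashes, which the regex engine treats as escapes when the path is used as a pattern; `if i < len(fullpaths)` runs only once per XML file, so only `fullpaths[0]` is ever checked; and `partition(".")` cuts at the first dot anywhere in the path, including dots in directory names. The sketch below sidesteps regular expressions by comparing `os.path.splitext` stems. The function name `find_xml_siblings` and the sample paths are hypothetical.

```python
import os

def find_xml_siblings(paths):
    """Return (xml_path, other_path) pairs that differ only by extension."""
    by_stem = {}
    for path in paths:
        # splitext splits at the last dot of the final path component,
        # so dots in directory names are left alone.
        stem, ext = os.path.splitext(path)
        by_stem.setdefault(stem, []).append((ext.lower(), path))

    pairs = []
    for entries in by_stem.values():
        xml_files = [p for ext, p in entries if ext == ".xml"]
        others = [p for ext, p in entries if ext != ".xml"]
        for xml_path in xml_files:
            for other in others:
                pairs.append((xml_path, other))
    return pairs

# Hypothetical sample paths, standing in for the os.walk results.
sample = ["site/a.xml", "site/a.asp", "site/b.xml", "site.v2/c.asp"]
for xml_path, other in find_xml_siblings(sample):
    print(xml_path + " : " + other)
```

Note that this requires the stems to be equal, which is stricter than the regex search in the post: a file such as `a_old.asp` would not be paired with `a.xml`.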
6. Contributing User
However, the much better solution is to not use backslashes in file paths. Either use regular "/" (yes, this will still work on windows), or when possible use os.path.join().
I meant this more as general advice than specifically pertaining to your issue here. If the strings in your data set you are trying to parse all use backslashes, you might not have any choice but to match the slashes explicitly.

I'm honestly no expert on regular expressions so you would be better served by someone who was.

-Mek
7. Registered User
Originally Posted by Mekire
I meant this more as general advice than specifically pertaining to your issue here. If the strings in your data set you are trying to parse all use backslashes, you might not have any choice but to match the slashes explicitly.

I'm honestly no expert on regular expressions so you would be better served by someone who was.

-Mek
Well, I appreciate your time. I ended up having to cycle through the list of XML files and build a new list where every \ is replaced with \\.
Ridiculous, I know, and I am sure there is a better way to do this, but it beats me.
Anyway, here is the code.
Code:
# pass a list of file paths to this and it replaces \'s with \\'s
return ch_list
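The body of that function is cut off in the post, so the following is only a guess at the idea, using hypothetical names (`double_backslashes`, `sample`, `lookalike`). It also illustrates one reason `re.escape` may be the safer general tool: doubling backslashes leaves other metacharacters such as "." active.

```python
import re

def double_backslashes(paths):
    """Return copies of paths with every backslash doubled (a guess at the post's helper)."""
    return [p.replace("\\", "\\\\") for p in paths]

sample = r"C:\site\page.v2"      # hypothetical path containing a dot
lookalike = r"C:\site\pageXv2"   # differs only where the dot was

manual = re.compile(double_backslashes([sample])[0])
escaped = re.compile(re.escape(sample))

# With manual doubling the "." is still a wildcard, so the lookalike matches.
print(bool(manual.search(lookalike)))   # True
# re.escape also escapes ".", so only the real path matches.
print(bool(escaped.search(lookalike)))  # False
print(bool(escaped.search(sample)))     # True
```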