#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2010
    Posts
    26
    Rep Power
    0

    SOLVED Another RegEx file path question....


    So what gives in this code? Why is there no match?

    Code:
    import re 
    
    file = "D:\path\to\file" 
    
    regEx = re.compile(r'(%s)' % file) 
    sea = regEx.search("D:\path\to\file").group()
    
    if sea: 
    	print(sea) 
    else: 
    	print("doesn't match")
    Last edited by dogdaynoon; June 20th, 2013 at 12:21 AM. Reason: solved
  2. #2
  3. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2010
    Posts
    26
    Rep Power
    0
    Or here is another piece:
    Code:
    file2 = "ab\\\\c"
    
    RegEx = re.compile(r'%s' % file2)
    sea = RegEx.search("ab\c")
    The above matches.
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2012
    Posts
    194
    Rep Power
    3
    Your problem is string escaping. In regular strings "\" is a special character (e.g. "\n" is a newline, "\t" is a tab, etc.).

    It is also a special character in regular expressions. A raw string stops Python from interpreting the backslashes, but the re module still gives each backslash its own special meaning inside the pattern.

    For example:
    Code:
    regEx = re.compile(r'\w+')
    sea = regEx.search("friday")
    
    if sea:
    	print(sea.group())
    else:
    	print("doesn't match")
    Result:
    Code:
    >>> 
    friday
    >>>
    So even in a raw-string regular expression, you still need to escape any backslash that is meant to match literally.
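To see the two layers separately, it can help to look at what each string really contains. This is only a small sketch with made-up variable names, not code from the thread:

```python
import re

# Layer 1: in a normal literal, Python turns \t into a single tab character.
plain = "a\tb"
# In a raw literal the backslash and the 't' stay as two separate characters.
raw = r"a\tb"

print(len(plain))  # 3
print(len(raw))    # 4

# Layer 2: inside a pattern, the re module treats a backslash as an escape,
# so a literal backslash is written as \\ even in a raw string.
pattern = re.compile(r"D:\\path\\to\\file")
text = r"D:\path\to\file"
print(pattern.search(text).group())  # D:\path\to\file
```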

    This for instance works:
    Code:
    import re
    
    myfile = r"D:\\path\\to\\file"
    
    regEx = re.compile(r'(%s)' % myfile)
    sea = regEx.search(r"D:\path\to\file")
    
    if sea:
    	print(sea.group())
    else:
    	print("doesn't match")
    However, a much better solution is to avoid backslashes in file paths. Either use a regular "/" (yes, this still works on Windows) or, when possible, use os.path.join().
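If the path does have to go into a regular expression, one option worth knowing about is re.escape() from the standard library, which escapes the special characters for you. A minimal sketch, assuming a made-up path and search text:

```python
import re

path = r"D:\path\to\file"   # example path only, not from a real system

# re.escape() returns a pattern that matches the given string literally,
# so the backslashes do not need to be doubled by hand.
pattern = re.compile("(%s)" % re.escape(path))

match = pattern.search(r"copy of D:\path\to\file.asp")
found = match.group(1) if match else None
print(found)
```

Checking the match before calling group() also avoids an AttributeError when nothing is found.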

    -Mek
  6. #4
  7. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2010
    Posts
    26
    Rep Power
    0
    Thank you so much.
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2010
    Posts
    26
    Rep Power
    0
    Okay, here is what I have now...
    Code:
    import os
    import re
    from functions import *
    
    #declare the lists we will be working with
    fullpaths = []
    full_xmlpaths = []
    
    
    
    #regex for checking xml file extension
    xml_ext = re.compile(r'(\.xml)$')
    
    #create lists of all the file names
    
    for (root, dirs, files) in os.walk("testDir"):
    	if files:
    		for file in files:
    			file_fullpath = os.path.abspath(os.path.join(root, file))
    			is_xml = xml_ext.search(file_fullpath)
    			if is_xml:
    				file_fullpath = file_fullpath.partition(".")
    				file_fullpath = file_fullpath[0]
    				full_xmlpaths.append(file_fullpath)
    			else: 
    				fullpaths.append(file_fullpath)
    				
    #compare 2 lists and if xml file matches any other file type name ie... path\file.xml will match path\file.asp then return match.
    
    for file in full_xmlpaths:
    	i = 0
    	if i < len(fullpaths):
    		regEx = re.compile(r'(%s)' % file)
    		match = regEx.search(fullpaths[i])
    		if match:
    			match = match.group()
    			print(file + " : " + match)		
    		else:
    			i += 1
    I have built the lists using os.path.join but I still get no match.
    Do you see where I have gone wrong?
    I split the xml file path at the first "." because I wanted
    that list to hold everything except the file extension.
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2012
    Posts
    194
    Rep Power
    3
    However, a much better solution is to avoid backslashes in file paths. Either use a regular "/" (yes, this still works on Windows) or, when possible, use os.path.join().
    I meant this more as general advice than specifically pertaining to your issue here. If the strings in your data set you are trying to parse all use backslashes, you might not have any choice but to match the slashes explicitly.

    I'm honestly no expert on regular expressions so you would be better served by someone who was.

    -Mek
  12. #7
  13. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2010
    Posts
    26
    Rep Power
    0
    Originally Posted by Mekire
    I meant this more as general advice than specifically pertaining to your issue here. If the strings in your data set you are trying to parse all use backslashes, you might not have any choice but to match the slashes explicitly.

    I'm honestly no expert on regular expressions so you would be better served by someone who was.

    -Mek
    Well, I appreciate your time. I ended up having to cycle through the list of xml files and build a new list where every \ is replaced with \\.
    Ridiculous, I know, and I am sure there is a better way to do this, but it beats me.
    Anyway, here is the code.
    Code:
    # Pass a list of file paths; returns a new list with every \ replaced by \\
    def add_db_bslash(paths):
    	ch_list = []
    	for path in paths:
    		# str.replace needs no import and works in Python 2 and 3
    		new_path = path.replace("\\", "\\\\")
    		ch_list.append(new_path)
    	return ch_list
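For the comparison itself, a regular expression may not be needed at all. The sketch below is one possible alternative, not the approach used in this thread: it compares the paths with their extensions removed, as plain strings, so nothing has to be escaped. The function name find_xml_twins and the sample paths are made up, and it uses os.path.splitext, which splits at the last "." rather than the first.

```python
import os

def find_xml_twins(paths):
    """Pair each .xml file with other files sharing its path minus extension."""
    xml_by_stem = {}
    others = []
    for path in paths:
        stem, ext = os.path.splitext(path)
        if ext.lower() == ".xml":
            xml_by_stem[stem] = path
        else:
            others.append(path)
    # Plain string lookup: no pattern is compiled, so backslashes are harmless.
    pairs = []
    for path in others:
        stem = os.path.splitext(path)[0]
        if stem in xml_by_stem:
            pairs.append((xml_by_stem[stem], path))
    return pairs

# Example with made-up Windows-style paths
sample = [r"D:\site\index.xml", r"D:\site\index.asp", r"D:\site\about.asp"]
pairs = find_xml_twins(sample)
print(pairs)
```

The list returned by os.walk plus os.path.join could be passed in directly in place of sample.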
