#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2007
    Posts
    50
    Rep Power
    8

    Getting raw string form of string variables


    I would like to convert string variable content to raw form.

    #What I've got.
    >>> str = "\f"
    >>> str
    '\x0c'

    #What I want.
    >>> str = r"\f"
    >>> str
    '\\f'

    #But I don't have the literal because I'm reading my str from a text file.

    name = open(sys.argv[1], 'r')
    str = name.readline()
    #Is there something like raw(str)?

    Or...is there a simpler way to say "convert this Windows path to a Unix path"?
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Posts
    624
    Rep Power
    34
    Escape sequences ("\n", "\t", "\r", ...) exist as a way of writing and displaying unprintable characters in a form humans can read.

    What creating a raw string ( r"..." ) does is switch off the interpretation of escape sequences in a string literal.


    When you read from a file this isn't a problem because files can store a newline as a newline (one character), and so can string variables. Your string contains what was in the file. It already is "in raw form".

    Or, to put it another way, the distinction between "raw form" and "string with escape sequences" happens when you cross the boundary from computer format to human readable, or human to computer.

    When reading from a file to a variable the data can stay in the same form all the way - there is no raw/not raw divide.
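To make that concrete, here is a small sketch. The temporary file and the variable names are just for illustration and are not from the original post. It writes a backslash followed by "f" to a file, reads the line back, and compares it with the string literal "a\f":

```python
import os
import tempfile

# Write the three characters a, backslash, f (plus a newline) to a temp file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a\\f\n")
    path = tmp.name

# Read the line back the same way the original script does.
with open(path, "r") as handle:
    line = handle.readline().rstrip("\n")
os.remove(path)

literal = "a\f"  # the interpreter turns \f into a single form feed here

print(len(line), repr(line))        # 3 'a\\f'
print(len(literal), repr(literal))  # 2 'a\x0c'
print(line == r"a\f")               # True
```

The string read from the file still contains a real backslash, so it already equals the raw literal r"a\f".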

    Or...is there a simpler way to say "convert this Windows path to a Unix path"?
    There isn't a simple way because it's not a simple operation - what's the unix path equivalent of "c:\windows\system32\cmd.exe"?
    Last edited by sfb; January 8th, 2008 at 04:36 PM.
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2007
    Posts
    50
    Rep Power
    8
    There isn't a simple way because it's not a simple operation - what's the unix path equivalent of "c:\windows\system32\cmd.exe"?
    Hmmm.... yes, I need to speak more plainly.

    As the rest of my post indicates, I'm concerned about the slashes. I'd like to make

    "a\b\c\d\e.aaa"

    (the Windows convention) into

    "a/b/c/d/e.aaa"

    (the Unix convention).
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2007
    Location
    Joensuu, Finland
    Posts
    430
    Rep Power
    67
    Originally Posted by auguri
    Hmmm.... yes, I need to speak more plainly.

    As the rest of my post indicates, I'm concerned about the slashes. I'd like to make

    "a\b\c\d\e.aaa"

    (the Windows convention) into

    "a/b/c/d/e.aaa"

    (the Unix convention).
    Still not quite clear, I’m afraid.

    If you already have a string like "a\b\c\d\e.aaa" somewhere, you can just use the “replace” method:

    Python Code:
    >>> s = r"a\b\c\d\e.aaa"
    >>> print(s.replace("\\", "/"))
    a/b/c/d/e.aaa


    But maybe you should take a look at the os.path module as well.
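For what it's worth, newer Python versions (3.4 and later) also ship pathlib, which was not available when this thread started. A minimal sketch, assuming the input really is a relative Windows-style path, that compares it with the plain replace approach:

```python
from pathlib import PureWindowsPath

windows_path = r"a\b\c\d\e.aaa"

# Plain string replacement, as suggested above.
by_replace = windows_path.replace("\\", "/")

# PureWindowsPath parses the string using Windows rules without touching
# the file system; as_posix() renders it with forward slashes.
by_pathlib = PureWindowsPath(windows_path).as_posix()

print(by_replace)  # a/b/c/d/e.aaa
print(by_pathlib)  # a/b/c/d/e.aaa
```

Both only change the separators. Neither knows how to map a drive letter such as "c:" onto a Unix mount point.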
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2007
    Posts
    50
    Rep Power
    8
    Ugh. Replace works. Before you get angry, hear me out. The first thing I tried was replace! To be cautious, I first tried it in the interactive window. For example:

    [Dbg]>>> bla = "a\f"
    [Dbg]>>> bla.replace("\\","/")
    'a\x0c'
    [Dbg]>>> bla
    'a\x0c'

    Replace sees \f as an escape sequence and so it's not replacing the slash. I was asking about getting variable contents into raw form because, of course, if bla = r"a\f", then bla.replace("\\","/") yields "a/f".

    But today I threw caution to the wind and tried the replace inside my code (though I thought I tried this yesterday...maybe I was looking at the wrong version of output?). Because I am getting the strings from an iterator (as opposed to assigning them as literals) replace works as desired. Though I'm still not sure I truly understand the reasons, the script is doing its job now. Here's the relevant bit:

    output = open(sys.argv[1], 'w')
    #walk through files in mounted directory
    for root, dirs, files in os.walk("Y:/"):
        root = root.replace("Y:/", "/servername/")
        root = root.replace("\\","/")

    thanks for listening.
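The two replace calls can be pulled into a small helper so they can be tried without the mounted drive. This is only a sketch: the function name and its default arguments are hypothetical, and unlike the snippet above it swaps the drive prefix only when it appears at the start of the path.

```python
def to_server_path(root, drive_prefix="Y:/", server_prefix="/servername/"):
    """Hypothetical helper: map a walked Windows directory to a Unix-style path."""
    # Swap the drive prefix only when the path actually starts with it.
    if root.startswith(drive_prefix):
        root = server_prefix + root[len(drive_prefix):]
    # Any remaining backslashes are real characters, so replace finds them.
    return root.replace("\\", "/")


print(to_server_path("Y:/"))                        # /servername/
print(to_server_path("Y:/projects\\2008\\reports")) # /servername/projects/2008/reports
```

The second call uses "\\" in the literal only because it is typed into source code. The strings os.walk hands back already contain single backslash characters.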
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Posts
    624
    Rep Power
    34
    Replace doesn't see \f as an escape sequence. The escape sequence is already gone by then, handled by the Python interpreter when it creates 'bla' for you:


    [Dbg]>>> bla = "a\f"
    Create a string, the first character is the character 'a'.
    The second character is the character '\' - wait, that indicates the start of an escape sequence... hold that thought...
    The next character is 'f' and \f is a valid escape sequence, so treat these two as one character.
    So the second character is actually "ascii formfeed character"
    End of string.

    From now on, bla is two characters: ASCII characters number 97 (a) and 12 (the form-feed instruction).

    [Dbg]>>> bla.replace("\\","/")
    'a\x0c'

    Here, you are telling replace to replace one backslash with one forward slash. In a string literal a lone forward slash is a valid character, but a lone backslash isn't, because it marks the start of an escape sequence. So you need the escape sequence which means "I didn't want an escape sequence this time, I just wanted a \ character", which is "\\".

    When replace does its work, there are no backslashes in bla, just 'a' and 'form feed', so it doesn't find what it's looking for and does nothing.

    Afterwards, Python has to show you the result of the replace, and it can't print the formfeed character. So it prints it as an escape sequence - \x0c - which is a two digit number in hexadecimal, value 0C (aka, 12).

    Which is why it looks like there's a backslash which has been ignored - but it doesn't really exist, it's only there to show you something which would otherwise be invisible, like tab, return, form feed, new line, etc.
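If it helps, the same walkthrough can be checked by looking at the character codes directly. A short sketch (the variable names other than bla are mine):

```python
bla = "a\f"   # two characters: 'a' and a form feed
raw = r"a\f"  # three characters: 'a', a backslash, 'f'

print([ord(ch) for ch in bla])  # [97, 12]
print([ord(ch) for ch in raw])  # [97, 92, 102]

# "\\" in source code is a single backslash character (code 92).
print(len("\\"), ord("\\"))     # 1 92

# There is no character 92 in bla, so replace has nothing to change.
unchanged = bla.replace("\\", "/")
print(unchanged == bla)         # True

# In the raw version the backslash is really there and gets replaced.
print(raw.replace("\\", "/"))   # a/f
```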


    This is really awkward to explain
    Last edited by sfb; January 9th, 2008 at 04:51 PM.
  12. #7
  13. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2008
    Posts
    6
    Rep Power
    0
    I think your query is related to this. ( If at all you still have the query )
    http://code.activestate.com/recipes/65211/
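I have not reproduced the linked recipe here. As a rough sketch of the general idea, and not necessarily what the recipe does, a lookup table can map control characters back to the two-character escape sequences that would have produced them. The names ESCAPES and to_raw are made up for this example.

```python
# Hypothetical mapping from control characters back to their escape spellings.
ESCAPES = {
    "\a": r"\a",
    "\b": r"\b",
    "\f": r"\f",
    "\n": r"\n",
    "\r": r"\r",
    "\t": r"\t",
    "\v": r"\v",
}


def to_raw(text):
    """Replace each known control character with its backslash spelling."""
    return "".join(ESCAPES.get(ch, ch) for ch in text)


print(to_raw("a\f"))                     # a\f
print(to_raw("a\f").replace("\\", "/"))  # a/f
```

This is a guess at what was typed, not a true inverse: it cannot tell a genuine form feed from a mistyped "\f", and it ignores octal and hex escapes. It is only needed for strings that started life as ordinary literals, since text read from a file never went through escape processing.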
  14. #8
  15. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2009
    Posts
    475
    Rep Power
    33
    When dealing with escape sequences, it usually helps to drop down to decimal/ord.
    Code:
    fname = r"\a\\b\c\d\e.aaa"
    backslash = 92
    subpath = ""
    path_list = []
    for ch in fname:
        if (backslash == ord(ch)):
            if len(subpath):
                path_list.append(subpath)
                subpath = ""
        else:
            subpath += ch
    if len(subpath):      ## final name
        path_list.append(subpath)
    
    print("/" + "/".join(path_list))
    #
    # prints
    # /a/b/c/d/e.aaa
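A shorter variant of the same idea, offered as a sketch rather than a replacement: split on the backslash character and drop the empty pieces that come from leading or doubled backslashes.

```python
fname = r"\a\\b\c\d\e.aaa"

# Splitting on a single backslash leaves empty strings wherever two
# backslashes are adjacent or the name starts with one, so filter them out.
parts = [part for part in fname.split("\\") if part]

print(parts)                  # ['a', 'b', 'c', 'd', 'e.aaa']
print("/" + "/".join(parts))  # /a/b/c/d/e.aaa
```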
    Last edited by dwblas; February 6th, 2010 at 02:25 PM.
  16. #9
  17. Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Feb 2005
    Posts
    610
    Rep Power
    65
    Some more thoughts:
    Python Code:
    # make sure you save s as a raw string
    s = r"a\b\c\d\e.aaa"

    fname = "atest.txt"
    with open(fname, "w") as fout:
        fout.write(s)

    with open(fname, "r") as fin:
        s2 = "%r" % fin.read()

    print(s2)

    s3 = s2.replace(r'\\', '/')
    print(eval(s3))

    '''result -->
    'a\\b\\c\\d\\e.aaa'
    a/b/c/d/e.aaa
    '''
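A side note on the eval step, as a sketch only: eval will run whatever expression the file content turns into, so ast.literal_eval is a more cautious stand-in when the text comes from a file you do not control. The variable text below stands in for the result of fin.read(). The last line shows that the repr round trip is optional, because the string read from the file already holds real backslashes.

```python
import ast

text = "a\\b\\c\\d\\e.aaa"  # stands in for fin.read(); single backslashes

# Same repr-based approach as above, but literal_eval only accepts literals.
quoted = "%r" % text
via_repr = ast.literal_eval(quoted.replace(r"\\", "/"))
print(via_repr)  # a/b/c/d/e.aaa

# Direct replacement on the original string gives the same answer.
direct = text.replace("\\", "/")
print(direct == via_repr)  # True
```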
    Last edited by Dietrich; July 31st, 2012 at 12:30 PM.
    Real Programmers always confuse Christmas and Halloween because Oct31 == Dec25
