#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2004
    Posts
    40
    Rep Power
    11

    How do I add carriage returns to a fixed-width text file?


    I have a large fixed-width text file that I need to import into an Access database. I therefore need to add a carriage return to the text file after every 52 characters.

    Thanks in advance.
  2. #2
  3. (retired)
    Devshed Supreme Being (6500+ posts)

    Join Date
    Dec 2003
    Location
    The Laboratory
    Posts
    10,101
    Rep Power
    0
    What type of carriage return? Why do you need Python?

    I'm guessing you're taking a file from Linux to Windows? If so, it probably already has line breaks, just not the Windows/DOS ones. You can easily use the Linux programs sed or tr to translate them.

    If you REALLY want to do it in Python, you could just replace all the Linux line breaks with DOS/Windows ones.
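The replacement mentioned above can be sketched in a few lines. This is only an illustration written for Python 3; the helper name to_crlf is made up for the example and is not something from this thread.

```python
def to_crlf(text):
    """Return text with every line break written as CRLF (Windows style)."""
    # Normalise first so existing CRLF pairs are not doubled into "\r\r\n".
    unix_text = text.replace("\r\n", "\n")
    return unix_text.replace("\n", "\r\n")


sample = "first\nsecond\r\nthird"
converted = to_crlf(sample)
print(repr(converted))
```

Note that this only helps when the file already contains line breaks of some kind; it does nothing for a file that is one continuous line.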
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2004
    Posts
    40
    Rep Power
    11
    I receive the file as one continuous text file, and I need to add the carriage returns so that I can import it into an Access database. Here's a sample of the text file I receive:

    1023FLEETWOOD 01015620800249500000003***041023FLEETWOOD 01015620800249500000003***041023FLEETWOOD 01015681000199500000003 041023FLEETWOOD 01015681000199500000003 041023FLEETWOOD 01015681100199500000003 041023FLEETWOOD 01015681100199500000003 041023FLEETWOOD 01015721000149500000003 04
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2004
    Location
    Regensburg, Germany
    Posts
    147
    Rep Power
    17
    If the line segments are always of the same length, you could do something like this:
    Code:
    import string

    seg_len = 52
    lines = []
    for i in range(0, len(in_text), seg_len):
        lines.append(in_text[i:i + seg_len])  # one 52-character segment per slice
    out_text = string.join(lines, "\n")
    Or if you like compact code:
    Code:
    seg_len = 52
    out_text = string.join(map(lambda x : in_text[x:x+seg_len],
                               range(0, len(in_text), seg_len)), "\n")
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2004
    Posts
    40
    Rep Power
    11
    I tried the following:


    >>> file = open('C:\\documents and settings\\barrym\desktop\\dave\\bar01.txt','r+')
    >>> for i in range(0, len(file.read()), seg):
    lines.append(file[i:i+seg])
    out = open('c:\\documents and settings\\barrym\\desktop\\dave\\bar01out.txt')
    out = string.join(lines, "\n")


    and got the following error:

    Traceback (most recent call last):
    File "<pyshell#59>", line 2, in -toplevel-
    lines.append(file[i:i+seg])
    TypeError: unsubscriptable object
  10. #6
  11. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    In your code, file is a file() object, but what you actually want is a string. The error happens because no data has been read into a variable yet: file.read() is called only to compute the length, and its result is then thrown away. You can read the data using either the read() or readlines() methods; however, since the file contains only one continuous line, you should use the read() method.

    There could also be other errors here, but since Python raises errors one at a time you'll have to fix them as they arise, if there are any (some other possible errors would include lines and string being undefined).

    Anyway here's a working example that you should be able to follow:

    Code:
    >>> fileContents = file('Sample.txt', 'r').read()
    >>> lines = []
    >>> for index in range(0, len(fileContents), 52):
    ...     lines.append(fileContents[index:index + 52])
    ... 
    >>> lines
    ['1023FLEETWOOD 01015620800249500000003***04\xc21023FLEET', 'WOOD 01015620800249500000003***04\xc21023FLEETWOOD 0101', '5681000199500000003 04\xc21023FLEETWOOD 010156810001995', '00000003 04\xc21023FLEETWOOD 01015681100199500000003 04', '\xc21023FLEETWOOD 01015681100199500000003 04\xc21023FLEETW', 'OOD 01015721000149500000003 04']
    >>> '\n'.join(lines)
    '1023FLEETWOOD 01015620800249500000003***04\xc21023FLEET\nWOOD 01015620800249500000003***04\xc21023FLEETWOOD 0101\n5681000199500000003 04\xc21023FLEETWOOD 010156810001995\n00000003 04\xc21023FLEETWOOD 01015681100199500000003 04\n\xc21023FLEETWOOD 01015681100199500000003 04\xc21023FLEETW\nOOD 01015721000149500000003 04'
    >>> 
    >>> [fileContents[index:index + 52] for index in range(0, len(fileContents), 52)]
    ['1023FLEETWOOD 01015620800249500000003***04\xc21023FLEET', 'WOOD 01015620800249500000003***04\xc21023FLEETWOOD 0101', '5681000199500000003 04\xc21023FLEETWOOD 010156810001995', '00000003 04\xc21023FLEETWOOD 01015681100199500000003 04', '\xc21023FLEETWOOD 01015681100199500000003 04\xc21023FLEETW', 'OOD 01015721000149500000003 04']
    >>>
    The second example is a list comprehension; however, this could also be a generator expression. In either case, this is a more readable way to write compact code, if needed.
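As a rough end-to-end sketch of the generator expression variant, here is one way the whole job could look in Python 3, including writing the result back out. The function name add_line_breaks, its parameters, and the latin-1 encoding (chosen so that every byte counts as exactly one character) are assumptions made for this example.

```python
def add_line_breaks(in_path, out_path, width=52, encoding="latin-1"):
    """Copy in_path to out_path, starting a new line every `width` characters."""
    with open(in_path, "r", encoding=encoding) as source:
        contents = source.read()

    # Generator expression: slices are produced one at a time, no list is built.
    records = (contents[i:i + width] for i in range(0, len(contents), width))

    # newline="\r\n" makes Python write each "\n" as a Windows-style CRLF.
    with open(out_path, "w", encoding=encoding, newline="\r\n") as target:
        for record in records:
            target.write(record + "\n")
```

It would be called as add_line_breaks('bar01.txt', 'bar01out.txt'), with whatever paths apply. The whole file is still read into memory first, which should be fine unless the file is very large.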

    Please read the sticky at the top of this forum regarding how to ask a question. I point this out because Python relies heavily on indentation.

    Hope this helps,

    Mark.

    programming language development: www.netytan.com Hula

  12. #7
  13. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Posts
    624
    Rep Power
    35
    It seems that I keep coming across uses for a function that splits a string into pieces of a set length. Is there nothing like this in the standard library?

    "123123123"[::3] isn't right. I keep imagining there's something in itertools to take lumps from a string (take 5, take 5, take 5, etc), but there isn't.

    There has to be a neater way than:
    Code:
    >>> for index in range(0, len(fileContents), 52):
    ...     lines.append(fileContents[index:index + 52])
    Surely?
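One compact alternative, offered only as a sketch, is a regular expression that matches up to N characters at a time. The helper name split_fixed is invented for this example, and the code is written for Python 3. (Python 3.12 and later also ship itertools.batched, but it yields tuples of characters rather than strings.)

```python
import re


def split_fixed(text, width):
    """Split text into consecutive pieces of at most `width` characters."""
    # re.DOTALL lets "." match newlines too, so no character is skipped.
    pattern = ".{1,%d}" % width
    return re.findall(pattern, text, flags=re.DOTALL)


print(split_fixed("123123123", 3))   # ['123', '123', '123']
print(split_fixed("abc123a", 3))     # ['abc', '123', 'a']
```

Whether this is neater than the slicing loop is a matter of taste; the slice version is arguably easier to read.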
  14. #8
  15. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Posts
    624
    Rep Power
    35
    OK...

    Code:
    def splitByLength(sequence, length, pad=None):
        """ Splits a sequence into chunks of a specified length.
            If the length doesn't evenly fit in the sequence, return
            either the last bit, or pad it out with the padding 
            character """
    
        seqLen = len(sequence)
        (start, end) = (0, length)
        while end <= seqLen:
            yield sequence[start:end]
            (start, end) = (end, end+length)
    
        remainder = sequence[start:]
        if pad:
            missing = length-len(remainder)
            yield remainder + pad*missing
        else:
            yield remainder
    
    
    >>> test = "123abc123abc123abc"
    >>> for token in splitByLength(test, 3):
    ...     print token
    ...
    123
    abc
    123
    abc
    123
    abc

    >>> for token in splitByLength(test, 6):
    ...     print token
    ...
    123abc
    123abc
    123abc

    >>> for token in splitByLength(test, 7):
    ...     print token
    ...
    123abc1
    23abc12
    3abc

    >>> for token in splitByLength(test, 7, "-"):
    ...     print token
    ...
    123abc1
    23abc12
    3abc---
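One edge case in the generator above: when the length divides the sequence evenly, sequence[start:] is empty, so a trailing empty string is yielded (or a chunk made purely of padding when pad is given). A minimal sketch with a guard against that, written as a Python 3 port under the made-up name split_by_length:

```python
def split_by_length(sequence, length, pad=None):
    """Yield chunks of `length` items; optionally pad a short final chunk."""
    seq_len = len(sequence)
    start, end = 0, length
    while end <= seq_len:
        yield sequence[start:end]
        start, end = end, end + length

    remainder = sequence[start:]
    if remainder:  # skip the empty tail when length divides evenly
        if pad:
            remainder += pad * (length - len(remainder))
        yield remainder


print(list(split_by_length("123abc123abc123abc", 6)))
print(list(split_by_length("123abc123abc123abc", 7, "-")))
```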
  16. #9
  17. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Posts
    624
    Rep Power
    35
    Even better if it was integrated into string.split()...

    Code:
    class newSplitString(str):
        def split(self, param1=None, pad=None):
            if not param1:
                return str.split(self)
    
            elif type(param1) == type('') and pad == None:
                return str.split(self, param1)
    
            elif type(param1) == type(1):
                length = param1
                tokens = []
                seqLen = len(self)
    
                (start, end) = (0, length)
    
                while end <= seqLen:
                    tokens.append(self[start:end])
                    (start, end) = (end, end+length)
    
                remainder = self[start:]
                if remainder:
                    if pad:
                        missing = length-len(remainder)
                        tokens.append(remainder + pad*missing)
                    else:
                        tokens.append(remainder)
                return tokens
                    
            else:
                raise TypeError("split() expects a separator string or an integer length")
    
    >>> test = newSplitString("123abc123abc123abc")
    >>> print test
    123abc123abc123abc
    
    >>> print test.split()
    ['123abc123abc123abc']
    
    >>> print test.split(3)
    ['123', 'abc', '123', 'abc', '123', 'abc']
    
    >>> print test.split(6)
    ['123abc', '123abc', '123abc']
    
    >>> print test.split(7)
    ['123abc1', '23abc12', '3abc']
    
    >>> print test.split(7, "*")
    ['123abc1', '23abc12', '3abc***']
  18. #10
  19. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    Originally Posted by sfb
    OK...

    Code:
    def splitByLength(sequence, length, pad=None):
        """ Splits a sequence into chunks of a specified length.
            If the length doesn't evenly fit in the sequence, return
            either the last bit, or pad it out with the padding 
            character """
    
        seqLen = len(sequence)
        (start, end) = (0, length)
        while end <= seqLen:
            yield sequence[start:end]
            (start, end) = (end, end+length)
    
        remainder = sequence[start:]
        if pad:
            missing = length-len(remainder)
            yield remainder + pad*missing
        else:
            yield remainder
    I would use the ljust() string method to avoid doing the padding by hand; this roughly halves the amount of code in the function. I agree with you that the range() call doesn't look very tidy, but IMO it does look cleaner than the equivalent while loop.

    Part of the problem with this kind of thing is that Python's range() function doesn't call the __len__ method of its argument, so we are forced to call len() ourselves to get the length; hopefully this will change in the future.

    Anyway, here's the example function:

    Code:
    def chop(targetString, length, paddingCharacter = None):
        for index in xrange(0, len(targetString), length):
            #Iterates over the length of the targetString by length. Slices the
            #targetString into a segment of index to index + length and assigns
            #the results to the result variable.
            result = targetString[index:index + length]
            
            if paddingCharacter:
                #If a paddingCharacter was supplied then pad the result string to
                #length. Yields the value of result.
                yield result.ljust(length, paddingCharacter)
            else:
                yield result
    Note: this function can be used just like splitByLength() in sfb's example above.
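For anyone on Python 3, where xrange() no longer exists, a port of chop() might look like the sketch below. The snake_case parameter names are just a restyling for the example; the logic is meant to be the same.

```python
def chop(target_string, length, padding_character=None):
    """Python 3 port of chop(): yield slices of `length` characters."""
    for index in range(0, len(target_string), length):
        result = target_string[index:index + length]
        if padding_character:
            # ljust() only pads when result is shorter than length.
            result = result.ljust(length, padding_character)
        yield result


for token in chop("123abc123abc123abc", 7, "-"):
    print(token)
# 123abc1
# 23abc12
# 3abc---
```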

    Take care,

    Mark.
    Last edited by netytan; January 29th, 2005 at 04:03 PM.

  20. #11
  21. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Posts
    624
    Rep Power
    35
    I would use the ljust() string method to avoid doing the padding myself.
    ... ah, you're using Python 2.4 then. In 2.3, ljust() only pads with spaces. You're also making a call to ljust() for every token... but it does look much neater.

    Code:
    result = targetString[index:index + length]
    I was expecting that to crash if the string wasn't an even multiple of the length, so:

    "abc123a"

    would print

    "abc"
    "123"
    Error: List index out of bounds.

    Maybe I should have tested it first...
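The reason it doesn't crash is that slicing clamps to the end of the string, while plain indexing does not. A quick sketch (Python 3 syntax) that shows the difference:

```python
text = "abc123a"

# Slices that run past the end are simply cut short (or come back empty).
print(repr(text[6:9]))    # 'a'
print(repr(text[9:12]))   # ''

# Indexing a single position past the end is what raises.
try:
    text[9]
except IndexError as error:
    print("IndexError:", error)
```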

    I agree with you that the range statement doesn't look very tidy! But it does look cleaner than the equivalent while loop IMO.
    I used while instead of for to avoid building a new list, but I suppose that comes under 'premature optimisation', really.
  22. #12
  23. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2004
    Posts
    40
    Rep Power
    11
    Thanks for all the posts, very helpful indeed.
  24. #13
  25. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2004
    Posts
    40
    Rep Power
    11
    Found this very helpful article on file management in Python:

    http://www.devshed.com/c/a/Python/Fi...ent-in-Python/
