### Thread: How do i add carriage returns to fixed width text file

1. No Profile Picture
Contributing User
Devshed Newbie (0 - 499 posts)

Join Date
Nov 2004
Posts
40
Rep Power
11

#### How do i add carriage returns to fixed width text file

I have a large fixed-width text file that I need to import into an Access database, so I need to add a carriage return to the text file after every 52 characters.

2. What type of carriage return? Why do you need Python?

I'm guessing you're taking a file from Linux to Windows? If so, it probably already has line breaks, just not the Windows/DOS ones. You can easily use the Linux programs sed or tr to translate them.

If you REALLY want to do it in Python, you could just replace all the Linux line breaks with DOS/Windows ones.
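A minimal sketch of that replacement, assuming the text has already been read into a string. The sample value and the names `unix_text` and `dos_text` are made up for illustration:

```python
# Hypothetical sample text with Unix-style line breaks (LF only).
unix_text = "first line\nsecond line\nthird line\n"

# Normalise any existing CRLF pairs first so they are not doubled,
# then turn every LF into a DOS/Windows CRLF pair.
dos_text = unix_text.replace("\r\n", "\n").replace("\n", "\r\n")

print(repr(dos_text))
```

One thing to watch: when a file is written in text mode on Windows, Python already turns "\n" into "\r\n" by itself, so writing `dos_text` that way would double the carriage returns. In Python 3, opening the output file with `newline=""` switches that translation off.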
3.
I receive the file as one continuous text file and I need to add the carriage returns so that I can import it into an Access database. Here's a sample of the text file I receive, with ¬ marking where the line breaks should be:

1023FLEETWOOD 01015620800249500000003***04¬1023FLEETWOOD 01015620800249500000003***04¬1023FLEETWOOD 01015681000199500000003 04¬1023FLEETWOOD 01015681000199500000003 04¬1023FLEETWOOD 01015681100199500000003 04¬1023FLEETWOOD 01015681100199500000003 04¬1023FLEETWOOD 01015721000149500000003 04
4. No Profile Picture
Contributing User
Devshed Newbie (0 - 499 posts)

Join Date
Dec 2004
Location
Regensburg, Germany
Posts
147
Rep Power
17
If the line segments are always of the same length, you could do something like this:
Code:
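seg_len = 52
lines = []
# Step through the text in seg_len-sized strides and collect each slice.
for i in range(0, len(in_text), seg_len):
    lines.append(in_text[i:i + seg_len])
# Glue the segments back together with a line break between each.
out_text = "\n".join(lines)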
Or if you like compact code:
Code:
seg_len = 52
out_text = "\n".join(map(lambda x: in_text[x:x + seg_len],
                         range(0, len(in_text), seg_len)))
5.
i tried the following

>>> file = open('C:\\documents and settings\\barrym\desktop\\dave\\bar01.txt','r+')
>>> for i in range(0, len(file.read()), seg):
lines.append(file[i:i+seg])
out = open('c:\\documents and settings\\barrym\\desktop\\dave\\bar01out.txt')
out = string.join(lines, "\n")

and got the following error

Traceback (most recent call last):
File "<pyshell#59>", line 2, in -toplevel-
lines.append(file[i:i+seg])
TypeError: unsubscriptable object
6. In your code, file is a file object, but what you actually want is a string. This happens because no data has been read into a variable yet: file.read() is called inside len(), but its result is thrown away, so file[i:i+seg] tries to slice the file object itself. You can read the data using either the read() or readlines() methods; however, since the file contains only one continuous line, you should use the read() method.

There could also be other errors here, but since Python raises errors one at a time you'll have to fix them as they arise (other possible errors include lines and string being undefined).

Anyway here's a working example that you should be able to follow:

Code:
>>> fileContents = file('Sample.txt', 'r').read()
>>> lines = []
>>> for index in range(0, len(fileContents), 52):
...     lines.append(fileContents[index:index + 52])
...
>>> lines
['1023FLEETWOOD 01015620800249500000003***04\xc21023FLEET', 'WOOD 01015620800249500000003***04\xc21023FLEETWOOD 0101', '5681000199500000003 04\xc21023FLEETWOOD 010156810001995', '00000003 04\xc21023FLEETWOOD 01015681100199500000003 04', '\xc21023FLEETWOOD 01015681100199500000003 04\xc21023FLEETW', 'OOD 01015721000149500000003 04']
>>> '\n'.join(lines)
'1023FLEETWOOD 01015620800249500000003***04\xc21023FLEET\nWOOD 01015620800249500000003***04\xc21023FLEETWOOD 0101\n5681000199500000003 04\xc21023FLEETWOOD 010156810001995\n00000003 04\xc21023FLEETWOOD 01015681100199500000003 04\n\xc21023FLEETWOOD 01015681100199500000003 04\xc21023FLEETW\nOOD 01015721000149500000003 04'
>>>
>>> [fileContents[index:index + 52] for index in range(0, len(fileContents), 52)]
['1023FLEETWOOD 01015620800249500000003***04\xc21023FLEET', 'WOOD 01015620800249500000003***04\xc21023FLEETWOOD 0101', '5681000199500000003 04\xc21023FLEETWOOD 010156810001995', '00000003 04\xc21023FLEETWOOD 01015681100199500000003 04', '\xc21023FLEETWOOD 01015681100199500000003 04\xc21023FLEETW', 'OOD 01015721000149500000003 04']
>>>
The second example is a list comprehension; however, this could also be a generator expression. In either case, this is a more readable way to write compact code, if needed.
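To get from the list of segments to an actual output file, something along these lines might work. This is only a sketch under a few assumptions: it is written for Python 3 rather than the Python 2 used elsewhere in this thread, the function name `split_fixed_width` is made up, and the demo data is a throwaway file rather than the real bar01.txt:

```python
import os
import tempfile

SEG_LEN = 52  # record width from the original question


def split_fixed_width(in_path, out_path, seg_len=SEG_LEN):
    """Read one continuous line and write it back out as seg_len-wide records."""
    with open(in_path, "r") as src:
        contents = src.read()  # read() returns the whole file as one string

    records = [contents[i:i + seg_len] for i in range(0, len(contents), seg_len)]

    # newline="\r\n" makes Python write each "\n" as a Windows line ending,
    # whatever platform the script runs on.
    with open(out_path, "w", newline="\r\n") as dst:
        dst.write("\n".join(records) + "\n")
    return len(records)


# Demonstration on a throwaway file (made-up data, not the poster's file).
tmp_dir = tempfile.mkdtemp()
in_path = os.path.join(tmp_dir, "bar01.txt")
out_path = os.path.join(tmp_dir, "bar01out.txt")
with open(in_path, "w") as f:
    f.write("A" * 52 + "B" * 52 + "C" * 10)

count = split_fixed_width(in_path, out_path)
print(count)
```

The last record is allowed to be shorter than 52 characters here; whether Access accepts a short final row is something to check against the real data.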

Please read the sticky at the top of this forum regarding how to ask a question. I point this out because Python relies heavily on indentation.

Hope this helps,

Mark.

7. No Profile Picture
sfb
Contributing User
Devshed Novice (500 - 999 posts)

Join Date
Nov 2003
Posts
624
Rep Power
35
It seems that I keep coming across uses for a function to split a string into set lengths. Is there nothing like this in the standard library?

"123123123"[::3] isn't right. I keep imagining there's something in itertools to take lumps from a string (take 5, take 5, take 5, etc), but there isn't.

There has to be a neater way than:
Code:
>>> for index in range(0, len(fileContents), 52):
...     lines.append(fileContents[index:index + 52])
Surely?
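For comparison, here is a small sketch of why the extended slice isn't right, plus one loop-free alternative using the re module. The regular expression route is just one possibility, not necessarily a neater one, and the sample string is made up:

```python
import re

text = "123abc123abc"

# An extended slice with a step picks every third character;
# it does not cut the string into three-character pieces.
every_third = text[::3]
print(every_third)

# A regular expression can do the chunking: ".{1,3}" matches up to
# three characters at a time, and DOTALL lets "." match newlines too.
chunks = re.findall(r".{1,3}", text, flags=re.DOTALL)
print(chunks)
```

Because the pattern is greedy, every chunk except possibly the last is the full three characters long.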
8. sfb
OK...

Code:
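def splitByLength(sequence, length, pad=None):
    """ Splits a sequence into chunks of a specified length.
    If the length doesn't evenly fit in the sequence, return
    the remainder, optionally padded out with the pad character """

    seqLen = len(sequence)
    (start, end) = (0, length)
    # Yield every full-length chunk.
    while end <= seqLen:
        yield sequence[start:end]
        (start, end) = (end, end+length)

    # Whatever is left over is shorter than length (or empty).
    remainder = sequence[start:]
    if remainder:
        missing = length-len(remainder)
        if pad:
            yield remainder + pad*missing
        else:
            yield remainder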

>>> test = "123abc123abc123abc"
>>> for token in splitByLength(test, 3):
...     print token
...
123
abc
123
abc
123
abc

>>> for token in splitByLength(test, 6):
...     print token
...
123abc
123abc
123abc

>>> for token in splitByLength(test, 7):
...     print token
...
123abc1
23abc12
3abc

>>> for token in splitByLength(test, 7, "-"):
...     print token
...
123abc1
23abc12
3abc---
9. sfb
Even better if it was integrated into string.split()...

Code:
class newSplitString(str):
    # Method signature reconstructed from the calls shown below.
    def split(self, param1=None, pad=None):
        if not param1:
            return str.split(self)

        elif type(param1) == type('') and pad is None:
            return str.split(self, param1)

        elif type(param1) == type(1):
            length = param1
            tokens = []
            seqLen = len(self)

            (start, end) = (0, length)

            while end <= seqLen:
                tokens.append(self[start:end])
                (start, end) = (end, end+length)

            remainder = self[start:]
            if remainder:
                missing = length-len(remainder)
                if pad:
                    tokens.append(remainder + pad*missing)
                else:
                    tokens.append(remainder)

            return tokens

        else:
            raise Exception("eh?")

>>> test = newSplitString("123abc123abc123abc")
>>> print test
123abc123abc123abc

>>> print test.split()
['123abc123abc123abc']

>>> print test.split(3)
['123', 'abc', '123', 'abc', '123', 'abc']

>>> print test.split(6)
['123abc', '123abc', '123abc']

>>> print test.split(7)
['123abc1', '23abc12', '3abc']

>>> print test.split(7, "*")
['123abc1', '23abc12', '3abc***']
10. Originally Posted by sfb
OK...

Code:
def splitByLength(sequence, length, pad=None):
    """ Splits a sequence into chunks of a specified length.
    If the length doesn't evenly fit in the sequence, return
    the remainder, optionally padded out with the pad character """

    seqLen = len(sequence)
    (start, end) = (0, length)
    while end <= seqLen:
        yield sequence[start:end]
        (start, end) = (end, end+length)

    remainder = sequence[start:]
    if remainder:
        missing = length-len(remainder)
        if pad:
            yield remainder + pad*missing
        else:
            yield remainder
I would use the ljust() string method to avoid doing the padding myself. This pretty much halves the amount of code in the function. I agree with you that the range statement doesn't look very tidy! But it does look cleaner than the equivalent while loop IMO.

Part of the problem with this kind of thing is that Python's range() function doesn't call the __len__ method, so we are forced to use the len() function to get the length manually; hopefully this will change in the future.

Anyway, here's the example function:

Code:
def chop(targetString, length, paddingCharacter = None):
    for index in xrange(0, len(targetString), length):
        #Iterates over the length of the targetString by length. Slices the
        #targetString into a segment of index to index + length and assigns
        #the results to the result variable.
        result = targetString[index:index + length]

        #If paddingCharacter is given, pads result with it up to
        #length. Yields the value of result.
        if paddingCharacter:
            yield result.ljust(length, paddingCharacter)
        else:
            yield result
Note: this function can be used just like splitByLength() in sfb's example above.

Take care,

Mark.
Last edited by netytan; January 29th, 2005 at 03:03 PM.
11. sfb
"I would use the ljust() string method to avoid doing the padding myself."
... ah, you're using Python 2.4 then. In 2.3, it only pads with spaces. You're also making a call to ljust for every token... but it does look much neater.
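A quick illustration of the two forms of ljust(), using a made-up leftover token. The fill-character argument is the one that needs Python 2.4 or later:

```python
remainder = "3abc"

# With only a width, ljust() pads on the right with spaces.
padded_spaces = remainder.ljust(7)

# With a fill character, it pads with that character instead.
padded_dashes = remainder.ljust(7, "-")

# A token that is already long enough comes back unchanged, so calling
# ljust() on every token only alters the final, short one.
full_token = "123abc1".ljust(7, "-")

print(repr(padded_spaces))
print(repr(padded_dashes))
print(repr(full_token))
```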

Code:
result = targetString[index:index + length]
I was expecting that to crash if the string wasn't an even multiple of the length, so:

"abc123a"

would print

"abc"
"123"
Error: List index out of bounds.

Maybe I should have tested it first...
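For the record, a short sketch of what actually happens, using the same "abc123a" example. Slices are clamped to the end of the string, while a plain index past the end does raise:

```python
text = "abc123a"
length = 3

# Slicing past the end of a string is not an error: the slice is
# simply cut off at the end, so the last piece comes back short.
pieces = [text[i:i + length] for i in range(0, len(text), length)]
print(pieces)

# Indexing a single position past the end does raise IndexError.
try:
    text[9]
    raised = False
except IndexError as exc:
    raised = True
    print("IndexError:", exc)
```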

"I agree with you that the range statement doesn't look very tidy! But it does look cleaner than the equivalent while loop IMO."
I used while instead of for to avoid building a new list, but I suppose that comes under 'premature optimisation', really.
12.
thanks for all the posts, very helpful indeed
13.
found this very helpful article on file management in python

http://www.devshed.com/c/a/Python/Fi...ent-in-Python/