March 27th, 2005, 04:24 AM
Cleaning python code?
I am trying to pull python code out of an XML file and white spaces are killing me. Does anyone know of a kind of "precompiler" that will clean up the code with the proper whitespace. Kinda like tidy but for pythong code.
March 27th, 2005, 05:18 AM
If you mean a program that will take any python code with the indentation removed and correctly indent it, then the answer is no - in fact it is impossible, since there is no way to determine the correct indentation without understanding what the program is doing.
However there used to be script that shipped with early version of Python (up to 2.2) that may solve your problem. It adds begin/end comments to a correctly formatted Python source file, and then can re-indent it based on the comments. This will require your python code to be processed before it was put into the XML file. I have dug the file out from the depths of my hard drive and attached it.
On the other hand XML should preserve the whitespace of elements, so this should not be necessary. If the parser is stripping out whitespace then use a different parser. If the problem is that the entire python program is indented by a fixed number of spaces then you can count the number of spaces at the start of the first line of code, and remove that number of spaces from all of the lines. If you have control of the XML file creation then you could put the python code in a [[DATA ]] block and left align it to ensure this is not a problem.
Dave - The Developers' Coach
Comments on this post
March 27th, 2005, 05:25 AM
Thanks a bunch, I was thinking the same thing about the counting whitespaces but I wanted to make sure there was nothing out there before I implemented something myself. I will try the DATA suggestion as well.
March 27th, 2005, 06:23 AM
Just a few other thoughts.
It might require some work but its gotta be possible to write a better script to restore indentation than this one (no offense to anyone). As I see it the script is stored in quite a long form so takes up quite a lot more space than you need – however it is still human readable .
What about if we wrote it so that it works on byte code, or stored the indentation in a special tag. This way less space is needed to store the code and we could get an easily readable, formatted Python program.
Another way would be to tokenize the program and reformat it that way but this requires the presence of indentation to start with; Grim's syntax program works something like this.
All in all if you can I would go with the [[DATA]] idea since it seems the most like XML.
March 28th, 2005, 01:25 AM
I tried the CDATA and got the same results. I am alright with Python but I am not real familiar with all the libraries. I played around with using cStringIO and just counting the tabs in each line and finding the least amount of tabs. This isn't very attractive but if I'm writing all the code to be interpreted, it works. I don't think cStringIO liked getting unicode style strings. not actually unicode but they get interpreted as u'\t'. I will play around a bit more.