#1
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2004
    Location
    Dallas, TX
    Posts
    65
    Rep Power
    11

    Cleaning python code?


    I am trying to pull Python code out of an XML file, and whitespace is killing me. Does anyone know of a kind of "precompiler" that will clean up the code with the proper whitespace? Kind of like tidy, but for Python code.
  2. #2
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Feb 2004
    Location
    London, England
    Posts
    1,585
    Rep Power
    1373
    If you mean a program that will take any Python code with the indentation removed and correctly indent it, then the answer is no. In fact it is impossible, since the indentation itself defines which block each statement belongs to, and there is no way to determine the correct indentation without understanding what the program is meant to do.

    However, there used to be a script that shipped with early versions of Python (up to 2.2) that may solve your problem. It adds begin/end comments to a correctly formatted Python source file, and can then re-indent the file based on those comments. This will require your Python code to be processed before it is put into the XML file. I have dug the file out from the depths of my hard drive and attached it.

    On the other hand, XML should preserve the whitespace inside elements, so this should not be necessary. If the parser is stripping out whitespace, then use a different parser. If the problem is that the entire Python program is indented by a fixed number of spaces, then you can count the number of spaces at the start of the first line of code and remove that number of spaces from all of the lines. If you have control of the XML file creation, then you could put the Python code in a <![CDATA[ ]]> block and left-align it to ensure this is not a problem.
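A minimal sketch of the "count the spaces on the first line" idea, assuming Python 3 and the standard-library `xml.etree.ElementTree` parser. The element names `script` and `code`, the sample XML, and the helper `strip_fixed_indent` are made up for illustration; they are not from the original poster's file.

```python
import xml.etree.ElementTree as ET


def strip_fixed_indent(code):
    """Remove the first code line's leading whitespace from every line."""
    lines = code.splitlines()
    # Skip blank lines before the first real line of code.
    while lines and not lines[0].strip():
        lines.pop(0)
    if not lines:
        return ""
    first = lines[0]
    width = len(first) - len(first.lstrip(" \t"))
    prefix = first[:width]
    cleaned = []
    for line in lines:
        if line.startswith(prefix):
            cleaned.append(line[width:])
        elif not line.strip():
            cleaned.append("")    # whitespace-only line
        else:
            cleaned.append(line)  # indented less than the first line; leave as-is
    return "\n".join(cleaned).rstrip() + "\n"


# Hypothetical document: the code sits in a CDATA block, indented with the XML.
XML = """<script>
  <code><![CDATA[
    def greet(name):
        print("Hello, " + name)

    greet("world")
  ]]></code>
</script>"""

root = ET.fromstring(XML)
code = strip_fixed_indent(root.find("code").text)
compile(code, "<xml>", "exec")  # raises SyntaxError if the indentation is still wrong
```

The standard library's `textwrap.dedent` does something similar, except that it removes the whitespace common to all lines rather than trusting the first line.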

    Dave - The Developers' Coach

    Comments on this post

    • netytan agrees: Great memory, Dave.
    Attached Files
  4. #3
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2004
    Location
    Dallas, TX
    Posts
    65
    Rep Power
    11
    Thanks a bunch. I was thinking the same thing about counting the whitespace, but I wanted to make sure there was nothing out there before I implemented something myself. I will try the CDATA suggestion as well.
  6. #4
  7. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    Just a few other thoughts.

    It might require some work, but it's gotta be possible to write a better script to restore indentation than this one (no offense to anyone). As I see it, the script is stored in quite a long form, so it takes up a lot more space than you need; however, it is still human readable.

    What if we wrote it so that it works on byte code, or stored the indentation in a special tag? That way less space is needed to store the code, and we could still get an easily readable, formatted Python program.

    Another way would be to tokenize the program and reformat it that way, but this requires the indentation to be present to start with; Grim's syntax program works something like this.

    http://www.pharscape.org/cgi-bin/syntax.py
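For what it's worth, here is a rough sketch of the tokenizing idea using the standard-library `tokenize` module, assuming Python 3. It is not how Grim's program works (I have not seen its source); the helper name `indent_depths` is made up. The tokenizer emits INDENT and DEDENT tokens, so the nesting depth of each statement can be recovered and used to re-indent consistently, but only when the original indentation is still there.

```python
import io
import tokenize


def indent_depths(source):
    """Return (line_number, depth) for the first token of each logical line."""
    depths = []
    depth = 0
    at_line_start = True
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.INDENT:
            depth += 1
        elif tok.type == tokenize.DEDENT:
            depth -= 1
        elif tok.type == tokenize.NEWLINE:
            at_line_start = True  # the logical line has ended
        elif tok.type in (tokenize.NL, tokenize.COMMENT, tokenize.ENDMARKER):
            continue              # blank lines and comments carry no statement
        elif at_line_start:
            depths.append((tok.start[0], depth))
            at_line_start = False
    return depths


sample = "if x:\n\ty = 1\nz = 2\n"
depths = indent_depths(sample)
```

Each statement could then be rewritten as `"    " * depth` plus its stripped text, which normalizes mixed tabs and spaces; continuation lines and multi-line strings would need extra care.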

    All in all, if you can, I would go with the <![CDATA[ ]]> idea, since it seems the most XML-like.

    Take care,

    Mark.
    programming language development: www.netytan.com Hula

  8. #5
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2004
    Location
    Dallas, TX
    Posts
    65
    Rep Power
    11
    I tried the CDATA and got the same results. I am alright with Python, but I am not really familiar with all the libraries. I played around with using cStringIO and just counting the tabs in each line to find the smallest number of tabs. This isn't very attractive, but if I'm writing all the code to be interpreted, it works. I don't think cStringIO liked getting unicode-style strings. They are not actually Unicode, but they get interpreted as u'\t'. I will play around a bit more.
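A small sketch of the "find the smallest number of tabs" approach, assuming Python 3, where every `str` is Unicode, so the `u'\t'` issue does not come up. It uses `str.splitlines()` instead of a StringIO object; the function name `strip_common_tabs` is made up for illustration.

```python
def strip_common_tabs(code):
    """Remove the smallest leading-tab count found on any non-blank line."""
    lines = code.splitlines()
    # Count leading tabs on every line that contains something besides whitespace.
    counts = [len(line) - len(line.lstrip("\t")) for line in lines if line.strip()]
    smallest = min(counts) if counts else 0
    # Blank lines are emitted empty so stray tabs do not survive.
    return "\n".join(line[smallest:] if line.strip() else "" for line in lines)


raw = "\t\tif x:\n\t\t\ty = 1\n\n\t\tz = 2"
cleaned = strip_common_tabs(raw)
```

This only looks at tabs; code indented with spaces, or with a mix of both, would need the count to include spaces as well.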
