    #1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    25
    Rep Power
    0

    Newbie read .txt file question.


    I would like to be able to pull out parts of a text file and output them as a CSV file. Here's a sample of the text file below. There's some structure to the text file: each record is enclosed in {}. Could someone please explain how I open the text file and then, for example, extract the field after name and the number after age? I've looked at parsing the file or using regex, but with little knowledge I am struggling to know which is the best way forward or how to implement it. Thank you in advance for any help. (Using Python 3.)

    Code:
    {"id":"24443102","club":"318615","no":"1","ban":"0","ban_points":"1","inj":null,"name":"Mahmoud El-Shazly","routine":"45.9","retire":"0","nat":"<img src=\"\/pics\/flags\/gradient\/eg.png\" alt=\"Flag [eg]\" class=\"flag\"\/>","age":"32.00","months":"0","fp":"D C","asi":25390,"country":"eg","str":17,"sta":15,"pac":16,"mar":19,"tac":18,"wor":13,"pos":15,"pas":13,"cro":6,"tec":6,"hea":17,"fin":5,"lon":5,"set":12,"han":0,"one":0,"ref":0,"ari":0,"jum":0,"com":0,"kic":0,"thr":0,"trans":0,"wage":"730283","rec":8,"gp":6,"goals":0,"assists":0,"productivity":0,"rat":"4.00","mom":0,"cards":1,"ga":0,"scout":"4","txt":"","plot":[1,2,2,2,2,1,1,1,1,0,"1","0","-3","1","-3","-2","-4"],"status":"","js_name":"Mahmoud El-Shazly","ti":"-4","ti_dif":-2},{"id":"31663569","club":"318615","no":"5","ban":"0","ban_points":"0","inj":null,"name":"Bobby Roberts","routine":"34.9","retire":"0","nat":"<img src=\"\/pics\/flags\/gradient\/en.png\" alt=\"Flag [en]\" class=\"flag\"\/>","age":"30.08","months":"8","fp":"DM R","asi":65217,"country":"en","str":16,"sta":16,"pac":17,"mar":16,"tac":17,"wor":16,"pos":17,"pas":20,"cro":16,"tec":16,"hea":6,"fin":6,"lon":8,"set":15,"han":0,"one":0,"ref":0,"ari":0,"jum":0,"com":0,"kic":0,"thr":0,"trans":0,"wage":"1818320","rec":9,"gp":6,"goals":0,"assists":2,"productivity":2,"rat":"5.00","mom":0,"cards":0,"ga":0,"scout":"4","txt":"","plot":[0,1,2,1,2,1,1,0,1,1,1,1,0,1,1,0,1,"1","0","1","0","-1","0","0"],"status":"","js_name":"Bobby Roberts","ti":"0","ti_dif":0},{"id":"77782637","club":"318615","no":"6","ban":"0","ban_points":"0","inj":null,"name":"J\u00f3zef Barna","routine":"9.7","retire":"0","nat":"<img src=\"\/pics\/flags\/gradient\/pl.png\" alt=\"Flag [pl]\" class=\"flag\"\/>","age":"20.10","months":"10","fp":"M\/OM 
R","asi":2750,"country":"pl","str":12,"sta":9,"pac":13,"mar":12,"tac":11,"wor":13,"pos":6,"pas":13,"cro":9,"tec":7,"hea":4,"fin":4,"lon":8,"set":7,"han":0,"one":0,"ref":0,"ari":0,"jum":0,"com":0,"kic":0,"thr":0,"trans":0,"wage":"56809","rec":5,"gp":0,"goals":0,"assists":0,"productivity":0,"rat":"0.00","mom":0,"cards":0,"ga":0,"scout":"4","txt":"","plot":[16,"16","17","18","17","19","17","19"],"status":"","js_name":"J\u00f3zef Barna","ti":"19","ti_dif":"+2"}
  2. #2
  3. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2012
    Location
    39N 104.28W
    Posts
    158
    Rep Power
    3
    Let's say your file is called, "textfile.txt". To open it:
    Code:
    fid=open("textfile.txt")
    Now "fid" is a file object associated with that file. That means it's also an iterator, so you can easily loop over the lines.
    Code:
    for line in fid:
    The structure of the file you have posted is very much like a Python dictionary. However, you would have to use some dangerous code to just treat it as such, so let's not.

    As you read each line, strip off the line feed and the "{" and the "}"
    Code:
    line=line.strip("\n{}")
    Now you could do some fancy splitting and you could use regular expressions, but just to get 2 values out I don't think that's necessary. Take the "name" field. Find where "name": is:
    Code:
    indx=line.find("\"name\":")
    Now the actual name is at indx plus the length of "name": or indx+7, and until
    Code:
    indx2=line.find(",",indx)
    so
    Code:
    name=line[indx+7:indx2]
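    and similarly for age, but you'll have to strip the surrounding quotes and convert the string to a number. The ages look like 32.00, so use float(string); int(string) raises a ValueError on a string that contains a decimal point.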
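The steps above can be pulled together into one small function. This is only a sketch under some assumptions: extract_field is a hypothetical helper name, the sample string is a shortened copy of the first record, and the value is assumed to be a quoted string with no escaped quotes inside it. It searches for the closing quote instead of the next comma, so a value that contains a comma is not cut short.

```python
def extract_field(record_text, key):
    """Return the quoted string value that follows "key": in record_text."""
    marker = '"' + key + '":"'          # e.g. "name":" including the opening quote
    start = record_text.find(marker)
    if start == -1:
        return None                     # key not present in this record
    start += len(marker)                # first character of the value
    end = record_text.find('"', start)  # closing quote of the value
    return record_text[start:end]


# Shortened copy of the first record from the sample data.
sample = ('{"id":"24443102","name":"Mahmoud El-Shazly","routine":"45.9",'
          '"age":"32.00","months":"0"}')

name = extract_field(sample, "name")
age = float(extract_field(sample, "age"))  # "32.00" needs float(), not int()
print(name, age)
```

Fields whose values are not quoted in the data (for example "asi":25390) would need a different marker, so this only covers string fields such as name and age.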
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    25
    Rep Power
    0
    Thank you very much rrashkin for the prompt and helpful reply. I will try to implement what you have advised.
  6. #4
  7. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,855
    Rep Power
    481

    Bad advice


    Bad advice removed. See following posts.
    Last edited by b49P23TIvg; December 18th, 2012 at 11:24 AM. Reason: Dangerous advice caveat.
    [code]Code tags[/code] are essential for python code and Makefiles!
  8. #5
  9. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2012
    Location
    39N 104.28W
    Posts
    158
    Rep Power
    3
    @b49P23TIvg: on other forums I have frequently been admonished (in the harshest of terms!) never to use eval (although I do like to use it myself).

    Comments on this post

    • b49P23TIvg agrees : OK, I'll change my post.
    • Dietrich agrees : eval() got its bad rep because it was used in Python 2's input(). Bad users could enter a sys or os command that could give misery. For normal use eval() is great!
  10. #6
  11. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    25
    Rep Power
    0
    I now have the following code, which works as intended. (Thank you.) How do I go on to the next lines in the text file? I guess I need to set up some kind of loop? Thank you again for all the help.
    Code:
    fid=open("test.txt")
    for line in fid:
        line=line.strip("\n{}")
        indx=line.find("\"name\":")
        indx2=line.find(",",indx)
        name=line[indx+8:indx2-1]
        indx1=line.find("\"age\":")
        indx3=line.find(",",indx1)
        age=line[indx1+7:indx3-1]
        print(name, age)
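In the sample data several records sit on the same line, separated by },{, so a loop over lines alone would only report the first record on each line. One way to reach the others, sketched here under the assumption that every value of interest is a quoted string, is to keep calling find() with a start position just past the previous match. The function name find_all_values and the shortened two-record string are made up for the illustration.

```python
def find_all_values(text, key):
    """Collect every quoted value that follows "key": anywhere in text."""
    marker = '"' + key + '":"'
    values = []
    pos = text.find(marker)
    while pos != -1:
        start = pos + len(marker)
        end = text.find('"', start)     # closing quote of this value
        if end == -1:
            break                       # malformed tail, stop searching
        values.append(text[start:end])
        pos = text.find(marker, end)    # continue after the value just read
    return values


# Two shortened records on a single line, as in the sample file.
text = ('{"name":"Mahmoud El-Shazly","age":"32.00","js_name":"Mahmoud El-Shazly"},'
        '{"name":"Bobby Roberts","age":"30.08","js_name":"Bobby Roberts"}')

names = find_all_values(text, "name")
ages = [float(a) for a in find_all_values(text, "age")]
for n, a in zip(names, ages):
    print(n, a)
```

Because the marker starts with a double quote, "js_name" is not mistaken for "name".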
  12. #7
  13. Banned ;)
    Devshed Supreme Being (6500+ posts)

    Join Date
    Nov 2001
    Location
    Woodland Hills, Los Angeles County, California, USA
    Posts
    9,625
    Rep Power
    4247
    You guys are complicating things a little. The data is in JSON format, which is easy to parse, as python already has a built-in module to do this.

    1. Just add a pair of [ ] around the input data so that it is a proper JSON array with 3 records. This is what I have in input.json:
    Code:
    [
    {"id":"24443102","club":"318615","no":"1","ban":"0","ban_points":"1","inj":null,"name":"Mahmoud El-Shazly","routine":"45.9","retire":"0","nat":"<img src=\"\/pics\/flags\/gradient\/eg.png\" alt=\"Flag [eg]\" class=\"flag\"\/>","age":"32.00","months":"0","fp":"D C","asi":25390,"country":"eg","str":17,"sta":15,"pac":16,"mar":19,"tac":18,"wor":13,"pos":15,"pas":13,"cro":6,"tec":6,"hea":17,"fin":5,"lon":5,"set":12,"han":0,"one":0,"ref":0,"ari":0,"jum":0,"com":0,"kic":0,"thr":0,"trans":0,"wage":"730283","rec":8,"gp":6,"goals":0,"assists":0,"productivity":0,"rat":"4.00","mom":0,"cards":1,"ga":0,"scout":"4","txt":"","plot":[1,2,2,2,2,1,1,1,1,0,"1","0","-3","1","-3","-2","-4"],"status":"","js_name":"Mahmoud El-Shazly","ti":"-4","ti_dif":-2},{"id":"31663569","club":"318615","no":"5","ban":"0","ban_points":"0","inj":null,"name":"Bobby Roberts","routine":"34.9","retire":"0","nat":"<img src=\"\/pics\/flags\/gradient\/en.png\" alt=\"Flag [en]\" class=\"flag\"\/>","age":"30.08","months":"8","fp":"DM R","asi":65217,"country":"en","str":16,"sta":16,"pac":17,"mar":16,"tac":17,"wor":16,"pos":17,"pas":20,"cro":16,"tec":16,"hea":6,"fin":6,"lon":8,"set":15,"han":0,"one":0,"ref":0,"ari":0,"jum":0,"com":0,"kic":0,"thr":0,"trans":0,"wage":"1818320","rec":9,"gp":6,"goals":0,"assists":2,"productivity":2,"rat":"5.00","mom":0,"cards":0,"ga":0,"scout":"4","txt":"","plot":[0,1,2,1,2,1,1,0,1,1,1,1,0,1,1,0,1,"1","0","1","0","-1","0","0"],"status":"","js_name":"Bobby Roberts","ti":"0","ti_dif":0},{"id":"77782637","club":"318615","no":"6","ban":"0","ban_points":"0","inj":null,"name":"J\u00f3zef Barna","routine":"9.7","retire":"0","nat":"<img src=\"\/pics\/flags\/gradient\/pl.png\" alt=\"Flag [pl]\" class=\"flag\"\/>","age":"20.10","months":"10","fp":"M\/OM 
R","asi":2750,"country":"pl","str":12,"sta":9,"pac":13,"mar":12,"tac":11,"wor":13,"pos":6,"pas":13,"cro":9,"tec":7,"hea":4,"fin":4,"lon":8,"set":7,"han":0,"one":0,"ref":0,"ari":0,"jum":0,"com":0,"kic":0,"thr":0,"trans":0,"wage":"56809","rec":5,"gp":0,"goals":0,"assists":0,"productivity":0,"rat":"0.00","mom":0,"cards":0,"ga":0,"scout":"4","txt":"","plot":[16,"16","17","18","17","19","17","19"],"status":"","js_name":"J\u00f3zef Barna","ti":"19","ti_dif":"+2"}
    ]
    I merely added [ and ] around the data in the file

    Next, here's the code I used to parse it:
    Code:
    #!/usr/bin/python
    
    import json
    
    fp = open('input.json', 'r')
    json_obj = json.load(fp)
    fp.close()
    
    #import pprint
    for record in json_obj:
        print(record[u'name'] + " " + record[u'age'])
        #pprint.pprint(record)
    I've commented out the code that calls pprint, but you can uncomment it to see what the structure of each record is like.

    What json.load() does is read a file and convert its contents into a Python object (in this case, a list of dictionaries, one per record).

    Then we simply use a for loop to go through the list and print out the values of specific dictionary keys. Note that the keys are unicode strings, which is why they are written as u'name' and u'age' instead of 'name' and 'age'. This is because JSON is supposed to work in unicode per the spec. (In Python 3 every str is already unicode, so plain 'name' and 'age' work just as well there.) To convert to ASCII keys, see http://stackoverflow.com/questions/9...ork-with-ascii for details.

    The nice thing about this approach is that it is very clean and parses all 3 records correctly. Hope this helps.
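Since the original goal was CSV output, the parsed records can be handed to the standard csv module. The following is a minimal sketch, not a finished script: the two-record JSON string is a shortened stand-in for the real file, and the output goes to an in-memory buffer so the example runs without touching the disk.

```python
import csv
import io
import json

# Shortened stand-in for the bracketed contents of input.json.
raw = (r'[{"name":"Mahmoud El-Shazly","age":"32.00"},'
       r'{"name":"J\u00f3zef Barna","age":"20.10"}]')
records = json.loads(raw)

out = io.StringIO()                 # in-memory text buffer instead of a file
writer = csv.writer(out)
writer.writerow(["name", "age"])    # header row
for record in records:
    writer.writerow([record["name"], record["age"]])

csv_text = out.getvalue()
print(csv_text)
```

To write a real file instead, the buffer could be replaced with something like open("players.csv", "w", newline="", encoding="utf-8"), where players.csv is just a placeholder name.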
    Last edited by Scorpions4ever; December 18th, 2012 at 01:59 PM.
    Up the Irons
    What Would Jimi Do? Smash amps. Burn guitar. Take the groupies home.
    "Death Before Dishonour, my Friends!!" - Bruce D ickinson, Iron Maiden Aug 20, 2005 @ OzzFest
    Down with Sharon Osbourne

    "I wouldn't hire a butcher to fix my car. I also wouldn't hire a marketing firm to build my website." - Nilpo
  14. #8
  15. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    25
    Rep Power
    0
    Thank you Scorpions for a different solution. I will look into this tomorrow.
  16. #9
  17. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    25
    Rep Power
    0
    Hi Scorpions,

    I tried your code but I receive the following errors. I'm not sure what I've done wrong, but I guess it must be something to do with the data?
    Thank you for the help.

    Code:
    Traceback (most recent call last):
      File "C:/Python33/tester1.py", line 5, in <module>
        json_obj = json.load(fp)
      File "C:\Python33\lib\json\__init__.py", line 264, in load
        parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
      File "C:\Python33\lib\json\__init__.py", line 309, in loads
        return _default_decoder.decode(s)
      File "C:\Python33\lib\json\decoder.py", line 352, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "C:\Python33\lib\json\decoder.py", line 370, in raw_decode
        raise ValueError("No JSON object could be decoded")
    ValueError: No JSON object could be decoded
  18. #10
  19. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2007
    Location
    Joensuu, Finland
    Posts
    436
    Rep Power
    67
    Originally Posted by skyblues
    I tried your code but I receive the following errors. Not sure what I've done wrong but guess it must be something to do with data?
    You didn't read Scorpion's post through and omitted the first step: the need to insert brackets around the data.

    If you can't or don't want to tamper with the data in the file, you might try adding the brackets before parsing:

    Code:
    import json

    # Read the whole file and wrap it in [ ] so the records form one JSON array.
    with open('test.dat', 'r') as f:
        buff = '[' + f.read() + ']'
    json_obj = json.loads(buff)

    for record in json_obj:
        print(record[u'name'] + " " + record[u'age'])
    My armada: openSUSE 13.1 (home desktop, home laptop), Crunchbang Linux 11 (work laptop), Trisquel GNU/Linux 6.0.1 (mini laptop), Ubuntu 14.04 LTS (server), Android 4.2.1 (tablet), Windows 7 Ultimate (testbed)
  20. #11
  21. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    25
    Rep Power
    0
    SuperOscar, thank you for the reply. I double-checked, and I have included [ at the beginning and ] at the end of the text file. I must be doing something else wrong. As I explained in the opening post, I'm new to Python, so I could easily be making a silly error somewhere else. I will try your solution. Thank you all again for your patience and expert advice whilst dealing with a beginner.
  22. #12
  23. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    25
    Rep Power
    0
    SuperOscar I receive a similar error message when I try your code.

    Does the text file need to be saved in a certain format, or am I barking up the wrong tree?

    Here's the error code:
    Traceback (most recent call last):
    File "C:\Python33\tester1.py", line 5, in <module>
    json_obj = json.loads(buff)
    File "C:\Python33\lib\json\__init__.py", line 309, in loads
    return _default_decoder.decode(s)
    File "C:\Python33\lib\json\decoder.py", line 352, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    File "C:\Python33\lib\json\decoder.py", line 370, in raw_decode
    raise ValueError("No JSON object could be decoded")
    ValueError: No JSON object could be decoded

    I hope I'm not testing everyone's patience with these newbie errors. Thank you all.
  24. #13
  25. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2007
    Location
    Joensuu, Finland
    Posts
    436
    Rep Power
    67
    Originally Posted by skyblues
    SuperOscar I receive a similar error message when I try your code.
    Well, it's hard to say from afar. I copied and pasted the data you provided in the first post in this thread into a file and used the code I provided, and the output was:

    Code:
    Mahmoud El-Shazly 32.00
    Bobby Roberts 30.08
    Józef Barna 20.10
    So I guess the data in your file differs somehow from what you posted here.
  26. #14
  27. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    25
    Rep Power
    0
    Thank you for the reply, SuperOscar. I've taken the data from my original post and still receive the same errors. If I save the data as an ANSI file I do receive fewer errors.

    Traceback (most recent call last):
    File "C:\Python33\tester1.py", line 8, in <module>
    print(record[u'name'] + " " + record[u'age'])
    TypeError: list indices must be integers, not str

    Would it be to do with the version of Python I'm using? (3.3.0)
  28. #15
  29. Banned ;)
    Devshed Supreme Being (6500+ posts)

    Join Date
    Nov 2001
    Location
    Woodland Hills, Los Angeles County, California, USA
    Posts
    9,625
    Rep Power
    4247
    ANSI file?? What exactly are you using to edit your files? That might help explain what's going on.
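While waiting on that answer, here is one possible reading of the two errors, offered as a guess and not a diagnosis. Windows Notepad's "UTF-8" option tends to write a byte order mark at the start of the file, which the json module cannot parse, and that could explain "No JSON object could be decoded". The later TypeError is what you would see if the file already contains [ ] and the script wraps it in a second pair: each record is then a list, not a dictionary. The sketch below assumes both causes; load_records is a hypothetical helper, not part of the json module.

```python
import json


def load_records(text):
    """Parse text into a list of records, tolerating a BOM and missing [ ]."""
    # Drop a byte order mark if one survived decoding, plus stray whitespace.
    text = text.lstrip('\ufeff').strip()
    # Only add brackets when the data is not already a JSON array;
    # wrapping twice would produce a list that contains one list.
    if not text.startswith('['):
        text = '[' + text + ']'
    return json.loads(text)


bare = '{"name":"Bobby Roberts","age":"30.08"},{"name":"Mahmoud El-Shazly","age":"32.00"}'
wrapped = '\ufeff[' + bare + ']'   # same data with a BOM and brackets already added

for record in load_records(wrapped):
    print(record['name'] + " " + record['age'])
```

When reading the real file, something like open('test.txt', encoding='utf-8-sig') should drop the byte order mark during decoding, before the text ever reaches load_records.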