Thread: Sorting .csv

    #1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    5
    Rep Power
    0

    Sorting .csv


    Hi everyone!

    I just started writing my first program to automate some processes I have to do all the time. The only programming I have ever done is one class in college (and that was VB, lol), and another part of this one that I finished already. I'm not lazy. I'm doing research on how to get this part done, but it's just so slow of a process for me without programming experience. So if you could point me to some useful links, give me some code, or just flat out write this for me that would be awesome!

    OK, I have a .csv file with five columns (number, x, y, z, name). What I would like to do is take all of the rows where the name is a, sort those based on the x column from smallest to largest, then take all of the rows where the name is b, and sort those largest to smallest. All in one file, as shown in the example.

    For example, I may start with:
    1 50 -45 1 a
    2 10 15 0 b
    3 10 -15 0 a
    4 50 45 0 b
    5 100 46 1 b
    6 100 -52 0 a

    I want to end with:
    3 10 -15 0 a
    1 50 -45 1 a
    6 100 -52 0 a
    5 100 46 1 b
    4 50 45 0 b
    2 10 15 0 b

    Any help would be greatly appreciated. Until then, I'll continue at this painfully slow pace, lol!
  2. #2
  3. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2005
    Location
    U.S A.Z.
    Posts
    24
    Rep Power
    0
    I would check out the csv module and just take some time to learn the very most basics of python. If you do this I don't think it will take too long to learn what you need to make this work. Which is part of the beauty of python, there are library's it seems for everything.
  4. #3
  5. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,960
    Rep Power
    481
    your example sure looks like it's white space separated, not comma separated.

    Basically, and I wouldn't have stated it more clearly, I agree with knutrainer post 2.


    If you could read the data into a bunch of lines, you could use

    Lines.sort(key = cleverKey)

    after defining
    Code:
    def cleverKey(a):
        fields = a.split()
        fields[-1:] + [float(field) for field in fields[:-1]]
    [code]Code tags[/code] are essential for python code and Makefiles!
  6. #4
  7. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    5
    Rep Power
    0
    SOME progress has been made. Here is what I have ..

    Code:
    import csv
    import sys
    
    fieldnames = ["1","0","0","0","entry"] 
    
    surveyfile = open("source.csv", "r")
    reader = csv.DictReader(surveyfile, delimiter=",")
    writer = open("left.csv",'wb')
    writer = csv.DictWriter(writer, fieldnames, delimiter=",")
    
    for row in reader:
        if row["entry"] == "l":
            writer.writerow(row)
    This is ALMOST getting me exactly what I want. The problem is that in the source file, there are no titles and the first row could be anything. Is there a way I can refer to those columns as I see it in excel? ie..

    If the source file is like this:
    1,0,0,0,entry
    2,400,1,1,cp
    3,20,-10,0,l

    I may (in another source file) have:
    1,400,1,1,cp
    2,0,0,0,entry
    3,20,-10,0,l

    I want to refer to these columns as A,B,C,D,E or something static for the fieldnames rather than what I have now. Also, when it copies, it cannot distinguish between the different "0" fieldnames. The output right now is:

    3,-10,-10,-10,l

    I realize I may have done a terrible job explaining that. Let me know if I need to rephrase my question.

    ****EDIT - SOLVED***
    Sorry, realized it after I posted. Answer was to define what I wanted to call the fieldnames in the DictReader. New code:

    Code:
    import csv
    import sys
    
    fieldnames = ["A","B","C","D","E"]
    
    surveyfile = open("source.csv", "r")
    reader = csv.DictReader(surveyfile, fieldnames=["A","B","C","D","E"], delimiter=",")
    writer = open("left.csv",'wb')
    writer = csv.DictWriter(writer, fieldnames, delimiter=",")
    
    for row in reader:
        if row["E"] == "l":
            writer.writerow(row)
    Last edited by kingsrook; December 4th, 2012 at 10:08 AM. Reason: SOLVED
  8. #5
  9. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,960
    Rep Power
    481
    Code:
    import csv
    import sys
    
    fieldnames = 'abcde'
    
    surveyfile = open("source.csv", "r")
    reader = csv.DictReader(surveyfile, fieldnames=fieldnames, delimiter=",")
    writer = open("left.csv",'wb')
    writer = csv.DictWriter(writer, fieldnames, delimiter=",")
    
    for row in reader:
        if row["e"] == "l":
            writer.writerow(row)
    I presume you can permute or omit names from the DictWriter fieldnames .
    [code]Code tags[/code] are essential for python code and Makefiles!
  10. #6
  11. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    5
    Rep Power
    0
    [QUOTE
    I presume you can permute or omit names from the DictWriter fieldnames .
    [/QUOTE]

    No, I need all the columns. The code I put at the end of my last reply successfully strips out all of the rows that have an "l" in column E, and moves them into a new file named "left.csv". Now I need to sort all of the data in left.csv from largest to smallest based on column B.
  12. #7
  13. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    5
    Rep Power
    0
    I presume you can permute or omit names from the DictWriter fieldnames .
    No, I need all the columns. The code I put at the end of my last reply successfully strips out all of the rows that have an "l" in column E, and moves them into a new file named "left.csv". Now I need to sort all of the data in left.csv from largest to smallest based on column B. Any hints?
  14. #8
  15. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,960
    Rep Power
    481
    can permute the fieldnames, not MUST permute them.

    Code:
    >>> a=[{1:3},{1:5},{1:2}]
    >>> a.sort(key=lambda x:x[1])
    >>> a
    [{1: 2}, {1: 3}, {1: 5}]
    >>>
    [code]Code tags[/code] are essential for python code and Makefiles!
  16. #9
  17. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    5
    Rep Power
    0
    Update: I'm getting closer. But for some reason when I run the code below, I only get results in the "left.csv" file - NOTHING in the "right.csv" file. But if I take the code out that has to do with making the right.csv file and run it by itself, it works fine. Do I need some kind of break after that first for loop? A second reader?

    Code:
    import csv
    import sys
    import operator
    
    fieldnames = ["A","B","C","D","E"]
    surveyfile = open("source.csv", "r")
    left_file = open("left.csv",'wb')
    right_file = open("right.csv",'wb')
    
    
    reader = csv.DictReader(surveyfile, fieldnames=fieldnames, delimiter=",")
    left_writer = csv.DictWriter(left_file, fieldnames, delimiter=",")
    sortedlefts = sorted(reader,key=lambda x:float(x["B"]))
    
    right_writer = csv.DictWriter(right_file, fieldnames, delimiter=",")
    sortedrights = sorted(reader,key=lambda x:float(x["B"]), reverse=True)
    
    for row in sortedlefts:
        if row["E"] == "l":
            left_writer.writerow(row)
    	
    
    for row in sortedrights:
        if row["E"] == "r":
            right_writer.writerow(row)
    Solved by adding surveyfile.seek(0,0) in between the writers.
    Last edited by kingsrook; December 5th, 2012 at 10:29 AM. Reason: Solved.

IMN logo majestic logo threadwatch logo seochat tools logo