#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    4
    Rep Power
    0

    How to filter csv.reader data?


    I'm a little lost on how to modify the content loaded into csv.reader. Any help would be appreciated.

    Thanks..

    Code:

    with open(infile, 'rb') as f:
        # Strip NUL bytes before parsing the pipe-delimited input.
        output = csv.reader((line.replace('\0', '') for line in f), delimiter='|', quotechar='"')
        out_csv = csv.writer(open(r'C:\BHV\BHV_output.csv', 'ab'))
        # Write while the input file is still open.
        out_csv.writerows(output)


    # for row in output:
    # for col in row:
    # row = row.replace("notused","")

    # out_csv.writerow(new_row)

    # out_csv.writerows(replace(output,'notused','')
    # output = output[:-3]


    I would like suggestions to modify the output from:

    inotusedInbound,1350324983,,0054,6629,anonymous,0,1350324983.35,1350325007.758,Success,English,IVRH UP,10,NA,Y,Y

    ~notusedPrompt-and-Collect,1350325024,,0058,6629,Gather Customer Information,1350325025.419,1350325026.834,DTMF,2,Valid,100,1,NA,Y,N


    To look like this: (Remove blank line and remove characters from the first column)

    Inbound,1350324983,,0054,6629,anonymous,0,1350324983.35,1350325007.758,Success,English,IVRH UP,10,NA,Y,Y
    Prompt-and-Collect,1350325024,,0058,6629,Gather Customer Information,1350325025.419,1350325026.834,DTMF,2,Valid,100,1,NA,Y,N


    What seemed simple just left me digging for the correct function calls to modify the data before output to file.

    Thanks,
    Dave.
  2. #2
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2007
    Location
    Joensuu, Finland
    Posts
    439
    Rep Power
    67
    Adding CODE tags...

    Originally Posted by djonesyyz
    Code:
    with open(infile, 'rb') as f:
            output = csv.reader( (line.replace('\0','') for line in f) , delimiter='|',quotechar = '"')
    (I find it a trifle strange that you are READING a file named OUTPUT!)

    To ignore blank lines you might check the length of the list read:

    Code:
        for row in in_csv:
            if len(row) > some_margin_value:
                # now do something
    I wasn’t sure of your other question. Does the “notused” always appear in the first column only? Is it always prepended with a single char that should be ignored as well? If so, do:

    Code:
    row[0] = row[0][1:].replace('notused', '')
    where “[1:]” slices out the first character and .replace() replaces the given literal string with an empty string.
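Put together, the two suggestions above might look like the following sketch. It is written for Python 3 (the thread's own code is Python 2). The function name `clean_rows`, the `min_fields` threshold, and the in-memory sample data are assumptions made for illustration, and it assumes exactly one junk character precedes `notused`.

```python
import csv
import io

def clean_rows(reader, min_fields=2):
    """Yield rows with the first column cleaned, skipping blank or short rows."""
    for row in reader:
        # csv.reader returns [] for an empty line, so a length check drops it.
        if len(row) < min_fields:
            continue
        # Drop the first character, then remove the literal 'notused'.
        row[0] = row[0][1:].replace('notused', '')
        yield row

# Hypothetical pipe-delimited sample, shortened from the data in the thread.
sample = io.StringIO('inotusedInbound|1350324983||0054\n\n'
                     '~notusedPrompt-and-Collect|1350325024||0058\n')
cleaned = list(clean_rows(csv.reader(sample, delimiter='|', quotechar='"')))
print(cleaned)
```

Because `clean_rows` is a generator, its result can be passed straight to `csv.writer(...).writerows()` without holding the whole file in memory.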
    My armada: openSUSE 13.2 (home desktop, work desktop), openSUSE 13.1 (home laptop), Debian GNU/Linux 7.7.0 (mini laptop), Ubuntu 14.04 LTS (server), Android 4.2.1 (tablet), Windows 7 Ultimate (testbed)
  4. #3
  5. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,996
    Rep Power
    481
    I also understood your questions poorly. You've created an interesting structure of nested generators. I created an intermediate step because you didn't show any of the input. SuperOscar's alternatives to ignore blank (or short) lines could well be more useful than my regular expression, as could the proposed "remove bad fields" algorithm.
    Code:
    # untested.
    # probably has unicode str bytes confusion.
    # I didn't keep track of which lines might end in a new line.
    
    import re
    import io
    import csv
    
    isBlank = re.compile(u'^[ \t]*$').match
    bad = u'notused'
    
    with open(infile, 'rb') as f:
        output = csv.reader( (line.replace('\0','') for line in f) , delimiter='|',quotechar = '"')
    
    with io.StringIO() as buffer:
        with csv.writer(buffer) as middle_man:
            middle_man.writerows(output)
    
    with open('C:\BHV\BHV_output.csv','ab') as out_csv:
        for LINE in middle_man:
            if isBlank(LINE):
                continue
            if LINE.startswith(bad): # or notused~ or whatever
                LINE = LINE[len(bad):]
            out_csv.write(LINE)
    [code]Code tags[/code] are essential for python code and Makefiles!
  6. #4
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    4
    Rep Power
    0
    Creative piece of code! I tried it out and got the following message:

    with csv.writer(buffer) as middle_man:
    AttributeError: __exit__
  8. #5
  9. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,996
    Rep Power
    481
    I'm afraid I've presented an invalid mix of python2 and python3.
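For reference, the `AttributeError: __exit__` arises because `csv.writer` objects are not context managers, so they cannot appear in a `with` statement. A Python 3 sketch of the intermediate-buffer idea might look like this. The sample input and the name `filter_lines` are assumptions, and `str.partition` is swapped in for `startswith` because the sample rows carry a junk character before `notused`.

```python
import csv
import io
import re

is_blank = re.compile(r'^[ \t]*$').match
bad = 'notused'

def filter_lines(reader):
    """Write rows to an in-memory CSV buffer, then filter it line by line."""
    buffer = io.StringIO()
    # csv.writer is a plain object, not a context manager: no 'with' here.
    csv.writer(buffer, lineterminator='\n').writerows(reader)
    kept = []
    for line in buffer.getvalue().splitlines():
        if is_blank(line):
            continue
        # Cut everything up to and including the first 'notused', if present.
        head, sep, tail = line.partition(bad)
        kept.append(tail if sep else line)
    return kept

# Hypothetical pipe-delimited input with an empty line in the middle.
source = io.StringIO('inotusedInbound|0054\n\n~notusedPrompt-and-Collect|0058\n')
lines = filter_lines(csv.reader(source, delimiter='|', quotechar='"'))
print(lines)
```

The buffer step is not strictly needed; filtering the rows directly as they come from the reader avoids the round trip through text.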
  10. #6
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    4
    Rep Power
    0
    Ok.. I'm now a little closer.. I'm having a problem with csv.writer's writerows() not writing any data back out. The print statements show the data:

    wnotusedPrompt-and-Collect
    wnotusedPrompt-and-Collect
    Prompt-and-Collect
    lnotusedInbound
    lnotusedInbound
    Inbound

    So successfully stripped out the leading data chars and notused string.. But the data file written contains no data.. ugg...

    Code:
    with open(filename, 'rb') as f:
    #   reader = csv.reader(f, delimiter='|', quoting=csv.QUOTE_NONE)
        file_input = csv.reader( (line.replace('\0','') for line in f) , delimiter='|',quotechar = '"')
        for row in file_input:
            print row[0]
            print row[0][1:]
            row[0] = row[0][2:].replace('notused', '')
            print row[0]
    out_csv = csv.writer(open('C:\Users\dajones\workspace\BHV_T1\Read1\BHV_output.csv', 'ab'))
    out_csv.writerows(file_input)
  12. #7
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2007
    Location
    Joensuu, Finland
    Posts
    439
    Rep Power
    67
    Originally Posted by djonesyyz
    So successfully stripped out the leading data chars and notused string.. But the data file written contains no data.. ugg...
    Well, of course it doesn’t. You’ve already exhausted the input in the “for” loop above, creating and printing and then discarding (since after printing you don’t actually do anything with the row!) each row in its turn. Then you ask csv.writer to save to a file anything that csv.reader gives, although you’ve already encountered an EOF there.

    Add CSV write commands inside the reading loop and write each row immediately after changing it.
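The exhaustion effect is easy to reproduce in isolation. The sketch below uses Python 3 and a made-up two-row input held in memory (both assumptions); it shows that a second pass over the same reader yields nothing, so `writerows` has nothing to write.

```python
import csv
import io

# Hypothetical two-row input standing in for the log file.
data = io.StringIO('a|b\nc|d\n')
reader = csv.reader(data, delimiter='|')

first_pass = [row for row in reader]   # consumes every row
second_pass = list(reader)             # nothing left to read

out = io.StringIO()
writer = csv.writer(out, lineterminator='\n')
writer.writerows(second_pass)          # writes nothing

print(first_pass)
print(second_pass)
print(repr(out.getvalue()))
```

A `csv.reader` is a one-shot iterator: either write each row inside the loop, or collect the modified rows in a list and write that list afterwards.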
  14. #8
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    4
    Rep Power
    0
    That explains it.. Lesson learned about exhausting the input. I got a little stuck because I was using .writerows vs. .writerow... Thanks.. for the help..

    Code:
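    import csv
    #filename=raw_input('Please enter BHV log file:')
    #file_output=raw_input('Please enter output file name:')
    filename = 'BHV151110-00060-'
    file_output = 'BHV_output'
    # Raw string so the backslashes in the Windows path are taken literally.
    out_csv = csv.writer(open(r'C:\Users\dajones\workspace\BHV_T1\Read1\BHV_output.csv', 'ab'))

    with open(filename, 'rb') as f:
        # Strip NUL bytes, then parse the pipe-delimited input.
        file_input = csv.reader((line.replace('\0', '') for line in f), delimiter='|', quotechar='"')
        for row in file_input:
            # Drop the two leading characters and the 'notused' marker.
            row[0] = row[0][2:].replace('notused', '')
            # Write each row immediately, while the reader is still being consumed.
            out_csv.writerow(row)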
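For readers on Python 3, a roughly equivalent sketch follows. Several things here are assumptions rather than part of the thread: the function name `convert`, the `latin-1` input encoding, the temporary sample file, and the regular expression, which removes everything up to and including `notused` instead of a fixed `[2:]` slice so the number of leading junk characters does not matter. It also skips blank lines, as requested in the first post.

```python
import csv
import os
import re
import tempfile

# Matches any leading characters up to and including the first 'notused'.
PREFIX = re.compile(r'^.*?notused')

def convert(src_path, dst_path):
    """Copy a pipe-delimited log to a comma-delimited CSV, cleaning column 0."""
    # latin-1 is assumed so that any stray byte decodes without error.
    with open(src_path, newline='', encoding='latin-1') as src, \
         open(dst_path, 'a', newline='') as dst:
        reader = csv.reader((line.replace('\0', '') for line in src),
                            delimiter='|', quotechar='"')
        writer = csv.writer(dst)
        for row in reader:
            if not row:           # skip blank lines
                continue
            row[0] = PREFIX.sub('', row[0], count=1)
            writer.writerow(row)  # write each row as soon as it is cleaned

# Demonstration on a throwaway file with a NUL byte and a blank line.
with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, 'sample.log')
    dst = os.path.join(workdir, 'out.csv')
    with open(src, 'w', newline='') as f:
        f.write('w\0notusedPrompt-and-Collect|0058\n\nlnotusedInbound|0054\n')
    convert(src, dst)
    with open(dst, newline='') as f:
        result = f.read()
print(repr(result))
```

Opening both files with `newline=''` follows the `csv` module's recommendation for text-mode files in Python 3 and replaces the `'rb'` and `'ab'` modes used in the Python 2 code.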
