#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2013
    Posts
    23
    Rep Power
    0

    Need help in file processing


    Hello all, I just started learning Python and I am stuck on a basic file-processing task:
    I have to convert a file from one format to another. The program should read an input file and, after processing (counting the occurrences of each term in each group, see below), write an output file.
    Here is the format of the input file:

    Group1: somo|112345478 somo|734567233 homo|233689876
    Group2: somo|904686712 somo|891145662 somo|106736432
    Group3: aomo|397634567 aomo|123446789
    Group4: aomo|905672345 aomo|120846789
    ........
    ..
    From the above input, the output file should look like this (the numbers after the pipe should be ignored):

    somo homo aomo
    Group1 2 1 0
    Group2 3 0 0
    Group3 0 0 2
    Group4 0 0 2

    I have been stuck on this file conversion for some time now and am unable to figure it out. Any guidance is highly appreciated. Thanks in advance. Apologies if this post doesn't belong here.
  2. #2
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2013
    Posts
    1
    Rep Power
    0

    I am new to Python, so the code below is a bit complex, but it should work.


    Code:
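    import re

    labels = []   # every distinct term seen, in order of first appearance
    counts = {}   # group name -> {term: count}; avoids shadowing the built-in dict
    with open("a.txt", "r") as f:
        for line in f:
            if ":" not in line:
                continue  # skip blank or malformed lines
            key, rest = line.split(":", 1)
            group_counts = counts.setdefault(key, {})
            # each entry looks like term|number; only the term matters
            for w in re.findall(r"\S+\|\S+", rest):
                term = w.split("|")[0]
                if term not in labels:
                    labels.append(term)
                group_counts[term] = group_counts.get(term, 0) + 1

    # header row: the terms in alphabetical order
    for term in sorted(labels):
        print(term, end=" ")
    print("")
    # one row per group, with 0 for terms the group does not contain
    for k in sorted(counts):
        print(k, end=" ")
        for term in sorted(labels):
            print(counts[k].get(term, 0), end=" ")
        print("")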
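    The per-line step in the code above (split off the group name, drop the number after each pipe, tally the terms) could also be sketched with collections.Counter from the standard library. The helper name parse_line is made up for illustration, and the sketch assumes every line has the form "Group: term|number term|number ...".

```python
from collections import Counter

def parse_line(line):
    """Return (group, Counter of terms) for one 'Group: term|number ...' line.

    parse_line is a hypothetical helper name, not part of the posts above.
    """
    group, _, rest = line.partition(":")
    # keep only the part before the pipe; the number after it is ignored
    terms = [entry.split("|")[0] for entry in rest.split()]
    return group.strip(), Counter(terms)

group, counts = parse_line("Group1: somo|112345478 somo|734567233 homo|233689876")
# a Counter returns 0 for a term it has never seen, so no "else" branch is needed
print(group, counts["somo"], counts["homo"], counts["aomo"])
```

    A Counter behaves like a dict whose missing keys read as 0, which is why the explicit "if the term is already there, add one, else set it to 1" branches are not needed here.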
  4. #3
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2013
    Posts
    23
    Rep Power
    0
    Yep, it works. Thank you very much.
  6. #4
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2007
    Location
    Joensuu, Finland
    Posts
    439
    Rep Power
    67
    Somewhat shorter:

    Code:
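    from collections import OrderedDict

    # Terms to count, in the column order wanted in the output.
    labels = ('somo', 'homo', 'aomo')
    result = OrderedDict()  # keeps groups in the order they appear in the file
    with open('a.txt', 'r') as f:
        for line in f:
            if ':' not in line:
                continue  # skip blank or malformed lines
            group, entries = line.split(':', 1)
            # keep only the term before the pipe; the number is ignored
            data = [entry.split('|')[0] for entry in entries.split()]
            result[group] = {lbl: data.count(lbl) for lbl in labels}
    print('\t{0}'.format('\t'.join(labels)))
    for group, data in result.items():
        # build each row from labels so the columns always match the header
        print('{0}:\t{1}'.format(group, '\t'.join(str(data[lbl]) for lbl in labels)))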
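    The original question asks for an output file, while the code posted in this thread prints to the screen. One possible way to write the table to a file is sketched below. The helper name write_table, the sample rows, and the file name out.txt are all assumptions made for this example. Columns are kept in order of first appearance, which matches the sample output in the question.

```python
import os
import tempfile

def write_table(rows, path):
    """Write one header line plus one line per group to path.

    rows is a list of (group, counts) pairs, where counts maps term -> int.
    write_table is a hypothetical helper name used only for this sketch.
    """
    # collect column labels in order of first appearance
    labels = []
    for _, counts in rows:
        for term in counts:
            if term not in labels:
                labels.append(term)
    with open(path, "w") as out:
        out.write(" ".join(labels) + "\n")
        for group, counts in rows:
            # terms a group does not contain are written as 0
            cells = [str(counts.get(term, 0)) for term in labels]
            out.write(group + " " + " ".join(cells) + "\n")

# sample data taken from the first three groups in the question
rows = [
    ("Group1", {"somo": 2, "homo": 1}),
    ("Group2", {"somo": 3}),
    ("Group3", {"aomo": 2}),
]
# out.txt in a temporary directory is an assumed location for the demo
path = os.path.join(tempfile.mkdtemp(), "out.txt")
write_table(rows, path)
with open(path) as f:
    print(f.read(), end="")
```

    The rows could come from either of the counting loops posted above; only the final printing step changes.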
