  1. #1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    1
    Rep Power
    0

    Finding word frequencies of list of words in text file


    I have a list of word pairs in the file result.txt:

    the of
    the by
    they is
    group their
    and so on. I need to check for their pairwise occurrences in a directory containing multiple files (at most one occurrence per file), and print each pair with its frequency count, in decreasing order of frequency count.

    import os
    import re
    from collections import Counter
    from glob import iglob

    folderpath = 'path/to/directory'

    # Read the word pairs, one pair per line, skipping blank lines.
    with open('result.txt', 'r') as logfile:
        wanted = [tuple(line.split()) for line in logfile if line.strip()]

    pairs = Counter()
    for filepath in iglob(os.path.join(folderpath, '*.txt')):
        with open(filepath, 'r') as filehandle:
            # Use a set of the file's words so a pair counts at most once per file.
            words = set(re.findall(r'\w+', filehandle.read()))
        for pair in wanted:
            if all(word in words for word in pair):
                pairs[pair] += 1

    # Print the pairs in decreasing order of frequency count.
    for pair, occurrences in pairs.most_common():
        print('%s %d' % (' '.join(pair), occurrences))
    The output must be of the form:

    group their 205
    they is 180
    the of 56
    and so on...

    Please help, I am lost.
  2. #2
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Location
    /dev/null
    Posts
    163
    Rep Power
    19
    Code:
    # Count how many times each line occurs in the file.
    x = {}
    with open('/path/to/file', 'r') as i:
        for line in i:
            line = line.rstrip()
            x[line] = x.get(line, 0) + 1

    # Print each line with its count, highest count first.
    for k in sorted(x, key=x.get, reverse=True):
        print(k, x[k])
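
The snippet above counts repeated lines in a single file, while the question asks about word pairs across a directory of files. Here is a rough sketch of the same count-then-sort idea applied to that case. It assumes a pair "occurs" in a file when both of its words appear anywhere in that file, and that matching is case-sensitive. The function names `read_pairs` and `count_pair_files` are made up for illustration and are not from the original posts.

```python
import os
import re
from collections import Counter
from glob import iglob


def read_pairs(path):
    """Read one whitespace-separated word pair per line, skipping blank lines."""
    with open(path, 'r') as handle:
        return [tuple(line.split()) for line in handle if line.strip()]


def count_pair_files(pairs, folderpath):
    """For each pair, count the *.txt files in folderpath containing both words."""
    counts = Counter()
    for filepath in iglob(os.path.join(folderpath, '*.txt')):
        with open(filepath, 'r') as handle:
            # A set of words: each pair is counted at most once per file.
            words = set(re.findall(r'\w+', handle.read()))
        for pair in pairs:
            if words.issuperset(pair):
                counts[pair] += 1
    # most_common() returns (pair, count) tuples, highest count first.
    return counts.most_common()
```

Calling `count_pair_files(read_pairs('result.txt'), 'path/to/directory')` and printing `' '.join(pair)` followed by the count for each result should give lines in the "group their 205" format. Pairs that never occur are left out of the result rather than listed with a zero.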
