### Thread: Return common words in two files

1. Registered User
Devshed Newbie (0 - 499 posts)

Join Date: Mar 2013
Posts: 1


I've been looking around for an answer to this but have had no luck. I need to take two files and print the most frequent words they have in common, along with their combined (summed) frequencies. This might be simple, but I'm pretty new to programming. Any help?

Code:
```python
from wordFrequencies import *  # gives both the word and its frequency in a file

def mostFrequent(word, frequency, n):
    # pair each word with its frequency and sort the pairs by frequency, highest first
    my_list = sorted(zip(word, frequency), key=lambda x: x[1], reverse=True)
    words, freqs = zip(*my_list[:n])  # take the top n entries and split back into separate lists
    return words, freqs  # return our most frequent words in order

L1 = wordFrequencies('file1.txt')
words1 = L1[0]
freqs1 = L1[1]
L2 = wordFrequencies('file2.txt')
words2 = L2[0]
freqs2 = L2[1]
print(mostFrequent(words1, freqs1, 20))

L1 = wordFrequencies('file1.txt')  # what I tried
words1 = set(L1[0])
freqs1 = set(L1[1])
L2 = wordFrequencies('file2.txt')
words2 = set(L2[0])
freqs2 = set(L2[1])
words3 = words1.intersection(words2)
freqs3 = freqs1.intersection(freqs2)
print(mostFrequent(words3, freqs3, 20))
```
It didn't work; it output the wrong words.
2. Sets are unordered. These statements, `words1 = set(L1[0])` and `freqs1 = set(L1[1])`, break the correlation between the words in `words1` and the frequencies in `freqs1`.
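To see the lost correlation concretely, here is a small sketch using made-up sample lists (hypothetical data, not from the original post, shaped like the `L1[0]`/`L1[1]` pair in the question). Turning the frequency list into a set collapses equal counts and throws away the positions that tied each count to its word, while a dict built from the zipped lists keeps every pair together.

```python
# Hypothetical parallel lists, in the (words, freqs) shape wordFrequencies seems to return
words = ['the', 'cat', 'sat', 'mat']
freqs = [4, 2, 2, 1]

# A set keeps only distinct values and has no positions, so the two 2s
# collapse into one and nothing says which count belonged to which word.
freq_set = set(freqs)
print('%d words, %d distinct counts' % (len(words), len(freq_set)))  # 4 words, 3 distinct counts

# A dict built from the zipped lists keeps each word attached to its own count.
pairs = dict(zip(words, freqs))
print(pairs['sat'])  # 2
```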

Stick with dictionaries. I haven't explored your code beyond that. You could save us time by showing a small example of `L1`, `L2`, and the expected output. Otherwise we have to implement `wordFrequencies` ourselves, guessing what its output should be.
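Following the "stick with dictionaries" advice, here is a minimal sketch of the whole task. It assumes (the thread never confirms this) that `wordFrequencies` returns two parallel lists, words first and counts second, as the question's `L1[0]` and `L1[1]` indexing suggests. The function name `common_word_totals` and the sample data are hypothetical, and the function returns `(word, total)` pairs rather than two separate lists.

```python
def common_word_totals(words1, freqs1, words2, freqs2, n):
    """Return the n most frequent words found in both files, with summed counts.

    Hypothetical helper: assumes each words/freqs pair is two parallel lists.
    """
    # Keep every word attached to its own count.
    counts1 = dict(zip(words1, freqs1))
    counts2 = dict(zip(words2, freqs2))

    # Words present in both files, mapped to their combined frequency.
    combined = {}
    for word in counts1:
        if word in counts2:
            combined[word] = counts1[word] + counts2[word]

    # Highest combined frequency first; ties broken alphabetically for stable output.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


# Hypothetical sample data standing in for the two files
words1 = ['the', 'cat', 'sat', 'on', 'mat']
freqs1 = [5, 2, 1, 1, 1]
words2 = ['the', 'dog', 'sat', 'on', 'log']
freqs2 = [3, 2, 2, 1, 1]
print(common_word_totals(words1, freqs1, words2, freqs2, 3))  # [('the', 8), ('sat', 3), ('on', 2)]
```

The totals are added explicitly because `collections.Counter`'s `&` operator keeps the minimum of the two counts, not their sum.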

For me, this involves looking up `defaultdict` (searching for "container", since I didn't recall that the module is named `collections`), finding some good sample texts, reading the files, removing punctuation, splitting words, counting them, and converting from a dictionary to lists. Each step is simple, but there are many of them, plus test code and debugging. After that there's still the guesswork about the result of `wordFrequencies`. Oh, and converting the text to a common case before splitting would be useful. (Your problem specification didn't say that a word's position within a sentence matters.) What else will I remember after a few tries?
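The steps listed above can be sketched as a hypothetical stand-in for `wordFrequencies`, since the real one was never posted. This sketch assumes plain-text input and the two-parallel-lists result that the question's `L1[0]`/`L1[1]` indexing suggests; `word_frequencies_from_text` is an illustrative name, and this `wordFrequencies` is a guess, not the poster's function.

```python
import string
from collections import defaultdict


def word_frequencies_from_text(text):
    """Return (words, freqs) as parallel lists for the words in text.

    Hypothetical helper: the poster's real wordFrequencies was never shown.
    """
    # Convert to a common case first so 'The' and 'the' are counted together.
    text = text.lower()
    # Replace each ASCII punctuation character with a space before splitting.
    for ch in string.punctuation:
        text = text.replace(ch, ' ')
    # Split on whitespace and count every word.
    counts = defaultdict(int)
    for word in text.split():
        counts[word] += 1
    # Convert the dictionary back into two parallel lists.
    words = list(counts)
    freqs = [counts[w] for w in words]
    return words, freqs


def wordFrequencies(filename):
    """Hypothetical file-reading wrapper returning the same (words, freqs) lists."""
    with open(filename) as f:
        return word_frequencies_from_text(f.read())


words, freqs = word_frequencies_from_text("The cat sat. The cat ran!")
print(sorted(zip(words, freqs)))  # [('cat', 2), ('ran', 1), ('sat', 1), ('the', 2)]
```

Replacing punctuation with a space splits contractions such as "don't" into "don" and "t"; deleting punctuation instead would produce "dont". The problem statement doesn't settle which is wanted.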