Thread: Longest words

    #1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    13
    Rep Power
    0

    Longest words


    Hi everyone!

    I've got the following problem: I want to find ALL the longest words in a text.
    My code works, but it returns only one word with the longest length. As mentioned, I'd like to see all the words with that length. How can I achieve this?

    Code:
    import sys 
    
    def main():
    	max_word = ''
    	max_word_length = 0
    
    	filename = sys.argv[1]
    	
    	infile = open(filename, 'r')
    	
    	for line in infile:
    		line_list = line.split()
    		
    		for word in line_list:
    			if len(word) > max_word_length:
    				max_word = word
    				max_word_length = len(word)
    				
    	infile.close()
  2. #2
  3. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    19
    Rep Power
    0
    First, just to make sure I understand the specification correctly, you want to find the longest word length in the file and then find all the words of that length.

    Assuming you know how to do the rest, here's a function to do what you're describing given a list of words.

    Code:
    def find_max_words(words):
        'Given list of words, returns list of words of max length'
        max_len = len(max(words, key=len))
        return [wrd for wrd in words if len(wrd) == max_len]
    While your code does efficiently find the max word in a list, in order to find all the max words the best way seems to be to iterate through the list twice, which is what my code does using the builtin max() function and a list comprehension. Both efficient and clean. You can easily do an implementation that iterates over the list only once, but it wouldn't be as efficient, especially in the worst-case scenario.

    Ahhh... what the hell I got nothing better to do.

    Code:
    def find_max_words(words):
        'Given list of words, returns list of words of max length'
        max_len = 0
        max_words = []
        for word in words:
            word_len = len(word)
            if word_len > max_len:
                max_len = word_len
                max_words = []
            if word_len == max_len:
                max_words.append(word)
        return max_words
    It goes through the list once, but I bet if you time it, it will be consistently slower than the other one, aside from being much uglier.
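
    If you want to check that bet rather than take my word for it, something like the following timeit sketch should do. The names two_pass and one_pass and the random test data are made up for illustration, and the timings will depend on your machine and on the word list, so I'm not quoting any numbers:

```python
import random
import string
import timeit

def two_pass(words):
    """max() finds the longest length, then a comprehension collects the matches."""
    max_len = len(max(words, key=len))
    return [w for w in words if len(w) == max_len]

def one_pass(words):
    """Single loop that resets the result list whenever a longer word appears."""
    max_len = 0
    max_words = []
    for word in words:
        word_len = len(word)
        if word_len > max_len:
            max_len = word_len
            max_words = []
        if word_len == max_len:
            max_words.append(word)
    return max_words

# Made-up test data: 20,000 random lowercase words of length 1 to 12.
random.seed(0)
words = [''.join(random.choices(string.ascii_lowercase, k=random.randint(1, 12)))
         for _ in range(20000)]

# Time 10 runs of each function over the same list and print the totals.
for func in (two_pass, one_pass):
    seconds = timeit.timeit(lambda: func(words), number=10)
    print(func.__name__, round(seconds, 4))
```

    Both functions should return the same list, so the only thing being compared is speed.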
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    13
    Rep Power
    0
    A question about your first suggestion: why does the following function find only one word (which is not even the word with the longest length...)?

    Code:
    def maxi():
    	filename = sys.argv[1]
    	infile = open(filename, 'r')
    	for line in infile:
    		words = line.split()
    		max_len = len(max(words, key=len))
    		return [word for word in words if len(word) == max_len]
    				
    	infile.close()
  6. #4
  7. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,931
    Rep Power
    481
    Let's trace the execution.
    Open a file named on command line. Good.

    for line in infile:
    read first line of input, assign it to line, execute statements in the block.

    words = line.split()
    split the string into words. Fine.


    max_len = len(max(words, key=len))
    I did not know that max took a "key" argument, but whatever. You've found length of the longest word.


    return [word for word in words if len(word) == max_len]
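    make a list of all the longest words on this line.
    AND RETURN THAT LIST FROM THE FUNCTION, so the loop stops after the first line and the rest of the file is never examined.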
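
    To see the early return in isolation, here is a small sketch that runs the same loop over a list of strings instead of a file. The name maxi_on_lines and the sample lines are made up for illustration:

```python
def maxi_on_lines(lines):
    # Same loop body as maxi(), but fed a list of lines instead of a file.
    for line in lines:
        words = line.split()
        max_len = len(max(words, key=len))
        # return ends the whole function, not just this loop iteration,
        # so only the first line is ever looked at.
        return [word for word in words if len(word) == max_len]

lines = ["a bb ccc", "dddd eeeee"]
print(maxi_on_lines(lines))  # ['ccc'], although 'eeeee' is longer
```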



    I dislike the "scan the input twice" approach; russ123's "Ah what the hell..." code has got to be much faster. To fix your recent code, read the whole file instead of going line by line.
    Code:
    import sys
    
    def maxi():
        filename = sys.argv[1]
        with open(filename, 'r') as infile:
            words = infile.read().split()
        max_len = len(max(words, key=len))
        return [word for word in words if len(word) == max_len]
    
    print(maxi())
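
    If the file is too big to read in one go, a one-pass variant that still goes line by line might look like the sketch below. It borrows the single-loop idea from post #2; the function name longest_words and its lines parameter are made up for illustration, and io.StringIO only stands in for an open file here:

```python
import io

def longest_words(lines):
    """Return all words of maximal length from an iterable of lines, in one pass."""
    max_len = 0
    max_words = []
    for line in lines:
        for word in line.split():
            if len(word) > max_len:
                # Found a longer word: start a fresh list with it.
                max_len = len(word)
                max_words = [word]
            elif len(word) == max_len:
                max_words.append(word)
    return max_words

# A file object yields lines when iterated; StringIO behaves the same way.
sample = io.StringIO("the quick brown fox\njumps over lazy dogs\n")
print(longest_words(sample))  # ['quick', 'brown', 'jumps']
```

    With a real file, passing infile from the with block above should work the same way. This version also returns an empty list for an empty file, whereas max() raises ValueError when given an empty list.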
    [code]Code tags[/code] are essential for python code and Makefiles!
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    13
    Rep Power
    0
    Yeah, thanks very much! I see my mistakes - now everything works
    I've got one last question: how should the code be changed if we'd like to find words by giving their stems? (e.g. find('comp') --> computer, computing, etc.)

    ...well, this is nothing I'm required to do, but I'm just wondering.. =)
  10. #6
  11. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    19
    Rep Power
    0
    I'm a little confused about both posts. I'm not sure what Marc is asking.

    b49, you said you disliked the "iterate over the list twice" approach, and after checking it I realized you were right. The iterate-over-the-list-once approach is more efficient in the average case. But then I see you use the iterate-twice approach in your code. I thought max() counts as an iteration even though it's a builtin function. Anyway, I might just be confused.
  12. #7
  13. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    19
    Rep Power
    0
    Originally Posted by MarcF6
    Yeah, thanks very much! I see my mistakes - now everything works
    I've got one last question: how should the code be changed if we'd like to find words by giving their stems? (e.g. find('comp') --> computer, computing, etc.)

    ...well, this is nothing I'm required to do, but I'm just wondering.. =)
    I think the correct answer might be something like regular expressions...? I dunno. Anyway, regex is too hard for me, so this is how I'd do it.

    Code:
    def from_stem(words, stem):
        end = len(stem)
        result = []
        for word in words:
            if word[:end] == stem:
                result.append(word)
        return result
    ...or...

    Code:
    def from_stem(words, stem):
        end = len(stem)
        return [word for word in words if word[:end] == stem]
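
    For comparison, a regular-expression version might look roughly like this. It's only a sketch: the name from_stem_re and the sample words are made up, and for a plain prefix test the slicing above is probably all you need:

```python
import re

def from_stem_re(words, stem):
    # re.escape keeps characters such as '.' or '+' in the stem literal.
    pattern = re.compile(re.escape(stem))
    # pattern.match() only succeeds if the match begins at the start of the word.
    return [word for word in words if pattern.match(word)]

print(from_stem_re(['computer', 'computing', 'recompute', 'cat'], 'comp'))
# ['computer', 'computing']
```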
  14. #8
  15. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    13
    Rep Power
    0
    Thanks!
    I have a similar solution (my function only needs the stems, because the words come from a file).
    Thanks all of you for your help.
    Everything works fine
  16. #9
  17. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,931
    Rep Power
    481
    Yeah, my code iterated twice through the data. I was merely trying to fix the post from "Today 05:21 PM" using MarcF6's idea.

    The string methods startswith and endswith could also work.
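
    A minimal sketch of that idea, reusing the from_stem name from post #7 (with_suffix and the sample words are made up for illustration):

```python
def from_stem(words, stem):
    # str.startswith does the prefix test without any slicing.
    return [word for word in words if word.startswith(stem)]

def with_suffix(words, suffix):
    # str.endswith is the mirror image, for word endings.
    return [word for word in words if word.endswith(suffix)]

words = ['computer', 'computing', 'recompute', 'cat']
print(from_stem(words, 'comp'))   # ['computer', 'computing']
print(with_suffix(words, 'ing'))  # ['computing']
```

    Both methods also accept a tuple of strings, so several stems can be tested in one call.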
