#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2013
    Posts
    15
    Rep Power
    0

    Check if Items in a List are Part of Items in A Bigger List


    list1 = ['a', 'b', 'c', 'd', 'e']
    list2 = ['b', 'c']

    I want to compare list1 and list2. All elements of list2 have to be in list1 but not vice versa.

    Also, an element in list2 can not have more occurrences than list1 does. So if b shows up two times in list1, list 2 can not have 3 or more.

    Any ideas?

    Thanks
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2012
    Posts
    194
    Rep Power
    3
    This seems more of a job for sets than lists. This of course makes the assumption that all items in your lists will be unique but:
    Code:
    >>> first = {'a', 'b', 'c', 'd', 'e'}
    >>> second = {'b', 'c'}
    >>> first.issubset(second)
    False
    >>> second.issubset(first)
    True
    >>>
    If the values are allowed multiple times as you seem to want, you will have to do some dancing around with dictionaries and counting items. Make an attempt and post back.

    -Mek
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2013
    Posts
    15
    Rep Power
    0
    Originally Posted by Mekire
    This seems more of a job for sets than lists. This of course makes the assumption that all items in your lists will be unique but:
    Code:
    >>> first = {'a', 'b', 'c', 'd', 'e'}
    >>> second = {'b', 'c'}
    >>> first.issubset(second)
    False
    >>> second.issubset(first)
    True
    >>>
    If the values are allowed multiple times as you seem to want, you will have to do some dancing around with dictionaries and counting items. Make an attempt and post back.

    -Mek
    Will do, I will try it out and get back to you. Thanks for the help, much appreciated.
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    80
    Rep Power
    3
    *whistles away to find his list intersection function*
    *comes back smiling*

    Okay, so awhile back I defined a function to find the intersection of two lists:

    Code:
    def intersect(a,b):
    	#Gives the intersection of two sets, a and b.
    	out=list(v for v in a if v in b)
    	for v in out:
    		if out.count(v) > min(a.count(v),b.count(v)):
    			out.remove(v)
    	return out
    The code can easily be modified to do what you want. This one will do the counts also, and doesn't rely on dictionaries or any of that.

    Code:
    def inclusive(little,big):
    	#Figures out if all items in the list 'little' are represented in the list 'big'
    	if not set(little).issubset(set(big)): #Simple test, to determine if little list has extra inclusions
    		return False
    	for item in set(little): #set() makes sure we only check once.
    		if little.count(item) > big.count(item):
    			return False #The little list has more instances of 'item' than it should.
    	return True #If these tests are passed, the list is purely inclusive.
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2012
    Posts
    194
    Rep Power
    3
    This one will do the counts also, and doesn't rely on dictionaries or any of that.
    You are missing the point of counting with dictionaries I believe.

    Your code does not scale. It is O(N^2). Try using it on much bigger lists and you will see what I mean. However, there is actually an even bigger problem. You are changing the size of the list out while you are iterating over it. This is actually giving incorrect results in certain cases. You need to iterate over a copy of the list if you plan to change its size mid-iteration.

    Let's compare the intersect function with one that counts first using a dictionary:
    python Code:
    import timeit
    from random import randint
     
    def count_items(mylist):
        count_dict = {}
        for item in mylist:
            count_dict[item] = count_dict.get(item,0)+1
        return count_dict
     
    def intersect(a,b):
        result = []
        a_dict = count_items(a)
        b_dict = count_items(b)
        for item in a_dict:
            if item in b_dict:
                for i in xrange(min(a_dict[item],b_dict[item])):
                    result.append(item)
        return result
     
    def mr909_intersect(a,b):
    	#Gives the intersection of two sets, a and b.
    	out=list(v for v in a if v in b)
    	for v in out:
    		if out.count(v) > min(a.count(v),b.count(v)):
    			out.remove(v)
    	return out
    First let's see an example of an incorrect result. We'll test with this:
    python Code:
    if __name__ == "__main__":
        a = [randint(0,1000) for _ in xrange(1000)]
        b = [randint(0,1000) for _ in xrange(1000)]
        test = intersect(a[:],b[:])
        print("My intersect:")
        print(sorted(test))
        print("Length: {}\n".format(len(test)))
        print("Your interesect:")
        test = mr909_intersect(a[:],b[:])
        print(sorted(test))
        print("Length: {}\n".format(len(test)))
    Code:
    >>> 
    My intersect:
    [2, 4, 13, 15, 22, 25, 31, 32, 35, 35, 36, 38, 38, 41, 42, 45, 45, 46, 48, 49, 51, 51, 52, 58, 61, 61, 62, 66, 66, 67, 69, 69, 70, 71, 76, 79, 79, 79, 86, 86, 90, 90, 92, 93, 95, 96, 96, 98, 99, 100, 101, 102, 104, 105, 107, 109, 110, 113, 115, 116, 117, 118, 131, 131, 132, 134, 135, 135, 137, 138, 138, 144, 145, 147, 150, 152, 152, 153, 163, 167, 173, 173, 175, 178, 179, 180, 184, 185, 185, 185, 189, 192, 192, 193, 198, 199, 200, 202, 204, 205, 207, 208, 211, 214, 215, 217, 219, 229, 229, 231, 234, 235, 238, 242, 244, 245, 246, 249, 250, 252, 254, 259, 260, 263, 263, 265, 265, 269, 269, 271, 274, 279, 279, 280, 280, 281, 282, 284, 284, 293, 296, 297, 298, 300, 301, 301, 305, 305, 308, 312, 313, 316, 318, 318, 321, 323, 330, 334, 338, 341, 343, 348, 348, 349, 351, 352, 355, 355, 356, 359, 361, 373, 373, 374, 374, 375, 378, 379, 379, 380, 384, 385, 391, 393, 394, 395, 398, 400, 401, 403, 405, 406, 408, 408, 409, 411, 412, 413, 416, 417, 419, 423, 424, 426, 427, 427, 429, 431, 432, 440, 440, 440, 441, 444, 447, 449, 451, 453, 453, 457, 458, 461, 463, 465, 465, 466, 469, 469, 469, 472, 473, 474, 479, 481, 481, 486, 487, 488, 494, 498, 499, 500, 503, 504, 505, 506, 506, 509, 510, 510, 514, 515, 518, 520, 521, 524, 527, 530, 530, 531, 532, 533, 534, 538, 539, 545, 546, 548, 548, 550, 551, 553, 556, 559, 561, 564, 565, 565, 567, 571, 573, 575, 576, 578, 582, 584, 586, 586, 589, 589, 592, 598, 600, 603, 605, 606, 606, 608, 608, 611, 612, 620, 630, 630, 631, 635, 635, 637, 638, 643, 644, 648, 649, 652, 652, 653, 656, 658, 663, 665, 666, 666, 667, 667, 669, 670, 671, 674, 675, 678, 681, 683, 684, 685, 693, 696, 698, 700, 701, 703, 704, 706, 711, 713, 714, 715, 715, 715, 716, 717, 718, 719, 720, 722, 723, 724, 725, 731, 732, 734, 742, 743, 747, 748, 749, 750, 751, 752, 752, 756, 758, 759, 760, 763, 765, 767, 769, 773, 774, 779, 788, 789, 790, 791, 794, 795, 797, 802, 805, 806, 807, 811, 814, 815, 816, 818, 819, 825, 825, 827, 838, 840, 841, 842, 843, 844, 844, 847, 848, 849, 850, 850, 853, 855, 856, 858, 862, 865, 865, 867, 872, 873, 875, 878, 879, 881, 883, 886, 887, 888, 889, 890, 893, 894, 897, 897, 900, 902, 904, 904, 905, 906, 907, 909, 909, 911, 912, 913, 914, 915, 918, 922, 925, 928, 930, 930, 931, 931, 934, 939, 939, 942, 951, 953, 954, 955, 961, 963, 967, 977, 979, 981, 985, 986, 986, 991, 994, 997, 998]
    Length: 479
    
    Your interesect:
    [2, 4, 13, 15, 22, 25, 31, 32, 35, 35, 36, 38, 38, 41, 42, 45, 45, 46, 48, 49, 51, 51, 52, 58, 61, 61, 62, 66, 66, 67, 69, 69, 70, 71, 76, 79, 79, 79, 86, 86, 90, 90, 92, 93, 95, 96, 96, 98, 99, 99, 100, 101, 102, 104, 105, 107, 109, 110, 113, 115, 116, 117, 118, 131, 131, 132, 134, 135, 135, 137, 138, 138, 144, 145, 147, 150, 152, 152, 153, 163, 167, 173, 173, 175, 175, 178, 179, 180, 184, 185, 185, 185, 189, 192, 192, 193, 198, 199, 200, 202, 204, 205, 207, 208, 211, 214, 215, 217, 219, 229, 229, 231, 234, 235, 238, 242, 244, 245, 246, 249, 250, 252, 254, 259, 260, 263, 263, 265, 265, 269, 269, 271, 274, 279, 279, 280, 280, 281, 282, 284, 284, 293, 296, 297, 298, 300, 301, 301, 305, 305, 308, 312, 313, 316, 318, 318, 321, 323, 330, 334, 338, 341, 343, 343, 348, 348, 349, 351, 352, 355, 355, 356, 359, 361, 373, 373, 374, 374, 375, 378, 379, 379, 380, 384, 385, 391, 393, 394, 395, 398, 400, 401, 403, 405, 406, 408, 408, 409, 411, 412, 413, 416, 417, 419, 423, 424, 424, 424, 426, 427, 427, 429, 431, 432, 440, 440, 440, 441, 444, 447, 449, 451, 453, 453, 457, 458, 461, 463, 465, 465, 466, 469, 469, 469, 472, 473, 474, 479, 481, 481, 486, 487, 487, 488, 494, 498, 499, 500, 503, 504, 505, 506, 506, 509, 510, 510, 514, 515, 518, 520, 521, 524, 527, 530, 530, 531, 532, 533, 534, 538, 539, 545, 546, 548, 548, 550, 551, 553, 556, 559, 561, 564, 565, 565, 567, 571, 573, 575, 576, 578, 582, 584, 586, 586, 589, 589, 592, 598, 600, 603, 605, 606, 606, 608, 608, 611, 612, 620, 630, 630, 631, 635, 635, 637, 638, 643, 644, 648, 649, 652, 652, 653, 656, 658, 663, 665, 666, 666, 667, 667, 669, 670, 671, 674, 675, 678, 681, 683, 684, 685, 693, 696, 698, 700, 701, 703, 704, 706, 711, 713, 714, 715, 715, 715, 716, 717, 718, 719, 720, 722, 723, 724, 725, 731, 732, 734, 742, 743, 747, 748, 749, 750, 751, 752, 752, 756, 758, 759, 760, 763, 765, 767, 769, 773, 774, 779, 788, 789, 790, 791, 794, 795, 797, 802, 805, 806, 807, 811, 814, 815, 816, 818, 819, 825, 825, 827, 838, 840, 841, 842, 843, 844, 844, 847, 848, 849, 850, 850, 853, 855, 856, 858, 862, 865, 865, 867, 872, 873, 875, 878, 879, 881, 883, 886, 887, 888, 889, 890, 893, 894, 897, 897, 900, 902, 904, 904, 905, 906, 907, 909, 909, 911, 912, 913, 914, 915, 918, 922, 925, 928, 930, 930, 931, 931, 934, 939, 939, 942, 951, 953, 954, 955, 961, 963, 967, 967, 977, 979, 981, 985, 986, 986, 991, 994, 997, 998]
    Length: 486
    The first result that differs between our methods is 99 (you found 2, I found 1). But...
    Code:
    >>> a.count(99)
    2
    >>> b.count(99)
    1
    Now we can fix that easily by itterating over out[:] instead of out. So lets fix it and check timing. Using the code found at the top (save for the fix) and timing with this code:
    python Code:
    if __name__ == "__main__":
        a = [randint(0,1000) for _ in xrange(1000)]
        b = [randint(0,1000) for _ in xrange(1000)]
     
        times = 10
        print("Starting")
        T = timeit.Timer("intersect(a[:],b[:])","from __main__ import a,b,intersect")
        print("My intersect: {:.4f} s".format(T.timeit(times)/times))
        T2 = timeit.Timer("mr909_intersect(a[:],b[:])",
                          "from __main__ import a,b,mr909_intersect")
        print("Mr909 intersect: {:.4f} s".format(T2.timeit(times)/times))
    Code:
    >>> 
    Starting
    My intersect: 0.0010 s
    Mr909 intersect: 0.0548 s
    >>>
    Now let's ramp up the ranges and see how it scales. Change a and b to this:
    python Code:
    a = [randint(0,100000) for _ in xrange(10000)]
    b = [randint(0,100000) for _ in xrange(10000)]
    and now:
    Code:
    >>> 
    Starting
    My intersect: 0.0079 s
    Mr909 intersect: 3.1739 s
    >>>
    The list methods count and index are O(N). If you iterate over a list and use index or count on the elements every loop, your code instantly goes to O(N^2). This is why people use the dict approach.

    Sorry about the mile long post,
    -Mek

    Comments on this post

    • zxq9 agrees : An explanation and example worth referencing later.
    • Mr909 agrees : Eh, point taken. Your first code didn't do it, I just wanted to provide one that did. But your complete example is better. Thank you.
  10. #6
  11. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2013
    Posts
    15
    Rep Power
    0
    Thanks for all the input...

    Finally got around to it, and finally finished the code. What I was trying to do is create a list of all words that can be made from a given set of letters.


    Here is my final piece:

    Code:
    def get_letters(letter): # makes sure all characters are letters and removes any numbers, whitespaces or other special characters
        letters = []
        for l in letter:
            if l.isalpha():
                letters.append(l)
        return letters
            
        
    def check_word(letters, word): # checks to see if the potential word is a subset of the initial list
        w = []
        for let in word:
            w.append(let)
        w.sort()
        word_set = set(word)
        letters_set = set(letters)
        
        if word_set.issubset(letters_set):
            #print(word, 'is subset')
            return w
        else:
            #print(word, 'is not subset')
            return False
            
            
        
            
    def get_word():
        letter = input('enter your letters ') # get set of letters to generate a word list from
        letters = get_letters(letter)
        letters.sort()
        fh = open('words.txt') #opens dictionary of words
        words = [] 
        for line in fh: #puts those words into list
            words.append(line.rstrip())
        
        possibles = [] #start a list of words that can be made of the inputed letters
        for w in words:
            if len(w) > 3: #add to the list if the letters are larger than n letters long
                word = check_word(letters, w)
                if word is False:
                    pass
                else:
                    pass
                    
                    
                    count = 0
                    
                    #print('w is', w)
                    #print(); print()
                    for item in set(w): # check if the potential word can be made with the letters entered
                        if word.count(item) > letters.count(item):
                            break
                        elif count + 1== len(set(w)) and word.count(item) <= letters.count(item): #if at the end of iteration and no occurs more times than initial letter list
                            #print('appending', w)
                            possibles.append(w)
                        else:
                            count += 1
                            #print('count is', count)
            else:
                pass
        print(possibles)
    get_word()

    Might be messy but did the job. Any comments or suggestions?

    Thanks again
  12. #7
  13. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2012
    Posts
    194
    Rep Power
    3
    I couldn't actually get yours to perform correctly, but regardless it is a bit overcomplicated.

    Take a look at the following (python 3). I use Counter from the collections module in the standard library here, but it counts in a way identical to my previous post:
    python Code:
    from collections import Counter
     
     
    def get_letters(user_input):
        return [char.lower() for char in user_input if char.isalpha()]
     
     
    def is_subset(word,letters):
        word_dict = Counter(word)
        letter_dict = Counter(letters)
        for item in word_dict:
            if (item not in letter_dict) or (word_dict[item] > letter_dict[item]):
                return False
        return True
     
     
    def get_possible_words():
        letters = input("Enter your letters: ")
        letters = get_letters(letters)
        possible = set()
        with open("words.txt") as myfile:
            for word in myfile:
                word_parse = get_letters(word)
                if is_subset(word_parse,letters):
                    possible.add(word.strip())
        return possible
     
     
    if __name__ == "__main__":
        print(get_possible_words())

    -Mek
  14. #8
  15. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2013
    Posts
    15
    Rep Power
    0
    Originally Posted by Mekire
    I couldn't actually get yours to perform correctly, but regardless it is a bit overcomplicated.

    Take a look at the following (python 3). I use Counter from the collections module in the standard library here, but it counts in a way identical to my previous post:
    python Code:
    from collections import Counter
     
     
    def get_letters(user_input):
        return [char.lower() for char in user_input if char.isalpha()]
     
     
    def is_subset(word,letters):
        word_dict = Counter(word)
        letter_dict = Counter(letters)
        for item in word_dict:
            if (item not in letter_dict) or (word_dict[item] > letter_dict[item]):
                return False
        return True
     
     
    def get_possible_words():
        letters = input("Enter your letters: ")
        letters = get_letters(letters)
        possible = set()
        with open("words.txt") as myfile:
            for word in myfile:
                word_parse = get_letters(word)
                if is_subset(word_parse,letters):
                    possible.add(word.strip())
        return possible
     
     
    if __name__ == "__main__":
        print(get_possible_words())

    -Mek
    That looks good. Not so good with comprehensions yet, but i'm gonna try this out and get back to you. Thanks

IMN logo majestic logo threadwatch logo seochat tools logo