#1
  1. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2013
    Posts
    12
    Rep Power
    0

    Handling duplicate items in list


    Hi,

    I have a .txt file listing U.S. senators and their states. I wrote a program that takes a state and returns its senators, and that takes a senator and returns the state.

    I load the file into a dict, splitting each line so that only the senator's last name is used as the key. Problem: there are two senators named "Udall", from different states, and looking up "Udall" returns only the state of the second one. I need ideas for a way to detect the duplicate, prompt for the first name, and then return the correct state. I have tried several approaches and none of them work. I tried using sets (that code is still in there), but it just prints a long list of single characters, and I see no pattern in them that explains what it is doing.
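A minimal illustration of why the lookup returns only the second state: assigning to a key that already exists replaces its old value instead of adding a second one. The two assignments below are sample data, not read from USSenators.txt.

```python
# Hypothetical last-name -> state mapping, built one line at a time.
senate_by_last_name = {}
senate_by_last_name["Udall"] = "Colorado"
senate_by_last_name["Udall"] = "New Mexico"   # same key: the old value is replaced

print(senate_by_last_name)        # {'Udall': 'New Mexico'}
print(len(senate_by_last_name))   # 1
```

Storing a list of states per key (rather than a single state) is one way to keep both entries.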

    Thanks

    code:

    def createList( filename ):
        # print( filename )
        senateInfo = {}
        try:
            info = open( filename, "r" )

            for line in info:
                # print( line )
                dataOnLine = line.split( "\t" )
                state = dataOnLine[ 0 ]
                senator = dataOnLine[ 1 ]

                if state in senateInfo: # Adding another senator.
                    # Create a list of the both senators from that state.
                    incumbent = senateInfo[state]
                    senators = [ incumbent, senator ]
                    senateInfo[state] = senators

                else:
                    senateInfo[state] = senator

            #print( senateInfo )

            info.close()
        except:
            print( filename, " did not open! qUITTING." )
        return senateInfo

    def createList2(filename):

        List = []
        senateInfo2 = {}

        info = open( filename, "r" )

        for line in info:

            dataOnLine = line.split( "\t" )
            state = dataOnLine[ 0 ]
            senator = dataOnLine[ 1 ]

            nameSplit = dataOnLine[ 1 ].split(" ")

            if len(nameSplit) == 3:
                lastName = nameSplit[1]

            elif len(nameSplit) == 4:
                lastName = nameSplit[2]

            already_seen = set()

            for name in lastName:
                if name in already_seen:
                    print("Already seen", name)
                else:
                    already_seen.add(name)

            senateInfo2[lastName] = state

        info.close()

        return senateInfo2

    def test( state, senatorsInfo ):
        print( senatorsInfo[state] )

    def test2( senator, usSenators ):
        print( usSenators[senator] )

    def main():
        usSenators = createList( "USSenators.txt" )
        usSenators2 = createList2( "USSenators.txt" )
        test( "Texas", usSenators )
        test2("Udall", usSenators2 )


    main()

    OUTPUT:

    Python 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)] on win32
    Type "copyright", "credits" or "license()" for more information.
    >>> ================================ RESTART ================================
    >>>
    Already seen s
    Already seen s
    Already seen k
    Already seen r
    Already seen o
    Already seen e
    Already seen i
    Already seen n
    Already seen l
    Already seen n
    Already seen e
    Already seen l
    Already seen r
    Already seen o
    Already seen s
    Already seen s
    Already seen o
    Already seen n
    Already seen l
    Already seen s
    Already seen n
    Already seen l
    Already seen t
    Already seen l
    Already seen k
    Already seen i
    Already seen r
    Already seen n
    Already seen l
    Already seen u
    Already seen e
    Already seen n
    Already seen l
    Already seen e
    Already seen h
    Already seen e
    Already seen t
    Already seen e
    Already seen e
    Already seen n
    Already seen e
    Already seen l
    Already seen i
    Already seen l
    Already seen i
    Already seen r
    Already seen a
    Already seen e
    Already seen e
    Already seen o
    Already seen e
    Already seen h
    Already seen e
    Already seen a
    Already seen t
    Already seen o
    Already seen n
    Already seen e
    Already seen r
    Already seen n
    Already seen e
    Already seen r
    Already seen r
    Already seen l
    Already seen e
    Already seen l
    Already seen e
    Already seen n
    Already seen o
    Already seen n
    Already seen r
    Already seen a
    Already seen s
    ['John Cornyn (R)', 'Ted Cruz (R)']
    New Mexico
  2. #2
  3. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,837
    Rep Power
    480

    Meet your friend, collections.defaultdict


    You could build both dictionaries in the same function. A defaultdict(list) starts every new key with an empty list, so a last name shared by two senators simply collects two states.
    Code:
    #untested.  I didn't invent an appropriate data file.
    import collections
    
    def load_data_into_dictionaries(filename):
        senators_by_state = collections.defaultdict(list)
        states_by_senator = collections.defaultdict(list)
        try:
            with open( filename, "r" ) as info:
                for line in info:
                    dataOnLine = line.rstrip( "\n" ).split( "\t" )
                    state = dataOnLine[ 0 ]
                    senator = dataOnLine[ 1 ]
                    # Key on the last name so test2("Udall") can find it. Assumes entries
                    # look like "John Cornyn (R)", i.e. the last name is second from the end.
                    last_name = senator.split()[-2]
                    senators_by_state[state].append(senator)
                    states_by_senator[last_name].append(state)
        except IOError:
            print( filename, " did not open!" )
        return senators_by_state, states_by_senator
    
    def test( state, senatorsInfo ):
        print( senatorsInfo[state] )
    
    def test2( senator, usSenators ):
        print( usSenators[senator] )
    
    def main():
        (senators_by_state, states_by_senator,) = load_data_into_dictionaries( "USSenators.txt" )
        test( "Texas", senators_by_state )
        test2("Udall", states_by_senator )
    
    main()
    [code]Code tags[/code] are essential for python code and Makefiles!
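One thing to watch with the defaultdict approach: indexing a key that is not there does not raise KeyError; it silently inserts an empty list. A small sketch with sample names (not from the real file) showing .get() as a read-only alternative:

```python
import collections

states_by_senator = collections.defaultdict(list)
states_by_senator["Udall"].append("Colorado")
states_by_senator["Udall"].append("New Mexico")

# Indexing a missing key creates it with an empty list as a side effect.
print(states_by_senator["Smith"])           # []
print("Smith" in states_by_senator)          # True

# .get() only reads; it never inserts a new key.
print(states_by_senator.get("Jones", []))   # []
print("Jones" in states_by_senator)          # False

print(states_by_senator["Udall"])           # ['Colorado', 'New Mexico']
```

For a lookup driven by user input, .get() (or an explicit `in` check) avoids filling the dictionary with empty entries for every typo.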
  4. #3
  5. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,837
    Rep Power
    480

    explanation


    Your function prints the letters that repeat within each last name. See the comments.
    Code:
    def createList2(filename):
        senateInfo2 = {}
        info = open( filename, "r" )
        for line in info:
            dataOnLine = line.split( "\t" )
            state = dataOnLine[ 0 ]
            senator = dataOnLine[ 1 ]
            nameSplit = senator.split(" ")
            if len(nameSplit) == 3:
                lastName = nameSplit[1]
            elif len(nameSplit) == 4:
                lastName = nameSplit[2]
            else: # you need to always have a lastName
                lastName = senator # what if there aren't 3 or 4 fields??  Your program might work this year then fail next year
            already_seen = set()   # because of its position in the program, already_seen is a new empty set for each line in info
            for name in lastName:  # name takes the value of each character in lastName, as if you had written for example  for name in ['C', 'r', 'u', 'z']:
                if name in already_seen: # display duplicate letters within each last name
                    print("Already seen", name)
                else:
                    already_seen.add(name)
            senateInfo2[lastName] = state
        info.close()
        return senateInfo2
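For comparison, a sketch of what the duplicate check was presumably meant to do: create the set once, before the loop, and add whole last names to it rather than iterating over their characters. The input is a hard-coded list of sample names instead of the real file, and the rule "the last name is the word before the party tag" is an assumption based on the 'John Cornyn (R)' format in the original output.

```python
def find_duplicate_last_names(full_names):
    """Return last names that occur more than once, in the order they repeat."""
    already_seen = set()   # created once, so it persists across all names
    duplicates = []
    for full_name in full_names:
        # Assumes "First [Middle] Last (P)": the last name is second from the end.
        last_name = full_name.split()[-2]
        if last_name in already_seen:
            duplicates.append(last_name)
        else:
            already_seen.add(last_name)
    return duplicates

names = ["Mark Udall (D)", "John Cornyn (R)", "Tom Udall (D)", "Ted Cruz (R)"]
print(find_duplicate_last_names(names))   # ['Udall']
```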
  6. #4
  7. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2013
    Posts
    12
    Rep Power
    0
    Already_seen...

    This helped me a lot! I had already tried something similar using sets, but I couldn't figure out why the set started over empty on every line. Thanks for the advice, it was very helpful! I will post updated code when I get to it.
  8. #5
  9. Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Feb 2005
    Posts
    605
    Rep Power
    65
    Another option: first create the state:senator dictionary, then invert it with collision handling.
    Code:
    ''' dict_invert_collision101.py
    invert a dictionary and handle colliding keys
    '''
    
    import pprint
    
    def invert_dict(d):
        """
        swap key:value dictionary pairs and take care of collisions
        """
        t = {}
        for k, v in d.items():
            t.setdefault(v, []).append(k)
        return t
    
    # a state:senator test dictionary
    state_senator = {
    'Arizona': 'Flake',
    'Colorado': 'Udall',
    'New Mexico': 'Udall',
    'California': 'Boxer',
    'Iowa': 'Harkin',
    'New York': 'Schumer'
    }
    
    pprint.pprint(state_senator)
    print('-'*20)
    
    senator_state = invert_dict(state_senator)
    
    pprint.pprint(senator_state)
    
    '''
    {'Arizona': 'Flake',
     'California': 'Boxer',
     'Colorado': 'Udall',
     'Iowa': 'Harkin',
     'New Mexico': 'Udall',
     'New York': 'Schumer'}
    --------------------
    {'Boxer': ['California'],
     'Flake': ['Arizona'],
     'Harkin': ['Iowa'],
     'Schumer': ['New York'],
     'Udall': ['Colorado', 'New Mexico']}
    '''
    Real Programmers always confuse Christmas and Halloween because Oct31 == Dec25
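Once a last name maps to several entries, the follow-up question the original post asked for only has to be asked when there is more than one match. A minimal sketch, assuming a hypothetical dictionary from last name to (first name, state) pairs; the first name is passed in as an argument here, where an interactive program would read it with input().

```python
def state_for_senator(last_name, first_name, by_last_name):
    """Return the senator's state, using first_name only to break ties.

    by_last_name is a hypothetical dict: last name -> list of (first name, state).
    Returns None when nothing matches.
    """
    matches = by_last_name.get(last_name, [])
    if len(matches) == 1:
        return matches[0][1]          # unique last name: no need to ask
    for first, state in matches:
        if first == first_name:
            return state
    return None

by_last_name = {
    "Udall": [("Mark", "Colorado"), ("Tom", "New Mexico")],
    "Cruz": [("Ted", "Texas")],
}

print(state_for_senator("Cruz", None, by_last_name))    # Texas
print(state_for_senator("Udall", "Tom", by_last_name))  # New Mexico
print(state_for_senator("Udall", "Bob", by_last_name))  # None
```

In the interactive version, the prompt for the first name would go where `len(matches) > 1`, so unique last names never trigger it.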
  10. #6
  11. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2013
    Posts
    12
    Rep Power
    0

    error


    Originally Posted by Dietrich
    Another option, first create the state_senator dictionary and then invert it with collision handling ...
    I am getting an error:

    File "C:\Users\byron\Desktop\senateRoster (3).py", line 35, in invert_dict
    senator_state.setdefault(v, []).append(k)
    TypeError: unhashable type: 'list'


    def createList( filename ):
        # print( filename )
        senateInfo = {}
        try:
            info = open( filename, "r" )

            for line in info:
                # print( line )
                dataOnLine = line.split( "\t" )
                state = dataOnLine[ 0 ]
                senator = dataOnLine[ 1 ]
                if state in senateInfo: # Adding another senator.
                    # Create a list of the both senators from that state.
                    incumbent = senateInfo[state]
                    senators = [ incumbent, senator ]
                    senateInfo[state] = senators
                else:
                    senateInfo[state] = senator

            #print( senateInfo )

            info.close()
        except:
            print( filename, " did not open! qUITTING." )
        return senateInfo

    import pprint
    def invert_dict(d):
        """
        swap key:value dictionary pairs and take care of collisions
        """
        senateInfo = createList("USSenators.txt")
        senator_state = {}
        for k, v in senateInfo.items():
            senator_state.setdefault(v, []).append(k)
        print(senator_state)
        return senator_state


    def test( state, senatorsInfo ):
        print( senatorsInfo[state] )

    def main():
        usSenators = createList( "USSenators.txt" )
        senator_state = invert_dict(createList("USSenators.txt"))

        pprint.pprint(senator_state)

        test( "Texas", usSenators )
        test( "North Carolina", usSenators )
        test( "West Virginia", usSenators )
        test( "Colorado", usSenators )
        test( "Louisiana", usSenators )


    main()
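The TypeError comes from states with two senators: createList stores those as a list, and a list cannot be a dictionary key. One way around it (a sketch, not Dietrich's code) is to invert over each individual senator instead of over the whole value. The sample dictionary is made up and mixes a single name with a grouped pair, the same shapes createList produces.

```python
import pprint

def invert_dict(d):
    """Swap key:value pairs; a value may be one item or a list/tuple of items."""
    inverted = {}
    for key, value in d.items():
        # Treat a single name as a one-item sequence so both shapes work.
        items = value if isinstance(value, (list, tuple)) else [value]
        for item in items:
            inverted.setdefault(item, []).append(key)
    return inverted

state_senator = {
    "Texas": ["Cornyn", "Cruz"],   # two senators stored together
    "Colorado": "Udall",
    "New Mexico": "Udall",
}
senator_state = invert_dict(state_senator)
pprint.pprint(senator_state)
```

Each resulting key is a single name (a string), so it is always hashable, and a shared last name such as Udall ends up with a list of both states.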
  12. #7
  13. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,837
    Rep Power
    480
    Good point about the inversion: lists are unhashable, so they can't be used as dictionary keys.

    Convert each list to a frozenset before using it as a key:


    key = frozenset(LIST)
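A short sketch of that suggestion with sample names: a frozenset built from the list is hashable, so it works as a key, and it compares equal no matter what order the names were added in.

```python
senators = ["Cornyn", "Cruz"]     # a list: unhashable, so not usable as a key
key = frozenset(senators)         # hashable and order-independent

senator_state = {key: "Texas"}
print(senator_state[frozenset(["Cruz", "Cornyn"])])   # Texas

try:
    bad = {senators: "Texas"}     # raises TypeError: a list cannot be a key
    list_key_works = True
except TypeError:
    list_key_works = False
print(list_key_works)             # False
```

The trade-off is that a lookup then needs the whole group of names; to look up one senator at a time, inverting over each name separately is usually more convenient.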
  14. #8
  15. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2013
    Posts
    12
    Rep Power
    0
    Originally Posted by b49P23TIvg
    Concerning dictionary inversion lists being unhashable. Good point.

    Convert the lists as keys to frozen sets.


    key = frozenset(LIST)
    I actually fixed that problem already by removing the brackets around incumbent, senator. Is that what you meant?

    if state in senateInfo: # Adding another senator.
        # Create a tuple of both senators from that state.
        incumbent = senateInfo[state]
        senators = incumbent, senator
        senateInfo[state] = senators
    else:
        senateInfo[state] = senator
  16. #9
  17. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,837
    Rep Power
    480
    You used tuples instead of frozensets. Tuples are immutable, so they are hashable too, as long as everything inside them is hashable. I prefer your solution.
    Code:
    >>> 'smith', 'jones', 'smith'     # tuple preserves duplicates
    ('smith', 'jones', 'smith')
    >>> frozenset(('smith', 'jones', 'smith'))  # frozenset removes duplicates
    frozenset({'jones', 'smith'})
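One caveat on "tuples are hashable": that holds only when every item inside the tuple is hashable too. A quick check with sample names:

```python
senators = ("Cornyn", "Cruz")          # tuple of strings: hashable
by_pair = {senators: "Texas"}
print(by_pair[("Cornyn", "Cruz")])     # Texas

nested = ("Cornyn", ["Cruz"])          # tuple that contains a list
try:
    hash(nested)
    nested_hashable = True
except TypeError:
    nested_hashable = False
print(nested_hashable)                 # False
```

Unlike a frozenset key, the tuple key is order-sensitive: ("Cruz", "Cornyn") would not find the entry above.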
  18. #10
  19. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2013
    Posts
    12
    Rep Power
    0
    Originally Posted by xrlrider08
    I actually fixed that problem already by removing brackets around the values for senators and incumbent. Is that what you meant?
    Great, but if I can't split a tuple of two three-part names to isolate each last name, I still have a problem.
  20. #11
  21. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,837
    Rep Power
    480
    You can't change a tuple.

    You can access items in tuples

    last_name = ('John Doh', '51st State of NYC')[0].split()[-1]
    last_name == 'Doh'

    and you can make new tuples.
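Applied to the two-senator tuples from createList, that means building a new tuple of last names instead of changing the old one. A sketch assuming each entry looks like 'John Cornyn (R)', as in the original output, so the last name is the second-to-last word rather than the last:

```python
def last_names(senators):
    """Return a new tuple with the last name of each 'First Last (P)' entry."""
    # Assumption: the party tag is the final word, so the last name is at [-2].
    return tuple(full_name.split()[-2] for full_name in senators)

texas = ("John Cornyn (R)", "Ted Cruz (R)")
print(last_names(texas))   # ('Cornyn', 'Cruz')
print(texas)               # the original tuple is unchanged
```

Because split() with no argument also discards a trailing newline, this works on values read straight from the file.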
