#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2004
    Location
    Atlantic City, NJ
    Posts
    327
    Rep Power
    13

    Sorting List Data


    Consider the following list:

    Code:
    files = ['agent-10-1117354150-4709.gsm', 'agent-10-1817354150-4709.gsm', 'agent-10-1114554150-4709.gsm',
             'agent-11-4567354150-4709.gsm', 'agent-11-1165454150-4709.gsm', 'agent-11-1458654150-4709.gsm',
             'agent-12-1117534550-4709.gsm', 'agent-12-1345674150-4709.gsm', 'agent-12-1116785450-4709.gsm',
             'agent-13-1132454350-4709.gsm', 'agent-13-1454324450-4709.gsm', 'agent-13-3454363150-4709.gsm',
             'agent-14-1134534150-4709.gsm', 'agent-14-3453354150-4709.gsm', 'agent-14-1113454550-4709.gsm',
             'agent-15-1767654150-4709.gsm', 'agent-15-1456456150-4709.gsm', 'agent-15-1456454150-4709.gsm',
             'agent-16-4621354150-4709.gsm', 'agent-16-1113453330-4709.gsm', 'agent-16-1114564450-4709.gsm',
             'agent-17-1523454150-4709.gsm', 'agent-17-1115454544-4709.gsm', 'agent-17-1117345640-4709.gsm',
             'agent-18-1153244440-4709.gsm', 'agent-18-1114545320-4709.gsm', 'agent-18-1135343150-4709.gsm',
             'agent-19-6324554335-4709.gsm', 'agent-19-1164545150-4709.gsm', 'agent-19-1163463350-4709.gsm',
             'agent-20-1115434555-4709.gsm', 'agent-20-1115443330-4709.gsm', 'agent-20-1766777450-4709.gsm',
             'agent-21-1115345454-4709.gsm', 'agent-21-1117354545-4709.gsm', 'agent-21-1114564550-4709.gsm',
             'agent-22-4534544150-4709.gsm', 'agent-22-1453354150-4709.gsm', 'agent-22-1117456350-4709.gsm',
             'agent-23-1134545150-4709.gsm', 'agent-23-4534454150-4709.gsm', 'agent-23-1134545150-4709.gsm',
             'agent-24-1455223330-4709.gsm', 'agent-24-1345454150-4709.gsm', 'agent-24-1116556560-4709.gsm',
             'agent-25-5634545440-4709.gsm', 'agent-25-3454224150-4709.gsm', 'agent-25-1117375656-4709.gsm',
             'agent-26-1145454634-4709.gsm', 'agent-26-1154545450-4709.gsm', 'agent-26-1254556345-4709.gsm',
             'agent-27-1116346666-4709.gsm', 'agent-27-1113456633-4709.gsm', 'agent-27-1113456777-4709.gsm',
             'agent-28-5845865485-4709.gsm', 'agent-28-1117345577-4709.gsm', 'agent-28-3434534322-4709.gsm',
             'agent-29-1485958757-4709.gsm', 'agent-29-1534518676-4709.gsm', 'agent-29-1463346740-4709.gsm',
             'agent-30-1958403048-4709.gsm', 'agent-30-1176574565-4709.gsm', 'agent-30-1444454150-4709.gsm',
             'agent-31-1587639490-4709.gsm', 'agent-31-1175745670-4709.gsm', 'agent-31-1116734550-4709.gsm',
             'agent-32-9847364857-4709.gsm', 'agent-32-1117674566-4709.gsm', 'agent-32-1543234421-4709.gsm',
             'agent-33-9564345857-4709.gsm', 'agent-33-1456456546-4709.gsm', 'agent-33-1654645221-4709.gsm',
             'agent-34-9847564565-4709.gsm', 'agent-34-1117776567-4709.gsm', 'agent-34-1144456354-4709.gsm',
             'agent-35-9456456456-4709.gsm', 'agent-35-7567765464-4709.gsm', 'agent-35-1656435421-4709.gsm',
             'agent-36-4345344857-4709.gsm', 'agent-36-1767347546-4709.gsm', 'agent-36-1654643321-4709.gsm',
             'agent-37-9645432237-4709.gsm', 'agent-37-1117856807-4709.gsm', 'agent-37-1145454644-4709.gsm',
             'agent-38-9865748940-4709.gsm', 'agent-38-0594736455-4709.gsm', 'agent-38-1564563255-4709.gsm',
             'agent-39-9847564645-4709.gsm', 'agent-39-1456456565-4709.gsm', 'agent-39-1165678896-4709.gsm',
             'agent-40-9834444857-4709.gsm', 'agent-40-1565345577-4709.gsm', 'agent-40-6563345781-4709.gsm',
             'agent-41-2424677347-4709.gsm', 'agent-41-6456734554-4709.gsm', 'agent-41-1145534524-4709.gsm',
             'agent-42-9845456367-4709.gsm', 'agent-42-1115434523-4709.gsm', 'agent-42-7674545671-4709.gsm',
             'agent-43-9345345345-4709.gsm', 'agent-43-1656564324-4709.gsm', 'agent-43-1146456345-4709.gsm']
    The first number after 'agent' identifies a specific employee. I need to group each employee's files into a nested list, so the output would look like this:

    Code:
    [['agent-10-1117354150-4709.gsm', 'agent-10-1817354150-4709.gsm', 'agent-10-1114554150-4709.gsm'],
             ['agent-11-4567354150-4709.gsm', 'agent-11-1165454150-4709.gsm', 'agent-11-1458654150-4709.gsm'],
             ['agent-12-1117534550-4709.gsm', 'agent-12-1345674150-4709.gsm', 'agent-12-1116785450-4709.gsm'],
             ['agent-13-1132454350-4709.gsm', 'agent-13-1454324450-4709.gsm', 'agent-13-3454363150-4709.gsm']]
    Now, the data is not always in a particular order, and there are not always three files per employee. I somehow have to parse the data, take everything that has the same employee number, and put it into a nested list.

    I'm not sure exactly how to implement this. I think I need to use regular expressions, because nothing in the string module seems to be able to do this task.

    Any ideas?

    Thanks in advance.
    I'll learn this stuff someday.
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2004
    Posts
    394
    Rep Power
    51
    Hi!

    Code:
    dic = {}  # maps employee number -> list of that employee's files
    for entry in files:
        number = entry.split('-')[1]  # 'agent-10-...' -> '10'
        if number in dic: dic[number].append(entry)
        else: dic[number] = [entry]
    We iterate through the files list. For every entry we extract the number after "agent" (by splitting the entry at the '-'). Then we check whether the dictionary "dic" already has this number as a key. If yes, we append the entry to the list stored under that key. If not, we create a new list containing the entry and store it under that key.
    Ok, now you have a dictionary: the keys are the employee numbers, the values are the corresponding files.
    Converting this into a nested list should be easy, so I leave this up to you.
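
    For reference, one possible way to do that last conversion is sketched below. The shortened, shuffled `files` list and the name `nested` are only illustrative assumptions, and sorting the groups by employee number is an extra choice that the question did not strictly ask for.

```python
# Sample input: a shortened, shuffled subset of the `files` list from the question.
files = ['agent-11-4567354150-4709.gsm', 'agent-10-1117354150-4709.gsm',
         'agent-10-1817354150-4709.gsm', 'agent-12-1117534550-4709.gsm',
         'agent-11-1165454150-4709.gsm']

# Group the file names by employee number, as in the loop above.
dic = {}
for entry in files:
    number = entry.split('-')[1]
    if number in dic:
        dic[number].append(entry)
    else:
        dic[number] = [entry]

# Build the nested list, one inner list per employee.
# key=int orders the groups numerically, so '9' would sort before '10'.
nested = [dic[number] for number in sorted(dic, key=int)]
print(nested)
```

    If the order of the groups does not matter, `list(dic.values())` should give the same inner lists without the sort.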

    Hope this helps.

    Regards, mawe
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2004
    Location
    Atlantic City, NJ
    Posts
    327
    Rep Power
    13
    Thanks a lot. I never thought of using a dictionary. The purpose of separating these out was to get one random file from each employee. So to finish this I did:

    Code:
    import random
    
    # Group the file names by employee number.
    dic = {}
    for entry in files:
        number = entry.split('-')[1]
        if number in dic: dic[number].append(entry)
        else: dic[number] = [entry]
    
    # Pick one random file from each employee's group.
    fin_list = []
    for vals in dic.values():
        fin_list.append(random.choice(vals))
    Thanks again.
    I'll learn this stuff someday.
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Feb 2004
    Location
    London, England
    Posts
    1,585
    Rep Power
    1373
    Originally Posted by mawe
    Code:
        if number in dic: dic[number].append(entry)
        else: dic[number] = [entry]
    You can do this in one line with the dictionary's setdefault method:

    Code:
        dic.setdefault(number, []).append(entry)
    If the key does not exist yet, setdefault stores an empty list under it. Either way, it returns the list currently stored for that key, so the append works in both cases.
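
    A small sketch of how this behaves, using a shortened `files` list as assumed sample data. The `collections.defaultdict` variant is not from this thread; it is just another standard-library way to get the same grouping.

```python
from collections import defaultdict

# Sample input: a shortened subset of the `files` list from the question.
files = ['agent-10-1117354150-4709.gsm', 'agent-11-4567354150-4709.gsm',
         'agent-10-1817354150-4709.gsm']

# setdefault inserts [] only when the key is missing, and always
# returns the list stored under the key, so append works either way.
dic = {}
for entry in files:
    number = entry.split('-')[1]
    dic.setdefault(number, []).append(entry)

# Alternative (an addition, not from the posts above): defaultdict(list)
# creates the empty list automatically on first access to a missing key.
grouped = defaultdict(list)
for entry in files:
    grouped[entry.split('-')[1]].append(entry)

print(dic)
print(dict(grouped) == dic)
```

    Both loops should end up with the same grouping; which one reads better is mostly a matter of taste.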

    Dave
