#1
March 1st, 2013, 11:21 AM
 rdkll2k
Registered User

Looking for a Better Clustering Algorithm

I have a problem that I've worked out in Python, but my solution seems more complex than it should be, and I'm hoping someone can help me devise a more efficient process.

I have a large set of data in the following format (diff, num1, num2):

SAMPLE:
[[1,5,6],
[1,6,7],
[1,9,10],
[15,15,30],
[16,15,31],
[16,30,46],
[20,42,62],
[20,62,82],
[20,70,90],
[20,82,102],
[32,90,122],
...]

I have those stored in a large python list and need to produce the following results:

{(5,6,7):1,(42,62,82,102):20...}

So I loop through my list of lists, which is sorted by diff (the difference between num1 and num2). I'm looking for chains in which consecutive pairs share the same diff and the first pair's num2 equals the second pair's num1. Using the sample above with a diff of 1: 5,6 is the first pair and 6,7 is the second pair, so the result is the tuple (5,6,7), which becomes a dictionary key with a value of 1. The pair 9,10 doesn't join that chain because the chain is broken: there is no 8,9.

The algorithm I've devised is similar to this:

Code:
```
list1 = [[1,5,6],[1,6,7],[1,9,10],[15,15,30],[16,15,31],[16,30,46],[20,42,62],[20,62,82],[20,70,90],[20,82,102],[32,90,122]]

allCs = {}

for lCount in xrange(list1):
    strC = str(list1[lCount][0]) #used to keep track of the chain being created
    diff = list1[lCount][0]
    numCs = 0

    numCs, chains = processL(diff, lCount, numCs, strC)

    if numCs > 0:
        allCs[chains] = diff

print allCs

def processL(diff, sCount, numCs, strC):

    for lCount in xrange(sCount + 1, len(list1)):
        tDiff = list1[lCount][0] #used to keep track of the current lists diff

        if tDiff == diff:
            if list1[lCount][1] == list1[sCount][2]:
                strC += ',' + str(list1[lCount][1])

                numCs += 1

                numCs, chains = processL(diff, lCount, numCs, strC)

                if chains == strC:
                    chains += ',' + str(list1[lCount][2])

                break
```

Any thoughts would be appreciated.

#2
March 1st, 2013, 07:08 PM
 b49P23TIvg
Contributing User

I take it your program doesn't work.

You need to define processL before you use it.
processL doesn't have a return statement, therefore it returns None. You need it to return two values.
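To make the point about the missing return concrete, here is a minimal sketch. The names `broken` and `fixed` are made up for illustration and are not part of the posted program; the code is written to run on both Python 2.7 and Python 3.

```python
def broken(count):
    count += 1               # the new value is computed, then thrown away


def fixed(count, chain):
    count += 1
    return count, chain      # a 2-tuple, ready for unpacking


print(broken(0))             # a function without a return statement gives None

try:
    num, chains = broken(0)  # None cannot be unpacked into two names
except TypeError as err:
    print("unpacking failed: %s" % err)

num, chains = fixed(0, "5,6")
print(num)
print(chains)
```

The line `numCs, chains = processL(...)` in the posted code would hit the same TypeError once processL is defined ahead of the loop.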

Why are you using a string? Something's amiss, you want your output to look like
{(5,6,7):1,(42,62,82,102):20...}

Yet if I print a dictionary that has a key with type str you'll see quotation marks of some sort:

>>> print dict(a=4)
{'a': 4}

I think you really want the key to be a tuple of numbers. But that also displays a little bit differently than you show---see the spaces following the commas?
>>> print {(1,2,3):1}
{(1, 2, 3): 1}

---------------
Now, how will we make this work?
Sorting by difference is a great first step.

Code:
```
def process(L):
    '''
        return a list of all chains
    '''
    keys = []
    while L:
        # I is a list of indexes in L forming a chain
        I = [0]                          # start at beginning of L

        # warning: if you expect len(L) to exceed 6 or so use a binary search
        # see the bisect module
        # The data is sorted and you can calculate the next value to find.
        for i in range(1,len(L)):          # search for the next value
            if L[I[-1]][2] == L[i][1]:
                I.append(i)

        key = L[I[0]][1:2]+[L[i][2] for i in I]

        keys.append(key)

        # remove the used values from L
        for i in reversed(I):
            del L[i]

    return keys

def main(data=[[1,5,6],[1,6,7],[1,9,10],[15,15,30],[16,15,31],[16,30,46],[20,42,62],[20,62,82],[20,70,90],[20,82,102],[32,90,122]]):
    allCs = {}
    i = 0
    while i < len(data):                  # look at all the data
        difference = data[i][0]           # get the next difference
        j = i
        while (j < len(data)) and (data[j][0] == difference): # find span with this same difference
            j += 1
        allCs.update({tuple(key) : difference for key in process(data[i:j])}) # return lists of valid subgroups
        i = j
    return allCs

print(main())
```
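As an aside on the warning in that comment, a rough sketch of the binary-search variant could look like the following. It is only a sketch under assumptions: the name `process_sorted` is made up, every row in the group shares the same positive difference, and the rows are sorted by num1 (which holds for the sample data).

```python
from bisect import bisect_left


def process_sorted(group):
    """Return all chains in one same-difference group.

    Assumes rows look like [diff, num1, num2], share one positive diff,
    and are sorted by num1.
    """
    starts = [row[1] for row in group]   # sorted num1 values to search in
    used = [False] * len(group)          # rows already placed in a chain
    chains = []
    for first in range(len(group)):
        if used[first]:
            continue
        used[first] = True
        chain = [group[first][1], group[first][2]]
        while True:
            target = chain[-1]           # the next row must start here
            pos = bisect_left(starts, target)
            # step over consumed rows (only matters for duplicate num1 values)
            while pos < len(group) and starts[pos] == target and used[pos]:
                pos += 1
            if pos == len(group) or starts[pos] != target:
                break                    # nothing continues this chain
            used[pos] = True
            chain.append(group[pos][2])
        chains.append(chain)
    return chains


print(process_sorted([[20, 42, 62], [20, 62, 82], [20, 70, 90], [20, 82, 102]]))
```

Each lookup is a binary search instead of a scan over the rest of the group, and marking rows as used replaces the deletions from L.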
Then if you want to filter against keys of length 2 that's a separate step, or if you want to keep only the longest key per difference that again is another processing step.
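Those two follow-up steps might be sketched like this. The helper names `drop_pairs` and `longest_per_difference` are made up for illustration, and `sample` is a hand-written dictionary shaped like the result for the sample data, so treat it as an assumption rather than captured output.

```python
def drop_pairs(all_chains):
    """Keep only keys longer than 2, i.e. chains built from two or more rows."""
    return {key: diff for key, diff in all_chains.items() if len(key) > 2}


def longest_per_difference(all_chains):
    """Keep one longest key for each difference."""
    best = {}                            # difference -> longest key seen so far
    for key, diff in all_chains.items():
        if diff not in best or len(key) > len(best[diff]):
            best[diff] = key
    return {key: diff for diff, key in best.items()}


sample = {(5, 6, 7): 1, (9, 10): 1, (15, 30): 15, (15, 31): 16, (30, 46): 16,
          (42, 62, 82, 102): 20, (70, 90): 20, (90, 122): 32}

print(drop_pairs(sample))
print(longest_per_difference(sample))
```

When two keys for the same difference have equal length, `longest_per_difference` keeps whichever it meets first, so add a tie-break rule if that matters for your data.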
__________________
[code]Code tags[/code] are essential for python code!

#3
March 4th, 2013, 08:08 AM
 rdkll2k
Registered User

Quote:
 Originally Posted by b49P23TIvg I take it your program doesn't work.

Thanks for the help! This looks so much simpler than what I devised. I'll take a look today and see if I can implement your solution.

You are absolutely correct that what I posted doesn't work. What I'm really doing is a bit more complex than what I posted here. I simply wanted to give a down and dirty sample of the code that I felt was a bit more cumbersome than necessary. I should have made sure that it worked so we weren't focusing on the wrong stuff, but it appears you addressed the sample code bugs and the actual issue at the same time!

Thanks again!

#4
March 8th, 2013, 08:37 AM
 rdkll2k
Registered User

Thanks for the help! The solution you provided worked exactly as I requested.

Unfortunately, I couldn't use it as written because of the complexity of things not mentioned in my initial question, but I was able to use your solution as a basis for what I ultimately needed.

Thanks!
