Python Programming
 
  #1  
March 1st, 2013, 11:21 AM
rdkll2k
Looking for a Better Clustering Algorithm

I have a problem that I've worked out in Python, but my solution seems more complex than it should be, and I'm hoping someone can help me devise a more efficient process.

I have a large set of data in the following format (diff, num1, num2):

SAMPLE:
[[1,5,6],
 [1,6,7],
 [1,9,10],
 [15,15,30],
 [16,15,31],
 [16,30,46],
 [20,42,62],
 [20,62,82],
 [20,70,90],
 [20,82,102],
 [32,90,122],
 ...]


I have those stored in a large Python list and need to produce the following result:

{(5,6,7):1,(42,62,82,102):20...}

My list of lists is sorted by diff, which is the difference between num1 and num2. I loop through it looking for chains: runs of pairs that share the same diff, where one pair's num2 equals the next pair's num1. Using the sample above with a diff of 1, 5,6 is the first pair and 6,7 is the second pair, so the result is the tuple (5,6,7), which is the key to a dictionary with a value of 1. The pair 9,10 doesn't join that chain because the chain is broken: there is no 8,9.

The algorithm I've devised is similar to this:

Code:
list1 = [[1,5,6],[1,6,7],[1,9,10],[15,15,30],[16,15,31],[16,30,46],[20,42,62],[20,62,82],[20,70,90],[20,82,102],[32,90,122]]

allCs = {}

for lCount in xrange(list1):
	strC = str(list1[lCount][0]) #used to keep track of the chain being created
	diff = list1[lCount][0]
	numCs = 0

	numCs, chains = processL(diff, lCount, numCs, strC)

	if numCs > 0:
		allCs[chains] = diff


print allCs



def processL(diff, sCount, numCs, strC):
	

	for lCount in xrange(sCount + 1, len(list1)):
		tDiff = list1[lCount][0] #used to keep track of the current lists diff

		if tDiff == diff:
			if list1[lCount][1] == list1[sCount][2]:
				strC += ',' + str(list1[lCount][1])

				numCs += 1

				numCs, chains = processL(diff, lCount, numCs, strC)

				if chains == strC:
					chains += ',' + str(list1[lCount][2])

				break




Any thoughts would be appreciated.

  #2  
March 1st, 2013, 07:08 PM
b49P23TIvg
I take it your program doesn't work.

You need to define processL before you use it.
processL doesn't have a return statement, therefore it returns None. You need it to return two values.
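
To illustrate the second point, here is a minimal sketch. The two helper functions are made up for the example (they are not from the post above), and the snippet is written to run under Python 3. A function that falls off the end returns None, and None can't be unpacked into two names:

```python
def process_without_return(count, chain):
    # Hypothetical helper: updates its local names but never returns them.
    count += 1
    chain = chain + (count,)


def process_with_return(count, chain):
    # Same work, but hands both values back to the caller as a tuple.
    count += 1
    chain = chain + (count,)
    return count, chain


result = process_without_return(0, ())  # result is None

try:
    numCs, chains = process_without_return(0, ())
    unpack_error = None
except TypeError as exc:
    # None cannot be unpacked into two names.
    unpack_error = exc

# With an explicit return, the two-name assignment works.
numCs, chains = process_with_return(0, ())
```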

Why are you using a string? Something's amiss: you want your output to look like
{(5,6,7):1,(42,62,82,102):20...}

Yet if you print a dictionary that has a key of type str, you'll see quotation marks of some sort:

>>> print dict(a=4)
{'a': 4}

I think you really want the key to be a tuple of numbers. But that also displays a little differently than you show. See the space following each comma?
>>> print {(1,2,3):1}
{(1, 2, 3): 1}
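
A small sketch of the difference, assuming the chain has been collected as a list of numbers (the variable names are only for illustration, and this is written for Python 3). A list can't be used as a dictionary key directly, so it has to be converted to a tuple first:

```python
chain = [5, 6, 7]  # a chain collected as a list of numbers

# Key built by joining strings, roughly what the posted code does.
by_string = {','.join(str(n) for n in chain): 1}

# Key built as a tuple of numbers, matching the desired output.
by_tuple = {tuple(chain): 1}

# A list itself is unhashable, so it is rejected as a key.
try:
    by_list = {}
    by_list[chain] = 1
    list_key_error = None
except TypeError as exc:
    list_key_error = exc
```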

---------------
Now, how will we make this work?
Sorting by difference is a great first step.

Code:
def process(L):
    '''
        return a list of all chains
    '''
    keys = []
    while L:
        # I is a list of indexes in L forming a chain
        I = [0]                          # start at beginning of L

        # warning: if you expect len(L) to exceed 6 or so use a binary search
        # see the bisect module
        # The data is sorted and you can calculate the next value to find.
        for i in range(1,len(L)):          # search for the next value
            if L[I[-1]][2] == L[i][1]:
                I.append(i)

        key = L[I[0]][1:2]+[L[i][2] for i in I]

        keys.append(key)

        # remove the used values from L
        for i in reversed(I):
            del L[i]

    return keys

def main(data=[[1,5,6],[1,6,7],[1,9,10],[15,15,30],[16,15,31],[16,30,46],[20,42,62],[20,62,82],[20,70,90],[20,82,102],[32,90,122]]):
    allCs = {}
    i = 0
    while i < len(data):                  # look at all the data
        difference = data[i][0]           # get the next difference
        j = i
        while (j < len(data)) and (data[j][0] == difference): # find span with this same difference
            j += 1
        allCs.update({tuple(key) : difference for key in process(data[i:j])}) # return lists of valid subgroups
        i = j
    return allCs

print(main())
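
The warning comment in process mentions binary search. As a hedged sketch of what that could look like (not a drop-in replacement), the version below uses bisect_left to find the next link and itertools.groupby to split the data by difference. The names chains_for_group and find_chains are invented here. It assumes Python 3, that the rows are sorted by difference and then by num1, and that no row appears twice:

```python
from bisect import bisect_left
from itertools import groupby


def chains_for_group(rows):
    """rows: [diff, num1, num2] lists sharing one diff, sorted by num1."""
    starts = [row[1] for row in rows]  # sorted num1 values to search in
    used = [False] * len(rows)
    chains = []
    for i, row in enumerate(rows):
        if used[i]:
            continue
        used[i] = True
        chain = [row[1], row[2]]
        while True:
            # Binary search for a row whose num1 equals the chain's last value.
            j = bisect_left(starts, chain[-1])
            if j == len(rows) or starts[j] != chain[-1] or used[j]:
                break
            used[j] = True
            chain.append(rows[j][2])
        chains.append(tuple(chain))
    return chains


def find_chains(data):
    """Map each chain (as a tuple) to its difference."""
    result = {}
    for diff, group in groupby(data, key=lambda row: row[0]):
        for chain in chains_for_group(list(group)):
            result[chain] = diff
    return result


sample = [[1, 5, 6], [1, 6, 7], [1, 9, 10], [15, 15, 30], [16, 15, 31],
          [16, 30, 46], [20, 42, 62], [20, 62, 82], [20, 70, 90],
          [20, 82, 102], [32, 90, 122]]
print(find_chains(sample))
```

Like the code above, this still reports two-number chains such as (9, 10).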
Then, if you want to filter out keys of length 2, that's a separate step; if you want to keep only the longest key per difference, that is another processing step again.
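
Those two follow-up steps might look roughly like this. It is a sketch, assuming Python 3.7 or later and a dictionary shaped like the one main() builds for the sample data; the names all_chains, long_chains and longest are made up for the example:

```python
# Dictionary of chain -> difference for the sample data.
all_chains = {
    (5, 6, 7): 1, (9, 10): 1, (15, 30): 15, (15, 31): 16, (30, 46): 16,
    (42, 62, 82, 102): 20, (70, 90): 20, (90, 122): 32,
}

# Step 1: drop the keys that are just a single pair (length 2).
long_chains = {key: diff for key, diff in all_chains.items() if len(key) > 2}

# Step 2: keep only the longest key for each difference.
# On a tie the first key seen wins, which relies on dicts keeping
# insertion order (guaranteed from Python 3.7).
longest = {}
for key, diff in all_chains.items():
    if diff not in longest or len(key) > len(longest[diff]):
        longest[diff] = key

print(long_chains)
print(longest)
```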
  #3  
March 4th, 2013, 08:08 AM
rdkll2k
Quote:
Originally Posted by b49P23TIvg
I take it your program doesn't work.


Thanks for the help! This looks so much simpler than what I devised. I'll take a look today and see if I can implement your solution.

You are absolutely correct that what I posted doesn't work. What I'm really doing is a bit more complex than what I posted here. I simply wanted to give a quick and dirty sample of the code, which I felt was more cumbersome than necessary. I should have made sure that it worked so we weren't focusing on the wrong stuff, but it appears you addressed the sample code bugs and the actual issue at the same time!

Thanks again!

  #4  
March 8th, 2013, 08:37 AM
rdkll2k
Thanks for the help! The solution you provided worked exactly as I requested.

Unfortunately, I couldn't use it as written because of the complexity of things not mentioned in my initial question, but I was able to use your solution as a basis for what I ultimately needed.

Thanks!
