Python Programming
 
Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
User Name:
Password:
Remember me

The Shed is going Social! Join us on FaceBook and Twitter and chime in on the conversation.

Go Back   Dev Shed ForumsProgramming LanguagesPython Programming

Reply
Add This Thread To:
  Del.icio.us   Digg   Google   Spurl   Blink   Furl   Simpy   Y! MyWeb 
Thread Tools Search this Thread Rate Thread Display Modes
 
Unread Dev Shed Forums Sponsor:
  #16  
Old December 30th, 2012, 10:30 AM
Nightmareix35 Nightmareix35 is offline
Contributing User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Nov 2012
Posts: 32 Nightmareix35 User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 8 h 42 m 41 sec
Reputation Power: 1
PHP Code:
 def substring_len(fn):
    
set()
    for 
i in range(len(f)-n+1):
        if 
not (f[i:i+nin l):
            
l.add(f[i:i+n])
    return 
l

def BinaryfindLongest
(proteome1,proteome2):
    
proteome1 sorted(proteome1)
    
proteome2 sorted(proteome2)
    
Maximum 0
    Minimum 
0
    Mid 
0
    Closer 
4
    Hold1 
set()
    
Hold2 set()
    
set()
    
    
###finds the longest string
    
if(len(proteome1[-1])>len(proteome2[-1])):
        
Maximum len(proteome1[-1])
    else:
        
Maximum len(proteome1[-1])

    
###tries to find common substrings of length = Mid
    
while(Maximum-Minimum>Closer):
        
        
Mid = (Maximum+Minimum)//2
        
for i in proteome1:
            if(
len(i)<Mid):
                continue
            else:
                
Hold1 Hold1.union(substring_len(i,Mid))
                
        for 
j in proteome2:
            if(
len(j)<Mid):
                continue
            else:
                
Hold2 Hold2.union(substring_len(j,Mid))

    
###performs binary search according "matching" result
        
if(not Hold1.isdisjoint(Hold2)):
            
Minimum = (Mid)
        else:
            
Maximum = (Mid)
            
        
Hold1 set()
        
Hold2 set()

    
###Continue... 



This is the code that I have so far. I am using sets and a binary search. Basically, I sorted the lists from longest string to shortest. Then I calculated the longest string of both lists and set it to be the Maximum. (Minimum = 0 by default)

Then I look for the common sub-string between the lists which is (Maximum+Minimum)/2 long. If such match is found, then I continue with an upper binary search, otherwise, bottom binary search till I get to a point where Maximum and Minimum are close enough, that's where I will do a final manual search.

The problem is that I have no idea how to make a fast comparison of finding the common string of length "Mid" as described in the code.

Directions?

Reply With Quote
  #17  
Old December 30th, 2012, 02:22 PM
b49P23TIvg's Avatar
b49P23TIvg b49P23TIvg is offline
Contributing User
Dev Shed Loyal (3000 - 3499 posts)
 
Join Date: Aug 2011
Posts: 3,350 b49P23TIvg User rank is Major (30000 - 40000 Reputation Level)b49P23TIvg User rank is Major (30000 - 40000 Reputation Level)b49P23TIvg User rank is Major (30000 - 40000 Reputation Level)b49P23TIvg User rank is Major (30000 - 40000 Reputation Level)b49P23TIvg User rank is Major (30000 - 40000 Reputation Level)b49P23TIvg User rank is Major (30000 - 40000 Reputation Level)b49P23TIvg User rank is Major (30000 - 40000 Reputation Level)b49P23TIvg User rank is Major (30000 - 40000 Reputation Level)b49P23TIvg User rank is Major (30000 - 40000 Reputation Level)b49P23TIvg User rank is Major (30000 - 40000 Reputation Level) 
Time spent in forums: 1 Month 2 Weeks 3 Days 7 h 38 m 45 sec
Reputation Power: 383
Looks like we're solving different problems altogether.
You have

if(len(proteome1[-1])>len(proteome2[-1])):

which might make sense if proteome[12] are lists of strings. I treated them as strings.

You also sort proteome[12], again, which makes sense if these are lists of strings.

However, you (probably) did not sort them by length. What is the class definition of the objects in proteome[12]?

(In other words, what is proteome1[0] )?

Also, you might want the standard bisect module.
__________________
[code]Code tags[/code] are essential for python code!

Reply With Quote
  #18  
Old December 30th, 2012, 03:55 PM
Nightmareix35 Nightmareix35 is offline
Contributing User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Nov 2012
Posts: 32 Nightmareix35 User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 8 h 42 m 41 sec
Reputation Power: 1
PHP Code:
 Amino_Acids = ["A","C","D","E","F","G","H","I","K","L","M","N","P","Q","R","S","T","V","W","Y"]
files = ["Vibrio_cholerae_B33.txt","Mycobacterium_leprae_Br4923.txt","Mycobacterium_tuberculosis_H37Ra.txt","Plasmodium_falciparum_malaria.txt","Salmonella_Typhi_AG3.txt"]
def readProteome(filename):
    
lst = []
    
open(filename)
    
line F.readline()
    
line F.readline()
    
PRO ""
    
while(line!=""):
        for 
ELEM in line[0:-1]:
            if 
ELEM in Amino_Acids:
                
PRO += ELEM
        line 
F.readline()
        if (
line==""):
            
lst.append(PRO)
            break
        if 
line[0]==">":
            
line F.readline()
            
lst.append(PRO)
            
PRO ""
    
return lst

def substring_len
(fn):
    
set()
    for 
i in range(len(f)-n+1):
        if 
not (f[i:i+nin l):
            
l.add(f[i:i+n])
    return 
l

def BinaryfindLongest
(proteome1,proteome2):
    
proteome1.sort(key lambda slen(s))
    
proteome2.sort(key lambda slen(s))
    
Maximum 0
    Minimum 
0
    Mid 
0
    Closer 
4
    H 
set()
    
''.join(proteome1)
    
''.join(proteome2)
    
found False
    count 
len(proteome2)
    
    
###finds the longest string
    
if(len(proteome1[-1])>len(proteome2[-1])):
        
Maximum len(proteome1[-1])
    else:
        
Maximum len(proteome2[-1])

    while(
Maximum-Minimum>Closer):
        
found False
        Mid 
= (Maximum+Minimum)//2
        
for i in proteome2:
            if(
found):
                break
            if(
len(i)<Mid):
                continue
            else:
                
substring_len(i,Mid)
                for 
k in H:
                    if 
k in X:
                        
Minimum = (Mid)
                        
found True
                        
break
        if(
not found):
            
Maximum = (Mid)

    print (
Maximum,Minimum,Mid



Whoa big mistake big mistake!!!...

forget about that one code above, I posted the wrong one!
Here it is, sorting the listing according to the length of strings. You can assume freely that all elements in the list are strings since I have another function called readProteome which makes sure it happens.

Basically, we are back to the point. sets and binary search with two big lists containing a lot of strings. a good algorithm in my opinion but its taking too much time...

Reply With Quote
Reply

Viewing: Dev Shed ForumsProgramming LanguagesPython Programming > Longest common substring

Developer Shed Advertisers and Affiliates



Thread Tools  Search this Thread 
Search this Thread:

Advanced Search
Display Modes  Rate This Thread 
Rate This Thread:


Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
View Your Warnings | New Posts | Latest News | Latest Threads | Shoutbox
Forum Jump

Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
  
 


Powered by: vBulletin Version 3.0.5
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.

© 2003-2013 by Developer Shed. All rights reserved. DS Cluster - Follow our Sitemap