Page 2 of 2 First 12
  • Jump to page:
    #16
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    32
    Rep Power
    3
    PHP Code:
    def substring_len(fn):
        
    set()
        for 
    i in range(len(f)-n+1):
            if 
    not (f[i:i+nin l):
                
    l.add(f[i:i+n])
        return 
    l

    def BinaryfindLongest
    (proteome1,proteome2):
        
    proteome1 sorted(proteome1)
        
    proteome2 sorted(proteome2)
        
    Maximum 0
        Minimum 
    0
        Mid 
    0
        Closer 
    4
        Hold1 
    set()
        
    Hold2 set()
        
    set()
        
        
    ###finds the longest string
        
    if(len(proteome1[-1])>len(proteome2[-1])):
            
    Maximum len(proteome1[-1])
        else:
            
    Maximum len(proteome1[-1])

        
    ###tries to find common substrings of length = Mid
        
    while(Maximum-Minimum>Closer):
            
            
    Mid = (Maximum+Minimum)//2
            
    for i in proteome1:
                if(
    len(i)<Mid):
                    continue
                else:
                    
    Hold1 Hold1.union(substring_len(i,Mid))
                    
            for 
    j in proteome2:
                if(
    len(j)<Mid):
                    continue
                else:
                    
    Hold2 Hold2.union(substring_len(j,Mid))

        
    ###performs binary search according "matching" result
            
    if(not Hold1.isdisjoint(Hold2)):
                
    Minimum = (Mid)
            else:
                
    Maximum = (Mid)
                
            
    Hold1 set()
            
    Hold2 set()

        
    ###Continue... 

    This is the code that I have so far. I am using sets and a binary search. Basically, I sorted the lists from longest string to shortest. Then I calculated the longest string of both lists and set it to be the Maximum. (Minimum = 0 by default)

    Then I look for the common sub-string between the lists which is (Maximum+Minimum)/2 long. If such match is found, then I continue with an upper binary search, otherwise, bottom binary search till I get to a point where Maximum and Minimum are close enough, that's where I will do a final manual search.

    The problem is that I have no idea how to make a fast comparison of finding the common string of length "Mid" as described in the code.

    Directions?
  2. #17
  3. Contributing User

    Join Date
    Aug 2011
    Posts
    5,210
    Rep Power
    483
    Looks like we're solving different problems altogether.
    You have

    if(len(proteome1[-1])>len(proteome2[-1])):

    which might make sense if proteome[12] are lists of strings. I treated them as strings.

    You also sort proteome[12], again, which makes sense if these are lists of strings.

    However, you (probably) did not sort them by length. What is the class definition of the objects in proteome[12]?

    (In other words, what is proteome1[0] )?

    Also, you might want the standard bisect module.
    [code]Code tags[/code] are essential for python code and Makefiles!
  4. #18
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    32
    Rep Power
    3
    PHP Code:
    Amino_Acids = ["A","C","D","E","F","G","H","I","K","L","M","N","P","Q","R","S","T","V","W","Y"]
    files = ["Vibrio_cholerae_B33.txt","Mycobacterium_leprae_Br4923.txt","Mycobacterium_tuberculosis_H37Ra.txt","Plasmodium_falciparum_malaria.txt","Salmonella_Typhi_AG3.txt"]
    def readProteome(filename):
        
    lst = []
        
    open(filename)
        
    line F.readline()
        
    line F.readline()
        
    PRO ""
        
    while(line!=""):
            for 
    ELEM in line[0:-1]:
                if 
    ELEM in Amino_Acids:
                    
    PRO += ELEM
            line 
    F.readline()
            if (
    line==""):
                
    lst.append(PRO)
                break
            if 
    line[0]==">":
                
    line F.readline()
                
    lst.append(PRO)
                
    PRO ""
        
    return lst

    def substring_len
    (fn):
        
    set()
        for 
    i in range(len(f)-n+1):
            if 
    not (f[i:i+nin l):
                
    l.add(f[i:i+n])
        return 
    l

    def BinaryfindLongest
    (proteome1,proteome2):
        
    proteome1.sort(key lambda slen(s))
        
    proteome2.sort(key lambda slen(s))
        
    Maximum 0
        Minimum 
    0
        Mid 
    0
        Closer 
    4
        H 
    set()
        
    ''.join(proteome1)
        
    ''.join(proteome2)
        
    found False
        count 
    len(proteome2)
        
        
    ###finds the longest string
        
    if(len(proteome1[-1])>len(proteome2[-1])):
            
    Maximum len(proteome1[-1])
        else:
            
    Maximum len(proteome2[-1])

        while(
    Maximum-Minimum>Closer):
            
    found False
            Mid 
    = (Maximum+Minimum)//2
            
    for i in proteome2:
                if(
    found):
                    break
                if(
    len(i)<Mid):
                    continue
                else:
                    
    substring_len(i,Mid)
                    for 
    k in H:
                        if 
    k in X:
                            
    Minimum = (Mid)
                            
    found True
                            
    break
            if(
    not found):
                
    Maximum = (Mid)

        print (
    Maximum,Minimum,Mid

    Whoa big mistake big mistake!!!...

    forget about that one code above, I posted the wrong one!
    Here it is, sorting the listing according to the length of strings. You can assume freely that all elements in the list are strings since I have another function called readProteome which makes sure it happens.

    Basically, we are back to the point. sets and binary search with two big lists containing a lot of strings. a good algorithm in my opinion but its taking too much time...
Page 2 of 2 First 12
  • Jump to page:

IMN logo majestic logo threadwatch logo seochat tools logo