#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2013
    Posts
    2
    Rep Power
    0

    Matrix help please


    I am new to python (and coding in general) and was hoping to get some help with setting up and initializing a matrix. I am in a bioinformatics class and am attempting to code the Needleman-Wunsch algorithm. It is a method of comparing two DNA sequences together to determine similarity.

    Wikipedia and a basic google search will give a brief overview (sorry I am a new user and can not post URLs to summaries).

    The algorithm involves a matix which is sequence length 1+1 by sequence length 2+1. You then fill out the matrix based on whether based on the sequence's positions.

    I was hoping someone could show me some code on how to set up the matrix to begin with.
    Thanks!
  2. #2
  3. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,890
    Rep Power
    481
    Using the wikipedia example,
    Originally Posted by wikipedia
    The pseudo-code for the algorithm to compute the F matrix therefore looks like this:
    and assuming their index origin is 1 the code below computes F as (the matrix displays better in j, the leading underscore is the j negative sign)
    Code:
       _12[\F
      0  _5 _10 _15 _20 _25 _30 _35 _40 _45 _50 _55
     _5  _3  _6   0  _5 _10 _15 _20 _25 _30 _35 _40
    _10  _8   4  _1  _5 _10 _15  _8 _13 _18 _23 _28
    _15 _13  _1  14   9   4  _1  _6   2  _3  _8 _13
    _20  _6  _6   9   9   4  _1  _6  _3  11   6   1
    _25 _11  _9   4   4   4  _1  _4  _8   6   8  14
    _30 _16 _12   1  _1  _1  _1  _2   6   1   5   9
    _35 _21  _9  _4  _4  _6  _6   6   1   1   8   4
    _40 _26 _14  _9  _9  _9 _11   1   2   1   3  16
    _45 _31 _19 _14 _14 _14 _14  _4  _3   2  _2  11
    _50 _36 _24  _9 _14 _19 _19  _9   6   1   1   6
    _55 _41 _29 _14 _14 _19 _24 _14   1  15  10   5

    Code:
    import pprint
    import collections
    
    gap_penalty = -5
    
    #values of s are determined by chemistry?
    s = collections.defaultdict(lambda*args:gap_penalty)
    
    s.update(dict(
        AA=10,AG=-1,AC=-3,AT=-4,
        GA=-1,GG= 7,GC=-5,GT=-3,
        CA=-3,CG=-5,CC= 9,CT= 0,
        TA=-4,TG=-3,TC= 0,TT= 8,
        ))
    
    def compute_F(A, B, s):
        F = []
        #for i=0 to length(A)
        #    F(i,0) ← d*i
        for i in range(len(A)+1):
            F.append([i * s['']])
        #for j=0 to length(B)
        #    F(0,j) ← d*j
        for j in range(1,len(B)+1):
            F[0].append(j * s[''])
        #for i=1 to length(A)
        #    for j=1 to length(B)
        #    {
        #        Match ← F(i-1,j-1) + S(Ai, Bj)
        #        Delete ← F(i-1, j) + d
        #        Insert ← F(i, j-1) + d
        #        F(i,j) ← max(Match, Insert, Delete)
        #    }
        for i in range(1,len(A)+1):
            a = A[i-1]
            for j in range(1,len(B)+1):
                b = B[j-1]
                match = F[i-1][j-1] + s[a+b]
                delet = F[i-1][j] + s['']
                inser = F[i][j-1] + s['']
                F[i].append(max((match, inser, delet)))
        return F
    
    pprint.pprint(compute_F('AGACTAGTTAC', 'CGA---GACGT', s))
    Please use your experience and deep knowledge to figure out if F is correct. My indexing could easily be incorrect. Generating all the lists of correct length at the start of the algorithm might be a clearer programming style. Actually, if I had to use python for this project I'd go with numpy and preallocate

    F = numpy.zeros((len(A)+1,len(B)+1))
    Last edited by b49P23TIvg; April 17th, 2013 at 09:08 PM.
    [code]Code tags[/code] are essential for python code and Makefiles!

IMN logo majestic logo threadwatch logo seochat tools logo