1. No Profile Picture
Registered User
Devshed Newbie (0 - 499 posts)

Join Date
Apr 2013
Posts
2
Rep Power
0

I am new to python (and coding in general) and was hoping to get some help with setting up and initializing a matrix. I am in a bioinformatics class and am attempting to code the Needleman-Wunsch algorithm. It is a method of comparing two DNA sequences together to determine similarity.

Wikipedia and a basic google search will give a brief overview (sorry I am a new user and can not post URLs to summaries).

The algorithm involves a matix which is sequence length 1+1 by sequence length 2+1. You then fill out the matrix based on whether based on the sequence's positions.

I was hoping someone could show me some code on how to set up the matrix to begin with.
Thanks!
2. Using the wikipedia example,
Originally Posted by wikipedia
The pseudo-code for the algorithm to compute the F matrix therefore looks like this:
and assuming their index origin is 1 the code below computes F as (the matrix displays better in j, the leading underscore is the j negative sign)
Code:
```   _12[\F
0  _5 _10 _15 _20 _25 _30 _35 _40 _45 _50 _55
_5  _3  _6   0  _5 _10 _15 _20 _25 _30 _35 _40
_10  _8   4  _1  _5 _10 _15  _8 _13 _18 _23 _28
_15 _13  _1  14   9   4  _1  _6   2  _3  _8 _13
_20  _6  _6   9   9   4  _1  _6  _3  11   6   1
_25 _11  _9   4   4   4  _1  _4  _8   6   8  14
_30 _16 _12   1  _1  _1  _1  _2   6   1   5   9
_35 _21  _9  _4  _4  _6  _6   6   1   1   8   4
_40 _26 _14  _9  _9  _9 _11   1   2   1   3  16
_45 _31 _19 _14 _14 _14 _14  _4  _3   2  _2  11
_50 _36 _24  _9 _14 _19 _19  _9   6   1   1   6
_55 _41 _29 _14 _14 _19 _24 _14   1  15  10   5```

Code:
```import pprint
import collections

gap_penalty = -5

#values of s are determined by chemistry?
s = collections.defaultdict(lambda*args:gap_penalty)

s.update(dict(
AA=10,AG=-1,AC=-3,AT=-4,
GA=-1,GG= 7,GC=-5,GT=-3,
CA=-3,CG=-5,CC= 9,CT= 0,
TA=-4,TG=-3,TC= 0,TT= 8,
))

def compute_F(A, B, s):
F = []
#for i=0 to length(A)
#    F(i,0) ← d*i
for i in range(len(A)+1):
F.append([i * s['']])
#for j=0 to length(B)
#    F(0,j) ← d*j
for j in range(1,len(B)+1):
F[0].append(j * s[''])
#for i=1 to length(A)
#    for j=1 to length(B)
#    {
#        Match ← F(i-1,j-1) + S(Ai, Bj)
#        Delete ← F(i-1, j) + d
#        Insert ← F(i, j-1) + d
#        F(i,j) ← max(Match, Insert, Delete)
#    }
for i in range(1,len(A)+1):
a = A[i-1]
for j in range(1,len(B)+1):
b = B[j-1]
match = F[i-1][j-1] + s[a+b]
delet = F[i-1][j] + s['']
inser = F[i][j-1] + s['']
F[i].append(max((match, inser, delet)))
return F

pprint.pprint(compute_F('AGACTAGTTAC', 'CGA---GACGT', s))```
Please use your experience and deep knowledge to figure out if F is correct. My indexing could easily be incorrect. Generating all the lists of correct length at the start of the algorithm might be a clearer programming style. Actually, if I had to use python for this project I'd go with numpy and preallocate

F = numpy.zeros((len(A)+1,len(B)+1))
Last edited by b49P23TIvg; April 17th, 2013 at 09:08 PM.