
#1
September 12th, 2013, 03:31 PM
 rajivn786
Python code for DNA sequence

Hi,

I need code that does the following:

Create a Python program that processes one or more FASTA-formatted files. For each sequence in the file(s), output:

the sequence name, description, or identifier
the sequence length
the percentage of each nucleotide present in the sequence

Sample file (FASTA):

Code:
```
>1E95:A|PDBID|CHAIN|SEQUENCE
GCGGCCAGCUCCAGGCCGCCAAACAAUAUGGAGCAC
>2TPK:A|PDBID|CHAIN|SEQUENCE
GCUGACCAGCUAUGAGGUCAUACAUCGUCAUAGCAC
>1KXK:A|PDBID|CHAIN|SEQUENCE
GUCUACCUAUCGGGCUAAGGAGCCGUAUGCGAUGAAAGUCGCACGUACGGUUCUAUGCCCGGGGGAAAAC
```

Thanks,
Priya

#2
September 12th, 2013, 05:27 PM
 b49P23TIvg
If the variable x stored a string, you could count all the A's in it using x.count('A'), at least so I've heard. Let's store that number in the variable a.
a = x.count('A')
And likewise for the others (c, g, and u).
The number of nucleotides would certainly be
a+c+g+u
and the percent of g would be
100*float(g)/(a+c+g+u)
because we'd need to avoid integer division: in Python 2, dividing one int by another with / truncates the result.
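
As a rough illustration of the counting idea above, here is a minimal sketch. The function name nucleotide_percentages and the alphabet parameter are hypothetical, not from the thread, and the sketch assumes Python 3 and an RNA alphabet (ACGU) like the sample file; for DNA you would pass "ACGT". As in the post, the denominator is the sum of the counted bases, so any other characters in the string are ignored.

```python
def nucleotide_percentages(seq, alphabet="ACGU"):
    """Return {base: percent} using str.count, as suggested in the post.

    The denominator is a+c+g+u (the counted bases only), so characters
    outside `alphabet` do not affect the percentages.
    """
    counts = {base: seq.count(base) for base in alphabet}
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero for an empty or non-matching string.
        return {base: 0.0 for base in alphabet}
    # Multiplying by 100.0 forces float division even on Python 2.
    return {base: 100.0 * counts[base] / total for base in alphabet}


percentages = nucleotide_percentages("AACG")
for base in "ACGU":
    print("%s: %.1f%%" % (base, percentages[base]))
```

For the string "AACG" this gives 50% A, 25% C, 25% G, and 0% U.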

If you had a file open for input, stored in the variable inf, you could read lines from it:
for line in inf:
    # do something with line

If line.startswith('>'), then you'd probably want to parse the line to find the name, rank, or serial number. The headers in your sample separate their fields with '|', so line.split('|') might help!

If not line.startswith('>'), you'd perhaps treat it as a sequence.

If you wanted to open a FASTA file, you could use the built-in open function.
inf = open('seq.fasta')
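
Those hints can be strung together roughly like this. It is only a sketch: split_header is a hypothetical helper, the '|' separator is an assumption based on the sample headers in the first post, and io.StringIO stands in for open('seq.fasta') so the snippet runs without a file on disk.

```python
import io


def split_header(line):
    """Split a header such as '>1E95:A|PDBID|CHAIN|SEQUENCE' into its fields.

    Assumes '|' separates the fields, as in the sample file.
    """
    return line[1:].strip().split("|")


# Stand-in for: inf = open('seq.fasta')
inf = io.StringIO(">1E95:A|PDBID|CHAIN|SEQUENCE\nGCGGCC\n")

headers = []
sequence_lines = []
for line in inf:
    if line.startswith(">"):
        headers.append(split_header(line))
    else:
        # Strip the trailing newline before keeping the sequence text.
        sequence_lines.append(line.strip())

print(headers[0][0])  # the first field of the first header
```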

Your turn. You write some code.
__________________
[code]Code tags[/code] are essential for python code!

#3
September 12th, 2013, 06:20 PM
 rajivn786
Hi,

I wrote some code that reads through the lines:

def parse_fasta(fasta_file):
    sequences = []
    description = ''
    sequence = ''
    for line in fasta_file:
        if line.startswith('>'):
            if sequence:
                sequences.append((description, sequence))
                description = ''
                sequence = ''
            description = line[1:].rstrip()
        else:
            sequence += line.rstrip()
    if sequence:
        sequences.append((description, sequence))
    return sequences

I'm not sure how to hook this up and get the sequence name:

with open('f.fasta') as fasta_file:
    for name, seq in parse_fasta(fasta_file):
        print(name, seq)

Do I need to pass the file name in instead of hard-coding 'f.fasta'?

I still need code for those three requirements.
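
One possible way to tie the pieces together is sketched below. It is not a definitive answer: it assumes Python 3, takes the file names from the command line via sys.argv, and uses an RNA alphabet (ACGU) to match the sample file. format_report, main, and the alphabet parameter are hypothetical names introduced for illustration, and parse_fasta is reworked here as a generator. Unlike the a+c+g+u suggestion earlier in the thread, the percentages here are taken relative to the full sequence length.

```python
import sys


def parse_fasta(fasta_file):
    """Yield (description, sequence) pairs from an open FASTA file."""
    description = None
    chunks = []
    for line in fasta_file:
        line = line.strip()
        if line.startswith(">"):
            # A new header ends the previous record, if there was one.
            if description is not None:
                yield description, "".join(chunks)
            description = line[1:]
            chunks = []
        elif line:
            chunks.append(line)
    if description is not None:
        yield description, "".join(chunks)


def format_report(description, sequence, alphabet="ACGU"):
    """Build the name, length, and per-nucleotide percentage lines."""
    length = len(sequence)
    lines = ["name: " + description, "length: %d" % length]
    for base in alphabet:
        percent = 100.0 * sequence.count(base) / length if length else 0.0
        lines.append("%s: %.1f%%" % (base, percent))
    return "\n".join(lines)


def main(paths):
    """Print a report for every sequence in every file named in paths."""
    for path in paths:
        with open(path) as fasta_file:
            for description, sequence in parse_fasta(fasta_file):
                print(format_report(description, sequence))


if __name__ == "__main__":
    # Each command-line argument is treated as a FASTA file name.
    main(sys.argv[1:])
```

Saved under a name of your choosing, the script would be run with one or more file names as arguments, so 'f.fasta' no longer has to be hard-coded. For DNA files, passing alphabet="ACGT" to format_report would be the corresponding change.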
