#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2013
    Posts
    4
    Rep Power
    0

    Python code for DNA sequence


    Hi,

    I need code that does the following:

    Create a Python program that processes one or more FASTA-formatted files. For each sequence in the file(s), output:

    the sequence name, description, or identifier
    the sequence length
    the percentage of each nucleotide present in the sequence

    Sample file (fasta)

    Code:
    >1E95:A|PDBID|CHAIN|SEQUENCE
    GCGGCCAGCUCCAGGCCGCCAAACAAUAUGGAGCAC
    >2TPK:A|PDBID|CHAIN|SEQUENCE
    GCUGACCAGCUAUGAGGUCAUACAUCGUCAUAGCAC
    >1KXK:A|PDBID|CHAIN|SEQUENCE
    GUCUACCUAUCGGGCUAAGGAGCCGUAUGCGAUGAAAGUCGCACGUACGGUUCUAUGCCCGGGGGAAAAC



    Thanks,
    Priya
  2. #2
  3. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,854
    Rep Power
    481
    If the variable x stored a string, you could count all the A's in it using x.count('A'), at least so I've heard. Let's store that number in the variable a.
    a = x.count('A')
    And likewise for the others.
    The number of nucleotides would certainly be
    a+c+g+u
    and the percent of g would be
    100*float(g)/(a+c+g+u)
    because we'd need to avoid integer division (in Python 2, dividing one int by another truncates the result).
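
    As a rough sketch of the counting idea above (the function name nucleotide_percentages and the default alphabet "ACGU" are my own assumptions, not anything from the assignment), the per-letter counts and percentages could be collected in a dictionary instead of four separate variables:

```python
def nucleotide_percentages(seq, alphabet="ACGU"):
    """Return {letter: percent of seq} for each letter in alphabet.

    Hypothetical helper: only letters listed in `alphabet` are counted,
    and the total is the sum of those counts (a+c+g+u in the post above).
    """
    counts = {base: seq.count(base) for base in alphabet}
    total = sum(counts.values())
    if total == 0:
        # Empty sequence or no matching letters: avoid dividing by zero.
        return {base: 0.0 for base in alphabet}
    # 100.0 forces float division on Python 2 as well as Python 3.
    return {base: 100.0 * n / total for base, n in counts.items()}


percentages = nucleotide_percentages("GCGGCCAGCUCCAGGCCGCCAAACAAUAUGGAGCAC")
for base, pct in sorted(percentages.items()):
    print("%s: %.1f%%" % (base, pct))
```

    For a DNA file you would presumably pass alphabet="ACGT" instead.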

    If you had a file open for input, stored in the variable inf, you could read lines from it:
    for line in inf:
        # do something with line

    If line.startswith('>'), then you'd probably want to parse the line to find the name, rank, or serial number. line.split('|') might help, since the fields in the sample headers are separated by '|'.

    If not line.startswith('>'), you'd perhaps treat it as a sequence.
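
    To make the header step concrete, here is a small sketch. It assumes the headers look like the ones in the sample file, with fields separated by '|' and the identifier first; parse_header is a made-up name for illustration.

```python
def parse_header(line):
    """Split a FASTA header line into (identifier, remaining fields).

    Assumes a pipe-delimited header such as the ones in the sample file.
    """
    fields = line[1:].strip().split("|")  # drop the leading '>' and newline
    return fields[0], fields[1:]


name, rest = parse_header(">1E95:A|PDBID|CHAIN|SEQUENCE\n")
print(name)  # 1E95:A
print(rest)  # ['PDBID', 'CHAIN', 'SEQUENCE']
```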

    If you wanted to open a FASTA file, you could use open, which returns a file object.
    inf = open('seq.fasta')


    Your turn. You write some code.
    [code]Code tags[/code] are essential for python code and Makefiles!
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2013
    Posts
    4
    Rep Power
    0
    Hi,

    I wrote some code that reads through the lines.

    def parse_fasta(fasta_file):
        sequences = []
        description = ''
        sequence = ''
        for line in fasta_file:
            if line.startswith('>'):
                # A new header: save the record collected so far, if any.
                if sequence:
                    sequences.append((description, sequence))
                    description = ''
                    sequence = ''
                description = line[1:].rstrip()
            else:
                # Sequence data may span several lines; join them.
                sequence += line.rstrip()
        # Save the last record in the file.
        if sequence:
            sequences.append((description, sequence))
        return sequences

    I'm not sure how to call this and get the sequence name:

    with open('f.fasta') as fasta_file:
        for name, seq in parse_fasta(fasta_file):
            print(name, seq)

    Do I need to pass the file name instead of f.fasta?

    I still need code for the three outputs listed in my first post.
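
    One possible way to tie the pieces in this thread together is sketched below. It is only an illustration under a few assumptions: the file names come from the command line (sys.argv), the sequence name is taken to be the first '|'-separated field of the header, and percentages are reported for whatever letters actually occur in each sequence. The names summarize and report are hypothetical. Unlike the parse_fasta posted above, this variant is a generator and also keeps records whose sequence is empty.

```python
import sys


def parse_fasta(fasta_file):
    """Yield (description, sequence) pairs from an iterable of FASTA lines."""
    description, chunks = None, []
    for line in fasta_file:
        line = line.strip()
        if line.startswith(">"):
            if description is not None:
                yield description, "".join(chunks)
            description, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if description is not None:
        yield description, "".join(chunks)


def summarize(description, sequence):
    """Return (name, length, {letter: percent}) for one record."""
    name = description.split("|")[0]  # assumption: identifier comes first
    length = len(sequence)
    percents = {}
    for letter in sorted(set(sequence)):
        percents[letter] = 100.0 * sequence.count(letter) / length
    return name, length, percents


def report(paths):
    """Print name, length, and letter percentages for every record."""
    for path in paths:
        with open(path) as fasta_file:
            for description, sequence in parse_fasta(fasta_file):
                name, length, percents = summarize(description, sequence)
                print("%s\tlength=%d" % (name, length))
                for letter, pct in sorted(percents.items()):
                    print("  %s: %.1f%%" % (letter, pct))


if __name__ == "__main__":
    # Every command-line argument is treated as a FASTA file path.
    report(sys.argv[1:])
```

    Saved under a name of your choosing, say fasta_stats.py, it would be run as python fasta_stats.py f.fasta other.fasta, so no file name needs to be hard-coded.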
