#1
  1. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2013
    Posts
    232
    Rep Power
    1

    Need Help With Genome Path Program


    Hello,

    So just for kicks, since I'm following a Bioinformatics course on Coursera.org, I decided to write a Genome Path program. It takes k-mers (here, 3-nucleotide codons) from a made-up genome, compares the suffix of the first k-mer passed to the function with the prefix of the second, and checks whether they match.

    The idea is that if the suffix of each k-mer matches the prefix of the next one, the k-mers can be chained together and we can possibly reconstruct the genome of a particular organism, since each step along the chain advances by 1 nucleotide.

    Here's my program. Any idea how to get the next (n+1) k-mer inside the for loop at the end?

    Code:
    # Create a path genome program that will analyze the suffix of the first
    # k-mer and look to see if it matches the prefix of the second k-mer
    
    # Make a function that will take in each k-mer and analyze their
    # prefixes and suffixes:
    
    def GenomePath(kmer1, kmer2):
        # Find the suffix of kmer1:
        # Since these are three-nucleotide codons, index into kmer1 and
        # take the character at index 2 (the 3rd character, because
        # indexing starts at 0).
        suffixKmer1 = kmer1[2]
        prefixKmer2 = kmer2[0]
    
        if suffixKmer1 == prefixKmer2:
            print("We have a match!")
            print("Proceed to the next kmer!")
    
    # Call the function you just made:
    kmer1 = "GTCC"
    kmer2 = "CTAG"
    
    GenomePath(kmer1, kmer2)
    
    MadeupGenome = ["GTCC", "CTAG", "GATC", "CATG"]
    
    # Make a list of k-mers and loop with a for loop to call the function
    # (Make the k-mers match arbitrarily to make sure the function works)
    for kmer in MadeupGenome:
        GenomePath(kmer,kmer+1)
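A minimal sketch of one way to get each k-mer together with the one after it. In the loop above, `kmer` is a string, so `kmer+1` raises a `TypeError` instead of giving the next list element. Pairing the list with a copy of itself shifted by one avoids index arithmetic altogether. The names `has_overlap` and `adjacent_pairs` are hypothetical and not part of the original program, and the check uses the last character (`[-1]`) so that it works for k-mers of any length.

```python
# Sketch only: has_overlap and adjacent_pairs are hypothetical names,
# not part of the original program.

def has_overlap(kmer1, kmer2):
    """Return True if the last character of kmer1 equals the first of kmer2."""
    return kmer1[-1] == kmer2[0]


def adjacent_pairs(kmers):
    """Pair each k-mer with its successor; the last k-mer has no successor."""
    return list(zip(kmers, kmers[1:]))


made_up_genome = ["GTCC", "CTAG", "GATC", "CATG"]

for current, following in adjacent_pairs(made_up_genome):
    if has_overlap(current, following):
        print(current, "->", following, ": match")
    else:
        print(current, "->", following, ": no match")
```

Because `zip` stops at the shorter input, the loop runs once per adjacent pair and never reads past the end of the list.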
  2. #2
  3. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,837
    Rep Power
    480
    To be unambiguous: "first" and "index 0" mean the same thing when the index origin is 0.

    Otherwise, perhaps you could show input and expected output; in other words, test cases.
    [code]Code tags[/code] are essential for python code and Makefiles!
  4. #3
  5. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2013
    Posts
    232
    Rep Power
    1
    Yeah, after looking at the code, I realized I should have used an index i instead of the k-mer itself. I also realized I gave 4-letter sequences when in reality codons are triplets of 3 nucleotides.

    Anyway, here's my fixed code so far, with the relevant test case in a comment at the bottom. All of the k-mers should test positive with this function; writing the for loop that iterates over each one is the only issue for me so far:
    Code:
    # Create a path genome program that will analyze the suffix of the first
    # k-mer and look to see if it matches the prefix of the second k-mer
    
    # Make a function that will take in each k-mer and analyze their
    # prefixes and suffixes:
    
    def GenomePath(kmer1, kmer2):
        # Find the suffix of kmer1:
        # Since these are three-nucleotide codons, index into kmer1 and
        # take the character at index 2 (the 3rd character, because
        # indexing starts at 0).
    
        # Test to see if proper kmers were given to the function:
        print("kmer1 =", kmer1)
        print("kmer2 =", kmer2)
        suffixKmer1 = kmer1[2]
        prefixKmer2 = kmer2[0]
    
        if suffixKmer1 == prefixKmer2:
            print("We have a match!")
            print("Proceed to the next kmer!")
    
    # Call the function you just made:
    kmer1 = "GTC"
    kmer2 = "CTA"
    
    GenomePath(kmer1, kmer2)
    
    MadeupGenome = ["GTC", "CTG", "GAT", "TGC"]
    
    # Make a list of k-mers and loop with a for loop to call the function
    # (Make the k-mers match arbitrarily to make sure the function works)
    # for i in MadeupGenome:
    #     GenomePath(MadeupGenome[i], MadeupGenome[i+1])
    
    
    # Example call will iterate through each of these
    
    # Each of these cases should work since the suffix for each matches the prefix
    # of the next codon
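A hedged sketch of the index-based loop the commented-out code is reaching for. Writing `for i in MadeupGenome` makes `i` a string rather than a number, so `MadeupGenome[i]` fails with a `TypeError`. Looping over `range(len(kmers) - 1)` gives integer indices and stops one short of the end, so `kmers[i + 1]` always exists. The helpers `overlaps` and `spell_path` are hypothetical names, and the sketch assumes the one-character overlap used in this thread.

```python
# Sketch only: overlaps and spell_path are hypothetical helpers that
# assume adjacent k-mers share exactly one character.

def overlaps(kmers):
    """For each index i, report whether kmers[i] ends with the character
    that kmers[i + 1] starts with."""
    results = []
    # Stop one short of the end so that kmers[i + 1] always exists.
    for i in range(len(kmers) - 1):
        results.append(kmers[i][-1] == kmers[i + 1][0])
    return results


def spell_path(kmers):
    """Join k-mers that overlap by one character into a single string."""
    if not kmers:
        return ""
    path = kmers[0]
    for kmer in kmers[1:]:
        path += kmer[1:]  # skip the shared first character
    return path


made_up_genome = ["GTC", "CTG", "GAT", "TGC"]
print(overlaps(made_up_genome))    # [True, True, True]
print(spell_path(made_up_genome))  # GTCTGATGC
```

Stated as a test case: the input `["GTC", "CTG", "GAT", "TGC"]` is expected to give a match for all three adjacent pairs and to spell `GTCTGATGC`.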
  6. #4
  7. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,837
    Rep Power
    480

    no communication


    See http://forums.devshed.com/python-pro...ws-957067.html, a recent post which exemplifies "this is the input" and "this is the expected output".
