Thread: Dna compress

    #1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    21
    Rep Power
    0

    Dna compress


    Hi, can someone give me some tips how can I program it!

    I want to integrate the part of the programm in this function. start is my first node. thanks for any help.

    c Code:
    void compressed()
    {
        struct node *q;
        if(start == NULL)
        {
            printf ("\n\nList is empty");
            return;
        }
        q=start;
        printf("\n\nList is : ");
        while(q!=NULL)
        {
     
          if(q->info==q->link)
     
            printf ("%c", q->info);
            q=q->link;
        }
        printf ("\n");
     
    }


    DNA sequences can be very long. One way to (potentially) conserve space is to group together
    sequences of common characters. For example, ‘AAAGTTTACACCT’ could be compressed as
    ‘A3G1T3A1C1A1C2T1’.
    Write a function compress that inputs a linked list DNA sequence, and produces a new linked
    list, in which the items in the Nodes are lists of length 2, [x,y], where x is one of the strings
    ‘A’,’C’,’G’,’T’, and y is the number of times that character appears in consecutive positions
    in the sequence.
  2. #2
  3. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,897
    Rep Power
    481
    Originally Posted by instructor
    Write a function compress
    compress is the function name I expect. Not compressed.

    Originally Posted by instructor
    that inputs a linked list DNA sequence,
    I expect compress to take an argument, the head of a singly linked list.

    Originally Posted by instructor
    and produces a new linked list,
    Function compress should return a linked list, be it an array or a pointer to some sort of complicated node. void is incorrect.

    Originally Posted by instructor
    in which the items in the Nodes are lists of length 2, [x,y], where x is one of the strings
    ‘A’,’C’,’G’,’T’, and y is the number of times that character appears in consecutive positions
    in the sequence.
    Admittedly confusing. Let's clarify "lists of length 2" as "members x and y, x is a character, y is an (perhaps unsigned) int."

    We need a new structure. I'd write (others know how to use typedef) (well, actually I'd use someone else's linked list, that of glib)
    Code:
    #define CB struct compressed_base
    CB {
      CB*next;
      char x; /* base is a better name, in my opinion */
      int y; /* I prefer repeats */
    };
    Now we know our function signature.
    Code:
    CB*compress(SEQUENCE*dna) {
      ;/* your code here */
    }
    Dreadfully different from your concept!

    Please read the documentation for a function you might find useful,
    malloc
    Last edited by b49P23TIvg; December 4th, 2012 at 07:55 PM.
    [code]Code tags[/code] are essential for python code and Makefiles!
  4. #3
  5. Contributed User
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Jun 2005
    Posts
    4,392
    Rep Power
    1871
    AAAGTTTACACCT
    A3G1T3A1C1A1C2T1

    It's hardly "compression" when the result is longer than what you started with.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper
  6. #4
  7. I'm Baaaaaaack!
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    Jul 2003
    Location
    Maryland
    Posts
    5,538
    Rep Power
    244
    A new and exciting homework problem. I guess the OP couldn't be bothered with searching the archives, eh?

    My blog, The Fount of Useless Information http://sol-biotech.com/wordpress/
    Free code: http://sol-biotech.com/code/.
    Secure Programming: http://sol-biotech.com/code/SecProgFAQ.html.
    Performance Programming: http://sol-biotech.com/code/PerformanceProgramming.html.
    LinkedIn Profile: http://www.linkedin.com/in/keithoxenrider

    It is not that old programmers are any smarter or code better, it is just that they have made the same stupid mistake so many times that it is second nature to fix it.
    --Me, I just made it up

    The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.
    --George Bernard Shaw
  8. #5
  9. No Profile Picture
    I haz teh codez!
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Dec 2003
    Posts
    2,549
    Rep Power
    2337
    They would have found their own question!

    I actually wrote this program for fun at that time. Biology fascinates me.

    Comments on this post

    • salem agrees : I thought it looked familiar - nice catch
    I ♥ ManiacDan & requinix

    This is a sig, and not necessarily a comment on the OP:
    Please don't be a help vampire!
  10. #6
  11. I'm Baaaaaaack!
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    Jul 2003
    Location
    Maryland
    Posts
    5,538
    Rep Power
    244
    You might be interested in the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov), they have a toolkit that does all sorts of biology related stuff, including compression of DNA sequences.

    My blog, The Fount of Useless Information http://sol-biotech.com/wordpress/
    Free code: http://sol-biotech.com/code/.
    Secure Programming: http://sol-biotech.com/code/SecProgFAQ.html.
    Performance Programming: http://sol-biotech.com/code/PerformanceProgramming.html.
    LinkedIn Profile: http://www.linkedin.com/in/keithoxenrider

    It is not that old programmers are any smarter or code better, it is just that they have made the same stupid mistake so many times that it is second nature to fix it.
    --Me, I just made it up

    The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.
    --George Bernard Shaw
  12. #7
  13. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    21
    Rep Power
    0
    I would write the compress function like this, I have already counted my molecules and I want to compress it. I dont know how can I do it in array, but here I tried something. Below you will see my complete program.

    c Code:
    //This function will display all the element(s) in the linked list
    void compress()
    {
        struct node *q;
        if(start == NULL)
        {
            printf ("\n\nList is empty");
            return;
        }
        q=start;
        printf("\n\nList is : ");
        int A = 0, G =0, T =0, C = 0;
        double An = 0, Gn =0, Tn=0, Cn=0;
        while(q!=NULL)
        {
     
            if(q->info=='A'){A++;}
            if(q->info=='G'){G++;}
            if(q->info=='T'){T++;}
            if(q->info=='C'){C++;}
     
            q=q->link;
        }
        An =A;
        Gn =G;
        Tn =T;
        Cn =C;
        printf ("\n");
        printf ("%c, %d, %c, %d, %c, %d, %c, %d", 
        q->info, An, q->info, Gn, q->info, Tn, q->info, Cn, );
     
    }/*End of compress() */











    This is my complete programm!



    c Code:
    #include<stdio.h>
    #include<stdlib.h>
     
    //Structure declaration for the node
    struct node
    {
        char info;
        struct node *link;
    }*start;
     
     
    //This function will create a new linked list
    void Create_List(char data){
        struct node *q, *tmp;
        //Dynamic memory is been allocated for a node
        tmp= (struct node*)malloc(sizeof(struct node));
        tmp->info=data;
        tmp->link=NULL;
        if(start==NULL)
        /*If list is empty*/
            start=tmp;
        else{
            /*Element inserted at the end*/
            q=start;
            while(q->link!=NULL)
            if(q->info=='A'){A++;}
            if(q->info=='G'){G++;}
            if(q->info=='T'){T++;}
            if(q->info=='C'){C++;}
            q=q->link;
            q->link=tmp;
     
        }
    }
     
     
    //This function will display all the element(s) in the linked list
    void Display()
    {
        struct node *q;
        if(start == NULL)
        {
            printf ("\n\nList is empty");
            return;
        }
        q=start;
        printf("\n\nList is : ");
        while(q!=NULL)
        {
            printf ("%c", q->info);
            q=q->link;
        }
        printf ("\n");
     
    }/*End of display() */
     
    void Count(int inputMax, char comp)
    {
        struct node *q;
        if(start == NULL)
        {
            printf ("\n\nList is empty");
            return;
        }
        q=start;
        printf("\n\nList is : ");
        int A = 0, G =0, T =0, C = 0;
        double Afinal = 0, Gfinal =0, Tfinal=0, Cfinal=0;
        while(q!=NULL)
        {
            if(q->info=='A'){A++;}
            if(q->info=='G'){G++;}
            if(q->info=='T'){T++;}
            if(q->info=='C'){C++;}
            printf("%d \n",C);
            q=q->link;
     
     
        }
        printf ("\n");
     
        Afinal =A;
        Gfinal =G;
        Tfinal =T;
        Cfinal =C;
     
        Afinal =(Afinal/inputMax)*100;
        Gfinal =(Gfinal/inputMax)*100;
        Tfinal =(Tfinal/inputMax)*100;
        Cfinal =(Cfinal/inputMax)*100;
     
        printf("\nA: %lf", Afinal);
        printf("\nG: %lf", Gfinal);
        printf("\nT: %lf", Tfinal);
        printf("\nC: %lf", Cfinal);
     
    }/*End of count()*/
     
    int main()
    {
        unsigned choice,n,i, s=0;
        char m;
        do
        {
            printf("\n 1. Create List\n");
            printf("\n 2. Display \n");
            printf("\n 3. Count\n");
            printf("\nEnter your choice (0 to 3)\n");
            scanf("%d",&choice);
     
            switch (choice)
            {
                case 1:
                    printf ("\n\nHow many nodes you want:");
                    scanf ("%d",&n);
     
                    getchar();
                    s = s + n;
                    for(i = 0;i<n;i++){
                        printf ("\nEnter the element:");
                        scanf ("%c",&m);
                        getchar();
                        if(m == 'A'|| m == 'G'||m == 'T'||m == 'C')
                        {
                            Create_List(m);
                        }
                        else
                        {   printf("-----------------------------------");
                            printf("\nInvalid Element! Try again!\n");
                            printf("-----------------------------------");
                            i--;
                        }
     
                    }
                    break;
                case 2:
                    Display();
                    break;
                case 3:
                    Count(s, m);
                    break;
                default:
                    printf("-----------------------------------");
                    printf ("\nWRONG CHOICE!");
                    printf("-----------------------------------");
            }
        }
        while (choice !=0);
     
        return 0 ;
    }

IMN logo majestic logo threadwatch logo seochat tools logo