#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2012
    Posts
    63
    Rep Power
    2

    Request to check


    Hi all.

    Kindly check it; it's urgent!

    I have one big file from which I have to fetch certain data.

    I have attached a small part of this file.

    From the attached file, I have to fetch and arrange data in 3 columns:

    1. Generic name 2. Brand names 3. Drug Target_gene name

    In the attached file this data is arranged like this, with a lot of other data mixed in between:
    Quote:
    Drug Card I

    Generic name

    Brand names

    Drug_Target_gene name

    End Drug card I


    Begin Drug card II

    Generic name

    Brand names

    Drug_Target_gene name
    Drug_Target_gene name
    Drug_Target_gene name
    Drug_Target_gene name...so on
    End Drug card II....

    There are many drug cards like this.


    Please check the attached file and kindly let me know how to script this task.

    I want the output to look like this:

    Quote:
    Generic name    Brand names    Drug Target_gene name

    Lepirudin       Refludan       F2
  2. #2
  3. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,704
    Rep Power
    480
    Learn and use awk
    The program is nearly trivial in awk.


    Please give a specific card with specific output and I'll write you a program. Here's an example of what you might want. Note that where you used a plural (several brand names or target genes), I printed one output line per combination. Consider what you actually want. Prettier formatting is certainly possible.

    The input file drug
    Code:
    Drug Card I
    
    aspirin
    
    Bayer
    Eucalyptus tree bark
    
    Headache 2c
    Backpain 14-273
    
    End Drug card I
    The unix command line and output
    Code:
    $ gawk -f g.gawk drug 
    aspirin	Bayer	Headache 2c
    aspirin	Bayer	Backpain 14-273
    aspirin	Eucalyptus tree bark	Headache 2c
    aspirin	Eucalyptus tree bark	Backpain 14-273
    The program g.gawk
    Code:
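    #!/usr/bin/gawk -f
    # Output columns: generic name, brand name, drug target gene name.
    # accumulate counts the blank-line-separated sections within a card:
    # 2 = generic name, 3 = brand names, 4 = target genes.
    accumulate {
        if (/^[ \t]*$/) {
            ++accumulate                # a blank line starts the next section
        } else if (2 == accumulate) {
            generic_name = $0
        } else if (3 == accumulate) {
            brand_names[brand++] = $0
        } else if (4 == accumulate) {
            target_gene[gene++] = $0
        }
    }
    /^End Drug/ {
        # One output line per (brand name, target gene) combination.
        for (b = 0; b < brand; ++b) {
            for (g = 0; g < gene; ++g) {
                printf "%s\t%s\t%s\n", generic_name, brand_names[b], target_gene[g]
            }
        }
        brand = gene = accumulate = 0   # reset for the next card
    }
    /^Drug Card/ {
        accumulate = 1                  # this rule comes last, so the header line itself is skipped
    }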

    If you had wanted all of the information on one line, what constitutes a column? gawk has two useful variables you can set, OFS and ORS, which specify the output field separator and the output record separator, respectively.
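    For illustration, here is a rough sketch of a one-line-per-card variant that relies on OFS and ORS. It assumes the same card layout as the example input above. It also assumes that several brand names or target genes should be joined with "; " inside one column; that separator is my own choice, not something the original question specified. The sample card is written to a temporary file purely for the demonstration.

```shell
# Hypothetical sample card in the same layout as the example input above.
sample=$(mktemp)
printf '%s\n' \
    'Drug Card I' '' \
    'aspirin' '' \
    'Bayer' 'Eucalyptus tree bark' '' \
    'Headache 2c' 'Backpain 14-273' '' \
    'End Drug card I' > "$sample"

result=$(awk '
BEGIN { OFS = "\t"; ORS = "\n" }   # tab between fields, newline after each record
accumulate {
    if (/^[ \t]*$/) {
        ++accumulate               # a blank line starts the next section
    } else if (accumulate == 2) {
        generic = $0
    } else if (accumulate == 3) {
        # join multiple brand names with "; " (assumed separator)
        if (brands == "") brands = $0; else brands = brands "; " $0
    } else if (accumulate == 4) {
        # join multiple target genes with "; " (assumed separator)
        if (genes == "") genes = $0; else genes = genes "; " $0
    }
}
/^End Drug/ {
    print generic, brands, genes   # print puts OFS between arguments and ends with ORS
    generic = brands = genes = ""
    accumulate = 0
}
/^Drug Card/ { accumulate = 1 }
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

    For the sample card this prints a single tab-separated line: aspirin, then "Bayer; Eucalyptus tree bark", then "Headache 2c; Backpain 14-273". Changing OFS in the BEGIN block (for example to ",") changes the column separator without touching the print statement.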
    Last edited by b49P23TIvg; July 27th, 2012 at 10:59 AM.
    [code]Code tags[/code] are essential for python code and Makefiles!
  4. #3
  5. Banned ;)
    Devshed Supreme Being (6500+ posts)

    Join Date
    Nov 2001
    Location
    Woodland Hills, Los Angeles County, California, USA
    Posts
    9,593
    Rep Power
    4207
    Please do not cross-post all over the place. Either pick the perl forum or the UNIX forum.
    Up the Irons
    What Would Jimi Do? Smash amps. Burn guitar. Take the groupies home.
    "Death Before Dishonour, my Friends!!" - Bruce Dickinson, Iron Maiden Aug 20, 2005 @ OzzFest
    Down with Sharon Osbourne

    "I wouldn't hire a butcher to fix my car. I also wouldn't hire a marketing firm to build my website." - Nilpo
