#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2014
    Posts
    3
    Rep Power
    0

    delete lines in text


    Hi
    I have a text file which looks like

    Code:
    PDK_30s1000201L003	1	1	Z	11684	20849	1398	d
    PDK_30s1000201L003	-1	1	4	11684	18302	1398	d
    PDK_30s1000201L003	-1	1	5	18520	20849	1398	d
    PDK_30s1000201L003	-1	1	4	11684	17134	1398	d
    PDK_30s1000201L003	-1	1	5	17217	20849	1398	d
    PDK_30s1000201L003	-1	1	4	11684	15898	1398	d
    PDK_30s1000201L003	-1	1	5	17043	20849	1398	d
    PDK_30s1000201L003	-1	1	4	11684	15813	1398	d
    PDK_30s1000201L003	-1	1	5	13063	20849	1398	d
    PDK_30s1000201L003	-1	1	4	11684	12428	1398	d
    PDK_30s1000201L003	-1	1	5	12624	20849	1398	d
    PDK_30s1000201L003	-1	1	4	11684	12124	1398	d
    PDK_30s1000201L003	-1	1	5	12131	20849	1398	d
    PDK_30s1000201L003	-1	1	P0	11624	12124	1398	d
    PDK_30s1000201L004	1	1	Z	25328	26858	1398	b
    PDK_30s1000201L004	-1	1	4	25416	26858	1398	b
    PDK_30s1000201L004	-1	1	5	25328	25469	1398	b
    PDK_30s1000201L004	-1	1	4	25751	26858	1398	b
    PDK_30s1000201L004	-1	1	5	25328	25828	1398	b
    PDK_30s1000201L004	-1	1	4	25856	26858	1398	b
    PDK_30s1000201L004	-1	1	F1	25856	26858	1398	b
    Column 1 represents id_name, column 2 represents model, column 3 represents mode, and column 4 represents media.
    Columns 5, 6, 7 and 8 represent start, end, id and level.
    I have many entries with the same id_name belonging to the same median number. However, I want to retain only the entry with the shortest start-to-end distance in each such group and discard the remaining entries.

    In other words, I want to keep only those lines having the minimum difference between start and end.

    My result should look like
    Code:
    PDK_30s1000201L003	1	1	Z	11684	20849	1398	d
    PDK_30s1000201L003	-1	1	4	11684	12124	1398	d
    PDK_30s1000201L003	-1	1	5	18520	20849	1398	d
    PDK_30s1000201L003	-1	1	P0	11624	12124	1398	d
    PDK_30s1000201L004	1	1	Z	25328	26858	1398	b
    PDK_30s1000201L004	-1	1	4	25856	26858	1398	b
    PDK_30s1000201L004	-1	1	F1	25856	26858	1398	b
    PDK_30s1000201L004	-1	1	5	25328	25469	1398	b
    Can it be done by any simple command?

    (Newbie in Unix)
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2006
    Posts
    2,632
    Rep Power
    1811
    Cross-posted, from the Linux forum
  4. #3
  5. Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Sep 2006
    Posts
    867
    Rep Power
    391



    . . . lines having the minimum difference between start and end.
    You need to clarify this statement: the expected result does not correspond to the minimum difference between start and end.
    The following lines have the minimum difference according to your data:
    Code:
    PDK_30s1000201L003      -1      1       4       11684   12124   1398    d
    PDK_30s1000201L004      -1      1       5       25328   25469   1398    b
  6. #4
  7. Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Sep 2006
    Posts
    867
    Rep Power
    391
    Or try this:
    Code:
    ==> awk '{d=$6-$5; k=$1":"$4; if(!(k in d0)||d<d0[k]){l0[k]=$0;d0[k]=d}}END{for(k in l0) print l0[k]}' infile.txt | sort
    PDK_30s1000201L003      -1      1       4       11684   12124   1398    d
    PDK_30s1000201L003      -1      1       5       18520   20849   1398    d
    PDK_30s1000201L003      -1      1       P0      11624   12124   1398    d
    PDK_30s1000201L003      1       1       Z       11684   20849   1398    d
    PDK_30s1000201L004      -1      1       4       25856   26858   1398    b
    PDK_30s1000201L004      -1      1       5       25328   25469   1398    b
    PDK_30s1000201L004      -1      1       F1      25856   26858   1398    b
    PDK_30s1000201L004      1       1       Z       25328   26858   1398    b
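
The one-liner above relies on sort for its output order, so the Z rows end up last within each id. If the original file order matters, a variant along the following lines may help. It is only a sketch: the function name shortest_per_group and the array names are made up for illustration, and it assumes whitespace-separated columns with start in column 5 and end in column 6, grouped on columns 1 and 4 as in the command above. When two lines in a group have the same difference, the first one seen is kept.

```shell
#!/bin/sh
# Hypothetical helper: for each (column 1, column 4) group, keep the line
# with the smallest end - start, printing groups in first-seen order.
shortest_per_group() {
    awk '
    {
        key  = $1 SUBSEP $4         # group on column 1 and column 4
        span = $6 - $5              # end minus start
        if (!(key in best)) {       # first line seen for this group
            order[++n] = key        # remember when the group first appeared
            best[key] = span
            line[key] = $0
        } else if (span < best[key]) {
            best[key] = span        # strictly shorter span replaces the old line
            line[key] = $0
        }
    }
    END {
        for (i = 1; i <= n; i++) print line[order[i]]
    }' "$@"
}

# Three rows taken from the question, fed on standard input.
printf 'PDK_30s1000201L003\t-1\t1\t4\t11684\t18302\t1398\td\n'\
'PDK_30s1000201L003\t-1\t1\t5\t18520\t20849\t1398\td\n'\
'PDK_30s1000201L003\t-1\t1\t4\t11684\t12124\t1398\td\n' | shortest_per_group
```

With a file, the call would be shortest_per_group infile.txt. The separate order array is there because awk's for (k in array) loop does not guarantee any particular traversal order.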
