November 3rd, 2013, 08:52 AM
Need help with Perl program.
I have a file which contains this data:
chr1 10 12
chr1 10 15
chr1 10 16
Output to be generated:
chr1 10 12 3
chr1 12 15 2
chr1 15 16 1
That means it counts how many ranges in the file cover each segment: 10-12 is covered 3 times (by 10-12, 10-15 and 10-16). Similarly, 12-15 is covered by 10-15 and 10-16, so the count displayed should be 2.
November 3rd, 2013, 01:07 PM
Is your file big and are your ranges large?
I am asking because the simple solution that comes to my mind consists in listing all integers within the ranges, counting their occurrences and, at the end, summarizing the results. This is OK if the ranges are small to medium-sized, but not for very large ranges.
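To make the idea concrete, here is a minimal sketch of that approach in Perl. It assumes the ranges are half-open (the end position itself is not counted), since that is what reproduces the expected output from the first post; the sub name coverage_segments is made up for the illustration.

```perl
use strict;
use warnings;

# Count, for every position, how many ranges cover it, then merge
# consecutive positions that share the same count into segments.
# Assumption: ranges are half-open, [start, end).
sub coverage_segments {
    my @ranges = @_;    # each element: [chr, start, end]
    my %count;          # $count{chr}[pos] = number of ranges covering pos
    for my $r (@ranges) {
        my ($chr, $start, $end) = @$r;
        $count{$chr}[$_]++ for $start .. $end - 1;
    }
    my @segments;
    for my $chr (sort keys %count) {
        my $cov = $count{$chr};
        my ($seg_start, $seg_count) = (0, 0);
        # Go one index past the end so the last segment gets flushed.
        for my $pos (0 .. scalar @$cov) {
            my $c = $cov->[$pos] // 0;    # uncovered positions count as 0
            next if $c == $seg_count;
            push @segments, "$chr $seg_start $pos $seg_count" if $seg_count;
            ($seg_start, $seg_count) = ($pos, $c);
        }
    }
    return @segments;
}

# Sample data from the first post.
my @ranges = map { [split ' '] } ("chr1 10 12", "chr1 10 15", "chr1 10 16");
print "$_\n" for coverage_segments(@ranges);
# prints:
# chr1 10 12 3
# chr1 12 15 2
# chr1 15 16 1
```

Note that the array grows up to the largest end position, so memory use depends on how large the coordinates are, not only on how wide the ranges are.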
November 3rd, 2013, 01:19 PM
The range is not that big. You can have a look at it:
Originally Posted by Laurent_R
241525932 241526132 (range is between 1000-2000)
I have created a hash with chr as key and the start and end of the range as value. I am trying to figure out a way to get the overlap.
Please feel free to ask for a more detailed explanation.
November 3rd, 2013, 04:40 PM
OK, very interesting information, good that I asked. Given the size of your numbers, the first solution I was thinking of is gone, since an array indexed on your numbers is more or less excluded (it would probably blow up your memory). You might not realize it, but when you give an example, it has to be somewhat realistic. You gave an example where the numbers are in the 10-20 range, while your actual numbers are in the 100 million range. The initial solution I was thinking of is thus not possible, not because of the range size (very manageable), but because your numbers themselves are very large.
Having said that, using an array is probably no longer possible with such large numbers (or, at best, very inefficient), but we can still use a hash, which is perhaps slightly less practical in this context, but still quite workable in principle.
Can you provide a realistic example of your data before I come up with another solution that might also not be workable with real data?
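Just to illustrate the hash idea, here is a rough sketch, not a final solution. It keys the counts on the positions actually covered, so large coordinates cost no extra memory. Assumptions: ranges are half-open as in the sample output, the data is on chr1, and the second range below is invented purely to show an overlap (only the first one comes from the post above). The sub name coverage_segments_hash is hypothetical.

```perl
use strict;
use warnings;

# Same per-position counting as with an array, but stored in a hash,
# so only covered positions take up memory.
# Assumption: ranges are half-open, [start, end).
sub coverage_segments_hash {
    my @ranges = @_;    # each element: [chr, start, end]
    my %count;          # $count{chr}{pos} = number of ranges covering pos
    for my $r (@ranges) {
        my ($chr, $start, $end) = @$r;
        $count{$chr}{$_}++ for $start .. $end - 1;
    }
    my @segments;
    for my $chr (sort keys %count) {
        my $cov = $count{$chr};
        my ($seg_start, $seg_end, $seg_count);
        for my $pos (sort { $a <=> $b } keys %$cov) {
            # Extend the current segment if this position is adjacent
            # to it and has the same count.
            if (defined $seg_start && $pos == $seg_end && $cov->{$pos} == $seg_count) {
                $seg_end++;
                next;
            }
            push @segments, "$chr $seg_start $seg_end $seg_count" if defined $seg_start;
            ($seg_start, $seg_end, $seg_count) = ($pos, $pos + 1, $cov->{$pos});
        }
        # Flush the last open segment for this chromosome.
        push @segments, "$chr $seg_start $seg_end $seg_count" if defined $seg_start;
    }
    return @segments;
}

# First range taken from the post above; the second one is made up.
my @big = (['chr1', 241525932, 241526132], ['chr1', 241526000, 241526200]);
print "$_\n" for coverage_segments_hash(@big);
# prints:
# chr1 241525932 241526000 1
# chr1 241526000 241526132 2
# chr1 241526132 241526200 1
```

The hash still holds one entry per covered position, so this only stays practical while the total covered length is moderate.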
November 3rd, 2013, 05:29 PM