Hi all,

I am a newbie in Python, currently working on processing a matrix file. I am stuck and desperately need help.

I noticed that a somewhat similar problem has already been mentioned in the forum (in the post: Help needed in random sampling). Although my matrix (and my problem) is a bit different, I present it here in the same 1 and 0 (presence/absence) notation as the previous problem, in the hope that a modification of the previous program might do the trick.

OK, now to the problem.

I have a large input matrix file like this:

        Group1  Group2  Group3  Group4  .............
First   1       0       1       1       .............
Second  0       0       1       1       .............
Third   1       0       1       0       .............
Forth   1       1       0       1       .............
.............
........
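
To make the layout concrete, here is a rough sketch of how I imagine the file being read into one list of 0/1 values per group. It assumes the file is whitespace-separated, that the first line holds only the group names, and that every following line starts with a row label. The function name `read_matrix` is made up for illustration.

```python
import io

def read_matrix(handle):
    """Read a whitespace-separated 0/1 matrix into {group_name: [values]}."""
    lines = [line.split() for line in handle if line.strip()]
    header, rows = lines[0], lines[1:]
    columns = {name: [] for name in header}
    for fields in rows:
        # fields[0] is the row label (First, Second, ...); the rest are 0/1.
        for name, value in zip(header, fields[1:]):
            columns[name].append(int(value))
    return columns

# The four rows shown above, without the elided part of the matrix.
sample = io.StringIO(
    "Group1 Group2 Group3 Group4\n"
    "First 1 0 1 1\n"
    "Second 0 0 1 1\n"
    "Third 1 0 1 0\n"
    "Forth 1 1 0 1\n"
)
columns = read_matrix(sample)
print(columns["Group1"])  # [1, 0, 1, 1]
```

For the real file, `open("matrix.txt")` (a placeholder name) would replace the `io.StringIO` object.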

I want to randomly combine the columns (with their values) for every possible combination size. Since the full set of combinations would run to terabytes, I want to restrict the output to 500 random combinations for each combination size.
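
The 500-combination limit could be handled roughly as below. This is only a sketch of one possible approach (drawing random combinations and discarding repeats, so the full set is never enumerated), and the function name `sample_combinations` and its parameters are invented for illustration. It uses `math.comb`, which needs Python 3.8 or newer.

```python
import math
import random

def sample_combinations(names, size, limit=500, seed=None):
    """Return up to `limit` distinct random combinations of `size` names."""
    rng = random.Random(seed)
    names = list(names)
    # Never ask for more combinations than actually exist.
    target = min(limit, math.comb(len(names), size))
    chosen = set()
    while len(chosen) < target:
        # Sort the picked indices so that Group1-Group2 and Group2-Group1
        # are treated as the same combination.
        picked = tuple(sorted(rng.sample(range(len(names)), size)))
        chosen.add(picked)
    return [tuple(names[i] for i in combo) for combo in sorted(chosen)]

pairs = sample_combinations(["Group1", "Group2", "Group3", "Group4"], 2, seed=0)
print(len(pairs))  # 6, because only 6 pairs exist for 4 groups
```

With many groups the number of possible combinations is far above 500, so repeats are rare and the loop finishes quickly.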

After combining, I want the shared_count, non_shared_count and total_count for each combination (explained below):

.................................................................................................... .................................................................

For example, to start with, the combinations of 2 (here 2 means combinations of 2 groups) will be:

Group1-Group2 Group1-Group3 Group1-Group4 Group2-Group3 ..... (up to 500 random combinations)

and the shared, non_shared and total counts for each combination are calculated as follows:

Group1-Group2          Group1-Group3          .... (up to 500 random combinations)
1-0                    1-1                    .............
0-0                    0-1                    .............
1-0                    1-1                    .............
1-1                    1-0                    .............
-----------            ---------------
shared_count = 1       shared_count = 2       .............
(shared_count is the number of 1-1 rows in each combination)
non_shared_count = 2   non_shared_count = 2   .............
(non_shared_count is the number of 1-0 or 0-1 rows)
total_count = 3        total_count = 4        .............
(total_count = shared_count + non_shared_count)
(note that 0-0 rows are not counted)
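
In Python terms, the counting rule I have in mind would look something like the sketch below. The function name `count_combination` is invented, and the `columns` dictionary simply holds the four example rows from above. A row counts as shared when every group in the combination has a 1, as non_shared when it has a mix of 1s and 0s, and is skipped when it is all 0s.

```python
def count_combination(columns, combo):
    """Return (shared, non_shared, total) for one combination of groups."""
    shared = non_shared = 0
    for row in zip(*(columns[name] for name in combo)):
        ones = sum(row)
        if ones == len(row):
            shared += 1          # every group has a 1
        elif ones > 0:
            non_shared += 1      # some groups have 1, others 0
        # rows that are all 0 are ignored
    return shared, non_shared, shared + non_shared

# Example columns taken from the matrix at the top of the post.
columns = {
    "Group1": [1, 0, 1, 1],
    "Group2": [0, 0, 0, 1],
    "Group3": [1, 1, 1, 0],
    "Group4": [1, 1, 0, 1],
}
print(count_combination(columns, ("Group1", "Group2")))  # (1, 2, 3)
print(count_combination(columns, ("Group1", "Group3")))  # (2, 2, 4)
```

The same function works unchanged for combinations of 3 or more groups, since it only looks at how many 1s each row contains.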

Thus, for each combination size (2 here), three output files will be generated:

Output file 1 (shared_count_2.txt) contains the shared_count results of the 500 combinations, e.g.:

(shared_count ) Group1-Group2 1

(shared_count ) Group1-Group3 2

......

Output file 2 (non_shared_count_2.txt) contains the non_shared_count results of the 500 combinations, e.g.:

(non_shared_count ) Group1-Group2 2

(non_shared_count ) Group1-Group3 2

......

Output file 3 (total_count_2.txt) contains the total_count results of the 500 combinations, e.g.:

(total_count ) Group1-Group2 3

(total_count ) Group1-Group3 4

......

.................................................................................................... ...................................................................

With the same input file, the combinations of 3 (here 3 means combinations of 3 groups) are handled the same way: 1-1-1 rows are counted as shared, rows with a mix of 1s and 0s (1-0-0, 0-0-1, 0-1-0, etc.) as non_shared, and 0-0-0 rows are not counted:

Output file 1 (shared_count_3.txt) contains the shared_count results of the 500 combinations, e.g.:

(shared_count ) Group1-Group2-Group3 0

(shared_count ) Group1-Group3-Group4 1

......

Output file 2 (non_shared_count_3.txt) contains the non_shared_count results of the 500 combinations, e.g.:

(non_shared_count ) Group1-Group2-Group3 4

(non_shared_count ) Group1-Group3-Group4 3

......

Output file 3 (total_count_3.txt) contains the total_count results of the 500 combinations, e.g.:

(total_count ) Group1-Group2-Group3 4

(total_count ) Group1-Group3-Group4 4

......

.................................................................................................... ...............................................................

And so on for combinations of 4, 5, ... groups.

.................................................................................................... ...............................................................

Thus, for each combination size there will be three output files. For (say) up to 50 combination sizes, there will be 50x3=150 output files.
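
Put together, the three files per combination size could be produced roughly like this. It is a sketch on the small example matrix only: the function name `write_counts` is made up, the files go into a temporary directory, and for brevity it enumerates all combinations with `itertools.combinations` before cutting down to 500, which would not be feasible for the real matrix.

```python
import contextlib
import itertools
import os
import random
import tempfile

def write_counts(columns, size, out_dir, limit=500, seed=None):
    """Write shared/non_shared/total count files for one combination size."""
    # Full enumeration is only reasonable for a small matrix like this one.
    combos = list(itertools.combinations(columns, size))
    if len(combos) > limit:
        combos = random.Random(seed).sample(combos, limit)
    kinds = ("shared_count", "non_shared_count", "total_count")
    paths = [os.path.join(out_dir, f"{kind}_{size}.txt") for kind in kinds]
    with contextlib.ExitStack() as stack:
        handles = [stack.enter_context(open(path, "w")) for path in paths]
        for combo in combos:
            rows = list(zip(*(columns[name] for name in combo)))
            shared = sum(1 for row in rows if all(row))
            non_shared = sum(1 for row in rows if any(row) and not all(row))
            values = (shared, non_shared, shared + non_shared)
            label = "-".join(combo)
            for kind, handle, value in zip(kinds, handles, values):
                handle.write(f"({kind} ) {label} {value}\n")
    return paths

columns = {
    "Group1": [1, 0, 1, 1],
    "Group2": [0, 0, 0, 1],
    "Group3": [1, 1, 1, 0],
    "Group4": [1, 1, 0, 1],
}
out_dir = tempfile.mkdtemp()
for size in range(2, len(columns) + 1):
    write_counts(columns, size, out_dir)
print(sorted(os.listdir(out_dir)))  # three files for each of the sizes 2, 3 and 4
```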

Any kind of help in solving this problem would be highly appreciated. Thank you for your consideration.
