### Thread: Help in matrix file processing and random sampling

1. No Profile Picture
Registered User
Devshed Newbie (0 - 499 posts)

Join Date
Feb 2013
Posts
20
Rep Power
0

#### Help in matrix file processing and random sampling

Hi all,
I am a newbie in Python, currently working on matrix file processing. I got stuck and desperately need help.
I notice that a somewhat similar problem has already been mentioned in the forum (in the post: Help needed in random sampling). Although my matrix (and problem) is a bit different, I present it in 1 and 0 (presence/absence) notation, as in the previous problem, in the hope that a modification of the previous program might do the trick.
OK, now to the problem.

I have a large input matrix file like this:

Group1 Group2 Group3 Group4 .............
First 1 0 1 1 .............
Second 0 0 1 1 .............
Third 1 0 1 0 .............
Forth 1 1 0 1 .............
.............
........
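A minimal sketch of reading a table in this layout into one list per group. It assumes the header line holds only the group names and every later line is a row label followed by one 0/1 value per group; the function name `parse_matrix` is made up for illustration.

```python
def parse_matrix(text):
    """Parse a whitespace-separated presence/absence table into columns.

    Assumption: the first non-empty line lists the group names, and each
    following line is a row label plus one 0/1 value per group.
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    groups = lines[0]
    columns = {g: [] for g in groups}
    for fields in lines[1:]:
        label, values = fields[0], fields[1:]
        if len(values) != len(groups):
            raise ValueError(
                f"row {label!r} has {len(values)} values, expected {len(groups)}"
            )
        for g, v in zip(groups, values):
            columns[g].append(int(v))
    return groups, columns


# The four rows and four groups shown in the post.
sample = """Group1 Group2 Group3 Group4
First 1 0 1 1
Second 0 0 1 1
Third 1 0 1 0
Forth 1 1 0 1
"""
groups, columns = parse_matrix(sample)
print(columns["Group1"])  # [1, 0, 1, 1]
```

For a real file the same function could be fed `open(path).read()`, although a very large matrix may be better processed with NumPy.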

I want to randomly combine the columns (with their values) over all possible combinations. But since terabytes of combinations would result, I want to restrict the output to 500 combinations for each combination size.
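One possible way to cap the work at 500 combinations per size is to draw random column-index sets directly instead of enumerating every combination first. This is only a sketch: the name `sample_combinations` and its parameters are invented here, and `math.comb` needs Python 3.8 or later.

```python
import math
import random


def sample_combinations(n_cols, k, limit=500, seed=None):
    """Return up to `limit` distinct k-column index tuples, chosen at random.

    If fewer than `limit` combinations exist, all of them are returned.
    """
    rng = random.Random(seed)
    wanted = min(limit, math.comb(n_cols, k))
    chosen = set()
    while len(chosen) < wanted:
        # Sorting makes (0, 2) and (2, 0) count as the same combination.
        chosen.add(tuple(sorted(rng.sample(range(n_cols), k))))
    return sorted(chosen)


# With 4 columns there are only 6 pairs, so all of them come back.
print(sample_combinations(4, 2))
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

# With 60 columns there are 34220 triples, so the result is capped at 500.
print(len(sample_combinations(60, 3, seed=1)))  # 500
```

Rejection sampling like this stays cheap as long as the cap is small compared with the number of possible combinations, or the number of possible combinations is itself small.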

After combining, I want the shared_count, non_shared_count and total_count for each combination (explained below):
.................................................................................................... .................................................................
For example, to start, the combinations of 2 (here 2 means combinations of 2 groups) will be:

Group1-Group2 Group1-Group3 Group1-Group4 Group2-Group3 ..... (up to 500 random combinations)

and the shared, non_shared and total counts for each combination are calculated as follows:

Group1-Group2, Group1-Group3 .... (up to 500 random combinations)
1-0 , 1-1 .............
0-0 , 0-1 .................
1-0 , 1-1 ........................
1-1 , 1-0 ..................
----------- ---------------
shared_count= 1, shared_count= 2 ..............

(shared_count is the number of 1-1 rows in each combination)

non_shared_count= 2, non_shared_count= 2 .............

(count of 1-0/0-1 sharing)

total_count= 3 ,total_count= 4 ..................

(total_count = shared_count + non_shared_count)

(note that 0-0 rows are not counted)
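The pair calculation above can be sketched in a few lines of plain Python. The function name `pair_counts` is hypothetical; the column values are the ones from the example matrix.

```python
def pair_counts(a, b):
    """Return (shared, non_shared, total) for two 0/1 columns."""
    # Shared: both columns have a 1 in the same row.
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    # Non-shared: exactly one of the two columns has a 1 (1-0 or 0-1).
    non_shared = sum(1 for x, y in zip(a, b) if x != y)
    # 0-0 rows fall into neither count, so they are ignored.
    return shared, non_shared, shared + non_shared


group1 = [1, 0, 1, 1]
group2 = [0, 0, 0, 1]
group3 = [1, 1, 1, 0]

print(pair_counts(group1, group2))  # (1, 2, 3)
print(pair_counts(group1, group3))  # (2, 2, 4)
```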

Thus, for each combination size (2 here), three output files will be generated:

output file 1: (shared_count_2.txt) contains the shared_count results of the 500 combinations, e.g.:

(shared_count ) Group1-Group2 1
(shared_count ) Group1-Group3 2
......

output file 2: (non_shared_count_2.txt) contains the non_shared_count results of the 500 combinations, e.g.:

(non_shared_count ) Group1-Group2 2
(non_shared_count ) Group1-Group3 2
......

output file 3: (total_count_2.txt) contains the total_count results of the 500 combinations, e.g.:

(total_count ) Group1-Group2 3
(total_count ) Group1-Group3 4
......

.................................................................................................... ...................................................................
With the same input file, the combinations of 3 (here 3 means combinations of 3 groups) will be as follows (here 1-1-1 rows are considered shared and 1-0-0, 0-0-1, 0-1-0, etc. are non_shared; 0-0-0 rows are not counted):

output file 1: (shared_count_3.txt) contains the shared_count results of the 500 combinations, e.g.:

(shared_count ) Group1-Group2-Group3 0
(shared_count ) Group1-Group3-Group4 1
......

output file 2: (non_shared_count_3.txt) contains the non_shared_count results of the 500 combinations, e.g.:
(non_shared_count ) Group1-Group2-Group3 4
(non_shared_count ) Group1-Group3-Group4 3
......

output file 3: (total_count_3.txt) contains the total_count results of the 500 combinations, e.g.:

(total_count ) Group1-Group2-Group3 4
(total_count ) Group1-Group3-Group4 4
......
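The same rule extends to any number of groups: a row is shared when every column in the combination has a 1, non-shared when some but not all do, and ignored when all are 0. A minimal sketch, with the hypothetical name `combo_counts` and the column values from the example matrix:

```python
def combo_counts(columns):
    """Return (shared, non_shared, total) for a list of 0/1 columns."""
    shared = non_shared = 0
    for row in zip(*columns):
        ones = sum(row)
        if ones == len(row):
            shared += 1        # e.g. 1-1-1
        elif ones > 0:
            non_shared += 1    # e.g. 1-0-0, 0-1-1
        # all zeros (0-0-0) is not counted
    return shared, non_shared, shared + non_shared


group1 = [1, 0, 1, 1]
group2 = [0, 0, 0, 1]
group3 = [1, 1, 1, 0]
group4 = [1, 1, 0, 1]

print(combo_counts([group1, group2, group3]))  # (0, 4, 4)
print(combo_counts([group1, group3, group4]))  # (1, 3, 4)
```

Note that an element-wise xor only matches the non-shared count for pairs; for three or more columns it gives the parity of each row instead, which is why the sketch counts ones per row.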

.................................................................................................... ...............................................................
and so on for combinations of 4, 5, ...
.................................................................................................... ...............................................................

Thus, for each combination size there will be three output files; for (say) 50 combination sizes, there will be 50x3=150 output files.
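Writing the three files per combination size could look roughly like this. The function name `write_count_files`, the shape of `results`, and the tab-separated line format are all assumptions made for illustration; only the file names follow the pattern given in the post.

```python
import tempfile
from pathlib import Path


def write_count_files(k, results, out_dir="."):
    """Write shared/non_shared/total counts for size k to three text files.

    `results` is assumed to be a list of
    (group_names, shared, non_shared, total) tuples.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    kinds = ("shared_count", "non_shared_count", "total_count")
    paths = []
    for i, kind in enumerate(kinds):
        path = out_dir / f"{kind}_{k}.txt"
        with path.open("w") as fh:
            for names, *counts in results:
                # One line per combination: "Group1-Group2<TAB>count"
                fh.write(f"{'-'.join(names)}\t{counts[i]}\n")
        paths.append(path)
    return paths


# The two pair results worked out in the post.
results = [
    (("Group1", "Group2"), 1, 2, 3),
    (("Group1", "Group3"), 2, 2, 4),
]
with tempfile.TemporaryDirectory() as tmp:
    paths = write_count_files(2, results, tmp)
    names_written = [p.name for p in paths]
    first_file = paths[0].read_text()

print(names_written)
# ['shared_count_2.txt', 'non_shared_count_2.txt', 'total_count_2.txt']
```

Calling it once per size in a loop over 2, 3, 4, ... would produce the three files for each size.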

Any kind of help in solving this problem is highly appreciated. Thank you for your consideration.
2. What has randomization got to do with this problem?
3. Registered User

#### right

Errr, my bad. Actually, combinations of columns are needed here (up to 500), and you are right, randomization is not the proper term here.
4. Your "(count of 1-0/0-1 sharing)" is an exclusive or (xor):

(1 ^ 0) == (0 ^ 1) == 1

numpy.logical_xor((1,0,1,0),(1,1,0,0)).sum()

which in J is the generalized dot product
(+/ . (2b0110 b.)) /|:#:i.4
sum DotProduct xor

Taking columns as row vectors,
shared_count = numpy.dot(A,B)

for example
numpy.dot((1,0,1,0),(1,1,0,0))
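Applied to the Group1 and Group2 columns from the first post, the two NumPy calls above give the counts the poster worked out by hand. This is a sketch that assumes NumPy is installed; the variable names are just for illustration, and the xor trick only applies to pairs of columns.

```python
import numpy as np

a = np.array([1, 0, 1, 1])  # Group1 column from the first post
b = np.array([0, 0, 0, 1])  # Group2 column from the first post

# Dot product counts rows where both columns are 1.
shared = int(np.dot(a, b))
# Element-wise xor is true for 1-0 and 0-1 rows; summing counts them.
non_shared = int(np.logical_xor(a, b).sum())
total = shared + non_shared

print(shared, non_shared, total)  # 1 2 3
```

The total is also the number of rows where at least one column is 1, so `np.logical_or(a, b).sum()` gives the same value.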
Last edited by b49P23TIvg; February 21st, 2013 at 10:29 AM.
5. Registered User
???