Thread: Hash references

    #1 (ap88)

    Hash references


    Hey folks,
    I have two hashes and one array.
    hash1 maps key1 to two values, value1 and value2.
    hash2 maps key2 to a single value.

    I also have an array whose elements are keys (key2) in hash2 (only a few of them, not all).

    The values of hash2 are the same as the keys (key1) of hash1.

    I need value1 and value2 from hash1 for those array elements that are keys in hash2.

    open my $file, '<', $knowngene or die "Can't open $knowngene for read: $!";
    #$line = <$file>;
    while($line = <$file>) #always remember to assign the line read from the file handle to a variable in the while statement
    {
    chomp ($line);
    @splitarray = split('\s+',$line); #split the line on whitespace; splitting on tab doesn't work here.
    $knowngenehash{$splitarray[0]} = [$splitarray[3],$splitarray[4]];
    #print Dumper \%knowngenehash;
    }
    close $file;

    open my $file2, '<', $kgxref or die "Can't open $kgxref for read: $!";
    #$line = <$file2>;
    while($line = <$file2>) #always remember to assign the line read from the file handle to a variable in the while statement
    {
    chomp ($line);
    @splitarray = split('\s+',$line); #split the line on whitespace; splitting on tab doesn't work here.
    $kgxrefhash{$splitarray[4]} = $splitarray[0];
    #print Dumper \%kgxrefhash;
    }
    close $file2;

    #my $file = "junk";
    open (file3, "< $geneset") or die "Can't open $geneset for read: $!";
    $line = <file3>; #note: this reads and discards the first line of the file
    while ($line = <file3>)
    {
    chomp($line); #chomp before push, so the stored gene names have no trailing newline
    push (@genesetarray,$line);
    }

    close file3 or die "Cannot close $geneset: $!";

    #print @genesetarray;

    my $genes = \@genesetarray;
    my $test = \%kgxrefhash;
    $finalhash1{$genes} = $kgxrefhash{genes};
    print Dumper \%finalhash1;
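    A minimal sketch of why the last three lines don't build the wanted hash, using the AKT2 row posted later in the thread as stand-in data. A hash key is always a string, so $finalhash1{$genes} stores the array reference under a key like ARRAY(0x...), and $kgxrefhash{genes} auto-quotes the bareword and looks up the literal key 'genes' rather than using $genes. The lookup has to be done once per array element; %gene_to_id is a hypothetical name for the result.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Stand-in data from the kgxref and geneset samples later in the thread:
# gene symbol (column 5) => UCSC id (column 1).
my %kgxrefhash   = ( AKT2 => 'uc002onf.3' );
my @genesetarray = qw(ADCY9 AKT1 AKT2 AKT3 APOA1);

# Pitfall 1: hash keys are strings, so an array reference used as a key
# is stringified to something like "ARRAY(0x55d5c8a1b2c8)".
my %finalhash1;
my $genes = \@genesetarray;
$finalhash1{$genes} = 'anything';
my ($stringified_key) = keys %finalhash1;

# Pitfall 2: a bareword inside the braces is auto-quoted, so this looks
# up the literal key 'genes', not the contents of $genes.
my $bareword_lookup = $kgxrefhash{genes};

# What was intended: one lookup per array element, skipping genes
# that have no entry in %kgxrefhash.
my %gene_to_id;
for my $gene (@genesetarray) {
    $gene_to_id{$gene} = $kgxrefhash{$gene} if exists $kgxrefhash{$gene};
}

print "stringified key: $stringified_key\n";
print "bareword lookup is ", (defined $bareword_lookup ? "defined" : "undef"), "\n";
print "$_ => $gene_to_id{$_}\n" for sort keys %gene_to_id;
```

    The same per-element loop, extended with a second lookup into %knowngenehash, gives the coordinates the post is asking for.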
    #2 (keath)
    I'd rather see an example of the data file with an explanation of what you are trying to accomplish.

    Comments on this post

    • Laurent_R agrees : I agree, I would like to see a data sample.
    #3 (ap88)
    Originally Posted by keath
    I'd rather see an example of the data file with an explanation of what you are trying to accomplish.
    Sorry for the late reply.

    knowngene:

    uc001aaa.3 chr1 + 11873 14409 11873 11873 3 11873,12612,13220, 12227,12721,14409, uc001aaa.3
    uc010nxr.1 chr1 + 11873 14409 11873 11873 3 11873,12645,13220, 12227,12697,14409, uc010nxr.1
    uc002onf.3 chr1 + 11873 14409 12189 13639 3 11873,12594,13402, 12227,12721,14409, B7ZGX9 uc010nxq.1

    Here I need the first, fourth and fifth columns (uc001aaa.3, 11873, 14409).
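    A minimal sketch of picking those three columns out of the first sample line, hard-coded so it runs on its own. Perl arrays are zero-based, so columns 1, 4 and 5 are indices 0, 3 and 4. It uses split ' ', the special form of split that splits on any run of whitespace and ignores leading whitespace; that assumes, as the comment in the first post suggests, that the file is not cleanly tab-separated.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# First sample line from knowngene, copied from the post above.
my $line = 'uc001aaa.3 chr1 + 11873 14409 11873 11873 3 11873,12612,13220, 12227,12721,14409, uc001aaa.3';

# split ' ' is special: it splits on runs of any whitespace
# (spaces or tabs) and ignores leading whitespace.
my @fields = split ' ', $line;

# Columns 1, 4 and 5 are indices 0, 3 and 4.
my ($id, $start, $end) = @fields[0, 3, 4];

# Keep both coordinates together as an array reference keyed by the id,
# the same layout the original %knowngenehash uses.
my %knowngenehash;
$knowngenehash{$id} = [$start, $end];

print "$id @{ $knowngenehash{$id} }\n";
```

    This prints the id followed by the two coordinates, separated by spaces.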

    kgxref:

    uc002onf.3 NM_001626 P31751 AKT2_HUMAN AKT2 NM_001626

    Here I need the fifth and first columns (AKT2, uc002onf.3).

    geneset:

    ADCY9
    AKT1
    AKT2
    AKT3
    APOA1


    I need the output to be:
    AKT2 11873 14409
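    A sketch of the whole chain, assuming the three files are whitespace-separated like the samples above. To keep it self-contained, the sample data is inlined and read back through in-memory filehandles (open on a scalar reference); in the real script these would be the $knowngene, $kgxref and $geneset files. read_rows is a hypothetical helper name. The lookup goes gene symbol to UCSC id through the kgxref hash, then UCSC id to coordinates through the knowngene hash, skipping any gene missing from either hash.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample data from the thread, inlined so the sketch runs on its own.
my $knowngene_text = <<'END';
uc001aaa.3 chr1 + 11873 14409 11873 11873 3 11873,12612,13220, 12227,12721,14409, uc001aaa.3
uc010nxr.1 chr1 + 11873 14409 11873 11873 3 11873,12645,13220, 12227,12697,14409, uc010nxr.1
uc002onf.3 chr1 + 11873 14409 12189 13639 3 11873,12594,13402, 12227,12721,14409, B7ZGX9 uc010nxq.1
END
my $kgxref_text  = "uc002onf.3 NM_001626 P31751 AKT2_HUMAN AKT2 NM_001626\n";
my $geneset_text = "ADCY9\nAKT1\nAKT2\nAKT3\nAPOA1\n";

# Hypothetical helper: read whitespace-separated rows from an
# in-memory filehandle (open on a reference to a string).
sub read_rows {
    my ($text_ref) = @_;
    open my $fh, '<', $text_ref or die "Can't open in-memory file: $!";
    my @rows;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines
        push @rows, [ split ' ', $line ];
    }
    close $fh;
    return @rows;
}

# UCSC id (column 1) => [start (column 4), end (column 5)]
my %knowngenehash;
for my $row (read_rows(\$knowngene_text)) {
    $knowngenehash{ $row->[0] } = [ @{$row}[3, 4] ];
}

# gene symbol (column 5) => UCSC id (column 1)
my %kgxrefhash;
for my $row (read_rows(\$kgxref_text)) {
    $kgxrefhash{ $row->[4] } = $row->[0];
}

# one gene symbol per line
my @genesetarray = map { $_->[0] } read_rows(\$geneset_text);

my @output;
for my $gene (@genesetarray) {
    next unless exists $kgxrefhash{$gene};    # symbol not in kgxref
    my $id = $kgxrefhash{$gene};
    next unless exists $knowngenehash{$id};   # id not in knowngene
    push @output, join ' ', $gene, @{ $knowngenehash{$id} };
}
print "$_\n" for @output;
```

    With only the sample rows, AKT2 is the one geneset entry found in kgxref, so this prints the single line AKT2 11873 14409.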
    #4 (Laurent_R)
    Sorry, I don't understand how you marry the uc001aaa.3 line of the first file with the uc002onf.3 line of the second file.

    It would make more sense to me if you joined this line of file 1:

    uc002onf.3 chr1 + 11873 14409 12189 13639 3 11873,12594,13402, 12227,12721,14409, B7ZGX9 uc010nxq.1

    with this line of file 2:

    uc002onf.3 NM_001626 P31751 AKT2_HUMAN AKT2 NM_001626

    Or did I miss your point?
    #5 (ap88)
    Originally Posted by Laurent_R
    Sorry, I don't understand how you marry the uc001aaa.3 line of the first file with the uc002onf.3 line of the second file.

    It would make more sense to me if you joined this line of file 1:

    uc002onf.3 chr1 + 11873 14409 12189 13639 3 11873,12594,13402, 12227,12721,14409, B7ZGX9 uc010nxq.1

    with this line of file 2:

    uc002onf.3 NM_001626 P31751 AKT2_HUMAN AKT2 NM_001626

    Or did I miss your point?
    Look at the data in knowngene (line 3: uc002onf.3). This ID is present in kgxrefhash as the value for AKT2. Using AKT2 from geneset, I get the value uc002onf.3, which in turn gives me the values stored in knowngenehash (11873 14409).
    #6 (keath)
    Near as I can tell, you mean something like this:

    Code:
    #!/usr/bin/env perl
    use strict;
    use warnings;
    
    open my $known_fh, '<', 'knowngene.txt' or die "Can't open known file: $!";
    my %known;
    while (<$known_fh>) {
    	chomp;
    	my @row = split /\t/;
    	warn "redefining data for $row[0]" if exists $known{$row[0]};
    	$known{$row[0]} = "$row[3]\t$row[4]";
    }
    close $known_fh;
    
    # ================
    
    open my $ref_fh, '<', 'kgxref.txt' or die "Can't open ref file: $!";
    while (<$ref_fh>) {
    	chomp;
    	my @row = split /\t/;
    	if (exists $known{$row[0]}) {
    		my $value = $known{$row[0]};
    		print "$row[4]\t$value\n";
    	} else {
    		warn "$row[0] was not found in the known geneset\n";
    	}
    }
    close $ref_fh;
    I don't understand the significance of 'geneset', which you included at the bottom of your post, unless it lists the only genes you are interested in.

    If so, you need to make a hash of that set as well, and only print in the second loop (over $ref_fh) if $row[4] exists in that hash.
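    A sketch of that suggestion, with %known and the kgxref rows stubbed in from the sample data so it runs without the files (in the script above they come from knowngene.txt and kgxref.txt). The geneset symbols become the keys of a hypothetical %wanted hash, so each membership test is a single exists lookup rather than a scan of the array. It keeps the split /\t/ from the script above; if the files are space-separated, as the first post suggests, split ' ' would be needed instead.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# %known as the first loop builds it: UCSC id => "start\tend".
my %known = ( 'uc002onf.3' => "11873\t14409" );

# Filter hash: one key per wanted gene symbol; the value 1 is a placeholder.
my @geneset = qw(ADCY9 AKT1 AKT2 AKT3 APOA1);
my %wanted  = map { $_ => 1 } @geneset;

# A kgxref row as the second loop would read it (tab-separated).
my @kgxref_lines = ("uc002onf.3\tNM_001626\tP31751\tAKT2_HUMAN\tAKT2\tNM_001626");

my @printed;
for my $line (@kgxref_lines) {
    my @row = split /\t/, $line;
    next unless exists $wanted{ $row[4] };   # only genes listed in geneset
    next unless exists $known{ $row[0] };    # id must also be in knowngene
    push @printed, "$row[4]\t$known{ $row[0] }";
}
print "$_\n" for @printed;
```

    Unlike the script above, this sketch silently skips ids that are missing from %known instead of warning about them.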
    #7 (Laurent_R)
    Originally Posted by ap88
    Look at the data in knowngene (line 3: uc002onf.3). This ID is present in kgxrefhash as the value for AKT2.
    OK, but in your original example, you underlined line 1:

    Originally Posted by ap88
    uc001aaa.3 chr1 + 11873 14409 11873 11873 3 11873,12612,13220, 12227,12721,14409, uc001aaa.3
    and not line 3, which is why I could not make sense of your data and of what you wanted to do with it.

    The solution proposed by Keath, using the %known hash, probably does what you need.
