November 13th, 2012, 12:25 AM
Find common entries in first column and fetch whatever is in front of it
I have two files.
The first one looks like this:
The second one looks like this:
I have to check whether any entry in the first file also appears in the first column of the second file. If it does, I have to fetch whatever is present in front of it in the second file.
So if CALR is common, the output is:
Please suggest a Perl script for this. I am trying to help a friend.
November 13th, 2012, 03:56 AM
The way to do it really depends on the size (in number of lines) of each file. If one file has many more lines than the other, the best algorithm may differ.
In almost all cases, though, I think the first thing to do with this type of problem is to read the first file line by line, chomp each line, and store it as a key in a hash (the associated values don't really matter; they could be 1 for each hash entry).
Then you read the second file line by line and test each line against the hash entries. The way to do it may differ depending on various factors pertaining to the data: the relative size of the files, the length of each line in the second file, the volume of data (i.e. whether you want to optimize for code simplicity or for speed), and also the Perl version you are using. You could use:
- Regular expressions to find a match and capture whatever is before the match in the line
- The index and substr functions
- Possibly the smart match (if your Perl version allows it)
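A minimal sketch of the hash approach described above. It rests on several assumptions, since the real data is not shown: the first file holds one entry per line, the columns of the second file are separated by whitespace, and "whatever is in front of it" means the rest of the line after the first column. The sub names build_lookup and rest_if_match are made up for this illustration, and the two file names are taken from the command line.

```perl
use strict;
use warnings;

# Build a lookup hash from the lines of the first (small) file.
# Assumption: one entry per line; surrounding whitespace is ignored.
sub build_lookup {
    my (@lines) = @_;
    my %seen;
    for my $line (@lines) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;    # trim leading/trailing whitespace
        next if $line eq '';        # skip blank lines
        $seen{$line} = 1;           # the value does not matter
    }
    return \%seen;
}

# For one line of the second file, return the rest of the line if its
# first column is a key of the lookup hash, otherwise return undef.
# Assumption: columns are whitespace-separated.
sub rest_if_match {
    my ($lookup, $line) = @_;
    chomp $line;
    my ($key, $rest) = split ' ', $line, 2;
    return unless defined $key && exists $lookup->{$key};
    return defined $rest ? $rest : '';
}

# Usage: perl script.pl first_file second_file
if (@ARGV == 2) {
    my ($small, $large) = @ARGV;

    open my $in1, '<', $small or die "Cannot open $small: $!";
    my $lookup = build_lookup(<$in1>);
    close $in1;

    # The larger file is read one line at a time, so it never has to
    # fit in memory.
    open my $in2, '<', $large or die "Cannot open $large: $!";
    while (my $line = <$in2>) {
        my $rest = rest_if_match($lookup, $line);
        print "$rest\n" if defined $rest;
    }
    close $in2;
}
```

Each hash lookup takes roughly constant time, so the running time grows with the number of lines in the second file rather than with the product of the two file sizes. If the columns are separated by something else (tabs only, commas), the split pattern would need to change.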
Another possible approach may be to use the List::Util (and/or possibly List::MoreUtils) modules to compare the list of words in the first file with the list of words in each line of the second file.
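For illustration only, here is how the list-based variant might look with the `first` function from List::Util. The sub name lookup_first_column is invented for this sketch, and it assumes whitespace-separated columns as before.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Return the entry of @$names that equals the first column of $line,
# or undef if there is none.
# Assumption: columns are whitespace-separated.
sub lookup_first_column {
    my ($names, $line) = @_;
    my ($key) = split ' ', $line;    # first whitespace-separated field
    return unless defined $key;      # blank line: nothing to compare
    return first { $_ eq $key } @$names;
}
```

This scans the whole list of names for every line of the second file, so it is only reasonable when the first file is small. A hash lookup avoids that scan.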
November 13th, 2012, 04:07 AM
Thanks for the reply.
Yes, the second file is larger, but it follows the same pattern as the sample presented here.
The first file is small; what I posted is all of it.
Initially I tried some code in Unix shell, which worked to a certain extent for a very small sample but not for the large original data.
Here is the shell code I tried:
Now I am looking for help with Perl!
November 13th, 2012, 09:37 AM
What have you tried?
How big are the files?
Is there a possibility of one or both of the files having duplicate entries in the first column?