
November 13th, 2012, 02:56 AM
Hi,
The best way to do it really depends on the size (in terms of number of lines) of each file: if one file has many more lines than the other, the algorithm may differ.
In almost all cases, though, I think the first thing to do with this type of problem is to read the first file line by line, chomp each line and store it as a key in a hash (the associated values don't really matter; they could be 1 for each hash entry).
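A minimal sketch of that first step, assuming the first file holds one word per line. To keep it self-contained, the file is simulated with an in-memory string (`$file1_contents` is a hypothetical stand-in); with a real file you would open its path instead.

```perl
use strict;
use warnings;

# Hypothetical contents of the first file, read through an in-memory
# filehandle so the example runs without any files on disk.
my $file1_contents = "apple\nbanana\ncherry\n";

open my $fh1, '<', \$file1_contents or die "Cannot open file 1: $!";
my %words;
while (my $line = <$fh1>) {
    chomp $line;               # remove the trailing newline
    next if $line eq '';       # skip blank lines (an assumption, not required)
    $words{$line} = 1;         # only the key matters; the value is just 1
}
close $fh1;

print join(', ', sort keys %words), "\n";
```

Looking a word up with `exists $words{$word}` is then a constant-time operation, which is why the hash is the natural structure for the smaller (or word-list) file.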
Then read the second file line by line and test each line against each hash entry. How best to do that depends on several factors pertaining to the data (the relative size of the files, the length of each line in the second file, the volume of data, which decides whether you should optimize for code simplicity or for speed) and also on the Perl version you are using. You could use:
- Regular expressions to find a match and capture whatever is before the match in the line
- The index and substr functions
- Possibly the smart match operator (~~), if your Perl version supports it
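A rough sketch of the first two options, assuming the goal is to capture the text before the first occurrence of a word from the hash. The sample data and the helper names `before_match_re` and `before_match_index` are hypothetical, made up for illustration.

```perl
use strict;
use warnings;

# Option 1: regular expression. \Q...\E escapes any regex metacharacters
# in $word, and the non-greedy (.*?) captures the text before the first match.
sub before_match_re {
    my ($line, $word) = @_;
    return $line =~ /^(.*?)\Q$word\E/ ? $1 : undef;
}

# Option 2: index and substr, without the regex engine.
sub before_match_index {
    my ($line, $word) = @_;
    my $pos = index $line, $word;          # -1 when $word is not in $line
    return $pos >= 0 ? substr($line, 0, $pos) : undef;
}

# Hypothetical data: %words would be built from the first file,
# and $file2_contents stands in for the second file.
my %words = map { $_ => 1 } qw(apple cherry);
my $file2_contents = "I ate an apple today\nnothing here\nsome cherry pie\n";

open my $fh2, '<', \$file2_contents or die "Cannot open file 2: $!";
my @found;
while (my $line = <$fh2>) {
    chomp $line;
    for my $word (sort keys %words) {      # sorted only for stable output
        my $before = before_match_index($line, $word);
        next unless defined $before;
        push @found, "$word: [$before]";
    }
}
close $fh2;

print "$_\n" for @found;
```

Both helpers return the same prefix here; index/substr avoids compiling a pattern per word, while the regex version is easier to extend (word boundaries, case-insensitivity). Either way, every line is tested against every key, so the cost grows with lines times words.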
Another possible approach is to use the List::Util (and/or possibly List::MoreUtils) modules to compare the list of words in the first file with the list of words in each line of the second file.
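A small sketch of that word-list approach using `first` from List::Util (a core module; List::MoreUtils has to be installed from CPAN). It assumes words in the second file are separated by whitespace with no punctuation attached; the sample data is hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical data: %words from the first file, @lines from the second.
my %words = map { $_ => 1 } qw(apple banana cherry);
my @lines = ('I ate an apple today', 'nothing here', 'banana and cherry');

my @hits;
for my $line (@lines) {
    # split ' ' splits on runs of whitespace, giving the line's word list;
    # first() returns the first word for which the block is true, or undef.
    my $hit = first { exists $words{$_} } split ' ', $line;
    push @hits, $hit;
    print defined $hit ? "$line => $hit\n" : "$line => (no match)\n";
}
```

Because each word is checked with a hash lookup, this scales better than looping over every key per line, but it only finds whole words, not substrings.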