Special character search
A thread in the Perl Programming forum on Dev Shed.
August 18th, 2012, 07:23 AM
Registered User
Join Date: Jan 2006
Posts: 2
Special character search
Hi all,
I was wondering if anyone could help me search a document for special characters, e.g. /, *, ^ ...
So far I can only search for letters or words.
The script takes its input from the terminal, e.g. perl search <file1> <file2> ...
It returns the number of times the string was found, the line it was found on, and prints that line.
Here is my code so far:
Code:
#!/usr/bin/perl
print "Please enter search string:";
chomp($input=<STDIN>);
while ($n <= $#ARGV) {
    $file = @ARGV[$n];
    open(txt, $file);
    print "\n$file contains:\n";
    while($line = <txt>) {
        $linenum++;
        if ($line =~ (/$input/i)) {
            print "Line:$linenum, $line";
            while ($line =~ (/$input/g)) {
                $found++;
            }
        }
    }
    print "\nIt was found $found times.\n";
    $linenum = 0;
    $found = 0;
    $n++;
}
close(txt);
Thanks a lot.

August 18th, 2012, 08:10 AM
!~ /m$/
Join Date: May 2004
Location: Reno, NV
You need to escape the $input variable in the regex so that any special characters are treated as a literal search instead of a regex directive. You do that with \Q and \E.
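For reference, \Q ... \E applies Perl's built-in quotemeta function to the interpolated text, which puts a backslash in front of every character that is not a letter, digit or underscore. A minimal sketch, with a made-up input string, showing the escaped form and the difference it makes when matching:

```perl
use strict;
use warnings;

my $input = 'a+b*c';    # hypothetical user input containing metacharacters

# quotemeta escapes every character that is not a letter, digit or underscore
my $escaped = quotemeta($input);
print "$escaped\n";    # prints a\+b\*c

# \Q...\E does the same thing inside a pattern, so the text matches literally
my $text = 'x = a+b*c;';
print "literal match\n" if $text =~ /\Q$input\E/;

# Without escaping, + and * act as quantifiers and the literal text is not found
print "no raw match\n" unless $text =~ /$input/;
```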
Code:
#!/usr/bin/perl
use strict;
use warnings;
print "Please enter search string: ";
my $input;
chomp($input = <STDIN>);
foreach my $file (@ARGV) {
    my $found = 0;    # start at 0 so the final print never sees undef
    open my $fh, "<", $file or die "Unable to open $file: $!";
    print "\n$file contains:\n";
    while (<$fh>) {
        if (/\Q$input\E/i) {
            print "Line $.: $_";
            # count every occurrence, case-insensitively like the test above
            $found++ while /\Q$input\E/gi;
        }
    }
    print "\n'$input' found $found times.\n";
}
I've done a few other things here which could be helpful.
First, use strict and warnings at the top. Very important. Always use them. Strict mode might be confusing at first, but mostly it just means that you have to declare your variables. You do that by using the 'my' keyword the first time a variable is used in that scope. After that, perl will complain if you mistype a variable name later in the script, because an undeclared name is a compile-time error.
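A small sketch of what strict catches, assuming a deliberately misspelled variable ($totl instead of $total; both names are made up). The snippet is compiled with a string eval so the error is captured in $@ instead of stopping the script:

```perl
use strict;
use warnings;

# $totl was never declared with 'my', so strict refuses to compile the snippet
my $ok = eval q{
    use strict;
    my $total = 0;
    $totl = 5;    # typo for $total
    1;
};
print $ok ? "compiled\n" : "strict caught it: $@";
```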
Check for failure when you open files. A user can easily enter a bad filename, or fail to provide the complete path.
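One way to sketch that check as a reusable piece; open_for_reading and the missing path are hypothetical names for illustration. It returns the error text, including the system reason from $!, instead of dying:

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($fh, undef) on success or (undef, $message) on failure
sub open_for_reading {
    my ($file) = @_;
    open my $fh, '<', $file or return (undef, "Unable to open $file: $!");
    return ($fh, undef);
}

my ($fh, $err) = open_for_reading('/no/such/dir/missing.txt');
print defined $fh ? "opened\n" : "$err\n";
```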
You don't need your own variables to keep track of the line number or the contents of the current line; Perl's special variables do that for you. $. is the current line number, and $_ holds the line itself in this context, though it is perfectly fine to use your own variable name. If $line is clearer to you, use it.
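A minimal sketch of the named-variable style combined with $., reading from an in-memory filehandle over a sample string (an assumption, so the example runs without a real file):

```perl
use strict;
use warnings;

my $text   = "alpha\nbeta+gamma\ndelta\n";    # stands in for a real file (sample data)
my $needle = '+';                              # search string containing a metacharacter

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @hits;
while (my $line = <$fh>) {    # a named variable instead of $_
    # $. holds the number of the line just read from $fh
    push @hits, "Line $.: $line" if $line =~ /\Q$needle\E/;
}
close $fh;
print @hits;    # prints "Line 2: beta+gamma"
```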
An example of proper scope for variables: notice that my $found is declared inside the foreach loop. At the end of the loop I don't have to reset $found to zero; a new $found variable is created at the top of the loop on the next iteration. The same goes for $fh (the filehandle), which is also closed automatically when it goes out of scope.
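To see the scoping at work, here is a sketch that counts matches in two in-memory "files" (the names and contents are made up). Each pass through the loop gets its own $found starting at 0, and each $fh is closed when it goes out of scope:

```perl
use strict;
use warnings;

# Sample data keyed by made-up file names
my %files = (
    'first.txt'  => "a+b\na+b+a+b\n",
    'second.txt' => "no match here\n",
);
my $input = 'a+b';
my %count;

foreach my $name (sort keys %files) {
    my $found = 0;    # fresh lexical on every iteration, no manual reset needed
    open my $fh, '<', \$files{$name} or die "Unable to open $name: $!";
    while (<$fh>) {
        $found++ while /\Q$input\E/g;
    }
    $count{$name} = $found;
    # $fh goes out of scope here, which closes the handle
}
print "$_: $count{$_}\n" for sort keys %count;
```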
Last edited by keath : August 18th, 2012 at 08:12 AM.

August 18th, 2012, 08:30 AM
Hi,
the following characters: "+ ? . * ^ $ ( ) [ ] { } | \" have a special meaning in regular expressions and therefore need to be escaped (i.e. preceded by the escape character, "\") if you want their literal value. For example, if you are looking for the + character, your search should be for the string "\+". To look for the escape character ("\") itself, search for "\\". And so on.
Either your user will have to enter this escape character, or you can build a function that will rework the user's input to add this escape character before any character belonging to the list above.
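A minimal sketch of such a function; escape_regex is a hypothetical name, and it escapes exactly the characters listed above. Perl's built-in quotemeta (the function behind \Q ... \E) does the same job more thoroughly, escaping every non-word character, so in practice you would normally use that:

```perl
use strict;
use warnings;

# Hypothetical helper: put a backslash in front of each character in the list
# + ? . * ^ $ ( ) [ ] { } | \
sub escape_regex {
    my ($str) = @_;
    $str =~ s/([+?.*^\$()\[\]{}|\\])/\\$1/g;
    return $str;
}

my $input   = 'cost: $5 (approx.)';    # sample user input
my $pattern = escape_regex($input);
print "$pattern\n";    # prints cost: \$5 \(approx\.\)
print "found\n" if 'Total cost: $5 (approx.)' =~ /$pattern/;
```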
A couple of comments about your code. The $linenum variable is unnecessary: the built-in $. special variable always contains the line number of the file being read. The innermost while loop also seems unnecessary to me, unless you want to count several occurrences on the same line; if counting matching lines is enough, you could have just:
Perl Code:
while($line = <txt>) {
    if ($line =~ (/$input/i)) {
        $found++;
    }
}
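The two versions count different things when the search string appears more than once on a line: the simplified loop counts matching lines, while the inner while loop counts every occurrence. A sketch with made-up sample lines (both counts use \Q ... \E and are case-insensitive, which is an assumption):

```perl
use strict;
use warnings;

my $input = 'ab';
my @lines = ("ab ab ab\n", "xyz\n", "AB\n");    # sample data

my ($matching_lines, $occurrences) = (0, 0);
for my $line (@lines) {
    # simplified loop: one increment per matching line
    $matching_lines++ if $line =~ /\Q$input\E/i;
    # inner while loop: one increment per occurrence on the line
    $occurrences++ while $line =~ /\Q$input\E/gi;
}
print "lines: $matching_lines, occurrences: $occurrences\n";
```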
The other thing is that the part:
Perl Code:
while ($n <= $#ARGV) {
    $file = @ARGV[$n];
is not optimal. First, you should always check the return status of an "open" statement. Second, it would be better to use each of the values of @ARGV directly with a foreach statement, rather than using these somewhat clumsy $n and $#ARGV variables:
Perl Code:
foreach my $file (@ARGV) {
    open my $FILE_IN, "<", $file or die "could not open $file $! \n";
EDIT: Just as I was about to post this message, I was interrupted by a long phone call from someone at a charity asking for a donation. I had not seen Keath's answer when I wrote mine. And BTW, Keath, I did not know about these \Q and \E escapes; it must be something new. I'll look it up, as it seems fairly handy.
Last edited by Laurent_R : August 18th, 2012 at 08:38 AM.

August 18th, 2012, 08:40 AM
Hmm,
Which of you three is going to get the grade for doing the homework assignment?
:/

August 18th, 2012, 09:10 AM
Registered User
Join Date: Jan 2006
Posts: 2
I have modified it now and it's working, thanks to your help. And FishMonger, what are you talking about? This is just for practice; I'm not doing any sort of Perl course.

August 18th, 2012, 09:11 AM
!~ /m$/
Join Date: May 2004
Location: Reno, NV
First time poster. Provides working code. Doesn't know how to escape regex.
I don't see any abuse of the forum. Seems totally fair, and I'm happy to make the minor effort.

August 18th, 2012, 09:17 AM
Quote: Originally Posted by FishMonger
Hmm,
Which of you three is going to get the grade for doing the homework assignment?
:/
This may be homework (or maybe not), but slurch901 has done reasonable work to produce something that more or less works, so why not help her or him with these special characters? And why not give a couple of pieces of advice to improve the code?
@Keath: I have looked up the \Q and \E quotemeta escapes, which I did not know about. Thank you, this will be useful to me.

August 18th, 2012, 09:38 AM
I might be getting just a little cynical in my old age.

August 18th, 2012, 01:01 PM
BTW, as an additional comment on your code, the index function would probably be better than a regex for what you are trying to do (looking for an exact substring, not a pattern): on the one hand, it is usually faster; on the other hand, it treats every character literally, so special characters are not a problem.
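A sketch of the index approach; count_literal is a hypothetical helper name. index returns the position of a substring, or -1 if it is absent, and compares characters literally, so no escaping is needed. Because index is case-sensitive, the /i behavior of the original script is emulated here by lower-casing both strings:

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping literal occurrences with index
sub count_literal {
    my ($line, $needle) = @_;
    return 0 unless length $needle;    # avoid an endless loop on an empty needle
    my ($count, $pos) = (0, 0);
    while ((my $at = index($line, $needle, $pos)) >= 0) {
        $count++;
        $pos = $at + length $needle;    # resume searching after this match
    }
    return $count;
}

print count_literal('a*b a*b (a*b)', 'a*b'), "\n";    # prints 3
# Case-insensitive search: lower-case both sides first
print count_literal(lc 'A^B a^b', lc 'a^b'), "\n";    # prints 2
```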
|
Developer Shed Advertisers and Affiliates
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Rate This Thread |
Linear Mode
|
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|