#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2006
    Posts
    2
    Rep Power
    0

    Special character search


    Hi all,

    I was wondering if anyone could help me search for special characters in a document, e.g. /, *, ^ and so on.
    So far I can only search letters or words.

    The code takes its file names from the terminal, e.g. perl search <file1> <file2> ....
    It returns the number of times the string occurs, reports the line number of each match, and prints that line.

    Here is my code so far:

    Code:
    #!/usr/bin/perl
    print "Please enter search string:";
    chomp($input=<STDIN>); 
    while ($n <= $#ARGV) {
    	$file = @ARGV[$n];
    	open(txt, $file);
    	print "\n$file contains:\n";
    	while($line = <txt>) {
    		$linenum++;
    		if ($line =~ (/$input/i)) {
    		print "Line:$linenum, $line";
    			while ($line =~ (/$input/g)) {
    				$found++;
    			}
    		}
    	}
    print "\nIt was found $found times.\n";
    $linenum = 0;
    $found = 0;
    $n++;
    }
    close(txt);

    Thanks a lot.
  2. #2
  3. !~ /m$/
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    May 2004
    Location
    Reno, NV
    Posts
    4,251
    Rep Power
    1810
    You need to escape the $input variable in the regex so that any special characters are treated as a literal search instead of a regex directive. You do that with \Q and \E.

    Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    
    print "Please enter search string: ";
    my $input;
    chomp($input = <STDIN>);
    
    foreach my $file (@ARGV) {
    	my $found = 0;	# start at 0 so the final print never sees an undefined value
    	open my $fh, "<", $file or die "Unable to open $file: $!";
    
    	print "\n$file contains:\n";
    	while(<$fh>) {
    		if (/\Q$input\E/i) {
    			print "Line $.: $_";
    			$found++ while /\Q$input\E/gi;	# /i here too, so the count matches the line test
    		}
    	}
    	print "\n'$input' found $found times.\n";
    }
    I've done a few other things here which could be helpful.

    First, use strict and warnings at the top. Very important. Always use them. Strict mode might be confusing at first, but it mostly means that you have to declare your variables. You do that by using the 'my' keyword the first time a variable is used in that scope. After that, Perl will complain at compile time if you mistype the variable name or change it later in the script.

    Check for failure when you open files. A user can easily enter a bad filename, or fail to provide the complete path.

    You don't have to create your own variables to keep track of the line number or the contents of the current line. $. is the line number of the file being read. $_ is the line itself in this context, though it is perfectly fine to use your own variable name. If $line is clearer to you, use it.
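
    To see $. and $_ at work without touching a real file, here is a minimal sketch. It assumes an in-memory filehandle and some made-up sample text; the @report array is only there for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample text, read through an in-memory filehandle
# so the example needs no real file.
my $text = "first line\nsecond line\nthird line\n";
open my $fh, '<', \$text or die "Unable to open in-memory file: $!";

my @report;
while (<$fh>) {
    chomp;                          # works on $_ by default
    push @report, "Line $.: $_";    # $. is the current line number of $fh
}
close $fh;

print "$_\n" for @report;
```

    Each filehandle keeps its own line counter, so opening a fresh handle per file gives line numbers that start at 1 again.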

    An example of proper scope for variables: notice that my $found is declared inside the foreach loop. At the end of the loop I don't have to reset $found to zero. That happens automatically: the variable goes out of scope when the end of the loop body is reached, and a new $found is created at the top of the next iteration. The same goes for $fh (the file handle), which is why the file gets closed without an explicit close.
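
    The scoping behaviour can be checked with a small sketch like the following. The word list and the variable names are made up for the demonstration; the point is only that $count never carries over from one iteration to the next.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @totals;
foreach my $word (qw(apple banana cherry)) {    # hypothetical sample data
    my $count = 0;                      # a brand-new $count on every iteration
    $count++ while $word =~ /a/g;       # count the letter 'a' in this word
    push @totals, "$word=$count";
}
# $count is not visible here; referring to it would be a
# compile-time error under 'use strict'.
print join(", ", @totals), "\n";
```

    If $count were declared once above the loop and never reset, the last entry would show the running total instead of 0.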

    Comments on this post

    • Laurent_R agrees
    Last edited by keath; August 18th, 2012 at 08:12 AM.
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    828
    Rep Power
    496
    Hi,

    the following characters: "+ ? . * ^ $ ( ) [ ] { } | \" have a special meaning in regular expressions and therefore need to be escaped (i.e. preceded by the escape character, "\") if you need to use their literal value. For example, if you are looking for the + character, your search should be for the string "\+". To look for the escape character ("\"), search for the string "\\". And so on.

    Either your user will have to enter this escape character, or you can build a function that will rework the user's input to add this escape character before any character belonging to the list above.
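
    As a rough sketch of the second option, a function along these lines could do the reworking. The name escape_metachars and the sample input are assumptions made for illustration. Perl's built-in quotemeta does the same job more thoroughly (it escapes every non-word ASCII character), so in practice it is the safer choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: put a backslash in front of each regex metacharacter.
sub escape_metachars {
    my ($string) = @_;
    (my $escaped = $string) =~ s/([\\^\$.|?*+()\[\]{}])/\\$1/g;
    return $escaped;
}

my $input   = '2*(3+4)^2';      # sample input, not from the original post
my $pattern = escape_metachars($input);

print "$pattern\n";
print "match\n" if 'y = 2*(3+4)^2' =~ /$pattern/;

# The built-in equivalent, shown for comparison.
print quotemeta($input), "\n";
```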

    A couple of comments about your code. The $linenum variable is unnecessary: the built-in $. special variable always contains the current line number of the file being read. The innermost while loop is only needed if you want to count every occurrence on a line. If counting the matching lines is enough, you could have just:

    Perl Code:
    	while($line = <txt>) {
    		if ($line =~ (/$input/i)) {
    			print "Line: $., $line";
    			$found++;
    		}
    	}
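
    One thing worth checking before dropping the inner loop: counting once per matching line and counting once per occurrence give different totals as soon as a line contains the string twice. A small sketch with made-up data (the search string and the lines are assumptions, and \Q...\E is used so the * is taken literally):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $input = 'a*';    # hypothetical search string containing a metacharacter
my @lines = ("a* and a* again\n", "nothing here\n", "one a*\n");

my ($matching_lines, $occurrences) = (0, 0);
for my $line (@lines) {
    if ($line =~ /\Q$input\E/i) {
        $matching_lines++;                              # once per line
        $occurrences++ while $line =~ /\Q$input\E/gi;   # once per hit
    }
}
print "lines: $matching_lines, occurrences: $occurrences\n";
```

    Here two lines match but the string occurs three times, so which loop to keep depends on which number you want to report.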


    The other thing is that the part:

    Perl Code:
    while ($n <= $#ARGV) {
    	$file = @ARGV[$n];
    	open(txt, $file);


    is not optimal. First, you should always check the return status of an "open" statement. Second, it would be better to use each of the values of @ARGV directly with a foreach statement, rather than using these somewhat clumsy $n and $#ARGV variables:

    Perl Code:
    foreach my $file  (@ARGV) {
    	open my $FILE_IN, "<", $file or die "could not open $file $! \n";


    EDIT: just when I was about to post this message, I was interrupted by a long phone call by someone from a charity asking for a donation. I had not seen Keath's answer when I made mine. And BTW, Keath, I did not know about these \Q and \E tags, it must be something new, I'll look it up, as it seems fairly handy.
    Last edited by Laurent_R; August 18th, 2012 at 08:38 AM.
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Apr 2009
    Posts
    1,921
    Rep Power
    1225
    Hmm,

    Which of you three is going to get the grade for doing the homework assignment?

    :/
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2006
    Posts
    2
    Rep Power
    0
    I have modified it now and it's working thanks to your help. And Fishmonger, what are you talking about? This is just for practise; I'm not doing any sort of Perl course.
  10. #6
  11. !~ /m$/
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    May 2004
    Location
    Reno, NV
    Posts
    4,251
    Rep Power
    1810
    First time poster. Provides working code. Doesn't know how to escape regex.

    I don't see any abuse of the forum. Seems totally fair, and I'm happy to make the minor effort.
  12. #7
  13. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    828
    Rep Power
    496
    Originally Posted by FishMonger
    Hmm,

    Which of you three is going to get the grade for doing the homework assignment?

    :/
    This may be homework (or maybe not), but slurch901 has done reasonable work to produce something that more or less works, so why not help her or him with these special characters? And why not give a few pieces of advice to improve the code?

    @Keath: I have looked up the \Q and \E quote metas, which I did not know. Thank you, this will be useful to me.
  14. #8
  15. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Apr 2009
    Posts
    1,921
    Rep Power
    1225
    I might be getting just a little cynical in my old age.
  16. #9
  17. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    828
    Rep Power
    496
    BTW, as an additional comment on your code, the index function would probably be better than a regex for what you are trying to do (looking for an exact match, not for a pattern): on the one hand, it is usually faster, and, on the other hand, it treats special characters as plain text, so it will not fail on them. Note that index is case-sensitive, unlike a regex with the /i modifier.
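
    A minimal sketch of the index approach, assuming a helper named count_occurrences (a made-up name) and some sample strings. It counts non-overlapping occurrences, which is also what the regex loop with /g does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of $needle in
# $haystack using index() instead of a regex.
sub count_occurrences {
    my ($haystack, $needle) = @_;
    return 0 if $needle eq '';          # avoid an endless loop on empty input
    my ($count, $pos) = (0, 0);
    while (($pos = index($haystack, $needle, $pos)) != -1) {
        $count++;
        $pos += length $needle;         # continue after this match
    }
    return $count;
}

# Metacharacters need no escaping here.
print count_occurrences("a^b * c^d / a^b", "a^b"), "\n";

# index() is case-sensitive; lowercase both sides to mimic /i.
print count_occurrences(lc("Foo.* and FOO.*"), lc("foo.*")), "\n";
```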
