#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2011
    Posts
    2
    Rep Power
    0

    Regular expressions


    Hi everyone, I'm working on an example that extracts the time from each line of a text file given as input. However, I'm getting a strange result.

    =====> Inputs
    Monday I have an appointment with Alex at 12:40 PM.
    we are going to a party on Thursday at 10:30 PM.
    =====> Output
    2:40
    10:30


    ====> script
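    #!/usr/bin/perl
    use strict;
    use warnings;

    open(my $fhandle1, "<", $ARGV[0]) or die "Cannot read $ARGV[0]: $!";
    open(my $fhandle2, ">", $ARGV[1]) or die "Cannot write $ARGV[1]: $!";
    while (my $line = <$fhandle1>)
    {
        # Capture hour:minute (hour 1-9, 01-09 or 10-12; minutes 00-59)
        if ($line =~ m/.*(([0]?[1-9]|[1-1][0-2]):([0-5][0-9])).*/)
        {
            print $fhandle2 "Time: $1\n";
        }
    }
    close($fhandle2);
    close($fhandle1);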
    ==========
    Any help will be appreciated.

    Best
#2
    Transforming Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,179
    Rep Power
    9398
    Because the expression starts with .*, the engine first runs all the way to the end of the string before it tries to match the rest of the pattern. Since the rest won't match there, it backtracks one character at a time until it does.

    The strings it looks at are, in order:
    Code:
    .
    M.
    PM.
     PM.
    0 PM.
    40 PM.
    :40 PM.
    2:40 PM.
    That last one matches so it stops.
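A minimal Perl sketch of the backtracking described above, using the two sample lines from the first post. It assumes a colon between hour and minutes (the posted pattern leaves it out), and `extract` is a hypothetical helper name. It compares a greedy `.*` prefix with a lazy `.*?` prefix and with no prefix at all:

```perl
use strict;
use warnings;

# Hour 1-9, 01-09 or 10-12, a colon (assumed), then minutes 00-59.
my $hhmm = qr/(?:0?[1-9]|1[0-2]):[0-5][0-9]/;

# Hypothetical helper: returns the time found by three variants of the match.
sub extract {
    my ($line) = @_;

    # Greedy: .* swallows the whole line, then gives characters back one at
    # a time, so the capture starts as far to the right as possible.
    my ($greedy) = $line =~ /.*($hhmm)/;

    # Lazy: .*? consumes as little as possible, so the capture starts at
    # the leftmost position where $hhmm can match.
    my ($lazy) = $line =~ /.*?($hhmm)/;

    # No prefix: an unanchored match already scans left to right; \b stops
    # the hour from starting in the middle of a longer number.
    my ($bounded) = $line =~ /\b($hhmm)\b/;

    return ($greedy, $lazy, $bounded);
}

for my $line ("Monday I have an appointment with Alex at 12:40 PM.",
              "we are going to a party on Thursday at 10:30 PM.") {
    my ($greedy, $lazy, $bounded) = extract($line);
    print "greedy=$greedy lazy=$lazy bounded=$bounded\n";
}
```

For the first sample line the greedy version should give 2:40 while the other two give 12:40; for the second line all three give 10:30. A leading `.*` is rarely needed, because an unanchored match already starts scanning from the left.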
#3
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2011
    Posts
    2
    Rep Power
    0



    Originally Posted by requinix
    Because the expression starts with .*, the engine first runs all the way to the end of the string before it tries to match the rest of the pattern. Since the rest won't match there, it backtracks one character at a time until it does.
    Yeah, you are right. But what about the second example? For 10:30, shouldn't I get 0:30 instead?
    Thank you for taking the time to look at this issue.
#4
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Apr 2009
    Posts
    1,940
    Rep Power
    1225
    You never specified what results you expected or what you received, but this is what your regex is doing.
    Code:
    use strict;
    use warnings;
    use YAPE::Regex::Explain;
    
    my $regex = 'm/.*(([0]?[1-9]|[1-1][0-2])(([0-5][0-9])).*/';
    
    print YAPE::Regex::Explain->new($regex)->explain;
    Explanation:
    Code:
    The regular expression:
    
    (?-imsx:m/.*(([0]?[1-9]|[1-1][0-2])(([0-5][0-9])).*/))
    
    matches as follows:
    
    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      m/                       'm/'
    ----------------------------------------------------------------------
      .*                       any character except \n (0 or more times
                               (matching the most amount possible))
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        (                        group and capture to \2:
    ----------------------------------------------------------------------
          [0]?                     any character of: '0' (optional
                                   (matching the most amount possible))
    ----------------------------------------------------------------------
          [1-9]                    any character of: '1' to '9'
    ----------------------------------------------------------------------
         |                        OR
    ----------------------------------------------------------------------
          [1-1]                    any character of: '1' to '1'
    ----------------------------------------------------------------------
          [0-2]                    any character of: '0' to '2'
    ----------------------------------------------------------------------
        )                        end of \2
    ----------------------------------------------------------------------
        (                        group and capture to \3:
    ----------------------------------------------------------------------
          (                        group and capture to \4:
    ----------------------------------------------------------------------
            [0-5]                    any character of: '0' to '5'
    ----------------------------------------------------------------------
            [0-9]                    any character of: '0' to '9'
    ----------------------------------------------------------------------
          )                        end of \4
    ----------------------------------------------------------------------
        )                        end of \3
    ----------------------------------------------------------------------
        .*                       any character except \n (0 or more times
                                 (matching the most amount possible))
    ----------------------------------------------------------------------
        /                        '/'
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
    What did you want it to do?
#5
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2011
    Posts
    46
    Rep Power
    58
    Unless your requirements are more complex than the example, why not keep it simple and just match the time at the end of the line as a simple sequence of digits:

    Code:
    m/([0-9]{1,2}:[0-9]{2}) (AM|PM)\.$/
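A short Perl sketch of that suggestion applied to the two sample lines; `time_at_eol` is a hypothetical helper name. Because the pattern has no leading `.*`, the match starts at the leftmost position that works, so the full hour comes back:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the pattern suggested above.
# Returns (time, AM/PM), or an empty list when the line doesn't end that way.
sub time_at_eol {
    my ($line) = @_;
    # $ also matches just before a trailing newline, so lines read with
    # <$fh> can be tested without chomp.
    return $line =~ m/([0-9]{1,2}:[0-9]{2}) (AM|PM)\.$/ ? ($1, $2) : ();
}

for my $line ("Monday I have an appointment with Alex at 12:40 PM.\n",
              "we are going to a party on Thursday at 10:30 PM.\n") {
    my ($time, $ampm) = time_at_eol($line);
    print "Time: $time $ampm\n" if defined $time;
}
```

This should print "Time: 12:40 PM" and "Time: 10:30 PM". The trade-off of the simpler pattern is that `[0-9]{1,2}:[0-9]{2}` also accepts values such as 99:99.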
#6
    Transforming Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,179
    Rep Power
    9398
    Originally Posted by k-Alex
    Yeah, you are right. But what about the second example? For 10:30, shouldn't I get 0:30 instead?
    No, because "0:30" doesn't match: the hour must be 1-9, 01-09 or 10-12, so it needs at least one non-zero digit (like 01 or 10). A lone 0 fails, so the engine backtracks one more character and matches 10:30.
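To see the hour rule concretely, a small Perl sketch that tests a few candidate strings against the hour and minute parts of the original pattern, anchored with ^ and $ so the whole string must match. The colon between hour and minutes is an assumption, since the posted pattern omits it:

```perl
use strict;
use warnings;

# Hour part from the original pattern ([0]?[1-9] or [1-1][0-2]),
# then a colon (assumed) and minutes [0-5][0-9]; anchored at both ends.
my $whole_time = qr/^(?:[0]?[1-9]|[1-1][0-2]):[0-5][0-9]$/;

for my $candidate ("0:30", "10:30", "2:40", "12:40", "00:30", "13:00") {
    my $verdict = $candidate =~ $whole_time ? "matches" : "no match";
    print "$candidate => $verdict\n";
}
```

Only 10:30, 2:40 and 12:40 should match. 0:30 and 00:30 fail because neither alternative accepts a zero hour, and 13:00 fails because [1-1][0-2] stops at 12.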
