    Literal period at the beginning of a string?

    Hi all,

    I'm looking to get the value of a capacitor from the description.

    I'm using the following REGEX:
    \b(\d|\.)+(PF| PF|UF| UF|NF| NF)

    CAP .01UF 10% 25V X5R 0402 -> I want .01UF, I get 01UF
    CAP 0.01UF 10% 25V X5R 0402 -> I want 0.01UF, I get 0.01UF

    So why does it skip the literal period?
    I expect that the regex engine does not recognize a literal period as the beginning of a word.

    Any suggestions would be appreciated.

  #2
    \b means a word boundary and is defined as having a \w on one side and a \W (or the start or end of the string) on the other. That's why it finds a boundary between a period (\W) and a number (\w).
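
To make the boundary rule concrete, here is a minimal sketch in Python. The thread does not say which engine the original poster uses, so Python's `re` module is an assumption here; it treats `\b`, `\d` and `\w` the same way for these inputs. The helper names `word_boundaries` and `first_match` are made up for illustration.

```python
import re

# The pattern from the original question, unchanged.
ORIGINAL = re.compile(r"\b(\d|\.)+(PF| PF|UF| UF|NF| NF)")

def word_boundaries(text):
    # Collect every index at which the zero-width \b assertion succeeds.
    return [m.start() for m in re.finditer(r"\b", text)]

def first_match(text):
    # Return the full text matched by the original pattern, or None.
    m = ORIGINAL.search(text)
    return m.group(0) if m else None

# In " .01UF" the only boundaries are between "." and "0" and at the end.
print(word_boundaries(" .01UF"))                    # [2, 6]
print(first_match("CAP .01UF 10% 25V X5R 0402"))    # 01UF
print(first_match("CAP 0.01UF 10% 25V X5R 0402"))   # 0.01UF
```

There is no boundary at index 1, between the space and the period, so the earliest place the pattern can start is just before the first digit.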

    For whatever reason the engine decided to match after the period. Probably a result of optimizations - maybe it expanded the beginning of the expression to (\b\d|\b\.) and found the \b\d first. Who knows.

    Be more specific with your regex. Try
    CAP ([\d.]+)\s?([PUN]F)
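
A quick way to try the suggested pattern, sketched in Python (an assumed engine; the thread does not name one). The function name `extract_cap_value` and the third and fourth sample descriptions are hypothetical, added only to exercise the optional space and the no-match case.

```python
import re

# The pattern suggested above: anchor on the literal "CAP " prefix
# instead of relying on \b, then capture the number and the unit.
CAP_VALUE = re.compile(r"CAP ([\d.]+)\s?([PUN]F)")

def extract_cap_value(description):
    # Return (number, unit) as strings, or None if nothing matches.
    m = CAP_VALUE.search(description)
    return (m.group(1), m.group(2)) if m else None

print(extract_cap_value("CAP .01UF 10% 25V X5R 0402"))    # ('.01', 'UF')
print(extract_cap_value("CAP 0.01UF 10% 25V X5R 0402"))   # ('0.01', 'UF')
print(extract_cap_value("CAP 100 PF 5% 50V C0G 0402"))    # ('100', 'PF'), hypothetical input
print(extract_cap_value("RES 10K 1% 0402"))               # None, hypothetical input
```

Note that `[\d.]+` is permissive: it would also accept strings such as `1.2.3`, so a stricter number pattern may be worth considering if the descriptions are not well formed.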
  #3
    I do not think it is the result of an optimization. It is just that, in " .", there is no word boundary between the space and the dot, because both are non-word characters (\W).

    Try this (Perl syntax):

    my $d = " .";
    print $1 if $d =~ /\b(\.)/;   # no match: the space and the dot are both \W
    It will not print anything. But:

    my $d = ".5";
    print $1 if $d =~ /\b(\d)/;   # matches: boundary between "." (\W) and "5" (\w)
    will print "5".

    The syntax proposed by Requinix should work.

    Comments on this post

    • requinix agrees : yeah, reading it again you're absolutely right
