#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2017
    Posts
    2
    Rep Power
    0

    Understanding this regular expression


    Hi
    I have inherited some PHP code containing this regular expression:

    Code:
       /([^$&.]+)(?:((?:\.\.\.)?(?:\$|&)[^\s]+)(?:(\s+)(.*))?)?/
    Being new to both PHP and regular expressions, I would be grateful if someone could break down and explain this expression.

    Thanks
#2
    Lazy Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    16,382
    Rep Power
    9645
    regex101.com (and probably others) can give an automated, technical breakdown of a regex:
    Code:
    /([^$&.]+)(?:((?:\.\.\.)?(?:\$|&)[^\s]+)(?:(\s+)(.*))?)?/
    
    1st Capturing Group ([^$&.]+)
    	Match a single character not present in the list below [^$&.]+
    		+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
    		$&. matches a single character in the list $&. (case sensitive)
    Non-capturing group (?:((?:\.\.\.)?(?:\$|&)[^\s]+)(?:(\s+)(.*))?)?
    	? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
    	2nd Capturing Group ((?:\.\.\.)?(?:\$|&)[^\s]+)
    		Non-capturing group (?:\.\.\.)?
    			? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
    			\. matches the character . literally (case sensitive)
    			\. matches the character . literally (case sensitive)
    			\. matches the character . literally (case sensitive)
    		Non-capturing group (?:\$|&)
    			1st Alternative \$
    				\$ matches the character $ literally (case sensitive)
    			2nd Alternative &
    				& matches the character & literally (case sensitive)
    		Match a single character not present in the list below [^\s]+
    			+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
    			\s matches any whitespace character (equal to [\r\n\t\f\v ])
    	Non-capturing group (?:(\s+)(.*))?
    		? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
    		3rd Capturing Group (\s+)
    			\s+ matches any whitespace character (equal to [\r\n\t\f\v ])
    				+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
    		4th Capturing Group (.*)
    			.* matches any character (except for line terminators)
    				* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
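The same breakdown can also live next to the code itself. Below is a minimal sketch in Python (its `re` module accepts this exact pattern, and for plain ASCII input the constructs used here behave the same as in PHP's PCRE) that rewrites the expression in verbose mode with a comment on each piece. PHP's counterpart of verbose mode is the `x` pattern modifier. The names `VERBOSE_PATTERN` and `ORIGINAL_PATTERN` and the sample strings are made up for illustration.

```python
import re

# Verbose-mode rewrite: unescaped whitespace and anything after an unescaped
# '#' are ignored, so each piece of the pattern can carry its own comment.
VERBOSE_PATTERN = re.compile(r"""
    ([^$&.]+)             # group 1: one or more chars that are not $, & or .
    (?:                   # optional tail (not captured as a whole)
        (                 # group 2:
            (?:\.\.\.)?   #   an optional literal '...'
            (?:\$|&)      #   then a literal $ or &
            [^\s]+        #   then one or more non-whitespace chars
        )
        (?:               # optional: whitespace plus the rest of the line
            (\s+)         # group 3: one or more whitespace chars
            (.*)          # group 4: everything up to the end of the line
        )?
    )?
""", re.VERBOSE)

# The original one-line pattern from the question, for comparison.
ORIGINAL_PATTERN = re.compile(
    r"([^$&.]+)(?:((?:\.\.\.)?(?:\$|&)[^\s]+)(?:(\s+)(.*))?)?"
)

# Both versions should capture exactly the same groups.
for sample in ["int $count number of items", "string ...$parts", "plain text"]:
    assert VERBOSE_PATTERN.search(sample).groups() == ORIGINAL_PATTERN.search(sample).groups()
    print(sample, "->", VERBOSE_PATTERN.search(sample).groups())
```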
    The human explanation is that it matches one of three shapes:
    a) One or more characters other than $, & and . (group 1)
    b) The same, followed by an optional "...", then a $ or &, then one or more non-whitespace characters (group 2)
    c) The same as (b), followed by whitespace (group 3) and the rest of the line (group 4)
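To see the three shapes in action, here is a small Python sketch that runs the unchanged pattern against one sample of each. `re.search` is used rather than `re.match` because PHP's `preg_match` likewise reports the first match found anywhere in the subject. The helper name `describe` and the sample strings are hypothetical; the samples are shaped like phpDoc `@param` lines (type, `$name`, description), which is only a guess at what the original author was parsing and is not confirmed anywhere in this thread.

```python
import re
from typing import Optional, Tuple

# The pattern from the question, unchanged.
PATTERN = re.compile(r"([^$&.]+)(?:((?:\.\.\.)?(?:\$|&)[^\s]+)(?:(\s+)(.*))?)?")

def describe(subject: str) -> Optional[Tuple[Optional[str], ...]]:
    """Return the four capture groups of the first match (None if no match).

    Groups that did not take part in the match come back as None.
    """
    m = PATTERN.search(subject)
    return m.groups() if m else None

# Shape (a): no $ or &, so only group 1 takes part.
print(describe("hello world"))                 # ('hello world', None, None, None)
# Shape (b): optional '...', then $ or &, then non-whitespace fill group 2.
print(describe("int ...$args"))                # ('int ', '...$args', None, None)
# Shape (c): whitespace and the rest of the line fill groups 3 and 4.
print(describe("string $name The user name"))  # ('string ', '$name', ' ', 'The user name')
```

Note that group 1 keeps the trailing space (`'int '`), so code that uses it as, say, a type name would probably need to trim it.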

    I get the feeling that the regex won't quite work the way its author intended it to...
#3
    Registered User
    Devshed Newbie (0 - 499 posts)

    Thanks for your help.
    I will investigate further and see what it is supposed to do.
#4
    Lazy Moderator
    Devshed Supreme Being (6500+ posts)

    To clarify: if this regex is used for validation (it might not be), it could match strings that it should not, because without anchors it only has to match some part of the string (even a single character). If that is the case, the likely fix is to add ^ and $ anchors so that the regex must match the entire string rather than just part of it.
    Code:
    /^([^$&.]+)(?:((?:\.\.\.)?(?:\$|&)[^\s]+)(?:(\s+)(.*))?)?$/
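The effect of the anchors can be checked the same way. This Python sketch compares the original pattern with the anchored one above; the helper `matches` and the sample inputs are hypothetical. Without anchors, any string containing at least one character other than $, & and . produces a match, so inputs such as `a.b` or `$x` slip through.

```python
import re

UNANCHORED = re.compile(r"([^$&.]+)(?:((?:\.\.\.)?(?:\$|&)[^\s]+)(?:(\s+)(.*))?)?")
# Same pattern with ^ and $ added so that it must cover the whole string.
ANCHORED = re.compile(r"^([^$&.]+)(?:((?:\.\.\.)?(?:\$|&)[^\s]+)(?:(\s+)(.*))?)?$")

def matches(pattern: re.Pattern, subject: str) -> bool:
    """True if re.search finds a match, roughly like a preg_match() check in PHP."""
    return pattern.search(subject) is not None

for subject in ["string $name The user name", "a.b", "$x"]:
    m = UNANCHORED.search(subject)
    print(f"{subject!r}: unanchored matched {m.group(0)!r}, "
          f"anchored ok = {matches(ANCHORED, subject)}")
```

One caveat: in PHP's PCRE, as in Python, `$` also matches just before a newline at the very end of the string. PHP's `D` modifier (or ending the pattern with `\z` instead of `$`) removes that exception; in Python, `re.fullmatch` is another way to require a whole-string match.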
