Regex Programming
 
#1 | January 28th, 2012, 05:40 PM | TwainKor
Trying to achieve the following email-related regex

I'm using PCRE in PHP under the preg functions.

I want:

- at least one character before the @
- exactly one @
- at least one character after the @
- a period after that
- something, anything, after the period
- a check against multiple @s

Here are two expressions that I wrote:
Version 1: ^(.)+@(.)+\.(.)+$
Version 2: ^(.+)@(.+)\.(.+)$

Are there any problems with either of these expressions with respect to the above criteria? (And which one is preferred?)

I'm new to regex, so all in-depth advice would be appreciated.

If possible, I'd like to do the following as well, but I don't know how to do it in regex. Right now I just do it through other programming functions. Can regex do this? I am not sure it could be built into the expression above, though.

disallow: multiple @s, semicolons, back or forward slashes, commas, and single quotes.

#2 | January 29th, 2012, 12:45 AM | ragax
Hi TwainKor!

The parentheses in your expressions are only useful if you want to capture the content of the parentheses, and retrieve them via $match[1], $match[2] etc (assuming $match is the third parameter in your preg_match).

If you do want to capture something, your second version is better, as each group in the first captures only one character (the last one it matched).
But if you only want to validate (no capture), do away with the parentheses.
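
To see the difference in what the two versions capture, here is a small sketch run against the sample address joe@mama.com. The variable names $address, $v1 and $v2 are only for illustration.

```php
<?php
// Version 1 repeats a one-character group, so each group keeps only
// the last character it matched. Version 2 captures the whole run.
$address = 'joe@mama.com';

preg_match('~^(.)+@(.)+\.(.)+$~', $address, $v1);
preg_match('~^(.+)@(.+)\.(.+)$~', $address, $v2);

print_r(array_slice($v1, 1)); // captures: e, a, m
print_r(array_slice($v2, 1)); // captures: joe, mama, com
```

Index 0 of each result array holds the whole match, which is why the sketch slices it off before printing.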

Warning: both of your expressions allow multiple @ (the DOT can be an @).
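
A quick check makes this visible. The sketch below runs both original expressions against a made-up address with two @ signs; the variable names are illustrative.

```php
<?php
// The dot matches any character except a newline, including @,
// so neither expression rejects a second @.
$twoAts = 'joe@ma@ma.com';

$v1Result = preg_match('~^(.)+@(.)+\.(.)+$~', $twoAts);
$v2Result = preg_match('~^(.+)@(.+)\.(.+)$~', $twoAts);

echo $v1Result, ' ', $v2Result, "\n"; // prints "1 1"
```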

However, you can easily disallow all the characters you specified. Here's perhaps the simplest way to match what you asked for:

Input:
joe@mama.com

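Code:
<?php
// Each part is one or more characters other than @ ; / \ , and '
// (\\\\ in a single-quoted PHP string becomes \\ in the regex, i.e. one literal backslash).
$regex = '~^[^@;/\\\\,\']+@[^@;/\\\\,\']+\.[^@;/\\\\,\']+$~';
$string = 'joe@mama.com';
echo preg_match($regex, $string); // 1 on a match, 0 otherwise
?>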

Output:
1

You can be a whole lot more specific if you like, with or without lookarounds.

For lookarounds, have a look at how you can validate a password by using regex lookaheads. The technique is the same for an email.
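
A rough sketch of the lookaround approach applied to the rules in this thread (an illustration only, not taken from the linked tutorial): a negative lookahead at the start bans the unwanted characters once, leaving the rest of the pattern to deal only with the @ and the period. The names $lookahead and $results are assumptions.

```php
<?php
// (?!.*[;/\\,']) fails the match if any banned character appears anywhere;
// [^@]+@[^@]+\.[^@]+ then enforces exactly one @ and a period after it.
$lookahead = '~^(?!.*[;/\\\\,\'])[^@]+@[^@]+\.[^@]+$~';

$results = array();
foreach (array('joe@mama.com', 'joe@ma@ma.com', 'joe@ma,ma.com') as $email) {
    $results[$email] = preg_match($lookahead, $email);
}
print_r($results); // 1, 0, 0 in that order
```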

Hope this helps, let me know if you have any questions.

Wishing you a fun weekend.

Last edited by ragax : January 29th, 2012 at 01:00 AM.

#3 | January 29th, 2012, 01:09 AM | ragax
Quote:
Here's the perhaps simplest way to match what you asked for


And here's a less simple version of the code that has several benefits.

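Code:
<?php
// Group 1 defines the allowed character once; (?1) re-uses that sub-pattern.
$regex = '~^([^@;/\\\\,\'])++@(?>(?1)+?\.)(?1)++$~';
// Test addresses: one valid, the rest each containing a disallowed character.
$emails = array('joe@mama.com', 'joe@ma@ma.com', 'joe@ma,ma.com', 'joe@ma/ma.com', 'joe@ma\\ma.com', 'joe@ma\'ma.com');
foreach ($emails as $email) echo preg_match($regex, $email) . '<br />';
?>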


Output:
1
0
0
0
0
0

First, there's an array of test email addresses so you can add addresses and check that the expression works for you.

Second, it only says "no @, no semicolons etc" explicitly ([^@;/\\\\,\']) once at the beginning, later just calling that expression twice with the pattern repeat syntax: (?1).
This allows you to tweak this condition in a single place, which makes the expression easier to maintain.

I also threw in a few performance tweaks.
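
For readers unfamiliar with them, the tweaks in question are the possessive quantifier ++ and the atomic group (?>...), both of which stop the engine from backtracking into text that has already been matched. A minimal sketch of the effect, on a toy pattern rather than the email expression:

```php
<?php
// Greedy a+ gives back one "a" so the final literal a can match.
$greedy = preg_match('~^a+a$~', 'aaa');

// Possessive a++ keeps every "a" it took, so the final a never matches.
$possessive = preg_match('~^a++a$~', 'aaa');

echo $greedy, ' ', $possessive, "\n"; // prints "1 0"
```

In the email expression the part before the @ never needs to give anything back, because its character class excludes @, so making it possessive costs nothing and lets failing input be rejected sooner.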

Last edited by ragax : January 29th, 2012 at 01:21 AM. Reason: performance tweaks

#4 | January 29th, 2012, 06:31 AM | abareplace
ragax, it's a perfect example of repeating expressions. With them, the code is easier to maintain and just shorter. Nice regex!

#5 | January 29th, 2012, 12:39 PM | TwainKor
Wow, thanks for the help!
I am going to have to mull this over to get a better understanding of what is happening (I still can't look at a regex without looking things up!), but I just wanted to give you a quick thanks for the speedy help.

#6 | January 29th, 2012, 04:57 PM | TwainKor
I tried your two regexes out in RegexBuddy (which I use, along with Google, to make sense of regexes). The first one came out okay, but the second one gave some errors. However, when I tried out your code and added a few test emails of my own, everything worked well. So RegexBuddy could be wrong.

Unfortunately though, I could not understand the second regex you provided (although I was able to understand the first).

Here's the second one for reference:
^([^@;/\\\\,\'])++@(?>(?1)+?\.)(?1)++$


My confusion begins at "(?1)" -- I'm not sure what this is doing. Same again with the second time the set of characters appears. (remember I am very inexperienced with regex!)

For the record, the first error reported is in the atomic group "(?>(?1)+?": the characters "(?" before the 1 (so in "(?1") are said to be "Erroneous characters - possibly incomplete regextoken or unescaped metacharacters".

- it gives the same error for the ) in "\.)"
- again for the characters "(?" in the second "(?1)"
- it's not an error, but for every 1 in the regex it says you are trying to match it literally (I don't know what the "(?1)" syntax means, but I am going to assume you are not trying to match 1 literally)
- same error for the ) in the second ")++"
- it also gives the error "Quantifiers must be preceded by a token that can be repeated." for the second "++" (at the end of the regex)

I don't know if any of this makes sense to you or if RegexBuddy just got it wrong, but I thought I would let you know so you could comment.

Thanks again!

#7 | January 29th, 2012, 06:19 PM | abareplace
RegexBuddy does not support the repeating expression (?1) that ragax used.

#8 | January 29th, 2012, 10:33 PM | ragax
Hi ABA, yes, isn't it cool? Thought you'd like that. Planning to start using those more and more.

Hi TwainKor,

ABA is right about RB. I love RB, but there are quite a few cool features of PHP regex that don't work in it. By the way, I also love ABA's own tool, ABA Search and Replace. It has a different focus: searching and replacing across multiple text files. Very powerful.

Quote:
My confusion begins at "(?1)"


Did you sort it out by looking at the link ABA sent you?
It just means "repeat the regex in the parentheses of group 1". A great way to make your expressions more compact and maintainable.
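
One point worth separating out: (?1) re-runs the pattern of group 1, whereas a backreference such as \1 matches the exact text that group 1 captured. A small sketch with a toy pattern (not the email expression) shows the difference; the variable names are illustrative.

```php
<?php
// (?1) applies the sub-pattern [a-z] again, so any two letters match.
$subroutine = preg_match('~^([a-z])(?1)$~', 'ab');

// \1 must repeat the captured text itself, so "ab" fails and "aa" passes.
$backrefDifferent = preg_match('~^([a-z])\1$~', 'ab');
$backrefSame      = preg_match('~^([a-z])\1$~', 'aa');

echo $subroutine, ' ', $backrefDifferent, ' ', $backrefSame, "\n"; // prints "1 0 1"
```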

As far as RB goes, that's really the only hiccup with this expression. When you remove the two (?1) patterns, RB calms down.

Let us know if you have other questions!

#9 | January 31st, 2012, 12:23 PM | TwainKor
Thanks ragax/abareplace. The link was helpful; I haven't used that feature before.

I have been looking at the regex for a few days now, and it has helped me improve other expressions I have had to write. One question I had, though, was about this sub-expression: "(?>(?1)+?\.)", in particular the +? part. In my original criteria, I needed to specify that there was something (other than the characters listed) between the "@" in an email and the "." before the TLD (so @(anything, but at least one thing).com). I recognize that the + is operating on the repeating expression "(?1)", but could you explain what the immediately following ? is doing? I understand that "?" usually makes things optional, but if I am interpreting this correctly, it is making the (?1)+ optional, which in turn makes the check for "([^@;/\\\\,\'])" that is supposed to take place after the @ optional. I assume I am just misinterpreting the role of the ? after the repeating expression inside the atomic group. Could you clear up what it is doing here?

Thanks again!

#10 | January 31st, 2012, 12:50 PM | ragax
Hi TwainKor,

A ? after a quantifier (such as + or *) is not an "optional" quantifier, but a "lazy" flag. It turns the quantifier lazy (quantifiers are greedy by default).
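
A minimal sketch of the difference, on a toy string rather than an email address (the variable names are illustrative):

```php
<?php
$text = '<a><b>';

// Greedy: .+ takes as much as it can, then backs off to the last >.
preg_match('~<.+>~', $text, $greedyMatch);

// Lazy: .+? takes as little as it can, stopping at the first >.
preg_match('~<.+?>~', $text, $lazyMatch);

echo $greedyMatch[0], ' ', $lazyMatch[0], "\n"; // prints "<a><b> <a>"
```

Both quantifiers still require at least one character; the ? changes only how much the engine tries to take first.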

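In this case, it's an oversight on my part, and you can remove it. It works, but it is not needed for ordinary addresses, and it slows down the match on the order of a millionth of a second, so a purist would not want it.

The reason is that the expression (?1) repeated by the + can safely be greedy here. Note that the character class in (?1) is negated, so it does match a period: a greedy (?1)+ first runs to the end of the string, then backtracks inside the atomic group until \. can match, which lands on the last period. The lazy version stops at the first period instead. Either way, an address such as joe@mama.com matches. The two differ only at the edges: because the atomic group cannot be re-entered once it has matched, the greedy form rejects an address that ends in a period (joe@mama.com.), while the lazy form accepts it.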
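
The two forms can be compared directly with a short sketch. The test addresses below are made up for illustration, and a trailing-period case is included because the negated class in group 1 does accept a period.

```php
<?php
$lazy   = '~^([^@;/\\\\,\'])++@(?>(?1)+?\.)(?1)++$~';
$greedy = '~^([^@;/\\\\,\'])++@(?>(?1)+\.)(?1)++$~';

$rows = array();
foreach (array('joe@mama.com', 'joe@mail.mama.com', 'joe@mama.com.') as $email) {
    // Each row holds [lazy result, greedy result] for one address.
    $rows[$email] = array(preg_match($lazy, $email), preg_match($greedy, $email));
}
print_r($rows);
```

The first two addresses should match under both forms. Only the one ending in a period should separate them, with the lazy form returning 1 and the greedy form 0.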

If you're interested in this topic, I encourage you to read this little piece of mine on greedy and lazy quantifiers.

I'm really pleased you asked this question because (i) you're really getting into the nuts and bolts of the regex, which is awesome, and (ii) you put your finger on something to improve!

Wishing you a beautiful day.

