#1
  1. Off-topic
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Nov 2003
    Location
    Geneva / Genève / Genf / Ginevra
    Posts
    1,632
    Rep Power
    1364

    Unique characters only in string


    Hi,

    Need help with a simple expression - it's driving me nuts though! Basically need to match words that have unique letters (although they can contain characters too) in them - i.e. "bent" would return a match, but "here" wouldn't.

    Any ideas? Tons of rep for the winner
  2. #2
  3. Sarcky
    Devshed Supreme Being (6500+ posts)

    Join Date
    Oct 2006
    Location
    Pennsylvania, USA
    Posts
    10,908
    Rep Power
    6351
    Hmm...I don't think it's possible to do this quickly. Plus, it's probably always going to be quicker to loop through the string one character at a time. In PHP:
    PHP Code:
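    function has_duplicate_letters( $string ) {
        $len = strlen($string);
        // compare each character with every character before it
        for ( $p = 1; $p < $len; $p++ ) {
            for ( $q = 0; $q < $p; $q++ ) {
                if ( $string[$p] == $string[$q] ) return true;
            }
        }
        // no pair of equal characters found
        return false;
    }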
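A shorter variant of the same check, as a rough sketch rather than a drop-in replacement: it assumes single-byte characters and relies on PHP's built-in count_chars(), whose mode 3 returns a string containing each distinct byte once. The function name has_duplicate_bytes is made up for this example.

```php
<?php
// Hypothetical helper, not from the thread: true if any byte occurs twice.
function has_duplicate_bytes($string) {
    // count_chars() in mode 3 returns every distinct byte exactly once,
    // so a shorter result means at least one byte was repeated.
    $distinct = count_chars($string, 3);
    return strlen($distinct) < strlen($string);
}

var_dump(has_duplicate_bytes("bent")); // bool(false)
var_dump(has_duplicate_bytes("here")); // bool(true)
```

This avoids the nested loop, but like the loop it works on bytes, so multibyte (e.g. UTF-8) text would need a different approach.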

    -Dan
    HEY! YOU! Read the New User Guide and Forum Rules

    "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin

    "The greatest tragedy of this changing society is that people who never knew what it was like before will simply assume that this is the way things are supposed to be." -2600 Magazine, Fall 2002

    Think we're being rude? Maybe you asked a bad question or you're a Help Vampire. Trying to argue intelligently? Please read this.
  4. #3
  5. No Profile Picture
    User 165270
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2005
    Posts
    497
    Rep Power
    937
    Originally Posted by riotx
    Hi,

    Need help with a simple expression - it's driving me nuts though! Basically need to match words that have unique letters (although they can contain characters too) in them - i.e. "bent" would return a match, but "here" wouldn't.
    This regex will do the trick:

    Code:
    ^(?:(.)(?!.*?\1))*$
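A minimal PHP sketch of how this pattern could be tried out, assuming PHP's preg_ functions. The pattern is the one from the post; the helper name all_chars_unique is only for illustration.

```php
<?php
// Hypothetical wrapper around the pattern posted above.
function all_chars_unique($text) {
    // (.) captures one character; (?!.*?\1) rejects it if the same
    // character appears anywhere later; the group repeats from ^ to $.
    return preg_match('/^(?:(.)(?!.*?\1))*$/', $text) === 1;
}

var_dump(all_chars_unique("bent")); // bool(true)
var_dump(all_chars_unique("here")); // bool(false)
```

Two caveats with this sketch: the empty string also matches, and without the s modifier the dot does not match a newline, so multi-line input would need extra care.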
  6. #4
  7. Sarcky
    That regexp may work in languages that support \1 references inside the match condition, which I don't normally use. However, it seems to only check that the first letter is unique, not all of them. Though without experience in that style of expression, I can't tell you.

    -Dan
  8. #5
  9. Transforming Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,126
    Rep Power
    9398
    You don't normally use backreferences?

    What Cavemann posted works - try it. Kinda complicated though.
    Code:
    (.)[^\1]*\1
    [edit] This expression tests if something has a repeated character. If you want to test for uniqueness then just negate the result you get. In PHP:
    PHP Code:
    if (!preg_match('/(.)[^\1]*\1/', $text)) {
        // each character in $text is unique
    }

    Last edited by requinix; February 4th, 2009 at 04:00 AM.
  10. #6
  11. No Profile Picture
    User 165270
    Originally Posted by requinix
    ...

    What Cavemann posted works - try it.
    Err, "Cavemann"?

    Originally Posted by requinix
    Kinda complicated though.
    Code:
    (.)[^\1]*\1
    Most PCRE regex engines I know don't accept back references inside character sets. May I ask how you tested yours (in what language and with what input)?
    Besides, if that had worked, you only seem to be checking if one character is repeated once, not what the OP is looking for (checking if all characters are unique).
    Last edited by prometheuzz; February 4th, 2009 at 03:07 AM.
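For anyone wanting to check how PCRE actually reads [^\1]: as far as I know, inside a character class \1 is not a back reference at all but an octal escape for the byte 0x01. The sketch below probes that assumption with PHP's preg_match(); the probe strings are invented for the test.

```php
<?php
// If [^\1] excluded the captured character, "aa" could not match here,
// because the second "a" would be forbidden. Under PCRE it does match.
$class_accepts_captured = preg_match('/^(.)[^\1]$/', "aa");

// "a", byte 0x01, "a": the letter repeats, but [^\1]* cannot step over
// the 0x01 byte, so the repeat goes undetected.
$probe = "a\x01a";
$with_class = preg_match('/(.)[^\1]*\1/', $probe);
$with_dot   = preg_match('/(.).*\1/', $probe);

var_dump($class_accepts_captured); // int(1)
var_dump($with_class);             // int(0)
var_dump($with_dot);               // int(1)
```

So for ordinary text the pattern behaves like (.).*\1, which is why it still finds repeated characters.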
  12. #7
  13. No Profile Picture
    User 165270
    Originally Posted by ManiacDan
    That regexp may work in languages that support \1 references inside the match condition, which I don't normally use.
    I see you are familiar with PHP, which supports back references. Note that (nearly) all PCRE regex engines (like PHP's preg-functions) support them.

    Originally Posted by ManiacDan
    However, it seems to only check that the first letter is unique, not all of them.
    No, that is not correct.

    Originally Posted by ManiacDan
    Though without experience in that style of expression, I can't tell you.

    -Dan
    No offence, but before commenting on something you don't fully understand, perhaps you should first try it?
  14. #8
  15. Transforming Moderator
    Originally Posted by prometheuzz
    Err, "Cavemann"?
    Sometimes when I refer to people I call them by a different name. For fun. No bad feelings.

    The name Prometheus reminds me of an old clay animation called Prometheus and Bob. While the caveman is actually Bob (the alien is Prometheus) for some reason I remember it the other way around.
    Since you repeated the last character in your name I repeated the last in mine too.

    Thus "prometheuzz" -> "Cavemann" (capitalized because it's a name)

    [edit] Yeah, I'll admit that was a bit of a stretch. Most of the time it's more obvious (like E-Oreo becoming just Oreo). [/edit]

    Originally Posted by Cavemann
    Most PCRE regex engines I know don't accept back references inside character sets. May I ask how you tested yours (in what language and with what input)?
    I tested with PHP's preg_ functions (PHP 5.2.8, PCRE 7.8). If I had Perl I would have tried that.
    PHP Code:
    $words = array(
        "there",
        "foo",
        "bar",
        "was not"
    );

    foreach ($words as $w) {
        echo "$w: ";
        var_dump(preg_match('/(.)[^\1]*\1/', $w));
    }

    Originally Posted by Cavemann
    Besides, if that had worked, you only seem to be checking if one character is repeated once, not what the OP is looking for (checking if all characters are unique).
    Right. It checks for repeated characters. If it fails this test then all characters are unique.

    I have a "check if it's invalid" mentality (as opposed to "check if it's valid") and considering how OP asked for something that does the exact opposite I probably should have mentioned that the result of my regex should be inverted.
    (That, and I don't like using lookaheads or lookbehinds if I don't need to.)
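The "look for a repeat, then invert" idea can be sketched as a small helper. This is an illustration only: the name has_repeated_char is hypothetical, and the pattern uses .* where the post has [^\1]*, on the assumption that PCRE reads \1 inside a character class as an octal escape rather than a back reference.

```php
<?php
// Hypothetical helper: true if any character occurs more than once.
function has_repeated_char($text) {
    // (.) captures a character, .* skips ahead, \1 demands the same one again.
    return preg_match('/(.).*\1/', $text) === 1;
}

foreach (array("there", "foo", "bar", "was not") as $w) {
    // Uniqueness is simply the negation of "a repeat was found".
    echo $w, ": ", has_repeated_char($w) ? "repeat found" : "all unique", "\n";
}
```

With these four words, "there" and "foo" report a repeat, while "bar" and "was not" come out as all unique.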

    PS: In that other thread, when I said "{ and } are special characters" I was simplifying. They're just as special as . * and ? (that is, most of the time but not always).
    Last edited by requinix; February 4th, 2009 at 04:07 AM. Reason: various small edits, couple bigger ones
  16. #9
  17. No Profile Picture
    User 165270
    Originally Posted by requinix
    Sometimes when I refer to people I call them by a different name. For fun. No bad feelings.

    The name Prometheus reminds me of an old clay animation called Prometheus and Bob. While the caveman is actually Bob (the alien is Prometheus) for some reason I remember it the other way around.
    Since you repeated the last character in your name I repeated the last in mine too.

    Thus "prometheuzz" -> "Cavemann" (capitalized because it's a name)

    [edit] Yeah, I'll admit that was a bit of a stretch. Most of the time it's more obvious (like E-Oreo becoming just Oreo). [/edit]
    Ah, I noticed the double "n" in "Cavemann", but didn't know the animation. Thanks for the link. ; )

    Originally Posted by requinix
    I tested with PHP's preg_ functions (PHP 5.2.8, PCRE 7.8). If I had Perl I would have tried that.
    Hmm, Java's java.util.regex package (which is largely PCRE-compatible) does not support them. I would have guessed PHP's preg-functions wouldn't either, but apparently they do!

    Originally Posted by requinix
    PHP Code:
    $words = array(
        "there",
        "foo",
        "bar",
        "was not"
    );

    foreach ($words as $w) {
        echo "$w: ";
        var_dump(preg_match('/(.)[^\1]*\1/', $w));
    }

    Right. It checks for repeated characters. If it fails this test then all characters are unique.

    I have a "check if it's invalid" mentality (as opposed to "check if it's valid") and considering how OP asked for something that does the exact opposite I probably should have mentioned that the result of my regex should be inverted.
    (That, and I don't like using lookaheads or lookbehinds if I don't need to.)
    Since your "raw" regex pattern only matched a single character, I couldn't see it working. But it seems that the if() statement in PHP does a bit more than I would think (I know very little PHP...).
    Anyway, thank you for your clarification on the nickname and your example!
    ; )
    Last edited by prometheuzz; February 4th, 2009 at 04:13 AM. Reason: Sticky fingers!
