Page 1 of 2
    #1
  1. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2011
    Posts
    9
    Rep Power
    0

    .Net Regex, looking to make efficient


    I'm looking for someone who considers themselves something of an expert with .NET regex to look over my regular expressions and make sure they are as efficient as possible, for the sake of speed.

    I have no need to learn regex beyond these 16 expressions any time soon. I've done the best I can, but I want to be as sure as possible that they are efficient. If it's fine to post them here for everyone to look at, I'll do that; or if someone would prefer that I email them, please let me know.

    Thanks for any help.
  2. #2
  3. Sarcky
    Devshed Supreme Being (6500+ posts)

    Join Date
    Oct 2006
    Location
    Pennsylvania, USA
    Posts
    10,908
    Rep Power
    6351
    Post them and we'll take a look. If you'd like to hire someone to actually go through them, make a thread in the "hire a programmer" forum.
    HEY! YOU! Read the New User Guide and Forum Rules

    "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin

    "The greatest tragedy of this changing society is that people who never knew what it was like before will simply assume that this is the way things are supposed to be." -2600 Magazine, Fall 2002

    Think we're being rude? Maybe you asked a bad question or you're a Help Vampire. Trying to argue intelligently? Please read this.
  4. #3
  5. Registered User
    Devshed Newbie (0 - 499 posts)
    Thanks. I'll start with some of the easier ones. None of these are hard; I just have yet to fully grasp the ins and outs of regex efficiency.

    All of these only find matches; none of them replace anything.



    This one is not correct, because I am checking for a space before the operator. It is only supposed to find exactly what is listed below; if an extra = is thrown in, it should not match at all.

    Finds: >=, <=, == and =
    Regex: @"\s<=|\s>=|\s!=|\s={2}"
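Running the original pattern against a sample line shows both problems at once. Each alternative consumes the leading space, and ` ===` still yields a match for its first two characters. The sketch below uses Python's `re` module as a stand-in for .NET's `Regex`; these constructs behave the same in both engines. The sample line and the stricter variant are illustrative assumptions, not code from the thread.

```python
import re

# The original pattern: every alternative consumes the whitespace before it.
original = re.compile(r"\s<=|\s>=|\s!=|\s={2}")

# Illustrative stricter variant: the lookbehind requires (without consuming)
# a whitespace character before the operator, and the negative lookahead
# rejects a trailing '=' so that '===' is not partially matched.
strict = re.compile(r"(?<=\s)[<>!=]=(?!=)")

sample = "if a <= b and c != d or e == f or g === h"

original_hits = original.findall(sample)
strict_hits = strict.findall(sample)

print(original_hits)  # [' <=', ' !=', ' ==', ' ==']
print(strict_hits)    # ['<=', '!=', '==']
```

If a lone `=` should also be found, the first character can be made optional (`[<>!=]?=`). The lookarounds still reject `===`.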



    For this one, I'm sure there is a better way.

    Finds: +, -, ++, --, /, *
    Regex: @"(\+|\-|\/|\*|\%|\+\+|\-\-)"
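The order of the alternatives matters here. A backtracking engine tries alternatives left to right and keeps the first one that succeeds at a given position. So `\+` wins before `\+\+` is ever tried, and `++` comes back as two separate `+` matches. Below is a minimal Python sketch; Python's `re` follows the same ordered-alternation rule as .NET, and the sample string is made up.

```python
import re

sample = "i++ + j-- - k / 2 * 3 % 4"

# Original alternation: single-character alternatives come first,
# so '++' and '--' are never returned as one token.
original = re.compile(r"(\+|\-|\/|\*|\%|\+\+|\-\-)")

# Reordered: a greedy optional second character, with the remaining
# single-character operators folded into one character class.
reordered = re.compile(r"\+\+?|--?|[/*%]")

original_hits = original.findall(sample)
reordered_hits = reordered.findall(sample)

print(original_hits)   # ['+', '+', '+', '-', '-', '-', '/', '*', '%']
print(reordered_hits)  # ['++', '+', '--', '-', '/', '*', '%']
```

The sketch keeps `%` because the original pattern includes it, even though the "Finds" list does not.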




    I have a few more that I will post when I get home, but I want to see how bad these are before I get in too deep.
  6. #4
  7. Sarcky
    Devshed Supreme Being (6500+ posts)
    In general, simple find/replace tasks like this are better suited to plain string functions. Regex is designed for variable patterns, like "some number of special characters separated by an unknown amount of whitespace, two at a time," not "the word 'dog'." You have a variety of fixed "words" here; use string search/replacement for these, and maybe regex for the more complex ones.

    -Dan
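For a fixed token like `==`, the plain string search suggested here needs no regex at all. What follows is a minimal Python sketch. The helper name `find_all` is hypothetical; in .NET the same loop could be written with `String.IndexOf` using `StringComparison.Ordinal`.

```python
def find_all(text, token):
    """Return the start index of each non-overlapping occurrence of token."""
    if not token:
        raise ValueError("token must be non-empty")
    positions = []
    start = text.find(token)
    while start != -1:
        positions.append(start)
        # Resume after this occurrence so the results never overlap.
        start = text.find(token, start + len(token))
    return positions


print(find_all("a == b == c", "=="))  # [2, 7]
print(find_all("x === y", "=="))      # [2]
```

The second call shows the limit of plain searching: `==` is also found inside `===`. Tokens with boundary rules still need extra checks.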
  8. #5
  9. Registered User
    Devshed Newbie (0 - 499 posts)
    Well, they do get more complex; I am just running everything through one system to keep it all in one place.

    I do not have the code here at work, but an example is below. The letter "a" is just a placeholder for whatever word characters may be present.

    It would find:

    $, $a, $a:, $a::, $a::a, $a::a:, $a::a::, and so on forever

    However, it would not find the following:

    a::a, a:a, a:::, $:, $a:::, $a[, and so on

    I still feel like my two earlier examples could be written more efficiently, but it's ones like this that stump me or leave me with something I know for a fact is not good.
  10. #6
  11. Transforming Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,126
    Rep Power
    9398
    Well, for the two examples I would use
    Code:
    @"\s[<>!=]?="
    @"++?|--?|[/*]"
    In terms of sheer efficiency, the first one might do better with
    Code:
    @"=(?<=\s([<>!=]?=))"
    (Note that this will only work with .NET and a few other engines, not Perl or PHP.)
    Which one is faster probably depends on the text (especially the number of spaces versus the number of equals signs).
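Running the first suggestion against a sample line shows what it does and does not cover. Because the character class is optional, it also accepts a lone `=`. Because nothing follows the final `=`, it still matches the first two characters of `===`. Below is a sketch in Python's `re` with a made-up sample. The lookbehind variant is not shown, because Python's `re` only accepts fixed-width lookbehind.

```python
import re

# Optional comparison character, then '=', after one whitespace character.
pattern = re.compile(r"\s[<>!=]?=")

sample = "a == b; c <= d; e = f; g === h"
hits = pattern.findall(sample)
print(hits)  # [' ==', ' <=', ' =', ' ==']
```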
  12. #7
  13. Registered User
    Devshed Newbie (0 - 499 posts)
    For the first one, @"\+\+?|--?|[/*]" works well. I could not get the last one you posted to work in .NET; expressions containing "?" never work well for me in .NET.

    One problem that has always persisted for me, though, is getting it to match just the "word" when that word is made of these "special" characters.

    For instance, "==" is legal, while "===" and "====" are illegal; likewise, "a==" is illegal and "a ==" is legal. Although I am also inserting spaces automatically, I would rather not search for "\s" if I can help it. Is there some magic word-boundary search I am missing, other than "\w" or "\b", since those do not work for characters like "+=-/*$!@%^&*()"?
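`\b` and `\B` only look at whether the characters on either side are word characters (`\w`). So between a space and `=` there is never a word boundary. What the question describes is a whitespace boundary, which can be written with lookarounds. `(?<!\S)` means "at the start of the text or after whitespace", and `(?!\S)` means "at the end of the text or before whitespace". Neither one consumes a character. Below is a Python sketch; the same two lookarounds are valid .NET syntax, and the sample strings are illustrative.

```python
import re

# \b needs a word character on exactly one side, so it never sits
# between a space and '='.
word_boundary = re.compile(r"\b==\b")

# Whitespace "boundaries" built from lookarounds: not preceded by and not
# followed by a non-whitespace character (start and end of text count too).
ws_boundary = re.compile(r"(?<!\S)==(?!\S)")

samples = ["a == b", "== b", "a==", "a==b", "a === b"]
results = {
    text: (bool(word_boundary.search(text)), bool(ws_boundary.search(text)))
    for text in samples
}
for text, (by_word, by_space) in results.items():
    print(repr(text), "\\b:", by_word, "whitespace:", by_space)
```

Note that `\b==\b` matches `a==b`, which is the opposite of what is wanted. The lookaround version accepts only the first two samples.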
  14. #8
  15. Transforming Moderator
    Devshed Supreme Being (6500+ posts)
    Yeah, I totally forgot to escape the pluses.
    Code:
    @"\+\+?|--?|[/*]"
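Forgetting to escape a metacharacter is easy when the tokens are operators. One way to avoid it, sketched here in Python, is to build the alternation from a list of literal tokens. `re.escape` escapes each one, and sorting longest-first keeps `++` ahead of `+`. The token list is an assumption taken from the "Finds" line earlier in the thread. .NET's counterpart to `re.escape` is `Regex.Escape`, and .NET's `Regex` constructor throws an `ArgumentException` for the unescaped form.

```python
import re

# Literal operator tokens to find (assumed list; extend as needed).
tokens = ["+", "-", "++", "--", "/", "*"]

# Escape every token and put longer tokens first, because the engine takes
# the first alternative that matches at a given position.
alternation = "|".join(re.escape(t) for t in sorted(tokens, key=len, reverse=True))
operator = re.compile(alternation)

print(alternation)
print(operator.findall("i++ + j-- / k * 2"))  # ['++', '+', '--', '/', '*']

# The unescaped form is rejected outright: '+' has nothing to repeat.
try:
    re.compile(r"++?")
    unescaped_rejected = False
except re.error:
    unescaped_rejected = True
print(unescaped_rejected)  # True
```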
  16. #9
  17. Registered User
    Devshed Newbie (0 - 499 posts)
    So here is the next one that I am working on. Right now it is completely wrong, but I'm trying.


    Finds:
    $, $a, $a:, $a::, $a::a, $a::a:, $a::a::, and so on forever

    However, it would not find the following:
    a::a, a:a, a:::, $:, $a:::, $a[, and so on

    Regex:
    @"(\$\w+\:\:\w+)|(\$*\w+\:\:?)|(\$\w+)|(\$)"

    I know this is wrong because it will only ever find at most one $a::a; it fails at $a::a: because it is matching quite literally.

    Is it possible to take a word/special-character combination and have it repeat endlessly?
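Yes: a quantifier applied to a group repeats the whole group. In the minimal Python sketch below, `(?:\w+::)*` repeats the "word followed by `::`" unit any number of times. The same syntax works in .NET, and `(?:...)` simply groups without capturing. The sample names are made up.

```python
import re

# "(?:\w+::)" is one unit: a word followed by '::'; '*' repeats the unit.
repeated = re.compile(r"\$(?:\w+::)*")

candidates = ["$", "$a::", "$a::b::", "$abc::de::f::", "$a:"]
results = {c: repeated.fullmatch(c) is not None for c in candidates}
print(results)
# {'$': True, '$a::': True, '$a::b::': True, '$abc::de::f::': True, '$a:': False}

# Spelling the shape out literally only ever covers that one shape.
literal_only = re.fullmatch(r"\$\w+::\w+", "$a::b::c") is not None
print(literal_only)  # False
```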
  18. #10
  19. Registered User
    Devshed Newbie (0 - 499 posts)
    OK, here is my new regex for today for the match above.

    \$((?:(\w+?):?:?)?)+

    This seems to work, but I am curious how efficient it is with so many "?"s. Any thoughts?
  20. #11
  21. Registered User
    Devshed Newbie (0 - 499 posts)
    OK, one minor change and one problem that I have found. There might end up being more, but I have yet to find any others.

    @"\B\$((?:(\w+?):?:?)?)+"

    I needed to add the \B to prevent "$a::a$".

    However, "$$$$$" and "$a::$$$$$" still match, and these should be illegal in my case.

    If there were an anchor for "any character except \n, \r or \s", I could do all of these easily; however, I do not think that anchor exists.
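Listing every match the pattern returns makes both problems visible. Nothing stops the pattern at the end of a token, so it returns the valid-looking prefix of an invalid token. Also, `\B` is satisfied between two `$` signs (neither is a word character), so each `$` in a run matches on its own. On the efficiency question, the `?`s themselves are cheap. The more likely cost is that an optional, lazily quantified group nested inside `+` lets the engine split the same word in many ways, which can become expensive if something after it forces backtracking. Below is a Python check; these constructs have the same semantics in .NET, and the sample text is illustrative.

```python
import re

pattern = re.compile(r"\B\$((?:(\w+?):?:?)?)+")
sample = "$a::a $a::: $$$ $a::$$"

# group(0) is the whole match; findall would return the capture groups instead.
matches = [m.group(0) for m in pattern.finditer(sample)]
print(matches)
# ['$a::a', '$a::', '$', '$', '$', '$a::', '$', '$']
```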
  22. #12
  23. Transforming Moderator
    Devshed Supreme Being (6500+ posts)
    The other way to think of the pattern is
    '$', followed by any number of "a::", followed by maybe an 'a' and then maybe a ':' and then maybe another ':'
    Code:
    \$(\w+::)*(\w+:?:?)?
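The "any number of `a::`, then maybe an `a`, maybe `:`, maybe `:`" description can be checked in Python. Here it is written as `\$(?:\w+::)*(?:\w+:?:?)?`, with the word characters inside the optional group so that the trailing colons are still reachable. Because nothing follows the pattern, it still returns the prefix `$a::` out of the invalid `$a:::`. That is the issue raised in the next reply. The sample text is illustrative.

```python
import re

variable = re.compile(r"\$(?:\w+::)*(?:\w+:?:?)?")

sample = "$ $a $a: $a:: $a::a $a::a: $a::a:: $a:::"
found = variable.findall(sample)
print(found)
# ['$', '$a', '$a:', '$a::', '$a::a', '$a::a:', '$a::a::', '$a::']
```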
  24. #13
  25. Registered User
    Devshed Newbie (0 - 499 posts)
    OK, I see what you are saying; I have been looking at things too literally.

    I'm still having one problem, though. Thank you for all of your guidance so far; it has helped a lot.

    The problem with both your regex and mine is that after an illegal character breaks the find, it starts picking up again at the next $. I do not necessarily want to check for a \s, as the token can appear at the beginning of a line. Checking for [\r\n\s\t\b] first works on all but the very first line, but causes a very noticeable 1-second delay.

    \b and \w work for word characters; surely there is something that covers special characters the same way. I assumed it would be \B or \W, but no luck with those either.
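A leading `[\r\n\s\t\b]` misses the very first token because a character class has to consume a real character, and there is none before the start of the text. Inside a class, `\b` also means backspace rather than a word boundary (in .NET and Python alike), and `\r`, `\n` and `\t` are already covered by `\s`. A zero-width check avoids the problem. Below is a Python sketch with a made-up two-line sample; the token pattern is reduced to `\$\w+` to keep the focus on the start condition.

```python
import re

text = "$first = 1\n$second = 2"

# Consuming a whitespace character before the token fails at the start of the text.
consuming = re.compile(r"\s(\$\w+)")

# Zero-width check: at the start of the text, or right after whitespace.
zero_width = re.compile(r"(?<!\S)\$\w+")

consuming_hits = consuming.findall(text)
zero_width_hits = zero_width.findall(text)

print(consuming_hits)   # ['$second']
print(zero_width_hits)  # ['$first', '$second']
```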
  26. #14
  27. Transforming Moderator
    Devshed Supreme Being (6500+ posts)
    Originally Posted by rich7424
    The problem is that with both yours and my regex is that after an illegal character breaks the find, it starts picking up again with the next $.
    And that's a problem? So what are the criteria for when the match is valid? At the beginning of the line or after whitespace? Prepend a
    Code:
    (?<=^|\s)
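Putting the pieces together gives the Python sketch below. The `(?<=^|\s)` suggestion is written as `(?<!\S)` because Python's `re` rejects a lookbehind whose alternatives have different widths; in .NET either spelling works. A matching `(?!\S)` at the end makes an invalid token fail as a whole instead of returning a valid-looking prefix. That end check is an addition not proposed in the thread. The sample simply lists the valid and invalid examples given earlier.

```python
import re

# Token must start at the beginning of the text or after whitespace,
# and end at the end of the text or before whitespace.
variable = re.compile(r"(?<!\S)\$(?:\w+::)*(?:\w+:{0,2})?(?!\S)")

valid = ["$", "$a", "$a:", "$a::", "$a::a", "$a::a:", "$a::a::"]
invalid = ["a::a", "a:a", "a:::", "$:", "$a:::", "$a[", "$$$$$", "$a::$$$$$"]

sample = " ".join(valid + invalid)
found = variable.findall(sample)
print(found)  # ['$', '$a', '$a:', '$a::', '$a::a', '$a::a:', '$a::a::']
```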
  28. #15
  29. Registered User
    Devshed Newbie (0 - 499 posts)
    Thanks so much. I'm actually starting to understand how to write these a bit more efficiently now.

    May I ask what exactly "?<=" means, though? I couldn't find much on Google, and it's things like that in examples that I have no clue about.

    Thanks again for all the help.
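`(?<=...)` is a lookbehind. It checks that the text just before the current position matches `...` without including that text in the match. It is one of four lookaround forms, all zero-width. Below is a small Python illustration; the syntax is the same in .NET, apart from Python's fixed-width restriction on lookbehind. The sample string is made up.

```python
import re

text = "x=1 $cost=5 $tax=2"

# Positive lookahead (?=...): a word that is followed by '='.
followed_by_eq = re.findall(r"\w+(?==)", text)
# Negative lookahead (?!...): a digit that is not followed by a space.
digit_not_before_space = re.findall(r"\d(?! )", text)
# Positive lookbehind (?<=...): a word that is preceded by '$'.
after_dollar = re.findall(r"(?<=\$)\w+", text)
# Negative lookbehind (?<!...): a word not preceded by '$'
# (\b stops it from matching the tail of '$cost', such as 'ost').
not_after_dollar = re.findall(r"(?<!\$)\b\w+", text)

print(followed_by_eq)          # ['x', 'cost', 'tax']
print(digit_not_before_space)  # ['2']
print(after_dollar)            # ['cost', 'tax']
print(not_after_dollar)        # ['x', '1', '5', '2']
```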