#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2013
    Posts
    4
    Rep Power
    0

    How to only grab one space?


    Suppose my string is:
    "John Doe.  Not a john.  But a John.  "

    The quotes are added to indicate that the end of the string ends with 2 spaces.

    I would like to match: John Doe

    If I use:
    "[a-zA-Z \.]+ "
    it matches the whole string and not just the part I want. I can't figure out how to make the match use only one space at a time. I have tried making it lazy and tried lookarounds, but with no success. How do I make the single space inside the brackets match only once? Any suggestions? Also, why does it match nothing when I place \. outside the brackets?
  2. #2
  3. Transforming Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,141
    Rep Power
    9398
    One space at a time is equivalent to "[a-zA-Z.]+ followed by zero or more (space and [a-zA-Z.]+)".

    \. outside the brackets means a literal period. You'd at least have to have a literal period in your string for it to match.
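A minimal sketch of both points, written in Python's `re` module rather than the poster's regex flavor. The sample text assumes two spaces after each period, as the question describes, and the variable names are only illustrative.

```python
import re

# Assumed input: the sample from the question, with two spaces after each period.
text = "John Doe.  Not a john.  But a John.  "

# One run of allowed characters, then zero or more (single space + another run).
# Every space must be followed by a non-space character, so the match
# cannot continue across a double space.
single_spaced = re.compile(r"[a-zA-Z.]+(?: [a-zA-Z.]+)*")
first = single_spaced.search(text).group(0)
print(repr(first))  # 'John Doe.'

# Outside the brackets, \. demands a literal period at that position.
with_period = re.search(r"[a-zA-Z ]+\.", "John Doe.")
without_period = re.search(r"[a-zA-Z ]+\.", "John Doe")
print(with_period.group(0))  # John Doe.
print(without_period)        # None
```

Note that the period is still part of the first match because it is inside the character class; it has to move outside the brackets if it should not be captured.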
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2013
    Posts
    4
    Rep Power
    0
    Now what happens if the string equals this:

    "John Joseph David Doe.  This is another sentence.  And another.  "

    My target string can contain an unknown number of names with a single space between each. In this case I still just want to capture "John Joseph David Doe". What I know is that the names are separated by single spaces, that there is an unknown number of them, and that the name ends with a period and two spaces. If my regex is:
    "[a-zA-Z ]+\. ", then it will capture the entire string. I need some way of telling it to match only one space at a time inside the names.
  6. #4
  7. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,959
    Rep Power
    1014
    You need to start writing down a concrete specification. I mean, we can trial-and-error forever, and maybe we'll even come up with a regex that kinda sorta works. But you'll get a much better result much faster if you get clear about what exactly you're looking for.

    What is a "name" for you? Judging from your examples, it's probably an alphabetic string starting with an uppercase letter:

    Code:
    "[A-Z][a-z]+"
    Note that this is a gross oversimplification of human names. It probably excludes the majority of the world population, starting with the people who happen to have a hyphen or diacritic in their name. Whether you find that acceptable is up to you.

    So now you want a sequence of one or more "names", separated by single spaces. And at the end, there's a period followed by two spaces:

    Code:
    "[A-Z][a-z]+(?: [A-Z][a-z]+)*\.  "
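As a rough check of the name pattern from this post, here is a sketch in Python's `re` module (not the flavor the original poster uses). A capture group is added around the names so the trailing period and spaces are left out; `first_name_run` is a hypothetical helper name, and the sample strings assume two spaces after each period.

```python
import re

# The pattern from the post, with a capture group around the names only.
NAME_RUN = re.compile(r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)\.  ")

def first_name_run(text):
    """Return the first run of single-spaced names that ends in '.  ', or None."""
    match = NAME_RUN.search(text)
    return match.group(1) if match else None

# Assumed inputs: the two samples from the thread, with double spaces restored.
print(first_name_run("John Doe.  Not a john.  But a John.  "))
# John Doe
print(first_name_run("John Joseph David Doe.  This is another sentence.  And another.  "))
# John Joseph David Doe
print(first_name_run("no names here"))
# None
```

The second sample stops at "Doe" because "This" is followed by a lowercase word, which cannot continue the run of names, and is not followed directly by a period.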
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2013
    Posts
    4
    Rep Power
    0
    Here's the exact situation. I was trying to generalize and simplify, and did a bad job. The target string is:

    1819. Emily Gipson, m. Eddie Rudolph. He is an attorney. Three children.

    The string ends in a carriage return after the period after the word children. This question relates to the second part of the string that I want, which starts with "m.". My regex is this:

    "(, m\. )([a-zA-Z \'\.\(\)\-\u201c\u201d\x22]+)\. "

    This captures: Eddie Rudolph. He is an attorney

    I am interested in the match inside the second parentheses. I would like it to capture:

    Eddie Rudolph

    I don't know how to tell the regex to match only a single space inside the names; because of the +, it also consumes the double spaces.
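The behavior described here can be reproduced in Python's `re` module, which is not the flavor used in this thread, so treat this as a sketch. The first pattern is the one quoted above; the second is one possible rewrite that allows only single spaces between name tokens. It assumes the real data has two spaces after "Rudolph.", and the names `token`, `greedy`, and `single_spaced` are illustrative.

```python
import re

# The string as displayed in the post (single spaces after each period).
shown = "1819. Emily Gipson, m. Eddie Rudolph. He is an attorney. Three children."

# The regex from the post: the class contains both space and period,
# so the greedy + runs on and backtracks only to the LAST ". " it can find.
greedy = re.compile(r"(, m\. )([a-zA-Z \'\.\(\)\-\u201c\u201d\x22]+)\. ")
greedy_capture = greedy.search(shown).group(2)
print(greedy_capture)  # Eddie Rudolph. He is an attorney

# Assumption: the real data has TWO spaces after the sentence-ending period.
spaced = "1819. Emily Gipson, m. Eddie Rudolph.  He is an attorney.  Three children."

# Same characters as above minus the space: one name token.
token = r"[a-zA-Z'.()\-\u201c\u201d\x22]+"
# Tokens joined by exactly one space, then a period and two spaces.
single_spaced = re.compile(rf", m\. ({token}(?: {token})*)\.  ")
fixed_capture = single_spaced.search(spaced).group(1)
print(fixed_capture)  # Eddie Rudolph
```

Because a space is only allowed between two tokens, the capture can never extend across the double space that ends the name.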
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    836
    Rep Power
    496
    Hi,
    what you really need, if I understand you correctly, is to stop the capture at the first dot followed by two spaces. For this, the simplest approach is a lazy match of what comes before, followed by a dot and two spaces. Something like this:

    Code:
    "(, m\. )([\w ]+?)\.  "
  12. #7
  13. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2013
    Posts
    4
    Rep Power
    0
    Yes, I have tried adding laziness to the +, but it still captures too much. I have just tried that regex with the target string in 2 websites that allow for regex testing, i.e. regexlib/retester and regextester.

    The first one reported No Results. The second one reported a match of the entire string. The difference may be a result of different regex flavors. I am using Microsoft VBScript 5.5 Regular Expressions and it captures the entire string.
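For comparison, here is how the lazy approach behaves in Python's `re` module. This says nothing definite about VBScript 5.5, whose results may differ. The sketch assumes the target string really has two spaces after each sentence-ending period and writes the pattern with balanced parentheses so the name is its own capture group.

```python
import re

# Assumed input: the target string with two spaces after each period.
text = "1819. Emily Gipson, m. Eddie Rudolph.  He is an attorney.  Three children."

# Lazy: expand one character at a time and stop at the FIRST ".  " that fits.
lazy = re.compile(r", m\. ([\w .]+?)\.  ")
lazy_name = lazy.search(text).group(1)
print(lazy_name)  # Eddie Rudolph

# Greedy, same class: consume as much as possible, then back off to the
# LAST position where ".  " still matches.
greedy = re.compile(r", m\. ([\w .]+)\.  ")
greedy_name = greedy.search(text).group(1)
print(repr(greedy_name))  # 'Eddie Rudolph.  He is an attorney'
```

If the tested string contains only single spaces after the periods, a pattern ending in a period and two spaces finds no match at all in Python, so it is worth checking that the double spaces survive copying into an online tester.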
