August 22nd, 2013, 05:43 PM
How to only grab one space?
suppose my string is:
"John Doe. Not a john. But a John. "
The quotes are added to indicate that the end of the string ends with 2 spaces.
I would like to match: John Doe
If I use:
"[a-zA-Z \.]+ "
it matches the whole string and not the part I want. I can't figure out how to make the match use only one space. I have tried making it lazy and tried lookarounds, with no success. How do I make the single space inside the brackets match only once? Any suggestions? Also, why does it match nothing if I place \. outside the brackets?
August 22nd, 2013, 06:16 PM
One space at a time is equivalent to "[a-zA-Z.]+ followed by zero or more (space and [a-zA-Z.]+)".
\. outside the brackets means a literal period. You'd at least have to have a literal period in your string for it to match.
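As a rough illustration of the two points above, here is a minimal sketch using Python's re module (the thread is about VBScript regular expressions, so treat this only as a check of the idea, not of that engine). It assumes the sample string has two spaces after every period; the names sample, one_space and needs_period are made up for the example.

```python
import re

# Assumption: every period in the sample is followed by two spaces.
sample = "John Doe.  Not a john.  But a John.  "

# One run of letters/periods, then zero or more groups of
# (exactly one space + another run of letters/periods).
one_space = re.compile(r"[a-zA-Z.]+(?: [a-zA-Z.]+)*")
first = one_space.search(sample).group(0)
print(repr(first))

# A "\." outside the brackets demands a literal period at that position.
needs_period = re.compile(r"[a-zA-Z ]+\.")
print(needs_period.search("John Doe"))   # input has no period
print(needs_period.search("John Doe."))  # input ends with a period
```

Because each repetition of the group allows only a single space before the next run, the match cannot cross a double space. Note that the period is part of the match here, since it is inside the character class.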
August 23rd, 2013, 10:43 AM
Now what happens if the string equals this:
"John Joseph David Doe. This is another sentence. And another. "
With my target string, it can contain an unknown number of names with single spaces between each. I still just want to capture in this case: "John Joseph David Doe". The attributes I know are that there are single spaces between each name, an unknown number of names, and the name is ended by a period and two spaces. If my regex is:
"[a-zA-Z ]+\. ", then it will capture the entire string. I need some way of telling it to just capture one space when it is inside names.
August 23rd, 2013, 11:11 AM
You need to start writing down a concrete specification. I mean, we can trial-and-error forever, and maybe we'll even come up with a regex that kinda sorta works. But you'll get a much better result much faster if you get clear about what exactly you're looking for.
What is a "name" for you? Judging from your examples, it's probably an alphabetic string starting with an uppercase letter:
"[A-Z][a-z]+"
Note that this is a gross oversimplification of human names. It probably excludes the majority of the world population, starting with the people who happen to have a hyphen or diacritic in their name. Whether you find that acceptable is up to you.
So now you want a sequence of one or more "names", separated by single spaces. And at the end, there's a period with two spaces:
"[A-Z][a-z]+(?: [A-Z][a-z]+)*\. "
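To see how this specification behaves, here is a small sketch in Python's re module (not the VBScript engine discussed in the thread). It assumes every sentence in the sample ends with a period plus two spaces, writes those two spaces as " {2}" so they stay visible, and adds a capturing group around the names. The variable names are made up for the example.

```python
import re

# Assumption: each sentence ends with a period followed by two spaces.
text = "John Joseph David Doe.  This is another sentence.  And another.  "

# Group 1: one or more capitalized words separated by single spaces.
# " {2}" spells out the two trailing spaces explicitly.
name_seq = re.compile(r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)\. {2}")

match = name_seq.search(text)
names = match.group(1) if match else None
print(names)
```

The later sentences are not picked up because their second word starts with a lowercase letter, so the "name" sequence never reaches a period.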
August 23rd, 2013, 11:34 AM
Here's the exact situation. I was trying to generalize and simplify and did a bad job. The target string is:
1819. Emily Gipson, m. Eddie Rudolph. He is an attorney. Three children.
The string ends in a carriage return after the period after the word children. This question relates to the second part of the string that I want, which starts with "m.". My regex is this:
"(, m\. )([a-zA-Z \'\.\(\)\-\u201c\u201d\x22]+)\. "
This captures: Eddie Rudolph. He is an attorney
I am interested in the match inside the second parentheses. I would like it to capture:
I don't know how to tell the regex to capture only a single space when it is inside the names; as written, it also takes in the double spaces b/c of the +.
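The over-capture can be reproduced outside VBScript. The sketch below uses Python's re module and assumes that two spaces follow each sentence-ending period and that the record ends with a carriage return; the two spaces in the pattern are written as " {2}". The names record and greedy are made up for the example.

```python
import re

# Assumptions: two spaces after each sentence-ending period,
# and a carriage return at the end of the record.
record = "1819. Emily Gipson, m. Eddie Rudolph.  He is an attorney.  Three children.\r"

# Same character class as in the post, with a greedy "+".
greedy = re.compile(r"(, m\. )([a-zA-Z '.()\-\u201c\u201d\x22]+)\. {2}")

captured = greedy.search(record).group(2)
print(repr(captured))
```

Since the class contains both the space and the period, the greedy + first runs to the end of the line and then backtracks only as far as the last period that is followed by two spaces.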
August 23rd, 2013, 04:31 PM
What you really need, if I understand you correctly, is to stop the capture at the first dot followed by two spaces. The simplest way to do this is to put a lazy quantifier on what comes before the dot. Something like this:
"(, m\. )([\w ]+?)\. "
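A minimal sketch of the lazy approach, again in Python's re module rather than VBScript. It assumes two spaces after each sentence-ending period, writes them as " {2}", and uses balanced groups so the name lands in group 2. The variable names are made up for the example.

```python
import re

# Assumption: two spaces follow each sentence-ending period.
record = "1819. Emily Gipson, m. Eddie Rudolph.  He is an attorney.  Three children.\r"

# Lazy quantifier on a class without the period.
lazy = re.compile(r"(, m\. )([\w ]+?)\. {2}")
spouse = lazy.search(record).group(2)

# Lazy quantifier on the broader class from the earlier post,
# which does include the period.
lazy_wide = re.compile(r"(, m\. )([a-zA-Z '.()\-\u201c\u201d\x22]+?)\. {2}")
spouse_wide = lazy_wide.search(record).group(2)

print(repr(spouse), repr(spouse_wide))
```

This only shows the behaviour in Python under the stated spacing assumption. It does not confirm what other regex flavors do with the same pattern.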
August 23rd, 2013, 05:07 PM
Yes, I have tried adding laziness to the +, but it still captures too much. I have just tried that regex with the target string in 2 websites that allow for regex testing, i.e. regexlib/retester and regextester.
The first one reported No Results. The second one reported a match of the entire string. The difference may be a result of different regex flavors. I am using Microsoft VBScript 5.5 Regular Expressions and it captures the entire string.