July 27th, 2013, 03:39 AM
New to Regex - would need your kind help for a formula
I am new to regex programming and have run into a problem that requires regex knowledge to solve. I started learning it and stayed up all night, but in the end I decided to ask this forum.
I would need the formulas:
1. from: "Paris, France Europe"
2. from: "10.11.2013 - 16.11.2013 20.11.2013 - 26.11.2013"
3. from: Paris Triathlon Charity Association 20100 Paris Tel: +33 1 72720-0 · Fax: +33 1 72720-4709 email address website address scientific secretary Tel: +33 1 72720-1 Fax: +33 1 72720-4801 email address 2 website address 2
extract: "Paris Triathlon Charity Association"
extract: "20 100 Paris"
extract: "+33 1 72720-0"
extract: "+33 1 72720-4709"
extract: "email address"
extract: "website address"
extract: "+33 1 72720-1"
extract: "+33 1 72720-4801"
extract: "email address 2"
extract: "website address 2"
Thank you so much in advance for your very kind help,
July 27th, 2013, 04:37 AM
The first step to solving a problem is to define it. Unfortunately, it has become common to merely write down a bunch of examples and then wait for others to figure out the underlying requirements. This really doesn't help anyone. It's additional work for us, it increases the risk of misunderstandings, and it makes you passive and dependent on others.
You don't have to be a regex pro to define a concrete search pattern that would solve the problem. Just put it into English words.
Take the first task. Judging from the examples, the problem might be something like this:
"We have a bunch of words separated by commas and/or spaces. We want to extract the individual words."
If that's actually the problem, then a possible pattern for a word would be this:
"A (non-empty) sequence of characters different from spaces and commas."
The last step of translating this description into a regex is easy and just a matter of learning the syntax:
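A minimal sketch of that translation, assuming Python's re module (the thread does not say which language is in use) and a hypothetical helper named extract_words. The character class [^ ,] means "any character except a space or a comma", and + means "one or more":

```python
import re

# Assumed pattern: one or more characters that are neither a space nor a comma.
WORD = re.compile(r"[^ ,]+")

def extract_words(text):
    """Return every maximal run of non-space, non-comma characters."""
    return WORD.findall(text)

print(extract_words("Paris, France Europe"))
# prints ['Paris', 'France', 'Europe']
```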
Note that this is only a guess. It's impossible to derive the actual requirements from a bunch of examples. Maybe there are other unwanted characters, and maybe the correct approach would be to search for sequences of characters of the (French) alphabet.
In the second example, you're probably looking for dates in the format "dd.mm.yyyy". So what's the pattern (in English)?
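One possible answer in English would be "two digits, a dot, two digits, a dot, four digits". The sketch below turns that sentence into a pattern, again assuming Python's re module; extract_dates is a hypothetical helper name. It only checks the shape of the text, so something like "99.99.9999" would also match, and whether that matters depends on the real requirements:

```python
import re

# Assumed pattern: two digits, a dot, two digits, a dot, four digits.
# \b keeps a match from starting or ending inside a longer run of digits.
DATE = re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b")

def extract_dates(text):
    """Return all substrings shaped like dd.mm.yyyy (no calendar validation)."""
    return DATE.findall(text)

print(extract_dates("10.11.2013 - 16.11.2013 20.11.2013 - 26.11.2013"))
# prints ['10.11.2013', '16.11.2013', '20.11.2013', '26.11.2013']
```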
July 27th, 2013, 09:52 AM
I agree totally with Jacques. Please define what you need rather than just giving examples. In the case of the first two requirements, we can more or less figure out what you need (extracting words, extracting dates in a certain format), but it is completely impossible to answer the third one, because we have absolutely no idea what another line would look like. For example, if I gave you a regex matching the first group of 4 words, it would most probably not work on another record where the association name contains 3 or 5 words.
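To make the four-word point concrete, here is a small sketch in Python (an assumption, since the language was not stated). The helper first_four_words and the second record are invented purely for illustration; only the first record comes from the question:

```python
import re

# Hypothetical fixed-length pattern: exactly four space-separated words
# at the start of the record.
FOUR_WORDS = re.compile(r"^\S+ \S+ \S+ \S+")

def first_four_words(record):
    """Return the first four space-separated words, or None if there are fewer."""
    match = FOUR_WORDS.match(record)
    return match.group(0) if match else None

sample = "Paris Triathlon Charity Association 20100 Paris"
other = "Lyon Cycling Club 69000 Lyon"  # invented record with a three-word name

print(first_four_words(sample))  # prints Paris Triathlon Charity Association
print(first_four_words(other))   # prints Lyon Cycling Club 69000
```

The second result swallows the postal code, which is why a description of what separates the name from the rest of the record is needed before a reliable pattern can be written.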
Just an additional point: it would be good if you specified which language you are using regexes in, since that can sometimes lead to better solutions.
For example, while Jacques's proposal to match words is perfectly valid, in Perl I would probably do just the opposite and match the separators to split the string and load the words into an array:
my @words = split /[\s,]+/, "Paris, France Europe"; # the @words array now contains "Paris", "France", "Europe"
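For readers not using Perl, the same split-on-separators idea could look roughly like this in Python; split_words is a hypothetical helper name. Python's re.split returns empty strings when the text begins or ends with a separator, so the sketch filters those out:

```python
import re

def split_words(text):
    """Split on runs of whitespace and/or commas, dropping empty pieces."""
    return [word for word in re.split(r"[\s,]+", text) if word]

print(split_words("Paris, France Europe"))
# prints ['Paris', 'France', 'Europe']
```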