#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2013
    Posts
    1
    Rep Power
    0

    New to Regex - would need your kind help with a formula


    Hi,

    I am new to regex programming and I have run into a problem that requires knowledge of regex to solve. I started learning it and stayed awake all night, but in the end I decided to ask this forum.

    Problem:
    I need formulas (regexes) for the following:

    1. from: "Paris, France Europe"
    extract: "Paris"
    extract: "France"
    extract: "Europe"

    2. from: "10.11.2013 - 16.11.2013 20.11.2013 - 26.11.2013"
    extract: "10.11.2013"
    extract: "16.11.2013"
    extract: "20.11.2013"
    extract: "26.11.2013"

    3. from: Paris Triathlon Charity Association 20100 Paris Tel: +33 1 72720-0 Fax: +33 1 72720-4709 email address website address scientific secretary Tel: +33 1 72720-1 Fax: +33 1 72720-4801 email address 2 website address 2

    extract: "Paris Triathlon Charity Association"
    extract: "20 100 Paris"
    extract: "+33 1 72720-0"
    extract: "+33 1 72720-4709"
    extract: "email address"
    extract: "website address"
    extract: "+33 1 72720-1"
    extract: "+33 1 72720-4801"
    extract: "email address 2"
    extract: "website address 2"


    Thank you so much in advance for your very kind help,

    Sarah
#2
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,959
    Rep Power
    1014
    Hi,

    the first step to solving a problem is to define it. Unfortunately, it has become common to merely write down a bunch of examples and then wait for others to figure out the underlying requirements. This really doesn't help anyone: it's additional work for us, it increases the risk of misunderstandings, and it makes you passive and dependent on others.

    You don't have to be a regex pro to define a concrete search pattern that would solve the problem. Just put it into English words.

    Take the first task. Judging from the examples, the problem might be something like this:

    "We have a bunch of words separated by commas and/or spaces. We want to extract the individual words."

    If that's actually the problem, then a possible pattern for a word would be this:

    "A (non-empty) sequence of characters different from spaces and commas."

    The last step of translating this description into a regex is easy and just a matter of learning the syntax:

    Code:
    [^\s,]+
    Note that this is only a guess. It's impossible to derive the actual requirements from a bunch of examples. Maybe there are other unwanted characters, and maybe the correct approach would be to search for sequences of characters of the (French) alphabet.
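    As an illustration, here is a minimal Perl sketch (Perl only because it is the language used later in this thread; the variable names are made up) that applies this pattern to the first example. It relies on the same guess as above, namely that only spaces and commas separate the words.

```perl
use strict;
use warnings;

my $input = "Paris, France Europe";

# Match "a non-empty sequence of characters different from spaces and commas".
# In list context, m//g returns every match, in order.
my @words = $input =~ /[^\s,]+/g;

print join("|", @words), "\n";    # prints "Paris|France|Europe"
```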

    In the second example, you're probably looking for dates in the format "dd.mm.yyyy". So what's the pattern (in English)?
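    One plausible answer, assuming every date really is written as exactly "two digits, a dot, two digits, a dot, four digits", translates to \d{2}\.\d{2}\.\d{4}. A minimal Perl sketch under that assumption (it does not check that the day and month values are in a valid range):

```perl
use strict;
use warnings;

my $input = "10.11.2013 - 16.11.2013 20.11.2013 - 26.11.2013";

# Two digits, a literal dot, two digits, a literal dot, four digits.
# The \b anchors keep the pattern from matching inside a longer run of digits.
my @dates = $input =~ /\b\d{2}\.\d{2}\.\d{4}\b/g;

# prints one date per line: 10.11.2013, 16.11.2013, 20.11.2013, 26.11.2013
print "$_\n" for @dates;
```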
    The 6 worst sins of security | How to (properly) access a MySQL database with PHP

#3
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    831
    Rep Power
    496
    I totally agree with Jacques. Please define what you need rather than just giving examples. In the case of the first two requirements, we can more or less figure out what you need (extracting words, extracting dates in a certain format), but it is completely impossible to answer the third one, because we have absolutely no idea what another line would look like. For example, if I gave you a regex matching the first group of four words, it would most probably not work on another record where the association name contained three or five words.

    Just one more point: it would help if you specified which language you are using the regexes in, since that can sometimes lead to better solutions.

    For example, while Jacques's proposal to match the words is perfectly valid, in Perl I would probably do just the opposite and match the separators, splitting the string and loading the words into an array:

    Perl Code:
    my @words = split /[\s,]+/, "Paris, France Europe";  #  the @words array now contains "Paris", "France", "Europe"
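    To show the trade-off between matching the words and splitting on the separators, here is a small sketch that runs both on the same input. The helper names words_by_match and words_by_split and the second input string are made up for illustration. Both approaches agree on the example from the question, but they differ when the string starts with a separator: split keeps a leading empty field (trailing empty fields are dropped), while matching the words never produces empty strings.

```perl
use strict;
use warnings;

# Approach 1 (hypothetical helper): match the words, i.e. runs of non-space, non-comma characters.
sub words_by_match {
    my ($s) = @_;
    return $s =~ /[^\s,]+/g;    # list of all matches when called in list context
}

# Approach 2 (hypothetical helper): match the separators (spaces and/or commas) and split on them.
sub words_by_split {
    my ($s) = @_;
    return split /[\s,]+/, $s;
}

# On the example from the question, both approaches give the same three words.
my @m1 = words_by_match("Paris, France Europe");
my @s1 = words_by_split("Paris, France Europe");

# Made-up input that starts and ends with separators.
my $padded = ", Paris, France ";
my @m2 = words_by_match($padded);    # ("Paris", "France")
my @s2 = words_by_split($padded);    # ("", "Paris", "France"): leading empty field kept, trailing one dropped

print join("|", @m1), "\n";    # Paris|France|Europe
print join("|", @s1), "\n";    # Paris|France|Europe
print join("|", @m2), "\n";    # Paris|France
print join("|", @s2), "\n";    # |Paris|France
```

    Perl's special split ' ' form does skip leading whitespace, but it splits on whitespace only and would leave the commas attached to the words, so which approach is more convenient depends on what the real data looks like.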
