#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2013
    Posts
    1
    Rep Power
    0

    Regular Expression (linux)


    Hello!
    This is the first time I have had to work with regular expressions, and I have been given the following task:

    Write a regular expression that matches the name of the Libyan president Moamer Gadhaphy in all of the following variants:

    - Muammar al-Kaddafi (BBC)
    - Moammar Gadhafi (Associated Press)
    - Muammar al-Qadhafi (Al-Jazeera)
    - Mu'ammar Al-Qadhafi (U.S. Department of State)
    - Moamer Gadafi (STA)

    To start, I decided to concentrate only on the first name, Moamer. I can see that the variants differ from each other by substitutions and insertions.

    To make things easier, I made a table:
    M U A M M A R
    M O A M M A R
    M U ' A M M A R
    M O A M E R

    M [U|O A|' M|A M|E A|M|R R] - I know this is not correct, but that is how I understand substitutions and insertions.
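For what it is worth, in regex syntax a substitution is usually written as a character class (`[uo]` means "u or o") or an alternation group (`(?:ma|e)` means "ma or e"; `?:` just makes the group non-capturing), and an insertion is written as an optional element (`'?` means "an apostrophe, or nothing"). A minimal Python sketch for the first name only, assuming the four spellings in the table are the only ones that need to match; the pattern name `FIRST_NAME` is made up for illustration:

```python
import re

# Hypothetical pattern for the first name only:
#   [uo]      substitution: "u" or "o" in the second position
#   '?        insertion: an optional apostrophe
#   (?:ma|e)  substitution of "ma" by "e" ("Moammar" vs "Moamer")
FIRST_NAME = re.compile(r"M[uo]'?am(?:ma|e)r")

variants = ["Muammar", "Moammar", "Mu'ammar", "Moamer"]
for name in variants:
    # fullmatch() requires the whole string to match, not just a prefix
    print(name, bool(FIRST_NAME.fullmatch(name)))
```

Each spelling in the table prints `True`; a string such as "Mammar" does not match because the `[uo]` position is mandatory.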

    I would be very grateful if you could send me some links with examples of this. I have been sitting in front of the computer for a while now, but the internet is full of information that isn't useful to me. What should I use?

    Thank you very much for your help!

    Have a nice day,
    Talijana.
  #2
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,002
    Rep Power
    9398
    Try building out the possibilities iteratively, like this (it is a lot easier on paper):
    Code:
    M  u  a  m  m  a  r  _  a  l  -  K  a  d  d  a  f  i
    Code:
        u  a  m  m  a  r  _  a  l  -  K  a  d  d  a  f  i
    M <
        o  a  m  m  a  r  _  G  a  d  h  a  f  i
    Code:
                                     K  a  d  d  a  f  i
      u  a  m  m  a  r  _  a  l  - <
     /                               Q  a  d  h  a  f  i
    M
     \
       o  a  m  m  a  r  _  G  a  d  h  a  f  i
    Code:
                                    K  a  d  d  a  f  i
        a  m  m  a  r  _  a  l  - <
       /                            Q  a  d  h  a  f  i
      u
     / \
    M   '  a  m  m  a  r  _  A  l  -  Q  a  d  h  a  f  i
     \
      o  a  m  m  a  r  _  G  a  d  h  a  f  i
    Code:
                                    K  a  d  d  a  f  i
        a  m  m  a  r  _  a  l  - <
       /                            Q  a  d  h  a  f  i
      u
     / \
    M   '  a  m  m  a  r  _  A  l  -  Q  a  d  h  a  f  i
     \
      \          m  a  r  _  G  a  d  h  a  f  i
       o  a  m <
                 e  r  _  G  a  d  a  f  i
    With the tree built out, you can look for optimizations, such as the fact that every name ends with "afi":
    Code:
                                    K  a  d  d
        a  m  m  a  r  _  a  l  - <            >
       /                            Q  a  d  h  \
      u                                          >
     / \                                        / \
    M   '  a  m  m  a  r  _  A  l  -  Q  a  d  h   > a  f  i
     \                                            /
      \             m   a   r  _  G   a   d   h  /
       o   a   m  <                             >
                    e    r    _    G     a    d
    and... well, that's it really.
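Read as a regex, each `<` fork in the tree becomes an alternation group, and the shared "afi" ending is factored out to the end. A minimal Python sketch of that translation; the `_` in the diagrams stands for a space, and the name `TREE` and the exact grouping are my own reading of the diagram, not something from the post:

```python
import re

# Each "<" fork in the tree becomes a (?:...|...) group; the common
# "afi" suffix is factored out to the end, as in the last diagram.
TREE = re.compile(
    r"M(?:"
    r"u(?:ammar al-(?:Kadd|Qadh)|'ammar Al-Qadh)"  # upper branch: Mu...
    r"|oam(?:mar Gadh|er Gad)"                    # lower branch: Moam...
    r")afi"
)

names = [
    "Muammar al-Kaddafi",
    "Moammar Gadhafi",
    "Muammar al-Qadhafi",
    "Mu'ammar Al-Qadhafi",
    "Moamer Gadafi",
]
for name in names:
    print(name, bool(TREE.fullmatch(name)))

# A mix of two spellings is not a path through the tree, so it is rejected:
print(bool(TREE.fullmatch("Moamer al-Kaddafi")))  # False
```

Because every path through the tree is one complete spelling, this form matches exactly the five listed names and nothing else.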
  #3
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2013
    Posts
    22
    Rep Power
    0
    This might be missing the point of the exercise, but couldn't you just keep it simple with something like this?
    Code:
    Muammar al-Kaddafi|Moammar Gadhafi|Muammar al-Qadhafi|Mu'ammar Al-Qadhafi|Moamer Gadafi
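In Python this kind of pattern is easy to build directly from the list of spellings; `re.escape` makes sure each spelling is treated as literal text, so any character with a special meaning in regex syntax cannot change the pattern. A minimal sketch, with a made-up sample sentence:

```python
import re

SPELLINGS = [
    "Muammar al-Kaddafi",
    "Moammar Gadhafi",
    "Muammar al-Qadhafi",
    "Mu'ammar Al-Qadhafi",
    "Moamer Gadafi",
]

# Join the escaped spellings into one alternation: A|B|C|...
PLAIN = re.compile("|".join(re.escape(s) for s in SPELLINGS))

# Hypothetical sample text, only for demonstration
text = "The BBC wrote Muammar al-Kaddafi, while STA wrote Moamer Gadafi."
print(PLAIN.findall(text))
# ['Muammar al-Kaddafi', 'Moamer Gadafi']
```

The trade-off is readability versus brevity: the list is trivially correct, but nothing is shared between the alternatives.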

    Comments on this post

    • requinix agrees : I think you are, but technically speaking...
  #4
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    830
    Rep Power
    496
    You could use this:

    Code:
    M[ou]'?am(ma|e)r (al-Kadd|Gadh?|[aA]l-Qadh)afi
    But this would also match "Moamer al-Kaddafi", which is not in your list (it is a mix of two of the possible names).
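A quick way to see both behaviours is to test the compact pattern against the five names and against the mixed spelling. This Python sketch assumes the `Gadh?` form of the surname group, where the `h` is optional; without the `?`, "Moamer Gadafi" would not match:

```python
import re

# Compact pattern: shared letters written once, differences expressed as
# a character class ([ou]), an optional apostrophe ('?) and groups.
COMPACT = re.compile(r"M[ou]'?am(ma|e)r (al-Kadd|Gadh?|[aA]l-Qadh)afi")

names = [
    "Muammar al-Kaddafi",
    "Moammar Gadhafi",
    "Muammar al-Qadhafi",
    "Mu'ammar Al-Qadhafi",
    "Moamer Gadafi",
]
print(all(COMPACT.fullmatch(n) for n in names))  # True

# The pieces combine freely, so a mixed spelling is accepted as well:
print(bool(COMPACT.fullmatch("Moamer al-Kaddafi")))  # True
```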

    If you don't want that, you'll need to use Acray's version or a slight improvement on it, such as:

    Code:
    M(uammar al-Kadd|oammar Gadh|u'?ammar [aA]l-Qadh|oamer Gad)afi
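Because each alternative in this version spells out one complete first name plus the start of the surname, the pieces can no longer be mixed. A Python sketch to check it; it assumes the group reads `oamer Gad` (the leading `M` is already outside the group) and `u'?ammar` (so that the State Department's "Mu'ammar" is covered):

```python
import re

# Strict pattern: one alternative per spelling, sharing only the
# leading "M" and the trailing "afi".
STRICT = re.compile(
    r"M(uammar al-Kadd|oammar Gadh|u'?ammar [aA]l-Qadh|oamer Gad)afi"
)

names = [
    "Muammar al-Kaddafi",
    "Moammar Gadhafi",
    "Muammar al-Qadhafi",
    "Mu'ammar Al-Qadhafi",
    "Moamer Gadafi",
]
print(all(STRICT.fullmatch(n) for n in names))  # True

# The mixed spelling accepted by the compact pattern is now rejected:
print(bool(STRICT.fullmatch("Moamer al-Kaddafi")))  # False
```

The `[aA]` and `'?` still allow small variations such as "Muammar Al-Qadhafi", but not mixes of surname and first-name spellings from different sources.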
