#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2013
    Posts
    3
    Rep Power
    0

    Simple regex issue - need solving


    Hey everyone,

    I am really new to regular expressions. I am trying to set up rules in a web analytics system (Adobe SiteCatalyst).

    So here is the dilemma:
    I use two types of tracking string for campaigns, and they have different lengths.

    Type 1
    email_promo_EN.promo.BTC__en_ENG_ENG_111222

    Type 2
    affiliate_landing_BTC_home_en_ENG_ENG

    What I want to accomplish is to split each string into the parts that are separated by "_" characters.

    So ideally they will look like this:
    Type 1
    $1 - email
    $2 - promo
    $3 - EN.promo.BTC
    $4 - *empty*
    $5 - en
    $6 - ENG
    $7 - ENG
    $8 - 111222

    Type 2
    $1 - affiliate
    $2 - landing
    $3 - BTC
    $4 - home
    $5 - en
    $6 - ENG
    $7 - ENG

    Problem 1
    Whatever I try, the system doesn't recognize the empty field in Type 1 (two consecutive underscores "__" are not treated as an empty field). I would be willing to insert a constant value, but it would be better if the original idea worked.

    Problem 2
    Since Type 2 is one part shorter, no regex I have tried works for both types.

    My ideas so far:

    1. Alternation
    a | b, so the string contains either 7 or 8 parts:

    ^([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)$|^([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)$

    Not working.

    2. Non-capturing group
    (?: ... ) - the idea was that the last part doesn't need to be captured, so it wouldn't matter whether there are 7 or 8 parts.

    ^([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)(?:\_([^\:]+))$

    Not working.
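
    One possible reading of why attempt 2 fails, sketched with Python's `re` module (an assumption: SiteCatalyst's regex engine may differ in details). The class `[^\:]` excludes colons, not underscores, so a group can swallow the separators; `+` needs at least one character, so the empty field between `__` can never match; and the trailing `(?:...)` has no `?` after it, so an 8th part is still required.

```python
import re

TYPE_1 = "email_promo_EN.promo.BTC__en_ENG_ENG_111222"
TYPE_2 = "affiliate_landing_BTC_home_en_ENG_ENG"

# Attempt 2 from the post, copied as written.
ATTEMPT_2 = re.compile(
    r"^([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)\_([^\:]+)(?:\_([^\:]+))$"
)

# Neither sample matches: Type 1 contains an empty field, which "+" cannot
# match, and Type 2 has only 6 underscores where the pattern demands 7.
print(ATTEMPT_2.match(TYPE_1))
print(ATTEMPT_2.match(TYPE_2))

# "[^\:]" happily matches an underscore, so it does not act as a field class.
print(re.fullmatch(r"[^\:]+", "a_b") is not None)

# "+" rejects an empty field, "*" accepts it.
print(re.fullmatch(r"[^_]+", "") is None)
print(re.fullmatch(r"[^_]*", "") is not None)
```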

    That's all I had.

    Can you guys help me out with this one? Any advice would be greatly appreciated!!

    Thanks

    Balint
#2
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    830
    Rep Power
    496
    Try this:

    Type 1
    email_promo_EN.promo.BTC__en_ENG_ENG_111222

    Code:
    ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)$
    Type 2
    affiliate_landing_BTC_home_en_ENG_ENG

    Code:
    ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)$
    (Not tested, I can't right now.)

    There are better (shorter) ways, but it depends on what your regex package actually implements.
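
    The post does not say which shorter ways are meant. One candidate, in environments that offer a plain string split instead of regex-only rules, is to split on the delimiter. The sketch below uses Python; `split_tracking_code` is a hypothetical helper name, and whether SiteCatalyst rules can do anything comparable is an assumption not confirmed in this thread.

```python
TYPE_1 = "email_promo_EN.promo.BTC__en_ENG_ENG_111222"
TYPE_2 = "affiliate_landing_BTC_home_en_ENG_ENG"


def split_tracking_code(code: str) -> list:
    """Split a tracking string on underscores, keeping empty fields."""
    # str.split with an explicit separator preserves empty strings, so the
    # "__" in Type 1 yields an empty fourth field rather than being collapsed.
    return code.split("_")


print(split_tracking_code(TYPE_1))
print(split_tracking_code(TYPE_2))
```

    With this approach the number of parts (7 or 8) does not need to be known in advance.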
#3
    Originally Posted by Laurent_R
    Try this:

    Type 1
    email_promo_EN.promo.BTC__en_ENG_ENG_111222

    Code:
    ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)$
    Type 2
    affiliate_landing_BTC_home_en_ENG_ENG

    Code:
    ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)$
    (Not tested, I can't right now.)

    There are better (shorter) ways, but it depends on what your regex package actually implements.

    Thanks for the response, Laurent.

    Unfortunately, neither of your suggestions works.

    Also, I probably haven't made myself 100% clear: I need a single regex that works with both types, that is, one that accepts both 7-part and 8-part strings.
#4
    Hmm, I miscounted the number of fields to be matched!

    This works for both your strings:

    Code:
    ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)(?:_([^_]*))?
    Example on the Perl debugger:

    Code:
      DB<4> print "$1 $2 $3 $4 $5 $6 $7 $8" if $d =~ /^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)(?:_([^_]*))?/
    email promo EN.promo.BTC  en ENG ENG 111222
      DB<5> print "$1 $2 $3 $4 $5 $6 $7 $8" if $c =~ /^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)(?:_([^_]*))?/
    affiliate landing BTC home en ENG ENG
      DB<6>
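
    For readers without Perl at hand, the same check can be sketched in Python. The pattern is copied from the post; `parse` and `STRICT` are hypothetical names added for illustration. The pattern works because `[^_]*` may match an empty field and the trailing `(?:...)?` makes the 8th part optional.

```python
import re

TYPE_1 = "email_promo_EN.promo.BTC__en_ENG_ENG_111222"
TYPE_2 = "affiliate_landing_BTC_home_en_ENG_ENG"

# Pattern from the post: seven required fields plus an optional eighth.
PATTERN = r"^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)(?:_([^_]*))?"

# Optional tightening, not part of the original post: anchor the end so that
# strings with more than 8 parts are rejected instead of silently truncated.
STRICT = PATTERN + "$"


def parse(code, pattern=PATTERN):
    """Return the 8 captured fields as a tuple, or None if there is no match."""
    m = re.match(pattern, code)
    return m.groups() if m else None


print(parse(TYPE_1))  # fourth field is '', eighth is '111222'
print(parse(TYPE_2))  # eighth field is None because the optional group is skipped
```

    Note that an unmatched optional group comes back as `None` in Python, while the Perl session above prints it as an empty string.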
#5
    Originally Posted by Laurent_R
    Hmm, I miscounted the number of fields to be matched!

    This works for both your strings:

    Code:
    ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)(?:_([^_]*))?
    Example on the Perl debugger:

    Code:
      DB<4> print "$1 $2 $3 $4 $5 $6 $7 $8" if $d =~ /^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)(?:_([^_]*))?/
    email promo EN.promo.BTC  en ENG ENG 111222
      DB<5> print "$1 $2 $3 $4 $5 $6 $7 $8" if $c =~ /^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)(?:_([^_]*))?/
    affiliate landing BTC home en ENG ENG
      DB<6>

    This is beautiful. It works like a charm.

    Thanks a lot.
    I guess the thread can be closed.
