#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2012
    Posts
    6
    Rep Power
    0

    Which regex design is faster?


    Hi, I have a search app in which searching is done with regexes.
    Performance is very important, so I want to know which regex design is faster.

    Example:

    My key consists of a date and several attributes (date: 20120101; attr1: A or B; attr2: C or D; attr3: E or F; attr4: a letter code such as BDAKOO). I can design the key however I want.

    Approach 1 (every key has the same length)
    -20120101BDFBDAKOO
    -20120909ACFDOOOKA
    -20121224ADEAAAAAO

    The user should be able to search on any attribute (e.g. year = 2012, month = Dec, attr1 = B). Which regex is faster: REGEXP(^201212\d{2}B\S{1}\S{1}\S{6}) or
    REGEXP(^201212\d{2}B.*)? I would guess the second.
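A minimal sketch of how a fixed-width pattern like the first one could be generated from the search criteria. It uses Python's standard `re` module rather than RE2, and the function name `build_pattern`, the parameter names, and the field widths (8-digit date, three one-letter attributes, 6-letter code) are assumptions read off the example keys, not something the original post defines.

```python
import re

def build_pattern(year=None, month=None, day=None,
                  attr1=None, attr2=None, attr3=None, code=None):
    """Return an anchored regex string for the fixed-width key.

    Any field left as None becomes a wildcard of the same width.
    """
    parts = [
        year if year else r"\d{4}",
        month if month else r"\d{2}",
        day if day else r"\d{2}",
        attr1 if attr1 else r"[AB]",
        attr2 if attr2 else r"[CD]",
        attr3 if attr3 else r"[EF]",
        re.escape(code) if code else r"[A-Z]{6}",
    ]
    return "^" + "".join(parts) + "$"

# The three sample keys from approach 1.
keys = ["20120101BDFBDAKOO", "20120909ACFDOOOKA", "20121224ADEAAAAAO"]

# Search: year 2012, month December, attr1 = A.
pattern = re.compile(build_pattern(year="2012", month="12", attr1="A"))
matches = [k for k in keys if pattern.match(k)]
print(matches)
```

Because every field has a known width, each position in the key is constrained explicitly and no `.*` is needed.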

    Approach 2 (the key has marks: attr1 has mark 'a', attr2 has mark 'b', etc.)
    -20120101aBbDcFdBDAKOO
    -20120909aAbCcFdFDOOKA
    -20121224aAbDcEdAAAAAO

    I think generating a regex is simpler with this approach:
    REGEXP"201212.*aB.*";
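A rough sketch of regex generation for the marked key format, again using Python's `re` instead of RE2. The helper `build_marked_pattern` and its keyword arguments (one per mark letter) are hypothetical, and the sketch assumes marks are lowercase while values are uppercase, as in the sample keys.

```python
import re

def build_marked_pattern(date_prefix="", **attrs):
    """Build a regex for keys of the form <date>a<attr1>b<attr2>c<attr3>d<code>.

    attrs maps a mark letter ('a', 'b', 'c', 'd') to the required value.
    Marks are emitted in alphabetical order, which is the order they
    appear in the key.
    """
    parts = ["^" + re.escape(date_prefix)]
    for mark in sorted(attrs):
        parts.append(".*" + mark + re.escape(attrs[mark]))
    return "".join(parts)

# Two of the sample keys from approach 2.
marked_keys = ["20120101aBbDcFdBDAKOO", "20120909aAbCcFdFDOOKA"]

# Search: year 2012, attr1 = A, attr3 = F.
marked_pattern = re.compile(build_marked_pattern("2012", a="A", c="F"))
marked_matches = [k for k in marked_keys if marked_pattern.match(k)]
print(marked_matches)
```

The generator is shorter than a fixed-width one, but every `.*` leaves the engine more positions to consider than a fixed-width field does, so whether it is faster in RE2 would have to be measured.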

    Do you know which approach and which key design would be the fastest? The regex engine is RE2.

    Thank you guys
  2. #2
  3. Come play with me!
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    13,745
    Rep Power
    9397
    Why are you stuffing multiple pieces of data into a single field? As you've already noticed it makes searching a PITA. Making the key composite like that is fine but then you still don't search on it directly.
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2012
    Posts
    6
    Rep Power
    0
    Because I work with a very large amount of data (millions of entries), I cannot use a relational SQL database like MySQL. I have to use a NoSQL database like Google Bigtable, which is a big hash table of keys and values.
  6. #4
  7. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2012
    Posts
    6
    Rep Power
    0
    Does nobody have any advice?
  8. #5
  9. kill 9, $$;
    Devshed Supreme Being (6500+ posts)

    Join Date
    Sep 2001
    Location
    Shanghai, An tSín
    Posts
    6,897
    Rep Power
    3886
    Originally Posted by regexonimoo
    Because I work with a very large amount of data (millions of entries), I cannot use a relational SQL database like MySQL. I have to use a NoSQL database like Google Bigtable, which is a big hash table of keys and values.
    Even still, regular expressions aren't a particularly good way to perform searches. You should build appropriate indices instead (the structure of these will depend on exactly what type of searches you allow).
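One way to read "build appropriate indices" is an inverted index: for each (field, value) pair, store the set of keys that have it, then answer a query by intersecting those sets. The sketch below is in-memory Python and only illustrates the idea. The field layout in `parse_key` is assumed from the fixed-width sample keys, all function names are hypothetical, and in a key-value store each index entry would presumably live under its own row key.

```python
from collections import defaultdict

def parse_key(key):
    """Split a fixed-width key into its fields (layout assumed from the samples)."""
    return {
        "year": key[0:4], "month": key[4:6], "day": key[6:8],
        "attr1": key[8], "attr2": key[9], "attr3": key[10],
        "code": key[11:17],
    }

def build_index(keys):
    """Map each (field, value) pair to the set of keys containing it."""
    index = defaultdict(set)
    for key in keys:
        for field, value in parse_key(key).items():
            index[(field, value)].add(key)
    return index

def search(index, **criteria):
    """Return the keys that satisfy every field=value criterion."""
    sets = [index.get((field, value), set()) for field, value in criteria.items()]
    if not sets:
        return set()
    # Intersect starting from the smallest set to keep intermediate results small.
    sets.sort(key=len)
    return set.intersection(*sets)

index = build_index(["20120101BDFBDAKOO", "20120909ACFDOOOKA", "20121224ADEAAAAAO"])
print(search(index, year="2012", month="12", attr1="A"))
```

Each lookup is then a handful of set reads and an intersection, with no scan over all entries.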

    I haven't used RE2 personally so can't really answer your question. You might be best off testing out some benchmarks yourself.
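A benchmark along those lines might look like the sketch below. It times the two patterns from the first post with Python's `timeit` and the standard `re` module, which is a backtracking engine. RE2 is automaton-based, so these numbers would not carry over, and the same comparison would need to be rerun against RE2 itself. The fourth key is made up so that something actually matches.

```python
import re
import timeit

# Sample keys from the thread, plus one invented key that matches the query.
sample = ["20120101BDFBDAKOO", "20120909ACFDOOOKA",
          "20121224ADEAAAAAO", "20121224BDEAAAAAO"]
bench_keys = sample * 1000

strict = re.compile(r"^201212\d{2}B\S{1}\S{1}\S{6}")
loose = re.compile(r"^201212\d{2}B.*")

def count_matches(pattern):
    """Count how many keys the compiled pattern matches from the start."""
    return sum(1 for k in bench_keys if pattern.match(k))

# Run each scan several times and report the total wall-clock time.
strict_time = timeit.timeit(lambda: count_matches(strict), number=20)
loose_time = timeit.timeit(lambda: count_matches(loose), number=20)

print(f"strict: {strict_time:.4f}s  loose: {loose_time:.4f}s")
```

Both patterns select the same keys here, so the only thing being compared is matching cost.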
  10. #6
  11. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2012
    Posts
    6
    Rep Power
    0
    Thank you for your response!

    I concede your point about regexes, but can you give me an example?
    Every attribute should be searchable; there are millions of entries, and every entry consists of 7 attributes.
