#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2009
    Posts
    49
    Rep Power
    0

    Question Database Search Structure


    I have a database with over 400 rows, which grows by about three rows weekly. I need to search four columns in that database with a partial text search, which rules out fulltext.

    So, in regards to MySQL and PHP overhead, which of the following would be faster, or if you have a better structure that would work, please share.

    Structure one:
    List each keyword in a row and an index next to it. This would create multiple rows for the same database. Remember, partial text. The PHP would gather those indexes from the results (Using an associative array), count what came back (To keep track of which one had more keywords), then sort that array by value followed by another query to get the full data for all of those indexes.

    Structure two:
    List each keyword in a row and a list of indexes next to it. This would mean one row for each keyword. The PHP would gather the results by exploding the indexes for each row, again, use as associative array to keep track of which is more popular, sort the array and then run another query to get the full data for all of those indexes.

    What do you think? Or is going column by column using LIKE statement just as bad as those? (I would think not)
  2. #2
  3. SQL Consultant
    Devshed Supreme Being (6500+ posts)

    Join Date
    Feb 2003
    Location
    Toronto Canada
    Posts
    27,145
    Rep Power
    4274
    400 rows? with 150 new rows per year?

    table fits in memory, so don't do anything
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2009
    Posts
    49
    Rep Power
    0
    Fair enough but eventually it will exceed that. And besides, if there's a better way, I'd rather learn that leave it inefficient, even though we're talking in milliseconds.
  6. #4
  7. No Profile Picture
    Problem Solver
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Jan 2001
    Location
    Stockholm, Sweden
    Posts
    4,495
    Rep Power
    534
    I agree with Rudy
    The table is just so extremely small that the overhead of any more elaborate algorithm you think of will probably eat up any possible benefits.

    The only thing I can think of to possibly speed things up for you is to cache the entire structure in PHP to avoid the overhead of the connection to MySQL.
    But I'm not even sure if that is faster or not.

    Or you can run as is for about 10 years and come back here when you reach 2000 rows and ask again.
    /Stefan
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2009
    Posts
    49
    Rep Power
    0
    Fair enough on the overhead. So, not changing it, but are either if those a good option, and if so, which one? (Might as well get an answer now, that way I have it in 10 years. )
  10. #6
  11. SQL Consultant
    Devshed Supreme Being (6500+ posts)

    Join Date
    Feb 2003
    Location
    Toronto Canada
    Posts
    27,145
    Rep Power
    4274
    Originally Posted by meesa
    are either if those a good option
    i don't think so

    i couldn't understand what you meant by "an index next to it" or "a list of indexes next to it"
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
  12. #7
  13. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2009
    Posts
    49
    Rep Power
    0
    word | 1
    word | 3
    word | 5
    dwor | 3
    dwor | 6
    dwor | 8

    OR

    word | 1 3 5
    dwor | 3 6 8
  14. #8
  15. SQL Consultant
    Devshed Supreme Being (6500+ posts)

    Join Date
    Feb 2003
    Location
    Toronto Canada
    Posts
    27,145
    Rep Power
    4274
    okay, presumably those numbers are foreign keys to your main table? (i.e. not "indexes")

    the first one is far better

    the second one violates first normal form
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
  16. #9
  17. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2009
    Posts
    49
    Rep Power
    0
    What is the first normal form?

    And could you possibly provide an explanation as to why it's bad? My database could have essentially 5 nearly identical columns 20 times, which would make for 100 rows instead of 5. To me searching 5 is better than 100.
  18. #10
  19. No Profile Picture
    Contributing User
    Devshed Regular (2000 - 2499 posts)

    Join Date
    Mar 2006
    Posts
    2,442
    Rep Power
    1688
    First normal form is about removing duplicates from your tables.

    Where it wins out in your example is that by having each occurence of the data being in a separate table in it's own discrete row is that the data can be queried directly via an index. Having the data concatenated into a set of columns means that whilst you can quickly find a value at the start of the column "dwor | 3 6 8" - the 3 could be found quickly as it would form the initial part of an index. Not so the 6 or 8. Every row would need to be accessed by the database and every column parsed to see if the 6 or 8 was present.

    The searching of/in those 5 columns rapidly becomes less rapid!
    The moon on the one hand, the dawn on the other:
    The moon is my sister, the dawn is my brother.
    The moon on my left and the dawn on my right.
    My brother, good morning: my sister, good night.
    -- Hilaire Belloc
  20. #11
  21. SQL Consultant
    Devshed Supreme Being (6500+ posts)

    Join Date
    Feb 2003
    Location
    Toronto Canada
    Posts
    27,145
    Rep Power
    4274
    Originally Posted by meesa
    What is the first normal form?
    google is your friend

    first normal form means each column is atomic

    "removing duplicates" is an (inaccurate) oversimplification of third normal form


    Originally Posted by meesa
    To me searching 5 is better than 100.
    not if it requires a table scan

    again, google is your friend


    edit to add: but your table scan will take place in memory as mentioned earlier, so don't sweat it

    Last edited by r937; January 23rd, 2014 at 08:34 PM.
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
  22. #12
  23. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2009
    Posts
    49
    Rep Power
    0
    Fair enough. Thanks for all the info. I think I'll try to pick up an MySQL optimization book on my next break so I can understand it even better.

    Thanks for the feedback!

IMN logo majestic logo threadwatch logo seochat tools logo