#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2010
    Posts
    162
    Rep Power
    62

    MySQL best practice IN vs multi-row identifier?


    I always seems to have trouble making good titles...

    Which of the following would be a better idea. The project implements a support section, and all the articles are categorized. This is option 1 (my current config):

    Code:
    CREATE TABLE IF NOT EXISTS `support_categories` (
      `id` varchar(255) NOT NULL,
      `category_name` varchar(255) NOT NULL,
      `articles` varchar(255) NOT NULL,
      `order` float(3,1) NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
    Code:
    CREATE TABLE IF NOT EXISTS `support_articles` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `title` varchar(255) NOT NULL,
      `body` text NOT NULL,
      `updated` int(11) NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB  DEFAULT CHARSET=utf8 ;
    The 'articles' column will contain a comma-delimited list of 'id's (pointing to 'support_articles'. Calling the category would write a query like

    'SELECT * FROM articles WHERE id IN(8,17,26,42,50);

    From a usability standpoint, I can have one row and only one row for each article, then if an article falls under multiple categories, I can just list it in multiple 'category' rows.

    Option two:

    Code:
    CREATE TABLE IF NOT EXISTS `support_categories` (
      `id` varchar(255) NOT NULL,
      `category_name` varchar(255) NOT NULL,
      `order` float(3,1) NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
    Code:
    CREATE TABLE IF NOT EXISTS `support_articles` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `category` varchar(255) NOT NULL,
      `title` varchar(255) NOT NULL,
      `body` text NOT NULL,
      `updated` int(11) NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB  DEFAULT CHARSET=utf8 ;
    Code:
    ALTER TABLE `support_articles` ADD INDEX ( `category` )
    From this, the category is selected and every article with that category will be selected. For this method, a row will need to be added each time an article falls under another category.

    Is there a reason one is significantly better than the other? Or is it more of a 'which would you prefer to maintain?' type of question.
    Last edited by ntomsheck; September 6th, 2012 at 02:23 PM.
  2. #2
  3. Hockey face
    Devshed Supreme Being (6500+ posts)

    Join Date
    Nov 2001
    Location
    St. Catharines, Canada
    Posts
    8,144
    Rep Power
    1316
    Any time you are storing comma delimited values you are breaking FIRST NORMAL FORM of database normalization.

    you need a separate table for categories and a table for article categories, with a new row for each category an article can belong to.

    Simple example

    articles
    articleID, articlebody, etc

    categories
    categoryID, categoryname

    article_categories
    articleid, categoryid

    a new row in this table for every category for an article. If it belongs in five categories then this table has five rowss.

    Comments on this post

    • SimonJM agrees : Gah, sniped! ^_^
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Regular (2000 - 2499 posts)

    Join Date
    Mar 2006
    Posts
    2,400
    Rep Power
    1688
    If you ever find yourself thinking, "gee, it'd be a good idea to put multiple values in this single column" do yourself a favour and poke yourself with a spork until the feeling passes.

    I'd like to go for a third option ...
    You have a categories table and an articles table. You should also have a category_article table which has the primary keys of the categories and articles table, acting a a link between them.
    The moon on the one hand, the dawn on the other:
    The moon is my sister, the dawn is my brother.
    The moon on my left and the dawn on my right.
    My brother, good morning: my sister, good night.
    -- Hilaire Belloc
  6. #4
  7. SQL Consultant
    Devshed Supreme Being (6500+ posts)

    Join Date
    Feb 2003
    Location
    Toronto Canada
    Posts
    27,026
    Rep Power
    4210
    Originally Posted by ntomsheck
    Is there a reason one is significantly better than the other?
    the 2nd method is far superior

    the first method requires two queries to get a category's articles -- one to retrieve the category, then a second to retrieve its articles

    even more complex is the query you need to write if you want to list articles under more than one category

    furthermore, you cannot easily find all the categories that an article belongs to, without doing a table scan of the categories table
    r937.com | rudy.ca
    please visit Simply SQL and buy my book
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2010
    Posts
    162
    Rep Power
    62
    Originally Posted by SimonJM
    If you ever find yourself thinking, "gee, it'd be a good idea to put multiple values in this single column" do yourself a favour and poke yourself with a spork until the feeling passes.
    Hahahaha.


    That's why I love posting here. I never thought of adding a linking table. Thanks!

IMN logo majestic logo threadwatch logo seochat tools logo