#1
  1. No Profile Picture
    Contributing User
    Devshed Regular (2000 - 2499 posts)

    Join Date
    Sep 2006
    Posts
    2,122
    Rep Power
    539

    Order of multi-key indexes


    I have the following table. Only idPublic is exposed to the user and not id. accountsId is known by the server via a session (or maybe jwt). The same idPublic can be used by different accounts, but not by the same account. A typical WHERE clause might be WHERE idPublic=5 AND idAccounts=123;

    • id (INT PK)
    • idPublic (INT)
    • accountsId (INT)
    • otherStuff


    I wish to place a unique index to prevent duplicate idPublic's for a given account, and also wish that index to be used to improve performance.

    Am I correct in saying:
    1. If there are many accounts with have few records in this table, I should use only UNIQUE INDEX somename (`accountsId` ASC, `idPublic` ASC)?
    2. If there are few accounts with have many records in this table, I should use only UNIQUE INDEX somename (`idPublic` ASC, `accountsId` ASC)?
    3. If there are many accounts with have many records in this table, I should have one of each? Does it matter if they are both UNIQUE, or should one be a normal INDEX?


    Thanks
  2. #2
  3. Wiser? Not exactly.
    Devshed God 2nd Plane (6000 - 6499 posts)

    Join Date
    May 2001
    Location
    Bonita Springs, FL
    Posts
    6,276
    Rep Power
    4193
    You should consider the type of querying you will be doing rather than the data you will have.

    If your query will always use both columns in the where clause then I don't think it matters too much which order they are specified in (though I could be wrong).

    If your queries will frequently use only one of the two columns, put that column first. As an analogy imagine your index contains a value for each row consisting of the given column values concatenated in the order specified. Matching against an index is then essentially a prefix match. The values specified in your where clause are concatenated in the same order as the index up to the first unknown then mysql scans the index for items matching that index. As such you want your optional columns specified last.

    The data amounts/layout only comes into play if you end up having multiple indexes that could be used. Mysql will try and use which ever one it deems to be most restrictive based on the index cardinality. For example if you had 1000 total rows in your table and two separate indexes IX_idPublic and IX_accountsId:

    IX_idPublic with a cardinality of 200 would mean any given idPublic value should reduce that 1000 rows to roughly 5 (assuming uniform distribution).
    IX_accountsId with a cardinality of 500 would mean any given accountsId value should reduce that 1000 rows to roughly 2 (assuming uniform distribution).

    So if you did a query with WHERE idPublic=11 AND accountsId=12 mysql would choose to use IX_accountsId to immediately take the row count from 1000 to 2, then it'd test each of those row's idPublic value to see if it matched.


    There's more involved in indexes than the above, but taking the above into consideration should help determine most of your indexing needs.
    Recycle your old CD's



    If I helped you out, show some love with some reputation, or tip with Bitcoins to 1N645HfYf63UbcvxajLKiSKpYHAq2Zxud
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Regular (2000 - 2499 posts)

    Join Date
    Sep 2006
    Posts
    2,122
    Rep Power
    539
    Hi Kicken, Appreciate the help. I do, however, believe that the order makes a big difference even when both are being used. I've seen multiple people using the phone book (what are those?) analogy a few times. Maybe not, but worth knowing for sure.
  6. #4
  7. Wiser? Not exactly.
    Devshed God 2nd Plane (6000 - 6499 posts)

    Join Date
    May 2001
    Location
    Bonita Springs, FL
    Posts
    6,276
    Rep Power
    4193
    I have no idea if it would make a difference if when using both columns. If you have a large data set you could probably run some tests to see. When I am adding indexes to a large table and have some choices I'll usually do that. Add one index, test a few queries, then remove it and add another.

    Assuming it does make a difference, I'd probably order them based on the cardinality of each column with the highest first.
    Recycle your old CD's



    If I helped you out, show some love with some reputation, or tip with Bitcoins to 1N645HfYf63UbcvxajLKiSKpYHAq2Zxud
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Regular (2000 - 2499 posts)

    Join Date
    Sep 2006
    Posts
    2,122
    Rep Power
    539
    I think you might be correct that it will not matter if both columns are used in the WHERE, but am not positive. Even if testing shows it makes a difference, still would be good to know why it makes a difference.

    I am sure I also need to know what other queries I might be doing. For instance, I expect index idPublic, accountsId will not be used for the following:
    Code:
    SELECT id, idPublic, otherStuff FROM t WHERE accountsId IN (6,8,9);
    Regarding cardinality, specifying the highest first makes intuitive sense.
  10. #6
  11. Backwards Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    16,914
    Rep Power
    9646
    AFAIK it's better to construct the index so that the first column whittles down the possible results most. So if there are many accounts with few records, put the account first.
    And having two keys, one on A+B and the other on B+A, is redundant: only one of those will ever be used with a query. A unique index on A+B then a regular index on B covers all possible situations.
  12. #7
  13. SQL Consultant
    Devshed Supreme Being (6500+ posts)

    Join Date
    Feb 2003
    Location
    Toronto Canada
    Posts
    27,795
    Rep Power
    4331
    Originally Posted by requinix
    And having two keys, one on A+B and the other on B+A, is redundant: only one of those will ever be used with a query. A unique index on A+B then a regular index on B covers all possible situations.
    beg to disagree... B+A is a covering index (and hence useful) if only value(s) for B are given in WHERE clause, but only A required to return in SELECT

    Comments on this post

    • requinix agrees
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
  14. #8
  15. Backwards Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    16,914
    Rep Power
    9646
    Ah, I was focused too much on the indexes and hadn't even considered that as a use case. Good point.

IMN logo majestic logo threadwatch logo seochat tools logo