#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2011
    Location
    Zaragoza (Spain)
    Posts
    8
    Rep Power
    0

    Priority in WHERE statements


    Hello,

    I have a (quite big) historic table that stores information about "Users" table. A very brief example:

    Table "Users":
    - id (Int 11)
    - name (Varchar 225)

    Table "Datos":
    - user_id (Int 11)
    - value (Decimal 15,2)
    - created_at (Datetime)
    - INDEX user_id
    - INDEX created_at

    The "Datos" table, as I said, is quite bit (about 1Gb and 2Gb and keeps growing on) as it stores minute-to-minute values about the users (currently it stores more than one year of information).

    I use the "Datos" table to generate graphic charts so, the most common query I do is:

    SELECT Datos.value FROM Datos
    WHERE Datos.user_id = <some id>
    AND Datos.created_at BETWEEN "<some date>" AND "<another date>";

    (I've simplified the query for post purposes)

    I've noticed that if I query the last 24 hours of data of a user that has been added to the system a couple days ago, it takes much less time to execute the query than if I request the same last 24 hours of a user added six months ago.

    My question is: Does the indexes or the order of the statements in the WHERE clause matter in the order the data are filtered in the table? (First by user_id and then by created_at or viceversa).

    Thanks and best regards.
    Jose Garcia
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Oct 2003
    Location
    Germany
    Posts
    2,777
    Rep Power
    348
    Originally Posted by asejua
    My question is: Does the indexes or the order of the statements in the WHERE clause matter in the order the data are filtered in the table? (First by user_id and then by created_at or viceversa).
    In general, yes. The query optimizer will look at the data distribution to pick out the "best" index. Whether that is a good choice depends entirely on the optimizer itself and the "correctness" of the data distribution that it uses. Most DBMS update statistics about your data exactly for this purpose.

    Only the execution plan of the statements can show what is really going on in your case.
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2011
    Location
    Zaragoza (Spain)
    Posts
    8
    Rep Power
    0
    I understand, so... Executing this:

    SELECT Datos.value FROM Datos
    WHERE Datos.user_id = <some id>
    AND Datos.created_at BETWEEN "<some date>" AND "<another date>";

    Won't differ in speed from executing this:

    SELECT Datos.value FROM Datos
    WHERE Datos.created_at BETWEEN "<some date>" AND "<another date>"
    AND Datos.user_id = <some id>;


    The Query Optimizer selects the best way to filter the query, isn't it?

    I know it is a very basic question but needed an explanation to be fully sure

    Thanks.
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Oct 2003
    Location
    Germany
    Posts
    2,777
    Rep Power
    348
    That's correct. No decent query optimizer should be influenced by the order in which the conditions appear in the statement.

    There might be "tips" floating around in the internet which might tell the opposite, but they are usually for old optimizers that were "rule based" i.e. they were looking at the order of the joins and conditions.

    Modern optimizers are all(?) cost based nowadays and they are hardly every influenced by the order of the conditions.

    But that all depends on the DBMS you are using.
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2011
    Location
    Zaragoza (Spain)
    Posts
    8
    Rep Power
    0
    I work with MySQL, using mainly InnoDB tables.

    Thanks for your answers.
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Oct 2003
    Location
    Germany
    Posts
    2,777
    Rep Power
    348
    Originally Posted by asejua
    I work with MySQL, using mainly InnoDB tables.
    Which unfortunately does not have a very good optimizer...

IMN logo majestic logo threadwatch logo seochat tools logo