#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2009
    Posts
    6
    Rep Power
    0

    How can i optimize this ORDER BY query


    Hi!

    I have a question on how to speed up the query below which selects all entries and then DESC the date creation order.

    Leaving the order by out leaves this query with
    0.0073 sec on 500 entries which is ok but once i add the order by it hits 0.1116 sec

    I dont have any composite index, just single index on date for instance. Would i benefit from a outer query somehow?

    PHP Code:
    SELECT FROM `CEM_cicoLEFT JOIN (SELECT FROM CEM_description GROUP BY `TICKET_ID`) AS jdk ON CEM_cico.CASE_ID=jdk.TICKET_ID WHERE (STATUS!='Resolved') OR (STATUS='Resolved' AND HASMAIL='UNREAD'ORDER BY CEM_cico.DATE DESC 
    Thank you
  2. #2
  3. Lord of the Dance
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Oct 2003
    Posts
    3,575
    Rep Power
    1906
    What is the result if you run an EXPLAIN SELECT on the query?
    Both with and without the ORDER BY.

    Some comments about the SELECT and GROUP BY:
    It is better to specify each column instead of using SELECT *; especially when you have a group by field as it is not clear (at least from a human point of view) what you want to return.

    When using GROUP BY, column to be returned should be either used in an aggregate function or be part of the GROUP BY list.
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2009
    Posts
    6
    Rep Power
    0
    Hi, thanks!

    It must be something i am doing wrong because simply doing

    SELECT *
    FROM `CEM_cico`
    ORDER BY `DATE` DESC

    takes 0.0015 sec

    I would like to have this view/table as a start to do the left join operation (see my orginal SQL query) on rahter than the default table (Which has the date order in reverse)

    For the explain command see below (sorry it lost it table format, can i keep it in this js editor?). The only difference i can see is that ORDER by uses "Using where; Using temporary; Using filesort" on extra.

    Without ORDER BY:

    id
    select_type
    table
    type
    possible_keys
    key
    key_len
    ref
    rows
    Extra
    1
    PRIMARY
    CEM_cico
    ALL
    STATUS
    NULL
    NULL
    NULL
    697
    Using where
    1
    PRIMARY
    <derived2>
    ALL
    NULL
    NULL
    NULL
    NULL
    162
    2
    DERIVED
    CEM_description
    index
    NULL
    TICKET_ID
    767
    NULL
    124


    and with order by

    id
    select_type
    table
    type
    possible_keys
    key
    key_len
    ref
    rows
    Extra
    1
    PRIMARY
    CEM_cico
    ALL
    STATUS
    NULL
    NULL
    NULL
    697
    Using where; Using temporary; Using filesort
    1
    PRIMARY
    <derived2>
    ALL
    NULL
    NULL
    NULL
    NULL
    162
    2
    DERIVED
    CEM_description
    index
    NULL
    TICKET_ID
    767
    NULL
    124

    Originally Posted by MrFujin
    What is the result if you run an EXPLAIN SELECT on the query?
    Both with and without the ORDER BY.

    Some comments about the SELECT and GROUP BY:
    It is better to specify each column instead of using SELECT *; especially when you have a group by field as it is not clear (at least from a human point of view) what you want to return.

    When using GROUP BY, column to be returned should be either used in an aggregate function or be part of the GROUP BY list.
  6. #4
  7. Lord of the Dance
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Oct 2003
    Posts
    3,575
    Rep Power
    1906
    As you saw, it is using filesort, which mean it didn't t used any index for the sorting.

    Can you specify which of the fields have indexes in the tables?
    It looks like the date field doesn't have an index.
    The fields to be looked at are those participating in the conditions:
    CEM_cico.CASE_ID, jdk.TICKET_ID, STATUS, HASMAIL and CEM_cico.DATE

    I will also suggest you take a look at "rewriting" the SELECT *, as I recommended in my last post.
    What exactly do you want to return with the GROUP BY?

    About the indention/text-format:
    The PHP tag you used in first post will keep the indention.
    But you will have to copy the text into the textfield, mark it and then select the button (when in advanced mode/form).
    It will not be saved if you press the button first and then insert the text into the appearing pop-up text-fields.

    Beside PHP, there are also code and highlight.
    you can read more of the possibility at the vB code list

    Example using highlight:
    sql Code:
     
    SELECT * 
    FROM `CEM_cico` 
    LEFT JOIN 
    	(	SELECT * 
    		FROM CEM_description 
    		GROUP BY `TICKET_ID`
    	) AS jdk ON CEM_cico.CASE_ID=jdk.TICKET_ID 
    WHERE (STATUS!='Resolved') 
    	OR (STATUS='Resolved' AND HASMAIL='UNREAD') 
    ORDER BY CEM_cico.DATE DESC
    Last edited by MrFujin; February 3rd, 2013 at 11:45 AM.
  8. #5
  9. SQL Consultant
    Devshed Supreme Being (6500+ posts)

    Join Date
    Feb 2003
    Location
    Toronto Canada
    Posts
    27,118
    Rep Power
    4274
    Originally Posted by Linda_swe
    ... (SELECT * FROM CEM_description GROUP BY `TICKET_ID`) ...
    this doesn't make sense, could you explain why you're using GROUP BY please
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
  10. #6
  11. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2009
    Posts
    6
    Rep Power
    0
    MrFujin and r937,

    Thanks for your input. I will check and test accordingly.

    DATE is an index as well as STATUS, CASE_ID,HASMAIL was not an index i see, will add that. DATE is a timestamp field but the strange thing is that it keeps using filesort!

    As for the GROUP BY, good question. What i would like to achive is that in CEM_description table there is three columns:

    TICKET_ID, DESCRIPTION, AND DATE (note there are two dates field in play, both in this table and in the other one).

    I would simply like join the DESCRIPTION field onto the CASE_ID=TICKET_ID. CASE_ID is the field in the first table and TICKET_ID in the second, they shall be equal.

    So i simplified it down to remove the GROUP BY altogheter it made the speed even worse?

    PHP Code:
    SELECT *
    FROM `CEM_cico
    LEFT JOIN `CEM_description`
    ON CEM_cico.CASE_ID=CEM_description.TICKET_ID
    WHERE 
    (STATUS!='Resolved') OR (STATUS='Resolved' AND HASMAIL='UNREAD'ORDER BY CEM_cico.DATE DESC 

    So guiess the SELECT * in the GROUP BY is not good written, it should be SELECT DESCRIPTION.

    Is this bad indexed on my CEM_cico table ? i mean the Cardinality, the sequence etc. Should i use a composed index rather than many single index ?

    PHP Code:
     Edit     Drop    PRIMARY    BTREE    Yes    No    CASE_ID    697    A    No    
     Edit     Drop    CASE_ID    BTREE    No    No    CASE_ID    697    A    No    
     Edit     Drop    CUSTOMER_NAME    BTREE    No    No    CUSTOMER_NAME    99    A    No    
     Edit     Drop    GROUP_ID    BTREE    No    No    GROUP_ID    9    A    No    
     Edit     Drop    STATUS    BTREE    No    No    STATUS    12    A    No    
     Edit     Drop    DATE    BTREE    No    No    DATE    63    A    No 
    and other other table has

    PHP Code:
     Edit     Drop    TICKET_ID    BTREE    No    No    TICKET_ID    124    A    No 
    The CEM_cico table, is large having 60 columns, CEM_decription having 3.

    UPDATE. So i added an index also in HASMAIL and the qyery time went down to
    0.0978 sec VS 0.0036 sec without ORDER BY

    So an improvement indeed, still the ORDER BY slows it down many times.I know that order by has it limits and that is slows down but this much ? I might be better off sorting the data set (date) on the client in javascript then.

    I was thinking i could use something like (which might help the order by issue or?):

    PHP Code:
    SELECT FROM SELECT 
    FROM `CEM_cico`  
    LEFT JOIN `CEM_description` as jdk
    ON CEM_cico
    .CASE_ID=jdk.TICKET_ID 
    WHERE 
    (STATUS!='Resolved') OR (STATUS='Resolved' AND HASMAIL='UNREAD')) ORDER BY CEM_cico.DATE DESC 
    But it says: Every derived table must have its own alias

    Originally Posted by r937
    this doesn't make sense, could you explain why you're using GROUP BY please
  12. #7
  13. Lord of the Dance
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Oct 2003
    Posts
    3,575
    Rep Power
    1906
    The first query in above post is correct syntax-wise, but you didn't really said whether it does return the correct result.
    The difference with this one is that it will return ALL rows from CEM_description, compared to the previous where it (supposedly) only return one row for each TICKET_ID.
    This is also the reason why you saw it "made the speed even worse" and why you should try to "optimize" the query to only return the required information.

    So the question is, what kind of result do you want from CEM_description?
    Should it return all or only the most recent/oldest record by date?

    It could be interesting to see the explain for the new query you made:
    sql Code:
    SELECT *
    FROM `CEM_cico` 
    LEFT JOIN `CEM_description` ON CEM_cico.CASE_ID=CEM_description.TICKET_ID
    WHERE (STATUS!='Resolved') 
    	OR (STATUS='Resolved' AND HASMAIL='UNREAD') 
    ORDER BY CEM_cico.DATE DESC

    Did you made any change to the index?
  14. #8
  15. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2009
    Posts
    6
    Rep Power
    0
    Ah now i remember why i did the GROUP BY statement in the first query (which is the only one which gives correct result)

    The second one i came up with does not and that is becuase:

    CEM_description table contains description records (1-to many) that relates to a (1) specific case_id in the first table.
    This is the foundation of relational db right ? A key that relates to another key in another table.

    So it can be say 500 entries in the first table.Then in CEM_description it can be


    1 my first text
    1 another text
    1 third text

    where 1 is TICKET_ID,All those relates to the first ticket/case CASE_ID.

    What i really want, is to join one field, the description, from the CEM_description table first entry:

    1 my first text

    to the case_id table by doing the left join.Somehow that works with group by.It is equal to say

    PHP Code:
    SELECT FROM CEM_description where TICKET_ID='8' GROUP BY `TICKET_ID
    Which also gets the first entry out of say 3 entries added to ticket id 8.

    All in all there are 461 tickets in CEM_cico and the second query give me a 481 entries.It add upp more tickets than there even exists in total.

    So yes the first query gives correct result, and it also gives the first description entry in CEM_description added into the view.Just like i wanted.

    As for explain using the second query, it still shows
    Using where; Using temporary; Using filesort

    Any idea ? Mabey i can optimize the group by instead, i just want the first description entry added to each TICKET_ID.

    Does it make sense at all ? =)

    It should be noted that i can optimize it by simple not doing SELECT * in the first statment but choose the columns i actually need STILL i dont understand how ORDER BY DESC can take so much time doing its thing in the new view, it is beyond me.
    I have read on the net that the order/sequence of the index might influence the speed ? So DATE index needs a special sequence order. Is that true ?
    Also how come that ORDER BY DESC on a single table is fast but doing it in the result view is not! It seems to be the temptable which is the bootleneck not the filesort method (rather than index).Is it really, syntax wise, impossible to order by before join another table ? That would have solved the speed issue wouldnt it ?

    And, oh thanks for your understanding and help! I remember i got help here last time as well =)

    /L
    Originally Posted by MrFujin
    The first query in above post is correct syntax-wise, but you didn't really said whether it does return the correct result.
    The difference with this one is that it will return ALL rows from CEM_description, compared to the previous where it (supposedly) only return one row for each TICKET_ID.
    This is also the reason why you saw it "made the speed even worse" and why you should try to "optimize" the query to only return the required information.

    So the question is, what kind of result do you want from CEM_description?
    Should it return all or only the most recent/oldest record by date?

    It could be interesting to see the explain for the new query you made:
    sql Code:
    SELECT *
    FROM `CEM_cico` 
    LEFT JOIN `CEM_description` ON CEM_cico.CASE_ID=CEM_description.TICKET_ID
    WHERE (STATUS!='Resolved') 
    	OR (STATUS='Resolved' AND HASMAIL='UNREAD') 
    ORDER BY CEM_cico.DATE DESC

    Did you made any change to the index?
  16. #9
  17. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2009
    Posts
    6
    Rep Power
    0
    Now i have made several queries that i initally though would solve the problem, becuase they 1) removed the tempview and filesort method and uses index instead


    PHP Code:
    SELECT * (SELECT *
    FROM `CEM_cicoORDER BY CEM_cico.DATE DESC) AS first
    LEFT JOIN 
    (SELECT FROM CEM_description GROUP BY `TICKET_ID`) AS jdk 
    ON first
    .CASE_ID=jdk.TICKET_ID WHERE (STATUS!='Resolved') OR (STATUS='Resolved' AND HASMAIL='UNREAD'
    Or i can even ORDER BY CEM_cico.CASE_ID becuase ticket id is auto_increament so it will get sorted correctly as well.

    Running explain, it used where on the dervied tables but to my suprise the time is nearly identical!

    The description field is a TEXT field. I simply concludes that this is the best it can do at this point. It seems silly to me that you can get the same result by removing the ORDER by clause but in ASC in so much faster than DESC but i have tried everything now.



    Originally Posted by MrFujin
    The first query in above post is correct syntax-wise, but you didn't really said whether it does return the correct result.
    The difference with this one is that it will return ALL rows from CEM_description, compared to the previous where it (supposedly) only return one row for each TICKET_ID.
    This is also the reason why you saw it "made the speed even worse" and why you should try to "optimize" the query to only return the required information.

    So the question is, what kind of result do you want from CEM_description?
    Should it return all or only the most recent/oldest record by date?

    It could be interesting to see the explain for the new query you made:
    sql Code:
    SELECT *
    FROM `CEM_cico` 
    LEFT JOIN `CEM_description` ON CEM_cico.CASE_ID=CEM_description.TICKET_ID
    WHERE (STATUS!='Resolved') 
    	OR (STATUS='Resolved' AND HASMAIL='UNREAD') 
    ORDER BY CEM_cico.DATE DESC

    Did you made any change to the index?

IMN logo majestic logo threadwatch logo seochat tools logo