Page 1 of 2 12 Last
  • Jump to page:
    #1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    8
    Rep Power
    0

    PAINFULLY slow inner join query


    Hi guys,

    I have this query:

    Code:
    SELECT SQL_CALC_FOUND_ROWS l.* FROM fmd_listings l INNER JOIN fmd_listings_categories lc ON l.id=lc.list_id WHERE lc.cat_id=181 AND l.status='active' ORDER BY date DESC,priority DESC LIMIT 0,10
    fmd_listings has 550,000 rows - fmd_listings_categories has around 250 rows.

    The query takes 11 seconds on average to complete, and that's on a dedicated MySQL 5.5 server with 4x i7 equiv CPU's, 8GB Ram...etc

    A straight forward SELECT * FROM fmd_listings WHERE cat_id=181 takes milliseconds, so what's causing the massive overhead?

    thx in advance for any advice.
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    74
    Rep Power
    18
    Originally Posted by jimxms
    Hi guys,

    I have this query:

    Code:
    SELECT SQL_CALC_FOUND_ROWS l.* FROM fmd_listings l INNER JOIN fmd_listings_categories lc ON l.id=lc.list_id WHERE lc.cat_id=181 AND l.status='active' ORDER BY date DESC,priority DESC LIMIT 0,10
    fmd_listings has 550,000 rows - fmd_listings_categories has around 250 rows.

    The query takes 11 seconds on average to complete, and that's on a dedicated MySQL 5.5 server with 4x i7 equiv CPU's, 8GB Ram...etc

    A straight forward SELECT * FROM fmd_listings WHERE cat_id=181 takes milliseconds, so what's causing the massive overhead?

    thx in advance for any advice.
    Does the foreign key lc.list_id have an index?
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    8
    Rep Power
    0
    Originally Posted by Aurum84
    Does the foreign key lc.list_id have an index?
    Yes it does. Here is a list of all the indexes on both the listings and listing_categories tables

    hxxp://picxy.com/gallery/EQ

    Edit: Im new so cant post proper urls
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    74
    Rep Power
    18
    Originally Posted by jimxms
    Yes it does. Here is a list of all the indexes on both the listings and listing_categories tables
    Where is lc.list_id on that screenshot?

    Did you run an explain command on your query?
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    8
    Rep Power
    0
    Here's the index I assumed would cover it:

    Code:
     PRIMARY	BTREE	Yes	No	list_id		A	No
    I dont really understand the output of EXPLAIN, but here it is:

    Code:
    id 	select_type 	table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra
    1 	SIMPLE 	lc 	ref 	PRIMARY,cat_id_2 	cat_id_2 	4 	const 	42713 	Using index; Using temporary; Using filesort
    1 	SIMPLE 	l 	eq_ref 	PRIMARY,featured,impressions 	PRIMARY 	4 	xxx_pma.lc.list_id 	1 	Using where
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Location
    Italy
    Posts
    36
    Rep Power
    2
    You are doing a good job - I see that it uses short keys in both the tables, and the JOIN type if eq_ref (well-optimized semijoin).

    The problem is that it is using an on-disk file and a temp table (Using filesort, Temporary table) with 40,000+ rows.

    Try dropping the ORDER BY: I think that this problem should go away.

    To use the ORDER BY, you should create a proper index - and maybe FORCE MySQL to use it.

    Let me know.

    PS- if you need to show more EXPLAINs, may I ask you to save them in this page? It makes them more readable:
    https://mariadb.org/explain_analyzer/analyze/
  12. #7
  13. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    8
    Rep Power
    0
    Originally Posted by f_razzoli
    You are doing a good job - I see that it uses short keys in both the tables, and the JOIN type if eq_ref (well-optimized semijoin).

    The problem is that it is using an on-disk file and a temp table (Using filesort, Temporary table) with 40,000+ rows.

    Try dropping the ORDER BY: I think that this problem should go away.

    To use the ORDER BY, you should create a proper index - and maybe FORCE MySQL to use it.

    Let me know.
    Yes you're right. Removing the ORDER BY statement drops the query time down to 1.7secs. This is obviously much better so thanks for the pointer, but I do wonder if this is as good as it can get?

    For me to roll out this database into production i really need the whole query to complete within tenths of a sec.

    Removing the ordering from the equation, do you see any ways of further improving the execution time of the query?

    Thanks for your time.
  14. #8
  15. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Location
    Italy
    Posts
    36
    Rep Power
    2
    I didn't notice something important: the first read table is lc, and you do a point search on it, but no index is used for that condition.

    I don't know what se you are using, so I'll assume it's InnoDB. If it's MyISAM, index are made in a different way.

    I suggest that first you add a UNIQUE index on cat_id, and FORCE its use.

    Also, you can try on l to include the status field in the primary key, as the second field. BEFORE DOING SO, if you can, turn status in a ENUM field. If you can't, at least specify ASCII as character set.
  16. #9
  17. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    8
    Rep Power
    0
    Originally Posted by f_razzoli
    I didn't notice something important: the first read table is lc, and you do a point search on it, but no index is used for that condition.

    I don't know what se you are using, so I'll assume it's InnoDB. If it's MyISAM, index are made in a different way.

    I suggest that first you add a UNIQUE index on cat_id, and FORCE its use.

    Also, you can try on l to include the status field in the primary key, as the second field. BEFORE DOING SO, if you can, turn status in a ENUM field. If you can't, at least specify ASCII as character set.
    Its actually MyISAM i'm running on at the moment. I have been tempted to migrate everything over to InnoDB but I recall reading that the use of COUNT is ridiculously slow on large data sets?

    I was even toying with the idea of moving to MySQL 5.6 last night as I've heard that it has large performance improvements (especially InnoDB). I know its only in RC, but I don't mind living on the knife edge for the sake of performance.

    The status field is ENUM at the moment so I'll try putting those indexes on as you suggested.

    Thanks again for all your help so far.
  18. #10
  19. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Location
    Italy
    Posts
    36
    Rep Power
    2
    Yes, MyISAM always knows how many rows are stored in a table, because it doesn't have to handle transactions. But it is slower for other things and has some limitations I don't like (no foreign keys, no transactions).

    The advices I wrote before are only valid for InnoDB. The main difference is that InnoDB stores the primary key's value at the end of all indexes. So, if you define an index on name, actually it is on (name,id). Remember: at the end, not at the beginning. The index order is relevant.

    So, for example, one of my advices with MyISAM turns to: make an index on (cat_id,id).
  20. #11
  21. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Location
    Italy
    Posts
    36
    Rep Power
    2
    I forgot:
    MySQL 5.6 is not stable yet. And its performance enhancements affect subqueries and some non-optimized queries.

    Maybe the forks MariaDB and Percona Server have more enhancements in the InnoDB engine, but don't rely on it: work hard to optimize your query
  22. #12
  23. No Profile Picture
    Problem Solver
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Jan 2001
    Location
    Stockholm, Sweden
    Posts
    4,495
    Rep Power
    534
    Some suggestions:
    1. Move the lc.cat_id=181 condition to the join condition for that table. It won't do that much difference in performance right now but it makes the query easier to read and with more complex queries it makes it much easier for the optimizer to figure out a good execution plan.
    Code:
    SELECT SQL_CALC_FOUND_ROWS
      l.*
    FROM
      fmd_listings l
    INNER
      JOIN fmd_listings_categories lc
        ON l.id=lc.list_id and lc.cat_id=181
    WHERE
      l.status='active'
    ORDER
      BY
        date DESC
        ,priority DESC
    LIMIT 0,10
    2. Do you really need l.* or is that just lazy (hey I do it sometimes also), the reason is that the temporary table created by the join before the filesort is run will contain all these fields and sorting 300MB takes much longer time than sorting 3MB, so keeping down the size of the data that you want returned can greatly reduce query time.

    3. Is the second sort field "priority" also in the table fmd_listings?
    And since you have a LIMIT 10 in your query then you can add an index that makes it possible for MySQL to use it to retrieve the rows in order instead of sorting them afterwards:
    Code:
    ALTER TABLE fmd_listings ADD INDEX fl_ix_status_date_priority (id, status, date, priority);
    But to make it work you would probably have to skip the SQL_CALC_FOUND_ROWS in the query. Because MySQL has an LIMIT optimization that ends the query execution as soon as it has returned the requested amount of rows.
    So if you try this query with that index:
    Code:
    SELECT
      l.something
      ,l.something
      ,l.something
      ...
    FROM
      fmd_listings l
    INNER
      JOIN fmd_listings_categories lc
        ON l.id=lc.list_id
          and lc.cat_id=181
    WHERE
      l.status='active'
    ORDER
      BY
        date DESC
        ,priority DESC
    LIMIT 0,10

    4. And then if you want to know total nr of rows found you can just throw another query with just count(*), since you don't want to return any data and it can all take place in the indexes it should be pretty fast:
    Code:
    SELECT
      count(*)
    FROM
      fmd_listings l
    INNER
      JOIN fmd_listings_categories lc
        ON l.id=lc.list_id and lc.cat_id=181
    WHERE
      l.status='active'
    /Stefan
  24. #13
  25. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    8
    Rep Power
    0
    Originally Posted by sr
    Some suggestions:
    1. Move the lc.cat_id=181 condition to the join condition for that table. It won't do that much difference in performance right now but it makes the query easier to read and with more complex queries it makes it much easier for the optimizer to figure out a good execution plan.
    Code:
    SELECT SQL_CALC_FOUND_ROWS
      l.*
    FROM
      fmd_listings l
    INNER
      JOIN fmd_listings_categories lc
        ON l.id=lc.list_id and lc.cat_id=181
    WHERE
      l.status='active'
    ORDER
      BY
        date DESC
        ,priority DESC
    LIMIT 0,10
    You say that, but it just reduced my query time from 9 seconds to 0.0014 seconds!!!! WTF that's voodoo right there!! I can't see any difference in the result returned. How can this be??

    Originally Posted by sr
    2. Do you really need l.* or is that just lazy (hey I do it sometimes also), the reason is that the temporary table created by the join before the filesort is run will contain all these fields and sorting 300MB takes much longer time than sorting 3MB, so keeping down the size of the data that you want returned can greatly reduce query time.
    Its a combination of lazyness and needing _most_ of the columns. However, I just run the query with every column but one (description - the big one) removed and the query is down to 0.0005 seconds. I'm really starting to love you!

    Originally Posted by sr
    3. Is the second sort field "priority" also in the table fmd_listings?
    And since you have a LIMIT 10 in your query then you can add an index that makes it possible for MySQL to use it to retrieve the rows in order instead of sorting them afterwards:
    Code:
    ALTER TABLE fmd_listings ADD INDEX fl_ix_status_date_priority (id, status, date, priority);
    But to make it work you would probably have to skip the SQL_CALC_FOUND_ROWS in the query. Because MySQL has an LIMIT optimization that ends the query execution as soon as it has returned the requested amount of rows.
    So if you try this query with that index:
    Code:
    SELECT
      l.something
      ,l.something
      ,l.something
      ...
    FROM
      fmd_listings l
    INNER
      JOIN fmd_listings_categories lc
        ON l.id=lc.list_id
          and lc.cat_id=181
    WHERE
      l.status='active'
    ORDER
      BY
        date DESC
        ,priority DESC
    LIMIT 0,10
    I'll give this a try as well, but the changes so far are more than impressive! I'm finding it hard to believe that those small changes could make such a difference
  26. #14
  27. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    8
    Rep Power
    0
    Oh damn, I think I must have experienced some sort of query cache, as changing the cat_id to a different number brought the time back up to 15 seconds.

    I'll go back, re-read yout thread and try some of the suggestions.

    Thanks again
  28. #15
  29. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Location
    Italy
    Posts
    36
    Rep Power
    2
    Moving conditions in the ON clause does the same thing, but that's just because the optimizer is smart If it would say what you say, it would read unneded rows from the left table.

    Oh, WTF, I forgot to tell you about the l.*! Did you remove it? Of course he's right, the more fields you read, the more time the query needs.

    But your problem was clear from the EXPLAIN output: MySQL reads 40,000+ rows from the right table. Using a good index will address this problem. Everything else should not have the priority for you.

    Can you post the current query, the current explain, and the current show index output?
Page 1 of 2 12 Last
  • Jump to page:

IMN logo majestic logo threadwatch logo seochat tools logo