The Shed is going Social! Join us on FaceBook and Twitter and chime in on the conversation.
|
 |
|
Dev Shed Forums
> Databases
> MySQL Help
|
PAINFULLY slow inner join query
Discuss PAINFULLY slow inner join query in the MySQL Help forum on Dev Shed. PAINFULLY slow inner join query MySQL Help forum discussing administration, SQL syntax, and other MySQL-related topics. MySQL is an open-source relational database management system (RDBMS).
|
|
 |
|
|
|
|
|

Dev Shed Forums Sponsor:
|
|
|

January 23rd, 2013, 11:50 AM
|
|
Registered User
|
|
Join Date: Jan 2013
Posts: 8
Time spent in forums: 54 m 9 sec
Reputation Power: 0
|
|
|
PAINFULLY slow inner join query
Hi guys,
I have this query:
Code:
SELECT SQL_CALC_FOUND_ROWS l.* FROM fmd_listings l INNER JOIN fmd_listings_categories lc ON l.id=lc.list_id WHERE lc.cat_id=181 AND l.status='active' ORDER BY date DESC,priority DESC LIMIT 0,10
fmd_listings has 550,000 rows - fmd_listings_categories has around 250 rows.
The query takes 11 seconds on average to complete, and that's on a dedicated MySQL 5.5 server with 4x i7 equiv CPU's, 8GB Ram...etc
A straight forward SELECT * FROM fmd_listings WHERE cat_id=181 takes milliseconds, so what's causing the massive overhead?
thx in advance for any advice.
|

January 23rd, 2013, 11:55 AM
|
|
|
Quote: | Originally Posted by jimxms Hi guys,
I have this query:
Code:
SELECT SQL_CALC_FOUND_ROWS l.* FROM fmd_listings l INNER JOIN fmd_listings_categories lc ON l.id=lc.list_id WHERE lc.cat_id=181 AND l.status='active' ORDER BY date DESC,priority DESC LIMIT 0,10
fmd_listings has 550,000 rows - fmd_listings_categories has around 250 rows.
The query takes 11 seconds on average to complete, and that's on a dedicated MySQL 5.5 server with 4x i7 equiv CPU's, 8GB Ram...etc
A straight forward SELECT * FROM fmd_listings WHERE cat_id=181 takes milliseconds, so what's causing the massive overhead?
thx in advance for any advice. |
Does the foreign key lc.list_id have an index?
|

January 23rd, 2013, 12:02 PM
|
|
Registered User
|
|
Join Date: Jan 2013
Posts: 8
Time spent in forums: 54 m 9 sec
Reputation Power: 0
|
|
Quote: | Originally Posted by Aurum84 Does the foreign key lc.list_id have an index? |
Yes it does. Here is a list of all the indexes on both the listings and listing_categories tables
hxxp://picxy.com/gallery/EQ
Edit: Im new so cant post proper urls
|

January 23rd, 2013, 12:26 PM
|
|
|
Quote: | Originally Posted by jimxms Yes it does. Here is a list of all the indexes on both the listings and listing_categories tables
|
Where is lc.list_id on that screenshot?
Did you run an explain command on your query?
|

January 23rd, 2013, 12:31 PM
|
|
Registered User
|
|
Join Date: Jan 2013
Posts: 8
Time spent in forums: 54 m 9 sec
Reputation Power: 0
|
|
Here's the index I assumed would cover it:
Code:
PRIMARY BTREE Yes No list_id A No
I dont really understand the output of EXPLAIN, but here it is:
Code:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE lc ref PRIMARY,cat_id_2 cat_id_2 4 const 42713 Using index; Using temporary; Using filesort
1 SIMPLE l eq_ref PRIMARY,featured,impressions PRIMARY 4 xxx_pma.lc.list_id 1 Using where
|

January 23rd, 2013, 08:13 PM
|
|
Contributing User
|
|
Join Date: Jan 2013
Location: Italy
Posts: 36
Time spent in forums: 4 h 37 m 48 sec
Reputation Power: 1
|
|
|
You are doing a good job - I see that it uses short keys in both the tables, and the JOIN type if eq_ref (well-optimized semijoin).
The problem is that it is using an on-disk file and a temp table (Using filesort, Temporary table) with 40,000+ rows.
Try dropping the ORDER BY: I think that this problem should go away.
To use the ORDER BY, you should create a proper index - and maybe FORCE MySQL to use it.
Let me know.
PS- if you need to show more EXPLAINs, may I ask you to save them in this page? It makes them more readable:
https://mariadb.org/explain_analyzer/analyze/
|

January 24th, 2013, 01:24 AM
|
|
Registered User
|
|
Join Date: Jan 2013
Posts: 8
Time spent in forums: 54 m 9 sec
Reputation Power: 0
|
|
Quote: | Originally Posted by f_razzoli You are doing a good job - I see that it uses short keys in both the tables, and the JOIN type if eq_ref (well-optimized semijoin).
The problem is that it is using an on-disk file and a temp table (Using filesort, Temporary table) with 40,000+ rows.
Try dropping the ORDER BY: I think that this problem should go away.
To use the ORDER BY, you should create a proper index - and maybe FORCE MySQL to use it.
Let me know.
|
Yes you're right. Removing the ORDER BY statement drops the query time down to 1.7secs. This is obviously much better so thanks for the pointer, but I do wonder if this is as good as it can get?
For me to roll out this database into production i really need the whole query to complete within tenths of a sec.
Removing the ordering from the equation, do you see any ways of further improving the execution time of the query?
Thanks for your time.
|

January 24th, 2013, 02:22 AM
|
|
Contributing User
|
|
Join Date: Jan 2013
Location: Italy
Posts: 36
Time spent in forums: 4 h 37 m 48 sec
Reputation Power: 1
|
|
|
I didn't notice something important: the first read table is lc, and you do a point search on it, but no index is used for that condition.
I don't know what se you are using, so I'll assume it's InnoDB. If it's MyISAM, index are made in a different way.
I suggest that first you add a UNIQUE index on cat_id, and FORCE its use.
Also, you can try on l to include the status field in the primary key, as the second field. BEFORE DOING SO, if you can, turn status in a ENUM field. If you can't, at least specify ASCII as character set.
|

January 24th, 2013, 03:44 AM
|
|
Registered User
|
|
Join Date: Jan 2013
Posts: 8
Time spent in forums: 54 m 9 sec
Reputation Power: 0
|
|
Quote: | Originally Posted by f_razzoli I didn't notice something important: the first read table is lc, and you do a point search on it, but no index is used for that condition.
I don't know what se you are using, so I'll assume it's InnoDB. If it's MyISAM, index are made in a different way.
I suggest that first you add a UNIQUE index on cat_id, and FORCE its use.
Also, you can try on l to include the status field in the primary key, as the second field. BEFORE DOING SO, if you can, turn status in a ENUM field. If you can't, at least specify ASCII as character set. |
Its actually MyISAM i'm running on at the moment. I have been tempted to migrate everything over to InnoDB but I recall reading that the use of COUNT is ridiculously slow on large data sets?
I was even toying with the idea of moving to MySQL 5.6 last night as I've heard that it has large performance improvements (especially InnoDB). I know its only in RC, but I don't mind living on the knife edge for the sake of performance.
The status field is ENUM at the moment so I'll try putting those indexes on as you suggested.
Thanks again for all your help so far.
|

January 24th, 2013, 04:54 AM
|
|
Contributing User
|
|
Join Date: Jan 2013
Location: Italy
Posts: 36
Time spent in forums: 4 h 37 m 48 sec
Reputation Power: 1
|
|
|
Yes, MyISAM always knows how many rows are stored in a table, because it doesn't have to handle transactions. But it is slower for other things and has some limitations I don't like (no foreign keys, no transactions).
The advices I wrote before are only valid for InnoDB. The main difference is that InnoDB stores the primary key's value at the end of all indexes. So, if you define an index on name, actually it is on (name,id). Remember: at the end, not at the beginning. The index order is relevant.
So, for example, one of my advices with MyISAM turns to: make an index on (cat_id,id).
|

January 24th, 2013, 05:01 AM
|
|
Contributing User
|
|
Join Date: Jan 2013
Location: Italy
Posts: 36
Time spent in forums: 4 h 37 m 48 sec
Reputation Power: 1
|
|
I forgot:
MySQL 5.6 is not stable yet. And its performance enhancements affect subqueries and some non-optimized queries.
Maybe the forks MariaDB and Percona Server have more enhancements in the InnoDB engine, but don't rely on it: work hard to optimize your query 
|

January 24th, 2013, 08:32 AM
|
|
Problem Solver
|
|
Join Date: Jan 2001
Location: Stockholm, Sweden
|
|
Some suggestions:
1. Move the lc.cat_id=181 condition to the join condition for that table. It won't do that much difference in performance right now but it makes the query easier to read and with more complex queries it makes it much easier for the optimizer to figure out a good execution plan.
Code:
SELECT SQL_CALC_FOUND_ROWS
l.*
FROM
fmd_listings l
INNER
JOIN fmd_listings_categories lc
ON l.id=lc.list_id and lc.cat_id=181
WHERE
l.status='active'
ORDER
BY
date DESC
,priority DESC
LIMIT 0,10
2. Do you really need l.* or is that just lazy (hey I do it sometimes also), the reason is that the temporary table created by the join before the filesort is run will contain all these fields and sorting 300MB takes much longer time than sorting 3MB, so keeping down the size of the data that you want returned can greatly reduce query time.
3. Is the second sort field "priority" also in the table fmd_listings?
And since you have a LIMIT 10 in your query then you can add an index that makes it possible for MySQL to use it to retrieve the rows in order instead of sorting them afterwards:
Code:
ALTER TABLE fmd_listings ADD INDEX fl_ix_status_date_priority (id, status, date, priority);
But to make it work you would probably have to skip the SQL_CALC_FOUND_ROWS in the query. Because MySQL has an LIMIT optimization that ends the query execution as soon as it has returned the requested amount of rows.
So if you try this query with that index:
Code:
SELECT
l.something
,l.something
,l.something
...
FROM
fmd_listings l
INNER
JOIN fmd_listings_categories lc
ON l.id=lc.list_id
and lc.cat_id=181
WHERE
l.status='active'
ORDER
BY
date DESC
,priority DESC
LIMIT 0,10
4. And then if you want to know total nr of rows found you can just throw another query with just count(*), since you don't want to return any data and it can all take place in the indexes it should be pretty fast:
Code:
SELECT
count(*)
FROM
fmd_listings l
INNER
JOIN fmd_listings_categories lc
ON l.id=lc.list_id and lc.cat_id=181
WHERE
l.status='active'
__________________
/Stefan
|

January 24th, 2013, 01:11 PM
|
|
Registered User
|
|
Join Date: Jan 2013
Posts: 8
Time spent in forums: 54 m 9 sec
Reputation Power: 0
|
|
Quote: | Originally Posted by sr Some suggestions:
1. Move the lc.cat_id=181 condition to the join condition for that table. It won't do that much difference in performance right now but it makes the query easier to read and with more complex queries it makes it much easier for the optimizer to figure out a good execution plan.
Code:
SELECT SQL_CALC_FOUND_ROWS
l.*
FROM
fmd_listings l
INNER
JOIN fmd_listings_categories lc
ON l.id=lc.list_id and lc.cat_id=181
WHERE
l.status='active'
ORDER
BY
date DESC
,priority DESC
LIMIT 0,10
|
You say that, but it just reduced my query time from 9 seconds to 0.0014 seconds!!!! WTF that's voodoo right there!! I can't see any difference in the result returned. How can this be??
Quote: | Originally Posted by sr
2. Do you really need l.* or is that just lazy (hey I do it sometimes also), the reason is that the temporary table created by the join before the filesort is run will contain all these fields and sorting 300MB takes much longer time than sorting 3MB, so keeping down the size of the data that you want returned can greatly reduce query time.
|
Its a combination of lazyness and needing _most_ of the columns. However, I just run the query with every column but one (description - the big one) removed and the query is down to 0.0005 seconds. I'm really starting to love you!
Quote: | Originally Posted by sr
3. Is the second sort field "priority" also in the table fmd_listings?
And since you have a LIMIT 10 in your query then you can add an index that makes it possible for MySQL to use it to retrieve the rows in order instead of sorting them afterwards:
Code:
ALTER TABLE fmd_listings ADD INDEX fl_ix_status_date_priority (id, status, date, priority);
But to make it work you would probably have to skip the SQL_CALC_FOUND_ROWS in the query. Because MySQL has an LIMIT optimization that ends the query execution as soon as it has returned the requested amount of rows.
So if you try this query with that index:
Code:
SELECT
l.something
,l.something
,l.something
...
FROM
fmd_listings l
INNER
JOIN fmd_listings_categories lc
ON l.id=lc.list_id
and lc.cat_id=181
WHERE
l.status='active'
ORDER
BY
date DESC
,priority DESC
LIMIT 0,10
|
I'll give this a try as well, but the changes so far are more than impressive! I'm finding it hard to believe that those small changes could make such a difference
|

January 24th, 2013, 01:16 PM
|
|
Registered User
|
|
Join Date: Jan 2013
Posts: 8
Time spent in forums: 54 m 9 sec
Reputation Power: 0
|
|
Oh damn, I think I must have experienced some sort of query cache, as changing the cat_id to a different number brought the time back up to 15 seconds.
I'll go back, re-read yout thread and try some of the suggestions.
Thanks again 
|

January 24th, 2013, 05:18 PM
|
|
Contributing User
|
|
Join Date: Jan 2013
Location: Italy
Posts: 36
Time spent in forums: 4 h 37 m 48 sec
Reputation Power: 1
|
|
Moving conditions in the ON clause does the same thing, but that's just because the optimizer is smart  If it would say what you say, it would read unneded rows from the left table.
Oh, WTF, I forgot to tell you about the l.*! Did you remove it? Of course he's right, the more fields you read, the more time the query needs.
But your problem was clear from the EXPLAIN output: MySQL reads 40,000+ rows from the right table. Using a good index will address this problem. Everything else should not have the priority for you.
Can you post the current query, the current explain, and the current show index output?
|
Developer Shed Advertisers and Affiliates
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Rate This Thread |
Linear Mode
|
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|