September 19th, 2012, 12:06 PM
MySQL Query throwing unexpected MAX_JOIN_SIZE error
First question here, so hopefully I get my formatting and presentation right. I'm administering a wordpress database into which I have hacked unread comments tracking for logged in users. The blog has relatively few users, but lots of activity, so the relevant tables have the following sizes:
wp_posts: ~8000 (wp_posts is the only table with ID as a field)
wp_unreadcomments: ~23,000 (this is my tracking table)
My host has set MAX_JOIN_SIZE at 5,000,000
What I need is one query that identifies each post with comments newer than the last-visited date in unreadcomments, gathers some information about that post, and counts the number of unread comments. I would assume I could do this without coming anywhere near a 5,000,000 join size, but this code doesn't do it:
So my question is why is MySQL seeing this as a >5,000,000 join size, and how do I fix it?
SELECT [needed information fields], count(ID) as newcomments from
#Here I'm filtering wp_posts based upon type and a drop dead date.
#This select, run alone, returns ~130 rows.
#I know for other reasons that anything before this date is read.
(SELECT ID, post_date, comment_count, post_title FROM wp_posts WHERE post_type = 'post' AND post_status = 'publish' AND post_date > '2012-07-16 22:37:03') as filterPosts
#Here I'm filtering unreadcomments to the specific user. This returns 512 rows.
LEFT OUTER JOIN (select lastvisit, is_done, postid from wp_unreadcomments WHERE userid = 1) as filterUnread ON ID = postid
#And here I'm filtering the comments for other users and the same date as posts. 32,000 rows.
LEFT OUTER JOIN (select comment_date, comment_post_ID from wp_comments WHERE wp_comments.comment_date > '2012-07-16 22:37:03' AND user_id <> 1) as filterComments ON comment_post_ID = ID AND comment_date > COALESCE(lastvisit,'2012-07-16 22:37:03')
#is_done is a binary flag users can set to ignore posts.
#I need to coalesce because lastvisit is either Null
#(if never visited) or larger than the date I know going in.
WHERE (is_done IS NULL OR is_done = 0) AND comment_date > COALESCE(lastvisit,'2012-07-16 22:37:03')
#The group by gives me the unread comment count per post
GROUP BY ID
September 20th, 2012, 12:31 AM
I know self-replies are frowned upon
But I've continued to explore this question, and apparently some part of the issue relates to the way MySQL handles derived tables (google "Derived Tables and Views Performance", I can't post links in this forum).
That said, it seems like creating four views each time would be an awfully ungainly solution, so I'm still looking for any bright ideas from the audience.
September 20th, 2012, 05:05 PM
Efficiency step 2
Based on the article I referenced above, I rewrote my PHP code to this:
That, of course, is ugly in that it's creating three views every time a user loads the page (which is not infrequent). I could, of course, not recreate posts and unread every time because they do not filter using a time (which changes load-to-load), but I didn't rewrite that part since I would still have to recreate the comment view.
$sql_posts = "create OR REPLACE view zPosts_$uID as SELECT ID, post_date, comment_count, post_title FROM wp_posts
WHERE post_type = 'post' AND post_status = 'publish'";
$sql_comments = "create OR REPLACE view zComments_$uID as select comment_date, comment_post_ID from wp_comments
WHERE comment_date > '$stopDate' AND user_id <> $uID";
$sql_unread = "create OR REPLACE view zUnread_$uID as select lastvisit, is_done, postid from fk_unreadcomments WHERE userid = $uID";
$sql = "SELECT ID, post_date, lastvisit, is_done, comment_count, post_title, count(ID) as newcomments from
zPosts_$uID LEFT OUTER JOIN zUnread_$uID ON ID = postid LEFT OUTER JOIN zComments_$uID ON comment_post_ID = ID
WHERE (is_done IS NULL OR is_done = 0) AND comment_date > COALESCE(lastvisit,'$stopDate')
GROUP BY ID";
$wpdb->query ($sql_posts );
$wpdb->query ($sql_comments );
$wpdb->query ($sql_unread );
$recent_posts = $wpdb->get_results($sql);
Is there a problem storing/recreating ~500 views in MySQL? Is there a better way to avoid the derived tables MySQL appears unable to handle?
September 21st, 2012, 05:40 AM
mysql can handle derived tables just fine
creating views on the fly is very inefficient
you need to determine why you are getting so many intermediate rows in the FROM clause
strategy: remove the GROUP BY and take a look at the actual rows being produced by your joins
September 21st, 2012, 12:56 PM
When you ask about the rows produced by joins, is that something different than the counts I put in the comments to the initial SQL code? If not, the answer is 130, 512, and 32,000.
The basis for the derived table issue is that when I run an explain on the first version, it finds no keys to use (which suggests a problem to me, and squares with the article I can't link). When I use the views, explain identifies appropriate keys.
I'm not surprised to hear that view creation is inefficient, as it certainly feels wrong as a solution.
Can you let me know if you meant something else about the actual rows, or if any other information would be helpful?
September 21st, 2012, 01:08 PM
yes, i meant the intermediate rows produced by your joins
Originally Posted by nevermoor
suppose, for instance, that each of the 512 rows was matched with each of the 32,000 rows -- that's 16 million intermediate rows
you need to check that your joins are correct
the way to do it is to remove the GROUP BY, and print something from each of the tables in the joins --
if that takes forever, maybe throw a LIMIT 1000 on it so that you do get something back
SELECT filterPosts.ID AS filterPosts_ID
, filterUnread.postid AS filterUnread_postid
, filterComments.comment_post_ID AS filterComments_comment_post_ID
from ( SELECT ID
WHERE post_type = 'post'
AND post_status = 'publish'
AND post_date > '2012-07-16 22:37:03' ) as filterPosts
JOIN ( select lastvisit
WHERE userid = 1 ) as filterUnread
ON ID = postid
JOIN ( select comment_date
WHERE wp_comments.comment_date > '2012-07-16 22:37:03'
AND user_id <> 1 ) as filterComments
ON comment_post_ID = ID
AND comment_date > COALESCE(lastvisit,'2012-07-16 22:37:03')
is_done IS NULL
OR is_done = 0
AND comment_date > COALESCE(lastvisit,'2012-07-16 22:37:03')
September 21st, 2012, 01:19 PM
I ran exactly that SQL query, and got 30,651 total, Query took 0.7062 sec.
Your query also required a "SET SQL_BIG_SELECTS=1;" to run.
September 21st, 2012, 02:37 PM
30,000 rows is not a lot
i wonder why you got the "unexpected MAX_JOIN_SIZE error"
in any case, i trust that your mistrust of derived tables has been dispelled, and you are no longer thinking of views on the fly
September 21st, 2012, 03:45 PM
I don't know why it is happening (to be clear, it happens both with my query and with yours, but not when I convert the derived tables to views).
That suggests to me that there is something wrong with derived tables, but at this point I don't know if I can better optimize the query.
September 21st, 2012, 08:07 PM
mysql can handle hundreds of millions of rows, terrabytes of data in a table. yours is nowhere near that capacity.
I'd say there is either something wrong with your my.cfg/my.ini configuration file or something else. There should be no issues with derived tables and the data you have, even with cross joins should still run reasonably well with indexes.
the issue is a)
which isn't terrible but could be low or b) something isn't right with your join that you aren't seeing that is causing that limit to be exceeded.
nothing to do with derived tables.
September 22nd, 2012, 02:19 AM
Makes sense to me
All of that makes sense, but I'm really stumped about why it is such a large (and relatively slow at ~0.7 to 1 second) operation.
I don't see how it could be a join issue as the joining is pretty straightforward (postID in posts and unread comments, and comment date by linking comments to posts) and there is no issue when using views instead of derived tables (which also has explain showing keys used).
You guys are saying that this is a small beans usage issue, and I agree it should be, but for whatever reason it isn't in practice.
Are there any other suggestions for tests it would make sense to run?