#1: Registered User, Devshed Newbie (joined Sep 2012; 7 posts)
    MySQL Query throwing unexpected MAX_JOIN_SIZE error


    Hello,

    First question here, so hopefully I get my formatting and presentation right. I'm administering a WordPress database into which I have hacked unread-comment tracking for logged-in users. The blog has relatively few users but lots of activity, so the relevant tables have the following sizes:

    wp_posts: ~8,000 rows (wp_posts is the only table with ID as a field)
    wp_comments: ~234,000 rows
    wp_unreadcomments: ~23,000 rows (this is my tracking table)
    My host has set MAX_JOIN_SIZE to 5,000,000.

    What I need is one query that identifies each post with comments newer than the last-visited date in unreadcomments, gathers some information about that post, and counts the number of unread comments. I would assume I could do this without coming anywhere near a 5,000,000 join size, but this code doesn't do it:

    Code:
    SELECT [needed information fields], count(ID) as newcomments from 
    
    #Here I'm filtering wp_posts based upon type and a drop-dead date.
    #This select, run alone, returns ~130 rows.
    #I know for other reasons that anything before this date is read.
    (SELECT ID, post_date, comment_count, post_title FROM wp_posts WHERE post_type = 'post' AND post_status = 'publish' AND post_date > '2012-07-16 22:37:03') as filterPosts 
    
    #Here I'm filtering unreadcomments to the specific user.  This returns 512 rows.
    LEFT OUTER JOIN (select lastvisit, is_done, postid from wp_unreadcomments WHERE userid = 1) as filterUnread ON ID = postid 
    
    #And here I'm filtering the comments for other users and the same date as posts.  32,000 rows.
    LEFT OUTER JOIN (select comment_date, comment_post_ID from wp_comments WHERE wp_comments.comment_date > '2012-07-16 22:37:03' AND user_id <> 1) as filterComments ON comment_post_ID = ID AND comment_date > COALESCE(lastvisit,'2012-07-16 22:37:03') 
    
    #is_done is a binary flag users can set to ignore posts.  
    #I need to coalesce because lastvisit is either Null 
    #(if never visited) or larger than the date I know going in.
    WHERE (is_done IS NULL OR is_done = 0) AND comment_date > COALESCE(lastvisit,'2012-07-16 22:37:03') 
    
    #The group by gives me the unread comment count per post
    GROUP BY ID
    So my question is: why is MySQL seeing this as a >5,000,000 join size, and how do I fix it?

    Any ideas?
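[Editor's note: for reference, the two limits involved here are per-session MySQL system variables, so they can be inspected, and raised for the current connection only, where the host permits it. A minimal sketch:]

```sql
-- Inspect the current limits for this session
SELECT @@SESSION.max_join_size, @@SESSION.sql_big_selects;

-- Raise them for this session only (no server-wide change needed);
-- SQL_BIG_SELECTS = 1 disables the MAX_JOIN_SIZE check entirely
SET SESSION SQL_BIG_SELECTS = 1;
-- ...or keep the check but with a higher ceiling
SET SESSION MAX_JOIN_SIZE = 50000000;
```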
#2: Registered User, Devshed Newbie (joined Sep 2012; 7 posts)

    I know self-replies are frowned upon


    But I've continued to explore this question, and apparently part of the issue relates to the way MySQL handles derived tables (Google "Derived Tables and Views Performance"; I can't post links in this forum).

    That said, it seems like creating four views each time would be an awfully ungainly solution, so I'm still looking for any bright ideas from the audience.
#3: Registered User, Devshed Newbie (joined Sep 2012; 7 posts)

    Efficiency step 2


    Based on the article I referenced above, I rewrote my PHP code to this:

    PHP Code:
    $sql_posts = "CREATE OR REPLACE VIEW zPosts_$uID AS
            SELECT ID, post_date, comment_count, post_title FROM wp_posts
            WHERE post_type = 'post' AND post_status = 'publish'";

    $sql_comments = "CREATE OR REPLACE VIEW zComments_$uID AS
            SELECT comment_date, comment_post_ID FROM wp_comments
            WHERE comment_date > '$stopDate' AND user_id <> $uID";

    $sql_unread = "CREATE OR REPLACE VIEW zUnread_$uID AS
            SELECT lastvisit, is_done, postid FROM fk_unreadcomments
            WHERE userid = $uID";

    $sql = "SELECT ID, post_date, lastvisit, is_done, comment_count, post_title, COUNT(ID) AS newcomments
            FROM zPosts_$uID
            LEFT OUTER JOIN zUnread_$uID ON ID = postid
            LEFT OUTER JOIN zComments_$uID ON comment_post_ID = ID
            WHERE (is_done IS NULL OR is_done = 0)
              AND comment_date > COALESCE(lastvisit, '$stopDate')
            GROUP BY ID";

    $wpdb->query($sql_posts);
    $wpdb->query($sql_comments);
    $wpdb->query($sql_unread);
    $recent_posts = $wpdb->get_results($sql);
    That, of course, is ugly in that it creates three views every time a user loads the page (which is not infrequent). I could avoid recreating the posts and unread views every time, because they do not filter on a time that changes load-to-load, but I didn't rewrite that part since I would still have to recreate the comments view.

    Is there a problem storing/recreating ~500 views in MySQL? Is there a better way to avoid the derived tables MySQL appears unable to handle?
#4: SQL Consultant, Devshed Supreme Being (joined Feb 2003; Toronto, Canada; 27,207 posts)
    mysql can handle derived tables just fine

    creating views on the fly is very inefficient

    you need to determine why you are getting so many intermediate rows in the FROM clause

    strategy: remove the GROUP BY and take a look at the actual rows being produced by your joins
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
#5: Registered User, Devshed Newbie (joined Sep 2012; 7 posts)
    Thanks.

    When you ask about the rows produced by joins, is that something different than the counts I put in the comments to the initial SQL code? If not, the answer is 130, 512, and 32,000.

    The basis for the derived table issue is that when I run an explain on the first version, it finds no keys to use (which suggests a problem to me, and squares with the article I can't link). When I use the views, explain identifies appropriate keys.

    I'm not surprised to hear that view creation is inefficient, as it certainly feels wrong as a solution.

    Can you let me know if you meant something else about the actual rows, or if any other information would be helpful?
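[Editor's note: the key-usage check described above is just the standard EXPLAIN prefix. A sketch, reusing the filterPosts subquery from post #1; the `key` column of the output shows which index, if any, each table in the join uses, and a NULL key on a large table means a full scan, which inflates the optimizer's row estimate:]

```sql
EXPLAIN
SELECT ID, COUNT(ID) AS newcomments
  FROM ( SELECT ID FROM wp_posts
          WHERE post_type = 'post'
            AND post_status = 'publish'
            AND post_date > '2012-07-16 22:37:03' ) AS filterPosts
  LEFT OUTER JOIN wp_comments
    ON comment_post_ID = ID
 GROUP BY ID;
```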
#6: SQL Consultant, Devshed Supreme Being (joined Feb 2003; Toronto, Canada; 27,207 posts)
    Originally Posted by nevermoor
    When you ask about the rows produced by joins, is that something different than the counts I put in the comments to the initial SQL code?
    yes, i meant the intermediate rows produced by your joins

    suppose, for instance, that each of the 512 rows was matched with each of the 32,000 rows -- that's 16 million intermediate rows

    you need to check that your joins are correct

    the way to do it is to remove the GROUP BY, and print something from each of the tables in the joins --
    Code:
    SELECT filterPosts.ID                 AS filterPosts_ID
         , filterUnread.postid            AS filterUnread_postid
         , filterComments.comment_post_ID AS filterComments_comment_post_ID
      from ( SELECT ID
                  , post_date
                  , comment_count
                  , post_title 
               FROM wp_posts 
              WHERE post_type = 'post' 
                AND post_status = 'publish' 
                AND post_date > '2012-07-16 22:37:03' ) as filterPosts 
    LEFT OUTER 
      JOIN ( select lastvisit
                  , is_done
                  , postid 
               from wp_unreadcomments 
              WHERE userid = 1 ) as filterUnread 
        ON ID = postid 
    LEFT OUTER 
      JOIN ( select comment_date
                  , comment_post_ID 
               from wp_comments 
              WHERE wp_comments.comment_date > '2012-07-16 22:37:03' 
                AND user_id <> 1 ) as filterComments 
        ON comment_post_ID = ID 
       AND comment_date > COALESCE(lastvisit,'2012-07-16 22:37:03') 
     WHERE ( 
           is_done IS NULL 
        OR is_done = 0
           ) 
       AND comment_date > COALESCE(lastvisit,'2012-07-16 22:37:03')
    if that takes forever, maybe throw a LIMIT 1000 on it so that you do get something back
#7: Registered User, Devshed Newbie (joined Sep 2012; 7 posts)
    Got it.

    I ran exactly that SQL query and got 30,651 rows total; the query took 0.7062 sec.

    Your query also required a "SET SQL_BIG_SELECTS=1;" to run.
#8: SQL Consultant, Devshed Supreme Being (joined Feb 2003; Toronto, Canada; 27,207 posts)
    30,000 rows is not a lot

    i wonder why you got the "unexpected MAX_JOIN_SIZE error"

    in any case, i trust that your mistrust of derived tables has been dispelled, and you are no longer thinking of views on the fly
#9: Registered User, Devshed Newbie (joined Sep 2012; 7 posts)
    I don't know why it is happening (to be clear, it happens both with my query and with yours, but not when I convert the derived tables to views).

    That suggests to me that there is something wrong with derived tables, but at this point I don't know if I can better optimize the query.
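[Editor's note: a likely explanation the thread circles around but never pins down. MAX_JOIN_SIZE is compared against the optimizer's *estimate* of rows to be examined, not the actual result size, and in MySQL versions before 5.6.3 a materialized derived table gets no indexes, so the estimate can approach the product of the input row counts. Assuming the row counts quoted earlier in the thread:]

```sql
-- With no usable keys on the derived tables, the worst-case estimate is the
-- product of the three inputs, dwarfing the 5,000,000 limit even though the
-- actual result is only ~30,000 rows.
SELECT 130 * 512 * 32000 AS worst_case_examined_rows;  -- 2,129,920,000
```

[Views can be merged into the outer query and so use the base tables' indexes, which would explain why the view-based rewrite passes EXPLAIN with keys while the derived-table version does not.]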
#10: Hockey face, Devshed Supreme Being (joined Nov 2001; St. Catharines, Canada; 8,147 posts)
    mysql can handle hundreds of millions of rows, terabytes of data in a table. yours is nowhere near that capacity.
    I'd say there is either something wrong with your my.cnf/my.ini configuration file or something else. There should be no issue with derived tables and the data you have; even cross joins should still run reasonably well with indexes.

    the issue is either a) the MAX_JOIN_SIZE of 5,000,000 your host has set, which isn't terrible but could be low, or b) something isn't right with your join that you aren't seeing, and that is causing the limit to be exceeded.

    nothing to do with derived tables.
#11: Registered User, Devshed Newbie (joined Sep 2012; 7 posts)

    Makes sense to me


    All of that makes sense, but I'm really stumped about why it is such a large (and relatively slow at ~0.7 to 1 second) operation.

    I don't see how it could be a join issue, as the joining is pretty straightforward (postid links posts to unread comments, and comment_post_ID plus comment date link comments to posts), and there is no issue when I use views instead of derived tables (in which case EXPLAIN also shows keys being used).

    You guys are saying this is a small-beans usage issue, and I agree it should be, but for whatever reason in practice it isn't.

    Are there any other suggestions for tests it would make sense to run?
