July 31st, 2013, 03:42 PM
Rule of Thumb? Multiple Joins vs. Multiple Queries?
I have never seen any and a quick Google search didn't help me.
Are there any rules of thumbs which help with making the decision when it is better to run a separate query vs. adding some extra tables via join to get the info in one big request. My guess is the answer is case-by-case via testing.
I have a query where I need to join four tables for the basic information to display to the user. I have one more bit of information to grab which is two tables away. (An ID in the main query gets me the the row I need on another table which gets me the row I need on a third table which has the name I am looking up.)
So, more or less:
SELECT ID, other-stuff FROM a join b join c join d WHERE ...
SELECT name FROM e join f WHERE ID=id-from-before
SELECT ID, name, other-stuff FROM a join b join c join d join e join f
The tables are not enormous (yet). The place where the main information is - is growing quickly. The other lookup tables will only be a couple of hundred records.
At this time, I can measure no significant performance difference in a big-join vs a second query.
My preference is for the two queries because it keeps the purpose separate and easier to explain for maintenance purposes.
But I'm wondering if there might be long term performance benefit of one big join as the tables and number of users grows.
Thanks in advance.
July 31st, 2013, 05:40 PM
this is going to remain the case for the foreseeable future
Originally Posted by EEsterling
after that, you can tweak the indexes, and make it work for even longer
go fight some other battles
July 31st, 2013, 10:46 PM
If you're running the queries from an application language like PHP, it's quite possible that the overhead involved in running the extra queries will make using joins significantly faster. There are exceptions, but no hard and fast rules that define them. This is one of the primary reasons why WordPress is so inefficient.
August 1st, 2013, 06:01 AM
Ah, so nice to hear, thank you for the morsel of sanity.
Regarding the original question, my knowledge is second-hand, by reading High-Performance MySQL, The Art of SQL and others. While everything in these books points to a nuanced, case-by-case look at each situation, my impression is that overall, as a fat rule of thumb, for four tables, the authors would often favor a single well-formulated query with JOINs rather than multiple queries, under the assumption that the optimizer should handle JOINs particularly well—plus the round-trip multi-layer latency story. In "well-formulated," there are a number of things, such as the question of using appropriate (possibly covering) indexes, and also the question of reducing the data set as fast as possible. Not an expert answer, just sharing the fruits of my reading.
On a personal note, I had a case recently where the two-query approach was faster (benchmarking by looping 10 thousand times). That had to do with one of the two queries being cache-able, but not the big one; also, the big query was a contrived spaghetti with some left JOINs to retrieve disconnected data into a single row, probably not your case.
August 1st, 2013, 08:33 AM
It is a PHP application.
That sounds exactly the same to me.
Originally Posted by ragax
Unfortunately, I realized I erred in thinking about the problem for the specific case that got me thinking about it.
The big ugly query gets about 30 rows. (Probably will always be 20-50). PHP then lays out the data for each row.
Where I erred is I forgot that I don't run the second simple query once. I run it for each row. So, I'm really making ~31 queries, not just two.
So, maybe I should make my big ugly query bigger and uglier. But I won't bother now since I don't have a performance issue.
August 1st, 2013, 03:16 PM
Sounds like a fun tweaking project for a winter evening.
I've started adding copious comments to large queries, but definitely need to learn more about making "ugly" queries look more presentable. Indentation also helps. But I guess the core of the problem is going from the multi-dimensional nature of a query involving several tables, conditions etc to the linear nature of written code.