#1
  1. A Change of Season
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2004
    Location
    Next Door
    Posts
    2,672
    Rep Power
    171

    Subquery VS Derived Table


    I realise for me personally learning "why"s makes them understand concepts and apply them properly. I am reading about EXPLAIN and I ran into derived and subquery in select_type. I need to understand a few things:

    1 - Why is it important to know the difference between derived table and subquery?

    2 - Why are they different?

    3 - How can I differentiate?

    I see this in a few places
    - derived tables are used in the FROM clause - subqueries are used in the WHERE clause,
    Thanks
    Last edited by zxcvbnm; April 24th, 2013 at 09:26 PM.
  2. #2
  3. No Profile Picture
    Problem Solver
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Jan 2001
    Location
    Stockholm, Sweden
    Posts
    4,495
    Rep Power
    534
    1. It's important because MySQL executes a query with subqueries from the outside in. So it begins with the outer query executes it, fetches the first record and then evaluates the subquery, and then the second record, evaluates the subquery, etc, etc. This can have very big implications on performance if your subquery is very heavy to execute because for every row that the outer query returns the subquery needs to be reexecuted. The reason for this is because you can (and should) inject values from the outer query into the sub-query to limit the result set and that value needs to be known before the sub-query is executed.

    2. They are different because a derived table tells the database that the result should be treated as a static table and not something that can alter or is dependent on the execution of the rest of the query.

    3. Derived tables are when the query is used in the place of a table in a FROM or a JOIN clause just as a table.

    Eg:
    Derived table means that you create an implicit temporary table through a query and then you use this temporary table as a normal table in JOINs etc.
    Code:
    SELECT
      ...
    FROM
      tableA
    INNER
      JOIN
        (
         SELECT
          ..
         FROM
           tableB
         WHERE
           ..
         ) derivedTableA
         ON derivedTableA.xx = tableA.yy
    While the subquery is used basically everywhere else in a query but most commonly in the WHERE or SELECT clauses:
    Code:
    SELECT
      ...
      (SELECT something FROM tableB WHERE tableB.id = a.someID)
    FROM
      tableA a
    WHERE
      a.columnA IN ( SELECT columnB FROM tableC WHERE tableC.id = a.someCid)
    As you can see in the subselects above I'm pushing in the value of a.someID from the outer query to make the result from those subqueries as small as possible.

    I would say that the rule is that if you can write the query as a derived table instead of a subquery you should.
    Generally most people have experienced performance problems with subqueries compared to derived tables.
    I mention this since if your derived table is very large then the handling of that temporary table can become rather heavy for the database which is the possible drawback with a derived table.
    /Stefan
  4. #3
  5. A Change of Season
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2004
    Location
    Next Door
    Posts
    2,672
    Rep Power
    171
    @Sr

    Ok thanks, would it be correct to say it this way: Derived tables are the tables used only in JOINS for example the red part below:
    Code:
    SELECT q_allotments.total,
           q_allotments.allotments_stats,
           q_allotments.allotments_nor,
           q_allotments.allotments_statuses,
           hotelrooms.room_id           AS RID,
           packages.type                AS PTYPE,
           countries.countryname        AS COUNTRY,
           regions.regionname           AS REGION_NAME,
           destinations.destinationname AS DESTINATION
    FROM   packages
           INNER JOIN (SELECT allotments_new.package_id,
                              Sum(day_rate)                                AS TOTAL,
                              Group_concat(allotments_new.status)          AS
                                              ALLOTMENTS_STATS,
                              Group_concat(allotments_new.number_of_rooms) AS
                                              ALLOTMENTS_NOR,
                              Group_concat(allotments_new.status)          AS
                                              ALLOTMENTS_STATUSES
                       FROM   allotments_new
                       WHERE  allotments_new.day_date >= '2013-04-19'
                              AND allotments_new.day_date < '2013-04-22'
                              AND allotments_new.day_rate > 0
                              AND allotments_new.status != 'n'
                              AND allotments_new.minimum_number_of_nights <= 3
                       GROUP  BY package_id) AS q_allotments
                   ON q_allotments.package_id = packages.id
           LEFT OUTER JOIN hotels
                        ON hotels.hotel_id = packages.hotel_id
                           AND hotels.status = 'active'
           LEFT OUTER JOIN countries
                        ON hotels.country_id = countries.country_id
           LEFT OUTER JOIN hotelrooms
                        ON hotelrooms.room_id = packages.room_id
                           AND hotelrooms.status = 'active'
           LEFT OUTER JOIN room_rates
                        ON room_rates.room_id = hotelrooms.room_id
           LEFT OUTER JOIN destinations
                        ON destinations.destination_id = hotels.destination_id
           LEFT OUTER JOIN regions
                        ON regions.region_id = hotels.region_id
           LEFT OUTER JOIN places
                        ON places.id = hotels.places_id
    WHERE  hotels.hotel_id = '239'
           AND packages.status = 'active'
           AND hotels.status = 'active'
           AND hotelrooms.status = 'active'
           AND maximum_number_of_adults_and_children >= '1'
         
    GROUP  BY ppiidd
    ORDER  BY total
    One thing about derived tables that doesnt make sense to me is that when I EXPLAIN the type is almost always ALL for me no matter what index I add to that table. Almost all columns in WHERE have been indexed but still FLL TABLE SCAN. This for example. Package id is foreign key. Can you please explain that?

    If u need table info for that, try this one, exact same situation:
    Code:
    SELECT 
    table_a.id,  
    derived.id AS DERVIED_ID
    FROM   table_a
           INNER JOIN (SELECT id AS ID,
                              pid
                       FROM   table_b
                       WHERE  date_added < '2012-12-12') AS derived
                   ON derived.pid = table_a.id
    you see I have the date_added and ID as indexes in the Derived table but the EXPLAIN shows a full scan. I dont think there is a way to avoid this, is there?
  6. #4
  7. No Profile Picture
    Problem Solver
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Jan 2001
    Location
    Stockholm, Sweden
    Posts
    4,495
    Rep Power
    534
    Originally Posted by sr
    Derived table means that you create an implicit temporary table through a query and then you use this temporary table as a normal table in JOINs
    Originally Posted by zxcvbnm
    Derived tables are the tables used only in JOINS ...
    Correct! You have grasped the concept of a Derived Table.

    Originally Posted by zxcvbnm
    One thing about derived tables that doesnt make sense to me is that when I EXPLAIN the type is almost always ALL for me no matter what index I add to that table.
    What line in the EXPLAIN are you referring to?
    1. The line that JOINs the derived table where table=<derived2>?
    Or:
    2. The line at the bottom that actually creates the derived table where select_type=DERIVED?

    Nr 1. above will always be type=ALL and possible_keys=NULL since the temporary table doesn't have any indexes.

    Nr 2. above will use indexes if it's advantageous.

    Why it isn't using indexes in your case might be because if you only have indexes on the individual columns and you have so many columns in the WHERE then maybe none of the indexes will return few enough rows to justify using the index since using an index means that you will have to fetch the rest of the information from the table to be able to evaluate the rest of the WHERE conditions. And fetching data randomly from the table is even more expensive compared to just running through a table scan.
    My rule of thumb is that if a condition will return more than 10% of the rows then a table scan is faster than using an index.

    The good thing about Derived Tables is that you can optimize the query for the derived table individually since it's not dependent on anything.
    So isolate that query and optimize it without the outer query.
    And suggestions to optimize it:
    1. How many rows does the query for the derived table return without the GROUP BY?
    If it's more than 350,000 then performing a table scan is usually faster than trying to use an index. Random reads are very expensive compared to sequential reads.

    2. Think about adding a multi-column index on a couple of the columns, choose 2-4 columns that combined should return the least amount of rows. Since you are using a lot of ranges in this query it will perform an index scan so I think keeping the index smaller and only having a portion of the columns might be better than all columns but you will have to try a few combinations to get the best selection.
    /Stefan
  8. #5
  9. A Change of Season
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2004
    Location
    Next Door
    Posts
    2,672
    Rep Power
    171
    Originally Posted by sr
    What line in the EXPLAIN are you referring to?
    1. The line that JOINs the derived table where table=<derived2>?

    Nr 1. above will always be type=ALL and possible_keys=NULL since the temporary table doesn't have any indexes.
    Which brings me to this question I was gonna open a new thread for:
    About temporary tables:
    Sometimes you need to get reports or something like, that runs a complex query which includes `JOIN`, `GROUP BY`, `HAVING` and etc. The time of running is depends on amount of records and complexity. To run queries faster we can use temporary tables.

    It is faster to get data from temporary table instead of getting data form DB tables with several filter conditions.Temporary tables are connection specific. When the current client session terminates all the temporary tables are automatically deleted.
    But when does MySQL create them? What columns are inside of these tables? How complex should the query be so there is temporary table created? How does MySQL measyre this?
    Originally Posted by sr
    2. The line at the bottom that actually creates the derived table where select_type=DERIVED.
    Nr 2. above will use indexes if it's advantageous.
    This happens everytime I use such derived tables which makes me ask you about file sort. Why? How?
    Originally Posted by sr
    The good thing about Derived Tables is that you can optimize the query for the derived table individually since it's not dependent on anything.
    So isolate that query and optimize it without the outer query.
    And suggestions to optimize it.
    Great.

IMN logo majestic logo threadwatch logo seochat tools logo