#1
  1. Contributing User
    Devshed Beginner (1000 - 1499 posts)

    Join Date
    Jun 2002
    Location
    close to the edge
    Posts
    1,035
    Rep Power
    69

    Speeding up query


    I have a query that takes about a second to run that I would like to make faster. Looking for some input. The query is below as is the create statement and the explain for the query:

    Query:
    Code:
    SELECT
    	mfp_lead_tracker_id
    FROM
    	consolidated_data
    WHERE
    	partner_id IN (77,499)
    	AND
    	(
    		email = "kamilakamel@hotmail.com"
    		OR phone = ""
    	)
    	AND date_lead_submitted >= "2013-04-01"
    	AND date_lead_submitted <= "2013-10-01"
    ORDER BY
    	date_lead_submitted DESC
    Explain:
    Code:
    id	select_type	table			type	possible_keys	keys key_len	ref	rows	Extra
    1	SIMPLE		consolidated_data	ALL	NULL		NULL		NULL	356982	Using where; Using filesort
    Create:
    Code:
    CREATE TABLE `consolidated_data` (
      `mfp_lead_tracker_id` int(10) unsigned NOT NULL,
      `bf_lead_id` int(10) unsigned DEFAULT NULL,
      `list_id` int(10) unsigned DEFAULT NULL,
      `source_id` int(10) unsigned DEFAULT NULL,
      `source_name` varchar(70) DEFAULT NULL,
      `ppl` decimal(5,2) DEFAULT NULL,
      `passback_id` varchar(20) DEFAULT NULL,
      `success` tinyint(3) unsigned DEFAULT NULL,
      `good_yes` tinyint(1) NOT NULL DEFAULT '0',
      `bf_call_time` datetime DEFAULT NULL,
      `date_lead_submitted` datetime DEFAULT NULL,
      `partner_id` int(10) unsigned DEFAULT NULL,
      `partner_name` varchar(100) DEFAULT NULL,
      `school` varchar(100) DEFAULT NULL,
      `campus` varchar(100) DEFAULT NULL,
      `campus_type` varchar(20) DEFAULT NULL,
      `campus_online_both` varchar(10) DEFAULT NULL,
      `warm_transfer` char(1) DEFAULT NULL,
      `degree_submitted` varchar(150) DEFAULT NULL,
      `first_name` varchar(30) DEFAULT NULL,
      `last_name` varchar(30) DEFAULT NULL,
      `email` varchar(80) DEFAULT NULL,
      `phone` varchar(10) DEFAULT NULL,
      `city` varchar(40) DEFAULT NULL,
      `state` varchar(2) DEFAULT NULL,
      `zip` varchar(5) DEFAULT NULL,
      `dob` date DEFAULT NULL,
      `military` char(1) DEFAULT NULL,
      `hs_grad_year` smallint(5) unsigned DEFAULT NULL,
      `cap_group_name` varchar(120) DEFAULT NULL,
      `points` decimal(2,1) DEFAULT NULL,
      `agent` varchar(20) DEFAULT NULL,
      `area_of_interest_bf` varchar(40) DEFAULT NULL,
      `area_of_interest_search` text,
      `highest_edu_level` varchar(30) DEFAULT NULL,
      `ebureau_campus_score` smallint(5) unsigned DEFAULT NULL,
      `ebureau_education_score` smallint(5) unsigned DEFAULT NULL,
      `ebureau_online_score` smallint(5) unsigned DEFAULT NULL,
      `motivation` tinyint(3) unsigned DEFAULT NULL,
      `time_to_start` varchar(20) DEFAULT NULL,
      `credit_hours` smallint(5) unsigned DEFAULT NULL,
      `desired_degree_level` varchar(30) DEFAULT NULL,
      `us_citizen` varchar(3) DEFAULT NULL,
      `age` tinyint(3) unsigned DEFAULT NULL,
      `time_to_talk` varchar(3) DEFAULT NULL,
      `date_last_updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      `scrubbed` tinyint(1) DEFAULT NULL,
      `scrubbed_date` datetime DEFAULT NULL,
      `scrubbed_reason` varchar(256) DEFAULT NULL,
      `enrolled` tinyint(1) DEFAULT NULL,
      `enrolled_date` date DEFAULT NULL,
      `applied` tinyint(1) DEFAULT NULL,
      `applied_date` date DEFAULT NULL,
      `enrollment_data_updated` tinyint(1) DEFAULT '0',
      PRIMARY KEY (`mfp_lead_tracker_id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
    The query is run quite often as part of an upload script, decreasing the time of the select would greatly decrease the amount of time for the upload to run.

    Note: the rows in the explain represents the entire table

    Thank you for your time and help,
    cranium
  2. #2
  3. Lord of the Dance
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Oct 2003
    Posts
    3,737
    Rep Power
    1959
    What happens if you remove the ORDER BY?
    What happens if you remove one criteria from the the WHERE part?

    "Using filesort" sounds like a missing index at one of the fields.
    Try add an index at date_lead_submitted

    But about one second is not necessarily that bad, depends at total number of rows and how many rows it has to return.
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2013
    Posts
    158
    Rep Power
    11
    EXPLAIN says it's doing a sequential scan and using a filesort, which is the slowest thing it can do.

    Create an index on date_lead, partner_id, email and phone, in that order (well, date_lead first anyway, because that is used in the ORDER BY and probably filters most of the records anyway)
  6. #4
  7. Contributing User
    Devshed Beginner (1000 - 1499 posts)

    Join Date
    Jun 2002
    Location
    close to the edge
    Posts
    1,035
    Rep Power
    69
    I did a lot of what you said while playing with the query. It seems like the partner_id field was the one taking the longest w/in the overall query (getting rid of all fields but it).

    I don't know if this is the correct action but I added an index to the table for partner_id

    Code:
    alter table consolidated_data add index `partner_id` (`partner_id`)
    The explain
    Code:
    id	select_type	table			type	possible_key	key		key_len	ref	rows	Extra
    1	SIMPLE		consolidated_data	range	partner_id	partner_id	5	NULL	103	Using where; Using filesort
    As I mentioned I don't know if this is the correct fix but the query was running at 0.75 second range and it now in the 0.09 second range. It fixes the "issue" now but is this the correct schema change as the data grows?


    As for creating and index on a datetime field, I thought I read that MySQL had trouble creating indexes on date fields (I could have misunderstood as well).
    Last edited by cranium; October 15th, 2013 at 03:32 PM.
  8. #5
  9. Contributing User
    Devshed Beginner (1000 - 1499 posts)

    Join Date
    Jun 2002
    Location
    close to the edge
    Posts
    1,035
    Rep Power
    69
    Removed the index on partner_id and added one for date_lead_submitted

    Explain
    Code:
    id	select_type	table			type	possible_key		key	key_len	ref	rows	Extra
    1	SIMPLE		consolidated_data	ALL	date_lead_submitted	NULL	NULL	NULL	356982	Using where; Using filesort
    The time jumps up to about 0.65 seconds for the query, much slower than the partner_id index. I did run a query or two w/ both indexes but that was slower than the singular index of partner_id.

    Still curious if this is the correct indexing for the long haul since this is still doing a filesort>
  10. #6
  11. Wiser? Not exactly.
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    May 2001
    Location
    Bonita Springs, FL
    Posts
    5,959
    Rep Power
    4035
    Exactly what indexes to add depend a bit on what kind of data you have and what would best reduce the number of rows.

    Add a single index across both the date_lead_submitted and partner_id columns and see how that affects things. Eg:
    Code:
    CREATE INDEX IX_date_partner ON consolidated_data (date_lead_submitted, partner_id);
    Recycle your old CD's, don't just trash them



    If I helped you out, show some love with some reputation, or tip with Bitcoins to 1N645HfYf63UbcvxajLKiSKpYHAq2Zxud
  12. #7
  13. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2013
    Posts
    158
    Rep Power
    11
    Still curious if this is the correct indexing for the long haul since this is still doing a filesort>
    The filesort is due to lack of an index the starts with (read: is sorted by) the startdate.

    That's why Kicken too recommends creating an index on both the date and the partner_id together. Putting the date first means the rest is ordered by date, just like your ORDER BY statement requires, so when the database reads the index it can simply copy the ordering from it, without having to do extra work.
  14. #8
  15. Contributing User
    Devshed Beginner (1000 - 1499 posts)

    Join Date
    Jun 2002
    Location
    close to the edge
    Posts
    1,035
    Rep Power
    69
    Thank you for your input. I created the index as mentioned:

    Code:
    ...
      KEY `date_partner` (`date_lead_submitted`,`partner_id`)
    ...
    The query ran in 0.65 seconds, back up to where it was before. The explain still throws out "filesort" as well:

    Code:
    id	select_type	table			type	possible_key		key	key_len	ref	rows	Extra
    1	SIMPLE		consolidated_data	ALL	date_partner		NULL	NULL	NULL	358295	Using where; Using filesort
    Also, this seems to cause it to scan the entire table again.

    Any other thoughts? I don't mind trying just about anything.
  16. #9
  17. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2013
    Posts
    158
    Rep Power
    11
    The explan has two columns; possible_key and key. The possible key says that it the index could be used, but the key column says NULL so no index is being used.

    This baye be because the database thinks that limiting by the date only doesn't remove enough records.
    You could try adding the email and phone fields to the index.

    Alternatively, you could try getting rid of the OR and/or the IN in the WHERE clause. they cause the database to have to scan the index twice, which may be a showstopper for the queryplanner. If that is much faster, you can probably get away with running several queries; one for the email and one for the phone, and merge them later. Those two together could still be faster than one query with OR.

    And yes; that's purely a matter of trial and error, as optimization usually is :-)
  18. #10
  19. Contributing User
    Devshed Beginner (1000 - 1499 posts)

    Join Date
    Jun 2002
    Location
    close to the edge
    Posts
    1,035
    Rep Power
    69
    Thank you for the thoughtful reply. I will work on this sometime today (or tomorrow if the boss finds other "priorities"). I will give some more feedback when I get the chance.
  20. #11
  21. No Profile Picture
    Problem Solver
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Jan 2001
    Location
    Stockholm, Sweden
    Posts
    4,495
    Rep Power
    534
    If you break out the different search conditions you have in your query and try different searches like:
    Code:
    SELECT count(*) 	FROM consolidated_data WHERE partner_id IN (77,499);
    SELECT count(*) 	FROM consolidated_data WHERE date_lead_submitted >= "2013-04-01" AND date_lead_submitted <= "2013-10-01";
    You will see that the condition on partner_id just returns 103 rows while your condition on date_lead_submitted would probably return about the 300000+ rows you had in another of your explains.
    That way you will know that you should definitely have partner_id indexed for this type of query.

    Then you can decide on if you want to include the date_lead_submitted column in this index (if you have AND's between conditions for different columns in the query the query can benefit from a composite index).

    So if you then try the query:
    Code:
    SELECT count(*) 	FROM consolidated_data
    WHERE
      partner_id IN (77,499)
      AND date_lead_submitted >= "2013-04-01"
      AND date_lead_submitted <= "2013-10-01";
    You can compare that result with the result from just WHERE partner_id IN (77,499).

    If the difference between these two had been great then you could have added the date_lead_submitted column to the composite index. But remember to place the column that returns fewest rows as the first column.

    This would in this case mean to flip the order of the PartnerId and Date in the index:
    Code:
    CREATE INDEX IX_partner_date ON consolidated_data (partner_id, date_lead_submitted);
    But at the same time, you would right now sort only 103 rows based on a single partner_id index. And that goes so fast that adding the date_lead_submitted to the index would probably just make the index bigger without speeding up anything.

    Because the negative effects of adding columns to an index is that the index becomes bigger and consumes memory and is slower to scan.

    A "Using filesort" in the explain isn't always bad. If the rows that has to be sorted is just a couple of hundred then the CPU will take care of that so fast that it's not worth trying to optimize it away with bigger indexes. But a "Using filesort" on 10,000+ rows, especially if you are selecting a lot of columns, becomes a big bottleneck and you should try to get rid of it changing your indexes or rewriting your query.

    Summary:
    Your initial question was if the single index on partner_id was enough for the future. And based on what we've seen now with your example query it should be more than enough.
    But if any queries include much more partner_id's in the IN() then the amount of rows might go up so much that adding the date column at the end of the index could be considered.
    /Stefan

IMN logo majestic logo threadwatch logo seochat tools logo