#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2001
    Location
    knee deep in sh*t
    Posts
    254
    Rep Power
    65

    Query runs to long in spite of index


    Hi All,

    I have a simple query to run on a table that takes ~11 - 15 seconds to complete:
    Code:
    SELECT SQL_NO_CACHE
    	*
    FROM
    	post_inquiries
    WHERE
    	campaign_id = 252
    Create syntax
    Code:
    CREATE TABLE `post_inquiries` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `affiliate_id` int(10) unsigned NOT NULL,
      `campaign_id` int(10) unsigned NOT NULL,
      `post_request_id` int(10) unsigned NOT NULL,
      `insert_date` datetime NOT NULL,
      `student_ip_address` varchar(15) DEFAULT NULL,
      `first_name` varchar(50) NOT NULL DEFAULT '',
      `middle_initial` char(1) NOT NULL DEFAULT '',
      `last_name` varchar(50) NOT NULL DEFAULT '',
      `address1` varchar(100) NOT NULL DEFAULT '',
      `address2` varchar(50) NOT NULL DEFAULT '',
      `address3` varchar(50) NOT NULL DEFAULT '',
      `city` varchar(30) NOT NULL DEFAULT '',
      `state` char(2) NOT NULL DEFAULT '',
      `zip` varchar(10) NOT NULL DEFAULT '',
      `country` char(2) NOT NULL DEFAULT '',
      `email` varchar(50) NOT NULL DEFAULT '',
      `phone` varchar(10) NOT NULL DEFAULT '',
      `phone2` varchar(10) NOT NULL DEFAULT '',
      `military` enum('n','y') NOT NULL DEFAULT 'n',
      `education_level` tinyint(3) unsigned NOT NULL DEFAULT '0',
      `graduation_year` smallint(6) NOT NULL DEFAULT '0',
      `birthday` date NOT NULL DEFAULT '0000-00-00',
      `high_school_name` varchar(100) NOT NULL,
      `citizen` enum('n','y') NOT NULL DEFAULT 'y',
      `grade_level` tinyint(3) unsigned NOT NULL,
      `field_of_interest` varchar(100) NOT NULL,
      `post_inquiries_batch_id` int(11) DEFAULT NULL,
      `preping_passback_id` int(10) DEFAULT NULL,
      `td_email_status_code` smallint(5) unsigned DEFAULT NULL,
      `td_phone_status_code` smallint(5) unsigned DEFAULT NULL,
      `processing` tinyint(4) DEFAULT NULL,
      `exported` tinyint(1) unsigned DEFAULT '0',
      `export_id` int(10) unsigned NOT NULL DEFAULT '0',
      `process` enum('n','y') NOT NULL DEFAULT 'n',
      `deleted` enum('n','y') NOT NULL DEFAULT 'n',
      PRIMARY KEY (`id`),
      KEY `campaign_id` (`campaign_id`)
    ) ENGINE=InnoDB
    The query is the "start" of something larger. There are ~460,000 records in the table and there are ~60 unique campign_id values.

    I did an explain for the query above:
    Code:
    select_type: SIMPLE
    table: post_inquiries
    type: ref
    possible_keys: campaign_id
    key: campaign_id
    key_len: 4
    ref: const
    rows: 229216	
    extra:
    Originally there were more indexes, I removed them (this is a dev db), optimized the table thinking that the number of indexes were causing an issue. No dice.

    What is going on to make the query take so long to run?

    Thank you for your time and help,
    oach
  2. #2
  3. SQL Consultant
    Devshed Supreme Being (6500+ posts)

    Join Date
    Feb 2003
    Location
    Toronto Canada
    Posts
    27,030
    Rep Power
    4210
    Originally Posted by oach
    What is going on to make the query take so long to run?
    perhaps the fact that you're trying to return all columns for approx 80,000 rows

    are you seriously going to process them all?

    Comments on this post

    • oach agrees : Thank you for the help
    r937.com | rudy.ca
    please visit Simply SQL and buy my book
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2001
    Location
    knee deep in sh*t
    Posts
    254
    Rep Power
    65
    Originally Posted by r937
    perhaps the fact that you're trying to return all columns for approx 80,000 rows

    are you seriously going to process them all?
    What the query is to evolve into is counting the number of times an ip appears in the table for a specific campaign_id over x number of days. We receive up to 5k records a day for certain campaigns. The x days will be in the 90 - 180 day range.

    So, yes, I need to look at a large amount of data to determine if a certain IP is flooding, refusing the data.

    Why would returning 80k rows on a simple id be so slow? Ideas on how I can get this done in a timely manner? Do I need to make a separate table for the ip joined to this main table?
  6. #4
  7. No Profile Picture
    Problem Solver
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Jan 2001
    Location
    Stockholm, Sweden
    Posts
    4,495
    Rep Power
    534
    Originally Posted by oach
    Why would returning 80k rows on a simple id be so slow?
    Reread Rudy's post:
    Originally Posted by r937
    perhaps the fact that you're trying to return all columns for approx 80,000 rows
    This means don't use SELECT * and expect good performance.
    In this case by using SELECT * you are say return all data for every matching row but since actually just want to check if an IP is flooding you don't have any use for knowing if the students name for every row is John Doe or Henry Ford and their address, email address, toilet paper preference etc, etc.

    So start by modifying this query so that you only return the columns that you need to solve the IP-flooding analysis.

    Now you are still returning as many rows, but you are returning less than 1% of the data compared to a SELECT * since you are avoiding a lot of unnecessary student information.

    So to solve your problem I guess the only two things that you need is the IP-address and the insert_date so that you can see when a certain event occurred.

    And since you need to loop through these I'm guessing that it's a good thing to have them in order so my suggestion for this query would be something like (unless you actually know the limits of the time span where you could add these in the SQL query itself and calculate how many times each IP has been used within that time span):
    Code:
    SELECT SQL_NO_CACHE
      ip-address
      ,insert_date
    FROM
    	post_inquiries
    WHERE
    	campaign_id = 252
    ORDER
      BY
        ip-address
        ,insert_date
    And with the query above you can add an index like:
    Code:
    ALTER TABLE post_inquiries ADD INDEX pi_ix_campaign_ip_insertdate (campaign_id, student_ip_address, insert_date);
    That can make a query like the one above lightning fast.

    PS:
    Here's performance trick 1A: Never, ever use SELECT * unless you know why you are doing it and not even then.

    Comments on this post

    • oach agrees : Thank you for the help
    /Stefan
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2001
    Location
    knee deep in sh*t
    Posts
    254
    Rep Power
    65
    Thank you both for your assistance. It greatly helped, especially the detailed reply as it gave me a better understanding of what is going on.

IMN logo majestic logo threadwatch logo seochat tools logo