The Shed is going Social! Join us on FaceBook and Twitter and chime in on the conversation.
|
 |
|
Dev Shed Forums
> Databases
> MySQL Help
|
Query Performance
Discuss Query Performance in the MySQL Help forum on Dev Shed. Query Performance MySQL Help forum discussing administration, SQL syntax, and other MySQL-related topics. MySQL is an open-source relational database management system (RDBMS).
|
|
 |
|
|
|
|

Dev Shed Forums Sponsor:
|
|
|

February 26th, 2013, 05:12 PM
|
|
Registered User
|
|
Join Date: Jan 2010
Posts: 10
Time spent in forums: 1 h 55 m 3 sec
Reputation Power: 0
|
|
Query Performance
I am having difficulty figuring out how to improve the performance of a query. The query is used to obtain a summary of individuals from a family tree database based upon their names. Note that each individual may have more than one name. For example a married person has both a birth name and a married name (which may be the same). A person may also have an adopted name, or a nom de plume, etc.
Code:
SELECT DISTINCT tblNX.IDIR, tblNX.Surname,
tblIR.Surname, tblNX.GivenName,
tblIR.BirthD, tblIR.DeathD, tblIR.Gender
FROM tblNX JOIN tblIR on tblIR.IDIR=tblNX.IDIR
WHERE ((tblNX.Surname='Campbell' AND
tblNX.GivenName>='John') OR
tblNX.Surname>'Campbell') AND
tblNX.`Order`>=0
ORDER BY tblNX.SurName, tblNX.GivenName,
tblIR.BirthSD, tblIR.DeathSD LIMIT 50
The table containing the basic data on the individual is tblIR:
Code:
CREATE TABLE `tblIR` (
`ID` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`IDIR` INT(10) UNSIGNED DEFAULT NULL,
`FSID` VARCHAR(255) DEFAULT NULL,
`Surname` VARCHAR(120) DEFAULT NULL,
`SoundsLike` VARCHAR(4) DEFAULT NULL,
`GivenName` VARCHAR(120) DEFAULT NULL,
`Prefix` VARCHAR(120) DEFAULT NULL,
`Title` VARCHAR(120) DEFAULT NULL,
`NameNote` LONGTEXT,
`Gender` TINYINT(3) UNSIGNED NOT NULL DEFAULT 0,
`BirthD` VARCHAR(100) DEFAULT NULL,
`BirthSD` INT(10) NOT NULL DEFAULT -99999999,
`IDLRBirth` INT(10) UNSIGNED DEFAULT NULL,
`ChrisD` VARCHAR(100) DEFAULT NULL,
`ChrisSD` INT(10) NOT NULL DEFAULT -99999999,
`IDLRChris` INT(10) UNSIGNED DEFAULT NULL,
`ChrTerm` VARCHAR(100) DEFAULT NULL,
`DeathD` VARCHAR(100) DEFAULT NULL,
`DeathSD` INT(10) NOT NULL DEFAULT -99999999,
`IDLRDeath` INT(10) UNSIGNED DEFAULT NULL,
`BuriedD` VARCHAR(100) DEFAULT NULL,
`BuriedSD` INT(10) NOT NULL DEFAULT -99999999,
`IDLRBuried` INT(10) UNSIGNED DEFAULT NULL,
...
PRIMARY KEY (`ID`),
UNIQUE KEY `IDIR` (`IDIR`),
KEY `Surname` (`Surname`),
KEY `GivenName` (`GivenName`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
The nominal index, which includes all of the alternative names, is in table tblNX:
Code:
CREATE TABLE `tblNX` (
`IDNX` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`IDNR` INT(10) UNSIGNED DEFAULT NULL,
`IDIR` INT(10) UNSIGNED DEFAULT NULL,
`Order` smallint(5) NOT NULL DEFAULT 0,
`MarriedNameCreatedBy` TINYINT(3) UNSIGNED DEFAULT NULL,
`MarriedNameMarIDID` INT(10) UNSIGNED DEFAULT NULL,
`Prefix` VARCHAR(120) DEFAULT NULL,
`Title` VARCHAR(120) DEFAULT NULL,
`Surname` VARCHAR(120) NOT NULL DEFAULT '',
`GivenName` VARCHAR(120) NOT NULL DEFAULT '',
`SoundsLike` VARCHAR(4) NOT NULL DEFAULT '',
`UserRef` VARCHAR(50) DEFAULT NULL,
`AKANote` LONGTEXT,
`PreferredAKA` TINYINT(3) UNSIGNED NOT NULL DEFAULT 0,
`BirthSD` INT(10) NOT NULL DEFAULT -99999999,
`SrchTag` TINYINT(3) UNSIGNED NOT NULL DEFAULT 0,
`qsTag` TINYINT(3) UNSIGNED NOT NULL DEFAULT 0,
PRIMARY KEY (`IDNX`),
KEY `IDIR` (`IDIR`),
KEY `Surname` (`Surname`),
KEY `GivenName` (`GivenName`),
KEY `SoundsLike` (`SoundsLike`),
) ENGINE=MyISAM DEFAULT CHARSET=utf8
problem
Explain on this query returns:
Code:
+----+-------------+-------+------+------------------------+------+---------+------------------------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+------------------------+------+---------+------------------------------+-------+---------------------------------+
| 1 | SIMPLE | tblIR | ALL | IDIR | NULL | NULL | NULL | 67863 | Using temporary; Using filesort |
| 1 | SIMPLE | tblNX | ref | IDIR,Surname,GivenName | IDIR | 5 | jcobban_genealogy.tblIR.IDIR | 1 | Using where |
+----+-------------+-------+------+------------------------+------+---------+------------------------------+-------+---------------------------------+
As I understand this, the server is performing a linear search of the entire table tblIR, which currently has 67863 rows, and then uses IDIR to examine the associated rows of tblNX. This has poor performance and will only get worse as more rows are added. I do not understand why the primary search is not on tblNX, which is the only table referenced in the WHERE clause, or why the key of tblIR, which is explicitly mentioned in the JOIN, is not being used.
If I extract the core of the query I get:
Code:
explain select idir, surname, givenname from tblNX where ((surname='Campbell' and givenname>='John') or surname>'Campbell') order by surname, givenname, birthsd limit 20;
+----+-------------+-------+-------+-------------------+---------+---------+------+-------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-------------------+---------+---------+------+-------+-----------------------------+
| 1 | SIMPLE | tblNX | range | Surname,GivenName | Surname | 363 | NULL | 78515 | Using where; Using filesort |
+----+-------------+-------+-------+-------------------+---------+---------+------+-------+-----------------------------+
1 row in set (0.00 sec)
This does show the query exploiting the surname index, although not the given name index. The performance of this query is probably acceptable. The problem is that while tblNX contains all of the name information it does not contain the "maiden" surname, gender, birth date, and death date information which I need to display to the end user, which is why I did the join of tblIR.
How do I restructure this query to get MySQL to do the tblNX query exploiting the surname index and then join the tblIR information into the response?
|

February 28th, 2013, 04:13 AM
|
|
Problem Solver
|
|
Join Date: Jan 2001
Location: Stockholm, Sweden
|
|
Quote: | Originally Posted by jcobban
Code:
SELECT DISTINCT tblNX.IDIR, tblNX.Surname,
tblIR.Surname, tblNX.GivenName,
tblIR.BirthD, tblIR.DeathD, tblIR.Gender
FROM tblNX JOIN tblIR on tblIR.IDIR=tblNX.IDIR
WHERE ((tblNX.Surname='Campbell' AND
tblNX.GivenName>='John') OR
tblNX.Surname>'Campbell') AND
tblNX.`Order`>=0
ORDER BY tblNX.SurName, tblNX.GivenName,
tblIR.BirthSD, tblIR.DeathSD LIMIT 50
|
What are you actually testing for with your above conditions?
Because the operators >= and > are numeric operators and should not be used on strings in the way that you are using them. And most times when you force implicit data conversion like this indexes can't be used properly.
__________________
/Stefan
|

February 28th, 2013, 04:35 AM
|
|
|
Quote: | Originally Posted by sr What are you actually testing for with your above conditions?
Because the operators >= and > are numeric operators and should not be used on strings in the way that you are using them. And most times when you force implicit data conversion like this indexes can't be used properly. |
Actually to my belief (and trial/error experience) using the > and < operators do work for Strings.
I agree on the statement that the index does not work, because of the inequality operators used. The index is built for a specific value in the column, and not on any conversion of that value.
|

March 1st, 2013, 07:50 AM
|
|
Problem Solver
|
|
Join Date: Jan 2001
Location: Stockholm, Sweden
|
|
Quote: | Originally Posted by Aurum84 Actually to my belief (and trial/error experience) using the > and < operators do work for Strings.
|
Yes but usually not in the way that is expected (if one can expect anything from a numeric operator operating on strings  ).
Since it performs a byte-by-byte comparison of the two strings until it finds a difference and those bytes decide which is larger/smaller of the two strings you get comparison results like:
Code:
mysql> select 'AndersenHasAVeryLongName' > 'Campbell';
+-----------------------------------------+
| 'AndersenHasAVeryLongName' > 'Campbell' |
+-----------------------------------------+
| 0 |
+-----------------------------------------+
1 row in set (0.00 sec)
mysql> select 'Springsteen' > 'Campbell';
+----------------------------+
| 'Springsteen' > 'Campbell' |
+----------------------------+
| 1 |
+----------------------------+
1 row in set (0.00 sec)
That often doesn't really make any sense, I mean why is Springsteen larger than Campbell ? 
|

March 1st, 2013, 01:41 PM
|
 |
SQL Consultant
|
|
Join Date: Feb 2003
Location: Toronto Canada
|
|
Quote: | Originally Posted by sr ... I mean why is Springsteen larger than Campbell ?  | because "larger" is the wrong semantic interpretation of the operator
look, you use > and < on dates too, right?
would you say one date is "larger" than another (hint: no)
so why insist on using "numeric size" for all three different purposes of the operators?
|

March 1st, 2013, 03:38 PM
|
|
Problem Solver
|
|
Join Date: Jan 2001
Location: Stockholm, Sweden
|
|
Quote: | Originally Posted by r937
look, you use > and < on dates too, right?
would you say one date is "larger" than another (hint: no)
|
I beg to differ older man  and say yes.
Date is time and time is an ever incrementing value that you can add, subtract and calculate the difference between two of them.
And what part of a date is not a numeric series:
Code:
mysql> create table mydate (col1 date not null);
Query OK, 0 rows affected (0.05 sec)
mysql> insert into mydate values (current_date);
Query OK, 1 row affected (0.01 sec)
mysql> insert into mydate select max(col1) + interval 1 day from mydate;
Query OK, 1 row affected (0.00 sec)
...
mysql> select * from mydate;
+------------+
| col1 |
+------------+
| 2013-03-01 |
| 2013-03-02 |
| 2013-03-03 |
| 2013-03-04 |
| 2013-03-05 |
| 2013-03-06 |
| 2013-03-07 |
| 2013-03-08 |
| 2013-03-09 |
| 2013-03-10 |
| 2013-03-11 |
| 2013-03-12 |
| 2013-03-13 |
| 2013-03-14 |
| 2013-03-15 |
| 2013-03-16 |
| 2013-03-17 |
| 2013-03-18 |
| 2013-03-19 |
| 2013-03-20 |
| 2013-03-21 |
| 2013-03-22 |
| 2013-03-23 |
| 2013-03-24 |
| 2013-03-25 |
| 2013-03-26 |
| 2013-03-27 |
| 2013-03-28 |
| 2013-03-29 |
| 2013-03-30 |
| 2013-03-31 |
| 2013-04-01 |
| 2013-04-02 |
| 2013-04-03 |
+------------+
34 rows in set (0.00 sec)
The fact that you guys on the other side of the puddle don't usually use ISO8601 format to represent a date doesn't make it less of an incrementing series of values (albeit a bit odd base varying between 28-31 for the days). 
|

March 1st, 2013, 04:18 PM
|
 |
SQL Consultant
|
|
Join Date: Feb 2003
Location: Toronto Canada
|
|
Quote: | Originally Posted by sr Date is time and time is an ever incrementing value that you can add, subtract and calculate the difference between two of them.
And what part of a date is not a numeric series:
| strings are characters and characters are ever incrementing in the collating sequence
and what part of characters are not in a collating sequence?
also, i beg to differ, you can ~not~ add two dates
plus, stop lumping me in with the americans... we canadians are a lot smarter than they are, and we always use yyyy-mm-dd format

|
Developer Shed Advertisers and Affiliates
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Rate This Thread |
Linear Mode
|
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|