MySQL Help
 
Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
User Name:
Password:
Remember me

The Shed is going Social! Join us on FaceBook and Twitter and chime in on the conversation.

Go Back   Dev Shed ForumsDatabasesMySQL Help

Reply
Add This Thread To:
  Del.icio.us   Digg   Google   Spurl   Blink   Furl   Simpy   Y! MyWeb 
Thread Tools Search this Thread Rate Thread Display Modes
 
Unread Dev Shed Forums Sponsor:
  #1  
Old February 26th, 2013, 05:12 PM
jcobban jcobban is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Jan 2010
Posts: 10 jcobban User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 1 h 55 m 3 sec
Reputation Power: 0
Arrow Query Performance

I am having difficulty figuring out how to improve the performance of a query. The query is used to obtain a summary of individuals from a family tree database based upon their names. Note that each individual may have more than one name. For example a married person has both a birth name and a married name (which may be the same). A person may also have an adopted name, or a nom de plume, etc.

Code:
SELECT DISTINCT tblNX.IDIR, tblNX.Surname,
			  tblIR.Surname, tblNX.GivenName,
			  tblIR.BirthD, tblIR.DeathD, tblIR.Gender
			FROM tblNX JOIN tblIR on tblIR.IDIR=tblNX.IDIR
			WHERE ((tblNX.Surname='Campbell' AND 
			                         tblNX.GivenName>='John') OR
			             tblNX.Surname>'Campbell') AND
                                                  tblNX.`Order`>=0 
			ORDER BY tblNX.SurName, tblNX.GivenName,
				 tblIR.BirthSD, tblIR.DeathSD LIMIT 50


The table containing the basic data on the individual is tblIR:

Code:
CREATE TABLE `tblIR` ( 
  `ID` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, 
  `IDIR` INT(10) UNSIGNED DEFAULT NULL, 
  `FSID` VARCHAR(255) DEFAULT NULL, 
  `Surname` VARCHAR(120) DEFAULT NULL, 
  `SoundsLike` VARCHAR(4) DEFAULT NULL, 
  `GivenName` VARCHAR(120) DEFAULT NULL, 
  `Prefix` VARCHAR(120) DEFAULT NULL, 
  `Title` VARCHAR(120) DEFAULT NULL, 
  `NameNote` LONGTEXT, 
  `Gender` TINYINT(3) UNSIGNED NOT NULL DEFAULT 0, 
  `BirthD` VARCHAR(100) DEFAULT NULL, 
  `BirthSD` INT(10) NOT NULL DEFAULT -99999999, 
  `IDLRBirth` INT(10) UNSIGNED DEFAULT NULL, 
  `ChrisD` VARCHAR(100) DEFAULT NULL, 
  `ChrisSD` INT(10) NOT NULL DEFAULT -99999999, 
  `IDLRChris` INT(10) UNSIGNED DEFAULT NULL, 
  `ChrTerm` VARCHAR(100) DEFAULT NULL, 
  `DeathD` VARCHAR(100) DEFAULT NULL, 
  `DeathSD` INT(10) NOT NULL DEFAULT -99999999, 
  `IDLRDeath` INT(10) UNSIGNED DEFAULT NULL, 
  `BuriedD` VARCHAR(100) DEFAULT NULL, 
  `BuriedSD` INT(10) NOT NULL DEFAULT -99999999, 
  `IDLRBuried` INT(10) UNSIGNED DEFAULT NULL, 
  ...
  PRIMARY KEY (`ID`), 
  UNIQUE KEY `IDIR` (`IDIR`), 
  KEY `Surname` (`Surname`), 
  KEY `GivenName` (`GivenName`) 
) ENGINE=MyISAM DEFAULT CHARSET=utf8 


The nominal index, which includes all of the alternative names, is in table tblNX:
Code:
CREATE TABLE `tblNX` ( 
  `IDNX` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, 
  `IDNR` INT(10) UNSIGNED DEFAULT NULL, 
  `IDIR` INT(10) UNSIGNED DEFAULT NULL, 
  `Order` smallint(5) NOT NULL DEFAULT 0, 
  `MarriedNameCreatedBy` TINYINT(3) UNSIGNED DEFAULT NULL, 
  `MarriedNameMarIDID` INT(10) UNSIGNED DEFAULT NULL, 
  `Prefix` VARCHAR(120) DEFAULT NULL, 
  `Title` VARCHAR(120) DEFAULT NULL, 
  `Surname` VARCHAR(120) NOT NULL DEFAULT '', 
  `GivenName` VARCHAR(120) NOT NULL DEFAULT '', 
  `SoundsLike` VARCHAR(4) NOT NULL DEFAULT '', 
  `UserRef` VARCHAR(50) DEFAULT NULL, 
  `AKANote` LONGTEXT, 
  `PreferredAKA` TINYINT(3) UNSIGNED NOT NULL DEFAULT 0, 
  `BirthSD` INT(10) NOT NULL DEFAULT -99999999, 
  `SrchTag` TINYINT(3) UNSIGNED NOT NULL DEFAULT 0, 
  `qsTag` TINYINT(3) UNSIGNED NOT NULL DEFAULT 0, 
  PRIMARY KEY (`IDNX`), 
  KEY `IDIR` (`IDIR`), 
  KEY `Surname` (`Surname`), 
  KEY `GivenName` (`GivenName`), 
  KEY `SoundsLike` (`SoundsLike`), 
) ENGINE=MyISAM DEFAULT CHARSET=utf8

problem
Explain on this query returns:
Code:
+----+-------------+-------+------+------------------------+------+---------+------------------------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys          | key  | key_len | ref                          | rows  | Extra                           |
+----+-------------+-------+------+------------------------+------+---------+------------------------------+-------+---------------------------------+
|  1 | SIMPLE      | tblIR | ALL  | IDIR                   | NULL | NULL    | NULL                         | 67863 | Using temporary; Using filesort |
|  1 | SIMPLE      | tblNX | ref  | IDIR,Surname,GivenName | IDIR | 5       | jcobban_genealogy.tblIR.IDIR |     1 | Using where                     |
+----+-------------+-------+------+------------------------+------+---------+------------------------------+-------+---------------------------------+

As I understand this, the server is performing a linear search of the entire table tblIR, which currently has 67863 rows, and then uses IDIR to examine the associated rows of tblNX. This has poor performance and will only get worse as more rows are added. I do not understand why the primary search is not on tblNX, which is the only table referenced in the WHERE clause, or why the key of tblIR, which is explicitly mentioned in the JOIN, is not being used.

If I extract the core of the query I get:
Code:
explain select idir, surname, givenname from tblNX where ((surname='Campbell' and givenname>='John') or surname>'Campbell') order by surname, givenname, birthsd limit 20;
+----+-------------+-------+-------+-------------------+---------+---------+------+-------+-----------------------------+
| id | select_type | table | type  | possible_keys     | key     | key_len | ref  | rows  | Extra                       |
+----+-------------+-------+-------+-------------------+---------+---------+------+-------+-----------------------------+
|  1 | SIMPLE      | tblNX | range | Surname,GivenName | Surname | 363     | NULL | 78515 | Using where; Using filesort |
+----+-------------+-------+-------+-------------------+---------+---------+------+-------+-----------------------------+
1 row in set (0.00 sec)

This does show the query exploiting the surname index, although not the given name index. The performance of this query is probably acceptable. The problem is that while tblNX contains all of the name information it does not contain the "maiden" surname, gender, birth date, and death date information which I need to display to the end user, which is why I did the join of tblIR.

How do I restructure this query to get MySQL to do the tblNX query exploiting the surname index and then join the tblIR information into the response?

Reply With Quote
  #2  
Old February 28th, 2013, 04:13 AM
sr sr is online now
Problem Solver
Dev Shed Specialist (4000 - 4499 posts)
 
Join Date: Jan 2001
Location: Stockholm, Sweden
Posts: 4,430 sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level) 
Time spent in forums: 3 Weeks 4 Days 7 h 26 m 2 sec
Reputation Power: 532
Quote:
Originally Posted by jcobban
Code:
SELECT DISTINCT tblNX.IDIR, tblNX.Surname,
			  tblIR.Surname, tblNX.GivenName,
			  tblIR.BirthD, tblIR.DeathD, tblIR.Gender
			FROM tblNX JOIN tblIR on tblIR.IDIR=tblNX.IDIR
			WHERE ((tblNX.Surname='Campbell' AND 
			                         tblNX.GivenName>='John') OR
			             tblNX.Surname>'Campbell') AND
                                                  tblNX.`Order`>=0 
			ORDER BY tblNX.SurName, tblNX.GivenName,
				 tblIR.BirthSD, tblIR.DeathSD LIMIT 50


What are you actually testing for with your above conditions?
Because the operators >= and > are numeric operators and should not be used on strings in the way that you are using them. And most times when you force implicit data conversion like this indexes can't be used properly.
__________________
/Stefan

Reply With Quote
  #3  
Old February 28th, 2013, 04:35 AM
Aurum84 Aurum84 is offline
Contributing User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Nov 2012
Posts: 74 Aurum84 User rank is Sergeant (500 - 2000 Reputation Level)Aurum84 User rank is Sergeant (500 - 2000 Reputation Level)Aurum84 User rank is Sergeant (500 - 2000 Reputation Level)Aurum84 User rank is Sergeant (500 - 2000 Reputation Level)Aurum84 User rank is Sergeant (500 - 2000 Reputation Level) 
Time spent in forums: 3 Days 23 h 5 m 49 sec
Reputation Power: 17
Quote:
Originally Posted by sr
What are you actually testing for with your above conditions?
Because the operators >= and > are numeric operators and should not be used on strings in the way that you are using them. And most times when you force implicit data conversion like this indexes can't be used properly.


Actually to my belief (and trial/error experience) using the > and < operators do work for Strings.
I agree on the statement that the index does not work, because of the inequality operators used. The index is built for a specific value in the column, and not on any conversion of that value.

Reply With Quote
  #4  
Old March 1st, 2013, 07:50 AM
sr sr is online now
Problem Solver
Dev Shed Specialist (4000 - 4499 posts)
 
Join Date: Jan 2001
Location: Stockholm, Sweden
Posts: 4,430 sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level) 
Time spent in forums: 3 Weeks 4 Days 7 h 26 m 2 sec
Reputation Power: 532
Quote:
Originally Posted by Aurum84
Actually to my belief (and trial/error experience) using the > and < operators do work for Strings.

Yes but usually not in the way that is expected (if one can expect anything from a numeric operator operating on strings ).

Since it performs a byte-by-byte comparison of the two strings until it finds a difference and those bytes decide which is larger/smaller of the two strings you get comparison results like:
Code:
mysql> select 'AndersenHasAVeryLongName' > 'Campbell';
+-----------------------------------------+
| 'AndersenHasAVeryLongName' > 'Campbell' |
+-----------------------------------------+
|                                       0 |
+-----------------------------------------+
1 row in set (0.00 sec)

mysql> select 'Springsteen' > 'Campbell';
+----------------------------+
| 'Springsteen' > 'Campbell' |
+----------------------------+
|                          1 |
+----------------------------+
1 row in set (0.00 sec)

That often doesn't really make any sense, I mean why is Springsteen larger than Campbell ?

Reply With Quote
  #5  
Old March 1st, 2013, 01:41 PM
r937's Avatar
r937 r937 is offline
SQL Consultant
Click here for more information.
 
Join Date: Feb 2003
Location: Toronto Canada
Posts: 26,375 r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level) 
Time spent in forums: 3 Months 1 Week 2 Days 7 h 30 m 23 sec
Reputation Power: 4140
Quote:
Originally Posted by sr
... I mean why is Springsteen larger than Campbell ?
because "larger" is the wrong semantic interpretation of the operator

look, you use > and < on dates too, right?

would you say one date is "larger" than another (hint: no)

so why insist on using "numeric size" for all three different purposes of the operators?
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book

Reply With Quote
  #6  
Old March 1st, 2013, 03:38 PM
sr sr is online now
Problem Solver
Dev Shed Specialist (4000 - 4499 posts)
 
Join Date: Jan 2001
Location: Stockholm, Sweden
Posts: 4,430 sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level)sr User rank is Colonel (50000 - 60000 Reputation Level) 
Time spent in forums: 3 Weeks 4 Days 7 h 26 m 2 sec
Reputation Power: 532
Quote:
Originally Posted by r937
look, you use > and < on dates too, right?

would you say one date is "larger" than another (hint: no)

I beg to differ older man and say yes.

Date is time and time is an ever incrementing value that you can add, subtract and calculate the difference between two of them.

And what part of a date is not a numeric series:
Code:
mysql> create table mydate (col1 date not null);
Query OK, 0 rows affected (0.05 sec)

mysql> insert into mydate values (current_date);
Query OK, 1 row affected (0.01 sec)

mysql> insert into mydate select max(col1) + interval 1 day from mydate;
Query OK, 1 row affected (0.00 sec)

...


mysql> select * from mydate;
+------------+
| col1       |
+------------+
| 2013-03-01 |
| 2013-03-02 |
| 2013-03-03 |
| 2013-03-04 |
| 2013-03-05 |
| 2013-03-06 |
| 2013-03-07 |
| 2013-03-08 |
| 2013-03-09 |
| 2013-03-10 |
| 2013-03-11 |
| 2013-03-12 |
| 2013-03-13 |
| 2013-03-14 |
| 2013-03-15 |
| 2013-03-16 |
| 2013-03-17 |
| 2013-03-18 |
| 2013-03-19 |
| 2013-03-20 |
| 2013-03-21 |
| 2013-03-22 |
| 2013-03-23 |
| 2013-03-24 |
| 2013-03-25 |
| 2013-03-26 |
| 2013-03-27 |
| 2013-03-28 |
| 2013-03-29 |
| 2013-03-30 |
| 2013-03-31 |
| 2013-04-01 |
| 2013-04-02 |
| 2013-04-03 |
+------------+
34 rows in set (0.00 sec)


The fact that you guys on the other side of the puddle don't usually use ISO8601 format to represent a date doesn't make it less of an incrementing series of values (albeit a bit odd base varying between 28-31 for the days).

Reply With Quote
  #7  
Old March 1st, 2013, 04:18 PM
r937's Avatar
r937 r937 is offline
SQL Consultant
Click here for more information.
 
Join Date: Feb 2003
Location: Toronto Canada
Posts: 26,375 r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level)r937 User rank is General 47th Grade (Above 100000 Reputation Level) 
Time spent in forums: 3 Months 1 Week 2 Days 7 h 30 m 23 sec
Reputation Power: 4140
Quote:
Originally Posted by sr
Date is time and time is an ever incrementing value that you can add, subtract and calculate the difference between two of them.

And what part of a date is not a numeric series:
strings are characters and characters are ever incrementing in the collating sequence

and what part of characters are not in a collating sequence?

also, i beg to differ, you can ~not~ add two dates

plus, stop lumping me in with the americans... we canadians are a lot smarter than they are, and we always use yyyy-mm-dd format


Reply With Quote
Reply

Viewing: Dev Shed ForumsDatabasesMySQL Help > Query Performance

Developer Shed Advertisers and Affiliates



Thread Tools  Search this Thread 
Search this Thread:

Advanced Search
Display Modes  Rate This Thread 
Rate This Thread:


Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
View Your Warnings | New Posts | Latest News | Latest Threads | Shoutbox
Forum Jump

Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
  
 


Powered by: vBulletin Version 3.0.5
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.

© 2003-2013 by Developer Shed. All rights reserved. DS Cluster - Follow our Sitemap