Beginner Programming
 
Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
User Name:
Password:
Remember me
Go Back   Dev Shed ForumsOtherBeginner Programming

Reply
Add This Thread To:
  Del.icio.us   Digg   Google   Spurl   Blink   Furl   Simpy   Y! MyWeb 
Thread Tools Search this Thread Rate Thread Display Modes
 
Unread Dev Shed Forums Sponsor:
Get inside! Sample the range of functionality easily built with JMSL Library for Time Series Data Analysis, Heat Maps, Portfolio Optimization, Monte Carlo Simulation, Stock Price Charting and more. Download Now!
  #1  
Old February 8th, 2002, 06:16 PM
kjg kjg is offline
apprentice geek
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Jan 2002
Location: USA
Posts: 37 kjg User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 7 h 13 m 31 sec
Reputation Power: 7
database normalization vs (?) optimization

(Warning - this post might belong in the MySQL forum, but since it's a real beginner-type question I opted to put it here.)

I've described my work-in-progress in the MySQL forum, but for those who haven't seen it: I'm creating a database of articles and other information about political issues available on the internet. Each "item" will be coded as relating to various narrowly defined "subjects" - usually several subjects for each item. The subjects are then combined into broader "categories" for the user interface. I've done it this way because many subjects fit in more than one category, just like most items fit in more than one subject (and often, also in more than one category).

A quick example: An article about Social Security privatization would, at minimum, be associated with the subjects taxes, budget, economic policy, social security, and the Bush Administration (it's his plan, after all). Each of those subjects (except taxes) are associated with more than one category.

After some study and some help from kind users at my hosting provider's board, I've determined that the most "normalized" form of this database would be to have five tables: three "unique information" tables -

items
subjects
categories

and two tables to provide relationships between the three tables -

item_to_sub
cat_to_sub

The select statement based on the above would (in part) look like this (breaking it up for easier reading):

SELECT item, url, date, type, desc

FROM items, item_to_sub, cat_to_sub

WHERE items.item_id=item_to_sub.item_id
AND item_to_sub.subject_id=cat_to_sub.subject_id
AND cat_to_sub.category_id=$selected_category

(note: all the "_id" fields are auto-increment primary keys (or indexed where they're "foreign keys"))

But this seems to me to create unnecessary extra work for the database. IF I'm understanding the process right (a big "if"), the "AND"s in the above select would require the search to look at *every* item_id in each subject included in the category selected.

I'm thinking of dispensing with the 'categories' and 'cat_to_sub' tables and using a select that includes an OR list of subjects - in effect "generating" categories via the subjects included in a select statment such as:

SELECT DISTINCT item, url, date, type, desc

FROM items, item_to_sub

WHERE items.item_id=item_to_sub.item_id
AND subject_id=(2 OR 6 OR 15 OR 39)

It's not as "normalized" but it seems more optimized if (here's where my understanding may be faulty) the engine would stop searching on any given item_id as soon as it found ANY match for one of the subject_ids in the "OR" list. Wouldn't this be faster/better than having to search through ALL the "item_ids" associated with ALL the "subject id's" associated with the selected category before returning the results?

(Note, I'm pretty sure I have the OR syntax wrong, but none of the books/tutorials I've read use multiple ORs in their examples. From other things I've seen, I'd bet I have to include the "subject_id=" part of the statment for each of the subjects identified in the OR list.)

Am I misunderstanding how "OR" works? If so, can you explain where I'm wrong or point me to more info?

Karen
__________________
Life is a Rorschach Test.

Last edited by kjg : February 10th, 2002 at 02:35 AM.

Reply With Quote
  #2  
Old February 10th, 2002, 02:38 AM
kjg kjg is offline
apprentice geek
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Jan 2002
Location: USA
Posts: 37 kjg User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 7 h 13 m 31 sec
Reputation Power: 7
'scuze me, moderators

Maybe this post should be moved to the MySQL forum? (If that can be done, that is... if not, I can just copy and paste it, but I didn't want to post it in two places if I could avoid it.)

Reply With Quote
  #3  
Old February 13th, 2002, 02:29 AM
kjg kjg is offline
apprentice geek
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Jan 2002
Location: USA
Posts: 37 kjg User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 7 h 13 m 31 sec
Reputation Power: 7
Thought I'd mention that I decided to include the category and cat_to_sub tables, and wait till I have a hundred or so records, then run the queries both ways to see which is faster/uses less resources.

If anybody's interested in the results, I'll post them here.

Karen

Reply With Quote
  #4  
Old February 16th, 2002, 06:37 PM
NoXcuz's Avatar
NoXcuz NoXcuz is offline
Wiking
Dev Shed Expert (3500 - 3999 posts)
 
Join Date: Sep 2000
Location: Sweden
Posts: 3,608 NoXcuz User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 7 h 49 m 27 sec
Reputation Power: 11
You'd probably get more response if you posted this in the MySQL forum...
Anyway, if I understand you correctly, you have indexes on all of your PK's/FK's, which is a good start. But your first query would be more efficient if you used the ANSI syntax for JOINs instead of doing it in your WHERE clause. Doing so would help the database in finding what you want the most efficient way. Doing it in the where clause can/will slow things down. So some INNER JOINS would probably help you.

As for normalizing to a what extent, it's really up to you and the way you implement your code to decide which is best. The more you normalize, the merrier will your database be (to a certain degree).

Your second query needs OR subject_id=X OR subject_id=Y to avoid an error. Though it would be better to use a construct like
... WHERE subject_id IN (2, 6, 15, 39).

The database wouldn't stop searching when it finds one criteria to be true, it would continue searching and evaluate every condition you put in there. That's one reason for why using the ANSI syntax is faster than your approach. Though I'm not absolutely sure on how the database works internally on these types of queries...

But your approach with testing which way is faster, is definitly an excellent choice. Sometimes that's the only way to find things out...

And just to be sure, here's some info about normalization that you probably already knew about...
http://www.devshed.com/Server_Side/MySQL/Normal
http://www.phpbuilder.com/columns/barry20000731.php3
Try a Google search on database normalization as well...

Good luck

//NoXcuz
__________________
UN*X is sexy!
who | grep -i blonde | date; cd ~; unzip; touch; strip; finger; mount; gasp; yes; uptime; umount; sleep

Reply With Quote
  #5  
Old February 17th, 2002, 04:53 PM
kjg kjg is offline
apprentice geek
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Jan 2002
Location: USA
Posts: 37 kjg User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 7 h 13 m 31 sec
Reputation Power: 7
um... er... ANSI???



Ok, I'm kinda kidding - I've actually seen the word, and I know what joins *are*, but I don't know how to make my query INTO a join. Part of my problem, I think, is that I'm actually trying to do two things at once with the query. First, as noted above, pull out all the items identified with all the subjects that comprise the category, and second, to suppress duplicate item entries where an item is associated with more than one subject in the category.

I'm working on paring down my subject list to have fewer instances where an item will be associated with more than one subject in a category, but I can't get rid of all of them and retain the flexibility I need.

Maybe there's a way to do it using PHP (I'm just getting started learning that), but so far, it seems that the only way I can suppress duplicates is to create a temporary "heap" table, then pull unique (DISTINCT) item_id's from it.

**edit**
I guess that isn't too clear. What I mean is I know what I want to do, and I even have a notion of how the code *should* go - that is, how it would go if I didn't have to take into account MySQL syntax or capabilities <G> - If I could write it any way I want, it would go something like:

SELECT DISTINCT items.item_id FROM items, subjects
WHERE items.item_id = subjects.item_id
AND subjects.subject_id = (OR list of subject_id's here)

(creates a temporary table)

SELECT items.item_id, title, url, date, type, description, FROM items, temporary
WHERE items.item_id = temporary.item_id

Then the PHP coding for printing out the results.

The problem is, I don't know if that's *possible* in MySQL syntax and capabilities, and if it IS possible, I don't know how to write it!

**end edit**

**edit 2**

HEY! I don't even need the items table in the first part!

SELECT DISTINCT item_id FROM subjects
WHERE subject_id = (OR list or a "having" clause)

(temporary table created)

SELECT items.item_id, temporary.item_id, title, url, date, type, description, FROM items, temporary
WHERE items.item_id = temporary.item_id

etc...

**end edit 2**

I probably should have put this in the MySQL forum, but I figured it was the kind of question only a beginner would ask.

Karen

Last edited by kjg : February 17th, 2002 at 05:44 PM.

Reply With Quote
Reply

Viewing: Dev Shed ForumsOtherBeginner Programming > database normalization vs (?) optimization


Thread Tools  Search this Thread 
Search this Thread:

Advanced Search
Display Modes  Rate This Thread 
Rate This Thread:


Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
View Your Warnings | New Posts | Latest News | Latest Threads | Shoutbox
Forum Jump


Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
  
 





© 2003-2008 by Developer Shed. All rights reserved. DS Cluster 3 hosted by Hostway