|
|
|||||||||
|
|||||||||
| |||||||||
|
|
|
| |||||||||
![]() |
|
|
«
Previous Thread
|
Next Thread
»
|
Thread Tools | Search this Thread | Rate Thread | Display Modes |
|
|
|
Get inside! Sample the range of functionality easily built with JMSL Library for Time Series Data Analysis, Heat Maps, Portfolio Optimization, Monte Carlo Simulation, Stock Price Charting and more. Download Now! |
|
#1
|
|||
|
|||
|
database normalization vs (?) optimization
(Warning - this post might belong in the MySQL forum, but since it's a real beginner-type question I opted to put it here.)
I've described my work-in-progress in the MySQL forum, but for those who haven't seen it: I'm creating a database of articles and other information about political issues available on the internet. Each "item" will be coded as relating to various narrowly defined "subjects" - usually several subjects for each item. The subjects are then combined into broader "categories" for the user interface. I've done it this way because many subjects fit in more than one category, just like most items fit in more than one subject (and often, also in more than one category). A quick example: An article about Social Security privatization would, at minimum, be associated with the subjects taxes, budget, economic policy, social security, and the Bush Administration (it's his plan, after all). Each of those subjects (except taxes) are associated with more than one category. After some study and some help from kind users at my hosting provider's board, I've determined that the most "normalized" form of this database would be to have five tables: three "unique information" tables - items subjects categories and two tables to provide relationships between the three tables - item_to_sub cat_to_sub The select statement based on the above would (in part) look like this (breaking it up for easier reading): SELECT item, url, date, type, desc FROM items, item_to_sub, cat_to_sub WHERE items.item_id=item_to_sub.item_id AND item_to_sub.subject_id=cat_to_sub.subject_id AND cat_to_sub.category_id=$selected_category (note: all the "_id" fields are auto-increment primary keys (or indexed where they're "foreign keys")) But this seems to me to create unnecessary extra work for the database. IF I'm understanding the process right (a big "if"), the "AND"s in the above select would require the search to look at *every* item_id in each subject included in the category selected. I'm thinking of dispensing with the 'categories' and 'cat_to_sub' tables and using a select that includes an OR list of subjects - in effect "generating" categories via the subjects included in a select statment such as: SELECT DISTINCT item, url, date, type, desc FROM items, item_to_sub WHERE items.item_id=item_to_sub.item_id AND subject_id=(2 OR 6 OR 15 OR 39) It's not as "normalized" but it seems more optimized if (here's where my understanding may be faulty) the engine would stop searching on any given item_id as soon as it found ANY match for one of the subject_ids in the "OR" list. Wouldn't this be faster/better than having to search through ALL the "item_ids" associated with ALL the "subject id's" associated with the selected category before returning the results? (Note, I'm pretty sure I have the OR syntax wrong, but none of the books/tutorials I've read use multiple ORs in their examples. From other things I've seen, I'd bet I have to include the "subject_id=" part of the statment for each of the subjects identified in the OR list.) Am I misunderstanding how "OR" works? If so, can you explain where I'm wrong or point me to more info? Karen
__________________
Life is a Rorschach Test. Last edited by kjg : February 10th, 2002 at 02:35 AM. |
|
#2
|
|||
|
|||
|
'scuze me, moderators
Maybe this post should be moved to the MySQL forum? (If that can be done, that is... if not, I can just copy and paste it, but I didn't want to post it in two places if I could avoid it.)
|
|
#3
|
|||
|
|||
|
Thought I'd mention that I decided to include the category and cat_to_sub tables, and wait till I have a hundred or so records, then run the queries both ways to see which is faster/uses less resources.
If anybody's interested in the results, I'll post them here. Karen |
|
#4
|
||||
|
||||
|
You'd probably get more response if you posted this in the MySQL forum...
Anyway, if I understand you correctly, you have indexes on all of your PK's/FK's, which is a good start. But your first query would be more efficient if you used the ANSI syntax for JOINs instead of doing it in your WHERE clause. Doing so would help the database in finding what you want the most efficient way. Doing it in the where clause can/will slow things down. So some INNER JOINS would probably help you. As for normalizing to a what extent, it's really up to you and the way you implement your code to decide which is best. The more you normalize, the merrier will your database be (to a certain degree). Your second query needs OR subject_id=X OR subject_id=Y to avoid an error. Though it would be better to use a construct like ... WHERE subject_id IN (2, 6, 15, 39). The database wouldn't stop searching when it finds one criteria to be true, it would continue searching and evaluate every condition you put in there. That's one reason for why using the ANSI syntax is faster than your approach. Though I'm not absolutely sure on how the database works internally on these types of queries... But your approach with testing which way is faster, is definitly an excellent choice. Sometimes that's the only way to find things out... And just to be sure, here's some info about normalization that you probably already knew about... http://www.devshed.com/Server_Side/MySQL/Normal http://www.phpbuilder.com/columns/barry20000731.php3 Try a Google search on database normalization as well... Good luck //NoXcuz
__________________
UN*X is sexy! who | grep -i blonde | date; cd ~; unzip; touch; strip; finger; mount; gasp; yes; uptime; umount; sleep |
|
#5
|
|||
|
|||
|
um... er... ANSI???
Ok, I'm kinda kidding - I've actually seen the word, and I know what joins *are*, but I don't know how to make my query INTO a join. Part of my problem, I think, is that I'm actually trying to do two things at once with the query. First, as noted above, pull out all the items identified with all the subjects that comprise the category, and second, to suppress duplicate item entries where an item is associated with more than one subject in the category. I'm working on paring down my subject list to have fewer instances where an item will be associated with more than one subject in a category, but I can't get rid of all of them and retain the flexibility I need. Maybe there's a way to do it using PHP (I'm just getting started learning that), but so far, it seems that the only way I can suppress duplicates is to create a temporary "heap" table, then pull unique (DISTINCT) item_id's from it. **edit** I guess that isn't too clear. What I mean is I know what I want to do, and I even have a notion of how the code *should* go - that is, how it would go if I didn't have to take into account MySQL syntax or capabilities <G> - If I could write it any way I want, it would go something like: SELECT DISTINCT items.item_id FROM items, subjects WHERE items.item_id = subjects.item_id AND subjects.subject_id = (OR list of subject_id's here) (creates a temporary table) SELECT items.item_id, title, url, date, type, description, FROM items, temporary WHERE items.item_id = temporary.item_id Then the PHP coding for printing out the results. The problem is, I don't know if that's *possible* in MySQL syntax and capabilities, and if it IS possible, I don't know how to write it! **end edit** **edit 2** HEY! I don't even need the items table in the first part! SELECT DISTINCT item_id FROM subjects WHERE subject_id = (OR list or a "having" clause) (temporary table created) SELECT items.item_id, temporary.item_id, title, url, date, type, description, FROM items, temporary WHERE items.item_id = temporary.item_id etc... **end edit 2** I probably should have put this in the MySQL forum, but I figured it was the kind of question only a beginner would ask. Karen Last edited by kjg : February 17th, 2002 at 05:44 PM. |
![]() |
| Viewing: Dev Shed Forums > Other > Beginner Programming > database normalization vs (?) optimization |
| Thread Tools | Search this Thread |
| Display Modes | Rate This Thread |
|
|
|
|