SunQuest
           Database Management
 
Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
User Name:
Password:
Remember me
Go Back   Dev Shed ForumsDatabasesDatabase Management

Reply
Add This Thread To:
  Del.icio.us   Digg   Google   Spurl   Blink   Furl   Simpy   Y! MyWeb 
Thread Tools Search this Thread Rate Thread Display Modes
 
Unread Dev Shed Forums Sponsor:
Be the architects of evolution and help create the mobile internet future. It’s your move---enter to win here!
  #1  
Old June 28th, 2002, 08:16 PM
jagerkiss jagerkiss is offline
Member
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Sep 2001
Location: Yogyakarta, Indonesia
Posts: 22 jagerkiss User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
Normalization: Always Better?

Hello all!

I think my title already explain what I want to ask.
All I want to know is: Is there any possibility that an unnormalized database is faster, more efficient than a normalized one?
Or: Is it a certain that a normalized database always better than an unnormalized one?

Thanks

Reply With Quote
  #2  
Old June 28th, 2002, 08:22 PM
rod k rod k is offline
Apprentice Deity
Dev Shed Loyal (3000 - 3499 posts)
 
Join Date: Jul 1999
Location: Niagara Falls (On the wrong side of the gorge)
Posts: 3,237 rod k User rank is Private First Class (20 - 50 Reputation Level)rod k User rank is Private First Class (20 - 50 Reputation Level) 
Time spent in forums: 4 m 8 sec
Reputation Power: 13
Send a message via AIM to rod k
Normalized databases can be slower do to possible need to do extensive joins.
__________________
FSBO (For Sale By Owner) Realty

Reply With Quote
  #3  
Old June 28th, 2002, 09:16 PM
rycamor rycamor is offline
Gödelian monster
Dev Shed Regular (2000 - 2499 posts)
 
Join Date: Jul 1999
Location: Pembroke Pines, Florida, USA
Posts: 2,300 rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level) 
Time spent in forums: 3 Days 3 h 2 m 4 sec
Reputation Power: 43
That topic has generated lots of discussion at comp.databases.theory, and www.dbdebunk.com, as well as many other locations for serious database talk.

Those who advise "denormalization for performance" generally approach it two ways: 1) denormalize your tables that cause the most performance hit, and just make sure (or hope) that your application code maintains data integrity, or 2) use denormalization in databases that are mainly for "data warehousing" or in other situations where the databases are read-only, in which case you don't need to worry about integrity.

Those who argue against the "denormalization for performance" concept point out that if you are using approach #1 above, then your database might perform better, but you will lose all that advantage by handling constraints in your application code. (And, you stand more chance of missing certain key constraints and corrupting your database). As far as situation #2, for data warehousing, or "OLAP", the question always arises "what happens when you want to write new data to the system?". If you aren't careful, you could be sorry. Also, your denormalized database causes many more design problems when you want to restructure it, or add new queries.

At www.dbdebunk.com, Fabian Pascal and others argue that the only reason normalized, relational databases have a performance hit is because of the vendors' bad/incomplete implementation of the relational data theory (meaning SQL). Theoretically, the relating of tables in joins is only a logical operation, and the database engine itself should be able to do whatever it needs in physical storage to optimize those queries. (for a very rudimentary example of this concept, Oracle has "materialized views", in which joins are propagated to a separate table, which can be queried without joins. Oracle's back end maintains the integrity of this data) The basic idea is that normalization is a logical concept, which the database presents to the user. The user should be completely shielded from physical storage considerations, and the database engine itself can essentially operate in any way needed to optimize the back end.

Unfortunately, we are left for the moment with DB products that don't really provide much in the way of this sort of optimization. So, the question we have to ask is "what is the trade-off". I personally agree that denormalization for performance brings many hidden costs, such as extra application code, bugs, etc..., so they should be avoided whenever possible. Also, normalization has many faces. Often you can get better performance simply by rethinking your design as a whole, without denormalizing. In situation #2, I suppose I can see a little more justification for denormalization. But for a live system, with constant updates, I would never trust a denormalized system. (been there, done that)
__________________
The real n-tier system:

FreeBSD -> PostgreSQL -> [any_language] -> Apache -> Mozilla/XUL

Amazon wishlist -- rycamor (at) gmail.com

Reply With Quote
  #4  
Old July 5th, 2002, 10:16 AM
MattR MattR is offline
Contributing User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Jun 2001
Location: High above the mucky-muck (Columbus, OH)
Posts: 266 MattR User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 8
Send a message via ICQ to MattR
The only time I've ever noticed a performance increase in de-normalizing was for aggregate-type queries. It is hard to, on the fly, take a 4+ million row and SUM( * ) or COUNT or AVG etc. WHERE some_composite_key = XYZ AND other_things

So we store the de-normalized version in a table with the counts and update them when a new row is inserted. The UPDATE cost is far less than the SELECT since we're updating a single row in a small table (< 40,000 rows).

The other problem situation I've noticed is on JOIN type queries

For example, this forum software stores forums and moderators in separate tables. If you were to JOIN the moderators and forum to each other in order to grab the mods of all the forums and the forum info in one query you will have a LOT of I/O. Why? Because the join will pull back lots of duplicate forum information, e.g.:

Code:
SELECT f.*,
       u.username,
       u.userid
  FROM forum                f
 INNER JOIN forum_moderator fm ON fm.forumid = f.forumid
 INNER JOIN user            u  ON u.userid   = fm.userid


So if you have 10 moderators you're going to have all the forum info duplicated, like this:

Code:
forumid, forum_title, etc. ... username, userid
-----------------------------------------------
1, 'Forum title', ... 'MattR', 1
1, 'Forum title', ... 'Joebob', 2
1, 'Forum title', ... 'Hippy', 23
1, 'Forum title', ... 'Admin', 234
1, 'Forum title', ... 'UserA', 3461
1, 'Forum title', ... 'UserB', 331
1, 'Forum title', ... 'Happy Gilmore', 97121


Which leads to a LOT of I/O back to your client application.

It's not necessarily a good candidate for de-normilization but if you say stored the forum moderators in a single row (horribly denormalized by forum_moderator( username1, userid1, username2, userid2, ... )) you could bring back the mods in a single result set.

Reply With Quote
  #5  
Old July 5th, 2002, 03:32 PM
pabloj's Avatar
pabloj pabloj is offline
Modding: Oracle MsSQL Firebird
Dev Shed God 6th Plane (7500 - 7999 posts)
 
Join Date: Jun 2001
Location: Outside US
Posts: 7,686 pabloj User rank is Captain (20000 - 30000 Reputation Level)pabloj User rank is Captain (20000 - 30000 Reputation Level)pabloj User rank is Captain (20000 - 30000 Reputation Level)pabloj User rank is Captain (20000 - 30000 Reputation Level)pabloj User rank is Captain (20000 - 30000 Reputation Level)pabloj User rank is Captain (20000 - 30000 Reputation Level)pabloj User rank is Captain (20000 - 30000 Reputation Level)pabloj User rank is Captain (20000 - 30000 Reputation Level)pabloj User rank is Captain (20000 - 30000 Reputation Level) 
Time spent in forums: 2 Months 2 Weeks 5 Days 16 h 27 m 51 sec
Reputation Power: 259
If you are not registering transactions but analyzing a huge amount data, at various aggregation levels, then denormalization is better. Take a look at http://publib-b.boulder.ibm.com/Red...f4?OpenDocument (Data Modeling Techniques for Data Warehousing) it is very interesting

Reply With Quote
  #6  
Old July 6th, 2002, 10:12 AM
rycamor rycamor is offline
Gödelian monster
Dev Shed Regular (2000 - 2499 posts)
 
Join Date: Jul 1999
Location: Pembroke Pines, Florida, USA
Posts: 2,300 rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level) 
Time spent in forums: 3 Days 3 h 2 m 4 sec
Reputation Power: 43

Reply With Quote
  #7  
Old July 9th, 2002, 09:17 PM
jagerkiss jagerkiss is offline
Member
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Sep 2001
Location: Yogyakarta, Indonesia
Posts: 22 jagerkiss User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
Actually, I have developed a web translator which uses a database for its dictionary and other assisting data. I didn't normalized that dictionary, because I thought that would be faster. Also, I won't do many changes on the data on that dictionary.

Lately, I began to think, "Is there any faster or more efficient way to develop my dictionary?"

That's the background of my question ...

Reply With Quote
  #8  
Old July 9th, 2002, 11:14 PM
rycamor rycamor is offline
Gödelian monster
Dev Shed Regular (2000 - 2499 posts)
 
Join Date: Jul 1999
Location: Pembroke Pines, Florida, USA
Posts: 2,300 rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level) 
Time spent in forums: 3 Days 3 h 2 m 4 sec
Reputation Power: 43
Well, one of the real benefits of normalization is that you have much more complete ways to manipulate your data, while producing less anomalies. Even when if normalization slows your server down, it can speed up development of your application, because you can push more logic onto the database. I almost always consider that a fair trade-off. If you really take advantage of referential integrity, views, stored procedures, etc... you can do some incredible manipulation of your data.

May I ask what database management system you are using for this project, and what programming language?

Reply With Quote
  #9  
Old July 11th, 2002, 07:46 AM
jagerkiss jagerkiss is offline
Member
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Sep 2001
Location: Yogyakarta, Indonesia
Posts: 22 jagerkiss User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
DBMS: MySQL, programming language: PHP

Reply With Quote
  #10  
Old July 11th, 2002, 08:14 AM
jagerkiss jagerkiss is offline
Member
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Sep 2001
Location: Yogyakarta, Indonesia
Posts: 22 jagerkiss User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
I want to add some details, but I feel difficult to explain this, so if there's any confusing part, just ask me, and I'll try to explain better.

My application is accessing the dictionary twice, first, when it needs to get the word type (noun,verb,etc) for each word, second, when it needs to get the meaning of each word. Each access has different query. So, I don't need to have any type of connection (between word_type_table and word_meaning_table) to access the data. (That's because I access them seperately.)

Pfew ... I have done my best to explain ...

Reply With Quote
  #11  
Old August 11th, 2002, 07:00 PM
rycamor rycamor is offline
Gödelian monster
Dev Shed Regular (2000 - 2499 posts)
 
Join Date: Jul 1999
Location: Pembroke Pines, Florida, USA
Posts: 2,300 rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level) 
Time spent in forums: 3 Days 3 h 2 m 4 sec
Reputation Power: 43
Hi jagerkiss,

I noticed that there have been no other posts here, but I was wondering if you found any of the answers you were looking for.

If you are creating a dictionary, I would think that normalization, especially the M:M relationship (many-to-many) would be very useful, not just for data integrity but for functionality of your system. (for example, the M:M relationship would allow you to associate any word with any other word, as many times as you want, to provide a nice "related words" list for each definition)

But it's really impossible for me to provide any more help without seeing at least some representation of your database structure. Also, I must mention that there are a lot of considerations to good database design which are not available in MySQL, and normalization is only part of the question: MySQL still does not have triggers, views, stored procedures, and subqueries, among many others. All of these allow you to have a lot more control over your data. Views are especially useful because they allow you to access data as if it is not normalized, even though it is normalized in the base tables.

Recommendation: a DBMS like PostgreSQL offers arbitrary data constraints, which allow you to specify exactly what is allowed to be entered in any one field, and you can have updates to one table affect other areas with triggers, query rewrite RULE's, etc... A couple of other good things to note about PostgreSQL:
1. there is a full-text indexing module that is available as an add-on, which could greatly speed up queries for a dictionary-type system
2. PostgreSQL allows queries to use Regular Expressions, which is very helpful in searching through text information.
3. There is a great library of extra applications, routines, tutorials etc... available at http://techdocs.postgresql.org/

Reply With Quote
  #12  
Old August 12th, 2002, 04:20 AM
beyond cool's Avatar
beyond cool beyond cool is offline
Junior Member
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Aug 2002
Location: Netherlands
Posts: 4 beyond cool User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
We have had one case where we had to de-normalize a normalized database. The information stored within the database was too 'generic' and neede too much 'inner joins' on the same table. We built triggers on the tables to fill other tables which we could index properly.

Due to the overnormalized database, there wasn't an index anyone could come up with to improve performance.

I always propose to build tables for speed and understanding. If you 'could' store two types of data in 1 table for normalisation's sake, I always advise against it. It's the race between hands-on developpers and theoretical developpers.

Reply With Quote
  #13  
Old August 13th, 2002, 07:21 PM
jagerkiss jagerkiss is offline
Member
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Sep 2001
Location: Yogyakarta, Indonesia
Posts: 22 jagerkiss User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
Hi rycamor,

From all postings and resources I've got from this forum, I finally build my database without being normalized before. The main reason for this is because the program has to make decision between the two queries I've explained in my last post. So, in ways I could think of, I didn't find one to make the two queries to only one query (because of the existence of decision making between two queries).

But, I still think that my program can be improved, only I still don't know how.

Thanks for all helps here ...

Reply With Quote
  #14  
Old August 13th, 2002, 10:46 PM
rycamor rycamor is offline
Gödelian monster
Dev Shed Regular (2000 - 2499 posts)
 
Join Date: Jul 1999
Location: Pembroke Pines, Florida, USA
Posts: 2,300 rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level)rycamor User rank is Sergeant Major (2000 - 5000 Reputation Level) 
Time spent in forums: 3 Days 3 h 2 m 4 sec
Reputation Power: 43
Quote:
Originally posted by beyond cool
I always propose to build tables for speed and understanding. If you 'could' store two types of data in 1 table for normalisation's sake, I always advise against it. It's the race between hands-on developpers and theoretical developpers.
(sigh...) I am so tired of hearing this kind of thing. . I am a hands-on developer. I don't have a masters degree or PH.D. My applications are not research projects. But I have seen first-hand the results of developing with a non-normalized database, and I have almost always been sorry later. From your quote above, I actually doubt you understand what normalization really means. And the description of your problem with "overnormalization" sounds more like bad database design in general. Normalization is not about storing more types of data in one table. That is a design decision. It's about eliminating the potential for your system to contain ambiguous data, and storing your data in a logical, intelligently accessible method.

Ambiguous data may be easy to handle when you first write an application, but what about the fourth or fifth rewrite, by different developers? What happens when you want another application to access the same data? How do you resolve the inconsistencies? I'm not saying normalization is the best answer; I am saying normalization is the only thing that has provided an answer at all. Everything else ends up being a hack, sooner or later. You may not see this when designing smaller software, but when you want to extend this software, and tie in with larger systems, then you start seeing the problems. (again, I speak from experience)

Yes, there are times when I do not normalize everything, because of some specific performance issue, but I only do that when necessary, and only because of deficiencies in SQL DBMS's. Other than that, views provide all the "denormalized-looking" interfaces I need for other developers. (In postgreSQL, it is even possible to make view updateable)

Is proven theory something to be ignored, just because it is too hard to understand, or doesn't make you feel good? Shouldn't software should be approached like any other type of engineering? Would you like it if the engineer designing a bridge decided that the theory of physics wasn't "hands-on" enough? Would you like it if the pilot of your airplane decided that the theory of aerodynamics didn't matter for this particular landing, because he wanted to optimize for performance?

jagerkiss -- I am not attacking you here. You do what you have to do to complete your project. But, I do feel that you don't completely understand normalization or relational data in general. I just hope I have created a bit of curiosity in you, because the reward of understanding it fully is that you will think more clearly about every application, and possibly save yourself a lot of time and difficulty in the future.

Reply With Quote