August 28th, 2002, 04:19 PM
Good day all,
I have a need for clarification in regards to normalization/functional dependencies as it pertains to the relational model of data. In particular, the question deals with noloss decomposition in relation to functional dependencies.
Given a table R with attributes (A, B, C) and where A functionally determines B (A -> B). If I am to decompose table R to two other tables V and W, where (A, B) are subsets of V table, and (B, C) are subsets of W table. Will I or will I not lose information when I do a join on tables V and W. If I do lose information after the join and I do not end up with the original table R, can someone please explain why?
Thank you in advance,
August 28th, 2002, 09:38 PM
We don't really use terms like "functional dependencies" in relational database talk. That belongs in application discussion.
Also, you wouldn't call A and B "subsets" of a table V, but "attributes". A set connotes a range of data itself, rather than the table structure.
And what kind of join will you be performing?
It looks to me like there is a bit of confusion here regarding exactly what normalization is. If you are talking about an attribute which is 'derived', so that for example B = A * 10, or some such, then that belongs at the logical level, as a view, or the result of a query, rather than being stored statically in two tables.
With the description you have given here, it is impossible to really know what you are talking about. Try to explain further or use an example.
August 28th, 2002, 10:41 PM
thanks for the reply rycamor,
Your correct is saying that I have a confusion in what normalization is exactly, but that is why I posted the question. Next time I'll be more careful in the terminology I use. I solved my own question (it came to me later on), and the join I was referring to was a natural join.
However, with the term functional dependency, is that not the term used in the relational model of data as highlighted by Codd Date, and others in the relational database field and therefore 'should be' a part of relational database talk (and not at the app level)? When defining forms of normalization (2NF for example), how do you determine that all non key attributes completely rely on the candidate key (primary key informally known)? There must be some act of an 'hidden' internal constraint for attribute B (a non key attribute) to rely (or therefore be dependent) on attribute A (primary key attribute) for normalization to be successful. It is in my opinion that functional dependency is quite important to the topic of normalization.
If I am incorrect or off target, then please let me know. I am in the forum to learn as much as I can, so I can understand relational theory as it is applied to database management systems.
August 29th, 2002, 12:42 AM
You are right. I will have to back down a little on this. When discussing the normal forms, the term "functional dependency" has been used to present the formal definitions. However, it's not generally the term used when discussing database design. In fact, at www.dbdebunk.com, which features many articles and discussions by C.J.Date, I have only seen the term used a couple of times.
So, what you are talking about is not "derived" or "calculated" fields, but the act of normalization itself, making sure that the attributes in each table depend only on the primary key to identify the row. (thus, in a row of data about a person, "city" or "state" are not attributes of that person (i.e. not functionally dependent), so they should be keyed to appropriate other tables, which uniquely identify all the possible cities and states).
So the term "functional dependency" is used to mean such a thing as "Every tuple in X uniquely determines a tuple in Y", etc...
But I still am not sure exactly what you are worrying about. Relational database management software is supposed to shield the user from needing to know exactly how it manages constraints, relationships, etc... Thus, yes, you are correct that there are "hidden" constraints in effect, but they behave by very straightforward logical rules. Also, in a true DBMS, you can create many kinds of extra constraints on any table or attribute of that table.
So your questions should be more on the "why" level than the "how" level, unless you need to understand how the database software itself enforces these constraints internally.
By "why" I mean 'what is the reason one would want to normalize any particular data representation'? This is to preserve data integrity, and remove the possibility of ambiguity in your data. So as long as you know why you are grouping your attributes this way or that way, and why you are referencing a key from one table to a key from another table, that is all you need to know.
What that means to us in real terms is simply that we can decide relationships between tables simply by "saying so", and letting the DBMS handle the details. So for a queston such as " how do you determine that all non key attributes completely rely on the candidate key?", you look for any attribute that does not apply directly to that primary entity in question, and which would introduce repeating values. Thus the example I gave above, "state" would definitely produce repeating values, so it should not be freely entered in the 'person' table, but referenced by the primary key in the 'states' table. Thus, there can be no incorrect value for 'state' entered in the 'person' table, because of the foreign key constraint. This constraint is managed by the DBMS, by using the REFERENCES keyword or some similar syntax, (except for some of the less-capable DBMS's which don't have the ability to enforce foreign key constraints, in which case you must maintain the constraint in your application).
So, there is never a question of losing information by normalizing, but rather, you can ensure integrity of your data, ensuring that any one piece of information has only one authoritative source, so that there is no chance for ambiguity. For example, if you have a table of 'states', but you enter the 'state' column of the 'person' table without referencing your 'states' column, then the two groups of values can be out of sync.
I have probably said too much. Really, for the basic concepts, these things are better understood by simple examples rather than by verbal explanations.
August 29th, 2002, 11:15 AM
After reading your above post, I have to admit that I was after the 'how' when I really wanted to figure out the 'why' of normalization. The example you provided has made the topic more lucid in my mind. Thanks for taking the time to write all that you did to explain (at least the basics of) normalization.