#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2001
    Location
    NE Pa.
    Posts
    96
    Rep Power
    14

    5 table join returning too many rows of duplicate data


    Hi folks, forgot to introduce myself in the previous post this morning. First thing I probably should say is that I have been using MySQL for so long now that I have forgotten 80% of the principles of joins, triggers & the other more advanced options PostgreSQL offers so if Im seem to be doing a "duh" please bear with me. I am in the process of converting a massive project from MySQL to Postgres as the company has completely outgrown their db. The whole project is in server side Java. Anywasy on to the current issue.

    I am trying to retrieve data from 5 tables that contain all of the pertinent information about a given part number (nsn number). I have succesfully retrieved the data from all of them but I am getting duplicate rows of data back in the results (each possible combination of the joined tables). This is a real nightmare going thru it to find only the unique data to display to the customer. The query, I know is a real nightmare at this point but I have tried every possbile join combination I can think of with no correct results. Here is the table(s) layout:

    NSN_NUMBERS - query should only return ONE row of data per query
    NSN_HISTORY - query may return multiple rows (or no rows) of data per query
    PART_DRAWINGS - once again, multiple rows or none
    NSN_LINK - returns multiple or no rows
    MANUFACTURERS - returns mulitple rows or no rows

    I really want to try to pull this off in one query instead of hitting the db multiple times (which would be the easy way). Here is the query as it stands now that returns the duplicate rows of data:

    SELECT N.ITEM_NAME, N.PRODUCT_CODE, N.SPEC_1, N.SPEC_2, N.SPEC_3, N.FAVORITE, NH.HISTORY_COMPANY, NH.HISTORY_CAGE, NH.HISTORY_CONTRACT, NH.HISTORY_DATE, NH.HISTORY_QTY, NH.HISTORY_PRICE, NH.HISTORY_UNIT_OF_MEASURE, PD.DRAWING_COMP, PD.MANU_CAGE, PD.DRAWING_NUMBER, NL.MANU_CODE, NL.MANU_PART_NUM, M.COMPANY_NAME, M.ADDRESS_1, M.ADDRESS_2, M.CITY, M.STATE, M.ZIPCODE, M.ZIP_PLUS, M.PHONE, M.FAX, M.CONTACT_F_NAME, M.CONTACT_L_NAME, M.EMAIL, M.MIN_ORDER, M.IN_HOUSE_NOTES

    FROM NSN_NUMBERS AS N

    INNER JOIN NSN_HISTORY AS NH ON (N.NSN_NUMBER=NH.NSN_NUMBER)

    LEFT OUTER JOIN PART_DRAWINGS AS PD ON (N.NSN_NUMBER = PD.NSN_NUMBER)

    LEFT OUTER JOIN NSN_LINK AS NL ON (N.NSN_NUMBER = NL.NSN_NUMBER)

    LEFT OUTER JOIN MANUFACTURERS AS M ON (NL.MANU_CODE = M.MANU_CODE)

    WHERE N.NSN_NUMBER='5330-01-209-7568' ORDER BY NH.HISTORY_DATE DESC

    I was doing a select * from the tables but then I had even more extraneous data to wade through so I then specified the columns. I've beat my brains out with the books & examples I have to try to limit the returned rows but I either get the duplicates or NO rows returned. Im hoping someone here is up for a challenge & wouldnt mind looking at this .... there has to be a better way. Thanks all!
    DC Dalton
    DCD Designs
    SCJP
  2. #2
  3. Modding: Oracle MsSQL Firebird
    Devshed Supreme Being (6500+ posts)

    Join Date
    Jun 2001
    Location
    Outside US
    Posts
    8,527
    Rep Power
    539
    You should try to break up the query, adding join after join to see if data retrieved keep being consistent.
    Why did you choose to start with an inner join on NSN_HISTORY when you stated that "NSN_HISTORY - query may return multiple rows (or no rows) of data per query
    " shouldn't it be a right outer join (allow null in NSN_HISTORY)?
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2001
    Location
    NE Pa.
    Posts
    96
    Rep Power
    14
    I started that way & then started getting no results so I went to the outers. I did that first inner because I did NOT want nulls returned from the nsn_history tables. I wanted to limit the results to the amount returned from that table, remember the first table nsn_numbers will only ever return one row.... does that make any sense or am babbling?
    DC Dalton
    DCD Designs
    SCJP
  6. #4
  7. Modding: Oracle MsSQL Firebird
    Devshed Supreme Being (6500+ posts)

    Join Date
    Jun 2001
    Location
    Outside US
    Posts
    8,527
    Rep Power
    539
    What does it mean "getting no results"? I suspect something wrong with your data.
    Anyway start with inner joins, adding tables, then go for outer.
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2001
    Location
    NE Pa.
    Posts
    96
    Rep Power
    14
    By no results I mean absolutely nothing was returned. Im going to restart tomorrow morning back at query one & I will post when I hit a problem again. I just dont remember when the problems started becuase I have tried so many different things
    DC Dalton
    DCD Designs
    SCJP
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2001
    Location
    NE Pa.
    Posts
    96
    Rep Power
    14
    OK, I did as you suggested, one query at a time. I was consisitently getting 5 rows of data (there are 5 rows in the nsn_history table for this number) but then when I added the inner join for the nsn_link table it jumped to 10 rows & the dupes began. There are only 2 rows in the nsn_link table for this nsn_number which mean I have 2 manufacturers for this part. (which comes in the last join to the manufacturers table) I should explain that the nsn_link table links manufacturers to the nsn_numbers they make, there by only requireing one entry into the manufacturers table but they are linked to all the parts they make.

    Here is the query that produces 10 rows (with the last inner join)

    SELECT N.ITEM_NAME, N.PRODUCT_CODE, N.SPEC_1, N.SPEC_2, N.SPEC_3, N.FAVORITE, NH.HISTORY_COMPANY, NH.HISTORY_CAGE, NH.HISTORY_CONTRACT, NH.HISTORY_DATE, NH.HISTORY_QTY, NH.HISTORY_PRICE, NH.HISTORY_UNIT_OF_MEASURE, PD.DRAWING_COMP, PD.MANU_CAGE, PD.DRAWING_NUMBER, NL.MANU_CODE, NL.MANU_PART_NUM

    FROM NSN_NUMBERS AS N

    INNER JOIN NSN_HISTORY AS NH ON (N.NSN_NUMBER=NH.NSN_NUMBER)

    INNER JOIN PART_DRAWINGS AS PD ON (N.NSN_NUMBER = PD.NSN_NUMBER)

    INNER JOIN NSN_LINK AS NL ON (N.NSN_NUMBER = NL.NSN_NUMBER)

    WHERE N.NSN_NUMBER='5330-01-209-7568' ORDER BY NH.HISTORY_DATE DESC


    I dont understand why its doing this AT ALL!
    Last edited by DC Dalton; July 29th, 2003 at 11:23 PM.
    DC Dalton
    DCD Designs
    SCJP
  12. #7
  13. Modding: Oracle MsSQL Firebird
    Devshed Supreme Being (6500+ posts)

    Join Date
    Jun 2001
    Location
    Outside US
    Posts
    8,527
    Rep Power
    539
    Could you post the table structure and some data to reproduce the issue?
  14. #8
  15. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2003
    Posts
    373
    Rep Power
    12
    perhaps also it could help you to say SELECT DISTINCT or SELECT UNIQUE at the top of the quer, I forget which is the preferred syntax for postgres
  16. #9
  17. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2003
    Posts
    373
    Rep Power
    12
    it seems from your query structure that you may be getting ahead of yourself with the inner join syntax. As another approach, maybe you could build the joins on each table explicity... if that's what you guys had been talking about, then nevermind... but you know what I mean, just pick those fields from the joined tables and perform your own explicit join... I ran into similar problems when I was trying to do a cross join, I would get duplicate records. It all has to do with the logic of your query when it gets turned into the query tree by the backend
  18. #10
  19. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2003
    Posts
    373
    Rep Power
    12
    sorry for broken-up posting I'm in the middle of stuff then looking at this... I am thinking mostly the logic of the joins is the problem. If you do a left outer join, then the tuples returned aren't just brought back the way you want them, but are crossed with the resultant data from the table you joined to the outer-join table. sorry that's confusing but it says what I am thinking. So, you get duplicate tuples because the left result column (in the result view, I'm assuming two columns for simplicity) may have a null in the right column from the other table. the ones from the right column can't have a null, but ok, here's the part I think is messing you up. All of the results are crossed with eachother one more time because of the way you specified the outer join on the table. I could be wrong, but this is my first intuition. I'll tell you what, I'm going to look up the outer join logic in a little while, do some messing around, and write up here if I find out anything that will help you.
  20. #11
  21. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2001
    Location
    NE Pa.
    Posts
    96
    Rep Power
    14
    OK, here are the tables:

    CREATE TABLE NSN_NUMBERS (
    NSN_NUMBER VARCHAR(16),
    ITEM_NAME VARCHAR(19) DEFAULT 'NONE SPECIFIED',
    PRODUCT_CODE VARCHAR(6) DEFAULT 'NONE',
    SPEC_1 VARCHAR(30),
    SPEC_2 VARCHAR(30),
    SPEC_3 VARCHAR(30),
    FAVORITE BOOLEAN NOT NULL DEFAULT FALSE,
    CONSTRAINT NSN_NUMS_PK PRIMARY KEY (NSN_NUMBER)
    );

    CREATE TABLE NSN_HISTORY (
    NSN_NUMBER VARCHAR(16),
    HISTORY_COMPANY VARCHAR(17) NOT NULL,
    HISTORY_CAGE CHAR(5) NOT NULL,
    HISTORY_CONTRACT VARCHAR(17) NOT NULL,
    HISTORY_DATE DATE NOT NULL,
    HISTORY_QTY INTEGER NOT NULL,
    HISTORY_PRICE DECIMAL(7, 2) NOT NULL,
    HISTORY_UNIT_OF_MEASURE CHAR(2),
    CONSTRAINT NSN_HIST_FK FOREIGN KEY (NSN_NUMBER)
    REFERENCES NSN_NUMBERS (NSN_NUMBER)
    ON DELETE CASCADE,
    CONSTRAINT NSN_HIST_PK PRIMARY KEY(NSN_NUMBER, HISTORY_CAGE, HISTORY_CONTRACT, HISTORY_DATE, HISTORY_QTY)
    );


    CREATE TABLE PART_DRAWINGS (
    NSN_NUMBER VARCHAR(16),
    DRAWING_COMP VARCHAR(25) NOT NULL,
    MANU_CAGE VARCHAR(5) NOT NULL,
    DRAWING_NUMBER VARCHAR(24) NOT NULL,
    CONSTRAINT PT_DRAW_FK FOREIGN KEY (NSN_NUMBER)
    REFERENCES NSN_NUMBERS (NSN_NUMBER)
    ON DELETE CASCADE,
    CONSTRAINT PT_DRAW_PK PRIMARY KEY (NSN_NUMBER, MANU_CAGE, DRAWING_NUMBER)
    );


    CREATE TABLE NSN_LINK (
    NSN_NUMBER VARCHAR(16),
    MANU_CODE VARCHAR(10) NOT NULL,
    MANU_PART_NUM VARCHAR(40) NOT NULL,
    CONSTRAINT NSN_LNK_MANU_FK FOREIGN KEY (MANU_CODE)
    REFERENCES MANUFACTURERS (MANU_CODE)
    ON DELETE CASCADE,
    CONSTRAINT NSN_LNK_FK FOREIGN KEY (NSN_NUMBER)
    REFERENCES NSN_NUMBERS (NSN_NUMBER)
    ON DELETE CASCADE,
    CONSTRAINT NSN_LINK_PK PRIMARY KEY (NSN_NUMBER, MANU_CODE, MANU_PART_NUM)
    );


    CREATE TABLE MANUFACTURERS (
    MANU_CODE VARCHAR(10),
    COMPANY_NAME VARCHAR(30) NOT NULL,
    ADDRESS_1 VARCHAR(30),
    ADDRESS_2 VARCHAR(30),
    CITY VARCHAR(30),
    STATE CHAR(2),
    ZIPCODE CHAR(5),
    ZIP_PLUS CHAR(4),
    PHONE VARCHAR(12),
    FAX VARCHAR(12),
    CONTACT_F_NAME VARCHAR(15),
    CONTACT_L_NAME VARCHAR(15),
    EMAIL VARCHAR(30),
    MIN_ORDER DECIMAL(9, 2),
    IN_HOUSE_NOTES TEXT,
    CONSTRAINT MANUFACT_PK PRIMARY KEY (MANU_CODE)
    );


    NSN_NUMBERS DATA:

    5330-01-209-7568|EXPANDER,SEAL|182949|14153 QAP-EQ001 REV B|14153 QAP-EQ002||0

    NSN_HISTORY DATA:

    5330-01-209-7568|CME ARMA INC|0T0Y2|SP054003W0458MODF|2003-02-27|110|35.10|EA
    5330-01-209-7568|STAMCO AEROSPACE|4R840|SP050002AA6363591|2002-04-04|54|51.03|EA
    5330-01-209-7568|HOOSIER INDUSTRI|0RCD8|SP054002MCX27MODF|2002-03-09|61|EA
    5330-01-209-7568|BASIC RUBBER AND|6K404|SP054001MBN97MODF|2001-05-31|70|57.50|EA
    5330-01-209-7568|BASIC RUBBER AND|6K404|SP054000MT910MODF|2000-05-30|75|57.50|EA

    PART_DRAWINGS DATA:

    5330-01-209-7568|US ARMY COMMUNICATION|57958|5035534 REV C

    NSN_LINK DATA:

    5330-01-209-7568|57958|5035534-11
    5330-01-209-7568|07397|5035534-11

    MANUFACTURERS DATA (SHORT VERSION - ALL NOT NULL COLUMNS)

    57958|US ARMY COMMUNICATION
    07397|GTE GOVERNMENT SYSTEM

    I went thru again later last night & tried "flipping" the joins from inner to outers, right to left trying whatever I could & again, once I hit that nsn_link join I start returning 10 rows instead of 5. I also tried distinct to no avail. Thanks all!
    Last edited by DC Dalton; July 30th, 2003 at 10:55 AM.
    DC Dalton
    DCD Designs
    SCJP
  22. #12
  23. No Profile Picture
    Gödelian monster
    Devshed Regular (2000 - 2499 posts)

    Join Date
    Jul 1999
    Location
    Central Florida, USA
    Posts
    2,307
    Rep Power
    62
    Well DC,

    I dragged this thread up again because it's been bothering me, so I went ahead and created your tables and ran your query several different ways, until I realized what was happening: your NSN_LINK table has the same NSN_NUMBER for two different MANU_CODE values:

    Code:
        nsn_number    | manu_code | manu_part_num 
    ------------------+-----------+---------------
     5330-01-209-7568 | 57958     | 5035534-11
     5330-01-209-7568 | 07397     | 5035534-11
    This of course, means that you will double all results, because everything is joined on NSN_NUMBER. If you leave out the NL.MANU_CODE attribute (and don't forget DISTINCT), you now get 5 rows as a result set:

    Code:
    SELECT DISTINCT N.ITEM_NAME, N.PRODUCT_CODE, N.SPEC_1, N.SPEC_2, N.SPEC_3, N.FAVORITE, 
    NH.HISTORY_COMPANY, NH.HISTORY_CAGE, NH.HISTORY_CONTRACT, NH.HISTORY_DATE, 
    NH.HISTORY_QTY, NH.HISTORY_PRICE, NH.HISTORY_UNIT_OF_MEASURE, 
    PD.DRAWING_COMP, PD.MANU_CAGE, PD.DRAWING_NUMBER, NL.MANU_PART_NUM
    
    FROM NSN_NUMBERS AS N INNER JOIN NSN_HISTORY AS NH ON (N.NSN_NUMBER=NH.NSN_NUMBER)
    INNER JOIN PART_DRAWINGS AS PD ON (N.NSN_NUMBER = PD.NSN_NUMBER)
    INNER JOIN NSN_LINK AS NL ON (N.NSN_NUMBER = NL.NSN_NUMBER)
    WHERE N.NSN_NUMBER='5330-01-209-7568' ORDER BY NH.HISTORY_DATE DESC
    In fact, your query could be done in two other ways:

    1. The implicit join
    Code:
    SELECT DISTINCT
    	N.ITEM_NAME, N.PRODUCT_CODE, N.SPEC_1, N.SPEC_2, N.SPEC_3, N.FAVORITE, 
    	NH.HISTORY_COMPANY, NH.HISTORY_CAGE, NH.HISTORY_CONTRACT, NH.HISTORY_DATE, 
    	NH.HISTORY_QTY, NH.HISTORY_PRICE, NH.HISTORY_UNIT_OF_MEASURE, 
    	PD.DRAWING_COMP, PD.MANU_CAGE, PD.DRAWING_NUMBER, NL.MANU_PART_NUM
    
    FROM NSN_NUMBERS AS N, NSN_HISTORY AS NH, PART_DRAWINGS AS PD, NSN_LINK AS NL
    
    	WHERE (N.NSN_NUMBER = NH.NSN_NUMBER)
    	AND (N.NSN_NUMBER = PD.NSN_NUMBER)
    	AND (N.NSN_NUMBER = NL.NSN_NUMBER)
    	AND (NH.NSN_NUMBER = PD.NSN_NUMBER)
    	AND N.NSN_NUMBER='5330-01-209-7568' ORDER BY NH.HISTORY_DATE DESC
    2. The natural join
    Code:
    SELECT DISTINCT
    	N.ITEM_NAME, N.PRODUCT_CODE, N.SPEC_1, N.SPEC_2, N.SPEC_3, N.FAVORITE, 
    	NH.HISTORY_COMPANY, NH.HISTORY_CAGE, NH.HISTORY_CONTRACT, NH.HISTORY_DATE, 
    	NH.HISTORY_QTY, NH.HISTORY_PRICE, NH.HISTORY_UNIT_OF_MEASURE, 
    	PD.DRAWING_COMP, PD.MANU_CAGE, PD.DRAWING_NUMBER, NL.MANU_PART_NUM
    
    FROM 
    	NSN_NUMBERS AS N NATURAL JOIN NSN_HISTORY AS NH 
    	NATURAL JOIN PART_DRAWINGS AS PD 
    	NATURAL JOIN NSN_LINK AS NL
    
    WHERE N.NSN_NUMBER='5330-01-209-7568' ORDER BY NH.HISTORY_DATE DESC
    Natural join is kind of a database "magic" guess at what the join should be, based on the tables having a common column name--cool, huh ?

    P.S. If you need the MANU_CODE value in the result, then you either have accept those results or rething your data itself, because the query is returning correct results.
    The real n-tier system:

    FreeBSD -> PostgreSQL -> [any_language] -> Apache -> Mozilla/XUL

    Amazon wishlist -- rycamor (at) gmail.com
  24. #13
  25. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2001
    Location
    NE Pa.
    Posts
    96
    Rep Power
    14
    Yes unforetunately I need that manu_code because that is what links me into the manufacturer table so I can pull all of the info for ALL the manufacturers of this given part.

    Man this is just getting to be a bit much on the head. I have to have that nsn_link table that way because we can have hundreds of different manufacturers that make a part (with a different part number) that is approved for the specific nsn number by the govt. To top it off we can have the SAME mfg making several different part numbers that ALSO match the exact same govt nsn number .... NOW you see my dilema?

    I gota tell you .... govt work will make you rethink life TOTALLY ... nothing makes sense! I truely appreciate the time & effort, no magic bullet but at least I finally know the answer.
    DC Dalton
    DCD Designs
    SCJP
  26. #14
  27. No Profile Picture
    Gödelian monster
    Devshed Regular (2000 - 2499 posts)

    Join Date
    Jul 1999
    Location
    Central Florida, USA
    Posts
    2,307
    Rep Power
    62
    Hmm... one way to see it is that you have a normalized form, but you want a multi-valued result in one column: thus for any one result row, you are going to need multiple MANU_CODE values.

    Now, if you always know that there is a fixed limit to the number of MANU_CODE values for any one NSN_NUMBER (such as 2, in our example), then you could just have two columns in your query, and use a little advanced SQL to "cross-tab" the MANU_CODEs into those two columns.

    If there is a variable number of MANU_CODES for each NSN_NUMBER, this is usually where you just handle it in application code: perform one query to get the single rows, and while looping through those, perform another query to get the nested MANU_CODES.

    But, if you really are a glutton for self-punishment, there are ways to nest your multiple values in single rows: http://searchdatabase.techtarget.com...285649,00.html

    But really, it's the kind of thing that application environments were made for, so why punish yourself . Trying to do everything in one query is not always the most efficient thing anyway.

    Unless, of course, you want to create a PostgreSQL function to handle it. It wouldn't be too hard to create a function that returns a comma-separated list for the MANU_CODEs for any one query result:

    Code:
    SELECT DISTINCT
    	N.ITEM_NAME, N.PRODUCT_CODE, N.SPEC_1, N.SPEC_2, N.SPEC_3, N.FAVORITE, 
    	NH.HISTORY_COMPANY, NH.HISTORY_CAGE, NH.HISTORY_CONTRACT, NH.HISTORY_DATE, 
    	NH.HISTORY_QTY, NH.HISTORY_PRICE, NH.HISTORY_UNIT_OF_MEASURE, 
    	PD.DRAWING_COMP, PD.MANU_CAGE, PD.DRAWING_NUMBER, manu_code_list(N.NSN_NUMBER) AS MANU_CODES, NL.MANU_PART_NUM
    
    FROM NSN_NUMBERS AS N, NSN_HISTORY AS NH, PART_DRAWINGS AS PD, NSN_LINK AS NL
    
    	WHERE (N.NSN_NUMBER = NH.NSN_NUMBER)
    	AND (N.NSN_NUMBER = PD.NSN_NUMBER)
    	AND (N.NSN_NUMBER = NL.NSN_NUMBER)
    	AND (NH.NSN_NUMBER = PD.NSN_NUMBER)
    	AND N.NSN_NUMBER='5330-01-209-7568' ORDER BY NH.HISTORY_DATE DESC
    Then, the MANU_CODES column would look like "57958, 07397".
    The real n-tier system:

    FreeBSD -> PostgreSQL -> [any_language] -> Apache -> Mozilla/XUL

    Amazon wishlist -- rycamor (at) gmail.com

IMN logo majestic logo threadwatch logo seochat tools logo