#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2012
    Posts
    62
    Rep Power
    3

    SQL Data Redundancy vs Query Performance


    Imagine a site like Stackoverflow, where there are millions of users. Each user has his own data in the server, like questions he asked, answers he gave, reputation, edits, bounties, etc. Now the server should have a master database which contains, say, all questions asked in all tags combined, where each row might have the associated tag-id to identify which tags it relates to. What I want to know is how to store the individual info for a user, say for the record of all questions asked by him as seen on his user-page, do I query the master-table for all questions asked by that particular user, or do I create a separate table for each user, create a trigger in the master table that whenever a new record is added, a copy of the record (or perhaps, only a portion, like the question-id and the tag-id) should be updated in the individual user questions table. Then that way, whenever the records for that user have to be retrieved, I can join that table with the master table to fetch the relevant records.

    So which method would be better? The first one is direct, there is no data redundancy, but the number of questions might exceed billions, it may be be time and resource consuming to query the entire table for 50 records. On the other hand, the 2nd method will perhaps reduce the time because the records can be fetched directly (though the join may take some time, I think it won't be as much as the time taken to query the master table), but will involve substantial data redundancy (even when only one or two keys are stored) when the number of users become huge.
    Edit/Delete Message
  2. #2
  3. SQL Consultant
    Devshed Supreme Being (6500+ posts)

    Join Date
    Feb 2003
    Location
    Toronto Canada
    Posts
    27,179
    Rep Power
    4279
    Originally Posted by Cupidvogel
    ...do I query the master-table for all questions asked by that particular user, or do I create a separate table for each user...
    the former

    see also http://www.dbforums.com/database-con...rformance.html
    Last edited by r937; September 12th, 2012 at 10:27 AM.
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2012
    Posts
    62
    Rep Power
    3
    Umm, the link you provided is the same question asked by me in another forum! Can you please explain why I should go with the first option?
  6. #4
  7. SQL Consultant
    Devshed Supreme Being (6500+ posts)

    Join Date
    Feb 2003
    Location
    Toronto Canada
    Posts
    27,179
    Rep Power
    4279
    Originally Posted by Cupidvogel
    Umm, the link you provided is the same question asked by me in another forum!
    yes, i know

    i wanted to prevent the good people of this forum from wasting their time answering your question here, when it is being answered quite nicely over there

    cross posting is generally frowned upon, and once people see that you're the type of person that does that, they tend to ignore you everywhere

    Originally Posted by Cupidvogel
    Can you please explain why I should go with the first option?
    for reasons given by healdem
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2012
    Posts
    62
    Rep Power
    3
    I don't see any problem with cross-posting. The person giving the answer there is doing fine, but I wanted to hear a different approach, hence I posted the question here also. What is wrong with getting different viewpoints for one question?
  10. #6
  11. SQL Consultant
    Devshed Supreme Being (6500+ posts)

    Join Date
    Feb 2003
    Location
    Toronto Canada
    Posts
    27,179
    Rep Power
    4279
    Originally Posted by Cupidvogel
    What is wrong with getting different viewpoints for one question?
    oh, on the face of it, nothing

    if you get two conflicting opinions, though, which one will you pay attention to? the one you want to implement anyway? ha ha

    nevertheless, cross-posting does waste the time of other people, and for that reason alone, i will discourage it at every opportunity
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

IMN logo majestic logo threadwatch logo seochat tools logo