#1
  1. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2004
    Posts
    4
    Rep Power
    0

    Large tables or many small tables?


    I realize there have been many threads discussing why using a fewer number of larger tables would be advantageous to using many smaller tables, but I'm not too sure if that applies to my case. I am trying to decide on a database design that will be most efficient.

    Basically, I have a group of schools that I will put into one table. This table will have columns that each hold one-to-one data for each school (such as address, etc.). Let us say each school has a unique index called School_ID. Next, I have a group of clubs where each club is associated with a certain school. A school can have multiple clubs, but a club cannot be associated to multiple schools. Next, there will be a group of users. Each user will be associated to a certain school and to a number of groups. Thus, a school can have multiple users, but a user cannot be in multiple schools. However, a group can have multiple users, and a user can also be associated to multiple groups.

    Now, it is possible that I could have 600 or 700 schools, 70,000 clubs, and 5 million users. Also, besides users and clubs, there will be added functionality in the future so that I might need tables to hold things like financial statements (a table that could very well have millions of records). Also, I will have another table that could have millions of records that will be used simply to normalize how clubs and users relate to each other (basically, this table will have one entry per user per clubs that will tell me things like the position of the person in that club and the privileges he has).

    The way I want to do this is to have one table that has a list of all the schools with one-to-one data (like address, administrator, etc.). The problem comes up in how to structure the rest. I could either have a separate set of tables for each school (each school would have its own clubs table, users table, financial statements table, etc), but this way, there would be a lot of tables. However, I wouldn't have to worry about large tables because even with financial statements, I doubt any table would have more than 100,000 entries. The other option would be to have one giant clubs table, one giant users table, one giant financial statements table, etc. But with the amount of data I might have, I'm afraid this method could cause a lot of slowdown (because I will be running lots of queries on these tables to gather user statistics, etc.). Also, I will very rarely be running queries across schools on clubs or users (as in, I won't very often be looking at all the clubs in the whole database. Usually, I will only be running queries to find stats on clubs associated to a certain school or users associated to a certain school.) I hope that wasn't too confusing. Many thanks to all who reply.
    Last edited by Tassadar356; February 6th, 2004 at 02:51 AM.
  2. #2
  3. No Profile Picture
    Problem Solver
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Jan 2001
    Location
    Stockholm, Sweden
    Posts
    4,480
    Rep Power
    538
    Some thoughts on the matter.

    My recommendation is that you use only one schools table and one clubs table and so. Do a normal normalisation and don't confuse with separating into different tables and so on. Let the database do what it is good at. This way you don't need lot of extra coding to handle the different tables and so on.

    A performance question is usually more about the size (we are talking Gb) of the data not so much about how many rows it is. As I can gather from your description it still isn't that big since you are mostly keeping a lot of attributes for different schools, clubs and peoples.

    Just make sure that you think thru how your indexes need to be then you won't have any problem. But this is a very important issue since if you don't have the indexes right you will have to scan thru all the data and that slows down the best hardware.

    Besides today, RAM memory which is one of the most precious things a database server can have is so cheap so you can usually just buy a lot. If you compare putting extra money on development and putting extra mondey on hardware the favor usually falls on the hardware.
    Unless you are working with REALY big dbs then you must think thru it very carefully since then the amount of RAM is so miniscule in comparison with the amount of data that all data must be fetched from disc and that is very slow in the world of dbs.

    Oh, I thought about one occation where it can be useful to separate a large table into several. And that is if you are using MyISAM of BDB tables and you want to put a specific table on a specific disk to spread the load between several disks. But if you are running InnoDB then that point falls a bit since in InnoDB you can't control where InnoDB puts a specific table.
    /Stefan
  4. #3
  5. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2004
    Posts
    4
    Rep Power
    0

    Thanks!


    Thanks a lot for your suggestion! I did, however, have a follow up question. If I did in fact split up the schools, clubs, etc. into different tables and named these tables according to a standard (for example, all the clubs for a certain school could be in the table "schoolID_clubs", where schoolID is the unique index for each school in the larger schools table), then would that still cause a problem in terms of coding? Couldn't I just dynamically select which table I'm working with? Secondly, my biggest concern with using many tables is just the fact that I don't know how it affects performance to have over a thousand tables that are all running select queries.

    Another question... I may have a setup where it is possible for clubs associated with a certain school to only have some characteristics. For example, let's say all clubs could have a constitution, a certain date for an introductory meeting, and an office space. The school chooses which of these are necessary, so every club with one school, for example, would need a constitution and an introductory meeting but no office space, and every club with a second school would need an introductory meeting and an office space but no constitution. Thus, if I make a giant clubs table with columns for "intromeeting", "officespace", and "consitution", not every club in this table would be using all of these columns. On the second hand, if I make a separate clubs table for each school, I will not have unused columns in this table as I will only put the required columns on that table. Which design would be more efficient? Also, in the future, it may be possible for the school to change its requirements, so with that in mind, would it be better to have columns for all possible characteristics of a club regardless of whether or not a school requires it, or would it be better to add on a column later on when the school decides it wants a new requirement? I may find a way to work around this, so this is not really a primary concern for now. Thanks a lot!
    Last edited by Tassadar356; February 6th, 2004 at 03:03 PM.
  6. #4
  7. No Profile Picture
    Problem Solver
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Jan 2001
    Location
    Stockholm, Sweden
    Posts
    4,480
    Rep Power
    538
    Yes even if you have a standard and dynamically define the table names it is still more work. You will instead of one table have many tables to maintain and change.

    For example if you at a later stage realise that you forgot a crucial column. Then you will have to issue an "alter table ..." for each of the tables in your DB. If you don't do it then you will have to make exceptions in your code for a specific school, club ....
    That is a start of a nightmare which involves that you will have to be sure that all tables are the same or else you will get errors executing for specific schools.
    Sure you can write scripts that maintain the structure but that is work that didn't need to be there in the first place.

    As for performance with many tables in one database. I usually say that around 10 000 tables is where you start noticing a performance impact (myisam and BDB type tables). innoDB are a bit different and if you use innoDB then you will have no advantage only disadvantages with splitting the tables.

    You will also waste resources since mysql will have to have more filehandles open than if you have one table.

    Another aspect. If someone at a later stage suddenly gets the idea that they want to get some global statistics from the system. ie. They want to get a total sum of all clubs in the system. Then this can be accomplished with only one simple query against the clubs table in the one table example. In the multiple tables case you will have to get a sum() from each clubs table and then summarize the results.

    As for the design with separating info.
    It is a call that you will have to make. If most of the schools use a particular column then it is probably easiest to have it in the big school table.
    But if only a very minor part of the schools use it then you could probably be better off and separate these settings in a different table. But note that you will have to do an aditional join to get the results and that might not be worth the hassel. If you put the extended info in a separate table then ask yourself if the work is worth the saved space. If you for example estimate on the schools table that you will have maybe 100 columns and each is 200 bytes wide. Then the storage need will be in the vincinity of 20kb for each school. If you have 600 schools then this is still only 18Mb which is about a 1/10 to 1/25 of the RAM memory on a normal desktop these days. So if you remove 20 of the columns and try putting them in another table then you save at the most 3.6Mb. It isn't worth it.

    I actually have one place where the setup with having an extra table with additional info is in use. That is a billing system where that perticular table contains 22 milion plus rows in it and the DB at present is a total around 18GB in size. When working with that table only 4 or 5 columns are usually of interrest and the rest is rarely used. So that information has been put in a separate table. That system is intended to continue to expand during a period of 3 years and during that time it will at present speed reach about 80GB of data. When we are talking these volumes of data then the extra work is justifiable.

    To try to end my very long mail I have some thoughts:

    I have also in the past looked at trying to split tables, but the work is usually not worth it. Stick to using one table it makes life so much easier.

    This way the table is the same for all schools and any functionallity for any school is also available to another school.
    This also means that the code that runs are the same for all schools and not something special for one particula school. = less errors

    Try finding some standard for what the schools want.

    Avoid trying to meet every little small demand from all schools. Some demands for a software project where many organisations is involved can just be a wish from some person that don't have the overall picture. And the extra work for that little feature is not worth it and can sometimes be surprisingly easy to work around for the organisation when you explain how the system is actually going to work.
    /Stefan
  8. #5
  9. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2004
    Posts
    4
    Rep Power
    0

    Thanks


    Thanks a lot sr. I appreciate it.

IMN logo majestic logo threadwatch logo seochat tools logo