#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    17
    Rep Power
    0

    Need help in db design


    Hi All,

    We have analytics database where each month a table will be created by cron like web_analytics_mm_yy (ex:- web_analytics_05_13, web_analytics_06_13, web_analytics_07_13 and so on) respective months data will be inserted to respective tables. but the problem is how to fetch the data from multiple tables in current db setup.
    Alternatively If we have "month" and "year" columns in same table, Don't we have a problem in fetching the records back? cos each month we will have 1 million records inserted. after couple of years data records will be huge and performance will be affected. So we have taut of splitting the table month wise and fetching the records.

    Please advise.
  2. #2
  3. SQL Consultant
    Devshed Supreme Being (6500+ posts)

    Join Date
    Feb 2003
    Location
    Toronto Canada
    Posts
    27,178
    Rep Power
    4274
    Originally Posted by deepak.fugo
    ... the problem is how to fetch the data from multiple tables in current db setup.
    you would either issue one query for each table, or a single UNION query which has subselects for each table


    Originally Posted by deepak.fugo
    after couple of years data records will be huge and performance will be affected.
    no, not if the table is indexed properly
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Location
    Usually Japan when not on contract
    Posts
    240
    Rep Power
    12
    The UNION suggestion by r937 is one way, but requires that you write a query that includes a phrase for every table that might need to be included in the final result. So if you have three tables you'd need three phrases. A procedural language can help with this, but only a bit.

    Better solutions exist but they depend on what database backend you are using. Postgres, Oracle and DB2 have partitioning features which make this an easy thing to deal with and prevent performance problems from creeping up on you -- but I don't know which DB you are using.

    With Postgres this is easy to deal with. Here is an example:

    Create a master table, and then each child (monthly) table inherits the master table's layout. Each child table/partition would also need exclusion constraint(s) based on whatever the criteria for partition is (in your case, the date range). With exclusions in place Postgres can make a query that ignores any tables that couldn't possibly meet your query criteria.

    sql Code:
    CREATE TABLE web_analytics
       (id   uuid DEFAULT uuid_generate_v1() PRIMARY KEY,
        ip   inet NOT NULL,
        mac  macaddr NOT NULL,
        url  VARCHAR(2048) NOT NULL,
        dtg  timestamptz NOT NULL);
     
    CREATE TABLE web_analytics_07_2013
       (PRIMARY KEY (id),
        CHECK (dtg >= DATE '2013-07-01' AND dtg < DATE '2013-08-01' ))
       INHERITS (web_analytics);
    CREATE INDEX web_analytics_07_2013_dtg ON web_analytics_07_2013 (dtg);
     
    CREATE OR REPLACE FUNCTION partition_web_analytics()
        RETURNS TRIGGER AS
        $$
        DECLARE
            t_string text;
        BEGIN
            t_string := 'web_analytics_' || date_part('month', NEW.dtg) || '_' || date_part('year', NEW.dtg);
            EXECUTE
                'INSERT INTO ' || t_string || ' VALUES ($1, $2, $3, $4, $5)'
                USING NEW.id, NEW.ip, NEW.mac, NEW.url, NEW.dtg;
        END;
        $$ LANGUAGE plpgsql;
     
    CREATE TRIGGER insert_web_analytics
        BEFORE INSERT ON web_analytics
        FOR EACH ROW EXECUTE PROCEDURE partition_web_analytics();

    The child table creation with CHECK constraints that guarantee only the correct month's data will be inserted is enough for the DB to ignore any tables that couldn't possibly hold the data you are querying, assuming that the query filter over date in some way. The trigger let's you do "INSERT INTO web_analytics..." and only hit the correct table.

    Of course, your analytics model probably looks a lot more complicated than this (and I didn't proof any of the above, so there's almost definitely a typo or "duh!" bug lurking), and you'd probably want to write a function that creates the table for each month as you go -- but this the basics of how to do what you're asking if you happen to be using Postgres. There are external modules which make this less verbose, and Oracle and DB2 have different utilities for partitioning, some of them offering more convenience in exchange for less control. If you're doing all this in MySQL/Maria I'd recommend leaving MySQL on the web end and use something with heavier data-shoveling tools like Postgres for analytics work.

IMN logo majestic logo threadwatch logo seochat tools logo