#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2013
    Posts
    5
    Rep Power
    0

    Horizontal partitionning and indexes...


    Hi,

    When implementing horizontal parititioning using child tables and check constraints, do we need to create the same indexes on the child tables as in the parent/master table?

    The table needs to have 17 indexes, do we need to create the 17 indexes on the parent and on every child tables?

    This is assuming we never do a query on the child table directly...

    What is the best way to do it to achieve the best performance possible?

    Thank you!
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Location
    Usually Japan when not on contract
    Posts
    240
    Rep Power
    12
    Indexes declared over any table directly are only built over that immediate table (the set you get when using SELECT ONLY). If you need indexes over the child tables then you will need to add those to the child definitions as well.

    If you want indexes covering the entire construct (parent(s) and children included) then you will want to also create indexes over an expression that is inclusive of the child tables.

    This is a little bit of work, but if you can profile your query use case then this gives you a huge amount of fine-grained control over indexing and performance. Premature optimization is, however, still the devil, so at the very least it is good to be able to reference what your heaviest/slowest real queries are during production and create indexes that cater to those. Keep in mind that this may change over the life of the system.
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2013
    Posts
    5
    Rep Power
    0
    Thanks for your reply, but what do you mean by "create indexes over an expression that is inclusive of the child tables"?

    Currently I am creating the indexes individually on the parent and on the children using CREATE INDEX <Name> ON <Table> (<Columns>).

    Is there a better way to achieve this?

    Originally Posted by zxq9
    Indexes declared over any table directly are only built over that immediate table (the set you get when using SELECT ONLY). If you need indexes over the child tables then you will need to add those to the child definitions as well.

    If you want indexes covering the entire construct (parent(s) and children included) then you will want to also create indexes over an expression that is inclusive of the child tables.

    This is a little bit of work, but if you can profile your query use case then this gives you a huge amount of fine-grained control over indexing and performance. Premature optimization is, however, still the devil, so at the very least it is good to be able to reference what your heaviest/slowest real queries are during production and create indexes that cater to those. Keep in mind that this may change over the life of the system.
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Location
    Usually Japan when not on contract
    Posts
    240
    Rep Power
    12
    Creating the child table indexes by hand is fine, unless there are tons of them and you an predict what they will be. Like when partitioning for performance (making a child table per year of transactions, for example). In this case you can still create them by hand, of course, but its easier to write a function that does this for you along with the table creation itself.

    That's a very different case than using inheritance to create semantically sub-classed tables (like a "dog" table that inherits its general form from a "pet" table). In the semantic-inheritance case there is usually no other reasonable option but to create the indexes by hand, since they are likely different for each table.

    As for creating an index over an expression that includes all the children... you can write a function that operates on a parent table and returns values that include children and index over that result. The index would only be used on queries that use the result of the function, but its one way to make a giant, master index.

    This is a little hackish though, and in the case of partitioning for performance isn't nearly as good an idea as just using exclusion constraints so the planner can know what tables can't possibly have data relevant to a result and skip checking them (and their indexes) completely.

    As you can see, the indexing plan and method of doing it has a lot to do with why you are using inheritance in the first place. In the case of partitions you gain the most performance by using exclusion constraints in addition to indexes (and can save time typing and errors by automating the table and index creation); in the case of semantic inheritance your index use is really dependent on the use case per-table.
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2013
    Posts
    5
    Rep Power
    0
    We are making a child table per year of transactions. (ie : Orders is the master table, Orders_2009, Orders_2010, ... being the child tables). This is to get better performance on SELECT queries (assuming we use "constraint_exclusion = on;")

    These child tables (if they doesn't already exists) and their indexes are created dynamically on the insert trigger of the master table. The master table is a 185 column table with 17 indexes. We put exactly the same indexes on the child tables.

    Following your explanation, I understand that this is the best way to do it, right?

    Thanks for all your help!

    Originally Posted by zxq9
    Creating the child table indexes by hand is fine, unless there are tons of them and you an predict what they will be. Like when partitioning for performance (making a child table per year of transactions, for example). In this case you can still create them by hand, of course, but its easier to write a function that does this for you along with the table creation itself.

    That's a very different case than using inheritance to create semantically sub-classed tables (like a "dog" table that inherits its general form from a "pet" table). In the semantic-inheritance case there is usually no other reasonable option but to create the indexes by hand, since they are likely different for each table.

    As for creating an index over an expression that includes all the children... you can write a function that operates on a parent table and returns values that include children and index over that result. The index would only be used on queries that use the result of the function, but its one way to make a giant, master index.

    This is a little hackish though, and in the case of partitioning for performance isn't nearly as good an idea as just using exclusion constraints so the planner can know what tables can't possibly have data relevant to a result and skip checking them (and their indexes) completely.

    As you can see, the indexing plan and method of doing it has a lot to do with why you are using inheritance in the first place. In the case of partitions you gain the most performance by using exclusion constraints in addition to indexes (and can save time typing and errors by automating the table and index creation); in the case of semantic inheritance your index use is really dependent on the use case per-table.
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Location
    Usually Japan when not on contract
    Posts
    240
    Rep Power
    12
    Partitioning by year is explained quite well in the Postgres docs. I would take it just a touch farther and write a db function that can generate the yearly tables on demand (either from a cron job or on invokation like "SELECT make_yearly_bigass_table(int year)".

    Don't forget solid check constraints. That's how Postgres will know to skip irrelevant tables completely when you make a query. So let's say I have a setup like this:
    sql Code:
    CREATE TABLE phone_calls
      (id    SERIAL NOT NULL,
       dtg   timestamptz DEFAULT CURRENT_TIMESTAMP NOT NULL,
       origin  VARCHAR(20) NOT NULL);
     
    CREATE TABLE phone_calls_2012
      (PRIMARY KEY (id),
       CHECK (dtg >= '2012-01-01' AND dtg < '2013-01-01'))
      INHERITS (phone_calls);
    --CREATE INDEX(es) blahblahblah
     
    CREATE TABLE phone_calls_2013
      (PRIMARY KEY (id),
       CHECK (dtg >= '2013-01-01' AND dtg < '2014-01-01'))
      INHERITS (phone_calls);
    --CREATE INDEX(es) blahblahblah

    Note that we didn't bother to define any indexes (not even a primary key) on the parent table -- because no data will ever go there anyway, and unique constraints don't extend across the partition boundaries anyway. What the "id" column is really doing is creating an integer field to inherit and an index to inherit a default from -- nothing more. That it will be unique if you don't fiddle with it is just a matter of course.

    When you query phone_calls you'll get everything, when you query phone_calls with a WHERE or JOIN ON the "dtg" part though, it won't scan all partitions, it will only scan the partitions that are allowed to contain the answer your looking for. In the case of a huge table (or your case, insanely wide tables where even just a few rows probably don't fit on page) this can give a pretty big boost to read times.

    But check your table use stats over time to see if you're actually hitting your indexes all the time. You should be, but you can be surprised at how things work sometimes depending on how the server is configured. This is especially true when it comes to work_mem and non-index sorts, or the (usually) surprisingly good optimization of CTEs VS subqueries in expressions, or results that include the output of a table function or whatever.

    Good luck.
  12. #7
  13. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2013
    Posts
    5
    Rep Power
    0
    Alright, this clears things up. Thanks for all these hints!

    Originally Posted by zxq9
    Partitioning by year is explained quite well in the Postgres docs. I would take it just a touch farther and write a db function that can generate the yearly tables on demand (either from a cron job or on invokation like "SELECT make_yearly_bigass_table(int year)".

    Don't forget solid check constraints. That's how Postgres will know to skip irrelevant tables completely when you make a query. So let's say I have a setup like this:
    sql Code:
    CREATE TABLE phone_calls
      (id    SERIAL NOT NULL,
       dtg   timestamptz DEFAULT CURRENT_TIMESTAMP NOT NULL,
       origin  VARCHAR(20) NOT NULL);
     
    CREATE TABLE phone_calls_2012
      (PRIMARY KEY (id),
       CHECK (dtg >= '2012-01-01' AND dtg < '2013-01-01'))
      INHERITS (phone_calls);
    --CREATE INDEX(es) blahblahblah
     
    CREATE TABLE phone_calls_2013
      (PRIMARY KEY (id),
       CHECK (dtg >= '2013-01-01' AND dtg < '2014-01-01'))
      INHERITS (phone_calls);
    --CREATE INDEX(es) blahblahblah

    Note that we didn't bother to define any indexes (not even a primary key) on the parent table -- because no data will ever go there anyway, and unique constraints don't extend across the partition boundaries anyway. What the "id" column is really doing is creating an integer field to inherit and an index to inherit a default from -- nothing more. That it will be unique if you don't fiddle with it is just a matter of course.

    When you query phone_calls you'll get everything, when you query phone_calls with a WHERE or JOIN ON the "dtg" part though, it won't scan all partitions, it will only scan the partitions that are allowed to contain the answer your looking for. In the case of a huge table (or your case, insanely wide tables where even just a few rows probably don't fit on page) this can give a pretty big boost to read times.

    But check your table use stats over time to see if you're actually hitting your indexes all the time. You should be, but you can be surprised at how things work sometimes depending on how the server is configured. This is especially true when it comes to work_mem and non-index sorts, or the (usually) surprisingly good optimization of CTEs VS subqueries in expressions, or results that include the output of a table function or whatever.

    Good luck.

IMN logo majestic logo threadwatch logo seochat tools logo