#16
Contributing User
Devshed Newbie (0 - 499 posts)
Join Date: Oct 2013
Posts: 158
Rep Power: 10
If you are not updating the table then clustering is OK; you'll have to re-cluster after the inserts, of course.
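For example, a minimal sketch using placeholder names (the table and index that come up later in this thread):

-- One-time physical reorder of the table along an existing index
CLUSTER detailed_report USING primary_key_det_report;
-- Refresh planner statistics after the reorder
ANALYZE detailed_report;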
#17
Registered User
Devshed Newbie (0 - 499 posts)
Join Date: Dec 2013
Posts: 12
Rep Power: 0

    Postgres Performance Tuning


    hi,
Do you mean it is essential to re-cluster after I do the bulk insert of daily records into this table? The bulk insert will be done once a day.
#18
Contributing User
Devshed Newbie (0 - 499 posts)
Join Date: Oct 2013
Posts: 158
Rep Power: 10
Yes. Clustering is a one-time operation; all the records that are in the table at the time you issue the CLUSTER command will be re-ordered in the data files. Any records you add after the CLUSTER command has finished will be appended to the end of the data file in the order in which they are inserted; they will not be clustered.
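So after each daily load you repeat the operation. PostgreSQL remembers which index a table was last clustered on, so the short form is enough (a sketch, same placeholder name as above):

-- Re-clusters on the index recorded by the previous CLUSTER ... USING
CLUSTER detailed_report;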
#19
Registered User
Devshed Newbie (0 - 499 posts)
Join Date: Dec 2013
Posts: 12
Rep Power: 0

    Postgres Performance Tuning


    hi,
thanks, you are right. As per the documentation, new records will not be inserted in the clustered index order, hence re-clustering is required.

I was trying to improve performance and found that CLUSTER seemed to be the answer, but I am concerned because of the re-clustering.
Can I get the same performance as CLUSTER if I insert rows into the table in 'ASC' or 'DESC' order, for example ordered by 'group_id, end_time' ASC? Because CLUSTER effectively does the same thing, am I right?
#20
Contributing User
Devshed Newbie (0 - 499 posts)
Join Date: Oct 2013
Posts: 158
Rep Power: 10
Can I get the same performance as CLUSTER if I insert rows into the table in 'ASC' or 'DESC' order, for example ordered by 'group_id, end_time' ASC?
Probably not. The database will put new records anywhere it can, using the free space map if necessary. There is no guarantee at all that records you insert in a particular order will be stored in that order.

But, to avoid re-clustering data that has already been clustered, you could try partitioning your data. This is similar in spirit to partial indexes, but it places today's records in a table called "data_20131213", tomorrow's records in "data_20131214", and so on. PostgreSQL can do this using inheritance. Once a partition has been loaded, you cluster it and never touch it again. PostgreSQL is clever enough to recognize which partitions your query will hit, and simply ignores the partitions that cannot hold data required for the current query. That means you benefit from the clustering and the smaller indexes at the same time.
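A minimal sketch of that layout (column list trimmed, all names illustrative):

-- The parent table holds no rows of its own
CREATE TABLE data (
    end_time timestamptz NOT NULL,
    group_id text NOT NULL
    -- ... remaining columns ...
);

-- One child per day; the CHECK constraint is what lets the planner skip it
CREATE TABLE data_20131213 (
    CHECK (end_time >= '2013-12-13' AND end_time < '2013-12-14')
) INHERITS (data);

-- After the day's bulk load: index it, cluster it once, never touch it again
CREATE INDEX data_20131213_idx ON data_20131213 (end_time, group_id);
CLUSTER data_20131213 USING data_20131213_idx;
ANALYZE data_20131213;

Note that constraint_exclusion must be set to 'partition' (the default) or 'on' for the planner to skip partitions.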
#21
Registered User
Devshed Newbie (0 - 499 posts)
Join Date: Dec 2013
Posts: 12
Rep Power: 0

    Postgres Performance Tuning


    hi,
As I mentioned earlier, re-clustering every day is not affordable because it will take a few hours. Also, I would prefer not to partition because I need to keep a few years of data, which means thousands of daily partitions.
Hence, I have reordered the index with the aim that there should not be a different page read for every index pointer. That is, if I read a page from disk based on an index pointer, that page should contain more of the required records.
In my database, records are inserted based on end_time, which means the physical records on disk will be in the order of end_time.
Hence I changed my primary key index from (group_id, start_time, call_id) to (end_time, group_id, start_time, call_id), adding end_time to the index, because my queries have a WHERE condition on 'end_time' and 'group_id'.
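For reference, a sketch of that change, using the constraint name from the index list below:

ALTER TABLE detailed_report DROP CONSTRAINT primary_key_det_report;
ALTER TABLE detailed_report ADD CONSTRAINT primary_key_det_report
    PRIMARY KEY (end_time, group_id, start_time, call_id);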
I have copied the query and query plan below; please suggest what further tuning can be done.
Number of records in the table: 67,152,177
Size of the table: 94 GB
shared_buffers = 4GB
effective_cache_size = 3750MB
work_mem = 800MB
maintenance_work_mem = 256MB
    Indexes:
    "primary_key_det_report" PRIMARY KEY, btree (end_time, group_id, start_time, call_id), tablespace "indexspace"
    "client_detailed_rep" btree (client), tablespace "indexspace"
    "country_code_pay_detailed_rep" btree (country_code_pay), tablespace "indexspace"
    "country_code_recv_detailed_rep" btree (country_code_recv), tablespace "indexspace"
    "gateway_detailed_rep" btree (gateway), tablespace "indexspace"
    "master_country_code_detailed_rep" btree (master_country_code), tablespace "indexspace"


testdb=# EXPLAIN ANALYZE
SELECT text(client) AS client,
       text(gateway) AS gateway,
       count(*)::bigint AS total_calls,
       (avg(duration)/1000.0)::numeric(10,2) AS acd,
       (avg(pdd))::numeric(10,2) AS pdd,
       sum(call_duration_recv)/1000.0 AS duration_recv,
       sum(call_duration_pay)/1000.0 AS duration_pay,
       sum(call_amount_recv) AS call_amount_recv,
       sum(call_amount_pay) AS call_amount_pay
FROM detailed_report
WHERE end_time >= '2013-04-01 00:00'
  AND end_time < '2013-06-01 00:00'
  AND group_id = 'admin'
  AND client = 'CHOICE'
GROUP BY client, gateway
ORDER BY call_amount_recv DESC;
QUERY PLAN
--------------------------------------------------------------------------------
Sort (cost=2818913.68..2818915.88 rows=878 width=54) (actual time=46220.766..46220.767 rows=6 loops=1)
  Sort Key: (sum(call_amount_recv))
  Sort Method: quicksort Memory: 25kB
  -> HashAggregate (cost=2818846.61..2818870.76 rows=878 width=54) (actual time=46214.324..46214.355 rows=6 loops=1)
     -> Bitmap Heap Scan on detailed_report (cost=957333.57..2806010.36 rows=570500 width=54) (actual time=12879.436..34935.401 rows=4494470 loops=1)
        Recheck Cond: (((client)::text = 'CHOICE'::text) AND (end_time >= '2013-04-01 00:00:00+00'::timestamp with time zone) AND (end_time < '2013-06-01 00:00:00+00'::timestamp with time zone) AND ((group_id)::text = 'admin'::text))
        -> BitmapAnd (cost=957333.57..957333.57 rows=570500 width=0) (actual time=11982.431..11982.431 rows=0 loops=1)
           -> Bitmap Index Scan on client_detailed_rep (cost=0.00..183229.40 rows=8239578 width=0) (actual time=3284.573..3284.573 rows=8281859 loops=1)
              Index Cond: ((client)::text = 'CHOICE'::text)
           -> Bitmap Index Scan on primary_key_det_report (cost=0.00..773818.67 rows=4649549 width=0) (actual time=7349.067..7349.067 rows=4755008 loops=1)
              Index Cond: ((end_time >= '2013-04-01 00:00:00+00'::timestamp with time zone) AND (end_time < '2013-06-01 00:00:00+00'::timestamp with time zone) AND ((group_id)::text = 'admin'::text))
Total runtime: 46248.539 ms
#22
Contributing User
Devshed Newbie (0 - 499 posts)
Join Date: Oct 2013
Posts: 158
Rep Power: 10
Also, I would prefer not to partition because I need to keep a few years of data, which means thousands of daily partitions.
The number of partitions really doesn't matter; you can create them automatically, and they are fully transparent to your reporting queries.

Besides which, you could also partition by week, which should make re-clustering acceptably fast and means you only get 52 tables per year.
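Routing inserts to the right partition is usually done with a trigger on the parent table; a sketch, assuming weekly partitions named data_<ISO year>w<ISO week> already exist (all names illustrative):

CREATE OR REPLACE FUNCTION data_insert_router() RETURNS trigger AS $$
BEGIN
    -- Build the child table name from the row's end_time, e.g. data_2013w50
    EXECUTE 'INSERT INTO '
            || quote_ident('data_' || to_char(NEW.end_time, 'IYYY"w"IW'))
            || ' SELECT ($1).*'
    USING NEW;
    RETURN NULL;  -- the parent table keeps no rows of its own
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER data_insert_route
    BEFORE INSERT ON data
    FOR EACH ROW EXECUTE PROCEDURE data_insert_router();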

In my database, records are inserted based on end_time, which means the physical records on disk will be in the order of end_time.
    That's not true. They will be retrieved in that order, but the data can be fragmented all over the place.

One last thing you could try is a GiST index on the date; that might prevent the recheck: http://www.postgresql.org/docs/9.1/static/btree-gist.html
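A sketch of that; btree_gist is the contrib module that provides GiST operator classes for plain scalar types (the index name is illustrative):

CREATE EXTENSION btree_gist;
CREATE INDEX detailed_report_end_time_gist
    ON detailed_report USING gist (end_time);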

The way I see it: you lose a lot of time rechecking the index and fetching the data from disk. Rechecks are a consequence of large indexes, and slow fetches are caused by not clustering (among many other factors, of course).

Reducing the size of the dataset per query will help with both problems.
#23
Registered User
Devshed Newbie (0 - 499 posts)
Join Date: Dec 2013
Posts: 12
Rep Power: 0

    Postgres Performance Tuning


    hi,
thanks. I am concerned because, as per the Postgres documentation:
    "All constraints on all partitions of the master table are examined during constraint exclusion, so large numbers of partitions are likely to increase query planning time considerably. Partitioning using these techniques will work well with up to perhaps a hundred partitions; don't try to use many thousands of partitions."
#24
Contributing User
Devshed Newbie (0 - 499 posts)
Join Date: Oct 2013
Posts: 158
Rep Power: 10
    "All constraints on all partitions of the master table are examined during constraint exclusion, so large numbers of partitions are likely to increase query planning time considerably. Partitioning using these techniques will work well with up to perhaps a hundred partitions; don't try to use many thousands of partitions."
Your partitioning will use exactly one constraint per partition, and I doubt very much that examining those will take more time than examining the records in the partitions. Excluding a partition will save much more time than it takes to find out which partition to exclude.

But as always, you have to try it and see if it works. I don't think that having roughly 200 weekly partitions for 4 years of data is going to be a problem.
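If you do try it, you can check that the planner really skips partitions (constraint_exclusion is a real setting; 'partition' is the 9.1 default):

SET constraint_exclusion = partition;
-- The plan should only list the children whose CHECK constraints overlap the range
EXPLAIN SELECT count(*) FROM data
WHERE end_time >= '2013-12-09' AND end_time < '2013-12-16';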