#1
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2002
    Posts
    4
    Rep Power
    0

    Question: performance for a small but very active table


    Hello,

    I would like to know if there is some way to speed up queries on a small but very active table. I have a list of processes (like "ps" or "top" data) stored in a table, coming from a cluster of computers. The table is never very big (5000 rows max), but it is refreshed every 5 seconds. After a few hours, if I don't do a VACUUM, even a small, simple SELECT query takes seconds.

    Do you know how to speed this up without vacuuming every 5 minutes?

    thanks
#2
    Gödelian monster
    Devshed Regular (2000 - 2499 posts)

    Join Date
    Jul 1999
    Location
    Central Florida, USA
    Posts
    2,307
    Rep Power
    61
    When you say the table is refreshed every 5 seconds, do you mean it is updated every 5 seconds (new records entered, old ones deleted, etc.), or is it just a simple SELECT query every 5 seconds, with updates less frequent?

    If you don't mind, please show a dump of the table CREATE statement, so we can analyze better.

    And, what version of PostgreSQL?
    The real n-tier system:

    FreeBSD -> PostgreSQL -> [any_language] -> Apache -> Mozilla/XUL

    Amazon wishlist -- rycamor (at) gmail.com
#3
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2002
    Posts
    4
    Rep Power
    0



    The table I use is updated by the computers in the network cluster. Every 5 seconds, each computer:

    - deletes all rows in the table matching its own IP in the corresponding field;
    - inserts a list of processes with memory and CPU info into the table.

    So there are a lot of changes, but not so many rows.
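
    In rough SQL terms, each node's cycle is something like the following. This is a sketch only: the values are invented for illustration, and the column list matches the "top" table shown later in the thread.

    BEGIN;
    DELETE FROM top WHERE ip = '10.0.0.5';
    INSERT INTO top (ip, pid, owner, cmd, misc, pcpu, pmem)
        VALUES ('10.0.0.5', 1234, 'root', '/usr/sbin/httpd', '', 2.50, 1.20);
    INSERT INTO top (ip, pid, owner, cmd, misc, pcpu, pmem)
        VALUES ('10.0.0.5', 1240, 'nobody', '/usr/sbin/httpd', '', 0.30, 0.90);
    -- ...one INSERT per process on the node...
    COMMIT;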

    Then the table is read from a PHP webpage, and here is the problem. When a quite simple query is made, for example a sorted list of the 10 most CPU-consuming processes with the computer names, it can take up to 15 seconds to display after 3 hours without a VACUUM; and that's not very good for a real-time process monitoring interface...

    As this is a very "volatile" table, is there a way, for example, to keep it in memory instead of on the hard disk? And is there a better way than vacuuming (which also takes some time, and in older versions of PostgreSQL the table is locked while it runs)?

    I generally use PostgreSQL 7.0, because the program I'm developing needs to run even on quite old systems, but I will consider benchmarking against 7.2 if you think it can make a significant difference.

    Thanks for your help


    Stephane
#4
    Gödelian monster
    Devshed Regular (2000 - 2499 posts)

    Join Date
    Jul 1999
    Location
    Central Florida, USA
    Posts
    2,307
    Rep Power
    61
    This sounds like a cool project. Well, there are a few things I can think of:

    1. It's impossible for me to really help without seeing the table structure, and which columns you choose to index (which is critical for a frequently updated table)

    2. It's also important to consider the database design itself. Have you normalized your database, so there is no redundant information?

    3. What client connection method, and programming interface are you using to connect these servers to the database? Are you using persistent connections, or opening and closing connections every 5 seconds?

    4. There are many things you can do to tune PostgreSQL, especially at compile time. There are also settings in postgresql.conf which regulate the number of users who can connect, etc... Have you done anything in these areas?

    Here are a few notes from Bruce Momjian's book, available online:

    3.10) How do I tune the database engine for better performance?
    Certainly, indices can speed up queries. The EXPLAIN command allows you to see how POSTGRESQL is interpreting your query, and which indices are being used.

    If you are doing a lot of INSERTs, consider doing them in a large batch using the COPY command. This is much faster than individual INSERTs. Second, statements not in a BEGIN WORK/COMMIT transaction block are considered to be in their own transaction. Consider performing several statements in a single transaction block. This reduces the transaction overhead. Also consider dropping and recreating indices when making large data changes.

    There are several tuning options. You can disable fsync() by starting the postmaster with a -o -F option. This will prevent fsync()'s from flushing to disk after every transaction.

    You can also use the postmaster -B option to increase the number of shared memory buffers used by the backend processes. If you make this parameter too high, the postmaster may not start because you've exceeded your kernel's limit on shared memory space. Each buffer is 8K and the default is 64 buffers.

    You can also use the backend -S option to increase the maximum amount of memory used by the backend process for temporary sorts. The -S value is measured in kilobytes, and the default is 512 (ie, 512K).

    You can also use the CLUSTER command to group data in tables to match an index. See the CLUSTER manual page for more details.
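
    To make those notes concrete, here is a rough sketch. The table name "top" is taken from this thread; the index name and option values are invented examples, not recommendations.

    -- Group statements into one transaction instead of one transaction each:
    BEGIN;
    INSERT INTO top VALUES ('10.0.0.1', 100, 'root', 'init', '', 0.00, 0.10);
    INSERT INTO top VALUES ('10.0.0.1', 101, 'root', 'syslogd', '', 0.00, 0.20);
    COMMIT;

    -- Load large batches with COPY instead of individual INSERTs:
    COPY top FROM '/tmp/top_snapshot.dat';

    -- Physically order the table along an index (7.x syntax: CLUSTER index ON table):
    CREATE INDEX top_ip_idx ON top (ip);
    CLUSTER top_ip_idx ON top;

    -- The postmaster options above are given at server start-up, not in SQL:
    --   postmaster -o -F          (disable fsync)
    --   postmaster -B 256         (256 shared buffers of 8K each)
    --   postmaster -o '-S 2048'   (2MB of sort memory per backend)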

    5. I suppose it's possible to mount the PostgreSQL data partition on a RAM disk, in which case it should be a lot faster. I don't know how this affects PostgreSQL overall, though.
    The real n-tier system:

    FreeBSD -> PostgreSQL -> [any_language] -> Apache -> Mozilla/XUL

    Amazon wishlist -- rycamor (at) gmail.com
#5
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2002
    Posts
    4
    Rep Power
    0



    For information:

    - persistent connections are used;

    - VACUUM can divide the time by 100, but it needs to be run more than once an hour to remain usable, and for a "high availability cluster" it's not very good to lock this table and have the web stats page "unavailable" (see the sketch below);

    - the database structure for this part is very simple, and so are the queries (compared to the time they can take).
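
    The vacuum workaround mentioned above is just this, run periodically (a sketch):

    VACUUM top;           -- reclaims dead rows left by the constant DELETE/INSERT churn
    VACUUM ANALYZE top;   -- variant that also refreshes the planner's statistics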

    Example query:
    SELECT nodes.name AS node, top.* FROM top, nodes WHERE nodes.ip = top.ip ORDER BY pid;

    (even without the ORDER BY, it takes a long time)

    cluster-1.1.4-pre1-10.0.0.0=> \d top
             Table "top"
     Attribute |     Type     | Modifier
    -----------+--------------+----------
     ip        | varchar(20)  | not null
     pid       | integer      | not null
     owner     | varchar(20)  | not null
     cmd       | varchar(200) | not null
     misc      | varchar(200) | not null
     pcpu      | numeric(5,2) |
     pmem      | numeric(5,2) |
    Index: top_pkey


    cluster-1.1.4-pre1-10.0.0.0=> \d nodes
            Table "nodes"
     Attribute |    Type     | Modifier
    -----------+-------------+----------
     ip        | varchar(15) | not null
     name      | varchar(20) |
     ...
     ...
     ...
    Indices: nodes_mac_key,
             nodes_name_key,
             nodes_pkey

    - and thanks for your advice and the notes from Bruce Momjian's book. I already use BEGIN/COMMIT, and the inserts are working quite well, but the problem seems to come from the SELECTs when the database is not vacuumed very frequently (because after an hour the table contains only about 0.01% usable information, and because no row lives more than 5 seconds).

    I will try to play with postmaster options, to see if it helps.

    thanks

    Stephane
#6
    Gödelian monster
    Devshed Regular (2000 - 2499 posts)

    Join Date
    Jul 1999
    Location
    Central Florida, USA
    Posts
    2,307
    Rep Power
    61
    Well, I can offer 3 more possibilities:

    1. You might want to index your 'pid' column in the "top" table, since your query sorts by that column. You might also want to experiment with indexing your nodes.name column. As with any serious experimenting, use the EXPLAIN keyword on your main query before and after making changes, to see the difference (see the sketch after this list).

    2. Why not use PostgreSQL's network address datatype 'inet' for your 'ip' column? It constrains the column to allow only valid IP addresses, and should be a little faster, as well as enforcing accuracy.

    3. Version 7.2 of PostgreSQL should give you some significant improvements, one of these being that you can perform a VACUUM without taking the database off-line.
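
    For suggestions 1 and 2, the general idea is sketched below. The index names are invented; compare EXPLAIN output before and after creating them.

    CREATE INDEX top_pid_idx ON top (pid);
    CREATE INDEX nodes_name_idx ON nodes (name);

    EXPLAIN SELECT nodes.name AS node, top.*
        FROM top, nodes WHERE nodes.ip = top.ip ORDER BY pid;

    -- For the 'inet' suggestion, the column would be declared as:
    --   ip inet NOT NULL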

    HTH. Good luck.
    The real n-tier system:

    FreeBSD -> PostgreSQL -> [any_language] -> Apache -> Mozilla/XUL

    Amazon wishlist -- rycamor (at) gmail.com
#7
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2001
    Location
    High above the mucky-muck (Columbus, OH)
    Posts
    266
    Rep Power
    13
    A composite index on (ip, pid ASC) would be a good thing to have. At least in Sybase ASE, it will use the leading column to find rows and then the secondary one to sort (rather than using a temp table). Not sure if PostgreSQL does the same, but it would be worth checking up on.
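
    In PostgreSQL that would be something like this (index name invented; btree index columns are sorted ascending by default):

    CREATE INDEX top_ip_pid_idx ON top (ip, pid);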
#8
    Gödelian monster
    Devshed Regular (2000 - 2499 posts)

    Join Date
    Jul 1999
    Location
    Central Florida, USA
    Posts
    2,307
    Rep Power
    61
    Good idea. Composite indexes are no problem for PostgreSQL: http://www.postgresql.org/idocs/inde...eateindex.html
    The real n-tier system:

    FreeBSD -> PostgreSQL -> [any_language] -> Apache -> Mozilla/XUL

    Amazon wishlist -- rycamor (at) gmail.com
#9
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2002
    Posts
    4
    Rep Power
    0

    thanks


    Thank you very much. I'm not sure I will add more indexes, because sorting by other fields is planned (by CPU, memory..., like the real "top" program).

    But the other advice helped a lot, so thanks for all...

    Stephane
