April 12th, 2002, 12:01 PM
performance for a small but very active table
I would like to know if there is some way to speed up some querrys on a small but very active table. I have a list of process (like "ps" or "top" datas) stored in a table, comming from a cluster of computers. This table is never very big (max 5000 rows), but refreshed every 5 seconds. Afters a few hours, if I dont do a vacuum, even a small simple select querry takes seconds.
Do you know how to speed up that without vaccuming every 5 minutes ?
April 12th, 2002, 12:12 PM
When you say the table is refreshed every 5 seconds, do you mean it is updated every 5 seconds, meaning new records entered, old ones deleted, etc... or is it just a simple SELECT query every 5 seconds, with updates less frequently?
If you don't mind, please show a dump of the table CREATE statement, so we can analyze better.
And, what version of PostgreSQL?
April 12th, 2002, 02:14 PM
The table I use is updated by the computers in the network cluster. Every 5 seconds each computer :
- Delete all rows in the table from thier IP in the correspondig field.
- insert a list of process with memory and CPU info in the table
so, they are a lot of change but not so many rows.
Then, the table is read from a php webpage, and here is the problem. When a quite simple query is made, for exemple, a sorted list of the 10 most CPU consuming process with the computer names, it can take up to 15 seconds to display after 3 hours without vacuum; and that's not very good for a real time process monitoring interface...
As this is a very "volatil" table, is there a way for exemple to keep it in memory, insted of hard disk. and is there a better way than vacuuming (also because it takes some times, and the table is locked during this time in older versions of postgresql)
I generaly use Postgresql 7.0, because the program i'm developping needs to run even on a quitte old system. but I will consider benchmarking against 7.2 if you think it can make a signifiant difference.
Thanks for your help
April 12th, 2002, 02:38 PM
This sounds like a cool project. Well, there are a few things I can think of:
1. It's impossible for me to really help without seeing table structure, and which columns you choose to index (which is critical for a frequently updated table)
2. It's also important to consider the database design itself. Have you normalized your database, so there is no redundant information?
3. What client connection method, and programming interface are you using to connect these servers to the database? Are you using persistent connections, or opening and closing connections every 5 seconds?
4. There are many things you can do to tune PostgreSQL, especially at compile time. There are also settings in postgresql.conf which regulate the number of users who can connect, etc... Have you done anything in these areas?
Here are a few notes from Bruce Momjian's book, available online:
5. I suppose it's possible to mount the PosrtgreSQL data partition on a RAM disk, in which case it should be a lot faster. I don't know how this affects PostgreSQL overall, though.
April 15th, 2002, 07:03 AM
For information :
- persistant connexion is used
- vaccuum can divide the time by 100, but need to be done more than once an hour to remain usable, and thats not very good for an "high availability cluster" to lock this table and get the web stat page "unavaillable".
- the database structure for this part is very simple, and so is the querrys (compared to the time it can take)
exmple querry :
SELECT nodes.name AS node,* FROM top WHERE nodes.ip=ip ORDER BY "pid";
(even without order by, it also takes a long time)
cluster-1.1.4-pre1-10.0.0.0=> \d top
Attribute | Type | Modifier
ip | varchar(20) | not null
pid | integer | not null
owner | varchar(20) | not null
cmd | varchar(200) | not null
misc | varchar(200) | not null
pcpu | numeric(5,2) |
pmem | numeric(5,2) |
cluster-1.1.4-pre1-10.0.0.0=> \d nodes
Attribute | Type | Modifier
ip | varchar(15) | not null
name | varchar(20) |
- and thanks for your advices and your notes from Bruce Momjian's book. I already use begin/commit, and the insert, are working quite well. but the problem seems to comme from the select, when the base is not vacuumed very frequently (because the table only contain about 0.01% of usable information after an hour and because each row don't lives more than 5s).
I will try to play with postmaster options, to see if it helps.
April 15th, 2002, 11:32 AM
Well, I can offer 3 more possibilities:
1. You might want to index your 'pid' column in the "top" table, since your query sorts by that column. You might even want to experiment with indexing your nodes.name column also. As with any serious experimenting, use the EXPLAIN keyword on your main query before and after making changes, to see the difference.
2. Why not use the PostgreSQL's network address datatype called 'inet' for your 'ip' column? This constrains the column only to allow valid IP addresses, and should be a little faster, as well as enforcing accuracy.
3. Version 7.2 of PostgreSQL should give you some significant improvements. One of these being that you can perform a Vacuum without taking the database off-line.
HTH. Good luck.
April 15th, 2002, 03:38 PM
A composite index on ip, pid ASC would be a good thing to have. At least in Sybase ASE it will use the leading column to find rows then the secondary one to sort (rather than using a temp table). Not sure if PostgreSQL can, but it would be worth checking up on.
April 15th, 2002, 05:55 PM
April 17th, 2002, 07:08 AM
thank you very much, I'm not sure I will use more index because sorting with orther fields is planed (by CPU, Mem ..., like the real "top" program).
but other advises helped a lot, so thanks for all...