#1
  1. Prom night: 1973
    Devshed Beginner (1000 - 1499 posts)

    Join Date
    Dec 2001
    Posts
    1,156
    Rep Power
    25

    Optimising a char select


    Hi,

    I'm making the switch from MySQL to Postgres. Everything's been going fine, but I've now run into a rather slow query. I've a table with 10,000+ entries, and a CHAR(4) column called Code that I'm using in my select statements. When I do a search for a large(ish) number of items, it's taking 4+ seconds to return results. This was never a problem in MySQL, and so I'm thinking there must be a way to index the table, or rewrite my query.

    Thanks for any help.

    Example query:

    SELECT FabricID, Code FROM Fabric WHERE Code IN ('X295','X296','X297','X298','X299','X300','X301','X302','X303','X304','X306','X400','X403','X404',' X405','X4
    06','X408','X409','X417','X423','X424','X425','X427','X428','X786','X787','X788','X877','X878','X307 ','X308','X309','X310'
    ,'X311','X312','X313','X314','X315','X316','X317','X318','X319','X320','X321','X322','X323','X324',' X325','X326','X327','X
    328','X329','X330','X331','X332','X333','X335','X336','X337','X338','X339','X340','X341','X342','X34 3','X344','X345','X346
    ','X431','X434','X435','X436','X437','X438','X899','Y030','Y058','Y062','Y067','Y068','Y069','Y070', 'Y071','Y072','Y073','
    X347','X348','X349','X350','X351','X352','X445','X449','X450','X452','X453','X454','X455','X456','X4 57','X785','X789','X90
    0','X935','X986','X989','X169','X353','X354','X355','X356','X357','X358','X359','X360','X361','X363' ,'X364','X365','X366',
    'X367','X368','X369','X370','X371','X372','X373','X374','X375','X376','X377','X378','X379','X380','X 381','X382','X383','X4
    59','X461','X463','X465','X466','X467','X471','X472','X473','X474','X475','X993','X384','X385','X386 ','X387','X388','X389'
    ,'X390','X391','X392','X393','X394','X395','X396','X397','X398','X399','X497','X498','X499','X500',' X502','X503','X504','X
    505','X506','X508','X509','X510','X511','X512','X513','X514','X515','X516','X752','X753','X925','X92 6','X927','X928','X929
    ','X930','X990','X517','X518','X519','X520','X521','X522','X523','X524','X525','X526','X527','X528', 'X529','X530','X531','
    X532','X533','X534','X535','X536','X537','X538','X956','X957','X958','X959','X960','X961','X962','X9 63','X982','X983','X98
    4','X985','X546','X547','X548','X551','X553','X554','X555','X936','X937','X938','X939','X940','X941' ,'X942','X943','X944',
    'F001','F002','F003','F004','F005','F006','F007','F008','F009','F010','F011','F012','F013','F014','F 015','F016','F017','F0
    18','F019','F084','F085','F086','F087','F088','F089','F090','F091','F092','F093','F094','F095','F096 ','F097','F098','F099'
    ,'F101','F102','F103','F104','F106','F107','F108','F109','F110','F112','F113','F114','F115','F116',' F117','F118','F119','F
    120','F121','F122','F123','F124','F125','F126','F127','F128','F129','F130','F131','F132','F133','F13 4','F135','F136','F137
    ','F138','F140','F141','F142','F143','F144','F145','F146','F147','F148','F149','F150','F151','F152', 'F153','F154','F155','
    F156','F157','F158','F159','F160','F161','F162','F163','F164','F165','F166','F167','F168','F169','F1 70','F171','F172','F17
    3','F174','F175','F176','F177','F178','F179','F180','F181','F182','F183','F184','F185','F186','F187' ,'F188','F189','F190',
    'F191','F192','F193','F194','F195','F196','F197','F198','F199','F200','F201','F202','F203','F204','F 205','F206','F207','F2
    08','F209','F210','F211','F212','F214','F215','F216','F217','F218','F219','F220','F221','F222','F223 ','F224','F225','F226'
    ,'F227','F228','F229','F230','F231','F232','F233','F234','F236','F237','F238','F239','F240','F241',' F242','F243','F244','F
    245','F246','F247','F248','F249','F250','F251','F252','F253','F254','F255','F256','F257','F258','F25 9','F260')
    Last edited by kurious; September 10th, 2003 at 05:32 PM.
  2. #2
  3. No Profile Picture
    Gödelian monster
    Devshed Regular (2000 - 2499 posts)

    Join Date
    Jul 1999
    Location
    Central Florida, USA
    Posts
    2,307
    Rep Power
    62
    Why would you do a partial index on such a narrow column? Just index the whole thing.

    But possibly the main problem here is not indexing, but the use of IN(). IN is a notoriously slow construct in SQL anyway, and often is an unnecessary way to approach things when you have a more relational system.

    Let me ask, how do you arrive at this large list of values you are placing in the IN() clause? Is it the result of a query? If so, you can often it with a subquery, such as "WHERE column IN(SELECT column FROM table WHERE [condition])".

    But, with some knowledge of your database structure and design, we could probably arrive at an even more efficient way to do this. Generally, a query such as yours is evidence that some database design issues need to be re-thought.

    One other reason this query might not perform as fast as MySQL might be because MySQL tables often have fulltext indexing turned on by default. PostgreSQL has a couple ways to achieve fulltext indexing, but they are external modules that need to be installed.
    The real n-tier system:

    FreeBSD -> PostgreSQL -> [any_language] -> Apache -> Mozilla/XUL

    Amazon wishlist -- rycamor (at) gmail.com
  4. #3
  5. Prom night: 1973
    Devshed Beginner (1000 - 1499 posts)

    Join Date
    Dec 2001
    Posts
    1,156
    Rep Power
    25
    Originally posted by rycamor
    Why would you do a partial index on such a narrow column? Just index the whole thing.
    Okay, done.

    But possibly the main problem here is not indexing, but the use of IN(). IN is a notoriously slow construct in SQL anyway, and often is an unnecessary way to approach things when you have a more relational system.

    Let me ask, how do you arrive at this large list of values you are placing in the IN() clause? Is it the result of a query? If so, you can often it with a subquery, such as "WHERE column IN(SELECT column FROM table WHERE [condition])".

    But, with some knowledge of your database structure and design, we could probably arrive at an even more efficient way to do this. Generally, a query such as yours is evidence that some database design issues need to be re-thought.
    Well, I arrived at the large list of values because IN has never been slow in Oracle or MySQL; it's the way I've always done things up until now. I'm importing data from another database via some CSV files, and so the codes I'm trying to select aren't the result of any Postgres query. I'm basically testing to see whether items already exist in the table. I could query individually, but I'm afraid that's not always going to be an effective solution in the long run.

    I've been trying to get my work to dump MySQL for a long time, and I was hoping to do so with minimum change to my coding style. Oh, well. Time to crank my brain into gear and learn something, I suppose


    One other reason this query might not perform as fast as MySQL might be because MySQL tables often have fulltext indexing turned on by default. PostgreSQL has a couple ways to achieve fulltext indexing, but they are external modules that need to be installed.
    Would a fulltext index make any difference on such a small column? I don't know how they work.
  6. #4
  7. No Profile Picture
    Gödelian monster
    Devshed Regular (2000 - 2499 posts)

    Join Date
    Jul 1999
    Location
    Central Florida, USA
    Posts
    2,307
    Rep Power
    62
    Originally posted by kurious
    Okay, done.
    Well, I arrived at the large list of values because IN has never been slow in Oracle or MySQL; it's the way I've always done things up until now. I'm importing data from another database via some CSV files, and so the codes I'm trying to select aren't the result of any Postgres query. I'm basically testing to see whether items already exist in the table. I could query individually, but I'm afraid that's not always going to be an effective solution in the long run.
    Really, the correct/fast way to do this is import the data to a temp table, using the COPY command (which is extremely fast), and then run standard queries to determine what to do with files. The added benefit of this is that it can be done completely in PostgreSQL, without needing an external programming environment. In fact, it can even be done with a stored procedure, which precompiles the behavior for more speed.

    I've been trying to get my work to dump MySQL for a long time, and I was hoping to do so with minimum change to my coding style. Oh, well. Time to crank my brain into gear and learn something, I suppose
    Do it... Learning serious database concepts is always worth it .

    Would a fulltext index make any difference on such a small column? I don't know how they work.
    Probably not... it was just a thought.

    Anyway, before we get too worried about PostgreSQL's capabilities, here are a couple other things to consider:

    1. First, check out how your query works with the index you just created. You might even try EXPLAIN on your query, to see what ugliness it reports.

    2. Is PostgreSQL running on Windows, with Cygwin? If so, then that's the problem in a nutshell. PostgreSQL only performs well on Linux/Unix for the moment. Native windows versions are coming soon.

    3. Is postgresql.conf set to the default configuration, or have you done any tweaking for performance? Hint: tweak, please... the default configuration is very conservative, and would probably even run on a 486 with 16 MB of RAM. Increase the buffers, cache limits, etc...

    4. What version of PostgreSQL are you running? You might want to consider moving to PostgreSQL 7.4 beta2. Even though it is still beta, it is very stable, and has been tested for almost a year. 7.4 has substantially improved performance of IN() and a few other constructs. The full release of 7.4 should be out within a month or so.

    So, after taking parts 1-4 into account above, let's see how the query runs after you are though. Enjoy.

    (P.S. You might want to search through the PostgreSQL forum for topics like "performance" and "postgresql.conf".)
    The real n-tier system:

    FreeBSD -> PostgreSQL -> [any_language] -> Apache -> Mozilla/XUL

    Amazon wishlist -- rycamor (at) gmail.com
  8. #5
  9. Prom night: 1973
    Devshed Beginner (1000 - 1499 posts)

    Join Date
    Dec 2001
    Posts
    1,156
    Rep Power
    25
    Originally posted by rycamor

    Do it... Learning serious database concepts is always worth it .
    Oh, I used to know one or two serious database concepts Two years of MySQLing has made me instantly think, "I'll just fire up PHP to get this done" Doing it all in the database seems like a totally left-field idea at the moment.

    I'm using a rather old version of Linux (7.2) to run Postgres 7.3.4, but I'll stick 7.4 onto the machine today. Once I've done that I'll take a look at tweaking the config, and then figure out how to interpret the EXPLAIN output. If that still doesn't work, I'll bother you some more, but in the meantime thank you very much for writing such complete responses to my posts.
  10. #6
  11. No Profile Picture
    Gödelian monster
    Devshed Regular (2000 - 2499 posts)

    Join Date
    Jul 1999
    Location
    Central Florida, USA
    Posts
    2,307
    Rep Power
    62
    One final note: if this is a production system, you should still be careful about using PostgreSQL 7.4Beta2. I am not in the slightest way worried that it will randomly corrupt your data, but still, you should make frequent backups, and do things carefully until the final release is out. For one thing, do NOT assume that when the final release is out, you can just install it over your beta version: do a complete dump of your data, wipe your data directory, install the final release, and then do a complete restore. You might not need to do things this way, but it is better to be careful.
    The real n-tier system:

    FreeBSD -> PostgreSQL -> [any_language] -> Apache -> Mozilla/XUL

    Amazon wishlist -- rycamor (at) gmail.com
  12. #7
  13. Prom night: 1973
    Devshed Beginner (1000 - 1499 posts)

    Join Date
    Dec 2001
    Posts
    1,156
    Rep Power
    25
    Installed the newest beta, reinserted all of the data, increased shared memory... with a very slight imporvement in performance. Then I went to the manual and started reading up on how to make sense of EXPLAIN results. The first point in "Examing index usage" is "Always run ANALYZE first."

    The query is now flying. Somebody hit me.
  14. #8
  15. No Profile Picture
    Gödelian monster
    Devshed Regular (2000 - 2499 posts)

    Join Date
    Jul 1999
    Location
    Central Florida, USA
    Posts
    2,307
    Rep Power
    62
    Originally posted by kurious
    The query is now flying. Somebody hit me.
    lol... that's my fault. I forgot to mention ANALYZE and VACUUM ANALYZE.
    The real n-tier system:

    FreeBSD -> PostgreSQL -> [any_language] -> Apache -> Mozilla/XUL

    Amazon wishlist -- rycamor (at) gmail.com

IMN logo majestic logo threadwatch logo seochat tools logo