PostgreSQL Help
 
Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
User Name:
Password:
Remember me

The Shed is going Social! Join us on FaceBook and Twitter and chime in on the conversation.

Go Back   Dev Shed ForumsDatabasesPostgreSQL Help

Reply
Add This Thread To:
  Del.icio.us   Digg   Google   Spurl   Blink   Furl   Simpy   Y! MyWeb 
Thread Tools Search this Thread Rate Thread Display Modes
 
Unread Dev Shed Forums Sponsor:
  #1  
Old December 12th, 2012, 08:40 AM
db042188 db042188 is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Dec 2012
Posts: 11 db042188 User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 4 h 48 m 30 sec
Reputation Power: 0
Making PostgreSQL Insert Performance Reasonable

Hi. We're looking for a silver bullet or two that would make us take PostgreSQL more seriously.

We're running two different etl products loading 30 million rows from sql server to PosgreSQL. We've tried different drivers, unlogged tables, different memory settings, removing indexes, using same server, using separate servers, high commit counts in some of the connectors etc etc but arent even coming close to the 1hr in which ssis can load 30 million rows from one sql server db to another. Our machine is a physical machine with tons of SAN and 36 gig of RAM.

The best time we see so far loading PostgreSQL is 12 hours. The table's row is a mix of integer, date , money and boolean columns. I believe my peer is leaving the PK defined when he runs these tests. It is type serial but at the moment we are mapping an input sql server pk to this column.

I can provide a lot more info but I suspect that with us being new there are one or two major settings we dont even know about. And we dont mind reading documentation but we'd like to justify that kind of thing by first seeing this product get into the ballpark.

Reply With Quote
  #2  
Old December 12th, 2012, 08:52 AM
shammat shammat is offline
Contributing User
Dev Shed Frequenter (2500 - 2999 posts)
 
Join Date: Oct 2003
Location: Germany
Posts: 2,685 shammat User rank is Captain (20000 - 30000 Reputation Level)shammat User rank is Captain (20000 - 30000 Reputation Level)shammat User rank is Captain (20000 - 30000 Reputation Level)shammat User rank is Captain (20000 - 30000 Reputation Level)shammat User rank is Captain (20000 - 30000 Reputation Level)shammat User rank is Captain (20000 - 30000 Reputation Level)shammat User rank is Captain (20000 - 30000 Reputation Level)shammat User rank is Captain (20000 - 30000 Reputation Level)shammat User rank is Captain (20000 - 30000 Reputation Level) 
Time spent in forums: 1 Month 3 Weeks 3 Days 19 h 56 m 12 sec
Reputation Power: 284
How do you load the data?

The fastest way to load data is using the COPY command (with the input file located on the server).

Did you try "fsync = off"? This should turned back on, once you have loaded the data, otherwise you leave the database open for data corruption if your server is not shut down cleanly.

Btw: "high commit counts" will usually slow down things in PostgreSQL. Try committing only once.
__________________
I will not read nor answer questions where the SQL code is messy and not formatted properly using [code] tags.
http://forums.devshed.com/misc.php?do=bbcode#code

Tips on how to ask better questions:
http://tkyte.blogspot.de/2005/06/how-to-ask-questions.html
http://wiki.postgresql.org/wiki/SlowQueryQuestions
http://catb.org/esr/faqs/smart-questions.html

Reply With Quote
  #3  
Old December 12th, 2012, 09:11 AM
db042188 db042188 is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Dec 2012
Posts: 11 db042188 User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 4 h 48 m 30 sec
Reputation Power: 0
Thx Shammat.

Each of the products we're looking at is an etl product. One uses Talend as an oem. The other is ssis from Microsoft.

The Talend based product came with its own PostgreSQL driver/connector but that is all we know about it and we dont know if it understands "bulk insert".

In ssis, we've tried devart's and cozyroc's PostgreSQL connectors/components as what is called a "destination" component. And an off the shelf MS connector in the "source" component for our 30 million row sql server table.

We draw an arrow between the two and the etl products know what to do from there. Some need a little help in data type conversions. For those cases we insert what are called "transform" components between the data source and data destination.

If we use the copy command, can the source be a sql table? What formats can the source be in when using the copy command? It looks like csv and txt are the choices. I'm assuming we can issue that command from the PG ADMIN III front end.

We'll look up and try the fsync thing. It looks like finding the instructions for issuing this command is going to be an adventure.

I misspoke on the commit count. Its really a higher number of rows being committed at a time than the default. Its a bit unclear to us at this time if we are getting (or can get) what is called Bulk Insert behavior with these 3rd party connectors. That's always been something we just checked in sql server destination components.

I should have mentioned we're running PG 9.2 on Windows.

Reply With Quote
  #4  
Old December 13th, 2012, 02:05 AM
shammat shammat is offline
Contributing User
Dev Shed Frequenter (2500 - 2999 posts)
 
Join Date: Oct 2003
Location: Germany
Posts: 2,685 shammat User rank is Captain (20000 - 30000 Reputation Level)shammat User rank is Captain (20000 - 30000 Reputation Level)shammat User rank is Captain (20000 - 30000 Reputation Level)shammat User rank is Captain (20000 - 30000 Reputation Level)shammat User rank is Captain (20000 - 30000 Reputation Level)shammat User rank is Captain (20000 - 30000 Reputation Level)shammat User rank is Captain (20000 - 30000 Reputation Level)shammat User rank is Captain (20000 - 30000 Reputation Level)shammat User rank is Captain (20000 - 30000 Reputation Level) 
Time spent in forums: 1 Month 3 Weeks 3 Days 19 h 56 m 12 sec
Reputation Power: 284
Quote:
Originally Posted by db042188
We draw an arrow between the two and the etl products know what to do from there. Some need a little help in data type conversions. For those cases we insert what are called "transform" components between the data source and data destination.
Well, apparently those tools aren't using the native "bulk load" operation which is the COPY command. I can load a ~1GB text file with ~35 million rows using the COPY command in about 5 Minutes on a virtualized server (so the IO performance is not really that good)

Quote:
If we use the copy command, can the source be a sql table? What formats can the source be in when using the copy command? It looks like csv and txt are the choices.
COPY (to load data) can only use text files.

Quote:
I'm assuming we can issue that command from the PG ADMIN III front end.
No idea. I don't use pgAdmin, but I would very much assume so. Especially as the file is read from the server by the PostgreSQL server process (unless COPY .. FROM stdin is used).

More details in the manual:

Quote:
We'll look up and try the fsync thing. It looks like finding the instructions for issuing this command is going to be an adventure.
Why? It's all documented in the manual: http://www.postgresql.org/docs/current/static/runtime-config-wal.html

There is a dedicated chapter on bulk loading in the manual: http://www.postgresql.org/docs/current/interactive/populate.html

You also want to go through this page: http://wiki.postgresql.org/wiki/Performance_Optimization

Reply With Quote
  #5  
Old December 13th, 2012, 08:00 AM
db042188 db042188 is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Dec 2012
Posts: 11 db042188 User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 4 h 48 m 30 sec
Reputation Power: 0
Thx Shammat. We're taking PostgreSQL much more seriously now. We created a text file from sql in 12 minutes and with a copy command in PG Admin III loaded it to PostgreSQL in 25 minutes. The table consumes 8 gig in data space. So in 37 minutes we accomplished what takes a similar sql to sql task 1 hour. Very impressive. And that is without the fsync item figured out.

Now we will look at query performance.

We're also curious how the copy command can be issued from a line (dos) command instead of a client tool. With that knowledge in our arsenal, we can easily have our etl jobs execute a "copy from" command rather than an inefficient 3rd party PostgreSQL adaptor component.

Some PostgreSQL documentation is written for PosgreSQL folks like you and we want to spend only minimum $s right now investigating the potential of this product. Terms like "Manpage" found during our quick reads on fsync mean nothing to us and turn us off temporarily from going down certain research paths.

So for a while the answer to your question about documentation is "because we are new, some of the documentation is over our heads and we will invest time in reading tons of documentation only after we see some potential in any given product".

As you can imagine, we are comparing a number of products right now. So we need to be careful upfront about where we invest time. But we intend to catch on quickly with a little patience from forums like this one.

Reply With Quote
  #6  
Old December 13th, 2012, 10:18 AM
shammat shammat is offline
Contributing User
Dev Shed Frequenter (2500 - 2999 posts)
 
Join Date: Oct 2003
Location: Germany
Posts: 2,685 shammat User rank is Captain (20000 - 30000 Reputation Level)shammat User rank is Captain (20000 - 30000 Reputation Level)shammat User rank is Captain (20000 - 30000 Reputation Level)shammat User rank is Captain (20000 - 30000 Reputation Level)shammat User rank is Captain (20000 - 30000 Reputation Level)shammat User rank is Captain (20000 - 30000 Reputation Level)shammat User rank is Captain (20000 - 30000 Reputation Level)shammat User rank is Captain (20000 - 30000 Reputation Level)shammat User rank is Captain (20000 - 30000 Reputation Level) 
Time spent in forums: 1 Month 3 Weeks 3 Days 19 h 56 m 12 sec
Reputation Power: 284
Quote:
Originally Posted by db042188
We're also curious how the copy command can be issued from a line (dos) command instead of a client tool.
COPY is a regular Postgres SQL statement, so you can run it with the psql commandline client e.g. putting the statement into a SQL script and then call psql passing the scriptname - the same way Oracle's SQL*Plus works.

Quote:
Terms like "Manpage" found during our quick reads on fsync mean nothing to us and turn us off temporarily from going down certain research paths.
You did read link to the manual (not "manpage") that I included in my previous answer?

Reply With Quote
  #7  
Old December 13th, 2012, 10:51 AM
db042188 db042188 is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Dec 2012
Posts: 11 db042188 User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 4 h 48 m 30 sec
Reputation Power: 0
Thx Shammat. We dont use Oracle here.

We're probably going to find that we need to run something like this..c:\>psql.exe param1,parm2...etc. Where one of these params is a "sql statement". We'll look up the details.

Before your post we probably hit most of the docs you listed. Somewhere in those docs use of the word "manpage" (or something like that) turned us off temporarily from pursuing the exact recipe for turning off fsync. We plan to come back to your docs if PG's query engine shows promise.

I should add that we have already aquired a book on the subject. But we may need this or another forum to help us get started. We usually ask only a few strategic questions in forums before we become mostly self reliant. Your patience is appreciated.

Reply With Quote
Reply

Viewing: Dev Shed ForumsDatabasesPostgreSQL Help > Making PostgreSQL Insert Performance Reasonable

Developer Shed Advertisers and Affiliates



Thread Tools  Search this Thread 
Search this Thread:

Advanced Search
Display Modes  Rate This Thread 
Rate This Thread:


Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
View Your Warnings | New Posts | Latest News | Latest Threads | Shoutbox
Forum Jump

Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
  
 


Powered by: vBulletin Version 3.0.5
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.

© 2003-2013 by Developer Shed. All rights reserved. DS Cluster - Follow our Sitemap