|
|
|||||||||
|
|||||||||
| |||||||||
|
|
|
| |||||||||
![]() |
|
|
«
Previous Thread
|
Next Thread
»
|
Thread Tools | Search this Thread | Rate Thread | Display Modes |
|
#1
|
|||
|
|||
|
What to expect when Pumping Data?
Hi, I'm making my first attempt to develop an application that manages >1Gb of data.
I've allways worked with little databases in the MS-JET format (less than 400MB, around the 2 millions of records for one table), and now I've recieved a promotion to administer a new database (and produce apps to exploit it) that would have tables in the 15-20 millions of records. My question is, I'm used to "update" the data in lapses of few minutes, using direct VBA programming, even inside Excel, and those times are not scaling accordingly with Firebird. I know this is a completely different plane of existance, so I'm asking the experts here. What I'm trying todo is the following: I'm gonna manage the customers database for a bank, with tables describing the branches and their organization (900 branches, grouped in 48 zones, grouped in 9 regions) and a big table consisting in 5 fields: Key, Date, Product, Average and Remanent, to store a shot of the deposits of all the customers the bank has. So this table is gonna contain 24 shots (2 years of history) with each shot approx. of 5 million records. To fill the table I'll get a CSV file of about 400Mb each week. So far I've tried with MS-Access 2003, linking the CSV file and the FB Table via ODBC, and surely it works, but takes 3 hours to feed the data, in a local server. I'm using a P4-2.8Ghz 600~MB RAM SATA Disk, XPpro HT enabled. I know this is not the bes routet to take, but I need an orientation of how long should it take in a well developed process, so to know what to expect, and when to stop investing time trying to improve the process. Thanks in advance. |
|
#2
|
||||
|
||||
|
See Migration Tools
__________________
My blog about OpenSource Databases PDF tutorials about OSS databases, DBMonster ... Please contribute to Open Source Development, fill bug reports!!! Developer Shed eSupport Commented my.ini/my.cnf (PLEASE ADD YOUR OWN CONFIG TRICK) An introduction to database normalization Natural or Surrogate key Custom ordering for your results Correlated and uncorrelated subqueries Don't turn your outer joins into inner joins |
|
#3
|
|||
|
|||
|
I have modified Microsoft Access and SQL Server to InterBase Wizard (http://www.ibphoenix.com/main.nfs?a...51&page=sql2gdb) so it's now able to process UNICODE data. Mail me if you need that. And, using this tool, I was able to pump 8 Gb of data from MSSQL2000 to FB1.5 in 4-5 hours.
Also, see InterBase DataPump, and I would recommend this tool over sql2gdb (it better handles BLOB's). -- Best regards, Fikret Hasovic http://fikret.fbtalk.net TAMP R&D Team FirebirdSQL Foundation member. - Join today at http://www.firebirdsql.org/ff/foundation JEDI VCS contributor http://jedivcs.sourceforge.net/ |
|
#4
|
|||
|
|||
|
With well-developed pump process Firebird can inject a few hundred MB of data per CPU per minute.
For example, you can convert your text data into fixed-width representation which may be directly attached to Firebird database using CREATE TABLE ... EXTERNAL ... mechanizm and insert data into normal table using INSERT statement or stored procedure. If you have really fast IO subsystem and SMP box you can further improve performance via running serveral such INSERT statements in parallel on Firebird/Classic configuration. Try disabling forced writes for the pump process. Also, almost any UNIX tends to be much faster at bulk operations due to more intellegent filesystem cache and IO queue management. |
|
#5
|
|||
|
|||
|
Thanks to all, I'll start trying the DataPump first, then the externally attached file. The only problem with it is that the file I get is a varible lenght one, sepparated with "|", soI'll have to massage it first to convert it to fixe lenght. But I'll try all the different approaches to see what fills the bill better.
|
|
#6
|
|||
|
|||
|
I'm feeling a little too dumb...
I already made a little app in Delphi to transform my delimited txt to a fixed lenght one. But now when I try to CREATE TABLE... EXTERNAL FILE, I get an SQL error saying that "Access to external file ... is denied by server administrator", the TXT is not opened by any other program, what I'm doing wrong? Thanks... ![]() |
|
#7
|
|||
|
|||
|
Go to firebird.conf and adjust ExternalFileAccess option appropriately.
|
|
#8
|
|||
|
|||
|
Quote:
Great I have now completed a process to Import the text file into FireBird, using the external table. The full process takes about 35 minutes (down from 3 hours). Now I'm having another problem: The file I'm translating has one field with 2 decimals, containig the average a client has in one time period. I made Delphi SUM the field while parsing the original file, and I get 79,690,952,644.34 in my process, now if I do a SELECT SUM... in the final table i get 79,690,969,711.864. I defined the field as DOUBLE PRECISION, am I mistaken? should I declare them DECIMAL(15,2) or something? |
|
#9
|
|||
|
|||
|
Precision
Hey HereSomeHow,
The general rule with currency is that you calculate with exact numbers whenever you can. That means using "exact" data types, depending on your platform (sorry I haven't used Delphi or Firebird) they are usually called Decimal or Numeric. These exact types can still yield approximate results if some operation results in a repeating decimal component which goes beyond the precision of your decimal variables. These imprecisions can build up if there is a sequence of operations that result in rounding. Using Double or Float is also good if the absolute size of the number is not too huge. If the absolute size is too massive, you will lose precision at the small scale of the number. You should go over the operations that are conducted on your numbers and see where imprecisions might be introduced (diving by small numbers etc. ) to try to account for the discrepancy. |
|
#10
|
|||
|
|||
|
Hey again, sorry I've had too many beers, I missed the obvious answer.
Yes if you can make the resultant field a (15,2) decimal that should be completely lossless with SUM operation over a set of (x<<15,2) decimals. Floats should also be OK in that circumstance, but with Decimal you know it's exact. |
![]() |
| Viewing: Dev Shed Forums > Databases > Firebird SQL Development > What to expect when Pumping Data? |
| Thread Tools | Search this Thread |
| Display Modes | Rate This Thread |
|
|
|
|