April 11th, 2006, 04:08 PM
MySQL to PostgreSQL, LARGE ROW COUNT
Hello, I work for an organization that uses MySQL heavily for articles and the like.
Right now we have over 700,000 rows in the article table. The previous architect/programmer, whom I am in the process of replacing, set up MySQL to eventually point to YAML files that hold the rest of the information about each article, such as the pages, the images, and the OCR text.
When I came on I could not fathom why he didn't store the page data in the database as well. He mentioned a massive performance issue once a table passes 20 million rows, as ours would if we logged the pages too, so now there is a lot of programming logic to alter/adjust/read these YAML files for the sake of speed.
Personally, I am not convinced. He made this decision 3 years ago, when computing power and storage were relatively expensive compared to today, and now we have the funding and the hardware to handle pretty much anything.
Would PostgreSQL be significantly faster than the current MySQL setup?
Could I implement a page table as a proof of concept to demonstrate?
Also, a lot of code that talks to the MySQL database is already written; would moving over to PostgreSQL be as easy as changing the DBI calls in the Perl config files we use?
I know this isn't very specific, and I will answer any questions you have to help. I have been doing MySQL programming alongside Perl and PHP for 4 years now, and the thing I always liked about it is that if I left the job, someone with knowledge of any programming language could always pick it back up from the database structure left behind: a well-normalized, well-organized structure that was clear and required no programmatic "tricks" to get things done.
That isn't the case here, and it makes me feel ill that this setup is in place and the company almost relies on it, and on me, as Perl programmers are more and more rare, especially any worth a damn.
Any help would be nice. I am really curious about PostgreSQL, coming from only MySQL; besides, I want another DB to put on the resume.
Thanks a lot; I know I'm long-winded on this. Thanks.
April 12th, 2006, 04:48 AM
Originally Posted by animusdk

The question has been asked many times on these forums, and we all went against storage in the database.
April 12th, 2006, 04:34 PM
Originally Posted by pabloj
The question has been asked many times on these forums, and we all went against storage in the database.

No we didn't. I have never been against this approach in principle; it is simply a matter of application design requirements and system capabilities. I don't understand why so many programmers are offended at someone even considering storing data like this in a database.
The real question to ask is "why are you considering storing this in the database?" Yes, there are tradeoffs: better logical control over your data, but definite performance implications. You have to decide whether that tradeoff is worth it to you.
Also, this is as much a question of database design as of database choice. A properly-designed database will not see as many performance problems from this as a poorly-designed one. To start with, if you store documents in the database, you probably don't want this column added to your article index table. You should have an associated table with a foreign key to your article index table, and then a column with an appropriate data type to store your article document. This way, the database only accesses that data when needed to retrieve a specific document or group of documents, rather than every single time the article index tables are searched.
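As a rough sketch of that layout (table and column names here are hypothetical, not from the poster's actual schema), the document body lives in its own table keyed back to the article index:

```sql
-- Hypothetical sketch: metadata stays in the index table;
-- the document itself sits in a side table joined by foreign key.
CREATE TABLE article (
    article_id  SERIAL PRIMARY KEY,
    title       TEXT NOT NULL,
    published   DATE
);

CREATE TABLE article_document (
    article_id  INTEGER NOT NULL REFERENCES article (article_id),
    page_count  INTEGER,
    document    BYTEA  -- raw document; fetched only when a specific article is requested
);
```

Queries against `article` never touch the bulky `document` column; you join to `article_document` only when you actually need the file.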
The other thing you want to make sure of is that you never do *searches* on this table. If you want full-text indexing, extract the text from the document and index it into another associated table with tsearch2, which is now in the 'contrib' section of the PostgreSQL distribution. Then there should be no reason to do large aggregate queries on your documents table. Keep it down to simple saves and retrieves, and the performance of the rest of your database should not be significantly impacted.
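A hedged sketch of that arrangement with the tsearch2 contrib module (syntax as of the 8.x-era module; the table, column, and configuration names here are assumptions): store the extracted text's tsvector in a side table and search only there, never the documents table.

```sql
-- Assumes contrib/tsearch2 has been installed into this database,
-- and an article table keyed by article_id exists (hypothetical names).
CREATE TABLE article_fulltext (
    article_id  INTEGER NOT NULL,
    body_idx    tsvector  -- built from text extracted out of the document
);

-- Example population from extracted OCR text:
INSERT INTO article_fulltext (article_id, body_idx)
VALUES (1, to_tsvector('default', 'text extracted from the document ...'));

-- GiST index so full-text searches stay fast:
CREATE INDEX article_fulltext_idx ON article_fulltext USING gist (body_idx);

-- Searches hit only this side table:
SELECT article_id
FROM article_fulltext
WHERE body_idx @@ to_tsquery('default', 'postgresql & migration');
```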
By default, PostgreSQL char, varchar, and text columns can be up to 1 GB in size (no lame 8K limit, as was common in most DBMSs up until recently). Also, binary data can be encoded and stored in BYTEA columns of up to 1 GB in size. Really, though, I imagine your documents will be much smaller than this. If you are regularly storing 1 GB files in the database, then you will probably want to reconsider.
If it turns out that this documents table is growing large and impacting main database performance, PostgreSQL 8+ has that wonderful feature known as tablespaces, which allow you to choose the physical data storage location of any table. Thus, you can dedicate an additional disk or even RAID array to your documents table, separate from all other database storage.
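For example (the tablespace name and path are hypothetical; this needs PostgreSQL 8.0+ and a directory owned by the postgres user):

```sql
-- Hypothetical sketch: dedicate a separate disk or RAID array
-- to the documents table via a tablespace.
CREATE TABLESPACE docspace LOCATION '/mnt/raid1/pgdocs';

CREATE TABLE article_document (
    article_id  INTEGER NOT NULL,
    document    BYTEA
) TABLESPACE docspace;

-- An existing table can also be moved after the fact:
-- ALTER TABLE article_document SET TABLESPACE docspace;
```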
As to how PostgreSQL would compare to MySQL in this situation: I imagine that with the right design either one could work OK, although PostgreSQL is much more proven on the heavy-database end of things. There are plenty of companies out there with multi-terabyte databases in PostgreSQL. See http://www.powerpostgresql.com/Downl...es_osc2005.pdf for a couple of examples.