August 17th, 2003, 10:29 PM
Huge MySQL -> PostgreSQL w/ data Conversion
I'm new to postgreSQL though I've been using MySQL for over 3 years. I've got a huge project which has lots of data that I want to migrate to postgreSQL. Problem is, I cannot find any tool that does the job, and not only the data s big, the scheme is also BIG
So, could someone provide me a tool to convert a dump from MySQL to PostgreSQL?
Thanks in Advance,
Hugo Jose Ferreira
August 18th, 2003, 07:59 AM
There are some on the net. Try google
August 18th, 2003, 08:20 AM
I've tried... But the result of all of them gives me invalid SQL when i try to import it...
August 18th, 2003, 11:35 AM
Unfortunately, migration from MySQL to PostgreSQL is not a painless operation at present. It is still worthwhile, though, for the benefits you get with PostgreSQL.
I am assuming you have tried "my2pg.pl" and "mysql2pgsql", which are located in the 'contrib' section of the downloaded PostgreSQL source. These are just simple Perl scripts which attempt to take a dump from MySQL, and use some text processing to make the table definitions and data fit PostgreSQL. Unfortunately, I believe these scripts are nowhere near being up-to-date (my2pg.pl is more recent than mysql2pgsql, though).
I personally have tried these scripts, and found that they fixed some problems, but not all, and for mission-critical migration, I wouldn't trust my data to a script anyway.
Really, at present the only way to do a true conversion is by hand, manually checking the column types and understanding the differences between how the two systems define things.
Some major differences:
1. MySQL's datetime columns allow storage of incorrect dates; for example, if one inserts an empty string or a 0 into a datetime column, MySQL accepts it, and enters '0000-00-00 00:00:00'. This is obviously incorrect, because there is no zero month of the year, nor is there a zero year of the month. Also, MySQL allows for all months to have 31 days, even February. So, many MySQL databases are full of these sorts of values, which will not be accepted by a PostgreSQL timestamp column.
Here is the other area of confusion. In MySQL 'timestamp' is really just an integer column holding a Unix timestamp, in seconds. PostgreSQL does not store a Unix timestamp, but uses the term 'timestamp' to refer to the standard 'YYYY-MM-DD HH:MM:SS' format for SQL dates.
So, if you are storing Unix timestamps in MySQL, you will have to store those in an INT column in PostgreSQL, while datetime values in MySQL will have to be "cleaned up" and stored in a PostgreSQL timestamp column.
2. PostgreSQL doesn't have TINYINT or MEDIUMINT. The three main integer types in PostgreSQL are SMALLINT or INT2 (-+32000), INT or INT4 (same as MySQL's INT, exept cannot be 'unsigned'), and BIGINT or INT8 which allows for extremely large integers, even on a 32-bit system. Since INT cannot be unsigned in PostgreSQL, you will have to use a CHECK constraint or some other method if you don't want a column to go negative. (or, since PostgreSQL supports user-defined types, you could just create an 'unsignedint' type of your own)
3. MySQL's CHAR type is not a true CHAR type, in that it doesn't pad spaces to the charlength. I can't recall if this causes a problem when importing to PostgreSQL, though.
4. Auto_increment works differently. PostgreSQL uses sequences, which allow more control and flexibility. Fortunately they provide an easy shorthand for creating an auto-incrementing primary key:
CREATE TABLE mytable(
5. PostgreSQL doesn't have ENUM or SET column types. There are plenty of ways to get the same functionality (and much more), but the simplest answer at the moment is to just do something like:
MySQL version -
PostgreSQL version -
pgcolumn VARCHAR(5) NOT NULL CHECK(mycolumn IN('yes','no','maybe'))
This uses a CHECK constraint to prevent the VARCHAR column from having anything except those 3 values.
MySQL version -
PostgreSQL version -
umm... the SET type in MySQL is an odd duck... not really a SET, not quite an array, but some sort of constrained list. Fortunately, PostgreSQL has a full Array type, which is much more powerful than SET:
pgcolumn VARCHAR CHECK( (pgcolumn = 'one') AND (pgcolumn = 'two') AND
(pgcolumn = 'three'))
But PostgreSQL doesn't really limit the size of an array, so it's not quite possible to get the same features here. Really, though, the problem of a SET type is much deeper than this, and involves clear thinking about database design. IMHO the best advice is to use true normalization and create a separate table to hold all these values. Put simply, SET is a hack.
6. There are quite a few differences in querying and data manipulation. Again, MySQL uses many non-standard approaches to syntax, and if you have used many of them, then you will have to convert queries in your application before migrating:
There are plenty of other small differences mainly in how other text-manipulation functions work, so you will have to review any query that uses these kinds of methods.
MySQL | PostgreSQL
REPLACE INTO tablename | DELETE FROM tablename;
| INSERT INTO tablename;
IF ... THEN... ELSE | CASE WHEN ... THEN ...
| ELSE ... END
CONCAT([value1], [value2], ...) | [value1] || [value2] || etc...
Anyway, to perform your PostgreSQL migration, you will have to do something like the following:
1. Spend some time in MySQL, replacing 0000-00-00 dates, etc... with either NULL or some sort of default value, check the upper bounds of your INT column values if you have defined INT UNSIGNED, to make sure they are not too high to fit into a signed INT in PostgreSQL. (you can always use BIGINT if you need to)
2. Check the column definitions for each table, looking for such things as TIMESTAMP, ENUM, SET, etc... and coming up with a replacement in PostgrSQL.
3. Check the queries in your application for any syntax that is not valid in PostgreSQL.
4. Test, test, test... don't expect to migrate live data without testing. In my experience, it usually took about 2 or 3 test runs, with at least substantial subset of the data, running on a test server, in order to make sure the conversion was correct.
I know this isn't what you wanted to hear, but its the only way to proceed if your data is important to you. In fact, given MySQL's incompatibility with many SQL standards, you would have the same problem moving to any DBMS. Fortunately, once you move to PostgreSQL, you should have far fewer problems moving to other systems in the future .
(In case you're wondering, yes, I am writing some articles on MySQL->PostgreSQL conversion. I will let you know when these become available)