Firstly, I apologize to everyone for the offensive manner in which my previous post was written. I was just venting a frustrated opinion, and I do acknowledge the great work done by the open source community.
My comments were aimed at the minority of MS haters in the open source community, not at the open source community in general.
I probably should have posted these comments earlier.
Thanks for the link, I will have a look at it.
I admit that I enjoy using Linux from time to time, and all that the Linux/*nix community has to offer, especially seeing as they are doing it all for free to benefit everyone and not just themselves. I also admit that even though Linux is an exceptional O/S (yes, I have used it for more than five minutes, both on my own box and at my previous employer's), it still isn't a cost-effective or efficient O/S for web designers like myself (who prefer more productive programs like Dreamweaver or HomeSite and Fireworks or Paint Shop Pro) to switch over to completely, or to keep a permanent partition/HDD allocated to.
I also find it frustrating when I'm asked to do a project that would suit using PG, or where the client wants me to use it, and I am unable to find a "port" of the software that isn't going to take up most of my day in setting it up or finding out how.
Unfortunately I didn't specify why the Linux/*nix community is so frustrating. It's frustrating because of the whole "MS is evil" thing. MS isn't going to change its tactics; it is still the most popular O/S on the internet, and as frustrating as that must be for the MS haters in the Linux/*nix community, it definitely doesn't look likely to change. SO BUILD A BRIDGE AND GET OVER IT.
My case in point, and like you said (re-worded): "It's not their desire to do what you want". MS may be crippled in some ways, but it is still more widely used than Linux for web dev in firms, and quite a few open source developers do actually code on it or for it.
Not everyone has the money to buy a separate machine just for Linux, or the luxury of a spare box lying around to stick Linux on (or, for that matter, to run just one db server on), especially if you're unemployed.
I was not commenting on Windows 2k and SQL Server, so I have no idea why you mentioned it. And I don't use them either.
Who on earth needs to call MS for support? Stuff that for a joke. I certainly haven't needed to, nor will I ever need to.
Thanks for the heads up, but it would be nice if they actually posted that news on the Postgres site in a nice and easy place to find, without the need to dig for it (like on the download page).
Last edited by deepspring; July 4th, 2002 at 07:25 PM.
- "Netscape 4 users are like lemmings... You can't help but laugh when one falls off a cliff"
Whew... just stepped in from seeing several completely awesome 4th of July fireworks displays from my back yard. Other than that, it's a working holiday for me.
Anyway, let's get right to it:
I understand how people get emotional over these issues, and I know some members of the Linux community haven't always been polite, and some DO have a vendetta against Microsoft
I personally am not a member of the "Linux community", preferring FreeBSD, which actually has a more amicable relationship with Microsoft. I have no sentiments, other than some well-deserved skepticism about the many overblown claims coming out of Redmond.
So fine, that still means they are going to do what they want, and there's nothing wrong with that. They don't care about the relative popularity of use for this or that platform. They care about doing what they consider their best work.
C'mon. You can get a Pentium 100 for free in most developed countries. My father-in-law has a business in Jamaica, and they are even throwing the things away. Just ask around. At most you will spend a few bucks.
I was bringing it up as a parallel. Postgres was written FOR Unix. How many Unix lovers do you see demanding that Microsoft release a port of SQL server or Access for Unix? People seem to expect the open source community to have infinite resources, but they very rarely demand similar things of the company with the biggest resources of all.
I have never needed to, because I know it will do no good. But, there are many issues when you start doing more complex things with MSoft products, and the answer is usually some crazy workaround, or obscure configuration issues.
How about subscribing to the mail lists?
The Postgres community prefers not to make a big deal about things that are not yet released (vaporware), unlike many software companies who presell like crazy. When it is nearing completion, you will hear about it.
Hey, honestly, I have nothing against anyone in any range of computer use. I just don't see why people seem to have these kinds of attitudes about open source products. And it is not a Windows/Linux thing. I see plenty of Linux zealots who complain and demand things too. Asking is fine (and encouraged); demanding is rude, unless you are a paying customer.
July 12th, 2002, 11:12 AM
I started coding with PHP and MySQL and it really is a killer combination - to begin with.
It's only when you begin to realise that you are actually spending so much time coding error checking and failsafes into your scripts that the advantages of transactions shine so brightly. Then when you have a look around and realise just how easy some stored procedures would make life, the move to something like postgres seems inevitable.
I agree that it is a lot more complex, but that is the price to pay for the power you gain from using it.
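To make the point concrete, the error checking and failsafes mentioned above largely collapse into a transaction: group the related statements and either everything commits or nothing does. A minimal sketch (the table and column names here are hypothetical, not from any real schema):

```sql
BEGIN;

-- Two updates that must succeed or fail together.
-- Without transactions, you'd have to hand-code cleanup
-- logic for the case where the second one fails.
UPDATE accounts SET balance = balance - 50 WHERE id = 1;
UPDATE accounts SET balance = balance + 50 WHERE id = 2;

-- If anything went wrong above, issue ROLLBACK instead
-- and both rows are left exactly as they were.
COMMIT;
```

In application code this means one rollback path instead of a web of compensating queries.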
Case in point: I was trying to find an equivalent column type to MySQL's ENUM or SET column and was surprised that Postgres didn't have one. Then I read up a bit and realised that you use a foreign key constraint to another table to achieve the same thing. It seemed backwards at first, but then I realised - have you ever tried changing the value of an option in a SET column in MySQL? Pain in the ***. Now try it with Postgres - the way it should be, and complete referential integrity too :-)
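The lookup-table approach described above might be sketched like this (all names are illustrative):

```sql
-- A reference table holding the set of allowed values.
CREATE TABLE colours (
    name VARCHAR(16) PRIMARY KEY
);
INSERT INTO colours VALUES ('red');
INSERT INTO colours VALUES ('green');
INSERT INTO colours VALUES ('blue');

-- Rows in widgets may only reference colours that exist.
CREATE TABLE widgets (
    id     SERIAL PRIMARY KEY,
    colour VARCHAR(16) NOT NULL REFERENCES colours(name)
);
```

Changing or adding an allowed value is just an ordinary INSERT or UPDATE on the lookup table, and the foreign key gives you referential integrity for free - which is exactly what a SET column can't do.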
Anyway, needless to say, I'm converted - my vote goes with PostgreSQL.
July 12th, 2002, 05:21 PM
Yes, that is usually the preferred way to handle multiple value constraints in a column, because it allows flexibility.
Also, though, it is very easy to duplicate the ENUM type, just by using the very useful CHECK constraint (another one of those things found in a "real" DBMS):
CREATE TABLE mytable(
myvalue VARCHAR(5) NOT NULL CHECK(myvalue IN ('yes','no','maybe'))
);
This will not allow any values except "yes", "no", or "maybe" in that field.
July 17th, 2002, 01:11 PM
July 31st, 2002, 01:19 AM
Well, I'm about to step into the fire here. One last post before I go to bed, I guess.
They are both great! Postgres is a monster that started off in one direction while MySQL started off in another. The initial aim of MySQL was speed, and they state as much. Of course, that meant that all the other stuff that makes a db a real db was not included out of the gate, and is only now being offered through the use of different table types: InnoDB and BerkeleyDB.
A sad aside. Most PHP/MySQL people are prolly not even aware that there is a difference between MySQL and MySQL-MAX.
However, there is one place where Postgres dropped the ball big time, and in my opinion it is THE MOST NEEDED of any feature, bloat, whatever you want to call it. That's replication! Without it, it's not a true db either! How can all those other things that make up a true db be touted so strongly and replication be left out?
It can be done with Postgres, but only through the use of third party tools like DB Balancer, which succeeds in adding another layer through which all traffic to the db must pass. No thanx!
Now, since we needed a highly available system beyond anything else, and we were not about to pay Oracle the stupid amount of money they ask, MySQL fit the bill nicely. Our software (not written by me) isn't using transactions, foreign key constraints, or stored procedures, so we don't notice the lack.
That said, we have two write masters and a bevy of replicated read machines. There was no easy way to do that with Postgres!
I love Postgres, but it's going to lag (at the enterprise level) until it gets a right proper replication system developed. EVERYTHING ELSE should be put on hold until then!
July 31st, 2002, 01:40 PM
Heehee... OK, I'll flame ;-).
Replication is meaningless if your replicated data is bogus. PostgreSQL has done things in exactly the right order: take care of data constraint robustness first. Replication is not essential for a true DBMS. Consistency of data IS!
One wonders (personally, the mind reels) at the thought that you seem to have an application important enough to replicate between "two write masters and a bevy of replicated read machines", yet there are no foreign key constraints or transactions. I hope there is no money involved in this system.
Foreign key constraints are only the beginning of data constraint issues. You should be able to put any arbitrary constraint onto any field in your database, for example to make sure that a certain field never has a value outside of a certain range, etc... In MySQL you have no such mechanism, so you must trust your application code. And check out MySQL's Date type constraints. You are allowed to have a 0 month, and a 0 day. Any month can have a value up to 31. That means in MySQL there are actually 32 days per month, and 13 months per year. This is bogus. What a headache I had trying to port one MySQL app to PostgreSQL.
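As a rough sketch of what such an arbitrary per-column constraint looks like in PostgreSQL (the table and ranges here are hypothetical, purely for illustration):

```sql
-- The database itself enforces the business rules,
-- so no application code can sneak bad data in.
CREATE TABLE orders (
    qty     INTEGER NOT NULL CHECK (qty BETWEEN 1 AND 1000),
    ordered DATE    NOT NULL CHECK (ordered >= '1990-01-01')
);

-- Rejected by the first CHECK constraint (qty out of range):
-- INSERT INTO orders VALUES (0, '2002-07-31');
```

Note also that PostgreSQL validates dates natively: a value with a 0 month or a day of 31 in a 30-day month is rejected outright, no CHECK constraint required.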
And MySQL's famed performance is a red herring also. Once you start trying to implement real data constraints, transactions and foreign keys in InnoDB tables, watch your memory requirements skyrocket. Plus, you will need so much extra application code to handle the missing constraints that your server will end up with even more of a performance hit.
Yes, I want replication for PostgreSQL, but I want it done right. Pushing aside the more serious data management issues to focus on replication might have been a great marketing move a couple of years ago, but the PostgreSQL developers are not businessmen, they are developers who really believe in what they are doing.
July 31st, 2002, 02:44 PM
LOL...! I knew you were going to come on strong. You guys always make a flamewar a blast!
Anyways, you haven't convinced me. I hear what you're saying about data integrity, but having just one box with no replication seems akin to suicide!
So... until Postgres gets it in gear, I'll stay on the Pg replication mailing list as well as the MySQL mailing list and see how things come along. If Pg gets even the most basic replication (like the MySQL 3.23.xx stuff) I'll prolly switch. But until then, Postgres comes second place. And second place is the first loser!
Now, to get out the marshmallows......
July 31st, 2002, 05:14 PM
Big whoop on MySQL's speed and replication.
I could get better performance with equivalent data integrity from ReiserFS + rsync... In fact, MySQL isn't a database at all, but an alpha of a new cross-platform filesystem! Gee, being spuriously inflammatory is fun!
But really... I moved from MySQL to Postgres because I was willing to sacrifice speed for a database that doesn't throw ACID out the window. Row level locking + transactions + foreign keys = one happy DBA.
Anyway, replication is urgent on the postgres todo list.
July 31st, 2002, 08:28 PM
I’m going to agree with Ryan here. Without the concept of transactions the current MySQL replication implementation is a disaster waiting to happen.
Time for my treatise on Why MySQL is Not Suitable for Enterprise Use (from a Slashdot post of mine).
Replication in MySQL is a joke for 'mission critical' use. As I understand it, the binary log records the SQL modification statements which are executed on a master, not the data which was changed. This involves significant assumptions, namely that the master and slave(s) must be 100% identical. If I perform an UPDATE on the master, it is not the changes that are replicated, but the query. This is what I would call the 'easy way out'. Who knows what happens to the query once it is replicated out - what if it hangs halfway through? I can't roll back and be in a consistent state; I have half-completed changes which make my database inconsistent, and now I'm forced to dump-and-load. The right approach is to keep track of which rows are modified, to what from what, and ensure that those transactions are replicated to my slaves. Anything less is simply useless for high availability.
I would also be willing to bet that a significant number of installations that have transitioned to MySQL replication are doing so due to table-lock induced latency. A suitable system with a capable RDBMS could probably handle all of the load given to it and not need ‘many slaves’ to handle the extra traffic. They would have a single failover for high availability and that’s it.
Filesystem buffered writes. Transactional support is great - it allows me to roll back aborted transactions. However, the inability to control whether or not my tables are write-buffered means that MySQL may *think* it has performed a write even though the data is still in the write-cache. I can then turn off the system and voila - corruption! Part of the fault lies in the OS, which tells MySQL the data was written even though it is in the cache, but I have a simple solution: devise a way to selectively turn off buffered writes for certain tables / databases. This way, if I know I have a critical table which has a lot of writes, I can turn buffering off and be ASSURED that writes will be performed when asked. I suspect a lot of 1040 and other table corruptions are caused by something like this. Yes, performance will take a hit, but I think it is a very acceptable trade-off against data corruption. Obviously all system tables should NOT be buffered.
Inability to use more than one index on a table in a query -- most enterprise RDBMS' can use more than one index on a table for a query. This can easily save a table scan or the use of a single, less-efficient index. Given an example query – ‘SELECT bob FROM sometable WHERE somecol = 45 and somecol2 < 44’ – if I have two indexes on sometable (somecol and somecol2) it can join the two indexes together and use the ‘virtual’ index to find rows.
Clustered indexes. These physically sort the table based on particular columns. This allows you to ORDER BY username ASC without doing anything special, since the rows are already sorted on username (if you have a users table and cluster on the username col). This also greatly speeds up BETWEEN clauses. And yes, to people who know a little bit of SQL but don't know as much about clustered indexes: you can create a sorted index on bobcol ASC, but clustering the actual data is faster and more efficient if you are grabbing data which is not in the index. For example, SELECT * FROM table ORDER BY username ASC will not be as efficient as the same query on a table clustered on username. If you had a sorted index on username, it will probably read the index sequentially and then visit the table. That extra operation = more disk seeks = more time / CPU to execute (and it really adds up as the table size increases). However, if you are doing something like 'select username, password from user order by username' it would be better to create a sorted index on (username ASC, password). That way it will read the index only and not visit the table at all.
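For what it's worth, PostgreSQL does offer a one-shot version of this via its CLUSTER command, which physically reorders a table to match an index. A sketch, using the syntax of the 7.x releases current at the time (the users table is hypothetical; note that, unlike a true clustered index, the ordering is not maintained as rows are later inserted or updated, so the command must be re-run periodically):

```sql
-- Build an ordinary b-tree index on the sort column.
CREATE INDEX users_username_idx ON users (username);

-- Physically rewrite the table in index order
-- (PostgreSQL 7.x syntax: CLUSTER indexname ON tablename).
CLUSTER users_username_idx ON users;
```

After clustering, sequential reads in username order touch far fewer pages, which is the same benefit the poster describes for ORDER BY and BETWEEN queries.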
On-line backups. In today's internet world your site has to be 24/7. This means you cannot suffer significant performance problems (or take your DB offline!) when you make a dump - Sybase, etc. have done this from as far back as I can remember. Postgres can do this with an add-on which is well worth the money. As far as I know MySQL can only do this with InnoDB tables (since it has an MVCC log to use in the meantime).
Backups to something other than CSV files. MS SQL, Sybase, Oracle - they all dump to a compressed binary file. Saves a TON of space and is MUCH FASTER to dump and load. I can dump a 12GB Sybase DB in under 20 minutes; loading it all (from scratch) and then bringing the DB online takes about the same amount of time. MySQL instead logs the CREATE TABLE / INSERT statements. What does this mean? That I have to wait for 4 million INSERTs to be performed when loading my table, and FURTHER I have to wait for the INDEXES to be re-created on the new data. Dump the indexes, too! (Remember that full-text indexing is just another index, so if you use that and have to load from a dump, be prepared for SIGNIFICANT downtime.)
Ability to specify the number of files to dump to. What happens if you have a dump which is larger than 2GB? Some linux distros cannot handle a single file of 2GB or more without recompiling the kernel. Give users a way to, within the dump statement, split the dump over two files. Not only will that help avoid the 2GB limit, but it can speed up dump/loads since I can dump to a bunch of different disks to improve throughput. Sybase has the 'STRIPE ON' clause (originally to dump to two tape drives at once but works fine on filesystem files as well) to split the dump equally over an unlimited number of files. This also impacts the fact that MySQL tables and indexes are stored in filesystem files that are also subject to a 2GB limit.
Cleaner way to view query plans of statements. EXPLAIN … is great and all, but the resulting table is a PAIN to read. What I want to know is simple:
Query is using XYZ, ABC tables. Table XYZ is using index 123 which is sorted so I do not need to create a temp table to sort ASC.
Since you have all the columns in your select statement in the index I do not have to visit the actual table – I can pull it all from the index. Because of this, I will read the index from start to finish.
ABC is using index 23dsf which is not sorted, so I must create a temp table to sort that. Also, since it is a join, I do not need to perform an index scan but a positioned search (a table scan is to a WHERE clause with no index as an index scan is to an index which is not selective enough or needs to read all columns).
Simple, easy and pretty much even a NOVICE can see that their query is a good performer or a bad performer.
Along with more in-depth EXPLAIN, also provide me with a way to see what the optimizer is doing with the query. In MS SQL and Sybase you have ‘trace flags’ which you can turn on before your query to see EXACTLY what Sybase is doing – why does it think this index is better than this other one, why is it table scanning when you think it should index sort, etc. Give me an easy way to say ‘verbose on; explain xxx;’.
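For comparison, PostgreSQL's EXPLAIN can already be pushed in this direction with the ANALYZE option (added in the then-recent 7.2 release), which actually executes the query and reports real row counts and timings alongside the estimated plan. A sketch, reusing the hypothetical sometable from the earlier example:

```sql
-- Estimated plan only: which indexes, join order, cost guesses.
EXPLAIN SELECT bob FROM sometable WHERE somecol = 45;

-- Plan plus actual execution: real times and row counts per node,
-- so you can see where the optimizer's estimates were wrong.
EXPLAIN ANALYZE SELECT bob FROM sometable WHERE somecol = 45;
```

Comparing estimated versus actual rows per plan node is the closest free equivalent to the Sybase-style trace output described above.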
Ability to delve deeply into performance of the system. If there is one job a DBA must know, it's how to tell what the heck is going on when something is slow. Currently MySQL gives you meaningless info like 'slow queries'. Great, I see 200,000 of them. What queries are they? What good is it, in a large application which may contain 3000 lines of SQL, to tell me only the raw number of queries which are slow? I want to know the EXACT SQL of the queries which are slow, and I want to find the one taking up the most CPU time and blocking all the rest. I want to know how MySQL is managing its data cache so I can see if I need more RAM (e.g. it is swapping lots of data to/from the cache) or if I am I/O bound. Don't tell me to look at 'free' or 'top' - half the time it is wrong because you (MySQL) feed it misleading figures. I want *you* to tell me exactly what you are doing, since you would know best! If you've ever seen sp_sysmon output from Sybase ASE you'd know what I'm talking about.
MySQL’s query optimizer is PISS POOR. If I see another changelog entry like this I’m going to scream:
Optimized queries of type: SELECT DISTINCT * from table_name ORDER by key_part1 LIMIT #
So does that mean these queries were NOT AT ALL optimized before? It doesn't read 'FURTHER optimized'.
“ORDER BY ... DESC can now use keys.”
Does that mean it was table scanning each time? Jebus! Hands down, the query optimizer is one of the most important things in the database - knowing how to use the database statistics and knowing when to use a merge-join vs. a hash-join etc. are CRITICAL to database performance.
Of course, the usuals: integrated row (or at the least page) locking, full support of subqueries, stored procedures, views, triggers, referential integrity, transactions, etc. etc. etc.
Postgres and virtually 100% of 'for pay' RDBMSs have all this. There simply is no reason to use MySQL for anything sufficiently non-trivial.
Last edited by MattR; July 31st, 2002 at 08:32 PM.
August 4th, 2002, 05:38 PM
Here's something for reference:
Oracle single processor license without clustering: $15,000 (9i Standard Edition)
Oracle single processor license WITH clustering: $40,000 (9i Enterprise Edition)
So if you're running two quad machines with clustering (replication), that costs about $320,000.
Most people that run Oracle run it without the cluster.
August 4th, 2002, 09:54 PM
And I'll be willing to bet that $320,000 would be more than enough of a donation to complete the Postgres replication project. Then you get free upgrades for life, as well as great press for being such a supporter of open source, etc...
(P.S. Visit http://pgsql.com/support for a Platinum support plan at $19,995.00/year, and you can bet that you will get top priority anytime you want, with unlimited incidents. Sounds like much better TCO than Oracle any day.)
I only wish I was in a position to recommend something like this to the right company. Hopefully soon...
Last edited by rycamor; August 4th, 2002 at 10:06 PM.
August 5th, 2002, 04:49 AM
Hey good point! $320,000 would finish that project I guarantee it. *ahem* Red Hat, *cough* *cough* IBM, cough up the money!!! You could get 3 of those developers to work exclusively on the project for a year and it would be ready.
I think the days of you being able to recommend that support contract are coming, rycamor! It's hard to say when, but the feeling I get from Tom Lane and Bruce Momjian in their letters is that they feel Postgres is more or less a complete and enterprise-ready product.
From here on out, at least from how I perceive them talking, they now seem to be pooling their resources towards adding in the "cool" features (incomplete recovery should be out in 7.3) that only Oracle has. First among those features is obviously the replication feature.
Even though it's been around for a long time, I consider Postgres a "young" product compared to something like Linux or FreeBSD. I say this because, even though it's a mature and solid piece of software, it's not well known in the software community. But it will succeed because Tom and Bruce and the user community seem to be not only passionate about the software, but also very very sophisticated and savvy.
A backing by someone like IBM would take it to new heights, though I can understand them not wanting to undercut their own sales of DB2 and also people still using Informix. And now Red Hat is in bed with Oracle, (though Red Hat is also in bed with IBM) so who does that leave as a potential suitor? And why is everybody sleeping around, you can get sick doing that!
Speaking of Informix, Postgres reminds me a lot of them in terms of style. I LOVED Informix; it was faster than Oracle and also a lot easier to use. From a technical standpoint, I thought they were the best out there, but in terms of marketing, Microsoft and Oracle killed them. (Oracle literally killed them by stealing a core group of their programmers in what some would say was an illegal move.) It's too bad it looks like they are probably being phased out. Their tech support was top of the line too. I once had a wizard there diagnose some code that was bombing out on a DEC Alpha because the processor couldn't handle a machine word that only existed on Intel processors. Talk about low level diagnosis. Sure as hell beats "uh, apply Service Pack 3."
Last edited by Ted Striker; August 5th, 2002 at 04:56 AM.
August 5th, 2002, 09:42 AM
The problem I see with most support contracts for, say, Postgres and MySQL is that their 24/7 support is often more expensive than that for enterprise RDBMS software.
At least for Sybase ASE, we have a two-CPU Intel (Linux) perpetual license (we can use ASE 12.5 forever) for around $6K/CPU. 24/7/365 phone and email support (along with free software upgrades!) was $2K. Each year, though, we only have to renew the $2K fee for 24/7 (and upgrades). 8-to-5 support/upgrades is much cheaper.
So I think open source is only really affordable if you have a monster system you're running it on (say, an 8-way Sun box, which would be something like (making numbers up) $200,000 US for ASE), since you could avoid the licensing fees and then pay only the $20K/year for 24/7 support. However, if you're on, say, the Intel platform and you require 24/7 support, I think closed source might almost always be more affordable.
August 5th, 2002, 11:42 AM
Go ahead and surf http://www.pgsql.com/support for a moment. I was simply quoting the highest-possible 24/7 support price for PostgreSQL. This includes 3 contact people, unlimited events, etc... You can quite easily get by with a much lower support contract, and still have 24/7 support (1/10th the price, even). Also, the support scales to any number of processors on any number of machines.
Oh, and Ted, when I said I wish I was in a position to recommend full PostgreSQL support to a company, it wasn't for any lack in PostgreSQL, but because I am not currently working on a project large enough to warrant this. I use PostgreSQL currently on all my projects, though, and consider http://archives.postgresql.org as the best support system I have seen for any piece of software.
Yes, RedHat seems like a natural candidate to provide the backing for the Postgres replication, since the official RedHat commercial database happens to be just a re-packaged PostgreSQL. I'm hoping they are at least throwing a bit of money that way.