December 21st, 2012, 05:50 AM
First of all: I DO NOT WANT A FLAME WAR! Flame warriors, please go someplace else.
I once worked on a project where we used mysql (4.1.x). There were:
- one node, 2 CPU x86
- just one database
- 15 tables
- one table with more than 3 million rows and 4 foreign keys
- referenced tables had up to 1K rows.
Now I want to reuse the above project/database, but this time with:
- just one database
- 15 tables
- one table, say table "A", with 65 million records (up to 100 million) and 4 foreign keys
- referenced tables with up to 1K rows.
Two-node failover cluster, each node as follows:
- DB and cluster SW ONLY
- 4 core, at least 2.5GHz, x86_64
- 8 GB RAM min
- 1TB RAID1 DB space (or NAS, iSCSI + Fiber)
- 2x 1GBit eth.
- OS RHEL 6.3, DRBD 8.4, pacemaker, corosync
The DB cluster will be used as an archive for the main one, carrying an average load, and will be fed with approx. 10K rows a month. Sort of a black box.
Feeding will be performed by batch scripts via an apache + mod_perl interface on a separate host.
Reading will also go via the apache + mod_perl interface ...
The initial data fill will be a decent batch-like process.
Users will perform reads only. No updates.
The main table will be as indicated below, but with roughly twice as many columns and without binary data fields: just varchars, int or bigint, and two SET columns. The digest field will hold a sha256/512 hash and MUST be unique. The stat field will be a foreign key ...
Thank you very much for your hints on whether MySQL would handle the new project or not AND WHY you think so. E.g., would it be wise to use table partitioning, small/large multiple tablespaces, logical data "division", etc.? Or should I forget MySQL altogether and go with a completely different DB vendor (PostgreSQL, Oracle, Sybase, DB2) ...
| Field | Type | Null | Key | Default | Extra |
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| digest | varchar(128) | NO | MUL | NULL | |
| datum_updt | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| desc | varchar(256) | YES | | NULL | |
| name | varchar(20) | YES | | NULL | |
| surn | varchar(50) | YES | | NULL | |
| stat | smallint(5) unsigned | NO | MUL | 100 | |
| valid | set('0','1') | NO | MUL | 1 | |
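For reference, the DESCRIBE output above corresponds roughly to a definition like this (a sketch only: the real table has more columns, I've added the UNIQUE index that the digest requirement implies, and `stat_table` is just a made-up name for the referenced lookup table):

```sql
-- Rough sketch of the main table; column list abbreviated.
-- InnoDB assumed, since foreign keys are needed.
CREATE TABLE main_table (
  id         INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  digest     VARCHAR(128) NOT NULL,
  datum_updt TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
               ON UPDATE CURRENT_TIMESTAMP,
  `desc`     VARCHAR(256),          -- "desc" is a reserved word, hence backticks
  name       VARCHAR(20),
  surn       VARCHAR(50),
  stat       SMALLINT(5) UNSIGNED NOT NULL DEFAULT 100,
  valid      SET('0','1') NOT NULL DEFAULT '1',
  PRIMARY KEY (id),
  UNIQUE KEY uq_digest (digest),    -- enforces the MUST-be-unique rule
  KEY idx_stat (stat),
  CONSTRAINT fk_stat FOREIGN KEY (stat) REFERENCES stat_table (id)
) ENGINE=InnoDB;
```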
Last edited by zby; December 21st, 2012 at 09:21 AM.
December 22nd, 2012, 06:19 PM
I have no doubt that MySQL can handle the size of the database, especially since you have basically no writes (inserting 10K rows a month is just nothing).
But I would go with replication instead of the DRBD solution. I've run both, and nowadays I consider the looser coupling between the servers a big advantage. For example, when performing backups you don't lock down the server the way you do with a single server or with DRBD. DRBD also gives you more of a single point of failure: if, say, the file system gets messed up on one server, that mess is replicated over to the other via DRBD.
But the big advantage I see is that with replication you can send user queries to both servers in a round-robin manner to get higher throughput.
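For reference, a minimal sketch of wiring up classic MySQL master/slave replication on the slave side (host, user, password and binlog coordinates below are placeholders, not real values):

```sql
-- Run on the slave; all values are placeholders.
CHANGE MASTER TO
  MASTER_HOST = 'node1.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '...',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 4;
START SLAVE;
-- The master additionally needs log-bin enabled and a unique server-id
-- in my.cnf; the slave needs its own distinct server-id.
```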
And as usual I would go with innodb_file_per_table for more flexibility.
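For anyone setting this up: it's a single setting in the [mysqld] section of my.cnf (innodb_file_per_table = 1), and it only applies to tables created after it is enabled; existing tables stay in the shared tablespace until rebuilt. A quick sketch of checking and applying it:

```sql
-- Check whether file-per-table is active:
SHOW VARIABLES LIKE 'innodb_file_per_table';
-- Rebuild an existing table into its own .ibd file
-- (table name is from the thread above):
ALTER TABLE main_table ENGINE=InnoDB;
```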
And since you don't mention anything about what kind of queries the users are most likely to run, we can't tell whether the added complexity of partitioning would actually improve anything. So I would say no partitioning, to keep it simple, until we know more.
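If it ever did turn out to help, RANGE partitioning on the auto-increment id would be the natural fit for an append-only archive. A sketch only, with made-up boundaries:

```sql
-- Sketch: RANGE partitioning on id for an append-only archive.
-- Caveat: MySQL requires every UNIQUE key on a partitioned table to
-- include the partitioning column, which conflicts with a separate
-- UNIQUE index on digest.
ALTER TABLE main_table
PARTITION BY RANGE (id) (
  PARTITION p0   VALUES LESS THAN (25000000),
  PARTITION p1   VALUES LESS THAN (50000000),
  PARTITION p2   VALUES LESS THAN (75000000),
  PARTITION pmax VALUES LESS THAN MAXVALUE
);
```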
But the question is how many user queries there will be and how much load they will place on the databases. Since you will have quite a large database, the way the user queries are written will have a huge impact on what kind of performance you get out of it and on whether that performance is acceptable. If you are writing complex queries, then one of the big vendors (Oracle, MSSQL, etc.) can sometimes be faster, since they can multi-thread the execution of a query while MySQL can't. But if a query is simple and uses indexes properly, then MySQL often beats the big ones in execution time.
So as usual it's hard to say anything for certain.
Last edited by sr; December 22nd, 2012 at 06:26 PM.
December 23rd, 2012, 10:38 AM
Hi there and thank you!
1. The replication - do you mean the mysql master/slave scenario?
2. We know so far that there will be mostly reads/updates like:
The "digest" column MUST be unique AND will store the sha256/512 hash value of the binary file content.
select stat,count(*) from main_table group by 1;
select * from main_table where stat = 100 limit 1000;
select * from main_table where stat = 100 and id > XY limit 1000;
select id from main_table where stat = 100 and id > XY;
select count(*) from main_table where digest='ABCDEF';
update main_table set stat=X where id=Y and (stat=Z and col5 is null);
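Presumably these queries would be backed by indexes along these lines (my sketch, not measured; index names are made up):

```sql
-- Point lookups on digest, plus enforcing its uniqueness:
CREATE UNIQUE INDEX uq_digest ON main_table (digest);
-- Covers "stat = 100 AND id > XY" range scans and the GROUP BY stat:
CREATE INDEX idx_stat_id ON main_table (stat, id);
```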
3. The frequency of "user" reads should not exceed one per 10-15 sec.
4. With the "single InnoDB file per table" solution you still end up with a few "tiny" tablespace (TBSP) files for the small tables plus one huge file for the main data. What is the advantage of that?
5. A partitioned table produces "partition" files. How do their backup and restoration compare to one huge TBSP?
6. The server will be an archive for the main DB engine. No deletes, just inserts and the "now and then" updates.
I think I might have made a mistake about the 10K rows a month. It might be as much as 100K, so about 1.2 million a year. Let's count with that scenario and I'll double-check...
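A quick back-of-the-envelope check on how long it would take to grow from 65 million rows to the 100 million ceiling at that rate:

```sql
-- 100K rows/month = 1.2 million rows/year;
-- years to grow from 65M to the 100M ceiling at that rate:
SELECT (100000000 - 65000000) / (100000 * 12) AS years_to_ceiling;
-- ≈ 29.2 years, so even the pessimistic scenario stays far below 100M.
```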
I have performed a light test on my laptop which is:
Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz
500GB 7200rpm HDD
Gnome3 session, mysqld (nothing else during queries)
Please, see the attachment