#1
  1. Mad Scientist
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Oct 2007
    Location
    North Yorkshire, UK
    Posts
    3,661
    Rep Power
    4123

    MySQL Replication and Load Balancing


    I'm in the process of building a web app.

    The service providers I intend to use are now offering really easy to set up load balancers.

    Perhaps not every client of mine will want load balancing, but for those who do I'd like to offer this.

    I was thinking of putting the MySQL part on it's own server, then load balancing connections between just the application servers. Each app server would be identical and read from / write to the mysql server.

    But having just one db server is another single point of failure, so I was thinking of then load balancing between 2 or more database servers with the intention that each database server be identical.

    reading about mysql replication it takes a master-slave approach so one has to write to just one database, then read from any

    once again, a single point of failure.

    then I read about MySQL cluster, but found this in the manual pages

    Originally Posted by http://dev.mysql.com/doc/refman/5.0/en/mysql-cluster.html
    you are very unlikely to be able to switch an existing application datastore—that currently uses, for example, MyISAM or InnoDB—to use the NDB storage engine without allowing for the possibility of changes in schemas, queries, and applications.
    humm,

    I know nothing about replication (or mysql cluster) and would love so find out more: docs, books, links, best practices, nomenclature, etc

    can anyone point me in the right direction?

    Many thanks
    I said I didn't like ORM!!! <?php $this->model->update($this->request->resources[0])->set($this->request->getData())->getData('count'); ?>

    PDO vs mysql_* functions: Find a Migration Guide Here

    [ Xeneco - T'interweb Development ] - [ Are you a Help Vampire? ] - [ Read The manual! ] - [ W3 methods - GET, POST, etc ] - [ Web Design Hell ]
  2. #2
  3. No Profile Picture
    Problem Solver
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Jan 2001
    Location
    Stockholm, Sweden
    Posts
    4,495
    Rep Power
    534
    Do you think any of your clients will _need_ load balancing? (this is a serious question)

    And is the reason:
    1. need it due to high load
    or
    2. want the redundancy to reduce downtime

    If the reason is the first how much would it cost you to buy faster hardware compared with the amount of work you will have to put down to create and set up a scale out environment and keep it running?
    Because you can come a huge amount of way without scaling out. Buying one of the fastest x86 servers doesn't cost much compared to the work hours needed for such a solution.

    My reason for writing this is that after 15 years in the problem solving part of this business I have grown very cautious against basically all cluster/HA solutions and as a technician I have a really hard time to justify the cost.

    Since I have had so many single server setups running without problems for so long compared to with how many times I have had to spend a lot of time to sort out odd race condition/split brain/lost sync problems with cluster solutions.

    The reason is that the more elaborate setup you have with more and more hosts to try to avoid a SPOF, you get a more and more complex environment with more things that can go wrong. And since in my experience most problems does not arise from hardware failure but from human errors a more complex environment causes a greater risk for errors.
    Since it's seldom the same person that built the system that's also operates it on a daily basis the risk for human errors increases with a more complex environment where it is harder to understand the consequences of changes performed.

    So anytime when someone talks about cluster or scaling out I say start by thinking if you really need it and are willing to pay the price.

    That said (just to ensure you that I'm not a whining old fart ) there are cases when you need scaling out and if you for example take load balancing Application servers for the web then this is seldom a problem since every request is basically atomic so it makes it very easy to scale out and create a stable solution.

    But since databases needs to store a state (the entire purpose of their existence ) it makes them very poor candidates for scaling out. Since all writes to a database requires a lock in some form or another and at some time or another, scaling out only means that there are more entities that need to be in sync (just consider how much time it takes to agree with a girlfriend about the weekend schedule) while if she unanimously decides it and just tells you the schedule afterwards (master->slave ).

    My point is don't underestimate the simplicity of master->slave replication since it can save you a lot of work.

    So my suggestion is to keep it simple:
    1. Write to just one server
    2. If you need the redundancy choose a master->slave replication.
    3. If you have a slave run backups on the slave (which ensures no locks or high load on the master during the backup execution).
    4. If you want your application to support load sharing of selects between servers in a master->slave setup you add a database query function in your application something like mysqlQueryLazySelect().
    That way you can control which queries you want to distribute between the servers and which queries that are still run against the master. Since you can sometimes need to be sure that you query the master to avoid race conditions. And all writes are of course always sent to the master.


    Now if you still want a clustering solution after all that.
    One I think is very interesting for the MySQL future is the Percona XtraDB Cluster, although the software is very fresh and the documentation is still lacking if you run into trouble I think it shows a lot of promise in it's simplicity.

    Sorry for a far longer post than I intended and the best of luck!
    /Stefan
  4. #3
  5. Mad Scientist
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Oct 2007
    Location
    North Yorkshire, UK
    Posts
    3,661
    Rep Power
    4123
    Your post may be longer than you thought i would expect, but it is helpful because I don't know much about mysql replication and clustering

    We'll be using rackspace cloud, which means the biggest server I can get has 30Gb available RAM and a 1.2TB hard drive. This represents a dedicated box, all other variations of RS cloud servers are slices of one of these big boxes (from 256Mb doubling up each time)

    Such a server would cost us ~£950/month just to run - ie not taking bandwidth into consideration (~£120/TB/month) - or ~£15/month for the little 256Mb slice

    So, its really redundancy that I'd be looking for.

    Since writing the OP I considered having two load balancers (or one if I can balance based on ports) so that web requests would balance between app servers; each app server would write to just one (master) mysql server which would then feed down to a number of load balanced slaves - BUT I DON'T KNOW HOW LONG IT WOULD TAKE TO REPLICATE

    This is a web app in which I anticipate a far greater number of reads than writes.
    I said I didn't like ORM!!! <?php $this->model->update($this->request->resources[0])->set($this->request->getData())->getData('count'); ?>

    PDO vs mysql_* functions: Find a Migration Guide Here

    [ Xeneco - T'interweb Development ] - [ Are you a Help Vampire? ] - [ Read The manual! ] - [ W3 methods - GET, POST, etc ] - [ Web Design Hell ]
  6. #4
  7. No Profile Picture
    Problem Solver
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Jan 2001
    Location
    Stockholm, Sweden
    Posts
    4,495
    Rep Power
    534
    Originally Posted by Northie
    BUT I DON'T KNOW HOW LONG IT WOULD TAKE TO REPLICATE
    Unless the master or slave is under heavy load the replication itself usually takes a couple of milliseconds.

    Basically the latency in the network and the execution time of the query.

    For a simple test on the same box look here.

    The replication time is usually never a problem unless you are running something that need concurrency which should be in a transaction and only executed against the master anyway.

    And some more useful information here and here .


    And out of an monitoring perspective I recommend the pt-heartbeat.
    So that you are sure that the slaves are all in sync since the default status information from within MySQL is not reliable.

    Good luck!
    /Stefan

IMN logo majestic logo threadwatch logo seochat tools logo