January 22nd, 2003, 02:25 PM
How do you stay up, when you're down? (Disaster Recovery)
Here's the situation:
My company is an ASP, so people demand strict uptime from us. We manage our own servers at a hosting facility about 30 miles away. Two weeks ago we had a firewall go out, and unfortunately it took us several hours to get it replaced so we were down to the world for that entire time.
Most people go directly to our site through the main link: www.mycompany.com. Since the server hosting www.mycompany.com was behind the firewall, it was basically dead in the water to the world. All of our other servers were behind the firewall as well.
My question is: let's say we had another hosting facility with servers located elsewhere. If our main system goes down and no traffic can reach "www.mycompany.com", how do you redirect that traffic to "www-backup.mycompany.com", where the backup at the very least has a page that says "we're upgrading / experiencing technical difficulties"?
Since it takes about 24 hours for DNS changes to propagate through the network, simply changing the DNS entry would not suffice.
How do others structure their network and hosting facilities so that if their main system goes down, the URL that the world uses automagically changes to their backup system?
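To make the question concrete, here is a rough sketch of the kind of automatic switch I have in mind. It only decides which site should receive traffic; how that decision gets pushed out to the world is exactly the open question. The port number, the timeout, and the function names are my own assumptions for illustration, not anything we run today.

```python
import socket

# Hostnames are the ones used in this thread; port 80 is an assumption.
PRIMARY = ("www.mycompany.com", 80)
BACKUP = ("www-backup.mycompany.com", 80)


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


def pick_target(primary=PRIMARY, backup=BACKUP, probe=is_reachable):
    """Return the primary if it answers the probe, otherwise the backup."""
    return primary if probe(*primary) else backup
```

The probe would have to run from somewhere outside the main facility, otherwise it dies along with the firewall it is supposed to be watching.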
Let's take this a step further. Let's assume my main datacenter blows up altogether. How can I tell the world "Sorry, www.mycompany.com is over here at this IP now"?
I'm trying to figure out how to avoid any 'single points of failure'.
(I searched the forums for information on this first, but the subject is vague enough that no particular keywords jumped out.)
January 22nd, 2003, 03:27 PM
imho best is: get a redundant network connection. if one connection fails, you'll stay alive with half the bandwidth. no fuddling with DNS at all.
for eliminating SPOFs, read the "linux high-availability howto", they have solutions for anything. (and it is really fun to read and see what linux / the hardware is capable of. ever thought of stuff like connecting two SCSI host controllers from two different machines to one shared bus to make one machine take over the other's function immediately on a failure?)
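the takeover idea boils down to a standby machine listening for a regular "i'm alive" signal and stepping in when it stops. a minimal sketch of that decision logic, not the actual linux-ha code; the class name and the threshold of 3 missed beats are made up for illustration:

```python
class HeartbeatMonitor:
    """Standby-side view of the primary: take over after N missed heartbeats."""

    def __init__(self, max_missed=3):
        self.max_missed = max_missed  # hypothetical threshold
        self.missed = 0               # consecutive intervals without a heartbeat
        self.active = False           # True once this standby has taken over

    def tick(self, heartbeat_received):
        """Record one heartbeat interval; return True if the standby is active."""
        if heartbeat_received:
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                self.active = True
        return self.active
```

note that in this sketch the standby stays active once it has taken over, even if heartbeats come back. handing control back automatically is a separate decision and easy to get wrong when both machines think they are in charge.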
January 22nd, 2003, 10:17 PM
Thanks for the Info. Here's a link I found of interest:
If / when I find more information I will post it back here.