#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2006
    Location
    Jacksonville, FL
    Posts
    3
    Rep Power
    0

    HA using round robin -- working!


    I've been experimenting with multiple A records for both load distribution AND high availability.

    Up until now I was always told that round-robin is for load distribution ONLY and should not be used for high-availability failover. In practice, this is not proving to be true. I'm beginning to think that was just FUD.

    Do a lookup on roundrobintest8.strangled.net and roundrobintest9.strangled.net. Notice the A records:
    roundrobintest8.strangled.net. 3600 IN A 127.0.0.1
    roundrobintest8.strangled.net. 3600 IN A 63.95.68.129 # Real server
    roundrobintest9.strangled.net. 3600 IN A 10.69.96.69 # Bogus IP
    roundrobintest9.strangled.net. 3600 IN A 63.95.68.129 # Real server
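
    If you want to reproduce the lookup, something like this with dig works (nslookup does too):

        dig +noall +answer roundrobintest8.strangled.net A
        dig +noall +answer roundrobintest9.strangled.net A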

    Now, disable anything running on localhost:443 and make sure you do *not* have a host at 10.69.96.69.

    Browse https://roundrobintest8.strangled.net/ and https://roundrobintest9.strangled.net/. You should never get a DNS error. You should always get an SSL warning (hostname mismatch) first, then a login prompt. It will pause while it tries the bad IP, but after about 5 seconds it fails over to the real server.

    Now load up an SSL web server on localhost. I used Apache+mod_ssl on Linux and TinySSL on Windows. Set up an index page with links to several other pages.
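
    For reference, a minimal Apache+mod_ssl virtual host along those lines -- the DocumentRoot and certificate paths below are placeholders, not my real ones:

        <VirtualHost _default_:443>
            DocumentRoot /var/www/html
            SSLEngine on
            SSLCertificateFile /etc/apache2/ssl/server.crt
            SSLCertificateKeyFile /etc/apache2/ssl/server.key
        </VirtualHost>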

    (Sorry to require SSL; it's the only web server I control that no one is using at the moment, so I can kill the web service any time I want... You could also load up an FTP or SSH server on localhost instead. My server runs all three.)

    Flush your cache (e.g. ipconfig /flushdns) and reload the website. Sometimes you will get localhost, sometimes my server. That's the load distribution we all know and love.

    If you don't get localhost, keep flushing your cache until you do. Then kill your server and click a link on the web page that is still up on your screen. It will fail over to my server and generate a 404. That's high availability! Even though it generates an error, it's coming from my server nonetheless!
    *No* client I've tried (browser, FTP client, MySQL, SSH, etc.) fails on the bad IP (10.69.96.69). It thinks for a few seconds and then tries the good IP.

    Nor do they fail when the IP is good but no service is listening on the port, as in the case of localhost.

    I've tried this on:
    Windows 95
    Windows 98
    Windows 2000
    Windows XP
    Ubuntu 6.06
    Debian 3.1
    CentOS 3
    CentOS 4

    With these clients:
    Netscape 4.5 (Nice and old!!!)
    IE 5.5
    IE 6
    Firefox 1.0
    Firefox 1.5
    DOS FTP
    Linux FTP
    Linux NcFTP
    MySQL client
    OpenSSH client

    My idea is to set up a live server running web/mail/DNS/DB/FTP and a warm standby, such as:
    www.example.com. 3600 IN A 1.1.1.1
    www.example.com. 3600 IN A 2.2.2.2

    The warm standby is powered on, but no services are started. The live server is synchronized to the warm standby. If the live server fails, I bring up the standby. Bing bang boom, clients automatically go to the standby.

    It'll be just web/POP/SSH/FTP, because DNS and SMTP already have built-in load distribution and high availability. No database ports will be exposed to the outside world, but if they were, they should work too.

    If this works, so cool! Replacement for expen$ive and complicated HA solutions :-)

    Was clued into this by Mr. Tenereillo:
    http://www.tenereillo.com/GSLBPageOfShame.htm

    What am I missing? Do I need to do more testing?

    Am I crazy? Or crazy like a fox? ;-)

    Someone check me on this because I'm not sure I'm testing it right...

    CD
    Last edited by SilentRage; July 16th, 2006 at 11:29 PM. Reason: Removed advertisement. No warning given.
#2
    DNS/BIND Guru
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Jun 2003
    Location
    OH, USA
    Posts
    4,266
    Rep Power
    177
    Yeah, well-written applications will try each of your IPs until they find one that works. Badly written applications will just use the first IP they find and choke and die if it fails. There is nothing magical about getting failover from multiple IPs; the applications themselves have to specifically support this type of failover. I wouldn't rely on it, but as you say, some major applications do support it.
#3
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2006
    Location
    Jacksonville, FL
    Posts
    3
    Rep Power
    0
    Originally Posted by SilentRage
    Yeah, well-written applications will try each of your IPs until they find one that works. Badly written applications will just use the first IP they find and choke and die if it fails. There is nothing magical about getting failover from multiple IPs; the applications themselves have to specifically support this type of failover. I wouldn't rely on it, but as you say, some major applications do support it.
    Well, I've tried even old/simple programs such as DOS FTP and Netscape 4.5... Are you certain it's done in the application and not the resolver in the stack?

    Peter Tenereillo said: "The use of multiple A records is not a trick of the trade, or a feature conceived by load balancing equipment vendors. The DNS protocol was designed with support for multiple A records for this very reason. Applications such as browsers and proxies and mail servers make use of that part of the DNS protocol." Are you 100% sure it's not part of the resolver/stack?


    Also, can you name any poorly-written applications that are expected to fail?
#4
    DNS/BIND Guru
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Jun 2003
    Location
    OH, USA
    Posts
    4,266
    Rep Power
    177
    Any socket-level programmer can tell you with 100% certainty that it is an application-level thing. The standard C library has a function called gethostbyname. You pass it a string (i.e. a domain) to be resolved to one or more addresses, and it returns a pointer to a data structure called "hostent". This structure includes a list of addresses. For backwards compatibility, the field "h_addr" is defined to point to the first address in the list. Old applications and "badly" written applications use h_addr and completely ignore the fact that there are multiple addresses.

    But getting the addresses is just the first part. Let's see how connections are made programmatically.

    The connect function does exactly what its name implies. One of the arguments passed to connect is a pointer to a "sockaddr" structure, and as the socket manpage shows, only a single address fits in a sockaddr. So an application has to call gethostbyname once, then loop over each returned IP address until a connect() succeeds or all of the addresses have been tried. The Windows API shares these aspects of sockets.
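
    Here is a minimal sketch of that loop in C, assuming a POSIX system (the function name connect_first_working is mine, and error handling is abbreviated):

        /* Resolve a name, then try each returned address until one
         * connect() succeeds -- the application-level failover loop
         * described above. */
        #include <string.h>
        #include <unistd.h>
        #include <netdb.h>
        #include <sys/socket.h>
        #include <netinet/in.h>
        #include <arpa/inet.h>

        int connect_first_working(const char *host, unsigned short port)
        {
            struct hostent *he = gethostbyname(host);
            if (he == NULL)
                return -1;                      /* name did not resolve */

            /* h_addr_list is a NULL-terminated array of addresses;
             * h_addr is just an alias for h_addr_list[0], which is all
             * a "badly written" application ever looks at. */
            for (char **ap = he->h_addr_list; *ap != NULL; ap++) {
                struct sockaddr_in sa;
                memset(&sa, 0, sizeof(sa));
                sa.sin_family = AF_INET;
                sa.sin_port = htons(port);
                memcpy(&sa.sin_addr, *ap, he->h_length);

                int fd = socket(AF_INET, SOCK_STREAM, 0);
                if (fd < 0)
                    return -1;                  /* out of descriptors, etc. */
                if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) == 0)
                    return fd;                  /* this address answered */
                close(fd);                      /* dead IP; try the next one */
            }
            return -1;                          /* every address failed */
        }

    A caller just uses the returned descriptor as usual, or gives up if it gets -1.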

    Now, it is possible that a higher-level language will do this for you automatically: you call a connect() function that takes a domain argument, and it tries every IP the domain points to. But even then, this is NOT done by the system; it remains an application-level feature. Therefore, it is a RISK to assume that every application will properly fail over.

    Since the "first" IP in the list is mostly left up to sheer luck, I don't know of a way to find you a program that does not properly fail over authoritively in a short amount of time. I just wanted to more fully inform you of the whole multiple IP situation.
#5
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2006
    Location
    Jacksonville, FL
    Posts
    3
    Rep Power
    0



    Originally Posted by SilentRage
    Since the "first" IP in the list is mostly left up to sheer luck, I don't know of a way to find you a program that does not properly fail over authoritively in a short amount of time. I just wanted to more fully inform you of the whole multiple IP situation.
    That is *precisely* the answer I was looking for -- information about the low-level resolver. Thank you very much.


    I think perhaps I ought to:
    A.) set up an rsync or Squid copy at the second site (or NFS-mount the first, or use PeerFS, or any number of other replication possibilities)
    B.) lower the DNS TTLs so I can change the addresses in the event of a real disaster. I know that isn't perfect (caching name servers don't always honor my values), but it can certainly help (see the sketch after this list)
    C.) test as many programs as I can get my hands on, since I now know it is an application issue.
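
    For B, here are the same example records from above with the TTL dropped from an hour to five minutes, and for A, a standard rsync one-liner (the paths and standby hostname are placeholders, not my real setup):

        www.example.com. 300 IN A 1.1.1.1
        www.example.com. 300 IN A 2.2.2.2

        rsync -az --delete /var/www/ standby.example.com:/var/www/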

    I can tell my customers, "If a node dies (hopefully rare), only these (list) are guaranteed to always work." My company, my choice. If a customer doesn't like it, he can move on. As I make more money I can consider investing in a less iffy solution, but this should be enough to get me started.

    Fortunately, the browsers I have tested cover 95%+ of the market. Other high-availability choices (redundant hardware) help with the other 5%. What I'm most concerned about are the various middle-tier applications, such as Perl and Java scripts that run on servers and fetch data through web-based APIs. I am not likely to test that sheer number of back-end scripts myself. This is my personal FUD, but I am not discouraged. As I said, with an rsync/Squid copy at the second site the effect should be minimal.


    At least I know where to look now.

    Thank you!

    CD
    Last edited by SilentRage; July 16th, 2006 at 11:23 PM. Reason: Removed advertisement. No warning given.
