#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2016
    Posts
    19
    Rep Power
    0

    502 bad gateway - how to debug


    I thought I posted this last night (UK time) but can't find the post. It should have been moved to another forum if it wasn't applicable here, rather than just removed without letting me know. But perhaps I didn't press send; either way, I apologise.

    The problem is that every so often our server (powering the apps) goes down for a few minutes, and I am not sure how to debug this. I have looked at all the logs in /var/log/nginx but can't see anything that says why it went down.

    The most I see is a "connection timed out" in some of the logs.

    My server is also behind a load balancer. Not sure if that makes a difference?

    Finally, I am running nginx, but there is no forum for that or a general server forum.
  2. #2
  3. Confusing Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    15,918
    Rep Power
    9570
    Well, you can't do much looking at the server that's doing the proxying/load balancing. You have to look at what's behind it.

    What is that server? Does it go down on a regular basis? Are you saying you looked at its logs or those of the proxy? How about other system logs besides the ones for the web server?
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    I literally looked at every log I could in /var/log (nginx, mysql, server logs, etc.) but I couldn't see why the server would go down, i.e. some catastrophic bug in the code or server, which I hadn't touched when (or immediately before) the server went down.

    Server is Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-65-generic x86_64).

    BTW, the only error that looked useful to me was:

    WARNING: [pool www] server reached pm.max_children setting (5), consider raising it

    But that is happening all the time, so if that were what takes the server down, it would be down all the time.
  6. #4
  7. Confusing Moderator
    Devshed Supreme Being (6500+ posts)

    Can you tell if there's an absence of any logging during the downtime? That would at least help narrow down whether it's the system or just the web server. You could set up a no-op cron job that runs every minute (a mere ";" as the command might work), then after the outage check the cron logs to see if it executed during the window.

    Do you have any performance monitoring of the server? This would be a good reason to set it up.

    And are you able to wait until the outage and do something while it's still going on? Like, keep an SSH window open and go about your day waiting until it happens. Would mean you'd have to know about the outage as soon as it starts.
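One way to check for the logging gap described above is to scan a syslog-style file for stretches where nothing was written. This is only a minimal sketch under stated assumptions: it expects the classic `Mar  7 23:01:37` timestamp prefix, it has to be given a year because that prefix does not carry one (2017 here is a guess), and `find_gaps` is a hypothetical helper name, not part of any tool. With a once-a-minute cron heartbeat in place, a gap much longer than a minute would suggest the whole system stalled rather than just the web server.

```python
from datetime import datetime, timedelta


def parse_ts(line, year=2017):
    """Parse the leading 'Mar  7 23:01:37' prefix of a syslog line.

    The year is an assumption: classic syslog timestamps omit it.
    Month names are parsed according to the current locale.
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def find_gaps(lines, max_gap=timedelta(minutes=2), year=2017):
    """Return (last_seen, next_seen) pairs where logging paused longer than max_gap."""
    gaps = []
    prev = None
    for line in lines:
        try:
            ts = parse_ts(line, year)
        except ValueError:
            continue  # skip lines without a recognisable timestamp
        if prev is not None and ts - prev > max_gap:
            gaps.append((prev, ts))
        prev = ts
    return gaps


if __name__ == "__main__":
    import sys

    # Usage: python find_gaps.py /var/log/syslog
    with open(sys.argv[1], errors="replace") as fh:
        for start, end in find_gaps(fh):
            print(f"no log lines between {start} and {end} ({end - start})")
```

The two-minute threshold is arbitrary; it only needs to be comfortably larger than the heartbeat interval so that normal jitter is not reported.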
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    There's definitely an absence of logging. I can't remember if it was in all the logs or just some, but I did notice it. I will add more details the next time the server goes down. I am also adding another server + load balancer to see if the issue occurs in both or just one.

    Thanks
  10. #6
  11. Confusing Moderator
    Devshed Supreme Being (6500+ posts)

    Is it virtualized? How? Could that be to blame? For instance, a failover cluster on Windows Server will suspend an instance as it migrates across hosts.

  12. #7
  13. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Hi, it happened again. This time, though, I figured out that it was the load balancer that was playing up. So I did the usual: looked at all the logs (syslog, error, access) and didn't see anything.

    One thing that concerned me was in one of the logs around the time it went down:

    Code:
    Mar  7 23:01:37 domain.com-loadbalancer systemd-timesyncd[909]: Timed out waiting for reply from 91.189.89.199:123 (ntp.ubuntu.com).
    Mar  7 23:01:48 domain.com-loadbalancer systemd-timesyncd[909]: Timed out waiting for reply from 91.189.89.198:123 (ntp.ubuntu.com).
    Mar  7 23:01:58 domain.com-loadbalancer systemd-timesyncd[909]: Timed out waiting for reply from 91.189.91.157:123 (ntp.ubuntu.com).
    Mar  7 23:02:08 domain.com-loadbalancer systemd-timesyncd[909]: Timed out waiting for reply from 91.189.94.4:123 (ntp.ubuntu.com).
    Mar  7 23:02:18 domain.com-loadbalancer systemd-timesyncd[909]: Timed out waiting for reply from [2001:67c:1560:8003::c8]:123 (ntp.ubuntu.com).
    Mar  7 23:02:29 domain.com-loadbalancer systemd-timesyncd[909]: Timed out waiting for reply from [2001:67c:1560:8003::c7]:123 (ntp.ubuntu.com).
    After this there were no more logs until I restarted the load balancer.
  14. #8
  15. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Actually, there is a bit more to the previous log. The last few lines are:

    Code:
    Mar  7 23:17:01 domain-com-loadbalancer CRON[27520]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
    Mar  7 06:25:02 domain-com-loadbalancer rsyslogd: message repeated 8 times: [ [origin software="rsyslogd" swVersion="8.16.0" x-pid="1129" x-info="http://www.rsyslog.com"] rsyslogd was HUPed]
    Notice the time is wrong: it should not be 06:25, it should be 23:18 or something like that.
  16. #9
  17. Confusing Moderator
    Devshed Supreme Being (6500+ posts)

    - I doubt all those ntp.ubuntu.com IPs were actually inaccessible. Sounds like more networking problems.
    - Without the NTP sync the clock shouldn't go crazy - at least not while the OS is still running. Again, is this virtualized?
    - rsyslogd being HUPed right when the hourly cron runs could be because of logrotate, so that's not necessarily a problem. But 8 times is odd. This and the time issue could be because the log messages were combined (like to save space) and 06:25 was the time of the most recent message.

    The NTP thing looks like another symptom of networking problems on that load balancer. The other issues could be explained normally - try temporarily turning off rsyslogd's RepeatedMsgReduction setting to stop combining messages.
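Whether an entry such as the 06:25 line really sits out of sequence can be checked mechanically instead of by eye. The following is a minimal sketch, assuming the same `Mar  7 23:17:01` prefix as the log excerpts in this thread and an assumed year (the prefix does not include one); `backwards_entries` is a hypothetical name. It flags every line whose timestamp is earlier than the timestamp of the line before it.

```python
from datetime import datetime


def backwards_entries(lines, year=2017):
    """Return (line_number, line) for entries stamped earlier than the previous entry.

    The year is an assumption, since classic syslog timestamps omit it.
    """
    flagged = []
    prev = None
    for number, line in enumerate(lines, start=1):
        try:
            ts = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # no recognisable timestamp, ignore the line
        if prev is not None and ts < prev:
            flagged.append((number, line.rstrip("\n")))
        prev = ts
    return flagged


if __name__ == "__main__":
    # The two entries from post #8, with the message bodies shortened.
    sample = [
        "Mar  7 23:17:01 domain-com-loadbalancer CRON[27520]: (root) CMD (...)",
        "Mar  7 06:25:02 domain-com-loadbalancer rsyslogd: message repeated 8 times: [...]",
    ]
    print([number for number, _ in backwards_entries(sample)])  # prints [2]
```

A flagged "message repeated N times" line would be consistent with the combined-message explanation above, although the script cannot prove that on its own.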
  18. #10
  19. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    No, the 06:25 entry was there at around 23:30 when I was looking at the logs, so there was no way it should have said 06:25. And looking at the logs, the timings seem to be in order, i.e. the earliest time at the top and the latest at the bottom.

    Virtualised? It's a DigitalOcean droplet. I am not 100% sure, but it's a VPS so I am assuming it's virtualised.

    PS: which logs should I really be looking at when the server goes down? There are so many, and because I don't know where I should be looking it's hard, so I just check ALL of them. I am guessing the error.log ones rather than syslog / access logs?
  20. #11
  21. Confusing Moderator
    Devshed Supreme Being (6500+ posts)

    Do you have support with DigitalOcean? They could help.

    Unfortunately I don't really know of any logs that would help with large server-scale issues like this. Mostly it's just the kernel and dmesg logs, but they aren't always helpful - just messages like disks that need fscking.
