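I haven’t been able to pin down a pattern, but every night or every few nights our Metabase instance on Elastic Beanstalk goes down. The only consistent message in the nginx error log is "connect() failed (111: Connection refused)" when the /api/health check hits the upstream on port 3000: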
2019/11/13 14:32:23 [error] 27526#0: *1 connect() failed (111: Connection refused) while connecting to upstream, client: 10.0.101.80, server: , request: "GET /api/health HTTP/1.1", upstream: "http://10.0.101.80:3000/api/health", host: "10.0.101.80"
2019/11/13 14:32:24 [error] 27526#0: *3 connect() failed (111: Connection refused) while connecting to upstream, client: 10.0.101.80, server: , request: "GET /api/health HTTP/1.1", upstream: "http://10.0.101.80:3000/api/health", host: "10.0.101.80"
2019/11/18 14:46:47 [warn] 20205#0: duplicate MIME type "text/html" in /etc/nginx/sites-enabled/elasticbeanstalk-nginx-docker-proxy.conf:11
2019/11/18 14:46:47 [warn] 20221#0: duplicate MIME type "text/html" in /etc/nginx/sites-enabled/elasticbeanstalk-nginx-docker-proxy.conf:11
2019/11/18 14:46:47 [warn] 20231#0: duplicate MIME type "text/html" in /etc/nginx/sites-enabled/elasticbeanstalk-nginx-docker-proxy.conf:11
2019/11/18 14:46:56 [warn] 20583#0: duplicate MIME type "text/html" in /etc/nginx/sites-enabled/elasticbeanstalk-nginx-docker-proxy.conf:11
2019/11/18 14:46:56 [warn] 20593#0: duplicate MIME type "text/html" in /etc/nginx/sites-enabled/elasticbeanstalk-nginx-docker-proxy.conf:11
2019/11/18 14:47:00 [error] 20597#0: *1 connect() failed (111: Connection refused) while connecting to upstream, client: 10.0.101.80, server: , request: "GET /api/health HTTP/1.1", upstream: "http://10.0.101.80:3000/api/health", host: "10.0.101.80"
2019/11/18 14:47:10 [warn] 21012#0: duplicate MIME type "text/html" in /etc/nginx/sites-enabled/elasticbeanstalk-nginx-docker-proxy.conf:11
2019/11/18 14:47:10 [warn] 21028#0: duplicate MIME type "text/html" in /etc/nginx/sites-enabled/elasticbeanstalk-nginx-docker-proxy.conf:11
2019/11/19 17:35:40 [error] 21030#0: *17549 connect() failed (111: Connection refused) while connecting to upstream, client: 10.0.101.80, server: , request: "GET /api/health HTTP/1.1", upstream: "http://10.0.101.80:3000/api/health", host: "10.0.101.80"
2019/11/19 17:35:43 [warn] 17038#0: duplicate MIME type "text/html" in /etc/nginx/sites-enabled/elasticbeanstalk-nginx-docker-proxy.conf:11
2019/11/19 17:35:43 [warn] 17048#0: duplicate MIME type "text/html" in /etc/nginx/sites-enabled/elasticbeanstalk-nginx-docker-proxy.conf:11
2019/11/19 17:35:55 [error] 17051#0: *1 connect() failed (111: Connection refused) while connecting to upstream, client: 10.0.101.80, server: , request: "GET /api/health HTTP/1.1", upstream: "http://10.0.101.80:3000/api/health", host: "10.0.101.80"
2019/11/19 17:35:58 [error] 17051#0: *3 connect() failed (111: Connection refused) while connecting to upstream, client: 10.0.101.80, server: , request: "GET /api/health HTTP/1.1", upstream: "http://10.0.101.80:3000/api/health", host: "10.0.101.80"
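To make the pattern easier to spot, the refused health checks can be pulled out of the error log and counted per day. This is a minimal sketch that assumes the line format shown above. The function names are hypothetical, and so is the log location, which is often /var/log/nginx/error.log but may differ on your platform.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical pattern for nginx error-log lines like the ones above:
# 2019/11/13 14:32:23 [error] ... connect() failed (111: Connection refused) ... upstream: "http://..."
NGINX_REFUSED = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[error\] .*"
    r"connect\(\) failed \(111: Connection refused\).*"
    r'upstream: "(?P<upstream>[^"]+)"'
)


def refused_events(lines):
    """Yield (timestamp, upstream URL) for every 'Connection refused' upstream error."""
    for line in lines:
        m = NGINX_REFUSED.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y/%m/%d %H:%M:%S")
            yield ts, m.group("upstream")


def refused_per_day(lines):
    """Count refused upstream connections per calendar day (warn lines are ignored)."""
    return Counter(ts.date().isoformat() for ts, _ in refused_events(lines))
```

Passing an open file object, for example refused_per_day(open("/var/log/nginx/error.log")), returns one count per day. Those dates can then be lined up against the Docker events below.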
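These errors seem to coincide with the container being killed. In the Docker events below, the old container (vibrant_mcnulty) receives SIGTERM (signal=15), then SIGKILL (signal=9) ten seconds later, and dies with exit code 137: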
2019-11-18T14:46:43.512375478Z image tag sha256:ae319f6b110453e01a0c29867da3c8ab39479ba07f743905f94909c37a54b601 (name=aws_beanstalk/current-app:latest)
2019-11-18T14:46:43.578190263Z image tag sha256:e50f04477cd2aea76b6070de084af9b7286deca5a02c5c956a2ded8ff9a2f029 (name=metabase/metabase:v0.33.5)
2019-11-18T14:46:45.618070265Z image pull metabase/metabase:v0.33.5 (name=metabase/metabase)
2019-11-18T14:46:45.787403197Z image tag sha256:ae319f6b110453e01a0c29867da3c8ab39479ba07f743905f94909c37a54b601 (name=aws_beanstalk/staging-app:latest)
2019-11-18T14:46:49.525549625Z container create 1d6c84bce613cee6e1fff0a2b50028e292f817fec690c52be43ab3dd21dee7c8 (image=ae319f6b1104, name=blissful_lalande)
2019-11-18T14:46:49.609703768Z network connect b9eb01249dd5e58c474cbafada7fb8fbe2d95a62648c1d9d55f637c5f72981f5 (container=1d6c84bce613cee6e1fff0a2b50028e292f817fec690c52be43ab3dd21dee7c8, name=bridge, type=bridge)
2019-11-18T14:46:49.932051478Z container start 1d6c84bce613cee6e1fff0a2b50028e292f817fec690c52be43ab3dd21dee7c8 (image=ae319f6b1104, name=blissful_lalande)
2019-11-18T14:46:57.250072590Z container kill bb476aad178a24857f18e356f3a96ecffaf4afcc0fae147bfae59a1f45a8f22c (image=ae319f6b1104, name=vibrant_mcnulty, signal=15)
2019-11-18T14:47:07.261511087Z container kill bb476aad178a24857f18e356f3a96ecffaf4afcc0fae147bfae59a1f45a8f22c (image=ae319f6b1104, name=vibrant_mcnulty, signal=9)
2019-11-18T14:47:07.544527672Z container die bb476aad178a24857f18e356f3a96ecffaf4afcc0fae147bfae59a1f45a8f22c (exitCode=137, image=ae319f6b1104, name=vibrant_mcnulty)
2019-11-18T14:47:07.657777844Z network disconnect b9eb01249dd5e58c474cbafada7fb8fbe2d95a62648c1d9d55f637c5f72981f5 (container=bb476aad178a24857f18e356f3a96ecffaf4afcc0fae147bfae59a1f45a8f22c, name=bridge, type=bridge)
2019-11-18T14:47:07.824365960Z container stop bb476aad178a24857f18e356f3a96ecffaf4afcc0fae147bfae59a1f45a8f22c (image=ae319f6b1104, name=vibrant_mcnulty)
2019-11-18T14:47:08.607013252Z container destroy bb476aad178a24857f18e356f3a96ecffaf4afcc0fae147bfae59a1f45a8f22c (image=ae319f6b1104, name=vibrant_mcnulty)
2019-11-18T14:47:08.743086466Z image tag sha256:ae319f6b110453e01a0c29867da3c8ab39479ba07f743905f94909c37a54b601 (name=aws_beanstalk/current-app:latest)
2019-11-18T14:47:08.851273977Z image untag sha256:ae319f6b110453e01a0c29867da3c8ab39479ba07f743905f94909c37a54b601 (name=sha256:ae319f6b110453e01a0c29867da3c8ab39479ba07f743905f94909c37a54b601)
2019-11-18T14:47:10.004762653Z image tag sha256:ae319f6b110453e01a0c29867da3c8ab39479ba07f743905f94909c37a54b601 (name=aws_beanstalk/current-app:latest)
2019-11-18T14:47:10.169365976Z image tag sha256:e50f04477cd2aea76b6070de084af9b7286deca5a02c5c956a2ded8ff9a2f029 (name=metabase/metabase:v0.33.5)
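The kill/die sequence can be decoded mechanically. This is a minimal sketch that assumes the docker events line format shown above and a POSIX system, since SIGKILL is not defined in Python's signal module on Windows. The helper names are hypothetical. It relies on the shell convention that an exit code above 128 means "terminated by signal (code - 128)", so 137 corresponds to signal 9, SIGKILL.

```python
import re
import signal
from datetime import datetime

# Hypothetical pattern for `docker events` lines like:
# 2019-11-18T14:46:57.250072590Z container kill <64-hex id> (image=..., name=..., signal=15)
EVENT = re.compile(
    r"^(?P<ts>\S+) container (?P<action>kill|die) (?P<cid>[0-9a-f]{64}) \((?P<attrs>[^)]*)\)$"
)


def parse_attrs(attrs):
    """Turn 'k1=v1, k2=v2' into {'k1': 'v1', 'k2': 'v2'}."""
    return dict(pair.split("=", 1) for pair in attrs.split(", "))


def parse_ts(ts):
    """Docker prints nanoseconds; datetime keeps only microseconds, so truncate."""
    date_part, frac = ts.rstrip("Z").split(".")
    return datetime.strptime(f"{date_part}.{frac[:6]}", "%Y-%m-%dT%H:%M:%S.%f")


def describe_exit_code(code):
    """Exit codes above 128 conventionally mean 'killed by signal (code - 128)'."""
    if code > 128:
        return signal.Signals(code - 128).name
    return None


def stop_sequence(lines):
    """Return (container name, action, signal name, timestamp) for kill/die events."""
    out = []
    for line in lines:
        m = EVENT.match(line.strip())
        if not m:
            continue  # skip create/start/network/image events
        attrs = parse_attrs(m.group("attrs"))
        if m.group("action") == "kill":
            detail = signal.Signals(int(attrs["signal"])).name
        else:
            detail = describe_exit_code(int(attrs["exitCode"]))
        out.append((attrs["name"], m.group("action"), detail, parse_ts(m.group("ts"))))
    return out
```

Run on the lines above, this yields SIGTERM, then SIGKILL about 10 seconds later, then a die event whose exit code maps to SIGKILL. That is consistent with a graceful stop (docker stop waits 10 seconds by default) where the process did not exit before the grace period ran out.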
I also modified the CloudWatch scripts to track memory usage, and it never seems to go above 30% of the EC2 instance’s memory.
Any help would be greatly appreciated!