ELB Environment Health Transitioned from OK to Severe

Slickrock22 · May 2, 2023, 2:18pm

I consistently get daily errors from AWS Beanstalk saying that my environment has transitioned from OK to Severe and then from Severe to OK (see image). This has been going on for a few years, but now I'm starting to get complaints that the dashboards are not loading, and I think the two might be related. I am running the most current version of Metabase with an RDS Postgres backend database.

I have looked at the elastic load balancer, web server, and database server, and all of them seem to be fine. There is no excessive CPU usage or anything else that seems obvious. Any suggestions on where I can start troubleshooting? Should I stop using Beanstalk and just use a JAR approach or AWS ECS? I don't see anything obvious in the Metabase log, but then I am not sure what I am looking for.

Slickrock22 · May 3, 2023, 5:05pm

There is an influx of 403 errors, which causes EB to react and go to Severe status. The AWS support tech thinks this is caused by 160+ requests to the server, and that it is the Dashboard API. Could this be caused by too many simultaneous requests for too many cards on the dashboard? For example, if we have 5 users accessing our application and each is viewing an embedded Metabase dashboard with 20 cards, that would be a minimum of 100 calls. Also, I think that because we are embedded with filters added to the URL, it calls all of the cards on the dashboard twice, if I am remembering what @flamber stated in one of my previous performance posts. I see a bunch of HTTP 499 errors in the access.log file in the nginx folder of the Beanstalk full log download. I am hopeful that this instance size increase from t3a.medium to c6a.large will eliminate the problems. The AWS support tech suggested trying to scale horizontally if this doesn't work. I will update as I learn more.
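To put numbers on this, here is a rough Python sketch that does two things: it repeats the back-of-the-envelope request estimate above, and it tallies status codes from nginx access-log lines. It assumes the status code is the 3-digit number right after the quoted request line, as in nginx's default "combined" format; the Beanstalk log format may differ, so the pattern may need adjusting. The function names and the sample lines are made up for illustration. For context, 499 is nginx's own non-standard code for "client closed the request before the server answered", which would be consistent with dashboards that are slow to respond.

```python
import re
from collections import Counter

# Assumption: the status code directly follows the quoted request line,
# as in nginx's default "combined" log format.
STATUS_RE = re.compile(r'"[^"]*" (\d{3}) ')


def estimate_card_requests(users, cards_per_dashboard, loads_per_view=1):
    """Rough lower bound on card queries for one round of dashboard views."""
    return users * cards_per_dashboard * loads_per_view


def count_statuses(lines):
    """Tally HTTP status codes found in an iterable of access-log lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:  # lines that do not fit the pattern are skipped
            counts[match.group(1)] += 1
    return counts


# Estimate from the post: 5 users, 20 cards each.
print(estimate_card_requests(5, 20))     # 100
# Same estimate if every card really is loaded twice.
print(estimate_card_requests(5, 20, 2))  # 200

# Hypothetical sample lines, only to show the expected input shape.
sample = [
    '10.0.0.1 - - [03/May/2023:17:00:00 +0000] "GET /api/dashboard/1 HTTP/1.1" 200 512 "-" "curl/8.0"',
    '10.0.0.2 - - [03/May/2023:17:00:01 +0000] "POST /api/card/2/query HTTP/1.1" 499 0 "-" "Mozilla/5.0"',
]
for status, n in sorted(count_statuses(sample).items()):
    print(status, n)
```

To run it against the real file, pass an open file handle for access.log to `count_statuses` instead of the sample list; a file object is already an iterable of lines.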

TonyC · May 4, 2023, 11:34am

Do you have any monitoring insights you can share, such as memory/CPU usage over time?

Slickrock22 · May 4, 2023, 4:16pm

I thought increasing the instance size would solve the problem, but it does not look like it is fixing it. We still have a significant number of 4xx errors. Do these help? I am not 100% sure how to get at the memory utilization.

This is the instance CPU utilization

Load balancer

Load Balancer Requests

Network Out Error in Cloudwatch
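On the memory question: as far as I know, the default EC2 metrics in CloudWatch do not include memory utilization unless the CloudWatch agent is installed, so one stopgap is to read it on the instance itself. Below is a minimal Python sketch that parses `/proc/meminfo` on Linux. It assumes a kernel that reports `MemAvailable`, and the function names are only illustrative.

```python
def parse_meminfo(text):
    """Return the numeric fields of /proc/meminfo as a dict (values in kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memory_used_percent(fields):
    """Share of memory not available to new work, as a percentage."""
    total = fields["MemTotal"]
    available = fields["MemAvailable"]  # assumption: field is present
    return 100.0 * (total - available) / total


def read_memory_used_percent(path="/proc/meminfo"):
    """Read the live value on a Linux host."""
    with open(path, encoding="ascii") as fh:
        return memory_used_percent(parse_meminfo(fh.read()))
```

Calling `read_memory_used_percent()` every minute or so while a dashboard loads, and logging the result, should give a rough memory-over-time picture to set beside the CPU graph.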

Luiggi · May 4, 2023, 4:28pm

This is the reason why I don't recommend using Elastic Beanstalk anymore: it's a black box with lots of components that are there for no reason and that even AWS doesn't update. My humble opinion: move to ECS or even to a simple server running the JAR. The simpler the architecture, the better, always.

Slickrock22 · May 4, 2023, 11:49pm

@Luiggi I am starting to get that sense. I attempted to follow some directions and spin up an Ubuntu instance.

I tried to follow this post and got close but nginx was working.

I tried to follow another one
https://www.letscloud.io/community/how-to-install-metabase-on-ubuntu-20-04-as-a-service-with-nginx
and it didn't work.

The Metabase directions are useful but leave out some of the details for the install (which you can make up using the other references). I also restored my production Postgres config database.

The challenge is that I am not super technical, but I'm relatively good at following instructions. A lot of these instructions leave out a couple of steps, maybe because they assume people already know them, or they say things like "you need to do XYZ before you get started". I'm not that familiar with Linux, so it becomes a bit challenging. I agree and would love to run a JAR or ECS. Could you point me in the direction of some very well-written step-by-step instructions for using a JAR on Ubuntu or ECS? I would be grateful.

Slickrock22 · May 6, 2023, 10:31pm

I am really close to being able to test out using just a JAR on Ubuntu. I'm able to get everything running, except when I use environment variables to point Metabase to the Postgres RDS. I'm using the exact same environment variables that I use in AWS Beanstalk, and I get a 502 Bad Gateway. If I comment out the environment variable line in metabase.service, it works. Uncomment, 502 Bad Gateway. Any ideas why it doesn't like my environment variables?
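A 502 from nginx only says that the upstream (Metabase here) did not answer, so one way to narrow it down is to check the variables and the database connection separately from Metabase. Here is a hedged Python sketch that does that. It assumes the connection is configured with the individual `MB_DB_*` variables rather than `MB_DB_CONNECTION_URI`. The helper names are made up, and the check only proves that the TCP port is reachable, not that the credentials are valid.

```python
import os
import socket

# Metabase's application-database variables (individual-field style).
REQUIRED_VARS = ("MB_DB_TYPE", "MB_DB_DBNAME", "MB_DB_PORT",
                 "MB_DB_USER", "MB_DB_PASS", "MB_DB_HOST")


def missing_vars(env):
    """Names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def can_connect(host, port, timeout=5.0):
    """True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):  # unreachable host or non-numeric port
        return False


def check(env):
    """Return a short diagnosis for the given environment mapping."""
    missing = missing_vars(env)
    if missing:
        return "missing: " + ", ".join(missing)
    if not can_connect(env["MB_DB_HOST"], env["MB_DB_PORT"]):
        return "cannot reach database host"
    return "ok"
```

Running `print(check(os.environ))` in a shell that has the same variables exported gives a first answer. Keep in mind that a systemd service does not inherit the shell's environment, so the values have to match what is actually written in the unit file. If the port is unreachable, the RDS security group (which may only allow the Beanstalk instances) would be the first thing I would look at.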

Luiggi · May 7, 2023, 1:50am

Slickrock22 · May 8, 2023, 8:05pm

@Luiggi Thanks for the feedback. Have you had success using that tutorial with environment variables? I followed that document and everything works until I add the line in the metabase.service file that points to the RDS Postgres DB (a clone of a production Metabase RDS instance), and then I get a bad gateway.

Slickrock22 · May 8, 2023, 11:32pm

@Luiggi This is the exact same issue, but there isn't a clear solution: Migrating from H2 to MySQL ends with H2 - #8 by flamber

Luiggi · May 8, 2023, 11:37pm

Here it says: So first do that:
https://www.metabase.com/docs/latest/operations-guide/migrating-from-h2.html

Then, when you are running on MySQL, shut down, configure your service environment variables to reference MySQL, and otherwise follow what is listed in the service guide:
https://www.metabase.com/docs/latest/operations-guide/running-metabase-on-debian.html

Slickrock22 · May 8, 2023, 11:47pm

@Luiggi Since I already have a proven Postgres DB that I have been running with Metabase in Beanstalk, I do not need to migrate H2 data. I have followed the Metabase on Debian guide a number of times.

Slickrock22 · May 8, 2023, 11:48pm

@Luiggi Working! Let me figure out what I did and I will share back.