We’re deploying Metabase on AWS using Elastic Beanstalk. Every few weeks or months, Metabase loses its connection to our data warehouse (Redshift).
Symptoms of the problem: it becomes impossible to run any query against the Redshift database from Metabase, while the same queries behave normally from a standard SQL client.
The “fix” is trivial: just reboot the EC2 instance.
However, this bug is a major reliability problem for us: we use Metabase Premium embedding to show dashboards in client-facing applications, and right now the “fix” requires human intervention to notice AND fix the problem.
Ideally, that “connection drop” issue would get fixed.
In the meantime, is there some sort of health check API endpoint that could be used to set up a self-healing mechanism?
- We’ve been using Metabase for about 6 months; the problem has occurred 3 times
- Right now we’re on the latest metabase version, and we typically update to new metabase versions quickly
- Metabase runs on AWS Elastic Beanstalk (using the Metabase-provided template), with an AWS RDS Postgres backend for Metabase (set up by Elastic Beanstalk) and a Redshift data warehouse
- We’re using a t3.large instance. CPU usage is low when the problem occurs (5-10%)
- The whole interface behaves normally when the problem occurs
- The problem has never happened during usage peaks; this time it occurred at night, when there’s virtually no usage
Happy to report other information that may help
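
To make the health check question concrete, here is a minimal sketch of the kind of probe we have in mind. Since the interface keeps working when the problem occurs, an application-level check such as `GET /api/health` would probably still report OK, so the sketch runs a trivial native query against the warehouse through Metabase instead. It assumes that `POST /api/session` returns a session `id`, that `POST /api/dataset` accepts a native query and reports `"status": "completed"` on success, and that the session can be passed in the `X-Metabase-Session` header. The function names, the `SELECT 1` probe and the database ID are hypothetical, and the response shape may differ between Metabase versions.

```python
import json
import urllib.request


def build_probe_payload(database_id):
    """Body for a trivial native query against one Metabase database."""
    return {
        "database": database_id,  # hypothetical: Metabase's ID for the Redshift DB
        "type": "native",
        "native": {"query": "SELECT 1"},
    }


def is_healthy(result):
    """Assumption: a successful query result carries status == 'completed'."""
    return isinstance(result, dict) and result.get("status") == "completed"


def post_json(url, payload, headers=None, timeout=30):
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", **(headers or {})},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def probe_warehouse(base_url, username, password, database_id, post=post_json):
    """Return True if Metabase can still run a query on the warehouse.

    Any network error, bad response or failed query counts as unhealthy.
    """
    try:
        session = post(
            f"{base_url}/api/session",
            {"username": username, "password": password},
        )
        result = post(
            f"{base_url}/api/dataset",
            build_probe_payload(database_id),
            headers={"X-Metabase-Session": session["id"]},
        )
    except (OSError, ValueError, KeyError, TypeError):
        return False
    return is_healthy(result)
```

A scheduled job could call `probe_warehouse(...)` every few minutes and trigger an alert or an instance reboot after several consecutive failures. Requiring more than one failure would avoid rebooting on a single slow query.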