Non-stop out of memory errors for more than a month

heitor · July 10, 2024, 11:26pm

Hey everyone!

I have been self-hosting Metabase since 2019, but only recently have I been overwhelmed by out-of-memory errors from it.

Currently I run it on AWS ECS on a t3.medium (3.9GB) with Amazon Linux 3, leaving 500MB for the OS and 3.4GB for the Metabase container. We use the official image (metabase/metabase:v0.50.7). Metabase stores its own data in Postgres and we configure it with JAVA_OPTS="-Xmx2800m", leaving around 600MB for non-heap memory.
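For reference, the arithmetic behind those figures can be written out as a small sketch. The numbers are the ones quoted above; the function name and the rounding of 3.9GB to 3900MB are assumptions made only for illustration.

```python
# Memory budget from the post, in MB.
# Assumption: the 3.9GB host figure is treated as 3900 MB.
HOST_MB = 3900
OS_RESERVED_MB = 500
HEAP_MAX_MB = 2800  # from JAVA_OPTS="-Xmx2800m"


def memory_budget(host_mb, os_reserved_mb, heap_max_mb):
    """Split host memory into container limit, max heap, and non-heap headroom."""
    container_mb = host_mb - os_reserved_mb
    non_heap_mb = container_mb - heap_max_mb
    return {
        "container_mb": container_mb,
        "heap_max_mb": heap_max_mb,
        "non_heap_mb": non_heap_mb,
        # Share of the container limit that the heap may grow to, in percent.
        "heap_share_pct": round(100 * heap_max_mb / container_mb),
    }


budget = memory_budget(HOST_MB, OS_RESERVED_MB, HEAP_MAX_MB)
print(budget)
```

With these inputs the heap is allowed to take roughly 82% of the container limit, and everything else the JVM needs (metaspace, thread stacks, code cache, direct buffers) has to fit in the remaining 600MB.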

Those seem like very reasonable values to me, but I am no expert in JVM tuning.

We have somewhat "busy" dashboards which render a lot of questions simultaneously, but it crashes with as few as a single user navigating through those dashboards. The backend is a BigQuery warehouse and no significant amount of data is sent as query results, at least nothing out of the ordinary.

The only unusual thing I noticed is the huge number of threads started by Metabase on the clojure-agent-send-off-pool, which should definitely hurt non-heap memory.
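A rough way to see why the thread count matters is the sketch below. It assumes the HotSpot default stack size of 1MB per thread on 64-bit Linux (no -Xss override), and the thread counts are hypothetical, not measured on this instance. Stack memory is reserved rather than fully committed, so these are upper bounds.

```python
# Assumption: default HotSpot thread stack size on 64-bit Linux (1024 KB).
DEFAULT_STACK_KB = 1024
# Non-heap headroom from the post: 3.4GB container minus -Xmx2800m.
NON_HEAP_BUDGET_MB = 600


def estimate_stack_mb(thread_count, stack_kb=DEFAULT_STACK_KB):
    """Upper bound on memory reserved for thread stacks, in MB."""
    return thread_count * stack_kb / 1024


# Hypothetical thread counts, for illustration only.
for threads in (50, 200, 500):
    stack_mb = estimate_stack_mb(threads)
    share = stack_mb / NON_HEAP_BUDGET_MB
    print(f"{threads} threads -> up to {stack_mb:.0f} MB of stacks "
          f"({share:.0%} of the non-heap budget)")
```

If the pool really grows into the hundreds of threads, stacks alone could account for a large part of the 600MB that is left outside the heap.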

Could you guys shed some light on this problem? What is the ideal value for JAVA_OPTS? What kind of logs would be useful for me to share?

Thank you very much
Heitor

Luiggi · July 10, 2024, 11:50pm

We have identified this problem and we’re shipping a new version in the next few days to fix it. Please move to 50.11 and upgrade to newer versions as soon as you can.

heitor · July 11, 2024, 1:01pm

Hey Luiggi,

Just updated it to 0.50.11, but this version has not yet fixed the OOM errors, right?

Thanks!
Heitor

heitor · July 26, 2024, 12:44pm

After the update, all those OOM errors stopped, thanks.

Today, after several days, the container reached its hard memory limit and was terminated by the OS. I am somewhat confident that it was non-heap memory usage. I am currently leaving 3.4GB for the container and 2.8GB for heap memory (-Xmx2800m), leaving 600MB for non-heap. That seemed to me like a very high value, more than enough for running Metabase.
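One way to test that suspicion is to compare the resident size of the Java process with the configured maximum heap. The sketch below parses the VmRSS and Threads fields that Linux exposes in /proc/<pid>/status. The sample text and its values are made up for illustration; on a real container the text would come from reading that file for the Java process.

```python
HEAP_MAX_MB = 2800  # from -Xmx2800m


def parse_proc_status(text):
    """Return (resident set size in kB, thread count) from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    rss_kb = int(fields["VmRSS"].split()[0])  # value looks like "3276800 kB"
    threads = int(fields["Threads"])
    return rss_kb, threads


# Hypothetical sample, not taken from the instance discussed here.
SAMPLE_STATUS = "Name:\tjava\nVmRSS:\t 3276800 kB\nThreads:\t312\n"

rss_kb, threads = parse_proc_status(SAMPLE_STATUS)
rss_mb = rss_kb / 1024
# RSS minus the max heap is a lower bound on resident non-heap memory,
# since the heap may not be fully committed.
non_heap_lower_bound_mb = rss_mb - HEAP_MAX_MB
print(f"RSS {rss_mb:.0f} MB, {threads} threads, "
      f"non-heap at least {non_heap_lower_bound_mb:.0f} MB")
```

Sampling these two values over several days would show whether resident memory keeps climbing while the heap stays capped, which is the pattern non-heap growth would produce.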

Would you kindly share if I am correct or if I should leave more space for non-heap?

Thank you!