Random crashes on 51.9 / 51.10

Hello,

I deploy Metabase via a Docker container, and when I updated to version 51.9 / 51.10 it started crashing within a day of normal usage, reporting an OOM error.

This happened in a past version and was fixed in a later release, but it seems to have been reintroduced.
Is there a memory leak?

The error output is the same as before:

Aborting due to java.lang.OutOfMemoryError: Java heap space
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (debug.cpp:339), pid=1, tid=48
#  fatal error: OutOfMemory encountered: Java heap space
#
# JRE version: OpenJDK Runtime Environment Temurin-11.0.25+9 (11.0.25+9) (build 11.0.25+9)
# Java VM: OpenJDK 64-Bit Server VM Temurin-11.0.25+9 (11.0.25+9, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to //core.1)
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid1.log

What's the RAM and CPU allocation on your Metabase instance?

2 CPU / 8 GB memory

Are you sure that's what Metabase is getting? How are you deploying Metabase?

A Java heap space error is usually followed by a restart, and Metabase will log the amount of memory it has available.

I'm deploying Metabase via a Docker container, and it uses 4 GB.
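
For context, here is a minimal sketch of how a deployment like this is typically launched; the container name, image tag, and exact values are illustrative rather than my exact command, with the host allocation at 8 GB and the JVM heap capped at 4 GB via JAVA_OPTS:

# Illustrative only: Metabase in Docker with an 8 GB container limit and a 4 GB JVM heap.
docker run -d --name metabase -p 3000:3000 \
  --memory=8g \
  -e "JAVA_OPTS=-Xmx4g" \
  metabase/metabase:v0.51.10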

So far, though, 51.10 seems a bit more stable than 51.9.
51.9 would crash after about a day, sooner or later depending on usage.

51.10 has only crashed on the initial startup so far and seems fine since, but I can't say that for certain because usage at the moment is extremely low.

I see! What is the actual memory of the JVM, though? If you check the logs, there should be a line like the following:

2024-12-26 18:17:14,806 INFO metabase.util :: Maximum memory available to JVM: 4.0 GB
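
If you're on a plain Docker deployment, one way to pull that line out of the container logs (the container name here is just an example):

docker logs metabase 2>&1 | grep "Maximum memory available to JVM"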

Can you confirm?

Yes, that's correct; it shows 4 GB on that line.

Also, the OOM error happened again today, and the container crashed.
I presume it's still the same issue, but since usage is low right now it takes a few days to crash instead of the 1-3 days under normal usage.

@TonyC Any ideas? The crash happened again this morning. I'd like to get this fixed before next week if possible, since that's when my users return and normal usage resumes.

I also can't revert to an earlier version, since 51.10 has a feature my users need: enabling all metadata for the tooltips.

Are you able to tie the crash to some Metabase activity? What was happening before the OOM error?

Two options: you can upgrade to v0.52.4.4, which has some performance fixes around subscriptions ... or increase the memory in the meantime. But we need to figure out what Metabase is doing before it runs out of heap space. Do you know of any process that runs at that time? Potentially a sync/scan or a bunch of dashboard subscriptions?
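
If you go the memory route for now, here's a sketch of the JVM options you could pass to the container; the values are illustrative, and the heap-dump flags are just one way to capture what was filling the heap when the OOM hits (assuming /tmp, or better a mounted path, has room for the dump):

# Illustrative: pass a larger heap plus heap-dump-on-OOM flags to the Metabase container.
-e "JAVA_OPTS=-Xmx6g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"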

It's hard to pinpoint, since the container sometimes crashes overnight or in the early morning.

Would the log show what caused it to crash?

I only get notified that it's offline, so I'm not able to see the crash live.
But it is set to perform a few database syncs against my external databases, and Slack notifications are used heavily for dashboards / questions.

I am hoping we can find the OOM issue and then look at the previous logs to see what Metabase is doing.
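
The JVM fatal-error report mentioned in the crash output (/tmp/hs_err_pid1.log) lives inside the container, so it disappears if the container is recreated; while the stopped container is still around, it can be copied out, for example (container name illustrative):

# Copy the JVM fatal-error report off the stopped container before it is removed.
docker cp metabase:/tmp/hs_err_pid1.log ./hs_err_pid1.log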

Can you pinpoint the crash to when those trigger? ... Are you monitoring RAM and CPU? Can you share that over time?
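
If you don't already have that history, one crude way to record it for a plain Docker setup (container name illustrative) is to sample docker stats periodically:

# Append a CPU / memory snapshot for the container every 60 seconds.
while true; do docker stats --no-stream metabase >> metabase-stats.log; sleep 60; done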

I do get alerts on CPU / memory usage, but I haven't seen any high usage correlate with the crash.

However, I have seen some instances of high CPU usage, usually on initial container start.

I also deployed v52.4 on my test instance and noticed that it also crashed after a while.

Aborting due to java.lang.OutOfMemoryError: Java heap space
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (debug.cpp:271), pid=1, tid=72
#  fatal error: OutOfMemory encountered: Java heap space
#
# JRE version: OpenJDK Runtime Environment Temurin-21.0.5+11 (21.0.5+11) (build 21.0.5+11-LTS)
# Java VM: OpenJDK 64-Bit Server VM Temurin-21.0.5+11 (21.0.5+11-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to //core.1)
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid1.log
[224.298s][warning][os] Loading hsdis library failed

Please upgrade to the newest nightly.

Which version is that?