I deploy Metabase in a Docker container. After I updated to version 51.9 / 51.10, it started crashing under normal usage within a day, reporting an OOM error.
This happened in a past version and was fixed in a newer one, but it seems to have been reintroduced.
Is there a memory leak?
The error output is the same as before:
Aborting due to java.lang.OutOfMemoryError: Java heap space
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (debug.cpp:339), pid=1, tid=48
# fatal error: OutOfMemory encountered: Java heap space
#
# JRE version: OpenJDK Runtime Environment Temurin-11.0.25+9 (11.0.25+9) (build 11.0.25+9)
# Java VM: OpenJDK 64-Bit Server VM Temurin-11.0.25+9 (11.0.25+9, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to //core.1)
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid1.log
Also, the OOM error happened again today, and the container crashed.
I presume it's still the same issue, but since usage is low right now it is taking a few days to crash, versus 1-3 during normal usage.
@TonyC Any ideas? The crash happened again this morning. Ideally I'd like to get this fixed before next week, since that is when my users return and normal usage resumes.
I also can't revert to an older version, since 51.10 has a feature my users need: enabling all metadata for the tooltips.
Are you able to tie the crash to some Metabase activity? What was happening before the OOM error?
There are two options: you can upgrade to v0.52.4.4, which has some performance fixes around subscriptions, or increase the memory in the meantime. But we need to figure out what Metabase is doing before it runs out of heap space. Do you know of any process that is running at that time? Potentially a sync/scan or a bunch of dashboard subscriptions?
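One way to see what Metabase was doing right before the heap ran out is to pull the log lines that immediately precede the OOM message. The following is a minimal sketch, assuming the container log has already been saved to a text file (for example by redirecting the output of docker logs). The function name, the default context size, and the file name in the usage comment are illustrative and not part of Metabase.

```python
from collections import deque
from typing import Iterable, List

# Marker taken from the crash output in this thread.
OOM_MARKER = "java.lang.OutOfMemoryError"


def lines_before_oom(
    lines: Iterable[str],
    context: int = 200,
    marker: str = OOM_MARKER,
) -> List[str]:
    """Return up to `context` lines preceding the first line that contains `marker`.

    Returns an empty list if the marker never appears.
    """
    recent = deque(maxlen=context)  # keeps only the most recent `context` lines
    for line in lines:
        if marker in line:
            return list(recent)
        recent.append(line.rstrip("\n"))
    return []


# Hypothetical usage, assuming the log was saved as metabase.log:
#
#   with open("metabase.log", encoding="utf-8", errors="replace") as fh:
#       for line in lines_before_oom(fh, context=50):
#           print(line)
```

If a sync, scan, or batch of subscriptions was running at the time, it should show up in those last lines, which also helps when the crash happens overnight and nobody is watching.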
It's hard to pinpoint, since the container sometimes crashes overnight or early in the morning.
Would the log show what caused it to crash?
I only get notified that it's offline, so I'm not able to see the crash live.
But it is set to perform a few database syncs with my external databases, and Slack notifications are used heavily for dashboards / questions.
I also deployed v52.4 on my test instance and noticed that it also crashed after a while.
Aborting due to java.lang.OutOfMemoryError: Java heap space
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (debug.cpp:271), pid=1, tid=72
# fatal error: OutOfMemory encountered: Java heap space
#
# JRE version: OpenJDK Runtime Environment Temurin-21.0.5+11 (21.0.5+11) (build 21.0.5+11-LTS)
# Java VM: OpenJDK 64-Bit Server VM Temurin-21.0.5+11 (21.0.5+11-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to //core.1)
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid1.log
[224.298s][warning][os] Loading hsdis library failed
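The two crash headers in this thread report the same failure (Java heap space) on different JVM builds (Temurin 11.0.25 and Temurin 21.0.5). As a rough sketch for comparing such headers side by side, the snippet below pulls out the failure reason, the JRE version, and the error report path. The function name and the dictionary keys are illustrative, and the patterns are based only on the header lines pasted above.

```python
import re
from typing import Dict, Iterable

# Patterns for the "#"-prefixed header lines shown in the crash output above.
_PATTERNS = {
    "reason": re.compile(r"^#\s*fatal error:\s*(.+)$"),
    "jre": re.compile(r"^#\s*JRE version:\s*(.+)$"),
    "report_file": re.compile(r"^#\s*(/\S*hs_err_pid\d+\.log)\s*$"),
}


def summarize_crash_header(lines: Iterable[str]) -> Dict[str, str]:
    """Collect the first match for each known field from a JVM fatal error header."""
    summary: Dict[str, str] = {}
    for line in lines:
        text = line.strip()
        for key, pattern in _PATTERNS.items():
            match = pattern.match(text)
            if match and key not in summary:
                summary[key] = match.group(1).strip()
    return summary
```

Note that the report path (/tmp/hs_err_pid1.log) is inside the container, so the file has to be copied out before the container is removed.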