Metabase - docker container going down frequently

Wasim · December 25, 2024, 4:37am

I have a Metabase server running on a Linux machine, deployed with Docker. I am using Postgres as the application (backend) database for Metabase. I have a number of questions and dashboards, and around 100+ users are using Metabase. The Docker container running Metabase goes down on its own, and this happens very frequently. I want to know whether this is caused by the query/user load. Other things I have tried:

Enabled database caching with the Adaptive policy, minimum query duration set to 300 and multiplier set to 12
Enabled model persistence

I think pulses are also being used.

bayo99 · December 25, 2024, 11:34am

From experience, this looks more like a resource issue than a Metabase issue.
We had similar issues when we hosted on AWS: the instance kept restarting because the allocated RAM and CPU were not enough.

Wasim · December 27, 2024, 9:06am

@bayo99
My instance should have enough resources: it is a c4.xlarge (4 vCPUs and 7.5 GiB RAM), and the metrics in Grafana and the AWS EC2 monitoring console look fine for this instance.
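One caveat with host-level metrics: they do not show the JVM's own heap limit. On recent JDKs, if no -Xmx is given, the JVM caps its heap at 25% of the memory it can see (MaxRAMPercentage=25), so on a host with about 7.5 GiB and no container memory limit the heap would be roughly 1.9 GiB even while the EC2 graphs look healthy. The sketch below prints the value a fresh JVM inside the container would pick. It assumes the container is named metabase and that java is on the image's PATH; if JAVA_OPTS already sets -Xmx for Metabase, the running process may differ from this default.

```bash
# Print the default maximum heap a JVM would use inside the container.
# Assumption: container name "metabase"; `java` available on its PATH.
show_max_heap() {
  local container="${1:-metabase}"
  # -XX:+PrintFlagsFinal prints every final flag value to stdout;
  # -version makes the JVM exit right after (its banner goes to stderr).
  docker exec "$container" java -XX:+PrintFlagsFinal -version 2>/dev/null \
    | awk '$2 == "MaxHeapSize" { printf "MaxHeapSize: %.2f GiB\n", $4 / 1024 / 1024 / 1024 }'
}

# Usage:
#   show_max_heap metabase
```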

bayo99 · January 3, 2025, 6:58am

@Wasim
Since you have monitoring set up, can you examine the telemetry and logs from around the time it goes down? That should give more insight.
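Besides the application logs, the Docker daemon records lifecycle events for each container, which helps line up a crash with the monitoring graphs. A minimal sketch, assuming the container is named metabase; note the daemon only keeps a limited recent history of events, so run it soon after an incident.

```bash
# List die/oom/restart/start events for one container within a time window.
# Repeating --filter event=... ORs the values together.
recent_container_events() {
  local container="${1:-metabase}" since="${2:-24h}"
  docker events \
    --since "$since" \
    --until "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --filter "container=$container" \
    --filter event=die --filter event=oom \
    --filter event=restart --filter event=start \
    --format '{{.Time}} {{.Action}} exitCode={{index .Actor.Attributes "exitCode"}}'
}

# Usage: events from the last 6 hours
#   recent_container_events metabase 6h
```

A die event with no oom event next to it suggests the process exited by itself rather than being killed by the kernel.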

Wasim · January 3, 2025, 7:42am

@bayo99
Do you know how I can get the logs of the Metabase server running in a Docker container?
Also, please note that whenever the Metabase container goes down I run the commands below and then start the server again with a docker run command, so I am not sure whether I can retrieve the logs of the previously running container.

sudo docker-compose stop && docker rm -f $(docker ps -aq) && docker pull metabase/metabase:latest
docker run -d -p 3000:3000 --mount type=bind,source=$PWD/plugins/ch.jar,destination=/plugins/clickhouse.jar etc.

bayo99 · January 3, 2025, 8:38am

@Wasim
The logs should be available unless you delete the container, which looks like what you are doing.
Can you try restarting the container when it goes down instead of deleting it?
Or you can save the logs before deleting the container using the docker logs command.
Reference: docker container logs | Docker Docs
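A minimal sketch of saving the logs (and the container's final state) before removing it. The container name metabase and the output file naming are assumptions; docker logs replays the container's stdout and stderr on separate streams, so both are redirected into the file.

```bash
# Save a container's full log (with timestamps) and its final state to files.
save_container_logs() {
  local container="${1:-metabase}" outdir="${2:-.}"
  local stamp
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  local logfile="$outdir/${container}-${stamp}.log"
  # 2>&1 so lines the container wrote to stderr are kept too.
  docker logs --timestamps "$container" >"$logfile" 2>&1 || return 1
  # Exit code, OOMKilled flag, StartedAt/FinishedAt, as JSON.
  docker inspect --format '{{json .State}}' "$container" >"${logfile%.log}.state.json" || return 1
  echo "$logfile"
}

# Usage: only remove the container once the logs are safely on disk
#   save_container_logs metabase ./logs && docker rm -f metabase
```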

Wasim · January 3, 2025, 8:41am

@bayo99
Yes, you're right.
Sure, I will try setting a restart policy of "always" so the container comes back up if it goes down.
Noted, I will pull the logs before deleting the container.

Thanks, will try these things out.

Wasim · January 5, 2025, 7:09am

Hi @bayo99

I have set the restart policy to always in my docker-compose file / docker run command, so the container now restarts automatically if it stops or crashes. Since then, the container has restarted only once.
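For reference, the restart policy can also be changed on an existing container without recreating it, and then read back to confirm what Docker recorded. A sketch assuming the container name metabase; in a compose file the equivalent is a restart: always line under the service. The unless-stopped policy is an alternative that does not bring back a container you stopped by hand.

```bash
# Set a restart policy on an existing container and print the stored value.
set_restart_policy() {
  local container="${1:-metabase}" policy="${2:-always}"
  docker update --restart "$policy" "$container" >/dev/null || return 1
  docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' "$container"
}

# Usage:
#   set_restart_policy metabase always   # prints: always
```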

To find out why the container is going down, I pulled the error logs from before the server restarted with this command:
docker logs --until "2025-01-04T13:42:28.455325565Z" metabase | grep "ERROR"
This gave me the logs up to the point where the container restarted.
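Two things can hide the actual cause with that command. Piping docker logs into grep only passes the container's stdout, while anything it wrote to stderr goes straight to the terminal. Also, grep "ERROR" is case-sensitive, so a JVM abort line such as "Aborting due to java.lang.OutOfMemoryError" would not match. The sketch below narrows the window with --since/--until, merges both streams, and also matches JVM fatal messages; the function name and window values are illustrative.

```bash
# Show ERROR lines plus JVM fatal/OOM messages between two RFC 3339 timestamps.
crash_window_errors() {
  local container="$1" since="$2" until="$3"
  docker logs --timestamps --since "$since" --until "$until" "$container" 2>&1 \
    | grep -E 'ERROR|OutOfMemoryError|fatal error'
}

# Usage: the 10 minutes before the restart discussed here
#   crash_window_errors metabase 2025-01-04T13:32:28Z 2025-01-04T13:42:28.455325565Z
```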

Here are some of the logs from the 10 minutes before the container crashed:

2025-01-04 13:29:49,046 ERROR middleware.catch-exceptions :: Error processing query: ERROR: operator does not exist: interval <= integer
   "Error executing query: ERROR: operator does not exist: interval <= integer\n  Hint: No operator matches the given name and argument types. You might need to add explicit type casts.\n  Position: 514",
 "ERROR: operator does not exist: interval <= integer\n  Hint: No operator matches the given name and argument types. You might need to add explicit type casts.\n  Position: 514",
2025-01-04 13:29:58,013 ERROR middleware.catch-exceptions :: Error processing query: ERROR: operator does not exist: interval * interval
   "Error executing query: ERROR: operator does not exist: interval * interval\n  Hint: No operator matches the given name and argument types. You might need to add explicit type casts.\n  Position: 715",
 "ERROR: operator does not exist: interval * interval\n  Hint: No operator matches the given name and argument types. You might need to add explicit type casts.\n  Position: 715",
2025-01-04 13:31:30,709 ERROR middleware.process-userland-query :: Error saving field usages
2025-01-04 13:31:35,215 ERROR middleware.catch-exceptions :: Error processing query: ERROR: cannot cast type interval to integer
   :error "Error executing query: ERROR: cannot cast type interval to integer\n  Position: 963",
 :error "ERROR: cannot cast type interval to integer\n  Position: 963",
2025-01-04 13:32:52,264 ERROR middleware.process-userland-query :: Error saving field usages
2025-01-04 13:33:30,991 ERROR middleware.catch-exceptions :: Error processing query: ERROR: aggregate function calls cannot be nested
   :error "Error executing query: ERROR: aggregate function calls cannot be nested\n  Position: 186",
 :error "ERROR: aggregate function calls cannot be nested\n  Position: 186",
2025-01-04 13:33:53,455 ERROR notification.send :: [Notification 365] Error sending notification!
2025-01-04 13:34:49,990 ERROR notification.send :: [Notification 365] Error sending notification!
2025-01-04 13:34:55,639 ERROR middleware.process-userland-query :: Error saving field usages
2025-01-04 13:35:16,623 ERROR middleware.process-userland-query :: Error saving field usages
2025-01-04 13:36:18,087 ERROR middleware.process-userland-query :: Error saving field usages
2025-01-04 13:36:38,093 ERROR middleware.process-userland-query :: Error saving field usages
2025-01-04 13:36:58,919 ERROR middleware.process-userland-query :: Error saving field usages
2025-01-04 13:37:19,527 ERROR middleware.process-userland-query :: Error saving field usages
2025-01-04 13:37:22,929 ERROR middleware.catch-exceptions :: Error processing query: ERROR: operator does not exist: interval = integer
   "Error executing query: ERROR: operator does not exist: interval = integer\n  Hint: No operator matches the given name and argument types. You might need to add explicit type casts.\n  Position: 854",
 "ERROR: operator does not exist: interval = integer\n  Hint: No operator matches the given name and argument types. You might need to add explicit type casts.\n  Position: 854",
2025-01-04 13:38:05,010 ERROR middleware.catch-exceptions :: Error processing query: Cannot run the query: missing required parameters: #{"YEAR" "MONTH"}
2025-01-04 13:40:18,426 ERROR middleware.catch-exceptions :: Error processing query: ERROR: operator does not exist: interval = integer
   "Error executing query: ERROR: operator does not exist: interval = integer\n  Hint: No operator matches the given name and argument types. You might need to add explicit type casts.\n  Position: 370",
 "ERROR: operator does not exist: interval = integer\n  Hint: No operator matches the given name and argument types. You might need to add explicit type casts.\n  Position: 370",
2025-01-04 13:41:04,691 ERROR middleware.catch-exceptions :: Error processing query: Cannot run the query: missing required parameters: #{"P_Name"}
2025-01-04 13:41:10,225 ERROR middleware.catch-exceptions :: Error processing query: Cannot run the query: missing required parameters: #{"property_name"}
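Most of this excerpt is PostgreSQL errors from individual questions (interval compared with integer, nested aggregates) and questions run without required parameters. Metabase's catch-exceptions middleware reports these back for the query that failed, so on their own they probably do not stop the server. When the full log is large, counting ERROR lines per logger can show what dominates; a small sketch that reads a saved log on stdin:

```bash
# Count ERROR lines per logger namespace (the token right after "ERROR ").
# "ERROR:" inside PostgreSQL messages is not counted, since it lacks the space.
summarize_errors() {
  grep -oE 'ERROR [A-Za-z0-9_.-]+' | sort | uniq -c | sort -rn
}

# Usage:
#   summarize_errors < metabase.log
```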

Wasim · January 5, 2025, 7:14am

This is the output of the docker inspect metabase command:

[
    {
        "Id": "ab35a0e16db933763d68a37938d5e198d9e3e0418adae23634040fd08a98e23f",
        "Created": "2025-01-03T08:45:11.299117633Z",
        "Path": "/app/run_metabase.sh",
        "Args": [],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 3866911,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2025-01-04T13:42:28.842298989Z",
            "FinishedAt": "2025-01-04T13:42:28.455325565Z"
        },
        "Image": "sha256:11a6a29b2fda9958d053ce8195de200a9241e98c8032673080b375417a4db3c1",
        "ResolvConfPath": "/var/lib/docker/containers/ab35a0e16db933763d68a37938d5e198d9e3e0418adae23634040fd08a98e23f/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/ab35a0e16db933763d68a37938d5e198d9e3e0418adae23634040fd08a98e23f/hostname",
        "HostsPath": "/var/lib/docker/containers/ab35a0e16db933763d68a37938d5e198d9e3e0418adae23634040fd08a98e23f/hosts",
        "LogPath": "/var/lib/docker/containers/ab35a0e16db933763d68a37938d5e198d9e3e0418adae23634040fd08a98e23f/ab35a0e16db933763d68a37938d5e198d9e3e0418adae23634040fd08a98e23f-json.log",
        "Name": "/metabase",
        "RestartCount": 1,
         .......
  }
]
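The full inspect output is long; only a few fields explain a restart. Note that "OOMKilled": false only says the kernel OOM killer did not kill the container: a JVM that exhausts its own heap aborts by itself, so this flag can stay false. Also, while the container is running, ExitCode appears to describe the current run rather than the crash, so the die event or the saved logs are better sources for the previous exit. A sketch, assuming the container name metabase:

```bash
# Print only the fields relevant to a restart.
restart_summary() {
  local container="${1:-metabase}"
  docker inspect --format \
    'restarts={{.RestartCount}} status={{.State.Status}} oomKilled={{.State.OOMKilled}} exitCode={{.State.ExitCode}} startedAt={{.State.StartedAt}} finishedAt={{.State.FinishedAt}}' \
    "$container"
}

# Usage:
#   restart_summary metabase
```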

bayo99 · January 6, 2025, 6:09am

@Wasim
From the errors, it looks like they are making the container unhealthy, hence the frequent restarts.
Query errors should usually also show up in the UI.
Can you try to identify the queries that are causing the issues?
Also, which version of Metabase are you running?

Wasim · January 6, 2025, 6:50am

@bayo99
Yes, I did a further RCA on the logs from just before the Metabase container restarted, and found that at the end of the logs the crash is caused by a JVM heap memory issue, listed below. Possible triggers are events such as large queries and database syncs.

Aborting due to java.lang.OutOfMemoryError: Java heap space
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (debug.cpp:271), pid=1, tid=913
#  fatal error: OutOfMemory encountered: Java heap space
#
# JRE version: OpenJDK Runtime Environment Temurin-21.0.5+11 (21.0.5+11) (build 21.0.5+11-LTS)
# Java VM: OpenJDK 64-Bit Server VM Temurin-21.0.5+11 (21.0.5+11-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to //core.1)
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid1.log
[104232.601s][warning][os] Loading hsdis library failed
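The crash output says the error report was saved as /tmp/hs_err_pid1.log inside the container. Because the container is now restarted rather than removed, that file should still be in the container's filesystem and can be copied out. The grep pattern below is an assumption about which lines are useful (the header and heap-related lines); the whole file is worth reading.

```bash
# Copy the JVM fatal error report out of the container and show key lines.
fetch_jvm_error_report() {
  local container="${1:-metabase}" dest="${2:-.}"
  docker cp "$container:/tmp/hs_err_pid1.log" "$dest/" || return 1
  # First few lines mentioning OOM, heap, or heap-size flags, with line numbers.
  grep -n -m 5 -E 'OutOfMemory|Heap|-Xmx|MaxHeapSize' "$dest/hs_err_pid1.log"
}

# Usage:
#   fetch_jvm_error_report metabase ./crash-reports
```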

I will have to identify the queries that are causing this issue,
and possibly tweak the sync settings for the databases I have added to Metabase. By default, Metabase does a lightweight hourly sync and an intensive daily scan of field values.
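Alongside finding the heavy queries, the heap can be sized explicitly. Metabase's Docker documentation passes JVM flags through the JAVA_OPTS environment variable (for example -Xmx2g). The sketch below recreates the container with an explicit heap and asks the JVM to write a heap dump on the next OutOfMemoryError, which can show what filled the heap. The 3g default is only an illustration; leave headroom for the OS and anything else on the host, and expect a heap dump to be roughly the size of the heap. The other options from the original docker run command (shown above only up to "etc.") are not reproduced here and would need to be added back.

```bash
# Recreate Metabase with an explicit JVM heap. Remove the old container first
# (after saving its logs). Heap size is a placeholder; the plugin mount is the
# one from the docker run command earlier in this thread.
run_metabase_with_heap() {
  local heap="${1:-3g}"
  docker run -d --name metabase --restart always \
    -p 3000:3000 \
    --mount "type=bind,source=$PWD/plugins/ch.jar,destination=/plugins/clickhouse.jar" \
    -e "JAVA_OPTS=-Xmx${heap} -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp" \
    metabase/metabase:latest
}

# Usage:
#   docker rm -f metabase && run_metabase_with_heap 3g
```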

I am running the latest Metabase, v0.52.4.

Luiggi · January 6, 2025, 7:43am

Please post the logs from when Metabase starts up.
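One way to pull just the startup portion, rather than the whole log, is to start from the StartedAt time Docker records for the current run. A sketch assuming the container name metabase; the 300-line cut-off is arbitrary.

```bash
# Print the first N log lines since the container's current start.
startup_logs() {
  local container="${1:-metabase}" lines="${2:-300}"
  local started
  started="$(docker inspect --format '{{.State.StartedAt}}' "$container")" || return 1
  docker logs --timestamps --since "$started" "$container" 2>&1 | head -n "$lines"
}

# Usage:
#   startup_logs metabase 300 > metabase-startup.log
```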

Wasim · January 7, 2025, 7:47am

Sure, @Luiggi. The logs are large (9.7 MB); how would you like me to share the log file?