We have a Metabase instance running on Kubernetes, deployed via Terraform. This is how the liveness_probe is configured:
liveness_probe {
  exec {
    command = [
      "wget",
      "localhost:3000/api/health",
      "-q",
      "-O",
      "-"
    ]
  }

  initial_delay_seconds = 300
  period_seconds        = 10
  failure_threshold     = 30
  timeout_seconds       = 10
}
And the resources are set like this:
resources {
  limits = {
    memory = "12.0Gi"
    cpu    = "2.0"
  }
  requests = {
    memory = "12.0Gi"
    cpu    = "2.0"
  }
}
However, every 30 minutes to an hour, a random failure occurs:
Warning Unhealthy pod/metabase-deployment-dfawdada-9wagaw Liveness probe failed:
This error is not very descriptive and we can't figure out exactly what is causing this crash. Do the liveness_probe
settings need to be adjusted?
Hi @timsmith12
Post "Diagnostic Info" from Admin > Troubleshooting.
Have you checked the Metabase logs or resource monitors around the time of the probe failure?
Why are you using an exec probe instead of httpGet?
{
  "browser-info": {
    "language": "en-US",
    "platform": "MacIntel",
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36",
    "vendor": "Google Inc."
  },
  "system-info": {
    "file.encoding": "UTF-8",
    "java.runtime.name": "OpenJDK Runtime Environment",
    "java.runtime.version": "11.0.16+8",
    "java.vendor": "Eclipse Adoptium",
    "java.vendor.url": "https://adoptium.net/",
    "java.version": "11.0.16",
    "java.vm.name": "OpenJDK 64-Bit Server VM",
    "java.vm.version": "11.0.16+8",
    "os.name": "Linux",
    "os.version": "5.4.204-113.362.amzn2.x86_64",
    "user.language": "en",
    "user.timezone": "GMT"
  },
  "metabase-info": {
    "databases": [
      "postgres",
      "h2"
    ],
    "hosting-env": "unknown",
    "application-database": "postgres",
    "application-database-details": {
      "database": {
        "name": "PostgreSQL",
        "version": "12.11"
      },
      "jdbc-driver": {
        "name": "PostgreSQL JDBC Driver",
        "version": "42.4.1"
      }
    },
    "run-mode": "prod",
    "version": {
      "date": "2022-08-16",
      "tag": "v0.44.1",
      "branch": "release-x.44.x",
      "hash": "112f5aa"
    },
    "settings": {
      "report-timezone": null
    }
  }
}
And yes, the Metabase pod we have deployed only gives this error:
Warning Unhealthy pod/metabase-deployment-dfawdada-9wagaw Liveness probe failed:
Kind of strange that it ends with a colon ":", as if the rest of the error never got appended, which makes it a lot more difficult to trace what is happening.
@timsmith12 That log is from Kubernetes, not Metabase. Try checking the Metabase log or resources around the time of the problem.
Latest release is 0.44.6
Perhaps you're seeing this issue: https://github.com/metabase/metabase/issues/26266 - upvote by clicking on the first post.
Appreciate the help, flamber! Another question: how can I get the Metabase log once it crashes? The pod seems to start fresh, and I can't view the error once the liveness probe kills it.
@timsmith12 So you are not recording any logs of any pods (unless the applications themselves write to a log somewhere "physical")? That seems like a Kubernetes configuration you're missing. Not really specific to Metabase.
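If the pod was restarted by the probe, you can usually still pull the previous container's output (assuming the node hasn't recycled it yet) with:

kubectl logs metabase-deployment-dfawdada-9wagaw --previous

The --previous flag shows the logs of the last terminated container, so you can see what Metabase printed right before the restart. For anything longer-term you'll want proper log shipping.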
But change your probe to httpGet instead of exec, and set up a lot more logging in Kubernetes.
If Metabase is working, but a probe is restarting the pod, then disable or adjust the probe.
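For reference, a minimal sketch of what the httpGet version could look like with the Terraform Kubernetes provider, assuming the container still serves the health endpoint on port 3000 (keep or tune the thresholds as needed):

liveness_probe {
  http_get {
    # Same endpoint the wget command was hitting
    path = "/api/health"
    port = 3000
  }

  initial_delay_seconds = 300
  period_seconds        = 10
  failure_threshold     = 30
  timeout_seconds       = 10
}

With httpGet, the probe failure event should include the HTTP status code or connection error instead of the empty output that wget -q produces, which is likely why your event message ends in a bare colon.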