Hi,
I’m running Metabase in Docker with MySQL as the application database.
I recently tried upgrading from v0.43.2 straight to v0.56.0.1, but after startup, the service never becomes healthy.
What I did:
- Stopped my v0.43.2 container (running fine).
- Pulled the v0.56.0.1 image and started it using the same MySQL DB connection.
- Watched the logs during startup.
What happens:
The migration log shows success for migrations/001_update_migrations.yaml, but afterwards, /api/health continuously returns 503. The log just repeats the health check errors every ~15 seconds.
Logs:
2025-08-11 02:27:06,387 INFO liquibase.changelog :: ChangeSet migrations/001_update_migrations.yaml::v50.2024-01-10T03:27:31::noahmoss ran successfully in 152ms
2025-08-11 02:27:12,461 ERROR middleware.log :: HEAD /api/health 503 0ms (0 DB calls) {:metabase-user-id nil}
2025-08-11 02:27:27,498 ERROR middleware.log :: HEAD /api/health 503 0ms (0 DB calls) {:metabase-user-id nil}
...
Environment:
Other notes:
Has anyone run into this before? Should I try upgrading via intermediate releases (e.g., v0.50.x) or is this upgrade path supported?
Thanks!
Are you sure the migrations finished? There's a particularly large one in there that can take a while to finish. The step after v50.2024-01-10T03:27:31 creates data_permissions and migrates data from the previous permissions table, which I believe can get quite large if you have a lot of Metabase objects.
Usually there's an accompanying log message if you try to hit Metabase while startup is still running; I don't know if that applies to the health check or not. Make sure the container doesn't try to auto-heal itself (i.e. terminate & restart) during the migration, otherwise it may never finish.
If you want to peek at what the migration is doing, log into the metabase database and look at the databasechangelog table; the most recent entry by orderexecuted marked EXECUTED is what's been done so far. The 'vXX' in the id field is the Metabase version the rule was written for. If things are still running, you should be able to see them with the usual DB monitoring tools (SHOW FULL PROCESSLIST for MySQL, etc.).
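Something along these lines should show the most recent changesets (a sketch using the standard Liquibase column names; adjust to taste):

-- Last few changesets recorded by Liquibase, newest first
SELECT id, author, dateexecuted, orderexecuted, exectype
FROM databasechangelog
ORDER BY orderexecuted DESC
LIMIT 5;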
I checked MySQL using SHOW PROCESSLIST and found a query running for a long time:
-- Insert 'no' permissions for any table and group combinations that weren't covered by the previous query
INSERT INTO data_permissions (group_id, perm_type, db_id, schema_name, table_id, perm_value)
SELECT
    pg.id AS group_id,
    'perms/download-results' AS perm_type,
    mt.db_id,
    mt.schema AS schema_name,
    mt.id AS table_id,
    'no' AS perm_value
FROM permissions_group pg
CROSS JOIN metabase_table mt
WHERE NOT EXISTS (
    SELECT 1
    FROM data_permissions dp
    WHERE dp.group_id = pg.id
      AND dp.db_id = mt.db_id
      AND (dp.table_id = mt.id OR dp.table_id IS NULL)
      AND dp.perm_type = 'perms/download-results'
)
AND pg.name != 'Administrators'
It seems this is part of a migration step to populate missing perms/download-results entries in data_permissions.
Notes:
- In my case, the permissions_group and metabase_table tables are fairly large, so this cross join produces a lot of rows. Row counts in my environment (see the sanity-check query after this list):
  - permissions_group = 67 rows
  - metabase_table = 27,729 rows
  - CROSS JOIN = ~1.85 million combinations
- This might explain why the upgrade hangs: the migration query could be running for a very long time before allowing the application to finish starting.
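For reference, this is roughly how I arrived at that combination count (a quick sketch against the same application-database tables, nothing more):

-- Rough sanity check of the cross-join size: 67 * 27,729 ≈ 1.85 million
SELECT (SELECT COUNT(*) FROM permissions_group) *
       (SELECT COUNT(*) FROM metabase_table) AS cross_join_rows;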
Is this expected for large deployments, and is there any recommended way to speed this up? For example, running this manually with indexes in place before upgrading?
1.8 million rows is nothing, unless you're running your app database on a WiFi router.
Plus the WHERE NOT EXISTS triggers a semi-join optimization so you aren't getting all those rows materialized anyway ... assuming you aren't running an old-as-dirt MySQL. 5.7 isn't supported by Metabase (or anybody) anymore and will explode later on if you're still running that dinosaur. 8.x should be fine.
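If you want to confirm what the optimizer is actually doing with it, you can EXPLAIN just the SELECT half of the statement (a sketch against the same tables; FORMAT=TREE needs MySQL 8.0.16 or newer):

-- Plan only the SELECT part of the migration statement
EXPLAIN FORMAT=TREE
SELECT pg.id, mt.id
FROM permissions_group pg
CROSS JOIN metabase_table mt
WHERE NOT EXISTS (
    SELECT 1
    FROM data_permissions dp
    WHERE dp.group_id = pg.id
      AND dp.db_id = mt.db_id
      AND (dp.table_id = mt.id OR dp.table_id IS NULL)
      AND dp.perm_type = 'perms/download-results'
)
AND pg.name != 'Administrators';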
I ran this on my local PC to simulate the production upgrade. I'm currently using MySQL 8.0 running in Docker on Windows, on an i9 machine. I've let the query run for over an hour, but it still hasn't finished. Could this be because MySQL is running inside Docker and isn't getting full use of the machine's resources?
In the production environment, it runs on a managed database with 4 vCPUs and 16 GB RAM.
If you have resource limits on the container, it's certainly a possibility. Databases want lots of memory for cache and fast storage. A dedicated server (or a VM with more resources than your workstation) is going to perform better. That said, there's a LOT of migration work to do with a version jump that large, and it's going to take a while. I would plan for an extended downtime.
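If you want to see what the containerized MySQL is actually working with, standard introspection like this will tell you (nothing Metabase-specific, just a starting point):

-- How much memory InnoDB has for caching (the default is 128M, which is tiny for a migration like this)
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
-- General engine status, including buffer pool activity and pending I/O
SHOW ENGINE INNODB STATUS;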
And of course, have a backup of the database in the event something goes wrong.
We explicitly ask people to upgrade major version by major version (using the latest minor release of each major version) when moving Metabase up from very old versions. That will also give you more information about which migration is going wrong.