Slow Metabase after upgrade to 0.43

Hello :wave:

We've upgraded to 0.43.3 last Tuesday and have been experiencing a slow app since then, and especially today end of afternoon (almost impossible to reach, getting 504s).
It's a self-hosted Metabase instance (EC2) with RDS as DB. We've started investigating:

  • there is a spike in CPU and running queries on the RDS instance
  • there's a bunch of looking-alike queries that are super slow (below is the execution plan)
  • we suspect it comes from the new homepage that's looking for the last viewed items per user

We currently adding an index on query_execution and it takes a long time. Is that a known issue with 0.43.3?

Thanks for your help! :pray:

Note - I'll post the Diagnostic Info when I am able to reach the Admin space

{
  "browser-info": {
    "language": "en-GB",
    "platform": "MacIntel",
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36",
    "vendor": "Google Inc."
  },
  "system-info": {
    "file.encoding": "UTF-8",
    "java.runtime.name": "OpenJDK Runtime Environment",
    "java.runtime.version": "1.8.0_312-b07",
    "java.vendor": "Red Hat, Inc.",
    "java.vendor.url": "https://www.redhat.com/",
    "java.version": "1.8.0_312",
    "java.vm.name": "OpenJDK 64-Bit Server VM",
    "java.vm.version": "25.312-b07",
    "os.name": "Linux",
    "os.version": "4.14.281-212.502.amzn2.x86_64",
    "user.language": "en",
    "user.timezone": "UTC"
  },
  "metabase-info": {
    "databases": [
      "postgres",
      "snowflake"
    ],
    "hosting-env": "unknown",
    "application-database": "postgres",
    "application-database-details": {
      "database": {
        "name": "PostgreSQL",
        "version": "12.8"
      },
      "jdbc-driver": {
        "name": "PostgreSQL JDBC Driver",
        "version": "42.3.2"
      }
    },
    "run-mode": "prod",
    "version": {
      "tag": "v0.43.3",
      "date": "2022-06-13",
      "branch": "release-x.43.x",
      "hash": "c9c7ef0"
    },
    "settings": {
      "report-timezone": null
    }
  }
}

Hi @Antoip
Upgrade to Java 11: https://www.metabase.com/docs/latest/operations-guide/java-versions.html
Have a look at Questions very long to save
Without knowing which version you upgraded from, then it makes it a lot more difficult to help.

Thanks @flamber for your answers !

We upgraded today from 0.43.3 to 0.43.4 today (we fetch the latest jar) while switching to a new Java version (from java-1.8.0-openjdk to java-17-amazon-corretto), as per you recommendation

Since the deployment

  • Metabase queries that are run on our Snowflake DB don't seem to go through... ever.
  • We don't have the same issue for queries / questions that are run on other DB (Postgres DBs)
  • our DB logs indicate that most POST /api/*/query result in [ASYNC: canceled] 1.0 mins. Many threads are blocked : snowflake DB 4 connections: 1/1 (5670 threads blocked)

Would you have some explanations / explanations on where we should further investigate?
Thanks a lot :pray:
Have a great day,
Cyril

Diagnostic Info

{
  "browser-info": {
    "language": "fr-FR",
    "platform": "MacIntel",
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
    "vendor": "Google Inc."
  },
  "system-info": {
    "file.encoding": "UTF-8",
    "java.runtime.name": "OpenJDK Runtime Environment",
    "java.runtime.version": "17.0.3+6-LTS",
    "java.vendor": "Amazon.com Inc.",
    "java.vendor.url": "https://aws.amazon.com/corretto/",
    "java.version": "17.0.3",
    "java.vm.name": "OpenJDK 64-Bit Server VM",
    "java.vm.version": "17.0.3+6-LTS",
    "os.name": "Linux",
    "os.version": "4.14.281-212.502.amzn2.x86_64",
    "user.language": "en",
    "user.timezone": "UTC"
  },
  "metabase-info": {
    "databases": [
      "postgres",
      "snowflake"
    ],
    "hosting-env": "unknown",
    "application-database": "postgres",
    "application-database-details": {
      "database": {
        "name": "PostgreSQL",
        "version": "12.8"
      },
      "jdbc-driver": {
        "name": "PostgreSQL JDBC Driver",
        "version": "42.3.2"
      }
    },
    "run-mode": "prod",
    "version": {
      "date": "2022-06-27",
      "tag": "v0.43.4",
      "branch": "release-x.43.x",
      "hash": "61cc28e"
    },
    "settings": {
      "report-timezone": null
    }
  }
}

Connection Pool Details


{"connection-pools":{"db-50-postgres-be_api":{"numConnections":1,"numIdleConnections":1,"numBusyConnections":0,"minPoolSize":1,"maxPoolSize":15},"db-48-postgres-turing":{"numConnections":1,"numIdleConnections":1,"numBusyConnections":0,"minPoolSize":1,"maxPoolSize":15},"db-52-postgres-jour_prod":{"numConnections":1,"numIdleConnections":1,"numBusyConnections":0,"minPoolSize":1,"maxPoolSize":15},"db-8-postgres-metabase":{"numConnections":1,"numIdleConnections":1,"numBusyConnections":0,"minPoolSize":1,"maxPoolSize":15},"metabase-postgres-app-db":{"numConnections":15,"numIdleConnections":15,"numBusyConnections":0,"minPoolSize":1,"maxPoolSize":15},"db-4-snowflake-TURING":{"numConnections":2,"numIdleConnections":0,"numBusyConnections":2,"minPoolSize":1,"maxPoolSize":15},"db-49-postgres-es_api":{"numConnections":1,"numIdleConnections":1,"numBusyConnections":0,"minPoolSize":1,"maxPoolSize":15},"db-51-postgres-eu_tools":{"numConnections":1,"numIdleConnections":1,"numBusyConnections":0,"minPoolSize":1,"maxPoolSize":15},"db-2-postgres-fr_api":{"numConnections":1,"numIdleConnections":1,"numBusyConnections":0,"minPoolSize":1,"maxPoolSize":15}}}

DB logs extract

[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:40+02:00 INFO metabase.query-processor.card Question's average execution duration is 411.0 ms; using 'magic' TTL of 8.0 s 💾
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:40+02:00 DEBUG metabase.server.middleware.log POST /api/dashboard/1118/dashcard/12816/card/14737/query 202 [ASYNC: canceled] 1.0 mins (24 DB calls) App DB connections: 0/15 Jetty threads: 20/300 (6 idle, 0 queued) (6,036 total active threads) Queries in flight: 107 (0 queued); snowflake DB 4 connections: 1/1 (5670 threads blocked)
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:40+02:00 DEBUG metabase.server.middleware.log POST /api/dashboard/1118/dashcard/13029/card/14801/query 202 [ASYNC: canceled] 1.0 mins (24 DB calls) App DB connections: 0/15 Jetty threads: 20/300 (6 idle, 0 queued) (6,036 total active threads) Queries in flight: 106 (0 queued); snowflake DB 4 connections: 1/1 (5667 threads blocked)
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:40+02:00 DEBUG metabase.server.middleware.log POST /api/dashboard/1118/dashcard/13031/card/14740/query 202 [ASYNC: canceled] 1.0 mins (24 DB calls) App DB connections: 0/15 Jetty threads: 20/300 (6 idle, 0 queued) (6,036 total active threads) Queries in flight: 104 (0 queued); snowflake DB 4 connections: 1/1 (5666 threads blocked)
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:40+02:00 DEBUG metabase.server.middleware.log POST /api/dashboard/1118/dashcard/13502/card/15272/query 202 [ASYNC: canceled] 1.0 mins (24 DB calls) App DB connections: 0/15 Jetty threads: 20/300 (6 idle, 0 queued) (6,036 total active threads) Queries in flight: 104 (0 queued); snowflake DB 4 connections: 1/1 (5669 threads blocked)
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:40+02:00 DEBUG metabase.server.middleware.log POST /api/dashboard/1118/dashcard/12815/card/14736/query 202 [ASYNC: canceled] 1.0 mins (24 DB calls) App DB connections: 0/15 Jetty threads: 20/300 (6 idle, 0 queued) (6,036 total active threads) Queries in flight: 103 (0 queued); snowflake DB 4 connections: 1/1 (5668 threads blocked)
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:40+02:00 DEBUG metabase.server.middleware.log POST /api/dashboard/1118/dashcard/12817/card/14738/query 202 [ASYNC: canceled] 1.0 mins (24 DB calls) App DB connections: 0/15 Jetty threads: 22/300 (5 idle, 0 queued) (6,036 total active threads) Queries in flight: 102 (0 queued); snowflake DB 4 connections: 1/1 (5671 threads blocked)
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:50+02:00 INFO metabase.server.middleware.exceptions Request canceled before finishing.
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:51+02:00 INFO metabase.query-processor.card Question's average execution duration is 697.0 ms; using 'magic' TTL of 14.0 s 💾
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:52+02:00 INFO metabase.query-processor.card Question's average execution duration is 601.0 ms; using 'magic' TTL of 12.0 s 💾
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:52+02:00 DEBUG metabase.server.middleware.log POST /api/dashboard/1171/dashcard/13642/card/14768/query 202 [ASYNC: canceled] 1.0 mins (27 DB calls) App DB connections: 4/15 Jetty threads: 25/300 (5 idle, 0 queued) (6,038 total active threads) Queries in flight: 102 (0 queued); snowflake DB 4 connections: 2/2 (5673 threads blocked)
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:52+02:00 DEBUG metabase.server.middleware.log POST /api/dashboard/1171/dashcard/13668/card/14800/query 202 [ASYNC: canceled] 1.0 mins (27 DB calls) App DB connections: 3/15 Jetty threads: 27/300 (4 idle, 0 queued) (6,039 total active threads) Queries in flight: 102 (0 queued); snowflake DB 4 connections: 2/2 (5676 threads blocked)
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:52+02:00 INFO metabase.query-processor.card Question's average execution duration is 677.0 ms; using 'magic' TTL of 14.0 s 💾
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:52+02:00 DEBUG metabase.server.middleware.log POST /api/dashboard/1171/dashcard/13941/card/15593/query 202 [ASYNC: canceled] 1.0 mins (27 DB calls) App DB connections: 5/15 Jetty threads: 24/300 (5 idle, 0 queued) (6,038 total active threads) Queries in flight: 100 (1 queued); snowflake DB 4 connections: 2/2 (5675 threads blocked)
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:52+02:00 DEBUG metabase.server.middleware.log POST /api/dashboard/1171/dashcard/13942/card/15597/query 202 [ASYNC: canceled] 1.0 mins (27 DB calls) App DB connections: 4/15 Jetty threads: 25/300 (5 idle, 0 queued) (6,038 total active threads) Queries in flight: 102 (0 queued); snowflake DB 4 connections: 0/0 (5673 threads blocked)
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:52+02:00 DEBUG metabase.server.middleware.log POST /api/dashboard/1171/dashcard/14052/card/15767/query 202 [ASYNC: canceled] 1.0 mins (27 DB calls) App DB connections: 2/15 Jetty threads: 25/300 (5 idle, 0 queued) (6,039 total active threads) Queries in flight: 100 (0 queued); snowflake DB 4 connections: 0/0 (5674 threads blocked)
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:52+02:00 DEBUG metabase.server.middleware.log POST /api/dashboard/1171/dashcard/13667/card/14799/query 202 [ASYNC: canceled] 1.0 mins (27 DB calls) App DB connections: 5/15 Jetty threads: 24/300 (5 idle, 0 queued) (6,038 total active threads) Queries in flight: 100 (1 queued); snowflake DB 4 connections: 2/2 (5674 threads blocked)
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:52+02:00 INFO metabase.query-processor.card Question's average execution duration is 621.0 ms; using 'magic' TTL of 12.0 s 💾
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:52+02:00 INFO metabase.query-processor.card Question's average execution duration is 509.0 ms; using 'magic' TTL of 10.0 s 💾
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:52+02:00 DEBUG metabase.server.middleware.log GET /api/session/properties 200 11.0 ms (4 DB calls) App DB connections: 0/15 Jetty threads: 22/300 (5 idle, 0 queued) (6,041 total active threads) Queries in flight: 102 (0 queued)
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:52+02:00 DEBUG metabase.server.middleware.log GET /api/user/current 200 22.2 ms (9 DB calls) App DB connections: 0/15 Jetty threads: 22/300 (5 idle, 0 queued) (6,043 total active threads) Queries in flight: 102 (0 queued)
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:52+02:00 DEBUG metabase.server.middleware.log GET /api/collection/root 200 5.7 ms (2 DB calls) App DB connections: 0/15 Jetty threads: 21/300 (5 idle, 0 queued) (6,045 total active threads) Queries in flight: 102 (0 queued)
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:52+02:00 DEBUG metabase.server.middleware.log GET /api/bookmark 200 3.1 ms (1 DB calls) App DB connections: 0/15 Jetty threads: 22/300 (5 idle, 0 queued) (6,045 total active threads) Queries in flight: 102 (0 queued)
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:52+02:00 DEBUG metabase.server.middleware.log GET /api/collection/tree 200 48.7 ms (5 DB calls) App DB connections: 0/15 Jetty threads: 22/300 (5 idle, 0 queued) (6,045 total active threads) Queries in flight: 102 (0 queued)
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:52+02:00 DEBUG metabase.server.middleware.log GET /api/database 200 16.4 ms (3 DB calls) App DB connections: 0/15 Jetty threads: 22/300 (5 idle, 0 queued) (6,045 total active threads) Queries in flight: 102 (0 queued)
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:52+02:00 DEBUG metabase.server.middleware.log GET /api/dashboard/1113 200 32.4 ms (16 DB calls) App DB connections: 1/15 Jetty threads: 21/300 (5 idle, 0 queued) (6,045 total active threads) Queries in flight: 102 (0 queued)
[0022f142-ad4f-4519-80f5-8d5a58285142] 2022-07-05T16:33:52+02:00 DEBUG metabase.server.middleware.log GET /api/collection/1348 200 11.9 ms (4 DB calls) App DB connections: 0/15 Jetty threads: 21/300 (5 idle, 0 queued) (6,045 total active threads) Queries in flight: 102 (0 queued)

New hypothesis : could relate to this issue

@Cyril Correct. As I recommended Java 11 and will continue to do that for a little while longer.