Slow metabase queries over postgresql

zokie · May 28, 2024, 10:57am

Hi!

I'm running metabase v0.49.12 in kubernetes cluster and have troubles regardin query time in postgresql.
Same query in metabase takes 65 sec while in dbeaver or over psql takes ~17 sec
Some query takes 4min in dbeaver and cant be executed in metabase due to 10 min timeout.

Might be some settings which I can tweak to increase performance? I've tried several things but nothing seems to work

Here's diagnostics info:

{
"browser-info": {
"language": "en-GB",
"platform": "MacIntel",
"userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4.1 Safari/605.1.15",
"vendor": "Apple Computer, Inc."
},
"system-info": {
"file.encoding": "UTF-8",
"java.runtime.name": "OpenJDK Runtime Environment",
"java.runtime.version": "11.0.23+9",
"java.vendor": "Eclipse Adoptium",
"java.vendor.url": "https://adoptium.net/",
"java.version": "11.0.23",
"java.vm.name": "OpenJDK 64-Bit Server VM",
"java.vm.version": "11.0.23+9",
"os.name": "Linux",
"os.version": "5.4.0-147-generic",
"user.language": "en",
"user.timezone": "UTC"
},
"metabase-info": {
"databases": [
"h2",
"postgres"
],
"hosting-env": "unknown",
"application-database": "postgres",
"application-database-details": {
"database": {
"name": "PostgreSQL",
"version": "15.1"
},
"jdbc-driver": {
"name": "PostgreSQL JDBC Driver",
"version": "42.7.2"
}
},
"run-mode": "prod",
"version": {
"date": "2024-05-23",
"tag": "v0.49.12",
"hash": "77e7a81"
},
"settings": {
"report-timezone": null
}
}
}

AndrewMBaines · May 29, 2024, 5:54am

What's the query?

zokie · May 29, 2024, 8:19am

Here's the heavy query which runs ~4min in dbeaver

SELECT
t.token,
SUM((query.value = 'true' AND query.key = 'return_crops')::int) AS return_crops_true_count,
SUM((query.value = 'true' AND query.key = 'use_internal_api')::int) AS use_internal_api_true_count,
SUM((query.value = 'true' AND query.key = 'normalization_fias')::int) AS normalization_fias_true_count,
SUM((query.value = 'true' AND query.key = 'check_fake')::int) AS check_fake_true_count,
SUM((query.value = 'true' AND query.key = 'classify_image_origin')::int) AS classify_image_origin_true_count,
SUM((query.value = 'true' AND query.key = 'check_fake_visual')::int) AS check_fake_visual_true_count,
SUM((query.value = 'true' AND query.key = 'with_hitl')::int) AS with_hitl_true_count,
SUM((query.value = 'true' AND query.key = 'mrz_compare')::int) AS mrz_compare_true_count,
SUM((query.value = 'true' AND query.key = 'external_check_passport_complex')::int) AS external_check_passport_complex_true_count
FROM
tasks t,
jsonb_each(t.query_args) AS query
WHERE
t.status_code = 200
GROUP BY
t.token
ORDER BY
t.token;

I've checked query plan and it's identical in metabase and psql.

AndrewMBaines · May 29, 2024, 9:13am

I do nothing with JSON in databases (prefer to use ETL to flatten), so can't be much help on the query side.
Can you see where the bottleneck is - the query running at Postgres or the data returned/processed at the Metabase end?

zokie · May 29, 2024, 11:07am

Guess it's some connection setting problem

I've noticed that if I use psql or pgweb - it spawns 3 connections (client backed + 2 parallel workers)
But when I'm running query with metabase it spawns only one worker.

Is it possible to adjust such behaviour?

zokie · May 29, 2024, 11:20am

I've resolved it using 'preferQueryMode=simple'