Killing too-long queries

I didn’t find any built-in functionality to prevent users from running queries that are too long or too heavy.

I know most people would rather increase the browser timeout, but in our case Metabase is used by multiple people. They often create overly long or complex queries that saturate the database CPU, and then everyone loses access to Metabase, even though it’s business-critical for some users.

Hence we have added Percona’s pt-kill to kill all queries running longer than 10 minutes, but it would be nice to have this option directly within Metabase’s admin panel (at the database level).
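For anyone wanting to replicate this, here is a sketch of a pt-kill invocation matching our 10-minute threshold (the hostname and user are placeholders for your own values; tune --busy-time to taste):

```shell
# Kill any query that has been running for more than 10 minutes (600 s),
# checking the processlist every 30 seconds. --print logs each kill.
pt-kill \
  --host mysql.example.internal \
  --user pt_kill_user \
  --ask-pass \
  --busy-time 600 \
  --interval 30 \
  --match-command Query \
  --kill \
  --print \
  --daemonize
```

Dropping --kill and keeping only --print is a safe way to dry-run the filter first: pt-kill will then just report the queries it would have killed.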

(Maybe it’s another topic, but once the CPU hits 100% and the database stops responding, Metabase goes down. Then when the database recovers, Metabase stays down even though all AWS monitoring looks perfectly fine. Once an AWS admin restarts the server, everything works fine again.)

The database in question is MySQL.
We use Metabase through Docker on AWS EBS, and we were on version 0.34 when the not-recovering issue happened last week.

Hi @Maxime
A lot has changed since 0.34 - the connection handling was almost completely revamped in 0.35, with more changes in the following versions. The latest release is 0.38.1.
I would recommend upgrading (remember to backup first!) and reading this article:
https://www.metabase.com/learn/data-diet/analytics/metabase-at-scale.html

Hi Flamber,

I’ve looked at the article which gives interesting insights but none fitting to my situation:
we already have a BigQuery data warehouse, but it’s not entirely up to date (new tables are not added automatically, and the data is always one day behind), which is why the analysts sometimes still run queries against the MySQL db.
And no matter how much I tell them “use indexes”, “use EXPLAIN” or other recommendations, they don’t necessarily apply them :wink: Hence we need to prevent them from running any overly heavy query regardless, so that customer service, for example, isn’t impacted.

I did the upgrade to 0.38.1 this morning, we’ll see if the problems persist anyway!

@Maxime Metabase syncs with your data sources every hour (by default), so if it takes a day before you see changes, then perhaps it was changed to a different schedule:
https://www.metabase.com/docs/latest/administration-guide/01-managing-databases.html#database-sync-and-analysis

If changes are done on your data source, then you can use the API to force a new sync:
https://github.com/metabase/metabase/blob/master/docs/api-documentation.md#post-apidatabaseidsync_schema
The best way to learn the API is to just use Metabase with your browser’s developer Network tab open, looking at the requests and what data is being sent/received.
https://www.metabase.com/learn/developing-applications/advanced-metabase/metabase-api.html
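As a sketch, forcing a sync from the command line looks something like this (the host, credentials, and database id 2 are placeholders - substitute your own):

```shell
# Log in to get a session token (replace host and credentials).
SESSION=$(curl -s -X POST https://metabase.example.com/api/session \
  -H "Content-Type: application/json" \
  -d '{"username": "admin@example.com", "password": "secret"}' \
  | sed -E 's/.*"id":"([^"]+)".*/\1/')

# Trigger a schema sync for database id 2, authenticated via the token.
curl -s -X POST https://metabase.example.com/api/database/2/sync_schema \
  -H "X-Metabase-Session: $SESSION"
```

You can find your database’s id in the URL when viewing it under Admin > Databases.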

Some users might need to refresh their browser, since some objects are cached.

@flamber “if it takes a day before you see changes”: this is not due to Metabase at all; it’s our process of replicating the MySQL database into BigQuery, which runs once per night. Today’s data is accessible neither in Metabase nor in the BigQuery console (sorry I wasn’t clear).

@Maxime You can add the following to Admin > Databases > (your MySQL) > Connection String:
sessionVariables=max_execution_time=3000 - the value is in milliseconds - change it to a higher number as needed
For reference: https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_max_execution_time
MariaDB uses max_statement_time which is in seconds - https://mariadb.com/kb/en/server-system-variables/#max_statement_time
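To confirm the limit is actually in effect, you can reproduce it with the mysql client - sessionVariables=… is a JDBC parameter, so the CLI won’t pick it up automatically, and you set the variable manually instead (host and user below are placeholders):

```shell
# Set the same per-session limit Metabase's JDBC connection would apply
# (milliseconds), then read it back. Note it only applies to SELECTs.
mysql -h mysql.example.internal -u metabase -p -e "
  SET SESSION max_execution_time = 3000;
  SELECT @@session.max_execution_time;"
```

Any SELECT in that session exceeding the limit is then interrupted with MySQL error 3024 (“maximum statement execution time exceeded”), which is what users will see in Metabase instead of a hung query.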


Amazing, how come I never found that! You just made my day @flamber