Flamber thanks for the reply, we really appreciate it.
We actually were trying to force the server to error out earlier and I added that setting. I've since removed it, but we still had the slow crawl to lockup problem.
I have no doubt we're experiencing that mariadb driver bug, but it's being exasperated by other things. I think once we experience enough pool waits we'll slowly lock up.
One of our guys got frustrated with the server being down and found something else though and began digging on the SQL side... metabase had about 85 of these queries waiting in the database:
SELECT
min(`view_log`.`user_id`),
`view_log`.`model`,
`view_log`.`model_id`,
count(*) AS `cnt`,
max(`timestamp`) AS `max_ts`
FROM
`view_log`
LEFT JOIN `dashboard_bookmark` `bm` ON (`model` = 'dashboard'
AND `bm`.`user_id` = 33
AND `model_id` = `bm`.`dashboard_id`)
WHERE ((`model` in('dashboard', 'table'))
AND `bm`.`id` IS NULL)
GROUP BY
`model`,
`model_id`
ORDER BY
`max_ts` DESC,
`model` DESC
LIMIT 8
This query isn't very efficient; it could probably use some optimization in addition to some indexes.
This was further compounded by the fact our view_log table had no less than 42 millon rows in it... Like I said, we <3 Metabase and we've been using it for years. We went ahead and drop/recreate that table and that got rid of that particular issue (for now).
The guy that found this is going to suggest a rewrite of the query and some indexes that should help out.
I do think Metabase could probably use a cleanup process on that particular table though, as I doubt the data going back to 2018 - 2019 is useful anymore.
Again, thanks for the reply. We'll try to contribute a fix to the view_log query and see if that'll help out some other people!