We are running on v0.46.1 and our instance randomly stops responding. Upon investigation we traced it down to the /autocomplete_suggestions request, there's too many of them and they are too slow.
- At first we see an onrush of autocompletion requests, and most of these are single-letter searches.
- The app database runs into 100% CPU, so it cannot respond to the queries fast enough.
- The requests saturate Metabase's bandwidth, so the app stops responding and latency raises to minutes.
- This can last anywhere from 10 minutes to an hour.
From what I can see, it's a combination of factors:
- Our bloated metabase_field / metabase_table tables.
- The logs show an onrush of single-letter search queries.
- The autocomplete doesn't have a debounce, so typing 100 characters creates 100 requests. Fast typers are able to DDoS our instance
Instance logs contain lots of entries similar to GET /api/database/17/autocomplete_suggestions 200 4.3 s (5 DB calls) App DB connections: 15/15 Jetty threads: 50/50 (0 idle, 54 queued) (211 total active threads) Queries in flight: 0 (0 queued)
:
The autocomplete query is the single most expensive query we have there:
I've checked a very similar issue that was resolved differently (Autocomplete takes 10 minutes to complete) and looked at the output. Here's how the counts look on the bulkiest DBs, as you can see most of the tables are not active:
We use Airbyte, and it creates and destroys a lot of temporary tables in our data warehouse. These have been picked up by Metabase regularly, so the metabase_field
and metabase_table
kinda exploded. Once we detected this, we disabled automatic field scans, as well as disabled the use of temporary tables on Airbyte, but there's still a lot of historical records sitting in Metabase DB.
-
My immediate reaction is to purge some data in the metabase_field table to see if it helps. Is it safe to do, and can I use the red button in the admin section of metabase? Or should I delete the data with SQL?
-
The next question is why the autocomplete so noisy in our case? Is this a bug?
-
And lastly, is there a way to disable scanning certain tables or columns, i.e. using a pattern (
airbyte.*_tmp*
)?