Autocomplete takes down our self-hosted metabase instance

We are running on v0.46.1 and our instance randomly stops responding. Upon investigation we traced it down to the /autocomplete_suggestions request, there's too many of them and they are too slow.

  • At first we see an onrush of autocompletion requests, and most of these are single-letter searches.
  • The app database runs into 100% CPU, so it cannot respond to the queries fast enough.
  • The requests saturate Metabase's bandwidth, so the app stops responding and latency raises to minutes.
  • This can last anywhere from 10 minutes to an hour.

From what I can see, it's a combination of factors:

  • Our bloated metabase_field / metabase_table tables.
  • The logs show an onrush of single-letter search queries.
  • The autocomplete doesn't have a debounce, so typing 100 characters creates 100 requests. Fast typers are able to DDoS our instance :sweat_smile:

Instance logs contain lots of entries similar to GET /api/database/17/autocomplete_suggestions 200 4.3 s (5 DB calls) App DB connections: 15/15 Jetty threads: 50/50 (0 idle, 54 queued) (211 total active threads) Queries in flight: 0 (0 queued):

The autocomplete query is the single most expensive query we have there:

I've checked a very similar issue that was resolved differently (Autocomplete takes 10 minutes to complete) and looked at the output. Here's how the counts look on the bulkiest DBs, as you can see most of the tables are not active:

We use Airbyte, and it creates and destroys a lot of temporary tables in our data warehouse. These have been picked up by Metabase regularly, so the metabase_field and metabase_table kinda exploded. Once we detected this, we disabled automatic field scans, as well as disabled the use of temporary tables on Airbyte, but there's still a lot of historical records sitting in Metabase DB.

  • My immediate reaction is to purge some data in the metabase_field table to see if it helps. Is it safe to do, and can I use the red button in the admin section of metabase? Or should I delete the data with SQL?

  • The next question is why the autocomplete so noisy in our case? Is this a bug?

  • And lastly, is there a way to disable scanning certain tables or columns, i.e. using a pattern (airbyte.*_tmp*)?

Hey! Really good investigation and post, thanks for that! (Makes our lives way easier). About what you’re seeing: you should be able to filter the schemas you want to sync (not the tables unfortunately, so I would do it at the db level permissions).

To make everything faster again: wipe from the tables and fields tables in the app db all the tables and fields that no longer exist so queries can run faster. Also the autocomplete has an environment variable you can tweak to achieve faster results, have you seen that already?

That was a fast response, thank you so much!

I was able to clean away 90% of the tables, let's see if it helps. Regarding the env var, do you mean the MB_NATIVE_QUERY_AUTOCOMPLETE_MATCH_STYLE? Haven't tried using that, thanks for the pointer!

I'm still unsure why our logs were full of 1-letter requests and whether I should dig a bit more for a potential issue there. Do you think it would be worth looking into it further?

We answer the threads fast if users provide a decent level of technical aspects and investigation on the matter, like in this case :grinning:

Regarding the autocomplete: tweak the env var that you mention. I don’t think it can be enhanced further (unless you do crazy stuff like capturing the api call and routing it somewhere else)

Also: what’s the sizing of your app db? Upgrading the size and version can help

It's a modest 1vCPU instance, but it's being underutilized most of the time, except when editing queries. I'm pretty sure it will get better from now on as we don't use temp tables anymore.

I'll report back in a while to confirm this helped, but for now let's consider the case solved :raised_hands:

I noticed the same issue on my setup and I already filter schemas appropriately. I ended up solving it by blocking the autocomplete route at the nginx level. :person_shrugging:

You could have disabled it with the env var

I wished I had known about the ENV variable. I was on 0.45 when I made the change so it was already around. :person_shrugging: