Too much access to database

Hi, I just installed Metabase and connected my MySQL RDS instance to it. It has 50 tables and 40GB of data. Some tables have millions of rows.

I checked the “this is a large database” option during configuration. Then I tried metabase running a very simple query and left it. In 30min it consumed half my RDS BurstBalance and executed some very expensive queries on some very expensive tables.

I toggled the 3 problematic tables to hide on the datamodel admin page. But it didn’t stop the expensive queries. I finally had to change the db password and kill the connections to make it stop.

I want to know if it’s possible to use metabase without this “pre-indexing” expensive phase? Just execute my queries, that I know to be optimized

Thanks

Hi @sergiolopes

Which Metabase version are you using?

That’s quite a large database. My biggest database has a few hundred tables and about 11GB data (SQL2008), which took a little time to index, but running with fairly low load after that.

Toggling the “hide”, just hides the table from the interface, but it’s still part of Metabase. If you want to limit the access, then you should create a user on RDS, which only has access to the tables you want to share with Metabase.

Did you check the Metabase log to see which queries were expensive?

@camsaul Do you have any suggestions? It’s above my pay-grade :slight_smile:

Thanks @flamber!

In a perfect world all tables would be visible to metabase (even the big ones), but I would write my custom optimized queries to access them.

I saw the queries directly on Performance Insights on RDS. They were touching the big tables with queries with millions of results and group by many different columns. There was even a query with a group by on a text column that makes no sense.

I’m using v0.31.2.metabase-aws-eb.zip

Not sure if it’s the case here, but Metabase scans all the views too.
It’s well worth setting up a separate user that only sees a few views.

When you select “this is a large database” take you to a second page (if adding a database for the first time) or add a “Scheduling” tab (for databases you’ve already added) where you can chose how often Metabase syncs and scans the database. You can set it to manually scan values instead of doing it daily.

From a security standpoint it’s probably best to give the database user you created for Metabase the absolute minimum permissions possible, so if you have schemas or tables you don’t want Metabase to see or sync my recommendation is to make sure the DB user has no permissions to access those objects.

In the future we’ll probably look at making it possible to pick and choose which Tables get synced directly from Metabase but we currently don’t have such fine-grained control.

1 Like