We are seeing a very large number of queries being run by Metabase, presumably related to the syncing of the schema in our DBs. As a part of a snowflake contract discussion, it looks like our snowflake costs are 25% higher than they should be, costing us hundreds (almost thousands) of dollars a month.
Here's what we saw in April in terms of queries being run in snowflake and the credits associated with these queries:
Here's the query:
sum(credits_used_cloud_services) as credits_used_cloud_services,
count(query_text) as count
and date_trunc(month, start_time) = '2022-07-01'
group by 1, 2
order by credits_used_cloud_services desc
At $3 / credit this was at least $600. The thing that's alarming to me is the count of queries. We have our databases set up to only sync daily, and these queries are being run over 100,000 - 500,000 times in a month. At the low end that's almost 140 times every hour, or over 400 times every hour for an entire month.
This seems crazy to me. Here are the settings that show the warehouse and user:
Is this happening to anyone else? This seems like something has gone horribly, horribly wrong.
I looked for an issue in github but didn't find anything that I think is related to what I'm seeing. I know that there are some recent updates to Metabase that allows for more selective syncing of databases and schemas for snowflake (introduced in 43.0) but we haven't upgraded yet and this behavior would be helped but not fixed by this setting. We self-host on elastic beanstalk and are on 0.42.2.
I cannot tell which queries Metabase are executing, but I doubt it's sync. It's more likely scan or perhaps fingerprinting. I'll need to see the query.
There are a lot of sync/scan issues fixed since 42.2 - and a complete overhaul of field values (scan) in the upcoming v44.
Though, for some databases, some queries can still be heavy:
https://github.com/metabase/metabase/issues/19470 - upvote by clicking on the first post
Thanks for the quick reply, @flamber!
Here are the queries:
show /* JDBC:DatabaseMetaData.getColumns() */ columns in database "REFORGE_DBT_DEV"
show /* JDBC:DatabaseMetaData.getPrimaryKeys() */ primary keys in database "REFORGE_DBT_DEV"
show /* JDBC:DatabaseMetaData.getPrimaryKeys() */ primary keys in database "SEGMENT_INGEST"
show /* JDBC:DatabaseMetaData.getColumns() */ columns in database "SEGMENT_INGEST"
These have run anywhere from 75k - 220k times this month already. My guess is the columns is the sync, and the primary keys is the scan. Is that right? Either way, I don't understand why it would be doing either of these this frequently.
I feel like the issue you link to is unrelated to the queries I show above. Let me know if I'm mistaken.
@danwolch Those queries are part of sync, and should only run hourly (default) or daily as you have changed it to.
Something is completely off with that amount of queries. Your Metabase log should indicate what is happening during sync/fingerprinting/scan, so check Admin > Troubleshooting > Logs.
Post "Diagnostic Info" from Admin > Troubleshooting.
I would recommend upgrading to latest release: https://github.com/metabase/metabase/releases/latest
Here are the logs: https://gist.github.com/danwolch/4e774df831556586e31a90db9ed17d77
What should I be looking for?
We'll upgrade to the latest version.
@danwolch Try correlating activity on Snowflake with the Metabase log, since I don't see any sync stuff, which would start with the namespace
sync in the log, like:
INFO sync.util :: STARTING: Sync metadata for snowflake Database 2 'MyDB'
I would probably recommend disabling the field value scanning for now, which should help with some queries, and enable it again, when you're on v44, since it has much better logic.
Hey Dan... did you ever figure this out? It's been a long time but I believe I ran into the exact same type of issue a few years back... the Metabase settings didn't work the way I had expected... I'd definitely turn off the setting - "Rerun queries for simple explorations"
Also - here's an article about enabling caching on Metabase - if you're data is getting loaded once at 12am then there shouldn't be any reason for queries hitting Snowflake except for adhoc reporting....
I hope that helps take a step in the right direction.
Just wanted to add we are seeing the exact same problem - with metabase's metadata queries driving about 15-20% of our monthly snowflake credit usage.
We're going to upgrade to the latest version then iterate from there on tuning settings to see if we can nail down what's ultimately driving this, as it's heavily cost impacting
But it would be great to hear if the upgrade helped or not for you @danwolch ?
I upgraded our self hosted instance to 0.44.2 on Monday, 9/5. This hasn't resolved the issue. I'm seeing a ton of queries against one of our staging databases.
Here's the config for this database:
Here is a query showing how many queries are being run for this snowflake warehouse by the metabase user:
where query_text = 'show /* JDBC:DatabaseMetaData.getColumns() */ columns in database "ANALYTICS_STAGING"'
and warehouse_name = 'ADHOC_REPORTING'
and user_name = 'METABASE_USER'
group by 1, 2
order by 1
It shows that this query:
show /* JDBC:DatabaseMetaData.getColumns() */ columns in database "ANALYTICS_STAGING"
Is being run almost 5,000 times an hour. This seems insane to me.
I'd love for someone to tell me I'm doing something wrong, or that there are more configuration options I can toggle to reduce our spend.
@flamber I really appreciate all of the responses / help you have already given in this thread.
Do you have any other suggestions for what to look into given my most recent post?