We are seeing a very large number of queries being run by Metabase, presumably related to the syncing of the schema in our DBs. As a part of a snowflake contract discussion, it looks like our snowflake costs are 25% higher than they should be, costing us hundreds (almost thousands) of dollars a month.
Here's what we saw in April in terms of queries being run in snowflake and the credits associated with these queries:
sum(credits_used_cloud_services) as credits_used_cloud_services,
count(query_text) as count
and date_trunc(month, start_time) = '2022-07-01'
group by 1, 2
order by credits_used_cloud_services desc
At $3 / credit this was at least $600. The thing that's alarming to me is the count of queries. We have our databases set up to only sync daily, and these queries are being run over 100,000 - 500,000 times in a month. At the low end that's almost 140 times every hour, or over 400 times every hour for an entire month.
This seems crazy to me. Here are the settings that show the warehouse and user:
Is this happening to anyone else? This seems like something has gone horribly, horribly wrong.
I looked for an issue in github but didn't find anything that I think is related to what I'm seeing. I know that there are some recent updates to Metabase that allows for more selective syncing of databases and schemas for snowflake (introduced in 43.0) but we haven't upgraded yet and this behavior would be helped but not fixed by this setting. We self-host on elastic beanstalk and are on 0.42.2.
I cannot tell which queries Metabase are executing, but I doubt it's sync. It's more likely scan or perhaps fingerprinting. I'll need to see the query.
There are a lot of sync/scan issues fixed since 42.2 - and a complete overhaul of field values (scan) in the upcoming v44.
Though, for some databases, some queries can still be heavy: https://github.com/metabase/metabase/issues/19470 - upvote by clicking on the first post
show /* JDBC:DatabaseMetaData.getColumns() */ columns in database "REFORGE_DBT_DEV"
show /* JDBC:DatabaseMetaData.getPrimaryKeys() */ primary keys in database "REFORGE_DBT_DEV"
show /* JDBC:DatabaseMetaData.getPrimaryKeys() */ primary keys in database "SEGMENT_INGEST"
show /* JDBC:DatabaseMetaData.getColumns() */ columns in database "SEGMENT_INGEST"
These have run anywhere from 75k - 220k times this month already. My guess is the columns is the sync, and the primary keys is the scan. Is that right? Either way, I don't understand why it would be doing either of these this frequently.
I feel like the issue you link to is unrelated to the queries I show above. Let me know if I'm mistaken.
@danwolch Those queries are part of sync, and should only run hourly (default) or daily as you have changed it to.
Something is completely off with that amount of queries. Your Metabase log should indicate what is happening during sync/fingerprinting/scan, so check Admin > Troubleshooting > Logs.
Post "Diagnostic Info" from Admin > Troubleshooting.
I would recommend upgrading to latest release: https://github.com/metabase/metabase/releases/latest
@danwolch Try correlating activity on Snowflake with the Metabase log, since I don't see any sync stuff, which would start with the namespace sync in the log, like: INFO sync.util :: STARTING: Sync metadata for snowflake Database 2 'MyDB'
I would probably recommend disabling the field value scanning for now, which should help with some queries, and enable it again, when you're on v44, since it has much better logic.
Hey Dan... did you ever figure this out? It's been a long time but I believe I ran into the exact same type of issue a few years back... the Metabase settings didn't work the way I had expected... I'd definitely turn off the setting - "Rerun queries for simple explorations"
Also - here's an article about enabling caching on Metabase - if you're data is getting loaded once at 12am then there shouldn't be any reason for queries hitting Snowflake except for adhoc reporting....
I hope that helps take a step in the right direction.
Here is a query showing how many queries are being run for this snowflake warehouse by the metabase user:
where query_text = 'show /* JDBC:DatabaseMetaData.getColumns() */ columns in database "ANALYTICS_STAGING"'
and warehouse_name = 'ADHOC_REPORTING'
and user_name = 'METABASE_USER'
group by 1, 2
order by 1
It shows that this query:
show /* JDBC:DatabaseMetaData.getColumns() */ columns in database "ANALYTICS_STAGING"
Is being run almost 5,000 times an hour. This seems insane to me.