BigQuery: High cost after updating to 0.49.x

andges · March 27, 2024, 1:22pm

Hi,

We are seeing very high BigQuery query costs after updating Metabase to 0.49.x.

This seems to be related to the "Scanning for Filter Values" feature.
Previous versions could not run scans on partitioned tables.
It seems 0.49 supports this now, and queries are issued with _PARTITIONTIME > '0001-01-01T00:00:00Z', if I've understood the PRs / code correctly.
For our dataset, this means queries are run on XXX TB sized tables just to get the first 1000 values for each column, and we don't even use any of the UI query builders.
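
To illustrate why this gets expensive: a filter such as _PARTITIONTIME > '0001-01-01T00:00:00Z' matches every partition, so no partition is pruned, and on-demand billing follows the bytes read for each scanned column. The Python sketch below is only a back-of-the-envelope estimate. The query shape, the table and column names, the column sizes, and the price per TiB are all assumptions for illustration, not Metabase's actual SQL and not real figures from this dataset.

```python
# Illustrative only: the SQL shape, names, column sizes and price are assumptions.
TIB = 1024 ** 4


def scan_query(table: str, column: str) -> str:
    """Hypothetical shape of a field-values scan on a partitioned table."""
    return (
        f"SELECT DISTINCT `{column}` FROM `{table}` "
        "WHERE _PARTITIONTIME > TIMESTAMP '0001-01-01T00:00:00Z' "
        "LIMIT 1000"
    )


def estimated_cost(column_bytes: list[int], price_per_tib: float) -> float:
    """On-demand cost if every listed column is read in full once."""
    return sum(column_bytes) / TIB * price_per_tib


if __name__ == "__main__":
    # Made-up table and column names.
    print(scan_query("my_project.my_dataset.events", "country"))
    # Three made-up columns of 2 TiB, 0.5 TiB and 1.5 TiB; the price is a placeholder.
    print(estimated_cost([2 * TIB, TIB // 2, 3 * TIB // 2], price_per_tib=6.25))
```

Plugging in the real per-column sizes and the current price for your region gives a rough cost per full scan cycle.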

How can we get this cost-intensive scanning disabled ASAP?

1st attempt:
In the Metabase admin I've gone to the BigQuery connection and set "Scanning for Filter Values" to "Never, I'll do it manually if I need to".
However, I can still see the scans continue.
So it looks like this setting is not actually respected. Or am I looking at the wrong setting here?

2nd attempt:
In the Metabase admin I've gone to the BigQuery connection and set Datasets to "Only these ..." and left "Comma separated names of datasets that should appear in Metabase" empty.
Scans still continue.

3rd attempt:
The docs say that scanning can be disabled for a table entirely by hiding it.
That hasn't worked either; scans keep running.
Seriously, what's wrong here?
We'll have to downgrade as a quick fix and migrate off of Metabase.

Any help would be appreciated.
Also please reconsider this insane _PARTITIONTIME filter default of '0001-01-01T00:00:00Z'.

Kind regards,
Andreas

Luiggi · March 27, 2024, 4:23pm

Just raised this to the team, they'll check it tomorrow. Can you create an issue in our GitHub repo with this?

qnkhuat · March 28, 2024, 9:03am

Hi @andges,

> In the Metabase admin I've gone to the BigQuery connection and set "Scanning for Filter Values" to "Never, I'll do it manually if I need to".

This looks like a bug and we're looking into it.

> In the Metabase admin I've gone to the BigQuery connection and set Datasets to "Only these ..." and left "Comma separated names of datasets that should appear in Metabase" empty.

If you leave the schema filter empty, it'll sync all the schemas. Another option is to not sync any schema, but I'm not sure that's useful, since you'd end up with a database with no active tables.

> The docs say that scanning can be disabled for a table entirely by hiding it.
> That hasn't worked either, scans keep running.

This should do the trick. It works for me locally, so I'm not sure why it doesn't work for you.

On the other hand, another way is to go to each field's metadata settings and change "Filtering on this field" to "Plain input box". You'll need to do this for all the fields, so prepare some coffee while you're at it.
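
If clicking through every field is impractical, the same change can probably be scripted against the Metabase REST API. The Python sketch below is untested against 0.49 and rests on assumptions: that "Plain input box" corresponds to has_field_values = "none", that GET /api/database/:id/metadata lists tables with their fields, and that PUT /api/field/:id accepts has_field_values. The environment variable names are made up for this example. Please check the API docs for your version and try it on a single field first.

```python
import json
import os
import urllib.request


def fields_to_update(metadata: dict) -> list[int]:
    """IDs of fields whose filter widget is not already a plain input box."""
    return [
        field["id"]
        for table in metadata.get("tables", [])
        for field in table.get("fields", [])
        if field.get("has_field_values") != "none"
    ]


def call(base_url, method, path, session=None, body=None):
    """Send one JSON request to Metabase and return the decoded response."""
    data = json.dumps(body).encode() if body is not None else None
    headers = {"Content-Type": "application/json"}
    if session:
        headers["X-Metabase-Session"] = session
    req = urllib.request.Request(
        base_url + path, data=data, headers=headers, method=method
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical environment variable names, chosen for this sketch.
    base = os.environ["METABASE_URL"].rstrip("/")
    db_id = os.environ["METABASE_DB_ID"]
    login = {
        "username": os.environ["METABASE_USER"],
        "password": os.environ["METABASE_PASSWORD"],
    }
    session = call(base, "POST", "/api/session", body=login)["id"]
    metadata = call(base, "GET", f"/api/database/{db_id}/metadata", session)
    for field_id in fields_to_update(metadata):
        # Assumption: "none" is the API value behind "Plain input box".
        call(base, "PUT", f"/api/field/{field_id}", session,
             {"has_field_values": "none"})
        print("updated field", field_id)
```

Fields that already report "none" are skipped, so the script can be re-run after new tables are synced.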

Please let me know if that works for you. In the meantime we'll fix the sync option bug!

andges · March 28, 2024, 2:12pm

Hi qnkhuat,

thank you for looking into this and the explanations / suggestions.
We'll give it a try once we've managed to archive some of our data to reduce the size of our dataset.

Andreas