Metabase runs SHOW queries against Snowflake more frequently than the sync settings specify

marcellom · March 20, 2023, 11:39pm

We are seeing a higher number of cloud service credits associated with Metabase than we would expect given our current sync and scan settings. We're wondering whether there's an issue causing Metabase to run SHOW queries once per minute, instead of based on the admin sync settings. Example query:
show /* JDBC:DatabaseMetaData.getForeignKeys() */ imported keys in database "<our_database>"

We are seeing this single query run 38827 times and use 35 cloud compute credits in February. Our settings are:

We've already tried excluding the largest schemas in our database and upgrading to the latest Metabase version, but didn't see any change in our daily queries. We also turned off our Metabase instance to verify that we didn't have another account linked to the same warehouse (we don't). Looking at the logs, there don't appear to be any specific jobs/tasks running more often than every hour. We do see pulses every hour, which is more frequent than our daily database sync would suggest, but is that normal?
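As a rough sanity check on the "once per minute" estimate, the February count quoted above can be converted into an average rate. This is only back-of-the-envelope arithmetic on the reported figure, assuming the queries were spread evenly over the 28 days of February 2023:

```python
# Figure quoted in the post; February 2023 has 28 days.
query_count = 38827
minutes_in_february = 28 * 24 * 60  # 40320 minutes

# Average rate, assuming the runs were spread evenly over the month.
runs_per_minute = query_count / minutes_in_february
seconds_between_runs = 60 / runs_per_minute

print(f"{runs_per_minute:.2f} runs per minute")                   # 0.96 runs per minute
print(f"one run every {seconds_between_runs:.0f} s on average")   # one run every 62 s on average
```

That is close to one run per minute, far more than the roughly 28 runs a once-a-day sync would produce over the same month.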

Here are the diagnostics:

{
  "browser-info": {
    "language": "en-US",
    "platform": "MacIntel",
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36",
    "vendor": "Google Inc."
  },
  "system-info": {
    "file.encoding": "UTF-8",
    "java.runtime.name": "OpenJDK Runtime Environment",
    "java.runtime.version": "11.0.18+10",
    "java.vendor": "Eclipse Adoptium",
    "java.vendor.url": "https://adoptium.net/",
    "java.version": "11.0.18",
    "java.vm.name": "OpenJDK 64-Bit Server VM",
    "java.vm.version": "11.0.18+10",
    "os.name": "Linux",
    "os.version": "5.10.90+",
    "user.language": "en",
    "user.timezone": "GMT"
  },
  "metabase-info": {
    "databases": [
      "snowflake"
    ],
    "hosting-env": "unknown",
    "application-database": "postgres",
    "application-database-details": {
      "database": {
        "name": "PostgreSQL",
        "version": "13.7"
      },
      "jdbc-driver": {
        "name": "PostgreSQL JDBC Driver",
        "version": "42.5.0"
      }
    },
    "run-mode": "prod",
    "version": {
      "date": "2023-02-19",
      "tag": "v0.45.3",
      "branch": "release-x.45.x",
      "hash": "070f57b"
    },
    "settings": {
      "report-timezone": null
    }
  }
}

Here are other topics that we have looked into:

Hourly DB hit that is NOT our scheduled scan or sync - #3 by TropicalTomboy
- Based on the responses there, it doesn't seem like a Jobs issue either, but just in case, here are our jobs:
  [image: jobs screenshot]
Metabase sync costing $500 / month in snowflake credits - #9 by danwolch
- This issue does seem similar to ours, though not quite as frequent as reported there. Is our solution to disable syncs as well?
- Disable Database Sync or Allow Weekly Syncs · Issue #10398 · metabase/metabase · GitHub

Luiggi · March 23, 2023, 3:37am

I would suggest you manually disable syncing or hide the tables that you don't want Metabase to scan.

marcellom · April 20, 2023, 7:04pm

Thanks for the response @Luiggi - Do you suggest manually disabling syncs as a hack because there's something wrong with our Metabase setup or because Metabase isn't syncing as expected? We actually host several Metabase accounts so disabling the sync for every single one isn't a great option and it'd be ideal to understand why this is happening. Is there any other place/issue we could look into?

Luiggi · April 20, 2023, 8:40pm

There’s clearly something that shouldn’t happen there. Can you move to 46.1 and see if this keeps happening?

Rushkof · May 3, 2023, 1:35pm

Hello, we have the same issue. I updated to v0.46.2 (the admin page shows "You're on version v0.46.2"). Has anyone solved this? It currently costs us 1K€ per month...

Luiggi · May 3, 2023, 2:25pm

Does this keep happening in 46.2? We just upgraded the driver in that version.

Rushkof · May 3, 2023, 2:36pm

Yes, it's very annoying.

arakaki · May 3, 2023, 4:20pm

@Rushkof Are you restricting which schemas are available for your connection in the Metabase UI? There is a field "Schemas" there.

I couldn't consistently reproduce this. The right behavior is to look for primary keys, foreign keys, and imported keys per table (and not per DB as it is running in your case). This would make these queries much faster.

Rushkof · May 3, 2023, 8:07pm

It's not a slowness problem. Even after filling in the Schemas field with the relevant schema, the SHOW command still runs every second...

marcellom · May 3, 2023, 8:13pm

@arakaki - Are you suggesting implementing PKs in Snowflake as a solution? As I understand it, PKs are supported but not enforced by Snowflake the way they are in other DBs.

Rushkof · May 3, 2023, 8:16pm

It's about this request:
show /* JDBC:DatabaseMetaData.getForeignKeys() */ imported keys in database "PROD_DB"

arakaki · May 3, 2023, 8:53pm

No, I'm not suggesting using PKs.

The number of queries is OK. We send one query per table that is synced. If you have hundreds or thousands of tables, it can look like an endless stream of requests.

But these requests should be really light. Getting this metadata per table should take a few ms. But in this case it is scanning the entire DB, so if this query takes 3 or 4s to run, it can lock the DB and consume a lot of credits.

Luiggi · May 3, 2023, 9:10pm

We just fixed another issue that might have caused this regression

arakaki · May 4, 2023, 1:21pm

We just submitted a PR; it will be out in 46.3.

github.com/metabase/metabase

Escape schema and table names in Snowflake JDBC metadata calls

metabase:master ← metabase:26054-escape-schema-and-table-in-snowflake-metadata-calls

opened 06:40PM - 03 May 23 UTC

metamben

+54 -2

Fixes #26054. The Snowflake JDBC driver is buggy: schema and table name are interpreted as patterns in DatabaseMetaData.getPrimaryKeys and DatabaseMetaData.getImportedKeys calls. This PR replaces the default JDBC functions with ones that escape the names. The implementation is a tweaked copy of the functions in `metabase.driver.sql-jdbc.sync.describe-table`. We could generalize those functions so that the snowflake driver can override just the pieces it has to, but I decided that we shouldn't generalize just to be able to support buggy drivers.
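To illustrate the kind of bug the PR describes: when a name is treated as a JDBC-style pattern, `_` matches any single character and `%` matches any sequence, so an unescaped name such as `MY_TABLE` can match other tables too. The sketch below is not the Metabase implementation (that lives in the Clojure driver code). `escape_pattern` and `pattern_matches` are hypothetical helpers, the table names are made up, and the backslash escape character is an assumption; a real driver reports its own via `DatabaseMetaData.getSearchStringEscape()`.

```python
import re

def escape_pattern(name: str, escape: str = "\\") -> str:
    """Escape pattern wildcards so that `name` matches only itself."""
    out = []
    for ch in name:
        if ch in ("_", "%") or ch == escape:
            out.append(escape)
        out.append(ch)
    return "".join(out)

def pattern_matches(pattern: str, candidate: str, escape: str = "\\") -> bool:
    """Toy matcher for JDBC-style patterns, for illustration only."""
    regex = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            # Escaped character: match it literally.
            regex.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "_":
            regex.append(".")    # any single character
        elif ch == "%":
            regex.append(".*")   # any sequence of characters
        else:
            regex.append(re.escape(ch))
        i += 1
    return re.fullmatch("".join(regex), candidate) is not None

# Made-up table names for the demonstration.
tables = ["MY_TABLE", "MYXTABLE", "MY_TABLE_OLD"]
unescaped = [t for t in tables if pattern_matches("MY_TABLE", t)]
escaped = [t for t in tables if pattern_matches(escape_pattern("MY_TABLE"), t)]

print(unescaped)  # ['MY_TABLE', 'MYXTABLE']
print(escaped)    # ['MY_TABLE']
```

With the name escaped, the lookup is limited to the single intended table, which is the per-table behavior described earlier in the thread.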

marcellom · June 1, 2023, 9:25pm

We upgraded to 46.3 last week and have seen a ~98% drop in cloud service credits. Thanks y'all!

Rushkof · June 2, 2023, 7:06am

Not resolved for me in 46.4. @marcellom, can you share your Snowflake database configuration in Metabase, please?

marcellom · June 2, 2023, 6:09pm

{
  "browser-info": {
    "language": "en-US",
    "platform": "MacIntel",
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36",
    "vendor": "Google Inc."
  },
  "system-info": {
    "file.encoding": "UTF-8",
    "java.runtime.name": "OpenJDK Runtime Environment",
    "java.runtime.version": "11.0.19+7",
    "java.vendor": "Eclipse Adoptium",
    "java.vendor.url": "https://adoptium.net/",
    "java.version": "11.0.19",
    "java.vm.name": "OpenJDK 64-Bit Server VM",
    "java.vm.version": "11.0.19+7",
    "os.name": "Linux",
    "os.version": "5.10.90+",
    "user.language": "en",
    "user.timezone": "GMT"
  },
  "metabase-info": {
    "databases": [
      "snowflake"
    ],
    "hosting-env": "unknown",
    "application-database": "postgres",
    "application-database-details": {
      "database": {
        "name": "PostgreSQL",
        "version": "13.8"
      },
      "jdbc-driver": {
        "name": "PostgreSQL JDBC Driver",
        "version": "42.5.1"
      }
    },
    "run-mode": "prod",
    "version": {
      "date": "2023-05-24",
      "tag": "v0.46.4",
      "branch": "release-x.46.x",
      "hash": "f858476"
    },
    "settings": {
      "report-timezone": null
    }
  }
}

DB Config: