Configuration Optimization

We setup Metabase to point towards our production DB and noticed a huge performance hit where basically 50% of the CPU was taken up by queries like:

SELECT TOP 301 “dbo”.“Contact”.“SOCIAL_TWITTER” AS “SOCIAL_TWITTER” FROM “dbo”.“Contact” GROUP BY “dbo”.“Contact”.“SOCIAL_TWITTER” ORDER BY “dbo”.“Contact”.“SOCIAL_TWITTER” ASC
SELECT “dbo”.“Contact”.“ADDRESS_MAIL_CITY” AS “field” FROM “dbo”.“Contact” ORDER BY “dbo”.“Contact”.“CONTACT_ID” ASC OFFSET 500 ROWS FETCH NEXT 500 ROWS ONLY
SELECT TOP 301 “dbo”.“Contact”.“PHONE_MOBILE” AS “PHONE_MOBILE” FROM “dbo”.“Contact” GROUP BY “dbo”.“Contact”.“PHONE_MOBILE” ORDER BY “dbo”.“Contact”.“PHONE_MOBILE” ASC
SELECT “dbo”.“Contact”.“EMAIL_ADDRESS” AS “field” FROM “dbo”.“Contact” ORDER BY “dbo”.“Contact”.“CONTACT_ID” ASC OFFSET 0 ROWS FETCH NEXT 500 ROWS ONLY
SELECT TOP 301 “dbo”.“Contact”.“PHONE_HOME” AS “PHONE_HOME” FROM “dbo”.“Contact” GROUP BY “dbo”.“Contact”.“PHONE_HOME” ORDER BY “dbo”.“Contact”.“PHONE_HOME” ASC
SELECT TOP 301 “dbo”.“Contact”.“FIRST_NAME” AS “FIRST_NAME” FROM “dbo”.“Contact” GROUP BY “dbo”.“Contact”.“FIRST_NAME” ORDER BY “dbo”.“Contact”.“FIRST_NAME” ASC
SELECT TOP 301 “dbo”.“Contact”.“PHONE_FAX” AS “PHONE_FAX” FROM “dbo”.“Contact” GROUP BY “dbo”.“Contact”.“PHONE_FAX” ORDER BY “dbo”.“Contact”.“PHONE_FAX” ASC
SELECT TOP 301 “dbo”.“Contact”.“SOCIAL_FACEBOOK” AS “SOCIAL_FACEBOOK” FROM “dbo”.“Contact” GROUP BY “dbo”.“Contact”.“SOCIAL_FACEBOOK” ORDER BY “dbo”.“Contact”.“SOCIAL_FACEBOOK” ASC
SELECT TOP 301 “dbo”.“Contact”.“SOCIAL_FACEBOOK” AS “SOCIAL_FACEBOOK” FROM “dbo”.“Contact” GROUP BY “dbo”.“Contact”.“SOCIAL_FACEBOOK” ORDER BY “dbo”.“Contact”.“SOCIAL_FACEBOOK” ASC
SELECT “dbo”.“Contact”.“ADDRESS_OTHER_CITY” AS “field” FROM “dbo”.“Contact” ORDER BY “dbo”.“Contact”.“CONTACT_ID” ASC OFFSET 2500 ROWS FETCH NEXT 500 ROWS ONLY
SELECT TOP 301 “dbo”.“Comment”.“BODY_SUMMARY” AS “BODY_SUMMARY” FROM “dbo”.“Comment” GROUP BY “dbo”.“Comment”.“BODY_SUMMARY” ORDER BY “dbo”.“Comment”.“BODY_SUMMARY” ASC
SELECT TOP 301 “dbo”.“CalendarEvent”.“TITLE” AS “TITLE” FROM “dbo”.“CalendarEvent” GROUP BY “dbo”.“CalendarEvent”.“TITLE” ORDER BY “dbo”.“CalendarEvent”.“TITLE” ASC

So we had to stop them. I’m wondering:

  1. Is it recommended to target a production DB or use another method?
  2. We run SQL 2016 in High Available Group and notice there isn’t a way to make it target the read node. Any workaround?
  3. Does Metabase perform these queries once and then update on a regular interval with less impact?
  4. When running a Metabase query does it target its local metabase db it created or the target?

Looking for assistance, thanks.

Hey David -

We will scan your database when it’s added to get things like values for drop-down selects, fingerprinting the special types we add, etc. It’s a bit heavy of a scan, and we only do it once a day. We are sampling each table, so the total load shouldn’t be too crazy for a “reasonable sized” db. Not sure how big/performant your data warehouse is. You can also turn this off in the database edit page (see below)

Regarding your questions specifically,

  1. we generally recommend you use a read replica once you get significant load on your data warehouse. At a certain scale/complexity, you’ll probably also want to transition to a dedicated analytics data warehouse.
  2. not sure. this is a bit outside of my personal experience. How do other tools do this? Can you use a different hostname or connection flag?
  3. Yup, see above
  4. The target. We keep a fair bit of metadata about your underlying database, our questions, caches, etc, but all queries are run in the target data warehouse
1 Like

We have on a read replica now and it is working great. We ended up turning off the in-depth database analysis.

  1. A connection flag. something like = readintent.