Best practices regarding code organization

hsberlesi · July 25, 2020, 1:25am

Hey folks,

Recently, while making a few changes in a dashboard that’s embedded in our web application, I’ve started to face a lot of code repetition. So, I was wondering what are a few best practices regarding code organization that I could explore in order to improve the quality of the code.

I’m using Common Table Expressions (CTEs) to prepare the data before doing the aggregations, in order to better organize the code. Since the dashboard is showing customer data, I’m making sure that there’s the Company ID being filtered inside the WHERE clause in the CTE of every question.

But I found myself defining CTEs and applying the company filter to every question on the Dashboard, creating a lot of repeated code. An option that I was trying to explore was regarding Using an existing question as a sub-query, but I haven’t found out if we are able to add hidden filters in them.

I know that a common approach in the Metabase community is to build views into the database, and then use these views to populate the dashboards, but I was looking to avoid dealing with other customer’s data as soon as possible, in order to improve performance.

So, I’m open to suggestions. Is it possible to avoid repeating myself in every question? I was wondering if it’s possible to “prepare” a dataset with a SQL query (thus passing the parameters into this query) and then use that filtered query as a reference to every other aggregation in the Dashboard.

Thanks in advance.

flamber · July 25, 2020, 8:58am

Hi @hsberlesi
It sounds like you’re doing some multi-tenant setup, which is what Sandboxing in the Enterprise Edition can be used for as well.
For reference, this issue is about being able to parse variables to a nested query:
https://github.com/metabase/metabase/issues/6449 - upvote by clicking on the first post

mesquest · July 27, 2020, 6:32pm

Same issue here, CTEs or parts thereof containing WHERE 1=1 AND {{VARIABLE}} being repeated over many questions in embedded dashboards.

Could SQL Snippets work in such a situation?

flamber · July 27, 2020, 7:29pm

@mesquest It doesn’t support parsing the variable, just like for other nested queries.

fera320 · December 15, 2020, 6:12am

I'm facing the same problem related code organization. Nested queries haven't worked for all our use cases unfortunately. There are different issues that haven't been resolved

Is this the recommend approach in general? There are different features on metabase that solve different issues but not all of them. I thought of a list here let me know if this makes sense:

Nested queries: Use this when you have a common model that doesn't need to join other tables and want to make it easy to consumer by other users.
SQL Snippet: Use this when you have common joins used in different queries or nested queries that you want to reuse.
SQL View: use this when you need to join to other tables to consume that. This basically replaces Nested queries in some way.

I can provide more context from our data but basically I want to write a handbook with detailed steps on what tool to use in what situation. All this is assuming we don't need a datawarehouse. Don't have a huge experience on that so maybe I'm looking this in the right way. Right we have 600 questions there is a mix of nested questions, SQL questions and snippets and a few views but is getting harder to maintain things when we are shipping code very frequently and the criteria of those queries change.

hsberlesi · July 28, 2021, 10:07pm

Regarding the recommended approach that I mentioned earlier, I believe at the time there was no such features as SQL Snippets, for example. But for what I recall, the recommended approach (in general) was to create views to pre-process your data.

I believe this approach was intended to deal with the "Dataset" idea that a few tools provide. The idea is to pre-aggregate your data to make it easier for a User to consume it.

@fera320, looks like the right approach for your scenario might be standardizing the datasets within the company.

When I look back at some projects that I worked on, serving data to a majority of internal clients across our company, I believe that we could spend more time standardizing business rules and creating cleaner datasets. This effort would simplify a lot of our architecture at the time and it would avoid a lot of noise in our data.