Advise on Metabase setup for a multiple database/customers scenario

mszopinski · September 20, 2019, 7:25am

Hi,

I’ve been scouting the forums and guides as we have set up Metabase for our product in the last few weeks.

What we currently have is a SaaS product, that has a small, independent instance of our product for all of our customers. Let’s use an example to try to explain what I’m getting at:
instance 1 - customer 1
instance 2 - customer 1
instance 3 - customer 1
instance 4 - customer 2
Every instance is connected to Metabase and we have a set of reports and dashboards that we manage to copy via a script from our template collection. Then by using groups and permissions we give access to the collection related with the customer instance.

So Bob (customer 1) can see 3 collections of reports for his 3 instances (instances 1, 2, 3)
In the same time Mary (customer 2) can only see her collection for instance 4.

This works just fine and the customers see reports for their individual instances. What we’re struggling to find a solution for is how to use the same template collection and use it to display an aggregated report for all instances owned by one customer. In my example above, Bob has 3 instances and would like to see an accumulated report for all 3 instances at the same time (for example: all new transactions within his 3 instances, now he can only see that separately).
Is this possible to achieve within Metabase? Or do we have to create another db and aggregate the data there?

Best regards,
Maciej

flamber · September 20, 2019, 1:43pm

Hi @mszopinski
Metabase currently does not allow queries across databases. There’s a request for it, go and upvote by clicking on the first post.
https://github.com/metabase/metabase/issues/3953
But until that is supported, then you could setup a separate database to consolidate, where you structure the data in a way that matches the customers. Depending on your type of database, you might be able to use views on your existing database.

lucbug · February 19, 2020, 10:08am

Hi @flamber,

I am currently going the same route for setting up a separate database.
In our case we run MySQL instances for each customers, all DBs look the same. So putting them all into one makes sense.

I have been looking at Data Warehouse tools such as Druid, Kafka, etc. But it feels like a lot of ground work for consolidating the DBs. And I’m not sure that’s the right way to go.

Would you have any best practices to share?

flamber · February 19, 2020, 1:06pm

Hi @lucbug
It’s difficult to give direct pointers, since it depends on how you’ve setup things, but if everything is MySQL, then perhaps it’s possible to setup simple replication from the instances back to a single “warehouse” database, but I guess it depends on structure.
You might want to have a look at these links too:

https://mysqlhighavailability.com/mysql-5-7-multi-source-replication-automatically-combining-data-from-multiple-databases-into-one/
https://aws.amazon.com/blogs/big-data/query-any-data-source-with-amazon-athenas-new-federated-query/ in combination with the Athena driver for Metabase https://github.com/dacort/metabase-athena-driver
Maybe it’s even possible to create a fake warehouse with https://proxysql.com/

thomas1 · September 9, 2020, 9:24am

Hi - I just wanted to post a potential solution we are testing currently which may work for some people.

We also considered different instances and databases, etc. But have come up with an idea we are currently testing and seeing if there is any massive caveat.

Current test setup: 0.36.4 + mysql 8

What we are doing using a number of base tables for the data storage named for example:
dwh_customer / dwh_customergroup / dwh_data

With foreign keys and everything keeping them linked up. These tables contain all the data for all customers and have at least 1 column which is used to split/filter/sandbox the data.

And per data group that we wanted to isolate - you create a simple view called: client1_data - which is nothing more than a view with: SELECT * FROM dwh_data WHERE customer_id in (1,3,5)

So the view itself is filtering at the database level the data before it reaches metabase, so you can do any kind of restriction here in SQL. You can also hide the base table in metabase as to not accidentally query from the un-restricted table.

Then in metabase - it sees the view(s) as tables with all the columns of the base table. The only thing it cannot figure out based on this setup is the FK links which makes sense - so you manually need to link anything up.

Users can be setup with access to these specific view tables which then correspond to whatever filter criteria you have defined in the views.

As far as performance so far - don’t seem to see any difference between the raw table and the view - but we haven’t loaded much data yet.

I’ve built a few sample queries and dashboard and it’s correctly filtering out other client data as expected so far.

Anyone gone down this path? Any good or bad results?

Thanks.

flamber · September 9, 2020, 9:55am

@thomas1 Sounds like you’re trying to do row level access, which exists in the Enterprise Edition - it’s called Sandboxing:
https://www.metabase.com/docs/latest/enterprise-guide/data-sandboxes.html

thomas1 · September 9, 2020, 9:58am

That’s true - and I think we and many other users wouldn’t even consider the work effort here if enterprise didn’t cost 10k.