Optimising Permissions: Data Access and Collection Access given a large number of users

Hello,

We are running Metabase 0.35.1.

We are restructuring our data permissions, and with them our Data and Collection access. We want to optimise the groups for the number of users we have (~2000). We don’t want to create so many groups that things become very granular, but we also don’t want to give large pools of users access to sensitive collections because we have only a few groups with many people in each.

Currently, we have ~2000 users (and growing), ~100 groups, ~90 data sources, ~800 sub-collections within ~50 parent collections.
The issue is mainly that users who have view access to a collection can view data visualisations even if they do not have access to the underlying data source of the question. Ideally, we would want it to be the opposite, where users can only view questions/dashboards in collections if they have access to the underlying data. Is there a way to change this logic?

We are grouping our users based on their usage of databases / query executions. Initially, we wanted to name our Groups after our data sources to minimise the number of groups, so everyone in Group X would have access to Database X. However, this approach doesn’t work when limiting which collections Group X has access to, since the group would contain many people from different teams who happen to have similar usage. Collections require much more granular grouping than data sources, as many of our collections contain sensitive data visualisations or are only relevant to specific teams and departments.
We also considered going ahead with naming groups as above and then separately having additional groups just for collections, but this seems tedious.

So I’m wondering if anyone has gone through this exercise before, and if they have any tips on how to optimise permissions grouping, or have a model they used within their organisation with as many users as us?

Thank you very much in advance for the help :slight_smile:.

Hi @aminabk

That’s going to be a fairly big setup - can you give any details on how you’re running Metabase and what resources it has been given (CPU, RAM, etc)?

About the collection permissions. You would have to create a group for each datasource, so you can limit the users to only those datasources and collections.
But like you say, it’s going to be difficult to manage given your size - and tedious.

I don’t remember ever seeing a request for a more granular level of permissions, besides row level access, which is part of the Enterprise Edition as Sandboxing.

I know there are installations larger than yours, but I don’t think they are using the same level of separation, so it’s not a subject I’ve seen raised before - but interesting.

I’m facing the same questions as the original poster (we work together). The main issue is that we’re not sure how to solve the problem of user_a gaining access to potentially sensitive data if user_b saves such a question to a collection they both have access to - even though user_a doesn’t have access to the underlying database.

  • limiting which users can save questions would be a major blocker
  • people will inevitably make mistakes and save sensitive questions into collections that they shouldn’t
  • imposing a new collection structure where e.g. there is a corresponding parent folder for each database
    ** users could make mistakes about where to save questions (we could create some overnight job that moves questions to the right parent collection, but by then data leakage could have happened)
    ** inside each parent folder, teams’ own collections would need to be duplicated (i.e. have a team_a folder in all parent folders) - this is tedious and if collection access was defined not at the parent level but at the team sub-collection level then it’s way too granular
    ** if a dashboard has questions using multiple databases, there is no collection to save this dashboard in (since each parent collection corresponds to a single database) - hence this would again give unwanted access to some users
  • we’d have the same issues if e.g. we had a collection for each team and within those a sub-collection for each of the databases
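
One way we could at least detect such mistakes after the fact is a periodic audit. The sketch below is plain Python over made-up in-memory data, not a Metabase API: the `find_leaks` function, the card tuples, and the `collection_groups` / `database_groups` dictionaries are all hypothetical stand-ins for whatever we would export from our own instance.

```python
def find_leaks(cards, collection_groups, database_groups):
    """Return (card, group, database) triples where a group can view the
    card's collection but has no access to a database the card queries."""
    leaks = []
    for name, collection, databases in cards:
        for group in sorted(collection_groups.get(collection, set())):
            for db in sorted(databases):
                if group not in database_groups.get(db, set()):
                    leaks.append((name, group, db))
    return leaks


# Hypothetical example data: (card name, collection, databases queried).
cards = [
    ("Revenue by region", "Team A", {"sales_db"}),
    ("Payroll overview", "Team A", {"sales_db", "hr_db"}),
]
# Which groups can view each collection / query each database.
collection_groups = {"Team A": {"analysts", "designers"}}
database_groups = {
    "sales_db": {"analysts", "designers"},
    "hr_db": {"analysts"},
}

for leak in find_leaks(cards, collection_groups, database_groups):
    print(leak)
```

With this sample data the only triple reported is the "Payroll overview" card, which the designers group can see through the collection without having access to `hr_db`. An audit like this only flags the problem; it does not prevent the exposure in the first place.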

The reason we’re not thinking about creating permission groups on a per-team basis is that e.g. data analysts and designers on the same team need very different access. The granularity we have in mind is team & position, which is fine for databases, but we are unsure how to mix collections into this (1) without things getting unfeasibly granular and (2) without risking exposing sensitive data to the wrong users.

Currently, our theoretical dream solution would be to edit Metabase’s settings so that users can only access questions if they have access to the collection AND the database used. In addition, users who have only limited permissions on a database shouldn’t be able to use the “question builder” for it either. This way we could avoid creating permission groups that are too granular, and we could eliminate human error when it comes to saving questions to the right collection.
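
To make the requested rule concrete, here is a minimal Python sketch of the check we have in mind, next to the behaviour we see today. It is not Metabase code: the `Card` class, the function names, and the sample permissions are all hypothetical and only illustrate the logic.

```python
from dataclasses import dataclass


# Hypothetical model for illustration, not Metabase's internals.
@dataclass(frozen=True)
class Card:
    name: str
    collection: str
    databases: frozenset  # databases the question queries


def can_view_current(card, user_collections, user_databases):
    # Behaviour described in this thread: collection access alone is enough.
    return card.collection in user_collections


def can_view_desired(card, user_collections, user_databases):
    # Requested rule: collection access AND access to every database used.
    return (card.collection in user_collections
            and card.databases <= user_databases)


card = Card("Salaries by team", "Shared", frozenset({"hr_db"}))
user_a_collections = {"Shared"}
user_a_databases = {"sales_db"}  # user_a has no access to hr_db

print(can_view_current(card, user_a_collections, user_a_databases))  # True
print(can_view_desired(card, user_a_collections, user_a_databases))  # False
```

Under the second rule, a dashboard that mixes databases would simply hide the cards a user lacks database access for, so it would no longer matter which collection it was saved in.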

Hi @alfred
I’m sure it’s something we can talk about incorporating into the Enterprise Edition, but I would recommend that you reach out to the sales team: https://www.metabase.com/contact/