Anonymisation feature request

As we are ramping up our use of Metabase 'from the ground up' (i.e. turning users on to how cool a tool it is), I've stumbled on something that would be very useful to us, and I sincerely hope to others too!

Since making the most of Metabase comes down to exposing as many useful data sources as possible, the question of data access permissions and visibility is obviously critical. So far we've managed some of the more sensitive data by exclusion, but we have just found a scenario where an option to 'anonymise' the data would be far more helpful. Those doing the analysis don't need to see personal details to make good use of the data, but they would benefit from seeing the columns in order to understand usage patterns grouped by individual.

So what I'm looking for is a way for an administrator to mark a column for anonymisation so that, for example, the name columns contain fake names and the user id and foreign key columns are scrambled, all in a consistent manner (i.e. the aggregations on "Mickey Mouse" rows will still be correct even if it is really "Joe Bloggs" in the actual data, user id 007 will be mapped to some random but consistent value, etc.).
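To make 'consistent' concrete, here is a minimal Python sketch of the kind of mapping described above. It is not anything Metabase provides: the secret key, the name pool, and the function names are all hypothetical, and a keyed hash (HMAC) is just one assumed way of getting a stable but non-obvious mapping.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret; the idea is that only the administrator would hold it.
SECRET_KEY = b"example-secret-key"

# Small illustrative pools of fake name parts (an assumption, not a real dataset).
FAKE_FIRST = ["Mickey", "Minnie", "Donald", "Daisy", "Goofy"]
FAKE_LAST = ["Mouse", "Duck", "Dog", "Bear", "Fox"]


def _digest(value: str, scope: str) -> bytes:
    """Keyed hash of a value, scoped so names and ids map independently."""
    message = f"{scope}:{value}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).digest()


def pseudonymise_id(value, scope: str = "user_id") -> str:
    """Map an id to a scrambled token; the same input always gives the same token."""
    return _digest(str(value), scope).hex()[:12]


def pseudonymise_name(value: str, scope: str = "name") -> str:
    """Map a real name to a fake one, deterministically."""
    d = _digest(value, scope)
    first = FAKE_FIRST[d[0] % len(FAKE_FIRST)]
    last = FAKE_LAST[d[1] % len(FAKE_LAST)]
    # A short suffix makes collisions between different people unlikely
    # despite the small name pool.
    return f"{first} {last} {d[2:4].hex()}"


# Aggregations survive the mapping: counts per person are unchanged,
# only the labels differ.
rows = ["Joe Bloggs", "Jane Doe", "Joe Bloggs", "Joe Bloggs"]
counts_by_fake_name = Counter(pseudonymise_name(name) for name in rows)
```

Using the same scope for a user id column and the foreign keys that point at it means joins between tables would still line up after scrambling.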

The scenario which gave rise to this idea was a Freedom of Information request where I started in Metabase to access the data but had to use a third-party anonymiser (based on 'faker') on an export to complete the task. However, the more I thought about how it was a shame I couldn't do this from Metabase directly, the more I realised the generic functionality would be immensely useful beyond this task, as I could imagine exposing more of our data sources internally with this kind of data type set. As it stands, we currently have to exclude some people from access because the data is more sensitive than they should have access to, simply because it is mappable to a real person, which is not what they would be interested in anyway.

you can do this via data sandboxes or via the new impersonation feature; see "Add documentation about masking policies" (Issue #38798, metabase/metabase on GitHub)

Thanks for the suggestion, but data masking on the engine side isn't going to work where we are extracting data from third-party databases that we don't fully control (i.e. back-ends for systems we can only meaningfully have read-only access to).

gotcha, so you can do it with our advanced sandboxing feature