Scanning of Map types in Clickhouse


We are using Metabase 0.44.6 with Clickhouse. We are using the unofficial metabase-clickhouse-driver as this is the only one available.

When Metabase does its "Scans", it also includes columns that are of Map(String, Int64). That is, it runs following query on a Map(...) type

-- Metabase
SELECT `my_db`.`my_table`.`my_column` AS `my_column_ex_544d1323`
FROM `my_db`.`my_column`
GROUP BY `my_db`.`my_table`.`my_column`
ORDER BY `my_db`.`my_table`.`my_column` ASC LIMIT 1000
FORMAT TabSeparatedWithNamesAndTypes

This query is non performant and on average takes 1295secs (i.e. ~21mins)... per column of this type.

Our tables are HUGE.... > 1+ Billion rows, AND we have 30+ such tables and each table has 2-3 such columns.

My understanding is that Scanning is to sample values for drop downs, variable etc. It is unlikely that columns with such types every be used in 'Field Filters' or even other kind of variable inputs and thus Scanning these kinds of fields is wasteful.

Should this somehow be optimised such that only columns with simple types such as String, Integer, Bool, Enum, date, time, datetime etc should be sampled?

Hi @nilay
That would have to be implemented on the Clickhouse driver:
You can disable scanning:

hi @flamber

Thanks for the response. After digging a bit into the bowels of MHTS, my understanding is that only certain types are scanned for values. Hence, if I do the following in the Clickhouse driver, then it should fix my issue:

  • Update the driver to translate Map Clickhouse type to types/Collection MHTS type.
  • i.e. For all Clickhouse Map types, the TableMetadataField - base-type will be types/Collection and that would prevent the scanning.

Would that work?


@nilay The driver can decide what is being scanned and how. See the Mongo driver, which is a very different database type. You can completely override the scanning functions, which sounds like what you want to do.

1 Like