Query Builder Date Filter Performance


#1

Dear metabase community,

we are using metabase within our company with great success. It is a phantastic tool that enables many users without SQL knowledge to build the reports they need.

Unfortunately there is one issue that causes major problems for us regarding the usage of date/time related filters:

Many of our database tables contain time series data. The tables grow quite big over time. Each table has an indexed timestamp column. Most of our reports contain a date filter such as

  • show me the data for the last month or
  • show me the data for the last 5 days

When filtering the data based on the timestamp column using the metabase query builder, the generated SQL does not filter on the raw timestamp field but on some transformation of the timestamp field. Examples:

  • WHERE date_trunc(‘month’, CAST(“public”.“my_events”.“event_timestamp” AS timestamp)) = date_trunc(‘month’, CAST((NOW() + INTERVAL ‘-1 month’) AS timestamp))
  • WHERE CAST(“public”.“my_events”.“event_timestamp” AS date) BETWEEN CAST((NOW() + INTERVAL ‘-5 day’) AS date) AND CAST((NOW() + INTERVAL ‘-1 day’) AS date)

Postgres, MySQL and probably most other relational database systems will not be able to make use of the index on the the timestamp field when casts or functions are applied to the field in the where condition. As a result, the whole table will be read for building a report about a few days of data. As our tables grow, the reports become slower and slower - and finally unusable.

Since the upcoming metabase release 0.33 will have a focus on the query builder, I would like to highlight this issue once again. I think many people besides us would benefit from improving the query builder with respect to date / time filtering.

Is there anything we can do to get this topic prioritized for the 0.33 release?

If some information is missing or if I can help solving this issue in any way please let me know.

To show that this topic affects a lot of people I collected the discussions and github issues about this topic from the recent years (unfortunately I am not allowed to post more than two links since I am new to this list):

Issues:

  • Performance Improvement: Use date range instead of casting #4043 - Jan 2017
  • Relative date filter field evades indices #6098 - Oct 2017
  • Performance problem on postgresql’s partitioned tables when using field filter #8275 - Aug 2018:
  • mysql table can not using date column index #9139 - Dec 2018:
  • filter on date column takes a very long time #9187 - Jan 2019:

Discussions:

  • Index usage on MySQL
  • Timestamp filters on indexes fields in MySQL
  • Select problem toward MariaDB

#2

Hi @lethum

Great write-up!

I’m not sure if it will be fixed for 0.33, since it would need a driver re-write (as far as I can understand) and then you would need to prioritize which drivers to focus on first.

I think the best options are:

  1. Go and vote on #4043 by clicking :+1: on the first post. The other issues are duplicates.
  2. Try to modify a driver, then do extensive tests and submit a PR.
  3. Support the project via sponsoring a bug - not sure if this issue is too big for sponsorship.

#3

Hi @flamber,

thank you for the quick response and for pointing out the available options.

  1. I did so :slight_smile:
  2. and 3.: I will check our options and budget about this.

#4

In my experience, the best you can do about this, at the momento, is to try, most of the times, to use BETWEEN clause for dates. In MySQL, at least, metabase uses the raw column, rather than applying some transformation over it, therefore allowing the usage of indexes. Given most of my users do select fixed date frames, rather than fixed periods (eg last 5 months), it has helped us QUITE A BIT (at 0.29, even using between, metabase would use date(column) anyway, even when the column had a date type).

Waiting for this improvement to come, though.