Dashboard-specific caching

We have several dashboards that are essentially static. They are fed data that is refreshed daily and does not change during the day.

However, the underlying queries are slow, and with several concurrent users they frequently cause a backlog of queries that makes Metabase unusable.

The solution we want to explore is to run a script that clears the cache (as discussed in the thread "No way to clear cache? (or TRUNCATE TABLE QUERY_CACHE?)") and then refreshes the relevant dashboards using the API.

We would then want the dashboards to stay cached all day. However, my concern is that setting the caching threshold very low (e.g. 0 seconds) and the TTL high enough to last all day would cause every query to be cached, and I don't know what effect that would have on the application.

Has anyone else tried this method, or had this issue? How have you gone about solving it?

Many Thanks,
Theo


Hi @theo
Set the TTL for each question instead, since you know which questions are on those dashboards.
Have a look at this issue and the referenced issues:
https://github.com/metabase/metabase/issues/13396
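As an untested sketch, the per-question TTL could also be set through the REST API instead of editing the application database directly. This assumes a local instance, a session token obtained from POST /api/session, and that the card update endpoint accepts the cache_ttl field on your version (worth verifying):

```python
import json
import urllib.request

METABASE_URL = "http://localhost:3000"  # assumption: your own instance

def card_url(card_id):
    return f"{METABASE_URL}/api/card/{card_id}"

def set_card_ttl(card_id, ttl_seconds, session_token):
    # PUT /api/card/:id with a cache_ttl override in seconds;
    # a null TTL falls back to the global TTL.
    req = urllib.request.Request(
        card_url(card_id),
        data=json.dumps({"cache_ttl": ttl_seconds}).encode(),
        headers={"Content-Type": "application/json",
                 "X-Metabase-Session": session_token},
        method="PUT",
    )
    return urllib.request.urlopen(req)
```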


Perfect - that’s just what I’m looking for

One question though, @flamber: how would this apply with a dashboard filter?
One of the dashboards has a filter that affects all of the questions.

Ideally we'd cache all the possible results of the filter. E.g. if the filter is day of the week, cache the results for each day of the week.

Would this be possible? Is each question with a different filter value cached separately?

@theo I haven't dug into it and am just going by memory, but the dashboard filters are applied to the cards, which are essentially questions, so the TTL of those questions (cache_ttl in the report_card table) is what defines the TTL. But again, I haven't tested.

When you run a query through the API (or however you pre-cache everything), you should see all the results appear in the query_cache table.
But you would have to make sure that each filter option is visited, so a cache entry is stored for it.
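That pre-warming loop could look something like the sketch below (untested). The parameter payload shape is an assumption; copy the exact payload your dashboard sends from the browser's network tab:

```python
import json
import urllib.request

METABASE_URL = "http://localhost:3000"  # assumption: your own instance

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def filter_parameter(field_id, value):
    # One filter parameter in the shape the card query API accepts
    # (shape assumed; verify against your own requests).
    return {"type": "category",
            "target": ["dimension", ["field-id", field_id]],
            "value": value}

def warm_filter_values(card_id, field_id, values, session_token):
    # Run the card once per filter value so each combination
    # lands in query_cache.
    for value in values:
        body = json.dumps(
            {"parameters": [filter_parameter(field_id, value)]}).encode()
        req = urllib.request.Request(
            f"{METABASE_URL}/api/card/{card_id}/query",
            data=body,
            headers={"Content-Type": "application/json",
                     "X-Metabase-Session": session_token},
            method="POST")
        urllib.request.urlopen(req)
```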

Please make sure you upvote this issue: https://github.com/metabase/metabase/issues/4967
It's definitely worth following some of the referenced issues too.


@flamber thanks for your help on this one. Although, just looking at the report_card table, it seems all the cache_ttl values are NULL.

After manually changing this to 100 as a test on one question that takes about 5 seconds, the question is still not cached. Do you have any insight into what's going on here?

@theo Correct. They are all NULL by default, which means the global TTL is used.
But is caching enabled globally? Check your logs; they should tell you how and when caching is handled.

Caching is enabled. My intention was to have caching enabled globally, and then to set a custom cache_ttl for those questions that need it as well.

Then to refresh them with a script daily.

Are you saying that if caching is enabled globally, the cache_ttl on the report card will not be used?

@theo No, I'm saying that if you disable caching, then it's disabled entirely. When the card TTL is defined, that TTL is used instead of the global TTL.
https://github.com/metabase/metabase/blob/master/src/metabase/query_processor/middleware/cache.clj#L4-L6

Ah I see.

I suppose I was thinking that if I set cache_ttl to a non-NULL value, the query would be cached regardless of query time (even if it didn't exceed the minimum duration).

However, that seems not to be the case. Changing the cache_ttl field does cause the result to be cached for that length of time, but it won't be cached in the first place unless the query exceeds the minimum duration.
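One way to check this behaviour is to look at the query_execution table in the Metabase application database, which records whether each run was served from the cache (this assumes a cache_hit column exists on your version; verify against your schema):

```python
def recent_runs_sql(card_id: int) -> str:
    # SQL for the Metabase application database: shows whether the
    # last few executions of a card were cache hits.
    return (
        "SELECT started_at, running_time, cache_hit "
        "FROM query_execution "
        f"WHERE card_id = {int(card_id)} "
        "ORDER BY started_at DESC LIMIT 5"
    )
```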

@flamber do you have any suggestions for how to make a card cache regardless of query time?

@theo Manually generate the query_cache entries; then you'll have the control you're looking for. What you're specifically asking for isn't part of Metabase yet, as you can see from the issues I referenced.

Thanks @flamber, I appreciate this is not something that's part of Metabase. I'm just trying to find a workaround that will help us, and maybe some others, without having to change the application code in any way.

But manually generating the query_cache entries would involve hashing the queries and serializing the results in a way that Metabase can read, and I'm not sure how to do that. It doesn't sound straightforward: a fair bit of work to define the queries in a script, run them, serialize the results, and insert them into the Metabase DB.

One VERY hacky way that I think would work is simply to put a sleep statement in the questions I want to cache, longer than the minimum duration required to trigger caching.

And just manually change their TTLs so they last all day.

Feels messy, but I think it would work.
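For a native question against a Postgres warehouse, the hack could look like the sketch below (pg_sleep is Postgres-specific, and the CTE wrapper is an assumption; adapt to your database):

```python
def add_cache_trigger_sleep(question_sql: str, seconds: int = 16) -> str:
    # Wrap a native question's SQL so it first waits `seconds` seconds
    # (via pg_sleep, so Postgres only), pushing the runtime over the
    # minimum-duration-to-cache threshold.
    return (
        f"WITH _warmup AS (SELECT pg_sleep({int(seconds)}))\n"
        f"SELECT q.* FROM ({question_sql}) AS q, _warmup"
    )
```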

@theo Messy/hacky is not always bad - but the log should help you figure out what is going on.
And then it’s important to check for any known issues, such as:
https://github.com/metabase/metabase/issues/13262


Oh thanks, what a pesky bug

@Wasiq I see you've made your way here.

I should add that I've got a pretty good working solution to refresh and cache a dashboard programmatically.

Following the lead of this post:

We have a couple of dashboards that update daily and are quite heavy to load, so refreshing and caching them daily was what we needed.

So the script has a function that does this:

def refresh_and_cache(dashboard_id, min_duration=15):
    set_dashboard_card_ttls(dashboard_id, ttl=0)
    set_dashboard_card_ttls(dashboard_id, ttl=45000)
    
    set_min_duration_to_cache(0)
    card_ids = get_card_ids(dashboard_id)
    
    for card_id in card_ids:
        refresh_card(card_id)

    set_min_duration_to_cache(min_duration)

  • it sets the TTL to zero (in effect clearing the cache on that dashboard)
  • then sets it back to 45000 seconds (so the cache lasts at least 12 hours)
  • then sets the minimum query duration to 0 (so everything gets cached)
  • then goes through and refreshes the queries (this is quite slow, so I actually do it in parallel)
  • then sets the minimum query duration back to what we want (in this case 15 s)

Hi @theo, I am new to Metabase but I'm having a similar problem with massive dashboards that we want to cache once a day automatically. Can you give me a hint on how you constructed these functions?

  • set_dashboard_card_ttls
  • set_min_duration_to_cache
  • get_card_ids
  • refresh_card

The code we use at the moment looks something like this:

def clear_card_query_cache(card_id):
    # Delete the cached results for every query hash this card has produced.
    # int() guards the string interpolation against SQL injection.
    query_mbdb(f"""DELETE FROM public.query_cache
                   WHERE query_hash IN (SELECT hash
                                        FROM public.query_execution
                                        WHERE card_id = {int(card_id)})""")

def refresh_card(card_id):
    # Re-run the question so a fresh result is stored in the cache.
    mb.post('/api/card/{}/query'.format(card_id))

def set_min_duration_to_cache(min_duration):
    # Global "minimum query duration" setting (seconds) that gates caching.
    data = {'value': min_duration}
    mb.put('/api/setting/query-caching-min-ttl', json=data)

def set_card_cache_ttl(card_id, ttl=3000):
    # Per-question TTL in seconds; NULL means "use the global TTL".
    query_mbdb(f"""UPDATE public.report_card
                   SET cache_ttl = {int(ttl)}
                   WHERE id = {int(card_id)}""")

def set_dashboard_card_ttls(dashboard_id, ttl=5000):
    # Apply the same TTL to every card on the dashboard.
    card_ids = get_card_ids(dashboard_id)
    for card_id in card_ids:
        set_card_cache_ttl(card_id, ttl=ttl)

def refresh_and_cache(dashboard_id, min_duration=15):
    set_dashboard_card_ttls(dashboard_id, ttl=45000)

    set_min_duration_to_cache(0)
    card_ids = get_card_ids(dashboard_id)

    for card_id in card_ids:
        clear_card_query_cache(card_id)
        refresh_card(card_id)

    set_min_duration_to_cache(min_duration)
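The remaining helper, get_card_ids, could be sketched like this (untested; it assumes the GET /api/dashboard/:id response uses the "ordered_cards" shape of older Metabase versions, and that you have a session token from POST /api/session):

```python
import json
import urllib.request

METABASE_URL = "http://localhost:3000"   # assumption: your instance
SESSION_TOKEN = "paste-session-token"    # assumption: from POST /api/session

def extract_card_ids(dashboard_json):
    # Pull card ids out of a dashboard response; text cards have
    # card_id = None and are skipped.
    return [dc["card_id"]
            for dc in dashboard_json.get("ordered_cards", [])
            if dc.get("card_id") is not None]

def get_card_ids(dashboard_id):
    req = urllib.request.Request(
        f"{METABASE_URL}/api/dashboard/{dashboard_id}",
        headers={"X-Metabase-Session": SESSION_TOKEN})
    with urllib.request.urlopen(req) as resp:
        return extract_card_ids(json.load(resp))
```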