Dashboard-specific Caching

@flamber thanks for your help on this one. Although just looking at the report_card table it seems all the values for cache_ttl are NULL.

After manually changing this to 100 as a test on one 5-second question, the question is still not cached. Have you got any insight into what's going on here?
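
For reference, one way to check whether a cached result actually exists for a card is to look in query_cache for the card's recent execution hashes (the same join used further down this thread). A rough sketch, run against the Metabase application database; the card id 123 is just a placeholder:

-- No rows back means there is no cached result for that card.
SELECT query_hash, updated_at
  FROM query_cache
 WHERE query_hash IN (SELECT hash
                        FROM query_execution
                       WHERE card_id = 123);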

@theo Correct. All of them are null by default, since that means it will use the global TTL.
But is cache enabled globally? Check your logs, which should tell you how and when caching is handled.
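
(Besides the logs, one way to double-check the global caching settings is the settings API. A rough sketch, assuming an admin session token; the query-caching-min-ttl key appears later in this thread, the others may vary by Metabase version:)

import requests

MB_URL = 'https://metabase.example.com'            # placeholder URL
HEADERS = {'X-Metabase-Session': '<session-id>'}   # placeholder admin session

# List all settings and print the caching-related ones.
settings = requests.get(MB_URL + '/api/setting', headers=HEADERS).json()
for s in settings:
    if 'caching' in s.get('key', ''):
        print(s['key'], '=', s.get('value'))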

Caching is enabled. My intention was to have caching enabled globally, and then also to set a custom cache_ttl for those questions that need it.

Then to refresh them with a script daily.

Are you saying that if caching is enabled globally, the cache_ttl in report_card will not be used?

@theo No, I’m saying if you disable cache, then it’s disabled. When the card TTL is defined, then that TTL is used instead of global TTL.
https://github.com/metabase/metabase/blob/master/src/metabase/query_processor/middleware/cache.clj#L4-L6

Ah I see.

I suppose I was thinking that if I set cache_ttl to a non-null value, then the query would be cached regardless of query time (even if it didn't exceed the minimum duration).

However, that seems not to be the case. Changing the cache_ttl field does cause the result to be cached for that length of time, but it won't be cached in the first place unless the query exceeds the minimum duration.
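
So, putting those two rules together, the effective behaviour seems to be roughly this (a simplified sketch of what's described above, not Metabase's actual code; all values in seconds):

def should_cache(avg_query_duration, card_cache_ttl, min_duration, global_ttl):
    """Simplified model of the caching rules described above (not Metabase's code)."""
    # A result is only cached once the query is slower than the global minimum duration...
    if avg_query_duration < min_duration:
        return False, None
    # ...and when it is cached, a card-level cache_ttl overrides the global TTL.
    ttl = card_cache_ttl if card_cache_ttl is not None else global_ttl
    return True, ttl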

@flamber have you got any suggestions for how to cause a card to cache, regardless of query time?

@theo Manually generate the query_cache entries, then you’ll have the control you’re looking for. What you’re specifically asking for isn’t part of Metabase yet, as you can see from the issues I have referenced.

Thanks @flamber, I appreciate this is not something that's part of Metabase. I'm just trying to find a workaround that will help us, and maybe some others, without having to change the application code in any way.

But manually generating the query_cache entries would involve hashing the results, and I'm not sure how to do that in a way that Metabase could read. That doesn't sound straightforward: it's a fair bit of work to define the queries in a script, read the results, hash them, and then insert them into the Metabase db.

One VERY hacky way that I think would work is simply to put a sleep statement in the questions I want to cache that is longer than the minimum duration, to trigger caching.

And just manually change their TTLs so they last all day.

Feels messy, but I think it would work.
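
For example, if the data warehouse were Postgres, the padding could look something like the sketch below (pg_sleep is Postgres-specific and my_table just stands in for the real question; other databases have their own sleep functions):

-- Pad the question so its runtime exceeds the minimum-duration threshold
-- (16 seconds here, just over the 15-second minimum used later in this thread).
WITH pause AS (
    SELECT pg_sleep(16)
)
SELECT t.*
FROM my_table t, pause;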

@theo Messy/hacky is not always bad - but the log should help you figure out what is going on.
And then it’s important to check for any known issues, such as:
https://github.com/metabase/metabase/issues/13262

Oh thanks, what a pesky bug

@Wasiq I see you've made your way here.

I should add that I've got a pretty good working solution to refresh and cache a dashboard programmatically.

Following the lead of this post:

we have a couple of dashboards which update daily and are quite heavy to load, so refreshing and caching them daily was what we needed.

So the script has a function that does this:

def refresh_and_cache(dashboard_id, min_duration=15):
    set_dashboard_card_ttls(dashboard_id, ttl=0)
    set_dashboard_card_ttls(dashboard_id, ttl=45000)
    
    set_min_duration_to_cache(0)
    card_ids = get_card_ids(dashboard_id)
    
    for card_id in card_ids:
        refresh_card(card_id)

    set_min_duration_to_cache(min_duration) 
  • it sets the TTL to zero (in effect clearing the cache on that dashboard)
  • sets it back to 45000 (so as to last at least 12 hours).
  • then it sets minimum query duration to 0 (so as to cache anything)
  • then it goes through and refreshes the queries (this is quite slow so I actually do it in parallel)
  • then it sets minimum query duration back to what we want (in this case 15s)

Hi @theo, I am new to Metabase but am having a similar problem with massive dashboards that we want to cache once a day automatically. Can you please give me a hint on how you constructed these functions?

  • set_dashboard_card_ttls
  • set_min_duration_to_cache
  • get_card_ids
  • refresh_card

The code we use at the moment looks something like this:

def clear_card_query_cache(card_id):
    query_mbdb("""delete 
                  from public.query_cache
                 where query_hash in (select hash 
                                      from public.query_execution 
                                      where card_id=""" + str(card_id) + ")")

def refresh_card(card_id):
    mb.post('/api/card/{}/query'.format(card_id))

def set_min_duration_to_cache(min_duration):
    data = {'value': min_duration}
    mb.put('/api/setting/query-caching-min-ttl', json=data)

def set_card_cache_ttl(card_id, ttl=3000):
    query_mbdb("""UPDATE public.report_card 
                          SET cache_ttl = """ + str(ttl) + """
                          WHERE id = """ + str(card_id))

def set_dashboard_card_ttls(dashboard_id, ttl=5000):
    card_ids = get_card_ids(dashboard_id)
    for card_id in card_ids:
        set_card_cache_ttl(card_id, ttl=ttl)

def refresh_and_cache(dashboard_id, min_duration=15):
    set_dashboard_card_ttls(dashboard_id, ttl=45000)

    set_min_duration_to_cache(0)
    card_ids = get_card_ids(dashboard_id)

    for card_id in card_ids:
        clear_card_query_cache(card_id)
        refresh_card(card_id)

    set_min_duration_to_cache(min_duration)

*query_mbdb() queries the Metabase database 🙂
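
(Side note: get_card_ids isn't shown above. A possible sketch, assuming the mb client also exposes a get() that returns parsed JSON, and that GET /api/dashboard/:id lists the dashboard's cards under ordered_cards or dashcards, which varies between Metabase versions:)

def get_card_ids(dashboard_id):
    # Fetch the dashboard and collect the ids of the saved questions on it.
    dashboard = mb.get('/api/dashboard/{}'.format(dashboard_id))
    dashcards = dashboard.get('ordered_cards') or dashboard.get('dashcards') or []
    # Text/virtual cards have no card_id, so skip them.
    return [dc['card_id'] for dc in dashcards if dc.get('card_id') is not None]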

Thanks a lot @theo!!! Much appreciated. A quick question: I see you remove cached entries from the query_cache table, but I don't have this table; I only see the query and query_execution tables. Should I use the query table instead?

And query_mbdb() uses the REST API as well? Something like this:

def query_mbdb(query):
    # e.g. the DELETE statement built in clear_card_query_cache above
    url = '.../api/dataset/'
    headers = {'Content-Type': 'application/json', 'X-Metabase-Session': '.....'}
    payload = {'database': 1,
               'native': {'query': query},
               'type': 'native'}
    r = requests.post(url, headers=headers, json=payload)

@stan_t
Everybody has the query_cache table. Otherwise your application database is corrupted.
The queries inside query_mbdb() are direct database calls to the application database. They're not API calls.
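
(For completeness, a direct-connection version of query_mbdb() could look roughly like this sketch, assuming a Postgres application database, psycopg2, and placeholder connection details:)

import psycopg2

def query_mbdb(query, params=None):
    # Connect to the Metabase *application* database directly (not via the API);
    # host, database name and credentials here are placeholders.
    conn = psycopg2.connect(host='localhost', dbname='metabase',
                            user='metabase', password='secret')
    try:
        with conn, conn.cursor() as cur:   # commits on success, rolls back on error
            cur.execute(query, params)
    finally:
        conn.close()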

Thanks @flamber!!! it worked.

Hi @theo, have you noticed that some cached dashboards don't use the cache and hit the database instead? I have 10 cached dashboards, and I randomly get one or two not using the cache.