No way to clear the cache? (or TRUNCATE TABLE QUERY_CACHE?)

Hello,

Problem
I cannot find a way to clear the cache for all questions. This creates very strange results for users, who keep seeing old cached results.

Previous solution
As discussed in another topic, I managed to refresh some questions’ cache automatically.

But this only works for questions without filters. I am trying to find a robust way to clear the cache.

Proposal
I connected to H2 (using this Docker image) and ran the query TRUNCATE TABLE QUERY_CACHE. This seems to clear the cache as expected, but I do not know whether it will corrupt the H2 database.

  • Is this a correct way to clear the cache?
  • If so, would it be possible to add it to the Metabase interface?

Many thanks

I finally managed to do it.
I first had to switch from H2 to Postgres as the application database (detailed script for our docker-compose setup)

Then you only have to TRUNCATE the query_cache table (which is much easier with Postgres…), then query all the cards again.
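For reference, the two steps above can be sketched roughly like this. Everything here is a placeholder for our own setup (container name, credentials, card IDs, session token), not something Metabase prescribes; POST /api/card/:id/query is the API endpoint that executes a saved question.

```shell
# Assumption: the Metabase application database runs in a Postgres
# container named "metabase-db"; adjust names and credentials to your setup.
docker exec -it metabase-db \
  psql -U metabase -d metabase -c 'TRUNCATE TABLE query_cache;'

# Then re-run each card through the API so fresh results get cached.
# The session token comes from POST /api/session.
SESSION="your-session-token"     # placeholder
for CARD_ID in 1 2 3; do         # placeholder card ids
  curl -s -X POST "http://localhost:3000/api/card/$CARD_ID/query" \
    -H "X-Metabase-Session: $SESSION" > /dev/null
done
```

Take a backup of the application database before truncating anything.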


Hello, I’ve experienced a few issues, but I’m replying here to see if this is a viable strategy to address them. My primary question is: if I run TRUNCATE TABLE QUERY_CACHE (we also use a Postgres DB), will it cause any issues with Metabase?

I have seen other posts mentioning that deleting directly from tables can cause bugs, but that makes sense if you are deleting something that is referenced in other tables such as users or collections. These cache entries don’t get referenced elsewhere (as far as I know) so I’m hoping that means this would be safe.

If this would cause issues, then is there a safe way to actually clear the cached results? I am looking to minimize the size of our database, and my understanding is that rows in QUERY_CACHE never get removed, but only overwritten by new cached results. If that’s true, then the size of the table will only ever grow.

Background info:
We have our own fork of the Metabase repo, and we added some functionality to the UI to allow people to manipulate the cache_ttl value per-question. We were excited to let people have individual control of the cache settings on their questions, but a few hours after we rolled the feature out, our database crashed because it ran out of storage space. Before this incident, we had 20 GB free. Our global settings dictate that cached results can be no larger than 1000 KB, but we allow caching on any question that runs more than 2 seconds. We fixed the issue by increasing the size of our Postgres instance and disabling caching altogether.

Running this query: SELECT pg_size_pretty( pg_total_relation_size('query_cache') ) tells me that the query_cache table is only 1055 MB. How is that table just over 1 GB, yet caching seemed to suddenly take up 20 GB of our storage space? I’m wondering if something was being written to disk for our instance but not being stored in this table.

Anyway, I am looking to see if I can prune away some old results while I’m testing to see how big this table grows. We may even consider a long-term solution that prunes away old results via a nightly job or something.
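A nightly prune along those lines could be as simple as a cron entry running a DELETE with a cutoff. This is a sketch under the assumption that query_cache has an `updated` timestamp column (which is what I see in our Postgres application database); connection details are placeholders, and you should have a backup before touching the application database directly.

```shell
# Sketch of a nightly prune job for old cached results.
# Crontab entry: run at 03:00 every night.
#   0 3 * * * /usr/local/bin/prune-metabase-cache.sh

# prune-metabase-cache.sh -- placeholder connection details
psql -U metabase -d metabase -c \
  "DELETE FROM query_cache WHERE updated < now() - interval '7 days';"
```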

@liveandletbri
You can truncate query_cache if you want to clear all cache - or individual rows if needed.
But any manual changes to the application database should only be done after you have a backup.

Old cache entries are automatically cleaned up: https://www.metabase.com/docs/latest/operations-guide/environment-variables.html#mb_query_caching_max_ttl

Are you sure that it was the query_cache table taking up all the space? Couldn’t it have been another problem that caused the storage hit?
There must have been more details in the Postgres logs.

Hi @flamber, we’re also having real trouble with caching unfortunately.

The approach of multiplying average execution time by a certain factor to determine the TTL is really not very helpful for us - data has to be fresh within certain time intervals (e.g. at most 1 day old).

That typically depends on the query’s purpose and isn’t necessarily equal for all queries in an instance.

Having the TTL depend on the average runtime results in absolutely unpredictable system behavior and is turning into a real show-stopper for us.
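If I read the environment-variable docs correctly, each question's TTL is its average execution time multiplied by MB_QUERY_CACHING_TTL_RATIO (default 10), with MB_QUERY_CACHING_MAX_TTL as an absolute upper bound on how long any result is kept. A minimal sketch of that arithmetic (my reading of the docs, not Metabase's actual code; the defaults are taken from the docs and may differ per version):

```shell
# TTL = average execution time * ratio, bounded above by the absolute max.
# Defaults as I read them from the env-var docs: ratio 10, max 3024000 s (35 days).
cache_ttl_seconds() {
  avg=$1; ratio=${2:-10}; max=${3:-3024000}
  ttl=$((avg * ratio))
  if [ "$ttl" -gt "$max" ]; then ttl=$max; fi
  echo "$ttl"
}

cache_ttl_seconds 30              # prints 300
cache_ttl_seconds 3600 10 14400   # prints 14400 (capped by the max TTL)
```

This is why a single outlier run is so painful: it inflates the average, and the cached result then lives ten times that long.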

I have tried setting mb_query_caching_max_ttl to 4 hours (in seconds), as outlined in the article you linked above (https://www.metabase.com/docs/latest/operations-guide/environment-variables.html#mb_query_caching_max_ttl), by invoking the following command to run Metabase:

java -DMB_QUERY_CACHING_MAX_TTL=14400 -jar metabase.jar >> metabase.log 2>&1 &

Yet still reports keep displaying data that is several days old.

How can we get this sorted?

We are also experiencing additional issues right now, because we had an extremely long-running query that basically bogged down the entire DB for a while. As a result, all average execution times appear significantly higher than they actually are - is there any way to reset the execution-time statistics for all queries?

Thank you & Best Regards
Fabian

@SetSails Post “Diagnostic Info” from Admin > Troubleshooting.
There’s a request for different TTL handling:
https://github.com/metabase/metabase/issues/13396 - upvote by clicking :+1: on the first post
And there’s an issue with the calculation of the average execution time:
https://github.com/metabase/metabase/issues/7282

@flamber thanks for the quick reply! Here is the diagnostic info:

{
  "browser-info": {
    "language": "de",
    "platform": "MacIntel",
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36",
    "vendor": "Google Inc."
  },
  "system-info": {
    "java.runtime.name": "OpenJDK Runtime Environment",
    "java.runtime.version": "11.0.9.1+1-Ubuntu-0ubuntu1.18.04",
    "java.vendor": "Ubuntu",
    "java.vendor.url": "https://ubuntu.com/",
    "java.version": "11.0.9.1",
    "java.vm.name": "OpenJDK 64-Bit Server VM",
    "java.vm.version": "11.0.9.1+1-Ubuntu-0ubuntu1.18.04",
    "os.name": "Linux",
    "os.version": "4.15.0-1065-aws",
    "user.language": "en",
    "user.timezone": "Etc/UTC"
  },
  "metabase-info": {
    "databases": [
      "h2",
      "mysql"
    ],
    "hosting-env": "unknown",
    "application-database": "mysql",
    "run-mode": "prod",
    "version": {
      "date": "2020-01-13",
      "tag": "v0.34.1",
      "branch": "release-0.34.x",
      "hash": "265695c"
    },
    "settings": {
      "report-timezone": "Europe/Berlin"
    }
  }
}

@flamber RE issue with calculating the avg exec time https://github.com/metabase/metabase/issues/7282 - this doesn’t seem to be related to our issue.

In our case, it is really that past executions took forever during a defined time window. That issue has since been resolved, but the average execution times are still skewed.

Thanks & Regards
Fabian

@SetSails There have been many changes to query handling and caching since 0.34.1, so it’s hard to tell, but there’s an issue specifically about resetting execution time:
https://github.com/metabase/metabase/issues/6666 - upvote by clicking :+1: on the first post.
Currently you would have to redo the calculation manually.
