Slow csv and excel download

We were running metabase .35 , we upgraded to .40.2 while simultaneously upgrading to mysql 8.0 from mysql 5.7, now on a really huge result , the excel and csv file download is taking for ever, even going beyond 10+ minutes.

I tried upgrading to .40.4 but the issue still persists.

Hi @chiddu
That's a lot of changes at the same time, so it's difficult to tell where the problem is.
It would take some debugging on your side to figure out where the problem is.

The problem is there even with .38 . There is no table migration between .35 and .38. So I'm able to work with the same database with both the jars.
I use .38 the problem appears
I use .35 the problem disappears.

@chiddu There are a lot of migrations between 0.35.0 and 0.38.0, so not sure where you got that from.
https://github.com/metabase/metabase/blob/master/resources/migrations/000_migrations.yaml

What is your definition of "really huge result"?

Check the query log on MySQL to see how much time it's using on returning results.

The problem does not happen with v0.36.12
I switch to v0.37.0 I see the issue , so it is some fundamental change , or some new configuration that needs to be made on .37.0 ?

@chiddu Then I'm guessing you have a lot of timestamp columns, which was changed in 0.37.0:
https://github.com/metabase/metabase/pull/13447
There are a lot of further changes to exports in the upcoming 0.41

The problem is there even in v0.37.0-rc1 . So here is the deal, the records return fairly quickly , displaying the 1st 2000 rows but when i download the csv or excel the download speed reduces to a few kb per second taking for ever, one of our users download a 46mb file , that is what alerted us to the problem

Not sure about timestamp, let me try fetching only a subset of the results.

The trace on commandline when i shut down the server (stopping the download)
2021-09-17 09:05:27,740 INFO api.card :: Question's average execution duration is 4.8 s; using 'magic' TTL of 1.6 mins :floppy_disk:
2021-09-17 09:05:29,612 DEBUG middleware.log :: GET /api/database 200 2.3 s (4 DB calls) App DB connections: 0/4 Jetty threads: 3/50 (5 idle, 0 queued) (55 total active threads) Queries in flight: 1 (0 queued)
2021-09-17 09:05:45,970 INFO middleware.cache :: Query took 18.2 s to run; miminum for cache eligibility is 60.0 s
2021-09-17 09:05:46,048 DEBUG middleware.log :: POST /api/card/645/query 202 [ASYNC: completed] 18.3 s (7 DB calls) App DB connections: 1/4 Jetty threads: 2/50 (6 idle, 0 queued) (57 total active threads) Queries in flight: 0 (0 queued)
2021-09-17 09:05:50,215 INFO api.card :: Question's average execution duration is 2.3 mins; using 'magic' TTL of 46.5 mins :floppy_disk:
2021-09-17 09:08:42,606 INFO cache.impl :: Results are too large to cache. :tired_face:

download1

@chiddu
Post "Diagnostic Info" from Admin > Troubleshooting.
How are you hosting Metabase and how much memory+CPU resources are available?
The Metabase log you provided is not for exports.

{
"browser-info": {
"language": "en-GB",
"platform": "Linux x86_64",
"userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36",
"vendor": "Google Inc."
},
"system-info": {
"file.encoding": "UTF-8",
"java.runtime.name": "OpenJDK Runtime Environment",
"java.runtime.version": "1.8.0_292-8u292-b10-0ubuntu1~18.04-b10",
"java.vendor": "Private Build",
"java.vendor.url": "http://java.oracle.com/",
"java.version": "1.8.0_292",
"java.vm.name": "OpenJDK 64-Bit Server VM",
"java.vm.version": "25.292-b10",
"os.name": "Linux",
"os.version": "5.4.0-1056-aws",
"user.language": "en",
"user.timezone": "Etc/UTC"
},
"metabase-info": {
"databases": [
"h2",
"presto",
"mysql"
],
"hosting-env": "unknown",
"application-database": "mysql",
"application-database-details": {
"database": {
"name": "MySQL",
"version": "5.7.33-log"
},
"jdbc-driver": {
"name": "MariaDB Connector/J",
"version": "2.6.2"
}
},
"run-mode": "prod",
"version": {
"tag": "v0.37.0",
"date": "2020-10-22",
"branch": "release-x.37.x",
"hash": "f435e43"
},
"settings": {
"report-timezone": null
}
}
}

This is a r5a.large instance from AWS so a 2vcpu, 16gb virtual machine. (Production we run r5a.xlarge(s) )

@chiddu
Check the first couple of lines during startup, which tells you about memory available for Java.
Also, try Java 11 instead: https://www.metabase.com/docs/latest/operations-guide/java-versions.html
(Java 17 has just been released, and is even faster, but we have not tested it enough yet)

1 Like

2021-09-17 09:35:55,617 INFO metabase.util :: Loading Metabase...
2021-09-17 09:35:55,629 INFO metabase.util :: Maximum memory available to JVM: 21.3 GB
2021-09-17 09:36:09,141 INFO util.encryption :: Saved credentials encryption is DISABLED for this Metabase instance. :unlock:
For more information, see https://metabase.com/docs/latest/operations-guide/encrypting-database-details-at-rest.html
2021-09-17 09:36:17,316 WARN metabase.core :: WARNING: You have enabled namespace tracing, which could log sensitive information like db passwords.
2021-09-17 09:36:17,323 INFO metabase.core :: Starting Metabase in STANDALONE mode

Will install java11 and check.

@chiddu Please provide information about what question 645 contains. It sounds like you have huge columns.
It's taking 18 seconds just to return 2000(?) rows, so if you are downloading 1 million rows, then that would take at least 500 times as long, meaning 2,5 hours - if we just went by those numbers.

java11 works. [flamber]
@flamber if ever we meet I owe you a drink or a dinner or whatever you think is appropriate

1 Like

@chiddu By the way, I would recommend that you read this article, which we wrote. It contains a lot of good information.
https://www.metabase.com/learn/administration/metabase-at-scale
Also, 24GB memory seems like way too much - unless you have thousands of users, but then you might actually get more performance from running multiple smaller instance (example 3 instances).

1 Like

Hello, I have a question that when I download csv or excel file, metabase will execute the query again? If this query take 10mins, download will also take 10mins?
@flamber Thanks!

@jimomiaomiao Metabase will always execute queries again, so export will minimum take the time the query takes.
https://github.com/metabase/metabase/issues/8616 - upvote by clicking :+1: on the first post