We have long queries (MongoDB aggregation ones) that usually take 12/15 min to load, but recently we had some problems loading them, sometimes the question simply stops. After reading through the logs I’ve had found this:
Exception in thread "async-dispatch-429" java.lang.IllegalStateException: AsyncContext completed and/or Request lifecycle recycled
at org.eclipse.jetty.server.AsyncContextState.state(AsyncContextState.java:54)
at org.eclipse.jetty.server.AsyncContextState.complete(AsyncContextState.java:74)
at ring.util.servlet$make_output_stream$fn__46932.invoke(servlet.clj:89)
at ring.util.servlet.proxy$java.io.FilterOutputStream$ff19274a.close(Unknown Source)
at java.base/sun.nio.cs.StreamEncoder.implClose(Unknown Source)
at java.base/sun.nio.cs.StreamEncoder.close(Unknown Source)
at java.base/java.io.OutputStreamWriter.close(Unknown Source)
at java.base/java.io.BufferedWriter.close(Unknown Source)
at metabase.async.api_response$write_chan_vals_to_writer_BANG_$fn__67375$state_machine__6775__auto____67380$fn__67382.invoke(api_response.clj:87)
at metabase.async.api_response$write_chan_vals_to_writer_BANG_$fn__67375$state_machine__6775__auto____67380.invoke(api_response.clj:79)
at clojure.core.async.impl.ioc_macros$run_state_machine.invokeStatic(ioc_macros.clj:973)
at clojure.core.async.impl.ioc_macros$run_state_machine.invoke(ioc_macros.clj:972)
at clojure.core.async.impl.ioc_macros$run_state_machine_wrapped.invokeStatic(ioc_macros.clj:977)
at clojure.core.async.impl.ioc_macros$run_state_machine_wrapped.invoke(ioc_macros.clj:975)
at clojure.core.async.impl.ioc_macros$take_BANG_$fn__6793.invoke(ioc_macros.clj:986)
at clojure.core.async.impl.channels.ManyToManyChannel$fn__1679$fn__1680.invoke(channels.clj:95)
at clojure.lang.AFn.run(AFn.java:22)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
11-25 20:14:27 e[1mDEBUG middleware.loge[0m :: e[32mPOST /api/card/801/query 200 [ASYNC: canceled] 10.0 mins (7 DB calls) Jetty threads: 2/50 (4 idle, 0 queued) (119 total active threads) Queries in flight: 10e[0m
And this happens to every question that take more than 10 min to load. What seems strange to me is if I try to run the aggregations direct on DB (using Robo 3T) the query run normally and sometimes even faster.
We already have the cache enabled and that don’t helped a lot, and our Metabase version is the 0.33.6.
Also, what does that Request lifecycle recycled means?
Which version of MongoDB?
Have you defined any Connection String options in Admin > Databases > (mongo)?
Or have you tried setting connectTimeout=1200000 (1000ms * 60 * 20, for 20 minutes)
There shouldn’t be any timeouts, but I know there’s a timeout in the Druid driver, but I cannot see any in the MongoDB driver.
The only issue that references the first error is #11391, but that’s specific to Postgres.
What seems strange to me is that the queries that takes more than 10 minutes run normally using Robo 3T, only when running in Metabase I get this problem. That makes me think that this is driver thing, the stackoverflow link that you attached reinforce that, don’t you think?
@mateusluzifb
I don’t know what Robo3T is, but I guess it’s a client. Metabase is also just a client, but uses different drivers for the connection (unless Robo3T is also build in Clojure), so maybe it has something to do with the drivers or Robo3T sets some configurations, which extends the timeout.
@camsaul Is there some hidden timeout on the MongoDB driver that kills the connection after 10 minutes exactly? It seems there’s one in the Druid driver, so that’s why I’m asking.
Also, it seems like it’s not connectionTimeout, but connectionTimeoutMS and socketTimeoutMS (milliseconds): https://docs.mongodb.com/manual/reference/connection-string/#timeout-options
So have you tried adding this to the Connection String in Admin > Databases > (mongodb)? connectionTimeoutMS=10000&socketTimeoutMS=1200000
I am getting a similar error with Oracle. Increasing the oracle ReadTimeout by setting -Doracle.jdbc.ReadTimeout=900000 will not affect the 10 minute timeout.
@k_p There shouldn’t be any timeouts on Oracle - as far as I know. It seems like ReadTimeout is set to prevent errors, when the database connection is down, but that would be a different problem than this topic.
There are timeouts on BigQuery and Druid currently: https://github.com/metabase/metabase/issues/12003
I am extending this value by a lot to handle some of our long running queries - but there may be some further consequences I’m not aware of.
speculation: I’m assuming this is has accidentally effected query timouts, as the “async jetty thread” that is handling the ajax query request from the browser is timed out after 10 minutes which causes all resources to be cleaned up - aka connection to DB is prematurely closed by jetty.
@k_p I don’t know which version you’re using, but there’s a lot of work put into the async Jetty in the upcoming 0.35.0, but there has been several changes since it was introduced too.