Question with 10 minutes long are being canceled

We have long queries (MongoDB aggregation ones) that usually take 12/15 min to load, but recently we had some problems loading them, sometimes the question simply stops. After reading through the logs I’ve had found this:

Exception in thread "async-dispatch-429" java.lang.IllegalStateException: AsyncContext completed and/or Request lifecycle recycled
	at org.eclipse.jetty.server.AsyncContextState.state(AsyncContextState.java:54)
	at org.eclipse.jetty.server.AsyncContextState.complete(AsyncContextState.java:74)
	at ring.util.servlet$make_output_stream$fn__46932.invoke(servlet.clj:89)
	at ring.util.servlet.proxy$java.io.FilterOutputStream$ff19274a.close(Unknown Source)
	at java.base/sun.nio.cs.StreamEncoder.implClose(Unknown Source)
	at java.base/sun.nio.cs.StreamEncoder.close(Unknown Source)
	at java.base/java.io.OutputStreamWriter.close(Unknown Source)
	at java.base/java.io.BufferedWriter.close(Unknown Source)
	at metabase.async.api_response$write_chan_vals_to_writer_BANG_$fn__67375$state_machine__6775__auto____67380$fn__67382.invoke(api_response.clj:87)
	at metabase.async.api_response$write_chan_vals_to_writer_BANG_$fn__67375$state_machine__6775__auto____67380.invoke(api_response.clj:79)
	at clojure.core.async.impl.ioc_macros$run_state_machine.invokeStatic(ioc_macros.clj:973)
	at clojure.core.async.impl.ioc_macros$run_state_machine.invoke(ioc_macros.clj:972)
	at clojure.core.async.impl.ioc_macros$run_state_machine_wrapped.invokeStatic(ioc_macros.clj:977)
	at clojure.core.async.impl.ioc_macros$run_state_machine_wrapped.invoke(ioc_macros.clj:975)
	at clojure.core.async.impl.ioc_macros$take_BANG_$fn__6793.invoke(ioc_macros.clj:986)
	at clojure.core.async.impl.channels.ManyToManyChannel$fn__1679$fn__1680.invoke(channels.clj:95)
	at clojure.lang.AFn.run(AFn.java:22)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
11-25 20:14:27 e[1mDEBUG middleware.loge[0m :: e[32mPOST /api/card/801/query 200 [ASYNC: canceled] 10.0 mins (7 DB calls) Jetty threads: 2/50 (4 idle, 0 queued) (119 total active threads) Queries in flight: 10e[0m

And this happens to every question that take more than 10 min to load. What seems strange to me is if I try to run the aggregations direct on DB (using Robo 3T) the query run normally and sometimes even faster.

We already have the cache enabled and that don’t helped a lot, and our Metabase version is the 0.33.6.

Also, what does that Request lifecycle recycled means?

Also, sometimes we get this error too:

11-25 20:35:43 e[1mWARN middleware.process-userland-querye[0m :: Query failure e[31m{:status :failed,
 :class com.mongodb.MongoInterruptedException,
 :error "Interrupted acquiring a permit to retrieve an item from the pool ",
 :stacktrace
 ("com.mongodb.internal.connection.ConcurrentPool.acquirePermit(ConcurrentPool.java:203)"
  "com.mongodb.internal.connection.ConcurrentPool.get(ConcurrentPool.java:140)"
  "com.mongodb.internal.connection.ConcurrentPool.get(ConcurrentPool.java:123)"
  "com.mongodb.internal.connection.PowerOfTwoBufferPool.getBuffer(PowerOfTwoBufferPool.java:78)"
  "com.mongodb.internal.connection.SocketStream.read(SocketStream.java:105)"
  "com.mongodb.internal.connection.InternalStreamConnection.receiveResponseBuffers(InternalStreamConnection.java:578)"
  "com.mongodb.internal.connection.InternalStreamConnection.receiveMessage(InternalStreamConnection.java:441)"
  "com.mongodb.internal.connection.InternalStreamConnection.receiveCommandMessageResponse(InternalStreamConnection.java:295)"
  "com.mongodb.internal.connection.InternalStreamConnection.sendAndReceive(InternalStreamConnection.java:255)"
  "com.mongodb.internal.connection.UsageTrackingInternalConnection.sendAndReceive(UsageTrackingInternalConnection.java:99)"
  "com.mongodb.internal.connection.DefaultConnectionPool$PooledConnection.sendAndReceive(DefaultConnectionPool.java:444)"
  "com.mongodb.internal.connection.CommandProtocolImpl.execute(CommandProtocolImpl.java:72)"
  "com.mongodb.internal.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:200)"
  "com.mongodb.internal.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:269)"
  "com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:131)"
  "com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:123)"
  "com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:242)"
  "com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:233)"
  "com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:136)"
  "com.mongodb.operation.AggregateOperationImpl$1.call(AggregateOperationImpl.java:193)"
  "com.mongodb.operation.AggregateOperationImpl$1.call(AggregateOperationImpl.java:189)"
  "com.mongodb.operation.OperationHelper.withConnectionSource(OperationHelper.java:462)"
  "com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:406)"
  "com.mongodb.operation.AggregateOperationImpl.execute(AggregateOperationImpl.java:189)"
  "com.mongodb.operation.AggregateOperation.execute(AggregateOperation.java:295)"
  "com.mongodb.operation.AggregateOperation.execute(AggregateOperation.java:41)"
  "com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:179)"
  "com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:161)"
  "com.mongodb.DBCollection.aggregate(DBCollection.java:1502)"
  "com.mongodb.DBCollection.aggregate(DBCollection.java:1462)"
  "com.mongodb.DBCollection.aggregate(DBCollection.java:1447)"
  "monger.collection$aggregate.invokeStatic(collection.clj:551)"
  "monger.collection$aggregate.doInvoke(collection.clj:537)"
  "clojure.lang.RestFn.invoke(RestFn.java:573)"
  "--> driver.mongo.query_processor$execute_query.invokeStatic(query_processor.clj:744)"
  "driver.mongo.query_processor$execute_query.invoke(query_processor.clj:736)"
  "driver.mongo$eval2311$fn__2312.invoke(mongo.clj:185)"
  "query_processor$fn__44449$execute_query__44454$fn__44455.invoke(query_processor.clj:71)"
  "query_processor$fn__44449$execute_query__44454.invoke(query_processor.clj:65)"
  "query_processor.middleware.mbql_to_native$mbql__GT_native$fn__20887.invoke(mbql_to_native.clj:38)"
  "query_processor.middleware.annotate$result_rows_maps__GT_vectors$fn__36504.invoke(annotate.clj:541)"
  "query_processor.middleware.annotate$add_column_info$fn__36410.invoke(annotate.clj:485)"
  "query_processor.middleware.cumulative_aggregations$handle_cumulative_aggregations$fn__37445.invoke(cumulative_aggregations.clj:57)"
  "query_processor.middleware.resolve_joins$resolve_joins$fn__41557.invoke(resolve_joins.clj:184)"
  "query_processor.middleware.limit$limit$fn__38522.invoke(limit.clj:19)"
  "query_processor.middleware.results_metadata$record_and_return_metadata_BANG_$fn__44316.invoke(results_metadata.clj:87)"
  "query_processor.middleware.format_rows$format_rows$fn__38510.invoke(format_rows.clj:26)"
  "query_processor.middleware.add_dimension_projections$add_remapping$fn__34938.invoke(add_dimension_projections.clj:232)"
  "query_processor.middleware.add_source_metadata$add_source_metadata_for_source_queries$fn__35638.invoke(add_source_metadata.clj:107)"
  "query_processor.middleware.resolve_source_table$resolve_source_tables$fn__41607.invoke(resolve_source_table.clj:46)"
  "query_processor.middleware.add_row_count_and_status$add_row_count_and_status$fn__35498.invoke(add_row_count_and_status.clj:16)"
  "driver.mongo$eval2143$fn__2144$fn__2146$f__857__auto____2148.invoke(mongo.clj:64)"
  "driver.mongo.util$_with_mongo_connection$fn__848.invoke(util.clj:205)"
  "util.ssh$with_ssh_tunnel_STAR_.invokeStatic(ssh.clj:106)"
  "util.ssh$with_ssh_tunnel_STAR_.invoke(ssh.clj:91)"
  "driver.mongo.util$_with_mongo_connection.invokeStatic(util.clj:199)"
  "driver.mongo.util$_with_mongo_connection.invoke(util.clj:194)"
  "driver.mongo$eval2143$fn__2144$fn__2146.invoke(mongo.clj:63)"
  "query_processor.middleware.driver_specific$process_query_in_context$fn__37521.invoke(driver_specific.clj:12)"
  "query_processor.middleware.resolve_driver$resolve_driver$fn__41221.invoke(resolve_driver.clj:22)"
  "query_processor.middleware.bind_effective_timezone$bind_effective_timezone$fn__36835$fn__36836.invoke(bind_effective_timezone.clj:9)"
  "util.date$call_with_effective_timezone.invokeStatic(date.clj:88)"
  "util.date$call_with_effective_timezone.invoke(date.clj:77)"
  "query_processor.middleware.bind_effective_timezone$bind_effective_timezone$fn__36835.invoke(bind_effective_timezone.clj:8)"
  "query_processor.middleware.store$initialize_store$fn__44341$fn__44342.invoke(store.clj:11)"
  "query_processor.store$do_with_store.invokeStatic(store.clj:46)"
  "query_processor.store$do_with_store.invoke(store.clj:40)"
  "query_processor.middleware.store$initialize_store$fn__44341.invoke(store.clj:10)"
  "query_processor.middleware.async$async__GT_sync$fn__20798.invoke(async.clj:23)"
  "query_processor.middleware.async_wait$runnable$fn__36561.invoke(async_wait.clj:89)"

Hi @mateusluzifb

Which version of MongoDB?
Have you defined any Connection String options in Admin > Databases > (mongo)?
Or have you tried setting connectTimeout=1200000 (1000ms * 60 * 20, for 20 minutes)

There shouldn’t be any timeouts, but I know there’s a timeout in the Druid driver, but I cannot see any in the MongoDB driver.

The only issue that references the first error is #11391, but that’s specific to Postgres.

When doing a bit of searching, I find things like this, which links to their JIRA, with “works as designed” (whatever that means):
https://stackoverflow.com/questions/40962094/interrupted-acquiring-a-permit-to-retrieve-an-item-from-the-pool-in-mongdb

Hi @flamber

Mongo version: mongod version: 3.6.5 (WiredTiger)

What seems strange to me is that the queries that takes more than 10 minutes run normally using Robo 3T, only when running in Metabase I get this problem. That makes me think that this is driver thing, the stackoverflow link that you attached reinforce that, don’t you think?

@mateusluzifb
I don’t know what Robo3T is, but I guess it’s a client. Metabase is also just a client, but uses different drivers for the connection (unless Robo3T is also build in Clojure), so maybe it has something to do with the drivers or Robo3T sets some configurations, which extends the timeout.

@camsaul Is there some hidden timeout on the MongoDB driver that kills the connection after 10 minutes exactly? It seems there’s one in the Druid driver, so that’s why I’m asking.

@camsaul I wasn’t able to find the 10 min timeout, can you help me?

Also, our current solution is mongo’s map-reduce, we’re using it to minimize the query load time

@mateusluzifb I’m hoping Cam will pop in and has an answer, but everyone is working hard on the upcoming 0.34 release

Have you checked if it could be a timeout on your server?
https://docs.mongodb.com/manual/faq/diagnostics/#does-tcp-keepalive-time-affect-mongodb-deployments

Also, it seems like it’s not connectionTimeout, but connectionTimeoutMS and socketTimeoutMS (milliseconds):
https://docs.mongodb.com/manual/reference/connection-string/#timeout-options
So have you tried adding this to the Connection String in Admin > Databases > (mongodb)?
connectionTimeoutMS=10000&socketTimeoutMS=1200000