Your Question Took Too Long using AWS ELB

alexandralouise · October 3, 2016, 12:51pm

Hello

I continue to receive a time out error from Metabase.

I am querying a very large table, and a query can take 2 to 3 minutes to run. However, my queries through Metabase result in a “Your Question Took Too Long” error.

I have looked at my ELB instance as per @camsaul’s suggestion and have adjusted the EC2 instance configuration so that the idle timeout only triggers after 3600 seconds (the maximum). However, I am still getting the same error through Metabase: it times out after exactly 60 seconds.

Any recommendations would be greatly appreciated. This is our primary table and we are unable to run queries on it through Metabase, which is such a great tool that we would really like to use it.

Thanks

Alexa

camsaul · October 3, 2016, 6:33pm

@alexandralouise I think you’re getting me mixed up with @sameer

sameer · October 3, 2016, 6:49pm

Some thoughts –

My mental model of the chain of possible timeouts:
MB Frontend <-> Browser <-> ELB <-> nginx <-> docker network mapping <-> Jetty <-> Jetty connection handler pool <-> DB Connection Pool <-> Data warehouse (this is pretty opaque for me)

Something somewhere in there is barfing and throwing a timeout.

It might take a bit of digging to figure out what’s going on. Can you open up Developer Tools (if in Chrome), reload the page, and see what the actual response from the dataset query is returning? It might have more information about where in the chain a timeout is occurring.
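
The same check can also be made outside the browser by timing a request and looking at the status code and `Server` header of whatever answers. The following is only a minimal sketch in Python: the URL is a placeholder chosen by the caller, and a real Metabase query request would also need a session and a request body, which are not shown here.

```python
import time
import urllib.error
import urllib.request


def summarize(status, elapsed, server):
    """Format one line: HTTP status, elapsed seconds, and Server header."""
    return f"status={status} elapsed={elapsed:.1f}s server={server or 'unknown'}"


def probe(url, timeout=300):
    """Request `url` and report status, elapsed time, and Server header.

    `url` is a placeholder supplied by the caller, not a documented endpoint.
    `timeout` is the client-side limit, set well above the suspected 60 s.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status, server = resp.status, resp.headers.get("Server")
    except urllib.error.HTTPError as err:
        # Error responses such as 504 still carry a status and headers.
        status, server = err.code, err.headers.get("Server")
    return summarize(status, time.monotonic() - start, server)
```

If the line printed by `probe(...)` shows a 504 after almost exactly 60 seconds, the component named in the `Server` header (if any) is a reasonable first suspect in the chain above.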

alexandralouise · October 4, 2016, 6:05am

Hi @sameer - see the error I am receiving - does this mean it is related to my ELB instance?

https://postimg.org/image/3n4yjo1fx/

https://postimg.org/image/fxns3lt63/

alexandralouise · October 4, 2016, 7:29am

@sameer found the solution!

We were getting a 504 Gateway Timeout error using Nginx as a proxy.

If you use Nginx as a proxy for a Metabase Docker web server, this is what you can try to fix the 504 Gateway Timeout error:

Add these directives to the nginx.conf file:

proxy_connect_timeout 600;
proxy_send_timeout 600;
proxy_read_timeout 600;
send_timeout 600;

This fixed our error.

Alexa

sameer · October 5, 2016, 3:01am

Which nginx config file is this? Is it this one?

https://github.com/metabase/metabase/blob/master/bin/aws-eb-docker/.ebextensions/metabase_config/nginx/default_server

alexandralouise · October 5, 2016, 6:25am

Hi @sameer

I think it was that one - you add that code inside the location /api/health brackets.

Alexa

jasonjano · December 24, 2016, 6:29pm

We are experiencing the same issue on ELB using Metabase 21.1. We had updated from 20.1 to 21.1 and, once done, even the simplest queries started to time out. I even went so far as to do a manual “select top 1 * from X” and received nothing. A direct SQL connection runs that query and returns a row in 20ms or so. After reverting to the old version, we are still experiencing the same issue. I am thinking the next step is to reset our database connection, but I really don’t want to lose all the saved questions.

Any ideas?

jasonjano · December 24, 2016, 9:02pm

Figured it out. We have a DB on Rackspace. The act of upgrading the server on ELB popped our server onto a different network and, hence, away from the firewall rules we had set up on Rackspace.

marko · March 31, 2017, 11:14pm

Hey @alexandralouise, your answer helped resolve the problem for us too. Thanks!

However, to avoid manually repeating the same steps every time auto-scaling kicks in or the environment is rebuilt, we now do the following:

We open up the latest Metabase version zip and add the following two files to the .ebextensions folder:

00_elb-timeout.config:

option_settings:
  - namespace: aws:elb:policies
    option_name: ConnectionSettingIdleTimeout
    value: 1200

00_nginx-timeout.config:

files: 
  "/etc/nginx/conf.d/longtimeout.conf" :
    mode: "000644"
    owner: root
    group: root
    content: |
      proxy_connect_timeout 1200;
      proxy_send_timeout 1200;
      proxy_read_timeout 1200;
      send_timeout 1200;

Now the newly deployed application version will add the appropriate settings automatically to all instances and during any scaling events.
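
The step of adding the two files to the zip can also be scripted. The sketch below is one possible way to do it in Python with the standard `zipfile` module; the function name `add_timeout_configs` and the path passed to it are made up for illustration, and the file contents are copied from the two files above.

```python
import zipfile

ELB_CONFIG = """\
option_settings:
  - namespace: aws:elb:policies
    option_name: ConnectionSettingIdleTimeout
    value: 1200
"""

NGINX_CONFIG = """\
files:
  "/etc/nginx/conf.d/longtimeout.conf":
    mode: "000644"
    owner: root
    group: root
    content: |
      proxy_connect_timeout 1200;
      proxy_send_timeout 1200;
      proxy_read_timeout 1200;
      send_timeout 1200;
"""


def add_timeout_configs(zip_path):
    """Append the two .ebextensions files to an existing bundle zip.

    Returns the sorted list of entry names in the zip afterwards.
    """
    with zipfile.ZipFile(zip_path, "a") as bundle:
        existing = set(bundle.namelist())
        for name, body in (
            (".ebextensions/00_elb-timeout.config", ELB_CONFIG),
            (".ebextensions/00_nginx-timeout.config", NGINX_CONFIG),
        ):
            if name not in existing:  # avoid duplicate entries on re-run
                bundle.writestr(name, body)
        return sorted(bundle.namelist())
```

Running `add_timeout_configs("metabase-aws-eb.zip")` (a hypothetical file name) before uploading the bundle would leave the rest of the archive untouched and only add the two config files if they are missing.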

I suppose that the next step would be to use an environment variable to control the timeout (which defaults to, say, 600) and add this configuration to the official Metabase package.
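
As a rough illustration of that idea, the nginx directives could be rendered from an environment variable with a default of 600. This is only a sketch in Python, not something the Metabase package provides: the variable name `PROXY_TIMEOUT_SECONDS` and the function `render_timeout_conf` are invented for the example.

```python
import os

# The four directives used in the nginx timeout file above.
DIRECTIVES = (
    "proxy_connect_timeout",
    "proxy_send_timeout",
    "proxy_read_timeout",
    "send_timeout",
)


def render_timeout_conf(env=None, var="PROXY_TIMEOUT_SECONDS", default=600):
    """Render the nginx timeout directives from an environment variable.

    `PROXY_TIMEOUT_SECONDS` is a hypothetical name; falls back to `default`
    when the variable is not set.
    """
    env = os.environ if env is None else env
    seconds = int(env.get(var, default))
    if seconds <= 0:
        raise ValueError(f"{var} must be a positive number of seconds")
    return "".join(f"{name} {seconds};\n" for name in DIRECTIVES)
```

The returned text could then be written to a file such as /etc/nginx/conf.d/longtimeout.conf during deployment. The ELB idle timeout would still need to be raised to match, as in the first file above.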