Dashboard issue: There was a problem displaying this chart

We are currently experiencing an issue with the display of charts on our dashboards, where, seemingly at random, a chart will display the message “There was a problem displaying this chart” rather than rendering correctly. If we refresh the dashboard, that chart might then render correctly, but another chart that was previously rendering will now show the error message.

I thought we had been experiencing issue #9989, so we upgraded to v0.33.4, but unfortunately we are still seeing it. Perhaps it’s a different issue after all, especially as it doesn’t look like anyone else has reported this since 0.33.4 was released a month ago.

The steps to reproduce are: browse to a dashboard, then, if any of the charts show the error message, refresh the page and hope that they render correctly.

We see the issue more often at the start of the day, when the system hasn’t had recent usage, and less often once it has been in use.

Diagnostic Info:

{
  "browser-info": {
    "language": "en-US",
    "platform": "MacIntel",
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36",
    "vendor": "Google Inc."
  },
  "system-info": {
    "java.runtime.name": "OpenJDK Runtime Environment",
    "java.runtime.version": "11.0.4+11-post-Ubuntu-1ubuntu218.04.3",
    "java.vendor": "Ubuntu",
    "java.vendor.url": "https://ubuntu.com/",
    "java.version": "11.0.4",
    "java.vm.name": "OpenJDK 64-Bit Server VM",
    "java.vm.version": "11.0.4+11-post-Ubuntu-1ubuntu218.04.3",
    "os.name": "Linux",
    "os.version": "4.15.0-66-generic",
    "user.language": "en",
    "user.timezone": "Etc/UTC"
  },
  "metabase-info": {
    "databases": [
      "mysql"
    ],
    "hosting-env": "unknown",
    "application-database": "mysql",
    "run-mode": "prod",
    "version": {
      "date": "2019-10-07",
      "tag": "v0.33.4",
      "branch": "release-0.33.x",
      "hash": "9559406"
    },
    "settings": {
      "report-timezone": "Pacific/Auckland"
    }
  }
}

MySQL version: 5.6

Logs when we see a failure to render a chart on a dashboard:

[8bbe9ca1-41f3-4123-9f9f-9ded945e3b17] 2019-11-11T13:34:38+13:00 DEBUG metabase.middleware.log POST /api/card/286/query 200 [ASYNC: completed] 662.1 ms (11 DB calls) Jetty threads: 3/50 (12 idle, 0 queued) (140 total active threads) Queries in flight: 14
[8bbe9ca1-41f3-4123-9f9f-9ded945e3b17] 2019-11-11T13:34:38+13:00 ERROR metabase.middleware.log POST /api/card/140/query 500 9.2 ms (2 DB calls)
{:message "(conn=46284) Broken pipe (Write failed)",
:type java.sql.SQLNonTransientConnectionException,
:stacktrace
("org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.get(ExceptionMapper.java:234)"
"org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.getException(ExceptionMapper.java:165)"
"org.mariadb.jdbc.MariaDbStatement.executeExceptionEpilogue(MariaDbStatement.java:238)"
"org.mariadb.jdbc.MariaDbPreparedStatementClient.executeInternal(MariaDbPreparedStatementClient.java:232)"
"org.mariadb.jdbc.MariaDbPreparedStatementClient.execute(MariaDbPreparedStatementClient.java:159)"
"org.mariadb.jdbc.MariaDbPreparedStatementClient.executeQuery(MariaDbPreparedStatementClient.java:174)"
"com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeQuery(NewProxyPreparedStatement.java:431)"
"clojure.java.jdbc$execute_query_with_params.invokeStatic(jdbc.clj:1072)"
"clojure.java.jdbc$execute_query_with_params.invoke(jdbc.clj:1066)"
"clojure.java.jdbc$db_query_with_resultset_STAR_.invokeStatic(jdbc.clj:1095)"
"clojure.java.jdbc$db_query_with_resultset_STAR_.invoke(jdbc.clj:1075)"
"clojure.java.jdbc$query.invokeStatic(jdbc.clj:1164)"
"clojure.java.jdbc$query.invoke(jdbc.clj:1126)"
"toucan.db$query.invokeStatic(db.clj:285)"
"toucan.db$query.doInvoke(db.clj:281)"
"clojure.lang.RestFn.invoke(RestFn.java:410)"
"toucan.db$simple_select.invokeStatic(db.clj:391)"
"toucan.db$simple_select.invoke(db.clj:380)"
"toucan.db$select.invokeStatic(db.clj:659)"
"toucan.db$select.doInvoke(db.clj:653)"
"clojure.lang.RestFn.applyTo(RestFn.java:139)"
"clojure.core$apply.invokeStatic(core.clj:667)"
"clojure.core$apply.invoke(core.clj:660)"
"toucan.db$select_field.invokeStatic(db.clj:682)"
"toucan.db$select_field.doInvoke(db.clj:674)"
"clojure.lang.RestFn.applyTo(RestFn.java:142)"
"clojure.core$apply.invokeStatic(core.clj:669)"
"clojure.core$apply.invoke(core.clj:660)"
"toucan.db$select_ids.invokeStatic(db.clj:692)"
"toucan.db$select_ids.doInvoke(db.clj:685)"
"clojure.lang.RestFn.invoke(RestFn.java:439)"
"--> models.collection$fn__32818$descendant_ids__32823$fn__32824.invoke(collection.clj:429)"
"models.collection$fn__32818$descendant_ids__32823.invoke(collection.clj:426)"
"models.collection$fn__33619$user__GT_personal_collection_and_descendant_ids__33624$fn__33625.invoke(collection.clj:1063)"
"models.collection$fn__33619$user__GT_personal_collection_and_descendant_ids__33624.invoke(collection.clj:1053)"
"models.user$permissions_set.invokeStatic(user.clj:283)"
"models.user$permissions_set.invoke(user.clj:277)"
"middleware.session$do_with_current_user$fn__47524.invoke(session.clj:175)"
"models.interface$current_user_permissions_set.invokeStatic(interface.clj:237)"
"models.interface$current_user_permissions_set.invoke(interface.clj:237)"
"models.interface$make_perms_check_fn$_has_perms_QMARK___27820.invoke(interface.clj:252)"
"models.interface$make_perms_check_fn$_has_perms_QMARK___27820.invoke(interface.clj:250)"
"models.interface$fn__27743$fn__27752$G__27746__27761.invoke(interface.clj:184)"
"api.common$read_check.invokeStatic(common.clj:346)"
"api.common$read_check.invoke(common.clj:339)"
"api.card$run_query_for_card_async.invokeStatic(card.clj:609)"
"api.card$run_query_for_card_async.doInvoke(card.clj:601)"
"api.card$fn__60921$fn__60924.invoke(card.clj:623)"
"api.card$fn__60921.invokeStatic(card.clj:622)"
"api.card$fn__60921.invoke(card.clj:618)"
"middleware.auth$enforce_authentication$fn__47314.invoke(auth.clj:14)"
"routes$fn__66170$fn__66171.doInvoke(routes.clj:56)"
"middleware.exceptions$catch_uncaught_exceptions$fn__47270.invoke(exceptions.clj:104)"
"middleware.exceptions$catch_api_exceptions$fn__47267.invoke(exceptions.clj:92)"
"middleware.log$log_api_call$fn__47180$fn__47181.invoke(log.clj:170)"
"middleware.log$log_api_call$fn__47180.invoke(log.clj:164)"
"middleware.security$add_security_headers$fn__47231.invoke(security.clj:122)"
"middleware.json$wrap_json_body$fn__47397.invoke(json.clj:61)"
"middleware.json$wrap_streamed_json_response$fn__47415.invoke(json.clj:97)"
"middleware.session$bind_current_user$fn__47529$fn__47530.invoke(session.clj:193)"
"middleware.session$do_with_current_user.invokeStatic(session.clj:176)"
"middleware.session$do_with_current_user.invoke(session.clj:170)"
"middleware.session$bind_current_user$fn__47529.invoke(session.clj:192)"
"middleware.session$wrap_current_user_id$fn__47518.invoke(session.clj:161)"
"middleware.session$wrap_session_id$fn__47503.invoke(session.clj:123)"
"middleware.auth$wrap_api_key$fn__47322.invoke(auth.clj:27)"
"middleware.misc$maybe_set_site_url$fn__47295.invoke(misc.clj:56)"
"middleware.misc$bind_user_locale$fn__47298.invoke(misc.clj:72)"
"middleware.misc$add_content_type$fn__47283.invoke(misc.clj:28)"
"middleware.misc$disable_streaming_buffering$fn__47306.invoke(misc.clj:87)"),
:sql-exception-chain ["SQLNonTransientConnectionException:" "Message: (conn=46284) Broken pipe (Write failed)" "SQLState: 08" "Error Code: 0"]}

[8bbe9ca1-41f3-4123-9f9f-9ded945e3b17] 2019-11-11T13:34:38+13:00 INFO metabase.query-processor.middleware.cache Query took 618 ms to run; miminum for cache eligibility is 60000 ms
[8bbe9ca1-41f3-4123-9f9f-9ded945e3b17] 2019-11-11T13:34:38+13:00 DEBUG metabase.middleware.log POST /api/card/30/query 200 [ASYNC: completed] 718.4 ms (12 DB calls) Jetty threads: 2/50 (13 idle, 0 queued) (140 total active threads) Queries in flight: 13

Please let me know if there’s anything else I can provide you with. Love your work!

Hi @april
Okay, I guess you’re seeing this issue instead (#9989 was fairly specific to MSSQL, but I’m sure they’re related):
https://github.com/metabase/metabase/issues/10063

Metabase officially supports MySQL 5.7+, so I’m not sure if this is a problem with the older 5.6, but too many people with too many different versions of MySQL and MariaDB have been seeing this too.
Can you post the timeout settings by running this in a Metabase native query?
SHOW VARIABLES LIKE '%timeout%';

You should be able to reproduce the problem (instead of waiting until the next day) by restarting the database or Metabase and then going directly to a dashboard.
If so, then a workaround could be to bump wait_timeout (in seconds) to something higher (like a day) by adding this to the Connection String under Admin > Databases (reference):
sessionVariables=wait_timeout=86400
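A quick way to confirm the session variable is actually being applied on the Metabase connection is to run these (one at a time) as native queries (plain MySQL, nothing Metabase-specific) and compare the session value with the server default:
-- Value for the current connection; should show 86400 if the option is picked up
SHOW SESSION VARIABLES LIKE 'wait_timeout';
-- Server-wide default, for comparison
SHOW GLOBAL VARIABLES LIKE 'wait_timeout';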

Hi @flamber,

Sorry, I gave you the wrong MySQL version there - our Metabase DB is on MySQL 5.7.27.

These are the results of the SHOW VARIABLES statement:

connect_timeout	10
delayed_insert_timeout	300
have_statement_timeout	YES
innodb_flush_log_at_timeout	1
innodb_lock_wait_timeout	50
innodb_rollback_on_timeout	OFF
interactive_timeout	28800
lock_wait_timeout	31536000
net_read_timeout	30
net_write_timeout	60
rpl_stop_slave_timeout	31536000
slave_net_timeout	60
wait_timeout	28800

I’ll give your suggestion for reproducing it a go, then bump the wait_timeout, and let you know how I get on.
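For reference, this is roughly what I’m planning to run on the MySQL server if we go with a global change rather than (or as well as) the connection string option; just a sketch, and it only affects new sessions:
-- Raise the idle timeout to one day (86400 seconds); existing sessions keep the old value
SET GLOBAL wait_timeout = 86400;
-- Check that it took effect
SHOW GLOBAL VARIABLES LIKE 'wait_timeout';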

Thanks for your help.

@april I’m interested in the MySQL version of the database you’re querying, since that’s the one that causes the problem.
I’m currently messing around with many MySQL and MariaDB versions, just testing some stuff with Metabase, and I haven’t noticed this problem, but it’s all local Docker containers, so maybe that makes a difference.
It’s always much easier to fix things when there’s an exact step-by-step to reproduce an issue.
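One more thing that might help narrow it down: a broken pipe usually means the server closed a connection that the pool still thinks is alive, so you could check how long Metabase’s connections sit idle on the MySQL side. This is just standard MySQL (it assumes your user has the PROCESS privilege):
-- Idle (sleeping) connections, longest idle first; compare the time column with wait_timeout
SELECT id, user, host, db, command, time AS idle_seconds
FROM information_schema.processlist
WHERE command = 'Sleep'
ORDER BY time DESC;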

I'm also experiencing these broken pipe issues on a daily basis on a number of different MySQL servers, on Metabase 0.34.0 (until 2020-01-21) and 0.34.1, all running MySQL 5.7.27 (apart from one which is running MariaDB 10.1.38 for the Metabase application DB). I've tried increasing the wait_timeout to 86400 on most databases, as suggested in one of the threads discussing this issue, but it has not had any effect.

The problem is so frequent that I even created a dashboard for tracking these errors. The purple bars in the graphs indicate our Broken pipe errors (database names are masked in the left image, but each bar is a different database).

We've also noticed that a single error does not represent a single failed dashboard card: a dashboard can have several failed cards at the same time, yet it only registers as a single "error" in the query_execution table. It seems the same worker is meant to run several queries, but when it fails on the first one, all the subsequent queries fail as well without ever being executed.
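In case it's useful to anyone, the tracking dashboard is based on roughly this query against the query_execution table in the Metabase application database (just a sketch; check the column names against your own schema):
-- Count broken pipe errors per target database per day
SELECT database_id, DATE(started_at) AS day, COUNT(*) AS broken_pipe_errors
FROM query_execution
WHERE error LIKE '%Broken pipe%'
GROUP BY database_id, DATE(started_at)
ORDER BY day, database_id;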

Diagnostic info:

{
  "browser-info": {
    "language": "en-US",
    "platform": "Win32",
    "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36",
    "vendor": "Google Inc."
  },
  "system-info": {
    "java.runtime.name": "OpenJDK Runtime Environment",
    "java.runtime.version": "11.0.5+10",
    "java.vendor": "AdoptOpenJDK",
    "java.vendor.url": "https://adoptopenjdk.net/",
    "java.version": "11.0.5",
    "java.vm.name": "OpenJDK 64-Bit Server VM",
    "java.vm.version": "11.0.5+10",
    "os.name": "Linux",
    "os.version": "4.19.0-5-amd64",
    "user.language": "en",
    "user.timezone": "GMT"
  },
  "metabase-info": {
    "databases": [
      "mysql"
    ],
    "hosting-env": "unknown",
    "application-database": "mysql",
    "run-mode": "prod",
    "version": {
      "date": "2020-01-13",
      "tag": "v0.34.1",
      "branch": "release-0.34.x",
      "hash": "265695c"
    },
    "settings": {
      "report-timezone": "UTC"
    }
  }
}

Variables from the server with the most issues:

Variable_name Value
connect_timeout 10
delayed_insert_timeout 300
have_statement_timeout YES
innodb_flush_log_at_timeout 1
innodb_lock_wait_timeout 50
innodb_rollback_on_timeout OFF
interactive_timeout 28800
lock_wait_timeout 31536000
net_read_timeout 30
net_write_timeout 60
rpl_stop_slave_timeout 31536000
slave_net_timeout 60
wait_timeout 86400

Any ideas what we can do? :slight_smile:

We have been experiencing the broken pipe issue for the last year or so. We mostly use BigQuery. We are using version 0.34.1 right now and it still happens at random.

It seems like there are many of us with this issue, so I wonder what we have in common. We are running Metabase on Kubernetes, but I'm not sure whether that could be related.

Looks like we have a similar setup to @mrmiffo:

{
  "browser-info": {
    "language": "en-US",
    "platform": "MacIntel",
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36",
    "vendor": "Google Inc."
  },
  "system-info": {
    "java.runtime.name": "OpenJDK Runtime Environment",
    "java.runtime.version": "11.0.5+10",
    "java.vendor": "AdoptOpenJDK",
    "java.vendor.url": "https://adoptopenjdk.net/",
    "java.version": "11.0.5",
    "java.vm.name": "OpenJDK 64-Bit Server VM",
    "java.vm.version": "11.0.5+10",
    "os.name": "Linux",
    "os.version": "4.19.76+",
    "user.language": "en",
    "user.timezone": "GMT"
  },
  "metabase-info": {
    "databases": [
      "bigquery",
      "googleanalytics",
      "mysql"
    ],
    "hosting-env": "unknown",
    "application-database": "mysql",
    "run-mode": "prod",
    "version": {
      "date": "2020-01-13",
      "tag": "v0.34.1",
      "branch": "release-0.34.x",
      "hash": "265695c"
    },
    "settings": {
      "report-timezone": "US/Eastern"
    }
  }
}

We’re currently not using Kubernetes; we only have a single Docker container hosting a single instance of Metabase (we don’t have that many concurrent users at the moment). The problem is more common in the morning, indicating that it could be timeout-related. Usually I get a few errors when I first load up the dashboards in the morning, but then they run fine throughout the day.

I just checked which databases have been affected over the past month, and I’m surprised to see that all affected databases have the JDBC connection option “sessionVariables=wait_timeout=86400” set (though not all DBs with it set have been affected). We also have two more recent databases which do not use it (connected within the past 2 weeks), and they’ve not had any issues, but this could simply be luck, as they might not be used as much.

While we have the same MySQL version on all servers, the physical servers are a mix: some host a single database instance, some databases share the same server, and they are spread over different countries. However, I can’t draw any conclusions from this, as only some of the databases on a shared server have issues, and many but not all servers have issues.

Kind of at a loss…

There were several fixes in 0.34.2 that test the database connections better, so you should not see this problem anymore.
There’s a long issue describing the wait_timeout problem; try 0.34.2 first, otherwise have a read through the comments:
https://github.com/metabase/metabase/issues/9885

Thanks @flamber! I did read the patch notes regarding the “unexpected end of stream” issue (which we also have, but at a much smaller scale), so I’ll try to update next week and post my findings here once it’s been running for a few days.

@flamber We're also having this issue with MySQL 8.0 on RDS, accessing the DB through an SSH tunnel. We've done a load test against the endpoints we use to load the dashboard (embedded in an application) and didn't get a single HTTP failure.

What we experience is that a dashboard's charts will all load, and then after 2 or 3 loads one chart will break in a similar fashion to the above. Our Metabase version is v0.41.4.

@ChristineChetty You should open a new topic and include all details, logs etc.
And upgrade to the latest release. Since you are hosting on AWS, I'm curious why you are using an SSH tunnel instead of connecting directly. Going through a tunnel will always cause more overhead and make any troubleshooting much more difficult.

Our Metabase instance unfortunately is not in the same VPC as the RDS instance, which has no publicly exposed IP. This is the most secure way to connect and prevent brute-force attacks on the application database right now. Will open a new topic.