V46.1 timeout issues

Arvydas · April 13, 2023, 5:37am

Couple f days ago we updated from v45..3 to v46.1 and after that we are getting timeout errors while refreshing scheduled data models. Same timeout error when refreshing manually.
Where could we change timeout settings? It was working absolutely fine on older Metabase version.
I was not able to find any timeout settings in admin settings anywhere.

Thank you in advance for any help.

[13ac4ca6-6e74-4265-a91d-b73ba534abdb] 2023-04-13T08:36:52+03:00 INFO metabase.task.persist-refresh Attempting to refresh persisted model 961.
[13ac4ca6-6e74-4265-a91d-b73ba534abdb] 2023-04-13T08:36:52+03:00 INFO metabase.driver.mysql You may need to add trustServerCertificate=true to the additional connection options to connect with SSL.
[13ac4ca6-6e74-4265-a91d-b73ba534abdb] 2023-04-13T08:37:22+03:00 INFO metabase.task.persist-refresh Error refreshing persisting model with card-id 961
java.lang.Exception: Killed mysql process id 3,744,965 due to timeout.
	at metabase.driver.mysql.ddl$kill.invokeStatic(ddl.clj:43)
	at metabase.driver.mysql.ddl$kill.invoke(ddl.clj:34)
	at metabase.driver.mysql.ddl$execute_with_timeout_BANG_.invokeStatic(ddl.clj:56)
	at metabase.driver.mysql.ddl$execute_with_timeout_BANG_.invoke(ddl.clj:45)
	at metabase.driver.mysql.ddl$fn__103041.invokeStatic(ddl.clj:69)
	at metabase.driver.mysql.ddl$fn__103041.invoke(ddl.clj:62)
	at clojure.lang.MultiFn.invoke(MultiFn.java:244)
	at metabase.task.persist_refresh$reify__79259.refresh_BANG_(persist_refresh.clj:65)
	at metabase.task.persist_refresh$refresh_with_stats_BANG_$fn__79262.invoke(persist_refresh.clj:85)
	at metabase.task.persist_refresh$refresh_with_stats_BANG_.invokeStatic(persist_refresh.clj:84)
	at metabase.task.persist_refresh$refresh_with_stats_BANG_.invoke(persist_refresh.clj:69)
	at clojure.lang.AFn.applyToHelper(AFn.java:165)
	at clojure.lang.AFn.applyTo(AFn.java:144)
	at clojure.core$apply.invokeStatic(core.clj:673)
	at clojure.core$partial$fn__5914.doInvoke(core.clj:2660)
	at clojure.lang.RestFn.invoke(RestFn.java:397)
	at metabase.task.persist_refresh$save_task_history_BANG_.invokeStatic(persist_refresh.clj:108)
	at metabase.task.persist_refresh$save_task_history_BANG_.invoke(persist_refresh.clj:103)
	at metabase.task.persist_refresh$refresh_individual_BANG_.invokeStatic(persist_refresh.clj:222)
	at metabase.task.persist_refresh$refresh_individual_BANG_.invoke(persist_refresh.clj:214)
	at metabase.task.persist_refresh$refresh_job_fn_BANG_.invokeStatic(persist_refresh.clj:239)
	at metabase.task.persist_refresh$refresh_job_fn_BANG_.invoke(persist_refresh.clj:233)
	at metabase.task.persist_refresh.PersistenceRefresh.execute(persist_refresh.clj:249)
	at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
[13ac4ca6-6e74-4265-a91d-b73ba534abdb] 2023-04-13T08:37:23+03:00 INFO metabase.task.persist-refresh Finished updated model-id 961 from persisted-info 41.```

TonyC · April 13, 2023, 8:26am

Are you able to run queries against that database? If you run a simple query like select * from table limit 10 ... does it return something?

Arvydas · April 13, 2023, 8:37am

That is the point, that some of the queries runs for no more than 30 seconds, but still getting timeout error. Those run without any issues in Metabase without trying to rebuilt cache.

TonyC · April 13, 2023, 8:39am

I am not sure i understand. What i am asking is if queries to that particular DB are running or not form metabase ... To exclude if it's something coming from that model or simply a DB connection issue

Arvydas · April 13, 2023, 8:44am

Queries running without issues.
Only using Data Model for some of them (which could be run as SQL question in ~30 seconds) I am getting timeouts.

TonyC · April 13, 2023, 9:24am

I doubt changing this value will make any difference:

Since you are saying it timesout after 30 sec ...and there are no settings at the DB level which will close connections longer than that?

Arvydas · April 13, 2023, 9:36am

That is definitely not related to any DB settings, since simple SQL questions could be run without such an issue.
There might be some changes either in Data model behavior or some settings for it in the background.
As I said: before upgrade we did not have those issues before.

TonyC · April 13, 2023, 10:13am

If you turn that model to SQL and try to run it directly on your Database, compare the same run with metabase how long does both take?

david.s4t · April 25, 2023, 6:22am

Same problem here. After updating Metabase to v0.46.1 we are having timeout problems when refreshing models that we did not have in previous versions.

Luiggi · April 26, 2023, 1:09pm

Can you increase the timeout in MySQL? Also, how much time do those queries take when run directly in a sql client? We need more context to reproduce this

david.s4t · April 27, 2023, 11:13am

I have done this with no effect:

On the other hand, the weirdest thing is that sometimes the model is cached properly sometimes it doesn't.
We launch the caching process once a day.

The SQL query in a SqlClient takes 8.46 seconds to fetch 45K rows

david.s4t · April 27, 2023, 11:20am

We have another Model that always fails. The SQL takes 1m 22s

Luiggi · April 27, 2023, 11:23am

Can we confirm this happens only in MySQL? Anything that might be relevant in the db log or Metabase log? We need something to pull some threads from

david.s4t · April 27, 2023, 12:56pm

We use MySQL.

Here is a failure log:

prigal · May 5, 2023, 6:30am

Hi, same problem here after upgrade to 0.46.2 from latest 0.45.x

Luiggi · May 5, 2023, 11:34am

Any info you could give us to debug?

prigal · May 5, 2023, 11:55am

What kind of info do you want ? We can provide some.

We have the same trace as david.s4t in our logs :

[43de1631-682e-4aad-b231-03a36fa5cbb9] 2023-05-05T10:20:36+02:00 INFO metabase.task.persist-refresh Erreur de rafraîchissement du modèle persisté avec card-id 651
java.lang.Exception: ID de processus mysql 14 011 tué en raison d'un délai d'attente.
	at metabase.driver.mysql.ddl$kill.invokeStatic(ddl.clj:43)
	at metabase.driver.mysql.ddl$kill.invoke(ddl.clj:34)
	at metabase.driver.mysql.ddl$execute_with_timeout_BANG_.invokeStatic(ddl.clj:56)
	at metabase.driver.mysql.ddl$execute_with_timeout_BANG_.invoke(ddl.clj:45)
	at metabase.driver.mysql.ddl$fn__103034.invokeStatic(ddl.clj:69)
	at metabase.driver.mysql.ddl$fn__103034.invoke(ddl.clj:62)
	at clojure.lang.MultiFn.invoke(MultiFn.java:244)
	at metabase.task.persist_refresh$reify__79265.refresh_BANG_(persist_refresh.clj:65)
	at metabase.task.persist_refresh$refresh_with_stats_BANG_$fn__79268.invoke(persist_refresh.clj:85)
	at metabase.task.persist_refresh$refresh_with_stats_BANG_.invokeStatic(persist_refresh.clj:84)
	at metabase.task.persist_refresh$refresh_with_stats_BANG_.invoke(persist_refresh.clj:69)
	at clojure.core$partial$fn__5910.invoke(core.clj:2649)
	at clojure.lang.PersistentVector.reduce(PersistentVector.java:343)
	at clojure.core$reduce.invokeStatic(core.clj:6885)
	at clojure.core$reduce.invoke(core.clj:6868)
	at metabase.task.persist_refresh$refresh_tables_BANG_$thunk__79295.invoke(persist_refresh.clj:207)
	at metabase.task.persist_refresh$save_task_history_BANG_.invokeStatic(persist_refresh.clj:108)
	at metabase.task.persist_refresh$save_task_history_BANG_.invoke(persist_refresh.clj:103)
	at metabase.task.persist_refresh$refresh_tables_BANG_.invokeStatic(persist_refresh.clj:210)
	at metabase.task.persist_refresh$refresh_tables_BANG_.invoke(persist_refresh.clj:198)
	at metabase.task.persist_refresh$refresh_job_fn_BANG_.invokeStatic(persist_refresh.clj:238)
	at metabase.task.persist_refresh$refresh_job_fn_BANG_.invoke(persist_refresh.clj:233)
	at metabase.task.persist_refresh.PersistenceRefresh.execute(persist_refresh.clj:249)
	at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)

We tried to desactivate cache and reactivate it, it deleted all tables in cache databases and it's unable to recreate them. We are stuck !

It looks like there is a 60s timeout, but we are not sure.

Luiggi · May 6, 2023, 1:13pm

We don’t have still a way to reproduce this issue:
-which MySQL version are you using?
-which configuration does that MySQL have?
-how big is the model and how much time does the run take?

If we can’t get all the info needed to reproduce is like shooting in the dark

prigal · May 10, 2023, 7:15am

Hello, we rolled back to 0.45.3.1

{
  "browser-info": {
    "language": "fr-FR",
    "platform": "Linux x86_64",
    "userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36",
    "vendor": "Google Inc."
  },
  "system-info": {
    "file.encoding": "UTF-8",
    "java.runtime.name": "OpenJDK Runtime Environment",
    "java.runtime.version": "11.0.18+10",
    "java.vendor": "Eclipse Adoptium",
    "java.vendor.url": "https://adoptium.net/",
    "java.version": "11.0.18",
    "java.vm.name": "OpenJDK 64-Bit Server VM",
    "java.vm.version": "11.0.18+10",
    "os.name": "Linux",
    "os.version": "4.15.18-2-pve",
    "user.language": "en",
    "user.timezone": "GMT"
  },
  "metabase-info": {
    "databases": [
      "mysql"
    ],
    "hosting-env": "unknown",
    "application-database": "postgres",
    "application-database-details": {
      "database": {
        "name": "PostgreSQL",
        "version": "14.2"
      },
      "jdbc-driver": {
        "name": "PostgreSQL JDBC Driver",
        "version": "42.5.0"
      }
    },
    "run-mode": "prod",
    "version": {
      "date": "2023-03-29",
      "tag": "v0.45.3.1",
      "branch": "release-x.45.x",
      "hash": "3b3097d"
    },
    "settings": {
      "report-timezone": null
    }
  }
}

Our mysql version is 5.5.50-log.
Database is a slave from a master/slave config. Size is 100Gb, metabase is connected to the slave.
Cache database (with models) size is 1Gb.
Models take 3 to 4min to run on 0.45.3.1.

About mysql configuration, do you have specific questions ?

We have both myisam and innodb tables.

About collation :

collation_connection	utf8mb4_general_ci
collation_database	latin1_swedish_ci
collation_server	latin1_swedish_ci

About timeouts :

connect_timeout	10
delayed_insert_timeout	300
innodb_lock_wait_timeout	50
innodb_rollback_on_timeout	OFF
interactive_timeout	28800
lock_wait_timeout	31536000
net_read_timeout	30
net_write_timeout	60
slave_net_timeout	3600
wait_timeout	28800

Thanks for you help.

prigal · May 12, 2023, 8:05am

I reported the bug on github : Timeout issue with models in 0.46.x · Issue #30723 · metabase/metabase · GitHub