Dashboard subscription randomly stops working

We have a dashboard on Metabase that we use to send email reports to our clients. The dashboard subscription is sent out to a list of emails. Using "Send email now" button works but scheduled daily emails are unreliable. They seem to send or not send seemingly randomly.

What we have tried:

  1. Upgrade to latest metabase version ( v0.40.4 )
  2. Automatically restart metabase every night during non-production hours.
  3. Monitor memory and CPU usage on the machine to make sure metabase is not running out of resources.
  4. Created duplicate dashboard subscriptions on the same dashboard.
  5. Change the time when dashboard subscriptions are sent.

Running out of options here and it is frustrating our team and our clients.

{
  "browser-info": {
    "language": "en-US",
    "platform": "MacIntel",
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Firefox/91.0",
    "vendor": ""
  },
  "system-info": {
    "file.encoding": "UTF-8",
    "java.runtime.name": "OpenJDK Runtime Environment",
    "java.runtime.version": "11.0.12+7",
    "java.vendor": "Eclipse Foundation",
    "java.vendor.url": "https://adoptium.net/",
    "java.version": "11.0.12",
    "java.vm.name": "OpenJDK 64-Bit Server VM",
    "java.vm.version": "11.0.12+7",
    "os.name": "Linux",
    "os.version": "4.14.243-185.433.amzn2.x86_64",
    "user.language": "en",
    "user.timezone": "GMT"
  },
  "metabase-info": {
    "databases": [
      "postgres",
      "mongo"
    ],
    "hosting-env": "unknown",
    "application-database": "postgres",
    "application-database-details": {
      "database": {
        "name": "PostgreSQL",
        "version": "9.6.19"
      },
      "jdbc-driver": {
        "name": "PostgreSQL JDBC Driver",
        "version": "42.2.18"
      }
    },
    "run-mode": "prod",
    "version": {
      "date": "2021-09-09",
      "tag": "v0.40.4",
      "branch": "release-x.40.x",
      "hash": "16d2e53"
    },
    "settings": {
      "report-timezone": "Europe/Helsinki"
    }
  }
}

Hi @arsalanz
Are you running multi-instances? Are you hosting on a service which spins down Metabase, when there's no activity (similar to Heroku)?

  1. Check your SMTP server if it receives anything on the scheduled times.
  2. Check the logs (Admin > Troubleshooting > Logs) or tasks (Admin > Troubleshooting > Tasks) around the scheduled times.

No multi-instances. Just a single instance running on Elastic Beanstalk using the official zip image. Ther is no automatic service to spin down metabase during in-activity but we do have a scheduled rule that spins down the metabase instance at night (during non-production hours) and spins it back up around 4:00 AM in the morning. This is not really necessary but we added this as a step in the hopes that "freshly started" metabase would somehow deliver the emails with more reliability.

These are the logs around the 05:00 UTC time. (Dashboard subscription is set to be sent at 08:00 GMT+3 Tallinn/Helsinki time). I will check the SMTP server and report back. Overall, checking the logs doesn't seem to yield any clues on what goes wrong. Is it supposed to output something in the logs while sending or attempting to send a dashboard subscription?

2021-09-13 04:57:01,763 INFO sync.analyze :: e[34mclassify-tables Analyzed [****************************************··········] 😊   81% Table 364 'public.contents_users_purchased_REDACTED'e[0m
2021-09-13 04:57:01,776 INFO sync.analyze :: e[34mclassify-tables Analyzed [**********************************************····] 😍   93% Table 354 'public.transactions'e[0m
2021-09-13 04:57:01,782 INFO sync.util :: e[35mFINISHED: step 'classify-tables' for postgres Database 40 'REDACTED' (43.1 ms)e[0m
2021-09-13 04:57:01,798 INFO sync.util :: e[35mFINISHED: Analyze data for postgres Database 40 'REDACTED' (530.2 ms)e[0m
2021-09-13 05:00:13,641 DEBUG middleware.log :: e[32mGET /api/embed/dashboard/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJyZXNvdXJjZSI6eyJkYXNoYm9hcmQiOjMzNH0sInBhcmFtcyI6eyJREDACTED.Y1taS1X7G6xBt3cNzzvn3hbU_REDACTED 200 15.6 ms (10 DB calls) App DB connections: 0/15 Jetty threads: 3/50 (4 idle, 0 queued) (179 total active threads) Queries in flight: 0 (0 queued)e[0m
2021-09-13 05:00:16,105 DEBUG middleware.log :: e[32mGET /api/embed/dashboard/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJyZXNvdXJjZSI6eyJkYXNoYm9hcmQiOjMzNH0sInBhcmFtcyI6eyJkYXRlX3JhbmdlIjoiMjAyMS0wOS0xM34yMDIxLTA5LTEzIn0sImlhdREDACTED.Y1taS1X7G6xBt3cNzzvn3hbU_REDACTED/dashcard/646/card/830 202 [ASYNC: completed] 2.4 s (28 DB calls) App DB connections: 1/15 Jetty threads: 2/50 (4 idle, 0 queued) (180 total active threads) Queries in flight: 0 (0 queued); postgres DB 35 connections: 0/8 (0 threads blocked)e[0m
2021-09-13 05:02:50,420 INFO middleware.exceptions :: Request canceled before finishing.
2021-09-13 05:02:50,421 INFO middleware.exceptions :: Request canceled before finishing.
2021-09-13 05:02:50,433 DEBUG middleware.log :: e[32mGET /api/user/current 200 3.6 ms (3 DB calls) App DB connections: 0/15 Jetty threads: 3/50 (4 idle, 0 queued) (180 total active threads) Queries in flight: 0 (0 queued)e[0m
2021-09-13 05:02:50,444 DEBUG middleware.log :: e[32mGET /api/session/properties 200 6.0 ms (2 DB calls) App DB connections: 0/15 Jetty threads: 3/50 (3 idle, 0 queued) (180 total active threads) Queries in flight: 0 (0 queued)e[0m
2021-09-13 05:02:50,524 DEBUG middleware.log :: e[32mGET /api/database 200 21.4 ms (3 DB calls) App DB connections: 0/15 Jetty threads: 3/50 (4 idle, 0 queued) (181 total active threads) Queries in flight: 0 (0 queued)e[0m
2021-09-13 05:02:50,604 DEBUG middleware.log :: e[32mGET /api/dashboard/3 200 66.9 ms (22 DB calls) App DB connections: 0/15 Jetty threads: 3/50 (4 idle, 0 queued) (181 total active threads) Queries in flight: 0 (0 queued)e[0m
2021-09-13 05:02:50,702 DEBUG middleware.log :: e[32mGET /api/table/850/query_metadata 200 22.1 ms (11 DB calls) App DB connections: 1/15 Jetty threads: 4/50 (3 idle, 0 queued) (181 total active threads) Queries in flight: 0 (0 queued)e[0m
2021-09-13 05:02:50,728 DEBUG middleware.log :: e[32mGET /api/table/262/query_metadata 200 41.5 ms (10 DB calls) App DB connections: 4/15 Jetty threads: 6/50 (1 idle, 0 queued) (181 total active threads) Queries in flight: 0 (0 queued)e[0m
2021-09-13 05:02:50,749 DEBUG middleware.log :: e[32mGET /api/table/742/query_metadata 200 20.0 ms (8 DB calls) App DB connections: 0/15 Jetty threads: 5/50 (1 idle, 0 queued) (182 total active threads) Queries in flight: 0 (0 queued)e[0m
2021-09-13 05:02:50,749 DEBUG middleware.log :: e[32mGET /api/table/518/query_metadata 200 22.0 ms (8 DB calls) App DB connections: 0/15 Jetty threads: 5/50 (1 idle, 0 queued) (182 total active threads) Queries in flight: 0 (0 queued)e[0m
2021-09-13 05:02:50,781 DEBUG middleware.log :: e[32mGET /api/table/796/query_metadata 200 12.4 ms (8 DB calls) App DB connections: 0/15 Jetty threads: 3/50 (3 idle, 0 queued) (184 total active threads) Queries in flight: 0 (0 queued)e[0m
2021-09-13 05:02:50,898 DEBUG middleware.log :: e[32mGET /api/collection/3 200 20.7 ms (3 DB calls) App DB connections: 3/15 Jetty threads: 3/50 (5 idle, 0 queued) (189 total active threads) Queries in flight: 8 (0 queued)e[0m

@arsalanz You should see something like this in the logs:
2021-09-13 13:00:00,026 INFO task.send-pulses :: Sending scheduled pulses...

I think " task.send-pulses" is only emitted for pulses, not for "dashboard subscriptions". We noticed that our legacy "Pulses" are actually working fine. It is the dashboard subscription that has the issue. Now I created a duplicate Pulse that has the same data as the unreliable dashboard subscription.

I see several 2021-09-13 10:00:00,672 INFO task.send-pulses :: Sending scheduled pulses... around the scheduled time for Pulses but not for dashboard subscriptions.

@arsalanz Subscriptions are using the same code, just expanded with more functionality.
https://github.com/metabase/metabase/blob/master/src/metabase/task/send_pulses.clj

Check all the application database tables starting with pulse, but I doubt you'll find anything there, since you say that it sometimes works and sometimes it doesn't.