Pulse email fails to send on schedule automatically

We have a large instance on v0.32.9 with 4 CPUs and 16 GB RAM, retrieving from 40+ application databases, so we suspect the large scale may come with some performance issues.

Specifically, we have a Pulse scheduled for 8 am daily. However, it recently started failing to send the email automatically. This never happened before, so we suspect it is due to scaling, as we have gradually added more databases to Metabase and now have 40+.

  • The email sometimes arrives late, more than 15 minutes after the scheduled time, or
  • it never arrives, so after a couple of hours we manually click the "Send email now" button.

By design, a Metabase Pulse triggers only at minute 00 of the hour. The database synchronization starts at the same time and typically takes about 7 minutes in our case. So, during that rush, we see database connection errors in the file /var/log/messages.
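
To check whether the connection errors really cluster around minute 00, one option is to count them per minute of the hour. The following is only a rough sketch, not anything Metabase provides: it assumes classic syslog lines that begin with a "Mon DD HH:MM:SS" timestamp, and the marker string "connection" is a placeholder that should be replaced with text from the actual error messages. The names errors_by_minute, SYSLOG_TIME, and ERROR_MARKERS are made up for this example.

```python
import re
from collections import Counter

# Assumption: classic syslog lines that start with "Mon DD HH:MM:SS ".
SYSLOG_TIME = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+(\d{2}):(\d{2}):\d{2}\s")

# Assumption: placeholder marker; replace with text from the real error lines.
ERROR_MARKERS = ("connection",)


def errors_by_minute(lines, markers=ERROR_MARKERS):
    """Count matching error lines per minute of the hour (0-59)."""
    counts = Counter()
    for line in lines:
        match = SYSLOG_TIME.match(line)
        if match is None:
            continue  # No syslog timestamp, e.g. a stack trace continuation line.
        lowered = line.lower()
        if any(marker in lowered for marker in markers):
            counts[int(match.group(2))] += 1
    return counts
```

It could be called as errors_by_minute(open("/var/log/messages", errors="replace")). If most of the counts fall in minutes 00 to 07, that would be consistent with the sync and the Pulse competing for connections; an even spread would point elsewhere.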

Please let me know if you need more details, and we highly appreciate any hints or suggestions.

Hi @1780yz
Time to look at upgrading - for many reasons.
Without seeing the logs, I'm guessing that you're seeing the following issues:
And you should read this:

Thank you for your reply, @flamber.

A little update on the troubleshooting:

As a workaround, I subscribed to an alert on a query that always returns non-empty results, so it triggers an email every hour. It has been over a week now, and the issue has not happened again since.

I suspect the root cause might be related to garbage collection in the language runtime. After idling for a long time, the system reclaims resources, so when the scheduler triggers the email the next day, the task fails due to insufficient resources. But I am unfamiliar with this part of Java, so I could be wrong.

The workload on the Metabase instance has also increased since a call center team started using it to look up customer details. They start getting busy with calls around 8 am every day, which is roughly when the automatic email is scheduled.

We still hope to find the exact root cause and would highly appreciate any hints or suggestions.