Hello!
Context
We are facing issues with schema refresh in one of our database connections on Metabase. Today we use version 0.44.5 in production.
The connection with the problem is a prestoDB connection. PrestoDB takes care of a data lake with around 150TB of files and has the necessary workers to respond to queries satisfactorily.
Problem
The problem is that the new tables inserted on this connection are not appearing in the data model in certain schemas, even in the search or in the Metabase UI in general. Even if we trigger the "Sync database schema now" button on the admin panel, presto seems to run all the queries necessary to refresh the database, but the new tables don't appear.
The option "Choose when syncs and scans happen" is activated and syncing daily, but some new tables are already 20 days without appearing on the UI.
Debugging the event, I was following the pod's logs and the logs on Admin > Troubleshooting, but I could not find an objective error that can explain this behaviour of not syncing some schemas (mostly schemas are syncing just fine).
One thing that I could catch was: We have an event listener, and the standard behaviour of syncing a schema seems to be to prepare a statement first (like PREPARE statement_name FROM statement) and then execute this statement. Querying the step of preparing the statement, I filtered these queries to see if my schema (that is not syncing) was in a prepared schema to execute later. But it wasn't! I think that maybe the process has returned errors at a certain point and stop to prepare the statements to refresh all the schemas. This could explain why some schemas are refreshed normally and why the same others aren't bringing the new data.
Alternative
Looking in the discussion for a solution, I found a discussion about refreshing one table by the notify endpoint on API (Can I only sync a single table?). I've tested this endpoint (metabase/notify.md at master · metabase/metabase · GitHub), and it works well with table schema updates. If we have an endpoint like this to scan the schema and not only tables, we could manually trigger the schema that has the tables missing, which will solve the problem.
Besides that, If this notify endpoint could scan the schemas too, we could have an option not to refresh the data model at any time (today we have the scheduled or the automatic option) and manually trigger this endpoint via API when we have a new table on a certain endpoint. This could help to make fewer queries to get the new data, and I think that could be helpful for large databases like this one.
So the question here is if this problem makes sense and if has anything else to try to bring these new tables to the data model. Thanks in advance!