Large databases can't refresh certain schemas


We are facing issues with schema refresh in one of our database connections on Metabase. Today we use version 0.44.5 in production.

The connection with the problem is a prestoDB connection. PrestoDB takes care of a data lake with around 150TB of files and has the necessary workers to respond to queries satisfactorily.

The problem is that the new tables inserted on this connection are not appearing in the data model in certain schemas, even in the search or in the Metabase UI in general. Even if we trigger the "Sync database schema now" button on the admin panel, presto seems to run all the queries necessary to refresh the database, but the new tables don't appear.
The option "Choose when syncs and scans happen" is activated and syncing daily, but some new tables are already 20 days without appearing on the UI.
Debugging the event, I was following the pod's logs and the logs on Admin > Troubleshooting, but I could not find an objective error that can explain this behaviour of not syncing some schemas (mostly schemas are syncing just fine).
One thing that I could catch was: We have an event listener, and the standard behaviour of syncing a schema seems to be to prepare a statement first (like PREPARE statement_name FROM statement) and then execute this statement. Querying the step of preparing the statement, I filtered these queries to see if my schema (that is not syncing) was in a prepared schema to execute later. But it wasn't! I think that maybe the process has returned errors at a certain point and stop to prepare the statements to refresh all the schemas. This could explain why some schemas are refreshed normally and why the same others aren't bringing the new data.

Looking in the discussion for a solution, I found a discussion about refreshing one table by the notify endpoint on API (Can I only sync a single table?). I've tested this endpoint (metabase/ at master · metabase/metabase · GitHub), and it works well with table schema updates. If we have an endpoint like this to scan the schema and not only tables, we could manually trigger the schema that has the tables missing, which will solve the problem.
Besides that, If this notify endpoint could scan the schemas too, we could have an option not to refresh the data model at any time (today we have the scheduled or the automatic option) and manually trigger this endpoint via API when we have a new table on a certain endpoint. This could help to make fewer queries to get the new data, and I think that could be helpful for large databases like this one.

So the question here is if this problem makes sense and if has anything else to try to bring these new tables to the data model. Thanks in advance!

Hi @ronanknob
Post "Diagnostic Info" from Admin > Troubleshooting, and the version of PrestoDB.

So does this happen for any new table or just some new tables that are not appearing in the Data Model?
Asking specifically about that because of

If a prepared statement is failing to run or is being dropped, shouldn't there be logs about that in Presto?

It is possible to use the Notify endpoint for everything, though to disable sync you would have to make some changes to the scheduler - see my first comment: - upvote by clicking :+1: on the first post

Out of curiosity, how many workers do you have vs queries?

This PrestoDB is nowadays running with 40 workers, with an average of 20 to 25 queries. When I was debugging the problem, I scaled up to 70 workers with the same load to test if they could respond better, but no difference. The version in production is 0.269-4492EC9.

About the logs in Presto, we have the event listener, and in the events, I couldn't find a prepared statement with the name of the not working schema on it, which is strange. It seems that Metabase is not creating an event to scan this schema.

This happens for new tables on certain schemas, but we don't know for sure how many new tables are not appearing and how many schemas are affected. We have some cases from users that complain about this behaviour.

As I can understand from Sync of new tables stops working if an existing table has a description set to empty (not null) · Issue #26268 · metabase/metabase · GitHub, if I have one table with a null description, all the new tables will not appear on the UI? This maybe can be the case. I'll check If I have a table without comment on it.

About the Notify endpoint, I can only trigger a full scan or a schema scan, but the schema scan needs the table_name or id, which does not exist in the database yet.

How do I attach logs here? I see only the option to upload image files.

@ronanknob No, if you have a table with an empty description, then it doesn't work - null works.

If you are controlling the sync, then you can create a row in metabase_table with the name of the new table, which would allow you to execute Notify for that.