MongoDB Database/Table Sync from API

metabasedev01 · May 12, 2021, 10:29pm

We are using the API to add/update tables (and the table's fields) that are added to MongoDB dynamically at run-time. After the collection is added to MongoDB, we use the API to retrieve the database by Id so that we can update field order and field visibility. However, the new collection in MongoDB does not appear as a table in the database. Even after triggering a manual sync by calling sync-schema (and sync), the database table does not appear in the API results for the database. When opening the Metabase UI the table does appear in the UI.

Is it possible to trigger something via the API that would result in the API returning the newly added MongoDB table?

flamber · May 13, 2021, 11:30am

Hi @metabasedev01
Post "Diagnostic Info" from Admin > Troubleshooting, and which version of Mongo and if it is Atlas.
Check the logs for errors during manual sync - Admin > Troubleshooting > Logs.
The /api/database/:id/sync_schema should update which tables are available.

metabasedev01 · May 13, 2021, 6:35pm

Mongo is not Atlas- it is standalone.

There are no errors in the log. The following warnings are logged on startup:
WARN metabase.core WARNING: You have enabled namespace tracing, which could log sensitive information like db passwords.
WARN metabase.plugins.lazy-loaded-driver Warning: plugin manifest for :bigquery does not include connection properties

Diagnostic Info:
{
"browser-info": {
"language": "en-US",
"platform": "Win32",
"userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36",
"vendor": "Google Inc."
},
"system-info": {
"file.encoding": "UTF-8",
"java.runtime.name": "OpenJDK Runtime Environment",
"java.runtime.version": "11.0.10+9",
"java.vendor": "AdoptOpenJDK",
"java.vendor.url": "https://adoptopenjdk.net/",
"java.version": "11.0.10",
"java.vm.name": "OpenJDK 64-Bit Server VM",
"java.vm.version": "11.0.10+9",
"os.name": "Linux",
"os.version": "4.19.128-microsoft-standard",
"user.language": "en",
"user.timezone": "GMT"
},
"metabase-info": {
"databases": [
"mongo",
"h2"
],
"hosting-env": "unknown",
"application-database": "postgres",
"application-database-details": {
"database": {
"name": "PostgreSQL",
"version": "13.2 (Debian 13.2-1.pgdg100+1)"
},
"jdbc-driver": {
"name": "PostgreSQL JDBC Driver",
"version": "42.2.18"
}
},
"run-mode": "prod",
"version": {
"date": "2021-03-03",
"tag": "v0.38.1",
"branch": "release-x.38.x",
"hash": "79ef63a"
},
"settings": {
"report-timezone": null
}
}
}

flamber · May 13, 2021, 7:07pm

@metabasedev01 Try upgrading - there has been several sync fixes. Remember to backup first.
Otherwise try starting your (container?) with debug logging:
... -e JAVA_OPTS="-Dlog4j.configurationFile=https://log4j.us/v2/templates/metabase?trace=metabase.sync" ...

metabasedev01 · May 13, 2021, 9:42pm

I upgraded to version v0.39.1. The issue still exists. The issue also exists for other API calls that retrieve database and/or table information, for example, /database/:id/metadata.

I was able to fix the issue by sleeping the current thread (Java). After sleep for a brief duration the new table is returned in the results. If the sleep duration is less than 5ms the table is not returned. For sleep durations between 5ms and 599ms the table is returned but without the fields. For sleep durations of 600ms or greater the table is returned with the fields.

Is this delay in updating metadata something that we should expect in normal operation? If yes, is there any way to speed up the metadata update(s)? Or, if not, is there any way to predict (within a reasonable range) the delay?

Thanks again.

flamber · May 13, 2021, 10:07pm

@metabasedev01
The /sync_schema gets the structure (aka sync) from the data source (Mongo) and store it in the application database (Postgres). This is a backend async call.
And /metadata is used to retrieve the information from the application database (Postgres). This is a frontend call.

I'm not sure I fully understand what you're doing with the sleep - or how you're doing it.
But no, whatever you're seeing is not expected behavior. Metabase should sync and show tables and fields in the databases that are connected to it.
Though, there are some known limitations with Mongo, like only the first 10,000 documents in a collection are scanned, so if you're adding new fields to documents after that, then they won't get picked up.

metabasedev01 · May 13, 2021, 10:20pm

If /sync_schema is a backend async call then it is not completing before we make the call to get database. Or, with sleep, it is in some state of partial completion and the database might have the new table depending on length of sleep. My question is regarding the delay in /sync_schema.

The code used to replicate looks something like the following (Note that this is for example only to replicate the described issue and is not actual deployed code):

// Adds a new collection/table to MongoDB - is completed before below is executed
// api/database/:id/sync_schema
Boolean synced1 = analyticsClient.syncSchema(sessionId, database.getId());

try
{
long sleepDuration = 5000; // ms
Thread.sleep(sleepDuration);
}
catch (InterruptedException ie) { logger.error("While sleeping", ie); }

// api/database/:id?include=tables.fields
Database databaseAfterSync = analyticsClient.getDatabaseById(sessionId, database.getId());
// databaseAfterSync might contain the new table (and fields) based on sleep duration

flamber · May 14, 2021, 12:55pm

@metabasedev01
Yes, /sync_schema is a async call, so there's currently no way of detecting, when it's completed.
Until we have a way of detecting that, then we cannot show anything in the UI either:
https://github.com/metabase/metabase/issues/11354 - upvote by clicking on the first post
So that's why you're not getting anything with the other calls.

metabasedev01 · May 14, 2021, 5:51pm

That makes sense. Thanks for your help.