Best API to get popular/relevant tables in a database?

arpittarang · October 25, 2024, 12:56am

Hi, I'd like to use the metabase API to fetch the top 200 tables that are relevant to the current user. This would include tables used in cards that are most viewed/bookmarked by the user, and cards generally popular across the organization.
This is what I've tried:

Using the search api for tables like so /api/search?models=table&table_db_id={db_id} to get tables ranked by a scoring function.

The results don't seem to be ordered by popularity. score.dashboard is 0 for every table; score.recency seems to reflect the table's schema update date, not the latest record insert date which is what I'd want. On large dbs, this just gives me some random 1000 tables instead of the popular ones.

Use the search api for card like so /api/search?models=card&table_db_id={db_id}&limit={limit} and then parse the resulting SQLs for table names

This gives pinned cards first which is good. However the later cards in the result have very low score.dashboard , even those used in multiple dashboards. The recency score seems correct afaict. The challenge is that I don't know how many tables I'm going to get after parsing the cards' queries (i'm also assuming all the cards are based on native queries, which might not be true generally)

Use the /api/card api to get all cards, and then parse tables out of them to get top X tables.

This just times out on large dbs

Use the /api/database/{dbId}?include=tables api to get all tables in the database, and then filter manually for recency

This does return all the tables in a reasonable amount of time. However the table are missing latest_sync_timestamp.latest_record_timestamp field which I would need to do the recency ranking (this field is present when I manually fetch the record for a particular table using /api/table/{tableId})

Use the activity APIs /api/recent_views and /api/popular_items

These give ~100 entities which are mostly dashboards. I'd have to manually query each dashboard to further get their cards and then parse those to get tables.
Approach #2 seems the most promising to me, but with it's own challenges (What's a good limit? Is there a way to set a custom 'scoring' function/weight so that I can prioritize cards appearing in dashboards more over recency? What if it's not a native query?).
Would love any help on this!