Big data query performance

Hi,

At our company, we have a table with more than 1kkk rows in PostgreSQL, which we need to use for metrics development. Although Metabase may not be a suitable tool for such questions (I found the framework https://panoply.io/ more suitable), I would like to ask your team for advice on working around this constraint.

Hi @brunolnetto

I don't know what "1kkk rows" is supposed to mean. 1 billion?
Metabase sends queries to your database, so it's up to your database to execute the actual query.
I've seen Metabase connected to warehouses with many millions of rows.

You'll need to be a lot more specific about your performance problem.

The architecture here is given as follows:

Entity A represents the top level of the hierarchy: each A has child entities B bound to it in a 1:n relation. Each B, in turn, goes through state-change steps recorded as entity C, like a state machine, again in a 1:n relation. A, B and C are each stored as tables in a relational PostgreSQL database, which makes the A-C relation roughly 1 : sum_i n_i, where i runs over the B children of a given A and n_i is the number of C steps recorded for each B_i.

The table mentioned above is the entity C table. In our case, the entity B table already has 40 million rows, which gives a sense of the scale of the problem.
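
For concreteness, here is a minimal sketch of how that hierarchy might look as PostgreSQL tables; the table and column names (entity_a, entity_b, entity_c and the parent-id columns) are assumptions for illustration, not the actual schema from this thread:

```sql
-- Hypothetical schema illustrating the A -> B -> C hierarchy described above.
-- Names and columns are assumptions, not the real schema.

CREATE TABLE entity_a (
    id   bigserial PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE entity_b (
    id         bigserial PRIMARY KEY,
    a_id       bigint NOT NULL REFERENCES entity_a (id),  -- 1:n from A to B
    created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE entity_c (
    id         bigserial PRIMARY KEY,
    b_id       bigint NOT NULL REFERENCES entity_b (id),  -- 1:n from B to C
    state      text NOT NULL,                             -- state-machine step
    changed_at timestamptz NOT NULL DEFAULT now()
);

-- Each A fans out to sum_i n_i rows of C (summing the C steps n_i over its
-- B children i); with ~40M rows in entity_b and over a billion in entity_c,
-- any rollup from A down to C touches a very large number of rows.
```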

@brunolnetto Okay, that sounds like a database problem, not a Metabase problem.

Could you provide some advice on handling this issue?

@brunolnetto You should contact a DBA to get advice about your database.
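
This is not from the thread, but as an illustration of the kind of database-side work such advice usually starts with: inspect the slow query's plan and check that the join columns are indexed. The table and column names below are the same hypothetical ones used in the schema sketch above.

```sql
-- Hypothetical first diagnostic steps on the PostgreSQL side
-- (table/column names are assumptions carried over from the sketch above).

-- See how the planner executes an A-to-C rollup and where the time is spent.
EXPLAIN ANALYZE
SELECT b.a_id, count(*) AS steps
FROM entity_c AS c
JOIN entity_b AS b ON b.id = c.b_id
GROUP BY b.a_id;

-- Foreign-key columns are not indexed automatically in PostgreSQL;
-- an index on the join column often helps large joins like the one above.
CREATE INDEX IF NOT EXISTS entity_c_b_id_idx ON entity_c (b_id);
```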