CPU too high when using existing question as a sub-query

yuv_c · August 9, 2020, 4:19pm

Hello.
I am considering migrating to Metabase from a different BI platform.
For the testing I had set up Metabase 0.36.2 using docker with PostreSQL 12.3 as the application DB.

All of the team’s work is in native SQL, and a lot of the queries use results from other queries (what is know in metabase as using existing question as a sub-query)

We noticed that Metabase is very slow compared to the other BI platform when running exactly the same queries on the exact same DB (Both connect to MySQL DB).

When I started investigating why, I found that when I access our dashbaord, which contains 6 or so queries (tables only without any visualisations), the CPU of the machine running MySQL DB reaches 400%.

This does not happen on the other BI platform.

I did some more digging around and found that when I run the same query, only instead of using the saved questions results i use nested queries, the query with the saved questions is 3 times slower than the one that is implemented with nested queries.
The calculations, joins, and everything else is the same.

Why does using nested queries over saved questions improve performance so drastically?
Why is using results from saved questions so intensive on MySQL DB which is not even the application DB? What operations other than getting the content are performed on that DB?
How can I improve this?

To clarify - All the queries only read from the DB. No insert, alter, update delete actions are being performed.

Thank you.

flamber · August 9, 2020, 4:50pm

Hi @yuv_c

It would probably help a lot if you posted the queries being executed by MySQL, then it’s easier to understand what might trigger the problem, which is likely something not using an index.

But:

Need to see the queries. Check your MySQL debug log for slow queries, missing indexes
Unless you’re using cache, Saved Questions are being executed against your database, not the application database. https://www.metabase.com/docs/latest/administration-guide/14-caching.html
Need a lot more logging to understand what’s going on.

Have you tried the exact same execution query that is generated from Metabase and run that on “other BI platform”? I would guess it would be the same speed and use the same resources.

And network latency can also come into play, but it’s really difficult to know anything for sure until viewing a lot more logs of what is happening and the timings.

yuv_c · August 10, 2020, 2:47pm

You are correct, we are not using an index in any of our tables. But the biggest table contains about 3000 records, and the rest less than 800 (usually 200). So an index would not improve performance by much.

Caching isn't really an option when writing new queries, And for the saved questions it doesn't help because the cache is cleared after X minutes (more than that would be too much for our needs).

Basically when we run only one query, the performance is usually Ok. But when we reload a dashboard with multiple queries the other platform is significantly faster and almost unnoticeable on the CPU even though it has much more calculations to perform.
The queries themself are written almost identically (except the different syntax between the sqlite running on the other platform and the MySQL syntax in this platform).

The Metabase container sits on the same server as the DB, so latency is not an issue.

Today I took a look at what queries are being executed by Metabase through mysql.general_log table.

It executes the queries I would expect it to.

The interesting thing I did find is that since many of the queries share the same saved question (which is like a base of our query pyramid), many of the same saved questions queries are being executed again and again (with and without caching when cache is clear).
For example, query a is:

SELECT * FROM {{#1}} -- saved question number one

query b is:

SELECT * FROM {{#1}} question_one
LEFT JOIN {{#2}} ON whatever
WHERE some_condition

question one would get executed twice even though there is no purpose for that.

flamber · August 10, 2020, 3:29pm

@yuv_c

Wait, so you are comparing two different analytics programs, that are executing different queries against different databases?
That’s like saying a Unix server is faster at handling email than Windows, even though it’s vastly different configurations and programs used. (it is, but that’s besides the point)

There’s no cache used on Metabase, when using Sub-Query - Metabase just stitches together the full query and executes that as a single query against your database, so that would be up to your database to do something smart from that.

But you are comparing apples and oranges - it’s never going to be fair against anyone and just comes down to your preference and bias.

yuv_c · August 10, 2020, 4:06pm

No - Sorry for the misunderstanding.

Both platforms connect to the same DB which is MySQL.

The difference in the DB's I was referring to was the applications DB's.
Metabase uses PostrgreSQL, The other uses SQLite.

Both platforms use the same queries and structure I was talking about in my example with the saved questions.

Basically I tried to use exactly the same structure and code in both platforms (so that migrating would be as easy as possible).

flamber · August 10, 2020, 4:53pm

@yuv_c
The internal database doesn’t have anything to do with how the queries are executed on your datasource.

I’m not sure why you’re not telling the name of the other platform.

How do you know what the other platform is generating? Have you seen the query?
MySQL is horribly difficult to debug sometimes and requires a lot of trace logging to even see the queries being executed.

Metabase will simply take the query of the referenced question, and wrap parenthesis around it and replace the variable, and then send that to your database.
So if your question 1 is SELECT * FROM my_table
And you’re reusing that in question 2 as SELECT * FROM {{#1}} AS sub
Then the query being generated and sent to MySQL is SELECT * FROM (SELECT * FROM my_table) AS sub

So my first guess is that it has nothing to do with Sub-Query, but rather the query of the questions referenced.

And given that you only have a few thousand rows, then it seems like your MySQL is not handling it well and probably generating a lot of temporary tables and other silliness.
But that should be exactly the same for the other platform - unless it’s actually not the same queries being executed, which it most likely isn’t.

You can add EXPLAIN/ANALYZE in front of the queries to get more details on how MySQL is handling the query.

flamber · October 22, 2020, 6:04pm

@yuv_c Seems like something really strange is going on - we now have an issue open on it:
https://github.com/metabase/metabase/issues/13572