Very slow dropdown filter when linked to another filter

Hi,

I have set up a dashboard with various questions that utilise roughly the same query (with different GROUP BYs), querying a view and LEFT JOINING on a table in MySQL. Something like this (this is for my “scoring” question):

WITH base_data AS (
  SELECT
    tags_snapshot.tag_value_1,
    tags_snapshot.tag_value_2,
    my_view.company_id,
    my_view.scoring,
FROM
    my_view
  LEFT JOIN tags_snapshot
    ON tags_snapshot.list_id = my_view.list_id
    AND tags_snapshot.company_id = my_view.company_id
  WHERE 1=1
    AND my_view.list_id = {{list_id}}
    [[AND {{datapoint_1_value}}]]
    [[AND {{datapoint_2_value}}]]
    <other field filters on my_view here>
)

SELECT
  MAX(base_data.scoring) AS Scoring,
  COUNT(DISTINCT mc_company_id) AS `Number of Companies`
FROM base_data
GROUP BY base_data.company_id
HAVING Scoring IS NOT NULL
ORDER BY Scoring;

The questions themselves load fine, as do most of the filters at the dashboard level, except for the filters (dropdown) related to the tag_values which are being populated from the table I’m LEFT JOINING on, tags_snapshot. Those filters are being aliased as datapoint_<x>_value and datapoint_<x> in my question and dashboard level filters, respectively.

For some reason, those filters (eg: “datapoint_1”) basically don’t populate dropdown values and time out whenever they are linked to my list_id ID filter, which is required to be specified and linked to all the other filters in my dashboard. If I unlink them, they load fine. I was able to locate the query that Metabase uses to generate the dropdown values and it seems quite inefficient:

SELECT 
    ats.tag_value_1 AS tag_value_1
FROM 
    tags_snapshot AS ats
LEFT JOIN (
    SELECT 
        list_id,
        company_id,
        <many_other_fields>,
        datapoint_1 <---- why is this here?
    FROM
        my_view
) AS am
    ON ats.list_id = am.list_id
WHERE 
    am.list_id = <x>
GROUP BY 
    ats.tag_value_1
ORDER BY 
    ats.tag_value_1 ASC
LIMIT 1000;

Does anybody have insight into why a query with a linked filter would be so much slower than without? In theory there should be less data to scan through, even though the query itself might be slightly slower.

PS: I’ve tried different configuration of my base SQL query powering the questions, eg: querying on the tags_snapshot table and RIGHT JOINing on my_view, but that didn’t help.

I’ve also reviewed all the question-level filters for my datapoint_<x>_value and made sure they are configured as field filters, string widget type, dropdown list and multiple values.

If you run the dropdown values generating query by hand, or put it in a native query question, does it complete in a reasonable amount of time?

I just checked against my DB, the dropdown filter query itself is pretty much timing out. It makes sense because it’s LEFT JOINing my_view unnecessarily on all possible tag_value_1 in order to fetch data that isn’t even needed for the dropdown. The filter on list_id could be applied directly to tags_snapshot. I’m not sure how to ‘make’ Metabase use a more efficient query though..

I don’t think Metabase would just randomly assume the tables need to be joined and can correctly guess the relationship. It needs something somewhere to tell it that.

Double check the filter definition and make sure it’s referencing the correct column in the correct table. Also check if the source table is a Model and the source query for that Model performs that join.

Is there a foreign key relationship between the tables?

The filter definition in each question does indeed reference the list_id column of the SQL view (my_view), so it makes sense that Metabase is attempting to LEFT JOIN on it in order to apply the linked filter to populate a dropdown from a field that is in the other table (tags_snapshot).

This is how the field filter for list_id is being defined for each question (“ANALYTIC_MC_MODEL” is what I’ve been referring to as my_view for simplicity).

The list_id is defined as an “entity_key” in the table metadata definition for my_view; and a “foreign key” to that entity_key in my_view, in the table definition for tags_snapshot.

I guess to your point about Models, maybe the simplest solution is just to just create a Metabase model which uses my_view and already incorporates the LEFT JOIN with tags_snapshot, as opposed to doing that operation in the SQL for each question? And thus the list_id linked filter logic will not need to make a LEFT JOIN at runtime to apply the filtering logic for a dropdown field that is populated from tags_snapshot?

The model doesn’t get rid of the join, it codifies it. The question about, “are you querying through a Model” was to figure out why the join was there in the first place. You said that my_view.list_id is set as a Entity ID column in Table Metadata. What is set as the matching Entity Name column? Does it happen to be in tags_snapshot? That would explain the join, Metabase is following the key to get the name from the subordinate table.

Another avenue to explore is why the join performs so poorly. Have you run it through EXPLAIN ANALYZE and checked for any obvious issues, like missing indexes?

The model doesn’t get rid of the join, it codifies it.

Right, that’s what I meant.

You said that my_view.list_id is set as a Entity ID column in Table Metadata. What is set as the matching Entity Name column?

I’m not sure I follow here sadly. In the Table Metadata, I’ve configured this column to be an Entity key, but I’m not sure I know what the Entity Name you’re referring to is or where it’s defined.

Another avenue to explore is why the join performs so poorly. Have you run it through EXPLAIN ANALYZE and checked for any obvious issues, like missing indexes?

I’ve tried, but the query keeps timing out.. I do have indeces set up on list_id and company_id in my tags_snapshot table, which are the fields I’m joining on in my SQL queries and in the underlying query for the dropdown (just list_id for that one).

Taking a step back, would codifying the LEFT JOIN into a Model object potentially help with performance? Or just displace the issue elsewhere?

As a note, the only workaround I’ve found to this problem so far is to create a separate field filter on list_id in the tags_snapshot table, in all the questions and at the dashboard level, and linking the datapoint_<x> dropdown (which is populated from tags_snapshot)to that field filter so it bypass the LEFT JOIN on my_view . But it’s quite ugly, as we end up having two equivalent filters for the same value (list_id), pointing to the same column in the two tables which are already joined in the questions.

By adding predicates on both tables for the join condition, you’re turning the outer join into an inner join. Makes a huge difference in the number of rows returned, especially if the cardinality of list_id is large.

Can you start a fiddle on dbfiddle.uk, load your schema (and some sample data, if possible) into it, and post the link here so I can run some tests? Also add your exact MySQL version in a comment. You can anonymize column names if needed, and you can trim the column count down if its excessive, but don’t change the relationship (and relative size) between the tables.