Filtering on the existence of a many-to-many relationship


#1

I need to filter on a table based on the existence of a value in a foreign table. For example, suppose we have two tables: posts and tags

posts:
 post_id |    text
---------+-------------
       1 | Please help
       2 | New idea

tags:
 tag_id |   tag
--------+---------
      1 | bug
      2 | urgent
      3 | feature

posts_tags:
 post_id | tag_id
---------+--------
       1 |      1
       1 |      2
       2 |      3

I’d like to query all posts based on associated tags (e.g., "All posts tagged as urgent, all posts tagged as feature or bug, etc.).

As far as I know, there’s no way to natively query posts and filter on tag_id or tag. I’m perfectly happy to denormalize or pre-aggregate my data, but the unique values of tags are not known ahead of time so I can’t easily create a column for each possible tag (or create a base question per this suggestion ).

I’m using PostgreSQL as my datasource, so if Metabase supported querying on Postgres array values, hstores, or items in a json(b) array, I could aggregate the tags into a single column and then filter, e.g.:

 post_id |    text     |       tags
---------+-------------+-------------------
       1 | Please help | ["bug", "urgent"]
       2 | New idea    | ["feature"]

I suppose I could pre-aggregate the values into a string and search that field:

 post_id |    text     |    tags
---------+-------------+------------
       1 | Please help | bug;urgent
       2 | New idea    | feature

…but I’m wondering if this is the best solution.

This seems like it would be a somewhat common use case, so I’m curious to know if anyone has a workaround.

Related github issues:


#2

If the relationship is defined in the data model, you can filter within limitations. This is a question on orders, but filtered using the customer table.


#3

This works perfectly if you start from a table that has a foreign key to another table, but in my above example, there is no foreign key from the posts table to another table.

To give an example using the sample data set, it would be something like: “Show me the raw data of all People who live in El Paso and have ordered either a Gizmo or a Widget”.

I suppose I could build the question from the fact table (Orders, or posts_tags in my example) and then group by every column in the dimension table (People, or posts in my example). But I’d really like to start with the table on the receiving end of the foreign key relationship and create filters based on other tables with foreign keys pointing back to it. That way the user could filter based on multiple foreign tables. You could imagine this being useful if the Reviews table had a foreign key back to the People table, and you wanted to do an analysis on People who have bought a certain product or left a review in the last 30 days.

It seems kind of contrived, and I get the sense that I’m bumping up against the limit of the native query builder, but I was hoping the user wouldn’t have to write SQL for my example ("Filter all posts tagged as feature or bug).


#4

Yes, it is a bit contrived.
In similar circumstances, I’ve either used a view or written some custom SQL.