Question | Custom Expression | Consolidate Records into One

Hello!

I’m very new to Metabase — so, apologies if this answer is easily accessible elsewhere.

I’ve connected two tables via a full outer join. Table 1 is “Projects”, Table 2 is “Tasks”.
Each record in Projects has a handful of associated records in Tasks, and this is how the tables are joined: [Projects.ID] matching [Tasks.ProjectID].

There are various dates associated with each record in Tasks (e.g., [Workability Date], [Start Date], [End Date]).

What I’d like to do is surface these dates within the associated Project record. So, for example, I’d like to see the following within one record/row:

[Task 1: Start Date], [Task 1: End Date], [Task 2: Start Date], [Task 2: End Date], and so on.

What’s the best way to do this?

Please let me know if I may clarify any of the above or provide additional info. Thank you for your help!

Can you provide sample input & output data so we can see how you want the data transformed & presented?

SQL generally wants to know the column names and sources in advance. Dynamically adding & removing columns gets complicated. Pivot tables have better support for this but it only works with summarizations (sum/average/etc.).

Appreciate the reply! Here’s an example – the record from table 1 has the broader, project-level information. The records in table 2 have more granular information about the three tasks within the project.

INPUT:

Table 1:
Project-level record: [ID], [Project Name], [Project Manager], [Start Date], [End Date], [At Risk?]

Table 2:
Task 1: [Project ID], [Task ID], [Workability Date], [Start Date], [End Date], [Assignee], [At Risk?]
Task 2: [Project ID], [Task ID], [Workability Date], [Start Date], [End Date], [Assignee], [At Risk?]
Task 3: [Project ID], [Task ID], [Workability Date], [Start Date], [End Date], [Assignee], [At Risk?]

OUTPUT:

Single record: [ID], [Project Name], [Project Manager], [Project: Start Date], [Project: End Date], [Project: At Risk?], [Task 1: Workability Date], [Task 1: Start Date], [Task 1: End Date], [Task 1: Assignee], [Task 1: At Risk?], [Task 2: Workability Date], [Task 2: Start Date], [Task 2: End Date], [Task 2: Assignee], [Task 2: At Risk?], [Task 3: Workability Date], [Task 3: Start Date], [Task 3: End Date], [Task 3: Assignee], [Task 3: At Risk?]

So – basically, I want to combine the record from table 1 with the three records from table 2 into a single output record.
Across my full dataset, the [Tasks.Project ID] should dictate which records get combined in the output. In the above example, all three tasks will share a [Project ID] (which matches the [ID] from table 1).

I hope that made sense. Please let me know, and thank you for your help!

What database are these tables in?

Is there a variable number of tasks associated with each project, or is there always 3 or less?

Are you doing something else with this data? Having so may columns will make the table viz have to scroll to view them all. If they’re going to feed some other visualization then it may not be necessary to extend the row like this.

They’re in the same database. I’ve just asked my coworker what type of database. Will follow up as soon as they get back to me.

Yes, there’s a variable number of tasks associated with each project – usually, 6.

We’ll probably make some visualizations down the line using this data for analytics. But the primary goal here is to see every step of a project in a single row for tracking and ownership. It’ll be a lot of columns, for sure, but this is how the project managers want the data presented for this workflow.

I’ve just heard back: “It’s a postgres db.”

For more context, another reason we’re looking to combine this data into a much larger record is that we’re aiming to add custom columns with some datetime evaluations – e.g., how long should this automated task take? If it reaches this point, send an alert to the stakeholders, as something has gone likely gone wrong in this automated task.

If its a fixed maximum number of tasks, then you can use a chained set of JOINs to build the table out.

The other way to do this is to use Dynamic SQL programming. In PostgreSQL’s case, you’d write a table-returning function (TRF) using pl/PgSQL. Because of the dynamic aspect, you can’t strap the function to a view, so if you want to see the contents you will have to make a question that does SELECT project_tasks_trf().

The join chaining example (I used SELECT * for brevity, expand that out if you end up using this):

SELECT * FROM projects p
LEFT JOIN LATERAL (SELECT * FROM tasks t_1 WHERE p.project_id=t_1.project_id ORDER BY t_1.task_id LIMIT 1) jt_1 ON TRUE
LEFT JOIN LATERAL (SELECT * FROM tasks t_2 WHERE p.project_id=t_2.project_id AND t_2.task_id > jt_1.task_id ORDER BY t_2.task_id LIMIT 1) jt_2 ON TRUE
LEFT JOIN LATERAL (SELECT * FROM tasks t_3 WHERE p.project_id=t_3.project_id AND t_3.task_id > jt_2.task_id ORDER BY t_3.task_id LIMIT 1) jt_3 ON TRUE
LEFT JOIN LATERAL (SELECT * FROM tasks t_4 WHERE p.project_id=t_4.project_id AND t_4.task_id > jt_3.task_id ORDER BY t_4.task_id LIMIT 1) jt_4 ON TRUE
LEFT JOIN LATERAL (SELECT * FROM tasks t_5 WHERE p.project_id=t_5.project_id AND t_5.task_id > jt_4.task_id ORDER BY t_5.task_id LIMIT 1) jt_5 ON TRUE
LEFT JOIN LATERAL (SELECT * FROM tasks t_6 WHERE p.project_id=t_6.project_id AND t_6.task_id > jt_5.task_id ORDER BY t_6.task_id LIMIT 1) jt_6 ON TRUE
ORDER BY p.start_date

This works best with an index on the tasks table (project_id,task_id). As ugly as it looks, performance should not be too bad as long as the index is used. Unfortunately it is not possible to apply sort orders to the tasks on each row using this technique; in order for the deduplication to work it needs a unique key, and start_date, etc. are not guaranteed to be unique.

The TRF can use this pattern, dynamically building the query string with the necessary number of JOINs and returning the result. The function can provide alternate sort orders.

I would not recommend using this query as a basis for other queries. PostgreSQL is perfectly capable of answering the questions in your last post without resorting to this wide-table anti-pattern. I would use this strictly as a transition tool for people used to seeing a spreadsheet representation.

1 Like

Thank you so much, Doug!

I'll look into these two options and follow up if any issues arise. Really appreciate your help!

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.