0.33.2 join table duplicate rows

I am trying to make a joined question based on 2 tables within 1 database.
The only common field within those tables is a date field (date + time).
Each hour of day has multiple records, for example different creatives or campaign stats.

When joining 1 table with another based on that date I get duplicate records from the 2nd table for each date/time combination that appears.
In the end I am grouping the question on date (without time).
But the sum of the total of the second table is way too high.


What can I do to prevent this?
I can't make a separate question that groups before the join, because you can only join on the raw data from the tables, not the pre-processed.
Different question but somewhat related. Why, if you join on date, aren't the dates lining up?
As you can see in the preview there are differences in the date fields.

Hi @DannyK
Do you know the underlying column types of date in both tables?
I’m not sure how to solve your question. I would probably use SQL for this, since you don’t have a reference column.
As for why the dates don’t match, I’m not sure. Without knowing the column types or what the generated query is, then it’s difficult to know from a screenshot.

These are both postgress tables (same database, only different tables) with type/DateTime.

SQL is standard postgress query, filtered on last 7 days.

SELECT “public”.“platforms_metrics1”.“campaign_id” AS “campaign_id”, “public”.“platforms_metrics1”.“date” AS “date”, “public”.“platforms_metrics1”.“requested_impressions” AS “requested_impressions”, “Facebook”.“campaign_id” AS “campaign_id_2”, “Facebook”.“date” AS “date_2”, “Facebook”.“impressions” AS “impressions”
FROM “public”.“platforms_metrics1”
LEFT JOIN “public”.“platforms_metrics2” “Facebook” ON (date_trunc(‘week’, CAST((CAST(“public”.“platforms_metrics1”.“date” AS timestamp) + INTERVAL ‘1 day’) AS timestamp)) - INTERVAL ‘1 day’) = (date_trunc(‘week’, CAST((CAST(“Facebook”.“date” AS timestamp) + INTERVAL ‘1 day’) AS timestamp)) - INTERVAL ‘1 day’)
WHERE CAST(“public”.“platforms_metrics1”.“date” AS date) BETWEEN CAST((CAST(now() AS timestamp) + (INTERVAL ‘-7 day’)) AS date)
AND CAST((CAST(now() AS timestamp) + (INTERVAL ‘-1 day’)) AS date)

@DannyK
Okay, so it’s doing a lot of casting, and it looks like it’s matching week, which is why you see multiple dates. I cannot see why from your screenshot.
If column date is always a date, then why is the column type DateTime?
I would probably suggest that you make the query in Native SQL, since you have a bit more control.

Thank you for your reply.

That column has a date and time, so that’s why its DateTime. In this case it’s midnight (UTC) in the screenshot for example reasons.
It is matching last 7 days, but it’s not grouped or anything. So filter only. Pretty basic postgress query as generated by the question. But if I understand you correctly the differences might be a bug since they shouldn’t be there?

We used Native SQL in version 32.10 but were hoping to switch to the new JOIN functionality to make life easier for us.
Also since we can’t use dashboard filters with native query’s. Or at least no auto filled filters based on table data.

@DannyK
Okay, I’m not sure what’s going on then. Your query is converting the date columns to weeks with date_trunc('week', ...)
You can use filters with Native queries with Field Filters:
https://metabase.com/docs/latest/users-guide/13-sql-parameters.html#the-field-filter-variable-type

@DannyK
Actually I take that back, it looks like it’s a bug.
I can reproduce with Sample Dataset by Orders joined with Reviews on CreatedAt=CreatedAt.
When looking at the query, it’s converting to months (so even worse than weeks).

SELECT "PUBLIC"."ORDERS"."ID" AS "ID", "PUBLIC"."ORDERS"."CREATED_AT" AS "CREATED_AT", "PUBLIC"."ORDERS"."DISCOUNT" AS "DISCOUNT", "PUBLIC"."ORDERS"."PRODUCT_ID" AS "PRODUCT_ID", "PUBLIC"."ORDERS"."QUANTITY" AS "QUANTITY", "PUBLIC"."ORDERS"."SUBTOTAL" AS "SUBTOTAL", "PUBLIC"."ORDERS"."TAX" AS "TAX", "PUBLIC"."ORDERS"."TOTAL" AS "TOTAL", "PUBLIC"."ORDERS"."USER_ID" AS "USER_ID", "PRODUCTS__via__PRODUCT_ID"."TITLE" AS "TITLE", "Reviews"."ID" AS "ID_2", "Reviews"."BODY" AS "BODY", "Reviews"."CREATED_AT" AS "CREATED_AT_2", "Reviews"."PRODUCT_ID" AS "PRODUCT_ID_2", "Reviews"."RATING" AS "RATING", "Reviews"."REVIEWER" AS "REVIEWER"
FROM "PUBLIC"."ORDERS"
LEFT JOIN "PUBLIC"."REVIEWS" "Reviews" ON parsedatetime(formatdatetime("PUBLIC"."ORDERS"."CREATED_AT", 'yyyyMM'), 'yyyyMM') = parsedatetime(formatdatetime("Reviews"."CREATED_AT", 'yyyyMM'), 'yyyyMM') LEFT JOIN "PUBLIC"."PRODUCTS" "PRODUCTS__via__PRODUCT_ID" ON "PUBLIC"."ORDERS"."PRODUCT_ID" = "PRODUCTS__via__PRODUCT_ID"."ID"
LIMIT 1048576

For reference:

Thank you