Bug in Cumulative Sum Calculation in Metabase v0.50

Hi Metabase Team,

I've encountered a bug with the cumulative sum calculation in Metabase version 0.50.xx that didn't exist in version 0.48 (I use 0.50.18 version)

In version 0.48, creating a cumulative sum of quantities sorted by categories worked correctly. However, in version 0.50, the cumulative sum does not sort properly and follows the original data order instead.

Then I try to simulate with sample data.
Steps to reproduce:

  1. Create a question using the "Orders" table.
  2. Join with the "Products" table on "Product ID".
  3. Summarize by "Products → Category" and calculate the sum of "Quantity".
  4. Add a cumulative sum of the sum of "Quantity".
  5. Sort by "Sum of Quantity" in descending order.

Expected result: The cumulative sum should start with the value 18.920 as the first row (corresponding to the sum quantity column), but the result was started with 69540
I've attached screenshots to illustrate the issue.

then I try to summarize in 2 stage, but the result is the order of the product is not base on sum product which should be start from "Widget", but following cumulative that start from Doohickey


or this one

for your comparation here below is the result if we use 0.48 :

the result is just like what we expected
Could you please investigate this issue?

Thank you.

Best regards,
Taufik Singgih

@Luiggi
Would please review this problem? I think it is important issue.

Thank you for your help

Regards,
Taufik Singgih

What SQL is generated? It's the '>_' button next to 'Save'.
What do you see when you click 'Visualise'? Is the result correct there?

on top of that, you're always sorting by quantity, which is exactly what Metabase is doing there

Hi @AndrewMBaines , Thank you for your response.

herebelow is the SQL code autogenerate by metabase 0.48

SELECT
  "Products"."CATEGORY" AS "Products__CATEGORY",
  SUM("PUBLIC"."ORDERS"."QUANTITY") AS "sum",
  SUM("PUBLIC"."ORDERS"."QUANTITY") AS "sum_2"
FROM
  "PUBLIC"."ORDERS"
 
LEFT JOIN "PUBLIC"."PRODUCTS" AS "Products" ON "PUBLIC"."ORDERS"."PRODUCT_ID" = "Products"."ID"
GROUP BY
  "Products"."CATEGORY"
ORDER BY
  "sum" DESC,
  "Products"."CATEGORY" ASC

screenshot

herebelow is SQL code autogenerate by metabase 0.50.17

SELECT
"source"."Products__CATEGORY" AS "Products__CATEGORY",
SUM("source"."QUANTITY") AS "sum",
SUM(SUM("source"."QUANTITY")) OVER (

ORDER BY
"source"."Products__CATEGORY" ASC ROWS UNBOUNDED PRECEDING
) AS "sum_2"
FROM
(
SELECT
"Products"."CATEGORY" AS "Products__CATEGORY",
"PUBLIC"."ORDERS"."QUANTITY" AS "QUANTITY"
FROM
"PUBLIC"."ORDERS"

LEFT JOIN "PUBLIC"."PRODUCTS" AS "Products" ON "PUBLIC"."ORDERS"."PRODUCT_ID" = "Products"."ID"
) AS "source"
GROUP BY
"source"."Products__CATEGORY"
ORDER BY
"sum" DESC,
"source"."Products__CATEGORY" ASC

this is the screenshot

you can see the diffrence.
I think this is big issue that make metabase don't have ability to make pareto diagram anymore. Which is common in analytic jobs

hi @Luiggi Luiggi , Thank you for your response.
I sorting because metabase default is ascending, meanwhile for create pareto diagram, we need to sort descending for the quantity, and sort ascending for the cumulative.

You can see the diffrence between 0.48 and 0.50 in my answer above for other reply.

Thank you
Taufik

We're having a different SUM issue since updating.

Partitioning by expressions of type FLOAT64 is not allowed at [13:5]

SUM(SUM(source.amount)) OVER (
PARTITION BY source.dateCreated,
source.reserve_event_type,
source.reserve_event_status,
source.amount,
source.notes,
source.reserve_event_id

This is affecting a ton of questions...

My team recently got upgraded to v1.50 and is experiencing the same issue. Cumulative Sums by time periods now no logner return one row per time period (eg per month), but simply the number of rows in the table originally (ie not summarising at all).

Is this bug being looked at?

1 Like

Could you please report a bug here, this looks strange

Hey there, we have changed the implementation of cumulative sum and count to use sql window functions. This is a breaking change in a way that the results now depend on what you have chosen as breakouts in the summarization. Current logic is that the first grouping is used for sorting as a dimension along which the accumulation happens and the rest grouping columns are used as partitions. So to make stuff work most likely you have to move your datetime column to be the first breakout in the summarization. If that doesn't help, please report a bug here.

This looks like a bug as well, could you please report it here so you could track when it's fixed

ok i have report it there