What percentage of orders in April 2020 were from new customers?
A new customer is keyed on email address.
Order table
OrderId : guid
CustomerEmail : string
ProductOrdered : string
OrderedAt : datetime
The way I would approach in SQL (or think I would) would be a a query CTE orders_before_april2020 grouped by email, and a redundant count field.
Then produce another CTE of orders in april that has email, order_before_april_2020_count based on an left outer join to the CTE, then aggregate by by count.
Then group_by to give on a condition of order_before_april_2020_count > 0
Then an MB pie chart.
This currently isn’t possible right for a analyst to do with just the Order table data source in MB?
After a million combinations of various joins, I think the answer is no. It’s an extremely common question but the absence of coalecation and group by expression makes it a no starter.
I think you can get someway there by adding a virtual field to your data source of customer_email_last_ordered_before_this_order nullable(datetime) but still don’t think you can get to a pie chart… Ah but then that field would be wrong if customer ordered twice in april…
This is how it would be modelled in mongo:
db.collection.aggregate([
{
$group: {
_id: "$CustomerEmail",
count_new: { $sum: { $cond: [ { $gte: [ "$OrderDate", ISODate("2020-04-01") ] }, 1, 0 ] } },
count_repeat: { $sum: { $cond: [ { $lt: [ "$OrderDate", ISODate("2020-04-01") ] }, 1, 0 ] } },
}
},
{
$group: {
_id: null,
total_april: { $sum: "$count_new" },
total_new_custs: { $sum: { $cond: [ { $eq: [ "$count_repeat", 0 ] }, "$count_new", 0 ] } },
total_repeat_custs: { $sum: { $cond: [ { $gt: [ "$count_repeat", 0 ] }, "$count_new", 0 ] } },
}
},
{
$project: {
_id: 0,
new_cust_orders_percentage: { $divide: [ { $multiply: [ "$total_new_custs", 100 ] }, "$total_april" ] }
}
}
])