Problem
I want counts and totals per group — per region, per customer, per
product — with one or more keys. Same shape as SQL GROUP BY with
aggregates, but as a composable step.
Setup
val orders = [
{ id = 1, productId = 101, quantity = 12, region = "north" },
{ id = 2, productId = 204, quantity = 30, region = "south" },
{ id = 3, productId = 103, quantity = 8, region = "north" },
{ id = 4, productId = 204, quantity = 50, region = "west" },
{ id = 5, productId = 101, quantity = 6, region = "east" },
{ id = 6, productId = 202, quantity = 20, region = "south" },
{ id = 7, productId = 103, quantity = 14, region = "west" },
{ id = 8, productId = 102, quantity = 25, region = "west" }
];
Example
Order count and total quantity per region:
from ord in orders
group ord.region
compute { orderCount = count over (), totalQty = sum over ord.quantity };
val it =
[{orderCount=1,region="east",totalQty=6},
{orderCount=2,region="south",totalQty=50},
{orderCount=2,region="north",totalQty=20},
{orderCount=3,region="west",totalQty=89}]
: {orderCount:int, region:string, totalQty:int} list
What's happening
group <key> compute <aggregates> is one step, not two. The key
expression produces the grouping value — in this case ord.region,
which becomes a field called region in the output record. The
compute clause lists named aggregates, each written as name = f over <source>. count over () is the row counter; every other
aggregate takes a field — sum over ord.quantity, min over ord.id,
max over ord.quantity.
The output is a new record type, built from the group key's field and
each named aggregate. Drop compute entirely and you get the distinct
key values as a list — that's how you spell SELECT DISTINCT in
Morel. Group by a record (group { ord.region, ord.productId }) to
use multiple keys at once; the result has one row per unique
combination.
One gotcha for 0.8. There's a shorthand — compute count over () on
its own — but it only works when count is the sole aggregate. As
soon as you want two, each needs its own name = … binding. The
error if you forget is "unbound variable or constructor" for the word
you tried to use without a name.
Another gotcha: Morel's basis has count, sum, min, max, and
only as aggregates, but no avg or mean. Compute the mean
yourself in a follow-up yield — see the variation below. Small
price, and the shape is reusable when you want a weighted mean or a
ratio.
Variations
Distinct regions — no aggregates, just the key:
from ord in orders
group ord.region;
Mean quantity per region, computed in a follow-on yield. The group
step produces n and total; the yield divides them. Note the real
conversions — integer division would truncate.
from ord in orders
group ord.region
compute { n = count over (), total = sum over ord.quantity }
yield { region, mean = real total / real n };
See also
- Recipe 07 — Filter rows — filter before you group when the predicate is independent of the aggregate.
- Recipe 09 — Sort, take, skip — rank the grouped results.
- Recipe 11 — Join two tables — when "per category" means joining onto a products table first.