Problem
I've got a CSV on disk — the usual working-data file. I want Morel to read it as a list of records so the rest of the cookbook applies directly.
Setup
No inline val list this time — the data is on disk. The cookbook's
copies live at recipes/_data/scott/products.csv and
recipes/_data/scott/suppliers.csv. When you run the snippets in the
playground, those files are already in place.
recipes/_data/scott/products.csv:
id,name,category,price
101,Earl Grey,tea,18.50
102,Darjeeling,tea,24.00
…
Example
The built-in file binding turns the directory tree into a nested
record. Subdirectories are fields; CSV files are lists of records
with the header as the record type:
from p in file.scott.products
where p.category = "tea"
yield { p.name, p.price };
val it =
[{name="Earl Grey",price="18.50"},{name="Darjeeling",price="24.00"},
{name="Assam",price="16.00"}]
: {name:string, price:string} list
What's happening
No library import, no configuration, no explicit parse step. The
file binding is always in scope and the directory layout becomes
the data layout: file.scott.products is recipes/_data/scott/products.csv,
typed from its header row.
One important note for 0.8: every column comes in as string, even
the obviously numeric ones. The header tells Morel the field names
but not the types. When you need to compare numerically or sum a
column, convert with Int.fromString or Real.fromString in a
yield first. It's one extra step, and the type system will nag you
until you do it — which is how you want a data loader to behave.
The directory-as-record model is a nice match for the way real CSV
collections are organised. A folder orders/ with monthly files
becomes file.orders.january, file.orders.february — and you can
union them in a single from scan. Morel's type system infers one
row type from the first file it sees and checks the rest match.
What's missing in 0.8: JSON (not supported, see OPEN_QUESTIONS.md),
Parquet (same), and JDBC (same). The file reader is CSV-only. For
non-CSV data, your options are paste-inline records (as recipe 01 does)
or convert to CSV before ingest.
Variations
Parse the price as a real in the yield, so downstream arithmetic
works:
from p in file.scott.products
yield { p.name, p.category,
price = valOf (Real.fromString p.price) };
Join two CSVs the same way you'd join two inline lists. The scan
names p and s are independent; the where clause spells the
join condition — though here we use a derived category match, since
the two files don't share a key directly:
from p in file.scott.products,
s in file.scott.suppliers
where p.category = "tea" andalso s.country = "Sri Lanka"
yield { p.name, supplier = s.name };
See also
- Recipe 01 — First query — same
from…where…yieldshape, but against an inline list. - Recipe 07 — Filter rows — the filter idioms you'll reach for once the CSV is loaded.
- Recipe 11 — Join two tables — joining two CSVs works identically to joining two inline lists.