Load a CSV — Morel Cookbook

Problem

I've got a CSV on disk — the usual working-data file. I want Morel to read it as a list of records so the rest of the cookbook applies directly.

Setup

No inline val list this time — the data is on disk. The cookbook's copies live at recipes/_data/scott/products.csv and recipes/_data/scott/suppliers.csv. When you run the snippets in the playground, those files are already in place.

recipes/_data/scott/products.csv:
  id,name,category,price
  101,Earl Grey,tea,18.50
  102,Darjeeling,tea,24.00
  …

Example

The built-in file binding turns the directory tree into a nested record. Subdirectories are fields; CSV files are lists of records with the header as the record type:

from p in file.scott.products
  where p.category = "tea"
  yield { p.name, p.price };

val it =
  [{name="Earl Grey",price="18.50"},{name="Darjeeling",price="24.00"},
   {name="Assam",price="16.00"}]
  : {name:string, price:string} list

What's happening

No library import, no configuration, no explicit parse step. The file binding is always in scope and the directory layout becomes the data layout: file.scott.products is recipes/_data/scott/products.csv, typed from its header row.

One important note for 0.8: every column comes in as string, even the obviously numeric ones. The header tells Morel the field names but not the types. When you need to compare numerically or sum a column, convert with Int.fromString or Real.fromString in a yield first. It's one extra step, and the type system will nag you until you do it — which is how you want a data loader to behave.

The directory-as-record model is a nice match for the way real CSV collections are organised. A folder orders/ with monthly files becomes file.orders.january, file.orders.february — and you can union them in a single from scan. Morel's type system infers one row type from the first file it sees and checks the rest match.

What's missing in 0.8: JSON (not supported, see OPEN_QUESTIONS.md), Parquet (same), and JDBC (same). The file reader is CSV-only. For non-CSV data, your options are paste-inline records (as recipe 01 does) or convert to CSV before ingest.

Variations

Parse the price as a real in the yield, so downstream arithmetic works:

from p in file.scott.products
  yield { p.name, p.category,
          price = valOf (Real.fromString p.price) };

Join two CSVs the same way you'd join two inline lists. The scan names p and s are independent; the where clause spells the join condition — though here we use a derived category match, since the two files don't share a key directly:

from p in file.scott.products,
     s in file.scott.suppliers
  where p.category = "tea" andalso s.country = "Sri Lanka"
  yield { p.name, supplier = s.name };

Problem

Setup

Example

What's happening

Variations

See also