Morel Cookbook

Problem

User-supplied strings arrive messy — mixed case, stray whitespace, unwanted punctuation. I want to normalise them before comparison or grouping.

Setup

val customers = [
  "Bramble Cafe",
  "  Granary Foods  ",
  "IRONBRIDGE HOTEL",
  "Kiln & Crumb",
  "harbour roastery"
];

Example

Normalise each name for comparison — lowercase everything and collapse runs of whitespace to a single space. Leading and trailing whitespace fall out of the same tokenise-and-rejoin trick:

fun norm s =
  String.map Char.toLower
    (String.concatWith " " (String.tokens Char.isSpace s));

from c in customers yield { raw = c, normalised = norm c };
val it =
  [{normalised="bramble cafe",raw="Bramble Cafe"},
   {normalised="granary foods",raw="  Granary Foods  "},
   {normalised="ironbridge hotel",raw="IRONBRIDGE HOTEL"},
   {normalised="kiln & crumb",raw="Kiln & Crumb"},
   {normalised="harbour roastery",raw="harbour roastery"}]
  : {normalised:string, raw:string} list
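
Once names are normalised, comparison is plain string equality. A minimal sketch, assuming the norm function from the example above (repeated so the snippet stands alone); the second spelling and the name sameCustomer are made up for illustration:

```sml
(* norm is copied from the recipe above. *)
fun norm s =
  String.map Char.toLower
    (String.concatWith " " (String.tokens Char.isSpace s));

(* Two spellings a user might plausibly type for the same customer.
   The second one is invented for this example. *)
val sameCustomer = norm "  Granary Foods  " = norm "granary   FOODS";
(* sameCustomer is true: both sides normalise to "granary foods" *)
```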

What's happening

Morel's String structure doesn't have a trim function, so we compose one from parts. String.tokens Char.isSpace splits on any whitespace and returns the non-empty runs between, which is exactly "trimmed and single-spaced." String.concatWith " " puts them back together. String.map Char.toLower then lowercases the whole thing. Three standard functions, one readable pipeline.
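
To see the pipeline one stage at a time, the sketch below names each intermediate value. The bindings raw, words, joined and lowered are only for illustration; the recipe itself composes the three calls directly.

```sml
val raw = "  Granary Foods  ";

(* Split on whitespace; empty runs at the ends and between words vanish. *)
val words = String.tokens Char.isSpace raw;     (* ["Granary","Foods"] *)

(* Rejoin with exactly one space between tokens. *)
val joined = String.concatWith " " words;       (* "Granary Foods" *)

(* Lowercase every character. *)
val lowered = String.map Char.toLower joined;   (* "granary foods" *)
```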

Morel 0.8 has no regex library (see OPEN_QUESTIONS.md), so anything that would normally be a pattern — "does this contain digits?", "pull out the prefix up to the second hyphen" — is built from String.isSubstring, String.tokens, String.substring, and String.translate. That's enough for most real-world normalisation and shaping. Pattern-heavy parsing is a job for a separate library you bring in yourself; that's out of scope for the cookbook.
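
The two pattern-style questions above could be sketched as follows. This is one possible approach, not the only one: hasDigit and prefix2 are hypothetical helper names, the order code "ORD-2024-0017" is made-up sample data, and List.exists, String.explode and Char.isDigit are the Standard ML Basis functions, assumed to be available in your Morel version.

```sml
(* "Does this contain digits?"  Explode to a char list and test each char. *)
fun hasDigit s = List.exists Char.isDigit (String.explode s);

(* "Prefix up to the second hyphen": keep the first two hyphen-separated
   parts.  Inputs with fewer than two parts are returned unchanged. *)
fun prefix2 s =
  case String.tokens (fn c => c = #"-") s of
      a :: b :: _ => a ^ "-" ^ b
    | _ => s;

hasDigit "Table 12";          (* true *)
hasDigit "Bramble Cafe";      (* false *)
prefix2 "ORD-2024-0017";      (* "ORD-2024" *)
```

Because String.tokens skips empty pieces, prefix2 treats a leading or doubled hyphen as if it were not there; that may or may not be what a given data set needs.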

Two things are worth noticing. Char.isSpace is a function, not a constant: String.tokens takes a predicate, so you can split on anything you can write a char -> bool for. And str, used in the slug variation below, is the SML spelling of "convert a char to a string": the three-letter function you'll reach for any time you build a string one character at a time.
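
Both points fit in a few lines. In this sketch isSep is a hypothetical predicate that treats commas and semicolons as separators, and the input string is made up for illustration:

```sml
(* Any char -> bool works as the separator test. *)
fun isSep c = c = #"," orelse c = #";";
val parts = String.tokens isSep "tea,coffee;cocoa";
(* parts is ["tea","coffee","cocoa"] *)

(* str turns one char into a one-character string, ready for ^. *)
val ab = str #"a" ^ str #"b";
(* ab is "ab" *)
```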

Variations

Split a CSV-style line. String.tokens with a predicate on the separator returns the fields:

val line = "Earl Grey,18.50,tea";
String.tokens (fn c => c = #",") line;
val it = ["Earl Grey","18.50","tea"] : string list
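
One caveat when a line has a blank column: String.tokens drops empty pieces, so the remaining fields shift left. The Standard ML Basis also defines String.fields, which keeps them. The sketch below assumes String.fields is available in your Morel version; the sparse line is made-up sample data.

```sml
val sparse = "Earl Grey,,tea";

(* tokens: the empty price column disappears. *)
val viaTokens = String.tokens (fn c => c = #",") sparse;
(* viaTokens is ["Earl Grey","tea"] *)

(* fields: every column keeps its position, empty or not. *)
val viaFields = String.fields (fn c => c = #",") sparse;
(* viaFields is ["Earl Grey","","tea"] *)
```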

Build a URL slug: keep alphanumerics (lowercased), turn spaces into hyphens, and drop everything else. String.translate maps each character to a replacement string (which may be empty):

fun slug s =
  String.translate (fn c => if Char.isAlphaNum c then str (Char.toLower c)
                            else if c = #" " then "-"
                            else "") s;
slug "Earl Grey (special blend)";
val it = "earl-grey-special-blend" : string
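
When punctuation sits between two spaces, as in "Kiln & Crumb" from the setup data, slug leaves two hyphens in a row. One way to tidy that, sketched here with a hypothetical helper named tidySlug, is to reuse the tokenise-and-rejoin trick on the hyphens themselves (slug is repeated so the snippet stands alone):

```sml
fun slug s =
  String.translate (fn c => if Char.isAlphaNum c then str (Char.toLower c)
                            else if c = #" " then "-"
                            else "") s;

(* Split on hyphens and rejoin: runs collapse to one hyphen, and any
   leading or trailing hyphen is dropped. *)
fun tidySlug s =
  String.concatWith "-" (String.tokens (fn c => c = #"-") (slug s));

slug "Kiln & Crumb";        (* "kiln--crumb" *)
tidySlug "Kiln & Crumb";    (* "kiln-crumb" *)
```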

See also