Problem
User-supplied strings arrive messy — mixed case, stray whitespace, unwanted punctuation. I want to normalise them before comparison or grouping.
Setup
val customers = [
"Bramble Cafe",
" Granary Foods ",
"IRONBRIDGE HOTEL",
"Kiln & Crumb",
"harbour roastery"
];
Example
Normalise each name for comparison — lowercase everything and collapse runs of whitespace to a single space. Leading and trailing whitespace fall out of the same tokenise-and-rejoin trick:
fun norm s =
  String.map Char.toLower
    (String.concatWith " " (String.tokens Char.isSpace s));
from c in customers yield { raw = c, normalised = norm c };
val it =
[{normalised="bramble cafe",raw="Bramble Cafe"},
{normalised="granary foods",raw=" Granary Foods "},
{normalised="ironbridge hotel",raw="IRONBRIDGE HOTEL"},
{normalised="kiln & crumb",raw="Kiln & Crumb"},
{normalised="harbour roastery",raw="harbour roastery"}]
: {normalised:string, raw:string} list
What's happening
Morel's String structure doesn't have a trim function, so we
compose one from parts. String.tokens Char.isSpace splits on any
whitespace and returns the non-empty runs between, which is exactly
"trimmed and single-spaced." String.concatWith " " puts them back
together. String.map Char.toLower then lowercases the whole thing.
Three standard functions, one readable pipeline.
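To make the pipeline concrete, here is a minimal sketch that binds each stage separately so the intermediate values are visible. The input string and the names raw, words, joined and lowered are illustrative only. The code is plain Standard ML, which Morel is assumed to accept unchanged.

```sml
(* Illustrative input with leading, trailing and repeated spaces. *)
val raw = "  Granary   Foods ";

(* Stage 1: split on whitespace; empty runs are dropped. *)
val words = String.tokens Char.isSpace raw;      (* ["Granary", "Foods"] *)

(* Stage 2: rejoin with exactly one space between words. *)
val joined = String.concatWith " " words;        (* "Granary Foods" *)

(* Stage 3: lowercase every character. *)
val lowered = String.map Char.toLower joined;    (* "granary foods" *)
```

Because the trimming happens in stage 1, the order of stages 2 and 3 does not matter much; lowercasing first and tokenising afterwards gives the same result.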
Morel 0.8 has no regex library (see OPEN_QUESTIONS.md), so anything
that would normally be a pattern — "does this contain digits?", "pull
out the prefix up to the second hyphen" — is built from
String.isSubstring, String.tokens, String.substring, and
String.translate. That's enough for most real-world normalisation
and shaping. Pattern-heavy parsing is a job for a separate library
you bring in yourself; that's out of scope for the cookbook.
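The two pattern-style questions above can be sketched without a regex. The helper names containsDigit and prefixToSecondHyphen are hypothetical, and the second one uses String.fields from the Standard ML Basis Library, which is assumed (not confirmed here) to be available in Morel 0.8.

```sml
(* Hypothetical helper: keep only the digits, then check whether
   anything survived. Uses String.translate and str. *)
fun containsDigit s =
  String.translate (fn c => if Char.isDigit c then str c else "") s <> "";

(* Hypothetical helper: the text before the second hyphen, or the
   whole string when it has fewer than two hyphens.
   Assumes String.fields exists, as in the SML Basis Library. *)
fun prefixToSecondHyphen s =
  case String.fields (fn c => c = #"-") s of
      a :: b :: _ :: _ => a ^ "-" ^ b
    | _ => s;

val hasDigit = containsDigit "Unit 7 Bakery";          (* true *)
val noDigit = containsDigit "Bramble Cafe";            (* false *)
val prefix = prefixToSecondHyphen "INV-2024-0042-A";   (* "INV-2024" *)
```

String.fields is used rather than String.tokens so that empty segments between hyphens are kept and the hyphen count stays honest.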
Two things worth noticing. Char.isSpace is a function, not a
constant: String.tokens takes a predicate, so you can split on
anything you can write a char -> bool for. And str, which the slug
variation below relies on, is the SML spelling of "convert a char to
a string": the three-letter function you'll reach for any time you
build a string one character at a time.
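A small sketch of both points, under the assumption that String.sub behaves in Morel as it does in the SML Basis Library. The predicate isDateSep and the sample strings are made up for illustration.

```sml
(* Any char -> bool can act as the separator test. *)
fun isDateSep c = (c = #"-") orelse (c = #"/");

val parts = String.tokens isDateSep "2024-03/15";         (* ["2024", "03", "15"] *)

(* str lifts a single character to a one-character string. *)
val initial = str (String.sub ("harbour roastery", 0));   (* "h" *)
```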
Variations
Split a CSV-style line. String.tokens with a predicate on the
separator returns the fields:
val line = "Earl Grey,18.50,tea";
String.tokens (fn c => c = #",") line;
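One caveat worth a sketch: String.tokens drops empty runs, so a blank CSV field disappears and later fields shift left. The SML Basis Library's String.fields keeps them; its availability in Morel 0.8 is an assumption here. The name isComma and the second sample line are illustrative.

```sml
val isComma = fn c => c = #",";

(* No empty fields: tokens and fields agree. *)
val full = String.tokens isComma "Earl Grey,18.50,tea";
(* ["Earl Grey", "18.50", "tea"] *)

(* An empty middle field: tokens loses it, fields keeps it. *)
val gap = "Earl Grey,,tea";
val byTokens = String.tokens isComma gap;   (* ["Earl Grey", "tea"] *)
val byFields = String.fields isComma gap;   (* ["Earl Grey", "", "tea"] *)
```

Neither function understands quoted fields, so a value that itself contains a comma still needs separate handling.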
Build a URL slug — only alphanumerics, spaces become hyphens,
everything else dropped. String.translate maps each character to a
replacement string (which may be empty):
fun slug s =
  String.translate (fn c => if Char.isAlphaNum c then str (Char.toLower c)
                            else if c = #" " then "-"
                            else "") s;
slug "Earl Grey (special blend)";
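Because slug maps every space to a hyphen, input with punctuation between two spaces produces a doubled hyphen. One possible alternative, sketched here under the hypothetical name slugTokens, reuses the tokenise-and-rejoin trick with "not alphanumeric" as the separator test. slug is repeated so the block stands alone.

```sml
fun slug s =
  String.translate (fn c => if Char.isAlphaNum c then str (Char.toLower c)
                            else if c = #" " then "-"
                            else "") s;

(* Hypothetical variant: treat every run of non-alphanumerics as one
   separator, then join the remaining words with single hyphens. *)
fun slugTokens s =
  String.concatWith "-"
    (String.tokens (fn c => not (Char.isAlphaNum c))
                   (String.map Char.toLower s));

val a = slug "Earl Grey (special blend)";   (* "earl-grey-special-blend" *)
val b = slug "Kiln & Crumb";                (* "kiln--crumb" *)
val c = slugTokens "Kiln & Crumb";          (* "kiln-crumb" *)
```

The trade-off is that slugTokens turns all punctuation into a word break, whereas slug deletes it, so the two differ on input such as an apostrophe inside a word.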
See also
- Recipe 13, Handle missing values: when the string is actually a
  string option with NONE for missing.
- Recipe 17, Define a metric once: wrap norm as a named function to
  reuse across queries.
- Recipe 18, Higher-order functions on data: String.tokens and
  String.map take predicates and functions; that's higher-order
  thinking already.