Part 1: The Shape of Lisp
Before writing a single line of F#, we need to understand why Lisp is used to teach interpreter theory and what properties of its design make an interpreter so tractable to implement.
The Series at a Glance
Each part of this series builds on the previous, adding one layer at a time:
%% Color palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161, Gray #808080
flowchart TB
P1["Part 1\nThe Shape of Lisp\nWhy Lisp? S-expressions. REPL."]
P2["Part 2\nTokenizing & Reading\nText → LispVal tree"]
P3["Part 3\nEnvironments & Evaluation\neval / apply / env chain"]
P4["Part 4\nSpecial Forms & Closures\ndefine · if · lambda"]
P5["Part 5\nDerived Forms & REPL\nlet · cond · interactive shell"]
P6["Part 6\nTail-Call Optimization\nStack-safe recursion"]
P1 --> P2 --> P3 --> P4 --> P5 --> P6
classDef blue fill:#0173B2,color:#fff,stroke:#0173B2
classDef orange fill:#DE8F05,color:#fff,stroke:#DE8F05
classDef teal fill:#029E73,color:#fff,stroke:#029E73
classDef purple fill:#CC78BC,color:#fff,stroke:#CC78BC
classDef brown fill:#CA9161,color:#fff,stroke:#CA9161
classDef gray fill:#808080,color:#fff,stroke:#808080
class P1 blue
class P2 orange
class P3 teal
class P4 purple
class P5 brown
class P6 gray
The Problem with Most Language Syntaxes
Most programming languages are surprisingly hard to parse. Consider a fragment of arithmetic:
3 + 4 * 2
A naive left-to-right reading gives (3 + 4) * 2 = 14. The correct answer is 3 + (4 * 2) = 11. The parser must know that * binds more tightly than + — this is operator precedence. Real languages have dozens of precedence levels, associativity rules, and special syntactic forms (if/else, for, while, match) that each require dedicated grammar rules and parser branches.
Infix language parser — must resolve precedence, associativity, and special forms:
%% Color palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161, Gray #808080
flowchart LR
I1["3 + 4 * 2"] --> I2["Precedence table\n15+ levels in Java"] --> I3["Associativity\nrules"] --> I4["Special form\ngrammar"] --> I5["AST"]
classDef blue fill:#0173B2,color:#fff,stroke:#0173B2
class I1,I2,I3,I4,I5 blue
Lisp parser — one recursive rule, no precedence table needed:
%% Color palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161, Gray #808080
flowchart LR
L1["(+ 3 (* 4 2))"] --> L2["One rule:\nLPAREN = read list\nelse = read atom"] --> L5["AST"]
classDef teal fill:#029E73,color:#fff,stroke:#029E73
class L1,L2,L5 teal
Lisp removes all of it.
S-Expressions: One Rule to Parse Them All
Lisp programs are built from S-expressions (symbolic expressions). An S-expression is either:
- An atom — a number, a string, a boolean, or a symbol (identifier)
- A list — zero or more S-expressions enclosed in parentheses
That is the complete grammar. There are no statements, no operators with precedence, no special syntactic forms. Everything is a list.
42 ; atom: number
"hello" ; atom: string
true ; atom: boolean
x ; atom: symbol (identifier)
() ; empty list
(1 2 3) ; list of three numbers
(+ 1 2) ; list: symbol + followed by two numbers
(define x 10) ; list: symbol define, symbol x, number 10
(if (> x 0) x (- x)) ; nested listsThe grammar expressed as a diagram:
%% Color palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161, Gray #808080
flowchart LR
SE["S-expression"]
A["Atom"]
L["List\n'(' S-expr* ')'"]
N["Number\n42, -7, 3.14"]
S["Symbol\nx, +, define"]
ST["String\nhello"]
B["Boolean\n#t, #f"]
SE --> A
SE --> L
L -->|"contains zero or more"| SE
A --> N
A --> S
A --> ST
A --> B
classDef blue fill:#0173B2,color:#fff,stroke:#0173B2
classDef orange fill:#DE8F05,color:#fff,stroke:#DE8F05
classDef teal fill:#029E73,color:#fff,stroke:#029E73
class SE blue
class A,L orange
class N,S,ST,B teal
Parsing this grammar requires exactly two rules:
- If you see
(, read S-expressions until you see), collect them as a list. - Otherwise, read characters until whitespace or
)— that is an atom.
A complete parser is about 20 lines of code.
S-Expressions as Trees
Every S-expression is a tree. (+ (* 2 3) (- 5 1)) parses into:
%% Color palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161, Gray #808080
graph LR
root["List"]
plus["Symbol '+'"]
mul["List"]
sub["List"]
mulsym["Symbol '*'"]
two["Number 2"]
three["Number 3"]
subsym["Symbol '-'"]
five["Number 5"]
one["Number 1"]
root --> plus
root --> mul
root --> sub
mul --> mulsym
mul --> two
mul --> three
sub --> subsym
sub --> five
sub --> one
classDef blue fill:#0173B2,color:#fff,stroke:#0173B2
classDef orange fill:#DE8F05,color:#fff,stroke:#DE8F05
classDef teal fill:#029E73,color:#fff,stroke:#029E73
class root blue
class mul,sub orange
class plus,mulsym,subsym,two,three,five,one teal
The tree structure encodes evaluation order: children are evaluated before parents. No precedence rules needed — the nesting is the precedence.
Prefix Notation Eliminates Ambiguity
Notice that (+ 1 2) places the operator before the operands — this is called prefix notation (or Polish notation). It completely eliminates operator precedence as a parsing concern.
Infix: 3 + 4 * 2 — ambiguous without precedence rules.
Prefix: (+ 3 (* 4 2)) — unambiguous. The nesting of parentheses encodes precedence explicitly.
The trade-off is verbose notation. The payoff is a parser that is trivially simple.
Evaluation Follows Structure
Once parsed into nested lists, evaluation follows the structure directly:
- An atom evaluates to itself (numbers, strings, booleans) or looks up its value in the environment (symbols).
- A list is a function call by default: evaluate the first element to get a function, evaluate the remaining elements to get arguments, apply the function to the arguments.
%% Color palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161, Gray #808080
flowchart TB
expr["(+ 1 2)\nList [Symbol '+'; Number 1; Number 2]"]
e1["eval Symbol '+'\n→ builtin addition fn"]
e2["eval Number 1\n→ 1"]
e3["eval Number 2\n→ 2"]
app["apply addition to (1, 2)"]
res["Number 3"]
expr --> e1
expr --> e2
expr --> e3
e1 --> app
e2 --> app
e3 --> app
app --> res
classDef blue fill:#0173B2,color:#fff,stroke:#0173B2
classDef orange fill:#DE8F05,color:#fff,stroke:#DE8F05
classDef teal fill:#029E73,color:#fff,stroke:#029E73
class expr blue
class e1,e2,e3 orange
class app,res teal
This maps naturally to a recursive function eval that dispatches on the shape of the input. No need for an abstract syntax tree separate from the parsed form — the parsed form is the AST.
Homoiconicity: Code as Data
In most languages, source code and runtime data are completely separate. A Java program cannot easily manipulate Java source code as a value at runtime.
Lisp programs are written in the same structure that Lisp uses for lists at runtime:
Other languages — code and data are completely separate:
%% Color palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161, Gray #808080
flowchart LR
src["Source code\n(text)"] -. "completely separate" .-> rt["Runtime data\n(objects, arrays)"]
classDef blue fill:#0173B2,color:#fff,stroke:#0173B2
class src,rt blue
Lisp (homoiconic) — code and data share the same list structure:
%% Color palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161, Gray #808080
flowchart LR
lsrc["(+ 1 2)\nSource code"] <-->|"same structure"| lrt["List: Symbol + Number Number\nRuntime data"]
classDef teal fill:#029E73,color:#fff,stroke:#029E73
class lsrc,lrt teal
This property — code and data sharing the same representation — is called homoiconicity. For our interpreter, it means the data structure we produce during parsing and the data structure we manipulate during evaluation are identical. There is no "compile to AST" step distinct from "parse".
What We Build: A Scheme Dialect
We implement a subset of Scheme, one of the two main Lisp dialects (alongside Common Lisp). Scheme is preferred for interpreter pedagogy because it is minimal and formally specified.
The Scheme standard (R5RS) mandates two properties that will shape our implementation:
- Lexical scope — a function closes over the environment where it was defined, not where it is called.
- Tail-call optimization — a function call in tail position must not consume additional stack space.
Both are correctness requirements, not optional features.
The Read-Eval-Print Loop
Every Lisp system is traditionally structured as a REPL:
%% Color palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161, Gray #808080
flowchart TB
R["READ\nparse one S-expression\nfrom input"]
E["EVAL\nevaluate it in\ncurrent environment"]
P["PRINT\ndisplay the\nresult"]
L["LOOP\ngo back\nto READ"]
R --> E --> P --> L --> R
classDef blue fill:#0173B2,color:#fff,stroke:#0173B2
classDef orange fill:#DE8F05,color:#fff,stroke:#DE8F05
classDef teal fill:#029E73,color:#fff,stroke:#029E73
classDef purple fill:#CC78BC,color:#fff,stroke:#CC78BC
class R blue
class E orange
class P teal
class L purple
This structure is pedagogically valuable because it exposes each phase as a distinct, composable step. We will implement each phase separately before wiring them into the loop in Part 5.
What We Will and Will Not Build
In scope (Parts 1–6):
- Tokenizer and recursive descent parser
LispValdiscriminated union (the AST/value type)- Environment model with lexical scope
eval/applymutual recursion- Special forms:
define,if,lambda,begin,quote - Derived forms:
let,cond - Numeric and list primitives
- Interactive REPL
- Tail-call optimization via loop transform
Out of scope (explicitly excluded):
call/cc— continuations require a complete interpreter rewrite- Hygienic macros —
define-syntax/syntax-rulesare the advanced follow-up - Garbage collection — delegated to the .NET runtime
- The full R5RS standard library
Summary
%% Color palette: Blue #0173B2, Orange #DE8F05, Teal #029E73, Purple #CC78BC, Brown #CA9161, Gray #808080
flowchart TB
subgraph Why["Why Lisp for interpreter theory"]
R1["S-expression syntax\ntrivially parseable"]
R2["Prefix notation\neliminates operator precedence"]
R3["Evaluation follows\nparsed structure"]
R4["Homoiconicity\ncode = data"]
R5["Scheme's formal semantics\nforces correct lexical scope + TCO"]
end
classDef teal fill:#029E73,color:#fff,stroke:#029E73
class R1,R2,R3,R4,R5 teal
In Part 2, we implement the tokenizer and recursive descent parser, producing the LispVal type that will carry values through all subsequent parts.
Last updated April 19, 2026