Build Your Own Interpreter

What's the smallest programming language you can build an interpreter for? Let's find out by creating a complete interpreter for a Turing-complete language in just 25 lines of code.

Every programming language, from Python to JavaScript to Rust, needs something to actually run it. That something is either a compiler or an interpreter. Compilers translate code into machine instructions ahead of time. Interpreters read and execute code on the fly, one instruction at a time.

In this article, we'll build an interpreter from scratch. Not for a toy language, for a real, Turing-complete language that can theoretically compute anything any other programming language can compute. And we'll do it in remarkably few lines of code.

What we're building

By the end, you'll have built an interpreter that includes:

A memory model with cells and a movable head
Commands for incrementing, decrementing, and moving
Input/output operations
Looping with bracket matching
The ability to run any computable algorithm

What Makes a Language Turing-Complete?

Problem

How simple can a programming language be while still being able to solve real problems?

A programming language is Turing-complete if it can simulate a Turing machine, an abstract model that can implement any computer algorithm. This is the theoretical foundation of all computation.

Picture a Turing machine: an infinite tape divided into cells, each containing a symbol. A "head" can read the current cell, write to it, and move left or right. It follows simple rules: "If you read X, write Y and move left." That's enough to compute anything computable.

What does a language need to be Turing-complete? Surprisingly little:

Variables to store values (the tape cells)
Assignment to change those values (writing to cells)
Conditionals to branch based on values (if statements)
Jumps to loop or move to different code (goto/loops)

Solution

A language with just 8 single-character commands can be Turing-complete. We call our minimal language NanoLang.

Introducing NanoLang

NanoLang has exactly 8 commands. Each command is a single character. Everything else is ignored (and can be used as comments). Here's the entire language:

That's it. Eight commands. With just these, you can implement any algorithm, from Fibonacci sequences to sorting to parsing JSON. It won't be pretty, but it's possible.

The Memory Model

NanoLang programs operate on a tape of memory cells. Think of it as an array of integers, initially all zeros. We track two things:

cells: An array of integers (our tape)
cell_index: Which cell the head is currently pointing at

# The tape: 30,000 cells, all starting at 0
cells = [0] * 30000

# The head starts at position 0
cell_index = 0

Each cell can hold a value from 0 to 255 (one byte). When a cell value goes above 255, it wraps back to 0. When it goes below 0, it wraps to 255.

Stepping Through a Program

Let's trace through a simple program to understand how NanoLang executes. Use the controls to step through and watch the tape change:

Each command modifies the program state in a predictable way. The interpreter simply reads each character, applies its effect, and moves to the next instruction.

Understanding Loops

The [ and ] commands create loops. Here's how they work:

[: If the current cell is zero, jump forward to the matching ]. Otherwise, continue to the next instruction.
]: If the current cell is not zero, jump back to the matching [. Otherwise, continue forward.

This creates a while loop: keep executing the code between brackets until the current cell becomes zero.

# The loop [->+<] in NanoLang is equivalent to:
while cells[cell_index] != 0:
  cells[cell_index] -= 1    # -
  cell_index += 1           # >
  cells[cell_index] += 1    # +
  cell_index -= 1           # <
# This moves the value from cell 0 to cell 1

The Structure of an Interpreter

Most interpreters have three distinct phases:

For complex languages like Python or JavaScript, these phases are essential. The tokenizer handles whitespace and comments. The parser builds a tree representing the program structure. The runtime executes that tree.

But NanoLang is so simple that we can merge all three phases into a single loop. Each command is exactly one character (no tokenization needed), each command has immediate meaning (no parsing needed), and we can execute immediately.

Implementing the Interpreter

Here's the core of our NanoLang interpreter. It's remarkably compact:

nanolang.pypython

class NanoLang:
  def __init__(self, file_name: str):
      with open(file_name, "r") as f:
          self.source_code = f.read()

  def execute(self):
      cells = [0] * 30000
      cell_index = 0
      instruction_index = 0

      while instruction_index < len(self.source_code):
          instruction = self.source_code[instruction_index]

          match instruction:
              case ">":
                  cell_index += 1
              case "<":
                  cell_index -= 1
              case "+":
                  cells[cell_index] = (cells[cell_index] + 1) % 256
              case "-":
                  cells[cell_index] = (cells[cell_index] - 1) % 256
              case ".":
                  print(chr(cells[cell_index]), end='')
              case ",":
                  cells[cell_index] = ord(input()[0])
              case "[":
                  if cells[cell_index] == 0:
                      instruction_index = self.find_match(instruction_index, True)
              case "]":
                  if cells[cell_index] != 0:
                      instruction_index = self.find_match(instruction_index, False)

          instruction_index += 1

That's the entire execution engine. Each command maps directly to a simple operation on the program state. The match statement (Python 3.10+) handles each command type.

Why match instead of if/elif?

The match statement was added in Python 3.10. It's similar to a switch statement in other languages. For our interpreter, it makes the code cleaner by explicitly showing we're matching against specific command characters. You could use if/elif chains instead, the behavior is identical.

The Bracket Matching Algorithm

The trickiest part of our interpreter is finding matching brackets. When we hit a [ with a zero cell, we need to jump forward to the matching ]. But brackets can be nested!

The algorithm is straightforward: count nested brackets as we search. Every time we see another opening bracket, increment a counter. Every time we see a closing bracket, check if the counter is zero, if so, we found our match. Otherwise, decrement the counter.

nanolang.pypython

def find_match(self, start: int, forward: bool) -> int:
  """Find the matching bracket for the one at position 'start'."""
  depth = 0
  direction = 1 if forward else -1
  position = start + direction

  start_bracket = "[" if forward else "]"
  end_bracket = "]" if forward else "["

  while 0 <= position < len(self.source_code):
      char = self.source_code[position]

      if char == end_bracket:
          if depth == 0:
              return position  # Found the match!
          depth -= 1
      elif char == start_bracket:
          depth += 1

      position += direction

  raise ValueError(f"Unmatched bracket at position {start}")

The depth variable tracks how many "in-between" bracket pairs we've entered. Only when depth returns to zero have we found the true matching bracket.

Try It Yourself

Here's a live NanoLang interpreter. Try the presets or write your own programs:

Program ideas to try

+++++.: Output character with ASCII code 5
+++++[->++<]>.: Multiply 5 x 2, output result
+++[->+++<]>.: Compute 3 x 3 = 9

Example: Adding Two Numbers

Let's trace through a program that adds two numbers. We'll put 2 in cell 0 and 3 in cell 1, then add them together:

++        # Cell 0 = 2
>+++      # Move right, Cell 1 = 3
<         # Move back to Cell 0
[->+<]    # While Cell 0 > 0: decrement Cell 0, increment Cell 1
>         # Move to Cell 1 (now contains 5)

The loop [->+<] is the core pattern. It decrements the current cell while incrementing the next one, effectively "moving" the value. When the source cell hits zero, the loop ends.

The Complete Interpreter

Here's the full, production-ready interpreter with error handling:

nanolang.pypython

from pathlib import Path

class NanoLang:
  def __init__(self, file_name: str | Path):
      with open(file_name, "r") as f:
          self.source_code = f.read()

  def execute(self):
      cells = [0] * 30000
      cell_index = 0
      instruction_index = 0

      while instruction_index < len(self.source_code):
          instruction = self.source_code[instruction_index]

          match instruction:
              case ">":
                  cell_index += 1
              case "<":
                  cell_index -= 1
              case "+":
                  cells[cell_index] = (cells[cell_index] + 1) % 256
              case "-":
                  cells[cell_index] = (cells[cell_index] - 1) % 256
              case ".":
                  print(chr(cells[cell_index]), end='', flush=True)
              case ",":
                  cells[cell_index] = int(input()) % 256
              case "[":
                  if cells[cell_index] == 0:
                      instruction_index = self.find_match(instruction_index, True)
              case "]":
                  if cells[cell_index] != 0:
                      instruction_index = self.find_match(instruction_index, False)

          instruction_index += 1

  def find_match(self, start: int, forward: bool) -> int:
      depth = 0
      direction = 1 if forward else -1
      position = start + direction
      start_bracket = "[" if forward else "]"
      end_bracket = "]" if forward else "["

      while 0 <= position < len(self.source_code):
          char = self.source_code[position]
          if char == end_bracket:
              if depth == 0:
                  return position
              depth -= 1
          elif char == start_bracket:
              depth += 1
          position += direction

      print(f"Error: unmatched {start_bracket} at position {start}")
      return start


if __name__ == "__main__":
  import sys
  if len(sys.argv) > 1:
      NanoLang(sys.argv[1]).execute()

Save this as nanolang.py. To run a NanoLang program:

python nanolang.py program.nl

What We Built

In about 50 lines of Python, we built a complete interpreter for a Turing-complete language. Our interpreter includes:

Memory management: A 30,000-cell tape with byte-sized values
8 commands: Movement, arithmetic, I/O, and loops
Bracket matching: Efficient nested loop handling
Full Turing-completeness: Can compute any computable function

The key insight is how simple the core loop is. Read a character, apply its effect, move to the next. That's the essence of interpretation: read, evaluate, repeat.

Real interpreters for languages like Python are more complex because they have:

Multi-character tokens (identifiers, numbers, strings)
Complex syntax requiring parsing into trees
Scopes, functions, objects, and other abstractions
Optimization passes before execution

But the fundamental loop, read, understand, execute, remains the same. NanoLang strips away the complexity to reveal interpretation in its purest form.

Next steps: Try extending NanoLang with a debug mode that prints the tape after each instruction, or add a "breakpoint" command that pauses execution. How would you implement a NanoLang-to-Python transpiler?

Build Your Own Interpreter

What Makes a Language Turing-Complete?

Introducing NanoLang

The Memory Model

Stepping Through a Program

Understanding Loops

The Structure of an Interpreter

Tokenizer (Lexer)

Implementing the Interpreter

The Bracket Matching Algorithm

Try It Yourself

Example: Adding Two Numbers

The Complete Interpreter

What We Built