
Lexer Generation with Proc Macros

How procedural macros can remove lexer boilerplate while keeping compiler frontend code type-safe and readable.

Writing a lexer by hand is a good learning exercise, but it quickly becomes repetitive. Token definitions, position tracking, literal extraction, and error handling live in separate pieces of code that must all change together whenever a token is added or altered.
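To make that repetition concrete, here is a minimal hand-written lexer for the same three tokens used in the enum that follows, using only the standard library. The names `Span`, `LexError`, and `lex` are hypothetical illustrations, not part of Lachs. Each token kind needs its own branch, its own span bookkeeping, and its own text extraction, so adding a token means editing all of them in step.

```rust
/// Byte offsets into the source: `start` inclusive, `end` exclusive.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Hypothetical hand-written token type: every variant repeats the span.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Token {
    Plus(Span),
    Integer(Span, String),
    Identifier(Span, String),
}

/// Hypothetical error type: where scanning stopped and what it found there.
#[derive(Debug, PartialEq, Eq)]
pub struct LexError {
    pub offset: usize,
    pub found: char,
}

pub fn lex(src: &str) -> Result<Vec<Token>, LexError> {
    let bytes = src.as_bytes();
    let mut tokens = Vec::new();
    let mut pos = 0;
    while pos < bytes.len() {
        let c = bytes[pos];
        if c.is_ascii_whitespace() {
            pos += 1; // whitespace separates tokens but produces none
        } else if c == b'+' {
            tokens.push(Token::Plus(Span { start: pos, end: pos + 1 }));
            pos += 1;
        } else if c.is_ascii_digit() {
            // Hand-coded version of the pattern [0-9]+
            let start = pos;
            while pos < bytes.len() && bytes[pos].is_ascii_digit() {
                pos += 1;
            }
            let span = Span { start, end: pos };
            tokens.push(Token::Integer(span, src[start..pos].to_string()));
        } else if c.is_ascii_alphabetic() || c == b'_' {
            // Hand-coded version of the pattern [a-zA-Z_][a-zA-Z0-9_]*
            let start = pos;
            pos += 1;
            while pos < bytes.len() && (bytes[pos].is_ascii_alphanumeric() || bytes[pos] == b'_') {
                pos += 1;
            }
            let span = Span { start, end: pos };
            tokens.push(Token::Identifier(span, src[start..pos].to_string()));
        } else {
            // `pos` is always on a char boundary because every branch above
            // consumes only ASCII bytes, so this slice cannot panic.
            let found = src[pos..].chars().next().unwrap();
            return Err(LexError { offset: pos, found });
        }
    }
    Ok(tokens)
}
```

Calling `lex("x + 42")` yields an identifier, a plus, and an integer, each carrying its own span; `lex("1 $ 2")` stops with an error at byte offset 2.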

Lachs explores a small idea: describe tokens once as Rust enum variants, then let a procedural macro generate the repetitive scanning code.

use lachs::token;

#[token]
pub enum Token {
    #[terminal("+")]
    Plus,

    #[literal("[0-9]+")]
    Integer,

    #[literal("[a-zA-Z_][a-zA-Z0-9_]*")]
    Identifier,
}

What the Macro Owns

The macro can generate a lexer implementation, attach spans to tokens, compile the patterns, and expose a compact API for consumers. The caller gets the nice part: a typed token enum that matches the parser's expectations.
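The sketch below guesses at the general shape such generated code might take; it is not Lachs's actual expansion. It assumes one matcher per enum variant (hand-written here so the example needs only `std`, where a real macro would compile the regex patterns), a `Spanned` token that carries a byte range, and a `Lexer` that implements `Iterator`. `TokenKind`, `Spanned`, `Lexer`, and `match_len` are hypothetical names.

```rust
use std::ops::Range;

/// Token kinds, mirroring the variants of the `#[token]` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenKind {
    Plus,
    Integer,
    Identifier,
}

impl TokenKind {
    /// All kinds in declaration order (hypothetical generated constant).
    pub const ALL: [TokenKind; 3] = [TokenKind::Plus, TokenKind::Integer, TokenKind::Identifier];

    /// Hypothetical stand-in for a compiled pattern: length of the match
    /// at the start of `s`, or 0 if this kind does not match there.
    fn match_len(self, s: &str) -> usize {
        let b = s.as_bytes();
        match self {
            TokenKind::Plus => {
                if b.first() == Some(&b'+') { 1 } else { 0 }
            }
            TokenKind::Integer => b.iter().take_while(|c| c.is_ascii_digit()).count(),
            TokenKind::Identifier => match b.first() {
                Some(c) if c.is_ascii_alphabetic() || *c == b'_' => {
                    1 + b[1..].iter().take_while(|c| c.is_ascii_alphanumeric() || **c == b'_').count()
                }
                _ => 0,
            },
        }
    }
}

/// A token kind plus the byte range it covers in the source.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Spanned {
    pub kind: TokenKind,
    pub span: Range<usize>,
}

pub struct Lexer<'a> {
    src: &'a str,
    pos: usize,
}

impl<'a> Lexer<'a> {
    pub fn new(src: &'a str) -> Self {
        Lexer { src, pos: 0 }
    }
}

impl<'a> Iterator for Lexer<'a> {
    /// `Err` carries the byte offset of input that no rule matched.
    type Item = Result<Spanned, usize>;

    fn next(&mut self) -> Option<Self::Item> {
        let src = self.src;
        // Skip ASCII whitespace between tokens.
        let rest = &src[self.pos..];
        let trimmed = rest.trim_start_matches(|c: char| c.is_ascii_whitespace());
        self.pos += rest.len() - trimmed.len();
        if trimmed.is_empty() {
            return None;
        }
        // First rule that matches wins, in declaration order.
        for kind in TokenKind::ALL {
            let len = kind.match_len(trimmed);
            if len > 0 {
                let span = self.pos..self.pos + len;
                self.pos += len;
                return Some(Ok(Spanned { kind, span }));
            }
        }
        // Nothing matched: report the offset, then stop iterating.
        let offset = self.pos;
        self.pos = src.len();
        Some(Err(offset))
    }
}
```

A consumer can recover a token's text with `&src[tok.span.clone()]`, so the token itself stays small. This sketch takes the first rule that matches in declaration order, which is harmless here because the three patterns never overlap; once patterns do overlap, that choice becomes a policy decision.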

What Still Needs Design

Macro-generated code is only helpful when the generated behavior is predictable. Longest-match rules, whitespace handling, diagnostics, and performance characteristics need to be explicit.
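As one example of making these rules explicit, the following sketch implements maximal munch: at each position every rule is tried, the longest match wins, and ties go to the rule declared first. It also skips whitespace between tokens and reports errors as `line:column`. The `PlusEq` and `Let` tokens are hypothetical additions chosen to create overlaps (`+` vs `+=`, keyword `let` vs identifier `letter`), and all function names are illustrative; nothing here describes how Lachs itself resolves these cases.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    PlusEq,     // "+=" (hypothetical, not in the article's enum)
    Plus,       // "+"
    Let,        // keyword "let" (hypothetical, not in the article's enum)
    Integer,    // [0-9]+
    Identifier, // [a-zA-Z_][a-zA-Z0-9_]*
}

// Declaration order doubles as priority for equal-length matches.
const RULES: [Kind; 5] = [Kind::PlusEq, Kind::Plus, Kind::Let, Kind::Integer, Kind::Identifier];

/// Length of the match for `kind` at the start of `s`, or 0 for no match.
fn match_len(kind: Kind, s: &str) -> usize {
    let b = s.as_bytes();
    match kind {
        Kind::PlusEq => if s.starts_with("+=") { 2 } else { 0 },
        Kind::Plus => if s.starts_with('+') { 1 } else { 0 },
        Kind::Let => if s.starts_with("let") { 3 } else { 0 },
        Kind::Integer => b.iter().take_while(|c| c.is_ascii_digit()).count(),
        Kind::Identifier => match b.first() {
            Some(c) if c.is_ascii_alphabetic() || *c == b'_' => {
                1 + b[1..].iter().take_while(|c| c.is_ascii_alphanumeric() || **c == b'_').count()
            }
            _ => 0,
        },
    }
}

/// Longest match at the start of `s`; ties go to the earlier rule.
fn longest_match(s: &str) -> Option<(Kind, usize)> {
    let mut best: Option<(Kind, usize)> = None;
    for kind in RULES {
        let len = match_len(kind, s);
        // Strictly greater keeps the earlier rule on ties.
        if len > 0 && best.map_or(true, |(_, l)| len > l) {
            best = Some((kind, len));
        }
    }
    best
}

/// Converts a byte offset into a 1-based (line, column) pair.
/// Columns count chars, not bytes.
fn line_col(src: &str, offset: usize) -> (usize, usize) {
    let before = &src[..offset];
    let line = before.matches('\n').count() + 1;
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let col = before[line_start..].chars().count() + 1;
    (line, col)
}

pub fn tokenize(src: &str) -> Result<Vec<(Kind, &str)>, String> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < src.len() {
        // Whitespace is skipped, never emitted as a token.
        let rest = &src[pos..];
        let trimmed = rest.trim_start();
        pos += rest.len() - trimmed.len();
        if trimmed.is_empty() {
            break;
        }
        match longest_match(trimmed) {
            Some((kind, len)) => {
                out.push((kind, &trimmed[..len]));
                pos += len;
            }
            None => {
                let (line, col) = line_col(src, pos);
                let ch = trimmed.chars().next().unwrap();
                return Err(format!("{line}:{col}: unexpected character {ch:?}"));
            }
        }
    }
    Ok(out)
}
```

The tie-break matters for input like `let`, which the keyword rule and the identifier rule both match with length 3: declaration order decides, so `let` lexes as `Let`, while `letter` lexes as `Identifier` because the identifier match is longer. Likewise `+=` becomes one `PlusEq` token rather than `Plus` followed by an error.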

Good compiler tooling is not magic. It is boring in exactly the places where boring is a virtue.