Lexer Generation with Proc Macros
How procedural macros can remove lexer boilerplate while keeping compiler frontend code type-safe and readable.
Writing a lexer by hand is a good learning exercise, but it becomes repetitive quickly. Token definitions, position tracking, literal extraction, and error handling all need to move together.
Lachs explores a small idea: describe tokens once as Rust enum variants, then let a procedural macro generate the repetitive scanning code.
use lachs::token;

#[token]
pub enum Token {
    #[terminal("+")]
    Plus,
    #[literal("[0-9]+")]
    Integer,
    #[literal("[a-zA-Z_][a-zA-Z0-9_]*")]
    Identifier,
}
What the Macro Owns
The macro can generate a lexer implementation, attach spans to tokens, compile the patterns, and expose a compact API for consumers. The caller gets the nice part: a typed token enum that matches the parser's expectations.
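To make that division of labor concrete, here is a rough hand-written sketch of the kind of code a macro like this could emit for the three variants above. It is not Lachs's actual expansion: the names Span, Lexeme, LexError, prefix_len, and lex are hypothetical, whitespace skipping is an assumed policy, and the regex patterns are replaced with plain character checks so the example needs only the standard library.

```rust
/// Half-open byte range into the source text (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Same variants as the enum above, written out without the macro.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Token {
    Plus,
    Integer,
    Identifier,
}

/// A token kind plus the text it matched and where it was found.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Lexeme {
    pub kind: Token,
    pub text: String,
    pub span: Span,
}

/// Reported when no token pattern matches at `offset`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LexError {
    pub offset: usize,
    pub found: char,
}

/// Length in bytes of the longest prefix of `s` whose chars all satisfy `pred`.
fn prefix_len(s: &str, pred: impl Fn(char) -> bool) -> usize {
    s.find(|c: char| !pred(c)).unwrap_or(s.len())
}

pub fn lex(source: &str) -> Result<Vec<Lexeme>, LexError> {
    let mut out = Vec::new();
    let mut pos = 0;
    while let Some(first) = source[pos..].chars().next() {
        let rest = &source[pos..];
        // Assumed policy: whitespace separates tokens and is discarded.
        if first.is_whitespace() {
            pos += first.len_utf8();
            continue;
        }
        let (kind, len) = if first == '+' {
            (Token::Plus, 1)
        } else if first.is_ascii_digit() {
            // Stands in for the pattern [0-9]+
            (Token::Integer, prefix_len(rest, |c| c.is_ascii_digit()))
        } else if first.is_ascii_alphabetic() || first == '_' {
            // Stands in for the pattern [a-zA-Z_][a-zA-Z0-9_]*
            (
                Token::Identifier,
                prefix_len(rest, |c| c.is_ascii_alphanumeric() || c == '_'),
            )
        } else {
            return Err(LexError { offset: pos, found: first });
        };
        out.push(Lexeme {
            kind,
            text: rest[..len].to_string(),
            span: Span { start: pos, end: pos + len },
        });
        pos += len;
    }
    Ok(out)
}
```

Every branch in that loop is derivable from the attribute on the corresponding variant, which is why it is a reasonable candidate for generation: adding a token by hand means touching the enum, the dispatch, and the span bookkeeping in step.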
What Still Needs Design
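Macro-generated code is only helpful when the generated behavior is predictable. Longest-match rules, whitespace handling, diagnostics, and performance characteristics need to be explicit. For example, when a keyword pattern and the identifier pattern both match the same input, the user should not have to read the expanded code to learn which one wins.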
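One way to make the first of those rules explicit is the classic "maximal munch" policy: try every rule at the current position, keep the longest match, and break ties by declaration order. The sketch below illustrates that policy only. It is not a description of how Lachs resolves conflicts, and the rule table, the Matcher alias, and the function names are all hypothetical.

```rust
/// A matcher returns the length in bytes of the prefix it accepts, if any.
type Matcher = fn(&str) -> Option<usize>;

fn plus(s: &str) -> Option<usize> {
    s.starts_with('+').then_some(1)
}

fn plus_plus(s: &str) -> Option<usize> {
    s.starts_with("++").then_some(2)
}

fn keyword_if(s: &str) -> Option<usize> {
    s.starts_with("if").then_some(2)
}

fn identifier(s: &str) -> Option<usize> {
    let first = s.chars().next()?;
    if !(first.is_ascii_alphabetic() || first == '_') {
        return None;
    }
    let end = s
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(s.len());
    Some(end)
}

/// Rules in declaration order. "Plus" is deliberately listed before
/// "PlusPlus" to show that length, not order, decides that case.
const RULES: &[(&str, Matcher)] = &[
    ("Plus", plus),
    ("PlusPlus", plus_plus),
    ("If", keyword_if),
    ("Identifier", identifier),
];

/// Returns the winning rule name and match length at the start of `input`.
fn longest_match(input: &str) -> Option<(&'static str, usize)> {
    let mut best: Option<(&'static str, usize)> = None;
    for &(name, matcher) in RULES {
        if let Some(len) = matcher(input) {
            // Strictly greater: on a tie, the earlier rule keeps its place.
            if len > 0 && best.map_or(true, |(_, best_len)| len > best_len) {
                best = Some((name, len));
            }
        }
    }
    best
}

fn main() {
    for input in ["++x", "+x", "iffy", "if x"] {
        println!("{:?} -> {:?}", input, longest_match(input));
    }
}
```

Under this policy "++x" resolves to PlusPlus even though Plus is declared first, "iffy" resolves to Identifier because the identifier match is longer than the keyword match, and "if x" resolves to If because the two matches tie and If is declared earlier. Whatever policy a generator picks, stating it this plainly is what lets users predict the token stream.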
Good compiler tooling is not magic. It is boring in exactly the places where boring is a virtue.