Lexer

Lexica is a whitespace-delimited language: indentation determines lexical scope. The lexer must therefore convert indentation tokens into logical block open and close tokens.
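The conversion from indentation to block tokens can be sketched with a stack of indentation widths. This is a minimal illustration, not Lexica's actual implementation: the `Block` enum and the `blocks` function are hypothetical names, only leading spaces are counted as indentation, and a dedent that matches no enclosing level is not reported as an error.

```rust
/// Hypothetical logical tokens; the names are illustrative, not Lexica's own.
#[derive(Debug, Clone, PartialEq)]
pub enum Block {
    Open,
    Close,
    Line(String),
}

/// Converts leading-space indentation into Open/Close markers.
/// Assumption: indentation is measured in spaces only.
pub fn blocks(source: &str) -> Vec<Block> {
    // The stack holds the indentation width of every open block;
    // the bottom entry (0) is the top level and is never popped.
    let mut stack: Vec<usize> = vec![0];
    let mut out = Vec::new();

    for line in source.lines() {
        let text = line.trim_start_matches(' ');
        if text.trim().is_empty() {
            continue; // blank lines do not affect nesting
        }
        let width = line.len() - text.len();

        if width > *stack.last().unwrap() {
            // Deeper than the current block: open a new one.
            stack.push(width);
            out.push(Block::Open);
        } else {
            // Shallower: close blocks until the widths line up.
            while width < *stack.last().unwrap() {
                stack.pop();
                out.push(Block::Close);
            }
        }
        out.push(Block::Line(text.trim_end().to_string()));
    }

    // Close any blocks still open at end of input.
    while stack.len() > 1 {
        stack.pop();
        out.push(Block::Close);
    }
    out
}
```

Under these assumptions, a line indented further than its predecessor yields one `Open`, and a line that returns to an outer level yields one `Close` for each level it leaves.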

The lexer is a pipeline that turns a string into a stream of tokens. Unlike a traditional iterator, this stream always produces a token: end of input is denoted by a special token variant rather than by the absence of a value. This simplifies parsing, since the parser never has to handle a missing token.
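As a rough sketch of such a stream, the wrapper below returns a token on every call and falls back to an end-of-input variant once the underlying tokens run out. The `Token`, `TokenStream`, and `next_token` names are assumptions made for illustration; the real types in Lexica may look different.

```rust
/// Hypothetical token type with a dedicated end-of-input variant.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Word(String),
    EndOfInput,
}

/// A stream that always yields a token.
pub struct TokenStream {
    tokens: std::vec::IntoIter<Token>,
}

impl TokenStream {
    pub fn new(tokens: Vec<Token>) -> Self {
        Self { tokens: tokens.into_iter() }
    }

    /// Unlike `Iterator::next`, this returns `Token`, not `Option<Token>`.
    /// Once the input is exhausted, every call returns `Token::EndOfInput`.
    pub fn next_token(&mut self) -> Token {
        self.tokens.next().unwrap_or(Token::EndOfInput)
    }
}
```

A parser built on this kind of stream can match on the returned token directly, treating `EndOfInput` as one more case instead of unwrapping an `Option` at every step.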

Each stage in the pipeline transforms the output of the stage before it. An intermediate LexerToken type is used to thread Indent tokens through to later stages.
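One way to picture the intermediate type is an enum that keeps indentation as its own variant, so that a later stage can still see it. Only the names `LexerToken` and `Indent` come from the text above; the other variants, the `tokenize` function, and the rule that a run of spaces is an indent are assumptions for this sketch (it presumes space runs only reach this function at the start of a line).

```rust
/// Sketch of an intermediate token. `Newline` and `Text` are hypothetical
/// variants added for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum LexerToken<'a> {
    /// Indentation, carried through unchanged for a later stage to interpret.
    Indent(usize),
    Newline,
    Text(&'a str),
}

/// Maps one string slice to a `LexerToken`.
/// Assumption: a non-empty slice made only of spaces is indentation.
pub fn tokenize(slice: &str) -> LexerToken<'_> {
    if slice == "\n" {
        LexerToken::Newline
    } else if !slice.is_empty() && slice.chars().all(|c| c == ' ') {
        LexerToken::Indent(slice.len())
    } else {
        LexerToken::Text(slice)
    }
}
```

Keeping `Indent` as data at this point means the tokenizing stage does not need to know anything about nesting; that decision is deferred to the stage that tracks the indentation level.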

| Stage | Description |
| --- | --- |
| Source Split | Splits a string into string slices with span information |
| Lexer Tokenize | Maps each string slice into a LexerToken |
| Indent Lexer | Maintains the indentation level and produces logical blocks |
| Space Lexer | Modifies logical tokens based on surrounding context |
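
The Source Split stage could look roughly like the following: the input is cut into borrowed slices, each paired with the byte range it came from. The `Span` struct, the `split_source` function, and the splitting rule (runs of spaces, single newlines, runs of everything else) are assumptions chosen to keep the sketch short; Lexica's real splitting rules are not described here.

```rust
/// Hypothetical span type: a half-open byte range into the source string.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

#[derive(Clone, Copy, PartialEq)]
enum Class {
    Space,
    Newline,
    Other,
}

fn classify(c: char) -> Class {
    match c {
        ' ' => Class::Space,
        '\n' => Class::Newline,
        _ => Class::Other,
    }
}

/// Splits `source` into slices with span information.
/// A new slice starts whenever the character class changes; each newline
/// is always emitted as its own slice.
pub fn split_source(source: &str) -> Vec<(&str, Span)> {
    let mut out = Vec::new();
    let mut start = 0;
    let mut prev: Option<Class> = None;

    for (i, c) in source.char_indices() {
        let class = classify(c);
        if let Some(p) = prev {
            if p != class || class == Class::Newline {
                out.push((&source[start..i], Span { start, end: i }));
                start = i;
            }
        }
        prev = Some(class);
    }

    // Emit the final slice, if any input remains.
    if start < source.len() {
        out.push((&source[start..], Span { start, end: source.len() }));
    }
    out
}
```

Because every slice borrows from the original string and carries its span, later stages can report positions in the source without copying text.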
