Fmlib_parse
Parsing Library
Introduction to Combinator Parsing
module Position : sig ... end
Represent a position in a text file.
module Located : sig ... end
A parsing construct located within a file.
module Indent : sig ... end
The allowed indentations: Helper module for indentation sensitive parsing.
module Error_reporter : sig ... end
Convenience module to generate readable error messages.
module Interfaces : sig ... end
Module types
Character parsers are the simplest parsers: their tokens are characters. To generate a character parser you need just three modules: a State module, which in many cases is just Unit; a module Final, which describes the type of the construct the parser returns after successful parsing; and a module Semantic, which describes the semantic errors (the parser itself handles only syntax errors).
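The role of the three modules can be illustrated with a small self-contained sketch. Everything in it (the functor Toy_make, the signature ANY, the constructors Syntax_error and Semantic_error, and the combinators) is a hypothetical stand-in written for this illustration and is not the API of Fmlib_parse; it only assumes that each of the three modules supplies a type t.

```ocaml
(* Toy stand-in for a parser functor; NOT the real Fmlib_parse API. *)
module type ANY = sig type t end

module Toy_make (State : ANY) (Final : ANY) (Semantic : ANY) = struct
  type error =
    | Syntax_error of string        (* produced by the parser machinery *)
    | Semantic_error of Semantic.t  (* produced by user code via [fail] *)

  (* A parser consumes characters and threads the user state. *)
  type 'a t = State.t -> char list -> ('a * State.t * char list, error) result

  let return (a : 'a) : 'a t = fun st cs -> Ok (a, st, cs)
  let fail (e : Semantic.t) : 'a t = fun _ _ -> Error (Semantic_error e)

  let ( >>= ) (p : 'a t) (f : 'a -> 'b t) : 'b t =
    fun st cs ->
      match p st cs with
      | Error e -> Error e
      | Ok (a, st, cs) -> f a st cs

  (* One or more decimal digits, folded into an integer. *)
  let number : int t =
    fun st cs ->
      let is_digit c = '0' <= c && c <= '9' in
      let rec go acc = function
        | c :: rest when is_digit c ->
            go (10 * acc + Char.code c - Char.code '0') rest
        | rest -> (acc, rest)
      in
      match cs with
      | c :: _ when is_digit c ->
          let n, rest = go 0 cs in
          Ok (n, st, rest)
      | _ -> Error (Syntax_error "digit")

  (* Run a parser for the final construct on a complete string. *)
  let run (p : Final.t t) (st : State.t) (s : string)
    : (Final.t, error) result =
    match p st (List.of_seq (String.to_seq s)) with
    | Ok (a, _, []) -> Ok a
    | Ok _ -> Error (Syntax_error "end of input")
    | Error e -> Error e
end

(* State = Unit (no user state), Final = Int, Semantic = String. *)
module P = Toy_make (Unit) (Int) (String)

(* Syntax: a number. Semantics: it must fit into one byte. *)
let byte : int P.t =
  P.(number >>= fun n ->
     if n <= 255 then return n else fail "byte out of range")
```

In this sketch, P.run byte () "300" is syntactically valid input that is rejected with a semantic error, whereas P.run byte () "x" fails with a syntax error raised by the parser itself.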
module Character : sig ... end
Character Parser: An indentation sensitive parser which parses streams of characters, i.e. the token type is char.
module Ucharacter : sig ... end
Parser for streams of unicode characters.
module Utf8 : sig ... end
Encoder and Decoder for Unicode Characters encoded in UTF-8.
module Utf16 : sig ... end
Encoders and Decoders for Unicode Characters encoded in UTF-16.
Pure character parsers can be inefficient when a lot of backtracking is necessary (and for many languages backtracking is necessary). Backtracking pushes all characters of a failed construct back into the lookahead and rescans them for a different construct.
For these cases the library offers parsers with two layers: a lexer and a token parser. The lexer parses the lexical tokens and usually needs little or no backtracking. The token parser receives the already parsed tokens, where each token is a unit consisting of all its parsed characters. When backtracking, the token parser pushes whole tokens (not individual characters) back into the lookahead and reparses whole tokens (again, not character by character).
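The two-layer idea can be sketched in a few lines of plain OCaml. The code below is a toy illustration, not the library's Token_parser or Parse_with_lexer API: the token and ast types, the functions lex and parse, and the choice operator <|> are all hypothetical names introduced here.

```ocaml
type token = Ident of string | Number of int | Equal

(* Layer 1: the lexer scans every character exactly once. *)
let lex (s : string) : token list =
  let n = String.length s in
  let is_digit c = '0' <= c && c <= '9' in
  let is_alpha c = 'a' <= c && c <= 'z' in
  (* Index just past the longest run satisfying [p] that starts at [i]. *)
  let rec span p i = if i < n && p s.[i] then span p (i + 1) else i in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      match s.[i] with
      | ' ' -> go (i + 1) acc
      | '=' -> go (i + 1) (Equal :: acc)
      | c when is_digit c ->
          let j = span is_digit i in
          go j (Number (int_of_string (String.sub s i (j - i))) :: acc)
      | c when is_alpha c ->
          let j = span is_alpha i in
          go j (Ident (String.sub s i (j - i)) :: acc)
      | c -> failwith (Printf.sprintf "unexpected character %C" c)
  in
  go 0 []

type ast = Assign of string * int | Var of string

(* Layer 2: a token parser maps the remaining tokens to a result. *)
type 'a parser = token list -> ('a * token list) option

(* Try [p]; if it fails, restart [q] from the same token list.
   Backtracking re-reads whole tokens, never their characters. *)
let ( <|> ) (p : 'a parser) (q : 'a parser) : 'a parser =
  fun toks -> match p toks with Some _ as r -> r | None -> q toks

let assign : ast parser = function
  | Ident x :: Equal :: Number n :: rest -> Some (Assign (x, n), rest)
  | _ -> None

let var : ast parser = function
  | Ident x :: rest -> Some (Var x, rest)
  | _ -> None

let statement : ast parser = assign <|> var

(* Lex first, then parse the tokens; succeed only on complete input. *)
let parse (s : string) : ast option =
  match statement (lex s) with
  | Some (a, []) -> Some a
  | _ -> None
```

For the input "total", the assign alternative fails and var is retried on the same one-element token list. The identifier's five characters were scanned once by the lexer and are not rescanned during backtracking.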
module Token_parser : sig ... end
Token Parser: A parser which parses streams of user supplied tokens.
module Parse_with_lexer : sig ... end
A parser which works with two components: a lexer which splits up the input into a sequence of tokens and a parser which parses the tokens.
All parsers of the library are based on the generic parser below. The user usually does not need to write a generic parser.
module Generic : sig ... end
A Generic Parser where all parameters are customizable.