Module Fmlib_parse

Parsing Library

Documentation

Introduction to Combinator Parsing

Utilities

module Position : sig ... end

Represent a position in a text file.

module Located : sig ... end

A parsing construct located within a file.

module Indent : sig ... end

The allowed indentations: helper module for indentation-sensitive parsing.

module Error_reporter : sig ... end

Convenience module to generate readable error messages.

module Interfaces : sig ... end

Module types

Parsers

Parse streams of characters

Character parsers are the simplest parsers: the tokens are characters. To generate a character parser you need just three modules: a State module, which in many cases is just Unit; a Final module, which describes the type of the construct the parser returns after successful parsing; and a Semantic module, which describes the semantic errors (the parser itself handles only syntax errors).
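The roles of the three modules can be illustrated with a small, self-contained sketch. It deliberately does not call the library: `ANY`, `Toy`, `Outcome` and `parse_byte` are hypothetical names invented for this illustration, and the real functor in `Character` may take its arguments in a different shape. The sketch only shows how a state type, a final result type and a semantic error type can be kept separate, with syntax errors detected by the parser and semantic errors raised by the user's own checks.

```ocaml
(* Local stand-in for a "module with one type" signature (an assumption,
   not taken from the library). *)
module type ANY = sig
  type t
end

(* State: no user state is needed here, so the stdlib Unit module suffices. *)
module State = Unit

(* Final: the construct returned after successful parsing. *)
module Final = struct
  type t = int
end

(* Semantic: errors that are not syntax errors. *)
module Semantic = struct
  type t = Out_of_range of int
end

(* Hypothetical toy functor; it only assembles the three types. *)
module Toy (State : ANY) (Final : ANY) (Semantic : ANY) = struct
  type state = State.t

  type outcome =
    | Success of Final.t
    | Syntax_error of int          (* offset of the offending character *)
    | Semantic_error of Semantic.t
end

module Outcome = Toy (State) (Final) (Semantic)

(* Parse a non-empty string of decimal digits denoting a value in 0..255.
   A non-digit is a syntax error; a value above 255 is a semantic error. *)
let parse_byte (s : string) : Outcome.outcome =
  let n = String.length s in
  let rec go i acc =
    if i = n then
      if acc > 255 then Outcome.Semantic_error (Semantic.Out_of_range acc)
      else Outcome.Success acc
    else
      match s.[i] with
      | '0' .. '9' as c -> go (i + 1) ((acc * 10) + Char.code c - Char.code '0')
      | _ -> Outcome.Syntax_error i
  in
  if n = 0 then Outcome.Syntax_error 0 else go 0 0
```

In this sketch `parse_byte "4x"` stops at offset 1 with a syntax error, while `parse_byte "300"` is syntactically valid and fails only the range check, which is the kind of error a Semantic module is meant to describe.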

module Character : sig ... end

Character Parser: An indentation-sensitive parser which parses streams of characters, i.e. the token type is char.

Unicode Parsers

module Ucharacter : sig ... end

Parser for streams of Unicode characters.

module Utf8 : sig ... end

Encoder and Decoder for Unicode Characters encoded in UTF-8.

module Utf16 : sig ... end

Encoders and Decoders for Unicode Characters encoded in UTF-16.

Parsing with lexers

Pure character parsers can be inefficient when a lot of backtracking is necessary (and for many languages backtracking is necessary). Backtracking pushes all characters of a failed construct back into the lookahead and rescans them for a different construct.

For these cases the library offers parsers with two layers: a lexer and a token parser. The lexer parses the lexical tokens and usually needs little or no backtracking. The token parser receives the already parsed tokens, where each token is a unit consisting of all its parsed characters. When backtracking, the token parser pushes whole tokens (not individual characters) back into the lookahead and reparses whole tokens (again, not character by character).
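The two-layer idea can be sketched in plain OCaml without the library. Everything below (`token`, `lex`, `token_parser`, `<|>`, `parse`) is a hypothetical toy written for this illustration and is not the interface of `Token_parser` or `Parse_with_lexer`. The lexer makes a single pass over the characters; the token parser tries one alternative and, on failure, retries the next alternative on the same token list, so backtracking revisits whole tokens rather than individual characters.

```ocaml
(* Lexical tokens: each one stands for all the characters it was built from. *)
type token =
  | Ident of string
  | Number of int
  | Equals

let is_letter c = ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')
let is_digit c = '0' <= c && c <= '9'

(* Lexer: one left-to-right pass, no backtracking. Returns None on a
   character that starts no token. *)
let lex (s : string) : token list option =
  let n = String.length s in
  (* Index of the first position at or after i where p no longer holds. *)
  let rec span p i = if i < n && p s.[i] then span p (i + 1) else i in
  let rec go i acc =
    if i >= n then Some (List.rev acc)
    else
      let c = s.[i] in
      if c = ' ' then go (i + 1) acc
      else if c = '=' then go (i + 1) (Equals :: acc)
      else if is_letter c then
        let j = span is_letter i in
        go j (Ident (String.sub s i (j - i)) :: acc)
      else if is_digit c then
        let j = span is_digit i in
        go j (Number (int_of_string (String.sub s i (j - i))) :: acc)
      else None
  in
  go 0 []

(* A token parser consumes a prefix of the token list. *)
type 'a token_parser = token list -> ('a * token list) option

type stmt =
  | Assign of string * int
  | Var of string

let assign : stmt token_parser = function
  | Ident x :: Equals :: Number v :: rest -> Some (Assign (x, v), rest)
  | _ -> None

let var : stmt token_parser = function
  | Ident x :: rest -> Some (Var x, rest)
  | _ -> None

(* Alternative with backtracking: if p fails, q restarts from the same
   token list, i.e. the tokens are reused and never re-lexed. *)
let ( <|> ) (p : 'a token_parser) (q : 'a token_parser) : 'a token_parser =
 fun toks -> match p toks with Some _ as r -> r | None -> q toks

let statement : stmt token_parser = assign <|> var

(* Run both layers and require that all tokens are consumed. *)
let parse (s : string) : stmt option =
  match lex s with
  | None -> None
  | Some toks -> (
      match statement toks with
      | Some (r, []) -> Some r
      | _ -> None)
```

For the input `counter`, the `assign` alternative fails and `var` is retried on the single token `Ident "counter"`; a pure character parser would instead have to rescan all seven characters.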

module Token_parser : sig ... end

Token Parser: A parser which parses streams of user supplied tokens.

module Parse_with_lexer : sig ... end

A parser which works with two components: a lexer which splits up the input into a sequence of tokens and a parser which parses the tokens.

Full generic parser

All parsers of the library are based on this generic parser. The user usually does not write a generic parser.

module Generic : sig ... end

A Generic Parser where all parameters are customizable.