Overview

Up

Features

The parsers of this library implement parsers for Parsing Expression Grammars with parsing combinators. They have the following main features:

These features in combination are to the best of our knowledge unique in Fmlib_parse.

Parsing expression grammars are very similar to context free grammers. There are two main differences:

Accepting these restrictions leads to fairly efficient parser which can be implemented directly in a functional language.

Main Modules

All parsers in this library are combinator parsers.

All parsers are functors which need some other modules to be instantiated. All parsers have to be instantiated with the following modules:

E.g. if you want to write a parser which parses a stream of characters then you should write a module which looks like

module State =
struct
    type t = ...        (* your user state *)
    ...
end

module Final =
struct
    type t = ...        (* type of the final construct *)
    ...
end

module Semantic =
struct
    type t = ...        (* type of semantic error *)
    ...
end

module Basic = Fmlib_parse.Character.Make (State) (Final) (Semantic)

open Basic

...

After that you write combinators starting from the basic combinators of Basic until you have a combinator which represents a parser for the whole input stream i.e. a combinator of type Final.t Basic.t. Finally you use Basic.make to generate the actual parser of type Basic.Parser.t. The next sections show how to work with combinators and how to generate the actual parser.

Combinators

The grammar is described by combinators. A combinator of type 'a t returns an object of type 'a after successful parsing.

Basic Combinators

There are the following basic combinators:

As the name implies combinators can be combined to form more complex combinators.

Sequencing

If we have the combinators p: 'a t, q: 'a -> 'b t and r: 'a -> 'b -> 'c t and a function f: 'a -> 'b -> 'c -> 'd then we can form

let* a = p in
let* b = q a in
let* c = r a b in
return (f a b c)

which is a combinator of type 'd t. This combinator describes a parser which first parses p. In case of success p returns the value a. Then the parser described by q a is exectuted which returns in case of success the value b. The the parser described by r a b is executed which returns in case of success the value c. Then the parser succeeds by returning f a b c.

Choice

In order to describe syntactic alternatives there is the choice operator </>: 'a t -> 'a t -> ' t. If we have combinators p: 'a t, q: 'a t and r: 'a t then the combinator

p </> q </> r

starts by using the combinator p. If p succeeds the whole choice succeeds. If p fails without consuming tokens, the parsing resumes with the combinator q. If it succeeds then the whole choice succeeds. If q fails without consuming tokens, then the last combinator r is used.

The expression p </> q is a biased choice, because the first alternative has priority. If the first alternative succeeds the next one is not tried.

If one of the alternatives in a biased choice fails after consuming some token, then the whole construct fails. Backtracking is needed (see next section) to restore consumed tokens.

Repetition

If you have a combinator p of type 'a t which parses a certain construct, it is often necessary to parse a sequence of one or more or zero or more repetitions of the construct. All parsers have combinators to parse repetitions of a given combinator. E.g.

list_zero_or_more p

parses zero or more occurrences of the constructs parsed by p and returns them in a list. There are more combinators which parse repetitions like one_or_more, skip_one_or_more, zero_or_more, one_or_more_separated ... All these combinators can be found in Fmlib_parse.Interfaces.COMBINATOR.

Operator Expressions

Recognizing operator expressions with unary and binary operators are a very common task for parsers. It is not very complicated to make a combinator recognizing operator expressions by using a combinator of basic combinators. However many cases can be parsed by using the combinator operator_expression.

This combinator needs combinators to

Usually primary expressions are constants, identifiers or functions applied to arguments.

Furthermore combinators are needed which

The calculator example shows a simple but typical use case for this combinator.

Backtracking

Sometimes it is not possible for a combinator to fail without consuming some token. This is the case, if the first token is not sufficient to decide whether the alternative is the correct one.

In order to make such a parser fail without consuming any token it can be wrapped with the backtracking combinator. If the combinator p fails after consuming one or more tokens, then backtrack p expect fails with the expectation expect without consuming any token. The consumed tokens are pushed back to the lookahead and the original state is reestablished.

Backtracking has a cost because the consumed tokens have to be buffered and the original state has to be stored. The most efficient parsers do not use backtracking. However sometimes it is inevitable to backtrack.

State

Every parser can be instantiated with a state module. In compilers typically a symbol table is part of the state.

There are combinators to access the state during parsing. Some examples:

E.g. a source file in a programming language consists of a sequence of definitions. After successful parsing of a definition the new definition can be added to the state with the following code:

let* (name, def) = definition   (* parse a definition *)
in
update
    (State.add_definition name def)

where the module State has a function with the signature add_definition: string -> definition -> t -> t.

Make a Parser

Let's assume we have a combinator c with the type Final.t t. Then

make state c

generates a parser of type Basic.Parser.t. Let's assume p is the generated parser. Then we have the following functions to inspect the parser p:

In order to handle errors there are the following functions:

In order to push tokens into the parser p we have the function

Basic.Parser.put token p

The function returns a new parser which has either consumed the token successfully and needs more tokens or has a result (success or failure).

At the end of the token stream there is the function

Basic.Parser.put_end p

The function returns a new parser. After encountering the end, a result is available, either success or failure.

Note the inversion of control. The generated parser does not read from an input stream. Instead of reading from an input stream tokens are pushed into the parser.

Tokens can be pushed even after the parser has a result. The not needed tokens are just pushed as lookahead tokens and can be recovered by Basic.Parser.lookaheads p.

After signalling the end by put_end p to the parser, it is no longer allowed to push tokens into the parser.

Running the Parser on Streams

The generated parsers have inversion of control. However it is not necessary to use the inversion of control. A character parser can be run on a string or on an input channel.

Basic.Parser.run_on_string "..." p

Basic.Parser.run_channel ic p   (* 'ic' is an input channel *)

Up