Error Messages



The parsers in Fmlib_parse support the generation of user-friendly error messages. There are two types of errors: syntax errors and semantic errors.

Syntax Errors

Available Error Information

Why is there a list of expectations in case of a syntax error? Because of alternatives like p </> q </> r. If this construct fails syntactically because all three combinators have failed without consuming any token, then the expectations of all three combinators are in the list of syntax expectations.

Remember that alternatives are tried only if a combinator fails without consuming tokens (this can be enforced by backtracking). If in p </> q </> r the combinator p fails after consuming tokens, then the alternatives are not even tried. In that case p has failed because it encountered something unexpected after successfully consuming some tokens. At that specific state it has some expectations, which are reported as errors.
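The two cases can be made concrete with a small toy model. The following sketch is not the implementation used by Fmlib_parse; the types outcome and parser and the way consumption is detected (by comparing positions) are assumptions made only for illustration.

```ocaml
(* Toy model for illustration only, not the Fmlib_parse implementation. *)

(* Outcome of running a parser at a position of the input string. *)
type 'a outcome =
  | Accepted of 'a * int            (* value and position after the match *)
  | Rejected of int * string list   (* failure position and expectations *)

type 'a parser = string -> int -> 'a outcome

(* Accept exactly the character [c]; otherwise report it as expected. *)
let char (c : char) : char parser =
  fun input pos ->
    if pos < String.length input && input.[pos] = c
    then Accepted (c, pos + 1)
    else Rejected (pos, [Printf.sprintf "'%c'" c])

(* Sequence: run [p], then the parser computed from its result. *)
let ( >>= ) (p : 'a parser) (f : 'a -> 'b parser) : 'b parser =
  fun input pos ->
    match p input pos with
    | Accepted (a, next) -> f a input next
    | Rejected (at, exps) -> Rejected (at, exps)

(* Alternative: [q] is tried only if [p] failed at the start position,
   i.e. without consuming anything. If both fail without consumption,
   their expectations are merged. *)
let ( </> ) (p : 'a parser) (q : 'a parser) : 'a parser =
  fun input pos ->
    match p input pos with
    | Rejected (at, exps1) when at = pos ->
      (match q input pos with
       | Rejected (at2, exps2) when at2 = pos -> Rejected (pos, exps1 @ exps2)
       | other -> other)
    | other -> other

let p : char parser = char 'a' >>= fun _ -> char 'b'
let q : char parser = char 'c'
let r : char parser = char 'd'
let alt : char parser = p </> q </> r

(* All three fail at position 0: the expectations are collected. *)
let no_consumption = alt "x" 0

(* [p] consumes the a and then fails: [q] and [r] are never tried. *)
let after_consumption = alt "ax" 0
```

In this model no_consumption carries the expectations of p, q and r at position 0, while after_consumption carries only the expectation of p at position 1.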

In order to generate syntax error messages the following functions are available in a parser:

With these functions it is possible to write quite informative error messages.

Let us look at some syntax errors in the calculator example.


1 + ,2 + 2

There is an unexpected comma in line 0 at column 4. The first lookahead token is ','.

At the position of the comma, the parser would expect one of the following:

With the available information it is possible to generate an error message like:

    0 | 1 + ,2 + 2
    I have found an unexpected ','. I was expecting one of
    - ' '
    - '\n'
    - '-'
    - '('
    - '['
    - digit
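As a rough sketch, a message of this shape can be assembled from four pieces of information: the line number, the source line, the unexpected token and the list of expectations. The function syntax_error_message and its labelled arguments are hypothetical names chosen for this example; how the four values are obtained from the parser is left out here.

```ocaml
(* Hypothetical helper: the name and the arguments are illustrative and
   not part of the library. *)
let syntax_error_message
    ~(line_no : int)
    ~(source_line : string)
    ~(unexpected : string)
    ~(expectations : string list)
  : string =
  (* First line: line number and the offending source line. *)
  let header = Printf.sprintf "%d | %s" line_no source_line in
  let found =
    Printf.sprintf
      "I have found an unexpected %s. I was expecting one of"
      unexpected
  in
  (* One bullet per expectation. *)
  let bullets = List.map (fun e -> "- " ^ e) expectations in
  String.concat "\n" (header :: found :: bullets)

let message : string =
  syntax_error_message
    ~line_no:0
    ~source_line:"1 + ,2 + 2"
    ~unexpected:"','"
    ~expectations:["' '"; "'\\n'"; "'-'"; "'('"; "'['"; "digit"]

let () = print_endline message
```

The printed text consists of the same lines as the message above, without the indentation.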

Improved Syntax Errors

The error message in the previous section is already quite readable. However it can be improved by giving the user more relevant information.

It is quite useless to inform the user about expected whitespace. Whitespace can occur nearly everywhere. This does not give any information. In the library there is a generic combinator no_expectations to wrap combinators like the whitespace combinator.

If we use

let whitespace: int t =
    char ' ' </> char '\n'
    |> skip_zero_or_more
    |> no_expectations

as the whitespace combinator then we get rid of the useless information about expected whitespace characters.

We can do better with the expected parentheses. It is more instructive to tell the user that an opening parenthesis was expected than to report each kind of parenthesis as a separate expectation.

By using

let lpar: char t =
    lexeme (
        map (fun _ -> ')') (char '(')
        </>
        map (fun _ -> ']') (char '[')
    )
    <?>
    "opening parenthesis '(' or '['"

we give the user a more instructive error message. The operator <?> lets us collapse several failed alternatives into a more abstract expectation. With p </> q </> r <?> "message" we bundle the three expectations into one expectation.
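The collapsing behaviour can be illustrated with a toy model. This is only a sketch under the assumption that a parser is a function from an input string and a position to an outcome; it does not show how Fmlib_parse implements <?> internally.

```ocaml
(* Toy model for illustration only, not the Fmlib_parse implementation. *)
type 'a outcome =
  | Accepted of 'a * int            (* value and position after the match *)
  | Rejected of int * string list   (* failure position and expectations *)

type 'a parser = string -> int -> 'a outcome

(* Accept exactly the character [c]; otherwise report it as expected. *)
let char (c : char) : char parser =
  fun input pos ->
    if pos < String.length input && input.[pos] = c
    then Accepted (c, pos + 1)
    else Rejected (pos, [Printf.sprintf "'%c'" c])

(* Alternative: merge the expectations if both fail without consumption. *)
let ( </> ) (p : 'a parser) (q : 'a parser) : 'a parser =
  fun input pos ->
    match p input pos with
    | Rejected (at, exps1) when at = pos ->
      (match q input pos with
       | Rejected (at2, exps2) when at2 = pos -> Rejected (pos, exps1 @ exps2)
       | other -> other)
    | other -> other

(* If [p] fails without consumption, replace all its expectations by the
   single, more abstract expectation [msg]. *)
let ( <?> ) (p : 'a parser) (msg : string) : 'a parser =
  fun input pos ->
    match p input pos with
    | Rejected (at, _) when at = pos -> Rejected (pos, [msg])
    | other -> other

let plain : char parser = char '(' </> char '['
let named : char parser = plain <?> "opening parenthesis '(' or '['"

(* Two separate expectations without <?>, one bundled expectation with it. *)
let plain_failure = plain "," 0
let named_failure = named "," 0
```

Success and failure after consumption pass through <?> unchanged; only the expectations of a failure without consumption are replaced.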

We can use <?> to improve the error message above even further. It is better to report an expected number than an expected digit. We can achieve this by

let number: int t =
    one_or_more_fold_left
        (fun d -> d)
        (fun v d -> 10 * v + d)
        digit
    |> no_expectations
    <?>
    "number"
    |> lexeme

Here we have added the no_expectations combinator so that the expectation of one more digit is not reported once enough digits have been read to form a number.

With all these improvements we are able to generate the following error message:

    0 | 1 + ,2 + 2
    I have found an unexpected ','. I was expecting one of
    - '-'
    - opening parenthesis '(' or '['
    - number

Maybe it would be even better to report unary '-' instead of '-'.

Semantic Errors

Semantic errors are triggered by the user by calling fail error where error is the semantic error message. In the calculator example we have triggered an error message when division by zero or a negative exponent occurred.

The position returned by position parse is not very interesting. The parser has already found a syntactically valid construct. Therefore the position points beyond the end of the construct. In order to form an informative error message we want to have the start position and the end position of the construct which failed semantically.

In the character parser there is a combinator located. If we wrap the combinator p in located p then we get the result of p with the start and end position.

The located combinator is useful only for constructs which do not have trailing whitespace. We are usually interested in the position range of the construct without the whitespace.

In the calculator example of the previous chapter the recommended wrapping of operators and numbers is the following:

type operator = Position.range * char
type operand  = Position.range * int

let unary_operator: operator t =
    lexeme (char '-' |> located)

let binary_operator: operator t =
    let op_chars = "+-*/^"
    in
    one_of_chars op_chars "binary operator"
    |> located
    |> lexeme

let number: operand t =
    one_or_more_fold_left
        (fun d -> d)
        (fun v d -> 10 * v + d)
        digit
    |> located
    |> lexeme

Note that the located combinator is called before removing the whitespace (i.e. calling lexeme).
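The effect of the order can be seen in a toy model. The sketch below is not the Fmlib_parse implementation; it assumes that a parser is a function from an input string and a position to an outcome, and it simplifies a range to a pair of character offsets.

```ocaml
(* Toy model for illustration only, not the Fmlib_parse implementation. *)
type 'a outcome =
  | Accepted of 'a * int            (* value and position after the match *)
  | Rejected of int * string list   (* failure position and expectations *)

type 'a parser = string -> int -> 'a outcome

(* Simplified range: start offset and end offset (exclusive). *)
type range = int * int

(* Parse one decimal digit and return its value. *)
let digit : int parser =
  fun input pos ->
    if pos < String.length input && input.[pos] >= '0' && input.[pos] <= '9'
    then Accepted (Char.code input.[pos] - Char.code '0', pos + 1)
    else Rejected (pos, ["digit"])

(* Attach the range covered by [p] to its result. *)
let located (p : 'a parser) : (range * 'a) parser =
  fun input pos ->
    match p input pos with
    | Accepted (v, next) -> Accepted (((pos, next), v), next)
    | Rejected (at, exps) -> Rejected (at, exps)

(* Run [p] and skip the blanks which follow it. *)
let lexeme (p : 'a parser) : 'a parser =
  fun input pos ->
    match p input pos with
    | Accepted (v, next) ->
      let rec skip i =
        if i < String.length input && input.[i] = ' ' then skip (i + 1) else i
      in
      Accepted (v, skip next)
    | rejected -> rejected

(* located inside lexeme: the range covers the digit only. *)
let inner = lexeme (located digit) "7  +" 0

(* located outside lexeme: the range also covers the trailing blanks. *)
let outer = located (lexeme digit) "7  +" 0
```

In this model inner has the range (0, 1) and outer has the range (0, 3), although both parsers stop at position 3.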

With this modification we get all operators and the numbers with the additional range information.

The combinator make_binary has to be modified to use this information correctly in the success cases and in the case of a semantic failure.

let make_binary
        (((p1, _), a): operand)
        ((_, o): operator)
        (((pb1, p2), b): operand)
    : operand t
    =
    match o with
    | '+' ->
        return ((p1, p2), a + b)
    | '-' ->
        return ((p1, p2), a - b)
    | '*' ->
        return ((p1, p2), a * b)
    | '/' ->
        if b = 0 then
            fail ((pb1, p2), "Zero divisor")
        else
            return ((p1, p2), a / b)
    | '^' ->
        if b < 0 then
            fail ((pb1, p2), "Negative exponent")
        else
            return ((p1, p2), power a b)
    | _ ->
        assert false (* unreachable: binary_operator accepts only "+-*/^" *)

In this case the type of the semantic error has to be described by the module

module Semantic =
struct
    type t = Position.range * string
end

and not by the module String.

Suppose we feed the calculator parser with the following input

1 + 2 / (4 - 4)

The parser fails semantically because division by zero is not allowed. With the available error information we can generate the following error message:

    0 | 1 + 2 / (4 - 4)

    I have encountered a

        Zero divisor

    which is not allowed.
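Since the semantic error carries a range, the message can also mark the offending construct. The following sketch shows one possible way to do that; the function semantic_error_message, its labelled arguments and the use of plain column offsets instead of Position.range are assumptions made for this example. The columns 9 to 14 are chosen by hand to cover 4 - 4; the range actually reported by the parser depends on how parenthesized expressions are located.

```ocaml
(* Hypothetical helper: the name and the arguments are illustrative and
   not part of the library. Columns are zero based, [end_col] is exclusive. *)
let semantic_error_message
    ~(line_no : int)
    ~(source_line : string)
    ~(start_col : int)
    ~(end_col : int)
    ~(message : string)
  : string =
  let prefix = Printf.sprintf "%d | " line_no in
  (* Carets below the source line, shifted by the width of the prefix. *)
  let underline =
    String.make (String.length prefix + start_col) ' '
    ^ String.make (max 1 (end_col - start_col)) '^'
  in
  String.concat "\n"
    [ prefix ^ source_line
    ; underline
    ; ""
    ; "I have encountered a"
    ; ""
    ; "    " ^ message
    ; ""
    ; "which is not allowed."
    ]

let message : string =
  semantic_error_message
    ~line_no:0
    ~source_line:"1 + 2 / (4 - 4)"
    ~start_col:9
    ~end_col:14
    ~message:"Zero divisor"

let () = print_endline message
```

The second line of the result contains five carets placed below 4 - 4, followed by the same text as in the message above.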

In the token parser there is no located combinator. There is no need, because the tokens already contain the range information.