A partial parser is a parser which parses a part of the input stream and not the whole input stream until the end of input is reached. Some examples where a partial parser is useful:
#include file
or Ocaml's open
. The reading of another file is I/O which cannot be done within a parsing combinator.The input stream for partial parsing might look like
A B C EOS
where A
, B
, C
are certain grammatical constructs and EOS
marks the end of input. In that case we split the input stream into three different parts. The parsers parsing the constructs A
and B
have to be partial parsers. They can terminate successfully without consuming the end of input. Then parser for C
is a complete parser, it terminates successfully only if followed by the end of input.
Furthermore it is possible that the input stream looks like
A A A ... EOS
i.e. the stream consist of zero or more (or one or more) grammatical constructs A
.
In order to parse A*
i.e. zero ore more repetitions of the construct A
we need a partial parser which parses either an A
or the end of input.
In order to parse A+
i.e. one or more repetitions of the construct A
we need a partial parser which parses an A
with an optional end of input at its end.
The structure of the stream suited for partial parsing can have arbitrary loops.
A B+ C* D* EOS
But the structure must be linear. It is not possible to nest partial parsers.
Usually you have a combinator final: Final.t
which parses a grammatical construct A
. If the structure of the stream is A EOS
i.e. the structure A
is ended by the end of input, then you make the final parser by
make start_state final
The function make
converts a combinator and adds to it the expectation of the end of input after the construct. Therefore the parser cannot succeed unless the construct A
fills the complete input stream to the end.
In order to make partial parser there is the function
make_partial start_state final
This function makes a partial parser parsing a grammatical construct A
without expecting the end of input after A
.
The function make
is implemented by the function make_partial
in the following way:
let make (state: State.t) (final: Final.t t): Parser.t =
make_partial state (final >>= expect_end)
Now suppose we want to write a parser which parse an input stream of the form
A+ EOS
and we have a combinator final: Final.t t
which parses a construct A
. We construct a partial parser by
make_partial
state
(
let* a = final in
expect_end a
</>
return a
)
This parser parse a construct A
followed by an optional end of input. In both cases (with or without end of input) the parser can succeed and return an object of type Final.t
. The function has_consumed_end
can be used to decide if the parsing has ended of if there is a next partial parser needed to parse the remainder of the input stream.
Now let the input stream look like
A* EOS
and let final: Final.t t
be the combinator which parses the construct A
.
In that case we need an object eos: Final.t
which the parser returns if the end of input has been reached. We can construct the parser by
make_partial
state
(final </> expect_end eos)
This parser in case of success has either parsed successfully a construct A
or has reached the end of input.
The termination of the loop can be detected either by looking at has_consumed_end
or by comparing the final result of the parser with eos
.
Let's assume our input stream has the structure
A B EOS
and pa:P.t
is a parser parsing A
without consuming the end of input. Furthermore assume that pa
has finished successfully without consuming the end of input.
assert (P.has_succeeded pa);
assert (not (P.has_consumed_end pa));
At this point we can extract the construct representing A
and the current state
let state = P.state pa
and a = P.final pa
Now we can do anything we want (make I/O etc.) using this information.
After that we want to finish parsing with the next parser. For that we need a combinator recognizing the construct B
, say b_combi
, and a new state (might be the same as state
) and make a complete parser (the parser has to consume the end of input) by
let pb = make new_state b_combi
This is not yet enough, because the parser pa
might have loaded some lookahead which is part of the construct B
, clearly without consuming it. We have to transfer the lookahead of pa
to pb
by
let pb = P.transfer_lookahead pa pb
In extreme cases this might put the parser pb
into a success state. You have to check that before continue with the parsing. If this is not yet the case i.e. the parser pb
needs more, then you can continue the parsing.
The description in the previous section only works, if the parsers pa
and pb
have the same type. In this section we assume that the types are different pa: PA.t
and pb: PB.t
. In order to transfer the lookaheads from pa
to pb
there is a generic function fold_lookahead
which can be called by
let pb = PA.fold_lookahead pb PB.put PB.put_end pa
Then, as above, check if pb
has_succeeded or failed. If it needs more tokens the parsing can continue.
For two stage parsers i.e. parsers with lexers the pasting together of subsequent parser works in principle in the same way. The user of the library only has to paste the token parser together. For details see the documentation Fmlib_parse.Parse_with_lexer.Make