Parse_with_lexer.Make_utf8
Generate a parser with a UTF-8 lexer and a token parser.
The generated parser parses a stream of Unicode characters encoded in UTF-8. The lexer converts the stream of characters into a stream of tokens of type Position.range * Token.t, which are fed into the token parser.
The type of tokens is UTF-8 decoded Unicode characters.
type token = Utf8.Decoder.t
Type of syntax expectations:
type expect = string * Indent.expectation option
module Lex : Interfaces.LEXER
  with type final = Position.range * Token.t
   and type token = Utf8.Decoder.t
module Parse : Interfaces.FULL_PARSER
  with type state = State.t
   and type token = Position.range * Token.t
   and type expect = string * Indent.expectation option
   and type final = Final.t
   and type semantic = Semantic.t
A parser p is a sink of tokens. As long as it signals needs_more p, more tokens can be pushed into the parser via put token p, or the input stream can be ended via put_end p.
has_ended p is equivalent to not (needs_more p). has_ended p signals that the parser has either succeeded or failed.
If it has succeeded, the final value is available via final p.
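The sink protocol described above can be sketched with a toy model. The module Toy below is hypothetical and is not part of Fmlib: it accepts decimal digits and succeeds at the first non-digit or at the end of input. Only the names needs_more, put, put_end, has_ended, has_succeeded and final mirror the documented interface; everything else is an assumption made for illustration.

```ocaml
(* Toy sink, NOT Fmlib: reads decimal digits and succeeds at the first
   non-digit or at the end of input. *)
module Toy = struct
  type token = char

  type t =
    | More of int   (* still reading digits; value accumulated so far *)
    | Done of int   (* ended successfully with a final value *)

  let init : t = More 0

  let needs_more (p : t) : bool =
    match p with More _ -> true | Done _ -> false

  let has_ended (p : t) : bool = not (needs_more p)

  (* The toy never fails, so "ended" and "succeeded" coincide here. *)
  let has_succeeded (p : t) : bool = has_ended p

  let put (c : token) (p : t) : t =
    match p with
    | More n when '0' <= c && c <= '9' ->
      More (10 * n + Char.code c - Char.code '0')
    | More n -> Done n
    | Done _ -> p

  let put_end (p : t) : t =
    match p with More n -> Done n | Done _ -> p

  let final (p : t) : int =
    match p with
    | Done n -> n
    | More _ -> invalid_arg "final: precondition has_succeeded violated"
end

(* Push tokens while the sink asks for more; end the input if the sink is
   still waiting once the tokens are exhausted. *)
let rec feed (tokens : char list) (p : Toy.t) : Toy.t =
  match tokens with
  | tok :: rest when Toy.needs_more p -> feed rest (Toy.put tok p)
  | [] when Toy.needs_more p -> Toy.put_end p
  | _ -> p
```

With this sketch, feed ['4'; '2'] Toy.init ends via put_end, while feed ['7'; ' '; '9'] Toy.init ends at the blank and leaves the remaining token unpushed.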
type token = Utf8.Decoder.t
Type of the tokens.
type item = token
In order to conform to the interface Fmlib_std.Interfaces.SINK.
val needs_more : t -> bool
needs_more p: Does the parser p need more tokens?
put tok p: Push token tok into the parser p.
Even if the parser has ended, more tokens can be pushed into it. The parser stores each such token as a lookahead token.
If the parser has already received the end of the token stream via put_end, then all subsequent tokens are ignored.
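The two rules for put can be illustrated with a small toy model. The record type toy below is hypothetical and does not reflect how Fmlib stores its lookaheads: it is a sink that ends at the first ';', keeps later tokens as lookahead, and ignores everything after put_end.

```ocaml
(* Toy model, NOT Fmlib: a sink that ends at the first ';'. *)
type toy = {
  consumed : char list;   (* tokens consumed before the ';', newest first *)
  lookahead : char list;  (* tokens pushed after the sink ended, newest first *)
  ended : bool;           (* the sink no longer needs tokens *)
  got_end : bool;         (* [put_end] has been called *)
}

let init = { consumed = []; lookahead = []; ended = false; got_end = false }

let put (c : char) (p : toy) : toy =
  if p.got_end then p                       (* after [put_end]: ignored *)
  else if p.ended then { p with lookahead = c :: p.lookahead }
  else if c = ';' then { p with ended = true }
  else { p with consumed = c :: p.consumed }

let put_end (p : toy) : toy = { p with ended = true; got_end = true }

(* Lookahead tokens in the order in which they were pushed. *)
let lookaheads (p : toy) : char list = List.rev p.lookahead
```

Pushing 'a', ';', 'b', 'c' leaves 'b' and 'c' as lookahead in this toy; a further token pushed after put_end changes nothing.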
type final = Final.t
Type of the final result.
val has_succeeded : t -> bool
has_succeeded p: Has the parser p succeeded?
val has_ended : t -> bool
has_ended p: Has the parser p ended parsing and either succeeded or failed?
has_ended p is the same as not (needs_more p).
val has_consumed_end : t -> bool
Has the parser consumed the end of input?
final p: The final object constructed by the parser p in case of success.
Precondition: has_succeeded p
type expect = string * Indent.expectation option
Type of expectations.
val has_failed_syntax : t -> bool
has_failed_syntax p: Has the parser p failed with a syntax error?
failed_expectations p: The failed expectations due to a syntax error.
Precondition: has_failed_syntax p
type semantic = Semantic.t
Type of semantic errors.
val has_failed_semantic : t -> bool
Has the parser failed because of a semantic error?
The semantic error encountered.
Precondition: A semantic error has occurred.
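The predicates above each guard one accessor. A minimal sketch of how a caller might dispatch on them is shown below, using a hypothetical variant type outcome in place of the parser. The variant and the function report are assumptions for illustration; Fmlib exposes predicates and accessors rather than this type, and its expectations are of type string * Indent.expectation option rather than plain strings.

```ocaml
(* Toy outcome type, NOT Fmlib's representation of a parser. *)
type ('final, 'expect, 'semantic) outcome =
  | Running                       (* still needs tokens *)
  | Succeeded of 'final
  | Syntax_error of 'expect list  (* the failed expectations *)
  | Semantic_error of 'semantic

let has_ended = function Running -> false | _ -> true
let has_succeeded = function Succeeded _ -> true | _ -> false
let has_failed_syntax = function Syntax_error _ -> true | _ -> false
let has_failed_semantic = function Semantic_error _ -> true | _ -> false

(* Each accessor is only meaningful under its precondition; pattern
   matching makes that pairing explicit in the toy. *)
let report (p : (int, string, string) outcome) : string =
  match p with
  | Running -> "needs more input"
  | Succeeded n -> "ok: " ^ string_of_int n
  | Syntax_error es -> "expected one of: " ^ String.concat ", " es
  | Semantic_error msg -> "semantic error: " ^ msg
```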
type state = State.t
Type of the state of the parser (in many cases unit).
If the input stream shall be parsed in parts, then a parser with a lexer can be used for partial parsing as well.
Note that the lexer must be partial, because it succeeds after successfully parsing a lexical token from the input stream and is restarted afterwards. The restart of the lexer transfers the lookahead from the previous lexer to the next lexer.
A parser with a lexer becomes partial if the token parser is partial. As a user of this module, you only have to transfer the lookahead buffer from the old token parser to the next token parser.
If the old and the new token parser have the same type, then the function make_next can be used to transfer the lookahead buffer.
If the old and the new token parser have different types, then the following will do the job. Assume that TP1.t and TP2.t are the types of the old and new token parsers, P1.t and P2.t are the types of the corresponding parsers with lexers, p1: P1.t is the old parser and tp2: TP2.t is the new token parser:
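(* p1 must have succeeded without consuming the end of input. *)
assert (P1.has_succeeded p1);
assert (not (P1.has_consumed_end p1));
let lex = P1.lex p1
and tp1 = P1.parse p1
in
(* Fold over the lookaheads of tp1, pushing each one into tp2. *)
let tp2 = TP1.fold_lookahead tp2 TP2.put TP2.put_end tp1 in
let p2 = P2.make lex tp2 in
...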
Note that, as described in the chapter Partial Parsing, the parser p2 might have used the lookaheads of p1 to either succeed or fail. You can continue parsing the input stream only if this is not yet the case. Otherwise you might need a new subsequent token parser to continue parsing the remaining input stream.
make_next p tp: This function assumes that p has been made with a partial token parser and has already successfully consumed a part of the input stream, and that tp is the token parser which shall be used to parse the next part of the input stream.
Since the token parser contained in p might have unconsumed lookahead tokens, these tokens must be transferred to the new token parser tp.
The call make_next p tp makes a new parser with lexer using the old lexer and the new token parser tp with all the lookaheads transferred to it.
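The lookahead transfer can be pictured with a toy model in which a token parser is reduced to two lists. The type toy_parser and the function transfer are hypothetical and only mimic the idea; the toy ignores the lexer and the end-of-input marker, both of which the real make_next has to handle.

```ocaml
(* Toy model, NOT Fmlib: a token parser reduced to the two lists that
   matter for the transfer. *)
type toy_parser = {
  received : string list;   (* all tokens pushed so far, oldest first *)
  lookahead : string list;  (* pushed but not consumed, oldest first *)
}

let put (tok : string) (p : toy_parser) : toy_parser =
  { p with received = p.received @ [tok] }

(* Fold over the unconsumed lookahead tokens of [p], oldest first. *)
let fold_lookahead start f (p : toy_parser) =
  List.fold_left (fun acc tok -> f tok acc) start p.lookahead

(* Hypothetical analogue of the transfer step: replay the lookaheads of
   the old token parser into the next one. *)
let transfer (old : toy_parser) (next : toy_parser) : toy_parser =
  fold_lookahead next put old
```

If the old toy parser received "a", "b", "c" and consumed only "a", the next one starts with "b" and "c" already pushed, in that order.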
val position : t -> Position.t
The current position in the input.
val range : t -> Position.range
The current range in the input; usually the range of the first lookahead token. In case of a syntax error this is the range of the unexpected token, i.e. the token which caused the syntax error.
run_on_string str start p: Run the parser p on the string str starting at index start. Return the parser and the index next to be pushed in.
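A driver of this shape can be sketched generically. The function below is a hypothetical stand-in, not Fmlib's implementation: it takes needs_more and put as arguments so that it does not depend on the library, pushes raw bytes rather than decoded UTF-8 characters, never calls put_end, and returns the pair as (index, parser), which is an assumption about the order.

```ocaml
(* Hypothetical driver over any sink: push the characters of [str] from
   index [start] while the sink needs more, and return the index of the
   next character to push together with the updated sink. *)
let run_on_string
    (needs_more : 'p -> bool)
    (put : char -> 'p -> 'p)
    (str : string) (start : int) (p : 'p) : int * 'p =
  let len = String.length str in
  let rec go i p =
    if i < len && needs_more p then go (i + 1) (put str.[i] p)
    else (i, p)
  in
  go start p
```

Because the returned index points at the first character not yet pushed, the caller can resume on the same string with another parser, or on a later chunk with the same one.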