Make_utf8.Lex

A lexer analyses a stream of characters and groups it into tokens. It usually strips off whitespace. I.e. a lexer expects a stream of characters of the form
WS Token WS Token ... WS Token WS EOS
WS is a possibly empty sequence of whitespace characters like blanks, tabs, and newlines, as well as comments. Token represents a legal token. EOS represents the end of the stream.
A lexer is in one of three states:
needs_more: The lexer needs more characters from the stream of characters in order to decide the next correct token or the end of input. The lexer is ready to receive more characters via put or to receive the end of input via put_end.

has_succeeded: The lexer has found a correct token or detected the end of input. In this state (except at the end of input) the lexer can be restarted to find the next token.

has_failed_syntax: The lexer has detected a character (or the end of input) which cannot be part of a legal token.

In the state has_succeeded the lexer signals via has_consumed_end that the end of input has been reached.
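The three states can be pictured with a small sketch. The variant type and the toy digit lexer below are invented for illustration only; Fmlib exposes these states through the query functions needs_more, has_succeeded and has_failed_syntax rather than a concrete variant.

```ocaml
(* Toy sketch of the three lexer states described above. Not Fmlib's
   implementation: the constructors and [put] are invented here. *)
type toy_state =
  | Needs_more
  | Has_succeeded of string        (* carries the recognized token *)
  | Has_failed_syntax

(* Feed one character to a toy lexer that recognizes a digit sequence
   terminated by a blank. *)
let put (buf : Buffer.t) (c : char) : toy_state =
  match c with
  | '0' .. '9' -> Buffer.add_char buf c; Needs_more
  | ' ' when Buffer.length buf > 0 -> Has_succeeded (Buffer.contents buf)
  | _ -> Has_failed_syntax

let () =
  let buf = Buffer.create 8 in
  assert (put buf '4' = Needs_more);      (* still needs characters *)
  assert (put buf '2' = Needs_more);
  assert (put buf ' ' = Has_succeeded "42")  (* token recognized *)
```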
A module conforming to the module type LEXER can be used in the module Parse_with_lexer to create a two stage parser where the lexer handles tokens and a combinator parser handles the higher level constructs.
A parser p is a sink of tokens. As long as it signals needs_more p, more tokens can be pushed into the parser via put token p, or the input stream can be ended via put_end p.
has_ended p is equivalent to not (needs_more p). has_ended p signals that the parser has either succeeded or failed.
If it has succeeded the final value is available via final p.
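The protocol above (query needs_more, push with put, finish with put_end, read final) can be sketched as a generic driver loop. The module type PARSER below is an assumption distilled from this page, not Fmlib's exact signature, and Toy is an invented instance used only to exercise the loop.

```ocaml
(* Sketch: a driver loop over the parser protocol described in the text.
   PARSER is a hypothetical distillation of this page, not Fmlib's API. *)
module type PARSER = sig
  type t
  type token
  type final
  val start : t
  val needs_more : t -> bool
  val put : token -> t -> t
  val put_end : t -> t
  val has_succeeded : t -> bool
  val final : t -> final
end

module Drive (P : PARSER) = struct
  (* Push tokens while the parser asks for more, then end the stream. *)
  let run (tokens : P.token list) : P.t =
    let p =
      List.fold_left
        (fun p tok -> if P.needs_more p then P.put tok p else p)
        P.start tokens
    in
    if P.needs_more p then P.put_end p else p
end

(* A trivial instance: collects all tokens and succeeds at end of input. *)
module Toy : PARSER with type token = int and type final = int list = struct
  type t = { acc : int list; ended : bool }
  type token = int
  type final = int list
  let start = { acc = []; ended = false }
  let needs_more p = not p.ended
  let put tok p = { p with acc = tok :: p.acc }
  let put_end p = { p with ended = true }
  let has_succeeded p = p.ended
  let final p = List.rev p.acc
end

let () =
  let module D = Drive (Toy) in
  let p = D.run [1; 2; 3] in
  assert (Toy.has_succeeded p);
  assert (Toy.final p = [1; 2; 3])
```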
type token = Utf8.Decoder.t

Type of the tokens.
type item = token

In order to conform to the interface Fmlib_std.Interfaces.SINK.
val needs_more : t -> bool

needs_more p Does the parser p need more tokens?
put tok p Push token tok into the parser p.
Even if the parser has ended, more tokens can be pushed into the parser. The parser stores the token as lookahead token.
If the parser has already received the end of the token stream via put_end, then all subsequent tokens are ignored.
type final = Position.range * Token.t

Type of the final result.
val has_succeeded : t -> bool

has_succeeded p Has the parser p succeeded?
val has_ended : t -> bool

has_ended p Has the parser p ended parsing and either succeeded or failed?
has_ended p is the same as not (needs_more p)
final p The final object constructed by the parser p in case of success.
Precondition: has_succeeded p
type expect = string * Indent.expectation option

Type of expectations.
val has_failed_syntax : t -> bool

has_failed_syntax p Has the parser p failed with a syntax error?
failed_expectations p The failed expectations due to a syntax error.
Precondition: has_failed_syntax p
val has_consumed_end : t -> bool

Has the lexer consumed the end of input?
val position : t -> Position.t

Line and column number of the current position of the lexer.
val start : t

The lexer for the first token.
A lexer does not consume the entire input stream. It consumes characters only until a token has been recognized. On successful recognition of a token it returns the token (see final). Then it can be restarted to recognize the next token.
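This recognize-one-token-then-restart discipline can be sketched with a toy lexer over a string. lex_one below is a hypothetical stand-in, not Fmlib's API: it skips blanks, recognizes one token, and reports the position at which the lexer should be restarted.

```ocaml
(* Sketch of the restart discipline described above. [lex_one] is a toy
   stand-in for a lexer: it recognizes at most one token starting at
   position [i] and returns the token plus the restart position. *)
let lex_one (s : string) (i : int) : (string * int) option =
  let n = String.length s in
  let i = ref i in
  while !i < n && s.[!i] = ' ' do incr i done;   (* strip whitespace *)
  if !i >= n then None                           (* end of stream *)
  else begin
    let start = !i in
    while !i < n && s.[!i] <> ' ' do incr i done;
    Some (String.sub s start (!i - start), !i)
  end

(* Restart the lexer after each recognized token until end of input. *)
let tokens (s : string) : string list =
  let rec loop i acc =
    match lex_one s i with
    | None -> List.rev acc
    | Some (tok, j) -> loop j (tok :: acc)
  in
  loop 0 []

let () = assert (tokens "  foo bar  baz" = ["foo"; "bar"; "baz"])
```

In Fmlib the same loop is packaged by Parse_with_lexer, which restarts the lexer and feeds each recognized token to the combinator parser.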