Fmlib_parse.Ucharacter
Parser for streams of unicode characters.
Unicode characters can be encoded in byte streams in several ways.
The following modules are available:
- Make_utf8: Parse text streams encoded in utf-8.
- Make_utf16_be: Parse text streams encoded in utf-16 big endian.
- Make_utf16_le: Parse text streams encoded in utf-16 little endian.
- Make: Parse text streams in any encoding. The encoder and decoder have to be provided as a module parameter.

All parsers in this module work like a character parser (see Character.Make) with some additional combinators to recognize unicode characters.
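To illustrate why the encoding has to be chosen, the following sketch encodes one code point in the three encodings listed above. It uses only the OCaml standard library (`Buffer`, `Uchar`), not Fmlib; the helper names `encode` and `hex` are made up for this example.

```ocaml
(* Encode a single code point with the given Buffer encoder and
   return the resulting bytes as a string. *)
let encode (add : Buffer.t -> Uchar.t -> unit) (u : Uchar.t) : string =
  let buf = Buffer.create 4 in
  add buf u;
  Buffer.contents buf

(* U+00E9, LATIN SMALL LETTER E WITH ACUTE *)
let e_acute = Uchar.of_int 0x00E9

let utf8     = encode Buffer.add_utf_8_uchar    e_acute  (* bytes C3 A9 *)
let utf16_be = encode Buffer.add_utf_16be_uchar e_acute  (* bytes 00 E9 *)
let utf16_le = encode Buffer.add_utf_16le_uchar e_acute  (* bytes E9 00 *)

(* Render a byte string as space-separated hexadecimal bytes. *)
let hex (s : string) : string =
  String.to_seq s
  |> List.of_seq
  |> List.map (fun c -> Printf.sprintf "%02X" (Char.code c))
  |> String.concat " "

let () =
  Printf.printf "utf-8:     %s\n" (hex utf8);
  Printf.printf "utf-16 be: %s\n" (hex utf16_be);
  Printf.printf "utf-16 le: %s\n" (hex utf16_le)
```

The same character yields three different byte sequences, so a parser reading a byte stream has to know which encoding it is decoding.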
module Make_utf8
(State : Fmlib_std.Interfaces.ANY)
(Final : Fmlib_std.Interfaces.ANY)
(Semantic : Fmlib_std.Interfaces.ANY) :
sig ... end
Parse an input stream consisting of unicode characters encoded in utf-8.
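A minimal sketch of instantiating the functor, assuming the `fmlib_parse` library is available and that `Fmlib_std.Interfaces.ANY` only requires a type `t`. The argument modules `Unit`, `Int` and `String` come from the OCaml standard library and are illustrative choices. The roles in the comments are an assumption carried over from `Character.Make`; this page does not state them. The name `Utf8_parser` is hypothetical.

```ocaml
(* Requires the fmlib_parse library at build time. *)
module Utf8_parser =
  Fmlib_parse.Ucharacter.Make_utf8
    (Unit)    (* State: no user state is carried through the parse *)
    (Int)     (* Final: the parser delivers an int as its final result *)
    (String)  (* Semantic: semantic errors are represented as strings *)
```

The other functors on this page take the same three arguments; `Make` additionally takes a `Codec` module as its first argument.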
module Make_utf16_be
(State : Fmlib_std.Interfaces.ANY)
(Final : Fmlib_std.Interfaces.ANY)
(Semantic : Fmlib_std.Interfaces.ANY) :
sig ... end
Parse an input stream consisting of unicode characters encoded in utf-16 big endian.
module Make_utf16_le
(State : Fmlib_std.Interfaces.ANY)
(Final : Fmlib_std.Interfaces.ANY)
(Semantic : Fmlib_std.Interfaces.ANY) :
sig ... end
Parse an input stream consisting of unicode characters encoded in utf-16 little endian.
module Make
(Codec : Interfaces.CHAR_CODEC)
(State : Fmlib_std.Interfaces.ANY)
(Final : Fmlib_std.Interfaces.ANY)
(Semantic : Fmlib_std.Interfaces.ANY) :
sig ... end
Parse an input stream consisting of unicode characters. The unicode characters are encoded and decoded using the module Codec.