--- /dev/null
+----------------------------------------------------------------------
+m2parsergen
+----------------------------------------------------------------------
+
+This is a parser generator for top-down (or recursively descending) parsers.
+The input file must be structured as follows:
+
+---------------------------------------- Begin of file
+
+<OCAML TEXT ("preamble")>
+
+%%
+
+<DECLARATIONS>
+
+%%
+
+<RULES>
+
+%%
+
+<OCAML TEXT ("postamble")>
+
+---------------------------------------- End of file
+
+The two-character combination %% separates the various sections. The
+text before the first %% and after the last %% will be copied verbatim
+to the output file.
+
+Within the declarations and rules sections you must use /* ... */ as
+comment braces.
+
+There are two types of declarations:
+
+%token Name
+
+declares that Name is a token without associated value, and
+
+%token <> Name
+
+declares that Name is a token with associated value (i.e. Name x).
+
+In contrast to ocamlyacc, you need not to specify a type. This is a
+fundamental difference, because m2parsergen will not generate a type
+declaration for a "token" type; you must do this yourself.
+
+You need not to declare start symbols; every grammar rule may be used
+as start symbol.
+
+The rules look like:
+
+name_of_rule(arg1, arg2, ...):
+ label1:symbol1 label2:symbol2 ... {{ CODE }}
+| label1:symbol1 label2:symbol2 ... {{ CODE }}
+...
+| label1:symbol1 label2:symbol2 ... {{ CODE }}
+
+The rules may have arguments (note that you must write the
+parantheses, even if the rule does not have arguments). Here, arg1,
+arg2, ... are the formal names of the arguments; you may refer to them
+in OCaml code.
+
+Furthermore, the symbols may have labels (you can leave the labels
+out). You can refer to the value associated with a symbol by its
+label, i.e. there is an OCaml variable with the same name as the label
+prescribes, and this variable contains the value.
+
+The OCaml code must be embraced by {{ and }}, and these separators
+must not occur within the code.
+
+EXAMPLE:
+
+prefix_term():
+ Plus_symbol Left_paren v1:prefix_term() Comma v2:prefix_term() Right_paren
+ {{ v1 + v2 }}
+| Times_symbol Left_paren v1:prefix_term() Comma v2:prefix_term() Right_paren
+ {{ v1 * v2 }}
+| n:Number
+ {{ n }}
+
+As you can see in the example, you must pass values for the arguments
+if you call non-terminal symbols (here, the argument list is empty: ()).
+
+The generated parsers behave as follows:
+
+- A rule is applicable to a token sequence if the first token is
+ matched by the rule.
+
+ In the example: prefix_term is applicable if the first token of a
+ sequence is either Plus_symbol, Times_symbol, or Number.
+
+- One branch of the applicable rule is selected: it is the first
+ branch that matches the first token. THE OTHER TOKENS DO NOT HAVE
+ ANY EFFECT ON BRANCH SELECTION!
+
+ For instance, in the following rule the second branch is never
+ selected, because only the A is used to select the branch:
+
+ a():
+ A B {{ ... }}
+ | A C {{ ... }}
+
+- Once a branch is selected, it is checked whether the branch matches
+ the token sequence. If this check succeeds, the code section of the
+ branch is executed, and the resulting value is returned to the
+ caller.
+ If the check fails, the exception Parsing.Parse_error is raised.
+ Normally, this exception is not caught, and will force the parser
+ to stop.
+
+ The check in detail:
+
+ If the rule demands a terminal, there a must be exactly this
+ terminal at the corresponding location in the token sequence.
+
+ If the rule demands a non-terminal, it is checked whether the rule
+ for to this non-terminal is applicable. If so, the branch
+ is selected, and recursively checked. If the rule is not applicable,
+ the check fails immediately.
+
+- THERE IS NO BACKTRACKING!
+
+ Note that the following works (but the construction is resolved at
+ generation time):
+
+ rule1() =
+ rule2() A B ... {{ ... }}
+
+ rule2() =
+ C {{ ... }}
+ | D {{ ... }}
+
+ In this case, the (only) branch of rule1 is selected if the next
+ token is C or D.
+
+---
+
+
+
+*** Options and repetitions ***
+
+Symbols can be tagged as being optional, or to occur repeatedly:
+
+rule():
+ Name whitespace()* Question_mark?
+
+- "*": The symbol matches zero or more occurrences.
+
+- "?": The symbol matches zero or one occurrence.
+
+This is done as follows:
+
+- terminal*: The maximum number of consecutive tokens <terminal> are
+ matched.
+- non-terminal*: The maximum number of the subsequences matching
+ <non-terminal> are matched. Before another
+ subsequence is matched, it is checked whether the
+ rule for <non-terminal> is applicable. If so, the
+ rule is invoked and must succeed (otherwise Parsing.
+ Parse_error). If not, the loop is exited.
+
+- terminal?: If the next token is <terminal>, it is matched. If not,
+ no token is matched.
+
+- non-terminal?: It is checked whether the rule for <non-terminal>
+ is applicable. If so, the rule is invoked, and
+ matches a sequence of tokens. If not, no token is
+ matched.
+
+You may refer to repeated or optional symbols by labels. In this case,
+the label is associated with lists of values, or optional values,
+respectively:
+
+rule():
+ A lab:other()* lab':unlikely()?
+ {{ let n = List.length lab in ...
+ match lab' with
+ None -> ...
+ | Some v -> ...
+ }}
+
+A different scheme is applied if the symbol is a token without
+associated value (%token Name, and NOT %token <> Name):
+
+rule():
+ A lab:B* lab':C?
+
+Here, "lab" becomes an integer variable counting the number of Bs, and
+"lab'" becomes a boolean variable denoting whether there is a C or not.
+
+
+*** Early let-binding ***
+
+You may put some OCaml code directly after the first symbol of a
+branch:
+
+rule():
+ A $ {{ let-binding }} C D ... {{ ... }}
+
+The code brace {{ let-binding }} must be preceded by a dollar
+sign. You can put "let ... = ... in" statements into this brace:
+
+rule1():
+ n:A $ {{ let twice = 2 * n in }} rule2(twice) {{ ... }}
+
+This code is executed once the branch is selected.
+
+
+*** Very early let-binding ***
+
+This is also possible:
+
+rule():
+ $ {{ CODE }}
+ A
+ ...
+
+The CODE is executed right when the branch is selected, and before any
+other happens. (Only for hacks!)
+
+
+
+*** Computed rules ***
+
+rule():
+ A $ {{ let followup = ... some function ... in }} [ followup ]()
+ {{ ... }}
+
+Between [ and ], you can refer to the O'Caml name of *any* function.
+Here, the function "followup" is bound in the let-binding.
+
+
+*** Error handling ***
+
+If a branch is already selected, but the check fails whether the other
+symbols of the branch match, it is possible to catch the resulting
+exception and to find out at which position the failure has occurred.
+
+rule():
+ x:A y:B z:C {{ ... }} ? {{ ERROR-CODE }}
+
+After a question mark, it is allowed to append another code
+brace. This code is executed if the branch check fails (but not if the
+branch is not selected nor if no branches are selected). The string
+variable !yy_position contains the label of the symbol that caused the
+failure (or it contains the empty string if the symbol does not have a
+label).
+
+Example:
+
+rule():
+ x:A y:B z:C {{ print_endline "SUCCESS" }} ? {{ print_endline !yy_position }}
+
+If the token sequence is A B C, "SUCCESS" will be printed. If the
+sequence is A C, the second symbol fails, and "y" will be printed. If
+the sequence is A B D, the third symbol fails, and "z" will be
+printed. If the sequence is B, the rule will be never selected because
+it is not applicable.
+
+
+
+*** Error recovery ***
+
+You may call the functions yy_current, yy_get_next, or one of the
+parse_* functions in the error brace to recover from the error
+(e.g. to move ahead until a certain token is reached). See below.
+
+
+
+*** How to call the parser ***
+
+The rules are rewritten into a OCaml let-binding:
+
+let rec parse_<rule1> ... = ...
+ and parse_<rule2> ... = ...
+ ...
+ and parse_<ruleN> ... = ...
+in
+
+i.e. there are lots of functions, and the name of the functions are
+"parse_" plus the name of the rules. You can call every function.
+
+The first two arguments of the functions have a special meaning; the
+other arguments are the arguments coming from the rule description:
+
+rule(a,b):
+ ...
+
+===>
+
+let rec parse_rule yy_current yy_get_next a b = ...
+
+The first argument, yy_current, is a function that returns the current
+token. The second arguments, yy_get_next, is a function that switches
+to the next token, and returns it.
+
+If the tokens are stored in a list, this may be a definition:
+
+let input = ref [ Token1; Token2; ... ] in
+let yy_current() = List.hd !input in
+let yy_get_next () =
+ input := List.tl !input;
+ List.hd !input
+
+When you call one of the parser functions, the current token must
+already be loaded, i.e. yy_current returns the first token to match by
+the function.
+
+After the functions has returned, the current token is the token
+following the sequence of tokens that have been matched by the
+function.
+
+The function returns the value computed by the OCaml code brace of the
+rule (or the value of the error brace).
+
+If the rule is not applicable, the exception Not_found is raised.
+
+If the rule is applicable, but it does not match, the exception
+Parsing.Parse_error is raised.