 * ----------------------------------------------------------------------
 * Markup! The validating XML parser for Objective Caml.
 * Copyright 1999 by Gerd Stolpmann. See LICENSE for details.
 * THIS IS THE markup-0.2.10 COMPATIBLE INTERFACE TO markup_yacc.mli.
 * It corresponds to revision 1.4 of markup_yacc.mli.
(*$ markup-yacc.mli *)
{ warner : collect_warnings;
    (* An object that collects warnings. *)
  errors_with_line_numbers : bool;
    (* Whether error messages contain line numbers or not. The parser
     * is 10 to 20 per cent faster if line numbers are turned off;
     * you get only character positions in this case.
     *)
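(* Illustration only, not part of the original interface. When
   errors_with_line_numbers is off and only a character position is
   reported, a line number can still be recovered afterwards from the
   document text. The sketch below assumes the position is a 0-based
   character offset; that convention and the helper name line_and_column
   are assumptions made for the example, not something this interface
   specifies.

```ocaml
(* Hypothetical helper: map a 0-based character offset [pos] in [text]
   to a 1-based line number and a 0-based column. *)
let line_and_column text pos =
  let line = ref 1 and line_start = ref 0 in
  (* Count the newlines that occur strictly before [pos]. *)
  for i = 0 to min pos (String.length text) - 1 do
    if text.[i] = '\n' then begin
      incr line;
      line_start := i + 1
    end
  done;
  (!line, pos - !line_start)
```

   Doing this once per reported error costs one scan of the text, which
   may be preferable to paying for line tracking on every parse.
*)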
  processing_instructions_inline : bool;
    (* true: turns on a special mode for processing instructions. Normally,
     * you cannot determine the exact location of a PI; you only know
     * in which element the PI occurs. The "inline" mode makes it possible
     * to find out the exact location: every PI is artificially wrapped
     * by a special element with name "-pi". For example, if the XML text
     * is <a><?x?><?y?></a>, the parser normally produces only an element
     * object for "a", and puts the PIs "x" and "y" into it (without
     * order). In inline mode, the object "a" will contain two objects
     * with name "-pi", and the first object will contain "x", and the
     * (1) The name "-pi" is reserved. You cannot use it for your own
     *     tags because tag names must not begin with '-'.
     * (2) You need not add a declaration for "-pi" to the DTD. These
     *     elements are handled separately.
     * (3) Of course, the "-pi" objects are created from exemplars of
    (* true: the topmost element of the XML tree is not the root element,
     * but the so-called virtual root. The root element is a child of the
     * virtual root. The virtual root is an ordinary element with name
     * The following behaviour changes, too:
     * - PIs occurring outside the root element and outside the DTD are
     *   added to the virtual root instead of the document object
     * - If processing_instructions_inline is also turned on, these PIs
     *   are added inline to the virtual root
     * (1) The name "-vr" is reserved. You cannot use it for your own
     *     tags because tag names must not begin with '-'.
     * (2) You need not add a declaration for "-vr" to the DTD. These
     *     elements are handled separately.
     * (3) Of course, the "-vr" objects are created from exemplars of
    (* The following options are not implemented, or only for internal
  debugging_mode : bool;
    Entity of ((dtd -> Pxp_entity.entity) * Markup_reader.resolver)
  | Channel of in_channel
  | ExtID of (ext_id * Markup_reader.resolver)
 * The sources do not all have the same capabilities. Here are the differences:
 * - File: A File source reads from a file by name. This has the advantage
 *   that references to external entities can be resolved. The problem
 *   with SYSTEM references is that they usually contain relative file
 *   names; more exactly, a file name relative to the document containing it.
 *   It is only possible to convert such names to absolute file names if the
 *   name of the document containing such references is known; and File
 * - Channel, Latin1: These sources read from documents given as channels or
 *   (Latin 1-encoded) strings. There is no file name, and because of this
 *   the documents must not contain references to external files (even
 *   if the file names are given as absolute names).
 * - ExtID(x,r): The identifier x (either the SYSTEM or the PUBLIC name) of the
 *   entity to read from is passed to the resolver r as-is.
 *   The intention of this option is to allow customized
 *   resolvers to interpret external identifiers without any restriction.
 *   For example, you can assign the PUBLIC identifiers a meaning (they
 *   currently do not have any), or you can extend the "namespace" of
 *   ExtID is the interface of choice for your own extensions to resolvers.
 * - Entity(m,r): You can implement any behaviour by using a customized
 *   entity class. Once the DTD object d that will be used during
 *   parsing is known, the entity e = m d is determined and used together with the
 *   This is only for hackers.
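(* Illustration only, not part of the original interface. The point made
   above about relative SYSTEM names can be sketched with the standard
   Filename module. The helper resolve_system_id and its argument names are
   hypothetical; the real resolvers live in Markup_reader and may work
   differently.

```ocaml
(* [containing_document] is [Some path] for a File source and [None] for a
   Channel or Latin1 source, which has no file name. *)
let resolve_system_id ~containing_document system_id =
  match containing_document with
  | None ->
      (* No document name: external references cannot be resolved at all,
         not even absolute ones. *)
      None
  | Some doc ->
      if Filename.is_relative system_id then
        (* Interpret the name relative to the directory of the document. *)
        Some (Filename.concat (Filename.dirname doc) system_id)
      else
        Some system_id
```

   The option-typed argument mirrors the distinction drawn in the list:
   only a source that knows its own file name can turn a relative name
   into an absolute one.
*)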
{ map : (node_type, 'ext node) Hashtbl.t;
  default_element : 'ext node;
  (* Specifies which node to use as exemplar for which node type. See the
   * manual for explanations.
   *)
val default_config : config
  (* - The resolver is able to read from files by name
   * - Warnings are thrown away
   * - Error messages will contain line numbers
   * - The internal encoding is ISO-8859-1
   * - The standalone declaration is checked
   *)
val default_extension : ('a node extension) as 'a
  (* A "null" extension; an extension that does not extend the functionality *)
val default_dom : ('a node extension as 'a) domspec
  (* Specifies that you do not want to use extensions. *)
val parse_dtd_entity : config -> source -> dtd
  (* Parse an entity containing a DTD, and return this DTD. *)
val parse_document_entity : config -> source -> 'ext domspec -> 'ext document
  (* Parse a closed document, i.e. a document beginning with <!DOCTYPE...>,
   * and validate the contents of the document against the DTD contained
   * and/or referenced in the document.
   *)
val parse_content_entity : config ->
  (* Parse a file representing a well-formed fragment of a document. The
   * fragment must be a single element (i.e. something like <a>...</a>;
   * not a sequence like <a>...</a><b>...</b>). The element is validated
   * against the passed DTD, but it is not checked whether the element is
   * the root element specified in the DTD.
   * Note that you can create DTDs that specify not to validate at all
   * (invoke method allow_arbitrary on the DTD).
   *)
val parse_wf_entity : config -> source -> 'ext domspec -> 'ext document
  (* Parse a closed document (see parse_document_entity), but do not
   * validate it. Only checks on well-formedness are performed.
   *)
(* ======================================================================
 * Revision 1.1 2000/11/17 09:57:30 lpadovan
 * Revision 1.1 2000/05/29 23:43:51 gerd
 *    Initial compatibility revision.
 * ======================================================================
 * Revision 1.4 2000/05/29 21:14:57 gerd
 *    Changed the type 'encoding' into a polymorphic variant.
 * Revision 1.3 2000/05/27 19:24:01 gerd
 *    New option: recognize_standalone_declaration.
 * Revision 1.2 2000/05/20 20:31:40 gerd
 *    Big change: Added support for various encodings of the
 *    internal representation.
 * Revision 1.1 2000/05/06 23:21:49 gerd
 * Revision 1.9 2000/04/30 18:23:38 gerd
 *    New config options 'processing_instructions_inline' and
 * Revision 1.8 2000/03/13 23:46:46 gerd
 *    Change: The 'resolver' component of the 'config' type has
 *    disappeared. Instead, there is a new resolver component in the Entity
 *    and ExtID values of 'source'. I hope that this makes clearer that the
 *    resolver has only an effect if used together with Entity and ExtID
 *    Change: The Entity value can now return the entity dependent
 *    on the DTD that is going to be used.
 * Revision 1.7 2000/02/22 02:32:02 gerd
 * Revision 1.6 2000/02/22 01:52:45 gerd
 *    Added documentation.
 * Revision 1.5 2000/01/20 20:54:43 gerd
 *    New config.errors_with_line_numbers.
 * Revision 1.4 1999/09/01 23:09:10 gerd
 *    New function parse_wf_entity that simulates a well-formedness
 * Revision 1.3 1999/09/01 16:26:36 gerd
 *    Added an empty line. This is *really* a big change.
 * Revision 1.2 1999/08/14 22:20:27 gerd
 *    The "config" slot now has a component "warner" which is
 *    an object with a "warn" method. This is used to warn about characters
 *    that cannot be represented in the Latin 1 alphabet.
 *    Furthermore, there is a new component "debugging_mode".
 * Revision 1.1 1999/08/10 00:35:52 gerd