This commit was manufactured by cvs2svn to create branch

diff --git a/helm/DEVEL/pxp/pxp/doc/manual/src/markup.sgml b/helm/DEVEL/pxp/pxp/doc/manual/src/markup.sgml

deleted file mode 100644 (file)

index 1cb2064..0000000
--- a/helm/DEVEL/pxp/pxp/doc/manual/src/markup.sgml
+++ /dev/null
@@ -1,5109 +0,0 @@
-<!DOCTYPE book PUBLIC "-//Davenport//DTD DocBook V3.0//EN" [
-<!ENTITY markup "<acronym>PXP</acronym>">
-<!ENTITY pxp "<acronym>PXP</acronym>">
-<!ENTITY % readme.code.to-html SYSTEM "readme.ent">
-<!ENTITY apos "&#39;">
-<!ENTITY percent "&#37;">
-<!ENTITY % get.markup-yacc.mli SYSTEM "yacc.mli.ent">
-<!ENTITY % get.markup-dtd.mli SYSTEM "dtd.mli.ent">
-%readme.code.to-html;
-%get.markup-yacc.mli;
-%get.markup-dtd.mli;
-
-<!ENTITY fun "-&gt;">                       <!-- function type operator -->
-
-]>
-
-
-<book>
-
-  <title>The PXP user's guide</title>
-  <bookinfo>
-    <!-- <bookbiblio> -->
-    <authorgroup>
-      <author>
-       <firstname>Gerd</firstname>
-       <surname>Stolpmann</surname>
-       <authorblurb>
-         <para>
-        <address>
-          <email>gerd@gerd-stolpmann.de</email>
-        </address>
-      </para>
-       </authorblurb>
-      </author>
-    </authorgroup>
-    
-    <copyright>
-      <year>1999, 2000</year><holder>Gerd Stolpmann</holder>
-    </copyright>
-    <!-- </bookbiblio> -->
-
-    <abstract>
-      <para>
-&markup; is a validating parser for XML-1.0 which has been
-written entirely in Objective Caml.
-</para>
-      <formalpara>
-       <title>Download &markup;: </title>
-       <para>
-The free &markup; library can be downloaded at
-<ulink URL="http://www.ocaml-programming.de/packages/">
-http://www.ocaml-programming.de/packages/
-</ulink>. This user's guide is included.
-New releases of &markup; are announced in
-<ulink URL="http://www.npc.de/ocaml/linkdb/">The OCaml Link
-Database</ulink>.
-</para>
-      </formalpara>
-    </abstract>
-
-    <legalnotice>
-      <title>License</title>
-      <para>
-This document, and the described software, "&markup;", are copyright by
-Gerd Stolpmann. 
-</para>
-
-<para>
-Permission is hereby granted, free of charge, to any person obtaining
-a copy of this document and the "&markup;" software (the
-"Software"), to deal in the Software without restriction, including
-without limitation the rights to use, copy, modify, merge, publish,
-distribute, sublicense, and/or sell copies of the Software, and to
-permit persons to whom the Software is furnished to do so, subject to
-the following conditions:
-</para>
-      <para>
-The above copyright notice and this permission notice shall be included
-in all copies or substantial portions of the Software.
-</para>
-      <para>
-The Software is provided ``as is'', without warranty of any kind, express
-or implied, including but not limited to the warranties of
-merchantability, fitness for a particular purpose and noninfringement.
-In no event shall Gerd Stolpmann be liable for any claim, damages or
-other liability, whether in an action of contract, tort or otherwise,
-arising from, out of or in connection with the Software or the use or
-other dealings in the software.
-</para>
-    </legalnotice>
-
-  </bookinfo>
-
-
-<!-- ********************************************************************** -->
-
-  <part>
-    <title>User's guide</title>
-    
-    <chapter>
-      <title>What is XML?</title>
-
-      <sect1>
-       <title>Introduction</title>
-
-       <para>XML (short for <emphasis>Extensible Markup Language</emphasis>)
-generalizes the idea that text documents are typically structured in sections,
-sub-sections, paragraphs, and so on. The format of the document is not fixed
-(as, for example, in HTML), but can be declared by a so-called DTD (document
-type definition). The DTD describes only the rules for how the document can be
-structured, but not how the document can be processed. For example, if you want
-to publish a book that uses XML markup, you will need a processor that converts
-the XML file into a printable format such as Postscript. On the one hand, the
-structure of XML documents is configurable; on the other hand, there is no
-longer a canonical interpretation of the elements of the document; for example
-one XML DTD might require that paragraphs are delimited by
-<literal>para</literal> tags, while another DTD expects <literal>p</literal> tags
-for the same purpose. As a result, for every DTD a new processor is required.
-</para>
-
-       <para>
-Although XML can be used to express structured text documents it is not limited
-to this kind of application. For example, XML can also be used to exchange
-structured data over a network, or to simply store structured data in
-files. Note that XML documents cannot contain arbitrary binary data because
-some characters are forbidden; for some applications you need to encode binary
-data as text (e.g. the base 64 encoding).
-</para>
-
-
-       <sect2>
-         <title>The "hello world" example</title>
-       <para>
-The following example shows a very simple DTD, and a corresponding document
-instance. The document is structured such that it consists of sections, and
-that sections consist of paragraphs, and that paragraphs contain plain text:
-</para>
-
-       <programlisting>
-<![CDATA[<!ELEMENT document (section)+>
-<!ELEMENT section (paragraph)+>
-<!ELEMENT paragraph (#PCDATA)>
-]]>
-</programlisting>
-
-       <para>The following document is an instance of this DTD:</para>
-      
-       <programlisting>
-<![CDATA[<?xml version="1.0" encoding="ISO-8859-1"?>
-<!DOCTYPE document SYSTEM "simple.dtd">
-<document>
-  <section>
-    <paragraph>This is a paragraph of the first section.</paragraph>
-    <paragraph>This is another paragraph of the first section.</paragraph>
-  </section>
-  <section>
-    <paragraph>This is the only paragraph of the second section.</paragraph>
-  </section>
-</document>
-]]>
-</programlisting>
-
-       <para>As in HTML (and, of course, in its ancestor SGML), the "pieces" of
-the document are delimited by element braces, i.e. such a piece begins with
-<literal>&lt;name-of-the-type-of-the-piece&gt;</literal> and ends with
-<literal>&lt;/name-of-the-type-of-the-piece&gt;</literal>, and the pieces are
-called <emphasis>elements</emphasis>. Unlike HTML and SGML, both start tags and
-end tags (i.e. the delimiters written in angle brackets) can never be left
-out. For example, HTML calls the paragraphs simply <literal>p</literal>, and
-because paragraphs never contain paragraphs, a sequence of several paragraphs
-can be written as:
-
-<programlisting><![CDATA[<p>First paragraph 
-<p>Second paragraph]]></programlisting>
-
-This is not possible in XML; continuing our example above we must always write
-
-<programlisting><![CDATA[<paragraph>First paragraph</paragraph>
-<paragraph>Second paragraph</paragraph>]]></programlisting>
-
-The rationale behind that is to (1) simplify the development of XML parsers
-(you need not convert the DTD into a deterministic finite automaton which is
-required to detect omitted tags), and to (2) make it possible to parse the
-document independent of whether the DTD is known or not.
-</para>
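-
-<para>
-The second point can be tried out with any non-validating parser. The following
-minimal sketch uses Python's standard library rather than &pxp;; the helper
-name <literal>is_well_formed</literal> and the wrapping
-<literal>section</literal> element are assumptions made for illustration. It
-checks only well-formedness and never reads a DTD:
-</para>
-
-<programlisting><![CDATA[
```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML; no DTD is consulted."""
    try:
        xml.dom.minidom.parseString(text)
        return True
    except ExpatError:
        return False


# HTML-style markup with omitted end tags, wrapped in a root element.
html_style = "<section><p>First paragraph <p>Second paragraph</section>"

# The same content with every end tag written out.
xml_style = (
    "<section><paragraph>First paragraph</paragraph>"
    "<paragraph>Second paragraph</paragraph></section>"
)

print(is_well_formed(html_style))  # False
print(is_well_formed(xml_style))   # True
```
-]]></programlisting>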
-
-<para>
-The first line of our sample document,
-
-<programlisting>
-<![CDATA[<?xml version="1.0" encoding="ISO-8859-1"?>]]>
-</programlisting>
-
-is the so-called <emphasis>XML declaration</emphasis>. It expresses that the
-document follows the conventions of XML version 1.0, and that the document is
-encoded using characters from the ISO-8859-1 character set (often known as
-"Latin 1", mostly used in Western Europe). Although the XML declaration is not
-mandatory, it is good style to include it; readers can then see at first glance
-that the document uses XML markup and not the similar-looking HTML or SGML
-markup languages. If you omit the XML declaration, the parser will assume
-that the document is encoded as UTF-8 or UTF-16 (there is a rule that makes
-it possible to distinguish between UTF-8 and UTF-16 automatically); these
-are encodings of Unicode's universal character set. (Note that &pxp;, unlike its
-predecessor "Markup", fully supports Unicode.)
-</para>
-
-<para>
-The second line,
-
-<programlisting>
-<![CDATA[<!DOCTYPE document SYSTEM "simple.dtd">]]>
-</programlisting>
-
-names the DTD that is going to be used for the rest of the document. In
-general, it is possible that the DTD consists of two parts, the so-called
-external and the internal subset. "External" means that the DTD exists as a
-second file; "internal" means that the DTD is included in the same file. In
-this example, there is only an external subset, and the system identifier
-"simple.dtd" specifies where the DTD file can be found. System identifiers are
-interpreted as URLs; for instance this would be legal:
-
-<programlisting>
-<![CDATA[<!DOCTYPE document SYSTEM "http://host/location/simple.dtd">]]>
-</programlisting>
-
-Please note that &pxp; cannot interpret HTTP identifiers by default, but it is
-possible to change the interpretation of system identifiers.
-</para>
-
-       <para>
-The word immediately following <literal>DOCTYPE</literal> determines which of
-the declared element types (here "document", "section", and "paragraph") is
-used for the outermost element, the <emphasis>root element</emphasis>. In this
-example it is <literal>document</literal> because the outermost element is
-delimited by <literal>&lt;document&gt;</literal> and
-<literal>&lt;/document&gt;</literal>. 
-</para>
-
-       <para>
-The DTD consists of three declarations for element types:
-<literal>document</literal>, <literal>section</literal>, and
-<literal>paragraph</literal>. Such a declaration has two parts:
-
-<programlisting>
-&lt;!ELEMENT <replaceable>name</replaceable> <replaceable>content-model</replaceable>&gt;
-</programlisting>
-
-The content model is a regular expression which describes the possible inner
-structure of the element. Here, <literal>document</literal> contains one or
-more sections, and a <literal>section</literal> contains one or more
-paragraphs. Note that these two element types are not allowed to contain
-arbitrary text. Only the <literal>paragraph</literal> element type is declared
-such that parsed character data (indicated by the symbol
-<literal>#PCDATA</literal>) is permitted.
-</para>
-
-       <para>
-See below for a detailed discussion of content models. 
-</para>
-       </sect2>
-
-       <sect2>
-         <title>XML parsers and processors</title>
-         <para>
-XML documents are human-readable, but this is not the main purpose of this
-language. XML has been designed such that documents can be read by a program
-called an <emphasis>XML parser</emphasis>. The parser checks that the document
-is well-formatted, and it represents the document as objects of the programming
-language. There are two aspects when checking the document: First, the document
-must follow some basic syntactic rules, such as that tags are written in angle
-brackets, that for every start tag there must be a corresponding end tag and so
-on. A document respecting these rules is
-<emphasis>well-formed</emphasis>. Second, the document must match the DTD in
-which case the document is <emphasis>valid</emphasis>. Many parsers check only
-well-formedness and ignore the DTD; &pxp; is designed such that it can
-also validate the document.
-</para>
-
-         <para>
-A parser alone is not a useful application; it only reads XML
-documents. The whole application working with XML-formatted data is called an
-<emphasis>XML processor</emphasis>. Often XML processors convert documents into
-another format, such as HTML or Postscript. Sometimes processors extract data
-from the documents and output the processed data as XML again. The parser
-can help the application process the document; for example, it can provide
-means to access the document in a specific manner. In particular, &pxp;
-provides an object-oriented access layer.
-</para>
-       </sect2>
-
-       <sect2>
-         <title>Discussion</title>
-         <para>
-As we have seen, there are two levels of description: On the one hand, XML can
-define rules about the format of a document (the DTD), on the other hand, XML
-expresses structured documents. There are a number of possible applications:
-</para>
-
-         <itemizedlist mark="bullet" spacing="compact">
-           <listitem>
-             <para>
-XML can be used to express structured texts. Unlike HTML, there is no canonical
-interpretation; one would have to write a backend for the DTD that translates
-the structured texts into a format that existing browsers, printers
-etc. understand. The advantage of a self-defined document format is that it is
-possible to design the format in a more problem-oriented way. For example, if
-the task is to extract reports from a database, one can use a DTD that reflects
-the structure of the report or the database. A possible approach would be to
-have an element type for every database table and for every column. Once the
-DTD has been designed, the report procedure can be split into a part that
-selects the database rows and outputs them as an XML document according to the
-DTD, and a part that translates the document into other formats. Of course,
-the latter part can be solved in a generic way, e.g. there may be configurable
-backends for all DTDs that follow the approach and have element types for
-tables and columns.
-</para>
-             
-             <para>
-XML plays the role of a configurable intermediate format. The database
-extraction function can be written without having to know the details of
-typesetting; the backends can be written without having to know the details of
-the database.
-</para>
-
-             <para>
-Of course, there are traditional solutions. One can define an ad hoc
-intermediate text file format. The disadvantage is that there are no names for
-the pieces of the format, and that such formats usually lack documentation
-because of this. Another solution would be to have a binary representation,
-either as a language-dependent or a language-independent structure (examples of the
-latter can be found in RPC implementations). The disadvantage is that it is
-harder to view such representations; one has to write pretty printers for this
-purpose. It is also more difficult to enter test data; XML is plain text that
-can be written using an arbitrary editor (Emacs even has a good XML mode,
-PSGML). All these alternatives suffer from a missing structure checker,
-i.e. the programs processing these formats usually do not check the input file
-or input object in detail; XML parsers check the syntax of the input (the
-so-called well-formedness check), and the advanced parsers like &markup; even
-verify that the structure matches the DTD (the so-called validation).
-</para>
-             
-           </listitem>
-
-           <listitem>
-             <para>
-XML can be used as a configurable communication language. A fundamental problem
-of every communication is that sender and receiver must follow the same
-conventions about the language. For data exchange, the question is usually
-which data records and fields are available, how they are syntactically
-composed, and which values are possible for the various fields. Similar
-questions arise for text document exchange. XML does not answer these questions
-completely, but it reduces the number of ambiguities for such conventions: The
-outlines of the syntax are specified by the DTD (but not necessarily the
-details), and XML introduces canonical names for the components of documents
-such that it is simpler to describe the rest of the syntax and the semantics
-informally.
-</para>
-           </listitem>
-
-           <listitem>
-             <para>
-XML is a data storage format. Currently, every software product tends to use
-its own way to store data; commercial software often does not describe such
-formats, and it is a pain to integrate such software into a bigger project. 
-XML can help to improve this situation when several applications share the same
-syntax of data files. DTDs are then neutral instances that check the format of
-data files independent of applications. 
-</para>
-           </listitem>
-
-         </itemizedlist>
-       </sect2>
-      </sect1>
-
-
-      <!-- ================================================== -->
-
-
-      <sect1>
-       <title>Highlights of XML</title>
-
-       <para>
-This section explains many of the features of XML, but not all, and some
-features not in detail. For a complete description, see the <ulink
-url="http://www.w3.org/TR/1998/REC-xml-19980210.html">XML
-specification</ulink>.
-</para>
-
-       <sect2>
-         <title>The DTD and the instance</title>
-         <para>
-The DTD contains various declarations; in general you can only use a feature if
-you have previously declared it. The document instance file may contain the
-full DTD, but it is also possible to split the DTD into an internal and an
-external subset. A document must begin as follows if the full DTD is included:
-
-<programlisting>
-&lt;?xml version="1.0" encoding="<replaceable>Your encoding</replaceable>"?&gt;
-&lt;!DOCTYPE <replaceable>root</replaceable> [
-  <replaceable>Declarations</replaceable>
-]&gt;
-</programlisting>
-
-These declarations are called the <emphasis>internal subset</emphasis>. Note
-that the usage of entities and conditional sections is restricted within the
-internal subset.
-</para>
-         <para>
-If the declarations are located in a different file, you can refer to this file
-as follows:
-
-<programlisting>
-&lt;?xml version="1.0" encoding="<replaceable>Your encoding</replaceable>"?&gt;
-&lt;!DOCTYPE <replaceable>root</replaceable> SYSTEM "<replaceable>file name</replaceable>"&gt;
-</programlisting>
-
-The declarations in the file are called the <emphasis>external
-subset</emphasis>. The file name is called the <emphasis>system
-identifier</emphasis>. 
-It is also possible to refer to the file by a so-called
-<emphasis>public identifier</emphasis>, but most XML applications won't use
-this feature.
-</para>
-         <para>
-You can also specify both internal and external subsets. In this case, the
-declarations of both subsets are merged, and if there are conflicts, a
-declaration in the internal subset overrides the declaration in the external
-subset that has the same name. This looks as follows:
-
-<programlisting>
-&lt;?xml version="1.0" encoding="<replaceable>Your encoding</replaceable>"?&gt;
-&lt;!DOCTYPE <replaceable>root</replaceable>  SYSTEM "<replaceable>file name</replaceable>" [
-  <replaceable>Declarations</replaceable>
-]&gt;
-</programlisting>
-</para>
-
-         <para>
-The XML declaration (the string beginning with <literal>&lt;?xml</literal> and
-ending at <literal>?&gt;</literal>) should specify the encoding of the
-file. Common values are UTF-8, and the ISO-8859 series of character sets. Note
-that every file parsed by the XML processor can begin with an XML declaration
-and that every file may have its own encoding.
-</para>
-
-         <para>
-The name of the root element must be mentioned directly after the
-<literal>DOCTYPE</literal> string. This means that a full document instance
-looks like
-
-<programlisting>
-&lt;?xml version="1.0" encoding="<replaceable>Your encoding</replaceable>"?&gt;
-&lt;!DOCTYPE <replaceable>root</replaceable>  SYSTEM "<replaceable>file name</replaceable>" [
-  <replaceable>Declarations</replaceable>
-]&gt;
-
-&lt;<replaceable>root</replaceable>&gt;
-  <replaceable>inner contents</replaceable>
-&lt;/<replaceable>root</replaceable>&gt;
-</programlisting>
-</para>
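-
-<para>
-As a concrete illustration of a document that carries its declarations in the
-internal subset, the following sketch parses one with Python's standard
-<literal>xml.etree.ElementTree</literal> module (not with &pxp;). The element
-name <literal>note</literal>, the entity name <literal>product</literal>, and
-the constant <literal>DOCUMENT</literal> are made up for this example. Note
-that this parser does not validate; it only reads the declarations and expands
-the internal entity:
-</para>
-
-<programlisting><![CDATA[
```python
import xml.etree.ElementTree as ET

# The declarations between "[" and "]" form the internal subset.
# There is no external subset, so no SYSTEM identifier is given.
DOCUMENT = """<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
  <!ENTITY product "PXP">
]>
<note>Text about &product;.</note>"""

root = ET.fromstring(DOCUMENT)

print(root.tag)   # note
print(root.text)  # Text about PXP.
```
-]]></programlisting>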
-       </sect2>
-
-        <!-- ======================================== -->
-
-       <sect2>
-         <title>Reserved characters</title>
-         <para>
-Some characters are generally reserved to indicate markup such that they cannot
-be used for character data. These characters are &lt;, &gt;, and
-&amp;. Furthermore, single and double quotes are sometimes reserved. If you
-want to include such a character as a literal character, write it as follows:
-
-<itemizedlist mark="bullet" spacing="compact">
-             <listitem>
-               <para>
-<literal>&amp;lt;</literal> instead of &lt;
-</para>
-             </listitem>
-             <listitem>
-               <para>
-<literal>&amp;gt;</literal> instead of &gt;
-</para>
-             </listitem>
-             <listitem>
-               <para>
-<literal>&amp;amp;</literal> instead of &amp;
-</para>
-             </listitem>
-             <listitem>
-               <para>
-<literal>&amp;apos;</literal> instead of '
-</para>
-             </listitem>
-             <listitem>
-               <para>
-<literal>&amp;quot;</literal> instead of "
-</para>
-             </listitem>
-           </itemizedlist>
-
-All other characters are free in the document instance. It is possible to
-include a character by its position in the Unicode alphabet: 
-
-<programlisting>
-&amp;#<replaceable>n</replaceable>;
-</programlisting>
-
-where <replaceable>n</replaceable> is the decimal number of the
-character. Alternatively, you can specify the character by its hexadecimal
-number: 
-
-<programlisting>
-&amp;#x<replaceable>n</replaceable>;
-</programlisting>
-
-In the scope of declarations, the character % is no longer free. To include it
-as a literal character, you must use the notations <literal>&amp;#37;</literal> or
-<literal>&amp;#x25;</literal>.
-</para>
-
-         <para>Note that besides &amp;lt;, &amp;gt;, &amp;amp;,
-&amp;apos;, and &amp;quot; there are no predefined character entities. This is
-different from HTML which defines a list of characters that can be referenced
-by name (e.g. &amp;auml; for ä); however, if you prefer named characters, you
-can declare such entities yourself (see below).</para>
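-
-<para>
-A program that writes XML usually applies these replacements automatically.
-The following is a minimal sketch in Python, assuming the standard library
-function <literal>xml.sax.saxutils.escape</literal>; the helper names
-<literal>escape_text</literal> and <literal>char_ref</literal> are
-hypothetical and exist only in this example:
-</para>
-
-<programlisting><![CDATA[
```python
from xml.sax.saxutils import escape

# escape() always rewrites &, <, and >; the two quote characters
# are only rewritten when an extra mapping is passed in.
QUOTES = {"'": "&apos;", '"': "&quot;"}


def escape_text(text: str) -> str:
    """Replace all five reserved characters by their entity references."""
    return escape(text, QUOTES)


def char_ref(ch: str, hexadecimal: bool = False) -> str:
    """Build a numeric character reference from the character's code point."""
    code = ord(ch)
    return f"&#x{code:X};" if hexadecimal else f"&#{code};"


print(escape_text('a < b & "c"'))  # a &lt; b &amp; &quot;c&quot;
print(char_ref("%"))               # &#37;
print(char_ref("%", True))         # &#x25;
```
-]]></programlisting>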
-       </sect2>
-
-
-        <!-- ======================================== -->
-
-       <sect2>
-         <title>Elements and ELEMENT declarations</title>
-
-         <para>
-Elements structure the document instance in a hierarchical way. There is a
-top-level element, the <emphasis>root element</emphasis>, which contains a
-sequence of inner elements and character sections. The inner elements are
-structured in the same way. Every element has an <emphasis>element
-type</emphasis>. The beginning of the element is indicated by a <emphasis>start
-tag</emphasis>, written
-
-<programlisting>
-&lt;<replaceable>element-type</replaceable>&gt;
-</programlisting>
-
-and the element continues until the corresponding <emphasis>end tag</emphasis>
-is reached:
-
-<programlisting>
-&lt;/<replaceable>element-type</replaceable>&gt;
-</programlisting>
-
-In XML, it is not allowed to omit start or end tags, even if the DTD would
-permit this. Note that there are no special rules how to interpret spaces or
-newlines near start or end tags; all spaces and newlines count.
-</para>
-
-         <para>
-Every element type must be declared before it can be used. The declaration
-consists of two parts: the ELEMENT declaration describes the content model,
-i.e. which inner elements are allowed; the ATTLIST declaration describes the
-attributes of the element.
-</para>
-
-         <para>
-An element can simply allow everything as content. This is written:
-
-<programlisting>
-&lt;!ELEMENT <replaceable>name</replaceable> ANY&gt;
-</programlisting>
-
-Conversely, an element can be forced to be empty; this is declared by:
-
-<programlisting>
-&lt;!ELEMENT <replaceable>name</replaceable> EMPTY&gt;
-</programlisting>
-
-Note that there is an abbreviated notation for empty element instances:
-<literal>&lt;<replaceable>name</replaceable>/&gt;</literal>. 
-</para>
-
-         <para>
-There are two more sophisticated forms of declarations: so-called
-<emphasis>mixed declarations</emphasis>, and <emphasis>regular
-expressions</emphasis>. An element with mixed content contains character data
-interspersed with inner elements, and the set of allowed inner elements can be
-specified. In contrast to this, a regular expression declaration does not allow
-character data, but the inner elements can be described by the more powerful
-means of regular expressions.
-</para>
-
-         <para>
-A declaration for mixed content looks as follows:
-
-<programlisting>
-&lt;!ELEMENT <replaceable>name</replaceable> (#PCDATA | <replaceable>element<subscript>1</subscript></replaceable> | ... | <replaceable>element<subscript>n</subscript></replaceable> )*&gt;
-</programlisting>
-
-or if you do not want to allow any inner element, simply
-
-<programlisting>
-&lt;!ELEMENT <replaceable>name</replaceable> (#PCDATA)&gt;
-</programlisting>
-</para>
-
-
-<blockquote>
-             <title>Example</title>
-             <para>
-If element type <literal>q</literal> is declared as
-
-<programlisting>
-<![CDATA[<!ELEMENT q (#PCDATA | r | s)*>]]>
-</programlisting>
-
-this is a legal instance:
-
-<programlisting>
-<![CDATA[<q>This is character data<r></r>with <s></s>inner elements</q>]]>
-</programlisting>
-
-But this is illegal because <literal>t</literal> has not been enumerated in the
-declaration:
-
-<programlisting>
-<![CDATA[<q>This is character data<r></r>with <t></t>inner elements</q>]]>
-</programlisting>
-</para>
-           </blockquote>
-         
-         <para>
-The other form uses a regular expression to describe the possible contents:
-
-<programlisting>
-&lt;!ELEMENT <replaceable>name</replaceable> <replaceable>regexp</replaceable>&gt;
-</programlisting>
-
-The following well-known regexp operators are allowed:
-
-<itemizedlist mark="bullet" spacing="compact">
-             <listitem>
-               <para>
-<literal><replaceable>element-name</replaceable></literal>
-</para>
-             </listitem>
-             
-             <listitem>
-               <para>
-<literal>(<replaceable>subexpr<subscript>1</subscript></replaceable> ,</literal> ... <literal>, <replaceable>subexpr<subscript>n</subscript></replaceable> )</literal>
-</para>
-             </listitem>
-             
-             <listitem>
-               <para>
-<literal>(<replaceable>subexpr<subscript>1</subscript></replaceable> |</literal> ... <literal>| <replaceable>subexpr<subscript>n</subscript></replaceable> )</literal>
-</para>
-             </listitem>
-             
-             <listitem>
-               <para>
-<literal><replaceable>subexpr</replaceable>*</literal>
-</para>
-             </listitem>
-             
-             <listitem>
-               <para>
-<literal><replaceable>subexpr</replaceable>+</literal>
-</para>
-             </listitem>
-             
-             <listitem>
-               <para>
-<literal><replaceable>subexpr</replaceable>?</literal>
-</para>
-             </listitem>
-           </itemizedlist>
-
-The <literal>,</literal> operator indicates a sequence of sub-models, the
-<literal>|</literal> operator describes alternative sub-models. The
-<literal>*</literal> indicates zero or more repetitions, and
-<literal>+</literal> one or more repetitions. Finally, <literal>?</literal> can
-be used for optional sub-models. As atoms the regexp can contain names of
-elements; note that it is not allowed to include <literal>#PCDATA</literal>.
-</para>
-
-         <para>
-The exact syntax of the regular expressions is rather strange. This can be
-explained best by a list of constraints:
-
-<itemizedlist mark="bullet" spacing="compact">
-             <listitem>
-               <para>
-The outermost expression must not be
-<literal><replaceable>element-name</replaceable></literal>. 
-</para>
-               <para><emphasis>Illegal:</emphasis> 
-<literal><![CDATA[<!ELEMENT x y>]]></literal>; this must be written as
-<literal><![CDATA[<!ELEMENT x (y)>]]></literal>.</para>
-             </listitem>
-             <listitem>
-               <para>
-For the unary operators <literal><replaceable>subexpr</replaceable>*</literal>,
-<literal><replaceable>subexpr</replaceable>+</literal>, and
-<literal><replaceable>subexpr</replaceable>?</literal>, the
-<literal><replaceable>subexpr</replaceable></literal> must not itself be a
-unary operator.
-</para>
-               <para><emphasis>Illegal:</emphasis> 
-<literal><![CDATA[<!ELEMENT x y**>]]></literal>; this must be written as
-<literal><![CDATA[<!ELEMENT x (y*)*>]]></literal>.</para>
-      </listitem>
-             <listitem>
-               <para>
-Between <literal>)</literal> and one of the unary operators
-<literal>*</literal>, <literal>+</literal>, or <literal>?</literal>, there must
-not be whitespace.</para>
-               <para><emphasis>Illegal:</emphasis> 
-<literal><![CDATA[<!ELEMENT x (y|z) *>]]></literal>; this must be written as
-<literal><![CDATA[<!ELEMENT x (y|z)*>]]></literal>.</para>
-             </listitem>
-             <listitem><para>There is the additional constraint that the
-right parenthesis must be contained in the same entity as the left parenthesis;
-see the section about parsed entities below.</para>
-             </listitem>
-           </itemizedlist>
-
-</para>
-
-<para>
-Note that there is another restriction: the regular expressions must be
-deterministic. This means that the parser must be able to see by looking at the
-next token which alternative is actually used, or whether the repetition
-stops. The reason for this is simply compatibility with SGML (there is no
-intrinsic reason for this rule; XML can live without this restriction).
-</para>
-
-         <blockquote>
-           <title>Example</title>
-           <para>
-The elements are declared as follows:
-
-<programlisting>
-<![CDATA[<!ELEMENT q (r?, (s | t)+)>
-<!ELEMENT r (#PCDATA)>
-<!ELEMENT s EMPTY>
-<!ELEMENT t (q | r)>
-]]></programlisting>
-
-This is a legal instance:
-
-<programlisting>
-<![CDATA[<q><r>Some characters</r><s/></q>]]>
-</programlisting>
-
-(Note: <literal>&lt;s/&gt;</literal> is an abbreviation for
-<literal>&lt;s&gt;&lt;/s&gt;</literal>.)
-
-It would be illegal to leave <literal><![CDATA[<s/>]]></literal> out because at
-least one instance of <literal>s</literal> or <literal>t</literal> must be
-present. It would be illegal, too, if characters existed outside the
-<literal>r</literal> element; the only exception is white space. The following
-instance is legal, too:
-
-<programlisting>
-<![CDATA[<q><s/><t><q><s/></q></t></q>]]>
-</programlisting>
-</para>
-         </blockquote>
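-
-<para>
-To make the idea of a content model as a regular expression concrete, the
-following sketch rewrites the model of <literal>q</literal> from the example
-above as an ordinary Python regular expression over the names of the child
-elements. This is only an illustration and not how &pxp; validates: the names
-<literal>Q_MODEL</literal> and <literal>children_match</literal> are
-hypothetical, only the direct children of the root element are checked, and
-character data is ignored:
-</para>
-
-<programlisting><![CDATA[
```python
import re
import xml.etree.ElementTree as ET

# Content model (r?, (s | t)+) expressed over a string of child element
# names in which every name is followed by exactly one space.
Q_MODEL = re.compile(r"(r )?((s|t) )+")


def children_match(xml_text: str) -> bool:
    """Check the direct children of the root element against Q_MODEL."""
    root = ET.fromstring(xml_text)
    names = "".join(child.tag + " " for child in root)
    return Q_MODEL.fullmatch(names) is not None


print(children_match("<q><r>Some characters</r><s/></q>"))  # True
print(children_match("<q><r>Some characters</r></q>"))      # False
print(children_match("<q><s/><t><q><s/></q></t></q>"))      # True
```
-]]></programlisting>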
-
-       </sect2>
-
-        <!-- ======================================== -->
-
-       <sect2>
-         <title>Attribute lists and ATTLIST declarations</title>
-         <para>
-Elements may have attributes. These are put into the start tag of an element as
-follows:
-
-<programlisting>
-&lt;<replaceable>element-name</replaceable> <replaceable>attribute<subscript>1</subscript></replaceable>="<replaceable>value<subscript>1</subscript></replaceable>" ... <replaceable>attribute<subscript>n</subscript></replaceable>="<replaceable>value<subscript>n</subscript></replaceable>"&gt;
-</programlisting>
-
-Instead of
-<literal>"<replaceable>value<subscript>k</subscript></replaceable>"</literal>
-it is also possible to use single quotes as in
-<literal>'<replaceable>value<subscript>k</subscript></replaceable>'</literal>.
-Note that you cannot use double quotes literally within the value of the
-attribute if double quotes are the delimiters; the same applies to single
-quotes. In general, you cannot use &lt; and &amp; as characters in attribute
-values. It is possible to include the entity references &amp;lt;, &amp;gt;,
-&amp;amp;, &amp;apos;, and &amp;quot; (and any other reference to a general
-entity as long as the entity is not defined by an external file) as well as
-&amp;#<replaceable>n</replaceable>;.
-</para>
-
-         <para>
-Before you can use an attribute you must declare it. An ATTLIST declaration
-looks as follows:
-
-<programlisting>
-&lt;!ATTLIST <replaceable>element-name</replaceable> 
-          <replaceable>attribute-name</replaceable> <replaceable>attribute-type</replaceable> <replaceable>attribute-default</replaceable>
-          ...
-          <replaceable>attribute-name</replaceable> <replaceable>attribute-type</replaceable> <replaceable>attribute-default</replaceable>
-&gt;
-</programlisting>
-
-There are many attribute types; the most important are:
-
-<itemizedlist mark="bullet" spacing="compact">
-             <listitem>
-               <para>
-<literal>CDATA</literal>: Every string is allowed as attribute value.
-</para>
-             </listitem>
-             <listitem>
-               <para>
-<literal>NMTOKEN</literal>: Every nametoken is allowed as attribute
-value. Nametokens consist (mainly) of letters, digits, ., :, -, _ in arbitrary
-order.
-</para>
-             </listitem>
-             <listitem>
-               <para>
-<literal>NMTOKENS</literal>: A space-separated list of nametokens is allowed as
-attribute value.
-</para>
-             </listitem>
-           </itemizedlist>
-
-The most interesting default declarations are:
-
-<itemizedlist mark="bullet" spacing="compact">
-             <listitem>
-               <para>
-<literal>#REQUIRED</literal>: The attribute must be specified.
-</para>
-             </listitem>
-             <listitem>
-               <para>
-<literal>#IMPLIED</literal>: The attribute can be specified but also can be
-left out. The application can find out whether the attribute was present or
-not. 
-</para>
-             </listitem>
-             <listitem>
-               <para>
-<literal>"<replaceable>value</replaceable>"</literal> or
-<literal>'<replaceable>value</replaceable>'</literal>: This particular value is
-used as default if the attribute is omitted in the element.
-</para>
-             </listitem>
-           </itemizedlist>
-</para>
-
-         <blockquote>
-           <title>Example</title>
-           <para>
-This is a valid attribute declaration for element type <literal>r</literal>:
-
-<programlisting>
-<![CDATA[<!ATTLIST r 
-          x CDATA    #REQUIRED
-          y NMTOKEN  #IMPLIED
-          z NMTOKENS "one two three">
-]]></programlisting>
-
-This means that <literal>x</literal> is a required attribute that cannot be
-left out, while <literal>y</literal> and <literal>z</literal> are optional. The
-XML parser tells the application whether <literal>y</literal> is present or
-not, but if <literal>z</literal> is missing the default value
-"one two three" is returned automatically. 
-</para>
-
-           <para>
-This is a valid example of these attributes:
-
-<programlisting>
-<![CDATA[<r x="He said: &quot;I don't like quotes!&quot;" y='1'>]]>
-</programlisting>
-</para>
-         </blockquote>
-
-       </sect2>
-
-       <sect2>
-         <title>Parsed entities</title>
-         <para>
-Elements describe the logical structure of the document, while
-<emphasis>entities</emphasis> determine the physical structure. Entities are
-the pieces of text the parser operates on, mostly files and macros. Entities
-may be <emphasis>parsed</emphasis> in which case the parser reads the text and
-interprets it as XML markup, or <emphasis>unparsed</emphasis> which simply
-means that the data of the entity has a foreign format (e.g. a GIF icon).
-</para>
-
-         <para>If the parsed entity is going to be used as part of the DTD, it
-is called a <emphasis>parameter entity</emphasis>. You can declare a parameter
-entity with a fixed text as content by:
-
-<programlisting>
-&lt;!ENTITY % <replaceable>name</replaceable> "<replaceable>value</replaceable>"&gt;
-</programlisting>
-
-Within the DTD, you can <emphasis>refer to</emphasis> this entity, i.e. read
-the text of the entity, by:
-
-<programlisting>
-%<replaceable>name</replaceable>;
-</programlisting>
-
-Such entities behave like macros, i.e. when they are referred to, the
-macro text is inserted and read instead of the original text.
-
-<blockquote>
-             <title>Example</title>
-             <para>
-For example, you can declare two elements with the same content model by:
-
-<programlisting>
-<![CDATA[
-<!ENTITY % model "a | b | c">
-<!ELEMENT x (%model;)>
-<!ELEMENT y (%model;)>
-]]>
-</programlisting>
-
-</para>
-           </blockquote>
-
-If the contents of the entity are given as string constant, the entity is
-called an <emphasis>internal</emphasis> entity. It is also possible to name a
-file to be used as content (an <emphasis>external</emphasis> entity):
-
-<programlisting>
-&lt;!ENTITY % <replaceable>name</replaceable> SYSTEM "<replaceable>file name</replaceable>"&gt;
-</programlisting>
-
-There are some restrictions for parameter entities:
-
-<itemizedlist mark="bullet" spacing="compact">
-             <listitem>
-               <para>
-If the internal parameter entity contains the first token of a declaration
-(i.e. <literal>&lt;!</literal>), it must also contain the last token of the
-declaration, i.e. the <literal>&gt;</literal>. This means that the entity
-either contains a whole number of complete declarations, or some text from the
-middle of one declaration.
-</para>
-<para><emphasis>Illegal:</emphasis>
-<programlisting>
-<![CDATA[
-<!ENTITY % e "(a | b | c)>">
-<!ELEMENT x %e;
-]]></programlisting> Because <literal>&lt;!</literal> is contained in the main
-entity, and the corresponding <literal>&gt;</literal> is contained in the
-entity <literal>e</literal>.</para>
-             </listitem>
-             <listitem>
-               <para>
-If the internal parameter entity contains a left parenthesis, it must also
-contain the corresponding right parenthesis.
-</para>
-<para><emphasis>Illegal:</emphasis>
-<programlisting>
-<![CDATA[
-<!ENTITY % e "(a | b | c">
-<!ELEMENT x %e;)>
-]]></programlisting> Because <literal>(</literal> is contained in the entity 
-<literal>e</literal>, and the corresponding <literal>)</literal> is
-contained in the main entity.</para>
-             </listitem>
-             <listitem>
-               <para>
-When reading text from an entity, the parser automatically inserts one space
-character before the entity text and one space character after the entity
-text. However, this rule is not applied within the definition of another
-entity.</para>
-<para><emphasis>Legal:</emphasis>
-<programlisting>
-<![CDATA[ 
-<!ENTITY % suffix "gif"> 
-<!ENTITY iconfile 'icon.%suffix;'>
-]]></programlisting> Because <literal>%suffix;</literal> is referenced within
-the definition text for <literal>iconfile</literal>, no additional spaces are
-added.
-</para>
-<para><emphasis>Illegal:</emphasis>
-<programlisting>
-<![CDATA[
-<!ENTITY % suffix "test">
-<!ELEMENT x.%suffix; ANY>
-]]></programlisting>
-Because <literal>%suffix;</literal> is referenced outside the definition
-text of another entity, the parser replaces <literal>%suffix;</literal> by
-<literal><replaceable>space</replaceable>test<replaceable>space</replaceable></literal>. </para>
-<para><emphasis>Illegal:</emphasis>
-<programlisting>
-<![CDATA[
-<!ENTITY % e "(a | b | c)">
-<!ELEMENT x %e;*>
-]]></programlisting> Because the space inserted after the entity text separates
-<literal>)</literal> from <literal>*</literal>, which is illegal.</para>
-             </listitem>
-             <listitem>
-               <para>
-An external parameter entity must always consist of a whole number of complete
-declarations.
-</para>
-             </listitem>
-             <listitem>
-               <para>
-In the internal subset of the DTD, a reference to a parameter entity (internal
-or external) is only allowed at positions where a new declaration can start.
-</para>
-             </listitem>
-           </itemizedlist>
-</para>
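-
-         <para>
-As a minimal sketch of the last two restrictions, assume a hypothetical file
-<literal>common.ent</literal> (the file and element names are chosen for
-illustration only) that consists of complete declarations:
-
-<programlisting>
-<![CDATA[
-<!-- common.ent: only whole declarations -->
-<!ELEMENT a (#PCDATA)>
-<!ELEMENT b (#PCDATA)>
-]]></programlisting>
-
-An internal subset could then declare the external parameter entity and refer
-to it at a position where a new declaration can start:
-
-<programlisting>
-<![CDATA[
-<!DOCTYPE x [
-<!ENTITY % common SYSTEM "common.ent">
-%common;
-<!ELEMENT x (a | b)*>
-]>
-]]></programlisting>
-</para>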
-
-         <para>
-If the parsed entity is going to be used in the document instance, it is called
-a <emphasis>general entity</emphasis>. Such entities can be used as
-abbreviations for frequent phrases, or to include external files. Internal
-general entities are declared as follows:
-
-<programlisting>
-&lt;!ENTITY <replaceable>name</replaceable> "<replaceable>value</replaceable>"&gt;
-</programlisting>
-
-External general entities are declared this way:
-
-<programlisting>
-&lt;!ENTITY <replaceable>name</replaceable> SYSTEM "<replaceable>file name</replaceable>"&gt;
-</programlisting>
-
-References to general entities are written as:
-
-<programlisting>
-&<replaceable>name</replaceable>;
-</programlisting>
-
-The main difference between parameter and general entities is that the former
-are only recognized in the DTD and that the latter are only recognized in the
-document instance. As the DTD is parsed before the document, the parameter
-entities are expanded first; for example it is possible to use the content of a
-parameter entity as the name of a general entity:
-<literal>&amp;#38;%name;;</literal><footnote><para>This construct is only
-allowed within the definition of another entity; otherwise extra spaces would
-be added (as explained above). Such indirection is not recommended.
-</para>
-<para>Complete example:
-<programlisting>
-<![CDATA[
-<!ENTITY % variant "a">      <!-- or "b" -->
-<!ENTITY text-a "This is text A.">
-<!ENTITY text-b "This is text B.">
-<!ENTITY text "&#38;text-%variant;;">
-]]></programlisting>
-You can now write <literal>&amp;text;</literal> in the document instance, and
-depending on the value of <literal>variant</literal> either
-<literal>text-a</literal> or <literal>text-b</literal> is inserted.</para>
-</footnote>.
-</para>
-         <para>
-General entities must respect the element hierarchy. This means that there must
-be an end tag for every start tag in the entity value, and that end tags
-without corresponding start tags are not allowed.
-</para>
-
-         <blockquote>
-           <title>Example</title>
-           <para>
-If the author of a document changes sometimes, it is worthwhile to set up a
-general entity containing the names of the authors. If the author changes, you
-need only to change the definition of the entity, and do not need to check all
-occurrences of authors' names:
-
-<programlisting>
-<![CDATA[
-<!ENTITY authors "Gerd Stolpmann">
-]]>
-</programlisting>
-
-In the document text, you can now refer to the author names by writing
-<literal>&amp;authors;</literal>.
-</para>
-
-           <para>
-<emphasis>Illegal:</emphasis>
-The following two entities are illegal because the elements in the definition
-do not nest properly:
-
-<programlisting>
-<![CDATA[
-<!ENTITY lengthy-tag "<section textcolor='white' background='graphic'>">
-<!ENTITY nonsense    "<a></b>">
-]]></programlisting>
-</para>
-         </blockquote>
-
-         <para>
-Earlier in this introduction we explained that there are substitutes for
-reserved characters: &amp;lt;, &amp;gt;, &amp;amp;, &amp;apos;, and
-&amp;quot;. These are simply predefined general entities; note that they are
-the only predefined entities. It is allowed to define these entities again
-as long as the meaning is unchanged.
-</para>
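-
-         <para>
-For illustration, the XML-1.0 specification gives the following declarations
-as a way to define the predefined entities again without changing their
-meaning; the double escaping of <literal>lt</literal> and
-<literal>amp</literal> keeps the replacement text well-formed:
-
-<programlisting>
-<![CDATA[
-<!ENTITY lt     "&#38;#60;">
-<!ENTITY gt     "&#62;">
-<!ENTITY amp    "&#38;#38;">
-<!ENTITY apos   "&#39;">
-<!ENTITY quot   "&#34;">
-]]></programlisting>
-</para>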
-       </sect2>
-
-       <sect2>
-         <title>Notations and unparsed entities</title>
-         <para>
-Unparsed entities have a foreign format and can thus not be read by the XML
-parser. Unparsed entities are always external. The format of an unparsed entity
-must be declared; such a format is called a
-<emphasis>notation</emphasis>. The entity can then be declared by referring to
-this notation. As unparsed entities do not contain XML text, it is not possible
-to include them directly into the document; you can only declare attributes
-such that names of unparsed entities are acceptable values.
-</para>
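-
-         <para>
-A minimal sketch of these declarations might look as follows; the names
-<literal>gif</literal>, <literal>logo</literal>, and <literal>icon</literal>
-as well as the file name are hypothetical:
-
-<programlisting>
-<![CDATA[
-<!NOTATION gif SYSTEM "image/gif">          <!-- declares the format -->
-<!ENTITY logo SYSTEM "logo.gif" NDATA gif>  <!-- unparsed entity -->
-<!ELEMENT icon EMPTY>
-<!ATTLIST icon
-          src ENTITY #REQUIRED>             <!-- value must name an unparsed entity -->
-]]></programlisting>
-
-In the document instance the entity is then only mentioned by name, as in
-<literal><![CDATA[<icon src="logo"/>]]></literal>; a reference such as
-<literal>&amp;logo;</literal> would be an error.
-</para>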
-
-         <para>
-As you can see, unparsed entities are too complicated to be of much practical
-use. It is almost always better to simply pass the name of the data file as a
-normal attribute value, and let the application recognize and process the
-foreign format. 
-</para>
-       </sect2>
-
-      </sect1>
-
-
-      <!-- ================================================== -->
-
-
-      <sect1 id="sect.readme.dtd">
-       <title>A complete example: The <emphasis>readme</emphasis> DTD</title>
-       <para>
-The reason for <emphasis>readme</emphasis> was that I often wrote two versions
-of files such as README and INSTALL which explain aspects of a distributed
-software archive; one version was ASCII-formatted, the other was written in
-HTML. Maintaining both versions means double the amount of work, and changes
-to one version may be forgotten in the other. To improve this situation
-I invented the <emphasis>readme</emphasis> DTD which allows me to maintain only
-one source written as XML document, and to generate the ASCII and the HTML
-version from it.
-</para>
-
-       <para>
-In this section, I explain only the DTD. The <emphasis>readme</emphasis> DTD is
-contained in the &markup; distribution together with the two converters to
-produce ASCII and HTML. Another <link
-linkend="sect.readme.to-html">section</link> of this manual describes the HTML
-converter.
-</para>
-
-       <para>
-The documents have a simple structure: There are up to three levels of nested
-sections, paragraphs, item lists, footnotes, hyperlinks, and text emphasis. The
-outermost element usually has the type <literal>readme</literal>; it is
-declared by
-
-<programlisting>
-<![CDATA[<!ELEMENT readme (sect1+)>
-<!ATTLIST readme
-          title CDATA #REQUIRED>
-]]></programlisting>
-
-This means that this element contains one or more sections of the first level
-(element type <literal>sect1</literal>), and that the element has a required
-attribute <literal>title</literal> containing character data (CDATA). Note that
-<literal>readme</literal> elements must not contain text data.
-</para>
-
-       <para>
-The three levels of sections are declared as follows:
-
-<programlisting>
-<![CDATA[<!ELEMENT sect1 (title,(sect2|p|ul)+)>
-
-<!ELEMENT sect2 (title,(sect3|p|ul)+)>
-
-<!ELEMENT sect3 (title,(p|ul)+)>
-]]></programlisting>
-
-Every section has a <literal>title</literal> element as first subelement. After
-the title an arbitrary but non-empty sequence of inner sections, paragraphs and
-item lists follows. Note that the inner sections must belong to the next higher
-section level; <literal>sect3</literal> elements must not contain inner
-sections because there is no next higher level.
-</para>
-
-       <para>
-Obviously, all three declarations allow paragraphs (<literal>p</literal>) and
-item lists (<literal>ul</literal>). The definition can be simplified at this
-point by using a parameter entity:
-
-<programlisting>
-<![CDATA[<!ENTITY % p.like "p|ul">
-
-<!ELEMENT sect1 (title,(sect2|%p.like;)+)>
-
-<!ELEMENT sect2 (title,(sect3|%p.like;)+)>
-
-<!ELEMENT sect3 (title,(%p.like;)+)>
-]]></programlisting>
-
-Here, the entity <literal>p.like</literal> is nothing but a macro abbreviating
-the alternatives shared by the three content models; if new elements on the
-same level as <literal>p</literal> and <literal>ul</literal> are later added,
-it is sufficient to change only the entity definition. Note that there are some
-restrictions on the usage of entities in this context; most importantly,
-entities containing a left parenthesis must also contain the corresponding
-right parenthesis. 
-</para>
-
-       <para>
-Note that the entity <literal>p.like</literal> is a
-<emphasis>parameter</emphasis> entity, i.e. the ENTITY declaration contains a
-percent sign, and the entity is referred to by
-<literal>%p.like;</literal>. This kind of entity must be used to abbreviate
-parts of the DTD; the <emphasis>general</emphasis> entities declared without
-percent sign and referred to as <literal>&amp;name;</literal> are not allowed
-in this context.
-</para>
-
-       <para>
-The <literal>title</literal> element specifies the title of the section in
-which it occurs. The title is given as character data, optionally interspersed
-with line breaks (<literal>br</literal>):
-
-<programlisting>
-<![CDATA[<!ELEMENT title (#PCDATA|br)*>
-]]></programlisting>
-
-Compared with the <literal>title</literal> <emphasis>attribute</emphasis> of
-the <literal>readme</literal> element, this element allows inner markup
-(i.e. <literal>br</literal>) while attribute values do not: it is an error if
-an attribute value contains the left angle bracket &lt; literally, so it is
-impossible to include inner elements there. 
-</para>
-
-       <para>
-The paragraph element <literal>p</literal> has a structure similar to
-<literal>title</literal>, but it allows more inner elements:
-
-<programlisting>
-<![CDATA[<!ENTITY % text "br|code|em|footnote|a">
-
-<!ELEMENT p (#PCDATA|%text;)*>
-]]></programlisting>
-
-Line breaks do not have inner structure, so they are declared as being empty:
-
-<programlisting>
-<![CDATA[<!ELEMENT br EMPTY>
-]]></programlisting>
-
-This means that really nothing is allowed within <literal>br</literal>; you
-must always write <literal><![CDATA[<br></br>]]></literal> or abbreviated
-<literal><![CDATA[<br/>]]></literal>.
-</para>
-
-       <para>
-Code samples should be marked up by the <literal>code</literal> tag; emphasized
-text can be indicated by <literal>em</literal>:
-
-<programlisting>
-<![CDATA[<!ELEMENT code (#PCDATA)>
-
-<!ELEMENT em (#PCDATA|%text;)*>
-]]></programlisting>
-
-It is a design decision by the author of the DTD that <literal>code</literal>
-elements may not contain further markup while <literal>em</literal> elements
-may.
-</para>
-
-       <para>
-Unordered lists simply consist of one or more list items, and a list item may
-contain paragraph-level material:
-
-<programlisting>
-<![CDATA[<!ELEMENT ul (li+)>
-
-<!ELEMENT li (%p.like;)*>
-]]></programlisting>
-
-Footnotes are described by the text of the note; this text may contain
-text-level markup. There is no mechanism to describe the numbering scheme of
-footnotes, or to specify how footnote references are printed.
-
-<programlisting>
-<![CDATA[<!ELEMENT footnote (#PCDATA|%text;)*>
-]]></programlisting>
-
-Hyperlinks are written as in HTML. The anchor tag contains the text describing
-where the link points to, and the <literal>href</literal> attribute is the
-pointer (as URL). There is no way to describe locations of "hash marks". If the
-link refers to another <emphasis>readme</emphasis> document, the attribute
-<literal>readmeref</literal> should be used instead of <literal>href</literal>.
-The reason is that the converted document usually has a different system
-identifier (file name), and the link to a converted document must be
-converted, too.
-
-<programlisting>
-<![CDATA[<!ELEMENT a (#PCDATA)*>
-<!ATTLIST a 
-          href      CDATA #IMPLIED
-          readmeref CDATA #IMPLIED
->
-]]></programlisting>
-
-Note that although it is only sensible to specify one of the two attributes,
-the DTD has no means to express this restriction.
-</para>
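-
-<para>
-For instance, a validating parser would accept both of the following anchors
-against this declaration, although only the first one is sensible (the URL and
-the file names are made up for illustration):
-
-<programlisting>
-<![CDATA[
-<a href="http://www.example.org/">An external page</a>
-<a href="install.html" readmeref="install.xml">Installation notes</a>
-]]></programlisting>
-</para>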
-
-<para>
-That completes the DTD. Finally, here is a document conforming to it:
-
-<programlisting>
-<![CDATA[
-<?xml version="1.0" encoding="ISO-8859-1"?>
-<!DOCTYPE readme SYSTEM "readme.dtd">
-<readme title="How to use the readme converters">
-<sect1>
-  <title>Usage</title>
-  <p>
-    The <em>readme</em> converter is invoked on the command line by:
-  </p>
-  <p>
-    <code>readme [ -text | -html ] input.xml</code>
-  </p>
-  <p>
-    Here is a list of options:
-  </p>
-  <ul>
-    <li>
-      <p><code>-text</code>: specifies that ASCII output should be produced</p>
-    </li>
-    <li>
-      <p><code>-html</code>: specifies that HTML output should be produced</p>
-    </li>
-  </ul>
-  <p>
-    The input file must be given on the command line. The converted output is
-    printed to <em>stdout</em>.
-  </p>
-</sect1>
-<sect1>
-  <title>Author</title>
-  <p>
-    The program has been written by
-    <a href="mailto:Gerd.Stolpmann@darmstadt.netsurf.de">Gerd Stolpmann</a>.
-  </p>
-</sect1>
-</readme>
-]]></programlisting>
-
-</para>
-
-
-      </sect1>
-    </chapter>
-
-<!-- ********************************************************************** -->
-
-    <chapter>
-      <title>Using &markup;</title>
-
-      <sect1>
-       <title>Validation</title>
-       <para>
-The parser can be used to <emphasis>validate</emphasis> a document. This means
-that all the constraints that must hold for a valid document are actually
-checked. Validation is the default mode of &markup;, i.e. every document is
-validated while it is being parsed.
-</para>
-
-       <para>
-In the <literal>examples</literal> directory of the distribution you find the
-<literal>pxpvalidate</literal> application. It is invoked in the following way:
-
-<programlisting>
-pxpvalidate [ -wf ] <replaceable>file</replaceable>...
-</programlisting>
-
-The files mentioned on the command line are validated, and all warnings and
-error messages are printed to stderr.
-</para>
-
-       <para>
-The -wf switch modifies the behaviour such that a well-formedness parser is
-simulated. In this mode, the ELEMENT, ATTLIST, and NOTATION declarations of the
-DTD are ignored, and only the ENTITY declarations will take effect. This mode
-is intended for documents lacking a DTD. Please note that the parser still
-scans the DTD fully and will report all errors in the DTD; such checks are not
-required by a well-formedness parser.
-</para>
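-
-       <para>
-As a sketch of a document this mode is meant for, consider the following
-hypothetical file. It declares no elements, so it cannot be valid, but
-according to the description above the ENTITY declaration still takes effect:
-
-<programlisting>
-<![CDATA[
-<?xml version="1.0"?>
-<!DOCTYPE note [
-<!ENTITY who "Gerd Stolpmann">
-]>
-<note>Written by &who;.</note>
-]]></programlisting>
-</para>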
-
-       <para>
-The <literal>pxpvalidate</literal> application is the simplest sensible program
-using &markup;; you may consider it the "hello world" program. 
-</para>
-      </sect1>
-
-
-      <!-- ================================================== -->
-
-
-      <sect1>
-       <title>How to parse a document from an application</title>
-       <para>
-Let me first give a rough overview of the object model of the parser. The
-following items are represented by objects:
-
-<itemizedlist mark="bullet" spacing="compact">
-           <listitem>
-             <para>
-<emphasis>Documents:</emphasis> The document representation is more or less the
-anchor for the application; all accesses to the parsed entities start here. It
-is described by the class <literal>document</literal> contained in the module
-<literal>Pxp_document</literal>. You can get some global information, such
-as the XML declaration the document begins with, the DTD of the document,
-global processing instructions, and most important, the document tree. 
-</para>
-           </listitem>
-
-           <listitem>
-             <para>
-<emphasis>The contents of documents:</emphasis> The contents have the structure
-of a tree: Elements contain other elements and text<footnote><para>Elements may
-also contain processing instructions. Unlike other document models, &markup;
-separates processing instructions from the rest of the text and provides a
-second interface to access them (method <literal>pinstr</literal>). However,
-there is a parser option (<literal>enable_pinstr_nodes</literal>) which changes
-the behaviour of the parser such that extra nodes for processing instructions
-are included into the tree.</para>
-<para>Furthermore, the tree normally does not contain nodes for XML comments;
-they are ignored by default. Again, there is an option
-(<literal>enable_comment_nodes</literal>) changing this.</para>
-</footnote>. 
-
-The common type to represent both kinds of content is <literal>node</literal>
-which is a class type that unifies the properties of elements and character
-data. Every node has a list of children (which is empty if the element is empty
-or the node represents text); nodes may have attributes; nodes always have text
-contents. There are two implementations of <literal>node</literal>, the class
-<literal>element_impl</literal> for elements, and the class
-<literal>data_impl</literal> for text data. You find these classes and class
-types in the module <literal>Pxp_document</literal>, too.
-</para>
-
-             <para>
-Note that attribute lists are represented by non-class values.
-</para>
-           </listitem>
-
-           <listitem>
-             <para>
-<emphasis>The node extension:</emphasis> For advanced usage, every node of the
-document may have an associated <emphasis>extension</emphasis> which is simply
-a second object. This object must have the three methods
-<literal>clone</literal>, <literal>node</literal>, and
-<literal>set_node</literal> as bare minimum, but you are free to add methods as
-you want. This is the preferred way to add functionality to the document
-tree<footnote><para>Due to the typing system it is more or less impossible to
-derive recursive classes in O'Caml. To get around this, it is common practice
-to put the modifiable or extensible part of recursive objects into parallel
-objects.</para> </footnote>. The class type <literal>extension</literal> is
-defined in <literal>Pxp_document</literal>, too.
-</para>
-           </listitem>
-
-           <listitem>
-             <para>
-<emphasis>The DTD:</emphasis> Sometimes it is necessary to access the DTD of a
-document; the average application does not need this feature. The class
-<literal>dtd</literal> describes DTDs, and makes it possible to get
-representations of element, entity, and notation declarations as well as
-processing instructions contained in the DTD. This class, and
-<literal>dtd_element</literal>, <literal>dtd_notation</literal>, and
-<literal>proc_instruction</literal> can be found in the module
-<literal>Pxp_dtd</literal>. There are a couple of classes representing
-different kinds of entities; these can be found in the module
-<literal>Pxp_entity</literal>. 
-</para>
-           </listitem>
-         </itemizedlist>
-
-Additionally, the following modules play a role:
-
-<itemizedlist mark="bullet" spacing="compact">
-           <listitem>
-             <para>
-<emphasis>Pxp_yacc:</emphasis> Here the main parsing functions such as
-<literal>parse_document_entity</literal> are located. Some additional types and
-functions allow the parser to be configured in a non-standard way.
-</para>
-           </listitem>
-
-           <listitem>
-             <para>
-<emphasis>Pxp_types:</emphasis> This is a collection of basic types and
-exceptions. 
-</para>
-           </listitem>
-         </itemizedlist>
-
-There are some further modules that are needed internally but are not part of
-the API.
-</para>
-
-       <para>
-Let the document to be parsed be stored in a file called
-<literal>doc.xml</literal>. The parsing process is started by calling the
-function
-
-<programlisting>
-val parse_document_entity : config -> source -> 'ext spec -> 'ext document
-</programlisting>
-
-defined in the module <literal>Pxp_yacc</literal>. The first argument
-specifies some global properties of the parser; it is recommended to start with
-the <literal>default_config</literal>. The second argument determines where the
-document to be parsed comes from; this may be a file, a channel, or an entity
-ID. To parse <literal>doc.xml</literal>, it is sufficient to pass
-<literal>from_file "doc.xml"</literal>. 
-</para>
-
-       <para>
-The third argument passes the object specification to use. Roughly
-speaking, it determines which classes implement the node objects of which
-element types, and which extensions are to be used. The <literal>'ext</literal>
-polymorphic variable is the type of the extension. For the moment, let us
-simply pass <literal>default_spec</literal> as this argument, and ignore it.
-</para>
-
-       <para>
-So the following expression parses <literal>doc.xml</literal>:
-
-<programlisting>
-open Pxp_yacc
-let d = parse_document_entity default_config (from_file "doc.xml") default_spec
-</programlisting>
-
-Note that <literal>default_config</literal> implies that warnings are collected
-but not printed. Errors raise one of the exceptions defined in
-<literal>Pxp_types</literal>; to get readable errors and warnings catch the
-exceptions as follows:
-
-<programlisting>
-<![CDATA[class warner =
-  object 
-    method warn w =
-      print_endline ("WARNING: " ^ w)
-  end
-;;
-
-try
-  let config = { default_config with warner = new warner } in
-  let d = parse_document_entity config (from_file "doc.xml") default_spec
-  in
-    ...
-with
-   e ->
-     print_endline (Pxp_types.string_of_exn e)
-]]></programlisting>
-
-Now <literal>d</literal> is an object of the <literal>document</literal>
-class. If you want the node tree, you can get the root element by
-
-<programlisting>
-let root = d # root
-</programlisting>
-
-and if you would rather access the DTD, you can get it by
-
-<programlisting>
-let dtd = d # dtd
-</programlisting>
-
-As it is more interesting, let us investigate the node tree now. Given the root
-element, it is possible to recursively traverse the whole tree. The children of
-a node <literal>n</literal> are returned by the method
-<literal>sub_nodes</literal>, and the type of a node is returned by
-<literal>node_type</literal>. This function traverses the tree, and prints the
-type of each node:
-
-<programlisting>
-<![CDATA[let rec print_structure n =
-  let ntype = n # node_type in
-  match ntype with
-    T_element name ->
-      print_endline ("Element of type " ^ name);
-      let children = n # sub_nodes in
-      List.iter print_structure children
-  | T_data ->
-      print_endline "Data"
-  | _ ->
-      (* Other node types are not possible unless the parser is configured
-         differently.
-       *)
-      assert false
-]]></programlisting>
-
-You can call this function by
-
-<programlisting>
-print_structure root
-</programlisting>
-
-The type returned by <literal>node_type</literal> is either <literal>T_element
-name</literal> or <literal>T_data</literal>. The <literal>name</literal> of the
-element type is the string included in the angle brackets. Note that only
-elements have children; data nodes are always leaves of the tree.
-</para>
-
-       <para>
-There are some more methods for accessing a parsed node tree:
-
-<itemizedlist mark="bullet" spacing="compact">
-           <listitem>
-             <para>
-<literal>n # parent</literal>: Returns the parent node, or raises
-<literal>Not_found</literal> if the node is already the root.
-</para>
-           </listitem>
-           <listitem>
-             <para>
-<literal>n # root</literal>: Returns the root of the node tree. 
-</para>
-           </listitem>
-           <listitem>
-             <para>
-<literal>n # attribute a</literal>: Returns the value of the attribute with
-name <literal>a</literal>. The method returns a value for every
-<emphasis>declared</emphasis> attribute, independently of whether the attribute
-instance is defined or not. If the attribute is not declared,
-<literal>Not_found</literal> will be raised. (In well-formedness mode, every
-attribute is considered as being implicitly declared with type
-<literal>CDATA</literal>.) 
-</para>
-
-<para>
-The following return values are possible: <literal>Value s</literal>, 
-<literal>Valuelist sl</literal>, and <literal>Implied_value</literal>. 
-The first two value types indicate that the attribute value is available,
-either because there is a definition
-<literal><replaceable>a</replaceable>="<replaceable>value</replaceable>"</literal>
-in the XML text, or because there is a default value (declared in the
-DTD). Only if both the instance definition and the default declaration are
-missing is the last value, <literal>Implied_value</literal>, returned.
-</para>
-
-<para>
-In the DTD, every attribute is typed. There are single-value types (CDATA, ID,
-IDREF, ENTITY, NMTOKEN, enumerations), in which case the method passes
-<literal>Value s</literal> back, where <literal>s</literal> is the normalized
-string value of the attribute. The other types (IDREFS, ENTITIES, NMTOKENS)
-represent list values, and the parser splits the XML literal into several
-tokens and returns these tokens as <literal>Valuelist sl</literal>.
-</para>
-
-<para>
-Normalization means that entity references (the
-<literal>&amp;<replaceable>name</replaceable>;</literal> tokens) and
-character references
-(<literal>&amp;#<replaceable>number</replaceable>;</literal>) are replaced
-by the text they represent, and that white space characters are converted into
-plain spaces.
-</para>
-           </listitem>
-           <listitem>
-             <para>
-<literal>n # data</literal>: Returns the character data contained in the
-node. For data nodes, the meaning is obvious as this is the main content of
-data nodes. For element nodes, this method returns the concatenated contents of
-all inner data nodes.
-</para>
-             <para>
-Note that entity references included in the text are resolved while they are
-being parsed; for example the text <![CDATA["a &lt;&gt; b"]]> will be returned
-as <![CDATA["a <> b"]]> by this method. Spaces of data nodes are always
-preserved. Newlines are preserved, but always converted to \n characters even
-if newlines are encoded as \r\n or \r. Normally you will never see two adjacent
-data nodes because the parser collapses all data material at one location into
-one node. (However, if you create your own tree or transform the parsed tree,
-it is possible to have adjacent data nodes.)
-</para>
-             <para>
-Note that elements that do <emphasis>not</emphasis> allow #PCDATA as content
-will not have data nodes as children. This means that spaces and newlines, the
-only character material allowed for such elements, are silently dropped.
-</para>
-           </listitem>
-         </itemizedlist>
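-
-The three possible results of <literal>attribute</literal> can be
-distinguished with a single <literal>match</literal>. The following is only a
-sketch: the type is a local copy of <literal>Pxp_types.att_value</literal> so
-that the fragment compiles without the library, and the helper
-<literal>string_of_att_value</literal> with its <literal>~missing</literal>
-argument is a hypothetical name, not part of &markup;.
-
-<programlisting>
-<![CDATA[(* Local stand-in for Pxp_types.att_value, so the sketch compiles alone *)
-type att_value =
-    Value     of string
-  | Valuelist of string list
-  | Implied_value
-
-(* Hypothetical helper: render an attribute value as one string; the
-   text [missing] is used when neither an instance value nor a default
-   value exists *)
-let string_of_att_value ?(missing = "") v =
-  match v with
-    Value s       -> s
-  | Valuelist sl  -> String.concat " " sl
-  | Implied_value -> missing
-]]></programlisting>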
-
-For example, if the task is to print all contents of elements with type
-"valuable" whose attribute "priority" is "1", this function can help:
-
-<programlisting>
-<![CDATA[let rec print_valuable_prio1 n =
-  let ntype = n # node_type in
-  match ntype with
-    T_element "valuable" when n # attribute "priority" = Value "1" ->
-      print_endline "Valuable node with priority 1 found:";
-      print_endline (n # data)
-  | (T_element _ | T_data) ->
-      let children = n # sub_nodes in
-      List.iter print_valuable_prio1 children
-  | _ ->
-      assert false
-]]></programlisting>
-
-You can call this function by:
-
-<programlisting>
-print_valuable_prio1 root
-</programlisting>
-
-If you like a DSSSL-like style, you can make the function
-<literal>process_children</literal> explicit:
-
-<programlisting>
-<![CDATA[let rec print_valuable_prio1 n =
-
-  let process_children n =
-    let children = n # sub_nodes in
-    List.iter print_valuable_prio1 children 
-  in
-
-  let ntype = n # node_type in
-  match ntype with
-    T_element "valuable" when n # attribute "priority" = Value "1" ->
-      print_endline "Valuable node with priority 1 found:";
-      print_endline (n # data)
-  | (T_element _ | T_data) ->
-      process_children n
-  | _ ->
-      assert false
-]]></programlisting>
-
-Used this way, O'Caml already serves as a simple "style-sheet language": You
-can form a big "match" expression to distinguish between all significant
-cases, and provide different reactions to different conditions. But this
-technique has limitations; the "match" expression tends to get larger and
-larger, and it is difficult to store intermediate values as there is only one
-big recursion. Alternatively, it is also possible to represent the various
-cases as classes, and to use dynamic method lookup to find the appropriate
-class. The next section explains this technique in detail.
-
-</para>
-      </sect1>
-
-
-      <!-- ================================================== -->
-
-
-      <sect1>
-       <title>Class-based processing of the node tree</title>
-       <para>
-By default, the parsed node tree consists of objects of the same class; this is
-a good design as long as you want only to access selected parts of the
-document. For complex transformations, it may be better to use different
-classes for objects describing different element types.
-</para>
-
-       <para>
-For example, if the DTD declares the element types <literal>a</literal>,
-<literal>b</literal>, and <literal>c</literal>, and if the task is to convert
-an arbitrary document into a printable format, the idea is to define for every
-element type a separate class that has a method <literal>print</literal>. The
-classes are <literal>eltype_a</literal>, <literal>eltype_b</literal>, and
-<literal>eltype_c</literal>, and every class implements
-<literal>print</literal> such that elements of the type corresponding to the
-class are converted to the output format.
-</para>
-
-       <para>
-The parser supports such a design directly. As it is impossible to derive
-recursive classes in O'Caml<footnote><para>The problem is that the subclass is
-usually not a subtype in this case because O'Caml has a contravariant subtyping
-rule. </para> </footnote>, the specialized element classes cannot be formed by
-simply inheriting from the built-in classes of the parser and adding methods
-for customized functionality. To get around this limitation, every node of the
-document tree is represented by <emphasis>two</emphasis> objects, one called
-"the node" and containing the recursive definition of the tree, one called "the
-extension". Every node object has a reference to the extension, and the
-extension has a reference to the node. The advantage of this model is that it
-is now possible to customize the extension without affecting the typing
-constraints of the recursive node definition.
-</para>
-
-       <para>
-Every extension must have the three methods <literal>clone</literal>,
-<literal>node</literal>, and <literal>set_node</literal>. The method
-<literal>clone</literal> creates a deep copy of the extension object and
-returns it; <literal>node</literal> returns the node object for this extension
-object; and <literal>set_node</literal> is used to tell the extension object
-which node is associated with it (this method is called automatically when the
-node tree is initialized). The following definition is a good starting point
-for these methods; usually <literal>clone</literal> must be further refined
-when instance variables are added to the class:
-
-<programlisting>
-<![CDATA[class custom_extension =
-  object (self)
-
-    val mutable node = (None : custom_extension node option)
-
-    method clone = {< >} 
-    method node =
-      match node with
-          None ->
-            assert false
-        | Some n -> n
-    method set_node n =
-      node <- Some n
-
-  end
-]]>
-</programlisting>
-
-This part of the extension is usually the same for all classes, so it is a good
-idea to consider <literal>custom_extension</literal> as the super-class of the
-further class definitions. Continuing the example above, we can define the
-element type classes as follows:
-
-<programlisting>
-<![CDATA[class virtual custom_extension =
-  object (self)
-    ... clone, node, set_node defined as above ...
-
-    method virtual print : out_channel -> unit
-  end
-
-class eltype_a =
-  object (self)
-    inherit custom_extension
-    method print ch = ...
-  end
-
-class eltype_b =
-  object (self)
-    inherit custom_extension
-    method print ch = ...
-  end
-
-class eltype_c =
-  object (self)
-    inherit custom_extension
-    method print ch = ...
-  end
-]]></programlisting>
-
-The method <literal>print</literal> can now be implemented for every element
-type separately. Note that you get the associated node by invoking
-
-<programlisting>
-self # node
-</programlisting>
-
-and you get the extension object of a node <literal>n</literal> by writing 
-
-<programlisting>
-n # extension
-</programlisting>
-
-It is guaranteed that 
-
-<programlisting>
-self # node # extension == self
-</programlisting>
-
-always holds.
-</para>
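-
-       <para>
-The mutual references between a node and its extension can be tried out in
-isolation. The following sketch does not use &markup;: the class type
-<literal>node_t</literal>, the classes <literal>ext</literal> and
-<literal>node_impl</literal>, and the function <literal>make_node</literal>
-are names invented for this illustration, and the node is reduced to the
-single method <literal>extension</literal>.
-
-<programlisting>
-<![CDATA[(* Simplified stand-in for the node class type: only the link to the
-   extension is modelled *)
-class type ['ext] node_t =
-  object
-    method extension : 'ext
-  end
-
-(* Extension with a back reference to its node *)
-class ext =
-  object
-    val mutable node = (None : ext node_t option)
-    method node =
-      match node with
-        None   -> failwith "set_node has not been called"
-      | Some n -> n
-    method set_node n = node <- Some n
-  end
-
-class ['ext] node_impl (e : 'ext) =
-  object
-    method extension = e
-  end
-
-(* Create a node/extension pair and connect both directions *)
-let make_node () =
-  let e = new ext in
-  let n = new node_impl e in
-  e # set_node (n :> ext node_t);
-  n
-]]></programlisting>
-
-For a pair created by <literal>make_node</literal>, following the reference
-from the extension to the node and back yields the physically same extension
-object again.
-</para>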
-
-       <para>Here are sample definitions of the <literal>print</literal>
-methods:
-
-<programlisting><![CDATA[
-class eltype_a =
-  object (self)
-    inherit custom_extension
-    method print ch = 
-      (* Nodes <a>...</a> are only containers: *)
-      output_string ch "(";
-      List.iter
-        (fun n -> n # extension # print ch)
-        (self # node # sub_nodes);
-      output_string ch ")";
-  end
-
-class eltype_b =
-  object (self)
-    inherit custom_extension
-    method print ch =
-      (* Print the value of the CDATA attribute "print": *)
-      match self # node # attribute "print" with
-        Value s       -> output_string ch s
-      | Implied_value -> output_string ch "<missing>"
-      | Valuelist l   -> assert false   
-                         (* not possible because the att is CDATA *)
-  end
-
-class eltype_c =
-  object (self)
-    inherit custom_extension
-    method print ch = 
-      (* Print the contents of this element: *)
-      output_string ch (self # node # data)
-  end
-
-class null_extension =
-  object (self)
-    inherit custom_extension
-    method print ch = assert false
-  end
-]]></programlisting>
-</para>
-
-
-       <para>
-The remaining task is to configure the parser such that these extension classes
-are actually used. Here another problem arises: It is not possible to
-dynamically select the class of an object to be created. As a workaround,
-&markup; allows the user to specify <emphasis>exemplar objects</emphasis> for
-the various element types; instead of creating the nodes of the tree by
-applying the <literal>new</literal> operator, the nodes are produced by
-duplicating the exemplars. As object duplication preserves the class of the
-object, one can create fresh objects of every class for which an
-exemplar has previously been registered.
-</para>
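-
-       <para>
-The idea of creating objects by duplication can be sketched without the
-parser. In the following fragment the classes <literal>printer</literal>,
-<literal>printer_a</literal>, and <literal>printer_b</literal>, the table
-<literal>exemplars</literal>, and the function
-<literal>create_from_exemplar</literal> are hypothetical names. The standard
-library function <literal>Oo.copy</literal> stands in for the duplication
-step; the sketch makes no claim about how &markup; duplicates its exemplars
-internally.
-
-<programlisting>
-<![CDATA[class virtual printer =
-  object
-    method virtual name : string
-  end
-
-class printer_a = object inherit printer method name = "a" end
-class printer_b = object inherit printer method name = "b" end
-
-(* One exemplar per element type, selected by a string at run time *)
-let exemplars : (string, printer) Hashtbl.t = Hashtbl.create 10
-
-let () =
-  Hashtbl.add exemplars "a" (new printer_a :> printer);
-  Hashtbl.add exemplars "b" (new printer_b :> printer)
-
-(* Duplicate the registered exemplar; the copy is a fresh object that
-   keeps the class of the exemplar *)
-let create_from_exemplar tag =
-  Oo.copy (Hashtbl.find exemplars tag)
-]]></programlisting>
-
-The class of the new object is chosen by the lookup key, which is exactly what
-the <literal>new</literal> operator cannot do.
-</para>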
-
-       <para>
-Exemplars are meant to be objects without contents; the only interesting
-property is that they are instances of a certain class. An exemplar
-for an element node can be created by:
-
-<programlisting>
-let element_exemplar = new element_impl extension_exemplar
-</programlisting>
-
-And a data node exemplar is created by:
-
-<programlisting>
-let data_exemplar = new data_impl extension_exemplar
-</programlisting>
-
-The classes <literal>element_impl</literal> and <literal>data_impl</literal>
-are defined in the module <literal>Pxp_document</literal>. The constructors
-initialize the fresh objects as empty objects, i.e. without children, without
-data contents, and so on. The <literal>extension_exemplar</literal> is the
-initial extension object the exemplars are associated with. 
-</para>
-
-       <para>
-Once the exemplars are created and stored somewhere (e.g. in a hash table), you
-can take an exemplar and create a concrete instance (with contents) by
-duplicating it. As a user of the parser you are normally not concerned with
-this because it is part of the internal logic of the parser, but as background
-knowledge it is worth mentioning that the two methods
-<literal>create_element</literal> and <literal>create_data</literal> actually
-perform the duplication of the exemplar for which they are invoked,
-additionally apply modifications to the clone, and finally return the new
-object. Moreover, the extension object is copied, too, and the new node object
-is associated with the fresh extension object. Note that this is the reason why
-every extension object must have a <literal>clone</literal> method.
-</para>
-
-       <para>
-The configuration of the set of exemplars is passed to the
-<literal>parse_document_entity</literal> function as third argument. In our
-example, this argument can be set up as follows:
-
-<programlisting>
-<![CDATA[let spec =
-  make_spec_from_alist
-    ~data_exemplar:            (new data_impl (new null_extension))
-    ~default_element_exemplar: (new element_impl (new null_extension))
-    ~element_alist:
-       [ "a",  new element_impl (new eltype_a);
-         "b",  new element_impl (new eltype_b);
-         "c",  new element_impl (new eltype_c);
-       ]
-    ()
-]]></programlisting>
-
-The <literal>~element_alist</literal> function argument defines the mapping
-from element types to exemplars as an associative list. The argument
-<literal>~data_exemplar</literal> specifies the exemplar for data nodes, and
-the <literal>~default_element_exemplar</literal> is used whenever the parser
-finds an element type for which the associative list does not define an
-exemplar. 
-</para>
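-
-       <para>
-The fallback rule described above amounts to a lookup with a default. A
-minimal sketch, assuming a hypothetical function
-<literal>find_exemplar</literal> (it is not part of &markup;) and using plain
-strings in place of exemplar objects:
-
-<programlisting>
-<![CDATA[(* Hypothetical lookup: return the exemplar registered for [name], or
-   [default] when the associative list has no entry for it *)
-let find_exemplar ~default alist name =
-  try List.assoc name alist with Not_found -> default
-
-let alist = [ "a", "exemplar for a"; "b", "exemplar for b" ]
-
-let for_a = find_exemplar ~default:"default exemplar" alist "a"
-let for_z = find_exemplar ~default:"default exemplar" alist "z"
-]]></programlisting>
-
-Here <literal>for_a</literal> is the entry registered for "a", while
-<literal>for_z</literal> falls back to the default because "z" has no entry.
-</para>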
-
-       <para>
-The configuration is now complete. You can still use the same parsing
-functions, only the initialization is a bit different. For example, call the
-parser by:
-
-<programlisting>
-let d = parse_document_entity default_config (from_file "doc.xml") spec
-</programlisting>
-
-Note that the resulting document <literal>d</literal> has a usable type;
-especially the <literal>print</literal> method we added is visible. So you can
-print your document by
-
-<programlisting>
-d # root # extension # print stdout
-</programlisting>
-</para>
-
-       <para>
-This object-oriented approach looks rather complicated; this is mostly caused
-by working around some problems of the strict typing system of O'Caml. Some
-auxiliary concepts such as extensions were needed, but in practice they cause
-little extra work. The next section explains one of the examples of the
-distribution, a converter from <emphasis>readme</emphasis>
-documents to HTML.
-</para>
-
-      </sect1>
-
-
-      <!-- ================================================== -->
-
-
-      <sect1 id="sect.readme.to-html">
-       <title>Example: An HTML backend for the <emphasis>readme</emphasis>
-DTD</title>
-
-       <para>The converter from <emphasis>readme</emphasis> documents to HTML
-documents strictly follows the approach of defining one class per element
-type. The HTML code is similar to the <emphasis>readme</emphasis> source;
-because of this, most elements can be converted in the following way: Given the
-input element 
-
-<programlisting>
-<![CDATA[<e>content</e>]]>
-</programlisting>
-
-the conversion text is the concatenation of a computed prefix, the recursively
-converted content, and a computed suffix. 
-</para>
-
-       <para>
-Only one element type cannot be handled by this scheme:
-<literal>footnote</literal>. Footnotes are collected while they are found in
-the input text, and they are printed after the main text has been converted and
-printed. 
-</para>
-
-       <sect2>
-         <title>Header</title>
-         <para>
-<programlisting>&readme.code.header;</programlisting>
-</para>
-       </sect2>
-
-       <sect2>
-         <title>Type declarations</title>
-         <para>
-<programlisting>&readme.code.footnote-printer;</programlisting>
-</para>
-       </sect2>
-
-       <sect2>
-         <title>Class <literal>store</literal></title>
-         <para>
-The <literal>store</literal> is a container for footnotes. You can add a
-footnote by invoking <literal>alloc_footnote</literal>; the argument is an
-object of the class <literal>footnote_printer</literal>, the method returns the
-number of the footnote. The interesting property of a footnote is that it can
-be converted to HTML, so a <literal>footnote_printer</literal> is an object
-with a method <literal>footnote_to_html</literal>. The class
-<literal>footnote</literal> which is defined below has a compatible method
-<literal>footnote_to_html</literal> such that objects created from it can be
-used as <literal>footnote_printer</literal>s.
-</para>
-         <para>
-The other method, <literal>print_footnotes</literal>, prints the footnotes as a
-definition list, and is typically invoked after the main material of the page
-has already been printed. Every item of the list is printed by
-<literal>footnote_to_html</literal>.
-</para>
-
-         <para>
-<programlisting>&readme.code.store;</programlisting>
-</para>
-       </sect2>
-
-       <sect2>
-         <title>Function <literal>escape_html</literal></title>
-         <para>
-This function converts the characters &lt;, &gt;, &amp;, and " to their HTML
-representation. For example, 
-<literal>escape_html "&lt;&gt;" = "&amp;lt;&amp;gt;"</literal>. Other
-characters are left unchanged.
-
-<programlisting>&readme.code.escape-html;</programlisting>
-</para>
-       </sect2>
-
-       <sect2>
-         <title>Virtual class <literal>shared</literal></title>
-         <para>
-This virtual class is the abstract superclass of the extension classes shown
-below. It defines the standard methods <literal>clone</literal>,
-<literal>node</literal>, and <literal>set_node</literal>, and declares the type
-of the virtual method <literal>to_html</literal>. This method recursively
-traverses the whole element tree, and prints the converted HTML code to the
-output channel passed as second argument. The first argument is the reference
-to the global <literal>store</literal> object which collects the footnotes.
-
-<programlisting>&readme.code.shared;</programlisting>
-</para>
-       </sect2>
-
-       <sect2>
-         <title>Class <literal>only_data</literal></title>
-         <para>
-This class defines <literal>to_html</literal> such that the character data of
-the current node is converted to HTML. Note that <literal>self</literal> is an
-extension object, <literal>self # node</literal> is the node object, and
-<literal>self # node # data</literal> returns the character data of the node. 
-
-<programlisting>&readme.code.only-data;</programlisting>
-</para>
-       </sect2>
-
-       <sect2>
-         <title>Class <literal>readme</literal></title>
-         <para>
-This class converts elements of type <literal>readme</literal> to HTML. Such an
-element is (by definition) always the root element of the document. First, the
-HTML header is printed; the <literal>title</literal> attribute of the element
-determines the title of the HTML page. Some aspects of the HTML page can be
-configured by setting certain parameter entities, for example the background
-color, the text color, and link colors. After the header, the
-<literal>body</literal> tag, and the headline have been printed, the contents
-of the page are converted by invoking <literal>to_html</literal> on all
-children of the current node (which is the root node). Then, the footnotes are
-appended to this by telling the global <literal>store</literal> object to print
-the footnotes. Finally, the end tags of the HTML pages are printed.
-</para>
-
-         <para>
-This class is an example of how to access the value of an attribute: The value
-is determined by invoking <literal>self # node # attribute "title"</literal>. As
-this attribute has been declared as CDATA and as being required, the value
-always has the form <literal>Value s</literal> where <literal>s</literal> is the
-string value of the attribute. 
-</para>
-
-         <para>
-You can also see how entity contents can be accessed. A parameter entity object
-can be looked up by <literal>self # node # dtd # par_entity "name"</literal>,
-and by invoking <literal>replacement_text</literal> the value of the entity
-is returned after inner parameter and character entities have been
-processed. Note that you must use <literal>gen_entity</literal> instead of
-<literal>par_entity</literal> to access general entities.
-</para>
-
-         <para>
-<programlisting>&readme.code.readme;</programlisting>
-</para>
-       </sect2>
-
-       <sect2>
-         <title>Classes <literal>section</literal>, <literal>sect1</literal>,
-<literal>sect2</literal>, and <literal>sect3</literal></title>
-         <para>
-As the conversion process is very similar, the conversion classes of the three
-section levels are derived from the more general <literal>section</literal>
-class. The HTML code of the section levels only differs in the type of the
-headline, and because of this the classes describing the section levels can be
-computed by replacing the class argument <literal>the_tag</literal> of
-<literal>section</literal> by the HTML name of the headline tag.
-</para>
-
-         <para>
-Section elements are converted to HTML by printing a headline and then
-converting the contents of the element recursively. More precisely, the first
-sub-element is always a <literal>title</literal> element, and the other
-elements are the contents of the section. This structure is declared in the
-DTD, and it is guaranteed that the document matches the DTD. Because of this
-the title node can be separated from the rest without any checks.
-</para>
-
-         <para>
-Both the title node and the body nodes are then converted to HTML by calling
-<literal>to_html</literal> on them.
-</para>
-
-         <para>
-<programlisting>&readme.code.section;</programlisting>
-</para>
-       </sect2>
-
-       <sect2>
-         <title>Classes <literal>map_tag</literal>, <literal>p</literal>,
-<literal>em</literal>, <literal>ul</literal>, <literal>li</literal></title>
-         <para>
-Several element types are converted to HTML by simply mapping them to
-corresponding HTML element types. The class <literal>map_tag</literal>
-implements this, and the class argument <literal>the_target_tag</literal>
-determines the tag name to map to. The output consists of the start tag, the
-recursively converted inner elements, and the end tag.
-
-<programlisting>&readme.code.map-tag;</programlisting>
-</para>
-       </sect2>
-
-       <sect2>
-         <title>Class <literal>br</literal></title>
-         <para>
-Elements of type <literal>br</literal> are mapped to the same HTML type. Note
-that HTML forbids the end tag of <literal>br</literal>.
-
-<programlisting>&readme.code.br;</programlisting>
-</para>
-       </sect2>
-
-       <sect2>
-         <title>Class <literal>code</literal></title>
-         <para>
-The <literal>code</literal> type is converted to a <literal>pre</literal>
-section (preformatted text). As the meaning of tabs is unspecified in HTML,
-tabs are expanded to spaces.
-
-<programlisting>&readme.code.code;</programlisting>
-</para>
-       </sect2>
-
-       <sect2>
-         <title>Class <literal>a</literal></title>
-         <para>
-Hyperlinks, expressed by the <literal>a</literal> element type, are converted
-to the HTML <literal>a</literal> type. If the target of the hyperlink is given
-by <literal>href</literal>, the URL of this attribute can be used
-directly. Alternatively, the target can be given by
-<literal>readmeref</literal> in which case the ".html" suffix must be added to
-the file name. 
-</para>
-
-         <para>
-Note that within <literal>a</literal> only #PCDATA is allowed, so the contents
-can be converted directly by applying <literal>escape_html</literal> to the
-character data contents.
-
-<programlisting>&readme.code.a;</programlisting>
-</para>
-       </sect2>
-
-       <sect2>
-         <title>Class <literal>footnote</literal></title>
-         <para>
-The <literal>footnote</literal> class has two methods:
-<literal>to_html</literal> to convert the footnote reference to HTML, and
-<literal>footnote_to_html</literal> to convert the footnote text itself.
-</para>
-
-         <para>
-The footnote reference is converted to a local hyperlink; more precisely, to
-two anchor tags which are connected with each other. The text anchor points to
-the footnote anchor, and the footnote anchor points to the text anchor.
-</para>
-
-         <para>
-The footnote must be allocated in the <literal>store</literal> object. By
-allocating the footnote, you get the number of the footnote, and the text of
-the footnote is stored until the end of the HTML page is reached when the
-footnotes can be printed. The <literal>to_html</literal> method simply stores
-the object itself, such that the <literal>footnote_to_html</literal> method is
-invoked on the same object that encountered the footnote.
-</para>
-
-         <para>
-The <literal>to_html</literal> method only allocates the footnote and prints
-the reference anchor; it neither prints nor converts the contents of the
-note. This is deferred until the footnotes actually get printed, i.e. the
-recursive call of <literal>to_html</literal> on the sub nodes is done by
-<literal>footnote_to_html</literal>. 
-</para>
-
-         <para>
-Note that this technique does not work if you make another footnote within a
-footnote; the second footnote gets allocated but not printed.
-</para>
-
-         <para>
-<programlisting>&readme.code.footnote;</programlisting>
-</para>
-       </sect2>
-
-       <sect2>
-         <title>The specification of the document model</title>
-         <para>
-This code sets up the hash table that connects element types with the exemplars
-of the extension classes that convert the elements to HTML.
-
-<programlisting>&readme.code.tag-map;</programlisting>
-</para>
-       </sect2>
-
-<!-- <![RCDATA[&readme.code.to-html;]]> -->
-      </sect1>
-
-    </chapter>
-
-<!-- ********************************************************************** -->
-
-    <chapter>
-      <title>The objects representing the document</title>
-
-      <para>
-<emphasis>This description might be out-of-date. See the module interface files
-for updated information.</emphasis></para>
-
-      <sect1>
-       <title>The <literal>document</literal> class</title>
-       <para>
-<programlisting>
-<![CDATA[
-class [ 'ext ] document :
-  Pxp_types.collect_warnings -> 
-  object
-    method init_xml_version : string -> unit
-    method init_root : 'ext node -> unit
-
-    method xml_version : string
-    method xml_standalone : bool
-    method dtd : dtd
-    method root : 'ext node
-
-    method encoding : Pxp_types.rep_encoding
-
-    method add_pinstr : proc_instruction -> unit
-    method pinstr : string -> proc_instruction list
-    method pinstr_names : string list
-
-    method write : Pxp_types.output_stream -> Pxp_types.encoding -> unit
-
-  end
-;;
-]]>
-</programlisting>
-
-The methods beginning with <literal>init_</literal> are only for internal use
-of the parser.
-</para>
-
-       <itemizedlist mark="bullet" spacing="compact">
-         <listitem>
-           <para>
-<literal>xml_version</literal>: returns the version string at the beginning of
-the document. For example, "1.0" is returned if the document begins with
-<literal>&lt;?xml version="1.0"?&gt;</literal>.</para>
-         </listitem>
-         <listitem>
-           <para>
-<literal>xml_standalone</literal>: returns the boolean value of the
-<literal>standalone</literal> declaration in the XML declaration. If the
-<literal>standalone</literal> attribute is missing, <literal>false</literal> is
-returned. </para>
-         </listitem>
-         <listitem>
-           <para>
-<literal>dtd</literal>: returns a reference to the global DTD object.</para>
-         </listitem>
-         <listitem>
-           <para>
-<literal>root</literal>: returns a reference to the root element.</para>
-         </listitem>
-         <listitem>
-           <para>
-<literal>encoding</literal>: returns the internal encoding of the
-document. This means that all strings of which the document consists are
-encoded in this character set.
-</para>
-         </listitem>
-         <listitem>
-           <para>
-<literal>pinstr</literal>: returns the processing instructions outside the DTD
-and outside the root element. The argument passed to the method names a
-<emphasis>target</emphasis>, and the method returns all instructions with this
-target. The target is the first word inside <literal>&lt;?</literal> and
-<literal>?&gt;</literal>.</para>
-         </listitem>
-         <listitem>
-           <para>
-<literal>pinstr_names</literal>: returns the names of the processing instructions.</para>
-         </listitem>
-         <listitem>
-           <para>
-<literal>add_pinstr</literal>: adds another processing instruction. This method
-is used by the parser itself to enter the instructions returned by
-<literal>pinstr</literal>, but you can also enter additional instructions.
-</para>
-         </listitem>
-         <listitem>
-           <para>
-<literal>write</literal>: writes the document to the passed stream as XML
-text using the passed (external) encoding. The generated text is always valid
-XML and can be parsed by PXP; however, the text is not formatted for
-readability (this is not a pretty printer).</para>
-         </listitem>
-       </itemizedlist>
-      </sect1>
-
-<!-- ********************************************************************** -->
-
-      <sect1>
-       <title>The class type <literal>node</literal></title>
-       <para>
-
-From <literal>Pxp_document</literal>:
-
-<programlisting>
-type node_type =
-  T_data
-| T_element of string
-| T_super_root
-| T_pinstr of string
-| T_comment
-<replaceable>and some other, reserved types</replaceable>
-;;
-
-class type [ 'ext ] node =
-  object ('self)
-    constraint 'ext = 'ext node #extension
-
-    <anchor id="type-node-general.sig"
-   >(* <link linkend="type-node-general" endterm="type-node-general.title"
-       ></link> *)
-
-    method extension : 'ext
-    method dtd : dtd
-    method parent : 'ext node
-    method root : 'ext node
-    method sub_nodes : 'ext node list
-    method iter_nodes : ('ext node &fun; unit) &fun; unit
-    method iter_nodes_sibl : 
-           ('ext node option &fun; 'ext node &fun; 'ext node option &fun; unit) &fun; unit
-    method node_type : node_type
-    method encoding : Pxp_types.rep_encoding
-    method data : string
-    method position : (string * int * int)
-    method comment : string option
-    method pinstr : string &fun; proc_instruction list
-    method pinstr_names : string list
-    method write : Pxp_types.output_stream -> Pxp_types.encoding -> unit
-
-    <anchor id="type-node-atts.sig"
-   >(* <link linkend="type-node-atts" endterm="type-node-atts.title"
-       ></link> *)
-
-    method attribute : string &fun; Pxp_types.att_value
-    method required_string_attribute : string &fun; string
-    method optional_string_attribute : string &fun; string option
-    method required_list_attribute : string &fun; string list
-    method optional_list_attribute : string &fun; string list
-    method attribute_names : string list
-    method attribute_type : string &fun; Pxp_types.att_type
-    method attributes : (string * Pxp_types.att_value) list
-    method id_attribute_name : string
-    method id_attribute_value : string
-    method idref_attribute_names : string list
-
-    <anchor id="type-node-mods.sig"
-   >(* <link linkend="type-node-mods" endterm="type-node-mods.title"
-       ></link> *)
-
-    method add_node : ?force:bool &fun; 'ext node &fun; unit
-    method add_pinstr : proc_instruction &fun; unit
-    method delete : unit
-    method set_nodes : 'ext node list &fun; unit
-    method quick_set_attributes : (string * Pxp_types.att_value) list &fun; unit
-    method set_comment : string option &fun; unit
-
-    <anchor id="type-node-cloning.sig"
-   >(* <link linkend="type-node-cloning" endterm="type-node-cloning.title"
-       ></link> *)
-
-    method orphaned_clone : 'self
-    method orphaned_flat_clone : 'self
-    method create_element : 
-              ?position:(string * int * int) &fun;
-              dtd &fun; node_type &fun; (string * string) list &fun;
-                  'ext node
-    method create_data : dtd &fun; string &fun; 'ext node
-    method keep_always_whitespace_mode : unit
-
-    <anchor id="type-node-weird.sig"
-   >(* <link linkend="type-node-weird" endterm="type-node-weird.title"
-       ></link> *)
-
-    method local_validate : ?use_dfa:bool -> unit -> unit
-
-    (* ... Internal methods are undocumented. *)
-
-  end
-;;
-</programlisting>
-
-In the module <literal>Pxp_types</literal> you can find another type
-definition that is important in this context:
-
-<programlisting>
-type Pxp_types.att_value =
-    Value     of string
-  | Valuelist of string list
-  | Implied_value
-;;
-</programlisting>
-</para>
-
-       <sect2>
-         <title>The structure of document trees</title>
-
-<para>
-A node represents either an element or a character data section. There are two
-classes implementing the two aspects of nodes: <literal>element_impl</literal>
-and <literal>data_impl</literal>. The latter class does not implement all
-methods because some methods do not make sense for data nodes.
-</para>
-
-<para>
-(Note: PXP also supports a mode in which processing instructions and
-comments are represented as nodes of the document tree. However, these nodes
-are instances of <literal>element_impl</literal> with node types
-<literal>T_pinstr</literal> and <literal>T_comment</literal>,
-respectively. This mode must be explicitly configured; the basic representation
-knows only element and data nodes.)
-</para>
-
-       <para>The following figure 
-(<link linkend="node-term" endterm="node-term"></link>) shows an example of how
-a tree is constructed from element and data nodes. The circular areas 
-represent element nodes whereas the ovals denote data nodes. Only elements
-may have subnodes; data nodes are always leaves of the tree. The subnodes
-of an element can be either element or data nodes; in both cases the O'Caml
-objects storing the nodes have the class type <literal>node</literal>.</para>
-
-       <para>Attributes (the clouds in the picture) are not directly
-integrated into the tree; there is always an extra link to the attribute
-list. This is also true for processing instructions (not shown in the
-picture). This means that there are separate access methods for attributes and
-processing instructions.</para>
-
-<figure id="node-term" float="1">
-<title>A tree with element nodes, data nodes, and attributes</title>
-<graphic fileref="pic/node_term" format="GIF"></graphic>
-</figure>
-
-       <para>Only elements, data sections, attributes and processing
-instructions (and comments, if configured) can, directly or indirectly, occur
-in the document tree. It is impossible to add entity references to the tree; if
-the parser finds such a reference, not the reference as such but the referenced
-text (i.e. the tree representing the structured text) is included in the
-tree.</para>
-
-       <para>Note that the parser collapses as much data material into one
-data node as possible, so that normally there are never two adjacent data
-nodes. This invariant is enforced even if data material is included by entity
-references or CDATA sections, or if a data sequence is interrupted by
-comments. So <literal>a &amp;amp; b &lt;!-- comment --&gt; c &lt;![CDATA[
-&lt;&gt; d]]&gt;</literal> is represented by only one data node, for
-instance. However, you can break this invariant by creating document trees
-manually; it only describes how the parser forms the tree.
-</para> 
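-
-<para>The collapsing rule can be sketched with a toy model. The type
-<literal>item</literal> and the function <literal>collapse</literal> below are
-hypothetical names invented for this illustration; they are not part of PXP,
-and the parser itself works on node objects, not on lists.</para>
-

```ocaml
(* Toy model: a sequence of parsed items. Not PXP's real node type. *)
type item =
  | Data of string      (* character data *)
  | Element of string   (* an element, identified only by its name *)

(* Merge every run of adjacent Data items into a single Data item,
   so that the result never contains two neighbouring data items. *)
let rec collapse = function
  | Data a :: Data b :: rest -> collapse (Data (a ^ b) :: rest)
  | x :: rest -> x :: collapse rest
  | [] -> []

let () =
  let items =
    [ Data "a & b "; Data " c "; Data "<> d"; Element "e"; Data "x" ]
  in
  match collapse items with
  | [ Data s; Element _; Data _ ] -> print_endline s
  | _ -> print_endline "unexpected shape"
```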
-
-<figure id="node-general" float="1">
-<title>Nodes are doubly linked trees</title>
-<graphic fileref="pic/node_general" format="GIF"></graphic>
-</figure>
-
-       <para>
-The node tree has links in both directions: Every node has a link to its parent
-(if any), and it has links to the subnodes (see 
-figure <link linkend="node-general" endterm="node-general"></link>). Obviously,
-this doubly-linked structure simplifies navigation in the tree, but it also
-has some consequences for the possible operations on trees.</para>
-
-       <para>
-Because every node must have at most <emphasis>one</emphasis> parent node,
-operations are illegal if they violate this condition. The following figure
-(<link linkend="node-add" endterm="node-add"></link>) shows on the left side
-that node <literal>y</literal> is added to <literal>x</literal> as a new subnode,
-which is allowed because <literal>y</literal> does not have a parent yet. The
-right side of the picture illustrates what would happen if <literal>y</literal>
-had a parent node; this is illegal because <literal>y</literal> would have two
-parents after the operation.</para>
-
-<figure id="node-add" float="1">
-<title>A node can only be added if it is a root</title>
-<graphic fileref="pic/node_add" format="GIF">
-</graphic>
-</figure>
-
-       <para>
-The "delete" operation simply removes the links between two nodes. In the
-picture (<link linkend="node-delete" endterm="node-delete"></link>) the node
-<literal>x</literal> is deleted from the list of subnodes of
-<literal>y</literal>. After that, <literal>x</literal> becomes the root of the
-subtree starting at this node.</para>
-
-<figure id="node-delete" float="1">
-<title>A deleted node becomes the root of the subtree</title>
-<graphic fileref="pic/node_delete" format="GIF"></graphic>
-</figure>
-
-       <para>
-It is also possible to make a clone of a subtree; this is illustrated in 
-<link linkend="node-clone" endterm="node-clone"></link>. In this case, the
-clone is a copy of the original subtree except that it is no longer a
-subnode. Because cloning never keeps the connection to the parent, the clones
-are called <emphasis>orphaned</emphasis>.
-</para>
-
-<figure id="node-clone" float="1">
-<title>The clone of a subtree</title>
-<graphic fileref="pic/node_clone" format="GIF"></graphic>
-</figure>
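-
-<para>The rules for adding, deleting and cloning can be summarized in a small
-self-contained model. This is only a sketch: the record type
-<literal>node</literal> and the functions below are assumptions made for the
-illustration and do not reproduce PXP's classes, which additionally carry
-DTDs, attributes and extensions.</para>
-

```ocaml
(* Toy model of a doubly linked tree; these are NOT PXP's classes. *)
type node = {
  label : string;
  mutable parent : node option;
  mutable children : node list;
}

let make label = { label; parent = None; children = [] }

(* Adding is only legal when the new child is a root (has no parent). *)
let add_node x y =
  match y.parent with
  | Some _ -> failwith "add_node: node already has a parent"
  | None ->
      y.parent <- Some x;
      x.children <- x.children @ [ y ]

(* Deleting cuts both links; the node becomes the root of its subtree. *)
let delete x =
  match x.parent with
  | None -> ()
  | Some p ->
      (* physical inequality: remove exactly this node object *)
      p.children <- List.filter (fun c -> c != x) p.children;
      x.parent <- None

(* Finding the root walks the parent links, so the cost is proportional
   to the length of the path to the root. *)
let rec root n = match n.parent with None -> n | Some p -> root p

(* Deep clone; the link to the original parent is never copied. *)
let rec orphaned_clone n =
  let copy = make n.label in
  List.iter (fun c -> add_node copy (orphaned_clone c)) n.children;
  copy
```
-
-<para>In this model, calling <literal>add_node a c</literal> while
-<literal>c</literal> is still a child of another node raises an exception, and
-a clone can always be added somewhere else because it has no parent.</para>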
-       </sect2>
-
-       <sect2>
-         <title>The methods of the class type <literal>node</literal></title>
-
-         <anchor id="type-node-general">
-         <formalpara>
-           <title id="type-node-general.title">
-              <link linkend="type-node-general.sig">General observers</link>
-            </title>
-
-           <para>
-             <itemizedlist mark="bullet" spacing="compact">
-               <listitem>
-                 <para>
-<literal>extension</literal>: The reference to the extension object which
-belongs to this node (see ...).</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>dtd</literal>: Returns a reference to the global DTD. All nodes
-of a tree must share the same DTD.
-</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>parent</literal>: Returns the parent node. Raises
-<literal>Not_found</literal> if the node does not have a
-parent, i.e. the node is the root.</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>root</literal>: Gets the reference to the root node of the tree.
-Every node is contained in a tree with a root, so this method always 
-succeeds. Note that this method <emphasis>searches</emphasis> for the root,
-which costs time proportional to the length of the path to the root.
-</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>sub_nodes</literal>: Returns references to the children. The returned
-list reflects the order of the children. For data nodes, this method returns
-the empty list.
-</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>iter_nodes f</literal>: Iterates over the children, and calls
-<literal>f</literal> for every child in turn. 
-</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>iter_nodes_sibl f</literal>: Iterates over the children, and calls
-<literal>f</literal> for every child in turn. <literal>f</literal> gets as
-arguments the previous node, the current node, and the next node.</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>node_type</literal>: Returns either <literal>T_data</literal> which
-means that the node is a data node, or <literal>T_element n</literal>
-which means that the node is an element of type <literal>n</literal>. 
-If configured, possible node types are also <literal>T_pinstr t</literal>
-indicating that the node represents a processing instruction with target
-<literal>t</literal>, and <literal>T_comment</literal> in which case the node
-is a comment.
-</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>encoding</literal>: Returns the encoding of the strings.</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>data</literal>: Returns the character data of this node and all
-children, concatenated as one string. The encoding of the string is what
-the method <literal>encoding</literal> returns.
-- For data nodes, this method simply returns the represented characters.
-For elements, the meaning of the method has been extended such that it
-returns something useful, i.e. the effectively contained characters, without
-markup. (For <literal>T_pinstr</literal> and <literal>T_comment</literal>
-nodes, the method returns the empty string.)
-</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>position</literal>: If configured, this method returns the position of
-the element as a triple (entity, line, byte position). For data nodes, the
-position is not stored. If the position is not available the triple
-<literal>"?", 0, 0</literal> is returned.
-</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>comment</literal>: Returns <literal>Some text</literal> for comment
-nodes, and <literal>None</literal> for other nodes. The <literal>text</literal>
-is everything between the comment delimiters <literal>&lt;!--</literal> and
-<literal>--&gt;</literal>.
-</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>pinstr n</literal>: Returns all processing instructions that are
-directly contained in this element and that have a <emphasis>target</emphasis>
-specification of <literal>n</literal>. The target is the first word after
-the <literal>&lt;?</literal>.
-</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>pinstr_names</literal>: Returns the list of all targets of processing
-instructions directly contained in this element.</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>write s enc</literal>: Prints the node and all subnodes to the passed
-output stream as valid XML text, using the passed external encoding.
-</para>
-               </listitem>
-             </itemizedlist>
-            </para>
-         </formalpara>
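-
-<para>The behaviour of <literal>data</literal> and
-<literal>iter_nodes_sibl</literal> described above can be sketched on a toy
-tree. The type <literal>tree</literal> and the function
-<literal>iter_sibl</literal> are hypothetical stand-ins, not PXP definitions;
-in particular, representing a missing neighbour as <literal>None</literal> is
-an assumption of this sketch.</para>
-

```ocaml
(* Toy tree type for illustration; it is not PXP's node class. *)
type tree =
  | Data of string
  | Element of string * tree list
  | Comment of string

(* Like the data method: concatenate all character data below a node,
   leaving out markup; comment nodes contribute the empty string. *)
let rec data = function
  | Data s -> s
  | Element (_, children) -> String.concat "" (List.map data children)
  | Comment _ -> ""

(* Like iter_nodes_sibl: call f with the previous, the current and the
   next child. Missing neighbours are passed as None (an assumption). *)
let iter_sibl f children =
  let rec go prev = function
    | [] -> ()
    | [ x ] -> f prev x None
    | x :: (y :: _ as rest) ->
        f prev x (Some y);
        go (Some x) rest
  in
  go None children

let () =
  let t =
    Element ("p", [ Data "Hello "; Element ("b", [ Data "big" ]);
                    Comment "note"; Data " world" ])
  in
  print_endline (data t)
```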
-
-         <anchor id="type-node-atts">
-         <formalpara>
-           <title id="type-node-atts.title">
-              <link linkend="type-node-atts.sig">Attribute observers</link>
-            </title>
-           <para>
-             <itemizedlist mark="bullet" spacing="compact">
-               <listitem>
-                 <para>
-<literal>attribute n</literal>: Returns the value of the attribute with name
-<literal>n</literal>. This method returns a value for every declared 
-attribute, and it raises <literal>Not_found</literal> for any undeclared
-attribute. Note that it even returns a value if the attribute is actually
-missing but is declared as <literal>#IMPLIED</literal> or has a default
-value. - Possible values are:
-                  <itemizedlist mark="bullet" spacing="compact">
-                     <listitem>
-                       <para>
-<literal>Implied_value</literal>: The attribute has been declared with the
-keyword <literal>#IMPLIED</literal>, and the attribute is missing in the
-attribute list of this element.</para>
-                     </listitem>
-                     <listitem>
-                       <para>
-<literal>Value s</literal>: The attribute has been declared as type
-<literal>CDATA</literal>, as <literal>ID</literal>, as
-<literal>IDREF</literal>, as <literal>ENTITY</literal>, or as
-<literal>NMTOKEN</literal>, or as enumeration or notation, and one of the two
-conditions holds: (1) The attribute value is present in the attribute list in
-which case the value is returned in the string <literal>s</literal>. (2) The
-attribute has been omitted, and the DTD declared the attribute with a default
-value. The default value is returned in <literal>s</literal>. 
-- Summarized, <literal>Value s</literal> is returned for non-implied, non-list 
-attribute values.
-</para>
-                     </listitem>
-                     <listitem>
-                       <para>
-<literal>Valuelist l</literal>: The attribute has been declared as type
-<literal>IDREFS</literal>, as <literal>ENTITIES</literal>, or
-as <literal>NMTOKENS</literal>, and one of the two conditions holds: (1) The
-attribute value is present in the attribute list in which case the
-space-separated tokens of the value are returned in the string list
-<literal>l</literal>. (2) The attribute has been omitted, and the DTD declared
-the attribute with a default value. The default value is returned in
-<literal>l</literal>. 
-- Summarized, <literal>Valuelist l</literal> is returned for all list-type
-attribute values.
-</para>
-                     </listitem>
-                   </itemizedlist>
-
-Note that before the attribute value is returned, the value is normalized. This
-means that newlines are converted to spaces, and that references to character
-entities (i.e. <literal>&amp;#<replaceable>n</replaceable>;</literal>) and
-general entities
-(i.e. <literal>&amp;<replaceable>name</replaceable>;</literal>) are expanded;
-if necessary, expansion is performed recursively.
-</para>
-
-<para>
-In well-formedness mode, there is no DTD which could declare an
-attribute. Because of this, every attribute that occurs is treated as a CDATA
-attribute.
-</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>required_string_attribute n</literal>: returns the Value attribute
-called n, or the Valuelist attribute as a string where the list elements
-are separated by spaces. If the attribute value is implied, or if the
-attribute does not exist, the method will fail. - This method is convenient
-if you expect a non-implied and non-list attribute value.
-</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>optional_string_attribute n</literal>: returns the Value attribute
-called n, or the Valuelist attribute as a string where the list elements
-are separated by spaces. If the attribute value is implied, or if the
-attribute does not exist, the method returns None. - This method is 
-convenient if you expect a non-list attribute value including the implied
-value.
-</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>required_list_attribute n</literal>: returns the Valuelist attribute
-called n, or the Value attribute as a list with a single element.
-If the attribute value is implied, or if the
-attribute does not exist, the method will fail. - This method is 
-convenient if you expect a list attribute value.
-</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>optional_list_attribute n</literal>: returns the Valuelist attribute
-called n, or the Value attribute as a list with a single element.
-If the attribute value is implied, or if the
-attribute does not exist, an empty list will be returned. - This method
-is convenient if you expect a list attribute value or the implied value.
-</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>attribute_names</literal>: returns the list of all attribute names of
-this element. As this is a validating parser, this list is equal to the
-list of declared attributes.
-</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>attribute_type n</literal>: returns the type of the attribute called
-<literal>n</literal>. See the module <literal>Pxp_types</literal> for a
-description of the encoding of the types.
-</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>attributes</literal>: returns the list of pairs of names and values
-for all attributes of
-this element.</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>id_attribute_name</literal>: returns the name of the attribute that is
-declared with type ID. There is at most one such attribute. The method raises
-<literal>Not_found</literal> if there is no declared ID attribute for the
-element type.</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>id_attribute_value</literal>: returns the value of the attribute that
-is declared with type ID. There is at most one such attribute. The method raises
-<literal>Not_found</literal> if there is no declared ID attribute for the
-element type.</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>idref_attribute_names</literal>: returns the list of attribute names
-that are declared as IDREF or IDREFS.</para>
-               </listitem>
-             </itemizedlist>
-          </para>
-         </formalpara>
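-
-<para>The four convenience accessors differ only in how they map the three
-cases of <literal>att_value</literal>. The following sketch restates that
-mapping as plain functions. It redefines the type locally so that it is
-self-contained, and the function names are invented for the illustration;
-the real methods take an attribute name and look the value up in the node,
-which is not modelled here.</para>
-

```ocaml
(* Local copy of the variant shown above, so the sketch stands alone. *)
type att_value =
  | Value of string
  | Valuelist of string list
  | Implied_value

(* optional_string_attribute: lists are joined with spaces,
   the implied value becomes None. *)
let optional_string = function
  | Value s -> Some s
  | Valuelist l -> Some (String.concat " " l)
  | Implied_value -> None

(* required_string_attribute: same mapping, but implied values fail. *)
let required_string v =
  match optional_string v with
  | Some s -> s
  | None -> failwith "required_string: implied value"

(* optional_list_attribute: a single value becomes a one-element list,
   the implied value becomes the empty list. *)
let optional_list = function
  | Value s -> [ s ]
  | Valuelist l -> l
  | Implied_value -> []

(* required_list_attribute: same mapping, but implied values fail. *)
let required_list = function
  | Implied_value -> failwith "required_list: implied value"
  | v -> optional_list v
```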
-         
-         <anchor id="type-node-mods">
-         <formalpara>
-           <title id="type-node-mods.title">
-              <link linkend="type-node-mods.sig">Modifying methods</link>
-            </title>
-           
-           <para>
-The following methods are only defined for element nodes (more exactly:
-the methods are defined for data nodes, too, but always fail).
-
-             <itemizedlist mark="bullet" spacing="compact">
-               <listitem>
-                 <para>
-<literal>add_node sn</literal>: Adds sub node <literal>sn</literal> to the list
-of children. This operation is illustrated in the picture 
-<link linkend="node-add" endterm="node-add"></link>. This method expects that
-<literal>sn</literal> is a root, and it requires that <literal>sn</literal> and
-the current object share the same DTD.
-</para>
-
-<para>Because <literal>add_node</literal> is the method the parser itself uses
-to add new nodes to the tree, it performs by default some simple validation
-checks: If the content model is a regular expression, it is not allowed to add
-data nodes to this node unless the new nodes consist only of whitespace. In
-this case, the new data nodes are silently dropped (you can change this by
-invoking <literal>keep_always_whitespace_mode</literal>).
-</para>
-
-<para>If the document is flagged as stand-alone, these data nodes only
-containing whitespace are even forbidden if the element declaration is
-contained in an external entity. This case is detected and rejected.</para>
-
-<para>If the content model is <literal>EMPTY</literal>, it is not allowed to
-add any data node unless the data node is empty. In this case, the new data
-node is silently dropped.
-</para>
-
-<para>These checks only apply if there is a DTD. In well-formedness mode, it is
-assumed that every element is declared with content model
-<literal>ANY</literal>, which rules out any validation check. Furthermore, you
-can turn these checks off by passing <literal>~force:true</literal> as the first
-argument.</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>add_pinstr pi</literal>: Adds the processing instruction
-<literal>pi</literal> to the list of processing instructions.
-</para>
-               </listitem>
-
-               <listitem>
-                 <para>
-<literal>delete</literal>: Deletes this node from the tree. After this
-operation, this node is no longer a child of its former parent node, and the
-node loses the connection to the parent as well. This operation is illustrated
-by the figure <link linkend="node-delete" endterm="node-delete"></link>.
-</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>set_nodes nl</literal>: Sets the list of children to
-<literal>nl</literal>. It is required that every member of <literal>nl</literal>
-is a root, and that all members and the current object share the same DTD.
-Unlike <literal>add_node</literal>, no validation checks are performed.
-</para>
-             </listitem>
-             <listitem>
-                 <para>
-<literal>quick_set_attributes atts</literal>: sets the attributes of this
-element to <literal>atts</literal>. It is <emphasis>not</emphasis> checked
-whether <literal>atts</literal> matches the DTD or not; it is up to the
-caller of this method to ensure this. (This method may be useful to transform
-the attribute values, i.e. apply a mapping to every attribute.)
-</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>set_comment text</literal>: This method is only applicable to
-<literal>T_comment</literal> nodes; it sets the comment text contained by such
-nodes. </para>
-               </listitem>
-             </itemizedlist>
-</para>
-         </formalpara>
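-
-<para>The checks that <literal>add_node</literal> applies to new data nodes
-can be read as a small decision table. The sketch below encodes that table
-under simplifying assumptions: the types <literal>content_model</literal> and
-<literal>decision</literal> and the function <literal>check_data</literal> are
-hypothetical, the stand-alone rule is left out, and the flag
-<literal>keep_whitespace</literal> merely stands in for
-<literal>keep_always_whitespace_mode</literal>.</para>
-

```ocaml
(* Hypothetical, simplified view of the content model of the parent. *)
type content_model =
  | Regexp_model   (* content model given as a regular expression *)
  | Empty_model    (* declared EMPTY *)
  | Other_model    (* e.g. ANY or mixed content: no check in this sketch *)

type decision = Keep | Drop | Reject

(* XML whitespace characters: space, tab, line feed, carriage return. *)
let is_whitespace s =
  let ok = ref true in
  String.iter
    (fun c ->
      if not (c = ' ' || c = '\t' || c = '\n' || c = '\r') then ok := false)
    s;
  !ok

(* Decide what happens to a data node with contents s. *)
let check_data ?(force = false) ?(keep_whitespace = false) model s =
  if force then Keep
  else
    match model with
    | Regexp_model ->
        if not (is_whitespace s) then Reject
        else if keep_whitespace then Keep
        else Drop
    | Empty_model -> if s = "" then Drop else Reject
    | Other_model -> Keep
```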
-         
-         <anchor id="type-node-cloning">
-         <formalpara>
-           <title id="type-node-cloning.title">
-              <link linkend="type-node-cloning.sig">Cloning methods</link>
-            </title>
-
-           <para>
-             <itemizedlist mark="bullet" spacing="compact">
-               <listitem>
-                 <para>
-<literal>orphaned_clone</literal>: Returns a clone of the node and the complete
-tree below this node (deep clone). The clone does not have a parent (i.e. the
-reference to the parent node is <emphasis>not</emphasis> cloned). While
-copying the subtree, strings are not duplicated; it is likely that the original
-tree and the copy share strings. Extension objects are cloned by invoking
-the <literal>clone</literal> method on the original objects; how much of
-the extension objects is cloned depends on the implementation of this method.
-</para>
-                 <para>This operation is illustrated by the figure 
-<link linkend="node-clone" endterm="node-clone"></link>.
-</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>orphaned_flat_clone</literal>: Returns a clone of the node,
-but sets the list of sub nodes to [], i.e. the sub nodes are not cloned.
-</para>
-               </listitem>
-               <listitem>
-                 <para>
-<anchor id="type-node-meth-create-element">
-<literal>create_element dtd nt al</literal>: Returns a flat copy of this node
-(which must be an element) with the following modifications: The DTD is set to
-<literal>dtd</literal>; the node type is set to <literal>nt</literal>, and the
-new attribute list is set to <literal>al</literal> (given as list of
-(name,value) pairs). The copy has neither children nor a parent. It does not
-contain processing instructions. See 
-<link linkend="type-node-ex-create-element">the example below</link>.
-</para>
-
-                 <para>Note that you can specify the position of the new node
-by the optional argument <literal>~position</literal>.</para>
-               </listitem>
-               <listitem>
-                 <para>
-<anchor id="type-node-meth-create-data">
-<literal>create_data dtd cdata</literal>: Returns a flat copy of this node
-(which must be a data node) with the following modifications: The DTD is set to
-<literal>dtd</literal>; the node type is set to <literal>T_data</literal>; the
-attribute list is empty (data nodes never have attributes); the list of
-children and PIs is empty, too (same reason). The new node does not have a
-parent. The value <literal>cdata</literal> is the new character content of the
-node. See 
-<link linkend="type-node-ex-create-data">the example below</link>.
-</para>
-               </listitem>
-               <listitem>
-                 <para>
-<literal>keep_always_whitespace_mode</literal>: Once this mode is turned on,
-even data nodes which would normally be dropped because they only contain
-ignorable whitespace can be added to this node. (This mode is useful to produce
-canonical XML.)
-</para>
-               </listitem>
-             </itemizedlist>
-</para>
-         </formalpara>
-         
-         <anchor id="type-node-weird">
-         <formalpara>
-           <title id="type-node-weird.title">
-              <link linkend="type-node-weird.sig">Validating methods</link>
-            </title>
-           <para>
-There is one method which locally validates the node, i.e. checks whether the
-subnodes match the content model of this node.
-
-             <itemizedlist mark="bullet" spacing="compact">
-               <listitem>
-                 <para>
-<literal>local_validate</literal>: Checks that this node conforms to the
-DTD by comparing the type of the subnodes with the content model for this
-node. (Applications need not call this method unless they add new nodes
-themselves to the tree.)
-</para>
-               </listitem>
-             </itemizedlist>
-</para>
-         </formalpara>
-       </sect2>
-
-       <sect2>
-         <title>The class <literal>element_impl</literal></title>
-         <para>
-This class is an implementation of <literal>node</literal> which
-realizes element nodes:
-
-<programlisting>
-<![CDATA[
-class [ 'ext ] element_impl : 'ext -> [ 'ext ] node
-]]>
-</programlisting>
-
-</para>
-         <formalpara>
-           <title>Constructor</title>
-           <para>
-You can create a new instance by
-
-<programlisting>
-new element_impl <replaceable>extension_object</replaceable>
-</programlisting>
-
-which creates a special form of empty element which already contains a
-reference to the <replaceable>extension_object</replaceable>, but is
-otherwise empty. This special form is called an
-<emphasis>exemplar</emphasis>. The purpose of exemplars is that they serve as
-patterns that can be duplicated and filled with data. The method
-<link linkend="type-node-meth-create-element">
-<literal>create_element</literal></link> is designed to perform this action.
-</para>
-         </formalpara>
-
-         <anchor id="type-node-ex-create-element">
-         <formalpara>
-           <title>Example</title>
-
-           <para>First, create an exemplar by
-
-<programlisting>
-let exemplar_ext = ... in
-let exemplar     = new element_impl exemplar_ext in
-</programlisting>
-
-The <literal>exemplar</literal> is not used in node trees, but only as
-a pattern when the element nodes are created:
-
-<programlisting>
-let element = exemplar # <link linkend="type-node-meth-create-element">create_element</link> dtd (T_element name) attlist 
-</programlisting>
-
-The <literal>element</literal> is a copy of <literal>exemplar</literal>
-(even the extension <literal>exemplar_ext</literal> has been copied)
-which ensures that <literal>element</literal> and its extension are objects
-of the same class as the exemplars; note that you need not pass a 
-class name or other meta information. The copy is initially connected 
-with the <literal>dtd</literal>, it gets a node type, and the attribute list
-is filled. The <literal>element</literal> is now fully functional; it can
-be added to another element as child, and it can contain references to
-subnodes.
-</para>
-         </formalpara>
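-
-<para>Why copying an exemplar preserves its class without any meta information
-can be shown with plain O'Caml objects. The two classes below are toy classes
-written for this illustration only; they do not reproduce
-<literal>element_impl</literal>, and the string standing in for the extension
-object is an assumption.</para>
-

```ocaml
(* Toy exemplar: creating an element is a functional copy of the
   object itself with some fields replaced. *)
class toy_exemplar (ext : string) =
  object
    val name = ""
    val atts : (string * string) list = []
    method ext = ext
    method name = name
    method atts = atts
    (* The copy has the same class as the object it was made from. *)
    method create_element n al = {< name = n; atts = al >}
  end

(* A subclass adds a method; copies made from its exemplar keep it. *)
class verbose_exemplar ext =
  object
    inherit toy_exemplar ext
    method describe =
      Printf.sprintf "<%s> with %d attribute(s)" name (List.length atts)
  end

let () =
  let exemplar = new verbose_exemplar "some extension" in
  let element = exemplar # create_element "a" [ ("att", "apple") ] in
  (* element is a verbose_exemplar, although no class name was passed *)
  print_endline (element # describe)
```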
-
-       </sect2>
-
-       <sect2>
-         <title>The class <literal>data_impl</literal></title>
-         <para>
-This class is an implementation of <literal>node</literal> which
-should be used for all character data nodes:
-
-<programlisting>
-<![CDATA[
-class [ 'ext ] data_impl : 'ext -> [ 'ext ] node
-]]>
-</programlisting>
-
-</para>
-
-         <formalpara>
-           <title>Constructor</title>
-           <para>
-You can create a new instance by
-
-<programlisting>
-new data_impl <replaceable>extension_object</replaceable>
-</programlisting>
-
-which creates an empty exemplar node which is connected to
-<replaceable>extension_object</replaceable>. The node does not contain a
-reference to any DTD, and because of this it cannot be added to node trees.
-</para>
-         </formalpara>
-           
-         <para>To get a fully working data node, apply the method
-<link linkend="type-node-meth-create-data"><literal>create_data</literal>
-</link> to the exemplar (see example).
-</para>
-
-         <anchor id="type-node-ex-create-data">
-         <formalpara>
-           <title>Example</title>
-
-           <para>First, create an exemplar by
-
-<programlisting>
-let exemplar_ext = ... in
-let exemplar     = new data_impl exemplar_ext in
-</programlisting>
-
-The <literal>exemplar</literal> is not used in node trees, but only as
-a pattern when the data nodes are created:
-
-<programlisting>
-let data_node = exemplar # <link
-                                linkend="type-node-meth-create-data">create_data</link> dtd "The characters contained in the data node" 
-</programlisting>
-
-The <literal>data_node</literal> is a copy of <literal>exemplar</literal>.
-The copy is initially connected 
-with the <literal>dtd</literal>, and it is filled with character material.
-The <literal>data_node</literal> is now fully functional; it can
-be added to an element as child.
-</para>
-         </formalpara>
-       </sect2>
-
-       <sect2>
-         <title>The type <literal>spec</literal></title>
-         <para>
-The type <literal>spec</literal> defines a way to handle the details of
-creating nodes from exemplars.
-
-<programlisting><![CDATA[
-type 'ext spec
-constraint 'ext = 'ext node #extension
-
-val make_spec_from_mapping :
-      ?super_root_exemplar : 'ext node ->
-      ?comment_exemplar : 'ext node ->
-      ?default_pinstr_exemplar : 'ext node ->
-      ?pinstr_mapping : (string, 'ext node) Hashtbl.t ->
-      data_exemplar: 'ext node ->
-      default_element_exemplar: 'ext node ->
-      element_mapping: (string, 'ext node) Hashtbl.t -> 
-      unit -> 
-        'ext spec
-
-val make_spec_from_alist :
-      ?super_root_exemplar : 'ext node ->
-      ?comment_exemplar : 'ext node ->
-      ?default_pinstr_exemplar : 'ext node ->
-      ?pinstr_alist : (string * 'ext node) list ->
-      data_exemplar: 'ext node ->
-      default_element_exemplar: 'ext node ->
-      element_alist: (string * 'ext node) list -> 
-      unit -> 
-        'ext spec
-]]></programlisting>
-
-The two functions <literal>make_spec_from_mapping</literal> and
-<literal>make_spec_from_alist</literal> create <literal>spec</literal>
-values. The two functions are functionally equivalent; the only difference is
-that the first takes hash tables and the second takes association lists to
-describe mappings from names to exemplars.
-</para>
-
-<para>
-You can specify exemplars for the various kinds of nodes that need to be
-generated when an XML document is parsed:
-             
-<itemizedlist mark="bullet" spacing="compact">
-             <listitem>
-               <para><literal>~super_root_exemplar</literal>: This exemplar
-is used to create the super root. This special node is only created if the
-corresponding configuration option has been selected; it is the parent node of
-the root node which may be convenient if every working node must have a parent.</para>
-             </listitem>
-             <listitem>
-               <para><literal>~comment_exemplar</literal>: This exemplar is
-used when a comment node must be created. Note that such nodes are only created
-if the corresponding configuration option is "on".
-</para>
-             </listitem>
-             <listitem>
-               <para><literal>~default_pinstr_exemplar</literal>: If a node
-for a processing instruction must be created, and the instruction is not listed
-in the table passed by <literal>~pinstr_mapping</literal> or
-<literal>~pinstr_alist</literal>, this exemplar is used.
-Again the configuration option must be "on" in order to create such nodes at
-all. 
-</para>
-             </listitem>
-             <listitem>
-               <para><literal>~pinstr_mapping</literal> or
-<literal>~pinstr_alist</literal>: Map the target names of processing
-instructions to exemplars. These mappings are only used when nodes for
-processing instructions are created.</para>
-             </listitem>
-             <listitem>
-               <para><literal>~data_exemplar</literal>: The exemplar for
-ordinary data nodes.</para>
-             </listitem>
-             <listitem>
-               <para><literal>~default_element_exemplar</literal>: This
-exemplar is used if an element node must be created, but the element type
-cannot be found in the tables <literal>element_mapping</literal> or
-<literal>element_alist</literal>.</para>
-             </listitem>
-             <listitem>
-               <para><literal>~element_mapping</literal> or
-<literal>~element_alist</literal>: Map the element types to exemplars. These
-mappings are used to create element nodes.</para>
-             </listitem>
-           </itemizedlist>
-
-In most cases, you only want to create <literal>spec</literal> values to pass
-them to the parser functions found in <literal>Pxp_yacc</literal>. However, it
-might be useful to apply <literal>spec</literal> values directly.
-</para>
-
-<para>The following functions create various types of nodes by selecting the
-corresponding exemplar from the passed <literal>spec</literal> value, and by
-calling <literal>create_element</literal> or <literal>create_data</literal> on
-the exemplar.
-
-<programlisting><![CDATA[
-val create_data_node : 
-      'ext spec -> 
-      dtd -> 
-      (* data material: *) string -> 
-          'ext node
-
-val create_element_node : 
-      ?position:(string * int * int) ->
-      'ext spec -> 
-      dtd -> 
-      (* element type: *) string -> 
-      (* attributes: *) (string * string) list -> 
-          'ext node
-
-val create_super_root_node :
-      ?position:(string * int * int) ->
-      'ext spec -> 
-       dtd -> 
-           'ext node
-
-val create_comment_node :
-      ?position:(string * int * int) ->
-      'ext spec -> 
-      dtd -> 
-      (* comment text: *) string -> 
-          'ext node
-
-val create_pinstr_node :
-      ?position:(string * int * int) ->
-      'ext spec -> 
-      dtd -> 
-      proc_instruction -> 
-          'ext node
-]]></programlisting>
-</para>
-       </sect2>
-
-       <sect2>
-         <title>Examples</title>
-
-         <formalpara>
-           <title>Building trees.</title>
-
-           <para>Here is the piece of code that creates the tree of
-the figure <link linkend="node-term" endterm="node-term"></link>. The extension
-object and the DTD are beyond the scope of this example.
-
-<programlisting>
-let exemplar_ext = ... (* some extension *) in
-let dtd = ... (* some DTD *) in
-
-let element_exemplar = new element_impl exemplar_ext in
-let data_exemplar    = new data_impl    exemplar_ext in
-
-let a1 = element_exemplar # create_element dtd (T_element "a") ["att", "apple"]
-and b1 = element_exemplar # create_element dtd (T_element "b") []
-and c1 = element_exemplar # create_element dtd (T_element "c") []
-and a2 = element_exemplar # create_element dtd (T_element "a") ["att", "orange"]
-in
-
-let cherries = data_exemplar # create_data dtd "Cherries" in
-let orange   = data_exemplar # create_data dtd "An orange" in
-
-a1 # add_node b1;
-a1 # add_node c1;
-b1 # add_node a2;
-b1 # add_node cherries;
-a2 # add_node orange;
-</programlisting>
-
-Alternatively, the last block of statements could also be written as:
-
-<programlisting>
-a1 # set_nodes [b1; c1];
-b1 # set_nodes [a2; cherries];
-a2 # set_nodes [orange];
-</programlisting>
-
-The root of the tree is <literal>a1</literal>, i.e. it is true that
-
-<programlisting>
-x # root == a1
-</programlisting>
-
-for every x from { <literal>a1</literal>, <literal>a2</literal>,
-<literal>b1</literal>, <literal>c1</literal>, <literal>cherries</literal>,
-<literal>orange</literal> }.
-</para>
-         </formalpara>
-         <para>
-Furthermore, the following properties hold:
-
-<programlisting>
-  a1 # attribute "att" = Value "apple"
-& a2 # attribute "att" = Value "orange"
-
-& cherries # data = "Cherries"
-&   orange # data = "An orange"
-&       a1 # data = "An orangeCherries"
-
-&       a1 # node_type = T_element "a"
-&       a2 # node_type = T_element "a"
-&       b1 # node_type = T_element "b"
-&       c1 # node_type = T_element "c"
-& cherries # node_type = T_data
-&   orange # node_type = T_data
-
-&       a1 # sub_nodes = [ b1; c1 ]
-&       a2 # sub_nodes = [ orange ]
-&       b1 # sub_nodes = [ a2; cherries ]
-&       c1 # sub_nodes = []
-& cherries # sub_nodes = []
-&   orange # sub_nodes = []
-
-&       a2 # parent == a1
-&       b1 # parent == a1
-&       c1 # parent == a1
-& cherries # parent == b1
-&   orange # parent == a2
-</programlisting>
-</para>
-         <formalpara>
-           <title>Searching nodes.</title>
-
-           <para>The following function collects all nodes of a tree 
-for which a given condition holds:
-
-<programlisting>
-let rec search p t =
-  if p t then
-    t :: search_list p (t # sub_nodes)
-  else
-    search_list p (t # sub_nodes)
-
-and search_list p l =
-  match l with
-    []      -&gt; []
-  | t :: l' -&gt; (search p t) @ (search_list p l')
-;;
-</programlisting>
-</para>
-         </formalpara>
-
-         <para>For example, if you want to find all elements of a certain
-type <literal>et</literal>, the function <literal>search</literal> can be
-applied as follows:
-
-<programlisting>
-let search_element_type et t =
-  search (fun x -&gt; x # node_type = T_element et) t
-;;
-</programlisting>
-</para>
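-
-         <para>The <literal>search</literal> function above joins result lists
-with <literal>@</literal>, which copies its left operand on every call. The
-following self-contained sketch shows the same pre-order traversal with an
-accumulator instead. It does not use PXP: the record type
-<literal>node</literal> and the helper <literal>elem</literal> are
-hypothetical stand-ins for the real node objects.
-
-<programlisting><![CDATA[
-(* Toy stand-in for PXP nodes; NOT the real Pxp_document types. *)
-type node_type = T_element of string | T_data
-
-type node = { node_type : node_type; sub_nodes : node list }
-
-(* Collect all nodes satisfying p, in document order (pre-order). *)
-let search p t =
-  let rec go acc t =
-    let acc = if p t then t :: acc else acc in
-    List.fold_left go acc t.sub_nodes
-  in
-  List.rev (go [] t)
-
-let search_element_type et t =
-  search (fun x -> x.node_type = T_element et) t
-
-(* Hypothetical constructors for the toy tree. *)
-let elem name subs = { node_type = T_element name; sub_nodes = subs }
-let data = { node_type = T_data; sub_nodes = [] }
-
-(* Same shape as the tree of the "Building trees" example. *)
-let a2 = elem "a" [ data ]
-let b1 = elem "b" [ a2; data ]
-let c1 = elem "c" []
-let a1 = elem "a" [ b1; c1 ]
-
-let () =
-  Printf.printf "%d elements of type a\n"
-    (List.length (search_element_type "a" a1))
-]]></programlisting>
-
-The accumulator is built in reverse and reversed once at the end, so the
-result order matches that of the original function.
-</para>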
-
-         <formalpara>
-           <title>Getting attribute values.</title>
-
-           <para>Suppose we have the declaration:
-
-<programlisting><![CDATA[
-<!ATTLIST e a CDATA #REQUIRED
-            b CDATA #IMPLIED
-            c CDATA "12345">]]>
-</programlisting>
-
-In this case, every element <literal>e</literal> must have an attribute 
-<literal>a</literal>, otherwise the parser would indicate an error. If
-the O'Caml variable <literal>n</literal> holds the node of the tree 
-corresponding to the element, you can get the value of the attribute
-<literal>a</literal> by
-
-<programlisting>
-let value_of_a = n # required_string_attribute "a"
-</programlisting>
-
-which is more or less an abbreviation for 
-
-<programlisting><![CDATA[
-let value_of_a = 
-  match n # attribute "a" with
-    Value s -> s
-  | _       -> assert false]]>
-</programlisting>
-
-- as the attribute is required, the <literal>attribute</literal> method always
-returns a <literal>Value</literal>.
-</para>
-         </formalpara>
-         
-         <para>In contrast to this, the attribute <literal>b</literal> can be
-omitted. In this case, the method <literal>required_string_attribute</literal>
-works only if the attribute is there, and the method will fail if the attribute
-is missing. To get the value, you can apply the method
-<literal>optional_string_attribute</literal>:
-
-<programlisting>
-let value_of_b = n # optional_string_attribute "b"
-</programlisting>
-
-Now, <literal>value_of_b</literal> is of type <literal>string option</literal>,
-and <literal>None</literal> represents the omitted attribute. Alternatively, 
-you could also use <literal>attribute</literal>:
-
-<programlisting><![CDATA[
-let value_of_b = 
-  match n # attribute "b" with
-    Value s       -> Some s
-  | Implied_value -> None
-  | _             -> assert false]]>
-</programlisting>
-</para>
-
-         <para>The attribute <literal>c</literal> behaves much like
-<literal>a</literal>, because it always has a value. If the attribute is
-omitted, the default, here "12345", is returned instead. Because of this,
-you can again use <literal>required_string_attribute</literal> to get the
-value.
-</para>
-
-         <para>The type <literal>CDATA</literal> is the most general string
-type. The types <literal>NMTOKEN</literal>, <literal>ID</literal>,
-<literal>IDREF</literal>, <literal>ENTITY</literal>, and all enumerators and
-notations are special forms of string types that restrict the possible
-values. From O'Caml, they behave like <literal>CDATA</literal>, i.e. you can
-use the methods <literal>required_string_attribute</literal> and
-<literal>optional_string_attribute</literal>, too.
-</para>
-
-         <para>In contrast to this, the types <literal>NMTOKENS</literal>,
-<literal>IDREFS</literal>, and <literal>ENTITIES</literal> mean lists of
-strings. Suppose we have the declaration:
-
-<programlisting><![CDATA[
-<!ATTLIST f d NMTOKENS #REQUIRED
-            e NMTOKENS #IMPLIED>]]>
-</programlisting>
-
-The type <literal>NMTOKENS</literal> stands for lists of space-separated
-tokens; for example the value <literal>"1 abc 23ef"</literal> means the list
-<literal>["1"; "abc"; "23ef"]</literal>. (Again, <literal>IDREFS</literal>
-and <literal>ENTITIES</literal> have more restricted values.) To get the
-value of attribute <literal>d</literal>, one can use
-
-<programlisting>
-let value_of_d = n # required_list_attribute "d"
-</programlisting>
-
-or
-
-<programlisting><![CDATA[
-let value_of_d = 
-  match n # attribute "d" with
-    Valuelist l -> l
-  | _           -> assert false]]>
-</programlisting>
- 
-As <literal>d</literal> is required, the attribute cannot be omitted, and 
-the <literal>attribute</literal> method always returns a
-<literal>Valuelist</literal>. 
-</para>
-
-         <para>For optional attributes like <literal>e</literal>, apply
-
-<programlisting>
-let value_of_e = n # optional_list_attribute "e"
-</programlisting>
-
-or
-
-<programlisting><![CDATA[
-let value_of_e = 
-  match n # attribute "e" with
-    Valuelist l   -> l
-  | Implied_value -> []
-  | _             -> assert false]]>
-</programlisting>
-
-Here, a missing attribute is treated as the empty list.
-</para>
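-
-         <para>The case analysis above can be summarised in a small model. The
-sketch below is self-contained and does not call PXP; the type
-<literal>att_value</literal> only mirrors the three constructors used in this
-section, and the functions <literal>tokens</literal>,
-<literal>optional_string</literal> and <literal>optional_list</literal> are
-hypothetical names. How the real methods react to an attribute of the wrong
-kind is not modelled; the sketch simply raises
-<literal>Invalid_argument</literal>.
-
-<programlisting><![CDATA[
-(* Simplified model of the attribute values described in this section. *)
-type att_value =
-  | Value of string
-  | Valuelist of string list
-  | Implied_value
-
-(* Split an NMTOKENS-like value at spaces, dropping empty pieces. *)
-let tokens s =
-  List.filter (fun t -> t <> "") (String.split_on_char ' ' s)
-
-(* Model of optional_string_attribute: None stands for an omitted attribute. *)
-let optional_string = function
-  | Value s -> Some s
-  | Implied_value -> None
-  | Valuelist _ -> invalid_arg "optional_string: list-valued attribute"
-
-(* Model of optional_list_attribute: an omitted attribute is the empty list. *)
-let optional_list = function
-  | Valuelist l -> l
-  | Implied_value -> []
-  | Value _ -> invalid_arg "optional_list: string-valued attribute"
-
-let () =
-  let d = Valuelist (tokens "1 abc 23ef") in
-  List.iter print_endline (optional_list d)
-]]></programlisting>
-</para>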
-
-       </sect2>
-
-
-       <sect2>
-         <title>Iterators</title>
-
-         <para>There are also several iterators in Pxp_document; please see
-the mli file for details. You can find examples for them in the
-"simple_transformation" directory.
-
-<programlisting><![CDATA[
-val find : ?deeply:bool -> 
-           f:('ext node -> bool) -> 'ext node -> 'ext node
-
-val find_all : ?deeply:bool ->
-               f:('ext node -> bool) -> 'ext node -> 'ext node list
-
-val find_element : ?deeply:bool ->
-                   string -> 'ext node -> 'ext node
-
-val find_all_elements : ?deeply:bool ->
-                        string -> 'ext node -> 'ext node list
-
-exception Skip
-val map_tree :  pre:('exta node -> 'extb node) ->
-               ?post:('extb node -> 'extb node) ->
-               'exta node -> 
-                   'extb node
-
-
-val map_tree_sibl : 
-        pre: ('exta node option -> 'exta node -> 'exta node option -> 
-                  'extb node) ->
-       ?post:('extb node option -> 'extb node -> 'extb node option -> 
-                  'extb node) ->
-       'exta node -> 
-           'extb node
-
-val iter_tree : ?pre:('ext node -> unit) ->
-                ?post:('ext node -> unit) ->
-                'ext node -> 
-                    unit
-
-val iter_tree_sibl :
-       ?pre: ('ext node option -> 'ext node -> 'ext node option -> unit) ->
-       ?post:('ext node option -> 'ext node -> 'ext node option -> unit) ->
-       'ext node -> 
-           unit
-]]></programlisting>
-</para>
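-
-         <para>To illustrate the role of the <literal>~pre</literal> and
-<literal>~post</literal> arguments, here is a self-contained model of a
-traversal in the style of <literal>iter_tree</literal>. It is only a sketch:
-the type <literal>tree</literal> is a toy replacement for the node type, and
-the assumption that <literal>pre</literal> runs before the subnodes are
-visited and <literal>post</literal> afterwards should be checked against the
-mli file.
-
-<programlisting><![CDATA[
-(* Toy tree; NOT the real 'ext node type. *)
-type tree = Node of string * tree list
-
-(* Model of iter_tree: pre runs before the children, post after them.
-   This ordering is an assumption of the sketch. *)
-let rec iter_tree ?(pre = fun _ -> ()) ?(post = fun _ -> ()) t =
-  pre t;
-  let (Node (_, children)) = t in
-  List.iter (fun c -> iter_tree ~pre ~post c) children;
-  post t
-
-let name (Node (n, _)) = n
-
-(* Record the traversal order as nested tags. *)
-let render t =
-  let b = Buffer.create 64 in
-  iter_tree
-    ~pre:(fun n -> Buffer.add_string b ("<" ^ name n ^ ">"))
-    ~post:(fun n -> Buffer.add_string b ("</" ^ name n ^ ">"))
-    t;
-  Buffer.contents b
-
-let () =
-  print_endline (render (Node ("a", [ Node ("b", []); Node ("c", []) ])))
-]]></programlisting>
-</para>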
-       </sect2>
-
-      </sect1>
-
-<!-- ********************************************************************** -->
-
-      <sect1>
-       <title>The class type <literal>extension</literal></title>
-       <para>
-
-<programlisting>
-<![CDATA[
-class type [ 'node ] extension =
-  object ('self)
-    method clone : 'self
-      (* "clone" should return an exact deep copy of the object. *)
-    method node : 'node
-      (* "node" returns the corresponding node of this extension. This method
-       * is intended to return exactly what previously has been set by "set_node".
-       *)
-    method set_node : 'node -> unit
-      (* "set_node" is invoked once the extension is associated to a new
-       * node object.
-       *)
-  end
-]]>
-</programlisting>
-
-This is the type of classes used for node extensions. For every node of the
-document tree, there is not only the <literal>node</literal> object, but also
-an <literal>extension</literal> object. The latter has minimal
-functionality; it has only the necessary methods to be attached to the node
-object containing the details of the node instance. The extension object is
-called extension because its purpose is extensibility.</para>
-
-       <para>For various reasons, it is impossible to derive the
-<literal>node</literal> classes (i.e. <literal>element_impl</literal> and
-<literal>data_impl</literal>) such that the subclasses can be extended by
-new methods. But
-subclassing nodes is a great feature, because it allows the user to provide
-different classes for different types of nodes. The extension objects are a
-workaround that is as powerful as direct subclassing; the cost is
-some notational overhead.
-</para>
-
-<figure id="extension-general" float="1">
-<title>The structure of nodes and extensions</title>
-<graphic fileref="pic/extension_general" format="GIF">
-</graphic>
-</figure>
-
-       <para>The picture shows how the nodes and extensions are linked
-together. Every node has a reference to its extension, and every extension has
-a reference to its node. The methods <literal>extension</literal> and
-<literal>node</literal> follow these references; a typical phrase is 
-
-<programlisting>
-self # node # attribute "xy"
-</programlisting>
-
-to get the value of an attribute from a method defined in the extension object;
-or 
-
-<programlisting>
-self # node # iter
-  (fun n -&gt; n # extension # my_method ...)
-</programlisting>
-
-to iterate over the subnodes and to call <literal>my_method</literal> of the
-corresponding extension objects.
-</para>
-
-       <para>Note that extension objects do not have references to subnodes
-(or "subextensions") themselves; in order to get one of the children of an
-extension you must first go to the node object, then get the child node, and
-finally reach the extension that is logically the child of the extension you
-started with.</para>
-
-       <sect2>
-         <title>How to define an extension class</title>
-
-         <para>At minimum, you must define the methods
-<literal>clone</literal>, <literal>node</literal>, and
-<literal>set_node</literal> such that your class is compatible with the type
-<literal>extension</literal>. The method <literal>set_node</literal> is called
-during the initialization of the node, or after a node has been cloned; the
-node object invokes <literal>set_node</literal> on the extension object to tell
-it that this node is now the object the extension is linked to. The extension
-must return the node object passed as argument of <literal>set_node</literal>
-when the <literal>node</literal> method is called.</para>
-
-         <para>The <literal>clone</literal> method must return a copy of the
-extension object; at least the object itself must be duplicated, but if
-required, the copy should deeply duplicate all objects and values that are
-referred to by the extension, too. Whether this is required depends on the
-application; <literal>clone</literal> is invoked by the node object when one of
-its cloning methods is called.</para>
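-
-         <para>The difference between duplicating only the object and
-duplicating the values it refers to can be seen in the following
-self-contained sketch. The classes <literal>shallow_ext</literal> and
-<literal>deep_ext</literal> are hypothetical and, for brevity, leave out the
-<literal>node</literal> and <literal>set_node</literal> methods.
-
-<programlisting><![CDATA[
-(* {< >} copies the object itself; values it refers to stay shared. *)
-class shallow_ext =
-  object
-    val notes : string list ref = ref []
-    method clone = {< >}
-    method add_note s = notes := s :: !notes
-    method notes = !notes
-  end
-
-(* Overriding the instance variable in the copy gives each clone its own
-   reference cell. *)
-class deep_ext =
-  object
-    val notes : string list ref = ref []
-    method clone = {< notes = ref !notes >}
-    method add_note s = notes := s :: !notes
-    method notes = !notes
-  end
-
-let () =
-  let a = new shallow_ext in
-  let b = a#clone in
-  b#add_note "seen";
-  Printf.printf "shallow original has %d note(s)\n" (List.length a#notes);
-  let c = new deep_ext in
-  let d = c#clone in
-  d#add_note "seen";
-  Printf.printf "deep original has %d note(s)\n" (List.length c#notes)
-]]></programlisting>
-
-With the shallow copy, a note added through the clone is visible in the
-original as well; whether that is acceptable depends on the application.
-</para>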
-
-         <para>A good starting point for an extension class:
-
-<programlisting>
-<![CDATA[class custom_extension =
-  object (self)
-
-    val mutable node = (None : custom_extension node option)
-
-    method clone = {< >} 
-
-    method node =
-      match node with
-          None ->
-            assert false
-        | Some n -> n
-
-    method set_node n =
-      node <- Some n
-
-  end
-]]>
-</programlisting>
-
-This class is compatible with <literal>extension</literal>. The purpose of
-defining such a class is, of course, adding further methods; and you can do it
-without restriction. 
-</para>
-
-         <para>Often, you need more than one extension class. In this case,
-the simplest approach is to give all your classes (for one kind of document)
-the same type (with respect to the interface; i.e. it does not matter if your
-classes differ in the defined private methods and instance variables, but
-public methods count). This approach avoids lots of coercions and problems with
-type incompatibilities. It is simple to implement:
-
-<programlisting>
-<![CDATA[class virtual custom_extension =
-  object (self)
-    val mutable node = (None : custom_extension node option)
-
-    method clone = ...      (* see above *)
-    method node = ...       (* see above *)
-    method set_node n = ... (* see above *)
-
-    method virtual my_method1 : ...
-    method virtual my_method2 : ...
-    ... (* etc. *)
-  end
-
-class custom_extension_kind_A =
-  object (self)
-    inherit custom_extension
-
-    method my_method1 = ...
-    method my_method2 = ...
-  end
-
-class custom_extension_kind_B =
-  object (self)
-    inherit custom_extension
-
-    method my_method1 = ...
-    method my_method2 = ...
-  end
-]]>
-</programlisting>
-
-If a class does not need a method (e.g. because it does not make sense, or it
-would violate some important condition), it is possible to define the method
-and to always raise an exception when the method is invoked
-(e.g. <literal>assert false</literal>).
-</para>
-
-         <para>This is a strong recommendation: do not try to further
-specialize the types of extension objects. It is difficult, sometimes even
-impossible, and almost never worthwhile.</para>
-       </sect2>
-
-       <sect2>
-         <title>How to bind extension classes to element types</title>
-
-         <para>Once you have defined your extension classes, you can bind them
-to element types. The simplest case is that you have only one class and that
-this class is to be always used. The parsing functions in the module
-<literal>Pxp_yacc</literal> take a <literal>spec</literal> argument which
-can be customized. If <literal>c</literal> is an instance of your single
-extension class, this argument should be 
-
-<programlisting>
-let spec =
-  make_spec_from_alist
-    ~data_exemplar:            (new data_impl c)
-    ~default_element_exemplar: (new element_impl c)
-    ~element_alist:            []
-    ()
-</programlisting>
-
-This means that data nodes will be created from the exemplar passed by
-<literal>~data_exemplar</literal> and that all element nodes will be made
-from the exemplar specified by <literal>~default_element_exemplar</literal>.
-In <literal>~element_alist</literal>, you can specify that different
-exemplars are to be used for different element types; but
-this is an optional feature. If you do not need it, pass the empty list.
-</para>
-
-<para>
-Remember that an exemplar is a (node, extension) pair that serves as pattern
-when new nodes (and the corresponding extension objects) are added to the
-document tree. In this case, the exemplar contains <literal>c</literal> as
-extension, and when nodes are created, the exemplar is cloned, and cloning
-also makes a copy of <literal>c</literal> such that all nodes of the document
-tree will have a copy of <literal>c</literal> as extension.
-</para>
-
-         <para>The <literal>~element_alist</literal> argument can bind
-specific element types to specific exemplars; as exemplars may be instances of
-different classes it is effectively possible to bind element types to
-classes. For example, if the element type "p" is implemented by the extension
-object "c_p", and "q" by "c_q", you can pass the following value:
-
-<programlisting>
-let spec =
-  make_spec_from_alist
-    ~data_exemplar:            (new data_impl c)
-    ~default_element_exemplar: (new element_impl c)
-    ~element_alist:            
-      [ "p", new element_impl c_p;
-        "q", new element_impl c_q;
-      ]
-    ()
-</programlisting>
-
-The extension object <literal>c</literal> is still used for all data nodes and
-for all other element types.
-</para>
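-
-<para>
-The selection rule described above (look up the element type in the
-association list, otherwise fall back to the default exemplar) can be
-modelled in a few lines. This is a self-contained sketch, not PXP code:
-<literal>exemplar_for</literal> is a hypothetical helper, and exemplars are
-represented by plain strings.
-
-<programlisting><![CDATA[
-(* Exemplars are modelled as plain strings here; real exemplars are node
-   objects. exemplar_for is a hypothetical helper, not part of PXP. *)
-let exemplar_for ~default ~alist element_type =
-  match List.assoc_opt element_type alist with
-  | Some exemplar -> exemplar
-  | None -> default
-
-let () =
-  let alist =
-    [ ("p", "element_impl with c_p"); ("q", "element_impl with c_q") ]
-  in
-  List.iter
-    (fun et ->
-      Printf.printf "%s -> %s\n" et
-        (exemplar_for ~default:"element_impl with c" ~alist et))
-    [ "p"; "q"; "r" ]
-]]></programlisting>
-</para>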
-
-       </sect2>
-
-      </sect1>
-
-<!-- ********************************************************************** -->
-
-      <sect1>
-       <title>Details of the mapping from XML text to the tree representation
-</title>
-
-       <sect2>
-         <title>The representation of character-free elements</title>
-
-         <para>If an element declaration does not allow the element to 
-contain character data, the following rules apply.</para>
-
-         <para>If the element must be empty, i.e. it is declared with the
-keyword <literal>EMPTY</literal>, the element instance must be effectively
-empty (it must not even contain whitespace characters). The parser guarantees
-that a declared <literal>EMPTY</literal> element never contains a data
-node, not even a data node representing the empty string.</para>
-
-         <para>If the element declaration only permits other elements to occur
-within that element but not character data, it is still possible to insert
-whitespace characters between the subelements. The parser ignores these
-characters, too, and does not create data nodes for them.</para>
-
-         <formalpara>
-           <title>Example.</title>
-
-           <para>Consider the following element types:
-
-<programlisting><![CDATA[
-<!ELEMENT x ( #PCDATA | z )* >
-<!ELEMENT y ( z )* >
-<!ELEMENT z EMPTY>
-]]></programlisting>
-
-Only <literal>x</literal> may contain character data; the keyword
-<literal>#PCDATA</literal> indicates this. The other types are character-free. 
-</para>
-         </formalpara>
-
-         <para>The XML term
-
-<programlisting><![CDATA[
-<x><z/> <z/></x>
-]]></programlisting>
-
-will be internally represented by an element node for <literal>x</literal> 
-with three subnodes: the first <literal>z</literal> element, a data node
-containing the space character, and the second <literal>z</literal> element. 
-In contrast to this, the term
-
-<programlisting><![CDATA[
-<y><z/> <z/></y>
-]]></programlisting>
-
-is represented by an element node for <literal>y</literal> with only
-<emphasis>two</emphasis> subnodes, the two <literal>z</literal> elements. There
-is no data node for the space character because spaces are ignored in the
-character-free element <literal>y</literal>.
-</para>
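-
-         <para>The rule can be modelled as a filter over the parsed
-children. The following sketch is self-contained and does not use PXP; the
-type <literal>child</literal> and the function <literal>children</literal>
-are hypothetical, and validation errors (such as non-whitespace text in a
-character-free element) are not modelled.
-
-<programlisting><![CDATA[
-type child = Elem of string | Data of string
-
-(* True if s consists of XML whitespace characters only. *)
-let is_whitespace s =
-  let ws = function ' ' | '\t' | '\n' | '\r' -> true | _ -> false in
-  let rec go i = i >= String.length s || (ws s.[i] && go (i + 1)) in
-  go 0
-
-(* Keep every child if the element type allows #PCDATA; otherwise drop
-   whitespace-only data. *)
-let children ~allows_pcdata cs =
-  if allows_pcdata then cs
-  else
-    List.filter
-      (function Data s -> not (is_whitespace s) | Elem _ -> true)
-      cs
-
-let () =
-  let parsed = [ Elem "z"; Data " "; Elem "z" ] in
-  Printf.printf "x: %d subnodes, y: %d subnodes\n"
-    (List.length (children ~allows_pcdata:true parsed))
-    (List.length (children ~allows_pcdata:false parsed))
-]]></programlisting>
-</para>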
-
-       </sect2>
-
-       <sect2>
-         <title>The representation of character data</title>
-
-         <para>The XML specification allows all Unicode characters in XML
-texts. This parser can be configured such that UTF-8 is used to represent the
-characters internally; however, the default character encoding is
-ISO-8859-1. (Currently, no other encodings are possible for the internal string
-representation; the type <literal>Pxp_types.rep_encoding</literal> enumerates
-the possible encodings. In principle, the parser could use any encoding that is
-ASCII-compatible, but there are currently only lexical analyzers for UTF-8 and
-ISO-8859-1. It is currently impossible to use UTF-16 or UCS-4 as internal
-encodings (or other multibyte encodings which are not ASCII-compatible) unless
-major parts of the parser are rewritten, which is unlikely.)
-</para>
-
-<para>
-The internal encoding may be different from the external encoding (specified
-in the XML declaration <literal>&lt;?xml ... encoding="..."?&gt;</literal>); in
-this case the strings are automatically converted to the internal encoding.
-</para>
-
-<para>
-If the internal encoding is ISO-8859-1, it is possible that there are
-characters that cannot be represented. In this case, the parser ignores such
-characters and prints a warning (to the <literal>collect_warning</literal>
-object that must be passed when the parser is called).
-</para>
-
-         <para>The XML specification allows lines to be separated by single LF
-characters, by CR LF character sequences, or by single CR
-characters. Internally, these separators are always converted to single LF
-characters.</para>
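-
-         <para>As an illustration of this normalization, the following
-self-contained sketch converts all three separators to LF. The function name
-<literal>normalize_newlines</literal> is hypothetical; the parser performs
-the conversion internally.
-
-<programlisting><![CDATA[
-(* Convert CR LF and single CR to LF; single LF is kept unchanged. *)
-let normalize_newlines s =
-  let n = String.length s in
-  let b = Buffer.create n in
-  let rec go i =
-    if i < n then
-      match s.[i] with
-      | '\r' ->
-          Buffer.add_char b '\n';
-          (* Skip the LF of a CR LF pair so that it yields one LF only. *)
-          if i + 1 < n && s.[i + 1] = '\n' then go (i + 2) else go (i + 1)
-      | c ->
-          Buffer.add_char b c;
-          go (i + 1)
-  in
-  go 0;
-  Buffer.contents b
-
-let () =
-  print_endline (String.escaped (normalize_newlines "a\r\nb\rc\nd"))
-]]></programlisting>
-</para>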
-
-         <para>The parser guarantees that there are never two adjacent data
-nodes; if necessary, data material that would otherwise be represented by
-several nodes is collapsed into one node. Note that you can still create node
-trees with adjacent data nodes; however, the parser does not return such trees.
-</para>
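-
-         <para>The collapsing of adjacent data material can be pictured as
-follows. The sketch is self-contained; the type <literal>child</literal> and
-the function <literal>merge_data</literal> are hypothetical and only model
-the result the parser guarantees.
-
-<programlisting><![CDATA[
-type child = Elem of string | Data of string
-
-(* Concatenate runs of adjacent data items into a single item. *)
-let rec merge_data = function
-  | Data a :: Data b :: rest -> merge_data (Data (a ^ b) :: rest)
-  | c :: rest -> c :: merge_data rest
-  | [] -> []
-
-let () =
-  let merged = merge_data [ Data "a"; Data "b"; Elem "z"; Data "c" ] in
-  Printf.printf "%d children after merging\n" (List.length merged)
-]]></programlisting>
-</para>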
-
-         <para>Note that CDATA sections are not represented specially; such
-sections are added to the current data material that is being collected for the
-next data node.</para>
-       </sect2>
-
-
-       <sect2>
-         <title>The representation of entities within documents</title>
-
-         <para><emphasis>Entities are not represented within
-documents!</emphasis> If the parser finds an entity reference in the document
-content, the reference is immediately expanded, and the parser reads the
-expansion text instead of the reference.
-</para>
-       </sect2>
-
-       <sect2>
-         <title>The representation of attributes</title> <para>As attribute
-values are composed of Unicode characters, too, the same problems with the
-character encoding arise as for character material. Attribute values are
-converted to the internal encoding, too; and if there are characters that
-cannot be represented, these are dropped, and a warning is printed.</para>
-
-         <para>Attribute values are normalized before they are returned by
-methods like <literal>attribute</literal>. First, any remaining entity
-references are expanded; if necessary, expansion is performed recursively.
-Second, newline characters (any of LF, CR LF, or CR characters) are converted
-to single space characters. Note that the latter action in particular is
-prescribed by the XML standard (but the character reference
-<literal>&amp;#10;</literal> is not converted, so it is still possible to
-include line feeds in attributes).
-</para>
-       </sect2>
-
-       <sect2>
-         <title>The representation of processing instructions</title>
-<para>Processing instructions are parsed to some extent: The first word of the
-PI is called the target, and it is stored separately from the rest of the PI:
-
-<programlisting><![CDATA[
-<?target rest?>
-]]></programlisting>
-
-The exact location where a PI occurs is not represented (by default). The
-parser puts the PI into the object that represents the embracing construct (an
-element, a DTD, or the whole document); that means you can find out which PIs
-occur in a certain element, in the DTD, or in the whole document, but you
-cannot look up the exact position within the construct.
-</para>
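-
-<para>
-The division into target and rest can be sketched as follows. The function
-<literal>split_pi</literal> is hypothetical and self-contained; it works on
-the text between the PI delimiters and, as a simplification, treats only the
-space character as separator.
-
-<programlisting><![CDATA[
-(* Split the text of a processing instruction at the first space.
-   Only ' ' is treated as a separator in this sketch. *)
-let split_pi s =
-  match String.index_opt s ' ' with
-  | None -> (s, "")
-  | Some i ->
-      (String.sub s 0 i, String.sub s (i + 1) (String.length s - i - 1))
-
-let () =
-  let target, rest = split_pi "xml-stylesheet href=\"style.css\"" in
-  Printf.printf "target=%s rest=%s\n" target rest
-]]></programlisting>
-</para>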
-
-         <para>If you require the exact location of PIs, it is possible to
-create extra nodes for them. This mode is controlled by the option
-<literal>enable_pinstr_nodes</literal>. The additional nodes have the node type
-<literal>T_pinstr <replaceable>target</replaceable></literal>, and are created
-from special exemplars contained in the <literal>spec</literal> (see
-pxp_document.mli).</para>
-       </sect2>
-
-       <sect2>
-         <title>The representation of comments</title> 
-
-<para>Normally, comments are not represented; they are dropped by
-default. However, if you require them, it is possible to create
-<literal>T_comment</literal> nodes for them. This mode can be specified by the
-option <literal>enable_comment_nodes</literal>. Comment nodes are created from
-special exemplars contained in the <literal>spec</literal> (see
-pxp_document.mli). You can access the contents of comments through the 
-method <literal>comment</literal>.</para>
-       </sect2>
-
-       <sect2>
-         <title>The attributes <literal>xml:lang</literal> and
-<literal>xml:space</literal></title>
-
-         <para>These attributes are not supported specially; they are handled
-like any other attribute.</para>
-       </sect2>
-
-
-       <sect2>
-         <title>And what about namespaces?</title>
-         <para>Currently, there is no special support for namespaces.
-However, the parser allows colons in names, so it is
-possible to implement namespaces on top of the current API.</para>
-
-         <para>Some future release of PXP will support namespaces as built-in
-feature...</para>
-       </sect2>
-
-      </sect1>
-
-    </chapter>
-
-<!-- ********************************************************************** -->
-
-    <chapter>
-      <title>Configuring and calling the parser</title>
-
-<!--
-      <para>
-<emphasis>
-Sorry, this chapter has not yet been written. For an introduction into parser
-configuration, see the previous chapters. As a first approximation, the
-interface definition of Markup_yacc outlines what could go here.
-</emphasis>
-</para>
--->
-
-<!--
-      <para>
-<programlisting>&markup-yacc.mli;</programlisting>
-</para>
--->
-
-      <sect1>
-       <title>Overview</title>
-       <para>
-There are the following main functions invoking the parser (in Pxp_yacc):
-
-          <itemizedlist mark="bullet" spacing="compact">
-           <listitem>
-             <para><emphasis>parse_document_entity:</emphasis> You want to
-parse a complete and closed document consisting of a DTD and the document body;
-the body is validated against the DTD. This mode is interesting if you have a
-file
-
-<programlisting><![CDATA[
-<!DOCTYPE root ... [ ... ] > <root> ... </root>
-]]></programlisting>
-
-and you can accept any DTD that is included in the file (e.g. because the file
-is under your control).
-</para>
-           </listitem>
-           <listitem>
-             <para><emphasis>parse_wfdocument_entity:</emphasis> You want to
-parse a complete and closed document consisting of a DTD and the document body;
-but the body is not validated, only checked for well-formedness. This mode is
-preferred if validation costs too much time or if the DTD is missing.
-</para>
-           </listitem>
-           <listitem>
-             <para><emphasis>parse_dtd_entity:</emphasis> You only want to
-parse an entity (file) containing the external subset of a DTD. Sometimes it is
-interesting to read such a DTD, for example to compare it with the DTD included
-in a document, or to apply the next mode:
-</para>
-           </listitem>
-           <listitem>
-             <para><emphasis>parse_content_entity:</emphasis> You only want to
-parse an entity (file) containing a fragment of a document body; this fragment
-is validated against the DTD you pass to the function. In particular, the fragment
-must not have a <literal> &lt;!DOCTYPE&gt;</literal> clause, and must directly
-begin with an element.  The element is validated against the DTD.  This mode is
-interesting if you want to check documents against a fixed, immutable DTD.
-</para>
-           </listitem>
-           <listitem>
-             <para><emphasis>parse_wfcontent_entity:</emphasis> This function
-also parses a single element without a DTD, but does not validate it.</para>
-           </listitem>
-           <listitem>
-             <para><emphasis>extract_dtd_from_document_entity:</emphasis> This
-function extracts the DTD from a closed document consisting of a DTD and a
-document body. Both the internal and the external subsets are extracted.</para>
-           </listitem>
-         </itemizedlist>
-</para>
-
-<para>
-In many cases, <literal>parse_document_entity</literal> is the preferred mode
-to parse a document in a validating way, and
-<literal>parse_wfdocument_entity</literal> is the mode of choice to parse a
-file while only checking for well-formedness.
-</para>
-
-<para>
-There are a number of variations of these modes. One important application of a
-parser is to check documents of an untrusted source against a fixed DTD. One
-solution is to not allow the <literal>&lt;!DOCTYPE&gt;</literal> clause in
-these documents, and treat the document like a fragment (using mode
-<emphasis>parse_content_entity</emphasis>). This is very simple, but
-inflexible; users of such a system cannot even define additional entities to
-abbreviate frequent phrases of their text.
-</para>
-
-<para>
-It may be necessary to have a more intelligent checker. For example, it is also
-possible to parse the document to be checked in full, i.e. together with its
-DTD, and to compare this DTD with the prescribed one. To parse the document in
-full, mode <emphasis>parse_document_entity</emphasis> is applied; to get the
-DTD to compare with, mode <emphasis>parse_dtd_entity</emphasis> can be used.
-</para>
-
-<para>
-There is another very important configurable aspect of the parser: the
-so-called resolver. The task of the resolver is to locate the contents of an
-(external) entity for a given entity name, and to make the contents accessible
-as a character stream. (Furthermore, it also normalizes the character set;
-but this is a detail we can ignore here.) Suppose you have a file called
-<literal>"main.xml"</literal> containing 
-
-<programlisting><![CDATA[
-<!ENTITY % sub SYSTEM "sub/sub.xml">
-%sub;
-]]></programlisting>
-
-and a file stored in the subdirectory <literal>"sub"</literal> with name
-<literal>"sub.xml"</literal> containing
-
-<programlisting><![CDATA[
-<!ENTITY % subsub SYSTEM "subsub/subsub.xml">
-%subsub;
-]]></programlisting>
-
-and a file stored in the subdirectory <literal>"subsub"</literal> of
-<literal>"sub"</literal> with name <literal>"subsub.xml"</literal> (the
-contents of this file do not matter). Here, the resolver must track that
-the second entity <literal>subsub</literal> is located in the directory
-<literal>"sub/subsub"</literal>, i.e. the difficulty is to interpret the
-system (file) names of entities relative to the entities containing them,
-even if the entities are deeply nested.
-</para>
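-
-<para>
-The bookkeeping described above can be sketched in a few lines of O'Caml. This
-is only an illustration using plain file names and the standard
-<literal>Filename</literal> module, not the code of the built-in resolvers; the
-function name <literal>resolve_relative</literal> is hypothetical, and
-Unix-style path separators are assumed.
-
-<programlisting><![CDATA[
-(* Interpret the system name [name] relative to the file [container]
-   that contains the reference. Absolute names are returned unchanged. *)
-let resolve_relative ~(container : string) (name : string) : string =
-  if Filename.is_relative name then
-    Filename.concat (Filename.dirname container) name
-  else
-    name
-
-(* Follow the chain main.xml -> sub/sub.xml -> subsub/subsub.xml *)
-let sub    = resolve_relative ~container:"main.xml" "sub/sub.xml"
-let subsub = resolve_relative ~container:sub "subsub/subsub.xml"
-]]></programlisting>
-
-On a Unix-like system <literal>subsub</literal> ends in
-<literal>sub/subsub/subsub.xml</literal>, because each name is interpreted
-relative to the directory of the entity that mentions it.
-</para>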
-
-<para>
-There is no fixed resolver that already does everything right; resolving entity
-names is a task that highly depends on the environment. The XML specification
-only demands that <literal>SYSTEM</literal> entities are interpreted like URLs
-(which is not very precise, as there are lots of URL schemes in use), hoping
-that this helps to overcome the local peculiarities of the environment; the idea
-is that if you do not know your environment you can refer to other entities by
-denoting URLs for them. I think that this interpretation of
-<literal>SYSTEM</literal> names may have some applications on the Internet, but
-it is not the first choice in general. Because of this, the resolver is a
-separate module of the parser that can be replaced by another one if
-necessary; more precisely, the parser already defines several resolvers.
-</para>
-
-<para>
-The following resolvers do already exist:
-
-          <itemizedlist mark="bullet" spacing="compact">
-           <listitem>
-             <para>Resolvers reading from arbitrary input channels. These
-can be configured such that a certain ID is associated with the channel; in
-this case inner references to external entities can be resolved. There is also
-a special resolver that interprets SYSTEM IDs as URLs; this resolver can
-process relative SYSTEM names and determine the corresponding absolute URL.
-</para>
-           </listitem>
-           <listitem>
-             <para>A resolver that always reads from a given O'Caml
-string. This resolver is not able to resolve further names because the string
-is not associated with any name; i.e. if the document contained in the string
-refers to an external entity, this reference cannot be followed.</para>
-           </listitem>
-           <listitem>
-             <para>A resolver for file names. The <literal>SYSTEM</literal>
-name is interpreted as a file URL with the slash "/" as separator for
-directories. This resolver is derived from the generic URL resolver.</para>
-           </listitem>
-         </itemizedlist>
-
-The interface a resolver must have is documented, so it is possible to write
-your own resolver. For example, you could connect the parser with an HTTP
-client, and resolve URLs of the HTTP namespace. The resolver classes allow
-several independent resolvers to be combined into one more powerful resolver;
-thus it is possible to combine a self-written resolver with the already
-existing resolvers.
-</para>
-
-<para>
-Note that the existing resolvers only interpret <literal>SYSTEM</literal>
-names, not <literal>PUBLIC</literal> names. If it helps you, it is possible to
-define resolvers for <literal>PUBLIC</literal> names, too; for example, such a
-resolver could look up the public name in a hash table, and map it to a system
-name which is passed over to the existing resolver for system names. It is
-relatively simple to provide such a resolver.
-</para>
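-
-<para>
-The hash table lookup mentioned above might look as follows. This is a
-self-contained sketch, not part of the parser: the type <literal>id</literal>
-is only a local stand-in for the parser's identifier type, and the function
-<literal>to_system</literal>, the table <literal>catalog</literal> and the file
-URL stored in it are hypothetical.
-
-<programlisting><![CDATA[
-(* Local stand-in for the identifier type; the real type is defined
-   by the parser. *)
-type id =
-  | System of string
-  | Public of string * string   (* public name, system name *)
-
-(* Hypothetical catalog mapping public names to system names. *)
-let catalog : (string, string) Hashtbl.t = Hashtbl.create 16
-
-let () =
-  Hashtbl.replace catalog
-    "-//Davenport//DTD DocBook V3.0//EN"
-    "file:///usr/share/sgml/docbook.dtd"
-
-(* Rewrite a PUBLIC identifier to a SYSTEM identifier if the catalog
-   knows the public name; otherwise fall back to the system name that
-   accompanies the public name in the document. *)
-let to_system (x : id) : id =
-  match x with
-  | System _ -> x
-  | Public (pubname, sysname) ->
-      (match Hashtbl.find_opt catalog pubname with
-       | Some mapped -> System mapped
-       | None -> System sysname)
-]]></programlisting>
-
-The result of <literal>to_system</literal> can then be handed to a resolver
-that only understands system names.
-</para>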
-
-
-      </sect1>
-
-      <sect1>
-       <title>Resolvers and sources</title>
-       
-       <sect2>
-         <title>Using the built-in resolvers (called sources)</title>
-
-         <para>The type <literal>source</literal> enumerates the two
-possibilities where the document to parse comes from.
-
-<programlisting>
-type source =
-    Entity of ((dtd -&gt; Pxp_entity.entity) * Pxp_reader.resolver)
-  | ExtID of (ext_id * Pxp_reader.resolver)
-</programlisting>
-
-You normally need not worry about this type as there are convenience
-functions that create <literal>source</literal> values:
-
-
-            <itemizedlist mark="bullet" spacing="compact">
-             <listitem>
-               <para><literal>from_file s</literal>: The document is read from
-file <literal>s</literal>; you may specify absolute or relative path names.
-The file name must be encoded as a UTF-8 string.
-</para>
-
-<para>There is an optional argument <literal>~system_encoding</literal>
-specifying the character encoding which is used for the names of the file
-system. For example, if this encoding is ISO-8859-1 and <literal>s</literal> is
-also an ISO-8859-1 string, you can form the source:
-
-<programlisting><![CDATA[
-let s_utf8  =  recode_string ~in_enc:`Enc_iso88591 ~out_enc:`Enc_utf8 s in
-from_file ~system_encoding:`Enc_iso88591 s_utf8
-]]></programlisting>
-</para>
-
-<para>
-This <literal>source</literal> has the advantage that
-it is able to resolve inner external entities; i.e. if your document includes
-data from another file (using the <literal>SYSTEM</literal> attribute), this
-mode will find that file. However, this mode can resolve neither
-<literal>PUBLIC</literal> identifiers nor <literal>SYSTEM</literal> identifiers
-other than "file:" URLs.
-</para>
-             </listitem>
-             <listitem>
-               <para><literal>from_channel ch</literal>: The document is read
-from the channel <literal>ch</literal>. In general, this source also supports
-file URLs found in the document; however, by default only absolute URLs are
-understood. It is possible to associate an ID with the channel such that the
-resolver knows how to interpret relative URLs:
-
-<programlisting>
-from_channel ~id:(System "file:///dir/dir1/") ch
-</programlisting>
-
-There is also the <literal>~system_encoding</literal> argument specifying how
-file names are encoded. The example from above can also be written as follows.
-Note, however, that relative URLs can no longer be interpreted because there is
-no <literal>~id</literal> argument, and computing this argument is relatively
-complicated because it must be a valid URL:
-
-<programlisting>
-let ch = open_in s in
-let src = from_channel ~system_encoding:`Enc_iso88591 ch in
-...;
-close_in ch
-</programlisting>
-</para>
-             </listitem>
-             <listitem>
-               <para><literal>from_string s</literal>: The string
-<literal>s</literal> is the document to parse. This mode is not able to
-interpret file names of <literal>SYSTEM</literal> clauses, nor can it look up
-<literal>PUBLIC</literal> identifiers. </para> 
-
-               <para>Normally, the encoding of the string is detected as usual
-by analyzing the XML declaration, if any. However, it is also possible to
-specify the encoding directly:
-
-<programlisting>
-let src = from_string ~fixenc:`Enc_iso88592 s
-</programlisting>
-</para>
-             </listitem>
-             <listitem>
-               <para><literal>ExtID (id, r)</literal>: The document to parse
-is denoted by the identifier <literal>id</literal> (either a
-<literal>SYSTEM</literal> or <literal>PUBLIC</literal> clause), and this
-identifier is interpreted by the resolver <literal>r</literal>. Use this mode
-if you have written your own resolver.</para>
-               <para>Which character sets are possible depends on the passed
-resolver <literal>r</literal>.</para>
-             </listitem>
-             <listitem>
-               <para><literal>Entity (get_entity, r)</literal>: The document
-to parse is returned by the function invocation <literal>get_entity
-dtd</literal>, where <literal>dtd</literal> is the DTD object to use (it may be
-empty). Inner external references occurring in this entity are resolved using
-the resolver <literal>r</literal>.</para>
-               <para>Which character sets are possible depends on the passed
-resolver <literal>r</literal>.</para>
-             </listitem>
-           </itemizedlist></para>
-       </sect2>
-
-
-       <sect2>
-         <title>The resolver API</title>
-
-         <para>A resolver is an object that can be opened like a file; however,
-you do not pass a file name to the resolver, but the XML identifier of the entity
-to read from (either a <literal>SYSTEM</literal> or <literal>PUBLIC</literal>
-clause). When opened, the resolver must return the
-<literal>Lexing.lexbuf</literal> that reads the characters.  The resolver can
-be closed, and it can be cloned. Furthermore, it is possible to tell the
-resolver which character set it should assume. The following is taken from
-<literal>Pxp_reader</literal>:
-
-<programlisting><![CDATA[
-exception Not_competent
-exception Not_resolvable of exn
-
-class type resolver =
-  object
-    method init_rep_encoding : rep_encoding -> unit
-    method init_warner : collect_warnings -> unit
-    method rep_encoding : rep_encoding
-    method open_in : ext_id -> Lexing.lexbuf
-    method close_in : unit
-    method change_encoding : string -> unit
-    method clone : resolver
-    method close_all : unit
-  end
-]]></programlisting>
-
-The resolver object must work as follows:</para>
-
-<para>
-            <itemizedlist mark="bullet" spacing="compact">
-             <listitem>
-               <para>When the parser is called, it tells the resolver the
-warner object and the internal encoding by invoking
-<literal>init_warner</literal> and <literal>init_rep_encoding</literal>. The
-resolver should store these values. The method <literal>rep_encoding</literal>
-should return the internal encoding.
-</para>
-             </listitem>
-             <listitem>
-               <para>If the parser wants to read from the resolver, it invokes
-the method <literal>open_in</literal>. Either the resolver succeeds, in which
-case the <literal>Lexing.lexbuf</literal> reading from the file or stream must
-be returned, or opening fails. In the latter case the method implementation
-should raise an exception (see below).</para>
-             </listitem>
-             <listitem>
-               <para>If the parser finishes reading, it calls the
-<literal>close_in</literal> method.</para>
-             </listitem>
-             <listitem>
-               <para>If the parser finds a reference to another external
-entity in the input stream, it calls <literal>clone</literal> to get a second
-resolver which must be initially closed (not yet connected with an input
-stream).  The parser then invokes <literal>open_in</literal> and the other
-methods as described.</para>
-             </listitem>
-             <listitem>
-               <para>If you already know the character set of the input
-stream, you should recode it to the internal encoding, and define the method
-<literal>change_encoding</literal> as an empty method.</para>
-             </listitem>
-             <listitem>
-               <para>If you want to support multiple external character sets,
-the object must follow a much more complicated protocol. Directly after
-<literal>open_in</literal> has been called, the resolver must return a lexical
-buffer that only reads one byte at a time. This is only possible if you create
-the lexical buffer with <literal>Lexing.from_function</literal>; the function
-must then always return 1 if the EOF is not yet reached, and 0 if EOF is
-reached. If the parser has read the first line of the document, it will invoke
-<literal>change_encoding</literal> to tell the resolver which character set to
-assume. From this moment, the object can return more than one byte at once. The
-argument of <literal>change_encoding</literal> is either the parameter of the
-"encoding" attribute of the XML declaration, or the empty string if there is
-not any XML declaration or if the declaration does not contain an encoding
-attribute. </para>
-
-               <para>At the beginning the resolver must only return one
-character every time something is read from the lexical buffer. The reason for
-this is that you otherwise would not exactly know at which position in the
-input stream the character set changes.</para>
-
-               <para>If you want automatic recognition of the character set,
-it is up to the resolver object to implement this.</para>
-             </listitem>
-
-             <listitem><para>If an error occurs, the parser calls the method
-<literal>close_all</literal> for the top-level resolver; this method should
-close the resolver itself (if not already done) and all its clones.</para>
-             </listitem>
-           </itemizedlist>
-</para>
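-
-<para>
-The one-byte-at-a-time rule for resolvers that support several external
-character sets can be sketched as follows. This is a simplified illustration
-that reads from a string and performs no recoding at all; the names
-<literal>make_refill</literal> and <literal>switch</literal> are hypothetical.
-It assumes a current O'Caml version, in which the function passed to
-<literal>Lexing.from_function</literal> fills a <literal>bytes</literal>
-buffer.
-
-<programlisting><![CDATA[
-(* Build a refill function for Lexing.from_function over the string [s].
-   It delivers one byte per call until [switch] has been called. *)
-let make_refill (s : string) =
-  let pos = ref 0 in
-  let one_byte_mode = ref true in
-  let refill (buf : bytes) (n : int) : int =
-    let remaining = String.length s - !pos in
-    let wanted = if !one_byte_mode then 1 else n in
-    let len = min wanted (min n remaining) in
-    Bytes.blit_string s !pos buf 0 len;
-    pos := !pos + len;
-    len                         (* 0 signals EOF *)
-  in
-  let switch () = one_byte_mode := false in
-  (refill, switch)
-
-let refill, switch = make_refill "<?xml version='1.0'?><a/>"
-let lexbuf = Lexing.from_function refill
-]]></programlisting>
-
-In a real resolver, the method <literal>change_encoding</literal> would call
-something like <literal>switch</literal> after it has set up the conversion
-from the announced character set to the internal encoding.
-</para>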
-         <formalpara><title>Exceptions</title>
-           <para>
-It is possible to chain resolvers such that when the first resolver is not able
-to open the entity, the other resolvers of the chain are tried in turn. The
-method <literal>open_in</literal> should raise the exception
-<literal>Not_competent</literal> to indicate that the next resolver should try
-to open the entity. If the resolver is able to handle the ID, but some other
-error occurs, the exception <literal>Not_resolvable</literal> should be raised
-so that the chain breaks.
-         </para>
-         </formalpara>
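-
-         <para>
-The chaining rule can be sketched as follows. The two exceptions are declared
-locally so that the fragment is self-contained; they only stand in for the
-exceptions of Pxp_reader. Plain functions from an ID string to some result
-take the place of resolver objects, and all function names are hypothetical.
-
-<programlisting><![CDATA[
-exception Not_competent
-exception Not_resolvable of exn
-
-(* Try the openers in turn. Not_competent passes the ID on to the next
-   opener; any other exception, including Not_resolvable, ends the search. *)
-let rec open_first (openers : (string -> 'a) list) (id : string) : 'a =
-  match openers with
-  | [] -> raise Not_competent
-  | opener :: rest ->
-      (try opener id with Not_competent -> open_first rest id)
-
-(* Accepts only IDs beginning with "http:" *)
-let only_http id =
-  if String.length id >= 5 && String.sub id 0 5 = "http:"
-  then "http content"
-  else raise Not_competent
-
-(* Feels responsible for every ID, but always fails to read it *)
-let broken_file id =
-  raise (Not_resolvable (Failure ("cannot read " ^ id)))
-]]></programlisting>
-
-With these definitions, <literal>open_first [only_http] "ftp://x"</literal>
-raises <literal>Not_competent</literal> because no opener accepts the ID,
-whereas <literal>open_first [broken_file; only_http] "http://x"</literal>
-raises <literal>Not_resolvable</literal> without trying the second opener.
-</para>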
-
-       <para>Example: How to define a resolver that is equivalent to
-from_string: ...</para>
-
-       </sect2>
-       
-       <sect2>
-         <title>Predefined resolver components</title>
-         <para>
-There are some classes in Pxp_reader that define common resolver behaviour.
-
-<programlisting><![CDATA[
-class resolve_read_this_channel : 
-    ?id:ext_id -> 
-    ?fixenc:encoding -> 
-    ?auto_close:bool -> 
-    in_channel -> 
-        resolver
-]]></programlisting>
-
-Reads from the passed channel (it may even be a pipe). If the
-<literal>~id</literal> argument is passed to the object, the created resolver
-accepts only this ID. Otherwise all IDs are accepted.  - Once the resolver has
-been cloned, it does not accept any ID. This means that this resolver cannot
-handle inner references to external entities. Note that you can combine this
-resolver with another resolver that can handle inner references (such as
-resolve_as_file); see class 'combine' below.  - If you pass the
-<literal>~fixenc</literal> argument, the encoding of the channel is set to the
-passed value, regardless of any auto-recognition or any XML declaration. - If
-<literal>~auto_close = true</literal> (which is the default), the channel is
-closed after use. If <literal>~auto_close = false</literal>, the channel is
-left open.
- </para>
-
-         <para>
-<programlisting><![CDATA[
-class resolve_read_any_channel : 
-    ?auto_close:bool -> 
-    channel_of_id:(ext_id -> (in_channel * encoding option)) -> 
-        resolver
-]]></programlisting>
-
-This resolver calls the function <literal>~channel_of_id</literal> to open a
-new channel for the passed <literal>ext_id</literal>. This function must either
-return the channel and the encoding, or it must fail with Not_competent.  The
-function must return <literal>None</literal> as encoding if the default
-mechanism to recognize the encoding should be used. It must return
-<literal>Some e</literal> if it is already known that the encoding of the
-channel is <literal>e</literal>.  If <literal>~auto_close = true</literal>
-(which is the default), the channel is closed after use. If
-<literal>~auto_close = false</literal>, the channel is left open.
-</para>
-
-         <para>
-<programlisting><![CDATA[
-class resolve_read_url_channel :
-    ?base_url:Neturl.url ->
-    ?auto_close:bool -> 
-    url_of_id:(ext_id -> Neturl.url) -> 
-    channel_of_url:(Neturl.url -> (in_channel * encoding option)) -> 
-        resolver
-]]></programlisting>
-
-When this resolver gets an ID to read from, it calls the function
-<literal>~url_of_id</literal> to get the corresponding URL. This URL may be a
-relative URL; however, a URL scheme must be used which contains a path.  The
-resolver converts the URL to an absolute URL if necessary.  The second
-function, <literal>~channel_of_url</literal>, is fed with the absolute URL as
-input. This function opens the resource to read from, and returns the channel
-and the encoding of the resource.
-</para>
-<para>
-Both functions, <literal>~url_of_id</literal> and
-<literal>~channel_of_url</literal>, can raise Not_competent to indicate that
-the object is not able to read from the specified resource. However, there is a
-difference: A Not_competent from <literal>~url_of_id</literal> is left as it
-is, but a Not_competent from <literal>~channel_of_url</literal> is converted to
-Not_resolvable. So only <literal>~url_of_id</literal> decides which URLs are
-accepted by the resolver and which not.
-</para>
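-<para>
-The different treatment of the two functions can be sketched as follows. The
-exceptions are again local stand-ins for those of Pxp_reader, the function
-name <literal>open_by_url</literal> is hypothetical, and which exception the
-class actually stores inside <literal>Not_resolvable</literal> is an
-assumption of this sketch.
-
-<programlisting><![CDATA[
-exception Not_competent
-exception Not_resolvable of exn
-
-let open_by_url ~url_of_id ~channel_of_url id =
-  (* A Not_competent raised here is passed through unchanged, so that
-     another resolver may be tried. *)
-  let url = url_of_id id in
-  (* Once the ID has been accepted, a refusal to open the URL is a
-     hard error. *)
-  try channel_of_url url
-  with Not_competent -> raise (Not_resolvable Not_competent)
-]]></programlisting>
-</para>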
-<para>
-The function <literal>~channel_of_url</literal> must return
-<literal>None</literal> as encoding if the default mechanism to recognize the
-encoding should be used. It must return <literal>Some e</literal> if it is
-already known that the encoding of the channel is <literal>e</literal>.
-</para>
-<para>
-If <literal>~auto_close = true</literal> (which is the default), the channel is
-closed after use. If <literal>~auto_close = false</literal>, the channel is
-left open.
-</para>
-<para>
-Objects of this class contain a base URL relative to which relative URLs are
-interpreted. When creating a new object, you can specify the base URL by
-passing it as <literal>~base_url</literal> argument. When an existing object is
-cloned, the base URL of the clone is the URL of the original object. - Note
-that the term "base URL" has a strict definition in RFC 1808.
-</para>
-
-         <para>
-<programlisting><![CDATA[
-class resolve_read_this_string : 
-    ?id:ext_id -> 
-    ?fixenc:encoding -> 
-    string -> 
-        resolver
-]]></programlisting>
-
-Reads from the passed string. If the <literal>~id</literal> argument is passed
-to the object, the created resolver accepts only this ID. Otherwise all IDs are
-accepted. - Once the resolver has been cloned, it does not accept any ID. This
-means that this resolver cannot handle inner references to external
-entities. Note that you can combine this resolver with another resolver that
-can handle inner references (such as resolve_as_file); see class 'combine'
-below. - If you pass the <literal>~fixenc</literal> argument, the encoding of
-the string is set to the passed value, regardless of any auto-recognition or
-any XML declaration.
-</para>
-
-         <para>
-<programlisting><![CDATA[
-class resolve_read_any_string : 
-    string_of_id:(ext_id -> (string * encoding option)) -> 
-        resolver
-]]></programlisting>
-
-This resolver calls the function <literal>~string_of_id</literal> to get the
-string for the passed <literal>ext_id</literal>. This function must either
-return the string and the encoding, or it must fail with Not_competent.  The
-function must return <literal>None</literal> as encoding if the default
-mechanism to recognize the encoding should be used. It must return
-<literal>Some e</literal> if it is already known that the encoding of the
-string is <literal>e</literal>.
-</para>
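-
-         <para>
-A function suitable as <literal>~string_of_id</literal> might be sketched like
-this. The fragment is self-contained and therefore declares local stand-ins:
-the type <literal>id</literal> replaces <literal>ext_id</literal>, a plain
-<literal>string option</literal> replaces the encoding option, and
-<literal>Not_competent</literal> is declared locally. The table
-<literal>documents</literal> and its "mem:" names are hypothetical.
-
-<programlisting><![CDATA[
-exception Not_competent
-
-type id = System of string | Public of string * string
-
-(* Hypothetical in-memory documents, keyed by system name. *)
-let documents = [
-  "mem:greeting.xml", "<greeting>hello</greeting>";
-  "mem:empty.xml",    "<empty/>";
-]
-
-(* Returns the text together with None as encoding, so that the default
-   mechanism to recognize the encoding is used; unknown IDs are declined. *)
-let string_of_id (x : id) : string * string option =
-  match x with
-  | System name ->
-      (match List.assoc_opt name documents with
-       | Some text -> (text, None)
-       | None -> raise Not_competent)
-  | Public _ -> raise Not_competent
-]]></programlisting>
-</para>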
-
-         <para>
-<programlisting><![CDATA[
-class resolve_as_file :
-    ?file_prefix:[ `Not_recognized | `Allowed | `Required ] ->
-    ?host_prefix:[ `Not_recognized | `Allowed | `Required ] ->
-    ?system_encoding:encoding ->
-    ?url_of_id:(ext_id -> Neturl.url) -> 
-    ?channel_of_url: (Neturl.url -> (in_channel * encoding option)) ->
-    unit -> 
-        resolver
-]]></programlisting>
-Reads from the local file system. Every file name is interpreted as a
-file name of the local file system, and the file referred to is read.
-</para>
-<para>
-The full form of a file URL is: file://host/path, where
-'host' specifies the host system where the file identified by 'path'
-resides. host = "" or host = "localhost" are accepted; other values
-will raise Not_competent. The standard for file URLs is 
-defined in RFC 1738.
-</para>
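-<para>
-The host rule can be sketched as follows. This is a simplified illustration
-and not the code of <literal>resolve_as_file</literal>: it requires the full
-form of the URL, ignores the options described below as well as URL escape
-sequences, and uses a locally declared <literal>Not_competent</literal>. The
-function name <literal>path_of_file_url</literal> is hypothetical.
-
-<programlisting><![CDATA[
-exception Not_competent
-
-(* Extract the path from a URL of the form file://host/path.
-   Only an empty host and "localhost" are accepted. *)
-let path_of_file_url (url : string) : string =
-  let prefix = "file://" in
-  let plen = String.length prefix in
-  if String.length url < plen || String.sub url 0 plen <> prefix then
-    raise Not_competent
-  else
-    let rest = String.sub url plen (String.length url - plen) in
-    match String.index_opt rest '/' with
-    | None -> raise Not_competent
-    | Some i ->
-        let host = String.sub rest 0 i in
-        let path = String.sub rest i (String.length rest - i) in
-        if host = "" || host = "localhost" then path
-        else raise Not_competent
-]]></programlisting>
-
-For example, both <literal>file:///dir/a.xml</literal> and
-<literal>file://localhost/dir/a.xml</literal> yield the path
-<literal>/dir/a.xml</literal>, while a URL naming any other host is declined.
-</para>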
-<para>
-Option <literal>~file_prefix</literal>: Specifies how the "file:" prefix of
-file names is handled:
-            <itemizedlist mark="bullet" spacing="compact">
-             <listitem>
-               <para><literal>`Not_recognized:</literal> The prefix is not
-recognized.</para>
-             </listitem>
-             <listitem>
-               <para><literal>`Allowed:</literal> The prefix is allowed but
-not required (the default).</para>
-             </listitem>
-             <listitem>
-               <para><literal>`Required:</literal> The prefix is
-required.</para>
-             </listitem>
-           </itemizedlist>
-</para>
-<para>
-Option <literal>~host_prefix:</literal> Specifies how the "//host" phrase of
-file names is handled:
-            <itemizedlist mark="bullet" spacing="compact">
-             <listitem>
-               <para><literal>`Not_recognized:</literal> The prefix is not
-recognized.</para>
-             </listitem>
-             <listitem>
-               <para><literal>`Allowed:</literal> The prefix is allowed but
-not required (the default).</para>
-             </listitem>
-             <listitem>
-               <para><literal>`Required:</literal> The prefix is
-required.</para>
-             </listitem>
-           </itemizedlist>
-</para>
-<para>
-Option <literal>~system_encoding:</literal> Specifies the encoding of file
-names of the local file system. Default: UTF-8.
-</para>
-<para>
-Options <literal>~url_of_id</literal>, <literal>~channel_of_url</literal>: Not
-for the casual user!
-</para>
-
-         <para>
-<programlisting><![CDATA[
-class combine : 
-    ?prefer:resolver -> 
-    resolver list -> 
-        resolver
-]]></programlisting>
-
-Combines several resolver objects. If a concrete entity with an
-<literal>ext_id</literal> is to be opened, the combined resolver tries the
-contained resolvers in turn until a resolver accepts opening the entity
-(i.e. it does not raise Not_competent on open_in).
-</para>
-<para>
-Clones: If the 'clone' method is invoked before 'open_in', all contained
-resolvers are cloned separately and again combined. If the 'clone' method is 
-invoked after 'open_in' (i.e. while the resolver is open), additionally the
-clone of the active resolver is flagged as being preferred, i.e. it is tried
-first. 
-</para>
-
-       </sect2>
-      </sect1>
-
-      <sect1>
-       <title>The DTD classes</title> <para><emphasis>Sorry, not yet
-written. Perhaps the interface definition of Pxp_dtd expresses the same:
-</emphasis></para>
-       <para>
-<programlisting>&markup-dtd1.mli;&markup-dtd2.mli;</programlisting>
-</para>
-      </sect1>
-
-      <sect1>
-       <title>Invoking the parser</title>
-
-       <para>Here is a description of Pxp_yacc.</para>
-
-       <sect2>
-         <title>Defaults</title>
-         <para>The following defaults are available:
-
-<programlisting>
-val default_config : config
-val default_extension : ('a node extension) as 'a
-val default_spec : ('a node extension as 'a) spec
-</programlisting>
-</para>
-       </sect2>
-
-       <sect2>
-         <title>Parsing functions</title>
-         <para>In the following, the term "closed document" refers to
-an XML structure like
-
-<programlisting>
-&lt;!DOCTYPE ... [ <replaceable>declarations</replaceable> ] &gt;
-&lt;<replaceable>root</replaceable>&gt;
-...
-&lt;/<replaceable>root</replaceable>&gt;
-</programlisting>
-
-The term "fragment" refers to an XML structure like
-
-<programlisting>
-&lt;<replaceable>root</replaceable>&gt;
-...
-&lt;/<replaceable>root</replaceable>&gt;
-</programlisting>
-
-i.e. only to one isolated element instance.
-</para>
-
-         <para>
-<programlisting><![CDATA[
-val parse_dtd_entity : config -> source -> dtd
-]]></programlisting>
-
-Parses the declarations which are contained in the entity, and returns them as
-a <literal>dtd</literal> object.
-</para>
-
-         <para>
-<programlisting><![CDATA[
-val extract_dtd_from_document_entity : config -> source -> dtd
-]]></programlisting>
-
-Extracts the DTD from a closed document. Both the internal and the external
-subsets are extracted and combined to one <literal>dtd</literal> object. This
-function does not parse the whole document, but only the parts that are
-necessary to extract the DTD.
-</para>
-
-         <para>
-<programlisting><![CDATA[
-val parse_document_entity : 
-    ?transform_dtd:(dtd -> dtd) ->
-    ?id_index:('ext index) ->
-    config -> 
-    source -> 
-    'ext spec -> 
-        'ext document
-]]></programlisting>
-
-Parses a closed document and validates it against the DTD that is contained in
-the document (internal and external subsets). The option
-<literal>~transform_dtd</literal> can be used to transform the DTD in the
-document, and to use the transformed DTD for validation. If
-<literal>~id_index</literal> is specified, an index of all ID attributes is
-created.
-</para>
-
-         <para>
-<programlisting><![CDATA[
-val parse_wfdocument_entity : 
-    config -> 
-    source -> 
-    'ext spec -> 
-        'ext document
-]]></programlisting>
-
-Parses a closed document, but checks it only for well-formedness.
-</para>
-
-         <para>
-<programlisting><![CDATA[
-val parse_content_entity  : 
-    ?id_index:('ext index) ->
-    config ->  
-    source -> 
-    dtd -> 
-    'ext spec -> 
-        'ext node
-]]></programlisting>
-
-Parses a fragment, and validates the element.
-</para>
-
-         <para>
-<programlisting><![CDATA[
-val parse_wfcontent_entity : 
-    config -> 
-    source -> 
-    'ext spec -> 
-        'ext node
-]]></programlisting>
-
-Parses a fragment, but checks it only for well-formedness.
-</para>
-       </sect2>
-
-       <sect2>
-         <title>Configuration options</title>
-         <para>
-
-<programlisting><![CDATA[
-type config =
-    { warner : collect_warnings;
-      errors_with_line_numbers : bool;
-      enable_pinstr_nodes : bool;
-      enable_super_root_node : bool;
-      enable_comment_nodes : bool;
-      encoding : rep_encoding;
-      recognize_standalone_declaration : bool;
-      store_element_positions : bool;
-      idref_pass : bool;
-      validate_by_dfa : bool;
-      accept_only_deterministic_models : bool;
-      ...
-    }
-]]></programlisting>
-
-<itemizedlist mark="bullet" spacing="compact">
-             <listitem><para><literal>warner:</literal> The parser prints
-warnings by invoking the method <literal>warn</literal> for this warner
-object. (Default: all warnings are dropped)</para>
-             </listitem>
-             <listitem><para><literal>errors_with_line_numbers:</literal> If
-true, errors contain line numbers; if false, errors contain only byte
-positions. The latter mode is faster. (Default: true)</para>
-             </listitem>
-             <listitem><para><literal>enable_pinstr_nodes:</literal> If true,
-the parser creates extra nodes for processing instructions. If false,
-processing instructions are simply added to the element or document surrounding
-the instructions. (Default: false)</para>
-             </listitem>
-             <listitem><para><literal>enable_super_root_node:</literal> If
-true, the parser creates an extra node which is the parent of the root of the
-document tree. This node is called super root; it is an element with type
-<literal>T_super_root</literal>. - If there are processing instructions outside
-the root element and outside the DTD, they are added to the super root instead
-of the document. - If false, the super root node is not created. (Default:
-false)</para>
-             </listitem>
-             <listitem><para><literal>enable_comment_nodes:</literal> If true,
-the parser creates nodes for comments with type <literal>T_comment</literal>;
-if false, such nodes are not created. (Default: false)</para>
-             </listitem>
-             <listitem><para><literal>encoding:</literal> Specifies the
-internal encoding of the parser. Most strings are then represented according to
-this encoding; however there are some exceptions (especially
-<literal>ext_id</literal> values which are always UTF-8 encoded).
-(Default: `Enc_iso88591)</para>
-             </listitem>
-             <listitem><para><literal>
-recognize_standalone_declaration:</literal> If true and if the parser is
-validating, the <literal>standalone="yes"</literal> declaration causes the
-parser to check whether the document really is a standalone document. If
-false, or if the parser is in well-formedness mode, such declarations are
-ignored. (Default: true)
-</para>
-             </listitem>
-             <listitem><para><literal>store_element_positions:</literal> If
-true, for every non-data node the source position is stored. If false, the
-position information is lost. If available, you can get the positions of nodes
-by invoking the <literal>position</literal> method.
-(Default: true)</para>
-             </listitem>
-             <listitem><para><literal>idref_pass:</literal> If true and if
-there is an ID index, the parser checks whether every IDREF or IDREFS attribute
-refers to an existing node; this requires that the parser traverses the whole
-document tree. If false, this check is left out. (Default: false)</para>
-             </listitem>
-             <listitem><para><literal>validate_by_dfa:</literal> If true and if
-the content model for an element type is deterministic, a deterministic finite
-automaton is used to validate whether the element contents match the content
-model of the type. If false, or if a DFA is not available, a backtracking
-algorithm is used for validation. (Default: true)
-</para>
-             </listitem>
-             <listitem><para><literal>
-accept_only_deterministic_models:</literal> If true, only deterministic content
-models are accepted; if false, any syntactically correct content models can be
-processed. (Default: true)</para>
-             </listitem>
-           </itemizedlist></para>
-       </sect2>
-
-       <sect2>
-         <title>Which configuration should I use?</title>
-         <para>First, I recommend varying the default configuration instead of
-creating a new configuration record. For instance, to set
-<literal>idref_pass</literal> to <literal>true</literal>, change the default
-as in:
-<programlisting>
-let config = { default_config with idref_pass = true }
-</programlisting>
-The reason is that I can add more options to the record in future versions
-of the parser without breaking your programs.</para>
-
-         <formalpara>
-           <title>Do I need extra nodes for processing instructions?</title>
-<para>By default, such nodes are not created. This does not mean that the
-processing instructions are lost; however, you cannot find out the exact
-location where they occur. For example, the following XML text
-
-<programlisting><![CDATA[
-<x><?pi1?><y/><?pi2?></x> 
-]]></programlisting> 
-
-will normally create one element node for <literal>x</literal> containing
-<emphasis>one</emphasis> subnode for <literal>y</literal>. The processing
-instructions are attached to <literal>x</literal> in a separate hash table; you
-can access them using <literal>x # pinstr "pi1"</literal> and <literal>x #
-pinstr "pi2"</literal>, respectively. The information about where the
-instructions occur within <literal>x</literal> is lost.
-</para>
-         </formalpara>
-
-           <para>If the option <literal>enable_pinstr_nodes</literal> is
-turned on, the parser creates extra nodes <literal>pi1</literal> and
-<literal>pi2</literal> such that the subnodes of <literal>x</literal> are now: 
-
-<programlisting><![CDATA[
-x # sub_nodes = [ pi1; y; pi2 ]
-]]></programlisting>
-
-The extra nodes contain the processing instructions in the usual way, i.e. you
-can access them using <literal>pi1 # pinstr "pi1"</literal> and <literal>pi2 #
-pinstr "pi2"</literal>, respectively.
-</para>
-
-         <para>Note that you will need an exemplar for the PI nodes (see
-<literal>make_spec_from_alist</literal>).</para> 
-
-         <formalpara>
-           <title>Do I need a super root node?</title>
-           <para>By default, there is no super root node. The
-<literal>document</literal> object refers directly to the node representing the
-root element of the document, i.e.
-
-<programlisting><![CDATA[
-doc # root = r
-]]></programlisting>
-
-if <literal>r</literal> is the root node. This is sometimes inconvenient: (1)
-Some algorithms become simpler if every node has a parent, even the root
-node. (2) Some standards such as XPath use the term "root node" for the node
-whose child represents the root element of the document. (3) The super root
-node can serve as a container for processing instructions outside the root
-element. For these reasons, it is possible to create an extra super root node,
-whose child is the root node:
-
-<programlisting><![CDATA[
-doc # root = sr         &&
-sr # sub_nodes = [ r ]
-]]></programlisting>
-
-When extra nodes are also created for processing instructions, these nodes can
-be added to the super root node if they occur outside the root element (reason
-(3)), and the order reflects the order in the source text.</para>
-         </formalpara>
-
-         <para>Note that you will need an exemplar for the super root node
-(see <literal>make_spec_from_alist</literal>).</para>
-
-         <formalpara>
-           <title>What is the effect of the UTF-8 encoding?</title>
-           <para>By default, the parser represents strings (with few
-exceptions) as ISO-8859-1 strings. These are well-known, and there are tools
-and fonts for this encoding.</para>
-         </formalpara>
-         <para>However, internationalization may require that you switch over
-to UTF-8 encoding. In most environments, the immediate effect will be that you
-cannot read strings with character codes >= 160 any longer; your terminal will
-only show funny glyph combinations. It is strongly recommended to install
-Unicode fonts (<ulink URL="http://czyborra.com/unifont/">GNU Unifont</ulink>, 
-<ulink URL="http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz">
-Markus Kuhn's fonts</ulink>) and <ulink
-URL="http://myweb.clark.net/pub/dickey/xterm/xterm.html">terminal emulators
-that can handle UTF-8 byte sequences</ulink>. Furthermore, a Unicode editor may
-be helpful (such as <ulink
-URL="ftp://metalab.unc.edu/pub/Linux/apps/editors/X/">Yudit</ulink>). There is
-also a <ulink URL="http://www.cl.cam.ac.uk/~mgk25/unicode.html">FAQ</ulink> by
-Markus Kuhn.
-</para>
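-
-         <para>Why the glyphs change can be seen in a minimal sketch that
-re-encodes an ISO-8859-1 string as UTF-8 using only the OCaml standard
-library. The function <literal>latin1_to_utf8</literal> is a hypothetical
-helper written for this illustration; it is not part of PXP, and it says
-nothing about how PXP converts strings internally:
-
-<programlisting><![CDATA[
-(* Sketch: re-encode an ISO-8859-1 string as UTF-8 (stdlib only).
-   latin1_to_utf8 is a hypothetical helper, not a PXP function. *)
-let latin1_to_utf8 (s : string) : string =
-  let b = Buffer.create (2 * String.length s) in
-  String.iter
-    (fun c ->
-       let code = Char.code c in
-       if code < 0x80 then
-         (* ASCII is encoded identically in both character sets *)
-         Buffer.add_char b c
-       else begin
-         (* codes 128..255 become the two-byte sequence 110xxxxx 10xxxxxx *)
-         Buffer.add_char b (Char.chr (0xC0 lor (code lsr 6)));
-         Buffer.add_char b (Char.chr (0x80 lor (code land 0x3F)))
-       end)
-    s;
-  Buffer.contents b
-
-let () =
-  (* "caf\xe9" is the word "cafe" with e-acute in ISO-8859-1: four bytes *)
-  let u = latin1_to_utf8 "caf\xe9" in
-  assert (u = "caf\xc3\xa9");
-  assert (String.length u = 5)
-]]></programlisting>
-
-Every character with a code of 128 or above becomes a two-byte sequence, and a
-terminal expecting ISO-8859-1 interprets each of the two bytes as a separate
-character.
-</para>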
-         <para>By setting <literal>encoding</literal> to
-<literal>`Enc_utf8</literal> all strings originating from the parsed XML
-document are represented as UTF-8 strings. This includes not only character
-data and attribute values but also element names, attribute names and so on, as
-it is possible to use any Unicode letter to form such names.  Strictly
-speaking, PXP is only XML-compliant if the UTF-8 mode is used; otherwise it
-will have difficulties validating documents whose names contain
-non-ISO-8859-1 characters.
-</para>
-
-         <para>This mode does not have any impact on the external
-representation of documents. The character set assumed when reading a document
-is set in the XML declaration, and the character set used when writing a
-document must be passed to the <literal>write</literal> method.
-</para>
-
-         <formalpara>
-           <title>How do I check that the nodes referred to by IDREF attributes exist?</title>
-           <para>First, you must create an index of all occurring ID
-attributes:
-
-<programlisting><![CDATA[
-let index = new hash_index
-]]></programlisting>
-
-This index must be passed to the parsing function:
-
-<programlisting><![CDATA[
-parse_document_entity
-  ~id_index:(index :> index)
-  config source spec
-]]></programlisting>
-
-Next, you must turn on the <literal>idref_pass</literal> mode:
-
-<programlisting><![CDATA[
-let config = { default_config with idref_pass = true }
-]]></programlisting>
-
-Note that now the whole document tree will be traversed, and every node will be
-checked for IDREF and IDREFS attributes. If the tree is big, this may take some
-time.
-</para>
-         </formalpara>
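-
-         <para>The idea behind this check can be sketched independently of
-PXP. The following stdlib-only example uses a hypothetical record type
-<literal>node</literal> and hypothetical functions
-<literal>build_index</literal> and <literal>dangling</literal>; none of them
-are PXP classes or functions, and the sketch only assumes that the check
-amounts to an index lookup for every reference found during a full tree
-traversal:
-
-<programlisting><![CDATA[
-(* Sketch of an ID/IDREF consistency check (stdlib only). The node type and
-   all function names are hypothetical; they are not PXP's classes. *)
-type node = {
-  id : string option;      (* value of the ID attribute, if any *)
-  idrefs : string list;    (* values of IDREF/IDREFS attributes *)
-  children : node list;
-}
-
-(* Pass 1: collect every ID into a hash table (the "index"). *)
-let rec build_index index n =
-  (match n.id with
-   | Some i -> Hashtbl.replace index i n
-   | None -> ());
-  List.iter (build_index index) n.children
-
-(* Pass 2: traverse the whole tree and return every reference that does
-   not resolve to an indexed ID. *)
-let rec dangling index n =
-  List.filter (fun r -> not (Hashtbl.mem index r)) n.idrefs
-  @ List.concat (List.map (dangling index) n.children)
-
-let () =
-  let leaf id idrefs = { id; idrefs; children = [] } in
-  let tree =
-    { id = None; idrefs = [];
-      children = [ leaf (Some "a") ["b"]; leaf (Some "b") ["a"; "zz"] ] } in
-  let index = Hashtbl.create 16 in
-  build_index index tree;
-  (* "zz" is referenced but never declared as an ID *)
-  assert (dangling index tree = ["zz"])
-]]></programlisting>
-
-The second pass visits every node once, which is why the extra check costs
-time proportional to the size of the tree.
-</para>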
-
-         <formalpara>
-           <title>What are deterministic content models?</title>
-           <para>This type of model can speed up the validation checks;
-furthermore, it ensures SGML compatibility. More precisely, a content model is
-deterministic if the parser can determine which alternative is actually used
-by inspecting only the current token. For example, this element has
-non-deterministic contents:
-
-<programlisting><![CDATA[
-<!ELEMENT x ((u,v) | (u,y+) | v)>
-]]></programlisting>
-
-If the first element in <literal>x</literal> is <literal>u</literal>, the
-parser does not know which of the alternatives <literal>(u,v)</literal> or
-<literal>(u,y+)</literal> will work; the parser must also inspect the second
-element to be able to distinguish between the alternatives. Because such
-look-ahead (or "guessing") is required, this example is
-non-deterministic.</para>
-         </formalpara>
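-
-         <para>One common way to test this property is the position-based
-(Glushkov) construction: every occurrence of an element name in the model gets
-its own position, and the model is deterministic if neither the set of
-possible first positions nor the set of possible successors of any position
-contains the same element name twice. The following stdlib-only sketch assumes
-this definition; the type <literal>re</literal> and the function
-<literal>is_deterministic</literal> are hypothetical names, and this is not
-necessarily how PXP implements the check:
-
-<programlisting><![CDATA[
-(* Sketch: position-based determinism test for content models. *)
-type re =
-  | Sym of string          (* element name *)
-  | Seq of re * re         (* sequence: a, b *)
-  | Alt of re * re         (* choice: a | b *)
-  | Opt of re              (* optional: a? *)
-  | Star of re             (* zero or more *)
-  | Plus of re             (* one or more *)
-
-let is_deterministic (r : re) : bool =
-  let counter = ref 0 in
-  (* follow: position -> positions that may come directly after it *)
-  let follow : (int, int * string) Hashtbl.t = Hashtbl.create 16 in
-  let link last first =
-    List.iter
-      (fun (p, _) -> List.iter (fun q -> Hashtbl.add follow p q) first)
-      last in
-  (* go returns (nullable, first positions, last positions) *)
-  let rec go = function
-    | Sym name ->
-        incr counter;
-        let p = (!counter, name) in
-        (false, [p], [p])
-    | Seq (a, b) ->
-        let (na, fa, la) = go a in
-        let (nb, fb, lb) = go b in
-        link la fb;
-        (na && nb,
-         (if na then fa @ fb else fa),
-         (if nb then la @ lb else lb))
-    | Alt (a, b) ->
-        let (na, fa, la) = go a in
-        let (nb, fb, lb) = go b in
-        (na || nb, fa @ fb, la @ lb)
-    | Opt a -> let (_, fa, la) = go a in (true, fa, la)
-    | Star a -> let (_, fa, la) = go a in link la fa; (true, fa, la)
-    | Plus a -> let (na, fa, la) = go a in link la fa; (na, fa, la)
-  in
-  let (_, first, _) = go r in
-  (* ambiguous: two distinct positions in the set carry the same name *)
-  let ambiguous set =
-    List.exists
-      (fun (p, n) -> List.exists (fun (q, m) -> p <> q && n = m) set)
-      set in
-  let positions = List.init !counter (fun i -> i + 1) in
-  not (ambiguous first)
-  && List.for_all
-       (fun p -> not (ambiguous (Hashtbl.find_all follow p)))
-       positions
-
-let () =
-  let u = Sym "u" and v = Sym "v" and y = Sym "y" in
-  (* ((u,v) | (u,y+) | v): two different u positions can come first *)
-  assert (not (is_deterministic (Alt (Alt (Seq (u, v), Seq (u, Plus y)), v))));
-  (* ((u,(v|y+)) | v) accepts the same sequences and is deterministic *)
-  assert (is_deterministic (Alt (Seq (u, Alt (v, Plus y)), v)))
-]]></programlisting>
-
-The second assertion shows that the example above can be rewritten into an
-equivalent deterministic model by factoring out the common
-<literal>u</literal>.
-</para>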
-
-         <para>The XML standard requires content models to be
-deterministic, so it is recommended to turn the option
-<literal>accept_only_deterministic_models</literal> on. However, PXP can also
-process non-deterministic models using a backtracking algorithm.</para>
-
-         <para>Deterministic models ensure that validation can be performed in
-linear time. In order to get the maximum benefits, PXP also implements a
-special validator that profits from deterministic models; this is the
-deterministic finite automaton (DFA). This validator is enabled per element
-type if the element type has a deterministic model and if the option
-<literal>validate_by_dfa</literal> is turned on.</para>
-
-         <para>In general, I expect that the DFA method is faster than the
-backtracking method; especially in the worst case the DFA takes only linear
-time. However, if the content model has only a few alternatives and the
-alternatives do not nest, the backtracking algorithm may be faster.</para>
-
-       </sect2>
-
-
-      </sect1>
-
-
-      <sect1>
-       <title>Updates</title> 
-
-       <para><emphasis>Some features (often added later) that are not explained
-elsewhere in the manual but are worth mentioning.</emphasis></para>
-
-       <itemizedlist mark="bullet" spacing="compact">
-         <listitem><para>Methods node_position, node_path, nth_node,
-previous_node, next_node for nodes: See pxp_document.mli</para>
-         </listitem>
-         <listitem><para>Functions to determine the document order of nodes:
-compare, create_ord_index, ord_number, ord_compare: See pxp_document.mli</para>
-         </listitem>
-       </itemizedlist>
-      </sect1>
-
-    </chapter>
- 
-  </part>
-</book>
-