--- /dev/null
+******************************************************************************
+README - PXP, the XML parser for O'Caml
+******************************************************************************
+
+
+==============================================================================
+Abstract
+==============================================================================
+
+PXP is a validating parser for XML-1.0 which has been written entirely in
+Objective Caml.
+
+PXP is the new name of the parser formerly known as "Markup". PXP means
+"Polymorphic XML parser" and emphasizes its most useful property: that the API
+is polymorphic and can be configured such that different objects are used to
+store different types of elements.
+
+==============================================================================
+Download
+==============================================================================
+
+You can download PXP as a gzip'ed tarball [1]. The parser needs the Netstring
+[2] package (0.9.3). Note that PXP requires O'Caml 3.00.
+
+==============================================================================
+User's Manual
+==============================================================================
+
+The manual is included in the distribution both as a PostScript document and
+as a set of HTML files. An online version can be found here [3].
+
+==============================================================================
+Author, Credits, Copying
+==============================================================================
+
+PXP has been written by Gerd Stolpmann [4]; it contains contributions by
+Claudio Sacerdoti Coen. You may copy it as you like, and you may use it even
+for commercial purposes as long as the license conditions are respected; see
+the file LICENSE that comes with the distribution. It allows almost everything.
+
+Thanks also to Alain Frisch and Haruo Hosoya for discussions and bug reports.
+
+==============================================================================
+Description
+==============================================================================
+
+PXP is a validating XML parser for O'Caml [5]. It strictly complies with the
+XML-1.0 [6] standard.
+
+The parser is simple to call: usually a single statement (a function call) is
+sufficient to parse an XML document and to represent it as an object tree.
+
+Once the document is parsed, it can be accessed using a class interface. The
+interface allows arbitrary access including transformations. One of the
+features of the document representation is its polymorphic nature; it is simple
+to add custom methods to the document classes. Furthermore, the parser can be
+configured such that different XML elements are represented by objects created
+from different classes. This is a very powerful feature, because it simplifies
+the structure of programs processing XML documents.
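+
+As a rough illustration of this idea, here is a minimal sketch of "different
+classes for different element types". It is NOT the PXP API: the names node,
+title_node, para_node, default_node and make_node are hypothetical, and a
+recent O'Caml compiler is assumed. See the manual for the real interface.
+
+```ocaml
+(* Hypothetical sketch, not the PXP API: one class per element type,
+   chosen by element name when the tree is built. *)
+
+(* Common interface shared by all node objects. *)
+class virtual node (name : string) (text : string) =
+  object
+    method name = name
+    method text = text
+    method virtual render : string
+  end
+
+(* Class used for <title> elements. *)
+class title_node text =
+  object
+    inherit node "title" text
+    method render = String.uppercase_ascii text
+  end
+
+(* Class used for <para> elements. *)
+class para_node text =
+  object
+    inherit node "para" text
+    method render = "  " ^ text
+  end
+
+(* Fallback class for element types without a dedicated class. *)
+class default_node name text =
+  object
+    inherit node name text
+    method render = text
+  end
+
+(* The "configuration": maps an element name to the class to instantiate. *)
+let make_node (name : string) (text : string) : node =
+  match name with
+  | "title" -> (new title_node text :> node)
+  | "para" -> (new para_node text :> node)
+  | _ -> (new default_node name text :> node)
+
+let () =
+  [ ("title", "Abstract"); ("para", "A paragraph."); ("note", "n/a") ]
+  |> List.iter (fun (name, text) ->
+         print_endline (make_node name text)#render)
+```
+
+The processing code only calls render and never has to branch on the element
+name itself; this is the kind of simplification meant above.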
+
+Note that the class interface does not comply with the DOM standard. It was
+not a development goal to realize a standard API (industrial developers can do
+this much better than I); however, the API is powerful enough to be considered
+equivalent to DOM. More importantly, the interface is compatible with the XML
+information model required by many XML-related standards.
+
+------------------------------------------------------------------------------
+Detailed feature list
+------------------------------------------------------------------------------
+
+- The XML instance is validated against the DTD; any violation of a validation
+ constraint leads to the rejection of the instance. The validator has been
+ carefully implemented, and conforms strictly to the standard. If needed, it
+ is also possible to run the parser in a well-formedness mode.
+
+- If possible, the validator applies a deterministic finite automaton to
+ validate the content models. In this case validation is performed in linear
+ time. However, if the content models are not deterministic, the parser uses
+ a backtracking algorithm which can be much slower. It is also possible to
+ reject non-deterministic content models.
+
+- In particular, the validator also checks the complicated rules on whether
+ parentheses are properly nested with respect to entities, and whether the
+ standalone declaration is satisfied. On demand, it also checks whether the
+ IDREF attributes only refer to existing nodes.
+
+- Entity references are automatically resolved while the XML text is being
+ scanned. It is not possible to recognize in the object tree where a
+ referenced entity begins or ends; the object tree only represents the
+ logical structure.
+
+- External entities are loaded using a configurable resolver infrastructure.
+ It is possible to connect the parser with an arbitrary XML source.
+
+- The parser can read XML text encoded in a variety of character sets.
+ Independent of this, it is possible to choose the encoding of the internal
+ representation of the tree nodes; the parser automatically converts the
+ input text to this encoding. Currently, the parser supports UTF-8 and
+ ISO-8859-1 as internal encodings.
+
+- The interface of the parser has been designed to integrate well with the
+ O'Caml language. The first goal was ease of use, which is achieved by many
+ convenience methods and functions, and by allowing the user to select which
+ parts of the XML text are actually represented in the tree. For example, it
+ is possible to store processing instructions as tree nodes, but the parser
+ can also be configured such that these instructions are put into hashtables.
+ The information model is compatible with the requirements of XML-related
+ standards such as XPath.
+
+- In particular, the node tree can optionally contain or leave out processing
+ instructions and comments. It is also possible to generate a "super root"
+ object which is the parent of the root element. The attributes of elements
+ are normally not stored as nodes, but it is possible to get them wrapped
+ into nodes.
+
+- There is also an interface for DTDs; you can parse and access sequences of
+ declarations. The declarations are fully represented as recursive O'Caml
+ values.
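+
+The automaton-based validation of content models mentioned in the list above
+can be sketched as follows. This is only an illustration under assumptions,
+not PXP's actual implementation: the content model (head, item*, foot), the
+type state and the functions step and valid are all made up for this example.
+
+```ocaml
+(* Hypothetical sketch, not PXP's implementation: validating the children
+   of an element against the content model (head, item*, foot) with a
+   hand-written DFA. *)
+
+type state = Start | After_head | Done
+
+(* Transition function: one step per child element; None means rejected. *)
+let step (s : state) (child : string) : state option =
+  match s, child with
+  | Start, "head" -> Some After_head
+  | After_head, "item" -> Some After_head
+  | After_head, "foot" -> Some Done
+  | _ -> None
+
+(* A single left-to-right pass over the children: linear time and no
+   backtracking, because each (state, child) pair has at most one
+   successor state. *)
+let valid (children : string list) : bool =
+  let rec run s = function
+    | [] -> s = Done
+    | c :: rest ->
+      (match step s c with
+       | Some s' -> run s' rest
+       | None -> false)
+  in
+  run Start children
+
+let () =
+  Printf.printf "%b\n" (valid [ "head"; "item"; "item"; "foot" ]);
+  Printf.printf "%b\n" (valid [ "head"; "foot"; "item" ])
+```
+
+A model such as ((b, c) | (b, d)) is not deterministic in this sense: after
+reading b, it is not yet known which branch applies. A validator that works
+directly on such a model has to try the alternatives, which is where
+backtracking comes in.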
+
+------------------------------------------------------------------------------
+Code examples
+------------------------------------------------------------------------------
+
+This distribution contains several examples:
+
+- validate: simply parses a document and prints all error messages
+
+- readme: Defines a DTD for simple "README"-like documents, and offers
+ conversion to HTML and text files [7].
+
+- xmlforms: This is already a sophisticated application that uses XML as a
+ style sheet language and data storage format. It shows how a Tk user
+ interface can be configured by an XML style, and how data records can be
+ stored using XML.
+
+------------------------------------------------------------------------------
+Restrictions and missing features
+------------------------------------------------------------------------------
+
+The following restrictions apply; they are not violations of the standard:
+
+- The attributes "xml:space" and "xml:lang" are not supported specially. (The
+ application can do this.)
+
+- The built-in support for SYSTEM and PUBLIC identifiers is limited to local
+ file access. There is no support for catalogs. The parser offers a hook to
+ add missing features.
+
+- It is currently not possible to check for interoperability with SGML.
+
+The following features are also missing:
+
+- There is no special support for namespaces. (Perhaps in the next release?)
+
+- There is no support for XPath or XSLT.
+
+However, I hope that these features will be implemented soon, either by myself
+or by contributors (who are invited to do so).
+
+------------------------------------------------------------------------------
+Recent Changes
+------------------------------------------------------------------------------
+
+- Changed in 1.0:
+ Support for document order.
+
+- Changed in 0.99.8:
+ Several fixes of bugs reported by Haruo Hosoya and Alain Frisch.
+ The class type "node" has been extended: you can go directly to the next and
+ previous nodes in the list; you can refer to nodes by position.
+ There are now some iterators for nodes: find, find_all, find_element,
+ find_all_elements, map_tree, iter_tree.
+ Experimental support for viewing attributes as nodes; I hope that helps
+ Alain write his XPath evaluator.
+ The user's manual has been revised and is almost up to date.
+
+- Changed in 0.99.7:
+ There are now additional node types T_super_root, T_pinstr and T_comment,
+ and the parser is able to create the corresponding nodes.
+ The functions for character set conversion have been moved to the Netstring
+ package; they are not specific to XML.
+
+- Changed in 0.99.6:
+ Implemented a check on deterministic content models. Added an alternate
+ validator based on a DFA. This means that now all mandatory features for
+ an XML-1.0 parser are implemented! The parser is now substantially complete.
+
+- Changed in 0.99.5:
+ The handling of ID and IDREF attributes has changed. The index of nodes
+ containing an ID attribute is now separated from the document. Optionally
+ the parser now checks whether the IDREF attributes refer to existing
+ elements.
+ The element nodes can optionally store the location in the source XML code.
+ The method 'write' writes the XML tree in every supported encoding.
+ (Successor of 'write_compact_as_latin1'.)
+ Several smaller changes and fixes.
+
+- Changed in 0.99.4:
+ The module Pxp_reader has been modernized. The resolver classes are simpler
+ to use. There is now support for URLs.
+ The interface of Pxp_yacc has been improved: The type 'source' is now
+ simpler. The type 'domspec' has gone; the new 'spec' is opaque and performs
+ better. There are some new parsing modes.
+ Many smaller changes.
+
+- Changed in 0.99.3:
+ The markup_* modules have been renamed to pxp_*. There is a new
+ compatibility API that tries to be compatible with markup-0.2.10.
+ The type "encoding" is now a polymorphic variant.
+
+- Changed in 0.99.2:
+ Added checks for the constraints about the standalone declaration.
+ Added regression tests about attribute normalization, attribute checks,
+ standalone checks.
+ Fixed some minor errors of the attribute normalization function.
+ The bytecode/native archives are now separated into a general part, an
+ ISO-8859-1-relevant part, and a UTF-8-relevant part. The parser can again be
+ compiled with ocamlopt.
+
+- Changed in 0.99.1:
+ In general, this release is an early pre-release of the next stable version
+ 1.00. I do not recommend using it for serious work; it is still very
+ experimental!
+ The core of the parser has been rewritten using a self-written parser
+ generator.
+ The lexer has been restructured, and can now handle UTF-8 encoded files.
+ Numerous other changes.
+
+
+--------------------------
+
+[1] see http://www.ocaml-programming.de/packages/pxp-1.0.tar.gz
+
+[2] see http://www.ocaml-programming.de/packages/documentation/netstring
+
+[3] see http://www.ocaml-programming.de/packages/documentation/pxp/manual
+
+[4] see mailto:gerd@gerd-stolpmann.de
+
+[5] see http://caml.inria.fr/
+
+[6] see http://www.w3.org/TR/1998/REC-xml-19980210.html
+
+[7] This particular document is an example of this DTD!
+
+
+