******************************************************************************
README - PXP, the XML parser for O'Caml
******************************************************************************


==============================================================================
Abstract
==============================================================================

PXP is a validating parser for XML-1.0 which has been written entirely in
Objective Caml.

PXP is the new name of the parser formerly known as "Markup". PXP stands for
"Polymorphic XML parser" and emphasizes its most useful property: the API is
polymorphic and can be configured such that different objects are used to
store different types of elements.

==============================================================================
Download
==============================================================================

You can download PXP as a gzip'ed tarball [1]. The parser needs the Netstring
[2] package (version 0.9.3). Note that PXP requires O'Caml 3.00.

==============================================================================
User's Manual
==============================================================================

The manual is included in the distribution both as a PostScript document and
as a bunch of HTML files. An online version can be found here [3].
+ +============================================================================== +Author, Credits, Copying +============================================================================== + +PXP has been written by Gerd Stolpmann [4]; it contains contributions by +Claudio Sacerdoti Coen. You may copy it as you like, you may use it even for +commercial purposes as long as the license conditions are respected, see the +file LICENSE coming with the distribution. It allows almost everything. + +Thanks also to Alain Frisch and Haruo Hosoya for discussions and bug reports. + +============================================================================== +Description +============================================================================== + +PXP is a validating XML parser for O'Caml [5]. It strictly complies to the +XML-1.0 [6] standard. + +The parser is simple to call, usually only one statement (function call) is +sufficient to parse an XML document and to represent it as object tree. + +Once the document is parsed, it can be accessed using a class interface. The +interface allows arbitrary access including transformations. One of the +features of the document representation is its polymorphic nature; it is simple +to add custom methods to the document classes. Furthermore, the parser can be +configured such that different XML elements are represented by objects created +from different classes. This is a very powerful feature, because it simplifies +the structure of programs processing XML documents. + +Note that the class interface does not comply to the DOM standard. It was not a +development goal to realize a standard API (industrial developers can this much +better than I); however, the API is powerful enough to be considered as +equivalent with DOM. More important, the interface is compatible with the XML +information model required by many XML-related standards. 
+ +------------------------------------------------------------------------------ +Detailed feature list +------------------------------------------------------------------------------ + +- The XML instance is validated against the DTD; any violation of a validation + constraint leads to the rejection of the instance. The validator has been + carefully implemented, and conforms strictly to the standard. If needed, it + is also possible to run the parser in a well-formedness mode. + +- If possible, the validator applies a deterministic finite automaton to + validate the content models. This ensures that validation can always be + performed in linear time. However, in the case that the content models are + not deterministic, the parser uses a backtracking algorithm which can be + much slower. - It is also possible to reject non-deterministic content + models. + +- In particular, the validator also checks the complicated rules whether + parentheses are properly nested with respect to entities, and whether the + standalone declaration is satisfied. On demand, it is checked whether the + IDREF attributes only refer to existing nodes. + +- Entity references are automatically resolved while the XML text is being + scanned. It is not possible to recognize in the object tree where a + referenced entity begins or ends; the object tree only represents the + logical structure. + +- External entities are loaded using a configurable resolver infrastructure. + It is possible to connect the parser with an arbitrary XML source. + +- The parser can read XML text encoded in a variety of character sets. + Independent of this, it is possible to choose the encoding of the internal + representation of the tree nodes; the parser automatically converts the + input text to this encoding. Currently, the parser supports UTF-8 and + ISO-8859-1 as internal encodings. + +- The interface of the parser has been designed such that it is best + integrated into the language O'Caml. 
The first goal was simplicity of usage + which is achieved by many convenience methods and functions, and by allowing + the user to select which parts of the XML text are actually represented in + the tree. For example, it is possible to store processing instructions as + tree nodes, but the parser can also be configured such that these + instructions are put into hashtables. The information model is compatible + with the requirements of XML-related standards such as XPath. + +- In particular, the node tree can optionally contain or leave out processing + instructions and comments. It is also possible to generate a "super root" + object which is the parent of the root element. The attributes of elements + are normally not stored as nodes, but it is possible to get them wrapped + into nodes. + +- There is also an interface for DTDs; you can parse and access sequences of + declarations. The declarations are fully represented as recursive O'Caml + values. + +------------------------------------------------------------------------------ +Code examples +------------------------------------------------------------------------------ + +This distribution contains several examples: + +- validate: simply parses a document and prints all error messages + +- readme: Defines a DTD for simple "README"-like documents, and offers + conversion to HTML and text files [7]. + +- xmlforms: This is already a sophisticated application that uses XML as style + sheet language and data storage format. It shows how a Tk user interface can + be configured by an XML style, and how data records can be stored using XML. + +------------------------------------------------------------------------------ +Restrictions and missing features +------------------------------------------------------------------------------ + +The following restrictions apply that are not violations of the standard: + +- The attributes "xml:space", and "xml:lang" are not supported specially. 
  (The application can handle them itself.)

- The built-in support for SYSTEM and PUBLIC identifiers is limited to local
  file access. There is no support for catalogs. The parser offers a hook to
  add the missing features.

- It is currently not possible to check for interoperability with SGML.

The following features are also missing:

- There is no special support for namespaces. (Perhaps in the next release?)

- There is no support for XPath or XSLT.

However, I hope that these features will be implemented soon, either by myself
or by contributors (who are invited to do so).

------------------------------------------------------------------------------
Recent Changes
------------------------------------------------------------------------------

- Changed in 1.0:
  Support for document order.

- Changed in 0.99.8:
  Several fixes of bugs reported by Haruo Hosoya and Alain Frisch.
  The class type "node" has been extended: you can go directly to the next and
  previous nodes in the list; you can refer to nodes by position.
  There are now some iterators for nodes: find, find_all, find_element,
  find_all_elements, map_tree, iter_tree.
  Experimental support for viewing attributes as nodes; I hope that helps
  Alain write his XPath evaluator.
  The user's manual has been revised and is almost up to date.

- Changed in 0.99.7:
  There are now additional node types T_super_root, T_pinstr and T_comment,
  and the parser is able to create the corresponding nodes.
  The functions for character set conversion have been moved to the Netstring
  package; they are not specific to XML.

- Changed in 0.99.6:
  Implemented a check on deterministic content models. Added an alternative
  validator based on a DFA. This means that all mandatory features of an
  XML-1.0 parser are now implemented! The parser is now substantially
  complete.

- Changed in 0.99.5:
  The handling of ID and IDREF attributes has changed. The index of nodes
  containing an ID attribute is now separate from the document. Optionally,
  the parser now checks whether the IDREF attributes refer to existing
  elements.
  The element nodes can optionally store their location in the source XML
  code.
  The method 'write' writes the XML tree in every supported encoding.
  (Successor of 'write_compact_as_latin1'.)
  Several smaller changes and fixes.

- Changed in 0.99.4:
  The module Pxp_reader has been modernized. The resolver classes are simpler
  to use. There is now support for URLs.
  The interface of Pxp_yacc has been improved: the type 'source' is now
  simpler. The type 'domspec' is gone; the new 'spec' is opaque and performs
  better. There are some new parsing modes.
  Many smaller changes.

- Changed in 0.99.3:
  The markup_* modules have been renamed to pxp_*. There is a new
  compatibility API that tries to be compatible with markup-0.2.10.
  The type "encoding" is now a polymorphic variant.

- Changed in 0.99.2:
  Added checks for the constraints about the standalone declaration.
  Added regression tests for attribute normalization, attribute checks and
  standalone checks.
  Fixed some minor errors in the attribute normalization function.
  The bytecode/native archives are now split into a general part, an
  ISO-8859-1-specific part, and a UTF-8-specific part. The parser can again
  be compiled with ocamlopt.

- Changed in 0.99.1:
  In general, this release is an early pre-release of the next stable version
  1.00. I do not recommend using it for serious work; it is still very
  experimental!
  The core of the parser has been rewritten using a self-written parser
  generator.
  The lexer has been restructured, and can now handle UTF-8 encoded files.
  Numerous other changes.
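The feature list and the 0.99.6 change notes mention validating content
models with a deterministic finite automaton (DFA). The following
self-contained O'Caml sketch illustrates the idea for a single, hand-written
content model, (title, para+, sig?). The record type dfa, the value model and
the function validate are hypothetical names, and this is not PXP's actual
validator; a real validator would derive the automaton from the element
declaration in the DTD instead of writing it by hand.

```ocaml
(* Hypothetical sketch of DFA-based content-model checking; NOT PXP's code.
   Content model: (title, para+, sig?)
   States: 0 = start, 1 = after title, 2 = after one or more paras,
           3 = after sig.  Accepting states: 2 and 3. *)

type dfa = {
  start : int;
  accepting : int list;
  (* Transition function; None means the child is not allowed here. *)
  delta : int * string -> int option;
}

let model = {
  start = 0;
  accepting = [2; 3];
  delta = (function
    | (0, "title") -> Some 1
    | (1, "para") | (2, "para") -> Some 2
    | (2, "sig") -> Some 3
    | _ -> None);
}

(* Check a sequence of child element names.  Each child causes exactly one
   transition, so the check is linear in the number of children. *)
let validate dfa children =
  let rec run state = function
    | [] -> List.mem state dfa.accepting
    | name :: rest ->
      (match dfa.delta (state, name) with
       | Some next -> run next rest
       | None -> false)
  in
  run dfa.start children

let () =
  List.iter
    (fun children ->
       Printf.printf "%s -> %b\n" (String.concat ", " children)
         (validate model children))
    [ ["title"; "para"]; ["title"; "para"; "para"; "sig"]; ["title"];
      ["para"; "title"] ]
```

A model such as ((a, b) | (a, c)) is not deterministic in this sense: when an
a is read, it could match two different positions of the model, and the right
one only becomes clear at the next child. As the feature list explains, PXP
falls back to backtracking for such models, or can be told to reject them.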


--------------------------

[1] see http://www.ocaml-programming.de/packages/pxp-1.0.tar.gz

[2] see http://www.ocaml-programming.de/packages/documentation/netstring

[3] see http://www.ocaml-programming.de/packages/documentation/pxp/manual

[4] see mailto:gerd@gerd-stolpmann.de

[5] see http://caml.inria.fr/

[6] see http://www.w3.org/TR/1998/REC-xml-19980210.html

[7] This particular document is an example of this DTD!