X-Git-Url: http://matita.cs.unibo.it/gitweb/?a=blobdiff_plain;f=helm%2FDEVEL%2Fpxp%2Fpxp%2Fdoc%2FREADME.xml;fp=helm%2FDEVEL%2Fpxp%2Fpxp%2Fdoc%2FREADME.xml;h=0000000000000000000000000000000000000000;hb=e108abe5c0b4eb841c4ad332229a6c0e57e70079;hp=34c7726ad192e01459f1301b57896222e0db5888;hpb=1456c337a60f6677ee742ff7891d43fc382359a9;p=helm.git

diff --git a/helm/DEVEL/pxp/pxp/doc/README.xml b/helm/DEVEL/pxp/pxp/doc/README.xml
deleted file mode 100644
index 34c7726ad..000000000
--- a/helm/DEVEL/pxp/pxp/doc/README.xml
+++ /dev/null
@@ -1,423 +0,0 @@
- - - Abstract

-PXP is a validating parser for XML-1.0 which has been written entirely in Objective Caml.

- -

PXP is the new name of the parser formerly known as "Markup". PXP stands for "Polymorphic XML parser" and emphasizes its most useful property: the API is polymorphic and can be configured such that different objects are used to store different types of elements.

-
- - - Download -

-You can download PXP as a gzip'ed tarball. The parser needs the Netstring package (0.9.3). Note that PXP requires O'Caml 3.00.

-
- - - User's Manual -

-The manual is included in the distribution both as a PostScript document and as a set of HTML files. An online version is also available.

-
- - - Author, Credits, Copying -

-PXP has been written by Gerd Stolpmann; it contains contributions by Claudio Sacerdoti Coen. You may copy it as you like, and you may even use it for commercial purposes as long as the license conditions are respected; see the file LICENSE that comes with the distribution. It allows almost everything.

- -

Thanks also to Alain Frisch and Haruo Hosoya for discussions and bug reports.

-
- - - Description -

-PXP is a validating XML parser for O'Caml. It strictly complies with the XML-1.0 standard.

- -

The parser is simple to call: usually a single statement (a function call) is sufficient to parse an XML document and to represent it as an object tree.

- -

-Once the document is parsed, it can be accessed using a class interface. The interface allows arbitrary access including transformations. One of the features of the document representation is its polymorphic nature; it is simple to add custom methods to the document classes. Furthermore, the parser can be configured such that different XML elements are represented by objects created from different classes. This is a very powerful feature, because it simplifies the structure of programs processing XML documents.
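The idea of representing different elements by objects of different classes can be illustrated with a small, self-contained sketch. It is a toy model and does not use PXP's real API: the class type `element`, the classes `generic_element` and `title_element`, the table `spec` and the function `make_element` are all hypothetical names chosen for this illustration.

```ocaml
(* Toy model only: none of these names belong to PXP. *)

(* Interface shared by all element objects. *)
class type element = object
  method name : string
  method text : string
  method render : string   (* a custom method added by the application *)
end

(* Default class, used for elements without a special mapping. *)
class generic_element (name : string) (text : string) : element = object
  method name = name
  method text = text
  method render = text
end

(* Specialised class for <title> elements: renders in upper case. *)
class title_element (name : string) (text : string) : element = object
  inherit generic_element name text
  method! render = String.uppercase_ascii text
end

(* Hypothetical "spec": maps element names to constructors. *)
let spec : (string, string -> string -> element) Hashtbl.t = Hashtbl.create 7
let () = Hashtbl.replace spec "title" (fun n t -> new title_element n t)

(* Create the object for an element, falling back to the default class. *)
let make_element (name : string) (text : string) : element =
  match Hashtbl.find_opt spec name with
  | Some make -> make name text
  | None -> new generic_element name text

let () =
  let nodes = [ make_element "title" "pxp"; make_element "p" "hello" ] in
  List.iter (fun e -> Printf.printf "%s: %s\n" e#name e#render) nodes
```

The processing code calls `render` on every node without a case distinction on the element name; the per-element behaviour lives in the classes, which is the simplification the paragraph above describes.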

- -

-Note that the class interface does not comply with the DOM standard. It was not a development goal to realize a standard API (industrial developers can do this much better than I can); however, the API is powerful enough to be considered equivalent to DOM. More importantly, the interface is compatible with the XML information model required by many XML-related standards.

- - - Detailed feature list - -
    -
  • The XML instance is validated against the DTD; any violation of a validation constraint leads to the rejection of the instance. The validator has been carefully implemented, and conforms strictly to the standard. If needed, it is also possible to run the parser in a well-formedness mode.

    -
  • -
  • If possible, the validator applies a deterministic finite automaton to validate the content models. This ensures that validation of deterministic content models is always performed in linear time. However, if the content models are not deterministic, the parser uses a backtracking algorithm which can be much slower. It is also possible to reject non-deterministic content models.

    -
  • -
  • In particular, the validator also checks the complicated rules governing whether parentheses are properly nested with respect to entities, and whether the standalone declaration is satisfied. On demand, it also checks whether the IDREF attributes only refer to existing nodes.

    -
  • -
  • Entity references are automatically resolved while the XML text is being scanned. It is not possible to recognize in the object tree where a referenced entity begins or ends; the object tree only represents the logical structure.

    -
  • -
  • External entities are loaded using a configurable resolver infrastructure. It is possible to connect the parser with an arbitrary XML source.

    -
  • -
  • The parser can read XML text encoded in a variety of character sets. Independent of this, it is possible to choose the encoding of the internal representation of the tree nodes; the parser automatically converts the input text to this encoding. Currently, the parser supports UTF-8 and ISO-8859-1 as internal encodings.

    -
  • -
  • The interface of the parser has been designed to integrate as well as possible with the O'Caml language. The first goal was ease of use, which is achieved by many convenience methods and functions, and by allowing the user to select which parts of the XML text are actually represented in the tree. For example, it is possible to store processing instructions as tree nodes, but the parser can also be configured such that these instructions are put into hashtables. The information model is compatible with the requirements of XML-related standards such as XPath.

    -
  • -
  • In particular, the node tree can optionally contain or leave out processing instructions and comments. It is also possible to generate a "super root" object which is the parent of the root element. The attributes of elements are normally not stored as nodes, but it is possible to get them wrapped into nodes.

    -
  • -
  • There is also an interface for DTDs; you can parse and access sequences of declarations. The declarations are fully represented as recursive O'Caml values.

    -
  • -
-
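The linear-time claim for deterministic content models can be made concrete with a small sketch. It assumes a hypothetical content model `(head, item+, foot?)` that has been compiled by hand into a DFA; the names `transition`, `accepting` and `validate` are illustrative and are not taken from PXP.

```ocaml
(* Sketch only: a hand-compiled DFA for the hypothetical content model
   (head, item+, foot?). This is not PXP code. *)

type state = int

(* Transition function: None means that the child is not allowed here. *)
let transition (s : state) (child : string) : state option =
  match s, child with
  | 0, "head" -> Some 1
  | 1, "item" -> Some 2
  | 2, "item" -> Some 2
  | 2, "foot" -> Some 3
  | _ -> None

(* The model is satisfied after at least one item, with or without foot. *)
let accepting (s : state) : bool = s = 2 || s = 3

(* One pass over the child names: each child costs one table lookup. *)
let validate (children : string list) : bool =
  let rec run s = function
    | [] -> accepting s
    | c :: rest ->
      (match transition s c with
       | Some s' -> run s' rest
       | None -> false)
  in
  run 0 children

let () =
  Printf.printf "%b\n" (validate [ "head"; "item"; "item"; "foot" ]);
  Printf.printf "%b\n" (validate [ "head"; "foot" ])
```

Every child element is inspected exactly once, so the running time grows linearly with the number of children. A non-deterministic model such as `((a, b) | (a, c))` cannot be handled this way directly, because after reading `a` the validator does not yet know which branch applies; this is the situation in which a backtracking algorithm is needed.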
- - - - Code examples -

-This distribution contains several examples:

-
    -
  • validate: simply parses a document and prints all error messages

  • - -
  • readme: defines a DTD for simple "README"-like documents, and offers conversion to HTML and text files. (This particular document is an example of this DTD!)

  • - -
  • xmlforms: this is already a sophisticated application that uses XML as style sheet language and data storage format. It shows how a Tk user interface can be configured by an XML style, and how data records can be stored using XML.

  • -
-
- - - Restrictions and missing features -

-The following restrictions apply; none of them is a violation of the standard:

-
    -
  • The attributes "xml:space" and "xml:lang" are not supported specially. (The application can do this.)

  • - -
  • The built-in support for SYSTEM and PUBLIC identifiers is limited to local file access. There is no support for catalogs. The parser offers a hook to add missing features.

  • - -
  • It is currently not possible to check for interoperability with SGML.

  • -
- -
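As an illustration of how an application might handle "xml:lang" itself, the following sketch computes the effective language of every element by inheriting the nearest "xml:lang" value from the element or its ancestors. It works on a toy tree type rather than on PXP's node classes; the record type `node` and the function `effective_langs` are hypothetical.

```ocaml
(* Sketch on a toy tree; PXP's real node objects are not used here. *)

type node = {
  name : string;
  attrs : (string * string) list;
  children : node list;
}

(* Return (element name, effective language) pairs in document order.
   An element without its own xml:lang inherits the value of its parent. *)
let rec effective_langs (inherited : string option) (n : node)
  : (string * string option) list =
  let lang =
    match List.assoc_opt "xml:lang" n.attrs with
    | Some l -> Some l
    | None -> inherited
  in
  (n.name, lang) :: List.concat (List.map (effective_langs lang) n.children)

let doc =
  { name = "doc"; attrs = [ ("xml:lang", "en") ]; children = [
      { name = "p"; attrs = []; children = [] };
      { name = "q"; attrs = [ ("xml:lang", "de") ]; children = [
          { name = "em"; attrs = []; children = [] } ] } ] }

let () =
  List.iter
    (fun (name, lang) ->
       Printf.printf "%s: %s\n" name
         (match lang with Some l -> l | None -> "(none)"))
    (effective_langs None doc)
```

The same top-down traversal pattern would apply to "xml:space", with the inherited value being the whitespace mode instead of the language.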

The following features are also missing:

-
    -
  • There is no special support for namespaces. (Perhaps in the next release?)

    -
  • -
  • There is no support for XPath or XSLT.

    -
  • -
-

However, I hope that these features will be implemented soon, either by myself or by contributors (who are invited to do so).

-
- - - Recent Changes -
    -
  • -

    Changed in 1.0:

    -

    Support for document order.

    -
  • -
  • -

    Changed in 0.99.8:

    -

    Several fixes for bugs reported by Haruo Hosoya and Alain Frisch.

    -

    The class type "node" has been extended: you can go directly to the next and previous nodes in the list; you can refer to nodes by position.

    -

    There are now some iterators for nodes: find, find_all, find_element, find_all_elements, map_tree, iter_tree.

    -

    Experimental support for viewing attributes as nodes; I hope that helps Alain write his XPath evaluator.

    -

    The user's manual has been revised and is almost up to date.

    -
  • -
  • -

    Changed in 0.99.7:

    -

    There are now additional node types T_super_root, T_pinstr and T_comment, and the parser is able to create the corresponding nodes.

    -

    The functions for character set conversion have been moved to the Netstring package; they are not specific to XML.

    -
  • -
  • -

    Changed in 0.99.6:

    -

    Implemented a check for deterministic content models. Added an alternative validator based on a DFA. This means that all mandatory features of an XML-1.0 parser are now implemented! The parser is now substantially complete.

    -
  • -
  • -

    Changed in 0.99.5:

    -

    The handling of ID and IDREF attributes has changed. The index of nodes containing an ID attribute is now separated from the document. Optionally the parser now checks whether the IDREF attributes refer to existing elements.

    -

    The element nodes can optionally store the location in the source XML code.

    -

    The method 'write' writes the XML tree in every supported encoding. (Successor of 'write_compact_as_latin1'.)

    -

    Several smaller changes and fixes.

    -
  • -
  • -

    Changed in 0.99.4:

    -

    The module Pxp_reader has been modernized. The resolver classes are simpler to use. There is now support for URLs.

    -

    The interface of Pxp_yacc has been improved: the type 'source' is now simpler. The type 'domspec' is gone; the new 'spec' is opaque and performs better. There are some new parsing modes.

    -

    Many smaller changes.

    -
  • -
  • -

    Changed in 0.99.3:

    -

    The markup_* modules have been renamed to pxp_*. There is a new compatibility API that tries to be compatible with markup-0.2.10.

    -

    The type "encoding" is now a polymorphic variant.

    -
  • -
  • -

    Changed in 0.99.2:

    -

    Added checks for the constraints about the standalone declaration.

    -

    Added regression tests about attribute normalization, attribute checks, standalone checks.

    -

    Fixed some minor errors in the attribute normalization function.

    -

    The bytecode/native archives are now separated into a general part, an ISO-8859-1-relevant part, and a UTF-8-relevant part. The parser can again be compiled with ocamlopt.

    -
  • -
  • -

    Changed in 0.99.1:

    -

    In general, this release is an early pre-release of the next stable version 1.00. I do not recommend using it for serious work; it is still very experimental!

    -

    The core of the parser has been rewritten using a self-written parser generator.

    -

    The lexer has been restructured, and can now handle UTF-8 encoded files.

    -

    Numerous other changes.

    -
  • - - -
-
-
-
-