X-Git-Url: http://matita.cs.unibo.it/gitweb/?a=blobdiff_plain;f=helm%2FDEVEL%2Fpxp%2Fpxp%2Fdoc%2FSPEC;fp=helm%2FDEVEL%2Fpxp%2Fpxp%2Fdoc%2FSPEC;h=28e6914ceb0d67c62e7b3335a35e0e55bb494ef7;hb=c03d2c1fdab8d228cb88aaba5ca0f556318bebc5;hp=0000000000000000000000000000000000000000;hpb=758057e85325f94cd88583feb1fdf6b038e35055;p=helm.git diff --git a/helm/DEVEL/pxp/pxp/doc/SPEC b/helm/DEVEL/pxp/pxp/doc/SPEC new file mode 100644 index 000000000..28e6914ce --- /dev/null +++ b/helm/DEVEL/pxp/pxp/doc/SPEC @@ -0,0 +1,185 @@ +****************************************************************************** +Notes on the XML specification +****************************************************************************** + + +============================================================================== +This document +============================================================================== + +There are some points in the XML specification which are ambiguous. The +following notes discuss these points, and describe how this parser behaves. + +============================================================================== +Conditional sections and the token ]]> +============================================================================== + +It is unclear what happens if an ignored section contains the token ]]> at +places where it is normally allowed, i.e. within string literals and comments, +e.g. + + --> ]]> + +On the one hand, the production rule of the XML grammar does not treat such +tokens specially. Following the grammar, already the first ]]> ends the +conditional section + + + +and the other tokens are included into the DTD. + +On the other hand, we can read: "Like the internal and external DTD subsets, a +conditional section may contain one or more complete declarations, comments, +processing instructions, or nested conditional sections, intermingled with +white space" (XML 1.0 spec, section 3.4). Complete declarations and comments +may contain ]]>, so this is contradictory to the grammar. + +The intention of conditional sections is to include or exclude the section +depending on the current replacement text of a parameter entity. Almost always +such sections are used as in + + (or "IGNORE") + + +This means that if it is possible to include a section it must also be legal to +ignore the same section. This is a strong indication that the token ]]> must +not count as section terminator if it occurs in a string literal or comment. + +This parser implements the latter. + +============================================================================== +Conditional sections and the inclusion of parameter entities +============================================================================== + +It is unclear what happens if an ignored section contains a reference to a +parameter entity. In most cases, this is not problematic because nesting of +parameter entities must respect declaration braces. The replacement text of +parameter entities must either contain a whole number of declarations or only +inner material of one declaration. Almost always it does not matter whether +these references are resolved or not (the section is ignored). + +But there is one case which is not explicitly specified: Is it allowed that the +replacement text of an entity contains the end marker ]]> of an ignored +conditional section? Example: + +"> + must be contained in +the same entity as the corresponding of +declarations). So it is possible to conclude that ]]> may be in another entity. + +Of course, there are many arguments not to allow such constructs: The resulting +code is incomprehensive, and parsing takes longer (especially if the entities +are external). I think the best argument against this kind of XML is that the +XML spec is not detailed enough, as it contains no rules where entity +references should be recognized and where not. For example: + +"> +"> + + +Which token ]]> counts? From a logical point of view, the ]]> in the third line +ends the conditional section. As already pointed out, the XML spec permits the +interpretation that ]]> is recognized even in string literals, and this may be +also true if it is "imported" from a separate entity; and so the first ]]> +denotes the end of the section. + +As a practical solution, this parser does not expand parameter entities in +ignored sections. Furthermore, it is also not allowed that the ending ]]> of +ignored or included sections is contained in a different entity than the +starting +%ext; +%ent; + +"ext" contains: + +"> + + + +Here, the reference %ent; would be illegal if the standalone declaration is +strictly interpreted. This parser handles the references %ent; and %ext; +equivalently which means that %ent; is allowed, but the element type "el" is +treated as externally declared. + +General entities can occur within the DTD, but they can only be contained in +the default value of attributes, or in the definition of other general +entities. The latter can be ignored, because the check will be repeated when +the entities are expanded. Though, general entities occuring in default +attribute values are actually checked at the moment when the default is used in +an element instance. + +General entities occuring in the document body are always checked. + +NDATA entities can occur in ENTITY attribute values; either in the element +instance or in the default declaration. Both cases are checked. +