+++ /dev/null
-******************************************************************************
-Notes on the XML specification
-******************************************************************************
-
-
-==============================================================================
-This document
-==============================================================================
-
-There are some points in the XML specification which are ambiguous. The
-following notes discuss these points, and describe how this parser behaves.
-
-==============================================================================
-Conditional sections and the token ]]>
-==============================================================================
-
-It is unclear what happens if an ignored section contains the token ]]> at
-places where it is normally allowed, i.e. within string literals and comments,
-e.g.
-
-<![IGNORE[ <!-- ]]> --> ]]>
-
-On the one hand, the production rule of the XML grammar does not treat such
-tokens specially. Following the grammar, the very first ]]> already ends the
-conditional section
-
-<![IGNORE[ <!-- ]]>
-
-and the remaining tokens are included in the DTD.
-
-On the other hand, we can read: "Like the internal and external DTD subsets, a
-conditional section may contain one or more complete declarations, comments,
-processing instructions, or nested conditional sections, intermingled with
-white space" (XML 1.0 spec, section 3.4). Complete declarations and comments
-may contain ]]>, so this statement contradicts the grammar.
-
-The intention of conditional sections is to include or exclude the section
-depending on the current replacement text of a parameter entity. Almost always
-such sections are used as in
-
-<!ENTITY % want.a.feature.or.not "INCLUDE"> (or "IGNORE")
-<![ %want.a.feature.or.not; [ ... ]]>
-
-This means that if it is possible to include a section, it must also be legal
-to ignore the same section. This is a strong indication that the token ]]>
-must not count as the section terminator if it occurs in a string literal or
-comment.
-
-This parser implements the latter interpretation: a ]]> inside a string
-literal or comment does not end the conditional section.
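-
-The same reasoning applies to string literals. As an illustrative sketch (the
-entity name "e" is made up for this example), consider
-
-<![IGNORE[ <!ENTITY e "]]>"> ]]>
-
-Under the interpretation described above, the ]]> inside the literal would be
-skipped, and only the final ]]> would end the ignored section. A parser that
-follows the grammar literally would instead stop at the first ]]>.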
-
-==============================================================================
-Conditional sections and the inclusion of parameter entities
-==============================================================================
-
-It is unclear what happens if an ignored section contains a reference to a
-parameter entity. In most cases, this is not problematic because nesting of
-parameter entities must respect declaration braces. The replacement text of
-parameter entities must either contain a whole number of declarations or only
-inner material of one declaration. Almost always it does not matter whether
-these references are resolved or not (the section is ignored).
-
-But there is one case which is not explicitly specified: Is it allowed that the
-replacement text of an entity contains the end marker ]]> of an ignored
-conditional section? Example:
-
-<!ENTITY % end "]]>">
-<![ IGNORE [ %end;
-
-The XML spec contains no statement that the ]]> must be contained in the same
-entity as the corresponding <![ (as it does for the tokens <! and > of
-declarations). So it is possible to conclude that ]]> may be in another entity.
-
-Of course, there are many arguments against allowing such constructs: the
-resulting code is incomprehensible, and parsing takes longer (especially if the
-entities are external). I think the best argument against this kind of XML is
-that the XML spec is not detailed enough, as it contains no rules about where
-entity references should be recognized and where not. For example:
-
-<!ENTITY % y "]]>">
-<!ENTITY % x "<!ENTITY z '<![CDATA[some text%y;'>">
-<![ IGNORE [ %x; ]]>
-
-Which token ]]> counts? From a logical point of view, the ]]> in the third line
-ends the conditional section. As already pointed out, however, the XML spec
-permits the interpretation that ]]> is recognized even in string literals, and
-this may also be true if it is "imported" from a separate entity; in that case
-the first ]]> denotes the end of the section.
-
-As a practical solution, this parser does not expand parameter entities in
-ignored sections. Furthermore, the ending ]]> of an ignored or included section
-must be contained in the same entity as the starting <![ token.
-
-==============================================================================
-Standalone documents and attribute normalization
-==============================================================================
-
-If a document is declared as stand-alone, a restriction on the effect of
-attribute normalization takes effect for attributes declared in external
-entities. Normally, the parser knows the type of the attribute from the ATTLIST
-declaration, and it can normalize attribute values depending on their types.
-For example, an NMTOKEN attribute can be written with leading or trailing
-spaces, but the parser always returns the nmtoken without such added spaces; in
-contrast, a CDATA attribute is not normalized in this way. For a stand-alone
-document, the type information is not available if the ATTLIST declaration is
-located in an external entity. Because of this, the XML spec demands that
-attribute values be written in their normal form in this case, i.e. without
-additional spaces.
-
-This parser interprets this restriction as follows. Obviously, the substitution
-of character and entity references is not considered a "change of the value"
-as a result of the normalization, because these operations will be performed
-identically if the ATTLIST declaration is not available. The same applies to
-the substitution of TABs, CRs, and LFs by space characters. Only the removal of
-spaces depending on the type of the attribute changes the value if the ATTLIST
-is not available.
-
-In detail, this means: CDATA attributes never violate the stand-alone status.
-ID, IDREF, NMTOKEN, ENTITY, NOTATION and enumerated attributes must not be
-written with leading and/or trailing spaces. IDREFS, ENTITIES, and NMTOKENS
-attributes must not be written with extra spaces at the beginning or at the end
-of the value, or between the tokens of the list.
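-
-As a sketch of this rule (the file name "doc.dtd" and the names "doc" and "tok"
-are invented for illustration), suppose the external subset "doc.dtd" contains:
-
-<!ELEMENT doc EMPTY>
-<!ATTLIST doc tok NMTOKEN #IMPLIED>
-
-Then the following stand-alone document would be expected to fail the check,
-because normalization according to the NMTOKEN type removes the surrounding
-spaces:
-
-<?xml version="1.0" standalone="yes"?>
-<!DOCTYPE doc SYSTEM "doc.dtd">
-<doc tok=" abc "/>
-
-Writing tok="abc" instead should pass, as should any value if tok were declared
-as CDATA.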
-
-The whole check is dubious, because the attribute type also expresses a
-semantic constraint, not only a syntactic one. At least this parser
-distinguishes strictly between single-value and list types, and returns the
-attribute values differently; the former are represented as Value s (where s is
-a string), the latter are represented as Valuelist [s1; s2; ...; sN]. The
-internal representation of the value depends on the attribute type, too,
-so that even normalized values are processed differently depending on whether
-the attribute has a list type or not. For this parser, it still makes a
-difference whether a value is normalized and processed as if it were CDATA, or
-whether the value is processed according to its declared type.
-
-The stand-alone check is included in order to be able to state whether other
-parsers, which check only well-formedness, can process the document. Of course,
-these parsers always process attributes as CDATA, and the stand-alone check
-guarantees that they will always see the normalized values.
-
-==============================================================================
-Standalone documents and the restrictions on entity references
-==============================================================================
-
-Stand-alone documents must not refer to entities which are declared in an
-external entity. This parser applies this rule only to general and NDATA
-entities when they occur in the document body (i.e. not in the DTD), and to
-general and NDATA entities occurring in default attribute values declared in
-the internal subset of the DTD.
-
-Parameter entities do not matter for the stand-alone property. If the internal
-subset contains a reference to a parameter entity that was declared in an
-external entity, that parameter entity is unavailable in exactly the same way
-as the external entity containing its declaration. Because of this
-"equivalence", parameter entity references are not checked for violations of
-the stand-alone declaration. Illustration:
-
-Main document:
-
-<!ENTITY % ext SYSTEM "ext">
-%ext;
-%ent;
-
-"ext" contains:
-
-<!ENTITY % ent "<!ELEMENT el (other*)>">
-
-
-
-Here, the reference %ent; would be illegal if the standalone declaration were
-strictly interpreted. This parser handles the references %ent; and %ext;
-equivalently, which means that %ent; is allowed, but the element type "el" is
-treated as externally declared.
-
-General entities can occur within the DTD, but only in the default value of
-attributes, or in the definition of other general entities. The latter case
-can be ignored, because the check will be repeated when the entities are
-expanded. General entities occurring in default attribute values, however, are
-checked at the moment when the default is used in an element instance.
-
-General entities occurring in the document body are always checked.
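-
-For illustration (the names "doc.dtd", "doc" and "greeting" are hypothetical),
-assume the external subset "doc.dtd" declares
-
-<!ENTITY greeting "hello">
-
-A stand-alone document that refers to this entity in its body, such as
-
-<?xml version="1.0" standalone="yes"?>
-<!DOCTYPE doc SYSTEM "doc.dtd">
-<doc>&greeting;</doc>
-
-would be expected to be rejected by this check. Moving the declaration of
-"greeting" into the internal subset should make the reference legal.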
-
-NDATA entities can occur in ENTITY attribute values; either in the element
-instance or in the default declaration. Both cases are checked.
-