--- /dev/null
+******************************************************************************
+Notes on the XML specification
+******************************************************************************
+
+
+==============================================================================
+This document
+==============================================================================
+
+There are some points in the XML specification which are ambiguous. The
+following notes discuss these points, and describe how this parser behaves.
+
+==============================================================================
+Conditional sections and the token ]]>
+==============================================================================
+
+It is unclear what happens if an ignored section contains the token ]]> at
+places where it is normally allowed, i.e. within string literals and comments,
+e.g.
+
+<![IGNORE[ <!-- ]]> --> ]]>
+
+On the one hand, the production rule of the XML grammar does not treat such
+tokens specially. Following the grammar, already the first ]]> ends the
+conditional section
+
+<![IGNORE[ <!-- ]]>
+
+and the other tokens are included into the DTD.
+
+On the other hand, we can read: "Like the internal and external DTD subsets, a
+conditional section may contain one or more complete declarations, comments,
+processing instructions, or nested conditional sections, intermingled with
+white space" (XML 1.0 spec, section 3.4). Complete declarations and comments
+may contain ]]>, so this is contradictory to the grammar.
+
+The intention of conditional sections is to include or exclude the section
+depending on the current replacement text of a parameter entity. Almost always
+such sections are used as in
+
+<!ENTITY % want.a.feature.or.not "INCLUDE"> (or "IGNORE")
+<![ %want.a.feature.or.not; [ ... ]]>
+
+This means that if it is possible to include a section it must also be legal to
+ignore the same section. This is a strong indication that the token ]]> must
+not count as section terminator if it occurs in a string literal or comment.
+
+This parser implements the latter.
+
+==============================================================================
+Conditional sections and the inclusion of parameter entities
+==============================================================================
+
+It is unclear what happens if an ignored section contains a reference to a
+parameter entity. In most cases, this is not problematic because nesting of
+parameter entities must respect declaration braces. The replacement text of
+parameter entities must either contain a whole number of declarations or only
+inner material of one declaration. Almost always it does not matter whether
+these references are resolved or not (the section is ignored).
+
+But there is one case which is not explicitly specified: Is it allowed that the
+replacement text of an entity contains the end marker ]]> of an ignored
+conditional section? Example:
+
+<!ENTITY % end "]]>">
+<![ IGNORE [ %end;
+
+We do not find the statement in the XML spec that the ]]> must be contained in
+the same entity as the corresponding <![ (as for the tokens <! and > of
+declarations). So it is possible to conclude that ]]> may be in another entity.
+
+Of course, there are many arguments not to allow such constructs: The resulting
+code is incomprehensive, and parsing takes longer (especially if the entities
+are external). I think the best argument against this kind of XML is that the
+XML spec is not detailed enough, as it contains no rules where entity
+references should be recognized and where not. For example:
+
+<!ENTITY % y "]]>">
+<!ENTITY % x "<!ENTITY z '<![CDATA[some text%y;'>">
+<![ IGNORE [ %x; ]]>
+
+Which token ]]> counts? From a logical point of view, the ]]> in the third line
+ends the conditional section. As already pointed out, the XML spec permits the
+interpretation that ]]> is recognized even in string literals, and this may be
+also true if it is "imported" from a separate entity; and so the first ]]>
+denotes the end of the section.
+
+As a practical solution, this parser does not expand parameter entities in
+ignored sections. Furthermore, it is also not allowed that the ending ]]> of
+ignored or included sections is contained in a different entity than the
+starting <![ token.
+
+==============================================================================
+Standalone documents and attribute normalization
+==============================================================================
+
+If a document is declared as stand-alone, a restriction on the effect of
+attribute normalization takes effect for attributes declared in external
+entities. Normally, the parser knows the type of the attribute from the ATTLIST
+declaration, and it can normalize attribute values depending on their types.
+For example, an NMTOKEN attribute can be written with leading or trailing
+spaces, but the parser returns always the nmtoken without such added spaces; in
+contrast to this, a CDATA attribute is not normalized in this way. For
+stand-alone document the type information is not available if the ATTLIST
+declaration is located in an external entity. Because of this, the XML spec
+demands that attribute values must be written in their normal form in this
+case, i.e. without additional spaces.
+
+This parser interprets this restriction as follows. Obviously, the substitution
+of character and entity references is not considered as a "change of the value"
+as a result of the normalization, because these operations will be performed
+identically if the ATTLIST declaration is not available. The same applies to
+the substitution of TABs, CRs, and LFs by space characters. Only the removal of
+spaces depending on the type of the attribute changes the value if the ATTLIST
+is not available.
+
+This means in detail: CDATA attributes never violate the stand-alone status.
+ID, IDREF, NMTOKEN, ENTITY, NOTATION and enumerator attributes must not be
+written with leading and/or trailing spaces. IDREF, ENTITIES, and NMTOKENS
+attributes must not be written with extra spaces at the beginning or at the end
+of the value, or between the tokens of the list.
+
+The whole check is dubious, because the attribute type expresses also a
+semantical constraint, not only a syntactical one. At least this parser
+distinguishes strictly between single-value and list types, and returns the
+attribute values differently; the first are represented as Value s (where s is
+a string), the latter are represented as Valuelist [s1; s2; ...; sN]. The
+internal representation of the value is dependent on the attribute type, too,
+such that even normalized values are processed differently depending on whether
+the attribute has list type or not. For this parser, it makes still a
+difference whether a value is normalized and processed as if it were CDATA, or
+whether the value is processed according to its declared type.
+
+The stand-alone check is included to be able to make a statement whether other,
+well-formedness parsers can process the document. Of course, these parsers
+always process attributes as CDATA, and the stand-alone check guarantees that
+these parsers will always see the normalized values.
+
+==============================================================================
+Standalone documents and the restrictions on entity
+references
+==============================================================================
+
+Stand-alone documents must not refer to entities which are declared in an
+external entity. This parser applies this rule only: to general and NDATA
+entities when they occur in the document body (i.e. not in the DTD); and to
+general and NDATA entities occuring in default attribute values declared in the
+internal subset of the DTD.
+
+Parameter entities are out of discussion for the stand-alone property. If there
+is a parameter entity reference in the internal subset which was declared in an
+external entity, it is not available in the same way as the external entity is
+not available that contains its declaration. Because of this "equivalence",
+parameter entity references are not checked on violations against the
+stand-alone declaration. It simply does not matter. - Illustration:
+
+Main document:
+
+<!ENTITY % ext SYSTEM "ext">
+%ext;
+%ent;
+
+"ext" contains:
+
+<!ENTITY % ent "<!ELEMENT el (other*)>">
+
+
+
+Here, the reference %ent; would be illegal if the standalone declaration is
+strictly interpreted. This parser handles the references %ent; and %ext;
+equivalently which means that %ent; is allowed, but the element type "el" is
+treated as externally declared.
+
+General entities can occur within the DTD, but they can only be contained in
+the default value of attributes, or in the definition of other general
+entities. The latter can be ignored, because the check will be repeated when
+the entities are expanded. Though, general entities occuring in default
+attribute values are actually checked at the moment when the default is used in
+an element instance.
+
+General entities occuring in the document body are always checked.
+
+NDATA entities can occur in ENTITY attribute values; either in the element
+instance or in the default declaration. Both cases are checked.
+