This document

Some points in the XML specification are ambiguous.
The following notes discuss these points and describe how this parser
behaves.

Conditional sections and the token ]]>

It is unclear what happens if an ignored section contains the
token ]]> in places where it is normally allowed, i.e. within string
literals and comments, for example:

  <![IGNORE[ <!-- ]]> --> ]]>

On the one hand, the production rule of the XML grammar does not treat such
tokens specially. Following the grammar, the first ]]> already ends
the conditional section

  <![IGNORE[ <!-- ]]>

and the remaining tokens are included in the DTD.


On the other hand, the spec states: "Like the internal and external DTD subsets,
a conditional section may contain one or more complete declarations, comments,
processing instructions, or nested conditional sections, intermingled with
white space" (XML 1.0 spec, section 3.4). Complete declarations and comments
may contain ]]>, so this statement contradicts the grammar.


The purpose of conditional sections is to include or exclude the section
depending on the current replacement text of a parameter entity. Such
sections are almost always used as in

  <!ENTITY % want.a.feature.or.not "INCLUDE"> (or "IGNORE")
  <![ %want.a.feature.or.not; [ ... ]]>

This means that if it is possible to include a section, it must also be
legal to ignore the same section. This is a strong indication that
the token ]]> must not count as a section terminator if it occurs
in a string literal or comment.


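This parser implements the latter interpretation: a ]]> inside a string
literal or comment does not terminate the conditional section.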
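
The difference between the two readings can be made concrete with a small scanner. The following Python sketch is an illustration only and is not taken from this parser's source; the function name find_ignore_end is hypothetical, and the sketch assumes that only comments and quoted string literals need to be skipped (processing instructions are not handled).

```python
def find_ignore_end(text: str, start: int) -> int:
    """Return the index just past the ']]>' that closes an ignored section.

    `start` is the index right after the opening '<![IGNORE['.
    Hypothetical helper for illustration; not part of any real parser API.
    """
    depth = 1  # the section that is already open
    i = start
    while i < len(text):
        if text.startswith("<!--", i):
            # Skip the whole comment; a ']]>' inside it does not count.
            end = text.find("-->", i + 4)
            if end < 0:
                raise ValueError("unterminated comment")
            i = end + 3
        elif text[i] in "\"'":
            # Skip the whole string literal, closed by the same quote character.
            end = text.find(text[i], i + 1)
            if end < 0:
                raise ValueError("unterminated string literal")
            i = end + 1
        elif text.startswith("<![", i):
            depth += 1  # nested conditional section
            i += 3
        elif text.startswith("]]>", i):
            depth -= 1
            i += 3
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated conditional section")


sample = "<![IGNORE[ <!-- ]]> --> ]]><!ELEMENT a EMPTY>"
body_start = len("<![IGNORE[")
# Grammar-only reading: the first ']]>' ends the section.
grammar_end = sample.index("]]>", body_start) + 3
# Reading that respects comments and string literals.
aware_end = find_ignore_end(sample, body_start)
print(repr(sample[grammar_end:]))
print(repr(sample[aware_end:]))
```

Under the grammar-only reading, the tail of the comment leaks into the DTD; under the second reading, only the declaration after the section remains.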

Conditional sections and the inclusion of parameter entities

It is unclear what happens if an ignored section contains a reference
to a parameter entity. In most cases, this is not problematic because
nesting of parameter entities must respect declaration braces: the
replacement text of a parameter entity must contain either a whole
number of declarations or only inner material of one declaration. Because
the section is ignored, it almost never matters whether these references
are resolved or not.


But there is one case which is not explicitly specified: May the
replacement text of an entity contain the end marker ]]>
of an ignored conditional section? Example:

  <!ENTITY % end "]]>">
  <![ IGNORE [ %end;

The XML spec contains no statement that the ]]> must be contained
in the same entity as the corresponding <![ (as there is for the tokens <! and
> of declarations). So it is possible to conclude that ]]> may be in
another entity.


Of course, there are many arguments against allowing such constructs: The
resulting code is incomprehensible, and parsing takes longer (especially if the
entities are external). I think the best argument against this kind of XML
is that the XML spec is not detailed enough, as it contains no rules about where
entity references should be recognized and where not. For example:

  <!ENTITY % y "]]>">
  <!ENTITY % x "<!ENTITY z '<![CDATA[some text%y;'>">
  <![ IGNORE [ %x; ]]>

Which token ]]> counts? From a logical point of view, the ]]> in the
third line ends the conditional section. As already pointed out, however, the
XML spec permits the interpretation that ]]> is recognized even in string
literals, and this may also be true if it is "imported" from a separate entity;
in that case the first ]]> denotes the end of the section.


As a practical solution, this parser does not expand parameter entities
in ignored sections. Furthermore, the closing ]]> of an ignored or included
section must not be contained in a different entity than the
opening <![ token.

Standalone documents and attribute normalization

If a document is declared as stand-alone, a restriction on attribute
normalization applies to attributes declared in external
entities. Normally, the parser knows the type of the attribute from
the ATTLIST declaration, and it can normalize attribute values depending
on their types. For example, an NMTOKEN attribute can be written with
leading or trailing spaces, but the parser always returns the nmtoken
without such added spaces; in contrast, a CDATA attribute is
not normalized in this way. For a stand-alone document, the type information is
not available if the ATTLIST declaration is located in an external
entity. Because of this, the XML spec requires that attribute values
be written in their normal form in this case, i.e. without additional
spaces.


This parser interprets this restriction as follows. Obviously,
the substitution of character and entity references is not considered
a "change of the value" as a result of the normalization, because
these operations are performed identically if the ATTLIST declaration
is not available. The same applies to the substitution of TABs, CRs,
and LFs by space characters. Only the removal of spaces depending on
the type of the attribute changes the value if the ATTLIST is not
available.


In detail, this means: CDATA attributes never violate the
stand-alone status. ID, IDREF, NMTOKEN, ENTITY, NOTATION and enumerated
attributes must not be written with leading and/or trailing spaces. IDREFS,
ENTITIES, and NMTOKENS attributes must not be written with extra spaces at the
beginning or at the end of the value, or between the tokens of the list.
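
The rule can be written down as a small predicate. The following Python sketch is an illustration under stated assumptions, not this parser's code: the function name violates_standalone and the type label "ENUMERATION" are hypothetical, the list types are taken to be IDREFS, ENTITIES and NMTOKENS, and the value is assumed to have already gone through reference substitution and the replacement of TABs, CRs and LFs by spaces. The caller is assumed to apply the check only to attributes declared in an external entity of a stand-alone document.

```python
# Single-token types: only leading/trailing spaces can be removed.
TOKEN_TYPES = {"ID", "IDREF", "NMTOKEN", "ENTITY", "NOTATION", "ENUMERATION"}
# List types: spaces between tokens are collapsed as well.
LIST_TYPES = {"IDREFS", "ENTITIES", "NMTOKENS"}


def violates_standalone(att_type: str, value: str) -> bool:
    """Return True if type-dependent normalization would change `value`.

    Hypothetical helper for illustration; `value` must already contain
    only spaces as whitespace.
    """
    if att_type == "CDATA":
        return False  # CDATA values are never trimmed
    if att_type in TOKEN_TYPES:
        return value != value.strip(" ")
    if att_type in LIST_TYPES:
        normalized = " ".join(tok for tok in value.split(" ") if tok)
        return value != normalized
    raise ValueError("unknown attribute type: " + att_type)


for att_type, value in [("CDATA", " a  b "), ("NMTOKEN", " a"), ("NMTOKENS", "a  b")]:
    print(att_type, repr(value), violates_standalone(att_type, value))
```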


The whole check is dubious, because the attribute type also expresses a
semantic constraint, not only a syntactic one. At least this parser
distinguishes strictly between single-value and list types, and returns the
attribute values differently; the former are represented as Value s (where s is
a string), the latter are represented as Valuelist [s1; s2; ...; sN]. The
internal representation of the value depends on the attribute type, too,
so that even normalized values are processed differently depending on
whether the attribute has a list type or not. For this parser, it still makes a
difference whether a value is normalized and processed as if it were CDATA, or
whether the value is processed according to its declared type.
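
To see why even an already normalized value is treated differently depending on its type, consider the following Python sketch. It only mirrors the Value and Valuelist shapes named above using dataclasses; the function name represent and the field names are hypothetical, and the code is not this parser's implementation.

```python
from dataclasses import dataclass
from typing import List, Union

LIST_TYPES = {"IDREFS", "ENTITIES", "NMTOKENS"}


@dataclass(frozen=True)
class Value:
    s: str  # single-value attribute


@dataclass(frozen=True)
class Valuelist:
    items: List[str]  # list-valued attribute, one entry per token


def represent(att_type: str, normalized: str) -> Union[Value, Valuelist]:
    """Wrap an already normalized value according to its declared type.

    Hypothetical helper for illustration only.
    """
    if att_type in LIST_TYPES:
        return Valuelist(normalized.split(" ") if normalized else [])
    return Value(normalized)


# The same normalized string yields different results for different types.
print(represent("CDATA", "a b"))
print(represent("NMTOKENS", "a b"))
```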


The stand-alone check is included in order to be able to state
whether other parsers, which only check well-formedness, can process the
document. Of course, these parsers always process attributes as CDATA, and
the stand-alone check guarantees that they will always see the normalized values.

Standalone documents and the restrictions on entity references

Stand-alone documents must not refer to entities which are declared in an
external entity. This parser applies this rule only in two cases: to general
and NDATA entities when they occur in the document body (i.e. not in the DTD);
and to general and NDATA entities occurring in default attribute values declared
in the internal subset of the DTD.


Parameter entities are not considered for the stand-alone property. If there
is a parameter entity reference in the internal subset which was declared in an
external entity, it is not available in the same way as the external entity
that contains its declaration is not available. Because of this "equivalence",
parameter entity references are not checked for violations of the
stand-alone declaration. It simply does not matter. Illustration:


Main document:

  %ext;
  %ent;
  ]]>

"ext" contains:

  ">
  ]]>


Here, the reference %ent; would be illegal if the standalone
declaration were interpreted strictly. This parser handles the references
%ent; and %ext; equivalently, which means that %ent; is allowed, but the
element type "el" is treated as externally declared.


General entities can occur within the DTD, but they can only be contained in
the default value of attributes, or in the definition of other general
entities. The latter can be ignored, because the check will be repeated when
the entities are expanded. General entities occurring in default
attribute values, however, are checked at the moment when the default is
used in an element instance.


General entities occurring in the document body are always checked.


NDATA entities can occur in ENTITY attribute values, either in the element
instance or in the default declaration. Both cases are checked.
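
The rules of this section can be condensed into one predicate. The following Python sketch is an illustration only; the names EntityDecl and reference_violates_standalone are hypothetical and do not come from this parser. It assumes the caller invokes the check at the points described above: for references in the document body, when a default attribute value is actually used in an element instance, and for ENTITY attribute values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityDecl:
    name: str
    kind: str  # "general", "ndata" or "parameter"
    declared_externally: bool  # True if declared in an external entity


def reference_violates_standalone(decl: EntityDecl, standalone: bool) -> bool:
    """Return True if referring to `decl` breaks the stand-alone declaration.

    Hypothetical helper for illustration only.
    """
    if not standalone:
        return False  # nothing to check without standalone="yes"
    if decl.kind == "parameter":
        return False  # parameter entities are never checked
    return decl.declared_externally


ext_general = EntityDecl("g", "general", declared_externally=True)
ext_param = EntityDecl("ent", "parameter", declared_externally=True)
print(reference_violates_standalone(ext_general, standalone=True))
print(reference_violates_standalone(ext_param, standalone=True))
```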
