X-Git-Url: http://matita.cs.unibo.it/gitweb/?a=blobdiff_plain;f=helm%2FDEVEL%2Fpxp%2Fpxp%2Fdoc%2FSPEC.xml;fp=helm%2FDEVEL%2Fpxp%2Fpxp%2Fdoc%2FSPEC.xml;h=0000000000000000000000000000000000000000;hb=c7514aaa249a96c5fdd39b1123fbdb38d92f20b6;hp=906f45a798f446ec23564b0313a8de75fc1be788;hpb=1c7fb836e2af4f2f3d18afd0396701f2094265ff;p=helm.git diff --git a/helm/DEVEL/pxp/pxp/doc/SPEC.xml b/helm/DEVEL/pxp/pxp/doc/SPEC.xml deleted file mode 100644 index 906f45a79..000000000 --- a/helm/DEVEL/pxp/pxp/doc/SPEC.xml +++ /dev/null @@ -1,226 +0,0 @@ - - -%common; - - -up'> - - -%config; - -]> - - - - - This document -

There are some points in the XML specification which are ambiguous. -The following notes discuss these points, and describe how this parser -behaves.

-
- - - Conditional sections and the token ]]> - -

It is unclear what happens if an ignored section contains the -token ]]> at places where it is normally allowed, i.e. within string -literals and comments, e.g. - - -<![IGNORE[ <!-- ]]> --> ]]> - - -On the one hand, the production rule of the XML grammar does not treat such -tokens specially. Following the grammar, already the first ]]> ends -the conditional section - - -<![IGNORE[ <!-- ]]> - - -and the other tokens are included into the DTD.

- -

On the other hand, we can read: "Like the internal and external DTD subsets, -a conditional section may contain one or more complete declarations, comments, -processing instructions, or nested conditional sections, intermingled with -white space" (XML 1.0 spec, section 3.4). Complete declarations and comments -may contain ]]>, so this is contradictory to the grammar.

- -

The intention of conditional sections is to include or exclude the section -depending on the current replacement text of a parameter entity. Almost -always such sections are used as in - - -<!ENTITY % want.a.feature.or.not "INCLUDE"> (or "IGNORE") -<![ %want.a.feature.or.not; [ ... ]]> - - -This means that if it is possible to include a section it must also be -legal to ignore the same section. This is a strong indication that -the token ]]> must not count as section terminator if it occurs -in a string literal or comment.

- -

This parser implements the latter.

- -
- - - Conditional sections and the inclusion of parameter entities - -

It is unclear what happens if an ignored section contains a reference -to a parameter entity. In most cases, this is not problematic because -nesting of parameter entities must respect declaration braces. The -replacement text of parameter entities must either contain a whole -number of declarations or only inner material of one declaration. Almost always -it does not matter whether these references are resolved or not -(the section is ignored).

- -

But there is one case which is not explicitly specified: Is it allowed -that the replacement text of an entity contains the end marker ]]> -of an ignored conditional section? Example: - - -<!ENTITY % end "]]>"> -<![ IGNORE [ %end; - - -We do not find the statement in the XML spec that the ]]> must be contained -in the same entity as the corresponding <![ (as for the tokens <! and -> of declarations). So it is possible to conclude that ]]> may be in -another entity.

- -

Of course, there are many arguments not to allow such constructs: The -resulting code is incomprehensive, and parsing takes longer (especially if the -entities are external). I think the best argument against this kind of XML -is that the XML spec is not detailed enough, as it contains no rules where -entity references should be recognized and where not. For example: - - -<!ENTITY % y "]]>"> -<!ENTITY % x "<!ENTITY z '<![CDATA[some text%y;'>"> -<![ IGNORE [ %x; ]]> - - -Which token ]]> counts? From a logical point of view, the ]]> in the -third line ends the conditional section. As already pointed out, the XML spec -permits the interpretation that ]]> is recognized even in string literals, -and this may be also true if it is "imported" from a separate entity; and so -the first ]]> denotes the end of the section.

- -

As a practical solution, this parser does not expand parameter entities -in ignored sections. Furthermore, it is also not allowed that the ending ]]> -of ignored or included sections is contained in a different entity than the -starting <![ token.

-
- - - - Standalone documents and attribute normalization - -

-If a document is declared as stand-alone, a restriction on the effect of -attribute normalization takes effect for attributes declared in external -entities. Normally, the parser knows the type of the attribute from -the ATTLIST declaration, and it can normalize attribute values depending -on their types. For example, an NMTOKEN attribute can be written with -leading or trailing spaces, but the parser returns always the nmtoken -without such added spaces; in contrast to this, a CDATA attribute is -not normalized in this way. For stand-alone document the type information is -not available if the ATTLIST declaration is located in an external -entity. Because of this, the XML spec demands that attribute values must -be written in their normal form in this case, i.e. without additional -spaces. -

-

This parser interprets this restriction as follows. Obviously, -the substitution of character and entity references is not considered -as a "change of the value" as a result of the normalization, because -these operations will be performed identically if the ATTLIST declaration -is not available. The same applies to the substitution of TABs, CRs, -and LFs by space characters. Only the removal of spaces depending on -the type of the attribute changes the value if the ATTLIST is not -available. -

-

This means in detail: CDATA attributes never violate the -stand-alone status. ID, IDREF, NMTOKEN, ENTITY, NOTATION and enumerator -attributes must not be written with leading and/or trailing spaces. IDREF, -ENTITIES, and NMTOKENS attributes must not be written with extra spaces at the -beginning or at the end of the value, or between the tokens of the list. -

-

The whole check is dubious, because the attribute type expresses also a -semantical constraint, not only a syntactical one. At least this parser -distinguishes strictly between single-value and list types, and returns the -attribute values differently; the first are represented as Value s (where s is -a string), the latter are represented as Valuelist [s1; s2; ...; sN]. The -internal representation of the value is dependent on the attribute type, too, -such that even normalized values are processed differently depending on -whether the attribute has list type or not. For this parser, it makes still a -difference whether a value is normalized and processed as if it were CDATA, or -whether the value is processed according to its declared type. -

-

The stand-alone check is included to be able to make a statement -whether other, well-formedness parsers can process the document. Of course, -these parsers always process attributes as CDATA, and the stand-alone check -guarantees that these parsers will always see the normalized values. -

-
- - - Standalone documents and the restrictions on entity -references -

-Stand-alone documents must not refer to entities which are declared in an -external entity. This parser applies this rule only: to general and NDATA -entities when they occur in the document body (i.e. not in the DTD); and to -general and NDATA entities occuring in default attribute values declared in the -internal subset of the DTD. -

-

-Parameter entities are out of discussion for the stand-alone property. If there -is a parameter entity reference in the internal subset which was declared in an -external entity, it is not available in the same way as the external entity is -not available that contains its declaration. Because of this "equivalence", -parameter entity references are not checked on violations against the -stand-alone declaration. It simply does not matter. - Illustration: -

- -

-Main document: - - -%ext; -%ent; -]]> - -"ext" contains: - - "> -]]> -

- -

Here, the reference %ent; would be illegal if the standalone -declaration is strictly interpreted. This parser handles the references -%ent; and %ext; equivalently which means that %ent; is allowed, but the -element type "el" is treated as externally declared. -

- -

-General entities can occur within the DTD, but they can only be contained in -the default value of attributes, or in the definition of other general -entities. The latter can be ignored, because the check will be repeated when -the entities are expanded. Though, general entities occuring in default -attribute values are actually checked at the moment when the default is -used in an element instance. -

-

-General entities occuring in the document body are always checked.

-

-NDATA entities can occur in ENTITY attribute values; either in the element -instance or in the default declaration. Both cases are checked. -

-
- -