If an element declaration does not allow the element to contain character data, the following rules apply.
If the element must be empty, i.e. it is declared with the keyword EMPTY, the element instance must be effectively empty (it must not even contain whitespace characters). The parser guarantees that a declared EMPTY element never contains a data node, not even one that represents the empty string.
If the element declaration permits only other elements, but not character data, to occur within the element, whitespace characters may still be inserted between the subelements. The parser ignores these characters as well and does not create data nodes for them.
Example. Consider the following element types:

    <!ELEMENT x ( #PCDATA | z )* >
    <!ELEMENT y ( z )* >
    <!ELEMENT z EMPTY>

Only x may contain character data, as the keyword #PCDATA indicates. The other types are character-free.
The XML term

    <x><z/> <z/></x>

will be internally represented by an element node for x with three subnodes: the first z element, a data node containing the space character, and the second z element. In contrast, the term

    <y><z/> <z/></y>

is represented by an element node for y with only two subnodes, the two z elements. There is no data node for the space character because spaces are ignored in the character-free element y.
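The difference between x and y can be modeled with a toy tree. This is a minimal sketch, not PXP's document API: the node type and the helper make_element are hypothetical, and the flag mixed stands for "the declaration contains #PCDATA". Rejecting non-whitespace data in a character-free element is a choice of this sketch, not something stated above.

```ocaml
(* Hypothetical, simplified node type for illustration; not PXP's API. *)
type node =
  | Element of string * node list
  | Data of string

(* XML whitespace: space, tab, CR and LF. *)
let is_xml_space c = c = ' ' || c = '\t' || c = '\r' || c = '\n'

let is_whitespace s =
  let rec loop i =
    i >= String.length s || (is_xml_space s.[i] && loop (i + 1))
  in
  loop 0

(* Build an element node.  [mixed] is true when the declaration admits
   #PCDATA.  In a character-free element, whitespace-only data between
   subelements is dropped; this sketch rejects any other character data. *)
let make_element name ~mixed children =
  let keep = function
    | Data s when not mixed ->
        if is_whitespace s then false
        else invalid_arg ("character data not allowed in " ^ name)
    | _ -> true
  in
  Element (name, List.filter keep children)

let z = Element ("z", [])

(* <x><z/> <z/></x>: x admits #PCDATA, so the space becomes a data node. *)
let x = make_element "x" ~mixed:true [ z; Data " "; z ]

(* <y><z/> <z/></y>: y is character-free, so the space is dropped. *)
let y = make_element "y" ~mixed:false [ z; Data " "; z ]
```

Here x ends up with three children and y with two, matching the description of the two XML terms.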
The XML specification allows all Unicode characters in XML texts. This parser can be configured to use UTF-8 as the internal character representation; however, the default internal encoding is ISO-8859-1. (Currently, no other encodings are possible for the internal string representation; the type Pxp_types.rep_encoding enumerates the possible encodings. In principle, the parser could use any ASCII-compatible encoding, but lexical analyzers currently exist only for UTF-8 and ISO-8859-1. It is impossible to use UTF-16, UCS-4, or other multibyte encodings that are not ASCII-compatible as internal encodings unless major parts of the parser are rewritten, which is unlikely.)
The internal encoding may be different from the external encoding (specified in the XML declaration <?xml ... encoding="..."?>); in this case the strings are automatically converted to the internal encoding.
If the internal encoding is ISO-8859-1, the document may contain characters that cannot be represented. In this case, the parser drops such characters and prints a warning to the collect_warning object that must be passed when the parser is called.
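The dropping behavior can be illustrated by converting UTF-8 input to ISO-8859-1. This is a minimal sketch, not PXP's converter: it assumes well-formed UTF-8 input, and the function utf8_to_latin1 and its warn callback are hypothetical stand-ins for the parser's conversion and its warning collector.

```ocaml
(* Decode UTF-8 and re-encode as ISO-8859-1.  Code points above U+00FF
   cannot be represented, so they are dropped and reported via [warn].
   Assumes well-formed UTF-8; malformed input may raise an exception. *)
let utf8_to_latin1 ~(warn : string -> unit) (s : string) : string =
  let n = String.length s in
  let buf = Buffer.create n in
  let i = ref 0 in
  while !i < n do
    let c = Char.code s.[!i] in
    (* Sequence length and the payload bits carried by the lead byte. *)
    let len, init =
      if c < 0x80 then (1, c)
      else if c < 0xE0 then (2, c land 0x1F)
      else if c < 0xF0 then (3, c land 0x0F)
      else (4, c land 0x07)
    in
    (* Each continuation byte contributes its low six bits. *)
    let cp = ref init in
    for k = 1 to len - 1 do
      cp := (!cp lsl 6) lor (Char.code s.[!i + k] land 0x3F)
    done;
    if !cp <= 0xFF then Buffer.add_char buf (Char.chr !cp)
    else warn (Printf.sprintf "Dropped character U+%04X" !cp);
    i := !i + len
  done;
  Buffer.contents buf
```

For example, "café €" survives as "café " in ISO-8859-1, with one warning for the euro sign (U+20AC). The same rule applies to attribute values.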
The XML specification allows lines to be separated by single LF characters, by CR LF character sequences, or by single CR characters. Internally, these separators are always converted to single LF characters.
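The conversion rule can be sketched as a single pass over the text. The function normalize_newlines is a hypothetical helper for illustration, not part of PXP.

```ocaml
(* Convert CR LF pairs and lone CR characters to a single LF. *)
let normalize_newlines (s : string) : string =
  let n = String.length s in
  let buf = Buffer.create n in
  let i = ref 0 in
  while !i < n do
    (match s.[!i] with
     | '\r' ->
         Buffer.add_char buf '\n';
         (* Skip the LF of a CR LF pair so it is not emitted twice. *)
         if !i + 1 < n && s.[!i + 1] = '\n' then incr i
     | c -> Buffer.add_char buf c);
    incr i
  done;
  Buffer.contents buf
```

All three separator forms in "a\r\nb\rc\nd" come out as "\n".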
The parser guarantees that there are never two adjacent data nodes; if necessary, data material that would otherwise be represented by several nodes is collapsed into one node. Note that you can still build node trees with adjacent data nodes yourself; however, the parser never returns such trees.
Note that CDATA sections are not represented specially; their contents are added to the data material that is currently being collected for the next data node.
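Both rules, collapsing adjacent character data and folding CDATA contents into it, can be sketched as follows. The event and node types and the function build_children are hypothetical simplifications for illustration, not PXP's lexer or document types.

```ocaml
(* Hypothetical lexer events; not PXP's types. *)
type event =
  | Text of string   (* ordinary character data *)
  | Cdata of string  (* contents of a <![CDATA[...]]> section *)
  | Elem of string   (* a child element, identified by name *)

type node = Data of string | Element of string

(* Accumulate consecutive Text and Cdata events in one buffer, so the
   result never contains two adjacent Data nodes. *)
let build_children (events : event list) : node list =
  let buf = Buffer.create 64 in
  (* Emit the pending data material, if any, as a single Data node. *)
  let flush acc =
    if Buffer.length buf = 0 then acc
    else begin
      let d = Data (Buffer.contents buf) in
      Buffer.clear buf;
      d :: acc
    end
  in
  let rec go acc = function
    | [] -> List.rev (flush acc)
    | (Text s | Cdata s) :: rest ->
        Buffer.add_string buf s;
        go acc rest
    | Elem name :: rest -> go (Element name :: flush acc) rest
  in
  go [] events
```

For instance, a<![CDATA[<b>]]>c<z/>d yields three children: the data node "a<b>c", the z element, and the data node "d".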
Entities are not represented within documents! If the parser finds an entity reference in the document content, the reference is immediately expanded, and the parser reads the expansion text instead of the reference.
Attribute values are composed of Unicode characters, too, so the same character-encoding problems arise as for character data. Attribute values are also converted to the internal encoding; characters that cannot be represented are dropped, and a warning is printed.
Attribute values are normalized before they are returned by methods like attribute. First, any remaining entity references are expanded; if necessary, expansion is performed recursively. Second, newline characters (any of LF, CR LF, or CR) are converted to single space characters. Note that the latter conversion in particular is prescribed by the XML standard (but the character reference &#10; is not converted, so it is still possible to include line feeds in attributes).
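The two normalization steps can be sketched in one recursive pass. This is a simplified illustration, not PXP's implementation: the entity table entities is a hypothetical association list, only decimal character references below 256 are handled, unknown or malformed references raise an exception, and recursive entity definitions are not detected. Unlike the full XML rule, it also leaves tabs unchanged, since the text above only mentions newlines.

```ocaml
(* Normalize a raw attribute value: expand entity references recursively,
   turn literal newlines into spaces, and insert character references
   literally so that &#10; survives as a line feed. *)
let rec normalize_attr (entities : (string * string) list) (s : string) : string =
  let n = String.length s in
  let buf = Buffer.create n in
  let rec go i =
    if i < n then
      match s.[i] with
      | '\r' ->
          (* CR LF and lone CR both become a single space. *)
          Buffer.add_char buf ' ';
          go (if i + 1 < n && s.[i + 1] = '\n' then i + 2 else i + 1)
      | '\n' ->
          Buffer.add_char buf ' ';
          go (i + 1)
      | '&' ->
          let semi = String.index_from s i ';' in
          let name = String.sub s (i + 1) (semi - i - 1) in
          if String.length name > 1 && name.[0] = '#' then
            (* Decimal character reference: inserted without normalization. *)
            Buffer.add_char buf
              (Char.chr (int_of_string (String.sub name 1 (String.length name - 1))))
          else
            (* Entity reference: normalize the replacement text recursively. *)
            Buffer.add_string buf (normalize_attr entities (List.assoc name entities));
          go (semi + 1)
      | c ->
          Buffer.add_char buf c;
          go (i + 1)
  in
  go 0;
  Buffer.contents buf
```

With an entity nl defined as "a", CR LF, "b", the value [&nl;] normalizes to "[a b]", while 1&#10;2 keeps its line feed.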
Processing instructions are parsed to some extent: the first word of the PI is called the target, and it is stored separately from the rest of the PI:

    <?target rest?>

By default, the exact location where a PI occurs is not represented. The parser puts the PI into the object that represents the enclosing construct (an element, the DTD, or the whole document). This means you can find out which PIs occur in a certain element, in the DTD, or in the whole document, but you cannot look up their exact positions within the construct.
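Separating the target from the rest amounts to splitting the PI body at its first run of whitespace. The helper split_pi below is hypothetical and operates on the text between "<?" and "?>"; it is not a PXP function.

```ocaml
(* Split a PI body into (target, rest), skipping the whitespace that
   separates them. *)
let split_pi (body : string) : string * string =
  let n = String.length body in
  let is_space c = c = ' ' || c = '\t' || c = '\r' || c = '\n' in
  let rec word_end i =
    if i < n && not (is_space body.[i]) then word_end (i + 1) else i
  in
  let rec skip_space i = if i < n && is_space body.[i] then skip_space (i + 1) else i in
  let e = word_end 0 in
  let r = skip_space e in
  (String.sub body 0 e, String.sub body r (n - r))
```

For <?xml-stylesheet href="a.css"?>, the target is xml-stylesheet and the rest is href="a.css".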
If you require the exact location of PIs, it is possible to create extra nodes for them. This mode is controlled by the option enable_pinstr_nodes. The additional nodes have the node type T_pinstr target and are created from special exemplars contained in the spec (see pxp_document.mli).
By default, comments are not represented; the parser drops them. However, if you require them, it is possible to create T_comment nodes for them. This mode is enabled by the option enable_comment_nodes. Comment nodes are created from special exemplars contained in the spec (see pxp_document.mli). You can access the contents of comments through the method comment.
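The effect of the two options on an element's children can be modeled in a few lines. This is a toy sketch: only the option names enable_pinstr_nodes and enable_comment_nodes come from the text above; the config record, the item and node types, and build are hypothetical. As a simplifying assumption suggested by the word "extra", the sketch always records each PI with the enclosing element and additionally creates a positioned node when the option is set.

```ocaml
(* Hypothetical stand-ins; only the option names come from the text. *)
type config = { enable_pinstr_nodes : bool; enable_comment_nodes : bool }

(* What the lexer sees inside one element. *)
type item =
  | I_elem of string
  | I_pinstr of string * string  (* target, rest *)
  | I_comment of string

type node =
  | Element of string
  | Pinstr_node of string * string
  | Comment_node of string

(* Returns the element's child nodes and the PIs attached to it. *)
let build (cfg : config) (items : item list) : node list * (string * string) list =
  let step (children, pis) = function
    | I_elem n -> (Element n :: children, pis)
    | I_pinstr (t, r) ->
        (* The PI is always known to the enclosing element; a node that
           records its position exists only when the option is set. *)
        let children =
          if cfg.enable_pinstr_nodes then Pinstr_node (t, r) :: children else children
        in
        (children, (t, r) :: pis)
    | I_comment c when cfg.enable_comment_nodes -> (Comment_node c :: children, pis)
    | I_comment _ -> (children, pis)  (* dropped by default *)
  in
  let children, pis = List.fold_left step ([], []) items in
  (List.rev children, List.rev pis)
```

With both options off, only the element children remain and the PI is known only to the enclosing element; with both on, comment and PI nodes appear at their original positions.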
These attributes are not supported specially; they are handled like any other attribute.
Currently, there is no special support for namespaces. However, the parser allows colons in names, so it is possible to implement namespaces on top of the current API.

Some future release of PXP will support namespaces as a built-in feature.
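Because colons are kept in names, a namespace layer can start by splitting each qualified name into prefix and local part. The helper split_qname is a hypothetical sketch, not part of PXP; mapping the prefix to a namespace URI via xmlns attributes is left out.

```ocaml
(* Split "prefix:local" at the first colon; names without a colon have
   no prefix.  Requires OCaml >= 4.05 for String.index_opt. *)
let split_qname (name : string) : string option * string =
  match String.index_opt name ':' with
  | Some i ->
      (Some (String.sub name 0 i), String.sub name (i + 1) (String.length name - i - 1))
  | None -> (None, name)
```

For example, xsl:template splits into the prefix xsl and the local name template.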