X-Git-Url: http://matita.cs.unibo.it/gitweb/?a=blobdiff_plain;f=helm%2FDEVEL%2Fpxp%2Fpxp%2Fdoc%2Fmanual%2Fhtml%2Fx107.html;fp=helm%2FDEVEL%2Fpxp%2Fpxp%2Fdoc%2Fmanual%2Fhtml%2Fx107.html;h=0000000000000000000000000000000000000000;hb=c7514aaa249a96c5fdd39b1123fbdb38d92f20b6;hp=102aba218be483e4e202ba94d125fa3c705f478d;hpb=1c7fb836e2af4f2f3d18afd0396701f2094265ff;p=helm.git diff --git a/helm/DEVEL/pxp/pxp/doc/manual/html/x107.html b/helm/DEVEL/pxp/pxp/doc/manual/html/x107.html deleted file mode 100644 index 102aba218..000000000 --- a/helm/DEVEL/pxp/pxp/doc/manual/html/x107.html +++ /dev/null @@ -1,1694 +0,0 @@ -
This section explains many of the features of XML, but not all, and some -features not in detail. For a complete description, see the XML -specification.
The DTD contains various declarations; in general you can only use a feature if -you have previously declared it. The document instance file may contain the -full DTD, but it is also possible to split the DTD into an internal and an -external subset. A document must begin as follows if the full DTD is included: - -
<?xml version="1.0" encoding="Your encoding"?> -<!DOCTYPE root [ - Declarations -]>- -These declarations are called the internal subset. Note -that the usage of entities and conditional sections is restricted within the -internal subset.
If the declarations are located in a different file, you can refer to this file -as follows: - -
<?xml version="1.0" encoding="Your encoding"?> -<!DOCTYPE root SYSTEM "file name">- -The declarations in the file are called the external -subset. The file name is called the system -identifier. -It is also possible to refer to the file by a so-called -public identifier, but most XML applications won't use -this feature.
You can also specify both internal and external subsets. In this case, the -declarations of both subsets are combined, and if there are conflicts, a -declaration in the internal subset overrides the declaration of the same name in -the external subset. This looks as follows: - -
<?xml version="1.0" encoding="Your encoding"?> -<!DOCTYPE root SYSTEM "file name" [ - Declarations -]>
The XML declaration (the string beginning with <?xml and -ending at ?>) should specify the encoding of the -file. Common values are UTF-8 and the ISO-8859 series of character sets. Note -that every file parsed by the XML processor can begin with an XML declaration -and that every file may have its own encoding.
The name of the root element must be mentioned directly after the -DOCTYPE string. This means that a full document instance -looks like - -
<?xml version="1.0" encoding="Your encoding"?> -<!DOCTYPE root SYSTEM "file name" [ - Declarations -]> - -<root> - inner contents -</root>
Some characters are generally reserved to indicate markup such that they cannot -be used for character data. These characters are <, >, and -&. Furthermore, single and double quotes are sometimes reserved. If you -want to include such a character as character data, write it as follows: - -
&lt; instead of <
&gt; instead of >
&amp; instead of &
&apos; instead of '
&quot; instead of "
&#n;- -where n is the decimal number of the -character. Alternatively, you can specify the character by its hexadecimal -number: - -
&#xn;- -In the scope of declarations, the character % is no longer free. To include it -as character, you must use the notations &#37; or -&#x25;.
Note that besides &lt;, &gt;, &amp;, -&apos;, and &quot; there are no predefined character entities. This is -different from HTML which defines a list of characters that can be referenced -by name (e.g. &auml; for ä); however, if you prefer named characters, you -can declare such entities yourself (see below).
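The substitutions above can be tried out with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module (a non-validating parser chosen only for illustration; the element name root and the sample strings are assumptions, not part of the manual). It shows the five predefined entities, decimal and hexadecimal character references, and a self-declared named character.

```python
import xml.etree.ElementTree as ET

# Predefined entities and numeric character references in character data.
doc = "<root>&lt;tag&gt; &amp; &apos;quoted&apos; &quot;text&quot; &#228; &#xE4;</root>"
text = ET.fromstring(doc).text
print(text)

# A named character such as &auml; is not predefined in XML;
# it has to be declared as a general entity first.
doc_with_entity = """<!DOCTYPE root [
  <!ENTITY auml "&#228;">
]>
<root>&auml;</root>"""
declared = ET.fromstring(doc_with_entity).text
print(declared)
```

Without the ENTITY declaration, the reference &auml; would be reported as an undefined entity.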
Elements structure the document instance in a hierarchical way. There is a -top-level element, the root element, which contains a -sequence of inner elements and character sections. The inner elements are -structured in the same way. Every element has an element -type. The beginning of the element is indicated by a start -tag, written - -
<element-type>- -and the element continues until the corresponding end tag -is reached: - -
</element-type>- -In XML, it is not allowed to omit start or end tags, even if the DTD would -permit this. Note that there are no special rules for interpreting spaces or -newlines near start or end tags; all spaces and newlines count.
Every element type must be declared before it can be used. The declaration -consists of two parts: the ELEMENT declaration describes the content model, -i.e. which inner elements are allowed; the ATTLIST declaration describes the -attributes of the element.
An element can simply allow everything as content. This is written: - -
<!ELEMENT name ANY>- -Conversely, an element can be forced to be empty, which is declared by: - -
<!ELEMENT name EMPTY>- -Note that there is an abbreviated notation for empty element instances: -<name/>.
There are two more sophisticated forms of declarations: so-called -mixed declarations, and regular -expressions. An element with mixed content contains character data -interspersed with inner elements, and the set of allowed inner elements can be -specified. In contrast to this, a regular expression declaration does not allow -character data, but the inner elements can be described by the more powerful -means of regular expressions.
A declaration for mixed content looks as follows: - -
<!ELEMENT name (#PCDATA | element1 | ... | elementn )*>- -or if you do not want to allow any inner element, simply - -
<!ELEMENT name (#PCDATA)>
Example
If element type q is declared as - -
<!ELEMENT q (#PCDATA | r | s)*>- -this is a legal instance: - -<q>This is character data<r></r>with <s></s>inner elements</q>- -But this is illegal because t has not been enumerated in the -declaration: - -<q>This is character data<r></r>with <t></t>inner elements</q>
The other form uses a regular expression to describe the possible contents: - -
<!ELEMENT name regexp>- -The following well-known regexp operators are allowed: - -
element-name
(subexpr1 , ... , subexprn )
(subexpr1 | ... | subexprn )
subexpr*
subexpr+
subexpr?
The exact syntax of the regular expressions is rather strange. This can be -explained best by a list of constraints: - -
The outermost expression must not be -element-name.
Illegal: -<!ELEMENT x y>; this must be written as -<!ELEMENT x (y)>.
For the unary operators subexpr*, -subexpr+, and -subexpr?, the -subexpr must not itself be the result of a -unary operator.
Illegal: -<!ELEMENT x y**>; this must be written as -<!ELEMENT x (y*)*>.
Between ) and one of the unary operators -*, +, or ?, there must -not be whitespace.
Illegal: -<!ELEMENT x (y|z) *>; this must be written as -<!ELEMENT x (y|z)*>.
There is the additional constraint that the -right parenthesis must be contained in the same entity as the left parenthesis; -see the section about parsed entities below.
Note that there is another restriction: regular expressions must be -deterministic. This means that the parser must be able to see by looking at the -next token which alternative is actually used, or whether the repetition -stops. The reason for this is simply compatibility with SGML (there is no -intrinsic reason for this rule; XML can live without this restriction).
Example
The elements are declared as follows: - -
<!ELEMENT q (r?, (s | t)+)> -<!ELEMENT r (#PCDATA)> -<!ELEMENT s EMPTY> -<!ELEMENT t (q | r)>- -This is a legal instance: - -<q><r>Some characters</r><s/></q>- -(Note: <s/> is an abbreviation for -<s></s>.) - -It would be illegal to leave <s/> out because at -least one instance of s or t must be -present. It would be illegal, too, if characters existed outside the -r element; the only exception is white space. -- This is -legal, too: - -<q><s/><t><q><s/></q></t></q>
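To make the analogy with ordinary regular expressions concrete, here is a rough sketch in Python that translates a content model into a Python regex over the sequence of child element names. The helpers model_to_regex and children_match are hypothetical names invented for this illustration; the sketch ignores character data, does not enforce the syntactic constraints listed above, and does not check determinism.

```python
import re

def model_to_regex(model):
    """Translate a DTD content model into a compiled Python regex.

    The regex matches a string in which every child element name is
    followed by one space, e.g. "r s " for the children r, s.
    """
    body = re.sub(r"\s+", "", model)   # this sketch ignores whitespace rules
    body = body.replace(",", "")       # a sequence is plain concatenation
    # Wrap each element name as one token; | * + ? ( ) keep their meaning.
    body = re.sub(r"[A-Za-z_][\w.-]*",
                  lambda m: "(?:" + re.escape(m.group(0)) + " )", body)
    return re.compile(body)

def children_match(model, children):
    """Return True if the list of child element names satisfies the model."""
    names = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(names) is not None

Q_MODEL = "(r?, (s | t)+)"
print(children_match(Q_MODEL, ["r", "s"]))   # children of <q><r>...</r><s/></q>
print(children_match(Q_MODEL, ["r"]))        # <s/> left out
```

The first call corresponds to the legal instance of q shown above, the second to the instance in which s has been left out.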
Elements may have attributes. These are put into the start tag of an element as -follows: - -
<element-name attribute1="value1" ... attributen="valuen">- -Instead of -"valuek" -it is also possible to use single quotes as in -'valuek'. -Note that you cannot use double quotes literally within the value of the -attribute if double quotes are the delimiters; the same applies to single -quotes. You can generally not use < and & as characters in attribute -values. It is possible to include the paraphrases &lt;, &gt;, -&amp;, &apos;, and &quot; (and any other reference to a general -entity as long as the entity is not defined by an external file) as well as -&#n;.
Before you can use an attribute you must declare it. An ATTLIST declaration -looks as follows: - -
<!ATTLIST element-name - attribute-name attribute-type attribute-default - ... - attribute-name attribute-type attribute-default ->- -There are a lot of types, but most important are: - -
CDATA: Every string is allowed as attribute value.
NMTOKEN: Every nametoken is allowed as attribute -value. Nametokens consist (mainly) of letters, digits, ., :, -, _ in arbitrary -order.
NMTOKENS: A space-separated list of nametokens is allowed as -attribute value.
#REQUIRED: The attribute must be specified.
#IMPLIED: The attribute can be specified but also can be -left out. The application can find out whether the attribute was present or -not.
"value" or -'value': This particular value is -used as default if the attribute is omitted in the element.
Example
This is a valid attribute declaration for element type r: - -
<!ATTLIST r - x CDATA #REQUIRED - y NMTOKEN #IMPLIED - z NMTOKENS "one two three">- -This means that x is a required attribute that cannot be -left out, while y and z are optional. The -XML parser tells the application whether y is present or -not, but if z is missing the default value -"one two three" is returned automatically. This is a valid example of these attributes: - -
<r x="He said: &quot;I don't like quotes!&quot;" y='1'>
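The effect of the default value can be observed with a small sketch. It uses Python's xml.etree.ElementTree, which is built on the non-validating expat parser; this choice is an assumption made for illustration only. Such a parser reads attribute defaults from the internal subset but does not enforce #REQUIRED or the attribute types.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE r [
  <!ELEMENT r (#PCDATA)>
  <!ATTLIST r
    x CDATA    #REQUIRED
    y NMTOKEN  #IMPLIED
    z NMTOKENS "one two three">
]>
<r x="He said: &quot;I don't like quotes!&quot;" y='1'></r>"""

attrs = ET.fromstring(doc).attrib
# x and y come from the start tag; z is filled in from the declared default.
for name in sorted(attrs):
    print(name, "=", attrs[name])
```

Here z does not appear in the start tag, yet the application sees it with the declared default value.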
Elements describe the logical structure of the document, while -entities determine the physical structure. Entities are -the pieces of text the parser operates on, mostly files and macros. Entities -may be parsed in which case the parser reads the text and -interprets it as XML markup, or unparsed which simply -means that the data of the entity has a foreign format (e.g. a GIF icon).
If the parsed entity is going to be used as part of the DTD, it -is called a parameter entity. You can declare a parameter -entity with a fixed text as content by: - -
<!ENTITY % name "value">- -Within the DTD, you can refer to this entity, i.e. read -the text of the entity, by: - -
%name;- -Such entities behave like macros, i.e. when they are referred to, the -macro text is inserted and read instead of the original text. - -
Example
For example, you can declare two elements with the same content model by: - -
<!ENTITY % model "a | b | c"> -<!ELEMENT x (%model;)> -<!ELEMENT y (%model;)>
- -If the contents of the entity are given as string constant, the entity is -called an internal entity. It is also possible to name a -file to be used as content (an external entity): - -
<!ENTITY % name SYSTEM "file name">- -There are some restrictions for parameter entities: - -
If the internal parameter entity contains the first token of a declaration -(i.e. <!), it must also contain the last token of the -declaration, i.e. the >. This means that the entity -either contains a whole number of complete declarations, or some text from the -middle of one declaration.
Illegal: -
<!ENTITY % e "(a | b | c)>"> -<!ELEMENT x %e;- -Because <! is contained in the main -entity, and the corresponding > is contained in the -entity e.
If the internal parameter entity contains a left parenthesis, it must also -contain the corresponding right parenthesis.
Illegal: -
<!ENTITY % e "(a | b | c"> -<!ELEMENT x %e;)>- -Because ( is contained in the entity -e, and the corresponding ) is -contained in the main entity.
When reading text from an entity, the parser automatically inserts one space -character before the entity text and one space character after the entity -text. However, this rule is not applied within the definition of another -entity.
Legal: -
-<!ENTITY % suffix "gif"> -<!ENTITY iconfile 'icon.%suffix;'>- -Because %suffix; is referenced within -the definition text for iconfile, no additional spaces are -added.
Illegal: -
<!ENTITY % suffix "test"> -<!ELEMENT x.%suffix; ANY>- -Because %suffix; is referenced outside the definition -text of another entity, the parser replaces %suffix; by -the text test with one space before and one space after it.
Illegal: -
<!ENTITY % e "(a | b | c)"> -<!ELEMENT x %e;*>- -Because the space that the parser inserts after the entity text ends up between ) -and *, which is illegal.
An external parameter entity must always consist of a whole number of complete -declarations.
In the internal subset of the DTD, a reference to a parameter entity (internal -or external) is only allowed at positions where a new declaration can start.
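The macro-like behaviour and the space-padding rule can be illustrated with a toy expander. This is only a sketch of the substitution step, written in Python; the function expand_parameter_entities and its inside_entity_value flag are hypothetical names, and the sketch does not check any of the other restrictions listed above.

```python
import re

def expand_parameter_entities(text, entities, inside_entity_value=False):
    """Replace every %name; in text by the entity's replacement text.

    Outside the definition of another entity, one space is inserted
    before and after the replacement text; inside such a definition,
    the text is inserted unchanged.
    """
    pad = "" if inside_entity_value else " "

    def replace(match):
        return pad + entities[match.group(1)] + pad

    return re.sub(r"%([A-Za-z_][\w.-]*);", replace, text)

entities = {"model": "a | b | c", "suffix": "gif"}

decl = expand_parameter_entities("<!ELEMENT x (%model;)>", entities)
value = expand_parameter_entities("icon.%suffix;", entities,
                                  inside_entity_value=True)
broken = expand_parameter_entities("<!ELEMENT x.%suffix; ANY>", entities)
print(decl)
print(value)
print(broken)
```

The third result shows why a reference in the middle of a name fails: the inserted spaces split the intended name into two tokens.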
If the parsed entity is going to be used in the document instance, it is called -a general entity. Such entities can be used as -abbreviations for frequent phrases, or to include external files. Internal -general entities are declared as follows: - -
<!ENTITY name "value">- -External general entities are declared this way: - -
<!ENTITY name SYSTEM "file name">- -References to general entities are written as: - -
&name;- -The main difference between parameter and general entities is that the former -are only recognized in the DTD and that the latter are only recognized in the -document instance. As the DTD is parsed before the document, the parameter -entities are expanded first; for example it is possible to use the content of a -parameter entity as the name of a general entity: -&%name;;[1].
General entities must respect the element hierarchy. This means that there must -be an end tag for every start tag in the entity value, and that end tags -without corresponding start tags are not allowed.
Example
If the author of a document changes sometimes, it is worthwhile to set up a -general entity containing the names of the authors. If the author changes, you -need only to change the definition of the entity, and do not need to check all -occurrences of authors' names: - -
<!ENTITY authors "Gerd Stolpmann">- -In the document text, you can now refer to the author names by writing -&authors;. Illegal: -The following two entities are illegal because the elements in the definition -do not nest properly: - -
<!ENTITY lengthy-tag "<section textcolor='white' background='graphic'>"> -<!ENTITY nonsense "<a></b>">
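As a quick check of the authors example, the following sketch parses a small document with Python's xml.etree.ElementTree. The element names book and title are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE book [
  <!ENTITY authors "Gerd Stolpmann">
]>
<book><title>Written by &authors;</title></book>"""

# The parser replaces &authors; by the declared text before the
# application sees the character data.
title = ET.fromstring(doc).find("title").text
print(title)
```

Changing the entity declaration is enough to change every place where &authors; is referenced.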
Earlier in this introduction we explained that there are substitutes for -reserved characters: &lt;, &gt;, &amp;, &apos;, and -&quot;. These are simply predefined general entities; note that they are -the only predefined entities. It is allowed to define these entities again -as long as the meaning is unchanged.
Unparsed entities have a foreign format and can thus not be read by the XML -parser. Unparsed entities are always external. The format of an unparsed entity -must have been declared; such a format is called a -notation. The entity can then be declared by referring to -this notation. As unparsed entities do not contain XML text, it is not possible -to include them directly into the document; you can only declare attributes -such that names of unparsed entities are acceptable values.
As you can see, unparsed entities are too complicated to be of much practical -use. It is almost always better to simply pass the name of the data file as -normal attribute value, and let the application recognize and process the -foreign format.
[1] | This construct is only -allowed within the definition of another entity; otherwise extra spaces would -be added (as explained above). Such indirection is not recommended. Complete example: - <!ENTITY % variant "a"> <!-- or "b" --> -<!ENTITY text-a "This is text A."> -<!ENTITY text-b "This is text B."> -<!ENTITY text "&text-%variant;;">-You can now write &text; in the document instance, and -depending on the value of variant either -text-a or text-b is inserted. |