X-Git-Url: http://matita.cs.unibo.it/gitweb/?a=blobdiff_plain;f=helm%2FDEVEL%2Fpxp%2Fpxp%2Fdoc%2Fmanual%2Fhtml%2Fc1567.html;fp=helm%2FDEVEL%2Fpxp%2Fpxp%2Fdoc%2Fmanual%2Fhtml%2Fc1567.html;h=0000000000000000000000000000000000000000;hb=e108abe5c0b4eb841c4ad332229a6c0e57e70079;hp=ab88e87bf7a921858f63dcbc69a8763b458c976e;hpb=1456c337a60f6677ee742ff7891d43fc382359a9;p=helm.git diff --git a/helm/DEVEL/pxp/pxp/doc/manual/html/c1567.html b/helm/DEVEL/pxp/pxp/doc/manual/html/c1567.html deleted file mode 100644 index ab88e87bf..000000000 --- a/helm/DEVEL/pxp/pxp/doc/manual/html/c1567.html +++ /dev/null @@ -1,434 +0,0 @@ -
The following main functions invoke the parser (in Pxp_yacc):
parse_document_entity: You want to parse a complete and closed document consisting of a DTD and the document body; the body is validated against the DTD. This mode is useful if you have a file

    <!DOCTYPE root ... [ ... ] >
    <root> ... </root>

and you can accept any DTD that is included in the file (e.g. because the file is under your control).
parse_wfdocument_entity: You want to parse a complete and closed document consisting of a DTD and the document body, but the body is not validated, only checked for well-formedness. This mode is preferred if validation costs too much time or if the DTD is missing.
parse_dtd_entity: You want to parse only an entity (file) containing the external subset of a DTD. Sometimes it is useful to read such a DTD, for example to compare it with the DTD included in a document, or to apply the next mode:
parse_content_entity: You want to parse only an entity (file) containing a fragment of a document body; this fragment is validated against the DTD you pass to the function. In particular, the fragment must not have a <!DOCTYPE> clause, and must begin directly with an element. The element is validated against the DTD. This mode is useful if you want to check documents against a fixed, immutable DTD.
parse_wfcontent_entity: This function also parses a single element without a DTD, but does not validate it.
extract_dtd_from_document_entity: This function extracts the DTD from a closed document consisting of a DTD and a document body. Both the internal and the external subsets are extracted.
In many cases, parse_document_entity is the preferred mode to parse a document in a validating way, and parse_wfdocument_entity is the mode of choice to parse a file while only checking for well-formedness.
There are a number of variations of these modes. One important application of a parser is to check documents from an untrusted source against a fixed DTD. One solution is to disallow the <!DOCTYPE> clause in these documents, and to treat the document like a fragment (using mode parse_content_entity). This is very simple, but inflexible; users of such a system cannot even define additional entities to abbreviate frequent phrases of their text.
A more intelligent checker may be necessary. For example, it is also possible to fully parse the document to be checked, i.e. together with its DTD, and to compare this DTD with the prescribed one. To fully parse the document, mode parse_document_entity is applied; to obtain the prescribed DTD for the comparison, mode parse_dtd_entity can be used.
There is another very important configurable aspect of the parser: the so-called resolver. The task of the resolver is to locate the contents of an (external) entity for a given entity name, and to make the contents accessible as a character stream. (Furthermore, it also normalizes the character set; but this is a detail we can ignore here.) Suppose you have a file called "main.xml" containing

    <!ENTITY % sub SYSTEM "sub/sub.xml">
    %sub;

and a file stored in the subdirectory "sub" with name "sub.xml" containing

    <!ENTITY % subsub SYSTEM "subsub/subsub.xml">
    %subsub;

and a file stored in the subdirectory "subsub" of "sub" with name "subsub.xml" (the contents of this file do not matter). Here, the resolver must track that the second entity subsub is located in the directory "sub/subsub", i.e. the difficulty is to interpret the system (file) names of entities relative to the entities containing them, even if the entities are deeply nested.
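The relative interpretation of system names described above can be sketched in a few lines of OCaml. This is only an illustration under simplifying assumptions: the function resolve_relative is hypothetical and not part of PXP, system names are treated as plain slash-separated paths, and neither ".." segments nor URL schemes are handled.

```ocaml
(* Hypothetical helper, not part of PXP: interpret the system name
   [name] relative to [base], the system name of the entity that
   contains the reference. Names are treated as plain slash-separated
   paths; ".." segments and URL schemes are not handled. *)
let resolve_relative ~base name =
  if String.length name > 0 && name.[0] = '/' then
    name                                   (* absolute name: keep as is *)
  else
    match String.rindex_opt base '/' with
    | None -> name                         (* base has no directory part *)
    | Some i -> String.sub base 0 (i + 1) ^ name

(* main.xml refers to sub/sub.xml, which refers to subsub/subsub.xml. *)
let sub_name = resolve_relative ~base:"main.xml" "sub/sub.xml"
let subsub_name = resolve_relative ~base:sub_name "subsub/subsub.xml"

let () =
  print_endline sub_name;
  print_endline subsub_name
```

Run as written, this prints sub/sub.xml and then sub/subsub/subsub.xml: each name is resolved against the entity that contains the reference, not against the top-level file, which is the bookkeeping a resolver has to do for nested entities.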
There is no single resolver that does everything right: resolving entity names is a task that depends heavily on the environment. The XML specification only demands that SYSTEM entities are interpreted like URLs (which is not very precise, as there are lots of URL schemes in use), hoping that this helps to overcome the local peculiarities of the environment; the idea is that if you do not know your environment you can refer to other entities by denoting URLs for them. I think that this interpretation of SYSTEM names may have some applications on the Internet, but it is not the first choice in general. Because of this, the resolver is a separate module of the parser that can be exchanged for another one if necessary; more precisely, the parser already defines several resolvers.
The following resolvers already exist:
Resolvers reading from arbitrary input channels. These can be configured such that a certain ID is associated with the channel; in this case inner references to external entities can be resolved. There is also a special resolver that interprets SYSTEM IDs as URLs; this resolver can process relative SYSTEM names and determine the corresponding absolute URL.
A resolver that always reads from a given O'Caml string. This resolver cannot resolve further names if the string is not associated with any name; i.e. if the document contained in the string refers to an external entity, this reference cannot be followed in this case.
A resolver for file names. The SYSTEM name is interpreted as a file URL with the slash "/" as the separator for directories. This resolver is derived from the generic URL resolver.
Note that the existing resolvers only interpret SYSTEM names, not PUBLIC names. If it helps you, it is possible to define resolvers for PUBLIC names, too; for example, such a resolver could look up the public name in a hash table, and map it to a system name which is passed on to the existing resolver for system names. It is relatively simple to provide such a resolver.
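The hash-table lookup suggested above might look roughly like the following sketch. The type ext_id, the table catalog, the function system_name_of, and the sample identifiers are all made up for this illustration; they are not PXP's API, and the step of handing the result to an existing resolver for system names is left out.

```ocaml
(* Illustrative types and names only; none of this is PXP's API. *)
type ext_id =
  | System of string             (* SYSTEM name *)
  | Public of string * string    (* PUBLIC name, SYSTEM name given with it *)

(* Hypothetical catalog mapping PUBLIC names to SYSTEM names. *)
let catalog : (string, string) Hashtbl.t = Hashtbl.create 16

let () =
  Hashtbl.replace catalog "-//EXAMPLE//DTD Sample//EN" "dtd/sample.dtd"

(* Reduce any identifier to a SYSTEM name, which could then be handed
   to an existing resolver for system names. A PUBLIC name found in the
   catalog takes precedence over the SYSTEM name given alongside it. *)
let system_name_of = function
  | System s -> s
  | Public (pub, sys) ->
      (match Hashtbl.find_opt catalog pub with
       | Some mapped -> mapped
       | None -> sys)

let () =
  print_endline
    (system_name_of (Public ("-//EXAMPLE//DTD Sample//EN", "fallback.dtd")));
  print_endline
    (system_name_of (Public ("-//UNKNOWN//EN", "fallback.dtd")))
```

Falling back to the SYSTEM name when the PUBLIC name is unknown is one possible policy; a stricter resolver could instead reject identifiers that are missing from the catalog.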