The PXP user's guide

Chapter 4. Configuring and calling the parser

Table of Contents
4.1. Overview
4.2. Resolvers and sources
4.3. The DTD classes
4.4. Invoking the parser
4.5. Updates

4.1. Overview

The module Pxp_yacc provides the main functions for invoking the parser.

In many cases, parse_document_entity is the preferred mode for parsing a document with validation, and parse_wfdocument_entity is the mode of choice for parsing a file while checking only for well-formedness.

There are a number of variations of these modes. One important application of a parser is to check documents from an untrusted source against a fixed DTD. One solution is to disallow the <!DOCTYPE> clause in these documents and to treat each document like a fragment (using mode parse_content_entity). This is very simple but inflexible; users of such a system cannot even define additional entities to abbreviate frequent phrases in their text.

A more intelligent checker may be necessary. For example, it is also possible to parse the document under test fully, i.e. together with its DTD, and to compare this DTD with the prescribed one. To parse the document fully, mode parse_document_entity is applied; to obtain the DTD to compare against, mode parse_dtd_entity can be used.

There is another very important configurable aspect of the parser: the so-called resolver. The task of the resolver is to locate the contents of an (external) entity for a given entity name, and to make the contents accessible as a character stream. (Furthermore, it also normalizes the character set, but this is a detail we can ignore here.) Suppose you have a file called "main.xml" containing

<!ENTITY % sub SYSTEM "sub/sub.xml">
%sub;

and a file stored in the subdirectory "sub" with name "sub.xml" containing

<!ENTITY % subsub SYSTEM "subsub/subsub.xml">
%subsub;

and a file stored in the subdirectory "subsub" of "sub" with name "subsub.xml" (the contents of this file do not matter). Here, the resolver must keep track of the fact that the second entity subsub is located in the directory "sub/subsub"; i.e. the difficulty is to interpret the system (file) names of entities relative to the entities containing them, even if the entities are deeply nested.
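The bookkeeping described above can be sketched in a few lines of OCaml. This is only an illustration, not the resolver interface of PXP: the functions resolve_relative and resolve_chain are hypothetical names, and the sketch assumes that system names are plain file paths on a Unix-like system.

```ocaml
(* Resolve a system name relative to the file containing the reference.
   Assumption: system names are plain relative or absolute file paths. *)
let resolve_relative ~(base : string) (system_name : string) : string =
  if Filename.is_relative system_name then
    Filename.concat (Filename.dirname base) system_name
  else
    system_name

(* Follow a chain of nested references, starting from the top-level file.
   Each system name is interpreted relative to the previously resolved file. *)
let resolve_chain (top : string) (system_names : string list) : string list =
  let step (base, acc) name =
    let path = resolve_relative ~base name in
    (path, path :: acc)
  in
  let _, resolved = List.fold_left step (top, []) system_names in
  List.rev resolved

(* The nesting from the example: main.xml -> sub/sub.xml -> subsub/subsub.xml *)
let () =
  resolve_chain "main.xml" [ "sub/sub.xml"; "subsub/subsub.xml" ]
  |> List.iter print_endline
```

The second name resolves into the directory "sub/subsub" only because the path of "sub.xml" is carried along as the new base; resolving both names against "main.xml" would look for "subsub.xml" in the wrong directory.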

There is no fixed resolver that already does everything right; resolving entity names is a task that depends heavily on the environment. The XML specification only demands that SYSTEM entities are interpreted like URLs (which is not very precise, as there are lots of URL schemes in use), hoping that this helps to overcome the local peculiarities of the environment; the idea is that if you do not know your environment you can refer to other entities by denoting URLs for them. I think that this interpretation of SYSTEM names may have some applications on the Internet, but it is not the first choice in general. Because of this, the resolver is a separate module of the parser that can be replaced by another one if necessary; more precisely, the parser already defines several resolvers.

The interface a resolver must have is documented, so it is possible to write your own resolver. For example, you could connect the parser with an HTTP client, and resolve URLs of the HTTP namespace. The resolver classes allow several independent resolvers to be combined into one more powerful resolver; thus it is possible to combine a self-written resolver with the existing resolvers.
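The idea of combining resolvers can be illustrated with a small, self-contained sketch. It deliberately does not use the resolver classes of PXP: the type resolver and the functions table_resolver, fake_http_resolver and combine are assumptions made for this illustration, and the "HTTP" resolver only returns a placeholder instead of contacting a server.

```ocaml
(* Hypothetical resolver type, not the PXP interface: a resolver returns
   Some contents if it can handle the name, and None otherwise. *)
type resolver = string -> string option

(* Resolver backed by a fixed association list of names and contents. *)
let table_resolver (entries : (string * string) list) : resolver =
  fun name -> List.assoc_opt name entries

(* Stand-in for a self-written HTTP resolver; it only recognizes the URL
   scheme and returns a placeholder string. *)
let fake_http_resolver : resolver =
  fun name ->
    let prefix = "http://" in
    let n = String.length prefix in
    if String.length name >= n && String.sub name 0 n = prefix then
      Some ("placeholder for " ^ name)
    else
      None

(* Try each resolver in order and return the first successful result. *)
let rec combine (resolvers : resolver list) : resolver =
  fun name ->
    match resolvers with
    | [] -> None
    | r :: rest ->
      (match r name with
       | Some _ as found -> found
       | None -> combine rest name)

let () =
  let resolve =
    combine
      [ table_resolver [ ("sub/sub.xml", "<!-- sub -->") ]; fake_http_resolver ]
  in
  List.iter
    (fun name ->
       match resolve name with
       | Some contents -> Printf.printf "%s: %s\n" name contents
       | None -> Printf.printf "%s: not resolved\n" name)
    [ "sub/sub.xml"; "http://example.org/a.xml"; "missing.xml" ]
```

The order of the list passed to combine matters: an earlier resolver that accepts a name hides all later ones for that name.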

Note that the existing resolvers only interpret SYSTEM names, not PUBLIC names. If it helps you, it is possible to define resolvers for PUBLIC names, too; for example, such a resolver could look up the public name in a hash table, and map it to a system name which is then passed on to the existing resolver for system names. It is relatively simple to provide such a resolver.
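A minimal sketch of such a lookup, again independent of the actual resolver classes: the hash table catalog, the function resolve_public, the public name and the path "dtds/sample.dtd" are all made up for this illustration, and demo_system_resolver merely stands in for an existing resolver of system names.

```ocaml
(* Hypothetical catalog mapping PUBLIC names to SYSTEM names. *)
let catalog : (string, string) Hashtbl.t = Hashtbl.create 16

let () =
  Hashtbl.replace catalog "-//EXAMPLE//DTD Sample 1.0//EN" "dtds/sample.dtd"

(* Look up the public name; if it is known, hand the mapped system name
   over to the resolver for system names. *)
let resolve_public
    ~(resolve_system : string -> string option)
    (public_name : string) : string option =
  match Hashtbl.find_opt catalog public_name with
  | Some system_name -> resolve_system system_name
  | None -> None

(* Stand-in for an existing resolver of system names. *)
let demo_system_resolver (system_name : string) : string option =
  Some ("contents of " ^ system_name)

let () =
  match
    resolve_public ~resolve_system:demo_system_resolver
      "-//EXAMPLE//DTD Sample 1.0//EN"
  with
  | Some text -> print_endline text
  | None -> print_endline "unknown public name"
```

Because the PUBLIC resolver only translates names, it needs no knowledge of how system names are actually opened; that part remains with the resolver it delegates to.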

