The PXP user's guide
Chapter 4. Configuring and calling the parser

4.2. Resolvers and sources

4.2.1. Using the built-in resolvers (called sources)

The type source enumerates the two possibilities for where the document to parse comes from.

type source =
    Entity of ((dtd -> Pxp_entity.entity) * Pxp_reader.resolver)
  | ExtID of (ext_id * Pxp_reader.resolver)
You normally need not worry about this type, as there are convenience functions (such as from_string) that create source values.

4.2.2. The resolver API

A resolver is an object that can be opened like a file; however, you do not pass it a file name but the XML identifier of the entity to read from (either a SYSTEM or a PUBLIC clause). When opened, the resolver must return the Lexing.lexbuf that reads the characters. The resolver can be closed, and it can be cloned. Furthermore, it is possible to tell the resolver which character set it should assume. The following definitions are from Pxp_reader:

exception Not_competent
exception Not_resolvable of exn

class type resolver =
  object
    method init_rep_encoding : rep_encoding -> unit
    method init_warner : collect_warnings -> unit
    method rep_encoding : rep_encoding
    method open_in : ext_id -> Lexing.lexbuf
    method close_in : unit
    method change_encoding : string -> unit
    method clone : resolver
    method close_all : unit
  end
The resolver object must work as follows:

Exceptions. It is possible to chain resolvers such that when the first resolver is not able to open the entity, the other resolvers of the chain are tried in turn. The method open_in should raise the exception Not_competent to indicate that the next resolver should try to open the entity. If the resolver is able to handle the ID but some other error occurs, it should raise Not_resolvable to break the chain.
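The exception discipline above can be illustrated with a small, self-contained OCaml sketch. It does not link against PXP: Not_competent, Not_resolvable and a two-constructor ext_id are redeclared locally as stand-ins, and open_local is a hypothetical function playing the role of a resolver's open_in. It declines PUBLIC identifiers with Not_competent, and reports a SYSTEM file that cannot be opened with Not_resolvable, so that a chain would stop there.

```ocaml
(* Local stand-ins for the Pxp_reader exceptions (this sketch does not use PXP). *)
exception Not_competent
exception Not_resolvable of exn

(* Hypothetical, simplified version of PXP's ext_id. *)
type ext_id = System of string | Public of string * string

(* Hypothetical open_in of a resolver that only handles SYSTEM IDs
   naming local files. *)
let open_local (id : ext_id) : Lexing.lexbuf =
  match id with
  | Public _ ->
      (* Not our kind of ID: let the next resolver in the chain try. *)
      raise Not_competent
  | System path ->
      (* Our kind of ID, but opening can still fail: stop the chain.
         The channel is left open here; a real resolver closes it in close_in. *)
      (try Lexing.from_channel (open_in_bin path)
       with Sys_error _ as e -> raise (Not_resolvable e))
```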

Example: How to define a resolver that is equivalent to from_string: ...

4.2.3. Predefined resolver components

There are some classes in Pxp_reader that define common resolver behaviour.

class resolve_read_this_channel :
    ?id:ext_id ->
    ?fixenc:encoding ->
    ?auto_close:bool ->
    in_channel ->
        resolver
Reads from the passed channel (which may even be a pipe). If the ~id argument is passed to the object, the created resolver accepts only this ID; otherwise all IDs are accepted.

Once the resolver has been cloned, the clone does not accept any ID. This means that this resolver cannot handle inner references to external entities. Note that you can combine this resolver with another resolver that can handle inner references (such as resolve_as_file); see class combine below.

If you pass the ~fixenc argument, the encoding of the channel is set to the passed value, regardless of any auto-recognition or any XML declaration. If ~auto_close = true (the default), the channel is closed after use; if ~auto_close = false, the channel is left open.
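A minimal sketch of the ID rules just described, assuming a two-constructor stand-in for ext_id and modelling the resolver as an immediate object with only open_in, close_in and clone (make and this_channel are hypothetical names, not PXP functions). It encodes one reading of the text: a resolver created with ~id accepts only that ID, its clone accepts none, and ~auto_close decides whether close_in really closes the channel.

```ocaml
exception Not_competent

(* Hypothetical, simplified version of PXP's ext_id. *)
type ext_id = System of string | Public of string * string

(* Which IDs a resolver instance still accepts. *)
type acceptance = Any | Only of ext_id | Nothing

(* Hypothetical model of resolve_read_this_channel's documented rules. *)
let rec make (acc : acceptance) (auto_close : bool) (ch : in_channel) =
  object
    method open_in (req : ext_id) : Lexing.lexbuf =
      (match acc with
       | Any -> ()
       | Only expected when expected = req -> ()
       | Only _ | Nothing -> raise Not_competent);
      Lexing.from_channel ch
    (* Closing an already closed in_channel is a no-op in OCaml. *)
    method close_in = if auto_close then close_in ch
    (* The clone shares the channel but accepts no ID at all. *)
    method clone = make Nothing auto_close ch
  end

(* Constructor mirroring the optional arguments ~id and ~auto_close. *)
let this_channel ?id ?(auto_close = true) (ch : in_channel) =
  let acc = match id with Some i -> Only i | None -> Any in
  make acc auto_close ch
```

Because the clone rejects every ID, such a resolver is normally combined with one that can open inner references, as described for combine below.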

class resolve_read_any_channel :
    ?auto_close:bool ->
    channel_of_id:(ext_id -> (in_channel * encoding option)) ->
        resolver
This resolver calls the function ~channel_of_id to open a new channel for the passed ext_id. This function must either return the channel and the encoding, or fail with Not_competent. It must return None as the encoding if the default mechanism for recognizing the encoding should be used, and Some e if the encoding of the channel is already known to be e. If ~auto_close = true (the default), the channel is closed after use; if ~auto_close = false, the channel is left open.
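For illustration, here is one possible ~channel_of_id function, written against local stand-ins (ext_id, a two-variant encoding type, Not_competent) rather than the real PXP types. It serves SYSTEM IDs that name existing local files and returns None so that the default encoding detection applies; the second, hypothetical variant declares the encoding explicitly with Some.

```ocaml
exception Not_competent

(* Hypothetical, simplified version of PXP's ext_id. *)
type ext_id = System of string | Public of string * string

(* Hypothetical two-variant stand-in for PXP's encoding type. *)
type encoding = [ `Enc_utf8 | `Enc_iso88591 ]

(* Serve SYSTEM IDs naming existing files; None = detect the encoding. *)
let channel_of_id (id : ext_id) : in_channel * encoding option =
  match id with
  | System path when Sys.file_exists path -> (open_in_bin path, None)
  | System _ | Public _ -> raise Not_competent

(* Variant for a document collection known to be Latin-1. *)
let latin1_channel_of_id (id : ext_id) : in_channel * encoding option =
  let ch, _ = channel_of_id id in
  (ch, Some `Enc_iso88591)
```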

class resolve_read_url_channel :
    ?base_url:Neturl.url ->
    ?auto_close:bool ->
    url_of_id:(ext_id -> Neturl.url) ->
    channel_of_url:(Neturl.url -> (in_channel * encoding option)) ->
        resolver
When this resolver gets an ID to read from, it calls the function ~url_of_id to get the corresponding URL. This URL may be relative; however, its URL scheme must be one that contains a path. The resolver converts the URL to an absolute URL if necessary. The second function, ~channel_of_url, is fed the absolute URL as input. This function opens the resource to read from and returns the channel and the encoding of the resource.

Both functions, ~url_of_id and ~channel_of_url, can raise Not_competent to indicate that the object is not able to read from the specified resource. However, there is a difference: a Not_competent from ~url_of_id is passed through unchanged, but a Not_competent from ~channel_of_url is converted to Not_resolvable. So only ~url_of_id decides which URLs the resolver accepts.
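The asymmetry between the two callbacks can be expressed in a few lines. This is a hypothetical sketch of the documented behaviour, not PXP's code: open_via_url is polymorphic in the ID, URL and channel types, the exceptions are local stand-ins, and the payload chosen for Not_resolvable (the original Not_competent) is an assumption.

```ocaml
exception Not_competent
exception Not_resolvable of exn

(* Hypothetical: how the two callbacks' exceptions are treated. *)
let open_via_url ~(url_of_id : 'id -> 'url) ~(channel_of_url : 'url -> 'ch)
    (id : 'id) : 'ch =
  (* A Not_competent raised here propagates unchanged: another resolver may try. *)
  let url = url_of_id id in
  (* The URL was accepted, so a Not_competent from here on ends the chain. *)
  try channel_of_url url
  with Not_competent -> raise (Not_resolvable Not_competent)
```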

The function ~channel_of_url must return None as the encoding if the default mechanism for recognizing the encoding should be used. It must return Some e if the encoding of the channel is already known to be e.

If ~auto_close = true (the default), the channel is closed after use. If ~auto_close = false, the channel is left open.

Objects of this class contain a base URL against which relative URLs are interpreted. When creating a new object, you can specify the base URL by passing it as the ~base_url argument. When an existing object is cloned, the base URL of the clone is the URL of the original object. Note that the term "base URL" has a strict definition in RFC 1808.
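A simplified model of base-URL handling, assuming plain string URLs instead of Neturl.url (has_scheme and resolve are hypothetical helpers, not PXP or Neturl functions). It resolves a relative reference by replacing the last path segment of the base, which covers sibling and subdirectory references but deliberately omits most of RFC 1808.

```ocaml
(* True if [s] starts with a URL scheme such as "http:" or "file:". *)
let has_scheme (s : string) : bool =
  match String.index_opt s ':' with
  | Some i -> i > 0 && not (String.contains (String.sub s 0 i) '/')
  | None -> false

(* Simplified resolution of [rel] against [base]: absolute URLs are kept,
   anything else replaces the last path segment of the base.  Absolute
   paths ("/x"), "." and ".." segments, queries and fragments are NOT
   handled; RFC 1808 defines the full algorithm. *)
let resolve ~(base : string) (rel : string) : string =
  if has_scheme rel then rel
  else
    match String.rindex_opt base '/' with
    | Some i -> String.sub base 0 (i + 1) ^ rel
    | None -> rel
```

Applied to the clone rule: if the main document was opened as http://example.org/docs/main.xml, a clone taken at that point has this URL as its base, so a reference to dtd/doc.dtd resolves to http://example.org/docs/dtd/doc.dtd.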

class resolve_read_this_string :
    ?id:ext_id ->
    ?fixenc:encoding ->
    string ->
        resolver
Reads from the passed string. If the ~id argument is passed to the object, the created resolver accepts only this ID; otherwise all IDs are accepted.

Once the resolver has been cloned, the clone does not accept any ID. This means that this resolver cannot handle inner references to external entities. Note that you can combine this resolver with another resolver that can handle inner references (such as resolve_as_file); see class combine below.

If you pass the ~fixenc argument, the encoding of the string is set to the passed value, regardless of any auto-recognition or any XML declaration.

class resolve_read_any_string :
    string_of_id:(ext_id -> (string * encoding option)) ->
        resolver
This resolver calls the function ~string_of_id to get the string for the passed ext_id. This function must either return the string and the encoding, or fail with Not_competent. It must return None as the encoding if the default mechanism for recognizing the encoding should be used, and Some e if the encoding of the string is already known to be e.
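As an example of a ~string_of_id function, the following sketch looks up PUBLIC identifiers in an in-memory catalogue. The ext_id and encoding types and Not_competent are local stand-ins, and the catalogue entry is made-up sample data.

```ocaml
exception Not_competent

(* Hypothetical, simplified versions of PXP's ext_id and encoding. *)
type ext_id = System of string | Public of string * string
type encoding = [ `Enc_utf8 | `Enc_iso88591 ]

(* In-memory catalogue keyed by PUBLIC identifier (sample data). *)
let catalogue : (string, string * encoding option) Hashtbl.t = Hashtbl.create 8

let () =
  Hashtbl.replace catalogue "-//Example//DTD Note//EN"
    ("<!ELEMENT note (#PCDATA)>", Some `Enc_utf8)

(* Known PUBLIC IDs yield their text and encoding; all else is declined. *)
let string_of_id (id : ext_id) : string * encoding option =
  match id with
  | Public (pubid, _sysid) ->
      (match Hashtbl.find_opt catalogue pubid with
       | Some entry -> entry
       | None -> raise Not_competent)
  | System _ -> raise Not_competent
```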

class resolve_as_file :
    ?file_prefix:[ `Not_recognized | `Allowed | `Required ] ->
    ?host_prefix:[ `Not_recognized | `Allowed | `Required ] ->
    ?system_encoding:encoding ->
    ?url_of_id:(ext_id -> Neturl.url) ->
    ?channel_of_url:(Neturl.url -> (in_channel * encoding option)) ->
    unit ->
        resolver
Reads from the local file system. Every file name is interpreted as a file name of the local file system, and the referred file is read.

The full form of a file URL is file://host/path, where 'host' specifies the host system on which the file identified by 'path' resides. host = "" and host = "localhost" are accepted; other values raise Not_competent. File URLs are defined in RFC 1738.
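The host rule for file URLs can be sketched as a small parser. path_of_file_url is a hypothetical helper, not part of PXP; it handles only the full file://host/path form, performs no percent-decoding, and treats any other input as out of scope by raising a local Not_competent.

```ocaml
exception Not_competent

(* Hypothetical: extract the local path from file://host/path,
   accepting only host = "" or host = "localhost". *)
let path_of_file_url (url : string) : string =
  let prefix = "file://" in
  let n = String.length prefix in
  if String.length url < n || String.sub url 0 n <> prefix then
    raise Not_competent (* this sketch only handles the full form *)
  else begin
    let rest = String.sub url n (String.length url - n) in
    (* The host ends at the first '/', where the path begins. *)
    let slash =
      match String.index_opt rest '/' with
      | Some i -> i
      | None -> String.length rest
    in
    let host = String.sub rest 0 slash in
    let path = String.sub rest slash (String.length rest - slash) in
    if host = "" || host = "localhost" then path else raise Not_competent
  end
```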

Option ~file_prefix: Specifies how the "file:" prefix of file names is handled (`Not_recognized, `Allowed or `Required).

Option ~host_prefix: Specifies how the "//host" phrase of file names is handled (`Not_recognized, `Allowed or `Required).

Option ~system_encoding: Specifies the encoding of file names of the local file system. Default: UTF-8.

Options ~url_of_id, ~channel_of_url: Not for the casual user!

class combine :
    ?prefer:resolver ->
    resolver list ->
        resolver
Combines several resolver objects. If an entity with a given ext_id is to be opened, the combined resolver tries the contained resolvers in turn until one of them accepts opening the entity (i.e. does not raise Not_competent on open_in).

Clones: If the clone method is invoked before open_in, all contained resolvers are cloned separately and combined again. If the clone method is invoked after open_in (i.e. while the resolver is open), the clone of the active resolver is additionally flagged as preferred, i.e. it is tried first.
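The try-in-turn and clone behaviour of combine can be modelled in a short sketch. Everything here is hypothetical: mini_resolver is a reduced resolver interface whose open_in returns the entity text instead of a Lexing.lexbuf, leaf is a toy resolver backed by a table, and the model assumes ?prefer names a member of the list. It does not model close_in, so a member stays "active" until the next successful open_in.

```ocaml
exception Not_competent

(* Hypothetical, simplified version of PXP's ext_id. *)
type ext_id = System of string | Public of string * string

(* Reduced interface: open_in returns the text, not a lexbuf. *)
class type mini_resolver = object
  method open_in : ext_id -> string
  method clone : mini_resolver
end

(* Toy resolver serving a fixed table of SYSTEM IDs. *)
let rec leaf (table : (string * string) list) : mini_resolver =
  object
    method open_in = function
      | System s when List.mem_assoc s table -> List.assoc s table
      | System _ | Public _ -> raise Not_competent
    method clone = leaf table
  end

(* Try ?prefer first, then the members in order. *)
let rec combine ?prefer (rs : mini_resolver list) : mini_resolver =
  (* The member that answered the last successful open_in, if any. *)
  let active : mini_resolver option ref = ref None in
  object
    method open_in id =
      let candidates =
        match prefer with
        | Some p -> p :: List.filter (fun r -> r != p) rs
        | None -> rs
      in
      let rec try_each = function
        | [] -> raise Not_competent
        | r :: rest ->
            (try
               let text = r#open_in id in
               active := Some r;
               text
             with Not_competent -> try_each rest)
      in
      try_each candidates
    method clone =
      (* Clone every member; if one was active, prefer its clone. *)
      let pairs = List.map (fun r -> (r, r#clone)) rs in
      let prefer =
        match !active with
        | Some a -> List.assq_opt a pairs
        | None -> None
      in
      combine ?prefer (List.map snd pairs)
  end
```

In this model, cloning after an open_in that the second member answered yields a combination in which that member's clone is tried first.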

