The type source enumerates the two possibilities for where the document to parse comes from.

type source =
    Entity of ((dtd -> Pxp_entity.entity) * Pxp_reader.resolver)
  | ExtID of (ext_id * Pxp_reader.resolver)

You normally need not worry about this type, as there are convenience functions that create source values:

from_file s: The document is read from file s; you may specify absolute or relative path names. The file name must be encoded as a UTF-8 string.
There is an optional argument ~system_encoding specifying the character encoding used for file names in the file system. For example, if this encoding is ISO-8859-1 and s is also an ISO-8859-1 string, you can form the source:
let s_utf8 = recode_string ~in_enc:`Enc_iso88591 ~out_enc:`Enc_utf8 s in -from_file ~system_encoding:`Enc_iso88591 s_utf8
This source has the advantage that it can resolve inner external entities; i.e. if your document includes data from another file (using a SYSTEM identifier), this mode will find that file. However, this mode can resolve neither PUBLIC identifiers nor SYSTEM identifiers other than "file:" URLs.
from_channel ch: The document is read -from the channel ch. In general, this source also supports -file URLs found in the document; however, by default only absolute URLs are -understood. It is possible to associate an ID with the channel such that the -resolver knows how to interpret relative URLs: - -
from_channel ~id:(System "file:///dir/dir1/") ch- -There is also the ~system_encoding argument specifying how file names are -encoded. - The example from above can also be written (but it is no -longer possible to interpret relative URLs because there is no ~id argument, -and computing this argument is relatively complicated because it must -be a valid URL): - -
let ch = open_in s in -let src = from_channel ~system_encoding:`Enc_iso88591 ch in -...; -close_in ch
from_string s: The string s is the document to parse. This mode can neither interpret file names in SYSTEM clauses nor look up PUBLIC identifiers.
Normally, the encoding of the string is detected as usual -by analyzing the XML declaration, if any. However, it is also possible to -specify the encoding directly: - -
let src = from_string ~fixenc:`Enc_iso88592 s
ExtID (id, r): The document to parse -is denoted by the identifier id (either a -SYSTEM or PUBLIC clause), and this -identifier is interpreted by the resolver r. Use this mode -if you have written your own resolver.
Which character sets are possible depends on the passed -resolver r.
Entity (get_entity, r): The document to parse is returned by the function invocation get_entity dtd, where dtd is the DTD object to use (it may be empty). Inner external references occurring in this entity are resolved using the resolver r.
Which character sets are possible depends on the passed -resolver r.
A resolver is an object that can be opened like a file; instead of a file name, however, you pass it the XML identifier of the entity to read from (either a SYSTEM or PUBLIC clause). When opened, the resolver must return the Lexing.lexbuf that reads the characters. The resolver can be closed, and it can be cloned. Furthermore, it is possible to tell the resolver which character set it should assume. The following is from Pxp_reader:

exception Not_competent
exception Not_resolvable of exn

class type resolver =
  object
    method init_rep_encoding : rep_encoding -> unit
    method init_warner : collect_warnings -> unit
    method rep_encoding : rep_encoding
    method open_in : ext_id -> Lexing.lexbuf
    method close_in : unit
    method change_encoding : string -> unit
    method clone : resolver
    method close_all : unit
  end

The resolver object must work as follows:
When the parser is called, it tells the resolver the -warner object and the internal encoding by invoking -init_warner and init_rep_encoding. The -resolver should store these values. The method rep_encoding -should return the internal encoding.
If the parser wants to read from the resolver, it invokes -the method open_in. Either the resolver succeeds, in which -case the Lexing.lexbuf reading from the file or stream must -be returned, or opening fails. In the latter case the method implementation -should raise an exception (see below).
If the parser finishes reading, it calls the -close_in method.
If the parser finds a reference to another external -entity in the input stream, it calls clone to get a second -resolver which must be initially closed (not yet connected with an input -stream). The parser then invokes open_in and the other -methods as described.
If you already know the character set of the input -stream, you should recode it to the internal encoding, and define the method -change_encoding as an empty method.
If you want to support multiple external character sets, the object must follow a much more complicated protocol. Directly after open_in has been called, the resolver must return a lexical buffer that only reads one byte at a time. This is only possible if you create the lexical buffer with Lexing.from_function; the function must then always return 1 if EOF has not yet been reached, and 0 if EOF has been reached. Once the parser has read the first line of the document, it will invoke change_encoding to tell the resolver which character set to assume. From this moment on, the object can return more than one byte at once. The argument of change_encoding is either the value of the "encoding" attribute of the XML declaration, or the empty string if there is no XML declaration or if the declaration does not contain an encoding attribute.
At the beginning the resolver must only return one -character every time something is read from the lexical buffer. The reason for -this is that you otherwise would not exactly know at which position in the -input stream the character set changes.
If you want automatic recognition of the character set, -it is up to the resolver object to implement this.
If an error occurs, the parser calls the method close_all for the top-level resolver; this method should close the resolver itself (if not already done) and all of its clones.
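The one-byte-at-a-time rule for resolvers that support several character sets can be sketched with the standard library alone. This is a minimal illustration, not PXP code: make_byte_lexbuf, unlock and consumed are hypothetical names, the input is an in-memory string, and no recoding is performed. In a real resolver, change_encoding would play the role of unlock.

```ocaml
(* Hypothetical helper, not part of Pxp_reader. Builds a lexbuf over an
   in-memory string. Until [unlock] is called, every refill delivers at
   most one byte, so the position at which the character set becomes known
   is exact. Returns the lexbuf, the unlock function, and a function
   reporting how many bytes have been handed to the lexbuf so far. *)
let make_byte_lexbuf (data : string) =
  let pos = ref 0 in
  let single_byte = ref true in
  let refill buf n =
    let remaining = String.length data - !pos in
    (* Before unlocking: 1 byte per call; afterwards: up to n bytes. *)
    let wanted = if !single_byte then min 1 n else n in
    let len = min wanted remaining in
    Bytes.blit_string data !pos buf 0 len;
    pos := !pos + len;
    len (* 0 signals EOF to the Lexing module *)
  in
  let unlock () = single_byte := false in
  let consumed () = !pos in
  (Lexing.from_function refill, unlock, consumed)
```

A resolver built along these lines would call unlock from its change_encoding method and, from that point on, recode the remaining input to the internal encoding in larger chunks.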
Exceptions. It is possible to chain resolvers such that when the first resolver is not able to open the entity, the other resolvers of the chain are tried in turn. The method open_in should raise the exception Not_competent to indicate that the next resolver should try to open the entity. If the resolver is able to handle the ID, but some other error occurs, the exception Not_resolvable should be raised to force the chain to break.
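The chaining rule can be modelled without PXP. In the sketch below the two exceptions are local stand-ins for the ones in Pxp_reader, and a "resolver" is simplified to a function from a system ID to the entity text; opener, open_first, memory_only and broken_files are hypothetical names. Raising Not_competent when the chain is exhausted is an assumption of this model.

```ocaml
(* Local stand-ins so that the sketch compiles without PXP installed. *)
exception Not_competent
exception Not_resolvable of exn

(* Simplified resolver: maps a system ID to the entity text. Real resolvers
   are objects whose open_in returns a Lexing.lexbuf. *)
type opener = string -> string

(* Try each opener in turn. Not_competent passes control to the next
   opener; any other exception, including Not_resolvable, propagates to the
   caller and thereby breaks the chain. *)
let rec open_first (chain : opener list) (id : string) : string =
  match chain with
  | [] -> raise Not_competent (* assumption: nobody felt responsible *)
  | o :: rest ->
      (match o id with
       | contents -> contents
       | exception Not_competent -> open_first rest id)

(* Knows exactly one in-memory entity. *)
let memory_only : opener = fun id ->
  if id = "mem:greeting" then "<hello/>" else raise Not_competent

(* Claims all "file:" IDs but always fails to read them. *)
let broken_files : opener = fun id ->
  if String.length id >= 5 && String.sub id 0 5 = "file:" then
    raise (Not_resolvable (Failure ("cannot read " ^ id)))
  else raise Not_competent
```

With the chain [broken_files; memory_only], the ID "mem:greeting" is declined by the first opener and served by the second, whereas a "file:" ID stops at broken_files with Not_resolvable and never reaches later openers.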
Example: How to define a resolver that is equivalent to -from_string: ...
There are some classes in Pxp_reader that define common resolver behaviour. - -
class resolve_read_this_channel :
  ?id:ext_id ->
  ?fixenc:encoding ->
  ?auto_close:bool ->
  in_channel ->
    resolver

Reads from the passed channel (it may even be a pipe). If the ~id argument is passed to the object, the created resolver accepts only this ID. Otherwise all IDs are accepted. Once the resolver has been cloned, it does not accept any ID. This means that this resolver cannot handle inner references to external entities. Note that you can combine this resolver with another resolver that can handle inner references (such as resolve_as_file); see class 'combine' below. If you pass the ~fixenc argument, the encoding of the channel is set to the passed value, regardless of any auto-recognition or any XML declaration. If ~auto_close = true (which is the default), the channel is closed after use. If ~auto_close = false, the channel is left open.
class resolve_read_any_channel : - ?auto_close:bool -> - channel_of_id:(ext_id -> (in_channel * encoding option)) -> - resolver- -This resolver calls the function ~channel_of_id to open a -new channel for the passed ext_id. This function must either -return the channel and the encoding, or it must fail with Not_competent. The -function must return None as encoding if the default -mechanism to recognize the encoding should be used. It must return -Some e if it is already known that the encoding of the -channel is e. If ~auto_close = true -(which is the default), the channel is closed after use. If -~auto_close = false, the channel is left open.
class resolve_read_url_channel : - ?base_url:Neturl.url -> - ?auto_close:bool -> - url_of_id:(ext_id -> Neturl.url) -> - channel_of_url:(Neturl.url -> (in_channel * encoding option)) -> - resolver- -When this resolver gets an ID to read from, it calls the function -~url_of_id to get the corresponding URL. This URL may be a -relative URL; however, a URL scheme must be used which contains a path. The -resolver converts the URL to an absolute URL if necessary. The second -function, ~channel_of_url, is fed with the absolute URL as -input. This function opens the resource to read from, and returns the channel -and the encoding of the resource.
Both functions, ~url_of_id and -~channel_of_url, can raise Not_competent to indicate that -the object is not able to read from the specified resource. However, there is a -difference: A Not_competent from ~url_of_id is left as it -is, but a Not_competent from ~channel_of_url is converted to -Not_resolvable. So only ~url_of_id decides which URLs are -accepted by the resolver and which not.
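The different treatment of the two callbacks can be pictured with a small model. It is not the implementation of resolve_read_url_channel: the exceptions are local stand-ins, IDs and URLs are plain strings, the "channel" is the resource text, open_via_url is a hypothetical name, and wrapping Not_competent itself inside Not_resolvable is an assumption.

```ocaml
exception Not_competent          (* stand-in for Pxp_reader.Not_competent *)
exception Not_resolvable of exn  (* stand-in for Pxp_reader.Not_resolvable *)

(* Hypothetical model of the open step of a URL-based resolver. *)
let open_via_url ~url_of_id ~channel_of_url id =
  (* A Not_competent raised by url_of_id escapes unchanged, so a
     surrounding chain of resolvers can try its next member. *)
  let url = url_of_id id in
  (* A Not_competent raised by channel_of_url is converted: the ID was
     accepted, so failing to open it is an error, not a refusal. *)
  try channel_of_url url
  with Not_competent -> raise (Not_resolvable Not_competent)
```

In this model only url_of_id can decline an ID; once it has produced a URL, every failure surfaces as Not_resolvable.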
The function ~channel_of_url must return -None as encoding if the default mechanism to recognize the -encoding should be used. It must return Some e if it is -already known that the encoding of the channel is e.
If ~auto_close = true (which is the default), the channel is -closed after use. If ~auto_close = false, the channel is -left open.
Objects of this class contain a base URL relative to which relative URLs are -interpreted. When creating a new object, you can specify the base URL by -passing it as ~base_url argument. When an existing object is -cloned, the base URL of the clone is the URL of the original object. - Note -that the term "base URL" has a strict definition in RFC 1808.
class resolve_read_this_string : - ?id:ext_id -> - ?fixenc:encoding -> - string -> - resolver- -Reads from the passed string. If the ~id argument is passed -to the object, the created resolver accepts only this ID. Otherwise all IDs are -accepted. - Once the resolver has been cloned, it does not accept any ID. This -means that this resolver cannot handle inner references to external -entities. Note that you can combine this resolver with another resolver that -can handle inner references (such as resolve_as_file); see class 'combine' -below. - If you pass the ~fixenc argument, the encoding of -the string is set to the passed value, regardless of any auto-recognition or -any XML declaration.
class resolve_read_any_string : - string_of_id:(ext_id -> (string * encoding option)) -> - resolver- -This resolver calls the function ~string_of_id to get the -string for the passed ext_id. This function must either -return the string and the encoding, or it must fail with Not_competent. The -function must return None as encoding if the default -mechanism to recognize the encoding should be used. It must return -Some e if it is already known that the encoding of the -string is e.
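A function of the shape expected for ~string_of_id might look like the following sketch. To keep it compilable without PXP, ext_id, encoding and Not_competent are declared locally as simplified stand-ins for the library's types; catalog and its entries are hypothetical.

```ocaml
(* Simplified local stand-ins for the PXP types. *)
type ext_id = System of string | Public of string * string
type encoding = [ `Enc_utf8 | `Enc_iso88591 ]
exception Not_competent

(* Hypothetical in-memory catalog: system ID -> (text, known encoding). *)
let catalog : (string, string * encoding option) Hashtbl.t = Hashtbl.create 8

let () =
  (* Encoding known in advance: return Some e. *)
  Hashtbl.replace catalog "mem:header" ("<header/>", Some `Enc_utf8);
  (* Encoding left to the default recognition mechanism: return None. *)
  Hashtbl.replace catalog "mem:body"
    ("<?xml version='1.0' encoding='ISO-8859-1'?><body/>", None)

(* Return the string and its encoding, or fail with Not_competent for any
   ID this function does not know. *)
let string_of_id (id : ext_id) : string * encoding option =
  match id with
  | System sysid ->
      (match Hashtbl.find_opt catalog sysid with
       | Some entry -> entry
       | None -> raise Not_competent)
  | Public _ -> raise Not_competent
```

Because unknown IDs raise Not_competent rather than some other exception, a resolver built from such a function can be placed in a chain with other resolvers.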
class resolve_as_file : - ?file_prefix:[ `Not_recognized | `Allowed | `Required ] -> - ?host_prefix:[ `Not_recognized | `Allowed | `Required ] -> - ?system_encoding:encoding -> - ?url_of_id:(ext_id -> Neturl.url) -> - ?channel_of_url: (Neturl.url -> (in_channel * encoding option)) -> - unit -> - resolver-Reads from the local file system. Every file name is interpreted as -file name of the local file system, and the referred file is read.
The full form of a file URL is: file://host/path, where 'host' specifies the host system where the file identified by 'path' resides. host = "" or host = "localhost" are accepted; other values will raise Not_competent. The standard for file URLs is defined in RFC 1738.
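The host rule for the full form can be illustrated as follows. This is a sketch only: path_of_file_url is a hypothetical helper, Not_competent is a local stand-in, only the full form file://host/path is handled, and no percent-decoding or file name recoding is attempted.

```ocaml
exception Not_competent  (* stand-in for Pxp_reader.Not_competent *)

(* Hypothetical helper: extracts the path from "file://host/path" and
   accepts only an empty host or "localhost". *)
let path_of_file_url (url : string) : string =
  let prefix = "file://" in
  let plen = String.length prefix in
  if String.length url < plen || String.sub url 0 plen <> prefix then
    raise Not_competent
  else
    let rest = String.sub url plen (String.length url - plen) in
    (* The first '/' after the prefix separates host and path. *)
    match String.index_opt rest '/' with
    | None -> raise Not_competent
    | Some i ->
        let host = String.sub rest 0 i in
        let path = String.sub rest i (String.length rest - i) in
        if host = "" || host = "localhost" then path
        else raise Not_competent
```

For instance, both "file:///tmp/a.xml" and "file://localhost/tmp/a.xml" yield the path "/tmp/a.xml", while a URL naming any other host is rejected with Not_competent.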
Option ~file_prefix: Specifies how the "file:" prefix of -file names is handled: -
`Not_recognized: The prefix is not recognized.
`Allowed: The prefix is allowed but -not required (the default).
`Required: The prefix is -required.
Option ~host_prefix: Specifies how the "//host" phrase of -file names is handled: -
`Not_recognized: The prefix is not recognized.
`Allowed: The prefix is allowed but -not required (the default).
`Required: The prefix is -required.
Option ~system_encoding: Specifies the encoding of file -names of the local file system. Default: UTF-8.
Options ~url_of_id, ~channel_of_url: Not -for the casual user!
class combine : - ?prefer:resolver -> - resolver list -> - resolver- -Combines several resolver objects. If a concrete entity with an -ext_id is to be opened, the combined resolver tries the -contained resolvers in turn until a resolver accepts opening the entity -(i.e. it does not raise Not_competent on open_in).
Clones: If the 'clone' method is invoked before 'open_in', all contained -resolvers are cloned separately and again combined. If the 'clone' method is -invoked after 'open_in' (i.e. while the resolver is open), additionally the -clone of the active resolver is flagged as being preferred, i.e. it is tried -first.