Here is a description of the Pxp_yacc module.
The following defaults are available:

    val default_config : config
    val default_extension : ('a node extension) as 'a
    val default_spec : ('a node extension as 'a) spec
In the following, the term "closed document" refers to an XML structure like

    <!DOCTYPE ... [ declarations ] >
    <root>
    ...
    </root>

The term "fragment" refers to an XML structure like

    <root>
    ...
    </root>

i.e. a single isolated element instance.
    val parse_dtd_entity : config -> source -> dtd

Parses the declarations contained in the entity, and returns them as a dtd object.
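A minimal sketch of using this function, assuming the from_file source constructor of Pxp_yacc; the file name is made up:

```ocaml
(* Sketch: parse an external DTD file into a dtd object.
   "sample.dtd" is a hypothetical file name. *)
open Pxp_yacc

let dtd = parse_dtd_entity default_config (from_file "sample.dtd")
```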
    val extract_dtd_from_document_entity : config -> source -> dtd

Extracts the DTD from a closed document. Both the internal and the external subsets are extracted and combined into one dtd object. This function does not parse the whole document, but only the parts that are necessary to extract the DTD.
    val parse_document_entity :
      ?transform_dtd:(dtd -> dtd) ->
      ?id_index:('ext index) ->
      config ->
      source ->
      'ext spec ->
      'ext document

Parses a closed document and validates it against the DTD that is contained in the document (internal and external subsets). The option ~transform_dtd can be used to transform the DTD of the document, and to use the transformed DTD for validation. If ~id_index is specified, an index of all ID attributes is created.
    val parse_wfdocument_entity :
      config ->
      source ->
      'ext spec ->
      'ext document

Parses a closed document, but checks only that it is well-formed.
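A hedged sketch, assuming the from_string source constructor of Pxp_yacc; the XML text is invented for illustration:

```ocaml
(* Sketch: parse a closed document in well-formedness mode only;
   no DTD is required and no validation is performed. *)
open Pxp_yacc

let doc =
  parse_wfdocument_entity
    default_config
    (from_string "<root><child>data</child></root>")
    default_spec
```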
    val parse_content_entity :
      ?id_index:('ext index) ->
      config ->
      source ->
      dtd ->
      'ext spec ->
      'ext node

Parses a fragment, and validates the element.
    val parse_wfcontent_entity :
      config ->
      source ->
      'ext spec ->
      'ext node

Parses a fragment, but checks only that it is well-formed.
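The fragment parsers combine naturally with parse_dtd_entity: first parse the DTD, then validate an isolated element against it. A sketch under the same assumptions as above (from_file exists; the file names are hypothetical):

```ocaml
open Pxp_yacc

(* Parse the DTD once, then validate a fragment against it. *)
let dtd = parse_dtd_entity default_config (from_file "sample.dtd")

let root =
  parse_content_entity
    default_config (from_file "fragment.xml") dtd default_spec
```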
    type config =
      { warner : collect_warnings;
        errors_with_line_numbers : bool;
        enable_pinstr_nodes : bool;
        enable_super_root_node : bool;
        enable_comment_nodes : bool;
        encoding : rep_encoding;
        recognize_standalone_declaration : bool;
        store_element_positions : bool;
        idref_pass : bool;
        validate_by_dfa : bool;
        accept_only_deterministic_models : bool;
        ...
      }
warner: The parser reports warnings by invoking the warn method of this warner object. (Default: all warnings are dropped)
errors_with_line_numbers: If true, errors contain line numbers; if false, errors contain only byte positions. The latter mode is faster. (Default: true)
enable_pinstr_nodes: If true, the parser creates extra nodes for processing instructions. If false, processing instructions are simply added to the element or document surrounding the instructions. (Default: false)
enable_super_root_node: If true, the parser creates an extra node which is the parent of the root of the document tree. This node is called the super root; it is an element with type T_super_root. If there are processing instructions outside the root element and outside the DTD, they are added to the super root instead of the document. If false, the super root node is not created. (Default: false)
enable_comment_nodes: If true, the parser creates nodes for comments with type T_comment; if false, such nodes are not created. (Default: false)
encoding: Specifies the internal encoding of the parser. Most strings are then represented according to this encoding; however, there are some exceptions (especially ext_id values, which are always UTF-8 encoded). (Default: `Enc_iso88591)
recognize_standalone_declaration: If true and the parser is validating, a standalone="yes" declaration forces a check whether the document really is a standalone document. If false, or if the parser is in well-formedness mode, such declarations are ignored. (Default: true)
store_element_positions: If true, the source position is stored for every non-data node. If false, the position information is lost. If available, you can get the positions of nodes by invoking the position method. (Default: true)
idref_pass: If true and there is an ID index, the parser checks whether every IDREF or IDREFS attribute refers to an existing node; this requires that the parser traverse the whole document tree. If false, this check is omitted. (Default: false)
validate_by_dfa: If true and the content model of an element type is deterministic, a deterministic finite automaton is used to validate whether the element contents match the content model of the type. If false, or if a DFA is not available, a backtracking algorithm is used for validation. (Default: true)
accept_only_deterministic_models: If true, only deterministic content models are accepted; if false, any syntactically correct content model can be processed. (Default: true)
First, I recommend varying the default configuration instead of creating a new configuration record. For instance, to set idref_pass to true, change the default as in:

    let config = { default_config with idref_pass = true }

The background is that I can then add more options to the record in future versions of the parser without breaking your programs.
Do I need extra nodes for processing instructions? By default, such nodes are not created. This does not mean that the processing instructions are lost; however, you cannot find out the exact location where they occur. For example, the following XML text

    <x><?pi1?><y/><?pi2?></x>

will normally create one element node for x containing one subnode for y. The processing instructions are attached to x in a separate hash table; you can access them using x # pinstr "pi1" and x # pinstr "pi2", respectively. The information about where the instructions occur within x is lost.
If the option enable_pinstr_nodes is turned on, the parser creates extra nodes pi1 and pi2 such that the subnodes of x are now:

    x # sub_nodes = [ pi1; y; pi2 ]

The extra nodes contain the processing instructions in the usual way, i.e. you can access them using pi1 # pinstr "pi1" and pi2 # pinstr "pi2", respectively.
Note that you will need an exemplar for the PI nodes (see -make_spec_from_alist).
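A sketch of how such a spec might be built, based on my reading of the Pxp_yacc and Pxp_document interfaces (make_spec_from_alist, element_impl, data_impl); the exemplar choices are assumptions, so check the signature shipped with your PXP version:

```ocaml
open Pxp_document
open Pxp_yacc

(* Sketch: build a spec that also provides a default exemplar for
   PI nodes, so enable_pinstr_nodes can create them. The concrete
   exemplar classes used here are assumptions, not from the manual. *)
let spec_with_pinstr =
  make_spec_from_alist
    ~default_pinstr_exemplar:(new element_impl default_extension)
    ~data_exemplar:(new data_impl default_extension)
    ~default_element_exemplar:(new element_impl default_extension)
    ~element_alist:[]
    ()
```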
Do I need a super root node? By default, there is no super root node. The document object refers directly to the node representing the root element of the document, i.e.

    doc # root = r

if r is the root node. This is sometimes inconvenient: (1) Some algorithms become simpler if every node has a parent, even the root node. (2) Some standards such as XPath call that node the "root node" whose child represents the root element of the document. (3) The super root node can serve as a container for processing instructions outside the root element. For these reasons, it is possible to create an extra super root node whose child is the root node:

    doc # root = sr &&
    sr # sub_nodes = [ r ]

When extra nodes are also created for processing instructions, these nodes can be added to the super root node if they occur outside the root element (reason (3)), and their order reflects the order in the source text.
Note that you will need an exemplar for the super root node -(see make_spec_from_alist).
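Enabling the super root (and, if PI positions matter, PI nodes as well) is again just a variation of the default configuration:

```ocaml
open Pxp_yacc

(* Create a super root node above the document root; also create
   PI nodes so that processing instructions outside the root element
   keep their position. *)
let config =
  { default_config with
      enable_super_root_node = true;
      enable_pinstr_nodes = true;
  }
```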
What is the effect of the UTF-8 encoding? By default, the parser represents strings (with few -exceptions) as ISO-8859-1 strings. These are well-known, and there are tools -and fonts for this encoding.
However, internationalization may require that you switch over to UTF-8 encoding. In most environments, the immediate effect will be that you can no longer read strings with character codes >= 160; your terminal will only show funny glyph combinations. It is strongly recommended to install Unicode fonts (GNU Unifont, Markus Kuhn's fonts) and terminal emulators that can handle UTF-8 byte sequences. Furthermore, a Unicode editor may be helpful (such as Yudit). There is also a UTF-8 FAQ by Markus Kuhn.
By setting encoding to `Enc_utf8, all strings originating from the parsed XML document are represented as UTF-8 strings. This includes not only character data and attribute values but also element names, attribute names, and so on, as it is possible to use any Unicode letter to form such names. Strictly speaking, PXP is only XML-compliant if the UTF-8 mode is used; otherwise it will have difficulties when validating documents containing names with non-ISO-8859-1 characters.
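Switching the internal representation is a one-line change to the configuration:

```ocaml
open Pxp_yacc

(* All strings delivered by the parser are now UTF-8 encoded. *)
let config = { default_config with encoding = `Enc_utf8 }
```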
This mode does not have any impact on the external representation of documents. The character set assumed when reading a document is set in the XML declaration, and the character set used when writing a document must be passed to the write method.
How do I check that the nodes referred to by IDREF attributes exist? First, you must create an index of all occurring ID attributes:

    let index = new hash_index

This index must be passed to the parsing function:

    parse_document_entity
      ~id_index:(index :> index)
      config source spec

Next, you must turn on the idref_pass mode:

    let config = { default_config with idref_pass = true }

Note that now the whole document tree will be traversed, and every node will be checked for IDREF and IDREFS attributes. If the tree is big, this may take some time.
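Put together, the whole check might look like this sketch (the file name "doc.xml" is made up; the coercion follows the manual's snippet):

```ocaml
open Pxp_yacc

(* Sketch: collect all ID attributes in an index and let the
   idref_pass check every IDREF/IDREFS attribute against it.
   The parser reports an error if a dangling reference is found. *)
let index = new hash_index

let config = { default_config with idref_pass = true }

let doc =
  parse_document_entity
    ~id_index:(index :> index)
    config
    (from_file "doc.xml")
    default_spec
```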
What are deterministic content models? This type of model can speed up the validation checks; furthermore, it ensures SGML compatibility. In particular, a content model is deterministic if the parser can determine the alternative actually used by inspecting only the current token. For example, this element has non-deterministic contents:

    <!ELEMENT x ((u,v) | (u,y+) | v)>

If the first element in x is u, the parser does not know which of the alternatives (u,v) or (u,y+) will match; it must also inspect the second element to be able to distinguish between them. Because such look-ahead (or "guessing") is required, this example is non-deterministic.
The XML standard demands that content models be deterministic. It is therefore recommended to turn on the option accept_only_deterministic_models; however, PXP can also process non-deterministic models using a backtracking algorithm.
Deterministic models ensure that validation can be performed in linear time. To get the maximum benefit, PXP also implements a special validator that takes advantage of deterministic models: a deterministic finite automaton (DFA). This validator is enabled per element type if the element type has a deterministic model and the option validate_by_dfa is turned on.
In general, I expect the DFA method to be faster than the backtracking method; in particular, even in the worst case the DFA takes only linear time. However, if the content model has only a few alternatives and the alternatives do not nest, the backtracking algorithm may be faster.