X-Git-Url: http://matita.cs.unibo.it/gitweb/?a=blobdiff_plain;f=helm%2FDEVEL%2Fpxp%2Fpxp%2Fdoc%2Fmanual%2Fhtml%2Fx550.html;fp=helm%2FDEVEL%2Fpxp%2Fpxp%2Fdoc%2Fmanual%2Fhtml%2Fx550.html;h=0000000000000000000000000000000000000000;hb=3ef089a4c58fbe429dd539af6215991ecbe11ee2;hp=f2dcdd79b66940c8fef7e250ca3486057c7b47f4;hpb=1c7fb836e2af4f2f3d18afd0396701f2094265ff;p=helm.git diff --git a/helm/DEVEL/pxp/pxp/doc/manual/html/x550.html b/helm/DEVEL/pxp/pxp/doc/manual/html/x550.html deleted file mode 100644 index f2dcdd79b..000000000 --- a/helm/DEVEL/pxp/pxp/doc/manual/html/x550.html +++ /dev/null @@ -1,765 +0,0 @@ -
Let me first give a rough overview of the object model of the parser. The following items are represented by objects:
Documents: The document representation is more or less the anchor for the application; all accesses to the parsed entities start here. It is described by the class document contained in the module Pxp_document. You can get some global information, such as the XML declaration the document begins with, the DTD of the document, global processing instructions, and, most importantly, the document tree.
The contents of documents: The contents have the structure of a tree: elements contain other elements and text[1]. The common type to represent both kinds of content is node, which is a class type that unifies the properties of elements and character data. Every node has a list of children (which is empty if the element is empty or the node represents text); nodes may have attributes; nodes always have text contents. There are two implementations of node: the class element_impl for elements, and the class data_impl for text data. You can find these classes and class types in the module Pxp_document, too.
Note that attribute lists are represented by non-class values.
The node extension: For advanced usage, every node of the document may have an associated extension, which is simply a second object. This object must have at least the three methods clone, node, and set_node, but you are free to add further methods. This is the preferred way to add functionality to the document tree[2]. The class type extension is defined in Pxp_document, too.
The DTD: Sometimes it is necessary to access the DTD of a document; the average application does not need this feature. The class dtd describes DTDs, and makes it possible to get representations of element, entity, and notation declarations as well as processing instructions contained in the DTD. This class, and dtd_element, dtd_notation, and proc_instruction can be found in the module Pxp_dtd. There are a couple of classes representing different kinds of entities; these can be found in the module Pxp_entity.
Pxp_yacc: The main parsing functions, such as parse_document_entity, are located here. Some additional types and functions allow the parser to be configured in a non-standard way.
Pxp_types: This is a collection of basic types and exceptions.
Let the document to be parsed be stored in a file called doc.xml. The parsing process is started by calling the function

    val parse_document_entity : config -> source -> 'ext spec -> 'ext document

defined in the module Pxp_yacc. The first argument specifies some global properties of the parser; it is recommended to start with the default_config. The second argument determines where the document to be parsed comes from; this may be a file, a channel, or an entity ID. To parse doc.xml, it is sufficient to pass from_file "doc.xml".
The third argument passes the object specification to use. Roughly speaking, it determines which classes implement the node objects of which element types, and which extensions are to be used. The 'ext polymorphic variable is the type of the extension. For the moment, let us simply pass default_spec as this argument, and ignore it.
So the following expression parses doc.xml:

    open Pxp_yacc
    let d = parse_document_entity default_config (from_file "doc.xml") default_spec

Note that default_config implies that warnings are collected but not printed. Errors raise one of the exceptions defined in Pxp_types; to get readable errors and warnings, catch the exceptions as follows:

    class warner =
      object
        method warn w =
          print_endline ("WARNING: " ^ w)
      end
    ;;

    try
      let config = { default_config with warner = new warner } in
      let d = parse_document_entity config (from_file "doc.xml") default_spec
      in
        ...
    with
        e ->
          print_endline (Pxp_types.string_of_exn e)

Now d is an object of the document class. If you want the node tree, you can get the root element by

    let root = d # root

and if you would rather access the DTD, you can obtain it by

    let dtd = d # dtd

As it is more interesting, let us investigate the node tree now. Given the root element, it is possible to recursively traverse the whole tree. The children of a node n are returned by the method sub_nodes, and the type of a node is returned by node_type. This function traverses the tree, and prints the type of each node:

    let rec print_structure n =
      let ntype = n # node_type in
      match ntype with
        T_element name ->
          print_endline ("Element of type " ^ name);
          let children = n # sub_nodes in
          List.iter print_structure children
      | T_data ->
          print_endline "Data"
      | _ ->
          (* Other node types are not possible unless the parser is configured
             differently.
           *)
          assert false

You can call this function by

    print_structure root

The type returned by node_type is either T_element name or T_data. The name of the element type is the string included in the angle brackets. Note that only elements have children; data nodes are always leaves of the tree.
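The same recursion can compute a value instead of printing. The following is a minimal, self-contained sketch and not PXP code: the class type node and the helpers make_element and make_data are hypothetical stand-ins that only mimic the node_type and sub_nodes methods described above, so that the example runs without the library. The function name count_elements is likewise chosen for illustration.

```ocaml
(* Hypothetical stand-ins for illustration only; the real types are
   defined in Pxp_document and offer many more methods. *)
type node_type =
  | T_element of string
  | T_data

class type node = object
  method node_type : node_type
  method sub_nodes : node list
end

(* Helper constructors for the stand-in tree (not part of PXP). *)
let make_element name (children : node list) : node = object
  method node_type = T_element name
  method sub_nodes : node list = children
end

let make_data () : node = object
  method node_type = T_data
  method sub_nodes : node list = []
end

(* Count the element nodes in the subtree rooted at n; data nodes are
   leaves and contribute nothing. *)
let rec count_elements (n : node) : int =
  match n # node_type with
  | T_element _ ->
      List.fold_left (fun acc c -> acc + count_elements c) 1 (n # sub_nodes)
  | T_data -> 0

(* A tree shaped like <a><b>text</b><c/>text</a>. *)
let example : node =
  make_element "a"
    [ make_element "b" [ make_data () ];
      make_element "c" [];
      make_data () ]

let () = Printf.printf "%d elements\n" (count_elements example)
```

With the real library, the function would presumably be applied to d # root, and the match would need a catch-all case as in print_structure, because further node types can occur when the parser is configured differently.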
There are some more methods for accessing a parsed node tree:
n # parent: Returns the parent node, or raises Not_found if the node is already the root.
n # root: Returns the root of the node tree.
n # attribute a: Returns the value of the attribute with name a. The method returns a value for every declared attribute, regardless of whether the attribute instance is defined or not. If the attribute is not declared, Not_found will be raised. (In well-formedness mode, every attribute is considered as being implicitly declared with type CDATA.)
The following return values are possible: Value s, Valuelist sl, and Implied_value. The first two value types indicate that the attribute value is available, either because there is a definition a="value" in the XML text, or because there is a default value (declared in the DTD). Implied_value is returned only if both the instance definition and the default declaration are missing.
In the DTD, every attribute is typed. There are single-value types (CDATA, ID, IDREF, ENTITY, NMTOKEN, enumerations), in which case the method passes Value s back, where s is the normalized string value of the attribute. The other types (IDREFS, ENTITIES, NMTOKENS) represent list values, and the parser splits the XML literal into several tokens and returns these tokens as Valuelist sl.
Normalization means that entity references (the &name; tokens) and character references (&#number;) are replaced by the text they represent, and that white space characters are converted into plain spaces.
n # data: Returns the character data contained in the node. For data nodes, the meaning is obvious as this is the main content of data nodes. For element nodes, this method returns the concatenated contents of all inner data nodes.
Note that entity references included in the text are resolved while they are being parsed; for example, the text "a &lt;&gt; b" will be returned as "a <> b" by this method. Spaces of data nodes are always preserved. Newlines are preserved, but always converted to \n characters even if newlines are encoded as \r\n or \r. Normally you will never see two adjacent data nodes because the parser collapses all data material at one location into one node. (However, if you create your own tree or transform the parsed tree, it is possible to have adjacent data nodes.)
Note that elements that do not allow #PCDATA as content will not have data nodes as children. This means that spaces and newlines, the only character material allowed for such elements, are silently dropped.
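The three possible results of n # attribute a can be handled with a single match. The sketch below is self-contained: the type att_value is a local copy assumed to mirror the constructors named above (in PXP the type comes with the library), and describe_attribute is a hypothetical helper introduced only for this example.

```ocaml
(* Local copy of the three constructors described above, so that the
   sketch runs without PXP; real code would use the library's own type. *)
type att_value =
  | Value of string
  | Valuelist of string list
  | Implied_value

(* Hypothetical helper: render an attribute value as a readable string. *)
let describe_attribute (v : att_value) : string =
  match v with
  | Value s -> "single value: " ^ s
  | Valuelist sl -> "list value: " ^ String.concat ", " sl
  | Implied_value -> "no value (implied)"

let () =
  List.iter
    (fun v -> print_endline (describe_attribute v))
    [ Value "1"; Valuelist [ "a1"; "a2" ]; Implied_value ]
```

Matching on all three constructors lets the compiler check that the Implied_value case is not forgotten, which is easy to overlook for attributes that have neither an instance definition nor a default.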

    let rec print_valuable_prio1 n =
      let ntype = n # node_type in
      match ntype with
        T_element "valuable" when n # attribute "priority" = Value "1" ->
          print_endline "Valuable node with priority 1 found:";
          print_endline (n # data)
      | (T_element _ | T_data) ->
          let children = n # sub_nodes in
          List.iter print_valuable_prio1 children
      | _ ->
          assert false

You can call this function by:

    print_valuable_prio1 root

If you like a DSSSL-like style, you can make the function process_children explicit:

    let rec print_valuable_prio1 n =

      let process_children n =
        let children = n # sub_nodes in
        List.iter print_valuable_prio1 children
      in

      let ntype = n # node_type in
      match ntype with
        T_element "valuable" when n # attribute "priority" = Value "1" ->
          print_endline "Valuable node with priority 1 found:";
          print_endline (n # data)
      | (T_element _ | T_data) ->
          process_children n
      | _ ->
          assert false

Used this way, O'Caml serves as a simple "style-sheet language": you can form a big "match" expression to distinguish between all significant cases, and provide different reactions to different conditions. But this technique has limitations; the "match" expression tends to get larger and larger, and it is difficult to store intermediate values as there is only one big recursion. Alternatively, it is also possible to represent the various cases as classes, and to use dynamic method lookup to find the appropriate class. The next section explains this technique in detail.
[1] Elements may also contain processing instructions. Unlike other document models, PXP separates processing instructions from the rest of the text and provides a second interface to access them (method pinstr). However, there is a parser option (enable_pinstr_nodes) which changes the behaviour of the parser such that extra nodes for processing instructions are included into the tree. Furthermore, the tree normally does not contain nodes for XML comments; they are ignored by default. Again, there is an option (enable_comment_nodes) changing this.
[2] Due to the typing system it is more or less impossible to derive recursive classes in O'Caml. To get around this, it is common practice to put the modifiable or extensible part of recursive objects into parallel objects.