X-Git-Url: http://matita.cs.unibo.it/gitweb/?a=blobdiff_plain;f=helm%2FDEVEL%2Fpxp%2Fpxp%2Fdoc%2Fmanual%2Fhtml%2Fx675.html;fp=helm%2FDEVEL%2Fpxp%2Fpxp%2Fdoc%2Fmanual%2Fhtml%2Fx675.html;h=cf3f4737ce506b5f6385bf097cd7614209e72a3c;hb=c03d2c1fdab8d228cb88aaba5ca0f556318bebc5;hp=0000000000000000000000000000000000000000;hpb=758057e85325f94cd88583feb1fdf6b038e35055;p=helm.git diff --git a/helm/DEVEL/pxp/pxp/doc/manual/html/x675.html b/helm/DEVEL/pxp/pxp/doc/manual/html/x675.html new file mode 100644 index 000000000..cf3f4737c --- /dev/null +++ b/helm/DEVEL/pxp/pxp/doc/manual/html/x675.html @@ -0,0 +1,538 @@ +
By default, the parsed node tree consists of objects of the same class; this is +a good design as long as you want only to access selected parts of the +document. For complex transformations, it may be better to use different +classes for objects describing different element types.
For example, if the DTD declares the element types a, +b, and c, and if the task is to convert +an arbitrary document into a printable format, the idea is to define for every +element type a separate class that has a method print. The +classes are eltype_a, eltype_b, and +eltype_c, and every class implements +print such that elements of the type corresponding to the +class are converted to the output format.
The parser supports such a design directly. As it is impossible to derive +recursive classes in O'Caml[1], the specialized element classes cannot be formed by +simply inheriting from the built-in classes of the parser and adding methods +for customized functionality. To get around this limitation, every node of the +document tree is represented by two objects: one called +"the node", which contains the recursive definition of the tree, and one called "the +extension". Every node object has a reference to the extension, and the +extension has a reference to the node. The advantage of this model is that it +is now possible to customize the extension without affecting the typing +constraints of the recursive node definition.
Every extension must have the three methods clone, +node, and set_node. The method +clone creates a deep copy of the extension object and +returns it; node returns the node object for this extension +object; and set_node is used to tell the extension object +which node is associated with it; this method is called automatically when the +node tree is initialized. The following definition is a good starting point +for these methods; usually clone must be further refined +when instance variables are added to the class: + +
class custom_extension =
+  object (self)
+
+    val mutable node = (None : custom_extension node option)
+
+    method clone = {< >}
+    method node =
+      match node with
+          None ->
+            assert false    (* set_node has not been called yet *)
+        | Some n -> n
+    method set_node n =
+      node <- Some n
+
+  end
+
+This part of the extension is usually the same for all classes, so it is a good +idea to consider custom_extension as the super-class of the +further class definitions. Continuing the example above, we can define the +element type classes as follows: + +
class virtual custom_extension =
+  object (self)
+    ... clone, node, set_node defined as above ...
+
+    method virtual print : out_channel -> unit
+  end
+
+class eltype_a =
+  object (self)
+    inherit custom_extension
+    method print ch = ...
+  end
+
+class eltype_b =
+  object (self)
+    inherit custom_extension
+    method print ch = ...
+  end
+
+class eltype_c =
+  object (self)
+    inherit custom_extension
+    method print ch = ...
+  end
+
+The method print can now be implemented for every element +type separately. Note that you get the associated node by invoking + +
self # node+ +and you get the extension object of a node n by writing + +
n # extension+ +It is guaranteed that + +
self # node # extension == self+ +always holds.
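The mutual references between node and extension can be tried out without PXP. The following is only a minimal sketch in plain OCaml: the types ext_t and node_t, the classes base_ext and ext_a, the method describe, and the function make_node are hypothetical stand-ins invented for illustration, not part of the PXP API.

```ocaml
(* Toy stand-ins for the node/extension pairing; NOT the PXP API. *)
type ext_t =
  < clone : ext_t;
    node : node_t;
    set_node : node_t -> unit;
    describe : string >
and node_t =
  < name : string;
    extension : ext_t >

(* The part every extension shares: clone, node, set_node. *)
class virtual base_ext =
  object
    val mutable node : node_t option = None
    method clone = {< >}
    method node =
      match node with
      | Some n -> n
      | None -> failwith "extension not attached to a node"
    method set_node n = node <- Some n
    method virtual describe : string
  end

(* A customized extension that reaches its node through self # node. *)
class ext_a =
  object (self)
    inherit base_ext
    method describe = "element " ^ self # node # name
  end

(* Build a node and link the extension back to it (hypothetical helper). *)
let make_node (name : string) (ext : ext_t) : node_t =
  let n : node_t =
    object
      method name = name
      method extension = ext
    end
  in
  ext # set_node n;
  n

let () =
  let ext = (new ext_a :> ext_t) in
  let n = make_node "a" ext in
  (* The invariant stated above, checked with physical equality: *)
  assert (n # extension # node # extension == n # extension);
  assert (ext # node == n);
  print_endline (n # extension # describe)
```

The customized method describe lives entirely in the extension, so the node type stays fixed no matter how many extension classes are added.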
Here are sample definitions of the print +methods: + +
class eltype_a =
+  object (self)
+    inherit custom_extension
+    method print ch =
+      (* Nodes <a>...</a> are only containers: *)
+      output_string ch "(";
+      List.iter
+        (fun n -> n # extension # print ch)
+        (self # node # sub_nodes);
+      output_string ch ")";
+  end
+
+class eltype_b =
+  object (self)
+    inherit custom_extension
+    method print ch =
+      (* Print the value of the CDATA attribute "print": *)
+      match self # node # attribute "print" with
+        Value s       -> output_string ch s
+      | Implied_value -> output_string ch "<missing>"
+      | Valuelist l   -> assert false
+          (* not possible because the att is CDATA *)
+  end
+
+class eltype_c =
+  object (self)
+    inherit custom_extension
+    method print ch =
+      (* Print the contents of this element: *)
+      output_string ch (self # node # data)
+  end
+
+class null_extension =
+  object (self)
+    inherit custom_extension
+    method print ch = assert false
+  end
The remaining task is to configure the parser such that these extension classes +are actually used. Here another problem arises: It is not possible to +dynamically select the class of an object to be created. As a workaround, +PXP allows the user to specify exemplar objects for +the various element types; instead of creating the nodes of the tree by +applying the new operator, the nodes are produced by +duplicating the exemplars. As object duplication preserves the class of the +object, one can create fresh objects of every class for which an +exemplar has previously been registered.
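The exemplar idea can be illustrated in plain OCaml. This is only a sketch under stated assumptions: the class type toy_node, the classes node_a and node_b, the table exemplars, and the function create are hypothetical names, and the standard library function Oo.copy is used here as a stand-in for whatever duplication PXP performs internally.

```ocaml
(* Toy illustration of exemplar-based object creation; NOT the PXP API. *)
class type toy_node =
  object
    method kind : string
    method data : string
    method set_data : string -> unit
  end

class node_a =
  object
    val mutable data = ""
    method kind = "a"
    method data = data
    method set_data d = data <- d
  end

(* A second class with the same interface but different behaviour. *)
class node_b =
  object
    inherit node_a
    method kind = "b"
  end

(* One empty exemplar per element type, chosen at run time by name. *)
let exemplars : (string, toy_node) Hashtbl.t = Hashtbl.create 8

let () =
  Hashtbl.add exemplars "a" (new node_a);
  Hashtbl.add exemplars "b" (new node_b)

(* Duplicate the registered exemplar instead of calling [new].
   Raises Not_found if no exemplar is registered for [eltype]. *)
let create (eltype : string) (data : string) : toy_node =
  let fresh = Oo.copy (Hashtbl.find exemplars eltype) in
  fresh # set_data data;
  fresh

let () =
  let n = create "b" "hello" in
  Printf.printf "%s: %s\n" (n # kind) (n # data)
```

The copy keeps the class of the exemplar, so the string key selects the behaviour, while the exemplar itself stays empty and can be duplicated again.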
Exemplars are meant to be objects without contents; the only interesting thing is +that they are instances of a certain class. An exemplar +for an element node can be created by: + +
let element_exemplar = new element_impl extension_exemplar+ +And a data node exemplar is created by: + +
let data_exemplar = new data_impl extension_exemplar+ +The classes element_impl and data_impl +are defined in the module Pxp_document. The constructors +initialize the fresh objects as empty objects, i.e. without children, without +data contents, and so on. The extension_exemplar is the +initial extension object the exemplars are associated with.
Once the exemplars are created and stored somewhere (e.g. in a hash table), you +can take an exemplar and create a concrete instance (with contents) by +duplicating it. As a user of the parser you are normally not concerned with this, +as it is part of the internal logic of the parser, but as background knowledge +it is worthwhile to mention that the two methods +create_element and create_data actually +perform the duplication of the exemplar for which they are invoked, +additionally apply modifications to the clone, and finally return the new +object. Moreover, the extension object is copied, too, and the new node object +is associated with the fresh extension object. Note that this is the reason why +every extension object must have a clone method.
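Why the extension needs its own clone method can be seen in a small plain-OCaml sketch. Everything here is an assumption made for illustration: ext_t, node_t, counting_ext, make_node, and create_like are hypothetical names, and create_like only imitates the behaviour described above for create_element and create_data; it is not PXP's implementation.

```ocaml
(* Toy stand-ins; these types and functions are NOT the PXP API. *)
type ext_t =
  < clone : ext_t;
    node : node_t;
    set_node : node_t -> unit;
    bump : unit;
    count : int >
and node_t =
  < extension : ext_t;
    data : string >

(* An extension with per-node state (a counter). *)
class counting_ext =
  object
    val mutable node : node_t option = None
    val mutable count = 0
    method clone = {< >}
    method node =
      match node with
      | Some n -> n
      | None -> failwith "extension not attached to a node"
    method set_node n = node <- Some n
    method bump = count <- count + 1
    method count = count
  end

(* Build a node around an extension and link the two objects. *)
let make_node (ext : ext_t) (data : string) : node_t =
  let n : node_t =
    object
      method extension = ext
      method data = data
    end
  in
  ext # set_node n;
  n

(* Hypothetical analogue of create_element / create_data: the new node
   gets a clone of the exemplar's extension, re-linked to the new node. *)
let create_like (exemplar : node_t) (data : string) : node_t =
  make_node (exemplar # extension # clone) data

let () =
  let exemplar = make_node (new counting_ext :> ext_t) "" in
  let n1 = create_like exemplar "first" in
  let n2 = create_like exemplar "second" in
  n1 # extension # bump;
  (* Each node owns its extension, so the counters are independent: *)
  assert (n1 # extension # count = 1);
  assert (n2 # extension # count = 0);
  (* The clone points at the new node, not at the exemplar: *)
  assert (n1 # extension # node == n1)
```

Without the clone step, every node created from one exemplar would share a single extension object, and that extension could refer to only one of them.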
The configuration of the set of exemplars is passed to the +parse_document_entity function as third argument. In our +example, this argument can be set up as follows: + +
let spec =
+  make_spec_from_alist
+    ~data_exemplar:            (new data_impl (new null_extension))
+    ~default_element_exemplar: (new element_impl (new null_extension))
+    ~element_alist:
+      [ "a", new element_impl (new eltype_a);
+        "b", new element_impl (new eltype_b);
+        "c", new element_impl (new eltype_c);
+      ]
+    ()
+
+The ~element_alist function argument defines the mapping +from element types to exemplars as an associative list. The argument +~data_exemplar specifies the exemplar for data nodes, and +the ~default_element_exemplar is used whenever the parser +finds an element type for which the associative list does not define an +exemplar.
The configuration is now complete. You can still use the same parsing +functions, only the initialization is a bit different. For example, call the +parser by: + +
let d = parse_document_entity default_config (from_file "doc.xml") spec+ +Note that the resulting document d has a usable type; +in particular, the print method we added is visible. So you can +print your document by + +
d # root # extension # print stdout
This object-oriented approach looks rather complicated; this is mostly because +it works around some problems of O'Caml's strict type system. Some +auxiliary concepts such as extensions were needed, but the practical +consequences are small. The next section explains one of the examples of the +distribution, a converter from readme +documents to HTML.
[1] The problem is that the subclass is +usually not a subtype in this case because O'Caml has a contravariant subtyping +rule.