The PXP user's guide
Chapter 2. Using PXP

2.3. Class-based processing of the node tree

By default, the parsed node tree consists of objects of the same class; this is a good design as long as you only want to access selected parts of the document. For complex transformations, it may be better to use different classes for objects describing different element types.

For example, if the DTD declares the element types a, b, and c, and if the task is to convert an arbitrary document into a printable format, the idea is to define for every element type a separate class that has a method print. The classes are eltype_a, eltype_b, and eltype_c, and every class implements print such that elements of the type corresponding to the class are converted to the output format.

The parser supports such a design directly. As it is impossible to derive recursive classes in O'Caml[1], the specialized element classes cannot be formed by simply inheriting from the built-in classes of the parser and adding methods for customized functionality. To get around this limitation, every node of the document tree is represented by two objects: one called "the node", which contains the recursive definition of the tree, and one called "the extension". Every node object has a reference to its extension, and the extension has a reference to its node. The advantage of this model is that the extension can be customized without affecting the typing constraints of the recursive node definition.

Every extension must have the three methods clone, node, and set_node. The method clone creates a deep copy of the extension object and returns it; node returns the node object for this extension object; and set_node tells the extension object which node is associated with it. The method set_node is called automatically when the node tree is initialized. The following definition is a good starting point for these methods; usually clone must be further refined when instance variables are added to the class:

class custom_extension =
  object (self)

    val mutable node = (None : custom_extension node option)

    method clone = {< >}
    method node =
      match node with
          None ->
            assert false
        | Some n -> n
    method set_node n =
      node <- Some n

  end

This part of the extension is usually the same for all classes, so it is a good idea to use custom_extension as the super-class of the further class definitions. Continuing the example above, we can define the element type classes as follows:

class virtual custom_extension =
  object (self)
    ... clone, node, set_node defined as above ...

    method virtual print : out_channel -> unit
  end

class eltype_a =
  object (self)
    inherit custom_extension
    method print ch = ...
  end

class eltype_b =
  object (self)
    inherit custom_extension
    method print ch = ...
  end

class eltype_c =
  object (self)
    inherit custom_extension
    method print ch = ...
  end

The method print can now be implemented for every element type separately. Note that you get the associated node by invoking

self # node

and you get the extension object of a node n by writing

n # extension

It is guaranteed that

self # node # extension == self

always holds.
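
The mutual references can be tried out without PXP in a self-contained sketch. The class types mini_extension and mini_node and the classes mini_node_impl and describing_extension below are hypothetical, reduced stand-ins for the real PXP types. They only assume the three extension methods described above and are not the library's implementation.

```ocaml
(* Hypothetical miniature of the node/extension pair; all mini_* names are
   illustrative and not part of PXP. *)
class type ['node] mini_extension =
  object ('self)
    method clone : 'self
    method node : 'node
    method set_node : 'node -> unit
  end

(* A node is parameterized by the type of its extension. *)
class type ['ext] mini_node =
  object
    constraint 'ext = 'ext mini_node #mini_extension
    method extension : 'ext
    method name : string
  end

class ['ext] mini_node_impl (name : string) (ext : 'ext) =
  object (self)
    constraint 'ext = 'ext mini_node #mini_extension
    method extension = ext
    method name = name
    (* Tell the extension which node it belongs to. *)
    initializer ext # set_node (self : 'ext #mini_node :> 'ext mini_node)
  end

(* A customized extension: the three mandatory methods plus one extra. *)
class describing_extension =
  object (self)
    val mutable node = (None : describing_extension mini_node option)
    method clone = {< >}
    method node =
      match node with
      | None -> assert false
      | Some n -> n
    method set_node n = node <- Some n
    method describe = "extension of <" ^ self # node # name ^ ">"
  end

let n = new mini_node_impl "a" (new describing_extension)
let ext = n # extension

let () =
  (* The invariant: going to the node and back yields the same extension. *)
  assert (ext # node # extension == ext);
  assert (ext # describe = "extension of <a>")
```

The extra method describe is visible on n # extension even though mini_node_impl knows nothing about it, which is the point of keeping the customizable part outside the recursive node type.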

Here are sample definitions of the print methods:

class eltype_a =
  object (self)
    inherit custom_extension
    method print ch =
      (* Nodes <a>...</a> are only containers: *)
      output_string ch "(";
      List.iter
        (fun n -> n # extension # print ch)
        (self # node # sub_nodes);
      output_string ch ")";
  end

class eltype_b =
  object (self)
    inherit custom_extension
    method print ch =
      (* Print the value of the CDATA attribute "print": *)
      match self # node # attribute "print" with
        Value s       -> output_string ch s
      | Implied_value -> output_string ch "<missing>"
      | Valuelist l   -> assert false
                         (* not possible because the att is CDATA *)
  end

class eltype_c =
  object (self)
    inherit custom_extension
    method print ch =
      (* Print the contents of this element: *)
      output_string ch (self # node # data)
  end

class null_extension =
  object (self)
    inherit custom_extension
    method print ch = assert false
  end

The remaining task is to configure the parser such that these extension classes are actually used. Here another problem arises: it is not possible to dynamically select the class of an object to be created. As a workaround, PXP allows the user to specify exemplar objects for the various element types; instead of creating the nodes of the tree by applying the new operator, the nodes are produced by duplicating the exemplars. As object duplication preserves the class of the object, one can create fresh objects of every class for which an exemplar has previously been registered.
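
The idea that duplication preserves the class can be illustrated independently of PXP. The following minimal sketch uses the standard function Oo.copy; the printer classes, the exemplars table, and the create function are hypothetical names chosen for illustration and do not reflect how PXP stores its exemplars.

```ocaml
(* Hypothetical classes standing in for element-type-specific behavior. *)
class virtual printer =
  object
    val mutable text = ""
    method set_text t = text <- t
    method virtual render : string
  end

class paren_printer =
  object
    inherit printer
    method render = "(" ^ text ^ ")"
  end

class plain_printer =
  object
    inherit printer
    method render = text
  end

(* One empty exemplar per element type name. *)
let exemplars : (string, printer) Hashtbl.t = Hashtbl.create 8

let () =
  Hashtbl.add exemplars "a" (new paren_printer :> printer);
  Hashtbl.add exemplars "c" (new plain_printer :> printer)

(* The class is chosen at run time by copying the registered exemplar;
   the copy is then filled with contents, the exemplar stays empty. *)
let create eltype contents =
  let fresh = Oo.copy (Hashtbl.find exemplars eltype) in
  fresh # set_text contents;
  fresh

let () =
  assert ((create "a" "hi") # render = "(hi)");
  assert ((create "c" "hi") # render = "hi")
```

Nowhere in create does the new operator appear, yet the copies behave according to the class of the exemplar they were duplicated from.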

Exemplars are meant as objects without contents; the only interesting thing about them is that they are instances of a certain class. An exemplar for an element node can be created by:

let element_exemplar = new element_impl extension_exemplar

And a data node exemplar is created by:

let data_exemplar = new data_impl extension_exemplar

The classes element_impl and data_impl are defined in the module Pxp_document. The constructors initialize the fresh objects as empty objects, i.e. without children, without data contents, and so on. The extension_exemplar is the initial extension object the exemplars are associated with.

Once the exemplars are created and stored somewhere (e.g. in a hash table), you can take an exemplar and create a concrete instance (with contents) by duplicating it. As a user of the parser you are normally not concerned with this, as it is part of the internal logic of the parser. As background knowledge, however, it is worth mentioning that the two methods create_element and create_data actually perform the duplication of the exemplar for which they are invoked, additionally apply modifications to the clone, and finally return the new object. Moreover, the extension object is copied, too, and the new node object is associated with the fresh extension object. This is the reason why every extension object must have a clone method.
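
Because the extension is copied for every new node, a clone method that shares mutable state between copies can lead to surprises. The following sketch does not depend on PXP; the class tally is hypothetical and assumes an extension that keeps its state in a hash table. The plain {< >} copy shares that table between original and copy, whereas the refined clone copies it.

```ocaml
(* Hypothetical extension-like class with state held in a hash table. *)
class tally =
  object (self)
    val counts : (string, int) Hashtbl.t = Hashtbl.create 8

    (* Plain copy: the new object refers to the same hash table. *)
    method shallow_clone = {< >}

    (* Refined copy: the new object gets its own copy of the table. *)
    method clone = {< counts = Hashtbl.copy counts >}

    method count key =
      match Hashtbl.find_opt counts key with
      | Some n -> n
      | None -> 0

    method bump key = Hashtbl.replace counts key (self # count key + 1)
  end

let original = new tally
let () = original # bump "a"

let shared = original # shallow_clone
let independent = original # clone

let () =
  shared # bump "a";       (* also changes what [original] sees *)
  independent # bump "a";  (* only changes [independent] *)
  independent # bump "a"

let () =
  assert (original # count "a" = 2);
  assert (shared # count "a" = 2);
  assert (independent # count "a" = 3)
```

For an extension whose instance variables are immutable values, the plain {< >} form is sufficient; refinement is only needed when copies must not share mutable structures.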

The configuration of the set of exemplars is passed to the parse_document_entity function as the third argument. In our example, this argument can be set up as follows:

let spec =
  make_spec_from_alist
    ~data_exemplar:            (new data_impl (new null_extension))
    ~default_element_exemplar: (new element_impl (new null_extension))
    ~element_alist:
       [ "a",  new element_impl (new eltype_a);
         "b",  new element_impl (new eltype_b);
         "c",  new element_impl (new eltype_c);
       ]
    ()

The ~element_alist function argument defines the mapping from element types to exemplars as an associative list. The argument ~data_exemplar specifies the exemplar for data nodes, and the ~default_element_exemplar is used whenever the parser finds an element type for which the associative list does not define an exemplar.
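
The fallback behavior can be pictured as an association-list lookup with a default. This is only a sketch of the behavior described above, not PXP's code; find_exemplar is a hypothetical helper, and plain strings stand in for exemplar objects.

```ocaml
(* Hypothetical helper, not a PXP function: return the exemplar registered
   for [eltype], or [default] when the list has no entry for it. *)
let find_exemplar ~default alist eltype =
  match List.assoc_opt eltype alist with
  | Some exemplar -> exemplar
  | None -> default

(* Strings stand in for the exemplar objects of the example. *)
let alist = [ "a", "exemplar for a"; "b", "exemplar for b" ]

let () =
  assert (find_exemplar ~default:"default exemplar" alist "a"
          = "exemplar for a");
  assert (find_exemplar ~default:"default exemplar" alist "z"
          = "default exemplar")
```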

The configuration is now complete. You can still use the same parsing functions; only the initialization is a bit different. For example, call the parser by:

let d = parse_document_entity default_config (from_file "doc.xml") spec

Note that the resulting document d has a usable type; in particular, the print method we added is visible. So you can print your document by

d # root # extension # print stdout

This object-oriented approach looks rather complicated; this is mostly caused by working around some problems of the strict typing system of O'Caml. Some auxiliary concepts such as extensions were needed, but their practical consequences are small. The next section explains one of the examples of the distribution, a converter from readme documents to HTML.


Notes