The PXP user's guide
Chapter 2. Using PXP

2.3. Class-based processing of the node tree

By default, the parsed node tree consists of objects of the same class; this is a good design as long as you want only to access selected parts of the document. For complex transformations, it may be better to use different classes for objects describing different element types.

For example, if the DTD declares the element types a, b, and c, and if the task is to convert an arbitrary document into a printable format, the idea is to define for every element type a separate class that has a method print. The classes are eltype_a, eltype_b, and eltype_c, and every class implements print such that elements of the type corresponding to the class are converted to the output format.

The parser supports such a design directly. As it is impossible to derive recursive classes in O'Caml[1], the specialized element classes cannot be formed by simply inheriting from the built-in classes of the parser and adding methods for customized functionality. To get around this limitation, every node of the document tree is represented by two objects, one called "the node" and containing the recursive definition of the tree, one called "the extension". Every node object has a reference to the extension, and the extension has a reference to the node. The advantage of this model is that it is now possible to customize the extension without affecting the typing constraints of the recursive node definition.

Every extension must have the three methods clone, node, and set_node. The method clone creates a deep copy of the extension object and returns it; node returns the node object for this extension object; and set_node is used to tell the extension object which node is associated with it. The method set_node is called automatically when the node tree is initialized. The following definition is a good starting point for these methods; usually clone must be further refined when instance variables are added to the class:

class custom_extension =
  object (self)

    val mutable node = (None : custom_extension node option)

    method clone = {< >}
    method node =
      match node with
          None ->
            assert false
        | Some n -> n
    method set_node n =
      node <- Some n

  end
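The remark about refining clone can be illustrated without PXP. The following is only a minimal sketch: the class counter_extension and its methods record, visits, and has_seen are hypothetical and not part of PXP. A plain {< >} copies the reference to the hash table, so the original and the clone would share it; the refined clone copies the table as well.

```ocaml
(* Hypothetical extension-like class; it uses only the standard library. *)
class counter_extension =
  object
    (* A plain value is duplicated by the object copy ... *)
    val mutable visits = 0
    (* ... but a mutable structure is shared unless it is copied explicitly. *)
    val seen = (Hashtbl.create 16 : (string, unit) Hashtbl.t)

    (* Refined clone: the copy gets its own hash table. *)
    method clone = {< seen = Hashtbl.copy seen >}

    method record name =
      visits <- visits + 1;
      Hashtbl.replace seen name ()
    method visits = visits
    method has_seen name = Hashtbl.mem seen name
  end

let original = new counter_extension
let () = original # record "a"

(* The clone starts with the state of the original ... *)
let copy = original # clone

(* ... and later changes to the clone do not affect the original. *)
let () = copy # record "b"
```

With method clone = {< >} instead, recording "b" in the copy would also make it visible in the original.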

The methods clone, node, and set_node are usually the same for all extension classes, so it is a good idea to use custom_extension as the superclass of the further class definitions. Continuing the example above, we can define the element type classes as follows:

class virtual custom_extension =
  object (self)
    ... clone, node, set_node defined as above ...

    method virtual print : out_channel -> unit
  end

class eltype_a =
  object (self)
    inherit custom_extension
    method print ch = ...
  end

class eltype_b =
  object (self)
    inherit custom_extension
    method print ch = ...
  end

class eltype_c =
  object (self)
    inherit custom_extension
    method print ch = ...
  end

The method print can now be implemented for every element type separately. Note that you get the associated node by invoking

self # node

and you get the extension object of a node n by writing

n # extension

It is guaranteed that

self # node # extension == self

always holds.
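The mutual references between node and extension can be tried out in a toy model that does not depend on PXP. This is a sketch under assumptions: toy_node, toy_extension, and make_node are hypothetical names, and the real node class of PXP has many more methods. It only shows how the two objects point at each other so that the invariant holds.

```ocaml
(* Toy stand-in for a PXP node: it only stores a name and its extension. *)
class ['ext] toy_node (name : string) (ext : 'ext) =
  object
    method name = name
    method extension : 'ext = ext
  end

(* Same shape as custom_extension, but referring to toy_node. *)
class toy_extension =
  object
    val mutable node = (None : toy_extension toy_node option)

    method clone = {< >}
    method node =
      match node with
          None -> assert false
        | Some n -> n
    method set_node n =
      node <- Some n
  end

(* Hypothetical helper: creates a node and establishes both references. *)
let make_node name ext =
  let n = new toy_node name ext in
  ext # set_node n;
  n

let ext = new toy_extension
let n = make_node "a" ext

(* The invariant, seen from the extension and from the node: *)
let from_extension = (ext # node # extension == ext)
let from_node = (n # extension # node == n)
```

The customizable part, here toy_extension, can gain further methods without changing the definition of toy_node.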

Here are sample definitions of the print methods:

class eltype_a =
  object (self)
    inherit custom_extension
    method print ch =
      (* Nodes <a>...</a> are only containers: *)
      output_string ch "(";
      List.iter
        (fun n -> n # extension # print ch)
        (self # node # sub_nodes);
      output_string ch ")"
  end

class eltype_b =
  object (self)
    inherit custom_extension
    method print ch =
      (* Print the value of the CDATA attribute "print": *)
      match self # node # attribute "print" with
        Value s       -> output_string ch s
      | Implied_value -> output_string ch "<missing>"
      | Valuelist _   -> assert false
                         (* not possible because the att is CDATA *)
  end

class eltype_c =
  object (self)
    inherit custom_extension
    method print ch =
      (* Print the contents of this element: *)
      output_string ch (self # node # data)
  end

class null_extension =
  object (self)
    inherit custom_extension
    method print ch = assert false
  end

The remaining task is to configure the parser such that these extension classes are actually used. Here another problem arises: it is not possible to dynamically select the class of an object to be created. As a workaround, PXP allows the user to specify exemplar objects for the various element types; instead of creating the nodes of the tree by applying the new operator, the nodes are produced by duplicating the exemplars. As object duplication preserves the class of the object, one can create fresh objects of every class for which an exemplar has previously been registered.
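The exemplar technique itself can be sketched with the standard library only. The names used here (the class exemplar with its method kind, the classes kind_a and kind_b, the table exemplars, and the function create) are hypothetical and do not belong to PXP. The point is that the class of a new object is selected at run time by choosing which exemplar to duplicate.

```ocaml
(* Hypothetical base class; kind reveals which class an object belongs to. *)
class virtual exemplar =
  object
    method virtual kind : string
    (* {< >} copies the object and keeps its class. *)
    method clone = {< >}
  end

class kind_a = object inherit exemplar method kind = "a" end
class kind_b = object inherit exemplar method kind = "b" end

(* Registry mapping element type names to exemplars. *)
let exemplars : (string, exemplar) Hashtbl.t = Hashtbl.create 10

let () =
  Hashtbl.add exemplars "a" (new kind_a :> exemplar);
  Hashtbl.add exemplars "b" (new kind_b :> exemplar)

(* The class is chosen at run time by looking up the exemplar by name.
   Raises Not_found if no exemplar is registered for the name. *)
let create name = (Hashtbl.find exemplars name) # clone

let x = create "a"
let y = create "a"
let z = create "b"
```

Here x and y are distinct objects of class kind_a, and neither is the registered exemplar itself.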

Exemplars are meant as objects without contents; the only interesting thing is that exemplars are instances of a certain class. An exemplar for an element node can be created by:

let element_exemplar = new element_impl extension_exemplar

And a data node exemplar is created by:

let data_exemplar = new data_impl extension_exemplar

The classes element_impl and data_impl are defined in the module Pxp_document. The constructors initialize the fresh objects as empty objects, i.e. without children, without data contents, and so on. The extension_exemplar is the initial extension object the exemplars are associated with.

Once the exemplars are created and stored somewhere (e.g. in a hash table), you can take an exemplar and create a concrete instance (with contents) by duplicating it. As a user of the parser you are normally not concerned with this, as it is part of the internal logic of the parser, but as background knowledge it is worth mentioning that the two methods create_element and create_data actually perform the duplication of the exemplar for which they are invoked, additionally apply modifications to the clone, and finally return the new object. Moreover, the extension object is copied, too, and the new node object is associated with the fresh extension object. Note that this is the reason why every extension object must have a clone method.
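The duplication steps described above can be imitated in a small model. This is not the implementation of PXP: toy_node, its method copy_with, toy_extension, and the function create_element below are hypothetical stand-ins that only mirror the described order of steps (clone the extension, duplicate the node with modifications, associate the two).

```ocaml
(* Toy node: a name plus an extension; copy_with duplicates the node. *)
class ['ext] toy_node (ext0 : 'ext) =
  object
    val name = ""
    val ext = ext0
    method name = name
    method extension : 'ext = ext
    (* Returns a copy of this node with a new name and a new extension. *)
    method copy_with new_name (new_ext : 'ext) =
      {< name = new_name; ext = new_ext >}
  end

class toy_extension =
  object
    val mutable node = (None : toy_extension toy_node option)
    method clone = {< >}
    method node =
      match node with
          None -> assert false
        | Some n -> n
    method set_node n =
      node <- Some n
  end

(* Hypothetical counterpart of create_element, written as a function. *)
let create_element (exemplar : toy_extension toy_node) name =
  (* 1. copy the extension of the exemplar *)
  let ext = exemplar # extension # clone in
  (* 2. duplicate the exemplar node and apply the modifications *)
  let n = exemplar # copy_with name ext in
  (* 3. tell the fresh extension which node it belongs to *)
  ext # set_node n;
  n

let exemplar = new toy_node (new toy_extension)
let a1 = create_element exemplar "a"
let a2 = create_element exemplar "a"
```

Every node created this way has its own extension object, which is why the extension must be able to clone itself.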

The configuration of the set of exemplars is passed to the parse_document_entity function as the third argument. In our example, this argument can be set up as follows:

let spec =
  make_spec_from_alist
    ~data_exemplar:            (new data_impl (new null_extension))
    ~default_element_exemplar: (new element_impl (new null_extension))
    ~element_alist:
       [ "a",  new element_impl (new eltype_a);
         "b",  new element_impl (new eltype_b);
         "c",  new element_impl (new eltype_c);
       ]
    ()

The ~element_alist function argument defines the mapping from element types to exemplars as an associative list. The argument ~data_exemplar specifies the exemplar for data nodes, and the ~default_element_exemplar is used whenever the parser finds an element type for which the associative list does not define an exemplar.
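The fallback rule for element types can be written down as a lookup in an associative list. The function find_exemplar is a hypothetical illustration and not a PXP function, and plain strings stand in for the exemplar objects.

```ocaml
(* Hypothetical helper: returns the exemplar registered for the element
   type name, or the default exemplar if the list has no entry for it. *)
let find_exemplar ~default alist name =
  match List.assoc_opt name alist with
      Some exemplar -> exemplar
    | None -> default

(* Strings are used as stand-ins for exemplar objects. *)
let alist =
  [ "a", "exemplar for a";
    "b", "exemplar for b";
    "c", "exemplar for c";
  ]

let for_a = find_exemplar ~default:"default exemplar" alist "a"
let for_x = find_exemplar ~default:"default exemplar" alist "x"
```

In this sketch, for_a is the entry registered for "a", while for_x falls back to the default because "x" does not occur in the list.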

The configuration is now complete. You can still use the same parsing functions, only the initialization is a bit different. For example, call the parser by:

let d = parse_document_entity default_config (from_file "doc.xml") spec

Note that the resulting document d has a usable type; in particular, the print method we added is visible. So you can print your document by

d # root # extension # print stdout

This object-oriented approach looks rather complicated; this is mostly caused by working around some problems of the strict typing system of O'Caml. Some auxiliary concepts such as extensions were needed, but the practical consequences are small. The next section explains one of the examples of the distribution, a converter from readme documents to HTML.


Notes