4 >Class-based processing of the node tree</TITLE
7 CONTENT="Modular DocBook HTML Stylesheet Version 1.46"><LINK
9 TITLE="The PXP user's guide"
10 HREF="index.html"><LINK
13 HREF="c533.html"><LINK
15 TITLE="How to parse a document from an application"
16 HREF="x550.html"><LINK
18 TITLE="Example: An HTML backend for the readme
20 HREF="x738.html"><LINK
23 HREF="markup.css"></HEAD
42 >The PXP user's guide</TH
57 >Chapter 2. Using <SPAN
80 >2.3. Class-based processing of the node tree</A
83 >By default, all nodes of the parsed tree are objects of the same class; this is
84 a good design as long as you only want to access selected parts of the
85 document. For complex transformations, it may be better to use different
86 classes for objects describing different element types.</P
88 >For example, if the DTD declares the element types <TT
98 >, and if the task is to convert
99 an arbitrary document into a printable format, the idea is to define for every
100 element type a separate class that has a method <TT
114 >, and every class implements
118 > such that elements of the type corresponding to the
119 class are converted to the output format.</P
121 >The parser supports such a design directly. As it is impossible to derive
122 recursive classes in O'Caml<A
126 >, the specialized element classes cannot be formed by
127 simply inheriting from the built-in classes of the parser and adding methods
128 for customized functionality. To get around this limitation, every node of the
129 document tree is represented by <I
132 > objects, one called
133 "the node" and containing the recursive definition of the tree, one called "the
134 extension". Every node object has a reference to the extension, and the
135 extension has a reference to the node. The advantage of this model is that it
136 is now possible to customize the extension without affecting the typing
137 constraints of the recursive node definition.</P
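>
<P
>The two-object model can be sketched without the parser. The following is a minimal, self-contained illustration under stated assumptions, not the PXP implementation: the classes node, extension, container and leaf, the method render, and the helper attach are hypothetical names invented for this sketch. It only shows that the recursive node class stays untouched while the extension classes vary.</P
>

```ocaml
(* Toy stand-ins for illustration only; these are not the PXP classes. *)

(* The node class carries the recursive tree structure. It is parameterized
   over the extension type and never names a concrete extension class. *)
class ['ext] node (name : string) =
  object
    val mutable ext : 'ext option = None
    val mutable children : 'ext node list = []
    method name = name
    method extension : 'ext =
      match ext with Some e -> e | None -> failwith "no extension set"
    method set_extension (e : 'ext) = ext <- Some e
    method sub_nodes = children
    method add_node (n : 'ext node) = children <- children @ [n]
  end

(* The extension carries the customizable behaviour and points back
   to its node. *)
class virtual extension =
  object
    val mutable node : extension node option = None
    method clone = {< >}
    method node =
      match node with Some n -> n | None -> failwith "no node set"
    method set_node (n : extension node) = node <- Some n
    method virtual render : string
  end

(* Two specialized extensions; the node class is not modified. *)
class container =
  object (self)
    inherit extension
    method render =
      let parts =
        List.map (fun n -> n # extension # render) (self # node # sub_nodes)
      in
      "(" ^ String.concat "" parts ^ ")"
  end

class leaf (text : string) =
  object
    inherit extension
    method render = text
  end

(* Link a node and an extension in both directions. *)
let attach (n : extension node) (e : extension) =
  n # set_extension e;
  e # set_node n

let () =
  let root = new node "a" and child = new node "c" in
  attach root (new container);
  attach child (new leaf "hello");
  root # add_node child;
  print_endline (root # extension # render)  (* prints "(hello)" *)
```

<P
>In this sketch, going from a node to its extension and back yields the same node object again, which is the property the real classes rely on.</P
><P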
139 >Every extension must have the three methods <TT
153 > creates a deep copy of the extension object and
157 > returns the node object for this extension
161 > is used to tell the extension object
162 which node is associated with it; this method is called automatically when the
163 node tree is initialized. The following definition is a good starting point
164 for these methods; usually <TT
167 > must be further refined
168 when instance variables are added to the class:
171 CLASS="PROGRAMLISTING"
172 >class custom_extension =
175 val mutable node = (None : custom_extension node option)
177 method clone = {< >}
189 This part of the extension is usually the same for all classes, so it is a good
192 >custom_extension</TT
193 > as the super-class of the
194 further class definitions. Continuing the example above, we can define the
195 element type classes as follows:
198 CLASS="PROGRAMLISTING"
199 >class virtual custom_extension =
201 ... clone, node, set_node defined as above ...
203 method virtual print : out_channel -> unit
208 inherit custom_extension
209 method print ch = ...
214 inherit custom_extension
215 method print ch = ...
220 inherit custom_extension
221 method print ch = ...
228 > can now be implemented for every element
229 type separately. Note that you get the associated node by invoking
232 CLASS="PROGRAMLISTING"
236 and you get the extension object of a node <TT
242 CLASS="PROGRAMLISTING"
246 It is guaranteed that
249 CLASS="PROGRAMLISTING"
250 >self # node # extension == self</PRE
255 >Here are sample definitions of the <TT
262 CLASS="PROGRAMLISTING"
265 inherit custom_extension
267 (* Nodes <a>...</a> are only containers: *)
268 output_string ch "(";
270 (fun n -> n # extension # print ch)
271 (self # node # sub_nodes);
272 output_string ch ")";
277 inherit custom_extension
279 (* Print the value of the CDATA attribute "print": *)
280 match self # node # attribute "print" with
281 Value s -> output_string ch s
282 | Implied_value -> output_string ch "<missing>"
283 | Valuelist _ -> assert false
284 (* not possible because the att is CDATA *)
289 inherit custom_extension
291 (* Print the contents of this element: *)
292 output_string ch (self # node # data)
295 class null_extension =
297 inherit custom_extension
298 method print ch = assert false
302 >The remaining task is to configure the parser such that these extension classes
303 are actually used. Here another problem arises: It is not possible to
304 dynamically select the class of an object to be created. As a workaround,
308 > allows the user to specify <I
312 the various element types; instead of creating the nodes of the tree by
316 > operator the nodes are produced by
317 duplicating the exemplars. As object duplication preserves the class of the
318 object, one can create fresh objects of every class for which previously an
319 exemplar has been registered.</P
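>
<P
>The exemplar technique itself does not depend on the parser. The following is a minimal sketch under stated assumptions: the classes, the hash table exemplars, and the function create are hypothetical names for illustration and are not part of PXP. It shows that cloning a registered exemplar yields a fresh object of the exemplar's class, selected by a string at run time.</P
>

```ocaml
(* Illustrative only; none of these names belong to the parser's API. *)

class virtual extension =
  object
    method virtual describe : string
    (* Duplication preserves the class of the object being copied. *)
    method clone = {< >}
  end

class eltype_a = object inherit extension method describe = "element a" end
class eltype_b = object inherit extension method describe = "element b" end
class fallback = object inherit extension method describe = "unknown element" end

(* One exemplar per element type, registered under the type's name. *)
let exemplars : (string, extension) Hashtbl.t = Hashtbl.create 8

let () =
  Hashtbl.add exemplars "a" (new eltype_a);
  Hashtbl.add exemplars "b" (new eltype_b)

(* Used when no exemplar is registered for a name. *)
let default_exemplar : extension = new fallback

(* Create a fresh object by duplicating the matching exemplar. *)
let create name =
  let exemplar =
    match Hashtbl.find_opt exemplars name with
    | Some e -> e
    | None -> default_exemplar
  in
  exemplar # clone

let () =
  print_endline ((create "b") # describe);  (* prints "element b" *)
  print_endline ((create "x") # describe)   (* prints "unknown element" *)
```

<P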
321 >Exemplars are meant as objects without contents; the only interesting thing is
322 that exemplars are instances of a certain class. The creation of an exemplar
323 for an element node can be done by:
326 CLASS="PROGRAMLISTING"
327 >let element_exemplar = new element_impl extension_exemplar</PRE
330 And a data node exemplar is created by:
333 CLASS="PROGRAMLISTING"
334 >let data_exemplar = new data_impl extension_exemplar</PRE
344 are defined in the module <TT
348 initialize the fresh objects as empty objects, i.e. without children, without
349 data contents, and so on. The <TT
351 >extension_exemplar</TT
353 initial extension object the exemplars are associated with. </P
355 >Once the exemplars are created and stored somewhere (e.g. in a hash table), you
356 can take an exemplar and create a concrete instance (with contents) by
357 duplicating it. As a user of the parser you are normally not concerned with this,
358 as it is part of the internal logic of the parser, but as background knowledge
359 it is worth mentioning that the two methods
367 perform the duplication of the exemplar for which they are invoked,
368 additionally apply modifications to the clone, and finally return the new
369 object. Moreover, the extension object is copied, too, and the new node object
370 is associated with the fresh extension object. Note that this is the reason why
371 every extension object must have a <TT
376 >The configuration of the set of exemplars is passed to the
379 >parse_document_entity</TT
380 > function as the third argument. In our
381 example, this argument can be set up as follows:
384 CLASS="PROGRAMLISTING"
387 ~data_exemplar: (new data_impl (new null_extension))
388 ~default_element_exemplar: (new element_impl (new null_extension))
390 [ "a", new element_impl (new eltype_a);
391 "b", new element_impl (new eltype_b);
392 "c", new element_impl (new eltype_c);
400 > function argument defines the mapping
401 from element types to exemplars as an associative list. The argument
405 > specifies the exemplar for data nodes, and
408 >~default_element_exemplar</TT
409 > is used whenever the parser
410 finds an element type for which the associative list does not define an
413 >The configuration is now complete. You can still use the same parsing
414 functions; only the initialization is a bit different. For example, call the
418 CLASS="PROGRAMLISTING"
419 >let d = parse_document_entity default_config (from_file "doc.xml") spec</PRE
422 Note that the resulting document <TT
429 > method we added is visible. So you can
430 print your document by
433 CLASS="PROGRAMLISTING"
434 >d # root # extension # print stdout</PRE
437 >This object-oriented approach looks rather complicated; this is mostly caused
438 by working around some problems of the strict typing system of O'Caml. Some
439 auxiliary concepts such as extensions were needed, but the practical
440 consequences are small. In the next section, one of the examples of the
441 distribution is explained, a converter from <I
445 documents to HTML.</P
461 HREF="x675.html#AEN688"
469 >The problem is that the subclass is
470 usually not a subtype in this case because O'Caml has a contravariant subtyping
515 >How to parse a document from an application</TD
528 >Example: An HTML backend for the <I