x675.html

   1 <HTML
   2 ><HEAD
   3 ><TITLE
   4 >Class-based processing of the node tree</TITLE
   5 ><META
   6 NAME="GENERATOR"
   7 CONTENT="Modular DocBook HTML Stylesheet Version 1.46"><LINK
   8 REL="HOME"
   9 TITLE="The PXP user's guide"
  10 HREF="index.html"><LINK
  11 REL="UP"
  12 TITLE="Using PXP"
  13 HREF="c533.html"><LINK
  14 REL="PREVIOUS"
  15 TITLE="How to parse a document from an application"
  16 HREF="x550.html"><LINK
  17 REL="NEXT"
  18 TITLE="Example: An HTML backend for the readme
  19 DTD"
  20 HREF="x738.html"><LINK
  21 REL="STYLESHEET"
  22 TYPE="text/css"
  23 HREF="markup.css"></HEAD
  24 ><BODY
  25 CLASS="SECT1"
  26 BGCOLOR="#FFFFFF"
  27 TEXT="#000000"
  28 LINK="#0000FF"
  29 VLINK="#840084"
  30 ALINK="#0000FF"
  31 ><DIV
  32 CLASS="NAVHEADER"
  33 ><TABLE
  34 WIDTH="100%"
  35 BORDER="0"
  36 CELLPADDING="0"
  37 CELLSPACING="0"
  38 ><TR
  39 ><TH
  40 COLSPAN="3"
  41 ALIGN="center"
  42 >The PXP user's guide</TH
  43 ></TR
  44 ><TR
  45 ><TD
  46 WIDTH="10%"
  47 ALIGN="left"
  48 VALIGN="bottom"
  49 ><A
  50 HREF="x550.html"
  51 >Prev</A
  52 ></TD
  53 ><TD
  54 WIDTH="80%"
  55 ALIGN="center"
  56 VALIGN="bottom"
  57 >Chapter 2. Using <SPAN
  58 CLASS="ACRONYM"
  59 >PXP</SPAN
  60 ></TD
  61 ><TD
  62 WIDTH="10%"
  63 ALIGN="right"
  64 VALIGN="bottom"
  65 ><A
  66 HREF="x738.html"
  67 >Next</A
  68 ></TD
  69 ></TR
  70 ></TABLE
  71 ><HR
  72 ALIGN="LEFT"
  73 WIDTH="100%"></DIV
  74 ><DIV
  75 CLASS="SECT1"
  76 ><H1
  77 CLASS="SECT1"
  78 ><A
  79 NAME="AEN675"
  80 >2.3. Class-based processing of the node tree</A
  81 ></H1
  82 ><P
  83 >By default, the parsed node tree consists of objects of the same class; this is
  84 a good design as long as you want only to access selected parts of the
  85 document. For complex transformations, it may be better to use different
  86 classes for objects describing different element types.</P
  87 ><P
  88 >For example, if the DTD declares the element types <TT
  89 CLASS="LITERAL"
  90 >a</TT
  91 >,
  92 <TT
  93 CLASS="LITERAL"
  94 >b</TT
  95 >, and <TT
  96 CLASS="LITERAL"
  97 >c</TT
  98 >, and if the task is to convert
  99 an arbitrary document into a printable format, the idea is to define for every
 100 element type a separate class that has a method <TT
 101 CLASS="LITERAL"
 102 >print</TT
 103 >. The
 104 classes are <TT
 105 CLASS="LITERAL"
 106 >eltype_a</TT
 107 >, <TT
 108 CLASS="LITERAL"
 109 >eltype_b</TT
 110 >, and
 111 <TT
 112 CLASS="LITERAL"
 113 >eltype_c</TT
 114 >, and every class implements
 115 <TT
 116 CLASS="LITERAL"
 117 >print</TT
 118 > such that elements of the type corresponding to the
 119 class are converted to the output format.</P
 120 ><P
 121 >The parser supports such a design directly. As it is impossible to derive
 122 recursive classes in O'Caml<A
 123 NAME="AEN688"
 124 HREF="#FTN.AEN688"
 125 >[1]</A
 126 >, the specialized element classes cannot be formed by
 127 simply inheriting from the built-in classes of the parser and adding methods
 128 for customized functionality. To get around this limitation, every node of the
 129 document tree is represented by <I
 130 CLASS="EMPHASIS"
 131 >two</I
 132 > objects, one called
 133 "the node" and containing the recursive definition of the tree, one called "the
 134 extension". Every node object has a reference to the extension, and the
 135 extension has a reference to the node. The advantage of this model is that it
 136 is now possible to customize the extension without affecting the typing
 137 constraints of the recursive node definition.</P
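><P
>The pairing can be illustrated with a minimal, self-contained sketch. The classes <TT CLASS="LITERAL">toy_node</TT>, <TT CLASS="LITERAL">printer</TT>, <TT CLASS="LITERAL">container</TT>, and <TT CLASS="LITERAL">leaf</TT> and the functions <TT CLASS="LITERAL">make_node</TT> and <TT CLASS="LITERAL">render</TT> are hypothetical names invented for this illustration; they are not the classes of <SPAN CLASS="ACRONYM">PXP</SPAN>, which are considerably richer:</P
>

```ocaml
(* Toy model of the node/extension pairing. All names are hypothetical
   stand-ins; these are NOT the classes of Pxp_document. *)

(* The node holds the recursive tree structure and is parameterized
   over the type of its extension. *)
class ['ext] toy_node (name : string) (ext : 'ext) =
  object
    val mutable children : 'ext toy_node list = []
    method name = name
    method extension = ext
    method sub_nodes = children
    method add_node (n : 'ext toy_node) = children <- children @ [n]
  end

(* The extension carries the customized behaviour and points back
   to its node. *)
class virtual printer =
  object
    val mutable node : printer toy_node option = None
    method node =
      match node with
      | Some n -> n
      | None -> failwith "extension is not attached to a node"
    method set_node n = node <- Some n
    method virtual print : Buffer.t -> unit
  end

(* An extension for pure containers: print the children in parentheses. *)
class container =
  object (self)
    inherit printer
    method print buf =
      Buffer.add_char buf '(';
      List.iter (fun n -> n#extension#print buf) (self#node#sub_nodes);
      Buffer.add_char buf ')'
  end

(* An extension that prints a fixed text. *)
class leaf (text : string) =
  object
    inherit printer
    method print buf = Buffer.add_string buf text
  end

(* Create a node and wire up both references. *)
let make_node name (ext : printer) =
  let n = new toy_node name ext in
  ext#set_node n;
  n

let root = make_node "a" (new container :> printer)
let child = make_node "c" (new leaf "hello" :> printer)
let () = root#add_node child

(* Print a subtree into a string through the extension objects. *)
let render (n : printer toy_node) =
  let buf = Buffer.create 16 in
  n#extension#print buf;
  Buffer.contents buf
```

<P
>In this sketch the node class is written only once, and only the extension classes vary. Both references are established in <TT CLASS="LITERAL">make_node</TT>, so the extension of a node always leads back to the same node.</P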
 138 ><P
 139 >Every extension must have the three methods <TT
 140 CLASS="LITERAL"
 141 >clone</TT
 142 >,
 143 <TT
 144 CLASS="LITERAL"
 145 >node</TT
 146 >, and <TT
 147 CLASS="LITERAL"
 148 >set_node</TT
 149 >. The method
 150 <TT
 151 CLASS="LITERAL"
 152 >clone</TT
 153 > creates a deep copy of the extension object and
 154 returns it; <TT
 155 CLASS="LITERAL"
 156 >node</TT
 157 > returns the node object for this extension
 158 object; and <TT
 159 CLASS="LITERAL"
 160 >set_node</TT
 161 > is used to tell the extension object
 162 which node is associated with it; this method is called automatically when the
 163 node tree is initialized. The following definition is a good starting point
 164 for these methods; usually <TT
 165 CLASS="LITERAL"
 166 >clone</TT
 167 > must be further refined
 168 when instance variables are added to the class:
 169
 170 <PRE
 171 CLASS="PROGRAMLISTING"
 172 >class custom_extension =
 173   object (self)
 174
 175     val mutable node = (None : custom_extension node option)
 176
 177     method clone = {&#60; &#62;}
 178     method node =
 179       match node with
 180           None -&#62;
 181             assert false
 182         | Some n -&#62; n
 183     method set_node n =
 184       node &#60;- Some n
 185
 186   end</PRE
 187 >
 188
 189 This part of the extension is usually the same for all classes, so it is a good
 190 idea to consider <TT
 191 CLASS="LITERAL"
 192 >custom_extension</TT
 193 > as the super-class of the
 194 further class definitions. Continuing the example above, we can define the
 195 element type classes as follows:
 196
 197 <PRE
 198 CLASS="PROGRAMLISTING"
 199 >class virtual custom_extension =
 200   object (self)
 201     ... clone, node, set_node defined as above ...
 202
 203     method virtual print : out_channel -&#62; unit
 204   end
 205
 206 class eltype_a =
 207   object (self)
 208     inherit custom_extension
 209     method print ch = ...
 210   end
 211
 212 class eltype_b =
 213   object (self)
 214     inherit custom_extension
 215     method print ch = ...
 216   end
 217
 218 class eltype_c =
 219   object (self)
 220     inherit custom_extension
 221     method print ch = ...
 222   end</PRE
 223 >
 224
 225 The method <TT
 226 CLASS="LITERAL"
 227 >print</TT
 228 > can now be implemented for every element
 229 type separately. Note that you get the associated node by invoking
 230
 231 <PRE
 232 CLASS="PROGRAMLISTING"
 233 >self # node</PRE
 234 >
 235
 236 and you get the extension object of a node <TT
 237 CLASS="LITERAL"
 238 >n</TT
 239 > by writing
 240
 241 <PRE
 242 CLASS="PROGRAMLISTING"
 243 >n # extension</PRE
 244 >
 245
 246 It is guaranteed that
 247
 248 <PRE
 249 CLASS="PROGRAMLISTING"
 250 >self # node # extension == self</PRE
 251 >
 252
 253 always holds.</P
 254 ><P
 255 >Here are sample definitions of the <TT
 256 CLASS="LITERAL"
 257 >print</TT
 258 >
 259 methods:
 260
 261 <PRE
 262 CLASS="PROGRAMLISTING"
 263 >class eltype_a =
 264   object (self)
 265     inherit custom_extension
 266     method print ch =
 267       (* Nodes &#60;a&#62;...&#60;/a&#62; are only containers: *)
 268       output_string ch "(";
 269       List.iter
 270         (fun n -&#62; n # extension # print ch)
 271         (self # node # sub_nodes);
 272       output_string ch ")"
 273   end
 274
 275 class eltype_b =
 276   object (self)
 277     inherit custom_extension
 278     method print ch =
 279       (* Print the value of the CDATA attribute "print": *)
 280       match self # node # attribute "print" with
 281         Value s       -&#62; output_string ch s
 282       | Implied_value -&#62; output_string ch "&#60;missing&#62;"
 283       | Valuelist _   -&#62; assert false
 284                          (* not possible because the att is CDATA *)
 285   end
 286
 287 class eltype_c =
 288   object (self)
 289     inherit custom_extension
 290     method print ch =
 291       (* Print the contents of this element: *)
 292       output_string ch (self # node # data)
 293   end
 294
 295 class null_extension =
 296   object (self)
 297     inherit custom_extension
 298     method print ch = assert false
 299   end</PRE
 300 ></P
 301 ><P
 302 >The remaining task is to configure the parser such that these extension classes
 303 are actually used. Here another problem arises: It is not possible to
 304 dynamically select the class of an object to be created. As a workaround,
 305 <SPAN
 306 CLASS="ACRONYM"
 307 >PXP</SPAN
 308 > allows the user to specify <I
 309 CLASS="EMPHASIS"
 310 >exemplar objects</I
 311 > for
 312 the various element types; instead of creating the nodes of the tree by
 313 applying the <TT
 314 CLASS="LITERAL"
 315 >new</TT
 316 > operator, the nodes are produced by
 317 duplicating the exemplars. As object duplication preserves the class of the
 318 object, one can create fresh objects of every class for which previously an
 319 exemplar has been registered.</P
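><P
>The exemplar technique itself can be sketched without the parser. The following example assumes hypothetical classes <TT CLASS="LITERAL">element</TT>, <TT CLASS="LITERAL">element_a</TT>, and <TT CLASS="LITERAL">element_b</TT> and a hypothetical function <TT CLASS="LITERAL">create</TT>; it uses the standard library function <TT CLASS="LITERAL">Oo.copy</TT> for the duplication, which is not necessarily how <SPAN CLASS="ACRONYM">PXP</SPAN> duplicates its nodes internally:</P
>

```ocaml
(* Hypothetical element classes; only the exemplar mechanism matters. *)
class virtual element =
  object
    val mutable contents = ""
    method contents = contents
    method set_contents s = contents <- s
    method virtual kind : string
  end

class element_a = object inherit element method kind = "a" end
class element_b = object inherit element method kind = "b" end

(* One empty exemplar per element type, stored in a hash table. *)
let exemplars : (string, element) Hashtbl.t = Hashtbl.create 8

let () =
  Hashtbl.replace exemplars "a" (new element_a :> element);
  Hashtbl.replace exemplars "b" (new element_b :> element)

(* Oo.copy returns a fresh object with the same methods and instance
   variables, so the copy belongs to the class of the exemplar although
   the class is selected at run time by a string. *)
let create name contents =
  let fresh = Oo.copy (Hashtbl.find exemplars name) in
  fresh#set_contents contents;
  fresh
```

<P
>A call such as <TT CLASS="LITERAL">create "b" "text"</TT> yields an object that behaves like an instance of <TT CLASS="LITERAL">element_b</TT>, while the registered exemplar stays empty.</P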
 320 ><P
 321 >Exemplars are meant as objects without contents; the only interesting thing is
 322 that exemplars are instances of a certain class. The creation of an exemplar
 323 for an element node can be done by:
 324
 325 <PRE
 326 CLASS="PROGRAMLISTING"
 327 >let element_exemplar = new element_impl extension_exemplar</PRE
 328 >
 329
 330 And a data node exemplar is created by:
 331
 332 <PRE
 333 CLASS="PROGRAMLISTING"
 334 >let data_exemplar = new data_impl extension_exemplar</PRE
 335 >
 336
 337 The classes <TT
 338 CLASS="LITERAL"
 339 >element_impl</TT
 340 > and <TT
 341 CLASS="LITERAL"
 342 >data_impl</TT
 343 >
 344 are defined in the module <TT
 345 CLASS="LITERAL"
 346 >Pxp_document</TT
 347 >. The constructors
 348 initialize the fresh objects as empty objects, i.e. without children, without
 349 data contents, and so on. The <TT
 350 CLASS="LITERAL"
 351 >extension_exemplar</TT
 352 > is the
 353 initial extension object the exemplars are associated with. </P
 354 ><P
 355 >Once the exemplars are created and stored somewhere (e.g. in a hash table), you
 356 can take an exemplar and create a concrete instance (with contents) by
 357 duplicating it. As a user of the parser, you are normally not concerned with this,
 358 as this is part of the internal logic of the parser, but as background knowledge
 359 it is worthwhile to mention that the two methods
 360 <TT
 361 CLASS="LITERAL"
 362 >create_element</TT
 363 > and <TT
 364 CLASS="LITERAL"
 365 >create_data</TT
 366 > actually
 367 perform the duplication of the exemplar for which they are invoked,
 368 additionally apply modifications to the clone, and finally return the new
 369 object. Moreover, the extension object is copied, too, and the new node object
 370 is associated with the fresh extension object. Note that this is the reason why
 371 every extension object must have a <TT
 372 CLASS="LITERAL"
 373 >clone</TT
 374 > method.</P
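><P
>Why <TT CLASS="LITERAL">clone</TT> may need refinement can be seen in a small sketch. The class <TT CLASS="LITERAL">counting_extension</TT> and its methods are hypothetical and omit <TT CLASS="LITERAL">node</TT> and <TT CLASS="LITERAL">set_node</TT> for brevity; the point is only that a plain <TT CLASS="LITERAL">{&#60; &#62;}</TT> copy shares any mutable data structure referenced from an instance variable:</P
>

```ocaml
(* Hypothetical extension-like class with state held in a reference cell. *)
class counting_extension =
  object
    (* A plain {< >} copy would share this cell with the original. *)
    val hits = ref 0
    method hit = incr hits
    method hits = !hits
    (* Shallow copy, as in the starting-point definition of clone. *)
    method shallow_clone = {< >}
    (* Refined clone: the copy gets its own cell holding the current count. *)
    method clone = {< hits = ref !hits >}
  end

let original = new counting_extension
let () = original#hit
let shallow = original#shallow_clone
let deep = original#clone
(* This later update is visible through the shallow copy only. *)
let () = original#hit
```

<P
>After the second call of <TT CLASS="LITERAL">hit</TT>, the shallow copy reports the same count as the original, whereas the refined clone keeps the count it had when it was created.</P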
 375 ><P
 376 >The configuration of the set of exemplars is passed to the
 377 <TT
 378 CLASS="LITERAL"
 379 >parse_document_entity</TT
 380 > function as the third argument. In our
 381 example, this argument can be set up as follows:
 382
 383 <PRE
 384 CLASS="PROGRAMLISTING"
 385 >let spec =
 386   make_spec_from_alist
 387     ~data_exemplar:            (new data_impl (new null_extension))
 388     ~default_element_exemplar: (new element_impl (new null_extension))
 389     ~element_alist:
 390        [ "a",  new element_impl (new eltype_a);
 391          "b",  new element_impl (new eltype_b);
 392          "c",  new element_impl (new eltype_c);
 393        ]
 394     ()</PRE
 395 >
 396
 397 The <TT
 398 CLASS="LITERAL"
 399 >~element_alist</TT
 400 > function argument defines the mapping
 401 from element types to exemplars as an association list. The argument
 402 <TT
 403 CLASS="LITERAL"
 404 >~data_exemplar</TT
 405 > specifies the exemplar for data nodes, and
 406 the <TT
 407 CLASS="LITERAL"
 408 >~default_element_exemplar</TT
 409 > is used whenever the parser
 410 finds an element type for which the association list does not define an
 411 exemplar. </P
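><P
>The lookup rule described above can be modelled in a few lines. The record type <TT CLASS="LITERAL">toy_spec</TT> and the function <TT CLASS="LITERAL">element_exemplar_for</TT> are assumptions made for this sketch; the value returned by <TT CLASS="LITERAL">make_spec_from_alist</TT> has its own representation, and plain strings stand in for the exemplar objects:</P
>

```ocaml
(* Hypothetical stand-in for a spec; NOT the representation used by PXP. *)
type 'ex toy_spec = {
  data_exemplar : 'ex;
  default_element_exemplar : 'ex;
  element_alist : (string * 'ex) list;
}

(* Return the exemplar registered for an element type, or the default
   exemplar if the association list has no entry for it. *)
let element_exemplar_for spec name =
  match List.assoc_opt name spec.element_alist with
  | Some ex -> ex
  | None -> spec.default_element_exemplar

(* Strings play the role of exemplar objects in this sketch. *)
let toy = {
  data_exemplar = "data exemplar";
  default_element_exemplar = "default exemplar";
  element_alist = [ "a", "exemplar for a"; "b", "exemplar for b" ];
}
```

<P
>Here <TT CLASS="LITERAL">element_exemplar_for toy "a"</TT> selects the entry registered for <TT CLASS="LITERAL">a</TT>, and any element type missing from the list falls back to the default exemplar.</P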
 412 ><P
 413 >The configuration is now complete. You can still use the same parsing
 414 functions; only the initialization is a bit different. For example, call the
 415 parser by:
 416
 417 <PRE
 418 CLASS="PROGRAMLISTING"
 419 >let d = parse_document_entity default_config (from_file "doc.xml") spec</PRE
 420 >
 421
 422 Note that the resulting document <TT
 423 CLASS="LITERAL"
 424 >d</TT
 425 > has a usable type;
 426 in particular, the <TT
 427 CLASS="LITERAL"
 428 >print</TT
 429 > method we added is visible. So you can
 430 print your document by
 431
 432 <PRE
 433 CLASS="PROGRAMLISTING"
 434 >d # root # extension # print stdout</PRE
 435 ></P
 436 ><P
 437 >This object-oriented approach looks rather complicated; this is mostly caused
 438 by working around some problems of the strict typing system of O'Caml. Some
 439 auxiliary concepts such as extensions were needed, but the practical
 440 consequences are minor. In the next section, one of the examples of the
 441 distribution is explained, a converter from <I
 442 CLASS="EMPHASIS"
 443 >readme</I
 444 >
 445 documents to HTML.</P
 446 ></DIV
 447 ><H3
 448 CLASS="FOOTNOTES"
 449 >Notes</H3
 450 ><TABLE
 451 BORDER="0"
 452 CLASS="FOOTNOTES"
 453 WIDTH="100%"
 454 ><TR
 455 ><TD
 456 ALIGN="LEFT"
 457 VALIGN="TOP"
 458 WIDTH="5%"
 459 ><A
 460 NAME="FTN.AEN688"
 461 HREF="x675.html#AEN688"
 462 >[1]</A
 463 ></TD
 464 ><TD
 465 ALIGN="LEFT"
 466 VALIGN="TOP"
 467 WIDTH="95%"
 468 ><P
 469 >The problem is that the subclass is
 470 usually not a subtype in this case because O'Caml's subtyping rule is
 471 contravariant in the argument types of methods. </P
 472 ></TD
 473 ></TR
 474 ></TABLE
 475 ><DIV
 476 CLASS="NAVFOOTER"
 477 ><HR
 478 ALIGN="LEFT"
 479 WIDTH="100%"><TABLE
 480 WIDTH="100%"
 481 BORDER="0"
 482 CELLPADDING="0"
 483 CELLSPACING="0"
 484 ><TR
 485 ><TD
 486 WIDTH="33%"
 487 ALIGN="left"
 488 VALIGN="top"
 489 ><A
 490 HREF="x550.html"
 491 >Prev</A
 492 ></TD
 493 ><TD
 494 WIDTH="34%"
 495 ALIGN="center"
 496 VALIGN="top"
 497 ><A
 498 HREF="index.html"
 499 >Home</A
 500 ></TD
 501 ><TD
 502 WIDTH="33%"
 503 ALIGN="right"
 504 VALIGN="top"
 505 ><A
 506 HREF="x738.html"
 507 >Next</A
 508 ></TD
 509 ></TR
 510 ><TR
 511 ><TD
 512 WIDTH="33%"
 513 ALIGN="left"
 514 VALIGN="top"
 515 >How to parse a document from an application</TD
 516 ><TD
 517 WIDTH="34%"
 518 ALIGN="center"
 519 VALIGN="top"
 520 ><A
 521 HREF="c533.html"
 522 >Up</A
 523 ></TD
 524 ><TD
 525 WIDTH="33%"
 526 ALIGN="right"
 527 VALIGN="top"
 528 >Example: An HTML backend for the <I
 529 CLASS="EMPHASIS"
 530 >readme</I
 531 >
 532 DTD</TD
 533 ></TR
 534 ></TABLE
 535 ></DIV
 536 ></BODY
 537 ></HTML
 538 >