>Example: An HTML backend for the readme
CONTENT="Modular DocBook HTML Stylesheet Version 1.46"><LINK
TITLE="The PXP user's guide"
HREF="index.html"><LINK
HREF="c533.html"><LINK
TITLE="Class-based processing of the node tree"
HREF="x675.html"><LINK
TITLE="The objects representing the document"
HREF="c893.html"><LINK
HREF="markup.css"></HEAD
>The PXP user's guide</TH
>Chapter 2. Using <SPAN
NAME="SECT.README.TO-HTML"
>2.4. Example: An HTML backend for the <I
>The converter from <I
documents strictly follows the approach of defining one class per element
type. The HTML code is similar to the <I
because of this, most elements can be converted in the following way: given the
CLASS="PROGRAMLISTING"
><e>content</e></PRE
the conversion text is the concatenation of a computed prefix, the recursively
converted content, and a computed suffix.</P
>Only one element type cannot be handled by this scheme:
>. Footnotes are collected as they are encountered in
the input text, and they are printed after the main text has been converted and
CLASS="PROGRAMLISTING"
open Pxp_document</PRE
>2.4.2. Type declarations</A
CLASS="PROGRAMLISTING"
>class type footnote_printer = object
  method footnote_to_html : store_type -> out_channel -> unit
end
and store_type = object
  method alloc_footnote : footnote_printer -> int
  method print_footnotes : out_channel -> unit
end
> is a container for footnotes. You can add a
footnote by invoking <TT
>; the argument is an
object of the class <TT
>footnote_printer</TT
>, and the method returns the
number of the footnote. The interesting property of a footnote is that it can
be converted to HTML, so a <TT
>footnote_printer</TT
>footnote_to_html</TT
> which is defined below has a compatible method
>footnote_to_html</TT
> such that objects created from it can be
>footnote_printer</TT
>The other method, <TT
> prints the footnotes as a
definition list and is typically invoked after the main material of the page
has already been printed. Every item of the list is printed by
>footnote_to_html</TT
CLASS="PROGRAMLISTING"
>class store =
  object (self)
    val mutable footnotes = ( [] : (int * footnote_printer) list )
    val mutable next_footnote_number = 1

    (* register a footnote and return its number *)
    method alloc_footnote n =
      let number = next_footnote_number in
      next_footnote_number <- number+1;
      footnotes <- footnotes @ [ number, n ];
      number

    (* print all collected footnotes as a definition list *)
    method print_footnotes ch =
      if footnotes <> [] then begin
        output_string ch "<hr align=left noshade=noshade width=\"30%\">\n";
        output_string ch "<dl>\n";
        List.iter
          (fun (_,n) ->
             n # footnote_to_html (self : #store_type :> store_type) ch)
          footnotes;
        output_string ch "</dl>\n";
      end
  end
>This function converts the characters <, >, &, and " to their HTML
representation. For example,
>escape_html "<>" = "&lt;&gt;"</TT
>. All other characters are left unchanged.
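<P
>As a point of comparison, the same mapping can be sketched without the Str library. The following is only an illustration under that assumption: the name escape_html_buf is hypothetical and is not part of the readme converter, and the standard Buffer module stands in for the regular expression substitution.</P
>

```ocaml
(* Sketch only: escape_html_buf is a hypothetical name, not part of the
   converter. It applies the same four replacements as escape_html, but
   walks the string once and collects the result in a Buffer. *)
let escape_html_buf (s : string) : string =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '<' -> Buffer.add_string b "&lt;"
      | '>' -> Buffer.add_string b "&gt;"
      | '&' -> Buffer.add_string b "&amp;"
      | '"' -> Buffer.add_string b "&quot;"
      | c -> Buffer.add_char b c)   (* every other character is copied *)
    s;
  Buffer.contents b

let () =
  (* prints &lt;&gt; *)
  print_endline (escape_html_buf "<>")
```

Because each character is inspected exactly once, no match can be replaced twice; in particular the ampersand of an already produced entity is never escaped again.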
CLASS="PROGRAMLISTING"
>let escape_html s =
  Str.global_substitute
    (Str.regexp "<\\|>\\|&\\|\"")
    (fun s ->
       (* the regexp matches exactly one of the four characters *)
       match Str.matched_string s with
           "<" -> "&lt;"
         | ">" -> "&gt;"
         | "&" -> "&amp;"
         | "\"" -> "&quot;"
         | _ -> assert false)
    s
>2.4.5. Virtual class <TT
>This virtual class is the abstract superclass of the extension classes shown
below. It defines the standard methods <TT
>, and declares the type
of the virtual method <TT
>. This method recursively
traverses the whole element tree and prints the converted HTML code to the
output channel passed as the second argument. The first argument is the reference
> object which collects the footnotes.
CLASS="PROGRAMLISTING"
>class virtual shared =
  object (self)
    (* --- default_ext --- *)
    val mutable node = (None : shared node option)
    method clone = {< >}
    method node = match node with None -> assert false | Some n -> n
    method set_node n = node <- Some n
    (* --- virtual --- *)
    method virtual to_html : store -> out_channel -> unit
  end
>This class defines <TT
> such that the character data of
the current node is converted to HTML. Note that <TT
extension object, <TT
> is the node object, and
>self # node # data</TT
> returns the character data of the node.
CLASS="PROGRAMLISTING"
>class only_data =
  object (self) inherit shared
    method to_html store ch =
      output_string ch (escape_html (self # node # data))
  end
>This class converts elements of type <TT
element is (by definition) always the root element of the document. First, the
HTML header is printed; the <TT
> attribute of the element
determines the title of the HTML page. Some aspects of the HTML page can be
configured by setting certain parameter entities, for example the background
color, the text color, and the link colors. After the header, the
> tag, and the headline have been printed, the contents
of the page are converted by invoking <TT
children of the current node (which is the root node). Then, the footnotes are
appended to this by telling the global <TT
the footnotes. Finally, the end tags of the HTML page are printed.</P
>This class is an example of how to access the value of an attribute: the value is
determined by invoking <TT
>self # node # attribute "title"</TT
this attribute has been declared as CDATA and as being required, the value has
string value of the attribute.</P
>You can also see how entity contents can be accessed. A parameter entity object
can be looked up by <TT
>self # node # dtd # par_entity "name"</TT
>replacement_text</TT
> the value of the entity
is returned after inner parameter and character entities have been
processed. Note that you must use <TT
> to access general entities.</P
CLASS="PROGRAMLISTING"
>class readme =
  object (self)
    inherit shared

    method to_html store ch =
      (* output header *)
      output_string
        ch "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 3.2 Final//EN\">";
      output_string
        ch "<!-- WARNING! This is a generated file, do not edit! -->\n";
      let title =
        match self # node # attribute "title" with
            Value s -> s
          | _ -> assert false
      in
      (* look up the parameter entities that configure the page *)
      let html_header, _ =
        try (self # node # dtd # par_entity "readme:html:header")
            # replacement_text
        with WF_error _ -> "", false in
      let html_trailer, _ =
        try (self # node # dtd # par_entity "readme:html:trailer")
            # replacement_text
        with WF_error _ -> "", false in
      let html_bgcolor, _ =
        try (self # node # dtd # par_entity "readme:html:bgcolor")
            # replacement_text
        with WF_error _ -> "white", false in
      let html_textcolor, _ =
        try (self # node # dtd # par_entity "readme:html:textcolor")
            # replacement_text
        with WF_error _ -> "", false in
      let html_alinkcolor, _ =
        try (self # node # dtd # par_entity "readme:html:alinkcolor")
            # replacement_text
        with WF_error _ -> "", false in
      let html_vlinkcolor, _ =
        try (self # node # dtd # par_entity "readme:html:vlinkcolor")
            # replacement_text
        with WF_error _ -> "", false in
      let html_linkcolor, _ =
        try (self # node # dtd # par_entity "readme:html:linkcolor")
            # replacement_text
        with WF_error _ -> "", false in
      let html_background, _ =
        try (self # node # dtd # par_entity "readme:html:background")
            # replacement_text
        with WF_error _ -> "", false in

      output_string ch "<html><head><title>\n";
      output_string ch (escape_html title);
      output_string ch "</title></head>\n";
      output_string ch "<body ";
      (* print only those body attributes that have a non-empty value *)
      List.iter
        (fun (name,value) ->
           if value <> "" then
             output_string ch (name ^ "=\"" ^ escape_html value ^ "\" "))
        [ "bgcolor",    html_bgcolor;
          "text",       html_textcolor;
          "link",       html_linkcolor;
          "alink",      html_alinkcolor;
          "vlink",      html_vlinkcolor;
          "background", html_background;
        ];
      output_string ch ">\n";
      output_string ch html_header;
      output_string ch "<h1>";
      output_string ch (escape_html title);
      output_string ch "</h1>\n";
      (* process main content: *)
      List.iter
        (fun n -> n # extension # to_html store ch)
        (self # node # sub_nodes);
      (* now process footnotes *)
      store # print_footnotes ch;
      (* trailer *)
      output_string ch html_trailer;
      output_string ch "</body></html>\n";
  end
>As the conversion process is very similar, the conversion classes of the three
section levels are derived from the more general <TT
class. The HTML code of the section levels differs only in the type of the
headline, and because of this the classes describing the section levels can be
obtained by replacing the class argument <TT
> by the HTML name of the headline tag.</P
>Section elements are converted to HTML by printing a headline and then
converting the contents of the element recursively. More precisely, the first
sub-element is always a <TT
> element, and the other
elements are the contents of the section. This structure is declared in the
DTD, and it is guaranteed that the document matches the DTD. Because of this,
the title node can be separated from the rest without any checks.</P
>Both the title node and the body nodes are then converted to HTML by calling
CLASS="PROGRAMLISTING"
>class section the_tag =
  object (self)
    inherit shared

    val tag = the_tag

    method to_html store ch =
      let sub_nodes = self # node # sub_nodes in
      match sub_nodes with
          title_node :: rest ->
            (* the DTD guarantees that the first sub-element is the title *)
            output_string ch ("<" ^ tag ^ ">\n");
            title_node # extension # to_html store ch;
            output_string ch ("\n</" ^ tag ^ ">");
            List.iter
              (fun n -> n # extension # to_html store ch)
              rest
        | _ ->
            assert false
  end
;;

class sect1 = section "h1";;
class sect2 = section "h3";;
class sect3 = section "h4";;</PRE
>Several element types are converted to HTML by simply mapping them to the
corresponding HTML element types. The class <TT
implements this, and the class argument <TT
determines the tag name to map to. The output consists of the start tag, the
recursively converted inner elements, and the end tag.
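<P
>The scheme of start tag, recursively converted content, and end tag can also be sketched independently of PXP. The example below is a simplified model, not the converter itself: the node type, the target_tag table, and the convert function are hypothetical stand-ins, the output goes to a Buffer instead of an output channel, and escaping is left out to keep the sketch short.</P
>

```ocaml
(* Hypothetical stand-in for a parsed document; this is not the PXP node type. *)
type node =
  | Data of string
  | Element of string * node list

(* Hypothetical table from element types to HTML tag names, mirroring the
   classes p, em, ul, and li. Passing unknown names through unchanged is an
   assumption of this sketch. *)
let target_tag = function
  | "p" -> "p"
  | "em" -> "b"
  | "ul" -> "ul"
  | "li" -> "li"
  | other -> other

let rec to_html buf = function
  | Data s -> Buffer.add_string buf s              (* escaping omitted here *)
  | Element (name, children) ->
      let tag = target_tag name in
      Buffer.add_string buf ("<" ^ tag ^ ">\n");   (* computed prefix *)
      List.iter (to_html buf) children;            (* converted content *)
      Buffer.add_string buf ("\n</" ^ tag ^ ">")   (* computed suffix *)

let convert n =
  let buf = Buffer.create 64 in
  to_html buf n;
  Buffer.contents buf

let () =
  print_string
    (convert (Element ("p", [ Data "Hello "; Element ("em", [ Data "world" ]) ])))
```

In the class-based version the dispatch on the element type is done by the extension objects, so no explicit table like target_tag is needed there.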
CLASS="PROGRAMLISTING"
>class map_tag the_target_tag =
  object (self)
    inherit shared

    val target_tag = the_target_tag

    method to_html store ch =
      output_string ch ("<" ^ target_tag ^ ">\n");
      List.iter
        (fun n -> n # extension # to_html store ch)
        (self # node # sub_nodes);
      output_string ch ("\n</" ^ target_tag ^ ">");
  end
;;

class p = map_tag "p";;
class em = map_tag "b";;
class ul = map_tag "ul";;
class li = map_tag "li";;</PRE
> are mapped to the same HTML type. Note
that HTML forbids the end tag of <TT
CLASS="PROGRAMLISTING"
>class br =
  object (self) inherit shared
    method to_html store ch =
      output_string ch "<br>\n";
      List.iter
        (fun n -> n # extension # to_html store ch)
        (self # node # sub_nodes);
  end
> type is converted to a <TT
section (preformatted text). As the meaning of tabs is unspecified in HTML,
tabs are expanded to spaces.
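<P
>The tab expansion can also be written as a single pass over the string. The following sketch assumes the same tab stops at every eighth column as the listing; the function name expand_tabs is hypothetical, and a Buffer is used in place of repeated string concatenation.</P
>

```ocaml
(* Sketch only: expand_tabs is a hypothetical helper. Tabs are expanded to
   the next multiple of 8 columns, and a newline resets the column. *)
let expand_tabs (data : string) : string =
  let buf = Buffer.create (String.length data) in
  let column = ref 0 in
  String.iter
    (fun c ->
       match c with
       | '\t' ->
           let n = 8 - (!column mod 8) in
           Buffer.add_string buf (String.make n ' ');
           column := !column + n
       | '\n' ->
           Buffer.add_char buf '\n';
           column := 0
       | c ->
           Buffer.add_char buf c;
           incr column)
    data;
  Buffer.contents buf

let () =
  (* "a" is in column 0, so the tab is replaced by 7 spaces *)
  assert (expand_tabs "a\tb" = "a" ^ String.make 7 ' ' ^ "b")
```

The recursive version builds a new string for every character, while this variant appends to one buffer, so it stays linear in the length of the input.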
CLASS="PROGRAMLISTING"
>class code =
  object (self)
    inherit shared

    method to_html store ch =
      let data = self # node # data in
      let l = String.length data in
      let rec preprocess i column =
        (* this is very inefficient but easy to follow: *)
        if i < l then
          match data.[i] with
              '\t' ->
                let n = 8 - (column mod 8) in
                String.make n ' ' ^ preprocess (i+1) (column + n)
            | '\n' ->
                "\n" ^ preprocess (i+1) 0
            | c ->
                String.make 1 c ^ preprocess (i+1) (column + 1)
        else
          ""
      in
      output_string ch "<p><pre>";
      output_string ch (escape_html (preprocess 0 0));
      output_string ch "</pre></p>";
  end
>Hyperlinks, expressed by the <TT
> element type, are converted
> type. If the target of the hyperlink is given
>, the URL of this attribute can be used
directly. Alternatively, the target can be given by
>, in which case the ".html" suffix must be added to
>Note that within <TT
> only #PCDATA is allowed, so the contents
can be converted directly by applying <TT
character data contents.
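<P
>The choice between the two attributes can be isolated in a small function. This is a sketch under stated assumptions: the type att_value below is a local stand-in that only mirrors the three constructors used in the listing, link_target is a hypothetical helper, and HTML escaping is omitted.</P
>

```ocaml
(* Local stand-in for the attribute value type; defined here only so that
   the sketch is self-contained. *)
type att_value =
  | Value of string
  | Valuelist of string list
  | Implied_value

(* Hypothetical helper: an explicit href wins; otherwise the name of the
   referenced readme document gets the ".html" suffix; otherwise there is
   no target at all. *)
let link_target ~href ~readmeref =
  match href, readmeref with
  | Value url, _ -> Some url
  | Implied_value, Value name -> Some (name ^ ".html")
  | Implied_value, Implied_value -> None
  | Valuelist _, _ | _, Valuelist _ -> invalid_arg "link_target: list value"

let () =
  match link_target ~href:Implied_value ~readmeref:(Value "INSTALL") with
  | Some t -> print_endline t        (* prints INSTALL.html *)
  | None -> print_endline "(no target)"
```

Returning an option makes the case without any target explicit, where the listing uses the empty string for the same purpose.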
CLASS="PROGRAMLISTING"
>class a =
  object (self)
    inherit shared

    method to_html store ch =
      output_string ch "<a ";
      (* prefer the href attribute, fall back to readmeref *)
      let href =
        match self # node # attribute "href" with
            Value v -> escape_html v
          | Valuelist _ -> assert false
          | Implied_value ->
              begin match self # node # attribute "readmeref" with
                  Value v -> escape_html v ^ ".html"
                | Valuelist _ -> assert false
                | Implied_value ->
                    ""
              end
      in
      if href <> "" then
        output_string ch ("href=\"" ^ href ^ "\"");
      output_string ch ">";
      output_string ch (escape_html (self # node # data));
      output_string ch "</a>";
  end
> class has two methods:
> to convert the footnote reference to HTML, and
>footnote_to_html</TT
> to convert the footnote text itself.</P
>The footnote reference is converted to a local hyperlink; more precisely, to
two anchor tags which are connected with each other. The text anchor points to
the footnote anchor, and the footnote anchor points to the text anchor.</P
>The footnote must be allocated in the <TT
allocating the footnote, you get the number of the footnote, and the text of
the footnote is stored until the end of the HTML page is reached, when the
footnotes can be printed. The <TT
> method simply stores
the object itself, so that the <TT
>footnote_to_html</TT
invoked on the same object that encountered the footnote.</P
> only allocates the footnote and prints the
reference anchor, but it neither prints nor converts the contents of the
note. This is deferred until the footnotes actually get printed, i.e. the
recursive call of <TT
> on the sub nodes is done by
>footnote_to_html</TT
>Note that this technique does not work if you make another footnote within a
footnote; the second footnote gets allocated but not printed.</P
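><P
>The deferred printing, including the limitation for nested footnotes, can be reproduced with a small model. The sketch below is not the converter: the record type store, the functions create_store, alloc_footnote and print_footnotes, and the demo function are hypothetical, a footnote is modelled as a closure instead of an object, and a Buffer replaces the output channel.</P
>

```ocaml
(* Hypothetical, simplified model: a footnote is a closure that writes its
   text to a Buffer. *)
type store = {
  mutable footnotes : (int * (store -> Buffer.t -> unit)) list;
  mutable next_number : int;
}

let create_store () = { footnotes = []; next_number = 1 }

(* Remember the printer and hand out the next footnote number. *)
let alloc_footnote st printer =
  let number = st.next_number in
  st.next_number <- number + 1;
  st.footnotes <- st.footnotes @ [ (number, printer) ];
  number

(* List.iter walks the list as it is when printing starts, so footnotes
   allocated while printing are not visited. *)
let print_footnotes st buf =
  List.iter
    (fun (number, printer) ->
       Buffer.add_string buf (Printf.sprintf "[%d] " number);
       printer st buf;
       Buffer.add_char buf '\n')
    st.footnotes

(* Returns the generated text and the number of allocated footnotes. *)
let demo () =
  let st = create_store () in
  let buf = Buffer.create 64 in
  let inner _ b = Buffer.add_string b "inner text" in
  (* the text of the outer footnote itself contains a second footnote *)
  let outer st b =
    let n = alloc_footnote st inner in
    Buffer.add_string b (Printf.sprintf "outer text, see [%d]" n)
  in
  let n = alloc_footnote st outer in
  Buffer.add_string buf (Printf.sprintf "Main text [%d]\n" n);
  print_footnotes st buf;
  (Buffer.contents buf, List.length st.footnotes)

let () = print_string (fst (demo ()))
```

In this model two footnotes are allocated, but only the first one appears in the output; the second is registered while the list is already being printed.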
CLASS="PROGRAMLISTING"
>class footnote =
  object (self)
    inherit shared

    val mutable footnote_number = 0

    method to_html store ch =
      (* allocate the footnote and print only the reference anchor *)
      let number =
        store # alloc_footnote (self : #shared :> footnote_printer) in
      let foot_anchor =
        "footnote" ^ string_of_int number in
      let text_anchor =
        "textnote" ^ string_of_int number in
      footnote_number <- number;
      output_string ch ( "<a name=\"" ^ text_anchor ^ "\" href=\"#" ^
                         foot_anchor ^ "\">[" ^ string_of_int number ^
                         "]</a>" )

    method footnote_to_html store ch =
      (* prerequisite: we are in a definition list <dl>...</dl> *)
      let foot_anchor =
        "footnote" ^ string_of_int footnote_number in
      let text_anchor =
        "textnote" ^ string_of_int footnote_number in
      output_string ch ("<dt><a name=\"" ^ foot_anchor ^ "\" href=\"#" ^
                        text_anchor ^ "\">[" ^ string_of_int footnote_number ^
                        "]</a></dt>\n<dd>");
      List.iter
        (fun n -> n # extension # to_html store ch)
        (self # node # sub_nodes);
      output_string ch ("\n</dd>")
  end
>2.4.14. The specification of the document model</A
>This code sets up the hash table that connects element types with the exemplars
of the extension classes that convert the elements to HTML.
CLASS="PROGRAMLISTING"
    ~data_exemplar:(new data_impl (new only_data))
    ~default_element_exemplar:(new element_impl (new no_markup))
      [ "readme",   (new element_impl (new readme));
        "sect1",    (new element_impl (new sect1));
        "sect2",    (new element_impl (new sect2));
        "sect3",    (new element_impl (new sect3));
        "title",    (new element_impl (new no_markup));
        "p",        (new element_impl (new p));
        "br",       (new element_impl (new br));
        "code",     (new element_impl (new code));
        "em",       (new element_impl (new em));
        "ul",       (new element_impl (new ul));
        "li",       (new element_impl (new li));
        "footnote", (new element_impl (new footnote : #shared :> shared));
        "a",        (new element_impl (new a));
>Class-based processing of the node tree</TD
>The objects representing the document</TD