The PXP user's guide
Chapter 2. Using PXP

2.2. How to parse a document from an application

Let me first give a rough overview of the object model of the parser. The following items are represented by objects:

Additionally, the following modules play a role:

There are some further modules that are needed internally but are not part of the API.

Suppose the document to be parsed is stored in a file called doc.xml. Parsing is started by calling the function

val parse_document_entity : config -> source -> 'ext spec -> 'ext document
defined in the module Pxp_yacc. The first argument specifies some global properties of the parser; it is recommended to start with default_config. The second argument determines where the document to be parsed comes from; this may be a file, a channel, or an entity ID. To parse doc.xml, it is sufficient to pass from_file "doc.xml".

The third argument passes the object specification to use. Roughly speaking, it determines which classes implement the node objects of which element types, and which extensions are to be used. The type variable 'ext is the type of the extension. For the moment, let us simply pass default_spec as this argument and ignore it.

So the following expression parses doc.xml:

open Pxp_yacc
let d = parse_document_entity default_config (from_file "doc.xml") default_spec
Note that default_config implies that warnings are collected but not printed. Errors raise one of the exceptions defined in Pxp_types. To see warnings and get readable error messages, pass your own warner in the configuration and catch the exceptions as follows:
class warner =
  object
    method warn w =
      print_endline ("WARNING: " ^ w)
  end
;;

try
  let config = { default_config with warner = new warner } in
  let d = parse_document_entity config (from_file "doc.xml") default_spec
  in
    ...
with
   e ->
     print_endline (Pxp_types.string_of_exn e)
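If the warnings should be kept rather than printed immediately, the warner can store them and the application can decide later what to do with them. The following is a minimal sketch that assumes only what the example above shows, namely that the parser calls a method warn with a string; the class name collecting_warner and its method warnings are hypothetical. Because the object has an extra method, it may need to be coerced with :> to the type the configuration record expects for its warner field; check that declaration in the PXP interfaces before relying on it.

```ocaml
(* Hypothetical warner that stores warnings instead of printing them.
   It offers the same [warn : string -> unit] method as the warner
   class above, plus an extra accessor. *)
class collecting_warner =
  object
    val mutable collected : string list = []

    method warn (w : string) = collected <- w :: collected

    (* The warnings, in the order in which they were reported. *)
    method warnings = List.rev collected
  end

let () =
  let w = new collecting_warner in
  w # warn "first warning";
  w # warn "second warning";
  List.iter (fun s -> print_endline ("WARNING: " ^ s)) (w # warnings)
```

Deferring the output this way lets the application, for instance, print warnings only after parsing has succeeded.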
Now d is an object of the document class. If you want the node tree, you can get the root element by
let root = d # root
and if you would rather access the DTD, you can get it by
let dtd = d # dtd
Since it is more interesting, let us now investigate the node tree. Given the root element, it is possible to traverse the whole tree recursively. The children of a node n are returned by the method sub_nodes, and the type of a node is returned by node_type. The following function traverses the tree and prints the type of each node:
let rec print_structure n =
  let ntype = n # node_type in
  match ntype with
    T_element name ->
      print_endline ("Element of type " ^ name);
      let children = n # sub_nodes in
      List.iter print_structure children
  | T_data ->
      print_endline "Data"
  | _ ->
      (* Other node types are not possible unless the parser is configured
         differently.
       *)
      assert false
You can call this function as follows:
print_structure root
With the default configuration, the value returned by node_type is either T_element name or T_data. The name of the element type is the string included in the angle brackets. Note that only elements have children; data nodes are always leaves of the tree.
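To experiment with this kind of traversal without running the parser, the two methods can be imitated by a small stand-in. The sketch below is hypothetical: node_type, node, element and data are defined here for illustration and are not the PXP definitions, and only the two constructors of the default configuration are modelled. The function structure performs the same traversal as print_structure but returns the lines instead of printing them.

```ocaml
(* Hypothetical stand-in for PXP's node types: only the two constructors
   that occur with the default configuration are modelled. *)
type node_type = T_element of string | T_data

(* Minimal node interface: just the two methods used by print_structure. *)
type node = < node_type : node_type; sub_nodes : node list >

let element name (children : node list) : node =
  object
    method node_type = T_element name
    method sub_nodes = children
  end

let data () : node =
  object
    method node_type = T_data
    method sub_nodes = []
  end

(* Same traversal order as print_structure, but collecting the lines. *)
let rec structure (n : node) : string list =
  match n # node_type with
  | T_element name ->
      ("Element of type " ^ name)
      :: List.concat (List.map structure (n # sub_nodes))
  | T_data -> [ "Data" ]

let () =
  let root = element "doc" [ data (); element "p" [ data () ] ] in
  List.iter print_endline (structure root)
```

Returning a list rather than printing keeps the traversal separate from the output, which makes it easier to test.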

There are some more methods to access a parsed node tree:

For example, if the task is to print the contents of all elements of type "valuable" whose attribute "priority" is "1", this function can help:
let rec print_valuable_prio1 n =
  let ntype = n # node_type in
  match ntype with
    T_element "valuable" when n # attribute "priority" = Value "1" ->
      print_endline "Valuable node with priority 1 found:";
      print_endline (n # data)
  | (T_element _ | T_data) ->
      let children = n # sub_nodes in
      List.iter print_valuable_prio1 children
  | _ ->
      assert false
You can call this function as follows:
print_valuable_prio1 root
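A variant of print_valuable_prio1 that collects the contents instead of printing them can be sketched with a similar stand-in. Again, node_type, att_value, node, element and text are hypothetical definitions for illustration only: here attribute returns Implied_value for every attribute not in the element's list (the real attribute method may behave differently, e.g. for undeclared attributes), and data concatenates the text of all descendants.

```ocaml
(* Hypothetical stand-ins, not the PXP definitions. *)
type node_type = T_element of string | T_data
type att_value = Value of string | Implied_value

type node = <
  node_type : node_type;
  sub_nodes : node list;
  attribute : string -> att_value;
  data : string >

(* An element with attribute list [atts]; missing attributes are implied. *)
let element name atts (children : node list) : node =
  object
    method node_type = T_element name
    method sub_nodes = children
    method attribute a =
      (match List.assoc_opt a atts with
       | Some v -> Value v
       | None -> Implied_value)
    method data = String.concat "" (List.map (fun (c : node) -> c # data) children)
  end

let text s : node =
  object
    method node_type = T_data
    method sub_nodes = []
    method attribute _ = Implied_value
    method data = s
  end

(* Same case analysis as print_valuable_prio1, returning the contents. *)
let rec valuable_prio1 (n : node) : string list =
  match n # node_type with
  | T_element "valuable" when n # attribute "priority" = Value "1" -> [ n # data ]
  | T_element _ | T_data -> List.concat (List.map valuable_prio1 (n # sub_nodes))

let sample =
  element "doc" []
    [ element "valuable" [ ("priority", "1") ] [ text "gold" ];
      element "valuable" [ ("priority", "2") ] [ text "silver" ];
      element "section" [] [ element "valuable" [ ("priority", "1") ] [ text "platinum" ] ] ]

let () = List.iter print_endline (valuable_prio1 sample)
```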
If you like a DSSSL-like style, you can make the function process_children explicit:
let rec print_valuable_prio1 n =

  let process_children n =
    let children = n # sub_nodes in
    List.iter print_valuable_prio1 children
  in

  let ntype = n # node_type in
  match ntype with
    T_element "valuable" when n # attribute "priority" = Value "1" ->
      print_endline "Valuable node with priority 1 found:";
      print_endline (n # data)
  | (T_element _ | T_data) ->
      process_children n
  | _ ->
      assert false
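The DSSSL-like style can be pushed one step further by moving the recursion into a reusable driver: the handler receives each node together with a process_children function and decides itself whether to descend. This is a minimal sketch using hypothetical stand-in nodes again; walk and element_names are illustrative names, not part of PXP.

```ocaml
(* Hypothetical minimal node interface (not PXP's node class type). *)
type node_type = T_element of string | T_data
type node = < node_type : node_type; sub_nodes : node list >

let element name (children : node list) : node =
  object
    method node_type = T_element name
    method sub_nodes = children
  end

let data () : node =
  object
    method node_type = T_data
    method sub_nodes = []
  end

(* Reusable driver: [handle] receives each node together with a
   [process_children] function and decides whether to descend. *)
let rec walk (handle : node -> (unit -> unit) -> unit) (n : node) : unit =
  let process_children () = List.iter (walk handle) (n # sub_nodes) in
  handle n process_children

(* Example handler: list element names, skipping the subtree of "skip". *)
let element_names (root : node) : string list =
  let names = ref [] in
  walk
    (fun n process_children ->
      match n # node_type with
      | T_element "skip" -> ()
      | T_element name ->
          names := name :: !names;
          process_children ()
      | T_data -> ())
    root;
  List.rev !names

let () =
  let root =
    element "doc"
      [ element "a" [ data () ]; element "skip" [ element "hidden" [] ]; element "b" [] ]
  in
  List.iter print_endline (element_names root)
```

The handler only contains the case analysis; the recursion lives in walk, so several handlers can share it.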
So far, O'Caml serves as a simple "style-sheet language": you can form a big "match" expression to distinguish between all significant cases, and provide different reactions to different conditions. But this technique has limitations: the "match" expression tends to grow larger and larger, and it is difficult to store intermediate values because there is only one big recursion. Alternatively, it is also possible to represent the various cases as classes, and to use dynamic method lookup to find the appropriate class. The next section explains this technique in detail.
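The idea of selecting a class per case can be illustrated independently of PXP. In this toy sketch, the tree type, the handler classes and the table handlers are all hypothetical: the element type is looked up in the table, and the process method of the selected object is then called by dynamic dispatch (emphasis_handler overrides it and reuses the inherited version via super). In PXP itself, the mapping from element types to classes is made by the object specification passed to parse_document_entity, not by a table like this one.

```ocaml
(* Hypothetical toy tree, independent of PXP's node classes. *)
type tree = Element of string * tree list | Data of string

(* Default behaviour: output the text, recursing into children via [recurse]. *)
class default_handler =
  object
    method process (recurse : tree -> string) (t : tree) : string =
      match t with
      | Data s -> s
      | Element (_, children) -> String.concat "" (List.map recurse children)
  end

(* Behaviour for "em" elements: wrap the default output in asterisks. *)
class emphasis_handler =
  object
    inherit default_handler as super
    method! process recurse t = "*" ^ super#process recurse t ^ "*"
  end

(* Table mapping element types to handler objects. *)
let handlers : (string, default_handler) Hashtbl.t = Hashtbl.create 8
let () = Hashtbl.replace handlers "em" (new emphasis_handler)

(* Find the handler for a node; dynamic dispatch then selects [process]. *)
let rec render (t : tree) : string =
  let handler =
    match t with
    | Element (name, _) ->
        (match Hashtbl.find_opt handlers name with
         | Some h -> h
         | None -> new default_handler)
    | Data _ -> new default_handler
  in
  handler # process render t

let () =
  print_endline (render (Element ("p", [ Data "a "; Element ("em", [ Data "b" ]); Data " c" ])))
```

Adding a new case means adding a class and a table entry, without touching a central match expression.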

Notes

[1]

Elements may also contain processing instructions. Unlike other document models, PXP separates processing instructions from the rest of the text and provides a second interface to access them (method pinstr). However, there is a parser option (enable_pinstr_nodes) which changes the behaviour of the parser such that extra nodes for processing instructions are included in the tree.

Furthermore, the tree normally does not contain nodes for XML comments; they are ignored by default. Again, there is an option (enable_comment_nodes) that changes this.

[2]

Due to the typing system, it is more or less impossible to derive recursive classes in O'Caml. To get around this, it is common practice to put the modifiable or extensible part of recursive objects into parallel objects.

