% helm/mathql/doc/mathql_introduction_property.tex

\subsection{High level access to metadata} \label{HighAccess}

{\MathQL} high level access to an {\RDF} database is \emph{graph-oriented} and
is delegated to its \TT{property} operator, which formally accesses an {\RDF}
graph%
\footnote
{When we say {\RDF} graph, we actually mean both the {\RDFM} graph and the
{\RDFS} graph.}
through an \emph{access relation}, which is better understood by explaining
the informal semantics of the operator itself.

This operator builds a \emph{result} {\av} set starting from two mandatory
arguments: the \emph{source} {\av} set and the \emph{head path}.
Other optional arguments may be used to change its default behaviour or to
request advanced functionalities.
Its textual syntax is (see \subsecref{Textual}):

\begin{center}
\TT{property} \EM{optional-flags} \EM{head-path} \EM{optional-clauses} \TT{of}
\EM{optional-flag} \EM{av-set}
\end{center}

A path has the structure of an attribute name ({\ie} a list of strings) and
denotes a (possibly empty) finite sequence of contiguous arcs (describing
properties in the {\RDF} graph).

\begin{figure}[ht]
\begin{footnotesize} \begin{verbatim}
These examples refer to the resources "A" and "B" of Figure 2.

Example 1: reading an unstructured property - simple case:
property "id"/"major" of {"A", "B"}   returns   "1"
property "id"/"minor" of {"A", "B"}   returns   "2"; "7"

Example 2: reading an unstructured property - use of pattern:
property "id"/"minor" of pattern ".*"   returns   "2"; "7"

Example 3: reading a structured property without main component:
property "id" attr "major", "minor" of {"A", "B"}
generates the following attributed values:
"" attr {"major" = "1"; "minor" = "2"}; "" attr {"major" = "1"; "minor" = "7"}
that are composed using MathQL-1 set-theoretic union giving the one-element set:
"" attr {"major" = "1"; "minor" = "2"}, {"major" = "1"; "minor" = "7"}

Example 4: reading a structured property specifying a main component:
property "id" main "major" attr "minor" of {"A", "B"}   gives
"1" attr {"minor" = "2"}, {"minor" = "7"}

Example 5: the renaming mechanism:
property "id" attr "minor" as "new-name" of {"A", "B"}   gives
"" attr {"new-name" = "2"}, {"new-name" = "7"}

Example 6: imposing constraints on property values:
property "date" istrue "first" in "2002-01-01" attr "modified" of {"A", "B"}   and
property "date" istrue "first" match ".*01.*" attr "modified" of {"A", "B"}    give
"" attr {"modified" = "2002-03-01"}
Only the instance of "date" with "first" set to "2002-01-01" is considered.

Example 7: inverse traversal of the head path:
property inverse "date" attr "first" in subj ""   gives
"A" attr {"first" = "2002-01-01"}; "B" attr {"first" = "2002-02-01"}

Example 8: some triples of an access relation:
The triples formalizing the property "date" of the resource "A":
("A", "date", "");
("A", "date"/"first", "2002-01-01"); ("A", "date"/"modified", "2002-03-01")
\end{verbatim} \end{footnotesize}
\vskip-1pc
\caption{The ``property'' operator}
\label{Property}
\end{figure}

In the simplest case \TT{property} is used to read the values of a (possibly
compound) property with an unstructured value and does the following:

\begin{enumerate}

\item
It computes the instances of the given path in the {\RDF} graph available to
the query engine, using the resources specified in the head strings of the
source {\av} set (call them source resources) as start-nodes.

\item
The computation gives a set of nodes in the {\RDF} graph ({\ie} the end-nodes
of the instantiated paths) which are the values of the instances of the
(possibly compound) property specified by the path and concerning the source
resources.

\item
These values, encoded into {\av}'s as explained above, are composed by means
of the {\MathQL} set-theoretic union to form the result {\av} set.

\end{enumerate}
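
As a rough illustration, the three steps above can be sketched in Python on a
toy graph.
The triples below are an assumption reconstructed from the examples of
\figref{Property}; the identifiers of the intermediate nodes and the function
names are hypothetical and are not part of {\MathQL}.

\begin{footnotesize} \begin{verbatim}
# Toy RDF graph as (subject, arc, object) triples. The nodes starting with
# "_:" are hypothetical intermediate nodes standing for structured values.
GRAPH = [
    ("A", "id", "_:idA"), ("_:idA", "major", "1"), ("_:idA", "minor", "2"),
    ("B", "id", "_:idB"), ("_:idB", "major", "1"), ("_:idB", "minor", "7"),
]

def end_nodes(graph, start, path):
    """Step 1: instantiate `path` from `start` and return its end-nodes."""
    nodes = {start}
    for arc in path:
        nodes = {o for (s, p, o) in graph if s in nodes and p == arc}
    return nodes

def read_property(graph, path, sources):
    """Steps 2 and 3: collect the values and compose them by set union."""
    result = set()
    for resource in sources:
        result |= end_nodes(graph, resource, path)
    return result

print(sorted(read_property(GRAPH, ["id", "major"], ["A", "B"])))  # ['1']
print(sorted(read_property(GRAPH, ["id", "minor"], ["A", "B"])))  # ['2', '7']
\end{verbatim} \end{footnotesize}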

\figref{Property} (example 1) shows an instance of this procedure.
Note that the result sets of this example have no attributes and that a path
is represented by a slash-separated list of strings denoting the path's arcs.%
\footnote{If needed, the empty path is represented by a single slash.}

Using the \TT{pattern} flag, \TT{property} can be instructed to regard the
values of the source {\av} set as POSIX regular expressions rather than as
constant strings.
In this case \TT{pattern} selects the set of resources matching at least one
of the given expressions.
See for instance \figref{Property} (example 2).
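
The effect of the \TT{pattern} flag can be pictured with the following Python
sketch.
It is only an illustration: Python's \TT{re} module is not a POSIX regular
expression engine, the list of known resources is hypothetical, and we assume
that an expression must match a whole resource name, which is not stated
above.

\begin{footnotesize} \begin{verbatim}
import re

RESOURCES = ["A", "B"]  # hypothetical resources known to the query engine

def select_sources(values, resources, pattern=False):
    """Return the source resources denoted by the given head strings."""
    if not pattern:
        # default: the head strings are taken as constant resource names
        return list(values)
    # pattern flag: keep the resources matching at least one expression
    regexes = [re.compile(v) for v in values]
    return [r for r in resources if any(x.fullmatch(r) for x in regexes)]

print(select_sources(["A", "B"], RESOURCES))            # ['A', 'B']
print(select_sources([".*"], RESOURCES, pattern=True))  # ['A', 'B']
\end{verbatim} \end{footnotesize}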

If we want to read the value of a structured property, we can specify the
value's main component in the \TT{main} \EM{optional-clause} (this
specification overrides the default setting inferred from the {\RDF} graph
through the \emph{rdf:value} property) and the list of the value's secondary
components in the \TT{attr} \EM{optional-clause}.
Note that if a secondary component is not listed in the \TT{attr} clause, it
will not be read.
Also recall that, when the result {\av}'s are formed, the main component
is read in the head string, whereas the secondary components are
encoded using the attributes of a single group.
See for instance \figref{Property} (examples 3 and 4).
As a component of a property's value may be a structured property, its
specification (appearing in the \TT{main} or \TT{attr} clause) is
actually a path in the {\RDF} graph starting from the end-node of the head
path.
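
One possible way to picture the \TT{main} and \TT{attr} clauses is sketched
below in Python, on a toy graph reconstructed from \figref{Property}.
The representation of an {\av} set as a dictionary from head strings to lists
of attribute groups, the identifiers of the intermediate nodes, and the use of
the empty head string when no main component is given (the \emph{rdf:value}
default is ignored here) are assumptions made for this sketch.

\begin{footnotesize} \begin{verbatim}
GRAPH = [
    ("A", "id", "_:idA"), ("_:idA", "major", "1"), ("_:idA", "minor", "2"),
    ("B", "id", "_:idB"), ("_:idB", "major", "1"), ("_:idB", "minor", "7"),
]

def follow(graph, node, path):
    """Return the end-nodes of the instances of `path` starting at `node`."""
    nodes = {node}
    for arc in path:
        nodes = {o for (s, p, o) in graph if s in nodes and p == arc}
    return nodes

def read_structured(graph, head_path, sources, attr, main=None):
    """Map each head string to the list of its attribute groups."""
    result = {}
    for resource in sources:
        for value in sorted(follow(graph, resource, head_path)):
            # main component: read into the head string ("" if not given)
            if main is not None:
                heads = sorted(follow(graph, value, main))
            else:
                heads = [""]
            # secondary components: one group, only listed paths are read
            group = {"/".join(p): sorted(follow(graph, value, p))
                     for p in attr}
            for head in heads:
                groups = result.setdefault(head, [])
                if group not in groups:  # union: no duplicated groups
                    groups.append(group)
    return result

# Example 3: no main component, two attributes
print(read_structured(GRAPH, ["id"], ["A", "B"], [["major"], ["minor"]]))
# Example 4: "major" as main component, "minor" as attribute
print(read_structured(GRAPH, ["id"], ["A", "B"], [["minor"]],
                      main=["major"]))
\end{verbatim} \end{footnotesize}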

Note that the name of an attribute, which by default is its defining path in
the \TT{attr} clause, can be changed with an optional \TT{as} clause for the
user's convenience. See for instance \figref{Property} (example 5).
The alternative name could have been a simple string, but it must be a path
for typing reasons; in any case, a string can be seen as a one-element path.

In the default case \TT{property} builds its result considering every
component of the {\RDFM} graph ({\ie} every {\RDFM}) but we can constrain
some nodes of the inspected components to have (or not to have) a given value,
with the aim of improving the performance of the inspection procedure.
The constrained nodes are specified in the \TT{istrue} and \TT{isfalse}
\EM{optional-clauses} and the constraining values are expressed by \TT{in} or
\TT{match} constructions depending on their semantics (constant values or
POSIX regular expressions respectively).
See for instance \figref{Property} (example 6).
Again a constrained node may be the value of a compound property, therefore
its specification (appearing in the \TT{istrue} or \TT{isfalse} clause) is
a path in the {\RDF} graph starting from the end-node of the head path.
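
The filtering performed by the \TT{istrue} and \TT{isfalse} clauses can be
sketched in Python as follows.
The toy graph is reconstructed from \figref{Property} (the ``modified'' date of
``B'' is not shown there and is left out), the helper names are hypothetical,
Python's \TT{re} module only approximates POSIX regular expressions, and the
expression used with \TT{matches} is chosen for this sketch and assumed to
match the whole value.

\begin{footnotesize} \begin{verbatim}
import re

GRAPH = [
    ("A", "date", "_:dA"), ("_:dA", "first", "2002-01-01"),
    ("_:dA", "modified", "2002-03-01"),
    ("B", "date", "_:dB"), ("_:dB", "first", "2002-02-01"),
]

def follow(graph, node, path):
    """Return the end-nodes of the instances of `path` starting at `node`."""
    nodes = {node}
    for arc in path:
        nodes = {o for (s, p, o) in graph if s in nodes and p == arc}
    return nodes

def is_in(*constants):   # counterpart of an "in" construction
    return lambda v: v in constants

def matches(regex):      # counterpart of a "match" construction
    return lambda v: re.fullmatch(regex, v) is not None

def holds(graph, node, path, test):
    """True if some end-node of `path` from `node` satisfies `test`."""
    return any(test(v) for v in follow(graph, node, path))

def read_constrained(graph, head_path, sources, attr,
                     istrue=(), isfalse=()):
    """Return the attribute groups of the instances passing the checks."""
    groups = []
    for resource in sources:
        for value in sorted(follow(graph, resource, head_path)):
            ok = all(holds(graph, value, p, t) for p, t in istrue)
            ko = any(holds(graph, value, p, t) for p, t in isfalse)
            if ok and not ko:
                groups.append({"/".join(p): sorted(follow(graph, value, p))
                               for p in attr})
    return groups

print(read_constrained(GRAPH, ["date"], ["A", "B"], [["modified"]],
                       istrue=[(["first"], is_in("2002-01-01"))]))
# [{'modified': ['2002-03-01']}]
\end{verbatim} \end{footnotesize}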

\TT{property} can also access the {\RDFS} property hierarchy when a flag
named \TT{sub} or \TT{super} is specified.
If the \TT{sub} flag is present, \TT{property} inspects the instances of the
default tree (made by the head path and by the \EM{optional-clauses} paths)
and every other tree obtained by substituting an arc $ p $ with the arc of a
subproperty of $ p $.
If the \TT{super} flag is present, super-property arcs are substituted instead.
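
The substitution of arcs can be pictured with the Python sketch below, which
enumerates the alternatives of a single path.
The property hierarchy and the arc names ``created'' and ``revised'' are
hypothetical, and we assume that subproperties are taken transitively, as for
\emph{rdfs:subPropertyOf}.
The \TT{super} flag would be handled in the same way with the inverse mapping.

\begin{footnotesize} \begin{verbatim}
from itertools import product

# hypothetical hierarchy: arc -> list of its direct subproperties
SUB = {"date": ["created", "revised"]}

def closure(arc, sub):
    """Return `arc` followed by all its (transitive) subproperties."""
    seen, todo = [arc], [arc]
    while todo:
        for child in sub.get(todo.pop(), []):
            if child not in seen:
                seen.append(child)
                todo.append(child)
    return seen

def alternatives(path, sub):
    """Return the paths obtained by substituting arcs with sub-arcs."""
    choices = [closure(arc, sub) for arc in path]
    return [list(choice) for choice in product(*choices)]

for path in alternatives(["date", "first"], SUB):
    print("/".join(path))
# date/first
# created/first
# revised/first
\end{verbatim} \end{footnotesize}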

\TT{property} also allows the inverse traversal of its head path if the
\TT{inverse} flag is specified.
In this case the operator works as follows:

\begin{enumerate}

\item
It instantiates the head path using as end-nodes the values whose main
component is specified in the head strings of the source {\av} set.

\item
It encodes the resources corresponding to the instances of the start-nodes into
{\av}'s, assigning them the attributes obtained by instantiating the attribute
paths%
\footnote{The paths in the \EM{optional-clauses} are never traversed backward.}
and composes these {\av}'s using the {\MathQL} set-theoretic union to build
the result set.

\end{enumerate}

See for instance \figref{Property} (example 7).
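
The two steps of the inverse traversal can be sketched in Python as follows.
For brevity the sketch handles a head path and attribute paths made of a
single arc; the toy graph is reconstructed from \figref{Property}, and the
convention that a structured value has an empty main component is an
assumption of the sketch.

\begin{footnotesize} \begin{verbatim}
GRAPH = [
    ("A", "date", "_:dA"), ("_:dA", "first", "2002-01-01"),
    ("_:dA", "modified", "2002-03-01"),
    ("B", "date", "_:dB"), ("_:dB", "first", "2002-02-01"),
]

def main_component(node):
    # assumption: structured values ("_:" nodes) have no main component
    return "" if node.startswith("_:") else node

def inverse_property(graph, head_arc, heads, attr):
    """Map each start-node resource to the list of its attribute groups."""
    result = {}
    for (start, arc, end) in graph:
        # step 1: the head arc is traversed backward from the given values
        if arc == head_arc and main_component(end) in heads:
            # step 2: the attribute arcs are traversed forward from the value
            group = {a: sorted(o for (s, p, o) in graph
                               if s == end and p == a)
                     for a in attr}
            result.setdefault(start, []).append(group)
    return result

print(inverse_property(GRAPH, "date", [""], ["first"]))
# {'A': [{'first': ['2002-01-01']}], 'B': [{'first': ['2002-02-01']}]}
\end{verbatim} \end{footnotesize}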

Now we can present \emph{access relations}, which are the formal tools used by
{\MathQL} semantics to access the {\RDF} graph.
An access relation is a set of triples $ (r_1, p, r_2) $ where $ r_1 $ and
$ r_2 $ are strings and $ p $ is a path (encoded as a list of strings).
Each triple is a sort of ``extended {\RDF} triple'' in the sense that $ r_1 $
is a resource for which metadata is provided, $ p $ is a path in the {\RDF}
graph and $ r_2 $ is the main value of the end-node of the instance of $ p $
starting from $ r_1 $ (this includes the instances of sub- and super-arcs of
$ p $ if necessary).
See for instance \figref{Property} (example 8).
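
As an illustration, the triples of example 8 can be derived from a toy graph
with the Python sketch below.
The graph is reconstructed from \figref{Property}; the bound on the path
length, the function names and the convention that a structured value has an
empty main value are assumptions of the sketch, not part of the definition.

\begin{footnotesize} \begin{verbatim}
GRAPH = [
    ("A", "date", "_:dA"), ("_:dA", "first", "2002-01-01"),
    ("_:dA", "modified", "2002-03-01"),
]

def main_value(node):
    # assumption: structured values ("_:" nodes) have an empty main value
    return "" if node.startswith("_:") else node

def access_triples(graph, resource, max_length):
    """Return the triples (r1, p, r2) for the paths p up to `max_length`."""
    triples = set()
    frontier = [((), resource)]
    for _ in range(max_length):
        next_frontier = []
        for path, node in frontier:
            for (s, arc, o) in graph:
                if s == node:
                    triples.add((resource, path + (arc,), main_value(o)))
                    next_frontier.append((path + (arc,), o))
        frontier = next_frontier
    return triples

for triple in sorted(access_triples(GRAPH, "A", 2)):
    print(triple)
# ('A', ('date',), '')
# ('A', ('date', 'first'), '2002-01-01')
# ('A', ('date', 'modified'), '2002-03-01')
\end{verbatim} \end{footnotesize}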

{\MathQL} does not provide any built-in access relation, so any query
engine can freely define the access relations that are appropriate with
respect to the metadata it can access.
In particular, \secref{Interpreter} describes the access relations implemented
by the {\HELM} query engine.

It is worth remarking, as already stressed in \cite{GS03, Gui03}, that
the concept of access relation corresponds to the abstract concept of
property in the basic {\RDF} data model, which draws on well established
principles from various data representation communities.
In this sense an {\RDF} property can be thought of either as an attribute of a
resource (traditional attribute-value pairs model), or as a relation between
a resource and a value (entity-relationship model).
This observation leads us to conclude that {\MathQL} is sound and complete
with respect to querying an abstract {\RDF} metadata model.

Finally, note that access relations are close to the {\RDF} entity-relationship
model, but they do not work if we allow paths with an arbitrary number of
loops ({\ie} with an arbitrary length) because this would lead to creating
infinite sets of triples.
If we want to handle this case, we need to turn these relations into
multivalued functions.
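
The difference can be pictured with the following Python sketch, in which the
cyclic graph and the arc name ``refers'' are hypothetical.
Listed as a relation, the loop would require one triple for each path length,
whereas a multivalued function computes the set of values of a given path
only on request.

\begin{footnotesize} \begin{verbatim}
# hypothetical graph with a loop between the resources "A" and "B"
CYCLIC = [("A", "refers", "B"), ("B", "refers", "A")]

def access(graph, r1, path):
    """Multivalued function: the set of end-nodes of `path` from `r1`."""
    nodes = {r1}
    for arc in path:
        nodes = {o for (s, p, o) in graph if s in nodes and p == arc}
    return nodes

# The path may be arbitrarily long: no set of triples is built in advance.
print(access(CYCLIC, "A", ["refers"] * 5))  # {'B'}
print(access(CYCLIC, "A", ["refers"] * 6))  # {'A'}
\end{verbatim} \end{footnotesize}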