updating and structuring

diff --git a/helm/mathql/doc/mathql_introduction_property.tex b/helm/mathql/doc/mathql_introduction_property.tex

new file mode 100644 (file)

index 0000000..f73f5bc
--- /dev/null
+++ b/helm/mathql/doc/mathql_introduction_property.tex
@@ -0,0 +1,203 @@
+\subsection{High level access to metadata} \label{HighAccess}
+
+{\MathQL} high level access to an {\RDF} database is \emph{graph-oriented} and
+is delegated to its \TT{property} operator, which formally accesses an {\RDF}
+graph%
+\footnote
+{When we say {\RDF} graph, we actually mean both the {\RDFM} graph and the
+{\RDFS} graph.}
+through an \emph{access relation} which is better understood by explaining
+the informal semantics of the operator itself.  
+
+This operator builds a \emph{result} {\av} set starting from two mandatory
+arguments: the \emph{source} {\av} set and the \emph{head path}.
+Other optional arguments may be used to change its default behaviour or to
+request advanced functionalities. 
+Its textual syntax is (see \subsecref{Textual}):
+
+\begin{center}
+\TT{property} \EM{optional-flags} \EM{head-path} \EM{optional-clauses} \TT{of}
+\EM{optional-flag} \EM{av-set}
+\end{center}
+
+A path has the structure of an attribute name ({\ie} a list of strings) and
+denotes a (possibly empty) finite sequence of contiguous arcs (describing
+properties in the {\RDF} graph).
+
+\begin{figure}[ht]
+\begin{footnotesize} \begin{verbatim}
+These examples refer to the resources "A" and "B" of Figure 2.
+
+Example 1: reading an unstructured property - simple case:
+property "id"/"major" of {"A", "B"}   returns   "1"
+property "id"/"minor" of {"A", "B"}   returns   "2"; "7"
+
+Example 2: reading an unstructured property - use of pattern:
+property "id"/"minor" of pattern ".*"   returns   "2"; "7"
+
+Example 3: reading a structured property without main component:
+property "id" attr "major", "minor" of {"A", "B"}  
+generates the following attributed values:
+"" attr {"major" = "1"; "minor" = "2"}; "" attr {"major" = "1"; "minor" = "7"}
+that are composed using MathQL-1 set-theoretic union giving the one-element set:
+"" attr {"major" = "1"; "minor" = "2"}, {"major" = "1"; "minor" = "7"} 
+
+Example 4: reading a structured property specifying a main component:
+property "id" main "major" attr "minor" of {"A", "B"}   gives
+"1" attr {"minor" = "2"}, {"minor" = "7"} 
+
+Example 5: the renaming mechanism:
+property "id" attr "minor" as "new-name" of {"A", "B"}   gives  
+"" attr {"new-name" = "2"}, {"new-name" = "7"}
+
+Example 6: imposing constraints on property values:
+property "date" istrue "first" in "2002-01-01" attr "modified" of {"A", "B"}   and  
+property "date" istrue "first" match ".*01.*" attr "modified" of {"A", "B"}    give
+"" attr {"modified" = "2002-03-01"}
+Only the instance of "date" with "first" set to "2002-01-01" is considered.
+
+Example 7: inverse traversal of the head path:
+property inverse "date" attr "first" in subj ""   gives
+"A" attr {"first" = "2002-01-01"}; "B" attr {"first" = "2002-02-01"}
+
+Example 8: some triples of an access relation:
+The triples formalizing the property "date" of the resource "A":
+("A", "date", "");
+("A", "date"/"first", "2002-01-01"); ("A", "date"/"modified", "2002-03-01")
+\end{verbatim} \end{footnotesize}
+\vskip-1pc
+\caption{The ``property'' operator}
+\label{Property}
+\end{figure}
+
+In the simplest case \TT{property} is used to read the values of a (possibly
+compound) property with an unstructured value and does the following:
+
+\begin{enumerate}
+
+\item
+It computes the instances of the given path in the {\RDF} graph available to
+the query engine, using the resources specified in the head strings of the 
+source {\av} set (call them source resources) as start-nodes.
+
+\item
+The computation gives a set of nodes in the {\RDF} graph ({\ie} the end-nodes
+of the instantiated paths) which are the values of the instances of the
+(possibly compound) property specified by the path and concerning the source
+resources.
+
+\item
+These values, encoded into {\av}'s as explained above, are composed by means
+of the {\MathQL} set-theoretic union to form the result {\av} set.
+
+\end{enumerate}
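+
+The three steps above can be sketched in Python as follows.
+This is only an illustration, not the {\HELM} implementation: it assumes that
+the {\RDF} graph is already available as a set of (resource, path, value)
+triples with paths encoded as tuples of strings, and the name
+\verb|read_property| is hypothetical.
+The data reproduce example 1 of \figref{Property}.
+
```python
# Assumed encoding: (resource, path, value) triples, paths as tuples of strings.
ACCESS = {
    ("A", ("id", "major"), "1"),
    ("A", ("id", "minor"), "2"),
    ("B", ("id", "major"), "1"),
    ("B", ("id", "minor"), "7"),
}


def read_property(access, head_path, sources):
    """Collect the values reached from the source resources through head_path."""
    result = set()
    for resource, path, value in access:
        # Steps 1 and 2: keep the path instances starting at a source resource.
        if resource in sources and path == head_path:
            # Step 3: set union merges equal values into one element.
            result.add(value)
    return result


print(sorted(read_property(ACCESS, ("id", "major"), {"A", "B"})))  # ['1']
print(sorted(read_property(ACCESS, ("id", "minor"), {"A", "B"})))  # ['2', '7']
```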
+
+\figref{Property} (example 1) shows an instance of this procedure. 
+Note that the result sets of this example have no attributes and that a path
+is represented by a slash-separated list of strings denoting the path's arcs.%
+\footnote{If needed, the empty path is represented by a single slash.}
+
+Using the \TT{pattern} flag, \TT{property} can be instructed to regard the
+values of the source {\av} set as POSIX regular expressions rather than as
+constant strings.
+In this case \TT{pattern} selects the set of resources matching at least one
+of the given expressions.
+See for instance \figref{Property} (example 2).
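+
+A minimal Python sketch of this selection is given below.
+It rests on two assumptions that the text does not settle: an expression must
+match the whole resource name, and Python's \verb|re| syntax stands in for
+POSIX regular expressions (the two dialects differ in some details).
+The name \verb|select_sources| is hypothetical.
+
```python
import re


def select_sources(resources, patterns):
    """Return the resources matching at least one of the given expressions."""
    return {
        resource
        for resource in resources
        # Assumption: an expression has to match the whole resource name.
        if any(re.fullmatch(pattern, resource) for pattern in patterns)
    }


print(sorted(select_sources({"A", "B"}, [".*"])))  # ['A', 'B']
print(sorted(select_sources({"A", "B"}, ["A"])))   # ['A']
```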
+
+If we want to read the value of a structured property we can specify the
+value's main component in the \TT{main} \EM{optional-clause} (this
+specification overrides the default setting inferred from the {\RDF} graph
+through the \emph{rdf:value} property) and the list of the value's secondary
+components in the \TT{attr} \EM{optional-clause}. 
+Note that if a secondary component is not listed in the \TT{attr} clause, it
+will not be read.
+Also recall that, when the result {\av}'s are formed, the main component is
+read into the head string, whereas the secondary components are
+encoded using the attributes of a single group.
+See for instance \figref{Property} (examples 3 and 4).
+As a component of a property's value may be a structured property, its
+specification (appearing in the \TT{main} or \TT{attr} clause) is
+actually a path in the {\RDF} graph starting from the end-node of the head
+path.
+
+Note that the name of an attribute, which by default is its defining path in
+the \TT{attr} clause, can be changed with an optional \TT{as} clause for the
+user's convenience. See for instance \figref{Property} (example 5).
+The alternative name could in principle be a simple string, but it must be a
+path for typing reasons. In any case a string can be seen as a one-element path.
+
+In the default case \TT{property} builds its result considering every
+component of the {\RDFM} graph ({\ie} every {\RDFM}) but we can constrain
+some nodes of the inspected components to have (or not to have) a given value,
+with the aim of improving the performance of the inspection procedure.
+The constrained nodes are specified in the \TT{istrue} and \TT{isfalse}
+\EM{optional-clauses} and the constraining values are expressed by \TT{in} or
+\TT{match} constructions depending on their semantics (constant values or
+POSIX regular expressions respectively).
+See for instance \figref{Property} (example 6).
+Again a constrained node may be the value of a compound property, therefore
+its specification (appearing in the \TT{istrue} or \TT{isfalse} clause) is
+a path in the {\RDF} graph starting from the end-node of the head path.
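+
+The effect of the constraints can be sketched in Python as follows.
+The sketch assumes one instance of the head path per resource, handles only
+constant values (the \TT{in} form), and does not merge results with equal
+head strings; the names \verb|values| and \verb|constrained_read| are
+hypothetical.
+The triples for "A" follow example 8 and those for "B" follow example 7 of
+\figref{Property}.
+
```python
# The "modified" date of "B" is not given in the text, so it is left out.
ACCESS = {
    ("A", ("date",), ""),
    ("A", ("date", "first"), "2002-01-01"),
    ("A", ("date", "modified"), "2002-03-01"),
    ("B", ("date",), ""),
    ("B", ("date", "first"), "2002-02-01"),
}


def values(access, resource, path):
    """Main values reached from resource through path."""
    return {v for (r, p, v) in access if r == resource and p == path}


def constrained_read(access, head, sources, attrs, istrue=None, isfalse=None):
    """Read attrs below head, keeping only the instances that pass the constraints.

    istrue and isfalse map a path relative to the end-node of head to a set
    of constant values that the node must have, or must not have.
    """
    istrue = istrue or {}
    isfalse = isfalse or {}
    result = []
    for resource in sorted(sources):
        for main in sorted(values(access, resource, head)):
            keep = all(values(access, resource, head + rel) & allowed
                       for rel, allowed in istrue.items())
            drop = any(values(access, resource, head + rel) & banned
                       for rel, banned in isfalse.items())
            if keep and not drop:
                group = {"/".join(rel): sorted(values(access, resource, head + rel))
                         for rel in attrs}
                result.append((main, group))
    return result


# Example 6: property "date" istrue "first" in "2002-01-01" attr "modified" of {"A", "B"}
print(constrained_read(ACCESS, ("date",), {"A", "B"}, attrs=[("modified",)],
                       istrue={("first",): {"2002-01-01"}}))
```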
+
+\TT{property} gives access to the {\RDFS} property hierarchy when a flag
+named \TT{sub} or \TT{super} is specified.
+If the \TT{sub} flag is present, \TT{property} inspects the instances of the
+default tree (made by the head path and by the \EM{optional-clauses} paths)
+and every other tree obtained by substituting an arc $ p $ with the arc of a
+subproperty of $ p $.
+If the \TT{super} flag is present, super-property arcs are substituted instead.
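+
+The substitution performed by the \TT{sub} flag can be sketched in Python as
+follows.
+The subproperty table is purely hypothetical (no such property appears in
+\figref{Property}), and the sketch assumes that subproperties are followed
+transitively, which the text does not state.
+The \TT{super} case would be symmetric, using a table of superproperties.
+
```python
from itertools import product

# Hypothetical hierarchy: "creation-date" is declared a subproperty of "date".
SUBPROPERTIES = {"date": {"creation-date"}}


def descendants(arc, sub):
    """All subproperties of arc, followed transitively (an assumption)."""
    seen, stack = set(), [arc]
    while stack:
        current = stack.pop()
        for child in sub.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def path_variants(path, sub):
    """The given path plus every path obtained by replacing arcs with subproperty arcs."""
    choices = [{arc} | descendants(arc, sub) for arc in path]
    return set(product(*choices))


print(sorted(path_variants(("date", "first"), SUBPROPERTIES)))
```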
+
+\TT{property} also allows the inverse traversal of its head path if the
+\TT{inverse} flag is specified.
+In this case the operator works as follows:
+
+\begin{enumerate}
+
+\item
+It instantiates the head path using as end-nodes the values whose main
+component is specified in the head strings of the source {\av} set.
+
+\item
+It encodes the resources corresponding to the instances of the start-nodes into
+{\av}'s assigning the attributes obtained instantiating the attribute paths%
+\footnote{The paths in \EM{optional-clauses} are never traversed backward.}
+and composes these {\av}'s using the {\MathQL} set-theoretic union to build
+the result set.
+
+\end{enumerate}
+
+See for instance \figref{Property} (example 7).
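+
+A Python sketch of the inverse traversal is given below, again over an
+assumed set of (resource, path, value) triples; the names \verb|values| and
+\verb|inverse_read| are hypothetical.
+Only the head path is followed backward, whereas the attribute paths are
+read forward from its end-node.
+The data reproduce example 7 of \figref{Property}.
+
```python
ACCESS = {
    ("A", ("date",), ""),
    ("A", ("date", "first"), "2002-01-01"),
    ("B", ("date",), ""),
    ("B", ("date", "first"), "2002-02-01"),
}


def values(access, resource, path):
    """Main values reached from resource through path."""
    return {v for (r, p, v) in access if r == resource and p == path}


def inverse_read(access, head, end_values, attrs):
    """Map each start resource of head to its attributes, given the end values."""
    result = {}
    for resource, path, value in access:
        # Step 1: the head path is instantiated from its end-node values.
        if path == head and value in end_values:
            # Step 2: attribute paths are traversed forward, never backward.
            result[resource] = {"/".join(rel): sorted(values(access, resource, head + rel))
                                for rel in attrs}
    return result


print(inverse_read(ACCESS, ("date",), {""}, attrs=[("first",)]))
```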
+
+Now we can present \emph{access relations}, which are the formal tools used by
+{\MathQL} semantics to access the {\RDF} graph.
+An access relation is a set of triples $ (r_1, p, r_2) $ where $ r_1 $ and
+$ r_2 $ are strings and $ p $ is a path (encoded as a list of strings). 
+Each triple is a sort of ``extended {\RDF} triple'' in the sense that $ r_1 $
+is a resource for which metadata is provided, $ p $ is a path in the {\RDF}
+graph and $ r_2 $ is the main value of the end-node of the instance of $ p $
+starting from $ r_1 $ (this includes the instances of sub- and super-arcs of
+$ p $ if necessary).
+See for instance \figref{Property} (example 8).
+
+{\MathQL} does not provide any built-in access relation, so any query
+engine can freely define the access relations that are appropriate with
+respect to the metadata it can access.
+In particular, \secref{Interpreter} describes the access relations implemented
+by the {\HELM} query engine.
+
+It is worth remarking, as already stressed in \cite{GS03, Gui03}, that
+the concept of access relation corresponds to the abstract concept of
+property in the basic {\RDF} data model which draws on well established
+principles from various data representation communities.
+In this sense an {\RDF} property can be thought of either as an attribute of a
+resource (traditional attribute-value pairs model), or as a relation between
+a resource and a value (entity-relationship model).
+This observation leads us to conclude that {\MathQL} is sound and complete
+with respect to querying an abstract {\RDF} metadata model. 
+
+Finally note that access relations are close to the {\RDF} entity-relationship
+model, but they do not work if we allow paths with an arbitrary number of
+loops ({\ie} with an arbitrary length) because this would lead to creating
+infinite sets of triples.
+If we want to handle this case, we need to turn these relations into
+multivalued functions.
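+
+The following Python sketch illustrates the difference on a hypothetical
+graph with a loop (the nodes "X", "Y" and the arc "next" are invented for the
+purpose).
+Listing every triple of the relation would never terminate here, whereas a
+multivalued function computes the end-nodes of one given path on demand.
+The name \verb|access| is an assumption.
+
```python
# Hypothetical cyclic graph: X --next--> Y --next--> X
GRAPH = {("X", "next"): {"Y"}, ("Y", "next"): {"X"}}


def access(graph, start, path):
    """Multivalued function: the set of end-nodes of path starting from start."""
    frontier = {start}
    for arc in path:
        frontier = {target
                    for node in frontier
                    for target in graph.get((node, arc), ())}
    return frontier


# Paths of any finite length can be followed without materializing all triples.
print(access(GRAPH, "X", ("next",) * 3))  # {'Y'}
print(access(GRAPH, "X", ("next",) * 4))  # {'X'}
```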