updating and structuring

diff --git a/helm/mathql/doc/mathql_introduction_avsets.tex b/helm/mathql/doc/mathql_introduction_avsets.tex

new file mode 100644 (file)

index 0000000..f90ea92
--- /dev/null
+++ b/helm/mathql/doc/mathql_introduction_avsets.tex
@@ -0,0 +1,301 @@
+\subsection {Sets of attributed values.}
+
+The data representation model used by {\MathQL} relies on the notion of 
+\emph{set of attributed values} ({\av} set for short) that is, in practice,
+the only data type available in {\MathQL}.4. In this sense {\MathQL}.4 is a
+statically untyped language.%
+\footnote
+{A type system that fits {\MathQL} as an {\RDF}-oriented query language
+should be derived from the {\RDFS} class system. This may be a future
+improvement.}
+Each {\av} in an {\av} set consists of a string% 
+\footnote{When we say \emph{string}, we mean a finite sequence of characters.}
+(that we call the \emph{head string} or \emph{value}) and a (possibly empty)
+multiset of named attributes whose content is a set of strings.
+Attribute names are made of a (possibly empty) list of string components, so
+they can be hierarchically structured. 
+Moreover the attributes of a value are partitioned into a set of \emph{groups}
+({\ie} subsets) to improve its structure.
+
+In the above description a \emph{set} is an \emph{unordered} finite
+sequence \emph{without} repetitions whereas a \emph{multiset} is an
+\emph{unordered} finite sequence \emph{with} repetitions.
+
+In the present context repetitions are defined as follows:
+two {\av}'s are repeated if they share the same head string without any
+condition on their attributes, two groups are repeated if they contain the
+same attributes (equal both in name and content), two attributes of a group
+are repeated if they share the same name without any condition on their
+content, and two strings are always compared in a case-sensitive manner.%
+\footnote
+{The Author's experience with {\MathQL} seems to show that the above
+definition of an {\av} set is just the right one among the many alternatives
+that were tried.} 
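+
+The definitions above can be summarized in a compact notation.
+What follows is only a sketch: the symbols $S$, $N$, $\mathcal{G}$ and
+$\mathcal{P}_f$ are introduced here for illustration and are not part of
+{\MathQL}.
+Let $S$ be the set of strings, $N$ the set of finite lists of strings (the
+attribute names) and $\mathcal{P}_f(X)$ the set of finite subsets of $X$.
+Then:
+\[
+\begin{array}{rcll}
+g & : & N \rightharpoonup \mathcal{P}_f(S) &
+\mbox{(an attribute group, defined on finitely many names)} \\
+(h, G) & \in & S \times \mathcal{P}_f(\mathcal{G}) &
+\mbox{(an attributed value with head string $h$)} \\
+A & : & S \rightharpoonup \mathcal{P}_f(\mathcal{G}) &
+\mbox{(a set of attributed values, defined on finitely many head strings)}
+\end{array}
+\]
+where $\mathcal{G}$ denotes the set of all attribute groups.
+Modelling a group and an {\av} set as partial functions reflects the
+repetition rules: an attribute name occurs at most once in a group and a
+head string occurs at most once in an {\av} set.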
+
+As we said, {\MathQL}.4 uses {\av} sets to represent many kinds of
+information, namely:
+
+\begin{enumerate}
+
+\item
+A pool of {\RDF} triples having a common subject $r$, which in general is a
+{\URI} reference \cite{URI}%
+\footnote 
+{A {\URI} \emph {reference} is a {\URI} with an optional fragment identifier.},
+is encoded in a single {\av} placing $r$ in the head string.
+The predicates of the triples are encoded as attribute names and their objects
+are placed in the attributes' contents.
+These contents are structured as multiple strings with the aim of holding the
+objects of repeated predicates.
+Moreover structured attribute names can encode various components of
+structured properties preserving their semantics.
+
+\begin{figure}[ht]
+\begin{footnotesize} \begin{verbatim}
+The RDF triples:
+("http://www.w3.org/2002/01/rdf-databases/protocol", "dc:creator", "Sandro Hawke")
+("http://www.w3.org/2002/01/rdf-databases/protocol", "dc:creator", "Eric Prud'hommeaux")
+("http://www.w3.org/2002/01/rdf-databases/protocol", "dc:date", "2002-01-08")
+
+The corresponding attributed value:
+"http://www.w3.org/2002/01/rdf-databases/protocol" attr
+             {"dc:creator" = {"Sandro Hawke", "Eric Prud'hommeaux"}; "dc:date" = "2002-01-08"}
+\end{verbatim} \end{footnotesize}
+\vskip-1pc
+\caption{The representation of a pool of {\RDF} triples} \label{AVOne}
+\end{figure}
+
+\figref{AVOne} shows how a set of triples can be coded in an {\av}.
+Note that the word \emph{attr} separates the head string from its attributes,
+braces enclose an attribute group in which attributes are separated by
+semicolons, and an equal sign separates an attribute name from its contents
+(see \subsecref{Textual} for the complete {\av} syntax).
+
+In this setting the grouping feature can be used to separate semantically
+different classes of properties associated with a resource (for instance
+Dublin Core metadata, Euler metadata and user-defined metadata). 
+
+\item
+A pool of arbitrarily chosen {\RDF} triples is encoded in an {\av} set 
+by placing in different {\av}'s the subsets of triples sharing the same subject.
+
+Note that the use of {\av} sets to build query results allows {\MathQL} queries
+to return sets of {\RDF} triples instead of mere sets of resources, in the
+spirit of what is currently done by other {\RDF}-oriented query languages.
+
+If the {\av}'s of an {\av} set share the same attribute names and grouping
+structure, this set can be represented as a table in which each row encodes
+an {\av} and each column is associated with an attribute (except the first one
+which holds the head strings).
+\figref{Table} shows an {\av} set describing the properties of two resources
+``A'' and ``B'' giving its table representation, in which the columns
+corresponding to attributes in the same group are clustered between
+double-line delimiters%
+\footnote{A table with grouped labelled columns like the one above resembles a
+set of relational database tables.}.   
+
+%Another possible use of a {\MathQL} query result is for the encoding of a
+%relational database table: in this sense the indexed column is stored in the
+%subject strings, the names of the other columns are stored in attribute names
+%and cell contents are stored in attribute values.
+
+\begin{figure}[ht]
+\begin{footnotesize} \begin{verbatim}
+"A" attr {"major" = "1"; "minor" = "2"}, {"first" = "2002-01-01"; "modified" = "2002-03-01"};
+"B" attr {"major" = "1"; "minor" = "7"}, {"first" = "2002-02-01"; "modified" = "2002-04-01"}
+\end{verbatim}
+\begin{center} \begin{tabular}{|c||c|c||c|c||}
+\hline   & {\bf ``major''} & {\bf ``minor''} & {\bf ``first''} & {\bf ``modified''} \\
+\hline ``A'' & ``1'' & ``2'' & ``2002-01-01'' & ``2002-03-01'' \\
+\hline ``B'' & ``1'' & ``7'' & ``2002-02-01'' & ``2002-04-01'' \\
+\hline
+\end{tabular} \end{center} \end{footnotesize}
+\caption{A set of attributed values displayed as a table} \label{Table}
+\end{figure}
+
+The above example gives a spatial idea of the geometry of an {\av} set ({\ie}
+a query result), which fits in four dimensions: namely, we can extend independently
+the set of the head strings (dimension 1), the attributes in each group
+(dimension 2), the groups in each {\av} (dimension 3) and the contents of each
+attribute (dimension 4).
+
+The metadata defined in the table of \figref{Table} will be used in subsequent
+examples.
+For this purpose assume that \TT{first} and \TT{modified} are the components
+of a structured property \TT{date} available for the resources ``A'' and ``B''.
+
+\item
+The value of an {\RDF} property is encoded in a single {\av} distinguishing
+three situations:
+
+\begin{itemize}
+
+\item
+If the property is unstructured, its value is placed in the {\av} head
+string and no attributes are defined.
+
+\item
+If the property is structured and its value has a main component%
+\footnote{Which is set by the \emph{rdf:value} property or defined by a
+specific application.},
+the content of this component is placed in the {\av} head string and the
+other components are stored in the {\av} attributes as in case 1.
+
+\item
+If the property is structured and its value does not have a main component,
+the {\av} head string is empty and the components are stored in the
+attributes.
+
+\end{itemize}
+
+\begin{figure}[ht]
+\begin{footnotesize} \begin{verbatim}
+First example: one instance:
+"" attr {"major" = "1"; "minor" = "2"};  no main component
+"1" attr {"minor" = "2"};                main component is "major"
+"2" attr {"major" = "1"}                 main component is "minor"
+
+Second example: two separate instances:
+"" attr {"major" = "1"; "minor" = "2"}, {"major" = "1"; "minor" = "7"}; no main component
+"1" attr {"minor" = "2"}, {"minor" = "7"}                            main component is "major"
+
+Third example: two mixed instances:
+"" attr {"major" = "3", "6"; "minor" = "4", "9"} no main component
+\end{verbatim} \end{footnotesize}
+\vskip-1pc
+\caption{The representation of the structured value of a property}
+\label{AVTwo}
+\end{figure}
+
+\figref{AVTwo} (first example) shows three possible ways of representing in
+{\av}'s an instance of a structured property \TT{id} whose value has two
+fields ({\ie} properties) \TT{major} and \TT{minor}.
+In this instance, \TT{major} is set to ``1'' and \TT{minor} is set to ``2''.
+The representations depend on which component of \TT{id} is chosen as the
+main component (none, \TT{major} or \TT{minor} respectively).
+Several structured property values sharing a common main component can be
+encoded in a single {\av} by exploiting the grouping facility: in this case the
+attributes of every instance are enclosed in separate groups.
+\figref{AVTwo} (second example) shows the representations of two instances of
+\TT{id}: the previous one and a new one for which \TT{major} is ``1'' and
+\TT{minor} is ``7''.
+
+Note that if the attributes of the two groups are encoded in a single group,
+the notion of which components belong to the same property value cannot be
+recovered in the general case because the values of an attribute form a set
+and thus are unordered. \newline
+As an example think of two instances of \TT{id} encoded as in \figref{AVTwo}
+(third example).
+
+\item
+A natural number is stored, using its decimal representation, in the head
+string of a single {\av} with no attributes.
+
+\item
+The boolean value \emph{false} is stored as an empty {\av} set, whereas
+a non-empty {\av} set may be interpreted as the boolean value \emph{true}.
+The default representation of \emph{true} is a single {\av} with an empty
+head string and no attributes.
+
+\end{enumerate}
+
+{\MathQL} defines five binary operations on {\av} sets: two unions, two
+intersections and a difference. The first four are defined in terms of an
+operation, that we call \emph{addition}, involving two {\av}'s with the same
+head string.
+The result is an {\av} with the same head string as the operands, but there are
+two ways to compose the attribute groups:
+
+\begin{itemize}
+
+\item
+With the \emph{set-theoretic} addition, the set of attribute groups in the
+resulting {\av} is the set-theoretic union of the sets of attribute groups in
+the operands.
+
+\item
+With the \emph{distributive} addition, the set of attribute groups in the
+resulting {\av} is the ``Cartesian product'' of the sets of attribute groups
+in the two operands. 
+In this context, an element of the ``Cartesian product'' is not a pair of
+groups but it is the set-theoretic union of these groups where the contents of
+homonymous attributes are clustered together using set-theoretic unions.
+
+\end{itemize}
+
+\figref{Addition} shows an example of the two kinds of addition.
+
+\begin{figure}[ht]
+\begin{footnotesize} \begin{verbatim}
+Attributed values used as operands for the addition:
+"1" attr {"A" = "a"}, {"B" = "b1"}
+"1" attr {"A" = "a"}, {"B" = "b2"}
+
+Set-theoretic addition:
+"1" attr {"A" = "a"}, {"B" = "b1"}, {"B" = "b2"}
+
+Distributive addition:
+"1" attr {"A" = "a"}, {"A" = "a"; "B" = "b2"}, {"B" = "b1"; "A" = "a"}, {"B" = {"b1", "b2"}}
+\end{verbatim} \end{footnotesize}
+\vskip-1pc
+\caption{The addition of attributed values}
+\label{Addition}
+\end{figure}
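+
+The two additions can also be stated symbolically.
+This is only a sketch in a notation introduced here for illustration: the
+symbols $\oplus$, $\otimes$ and $\sqcup$ are not {\MathQL} syntax.
+An {\av} is written $(h, G)$, where $h$ is its head string and $G$ is the
+set of its attribute groups, and a group $g$ maps each of its attribute
+names $n$ to the content $g(n)$.
+The set-theoretic and the distributive addition are then, respectively:
+\[ (h, G_1) \oplus (h, G_2) = (h, \; G_1 \cup G_2) \]
+\[ (h, G_1) \otimes (h, G_2) =
+   (h, \; \{ g_1 \sqcup g_2 \mid g_1 \in G_1, \; g_2 \in G_2 \}) \]
+where $g_1 \sqcup g_2$ has the attribute names of both groups and
+\[
+(g_1 \sqcup g_2)(n) = \left\{ \begin{array}{ll}
+g_1(n) \cup g_2(n) & \mbox{if $n$ occurs in both $g_1$ and $g_2$} \\
+g_1(n) & \mbox{if $n$ occurs only in $g_1$} \\
+g_2(n) & \mbox{if $n$ occurs only in $g_2$}
+\end{array} \right.
+\]
+In \figref{Addition} each operand has two groups, so the distributive
+addition yields four merged groups; only the last one merges the contents
+of a homonymous attribute (\TT{B}).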
+
+Now we can discuss the five operations between {\av} sets that we mentioned
+above:
+
+\begin{itemize}
+
+\item
+The two unions correspond to the set-theoretic union of their operands, where
+the {\av}'s sharing the head string are added either set-theoretically or
+distributively as explained above (thus we have a set-theoretic union and a
+distributive union in the two cases). In this context the empty {\av} set
+plays the role of the neutral element. 
+These operations play a central role in the {\MathQL} architecture and make
+it possible to compose the attributes of the operands while preserving their
+group structure.
+
+\item
+The two intersections are the duals of the above unions: they contain the
+{\av}'s whose head string appears in both arguments, where {\av}'s sharing the
+head string are added either set-theoretically or distributively as before.
+
+The distributive intersection has the double benefit of filtering the
+common values of the given {\av} sets, and of merging their attribute groups
+in every possible way. This makes it possible to perform additional
+filtering operations that check the content of the merged groups.
+
+\item
+The difference of two {\av} sets contains the {\av}'s of the first
+argument whose head string does not appear in the second argument.
+
+\end{itemize}
+
+\figref{Binary} shows how the above operations work in a simple example.
+
+\begin{figure}[ht]
+\begin{footnotesize} \begin{verbatim}
+Sets of attributed values used as operands for the operations:
+"1" attr {"A" = "a"}; "2" attr {"B" = "b1"} 
+"2" attr {"B" = "b2"}
+
+Set-theoretic union:
+"1" attr {"A" = "a"}; "2" attr {"B" = "b1"}, {"B" = "b2"}
+
+Distributive union:
+"1" attr {"A" = "a"}; "2" attr {"B" = {"b1", "b2"}}
+
+Set-theoretic intersection:
+"2" attr {"B" = "b1"}, {"B" = "b2"}
+
+Distributive intersection:
+"2" attr {"B" = {"b1", "b2"}}
+
+Difference:
+"1" attr {"A" = "a"}
+\end{verbatim} \end{footnotesize}
+\vskip-1pc
+\caption{The binary operations on sets of attributed values}
+\label{Binary}
+\end{figure}
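+
+The five operations can be summarized as follows.
+This is only a sketch in a notation introduced here for illustration, not
+{\MathQL} syntax: an {\av} set $A$ is regarded as a partial function
+mapping each head string $h$ in its domain $\mathrm{dom}(A)$ to the set
+$A(h)$ of the attribute groups of the corresponding {\av}, and $+$ stands
+for either of the two additions described above, applied to the sets of
+groups of two {\av}'s with the same head string.
+The union $\cup_{+}$ and the intersection $\cap_{+}$ associated with $+$,
+and the difference, are:
+\[
+(A \cup_{+} B)(h) = \left\{ \begin{array}{ll}
+A(h) + B(h) & \mbox{if $h \in \mathrm{dom}(A) \cap \mathrm{dom}(B)$} \\
+A(h) & \mbox{if $h \in \mathrm{dom}(A) \setminus \mathrm{dom}(B)$} \\
+B(h) & \mbox{if $h \in \mathrm{dom}(B) \setminus \mathrm{dom}(A)$}
+\end{array} \right.
+\]
+\[ (A \cap_{+} B)(h) = A(h) + B(h)
+   \quad \mbox{for $h \in \mathrm{dom}(A) \cap \mathrm{dom}(B)$} \]
+\[ (A \setminus B)(h) = A(h)
+   \quad \mbox{for $h \in \mathrm{dom}(A) \setminus \mathrm{dom}(B)$} \]
+In \figref{Binary} the only head string common to both operands is ``2'',
+so the two intersections contain a single {\av}, while the difference
+keeps only the {\av} whose head string is ``1''.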