X-Git-Url: http://matita.cs.unibo.it/gitweb/?a=blobdiff_plain;f=helm%2Fmathql%2Fdoc%2Fmathql_introduction_avsets.tex;fp=helm%2Fmathql%2Fdoc%2Fmathql_introduction_avsets.tex;h=f90ea924708df73306215faffbc3306a7371c416;hb=468da7af4b52d01451073ff1cca5aa1949b9657f;hp=0000000000000000000000000000000000000000;hpb=ac98db69d03adb7be581f4f4cc5577991bc0572d;p=helm.git diff --git a/helm/mathql/doc/mathql_introduction_avsets.tex b/helm/mathql/doc/mathql_introduction_avsets.tex new file mode 100644 index 000000000..f90ea9247 --- /dev/null +++ b/helm/mathql/doc/mathql_introduction_avsets.tex @@ -0,0 +1,301 @@ +\subsection {Sets of attributed values.} + +The data representation model used by {\MathQL} relies on the notion of +\emph{set of attributed values} ({\av} set for short) that is, in practice, +the only data type available in {\MathQL}.4. In this sense {\MathQL}.4 is a +statically untyped language.% +\footnote +{A type system that fits {\MathQL} as an {\RDF}-oriented query language +should be derived from the {\RDFS} class system. This may be a future +improvement.} +Each {\av} in an {\av} set consists of a string% +\footnote{When we say \emph{string}, we mean a finite sequence of characters.} +(that we call the \emph{head string} or \emph{value}) and a (possibly empty) +multiset of named attributes whose content is a set of strings. +Attribute names are made of a (possibly empty) list of string components, so +they can be hierarchically structured. +Moreover the attributes of a value are partitioned into a set of \emph{groups} +({\ie} subsets) to improve its structure. + +In the above description a \emph{set} is an \emph{unordered} finite +sequence \emph{without} repetitions whereas a \emph{multiset} is an +\emph{unordered} finite sequence \emph{with} repetitions. 
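The structure described above can be sketched in Python, a language chosen here only for illustration, since {\MathQL} prescribes no such interface. The names \TT{AttributedValue}, \TT{make\_group} and \TT{av\_set} are hypothetical, and so is the choice of a tuple of strings for a hierarchical attribute name; the sample data are those of the resource ``A'' used later in this section.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical modelling choices, not part of MathQL itself.
AttrName = Tuple[str, ...]              # list of string components
Group = Dict[AttrName, FrozenSet[str]]  # attribute name -> set of strings


def make_group(attrs: Dict[AttrName, Iterable[str]]) -> Group:
    """Build one attribute group; each content is stored as a set of strings."""
    return {tuple(name): frozenset(content) for name, content in attrs.items()}


@dataclass
class AttributedValue:
    head: str                                         # head string (value)
    groups: List[Group] = field(default_factory=list)  # attribute groups


# The resource "A": one group for the version numbers, one for the
# components of the structured property "date".
a = AttributedValue("A", [
    make_group({("major",): ["1"], ("minor",): ["2"]}),
    make_group({("date", "first"): ["2002-01-01"],
                ("date", "modified"): ["2002-03-01"]}),
])

# An av set indexed by head string, so that two av's with the same head
# string cannot coexist in it.
av_set = {a.head: a}
```

Indexing the {\av} set by head string is one way of ruling out repeated {\av}'s; a list with an explicit check would serve equally well.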
+ +In the present context repetitions are defined as follows: +two {\av}'s are repeated if they share the same head string without any +condition on their attributes, two groups are repeated if they contain the +same attributes (equal both in name and content), two attributes of a group +are repeated if they share the same name without any condition on their +content, and two strings are always compared in a case-sensitive manner.% +\footnote +{The Author's experience with {\MathQL} seems to show that the above +definition of an {\av} set is just the right one among the many alternatives +that were tried.} + +As we said, {\MathQL}.4 uses {\av} sets to represent many kinds of +information, namely: + +\begin{enumerate} + +\item +A pool of {\RDF} triples having a common subject $r$, which in general is a +{\URI} reference \cite{URI}% +\footnote +{A {\URI} \emph {reference} is a {\URI} with an optional fragment identifier.}, +is encoded in a single {\av} placing $r$ in the head string. +The predicates of the triples are encoded as attribute names and their objects +are placed in the attributes' contents. +These contents are structured as multiple strings with the aim of holding the +objects of repeated predicates. +Moreover structured attribute names can encode various components of +structured properties preserving their semantics. 
+ +\begin{figure}[ht] +\begin{footnotesize} \begin{verbatim} +The RDF triples: +("http://www.w3.org/2002/01/rdf-databases/protocol", "dc:creator", "Sandro Hawke") +("http://www.w3.org/2002/01/rdf-databases/protocol", "dc:creator", "Eric Prud'hommeaux") +("http://www.w3.org/2002/01/rdf-databases/protocol", "dc:date", "2002-01-08") + +The corresponding attributed value: +"http://www.w3.org/2002/01/rdf-databases/protocol" attr + {"dc:creator" = {"Sandro Hawke", "Eric Prud'hommeaux"}; "dc:date" = "2002-01-08"} +\end{verbatim} \end{footnotesize} +\vskip-1pc +\caption{The representation of a pool of {\RDF} triples} \label{AVOne} +\end{figure} + +\figref{AVOne} shows how a set of triples can be coded in an {\av}. +Note that the word \emph{attr} separates the head string from its attributes, +braces enclose an attribute group in which attributes are separated by +semicolons, and an equal sign separates an attribute name from its contents +(see \subsecref{Textual} for the complete {\av} syntax). + +In this setting the grouping feature can be used to separate semantically +different classes of properties associated with a resource (as for instance +Dublin Core metadata, Euler metadata and user-defined metadata). + +\item +A pool of arbitrarily chosen {\RDF} triples is encoded in an {\av} set +placing in different {\av}'s the subsets of triples sharing the same subject. + +Note that the use of {\av} sets to build query results allows {\MathQL} queries +to return sets of {\RDF} triples instead of mere sets of resources, in the +spirit of what is currently done by other {\RDF}-oriented query languages. + +If the {\av}'s of an {\av} set share the same attribute names and grouping +structure, this set can be represented as a table in which each row encodes +an {\av} and each column is associated with an attribute (except the first one +which holds the head strings). 
+\figref{Table} shows an {\av} set describing the properties of two resources +``A'' and ``B'' giving its table representation, in which the columns +corresponding to attributes in the same group are clustered between +double-line delimiters% +\footnote{A table with grouped labelled columns like the one above resembles a +set of relational database tables.}. + +%Another possible use of a {\MathQL} query result is for the encoding of a +%relational database table: in this sense the indexed column is stored in the +%subject strings, the names of the other columns are stored in attribute names +%and cell contents are stored in attribute values. + +\begin{figure}[ht] +\begin{footnotesize} \begin{verbatim} +"A" attr {"major" = "1"; "minor" = "2"}, {"first" = "2002-01-01"; "modified" = "2002-03-01"}; +"B" attr {"major" = "1"; "minor" = "7"}, {"first" = "2002-02-01"; "modified" = "2002-04-01"} +\end{verbatim} +\begin{center} \begin{tabular}{|c||c|c||c|c||} +\hline & {\bf ``major''} & {\bf ``minor''} & {\bf ``first''} & {\bf ``modified''} \\ +\hline ``A'' & ``1'' & ``2'' & ``2002-01-01'' & ``2002-03-01'' \\ +\hline ``B'' & ``1'' & ``7'' & ``2002-02-01'' & ``2002-04-01'' \\ +\hline +\end{tabular} \end{center} \end{footnotesize} +\caption{A set of attributed values displayed as a table} \label{Table} +\end{figure} + +The above example gives a spatial idea of the geometry of an {\av} set ({\ie} +a query result) which fits in 4 dimensions: namely we can extend independently +the set of the head strings (dimension 1), the attributes in each group +(dimension 2), the groups in each {\av} (dimension 3) and the contents of each +attribute (dimension 4). + +The metadata defined in the table of \figref{Table} will be used in subsequent +examples. +For this purpose assume that \TT{first} and \TT{modified} are the components +of a structured property \TT{date} available for the resources ``A'' and ``B''. 
+ +\item +The value of an {\RDF} property is encoded in a single {\av} distinguishing +three situations: + +\begin{itemize} + +\item +If the property is unstructured, its value is placed in the {\av} head +string and no attributes are defined. + +\item +If the property is structured and its value has a main component% +\footnote{Which is set by the \emph{rdf:value} property or defined by a +specific application.}, +the content of this component is placed in the {\av} head string and the +other components are stored in the {\av} attributes as in case 1. + +\item +If the property is structured and its value does not have a main component, +the {\av} head string is empty and the components are stored in the +attributes. + +\end{itemize} + +\begin{figure}[ht] +\begin{footnotesize} \begin{verbatim} +First example: one instance: +"" attr {"major" = "1"; "minor" = "2"}; no main component +"1" attr {"minor" = "2"}; main component is "major" +"2" attr {"major" = "1"} main component is "minor" + +Second example: two separate instances: +"" attr {"major" = "1"; "minor" = "2"}, {"major" = "1"; "minor" = "7"}; no main component +"1" attr {"minor" = "2"}, {"minor" = "7"} main component is "major" + +Third example: two mixed instances: +"" attr {"major" = "3", "6"; "minor" = "4", "9"} no main component +\end{verbatim} \end{footnotesize} +\vskip-1pc +\caption{The representation of the structured value of a property} +\label{AVTwo} +\end{figure} + +\figref{AVTwo} (first example) shows three possible ways of representing in +{\av}'s an instance of a structured property \TT{id} whose value has two +fields ({\ie} properties) \TT{major} and \TT{minor}. +In this instance, \TT{major} is set to ``1'' and \TT{minor} is set to ``2''. +The representations depend on which component of \TT{id} is chosen as the +main component (none, \TT{major} or \TT{minor} respectively). 
+Several structured property values sharing a common main component can be +encoded in a single {\av} exploiting the grouping facility: in this case the +attributes of every instance are enclosed in separate groups. +\figref{AVTwo} (second example) shows the representations of two instances of +\TT{id}: the previous one and a new one for which \TT{major} is ``1'' and +\TT{minor} is ``7''. + +Note that if the attributes of the two groups are encoded in a single group, +the notion of which components belong to the same property value cannot be +recovered in the general case because the values of an attribute form a set +and thus are unordered. \newline +As an example think of two instances of \TT{id} encoded as in \figref{AVTwo} +(third example). + +\item +A natural number is stored, using its decimal representation, in the head +string of a single {\av} with no attributes. + +\item +The boolean value \emph{false} is stored as an empty {\av} set, whereas +a non-empty {\av} set may be interpreted as the boolean value \emph{true}. +The default representation of \emph{true} is a single {\av} with an empty +head string and no attributes. + +\end{enumerate} + +{\MathQL} defines five binary operations on {\av} sets: two unions, two +intersections and a difference. The first four are defined in terms of an +operation, that we call \emph{addition}, involving two {\av}'s with the same +head string. +The result is an {\av} with the same head string as the operands but there are +two ways to compose the attribute groups: + +\begin{itemize} + +\item +With the \emph{set-theoretic} addition, the set of attribute groups in the +resulting {\av} is the set-theoretic union of the sets of attribute groups in +the operands. + +\item +With the \emph{distributive} addition, the set of attribute groups in the +resulting {\av} is the ``Cartesian product'' of the sets of attribute groups +in the two operands. 
+In this context, an element of the ``Cartesian product'' is not a pair of +groups but it is the set-theoretic union of these groups where the contents of +homonymous attributes are clustered together using set-theoretic unions. + +\end{itemize} + +\figref{Addition} shows an example of the two kinds of addition. + +\begin{figure}[ht] +\begin{footnotesize} \begin{verbatim} +Attributed values used as operands for the addition: +"1" attr {"A" = "a"}, {"B" = "b1"} +"1" attr {"A" = "a"}, {"B" = "b2"} + +Set-theoretic addition: +"1" attr {"A" = "a"}, {"B" = "b1"}, {"B" = "b2"} + +Distributive addition: +"1" attr {"A" = "a"}, {"A" = "a"; "B" = "b2"}, {"B" = "b1"; "A" = "a"}, {"B" = {"b1", "b2"}} +\end{verbatim} \end{footnotesize} +\vskip-1pc +\caption{The addition of attributed values} +\label{Addition} +\end{figure} + +Now we can discuss the five operations between {\av} sets that we mentioned +above: + +\begin{itemize} + +\item +The two unions correspond to the set-theoretic union of their operands where +the {\av}'s sharing the head string are added either set-theoretically or +distributively as explained above (thus we have a set-theoretic union and a +distributive union in the two cases). In this context the empty {\av} set +plays the role of the neutral element. +These operations play a central role in the {\MathQL} architecture and allow +composing the attributes of the operands while preserving their group structure. + +\item +The two intersections are the dual of the above unions: they contain the +{\av}'s whose head string appears in each argument where {\av}'s sharing the +head string are added either set-theoretically or distributively as before. + +The distributive intersection has the double benefit of filtering the +common values of the given {\av} sets, and of merging their attribute groups +in every possible way. This feature enables the possibility of performing +additional filtering operations checking the content of the merged groups. 
+ +\item +The difference of two {\av} sets contains the {\av}'s of the first +argument whose head string does not appear in the second argument. + +\end{itemize} + +\figref{Binary} shows how the above operations work in a simple example. + +\begin{figure}[ht] +\begin{footnotesize} \begin{verbatim} +Sets of attributed values used as operands for the operations: +"1" attr {"A" = "a"}; "2" attr {"B" = "b1"} +"2" attr {"B" = "b2"} + +Set-theoretic union: +"1" attr {"A" = "a"}; "2" attr {"B" = "b1"}, {"B" = "b2"} + +Distributive union: +"1" attr {"A" = "a"}; "2" attr {"B" = {"b1", "b2"}} + +Set-theoretic intersection: +"2" attr {"B" = "b1"}, {"B" = "b2"} + +Distributive intersection: +"2" attr {"B" = {"b1", "b2"}} + +Difference: +"1" attr {"A" = "a"} +\end{verbatim} \end{footnotesize} +\vskip-1pc +\caption{The binary operations on sets of attributed values} +\label{Binary} +\end{figure}
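The two additions and the five binary operations can be sketched in Python as follows. This is only an illustrative model under stated assumptions, not the {\MathQL} implementation: an {\av} set is taken to be a dictionary from head strings to sets of groups, attribute names are plain strings, and all function names (\TT{group}, \TT{merge}, \TT{add\_set}, \TT{add\_distr}, \TT{union}, \TT{intersection}, \TT{difference}) are hypothetical.

```python
from itertools import product


def group(**attrs):
    """Hashable attribute group: a frozenset of (name, contents) pairs.
    A content may be given as one string or as an iterable of strings."""
    return frozenset(
        (name, frozenset([c] if isinstance(c, str) else c))
        for name, c in attrs.items()
    )


def merge(g1, g2):
    """Union of two groups; contents of homonymous attributes are united."""
    merged = {}
    for name, content in list(g1) + list(g2):
        merged[name] = merged.get(name, frozenset()) | content
    return frozenset(merged.items())


def add_set(groups1, groups2):
    """Set-theoretic addition: union of the two sets of groups."""
    return groups1 | groups2


def add_distr(groups1, groups2):
    """Distributive addition: merge every pair of groups.
    With an operand that has no groups the product is empty; how MathQL
    treats that case is not stated in the text, so it is left as is."""
    return frozenset(merge(a, b) for a, b in product(groups1, groups2))


def union(s1, s2, add):
    """Union of two av sets; av's sharing a head string are added."""
    result = dict(s1)
    for head, groups in s2.items():
        result[head] = add(s1[head], groups) if head in s1 else groups
    return result


def intersection(s1, s2, add):
    """Av's whose head string appears in both operands, added with `add`."""
    return {head: add(s1[head], s2[head]) for head in s1 if head in s2}


def difference(s1, s2):
    """Av's of s1 whose head string does not appear in s2."""
    return {head: groups for head, groups in s1.items() if head not in s2}


# Operands of the figure on binary operations.
s1 = {"1": frozenset({group(A="a")}), "2": frozenset({group(B="b1")})}
s2 = {"2": frozenset({group(B="b2")})}
```

Under these assumptions \TT{union(s1, s2, add\_distr)} maps ``2'' to the single group in which \TT{B} holds both ``b1'' and ``b2'', whereas \TT{union(s1, s2, add\_set)} keeps the two groups separate. Representing groups as frozensets makes the comparison of groups (equal names and equal contents) coincide with Python equality.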