X-Git-Url: http://matita.cs.unibo.it/gitweb/?a=blobdiff_plain;f=helm%2Fmathql%2Fdoc%2Fmathql_introduction_avsets.tex;fp=helm%2Fmathql%2Fdoc%2Fmathql_introduction_avsets.tex;h=0000000000000000000000000000000000000000;hb=55dc61d4b5a62883ea5532ed61e8780ca82f4bd7;hp=20e826403421c1b501b9509c327edc7dd155e540;hpb=3e84eec9bbaf93687f72d1a77ca03dea34b50739;p=helm.git diff --git a/helm/mathql/doc/mathql_introduction_avsets.tex b/helm/mathql/doc/mathql_introduction_avsets.tex deleted file mode 100644 index 20e826403..000000000 --- a/helm/mathql/doc/mathql_introduction_avsets.tex +++ /dev/null @@ -1,296 +0,0 @@ -\subsection {Sets of attributed values.} \label{AVSets} - -The data representation model used by {\MathQL} relies on the notion of -\emph{set of attributed values} ({\av} set for short) that is, in practice, -the only data type available in {\MathQL}.4. In this sense {\MathQL}.4 is a -statically untyped language.% -\footnote -{A type system that fits {\MathQL} as an {\RDF}-oriented query language, -should be driven from the {\RDFS} class system. This may be a future -improvement.} -Each {\av} in an {\av} set consists of a string% -\footnote{When we say \emph{string}, we mean a finite sequence of characters.} -(that we call the \emph{head string} or \emph{value}) and a (possibly empty) -multiset of named attributes whose content is a set of strings. -Attribute names are made of a (possibly empty) list of string components, so -they can be hierarchically structured. -Moreover the attributes of a value are partitioned into a set of \emph{groups} -({\ie} subsets) to improve its structure. - -In the above description a \emph{set} is an \emph{unordered} finite -sequence \emph{without} repetitions whereas a \emph{multiset} is an -\emph{unordered} finite sequence \emph{with} repetitions. 
- -In the present context repetitions are defined as follows: -two {\av}'s are repeated if they share the same head string without any -condition on their attributes, two groups are repeated if they contain the -same attributes (equal both in name and content), two attributes of a group -are repeated if they share the same name without any condition on their -content, and two strings are always compared in a case-sensitive manner.% -\footnote -{The Author's experience with {\MathQL} seems to show that the above -definition of an {\av} set is just the right one among the many alternatives -that were tried.} - -As we said, {\MathQL}.4 uses {\av} sets to represent many kinds of -information: - -\begin{enumerate} - -\item -A pool of {\RDF} triples having a common subject $r$, which in general is a -{\URI} reference \cite{URI}% -\footnote -{A {\URI} \emph {reference} is a {\URI} with an optional fragment identifier.}, -is encoded in a single {\av} placing $r$ in the head string. -The predicates of the triples are encoded as attribute names and their objects -are placed in the attributes' contents. -These contents are structured as multiple strings with the aim of holding the -objects of repeated predicates. -Moreover structured attribute names can encode various components of -structured properties preserving their semantics. - -\begin{figure} -\begin{footnotesize} \begin{verbatim} -The RDF triples: - ("protocol", "dc:creator", "Sandro Hawke") - ("protocol", "dc:creator", "Eric Prud'hommeaux") - ("protocol", "dc:date", "2002-01-08") - -The corresponding attributed value: - "protocol" attr {/"dc:creator" = {"Sandro Hawke", "Eric Prud'hommeaux"}; - /"dc:date" = "2002-01-08"} -\end{verbatim} \end{footnotesize} -\vspace{-1pc} -\caption{The representation of a pool of {\RDF} triples} \label{AVOne} -\end{figure} - -\figref{AVOne} shows how a set of triples can be coded in an {\av}. 
-Note that the word \TT{attr} separates the head string from its attributes, -braces enclose an attribute group in which attributes are separated by -semicolons, and an equal sign separates an attribute name from its contents. - -In this setting the grouping feature can be used to separate semantically -different classes of properties associated with a resource (as for instance -Dublin Core metadata, Euler metadata and user-defined metadata). - -\item -A pool of arbitrarily chosen {\RDF} triples is encoded in an {\av} set -placing in each {\av} the subset of triples sharing the same subject, -which becomes the head string. - -Note that the use of {\av} sets to build query results allows {\MathQL} queries -to return sets of {\RDF} triples instead of mere sets of resources, in the -spirit of what is currently done by other {\RDF}-oriented query languages. - -If the {\av}'s of an {\av} set share the same attribute names and grouping -structure, this set can be represented as a table in which each row encodes -an {\av} and each column is associated with an attribute (except the first one -which holds the head strings). 
-\figref{Table} shows an {\av} set describing the properties of two resources -``A'' and ``B'' giving its table representation, in which the columns -corresponding to attributes in the same group are clustered between -double-line delimiters.% -\footnote{A table with grouped labelled columns like the one above resembles a -set of relational database tables.} - -\begin{figure} -\begin{footnotesize} \begin{verbatim} -"A" attr {/"major" = "1"; /"minor" = "2"}, - {/"first" = "2002-01-01"; /"modified" = "2002-03-01"}; -"B" attr {/"major" = "1"; /"minor" = "7"}, - {/"first" = "2002-02-01"; /"modified" = "2002-04-01"} -\end{verbatim} -\begin{center} \begin{tabular}{|c||c|c||c|c||} -\hline & \textbf{``major''} & \textbf{``minor''} & \textbf{``first''} & \textbf{``modified''} \\ -\hline ``A'' & ``1'' & ``2'' & ``2002-01-01'' & ``2002-03-01'' \\ -\hline ``B'' & ``1'' & ``7'' & ``2002-02-01'' & ``2002-04-01'' \\ -\hline -\end{tabular} \end{center} \end{footnotesize} -\caption{A set of attributed values displayed as a table} \label{Table} -\end{figure} - -The above example gives a spatial idea of the geometry of an {\av} set ({\ie} -a query result) which fits in 4 dimensions: namely we can extend independently -the set of the head strings (dimension 1), the attributes in each group -(dimension 2), the groups in each {\av} (dimension 3) and the contents of each -attribute (dimension 4). -The metadata defined in the table of \figref{Table} will be used in subsequent -examples. -For this purpose assume that ``first'' and ``modified'' are the components -of a structured property ``date'' available for the resources ``A'' and ``B''. - -\item -The value of an {\RDF} property is encoded in an {\av} distinguishing three -cases: - -\begin{itemize} - -\item -If the property is unstructured, its value is placed in the {\av} head -string and no attributes are defined. 
- -\item -If the property is structured and its value has a main component% -\footnote{Which is set by the \emph{rdf:value} property or defined by a -specific application.}, -the content of this component is placed in the {\av} head string and the -other components are stored in the {\av} attributes as in case 1. - -\item -For the value of a structured property without a main component, the head -string is empty and the components are stored in the attributes. - -\end{itemize} - -\begin{figure} -\begin{footnotesize} \begin{verbatim} -First example, one instance: - "" attr {/"major" = "1"; /"minor" = "2"} no main component - "1" attr {/"minor" = "2"} main component is "major" - "2" attr {/"major" = "1"} main component is "minor" - -Second example: two separate instances: - "" attr {/"major" = "1"; /"minor" = "2"}, - {/"major" = "1"; /"minor" = "7"} no main component - "1" attr {/"minor" = "2"}, {/"minor" = "7"} main component is "major" - -Third example: two mixed instances: - "" attr {/"major" = {"3", "6"}; /"minor" = {"4", "9"}} no main component -\end{verbatim} \end{footnotesize} -\vspace{-1pc} -\caption{The representation of the structured value of a property} -\label{AVTwo} -\end{figure} - -\figref{AVTwo} (first example) shows three possible ways of representing in -{\av}'s an instance of a structured property ``id'' whose value has two -fields ({\ie} properties) ``major'' and ``minor''. -In this instance, ``major'' is set to ``1'' and ``minor'' is set to ``2''. -The representations depend on which component of ``id'' is chosen as the -main component (none, ``major'' or ``minor'' respectively). -Several structured property values sharing a common main component can be -encoded in a single {\av} exploiting the grouping facility: in this case the -attributes of every instance are enclosed in separate groups. 
-\figref{AVTwo} (second example) shows the representations of two instances of -``id'': the previous one and a new one for which ``major'' is ``1'' and ``minor'' is -``7''. - -Note that if the attributes of the two groups are encoded in a single group, -the notion of which components belong to the same property value cannot be -recovered in the general case because the values of an attribute form a set -and thus are unordered. -As an example think of two instances of ``id'' encoded as in \figref{AVTwo} -(third example). - -\item -A natural number is stored, using its decimal representation, in the head -string of a single {\av} with no attributes. - -\item -The boolean value \emph{false} is stored as an empty {\av} set, whereas -an inhabited {\av} set may be interpreted as the boolean value \emph{true}. -The default representation of \emph{true} is a single {\av} with an empty -head string and no attributes. - -\end{enumerate} - -{\MathQL} defines five core binary operations on {\av} sets: two unions, two -intersections and a difference. The first four are defined in terms of an -operation, that we call \emph{addition}, involving two {\av}'s with the same -head string. -The result is an {\av} with the same head string as the operands, but there are -two ways to compose the attribute groups: - -\begin{itemize} - -\item -with the \emph{set-theoretic} addition, the set of attribute groups in the -resulting {\av} is the set-theoretic union of the sets of attribute groups in -the operands; - -\item -with the \emph{distributive} addition, the set of attribute groups in the -resulting {\av} is the ``Cartesian product'' of the sets of attribute groups -in the two operands. -Here an element of the ``Cartesian product'' is not a pair of groups but it is -the set-theoretic union of these groups where the contents of homonymous -attributes are clustered together using set-theoretic unions. - -\end{itemize} - -\figref{Addition} shows an example of the two kinds of addition. 
- -\begin{figure} -\begin{footnotesize} \begin{verbatim} -Attributed values used as operands for the addition: - "1" attr {/"A" = "a"}, {/"B" = "b1"} - "1" attr {/"A" = "a"}, {/"B" = "b2"} - -Set-theoretic addition: - "1" attr {/"A" = "a"}, {/"B" = "b1"}, {/"B" = "b2"} - -Distributive addition: - "1" attr {/"A" = "a"}, {/"A" = "a"; /"B" = "b2"}, - {/"B" = "b1"; /"A" = "a"}, {/"B" = {"b1", "b2"}} -\end{verbatim} \end{footnotesize} -\vspace{-1pc} -\caption{The addition of attributed values} -\label{Addition} -\end{figure} - -Now we can discuss the five operations between {\av} sets: - -\begin{itemize} - -\item -The two unions correspond to the set-theoretic union of their operands where -the {\av}'s sharing the head string are added either set-theoretically or -distributively as explained above (thus we have a set-theoretic union and a -distributive union in the two cases). In this context the empty {\av} set -plays the role of the neutral element. -These operations play a central role in the {\MathQL} architecture and allow -composing the attributes of the operands while preserving their group structure. - -\item -The two intersections are the dual of the above unions: they contain the -{\av}'s whose head string appears in each argument where the {\av}'s sharing -the head string are added either set-theoretically or distributively as before. - -The distributive intersection has the double benefit of filtering the -common values of the given {\av} sets, and of merging their attribute groups -in every possible way. This feature enables the possibility of performing -additional filtering operations checking the content of the merged groups. - -\item -The difference of two {\av} sets contains the {\av}'s of the first -argument whose head string does not appear in the second argument. - -\end{itemize} - -\figref{Binary} shows how the above operations work in a simple example. 
- -\begin{figure} -\begin{footnotesize} \begin{verbatim} -Sets of attributed values used as operands for the operations: - "1" attr {/"A" = "a"}; "2" attr {/"B" = "b1"} - "2" attr {/"B" = "b2"} - -Set-theoretic union: - "1" attr {/"A" = "a"}; "2" attr {/"B" = "b1"}, {/"B" = "b2"} - -Distributive union: - "1" attr {/"A" = "a"}; "2" attr {/"B" = {"b1", "b2"}} - -Set-theoretic intersection: - "2" attr {/"B" = "b1"}, {/"B" = "b2"} - -Distributive intersection: - "2" attr {/"B" = {"b1", "b2"}} - -Difference: - "1" attr {/"A" = "a"} -\end{verbatim} \end{footnotesize} -\vspace{-1pc} -\caption{The binary operations on sets of attributed values} -\label{Binary} -\end{figure}
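The two additions and the five operations described above can be sketched in Python. This is only an illustrative model under stated assumptions, not the MathQL implementation: the in-memory representation (a dict from head strings to frozensets of groups) and every function name (`group`, `merge_groups`, `add_set`, `add_distr`, `union`, `intersection`, `difference`) are hypothetical and chosen here for the example. The behaviour of the distributive addition on an operand with no groups follows the literal Cartesian product and is likewise an assumption.

```python
from itertools import product

# Hypothetical model (not part of MathQL itself):
#   attribute name -> tuple of string components, e.g. ("A",) for /"A"
#   group          -> frozenset of (name, frozenset of strings) pairs
#   av set         -> dict mapping each head string to a frozenset of groups

def group(**attrs):
    """Build a group from single-component names, e.g. group(A=["a"])."""
    return frozenset(((name,), frozenset(vals)) for name, vals in attrs.items())

def merge_groups(g1, g2):
    """Union of two groups; contents of homonymous attributes are united."""
    merged = {}
    for name, content in list(g1) + list(g2):
        merged[name] = merged.get(name, frozenset()) | content
    return frozenset(merged.items())

def add_set(groups1, groups2):
    """Set-theoretic addition: union of the two sets of groups."""
    return groups1 | groups2

def add_distr(groups1, groups2):
    """Distributive addition: merge every pair of groups, one per operand."""
    return frozenset(merge_groups(a, b) for a, b in product(groups1, groups2))

def union(s1, s2, add):
    """Keep every head string; add the groups of heads found in both."""
    result = dict(s1)
    for head, groups in s2.items():
        result[head] = add(s1[head], groups) if head in s1 else groups
    return result

def intersection(s1, s2, add):
    """Keep only head strings found in both operands, adding their groups."""
    return {h: add(s1[h], s2[h]) for h in s1.keys() & s2.keys()}

def difference(s1, s2):
    """Keep the avs of s1 whose head string does not appear in s2."""
    return {h: g for h, g in s1.items() if h not in s2}

# The operands of the figure labelled "Binary".
s1 = {"1": frozenset({group(A=["a"])}), "2": frozenset({group(B=["b1"])})}
s2 = {"2": frozenset({group(B=["b2"])})}
print(sorted(difference(s1, s2)))  # ['1']
```

Passing `add_set` or `add_distr` as the `add` argument selects the set-theoretic or the distributive variant of `union` and `intersection`. Using frozensets makes repeated groups collapse automatically, which matches the rule that two groups are repeated when they contain the same attributes.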