helm/mathql/doc/mathql_introduction_avsets.tex

   1 \subsection {Sets of attributed values.} \label{AVSets}
   2
   3 The data representation model used by {\MathQL} relies on the notion of
    4 \emph{set of attributed values} ({\av} set for short), which is, in practice,
   5 the only data type available in {\MathQL}.4. In this sense {\MathQL}.4 is a
   6 statically untyped language.%
   7 \footnote
    8 {A type system that fits {\MathQL} as an {\RDF}-oriented query language
    9 should be derived from the {\RDFS} class system. This may be a future
   10 improvement.}
  11 Each {\av} in an {\av} set consists of a string%
  12 \footnote{When we say \emph{string}, we mean a finite sequence of characters.}
   13 (which we call the \emph{head string} or \emph{value}) and a (possibly empty)
   14 multiset of named attributes, each holding a set of strings as its content.
  15 Attribute names are made of a (possibly empty) list of string components, so
  16 they can be hierarchically structured.
  17 Moreover the attributes of a value are partitioned into a set of \emph{groups}
  18 ({\ie} subsets) to improve its structure.
  19
  20 In the above description a \emph{set} is an \emph{unordered} finite
  21 sequence \emph{without} repetitions whereas a \emph{multiset} is an
  22 \emph{unordered} finite sequence \emph{with} repetitions.
  23
  24 In the present context repetitions are defined as follows:
  25 two {\av}'s are repeated if they share the same head string without any
   26 condition on their attributes, two groups are repeated if they contain the
  27 same attributes (equal both in name and content), two attributes of a group
  28 are repeated if they share the same name without any condition on their
  29 content, and two strings are always compared in a case-sensitive manner.%
  30 \footnote
  31 {The Author's experience with {\MathQL} seems to show that the above
  32 definition of an {\av} set is just the right one among the many alternatives
  33 that were tried.}
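
As a compact summary of the description above, the structure of an {\av} set
can be sketched as follows. The notation is ours and is not part of the
{\MathQL} specification: we assume that $\Sigma$ denotes the set of strings,
that $N$ denotes the set of attribute names ({\ie} finite lists of strings),
and that $\mathcal{P}_f(X)$ denotes the set of finite subsets of $X$.
\[
\begin{array}{lcl}
\mbox{group} & g & \mbox{a finite partial map from $N$ to $\mathcal{P}_f(\Sigma)$} \\
\mbox{attributed value} & (h, G) & \mbox{with $h \in \Sigma$ and $G$ a finite set of groups} \\
\mbox{set of attributed values} & S & \mbox{a finite set of pairs $(h, G)$ with pairwise distinct $h$}
\end{array}
\]
Under this reading the three notions of repetition come for free: a map
cannot bind the same name twice, a set of groups cannot contain two equal
maps, and $S$ cannot contain two {\av}'s with the same head string.
The same attribute name may still occur in different groups of one {\av},
which is why the attributes of an {\av} as a whole form a multiset.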
  34
  35 As we said, {\MathQL}.4 uses {\av} sets to represent many kinds of
  36 information:
  37
  38 \begin{enumerate}
  39
  40 \item
  41 A pool of {\RDF} triples having a common subject $r$, which in general is a
  42 {\URI} reference \cite{URI}%
  43 \footnote
  44 {A {\URI} \emph {reference} is a {\URI} with an optional fragment identifier.},
  45 is encoded in a single {\av} placing $r$ in the head string.
  46 The predicates of the triples are encoded as attribute names and their objects
  47 are placed in the attributes' contents.
  48 These contents are structured as multiple strings with the aim of holding the
  49 objects of repeated predicates.
  50 Moreover structured attribute names can encode various components of
  51 structured properties preserving their semantics.
  52
  53 \begin{figure}
  54 \begin{footnotesize} \begin{verbatim}
  55 The RDF triples:
  56  ("protocol", "dc:creator", "Sandro Hawke")
  57  ("protocol", "dc:creator", "Eric Prud'hommeaux")
  58  ("protocol", "dc:date", "2002-01-08")
  59
  60 The corresponding attributed value:
  61  "protocol" attr {/"dc:creator" = {"Sandro Hawke", "Eric Prud'hommeaux"};
  62                   /"dc:date" = "2002-01-08"}
  63 \end{verbatim} \end{footnotesize}
  64 \vspace{-1pc}
  65 \caption{The representation of a pool of {\RDF} triples} \label{AVOne}
  66 \end{figure}
  67
  68 \figref{AVOne} shows how a set of triples can be coded in an {\av}.
  69 Note that the word \TT{attr} separates the head string from its attributes,
  70 braces enclose an attribute group in which attributes are separated by
  71 semicolons, and an equal sign separates an attribute name from its contents.
  72
  73 In this setting the grouping feature can be used to separate semantically
   74 different classes of properties associated with a resource (as for instance
  75 Dublin Core metadata, Euler metadata and user-defined metadata).
  76
  77 \item
   78 A pool of arbitrarily chosen {\RDF} triples is encoded in an {\av} set by
   79 placing in each {\av} the subset of triples whose subject is its head string.
  80
  81 Note that the use of {\av} sets to build query results allows {\MathQL} queries
  82 to return sets of {\RDF} triples instead of mere sets of resources, in the
  83 spirit of what is currently done by other {\RDF}-oriented query languages.
  84
  85 If the {\av}'s of an {\av} set share the same attribute names and grouping
  86 structure, this set can be represented as a table in which each row encodes
   87 an {\av} and each column is associated with an attribute (except the first one
  88 which holds the head strings).
  89 \figref{Table} shows an {\av} set describing the properties of two resources
  90 ``A'' and ``B'' giving its table representation, in which the columns
  91 corresponding to attributes in the same group are clustered between
  92 double-line delimiters.%
  93 \footnote{A table with grouped labelled columns like the one above resembles a
  94 set of relational database tables.}
  95
  96 \begin{figure}
  97 \begin{footnotesize} \begin{verbatim}
  98 "A" attr {/"major" = "1"; /"minor" = "2"},
  99          {/"first" = "2002-01-01"; /"modified" = "2002-03-01"};
 100 "B" attr {/"major" = "1"; /"minor" = "7"},
 101          {/"first" = "2002-02-01"; /"modified" = "2002-04-01"}
 102 \end{verbatim}
 103 \begin{center} \begin{tabular}{|c||c|c||c|c||}
 104 \hline   & \textbf{``major''} & \textbf{``minor''} & \textbf{``first''} & \textbf{``modified''} \\
 105 \hline ``A'' & ``1'' & ``2'' & ``2002-01-01'' & ``2002-03-01'' \\
 106 \hline ``B'' & ``1'' & ``7'' & ``2002-02-01'' & ``2002-04-01'' \\
 107 \hline
 108 \end{tabular} \end{center} \end{footnotesize}
 109 \caption{A set of attributed values displayed as a table} \label{Table}
 110 \end{figure}
 111
 112 The above example gives a spatial idea of the geometry of an {\av} set ({\ie}
 113 a query result) which fits in 4 dimensions: namely we can extend independently
 114 the set of the head strings (dimension 1), the attributes in each group
 115 (dimension 2), the groups in each {\av} (dimension 3) and the contents of each
 116 attribute (dimension 4).
 117 The metadata defined in the table of \figref{Table} will be used in subsequent
 118 examples.
 119 For this purpose assume that ``first'' and ``modified'' are the components
 120 of a structured property ``date'' available for the resources ``A'' and ``B''.
 121
 122 \item
 123 The value of an {\RDF} property is encoded in an {\av} distinguishing three
 124 cases:
 125
 126 \begin{itemize}
 127
 128 \item
 129 If the property is unstructured, its value is placed in the {\av} head
 130 string and no attributes are defined.
 131
 132 \item
 133 If the property is structured and its value has a main component%
 134 \footnote{Which is set by the \emph{rdf:value} property or defined by a
 135 specific application.},
 136 the content of this component is placed in the {\av} head string and the
  137 other components are stored in the {\av} attributes as in case 1.
 138
 139 \item
 140 For the value of a structured property without a main component, the head
 141 string is empty and the components are stored in the attributes.
 142
 143 \end{itemize}
 144
 145 \begin{figure}
 146 \begin{footnotesize} \begin{verbatim}
  147 First example: one instance:
 148  "" attr {/"major" = "1"; /"minor" = "2"} no main component
 149  "1" attr {/"minor" = "2"} main component is "major"
 150  "2" attr {/"major" = "1"} main component is "minor"
 151
 152 Second example: two separate instances:
 153  "" attr {/"major" = "1"; /"minor" = "2"},
 154          {/"major" = "1"; /"minor" = "7"} no main component
 155  "1" attr {/"minor" = "2"}, {/"minor" = "7"} main component is "major"
 156
 157 Third example: two mixed instances:
  158  "" attr {/"major" = {"3", "6"}; /"minor" = {"4", "9"}} no main component
 159 \end{verbatim} \end{footnotesize}
 160 \vspace{-1pc}
 161 \caption{The representation of the structured value of a property}
 162 \label{AVTwo}
 163 \end{figure}
 164
 165 \figref{AVTwo} (first example) shows three possible ways of representing in
 166 {\av}'s an instance of a structured property ``id'' whose value has two
 167 fields ({\ie} properties) ``major'' and ``minor''.
 168 In this instance, ``major'' is set to ``1'' and ``minor'' is set to ``2''.
 169 The representations depend on which component of ``id'' is chosen as the
 170 main component (none, ``major'' or ``minor'' respectively).
 171 Several structured property values sharing a common main component can be
  172 encoded in a single {\av} exploiting the grouping facility: in this case the
 173 attributes of every instance are enclosed in separate groups.
 174 \figref{AVTwo} (second example) shows the representations of two instances of
 175 ``id'': the former and a new one for which ``major'' is ``1'' and ``minor'' is
 176 ``7''.
 177
 178 Note that if the attributes of the two groups are encoded in a single group,
  179 the notion of which components belong to the same property value cannot be
 180 recovered in the general case because the values of an attribute form a set
 181 and thus are unordered.
 182 As an example think of two instances of ``id'' encoded as in \figref{AVTwo}
 183 (third example).
 184
 185 \item
 186 A natural number is stored, using its decimal representation, in the head
 187 string of a single {\av} with no attributes.
 188
 189 \item
 190 The boolean value \emph{false} is stored as an empty {\av} set, whereas
 191 an inhabited {\av} set may be interpreted as the boolean value \emph{true}.
 192 The default representation of \emph{true} is a single {\av} with an empty
 193 head string and no attributes.
 194
 195 \end{enumerate}
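
The last two encodings are simple enough to be written down explicitly.
The following is only a sketch in a notation of our own: an {\av} is written
as a pair $(h, G)$ of a head string $h$ and a set $G$ of groups, and
$\varepsilon$ denotes the empty string. We also assume that ``no attributes''
means an empty set of groups, and that a single {\av} is wrapped in a
one-element {\av} set. With these assumptions the number 42 (an arbitrary
example) and the two boolean values would read:
\[
42 \mapsto \{(\mbox{``42''}, \emptyset)\}, \qquad
\mathit{false} \mapsto \emptyset, \qquad
\mathit{true} \mapsto \{(\varepsilon, \emptyset)\}.
\]
Any other inhabited {\av} set may be interpreted as \emph{true} as well; the
one shown here is only the default representation.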
 196
 197 {\MathQL} defines five core binary operations on {\av} sets: two unions, two
 198 intersections and a difference. The first four are defined in terms of an
  199 operation, which we call \emph{addition}, involving two {\av}'s with the same
  200 head string.
  201 The result is an {\av} with the same head string as the operands, but there are
  202 two ways to compose the attribute groups:
 203
 204 \begin{itemize}
 205
 206 \item
 207 with the \emph{set-theoretic} addition, the set of attribute groups in the
 208 resulting {\av} is the set-theoretic union of the sets of attribute groups in
 209 the operands;
 210
 211 \item
 212 with the \emph{distributive} addition, the set of attribute groups in the
 213 resulting {\av} is the ``Cartesian product'' of the sets of attribute groups
 214 in the two operands.
 215 Here an element of the ``Cartesian product'' is not a pair of groups but it is
 216 the set-theoretic union of these groups where the contents of homonymous
 217 attributes are clustered together using set-theoretic unions.
 218
 219 \end{itemize}
 220
 221 \figref{Addition} shows an example of the two kinds of addition.
 222
 223 \begin{figure}
 224 \begin{footnotesize} \begin{verbatim}
 225 Attributed values used as operands for the addition:
 226  "1" attr {/"A" = "a"}, {/"B" = "b1"}
 227  "1" attr {/"A" = "a"}, {/"B" = "b2"}
 228
 229 Set-theoretic addition:
  230  "1" attr {/"A" = "a"}, {/"B" = "b1"}, {/"B" = "b2"}
 231
 232 Distributive addition:
 233  "1" attr {/"A" = "a"}, {/"A" = "a"; /"B" = "b2"},
 234           {/"B" = "b1"; /"A" = "a"}, {/"B" = {"b1", "b2"}}
 235 \end{verbatim} \end{footnotesize}
 236 \vspace{-1pc}
 237 \caption{The addition of attributed values}
 238 \label{Addition}
 239 \end{figure}
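
The two additions can also be stated symbolically. The following is a sketch
in a notation of our own, not taken from the {\MathQL} specification: we write
an {\av} as a pair $(h, G)$ of a head string $h$ and a set $G$ of groups, we
regard a group as a finite map from attribute names to sets of strings, and
we introduce the symbols $+_s$, $+_d$ and $\sqcup$ for illustration only.
\[
\begin{array}{rcl}
(h, G_1) +_s (h, G_2) & = & (h,\ G_1 \cup G_2) \\
(h, G_1) +_d (h, G_2) & = & (h,\ \{\, g_1 \sqcup g_2 \mid g_1 \in G_1,\ g_2 \in G_2 \,\})
\end{array}
\]
Here $g_1 \sqcup g_2$ is the group defined on every name bound by $g_1$ or by
$g_2$, with $(g_1 \sqcup g_2)(n) = g_1(n) \cup g_2(n)$, where an unbound name
contributes the empty set.

For the operands of \figref{Addition}, let
$\alpha = \{\mathtt{A} \mapsto \{\mathtt{a}\}\}$,
$\beta_1 = \{\mathtt{B} \mapsto \{\mathtt{b1}\}\}$ and
$\beta_2 = \{\mathtt{B} \mapsto \{\mathtt{b2}\}\}$,
so that $G_1 = \{\alpha, \beta_1\}$ and $G_2 = \{\alpha, \beta_2\}$. Then
\[
\begin{array}{rcl}
G_1 \cup G_2 & = & \{\alpha,\ \beta_1,\ \beta_2\} \\
\{\, g_1 \sqcup g_2 \mid g_1 \in G_1,\ g_2 \in G_2 \,\}
 & = & \{\alpha \sqcup \alpha,\ \alpha \sqcup \beta_2,\
         \beta_1 \sqcup \alpha,\ \beta_1 \sqcup \beta_2\} \\
 & = & \{\alpha,\
         \{\mathtt{A} \mapsto \{\mathtt{a}\},\ \mathtt{B} \mapsto \{\mathtt{b2}\}\}, \\
 &   & \ \,\{\mathtt{A} \mapsto \{\mathtt{a}\},\ \mathtt{B} \mapsto \{\mathtt{b1}\}\},\
         \{\mathtt{B} \mapsto \{\mathtt{b1}, \mathtt{b2}\}\}\}
\end{array}
\]
which agrees with the two results shown in the figure.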
 240
 241 Now we can discuss the five operations between {\av} sets:
 242
 243 \begin{itemize}
 244
 245 \item
  246 The two unions correspond to the set-theoretic union of their operands, where
  247 the {\av}'s sharing the head string are added either set-theoretically or
  248 distributively as explained above (thus we have a set-theoretic union and a
  249 distributive union in the two cases). In this context the empty {\av} set
  250 plays the role of the neutral element.
  251 These operations play a central role in the {\MathQL} architecture and allow
  252 composing the attributes of the operands while preserving their group structure.
 253
 254 \item
 255 The two intersections are the dual of the above unions: they contain the
  256 {\av}'s whose head string appears in both arguments, where the {\av}'s sharing
 257 the head string are added either set-theoretically or distributively as before.
 258
 259 The distributive intersection has the double benefit of filtering the
 260 common values of the given {\av} sets, and of merging their attribute groups
  261 in every possible way. This makes it possible to perform additional
  262 filtering operations that check the content of the merged groups.
 263
 264 \item
 265 The difference of two {\av} sets contains the {\av}'s of the first
 266 argument whose head string does not appear in the second argument.
 267
 268 \end{itemize}
 269
 270 \figref{Binary} shows how the above operations work in a simple example.
 271
 272 \begin{figure}
 273 \begin{footnotesize} \begin{verbatim}
 274 Sets of attributed values used as operands for the operations:
 275  "1" attr {/"A" = "a"}; "2" attr {/"B" = "b1"}
 276  "2" attr {/"B" = "b2"}
 277
 278 Set-theoretic union:
 279  "1" attr {/"A" = "a"}; "2" attr {/"B" = "b1"}, {/"B" = "b2"}
 280
 281 Distributive union:
 282  "1" attr {/"A" = "a"}; "2" attr {/"B" = {"b1", "b2"}}
 283
 284 Set-theoretic intersection:
 285  "2" attr {/"B" = "b1"}, {/"B" = "b2"}
 286
 287 Distributive intersection:
 288  "2" attr {/"B" = {"b1", "b2"}}
 289
 290 Difference:
 291  "1" attr {/"A" = "a"}
 292 \end{verbatim} \end{footnotesize}
 293 \vspace{-1pc}
 294 \caption{The binary operations on sets of attributed values}
 295 \label{Binary}
 296 \end{figure}
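
The five operations can be summarized symbolically as well. This is a sketch
in our own notation, which the {\MathQL} specification does not use: for an
{\av} set $S$ we write $\mathrm{hd}(S)$ for the set of its head strings and
$S[h]$ for the unique {\av} of $S$ whose head string is $h$. The symbol $+_x$
stands for the set-theoretic addition ($x = s$) or for the distributive
addition ($x = d$) described in the text, and the subscripted symbols
$\cup_x$ and $\cap_x$ are introduced for illustration only.
\[
\begin{array}{rcl}
S \cup_x T & = & \{\, S[h] +_x T[h] \mid h \in \mathrm{hd}(S) \cap \mathrm{hd}(T) \,\} \\
 & \cup & \{\, S[h] \mid h \in \mathrm{hd}(S) \setminus \mathrm{hd}(T) \,\}
   \ \cup\ \{\, T[h] \mid h \in \mathrm{hd}(T) \setminus \mathrm{hd}(S) \,\} \\
S \cap_x T & = & \{\, S[h] +_x T[h] \mid h \in \mathrm{hd}(S) \cap \mathrm{hd}(T) \,\} \\
S \setminus T & = & \{\, S[h] \mid h \in \mathrm{hd}(S) \setminus \mathrm{hd}(T) \,\}
\end{array}
\]
In \figref{Binary} the first operand has the head strings ``1'' and ``2''
and the second operand has the head string ``2'' only. Hence both
intersections retain just ``2'', the difference retains just ``1'', and both
unions retain ``1'' unchanged while adding the two {\av}'s whose head string
is ``2''. Taking $T = \emptyset$ gives $S \cup_x \emptyset = S$, in agreement
with the empty {\av} set being the neutral element of the unions.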