% helm/mathql/doc/mathql_introduction_avsets.tex

\subsection {Sets of attributed values.}

The data representation model used by {\MathQL} relies on the notion of a
\emph{set of attributed values} ({\av} set for short), which is, in practice,
the only data type available in {\MathQL}.4. In this sense {\MathQL}.4 is a
statically untyped language.%
\footnote
{A type system suited to {\MathQL} as an {\RDF}-oriented query language
should be derived from the {\RDFS} class system. This may be a future
improvement.}
Each {\av} in an {\av} set consists of a string%
\footnote{When we say \emph{string}, we mean a finite sequence of characters.}
(which we call the \emph{head string} or \emph{value}) and a (possibly empty)
multiset of named attributes whose content is a set of strings.
Attribute names are made of a (possibly empty) list of string components, so
they can be hierarchically structured.
Moreover, the attributes of a value are partitioned into a set of
\emph{groups} ({\ie} subsets) that give the value additional structure.

In the above description a \emph{set} is an \emph{unordered} finite
sequence \emph{without} repetitions, whereas a \emph{multiset} is an
\emph{unordered} finite sequence \emph{with} repetitions.

In the present context repetitions are defined as follows:
two {\av}'s are repeated if they share the same head string, regardless of
their attributes; two groups are repeated if they contain the same attributes
(equal both in name and content); two attributes of a group are repeated if
they share the same name, regardless of their content. Finally, strings are
always compared in a case-sensitive manner.%
\footnote
{The author's experience with {\MathQL} suggests that the above
definition of an {\av} set is the most suitable one among the many
alternatives that were tried.}
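
The definitions above can be made concrete with a small Python sketch. It is
only an illustration under our own assumptions: attribute names are simplified
to plain strings (as in the examples of this section) instead of lists of
components, and the names \verb|make_group|, \verb|make_av_set| and
\verb|AVSet| are hypothetical. Since the {\av} set is keyed by head string and
each group is a frozen set of (name, contents) pairs built from a mapping, the
repetition rules hold by construction.

\begin{footnotesize} \begin{verbatim}
from typing import Dict, FrozenSet, Iterable, Mapping, Tuple, Union

# Hypothetical encoding; attribute names are plain strings here.
Contents = Union[str, Iterable[str]]
Group = FrozenSet[Tuple[str, FrozenSet[str]]]  # set of (name, contents) pairs
AVSet = Dict[str, FrozenSet[Group]]            # head string -> set of groups


def make_group(attrs: Mapping[str, Contents]) -> Group:
    """Build a group. Mapping keys make attribute names unique within the
    group; a bare string is taken as a one-element content."""
    return frozenset(
        (name, frozenset([v]) if isinstance(v, str) else frozenset(v))
        for name, v in attrs.items())


def make_av_set(avs: Iterable[Tuple[str, Iterable[Mapping[str, Contents]]]]) -> AVSet:
    """Build an av set. Groups form a frozenset, so identical groups collapse.
    Avs with the same head string are repetitions; keeping the last one is an
    arbitrary choice of this sketch."""
    result: AVSet = {}
    for head, groups in avs:
        result[head] = frozenset(make_group(g) for g in groups)
    return result


if __name__ == "__main__":
    s = make_av_set([("1", [{"A": "a"}, {"A": "a"}]), ("2", [{"B": ["b1", "b2"]}])])
    print(len(s["1"]))                                       # 1: groups collapse
    print(make_group({"A": "x"}) == make_group({"a": "x"}))  # False: case matters
\end{verbatim} \end{footnotesize}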

Since {\av} sets are its only data type, {\MathQL}.4 uses them to represent
many kinds of information, namely:

\begin{enumerate}

\item
A pool of {\RDF} triples having a common subject $r$, which in general is a
{\URI} reference \cite{URI}%
\footnote
{A {\URI} \emph {reference} is a {\URI} with an optional fragment identifier.},
is encoded in a single {\av} by placing $r$ in the head string.
The predicates of the triples are encoded as attribute names and their objects
are placed in the attributes' contents.
Since each content is a set of strings, it can hold the objects of repeated
predicates.
Moreover, structured attribute names can encode the components of
structured properties, preserving their semantics.

\begin{figure}[ht]
\begin{footnotesize} \begin{verbatim}
The RDF triples:
("http://www.w3.org/2002/01/rdf-databases/protocol", "dc:creator", "Sandro Hawke")
("http://www.w3.org/2002/01/rdf-databases/protocol", "dc:creator", "Eric Prud'hommeaux")
("http://www.w3.org/2002/01/rdf-databases/protocol", "dc:date", "2002-01-08")

The corresponding attributed value:
"http://www.w3.org/2002/01/rdf-databases/protocol" attr
             {"dc:creator" = {"Sandro Hawke", "Eric Prud'hommeaux"}; "dc:date" = "2002-01-08"}
\end{verbatim} \end{footnotesize}
\vskip-1pc
\caption{The representation of a pool of {\RDF} triples} \label{AVOne}
\end{figure}

\figref{AVOne} shows how a set of triples can be encoded in an {\av}.
Note that the word \emph{attr} separates the head string from its attributes,
braces enclose an attribute group in which attributes are separated by
semicolons, and an equal sign separates an attribute name from its contents
(see \subsecref{Textual} for the complete {\av} syntax).

In this setting the grouping feature can be used to separate semantically
different classes of properties associated with a resource (for instance
Dublin Core metadata, Euler metadata and user-defined metadata).

\item
A pool of arbitrarily chosen {\RDF} triples is encoded in an {\av} set by
encoding, as in case 1, each subset of triples that share the same subject in
a separate {\av}.

Note that using {\av} sets to build query results allows {\MathQL} queries
to return sets of {\RDF} triples instead of mere sets of resources, in the
spirit of what is currently done by other {\RDF}-oriented query languages.

If the {\av}'s of an {\av} set share the same attribute names and grouping
structure, this set can be represented as a table in which each row encodes
an {\av} and each column is associated with an attribute (except the first
one, which holds the head strings).
\figref{Table} shows an {\av} set describing the properties of two resources,
``A'' and ``B'', together with its table representation, in which the columns
corresponding to attributes in the same group are clustered between
double-line delimiters%
\footnote{A table with grouped labelled columns like this one resembles a
set of relational database tables.}.

%Another possible use of a {\MathQL} query result is for the encoding of a
%relational database table: in this sense the indexed column is stored in the
%subject strings, the names of the other columns are stored in attribute names
%and cell contents are stored in attribute values.

\begin{figure}[ht]
\begin{footnotesize} \begin{verbatim}
"A" attr {"major" = "1"; "minor" = "2"}, {"first" = "2002-01-01"; "modified" = "2002-03-01"};
"B" attr {"major" = "1"; "minor" = "7"}, {"first" = "2002-02-01"; "modified" = "2002-04-01"}
\end{verbatim}
\begin{center} \begin{tabular}{|c||c|c||c|c||}
\hline   & {\bf ``major''} & {\bf ``minor''} & {\bf ``first''} & {\bf ``modified''} \\
\hline ``A'' & ``1'' & ``2'' & ``2002-01-01'' & ``2002-03-01'' \\
\hline ``B'' & ``1'' & ``7'' & ``2002-02-01'' & ``2002-04-01'' \\
\hline
\end{tabular} \end{center} \end{footnotesize}
\caption{A set of attributed values displayed as a table} \label{Table}
\end{figure}

The above example suggests a geometric view of an {\av} set ({\ie} a query
result) as a structure with four dimensions that can be extended
independently: the set of head strings (dimension 1), the attributes in each
group (dimension 2), the groups in each {\av} (dimension 3) and the contents
of each attribute (dimension 4).

The metadata defined in the table of \figref{Table} will be used in subsequent
examples.
For this purpose, assume that \TT{first} and \TT{modified} are the components
of a structured property \TT{date} available for the resources ``A'' and ``B''.

\item
The value of an {\RDF} property is encoded in a single {\av}, distinguishing
three situations:

\begin{itemize}

\item
If the property is unstructured, its value is placed in the {\av} head
string and no attributes are defined.

\item
If the property is structured and its value has a main component%
\footnote{That is, the component given by the \emph{rdf:value} property or
one defined by a specific application.},
the content of this component is placed in the {\av} head string and the
other components are stored in the {\av} attributes as in case 1.

\item
If the property is structured and its value does not have a main component,
the {\av} head string is empty and all the components are stored in the
attributes.

\end{itemize}

\begin{figure}[ht]
\begin{footnotesize} \begin{verbatim}
First example: one instance:
"" attr {"major" = "1"; "minor" = "2"};  no main component
"1" attr {"minor" = "2"};                main component is "major"
"2" attr {"major" = "1"}                 main component is "minor"

Second example: two separate instances:
"" attr {"major" = "1"; "minor" = "2"}, {"major" = "1"; "minor" = "7"}; no main component
"1" attr {"minor" = "2"}, {"minor" = "7"}                            main component is "major"

Third example: two mixed instances:
"" attr {"major" = "3", "6"; "minor" = "4", "9"} no main component
\end{verbatim} \end{footnotesize}
\vskip-1pc
\caption{The representation of the structured value of a property}
\label{AVTwo}
\end{figure}

\figref{AVTwo} (first example) shows three possible ways of representing in
{\av}'s an instance of a structured property \TT{id} whose value has two
fields ({\ie} properties), \TT{major} and \TT{minor}.
In this instance, \TT{major} is set to ``1'' and \TT{minor} is set to ``2''.
The representations depend on which component of \TT{id} is chosen as the
main component (none, \TT{major} or \TT{minor}, respectively).
Several structured property values sharing a common main component can be
encoded in a single {\av} by exploiting the grouping facility: in this case
the attributes of each instance are enclosed in a separate group.
\figref{AVTwo} (second example) shows the representations of two instances of
\TT{id}: the previous one and a new one for which \TT{major} is ``1'' and
\TT{minor} is ``7''.

Note that if the attributes of the two groups are encoded in a single group,
the information about which components belong to the same property value
cannot be recovered in the general case, because the contents of an attribute
form a set and thus are unordered.
As an example, consider two instances of \TT{id} encoded as in \figref{AVTwo}
(third example): the encoding does not reveal whether the
(\TT{major}, \TT{minor}) pairs are (``3'', ``4'') and (``6'', ``9'') or
(``3'', ``9'') and (``6'', ``4'').

\item
A natural number is stored, using its decimal representation, in the head
string of a single {\av} with no attributes.

\item
The boolean value \emph{false} is stored as an empty {\av} set, whereas
a non-empty {\av} set may be interpreted as the boolean value \emph{true}.
The default representation of \emph{true} is a single {\av} with an empty
head string and no attributes.

\end{enumerate}
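
The encodings of {\RDF} triples described in the first two items of the list
above, and the table view of \figref{Table}, can be sketched in Python as
follows. The sketch rests on assumptions of ours: the function names are
hypothetical, attribute names are plain strings, each {\av} built from triples
gets a single attribute group, and groups are kept in a list so that they have
the positional order a table needs (in {\MathQL} they form an unordered set).
Applied to the triples of \figref{AVOne}, \verb|triples_to_av_set| yields the
corresponding {\av} in this representation.

\begin{footnotesize} \begin{verbatim}
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Set, Tuple

Group = Dict[str, Set[str]]     # attribute name -> set of content strings
AVSet = Dict[str, List[Group]]  # head string -> attribute groups (ordered here)


def triples_to_av_set(triples: Iterable[Tuple[str, str, str]]) -> AVSet:
    """One av per subject: subject -> head string, predicate -> attribute
    name, objects -> contents. All attributes go into a single group."""
    by_subject: DefaultDict[str, DefaultDict[str, Set[str]]] = defaultdict(
        lambda: defaultdict(set))
    for subject, predicate, obj in triples:
        by_subject[subject][predicate].add(obj)  # repeated predicates accumulate
    return {s: [dict(attrs)] for s, attrs in by_subject.items()}


def to_table(avs: AVSet) -> Tuple[List[str], List[List[str]]]:
    """Table view of an av set whose avs share names and grouping structure.
    The first column holds the head strings; rows are sorted by head string
    and multiple contents are joined with ", "."""
    heads = sorted(avs)
    if not heads:
        return [], []
    layout = [list(g) for g in avs[heads[0]]]  # column names, group by group
    header = [""] + [name for names in layout for name in names]
    rows = []
    for head in heads:
        row = [head]
        for names, group in zip(layout, avs[head]):
            row += [", ".join(sorted(group[name])) for name in names]
        rows.append(row)
    return header, rows


if __name__ == "__main__":
    table_avs = {
        "A": [{"major": {"1"}, "minor": {"2"}},
              {"first": {"2002-01-01"}, "modified": {"2002-03-01"}}],
        "B": [{"major": {"1"}, "minor": {"7"}},
              {"first": {"2002-02-01"}, "modified": {"2002-04-01"}}],
    }
    header, rows = to_table(table_avs)
    print(header)
    for row in rows:
        print(row)
\end{verbatim} \end{footnotesize}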
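
The three cases of the third item, together with the grouping of several
instances, can be sketched in Python as follows. The functions
\verb|encode_unstructured|, \verb|encode_instances| and \verb|merge_groups|
are hypothetical: an instance of a structured property is given as a
dictionary from component names to strings, and the result is the head string
and the list of attribute groups of one {\av}. With the values of
\figref{AVTwo} the sketch reproduces the first two examples, and the last lines
of the usage show why merging the groups, as in the third example, loses the
pairing of components.

\begin{footnotesize} \begin{verbatim}
from typing import Dict, List, Optional, Set, Tuple

Group = Dict[str, Set[str]]   # attribute name -> set of content strings
AV = Tuple[str, List[Group]]  # (head string, attribute groups)


def encode_unstructured(value: str) -> AV:
    """Unstructured property: the value is the head string, no attributes."""
    return value, []


def encode_instances(instances: List[Dict[str, str]],
                     main: Optional[str] = None) -> AV:
    """Structured property: one group per instance. The main component, if
    any, becomes the head string and must be equal in all instances;
    otherwise the head string is empty."""
    if main is None:
        head = ""
    else:
        values = {inst[main] for inst in instances}
        if len(values) != 1:
            raise ValueError("instances must share the main component")
        head = values.pop()
    groups: List[Group] = []
    for inst in instances:
        group = {name: {v} for name, v in inst.items() if name != main}
        if group not in groups:  # groups form a set: skip repetitions
            groups.append(group)
    return head, groups


def merge_groups(groups: List[Group]) -> Group:
    """Collapse several groups into one, uniting homonymous contents."""
    merged: Group = {}
    for group in groups:
        for name, values in group.items():
            merged.setdefault(name, set()).update(values)
    return merged


if __name__ == "__main__":
    one = [{"major": "1", "minor": "2"}]
    two = one + [{"major": "1", "minor": "7"}]
    print(encode_instances(one, "minor"))  # ('2', [{'major': {'1'}}])
    print(encode_instances(two, "major"))  # ('1', [{'minor': {'2'}}, {'minor': {'7'}}])
    # Two different pairs of instances give the same single group:
    a = merge_groups(encode_instances([{"major": "3", "minor": "4"},
                                       {"major": "6", "minor": "9"}])[1])
    b = merge_groups(encode_instances([{"major": "3", "minor": "9"},
                                       {"major": "6", "minor": "4"}])[1])
    print(a == b)  # True
\end{verbatim} \end{footnotesize}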
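
The encodings of natural numbers and booleans in the last two items amount to
simple conversions. In the sketch below, whose function names are
hypothetical, an {\av} set is modelled as a Python dictionary from head
strings to lists of attribute groups.

\begin{footnotesize} \begin{verbatim}
from typing import Dict, List

AVSet = Dict[str, List[dict]]  # head string -> attribute groups


def from_nat(n: int) -> AVSet:
    """Decimal representation in the head string of one av, no attributes."""
    if n < 0:
        raise ValueError("only natural numbers can be encoded")
    return {str(n): []}


def to_nat(avs: AVSet) -> int:
    """Inverse of from_nat: exactly one av, a decimal head, no attributes."""
    if len(avs) != 1:
        raise ValueError("expected a single av")
    [(head, groups)] = avs.items()
    if groups or not (head.isascii() and head.isdigit()):
        raise ValueError("not the encoding of a natural number")
    return int(head)


def from_bool(b: bool) -> AVSet:
    """False is the empty av set; true defaults to one av with an empty head
    string and no attributes. A fresh dict is returned on each call."""
    return {"": []} if b else {}


def to_bool(avs: AVSet) -> bool:
    """Any non-empty av set may be read as true."""
    return len(avs) > 0


if __name__ == "__main__":
    print(from_nat(42), to_nat(from_nat(42)))                        # {'42': []} 42
    print(to_bool(from_bool(False)), to_bool({"x": [{"A": {"a"}}]}))  # False True
\end{verbatim} \end{footnotesize}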

{\MathQL} defines five binary operations on {\av} sets: two unions, two
intersections and a difference. The first four are defined in terms of an
operation, which we call \emph{addition}, involving two {\av}'s with the same
head string.
The result is an {\av} with the same head string as the operands, but there
are two ways to compose the attribute groups:

\begin{itemize}

\item
With the \emph{set-theoretic} addition, the set of attribute groups in the
resulting {\av} is the set-theoretic union of the sets of attribute groups in
the operands.

\item
With the \emph{distributive} addition, the set of attribute groups in the
resulting {\av} is the ``Cartesian product'' of the sets of attribute groups
in the two operands.
In this context, an element of the ``Cartesian product'' is not a pair of
groups but the set-theoretic union of these groups, in which the contents of
homonymous attributes are merged by set-theoretic union.

\end{itemize}

\figref{Addition} shows an example of the two kinds of addition.

\begin{figure}[ht]
\begin{footnotesize} \begin{verbatim}
Attributed values used as operands for the addition:
"1" attr {"A" = "a"}, {"B" = "b1"}
"1" attr {"A" = "a"}, {"B" = "b2"}

Set-theoretic addition:
"1" attr {"A" = "a"}, {"B" = "b1"}, {"B" = "b2"}

Distributive addition:
"1" attr {"A" = "a"}, {"A" = "a"; "B" = "b2"}, {"B" = "b1"; "A" = "a"}, {"B" = {"b1", "b2"}}
\end{verbatim} \end{footnotesize}
\vskip-1pc
\caption{The addition of attributed values}
\label{Addition}
\end{figure}
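
The two kinds of addition can be sketched in Python as follows. The
representation is an assumption made for illustration: a group is a frozen set
of (name, contents) pairs, so that the groups of an {\av} can themselves form
a set, attribute names are plain strings, and the function names are
hypothetical. Since both operands share the head string, the sketch only
combines their sets of groups. Run on the operands of \figref{Addition}, it
yields the two results shown there.

\begin{footnotesize} \begin{verbatim}
from itertools import product
from typing import Dict, FrozenSet, Iterable, Mapping, Tuple, Union

Group = FrozenSet[Tuple[str, FrozenSet[str]]]  # set of (name, contents) pairs
Groups = FrozenSet[Group]                      # attribute groups of one av


def group(attrs: Mapping[str, Union[str, Iterable[str]]]) -> Group:
    """Build a group; a bare string is taken as a one-element content."""
    return frozenset(
        (name, frozenset([v]) if isinstance(v, str) else frozenset(v))
        for name, v in attrs.items())


def merge(g1: Group, g2: Group) -> Group:
    """Union of two groups; homonymous attributes get the union of contents."""
    merged: Dict[str, FrozenSet[str]] = dict(g1)
    for name, values in g2:
        merged[name] = merged.get(name, frozenset()) | values
    return frozenset(merged.items())


def add_set_theoretic(a: Groups, b: Groups) -> Groups:
    """Set-theoretic addition: union of the two sets of groups."""
    return a | b


def add_distributive(a: Groups, b: Groups) -> Groups:
    """Distributive addition: one merged group for each pair (g1, g2) in the
    Cartesian product of the two sets of groups."""
    return frozenset(merge(g1, g2) for g1, g2 in product(a, b))


if __name__ == "__main__":
    x = frozenset({group({"A": "a"}), group({"B": "b1"})})
    y = frozenset({group({"A": "a"}), group({"B": "b2"})})
    print(len(add_set_theoretic(x, y)), len(add_distributive(x, y)))  # 3 4
\end{verbatim} \end{footnotesize}

Read literally, the definition also implies that the distributive addition of
an {\av} having no attribute groups yields an {\av} with no groups, because a
Cartesian product with an empty set is empty; \verb|add_distributive| follows
this reading.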

Now we can discuss the five operations on {\av} sets that we mentioned
above:

\begin{itemize}

\item
The two unions correspond to the set-theoretic union of their operands, where
the {\av}'s sharing a head string are added either set-theoretically or
distributively as explained above (thus we have a set-theoretic union and a
distributive union, respectively). In this context the empty {\av} set
plays the role of the neutral element.
These operations play a central role in the {\MathQL} architecture and make it
possible to compose the attributes of the operands while preserving their
group structure.

\item
The two intersections are the duals of the above unions: they contain the
{\av}'s whose head string appears in both arguments, where the {\av}'s sharing
a head string are added either set-theoretically or distributively as before.

The distributive intersection has the double benefit of keeping only the
values common to the given {\av} sets and of merging their attribute groups
in every possible way. This makes it possible to perform additional
filtering operations that check the content of the merged groups.

\item
The difference of two {\av} sets contains the {\av}'s of the first
argument whose head string does not appear in the second argument.

\end{itemize}

\figref{Binary} shows how the above operations work in a simple example.

\begin{figure}[ht]
\begin{footnotesize} \begin{verbatim}
Sets of attributed values used as operands for the operations:
"1" attr {"A" = "a"}; "2" attr {"B" = "b1"}
"2" attr {"B" = "b2"}

Set-theoretic union:
"1" attr {"A" = "a"}; "2" attr {"B" = "b1"}, {"B" = "b2"}

Distributive union:
"1" attr {"A" = "a"}; "2" attr {"B" = {"b1", "b2"}}

Set-theoretic intersection:
"2" attr {"B" = "b1"}, {"B" = "b2"}

Distributive intersection:
"2" attr {"B" = {"b1", "b2"}}

Difference:
"1" attr {"A" = "a"}
\end{verbatim} \end{footnotesize}
\vskip-1pc
\caption{The binary operations on sets of attributed values}
\label{Binary}
\end{figure}
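
Building on the two additions, the five operations can be sketched in Python
as follows. As assumptions of the sketch, an {\av} set is a dictionary from
head strings to frozen sets of groups (so two {\av}'s with the same head string
cannot coexist), a group is a frozen set of (name, contents) pairs, and all
names are hypothetical. The filter \verb|having| only illustrates the kind of
check on merged groups that the distributive intersection enables. With the
operands of \figref{Binary} the sketch yields the results shown there.

\begin{footnotesize} \begin{verbatim}
from itertools import product
from typing import Callable, Dict, FrozenSet, Iterable, Mapping, Tuple, Union

Group = FrozenSet[Tuple[str, FrozenSet[str]]]  # set of (name, contents) pairs
Groups = FrozenSet[Group]
AVSet = Dict[str, Groups]                      # head string -> attribute groups
Adder = Callable[[Groups, Groups], Groups]


def group(attrs: Mapping[str, Union[str, Iterable[str]]]) -> Group:
    """Build a group; a bare string is taken as a one-element content."""
    return frozenset(
        (name, frozenset([v]) if isinstance(v, str) else frozenset(v))
        for name, v in attrs.items())


def merge(g1: Group, g2: Group) -> Group:
    """Union of two groups; homonymous attributes get the union of contents."""
    merged = dict(g1)
    for name, values in g2:
        merged[name] = merged.get(name, frozenset()) | values
    return frozenset(merged.items())


def set_add(a: Groups, b: Groups) -> Groups:
    return a | b  # set-theoretic addition


def dist_add(a: Groups, b: Groups) -> Groups:
    return frozenset(merge(g1, g2) for g1, g2 in product(a, b))  # distributive


def union(x: AVSet, y: AVSet, add: Adder) -> AVSet:
    """Every av of x and y; avs sharing a head string are added."""
    result = dict(x)
    for head, groups in y.items():
        result[head] = add(result[head], groups) if head in result else groups
    return result


def intersection(x: AVSet, y: AVSet, add: Adder) -> AVSet:
    """Only head strings present in both operands; their avs are added."""
    return {head: add(x[head], y[head]) for head in x if head in y}


def difference(x: AVSet, y: AVSet) -> AVSet:
    """The avs of x whose head string does not appear in y."""
    return {head: groups for head, groups in x.items() if head not in y}


def having(x: AVSet, test: Callable[[Dict[str, FrozenSet[str]]], bool]) -> AVSet:
    """Hypothetical filter: keep the avs with at least one group passing test."""
    return {h: gs for h, gs in x.items() if any(test(dict(g)) for g in gs)}


if __name__ == "__main__":
    x = {"1": frozenset({group({"A": "a"})}), "2": frozenset({group({"B": "b1"})})}
    y = {"2": frozenset({group({"B": "b2"})})}
    both = intersection(x, y, dist_add)
    print(sorted(union(x, y, set_add)), sorted(both), sorted(difference(x, y)))
    # ['1', '2'] ['2'] ['1']
    print(sorted(having(both, lambda g: len(g.get("B", ())) == 2)))  # ['2']
\end{verbatim} \end{footnotesize}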