From: Ferruccio Guidi
Date: Mon, 8 Dec 2003 16:22:24 +0000 (+0000)
Subject: mathql documentation for version 4
X-Git-Tag: V_0_2_2~16
X-Git-Url: http://matita.cs.unibo.it/gitweb/?a=commitdiff_plain;h=e24d6f693e7fef6120b0730b5cc0564ba701f530;p=helm.git

mathql documentation for version 4
---

diff --git a/helm/mathql/doc/.cvsignore b/helm/mathql/doc/.cvsignore
new file mode 100644
index 000000000..a1d23648f
--- /dev/null
+++ b/helm/mathql/doc/.cvsignore
@@ -0,0 +1 @@
+*.dvi *.aux *.log *.ps
diff --git a/helm/mathql/doc/mathql.tex b/helm/mathql/doc/mathql.tex
new file mode 100644
index 000000000..30646a98f
--- /dev/null
+++ b/helm/mathql/doc/mathql.tex
@@ -0,0 +1,141 @@
+\documentclass[10pt]{article}
+
+% \usepackage{fguidi}
+\addtolength{\textheight}{2.5cm}
+\addtolength{\oddsidemargin}{-1.0cm}
+\addtolength{\evensidemargin}{-1.0cm}
+\addtolength{\textwidth}{2.0cm}
+\addtolength{\topmargin}{-1.0cm}
+
+\newcommand{\MathQL}{\textsc{mathql-1}}
+\newcommand{\RDF}{\textsc{rdf}}
+\newcommand{\RDFS}{\textsc{rdf schema}}
+\newcommand{\HELM}{\textsc{helm}}
+\newcommand{\POSIX}{\textsc{posix}}
+\newcommand{\XML}{\textsc{xml}}
+\newcommand{\CAML}{\textsc{caml}}
+\newcommand{\SQL}{\textsc{sql}}
+\newcommand{\PostgreSQL}{\textsc{postgresql}}
+\newcommand{\Galax}{\textsc{galax}}
+\newcommand{\XQuery}{\textsc{xquery}}
+
+\title{MathQL-1.4}
+\author{Ferruccio Guidi}
+
+\begin{document}
+
+\maketitle
+
+\section{Overview}
+
+{\MathQL} is a query language for {\RDF} databases, developed in the context
+of the {\HELM} project. Its name suggests that it is supposed to be the first
+of a group of query languages for retrieving information from distributed
+digital libraries of formal mathematical knowledge, but no other language of
+this group has been implemented yet, and {\MathQL} itself is not
+Mathematics-oriented, so the name is a bit misleading.
+
+{\MathQL} is carefully designed to have the following features:
+
+\begin{enumerate}
+
+\item
+compliance with the main requirements stated by the {\RDF} community;
+
+\item
+native support for post-processing the query results;
+
+\item
+{\HELM}-independent implementation of the query engine.
+
+\end{enumerate}
+
+We will briefly analyze these features in the remaining part of this section.
+
+\subsubsection*{The main requirements from the RDF community}
+
+As a query language for {\RDF} databases, {\MathQL} has a well-conceived
+semantics, defined in terms of an abstract metadata model, according to which
+queries return exhaustive solutions.
+The language provides facilities for imposing query constraints based on
+{\RDFS} and for the traversal of compound values of properties.
+It also provides a full set of Boolean operators to compose the query
+constraints and facilities for selecting resources or literals by means of
+{\POSIX} regular expressions.
+Moreover, the language allows customizing the query results by specifying
+which part of a solution should be preserved, and supports a
+machine-processable {\XML} syntax as well as a human-readable textual syntax
+to achieve the best usability.
+
+The two syntaxes concern both queries and results, making {\MathQL} usable in
+a distributed environment where query engines are implemented as stand-alone
+components. This is because in this setting both queries and query results
+must be exchanged between the system's components and thus need to be encoded
+in a clearly defined format.
+
+{\MathQL} provides graph-oriented access to the {\RDF} metadata, based on
+tree instantiation.
+This approach has the advantage of providing an abstraction over the
+concrete representation of the {\RDF} database (which can consist of {\RDF}
+triples and {\XML} files simultaneously) at the user level, and this is
+definitely desirable, especially in a distributed context.
+
+{\MathQL} query results are meant to capture the structure of trees coming
+from an {\RDF} graph, and for this purpose a standard $1$- or $2$-dimensional
+organization (as provided by most {\RDF}-oriented query languages) is not
+satisfactory. Here the {\MathQL} approach is to use a $4$-dimensional
+organization for its query results.
+
+\subsubsection*{Post-processing and code generation capabilities}
+
+The {\MathQL} query engine, which is written in {\CAML} for easy integration
+with the {\HELM} software, provides two ways of processing the query results:
+on the {\CAML} side and natively.
+
+On the {\CAML} side, an application issues a query by calling a function of
+the engine and manipulates the result either by operating directly on its
+internal representation (which is placed in the public scope) or by using a
+set of dedicated functions specifically designed to manage the query results.
+This set of functions includes a basic library but is extensible depending
+on the {\CAML} modules included in the engine at compile time. In this way
+an expert user can write a {\CAML} module with new dedicated functions and
+include it in the engine by recompiling it.
+
+{\MathQL} supports native post-processing of the query results, including the
+standard constructions of an imperative Turing-complete programming language,
+whose aim is definitely not that of being all-purpose (the user can work on
+the {\CAML} side for that), but of being optimized for the management of the
+query results.
+In this context an {\SQL}-like ``select-from-where'' construction is provided
+(as required by the {\RDF} community) as well as a mechanism for accessing the
+dedicated post-processing functions available to the engine.
+
+Moreover, the language provides access to an extensible set of code-generating
+functions (also available on the {\CAML} side) that the expert user can define
+by writing suitable {\CAML} modules for the engine.
+Note that the generated code is always {\MathQL} code.
+
+The code generation features allow complex queries to be built incrementally
+and automatically, as required by the {\HELM} project.
+Using the native programming language, instead, queries can include the
+post-processing algorithms on their results, so the querying code and the
+subsequent processing code (if needed) are treated together as a
+self-contained object that can be computed by a single engine.
+In this sense the alternative of performing a complex query on a remote
+component by issuing some {\MathQL} querying code followed by some {\CAML}
+post-processing code is infeasible in a distributed context.
+
+\subsubsection*{Physical organization of the RDF database}
+
+The implementation of the {\MathQL} query engine does not depend on any
+software developed within the {\HELM} project, nor does it depend on the
+{\HELM} metadata model in any way.
+
+However, the engine does make a few assumptions about the way metadata are
+physically organized and needs some user-provided knowledge about the concrete
+metadata representation.
+Metadata stored as {\RDF} triples are accessed through a {\PostgreSQL} engine,
+while metadata stored as {\RDF}/{\XML} files are accessed through a {\Galax}
+{\XQuery} engine.
+
+\end{document}