diff --git a/helm/mathql/doc/mathql_overview.tex b/helm/mathql/doc/mathql_overview.tex
new file mode 100644
index 0000000..d511c7a
--- /dev/null
+++ b/helm/mathql/doc/mathql_overview.tex
@@ -0,0 +1,156 @@
+\section{Overview}
+
+{\MathQL}%
+\footnote{See \URI{http://helm.cs.unibo.it/mathql}.}
+is a query language for {\RDF} \cite{RDF,RDFS} databases, developed in the
+context of the {\HELM}%
+\footnote{See \URI{http://helm.cs.unibo.it}.} 
+project \cite{APSCGS03}.
+Its name suggests that it is meant to be the first of a family of query
+languages for retrieving information from distributed digital libraries of
+formal mathematical knowledge. However, no other language of this proposal
+has been implemented yet, and {\MathQL} itself is not oriented to
+mathematics, so the name is somewhat misleading.
+
+\xcomment {
+
+The MathQL proposal arises within the HELM project with the final aim of
+providing a set of query languages for digital libraries of formalized
+mathematical resources, capable of expressing content-aware requests.
+
+This proposal has several domains of application and may be useful for
+database or on-line libraries reviewers, for proof assistants or
+proof-checking systems, and also for learning environments because these
+applications require features for classifying, searching and browsing
+mathematical information in a semantically meaningful way.
+
+As the most natural way to handle content information about a resource is
+by means of metadata, our first task is providing a query language that we
+call MathQL level 1 (or {\MathQL} for short), suitable for a metadata
+framework.
+Other languages to be defined in the context of the MathQL proposal may be
+suitable for queries about the semantic structure of mathematical data:
+this includes content-based pattern-matching (MathQL-2) and possibly other
+forms of formal matching involving for instance isomorphism, unification and
+$\delta$-expansion%
+\footnote{by $\delta$-expansion we mean the expansion of definitions.}
+(MathQL-3).
+
+In this perspective, the role of a query on metadata can be that of producing a
+filtered knowledge base containing relevant information for subsequent queries
+of other kinds.
+
+}
+
+{\MathQL} is designed to overcome two limitations that seem to characterize
+several implementations and proposals of current {\RDF}-oriented query
+languages: insufficient compliance with the most requested features, and
+poor attention to the management of query results.
+Accordingly, the language has the following design goals:
+
+\begin{enumerate}
+
+\item
+compliance with the main requirements stated by the {\RDF} community;
+
+\item
+native support for post-processing the query results;
+
+\item
+{\HELM}-independent implementation of the query engine. 
+
+\end{enumerate}
+
+The rest of this section briefly discusses each of these goals.
+
+\subsubsection*{The main requirements from the RDF community}
+
+As a query language for {\RDF} databases, {\MathQL} has a well-conceived
+semantics, defined in terms of an abstract metadata model, according to which
+queries return exhaustive solutions.
+The language provides facilities for imposing query constraints based on
+{\RDFS} \cite{RDFS} and for the traversal of compound values of properties.
+It also provides a full set of Boolean operators to compose the query
+constraints and facilities for selecting resources or literals by means of
+{\POSIX} regular expressions.
+Moreover, the language allows users to customize query results by specifying
+which part of a solution should be preserved, and it supports both a
+machine-processable {\XML} \cite{XML} syntax and a human-readable textual
+syntax, to achieve the best usability.
+Both syntaxes cover queries as well as results, which makes {\MathQL} usable in
+a distributed environment where query engines are implemented as stand-alone
+components: in such a setting, queries and query results must be exchanged
+among the components of the system and must therefore be encoded in a clearly
+defined format.
+
+{\MathQL} provides graph-oriented access to the {\RDF} metadata, based on
+tree instantiation.
+This approach has the advantage of abstracting, at the user level, over the
+concrete representation of the {\RDF} database (which may consist of {\RDF}
+triples and {\XML} files at the same time), a property that is especially
+desirable in a distributed context.
+
+{\MathQL} query results are meant to capture the structure of trees extracted
+from an {\RDF} graph, and for this purpose the standard $1$- or $2$-dimensional
+organization provided by most {\RDF}-oriented query languages is not
+satisfactory. {\MathQL} therefore uses a $4$-dimensional organization for its
+query results.
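+
+To make the idea of a $4$-dimensional result concrete, the following {\CAML}
+fragment gives one possible rendering of it. It is only a sketch under our own
+assumptions: we assume that a result is a set of resources, that each resource
+carries a set of attribute groups, that each group is a set of attributes, and
+that each attribute has a set of values. All type and field names, as well as
+the sample data, are hypothetical and need not match the engine's internal
+representation. The function \texttt{flatten} shows what a $2$-dimensional,
+table-like view loses.
+
+\begin{verbatim}
+(* Hypothetical 4-level result structure; names are illustrative only. *)
+
+(* An attribute: a property name with its set of values (4th level). *)
+type attribute = { name : string; values : string list }
+
+(* A group: the attributes describing one subtree (3rd level). *)
+type group = attribute list
+
+(* A resource: its URI with its set of groups (2nd level). *)
+type resource = { uri : string; groups : group list }
+
+(* A query result: a set of resources (1st level). *)
+type query_result = resource list
+
+(* Flatten to (uri, attribute, value) rows, as a 2-dimensional table
+   would: the information on which values belong together is lost. *)
+let flatten (r : query_result) : (string * string * string) list =
+  List.concat_map
+    (fun res ->
+      List.concat_map
+        (fun (g : group) ->
+          List.concat_map
+            (fun a ->
+              List.map (fun v -> (res.uri, a.name, v)) a.values)
+            g)
+        res.groups)
+    r
+
+(* Invented sample data: one resource with two groups. *)
+let example : query_result =
+  [ { uri = "http://example.org/thm1";
+      groups =
+        [ [ { name = "position"; values = [ "conclusion" ] };
+            { name = "depth"; values = [ "0" ] } ];
+          [ { name = "position"; values = [ "hypothesis" ] };
+            { name = "depth"; values = [ "1" ] } ] ] } ]
+\end{verbatim}
+
+On the sample data, \texttt{flatten} yields four rows, from which one can no
+longer tell whether depth~0 belongs with the conclusion or with the hypothesis.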
+
+\subsubsection*{Post-processing and code generation capabilities}
+
+The {\MathQL} query engine, which is written in {\CAML}%
+\footnote{See \URI{http://caml.inria.fr}.}
+for easy integration with the {\HELM} software, provides two ways of
+processing query results: on the {\CAML} side and natively, within
+{\MathQL} itself.
+
+On the {\CAML} side, an application issues a query by calling a function of the
+engine and manipulates the result either by operating directly on its internal
+representation (through a low-level interface) or by using a set of dedicated
+functions specifically designed to manage query results.
+This set of functions includes a basic library and can be extended with the
+{\CAML} modules included in the engine at compile time: an expert user can
+write a {\CAML} module providing new dedicated functions and include it in
+the engine by recompiling it.
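+
+A minimal {\CAML} sketch of such an extensible library follows. It assumes,
+for illustration only, that dedicated functions are kept in a table indexed by
+name and that each module (the basic library or a user extension) registers
+its functions when it is initialised. The names \texttt{register} and
+\texttt{call}, the function names, and the modelling of a result as a list of
+strings are hypothetical and are not the engine's actual interface. The code
+assumes OCaml 4.13 or later.
+
+\begin{verbatim}
+(* Hypothetical sketch: a name-indexed table of dedicated functions. *)
+
+(* For simplicity, a query result is modelled as a list of strings. *)
+type value = string list
+
+let registry : (string, value list -> value) Hashtbl.t =
+  Hashtbl.create 16
+
+(* Add or replace a dedicated function under the given name. *)
+let register name f = Hashtbl.replace registry name f
+
+(* Invoke a dedicated function by name, failing if it is unknown. *)
+let call name args =
+  match Hashtbl.find_opt registry name with
+  | Some f -> f args
+  | None -> failwith ("unknown function: " ^ name)
+
+(* Basic library: registered when this module is initialised. *)
+module Basic = struct
+  let () =
+    register "union" (fun args ->
+      List.sort_uniq compare (List.concat args));
+    register "count" (function
+      | [ xs ] -> [ string_of_int (List.length xs) ]
+      | _ -> failwith "count: expected one set")
+end
+
+(* An expert user's extension: linking it into the engine registers
+   a new function as a side effect of module initialisation. *)
+module User_ext = struct
+  let () =
+    register "prefix" (function
+      | [ [ p ]; xs ] -> List.filter (String.starts_with ~prefix:p) xs
+      | _ -> failwith "prefix: expected a prefix and a set")
+end
+
+(* Example: evaluates to ["a"; "b"; "c"]. *)
+let _ = call "union" [ [ "b"; "a" ]; [ "c"; "a" ] ]
+\end{verbatim}
+
+With registration at initialisation time, recompiling the engine together with
+an extra module is enough to extend the library. If such modules are packed in
+a library archive, the {\CAML} linker drops units that nothing else references,
+so the \texttt{-linkall} option may be needed for their registration code to run.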
+
+{\MathQL} supports native post-processing of query results by including the
+standard constructs of an imperative, Turing-complete programming language.
+This language is not meant to be general-purpose (the user can work on the
+{\CAML} side for that) but to be optimized for managing query results.
+In this context, {\MathQL} provides an {\SQL}-like ``select-from-where''
+construct (as required by the {\RDF} community), as well as a mechanism for
+invoking the dedicated post-processing functions available to the engine.
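+
+The usual {\SQL}-like reading of a ``select-from-where'' construct can be
+sketched in {\CAML} as follows: each element of the \texttt{from} set is kept
+when the \texttt{where} condition holds, and the \texttt{select} expression is
+then computed on it. This is a generic illustration of the semantics under our
+own assumptions (a single \texttt{from} set, results as {\CAML} lists). It does
+not reproduce the concrete syntax of {\MathQL}, and the function name and data
+are hypothetical.
+
+\begin{verbatim}
+(* Generic select-from-where over a single set: keep the elements of
+   [from] satisfying [where], and map [select] over them, in order. *)
+let select_from_where ~(select : 'a -> 'b) ~(from : 'a list)
+    ~(where : 'a -> bool) : 'b list =
+  List.filter_map
+    (fun x -> if where x then Some (select x) else None)
+    from
+
+(* Invented data: resource names paired with a number of hypotheses. *)
+let resources = [ ("thm1", 0); ("thm2", 3); ("lem1", 1) ]
+
+(* Names of the resources with at least one hypothesis;
+   evaluates to ["thm2"; "lem1"]. *)
+let with_hyps =
+  select_from_where ~select:fst ~from:resources
+    ~where:(fun (_, h) -> h > 0)
+\end{verbatim}
+
+Filtering before mapping avoids computing \texttt{select} on discarded
+elements. In {\SQL}, a \texttt{from} clause ranging over several sets would
+instead iterate over their Cartesian product.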
+
+Moreover, the language provides access to an extensible set of
+code-generating functions (also available on the {\CAML} side), which an
+expert user can define by writing suitable {\CAML} modules for the engine.
+The generated code is always {\MathQL} code.
+
+The code-generation features make it possible to build complex queries
+incrementally and automatically, as required by the {\HELM} project.
+The native programming language, on the other hand, lets a query include the
+post-processing algorithms for its own results, so that the querying code and
+the subsequent processing code (if any) form a single self-contained object
+that can be evaluated by one engine.
+This matters in a distributed context, where performing a complex query on a
+remote component by sending it some {\MathQL} querying code followed by some
+{\CAML} post-processing code is not feasible in practice, since the remote
+engine can evaluate {\MathQL} code but not arbitrary {\CAML} code.
+
+\subsubsection*{Physical organization of the RDF database}
+
+The implementation of the {\MathQL} query engine does not depend on any
+software developed within the {\HELM} project, nor does it depend on the
+{\HELM} metadata model in any way.
+
+However, the engine does make a few assumptions about the way metadata are
+physically organized, and it needs some user-provided knowledge about the
+concrete metadata representation.
+Metadata stored as {\RDF} triples are accessed through a {\MySQL}%
+\footnote{See \URI{http://www.mysql.com}.}
+or a {\PostgreSQL}%
+\footnote{See \URI{http://www.postgresql.org}.}
+engine, while metadata stored as {\RDF}/{\XML} files are accessed through a
+{\Galax}%
+\footnote{See \URI{http://db.bell-labs.com/galax/}.}
+{\XQuery} \cite{XQuery} engine.