\runningtitle{The \MATITA{} proof assistant}
\runningauthor{Asperti, Sacerdoti Coen, Tassi, Zacchiroli}
- \begin{motto}
- ``We are nearly bug-free'' -- \emph{CSC, Oct 2005}
- \end{motto}
+%\begin{motto}
+% ``We are nearly bug-free'' -- \emph{CSC, Oct 2005}
+% \end{motto}
\begin{abstract}
\TODO{write abstract}
language~\cite{remathematization};
\item an innovative, \MATHML-compliant rendering widget~\cite{padovani}
- for the \GTK{} graphical environment,\footnote{\url{http://www.gtk.org/}}
+ for the \GTK{}\footnote{\url{http://www.gtk.org}} graphical environment,
supporting high-quality bidimensional
rendering, and semantic selection, i.e. the possibility to select semantically
meaningful rendering expressions, and to paste the corresponding content into
other systems
whose proof language is procedural. Traditionally, in a procedural system
the user interacts only with the \emph{script}, while proof terms are internal
-records kept by the system. On the contrary, in \MATITA{} proof terms are
-praised as declarative versions of the proof. Playing that role, they are the
-primary mean of communication of proofs (once rendered to natural language
-for human audiences).
-
-The user interfaces now adopted by all the proof assistants based on a
-procedural proof language have been inspired by the CtCoq pioneering
+records kept by the system.
+In \MATITA{}, which has been conceived for a potentially distributed
+library, proof terms are also meant as the primary representation for
+storage and communication with other systems (e.g. \COQ).
+%On the contrary, in \MATITA{} proof terms are
+%praised as declarative versions of the proof. Playing that role, they are the
+%primary mean of communication of proofs (once rendered to natural language
+%for human audiences).
+
+All the user interfaces currently adopted by proof assistants based on a
+procedural proof language have been influenced by the pioneering CtCoq
system~\cite{ctcoq1}. One successful incarnation of the ideas introduced
by CtCoq is the Proof General generic interface~\cite{proofgeneral},
that has set a sort of
-standard way to interact with the system. Several procedural proof assistants
-have either adopted or cloned Proof General as their main user interface.
-The authoring interface of \MATITA{} is a clone of the Proof General interface.
+standard way to interact with the system.
+%Several procedural proof assistants
+%have either adopted or cloned Proof General as their main user interface.
+The authoring interface of \MATITA{} essentially offers the same
+functionalities as the Proof General interface or CoqIde.
On the contrary, the interface to interact with the library is rather
innovative and directly inspired by the Web interfaces to our Web servers.
Calculus of (Co)Inductive Constructions), the same implementation
language (\OCAML\footnote{\url{http://caml.inria.fr/}}),
and the same (procedural, script based) authoring philosophy.
-However, the analogy essentially stops here and no code is shared
-between the two systems.
+However, the analogy essentially stops here; in particular, no code is shared
+between the two systems and the algorithms are often different.
In a sense, we like to think of \MATITA{} as the way \COQ{} would
look if entirely rewritten from scratch: just to give an
historically. \COQ~\cite{CoqArt} is a quite old
system whose development started 20 years ago. Since then,
several developers have taken over the code and several new research ideas
-that were not considered in the original architecture have been experimented
+and optimizations that were not considered in the original architecture
+have been experimented
and integrated in the system. Moreover, there exist many developments
for \COQ{} that require backward compatibility between successive releases;
since many central functionalities of a proof assistant are based on heuristics
or arbitrary choices to overcome undecidability (e.g. for higher order
unification), changing these functionalities while maintaining backward compatibility
-is very difficult. Finally, the code of \COQ{} has been greatly optimized
-over the years; optimization reduces maintainability and rises the complexity
-of the code.
+is very difficult.
+%Finally, the code of \COQ{} has been greatly optimized
+%over the years; optimization reduces maintainability and rises the complexity
+%of the code.
In writing \MATITA{} we have not been hindered by backward compatibility and
we have taken advantage of the research results and experiences previously
depend on a restricted set of \components.
Only the proof assistant \MATITA{} and the \WHELP{} search engine are
-applications meant to be used directly by the user. All the other applications
-are Web services developed in the \HELM{} and \MOWGLI{} projects and already
-described elsewhere. In particular:
+applications meant to be used directly by the user; all the other applications
+are Web services developed in the \HELM{} and \MOWGLI{} projects. The
+following applications have already been described elsewhere:
\begin{itemize}
\item The \emph{\GETTER}~\cite{zack-master} is a Web service to
The Proof Checker application is the Web service that provides an interface
to the \texttt{cic\_proof\_checking} \component.
- We use metadata and a sort of crawler to index the mathematical concepts
+ We use metadata to index the mathematical concepts
in the distributed library. We are interested in retrieving a concept
by matching, instantiation or generalization of a user or system provided
mathematical formula. Thus we need to collect metadata over the fully
The \texttt{whelp} \component{} implements a search engine that performs
approximated queries by matching/instantiation/generalization. The queries
operate only on the metadata and do not involve any actual matching
- (see the \texttt{cic\_unification} \component in
+ (see the \texttt{cic\_unification} \component{} in
Sect.~\ref{sec:partiallyintro}). Not performing any actual matching,
a query only returns a complete and hopefully small set of matching
candidates. The process that has issued the query is responsible for
\label{sec:partiallyintro}
\emph{Partially specified terms} are CIC terms where subterms can be omitted.
-Omitted subterms can bear no information at all or they may be associated to
-a sequent. The formers are called \emph{implicit terms} and they occur only
-linearly. The latters may occur multiple times and are called
-\emph{metavariables}~\cite{geuvers-jojgov,munoz}.
-An \emph{explicit substitution} is applied to each
-occurrence of a metavariable. A metavariable stands for a term whose type is
+Omitted subterms come in two flavours:
+\emph{implicit terms} that do not require a declaration, but can only occur
+linearly; and \emph{metavariables}~\cite{geuvers-jojgov,munoz} whose declaration
+associates with them a sequent and whose occurrences are coupled with an
+explicit substitution.
+A metavariable stands for a term whose type is
given by the conclusion of the sequent. The term must be closed in the
context that is given by the ordered list of hypotheses of the sequent.
The explicit substitution instantiates every hypothesis with an actual
metavariables and that can introduce
\emph{implicit coercions}~\cite{barthe95implicit} to make a
partially specified term well-typed. The refiner of \MATITA{} is implemented
-in the \texttt{cic\_unification} \component. As the type checker is based on
-the conversion check, the refiner is based on \emph{unification}~\cite{strecker}
+in the \texttt{cic\_unification} \component. In the same way as the
+type checker is based on
+convertibility, the refiner is based on \emph{unification}~\cite{strecker}
that is a procedure that makes two partially specified terms convertible by
-instantiating as few as possible metavariables that occur in them.
+instantiating as few of the metavariables occurring in them as possible.
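To fix ideas, here is a toy Python sketch of unification over terms with metavariables (not Matita's actual OCaml code: the real procedure works up to CIC conversion and handles explicit substitutions, occur checks and much more). It is purely syntactic and first-order, and instantiates a metavariable only when the comparison is otherwise stuck; all names below are invented.

```python
# Toy sketch, NOT Matita's actual code: terms with metavariables and a
# naive first-order unification that instantiates metavariables only
# when needed. Occur check and explicit substitutions are omitted.

class Const:
    def __init__(self, name): self.name = name

class App:
    def __init__(self, fun, arg): self.fun, self.arg = fun, arg

class Meta:
    def __init__(self, n): self.n = n

def apply_subst(t, s):
    """Apply the metavariable instantiation s (index -> term) to t."""
    if isinstance(t, Meta):
        return apply_subst(s[t.n], s) if t.n in s else t
    if isinstance(t, App):
        return App(apply_subst(t.fun, s), apply_subst(t.arg, s))
    return t

def unify(t1, t2, s=None):
    """Return an instantiation making t1 and t2 equal, or None."""
    s = dict(s or {})
    t1, t2 = apply_subst(t1, s), apply_subst(t2, s)
    if isinstance(t1, Meta):
        s[t1.n] = t2
        return s
    if isinstance(t2, Meta):
        s[t2.n] = t1
        return s
    if isinstance(t1, Const) and isinstance(t2, Const):
        return s if t1.name == t2.name else None
    if isinstance(t1, App) and isinstance(t2, App):
        s = unify(t1.fun, t2.fun, s)
        return None if s is None else unify(t1.arg, t2.arg, s)
    return None

# Unifying (S ?1) with (S O) instantiates only ?1, with O.
res = unify(App(Const("S"), Meta(1)), App(Const("S"), Const("O")))
```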
Since terms are used in CIC to represent proofs, correct incomplete
proofs are represented by refinable partially specified terms. The metavariables
\label{sec:contentintro}
The language used to communicate proofs and especially formulae with the
-user does not only needs to be extendible and accommodate the usual mathematical
-notation. It must also reflect the comfortable degree of imprecision and
-ambiguity that the mathematical language provides.
+user must not only be extendible and accommodate the usual mathematical
+notation. It must also reflect the comfortable and suggestive degree of
+notational abuse so typical of the mathematical language.
For instance, it is common practice in mathematics to speak of a generic
equality that can be used to compare any two terms. However, it is well known
those of addition over the unary representation. And addition over two natural
numbers is definitely different from addition over two real numbers.
-Formalized mathematics cannot hide these differences and obliges the user to be
+Formalized mathematics cannot hide these differences and forces the user to be
very precise about the types he is using and their representation. However,
to communicate formulae with the user and with external tools, it seems good
practice to stick to the usual imprecise mathematical ontology. In the
the \emph{content level}~\cite{adams} representation of formulae.
In \MATITA{} we provide translations from partially specified terms
-to content level terms and the other way around. The first translation can also
-be applied to fully specified terms since a fully specified term is a special
-case of partially specified term where no metavariable or implicit term occurs.
-
-The translation from partially specified terms to content level terms must
+to content level terms and the other way around.
+The former translation must
discriminate between terms used to represent proofs and terms used to represent
-formulae. The firsts are translated to a content level representation of
-proof steps that can in turn easily be rendered in natural language
-using techniques inspired by~\cite{natural,YANNTHESIS}. The representation
+formulae. Using techniques inspired by~\cite{natural,YANNTHESIS}, the former
+are translated to a content level representation of
+proof steps that can in turn easily be rendered in natural language.
+The representation
adopted has greatly influenced the OMDoc~\cite{omdoc} proof format that is now
isomorphic to it. Terms that represent formulae are translated to \MATHML{}
Content formulae. \MATHML{} Content~\cite{mathml} is a W3C standard
The translation to content level is implemented in the
\texttt{acic\_content} \component. Its input are \emph{annotated partially
-specified terms}, that are maximally unshared
+specified terms}, which are
partially specified terms enriched with additional typing information for each
subterm. This information is used to discriminate between terms that represent
proofs and terms that represent formulae. Part of it is also stored at the
content level since it is required to generate the natural language rendering
-of proofs. The terms need to be maximally unshared (i.e. they must be a tree
-and not a DAG). The reason is that to different occurrences of a subterm
-we need to associate different typing information.
-This association is made easier when the term is represented as a tree since
-it is possible to label each node with an unique identifier and associate
-the typing information using a map on the identifiers.
-The \texttt{cic\_acic} \component{} unshares and annotates terms. It is used
+of proofs.
+The \texttt{cic\_acic} \component{} annotates terms. It is used
by the \texttt{library} \component{} since fully specified terms are stored
in the library in their annotated form.
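One possible annotation scheme can be sketched as follows (an illustrative Python sketch, not the actual \texttt{cic\_acic} code): every subterm receives a fresh identifier, and the typing information is stored in a side map keyed by identifiers, so that distinct occurrences of equal subterms can carry distinct information. The term shape and the typing oracle below are hypothetical.

```python
# Illustrative sketch: annotate every subterm with a fresh identifier
# and record its typing information in a side map (id -> type).
import itertools

def annotate(term, type_of, typing, ids=None):
    """`term` is a nested tuple (head, children...); `type_of` is a
    caller-supplied typing oracle; `typing` is filled as a side effect."""
    if ids is None:
        ids = itertools.count()
    i = next(ids)
    typing[i] = type_of(term)
    head, *children = term
    return (i, head, [annotate(c, type_of, typing, ids) for c in children])

# Toy typing oracle: everything in this tiny example is a natural number.
typing = {}
annotated = annotate(("S", ("O",)), lambda t: "nat", typing)
```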
in a command as functions from a context to a partially refined term. The
function is obtained by partially applying our disambiguation function to
the content level term to be disambiguated. Our solution should be compared with
-the one adopted in the \COQ{} system, where ambiguity is only relative to
-De Brujin indexes.
+the one adopted in the \COQ{} system (where ambiguity is only relative to
+De Bruijn indexes).
In \COQ, variables can be bound either by name or by position. A term
occurring in a command has all its variables bound by name to avoid the need for
a context during disambiguation. This makes more complex every
These mechanisms will be further discussed in Sect.~\ref{sec:disambiguation}.
Finally, the \texttt{grafite\_parser} \component{} implements a parser for
-the concrete syntax of the commands of \MATITA. The parser process a stream
+the concrete syntax of the commands of \MATITA. The parser processes a stream
of characters and returns a stream of abstract syntax trees (the ones
defined by the \texttt{grafite} component and whose semantics is given
by \texttt{grafite\_engine}). When the parser meets a command that changes
\label{sec:library}
A proof assistant provides both an interface to interact with its library and
-an \emph{authoring} interface to develop new proofs and theories. According
+an authoring interface to develop new proofs and theories. According
to its historical origins, \MATITA{} strives to provide innovative
functionalities for the interaction with the library. It is more traditional
in its script based authoring interface. In the remaining part of the paper we
set, and the searching facilities built on top of them --- collected
in the so called \WHELP{} search engine --- have been
extensively described in~\cite{whelp}. Let us just recall here that
-the \WHELP{} metadata model is essentially based a single ternary relation
+the \WHELP{} metadata model is essentially based on a single ternary relation
\REF{p}{s}{t} stating that a concept $s$ refers to a concept $t$ at a
given position $p$, where the position specifies the place of the
occurrence of $t$ inside $s$ (we currently work with a fixed set of
indexed, and becomes immediately visible in the library. Several
interesting and innovative features of \MATITA{} described in the following
sections rely in a direct or indirect way on its metadata system and
-the search features. Here, we shall just recall some of its most
+the search functionalities. Here, we shall just recall some of its most
direct applications.
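To fix ideas, the ternary relation can be pictured as a set of triples queried by simple selection. The triples, concept names and position names below are invented for illustration; queries of this kind cheaply return a superset of candidates, leaving any actual matching to the caller.

```python
# Illustrative sketch: the metadata as a set of triples REF(p, s, t),
# read "s refers t at position p". All names here are hypothetical.

refs = {
    ("in-conclusion", "le_n", "le"),
    ("in-conclusion", "le_S", "le"),
    ("in-hypothesis", "le_S", "le"),
    ("in-conclusion", "plus_comm", "eq"),
}

def referring(t, p=None):
    """Concepts that refer t, optionally restricted to position p."""
    return {s for (pos, s, target) in refs
            if target == t and (p is None or pos == p)}

candidates = referring("le")
```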
A first, very simple but not negligible feature is the \emph{duplicate check}.
(based on the pattern matching functionality of \WHELP); if this is the case,
a warning is raised to the user. At present, the notion of equivalence
adopted by \MATITA{} is convertibility, but we may imagine to weaken it
-in the future, covering for instance isomorphisms.
+in the future, covering for instance isomorphisms~\cite{delahaye}.
Another useful \WHELP{} operation is \HINT; we may invoke this query
at any moment during the authoring of a proof, resulting in the list
Software applications that involve input of mathematical content should strive
to require of the user as little drift from informal mathematics as possible. We
believe this to be a fundamental aspect of such applications' user interfaces.
-Being that drift in general very large when inputing
-proofs~\cite{debrujinfactor}, in \MATITA{} we achieved good results for
-mathematical formulae which can be input using a \TeX-like encoding (the
-concrete syntax corresponding to presentation level terms) and are then
+The drift is still very large for
+proofs~\cite{debrujinfactor}, while better results may be achieved for
+mathematical formulae. In \MATITA{} formulae can be written using a
+\TeX-like syntax (corresponding to presentation level terms) and are then
translated (in multiple steps) to partially specified terms as sketched in
Sect.~\ref{sec:contentintro}.
some of its parameters, helps in keeping the input terse without sacrificing
expressiveness.
-\subsubsection{Disambiguation aliases}
+\subsubsection{Disambiguation preferences}
\label{sec:disambaliases}
Consider the following command that states a theorem over integer numbers:
with the same source. It is possible to record \emph{disambiguation
preferences} to select one of the aliases of an overloaded source.
-Preferences can be explicitly given in the script (using the
-misleading \texttt{alias} commands), but
+Preferences can be explicitly given in the script (using the \texttt{alias}
+command), but
are also implicitly added when a new concept is introduced (\emph{implicit
preferences}) or after a successful disambiguation that did not require
user interaction. Explicit preferences are added automatically by \MATITA{} to
the \texttt{include} command. With \texttt{include} it is possible to import
at once in the current session the set of preferences that was in effect
at the end of the execution of a given script.
+The inclusion mechanism is thus significantly different from that of other
+systems, where concepts are effectively loaded and made visible by inclusion;
+in \MATITA{} all concepts are always visible, and inclusion, which is
+optional, is only used to set up preferences.
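The mechanism can be sketched as follows (an illustrative Python sketch with invented names): preferences map an overloaded source, here a symbol, to the preferred one among its aliases, and \texttt{include} only imports the preferences in effect at the end of another script.

```python
# Illustrative sketch (hypothetical names): disambiguation preferences
# select one alias per overloaded symbol; `include` imports another
# script's final preferences without loading any concept.

aliases = {"<": {"nat_lt", "Z_lt"}, "+": {"nat_plus", "Z_plus"}}
prefs = {"<": "nat_lt"}  # preferences currently in effect

def set_preference(symbol, interpretation):
    """Explicit preference, as recorded by an alias command."""
    assert interpretation in aliases[symbol], "unknown alias"
    prefs[symbol] = interpretation

def include(final_prefs_of_script):
    """Import another script's final preferences; concepts themselves
    are always visible, so nothing else needs to be loaded."""
    prefs.update(final_prefs_of_script)

# Later, in a development of integer numbers:
include({"<": "Z_lt", "+": "Z_plus"})
```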
-Preferences can be changed. For instance, at the start of the development
+Preferences can be changed. For instance, at the beginning of the development
of integer numbers the preference for the symbol \OP{<} is likely
to be the one over natural numbers; sooner or later it will be set to the one
over integer numbers.
Consider, for example:
\begin{grafite}
theorem Zlt_mono:
- \forall x,y, k. x < y \to x < y + k.
+ \forall x,y,k. x < y \to x < y + k.
\end{grafite}
No refinable partially specified term corresponds to the preferences:
\OP{+} over natural numbers, \OP{<} over integer numbers. To overcome this
limitation we organized disambiguation in \emph{multiple passes}: when the
-disambiguator fails, disambiguation is tried again with a less strict set of
+disambiguator fails, disambiguation is tried again with a less restrictive set of
preferences.
Several disambiguation parameters can vary among passes. With respect to
\subsubsection{Operator instances}
\label{sec:disambinstances}
-Consider now the following theorem:
+Consider now the following theorem, where \texttt{pos} injects natural numbers
+into positive integers:
+
\begin{grafite}
theorem lt_to_Zlt_pos_pos:
\forall n,m: nat. n < m \to pos n < pos m.
this reason we always attempt a fresh instances pass only after attempting a
shared instances pass.
-\paragraph{One-shot preferences} Disambiguation preferences as seen so far are
-instance-independent. However, implicit preferences obtained as a result of a
-disambiguation pass which uses fresh instances ought to be instance-dependent.
-Informally, the set of preferences that can be respected by the disambiguator on
-the theorem above is: ``the first instance of the \OP{<} symbol is over natural
-numbers, while the second is on integer numbers''.
+%\paragraph{One-shot preferences} Disambiguation preferences as seen so far are
+%instance-independent. However, implicit preferences obtained as a result of a
+%disambiguation pass which uses fresh instances ought to be instance-dependent.
+%Informally, the set of preferences that can be respected by the disambiguator on
+%the theorem above is: ``the first instance of the \OP{<} symbol is over natural
+%numbers, while the second is on integer numbers''.
Implicit preferences resulting from a disambiguation pass that uses fresh
instances are instance-dependent: they are meaningful only for the term whose
-disambiguation generated it. For this reason we call them \emph{one-shot
+disambiguation generated them. For this reason we call them \emph{one-shot
preferences} and \MATITA{} does not use them to disambiguate further terms in
the script.
coercion from \texttt{nat} to \texttt{R} appears (assuming such a coercion has
been defined in advance).
-Implicit coercions are not always desirable. For example, in disambiguating
-\texttt{\TEXMACRO{forall} x: nat. n < n + 1} we do not want the term which uses
-2 coercions from \texttt{nat} to \texttt{R} around \OP{<} arguments to show up
-among the possible partially specified term choices. For this reason we always
-attempt a disambiguation pass which require the refiner not to use the coercions
-before attempting a coercion-enabled pass.
-
-The choice of whether implicit coercions are enabled or not interact with the
+Implicit coercions are not always desirable. For example, consider the term
+\texttt{\TEXMACRO{forall} x. x < x + 1} and assume that the preferences for \OP{<}
+and \OP{+} are over real numbers. The expected interpretation assigns the
+type \texttt{R} to \texttt{x}.
+However, if we had a coercion from natural to real numbers, an alternative
+interpretation would assign the type \texttt{nat} to \texttt{x}, inserting the
+coercion where needed. Clearly, the latter interpretation looks artificial and
+for this reason we enable coercions only in case of failure of previous
+attempts.
+
+The choice of whether implicit coercions are enabled or not interacts with the
choice about operator instances. Indeed, consider again
\texttt{lt\_to\_Zlt\_pos\_pos}, which can be disambiguated using fresh operator
-instances. In case there exists a coercion from natural numbers to (positive)
-integers (which indeed does), the
-theorem can be disambiguated using twice that coercion on the left hand side of
-the implication. The obtained partially specified term however would not
-probably be the expected one, being a theorem which proves a trivial
-implication.
-Motivated by this and similar examples we choose to always prefer fresh
-instances over implicit coercions, i.e. we always attempt disambiguation
-passes with fresh instances
-and no implicit coercions before attempting passes with implicit coercions.
+instances. Since there exists a coercion from natural numbers to positive
+integers (\texttt{pos} itself), the command
+can also be disambiguated as
+\texttt{\TEXMACRO{forall} n,m: nat. pos n < pos m \TEXMACRO{to} pos n < pos m}.
+This is not the expected interpretation;
+motivated by this and similar examples, we choose to always prefer fresh
+instances over implicit coercions.
\subsubsection{Disambiguation passes}
\label{sec:disambpasses}
According to the criteria described above, in \MATITA{} we perform the
disambiguation passes depicted in Tab.~\ref{tab:disambpasses}. In
-our experience that choice gives reasonable performance and minimizes the need
+our experience that choice gives reasonable performance and reduces the need
for user interaction during the disambiguation.
\begin{table}[ht]
- \caption{Disambiguation passes sequence\strut}
- \label{tab:disambpasses}
\begin{center}
\begin{tabular}{c|c|c|c}
\multicolumn{1}{p{1.5cm}|}{\centering\raisebox{-1.5ex}{\textbf{Pass}}}
- & \multicolumn{1}{p{3.1cm}|}{\centering\textbf{Preferences}}
+ & \multicolumn{1}{p{3.1cm}|}{\centering\raisebox{-1.5ex}{\textbf{Preferences}}}
& \multicolumn{1}{p{2.5cm}|}{\centering\textbf{Operator instances}}
& \multicolumn{1}{p{2.5cm}}{\centering\textbf{Implicit coercions}} \\
\hline
\PASS & Library-preferences & Fresh instances & Enabled
\end{tabular}
\end{center}
+ \caption{Disambiguation passes sequence\strut}
+ \label{tab:disambpasses}
\end{table}
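The policy can be pictured as follows (an illustrative Python sketch: the concrete pass list below is invented and only reflects the ordering criteria discussed above, shared instances before fresh ones and coercion-free passes before coercion-enabled ones; the real sequence is the one of Tab.~\ref{tab:disambpasses}, and \texttt{try\_pass} stands for the refiner-backed disambiguation attempt).

```python
# Illustrative sketch of multi-pass disambiguation: passes are tried in
# order, from the most restrictive to the most liberal, and the first
# pass yielding a refinable interpretation wins.

PASSES = [
    dict(fresh_instances=False, coercions=False),
    dict(fresh_instances=True,  coercions=False),
    dict(fresh_instances=False, coercions=True),
    dict(fresh_instances=True,  coercions=True),
]

def disambiguate(term, try_pass):
    """try_pass(term, **options) returns an interpretation or None."""
    for options in PASSES:
        interpretation = try_pass(term, **options)
        if interpretation is not None:
            return interpretation, options
    raise ValueError("disambiguation failed: user interaction needed")

# Stub standing for the refiner-backed attempt: here it succeeds only
# once fresh operator instances are allowed, without coercions.
def try_pass(term, fresh_instances, coercions):
    return "ok" if fresh_instances and not coercions else None

interpretation, chosen = disambiguate("pos n < pos m", try_pass)
```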
-\subsection{Generation and invalidation}
+\subsection{Invalidation and regeneration}
\label{sec:libmanagement}
%The aim of this section is to describe the way \MATITA{}
Invalidation (see Sect.~\ref{sec:library}) is implemented in two phases.
-The first one is the calculation of all the concepts that recursively
+The first one is the computation of all the concepts that recursively
depend on the ones we are invalidating. It can be performed
-using the relational database that stores the metadata.
+from the metadata stored in the relational database.
This technique is the same used by the \emph{Dependency Analyzer}
and is described in~\cite{zack-master}.
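This first phase amounts to a reachability computation over the reverse of the ``refers to'' relation. The following Python sketch is purely illustrative, with an in-memory relation in place of the relational database and invented concept names.

```python
# Illustrative sketch of invalidation, phase one: compute all concepts
# that recursively depend on the invalidated ones.
from collections import deque

# s -> set of concepts that s refers to (stands for the stored metadata)
refers = {"le_S": {"le"}, "le_antisym": {"le_S", "le"}, "plus_comm": set()}

def invalidation_set(targets):
    """All concepts that transitively refer to any of the targets,
    targets included."""
    rdeps = {}  # reverse dependencies: t -> concepts referring to t
    for s, ts in refers.items():
        for t in ts:
            rdeps.setdefault(t, set()).add(s)
    seen, todo = set(targets), deque(targets)
    while todo:
        t = todo.popleft()
        for s in rdeps.get(t, ()):
            if s not in seen:
                seen.add(s)
                todo.append(s)
    return seen

invalidated = invalidation_set({"le"})
```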