X-Git-Url: http://matita.cs.unibo.it/gitweb/?a=blobdiff_plain;f=helm%2Fpapers%2Fmatita%2Fmatita.tex;h=636860a822ff4ae33ef12a57c7a307d48d7c8f2f;hb=7703897d2b14dd101c763a4aa6d99b7cc95011d1;hp=b5f4c590fad705ffabfcfce56ccef94b9fbf0482;hpb=2a1845d8553ee753ad3b82901840900cba738a5e;p=helm.git
diff --git a/helm/papers/matita/matita.tex b/helm/papers/matita/matita.tex
index b5f4c590f..636860a82 100644
--- a/helm/papers/matita/matita.tex
+++ b/helm/papers/matita/matita.tex
@@ -37,6 +37,16 @@
 \newcommand{\TEXMACRO}[1]{\texttt{\char92 #1}}
 \newcommand{\UWOBO}{UWOBO}
 \newcommand{\WHELP}{Whelp}
+\newcommand{\DOT}{\ensuremath{\mbox{\textbf{.}}}}
+\newcommand{\SEMICOLON}{\ensuremath{\mbox{\textbf{;}}}}
+\newcommand{\BRANCH}{\ensuremath{\mbox{\textbf{[}}}}
+\newcommand{\SHIFT}{\ensuremath{\mbox{\textbf{\textbar}}}}
+\newcommand{\POS}[1]{\ensuremath{#1\mbox{\textbf{:}}}}
+\newcommand{\MERGE}{\ensuremath{\mbox{\textbf{]}}}}
+\newcommand{\FOCUS}[1]{\ensuremath{\mathtt{focus}~#1}}
+\newcommand{\UNFOCUS}{\ensuremath{\mathtt{unfocus}}}
+\newcommand{\SKIP}{\ensuremath{\mathtt{skip}}}
+\newcommand{\TACTIC}[1]{\ensuremath{\mathtt{tactic}~#1}}
 \definecolor{gray}{gray}{0.85} % 1 -> white; 0 -> black
 \newcommand{\NT}[1]{\langle\mathit{#1}\rangle}
@@ -51,12 +61,12 @@
 \end{center}}
 \newcounter{example}
-\newenvironment{example}{\stepcounter{example}\emph{Example} \arabic{example}.}
+\newenvironment{example}{\stepcounter{example}\vspace{0.5em}\noindent\emph{Example} \arabic{example}.}
 {}
 \newcommand{\ASSIGNEDTO}[1]{\textbf{Assigned to:} #1}
 \newcommand{\FILE}[1]{\texttt{#1}}
 % \newcommand{\NOTE}[1]{\ifodd \arabic{page} \else \hspace{-2cm}\fi\ednote{#1}}
-\newcommand{\NOTE}[1]{\ednote{x~\hspace{-10cm}y#1}{foo}}
+\newcommand{\NOTE}[1]{\ednote{#1}{foo}}
 \newcommand{\TODO}[1]{\textbf{TODO: #1}}
 \newsavebox{\tmpxyz}
@@ -322,12 +332,99 @@ library as a whole will be logically inconsistent.
 Logical inconsistency has never been a problem in the daily work of
 a mathematician. The mathematician simply imposes on himself a discipline to
 restrict himself to consistent subsets of the mathematical knowledge.
-However, in doing so he doesn't choose the subset in advance by forgetting
-the rest of his knowledge.
+However, in doing so he does not choose the subset in advance by forgetting
+the rest of his knowledge. On the contrary, he may proceed with a sort of
+top-down strategy: he may always inspect or use any part of his knowledge, but
+when he actually does so he should check recursively that no inconsistency is
+being exploited.
+
+Contrary to mathematical practice, the usual tendency in the world of
+assisted automation is that of building a logical environment (a consistent
+subset of the library) in a bottom-up way, checking the consistency of a
+new axiom or theorem as soon as it is added to the environment. No lemma
+or definition outside the environment can be used until it is added to the
+environment, after every notion it depends on. Moreover, very often the logical
+environment is the only part of the library that can be inspected,
+searched for lemmas, and exploited by automatic tactics.
+
+Moving notions one by one from the library to the environment is a costly
+operation, since it involves re-checking the correctness of each notion.
+As a consequence, mathematical notions are packaged into theories that must
+be added to the environment as a whole. However, the consistency problem is
+then simply lifted to the level of theories: theories must be imported in a
+bottom-up way and the system must check that no inconsistency arises.
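+
+By contrast, the top-down strategy of the mathematician can be realised on
+demand. The following OCaml fragment is a purely illustrative sketch (names
+and signatures are ours, not those of the actual \MATITA{} sources): a notion
+is checked, together with all the notions it depends on, only the first time
+it is actually used.
+\begin{verbatim}
+(* Illustrative sketch: on-demand, top-down construction of the
+   environment of checked notions.  "check" verifies a single
+   notion; "deps" returns the notions it directly depends on. *)
+module Env = Set.Make (String)
+
+let env = ref Env.empty   (* notions already known to be consistent *)
+
+let rec demand check deps uri =
+  if not (Env.mem uri !env) then begin
+    (* recursively make sure every dependency has been checked ... *)
+    List.iter (demand check deps) (deps uri);
+    (* ... then check the notion itself and record it *)
+    check uri;
+    env := Env.add uri !env
+  end
+\end{verbatim}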
+
+The practice of restricting the scope of the library to the logical
+environment is contrary to our commitment of being able to fully exploit as
+much of the library as possible at any given time. To reconcile consistency
+and visibility we have departed from the traditional implementation of the
+environment, allowing environments to be built on demand in a top-down way.
+The new implementation is based on a modified meta-theory that changes the
+convertibility, type checking, unification and refinement judgements.
+The modified meta-theory is fully described in \cite{libraryenvironments}.
+Here we just remark how a strong commitment on the way the user interacts
+with the library has led to modifications of the logical core of the proof
+assistant. This is evidence that breakthroughs in user interfaces
+and in the way the user interacts with the tools and with the library can
+be achieved only by means of strong architectural changes.
+
+\subsubsection{Accessibility}
+A large library that is completely in scope needs effective indexing and
+searching methods to make the user productive. Libraries of formal results
+are particularly critical since they hold a large percentage of technical
+lemmas that do not have a significant name and that must be retrieved
+using advanced methods based on matching, unification, generalization and
+instantiation.
+
+The efficiency of searching inside the library becomes critical
+when automatic tactics exploit the library during proof search. In this
+scenario the tactics must retrieve a set of candidates for backward or
+forward reasoning in a few milliseconds.
+
+Several proof assistants choose to use ad-hoc data structures,
+such as context trees, to index all the terms currently in scope. These
+data structures are especially designed to quickly retrieve terms up
+to matching, instantiation and generalization. All these data structures
+try to maximize the sharing of identical subterms, so that matching can be
+reduced to a visit of the tree (or DAG) that holds all the maximally shared
+terms together.
+
+Since the terms to be retrieved (or at least their initial prefix)
+are stored (actually ``melted'') in the data structure, these data structures
+must collect all the terms in a single location. In other words, adopting
+such data structures means centralizing the library.
+
+In the \MOWGLI{} project we have tried to follow an alternative approach
+that consists in keeping the library fully distributed and indexing it
+by means of spiders that collect metadata and store them in a database.
+The challenge is to collect the smallest possible amount of metadata that
+still provides enough information to approximate the matching
+operation. A matching operation is then performed in two steps. The first
+step is a query to the remote search engine that stores the metadata, in
+order to detect a (hopefully small) complete set of candidates that could
+match. Completeness means that no term that matches may be excluded from
+the set of candidates. The second step consists in retrieving from the
+distributed library all the candidates and attempting the actual matching.
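+
+The following OCaml fragment sketches the two-step operation. It is purely
+illustrative: \texttt{metadata\_query}, \texttt{retrieve} and \texttt{matches}
+stand respectively for the (possibly remote) metadata search engine, the
+accessor of the distributed library and the exact matching test; none of them
+is the actual \MATITA{} API.
+\begin{verbatim}
+(* Illustrative sketch of the two-step matching operation. *)
+let matching metadata_query retrieve matches goal =
+  (* Step 1: approximated but complete, answered by the metadata
+     database: it may return false positives, but it never misses
+     a term that actually matches. *)
+  let candidates = metadata_query goal in
+  (* Step 2: exact, performed on the retrieved candidates only. *)
+  List.filter (fun uri -> matches goal (retrieve uri)) candidates
+\end{verbatim}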
+
+In the last few years we have progressively improved this technique.
+Our achievements can be found in \cite{query1,query2,query3}.
+
+The techniques and tools already developed have been integrated in \MATITA{},
+which is able to contact a remote \WHELP{} search engine \cite{whelp} or
+can be directly linked to the \WHELP{} code. In either case the database
+used to store the metadata can be local or remote.
+
+Our current challenge consists in the exploitation of \WHELP{} inside
+\MATITA. In particular we are developing a set of tactics, for instance
+based on paramodulation \cite{paramodulation}, that perform queries to \WHELP{}
+to restrict the scope of the library to a set of interesting candidates,
+greatly reducing the search space. Moreover, queries to \WHELP{} are performed
+during the parsing of user-provided terms to disambiguate them.
+
+In Sect.~\ref{sec:metadata} we describe the technique adopted in \MATITA.
+
+\subsubsection{Library management}
 
-Contrarily to a mathematician, the usual tendency in the world of assisted
-automation is that of restricting in advance the part of the library that
-will be used later on, checking its consistency by construction.
 \subsection{searching and indexing}
 \label{sec:metadata}
@@ -352,6 +449,54 @@ will be used later on, checking its consistency by construction.
 \label{sec:disambiguation}
 \ASSIGNEDTO{zack}
 
+  \begin{table}
+   \caption{\label{tab:termsyn} Concrete syntax of CIC terms: built-in
+     notation\strut}
+   \hrule
+   \[
+   \begin{array}{@{}rcll@{}}
+     \NT{term} & ::= & & \mbox{\bf terms} \\
+       & & x & \mbox{(identifier)} \\
+       & | & n & \mbox{(number)} \\
+       & | & s & \mbox{(symbol)} \\
+       & | & \mathrm{URI} & \mbox{(URI)} \\
+       & | & \verb+_+ & \mbox{(implicit)}\TODO{sync} \\
+       & | & \verb+?+n~[\verb+[+~\{\NT{subst}\}~\verb+]+] & \mbox{(meta)} \\
+       & | & \verb+let+~\NT{ptname}~\verb+\def+~\NT{term}~\verb+in+~\NT{term} \\
+       & | & \verb+let+~\NT{kind}~\NT{defs}~\verb+in+~\NT{term} \\
+       & | & \NT{binder}~\{\NT{ptnames}\}^{+}~\verb+.+~\NT{term} \\
+       & | & \NT{term}~\NT{term} & \mbox{(application)} \\
+       & | & \verb+Prop+ \mid \verb+Set+ \mid \verb+Type+ \mid \verb+CProp+ & \mbox{(sort)} \\
+       & | & \verb+match+~\NT{term}~ & \mbox{(pattern matching)} \\
+       & & ~ ~ [\verb+[+~\verb+in+~x~\verb+]+]
+             ~ [\verb+[+~\verb+return+~\NT{term}~\verb+]+] \\
+       & & ~ ~ \verb+with [+~[\NT{rule}~\{\verb+|+~\NT{rule}\}]~\verb+]+ & \\
+       & | & \verb+(+~\NT{term}~\verb+:+~\NT{term}~\verb+)+ & \mbox{(cast)} \\
+       & | & \verb+(+~\NT{term}~\verb+)+ \\
+     \NT{defs}  & ::= & & \mbox{\bf mutual definitions} \\
+       & & \NT{fun}~\{\verb+and+~\NT{fun}\} \\
+     \NT{fun} & ::= & & \mbox{\bf functions} \\
+       & & \NT{arg}~\{\NT{ptnames}\}^{+}~[\verb+on+~x]~\verb+\def+~\NT{term} \\
+     \NT{binder} & ::= & & \mbox{\bf binders} \\
+       & & \verb+\forall+ \mid \verb+\lambda+ \\
+     \NT{arg} & ::= & & \mbox{\bf single argument} \\
+       & & \verb+_+ \mid x \\
+     \NT{ptname} & ::= & & \mbox{\bf possibly typed name} \\
+       & & \NT{arg} \\
+       & | & \verb+(+~\NT{arg}~\verb+:+~\NT{term}~\verb+)+ \\
+     \NT{ptnames} & ::= & & \mbox{\bf bound variables} \\
+       & & \NT{arg} \\
+       & | & \verb+(+~\NT{arg}~\{\verb+,+~\NT{arg}\}~[\verb+:+~\NT{term}]~\verb+)+ \\
+     \NT{kind} & ::= & & \mbox{\bf induction kind} \\
+       & & \verb+rec+ \mid \verb+corec+ \\
+     \NT{rule} & ::= & & \mbox{\bf rules} \\
+       & & x~\{\NT{ptname}\}~\verb+\Rightarrow+~\NT{term}
+   \end{array}
+   \]
+   \hrule
+  \end{table}
+
 \subsubsection{Term input}
 
 The primary form of user interaction employed by \MATITA{} is textual script
@@ -508,8 +653,6 @@ polymorhpic equality.
 
 \subsubsection{Disambiguation algorithm}
 
-\NOTE{assumo che si sia gia' parlato di refine}
-
 A \emph{disambiguation algorithm} takes as input a content level term and
 returns a fully determined CIC term.
The key observation on which a
 disambiguation algorithm is based is that, given a content level term with
 more than one source
@@ -517,101 +660,126 @@ of ambiguity, not all possible combination of interpretation lead to a typable
 CIC term. In the term of Ex.~\ref{ex:disambiguation}, for instance, the
 interpretation of \texttt{ln} as a function from \IR{} to \IR{} and the
 interpretation of \texttt{1} as the Peano number $1$ cannot coexist. The notion
-of ``can't coexists'' in the disambiguation of \MATITA{} is inherited from the
-refiner described in Sect.~\ref{sec:metavariables}: as long as
-$\mathit{refine}(c)\neq\epsilon$, the combination of interpretation which led to
-$c$
-can coexists.
+of ``cannot coexist'' in the disambiguation of \MATITA{} is defined on top of
+the \emph{refiner} for CIC terms described in~\cite{csc-phd}.
 
-The \emph{naive disambiguation algorithm} takes as input a content level term
-$t$ and proceeds as follows:
+Briefly, a refiner is a function whose input is an \emph{incomplete CIC term}
+$t_1$ --- i.e. a term where metavariables occur (Sect.~\ref{sec:metavariables})
+--- and whose output is either:\NOTE{summary description of the refiner; could
+be moved elsewhere}
 \begin{enumerate}
+
+  \item an incomplete CIC term $t_2$, where $t_2$ is a well-typed term obtained
+    by assigning a type to each metavariable in $t_1$ (in case of dependent
+    types, some of the metavariables occurring in $t_1$ may be instantiated as
+    well);
 
-  \item Create disambiguation domains $\{D_i | i\in\mathit{Dom}(t)\}$, where
-    $\mathit{Dom}(t)$ is the set of ambiguity sources of $t$. Each $D_i$ is a set
-    of CIC terms and can be built as described above.
+  \item $\epsilon$, meaning that no well-typed term could be obtained by
+    assigning types to the metavariables in $t_1$ and instantiating them;
+
+  \item $\bot$, meaning that the refiner is unable to decide which of the two
+    cases above applies (refinement is semi-decidable).
+
+\end{enumerate}
+
+On top of the CIC refiner \MATITA{} implements an efficient disambiguation
+algorithm, which is outlined below. It takes as input a content level term $c$
+and proceeds as follows:
 
-  \item Let $\Phi = \{\phi_i | {i\in\mathit{Dom}(t)},\phi_i\in D_i\}$ be an
-    interpretation for $t$. Given $t$ and an interpretation $\Phi$, a CIC term is
-    fully determined. Iterate over all possible interpretations of $t$ and refine
-    the corresponding CIC terms, keep only interpretations which lead to CIC terms
-    $c$ s.t. $\mathit{refine}(c)\neq\epsilon$ (i.e. interpretations that determine
-    typable terms).
+\begin{enumerate}
+
+  \item Create disambiguation domains $\{D_i | i\in\mathit{Dom}(c)\}$, where
+    $\mathit{Dom}(c)$ is the set of ambiguity sources of $c$. Each $D_i$ is a
+    set of CIC terms and can be built as described above.
 
-  \item Let $n$ be the number of interpretations who survived step 2. If $n=0$
-    signal a type error. If $n=1$ we have found exactly one CIC term corresponding
-    to $t$, returns it as output of the disambiguation phase. If $n>1$ we have
-    found many different CIC terms which can correspond to the content level term,
-    let the user choose one of the $n$ interpretations and returns the
-    corresponding term.
+  \item An \emph{interpretation} $\Phi$ for $c$ is a map associating an
+    incomplete CIC term to each ambiguity source of $c$.
+    Given $c$ and one of its interpretations, an incomplete CIC term is fully
+    determined by replacing each ambiguity source of $c$ with its mapping in
+    the interpretation and by injecting the remaining structure of the content
+    level into the CIC level (e.g. replacing a content level application with
+    a CIC application). This operation is informally called ``interpreting $c$
+    with $\Phi$''.
+
+    Create an initial interpretation $\Phi_0 = \{\phi_i | \phi_i = \_,
+    i\in\mathit{Dom}(c)\}$, which associates a fresh metavariable to each
+    source of ambiguity of $c$. During this step, implicit terms are expanded
+    to fresh metavariables as well.
+
+  \item Refine the current incomplete CIC term (i.e. the term obtained
+    interpreting $c$ with $\Phi_i$).
+
+    If the refinement succeeds or is undetermined, the next interpretation
+    $\Phi_{i+1}$ is created \emph{making a choice}, that is replacing in the
+    current interpretation one of the metavariables appearing in $\Phi_i$ with
+    one of the possible choices from the corresponding disambiguation domain.
+    The metavariable to be replaced is chosen following a preorder visit of
+    the ambiguous term. Then step 3 is attempted again with the new
+    interpretation.
+
+    If the refinement fails, the current set of choices cannot lead to a
+    well-typed term and backtracking on the current interpretation is
+    attempted.
+
+  \item Once an unambiguous correct interpretation is found (i.e. $\Phi_i$ no
+    longer contains any placeholder), backtracking is attempted anyway, in
+    order to find all the other correct interpretations.
+
+  \item Let $n$ be the number of interpretations that survived step 4. If
+    $n=0$, signal a type error. If $n=1$, we have found exactly one
+    (incomplete) CIC term corresponding to the content level term $c$: return
+    it as the output of the disambiguation phase. If $n>1$, we have found
+    several different (incomplete) CIC terms which can correspond to the
+    content level term: let the user choose one of the $n$ interpretations and
+    return the corresponding term.
 
 \end{enumerate}
 
-The above algorithm is highly inefficient since the number of possible
-interpretations $\Phi$ grows exponentially with the number of ambiguity sources.
-The actual algorithm used in \MATITA{} is far more efficient being, in the
-average case, linear in the number of ambiguity sources.
-
-The efficient algorithm --- thoroughly described along with an analysis of its
-complexity in~\cite{disambiguation} --- exploit the refiner and the metavariable
-extension (Sect.~\ref{sec:metavariables}) of the calculus used in \MATITA.
-
-\TODO{FINQUI}
-
-The efficient algorithm can be applied if the logic can be extended with
-metavariables and a refiner can be implemented. This is the case for CIC and
-several other logics.
-\emph{Metavariables}~\cite{munoz} are typed, non linear placeholders that can
-occur in terms; $?_i$ usually denotes the $i$-th metavariable, while $?$ denotes
-a freshly created metavariable. A \emph{refiner}~\cite{McBride} is a
-function whose input is a term with placeholders and whose output is either a
-new term obtained instantiating some placeholder or $\epsilon$, meaning that no
-well typed instantiation could be found for the placeholders occurring in
-the term (type error).
-
-The efficient algorithm starts with an interpretation $\Phi_0 = \{\phi_i |
-\phi_i = ?, i\in\mathit{Dom}(t)\},
-which associates a fresh metavariable to each
-source of ambiguity. Then it iterates refining the current CIC term (i.e. the
-term obtained interpreting $t$ with $\Phi_i$). If the refinement succeeds the
-next interpretation $\Phi_{i+1}$ will be created \emph{making a choice}, that is
-replacing a placeholder with one of the possible choice from the corresponding
-disambiguation domain. The placeholder to be replaced is chosen following a
-preorder visit of the ambiguous term. If the refinement fails the current set of
-choices cannot lead to a well-typed term and backtracking is attempted.
-Once an unambiguous correct interpretation is found (i.e. $\Phi_i$ does no
-longer contain any placeholder), backtracking is attempted
-anyway to find the other correct interpretations.
-
-The intuition which explain why this algorithm is more efficient is that as soon
-as a term containing placeholders is not typable, no further instantiation of
-its placeholders could lead to a typable term. For example, during the
-disambiguation of user input \texttt{\TEXMACRO{forall} x. x*0 = 0}, an
-interpretation $\Phi_i$ is encountered which associates $?$ to the instance
-of \texttt{0} on the right, the real number $0$ to the instance of \texttt{0} on
-the left, and the multiplication over natural numbers (\texttt{mult} for short)
-to \texttt{*}. The refiner will fail, since \texttt{mult} require a natural
-argument, and no further instantiation of the placeholder will be tried.
-
-If, at the end of the disambiguation, more than one possible interpretations are
-possible, the user will be asked to choose the intended one (see
-Fig.~\ref{fig:disambiguation}).
-
-\begin{figure}[htb]
-% \centerline{\includegraphics[width=0.9\textwidth]{disambiguation-1}}
-  \caption{\label{fig:disambiguation} Disambiguation: interpretation choice}
-\end{figure}
+The efficiency of this algorithm resides in the fact that as soon as an
+incomplete CIC term is not typable, no further instantiation of the
+metavariables of the corresponding interpretation is attempted.
+% For example, during the disambiguation of the user input
+% \texttt{\TEXMACRO{forall} x. x*0 = 0}, an interpretation $\Phi_i$ is
+% encountered which associates $?$ to the instance of \texttt{0} on the right,
+% the real number $0$ to the instance of \texttt{0} on the left, and the
+% multiplication over natural numbers (\texttt{mult} for short) to \texttt{*}.
+% The refiner will fail, since \texttt{mult} requires a natural argument, and
+% no further instantiation of the placeholder will be tried.
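+
+To make the interplay between the refiner and the interpretations concrete,
+the following OCaml fragment sketches the algorithm above. The code is purely
+illustrative and is not the actual \MATITA{} implementation:
+\texttt{refine} and \texttt{interpret} are assumed as black boxes, and the
+real code avoids explicit backtracking (see below).
+\begin{verbatim}
+(* Illustrative sketch of the refiner-driven disambiguation loop.
+   An interpretation is a list of optional choices, one per
+   ambiguity source in preorder; None plays the role of a fresh
+   metavariable. *)
+type outcome = Ok | Fail | Undetermined
+
+let disambiguate refine interpret (domains : 'a list list) =
+  (* chosen: choices already made (in reverse order); pending:
+     domains of the sources still mapped to metavariables *)
+  let rec search chosen pending acc =
+    let current =
+      interpret (List.rev_append chosen (List.map (fun _ -> None) pending))
+    in
+    match refine current with
+    | Fail -> acc            (* prune this whole subtree of choices *)
+    | Ok | Undetermined ->
+      match pending with
+      | [] -> List.rev chosen :: acc   (* a correct interpretation *)
+      | d :: rest ->         (* make a choice, later backtrack on it *)
+        List.fold_left
+          (fun acc choice -> search (Some choice :: chosen) rest acc)
+          acc d
+  in
+  search [] domains []
+\end{verbatim}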
 
-Details of the disambiguation algorithm of \WHELP{} can
-be found in~\cite{disambiguation}, where an equivalent algorithm
-that avoids backtracking is also presented.
+Details of the disambiguation algorithm, along with an analysis of its
+complexity, can be found in~\cite{disambiguation}, where a formulation without
+backtracking (corresponding to the actual \MATITA{} implementation) is also
+presented.
 
+\subsubsection{Disambiguation stages}
 
-\subsection{notazione}
+\subsection{notation}
 \label{sec:notation}
 \ASSIGNEDTO{zack}
 
+Mathematical notation plays a fundamental role in mathematical practice: it
+helps to express concepts of arbitrary complexity in a concise, symbolic
+fashion. Its use in proof assistants like \MATITA{} is no exception. Indeed,
+formal mathematics often imposes encoding mathematical concepts at a very high
+level of detail (e.g. Peano numbers, implicit arguments), since the calculus
+offers only a restricted toolbox of syntactic constructions.
+
+Consider for example one of the points reached while proving the
+distributivity of times over minus on natural numbers, a result included in
+the \MATITA{} standard library. (Part of) the reached sequent can be displayed
+in \MATITA{} both using the notation for the various arithmetical and
+relational operators and without it. Rendered without notation, the sequent
+reads:
+\sequent{
+\mathtt{H}: \mathtt{le}~z~y\\
+\mathtt{Hcut}: \mathtt{eq}~\mathtt{nat}~(\mathtt{plus}~(\mathtt{times}~x~(\mathtt{minus}~y~z))~(\mathtt{times}~x~z))\\
+~~~~(\mathtt{plus}~(\mathtt{minus}~(\mathtt{times}~x~y)~(\mathtt{times}~x~z))~(\mathtt{times}~x~z))}{
+\mathtt{eq}~\mathtt{nat}~(\mathtt{times}~x~(\mathtt{minus}~y~z))~(\mathtt{minus}~(\mathtt{times}~x~y)~(\mathtt{times}~x~z))}
+while the same sequent rendered with notation enabled reads:
+\sequent{
+H: z\leq y\\
+Hcut: x*(y-z)+x*z=x*y-x*z+x*z}{
+x*(y-z)=x*y-x*z}
+
 
 \subsection{mathml}
 \ASSIGNEDTO{zack}
 
@@ -619,7 +787,6 @@ that avoids backtracking is also presented.
 \ASSIGNEDTO{zack}
 
 \subsection{pattern}
-\ASSIGNEDTO{gares}\\
 
 Patterns are the textual counterpart of the MathML widget graphical
 selection.
@@ -837,18 +1004,193 @@ One of the goals of \MATITA{} is to use modern publishing techiques,
 and adopting a method for restricting the application domain of tactics
 that discourages using heavy math notation would definitely be a bad
 choice.
 
-\subsection{tatticali}
-\ASSIGNEDTO{gares}
-\begin{verbatim}
-- problemi principali dei PA basati su tattiche
-  o illeggibilita' dello script
-  o scarsa strutturazione dello script e mantenibilita'
-  - ameliorate by tacticals
-- intro sui tatticali esistenti
-- peculiarita' di matita
-  - passo passo
-  - nice handling of sideeffects (see later on metavariables)
-\end{verbatim}
+\subsection{Tacticals}
+\ASSIGNEDTO{gares}\\
+There are mainly two kinds of languages used by proof assistants to record
+proofs: tactic-based and declarative. We will not investigate the philosophy
+behind the choice that many proof assistants, \MATITA{} included, have made,
+nor compare the two different approaches. We will describe the common
+issues of the tactic-based approach and how \MATITA{} tries to solve them.
+
+\subsubsection{Tacticals overview}
+
+Tacticals first appeared in LCF and can be seen as programming
+constructs over tactics, providing looping, branching, error recovery and
+sequential composition. The following simple example shows a few tacticals
+in action:
+\begin{grafite}
+theorem trivial: 
+  \forall A,B:Prop. 
+  A = B \to ((A \to B) \land (B \to A)).
+  intros (A B H).
+  split; intro; 
+  [ rewrite < H. assumption.
+  | rewrite > H. assumption.
+  ]
+qed.
+\end{grafite}
+
+The first tactical is ``\texttt{;}'', which combines the tactic \texttt{split}
+with \texttt{intro}, applying the latter to each goal opened by the
+former. Then we have ``\texttt{[}'', which branches on the goals (here
+we have two goals, the two sides of the conjunction).
+The first goal $B$ (with $A$ in the context)
+is proved by the first sequence of tactics,
+\texttt{rewrite} and \texttt{assumption}. Then we move to the second
+goal with the separator ``\texttt{|}''. The last tactical we see here
+is ``\texttt{.}'', a sequential composition that selects the
+first goal opened for the following tactic (instead of applying it to
+them all, like ``\texttt{;}'' does). Note that usually ``\texttt{.}'' is
+not considered a tactical, but a sentence terminator (i.e. the
+delimiter of the commands the proof assistant executes).
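+
+Seen from the implementation side, tacticals are just higher-order functions
+over tactics. The following OCaml sketch is hypothetical (real systems,
+\MATITA{} included, thread a full proof status rather than bare goals), but it
+models sequential composition, branching and the error-recovery combinators
+\texttt{try} and \texttt{first} used in the examples below:
+\begin{verbatim}
+(* Illustrative LCF-style model: a tactic maps a goal to the list
+   of goals it opens, raising Fail when it fails. *)
+exception Fail
+
+type 'g tactic = 'g -> 'g list
+
+(* the tactic that always succeeds, leaving its goal untouched *)
+let id : 'g tactic = fun g -> [g]
+
+(* ";"  apply t1, then t2 to every goal opened by t1 *)
+let seq (t1 : 'g tactic) (t2 : 'g tactic) : 'g tactic =
+  fun g -> List.concat (List.map t2 (t1 g))
+
+(* "t [t1 | ... | tn]"  apply the i-th tactic to the i-th goal
+   opened by t; the tactics must be as many as the goals *)
+let branch (t : 'g tactic) (ts : 'g tactic list) : 'g tactic =
+  fun g -> List.concat (List.map2 (fun ti gi -> ti gi) ts (t g))
+
+(* "try t"  recover from failure by behaving like id *)
+let attempt (t : 'g tactic) : 'g tactic =
+  fun g -> try t g with Fail -> [g]
+
+(* "first [t1 | ... | tn]"  use the first tactic that succeeds *)
+let rec first (ts : 'g tactic list) : 'g tactic =
+  fun g ->
+    match ts with
+    | [] -> raise Fail
+    | t :: rest -> (try t g with Fail -> first rest g)
+\end{verbatim}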
+
+Giving serious examples here is rather difficult, since they are hard
+to read without the interactive tool. To help the reader in
+understanding the following considerations we just give a few common
+usage examples without a proof context.
+
+\begin{grafite}
+  elim z; try assumption; [ ... | ... ].
+  elim z; first [ assumption | reflexivity | id ].
+\end{grafite}
+
+The first example goes by induction on a term \texttt{z} and applies
+the tactic \texttt{assumption} to each opened goal, recovering if
+\texttt{assumption} fails. Here we are asking the system to close all
+trivial cases and then we branch on the remaining ones with ``\texttt{[}''.
+The second example goes again by induction on \texttt{z} and tries to
+close each opened goal first with \texttt{assumption}; if that fails, it
+tries \texttt{reflexivity} and finally \texttt{id},
+the tactic that leaves the goal untouched without failing.
+
+Note that in the common implementation of tacticals both lines are
+compositions of tacticals, and in particular each is a single
+statement (i.e. derived from the same non-terminal of the
+grammar) terminated by ``\texttt{.}''. As we will see later, in \MATITA{}
+this is not true, since each atomic tactic or piece of punctuation is
+considered a single statement.
+
+\subsubsection{Common issues of tactic(als)-based proof languages}
+We will examine the two main problems of tactic(als)-based proof scripts:
+maintainability and readability.
+
+Huge libraries of formal mathematics have been developed, and keeping them
+backward compatible is a really time-consuming task. \\
+A real-life example in the history of \MATITA{} was the reordering of
+goals opened by a tactic application. We noticed that some tactics
+were not opening goals in the expected order. In particular the
+\texttt{elim} tactic on a term of an inductive type with constructors
+$c_1, \ldots, c_n$ used to open goals in the order $g_1, g_n, g_{n-1},
+\ldots, g_2$. The library of \MATITA{} was still in an embryonic state,
+but some theorems about integers were already there. The inductive type of
+$\mathcal{Z}$ has three constructors: $zero$, $pos$ and $neg$. All the
+induction proofs on this type were written without tacticals and thus
+handled the three induction cases in the wrong order.
+Fixing the behavior of the tactic broke the library, and two days of
+work were needed to make it compile again. The whole time was spent
+finding the list of tactics used to prove the third induction case and
+swapping it with the list of tactics used to prove the second case. If
+the proofs had been structured with the branching tactical, this task could
+have been performed automatically.
+
+From this experience we learned that the use of tacticals for
+structuring proofs gives some help but may have drawbacks for
+proof script readability. We must highlight that proof script
+readability is poor by itself, but in conjunction with tacticals it
+can become nearly impossible. The main cause is the fact that in proof
+scripts there is no trace of what you are working on. It is not rare
+for two different theorems to have the same proof script (while the
+proofs themselves are completely different).\\
+Bad readability is not a big deal for the user while he is
+constructing the proof, but it becomes a considerable problem when he tries
+to reread what he did or when he shows his work to someone else. The
+workaround commonly used to read a script is to execute it again
+step-by-step, so that you can see the proof goal changing and
+follow the proof steps.
+This works fine until you reach a tactical. A
+compound statement, made of basic tactics glued together with tacticals,
+is executed in a single step, while it obviously performs a lot of proof
+steps. In the first example of the previous section the whole branch
+over the two goals (respectively the left and right part of the
+conjunction) results in a single execution step. The workaround does not
+work anymore unless you de-structure the proof on the fly, putting some
+``\texttt{.}'' where you want the system to stop.\\
+
+Now we can understand the tradeoff between script readability and
+proof structuring with tacticals. Using tacticals helps in maintaining
+scripts, but makes it really hard to read them again, because of the way
+they are executed.
+
+\MATITA{} uses a language of tactics and tacticals, but tries to avoid
+this tradeoff, encouraging the user to write structured proofs without
+making it impossible to read them again.
+
+\subsubsection{The \MATITA{} approach: Tinycals}
+
+\begin{table}
+ \caption{\label{tab:tacsyn} Concrete syntax of \MATITA{} tacticals.\strut}
+\hrule
+\[
+\begin{array}{@{}rcll@{}}
+  \NT{punctuation} & 
+    ::= & \SEMICOLON \quad|\quad \DOT \quad|\quad \SHIFT \quad|\quad \BRANCH \quad|\quad \MERGE \quad|\quad \POS{\mathrm{NUMBER}} & \\
+  \NT{block\_kind} & 
+    ::= & \verb+focus+ ~|~ \verb+try+ ~|~ \verb+solve+ ~|~ \verb+first+ ~|~ \verb+repeat+ ~|~ \verb+do+~\mathrm{NUMBER} & \\
+  \NT{block\_delimiter} & 
+    ::= & \verb+begin+ ~|~ \verb+end+ & \\
+  \NT{tactical} & 
+    ::= & \verb+skip+ ~|~ \NT{tactic} ~|~ \NT{block\_delimiter} ~|~ \NT{block\_kind} ~|~ \NT{punctuation} & \\
+\end{array}
+\]
+\hrule
+\end{table}
+
+The syntax of \MATITA{} tacticals is reported in Table~\ref{tab:tacsyn}.
+While one would expect to find only structured constructs like
+$\verb+do+~n~\NT{tactic}$, the syntax also allows isolated pieces of
+tacticals to be written. This is essential for the base idea behind
+\MATITA{} tacticals: step-by-step execution.
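+
+To give the flavour of how step-by-step execution can work, here is a
+deliberately simplified and purely hypothetical OCaml model of the
+punctuation tacticals (the actual \MATITA{} implementation is different and
+also handles \texttt{focus}, goal closing and side effects between branches):
+\begin{verbatim}
+(* Illustrative model of step-by-step execution: each statement,
+   tactic or punctuation, acts on an explicit evaluation status, so
+   execution can stop after any of them.  The stack records, for
+   every open "[", the goals still waiting for their branch and the
+   results produced so far. *)
+type 'g status = {
+  focused : 'g list;               (* goals the next tactic acts on *)
+  stack   : ('g list * 'g list) list;
+}
+
+let tactic (t : 'g -> 'g list) st =      (* an atomic tactic *)
+  { st with focused = List.concat (List.map t st.focused) }
+
+let branch st =                    (* "[" focus on the first goal *)
+  match st.focused with
+  | g :: gs -> { focused = [g]; stack = (gs, []) :: st.stack }
+  | [] -> failwith "no goal to branch on"
+
+let shift st =                     (* "|" move to the next goal *)
+  match st.stack with
+  | (g :: waiting, results) :: stack ->
+    { focused = [g]; stack = (waiting, results @ st.focused) :: stack }
+  | _ -> failwith "no goal to shift to"
+
+let merge st =                     (* "]" refocus on all the results *)
+  match st.stack with
+  | (waiting, results) :: stack ->
+    { focused = results @ st.focused @ waiting; stack }
+  | [] -> failwith "unmatched ]"
+\end{verbatim}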
+The low-level implementation of tacticals in \MATITA{} allows a step-by-step
+execution of a tactical, which essentially means that a $\NT{block\_kind}$ is
+not executed as an atomic operation. Simple as the idea is, this has two
+major benefits for the user:
+\begin{description}
+\item[Proof structuring] 
+  is much easier. Consider for example a proof by induction, and imagine you
+  are using classical tacticals in one of the state-of-the-art graphical
+  interfaces for proof assistants, like Proof General or CoqIDE.
+  After applying the induction principle you have to choose: structure
+  the proof or not. If you opt for the former you have to branch with
+  ``\texttt{[}'' and write tactics for all the cases, separated by
+  ``\texttt{|}'', and then close the tactical with ``\texttt{]}''.
+  You can replace most of the cases by the identity tactic just to
+  concentrate on the first goal, but you will have to go one step back and
+  one forward every time you add something inside the tactical. Again this
+  is caused by the atomic execution of tacticals and by the fact that, to
+  modify the already executed script, you have to undo one step.
+  And if you get bored of doing so, you will end up giving up structuring
+  the proof and writing a plain list of tactics.\\
+  With step-by-step tacticals you can apply the induction principle and just
+  open the branching tactical ``\texttt{[}''. Then you can interact with the
+  system until you reach a proof of the first case, without having to
+  specify any tactic for the other goals. When you have proved all the
+  induction cases, you close the branching tactical with ``\texttt{]}'' and
+  you are done with a structured proof. \\
+  While \MATITA{} tacticals help in structuring proofs, they still allow you
+  to choose the amount of structure you want. There are no constraints
+  imposed by the system, and if the user wants he can even write completely
+  plain proofs.
+
+\item[Rereading]
+  is possible. Going on step by step shows exactly what is going on. Consider
+  again a proof by induction that starts by applying the induction principle
+  and suddenly branches with a ``\texttt{[}''. This clearly separates all the
+  induction cases, but if the content of the square brackets is executed in
+  one single step you completely lose the possibility of rereading it, and
+  you have to temporarily remove the branching tactical to execute the
+  branches in a satisfactory way. Again, executing step-by-step is the way
+  you would like to review the proof. Remember that understanding a proof
+  from the script alone is not easy, and only the execution of tactics (and
+  the resulting transformed goal) gives you the feeling of what is going on.
+\end{description}
 
 \subsection{named variables and lazy disambiguation}
 \ASSIGNEDTO{csc}