From ecae84bd553e50c3182697e994ea28ae2d18d68f Mon Sep 17 00:00:00 2001 From: Enrico Tassi Date: Mon, 30 Jan 2006 14:13:41 +0000 Subject: [PATCH 1/1] CSC/Occam chain-saw/razor on tinycals and library generation/invalidation. minor changes on patters terminology. --- helm/papers/matita/matita2.tex | 478 +++++++++++++++++---------------- 1 file changed, 245 insertions(+), 233 deletions(-) diff --git a/helm/papers/matita/matita2.tex b/helm/papers/matita/matita2.tex index 81e0c1dee..353066a88 100644 --- a/helm/papers/matita/matita2.tex +++ b/helm/papers/matita/matita2.tex @@ -1064,111 +1064,87 @@ user interaction during the disambiguation. \subsection{Generation and invalidation} \label{sec:libmanagement} -The aim of this section is to describe the way \MATITA{} -preserves the consistency and the availability of the library -using the \WHELP{} technology, in response to the user alteration or -removal of mathematical objects. - -As already sketched in Sect.~\ref{sec:fullyintro} what we generate -from a script is split among two storage media, a -classical filesystem and a relational database. The former is used to -store the XML encoding of the objects defined in the script, the -disambiguation aliases and the interpretation and notational convention defined, -while the latter is used to store all the metadata needed by -\WHELP{}. - -While the consistency of the data store in the two media has -nothing to do with the nature of -the content of the library and is thus uninteresting (but really -tedious to implement and keep bug-free), there is a deeper -notion of mathematical consistency we need to provide. Each object -must reference only defined object (i.e. each proof must use only -already proved theorems). - -We will focus on how \MATITA{} ensures the interesting kind -of consistency during the formalization of a mathematical theory, +%The aim of this section is to describe the way \MATITA{} +%preserves the consistency and the availability of the library +%using the \WHELP{} technology, in response to the user alteration or +%removal of mathematical objects. +% +%As already sketched in Sect.~\ref{sec:fullyintro} what we generate +%from a script is split among two storage media, a +%classical filesystem and a relational database. The former is used to +%store the XML encoding of the objects defined in the script, the +%disambiguation aliases and the interpretation and notational convention defined, +%while the latter is used to store all the metadata needed by +%\WHELP{}. +% +%While the consistency of the data store in the two media has +%nothing to do with the nature of +%the content of the library and is thus uninteresting (but really +%tedious to implement and keep bug-free), there is a deeper +%notion of mathematical consistency we need to provide. Each object +%must reference only defined object (i.e. each proof must use only +%already proved theorems). + +In this section we will focus on how \MATITA{} ensures the library +consistency during the formalization of a mathematical theory, giving the user the freedom of adding, removing, modifying objects without loosing the feeling of an always visible and browsable library. -\subsubsection{Compilation} +\subsubsection{Invalidation} -The typechecker component guarantees that if an object is well typed -it depends only on well typed objects available in the library, -that is exactly what we need to be sure that the logic consistency of -the library is preserved. We have only to find the right order of -compilation of the scripts that compose the user development. +Invalidation (see Sect.~\ref{sec:library}) is implemented in two phases. + +The first one is the calculation of all the concepts that recursively +depend on the ones we are invalidating. The calculation of the +reverse dependencies can be computed using the relational database +that stores metadata. +This technique is the same used by the \emph{Dependency Analyzer} +and is described in~\cite{zack-master}. + +The second phase is the removal of all the results of the generation, +metadata included. + +\subsubsection{Regeneration} + +%The typechecker component guarantees that if an object is well typed +%it depends only on well typed objects available in the library, +%that is exactly what we need to be sure that the logic consistency of +%the library is preserved. + +To regenerate an invalidated part of the library \MATITA{} re-executes +the script files that produced the invalidated concepts. The main +problem is to find a suitable order of execution of the scripts. For this purpose we provide a tool called \MATITADEP{} -that takes in input the list of files that compose the development and +that takes in input the list of scripts that compose the development and outputs their dependencies in a format suitable for the GNU \texttt{make} tool. The user is not asked to run \MATITADEP{} by hand, but simply to tell \MATITA{} the root directory of his development (where all -script files can be found) and \MATITA{} will handle all the compilation +script files can be found) and \MATITA{} will handle all the generation related tasks, including dependencies calculation. -To compute dependencies it is enough to look at the script files for -inclusions of other parts of the development or for explicit -references to other objects (i.e. with explicit aliases, see -\ref{sec:disambaliases}). - -The output of the compilation is immediately available to the user -trough the \WHELP{} technology, since all metadata are stored in a -user-specific area of the database where the search engine has read -access, and all the automated tactics that operates on the whole -library, like \AUTO, have full visibility of the newly defined objects. - -Compilation is rather simple, and the only tricky case is when we want -to compile again the same script, maybe after the removal of a -theorem. Here the policy is simple: clean the output before recompiling. -As we will see in the next section cleaning will ensure that -there will be no theorems in the development that depends on the -removed items. - -\subsubsection{Cleaning} - -With the term ``cleaning'' we mean the process of removing all the -results of an object compilation. In order to keep the consistency of -the library, cleaning an object requires the (recursive) cleaning -of all the objects that depend on it (\emph{reverse dependencies}). - -The calculation of the reverse dependencies can be computed in two -ways, using the relational database or using a simpler set of metadata -that \MATITA{} saves in the filesystem as a result of compilation. The -former technique is the same used by the \emph{Dependency Analyzer} -described in~\cite{zack-master} and really depends on a relational -database. -The latter is a fall-back in case the database is not -available.\footnote{Due to the complex deployment of a large piece of -software like a database, it is a common practice for the \HELM{} team -to use a shared remote database, that may be unavailable if the user -workstation lacks network connectivity.} This facility has to be -intended only as a fall-back, since the queries of the \WHELP{} -technology depend require a working database. +To compute dependencies it is enough to look at the script files for +disambiguation preferences declared or imported from other scripts +(see \ref{sec:disambaliases}). -Cleaning guarantees that if an object is removed there are no dandling -references to it, and that the part of the library still compiled is -consistent. Since cleaning involves the removal of all the results of -the compilation, metadata included, the library browsable trough the -\WHELP{} technology is always kept up to date. +Regenerating the content of a modified script file involves the preliminary +invalidation of all its old content. \subsubsection{Batch vs Interactive} \MATITA{} includes an interactive authoring interface and a batch -``compiler'' (\MATITAC). Only the former is intended to be used directly by the -user, the latter is automatically invoked when a -part of the user development is required (for example issuing an -\texttt{include} command) but not yet compiled. - -While they share the same engine for compilation and cleaning, they -provide different granularity. The batch compiler is only able to -compile a whole script and similarly to clean only a whole script -(together with all the other scripts that rely on an object defined in -it). The interactive interface is able to execute single steps of -compilation, that may include the definition of an object, and -similarly to undo single steps. Note that in the latter case there is -no risk of introducing dangling references since the \MATITA{} user -interface inhibit undoing a step which is not the last executed. +``compiler'' (\MATITAC). + +Only the former is intended to be used directly by the +user, the latter is automatically invoked by \MATITA{} +to try to regenerate parts of the library previously invalidated. + +While they share the same engine for generation and invalidation, they +provide different granularity. \MATITAC{} is only able to reexecute a +whole script and similarly to invalidate the whole content of a script +(together with all the other scripts that rely on an concept defined +in it). \subsection{Automation} \label{sec:automation} @@ -1545,123 +1521,169 @@ using heavy math notation, would definitively be a bad choice. \subsection{Tacticals} \label{sec:tinycals} -There are mainly two kinds of languages used by proof assistants to recorder -proofs: tactic based and declarative. We will not investigate the philosophy -around the choice that many proof assistant made, \MATITA{} included, and we -will not compare the two different approaches. We will describe the common -issues of the tactic-based language approach and how \MATITA{} tries to solve -them. +%There are mainly two kinds of languages used by proof assistants to recorder +%proofs: tactic based and declarative. We will not investigate the philosophy +%around the choice that many proof assistant made, \MATITA{} included, and we +%will not compare the two different approaches. We will describe the common +%issues of the tactic-based language approach and how \MATITA{} tries to solve +%them. -\subsubsection{Tacticals overview} +The procedural proof language implemented in \MATITA{} is pretty standard, +with a notable exception for tacticals. + +%\subsubsection{Tacticals overview} + +Tacticals first appeared in LCF as higher order tactics. They can be +seen as control flow constructs, like looping, branching, error +recovery or sequential composition. -Tacticals first appeared in LCF and can be seen as programming -constructs, like looping, branching, error recovery or sequential composition. -The following simple example shows three tacticals in action -\begin{grafite} -theorem trivial: - \forall A,B:Prop. - A = B \to ((A \to B) \land (B \to A)). - intros (A B H). - split; intro; - [ rewrite < H. assumption. - | rewrite > H. assumption. - ] -qed. -\end{grafite} -The first is ``\texttt{;}'' that combines the tactic \texttt{split} -with \texttt{intro}, applying the latter to each goal opened by the -former. Then we have ``\texttt{[}'' that branches on the goals (here -we have two goals, the two sides of the logic and). -The first goal $B$ (with $A$ in the context) -is proved by the first sequence of tactics -\texttt{rewrite} and \texttt{assumption}. Then we move to the second -goal with the separator ``\texttt{|}''. The last tactical we see here -is ``\texttt{.}'' that is a sequential composition that selects the -first goal opened for the following tactic (instead of applying it to -them all like ``\texttt{;}''). Note that usually ``\texttt{.}'' is -not considered a tactical, but a sentence terminator (i.e. the -delimiter of commands the proof assistant executes). - -Giving serious examples here is rather difficult, since they are hard -to read without the interactive tool. To help the reader in -understanding the following considerations we just give few common -usage examples without a proof context. +The following simple example +shows a Coq script made of four dot-terminated commands \begin{grafite} - elim z; try assumption; [ ... | ... ]. - elim z; first [ assumption | reflexivity | id ]. +Theorem trivial: + forall A B:Prop, + A = B -> ((A -> B) /\ (B -> A)). + intros [A B H]. + split; intro; + [ rewrite < H; assumption + | rewrite > H; assumption + ]. +Qed. \end{grafite} -The first example goes by induction on a term \texttt{z} and applies -the tactic \texttt{assumption} to each opened goal eventually recovering if -\texttt{assumption} fails. Here we are asking the system to close all -trivial cases and then we branch on the remaining with ``\texttt{[}''. -The second example goes again by induction on \texttt{z} and tries to -close each opened goal first with \texttt{assumption}, if it fails it -tries \texttt{reflexivity} and finally \texttt{id} -that is the tactic that leaves the goal untouched without failing. - -Note that in the common implementation of tacticals both lines are -compositions of tacticals and in particular they are a single -statement (i.e. derived from the same non terminal entry of the -grammar) ended with ``\texttt{.}''. As we will see later in \MATITA{} -this is not true, since each atomic tactic or punctuation is considered -a single statement. +The third command is an application of the sequencing tactical +``$\ldots$\texttt{;}$\ldots$'', that combines the tactic +\texttt{split} with the application of the branching tactical +``$\ldots$\texttt{;[}$\ldots$\texttt{|}$\ldots$\texttt{|}$\ldots$\texttt{]}'' +to other tactics and tacticals. + +The usual implementation of tacticals executes them atomically as any +other command. In \MATITA{} thi is not true since each punctuation is +executed as a single command. + +%The latter is applied to all the goals opened by \texttt{split} +% +%(here we have two goals, the two sides of the logic and). The first +%goal $B$ (with $A$ in the context) is proved by the first sequence of +%tactics \texttt{rewrite} and \texttt{assumption}. Then we move to the +%second goal with the separator ``\texttt{|}''. +% +%Giving serious examples here is rather difficult, since they are hard +%to read without the interactive tool. To help the reader in +%understanding the following considerations we just give few common +%usage examples without a proof context. +% +%\begin{grafite} +% elim z; try assumption; [ ... | ... ]. +% elim z; first [ assumption | reflexivity | id ]. +%\end{grafite} +% +%The first example goes by induction on a term \texttt{z} and applies +%the tactic \texttt{assumption} to each opened goal eventually recovering if +%\texttt{assumption} fails. Here we are asking the system to close all +%trivial cases and then we branch on the remaining with ``\texttt{[}''. +%The second example goes again by induction on \texttt{z} and tries to +%close each opened goal first with \texttt{assumption}, if it fails it +%tries \texttt{reflexivity} and finally \texttt{id} +%that is the tactic that leaves the goal untouched without failing. +% +%Note that in the common implementation of tacticals both lines are +%compositions of tacticals and in particular they are a single +%statement (i.e. derived from the same non terminal entry of the +%grammar) ended with ``\texttt{.}''. As we will see later in \MATITA{} +%this is not true, since each atomic tactic or punctuation is considered +%a single statement. \subsubsection{Common issues of tactic(als)-based proof languages} We will examine the two main problems of tactic(als)-based proof script: maintainability and readability. -Huge libraries of formal mathematics have been developed, and backward -compatibility is a really time consuming task. \\ -A real-life example in the history of \MATITA{} was the reordering of -goals opened by a tactic application. We noticed that some tactics -were not opening goals in the expected order. In particular the -\texttt{elim} tactic on a term of an inductive type with constructors -$c_1, \ldots, c_n$ used to open goals in order $g_1, g_n, g_{n-1} -\ldots, g_2$. The library of \MATITA{} was still in an embryonic state -but some theorems about integers were there. The inductive type of -$\mathcal{Z}$ has three constructors: $zero$, $pos$ and $neg$. All the -induction proofs on this type where written without tacticals and, -obviously, considering the three induction cases in the wrong order. -Fixing the behavior of the tactic broke the library and two days of -work were needed to make it compile again. The whole time was spent in -finding the list of tactics used to prove the third induction case and -swap it with the list of tactics used to prove the second case. If -the proofs was structured with the branch tactical this task could -have been done automatically. - -From this experience we learned that the use of tacticals for -structuring proofs gives some help but may have some drawbacks in -proof script readability. We must highlight that proof scripts -readability is poor by itself, but in conjunction with tacticals it -can be nearly impossible. The main cause is the fact that in proof -scripts there is no trace of what you are working on. It is not rare -for two different theorems to have the same proof script (while the -proof is completely different).\\ -Bad readability is not a big deal for the user while he is -constructing the proof, but is considerably a problem when he tries to -reread what he did or when he shows his work to someone else. The -workaround commonly used to read a script is to execute it again -step-by-step, so that you can see the proof goal changing and you can -follow the proof steps. This works fine until you reach a tactical. A -compound statement, made by some basic tactics glued with tacticals, -is executed in a single step, while it obviously performs lot of proof -steps. In the fist example of the previous section the whole branch -over the two goals (respectively the left and right part of the logic -and) result in a single step of execution. The workaround does not work -anymore unless you de-structure on the fly the proof, putting some -``\texttt{.}'' where you want the system to stop.\\ - -Now we can understand the tradeoff between script readability and -proof structuring with tacticals. Using tacticals helps in maintaining -scripts, but makes it really hard to read them again, cause of the way -they are executed. - -\MATITA{} uses a language of tactics and tacticals, but tries to avoid -this tradeoff, alluring the user to write structured proof without -making it impossible to read them again. +%Huge libraries of formal mathematics have been developed, and backward +%compatibility is a really time consuming task. \\ +%A real-life example in the history of \MATITA{} was the reordering of +%goals opened by a tactic application. We noticed that some tactics +%were not opening goals in the expected order. In particular the +%\texttt{elim} tactic on a term of an inductive type with constructors +%$c_1, \ldots, c_n$ used to open goals in order $g_1, g_n, g_{n-1} +%\ldots, g_2$. The library of \MATITA{} was still in an embryonic state +%but some theorems about integers were there. The inductive type of +%$\mathcal{Z}$ has three constructors: $zero$, $pos$ and $neg$. All the +%induction proofs on this type where written without tacticals and, +%obviously, considering the three induction cases in the wrong order. +%Fixing the behavior of the tactic broke the library and two days of +%work were needed to make it compile again. The whole time was spent in +%finding the list of tactics used to prove the third induction case and +%swap it with the list of tactics used to prove the second case. If +%the proofs was structured with the branch tactical this task could +%have been done automatically. +% +%From this experience we learned that the use of tacticals for +%structuring proofs gives some help but may have some drawbacks in +%proof script readability. + +Tacticals are not only used to make scripts shorter by factoring out +common cases and repeating commands. They are a primary way of making +scripts more mainteable. Moreover, they also have the well-known +role of structuring the proof. + +However, authoring a proof structured with tacticals is annoying. +Consider for example a proof by induction, and imagine you +are using one of the state of the art graphical interfaces for proof assistant +like Proof General. After applying the induction principle you have to choose: +immediately structure the proof or postpone the structuring. +If you decide for the former you have to apply the branching tactical and write +at once tactics for all the cases. Since the user does not even know the +generated goals yet, she can only replace all the cases with the identity +tactic and execute the command, just to receive feedback on the first +goal. Then she has to go one step back to replace the first identity +tactic with the wanted one and repeat the process until all the +branches are closed. + +One could imagine that a structured script is simpler to understand. +This is not the case. +A proof script, being not declarative, is not meant to be read. +However, the user has the need of explaining it to others. +This is achieved by interactively re-playing the script to show each +intermediate proof status. Tacticals make this operation uncomfortable. +Indeed, a tactical is executed atomically, while it is obvious that it +performs lot of smaller steps we are interested in. +To show the intermediate steps, the proof must be de-structured on the +fly, for example replacing ``\texttt{;}'' with ``\texttt{.}'' where +possible.\\ + +%Proof scripts +%readability is poor by itself, but in conjunction with tacticals it +%can be nearly impossible. The main cause is the fact that in proof +%scripts there is no trace of what you are working on. It is not rare +%for two different theorems to have the same proof script.\\ +%Bad readability is not a big deal for the user while he is +%constructing the proof, but is considerably a problem when he tries to +%reread what he did or when he shows his work to someone else. The +%workaround commonly used to read a script is to execute it again +%step-by-step, so that you can see the proof goal changing and you can +%follow the proof steps. This works fine until you reach a tactical. A +%compound statement, made by some basic tactics glued with tacticals, +%is executed in a single step, while it obviously performs lot of proof +%steps. In the fist example of the previous section the whole branch +%over the two goals (respectively the left and right part of the logic +%and) result in a single step of execution. The workaround does not work +%anymore unless you de-structure on the fly the proof, putting some +%``\texttt{.}'' where you want the system to stop.\\ + +%Now we can understand the tradeoff between script readability and +%proof structuring with tacticals. Using tacticals helps in maintaining +%scripts, but makes it really hard to read them again, cause of the way +%they are executed. + +\MATITA{} has a peculiar tacticals implementation that provides the +same benefits as classical tacticals, while not burdening the user +during proof authoring and re-playing. + +%\MATITA{} uses a language of tactics and tacticals, but tries to avoid +%this tradeoff, alluring the user to write structured proof without +%making it impossible to read them again. \subsubsection{The \MATITA{} approach: Tinycals} @@ -1687,51 +1709,41 @@ making it impossible to read them again. \MATITA{} tacticals syntax is reported in Tab.~\ref{tab:tacsyn}. While one would expect to find structured constructs like $\verb+do+~n~\NT{tactic}$ the syntax allows pieces of tacticals to be written. -This is essential for base idea behind \MATITA{} tacticals: step-by-step +This is essential for the base idea behind \MATITA{} tacticals: step-by-step execution. The low-level tacticals implementation of \MATITA{} allows a step-by-step execution of a tactical, that substantially means that a $\NT{block\_kind}$ is -not executed as an atomic operation. This has two major benefits for the user, -even being a so simple idea: -\begin{description} -\item[Proof structuring] - is much easier. Consider for example a proof by induction, and imagine you - are using classical tacticals in one of the state of the - art graphical interfaces for proof assistant like Proof General or \COQIDE. - After applying the induction principle you have to choose: structure - the proof or not. If you decide for the former you have to branch with - ``\texttt{[}'' and write tactics for all the cases separated by - ``\texttt{|}'' and then close the tactical with ``\texttt{]}''. - You can replace most of the cases by the identity tactic just to - concentrate only on the first goal, but you will have to go one step back and - one further every time you add something inside the tactical. Again this is - caused by the one step execution of tacticals and by the fact that to modify - the already executed script you have to undo one step. - And if you are board of doing so, you will finish in giving up structuring - the proof and write a plain list of tactics.\\ - With step-by-step tacticals you can apply the induction principle, and just - open the branching tactical ``\texttt{[}''. Then you can interact with the - system reaching a proof of the first case, without having to specify any - tactic for the other goals. When you have proved all the induction cases, you - close the branching tactical with ``\texttt{]}'' and you are done with a - structured proof. \\ - While \MATITA{} tacticals help in structuring proofs they allow you to - choose the amount of structure you want. There are no constraints imposed by - the system, and if the user wants he can even write completely plain proofs. +not executed as an atomic operation. This has major benefits for the +user during proof structuring and re-playing. + +For instance, reconsider the previous example of a proof by induction. +With step-by-step tacticals the user can apply the induction principle, and just +open the branching tactical ``\texttt{[}''. Then she can interact with the +system until the proof of the first case is terminated. After that +``\texttt{|}'' is used to move to the next goal, until all goals are +closed. After the last goal, the user closes the branching tactical with +``\texttt{]}'' and is done with a structured proof. \\ +While \MATITA{} tacticals help in structuring proofs they allow you to +choose the amount of structure you want. There are no constraints imposed by +the system, and if the user wants he can even write completely plain proofs. -\item[Rereading] - is possible. Going on step by step shows exactly what is going on. Consider - again a proof by induction, that starts applying the induction principle and - suddenly branches with a ``\texttt{[}''. This clearly separates all the - induction cases, but if the square brackets content is executed in one single - step you completely loose the possibility of rereading it and you have to - temporary remove the branching tactical to execute in a satisfying way the - branches. Again, executing step-by-step is the way you would like to review - the demonstration. Remember that understanding the proof from the script is - not easy, and only the execution of tactics (and the resulting transformed - goal) gives you the feeling of what is going on. -\end{description} +Re-playing a proof is also made simpler. There is no longer any need +to destructure the proof on the fly since \MATITA{} executes each +tactical not atomically. + +%\item[Rereading] +% is possible. Going on step by step shows exactly what is going on. Consider +% again a proof by induction, that starts applying the induction principle and +% suddenly branches with a ``\texttt{[}''. This clearly separates all the +% induction cases, but if the square brackets content is executed in one single +% step you completely loose the possibility of rereading it and you have to +% temporary remove the branching tactical to execute in a satisfying way the +% branches. Again, executing step-by-step is the way you would like to review +% the demonstration. Remember that understanding the proof from the script is +% not easy, and only the execution of tactics (and the resulting transformed +% goal) gives you the feeling of what is going on. +%\end{description} \section{Standard library} \label{sec:stdlib} -- 2.39.2