1 \documentclass[runningheads]{llncs}
7 % \myincludegraphics{filename}{place}{width}{caption}{label}
8 \newcommand{\myincludegraphics}[5]{
11 \includegraphics[width=#3]{eps/#1.eps}
19 \usepackage{draftstamp}
21 \newcommand{\ws}{Web-Service}
22 \newcommand{\wss}{Web-Services}
23 \newcommand{\hbugs}{H-Bugs}
24 \newcommand{\helm}{HELM}
25 \newcommand{\Omegapp}{$\Omega$mega}
26 \newcommand{\OmegaAnts}{$\Omega$mega-Ants}
28 \title{Brokers and Web-Services for Automatic Deduction: a Case Study}
30 \author{Claudio Sacerdoti Coen \and Stefano Zacchiroli}
33 Department of Computer Science\\
34 University of Bologna\\
35 Via di Mura Anteo Zamboni 7, 40127 Bologna, ITALY\\
36 \email{sacerdot@cs.unibo.it}
38 Department of Computer Science\\
39 \'Ecole Normale Sup\'erieure\\
40 45, Rue d'Ulm, F-75230 Paris Cedex 05, FRANCE\\
41 \email{zack@cs.unibo.it}
51 We present a planning broker and several Web-Services for automatic deduction.
52 Each Web-Service implements one of the tactics usually available in an
53 interactive proof-assistant. When the broker is submitted a "proof status" (an
54 unfinished proof tree and a focus on an open goal) it dispatches the proof to
55 the Web-Services, collects the successfull results, and send them back to the
56 client as "hints" as soon as they are available.
58 In our experience this architecture turns out to be helpful both for
59 experienced users (who can take benefit of distributing heavy computations)
60 and beginners (who can learn from it).
63 \section{Introduction}
64 The \ws{} approach at software development seems to be a working solution for
65 getting rid of a wide range of incompatibilities between communicating
66 software applications. W3C's efforts in standardizing related technologies
67 grant longevity and implementations availability for frameworks based on \wss{}\
68 for information exchange. As a direct conseguence, the number of such
69 frameworks is increasing and the World Wide Web is moving from a disorganized
70 repository of human-understandable HTML documents to a disorganized repository
71 of applications working on machine-understandable XML documents both for input
74 The big challenge for the next future is to provide stable and reliable
75 services over this disorganized, unreliable and ever-evolving architecture.
76 The standard solution \ednote{zack: buhm! :-P} is providing a further level of
77 stable services (called \emph{brokers}) that behave as common gateway/address
78 for client applications to access a wide variety of services and abstract over
81 Since the \emph{Declaration of Linz}, the MONET Consortium \cite{MONET},
82 following the guidelines \ednote{guidelines non e' molto appropriato, dato che
83 il concetto di broker non e' definito da W3C e co} of the \wss{}/brokers
84 approach, is working on the development of a framework aimed at providing a
85 set of software tools for the advertisement and discovering of mathematical
87 %CSC This framework turns out to be strongly based on both \wss{} and brokers.
89 Several groups have already developed \wss{} providing both computational and
90 reasoning capabilities \cite{???,???,???}\ednote{trovare dei puntatori carini
91 dalle conferenze calculemus}: the first ones are implemented on top of
92 Computer Algebra Systems; the last ones provide interfaces to well-known
93 theorem provers. Proof-planners, proof-assistants, CAS themselves and
94 domain-specific problem solvers are natural candidates to be client of these
95 services. Nevertheless, so far the number of examples in the literature has
96 been extremely low and the concrete benefits are still to be assessed.
98 In this paper we present an architecture, namely \hbugs{}, implementing a
99 \emph{suggestion engine} for the proof assistant developed on behalf of the
100 \helm{} project. We provide several \wss{} (called \emph{tutors}) able to
101 suggest possible ways to proceed in a proof. The tutors are orchestrated
102 by a broker (a \ws{} itself) that is able to distribute a proof
103 status from a client (the proof-assistant) to the tutors;
104 each tutor try to make progress in the proof and, in case
105 of success, notify the client that shows an \emph{hint} to the user.
106 Both the broker and the tutors are instances of the homonymous entities of
109 A precursor of \hbugs{} is the \OmegaAnts{} project \cite{???},
110 which provided similar functionalities to the
111 \Omegapp{} proof-planner \cite{Omega}. The main architectural difference
112 between \hbugs{} and \OmegaAnts{} is that the latter is based on a
113 black-board architecture and it is not implemented using \wss{} and
114 brokers. Other differences will be detailed in Sect. \ref{conclusions}.
116 In Sect. \ref{architecture} we present the architecture of \hbugs{}.
117 Further implementation details are given in Sect. \ref{implementation}.
118 Sect. \ref{tutors} is an overview of the tutors that have been implemented.
119 As usual, the paper ends with the conclusions and future works.
122 {CSC: Non so se/dove mettere queste parti.
123 Zack: per ora facciamo senza e vediamo se/quanto spazio abbiamo, la prima parte
124 non e' molto utile, ma la seconda sugli usi tipici di proof assistant
127 Despite of that the proof assistant case seems to be well suited to
128 investigate the usage of many different mathematical \wss{}. Indeed: most proof
129 assistants are still based on non-client/server architectures, are
130 application-centric instead of document-centric, offer a scarce level of
131 automation leaving entirely to the user the choice of which macro (usually
132 called \emph{tactic}) to use in order to make progress in a proof.
134 The average proof assistant can be, for example, a client of a \ws{}\
135 interfacing a specific or generic purpose theorem prover, or a client of a
136 \ws{} interfacing a CAS to simplify expressions in a particular mathematical
140 \section{Architecture}
142 \myincludegraphics{arch}{t}{8cm}{\hbugs{} architecture}{\hbugs{} architecture}
144 The \hbugs{} architecture (depicted in Fig. \ref{arch}) is based on three
145 different kinds of actors: \emph{clients}, \emph{brokers}, and \emph{tutors}.
146 Each actor present one or more \ws{} interface to its neighbours \hbugs{}
149 In this section we will detail the role and requiremente of each kind of
150 actors and discuss about the correspondencies between them and the MONET
151 entities described in \cite{MONET-Overview}.
154 An \hbugs{} client is a software component able to produce \emph{proof
155 status} and to consume \emph{hints}.
157 \ednote{"status" ha il plurale?}
158 A proof status is a representation of an incomplete proof and is supposed to
159 be informative enough to be used by an interactive proof assistant. No
160 additional requirement exists on the proof status, but there should be an
161 agreement on its format between clients and tutors. An hint is a
162 representation of a step that can be performed in order to proceed in an
163 incomplete proof. Usually it represents a tactic available on some proof
164 assistant along with instantiation of its parameters for tactics which
167 Clients act both as \ws{} provider and requester (using W3C's terminology
168 \cite{ws-glossary}). They act as providers for the broker (to receive hints)
169 ans as requesters again for the broker (to submit new status). Clients
170 additionally use broker service to know which tutors are available and to
171 subscribe to one or more of them.
173 Usually, when the role of client is taken by an interactive proof assistant,
174 new status are sent to the broker as soon as the proof change (e.g. when the
175 user applies a tactic or when a new proof is started) and hints are shown to
176 the user be the means of some effect in the user interface (e.g. popping a
177 dialog box or enlightening a tactic button).
179 \hbugs{} clients act as MONET clients and ask broker to provide access to a
180 set of services (the tutors). \hbugs{} has no actors corresponding to MONET's
181 Broker Locating Service (since the client is supposed to know the URI of at
182 least one broker) and Mathematical Object Manager (since proof status are
183 built on the fly and not stored).
186 \hbugs{} brokers are the key actors of the \hbugs{} architecture since they
187 act as intermediaries between clients and tutors. They act both as \ws{}
188 provider and requester \emph{both} for clients and tutors.
190 With respect to client, a broker act as \ws{} provider receiving status and
191 forwarding them to one or more tutors. They act as \ws{} requester sending
192 hints to client as soon as they are available from tutors.
194 With respect to tutors, the \ws{} provider role is accomplished by receiving
195 hints as soon as they are produced; the \ws{} requester one is accomplished
196 by requesting computations (\emph{musings} in \hbugs{} terminology) on status
197 received by clients and stopping out of date ongoing musings.
199 Additionally brokers keep track of available tutors and subscription of each
202 \hbugs{} brokers act as MONET brokers implementing the following components:
203 Client Manager, Service Registry Manager (keeping track of available
204 tutors), Planning Manager (chosing the available tutors among the ones to
205 which the client is subscribed), Execution Manager. \ednote{non e' chiaro se
206 in monet le risposte siano sincrone o meno, se cosi' fosse dobbiamo
207 specificare che nel nostro caso non lo sono}
210 Tutors are software component able to consume proof status producing hints.
211 \hbugs{} doesn't specify by which means hints should be produced, tutors can
212 use any means necessary (heuristics, external theorem prover or CAS, ...).
213 The only requirement is that exists an agreement on the formats of proof
216 tutors act both as \ws{} providers and requesters for broker. As providers,
217 they wait for commands requesting to start a new musing on a given proof
218 status or to stop an old, out of date, musing. As requesters, they signal to
219 the broker the end of a musing along with its outcome (an hint in case of
220 success or a notification of failure).
222 \hbugs{} tutors act as MONET services.
224 \section{Implementation details}
225 \label{implementation}
226 \ednote{Zack: l'aspetto grafico di questa parte e' un po' peso, possiamo
227 aggiungere varie immagini volendo, e.g.: schema dei thread di un tutor, sample
228 code di un tutor generato automaticamente}
229 \ednote{Zack: ho cambiato idea riguardo a questa parte, mettere la lista delle
230 interfacce di client/broker/... mi sembrava sterile e palloso. Di conseguenza ho
231 messo i punti chiave dell'implementazione, dimmi cosa te ne pare ...}
232 In this section we present some of the most relevant implementation details of
233 the \hbugs{} architecture.
236 \paragraph{Proof status}
237 In our implementation of the \hbugs{} architecture we used the proof
238 assistant of the \helm{} project (codename ``gTopLevel'') as an \hbugs{}
239 client. Thus we have implemented serialization/deserialization capabilities
240 fot its internal status. In order to be able to describe \wss{} that
241 exchange status in WSDL using the XML Schema type system, we have chosen an
242 XML format as the target format for the serialization.
244 A schematic representation of the gTopLevel internal status is depicted in
245 Fig. \ref{status}. Each proof is representated by a 4-tuple: \emph{uri},
246 \emph{metasenv}, \emph{proof}, \emph{term}.
248 \myincludegraphics{status}{t}{8cm}{gTopLevel proof status}{gTopLevel proof
252 \item[uri]: an URI chosen by the user at the beginning of the proof
253 process. Once (and if) proved, that URI will globally identify the term
254 inside the \helm{} framework (given that the user decide to save it).
255 \item[metasenv]: a list of opened \emph{goals}, in order to complete the
256 proof the user has to fill them all. Each of them is identified by an
257 integer index, has a context (\emph{meta}) and a type (\emph{goal}). The
258 context is in turn represented as a list of named hypotheses
259 (declarations and definitions).
260 \item[proof]: the current incomplete proof tree. It can contain
261 metavariables (\emph{holes}) that need to be proved in order to complete
262 the proof. Each metavariables appearing in the tree reference one
263 element of the current metasenv.
264 \item[term]: the term that the ongoing proof aims to prove
267 Each of these information is represented in XML as described in
268 \cite{csc-thesis}. Additionally \hbugs{} status carry an integer value
269 referencing one of the opened goal; using this value is possible to
270 implement different client side strategies: the user could ask the tutors to
271 work on the same goal as the user or to work on ``background'' goals.
274 An hint in the \hbugs{} architecture should carry enough information to
275 permit to the client to make progress in the current proof. In our
276 implementation each hint reference one of the tactic available to the user
279 For tactics that don't require any particular argument (like tactics that
280 apply type constructors or try to automatically conclude equality goals)
281 only the tactic name is represented in the hint. For tactic that need terms
282 as arguments (for example the \emph{Apply} tactic) the hint includes a
283 textual representation of them, using the same representation used by the
284 interactive proof assistent when querying user for terms.
286 Hints are serialized to an XML format for the same reasons reported for the
289 Actually is also possible for a tutor to return more hints at once, grouping
290 them in a particular XML element. This feature turns out to be particulary
291 useful for the \emph{searchPatternApply} tutor (Sect. \ref{tutors}) that
292 query a term database and return to the client a list of all terms that
293 could be used to complete the proof. This particular hint is encoded as a
294 list of Apply hints, each of them having one of the results as term
297 We would like to stress once more that neither the hint nor the status
298 representation are fixed by the \hbugs{} architecture. Is only necessary
299 that a common format exists and is understood by both clients and tutors.
301 Additionally, please note that there is no ``trust'' assumed on tutors'
302 hints. Being encoded as references to available tactics and argument imply
303 that an \hbugs{} client, on receipt of an hint, simply try to reply the work
304 done by a tutor on the local copy of the proof. The application of the hint
305 can even fail to type check and the client copy of the proof can be left
306 undamaged after spotting the error (assuming that client's type checking
307 implementation is correct, though).
309 Other choices of hints encoding (like sending the \emph{result} of the hint
310 application), even if reducing total computation time, don't benefit of this
313 \paragraph{Registries}
314 Being central in the \hbugs{} architecture, the broker is also responsible
315 of accounting operations both for clients and tutors. These operations are
316 implemented using three different data structures called \emph{registries}:
317 clients registry, tutors registry and musings registry.
319 In order to use the suggestion engine a client should register itself to the
320 broker and subscribe to one or more tutors. The registration phase is
321 triggered by the client using the \emph{Register\_client} method of the
322 broker to send him an unique identifier and the base URI at which client's
323 \ws{} is available. After the registration, the client can use broker's
324 \emph{List\_tutors} method to get a list of available tutors. Eventually the
325 client can subscribe to one or more of these using broker's \emph{Subscribe}
326 method. Clients can also unregister from brokers using
327 \emph{Unregister\_client} method.
329 The broker keeps track of both registered clients and clients' subscriptions
330 in the clients registry.
332 In order to be advertised to clients during the subscription phase, tutors
333 should register to the broker using broker's \emph{Register\_tutor} method.
334 This method is really similar to the already seen Register\_client: tutors
335 are required to send an unique identify and a base URI for their \ws{}.
336 Additionally tutors are required to send an human readable description of
337 their capabilities, this information could be used by client's user to
338 decide which tutors he need to subscribe to. Like clients, tutors can
339 unregister from brokers using \emph{Unregister\_broker} method.
341 Track of available tutors is kept in the tutors registry.
343 Each time a new status is available from the client, it is sent to the
344 broker using its \emph{Status} method. Using both clients registry (to
345 lookup client's subscription) and tutors registry (to check if some tutors
346 has unsubscribed), the broker is able to decide to which tutor forward the
347 new status just received.
349 The forwarding operation is performed using tutors' \emph{Start\_musing}
350 method, that is a request to start a new computation (\emph{musing}) on a
351 given status. The return value for Start\_musing method is a musing
352 identifier that is saved in the musings registry along with the identifier
353 of the client that triggered the starting of the musing.
355 As soon as a musing is completed, the owning tutor informs the broker using
356 its \emph{Musing\_completed} method, the broker can now remove the musing
357 entry from the musings registry and, depending on its outcome, inform the
358 client. In case of success one of the Musing\_completed arguments is an hint
359 to be sent to the client, otherwise there's no need to inform him and the
360 Musing\_completed method is just to update the musings registry.
362 Consulting the musings registry, the tutor is able to know, at each time,
363 which musings are in execution on which tutor. This peculiarity is exploited
364 by the broker on invocation of Status method. Receiving a new status from
365 the client imply indeed that the previous status no longer exists and all
366 musings working on it should be stopped. Additionally to the already
367 described behaviour (i.e. starting new musings on the received status), the
368 tutor takes also care of stopping ongoing computation invoking
369 \emph{Stop\_musing} tutors' method.
372 As already discussed, all \hbugs{} actors act as \wss{} offering one or more
373 services to neighbour actors. To grant as most accessibility as possible to
374 our \wss{} we have chosen to bind them using the HTTP/POST bindings
375 described in \cite{????}.
377 Given that our proof assistant was entirely developed in the Objective Caml
378 language, we have chosen to develop also \hbugs{} in that language in order
379 to maximize code reuse. To develop \wss{} in Objective Caml we have
380 developed an auxiliary generic library (\emph{O'HTTP}) that can be used to
381 write HTTP 1.1 web servers and abstract over GET/POST parsing. This library
382 supports different kind of web servers architecture, including multi-process
383 and multi-threaded ones.
386 Each tutor expose a \ws{} interface and should be able to work, not only for
387 many different clients referring to a common broker, but also for many
388 different brokers. The potential number of clients is therefore very high
389 and is simply too non scalable to think at a single-process architecture.
391 Our choice was for a multi threaded architecture exploiting capabilities of
392 the O'HTTP library. Each tutor is composed by two thread always running plus
393 an additional thread for each running musing. One thread is devoted to
394 listening for incoming \ws{} request, upon correct receiving requests it
395 pass the control to the second always-running thread which handle the pool
396 of running musings. When a new musing is requested, a new thread is spawned
397 to work them out; when a request to interrupt an old musing is received, the
398 thread actually running them is killed freeing its resources.
400 This architecture turns out to be scalable and permit at the running threads
401 to share the type checker cache \ednote{Zack: brrrr ... abbiamo il coraggio
402 di dirlo?, mai provato :-)}. This feature turns out to be really useful
403 expecially for reflexive tactics, like Ring and Fourier, that ``reflect''
404 terms of some domain (e.g. polynomials over real numbers) in other abstract
405 domain. Type checking of most of the terms and operation in the abstract
406 domain can be indeed cached and shared by all the threads.
408 The implementation of a tutor with the described architecture is not that
409 difficult having a language with good threading capabilities (as OCaml has)
410 and a pool of implemented tactics (as gTopLevel has). Still working with
411 threads is known to be really error prone due to concurrent programming
412 intrinsic complexity.
414 To get rid of this technical problem we have written (and thoroughly tested)
415 a ``generic'' implementation of a tutor, and templated (using O'Caml
416 functors) them over some parameters like the tactic to be invoked, the hint
417 to send in case of success and a textual description of the tutor. This
418 generic implementation use O'HTTP to implement the \ws{} part of the tutor
419 and some generic routines we have implemented to handle deserialization of
420 received status and serialization of generated hints.
422 Having a list of available tactics and hints we are able to generate
423 automatically source code that implement all the requested tutors.
424 Obviously, this approach works only for the implementation of ``simple'' (or
425 ``stupid'', if you like) tutors that simply interface an already existing
426 proof assistant tactic and can't be used to implement more raffinate
427 decision procedures. Nonetheless, given that the number of available tactics
428 was quite high, it offered a lot of practical advantages.
430 \section{The \hbugs Tutors}
432 To test the \hbugs architecture and to assess the utility of a suggestion
433 engine for the end user, we have implemented several tutors. In particular,
434 we have investigated three classes of tutors:
436 \item \emph{Tutors for beginners}. These are tutors that implement tactics
437 which are neither computationally expensive nor difficult to understand:
438 an expert user can always understand if the tactic can be applied or not
439 withouth having to try it. For example, the following implemented tutors
440 belong to this class:
442 \item \emph{Assumption Tutor}: it ends the proof if the thesis is
443 equivalent\footnote{The equivalence relation is convertibility. In some
444 cases, expecially when non-trivial computations are involved, the user
445 is totally unable to figure out the convertibility of two terms.
446 In these cases the tutor becomes handy also for expert users.}
447 to one of the hypotheses.
448 \item \emph{Contradiction Tutor}: it ends the proof by \emph{reductio ad
449 adsurdum} if one hypothesis is equivalent to $False$.
450 \item \emph{Symmetry Tutor}: if the goal thesis is an equality, it
451 suggests to apply the commutative property.
452 \item \emph{Left/Right/Exists/Split/Constructor Tutors}: the Constructor
453 Tutor suggests to proceed in the proof by applying one or more
454 constructors when the goal thesis is an inductive type or a
455 proposition inductively defined according to the declarative style.
456 Since disjunction, conjunction and existential quantifications are
457 particular cases of inductive propositions,
458 all the other tutors of this class implements restrictions of the
459 the Constructor tactic: Left and Right suggest to prove a disjunction
460 by proving its left/right member; Split reduces the proof of a
461 conjunction to the two proof of its members; Exists suggests to
462 prove an existential quantification by providing a
463 witness\footnote{This task is left to the user.}.
465 Beginners, when first faced with a tactic-based proof-assistant, get
466 lost quite soon since the set of tactics is large and their names and
467 semantics must be remembered by heart. Tutorials are provided to guide
468 the user step-by-step in a few proofs, suggesting the tactics that must
469 be used. We believe that our beginners tutors can provide an auxiliary
470 learning tool: after the tutorial, the user is not suddendly left alone
471 with the system, but she can experiment with variations of the proof given
472 in the tutorial as much as she like, still getting useful suggestions.
473 Thus the user is allowed to focus on learning how to do a formal proof
474 instead of wasting efforts trying to remember the interface to the system.
475 \item{Tutors for Computationally Expensive Tactics}. Several tactics have
476 an unpredictable behaviour, in the sense that it is unfeasible to understand
477 wether they will succeed or they will failt when applied and what will be
478 their result. Among them, there are several tactics either computationally
479 expensive or resources consuming. In the first case, the user is not
480 willing to try a tactic and wait for a long time just to understand its
481 outcome: she would prefer to keep on concentrating on the proof and
482 have the tactic applied in background and receive out-of-band notification
483 of its success. The second case is similar, but the tactic application must
484 be performed on a remote machine to avoid overloading the user host
485 with several concurrent resource consuming applications.
487 Finally, several complex tactics and in particular all the tactics based
488 on reflexive techniques depend on a pretty large set of definitions, lemmas
489 and theorems. When these tactics are applied, the system needs to retrieve
490 and load all the lemmas. Pre-loading all the material needed by every
491 tactic can quickly lead to long initialization times and to large memory
492 footstamps. A specialized tutor running on a remote machine, instead,
493 can easily pre-load the required theorems.
495 As an example of computationally expensive task, we have implemented
496 a tutor for the \emph{Ring} tactic. The tutor is able to prove an equality
497 over a ring by reducing both members to a common normal form. The reduction,
498 which may require some time in complex cases,
499 is based on the usual commutative, associative and neutral element properties
500 of a ring. The tactic is implemented using a reflexive technique, which
501 means that the reduction trace is not stored in the proof-object itself:
502 the type-checker is able to perform the reduction on-the-fly thanks to
503 the conversion rules of the system. As a consequence, in the library there
504 must be stored both the algorithm used for the reduction and the proof of
505 correctness of the algorithm, based on the ring axioms. This big proof
506 and all of its lemmas must be retrieved and loaded in order to apply the
507 tactic. The Ring tutor loads and cache all the required theorems the
508 first time it is conctacted.
509 \item{Intelligent Tutors}. Expert users can already benefit from the previous
510 class of tutors. Nevertheless, to achieve a significative production gain,
511 they need more intelligent tutors implementing domain-specific theorem
512 provers or able to perform complex computations. These tutors are not just
513 plain implementations of tactics or decision procedures, but can be
514 more complex software agents interacting with third-parties software,
515 such as proof-planners, CAS or theorem-provers.
517 To test the productivity impact of intelligent tutors, we have implemented
518 a tutor that is interfaced with the HELM
519 Proof-Engine\footnote{\url{http://mowgli.cs.unibo.it/library.html}} and that
520 is able to look for every theorem in the distributed library that can
521 be applied to proceed in the proof. Even if the tutor deductive power
522 is extremely limited\footnote{We do not attempt to check if the new goals
523 obtained applying a lemma can be authomatically proved or, even better,
524 auhomatically disproved to reject the lemma.}, it is not unusual for
525 the tutor to come up with precious hints that can save several minutes of
526 works that would be spent in proving again already proven results or
527 figuring out where the lemmas could have been stored in the library.
530 Once the code that implements a tactic or decision procedure is available,
531 transforming it into a tutor is not a difficult task, but it is surely
532 repetitive and error prone.\ednote{Zack: perche' hai messo qui questa parte?
533 Vuoi riportare qui la generazione automatica dei tutor? per ora lo messo nella
534 sezione implementazione}
539 % \section{Introduction}
540 % Since the development of the first proof-assistants based on the
541 % Curry-Howard isomorphism, it became clear that directly writing
542 % lambda-terms (henceforth called simply terms) was a difficult, repetitive,
543 % time-expensive and error prone activity; hence the introduction of
544 % meta-languages to describe procedures that are able to automatically
545 % generate the low-level terms.
546 % Nowadays, almost all the proof-assistants
547 % using terms description of proofs have many levels of abstractions, i.e.
548 % meta-languages, to create the terms, with the only remarkable exception
549 % of the ALF family \cite{ALF} in which terms are still directly written
550 % without any over-standing level of abstraction.
552 % In particular, there are
553 % usually at least two levels, that of tactics and that of the language in
554 % which the whole system and hence the tactics are written; once the tactics
555 % are implemented, to do a proof the user enters to the system a sequence of
556 % tactics, called a script; the script is then executed to create the term
557 % encoding of the proof, which is type-checked by the kernel of the
558 % proof-assistant. Writing a script interactively is much simpler than
559 % writing the term by hands; once written, though, it becomes impossible
560 % to understand a script without replaying it interactively. For this reason,
561 % we can hardly speak of the language of tactics as a high level language,
562 % even if it is ``compiled'' to the language of terms.
564 % To avoid confusion, in the rest of this paper we will
565 % avoid the use of the term ``proof'', using only ``script'' and ``term'';
566 % moreover, we will avoid also the terms ``proof-checking'' and ``type-checking'',
567 % replacing them with ``retyping'' to mean the type-checking of an already
568 % generated term; finally, we will use the term ``compiling'' to mean the
569 % execution of a script. Usually, compilation ends with
570 % the type-checking of the generated term; hence the choice of ``retyping'' for
571 % the type-checking of an already generated term.
573 % A long term consequence of the introduction of tactics has been the
574 % progressive lowering of interest in terms. In particular, users of
575 % modern proof-assistants as Coq \cite{Coq} may even ignore the existence of
576 % commands to show the terms generated by their scripts. These terms are
577 % usually very huge and quite unreadable, so they don't add any easily
578 % accessible information to the scripts. Hence implementors have loosed
579 % interest in terms too, as far as their size and compilation time are
580 % small enough to make the system response-time acceptable.
581 % When this is not the case, it is sometimes possible to trade space
582 % with time using reflexive tactics, in which the potentiality of complex
583 % type-systems to speak about themselves are exploited: reflexive tactics
584 % usually leads to a polynomial asympotical reduction in the size of the
585 % terms, at the cost of an increased reduction time during retyping and
586 % compilation. Often, though, reflexive tactics could not be used; moreover,
587 % reflexive tactics suffer of the same implementative problems of other tactics
588 % and so could be implemented in such a way to create huger than
589 % needed terms, even if asymptotically smaller than the best ones created
590 % without reflection.
592 % The low level of interest in terms of the implementors of
593 % tactics often leads to naive implementations (``If it works, it is OK'') and
594 % this to low-quality terms, which:
596 % \item are huger than needed because particular cases are not taken
597 % into account during tactic development
598 % \item require more time than needed for retyping due to
600 % \item are particularly unreadable because they don't
601 % correspond to the ``natural'' way of writing the proof by hand
603 % To cope with the second problem, retyping is usually avoided allowing
604 % systems to reload saved terms without retyping them and using a
605 % checksum to ensure that the saved file has not been modified. This is
606 % perfectly reasonable accordingly to the traditional application-centric
607 % architecture of proof-assistants in which you have only one tool integrating
608 % all the functionalities and so you are free to use a proprietary format for data
611 % In the last months, though, an ever increasing number of people and projects
612 % (see, for example, HELM \cite{EHELM}, MathWeb \cite{MATHWEB} and
613 % Formavie \cite{FORMAVIE})
614 % have been interested to switch from the application-centric model to the
615 % newer content-centric one, in which information is stored in
616 % standard formats (that is, XML based) to allow different applications to work
617 % on the same set of data. As a consequence, term size really becomes an
618 % important issue, due to the redundancy of standard formats, and retyping
619 % is needed because the applications can not trust each other, hence needing
620 % retyping and making retyping time critical.
621 % Moreover, as showed by Yann Coscoy in its PhD.
622 % thesis \cite{YANNTHESIS} or by the Alfa interface to the Agda system
623 % \cite{ALFA}, it is perfectly reasonable and also feasible to try
624 % to produce descriptions in natural languages of formal proofs encoded as
626 % This approach, combined with the further possibility of applying the usual
627 % two-dimensional mathematic notation to the formulas that appears in the
628 % terms, is being followed by projects HELM \cite{HELM}, PCOQ \cite{PCOQ} and
629 % MathWeb \cite{MATHWEB} with promising results. It must be understood, though,
630 % that the quality (in terms of naturality and readability) of this kind of
631 % proofs rendering heavily depends on the quality of terms, making also
632 % the third characteristic of low-quality terms a critical issue.
634 % A totally different scenario in which term size and retyping time are
635 % critical is the one introduced by Necula and Lee \cite{Necula} under
636 % the name Proof Carrying Code (PCC). PCC is a technique that can be used
637 % for safe execution of untrusted code. In a typical instance of PCC, a code
638 % receiver establishes a set of safety rules that guarantee safe behavior
639 % of programs, and the code producer creates a formal safety proof that
640 % proves, for the untrusted code, adherence to the safety rules. Then, the
641 % proof is transmitted to the receiver together with the code and it is
642 % retyped before code execution. While very compact representation of the
643 % terms, highly based on type-inference and unification, could be used
644 % to reduce the size and retyping time \cite{Necula2}, designing proof-assistants
645 % to produce terms characterized by an high level of quality is still necessary.
647 % In the next section we introduce a particular class of metrics for
648 % tactics evaluation. In section \ref{equivalence} we consider the
649 % notion of tactics equivalence and we describe one of the bad habits
650 % of tactics implementors, which is overkilling; we also provide and analyze
651 % a simple example of overkilling tactic. In the last section we describe
652 % a concrete experience of fixing overkilling in the implementation of a
653 % reflexive tactic in system Coq and we analyze the gain with respect to
654 % term-size, retyping time and term readability.
656 % \section{From Metrics for Terms Evaluation to Metrics for Tactics Evaluation}
657 % The aim of this section is to show how metrics for term evaluation could
658 % induce metrics for tactic evaluation. Roughly speaking, this allows us to
659 % valuate tactics in terms of the quality of the terms produced. Even if
660 % we think that these kinds of metrics are interesting and worth studying,
661 % it must be understood that many other valuable forms of metrics could be
662 % defined on tactics, depending on what we are interested in. For example,
663 % we could be interested on compilation time, that is the sum of the time
664 % required to generate the term and the retyping time for it. Clearly, only
665 % the second component could be measured with a term metric and a good
666 % implementation of a tactic with respect to the metric considered could
667 % effectively decide to sacrifice term quality (and hence retyping time) to
668 % minimize the time spent to generate the term. The situation is very close to
669 % the one already encountered in compilers implementation, where there is
670 % always a compromise, usually user configurable, between minimizing compiling
671 % time and maximizing code quality.
673 % The section is organized as follows: first we recall the definition of
674 % tactic and we introduce metrics on terms; then we give the definition
675 % of some metrics induced by term metrics on tactics.
677 % \begin{definition}[Of tactic]
678 % We define a \emph{tactic} as a function that, given a goal $G$ (that is, a local
679 % context plus a type to inhabit) that satisfies a list of assumptions
680 % (preconditions) $P$,
681 % returns a couple $(L,t)$ where
682 % $L = {L_1,\ldots,L_n}$ is a (possible empty) list of proof-obligations
683 % (i.e. goals) and $t$ is a function that, given a list $l = {l_1,\ldots,l_n}$
684 % of terms such that $l_i$ inhabits\footnote{We say, with a small abuse of
685 % language, that a term $t$ inhabits a goal $G=(\Gamma,T)$ when $t$ is of type
686 % $T$ in the context $\Gamma$.} $L_i$ for each $i$ in ${1,\ldots,n}$, returns a
687 % term $t(l)$ inhabiting $G$.
690 % % You can get another looser definition of tactic just saying that a tactic
691 % % is a partial function instead than a total one. In this paper when referring
692 % % to tactics we always refer to the first definition.
694 % \begin{definition}[Of term metric]
695 % For any goal $G$, a term metric $\mu_G$ is any function in
696 % $\mathbb{N}^{\{t/t\;\mbox{inhabits}\;G\}}$.
697 % Two important class of term metrics are functional metrics and monotone
700 % \item {\bf Functional metrics:} a metric $\mu_G$ is \emph{functional}
701 % if for each term context (=term with an hole) $C[]$ and for all terms
702 % $t_1$,$t_2$ if $\mu_G(t_1)=\mu_G(t_2)$ then $\mu_G(C[t_1])=\mu_G(C[t_2])$.
703 % An equivalent definition is that a metric $\mu_G$ is functional
704 % if for each term context $C[]$ the function
705 % $\lambda t.\mu_G(C[\mu_G^{-1}(t)])$ is well defined.
706 % \item {\bf Monotone metrics:} a metric $\mu_G$ is \emph{monotone} if for
707 % each term context $C[]$ and for all terms $t_1$,$t_2$ if
708 % $\mu_G(t_1)\le\mu_G(t_2)$ then $\mu_G(C[t_1])\le\mu_G(C[t_2])$.
709 % An equivalent definition is that a metric $\mu_G$ is functional if
710 % for each term context $C[]$ the function
711 % $\lambda t.\mu_G(C[\mu_G^{-1}(t)])$ is well defined and monotone.
714 % % Vecchie definizioni
716 % %\item {\bf Monotony:} a metric $\mu_G$ is \emph{monotone} if for each term
717 % %context (=term with an hole) $C[]$ and for all terms $t_1$,$t_2$ if
718 % %$\mu_G(t_1)\le\mu_G(t_2)$ then $\mu_G(C[t_1])\le\mu_G(C[t_2])$
719 % %\item {\bf Strict monotony:} a monotone metric $\mu_G$ is \emph{strictly
720 % %monotone} if for each term context $C[]$ and
721 % %for all terms $t_1$,$t_2$ if $\mu_G(t_1)=\mu_G(t_2)$ then
722 % %$\mu_G(C[t_1])=\mu_G(C[t_2])$
726 % Typical examples of term metrics are the size of a term, the time required
727 % to retype it or even an estimate of its ``naturality'' (or simplicity) to be
728 % defined somehow; the first two are also examples of monotone metrics
729 % and the third one could probably be defined as to be. So, in the rest
730 % of this paper, we will restrict to monotone metrics, even if the
731 % following definitions also work with weaker properties for general metrics.
732 % Here, however, we are not interested in defining such metrics, but in showing
733 % how they naturally induce metrics for tactics.
735 % Once a term metric is chosen, we get the notion of a best term (not unique!)
737 % \begin{definition}[Of best terms inhabiting a goal]
738 % the term $t$ inhabiting a goal $G$ is said to be a best term inhabiting $G$
739 % w.r.t. the metric $\mu_G$ when
740 % $\mu_G(t) = min\{\mu_G(t') / t'\;\mbox{inhabits}\;G\}$.
743 % Using the previous notion, we can confront the behavior of two tactics
747 % Let $\tau_1$ and $\tau_2$ be two tactics both applyable to a goal $G$
748 % such that $\tau_1(G) = (L_1,t_1)$ and $\tau_2(G) = (L_2,t_2)$.
749 % We say that $\tau_1$ is better or equal than $\tau_2$ for the goal $G$
750 % with respect to $\mu_G$ if, for all $l_1$ and $l_2$ lists of best terms
751 % inhabiting respectively $L_1$ and $L_2$, $\mu_G(t_1(l_1)) \le \mu_G(t_2(l_2))$
754 % Note that confronting in this way ``equivalent'' tactics
755 % (whose definition is precised in the next section) gives us information
756 % on which implementation is better; doing the same thing on tactics that
757 % are not equivalent, instead, gives us information about what tactic to
758 % apply to obtain the best proof.
760 % A (functional) metric to confront two tactics only on a particular goal
761 % has the nice property to be a total order, but is quite useless. Hence,
763 % define a bunch of different tactic metrics induced by term metrics
764 % that can be used to confront the behavior of tactics when applied to
765 % a generic goal. Some of them will be \emph{deterministic} partial orders;
766 % others will be total orders, but will provide only a \emph{probabilistic}
767 % estimate of the behavior. Both kinds of metrics are useful in practice
768 % when rating tactics implementation.
769 % \begin{definition}[Of locally deterministic better or equal tactic]
770 % $\tau_1$ is a locally deterministic (or locally
771 % uniform) better or equal tactic
772 % than $\tau_2$ w.r.t. $\mu$ (and in that case we write $\tau_1 \le_{\mu} \tau_2$
773 % or simply $\tau_1 \le \tau_2$), when
774 % for all goals $G$ satisfying the preconditions of both tactics we have that
775 % $\tau_1$ is better or equal than $\tau_2$ w.r.t. the metric $\mu_G$.
778 % \begin{definition}[Of locally deterministic better tactic]
779 % $\tau_1$ is a locally deterministic (or locally uniform)
780 % better tactic than $\tau_2$
781 % w.r.t. $\mu$ (and in that case we write $\tau_1 <_{\mu} \tau_2$
782 % or simply $\tau_1 < \tau_2$), when $\tau_1 \le_{\mu} \tau_2$ and
783 % exists a goal $G$ satisfying the preconditions of both tactics such that
784 % $\tau_1$ is better (but not equal!) than $\tau_2$ w.r.t. the metric $\mu_G$.
787 % \begin{definition}[Of locally probabilistic better or equal tactic of a factor K]
788 % $\tau_1$ is said to be a tactic locally probabilistic better or equal of a
789 % factor $0.5 \le K \le 1$ than $\tau_2$ w.r.t. $\mu$ and a particular expected
790 % goals distribution when the probability of having $\tau_1$ better or equal than
791 % $\tau_2$ w.r.t. the metric $\mu_G$ is greater or equal to $K$ when $G$ is
792 % chosen randomly according to the distribution.
794 % The set of terms being discrete, you can note that a deterministically better
795 % or equal tactic is a tactic probabilistically better or equal of a factor 1.
798 % To end this section, we can remark the strong dependence of the $\le$
799 % relation on the choice of metric $\mu$, so that it is easy to find
800 % two metrics $\mu_1,\mu_2$ such that $\tau_1 <_{\mu_1} \tau_2$ and
801 % $\tau_2 <_{\mu_2} \tau_1$. Luckily, tough, the main interesting metrics,
802 % term size, retyping time and naturality, are in practice highly correlated,
803 % though the correlation of the third one with the previous two could be a
804 % bit surprising. So, in the following section, we
805 % will not state what is the chosen term metric; you may think as
806 % any of them or even at some kind of weighted mean.
809 % \section{Equivalent Tactics and Overkilling}
810 % \label{equivalence}
811 % We are now interested in using the metrics defined in the previous section
812 % to confront tactics implementation. Before doing so, though, we have to
813 % identify what we consider to be different implementations of the
814 % same tactic. Our approach consists in identifying every implementation
815 % with the tactic it implements and then defining appropriate notions of
816 % equivalence for tactics: two equivalent tactics will then be considered
817 % as equivalent implementations and will be confronted using metrics.
819 % Defining two tactics as equivalent when they can solve exactly the same
820 % set of goals generating the same set of proof-obligations seems quite natural,
821 % but is highly unsatisfactory if not completely wrong. The reason is that, for
822 % equivalent tactics, we would like to have the \emph{property of substitutivity},
823 % that is substituting a tactic for an equivalent one in a script should give
824 % back an error-free script\footnote{A weaker notion of substitutivity is that
825 % substituting the term generated by a tactic for the term generated by an
826 % equivalent one in a generic well-typed term should always give back a
827 % well-typed term.}. In logical frameworks with dependent types,
828 % without proof-irrelevance and with universes as CIC \cite{Werner} though,
829 % it is possible for a term to inspect the term of a previous proof, behaving
830 % in a different way, for example, if the constructive proof of a conjunction
831 % is made proving the left or right side.
832 % So, two tactics, equivalent w.r.t. the previous
833 % definition, that prove $A \vee A$ having at their disposal an
834 % hypothesis $A$ proving the first one the left and the second one
835 % the right part of the conjunction, could not be substituted one for the
836 % other if a subsequent term inspects the form of the generated proof.
838 % Put in another way, it seems quite reasonable to derive equivalence for
839 % tactics from the definition of an underlying equivalence for terms.
840 % The simplest form of such an equivalence relation is convertibility
841 % (up to proof-irrelevance) of closed terms and this is the relation
842 % we will use in this section and the following one. In particular,
843 % we will restrict ourselves to CIC and hence to
844 % $\beta\delta\iota$-convertibility\footnote{The Coq proof-assistant
845 % introduces the notion of \emph{opaque} and \emph{transparent} terms,
846 % differing only for the possibility of being inspected. Because the
847 % user could change the opacity status at any time, the notion of
848 % convertibility we must conservatively choose for the terms of Coq is
849 % $\beta\delta\iota$-convertibility after having set all the definitions
851 % Convertibility, though, is a too restrictive notion that does not take
852 % in account, for example, commuting conversions. Looking for more suitable
853 % notions of equivalence is our main open issue for future work.
855 % \begin{definition}[Of terms closed in a local environment]
856 % A term $t$ is \emph{closed} in a local environment $\Gamma$ when $\Gamma$
857 % is defined on any free variable of $t$.
860 % \begin{definition}[Of equivalent tactics]
861 % We define two tactics $\tau_1$ and $\tau_2$ to be equivalent (and
862 % we write $\tau_1 \approx \tau_2$) when for each goal $G = (\Gamma,T)$ and for
863 % each list of terms closed in $\Gamma$ and inhabiting the proof-obligations
864 % generated respectively by $\tau_1$ and $\tau_2$, we have that the result terms
865 % produced by $\tau_1$ and $\tau_2$ are $\beta\delta\iota$-convertible.
868 % Once we have the definition of equivalent tactics, we can use metrics,
869 % either deterministic or probabilistic, to confront them. In particular,
870 % in the rest of this section and in the following one we will focus on the
871 % notion of deterministically overkilling tactic, defined as follows:
873 % \begin{definition}[Of overkilling tactics]
874 % A tactic $\tau_1$ is (deterministically) overkilling w.r.t. a metric
875 % $\mu$ when there exists another tactic $\tau_2$ such that
876 % $\tau_1 \approx \tau_2$ and $\tau_2 <_\mu \tau_1$.
879 % Fixing an overkilling tactic $\tau_1$ means replacing it with the
880 % tactic $\tau_2$ which is the witness of $\tau_1$ being overkilling.
881 % Note that the fixed tactic could still be overkilling.
883 % The name overkilling has been chosen because most of the time overkilling
884 % tactics are tactics that do not consider special cases, following the general
885 % algorithm. While in computer science it is often a good design pattern to
886 % prefer general solutions to ad-hoc ones, this is not a silver bullet:
887 % an example comes another time from compiler technology, where
888 % ad-hoc cases, i.e. optimizations, are greatly valuable if not necessary.
889 % In our context, ad-hoc cases could be considered either as optimizations,
890 % or as applications of Occam's razor to proofs to keep the simplest one.
892 % \subsection{A Simple Example of Overkilling Tactic}
893 % A first example of overkilling tactic in system Coq is
894 % \emph{Replace}, that works in this way: when the current goal
895 % is $G = (\Gamma,T)$, the
896 % tactic ``{\texttt Replace E$_1$ with E$_2$.}'' always produces a new
897 % principal proof-obligation $(\Gamma, T\{E_2/E_1\})$ and an auxiliary
898 % proof-obligation $(\Gamma, E_1=E_2)$ and uses the elimination scheme
899 % of equality on the term $E_1 = E_2$ and the two terms that inhabit the
900 % obligations to prove the current goal.
902 % To show that this tactic is overkilling, we will provide an example
903 % in which the tactic fails to find the best term, we will
904 % propose a different implementation that produces the best term and we will
905 % show the equivalence with the actual one.
907 % The example consists in applying the tactic in the case in which $E_1$ is
908 % convertible to $E_2$: the tactic proposes to the user the two
909 % proof-obligations and then builds the term as described above. We claim
910 % that the term inhabiting the principal proof-obligation also inhabits
911 % the goal and, used as the generated term, is surely smaller
912 % and quicker to retype than the one that is generated in the
913 % implementation; moreover, it
914 % is also as natural as the previous one, in the sense that the apparently
915 % lost information has simply become implicit in the reduction and could
916 % be easily rediscovered using type-inference algorithms as the one described
917 % in Coscoy's thesis \cite{YANNTHESIS}. So, a new implementation could
918 % simply recognize this special case and generate the better term.
920 % We will now show that the terms provided by the two implementations are
921 % $\beta\delta\iota$-convertible.
922 % Each closed terms in $\beta\delta\iota$-normal form
923 % inhabiting the proof that $E_1$ is equal to $E_2$ is equal to the only
924 % constructor of equality applied to a term convertible to the type of
925 % $E_1$ and to another term convertible to $E_1$; hence, once the principle
926 % of elimination of equality is applied to this term, we can first apply
927 % $\beta$-reduction and then $\iota$-reduction to obtain the term inhabiting
928 % the principal proof-obligation in which $E_1$ has been replaced by $E_2$.
929 % Since $E_1$ and $E_2$ are $\beta\delta\iota$-convertible by hypothesis and
930 % for the congruence properties of convertibility in CIC, we have that the
931 % generated term is $\beta\delta\iota$-convertible to the one inhabiting the
932 % principal proof-obligation.
934 % %The simplest example is when $E_1$ and $E_2$ are syntactically equal:
935 % %in this case the principal obligation proposed is identical to the current goal
936 % %and the equality elimination introduced could be replaced by the proof of
937 % %this obligation, leading to a smaller, quicker to retype and also more
938 % %natural\footnote{Even if naturality is in some way a subjective metric,
939 % %who would dare say that a proof in which you rewrite an expression with
940 % %itself is natural?} term. To show that handling this special case in this
941 % %way is equivalent to the current solution, we have to show that the two
942 % %terms are $\beta\delta\iota$-convertible when so are the subterms inhabiting
944 % %The only closed term in $\beta\delta\iota$-normal form
945 % %inhabiting the proof that $E_1$ is equal to $E_1$ is the only constructor
946 % %of equality applied to the type of $E_1$ and to $E_1$; hence we can
947 % %first apply $\beta$-reduction and then $\iota$-reduction to obtain
948 % %the second proof in which $E_1$ has been replaced by $E_1$, i.e. the
951 % %So, for this example, the simple fix to avoid overkilling is checking
952 % %if $E_1$ and $E_2$ are syntactically equal and in case omitting the elimination;
953 % %curiously, this is done in Coq in the implementation of a very similar tactic
954 % %called \emph{Rewriting}.
957 % %A second, more involved, example is when $E_1$ and $E_2$ are not
958 % %syntactically equal, but $E_1$ reduces to $E_2$. Even in this case
959 % %the elimination could simply be avoided because the typing rules of
960 % %the logical system ensures us that a term that inhabits the principal
961 % %obligation also inhabits the current goal: the proof is equivalent to the
962 % %previous one, but for the last step in which $E_2$ is actually substituted for
963 % %$E_1$; by hypothesis, though, the two terms are $\beta\delta\iota$-convertible
964 % %and hence the thesis.
966 % %Surely smaller and faster to retype, the new term is also as natural
967 % %as the previous one, in the sense that the apparently lost information has
968 % %simply become implicit in the reduction and could be easily rediscovered
969 % %using type-inference algorithms as the one described in chapter ???
970 % %of Coscoy's thesis \ref{YANNTHESIS}.
972 % This example may seem quite stupid because, if the user is already able to
973 % prove the principal proof-obligation and because this new goal is totally
974 % equivalent to the original one, the user could simply redo the same steps
975 % without applying the rewriting at all. Most of the time, though, the
976 % convertibility of the two terms could be really complex to understand,
977 % greatly depending on the exact definitions given; indeed, the user could
978 % often be completely unaware of the convertibility of the two terms. Moreover,
979 % even in the cases in which the user understands the convertibility, the
980 % tactic has the important effect of changing the form of the current
981 % goal in order to simplify the task of completing the proof, which is the reason
982 % for the user to apply it.
986 % The previous example shows only a very small improvement in the produced
987 % term and could make you wonder if the effort of fixing overkilling and
988 % more in general if putting more attention to terms when implementing
989 % tactics is really worth the trouble. In the next section we describe as
990 % another example a concrete experience of fixing a complex reflexive tactic
991 % in system Coq that has lead to really significant improvements in term
992 % size, retyping time and naturality.
994 % \section{Fixing Overkilling: a Concrete Experience}
995 % Coq provides a reflexive tactic called \emph{Ring} to do associative-commutative
996 % rewriting in ring and semi-ring structures. The usual usage is,
997 % given the goal $E_1 = E_2$ where $E_1$ and $E_2$ are two expressions defined
998 % on the ring-structure, to prove the goal reducing it to proving
999 % $E'_1 = E'_2$ where $E'_i$ is the normal form of $E_i$. In fact, once
1000 % obtained the goal $E'_1 = E'_2$, the tactic also tries to apply simple
1001 % heuristics to automatically solve the goal.
1003 % The actual implementation of the tactic by reflexion is quite complex and
1004 % is described in \cite{Ring}. The main idea is described in Fig. \ref{ring1}:
1005 % first of all, an inductive data type to describe abstract polynomial is
1006 % made available. On this abstract polynomial, using well-founded recursion in
1007 % Coq, a normalization function named $apolynomial\_normalize$ is defined;
1008 % for technical reasons, the abstract data type of normalized polynomials
1009 % is different from the one of un-normalized polynomials. Then, two
1010 % interpretation functions, named $interp\_ap$ and $interp\_sacs$ are given
1011 % to map the two forms of abstract polynomials to the concrete one. Finally,
1012 % a theorem named $apolynomial\_normalize\_ok$ stating the equality of
1013 % the interpretation of an abstract polynomial and the interpretation of
1014 % its normal form is defined in Coq using well-founded induction. The above
1015 % machinery could be used in this way: to prove that $E^I$ is equal to its
1016 % normal form $E^{IV}$, the tactic computes an abstract polynomial $E^{II}$ that,
1017 % once interpreted, reduces to $E^{I}$, and such that the interpretation
1018 % of $E^{III} = (apolynomial\_normalize E^{II})$ could be shown to be equal
1019 % to $E^{IV}$ applying $apolynomial\_normalize\_ok$.
1021 % %such that $(interp\_ap\;E^{II})$ could be proved (theorem
1022 % %$apolynomial\_normalize\_ok$) in Coq to be equal to
1023 % %$(interp\_sacs\;E^{III})$ where
1024 % %$E^{III} = (apolynomial\_normalize E^{II})$ is another abstract polynomial
1025 % %in normal form and the three functions $interp\_ap$, $interp\_sacs$ and
1026 % %$apolynomial\_normalize$ could all be defined inside Coq using well-founded
1027 % %recursion/induction on the abstract polynomial definition.
1029 % % \myincludegraphics{ring1}{t}{12cm}{Reflexion in Ring}{Reflexion in Ring}
1030 % % \myincludegraphics{ring2}{t}{10cm}{Ring implementation (first half)}{Ring implementation (first half)}
1032 % In Fig. \ref{ring2} the first half of the steps taken
1033 % by the Ring tactic to prove $E_1 = E_2$ are shown\footnote{Here $E_1$
1034 % stands for $E^I$.}.
1035 % The first step is replacing $E_1 = E_2$ with $(interp\_ap\;E^{II}_1) = E_2$,
1036 % justifying the rewriting using the only one constructor of equality due to
1037 % the $\beta\delta\iota$-convertibility of $(interp\_ap\;E^{II}_1)$ with $E_1$.
1038 % The second one is replacing $(interp\_ap\;E^{II})$ with
1039 % $(interp\_sacs\;E^{III})$, justifying the rewriting using
1040 % $apolynomial\_normalize\_ok$.
1042 % Next, the two steps are done again on the left part of the equality,
1043 % obtaining $(interp\_sacs\;E^{III}_1) = (interp\_sacs\;E^{III}_2)$,
1044 % that is eventually solved trying simpler tactics as \emph{Reflexivity} or left
1047 % The tactic is clearly overkilling, at least due to the usage of rewriting for
1048 % convertible terms. Let's consider as a simple example the session in Fig.
1052 % \mbox{\hspace{3cm}\tt Coq $<$ Goal ``0*0==0``.}\\
1053 % \mbox{\hspace{3cm}\tt 1 subgoal}\\
1055 % \mbox{\hspace{3cm}\tt ~~=================}\\
1056 % \mbox{\hspace{3cm}\tt ~~~~``0*0 == 0``}\\
1058 % \mbox{\hspace{3cm}\tt Unnamed\_thm $<$ Ring.}\\
1059 % \mbox{\hspace{3cm}\tt Subtree proved!}
1061 % \caption{A Coq session.}
1064 % in Fig. \ref{before} the $\lambda$-term created by the
1065 % original overkilling implementation of Ring is shown. Following the previous
1066 % explanation, it should be easily understandable. In particular, the
1067 % four rewritings are clearly visible as applications of $eqT\_ind$, as
1068 % are the two applications of $apolynomial\_normalize\_ok$ and the
1069 % three usage of reflexivity, i.e. the two applications of $refl\_eqT$
1070 % to justify the rewritings on the left and right members of the equality
1071 % and the one that ends the proof.
1073 % Let's start the analysis of overkilling in this implementation:
1074 % \begin{description}
1075 % \item[Always overkilling rewritings:] as already stated, four of the rewriting
1076 % steps are always overkilling because the rewritten term is convertible to
1077 % the original one due to the tactic implementation.
1078 % As proved in the previous section, all these rewritings could be simply
1079 % removed obtaining an equivalent tactic.
1080 % \item[Overkilling rewritings due to members already normalized:] it may happen,
1081 % as in our example, that one (or even both) of the two members is already in
1082 % normal form. In this case the two rewriting steps for that member could be
1083 % simply removed obtaining an equivalent tactic as shown in the previous section.
1084 % \item[Rewriting followed by reflexivity:] after having removed all
1085 % the overkilling rewritings, the general form of the $\lambda$-term produced
1086 % for $E_1 = E_2$ is the application of two rewritings ($E'_1$ for $E_1$ and
1087 % $E'_2$ for $E_2$), followed by a proof of $E'_1 = E'_2$. In many cases,
1088 % $E'_1$ and $E'_2$ are simply convertible and so the tactic finishes the
1089 % proof with an application of reflexivity to prove the equivalent goal
1091 % A smaller and also more natural solution is just to
1092 % rewrite $E'_1$ for $E_1$ and then proving $E'_1 = E_2$ applying the lemma
1093 % stating the symmetry of equality to the proof of $E_2 = E'_2$.
1094 % The equivalence to the original
1095 % tactic is trivial by $\beta\iota$-reduction because the lemma is proved
1096 % exactly doing the rewriting and then applying reflexivity:
1100 % \hspace{0.2cm}\lambda x,y:A.\\
1101 % \hspace{0.4cm}\lambda H:(x==y).\\
1102 % \hspace{0.6cm}(eqT\_ind~A~x~ [x:A]a==x~(refl\_eqT~A~x)~y~H)
1106 % In Fig. \ref{after} is shown the $\lambda$-term created by the same
1107 % tactic after having fixed all the overkilling problems described above.
1111 % Unnamed_thm < Show Proof.
1116 % (interp_ap R Rplus Rmult ``1`` ``0`` Ropp (Empty_vm R)
1117 % (APmult AP0 AP0)) [r:R]``r == 0``
1119 % (interp_sacs R Rplus Rmult ``1`` ``0`` Ropp (Empty_vm R)
1120 % Nil_varlist) [r:R]``r == 0``
1122 % (interp_ap R Rplus Rmult ``1`` ``0`` Ropp (Empty_vm R) AP0)
1124 % ``(interp_sacs R Rplus Rmult 1 r Ropp (Empty_vm R) Nil_varlist)
1127 % (interp_sacs R Rplus Rmult ``1`` ``0`` Ropp (Empty_vm R)
1130 % ``(interp_sacs R Rplus Rmult 1 r Ropp (Empty_vm R)
1131 % Nil_varlist) == r`` (refl_eqT R ``0``)
1132 % (interp_ap R Rplus Rmult ``1`` ``0`` Ropp (Empty_vm R) AP0)
1133 % (apolynomial_normalize_ok R Rplus Rmult ``1`` ``0`` Ropp
1134 % [_,_:R]false (Empty_vm R) RTheory AP0)) ``0``
1135 % (refl_eqT R ``0``))
1136 % (interp_ap R Rplus Rmult ``1`` ``0`` Ropp (Empty_vm R)
1138 % (apolynomial_normalize_ok R Rplus Rmult ``1`` ``0`` Ropp
1139 % [_,_:R]false (Empty_vm R) RTheory (APmult AP0 AP0))) ``0*0``
1140 % (refl_eqT R ``0*0``))
1142 % \caption{The $\lambda$-term created by the original overkilling implementation}
1148 % Unnamed_thm < Show Proof.
1152 % (sym_eqT R ``0`` ``0*0``
1153 % (apolynomial_normalize_ok R Rplus Rmult ``1`` ``0`` Ropp
1154 % [_,_:R]false (Empty_vm R) RTheory (APmult AP0 AP0)))
1156 % \caption{The $\lambda$-term created by the new implementation}
1161 % \subsection{A Quantitative Analysis of the Gain Obtained}
1162 % Let's now try a quantitative analysis of the gain with respect to term size,
1163 % retyping time and naturality, considering the two interesting cases of
1164 % no member or only one member already in normal form\footnote{If the two
1165 % members are already in normal form, the new implementation simply applies
1166 % once the only constructor of the equality to one of the two members. The tactic
1167 % is also implemented to do the same thing also when the two members are not yet
1168 % in normal forms, but are already convertible. We omit this other improvement
1169 % in our analysis.}.
1171 % \subsubsection{Term Size.}
1172 % \paragraph{Terms metric definition:} given a term $t$, the metric $|.|$ associates to it its number of nodes $|t|$.
1173 % \paragraph{Notation}: $|T|$ stands for the number of nodes in the actual parameters
1174 % given to $interp\_ap$, $interp\_sacs$ and $apolynomial\_normalize\_ok$ to
1175 % describe the concrete (semi)ring theory and the list of non-primitive
1176 % terms occurring in the goal to solve. In the example in figures \ref{before}
1177 % and \ref{after}, $|T|$ is the number of nodes in
1178 % \begin{texttt}[R Rplus Rmult ``1`` ``0`` Ropp (Empty\_vm R)]\end{texttt}.
1179 % $|R|$ stands for the number of nodes in the term which is the carrier
1180 % of the ring structure. In the same examples, $|R|$ is simply 1, i.e.
1181 % the number of nodes in \begin{texttt}R\end{texttt}.
1183 % \begin{displaymath}
1185 % \mbox{\bf Original version:}\\
1187 % 1 + (|E^{II}_1| + |T| + 1) + |E_2| + |E_1| + &
1188 % \mbox{(I rewriting Left)} \\
1190 % \mbox{(justification)} \\
1191 % 1 + (|E^{III}_1| + |T| + 1) + |E_2| + (|E^{II}_1| + |T| + 1) + &
1192 % \mbox{(II rewriting Left)} \\
1193 % (|E^{II}_1| + |T| + 1) + &
1194 % \mbox{(justification)} \\
1195 % 1 + (|E^{II}_2| + |T| + 1) + (|E^{III}_1| + |T| + 1) + |E_2| + &
1196 % \mbox{(I rewriting Right)} \\
1198 % \mbox{(justification)} \\
1199 % 1 + (|E^{III}_2| + |T| + 1) + (|E^{III}_1| + |T| + 1) + &
1200 % \mbox{(II rewriting Right)} \\
1201 % ~(|E^{II}_2| + |T| + 1) +& \\
1202 % (|E^{II}_2| + |T| + 1) + &
1203 % \mbox{(justification)} \\
1205 % \mbox{(reflexivity application)} \\
1207 % 4|E_1| + 2|E_2| + 3|E^{II}_1| + 3|E^{II}_2| + 3|E^{III}_1| + |E^{III}_2| +~ &
1208 % \mbox{\bf Total number} \\
1214 % \begin{displaymath}
1216 % \mbox{\bf New version, both members not in normal form:}\\
1218 % 1 + |R| + |E_1| + |E'_2| + &
1219 % \mbox{(Rewriting Right)} \\
1220 % 1 + |T| + |E^{II}_2| + &
1221 % \mbox{(justification)} \\
1222 % 1 + |R| + |E'_2| + |E'_1| + |E_2| + &
1223 % \mbox{(Symmetry application)} \\
1224 % 1 + |T| + |E^{II}_1| = &
1225 % \mbox{(justification)} \\
1227 % 2|E_1| + |E_2| + |E^{II}_1| + |E^{II}_2| + 2|E'_2| + 2|T| +~ &
1228 % \mbox{\bf Total number} \\
1234 % \begin{displaymath}
1236 % \mbox{\bf New version, only the first member not in normal form:}\\
1238 % 1 + |R| + |E_1| + |E'_2| + &
1239 % \mbox{(Rewriting)} \\
1240 % 1 + |T| + |E^{II}_2| = &
1241 % \mbox{(justification)} \\
1243 % |E_1| + |E'_2| + |E^{II}_2| + |T| + |R| + 2~~~ &
1244 % \mbox{\bf Total number} \\
1250 % While the overall space complexity of the terms generated by the new
1251 % implementation is asymptotically equal to the one of the old implementation,
1252 % all the constants involved are much smaller, but for the one of
1253 % $E'_2$ (the two normal forms) that before was 0 and now is
1254 % equal to 2. Is it possible to have goals for which the new implementation
1255 % behaves worst than the old one? Unfortunately, yes. This happens when
1256 % the size of the two normal forms $E'_1$ and $E'_2$ is greatly huger than
1257 % ($E^{II}_1 + |T| + 1)$ and $(E^{II}_2 + |T| + 1)$. This happens when
1258 % the number of occurrences of non-primitive terms is much higher than
1259 % the number of non-primitive terms and the size of them is big. More
1260 % formally, being $m$ the number of non-primitive terms, $d$ the average
1261 % size and $n$ the number of occurrences, the new implementation creates bigger
1262 % terms than the previous one if
1263 % \begin{displaymath}
1264 % n \log_2 m + m d < n d
1266 % where the difference between the two members is great enough to hide the gain
1267 % achieved lowering all the other constants.
1268 % The logarithmic factor in the previous
1269 % formula derives from the implementation of the map of variables to
1270 % non-primitive terms as a tree and the representation of occurrences with
1271 % the path inside the tree to retrieve the term.
1273 % To fix the problem, for each non-primitive term occurring more than once
1274 % inside the normal forms, we can use a \emph{let \ldots in} local definition
1275 % to bind it to a fresh identifier; then we replace every occurrence of
1276 % the term inside the normal forms with the appropriate
1277 % identifier\footnote{This has not yet been implemented in Coq.}.
1278 % In this way, the above inequation becomes
1279 % \begin{displaymath}
1280 % n \log_2 m + m d < n + m d
1282 % that is never satisfied.
1284 % Here it is important to stress how the latest problem was easily
1285 % overlooked during the implementation and has been discovered
1286 % only during the previous analysis, strengthening our belief in
1287 % the importance of this kind of analysis for tactic implementations.
1289 % In the next two paragraphs we will consider only the new implementation
1290 % with the above fixing.
1292 % \subsubsection{Retyping Time.}
1293 % \paragraph{Terms metric definition:} given a term $t$, the metric $|.|$ associates to it the time $|t|$ required to retype it.
1295 % Due to lack of space, we will omit a detailed analysis as the one given
1296 % for terms size. Nevertheless, we can observe that the retyping time required
1297 % is surely smaller because all the type-checking operations required for
1298 % the new implementation are already present in the old one, but for the
1299 % type-checking of the two normal forms, that have fewer complexity
1300 % than the type-checking of the two abstract normal forms, and the
1301 % \emph{let \ldots in} definitions that have the same complexity of the
1302 % type-checking of the variable map. Moreover, the quite expensive operation
1303 % of computing the two normal forms is already done during proof construction.
1305 % %In the case in which both the members of the equality are not in normal form,
1306 % %we can't expect a great improvement in retyping time but for very cheap
1307 % %normalizations; this because retyping time is dominated by the factor due to
1308 % %$apolynomial\_normalize$ that is present in both implementations and is
1309 % %usually much higher than the factor due to the removed applications of
1310 % %$interp\_ap$, $interp\_sacs$ and the eliminations of equality.
1312 % %The situation is very different if one of the two members is already
1313 % %convertible to its normal form and the reduction involved in the normalization
1314 % %is expensive. In this rather unlikely case, the new implementation
1315 % %avoids the reduction at all, roughly halving the overall retyping time.
1317 % In section \ref{benchmarks} we present some benchmarks to give an idea of
1318 % the real gain obtained.
1320 % \subsubsection{Naturality.}~\\~\\
1321 % The idea behind the \emph{Ring} tactic is to be able to prove
1322 % an equality showing that both members have the same normal form.
1323 % This simply amounts to show that each member is equal to the same
1324 % normal form, that is exactly what is done in the new implementation.
1325 % Indeed, every step that belonged to the old implementation and has been
1326 % changed or removed to fix overkilling used to lead to some unnatural step:
1328 % \item The fact that the normalization is not done on the concrete
1329 % representation, but passing through two abstract ones that are
1330 % interpreted on the concrete terms is an implementative detail that
1331 % was not hidden as much as possible as it should be.
1332 % \item Normalizing a member of the equality that is already in normal form,
1333 % is illogical and so unnatural. Hence it should be avoided, but it was not.
1334 % \item The natural way to show $A=B$ under the hypothesis $B=A$ is just
1335 % to use the symmetric property of equality. Instead, the old implementation
1336 % rewrote $B$ with $A$ using the hypothesis and proved the goal by reflexivity.
1337 % \item Using local definitions (\emph{let \ldots in}) as abbreviations
1338 % rises the readability of the proof by shrinking its size removing
1339 % subexpressions that are not involved in the computation.
1342 % \subsection{Some Benchmarks}
1343 % \label{benchmarks}To understand the actual gain in term size and retyping
1344 % time on real-life examples, we have done some benchmarks on the whole set
1345 % of theorems in the standard library of Coq that use the Ring tactic. The
1346 % results are shown in table~\ref{benchs}.
1348 % Term size is the size of the disk dump of the terms. Re-typing time is the
1349 % user time spent by Coq in proof-checking already parsed terms. The reduction
1350 % of the terms size implies also a reduction in Coq parsing time, that is
1351 % difficult to compute because Coq files do not hold single terms, but whole
1352 % theories. Hence, the parsing time shown is really the user time spent by Coq
1353 % to parse not only the terms on which we are interested, but also all the
1354 % terms in their theories and the theories on which they depend. So, this last
1355 % measure greatly under-estimates the actual gain.
1357 % Every benchmark has been repeated 100 times under different load conditions on
1358 % a 600Mhz Pentium III bi-processor equipped with 256Mb RAM. The timings shown
1363 % \begin{tabular}{|l|c|c|c|}
1365 % & ~Term size~ & Re-typing time & Parsing time\\
1367 % Old implementation & 20.27Mb & 4.59s & 2.425s \\
1369 % New implementation & 12.99Mb & 2.94s & 2.210s \\
1371 % Percentage reduction & 35.74\% & 35.95\% & 8.87\% \\
1375 % \caption{Some benchmarks}
1379 % \section{Conclusions and Future Work}
1380 % Naive ways of implementing tactics lead to low quality terms that are
1381 % difficult to inspect and process. To improve the situation, we show
1382 % how metrics defined for terms naturally induce metrics for tactics and
1383 % tactics implementation and we advocate the usage of such metrics for
1384 % tactics evaluation and implementation. In particular, metrics could
1385 % be used to analyze the quality of an implementation or could be used at run
1386 % time by a tactic to choose what is the best way to proceed.
1388 % To safely replace a tactic implementation with another one, it is important
1389 % to define when two tactics are equivalent, i.e. generate equivalent terms.
1390 % In this work, the equivalence relation chosen for terms has simply been
1391 % $\beta\delta\iota$-convertibility, that in many situations seems too strong.
1392 % Hence, an important future work is the study of weaker forms of term
1393 % equivalence and the equivalence relations they induce on tactics. In particular,
1394 % it seems that proof-irrelevance, $\eta$-conversion and commuting conversions
1395 % must all be considered in the definition of a suitable equivalence relation.
1397 \begin{thebibliography}{01}
1399 % \bibitem{ALF} The ALF family of proof-assistants:\\
1400 % {\tt http://www.cs.chalmers.se/ComputingScience/Research/\\Logic/implementation.mhtml}
1402 % \bibitem{Coq} The Coq proof-assistant:\\
1403 % {\tt http://coq.inria.fr/}
1405 % \bibitem{FORMAVIE} The Formavie project:\\
1406 % {\tt http://http://www-sop.inria.fr/oasis/Formavie/}
1408 % \bibitem{EHELM} The HELM project:\\
1409 % {\tt http://www.cs.unibo.it/helm/}
1411 % \bibitem{MATHWEB} The MathWeb project:\\
1412 % {\tt http://www.mathweb.org/}
1414 % \bibitem{PCOQ} The PCoq project:\\
1415 % {\tt http://www-sop.inria.fr/lemme/pcoq/}
1417 % \bibitem{HELM} A.Asperti, L.Padovani, C.Sacerdoti Coen, I.Schena.
1418 % Towards a library of formal mathematics. Panel session of
1419 % the 13th International Conference on Theorem Proving in Higher Order Logics (TPHOLS'2000),
1420 % Portland, Oregon, USA.
1422 % \bibitem{Ring} S. Boutin. Using reflection to build efficient and certified
1423 % decision procedures. In Martin Abadi and Takahashi Ito, editors, TACS'97,
1424 % volume 1281. LNCS, Springer-Verlag, 1997.
1426 % \bibitem{YANNTHESIS} Y.Coscoy. \emph{Explication textuelle de preuves pour le
1427 % Calcul des Constructions Inductives}, PhD. Thesis, Universit\'e de Nice-Sophia
1430 % \bibitem{ALFA} T. Hallgreen, Aarne Ranta. An Extensible Proof Text Editor.
1431 % Presented at LPAR2000.
1433 % \bibitem{Necula} G. Necula, P. Lee. Safe Kernel Extensions Without Run-Time
1434 % Checking. Presented at OSDI'96, October 1996.
1436 % \bibitem{Necula2} G. Necula, P. Lee. Efficient Representation and Validation of Proofs. Presented at LICS'98, June 1998
1438 % \bibitem{Werner} B. Werner. \emph{Une Th\'eorie des Constructions Inductives},
1439 % PhD. Thesis, Universit\'e Paris VII, May 1994.
1441 \end{thebibliography}