helm/papers/matita/matita.tex

   1 \documentclass{kluwer}
   2 \usepackage{color}
   3 \usepackage{graphicx}
   4 % \usepackage{amssymb,amsmath}
   5 \usepackage{hyperref}
   6 % \usepackage{picins}
   7 \usepackage{color}
   8 \usepackage{fancyvrb}
   9 \usepackage[show]{ed}
  10
  11 \definecolor{gray}{gray}{0.85}
  12 %\newcommand{\logo}[3]{
  13 %\parpic(0cm,0cm)(#2,#3)[l]{\includegraphics[width=#1]{whelp-bw}}
  14 %}
  15
  16 \newcommand{\AUTO}{\textsc{Auto}}
  17 \newcommand{\COQ}{Coq}
  18 \newcommand{\ELIM}{\textsc{Elim}}
  19 \newcommand{\HELM}{Helm}
  20 \newcommand{\HINT}{\textsc{Hint}}
  21 \newcommand{\IN}{\ensuremath{\dN}}
  22 \newcommand{\INSTANCE}{\textsc{Instance}}
  23 \newcommand{\IR}{\ensuremath{\dR}}
  24 \newcommand{\IZ}{\ensuremath{\dZ}}
  25 \newcommand{\LIBXSLT}{LibXSLT}
  26 \newcommand{\LOCATE}{\textsc{Locate}}
  27 \newcommand{\MATCH}{\textsc{Match}}
  28 \newcommand{\MATITA}{Matita}
  29 \newcommand{\METAHEADING}{Symbol & Position \\ \hline\hline}
  30 \newcommand{\MOWGLI}{MoWGLI}
  31 \newcommand{\NAT}{\ensuremath{\mathit{nat}}}
  32 \newcommand{\NATIND}{\mathit{nat\_ind}}
  33 \newcommand{\NUPRL}{NuPRL}
  34 \newcommand{\OCAML}{OCaml}
  35 \newcommand{\PROP}{\mathit{Prop}}
  36 \newcommand{\REF}[3]{\ensuremath{\mathit{Ref}_{#1}(#2,#3)}}
  37 \newcommand{\TEXMACRO}[1]{\texttt{\char92 #1}}
  38 \newcommand{\UWOBO}{UWOBO}
  39 \newcommand{\WHELP}{Whelp}
  40
  41 \definecolor{gray}{gray}{0.85} % 1 -> white; 0 -> black
  42 \newcommand{\NT}[1]{\langle\mathit{#1}\rangle}
  43 \newcommand{\URI}[1]{\texttt{#1}}
  44
  45 %{\end{SaveVerbatim}\setlength{\fboxrule}{.5mm}\setlength{\fboxsep}{2mm}%
  46 \newenvironment{grafite}{\VerbatimEnvironment
  47  \begin{SaveVerbatim}{boxtmp}}%
  48  {\end{SaveVerbatim}\setlength{\fboxsep}{3mm}%
  49   \begin{center}
  50    \fcolorbox{black}{gray}{\BUseVerbatim[boxwidth=0.9\linewidth]{boxtmp}}
  51   \end{center}}
  52
  53 \newcounter{example}
  54 \newenvironment{example}{\stepcounter{example}\vspace{0.5em}\noindent\emph{Example} \arabic{example}.}
  55  {}
  56 \newcommand{\ASSIGNEDTO}[1]{\textbf{Assigned to:} #1}
  57 \newcommand{\FILE}[1]{\texttt{#1}}
  58 % \newcommand{\NOTE}[1]{\ifodd \arabic{page} \else \hspace{-2cm}\fi\ednote{#1}}
  59 \newcommand{\NOTE}[1]{\ednote{x~\hspace{-10cm}y#1}{foo}}
  60 \newcommand{\TODO}[1]{\textbf{TODO: #1}}
  61
  62 \newsavebox{\tmpxyz}
  63 \newcommand{\sequent}[2]{
  64   \savebox{\tmpxyz}[0.9\linewidth]{
  65     \begin{minipage}{0.9\linewidth}
  66       \ensuremath{#1} \\
  67       \rule{3cm}{0.03cm}\\
  68       \ensuremath{#2}
  69     \end{minipage}}\setlength{\fboxsep}{3mm}%
  70   \begin{center}
  71    \fcolorbox{black}{gray}{\usebox{\tmpxyz}}
  72   \end{center}}
  73
  74 \bibliographystyle{plain}
  75
  76 \begin{document}
  77
  78 \begin{opening}
  79
  80  \title{The \MATITA{} Proof Assistant}
  81
  82 \author{Andrea \surname{Asperti} \email{asperti@cs.unibo.it}}
  83 \author{Claudio \surname{Sacerdoti Coen} \email{sacerdot@cs.unibo.it}}
  84 \author{Enrico \surname{Tassi} \email{tassi@cs.unibo.it}}
  85 \author{Stefano \surname{Zacchiroli} \email{zacchiro@cs.unibo.it}}
  86 \institute{Department of Computer Science, University of Bologna\\
  87  Mura Anteo Zamboni, 7 --- 40127 Bologna, ITALY}
  88
  89 \runningtitle{The Matita proof assistant}
  90 \runningauthor{Asperti, Sacerdoti Coen, Tassi, Zacchiroli}
  91
  92 % \date{data}
  93
  94 \begin{motto}
  95 ``We are nearly bug-free'' -- \emph{CSC, Oct 2005}
  96 \end{motto}
  97
  98 \begin{abstract}
  99  abstract qui
 100 \end{abstract}
 101
 102 \keywords{Proof Assistant, Mathematical Knowledge Management, XML, Authoring,
 103 Digital Libraries}
 104
 105 \end{opening}
 106
 107 \section{Introduction}
 108 \label{sec:intro}
 109 {\em Matita} is the proof assistant under development by the \HELM{} team
 110 \cite{mkm-helm} at the University of Bologna, under the direction of
 111 Prof.~Asperti.
 112 The origin of the system goes back to 1999. At the time we were mostly
 113 interested to develop tools and techniques to enhance the accessibility
 114 via web of formal libraries of mathematics. Due to its dimension, the
 115 library of the \COQ{} proof assistant (of the order of 35'000 theorems)
 116 was choosed as a privileged test bench for our work, although experiments
 117 have been also conducted with other systems, and notably with \NUPRL{}.
 118 The work, mostly performed in the framework of the recently concluded
 119 European project IST-33562 \MOWGLI{}~\cite{pechino}, mainly consisted in the
 120 following teps:
 121 \begin{itemize}
 122 \item exporting the information from the internal representation of
 123  \COQ{} to a system and platform independent format. Since XML was at the
 124 time an emerging standard, we naturally adopted this technology, fostering
 125 a content-based architecture for future system, where the documents
 126 of the library were the the main components around which everything else
 127 has to be build;
 128 \item developing indexing and searching techniques supporting semantic
 129  queries to the library; these efforts gave birth to our \WHELP{}
 130 search engine, described in~\cite{whelp};
 131 \item developing languages and tools for a high-quality notational
 132 rendering of mathematical information; in particular, we have been
 133 active in the MathML Working group since 1999, and developed inside
 134 \HELM{} a MathML-compliant widget for the GTK graphical environment
 135 which can be integrated in any application.
 136 \end{itemize}
 137 The exportation issue, extensively discussed in \cite{exportation-module},
 138 has several major implications worth to be discussed.
 139
 140 The first
 141 point concerns the kind of content information to be exported. In a
 142 proof assistant like \COQ{}, proofs are represented in at least three clearly
 143 distinguishable formats: \emph{scripts} (i.e. sequences of commands issued by the
 144 user to the system during an interactive session of proof), \emph{proof objects}
 145 (which is the low-level representation of proofs in the form of
 146 lambda-terms readable to and checked by kernel) and \emph{proof-trees} (which
 147 is a kind of intermediate representation, vaguely inspired by a sequent
 148 like notation, that inherits most of the defects but essentially
 149 none of the advantages of the previous representations).
 150 Partially related to this problem, there is the
 151 issue of the {\em granularity} of the library: scripts usually comprise
 152 small developments with many definitions and theorems, while
 153 proof objects correspond to individual mathematical items.
 154
 155 In our case, the choice of the content encoding was eventually dictated
 156 by the methodological assumption of offering the information in a
 157 stable and system-independent format. The language of scripts is too
 158 oriented to \COQ, and it changes too rapidly to be of any interest
 159 to third parties. On the other side, the language of proof objects
 160 merely depend on
 161 the logical framework (the Calculus of Inductive Constructions, in
 162 the case of \COQ), is grammatically simple, semantically clear and,
 163 especially, is very stable (as kernels of proof assistants
 164 often are).
 165 So the granularity of the library is at the level of individual
 166 objects, that also justifies from another point of view the need
 167 for efficient searching techniques for retrieving individual
 168 logical items from the repository.
 169
 170 The main (possibly only) problem with proof objects is that they are
 171 difficult to read and do not directly correspond to what the user typed
 172 in. An analogy frequently made in the proof assistant community is that of
 173 comparing the vernacular language of scripts to a high level source language
 174 and lambda terms to the assembly language they are compiled in. We do not
 175 share this view and prefer to look at scripts as an imperative language,
 176 and to lambda terms as their denotational semantics; still, however,
 177 denotational semantics is possibly more formal but surely not more readable
 178 than the imperative source.
 179
 180 For all the previous reasons, a huge amount of work inside \MOWGLI{} has
 181 been devoted to automatic reconstruction of proofs in natural language
 182 from lambda terms. Since lambda terms are in close connection
 183 with natural deduction
 184 (that is still the most natural logical language discovered so far)
 185 the work is not hopeless as it may seem, especially if rendering
 186 is combined, as in our case, with dynamic features supporting
 187 in-line expansions or contractions of subproofs. The final
 188 rendering is probably not entirely satisfactory (see \cite{ida} for a
 189 discussion), but surely
 190 readable (the actual quality largely depends by the way the lambda
 191 term is written).
 192
 193 Summing up, we already disposed of the following tools/techniques:
 194 \begin{itemize}
 195 \item XML specifications for the Calculus of Inductive Constructions,
 196 with tools for parsing and saving mathematical objects in such a format;
 197 \item metadata specifications and tools for indexing and querying the
 198 XML knowledge base;
 199 \item a proof checker (i.e. the {\em kernel} of a proof assistant),
 200  implemented to check that we exported form the \COQ{} library all the
 201 logically relevant content;
 202 \item a sophisticated parser (used by the search engine), able to deal
 203 with potentially ambiguous and incomplete information, typical of the
 204 mathematical notation \cite{};
 205 \item a {\em refiner}, i.e. a type inference system, based on complex
 206 existential variables, used by the disambiguating parser;
 207 \item complex transformation algorithms for proof rendering in natural
 208 language;
 209 \item an innovative rendering widget, supporting high-quality bidimensional
 210 rendering, and semantic selection, i.e. the possibility to select semantically
 211 meaningful rendering expressions, and to past the respective content into
 212 a different text area.
 213 \NOTE{il widget non ha sel semantica}
 214 \end{itemize}
 215 Starting from all this, the further step of developing our own
 216 proof assistant was too
 217 small and too tempting to be neglected. Essentially, we ``just'' had to
 218 add an authoring interface, and a set of functionalities for the
 219 overall management of the library, integrating everything into a
 220 single system. \MATITA{} is the result of this effort.
 221
 222 At first sight, \MATITA{} looks as (and partly is) a \COQ{} clone. This is
 223 more the effect of the circumstances of its creation described
 224 above than the result of a deliberate design. In particular, we
 225 (essentially) share the same foundational dialect of \COQ{} (the
 226 Calculus of Inductive Constructions), the same implementative
 227 language (\OCAML{}), and the same (script based) authoring philosophy.
 228 However, as we shall see, the analogy essentially stops here.
 229
 230 In a sense; we like to think of \MATITA{} as the way \COQ{} would
 231 look like if entirely rewritten from scratch: just to give an
 232 idea, although \MATITA{} currently supports almost all functionalities of
 233 \COQ{}, it links 60'000 lins of \OCAML{} code, against ... of \COQ{} (and
 234 we are convinced that, starting from scratch again, we could furtherly
 235 reduce our code in sensible way).\NOTE{righe \COQ{}}
 236
 237 \begin{itemize}
 238  \item scelta del sistema fondazionale
 239  \item sistema indipendente (da Coq)
 240   \begin{itemize}
 241    \item possibilit\`a di sperimentare (soluzioni architetturali, logiche,
 242     implementative, \dots)
 243    \item compatibilit\`a con sistemi legacy
 244   \end{itemize}
 245 \end{itemize}
 246
 247 \section{\HELM{} library(??)}
 248
 249 \subsection{libreria tutta visibile}
 250 \ASSIGNEDTO{csc}
 251 \NOTE{assumo che si sia gia' parlato di approccio content-centrico}
 252 Our commitment to the content-centric view of the architecture of the system
 253 has important consequences on the user's experience and on the functionalities
 254 of several components of \MATITA. In the content-centric view the library
 255 of mathematical knowledge is an already existent and completely autonomous
 256 entity that we are allowed to exploit and augment using \MATITA. Thus, in
 257 principle, when the user starts to prove a new theorem she has complete
 258 visibility of the library and she can refer to every definition and lemma,
 259 also using the mathematical notation already developed. In a similar way,
 260 every form of automation of the system must be able to analyze and possibly
 261 exploit every notion in the library.
 262
 263 The benefits of this approach highly justify the non neglectable price to pay
 264 in the development of several components. We analyse now a few of the causes
 265 of this additional complexity.
 266
 267 \subsubsection{Ambiguity}
 268 A rich mathematical library includes equivalent definitions and representations
 269 of the same notion. Moreover, mathematical notation inside a rich library is
 270 surely highly overloaded. As a consequence every mathematical expression the
 271 user provides is highly ambiguous since all the definitions,
 272 representations and special notations are available at once to the user.
 273
 274 The usual solution to the problem, as adopted for instance in Coq, is to
 275 restrict the user's scope to just one interpretation for each definition,
 276 representation or notation. In this way much of the ambiguity is removed,
 277 burdening the user that must someway declare what is in scope and that must
 278 use special syntax when she needs to refer to something not in scope.
 279
 280 Even with this approach ambiguity cannot be completely removed since implicit
 281 coercions can be arbitrarily inserted by the system to ``change the type''
 282 of subterms that do not have the expected type. Usually implicit coercions
 283 are used to overcome the absence of subtyping that should mimic the subset
 284 relation found in set theory. For instance, the expression
 285 $\forall n \in nat. 2 * n * \pi \equiv_\pi 0$ is correct in set theory since
 286 the set of natural numbers is a subset of that of real numbers; the
 287 corresponding expression $\forall n:nat. 2*n*\pi \equiv_\pi 0$ is not well typed
 288 and requires the automatic insertion of the coercion $real_of_nat: nat \to R$
 289 either around both 2 and $n$ (to make both products be on real numbers) or
 290 around the product $2*n$. The usual approach consists in either rejecting the
 291 ambiguous term or arbitrarily choosing one of the interpretations. For instance,
 292 Coq rejects the declaration of coercions that have alternatives
 293 (i.e. already declared coercions with the same domain and codomain)
 294 or that are obtained composing other coercions in order to
 295 avoid making several terms highly ambiguous by choosing to insert any one of the
 296 alternative coercions. Coq also arbitrarily chooses how to insert coercions in
 297 terms to make them well typed when there is more than one possibility (as in
 298 the previous example).
 299
 300 The approach we are following is radically different. It consists in dealing
 301 with ambiguous expressions instead of avoiding them. As a last resource,
 302 when the system is unable to disambiguate the input, the user is interactively
 303 required to provide more information that is recorded to avoid asking the
 304 same question again in subsequent processing of the same input.
 305 More details on our approach can be found in \ref{sec:disambiguation}.
 306
 307 \subsubsection{Consistency}
 308 A large mathematical library is likely to be logically inconsistent.
 309 It may contain incompatible axioms or alternative conjectures and it may
 310 even use definitions in incompatible ways. To clarify this last point,
 311 consider two identical definitions of a set of elements of a given type
 312 (or of a category of objects of a given type). Let us call the two definitions
 313 $A-Set$ and $B-Set$ (or $A-Category$ and $B-Category$).
 314 It is perfectly legitimate to either form the $A-Set$ of every $B-Set$
 315 or the $B-Set$ of every $A-Set$ (the same for categories). This just corresponds
 316 to assuming that a $B-Set$ (respectively an $A-Set$) is a small set, whereas
 317 an $A-Set$ (respectively a $B-Set$) is a big set (possibly of small sets).
 318 However, if one part of the library assumes $A-Set$s to be the small ones
 319 and another part of the library assumes $B-Set$s to be the small ones, the
 320 library as a whole will be logically inconsistent.
 321
 322 Logical inconsistency has never been a problem in the daily work of a
 323 mathematician. The mathematician simply imposes himself a discipline to
 324 restrict himself to consistent subsets of the mathematical knowledge.
 325 However, in doing so he doesn't choose the subset in advance by forgetting
 326 the rest of his knowledge.
 327
 328 Contrarily to a mathematician, the usual tendency in the world of assisted
 329 automation is that of restricting in advance the part of the library that
 330 will be used later on, checking its consistency by construction.
 331
 332 \subsection{ricerca e indicizzazione}
 333 \label{sec:metadata}
 334 \ASSIGNEDTO{andrea}
 335
 336 \subsection{auto}
 337 \ASSIGNEDTO{andrea}
 338
 339 \subsection{sostituzioni esplicite vs moduli}
 340 \ASSIGNEDTO{csc}
 341
 342 \subsection{xml / gestione della libreria}
 343 \ASSIGNEDTO{gares}
 344
 345
 346 \section{User Interface (da cambiare)}
 347
 348 \subsection{assenza di proof tree / resa in linguaggio naturale}
 349 \ASSIGNEDTO{andrea}
 350
 351 \subsection{Disambiguation}
 352 \label{sec:disambiguation}
 353 \ASSIGNEDTO{zack}
 354
 355 \subsubsection{Term input}
 356
 357 The primary form of user interaction employed by \MATITA{} is textual script
 358 editing: the user modifies it and evaluate step by step its composing
 359 \emph{statements}. Examples of statements are inductive type definitions,
 360 theorem declarations, LCF-style tacticals, and macros (e.g. \texttt{Check} can
 361 be used to ask the system to refine a given term and pretty print the result).
 362 Since many statements refer to terms of the underlying calculus, \MATITA{} needs
 363 a concrete syntax able to encode terms of the Calculus of Inductive
 364 Constructions.
 365
 366 Two of the requirements in the design of such a syntax are apparently in
 367 contrast:
 368 \begin{enumerate}
 369  \item the syntax should be as close as possible to common mathematical practice
 370   and implement widespread mathematical notations;
 371  \item each term described by the syntax should be non-ambiguous meaning that it
 372   should exists a function which associates to it a CIC term.
 373 \end{enumerate}
 374
 375 These two requirements are addressed in \MATITA{} by the mean of two mechanisms
 376 which work together: \emph{term disambiguation} and \emph{extensible notation}.
 377 Their interaction is visible in the architecture of the \MATITA{} input phase,
 378 depicted in Fig.~\ref{fig:inputphase}. The architecture is articulated as a
 379 pipline of three levels: the concrete syntax level (level 0) is the one the user
 380 has to deal with when inserting CIC terms; the abstract syntax level (level 2)
 381 is an internal representation which intuitively encodes mathematical formulae at
 382 the content level~\cite{adams}\cite{mkm-structure}; the last level is that of
 383 CIC terms.
 384
 385 \begin{figure}[ht]
 386  \begin{center}
 387   \includegraphics[width=0.9\textwidth]{input_phase}
 388   \caption{\MATITA{} input phase}
 389  \end{center}
 390  \label{fig:inputphase}
 391 \end{figure}
 392
 393 Requirement (1) is addressed by a built-in concrete syntax for terms, described
 394 in Tab.~\ref{tab:termsyn}, and the extensible notation mechanisms which offers a
 395 way for extending available mathematical notations. Extensible notation, which
 396 is also in charge of providing a parsing function mapping concrete syntax terms
 397 to content level terms, is described in Sect.~\ref{sec:notation}.  Requirement
 398 (2) is addressed by the conjunct action of that parsing function and
 399 disambiguation which provides a function from content level terms to CIC terms.
 400
 401 \subsubsection{Sources of ambiguity}
 402
 403 The translation from content level terms to CIC terms is not straightforward
 404 because some nodes of the content encoding admit more that one CIC encoding,
 405 invalidating requirement (2).
 406
 407 \begin{example}
 408  \label{ex:disambiguation}
 409
 410  Consider the term at the concrete syntax level \texttt{\TEXMACRO{forall} x. x +
 411  ln 1 = x} of Fig.~\ref{fig:inputphase}(a), it can be the type of a lemma the
 412  user may want to prove. Assuming that both \texttt{+} and \texttt{=} are parsed
 413  as infix operators, all the following questions are legitimate and must be
 414  answered before obtaining a CIC term from its content level encoding
 415  (Fig.~\ref{fig:inputphase}(b)):
 416
 417  \begin{enumerate}
 418
 419   \item Since \texttt{ln} is an unbound identifier, which CIC constants does it
 420    represent? Many different theorems in the library may share its (rather
 421    short) name \dots
 422
 423   \item Which kind of number (\IN, \IR, \dots) the \texttt{1} literal stand for?
 424    Which encoding is used in CIC to represent it? E.g., assuming $1\in\IN$, is
 425    it an unary or a binary encoding?
 426
 427   \item Which kind of equality the ``='' node represents? Is it Leibniz's
 428    polymorhpic equality? Is it a decidable equality over \IN, \IR, \dots?
 429
 430  \end{enumerate}
 431
 432 \end{example}
 433
 434 In \MATITA, three \emph{sources of ambiguity} are admitted for content level
 435 terms: unbound identifiers, literal numbers, and operators. Each instance of
 436 ambiguity sources (ambiguous entity) occuring in a content level term is
 437 associated to a \emph{disambiguation domain}. Intuitively a disambiguation
 438 domain is a set of CIC terms which may be replaced for an ambiguous entity
 439 during disambiguation. Each item of the domain is said to be an
 440 \emph{interpretation} for the ambiguous entity.
 441
 442 \emph{Unbound identifiers} (question 1) are ambiguous entities since the
 443 namespace of CIC objects is not flat and the same identifier may denote many
 444 ofthem. For example the short name \texttt{plus\_assoc} in the \HELM{} library
 445 is shared by three different theorems stating the associative property of
 446 different additions.  This kind of ambiguity is avoidable if the user is willing
 447 to use long names (in form of URIs in the \texttt{cic://} scheme) in the
 448 concrete syntax, with the obvious drawbacks of obtaining long and unreadable
 449 terms.
 450
 451 Given an unbound identifier, the corresponding disambiguation domain is computed
 452 querying the library for all constants, inductive types, and inductive type
 453 constructors having it as their short name (see the \LOCATE{} query in
 454 Sect.~\ref{sec:metadata}).
 455
 456 \emph{Literal numbers} (question 2) are ambiguous entities as well since
 457 different kinds of numbers can be encoded in CIC (\IN, \IR, \IZ, \dots) using
 458 different encodings. Considering the restricted example of natural numbers we
 459 can for instance encode them in CIC using inductive datatypes with a number of
 460 constructor equal to the encoding base plus 1, obtaining one encoding for each
 461 base.
 462
 463 For each possible way of mapping a literal number to a CIC term, \MATITA{} is
 464 aware of a \emph{number intepretation function} which, when applied to the
 465 natural number denoted by the literal\footnote{at the moment only literal
 466 natural number are supported in the concrete syntax} returns a corresponding CIC
 467 term. The disambiguation domain for a given literal number is built applying to
 468 the literal all available number interpretation functions in turn.
 469
 470 Number interpretation functions can be defined in OCaml or directly using
 471 \TODO{notazione per i numeri}.
 472
 473 \emph{Operators} (question 3) are intuitively head of applications, as such they
 474 are always applied to a (possiblt empty) sequence of arguments. Their ambiguity
 475 is a need since it is often the case that some notation is used in an overloaded
 476 fashion to hide the use of different CIC constants which encodes similar
 477 concepts. For example, in the standard library of \MATITA{} the infix \texttt{+}
 478 notation is available building a binary \texttt{Op(+)} node, whose
 479 disambiguation domain may refer to different constants like the addition over
 480 natural numbers \URI{cic:/matita/nat/plus/plus.con} or that over real numbers of
 481 the \COQ{} standard library \URI{cic:/Coq/Reals/Rdefinitions/Rplus.con}.
 482
 483 For each possible way of mapping an operator application to a CIC term,
 484 \MATITA{} knows an \emph{operator interpretation function} which, when applied
 485 to an operator and its arguments, returns a CIC term. The disambiguation domain
 486 for a given operator is built applying to the operator and its arguments all
 487 available operator interpretation functions in turn.
 488
 489 Operator interpretation functions could be added using the
 490 \texttt{interpretation} statement. For example, among the first line of the
 491 script \FILE{matita/library/logic/equality.ma} from the \MATITA{} standard
 492 library we read:
 493
 494 \begin{grafite}
 495 interpretation "leibnitz's equality"
 496  'eq x y =
 497    (cic:/matita/logic/equality/eq.ind#xpointer(1/1) _ x y).
 498 \end{grafite}
 499
 500 Evaluating it in \MATITA{} will add an operator interpretation function for the
 501 binary operator \texttt{eq} which expands to the CIC term on the right hand side
 502 of the statement. That CIC term can be written using only built-in concrete
 503 syntax, can contain no ambiguity source; still, it can refer to operator
 504 arguments bound on the left hand side and can contain implicit terms (denoted
 505 with \texttt{\_}) which will be expanded to fresh metavariables. The latter
 506 feature is used in the example above for the first argument of Leibniz's
 507 polymorhpic equality.
 508
 509 \subsubsection{Disambiguation algorithm}
 510
 511 A \emph{disambiguation algorithm} takes as input a content level term and return
 512 a fully determined CIC term. The key observation on which a disambiguation
 513 algorithm is based is that given a content level term with more than one sources
 514 of ambiguity, not all possible combination of interpretation lead to a typable
 515 CIC term. In the term of Ex.~\ref{ex:disambiguation} for instance the
 516 interpretation of \texttt{ln} as a function from \IR to \IR and the
 517 interpretation of \texttt{1} as the Peano number $1$ can't coexists. The notion
 518 of ``can't coexists'' in the disambiguation of \MATITA{} is defined on top of
 519 the \emph{refiner} for CIC terms described in~\cite{csc-phd}.
 520
 521 Briefly, a refiner is a function whose input is an \emph{incomplete CIC term}
 522 $t_1$ --- i.e. a term where metavariables occur (Sect.~\ref{sec:metavariables}
 523 --- and whose output is either:\NOTE{descrizione sommaria del refiner, pu\'o
 524 essere spostata altrove}
 525
 526 \begin{enumerate}
 527
 528  \item an incomplete CIC term $t_2$ where $t_2$ is a well-typed term obtained
 529   assigning a type to each metavariable in $t_1$ (in case of dependent types,
 530   instantiation of some of the metavariable occurring in $t_1$ may occur as
 531   well);
 532
 533  \item $\epsilon$, meaning that no well-typed term could be obtained via
 534   assignment of type to metavariable in $t_1$ and their instantiation;
 535
 536  \item $\bot$, meaning that the refiner is unable to decide whether of the two
 537   cases above apply (refinement is semi-decidable).
 538
 539 \end{enumerate}
 540
 541 On top of a CIC refiner \MATITA{} implement an efficient disambiguation
 542 algorithm, which is outlined below. It takes as input a content level term $c$
 543 and proceeds as follows:
 544
 545 \begin{enumerate}
 546
 547  \item Create disambiguation domains $\{D_i | i\in\mathit{Dom}(c)\}$, where
 548   $\mathit{Dom}(c)$ is the set of ambiguity sources of $c$. Each $D_i$ is a set
 549   of CIC terms and can be built as described above.
 550
 551  \item An \emph{interpretation} $\Phi$ for $c$ is a map associating an
 552   incomplete CIC term to each ambiguity source of $c$. Given $c$ and one of its
 553   interpretations an incomplete CIC term is fully determined replacing each
 554   ambiguity source of $c$ with its mapping in the interpretation and injecting
 555   the remaining structure of the content level in the CIC level (e.g. replacing
 556   the application of the content level with the application of the CIC level).
 557   This operation is informally called ``interpreting $c$ with $\Phi$''.
 558
 559   Create an initial interpretation $\Phi_0 = \{\phi_i | \phi_i = \_,
 560   i\in\mathit{Dom}(c)\}$, which associates a fresh metavariable to each source
 561   of ambiguity of $c$. During this step, implicit terms are expanded to fresh
 562   metavariables as well.
 563
 564  \item Refine the current incomplete CIC term (i.e.  the term obtained
 565   interpreting $t$ with $\Phi_i$).
 566
 567   If the refinement succeeds or is undetermined the next interpretation
 568   $\Phi_{i+1}$ will be created \emph{making a choice}, that is replacing in the
 569   current interpretation one of the metavariable appearing in $\Phi_i$ with one
 570   of the possible choice from the corresponding disambiguation domain. The
 571   metavariable to be replaced is chosen following a preorder visit of the
 572   ambiguous term. Then, step 3 is attempted again with the new interpretation.
 573
 574   If the refinement fails the current set of choices cannot lead to a well-typed
 575   term and backtracking of the current interpretation is attempted.
 576
 577  \item Once an unambiguous correct interpretation is found (i.e. $\Phi_i$ does
 578   no longer contain any placeholder), backtracking is attempted anyway to find
 579   the other correct interpretations.
 580
 581  \item Let $n$ be the number of interpretations who survived step 4. If $n=0$
 582   signal a type error. If $n=1$ we have found exactly one (incomplete) CIC term
 583   corresponding to the content level term $c$, returns it as output of the
 584   disambiguation phase. If $n>1$ we have found many different (incomplete) CIC
 585   terms which can correspond to the content level term, let the user choose one
 586   of the $n$ interpretations and returns the corresponding term.
 587
 588 \end{enumerate}
 589
 590 The efficiency of this algorithm resides in the fact that as soon as an
 591 incomplete CIC term is not typable, no further instantiation of the
 592 metavariables of the corresponding interpretation is attemped.
 593 % For example, during the disambiguation of the user input
 594 % \texttt{\TEXMACRO{forall} x. x*0 = 0}, an interpretation $\Phi_i$ is
 595 % encountered which associates $?$ to the instance of \texttt{0} on the right,
 596 % the real number $0$ to the instance of \texttt{0} on the left, and the
 597 % multiplication over natural numbers (\texttt{mult} for short) to \texttt{*}.
 598 % The refiner will fail, since \texttt{mult} require a natural argument, and no
 599 % further instantiation of the placeholder will be tried.
 600
 601 Details of the disambiguation algorithm along with an analysis of its complexity
 602 can be found in~\cite{disambiguation}, where a formulation without backtracking
 603 (corresponding to the actual \MATITA{} implementation) is also presented.
 604
 605 \subsubsection{Disambiguation stages}
 606
 607 \subsection{notazione}
 608 \label{sec:notation}
 609 \ASSIGNEDTO{zack}
 610
 611 \subsection{mathml}
 612 \ASSIGNEDTO{zack}
 613
 614 \subsection{selezione semantica, cut paste, hyperlink}
 615 \ASSIGNEDTO{zack}
 616
 617 \subsection{pattern}
 618 \ASSIGNEDTO{gares}\\
 619 Patterns are the textual counterpart of the MathML widget graphical
 620 selection.
 621
 622 Matita benefits of a graphical interface and a powerful MathML rendering
 623 widget that allows the user to select pieces of the sequent he is working
 624 on. While this is an extremely intuitive way for the user to
 625 restrict the application of tactics, for example, to some subterms of the
 626 conclusion or some hypothesis, the way this action is recorded to the text
 627 script is not obvious.\\
 628 In \MATITA{} this issue is addressed by patterns.
 629
 630 \subsubsection{Pattern syntax}
 631 A pattern is composed of two terms: a $\NT{sequent\_path}$ and a
 632 $\NT{wanted}$.
 633 The former mocks-up a sequent, discharging unwanted subterms with $?$ and
 634 selecting the interesting parts with the placeholder $\%$.
 635 The latter is a term that lives in the context of the placeholders.
 636
 637 The concrete syntax is reported in table \ref{tab:pathsyn}
 638 \NOTE{uso nomi diversi dalla grammatica ma che hanno + senso}
 639 \begin{table}
 640  \caption{\label{tab:pathsyn} Concrete syntax of \MATITA{} patterns.\strut}
 641 \hrule
 642 \[
 643 \begin{array}{@{}rcll@{}}
 644   \NT{pattern} &
 645     ::= & [~\verb+in match+~\NT{wanted}~]~[~\verb+in+~\NT{sequent\_path}~] & \\
 646   \NT{sequent\_path} &
 647     ::= & \{~\NT{ident}~[~\verb+:+~\NT{multipath}~]~\}~
 648       [~\verb+\vdash+~\NT{multipath}~] & \\
 649   \NT{wanted} & ::= & \NT{term} & \\
 650   \NT{multipath} & ::= & \NT{term\_with\_placeholders} & \\
 651 \end{array}
 652 \]
 653 \hrule
 654 \end{table}
 655
 656 \subsubsection{How patterns work}
 657 Patterns mimic the user's selection in two steps. The first one
 658 selects roots (subterms) of the sequent, using the
 659 $\NT{sequent\_path}$,  while the second
 660 one searches the $\NT{wanted}$ term starting from these roots. Both are
 661 optional steps, and by convention the empty pattern selects the whole
 662 conclusion.
 663
 664 \begin{description}
 665 \item[Phase 1]
 666   concerns only the $[~\verb+in+~\NT{sequent\_path}~]$
 667   part of the syntax. $\NT{ident}$ is an hypothesis name and
 668   selects the assumption where the following optional $\NT{multipath}$
 669   will operate. \verb+\vdash+ can be considered the name for the goal.
 670   If the whole pattern is omitted, the whole goal will be selected.
 671   If one or more hypotheses names are given the selection is restricted to
 672   these assumptions. If a $\NT{multipath}$ is omitted the whole
 673   assumption is selected. Remember that the user can be mostly
 674   unaware of this syntax, since the system is able to write down a
 675   $\NT{sequent\_path}$ starting from a visual selection.
 676   \NOTE{Questo ancora non va in matita}
 677
 678   A $\NT{multipath}$ is a CiC term in which a special constant $\%$
 679   is allowed.
 680   The roots of discharged subterms are marked with $?$, while $\%$
 681   is used to select roots. The default $\NT{multipath}$, the one that
 682   selects the whole term, is simply $\%$.
 683   Valid $\NT{multipath}$ are, for example, $(?~\%~?)$ or $\%~\verb+\to+~(\%~?)$
 684   that respectively select the first argument of an application or
 685   the source of an arrow and the head of the application that is
 686   found in the arrow target.
 687
 688   The first phase selects not only terms (roots of subterms) but also
 689   their context that will be eventually used in the second phase.
 690
 691 \item[Phase 2]
 692   plays a role only if the $[~\verb+in match+~\NT{wanted}~]$
 693   part is specified. From the first phase we have some terms, that we
 694   will see as subterm roots, and their context. For each of these
 695   contexts the $\NT{wanted}$ term is disambiguated in it and the
 696   corresponding root is searched for a subterm $\alpha$-equivalent to
 697   $\NT{wanted}$. The result of this search is the selection the
 698   pattern represents.
 699
 700 \end{description}
 701
 702 \noindent
 703 Since the first step is equipotent to the composition of the two
 704 steps, the system uses it to represent each visual selection.
 705 The second step is only meant for the
 706 experienced user that writes patterns by hand, since it really
 707 helps in writing concise patterns as we will see in the
 708 following examples.
 709
 710 \subsubsection{Examples}
 711 To explain how the first step works let's give an example. Consider
 712 you want to prove the uniqueness of the identity element $0$ for natural
 713 sum, and that you can relay on the previously demonstrated left
 714 injectivity of the sum, that is $inj\_plus\_l:\forall x,y,z.x+y=z+y \to x =z$.
 715 Typing
 716 \begin{grafite}
 717 theorem valid_name: \forall n,m. m + n = n \to m = O.
 718   intros (n m H).
 719 \end{grafite}
 720 \noindent
 721 leads you to the following sequent
 722 \sequent{
 723 n:nat\\
 724 m:nat\\
 725 H: m + n = n}{
 726 m=O
 727 }
 728 \noindent
 729 where you want to change the right part of the equivalence of the $H$
 730 hypothesis with $O + n$ and then use $inj\_plus\_l$ to prove $m=O$.
 731 \begin{grafite}
 732   change in H:(? ? ? %) with (O + n).
 733 \end{grafite}
 734 \noindent
 735 This pattern, that is a simple instance of the $\NT{sequent\_path}$
 736 grammar entry, acts on $H$ that has type (without notation) $(eq~nat~(m+n)~n)$
 737 and discharges the head of the application and the first two arguments with a
 738 $?$ and selects the last argument with $\%$. The syntax may seem uncomfortable,
 739 but the user can simply select with the mouse the right part of the equivalence
 740 and left to the system the burden of writing down in the script file the
 741 corresponding pattern with $?$ and $\%$ in the right place (that is not
 742 trivial, expecially where implicit arguments are hidden by the notation, like
 743 the type $nat$ in this example).
 744
 745 Changing all the occurrences of $n$ in the hypothesis $H$ with $O+n$
 746 works too and can be done, by the experienced user, writing directly
 747 a simpler pattern that uses the second phase.
 748 \begin{grafite}
 749   change in match n in H with (O + n).
 750 \end{grafite}
 751 \noindent
 752 In this case the $\NT{sequent\_path}$ selects the whole $H$, while
 753 the second phase searches the wanted $n$ inside it by
 754 $\alpha$-equivalence. The resulting
 755 equivalence will be $m+(O+n)=O+n$ since the second phase found two
 756 occurrences of $n$ in $H$ and the tactic changed both.
 757
 758 Just for completeness the second pattern is equivalent to the
 759 following one, that is less readable but uses only the first phase.
 760 \begin{grafite}
 761   change in H:(? ? (? ? %) %) with (O + n).
 762 \end{grafite}
 763 \noindent
 764
 765 \subsubsection{Tactics supporting patterns}
 766 In \MATITA{} all the tactics that can be restricted to subterm of the working
 767 sequent accept the pattern syntax. In particular these tactics are: simplify,
 768 change, fold, unfold, generalize, replace and rewrite.
 769
 770 \NOTE{attualmente rewrite e fold non supportano phase 2. per
 771 supportarlo bisogna far loro trasformare il pattern phase1+phase2
 772 in un pattern phase1only come faccio nell'ultimo esempio. lo si fa
 773 con una pattern\_of(select(pattern))}
 774
 775 \subsubsection{Comparison with Coq}
 776 Coq has a two diffrent ways of restricting the application of tactis to
 777 subterms of the sequent, both relaying on the same special syntax to identify
 778 a term occurrence.
 779
 780 The first way is to use this special syntax to specify directly to the
 781 tactic the occurrnces of a wanted term that should be affected, while
 782 the second is to prepare the sequent with another tactic called
 783 pattern and the apply the real tactic. Note that the choice is not
 784 left to the user, since some tactics needs the sequent to be prepared
 785 with pattern and do not accept directly this special syntax.
 786
 787 The base idea is that to identify a subterm of the sequent we can
 788 write it and say that we want, for example, the third and the fifth
 789 occurce of it (counting from left to right). In our previous example,
 790 to change only the left part of the equivalence, the correct command
 791 is
 792 \begin{grafite}
 793   change n at 2 in H with (O + n)
 794 \end{grafite}
 795 \noindent
 796 meaning that in the hypothesis $H$ the $n$ we want to change is the
 797 second we encounter proceeding from left toright.
 798
 799 The tactic pattern computes a
 800 $\beta$-expansion of a part of the sequent with respect to some
 801 occurrences of the given term. In the previous example the following
 802 command
 803 \begin{grafite}
 804   pattern n at 2 in H
 805 \end{grafite}
 806 \noindent
 807 would have resulted in this sequent
 808 \begin{grafite}
 809   n : nat
 810   m : nat
 811   H : (fun n0 : nat => m + n = n0) n
 812   ============================
 813    m = 0
 814 \end{grafite}
 815 \noindent
 816 where $H$ is $\beta$-expanded over the second $n$
 817 occurrence. This is a trick to make the unification algorithm ignore
 818 the head of the application (since the unification is essentially
 819 first-order) but normally operate on the arguments.
 820 This works for some tactics, like rewrite and replace,
 821 but for example not for change and other tactics that do not relay on
 822 unification.
 823
 824 The idea behind this way of identifying subterms in not really far
 825 from the idea behind patterns, but really fails in extending to
 826 complex notation, since it relays on a mono-dimensional sequent representation.
 827 Real math notation places arguments upside-down (like in indexed sums or
 828 integrations) or even puts them inside a bidimensional matrix.
 829 In these cases using the mouse to select the wanted term is probably the
 830 only way to tell the system exactly what you want to do.
 831
 832 One of the goals of \MATITA{} is to use modern publishing techiques, and
 833 adopting a method for restricting tactics application domain that discourages
 834 using heavy math notation, would definitively be a bad choice.
 835
 836 \subsection{tatticali}
 837 \ASSIGNEDTO{gares}
 838 \begin{verbatim}
 839 - problemi principali dei PA basati su tattiche
 840  o illeggibilita' dello script
 841  o scarsa strutturazione dello script e mantenibilita'
 842    - ameliorate by tacticals
 843 - intro sui tatticali esistenti
 844 - peculiarita' di matita
 845   - passo passo
 846   - nice handling of sideeffects (see later on metavariables)
 847 \end{verbatim}
 848
 849 \subsection{named variable e disambiguazione lazy}
 850 \ASSIGNEDTO{csc}
 851
 852 \subsection{metavariabili}
 853 \label{sec:metavariables}
 854 \ASSIGNEDTO{csc}
 855
 856 \begin{verbatim}
 857
 858 \end{verbatim}
 859
 860 \section{Drawbacks, missing, \dots}
 861
 862 \subsection{moduli}
 863 \ASSIGNEDTO{}
 864
 865 \subsection{ltac}
 866 \ASSIGNEDTO{}
 867
 868 \subsection{estrazione}
 869 \ASSIGNEDTO{}
 870
 871 \subsection{localizzazione errori}
 872 \ASSIGNEDTO{}
 873
 874 \acknowledgements
 875 We would like to thank all the students that during the past
 876 five years collaborated in the \HELM{} project and contributed to
 877 the development of Matita, and in particular
 878 A.Griggio, F.Guidi, P. Di Lena, L.Padovani, I.Schena, M.Selmi,
 879 V.Tamburrelli.
 880
 881 \theendnotes
 882
 883 \bibliography{matita}
 884
 885
 886 \end{document}
 887