helm/papers/matita/matita.tex

   1 \documentclass{kluwer}
   2 \usepackage{color}
   3 \usepackage{graphicx}
   4 % \usepackage{amssymb,amsmath}
   5 \usepackage{hyperref}
   6 % \usepackage{picins}
   7 \usepackage{color}
   8 \usepackage{fancyvrb}
   9 \usepackage[show]{ed}
  10
  11 \definecolor{gray}{gray}{0.85}
  12 %\newcommand{\logo}[3]{
  13 %\parpic(0cm,0cm)(#2,#3)[l]{\includegraphics[width=#1]{whelp-bw}}
  14 %}
  15
  16 \newcommand{\AUTO}{\textsc{Auto}}
  17 \newcommand{\COQ}{Coq}
  18 \newcommand{\ELIM}{\textsc{Elim}}
  19 \newcommand{\HELM}{Helm}
  20 \newcommand{\HINT}{\textsc{Hint}}
  21 \newcommand{\IN}{\ensuremath{\mathbb{N}}}
  22 \newcommand{\INSTANCE}{\textsc{Instance}}
  23 \newcommand{\IR}{\ensuremath{\mathbb{R}}}
  24 \newcommand{\IZ}{\ensuremath{\mathbb{Z}}}
  25 \newcommand{\LIBXSLT}{LibXSLT}
  26 \newcommand{\LOCATE}{\textsc{Locate}}
  27 \newcommand{\MATCH}{\textsc{Match}}
  28 \newcommand{\MATITA}{Matita}
  29 \newcommand{\METAHEADING}{Symbol & Position \\ \hline\hline}
  30 \newcommand{\MOWGLI}{MoWGLI}
  31 \newcommand{\NAT}{\ensuremath{\mathit{nat}}}
  32 \newcommand{\NATIND}{\mathit{nat\_ind}}
  33 \newcommand{\NUPRL}{NuPRL}
  34 \newcommand{\OCAML}{OCaml}
  35 \newcommand{\PROP}{\mathit{Prop}}
  36 \newcommand{\REF}[3]{\ensuremath{\mathit{Ref}_{#1}(#2,#3)}}
  37 \newcommand{\TEXMACRO}[1]{\texttt{\char92 #1}}
  38 \newcommand{\UWOBO}{UWOBO}
  39 \newcommand{\WHELP}{Whelp}
  40
  41 \definecolor{gray}{gray}{0.85} % 1 -> white; 0 -> black
  42 \newcommand{\NT}[1]{\langle\mathit{#1}\rangle}
  43 \newcommand{\URI}[1]{\texttt{#1}}
  44
  45 %{\end{SaveVerbatim}\setlength{\fboxrule}{.5mm}\setlength{\fboxsep}{2mm}%
  46 \newenvironment{grafite}{\VerbatimEnvironment
  47  \begin{SaveVerbatim}{boxtmp}}%
  48  {\end{SaveVerbatim}\setlength{\fboxsep}{3mm}%
  49   \begin{center}
  50    \fcolorbox{black}{gray}{\BUseVerbatim[boxwidth=0.9\linewidth]{boxtmp}}
  51   \end{center}}
  52
  53 \newcounter{example}
  54 \newenvironment{example}{\stepcounter{example}\emph{Example} \arabic{example}.}
  55  {}
  56 \newcommand{\ASSIGNEDTO}[1]{\textbf{Assigned to:} #1}
  57 \newcommand{\FILE}[1]{\texttt{#1}}
  58 % \newcommand{\NOTE}[1]{\ifodd \arabic{page} \else \hspace{-2cm}\fi\ednote{#1}}
  59 \newcommand{\NOTE}[1]{\ednote{x~\hspace{-10cm}y#1}{foo}}
  60 \newcommand{\TODO}[1]{\textbf{TODO: #1}}
  61
  62 \newsavebox{\tmpxyz}
  63 \newcommand{\sequent}[2]{
  64   \savebox{\tmpxyz}[0.9\linewidth]{
  65     \begin{minipage}{0.9\linewidth}
  66       \ensuremath{#1} \\
  67       \rule{3cm}{0.03cm}\\
  68       \ensuremath{#2}
  69     \end{minipage}}\setlength{\fboxsep}{3mm}%
  70   \begin{center}
  71    \fcolorbox{black}{gray}{\usebox{\tmpxyz}}
  72   \end{center}}
  73
  74 \bibliographystyle{plain}
  75
  76 \begin{document}
  77
  78 \begin{opening}
  79
  80 \title{The Matita proof assistant}
  81 \runningtitle{The Matita proof assistant}
  82
  83 \author{Andrea Asperti, Claudio Sacerdoti Coen, Enrico Tassi
  84  and Stefano Zacchiroli}
  85 \runningauthor{Andrea Asperti, Claudio Sacerdoti Coen, Enrico Tassi
  86  and Stefano Zacchiroli}
  87
  88 \institute{Department of Computer Science, University of Bologna\\
  89  Mura Anteo Zamboni, 7 --- 40127 Bologna, ITALY\\
  90  \email{$\{$asperti,sacerdot,tassi,zacchiro$\}$@cs.unibo.it}}
  91
  92 \date{data}
  93
  94 \begin{abstract}
  95  abstract qui
  96 \end{abstract}
  97
  98 \keywords{parole, chiave}
  99
 100 \end{opening}
 101
 102 \section{Introduction}
 103 \label{sec:intro}
 104 {\em Matita} is the proof assistant under development by the \HELM{} team
 105 \cite{mkm-helm} at the University of Bologna, under the direction of
 106 Prof.~Asperti.
 107 The origin of the system goes back to 1999. At the time we were mostly
 108 interested to develop tools and techniques to enhance the accessibility
 109 via web of formal libraries of mathematics. Due to its dimension, the
 110 library of the \COQ{} proof assistant (of the order of 35'000 theorems)
 111 was choosed as a privileged test bench for our work, although experiments
 112 have been also conducted with other systems, and notably with \NUPRL{}.
 113 The work, mostly performed in the framework of the recently concluded
 114 European project IST-33562 \MOWGLI{}~\cite{pechino}, mainly consisted in the
 115 following teps:
 116 \begin{itemize}
 117 \item exporting the information from the internal representation of
 118  \COQ{} to a system and platform independent format. Since XML was at the
 119 time an emerging standard, we naturally adopted this technology, fostering
 120 a content-based architecture for future system, where the documents
 121 of the library were the the main components around which everything else
 122 has to be build;
 123 \item developing indexing and searching techniques supporting semantic
 124  queries to the library; these efforts gave birth to our \WHELP{}
 125 search engine, described in~\cite{whelp};
 126 \item developing languages and tools for a high-quality notational
 127 rendering of mathematical information; in particular, we have been
 128 active in the MathML Working group since 1999, and developed inside
 129 \HELM{} a MathML-compliant widget for the GTK graphical environment
 130 which can be integrated in any application.
 131 \end{itemize}
 132 The exportation issue, extensively discussed in \cite{exportation-module},
 133 has several major implications worth to be discussed.
 134
 135 The first
 136 point concerns the kind of content information to be exported. In a
 137 proof assistant like \COQ{}, proofs are represented in at least three clearly
 138 distinguishable formats: \emph{scripts} (i.e. sequences of commands issued by the
 139 user to the system during an interactive session of proof), \emph{proof objects}
 140 (which is the low-level representation of proofs in the form of
 141 lambda-terms readable to and checked by kernel) and \emph{proof-trees} (which
 142 is a kind of intermediate representation, vaguely inspired by a sequent
 143 like notation, that inherits most of the defects but essentially
 144 none of the advantages of the previous representations).
 145 Partially related to this problem, there is the
 146 issue of the {\em granularity} of the library: scripts usually comprise
 147 small developments with many definitions and theorems, while
 148 proof objects correspond to individual mathematical items.
 149
 150 In our case, the choice of the content encoding was eventually dictated
 151 by the methodological assumption of offering the information in a
 152 stable and system-independent format. The language of scripts is too
 153 oriented to \COQ, and it changes too rapidly to be of any interest
 154 to third parties. On the other side, the language of proof objects
 155 merely depend on
 156 the logical framework (the Calculus of Inductive Constructions, in
 157 the case of \COQ), is grammatically simple, semantically clear and,
 158 especially, is very stable (as kernels of proof assistants
 159 often are).
 160 So the granularity of the library is at the level of individual
 161 objects, that also justifies from another point of view the need
 162 for efficient searching techniques for retrieving individual
 163 logical items from the repository.
 164
 165 The main (possibly only) problem with proof objects is that they are
 166 difficult to read and do not directly correspond to what the user typed
 167 in. An analogy frequently made in the proof assistant community is that of
 168 comparing the vernacular language of scripts to a high level source language
 169 and lambda terms to the assembly language they are compiled in. We do not
 170 share this view and prefer to look at scripts as an imperative language,
 171 and to lambda terms as their denotational semantics; still, however,
 172 denotational semantics is possibly more formal but surely not more readable
 173 than the imperative source.
 174
 175 For all the previous reasons, a huge amount of work inside \MOWGLI{} has
 176 been devoted to automatic reconstruction of proofs in natural language
 177 from lambda terms. Since lambda terms are in close connection
 178 with natural deduction
 179 (that is still the most natural logical language discovered so far)
 180 the work is not hopeless as it may seem, especially if rendering
 181 is combined, as in our case, with dynamic features supporting
 182 in-line expansions or contractions of subproofs. The final
 183 rendering is probably not entirely satisfactory (see \cite{ida} for a
 184 discussion), but surely
 185 readable (the actual quality largely depends by the way the lambda
 186 term is written).
 187
 188 Summing up, we already disposed of the following tools/techniques:
 189 \begin{itemize}
 190 \item XML specifications for the Calculus of Inductive Constructions,
 191 with tools for parsing and saving mathematical objects in such a format;
 192 \item metadata specifications and tools for indexing and querying the
 193 XML knowledge base;
 194 \item a proof checker (i.e. the {\em kernel} of a proof assistant),
 195  implemented to check that we exported form the \COQ{} library all the
 196 logically relevant content;
 197 \item a sophisticated parser (used by the search engine), able to deal
 198 with potentially ambiguous and incomplete information, typical of the
 199 mathematical notation \cite{};
 200 \item a {\em refiner}, i.e. a type inference system, based on complex
 201 existential variables, used by the disambiguating parser;
 202 \item complex transformation algorithms for proof rendering in natural
 203 language;
 204 \item an innovative rendering widget, supporting high-quality bidimensional
 205 rendering, and semantic selection, i.e. the possibility to select semantically
 206 meaningful rendering expressions, and to past the respective content into
 207 a different text area.
 208 \NOTE{il widget\\ non ha sel\\ semantica}
 209 \end{itemize}
 210 Starting from all this, the further step of developing our own
 211 proof assistant was too
 212 small and too tempting to be neglected. Essentially, we ``just'' had to
 213 add an authoring interface, and a set of functionalities for the
 214 overall management of the library, integrating everything into a
 215 single system. \MATITA{} is the result of this effort.
 216
 217 At first sight, \MATITA{} looks as (and partly is) a \COQ{} clone. This is
 218 more the effect of the circumstances of its creation described
 219 above than the result of a deliberate design. In particular, we
 220 (essentially) share the same foundational dialect of \COQ{} (the
 221 Calculus of Inductive Constructions), the same implementative
 222 language (\OCAML{}), and the same (script based) authoring philosophy.
 223 However, as we shall see, the analogy essentially stops here.
 224
 225 In a sense; we like to think of \MATITA{} as the way \COQ{} would
 226 look like if entirely rewritten from scratch: just to give an
 227 idea, although \MATITA{} currently supports almost all functionalities of
 228 \COQ{}, it links 60'000 lins of \OCAML{} code, against ... of \COQ{} (and
 229 we are convinced that, starting from scratch again, we could furtherly
 230 reduce our code in sensible way).\NOTE{righe\\\COQ{}}
 231
 232 \begin{itemize}
 233  \item scelta del sistema fondazionale
 234  \item sistema indipendente (da Coq)
 235   \begin{itemize}
 236    \item possibilit\`a di sperimentare (soluzioni architetturali, logiche,
 237     implementative, \dots)
 238    \item compatibilit\`a con sistemi legacy
 239   \end{itemize}
 240 \end{itemize}
 241
 242 \section{\HELM{} library(??)}
 243
 244 \subsection{libreria tutta visibile}
 245 \ASSIGNEDTO{csc}
 246
 247 \subsection{ricerca e indicizzazione}
 248 \label{sec:metadata}
 249 \ASSIGNEDTO{andrea}
 250
 251 \subsection{auto}
 252 \ASSIGNEDTO{andrea}
 253
 254 \subsection{sostituzioni esplicite vs moduli}
 255 \ASSIGNEDTO{csc}
 256
 257 \subsection{xml / gestione della libreria}
 258 \ASSIGNEDTO{gares}
 259
 260
 261 \section{User Interface (da cambiare)}
 262
 263 \subsection{assenza di proof tree / resa in linguaggio naturale}
 264 \ASSIGNEDTO{andrea}
 265
 266 \subsection{Disambiguation}
 267 \label{sec:disambiguation}
 268 \ASSIGNEDTO{zack}
 269
 270 \subsubsection{Term input}
 271
 272 The primary form of user interaction employed by \MATITA{} is textual script
 273 editing: the user modifies it and evaluate step by step its composing
 274 \emph{statements}. Examples of statements are inductive type definitions,
 275 theorem declarations, LCF-style tacticals, and macros (e.g. \texttt{Check} can
 276 be used to ask the system to refine a given term and pretty print the result).
 277 Since many statements refer to terms of the underlying calculus, \MATITA{} needs
 278 a concrete syntax able to encode terms of the Calculus of Inductive
 279 Constructions.
 280
 281 Two of the requirements in the design of such a syntax are apparently in
 282 contrast:
 283 \begin{enumerate}
 284  \item the syntax should be as close as possible to common mathematical practice
 285   and implement widespread mathematical notations;
 286  \item each term described by the syntax should be non-ambiguous meaning that it
 287   should exists a function which associates to it a CIC term.
 288 \end{enumerate}
 289
 290 These two requirements are addressed in \MATITA{} by the mean of two mechanisms
 291 which work together: \emph{term disambiguation} and \emph{extensible notation}.
 292 Their interaction is visible in the architecture of the \MATITA{} input phase,
 293 depicted in Fig.~\ref{fig:inputphase}. The architecture is articulated as a
 294 pipline of three levels: the concrete syntax level (level 0) is the one the user
 295 has to deal with when inserting CIC terms; the abstract syntax level (level 2)
 296 is an internal representation which intuitively encodes mathematical formulae at
 297 the content level~\cite{adams}\cite{mkm-structure}; the last level is that of
 298 CIC terms.
 299
 300 \begin{figure}[ht]
 301  \begin{center}
 302   \includegraphics[width=0.9\textwidth]{input_phase}
 303   \caption{\MATITA{} input phase}
 304  \end{center}
 305  \label{fig:inputphase}
 306 \end{figure}
 307
 308 Requirement (1) is addressed by a built-in concrete syntax for terms, described
 309 in Tab.~\ref{tab:termsyn}, and the extensible notation mechanisms which offers a
 310 way for extending available mathematical notations. Extensible notation, which
 311 is also in charge of providing a parsing function mapping concrete syntax terms
 312 to content level terms, is described in Sect.~\ref{sec:notation}.  Requirement
 313 (2) is addressed by the conjunct action of that parsing function and
 314 disambiguation which provides a function from content level terms to CIC terms.
 315
 316 \subsubsection{Sources of ambiguity}
 317
 318 The translation from content level terms to CIC terms is not straightforward
 319 because some nodes of the content encoding admit more that one CIC encoding,
 320 invalidating requirement (2).
 321
 322 \begin{example}
 323  \label{ex:disambiguation}
 324
 325  Consider the term at the concrete syntax level \texttt{\TEXMACRO{forall} x. x +
 326  ln 1 = x} of Fig.~\ref{fig:inputphase}(a), it can be the type of a lemma the
 327  user may want to prove. Assuming that both \texttt{+} and \texttt{=} are parsed
 328  as infix operators, all the following questions are legitimate and must be
 329  answered before obtaining a CIC term from its content level encoding
 330  (Fig.~\ref{fig:inputphase}(b)):
 331
 332  \begin{enumerate}
 333
 334   \item Since \texttt{ln} is an unbound identifier, which CIC constants does it
 335    represent? Many different theorems in the library may share its (rather
 336    short) name \dots
 337
 338   \item Which kind of number (\IN, \IR, \dots) the \texttt{1} literal stand for?
 339    Which encoding is used in CIC to represent it? E.g., assuming $1\in\IN$, is
 340    it an unary or a binary encoding?
 341
 342   \item Which kind of equality the ``='' node represents? Is it Leibniz's
 343    polymorhpic equality? Is it a decidable equality over \IN, \IR, \dots?
 344
 345  \end{enumerate}
 346
 347 \end{example}
 348
 349 In \MATITA, three \emph{sources of ambiguity} are admitted for content level
 350 terms: unbound identifiers, literal numbers, and operators. Each instance of
 351 ambiguity sources (ambiguous entity) occuring in a content level term is
 352 associated to a \emph{disambiguation domain}. Intuitively a disambiguation
 353 domain is a set of CIC terms which may be replaced for an ambiguous entity
 354 during disambiguation. Each item of the domain is said to be an
 355 \emph{interpretation} for the ambiguous entity.
 356
 357 \emph{Unbound identifiers} (question 1) are ambiguous entities since the
 358 namespace of CIC objects is not flat and the same identifier may denote many
 359 ofthem. For example the short name \texttt{plus\_assoc} in the \HELM{} library
 360 is shared by three different theorems stating the associative property of
 361 different additions.  This kind of ambiguity is avoidable if the user is willing
 362 to use long names (in form of URIs in the \texttt{cic://} scheme) in the
 363 concrete syntax, with the obvious drawbacks of obtaining long and unreadable
 364 terms.
 365
 366 Given an unbound identifier, the corresponding disambiguation domain is computed
 367 querying the library for all constants, inductive types, and inductive type
 368 constructors having it as their short name (see the \LOCATE{} query in
 369 Sect.~\ref{sec:metadata}).
 370
 371 \emph{Literal numbers} (question 2) are ambiguous entities as well since
 372 different kinds of numbers can be encoded in CIC (\IN, \IR, \IZ, \dots) using
 373 different encodings. Considering the restricted example of natural numbers we
 374 can for instance encode them in CIC using inductive datatypes with a number of
 375 constructor equal to the encoding base plus 1, obtaining one encoding for each
 376 base.
 377
 378 For each possible way of mapping a literal number to a CIC term, \MATITA{} is
 379 aware of a \emph{number intepretation function} which, when applied to the
 380 natural number denoted by the literal\footnote{at the moment only literal
 381 natural number are supported in the concrete syntax} returns a corresponding CIC
 382 term. The disambiguation domain for a given literal number is built applying to
 383 the literal all available number interpretation functions in turn.
 384
 385 Number interpretation functions can be defined in OCaml or directly using
 386 \TODO{notazione per i numeri}.
 387
 388 \emph{Operators} (question 3) are intuitively head of applications, as such they
 389 are always applied to a (possiblt empty) sequence of arguments. Their ambiguity
 390 is a need since it is often the case that some notation is used in an overloaded
 391 fashion to hide the use of different CIC constants which encodes similar
 392 concepts. For example, in the standard library of \MATITA{} the infix \texttt{+}
 393 notation is available building a binary \texttt{Op(+)} node, whose
 394 disambiguation domain may refer to different constants like the addition over
 395 natural numbers \URI{cic:/matita/nat/plus/plus.con} or that over real numbers of
 396 the \COQ{} standard library \URI{cic:/Coq/Reals/Rdefinitions/Rplus.con}.
 397
 398 For each possible way of mapping an operator application to a CIC term,
 399 \MATITA{} knows an \emph{operator interpretation function} which, when applied
 400 to an operator and its arguments, returns a CIC term. The disambiguation domain
 401 for a given operator is built applying to the operator and its arguments all
 402 available operator interpretation functions in turn.
 403
 404 Operator interpretation functions could be added using the
 405 \texttt{interpretation} statement. For example, among the first line of the
 406 script \FILE{matita/library/logic/equality.ma} from the \MATITA{} standard
 407 library we read:
 408
 409 \begin{grafite}
 410 interpretation "leibnitz's equality"
 411  'eq x y =
 412    (cic:/matita/logic/equality/eq.ind#xpointer(1/1) _ x y).
 413 \end{grafite}
 414
 415 Evaluating it in \MATITA{} will add an operator interpretation function for the
 416 binary operator \texttt{eq} which expands to the CIC term on the right hand side
 417 of the statement. That CIC term can be written using only built-in concrete
 418 syntax, can contain no ambiguity source; still, it can refer to operator
 419 arguments bound on the left hand side and can contain implicit terms (denoted
 420 with \texttt{\_}) which will be expanded to fresh metavariables. The latter
 421 feature is used in the example above for the first argument of Leibniz's
 422 polymorhpic equality.
 423
 424 \subsubsection{Disambiguation algorithm}
 425
 426 \NOTE{assumo\\
 427       che si sia\\
 428       gia' parlato\\
 429       di refine}
 430
 431 A \emph{disambiguation algorithm} takes as input a content level term and return
 432 a fully determined CIC term. The key observation on which a disambiguation
 433 algorithm is based is that given a content level term with more than one sources
 434 of ambiguity, not all possible combination of interpretation lead to a typable
 435 CIC term. In the term of Ex.~\ref{ex:disambiguation} for instance the
 436 interpretation of \texttt{ln} as a function from \IR to \IR and the
 437 interpretation of \texttt{1} as the Peano number $1$ can't coexists. The notion
 438 of ``can't coexists'' in the disambiguation of \MATITA{} is inherited from the
 439 refiner described in Sect.~\ref{sec:metavariables}: as long as
 440 $\mathit{refine}(c)\neq\epsilon$, the combination of interpretation which led to
 441 $c$
 442 can coexists.
 443
 444 The \emph{naive disambiguation algorithm} takes as input a content level term
 445 $t$ and proceeds as follows:
 446
 447 \begin{enumerate}
 448
 449  \item Create disambiguation domains $\{D_i | i\in\mathit{Dom}(t)\}$, where
 450   $\mathit{Dom}(t)$ is the set of ambiguity sources of $t$. Each $D_i$ is a set
 451   of CIC terms and can be built as described above.
 452
 453  \item Let $\Phi = \{\phi_i | {i\in\mathit{Dom}(t)},\phi_i\in D_i\}$ be an
 454   interpretation for $t$. Given $t$ and an interpretation $\Phi$, a CIC term is
 455   fully determined. Iterate over all possible interpretations of $t$ and refine
 456   the corresponding CIC terms, keep only interpretations which lead to CIC terms
 457   $c$ s.t. $\mathit{refine}(c)\neq\epsilon$ (i.e. interpretations that determine
 458   typable terms).
 459
 460  \item Let $n$ be the number of interpretations who survived step 2. If $n=0$
 461   signal a type error. If $n=1$ we have found exactly one CIC term corresponding
 462   to $t$, returns it as output of the disambiguation phase. If $n>1$ we have
 463   found many different CIC terms which can correspond to the content level term,
 464   let the user choose one of the $n$ interpretations and returns the
 465   corresponding term.
 466
 467 \end{enumerate}
 468
 469 The above algorithm is highly inefficient since the number of possible
 470 interpretations $\Phi$ grows exponentially with the number of ambiguity sources.
 471 The actual algorithm used in \MATITA{} is far more efficient being, in the
 472 average case, linear in the number of ambiguity sources.
 473
 474 The efficient algorithm --- thoroughly described along with an analysis of its
 475 complexity in~\cite{disambiguation} --- exploit the refiner and the metavariable
 476 extension (Sect.~\ref{sec:metavariables}) of the calculus used in \MATITA.
 477
 478 \TODO{FINQUI}
 479
 480 The efficient algorithm can be applied if the logic can be extended with
 481 metavariables and a refiner can be implemented. This is the case for CIC and
 482 several other logics.
 483 \emph{Metavariables}~\cite{munoz} are typed, non linear placeholders that can
 484 occur in terms; $?_i$ usually denotes the $i$-th metavariable, while $?$ denotes
 485 a freshly created metavariable. A \emph{refiner}~\cite{McBride} is a
 486 function whose input is a term with placeholders and whose output is either a
 487 new term obtained instantiating some placeholder or $\epsilon$, meaning that no
 488 well typed instantiation could be found for the placeholders occurring in
 489 the term (type error).
 490
 491 The efficient algorithm starts with an interpretation $\Phi_0 = \{\phi_i |
 492 \phi_i = ?, i\in\mathit{Dom}(t)\}$,
 493 which associates a fresh metavariable to each
 494 source of ambiguity. Then it iterates refining the current CIC term (i.e. the
 495 term obtained interpreting $t$ with $\Phi_i$). If the refinement succeeds the
 496 next interpretation $\Phi_{i+1}$ will be created \emph{making a choice}, that is
 497 replacing a placeholder with one of the possible choice from the corresponding
 498 disambiguation domain. The placeholder to be replaced is chosen following a
 499 preorder visit of the ambiguous term. If the refinement fails the current set of
 500 choices cannot lead to a well-typed term and backtracking is attempted.
 501 Once an unambiguous correct interpretation is found (i.e. $\Phi_i$ does no
 502 longer contain any placeholder), backtracking is attempted
 503 anyway to find the other correct interpretations.
 504
 505 The intuition which explain why this algorithm is more efficient is that as soon
 506 as a term containing placeholders is not typable, no further instantiation of
 507 its placeholders could lead to a typable term. For example, during the
 508 disambiguation of user input \texttt{\TEXMACRO{forall} x. x*0 = 0}, an
 509 interpretation $\Phi_i$ is encountered which associates $?$ to the instance
 510 of \texttt{0} on the right, the real number $0$ to the instance of \texttt{0} on
 511 the left, and the multiplication over natural numbers (\texttt{mult} for short)
 512 to \texttt{*}. The refiner will fail, since \texttt{mult} require a natural
 513 argument, and no further instantiation of the placeholder will be tried.
 514
 515 If, at the end of the disambiguation, more than one possible interpretations are
 516 possible, the user will be asked to choose the intended one (see
 517 Fig.~\ref{fig:disambiguation}).
 518
 519 \begin{figure}[htb]
 520 %   \centerline{\includegraphics[width=0.9\textwidth]{disambiguation-1}}
 521   \caption{\label{fig:disambiguation} Disambiguation: interpretation choice}
 522 \end{figure}
 523
 524 Details of the disambiguation algorithm of \WHELP{} can
 525 be found in~\cite{disambiguation}, where an equivalent algorithm
 526 that avoids backtracking is also presented.
 527
 528
 529 \subsection{notazione}
 530 \label{sec:notation}
 531 \ASSIGNEDTO{zack}
 532
 533 \subsection{mathml}
 534 \ASSIGNEDTO{zack}
 535
 536 \subsection{selezione semantica, cut paste, hyperlink}
 537 \ASSIGNEDTO{zack}
 538
 539 \subsection{pattern}
 540 \ASSIGNEDTO{gares}\\
 541 Patterns are the textual counterpart of the MathML widget graphical
 542 selection.
 543
 544 Matita benefits of a graphical interface and a powerful MathML rendering
 545 widget that allows the user to select pieces of the sequent he is working
 546 on. While this is an extremely intuitive way for the user to
 547 restrict the application of tactics, for example, to some subterms of the
 548 conclusion or some hypothesis, the way this action is recorded to the text
 549 script is not obvious.\\
 550 In \MATITA{} this issue is addressed by patterns.
 551
 552 \subsubsection{Pattern syntax}
 553 A pattern is composed of two terms: a $\NT{sequent\_path}$ and a
 554 $\NT{wanted}$.
 555 The former mocks-up a sequent, discharging unwanted subterms with $?$ and
 556 selecting the interesting parts with the placeholder $\%$.
 557 The latter is a term that lives in the context of the placeholders.
 558
 559 The concrete syntax is reported in table \ref{tab:pathsyn}
 560 \NOTE{uso nomi diversi \\dalla grammatica \\ ma che hanno + senso}
 561 \begin{table}
 562  \caption{\label{tab:pathsyn} Concrete syntax of \MATITA{} patterns.\strut}
 563 \hrule
 564 \[
 565 \begin{array}{@{}rcll@{}}
 566   \NT{pattern} &
 567     ::= & [~\verb+in match+~\NT{wanted}~]~[~\verb+in+~\NT{sequent\_path}~] & \\
 568   \NT{sequent\_path} &
 569     ::= & \{~\NT{ident}~[~\verb+:+~\NT{multipath}~]~\}~
 570       [~\verb+\vdash+~\NT{multipath}~] & \\
 571   \NT{wanted} & ::= & \NT{term} & \\
 572   \NT{multipath} & ::= & \NT{term\_with\_placeholders} & \\
 573 \end{array}
 574 \]
 575 \hrule
 576 \end{table}
 577
 578 \subsubsection{How patterns work}
 579 Patterns mimic the user's selection in two steps. The first one
 580 selects roots (subterms) of the sequent, using the
 581 $\NT{sequent\_path}$,  while the second
 582 one searches the $\NT{wanted}$ term starting from these roots. Both are
 583 optional steps, and by convention the empty pattern selects the whole
 584 conclusion.
 585
 586 \begin{description}
 587 \item[Phase 1]
 588   concerns only the $[~\verb+in+~\NT{sequent\_path}~]$
 589   part of the syntax. $\NT{ident}$ is an hypothesis name and
 590   selects the assumption where the following optional $\NT{multipath}$
 591   will operate. \verb+\vdash+ can be considered the name for the goal.
 592   If the whole pattern is omitted, the whole goal will be selected.
 593   If one or more hypotheses names are given the selection is restricted to
 594   these assumptions. If a $\NT{multipath}$ is omitted the whole
 595   assumption is selected. Remember that the user can be mostly
 596   unaware of this syntax, since the system is able to write down a
 597   $\NT{sequent\_path}$ starting from a visual selection.
 598   \NOTE{Questo ancora non va\\in matita}
 599
 600   A $\NT{multipath}$ is a CiC term in which a special constant $\%$
 601   is allowed.
 602   The roots of discharged subterms are marked with $?$, while $\%$
 603   is used to select roots. The default $\NT{multipath}$, the one that
 604   selects the whole term, is simply $\%$.
 605   Valid $\NT{multipath}$ are, for example, $(?~\%~?)$ or $\%~\verb+\to+~(\%~?)$
 606   that respectively select the first argument of an application or
 607   the source of an arrow and the head of the application that is
 608   found in the arrow target.
 609
 610   The first phase selects not only terms (roots of subterms) but also
 611   their context that will be eventually used in the second phase.
 612
 613 \item[Phase 2]
 614   plays a role only if the $[~\verb+in match+~\NT{wanted}~]$
 615   part is specified. From the first phase we have some terms, that we
 616   will see as subterm roots, and their context. For each of these
 617   contexts the $\NT{wanted}$ term is disambiguated in it and the
 618   corresponding root is searched for a subterm $\alpha$-equivalent to
 619   $\NT{wanted}$. The result of this search is the selection the
 620   pattern represents.
 621
 622 \end{description}
 623
 624 \noindent
 625 Since the first step is equipotent to the composition of the two
 626 steps, the system uses it to represent each visual selection.
 627 The second step is only meant for the
 628 experienced user that writes patterns by hand, since it really
 629 helps in writing concise patterns as we will see in the
 630 following examples.
 631
 632 \subsubsection{Examples}
 633 To explain how the first step works let's give an example. Consider
 634 you want to prove the uniqueness of the identity element $0$ for natural
 635 sum, and that you can relay on the previously demonstrated left
 636 injectivity of the sum, that is $inj\_plus\_l:\forall x,y,z.x+y=z+y \to x =z$.
 637 Typing
 638 \begin{grafite}
 639 theorem valid_name: \forall n,m. m + n = n \to m = O.
 640   intros (n m H).
 641 \end{grafite}
 642 \noindent
 643 leads you to the following sequent
 644 \sequent{
 645 n:nat\\
 646 m:nat\\
 647 H: m + n = n}{
 648 m=O
 649 }
 650 \noindent
 651 where you want to change the right part of the equivalence of the $H$
 652 hypothesis with $O + n$ and then use $inj\_plus\_l$ to prove $m=O$.
 653 \begin{grafite}
 654   change in H:(? ? ? %) with (O + n).
 655 \end{grafite}
 656 \noindent
 657 This pattern, that is a simple instance of the $\NT{sequent\_path}$
 658 grammar entry, acts on $H$ that has type (without notation) $(eq~nat~(m+n)~n)$
 659 and discharges the head of the application and the first two arguments with a
 660 $?$ and selects the last argument with $\%$. The syntax may seem uncomfortable,
 661 but the user can simply select with the mouse the right part of the equivalence
 662 and left to the system the burden of writing down in the script file the
 663 corresponding pattern with $?$ and $\%$ in the right place (that is not
 664 trivial, expecially where implicit arguments are hidden by the notation, like
 665 the type $nat$ in this example).
 666
 667 Changing all the occurrences of $n$ in the hypothesis $H$ with $O+n$
 668 works too and can be done, by the experienced user, writing directly
 669 a simpler pattern that uses the second phase.
 670 \begin{grafite}
 671   change in match n in H with (O + n).
 672 \end{grafite}
 673 \noindent
 674 In this case the $\NT{sequent\_path}$ selects the whole $H$, while
 675 the second phase searches the wanted $n$ inside it by
 676 $\alpha$-equivalence. The resulting
 677 equivalence will be $m+(O+n)=O+n$ since the second phase found two
 678 occurrences of $n$ in $H$ and the tactic changed both.
 679
 680 Just for completeness the second pattern is equivalent to the
 681 following one, that is less readable but uses only the first phase.
 682 \begin{grafite}
 683   change in H:(? ? (? ? %) %) with (O + n).
 684 \end{grafite}
 685 \noindent
 686
 687 \subsubsection{Tactics supporting patterns}
 688 In \MATITA{} all the tactics that can be restricted to subterm of the working
 689 sequent accept the pattern syntax. In particular these tactics are: simplify,
 690 change, fold, unfold, generalize, replace and rewrite.
 691
 692 \NOTE{attualmente rewrite e \\ fold non supportano \\ phase 2. per
 693 supportarlo\\bisogna far loro trasformare\\il pattern phase1+phase2\\
 694 in un pattern phase1only\\come faccio nell'ultimo esempio.\\lo si fa
 695 con una pattern\_of(select(pattern))}
 696
 697 \subsubsection{Comparison with Coq}
 698 Coq has a two diffrent ways of restricting the application of tactis to
 699 subterms of the sequent, both relaying on the same special syntax to identify
 700 a term occurrence.
 701
 702 The first way is to use this special syntax to specify directly to the
 703 tactic the occurrnces of a wanted term that should be affected, while
 704 the second is to prepare the sequent with another tactic called
 705 pattern and the apply the real tactic. Note that the choice is not
 706 left to the user, since some tactics needs the sequent to be prepared
 707 with pattern and do not accept directly this special syntax.
 708
 709 The base idea is that to identify a subterm of the sequent we can
 710 write it and say that we want, for example, the third and the fifth
 711 occurce of it (counting from left to right). In our previous example,
 712 to change only the left part of the equivalence, the correct command
 713 is
 714 \begin{grafite}
 715   change n at 2 in H with (O + n)
 716 \end{grafite}
 717 \noindent
 718 meaning that in the hypothesis $H$ the $n$ we want to change is the
 719 second we encounter proceeding from left toright.
 720
 721 The tactic pattern computes a
 722 $\beta$-expansion of a part of the sequent with respect to some
 723 occurrences of the given term. In the previous example the following
 724 command
 725 \begin{grafite}
 726   pattern n at 2 in H
 727 \end{grafite}
 728 \noindent
 729 would have resulted in this sequent
 730 \begin{grafite}
 731   n : nat
 732   m : nat
 733   H : (fun n0 : nat => m + n = n0) n
 734   ============================
 735    m = 0
 736 \end{grafite}
 737 \noindent
 738 where $H$ is $\beta$-expanded over the second $n$
 739 occurrence. This is a trick to make the unification algorithm ignore
 740 the head of the application (since the unification is essentially
 741 first-order) but normally operate on the arguments.
 742 This works for some tactics, like rewrite and replace,
 743 but for example not for change and other tactics that do not relay on
 744 unification.
 745
 746 The idea behind this way of identifying subterms in not really far
 747 from the idea behind patterns, but really fails in extending to
 748 complex notation, since it relays on a mono-dimensional sequent representation.
 749 Real math notation places arguments upside-down (like in indexed sums or
 750 integrations) or even puts them inside a bidimensional matrix.
 751 In these cases using the mouse to select the wanted term is probably the
 752 only way to tell the system exactly what you want to do.
 753
 754 One of the goals of \MATITA{} is to use modern publishing techiques, and
 755 adopting a method for restricting tactics application domain that discourages
 756 using heavy math notation, would definitively be a bad choice.
 757
 758 \subsection{tatticali}
 759 \ASSIGNEDTO{gares}
 760
 761 \subsection{named variable e disambiguazione lazy}
 762 \ASSIGNEDTO{csc}
 763
 764 \subsection{metavariabili}
 765 \label{sec:metavariables}
 766 \ASSIGNEDTO{csc}
 767
 768 \begin{verbatim}
 769
 770 \end{verbatim}
 771
 772 \section{Drawbacks, missing, \dots}
 773
 774 \subsection{moduli}
 775 \ASSIGNEDTO{}
 776
 777 \subsection{ltac}
 778 \ASSIGNEDTO{}
 779
 780 \subsection{estrazione}
 781 \ASSIGNEDTO{}
 782
 783 \subsection{localizzazione errori}
 784 \ASSIGNEDTO{}
 785
 786 \acknowledgements
 787 We would like to thank all the students that during the past
 788 five years collaborated in the \HELM{} project and contributed to
 789 the development of Matita, and in particular
 790 A.Griggio, F.Guidi, P. Di Lena, L.Padovani, I.Schena, M.Selmi,
 791 V.Tamburrelli.
 792
 793 \theendnotes
 794
 795 \bibliography{matita}
 796
 797
 798 \end{document}
 799