helm/papers/matita/matita.tex

   1 \documentclass[a4paper]{llncs}
   2 \pagestyle{headings}
   3 \usepackage{color}
   4 \usepackage{graphicx}
   5 \usepackage{amssymb,amsmath}
   6 \usepackage{hyperref}
   7 \usepackage{picins}
   8 \usepackage{color}
   9 \usepackage{fancyvrb}
  10
  11 \definecolor{gray}{gray}{0.85}
  12 %\newcommand{\logo}[3]{
  13 %\parpic(0cm,0cm)(#2,#3)[l]{\includegraphics[width=#1]{whelp-bw}}
  14 %}
  15
  16 \newcommand{\AUTO}{\textsc{Auto}}
  17 \newcommand{\COQ}{Coq}
  18 \newcommand{\ELIM}{\textsc{Elim}}
  19 \newcommand{\HELM}{Helm}
  20 \newcommand{\HINT}{\textsc{Hint}}
  21 \newcommand{\IN}{\ensuremath{\mathbb{N}}}
  22 \newcommand{\INSTANCE}{\textsc{Instance}}
  23 \newcommand{\IR}{\ensuremath{\mathbb{R}}}
  24 \newcommand{\IZ}{\ensuremath{\mathbb{Z}}}
  25 \newcommand{\LIBXSLT}{LibXSLT}
  26 \newcommand{\LOCATE}{\textsc{Locate}}
  27 \newcommand{\MATCH}{\textsc{Match}}
  28 \newcommand{\MATITA}{Matita}
  29 \newcommand{\METAHEADING}{Symbol & Position \\ \hline\hline}
  30 \newcommand{\MOWGLI}{MoWGLI}
  31 \newcommand{\NAT}{\ensuremath{\mathit{nat}}}
  32 \newcommand{\NATIND}{\mathit{nat\_ind}}
  33 \newcommand{\NUPRL}{NuPRL}
  34 \newcommand{\OCAML}{OCaml}
  35 \newcommand{\PROP}{\mathit{Prop}}
  36 \newcommand{\REF}[3]{\ensuremath{\mathit{Ref}_{#1}(#2,#3)}}
  37 \newcommand{\TEXMACRO}[1]{\texttt{\char92 #1}}
  38 \newcommand{\UWOBO}{UWOBO}
  39 \newcommand{\WHELP}{Whelp}
  40
  41 \definecolor{gray}{gray}{0.85} % 1 -> white; 0 -> black
  42 \newcommand{\NT}[1]{\langle\mathit{#1}\rangle}
  43 \newcommand{\URI}[1]{\texttt{#1}}
  44
  45 %{\end{SaveVerbatim}\setlength{\fboxrule}{.5mm}\setlength{\fboxsep}{2mm}%
  46 \newenvironment{grafite}{\VerbatimEnvironment
  47  \begin{SaveVerbatim}{boxtmp}}%
  48  {\end{SaveVerbatim}\setlength{\fboxsep}{3mm}%
  49   \begin{center}
  50    \fcolorbox{black}{gray}{\BUseVerbatim[boxwidth=0.9\linewidth]{boxtmp}}
  51   \end{center}}
  52
  53 \newcommand{\ASSIGNEDTO}[1]{\textbf{Assigned to:} #1}
  54 \newcommand{\FILE}[1]{\texttt{#1}}
  55 \newcommand{\NOTE}[1]{\marginpar{\scriptsize #1}}
  56 \newcommand{\TODO}[1]{\textbf{TODO: #1}}
  57
  58 \newsavebox{\tmpxyz}
  59 \newcommand{\sequent}[2]{
  60   \savebox{\tmpxyz}[0.9\linewidth]{
  61     \begin{minipage}{0.9\linewidth}
  62       \ensuremath{#1} \\
  63       \rule{3cm}{0.03cm}\\
  64       \ensuremath{#2}
  65     \end{minipage}}\setlength{\fboxsep}{3mm}%
  66   \begin{center}
  67    \fcolorbox{black}{gray}{\usebox{\tmpxyz}}
  68   \end{center}}
  69
  70 \title{The Matita proof assistant}
  71 \author{Andrea Asperti, Claudio Sacerdoti Coen, Enrico Tassi
  72  and Stefano Zacchiroli}
  73 \institute{Department of Computer Science, University of Bologna\\
  74  Mura Anteo Zamboni, 7 --- 40127 Bologna, ITALY\\
  75  \email{$\{$asperti,sacerdot,tassi,zacchiro$\}$@cs.unibo.it}}
  76 \bibliographystyle{plain}
  77
  78 \begin{document}
  79 \maketitle
  80
  81 \begin{abstract}
  82 \end{abstract}
  83
  84 \section{Introduction}
  85 \label{sec:intro}
  86 {\em Matita} is the proof assistant under development by the \HELM{} team
  87 \cite{mkm-helm} at the University of Bologna, under the direction of
  88 Prof.~Asperti.
  89 The origin of the system goes back to 1999. At the time we were mostly
  90 interested to develop tools and techniques to enhance the accessibility
  91 via web of formal libraries of mathematics. Due to its dimension, the
  92 library of the \COQ{} proof assistant (of the order of 35'000 theorems)
  93 was choosed as a privileged test bench for our work, although experiments
  94 have been also conducted with other systems, and notably with \NUPRL{}.
  95 The work, mostly performed in the framework of the recently concluded
  96 European project IST-33562 \MOWGLI{}~\cite{pechino}, mainly consisted in the
  97 following teps:
  98 \begin{itemize}
  99 \item exporting the information from the internal representation of
 100  \COQ{} to a system and platform independent format. Since XML was at the
 101 time an emerging standard, we naturally adopted this technology, fostering
 102 a content-based architecture for future system, where the documents
 103 of the library were the the main components around which everything else
 104 has to be build;
 105 \item developing indexing and searching techniques supporting semantic
 106  queries to the library; these efforts gave birth to our \WHELP{}
 107 search engine, described in~\cite{whelp};
 108 \item developing languages and tools for a high-quality notational
 109 rendering of mathematical information; in particular, we have been
 110 active in the MathML Working group since 1999, and developed inside
 111 \HELM{} a MathML-compliant widget for the GTK graphical environment
 112 which can be integrated in any application.
 113 \end{itemize}
 114 The exportation issue, extensively discussed in \cite{exportation-module},
 115 has several major implications worth to be discussed.
 116
 117 The first
 118 point concerns the kind of content information to be exported. In a
 119 proof assistant like \COQ{}, proofs are represented in at least three clearly
 120 distinguishable formats: \emph{scripts} (i.e. sequences of commands issued by the
 121 user to the system during an interactive session of proof), \emph{proof objects}
 122 (which is the low-level representation of proofs in the form of
 123 lambda-terms readable to and checked by kernel) and \emph{proof-trees} (which
 124 is a kind of intermediate representation, vaguely inspired by a sequent
 125 like notation, that inherits most of the defects but essentially
 126 none of the advantages of the previous representations).
 127 Partially related to this problem, there is the
 128 issue of the {\em granularity} of the library: scripts usually comprise
 129 small developments with many definitions and theorems, while
 130 proof objects correspond to individual mathematical items.
 131
 132 In our case, the choice of the content encoding was eventually dictated
 133 by the methodological assumption of offering the information in a
 134 stable and system-independent format. The language of scripts is too
 135 oriented to \COQ, and it changes too rapidly to be of any interest
 136 to third parties. On the other side, the language of proof objects
 137 merely depend on
 138 the logical framework (the Calculus of Inductive Constructions, in
 139 the case of \COQ), is grammatically simple, semantically clear and,
 140 especially, is very stable (as kernels of proof assistants
 141 often are).
 142 So the granularity of the library is at the level of individual
 143 objects, that also justifies from another point of view the need
 144 for efficient searching techniques for retrieving individual
 145 logical items from the repository.
 146
 147 The main (possibly only) problem with proof objects is that they are
 148 difficult to read and do not directly correspond to what the user typed
 149 in. An analogy frequently made in the proof assistant community is that of
 150 comparing the vernacular language of scripts to a high level source language
 151 and lambda terms to the assembly language they are compiled in. We do not
 152 share this view and prefer to look at scripts as an imperative language,
 153 and to lambda terms as their denotational semantics; still, however,
 154 denotational semantics is possibly more formal but surely not more readable
 155 than the imperative source.
 156
 157 For all the previous reasons, a huge amount of work inside \MOWGLI{} has
 158 been devoted to automatic reconstruction of proofs in natural language
 159 from lambda terms. Since lambda terms are in close connection
 160 with natural deduction
 161 (that is still the most natural logical language discovered so far)
 162 the work is not hopeless as it may seem, especially if rendering
 163 is combined, as in our case, with dynamic features supporting
 164 in-line expansions or contractions of subproofs. The final
 165 rendering is probably not entirely satisfactory (see \cite{ida} for a
 166 discussion), but surely
 167 readable (the actual quality largely depends by the way the lambda
 168 term is written).
 169
 170 Summing up, we already disposed of the following tools/techniques:
 171 \begin{itemize}
 172 \item XML specifications for the Calculus of Inductive Constructions,
 173 with tools for parsing and saving mathematical objects in such a format;
 174 \item metadata specifications and tools for indexing and querying the
 175 XML knowledge base;
 176 \item a proof checker (i.e. the {\em kernel} of a proof assistant),
 177  implemented to check that we exported form the \COQ{} library all the
 178 logically relevant content;
 179 \item a sophisticated parser (used by the search engine), able to deal
 180 with potentially ambiguous and incomplete information, typical of the
 181 mathematical notation \cite{};
 182 \item a {\em refiner}, i.e. a type inference system, based on complex
 183 existential variables, used by the disambiguating parser;
 184 \item complex transformation algorithms for proof rendering in natural
 185 language;
 186 \item an innovative rendering widget, supporting high-quality bidimensional
 187 rendering, and semantic selection, i.e. the possibility to select semantically
 188 meaningful rendering expressions, and to past the respective content into
 189 a different text area.
 190 \NOTE{il widget\\ non ha sel\\ semantica}
 191 \end{itemize}
 192 Starting from all this, the further step of developing our own
 193 proof assistant was too
 194 small and too tempting to be neglected. Essentially, we ``just'' had to
 195 add an authoring interface, and a set of functionalities for the
 196 overall management of the library, integrating everything into a
 197 single system. \MATITA{} is the result of this effort.
 198
 199 At first sight, \MATITA{} looks as (and partly is) a \COQ{} clone. This is
 200 more the effect of the circumstances of its creation described
 201 above than the result of a deliberate design. In particular, we
 202 (essentially) share the same foundational dialect of \COQ{} (the
 203 Calculus of Inductive Constructions), the same implementative
 204 language (\OCAML{}), and the same (script based) authoring philosophy.
 205 However, as we shall see, the analogy essentially stops here.
 206
 207 In a sense; we like to think of \MATITA{} as the way \COQ{} would
 208 look like if entirely rewritten from scratch: just to give an
 209 idea, although \MATITA{} currently supports almost all functionalities of
 210 \COQ{}, it links 60'000 lins of \OCAML{} code, against ... of \COQ{} (and
 211 we are convinced that, starting from scratch again, we could furtherly
 212 reduce our code in sensible way).\NOTE{righe\\\COQ{}}
 213
 214 \begin{itemize}
 215  \item scelta del sistema fondazionale
 216  \item sistema indipendente (da Coq)
 217   \begin{itemize}
 218    \item possibilit\`a di sperimentare (soluzioni architetturali, logiche,
 219     implementative, \dots)
 220    \item compatibilit\`a con sistemi legacy
 221   \end{itemize}
 222 \end{itemize}
 223
 224 \section{Features}
 225
 226 \subsection{mathml}
 227 \ASSIGNEDTO{zack}
 228
 229 \subsection{metavariabili}
 230 \label{sec:metavariables}
 231 \ASSIGNEDTO{csc}
 232
 233 \subsection{pattern}
 234 \ASSIGNEDTO{gares}\\
 235 Patterns are the textual counterpart of the MathML widget graphical
 236 selection.
 237
 238 Matita benefits of a graphical interface and a powerful MathML rendering
 239 widget that allows the user to select pieces of the sequent he is working
 240 on. While this is an extremely intuitive way for the user to
 241 restrict the application of tactics, for example, to some subterms of the
 242 conclusion or some hypothesis, the way this action is recorded to the text
 243 script is not obvious.\\
 244 In \MATITA{} this issue is addressed by patterns.
 245
 246 \subsubsection{Pattern syntax}
 247 A pattern is composed of two terms: a $\NT{sequent\_path}$ and a
 248 $\NT{wanted}$.
 249 The former mocks-up a sequent, discharging unwanted subterms with $?$ and
 250 selecting the interesting parts with the placeholder $\%$.
 251 The latter is a term that lives in the context of the placeholders.
 252
 253 The concrete syntax is reported in table \ref{tab:pathsyn}
 254 \NOTE{uso nomi diversi \\dalla grammatica \\ ma che hanno + senso}
 255 \begin{table}
 256  \caption{\label{tab:pathsyn} Concrete syntax of \MATITA{} patterns.\strut}
 257 \hrule
 258 \[
 259 \begin{array}{@{}rcll@{}}
 260   \NT{pattern} &
 261     ::= & [~\verb+in match+~\NT{wanted}~]~[~\verb+in+~\NT{sequent\_path}~] & \\
 262   \NT{sequent\_path} &
 263     ::= & \{~\NT{ident}~[~\verb+:+~\NT{multipath}~]~\}~
 264       [~\verb+\vdash+~\NT{multipath}~] & \\
 265   \NT{wanted} & ::= & \NT{term} & \\
 266   \NT{multipath} & ::= & \NT{term\_with\_placeholders} & \\
 267 \end{array}
 268 \]
 269 \hrule
 270 \end{table}
 271
 272 \subsubsection{How patterns work}
 273 Patterns mimic the user's selection in two steps. The first one
 274 selects roots (subterms) of the sequent, using the
 275 $\NT{sequent\_path}$,  while the second
 276 one searches the $\NT{wanted}$ term starting from these roots. Both are
 277 optional steps, and by convention the empty pattern selects the whole
 278 conclusion.
 279
 280 \begin{description}
 281 \item[Phase 1]
 282   concerns only the $[~\verb+in+~\NT{sequent\_path}~]$
 283   part of the syntax. $\NT{ident}$ is an hypothesis name and
 284   selects the assumption where the following optional $\NT{multipath}$
 285   will operate. \verb+\vdash+ can be considered the name for the goal.
 286   If the whole pattern is omitted, the whole goal will be selected.
 287   If one or more hypotheses names are given the selection is restricted to
 288   these assumptions. If a $\NT{multipath}$ is omitted the whole
 289   assumption is selected. Remember that the user can be mostly
 290   unaware of this syntax, since the system is able to write down a
 291   $\NT{sequent\_path}$ starting from a visual selection.
 292   \NOTE{Questo ancora non va\\in matita}
 293
 294   A $\NT{multipath}$ is a CiC term in which a special constant $\%$
 295   is allowed.
 296   The roots of discharged subterms are marked with $?$, while $\%$
 297   is used to select roots. The default $\NT{multipath}$, the one that
 298   selects the whole term, is simply $\%$.
 299   Valid $\NT{multipath}$ are, for example, $(?~\%~?)$ or $\%~\verb+\to+~(\%~?)$
 300   that respectively select the first argument of an application or
 301   the source of an arrow and the head of the application that is
 302   found in the arrow target.
 303
 304   The first phase selects not only terms (roots of subterms) but also
 305   their context that will be eventually used in the second phase.
 306
 307 \item[Phase 2]
 308   plays a role only if the $[~\verb+in match+~\NT{wanted}~]$
 309   part is specified. From the first phase we have some terms, that we
 310   will see as subterm roots, and their context. For each of these
 311   contexts the $\NT{wanted}$ term is disambiguated in it and the
 312   corresponding root is searched for a subterm $\alpha$-equivalent to
 313   $\NT{wanted}$. The result of this search is the selection the
 314   pattern represents.
 315
 316 \end{description}
 317
 318 \noindent
 319 Since the first step is equipotent to the composition of the two
 320 steps, the system uses it to represent each visual selection.
 321 The second step is only meant for the
 322 experienced user that writes patterns by hand, since it really
 323 helps in writing concise patterns as we will see in the
 324 following examples.
 325
 326 \subsubsection{Examples}
 327 To explain how the first step works let's give an example. Consider
 328 you want to prove the uniqueness of the identity element $0$ for natural
 329 sum, and that you can relay on the previously demonstrated left
 330 injectivity of the sum, that is $inj\_plus\_l:\forall x,y,z.x+y=z+y \to x =z$.
 331 Typing
 332 \begin{grafite}
 333 theorem valid_name: \forall n,m. m + n = n \to m = O.
 334   intros (n m H).
 335 \end{grafite}
 336 \noindent
 337 leads you to the following sequent
 338 \sequent{
 339 n:nat\\
 340 m:nat\\
 341 H: m + n = n}{
 342 m=O
 343 }
 344 \noindent
 345 where you want to change the right part of the equivalence of the $H$
 346 hypothesis with $O + n$ and then use $inj\_plus\_l$ to prove $m=O$.
 347 \begin{grafite}
 348   change in H:(? ? ? %) with (O + n).
 349 \end{grafite}
 350 \noindent
 351 This pattern, that is a simple instance of the $\NT{sequent\_path}$
 352 grammar entry, acts on $H$ that has type (without notation) $(eq~nat~(m+n)~n)$
 353 and discharges the head of the application and the first two arguments with a
 354 $?$ and selects the last argument with $\%$. The syntax may seem uncomfortable,
 355 but the user can simply select with the mouse the right part of the equivalence
 356 and left to the system the burden of writing down in the script file the
 357 corresponding pattern with $?$ and $\%$ in the right place (that is not
 358 trivial, expecially where implicit arguments are hidden by the notation, like
 359 the type $nat$ in this example).
 360
 361 Changing all the occurrences of $n$ in the hypothesis $H$ with $O+n$
 362 works too and can be done, by the experienced user, writing directly
 363 a simpler pattern that uses the second phase.
 364 \begin{grafite}
 365   change in match n in H with (O + n).
 366 \end{grafite}
 367 \noindent
 368 In this case the $\NT{sequent\_path}$ selects the whole $H$, while
 369 the second phase searches the wanted $n$ inside it by
 370 $\alpha$-equivalence. The resulting
 371 equivalence will be $m+(O+n)=O+n$ since the second phase found two
 372 occurrences of $n$ in $H$ and the tactic changed both.
 373
 374 Just for completeness the second pattern is equivalent to the
 375 following one, that is less readable but uses only the first phase.
 376 \begin{grafite}
 377   change in H:(? ? (? ? %) %) with (O + n).
 378 \end{grafite}
 379 \noindent
 380
 381 \subsubsection{Tactics supporting patterns}
 382 In \MATITA{} all the tactics that can be restricted to subterm of the working
 383 sequent accept the pattern syntax. In particular these tactics are: simplify,
 384 change, fold, unfold, generalize, replace and rewrite.
 385
 386 \NOTE{attualmente rewrite e \\ fold non supportano \\ phase 2. per
 387 supportarlo\\bisogna far loro trasformare\\il pattern phase1+phase2\\
 388 in un pattern phase1only\\come faccio nell'ultimo esempio.\\lo si fa
 389 con una pattern\_of(select(pattern))}
 390
 391 \subsubsection{Comparison with Coq}
 392 Coq has a two diffrent ways of restricting the application of tactis to
 393 subterms of the sequent, both relaying on the same special syntax to identify
 394 a term occurrence.
 395
 396 The first way is to use this special syntax to specify directly to the
 397 tactic the occurrnces of a wanted term that should be affected, while
 398 the second is to prepare the sequent with another tactic called
 399 pattern and the apply the real tactic. Note that the choice is not
 400 left to the user, since some tactics needs the sequent to be prepared
 401 with pattern and do not accept directly this special syntax.
 402
 403 The base idea is that to identify a subterm of the sequent we can
 404 write it and say that we want, for example, the third and the fifth
 405 occurce of it (counting from left to right). In our previous example,
 406 to change only the left part of the equivalence, the correct command
 407 is
 408 \begin{grafite}
 409   change n at 2 in H with (O + n)
 410 \end{grafite}
 411 \noindent
 412 meaning that in the hypothesis $H$ the $n$ we want to change is the
 413 second we encounter proceeding from left toright.
 414
 415 The tactic pattern computes a
 416 $\beta$-expansion of a part of the sequent with respect to some
 417 occurrences of the given term. In the previous example the following
 418 command
 419 \begin{grafite}
 420   pattern n at 2 in H
 421 \end{grafite}
 422 \noindent
 423 would have resulted in this sequent
 424 \begin{grafite}
 425   n : nat
 426   m : nat
 427   H : (fun n0 : nat => m + n = n0) n
 428   ============================
 429    m = 0
 430 \end{grafite}
 431 \noindent
 432 where $H$ is $\beta$-expanded over the second $n$
 433 occurrence. This is a trick to make the unification algorithm ignore
 434 the head of the application (since the unification is essentially
 435 first-order) but normally operate on the arguments.
 436 This works for some tactics, like rewrite and replace,
 437 but for example not for change and other tactics that do not relay on
 438 unification.
 439
 440 The idea behind this way of identifying subterms in not really far
 441 from the idea behind patterns, but really fails in extending to
 442 complex notation, since it relays on a mono-dimensional sequent representation.
 443 Real math notation places arguments upside-down (like in indexed sums or
 444 integrations) or even puts them inside a bidimensional matrix.
 445 In these cases using the mouse to select the wanted term is probably the
 446 only way to tell the system exactly what you want to do.
 447
 448 One of the goals of \MATITA{} is to use modern publishing techiques, and
 449 adopting a method for restricting tactics application domain that discourages
 450 using heavy math notation, would definitively be a bad choice.
 451
 452 \subsection{tatticali}
 453 \ASSIGNEDTO{gares}
 454
 455 \begin{verbatim}
 456
 457 \end{verbatim}
 458
 459 \subsection{Disambiguation}
 460 \label{sec:disambiguation}
 461 \ASSIGNEDTO{zack}
 462
 463 \begin{table}
 464  \caption{\label{tab:termsyn} Concrete syntax of CIC terms: built-in
 465  notation\strut}
 466 \hrule
 467 \[
 468 \begin{array}{@{}rcll@{}}
 469   \NT{term} & ::= & & \mbox{\bf terms} \\
 470     &     & x & \mbox{(identifier)} \\
 471     &  |  & n & \mbox{(number)} \\
 472     &  |  & s & \mbox{(symbol)} \\
 473     &  |  & \mathrm{URI} & \mbox{(URI)} \\
 474     &  |  & \verb+_+ & \mbox{(implicit)}\TODO{sync} \\
 475     &  |  & \verb+?+n~[\verb+[+~\{\NT{subst}\}~\verb+]+] & \mbox{(meta)} \\
 476     &  |  & \verb+let+~\NT{ptname}~\verb+\def+~\NT{term}~\verb+in+~\NT{term} \\
 477     &  |  & \verb+let+~\NT{kind}~\NT{defs}~\verb+in+~\NT{term} \\
 478     &  |  & \NT{binder}~\{\NT{ptnames}\}^{+}~\verb+.+~\NT{term} \\
 479     &  |  & \NT{term}~\NT{term} & \mbox{(application)} \\
 480     &  |  & \verb+Prop+ \mid \verb+Set+ \mid \verb+Type+ \mid \verb+CProp+ & \mbox{(sort)} \\
 481     &  |  & \verb+match+~\NT{term}~ & \mbox{(pattern matching)} \\
 482     &     & ~ ~ [\verb+[+~\verb+in+~x~\verb+]+]
 483               ~ [\verb+[+~\verb+return+~\NT{term}~\verb+]+] \\
 484     &     & ~ ~ \verb+with [+~[\NT{rule}~\{\verb+|+~\NT{rule}\}]~\verb+]+ & \\
 485     &  |  & \verb+(+~\NT{term}~\verb+:+~\NT{term}~\verb+)+ & \mbox{(cast)} \\
 486     &  |  & \verb+(+~\NT{term}~\verb+)+ \\
 487   \NT{defs}  & ::= & & \mbox{\bf mutual definitions} \\
 488     &     & \NT{fun}~\{\verb+and+~\NT{fun}\} \\
 489   \NT{fun} & ::= & & \mbox{\bf functions} \\
 490     &     & \NT{arg}~\{\NT{ptnames}\}^{+}~[\verb+on+~x]~\verb+\def+~\NT{term} \\
 491   \NT{binder} & ::= & & \mbox{\bf binders} \\
 492     &     & \verb+\forall+ \mid \verb+\lambda+ \\
 493   \NT{arg} & ::= & & \mbox{\bf single argument} \\
 494     &     & \verb+_+ \mid x \\
 495   \NT{ptname} & ::= & & \mbox{\bf possibly typed name} \\
 496     &     & \NT{arg} \\
 497     &  |  & \verb+(+~\NT{arg}~\verb+:+~\NT{term}~\verb+)+ \\
 498   \NT{ptnames} & ::= & & \mbox{\bf bound variables} \\
 499     &     & \NT{arg} \\
 500     &  |  & \verb+(+~\NT{arg}~\{\verb+,+~\NT{arg}\}~[\verb+:+~\NT{term}]~\verb+)+ \\
 501   \NT{kind} & ::= & & \mbox{\bf induction kind} \\
 502     &     & \verb+rec+ \mid \verb+corec+ \\
 503   \NT{rule} & ::= & & \mbox{\bf rules} \\
 504     &     & x~\{\NT{ptname}\}~\verb+\Rightarrow+~\NT{term}
 505 \end{array}
 506 \]
 507 \hrule
 508 \end{table}
 509
 510 \subsubsection{Term input}
 511
 512 The primary form of user interaction employed by \MATITA{} is textual script
 513 editing: the user modifies it and evaluate step by step its composing
 514 \emph{statements}. Examples of statements are inductive type definitions,
 515 theorem declarations, LCF-style tacticals, and macros (e.g. \texttt{Check} can
 516 be used to ask the system to refine a given term and pretty print the result).
 517 Since many statements refer to terms of the underlying calculus, \MATITA{} needs
 518 a concrete syntax able to encode terms of the Calculus of Inductive
 519 Constructions.
 520
 521 Two of the requirements in the design of such a syntax are apparently in
 522 contrast:
 523 \begin{enumerate}
 524  \item the syntax should be as close as possible to common mathematical practice
 525   and implement widespread mathematical notations;
 526  \item each term described by the syntax should be non-ambiguous meaning that it
 527   should exists a function which associates to it a CIC term.
 528 \end{enumerate}
 529
 530 These two requirements are addressed in \MATITA{} by the mean of two mechanisms
 531 which work together: \emph{term disambiguation} and \emph{extensible notation}.
 532 Their interaction is visible in the architecture of the \MATITA{} input phase,
 533 depicted in Fig.~\ref{fig:inputphase}. The architecture is articulated as a
 534 pipline of three levels: the concrete syntax level (level 0) is the one the user
 535 has to deal with when inserting CIC terms; the abstract syntax level (level 2)
 536 is an internal representation which intuitively encodes mathematical formulae at
 537 the content level~\cite{adams}\cite{mkm-structure}; the last level is that of
 538 CIC terms.
 539
 540 \begin{figure}[ht]
 541  \begin{center}
 542   \includegraphics[width=0.9\textwidth]{input_phase}
 543   \caption{\MATITA{} input phase}
 544  \end{center}
 545  \label{fig:inputphase}
 546 \end{figure}
 547
 548 Requirement (1) is addressed by a built-in concrete syntax for terms, described
 549 in Tab.~\ref{tab:termsyn}, and the extensible notation mechanisms which offers a
 550 way for extending available mathematical notations. Extensible notation, which
 551 is also in charge of providing a parsing function mapping concrete syntax terms
 552 to content level terms, is described in Sect.~\ref{sec:notation}.  Requirement
 553 (2) is addressed by the conjunct action of that parsing function and
 554 disambiguation which provides a function from content level terms to CIC terms.
 555
 556 \subsubsection{Sources of ambiguity}
 557
 558 The translation from content level terms to CIC terms is not straightforward
 559 because some nodes of the content encoding admit more that one CIC encoding,
 560 invalidating requirement (2).
 561
 562 \begin{example}
 563  \label{ex:disambiguation}
 564
 565  Consider the term at the concrete syntax level \texttt{\TEXMACRO{forall} x. x +
 566  ln 1 = x} of Fig.~\ref{fig:inputphase}(a), it can be the type of a lemma the
 567  user may want to prove. Assuming that both \texttt{+} and \texttt{=} are parsed
 568  as infix operators, all the following questions are legitimate and must be
 569  answered before obtaining a CIC term from its content level encoding
 570  (Fig.~\ref{fig:inputphase}(b)):
 571
 572  \begin{enumerate}
 573
 574   \item Since \texttt{ln} is an unbound identifier, which CIC constants does it
 575    represent? Many different theorems in the library may share its (rather
 576    short) name \dots
 577
 578   \item Which kind of number (\IN, \IR, \dots) the \texttt{1} literal stand for?
 579    Which encoding is used in CIC to represent it? E.g., assuming $1\in\IN$, is
 580    it an unary or a binary encoding?
 581
 582   \item Which kind of equality the ``='' node represents? Is it Leibniz's
 583    polymorhpic equality? Is it a decidable equality over \IN, \IR, \dots?
 584
 585  \end{enumerate}
 586
 587 \end{example}
 588
 589 In \MATITA, three \emph{sources of ambiguity} are admitted for content level
 590 terms: unbound identifiers, literal numbers, and operators. Each instance of
 591 ambiguity sources (ambiguous entity) occuring in a content level term is
 592 associated to a \emph{disambiguation domain}. Intuitively a disambiguation
 593 domain is a set of CIC terms which may be replaced for an ambiguous entity
 594 during disambiguation. Each item of the domain is said to be an
 595 \emph{interpretation} for the ambiguous entity.
 596
 597 \emph{Unbound identifiers} (question 1) are ambiguous entities since the
 598 namespace of CIC objects is not flat and the same identifier may denote many
 599 ofthem. For example the short name \texttt{plus\_assoc} in the \HELM{} library
 600 is shared by three different theorems stating the associative property of
 601 different additions.  This kind of ambiguity is avoidable if the user is willing
 602 to use long names (in form of URIs in the \texttt{cic://} scheme) in the
 603 concrete syntax, with the obvious drawbacks of obtaining long and unreadable
 604 terms.
 605
 606 Given an unbound identifier, the corresponding disambiguation domain is computed
 607 querying the library for all constants, inductive types, and inductive type
 608 constructors having it as their short name (see the \LOCATE{} query in
 609 Sect.~\ref{sec:metadata}).
 610
 611 \emph{Literal numbers} (question 2) are ambiguous entities as well since
 612 different kinds of numbers can be encoded in CIC (\IN, \IR, \IZ, \dots) using
 613 different encodings. Considering the restricted example of natural numbers we
 614 can for instance encode them in CIC using inductive datatypes with a number of
 615 constructor equal to the encoding base plus 1, obtaining one encoding for each
 616 base.
 617
 618 For each possible way of mapping a literal number to a CIC term, \MATITA{} is
 619 aware of a \emph{number intepretation function} which, when applied to the
 620 natural number denoted by the literal\footnote{at the moment only literal
 621 natural number are supported in the concrete syntax} returns a corresponding CIC
 622 term. The disambiguation domain for a given literal number is built applying to
 623 the literal all available number interpretation functions in turn.
 624
 625 Number interpretation functions can be defined in OCaml or directly using
 626 \TODO{notazione per i numeri}.
 627
 628 \emph{Operators} (question 3) are intuitively head of applications, as such they
 629 are always applied to a non empty sequence of arguments. Their ambiguity is a
 630 need since it is often the case that some notation is used in an overloaded
 631 fashion to hide the use of different CIC constants which encodes similar
 632 concepts. For example, in the standard library of \MATITA{} the infix \texttt{+}
 633 notation is available building a binary \texttt{Op(+)} node, whose
 634 disambiguation domain may refer to different constants like the addition over
 635 natural numbers \URI{cic:/matita/nat/plus/plus.con} or that over real numbers of
 636 the \COQ{} standard library \URI{cic:/Coq/Reals/Rdefinitions/Rplus.con}.
 637
 638 For each possible way of mapping an operator application to a CIC term,
 639 \MATITA{} knows an \emph{operator interpretation function} which, when applied
 640 to an operator and its arguments, returns a CIC term. The disambiguation domain
 641 for a given operator is built applying to the operator and its arguments all
 642 available operator interpretation functions in turn.
 643
 644 Operator interpretation functions could be added using the
 645 \texttt{interpretation} statement. For example, among the first line of the
 646 script \FILE{matita/library/logic/equality.ma} from the \MATITA{} standard
 647 library we read:
 648
 649 \begin{grafite}
 650 interpretation "leibnitz's equality"
 651  'eq x y =
 652    (cic:/matita/logic/equality/eq.ind#xpointer(1/1) _ x y).
 653 \end{grafite}
 654
 655 Evaluating it in \MATITA{} will add an operator interpretation function for the
 656 binary operator \texttt{eq} which expands to the CIC term on the right hand side
 657 of the statement. That CIC term can be written using only built-in concrete
 658 syntax, can contain no ambiguity source; still, it can refer to operator
 659 arguments bound on the left hand side and can contain implicit terms (denoted
 660 with \texttt{\_}) which will be expanded to fresh metavariables. The latter
 661 feature is used in the example above for the first argument of Leibniz's
 662 polymorhpic equality.
 663
 664 \subsubsection{Disambiguation algorithm}
 665
 666 \NOTE{assumo\\
 667       che si sia\\
 668       gia' parlato\\
 669       di refine}
 670
 671
 672 A \emph{disambiguation algorithm} takes as input a content level term and return
 673 a fully determined CIC term. The key observation on which a disambiguation
 674 algorithm is based is that given a content level term with more than one sources
 675 of ambiguity, not all possible combination of interpretation lead to a typable
 676 CIC term. In the term of Ex.~\ref{ex:disambiguation} for instance the
 677 interpretation of \texttt{ln} as a function from \IR to \IR and the
 678 interpretation of \texttt{1} as the Peano number $1$ can't coexists. The notion
 679 of ``can't coexists'' in the disambiguation of \MATITA{} is inherited from the
 680 refiner described in Sect.~\ref{sec:metavariables}: as long as
 681 $\mathit{refine}(c)\neq\bot$, the combination of interpretation which led to $c$
 682 can coexists.
 683
 684 The \emph{naive disambiguation algorithm} takes as input a content level term
 685 $t$ and proceeds as follows:
 686
 687 \begin{enumerate}
 688
 689  \item Create disambiguation domains $\{D_i | i\in\mathit{Dom}(t)\}$, where
 690   $\mathit{Dom}(t)$ is the set of ambiguity sources of $t$. Each $D_i$ is a set
 691   of CIC terms and can be built as described above.
 692
 693  \item Let $\Phi = \{\phi_i | {i\in\mathit{Dom}(t)},\phi_i\in D_i\}$ be an
 694   interpretation for $t$. Given $t$ and an interpretation $\Phi$, a CIC term is
 695   fully determined. Iterate over all possible interpretations of $t$ and refine
 696   the corresponding CIC terms, keep only interpretations which lead to CIC terms
 697   $c$ s.t. $\mathit{refine}(c)\neq\bot$ (i.e. interpretations that determine
 698   typable terms).
 699
 700  \item Let $n$ be the number of interpretations who survived step 2. If $n=0$
 701   signal a type error. If $n=1$ we have found exactly one CIC term corresponding
 702   to $t$, returns it as output of the disambiguation phase. If $n>1$ we have
 703   found many different CIC terms which can correspond to the content level term,
 704   let the user choose one of the $n$ interpretations and returns the
 705   corresponding term.
 706
 707 \end{enumerate}
 708
 709 The above algorithm is highly inefficient since the number of possible
 710 interpretations $\Phi$ grows exponentially with the number of ambiguity sources.
 711 The actual algorithm used in \MATITA{} is far more efficient being, in the
 712 average case, linear in the number of ambiguity sources.
 713
 714 \TODO{FINQUI}
 715
 716 The efficient algorithm can be applied if the logic can be extended with
 717 metavariables and a refiner can be implemented. This is the case for CIC and
 718 several other logics.
 719 \emph{Metavariables}~\cite{munoz} are typed, non linear placeholders that can
 720 occur in terms; $?_i$ usually denotes the $i$-th metavariable, while $?$ denotes
 721 a freshly created metavariable. A \emph{refiner}~\cite{McBride} is a
 722 function whose input is a term with placeholders and whose output is either a
 723 new term obtained instantiating some placeholder or $\epsilon$, meaning that no
 724 well typed instantiation could be found for the placeholders occurring in
 725 the term (type error).
 726
 727 The efficient algorithm starts with an interpretation $\Phi_0 = \{\phi_i |
 728 \phi_i = ?, i\in\mathit{Dom}(t)\}$,
 729 which associates a fresh metavariable to each
 730 source of ambiguity. Then it iterates refining the current CIC term (i.e. the
 731 term obtained interpreting $t$ with $\Phi_i$). If the refinement succeeds the
 732 next interpretation $\Phi_{i+1}$ will be created \emph{making a choice}, that is
 733 replacing a placeholder with one of the possible choice from the corresponding
 734 disambiguation domain. The placeholder to be replaced is chosen following a
 735 preorder visit of the ambiguous term. If the refinement fails the current set of
 736 choices cannot lead to a well-typed term and backtracking is attempted.
 737 Once an unambiguous correct interpretation is found (i.e. $\Phi_i$ does no
 738 longer contain any placeholder), backtracking is attempted
 739 anyway to find the other correct interpretations.
 740
 741 The intuition which explain why this algorithm is more efficient is that as soon
 742 as a term containing placeholders is not typable, no further instantiation of
 743 its placeholders could lead to a typable term. For example, during the
 744 disambiguation of user input \texttt{\TEXMACRO{forall} x. x*0 = 0}, an
 745 interpretation $\Phi_i$ is encountered which associates $?$ to the instance
 746 of \texttt{0} on the right, the real number $0$ to the instance of \texttt{0} on
 747 the left, and the multiplication over natural numbers (\texttt{mult} for short)
 748 to \texttt{*}. The refiner will fail, since \texttt{mult} require a natural
 749 argument, and no further instantiation of the placeholder will be tried.
 750
 751 If, at the end of the disambiguation, more than one possible interpretations are
 752 possible, the user will be asked to choose the intended one (see
 753 Fig.~\ref{fig:disambiguation}).
 754
 755 \begin{figure}[htb]
 756 %   \centerline{\includegraphics[width=0.9\textwidth]{disambiguation-1}}
 757   \caption{\label{fig:disambiguation} Disambiguation: interpretation choice}
 758 \end{figure}
 759
 760 Details of the disambiguation algorithm of \WHELP{} can
 761 be found in~\cite{disambiguation}, where an equivalent algorithm
 762 that avoids backtracking is also presented.
 763
 764 \subsection{notazione}
 765 \label{sec:notation}
 766 \ASSIGNEDTO{zack}
 767
 768 \subsection{libreria tutta visibile}
 769 \ASSIGNEDTO{csc}
 770
 771 \subsection{ricerca e indicizzazione}
 772 \label{sec:metadata}
 773 \ASSIGNEDTO{andrea}
 774
 775 \subsection{auto}
 776 \ASSIGNEDTO{andrea}
 777
 778 \subsection{xml / gestione della libreria}
 779 \ASSIGNEDTO{gares}
 780
 781 \subsection{named variable}
 782 \ASSIGNEDTO{csc}
 783
 784 \subsection{assenza di proof tree / resa in linguaggio naturale}
 785 \ASSIGNEDTO{andrea}
 786
 787 \subsection{selezione semantica, cut paste, hyperlink}
 788 \ASSIGNEDTO{zack}
 789
 790 \subsection{sostituzioni esplicite vs moduli}
 791 \ASSIGNEDTO{csc}
 792
 793 \section{Drawbacks, missing, \dots}
 794
 795 \subsection{moduli}
 796 \ASSIGNEDTO{}
 797
 798 \subsection{ltac}
 799 \ASSIGNEDTO{}
 800
 801 \subsection{estrazione}
 802 \ASSIGNEDTO{}
 803
 804 \subsection{localizzazione errori}
 805 \ASSIGNEDTO{}
 806
 807 \textbf{Acknowledgements}
 808 We would like to thank all the students that during the past
 809 five years collaborated in the \HELM{} project and contributed to
 810 the development of Matita, and in particular
 811 A.Griggio, F.Guidi, P. Di Lena, L.Padovani, I.Schena, M.Selmi,
 812 V.Tamburrelli.
 813
 814 \bibliography{matita}
 815
 816 \end{document}
 817