followed kluwer guidelines

[helm.git] / helm / papers / matita / matita.tex
diff --git a/helm/papers/matita/matita.tex b/helm/papers/matita/matita.tex

index 8ed71abe69bee79f4b2a8763c43d8fa67fbc8abd..b7b20dc2ecf468797982ad9673e0900e14998373 100644 (file)
--- a/helm/papers/matita/matita.tex
+++ b/helm/papers/matita/matita.tex
@@ -1,10 +1,14 @@
-\documentclass[a4paper]{llncs}
-\pagestyle{headings}
+\documentclass{kluwer}
+\usepackage{color}
  \usepackage{graphicx}
-\usepackage{amssymb,amsmath}
+% \usepackage{amssymb,amsmath}
  \usepackage{hyperref}
-\usepackage{picins}
+% \usepackage{picins}
+\usepackage{color}
+\usepackage{fancyvrb}
+\usepackage[show]{ed}
  
+\definecolor{gray}{gray}{0.85}
  %\newcommand{\logo}[3]{
  %\parpic(0cm,0cm)(#2,#3)[l]{\includegraphics[width=#1]{whelp-bw}}
  %}
@@ -14,9 +18,10 @@
  \newcommand{\ELIM}{\textsc{Elim}}
  \newcommand{\HELM}{Helm}
  \newcommand{\HINT}{\textsc{Hint}}
-\newcommand{\IN}{\ensuremath{\mathbb{N}}}
+\newcommand{\IN}{\ensuremath{\dN}}
  \newcommand{\INSTANCE}{\textsc{Instance}}
-\newcommand{\IR}{\ensuremath{\mathbb{R}}}
+\newcommand{\IR}{\ensuremath{\dR}}
+\newcommand{\IZ}{\ensuremath{\dZ}}
  \newcommand{\LIBXSLT}{LibXSLT}
  \newcommand{\LOCATE}{\textsc{Locate}}
  \newcommand{\MATCH}{\textsc{Match}}
@@ -33,24 +38,72 @@
  \newcommand{\UWOBO}{UWOBO}
  \newcommand{\WHELP}{Whelp}
  
-\newcommand{\ASSIGNEDTO}[1]{\textbf{Assigned to:} #1}
-\newcommand{\NOTE}[1]{\marginpar{\scriptsize #1}}
+\definecolor{gray}{gray}{0.85} % 1 -> white; 0 -> black
  \newcommand{\NT}[1]{\langle\mathit{#1}\rangle}
+\newcommand{\URI}[1]{\texttt{#1}}
+
+%{\end{SaveVerbatim}\setlength{\fboxrule}{.5mm}\setlength{\fboxsep}{2mm}%
+\newenvironment{grafite}{\VerbatimEnvironment
+ \begin{SaveVerbatim}{boxtmp}}%
+ {\end{SaveVerbatim}\setlength{\fboxsep}{3mm}%
+  \begin{center}
+   \fcolorbox{black}{gray}{\BUseVerbatim[boxwidth=0.9\linewidth]{boxtmp}}
+  \end{center}}
+
+\newcounter{example}
+\newenvironment{example}{\stepcounter{example}\emph{Example} \arabic{example}.}
+ {}
+\newcommand{\ASSIGNEDTO}[1]{\textbf{Assigned to:} #1}
+\newcommand{\FILE}[1]{\texttt{#1}}
+% \newcommand{\NOTE}[1]{\ifodd \arabic{page} \else \hspace{-2cm}\fi\ednote{#1}}
+\newcommand{\NOTE}[1]{\ednote{x~\hspace{-10cm}y#1}{foo}}
+\newcommand{\TODO}[1]{\textbf{TODO: #1}}
+
+\newsavebox{\tmpxyz}
+\newcommand{\sequent}[2]{
+  \savebox{\tmpxyz}[0.9\linewidth]{
+    \begin{minipage}{0.9\linewidth}
+      \ensuremath{#1} \\
+      \rule{3cm}{0.03cm}\\
+      \ensuremath{#2}
+    \end{minipage}}\setlength{\fboxsep}{3mm}%
+  \begin{center}
+   \fcolorbox{black}{gray}{\usebox{\tmpxyz}}
+  \end{center}}
  
-\title{The Matita proof assistant}
-\author{Andrea Asperti, Claudio Sacerdoti Coen, Enrico Tassi
- and Stefano Zacchiroli}
-\institute{Department of Computer Science, University of Bologna\\
- Mura Anteo Zamboni, 7 --- 40127 Bologna, ITALY\\
- \email{$\{$asperti,sacerdot,tassi,zacchiro$\}$@cs.unibo.it}}
  \bibliographystyle{plain}
  
  \begin{document}
-\maketitle
+
+\begin{opening}
+
+ \title{The \MATITA{} Proof Assistant}
+
+\author{Andrea \surname{Asperti} \email{asperti@cs.unibo.it}}
+\author{Claudio \surname{Sacerdoti Coen} \email{sacerdot@cs.unibo.it}}
+\author{Enrico \surname{Tassi} \email{tassi@cs.unibo.it}}
+\author{Stefano \surname{Zacchiroli} \email{zacchiro@cs.unibo.it}}
+\institute{Department of Computer Science, University of Bologna\\
+ Mura Anteo Zamboni, 7 --- 40127 Bologna, ITALY}
+
+\runningtitle{The Matita proof assistant}
+\runningauthor{Asperti, Sacerdoti Coen, Tassi, Zacchiroli}
+
+% \date{data}
+
+\begin{motto}
+ Be primitive.
+\end{motto}
  
  \begin{abstract}
+ abstract qui
  \end{abstract}
  
+\keywords{Proof Assistant, Mathematical Knowledge Management, XML, Authoring,
+Digital Libraries}
+
+\end{opening}
+
  \section{Introduction}
  \label{sec:intro}
  {\em Matita} is the proof assistant under development by the \HELM{} team
@@ -97,7 +150,7 @@ none of the advantages of the previous representations).
  Partially related to this problem, there is the
  issue of the {\em granularity} of the library: scripts usually comprise
  small developments with many definitions and theorems, while 
-proof objects correspond to individual mathemtical items. 
+proof objects correspond to individual mathematical items. 
  
  In our case, the choice of the content encoding was eventually dictated
  by the methodological assumption of offering the information in a
@@ -155,9 +208,9 @@ existential variables, used by the disambiguating parser;
  language;
  \item an innovative rendering widget, supporting high-quality bidimensional
  rendering, and semantic selection, i.e. the possibility to select semantically
-meaningfull rendering expressions, and to past the respective content into
+meaningful rendering expressions, and to past the respective content into
  a different text area.
-\NOTE{il widget\\ non ha sel\\ semantica}
+\NOTE{il widget non ha sel semantica}
  \end{itemize}
  Starting from all this, the further step of developing our own 
  proof assistant was too
@@ -175,11 +228,11 @@ language (\OCAML{}), and the same (script based) authoring philosophy.
  However, as we shall see, the analogy essentially stops here. 
  
  In a sense; we like to think of \MATITA{} as the way \COQ{} would 
-look like if entirely rerwritten from scratch: just to give an
+look like if entirely rewritten from scratch: just to give an
  idea, although \MATITA{} currently supports almost all functionalities of
  \COQ{}, it links 60'000 lins of \OCAML{} code, against ... of \COQ{} (and
-we are convinced that, starting from scratch againg, we could furtherly
-reduce our code in sensible way).\NOTE{righe\\\COQ{}}
+we are convinced that, starting from scratch again, we could furtherly
+reduce our code in sensible way).\NOTE{righe \COQ{}}
  
  \begin{itemize}
   \item scelta del sistema fondazionale
@@ -191,74 +244,118 @@ reduce our code in sensible way).\NOTE{righe\\\COQ{}}
    \end{itemize}
  \end{itemize}
  
-\section{Features}
+\section{\HELM{} library(??)}
  
-\subsection{mathml}
-\ASSIGNEDTO{zack}
+\subsection{libreria tutta visibile}
+\ASSIGNEDTO{csc}
+\NOTE{assumo che si sia gia' parlato di approccio content-centrico}
+Our commitment to the content-centric view of the architecture of the system
+has important consequences on the user's experience and on the functionalities
+of several components of \MATITA. In the content-centric view the library
+of mathematical knowledge is an already existent and completely autonomous
+entity that we are allowed to exploit and augment using \MATITA. Thus, in
+principle, when the user starts to prove a new theorem she has complete
+visibility of the library and she can refer to every definition and lemma,
+also using the mathematical notation already developed. In a similar way,
+every form of automation of the system must be able to analyze and possibly
+exploit every notion in the library.
+
+The benefits of this approach highly justify the non neglectable price to pay
+in the development of several components. We analyse now a few of the causes
+of this additional complexity.
+
+\subsubsection{Ambiguity}
+A rich mathematical library includes equivalent definitions and representations
+of the same notion. Moreover, mathematical notation inside a rich library is
+surely highly overloaded. As a consequence every mathematical expression the
+user provides is highly ambiguous since all the definitions,
+representations and special notations are available at once to the user.
+
+The usual solution to the problem, as adopted for instance in Coq, is to
+restrict the user's scope to just one interpretation for each definition,
+representation or notation. In this way much of the ambiguity is removed,
+burdening the user that must someway declare what is in scope and that must
+use special syntax when she needs to refer to something not in scope.
+
+Even with this approach ambiguity cannot be completely removed since implicit
+coercions can be arbitrarily inserted by the system to ``change the type''
+of subterms that do not have the expected type. Usually implicit coercions
+are used to overcome the absence of subtyping that should mimic the subset
+relation found in set theory. For instance, the expression
+$\forall n \in nat. 2 * n * \pi \equiv_\pi 0$ is correct in set theory since
+the set of natural numbers is a subset of that of real numbers; the
+corresponding expression $\forall n:nat. 2*n*\pi \equiv_\pi 0$ is not well typed
+and requires the automatic insertion of the coercion $real_of_nat: nat \to R$
+either around both 2 and $n$ (to make both products be on real numbers) or
+around the product $2*n$. The usual approach consists in either rejecting the
+ambiguous term or arbitrarily choosing one of the interpretations. For instance,
+Coq rejects the declaration of coercions that have alternatives
+(i.e. already declared coercions with the same domain and codomain)
+or that are obtained composing other coercions in order to
+avoid making several terms highly ambiguous by choosing to insert any one of the
+alternative coercions. Coq also arbitrarily chooses how to insert coercions in
+terms to make them well typed when there is more than one possibility (as in
+the previous example).
+
+The approach we are following is radically different. It consists in dealing
+with ambiguous expressions instead of avoiding them. As a last resource,
+when the system is unable to disambiguate the input, the user is interactively
+required to provide more information that is recorded to avoid asking the
+same question again in subsequent processing of the same input.
+More details on our approach can be found in \ref{sec:disambiguation}.
+
+\subsubsection{Consistency}
+A large mathematical library is likely to be logically inconsistent.
+It may contain incompatible axioms or alternative conjectures and it may
+even use definitions in incompatible ways. To clarify this last point,
+consider two identical definitions of a set of elements of a given type
+(or of a category of objects of a given type). Let us call the two definitions
+$A-Set$ and $B-Set$ (or $A-Category$ and $B-Category$).
+It is perfectly legitimate to either form the $A-Set$ of every $B-Set$
+or the $B-Set$ of every $A-Set$ (the same for categories). This just corresponds
+to assuming that a $B-Set$ (respectively an $A-Set$) is a small set, whereas
+an $A-Set$ (respectively a $B-Set$) is a big set (possibly of small sets).
+However, if one part of the library assumes $A-Set$s to be the small ones
+and another part of the library assumes $B-Set$s to be the small ones, the
+library as a whole will be logically inconsistent.
+
+Logical inconsistency has never been a problem in the daily work of a
+mathematician. The mathematician simply imposes himself a discipline to
+restrict himself to consistent subsets of the mathematical knowledge.
+However, in doing so he doesn't choose the subset in advance by forgetting
+the rest of his knowledge.
+
+Contrarily to a mathematician, the usual tendency in the world of assisted
+automation is that of restricting in advance the part of the library that
+will be used later on, checking its consistency by construction.
  
-\subsection{metavariabili}
+\subsection{ricerca e indicizzazione}
+\label{sec:metadata}
+\ASSIGNEDTO{andrea}
+
+\subsection{auto}
+\ASSIGNEDTO{andrea}
+
+\subsection{sostituzioni esplicite vs moduli}
  \ASSIGNEDTO{csc}
  
-\subsection{pattern}
+\subsection{xml / gestione della libreria}
  \ASSIGNEDTO{gares}
  
-\subsection{tatticali}
-\ASSIGNEDTO{gares}
+
+\section{User Interface (da cambiare)}
+
+\subsection{assenza di proof tree / resa in linguaggio naturale}
+\ASSIGNEDTO{andrea}
  
  \subsection{Disambiguation}
  \label{sec:disambiguation}
  \ASSIGNEDTO{zack}
  
-\begin{table}
- \caption{\label{tab:termsyn} Concrete syntax of CIC terms: built-in notation\strut}
-\hrule
-\[
-\begin{array}{@{}rcll@{}}
-  \NT{term} & ::= & & \mbox{\bf terms} \\
-    &     & x & \mbox{(identifier)} \\
-    &  |  & n & \mbox{(number)} \\
-    &  |  & s & \mbox{(symbol)} \\
-    &  |  & \mathrm{URI} & \mbox{(URI)} \\
-    &  |  & \verb+?+ & \mbox{(implicit)} \\
-    &  |  & \verb+?+n~[\verb+[+~\{\NT{subst}\}~\verb+]+] & \mbox{(meta)} \\
-    &  |  & \verb+let+~\NT{ptname}~\verb+\def+~\NT{term}~\verb+in+~\NT{term} \\
-    &  |  & \verb+let+~\NT{kind}~\NT{defs}~\verb+in+~\NT{term} \\
-    &  |  & \NT{binder}~\{\NT{ptnames}\}^{+}~\verb+.+~\NT{term} \\
-    &  |  & \NT{term}~\NT{term} & \mbox{(application)} \\
-    &  |  & \verb+Prop+ \mid \verb+Set+ \mid \verb+Type+ \mid \verb+CProp+ & \mbox{(sort)} \\
-    &  |  & \verb+match+~\NT{term}~ & \mbox{(pattern matching)} \\
-    &     & ~ ~ [\verb+[+~\verb+in+~x~\verb+]+]
-             ~ [\verb+[+~\verb+return+~\NT{term}~\verb+]+] \\
-    &     & ~ ~ \verb+with [+~[\NT{rule}~\{\verb+|+~\NT{rule}\}]~\verb+]+ & \\
-    &  |  & \verb+(+~\NT{term}~\verb+:+~\NT{term}~\verb+)+ & \mbox{(cast)} \\
-    &  |  & \verb+(+~\NT{term}~\verb+)+ \\
-  \NT{defs}  & ::= & & \mbox{\bf mutual definitions} \\
-    &     & \NT{fun}~\{\verb+and+~\NT{fun}\} \\
-  \NT{fun} & ::= & & \mbox{\bf functions} \\
-    &     & \NT{arg}~\{\NT{ptnames}\}^{+}~[\verb+on+~x]~\verb+\def+~\NT{term} \\
-  \NT{binder} & ::= & & \mbox{\bf binders} \\
-    &     & \verb+\forall+ \mid \verb+\lambda+ \\
-  \NT{arg} & ::= & & \mbox{\bf single argument} \\
-    &     & \verb+_+ \mid x \\
-  \NT{ptname} & ::= & & \mbox{\bf possibly typed name} \\
-    &     & \NT{arg} \\
-    &  |  & \verb+(+~\NT{arg}~\verb+:+~\NT{term}~\verb+)+ \\
-  \NT{ptnames} & ::= & & \mbox{\bf bound variables} \\
-    &     & \NT{arg} \\
-    &  |  & \verb+(+~\NT{arg}~\{\verb+,+~\NT{arg}\}~[\verb+:+~\NT{term}]~\verb+)+ \\
-  \NT{kind} & ::= & & \mbox{\bf induction kind} \\
-    &     & \verb+rec+ \mid \verb+corec+ \\
-  \NT{rule} & ::= & & \mbox{\bf rules} \\
-    &     & x~\{\NT{ptname}\}~\verb+\Rightarrow+~\NT{term}
-\end{array}
-\]
-\hrule
-\end{table}
-
  \subsubsection{Term input}
  
  The primary form of user interaction employed by \MATITA{} is textual script
-editing: the user can modifies it and evaluate step by step its composing
+editing: the user modifies it and evaluate step by step its composing
  \emph{statements}. Examples of statements are inductive type definitions,
  theorem declarations, LCF-style tacticals, and macros (e.g. \texttt{Check} can
  be used to ask the system to refine a given term and pretty print the result).
@@ -270,10 +367,9 @@ Two of the requirements in the design of such a syntax are apparently in
  contrast:
  \begin{enumerate}
   \item the syntax should be as close as possible to common mathematical practice
-  and implement widespread mathematical notions;
+  and implement widespread mathematical notations;
   \item each term described by the syntax should be non-ambiguous meaning that it
-  should exists a function which associates to each term of the syntax a CIC
-  term.
+  should exists a function which associates to it a CIC term.
  \end{enumerate}
  
  These two requirements are addressed in \MATITA{} by the mean of two mechanisms
@@ -283,8 +379,16 @@ depicted in Fig.~\ref{fig:inputphase}. The architecture is articulated as a
  pipline of three levels: the concrete syntax level (level 0) is the one the user
  has to deal with when inserting CIC terms; the abstract syntax level (level 2)
  is an internal representation which intuitively encodes mathematical formulae at
-the content level \NOTE{rif. per\\ content}; the formal mathematics level (level
-3) is the CIC encoding of terms.
+the content level~\cite{adams}\cite{mkm-structure}; the last level is that of
+CIC terms.
+
+\begin{figure}[ht]
+ \begin{center}
+  \includegraphics[width=0.9\textwidth]{input_phase}
+  \caption{\MATITA{} input phase}
+ \end{center}
+ \label{fig:inputphase}
+\end{figure}
  
  Requirement (1) is addressed by a built-in concrete syntax for terms, described
  in Tab.~\ref{tab:termsyn}, and the extensible notation mechanisms which offers a
@@ -301,11 +405,13 @@ because some nodes of the content encoding admit more that one CIC encoding,
  invalidating requirement (2).
  
  \begin{example}
+ \label{ex:disambiguation}
  
- Consider the term \texttt{\TEXMACRO{forall} x. x + ln 1 = x}, the type of a
- lemma the user may want to prove. Assuming that both \texttt{+} and \texttt{=}
- are parsed as infix operators, all the following questions are legitimate and
- must be answered before obtaining a CIC term from its content level encoding
+ Consider the term at the concrete syntax level \texttt{\TEXMACRO{forall} x. x +
+ ln 1 = x} of Fig.~\ref{fig:inputphase}(a), it can be the type of a lemma the
+ user may want to prove. Assuming that both \texttt{+} and \texttt{=} are parsed
+ as infix operators, all the following questions are legitimate and must be
+ answered before obtaining a CIC term from its content level encoding
   (Fig.~\ref{fig:inputphase}(b)):
  
   \begin{enumerate}
@@ -326,54 +432,133 @@ invalidating requirement (2).
  \end{example}
  
  In \MATITA, three \emph{sources of ambiguity} are admitted for content level
-terms: unbound identifiers, literal numbers, and literal symbols.
-
-\emph{Unbound identifiers} (question 1) are sources of ambiguity since the same
-name could have been used in the proof assistant library to represent different
-objects. \emph{Numbers} (question 2) are ambiguous since several different
-encodings of them could be provided in the calculus. Finally, \emph{symbols}
-(question 3) are ambiguous as well, since they may be used in an overloaded
-fashion to represent the application of different objects.
-
-\textbf{FINQUI, il resto \`e copy and paste dal Whelp paper \dots}
-
-Note that given a content level term with more than one sources of ambiguity,
-not all possible disambiguation choices are valid: for example, given the input
-\texttt{1+1} we must choose an interpretation of \texttt{+} which is typable in
-CIC according to the chosen interpretation for \texttt{1}; choosing as
-\texttt{+} the addition over natural numbers and as \texttt{1} the real number
-$1$ will lead to a type error.
-
-A \emph{disambiguation algorithm} takes as input an ambiguous term and return a
-fully determined CIC term. The \emph{naive disambiguation algorithm} takes as
-input an ambiguous term $t$ and proceeds as follows:
+terms: unbound identifiers, literal numbers, and operators. Each instance of
+ambiguity sources (ambiguous entity) occuring in a content level term is
+associated to a \emph{disambiguation domain}. Intuitively a disambiguation
+domain is a set of CIC terms which may be replaced for an ambiguous entity
+during disambiguation. Each item of the domain is said to be an
+\emph{interpretation} for the ambiguous entity.
+
+\emph{Unbound identifiers} (question 1) are ambiguous entities since the
+namespace of CIC objects is not flat and the same identifier may denote many
+ofthem. For example the short name \texttt{plus\_assoc} in the \HELM{} library
+is shared by three different theorems stating the associative property of
+different additions.  This kind of ambiguity is avoidable if the user is willing
+to use long names (in form of URIs in the \texttt{cic://} scheme) in the
+concrete syntax, with the obvious drawbacks of obtaining long and unreadable
+terms.
+
+Given an unbound identifier, the corresponding disambiguation domain is computed
+querying the library for all constants, inductive types, and inductive type
+constructors having it as their short name (see the \LOCATE{} query in
+Sect.~\ref{sec:metadata}).
+
+\emph{Literal numbers} (question 2) are ambiguous entities as well since
+different kinds of numbers can be encoded in CIC (\IN, \IR, \IZ, \dots) using
+different encodings. Considering the restricted example of natural numbers we
+can for instance encode them in CIC using inductive datatypes with a number of
+constructor equal to the encoding base plus 1, obtaining one encoding for each
+base.
+
+For each possible way of mapping a literal number to a CIC term, \MATITA{} is
+aware of a \emph{number intepretation function} which, when applied to the
+natural number denoted by the literal\footnote{at the moment only literal
+natural number are supported in the concrete syntax} returns a corresponding CIC
+term. The disambiguation domain for a given literal number is built applying to
+the literal all available number interpretation functions in turn.
+
+Number interpretation functions can be defined in OCaml or directly using
+\TODO{notazione per i numeri}.
+
+\emph{Operators} (question 3) are intuitively head of applications, as such they
+are always applied to a (possiblt empty) sequence of arguments. Their ambiguity
+is a need since it is often the case that some notation is used in an overloaded
+fashion to hide the use of different CIC constants which encodes similar
+concepts. For example, in the standard library of \MATITA{} the infix \texttt{+}
+notation is available building a binary \texttt{Op(+)} node, whose
+disambiguation domain may refer to different constants like the addition over
+natural numbers \URI{cic:/matita/nat/plus/plus.con} or that over real numbers of
+the \COQ{} standard library \URI{cic:/Coq/Reals/Rdefinitions/Rplus.con}.
+
+For each possible way of mapping an operator application to a CIC term,
+\MATITA{} knows an \emph{operator interpretation function} which, when applied
+to an operator and its arguments, returns a CIC term. The disambiguation domain
+for a given operator is built applying to the operator and its arguments all
+available operator interpretation functions in turn.
+
+Operator interpretation functions could be added using the
+\texttt{interpretation} statement. For example, among the first line of the
+script \FILE{matita/library/logic/equality.ma} from the \MATITA{} standard
+library we read:
+
+\begin{grafite}
+interpretation "leibnitz's equality"
+ 'eq x y =
+   (cic:/matita/logic/equality/eq.ind#xpointer(1/1) _ x y).
+\end{grafite}
+
+Evaluating it in \MATITA{} will add an operator interpretation function for the
+binary operator \texttt{eq} which expands to the CIC term on the right hand side
+of the statement. That CIC term can be written using only built-in concrete
+syntax, can contain no ambiguity source; still, it can refer to operator
+arguments bound on the left hand side and can contain implicit terms (denoted
+with \texttt{\_}) which will be expanded to fresh metavariables. The latter
+feature is used in the example above for the first argument of Leibniz's
+polymorhpic equality.
+
+\subsubsection{Disambiguation algorithm}
+
+\NOTE{assumo che si sia gia' parlato di refine}
+
+A \emph{disambiguation algorithm} takes as input a content level term and return
+a fully determined CIC term. The key observation on which a disambiguation
+algorithm is based is that given a content level term with more than one sources
+of ambiguity, not all possible combination of interpretation lead to a typable
+CIC term. In the term of Ex.~\ref{ex:disambiguation} for instance the
+interpretation of \texttt{ln} as a function from \IR to \IR and the
+interpretation of \texttt{1} as the Peano number $1$ can't coexists. The notion
+of ``can't coexists'' in the disambiguation of \MATITA{} is inherited from the
+refiner described in Sect.~\ref{sec:metavariables}: as long as
+$\mathit{refine}(c)\neq\epsilon$, the combination of interpretation which led to
+$c$
+can coexists.
+
+The \emph{naive disambiguation algorithm} takes as input a content level term
+$t$ and proceeds as follows:
  
  \begin{enumerate}
  
   \item Create disambiguation domains $\{D_i | i\in\mathit{Dom}(t)\}$, where
    $\mathit{Dom}(t)$ is the set of ambiguity sources of $t$. Each $D_i$ is a set
-  of CIC terms.
+  of CIC terms and can be built as described above.
  
- \item Let $\Phi = \{\phi_i | {i\in\mathit{Dom}(t)},\phi_i\in D_i\}$
-%  such that $\forall i\in\mathit{Dom}(t),\exists\phi_j\in\Phi,i=j$
-  be an interpretation for $t$. Given $t$ and an interpretation $\Phi$, a CIC
-  term is fully determined. Iterate over all possible interpretations of $t$ and
-  type-check them, keep only typable interpretations (i.e. interpretations that
-  determine typable terms).
+ \item Let $\Phi = \{\phi_i | {i\in\mathit{Dom}(t)},\phi_i\in D_i\}$ be an
+  interpretation for $t$. Given $t$ and an interpretation $\Phi$, a CIC term is
+  fully determined. Iterate over all possible interpretations of $t$ and refine
+  the corresponding CIC terms, keep only interpretations which lead to CIC terms
+  $c$ s.t. $\mathit{refine}(c)\neq\epsilon$ (i.e. interpretations that determine
+  typable terms).
  
   \item Let $n$ be the number of interpretations who survived step 2. If $n=0$
-  signal a type error. If $n=1$ we have found exactly one CIC term
-  corresponding to $t$, returns it as output of the disambiguation phase.
-  If $n>1$ let the user choose one of the $n$ interpretations and returns the
+  signal a type error. If $n=1$ we have found exactly one CIC term corresponding
+  to $t$, returns it as output of the disambiguation phase. If $n>1$ we have
+  found many different CIC terms which can correspond to the content level term,
+  let the user choose one of the $n$ interpretations and returns the
    corresponding term.
  
  \end{enumerate}
  
  The above algorithm is highly inefficient since the number of possible
  interpretations $\Phi$ grows exponentially with the number of ambiguity sources.
-The actual algorithm used in \WHELP{} is far more efficient being, in the
+The actual algorithm used in \MATITA{} is far more efficient being, in the
  average case, linear in the number of ambiguity sources.
  
+The efficient algorithm --- thoroughly described along with an analysis of its
+complexity in~\cite{disambiguation} --- exploit the refiner and the metavariable
+extension (Sect.~\ref{sec:metavariables}) of the calculus used in \MATITA.
+
+\TODO{FINQUI}
+
  The efficient algorithm can be applied if the logic can be extended with
  metavariables and a refiner can be implemented. This is the case for CIC and
  several other logics.
@@ -422,33 +607,249 @@ Details of the disambiguation algorithm of \WHELP{} can
  be found in~\cite{disambiguation}, where an equivalent algorithm
  that avoids backtracking is also presented.
  
+
  \subsection{notazione}
  \label{sec:notation}
  \ASSIGNEDTO{zack}
  
-\subsection{libreria tutta visibile}
-\ASSIGNEDTO{csc}
+\subsection{mathml}
+\ASSIGNEDTO{zack}
  
-\subsection{ricerca e indicizzazione}
-\ASSIGNEDTO{andrea}
+\subsection{selezione semantica, cut paste, hyperlink}
+\ASSIGNEDTO{zack}
  
-\subsection{auto}
-\ASSIGNEDTO{andrea}
+\subsection{pattern}
+\ASSIGNEDTO{gares}\\
+Patterns are the textual counterpart of the MathML widget graphical
+selection.
+
+Matita benefits of a graphical interface and a powerful MathML rendering
+widget that allows the user to select pieces of the sequent he is working
+on. While this is an extremely intuitive way for the user to
+restrict the application of tactics, for example, to some subterms of the
+conclusion or some hypothesis, the way this action is recorded to the text
+script is not obvious.\\
+In \MATITA{} this issue is addressed by patterns.
+
+\subsubsection{Pattern syntax}
+A pattern is composed of two terms: a $\NT{sequent\_path}$ and a
+$\NT{wanted}$.
+The former mocks-up a sequent, discharging unwanted subterms with $?$ and
+selecting the interesting parts with the placeholder $\%$. 
+The latter is a term that lives in the context of the placeholders.
+
+The concrete syntax is reported in table \ref{tab:pathsyn}
+\NOTE{uso nomi diversi dalla grammatica ma che hanno + senso}
+\begin{table}
+ \caption{\label{tab:pathsyn} Concrete syntax of \MATITA{} patterns.\strut}
+\hrule
+\[
+\begin{array}{@{}rcll@{}}
+  \NT{pattern} & 
+    ::= & [~\verb+in match+~\NT{wanted}~]~[~\verb+in+~\NT{sequent\_path}~] & \\
+  \NT{sequent\_path} & 
+    ::= & \{~\NT{ident}~[~\verb+:+~\NT{multipath}~]~\}~
+      [~\verb+\vdash+~\NT{multipath}~] & \\
+  \NT{wanted} & ::= & \NT{term} & \\
+  \NT{multipath} & ::= & \NT{term\_with\_placeholders} & \\
+\end{array}
+\]
+\hrule
+\end{table}
  
-\subsection{xml / gestione della libreria}
+\subsubsection{How patterns work}
+Patterns mimic the user's selection in two steps. The first one
+selects roots (subterms) of the sequent, using the
+$\NT{sequent\_path}$,  while the second 
+one searches the $\NT{wanted}$ term starting from these roots. Both are
+optional steps, and by convention the empty pattern selects the whole
+conclusion.
+
+\begin{description}
+\item[Phase 1]
+  concerns only the $[~\verb+in+~\NT{sequent\_path}~]$
+  part of the syntax. $\NT{ident}$ is an hypothesis name and
+  selects the assumption where the following optional $\NT{multipath}$
+  will operate. \verb+\vdash+ can be considered the name for the goal.
+  If the whole pattern is omitted, the whole goal will be selected.
+  If one or more hypotheses names are given the selection is restricted to 
+  these assumptions. If a $\NT{multipath}$ is omitted the whole
+  assumption is selected. Remember that the user can be mostly
+  unaware of this syntax, since the system is able to write down a 
+  $\NT{sequent\_path}$ starting from a visual selection.
+  \NOTE{Questo ancora non va in matita}
+
+  A $\NT{multipath}$ is a CiC term in which a special constant $\%$
+  is allowed.
+  The roots of discharged subterms are marked with $?$, while $\%$
+  is used to select roots. The default $\NT{multipath}$, the one that
+  selects the whole term, is simply $\%$.
+  Valid $\NT{multipath}$ are, for example, $(?~\%~?)$ or $\%~\verb+\to+~(\%~?)$
+  that respectively select the first argument of an application or
+  the source of an arrow and the head of the application that is
+  found in the arrow target.
+
+  The first phase selects not only terms (roots of subterms) but also 
+  their context that will be eventually used in the second phase.
+
+\item[Phase 2] 
+  plays a role only if the $[~\verb+in match+~\NT{wanted}~]$
+  part is specified. From the first phase we have some terms, that we
+  will see as subterm roots, and their context. For each of these
+  contexts the $\NT{wanted}$ term is disambiguated in it and the
+  corresponding root is searched for a subterm $\alpha$-equivalent to
+  $\NT{wanted}$. The result of this search is the selection the
+  pattern represents.
+
+\end{description}
+
+\noindent
+Since the first step is equipotent to the composition of the two
+steps, the system uses it to represent each visual selection.
+The second step is only meant for the
+experienced user that writes patterns by hand, since it really
+helps in writing concise patterns as we will see in the
+following examples.
+
+\subsubsection{Examples}
+To explain how the first step works let's give an example. Consider
+you want to prove the uniqueness of the identity element $0$ for natural
+sum, and that you can relay on the previously demonstrated left
+injectivity of the sum, that is $inj\_plus\_l:\forall x,y,z.x+y=z+y \to x =z$.
+Typing
+\begin{grafite}
+theorem valid_name: \forall n,m. m + n = n \to m = O.
+  intros (n m H).
+\end{grafite}
+\noindent
+leads you to the following sequent 
+\sequent{
+n:nat\\
+m:nat\\
+H: m + n = n}{
+m=O
+}
+\noindent
+where you want to change the right part of the equivalence of the $H$
+hypothesis with $O + n$ and then use $inj\_plus\_l$ to prove $m=O$.
+\begin{grafite}
+  change in H:(? ? ? %) with (O + n).
+\end{grafite}
+\noindent
+This pattern, that is a simple instance of the $\NT{sequent\_path}$
+grammar entry, acts on $H$ that has type (without notation) $(eq~nat~(m+n)~n)$
+and discharges the head of the application and the first two arguments with a
+$?$ and selects the last argument with $\%$. The syntax may seem uncomfortable,
+but the user can simply select with the mouse the right part of the equivalence
+and left to the system the burden of writing down in the script file the
+corresponding pattern with $?$ and $\%$ in the right place (that is not
+trivial, expecially where implicit arguments are hidden by the notation, like
+the type $nat$ in this example).
+
+Changing all the occurrences of $n$ in the hypothesis $H$ with $O+n$ 
+works too and can be done, by the experienced user, writing directly
+a simpler pattern that uses the second phase.
+\begin{grafite}
+  change in match n in H with (O + n).
+\end{grafite}
+\noindent
+In this case the $\NT{sequent\_path}$ selects the whole $H$, while
+the second phase searches the wanted $n$ inside it by
+$\alpha$-equivalence. The resulting
+equivalence will be $m+(O+n)=O+n$ since the second phase found two
+occurrences of $n$ in $H$ and the tactic changed both.
+
+Just for completeness the second pattern is equivalent to the
+following one, that is less readable but uses only the first phase.
+\begin{grafite}
+  change in H:(? ? (? ? %) %) with (O + n).
+\end{grafite}
+\noindent
+
+\subsubsection{Tactics supporting patterns}
+In \MATITA{} all the tactics that can be restricted to subterm of the working
+sequent accept the pattern syntax. In particular these tactics are: simplify,
+change, fold, unfold, generalize, replace and rewrite.
+
+\NOTE{attualmente rewrite e fold non supportano phase 2. per
+supportarlo bisogna far loro trasformare il pattern phase1+phase2 
+in un pattern phase1only come faccio nell'ultimo esempio. lo si fa
+con una pattern\_of(select(pattern))}
+
+\subsubsection{Comparison with Coq}
+Coq has a two diffrent ways of restricting the application of tactis to
+subterms of the sequent, both relaying on the same special syntax to identify
+a term occurrence.
+
+The first way is to use this special syntax to specify directly to the
+tactic the occurrnces of a wanted term that should be affected, while
+the second is to prepare the sequent with another tactic called
+pattern and the apply the real tactic. Note that the choice is not
+left to the user, since some tactics needs the sequent to be prepared
+with pattern and do not accept directly this special syntax.
+
+The base idea is that to identify a subterm of the sequent we can
+write it and say that we want, for example, the third and the fifth
+occurce of it (counting from left to right). In our previous example,
+to change only the left part of the equivalence, the correct command
+is
+\begin{grafite}
+  change n at 2 in H with (O + n)
+\end{grafite} 
+\noindent
+meaning that in the hypothesis $H$ the $n$ we want to change is the
+second we encounter proceeding from left toright.
+
+The tactic pattern computes a
+$\beta$-expansion of a part of the sequent with respect to some
+occurrences of the given term. In the previous example the following
+command
+\begin{grafite}
+  pattern n at 2 in H
+\end{grafite}
+\noindent
+would have resulted in this sequent
+\begin{grafite}
+  n : nat
+  m : nat
+  H : (fun n0 : nat => m + n = n0) n
+  ============================
+   m = 0
+\end{grafite}
+\noindent
+where $H$ is $\beta$-expanded over the second $n$
+occurrence. This is a trick to make the unification algorithm ignore
+the head of the application (since the unification is essentially
+first-order) but normally operate on the arguments. 
+This works for some tactics, like rewrite and replace,
+but for example not for change and other tactics that do not relay on
+unification. 
+
+The idea behind this way of identifying subterms in not really far
+from the idea behind patterns, but really fails in extending to
+complex notation, since it relays on a mono-dimensional sequent representation.
+Real math notation places arguments upside-down (like in indexed sums or
+integrations) or even puts them inside a bidimensional matrix.  
+In these cases using the mouse to select the wanted term is probably the 
+only way to tell the system exactly what you want to do. 
+
+One of the goals of \MATITA{} is to use modern publishing techiques, and
+adopting a method for restricting tactics application domain that discourages 
+using heavy math notation, would definitively be a bad choice.
+
+\subsection{tatticali}
  \ASSIGNEDTO{gares}
  
-\subsection{named variable}
+\subsection{named variable e disambiguazione lazy}
  \ASSIGNEDTO{csc}
  
-\subsection{assenza di proof tree / resa in linguaggio naturale}
-\ASSIGNEDTO{andrea}
+\subsection{metavariabili}
+\label{sec:metavariables}
+\ASSIGNEDTO{csc}
  
-\subsection{selezione semantica, cut paste, hyperlink}
-\ASSIGNEDTO{zack}
+\begin{verbatim}
  
-\subsection{sostituzioni esplicite vs moduli}
-\ASSIGNEDTO{csc}
+\end{verbatim}
  
  \section{Drawbacks, missing, \dots}
  
@@ -464,14 +865,17 @@ that avoids backtracking is also presented.
  \subsection{localizzazione errori}
  \ASSIGNEDTO{}
  
-\textbf{Acknowledgements}
+\acknowledgements
  We would like to thank all the students that during the past
  five years collaborated in the \HELM{} project and contributed to 
  the development of Matita, and in particular
  A.Griggio, F.Guidi, P. Di Lena, L.Padovani, I.Schena, M.Selmi, 
  V.Tamburrelli.
  
+\theendnotes
+
  \bibliography{matita}
  
+
  \end{document}