\newcommand{\ASSIGNEDTO}[1]{\textbf{Assigned to:} #1}
\newcommand{\NOTE}[1]{\marginpar{\scriptsize #1}}
+\newcommand{\NT}[1]{\langle\mathit{#1}\rangle}
\title{The Matita proof assistant}
\author{Andrea Asperti, Claudio Sacerdoti Coen, Enrico Tassi
\label{sec:disambiguation}
\ASSIGNEDTO{zack}
-%The disambiguation phase builds CIC terms from ASTs of user inputs (also called
-%\emph{ambiguous terms}). Ambiguous terms may carry three different \emph{sources
-%of ambiguity}: unbound identifiers, literal numbers, and literal symbols.
-%\emph{Unbound identifiers} are sources of ambiguity since the same name
-%could have been used to represent different objects. For example, three
-%different theorems of the \COQ{} library share the name $\mathit{plus\_assoc}$
-%(locating them is an exercise for the interested reader. Hint: use \WHELP's
-%\LOCATE{} query).
-%
-%\emph{Numbers} are ambiguous since several different encodings of them could be
-%provided in logical systems. In the \COQ{} standard library for example we found
-%naturals (in their unary encoding), positives (binary encoding), integers
-%(signed positives), and reals. Finally, \emph{symbols} (instances of the
-%\emph{binop} and \emph{unop} syntactic categories of Table~\ref{tab:syntax}) are
-%ambiguous as well: infix \texttt{+} for example is overloaded to represent
-%addition over the four different kinds of numbers available in the \COQ{}
-%standard library. Note that given a term with more than one sources of
-%ambiguity, not all possible disambiguation choices are valid: for example, given
-%the input \texttt{1+1} we must choose an interpretation of \texttt{+} which is
-%typable in CIC according to the chosen interpretation for \texttt{1}; choosing
-%as \texttt{+} the addition over natural numbers and as \texttt{1} the real
-%number $1$ will lead to a type error.
-%
-%A \emph{disambiguation algorithm} takes as input an ambiguous term and return a
-%fully determined CIC term. The \emph{naive disambiguation algorithm} takes as
-%input an ambiguous term $t$ and proceeds as follows:
-%
-%\begin{enumerate}
-%
-% \item Create disambiguation domains $\{D_i | i\in\mathit{Dom}(t)\}$, where
-% $\mathit{Dom}(t)$ is the set of ambiguity sources of $t$. Each $D_i$ is a set
-% of CIC terms.
-%
-% \item Let $\Phi = \{\phi_i | {i\in\mathit{Dom}(t)},\phi_i\in D_i\}$
-%% such that $\forall i\in\mathit{Dom}(t),\exists\phi_j\in\Phi,i=j$
-% be an interpretation for $t$. Given $t$ and an interpretation $\Phi$, a CIC
-% term is fully determined. Iterate over all possible interpretations of $t$ and
-% type-check them, keep only typable interpretations (i.e. interpretations that
-% determine typable terms).
-%
-% \item Let $n$ be the number of interpretations who survived step 2. If $n=0$
-% signal a type error. If $n=1$ we have found exactly one CIC term
-% corresponding to $t$, returns it as output of the disambiguation phase.
-% If $n>1$ let the user choose one of the $n$ interpretations and returns the
-% corresponding term.
-%
-%\end{enumerate}
-%
-%The above algorithm is highly inefficient since the number of possible
-%interpretations $\Phi$ grows exponentially with the number of ambiguity sources.
-%The actual algorithm used in \WHELP{} is far more efficient being, in the
-%average case, linear in the number of ambiguity sources.
-%
-%The efficient algorithm can be applied if the logic can be extended with
-%metavariables and a refiner can be implemented. This is the case for CIC and
-%several other logics.
-%\emph{Metavariables}~\cite{munoz} are typed, non linear placeholders that can
-%occur in terms; $?_i$ usually denotes the $i$-th metavariable, while $?$ denotes
-%a freshly created metavariable. A \emph{refiner}~\cite{mcbride} is a
-%function whose input is a term with placeholders and whose output is either a
-%new term obtained instantiating some placeholder or $\epsilon$, meaning that no
-%well typed instantiation could be found for the placeholders occurring in
-%the term (type error).
-%
-%The efficient algorithm starts with an interpretation $\Phi_0 = \{\phi_i |
-%\phi_i = ?, i\in\mathit{Dom}(t)\}$,
-%which associates a fresh metavariable to each
-%source of ambiguity. Then it iterates refining the current CIC term (i.e. the
-%term obtained interpreting $t$ with $\Phi_i$). If the refinement succeeds the
-%next interpretation $\Phi_{i+1}$ will be created \emph{making a choice}, that is
-%replacing a placeholder with one of the possible choice from the corresponding
-%disambiguation domain. The placeholder to be replaced is chosen following a
-%preorder visit of the ambiguous term. If the refinement fails the current set of
-%choices cannot lead to a well-typed term and backtracking is attempted.
-%Once an unambiguous correct interpretation is found (i.e. $\Phi_i$ does no
-%longer contain any placeholder), backtracking is attempted
-%anyway to find the other correct interpretations.
-%
-%The intuition which explain why this algorithm is more efficient is that as soon
-%as a term containing placeholders is not typable, no further instantiation of
-%its placeholders could lead to a typable term. For example, during the
-%disambiguation of user input \texttt{\TEXMACRO{forall} x. x*0 = 0}, an
-%interpretation $\Phi_i$ is encountered which associates $?$ to the instance
-%of \texttt{0} on the right, the real number $0$ to the instance of \texttt{0} on
-%the left, and the multiplication over natural numbers (\texttt{mult} for short)
-%to \texttt{*}. The refiner will fail, since \texttt{mult} require a natural
-%argument, and no further instantiation of the placeholder will be tried.
-%
-%If, at the end of the disambiguation, more than one possible interpretations are
-%possible, the user will be asked to choose the intended one (see
-%Fig.~\ref{fig:disambiguation}).
-%
-%\begin{figure}[htb]
-%% \centerline{\includegraphics[width=0.9\textwidth]{disambiguation-1}}
-% \caption{\label{fig:disambiguation} Disambiguation: interpretation choice}
-%\end{figure}
-%
-%Details of the disambiguation algorithm of \WHELP{} can
-%be found in~\cite{disambiguation}, where an equivalent algorithm
-%that avoids backtracking is also presented.
+\begin{table}
+ \caption{\label{tab:termsyn} Concrete syntax of CIC terms in \MATITA{}.\strut}
+\hrule
+\[
+\begin{array}{@{}rcll@{}}
+ \NT{term} & ::= & & \mbox{\bf terms} \\
+ & & x & \mbox{(identifier)} \\
+ & | & n & \mbox{(number)} \\
+ & | & s & \mbox{(symbol)} \\
+ & | & \mathrm{URI} & \mbox{(URI)} \\
+ & | & \verb+?+ & \mbox{(implicit)} \\
+ & | & \verb+?+n~[\verb+[+~\{\NT{subst}\}~\verb+]+] & \mbox{(meta)} \\
+ & | & \verb+let+~\NT{ptname}~\verb+\def+~\NT{term}~\verb+in+~\NT{term} \\
+ & | & \verb+let+~\NT{kind}~\NT{defs}~\verb+in+~\NT{term} \\
+ & | & \NT{binder}~\{\NT{ptnames}\}^{+}~\verb+.+~\NT{term} \\
+ & | & \NT{term}~\NT{term} & \mbox{(application)} \\
+ & | & \verb+Prop+ \mid \verb+Set+ \mid \verb+Type+ \mid \verb+CProp+ & \mbox{(sort)} \\
+ & | & \verb+match+~\NT{term}~ & \mbox{(pattern matching)} \\
+ & & ~ ~ [\verb+[+~\verb+in+~x~\verb+]+]
+ ~ [\verb+[+~\verb+return+~\NT{term}~\verb+]+] \\
+ & & ~ ~ \verb+with [+~[\NT{rule}~\{\verb+|+~\NT{rule}\}]~\verb+]+ & \\
+ & | & \verb+(+~\NT{term}~\verb+:+~\NT{term}~\verb+)+ & \mbox{(cast)} \\
+ & | & \verb+(+~\NT{term}~\verb+)+ \\
+ \NT{defs} & ::= & & \mbox{\bf mutual definitions} \\
+ & & \NT{fun}~\{\verb+and+~\NT{fun}\} \\
+ \NT{fun} & ::= & & \mbox{\bf functions} \\
+ & & \NT{arg}~\{\NT{ptnames}\}^{+}~[\verb+on+~x]~\verb+\def+~\NT{term} \\
+ \NT{binder} & ::= & & \mbox{\bf binders} \\
+ & & \verb+\forall+ \mid \verb+\lambda+ \\
+ \NT{arg} & ::= & & \mbox{\bf single argument} \\
+ & & \verb+_+ \mid x \\
+ \NT{ptname} & ::= & & \mbox{\bf possibly typed name} \\
+ & & \NT{arg} \\
+ & | & \verb+(+~\NT{arg}~\verb+:+~\NT{term}~\verb+)+ \\
+ \NT{ptnames} & ::= & & \mbox{\bf bound variables} \\
+ & & \NT{arg} \\
+ & | & \verb+(+~\NT{arg}~\{\verb+,+~\NT{arg}\}~[\verb+:+~\NT{term}]~\verb+)+ \\
+ \NT{kind} & ::= & & \mbox{\bf induction kind} \\
+ & & \verb+rec+ \mid \verb+corec+ \\
+ \NT{rule} & ::= & & \mbox{\bf rules} \\
+ & & x~\{\NT{ptname}\}~\verb+\Rightarrow+~\NT{term}
+\end{array}
+\]
+\hrule
+\end{table}
+
+
+
+
+The disambiguation phase builds CIC terms from ASTs of user inputs (also called
+\emph{ambiguous terms}). Ambiguous terms may carry three different \emph{sources
+of ambiguity}: unbound identifiers, literal numbers, and literal symbols.
+\emph{Unbound identifiers} are sources of ambiguity since the same name
+could have been used to represent different objects. For example, three
+different theorems of the \COQ{} library share the name $\mathit{plus\_assoc}$
+(locating them is an exercise for the interested reader. Hint: use \WHELP's
+\LOCATE{} query).
+
+\emph{Numbers} are ambiguous since several different encodings of them could be
+provided in logical systems. In the \COQ{} standard library for example we found
+naturals (in their unary encoding), positives (binary encoding), integers
+(signed positives), and reals. Finally, \emph{symbols} (instances of the
+\emph{binop} and \emph{unop} syntactic categories of Table~\ref{tab:syntax}) are
+ambiguous as well: infix \texttt{+} for example is overloaded to represent
+addition over the four different kinds of numbers available in the \COQ{}
+standard library. Note that given a term with more than one sources of
+ambiguity, not all possible disambiguation choices are valid: for example, given
+the input \texttt{1+1} we must choose an interpretation of \texttt{+} which is
+typable in CIC according to the chosen interpretation for \texttt{1}; choosing
+as \texttt{+} the addition over natural numbers and as \texttt{1} the real
+number $1$ will lead to a type error.
+
+A \emph{disambiguation algorithm} takes as input an ambiguous term and return a
+fully determined CIC term. The \emph{naive disambiguation algorithm} takes as
+input an ambiguous term $t$ and proceeds as follows:
+
+\begin{enumerate}
+
+ \item Create disambiguation domains $\{D_i | i\in\mathit{Dom}(t)\}$, where
+ $\mathit{Dom}(t)$ is the set of ambiguity sources of $t$. Each $D_i$ is a set
+ of CIC terms.
+
+ \item Let $\Phi = \{\phi_i | {i\in\mathit{Dom}(t)},\phi_i\in D_i\}$
+% such that $\forall i\in\mathit{Dom}(t),\exists\phi_j\in\Phi,i=j$
+ be an interpretation for $t$. Given $t$ and an interpretation $\Phi$, a CIC
+ term is fully determined. Iterate over all possible interpretations of $t$ and
+ type-check them, keep only typable interpretations (i.e. interpretations that
+ determine typable terms).
+
+ \item Let $n$ be the number of interpretations who survived step 2. If $n=0$
+ signal a type error. If $n=1$ we have found exactly one CIC term
+ corresponding to $t$, returns it as output of the disambiguation phase.
+ If $n>1$ let the user choose one of the $n$ interpretations and returns the
+ corresponding term.
+
+\end{enumerate}
+
+The above algorithm is highly inefficient since the number of possible
+interpretations $\Phi$ grows exponentially with the number of ambiguity sources.
+The actual algorithm used in \WHELP{} is far more efficient being, in the
+average case, linear in the number of ambiguity sources.
+
+The efficient algorithm can be applied if the logic can be extended with
+metavariables and a refiner can be implemented. This is the case for CIC and
+several other logics.
+\emph{Metavariables}~\cite{munoz} are typed, non linear placeholders that can
+occur in terms; $?_i$ usually denotes the $i$-th metavariable, while $?$ denotes
+a freshly created metavariable. A \emph{refiner}~\cite{mcbride} is a
+function whose input is a term with placeholders and whose output is either a
+new term obtained instantiating some placeholder or $\epsilon$, meaning that no
+well typed instantiation could be found for the placeholders occurring in
+the term (type error).
+
+The efficient algorithm starts with an interpretation $\Phi_0 = \{\phi_i |
+\phi_i = ?, i\in\mathit{Dom}(t)\}$,
+which associates a fresh metavariable to each
+source of ambiguity. Then it iterates refining the current CIC term (i.e. the
+term obtained interpreting $t$ with $\Phi_i$). If the refinement succeeds the
+next interpretation $\Phi_{i+1}$ will be created \emph{making a choice}, that is
+replacing a placeholder with one of the possible choice from the corresponding
+disambiguation domain. The placeholder to be replaced is chosen following a
+preorder visit of the ambiguous term. If the refinement fails the current set of
+choices cannot lead to a well-typed term and backtracking is attempted.
+Once an unambiguous correct interpretation is found (i.e. $\Phi_i$ does no
+longer contain any placeholder), backtracking is attempted
+anyway to find the other correct interpretations.
+
+The intuition which explain why this algorithm is more efficient is that as soon
+as a term containing placeholders is not typable, no further instantiation of
+its placeholders could lead to a typable term. For example, during the
+disambiguation of user input \texttt{\TEXMACRO{forall} x. x*0 = 0}, an
+interpretation $\Phi_i$ is encountered which associates $?$ to the instance
+of \texttt{0} on the right, the real number $0$ to the instance of \texttt{0} on
+the left, and the multiplication over natural numbers (\texttt{mult} for short)
+to \texttt{*}. The refiner will fail, since \texttt{mult} require a natural
+argument, and no further instantiation of the placeholder will be tried.
+
+If, at the end of the disambiguation, more than one possible interpretations are
+possible, the user will be asked to choose the intended one (see
+Fig.~\ref{fig:disambiguation}).
+
+\begin{figure}[htb]
+% \centerline{\includegraphics[width=0.9\textwidth]{disambiguation-1}}
+ \caption{\label{fig:disambiguation} Disambiguation: interpretation choice}
+\end{figure}
+
+Details of the disambiguation algorithm of \WHELP{} can
+be found in~\cite{disambiguation}, where an equivalent algorithm
+that avoids backtracking is also presented.
\subsection{notazione}
\ASSIGNEDTO{zack}