From edad5bdc005e8004116ef10ac3cf7b0c9d92c3b6 Mon Sep 17 00:00:00 2001 From: Stefano Zacchiroli Date: Tue, 15 Nov 2005 14:40:09 +0000 Subject: [PATCH] cic concrete syntax (w/o notation) --- helm/papers/matita/matita.tex | 250 ++++++++++++++++++++-------------- 1 file changed, 150 insertions(+), 100 deletions(-) diff --git a/helm/papers/matita/matita.tex b/helm/papers/matita/matita.tex index 4dab63361..e298dad21 100644 --- a/helm/papers/matita/matita.tex +++ b/helm/papers/matita/matita.tex @@ -35,6 +35,7 @@ \newcommand{\ASSIGNEDTO}[1]{\textbf{Assigned to:} #1} \newcommand{\NOTE}[1]{\marginpar{\scriptsize #1}} +\newcommand{\NT}[1]{\langle\mathit{#1}\rangle} \title{The Matita proof assistant} \author{Andrea Asperti, Claudio Sacerdoti Coen, Enrico Tassi @@ -207,106 +208,155 @@ reduce our code in sensible way).\NOTE{righe\\\COQ{}} \label{sec:disambiguation} \ASSIGNEDTO{zack} -%The disambiguation phase builds CIC terms from ASTs of user inputs (also called -%\emph{ambiguous terms}). Ambiguous terms may carry three different \emph{sources -%of ambiguity}: unbound identifiers, literal numbers, and literal symbols. -%\emph{Unbound identifiers} are sources of ambiguity since the same name -%could have been used to represent different objects. For example, three -%different theorems of the \COQ{} library share the name $\mathit{plus\_assoc}$ -%(locating them is an exercise for the interested reader. Hint: use \WHELP's -%\LOCATE{} query). -% -%\emph{Numbers} are ambiguous since several different encodings of them could be -%provided in logical systems. In the \COQ{} standard library for example we found -%naturals (in their unary encoding), positives (binary encoding), integers -%(signed positives), and reals. Finally, \emph{symbols} (instances of the -%\emph{binop} and \emph{unop} syntactic categories of Table~\ref{tab:syntax}) are -%ambiguous as well: infix \texttt{+} for example is overloaded to represent -%addition over the four different kinds of numbers available in the \COQ{} -%standard library. Note that given a term with more than one sources of -%ambiguity, not all possible disambiguation choices are valid: for example, given -%the input \texttt{1+1} we must choose an interpretation of \texttt{+} which is -%typable in CIC according to the chosen interpretation for \texttt{1}; choosing -%as \texttt{+} the addition over natural numbers and as \texttt{1} the real -%number $1$ will lead to a type error. -% -%A \emph{disambiguation algorithm} takes as input an ambiguous term and return a -%fully determined CIC term. The \emph{naive disambiguation algorithm} takes as -%input an ambiguous term $t$ and proceeds as follows: -% -%\begin{enumerate} -% -% \item Create disambiguation domains $\{D_i | i\in\mathit{Dom}(t)\}$, where -% $\mathit{Dom}(t)$ is the set of ambiguity sources of $t$. Each $D_i$ is a set -% of CIC terms. -% -% \item Let $\Phi = \{\phi_i | {i\in\mathit{Dom}(t)},\phi_i\in D_i\}$ -%% such that $\forall i\in\mathit{Dom}(t),\exists\phi_j\in\Phi,i=j$ -% be an interpretation for $t$. Given $t$ and an interpretation $\Phi$, a CIC -% term is fully determined. Iterate over all possible interpretations of $t$ and -% type-check them, keep only typable interpretations (i.e. interpretations that -% determine typable terms). -% -% \item Let $n$ be the number of interpretations who survived step 2. If $n=0$ -% signal a type error. If $n=1$ we have found exactly one CIC term -% corresponding to $t$, returns it as output of the disambiguation phase. -% If $n>1$ let the user choose one of the $n$ interpretations and returns the -% corresponding term. -% -%\end{enumerate} -% -%The above algorithm is highly inefficient since the number of possible -%interpretations $\Phi$ grows exponentially with the number of ambiguity sources. -%The actual algorithm used in \WHELP{} is far more efficient being, in the -%average case, linear in the number of ambiguity sources. -% -%The efficient algorithm can be applied if the logic can be extended with -%metavariables and a refiner can be implemented. This is the case for CIC and -%several other logics. -%\emph{Metavariables}~\cite{munoz} are typed, non linear placeholders that can -%occur in terms; $?_i$ usually denotes the $i$-th metavariable, while $?$ denotes -%a freshly created metavariable. A \emph{refiner}~\cite{mcbride} is a -%function whose input is a term with placeholders and whose output is either a -%new term obtained instantiating some placeholder or $\epsilon$, meaning that no -%well typed instantiation could be found for the placeholders occurring in -%the term (type error). -% -%The efficient algorithm starts with an interpretation $\Phi_0 = \{\phi_i | -%\phi_i = ?, i\in\mathit{Dom}(t)\}$, -%which associates a fresh metavariable to each -%source of ambiguity. Then it iterates refining the current CIC term (i.e. the -%term obtained interpreting $t$ with $\Phi_i$). If the refinement succeeds the -%next interpretation $\Phi_{i+1}$ will be created \emph{making a choice}, that is -%replacing a placeholder with one of the possible choice from the corresponding -%disambiguation domain. The placeholder to be replaced is chosen following a -%preorder visit of the ambiguous term. If the refinement fails the current set of -%choices cannot lead to a well-typed term and backtracking is attempted. -%Once an unambiguous correct interpretation is found (i.e. $\Phi_i$ does no -%longer contain any placeholder), backtracking is attempted -%anyway to find the other correct interpretations. -% -%The intuition which explain why this algorithm is more efficient is that as soon -%as a term containing placeholders is not typable, no further instantiation of -%its placeholders could lead to a typable term. For example, during the -%disambiguation of user input \texttt{\TEXMACRO{forall} x. x*0 = 0}, an -%interpretation $\Phi_i$ is encountered which associates $?$ to the instance -%of \texttt{0} on the right, the real number $0$ to the instance of \texttt{0} on -%the left, and the multiplication over natural numbers (\texttt{mult} for short) -%to \texttt{*}. The refiner will fail, since \texttt{mult} require a natural -%argument, and no further instantiation of the placeholder will be tried. -% -%If, at the end of the disambiguation, more than one possible interpretations are -%possible, the user will be asked to choose the intended one (see -%Fig.~\ref{fig:disambiguation}). -% -%\begin{figure}[htb] -%% \centerline{\includegraphics[width=0.9\textwidth]{disambiguation-1}} -% \caption{\label{fig:disambiguation} Disambiguation: interpretation choice} -%\end{figure} -% -%Details of the disambiguation algorithm of \WHELP{} can -%be found in~\cite{disambiguation}, where an equivalent algorithm -%that avoids backtracking is also presented. +\begin{table} + \caption{\label{tab:termsyn} Concrete syntax of CIC terms in \MATITA{}.\strut} +\hrule +\[ +\begin{array}{@{}rcll@{}} + \NT{term} & ::= & & \mbox{\bf terms} \\ + & & x & \mbox{(identifier)} \\ + & | & n & \mbox{(number)} \\ + & | & s & \mbox{(symbol)} \\ + & | & \mathrm{URI} & \mbox{(URI)} \\ + & | & \verb+?+ & \mbox{(implicit)} \\ + & | & \verb+?+n~[\verb+[+~\{\NT{subst}\}~\verb+]+] & \mbox{(meta)} \\ + & | & \verb+let+~\NT{ptname}~\verb+\def+~\NT{term}~\verb+in+~\NT{term} \\ + & | & \verb+let+~\NT{kind}~\NT{defs}~\verb+in+~\NT{term} \\ + & | & \NT{binder}~\{\NT{ptnames}\}^{+}~\verb+.+~\NT{term} \\ + & | & \NT{term}~\NT{term} & \mbox{(application)} \\ + & | & \verb+Prop+ \mid \verb+Set+ \mid \verb+Type+ \mid \verb+CProp+ & \mbox{(sort)} \\ + & | & \verb+match+~\NT{term}~ & \mbox{(pattern matching)} \\ + & & ~ ~ [\verb+[+~\verb+in+~x~\verb+]+] + ~ [\verb+[+~\verb+return+~\NT{term}~\verb+]+] \\ + & & ~ ~ \verb+with [+~[\NT{rule}~\{\verb+|+~\NT{rule}\}]~\verb+]+ & \\ + & | & \verb+(+~\NT{term}~\verb+:+~\NT{term}~\verb+)+ & \mbox{(cast)} \\ + & | & \verb+(+~\NT{term}~\verb+)+ \\ + \NT{defs} & ::= & & \mbox{\bf mutual definitions} \\ + & & \NT{fun}~\{\verb+and+~\NT{fun}\} \\ + \NT{fun} & ::= & & \mbox{\bf functions} \\ + & & \NT{arg}~\{\NT{ptnames}\}^{+}~[\verb+on+~x]~\verb+\def+~\NT{term} \\ + \NT{binder} & ::= & & \mbox{\bf binders} \\ + & & \verb+\forall+ \mid \verb+\lambda+ \\ + \NT{arg} & ::= & & \mbox{\bf single argument} \\ + & & \verb+_+ \mid x \\ + \NT{ptname} & ::= & & \mbox{\bf possibly typed name} \\ + & & \NT{arg} \\ + & | & \verb+(+~\NT{arg}~\verb+:+~\NT{term}~\verb+)+ \\ + \NT{ptnames} & ::= & & \mbox{\bf bound variables} \\ + & & \NT{arg} \\ + & | & \verb+(+~\NT{arg}~\{\verb+,+~\NT{arg}\}~[\verb+:+~\NT{term}]~\verb+)+ \\ + \NT{kind} & ::= & & \mbox{\bf induction kind} \\ + & & \verb+rec+ \mid \verb+corec+ \\ + \NT{rule} & ::= & & \mbox{\bf rules} \\ + & & x~\{\NT{ptname}\}~\verb+\Rightarrow+~\NT{term} +\end{array} +\] +\hrule +\end{table} + + + + +The disambiguation phase builds CIC terms from ASTs of user inputs (also called +\emph{ambiguous terms}). Ambiguous terms may carry three different \emph{sources +of ambiguity}: unbound identifiers, literal numbers, and literal symbols. +\emph{Unbound identifiers} are sources of ambiguity since the same name +could have been used to represent different objects. For example, three +different theorems of the \COQ{} library share the name $\mathit{plus\_assoc}$ +(locating them is an exercise for the interested reader. Hint: use \WHELP's +\LOCATE{} query). + +\emph{Numbers} are ambiguous since several different encodings of them could be +provided in logical systems. In the \COQ{} standard library for example we found +naturals (in their unary encoding), positives (binary encoding), integers +(signed positives), and reals. Finally, \emph{symbols} (instances of the +\emph{binop} and \emph{unop} syntactic categories of Table~\ref{tab:syntax}) are +ambiguous as well: infix \texttt{+} for example is overloaded to represent +addition over the four different kinds of numbers available in the \COQ{} +standard library. Note that given a term with more than one sources of +ambiguity, not all possible disambiguation choices are valid: for example, given +the input \texttt{1+1} we must choose an interpretation of \texttt{+} which is +typable in CIC according to the chosen interpretation for \texttt{1}; choosing +as \texttt{+} the addition over natural numbers and as \texttt{1} the real +number $1$ will lead to a type error. + +A \emph{disambiguation algorithm} takes as input an ambiguous term and return a +fully determined CIC term. The \emph{naive disambiguation algorithm} takes as +input an ambiguous term $t$ and proceeds as follows: + +\begin{enumerate} + + \item Create disambiguation domains $\{D_i | i\in\mathit{Dom}(t)\}$, where + $\mathit{Dom}(t)$ is the set of ambiguity sources of $t$. Each $D_i$ is a set + of CIC terms. + + \item Let $\Phi = \{\phi_i | {i\in\mathit{Dom}(t)},\phi_i\in D_i\}$ +% such that $\forall i\in\mathit{Dom}(t),\exists\phi_j\in\Phi,i=j$ + be an interpretation for $t$. Given $t$ and an interpretation $\Phi$, a CIC + term is fully determined. Iterate over all possible interpretations of $t$ and + type-check them, keep only typable interpretations (i.e. interpretations that + determine typable terms). + + \item Let $n$ be the number of interpretations who survived step 2. If $n=0$ + signal a type error. If $n=1$ we have found exactly one CIC term + corresponding to $t$, returns it as output of the disambiguation phase. + If $n>1$ let the user choose one of the $n$ interpretations and returns the + corresponding term. + +\end{enumerate} + +The above algorithm is highly inefficient since the number of possible +interpretations $\Phi$ grows exponentially with the number of ambiguity sources. +The actual algorithm used in \WHELP{} is far more efficient being, in the +average case, linear in the number of ambiguity sources. + +The efficient algorithm can be applied if the logic can be extended with +metavariables and a refiner can be implemented. This is the case for CIC and +several other logics. +\emph{Metavariables}~\cite{munoz} are typed, non linear placeholders that can +occur in terms; $?_i$ usually denotes the $i$-th metavariable, while $?$ denotes +a freshly created metavariable. A \emph{refiner}~\cite{mcbride} is a +function whose input is a term with placeholders and whose output is either a +new term obtained instantiating some placeholder or $\epsilon$, meaning that no +well typed instantiation could be found for the placeholders occurring in +the term (type error). + +The efficient algorithm starts with an interpretation $\Phi_0 = \{\phi_i | +\phi_i = ?, i\in\mathit{Dom}(t)\}$, +which associates a fresh metavariable to each +source of ambiguity. Then it iterates refining the current CIC term (i.e. the +term obtained interpreting $t$ with $\Phi_i$). If the refinement succeeds the +next interpretation $\Phi_{i+1}$ will be created \emph{making a choice}, that is +replacing a placeholder with one of the possible choice from the corresponding +disambiguation domain. The placeholder to be replaced is chosen following a +preorder visit of the ambiguous term. If the refinement fails the current set of +choices cannot lead to a well-typed term and backtracking is attempted. +Once an unambiguous correct interpretation is found (i.e. $\Phi_i$ does no +longer contain any placeholder), backtracking is attempted +anyway to find the other correct interpretations. + +The intuition which explain why this algorithm is more efficient is that as soon +as a term containing placeholders is not typable, no further instantiation of +its placeholders could lead to a typable term. For example, during the +disambiguation of user input \texttt{\TEXMACRO{forall} x. x*0 = 0}, an +interpretation $\Phi_i$ is encountered which associates $?$ to the instance +of \texttt{0} on the right, the real number $0$ to the instance of \texttt{0} on +the left, and the multiplication over natural numbers (\texttt{mult} for short) +to \texttt{*}. The refiner will fail, since \texttt{mult} require a natural +argument, and no further instantiation of the placeholder will be tried. + +If, at the end of the disambiguation, more than one possible interpretations are +possible, the user will be asked to choose the intended one (see +Fig.~\ref{fig:disambiguation}). + +\begin{figure}[htb] +% \centerline{\includegraphics[width=0.9\textwidth]{disambiguation-1}} + \caption{\label{fig:disambiguation} Disambiguation: interpretation choice} +\end{figure} + +Details of the disambiguation algorithm of \WHELP{} can +be found in~\cite{disambiguation}, where an equivalent algorithm +that avoids backtracking is also presented. \subsection{notazione} \ASSIGNEDTO{zack} -- 2.39.2