diff --git a/helm/papers/matita/matita.tex b/helm/papers/matita/matita.tex

index 8ed71abe69bee79f4b2a8763c43d8fa67fbc8abd..7b849c9285ffcd930020b59dff21706a6c87b59b 100644
--- a/helm/papers/matita/matita.tex
+++ b/helm/papers/matita/matita.tex
@@ -1,9 +1,11 @@
  \documentclass[a4paper]{llncs}
  \pagestyle{headings}
+\usepackage{color}
  \usepackage{graphicx}
  \usepackage{amssymb,amsmath}
  \usepackage{hyperref}
  \usepackage{picins}
+\usepackage{fancyvrb}
  
  %\newcommand{\logo}[3]{
  %\parpic(0cm,0cm)(#2,#3)[l]{\includegraphics[width=#1]{whelp-bw}}
@@ -17,6 +19,7 @@
  \newcommand{\IN}{\ensuremath{\mathbb{N}}}
  \newcommand{\INSTANCE}{\textsc{Instance}}
  \newcommand{\IR}{\ensuremath{\mathbb{R}}}
+\newcommand{\IZ}{\ensuremath{\mathbb{Z}}}
  \newcommand{\LIBXSLT}{LibXSLT}
  \newcommand{\LOCATE}{\textsc{Locate}}
  \newcommand{\MATCH}{\textsc{Match}}
@@ -33,9 +36,22 @@
  \newcommand{\UWOBO}{UWOBO}
  \newcommand{\WHELP}{Whelp}
  
+\definecolor{gray}{gray}{0.85} % 1 -> white; 0 -> black
+\newcommand{\NT}[1]{\langle\mathit{#1}\rangle}
+\newcommand{\URI}[1]{\texttt{#1}}
+
+%{\end{SaveVerbatim}\setlength{\fboxrule}{.5mm}\setlength{\fboxsep}{2mm}%
+\newenvironment{grafite}{\VerbatimEnvironment
+ \begin{SaveVerbatim}{boxtmp}}%
+ {\end{SaveVerbatim}\setlength{\fboxsep}{3mm}%
+  \begin{center}
+   \fcolorbox{black}{gray}{\BUseVerbatim[boxwidth=0.9\linewidth]{boxtmp}}
+  \end{center}}
+
  \newcommand{\ASSIGNEDTO}[1]{\textbf{Assigned to:} #1}
+\newcommand{\FILE}[1]{\texttt{#1}}
  \newcommand{\NOTE}[1]{\marginpar{\scriptsize #1}}
-\newcommand{\NT}[1]{\langle\mathit{#1}\rangle}
+\newcommand{\TODO}[1]{\textbf{TODO: #1}}
  
  \title{The Matita proof assistant}
  \author{Andrea Asperti, Claudio Sacerdoti Coen, Enrico Tassi
@@ -197,6 +213,7 @@ reduce our code in sensible way).\NOTE{righe\\\COQ{}}
  \ASSIGNEDTO{zack}
  
  \subsection{metavariables}
+\label{sec:metavariables}
  \ASSIGNEDTO{csc}
  
  \subsection{pattern}
@@ -210,7 +227,8 @@ reduce our code in sensible way).\NOTE{righe\\\COQ{}}
  \ASSIGNEDTO{zack}
  
  \begin{table}
- \caption{\label{tab:termsyn} Concrete syntax of CIC terms: built-in notation\strut}
+ \caption{\label{tab:termsyn} Concrete syntax of CIC terms: built-in
+ notation\strut}
  \hrule
  \[
  \begin{array}{@{}rcll@{}}
@@ -219,7 +237,7 @@ reduce our code in sensible way).\NOTE{righe\\\COQ{}}
      &  |  & n & \mbox{(number)} \\
      &  |  & s & \mbox{(symbol)} \\
      &  |  & \mathrm{URI} & \mbox{(URI)} \\
-    &  |  & \verb+?+ & \mbox{(implicit)} \\
+    &  |  & \verb+_+ & \mbox{(implicit)}\TODO{sync} \\
      &  |  & \verb+?+n~[\verb+[+~\{\NT{subst}\}~\verb+]+] & \mbox{(meta)} \\
      &  |  & \verb+let+~\NT{ptname}~\verb+\def+~\NT{term}~\verb+in+~\NT{term} \\
      &  |  & \verb+let+~\NT{kind}~\NT{defs}~\verb+in+~\NT{term} \\
@@ -258,7 +276,7 @@ reduce our code in sensible way).\NOTE{righe\\\COQ{}}
  \subsubsection{Term input}
  
  The primary form of user interaction employed by \MATITA{} is textual script
-editing: the user can modifies it and evaluate step by step its composing
+editing: the user modifies it and evaluates, step by step, its constituent
  \emph{statements}. Examples of statements are inductive type definitions,
  theorem declarations, LCF-style tacticals, and macros (e.g. \texttt{Check} can
  be used to ask the system to refine a given term and pretty print the result).
@@ -270,10 +288,9 @@ Two of the requirements in the design of such a syntax are apparently in
  contrast:
  \begin{enumerate}
   \item the syntax should be as close as possible to common mathematical practice
-  and implement widespread mathematical notions;
+  and implement widespread mathematical notations;
   \item each term described by the syntax should be non-ambiguous, meaning that
-  should exists a function which associates to each term of the syntax a CIC
-  term.
+  there should exist a function which associates to it a CIC term.
  \end{enumerate}
  
  These two requirements are addressed in \MATITA{} by means of two mechanisms
@@ -283,8 +300,16 @@ depicted in Fig.~\ref{fig:inputphase}. The architecture is articulated as a
  pipeline of three levels: the concrete syntax level (level 0) is the one the user
  has to deal with when inserting CIC terms; the abstract syntax level (level 2)
  is an internal representation which intuitively encodes mathematical formulae at
-the content level \NOTE{rif. per\\ content}; the formal mathematics level (level
-3) is the CIC encoding of terms.
+the content level~\cite{adams,mkm-structure}; the last level is that of
+CIC terms.
+
+\begin{figure}[ht]
+ \begin{center}
+  \includegraphics[width=0.9\textwidth]{input_phase}
+  \caption{\MATITA{} input phase}
+  \label{fig:inputphase}
+ \end{center}
+\end{figure}
  
  Requirement (1) is addressed by a built-in concrete syntax for terms, described
  in Tab.~\ref{tab:termsyn}, and the extensible notation mechanism which offers a
@@ -301,11 +326,13 @@ because some nodes of the content encoding admit more that one CIC encoding,
  invalidating requirement (2).
  
  \begin{example}
+ \label{ex:disambiguation}
  
- Consider the term \texttt{\TEXMACRO{forall} x. x + ln 1 = x}, the type of a
- lemma the user may want to prove. Assuming that both \texttt{+} and \texttt{=}
- are parsed as infix operators, all the following questions are legitimate and
- must be answered before obtaining a CIC term from its content level encoding
+ Consider the concrete syntax level term \texttt{\TEXMACRO{forall} x. x +
+ ln 1 = x} of Fig.~\ref{fig:inputphase}(a), which could be the type of a lemma
+ the user may want to prove. Assuming that both \texttt{+} and \texttt{=} are
+ parsed as infix operators, all the following questions are legitimate and must
+ be answered before obtaining a CIC term from its content level encoding
   (Fig.~\ref{fig:inputphase}(b)):
  
   \begin{enumerate}
@@ -326,54 +353,132 @@ invalidating requirement (2).
  \end{example}
  
  In \MATITA, three \emph{sources of ambiguity} are admitted for content level
-terms: unbound identifiers, literal numbers, and literal symbols.
-
-\emph{Unbound identifiers} (question 1) are sources of ambiguity since the same
-name could have been used in the proof assistant library to represent different
-objects. \emph{Numbers} (question 2) are ambiguous since several different
-encodings of them could be provided in the calculus. Finally, \emph{symbols}
-(question 3) are ambiguous as well, since they may be used in an overloaded
-fashion to represent the application of different objects.
-
-\textbf{UP TO HERE, the rest is copy and paste from the Whelp paper \dots}
-
-Note that given a content level term with more than one sources of ambiguity,
-not all possible disambiguation choices are valid: for example, given the input
-\texttt{1+1} we must choose an interpretation of \texttt{+} which is typable in
-CIC according to the chosen interpretation for \texttt{1}; choosing as
-\texttt{+} the addition over natural numbers and as \texttt{1} the real number
-$1$ will lead to a type error.
-
-A \emph{disambiguation algorithm} takes as input an ambiguous term and return a
-fully determined CIC term. The \emph{naive disambiguation algorithm} takes as
-input an ambiguous term $t$ and proceeds as follows:
+terms: unbound identifiers, literal numbers, and operators. Each instance of an
+ambiguity source (an \emph{ambiguous entity}) occurring in a content level term
+is associated with a \emph{disambiguation domain}. Intuitively, a disambiguation
+domain is a set of CIC terms which may be substituted for an ambiguous entity
+during disambiguation. Each item of the domain is said to be an
+\emph{interpretation} for the ambiguous entity.
+
+\emph{Unbound identifiers} (question 1) are ambiguous entities since the
+namespace of CIC objects is not flat and the same identifier may denote many
+of them. For example, the short name \texttt{plus\_assoc} in the \HELM{} library
+is shared by three different theorems stating the associative property of
+different additions. This kind of ambiguity is avoidable if the user is willing
+to use long names (in the form of URIs in the \texttt{cic://} scheme) in the
+concrete syntax, with the obvious drawback of obtaining long and unreadable
+terms.
+
+Given an unbound identifier, the corresponding disambiguation domain is computed
+by querying the library for all constants, inductive types, and inductive type
+constructors having it as their short name (see the \LOCATE{} query in
+Sect.~\ref{sec:metadata}).
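
A minimal sketch of the short-name lookup described above, written in Python for illustration only. The URIs, the `short_name` rule, and the helper names are assumptions made for this example and are not taken from the \HELM{} library or from the \LOCATE{} implementation; inductive type constructors are ignored for brevity.

```python
from collections import defaultdict

def short_name(uri):
    """Last path component without its extension (assumed naming rule)."""
    return uri.rsplit("/", 1)[-1].split(".", 1)[0]

def build_index(uris):
    """Map each short name to every URI in the library that carries it."""
    index = defaultdict(list)
    for uri in uris:
        index[short_name(uri)].append(uri)
    return index

def identifier_domain(index, name):
    """Disambiguation domain of an unbound identifier: all matching URIs."""
    return list(index.get(name, []))

# Hypothetical library contents, invented for this example.
LIBRARY = [
    "cic:/example/nat/plus_assoc.con",
    "cic:/example/int/plus_assoc.con",
    "cic:/example/real/plus_assoc.con",
    "cic:/example/nat/plus.con",
]
INDEX = build_index(LIBRARY)
```

With this toy library, `identifier_domain(INDEX, "plus_assoc")` yields three candidate interpretations, mirroring the three theorems mentioned above, while a long name (a full URI) would identify exactly one object.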
+
+\emph{Literal numbers} (question 2) are ambiguous entities as well, since
+different kinds of numbers can be encoded in CIC (\IN, \IR, \IZ, \dots) using
+different encodings. Considering the restricted example of natural numbers, we
+can for instance encode them in CIC using inductive datatypes with a number of
+constructors equal to the encoding base plus 1, obtaining one encoding for each
+base.
+
+For each possible way of mapping a literal number to a CIC term, \MATITA{} is
+aware of a \emph{number interpretation function} which, when applied to the
+natural number denoted by the literal\footnote{At the moment only literal
+natural numbers are supported in the concrete syntax.}, returns a corresponding
+CIC term. The disambiguation domain for a given literal number is built by
+applying to the literal all available number interpretation functions in turn.
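
The construction of a disambiguation domain for a literal number can be sketched as follows. This is an illustrative Python model, not \MATITA{} code: CIC terms are represented as plain strings, and the constructor names (`O`, `S`, `E`, `B0`, `B1`) and function names are assumptions chosen for the example. The unary encoding uses two constructors and the binary one three, matching the "base plus 1" count mentioned above.

```python
def peano(n):
    """Unary encoding with constructors O and S, e.g. 2 -> (S (S O))."""
    term = "O"
    for _ in range(n):
        term = f"(S {term})"
    return term

def binary(n):
    """Base-2 encoding with constructors E, B0, B1.

    Digits are wrapped most significant first, so the least significant
    digit ends up outermost, e.g. 5 -> (B1 (B0 (B1 E))).
    """
    term = "E"
    for bit in bin(n)[2:]:
        term = f"(B{bit} {term})"
    return term

# All number interpretation functions known to this toy system.
NUMBER_INTERPRETATIONS = [peano, binary]

def number_domain(literal):
    """Apply every interpretation function, in turn, to the literal."""
    n = int(literal)
    return [interpret(n) for interpret in NUMBER_INTERPRETATIONS]
```

Registering a further interpretation function (for instance one producing a real number) would simply enlarge every domain returned by `number_domain`.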
+
+Number interpretation functions can be defined in OCaml or directly using
+\TODO{notation for numbers}.
+
+\emph{Operators} (question 3) are intuitively heads of applications; as such
+they are always applied to a non-empty sequence of arguments. Their ambiguity is
+needed since it is often the case that some notation is used in an overloaded
+fashion to hide the use of different CIC constants which encode similar
+concepts. For example, in the standard library of \MATITA{} the infix \texttt{+}
+notation is available, building a binary \texttt{Op(+)} node whose
+disambiguation domain may refer to different constants like the addition over
+natural numbers \URI{cic:/matita/nat/plus/plus.con} or that over real numbers of
+the \COQ{} standard library \URI{cic:/Coq/Reals/Rdefinitions/Rplus.con}.
+
+For each possible way of mapping an operator application to a CIC term,
+\MATITA{} knows an \emph{operator interpretation function} which, when applied
+to an operator and its arguments, returns a CIC term. The disambiguation domain
+for a given operator is built by applying to the operator and its arguments all
+available operator interpretation functions in turn.
+
+Operator interpretation functions can be added using the
+\texttt{interpretation} statement. For example, among the first lines of the
+script \FILE{matita/library/logic/equality.ma} from the \MATITA{} standard
+library we read:
+
+\begin{grafite}
+interpretation "leibnitz's equality"
+ 'eq x y =
+   (cic:/matita/logic/equality/eq.ind#xpointer(1/1) _ x y).
+\end{grafite}
+
+Evaluating it in \MATITA{} will add an operator interpretation function for the
+binary operator \texttt{eq} which expands to the CIC term on the right hand side
+of the statement. That CIC term can be written using only built-in concrete
+syntax and can contain no ambiguity source; still, it can refer to operator
+arguments bound on the left hand side and can contain implicit terms (denoted
+with \texttt{\_}) which will be expanded to fresh metavariables. The latter
+feature is used in the example above for the first argument of Leibniz's
+polymorphic equality.
+
+\subsubsection{Disambiguation algorithm}
+
+\NOTE{I assume\\
+      that refine\\
+      has already\\
+      been discussed}
+
+
+A \emph{disambiguation algorithm} takes as input a content level term and
+returns a fully determined CIC term. The key observation on which a
+disambiguation algorithm is based is that, given a content level term with more
+than one source of ambiguity, not all possible combinations of interpretations
+lead to a typable CIC term. In the term of Ex.~\ref{ex:disambiguation}, for
+instance, the interpretation of \texttt{ln} as a function from \IR{} to \IR{}
+and the interpretation of \texttt{1} as the Peano number $1$ cannot coexist. The
+notion of ``cannot coexist'' in the disambiguation of \MATITA{} is inherited
+from the refiner described in Sect.~\ref{sec:metavariables}: as long as
+$\mathit{refine}(c)\neq\bot$, the combination of interpretations which led to
+$c$ can coexist.
+
+The \emph{naive disambiguation algorithm} takes as input a content level term
+$t$ and proceeds as follows:
  
  \begin{enumerate}
  
   \item Create disambiguation domains $\{D_i | i\in\mathit{Dom}(t)\}$, where
    $\mathit{Dom}(t)$ is the set of ambiguity sources of $t$. Each $D_i$ is a set
-  of CIC terms.
+  of CIC terms and can be built as described above.
  
- \item Let $\Phi = \{\phi_i | {i\in\mathit{Dom}(t)},\phi_i\in D_i\}$
-%  such that $\forall i\in\mathit{Dom}(t),\exists\phi_j\in\Phi,i=j$
-  be an interpretation for $t$. Given $t$ and an interpretation $\Phi$, a CIC
-  term is fully determined. Iterate over all possible interpretations of $t$ and
-  type-check them, keep only typable interpretations (i.e. interpretations that
-  determine typable terms).
+ \item Let $\Phi = \{\phi_i | {i\in\mathit{Dom}(t)},\phi_i\in D_i\}$ be an
+  interpretation for $t$. Given $t$ and an interpretation $\Phi$, a CIC term is
+  fully determined. Iterate over all possible interpretations of $t$ and refine
+  the corresponding CIC terms, keeping only the interpretations which lead to CIC
+  terms $c$ s.t. $\mathit{refine}(c)\neq\bot$ (i.e. interpretations that
+  determine typable terms).
  
   \item Let $n$ be the number of interpretations that survived step 2. If $n=0$
-  signal a type error. If $n=1$ we have found exactly one CIC term
-  corresponding to $t$, returns it as output of the disambiguation phase.
-  If $n>1$ let the user choose one of the $n$ interpretations and returns the
+  signal a type error. If $n=1$ we have found exactly one CIC term corresponding
+  to $t$: return it as output of the disambiguation phase. If $n>1$ we have
+  found many different CIC terms which can correspond to the content level term:
+  let the user choose one of the $n$ interpretations and return the
    corresponding term.
  
  \end{enumerate}
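
The three steps above can be sketched in Python as follows. This is a toy model under stated assumptions, not the \MATITA{} implementation: the refiner is replaced by a trivial first-order type check, the interpretation names and types are invented, and all occurrences of the same symbol share a single interpretation, which is a simplification.

```python
from itertools import product

# Step 1: toy disambiguation domains. Each candidate is a pair
# (hypothetical CIC name, toy type); function types list the argument
# types followed by the result type.
DOMAINS = {
    "1": [("nat_one", "N"), ("real_one", "R")],
    "ln": [("Rln", ("R", "R"))],
    "+": [("nat_plus", ("N", "N", "N")), ("Rplus", ("R", "R", "R"))],
}

# Content level term for `1 + ln 1`: a bare symbol or (head, args...).
TERM = ("+", "1", ("ln", "1"))

def refine(term, phi):
    """Toy refiner: the type of `term` under `phi`, or None for bottom."""
    if isinstance(term, str):
        return phi[term][1]
    head, *args = term
    *expected, result = phi[head][1]
    actual = [refine(arg, phi) for arg in args]
    return result if actual == expected else None

def naive_disambiguate(term, domains):
    """Steps 2 and 3: try every interpretation, keep the typable ones."""
    keys = list(domains)
    survivors = []
    for choice in product(*(domains[key] for key in keys)):
        phi = dict(zip(keys, choice))
        if refine(term, phi) is not None:
            survivors.append(phi)
    if not survivors:
        raise TypeError("no typable interpretation")
    # With more than one survivor the caller would ask the user to choose.
    return survivors
```

For `TERM` only the all-real interpretation survives, since `ln` has a single candidate here; for `("+", "1", "1")` both the natural and the real readings survive and the user would have to choose.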
  
  The above algorithm is highly inefficient since the number of possible
  interpretations $\Phi$ grows exponentially with the number of ambiguity sources.
-The actual algorithm used in \WHELP{} is far more efficient being, in the
+The actual algorithm used in \MATITA{} is far more efficient being, in the
  average case, linear in the number of ambiguity sources.
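
The exponential growth of the naive algorithm can be made explicit with a short count. The figures in the example are hypothetical and serve only as an illustration; every domain is assumed to be non-empty.

```latex
% Number of interpretations the naive algorithm must refine.
\[
  \#\{\Phi\} \;=\; \prod_{i\in\mathit{Dom}(t)} |D_i| \;\geq\; 2^{k}
\]
% where $k$ is the number of ambiguity sources whose domain contains at
% least two interpretations. For instance, a term with $10$ ambiguity
% sources and $3$ interpretations each requires $3^{10} = 59049$ calls to
% the refiner.
```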
  
+\TODO{up to here}
+
  The efficient algorithm can be applied if the logic can be extended with
  metavariables and a refiner can be implemented. This is the case for CIC and
  several other logics.
@@ -430,6 +535,7 @@ that avoids backtracking is also presented.
  \ASSIGNEDTO{csc}
  
  \subsection{search and indexing}
+\label{sec:metadata}
  \ASSIGNEDTO{andrea}
  
  \subsection{auto}