minimal introduction

[helm.git] / helm / ocaml / cic_notation / doc / body.tex
diff --git a/helm/ocaml/cic_notation/doc/body.tex b/helm/ocaml/cic_notation/doc/body.tex

index f1ce73577174befcea58cf010f3784b735899b16..fef547e1d0da321fb58abcb1b5a043937e14f5e2 100644 (file)
--- a/helm/ocaml/cic_notation/doc/body.tex
+++ b/helm/ocaml/cic_notation/doc/body.tex
@@ -1,21 +1,209 @@
  
-\section{Environment}
+\section{Introduction}
  
-\[
-\begin{array}{rcll}
-  V & ::= & & \mbox{(\bf values)} \\
-    &     & \verb+Term+~T & \mbox{(term)} \\
-    &  |  & \verb+String+~s & \mbox{(string)} \\
-    &  |  & \verb+Number+~n & \mbox{(number)} \\
-    &  |  & \verb+None+ & \mbox{(optional value)} \\
-    &  |  & \verb+Some+~V & \mbox{(optional value)} \\
-    &  |  & [V_1,\dots,V_n] & \mbox{(list value)} \\[2ex]
-\end{array}
-\]
+Mathematical notation plays a fundamental role in mathematical practice: it
+helps expressing in a concise symbolic fashion concepts of arbitrary complexity.
+Its use in proof assistants like \MATITA{} is no exception. Formal mathematics
+indeed often impose to encode mathematical concepts at a very high level of
+details (e.g. Peano numbers, implicit arguments) having a restricted toolbox of
+syntactic constructions in the calculus.
  
-An environment is a map $\mathcal E : \mathit{Name} -> V$.
+Consider for example one of the point reached while proving the distributivity
+of times over minus on natural numbers included in the \MATITA{} standards
+library. (Part of) the reached sequent can be seen in \MATITA{} both using the
+notation for various arithmetical and relational operator or without using it.
+The sequent rendered without using notation looks as follows:
  
-\section{Level 1: concrete syntax}
+\sequent{
+\mathtt{H}: \mathtt{le} z y\\
+\mathtt{Hcut}: \mathtt{eq} \mathtt{nat} (\mathtt{plus} (\mathtt{times} x (\mathtt{minus}
+y z)) (\mathtt{times} x z))\\
+(\mathtt{plus} (\mathtt{minus} (\mathtt{times} x y) (\mathtt{times} x z))
+(\mathtt{times} x z))}{
+\mathtt{eq} \mathtt{nat} (\mathtt{times} x (\mathtt{minus} y z)) (\mathtt{minus}
+(\mathtt{times} x y) (\mathtt{times} x z))}
+
+while the corresponding sequent rendered with notation enabled looks:
+
+\sequent{
+H: z\leq y\\
+Hcut: x*(y-z)+x*z=x*y-x*z+x*z}{
+x*(y-z)=x*y-x*z}
+
+The latter representation is evidently more readable than the former helping
+users both in concentrating on the key aspects of the proof (namely on choosing
+the right strategy to proceed in the proof) and in reducing the amount of input
+that need to be provided to the system when term input is required (assuming the
+exists a correspondence among the rendered output and the textual input syntax
+used by the user, as it happens in \MATITA).
+
+In this section we present the \emph{extensible notation} mechanism implemented
+in \MATITA. Its role may be looked at from two different point of view: the term
+input phase and the term output --- or rendering --- phase. We arbitrarly
+decided to call the former view ``from the left'' and the latter ``from the
+right''. Looking from the point of view of the input phase it offers a mechanism
+of dynamic extension of the term grammar enabling the user to define fancy
+mathematical notations.  Looking from the point of view of rendering it enable
+the reconstruction of such notations from CIC term and its rendering to various
+presentation languages (at the time of writing supported languages are MathML
+Presentation and the \MATITA{} concrete syntax for terms).
+
+If you're wondering why the notation mechanisms need to be ``extensible'', the
+answer lays in how notation is used in the development of formal mathematics
+with proof assistants. When doing ordinary (i.e. non automatically checkable by
+the mean of a proof checker) mathematics, notation is often confused with the
+mathematical objects being defined. ``+'' may be thought as \emph{the} addition
+(and is often termed as such in mathematical textbooks!), but is rather the
+notation for one particolar kind of addition which may possibly be used in an
+overloaded fashion elsewhere. When doing formal mathematics the difference is
+tangible and users has to deal separately with all the actions we skimmed
+through:
+
+\begin{enumerate}
+
+ \item definition of mathematical objects (e.g. addition over Peano numbers
+  using the primitive recursion scheme);
+
+ \item definition of new mathematical notation (e.g. infix use of the $+$ symbol
+  as in $x + 3$);
+
+ \item (incremental) definition of the meanings of a given notation (e.g. the
+  use of the notation of (2) above for denoting the addition of (1)).
+
+\end{enumerate}
+
+Since all the points above are part of everyday life of proof assistants users
+we know that mathematical notation in the system will change and we can't
+provide a ``one-size fits all'' solution as is done for instance in mainstream
+programming languages mathematical notation. For this reason \MATITA{} supports
+all the above actions in a coherent manner in both term input and output.
+
+\section{Looking from the left: term input}
+
+\subsubsection{\MATITA{} input phase}
+
+ \begin{table}
+  \caption{\label{tab:termsyn} Concrete syntax of CIC terms: built-in
+  notation\strut}
+ \hrule
+ \[
+ \begin{array}{@{}rcll@{}}
+   \NT{term} & ::= & & \mbox{\bf terms} \\
+     &     & x & \mbox{(identifier)} \\
+     &  |  & n & \mbox{(number)} \\
+     &  |  & s & \mbox{(symbol)} \\
+     &  |  & \mathrm{URI} & \mbox{(URI)} \\
+     &  |  & \verb+_+ & \mbox{(implicit)} \\
+     &  |  & \verb+?+n~[\verb+[+~\{\NT{subst}\}~\verb+]+] & \mbox{(meta)} \\
+     &  |  & \verb+let+~\NT{ptname}~\verb+\def+~\NT{term}~\verb+in+~\NT{term} \\
+     &  |  & \verb+let+~\NT{kind}~\NT{defs}~\verb+in+~\NT{term} \\
+     &  |  & \NT{binder}~\{\NT{ptnames}\}^{+}~\verb+.+~\NT{term} \\
+     &  |  & \NT{term}~\NT{term} & \mbox{(application)} \\
+     &  |  & \verb+Prop+ \mid \verb+Set+ \mid \verb+Type+ \mid \verb+CProp+ & \mbox{(sort)} \\
+     &  |  & \verb+match+~\NT{term}~ & \mbox{(pattern matching)} \\
+     &     & ~ ~ [\verb+[+~\verb+in+~x~\verb+]+]
+             ~ [\verb+[+~\verb+return+~\NT{term}~\verb+]+] \\
+     &     & ~ ~ \verb+with [+~[\NT{rule}~\{\verb+|+~\NT{rule}\}]~\verb+]+ & \\
+     &  |  & \verb+(+~\NT{term}~\verb+:+~\NT{term}~\verb+)+ & \mbox{(cast)} \\
+     &  |  & \verb+(+~\NT{term}~\verb+)+ \\
+   \NT{defs}  & ::= & & \mbox{\bf mutual definitions} \\
+     &     & \NT{fun}~\{\verb+and+~\NT{fun}\} \\
+   \NT{fun} & ::= & & \mbox{\bf functions} \\
+     &     & \NT{arg}~\{\NT{ptnames}\}^{+}~[\verb+on+~x]~\verb+\def+~\NT{term} \\
+   \NT{binder} & ::= & & \mbox{\bf binders} \\
+     &     & \verb+\forall+ \mid \verb+\lambda+ \\
+   \NT{arg} & ::= & & \mbox{\bf single argument} \\
+     &     & \verb+_+ \mid x \\
+   \NT{ptname} & ::= & & \mbox{\bf possibly typed name} \\
+     &     & \NT{arg} \\
+     &  |  & \verb+(+~\NT{arg}~\verb+:+~\NT{term}~\verb+)+ \\
+   \NT{ptnames} & ::= & & \mbox{\bf bound variables} \\
+     &     & \NT{arg} \\
+     &  |  & \verb+(+~\NT{arg}~\{\verb+,+~\NT{arg}\}~[\verb+:+~\NT{term}]~\verb+)+ \\
+   \NT{kind} & ::= & & \mbox{\bf induction kind} \\
+     &     & \verb+rec+ \mid \verb+corec+ \\
+   \NT{rule} & ::= & & \mbox{\bf rules} \\
+     &     & x~\{\NT{ptname}\}~\verb+\Rightarrow+~\NT{term}
+ \end{array}
+ \]
+ \hrule
+ \end{table}
+
+The primary form of user interaction employed by \MATITA{} is textual script
+editing: the user modifies it and evaluate step by step its composing
+\emph{statements}. Examples of statements are inductive type definitions,
+theorem declarations, LCF-style tacticals, and macros (e.g. \texttt{Check} can
+be used to ask the system to refine a given term and pretty print the result).
+Since many statements refer to terms of the underlying calculus, \MATITA{} needs
+a concrete syntax able to encode terms of the Calculus of Inductive
+Constructions.
+
+Two of the requirements in the design of such a syntax are apparently in
+contrast:
+
+\begin{enumerate}
+
+ \item the syntax should be as close as possible to common mathematical practice
+  and implement widespread mathematical notations;
+
+ \item each term described by the syntax should be non-ambiguous meaning that it
+  should exists a function which associates to it a CIC term.
+
+\end{enumerate}
+
+These two requirements are addressed in \MATITA{} by the mean of two mechanisms
+which work together: \emph{term disambiguation} and \emph{extensible notation}.
+Their interaction is visible in the architecture of the \MATITA{} input phase,
+depicted in Fig.~\ref{fig:inputphase}. The architecture is articulated as a
+pipeline of three levels: the concrete syntax level (level 0) is the one the
+user has to deal with when inserting CIC terms; the abstract syntax level (level
+2) is an internal representation which intuitively encodes mathematical formulae
+at the content level~\cite{adams}\cite{mkm-structure}; the last level is that of
+CIC terms.
+
+\begin{figure}[ht]
+ \begin{center}
+  \includegraphics[width=0.9\textwidth]{input_phase}
+  \caption{\MATITA{} input phase}
+ \end{center}
+ \label{fig:inputphase}
+\end{figure}
+
+Requirement (1) is addressed by a built-in concrete syntax for terms, described
+in Tab.~\ref{tab:termsyn}, and the extensible notation mechanisms which offers a
+way for extending available mathematical notations and providing a parser for
+the extended notation. Requirement (2) is addressed by the conjunct action of
+that parsing function and disambiguation which provides a function from content
+level terms to CIC terms.
+
+\subsubsection{From concrete syntax to content level}
+
+Content level terms are instances of what are commonly referred as Abstract
+Syntax Trees (ASTs) in compilers literature. In this respect the mapping from
+concrete syntax fo content level is nothing more than the pipelined application
+of a lexer and a parser to the characters that form terms at the concrete syntax
+level.
+
+The plus offered by the notation mechanisms is the ability to dinamically extend
+the parsing rules which build abstract syntax tree from stream of lexer tokens.
+For example, in the standard library of \MATITA{} we found the following
+statements which define the notation used for the ``+'' infix operator.
+
+\begin{example}
+\begin{Verbatim}
+ notation "a + b" 
+   left associative with precedence 50
+ for @{ 'plus $a $b }.
+\end{Verbatim}
+\end{example}
+
+The meaning of such a statement is to declare a bidirectional
+mapping\footnote{in this section we only deal with the left to right part of the
+mapping, but it is actually bidirectional} between a concrete syntax pattern
+(the part of the statement inside double quotes) and a content level pattern
+(the part of the statement which follows \texttt{for}). The syntax of concrete
+syntax patterns and content level patterns can be found in Tab.~\ref{tab:l1c}
+and Tab.~\ref{tab:l2c} respectively.
  
  \begin{table}
  \caption{\label{tab:l1c} Concrete syntax of level 1 patterns.\strut}
@@ -52,14 +240,6 @@ An environment is a map $\mathcal E : \mathit{Name} -> V$.
  \hrule
  \end{table}
  
-Rationale: while the layout schemata can occur in the concrete syntax
-used by user, the box schemata and the magic patterns can only occur
-when defining the notation. This is why the layout schemata are
-``escaped'' with a backslash, so that they cannot be confused with
-plain identifiers, wherease the others are not. Alternatively, they
-could be defined as keywords, but this would prevent their names to be
-used in different contexts.
-
  \begin{table}
  \caption{\label{tab:l1a} Abstract syntax of level 1 terms and patterns.\strut}
  \hrule
@@ -96,6 +276,320 @@ used in different contexts.
  \hrule
  \end{table}
  
+\begin{table}
+\caption{\label{tab:synl2} Concrete syntax of level 2 patterns.\strut}
+\hrule
+\[
+\begin{array}{@{}rcll@{}}
+  \NT{term} & ::= & & \mbox{\bf terms} \\
+    &     & x & \mbox{(identifier)} \\
+    &  |  & n & \mbox{(number)} \\
+    &  |  & s & \mbox{(symbol)} \\
+    &  |  & \mathrm{URI} & \mbox{(URI)} \\
+    &  |  & \verb+?+ & \mbox{(implicit)} \\
+    &  |  & \verb+%+ & \mbox{(placeholder)} \\
+    &  |  & \verb+?+n~[\verb+[+~\{\NT{subst}\}~\verb+]+] & \mbox{(meta)} \\
+    &  |  & \verb+let+~\NT{ptname}~\verb+\def+~\NT{term}~\verb+in+~\NT{term} \\
+    &  |  & \verb+let+~\NT{kind}~\NT{defs}~\verb+in+~\NT{term} \\
+    &  |  & \NT{binder}~\{\NT{ptnames}\}^{+}~\verb+.+~\NT{term} \\
+    &  |  & \NT{term}~\NT{term} & \mbox{(application)} \\
+    &  |  & \verb+Prop+ \mid \verb+Set+ \mid \verb+Type+ \mid \verb+CProp+ & \mbox{(sort)} \\
+    &  |  & [\verb+[+~\NT{term}~\verb+]+]~\verb+match+~\NT{term}~\verb+with [+~[\NT{rule}~\{\verb+|+~\NT{rule}\}]~\verb+]+ & \mbox{(pattern match)} \\
+    &  |  & \verb+(+~\NT{term}~\verb+:+~\NT{term}~\verb+)+ & \mbox{(cast)} \\
+    &  |  & \verb+(+~\NT{term}~\verb+)+ \\
+    &  |  & \BLOB(\NT{meta},\dots,\NT{meta}) & \mbox{(meta blob)} \\
+  \NT{defs}  & ::= & & \mbox{\bf mutual definitions} \\
+    &     & \NT{fun}~\{\verb+and+~\NT{fun}\} \\
+  \NT{fun} & ::= & & \mbox{\bf functions} \\
+    &     & \NT{arg}~\{\NT{ptnames}\}^{+}~[\verb+on+~x]~\verb+\def+~\NT{term} \\
+  \NT{binder} & ::= & & \mbox{\bf binders} \\
+    &     & \verb+\Pi+ \mid \verb+\exists+ \mid \verb+\forall+ \mid \verb+\lambda+ \\
+  \NT{arg} & ::= & & \mbox{\bf single argument} \\
+    &     & \verb+_+ \mid x \mid \BLOB(\NT{meta},\dots,\NT{meta}) \\
+  \NT{ptname} & ::= & & \mbox{\bf possibly typed name} \\
+    &     & \NT{arg} \\
+    &  |  & \verb+(+~\NT{arg}~\verb+:+~\NT{term}~\verb+)+ \\
+  \NT{ptnames} & ::= & & \mbox{\bf bound variables} \\
+    &     & \NT{arg} \\
+    &  |  & \verb+(+~\NT{arg}~\{\verb+,+~\NT{arg}\}~[\verb+:+~\NT{term}]~\verb+)+ \\
+  \NT{kind} & ::= & & \mbox{\bf induction kind} \\
+    &     & \verb+rec+ \mid \verb+corec+ \\
+  \NT{rule} & ::= & & \mbox{\bf rules} \\
+    &     & x~\{\NT{ptname}\}~\verb+\Rightarrow+~\NT{term} \\[10ex]
+
+  \NT{meta} & ::= & & \mbox{\bf meta} \\
+    &     & \BLOB(\NT{term},\dots,\NT{term}) & \mbox{(term blob)} \\
+    &  |  & [\verb+term+]~x \\
+    &  |  & \verb+number+~x \\
+    &  |  & \verb+ident+~x \\
+    &  |  & \verb+fresh+~x \\
+    &  |  & \verb+anonymous+ \\
+    &  |  & \verb+fold+~[\verb+left+\mid\verb+right+]~\NT{meta}~\verb+rec+~x~\NT{meta} \\
+    &  |  & \verb+default+~\NT{meta}~\NT{meta} \\
+    &  |  & \verb+if+~\NT{meta}~\verb+then+~\NT{meta}~\verb+else+~\NT{meta} \\
+    &  |  & \verb+fail+ 
+\end{array}
+\]
+\hrule
+\end{table}
+
+Each time a \texttt{notation} statement is evaluated by \MATITA{} a new parsing
+rule, extracted from the concrete syntax pattern, is added to the term parser
+and a semantic action which build a content level term, extracted from the
+content level pattern, is associated to it. We will now describe in turn what
+can be part of a concrete syntax pattern and what can be part of a content level
+pattern.
+
+Concrete syntax patterns, whose abstract syntax can additionally be found in
+Tab.~\ref{tab:l1a} can be made of several components. The most basic of which
+are \emph{literal symbols} (like the ``+'' in the example above) and \emph{term
+variables} (like ``a'' and ``b''). During the extraction of parsing rules
+literal symbols are mapped to productions expecting those symbols verbatim as
+input and term variables as production expecting other terms (instances of the
+same parsing rule we are extending, possibly with different precedence and/or
+associativity).
+
+\ldots
+
+\subsubsection{From content level to CIC}
+
+Responsible of mapping content level terms to CIC terms is the disambiguation
+algorithm implemented in \MATITA. Since it has already been described
+elsewhere~\cite{disambiguation} we wont enter in too much details here. We only
+give some highlights of its fundamental concepts.
+
+\subsubsection{Sources of ambiguity}
+
+The translation from content level terms to CIC terms is not straightforward
+because some nodes of the content encoding admit more that one CIC encoding,
+invalidating requirement (2).
+
+\begin{example}
+ \label{ex:disambiguation}
+
+ Consider the term at the concrete syntax level \texttt{\TEXMACRO{forall} x. x +
+ ln 1 = x} of Fig.~\ref{fig:inputphase}(a), it can be the type of a lemma the
+ user may want to prove. Assuming that both \texttt{+} and \texttt{=} are parsed
+ as infix operators, all the following questions are legitimate and must be
+ answered before obtaining a CIC term from its content level encoding
+ (Fig.~\ref{fig:inputphase}(b)):
+
+ \begin{enumerate}
+
+  \item Since \texttt{ln} is an unbound identifier, which CIC constants does it
+   represent? Many different theorems in the library may share its (rather
+   short) name \dots
+
+  \item Which kind of number (\IN, \IR, \dots) the \texttt{1} literal stand for?
+   Which encoding is used in CIC to represent it? E.g., assuming $1\in\IN$, is
+   it an unary or a binary encoding?
+
+  \item Which kind of equality the ``='' node represents? Is it Leibniz's
+   polymorhpic equality? Is it a decidable equality over \IN, \IR, \dots?
+
+ \end{enumerate}
+
+\end{example}
+
+In \MATITA, three \emph{sources of ambiguity} are admitted for content level
+terms: unbound identifiers, literal numbers, and operators. Each instance of
+ambiguity sources (ambiguous entity) occuring in a content level term is
+associated to a \emph{disambiguation domain}. Intuitively a disambiguation
+domain is a set of CIC terms which may be replaced for an ambiguous entity
+during disambiguation. Each item of the domain is said to be an
+\emph{interpretation} for the ambiguous entity.
+
+\emph{Unbound identifiers} (question 1) are ambiguous entities since the
+namespace of CIC objects is not flat and the same identifier may denote many
+ofthem. For example the short name \texttt{plus\_assoc} in the \HELM{} library
+is shared by three different theorems stating the associative property of
+different additions.  This kind of ambiguity is avoidable if the user is willing
+to use long names (in form of URIs in the \texttt{cic://} scheme) in the
+concrete syntax, with the obvious drawbacks of obtaining long and unreadable
+terms.
+
+Given an unbound identifier, the corresponding disambiguation domain is computed
+querying the library for all constants, inductive types, and inductive type
+constructors having it as their short name (see the \LOCATE{} query in
+Sect.~\ref{sec:metadata}).
+
+\emph{Literal numbers} (question 2) are ambiguous entities as well since
+different kinds of numbers can be encoded in CIC (\IN, \IR, \IZ, \dots) using
+different encodings. Considering the restricted example of natural numbers we
+can for instance encode them in CIC using inductive datatypes with a number of
+constructor equal to the encoding base plus 1, obtaining one encoding for each
+base.
+
+For each possible way of mapping a literal number to a CIC term, \MATITA{} is
+aware of a \emph{number intepretation function} which, when applied to the
+natural number denoted by the literal\footnote{at the moment only literal
+natural number are supported in the concrete syntax} returns a corresponding CIC
+term. The disambiguation domain for a given literal number is built applying to
+the literal all available number interpretation functions in turn.
+
+Number interpretation functions can at the moment only be defined in OCaml, but
+a mechanism to enable their definition directly in \MATITA{} is under
+developement.
+
+\emph{Operators} (question 3) are intuitively head of applications, as such they
+are always applied to a (possiblt empty) sequence of arguments. Their ambiguity
+is a need since it is often the case that some notation is used in an overloaded
+fashion to hide the use of different CIC constants which encodes similar
+concepts. For example, in the standard library of \MATITA{} the infix \texttt{+}
+notation is available building a binary \texttt{Op(+)} node, whose
+disambiguation domain may refer to different constants like the addition over
+natural numbers \URI{cic:/matita/nat/plus/plus.con} or that over real numbers of
+the \COQ{} standard library \URI{cic:/Coq/Reals/Rdefinitions/Rplus.con}.
+
+For each possible way of mapping an operator application to a CIC term,
+\MATITA{} knows an \emph{operator interpretation function} which, when applied
+to an operator and its arguments, returns a CIC term. The disambiguation domain
+for a given operator is built applying to the operator and its arguments all
+available operator interpretation functions in turn.
+
+Operator interpretation functions could be added using the
+\texttt{interpretation} statement. For example, among the first line of the
+script \texttt{matita/library/logic/equality.ma} from the \MATITA{} standard
+library we read:
+
+\begin{Verbatim}
+interpretation "leibnitz's equality"
+ 'eq x y =
+   (cic:/matita/logic/equality/eq.ind#xpointer(1/1) _ x y).
+\end{Verbatim}
+
+Evaluating it in \MATITA{} will add an operator interpretation function for the
+binary operator \texttt{eq} which expands to the CIC term on the right hand side
+of the statement. That CIC term can be written using only built-in concrete
+syntax, can contain no ambiguity source; still, it can refer to operator
+arguments bound on the left hand side and can contain implicit terms (denoted
+with \texttt{\_}) which will be expanded to fresh metavariables. The latter
+feature is used in the example above for the first argument of Leibniz's
+polymorhpic equality.
+
+\subsubsection{Disambiguation algorithm}
+
+A \emph{disambiguation algorithm} takes as input a content level term and return
+a fully determined CIC term. The key observation on which a disambiguation
+algorithm is based is that given a content level term with more than one sources
+of ambiguity, not all possible combination of interpretation lead to a typable
+CIC term. In the term of Ex.~\ref{ex:disambiguation} for instance the
+interpretation of \texttt{ln} as a function from \IR to \IR and the
+interpretation of \texttt{1} as the Peano number $1$ can't coexists. The notion
+of ``can't coexists'' in the disambiguation of \MATITA{} is defined on top of
+the \emph{refiner} for CIC terms described in~\cite{csc-phd}.
+
+Briefly, a refiner is a function whose input is an \emph{incomplete CIC term}
+$t_1$ --- i.e. a term where metavariables occur (Sect.~\ref{sec:metavariables}
+--- and whose output is either
+
+\begin{enumerate}
+ 
+ \item an incomplete CIC term $t_2$ where $t_2$ is a well-typed term obtained
+  assigning a type to each metavariable in $t_1$ (in case of dependent types,
+  instantiation of some of the metavariable occurring in $t_1$ may occur as
+  well);
+
+ \item $\epsilon$, meaning that no well-typed term could be obtained via
+  assignment of type to metavariable in $t_1$ and their instantiation;
+
+ \item $\bot$, meaning that the refiner is unable to decide whether of the two
+  cases above apply (refinement is semi-decidable).
+
+\end{enumerate}
+
+On top of a CIC refiner \MATITA{} implement an efficient disambiguation
+algorithm, which is outlined below. It takes as input a content level term $c$
+and proceeds as follows:
+
+\begin{enumerate}
+
+ \item Create disambiguation domains $\{D_i | i\in\mathit{Dom}(c)\}$, where
+  $\mathit{Dom}(c)$ is the set of ambiguity sources of $c$. Each $D_i$ is a set
+  of CIC terms and can be built as described above.
+
+ \item An \emph{interpretation} $\Phi$ for $c$ is a map associating an
+  incomplete CIC term to each ambiguity source of $c$. Given $c$ and one of its
+  interpretations an incomplete CIC term is fully determined replacing each
+  ambiguity source of $c$ with its mapping in the interpretation and injecting
+  the remaining structure of the content level in the CIC level (e.g. replacing
+  the application of the content level with the application of the CIC level).
+  This operation is informally called ``interpreting $c$ with $\Phi$''.
+  
+  Create an initial interpretation $\Phi_0 = \{\phi_i | \phi_i = \_,
+  i\in\mathit{Dom}(c)\}$, which associates a fresh metavariable to each source
+  of ambiguity of $c$. During this step, implicit terms are expanded to fresh
+  metavariables as well.
+
+ \item Refine the current incomplete CIC term (i.e.  the term obtained
+  interpreting $t$ with $\Phi_i$).
+
+  If the refinement succeeds or is undetermined the next interpretation
+  $\Phi_{i+1}$ will be created \emph{making a choice}, that is replacing in the
+  current interpretation one of the metavariable appearing in $\Phi_i$ with one
+  of the possible choice from the corresponding disambiguation domain. The
+  metavariable to be replaced is chosen following a preorder visit of the
+  ambiguous term. Then, step 3 is attempted again with the new interpretation.
+    
+  If the refinement fails the current set of choices cannot lead to a well-typed
+  term and backtracking of the current interpretation is attempted.
+    
+ \item Once an unambiguous correct interpretation is found (i.e. $\Phi_i$ does
+  no longer contain any placeholder), backtracking is attempted anyway to find
+  the other correct interpretations.
+
+ \item Let $n$ be the number of interpretations who survived step 4. If $n=0$
+  signal a type error. If $n=1$ we have found exactly one (incomplete) CIC term
+  corresponding to the content level term $c$, returns it as output of the
+  disambiguation phase. If $n>1$ we have found many different (incomplete) CIC
+  terms which can correspond to the content level term, let the user choose one
+  of the $n$ interpretations and returns the corresponding term.
+
+\end{enumerate}
+
+The efficiency of this algorithm resides in the fact that as soon as an
+incomplete CIC term is not typable, no further instantiation of the
+metavariables of the corresponding interpretation is attemped.
+% For example, during the disambiguation of the user input
+% \texttt{\TEXMACRO{forall} x. x*0 = 0}, an interpretation $\Phi_i$ is
+% encountered which associates $?$ to the instance of \texttt{0} on the right,
+% the real number $0$ to the instance of \texttt{0} on the left, and the
+% multiplication over natural numbers (\texttt{mult} for short) to \texttt{*}.
+% The refiner will fail, since \texttt{mult} require a natural argument, and no
+% further instantiation of the placeholder will be tried.
+
+Details of the disambiguation algorithm along with an analysis of its complexity
+can be found in~\cite{disambiguation}, where a formulation without backtracking
+(corresponding to the actual \MATITA{} implementation) is also presented.
+
+\subsubsection{Disambiguation stages}
+
+\section{Environment}
+
+\[
+\begin{array}{rcll}
+  V & ::= & & \mbox{(\bf values)} \\
+    &     & \verb+Term+~T & \mbox{(term)} \\
+    &  |  & \verb+String+~s & \mbox{(string)} \\
+    &  |  & \verb+Number+~n & \mbox{(number)} \\
+    &  |  & \verb+None+ & \mbox{(optional value)} \\
+    &  |  & \verb+Some+~V & \mbox{(optional value)} \\
+    &  |  & [V_1,\dots,V_n] & \mbox{(list value)} \\[2ex]
+\end{array}
+\]
+
+An environment is a map $\mathcal E : \mathit{Name} -> V$.
+
+\section{Level 1: concrete syntax}
+
+Rationale: while the layout schemata can occur in the concrete syntax
+used by user, the box schemata and the magic patterns can only occur
+when defining the notation. This is why the layout schemata are
+``escaped'' with a backslash, so that they cannot be confused with
+plain identifiers, wherease the others are not. Alternatively, they
+could be defined as keywords, but this would prevent their names to be
+used in different contexts.
+
  \[
  \ITO{\cdot}{{}} : P -> \mathit{Env} -> T
  \]
@@ -260,65 +754,6 @@ used in different contexts.
  
  \section{Level 2: abstract syntax}
  
-\newcommand{\NT}[1]{\langle\mathit{#1}\rangle}
-
-\begin{table}
-\caption{\label{tab:synl2} Concrete syntax of level 2 patterns.\strut}
-\hrule
-\[
-\begin{array}{@{}rcll@{}}
-  \NT{term} & ::= & & \mbox{\bf terms} \\
-    &     & x & \mbox{(identifier)} \\
-    &  |  & n & \mbox{(number)} \\
-    &  |  & s & \mbox{(symbol)} \\
-    &  |  & \mathrm{URI} & \mbox{(URI)} \\
-    &  |  & \verb+?+ & \mbox{(implicit)} \\
-    &  |  & \verb+%+ & \mbox{(placeholder)} \\
-    &  |  & \verb+?+n~[\verb+[+~\{\NT{subst}\}~\verb+]+] & \mbox{(meta)} \\
-    &  |  & \verb+let+~\NT{ptname}~\verb+\def+~\NT{term}~\verb+in+~\NT{term} \\
-    &  |  & \verb+let+~\NT{kind}~\NT{defs}~\verb+in+~\NT{term} \\
-    &  |  & \NT{binder}~\{\NT{ptnames}\}^{+}~\verb+.+~\NT{term} \\
-    &  |  & \NT{term}~\NT{term} & \mbox{(application)} \\
-    &  |  & \verb+Prop+ \mid \verb+Set+ \mid \verb+Type+ \mid \verb+CProp+ & \mbox{(sort)} \\
-    &  |  & [\verb+[+~\NT{term}~\verb+]+]~\verb+match+~\NT{term}~\verb+with [+~[\NT{rule}~\{\verb+|+~\NT{rule}\}]~\verb+]+ & \mbox{(pattern match)} \\
-    &  |  & \verb+(+~\NT{term}~\verb+:+~\NT{term}~\verb+)+ & \mbox{(cast)} \\
-    &  |  & \verb+(+~\NT{term}~\verb+)+ \\
-    &  |  & \BLOB(\NT{meta},\dots,\NT{meta}) & \mbox{(meta blob)} \\
-  \NT{defs}  & ::= & & \mbox{\bf mutual definitions} \\
-    &     & \NT{fun}~\{\verb+and+~\NT{fun}\} \\
-  \NT{fun} & ::= & & \mbox{\bf functions} \\
-    &     & \NT{arg}~\{\NT{ptnames}\}^{+}~[\verb+on+~x]~\verb+\def+~\NT{term} \\
-  \NT{binder} & ::= & & \mbox{\bf binders} \\
-    &     & \verb+\Pi+ \mid \verb+\exists+ \mid \verb+\forall+ \mid \verb+\lambda+ \\
-  \NT{arg} & ::= & & \mbox{\bf single argument} \\
-    &     & \verb+_+ \mid x \mid \BLOB(\NT{meta},\dots,\NT{meta}) \\
-  \NT{ptname} & ::= & & \mbox{\bf possibly typed name} \\
-    &     & \NT{arg} \\
-    &  |  & \verb+(+~\NT{arg}~\verb+:+~\NT{term}~\verb+)+ \\
-  \NT{ptnames} & ::= & & \mbox{\bf bound variables} \\
-    &     & \NT{arg} \\
-    &  |  & \verb+(+~\NT{arg}~\{\verb+,+~\NT{arg}\}~[\verb+:+~\NT{term}]~\verb+)+ \\
-  \NT{kind} & ::= & & \mbox{\bf induction kind} \\
-    &     & \verb+rec+ \mid \verb+corec+ \\
-  \NT{rule} & ::= & & \mbox{\bf rules} \\
-    &     & x~\{\NT{ptname}\}~\verb+\Rightarrow+~\NT{term} \\[10ex]
-
-  \NT{meta} & ::= & & \mbox{\bf meta} \\
-    &     & \BLOB(\NT{term},\dots,\NT{term}) & \mbox{(term blob)} \\
-    &  |  & [\verb+term+]~x \\
-    &  |  & \verb+number+~x \\
-    &  |  & \verb+ident+~x \\
-    &  |  & \verb+fresh+~x \\
-    &  |  & \verb+anonymous+ \\
-    &  |  & \verb+fold+~[\verb+left+\mid\verb+right+]~\NT{meta}~\verb+rec+~x~\NT{meta} \\
-    &  |  & \verb+default+~\NT{meta}~\NT{meta} \\
-    &  |  & \verb+if+~\NT{meta}~\verb+then+~\NT{meta}~\verb+else+~\NT{meta} \\
-    &  |  & \verb+fail+ 
-\end{array}
-\]
-\hrule
-\end{table}
-
  \begin{table}
  \caption{\label{tab:wfl2} Well-formedness rules for level 2 patterns.\strut}
  \hrule
@@ -786,3 +1221,5 @@ is well-formed if:
    context $D$, that is $P_2 :: D$ and $P_3 :: D$.
  \end{itemize}
  
+\section{Semantic selection}
+