X-Git-Url: http://matita.cs.unibo.it/gitweb/?a=blobdiff_plain;f=helm%2Focaml%2Fcic_notation%2Fdoc%2Fbody.tex;h=fef547e1d0da321fb58abcb1b5a043937e14f5e2;hb=f59550b5a9cdddbb348697201fae7d736d6b96c5;hp=f1ce73577174befcea58cf010f3784b735899b16;hpb=25be0a2deae2c84d626cf77aeb41c39c959ab81a;p=helm.git diff --git a/helm/ocaml/cic_notation/doc/body.tex b/helm/ocaml/cic_notation/doc/body.tex index f1ce73577..fef547e1d 100644 --- a/helm/ocaml/cic_notation/doc/body.tex +++ b/helm/ocaml/cic_notation/doc/body.tex @@ -1,21 +1,209 @@ -\section{Environment} +\section{Introduction} -\[ -\begin{array}{rcll} - V & ::= & & \mbox{(\bf values)} \\ - & & \verb+Term+~T & \mbox{(term)} \\ - & | & \verb+String+~s & \mbox{(string)} \\ - & | & \verb+Number+~n & \mbox{(number)} \\ - & | & \verb+None+ & \mbox{(optional value)} \\ - & | & \verb+Some+~V & \mbox{(optional value)} \\ - & | & [V_1,\dots,V_n] & \mbox{(list value)} \\[2ex] -\end{array} -\] +Mathematical notation plays a fundamental role in mathematical practice: it +helps expressing in a concise symbolic fashion concepts of arbitrary complexity. +Its use in proof assistants like \MATITA{} is no exception. Formal mathematics +indeed often impose to encode mathematical concepts at a very high level of +details (e.g. Peano numbers, implicit arguments) having a restricted toolbox of +syntactic constructions in the calculus. -An environment is a map $\mathcal E : \mathit{Name} -> V$. +Consider for example one of the point reached while proving the distributivity +of times over minus on natural numbers included in the \MATITA{} standards +library. (Part of) the reached sequent can be seen in \MATITA{} both using the +notation for various arithmetical and relational operator or without using it. +The sequent rendered without using notation looks as follows: -\section{Level 1: concrete syntax} +\sequent{ +\mathtt{H}: \mathtt{le} z y\\ +\mathtt{Hcut}: \mathtt{eq} \mathtt{nat} (\mathtt{plus} (\mathtt{times} x (\mathtt{minus} +y z)) (\mathtt{times} x z))\\ +(\mathtt{plus} (\mathtt{minus} (\mathtt{times} x y) (\mathtt{times} x z)) +(\mathtt{times} x z))}{ +\mathtt{eq} \mathtt{nat} (\mathtt{times} x (\mathtt{minus} y z)) (\mathtt{minus} +(\mathtt{times} x y) (\mathtt{times} x z))} + +while the corresponding sequent rendered with notation enabled looks: + +\sequent{ +H: z\leq y\\ +Hcut: x*(y-z)+x*z=x*y-x*z+x*z}{ +x*(y-z)=x*y-x*z} + +The latter representation is evidently more readable than the former helping +users both in concentrating on the key aspects of the proof (namely on choosing +the right strategy to proceed in the proof) and in reducing the amount of input +that need to be provided to the system when term input is required (assuming the +exists a correspondence among the rendered output and the textual input syntax +used by the user, as it happens in \MATITA). + +In this section we present the \emph{extensible notation} mechanism implemented +in \MATITA. Its role may be looked at from two different point of view: the term +input phase and the term output --- or rendering --- phase. We arbitrarly +decided to call the former view ``from the left'' and the latter ``from the +right''. Looking from the point of view of the input phase it offers a mechanism +of dynamic extension of the term grammar enabling the user to define fancy +mathematical notations. Looking from the point of view of rendering it enable +the reconstruction of such notations from CIC term and its rendering to various +presentation languages (at the time of writing supported languages are MathML +Presentation and the \MATITA{} concrete syntax for terms). + +If you're wondering why the notation mechanisms need to be ``extensible'', the +answer lays in how notation is used in the development of formal mathematics +with proof assistants. When doing ordinary (i.e. non automatically checkable by +the mean of a proof checker) mathematics, notation is often confused with the +mathematical objects being defined. ``+'' may be thought as \emph{the} addition +(and is often termed as such in mathematical textbooks!), but is rather the +notation for one particolar kind of addition which may possibly be used in an +overloaded fashion elsewhere. When doing formal mathematics the difference is +tangible and users has to deal separately with all the actions we skimmed +through: + +\begin{enumerate} + + \item definition of mathematical objects (e.g. addition over Peano numbers + using the primitive recursion scheme); + + \item definition of new mathematical notation (e.g. infix use of the $+$ symbol + as in $x + 3$); + + \item (incremental) definition of the meanings of a given notation (e.g. the + use of the notation of (2) above for denoting the addition of (1)). + +\end{enumerate} + +Since all the points above are part of everyday life of proof assistants users +we know that mathematical notation in the system will change and we can't +provide a ``one-size fits all'' solution as is done for instance in mainstream +programming languages mathematical notation. For this reason \MATITA{} supports +all the above actions in a coherent manner in both term input and output. + +\section{Looking from the left: term input} + +\subsubsection{\MATITA{} input phase} + + \begin{table} + \caption{\label{tab:termsyn} Concrete syntax of CIC terms: built-in + notation\strut} + \hrule + \[ + \begin{array}{@{}rcll@{}} + \NT{term} & ::= & & \mbox{\bf terms} \\ + & & x & \mbox{(identifier)} \\ + & | & n & \mbox{(number)} \\ + & | & s & \mbox{(symbol)} \\ + & | & \mathrm{URI} & \mbox{(URI)} \\ + & | & \verb+_+ & \mbox{(implicit)} \\ + & | & \verb+?+n~[\verb+[+~\{\NT{subst}\}~\verb+]+] & \mbox{(meta)} \\ + & | & \verb+let+~\NT{ptname}~\verb+\def+~\NT{term}~\verb+in+~\NT{term} \\ + & | & \verb+let+~\NT{kind}~\NT{defs}~\verb+in+~\NT{term} \\ + & | & \NT{binder}~\{\NT{ptnames}\}^{+}~\verb+.+~\NT{term} \\ + & | & \NT{term}~\NT{term} & \mbox{(application)} \\ + & | & \verb+Prop+ \mid \verb+Set+ \mid \verb+Type+ \mid \verb+CProp+ & \mbox{(sort)} \\ + & | & \verb+match+~\NT{term}~ & \mbox{(pattern matching)} \\ + & & ~ ~ [\verb+[+~\verb+in+~x~\verb+]+] + ~ [\verb+[+~\verb+return+~\NT{term}~\verb+]+] \\ + & & ~ ~ \verb+with [+~[\NT{rule}~\{\verb+|+~\NT{rule}\}]~\verb+]+ & \\ + & | & \verb+(+~\NT{term}~\verb+:+~\NT{term}~\verb+)+ & \mbox{(cast)} \\ + & | & \verb+(+~\NT{term}~\verb+)+ \\ + \NT{defs} & ::= & & \mbox{\bf mutual definitions} \\ + & & \NT{fun}~\{\verb+and+~\NT{fun}\} \\ + \NT{fun} & ::= & & \mbox{\bf functions} \\ + & & \NT{arg}~\{\NT{ptnames}\}^{+}~[\verb+on+~x]~\verb+\def+~\NT{term} \\ + \NT{binder} & ::= & & \mbox{\bf binders} \\ + & & \verb+\forall+ \mid \verb+\lambda+ \\ + \NT{arg} & ::= & & \mbox{\bf single argument} \\ + & & \verb+_+ \mid x \\ + \NT{ptname} & ::= & & \mbox{\bf possibly typed name} \\ + & & \NT{arg} \\ + & | & \verb+(+~\NT{arg}~\verb+:+~\NT{term}~\verb+)+ \\ + \NT{ptnames} & ::= & & \mbox{\bf bound variables} \\ + & & \NT{arg} \\ + & | & \verb+(+~\NT{arg}~\{\verb+,+~\NT{arg}\}~[\verb+:+~\NT{term}]~\verb+)+ \\ + \NT{kind} & ::= & & \mbox{\bf induction kind} \\ + & & \verb+rec+ \mid \verb+corec+ \\ + \NT{rule} & ::= & & \mbox{\bf rules} \\ + & & x~\{\NT{ptname}\}~\verb+\Rightarrow+~\NT{term} + \end{array} + \] + \hrule + \end{table} + +The primary form of user interaction employed by \MATITA{} is textual script +editing: the user modifies it and evaluate step by step its composing +\emph{statements}. Examples of statements are inductive type definitions, +theorem declarations, LCF-style tacticals, and macros (e.g. \texttt{Check} can +be used to ask the system to refine a given term and pretty print the result). +Since many statements refer to terms of the underlying calculus, \MATITA{} needs +a concrete syntax able to encode terms of the Calculus of Inductive +Constructions. + +Two of the requirements in the design of such a syntax are apparently in +contrast: + +\begin{enumerate} + + \item the syntax should be as close as possible to common mathematical practice + and implement widespread mathematical notations; + + \item each term described by the syntax should be non-ambiguous meaning that it + should exists a function which associates to it a CIC term. + +\end{enumerate} + +These two requirements are addressed in \MATITA{} by the mean of two mechanisms +which work together: \emph{term disambiguation} and \emph{extensible notation}. +Their interaction is visible in the architecture of the \MATITA{} input phase, +depicted in Fig.~\ref{fig:inputphase}. The architecture is articulated as a +pipeline of three levels: the concrete syntax level (level 0) is the one the +user has to deal with when inserting CIC terms; the abstract syntax level (level +2) is an internal representation which intuitively encodes mathematical formulae +at the content level~\cite{adams}\cite{mkm-structure}; the last level is that of +CIC terms. + +\begin{figure}[ht] + \begin{center} + \includegraphics[width=0.9\textwidth]{input_phase} + \caption{\MATITA{} input phase} + \end{center} + \label{fig:inputphase} +\end{figure} + +Requirement (1) is addressed by a built-in concrete syntax for terms, described +in Tab.~\ref{tab:termsyn}, and the extensible notation mechanisms which offers a +way for extending available mathematical notations and providing a parser for +the extended notation. Requirement (2) is addressed by the conjunct action of +that parsing function and disambiguation which provides a function from content +level terms to CIC terms. + +\subsubsection{From concrete syntax to content level} + +Content level terms are instances of what are commonly referred as Abstract +Syntax Trees (ASTs) in compilers literature. In this respect the mapping from +concrete syntax fo content level is nothing more than the pipelined application +of a lexer and a parser to the characters that form terms at the concrete syntax +level. + +The plus offered by the notation mechanisms is the ability to dinamically extend +the parsing rules which build abstract syntax tree from stream of lexer tokens. +For example, in the standard library of \MATITA{} we found the following +statements which define the notation used for the ``+'' infix operator. + +\begin{example} +\begin{Verbatim} + notation "a + b" + left associative with precedence 50 + for @{ 'plus $a $b }. +\end{Verbatim} +\end{example} + +The meaning of such a statement is to declare a bidirectional +mapping\footnote{in this section we only deal with the left to right part of the +mapping, but it is actually bidirectional} between a concrete syntax pattern +(the part of the statement inside double quotes) and a content level pattern +(the part of the statement which follows \texttt{for}). The syntax of concrete +syntax patterns and content level patterns can be found in Tab.~\ref{tab:l1c} +and Tab.~\ref{tab:l2c} respectively. \begin{table} \caption{\label{tab:l1c} Concrete syntax of level 1 patterns.\strut} @@ -52,14 +240,6 @@ An environment is a map $\mathcal E : \mathit{Name} -> V$. \hrule \end{table} -Rationale: while the layout schemata can occur in the concrete syntax -used by user, the box schemata and the magic patterns can only occur -when defining the notation. This is why the layout schemata are -``escaped'' with a backslash, so that they cannot be confused with -plain identifiers, wherease the others are not. Alternatively, they -could be defined as keywords, but this would prevent their names to be -used in different contexts. - \begin{table} \caption{\label{tab:l1a} Abstract syntax of level 1 terms and patterns.\strut} \hrule @@ -96,6 +276,320 @@ used in different contexts. \hrule \end{table} +\begin{table} +\caption{\label{tab:synl2} Concrete syntax of level 2 patterns.\strut} +\hrule +\[ +\begin{array}{@{}rcll@{}} + \NT{term} & ::= & & \mbox{\bf terms} \\ + & & x & \mbox{(identifier)} \\ + & | & n & \mbox{(number)} \\ + & | & s & \mbox{(symbol)} \\ + & | & \mathrm{URI} & \mbox{(URI)} \\ + & | & \verb+?+ & \mbox{(implicit)} \\ + & | & \verb+%+ & \mbox{(placeholder)} \\ + & | & \verb+?+n~[\verb+[+~\{\NT{subst}\}~\verb+]+] & \mbox{(meta)} \\ + & | & \verb+let+~\NT{ptname}~\verb+\def+~\NT{term}~\verb+in+~\NT{term} \\ + & | & \verb+let+~\NT{kind}~\NT{defs}~\verb+in+~\NT{term} \\ + & | & \NT{binder}~\{\NT{ptnames}\}^{+}~\verb+.+~\NT{term} \\ + & | & \NT{term}~\NT{term} & \mbox{(application)} \\ + & | & \verb+Prop+ \mid \verb+Set+ \mid \verb+Type+ \mid \verb+CProp+ & \mbox{(sort)} \\ + & | & [\verb+[+~\NT{term}~\verb+]+]~\verb+match+~\NT{term}~\verb+with [+~[\NT{rule}~\{\verb+|+~\NT{rule}\}]~\verb+]+ & \mbox{(pattern match)} \\ + & | & \verb+(+~\NT{term}~\verb+:+~\NT{term}~\verb+)+ & \mbox{(cast)} \\ + & | & \verb+(+~\NT{term}~\verb+)+ \\ + & | & \BLOB(\NT{meta},\dots,\NT{meta}) & \mbox{(meta blob)} \\ + \NT{defs} & ::= & & \mbox{\bf mutual definitions} \\ + & & \NT{fun}~\{\verb+and+~\NT{fun}\} \\ + \NT{fun} & ::= & & \mbox{\bf functions} \\ + & & \NT{arg}~\{\NT{ptnames}\}^{+}~[\verb+on+~x]~\verb+\def+~\NT{term} \\ + \NT{binder} & ::= & & \mbox{\bf binders} \\ + & & \verb+\Pi+ \mid \verb+\exists+ \mid \verb+\forall+ \mid \verb+\lambda+ \\ + \NT{arg} & ::= & & \mbox{\bf single argument} \\ + & & \verb+_+ \mid x \mid \BLOB(\NT{meta},\dots,\NT{meta}) \\ + \NT{ptname} & ::= & & \mbox{\bf possibly typed name} \\ + & & \NT{arg} \\ + & | & \verb+(+~\NT{arg}~\verb+:+~\NT{term}~\verb+)+ \\ + \NT{ptnames} & ::= & & \mbox{\bf bound variables} \\ + & & \NT{arg} \\ + & | & \verb+(+~\NT{arg}~\{\verb+,+~\NT{arg}\}~[\verb+:+~\NT{term}]~\verb+)+ \\ + \NT{kind} & ::= & & \mbox{\bf induction kind} \\ + & & \verb+rec+ \mid \verb+corec+ \\ + \NT{rule} & ::= & & \mbox{\bf rules} \\ + & & x~\{\NT{ptname}\}~\verb+\Rightarrow+~\NT{term} \\[10ex] + + \NT{meta} & ::= & & \mbox{\bf meta} \\ + & & \BLOB(\NT{term},\dots,\NT{term}) & \mbox{(term blob)} \\ + & | & [\verb+term+]~x \\ + & | & \verb+number+~x \\ + & | & \verb+ident+~x \\ + & | & \verb+fresh+~x \\ + & | & \verb+anonymous+ \\ + & | & \verb+fold+~[\verb+left+\mid\verb+right+]~\NT{meta}~\verb+rec+~x~\NT{meta} \\ + & | & \verb+default+~\NT{meta}~\NT{meta} \\ + & | & \verb+if+~\NT{meta}~\verb+then+~\NT{meta}~\verb+else+~\NT{meta} \\ + & | & \verb+fail+ +\end{array} +\] +\hrule +\end{table} + +Each time a \texttt{notation} statement is evaluated by \MATITA{} a new parsing +rule, extracted from the concrete syntax pattern, is added to the term parser +and a semantic action which build a content level term, extracted from the +content level pattern, is associated to it. We will now describe in turn what +can be part of a concrete syntax pattern and what can be part of a content level +pattern. + +Concrete syntax patterns, whose abstract syntax can additionally be found in +Tab.~\ref{tab:l1a} can be made of several components. The most basic of which +are \emph{literal symbols} (like the ``+'' in the example above) and \emph{term +variables} (like ``a'' and ``b''). During the extraction of parsing rules +literal symbols are mapped to productions expecting those symbols verbatim as +input and term variables as production expecting other terms (instances of the +same parsing rule we are extending, possibly with different precedence and/or +associativity). + +\ldots + +\subsubsection{From content level to CIC} + +Responsible of mapping content level terms to CIC terms is the disambiguation +algorithm implemented in \MATITA. Since it has already been described +elsewhere~\cite{disambiguation} we wont enter in too much details here. We only +give some highlights of its fundamental concepts. + +\subsubsection{Sources of ambiguity} + +The translation from content level terms to CIC terms is not straightforward +because some nodes of the content encoding admit more that one CIC encoding, +invalidating requirement (2). + +\begin{example} + \label{ex:disambiguation} + + Consider the term at the concrete syntax level \texttt{\TEXMACRO{forall} x. x + + ln 1 = x} of Fig.~\ref{fig:inputphase}(a), it can be the type of a lemma the + user may want to prove. Assuming that both \texttt{+} and \texttt{=} are parsed + as infix operators, all the following questions are legitimate and must be + answered before obtaining a CIC term from its content level encoding + (Fig.~\ref{fig:inputphase}(b)): + + \begin{enumerate} + + \item Since \texttt{ln} is an unbound identifier, which CIC constants does it + represent? Many different theorems in the library may share its (rather + short) name \dots + + \item Which kind of number (\IN, \IR, \dots) the \texttt{1} literal stand for? + Which encoding is used in CIC to represent it? E.g., assuming $1\in\IN$, is + it an unary or a binary encoding? + + \item Which kind of equality the ``='' node represents? Is it Leibniz's + polymorhpic equality? Is it a decidable equality over \IN, \IR, \dots? + + \end{enumerate} + +\end{example} + +In \MATITA, three \emph{sources of ambiguity} are admitted for content level +terms: unbound identifiers, literal numbers, and operators. Each instance of +ambiguity sources (ambiguous entity) occuring in a content level term is +associated to a \emph{disambiguation domain}. Intuitively a disambiguation +domain is a set of CIC terms which may be replaced for an ambiguous entity +during disambiguation. Each item of the domain is said to be an +\emph{interpretation} for the ambiguous entity. + +\emph{Unbound identifiers} (question 1) are ambiguous entities since the +namespace of CIC objects is not flat and the same identifier may denote many +ofthem. For example the short name \texttt{plus\_assoc} in the \HELM{} library +is shared by three different theorems stating the associative property of +different additions. This kind of ambiguity is avoidable if the user is willing +to use long names (in form of URIs in the \texttt{cic://} scheme) in the +concrete syntax, with the obvious drawbacks of obtaining long and unreadable +terms. + +Given an unbound identifier, the corresponding disambiguation domain is computed +querying the library for all constants, inductive types, and inductive type +constructors having it as their short name (see the \LOCATE{} query in +Sect.~\ref{sec:metadata}). + +\emph{Literal numbers} (question 2) are ambiguous entities as well since +different kinds of numbers can be encoded in CIC (\IN, \IR, \IZ, \dots) using +different encodings. Considering the restricted example of natural numbers we +can for instance encode them in CIC using inductive datatypes with a number of +constructor equal to the encoding base plus 1, obtaining one encoding for each +base. + +For each possible way of mapping a literal number to a CIC term, \MATITA{} is +aware of a \emph{number intepretation function} which, when applied to the +natural number denoted by the literal\footnote{at the moment only literal +natural number are supported in the concrete syntax} returns a corresponding CIC +term. The disambiguation domain for a given literal number is built applying to +the literal all available number interpretation functions in turn. + +Number interpretation functions can at the moment only be defined in OCaml, but +a mechanism to enable their definition directly in \MATITA{} is under +developement. + +\emph{Operators} (question 3) are intuitively head of applications, as such they +are always applied to a (possiblt empty) sequence of arguments. Their ambiguity +is a need since it is often the case that some notation is used in an overloaded +fashion to hide the use of different CIC constants which encodes similar +concepts. For example, in the standard library of \MATITA{} the infix \texttt{+} +notation is available building a binary \texttt{Op(+)} node, whose +disambiguation domain may refer to different constants like the addition over +natural numbers \URI{cic:/matita/nat/plus/plus.con} or that over real numbers of +the \COQ{} standard library \URI{cic:/Coq/Reals/Rdefinitions/Rplus.con}. + +For each possible way of mapping an operator application to a CIC term, +\MATITA{} knows an \emph{operator interpretation function} which, when applied +to an operator and its arguments, returns a CIC term. The disambiguation domain +for a given operator is built applying to the operator and its arguments all +available operator interpretation functions in turn. + +Operator interpretation functions could be added using the +\texttt{interpretation} statement. For example, among the first line of the +script \texttt{matita/library/logic/equality.ma} from the \MATITA{} standard +library we read: + +\begin{Verbatim} +interpretation "leibnitz's equality" + 'eq x y = + (cic:/matita/logic/equality/eq.ind#xpointer(1/1) _ x y). +\end{Verbatim} + +Evaluating it in \MATITA{} will add an operator interpretation function for the +binary operator \texttt{eq} which expands to the CIC term on the right hand side +of the statement. That CIC term can be written using only built-in concrete +syntax, can contain no ambiguity source; still, it can refer to operator +arguments bound on the left hand side and can contain implicit terms (denoted +with \texttt{\_}) which will be expanded to fresh metavariables. The latter +feature is used in the example above for the first argument of Leibniz's +polymorhpic equality. + +\subsubsection{Disambiguation algorithm} + +A \emph{disambiguation algorithm} takes as input a content level term and return +a fully determined CIC term. The key observation on which a disambiguation +algorithm is based is that given a content level term with more than one sources +of ambiguity, not all possible combination of interpretation lead to a typable +CIC term. In the term of Ex.~\ref{ex:disambiguation} for instance the +interpretation of \texttt{ln} as a function from \IR to \IR and the +interpretation of \texttt{1} as the Peano number $1$ can't coexists. The notion +of ``can't coexists'' in the disambiguation of \MATITA{} is defined on top of +the \emph{refiner} for CIC terms described in~\cite{csc-phd}. + +Briefly, a refiner is a function whose input is an \emph{incomplete CIC term} +$t_1$ --- i.e. a term where metavariables occur (Sect.~\ref{sec:metavariables} +--- and whose output is either + +\begin{enumerate} + + \item an incomplete CIC term $t_2$ where $t_2$ is a well-typed term obtained + assigning a type to each metavariable in $t_1$ (in case of dependent types, + instantiation of some of the metavariable occurring in $t_1$ may occur as + well); + + \item $\epsilon$, meaning that no well-typed term could be obtained via + assignment of type to metavariable in $t_1$ and their instantiation; + + \item $\bot$, meaning that the refiner is unable to decide whether of the two + cases above apply (refinement is semi-decidable). + +\end{enumerate} + +On top of a CIC refiner \MATITA{} implement an efficient disambiguation +algorithm, which is outlined below. It takes as input a content level term $c$ +and proceeds as follows: + +\begin{enumerate} + + \item Create disambiguation domains $\{D_i | i\in\mathit{Dom}(c)\}$, where + $\mathit{Dom}(c)$ is the set of ambiguity sources of $c$. Each $D_i$ is a set + of CIC terms and can be built as described above. + + \item An \emph{interpretation} $\Phi$ for $c$ is a map associating an + incomplete CIC term to each ambiguity source of $c$. Given $c$ and one of its + interpretations an incomplete CIC term is fully determined replacing each + ambiguity source of $c$ with its mapping in the interpretation and injecting + the remaining structure of the content level in the CIC level (e.g. replacing + the application of the content level with the application of the CIC level). + This operation is informally called ``interpreting $c$ with $\Phi$''. + + Create an initial interpretation $\Phi_0 = \{\phi_i | \phi_i = \_, + i\in\mathit{Dom}(c)\}$, which associates a fresh metavariable to each source + of ambiguity of $c$. During this step, implicit terms are expanded to fresh + metavariables as well. + + \item Refine the current incomplete CIC term (i.e. the term obtained + interpreting $t$ with $\Phi_i$). + + If the refinement succeeds or is undetermined the next interpretation + $\Phi_{i+1}$ will be created \emph{making a choice}, that is replacing in the + current interpretation one of the metavariable appearing in $\Phi_i$ with one + of the possible choice from the corresponding disambiguation domain. The + metavariable to be replaced is chosen following a preorder visit of the + ambiguous term. Then, step 3 is attempted again with the new interpretation. + + If the refinement fails the current set of choices cannot lead to a well-typed + term and backtracking of the current interpretation is attempted. + + \item Once an unambiguous correct interpretation is found (i.e. $\Phi_i$ does + no longer contain any placeholder), backtracking is attempted anyway to find + the other correct interpretations. + + \item Let $n$ be the number of interpretations who survived step 4. If $n=0$ + signal a type error. If $n=1$ we have found exactly one (incomplete) CIC term + corresponding to the content level term $c$, returns it as output of the + disambiguation phase. If $n>1$ we have found many different (incomplete) CIC + terms which can correspond to the content level term, let the user choose one + of the $n$ interpretations and returns the corresponding term. + +\end{enumerate} + +The efficiency of this algorithm resides in the fact that as soon as an +incomplete CIC term is not typable, no further instantiation of the +metavariables of the corresponding interpretation is attemped. +% For example, during the disambiguation of the user input +% \texttt{\TEXMACRO{forall} x. x*0 = 0}, an interpretation $\Phi_i$ is +% encountered which associates $?$ to the instance of \texttt{0} on the right, +% the real number $0$ to the instance of \texttt{0} on the left, and the +% multiplication over natural numbers (\texttt{mult} for short) to \texttt{*}. +% The refiner will fail, since \texttt{mult} require a natural argument, and no +% further instantiation of the placeholder will be tried. + +Details of the disambiguation algorithm along with an analysis of its complexity +can be found in~\cite{disambiguation}, where a formulation without backtracking +(corresponding to the actual \MATITA{} implementation) is also presented. + +\subsubsection{Disambiguation stages} + +\section{Environment} + +\[ +\begin{array}{rcll} + V & ::= & & \mbox{(\bf values)} \\ + & & \verb+Term+~T & \mbox{(term)} \\ + & | & \verb+String+~s & \mbox{(string)} \\ + & | & \verb+Number+~n & \mbox{(number)} \\ + & | & \verb+None+ & \mbox{(optional value)} \\ + & | & \verb+Some+~V & \mbox{(optional value)} \\ + & | & [V_1,\dots,V_n] & \mbox{(list value)} \\[2ex] +\end{array} +\] + +An environment is a map $\mathcal E : \mathit{Name} -> V$. + +\section{Level 1: concrete syntax} + +Rationale: while the layout schemata can occur in the concrete syntax +used by user, the box schemata and the magic patterns can only occur +when defining the notation. This is why the layout schemata are +``escaped'' with a backslash, so that they cannot be confused with +plain identifiers, wherease the others are not. Alternatively, they +could be defined as keywords, but this would prevent their names to be +used in different contexts. + \[ \ITO{\cdot}{{}} : P -> \mathit{Env} -> T \] @@ -260,65 +754,6 @@ used in different contexts. \section{Level 2: abstract syntax} -\newcommand{\NT}[1]{\langle\mathit{#1}\rangle} - -\begin{table} -\caption{\label{tab:synl2} Concrete syntax of level 2 patterns.\strut} -\hrule -\[ -\begin{array}{@{}rcll@{}} - \NT{term} & ::= & & \mbox{\bf terms} \\ - & & x & \mbox{(identifier)} \\ - & | & n & \mbox{(number)} \\ - & | & s & \mbox{(symbol)} \\ - & | & \mathrm{URI} & \mbox{(URI)} \\ - & | & \verb+?+ & \mbox{(implicit)} \\ - & | & \verb+%+ & \mbox{(placeholder)} \\ - & | & \verb+?+n~[\verb+[+~\{\NT{subst}\}~\verb+]+] & \mbox{(meta)} \\ - & | & \verb+let+~\NT{ptname}~\verb+\def+~\NT{term}~\verb+in+~\NT{term} \\ - & | & \verb+let+~\NT{kind}~\NT{defs}~\verb+in+~\NT{term} \\ - & | & \NT{binder}~\{\NT{ptnames}\}^{+}~\verb+.+~\NT{term} \\ - & | & \NT{term}~\NT{term} & \mbox{(application)} \\ - & | & \verb+Prop+ \mid \verb+Set+ \mid \verb+Type+ \mid \verb+CProp+ & \mbox{(sort)} \\ - & | & [\verb+[+~\NT{term}~\verb+]+]~\verb+match+~\NT{term}~\verb+with [+~[\NT{rule}~\{\verb+|+~\NT{rule}\}]~\verb+]+ & \mbox{(pattern match)} \\ - & | & \verb+(+~\NT{term}~\verb+:+~\NT{term}~\verb+)+ & \mbox{(cast)} \\ - & | & \verb+(+~\NT{term}~\verb+)+ \\ - & | & \BLOB(\NT{meta},\dots,\NT{meta}) & \mbox{(meta blob)} \\ - \NT{defs} & ::= & & \mbox{\bf mutual definitions} \\ - & & \NT{fun}~\{\verb+and+~\NT{fun}\} \\ - \NT{fun} & ::= & & \mbox{\bf functions} \\ - & & \NT{arg}~\{\NT{ptnames}\}^{+}~[\verb+on+~x]~\verb+\def+~\NT{term} \\ - \NT{binder} & ::= & & \mbox{\bf binders} \\ - & & \verb+\Pi+ \mid \verb+\exists+ \mid \verb+\forall+ \mid \verb+\lambda+ \\ - \NT{arg} & ::= & & \mbox{\bf single argument} \\ - & & \verb+_+ \mid x \mid \BLOB(\NT{meta},\dots,\NT{meta}) \\ - \NT{ptname} & ::= & & \mbox{\bf possibly typed name} \\ - & & \NT{arg} \\ - & | & \verb+(+~\NT{arg}~\verb+:+~\NT{term}~\verb+)+ \\ - \NT{ptnames} & ::= & & \mbox{\bf bound variables} \\ - & & \NT{arg} \\ - & | & \verb+(+~\NT{arg}~\{\verb+,+~\NT{arg}\}~[\verb+:+~\NT{term}]~\verb+)+ \\ - \NT{kind} & ::= & & \mbox{\bf induction kind} \\ - & & \verb+rec+ \mid \verb+corec+ \\ - \NT{rule} & ::= & & \mbox{\bf rules} \\ - & & x~\{\NT{ptname}\}~\verb+\Rightarrow+~\NT{term} \\[10ex] - - \NT{meta} & ::= & & \mbox{\bf meta} \\ - & & \BLOB(\NT{term},\dots,\NT{term}) & \mbox{(term blob)} \\ - & | & [\verb+term+]~x \\ - & | & \verb+number+~x \\ - & | & \verb+ident+~x \\ - & | & \verb+fresh+~x \\ - & | & \verb+anonymous+ \\ - & | & \verb+fold+~[\verb+left+\mid\verb+right+]~\NT{meta}~\verb+rec+~x~\NT{meta} \\ - & | & \verb+default+~\NT{meta}~\NT{meta} \\ - & | & \verb+if+~\NT{meta}~\verb+then+~\NT{meta}~\verb+else+~\NT{meta} \\ - & | & \verb+fail+ -\end{array} -\] -\hrule -\end{table} - \begin{table} \caption{\label{tab:wfl2} Well-formedness rules for level 2 patterns.\strut} \hrule @@ -786,3 +1221,5 @@ is well-formed if: context $D$, that is $P_2 :: D$ and $P_3 :: D$. \end{itemize} +\section{Semantic selection} +