From 1c4f274260deeca43260e2591cf76992508fa56e Mon Sep 17 00:00:00 2001 From: Stefano Zacchiroli Date: Mon, 30 Jan 2006 11:52:27 +0000 Subject: [PATCH] uniformed disambiguation part --- helm/papers/matita/matita2.tex | 209 ++++++++++++++++----------------- 1 file changed, 99 insertions(+), 110 deletions(-) diff --git a/helm/papers/matita/matita2.tex b/helm/papers/matita/matita2.tex index 5c56eccf3..e74021c5b 100644 --- a/helm/papers/matita/matita2.tex +++ b/helm/papers/matita/matita2.tex @@ -1,4 +1,4 @@ -\documentclass[]{kluwer} +\documentclass[draft]{kluwer} \usepackage{color} \usepackage{graphicx} \usepackage{hyperref} @@ -29,7 +29,6 @@ \newcommand{\MATITA}{Matita} \newcommand{\MATITAC}{\texttt{matitac}} \newcommand{\MATITADEP}{\texttt{matitadep}} -\newcommand{\METAHEADING}{Symbol & Position \\ \hline\hline} \newcommand{\MOWGLI}{MoWGLI} \newcommand{\NAT}{\ensuremath{\mathit{nat}}} \newcommand{\NATIND}{\mathit{nat\_ind}} @@ -41,6 +40,7 @@ \newcommand{\UWOBO}{UWOBO} \newcommand{\GETTER}{Getter} \newcommand{\WHELP}{Whelp} + \newcommand{\DOT}{\ensuremath{\mbox{\textbf{.}}}} \newcommand{\SEMICOLON}{\ensuremath{\mbox{\textbf{;}}}} \newcommand{\BRANCH}{\ensuremath{\mbox{\textbf{[}}}} @@ -291,7 +291,7 @@ allow other developers to quickly understand our code and contribute. \includegraphics[width=0.9\textwidth,height=0.8\textheight]{pics/libraries-clusters} \caption[\MATITA{} components and related applications]{\MATITA{} components and related applications, with thousands of line of - codes (klocs)} + codes (klocs).\strut} \label{fig:libraries} \end{center} \end{figure} @@ -757,6 +757,8 @@ the \MATITA{} authoring interface. \subsection{Indexing and searching} \label{sec:indexing} +\TODO{sezione sull'indicizzazione} + \subsection{Disambiguation} \label{sec:disambiguation} @@ -773,7 +775,7 @@ Sect.~\ref{sec:contentintro}. The key component of the translation is the generic disambiguation algorithm implemented in the \texttt{disambiguation} component of Fig.~\ref{fig:libraries} and presented in~\cite{disambiguation}. In this section we present how to use -such an algorithm in the context of the development of a library of formalized +that algorithm in the context of the development of a library of formalized mathematics. We will see that using multiple passes of the algorithm, varying some of its parameters, helps in keeping the input terse without sacrificing expressiveness. @@ -781,7 +783,7 @@ expressiveness. \subsubsection{Disambiguation aliases} \label{sec:disambaliases} -Consider the following command to state a theorem over integer numbers. +Consider the following command to state a theorem over integer numbers: \begin{grafite} theorem Zlt_compat: @@ -794,8 +796,8 @@ Thus, according to the disambiguation algorithm, two different refinable partially specified terms could be associated to it. \MATITA{} asks the user what interpretation he meant. However, to avoid posing the same question in case of a future re-execution (e.g. undo/redo), -the choice must be recorded. Scripts need to be re-executed after invalidation. -Therefore the choice record must be permanently stored somewhere. The most +the choice must be recorded. Since scripts need to be re-executed after +invalidation, the choice record must be permanently stored somewhere. The most natural place is in the script itself. In \MATITA{} disambiguation is governed by \emph{disambiguation aliases}. @@ -829,7 +831,7 @@ the disambiguation in the rest of the script and speed up disambiguation reducing the search space. Disambiguation preferences are included in the lexicon status -(see Sect.~\ref{sec:presentationintro} that is part of the authoring interface +(see Sect.~\ref{sec:presentationintro}) that is part of the authoring interface status. Unlike aliases, they are not part of the library. When starting a new authoring session the set of disambiguation preferences @@ -847,120 +849,107 @@ over integer numbers. Nothing forbids the set of preferences to become incoherent. For this reason the disambiguator cannot always respect the user preferences. -For example consider: - +Consider, for example: \begin{grafite} -theorem Zlt_mono: \forall x, y, k. x < y \to x < y + k. +theorem Zlt_mono: + \forall x, y, k. x < y \to x < y + k. \end{grafite} No refinable partially specified term corresponds to the preferences: -\OP{+} over natural numbers, \OP{<} over integer numbers. -When the disambiguator fails, disambiguation is tried again with a less -strict set of preferences. Thus disambiguation is organized in -multiple \emph{passes}. In the first pass, called \emph{mono-preferences}, -we consider only the aliases corresponding to the current preferences. -In the second pass, called \emph{multi-preferences}, -we consider every alias corresponding to a current or past preference. -For instance, in the example above disambiguation succeeds in the -multi-preference pass - -\TODO{Disambiguazione: cambiato solo FINQUI} - -and suppose that the \OP{+} operator is defined only on natural numbers. If -the alias for \OP{<} points to the integer version of the operator, no -refinable partially specified term matching the term could be found. - -For this reason we chose to attempt \emph{multiple disambiguation passes}. A -first pass attempts to disambiguate using the last available disambiguation -aliases (\emph{mono aliases} pass); in case of failure the next pass tries -disambiguation again forgetting the aliases and using the whole library to -retrieve interpretation for ambiguous expressions (\emph{library aliases} pass). -Since the latter pass may lead to too many choices we intertwined an additional -pass among the two which use as interpretations all the aliases coming for -included parts of the library (\emph{multi aliases} phase). This is the reason -why aliases are \emph{one-to-many} mappings instead of one-to-one. This choice -turned out to be a well-balanced trade-off among performances (earlier passes -fail quickly) and degree of ambiguity supported for presentation level terms. +\OP{+} over natural numbers, \OP{<} over integer numbers. To overcome this +limitation we organized disambiguation in \emph{multiple passes}: when the +disambiguator fails, disambiguation is tried again with a less strict set of +preferences. + +Several disambiguation parameters can vary among passes. With respect to +preference handling we implemented three passes. In the first pass, called +\emph{mono-preferences}, we consider only the aliases corresponding to the +current preferences. In the second pass, called \emph{multi-preferences}, we +consider every alias corresponding to a current or past preference. For +instance, in the example above disambiguation succeeds in the multi-preference +pass. In the third pass, called \emph{library-preferences}, all aliases +available in the library are considered. + +\TODO{rivedere questo periodo\ldots} +The rationale behind this choice is trying to respect user preferences in early +passes that complete quickly in case of failure; later passes are slower but +have more chances of success. \subsubsection{Operator instances} \label{sec:disambinstances} -Let us suppose now we want to define a theorem relating ordering relations on -natural and integer numbers. The way we would like to write such a theorem (as -we can read it in the \MATITA{} standard library) is: - +Consider now the following theorem: \begin{grafite} -include "Z/z.ma". -include "nat/orders.ma". -.. theorem lt_to_Zlt_pos_pos: \forall n, m: nat. n < m \to pos n < pos m. \end{grafite} - -Unfortunately, none of the passes described above is able to disambiguate its -type, no matter how aliases are defined. This is because the \OP{<} operator -occurs twice in the content level term (it has two \emph{instances}) and two -different interpretations for it have to be used in order to obtain a refinable -partially specified term. To address this issue, we have the ability to consider -each instance of a single symbol as a different ambiguous expression in the -content level term, and thus we can assign a different interpretation to each of -them. A disambiguation pass which exploit this feature is said to be using -\emph{fresh instances}. +and assume that there exist in the library aliases for \OP{<} over natural +numbers and over integer numbers. None of the passes described above is able to +disambiguate \texttt{lt\_to\_Zlt\_pos\_pos}, no matter how preferences are set. +This is because the \OP{<} operator occurs twice in the content level term (it +has two \emph{instances}) and two different interpretations for it have to be +used in order to obtain a refinable partially specified term. + +To address this issue, we have the ability to consider each instance of a single +symbol as a different ambiguous expression in the content level term, and thus +we can use a different alias for each of them. Exploiting or not this feature is +one of the disambiguation pass parameters. A disambiguation pass which exploit +it is said to be using \emph{fresh instances} (opposed to a \emph{shared +instances} pass). Fresh instances lead to a non negligible performance loss (since the choice of -an interpretation for one instances does not constraint the choice for the -others). For this reason we always attempt a fresh instances pass only after -attempting a non-fresh one. - -\paragraph{One-shot aliases} Disambiguation aliases as seen so far are -instance-independent. However, aliases obtained as a result of a disambiguation -pass which uses fresh instances ought to be instance-dependent, that is: to -ensure a term can be disambiguated in a batch fashion we may need to state that -an \emph{i}-th instance of a symbol should be mapped to a given partially -specified term. Instance-depend aliases are meaningful only for the term whose +an alias for one instances does not constraint the choice of the others). For +this reason we always attempt a fresh instances pass only after attempting a +non-fresh one. + +\paragraph{One-shot preferences} Disambiguation preferecens as seen so far are +instance-independent. However, implicit preferences obtained as a result of a +disambiguation pass which uses fresh instances ought to be instance-dependent. +Informally, the set of preferences that can be satisfied by the disambiguator on +the theorem above is: ``the first instance of the \OP{<} symbol is over natural +numbers, while the second is on integer numbers''. + +Instance-depend preferences are meaningful only for the term whose disambiguation generated it. For this reason we call them \emph{one-shot -aliases} and \MATITA{} does not use it to disambiguate further terms down in the -script. +preferences} and \MATITA{} does not use them to disambiguate further terms in +the script. \subsubsection{Implicit coercions} \label{sec:disambcoercions} -Let us now consider a theorem about derivation: - +Consider the following theorem about derivation: \begin{grafite} theorem power_deriv: \forall n: nat, x: R. d x ^ n dx = n * x ^ (n - 1). \end{grafite} - -and suppose there exists a \texttt{R \TEXMACRO{to} nat \TEXMACRO{to} R} -interpretation for \OP{\^}, and a real number interpretation for \OP{*}. -Mathematicians would write the term that way since it is well known that the -natural number \texttt{n} could be ``injected'' in \IR{} and considered a real -number for the purpose of real multiplication. The refiner of \MATITA{} supports -\emph{implicit coercions} for this reason: given as input the above content -level term, it will return a partially specified term where in place of -\texttt{n} the application of a coercion from \texttt{nat} to \texttt{R} appears -(assuming it has been defined as such of course). - -Nonetheless coercions are not always desirable. For example, in disambiguating +and assume that in the library there is an alias mapping \OP{\^} to a partially +specified term having type: \texttt{R \TEXMACRO{to} nat \TEXMACRO{to} R}. In +order to disambiguate \texttt{power\_deriv}, the occurrence of \texttt{n} on the +right hand side of the equality need to be ``injected'' from \texttt{nat} to +\texttt{R}. The refiner of \MATITA{} supports \emph{implicit coercions} for +this reason: given as input the above presentation level term, it will return a +partially specified term where in place of \texttt{n} the application of a +coercion from \texttt{nat} to \texttt{R} appears (assuming such a coercion has +been defined in advance). + +Coercions are not always desirable. For example, in disambiguating \texttt{\TEXMACRO{forall} x: nat. n < n + 1} we do not want the term which uses two coercions from \texttt{nat} to \texttt{R} around \OP{<} arguments to show up -among the possible partially specified term choices. For this reason in -\MATITA{} we always try first a disambiguation pass which require the refiner -not to use the coercions and only in case of failure we attempt a -coercion-enabled pass. - -It is interesting to observe also the relationship among operator instances and -implicit coercions. Consider again the theorem \texttt{lt\_to\_Zlt\_pos\_pos}, -which \MATITA{} disambiguated using fresh instances. In case there exists a -coercion from natural numbers to (positive) integers (which indeed does, it is -the \texttt{pos} constructor itself), the theorem can be disambiguated using -twice that coercion on the left hand side of the implication. The obtained -partially specified term however would not probably be the expected one, being a -theorem which prove a trivial implication. For this reason we choose to always -prefer fresh instances over implicit coercions, i.e. we always attempt -disambiguation passes with fresh instances and no implicit coercions before -attempting passes with implicit coercions. +among the possible partially specified term choices. For this reason we always +attempt a disambiguation pass which require the refiner not to use the coercions +before attempting a coercion-enabled pass. + +The choice of whether implicit coercions are enabled or not interact with the +choice about operator instances. Indeed, consider again +\texttt{lt\_to\_Zlt\_pos\_pos}, which can be disambiguated using fresh operator +instances. In case there exists a coercion from natural numbers to (positive) +integers (which indeed does, it is the \texttt{pos} constructor itself), the +theorem can be disambiguated using twice that coercion on the left hand side of +the implication. The obtained partially specified term however would not +probably be the expected one, being a theorem which prove a trivial implication. +For this reason we choose to always prefer fresh instances over implicit +coercions, i.e. we always attempt disambiguation passes with fresh instances +and no implicit coercions before attempting passes with implicit coercions. \subsubsection{Disambiguation passes} \label{sec:disambpasses} @@ -971,22 +960,22 @@ our experience that choice gives reasonable performance and minimize the need of user interaction during the disambiguation. \begin{table}[ht] - \caption{Sequence of disambiguation passes used in \MATITA.\strut} + \caption{Disambiguation passes sequence.\strut} \label{tab:disambpasses} \begin{center} \begin{tabular}{c|c|c|c} \multicolumn{1}{p{1.5cm}|}{\centering\raisebox{-1.5ex}{\textbf{Pass}}} - & \multicolumn{1}{p{3.1cm}|}{\centering\textbf{Disambiguation aliases}} + & \multicolumn{1}{p{3.1cm}|}{\centering\textbf{Preferences}} & \multicolumn{1}{p{2.5cm}|}{\centering\textbf{Operator instances}} & \multicolumn{1}{p{2.5cm}}{\centering\textbf{Implicit coercions}} \\ \hline - \PASS & Mono aliases & Shared & Disabled \\ - \PASS & Multi aliases & Shared & Disabled \\ - \PASS & Mono aliases & Fresh instances & Disabled \\ - \PASS & Multi aliases & Fresh instances & Disabled \\ - \PASS & Mono aliases & Fresh instances & Enabled \\ - \PASS & Multi aliases & Fresh instances & Enabled \\ - \PASS & Library aliases& Fresh instances & Enabled + \PASS & Mono-preferences & Shared instances & Disabled \\ + \PASS & Multi-preferences & Shared instances & Disabled \\ + \PASS & Mono-preferences & Fresh instances & Disabled \\ + \PASS & Multi-preferences & Fresh instances & Disabled \\ + \PASS & Mono-preferences & Fresh instances & Enabled \\ + \PASS & Multi-preferences & Fresh instances & Enabled \\ + \PASS & Library-preferences & Fresh instances & Enabled \end{tabular} \end{center} \end{table} @@ -1203,7 +1192,7 @@ fully supported so that mathematical glyphs can be input as such. \begin{figure}[!ht] \begin{center} \includegraphics[width=0.95\textwidth]{pics/matita-screenshot} - \caption{\MATITA{} look and feel} + \caption{\MATITA{} look and feel.\strut} \label{fig:screenshot} \end{center} \end{figure} @@ -1249,7 +1238,7 @@ applications. \includegraphics[width=0.40\textwidth]{pics/matita-screenshot-selection} \hspace{0.05\textwidth} \raisebox{0.4cm}{\includegraphics[width=0.50\textwidth]{pics/matita-screenshot-href}} - \caption{Semantic selection and hyperlinks} + \caption{Semantic selection and hyperlinks.\strut} \label{fig:directmanip} \end{center} \end{figure} @@ -1263,7 +1252,7 @@ applications. \includegraphics[width=0.30\textwidth]{pics/cicbrowser-screenshot-query} \hspace{0.02\textwidth} \includegraphics[width=0.30\textwidth]{pics/cicbrowser-screenshot-con} - \caption{Browsing and searching the library} + \caption{Browsing and searching the library.\strut} \label{fig:cicbrowser} \end{center} \end{figure} @@ -1713,7 +1702,7 @@ scripts, listed in Tab.~\ref{tab:scripts}. \FILE{fermat\_little\_th.ma} \\ \FILE{totient.ma} & & \\ \end{tabular} - \caption{Scripts on natural numbers in the standard library} + \caption{Scripts on natural numbers in the standard library.\strut} \label{tab:scripts} \end{table} -- 2.39.2