uniformed disambiguation part

author Stefano Zacchiroli <zack@upsilon.cc>

Mon, 30 Jan 2006 11:52:27 +0000 (11:52 +0000)

committer Stefano Zacchiroli <zack@upsilon.cc>

Mon, 30 Jan 2006 11:52:27 +0000 (11:52 +0000)
author Stefano Zacchiroli <zack@upsilon.cc>
Mon, 30 Jan 2006 11:52:27 +0000 (11:52 +0000)
committer Stefano Zacchiroli <zack@upsilon.cc>
Mon, 30 Jan 2006 11:52:27 +0000 (11:52 +0000)
diff --git a/helm/papers/matita/matita2.tex b/helm/papers/matita/matita2.tex

index 5c56eccf3a83629361c4547efe1483feac55759e..e74021c5b22c1c276693dc0c7139f1d780432737 100644 (file)
--- a/helm/papers/matita/matita2.tex
+++ b/helm/papers/matita/matita2.tex
@@ -1,4 +1,4 @@
-\documentclass[]{kluwer}
+\documentclass[draft]{kluwer}
  \usepackage{color}
  \usepackage{graphicx}
  \usepackage{hyperref}
@@ -29,7 +29,6 @@
  \newcommand{\MATITA}{Matita}
  \newcommand{\MATITAC}{\texttt{matitac}}
  \newcommand{\MATITADEP}{\texttt{matitadep}}
-\newcommand{\METAHEADING}{Symbol & Position \\ \hline\hline}
  \newcommand{\MOWGLI}{MoWGLI}
  \newcommand{\NAT}{\ensuremath{\mathit{nat}}}
  \newcommand{\NATIND}{\mathit{nat\_ind}}
@@ -41,6 +40,7 @@
  \newcommand{\UWOBO}{UWOBO}
  \newcommand{\GETTER}{Getter}
  \newcommand{\WHELP}{Whelp}
+
  \newcommand{\DOT}{\ensuremath{\mbox{\textbf{.}}}}
  \newcommand{\SEMICOLON}{\ensuremath{\mbox{\textbf{;}}}}
  \newcommand{\BRANCH}{\ensuremath{\mbox{\textbf{[}}}}
@@ -291,7 +291,7 @@ allow other developers to quickly understand our code and contribute.
    \includegraphics[width=0.9\textwidth,height=0.8\textheight]{pics/libraries-clusters}
    \caption[\MATITA{} components and related applications]{\MATITA{}
     components and related applications, with thousands of line of
-   codes (klocs)}
+   codes (klocs).\strut}
    \label{fig:libraries}
   \end{center}
  \end{figure}
@@ -757,6 +757,8 @@ the \MATITA{} authoring interface.
  \subsection{Indexing and searching}
  \label{sec:indexing}
  
+\TODO{sezione sull'indicizzazione}
+
  \subsection{Disambiguation}
  \label{sec:disambiguation}
  
@@ -773,7 +775,7 @@ Sect.~\ref{sec:contentintro}.
  The key component of the translation is the generic disambiguation algorithm
  implemented in the \texttt{disambiguation} component of Fig.~\ref{fig:libraries}
  and presented in~\cite{disambiguation}. In this section we present how to use
-such an algorithm in the context of the development of a library of formalized
+that algorithm in the context of the development of a library of formalized
  mathematics. We will see that using multiple passes of the algorithm, varying
  some of its parameters, helps in keeping the input terse without sacrificing
  expressiveness.
@@ -781,7 +783,7 @@ expressiveness.
  \subsubsection{Disambiguation aliases}
  \label{sec:disambaliases}
  
-Consider the following command to state a theorem over integer numbers.
+Consider the following command to state a theorem over integer numbers:
  
  \begin{grafite}
  theorem Zlt_compat:
@@ -794,8 +796,8 @@ Thus, according to the disambiguation algorithm, two different
  refinable partially specified terms could be associated to it.
  \MATITA{} asks the user what interpretation he meant. However, to avoid
  posing the same question in case of a future re-execution (e.g. undo/redo),
-the choice must be recorded. Scripts need to be re-executed after invalidation.
-Therefore the choice record must be permanently stored somewhere. The most
+the choice must be recorded. Since scripts need to be re-executed after
+invalidation, the choice record must be permanently stored somewhere. The most
  natural place is in the script itself.
  
  In \MATITA{} disambiguation is governed by \emph{disambiguation aliases}.
@@ -829,7 +831,7 @@ the disambiguation in the rest of the script and speed up disambiguation
  reducing the search space.
  
  Disambiguation preferences are included in the lexicon status
-(see Sect.~\ref{sec:presentationintro} that is part of the authoring interface
+(see Sect.~\ref{sec:presentationintro}) that is part of the authoring interface
  status.  Unlike aliases, they are not part of the library.
  
  When starting a new authoring session the set of disambiguation preferences
@@ -847,120 +849,107 @@ over integer numbers.
  
  Nothing forbids the set of preferences to become incoherent. For this reason
  the disambiguator cannot always respect the user preferences.
-For example consider:
-
+Consider, for example:
  \begin{grafite}
-theorem Zlt_mono: \forall x, y, k. x < y \to x < y + k.
+theorem Zlt_mono:
+  \forall x, y, k. x < y \to x < y + k.
  \end{grafite}
  
  No refinable partially specified term corresponds to the preferences:
-\OP{+} over natural numbers, \OP{<} over integer numbers.
-When the disambiguator fails, disambiguation is tried again with a less
-strict set of preferences. Thus disambiguation is organized in
-multiple \emph{passes}. In the first pass, called \emph{mono-preferences},
-we consider only the aliases corresponding to the current preferences.
-In the second pass, called \emph{multi-preferences},
-we consider every alias corresponding to a current or past preference.
-For instance, in the example above disambiguation succeeds in the
-multi-preference pass 
-
-\TODO{Disambiguazione: cambiato solo FINQUI}
-
-and suppose that the \OP{+} operator is defined only on natural numbers. If
-the alias for \OP{<} points to the integer version of the operator, no
-refinable partially specified term matching the term could be found.
-
-For this reason we chose to attempt \emph{multiple disambiguation passes}. A
-first pass attempts to disambiguate using the last available disambiguation
-aliases (\emph{mono aliases} pass); in case of failure the next pass tries
-disambiguation again forgetting the aliases and using the whole library to
-retrieve interpretation for ambiguous expressions (\emph{library aliases} pass).
-Since the latter pass may lead to too many choices we intertwined an additional
-pass among the two which use as interpretations all the aliases coming for
-included parts of the library (\emph{multi aliases} phase). This is the reason
-why aliases are \emph{one-to-many} mappings instead of one-to-one. This choice
-turned out to be a well-balanced trade-off among performances (earlier passes
-fail quickly) and degree of ambiguity supported for presentation level terms.
+\OP{+} over natural numbers, \OP{<} over integer numbers. To overcome this
+limitation we organized disambiguation in \emph{multiple passes}: when the
+disambiguator fails, disambiguation is tried again with a less strict set of
+preferences.
+
+Several disambiguation parameters can vary among passes. With respect to
+preference handling we implemented three passes.  In the first pass, called
+\emph{mono-preferences}, we consider only the aliases corresponding to the
+current preferences.  In the second pass, called \emph{multi-preferences}, we
+consider every alias corresponding to a current or past preference.  For
+instance, in the example above disambiguation succeeds in the multi-preference
+pass. In the third pass, called \emph{library-preferences}, all aliases
+available in the library are considered.
+
+\TODO{rivedere questo periodo\ldots}
+The rationale behind this choice is trying to respect user preferences in early
+passes that complete quickly in case of failure; later passes are slower but
+have more chances of success.
  
  \subsubsection{Operator instances}
  \label{sec:disambinstances}
  
-Let us suppose now we want to define a theorem relating ordering relations on
-natural and integer numbers. The way we would like to write such a theorem (as
-we can read it in the \MATITA{} standard library) is:
-
+Consider now the following theorem:
  \begin{grafite}
-include "Z/z.ma".
-include "nat/orders.ma".
-..
  theorem lt_to_Zlt_pos_pos:
    \forall n, m: nat. n < m \to pos n < pos m. 
  \end{grafite}
-
-Unfortunately, none of the passes described above is able to disambiguate its
-type, no matter how aliases are defined. This is because the \OP{<} operator
-occurs twice in the content level term (it has two \emph{instances}) and two
-different interpretations for it have to be used in order to obtain a refinable
-partially specified term. To address this issue, we have the ability to consider
-each instance of a single symbol as a different ambiguous expression in the
-content level term, and thus we can assign a different interpretation to each of
-them. A disambiguation pass which exploit this feature is said to be using
-\emph{fresh instances}.
+and assume that there exist in the library aliases for \OP{<} over natural
+numbers and over integer numbers. None of the passes described above is able to
+disambiguate \texttt{lt\_to\_Zlt\_pos\_pos}, no matter how preferences are set.
+This is because the \OP{<} operator occurs twice in the content level term (it
+has two \emph{instances}) and two different interpretations for it have to be
+used in order to obtain a refinable partially specified term.
+
+To address this issue, we have the ability to consider each instance of a single
+symbol as a different ambiguous expression in the content level term, and thus
+we can use a different alias for each of them. Exploiting or not this feature is
+one of the disambiguation pass parameters. A disambiguation pass which exploit
+it is said to be using \emph{fresh instances} (opposed to a \emph{shared
+instances} pass).
  
  Fresh instances lead to a non negligible performance loss (since the choice of
-an interpretation for one instances does not constraint the choice for the
-others). For this reason we always attempt a fresh instances pass only after
-attempting a non-fresh one.
-
-\paragraph{One-shot aliases} Disambiguation aliases as seen so far are
-instance-independent. However, aliases obtained as a result of a disambiguation
-pass which uses fresh instances ought to be instance-dependent, that is: to
-ensure a term can be disambiguated in a batch fashion we may need to state that
-an \emph{i}-th instance of a symbol should be mapped to a given partially
-specified term. Instance-depend aliases are meaningful only for the term whose
+an alias for one instances does not constraint the choice of the others). For
+this reason we always attempt a fresh instances pass only after attempting a
+non-fresh one.
+
+\paragraph{One-shot preferences} Disambiguation preferecens as seen so far are
+instance-independent. However, implicit preferences obtained as a result of a
+disambiguation pass which uses fresh instances ought to be instance-dependent.
+Informally, the set of preferences that can be satisfied by the disambiguator on
+the theorem above is: ``the first instance of the \OP{<} symbol is over natural
+numbers, while the second is on integer numbers''.
+
+Instance-depend preferences are meaningful only for the term whose
  disambiguation generated it. For this reason we call them \emph{one-shot
-aliases} and \MATITA{} does not use it to disambiguate further terms down in the
-script.
+preferences} and \MATITA{} does not use them to disambiguate further terms in
+the script.
  
  \subsubsection{Implicit coercions}
  \label{sec:disambcoercions}
  
-Let us now consider a theorem about derivation:
-
+Consider the following theorem about derivation:
  \begin{grafite}
  theorem power_deriv:
    \forall n: nat, x: R. d x ^ n dx = n * x ^ (n - 1).
  \end{grafite}
-
-and suppose there exists a \texttt{R \TEXMACRO{to} nat \TEXMACRO{to} R}
-interpretation for \OP{\^}, and a real number interpretation for \OP{*}.
-Mathematicians would write the term that way since it is well known that the
-natural number \texttt{n} could be ``injected'' in \IR{} and considered a real
-number for the purpose of real multiplication. The refiner of \MATITA{} supports
-\emph{implicit coercions} for this reason: given as input the above content
-level term, it will return a partially specified term where in place of
-\texttt{n} the application of a coercion from \texttt{nat} to \texttt{R} appears
-(assuming it has been defined as such of course).
-
-Nonetheless coercions are not always desirable. For example, in disambiguating
+and assume that in the library there is an alias mapping \OP{\^} to a partially
+specified term having type: \texttt{R \TEXMACRO{to} nat \TEXMACRO{to} R}. In
+order to disambiguate \texttt{power\_deriv}, the occurrence of \texttt{n} on the
+right hand side of the equality need to be ``injected'' from \texttt{nat} to
+\texttt{R}.  The refiner of \MATITA{} supports \emph{implicit coercions} for
+this reason: given as input the above presentation level term, it will return a
+partially specified term where in place of \texttt{n} the application of a
+coercion from \texttt{nat} to \texttt{R} appears (assuming such a coercion has
+been defined in advance).
+
+Coercions are not always desirable. For example, in disambiguating
  \texttt{\TEXMACRO{forall} x: nat. n < n + 1} we do not want the term which uses
  two coercions from \texttt{nat} to \texttt{R} around \OP{<} arguments to show up
-among the possible partially specified term choices. For this reason in
-\MATITA{} we always try first a disambiguation pass which require the refiner
-not to use the coercions and only in case of failure we attempt a
-coercion-enabled pass.
-
-It is interesting to observe also the relationship among operator instances and
-implicit coercions. Consider again the theorem \texttt{lt\_to\_Zlt\_pos\_pos},
-which \MATITA{} disambiguated using fresh instances. In case there exists a
-coercion from natural numbers to (positive) integers (which indeed does, it is
-the \texttt{pos} constructor itself), the theorem can be disambiguated using
-twice that coercion on the left hand side of the implication. The obtained
-partially specified term however would not probably be the expected one, being a
-theorem which prove a trivial implication. For this reason we choose to always
-prefer fresh instances over implicit coercions, i.e. we always attempt
-disambiguation passes with fresh instances and no implicit coercions before
-attempting passes with implicit coercions.
+among the possible partially specified term choices. For this reason we always
+attempt a disambiguation pass which require the refiner not to use the coercions
+before attempting a coercion-enabled pass.
+
+The choice of whether implicit coercions are enabled or not interact with the
+choice about operator instances. Indeed, consider again
+\texttt{lt\_to\_Zlt\_pos\_pos}, which can be disambiguated using fresh operator
+instances. In case there exists a coercion from natural numbers to (positive)
+integers (which indeed does, it is the \texttt{pos} constructor itself), the
+theorem can be disambiguated using twice that coercion on the left hand side of
+the implication. The obtained partially specified term however would not
+probably be the expected one, being a theorem which prove a trivial implication.
+For this reason we choose to always prefer fresh instances over implicit
+coercions, i.e.  we always attempt disambiguation passes with fresh instances
+and no implicit coercions before attempting passes with implicit coercions.
  
  \subsubsection{Disambiguation passes}
  \label{sec:disambpasses}
@@ -971,22 +960,22 @@ our experience that choice gives reasonable performance and minimize the need of
  user interaction during the disambiguation.
  
  \begin{table}[ht]
- \caption{Sequence of disambiguation passes used in \MATITA.\strut}
+ \caption{Disambiguation passes sequence.\strut}
   \label{tab:disambpasses} 
   \begin{center}
    \begin{tabular}{c|c|c|c}
     \multicolumn{1}{p{1.5cm}|}{\centering\raisebox{-1.5ex}{\textbf{Pass}}}
-   & \multicolumn{1}{p{3.1cm}|}{\centering\textbf{Disambiguation aliases}}
+   & \multicolumn{1}{p{3.1cm}|}{\centering\textbf{Preferences}}
     & \multicolumn{1}{p{2.5cm}|}{\centering\textbf{Operator instances}}
     & \multicolumn{1}{p{2.5cm}}{\centering\textbf{Implicit coercions}} \\
     \hline
-   \PASS & Mono aliases   & Shared         & Disabled \\
-   \PASS & Multi aliases  & Shared         & Disabled \\
-   \PASS & Mono aliases   & Fresh instances & Disabled \\
-   \PASS & Multi aliases  & Fresh instances & Disabled \\
-   \PASS & Mono        aliases   & Fresh instances & Enabled  \\
-   \PASS & Multi aliases  & Fresh instances & Enabled  \\
-   \PASS & Library aliases& Fresh instances & Enabled
+   \PASS & Mono-preferences    & Shared instances  & Disabled \\
+   \PASS & Multi-preferences   & Shared instances  & Disabled \\
+   \PASS & Mono-preferences    & Fresh instances   & Disabled \\
+   \PASS & Multi-preferences   & Fresh instances   & Disabled \\
+   \PASS & Mono-preferences    & Fresh instances   & Enabled  \\
+   \PASS & Multi-preferences   & Fresh instances   & Enabled  \\
+   \PASS & Library-preferences  & Fresh instances   & Enabled
    \end{tabular}
   \end{center}
  \end{table}
@@ -1203,7 +1192,7 @@ fully supported so that mathematical glyphs can be input as such.
  \begin{figure}[!ht]
   \begin{center}
    \includegraphics[width=0.95\textwidth]{pics/matita-screenshot}
-  \caption{\MATITA{} look and feel}
+  \caption{\MATITA{} look and feel.\strut}
    \label{fig:screenshot}
   \end{center}
  \end{figure}
@@ -1249,7 +1238,7 @@ applications.
    \includegraphics[width=0.40\textwidth]{pics/matita-screenshot-selection}
    \hspace{0.05\textwidth}
    \raisebox{0.4cm}{\includegraphics[width=0.50\textwidth]{pics/matita-screenshot-href}}
-  \caption{Semantic selection and hyperlinks}
+  \caption{Semantic selection and hyperlinks.\strut}
    \label{fig:directmanip}
   \end{center}
  \end{figure}
@@ -1263,7 +1252,7 @@ applications.
    \includegraphics[width=0.30\textwidth]{pics/cicbrowser-screenshot-query}
    \hspace{0.02\textwidth}
    \includegraphics[width=0.30\textwidth]{pics/cicbrowser-screenshot-con}
-  \caption{Browsing and searching the library}
+  \caption{Browsing and searching the library.\strut}
    \label{fig:cicbrowser}
   \end{center}
  \end{figure}
@@ -1713,7 +1702,7 @@ scripts, listed in Tab.~\ref{tab:scripts}.
    \FILE{fermat\_little\_th.ma} \\     
    \FILE{totient.ma} & & \\
   \end{tabular}
- \caption{Scripts on natural numbers in the standard library}
+ \caption{Scripts on natural numbers in the standard library.\strut}
   \label{tab:scripts}
  \end{table}
author	Stefano Zacchiroli <zack@upsilon.cc>
	Mon, 30 Jan 2006 11:52:27 +0000 (11:52 +0000)
committer	Stefano Zacchiroli <zack@upsilon.cc>
	Mon, 30 Jan 2006 11:52:27 +0000 (11:52 +0000)