From 19e209d893c9e5ccb3b8b3ee0a1cb8b32311c57b Mon Sep 17 00:00:00 2001
From: Claudio Sacerdoti Coen <claudio.sacerdoticoen@unibo.it>
Date: Mon, 23 Jan 2006 14:31:10 +0000
Subject: [PATCH] Cenni alla disambiguazione lazy.

---
 helm/papers/matita/matita2.tex | 59 +++++++++++++++++++++++-----------
 1 file changed, 40 insertions(+), 19 deletions(-)

diff --git a/helm/papers/matita/matita2.tex b/helm/papers/matita/matita2.tex
index 9a9870fcc..f970c8c29 100644
--- a/helm/papers/matita/matita2.tex
+++ b/helm/papers/matita/matita2.tex
@@ -344,7 +344,7 @@ To better understand the architecture of \MATITA{} and the role of each
 \component, we can focus on the representation of the mathematical information.
 \MATITA{} is based on (a variant of) the Calculus of (Co)Inductive
 Constructions (CIC). In CIC terms are used to represent mathematical
-expressions, types and proofs. \MATITA{} is able to handle terms at
+formulae, types and proofs. \MATITA{} is able to handle terms at
 four different levels of specification. On each level it is possible to provide
 a different set of functionalities. The four different levels are:
 fully specified terms; partially specified terms; 
@@ -386,7 +386,7 @@ content level terms; presentation level terms.
    We use metadata and a sort of crawler to index the mathematical notions
    in the distributed library. We are interested in retrieving a notion
    by matching, instantiation or generalization of a user or system provided
-   mathematical expression. Thus we need to collect metadata over the fully
+   mathematical formula. Thus we need to collect metadata over the fully
    specified terms and to store the metadata in some kind of (relational)
    database for later usage. The \texttt{hmysql} \component{} provides
    a simplified
@@ -427,13 +427,13 @@ occurrence of a metavariable. A metavariable stand for a term whose type is
 given by the conclusion of the sequent. The term must be closed in the
 context that is given by the ordered list of hypotheses of the sequent.
 The explicit substitution instantiates every hypothesis with an actual
-value for the term bound by the hypothesis.
+value for the variable bound by the hypothesis.
 
 Partially specified terms are not required to be well-typed. However a
 partially specified term should be \emph{refinable}. A \emph{refiner} is
 a type-inference procedure that can instantiate implicit terms and
 metavariables and that can introduce \emph{implicit coercions} to make a
-partially specified term be well-typed. The refiner of \MATITA{} is implemented
+partially specified term well-typed. The refiner of \MATITA{} is implemented
 in the \texttt{cic\_unification} \component. As the type checker is based on
 the conversion check, the refiner is based on \emph{unification} that is
 a procedure that makes two partially specified term convertible by instantiating
@@ -457,7 +457,7 @@ Other commands are used to give definitions and axioms or to state theorems
 and lemmas. The \texttt{grafite\_engine} \component{} is the core of \MATITA{}.
 It implements the semantics of each command in the grafite AST as a function
 from status to status.  It implements also an undo function to go back to
-previous statuses. \TODO{parlare di disambiguazione lazy \& co?}
+previous statuses.
 
 As fully specified terms, partially specified terms are not well suited
 for user consumption since their syntax is not extendible and it is not
@@ -468,7 +468,7 @@ information that can be inferred by the refiner.
 \subsection{Content level terms}
 \label{sec:contentintro}
 
-The language used to communicate proofs and expecially expressions with the
+The language used to communicate proofs and expecially formulae with the
 user does not only needs to be extendible and accomodate the usual mathematical
 notation. It must also reflect the comfortable degree of imprecision and
 ambiguity that the mathematical language provides.
@@ -491,7 +491,7 @@ very precise on the types he is using and their representation. However,
 to communicate formulae with the user and with external tools, it seems good
 practice to stick to the usual imprecise mathematical ontology. In the
 Mathematical Knowledge Management community this imprecise language is called
-the \emph{content level} representation of expressions.
+the \emph{content level} representation of formulae.
 
 In \MATITA{} we provide two translations: from partially specified terms
 to content level terms and the other way around. The first translation can also
@@ -500,18 +500,19 @@ case of partially specified term where no metavariable or implicit term occurs.
 
 The translation from partially specified terms to content level terms must
 discriminate between terms used to represent proofs and terms used to represent
-expressions. The firsts are translated to a content level representation of
-proof steps that can easily be rendered in natural language. The latters
-are translated to MathML Content formulae. MathML Content~\cite{mathml} is a W3C
-standard
-for the representation of content level expressions in an XML extensible format.
+formulae. The firsts are translated to a content level representation of
+proof steps that can easily be rendered in natural language. The representation
+adopted has greatly influenced the OMDoc~\cite{omdoc} proof format that is now
+isomorphic to it. Terms that represent formulae are translated to MathML
+Content formulae. MathML Content~\cite{mathml} is a W3C standard
+for the representation of content level formulae in an XML extensible format.
 
 The translation to content level is implemented in the
 \texttt{acic\_content} \component. Its input are \emph{annotated partially
 specified terms}, that are maximally unshared
 partially specified terms enriched with additional typing information for each
 subterm. This information is used to discriminate between terms that represent
-proofs and terms that represent expressions. Part of it is also stored at the
+proofs and terms that represent formulae. Part of it is also stored at the
 content level since it is required to generate the natural language rendering
 of proofs. The terms need to be maximally unshared (i.e. they must be a tree
 and not a DAG). The reason is that to the occurrences of a subterm in
@@ -525,21 +526,41 @@ in the library in their annotated form.
 
 We do not provide yet a reverse translation from content level proofs to
 partially specified terms. But in \texttt{cic\_disambiguation} we do provide
-the reverse translation for expressions. The mapping from
-content level expressions to partially specified terms is not unique due to
+the reverse translation for formulae. The mapping from
+content level formulae to partially specified terms is not unique due to
 the ambiguity of the content level. As a consequence the translation
 is guided by an \emph{interpretation}, that is a function that chooses for
-every ambiguous expression one partially specified term. The
+every ambiguous formula one partially specified term. The
 \texttt{cic\_disambiguation} \component{} implements the
 disambiguation algorithm we presented in~\cite{disambiguation} that is
 responsible of building in an efficicent way the set of all ``correct''
 interpretations. An interpretation is correct if the partially specified term
 obtained using the interpretation is refinable.
 
+In the last section we have described the semantics of a command as a
+function from status to status. We also suggested that the formulae in a
+command are encoded as partially specified terms. However, consider the
+command ``\texttt{replace} $x$ \texttt{with} $y^2$''. Until the occurrence
+of $x$ to be replaced is located, its context is unknown. Since $y^2$ must
+replace $x$ in that context, its encoding as a term cannot be computed
+until $x$ is located. In other words, $y^2$ must be disambiguated in the
+context of the occurrence $x$ it must replace.
+
+The elegant solution we have implemented consists in representing terms
+in a command as function from a context to a partially refined term. The
+function is obtained by partially applying our disambiguation function to
+the content term to be disambiguated. Our solution should be compared with
+the one adopted in the Coq system (where ambiguity is only relative to
+DeBrujin indexes). In Coq variables can be bound either by name or by
+position. This makes more complex every operation over terms (i.e. according
+to our architecture every module that depends on \texttt{cic}). Moreover,
+this solution cannot cope with other forms of ambiguity (as the meaning
+of the $~^2$ exponent in the previous example that depends on the context).
+
 \subsection{Presentation level terms}
 
 Content level terms are a sort of abstract syntax trees for mathematical
-expressions and proofs. The concrete syntax given to these abstract trees
+formulae and proofs. The concrete syntax given to these abstract trees
 is called \emph{presentation level}.
 
 The main important difference between the content level language and the
@@ -638,8 +659,8 @@ The \components{} not yet described (\texttt{extlib}, \texttt{xml},
 \texttt{logger}, \texttt{registry} and \texttt{utf8\_macros}) are 
 minor \components{} that provide a core of useful functions and basic
 services missing from the standard library of the programming language.
-In particular, the \texttt{xml} \component{} is used
-to easily represent, parse and pretty-print XML files.
+%In particular, the \texttt{xml} \component{} is used to easily represent,
+%parse and pretty-print XML files.
 
 \section{Using \MATITA (boh \ldots cambiare titolo)}
 
-- 
2.39.5