From 19e209d893c9e5ccb3b8b3ee0a1cb8b32311c57b Mon Sep 17 00:00:00 2001 From: Claudio Sacerdoti Coen Date: Mon, 23 Jan 2006 14:31:10 +0000 Subject: [PATCH] Cenni alla disambiguazione lazy. --- helm/papers/matita/matita2.tex | 59 +++++++++++++++++++++++----------- 1 file changed, 40 insertions(+), 19 deletions(-) diff --git a/helm/papers/matita/matita2.tex b/helm/papers/matita/matita2.tex index 9a9870fcc..f970c8c29 100644 --- a/helm/papers/matita/matita2.tex +++ b/helm/papers/matita/matita2.tex @@ -344,7 +344,7 @@ To better understand the architecture of \MATITA{} and the role of each \component, we can focus on the representation of the mathematical information. \MATITA{} is based on (a variant of) the Calculus of (Co)Inductive Constructions (CIC). In CIC terms are used to represent mathematical -expressions, types and proofs. \MATITA{} is able to handle terms at +formulae, types and proofs. \MATITA{} is able to handle terms at four different levels of specification. On each level it is possible to provide a different set of functionalities. The four different levels are: fully specified terms; partially specified terms; @@ -386,7 +386,7 @@ content level terms; presentation level terms. We use metadata and a sort of crawler to index the mathematical notions in the distributed library. We are interested in retrieving a notion by matching, instantiation or generalization of a user or system provided - mathematical expression. Thus we need to collect metadata over the fully + mathematical formula. Thus we need to collect metadata over the fully specified terms and to store the metadata in some kind of (relational) database for later usage. The \texttt{hmysql} \component{} provides a simplified @@ -427,13 +427,13 @@ occurrence of a metavariable. A metavariable stand for a term whose type is given by the conclusion of the sequent. The term must be closed in the context that is given by the ordered list of hypotheses of the sequent. The explicit substitution instantiates every hypothesis with an actual -value for the term bound by the hypothesis. +value for the variable bound by the hypothesis. Partially specified terms are not required to be well-typed. However a partially specified term should be \emph{refinable}. A \emph{refiner} is a type-inference procedure that can instantiate implicit terms and metavariables and that can introduce \emph{implicit coercions} to make a -partially specified term be well-typed. The refiner of \MATITA{} is implemented +partially specified term well-typed. The refiner of \MATITA{} is implemented in the \texttt{cic\_unification} \component. As the type checker is based on the conversion check, the refiner is based on \emph{unification} that is a procedure that makes two partially specified term convertible by instantiating @@ -457,7 +457,7 @@ Other commands are used to give definitions and axioms or to state theorems and lemmas. The \texttt{grafite\_engine} \component{} is the core of \MATITA{}. It implements the semantics of each command in the grafite AST as a function from status to status. It implements also an undo function to go back to -previous statuses. \TODO{parlare di disambiguazione lazy \& co?} +previous statuses. As fully specified terms, partially specified terms are not well suited for user consumption since their syntax is not extendible and it is not @@ -468,7 +468,7 @@ information that can be inferred by the refiner. \subsection{Content level terms} \label{sec:contentintro} -The language used to communicate proofs and expecially expressions with the +The language used to communicate proofs and expecially formulae with the user does not only needs to be extendible and accomodate the usual mathematical notation. It must also reflect the comfortable degree of imprecision and ambiguity that the mathematical language provides. @@ -491,7 +491,7 @@ very precise on the types he is using and their representation. However, to communicate formulae with the user and with external tools, it seems good practice to stick to the usual imprecise mathematical ontology. In the Mathematical Knowledge Management community this imprecise language is called -the \emph{content level} representation of expressions. +the \emph{content level} representation of formulae. In \MATITA{} we provide two translations: from partially specified terms to content level terms and the other way around. The first translation can also @@ -500,18 +500,19 @@ case of partially specified term where no metavariable or implicit term occurs. The translation from partially specified terms to content level terms must discriminate between terms used to represent proofs and terms used to represent -expressions. The firsts are translated to a content level representation of -proof steps that can easily be rendered in natural language. The latters -are translated to MathML Content formulae. MathML Content~\cite{mathml} is a W3C -standard -for the representation of content level expressions in an XML extensible format. +formulae. The firsts are translated to a content level representation of +proof steps that can easily be rendered in natural language. The representation +adopted has greatly influenced the OMDoc~\cite{omdoc} proof format that is now +isomorphic to it. Terms that represent formulae are translated to MathML +Content formulae. MathML Content~\cite{mathml} is a W3C standard +for the representation of content level formulae in an XML extensible format. The translation to content level is implemented in the \texttt{acic\_content} \component. Its input are \emph{annotated partially specified terms}, that are maximally unshared partially specified terms enriched with additional typing information for each subterm. This information is used to discriminate between terms that represent -proofs and terms that represent expressions. Part of it is also stored at the +proofs and terms that represent formulae. Part of it is also stored at the content level since it is required to generate the natural language rendering of proofs. The terms need to be maximally unshared (i.e. they must be a tree and not a DAG). The reason is that to the occurrences of a subterm in @@ -525,21 +526,41 @@ in the library in their annotated form. We do not provide yet a reverse translation from content level proofs to partially specified terms. But in \texttt{cic\_disambiguation} we do provide -the reverse translation for expressions. The mapping from -content level expressions to partially specified terms is not unique due to +the reverse translation for formulae. The mapping from +content level formulae to partially specified terms is not unique due to the ambiguity of the content level. As a consequence the translation is guided by an \emph{interpretation}, that is a function that chooses for -every ambiguous expression one partially specified term. The +every ambiguous formula one partially specified term. The \texttt{cic\_disambiguation} \component{} implements the disambiguation algorithm we presented in~\cite{disambiguation} that is responsible of building in an efficicent way the set of all ``correct'' interpretations. An interpretation is correct if the partially specified term obtained using the interpretation is refinable. +In the last section we have described the semantics of a command as a +function from status to status. We also suggested that the formulae in a +command are encoded as partially specified terms. However, consider the +command ``\texttt{replace} $x$ \texttt{with} $y^2$''. Until the occurrence +of $x$ to be replaced is located, its context is unknown. Since $y^2$ must +replace $x$ in that context, its encoding as a term cannot be computed +until $x$ is located. In other words, $y^2$ must be disambiguated in the +context of the occurrence $x$ it must replace. + +The elegant solution we have implemented consists in representing terms +in a command as function from a context to a partially refined term. The +function is obtained by partially applying our disambiguation function to +the content term to be disambiguated. Our solution should be compared with +the one adopted in the Coq system (where ambiguity is only relative to +DeBrujin indexes). In Coq variables can be bound either by name or by +position. This makes more complex every operation over terms (i.e. according +to our architecture every module that depends on \texttt{cic}). Moreover, +this solution cannot cope with other forms of ambiguity (as the meaning +of the $~^2$ exponent in the previous example that depends on the context). + \subsection{Presentation level terms} Content level terms are a sort of abstract syntax trees for mathematical -expressions and proofs. The concrete syntax given to these abstract trees +formulae and proofs. The concrete syntax given to these abstract trees is called \emph{presentation level}. The main important difference between the content level language and the @@ -638,8 +659,8 @@ The \components{} not yet described (\texttt{extlib}, \texttt{xml}, \texttt{logger}, \texttt{registry} and \texttt{utf8\_macros}) are minor \components{} that provide a core of useful functions and basic services missing from the standard library of the programming language. -In particular, the \texttt{xml} \component{} is used -to easily represent, parse and pretty-print XML files. +%In particular, the \texttt{xml} \component{} is used to easily represent, +%parse and pretty-print XML files. \section{Using \MATITA (boh \ldots cambiare titolo)} -- 2.39.2