helm/papers/whelp/moogle.tex

   1 \documentclass[a4paper]{llncs}
   2 \pagestyle{headings}
   3 \usepackage{graphicx}
   4 \usepackage{amssymb,amsmath}
   5 \usepackage{hyperref}
   6 \usepackage{picins}
   7
   8 \newcommand{\whelp}{Whelp}
   9
  10 %\newcommand{\logo}[3]{
  11 %\parpic(0cm,0cm)(#2,#3)[l]{\includegraphics[width=#1]{whelp-bw}}
  12 %}
  13
  14 \newcommand{\coq}{Coq}
  15 \newcommand{\mowgli}{MoWGLI}
  16 \newcommand{\IR}{\ensuremath{\mathbb{R}}}
  17 \newcommand{\IN}{\ensuremath{\mathbb{N}}}
  18 \newcommand{\instance}{\textsc{Instance}}
  19 \newcommand{\auto}{\textsc{Auto}}
  20 \newcommand{\hint}{\textsc{Hint}}
  21 \newcommand{\locate}{\textsc{Locate}}
  22 \newcommand{\elim}{\textsc{Elim}}
  23 \newcommand{\match}{\textsc{Match}}
  24 \newcommand{\texmacro}[1]{\texttt{\char92 #1}}
  25 \newcommand{\REF}[3]{\ensuremath{\mathit{Ref}_{#1}(#2,#3)}}
  26 \newcommand{\NAT}{\ensuremath{\mathit{nat}}}
  27 \newcommand{\Prop}{\mathit{Prop}}
  28 \newcommand{\natind}{\mathit{nat\_ind}}
  29 \newcommand{\METAHEADING}{Symbol & Position \\ \hline\hline}
  30 \newcommand{\LIBXSLT}{LibXSLT}
  31 \newcommand{\OCAML}{OCaml}
  32 \newcommand{\UWOBO}{UWOBO}
  33
  34 \title{A content based mathematical search engine: \whelp}
  35 \author{Andrea Asperti, Ferruccio Guidi, Claudio Sacerdoti Coen,\\
  36  Enrico Tassi, and Stefano Zacchiroli}
  37 \institute{Department of Computer Science, University of Bologna\\
  38  Mura Anteo Zamboni, 7 --- 40127 Bologna, ITALY\\
  39  \email{$\{$asperti,fguidi,sacerdot,tassi,zacchiro$\}$@cs.unibo.it}}
  40
  41 \begin{document}
  42 \maketitle
  43
  44 %\begin{center}
  45 %\Large{\textbf{A content based mathematical search engine:\\
  46 %\logo{0.8cm}{-1.4cm}{0.6cm}\whelp}}
  47 %\author{Andrea Asperti, Ferruccio Guidi, Claudio Sacerdoti Coen,\\
  48 % Enrico Tassi, and Stefano Zacchiroli}
  49 %\vspace{0.4cm}
  50 %{\footnotesize
  51 %Department of Computer Science, University of Bologna\\
  52 % Mura Anteo Zamboni, 7 -- 40127 Bologna, ITALY\\
  53 % \texttt{$\{$asperti,fguidi,sacerdot,tassi,zacchiro$\}$@cs.unibo.it}}
  54 %\end{center}
  55
  56 \begin{abstract}
  57  The prototype of a content based search engine
  58  for mathematical knowledge supporting a small set of queries requiring
  59  matching and/or typing operations is described.
  60  The prototype --- called \whelp{} --- exploits a metadata approach
  61  for indexing the information that looks far more flexible than traditional
  62  indexing techniques for structured expressions like substitution,
  63  discrimination, or context trees. The prototype has been instantiated to the
  64  standard library of the \coq{} proof assistant extended with many
  65  user contributions.
  66 \end{abstract}
  67
  68 \section{Introduction}
  69 \label{sec:intro}
  70
  71 The paper describes the prototype of a content based search engine
  72 for mathematical knowledge --- called \whelp{} --- developed inside
  73 the European Project IST-2001-33562 MoWGLI~\cite{pechino}.
  74 \whelp{} has been mostly
  75 tested to search notions inside the library of formal mathematical
  76 knowledge of the \coq{} proof assistant~\cite{coq}. Due to its dimension (about
  77 40,000 theorems), this library was adopted by \mowgli{} as a main
  78 example of repository of structured mathematical information. However,
  79 \whelp{} --- better, its filtering phase --- only works on a small set
  80 of metadata automatically extracted
  81 from the structured sources, and is thus largely independent from
  82 the actual syntax (and semantics) of the information.
  83 Metadata also offer a higher flexibility with respect to more canonical
  84 indexing techniques such as discrimination trees~\cite{McCune},
  85 substitution trees~\cite{Graf} or context trees~\cite{Ganzinger}  since
  86 all these approaches are optimized for the single operation of
  87 (forward) matching, and are difficult to adapt or tune with
  88 additional constraints (such as global constraints on the
  89 signature of the term, just to make a simple but significant example).
  90
  91 \whelp{} is the final output of a three-year research work inside \mowgli{}
  92 which consisted in exporting the \coq{} library into XML, defining
  93 a suitable set of metadata for indexing the information, implementing
  94 the actual indexing tools, and finally designing and developing
  95 the search engine. \whelp{} itself is the result of a complete
  96 architectural re-visitation of a first prototype described in~\cite{metadata1},
  97 integrated with the efficient retrieval
  98 mechanisms described in~\cite{metadata2} (further improved
  99 as described in Sect.~\ref{sec:hint}), and integrated with syntactic facilities
 100 borrowed from the disambiguating parser of~\cite{disambiguation}.
 101 Since the prototype version described in~\cite{metadata1}, also
 102 the Web interface has been completely rewritten and simplified,
 103 exploiting most of the publishing techniques
 104 developed for the hypertextual rendering of the \coq{} library
 105 (see \url{http://helm.cs.unibo.it/}) and described in
 106 Sect.~\ref{sec:interface}.
 107
 108 The \whelp{} search engine is located at \url{http://helm.cs.unibo.it/whelp}.
 109
 110 \section{Syntax}
 111 \label{sec:syntax}
 112
 113 \whelp{} interacts with the user as a classical World Wide Web search engine, it
 114 expects single line queries and returns a list of results matching it.  \whelp{}
 115 currently supports four different kinds of queries, addressing different
 116 user-requirements emerged in \mowgli: \match, \hint, \elim, and \locate{}
 117 (described in Sect.~\ref{sec:queries}). The list is not meant to be exhaustive
 118 and is likely to grow in the future.
 119
 120 The most typical of these queries (\match{} and \hint) require
 121 the user to input a term of the Calculus of (co-)Inductive Constructions --- CIC
 122 --- (the underlying calculus of \coq), supporting different kinds of pattern
 123 based queries. Nevertheless, the concrete syntax we chose for writing the
 124 input term is not bound to any specific logical system: it has been designed to
 125 be as similar as possible to ordinary mathematics formulae, in their \TeX{}
 126 encoding (see Table~\ref{tab:syntax}).
 127
 128 \begin{table}[ht]
 129  \begin{center}
 130   \begin{tabular}{lcl@{\hspace{1em}}l}
 131    \emph{term} & ::= & \emph{identifier} & \\
 132    & $|$ & \emph{number} & \\
 133    & $|$ & \texttt{Prop} $~|~$ \texttt{Type} $~|~$ \texttt{Set} & sort \\
 134    & $|$ & \texttt{?} & placeholder \\
 135    & $|$ & \emph{term} \emph{term} & application \\
 136    & $|$ & \emph{binder} \emph{vars} \texttt{.} \emph{term} & abstraction \\
 137    & $|$ & \emph{term} \texmacro{to} \emph{term} & arrow type \\
 138    & $|$ & \texttt{(} \emph{term} \texttt{)} & grouping \\
 139    & $|$ & \emph{term} \emph{binop} \emph{term} & binary operator \\
 140    & $|$ & \emph{unop} \emph{term} & unary operator \\
 141    \emph{binder} & ::= & \texmacro{forall} $~|~$ \texmacro{exists} $~|~$
 142 %    \texmacro{Pi} $~|~$
 143     \texmacro{lambda} \\
 144    \emph{vars} & ::= & \emph{names} & variables \\
 145    & $|$ & \emph{names} \texttt{:} \emph{term} & typed variables \\
 146    \emph{names} & ::= & \emph{identifier} $~|~$ \emph{identifier} \emph{names}
 147      \\
 148    \emph{binop} & ::= & \texttt{+} $~|~$ \texttt{-} $~|~$ \texttt{*} $~|~$
 149     \texttt{/} $~|~$ \texttt{\^} & arithmetic operators \\
 150    & $|$ & \texttt{<} $~|~$ \texttt{>} $~|~$ \texmacro{leq} $~|~$
 151     \texmacro{geq} $~|~$ \texttt{=} $~|~$ \texmacro{neq}
 152     & comparison operators \\
 153    & $|$ & \texmacro{lor} $~|~$ \texmacro{land} &
 154     logical operators \\
 155     \emph{unop} & ::= & \texttt{-} & unary minus \\
 156     & $|$ & \texmacro{lnot} & logical negation
 157   \end{tabular}
 158  \end{center}
 159  \caption{\whelp's term syntax\label{tab:syntax}}
 160 \end{table}
 161
 162 As a consequence of the generality of syntax, user
 163 provided terms do not usually have a unique direct mapping to CIC term,
 164 but must
 165 be suitably interpreted
 166 in order to solve \emph{ambiguities}.
 167 Consider for example the following term input:
 168 \texttt{\texmacro{forall} x. 1*x = x}. in order to find the
 169 corresponding CIC term we need to know the possible meanings of the
 170 number \texttt{1} and the symbol \texttt{*}.\footnote{Note that
 171 \texttt{x} is not undetermined, since it is a
 172 bound variable.}%, number \texttt{1}, and symbol \texttt{+}.
 173
 174 The typical processing of a user query (depicted in Fig.~\ref{fig:engine}) is
 175 therefore a pipeline made of four distinct phases: \emph{parsing} (canonical
 176 transformation from concrete textual syntax to Abstract Syntax Trees, ASTs for
 177 short), \emph{disambiguation} (described in next section), \emph{metadata
 178 extraction} (described in Sect.~\ref{sec:metadata}), and the actual
 179 \emph{query} (described in Sect.~\ref{sec:queries}).
 180
 181 It is worth observing that while the preliminary phases (disambiguation
 182 and metadata extraction) of \whelp{} are not logic independent, the query
 183 engine is.
 184
 185 \begin{figure}[ht]
 186  \begin{center}
 187   \includegraphics[width=0.9\textwidth]{engine_new}
 188   \caption{\whelp's processing\label{fig:engine}}
 189  \end{center}
 190 \end{figure}
 191
 192
 193 \section{Disambiguation}
 194 \label{sec:disambiguation}
 195
 196 The disambiguation phase builds CIC terms from ASTs of user inputs (also called
 197 \emph{ambiguous terms}). Ambiguous terms may carry three different \emph{sources
 198 of ambiguity}: unbound identifiers, literal numbers, and literal symbols.
 199 \emph{Unbound identifiers} are sources of ambiguity since the same name
 200 could have been used to represent different objects. For example, three
 201 different theorems of the \coq{} library share the name $\mathit{plus\_assoc}$
 202 (locating them is an exercise for the interested reader. Hint: use \whelp's
 203 \locate{} query).
 204
 205 \emph{Numbers} are ambiguous since several different encodings of them could be
 206 provided in logical systems. In the \coq{} standard library for example we found
 207 naturals (in their unary encoding), positives (binary encoding), integers
 208 (signed positives), and reals. Finally, \emph{symbols} (instances of the
 209 \emph{binop} and \emph{unop} syntactic categories of Table~\ref{tab:syntax}) are
 210 ambiguous as well: infix \texttt{+} for example is overloaded to represent
 211 addition over the four different kinds of numbers available in the \coq{}
 212 standard library. Note that given a term with more than one sources of
 213 ambiguity, not all possible disambiguation choices are valid: for example, given
 214 the input \texttt{1+1} we must choose an interpretation of \texttt{+} which is
 215 typable in CIC according to the chosen interpretation for \texttt{1}; choosing
 216 as \texttt{+} the addition over natural numbers and as \texttt{1} the real
 217 number $1$ will lead to a type error.
 218
 219 A \emph{disambiguation algorithm} takes as input an ambiguous term and return a
 220 fully determined CIC term. The \emph{naive disambiguation algorithm} takes as
 221 input an ambiguous term $t$ and proceeds as follows:
 222
 223 \begin{enumerate}
 224
 225  \item Create disambiguation domains $\{D_i | i\in\mathit{Dom}(t)\}$, where
 226   $\mathit{Dom}(t)$ is the set of ambiguity sources of $t$. Each $D_i$ is a set
 227   of CIC terms.
 228
 229  \item Let $\Phi = \{\phi_i | {i\in\mathit{Dom}(t)},\phi_i\in D_i\}$
 230 %  such that $\forall i\in\mathit{Dom}(t),\exists\phi_j\in\Phi,i=j$
 231   be an interpretation for $t$. Given $t$ and an interpretation $\Phi$, a CIC
 232   term is fully determined. Iterate over all possible interpretations of $t$ and
 233   type-check them, keep only typable interpretations (i.e. interpretations that
 234   determine typable terms).
 235
 236  \item Let $n$ be the number of interpretations who survived step 2. If $n=0$
 237   signal a type error. If $n=1$ we have found exactly one CIC term
 238   corresponding to $t$, returns it as output of the disambiguation phase.
 239   If $n>1$ let the user choose one of the $n$ interpretations and returns the
 240   corresponding term.
 241
 242 \end{enumerate}
 243
 244 The above algorithm is highly inefficient since the number of possible
 245 interpretations $\Phi$ grows exponentially with the number of ambiguity sources.
 246 The actual algorithm used in \whelp{} is far more efficient being, in the
 247 average case, linear in the number of ambiguity sources.
 248
 249 The efficient algorithm can be applied if the logic can be extended with
 250 metavariables and a refiner can be implemented. This is the case for CIC and
 251 several other logics.
 252 \emph{Metavariables}~\cite{munoz} are typed, non linear placeholders that can
 253 occur in terms; $?_i$ usually denotes the $i$-th metavariable, while $?$ denotes
 254 a freshly created metavariable. A \emph{refiner}~\cite{mcbride} is a
 255 function whose input is a term with placeholders and whose output is either a
 256 new term obtained instantiating some placeholder or $\epsilon$, meaning that no
 257 well typed instantiation could be found for the placeholders occurring in
 258 the term (type error).
 259
 260 The efficient algorithm starts with an interpretation $\Phi_0 = \{\phi_i |
 261 \phi_i = ?, i\in\mathit{Dom}(t)\}$,
 262 which associates a fresh metavariable to each
 263 source of ambiguity. Then it iterates refining the current CIC term (i.e. the
 264 term obtained interpreting $t$ with $\Phi_i$). If the refinement succeeds the
 265 next interpretation $\Phi_{i+1}$ will be created \emph{making a choice}, that is
 266 replacing a placeholder with one of the possible choice from the corresponding
 267 disambiguation domain. The placeholder to be replaced is chosen following a
 268 preorder visit of the ambiguous term. If the refinement fails the current set of
 269 choices cannot lead to a well-typed term and backtracking is attempted.
 270 Once an unambiguous correct interpretation is found (i.e. $\Phi_i$ does no
 271 longer contain any placeholder), backtracking is attempted
 272 anyway to find the other correct interpretations.
 273
 274 The intuition which explain why this algorithm is more efficient is that as soon
 275 as a term containing placeholders is not typable, no further instantiation of
 276 its placeholders could lead to a typable term. For example, during the
 277 disambiguation of user input \texttt{\texmacro{forall} x. x*0 = 0}, an
 278 interpretation $\Phi_i$ is encountered which associates $?$ to the instance
 279 of \texttt{0} on the right, the real number $0$ to the instance of \texttt{0} on
 280 the left, and the multiplication over natural numbers (\texttt{mult} for short)
 281 to \texttt{*}. The refiner will fail, since \texttt{mult} require a natural
 282 argument, and no further instantiation of the placeholder will be tried.
 283
 284 If, at the end of the disambiguation, more than one possible interpretations are
 285 possible, the user will be asked to choose the intended one (see
 286 Fig.~\ref{fig:disambiguation}).
 287
 288 \begin{figure}[htb]
 289   \centerline{\includegraphics[width=0.9\textwidth]{disambiguation-1}}
 290   \caption{\label{fig:disambiguation} Disambiguation: interpretation choice}
 291 \end{figure}
 292
 293 Details of the disambiguation algorithm of \whelp{} can
 294 be found in~\cite{disambiguation}, where an equivalent algorithm
 295 that avoids backtracking is also presented.
 296
 297 %\example{Pattern matching query.} When fed with input \texttt{\texmacro{forall}
 298 %n,m:nat. (S ?) = ? \texmacro{to} n = m}, \match{} return the following 3
 299 %results:
 300 %
 301 %\begin{description}
 302 % \item[\url{cic:/Coq/Init/Peano/eq_add_S.con}]
 303 % \item[\url{cic:/Eindhoven/POCKLINGTON/lemmas/S_inj.con}]
 304 % \item[\url{cic:/Rocq/TreeAutomata/bases/Sn_eq_Sm_n_eq_m.con}]
 305 %\end{description}
 306 %
 307 %all having type $\forall n,m\in\IN.(S~n)=(S~m)\to n=m$, where first placeholder
 308 %has been instantiated with $n$ and second with $(S~m)$.
 309 %
 310
 311
 312 \section{Metadata}
 313 \label{sec:metadata}
 314
 315 We use a logic-independent metadata model for indexing mathematical notions.
 316 The model is essentially based on a single ternary relation \REF{p}{s}{t}
 317 stating that an object $s$ refers an object $t$ at a given position
 318 $p$. We use a minimal set of positions discriminating the hypotheses
 319 (H), from the conclusion (C) and the proof (P) of a theorem (respectively,
 320 the type of the input parameters, the type of the result, and
 321 the body of a definition). Moreover, in the hypothesis and in the
 322 conclusion we also distinguish the root position (MH and MC, respectively)
 323 from deeper positions (that, in a first order setting, essentially amounts
 324 to distinguish relational symbols from functional ones).
 325 Extending the set of positions we could improve the granularity
 326 and the precision of our indexing technique but so far, apart from a simple
 327 extension discussed below, we never felt this need.
 328
 329 \begin{example}
 330 Consider the statement:
 331 \[\forall m,n:\NAT. m \le n \to m < (S~n)\]
 332 its metadata are described by the following table:
 333 \begin{center}
 334 \begin{tabular}{|l|l|}
 335 \hline
 336 \METAHEADING
 337 $\NAT$  & MH  \\\hline
 338 $\le$ &  MH  \\\hline
 339 $<$ & MC \\\hline
 340 $S$ & C \\\hline
 341 \end{tabular}
 342 \end{center}
 343 All occurrences of bound variables in position MH or MC are collapsed under
 344 a unique reserved name \emph{Rel}, forgetting the actual variable name.
 345 The occurrences in other positions are not considered.
 346 See Sect.~\ref{sec:elim} for an example of use.
 347 %
 348 %
 349 %Note that bound variables are not indexed: it would require extra work to
 350 %take care of alpha conversion and also in this case we do not feel the
 351 %need to index them to improve accuracy; however for the \elim{}
 352 %query (Sect.~\ref{sec:elim}) we will index the fact that \emph{a} bound variable
 353 %occurs in root position (without recording which variable).
 354 %Looking for the statement in the library (that is the match
 355 %operation of section ...) amounts
 356 %to revert the indexing, searching for some some term $c$ such that
 357 %\[Ref_{MH}(c,nat) \wedge Ref_{MH}(c,\le) \wedge Ref_{MC}(c,<)
 358 %\wedge Ref_{C}(c,S) \]
 359 %In a relational database, this is a simple and efficient join operation.
 360 \end{example}
 361
 362 The accuracy of metadata for discriminating the statements of the library
 363 is remarkable. We computed (see~\cite{edinburgh})
 364 that the average number of mathematical notions
 365 in the Coq library sharing the same metadata set is close to the actual number
 366 of \emph{duplicates} (i.e. metadata almost precisely identify statements).
 367
 368 If more accuracy is needed, further filtering phases can be appended to the
 369 \whelp{} pipeline of Fig.~\ref{fig:engine} to prune out false matches.
 370 For instance, since the number of results of the query phase is usually
 371 small, very accurate yet slow filters can be exploited.
 372
 373 According to the type as proposition analogy, the metadata above may
 374 be also used to index the type of functions.
 375 For instance, functions from $\NAT$ to $R$ (reals) would be identified by the
 376 following metadata:
 377 \begin{center}
 378 \begin{tabular}{|l|l|}
 379 \hline
 380 \METAHEADING
 381 $\NAT$  & MH  \\\hline
 382 $R$ & MC \\\hline
 383 \end{tabular}
 384 \end{center}
 385 in this case, however, the metadata model is a bit too rough, since for instance
 386 functions of type $\NAT\to\NAT$, $\NAT\to\NAT\to\NAT$,
 387 $(\NAT\to\NAT)\to\NAT\to\NAT$ and so on would all share the following metadata
 388 set:
 389 \begin{center}
 390 \begin{tabular}{|l|l|}
 391 \hline
 392 \METAHEADING
 393 $\NAT$  & MH  \\\hline
 394 $\NAT$ & MC \\\hline
 395 \end{tabular}
 396 \end{center}
 397 To improve this situation, we add an integer to MC (MH),
 398 expressing the number of parameters of the term (respectively,
 399 of the given hypothesis).
 400 We call \emph{depth} this value since in the case of CIC is equal to
 401 the nesting depth of
 402 dependent products\footnote{Recall that in type theory, the function space is
 403 just a degenerate case
 404 of dependent product.} along the spine relative to the given position.
 405 For instance, the three types above would now have the
 406 following metadata sets:
 407
 408 \begin{center}
 409  \hspace*{\fill}
 410  \begin{tabular}[t]{|l|l|}
 411   \multicolumn{2}{c}{$\NAT\to\NAT$} \\ \hline
 412   \METAHEADING
 413   $\NAT$  & MH(0) \\ \hline
 414   $\NAT$  & MC(1) \\ \hline
 415  \end{tabular}
 416  \hfill
 417  \begin{tabular}[t]{|l|l|}
 418   \multicolumn{2}{c}{$\NAT\to\NAT\to\NAT$} \\ \hline
 419   \METAHEADING
 420   $\NAT$  & MH(0) \\ \hline
 421   $\NAT$  & MC(2) \\ \hline
 422  \end{tabular}
 423  \hfill
 424  \begin{tabular}[t]{|l|l|}
 425   \multicolumn{2}{c}{$(\NAT\to\NAT)\to\NAT\to\NAT$} \\ \hline
 426   \METAHEADING
 427   $\NAT$  & MH(1) \\ \hline
 428   $\NAT$  & H \\ \hline
 429   $\NAT$  & MH(0) \\ \hline
 430   $\NAT$  & MC(2) \\ \hline
 431  \end{tabular}
 432  \hspace*{\fill}
 433 \end{center}
 434
 435 The depth is a technical improvement that is particularly important
 436 for retrieving functions from their types (we shall also see a use in the
 437 \elim{} query, Sect.~\ref{sec:elim}), but is otherwise of minor relevance. In
 438 the following examples,
 439 we shall always list it among the metadata
 440 for the sake of completeness, but it may be usually neglected by the reader.
 441
 442 \section{\whelp{} Queries}
 443 \label{sec:queries}
 444
 445 \subsection{\match}
 446 \label{sec:match}
 447 Not all mathematical results have a canonical name or a
 448 set of keywords which could easily identify them.
 449 For this reason, it is extremely useful to be able to search the library
 450 by means of the explicit statement. More generally, exploiting the
 451 well-known types-as-formulae analogy of Curry-Howard,
 452 \whelp's \match{} operation takes as input a type and returns a list
 453 of objects (definition or proofs) inhabiting it.
 454
 455 \example{Find a proof of the distributivity of times over plus on natural
 456 numbers.}
 457 In order to retrieve those statements, \whelp{} need to be fed with the
 458 distributivity law as input: \texttt{\texmacro{forall}
 459 x,y,z:nat. x * (y+z) = x*y + x*z}. The \match{} query will return 4 results:
 460
 461 \begin{enumerate}
 462  \small
 463  \item \url{cic:/Coq/Arith/Mult/mult_plus_distr_l.con}
 464  \item \url{cic:/Coq/Arith/Mult/mult_plus_distr_r.con}
 465  \item \url{cic:/Rocq/SUBST/comparith/mult_plus_distr_r.con}
 466  \item \url{cic:/Sophia-Antipolis/HARDWARE/GENE/Arith_compl/mult_plus_distr2.con}
 467 \end{enumerate}
 468
 469 Each result locates a theorem in the Coq library that is organized in
 470 a hiearchical fashion.
 471 For example (4) identifies the \texttt{mult\_plus\_distr2}
 472 theorem in the \texttt{Arith\_compl.v} file that is part of a contribution
 473 on hardware circuits developed at the university of Sophia-Antipolis.
 474
 475 (1), (3), and (4) have types which are $\alpha$-convertible with the
 476 user query; (2) is an interesting ``false match'' returned by \whelp{} having
 477 type $\forall n,m,p\in\IN.(n+m)*p=n*p+m*p$, i.e. it is the
 478 symmetric version of the distributivity proposition we were looking for.
 479
 480 The match operation simply amounts
 481 to revert the indexing operation, looking for terms matching
 482 the metadata set computed from the input. For instance, the term
 483 \texttt{\texmacro{forall}
 484 x,y,z:nat. x * (y+z) = x*y + x*z} has the following metadata:
 485 \begin{center}
 486 \begin{tabular}{|l|l|}
 487 \hline
 488 \METAHEADING
 489 $\NAT$  & MH(0)  \\\hline
 490 $=$    & MC(3) \\\hline
 491 $\NAT$  & C  \\\hline
 492 $*$    & C  \\\hline
 493 $+$    & C  \\\hline
 494 \end{tabular}
 495 \end{center}
 496 Note that $\NAT$ occurs in conclusion as an hidden parameter of equality;
 497 the indexed term is the term after disambiguation, not the user input.
 498
 499 Searching for the distributivity law then amounts to look for a term
 500 $s$
 501 such that :
 502 %\[\REF{\mathit{MH}(0)}{c}{nat} \wedge \REF{\mathit{MC}(3)}{c}{\le} \wedge
 503 %\REF{C}{c}{nat} \wedge \REF{C}{c}{+} \wedge \REF{C}{c}{*}\]
 504 \[\REF{\mathit{MH}(0)}{s}{nat} \land \REF{\mathit{MC}(3)}{s}{=} \land
 505 \REF{C}{s}{nat} \land \REF{C}{s}{*} \land \REF{C}{s}{+}\]
 506 In a relational database, this is a simple and efficient join operation.
 507
 508 \example{Suppose we are interested in looking for a definition of
 509 summation for series of natural numbers}.
 510 The type of such an object
 511 is something of the kind $(\NAT\to\NAT)\to\NAT\to\NAT\to\NAT$, taking
 512 the series, two natural numbers expressing summation lower and upper bound, and
 513 giving back the resulting sum.
 514 Feeding \whelp's \match{} query
 515 with such a type does give back four results:
 516 \begin{enumerate}
 517  \small
 518  \item \url{cic:/Coq/Reals/Rfunctions/sum_nat_f.con}
 519  \item \url{cic:/Sophia-Antipolis/Bertrand/Product/prod_nm.con}
 520  \item \url{cic:/Sophia-Antipolis/Bertrand/Summation/sum_nm.con}
 521  \item \url{cic:/Sophia-Antipolis/Rsa/Binomials/sum_nm.con}
 522 \end{enumerate}
 523 Although we have a definition for summation in the standard library,
 524 namely $\mathit{sum\_nat\_f}$, its theory is very underdeveloped. Luckily we
 525 have a much more complete development for $\mathit{sum\_nm}$ in a contribution
 526 from Sophia, where:
 527
 528 \[\mathit{sum\_nm} \;n \;m \;f = \sum_{x=n}^{n+m}f(x)\]
 529
 530 Having discovered the name for summation, we may then
 531 inquire about the proofs of some of its properties; for instance, considering
 532 the semantics of $\mathit{sum\_nm}$, we may wonder if the following statement is
 533 already in the library:
 534 \[ \forall m,n,c:nat.(\mathit{sum\_nm}~n~m~\lambda x:nat.c) = (S~m)*c\]
 535 Matching the previous theorem actually succeed, returning the following:\\
 536 {\small\url{cic:/Sophia-Antipolis/Bertrand/Summation/sum_nm_c.con}}.
 537
 538 \subsubsection{Matching Incomplete Patterns}
 539 \whelp{} also support matching with partial patterns, i.e. patterns
 540 with placeholders denoted by $?$. The approach is essentially
 541 identical to the previous one: we compute all the constants $A$ appearing
 542 in the pattern, and look for all terms referring at least the set
 543 of constants in $A$, at suitable positions.
 544
 545 Suppose for instance that you are interested in looking for all known facts
 546 about the computation of $\mathit{sin}$ on given reals. You may just ask
 547 \whelp{} to \match{} $\mathit{sin}~?=\,\,?$, that would result in the following
 548 list (plus a couple of spurious results due to the fact that \coq{} variables
 549 are not indexed, at present):
 550
 551 \begin{enumerate}
 552  \small
 553  \item \url{cic:/Coq/Reals/Rtrigo/sin_2PI.con}
 554  \item \url{cic:/Coq/Reals/Rtrigo/sin_PI.con}
 555  \item \url{cic:/Coq/Reals/Rtrigo/sin_PI2.con}
 556  \item \url{cic:/Coq/Reals/Rtrigo_calc/sin3PI4.con}
 557  \item \url{cic:/Coq/Reals/Rtrigo_calc/sin_2PI3.con}
 558  \item \url{cic:/Coq/Reals/Rtrigo_calc/sin_3PI2.con}
 559  \item \url{cic:/Coq/Reals/Rtrigo_calc/sin_5PI4.con}
 560  \item \url{cic:/Coq/Reals/Rtrigo_calc/sin_PI3.con}
 561  \item \url{cic:/Coq/Reals/Rtrigo_calc/sin_PI3_cos_PI6.con}
 562  \item \url{cic:/Coq/Reals/Rtrigo_calc/sin_PI4.con}
 563  \item \url{cic:/Coq/Reals/Rtrigo_calc/sin_PI6.con}
 564  \item \url{cic:/Coq/Reals/Rtrigo_calc/sin_PI6_cos_PI3.con}
 565  \item \url{cic:/Coq/Reals/Rtrigo_calc/sin_cos5PI4.con}
 566  \item \url{cic:/Coq/Reals/Rtrigo_calc/sin_cos_PI4.con}
 567  \item \url{cic:/Coq/Reals/Rtrigo_def/sin_0.con}
 568 \end{enumerate}
 569
 570 Previous statements semantics is reasonably clear by their names; \whelp,
 571 however, also performs in-line expansion of the statements, and provides
 572 hyperlinks to the corresponding proofs (see Fig.~\ref{fig:match}).
 573
 574 \begin{figure}[tb]
 575   \centerline{\includegraphics[width=0.9\textwidth]{match_sin}}
 576   \caption{\label{fig:match} \match{} results.}
 577 \end{figure}
 578
 579 \subsection{\hint}
 580 \label{sec:hint}
 581 In a process of backward construction of a proof, typical of
 582 proof assistants, one is often interested in knowing what theorems
 583 can be applied to derive the current goal. The \hint{} operation of
 584 \whelp{} is exactly meant to this purpose.
 585
 586 Formally,
 587 given a goal $g$ and a theorem
 588 $t_1\to t_2\to\cdots\to t_n\to t$, the problem consists in checking
 589 if there exists a substitution $\theta$ such that:
 590
 591 \begin{equation}
 592  t\theta = g
 593  \label{eq:hint1}
 594 \end{equation}
 595
 596 A necessary condition for (\ref{eq:hint1}),
 597 which provides a very efficient filtering of the solution space,
 598 is that the set of constants in $t$ must be a subset of those in
 599 $g$. In terms of our metadata model, the problem consists
 600 to find all s such as
 601
 602 \begin{equation}
 603  \{x | Ref(s,x) \} \subseteq A
 604  \label{eq:hint2}
 605 \end{equation}
 606
 607 where $A$ is the set of constants in $g$.
 608 This is not a simple operation: the naive
 609 approach would require to iterate, for every possible source $s$, the
 610 computation of its forward references, i.e. of $\{x | Ref(s,x) \}$,
 611 followed by a set comparison with $A$.
 612
 613 The solution of~\cite{metadata2} is based on the following remarks.
 614 Let us call $Card(s)$ the cardinality of $ \{x | Ref(s,x) \}$, which
 615 can be pre-computed for every $s$. Then, (\ref{eq:hint2}) holds if and only if
 616 there is a subset $A'$ of $A$ such that $A' = \{x | Ref(s,x) \}$, or
 617 equivalently:
 618 \[  A' \subseteq \{x | Ref(s,x) \} \wedge  |A'| = Card(s) \]
 619 and finally:
 620 \[  \bigwedge_{a \in A'} Ref(s,a) \wedge  |A'| = Card(s) \]
 621 The last one is a simple join that can be efficiently computed by any
 622 relational database. So the actual cost is essentially bounded
 623 by the computation of all subsets of $A$, and $A$, being the signature
 624 of a formula, is never very large (and often quite small).
 625
 626 The problem of matching against a large library of heterogeneous
 627 notions is however different and far more complex than in a
 628 traditional theorem proving setting, where one typically works with
 629 respect to a given theory with a fixed, and usually small signature.
 630 If e.g. we look for theorems whose conclusion matches some kind
 631 of equation like $e_1 = e_2$ we shall eventually find in the library
 632 \emph{a lot} of injectivity results relative to operators we are
 633 surely not interested in: in a library of 40,000 theorems like the
 634 one of Coq we would get back about 3,000 of such \emph{silly
 635 matches}. Stated in other words, canonical indexing techniques
 636 specifically tailored on the matching problem such as
 637 discrimination trees~\cite{McCune},
 638 substitution trees~\cite{Graf} or context trees~\cite{Ganzinger}
 639 are eventually doomed to fail in a mathematical knowledge management
 640 context where one cannot assume a preliminary knowledge on term signatures.
 641
 642 On the other side, the metadata approach is much more flexible,
 643 allowing a simple integration of matching operation with additional
 644 and different constraints. For instance in the version of
 645 \emph{hint} described in~\cite{metadata1} the problem of reducing
 646 the number of \emph{silly matches} was solved by requiring at
 647 least a minimal intersection between the signatures
 648 of the two matching terms. However, this approach did sometimes
 649 rule out some interesting answers. In the current version the
 650 problem has been solved imposing further constraints of the full
 651 signature of the term (in particular on the hypothesis),
 652 essentially filtering out all solutions that would extend the
 653 signature of the goal. The actual implementation of this approach
 654 requires a more or less trivial extension to hypothesis of the
 655 methodology described in~\cite{metadata2}.
 656
 657 \subsection{\elim}
 658 \label{sec:elim}
 659
 660 Most statements in the \coq{} knowledge base concern properties of functions and
 661 relations over algebraic types. Proofs of such statements are carried out
 662 by structural induction over these types. In particular, to prove a goal by
 663 induction over the type $t$, one needs to apply a lemma stating an induction
 664 principle over $t$ (an \emph{eliminator} of $t$ \cite{mcbride}) to that
 665 goal. Since many different eliminators can be provided for the same type $t$
 666 (either automatically generated from $t$, or set up by the user), it is
 667 convenient to have a way of retrieving all the eliminators of a given type.  The
 668 \elim{} query of \whelp{} does this job.\footnote{Of course it is possible to
 669 let the author state what are the elimination principles, storing this
 670 information as metadata accessible to \whelp{}. The technique we present
 671 consists in automatically guessing the intended usage of a theorem from
 672 its shape. Moreover, this is the only possible technique for legacy libraries
 673 that lack classification information.}
 674 To understand how it works, let's
 675 take the case of the algebraic type of the natural numbers: one feeds \whelp{}
 676 with the identifier $\NAT$, which denotes this type in the knowledge base, and
 677 expects to find at least the well-known induction principle $\natind$:
 678
 679 \[ \forall P:\NAT \to \Prop.(P~0) \to (\forall n:\NAT.(P~n) \to (P~(S~n))) \to
 680    \forall n:\NAT.(P~n) \]
 681
 682 A fairly good approximation of this statement is based on the following
 683 observations: the first premise ($\NAT \to \Prop$) has an antecedent, a
 684 reference to $\Prop$ in its root and a reference to $\NAT$; the forth premise
 685 has no antecedents and a reference to $\NAT$ in its root; the conclusion
 686 contains a bound variable in its root (i.e. $P$).
 687 Notice that we choose not to approximate the major premises of $\natind$ (the
 688 second and the third) because they depend on the structure of $\NAT$ and
 689 discriminate the different induction principles over this type.
 690
 691 Thus, a set of constraints approximating $\natind$ is the following
 692 (recall that \emph{Rel} stands for an arbitrary bound variable):
 693 \begin{center}
 694 \begin{tabular}{|l|l|}
 695 \hline
 696 \METAHEADING
 697 $\Prop$  & MH(1)  \\\hline
 698 $\NAT$ &  H  \\\hline
 699 $\NAT$ & MH(0) \\\hline
 700 \emph{Rel} & MC \\\hline
 701 \end{tabular}
 702 \end{center}
 703
 704 The \elim{} query of \whelp{} simply generalizes this scheme substituting
 705 $\NAT$ for a given type $t$ and retrieving any statement $c$ such that:
 706 \[ \REF{\mathit{MH}(1)}{c}{\Prop} \land \REF{\mathit{H}}{c}{t} \land
 707    \REF{\mathit{MH}(0)}{c}{t} \land
 708    \REF{\mathit{MC}}{c}{Rel}
 709 \]
 710
 711 In the case of $\NAT$, \elim{} returns $47$ statements ordered by the
 712 frequency of their use in the library (as expected $\natind$
 713 is the first one).
 714
 715 %More sophisticated approximations of $c$ are possible, but they require a
 716 %greater amount of computation over the metadata set indexing the knowledge
 717 %base.
 718
 719 \subsection{\locate}
 720 \label{sec:locate}
 721
 722 \whelp's \locate{} query implements a simple ``lookup by name'' for library
 723 notions. Once fed with an identifier $i$, \locate{} returns the list of all
 724 objects whose name is $i$. Intuitively, the \emph{name} of an object contained
 725 in library is the name chosen for it by its author.\footnote{In the current
 726 implementation object names correspond to the last fragment of object URIs,
 727 without extension.
 728 %For example, the name of the definition of natural numbers
 729 %\url{cic:/Coq/Init/Datatypes/nat.ind} is \texttt{nat}.
 730 }
 731
 732 This list is obtained querying the specific relational metadata
 733 $\mathit{Name}(c,i)$ that binds each unit of knowledge $c$ to an identifier $i$
 734 (its \emph{name}). Unix-shell-like wildcards can be used to specify an
 735 incomplete identifier: all objects whose name matches the incomplete identifier
 736 are returned.
 737
 738 Even if not based on the metadata model described in Sect.~\ref{sec:metadata},
 739 \locate{} turns out to be really useful to browse the library since quite often
 740 one remembers the name of an object, but not the corresponding contribution.
 741
 742 \begin{example}
 743 By entering the name $\mathit{gcd}$, \locate{} returns four different versions
 744 of the ``greatest common divisor'':
 745
 746 \begin{enumerate}
 747  \small
 748  \item \url{cic:/Orsay/Maths/gcd/gcd.ind#xpointer(1/1)}
 749  \item \url{cic:/Eindhoven/POCKLINGTON/gcd/gcd.con}
 750  \item \url{cic:/Sophia-Antipolis/Bertrand/Gcd/gcd.con}
 751  \item \url{cic:/Sophia-Antipolis/Rsa/Divides/gcd.con}
 752 \end{enumerate}
 753
 754 %These are four versions of the ``greatest common divisor'':
 755 %the first refers to a relation, the second is a function taking two integer
 756 %numbers, and the last two refer to functions taking two natural numbers.
 757 \end{example}
 758
 759 \section{Web Interface}
 760 \label{sec:interface}
 761
 762 The result of \whelp{} is, for all queries, a
 763 list of URIs (unique identifiers) for notions in the library. This list is not
 764 particularly informative for the user, who would like to have
 765 hyperlinks or, even better, in-line expansion of the notions.
 766
 767 In the \mowgli{} project we developed a service to render on the fly,
 768 via a complex chain of XSLT transformations, mathematical objects encoded
 769 in XML (those objects of the library of Coq that \whelp{} is indexing).
 770 XSLT is the standard presentational language for XML documents and an
 771 XSLT transformation (or \emph{stylesheet}) is a purely functional program
 772 written in XSLT that describes a simple transformation between two XML formats.
 773
 774 The service can also render \emph{views} (misleadingly called ``theories''
 775 in~\cite{annals}), that is an arbitrary, structured collection of mathematical
 776 objects, suitably assembled (by an author or some mechanical tool) for
 777 presentational purposes.
 778 In a view, definitions, theorems, and so on may be intermixed
 779 with explanatory text or figures, and statements are expanded
 780 without proofs: a link to the corresponding proof objects
 781 allows the user to inspect proofs, if desired.
 782
 783 Providing \whelp{} with an
 784 appealing user interface for presenting the answers (see Fig.~\ref{fig:match})
 785 has been as simple as making it generate a view and pipelining \whelp{}
 786 with \UWOBO,\footnote{\url{http://helm.cs.unibo.it/software/uwobo/}}
 787 the component of our architecture that implements the rendering service.
 788
 789 \UWOBO{} is a stylesheet manager implemented in
 790 \OCAML\footnote{\url{http://caml.inria.fr/}} and based on \LIBXSLT
 791 %\footnote{Even
 792 %\UWOBO{} has largely evolved in recent years: the version described
 793 %in~\cite{annals} was implemented in Java and based on Xalan!  \LIBXSLT{}
 794 %(\url{http://xmlsoft.org/XSLT/}) is now the fastest available implementation of
 795 %XSLT.}
 796 , whose main functionality is the application of a list of stylesheets
 797 (each one with the respective list of parameters) to a document.  The
 798 stylesheets are pre-compiled to improve performance. Both stylesheets and the
 799 document are identified using HTTP URLs and can reside on any host. \UWOBO{} is
 800 both a Web server and a Web client, accepting processing requests and asking for
 801 the document to be processed. \whelp{} is a Web server, accepting queries as
 802 processing requests and returning views to the client.
 803
 804 The \whelp{} interface is thus simply organized as a HTTP pipeline (see
 805 Fig.~\ref{fig:http}).
 806
 807 \begin{figure}[!ht]
 808   \centerline{\includegraphics[width=0.7\textwidth]{architecture_new}}
 809   \caption{\label{fig:http} \whelp's HTTP pipeline.}
 810 \end{figure}
 811
 812
 813 \section{Conclusions}
 814 \label{sec:conclusions}
 815
 816 \whelp{} is the Web searching helper developed at the University
 817 of Bologna as a part of the European Project IST-2001-33562 \mowgli.
 818 HELP is also the acronym of the four operations currently supported
 819 by the system: Hint, Elim, Locate and Pattern-matching.
 820
 821 Much work remains to be done, spanning from relatively simple
 822 technical improvements, to more complex architectural re-visitations
 823 concerning the indexing technique and the design and implementation
 824 of the queries.
 825
 826 Among the main technical improvements which we plan to support in
 827 a near future there are:
 828 \begin{enumerate}
 829 \item the possibility to confine the search to
 830 sub-libraries, and in particular to the standard library alone (this
 831 is easy due to the paths of names);
 832 \item skipping the annoying dialog phase with the user during
 833 disambiguation for the choice of the intended interpretation, replacing it
 834 with a direct investigation of all possibilities;
 835 \item interactive support for advanced queries, allowing the user
 836 to directly manipulate the metadata constraints (very powerful,
 837 if properly used).
 838 \end{enumerate}
 839
 840 The current indexing politics has some evident limitations, resulting
 841 in unexpected results of queries.
 842
 843 The most annoying problem is due to the current management of \coq{} variables.
 844 Roughly, in \coq, \emph{variables} are meant for \emph{declarations}, while
 845 \emph{constants} are meant for \emph{definitions}.
 846 The current XML-exportation module of \coq{} \cite{exportation}
 847 does not discharge section
 848 variables, replacing this operation with an explicit substitution mechanism;
 849 in particular, variables may be instantiated and their status look more
 850 similar to local variables than to constants. For this reason, variables
 851 have not been indexed; that currently looks as a mistake.
 852
 853 A second problem is due to coercions. The lack of an explicit
 854 mechanism for composition of coercions tends to clutter the terms
 855 with long chains of coercions, which in case of big algebraic developments
 856 as e.g. C-Corn \cite{C-Corn}, can easily reach about ten elements. The fact
 857 that an Object $r$ refers a coercion $c$ contains very little information,
 858 especially if coercions typically come in a row, as in Coq. In the future,
 859 we plan to skip coercions during indexing.
 860
 861 Notice that the two previous problems, both logic dependent, only affect
 862 metadata extraction and not the metadata model that is logic independent.
 863
 864 The final set of improvements concerns the queries. A major issue, for
 865 all kinds of content based operations, is to take care, at some extent,
 866 of delta reduction. For instance, in Coq, the elementary order relations over
 867 natural numbers are defined in terms of the \emph{less or equal} relation,
 868 that is a suitable inductive type. Every query concerning such relations
 869 could be thus reduced to a similar one about the \emph{less or equal} relation
 870 by delta reduction. Even more appealing it looks the possibility to
 871 extend the queries up to equational rewriting of the input (via
 872 equations available in the library).\footnote{The possibility of
 873 considering search up to isomorphism (see \cite{iso1,iso2})
 874 looks instead less interesting, because
 875 our indexing policy is an interesting surrogate that works very
 876 well in practice while being much simpler than search up to isomorphisms.}
 877
 878 Similarly, the \hint{} operation, could and should be improved by suitably
 879 tuning the current politics for computing the \emph{intended} signature
 880 of the search space (for instance, ``closing'' it by delta reduction or
 881 rewriting, adding constructors of inductive types, and so on).
 882
 883 Different kind of queries could be designed as well. An obvious
 884 generalization of \hint{} is a \auto{}, automatically attempting
 885 to solve a goal by repeated applications of theorems of the library
 886 (a deeper exploration of the search space could be also useful
 887 for a better rating of the \hint{} results).
 888
 889 A more interesting example of content-based query that exploits the higher
 890 order nature of the input syntax is to look for all mathematical objects
 891 providing examples, or instances, of a given notion. For instance
 892 we may define the following property, asserting the commutativity of
 893 a generic function f
 894 \[is\_commutative := \lambda A:Set.\lambda f:A\to A \to A.
 895 \forall x,y:A. (f\;x\;y) = (f\;y\;x)\]
 896 Then, an \instance{} query should be able to retrieve from the library
 897 all commutative operations which have been defined.\footnote{\match{} is
 898 in fact a particular case of \instance{} where the initial sequence of
 899 lambda abstractions is empty.}
 900
 901 To conclude, we remark that the only two components of \whelp{} that are
 902 dependent on CIC, the logic of the Coq system, are the disambiguator for the
 903 user input and the metadata extractor.
 904 Moreover, the algorithm used for disambiguation depends only on the existence of
 905 a refiner for mathematical formulae extended with placeholders and the metadata
 906 model is logic independent. Thus \whelp{}
 907 can be easily modified to index and search other mathematical libraries provided
 908 that the statements of the theorems can be easily parsed (to extract the
 909 metadata) and that there exists a service to render the results of the queries.
 910 In particular, the Mizar development team has just released version 7.3.01 of
 911 the Mizar proof assistant that provides both a native XML format and XSLT
 912 stylesheets to render the proofs. Thus it is now possible to instantiate
 913 \whelp{} to work on the library of Mizar, soon making possible a direct
 914 comparison on the field between \whelp{} and MML Query, the new search engine
 915 for Mizar articles described in \cite{mizarbrowsing,mizarsearchengine}.
 916 Another interesting comparison is with the approach described in
 917 \cite{cairns}, which has been tested on the Mizar library.
 918
 919 \begin{thebibliography}{}
 920
 921  \bibitem{edinburgh} A.~Asperti, F.~Guidi, L.~Padovani, C.~Sacerdoti Coen,
 922   I.~Schena. \emph{The Science of Equality: some statistical considerations on
 923   the Coq library}. Mathematical Knowledge Management Symposium, 25-29 November
 924   2003,  Heriot-Watt University, Edinburgh, Scotland.
 925
 926  \bibitem{annals} A.~Asperti, F.~Guidi, L.~Padovani, C.~Sacerdoti Coen,
 927   I.~Schena. \emph{Mathematical Knowledge Management in HELM}. Annals of
 928   Mathematics and Artificial Intelligence, 38(1): 27--46; May 2003.
 929
 930  \bibitem{metadata2} A. Asperti, M. Selmi. \emph{Efficient Retrieval of
 931   Mathematical Statements}. In Proceeding of the Third International Conference
 932   on Mathematical Knowledge Management, MKM 2004. Bialowieza, Poland. LNCS 3119.
 933
 934  \bibitem{pechino} A.Asperti, B.Wegner. \emph{An Approach to
 935   Machine-Understandable Representation of the Mathematical Information in
 936   Digital Documents}.  In: Fengshai Bai and Bernd Wegner (eds.): Electronic
 937   Information and Communication in Mathematics, LNCS vol. 2730, pp. 14--23, 2003
 938
 939  \bibitem{mizarsearchengine} G. Bancerek, P. Rudnicki. \emph{Information
 940   Retrieval in MML}. In A.Asperti, B.Buchberger,J.Davenport (eds),
 941   Proceedings of the Second International Conference on
 942   Mathematical Knowledge Management, MKM 2003. LNCS, 2594.
 943
 944  \bibitem{mizarbrowsing} G. Bancerek, J.Urban. \emph{Integrated Semantic
 945   Browsing of the Mizar Mathematical Repository}. In: A.Asperti,
 946   G.Bancerek, A.Trybulec (eds.), Proceeding of the Third
 947   International Conference on Mathematical Knowledge
 948   Management, Springer LNCS 3119.
 949
 950  \bibitem{cairns} P.Cairns. \emph{Informalising Formal Mathematics: Searching
 951   the Mizar Library with Latent Semantics}. In Proceeding of the Third
 952   International Conference on Mathematical Knowledge Management, MKM 2004.
 953   Bialowieza, Poland.  LNCS 3119.
 954
 955  \bibitem{coq} The Coq proof-assistant, \url{http://coq.inria.fr}
 956
 957  \bibitem{C-Corn} L. Cruz-Filipe, H. Geuvers, F. Wiedijk, ``C-CoRN, the
 958   Constructive Coq Repository at Nijmegen''.  In: A.Asperti,
 959   G.Bancerek, A.Trybulec (eds.), Proceeding of the Third
 960   International Conference on Mathematical Knowledge
 961   Management, Springer LNCS 3119, 88-103, 2004.
 962
 963  \bibitem{iso1}
 964   D.Delahaye, R. Di Cosmo. Information Retrieval in a Coq Proof Library using
 965   Type Isomorphisms. In Proceedings of TYPES 99, L\"okeberg. Springer-Verlag
 966   LNCS, 1999.
 967
 968  \bibitem{iso2}
 969   R. Di Cosmo. \emph{Isomorphisms of Types: from Lambda Calculus to
 970   Information Retrieval and Language Design}, Birkhauser, 1995,
 971   IBSN-0-8176-3763-X.
 972
 973  %\bibitem{diffeq} D. Draheim, W. Neun, D. Suliman. \emph{Classifying
 974  % Differential Equations on the Web}. In Proceeding of the Third
 975  % International Conference on Mathematical Knowledge Management, MKM 2004. LNCS,
 976  % 3119.
 977
 978  \bibitem{Ganzinger} H. Ganzinger, R. Nieuwehuis, P. Nivela. \emph{Fast Term
 979   Indexing with Coded Context Trees. Journal of Automated Reasoning}. To
 980   appear.
 981
 982  \bibitem{Graf} P.Graf. Substitution Tree Indexing. Proceedings of the 6th RTA
 983   Conference, Springer-Verlag LNCS 914, pp. 117-131, Kaiserlautern, Germany,
 984   April 4-7, 1995.
 985
 986  \bibitem{metadata1} F. Guidi, C. Sacerdoti Coen. \emph{Querying Distributed
 987   Digital Libraries of Mathematics}. In Proceedings of Calculemus 2003, 11th
 988   Symposium on the Integration of Symbolic Computation and Mechanized Reasoning.
 989   Aracne Editrice.
 990
 991  \bibitem{mcbride} C. McBride. Dependently Typed Functional Programs and their
 992   Proofs. Ph.D. thesis, University of Edinburgh, 1999.
 993
 994  \bibitem{McCune} W. McCune. \emph{Experiments with discrimination tree indexing
 995   and path indexing for term retrieval}. Journal of Automated Reasoning,
 996   9(2):147-167, October 1992.
 997
 998  \bibitem{munoz}C. Munoz. A Calculus of Substitutions for Incomplete-Proof
 999   Representation in Type Theory. Ph.D. thesis, INRIA, 1997.
1000
1001  \bibitem{exportation} C. Sacerdoti Coen. \emph{From Proof-Assistans to
1002   Distributed Libraries of Mathematics: Tips and Pitfalls}.
1003   In Proc. Mathematical Knowledge Management 2003, Lecture Notes in Computer
1004   Science, Vol. 2594, pp. 30--44, Springer-Verlag.
1005
1006  \bibitem{disambiguation} C. Sacerdoti Coen, S. Zacchiroli. \emph{Efficient
1007   Ambiguous Parsing of Mathematical Formulae}. In Proceedings of the Third
1008   International Conference on Mathematical Knowledge Management, MKM 2004. LNCS,
1009   3119.
1010
1011 \end{thebibliography}
1012
1013 \end{document}
1014