X-Git-Url: http://matita.cs.unibo.it/gitweb/?a=blobdiff_plain;f=helm%2Fpapers%2Fwhelp%2Fmain.tex;fp=helm%2Fpapers%2Fwhelp%2Fmain.tex;h=0000000000000000000000000000000000000000;hb=85747dc6d0578b484544bb8120aad7aa89813f27;hp=69515dffe279e6b353de009d4a3dbf03b0a95db6;hpb=c1986639552e01334a05db4236627a6c1ffacf21;p=helm.git diff --git a/helm/papers/whelp/main.tex b/helm/papers/whelp/main.tex deleted file mode 100644 index 69515dffe..000000000 --- a/helm/papers/whelp/main.tex +++ /dev/null @@ -1,548 +0,0 @@ -\documentclass[runningheads,a4paper]{article} -\pagestyle{headings} -\usepackage{graphicx} -\usepackage{amssymb} -\usepackage{hyperref} - -\newcommand{\andreaEmail}{asperti@cs.unibo.it} -\newcommand{\zackEmail}{zacchiro@cs.unibo.it} -\newcommand{\moogle}{Moogle} -\newcommand{\IR}{\ensuremath{\mathcal{R}}} - -\title{Searching mathematics on the Web:\\ -state of the art and future developments} - -\author{ -\begin{tabular}{c@{\hspace{2em}}c} - Andrea Asperti & Stefano Zacchiroli \\ - \href{mailto:\andreaEmail}{\texttt{\andreaEmail}} & - \href{mailto:\zackEmail}{\texttt{\zackEmail}} -\end{tabular}\\[2em] -Department of Computer Science, University of Bologna\\ -Mura Anteo Zamboni, 7 -- 40127 Bologna, ITALY} - -\begin{document} -\maketitle - -\begin{abstract} - A huge amount of mathematical knowledge is nowadays available on the World Wide - Web. Many different solutions and technologies for searching that knowledge - have been developed as well. We present the state of the art of searching - mathematics on the Web, giving some insight into future developments in this - area. -\end{abstract} - -\section{Introduction} -The World Wide Web has become one of the main resources used by mathematicians -in their everyday work. Its usefulness is not limited to browsing the web pages of fellow researchers, -universities, or research projects. 
A full range of mathematical -services is available on the web as well, ranging from electronic libraries of -mathematics to communities of distributed agents, implemented as web services, -able to cooperate in order to solve a given mathematical problem. - -Searching such a huge number of mathematical sources is a particularly -complex problem, given the variety of different users with completely -different needs, and the heterogeneity of the mathematical information -and its possible encodings. - -The different kinds of queries may be roughly categorized into three -main groups: -\begin{description} -\item[Bibliographic searches] this is the most traditional kind of query, -aimed at retrieving a document given its author, title, date of publication, -a list of keywords or similar information. A typical query could be e.g. -{\em give me a listing of all articles written by Karl Weierstrass on - the subject of analytic functions}. -\item[Mathematical services] in this case the user is typically interested -in {\em solving} a problem with the help of some mathematical tool, or a -combination of them. A typical query in this context could be the request -for a web service able to establish the primality of a number given as input -by the user. -\item[Content based searches] the third and probably most ambitious -category of queries comprises those based on the mathematical content of the -information (as opposed to its textual representation). These queries are -aimed at a very fine-grained analysis of the repository, looking e.g. -for all documents stating something about the expression -$\cos(z) + i\,\sin(z)$, where of course $z$ has to be understood as a -universally quantified variable whose actual name is thus irrelevant -(try with Google!). -\end{description} - -In this paper we discuss in more detail the above categories of queries, -presenting the state of the art and the main research directions. 
Particular -attention will be devoted to {\em mathematical services} and, especially, {\em -content based searches}, these being the most innovative and peculiar kinds of queries -for mathematical repositories. - -\section{Bibliographic searches} -\label{sec:bibliographic} - -Most of the printed human knowledge available all over the world is stored -in libraries. The problem of how to search those libraries in an efficient -manner has been traditionally solved by the combined use of metadata (data about -the documents themselves) and indexes. -In this case, mathematical documents do not substantially differ from -other kinds of documents and standard knowledge -management and indexing techniques can be profitably applied. -From each document of the library a set -of metadata is extracted and several orthogonal indexes are created on top of -them referencing the physical locations of the actual documents. Searches -performed using those indexes are called \emph{bibliographic searches}. - -Metadata could include information about document title, author, editor, -classification and so on. The classification field is of particular interest -for -searches since it defines a taxonomy over human knowledge: related documents -are likely to share a common classification. - -Since the beginning, several classifications have been developed by librarians, -the most widespread being the Dewey Decimal Classification~\cite{dewey} in which -``Natural sciences \& mathematics'' has been assigned the decimal number 500. Since -Dewey's classification is too coarse for properly classifying mathematical documents, other -classifications have been developed by mathematicians, the most widespread being -MSC 2000~\cite{msc2000}, the Mathematics Subject Classification scheme -maintained by the American Mathematical Society. 
In this classification, for -example, 35E15 is a subtopic of partial differential equations (35-XX) with -constant coefficients (35Exx); documents with that classification are about -the initial value problem (35E15). - -The arrival of the web era has increased the accessibility of mathematical -libraries -and improved the expressivity of queries, but has not (yet?) radically changed -the way in which bibliographic searches are performed. - -Zentralblatt MATH~\cite{zmath}, coordinated by FIZ Karlsruhe, is the most -prominent project in this area, being the longest running indexing and -abstracting service in pure and applied mathematics available on the web. It -classifies more than 2,000,000 entries according to MSC 2000. Searches are -possible via various fields, like author, title, document type, MSC -classification, year of publication and so on. Boolean combinations of the -various fields for fine-grained searches are possible as well. - -For each result, the title, abstract and all metadata associated with the -document are shown, and other useful actions are possible, such as browsing -related documents (same MSC 2000 classification) and on-line ordering of the -printed document. - -While Zentralblatt is the most relevant indexing and abstracting service, -other -on-line databases of mathematical documents are available offering, on a smaller -scale, similar classification services. Just to name a few of them: -MathSciNet~\cite{mathscinet} by the American Mathematical Society and the -Electronic Research Archive for Mathematics (ERAM)~\cite{eram}. - -\subsection{The European project Euler} -A big improvement in the accessibility of all these electronic libraries has -been brought about by the European Euler project~\cite{euler}. Within -this project a web based gateway, with searching capabilities very similar to -those of Zentralblatt MATH, has been developed. 
Using Euler, users have access -to the catalogues and repositories of mathematical documents of participating -institutions, while the latter keep control over creation and maintenance of -their data. - -Euler is the state of the art of mathematical bibliographic searches on -the web: -a portal offering unified access to documents from Zentralblatt, the CWI -database of the Dutch national research center of mathematics~\cite{cwi}, ERAM -and many more institutions. Once an entry is found, access to the electronic or -printed version of the document is mediated via the web site of the document -owner. - -Euler is currently supported by the European Community as a take-up -Project (n.IST-2000-29445), based on the achievements of the successfully -completed EULER project (FP4 ``Telematics for Libraries'' project LB-5609). - -\subsection{Key phrases and information clouds} -To manage huge scientific archives, metadata are essential, in particular key -phrases, which should come from a large standardized (controlled) and updateable -list. A major problem here is that a perfectly good key phrase for a given chunk -of text may very well simply not occur there (or be so linguistically disguised -that it cannot actually be recognized). Scientists get around this by looking at -the surrounding text. - -This is the idea of an \emph{identification cloud}, which in its simplest form is -just a list of words (possibly with weights) attached to a standard key -phrase and likely to occur in texts dealing with the concepts embodied -by that key phrase. The concept was introduced in the ongoing project Trial -Solution (IST-1999-11397)~\cite{clouds} with promising results. - - -\section{Mathematical services} -\label{sec:webservices} - -In recent years several research efforts have been devoted to the -standardization and the deployment of web services~\cite{webservices}. 
Fitting -naturally into the semantic web~\cite{semanticweb} framework, web services are -software systems designed to support machine-to-machine interaction over a -network (usually the Internet), typically exposing a programming interface based -on the exchange of XML documents~\cite{xml}. - -\subsection{The W3C standardization effort} -The standardization activity of the World Wide Web Consortium in the -area of Web Services is organized into five groups: Web -Services Architecture Working Group, XML Protocol Working Group, Web -Services Description Working Group, Web Services Choreography Working -Group and an auxiliary Coordination Group. The most relevant ones -are: -\begin{description} - - \item[Web Services Description - Working Group] - This group\footnote{\url{http://www.w3.org/2002/ws/desc/}} is - chartered to design an XML based language that should be able to - describe a web service \emph{interface}. This task also includes the - design of web service \emph{messages}, \emph{message exchange - patterns} and \emph{protocol bindings}. - - The group has already released, among other documents, a working draft of WSDL - (Web Services Description Language) 2.0~\cite{wsdl} and an additional document - which describes bindings of this language to other existing technologies like - SOAP, HTTP GET and POST methods, MIME~\cite{wsdlbindings}. - - \item[Web Services Choreography Working Group] - This ``young'' group\footnote{\url{http://www.w3.org/2002/ws/chor/}}, - started in January 2003, is chartered to design a language that is - able to describe choreographies of web services. The intended meaning of a - \emph{Web Service Choreography} is some kind of interaction between - web services. - - One possible use of choreographies is the creation of complex web services - by simply composing simple web services, as we compose functions in mathematics. 
-\end{description} - -\subsection{The Monet project IST-2001-34145} - -Web service technologies could be really effective in solving long-standing -problems of inter-application communication. Still, W3C standards provide only a -framework in which this problem could be solved and do not instantiate the -technologies to specific fields of interest. The road of standardizing and -deploying web service technologies for the special needs of mathematicians has -been taken by the European Community funded Monet project~\cite{monet1,monet2}. -The aim of the project, recently completed, was the development of a framework in -which mathematical web services can describe their capabilities in as much -detail as is necessary to allow a sophisticated software agent to select a -suitable service based on an analysis of the characteristics of a user's -problem. - -Using standards and technologies developed by the Monet project it is possible -to implement what we call a mathematical \emph{functional search}, that is -finding on the network a web service able to solve a given mathematical -\emph{problem}\footnote{Monet takes care of several other aspects of -mathematical web services like client-broker architecture, publishing and -discovery of services, planning, orchestration and so on. We will focus our -discussion on the discovery part of Monet.}. - -Characteristics of mathematical web services in Monet are described in the XML -based language MSDL~\cite{msdl} (Mathematical Service Description Language). An -MSDL document is composed of several parts: classification, implementation -details, service interface and binding descriptions, broker interface and -service metadata. All these data can be used for querying an \emph{Instance -Store} (IS) for available services. - -From the point of view of the mathematician, the most interesting part is the -classification, a specification of \emph{what} the service does. 
Classification -is done along several axes: to each service several classifications can be applied -and each of them can be used in user queries. - -A first classification is done by giving a description of the \emph{mathematical -problems} the service is able to solve. Descriptions include problem inputs, -outputs and pre/post conditions. For example, minimization of a multivariate -function over the real numbers could be described as follows: -\begin{description} - \item[Input] $F: \IR^n\to\IR$ - \item[Output] 1. $x\in\IR^n$;\quad 2. $m\in\IR$ - \item[Post-conditions] 1. $F(x)=m$;\quad 2. $\nexists y\in\IR^n~|~F(y)