3 <title>MOWGLI - A New Approach for the Content Description in Digital
5 <link rel="stylesheet" href="../../../style/mowgli.css" type="text/css" />
8 <h1 style="text-align: center">MOWGLI - A New Approach for the Content Description in Digital Documents</h1>
10 <h2 style="text-align: center">Andrea Asperti, University of Bologna, and Bernd Wegner, TU Berlin</h2>
15 <div style="font-style: italic">
16 <p>The acronym MOWGLI stands for "Mathematics On the Web: Get it by Logic and
17 Interfaces". MOWGLI is a European project funded by the European Community
18 in the "Information Society Technologies" (IST) Programme. The partners are
19 the University of Bologna, INRIA (Rocquencourt), the German Research Centre
20 for Artificial Intelligence (DFKI, Saarbruecken), the Katholieke Universiteit
21 Nijmegen, the Max Planck Institute for Gravitational Physics (Albert Einstein
22 Institute, Golm), Trusted Logic (Paris) and TU Berlin.</p>
24 <p>The aim of the project is the study and the development of a technological
25 infrastructure for the creation and maintenance of a virtual, distributed,
26 hypertextual library of mathematical knowledge based on a content description
27 of the information. Currently, almost all mathematical documents available on
28 the Web are marked up only for presentation, severely crippling the
29 potentialities for automation, interoperability, sophisticated searching
30 mechanisms, intelligent applications, transformation and processing. The goal
31 of MOWGLI is to overcome these limitations, passing from a machine-readable to
32 a machine-understandable representation of the information, and developing the
33 technological infrastructure for its exploitation.</p>
35 <p>The project deals with problems traditionally belonging to different
36 scientific communities: digital libraries, Web publishing, automation of
37 mathematics and computer aided reasoning. Any serious solution to the complex
38 problem of mathematical knowledge management needs a co-ordinated effort of
39 all these groups and a synergy of their different expertise. MOWGLI attempts
40 to build a solid co-operation environment between these communities. The
41 current paper will concentrate on the aspects related to digital libraries.</p>
45 <h2>1. Aims and mission of MOWGLI</h2>
47 <p>After a ten-year period of electronic publishing in mathematics, we are still
48 confronted with slightly enhanced electronic versions of printed publications.
49 Almost all mathematical documents available on the Web are marked up only for
50 presentation, if such an enhancement is available at all. Only a minority of
51 documents try to care about some of the potentialities for automation,
52 interoperability, sophisticated searching mechanisms, intelligent
53 applications, transformation and processing. But these approaches could be
54 considered as first preliminary steps towards an electronic document providing
55 all these facilities. Hence, the goal of MOWGLI is to overcome these
56 limitations, passing from a machine-readable to a machine-understandable
57 representation of the information, and developing the technological
58 infrastructure for its exploitation.</p>
60 <p>In order to reach this goal MOWGLI has to deal with problems traditionally
61 belonging to different scientific communities: digital libraries, Web
62 publishing, automation of mathematics and computer aided reasoning. To our
63 knowledge, MOWGLI is the first attempt to build a solid co-operation
64 environment between these communities. In principle, any serious approach for
65 providing good tools for mathematical knowledge management needs a
66 co-ordinated effort of several partners from the above-mentioned communities
68 and a synergy of their different expertise. The choice of partners for the project took this condition
69 into account, as can be seen below.</p>
71 <p>The goals of MOWGLI largely overlap with the aims of the so-called "Semantic
72 Web" <a href="#14">[14]</a>.
73 Associating meaning with content or establishing a layer of machine
74 understandable data will allow automated agents, sophisticated search engines
75 and interoperable services and will enable a higher degree of automation and
76 more intelligent applications. The ultimate goal of the Semantic Web is to
77 allow machines to share and exploit knowledge in the Web way, i.e. without
78 central authority, with few basic rules, in a scalable, adaptable, extensible
79 manner. However, the actual development of the Semantic Web and its
80 technologies has been hindered so far by the lack of large scale, distributed
81 repositories of structured, content oriented information. The case of
82 mathematical knowledge, the most rigorous and condensed form of knowledge, is
83 paradigmatic. The World Wide Web is already now the largest single resource of
84 mathematical knowledge, and its importance will hopefully be increased by
85 emerging display technologies like MathML.</p>
87 <p>Machine understandable information will make it possible to offer added-value
90 <li>Preservation of the real informative content in a highly structured and
91 machine understandable format, suitable for transformation, automatic
92 elaboration and processing.</li>
93 <li>Cut and paste on the level of computation (take the output from a Web
94 search engine and paste it into a computer algebra system).</li>
95 <li>Automatic proof checking of published proofs.</li>
96 <li>Semantic search for mathematical concepts (rather than keywords).</li>
97 <li>Indexing and Classification.</li>
101 <p>Due to its rich notational, logical and semantic structure, mathematical
102 knowledge is a main case study for the development of the new generation of
103 semantic Web systems. The aim of the MOWGLI project is both to help in this
104 process and to pave the way towards a really useful virtual, distributed,
105 hypertextual resource for the working mathematician, scientist or engineer.</p>
108 <h2>2. Standards and Tools</h2>
110 <p>Current standards for electronic publishing in mathematics are mainly
111 presentation oriented. New tools for the management and publishing of
112 mathematical documents are in development, such as MathML
113 <a href="#3">[3]</a>, OpenMath and OMDoc
114 (<a href="#17">[17]</a>, <a href="#18">[18]</a>), and are being integrated with various
115 XML technologies <a href="#7">[7]</a> (XSLT <a href="#8">[8]</a>, RDF
116 <a href="#4">[4]</a>, <a href="#5">[5]</a>, SOAP <a href="#6">[6]</a>, ...).
117 All these languages cover different and orthogonal
118 aspects of the information and its management; our aim is not to propose a new
119 standard, but to study and to develop the technological infrastructure
120 required for taking advantage of the potentialities of all current
121 standards and those which are likely to be established in the near future.</p>
123 <p>MOWGLI makes essential use of standard XML technology and aspires to
124 become an example of "best practice" in its use, and a pioneering
125 project in the new area of the Semantic Web <a href="#12">[12]</a>.
126 In particular, the potentialities of
127 XML will be deeply explored in the following directions:
129 <li>Publishing. XML offers sophisticated publishing technologies (Stylesheets,
130 MathML, SVG, etc.) which can be profitably used to solve, in a standard way,
131 the annoying notational problems that traditionally afflict content based and
132 machine-understandable encodings of the information.</li>
133 <li>Searching and Retrieving. Metadata will play a major role in MOWGLI. New
134 W3C languages such as the Resource Description Framework or XML Query are
135 likely to produce major innovative solutions in this field.</li>
136 <li>Interoperability. Having a common, machine understandable layer is a
137 major and essential step in this direction.</li>
138 <li>Distribution. All XML technology is ultimately aimed at accessing the Web
139 as a single, distributed resource, with no central authority and few, simple
144 <p>MathML <a href="#3">[3]</a>, introducing for the first time a content markup
146 with a presentational one, has indubitably been a pioneering project towards
147 the mining of the mathematical treasure available on the web. Still, its
148 limitations are evident as well:
150 <li>MathML is merely focused on mathematical expressions. However, in order to
151 bring the idea of a Semantic Web of Mathematics to its full potentialities,
152 other layers of mathematical information must be considered as well. In
153 particular, we need a clean, microscopic description of proofs, a markup for
154 mathematical "objects" (Theorems, Lemmas, Corollaries, Examples, etc.), a
155 markup for "structured collections" of these objects (Documents, Theories,
156 etc.), possibly "functors" between these collections, and finally a good
157 "metadata" layer.</li>
158 <li>MathML is just an (important) piece in a much wider technological puzzle.
159 Passing from content to a good presentational format requires sophisticated
160 operations; on the other hand, these transformations are themselves a basic
161 component of the whole mathematical knowledge (like mathematical fonts). XSLT
162 <a href="#8">[8]</a> provides here the right technology, opening the way to
163 the creation of well maintained and documented libraries of mathematical
164 stylesheets <a href="#11">[11]</a>.</li>
168 <p>Similarly, the creation and maintenance of the library as a distributed
169 repository, and the crucial aspect of managing the information in the "web
170 way" require a light but powerful communication protocol, overcoming some of
171 the limitations of HTTP (SOAP <a href="#6">[6]</a> looks like a promising
174 <p>Metadata will eventually require a fairly sophisticated model, much beyond
175 what is currently offered by typical metadata models such as the Dublin Core system
176 <a href="#1">[1]</a>. Here, RDF (Resource Description Framework)
177 (<a href="#4">[4]</a>, <a href="#5">[5]</a>) looks like the right
178 framework for developing the model, providing a general architectural model
179 for expressing metadata and a precise syntax for the encoding and interchange
180 of these metadata over the Web.</p>
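<p>To make the contrast with flat Dublin Core records concrete, the following is a minimal sketch of the RDF idea: metadata as (subject, predicate, object) statements that can also express typed relations between mathematical objects. It is an illustration only, not the MOWGLI metadata model. The <code>urn:example:</code> identifiers, the <code>dependsOn</code> property and the helper functions are assumptions made up for this example; only the Dublin Core element namespace is taken from the Dublin Core specification.</p>

```python
# Minimal in-memory sketch of RDF-style metadata: a set of
# (subject, predicate, object) statements. Illustration only.

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# Hypothetical identifiers and vocabulary, invented for this example.
EX = "urn:example:vocab/"
THEOREM = "urn:example:theorems/plus_comm"
LEMMA = "urn:example:lemmas/plus_n_O"

triples = {
    (THEOREM, DC + "title", "Commutativity of addition"),
    (THEOREM, DC + "creator", "A. Author"),
    # A typed relation between two objects, beyond flat descriptive fields.
    (THEOREM, EX + "dependsOn", LEMMA),
    (LEMMA, DC + "title", "Right identity of addition"),
}


def match(s=None, p=None, o=None):
    """Return all statements matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def titles_of_dependencies(subject):
    """Follow dependsOn links from subject and collect the targets' titles."""
    deps = [o for (_, _, o) in match(s=subject, p=EX + "dependsOn")]
    return [o for d in deps for (_, _, o) in match(s=d, p=DC + "title")]


print(titles_of_dependencies(THEOREM))
```

<p>A query such as <code>titles_of_dependencies</code> follows a relation between resources, which is the kind of question a purely descriptive record per document cannot answer directly.</p>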
182 <p>Encoding the microscopic, logical level of mathematics as well opens
183 the possibility to have completely formalised subsystems of the library
184 (<a href="#9">[9]</a>,<a href="#10">[10]</a>), which could be checked
185 automatically by standard tools for the
186 automation of formal reasoning and the mechanisation of mathematics (proof
187 assistants and logical frameworks
188 (<a href="#15">[15]</a>, <a href="#16">[16]</a>)). At the same time, any of these
189 tools could be used as an authoring system for documents of the library, by
190 simply exporting their internal libraries into XML, and using stylesheets to
191 transform the output into a standard, machine-understandable representation,
192 such as MathML content markup or OpenMath. In MOWGLI we shall use the Coq
193 Proof Assistant of INRIA <a href="#13">[13]</a> as a paradigmatic example of
194 these applications.</p>
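<p>As a rough illustration of the step from a content encoding to a presentational one, the sketch below converts a small MathML content expression into MathML presentation markup. In MOWGLI this role is played by XSLT stylesheets; plain Python with the standard <code>xml.etree.ElementTree</code> module is used here only to keep the example self-contained. The function name <code>to_presentation</code> and the operator table are assumptions of this sketch, namespaces are omitted, and operator precedence and bracketing are deliberately ignored.</p>

```python
import xml.etree.ElementTree as ET

# Content markup for a + (b - c); MathML namespace omitted for brevity.
CONTENT = (
    "<apply><plus/><ci>a</ci>"
    "<apply><minus/><ci>b</ci><ci>c</ci></apply>"
    "</apply>"
)

# Hypothetical table: content operator element -> infix symbol.
OPERATORS = {"plus": "+", "minus": "-"}


def to_presentation(node):
    """Translate a content MathML node into a presentation MathML node."""
    if node.tag == "ci":  # identifier
        out = ET.Element("mi")
        out.text = node.text
        return out
    if node.tag == "cn":  # number
        out = ET.Element("mn")
        out.text = node.text
        return out
    if node.tag == "apply":
        # The first child is the operator, the rest are its arguments.
        op, *args = list(node)
        row = ET.Element("mrow")
        for i, arg in enumerate(args):
            if i > 0:
                mo = ET.SubElement(row, "mo")
                mo.text = OPERATORS[op.tag]
            row.append(to_presentation(arg))
        return row
    raise ValueError("unsupported element: " + node.tag)


presentation = to_presentation(ET.fromstring(CONTENT))
print(ET.tostring(presentation, encoding="unicode"))
```

<p>The content form records which operator is applied to which arguments, while the presentation form only records how symbols are laid out in a row; the reverse translation is therefore not available in general.</p>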
196 <p>An alternative route for the creation of content-based mathematical
197 information from standard digital repositories by means of a suitable
198 LaTeX-based authoring system will be explored by the Albert Einstein
199 Institute. It publishes the "Living Reviews in Relativity"
200 <a href="#2">[2]</a>, a purely
201 electronic journal on the Web, which provides refereed, regularly updated
202 review articles on all areas of gravitational physics. AEI will develop a
203 LaTeX-based authoring tool interfacing with MOWGLI, and serve as a showcase to
204 demonstrate how content markup in mathematics improves the usability and
205 information depth of electronic science journals.</p>
208 <h2>3. A minimal technological infrastructure</h2>
210 <p>It is clear that the creation and maintenance of large repositories of
211 content-based mathematical knowledge can only be conceived as a cooperative
212 and distributed process, comprising not only the creation of documents, but
213 also libraries of notational rules, metadata and management tools. The crucial
214 point is to build a minimal infrastructure to start up this process, so that
215 more and more tools can be added by interested parties. All these
216 considerations lead to two requirements for the developments in MOWGLI:
218 <li>Information must be accessible with few basic rules and no central
219 authority (the web way).</li>
220 <li>Make extensive use of standard XML technology and tools, even when it would
221 be easier or more efficient just to develop an ad-hoc solution.</li>
225 <p>In this way, we put no barrier to third party development and, every time a
226 standard technology or tool is improved, we can simply benefit from the new
227 implementation with minimal effort.</p>
229 <p>The MOWGLI architecture is essentially based on three components:
230 distribution sites, standard browsers and plug-ins, and active components,
231 such as XSLT processors, to elaborate the information. Distribution sites are
232 simply HTTP and FTP servers, widespread throughout the world; user browsers
233 are HTTP clients and run on the user host. We do not require any other
234 components to run on a specific host. Active components must provide answers
235 to browsers, requiring an HTTP server interface; they must also request data from
236 distribution sites, acting as HTTP clients. Hence, MOWGLI is essentially
237 conceived as an HTTP pipeline.</p>
239 <p>The client module of the distribution sites is the "getter", which maps URIs
240 to URLs and hence to documents, offering functionalities similar to the APT
241 package management system
242 (<a href="http://www.debian.org">http://www.debian.org</a>).</p>
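<p>The mapping performed by the getter can be sketched as a lookup in an ordered table of mirrors, in the spirit of a package manager's source list. This is a minimal sketch under stated assumptions: the <code>lib:/</code> and <code>theory:/</code> URI schemes, the mirror hosts, the <code>.xml</code> suffix and the function name <code>resolve</code> are all hypothetical and are not taken from the MOWGLI implementation.</p>

```python
# Hypothetical mirror table: (URI prefix, base URL), tried in order.
MIRRORS = [
    ("lib:/", "http://mirror-a.example.org/library/"),
    ("lib:/", "ftp://mirror-b.example.org/pub/library/"),
    ("theory:/", "http://mirror-a.example.org/theories/"),
]


def resolve(uri, available=lambda url: True):
    """Map a location-independent URI to the first usable URL.

    `available` stands in for a real reachability check (for example an
    HTTP HEAD request); by default every mirror is assumed to be up.
    """
    for prefix, base in MIRRORS:
        if uri.startswith(prefix):
            url = base + uri[len(prefix):] + ".xml"
            if available(url):
                return url
    raise LookupError("no distribution site serves " + uri)


print(resolve("lib:/algebra/groups"))
# Pretend mirror-a is down: the getter falls back to the next mirror.
print(resolve("lib:/algebra/groups",
              available=lambda url: "mirror-a" not in url))
```

<p>Because documents refer to each other by URI only, a site can be added or removed by editing the table, without touching any document.</p>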
244 <p>The main active component is the XSLT stylesheet manager, whose typical
245 functionality is the application of a list of stylesheets (each one with the
246 respective list of parameters) to a document. However, other components may be
247 added in a completely modular way. This is exactly the content-based
248 architectural design of future web systems enabled by XML technology.</p>
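<p>The typical functionality of the stylesheet manager, applying a list of stylesheets with their respective parameters to a document, can be sketched as a simple fold over the list. Since the Python standard library has no XSLT processor, ordinary functions stand in for stylesheets here; <code>rename_tag</code>, <code>set_attribute</code>, <code>apply_stylesheets</code> and the sample document are hypothetical names introduced for this illustration only.</p>

```python
import xml.etree.ElementTree as ET


def rename_tag(doc, old, new):
    """Stand-in 'stylesheet': rename every element called `old` to `new`."""
    for el in list(doc.iter(old)):
        el.tag = new
    return doc


def set_attribute(doc, tag, name, value):
    """Stand-in 'stylesheet': set an attribute on every `tag` element."""
    for el in list(doc.iter(tag)):
        el.set(name, value)
    return doc


def apply_stylesheets(doc, steps):
    """Apply each (stylesheet, parameters) pair in order to the document."""
    for stylesheet, params in steps:
        doc = stylesheet(doc, **params)
    return doc


source = ET.fromstring("<Theorem><statement>a+b=b+a</statement></Theorem>")
steps = [
    (rename_tag, {"old": "Theorem", "new": "div"}),
    (set_attribute, {"tag": "div", "name": "class", "value": "theorem"}),
]
result = ET.tostring(apply_stylesheets(source, steps), encoding="unicode")
print(result)
```

<p>Each step only sees the output of the previous one, so further transformations can be appended to the list without changing the manager itself.</p>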
251 <h2>4. The contributions from the participants</h2>
253 <p>The concrete background for the work in MOWGLI is represented by the
254 activities at the participating institutions. Although details can easily be
255 obtained from the MOWGLI web page
256 (<a href="http://mowgli.cs.unibo.it">http://mowgli.cs.unibo.it</a>), some short
257 remarks on this background should be made here.</p>
259 <p>The Department of Computer Science at the University of Bologna is the only
260 educational institution in Italy to be affiliated to W3C. It is responsible for the
261 coordination of the project. The HELM project (Hypertextual Electronic Library of Mathematics,
263 <a href="http://www.cs.unibo.it/helm">http://www.cs.unibo.it/helm</a>, see also
264 <a href="#12">[12]</a>) has been active in
265 Bologna since 1999. It is one of the systems of reference mentioned in the
266 previous section.</p>
268 <p>INRIA (Institut National de Recherche en Informatique et en Automatique) is a
269 French institution located in Rocquencourt. It pursues two projects of
270 importance for MOWGLI: the Lemme project, introducing and developing formal
271 methods for use in writing scientific computing software, and the LogiCal
272 project, which developed the Coq proof assistant (see
273 <a href="#13">[13]</a>).</p>
275 <p>The German Research Center for Artificial Intelligence (DFKI) is based in
276 Kaiserslautern and Saarbruecken. Its main mission is technology transfer, i.e.
277 to move innovations in Artificial Intelligence from the lab to the market
278 place. Its main MOWGLI-related prototypical product so far has been the
279 Web-based learning environment ActiveMath that integrates several external services.</p>
281 <p>The Subfaculteit Informatica of Katholieke Universiteit Nijmegen has
282 broad experience in logic, formal methods and theorem proving. It is
283 involved in several research activities in this domain, such as the EC-sponsored
284 network "TYPES", the FTA project (Fundamental Theorem of Algebra) and the EC
285 working group Calculemus, which also deals with OpenMath, among others.</p>
287 <p>The role of the Albert Einstein Institute (MPG, Golm) near Potsdam has already been
288 described above. It provides a test bed with the Living Reviews, which
289 will represent the important link to the domain of mathematical publishing.
290 This is also the main concern of the partner TU Berlin, which is formally
291 associated with AEI and takes care of exploitation and information dissemination
294 <p>Trusted Logic makes the group complete. This is a French start-up company
295 that offers efficient and secure solutions for smart cards and
296 terminals in a wide range of areas. Its development methodology includes a
297 permanent concern for quality and security aspects.</p>
299 <p>As is common for projects like MOWGLI, the cooperation between the partners
300 is regulated by workpackages and a time schedule for the deliveries. However, the
301 project formally started only in March 2002. Hence these plans are still theory,
302 and it will be the subject of the next report on MOWGLI to describe how theory
303 was put into practice.</p>
306 <h2>BIBLIOGRAPHY</h2>
309 <dt><a name="1"></a>[1]</dt>
310 <dd>The Dublin Core Metadata Initiative. <a href="http://purl.org/dc/">http://purl.org/dc/</a></dd>
312 <dt><a name="2"></a>[2]</dt>
313 <dd>Living Reviews in Relativity.
314 <a href="http://www.livingreviews.org">http://www.livingreviews.org</a>.</dd>
316 <dt><a name="3"></a>[3]</dt>
317 <dd>Mathematical Markup Language (MathML) 2.0 W3C Recommendation, 21 February
318 2001. <a href="http://www.w3.org/TR/MathML2/">http://www.w3.org/TR/MathML2/</a>.
321 <dt><a name="4"></a>[4]</dt>
322 <dd>Resource Description Framework (RDF) Model and Syntax Specification, W3C
323 Recommendation 22 February 1999.
324 <a href="http://www.w3.org/TR/1999/REC-rdf-syntax-19990222">http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/</a></dd>
326 <dt><a name="5"></a>[5]</dt>
327 <dd>Resource Description Framework (RDF) Schema Specification 1.0, W3C
328 Candidate Recommendation 27 March 2000.
329 <a href="http://www.w3.org/TR/rdf-schema/">http://www.w3.org/TR/rdf-schema/</a></dd>
331 <dt><a name="6"></a>[6]</dt>
332 <dd>SOAP Version 1.2 Part 0: Primer. W3C Working Draft 17 December 2001.
333 <a href="http://www.w3.org/TR/2001/WD-soap12-part0-20011217">http://www.w3.org/TR/2001/WD-soap12-part0-20011217</a>.</dd>
335 <dt><a name="7"></a>[7]</dt>
336 <dd>Extensible Markup Language (XML) Specification. Version 1.0. W3C
337 Recommendation, 10 February 1998.
338 <a href="http://www.w3.org/TR/REC-xml">http://www.w3.org/TR/REC-xml</a>
341 <dt><a name="8"></a>[8]</dt>
342 <dd>XSL Transformations (XSLT). Version 1.0, W3C Recommendation, 16 November
343 1999. <a href="http://www.w3.org/TR/xslt">http://www.w3.org/TR/xslt</a>.</dd>
345 <dt><a name="9"></a>[9]</dt>
346 <dd>Asperti, A.; Padovani, L.; Sacerdoti Coen C.; Schena, I.: Formal
347 Mathematics in MathML. Proceedings of the First International Conference on
348 MathML and Math on the Web, October 20-21 2000, University of Illinois at Urbana-Champaign.</dd>
350 <dt><a name="10"></a>[10]</dt>
351 <dd>Asperti, A.; Padovani, L.; Sacerdoti Coen, C.; Schena, I.: Formal
352 Mathematics on the Web. Proceedings of the Eighth International Conference on
353 Libraries and Associations in the Transient World: New Technologies and New
354 Forms of Cooperation, June 9-17, 2001, Sudak, Autonomous Republic of Crimea, Ukraine.</dd>
356 <dt><a name="11"></a>[11]</dt>
357 <dd>Asperti, A.; Padovani, L.; Sacerdoti Coen, C.; Schena, I.: XML,
358 Stylesheets and the re-mathematization of Formal Content. Proceedings of
359 Extreme Markup Languages 2001 Conference, August 12-17, 2001, Montreal, Canada.</dd>
361 <dt><a name="12"></a>[12]</dt>
362 <dd>Asperti, A.; Padovani, L.; Sacerdoti Coen, C.; Schena, I.: HELM and the
363 semantic Math-Web. Proceedings of the 14th International Conference on Theorem
364 Proving in Higher Order Logics (TPHOLS 2001), 3-6 September 2001,
365 Edinburgh, Scotland.</dd>
367 <dt><a name="13"></a>[13]</dt>
368 <dd>B. Barras et al.: The Coq Proof Assistant Reference Manual, version 6.3.1,
369 <a href="http://pauillac.inria.fr/coq">http://pauillac.inria.fr/coq</a></dd>
371 <dt><a name="14"></a>[14]</dt>
372 <dd>Tim Berners-Lee: The Semantic Web. W3C Architecture Note, 1998. </dd>
374 <dt><a name="15"></a>[15]</dt>
375 <dd>G. Huet, G. Plotkin (eds): Logical Frameworks. Cambridge University</dd>
378 <dt><a name="16"></a>[16]</dt>
379 <dd>G. Huet, G. Plotkin (eds): Logical Environments. Cambridge University
382 <dt><a name="17"></a>[17]</dt>
383 <dd>Kohlhase, M.: OMDoc: Towards an Internet Standard for the Administration,
384 Distribution and Teaching of Mathematical Knowledge. Proceedings of Artificial
385 Intelligence and Symbolic Computation, Springer LNAI, 2000. </dd>
387 <dt><a name="18"></a>[18]</dt>
388 <dd>Kohlhase, M.: OMDoc: An Infrastructure for OpenMath Content Dictionary
389 Information. Bulletin of the ACM Special Interest Group for Algorithmic
390 Mathematics SIGSAM, 2000.</dd>
394 Prof. Dr. Andrea Asperti<br />
395 Dipartimento di Scienze dell'Informazione<br />
396 Universita degli Studi di Bologna<br />
397 Via di mura Anteo Zamboni VII<br />
398 I - 40127 Bologna<br />
403 Prof. Dr. Bernd Wegner<br />
404 Fakultaet II, Institut fuer Mathematik<br />
405 TU Berlin, Sekr. MA 8-1<br />
406 Strasse des 17. Juni 135<br />
407 D - 10623 Berlin<br />