helm/mowgli/home/xml/project-objectives.xml

   1 <?xml version="1.0"?>
   2
   3 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   4                "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
   5
   6 <html>
   7  <head>
   8    <title>Project Objectives</title>
   9  </head>
  10  <body>
  11   <h1>Project Objectives</h1>
   12   <p>The new frontier of Content-Based Information Systems is the so-called
   13      "Semantic Web" (see
   14      <a href="publications/others/w3c_bl98.html">others/w3c_bl98</a>).
   15      Associating meaning with content, or establishing a layer of
   16      machine-understandable data, will enable automated agents, sophisticated
   17      search engines and interoperable services, and will allow a higher degree
   18      of automation and more intelligent applications.  The ultimate goal of the
   19      Semantic Web is to allow machines to share and exploit knowledge in the
   20      Web way, i.e. without central authority, with few basic rules, in a
   21      scalable, adaptable, extensible manner.  However, the actual development
   22      of the Semantic Web and its technologies has been hindered so far by the
   23      lack of large-scale, distributed repositories of structured,
   24      content-oriented information. The case of mathematical knowledge, the most
   25      rigorous and condensed form of knowledge, is paradigmatic.  The World Wide
   26      Web is already the largest single resource of mathematical knowledge,
   27      and its importance will be greatly increased by emerging display
   28      technologies such as MathML.  However, almost all mathematical documents
   29      available on the Web are marked up only for presentation (in this respect,
   30      current practice in MathML improves on, but does not fundamentally differ
   31      from, the older paper-oriented markup schemes like LaTeX or PostScript).
   32      A consequence of this is that the online material is machine-readable but
   33      not machine-understandable, which severely limits the possibility of
   34      offering added-value services such as:</p>
  35   <ul>
  36    <li>Preservation of the real informative content in a highly structured and
   37        machine-understandable format, suitable for transformation, automatic
  38        elaboration and processing.</li>
  39    <li>Cut and paste on the level of computation (take the output from a Web
  40        search engine and paste it into a computer algebra system).</li>
  41    <li>Automatic proof checking of published proofs.</li>
   42    <li>Semantic search for mathematical concepts (rather than keywords).</li>
  43    <li>Classification: given a concrete mathematical structure, is there a
  44        general theory for it?</li>
  45   </ul>
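       <p>As an illustrative sketch of the difference between presentation and
          content markup (the expression (a+b)<sup>2</sup> is chosen here as an
          example and is not taken from the project material), the same formula can
          be encoded in both MathML flavours. The presentation encoding records only
          the layout, a parenthesised row with a raised 2, whereas the content
          encoding records the operator tree that a computer algebra system or a
          proof checker could process:</p>
       <pre>
     &lt;!-- presentation markup: how the formula looks --&gt;
     &lt;msup&gt;
       &lt;mrow&gt;&lt;mo&gt;(&lt;/mo&gt;&lt;mi&gt;a&lt;/mi&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mi&gt;b&lt;/mi&gt;&lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt;
       &lt;mn&gt;2&lt;/mn&gt;
     &lt;/msup&gt;

     &lt;!-- content markup: what the formula means --&gt;
     &lt;apply&gt;&lt;power/&gt;
       &lt;apply&gt;&lt;plus/&gt;&lt;ci&gt;a&lt;/ci&gt;&lt;ci&gt;b&lt;/ci&gt;&lt;/apply&gt;
       &lt;cn&gt;2&lt;/cn&gt;
     &lt;/apply&gt;
       </pre>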
   46   <p>Due to its rich notational, logical and semantic structure, mathematical
   47      knowledge is thus a main case study for the development of the new
   48      generation of Semantic Web systems.  The aim of the proposed project is
   49      both to help in this process and to pave the way towards a truly
   50      useful virtual, distributed, hypertextual resource for the working
   51      mathematician, scientist or engineer.  All modern sciences have a
   52      strongly mathematicised core, and will benefit. The real market and
   53      application area for the techniques developed in this project, apart from
   54      the obvious realm of education, lies with high-tech and engineering
   55      corporations that rely on huge formula databases. Currently, both the
   56      content markup and the added-value services alluded to above are
   57      very underdeveloped, limiting the usefulness of this vital knowledge. The
   58      infrastructure and know-how needed for mining this information treasure
   59      and obtaining a competitive edge in development are exactly what we are
   60      attempting to develop in our project.</p>
   61   <p>Several languages have already been proposed for the management of
   62      mathematical information on the Web, for publishing, communication
   63      and archiving purposes: most notably,
   64      <a href="http://www.w3.org/TR/MathML2/">MathML</a>,
   65      <a href="http://www.nag.co.uk/projects/openmath/omsoc/">OpenMath</a> and
   66      <a href="http://www.mathweb.org/omdoc/">OMDoc</a>. Other languages
   67      must also be considered for the definition and specification of metadata,
   68      such as the <a href="http://purl.org/dc/">Dublin Core</a> system or
   69      the Resource Description Framework
   70      [<a href="http://www.w3.org/RDF/">RDF</a>].
   71      All these languages, which tend to cover different and orthogonal aspects
   72      of the management of mathematical documents, must eventually be taken into
   73      account for the ambitious goal of our project. One of our aims is in fact
   74      the definition of a modular architecture that can exploit the
   75      distinctive strengths of each of these languages, integrating
   76      them into a single application.  This integration is
   77      facilitated by the fact that all the languages mentioned are particular
   78      instances of XML, which makes it possible to use standard XML
   79      technology, and in particular XSL Transformations or
   80      stylesheets [<a href="http://www.w3.org/TR/xslt">XSLT</a>], to pass from
   81      one language to the other.</p>
  82
  83   <img border="0" alt="Architecture" src="./../images/arch.gif" />
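       <p>A minimal sketch of how a stylesheet can pass from one language to
          another, assuming for illustration that the source is MathML content
          markup and the target is MathML presentation markup. Only sums,
          identifiers and numbers are handled; the choice of templates and the
          <code>m</code> prefix are assumptions of this example, not the project's
          actual stylesheets:</p>
       <pre>
     &lt;xsl:stylesheet version="1.0"
         xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
         xmlns:m="http://www.w3.org/1998/Math/MathML"&gt;

       &lt;!-- apply whose first child is plus: render operands joined by "+" --&gt;
       &lt;xsl:template match="m:apply[*[1][self::m:plus]]"&gt;
         &lt;m:mrow&gt;
           &lt;!-- skip the operator element, iterate over the operands --&gt;
           &lt;xsl:for-each select="*[position() &gt; 1]"&gt;
             &lt;xsl:if test="position() &gt; 1"&gt;&lt;m:mo&gt;+&lt;/m:mo&gt;&lt;/xsl:if&gt;
             &lt;xsl:apply-templates select="."/&gt;
           &lt;/xsl:for-each&gt;
         &lt;/m:mrow&gt;
       &lt;/xsl:template&gt;

       &lt;!-- content identifier becomes presentation identifier --&gt;
       &lt;xsl:template match="m:ci"&gt;
         &lt;m:mi&gt;&lt;xsl:value-of select="normalize-space(.)"/&gt;&lt;/m:mi&gt;
       &lt;/xsl:template&gt;

       &lt;!-- content number becomes presentation number --&gt;
       &lt;xsl:template match="m:cn"&gt;
         &lt;m:mn&gt;&lt;xsl:value-of select="normalize-space(.)"/&gt;&lt;/m:mn&gt;
       &lt;/xsl:template&gt;

     &lt;/xsl:stylesheet&gt;
       </pre>
       <p>Because each template addresses one construct, further operators or
          user-defined notations could be added as separate templates without
          touching the existing ones.</p>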
  84
   85   <p>Encoding the microscopic, logical level of mathematics as well
   86      opens the possibility of having completely formalised subsystems of the
  87      library, which could be checked automatically by standard tools for the
  88      automation of formal reasoning and the mechanisation of mathematics
  89      (proof assistants and logical frameworks, see
  90      <a href="publications/others/cup_hp91.html">others/cup_hp91</a> and
  91      <a href="publications/others/cup_hp93.html">others/cup_hp93</a>). At
  92      the same time, any of these tools could be used as an authoring system for
   93      documents of the library, simply by exporting its internal library
   94      to XML and using stylesheets to transform the output into a standard,
  95      machine-understandable representation, such as MathML content markup or
  96      OpenMath.</p>
   97   <p>The precise formal content can still be preserved by the machinery of
   98      <a href="http://www.w3.org/TR/xlink/">XLinks</a>. Moreover, stylesheets
   99      can also be used to solve the annoying notational problem that usually
  100      afflicts formal mathematics, providing a simple way of adding
  101      user-defined styles and notations.</p>
 102
  103     <p>So, our approach leads to a natural integration of proof assistant tools
  104        and the Web. In this integration, the emphasis is purely on "content":
  105        we do not try to link the specific applications directly to the Web,
  106        which would be a major mistake for obvious modularity reasons. On the
  107        contrary, we adopt XML as a neutral specification language, and then we
  108        merely work on XML documents, forgetting the underlying application. In
  109        this way, similar software tools can be applied to different logical
  110        dialects, regardless of their concrete nature. Moreover, although a
  111        common representation layer is not the ultimate solution to all
  112        interoperability problems between different applications, it is
  113        a first and essential step in this direction.  Finally, this
  114        "standardisation" process should naturally lead to a substantial
  115        simplification and re-organisation of the current, "monolithic"
  116        architecture of logical frameworks. All the many different and often
  117        loosely connected functionalities of these complex programs (proof
  118        checking, editing, search and consulting, program extraction, and so on)
  119        could be clearly split into more or less autonomous tasks, and could be
  120        developed by different teams, in totally different languages. This is
  121        the new, "content-based" architectural design of future systems.</p>
 122  </body>
 123 </html>