<?xml version="1.0" encoding="UTF-8"?>
<page xmlns="http://lambdadelta.info/"
description = "\lambda\delta home page: Open Symbolic Notation"
title = "\lambda\delta home page: Open Symbolic Notation"
head = "Open Symbolic Notation"
Open Symbolic Notation, abbreviated OSN,
is an easy and flexible data-interchange text format
intended for the lightweight representation of
generic abstract syntax trees in the domain of formal languages.
In order to meet these design goals, OSN pursues the following features.
<list><item class="red-mark"><style class="alpha">
<link to="https://en.wikipedia.org/wiki/S-expression">Symbolic expressions</link>
based on widely accepted syntactical conventions
provide for a <notice text="lightweight"/> and <notice text="generic"/> grammar,
which is both <notice text="easy for machines to process"/>
and <notice text="easy for humans to understand"/>.
As a means to support <notice text="efficient"/> information processing,
OSN aims at an economical representation of data,
contrary to <link to="http://www.w3.org/TR/2008/REC-xml-20081126/#sec-origin-goals">XML design goal 10</link>.
Compared to other data-interchange formats based on symbolic expressions,
like <link to="http://people.csail.mit.edu/rivest/Sexp.txt">canonical symbolic expressions</link>,
representing arbitrary data in binary format is a secondary concern in the design of OSN,
as is support for canonicalization.
Indeed, these features fall outside the scope of OSN,
which targets the data structures of <notice text="formal languages"/>.
</style></item></list>
<list><item class="blue-mark"><style class="alpha">
Optionally <link to="https://en.wikipedia.org/wiki/Namespace">qualified</link> symbolic expressions
allow OSN texts to mix data from different domains while preserving their own semantics,
because name conflicts can be avoided.
As a consequence, OSN documents are <notice text="easy to extend"/> in that
domain-specific OSN applications can work as expected even if
data from different domains is added to the text they process.
</style></item></list>
<list><item class="green-mark"><style class="alpha">
The <link to="https://en.wikipedia.org/wiki/ASCII">US-ASCII</link> character set,
extended to <link to="http://www.utf-8.com/">UTF-8</link> in
free-form text strings for the convenience of human readers,
makes OSN documents <notice text="easy to visualize and transport"/> over communication media.
The design of OSN aims at supporting <notice text="application-independent"/> standard encodings.
</style></item></list>
<section6 name="grammar">Grammar</section6>
An OSN text uses the <link to="http://www.utf-8.com/">UTF-8</link> encoding
and consists of the following seven tokens, which we define in a common EBNF variant.
Characters not starting a token are not allowed.
Those in the range U+0021 ... U+007E are ! # $ % &amp; * / ? @ \ ^ | ~
and are available for extensions of OSN.
This token can represent the identifiers and the numerical constants of most programming languages:
<prod of="symbol"/> <def/>
<plus/> <prod of="symbol-char"/>
<prod of="symbol-char"/> <def/>
<str2 of="+"/> <or/> <str2 of="-"/> <or/> <str2 of="."/> <or/>
<str2 of="0"/> <etc/> <str2 of="9"/> <or/>
<str2 of="A"/> <etc/> <str2 of="Z"/> <or/>
<str2 of="_"/> <or/> <str2 of="`"/> <or/>
<str2 of="a"/> <etc/> <str2 of="z"/>
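The symbol production above maps directly onto a character class. The following is a minimal sketch in Python, not part of OSN itself; the names `SYMBOL_RE` and `is_symbol` are hypothetical and chosen only for illustration.

```python
import re

# symbol-char as listed in the production above: + - . 0-9 A-Z _ ` a-z
SYMBOL_RE = re.compile(r"[+\-.0-9A-Z_`a-z]+")


def is_symbol(text: str) -> bool:
    """Return True when the whole string is a single OSN symbol token."""
    return SYMBOL_RE.fullmatch(text) is not None
```

Under this sketch, both an identifier such as `foo_bar` and a numerical constant such as `-1.5e+3` are accepted as symbols, while a string containing `:` or a space is not.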
This token contains free-form text with commonly accepted escape sequences:
<prod of="string"/> <def/>
<str2 of="&quot;"/> <and/>
<prod of="string-char"/> <or/>
<str2 of="\"/> <prod of="escape"/>
<prod of="string-char"/> <def/>
<xchr of="0"/> <etc/> <xchr of="10FFFF"/>
<close/> <but/> <open/>
<xchr of="0"/> <etc/> <xchr of="1F"/> <or/>
<str1 of="'"/> <or/> <str2 of="\"/> <or/>
<str2 of="&quot;"/> <or/> <xchr of="7F"/>
<prod of="escape"/> <def/>
<plus/> <prod of="space"/> <or/>
<str2 of="&quot;"/> <or/> <str1 of="'"/> <or/>
<str2 of="("/> <or/> <str2 of=")"/> <or/>
<str2 of="0"/> <or/> <str2 of="\"/> <or/>
<str2 of="a"/> <or/> <str2 of="b"/> <or/>
<str2 of="f"/> <or/> <str2 of="n"/> <or/>
<str2 of="r"/> <or/> <str2 of="t"/> <or/>
<str2 of="u"/> <and/> <spec of="4"/> <prod of="hex"/>
<str2 of="x"/> <and/> <spec of="2"/> <prod of="hex"/>
<prod of="space"/> <def/>
<xchr of="9"/> <etc/> <xchr of="D"/> <or/>
<prod of="hex"/> <def/>
<str2 of="0"/> <etc/> <str2 of="9"/> <or/>
<str2 of="A"/> <etc/> <str2 of="F"/> <or/>
<str2 of="a"/> <etc/> <str2 of="f"/>
This token is a widely used alternative to the former token:
<prod of="string-alt"/> <def/>
<str1 of="'"/> <and/>
<prod of="string-char"/> <or/>
<str2 of="&quot;"/> <or/>
<str2 of="\"/> <prod of="escape"/>
This token separates the qualifiers of a symbolic expression:
<prod of="sep"/> <def/> <str2 of=":"/> <stop/>
This token starts a compound symbolic expression:
<prod of="open"/> <def/>
<str2 of="("/> <or/> <str2 of="&lt;"/> <or/> <str2 of="["/> <or/> <str2 of="{"/>
This token ends a compound symbolic expression:
<prod of="close"/> <def/>
<str2 of=")"/> <or/> <str2 of="&gt;"/> <or/> <str2 of="]"/> <or/> <str2 of="}"/>
This token is ignored and separates the other tokens:
<prod of="gap"/> <def/>
<prod of="space"/> <or/>
<str2 of=","/> <or/> <str2 of=";"/> <or/> <str2 of="="/>
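The seven tokens defined above can be recognized with one regular expression per token. The sketch below is an illustration under stated assumptions, not a reference tokenizer: the names `TOKEN_RE` and `tokenize` are hypothetical, U+0020 is assumed to count as a space in addition to U+0009 ... U+000D, double-quoted strings accept only the string-char and escape alternatives, and an escape is accepted as a backslash followed by any single character without checking it against the escape production.

```python
import re

# string-char: any character except U+0000-U+001F, ', \, " and U+007F
STRING_CHAR = r"[^\x00-\x1f\x7f'\"\\]"
# Assumption: escapes are accepted loosely as a backslash plus one character.
ESCAPE = r"\\[\s\S]"

TOKEN_RE = re.compile("|".join([
    r"(?P<gap>[\t-\r ,;=]+)",  # U+0020 as a space is an assumption
    r"(?P<symbol>[+\-.0-9A-Z_`a-z]+)",
    r'(?P<string>"(?:' + STRING_CHAR + "|" + ESCAPE + r')*")',
    r"(?P<string_alt>'(?:" + STRING_CHAR + r'|"|' + ESCAPE + r")*')",
    r"(?P<sep>:)",
    r"(?P<open>[(<\[{])",
    r"(?P<close>[)>\]}])",
]))


def tokenize(text):
    """Yield (kind, lexeme) pairs for an OSN text, dropping gap tokens."""
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            # Characters not starting a token are not allowed.
            raise ValueError(f"no token starts at offset {pos}: {text[pos]!r}")
        if match.lastgroup != "gap":
            yield match.lastgroup, match.group()
        pos = match.end()
```

Because gaps are dropped during tokenization, a later parsing stage only ever sees the six significant token kinds.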
The grammar of OSN is very liberal by design.
Spaces of the form <ebnf><plus/> <prod of="gap"/></ebnf> can appear between any pair of tokens.
<prod of="text"/> <def/>
<star/> <prod of="q-expr"/>
A qualified symbolic expression:
<prod of="q-expr"/> <def/>
<prod of="symbol"/> <and/> <plus/> <prod of="sep"/>
An unqualified symbolic expression:
<prod of="expr"/> <def/>
<prod of="symbol"/> <or/>
<prod of="string"/> <or/>
<prod of="string-alt"/> <or/>
<prod of="open"/> <and/>
<prod of="text"/> <and/>
<section1 name="semantics">Semantics</section1>
moreover, the escape sequences \x &lt;two hexadecimal digits&gt; and \u &lt;four hexadecimal digits&gt;
allow a character to be specified by its code point <newline/>
finally, the escape sequences \( for U+0002 and \) for U+0003 are available
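The escape sequences described above can be decoded in a single pass over the string body. This is a minimal sketch in Python that handles only the four sequences whose meaning is stated here (\x, \u, \( and \)); every other escape is left unchanged, since its meaning is not given in this passage. The names `decode_escapes`, `_ESCAPE_RE` and `_PAREN` are hypothetical.

```python
import re

# A backslash followed by x + 2 hex digits, u + 4 hex digits, or any one character.
_ESCAPE_RE = re.compile(r"\\(?:x([0-9A-Fa-f]{2})|u([0-9A-Fa-f]{4})|(.))", re.DOTALL)
# \( stands for U+0002 and \) stands for U+0003, as stated above.
_PAREN = {"(": "\u0002", ")": "\u0003"}


def decode_escapes(body: str) -> str:
    """Decode \\xHH, \\uHHHH, \\( and \\) in a string body; keep other escapes."""
    def replace(match):
        hex_digits = match.group(1) or match.group(2)
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        char = match.group(3)
        # Escapes not covered by this sketch are returned as written.
        return _PAREN.get(char, match.group(0))

    return _ESCAPE_RE.sub(replace, body)
```

Matching every backslash together with the character after it keeps an escaped backslash from being misread as the start of a following \x or \u sequence.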
<section5 name="implementation">Implementation</section5>