helm/DEVEL/pxp/netstring/mimestring.mli

   1 (* $Id$
   2  * ----------------------------------------------------------------------
   3  *
   4  *)
   5
   6 (**********************************************************************)
   7 (* Collection of auxiliary functions to parse MIME headers            *)
   8 (**********************************************************************)
   9
  10
  11 val scan_header :
  12        ?unfold:bool ->
  13        string -> start_pos:int -> end_pos:int ->
  14          ((string * string) list * int)
  15     (* let params, i2 = scan_header s i0 i1:
  16      *
  17      * DESCRIPTION
  18      *
  19      * Scans the MIME header that begins at position i0 in the string s
  20      * and that must end somewhere before position i1. It is intended
  21      * that in i1 the character position following the end of the body of the
  22      * MIME message is passed.
  23      * Returns the parameters of the header as (name,value) pairs (in
  24      * params), and in i2 the position of the character following
  25      * directly after the header (i.e. after the blank line separating
  26      * the header from the body).
  27      * The following normalizations have already been applied:
  28      * - The names are all in lowercase
  29      * - Newline characters (CR and LF) have been removed (unless
  30      *   ?unfold:false has been passed)
  31      * - Whitespace at the beginning and at the end of values has been
  32      *   removed (unless ?unfold:false is specified)
  33      * The rules of RFC 2047 have NOT been applied.
  34      * The function fails if the header violates the header format
  35      * strongly. (Some minor deviations are tolerated, e.g. it is sufficient
  36      * to separate lines by only LF instead of CRLF.)
  37      *
  38      * OPTIONS:
  39      *
  40      * unfold: If true (the default), folded lines are concatenated and
  41      *   returned as one line. This means that CR and LF characters are
  42      *   deleted and that whitespace at the beginning and the end of the
  43      *   string is removed.
  44      *   You may set ?unfold:false to locate individual characters in the
  45      *   parameter value exactly.
  46      *
  47      * ABOUT MIME MESSAGE FORMAT:
  48      *
  49      * This is the modern name for messages in "E-Mail format". Messages
  50      * consist of a header and a body; the first empty line separates both
  51      * parts. The header contains lines "param-name: param-value" where
  52      * the param-name must begin on column 0 of the line, and the ":"
  53      * separates the name and the value. So the format is roughly:
  54      *
  55      * param1-name: param1-value
  56      * ...
  57      * paramN-name: paramN-value
  58      *
  59      * body
  60      *
  61      * This function wants in i0 the position of the first character of
  62      * param1-name in the string, and in i1 the position of the character
  63      * following the body. It returns as i2 the position where the body
  64      * begins. Furthermore, in 'params' all parameters are returned that
  65      * exist in the header.
  66      *
  67      * DETAILS
  68      *
  69      * Note that parameter values are restricted; you cannot represent
  70      * arbitrary strings. The following problems can arise:
  71      * - Values cannot begin with whitespace characters, because there
  72      *   may be an arbitrary number of whitespaces between the ':' and the
  73      *   value.
  74      * - Values (and names of parameters, too) must only be formed of
  75      *   7 bit ASCII characters. (If this is not enough, the MIME standard
  76      *   knows the extension RFC 2047 that allows that header values may
  77      *   be composed of arbitrary characters of arbitrary character sets.)
  78      * - Header values may be broken into several lines, the continuation
  79      *   lines must begin with whitespace characters. This means that values
  80      *   must not contain line breaks as semantical part of the value.
  81      *   And it may mean that ONE whitespace character is not distinguishable
  82      *   from SEVERAL whitespace characters.
  83      * - Header lines must not be longer than 76 characters. Values that
  84      *   would result into longer lines must be broken into several lines.
  85      *   This means that you cannot represent strings that contain too few
  86      *   whitespace characters.
  87      * - Some gateways pad the lines with spaces at the end of the lines.
  88      *
  89      * This implementation of a MIME scanner tolerates a number of
  90      * deviations from the standard: long lines are not rejected; 8 bit
  91      * values are accepted; lines may be ended only with LF instead of
  92      * CRLF.
  93      * Furthermore, header values are transformed:
  94      * - leading and trailing spaces are always removed
  95      * - CRs and LFs are deleted; it is guaranteed that there is at least
  96      *   one space or tab where CR/LFs are deleted.
  97      * Last but not least, the names of the header values are converted
  98      * to lowercase; MIME specifies that they are case-independent.
  99      *
 100      * COMPATIBILITY WITH THE STANDARD
 101      *
 102      * This function can parse all MIME headers that conform to RFC 822.
 103      * But there may be still problems, as RFC 822 allows some crazy
 104      * representations that are actually not used in practice.
 105      * In particular, RFC 822 allows it to use backslashes to "indicate"
 106      * that a CRLF sequence is semantically meant as line break. As this
 107      * function normally deletes CRLFs, it is not possible to recognize such
 108      * indicators in the result of the function.
 109      *)
 110
 111 (**********************************************************************)
 112
 113 (* The following types and functions allow it to build scanners for
 114  * structured MIME values in a highly configurable way.
 115  *
 116  * WHAT ARE STRUCTURED VALUES?
 117  *
 118  * RFC 822 (together with some other RFCs) defines lexical rules
 119  * how formal MIME header values should be divided up into tokens. Formal
 120  * MIME headers are those headers that are formed according to some
 121  * grammar, e.g. mail addresses or MIME types.
 122  *    Some of the characters separate phrases of the value; these are
 123  * the "special" characters. For example, '@' is normally a special
 124  * character for mail addresses, because it separates the user name
 125  * from the domain name. RFC 822 defines a fixed set of special
 126  * characters, but other RFCs use different sets. Because of this,
 127  * the following functions allow it to configure the set of special characters.
 128  *    Every sequence of characters may be embraced by double quotes,
 129  * which means that the sequence is meant as literal data item;
 130  * special characters are not recognized inside a quoted string. You may
 131  * use the backslash to insert any character (including double quotes)
 132  * verbatim into the quoted string (e.g. "He said: \"Give it to me!\"").
 133  * The sequence of a backslash character and another character is called
 134  * a quoted pair.
 135  *    Structured values may contain comments. The beginning of a comment
 136  * is indicated by '(', and the end by ')'. Comments may be nested.
 137  * Comments may contain quoted pairs. A
 138  * comment counts as if a space character were written instead of it.
 139  *    Control characters are the ASCII characters 0 to 31, and 127.
 140  * RFC 822 demands that MIME headers are 7 bit ASCII strings. Because
 141  * of this, this function also counts the characters 128 to 255 as
 142  * control characters.
 143  *    Domain literals are strings embraced by '[' and ']'; such literals
 144  * may contain quoted pairs. Today, domain literals are used to specify
 145  * IP addresses.
 146  *    Every character sequence not falling in one of the above categories
 147  * is an atom (a sequence of non-special and non-control characters).
 148  * When recognized, atoms may be encoded in a character set different than
 149  * US-ASCII; such atoms are called encoded words (see RFC 2047).
 150  *
 151  * EXTENDED INTERFACE:
 152  *
 153  * In order to scan a string containing a MIME value, you must first
 154  * create a mime_scanner using the function create_mime_scanner.
 155  * The scanner contains the reference to the scanned string, and a
 156  * specification how the string is to be scanned. The specification
 157  * consists of the lists 'specials' and 'scan_options'.
 158  *
 159  * The character list 'specials' specifies the set of special characters.
 160  * These characters are returned as Special c token; the following additional
 161  * rules apply:
 162  *
 163  * - Spaces:
 164  *   If ' ' in specials: A space character is returned as Special ' '.
 165  *       Note that there may also be an effect on how comments are returned
 166  *       (see below).
 167  *   If ' ' not in specials: Spaces are ignored.
 168  *
 169  * - Tabs, CRs, LFs:
 170  *   If '\t' in specials: A tab character is returned as Special '\t'.
 171  *   If '\t' not in specials: Tabs are ignored.
 172  *
 173  *   If '\r' in specials: A CR character is returned as Special '\r'.
 174  *   If '\r' not in specials: CRs are ignored.
 175  *
 176  *   If '\n' in specials: A LF character is returned as Special '\n'.
 177  *   If '\n' not in specials: LFs are ignored.
 178  *
 179  * - Comments:
 180  *   If '(' in specials: Comments are not recognized. The character '('
 181  *       is returned as Special '('.
 182  *   If '(' not in specials: Comments are recognized. How comments are
 183  *       returned, depends on the following:
 184  *       If Return_comments in scan_options: Outer comments are returned as
 185  *           Comment (note that inner comments count but
 186  *           are not returned as tokens)
 187  *       If otherwise ' ' in specials: Outer comments are returned as
 188  *           Special ' '
 189  *       Otherwise: Comments are recognized but ignored.
 190  *
 191  * - Quoted strings:
 192  *   If '"' in specials: Quoted strings are not recognized, and double quotes
 193  *       are returned as Special '"'.
 194  *   If '"' not in specials: Quoted strings are returned as QString tokens.
 195  *
 196  * - Domain literals:
 197  *   If '[' in specials: Domain literals are not recognized, and left brackets
 198  *       are returned as Special '['.
 199  *   If '[' not in specials: Domain literals are returned as DomainLiteral
 200  *       tokens.
 201  *
 202  * Note that the rule for domain literals is completely new in netstring-0.9.
 203  * It may cause incompatibilities with previous versions if '[' is not
 204  * special.
 205  *
 206  * The general rule for special characters: Every special character c is
 207  * returned as Special c, and any additional scanning functionality
 208  * for this character is turned off.
 209  *
 210  * If recognized, quoted strings are returned as QString s, where
 211  * s is the string without the embracing quotes, and with already
 212  * decoded quoted pairs.
 213  *
 214  * Control characters c are returned as Control c.
 215  *
 216  * If recognized, comments may either be returned as spaces (in the case
 217  * you are not interested in the contents of comments), or as Comment tokens.
 218  * The contents of comments are not further scanned; you must start a
 219  * subscanner to analyze comments as structured values.
 220  *
 221  * If recognized, domain literals are returned as DomainLiteral s, where
 222  * s is the literal without brackets, and with decoded quoted pairs.
 223  *
 224  * Atoms are returned as Atom s where s is a longest sequence of
 225  * atomic characters (all characters which are neither special nor control
 226  * characters nor delimiters for substructures). If the option
 227  * Recognize_encoded_words is on, atoms which look like encoded words
 228  * are returned as EncodedWord tokens. (Important note: Neither '?' nor
 229  * '=' must be special in order to enable this functionality.)
 230  *
 231  * After the mime_scanner has been created, you can scan the tokens by
 232  * invoking scan_token which returns one token at a time, or by invoking
 233  * scan_token_list which returns all following tokens.
 234  *
 235  * There are two token types: s_token is the base type and is intended to
 236  * be used for pattern matching. s_extended_token is a wrapper that
 237  * additionally contains information where the token occurs.
 238  *
 239  * SIMPLE INTERFACE
 240  *
 241  * Instead of creating a mime_scanner and calling the scan functions,
 242  * you may also invoke scan_structured_value. This function returns the
 243  * list of tokens directly; however, it is restricted to s_token.
 244  *
 245  * EXAMPLES
 246  *
 247  * scan_structured_value "user@domain.com" [ '@'; '.' ] []
 248  *   = [ Atom "user"; Special '@'; Atom "domain"; Special '.'; Atom "com" ]
 249  *
 250  * scan_structured_value "user @ domain . com" [ '@'; '.' ] []
 251  *   = [ Atom "user"; Special '@'; Atom "domain"; Special '.'; Atom "com" ]
 252  *
 253  * scan_structured_value "user(Do you know him?)@domain.com" [ '@'; '.' ] []
 254  *   = [ Atom "user"; Special '@'; Atom "domain"; Special '.'; Atom "com" ]
 255  *
 256  * scan_structured_value "user(Do you know him?)@domain.com" [ '@'; '.' ]
 257  *     [ Return_comments ]
 258  *   = [ Atom "user"; Comment; Special '@'; Atom "domain"; Special '.';
 259  *       Atom "com" ]
 260  *
 261  * scan_structured_value "user (Do you know him?) @ domain . com"
 262  *     [ '@'; '.'; ' ' ] []
 263  *   = [ Atom "user"; Special ' '; Special ' '; Special ' '; Special '@';
 264  *       Special ' '; Atom "domain";
 265  *       Special ' '; Special '.'; Special ' '; Atom "com" ]
 266  *
 267  * scan_structured_value "user (Do you know him?) @ domain . com"
 268  *     [ '@'; '.'; ' ' ] [ Return_comments ]
 269  *   = [ Atom "user"; Special ' '; Comment; Special ' '; Special '@';
 270  *       Special ' '; Atom "domain";
 271  *       Special ' '; Special '.'; Special ' '; Atom "com" ]
 272  *
 273  * scan_structured_value "user @ domain . com" [ '@'; '.'; ' ' ] []
 274  *   = [ Atom "user"; Special ' '; Special '@'; Special ' '; Atom "domain";
 275  *       Special ' '; Special '.'; Special ' '; Atom "com" ]
 276  *
 277  * scan_structured_value "user(Do you know him?)@domain.com" ['@'; '.'; '(']
 278  *     []
 279  *   = [ Atom "user"; Special '('; Atom "Do"; Atom "you"; Atom "know";
 280  *       Atom "him?)"; Special '@'; Atom "domain"; Special '.'; Atom "com" ]
 281  *
 282  * scan_structured_value "\"My.name\"@domain.com" [ '@'; '.' ] []
 283  *   = [ QString "My.name"; Special '@'; Atom "domain"; Special '.';
 284  *       Atom "com" ]
 285  *
 286  * scan_structured_value "=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?="
 287  *     [ ] [ ]
 288  *   = [ Atom "=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?=" ]
 289  *
 290  * scan_structured_value "=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?="
 291  *     [ ] [ Recognize_encoded_words ]
 292  *   = [ EncodedWord("ISO-8859-1", "Q", "Keld_J=F8rn_Simonsen") ]
 293  *
 294  *)
 295
 296
 297
 298 type s_token =
 299     Atom of string
 300   | EncodedWord of (string * string * string)
 301   | QString of string
 302   | Control of char
 303   | Special of char
 304   | DomainLiteral of string
 305   | Comment
 306   | End
 307
 308 (* - Words are: Atom, EncodedWord, QString.
 309  * - Atom s: The character sequence forming the atom is contained in s
 310  * - EncodedWord(charset, encoding, encoded_string) means:
 311  *   * charset is the (uppercase) character set
 312  *   * encoding is either "Q" or "B"
 313  *   * encoded_string: contains the text of the word; the text is represented
 314  *     as octet string following the conventions for character set charset and
 315  *     then encoded either as "Q" or "B" string.
 316  * - QString s: Here, s are the characters inside the double quotes after
 317  *   decoding any quoted pairs (backslash + character pairs)
 318  * - Control c: The control character c
 319  * - Special c: The special character c
 320  * - DomainLiteral s: s contains the characters inside the brackets after
 321  *   decoding any quoted pairs
 322  * - Comment: if the option Return_comments is specified, this token
 323  *   represents the whole comment.
 324  * - End: Is returned after the last token
 325  *)
 326
 327
 328 type s_option =
 329     No_backslash_escaping
 330       (* Do not handle backslashes in quoted string and comments as escape
 331        * characters; backslashes are handled as normal characters.
 332        * For example: "C:\dir\file" will be returned as
 333        * QString "C:\dir\file", and not as QString "C:dirfile".
 334        * - This is a common error in many MIME implementations.
 335        *)
 336   | Return_comments
 337       (* Comments are returned as token Comment (unless '(' is included
 338        * in the list of special characters, in which case comments are
 339        * not recognized at all).
 340        * You may get the exact location of the comment by applying
 341        * get_pos and get_length to the extended token.
 342        *)
 343   | Recognize_encoded_words
 344       (* Enables that encoded words are recognized and returned as
 345        * EncodedWord(charset,encoding,content) instead of Atom.
 346        *)
 347
 348 type s_extended_token
 349   (* An opaque type containing s_token plus:
 350    * - where the token occurs
 351    * - RFC-2047 access functions
 352    *)
 353
 354 val get_token : s_extended_token -> s_token
 355     (* Return the s_token within the s_extended_token *)
 356
 357 val get_decoded_word : s_extended_token -> string
 358 val get_charset : s_extended_token -> string
 359     (* Return the decoded word (the contents of the word after decoding the
 360      * "Q" or "B" representation), and the character set of the decoded word
 361      * (uppercase).
 362      * These functions not only work for EncodedWord:
 363      * - Atom: Returns the atom without decoding it
 364      * - QString: Returns the characters inside the double quotes, and
 365      *   decodes any quoted pairs (backslash + character)
 366      * - Control: Returns the one-character string
 367      * - Special: Returns the one-character string
 368      * - DomainLiteral: Returns the characters inside the brackets, and
 369      *   decodes any quoted pairs
 370      * - Comment: Returns ""
 371      * The character set is "US-ASCII" for these tokens.
 372      *)
 373
 374 val get_pos : s_extended_token -> int
 375     (* Return the byte position where the token starts in the string
 376      * (the first byte has position 0)
 377      *)
 378
 379 val get_line : s_extended_token -> int
 380     (* Return the line number where the token starts (numbering begins
 381      * usually with 1)
 382      *)
 383
 384 val get_column : s_extended_token -> int
 385     (* Return the column of the line where the token starts (first column
 386      * is number 0)
 387      *)
 388
 389 val get_length : s_extended_token -> int
 390     (* Return the length of the token in bytes *)
 391
 392 val separates_adjacent_encoded_words : s_extended_token -> bool
 393     (* True iff the current token is white space (Special ' ', Special '\t',
 394      * Special '\r' or Special '\n') and the last non-white space token
 395      * was EncodedWord and the next non-white space token will be
 396      * EncodedWord.
 397      * Such spaces do not count and must be ignored by any application.
 398      *)
 399
 400
 401 type mime_scanner
 402
 403 val create_mime_scanner :
 404       specials:char list ->
 405       scan_options:s_option list ->
 406       ?pos:int ->
 407       ?line:int ->
 408       ?column:int ->
 409       string ->
 410         mime_scanner
 411     (* Creates a new mime_scanner scanning the passed string.
 412      * specials: The list of characters recognized as special characters.
 413      * scan_options: The list of global options modifying the behaviour
 414      *   of the scanner
 415      * pos: The position of the byte where the scanner starts in the
 416      *   passed string. Defaults to 0.
 417      * line: The line number of this byte. Defaults to 1.
 418      * column: The column number of this byte. Default to 0.
 419      *
 420      * The optional parameters pos, line, column are intentionally after
 421      * scan_options and before the string argument, so you can specify
 422      * scanners by partially applying arguments to create_mime_scanner
 423      * which are not yet connected with a particular string:
 424      * let my_scanner_spec = create_mime_scanner my_specials my_options in
 425      * ...
 426      * let my_scanner = my_scanner_spec my_string in
 427      * ...
 428      *)
 429
 430 val get_pos_of_scanner : mime_scanner -> int
 431 val get_line_of_scanner : mime_scanner -> int
 432 val get_column_of_scanner : mime_scanner -> int
 433     (* Return the current position, line, and column of a mime_scanner.
 434      * The primary purpose of these functions is to simplify switching
 435      * from one mime_scanner to another within a string:
 436      *
 437      * let scanner1 = create_mime_scanner ... s in
 438      * ... now scanning some tokens from s using scanner1 ...
 439      * let scanner2 = create_mime_scanner ...
 440      *                  ?pos:(get_pos_of_scanner scanner1)
 441      *                  ?line:(get_line_of_scanner scanner1)
 442      *                  ?column:(get_column_of_scanner scanner1)
 443      *                  s in
 444      * ... scanning more tokens from s using scanner2 ...
 445      *
 446      * RESTRICTION: These functions are not available if the option
 447      * Recognize_encoded_words is on. The reason is that this option
 448      * enables look-ahead scanning; please use the location of the last
 449      * scanned token instead.
 450      * It is currently not clear whether a better implementation is needed
 451      * (costs a bit more time).
 452      *
 453      * Note: To improve the performance of switching, it is recommended to
 454      * create scanner specs in advance (see the example my_scanner_spec
 455      * above).
 456      *)
 457
 458 val scan_token : mime_scanner -> (s_extended_token * s_token)
 459     (* Returns the next token, or End if there is no more token. *)
 460
 461 val scan_token_list : mime_scanner -> (s_extended_token * s_token) list
 462     (* Returns all following tokens as a list (excluding End) *)
 463
 464 val scan_structured_value : string -> char list -> s_option list -> s_token list
 465     (* This function is included for backwards compatibility, and for all
 466      * cases not requiring extended tokens.
 467      *
 468      * It scans the passed string according to the list of special characters
 469      * and the list of options, and returns the list of all tokens.
 470      *)
 471
 472 val specials_rfc822 : char list
 473 val specials_rfc2045 : char list
 474     (* The sets of special characters defined by the RFCs 822 and 2045.
 475      *
 476      * CHANGE in netstring-0.9: '[' and ']' are no longer special because
 477      * there is now support for domain literals.
 478      * '?' and '=' are not special in the rfc2045 version because there is
 479      * already support for encoded words.
 480      *)
 481
 482
 483 (**********************************************************************)
 484
 485 (* Widely used scanners: *)
 486
 487
 488 val scan_encoded_text_value : string -> s_extended_token list
 489     (* Scans a "text" value. The returned token list contains only
 490      * Special, Atom and EncodedWord tokens.
 491      * Spaces, TABs, CRs, LFs are returned unless
 492      * they occur between adjacent encoded words in which case
 493      * they are ignored.
 494      *)
 495
 496
 497 val scan_value_with_parameters : string -> s_option list ->
 498                                    (string * (string * string) list)
 499     (* let name, params = scan_value_with_parameters s options:
 500      * Scans phrases like
 501      *    name ; p1=v1 ; p2=v2 ; ...
 502      * The scan is done with the set of special characters [';', '='].
 503      *)
 504
 505 val scan_mime_type : string -> s_option list ->
 506                        (string * (string * string) list)
 507     (* let name, params = scan_mime_type s options:
 508      * Scans MIME types like
 509      *    text/plain; charset=iso-8859-1
 510      * The name of the type and the names of the parameters are converted
 511      * to lower case.
 512      *)
 513
 514
 515 (**********************************************************************)
 516
 517 (* Scanners for MIME bodies *)
 518
 519 val scan_multipart_body : string -> start_pos:int -> end_pos:int ->
 520                             boundary:string ->
 521                             ((string * string) list * string) list
 522     (* let [params1, value1; params2, value2; ...]
 523      *   = scan_multipart_body s i0 i1 b
 524      *
 525      * Scans the string s that is the body of a multipart message.
 526      * The multipart message begins at position i0 in s and i1 the position
 527      * of the character following the message. In b the boundary string
 528      * must be passed (this is the "boundary" parameter of the multipart
 529      * MIME type, e.g. multipart/mixed;boundary="some string" ).
 530      *     The return value is the list of the parts, where each part
 531      * is returned as pair (params, value). The left component params
 532      * is the list of name/value pairs of the header of the part. The
 533      * right component is the RAW content of the part, i.e. if the part
 534      * is encoded ("content-transfer-encoding"), the content is returned
 535      * in the encoded representation. The caller must himself decode
 536      * the content.
 537      *     The material before the first boundary and after the last
 538      * boundary is not returned.
 539      *
 540      * MULTIPART MESSAGES
 541      *
 542      * The MIME standard defines a way to group several message parts to
 543      * a larger message (for E-Mails this technique is known as "attaching"
 544      * files to messages); these are the so-called multipart messages.
 545      * Such messages are recognized by the major type string "multipart",
 546      * e.g. multipart/mixed or multipart/form-data. Multipart types MUST
 547      * have a boundary parameter because boundaries are essential for the
 548      * representation.
 549      *    Multipart messages have a format like
 550      *
 551      * ...Header...
 552      * Content-type: multipart/xyz; boundary="abc"
 553      * ...Header...
 554      *
 555      * Body begins here ("prologue")
 556      * --abc
 557      * ...Header part 1...
 558      *
 559      * ...Body part 1...
 560      * --abc
 561      * ...Header part 2...
 562      *
 563      *
 564      * ...Body part 2
 565      * --abc
 566      * ...
 567      * --abc--
 568      * Epilogue
 569      *
 570      * The parts are separated by boundary lines which begin with "--" and
 571      * the string passed as boundary parameter. (Note that there may follow
 572      * arbitrary text on boundary lines after "--abc".) The boundary is
 573      * chosen such that it does not occur as prefix of any line of the
 574      * inner parts of the message.
 575      *     The parts are again MIME messages, with header and body. Note
 576      * that it is explicitely allowed that the parts are even multipart
 577      * messages.
 578      *     The texts before the first boundary and after the last boundary
 579      * are ignored.
 580      *     Note that multipart messages as a whole MUST NOT be encoded.
 581      * Only the PARTS of the messages may be encoded (if they are not
 582      * multipart messages themselves).
 583      *
 584      * Please read RFC 2046 if want to know the gory details of this
 585      * brain-dead format.
 586      *)
 587
 588 val scan_multipart_body_and_decode : string -> start_pos:int -> end_pos:int ->
 589                                         boundary:string ->
 590                                         ((string * string) list * string) list
 591     (* Same as scan_multipart_body, but decodes the bodies of the parts
 592      * if they are encoded using the methods "base64" or "quoted printable".
 593      * Fails, if an unknown encoding is used.
 594      *)
 595
 596 val scan_multipart_body_from_netstream
 597     : Netstream.t ->
 598       boundary:string ->
 599       create:((string * string) list -> 'a) ->
 600       add:('a -> Netstream.t -> int -> int -> unit) ->
 601       stop:('a -> unit) ->
 602       unit
 603     (* scan_multipart_body_from_netstream s b create add stop:
 604      *
 605      * Reads the MIME message from the netstream s block by block. The
 606      * parts are delimited by the boundary b.
 607      *
 608      * Once a new part is detected and begins, the function 'create' is
 609      * called with the MIME header as argument. The result p of this function
 610      * may be of any type.
 611      *
 612      * For every chunk of the part that is being read, the function 'add'
 613      * is invoked: add p s k n.
 614      * Here, p is the value returned by the 'create' invocation for the
 615      * current part. s is the netstream. The current window of s contains
 616      * the read chunk completely; the chunk begins at position k of the
 617      * window (relative to the beginning of the window) and has a length
 618      * of n bytes.
 619      *
 620      * When the part has been fully read, the function 'stop' is
 621      * called with p as argument.
 622      *
 623      * That means, for every part the following is executed:
 624      * - let p = create h
 625      * - add p s k1 n1
 626      * - add p s k2 n2
 627      * - ...
 628      * - add p s kN nN
 629      * - stop p
 630      *
 631      * IMPORTANT PRECONDITION:
 632      * - The block size of the netstream s must be at least
 633      *   String.length b + 3
 634      *
 635      * EXCEPTIONS:
 636      * - Exceptions can happen because of ill-formed input, and within
 637      *   the callbacks of the functions 'create', 'add', 'stop'.
 638      * - If the exception happens while part p is being read, and the
 639      *   'create' function has already been called (successfully), the
 640      *   'stop' function is also called (you have the chance to close files).
 641      *)
 642
 643
 644 (* THREAD-SAFETY:
 645  * The functions are thread-safe as long as the threads do not share
 646  * values.
 647  *)
 648
 649 (* ======================================================================
 650  * History:
 651  *
 652  * $Log$
 653  * Revision 1.1  2000/11/17 09:57:27  lpadovan
 654  * Initial revision
 655  *
 656  * Revision 1.8  2000/08/13 00:04:36  gerd
 657  *      Encoded_word -> EncodedWord
 658  *      Bugfixes.
 659  *
 660  * Revision 1.7  2000/08/07 00:25:00  gerd
 661  *      Major update of the interface for structured field lexing.
 662  *
 663  * Revision 1.6  2000/06/25 22:34:43  gerd
 664  *      Added labels to arguments.
 665  *
 666  * Revision 1.5  2000/06/25 21:15:48  gerd
 667  *      Checked thread-safety.
 668  *
 669  * Revision 1.4  2000/05/16 22:29:12  gerd
 670  *      New "option" arguments specifying the level of MIME
 671  * compatibility.
 672  *
 673  * Revision 1.3  2000/04/15 13:09:01  gerd
 674  *      Implemented uploads to temporary files.
 675  *
 676  * Revision 1.2  2000/03/02 01:15:30  gerd
 677  *      Updated.
 678  *
 679  * Revision 1.1  2000/02/25 15:21:12  gerd
 680  *      Initial revision.
 681  *
 682  *
 683  *)