5 makeunidef.pl - Generates Unicode data files for B<ucs.sty>.
9 makeunidef.pl [B<--database>=I<file>] [B<--targetdir>=I<dir>]
10 [B<--verbose>] [B<--help>] [B<--comments>] [B<--nocomments>]
11 [B<--compress>] [B<--nocompress>] [B<--exclude>=I<option>]
16 Generate the F<uni-....def>-files for use by B<ucs.sty>. These are
17 generated out of one ore more config files (with suffix C<.gz> if
18 gzipped), whose format is described below (see L<"CONFIG FILES">).
20 For some characters there will be autogenerated code, if none is
21 supplied by the config files, this code will be associated with the
22 Unicode option C<autogenerated>.
28 =item B<--comments>, B<--nocomments>
30 Enable or disable the automatic generation of comments.
31 Defaults to B<--comments>.
33 =item B<--compress>, B<--nocompress>
34 Enable or disable the compression of uninames.dat.
35 Defaults to B<--compress>.
37 =item B<-d>, B<--db>, B<--database>=I<file>
39 Specify the Unicode database, as provided by the Unicode
40 Consortium. Defaults to F<UnicodeData.txt>. The file must have the
41 suffix C<.gz> if it is gzipped.
47 =item B<--exclude>, B<--ex>=I<option>
49 Do not include the characters associated with option I<option>. This
50 is e.g. useful for saving space by not including all CJK
51 characters. When an excluded option is used in documents, an error is
54 =item B<-h>, B<--help>
66 =item B<-t>, B<--dir>, B<--target>, B<--targetdir>=I<dir>
68 Sets the target directory for the generated files to I<dir>. The default is
69 the current directory.
71 =item B<-v>, B<--verbose>
79 The config files (which may be gzipped) are to be written in a line
80 oriented format. C<#> starts a comment which end at the end of the
81 actual line. The C<#> must be preceded by a whitespace, except for
82 lines consisting only of a comment. An empty line or a line containing
83 only a comment is ignored.
85 A line can be a command or a character definition.
86 For possible commands see L<"CONFIG COMMANDS">.
88 definition has the following syntax:
92 where I<code> is the code position and I<macro> a LaTeX-macro to
93 render the glyph. I<macro> is stripped of surrounding whitespaces, and
94 I<code> has one of the following formats (all case insensitive):
98 =item U+I<num>, 0xI<num>, $I<num>
100 I<num> being the hexadecimal representation of the code position.
104 I<num> being the decimal representation of the code position, whereby
105 I<num> must not start with 0.
109 I<num> being the octal representation of the code position.
113 In all the above formats, I<num> can have any number of digits.
115 =head2 CONFIG COMMANDS
117 A command line consists of an command name and optional space
118 separated command arguments. The following commands are defined:
124 Same as B<OPTION>, by the second argument specifies a LaTeX
125 package. If this package is loaded, the option is set automatically.
127 This option is deprecated.
131 This starts a group. All options set after this are only valid up to
132 the matching B<END> or B<}>. Config files, which set options, should
133 be enclosed in a block, so that they can't have side effects on other
134 parts of the config file. Each config file is contained in an implicit
135 block. Blocks can be nested.
143 All macros are wrapped inside the envelope given by this command. The
144 original macro is inserted into the envelope instead of every
145 occurrence of C<@@@> and the decimal character code is inserted for
146 each occurrence of C<$$$>. B<ENVELOPE> will override B<ENVELOPE>
147 commands executed before.
151 Inserts some code into F<uni-global.def>. This file is loaded with
152 F<ucs.sty> in the preamble. F<uni-global.def> is not executed in a group,
153 so defining globally is not necessary. See also B<PROVIDE>.
157 This command takes one argument. It is the name of the Unicode option
158 to associate with the macros defined after this command. An option set
159 by B<OPTION> is valid until it is overridden by another B<OPTION>
160 command or until the block ends.
164 Associates a piece of code with characters defined afterwards (until
165 the block ends). The code is inserted into every B<uni-....def>-file
166 containing one of these characters, so the code is guaranteed to have
167 been executed before the corresponding characters are executed. No
168 assumption should be made whether the code is executed in a group or
169 at top level, i.e. declarations should be made globally, but no
170 garbage should be defined or catcodes modified without restoring
171 them afterwards. The code may be executed several times. See also B<GLOBAL>.
175 =head1 KNOWN BUGS AND PROBLEMS
177 There should be a better way to quote whitespaces in command arguments
178 and to insert C<#>s. Perhaps I will add this in future, thereby trying
179 to preserve backward compatibility.
181 =head1 REPORTING BUGS
183 Send bug reports to Dominique Unruh <I<dominique@unruh.de>>, the mails
184 containing the words B<bug report: makeunidef.pl> in the subject.
188 Dominique Unruh <I<dominique@unruh.de>>.
194 =item F<uni-I<N>.def>, F<uni-global.def>, F<uninames.dat>
196 Unicode definitions for B<ucs.sty>, created by this program, I<N>
197 stands for different decimal numbers.
199 =item F<UnicodeData.txt>
201 Unicode database by the Unicode Consortium, read only. This file can be
202 found at F<http://www.unicode.org/Public/UNIDATA/UnicodeData.txt>
208 The LaTeX package B<ucs.sty>.
223 sub dumphuffman($$$); #{}
226 use vars qw/$createcomments $targetdir @configfiles @characters
227 %files @unidata $unidata $messagebreak $dump_unidata
228 $verbose $generate_uninames $tables_to_dump $tabledir $autogen
229 %knownoptions $loadunidata %stats_charsperoption
230 %providehandlers $providehandlerseq %provides $huffman_decoder
231 %file_attribs $compressnames %excludedoptions %uninames_abbrev @ranges
232 $onlyfile $fileswritten $devnull
235 $messagebreak = '\MessageBreak ';
236 $autogen = 'autogenerated by makeunidef.pl';
240 # "\000" => '{Character available with following options:'.
241 # '\MessageBreak\space\space\space}',
242 "\001" => '{Unicode character \number\uc@got\uc@spc = U+\uc@temp@a:'.
244 # "\002" => '{No name found}',
245 "\003" => '{Character available with following excluded options:'.
246 '\MessageBreak\space\space\space}',
247 "\n" => '\MessageBreak',
253 return unless $dump_unidata;
254 my $filename = "$targetdir/uni-global.def";
255 my $file = openfile($filename,
256 "%%% unicode global data for ucs.sty, $autogen",
257 "Unicode global data");
258 print $file "$code\n";
261 sub getprovidehandler($) {
263 return undef unless defined $code;
264 my $handler = $providehandlers{$code};
265 unless (defined $handler) {
266 # print "NEW HANDLER: '$code'\n";
267 $handler = $providehandlerseq++;
268 $providehandlers{$code} = $handler;
270 $provides{$handler} = { code => $code };
274 sub openreadfile($) {
275 my $filename = shift;
276 $filename .= ".gz" if (!-e $filename && -e "$filename.gz");
277 if ($filename =~ /\.gz/) {
278 die "Could not read file $filename" unless -r $filename;
279 return new IO::File("gzip -cdf ''\Q$filename\E |");
281 return new IO::File($filename,"r");
287 return 'default' unless define($o) ne '';
293 my $fh = openreadfile $filename or
294 die "Could not open configfile $filename: $!";
297 while (my $line = <$fh>) {
299 my $linepos = "$filename:$.";
300 $line =~ s/(^|\s)\#.*$//;
304 if ($line =~ /^[0-9]/ || $line =~ /^u\+/i) {
305 my ($num,$command) = split ' ',$line,2;
306 $num = oct($num) if $num =~ /^0/;
307 $num = hex($num) if $num =~ s/^\$//;
308 $num = hex($num) if $num =~ s/^u\+//i;
309 #print "Line: $line, Num: $num, Command: $command\n";
311 $o{definedat} = $linepos;
312 $command = '' if lc($command) eq '<empty>';
313 unless (defined ($command)) {
314 warn "$linepos: Use <empty> to declare an empty ".
315 "glyph macro.\n"; next; };
317 if (($command =~ /(^|[^\#])\#1/) ||
318 (defined $o{envelope} && $o{envelope} =~ /(^|[^\#])\#1/));
319 if (isprivate($num) &&
320 optionname($o{onoption}) !~ /^(private|local)./) {
321 warn sprintf "%s: Character U+%04X is in private area, ".
322 "but has option '%s' (not 'private...' or 'local...').\n",
323 $linepos,$num,optionname($o{onoption});
325 push @{$characters[$num]}, [ $command, \%o ];
326 my $range = findrange2($num);
327 if (defined $range) { ${$$range{options}}{optionname($o{onoption})} =1};
328 if ($unidata[$num]) { $ {$ {$unidata[$num]}{options}}{optionname($o{onoption})} =1};
329 $stats_charsperoption{define($o{onoption})}++;
331 my @cmd = split ' ',$line;
332 if ($cmd[0] eq 'BEGIN' || $cmd[0] eq '{') {
334 push @optionstack, \%t;
335 } elsif ($cmd[0] eq 'END' || $cmd[0] eq '}') {
337 my $t = pop @optionstack;
340 warn "$linepos: '$cmd[0]' outside a block.\n";
342 } elsif ($cmd[0] eq 'OPTION') {
343 $options{onoption} = $cmd[1];
344 unless (defined $knownoptions{$cmd[1]}) {
345 $knownoptions{$cmd[1]} = [];
347 } elsif ($cmd[0] eq 'AUTOOPTION') {
348 $options{onoption} = $cmd[1];
349 unless (defined $knownoptions{$cmd[1]}) {
350 $knownoptions{$cmd[1]} = [ $cmd[2] ];
352 # ************ DOUBLED
353 push @{$knownoptions{$cmd[1]}}, $cmd[2];
355 } elsif ($cmd[0] eq 'ENVELOPE') {
356 $options{envelope} = join ' ',@cmd[1..$#cmd];
357 # } elsif ($cmd[0] eq 'GLOBAL') {
358 # globalcode(join ' ',@cmd[1..$#cmd]);
359 } elsif (($cmd[0] eq 'PROVIDE') || ($cmd[0] eq 'GLOBAL')) {
361 if ($cmd[1] eq 'MULTILINE') {
363 my $xlinepos = $linepos;
364 while (!$e && defined($l = <$fh>)) {
365 if ($l =~ /^\s*END\s*$/) { $e = 1 }
366 else { $prov .= $l; }
369 warn "$xlinepos: Unfinished PROVIDE MULTILINE.\n";
373 $prov = join ' ',@cmd[1..$#cmd];
375 chomp $prov; $prov =~ s/%$//s;
376 if ($cmd[0] eq 'PROVIDE') {
377 $options{provide} = getprovidehandler($prov);
380 } elsif ($cmd[0] eq 'FONTENC') {
381 $options{fontenc} = $cmd[1];
382 } elsif ($cmd[0] eq 'FONTFAMILY') {
383 $options{fontfamily} = $cmd[1];
384 } elsif ($cmd[0] eq 'RIGHTLEFT') {
385 $options{rightleft} = 1;
386 } elsif ($cmd[0] eq 'TABLECODE') {
387 $options{tablecode} = join ' ',@cmd[1..$#cmd];
388 } elsif ($cmd[0] eq 'CTRLGLYPH') {
389 $options{ctrlglyph} = 1;
390 } elsif ($cmd[0] eq 'COMBINECHAR') {
391 if ($cmd[1] =~ /^U\+[0-9A-F]+$/i) {
392 my ($n) = ($cmd[1] =~ /^U\+([0-9A-F]+)$/i);
393 $options{combinechar} = hex($n);
394 delete $options{combineglyph};
395 delete $options{combineoption};
396 } elsif ($cmd[1] =~ /^[a-z]+\/U\+[0-9A-F]+$/i) {
397 my ($o,$n) = ($cmd[1] =~ /^([a-z]+)\/U\+([0-9A-F]+)$/i);
398 $options{combinechar} = hex($n);
399 $options{combineoption} = $o;
400 delete $options{combineglyph};
402 $options{combineglyph} = $cmd[1];
403 delete $options{combinechar};
404 delete $options{combineoption};
406 } elsif ($cmd[0] eq 'LOADFONTENC') {
407 $options{loadfontenc} = $cmd[1];
408 } elsif ($cmd[0] eq 'PACKAGE') {
409 $options{package} = $cmd[1];
410 } elsif ($cmd[0] eq 'TABLEGLYPH') {
411 $options{tableglyph} = join ' ',@cmd[1..$#cmd];
412 } elsif ($cmd[0] eq 'TABLEENVELOPE') {
413 $options{tableenvelope} = join ' ',@cmd[1..$#cmd];
415 warn "$linepos: Malformed line $line";
419 close $fh or die "Something went wrong when closing $filename: $!";
422 sub openfile ($$$%) {
423 my ($filename,$header,$description,%attribs) = @_;
424 my $file = $files{$filename};
425 unless (defined $file) {
427 $nowrite = 1 if defined($onlyfile) && $filename !~ m@(^|/)$onlyfile$@;
428 $fileswritten ++ unless $nowrite;
429 $attribs{nowrite} = 1 if $nowrite;
431 $file = new IO::File($filename,O_CREAT|O_WRONLY|O_EXCL) or
432 die "Could not open $filename for writing: $!";
434 print "Not writing file $filename\n" if $verbose;
435 $devnull = new IO::File('/dev/null',O_WRONLY) unless ($devnull);
437 print $file "$header\n";
438 my @date = localtime;
439 my $date = sprintf "%04d/%02d/%02d",
440 $date[5]+1900, $date[4]+1, $date[3];
441 my $basename = $filename; $basename =~ s@.*/@@;
442 print $file "\\ifx\\ProvidesFile\\undefined\\else
443 \\ProvidesFile{$basename}[$date UCS: $description]%
447 $files{$filename} = $file;
448 $file_attribs{$filename} = \%attribs;
453 sub dumpcharacters() {
454 for (my $i=0; $i<=$#characters; $i++) {
455 next unless defined $characters[$i];
457 for my $j (@{$characters[$i]}) {
459 my $options = $$j[1];
460 my $envelope = $$options{envelope};
461 if (define($envelope) ne '') {
464 $val =~ s/\$\$\$/$i/g;
465 $val =~ s/\@\@\@/$t/g;
467 $val = "\\uc\@cmb$val" if ($$options{combining});
468 my $comment = define($$options{comment});
469 my $onoption = $$options{onoption};
470 next if $excludedoptions{(defined $onoption)?$onoption:'default'};
471 my $page = int($i/256);
472 my $filename = "$targetdir/uni-$page.def";
473 #print "Character $i ($filename):\n";
474 my $range = sprintf("U+%04X..U+%04X",
475 $page*256,$page*256+255);
476 my $provide = $$options{provide};
477 my $header = "%%% Unicode to TeX mapping, file uni-$page.def, ".
478 "$range, autogenerated by makeunidef.pl";
479 my $file = openfile($filename,$header,"Unicode data $range");
481 if (defined $provide) {
482 $provide = $provides{$provide};
483 unless (defined $$provide{"done:$filename"}) {
484 $$provide{"done:$filename"} = 1;
485 print $file "$$provide{code}%\n";
486 #print "Providing $$provide{code} to $filename\n";
492 $texcmd = "\\uc\@dclc{$i}{".
493 (defined $onoption?$onoption:'default').
496 $texcmd .= "% $comment";
497 $texcmd .= " (missing)" unless defined $val;
499 print $file "$texcmd\n";
500 my $tag = define($onoption);
501 if (defined $seen{$tag}) {
503 "%s: Character U+%04X(%s) redefined. ".
504 "(First definition in %s).\n",
505 $$options{definedat},$i,optionname($onoption),$seen{$tag};
507 $seen{$tag} = define($$options{definedat});
517 my $f = openreadfile $unidata or
518 die "Could not open $unidata for reading: $!";
521 while (my $line = <$f>) {
523 my @line = split ';', $line;
525 $line{num} = hex($line[0]);
526 $line{name} = $line[1];
527 $line{category} = $line[2];
528 $line{combining} = $line[3];
529 $line{flags} = $line[4];
530 my @compose = split ' ',$line[5];
531 my $compflag = undef;
532 if (defined $compose[0] && $compose[0] =~ /^</) {
533 $compflag = shift @compose;
537 $line{compose} = \@compose;
538 $line{compflag} = $compflag;
539 $line{decdigit} = $line[6];
540 $line{digit} = $line[7];
541 $line{numerical} = $line[8];
542 $line{bracket} = $line[9];
543 $line{alias} = $line[10];
544 $line{comment} = $line[11];
545 $line{upcase} = $line[12];
546 $line{downcase} = $line[13];
547 $line{titlecase} = $line[14];
548 $line{case3} = $line[15];
550 my ($rangename,$rangedir) =
551 ($line{name} =~ /^\<(.*), (First|Last)\>$/);
552 if (defined $rangename) {
553 if ($rangedir eq 'First') {
554 if (defined $inrange) {
555 warn "$filename:$.: Range '$rangename' started while ".
556 "in range '$inrange'.\n";
558 #warn "Starting: '$rangename'";
559 $inrange = $rangename;
560 $rangestart = $line{num};
562 } elsif ($rangedir eq 'Last') {
563 if (defined $inrange && $rangename eq $inrange) {
564 #warn "Ending: '$inrange'";
565 $line{rangestart} = $rangestart;
566 $line{rangeend} = $line{num};
567 # $line{num} = $line{rangestart};
568 $line{name} = "Contained in range '$rangename'";
569 #printf "Range found: U+%04X..U+%04X '%s'\n",
570 #$rangestart, $line{num}, $rangename if $verbose;
571 } elsif (defined $inrange) {
572 warn "$filename:$.: ".
573 "Range '$inrange' ended by '$rangename'";
577 die "Internal error";
580 if (defined $inrange) {
581 warn "$filename:$.: ".
582 "Range '$inrange' not ended on next line.\n";
586 $unidata[$line{num}] = \%line;
587 push @ranges, \%line if $line{rangestart};
589 close $f or die "Something went wrong when closing $filename: $!";
592 sub generate_globals() {
593 while (my ($option,$pkgs) = each %knownoptions) {
594 if (!$excludedoptions{$option}) {
597 globalcode("\\DeclareUnicodeOption[$pkg]{$option}\%");
599 globalcode("\\DeclareUnicodeOption{$option}\%");
600 for my $pkg (@$pkgs) {
601 globalcode("\\LinkUnicodeOptionToPkg{$option}{$pkg}\%");
606 ("\\XDeclareUnicodeOption{}{$option}{}{".
607 "\\PackageError{ucs}{Option $option has not been generated}".
608 "{run makeunidef.pl without --exclude $option.}}{}");
616 return $str if defined $str;
622 return $str if defined $str;
628 for my $r (@ranges) {
629 if ($i>=$$r{rangestart} && $i<=$$r{rangeend}) {
638 for my $r (@ranges) {
639 if ($i>=$$r{rangestart} && $i<=$$r{rangeend}) {
647 for (my $i=0; $i<=$#unidata; $i++) {
648 my $char = $unidata[$i];
650 next unless defined $char;
652 next if (defined $characters[$i]);
653 if (@{$$char{compose}}) {
655 for my $comp (@{$$char{compose}}) {
656 $success = 0 unless defined $characters[hex($comp)];
659 my $compflag = $$char{compflag};
660 my $compmode = 'none';
661 if (define($compflag) eq '') { $compmode = 'normal' }
662 elsif ($compflag eq 'fraction') { $compmode = 'normal' }
663 elsif ($compflag eq 'compat') { $compmode = 'normal' }
664 elsif ($compflag eq 'noBreak') { $compmode = 'normal' }
665 elsif ($compflag eq 'super') { $compmode = 'normal.super' }
666 elsif ($compflag eq 'sub') { $compmode = 'normal.sub' }
667 elsif ($compflag eq 'font') { $compmode = 'normal' }
668 elsif ($compflag eq 'circle') { $compmode = 'normal.circle' }
669 elsif ($compflag eq 'square') { $compmode = 'normal.square' }
670 elsif ($compflag eq 'wide') { $compmode = 'normal.wide' }
671 elsif ($compflag eq 'narrow') { $compmode = 'normal.narrow' }
672 elsif ($compflag eq 'vertical') { $compmode = 'normal.vertical' }
673 elsif ($compflag eq 'small') { $compmode = 'normal.small' }
674 else { warn "Unknown composition flag \<$compflag\> ".
675 "in character ".sprintf("%04X",$$char{num}); };
676 if ($compmode =~ /^normal/) {
677 $tex = join '', map {
679 "\\unichar{$a}" } @{$$char{compose}};
680 if ($compmode eq 'normal.super') {
681 $tex = "\\unicodesuper{$tex}";
682 } elsif ($compmode eq 'normal.sub') {
683 $tex = "\\unicodesub{$tex}";
684 } elsif ($compmode eq 'normal.square') {
685 $tex = "\\unicodesquare{$tex}";
686 } elsif ($compmode eq 'normal.circle') {
687 $tex = "\\unicodecircle{$tex}";
688 } elsif ($compmode eq 'normal.wide') {
689 $tex = "\\unicodewide{$tex}";
690 } elsif ($compmode eq 'normal.narrow') {
691 $tex = "\\unicodenarrow{$tex}";
692 } elsif ($compmode eq 'normal.vertical') {
693 $tex = "\\unicodevertical{$tex}";
694 } elsif ($compmode eq 'normal.small') {
695 $tex = "\\unicodesmall{$tex}";
697 } elsif ($compmode eq 'none') {
699 die "Unknown composition mode $compmode. INTERNAL ERROR";
702 if (!$success && $$char{numerical} ne '') {
703 #print "NUMERICAL $i\n";
704 $tex = $$char{numerical};
706 #print "C: $$char{compflag}\n" if defined $$char{compflag};
709 push @{$characters[$i]},
710 [ "$tex", #"\\dirtyunicode{$i}{$tex}"
712 onoption => 'autogenerated' } ];
713 unless (defined $knownoptions{autogenerated}) {
714 $knownoptions{autogenerated} = [];
721 sub initcharacters() {
722 # for (my $i=0; $i<128; $i++) {
723 # push @{$characters[$i]}, [ "\\char$i\\relax", { comment => 'ASCII' } ];
728 for (my $i=0; $i<$#unidata; $i++) {
729 next unless defined $unidata[$i];
730 push @{$characters[$i]}, [] unless defined $characters[$i];
731 for my $char (@{$characters[$i]}) {
732 my $options = $$char[1];
733 next if defined $$options{comment};
734 my $onoption = $$options{onoption};
735 #print "CHAR: $char\n";
736 #print "CHAR2: $$char[1]\n";
737 my $utf8 = utf16toutf8($i);
739 $comment .= "OPTION: $onoption, " if ($onoption);
740 $comment .= sprintf "0x%04X = %d (%s) - %s",
741 $i ,$i, $utf8, $ {$unidata[$i]}{name};
742 $$options{comment} = $comment;
743 $$char[1] = $options;
750 if ($char<0x80) { return $char; }
752 return chr(0xc0+int($char/64)).chr(0x80+($char%64)); }
753 return chr(0xe0+int($char/0x1000)).
754 chr(0x80+int(($char%0x1000)/64)).
755 chr(0x80+($char%64));
759 for my $file (values %files) {
765 for my $filename (keys %files) {
766 my $file = $files{$filename};
767 my $attrib = $file_attribs{$filename};
768 next if $$attrib{nowrite};
770 $cs = $$attrib{coding} if defined $$attrib{coding};
777 close $file or die "Something went wrong when closing $filename: $!";
779 if (define0($fileswritten)==0) {
780 warn "No files written"; }
785 Generates uni-....def files for ucs.sty
788 --database <file> Set unicode database (default: UnicodeData.txt; short -d)
789 --nocomments Disable comments in generated files
790 --targetdir <dir> Set target directory (default: .; short: -t)
791 --verbose Be verbose (short: -v)
792 --help This page (short: -h)
793 <file(s)> Use this configfile(s)
795 Configfiles consists of lines, each like this
796 <unicode number> <LaTeX-Code>
797 where <unicode number> may be prefixed by 0x or 0 to denote hex or oct.
798 Lines beginning with # denote a comment.
804 $a = 'default' if define($a) eq '';
805 $b = 'default' if define($b) eq '';
810 my ($c,$option) = @_;
812 #print "getchar($c,$option)\n";
813 return undef unless $characters[$c];
814 for my $char (@{$characters[$c]}) {
815 my $options = $$char[1];
816 return $char if define($$options{onoption}) eq $option;
817 return $char if define($$options{onoption}) eq '' && $option eq 'default';
822 sub expand_tablespec($);
824 sub expand_tablespec($) {
829 $spec = '*:*' if $spec eq '*';
830 my @spec = ($spec =~ /^([a-z0-9]+|\*):(.*)$/i);
831 my $option = $spec[0];
832 my $range = $spec[1];
833 if (define($option) eq '') {
834 warn "Invalid table specifier '$spec'. Ignoring";
837 if ($option eq '*') {
838 push @specs, expand_tablespec("default:$range");
839 for my $o (sort keys %knownoptions) {
840 push @specs, expand_tablespec("$o:$range") unless $o eq 'default';
842 } elsif ($range eq '*') {
843 for (my $i=0; $i<=$#characters; $i+=$tablesize) {
844 for (my $j=0; $j<$tablesize; $j++) {
845 if (my $c = getchar($i+$j,$option)) {
846 my $options = $$c[1];
847 #next if $$options{dirty};
848 push @specs, expand_tablespec
849 (sprintf "%s:%04X-%04X",
850 $option,$i,$i+$tablesize-1);
856 warn "Unknown option '$option' in table spec '$spec'. Ignoring"
857 unless $option eq 'default' || defined $knownoptions{$option};
858 my @range = map { hex } split '-', $range, 2;
860 push @specs, { option => $option,
863 format => 'table', };
868 sub dump_charactertables() {
872 s/\s+$//; s/^\s+//; $_
873 } split /,/, $tables_to_dump;
874 for my $spec (@tables) {
875 dump_charactertable($spec);
880 sub default_entrydata($) {
884 'hexcode' => sprintf("%04X", $i),
885 'hexcodebeforelast' => sprintf("%1X", int($i%256/16)),
889 sub make_entrydata($$$$);
890 sub make_entrydata($$$$) {
891 my ($option,$i,$options,$hasctrlglyph) = @_;
892 my %entrydata = default_entrydata($i);
893 if (defined $$options{tableglyph}) {
894 warn sprintf "%s: Character %s/U+%04X has ".
895 "TABLEGLYPH specification.\n",
896 $$options{definedat},$option,$i;
898 if ($$options{ctrlglyph}) {
900 %entrydata = %{make_entrydata('ctrlglyphs',$i,$hasctrlglyph,0)};
901 $entrydata{ctrlglyph} = 1;
902 $entrydata{option} = 'ctrlglyphs';
905 warn sprintf "%s: Char U+%04X has no control ".
906 "glyph.\n",$$options{definedat},$i;
907 $entrydata{tableglyph} =
908 sprintf('{\tiny %04X}',$i);
911 $entrydata{dirty} = ($$options{dirty})?1:0;
912 $entrydata{rightleft} = ($$options{rightleft})?1:0;
913 $entrydata{combining} = ($$options{combining})?1:0;
914 $entrydata{fontfamily} = $$options{fontfamily}
915 if defined $$options{fontfamily};
916 $entrydata{combinechar} = $$options{combinechar}
917 if defined $$options{combinechar};
918 $entrydata{combineglyph} = $$options{combineglyph}
919 if defined $$options{combineglyph};
920 $entrydata{combineoption} = $$options{combineoption}
921 if defined $$options{combineoption};
922 $entrydata{fontenc} = $$options{fontenc}
923 if defined $$options{fontenc};
924 if (defined $$options{tableglyph}) {
925 warn sprintf "Add to ctrlglyphs.ucf:\nU+%04X\t%s\n",
926 $i,$$options{tableglyph}
927 unless $hasctrlglyph;
928 $entrydata{tableglyph} = $$options{tableglyph};
930 if (defined $$options{tableenvelope}) {
931 warn sprintf "%s: Character %s/u+%04x has ".
932 "TABLEENVELOPE specification.\n",
933 $$options{definedat},$option,$i;
934 $entrydata{tableenvelope} = $$options{tableenvelope};
935 $entrydata{tableenvelope} =~ s/\@\@\@/\#1/g;
936 $entrydata{"tableenvelope.optspec"} = '#1';
938 # $entrydata{fontenc} = 'T1'
939 # unless defined $entrydata{fontenc};
940 for my $x (qw/loadfontenc package tablecode/) {
941 $entrydata{"$x.option"} = $$options{$x}; }
945 sub dump_charactertable($$$$) {
947 my $option = $$spec{option};
948 my $start = $$spec{start};
949 my $end = $$spec{end};
950 my $format = $$spec{format};
952 my $hstart = sprintf "%04X", $start;
953 my $hend = sprintf "%04X", $end;
954 my $filename = "$tabledir/table-$option-$hstart-$hend.tex";
956 print "Generating $filename\n" if $verbose;
958 my $file = openfile("$tabledir/table-$option-$hstart-$hend.tex",
959 "%%% table for option $option, U+$hstart..U+$hend, $autogen",
960 "Unicode chart U+$hstart..U+$hend");
965 \\def\\gformat{$format}%
966 \\def\\goption{$option}%
967 \\def\\ghstart{$hstart}%
969 \\def\\gstart{$start}%
973 my $table = "\\tstart%\n";
977 for (my $i=$start; $i<=$end; $i++) {
979 @chars = @{$characters[$i]} if defined @{$characters[$i]};
980 my $entry = ($unidata[$i])?"\\noglyph":"\\unassigned";
982 my $hasctrlglyph = undef;
983 for my $char (@chars) {
984 my $options = $$char[1];
985 if (define($$char[0]) ne '' ||
986 define($$options{envelope}) ne '') {
987 if (optioneq($$options{onoption},'ctrlglyphs')) {
988 $hasctrlglyph = $options;
993 my $entrydata = {default_entrydata($i)};
995 for my $char (@chars) {
996 my $options = $$char[1];
997 if (define($$char[0]) ne '' ||
998 define($$options{envelope}) ne '') {
999 if (optioneq($$options{onoption},$option)) {
1001 $entrydata = make_entrydata($option,$i,$options,
1003 $fontencs{$$entrydata{fontenc}} = 1
1004 if defined $$entrydata{fontenc} &&
1005 $$entrydata{fontenc} ne 'T1';
1006 $fontencs{$$entrydata{'loadfontenc.option'}} = 1
1007 if defined $$entrydata{'loadfontenc.option'};
1008 if (defined $$entrydata{"package.option"}) {
1009 for my $p (split ';', $$entrydata{'package.option'}) {
1010 $packages{$p} = 1; }}
1011 $tablecode{$$entrydata{"tablecode.option"}} = 1
1012 if defined $$entrydata{"tablecode.option"};
1015 if (!$$options{dirty} || $entry eq '') {
1016 $entry = '\withoption';
1021 for my $i (keys %$entrydata) {
1023 my $os = define($$entrydata{"$i.optspec"});
1024 $table .= "\\gdef\\a$i$os\{$$entrydata{$i}\}%\n"; }
1025 my $comment = sprintf "%04X %s %s", $i, utf16toutf8($i),
1026 define($unidata[$i]->{name});
1027 $table .= "$entry\% $comment\n";
1029 $table .= "\\tend%\n";
1031 print $file "\\def\\gfontencs{",join(',',keys %fontencs,'T1'),"}%\n";
1032 for my $p (keys %packages) {
1033 $p = "{$p}" unless $p =~ /\}$/;
1034 print $file "\\usepackage$p%\n";
1036 for my $c (keys %tablecode) {
1037 print $file "$c%\n";
1045 #Getopt::Long::Configure(qw/bundling/);
1047 unless (GetOptions(\%opt,qw/config|configfile|c=s@
1056 targetdir|dir|t|target=s
1062 die "Bad command line options, try --help";
1068 unless ($opt{forceasterisk}) {
1069 @ARGV = grep { if (/\*/ && !-e $_) {
1070 print "Configfile $_ seems to be unmatched wildcard. Ignoring.\n"
1075 push @{$opt{config}}, @ARGV if @ARGV;
1076 die "You must supply at least one config file"
1077 unless defined $opt{config};
1078 $opt{exclude} = [] unless defined $opt{exclude};
1079 @{$opt{exclude}} = split /,/,join ',',@{$opt{exclude}};
1080 $opt{database} = 'UnicodeData.txt' unless defined $opt{database};
1081 $opt{comments} = 1 unless defined $opt{comments};
1082 $opt{names} = 1 unless defined $opt{names};
1083 $opt{data} = 1 unless defined $opt{data};
1084 $opt{targetdir} = '.' unless defined $opt{targetdir};
1085 $opt{loadunidata} = 1 unless defined $opt{loadunidata};
1086 $opt{tables} = '*' if $opt{tabledir} && !defined $opt{tables};
1087 $opt{tabledir} = '.' unless defined $opt{tabledir};
1088 $opt{compress} = 1 unless defined $opt{compress};
1089 @configfiles = @{$opt{config}};
1090 $unidata = $opt{database};
1091 $createcomments = $opt{comments};
1092 $targetdir = $opt{targetdir};
1093 $verbose = $opt{verbose};
1094 $generate_uninames = $opt{names};
1095 $tabledir = $opt{tabledir};
1096 $tables_to_dump = $opt{tables};
1097 $dump_unidata = $opt{data};
1098 $loadunidata = $opt{loadunidata};
1099 $compressnames = $opt{compress};
1100 $onlyfile = $opt{onlyfile};
1101 %excludedoptions = map { $_ => 1 } @{$opt{exclude}};
1106 my ($unt) = ($str =~ /^(.*)$/);
1111 my $fullname = shift;
1112 return 1 if defined($onlyfile) && $fullname !~ m@(^|/)$onlyfile$@;
1113 return 1 unless -e $fullname;
1114 my $f = new IO::File($fullname,O_RDONLY) or
1115 die "Could not open $fullname for reading: $!";
1118 unless ($line =~ /^\%\%\%.*autogenerated by makeunidef.pl/) {
1119 die "File $fullname was not generated by me, ".
1120 "will not overwrite it"; }
1121 unlink untaint($fullname) or
1122 die "Could not remove $fullname: $!";
1123 #print "FILE: $file\n";
1126 sub unlinkfiles($$) {
1127 my ($pattern,$dir) = @_;
1128 my $d = new IO::Handle;
1130 die "Could not open directory $targetdir: $!";
1131 while (my $file = readdir $d) {
1132 #print "F: $file\n";
1133 next unless $file =~ /$pattern/x;
1134 #print "UF: $file\n";
1135 unlinkfile("$dir/$file") or
1136 die "Could not unlink $dir/$file";
1140 $huffman_decoder = '\count255=128
1141 \loop\ifnum\count255<256\relax
1142 \catcode\count255=13
1143 \advance\count255by1\relax
1149 \def^^80{\0\0\0\0\0\0\0}%
1150 \def^^81{\0\0\0\0\0\0\1}%
1151 \def^^82{\0\0\0\0\0\1\0}%
1152 \def^^83{\0\0\0\0\0\1\1}%
1153 \def^^84{\0\0\0\0\1\0\0}%
1154 \def^^85{\0\0\0\0\1\0\1}%
1155 \def^^86{\0\0\0\0\1\1\0}%
1156 \def^^87{\0\0\0\0\1\1\1}%
1157 \def^^88{\0\0\0\1\0\0\0}%
1158 \def^^89{\0\0\0\1\0\0\1}%
1159 \def^^8a{\0\0\0\1\0\1\0}%
1160 \def^^8b{\0\0\0\1\0\1\1}%
1161 \def^^8c{\0\0\0\1\1\0\0}%
1162 \def^^8d{\0\0\0\1\1\0\1}%
1163 \def^^8e{\0\0\0\1\1\1\0}%
1164 \def^^8f{\0\0\0\1\1\1\1}%
1165 \def^^90{\0\0\1\0\0\0\0}%
1166 \def^^91{\0\0\1\0\0\0\1}%
1167 \def^^92{\0\0\1\0\0\1\0}%
1168 \def^^93{\0\0\1\0\0\1\1}%
1169 \def^^94{\0\0\1\0\1\0\0}%
1170 \def^^95{\0\0\1\0\1\0\1}%
1171 \def^^96{\0\0\1\0\1\1\0}%
1172 \def^^97{\0\0\1\0\1\1\1}%
1173 \def^^98{\0\0\1\1\0\0\0}%
1174 \def^^99{\0\0\1\1\0\0\1}%
1175 \def^^9a{\0\0\1\1\0\1\0}%
1176 \def^^9b{\0\0\1\1\0\1\1}%
1177 \def^^9c{\0\0\1\1\1\0\0}%
1178 \def^^9d{\0\0\1\1\1\0\1}%
1179 \def^^9e{\0\0\1\1\1\1\0}%
1180 \def^^9f{\0\0\1\1\1\1\1}%
1181 \def^^a0{\0\1\0\0\0\0\0}%
1182 \def^^a1{\0\1\0\0\0\0\1}%
1183 \def^^a2{\0\1\0\0\0\1\0}%
1184 \def^^a3{\0\1\0\0\0\1\1}%
1185 \def^^a4{\0\1\0\0\1\0\0}%
1186 \def^^a5{\0\1\0\0\1\0\1}%
1187 \def^^a6{\0\1\0\0\1\1\0}%
1188 \def^^a7{\0\1\0\0\1\1\1}%
1189 \def^^a8{\0\1\0\1\0\0\0}%
1190 \def^^a9{\0\1\0\1\0\0\1}%
1191 \def^^aa{\0\1\0\1\0\1\0}%
1192 \def^^ab{\0\1\0\1\0\1\1}%
1193 \def^^ac{\0\1\0\1\1\0\0}%
1194 \def^^ad{\0\1\0\1\1\0\1}%
1195 \def^^ae{\0\1\0\1\1\1\0}%
1196 \def^^af{\0\1\0\1\1\1\1}%
1197 \def^^b0{\0\1\1\0\0\0\0}%
1198 \def^^b1{\0\1\1\0\0\0\1}%
1199 \def^^b2{\0\1\1\0\0\1\0}%
1200 \def^^b3{\0\1\1\0\0\1\1}%
1201 \def^^b4{\0\1\1\0\1\0\0}%
1202 \def^^b5{\0\1\1\0\1\0\1}%
1203 \def^^b6{\0\1\1\0\1\1\0}%
1204 \def^^b7{\0\1\1\0\1\1\1}%
1205 \def^^b8{\0\1\1\1\0\0\0}%
1206 \def^^b9{\0\1\1\1\0\0\1}%
1207 \def^^ba{\0\1\1\1\0\1\0}%
1208 \def^^bb{\0\1\1\1\0\1\1}%
1209 \def^^bc{\0\1\1\1\1\0\0}%
1210 \def^^bd{\0\1\1\1\1\0\1}%
1211 \def^^be{\0\1\1\1\1\1\0}%
1212 \def^^bf{\0\1\1\1\1\1\1}%
1213 \def^^c0{\1\0\0\0\0\0\0}%
1214 \def^^c1{\1\0\0\0\0\0\1}%
1215 \def^^c2{\1\0\0\0\0\1\0}%
1216 \def^^c3{\1\0\0\0\0\1\1}%
1217 \def^^c4{\1\0\0\0\1\0\0}%
1218 \def^^c5{\1\0\0\0\1\0\1}%
1219 \def^^c6{\1\0\0\0\1\1\0}%
1220 \def^^c7{\1\0\0\0\1\1\1}%
1221 \def^^c8{\1\0\0\1\0\0\0}%
1222 \def^^c9{\1\0\0\1\0\0\1}%
1223 \def^^ca{\1\0\0\1\0\1\0}%
1224 \def^^cb{\1\0\0\1\0\1\1}%
1225 \def^^cc{\1\0\0\1\1\0\0}%
1226 \def^^cd{\1\0\0\1\1\0\1}%
1227 \def^^ce{\1\0\0\1\1\1\0}%
1228 \def^^cf{\1\0\0\1\1\1\1}%
1229 \def^^d0{\1\0\1\0\0\0\0}%
1230 \def^^d1{\1\0\1\0\0\0\1}%
1231 \def^^d2{\1\0\1\0\0\1\0}%
1232 \def^^d3{\1\0\1\0\0\1\1}%
1233 \def^^d4{\1\0\1\0\1\0\0}%
1234 \def^^d5{\1\0\1\0\1\0\1}%
1235 \def^^d6{\1\0\1\0\1\1\0}%
1236 \def^^d7{\1\0\1\0\1\1\1}%
1237 \def^^d8{\1\0\1\1\0\0\0}%
1238 \def^^d9{\1\0\1\1\0\0\1}%
1239 \def^^da{\1\0\1\1\0\1\0}%
1240 \def^^db{\1\0\1\1\0\1\1}%
1241 \def^^dc{\1\0\1\1\1\0\0}%
1242 \def^^dd{\1\0\1\1\1\0\1}%
1243 \def^^de{\1\0\1\1\1\1\0}%
1244 \def^^df{\1\0\1\1\1\1\1}%
1245 \def^^e0{\1\1\0\0\0\0\0}%
1246 \def^^e1{\1\1\0\0\0\0\1}%
1247 \def^^e2{\1\1\0\0\0\1\0}%
1248 \def^^e3{\1\1\0\0\0\1\1}%
1249 \def^^e4{\1\1\0\0\1\0\0}%
1250 \def^^e5{\1\1\0\0\1\0\1}%
1251 \def^^e6{\1\1\0\0\1\1\0}%
1252 \def^^e7{\1\1\0\0\1\1\1}%
1253 \def^^e8{\1\1\0\1\0\0\0}%
1254 \def^^e9{\1\1\0\1\0\0\1}%
1255 \def^^ea{\1\1\0\1\0\1\0}%
1256 \def^^eb{\1\1\0\1\0\1\1}%
1257 \def^^ec{\1\1\0\1\1\0\0}%
1258 \def^^ed{\1\1\0\1\1\0\1}%
1259 \def^^ee{\1\1\0\1\1\1\0}%
1260 \def^^ef{\1\1\0\1\1\1\1}%
1261 \def^^f0{\1\1\1\0\0\0\0}%
1262 \def^^f1{\1\1\1\0\0\0\1}%
1263 \def^^f2{\1\1\1\0\0\1\0}%
1264 \def^^f3{\1\1\1\0\0\1\1}%
1265 \def^^f4{\1\1\1\0\1\0\0}%
1266 \def^^f5{\1\1\1\0\1\0\1}%
1267 \def^^f6{\1\1\1\0\1\1\0}%
1268 \def^^f7{\1\1\1\0\1\1\1}%
1269 \def^^f8{\1\1\1\1\0\0\0}%
1270 \def^^f9{\1\1\1\1\0\0\1}%
1271 \def^^fa{\1\1\1\1\0\1\0}%
1272 \def^^fb{\1\1\1\1\0\1\1}%
1273 \def^^fc{\1\1\1\1\1\0\0}%
1274 \def^^fd{\1\1\1\1\1\0\1}%
1275 \def^^fe{\1\1\1\1\1\1\0}%
1276 \def^^ff{\1\1\1\1\1\1\1}%
1280 {\toks255{}\toks254{}\relax
1282 \let\uc@temp@c\uc@temp@a
1283 \unicode@numtohex\uc@temp@a\uc@got4%
1284 \edef\uc@temp@b{\the\toks254}%
1285 \edef\uc@temp@b{\uc@temp@b}%
1286 \global\let\uc@temp@a\uc@temp@c
1287 \expandafter\info\expandafter{\uc@temp@b}%
1289 \def\skipcodes#1G{\advance\@tempcnta by"#1\relax\checkline}%
1292 \def\add#1{\toks254\expandafter{\the\toks254 #1}}%
1293 \def\checkline{\advance\@tempcnta by1\relax
1294 \ifnum\@tempcnta=\uc@got
1295 \expandafter\uncompress\else\expandafter\skipline\fi}%
1296 \def\checkrange#1I#2G{%
1297 \advance\@tempcnta by"#1\relax
1298 \@tempcntb\@tempcnta\advance\@tempcntb by-"#2\relax
1299 \ifnum\@tempcntb>\uc@got\let\uc@temp@c\skipline\else
1300 \ifnum\@tempcnta<\uc@got\let\uc@temp@c\skipline\else
1301 \let\uc@temp@c\uncompress
1307 \def\1{\toks255\expandafter{\the\toks255 b}\2}%
1308 \def\0{\toks255\expandafter{\the\toks255 a}\2}%
1309 \def\2{\expandafter\ifx\csname hc@\the\toks255\endcsname\relax
1310 \else\csname hc@\the\toks255\endcsname\toks255{}\fi}%';
1314 my $fullname; my $rangeend; my $xoptions;
1315 if (defined $unidata[$i]) {
1316 my $name = $ {$unidata[$i]}{name};
1317 my $alias = $ {$unidata[$i]}{alias};
1318 my $comment = $ {$unidata[$i]}{comment};
1319 $rangeend = $ {$unidata[$i]}{rangeend};
1321 $fullname .= "\n$alias" if $alias;
1322 $fullname .= "\n$comment" if $comment;
1323 $xoptions = join ', ', grep { $excludedoptions{$_} }
1324 keys %{$ {$unidata[$i]}{options}};
1325 $xoptions = undef if $xoptions eq '';
1328 # if ($characters[$i]) {
1329 # $options = grep { defined $_ } join ', ', map {
1330 # my $o = $$_[1]; $o = $$o{onoption};
1331 # $o = 'default' if (!defined $o) || ($o eq ''); $o;
1332 # $o = undef unless $excludedoptions{$o}; $o;
1333 # } @{$characters[$i]};
1334 # if (defined $rangeend) {
1335 # $options2 = $options; $options = undef;
1336 # $fullname2 = findrange($i);
1338 # $fullname = findrange($i) unless defined $fullname;
1342 return undef unless defined $fullname;
1345 if (defined $fullname) { $str .= "$fullname\n" } else { $str .= "\002\n" };
1346 if (defined $xoptions) { $str .= "\003$xoptions\n" };
1350 # if (defined $options2) {
1352 # if (defined $fullname2) { $str2 .= "$fullname\n" }
1353 # else { $str2 .= "\002\n" };
1354 # { $str2 .= "\000$options2\n" };
1361 sub dumpnames_compressed() {
1362 my $file = openfile("$targetdir/uninames.dat",
1363 "%%% unicode name hash for ucs.sty, $autogen",
1364 "Unicode character names, compressed",
1365 coding => 'no-conversion');
1366 print "Creating Huffman code\n" if $verbose;
1367 my (@weight, @tree, %lookup);
1368 my $nexthuffman = 0;
1369 my $upto = $#unidata; $upto = $#characters if $#characters > $upto;
1370 for (my $i=1; $i<=$upto; $i++) {
1371 for my $str (getname $i) {
1372 next unless defined $str;
1373 for my $c (split '',$str) {
1374 my $n = $lookup{$c};
1375 unless (defined $n) { $n = $nexthuffman++; $lookup{$c} = $n };
1382 my $highnum = 9999999;
1384 my ($smallest, $smallest2);
1385 my $smallestval = $highnum;
1386 my $smallest2val = $highnum;
1387 for (my $i=0; $i<$nexthuffman; $i++) {
1388 my $val = $weight[$i];
1390 next unless defined $val;
1392 if ($val < $smallestval) {
1393 my $tmp = $val; $val = $smallestval; $smallestval = $tmp;
1394 $tmp = $j; $j = $smallest; $smallest = $tmp;
1396 if ($val < $smallest2val) {
1397 $smallest2val = $val; $smallest2 = $j;
1399 # print "A: $smallest, $smallest2\n";
1401 last if ($smallest2val==$highnum);
1402 $weight[$smallest] = undef;
1403 $weight[$smallest2] = undef;
1404 $tree[$nexthuffman] = [$tree[$smallest],$tree[$smallest2]];
1405 $weight[$nexthuffman] = $smallestval+$smallest2val;
1406 # print "$smallest + $smallest2 => $nexthuffman\n";
1407 # print Dumper($tree[$nexthuffman]);
1411 dumphuffman($tree[$nexthuffman-1],'',\%lookup);
1412 my $padding = undef;
1413 for my $i (values %lookup) {
1415 $padding = $i; last;
1418 unless (defined $padding) {
1419 warn "There is no huffman bit sequence longer than 7 bit. ".
1420 "Padding with spaces";
1421 $padding = $lookup{' '} x 8;
1424 print "Compressing\n" if $verbose;
1426 for my $c (keys %lookup) {
1427 my $v = $lookup{$c};
1428 $v =~ s/1/b/g; $v =~ s/0/a/g;
1429 $c = $uninames_abbrev{$c} if defined $uninames_abbrev{$c};
1430 print $file "\\def\\hc\@$v\{\\add $c\}\%\n";
1432 print $file "$huffman_decoder\n";
1435 for (my $i=1; $i<=$#unidata; $i++) {
1436 for my $str (getname $i) {
1437 next unless defined $str;
1439 if ($unidata[$i]) { $rangestart = $ {$unidata[$i]}{rangestart}; };
1442 if (!defined $lookup{$&}) { print "L: '$&'\n"; };
1445 # $str = join '', map { if ($_) {
1446 # pack "B*", sprintf "3%-7s", $_;
1447 # } } split /(.{1,7})/, $str;
1449 sprintf "3%s%s",$&,substr($padding,0,7-length($&));
1451 $str = pack("B*",$str);
1453 if (defined $rangestart) {
1454 # printf $file "I%XI%X", $i-$cp+1, $rangeend-$i;
1456 printf $file "I%XI%X", $i-$cp+1, $i-$rangestart;
1459 if ($i>$cp) { printf $file "H%X", $i-$cp; $cp=$i};
1461 print $file "G$str\n";
1468 sub dumphuffman($$$) {
1469 my ($tree,$prefix,$lookup) = @_;
1470 # print Dumper([$tree]); return;
1472 dumphuffman($$tree[0],$prefix.'0',$lookup);
1473 dumphuffman($$tree[1],$prefix.'1',$lookup);
1475 # print "HUFFMAN: '$tree' => '$prefix'\n";
1476 $$lookup{$tree} = $prefix;
1482 return 1 if ($cp>=0xe000 && $cp<=0xf8ff);
1483 return 1 if ($cp>=0xf0000 && $cp<=0xffffd);
1484 return 1 if ($cp>=0x100000 && $cp<=0x10fffd);
1488 sub dumpnames_uncompressed() {
1489 my $file = openfile("$targetdir/uninames.dat",
1490 "%%% unicode name hash for ucs.sty, $autogen",
1491 "Unicode character names");
1492 warn "Dumping uncompressed unicode data, ".
1493 "this is no longer maintained and may give unpredicted results.\n";
1495 for (my $i=1; $i<=$#unidata; $i++) {
1496 next unless defined $unidata[$i];
1497 my $name = $ {$unidata[$i]}{name};
1498 my $alias = $ {$unidata[$i]}{alias};
1499 my $comment = $ {$unidata[$i]}{comment};
1500 my $str = sprintf "Unicode character %d = 0x%04x:${messagebreak}%s",
1502 $str .= "${messagebreak}$alias" if $alias;
1503 $str .= "${messagebreak}$comment" if $comment;
1504 print $file "\\info{$str}\%\n";
1510 print "PASS: initcharacters\n" if $verbose;
1512 if ($dump_unidata) {
1513 print "PASS: unlink unidata files\n" if $verbose;
1517 ^uni-global\.def$ ',$targetdir);
1519 if ($tables_to_dump) {
1520 print "PASS: unlink tables\n" if $verbose;
1521 unlinkfiles(' ^table-.*-.*-.*\.tex$ ',$tabledir);
1524 print "PASS: loadunidata($unidata)\n" if $verbose;
1525 loadunidata($unidata); }
1526 for my $configfile (@configfiles) {
1527 print "PASS: loadconfig($configfile)\n" if $verbose;
1528 loadconfig($configfile); }
1529 print "PASS: generate\n" if $verbose;
1531 if ($dump_unidata) {
1532 print "PASS: generate globals\n" if $verbose;
1533 generate_globals(); }
1534 if ($createcomments) {
1535 print "PASS: makecomments\n" if $verbose;
1537 if ($dump_unidata) {
1538 print "PASS: dumpcharacters\n" if $verbose;
1541 if ($tables_to_dump) {
1542 print "PASS: dumping character tables to directory $tabledir/\n"
1544 dump_charactertables();
1546 if ($generate_uninames && $dump_unidata && $loadunidata) {
1547 if ($compressnames) {
1548 print "PASS: dumpnames (compressed)\n" if $verbose;
1549 dumpnames_compressed();
1551 print "PASS: dumpnames (uncompressed)\n" if $verbose;
1552 dumpnames_uncompressed();
1554 print "PASS: closefiles\n" if $verbose;
1560 ### Local IspellDict: british
1563 # LocalWords: uni def makeunidef pl de ucs sty AUTOOPTION num configfiles dir
1564 # LocalWords: nocomments SYSNOPSIS targetdir db UnicodeData txt uninames dat
1565 # LocalWords: UNIDATA org www http unicode