Polyglot Emacs 20.4
A look at multilingual Emacs.
Ken'ichi Handa's Mule (multilingual Emacs) first appeared at the end of 1992.
After almost five years, the Mule enhancements were included with GNU Emacs
20.x. For those of us who have yearned for multi-script capability since our
first encounter with a computer (more than twenty years ago), it has been a long
and often frustrating wait.
We are now at GNU Emacs version 20.4 and things are
finally beginning to look interesting to people who wish to work with multiple
scripts on Linux. I am using Emacs for translation, exegesis and preparing
reference material in multiple scripts.
Who wants to install a special Japanese Linux just to be able to read a
Japanese source file for a translation job or read and write Japanese e-mail? I
want to be able to include Chinese bibliographies, text and notes in the papers
I write. I would also like to be able to include Tibetan or Greek scripts for
philosophical or technical terms, along with their transliteration into Latin
script. When I am discussing the structure of Chinese characters, I want to be
able to make comparisons with Egyptian hieroglyphs. I want my quotations of
French and German to look like French and German. I want to be able to publish
all this on Web pages as well as on my PostScript printer. Some high-priced
programs are coming out that address these issues, but Emacs 20.4 is here now.
It runs on the best generally available operating system in the world,
GNU/Linux, and it is free.
As an example of using Emacs to prepare multi-script reference material,
let's look at the Buddhist numerical lists I am currently working on. Without
much difficulty, I can write this list with a pen on a piece of paper. See
Figure 1 for a bitmap of a page from my handwritten list. (Apologies for my poor
calligraphy.) To get these scripts into a computer text file requires an input
method specific to each script. Fortunately, Emacs comes with Quail, which has a
method to input each script in this example and many more.
To invoke Quail for Devanagari, use the Mule menu or type:
ctrl-x return ctrl-\ devanagari-transliteration
Three other choices are available. For Tibetan, you will want
tibetan-wylie. For Chinese, more than twenty methods are available. I use
chinese-py-b5 for traditional characters and chinese-tonepy for the
simplified characters. The Quail Japanese input method is adequate for short
strings, but not extensive input. This is one place where I feel a free input
method editor for Linux is needed that equals or surpasses Microsoft's free
Japanese IME for Windows. Wnn 4.2, the last free version of Wnn, worked well. I
have used it in the past with Mule, but so far I have not been able to get it to
work with Emacs.
Soon, I hope to add the Korean, Thai, Lao and Vietnamese equivalents of each
term in each list. All these are supported by Emacs. Finally, I hope to add the
Mongolian script, which is not yet available in Emacs.
As you start to use several different input methods, you may soon find the
command to invoke them, ctrl-x return ctrl-\, cumbersome. I rebind it
with:
meta-x global-set-key return f3 set-input-method return
While you're at it, you might as well bind the command
universal-coding-system-argument to something handy; I use f2.
universal-coding-system-argument is the
command that lets you specify which coding system you want Emacs to use when you
execute your next command. If you do much multi-script work, that next command
will probably be ctrl-x ctrl-v return, which revisits the file you just visited.
On the revisit, Emacs uses the coding system you specified with
universal-coding-system-argument. From the main Emacs menu, you can select
Mule/Set Coding System/Next Command to do the same thing. (See Emacs Manual
31.4.5 for details on rebinding keys.)
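To make these bindings survive a restart, they can go in your ~/.emacs file. What follows is only a minimal sketch, assuming an Emacs 20.x built with Leim. The f2 and f3 keys are the ones mentioned above; the f4 binding and the helper name my-tibetan-input are my own illustrations, not anything Emacs provides.

```elisp
;; Sketch for ~/.emacs, assuming an Emacs 20.x built with Leim.

;; f3 prompts for an input method, replacing ctrl-x return ctrl-\.
(global-set-key [f3] 'set-input-method)

;; f2 asks for a coding system to apply to the next command.
(global-set-key [f2] 'universal-coding-system-argument)

;; Hypothetical helper (the name and the f4 key are assumptions):
;; switch straight to the tibetan-wylie method without a prompt.
(defun my-tibetan-input ()
  "Select the tibetan-wylie input method in the current buffer."
  (interactive)
  (set-input-method "tibetan-wylie"))
(global-set-key [f4] 'my-tibetan-input)
```

With this in place, f2 followed by a coding system name and then ctrl-x ctrl-v return revisits the current file in that coding system.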
For information on each input method and sometimes a list of the characters
you can use with it, type ctrl-h I. As usual in Emacs, tab will give you
a list of choices if you don't know the exact name of the input method you are
after. ctrl-g escapes from whatever you are doing in the mini-buffer. I
said "sometimes" because some lists are missing. For example, in response to
ctrl-h I ipa, Emacs returns "Input method: ipa (IPA in mode line) for IPA
International Phonetic Alphabet for English, French, German and Italian" but
provides no list of the actual symbols. For Devanagari, on the other hand, a
full list of the letters of the script is presented. Not given are the details
on how to invoke several operations essential for being able to input the script
properly. If you are familiar with the script, you can probably hack your way
through. If, like me, you are a beginner and merely attempting to input it from
a transcription in Latin script, even assuming your transcription is precise,
you will not be pleased. Detailed descriptions of the various input methods are
needed.
Start with Tibetan. Type the Wylie transliteration and the script
appears; it is very smooth. For beginners, it is easier than writing Tibetan by hand.
For the time being, I had to give up trying to input Devanagari. You may have
better luck.
All these input methods come in a package called Leim. As of this writing,
Leim is bundled with Emacs-20.3.92 from ftp://ftp.etl.go.jp/pub/mule/.notready/,
but it must be downloaded as a separate package for the Emacs pre-release from
ftp://alpha.gnu.org/. Either way, if Emacs cannot find all the Leim files
when it compiles on your system, you won't have any input methods. Let's hope
the distributions will include Leim by default and give you an option to exclude
it.
To use the multi-script capabilities of the new Emacs, another essential
ingredient is Intlfonts 1.x. At present, it is version 1.2. This package
provides all the fonts you need to display all the scripts. It, too, must be
installed before you will find any joy in multi-script work.
The latest version of ps-print.el that comes with Emacs allows you to at
least dump your multiple-script files to the laser printer. Currently, you must
be content with "not scalable" bit-mapped fonts where one size fits all, but
this is an essential first step. Perhaps CJK-TeX by Werner Lemberg
or Omega, the new, purportedly
internationalized TeX, will generously give us the ability to produce
high-class, camera-ready, multi-script output for printing and a description of
how to do it.
In Figure 2, we see the same text as in Figure 1 entered into an Emacs
buffer, or as much of it as I could enter without a better understanding of the
Devanagari input method.
It could be argued that we should not look to Emacs for help in printing,
aside from the minimum requirement of being able to dump multi-script texts to
the printer. Maybe the same holds for Web publishing; I don't know. However, it
is frustrating to create a document in five or more scripts perfectly well in
Emacs and then not be able to print it at a camera-ready level of quality, or
publish it on the Web where the only character set that would allow the
inclusion of all five scripts, with some on-going support, is Unicode,
particularly in its UTF-8 encoding.
If there were an option to map the multi-script file in Emacs to the Unicode
character set and then save it in UTF-8 encoding, the file would be directly
available as content for an XML or HTML document. True, some browsers cannot
understand Unicode yet, and the browser user may not have all the fonts
installed, but this is bound to change for the better soon. One thing is for sure:
few will be able to read it in Mule internal format or even in a mix of ISO-2022
character sets/encodings.
Regarding a Unicode converter for Emacs, Miyashita Hisashi took a first shot
at a Mule internal-encoding-to-Unicode converter with his MULE-UCS converter,
but as of this writing I have not been able to install it. I noticed on the
Unicode mailing list that Mark Leisher also has one
under construction. Hopefully, by the time this article is printed, we will be
able to produce a Unicode/UTF-8-encoded file from Emacs.