Mined Unicode Howto
Environment setup and Usage of mined for Unicode text
UTF-8 encoded Unicode support and features
-
See the mined features page for
an overview of mined support for Unicode editing.
- For general information on Unicode and its support on computers,
see also Markus Kuhn's
UTF-8 and Unicode FAQ for Unix/Linux.
-
Environment setup
- Install suitable terminal
-
Mined is a text mode editor. Its UTF-8 support is available,
for example, with recent versions of
xterm (>= 145 recommended),
rxvt-unicode, mlterm, KDE Konsole, or on the Linux console,
each in UTF-8 mode.
- If you don't have a recent version of xterm on your system, compile
one yourself: configure xterm with the option "--enable-wide-chars" or
use the script "configure-xterm" from the
mined runtime support library. Then invoke "make".
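The build step above might be scripted roughly as follows. This is a minimal sketch, assuming an xterm source tree has already been unpacked somewhere; the function name build_xterm and its directory argument are hypothetical, while "--enable-wide-chars" and "make" are the option and command named above.

```shell
#!/bin/sh
# Sketch: configure and build xterm with wide-character support.
# build_xterm and its argument are illustrative names (assumptions),
# not part of mined or xterm.
build_xterm() {
    srcdir=$1    # directory containing the unpacked xterm sources
    if [ ! -x "$srcdir/configure" ]; then
        echo "no executable configure script in $srcdir" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$srcdir" || exit 1
        ./configure --enable-wide-chars && make
    )
}

# Usage (not run automatically; the path is only an example):
#   build_xterm "$HOME/src/xterm"
```

The function only builds xterm; installing the resulting binary is left to you.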
- Install suitable fonts
-
Install Unicode fonts for your X server.
- To check whether your X installation already provides Unicode fonts,
you may invoke the command "xlsfonts | grep iso10646".
If this doesn't list anything, or if you cannot find a suitable font
setup, do one of the following:
-
- Automatic installation:
- The mined runtime support library contains a script
"installfonts" that downloads these fonts and installs them with your
X server. Finally, it gives some hints on how to add them to your
permanent font configuration.
- Manual installation:
-
- Retrieve some of the following fonts:
- UCS fonts for X with their CJK supplement,
from Markus Kuhn's page "Unicode fonts and tools for X11"
- Adobe and B&H bitmap fonts from the same site, which contain
fixed-width Courier and Lucida Typewriter fonts
- Unicode VGA font from Dmitry Bolkhovityanov's site
- Monospace Roman BDF fonts and their Oblique / Bold / Bold Oblique
supplements from George Williams' Unicode fonts page
- The nicest looking font in the UCS fonts archive mentioned above
is the 10x20 font, which is suitable for higher screen resolutions.
Unfortunately, the CJK double-width fonts are not distributed in
the corresponding 20x20 size, but only in the 18x18 size. The
corresponding single-width font in 9x18 size, however, looks quite
spindly and, to my taste, rather awkward.
For this reason, I am providing a script that generates 20x20 CJK fonts
automatically from the 18x18 UCS fonts distributed for X servers.
It is called "bdf18to20", and you can find it in the mined runtime
support library. Go into the directory where you unpacked the fonts
and invoke the script.
- Install the fonts with your X server: unpack them into a directory
(e.g. $HOME/xfonts), go into that directory, and invoke the
"mkfontdir" command. Then make sure that the fonts are
loaded into your X server, using the command
"xset +fp $HOME/xfonts"; a suitable place to include this
automatically would be your X initialisation file $HOME/.xinitrc,
if you have one.
- Note: If you are working in a network, make sure the xset
command is invoked such that the X server has access to the given
directory on the machine it is running on.
- Some X servers (e.g. Exceed on Windows) do not accept BDF fonts;
use the "Compile Fonts" function of the configuration menu to install
the fonts.
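The font check and the manual installation steps above can be sketched as two small shell functions. The function names are hypothetical; xlsfonts, mkfontdir and "xset +fp" are the commands named in the text. The extra "xset fp rehash" call is my addition; it asks the X server to re-read its font path.

```shell
#!/bin/sh
# Sketch: check for Unicode fonts and register a private font directory.
# Function names are illustrative (assumptions), not part of mined.

# Count the ISO 10646 (Unicode) fonts in an "xlsfonts" listing read
# from stdin. Prints 0 (and returns non-zero) if there are none.
count_unicode_fonts() {
    grep -c 'iso10646'
}

# Build the font index in a directory and prepend it to the X font path.
install_font_dir() {
    fontdir=${1:-$HOME/xfonts}
    ( cd "$fontdir" && mkfontdir ) || return 1
    # "xset fp rehash" is an addition to the steps in the text.
    xset +fp "$fontdir" && xset fp rehash
}

# Usage (needs a running X server, so nothing is executed here):
#   xlsfonts | count_unicode_fonts
#   install_font_dir "$HOME/xfonts"
```

To make the setting permanent, the call to install_font_dir (or the plain xset command) could go into $HOME/.xinitrc, as suggested above.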
- Start terminal in UTF-8 mode
-
Invoke a terminal window in UTF-8 mode and configure it to use
fonts sufficient to display the text you want to edit.
- Invoke xterm with suitable resource configuration or command line
parameters.
I recommend invoking xterm with the Unicode script "uterm"
from the mined runtime support library.
Since mined 2000.8, UTF-8 mode is auto-detected, so mined will work
even if your locale environment is not configured correctly.
- Note: xterm is quite touchy about configuring suitable
matching fonts for single-width and double-width glyphs. If you are
unlucky, CJK character display will result in garbage on the screen.
My recommendation is to generate the 20x20 UCS fonts with my
"bdf18to20" script as mentioned above and configure xterm
to use 10x20; it will then automatically select one of the 20x20
fonts for double-width characters. If you have a preference among
them, use the -fw command line option or the wideFont X resource (in
your $HOME/.Xdefaults file).
See the pattern file "Xdefaults.mined" in the mined runtime
support library for suggestions of suitable entries.
(Double-width font matching works much better with rxvt, which even seems
to scale double-width fonts in an acceptable way if needed.)
- If you prefer rxvt, use rxvt-unicode and make sure to select
UTF-8 by setting a UTF-8 locale in your environment that is installed
on your system.
- Note: rxvt is quite touchy about configuring a known locale
setting; it does not have a strict UTF-8 option that would reliably
work on all systems.
-
Hint: For details on how to configure the environment explicitly so
that rxvt, konsole and other applications work with UTF-8, see the
mined manual page (about LC_CTYPE and other environment variables).
Accurate locale setting is not needed by xterm and mined.
For other terminals (e.g. mlterm), see their manual for how to
configure UTF-8 mode.
- Alternatively, you can start mined directly together with its own
terminal window. For this purpose, the mined runtime support library
contains the script "umined"; this also quickly enables you
to use the most recent version of Unicode width data (specifying wide
and combining characters) as built into xterm, in contrast to
system-provided locale data, which may refer to an older version of Unicode.
- On a Windows system, you can also use the script "wmined"
or "wmined.bat", which will invoke mined
in an rxvt terminal window. "wmined" starts rxvt with
Windows look-and-feel colour settings and tries to match your font
size preferences by inspecting the Windows registry. The advantage of
using rxvt on Windows is that it can run stand-alone, without an X server.
The disadvantage is that rxvt-unicode does not run on Windows yet.
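Since rxvt, konsole and similar terminals depend on the locale environment, it can help to check which locale name is actually in effect before starting them. The following is a minimal sketch with hypothetical function names. It relies only on the standard precedence of the locale variables (LC_ALL over LC_CTYPE over LANG) and checks the locale name alone, not whether that locale is installed.

```shell
#!/bin/sh
# Sketch: decide whether the current environment selects a UTF-8 locale.
# Function names are illustrative (assumptions), not part of mined.

# Print the locale name that governs character type: LC_ALL overrides
# LC_CTYPE, which overrides LANG.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    else
        printf '%s\n' "${LANG:-}"
    fi
}

# Succeed if that locale name carries a UTF-8 codeset suffix.
is_utf8_locale() {
    case "$(effective_ctype)" in
        *.UTF-8*|*.utf8*|*.UTF8*|*.utf-8*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   if is_utf8_locale; then echo "UTF-8 locale selected"; fi
```

The command "locale -a" lists the locales installed on your system, which is what rxvt-unicode needs to find.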
-
Handling Unicode text with mined
- Screen handling
-
Usually, mined will auto-detect a UTF-8 terminal and also
the detailed features it has (like double-width and
combining characters, Arabic ligature joining, different width
data sets).
- Character encoding
-
By default, mined automatically detects whether the text in an edited
file is UTF-8 encoded (Unicode character set) or not (either
8 bit encoded or CJK encoded); it also detects and maintains UTF-16.
Mined handles illegal UTF-8 sequences transparently, so
if you accidentally open an 8 bit or CJK encoded file in UTF-8
mode, or a file with mixed parts, you can edit the text without
problems and will not lose any information. Non-UTF-8 codes
are indicated by display background highlighting.
While editing, you can switch the character encoding assumed
for text interpretation with the encoding menu
(left-click to toggle current and previous encoding,
right-click to open menu).
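If you want to know in advance whether a file is well-formed UTF-8, a quick check outside mined is possible. The sketch below is not mined's detection method; it simply round-trips the file through iconv, which fails on illegal UTF-8 sequences. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: test whether a file is well-formed UTF-8 by converting it
# from UTF-8 to UTF-8 with iconv. This is a stand-in for illustration,
# not the detection mined itself performs.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Usage (the file name is only an example):
#   if is_valid_utf8 notes.txt; then
#       echo "UTF-8"
#   else
#       echo "8 bit, CJK or mixed encoding"
#   fi
```

Note that a pure ASCII file also passes this check, since ASCII is a subset of UTF-8.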
- Unicode display on non-Unicode terminal
-
If a UTF-8 file is edited in a Latin-1 terminal environment,
characters outside of the Latin-1 range (greater than 0xFF)
are displayed as a block symbol "¤",
with special indications for wide and combining characters.
The Euro symbol is displayed as "E".
Please consult the manual page, section
"Unicode display", for details.
- Combining characters
-
Mined supports display and editing of combined characters
consisting of a base character and one or more combining
characters.
It provides two display modes: a combined display mode,
which displays the combined characters as they should appear,
and a separated display mode, which separates base character
and combining characters for explicit handling.
These modes can be selected and are indicated in the
Combining display flag: "ç" indicates combined mode,
"`" indicates separated mode.
See the manual page, section
"Combining characters", for details.
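To try out the two display modes, a small test file helps. The following sketch (function names are hypothetical) writes the same visible letter twice: once as a single precomposed character and once as a base character followed by a combining character.

```shell
#!/bin/sh
# Sketch: emit the same visible letter in precomposed and decomposed form.
# Octal escapes spell out the UTF-8 bytes so the script itself stays ASCII.
# Function names are illustrative (assumptions), not part of mined.

# U+00E7 LATIN SMALL LETTER C WITH CEDILLA (one character, bytes C3 A7).
print_precomposed() {
    printf '\303\247\n'
}

# U+0063 "c" followed by U+0327 COMBINING CEDILLA (bytes 63, then CC A7).
print_decomposed() {
    printf 'c\314\247\n'
}

# Usage: write both lines to a sample file and open it in mined.
#   { print_precomposed; print_decomposed; } > cedilla.txt
```

In combined display mode, both lines should look alike; in separated display mode, the second line should show the base character and the combining character separately.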
- Bidirectional display
-
Mined auto-detects if it is run in a terminal that supports
bidi scripts (e.g. mlterm), or it can be told so with the
command line parameter "+UU".
The mined runtime support library contains a script
"mterm" to invoke mlterm with suitable parameters
to set up bidi mode and a suitable font.
- CJK and 8 bit character set support on Unicode terminal
-
Mined also handles major CJK encodings in a UTF-8 terminal,
as well as a selected set of mapped 8 bit character sets.
See the mined features page,
or the manual page, sections
"CJK support" and
"Character encoding support", for details.
Mined homepage and download.
Thomas Wolff