In recent (as of 1998/08/11) kernels, the screen driver is based on the 16-bit Unicode (UCS2) encoding, which means that every console font that is loaded should come with a Unicode Screen Font Map (SFM for short), which tells, for each character in the font, the list of UCS2 characters it will render.
SFM's were formerly called ``Unicode Map'', or ``unimap'' for short, but this term should be dropped: what used to be called ``screen maps'' now use Unicode as well, so the old name probably confuses many people.
Starting with release 1997.11.13 of the Linux Console Tools, consolechars(8) now understands SFM fallback tables. Before that, a SFM had to contain both the Unicode codes of the characters the font was primarily meant to render and any approximations the user wanted. Fallback tables make it possible to put only the primary mappings in the SFM provided with the font-file, and to keep a separate list telling ``if no glyph for that character is available in the current font, then try to display it with the glyph for this one, or else the one for that one, or ...''. This keeps all possible fallbacks in one place, and lets everyone choose which fallback tables (s)he wants. Have a look at data/consoletrans/*.fallback for examples.
A fallback-table file is made of fallback entries, each entry being on
its own line. Empty lines, and lines beginning with the #
comment character are ignored.
A fallback entry is a series of 2 or more UCS2 codes. The first one is the character for which we want a glyph; the following ones are those whose glyph we want to use when no glyph designed specially for our character is available. The order of the codes defines a priority order (own glyph if available, then the second char's, then the third's, etc.).
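The file layout described above can be sketched as a small parser. This is only an illustration: the text does not say how the codes are written, so the U+XXXX notation, the sample entries, and the function names parse_code and parse_fallback_table are all assumptions. Treating whitespace-only lines as empty is an assumption as well.

```python
# Sketch of a fallback-table parser.
# ASSUMPTION: codes are written as U+XXXX; the text does not specify the notation.

SAMPLE_TABLE = """\
# hypothetical sample entries, not taken from a real .fallback file
U+00C0 U+0041

U+2018 U+0060 U+0027
"""


def parse_code(token):
    """Convert one 'U+XXXX' token (assumed notation) into an integer UCS2 code."""
    if not token.upper().startswith("U+"):
        raise ValueError(f"not a U+XXXX code: {token!r}")
    value = int(token[2:], 16)
    if not 0 <= value <= 0xFFFF:
        raise ValueError(f"outside the 16-bit UCS2 range: {token!r}")
    return value


def parse_fallback_table(text):
    """Return a list of (target, [fallback, ...]) tuples, in file order."""
    entries = []
    for line in text.splitlines():
        stripped = line.strip()
        # Empty lines and '#' comment lines are ignored.
        if not stripped or stripped.startswith("#"):
            continue
        codes = [parse_code(token) for token in stripped.split()]
        # An entry needs the target plus at least one fallback.
        if len(codes) < 2:
            raise ValueError(f"entry needs 2 or more codes: {line!r}")
        entries.append((codes[0], codes[1:]))
    return entries
```

Calling parse_fallback_table(SAMPLE_TABLE) yields one tuple per entry, with the fallbacks kept in priority order.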
If a SFM was to be loaded, the fallback mappings are added to this map before it is loaded. If there was none (i.e. a font without SFM was loaded and no --sfm option was given to consolechars, or the --force-no-sfm option was given), then the current SFM is requested from the kernel, the fallback mappings are added, and the resulting SFM is loaded back into the kernel.
Note that each fallback entry is checked against the original SFM (the one read from a file, or given by the kernel), not against the SFM we get by adding earlier fallback entries to it; this applies even to entries in different files, and thus the order of -k options has no effect. If you want some entries to be influenced by previous ones, you will have to use different fallback files, and load them with several consecutive invocations of consolechars -k.
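The rule that every entry is checked against the original SFM can be sketched as follows. The SFM is modelled here as a dict from UCS2 code to font position, which is a simplification chosen for the example; apply_fallbacks and the sample data are hypothetical names, and how duplicate targets are resolved is not stated in the text (this sketch keeps the first match).

```python
# Sketch: fallback entries are resolved against the ORIGINAL map only.
# ASSUMPTION: the SFM is modelled as {ucs2_code: font_position}.

def apply_fallbacks(original_sfm, entries):
    """Return a new map with fallback mappings added to original_sfm."""
    result = dict(original_sfm)
    for target, fallbacks in entries:
        # The character already has its own glyph: nothing to add.
        if target in original_sfm:
            continue
        for candidate in fallbacks:
            # Look only at the original map, never at mappings added
            # by earlier entries in this same pass.
            if candidate in original_sfm:
                # ASSUMPTION: on duplicate targets the first match wins.
                result.setdefault(target, original_sfm[candidate])
                break
    return result


ORIGINAL = {0x0041: 65}          # only 'A' has a glyph
ENTRY_A = (0x00C0, [0x0041])     # U+00C0 falls back to 'A'
ENTRY_B = (0x00C1, [0x00C0])     # U+00C1 falls back to U+00C0

# One pass: ENTRY_B finds no glyph, whatever the order of the entries.
one_pass = apply_fallbacks(ORIGINAL, [ENTRY_A, ENTRY_B])
reversed_pass = apply_fallbacks(ORIGINAL, [ENTRY_B, ENTRY_A])

# Two consecutive passes: the second pass sees the result of the first.
two_passes = apply_fallbacks(apply_fallbacks(ORIGINAL, [ENTRY_A]), [ENTRY_B])
```

The two-pass variant mirrors loading separate fallback files with consecutive invocations: only then does U+00C1 pick up the glyph that U+00C0 borrowed.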
There are basically 2 screen-modes (byte mode and UTF mode). The simpler to explain is the UTF mode, in which the bytes received from the application (i.e. written to the console screen) are interpreted as UTF8 sequences, which are converted into the equivalent UCS2 codes, and then looked up in the SFM to determine the glyphs used to display each character.
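As a rough illustration of the UTF mode path (bytes, then UCS2 codes, then glyphs), here is a minimal sketch. It models the SFM as a dict from UCS2 code to font position; render_utf_mode is a hypothetical name, and returning None for a character without a glyph is a placeholder chosen for the example, not the driver's behaviour. Malformed UTF8 input is not handled: Python's strict decoder simply raises.

```python
# Sketch of the UTF mode path: UTF8 bytes -> UCS2 codes -> font positions.
# ASSUMPTION: the SFM is modelled as {ucs2_code: font_position}.

def render_utf_mode(data, sfm):
    """Return the font position for each character in the UTF8 byte string."""
    glyphs = []
    for char in data.decode("utf-8"):
        code = ord(char)
        if code > 0xFFFF:
            # Outside the 16-bit UCS2 range, so it cannot be in the SFM.
            glyphs.append(None)
        else:
            # None marks "no glyph in this font" in this sketch.
            glyphs.append(sfm.get(code))
    return glyphs


SAMPLE_SFM = {0x0041: 65, 0x00E9: 130}   # hypothetical font positions
```

For instance, render_utf_mode("A\u00e9".encode("utf-8"), SAMPLE_SFM) looks up U+0041 and U+00E9 in turn.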
Switching to and from UTF mode is done by sending to the screen the escape sequences <ESC>%G and <ESC>%@ respectively. You may use the unicode_start(1) and unicode_stop(1) scripts instead, as they also change the keyboard mode, and let you optionally change the screen-font.
Use vt-is-UTF8(1) to find out whether the active VT is in UTF mode.
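For a program that wants to send these sequences itself, a minimal sketch could look like this. The constant and function names are hypothetical; only the two escape sequences come from the text. Unlike the scripts mentioned above, this does not touch the keyboard mode or the font.

```python
# Sketch: switch the screen to and from UTF mode by writing the escape
# sequences to a binary stream attached to the console.

UTF_MODE_ON = b"\x1b%G"    # <ESC>%G
UTF_MODE_OFF = b"\x1b%@"   # <ESC>%@


def set_utf_mode(stream, enabled):
    """Write the matching escape sequence to a binary stream and flush it."""
    stream.write(UTF_MODE_ON if enabled else UTF_MODE_OFF)
    stream.flush()
```

When run on a console, sys.stdout.buffer would be a natural stream to pass in.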
The byte mode is a bit more complicated, as it uses an additional map to transform the byte-characters sent by the application into UCS2 characters, which are then treated as described above. This map I call the Application Charset Map (ACM), because it defines the encoding the application uses, but it used to be called a ``screen map'', or ``console map'' (this comes from the time when the screen driver didn't use Unicode, and there was only one Map down there).
Although there is only one ACM active at a given time, there are 4 of them at any time in the kernel; 3 of them are built-in and never change, and they define the ISO latin1 charset, the DEC VT100 charset, and the IBM codepage 437; the 4th is user-definable, and defaults on boot to the ``straight to font'' mapping, described below under ``Special UCS2 codes''.
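The extra step of byte mode can be sketched like this. The ACM is modelled as a 256-entry list indexed by byte value and the SFM as a dict from UCS2 code to font position; both models, the function name render_byte_mode, and the sample tables are assumptions made for the example. As a simplification, None stands for a character with no glyph in the font.

```python
# Sketch of the byte mode path: byte -> ACM -> UCS2 code -> SFM -> glyph.
# ASSUMPTION: ACM is a 256-entry list, SFM is {ucs2_code: font_position}.

# ISO latin1 byte values coincide with Unicode codes U+0000..U+00FF.
LATIN1_ACM = list(range(256))

# Hypothetical user-defined ACM: like latin1, but byte 0x80 means U+20AC.
CUSTOM_ACM = list(LATIN1_ACM)
CUSTOM_ACM[0x80] = 0x20AC


def render_byte_mode(data, acm, sfm):
    """Return the font position for each byte written by the application."""
    glyphs = []
    for byte in data:
        code = acm[byte]              # first map: byte -> UCS2
        glyphs.append(sfm.get(code))  # second map: UCS2 -> glyph (None if absent)
    return glyphs


SAMPLE_SFM = {0x0041: 65, 0x20AC: 200}   # hypothetical font positions
```

The same byte string gives different glyphs depending on the active ACM, while the font and its SFM stay unchanged.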
The consolechars(1) command can be used to change the ACM, as well as the font and its associated SFM.
The Linux Console Driver has 2 slots for charsets, labeled G0 and G1. Each of these slots contains a reference to one of the 4 kernel ACMs, 3 of which are predefined to provide the cp437, iso01, and vt100 graphics charsets. The 4th one is user-definable; this is the one you can set with consolechars --acm and get with consolechars --old-acm. The console's defaults are iso01 for G0 and vt100 graphics for G1.
Versions of the Linux Console Tools prior to 1998.08.11, as well as all versions of kbd at least until 0.96a, always assumed you wanted to use the G0 slot, pointing to the user-defined ACM. You can now use the charset utility to tune your charset slots.
You will note that, although each VT has its own slot settings, there is only one user-defined ACM for use by all the VTs. That is, whereas you can have tty1 using G0=cp437 and G1=vt100, at the same time as tty2 using G0=iso01 and G1=iso02 (user-defined), you cannot have at the same time tty1 using iso02 and tty2 using iso03. This is a limitation of the Linux kernel.
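The limitation can be pictured with a toy model: per-VT slots that only hold a reference, and a single shared user-defined ACM. The class ConsoleModel and its methods are hypothetical and only illustrate the bookkeeping; they do not correspond to any kernel interface.

```python
# Toy model: each VT has its own G0/G1 slots, but slots only REFERENCE one
# of four kernel ACMs, and the user-defined ACM exists once for all VTs.

class ConsoleModel:
    BUILTIN = ("iso01", "vt100", "cp437")

    def __init__(self):
        self.user_acm = "straight-to-font"   # boot-time default
        self.vts = {}

    def add_vt(self, name):
        # Console defaults: iso01 for G0, vt100 graphics for G1.
        self.vts[name] = {"G0": "iso01", "G1": "vt100"}

    def set_slot(self, vt, slot, acm):
        if acm not in self.BUILTIN + ("user",):
            raise ValueError(f"not one of the 4 kernel ACMs: {acm!r}")
        self.vts[vt][slot] = acm

    def load_user_acm(self, charset):
        # There is a single user-defined ACM, so this affects every VT
        # whose slot refers to it.
        self.user_acm = charset

    def effective(self, vt, slot):
        ref = self.vts[vt][slot]
        return self.user_acm if ref == "user" else ref


console = ConsoleModel()
console.add_vt("tty1")
console.add_vt("tty2")
console.set_slot("tty1", "G0", "user")
console.set_slot("tty2", "G0", "user")
console.load_user_acm("iso02")
console.load_user_acm("iso03")   # replaces iso02 for tty1 as well
```

After the second load, both tty1 and tty2 see iso03 in G0: the slots are per-VT, the user-defined map is not.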
Note that you can emulate such a setting using the filterm utility, with your console in UTF8-mode, by telling filterm to translate screen output on-the-fly to UTF8.
You'll find filterm in the konwert package, by Marcin Kowalczyk, which is available from his WWW site.
There are special UCS2 values you should care about, but the present list is probably not exhaustive:
Codes C from U+F000 to U+F1FF are not looked up in the SFM, and directly access the character in font-position C & 0x01FF (yes, a font can be 512-chars on many hardware platforms, like VGA). This is referred to as the straight to font zone.
U+FFFD is the replacement character, usually at font-position 0 in a font. It is displayed by the kernel each time the application requests a Unicode character that is not present in the SFM. This not only keeps the driver safe in Unicode mode, but also prevents displaying invalid characters when the ACM on a particular VT contains characters not in the current font!
There was a time when the kernel didn't know anything about Unicode. In this ancient time, Application Charset Maps were called ``screen maps'', and just mapped the application's characters into font positions. The file format used for these 8bit ACM's is still supported for backward compatibility, but should not be used any more.
The old way of using custom ACM's didn't know about Unicode, so the ACM had to depend on the font. Now, as each VT chooses its own ACM (from the 4 in the kernel at a given time), and as the console-font is common to all VT's, we can use a charset even if the font can't display all of its characters; it will then display the replacement character (U+FFFD).
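The glyph lookup with the two special cases (the straight to font zone and the replacement character) can be sketched as follows. The SFM is again modelled as a dict from UCS2 code to font position; glyph_for is a hypothetical name, and falling back to font position 0 when the SFM has no entry for U+FFFD is an assumption based on the remark that the replacement glyph is usually there.

```python
# Sketch of glyph lookup with the special UCS2 codes.
# ASSUMPTION: the SFM is modelled as {ucs2_code: font_position}.

STRAIGHT_TO_FONT_FIRST = 0xF000
STRAIGHT_TO_FONT_LAST = 0xF1FF
REPLACEMENT_CHAR = 0xFFFD


def glyph_for(code, sfm):
    """Return the font position used to display one UCS2 code."""
    # Straight to font zone: bypass the SFM, keep the low 9 bits.
    if STRAIGHT_TO_FONT_FIRST <= code <= STRAIGHT_TO_FONT_LAST:
        return code & 0x01FF
    # Normal case: the font has a glyph for this character.
    if code in sfm:
        return sfm[code]
    # Otherwise show the replacement character.
    # ASSUMPTION: position 0 if the SFM does not list U+FFFD itself.
    return sfm.get(REPLACEMENT_CHAR, 0)


SAMPLE_SFM = {0x0041: 65, 0xFFFD: 0}   # hypothetical font positions
```

A charset whose characters are missing from the font still displays something sensible this way, since every miss ends up on the replacement glyph.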
psfaddtable(1), psfgettable(1), psfstriptable(1), showfont(1).