540 lines
26 KiB
Plaintext
540 lines
26 KiB
Plaintext
Name: 0README.TXT
|
|
|
|
Background information - mapping tables for the Mac(TM) OS
|
|
Version 6: Nov. 15, 1995 - update info for Hebrew and Thai
|
|
(Version 5: Apr. 15, 1995)
|
|
Peter Edberg, Apple Computer, Inc. <edberg1@applelink.apple.com>
|
|
Copyright (C) 1995 by Apple Computer, Inc., all rights reserved.
|
|
|
|
|
|
0. Preliminaries
|
|
----------------
|
|
|
|
For maximum interchangeability, this file and the accompanying MacOS
|
|
mapping tables use only ASCII characters and are intended to be
|
|
displayed in a monospaced font. Every line terminates with carriage
|
|
return.
|
|
|
|
Apple, the Apple logo, Mac, and Macintosh are trademarks of Apple
|
|
Computer, Inc., registered in the United States and other countries.
|
|
QuickDraw and TrueType are trademarks of Apple Computer, Inc. Unicode is
|
|
a trademark of Unicode Inc. PostScript is a trademark of Adobe Systems
|
|
Inc., which may be registered in certain jurisdictions. IBM is a
|
|
registered trademark of International Business Machines Corporation. ITC
|
|
Zapf Dingbats is a registered trademark of the International Typeface
|
|
Corporation. For the sake of brevity, throughout this document and the
|
|
accompanying tables, "Unicode" can be used to refer to the Unicode
|
|
standard.
|
|
|
|
Apple Computer, Inc. ("Apple") makes no warranty or representation,
|
|
either express or implied, with respect to this document and the
|
|
accompanying tables, their quality, accuracy, or fitness for a
|
|
particular purpose. In no event will Apple be liable for direct,
|
|
indirect, special, incidental, or consequential damages resulting from
|
|
any defect or inaccuracy in this document or the accompanying tables.
|
|
|
|
1. Introduction
|
|
---------------
|
|
|
|
In order to understand the accompanying MacOS mapping tables, you will
|
|
need to understand something about the MacOS Unicode Converter. This
|
|
converter has been designed to handle the complex issues that can arise
|
|
when converting between Unicode and other character sets, including:
|
|
* Round-trip fidelity (and the use of corporate characters);
|
|
* Supporting various mapping tolerance levels (strict, loose), in order
|
|
to provide both round-trip fidelity and a way to handle characters
|
|
that have multiple or ambiguous semantics in some character sets;
|
|
* Handling character set variants and extensions, which may require
|
|
selective inclusion or exclusion of certain mappings or sets of
|
|
mappings;
|
|
* Mapping single characters in one set to multiple characters in another
|
|
or vice versa (in general, a match may map a sequence of 1 to n
|
|
characters in one set to a sequence of 0 to m characters in another);
|
|
* Handling mappings that may depend on attributes such as resolved
|
|
character direction, vertical or horizontal display direction, etc.
|
|
|
|
The above issues are described in more detail in sections 2-6. Section 7
|
|
provides some general information on MacOS character sets and a list of
|
|
MacOS character encodings.
|
|
|
|
This document and all of the accompanying mapping tables and character
|
|
lists are preliminary and subject to change. Updated documents and
|
|
tables will be available from the Unicode Inc. ftp site (unicode.org),
|
|
the Apple ftp site (ftp.info.apple.com), the Apple Computer World-Wide
|
|
Web pages (http://www.info.apple.com), and possibly on diskette from
|
|
APDA (Apple's mail-order distribution service for developers).
|
|
|
|
2. Round-trip fidelity and corporate characters
|
|
-----------------------------------------------
|
|
|
|
For the various national and international standards that were sources
|
|
for Unicode, Unicode provides round-trip fidelity: Text in one of those
|
|
encodings can be mapped to Unicode and back again with no loss of
|
|
information. Characters which were distinct in the source standard are
|
|
distinct in Unicode.
|
|
|
|
However, Unicode does not attempt to provide round-trip fidelity for
|
|
most vendor standards. Nevertheless, Apple and other platform vendors
|
|
may need to provide such round-trip fidelity for their current encodings
|
|
(this can be important in file systems, for example). In order to do
|
|
this, Apple maps some characters in the current MacOS encodings to
|
|
character codes at the upper end of the Unicode private use area (i.e.
|
|
the corporate use zone). In general, these are characters that are
|
|
rarely used in text that is interchanged with other systems, or
|
|
characters for which mistranslation in interchange would have a minimal
|
|
impact on most documents. Apple's usage of character codes in the
|
|
corporate use zone is documented in the accompanying file
|
|
"MacOS_CorpChars".
|
|
|
|
There is another round-trip fidelity issue that is important for the
|
|
MacOS Unicode converter. Among other things, this converter will be used
|
|
to convert between two non-Unicode encodings by using Unicode as an
|
|
intermediate form. For example, characters in the MacOS standard Roman
|
|
encoding could be converted to ISO/IEC 8859-1 by converting them first
|
|
to Unicode, and then converting the Unicode text to ISO/IEC 8859-1.
|
|
However, not all MacOS standard Roman characters can be represented in a
|
|
distinct way in ISO/IEC 8859-1. In such cases it is useful to know the
|
|
subset of MacOS Roman characters that can be converted to 8859-1 and
|
|
back (via Unicode) with no loss of information.
|
|
|
|
3. Mapping tolerance: Strict and loose
|
|
--------------------------------------
|
|
|
|
In many character sets, a single character may have multiple semantics,
|
|
either by explicit definition, ambiguous definition, or established
|
|
usage. For example, the JIS character 0x2142 (Shift-JIS 0x8161) is
|
|
specified in the JIS X0208 standard to have two meanings: "double
|
|
vertical line" and "parallel". Each of these meanings corresponds to a
|
|
different Unicode character: 0x2016 "DOUBLE VERTICAL LINE" and 0x2225
|
|
"PARALLEL TO". When mapping from Unicode to JIS, it is normally
|
|
desirable to map both of these Unicode characters to the single JIS
|
|
character 0x2142. However, when mapping this JIS character to Unicode,
|
|
we can choose only one of the possible Unicode characters.
|
|
|
|
For some character set X, the converse of the X-to-Unicode mappings are
|
|
called "strict" mappings from Unicode to X. In general, strict mappings
|
|
permit roundtrip conversion from Unicode to X and back for a subset of
|
|
Unicode characters. Strict mappings are useful when round-trip fidelity
|
|
is desired for an X-to-Unicode-to-Y mapping.
|
|
|
|
For some characters in X, there may be additional mappings from Unicode
|
|
that fall within the range of explicit or established usage for those
|
|
characters; these are called "loose" mappings. It is important to note
|
|
that the range of allowed loose mappings is determined by the character
|
|
set X.
|
|
|
|
Furthermore, in some cases it is helpful to map a Unicode character to a
|
|
sequence of one or more target characters that may not have the same
|
|
meaning or use, but which may provide an approximate graphic
|
|
representation of the corresponding Unicode character. These are called
|
|
"fallback" mappings.
|
|
|
|
Some examples of strict and loose mappings:
|
|
|
|
a) In the JIS example above, JIS 0x2142 is usually mapped to Unicode
|
|
0x2016 "DOUBLE VERTICAL LINE". Thus the reverse mapping is a strict
|
|
mapping from Unicode to JIS, while mapping Unicode 0x2225 "PARALLEL TO"
|
|
to JIS 0x2142 is a loose mapping.
|
|
|
|
b) When mapping ASCII to Unicode, 0x0A "line feed" and 0x0D "carriage
|
|
return" are usually mapped to the Unicode code points 0x000A and 0x000D.
|
|
When mapping Unicode to ASCII, loose mappings could include mapping
|
|
0x2028 "LINE SEPARATOR" to 0x0A and mapping 0x2029 "PARAGRAPH SEPARATOR"
|
|
to 0x0D.
|
|
|
|
c) Other loose mappings from Unicode to ASCII might include mapping
|
|
Unicodes 0x2010 "HYPHEN" and 0x2212 "MINUS SIGN" to ASCII 0x2D "hyphen-
|
|
minus".
|
|
|
|
d) In the conventional mapping from ISO/IEC 8859-1 to Unicode, the
|
|
8859-1 character 0xE0 "small letter a with grave accent" is mapped to
|
|
Unicode 0x00E0 "LATIN SMALL LETTER A WITH GRAVE", so the reverse mapping
|
|
is a strict mapping from Unicode to 8859-1. However, the two-character
|
|
Unicode sequence 0x0061+0x0300 ("LATIN SMALL LETTER A" + "COMBINING
|
|
GRAVE ACCENT") can also be mapped to 8859-1 0xE0 as a loose mapping.
|
|
|
|
e) Since Shift-JIS distinguishes halfwidth and fullwidth characters,
|
|
loose mappings for Shift-JIS must also keep these distinct. For example,
|
|
Shift-JIS 0x814D (JIS 0x212E) "grave accent [fullwidth]" is often mapped
|
|
to Unicode 0xFF40, "FULLWIDTH GRAVE ACCENT", and the reverse is a strict
|
|
mapping. In this case the Unicode sequence 0x3000+0x0300 "IDEOGRAPHIC
|
|
SPACE" + "COMBINING GRAVE ACCENT" can also be mapped to Shift-JIS 0x814D
|
|
as a loose mapping. However, the Unicode sequence 0x0020+0x0300 "SPACE"
|
|
+ "COMBINING GRAVE ACCENT" should not be mapped to Shift-JIS 0x814D as a
|
|
loose mapping, although this sequence could be mapped to Shift-JIS 0x60
|
|
"grave accent [halfwidth]" as a loose mapping.
|
|
|
|
Although the MacOS Unicode converter (and its tables) supports strict,
|
|
loose, and fallback mappings, the MacOS character mapping tables
|
|
accompanying this document provide only the strict mappings.
|
|
|
|
4. Character set variants and extensions
|
|
----------------------------------------
|
|
|
|
An example illustrates this issue:
|
|
|
|
The MacOS standard Japanese character set is based on Shift-JIS with
|
|
some additional characters. The additions include:
|
|
(a) For one-byte characters, five additions and one modification.
|
|
(b) Separately-encoded vertical forms for some punctuation and kana
|
|
characters from JIS rows 1, 4, and 5. These vertical forms are in
|
|
JIS rows 85, 88, and 89.
|
|
(c) Apple extension characters, in JIS rows 9-15.
|
|
|
|
However, in older versions of the Japanese system, some of the fonts
|
|
were based on a different encoding which did not include the Apple
|
|
extension characters and which encoded vertical forms in JIS rows 11,
|
|
14, and 15. Furthermore, PostScript fonts use a different set of
|
|
extensions in rows 9-15.
|
|
|
|
With the MacOS Unicode converter, several variants can be specified for
|
|
the MacOS Japanese character set:
|
|
* The standard set, with extensions and vertical forms;
|
|
* A reduced version of the standard set, without the separately encoded
|
|
vertical forms;
|
|
* An alternate set that corresponds to the old font variant;
|
|
* An alternate set that corresponds to the PostScript variant;
|
|
* A basic "least common denominator" set that works with all the old and
|
|
new fonts.
|
|
|
|
The MacOS Japanese character set mappings provided in the accompanying
|
|
tables cover only the standard character set, but they are grouped into
|
|
three sections: the basic set, the Apple extensions, and the vertical
|
|
forms.
|
|
|
|
5. Mappings that are not one-to-one
|
|
-----------------------------------
|
|
|
|
In some cases, a character in a non-Unicode character set may map to a
|
|
sequence of characters in Unicode. To handle the reverse mapping, the
|
|
MacOS Unicode converter can break a Unicode stream into appropriate text
|
|
elements (which may consist of more than one Unicode character) and can
|
|
look up multi-character Unicode sequences.
|
|
|
|
For example, the Apple extensions in the MacOS standard Japanese
|
|
character set include a character for the circled CJK ideograph for
|
|
"big". Although Unicode encodes other circled ideographs as single
|
|
characters, it does not encode this one. However, this character can be
|
|
represented in Unicode as the Unicode sequence 0x5927+0x20DD, the CJK
|
|
ideograph for "big" followed by COMBINING ENCLOSING CIRCLE.
|
|
|
|
In addition, a single Unicode character (or a multi-character Unicode
|
|
sequence) may map to a sequence of multiple characters in another
|
|
encoding. For example, the Unicode character 0x00BD "VULGAR FRACTION ONE
|
|
HALF" cannot be mapped into the MacOS standard Roman character set as a
|
|
single character, but it can be mapped to the sequence 0x31+0xDA+0x32,
|
|
"digit one" + "fraction slash" + "digit two" (normally this would be a
|
|
loose mapping).
|
|
|
|
Finally, some Unicode characters may be silently consumed when mapping
|
|
to some other encodings. For example, when mapping from Unicode to the
|
|
MacOS Arabic character set, resolved direction is used to disambiguate
|
|
some mappings (this is discussed in the next section). Direction
|
|
override characters (Unicodes 0x202C-0x202E) may be used to control the
|
|
resolved direction to achieve proper results. Having fulfilled this
|
|
role, the direction override characters can then be discarded. They are
|
|
included among the Unicode characters that can be represented in the
|
|
MacOS Arabic set (they are represented by the direction inherent in
|
|
certain characters), but there is no specific output character that
|
|
corresponds to them.
|
|
|
|
The accompanying mapping tables for the MacOS Japanese character set and
|
|
the MacOS Arabic character set include one-to-many mappings.
|
|
|
|
6. Mappings that depend on attributes
|
|
-------------------------------------
|
|
|
|
Mappings from Unicode to other character sets may depend on attributes
|
|
such as resolved character direction, the state of symmetric swapping,
|
|
and whether the text should use vertical form codes if available (i.e.
|
|
whether the text is intended for vertical display on a system that
|
|
cannot automatically substitute vertical forms).
|
|
|
|
a) Resolved character direction
|
|
|
|
The MacOS Arabic character set was developed in 1986-1987. At that time
|
|
the bidirectional line layout algorithm used in the MacOS was fairly
|
|
simple; it used only a few direction classes (instead of the 13 or so
|
|
now used in the Unicode bidirectional algorithm). In order to permit
|
|
users to handle some tricky layout problems, certain punctuation and
|
|
symbol characters have duplicate code points, one with a left-right
|
|
direction attribute and the other with a right-left direction attribute.
|
|
|
|
For example, ampersand is encoded at 0x26 with a left-right attribute,
|
|
and at 0xA6 with a right-left attribute. However, there is only one
|
|
ampersand character in Unicode. We need to have a way to map both of the
|
|
MacOS Arabic ampersand characters to Unicode and back again without loss
|
|
of information. Mapping one of the MacOS Arabic ampersand characters to
|
|
a code in the Unicode corporate use zone is undesirable, since both of
|
|
the ampersand characters are likely to be used in text that is
|
|
interchanged with other systems.
|
|
|
|
The problem is solved with the use of direction override characters and
|
|
direction-dependent mappings. When mapping from the MacOS Arabic
|
|
character set to Unicode, such problem characters are surrounded with an
|
|
appropriate direction override:
|
|
MacOS Arabic 0x26 ampersand (left)
|
|
-> Unicode 0x202D (LRO) + 0x0026 (AMPERSAND) + 0x202C (PDF)
|
|
MacOS Arabic 0xA6 ampersand (right)
|
|
-> Unicode 0x202E (RLO) + 0x0026 (AMPERSAND) + 0x202C (PDF)
|
|
The mappings from Unicode to MacOS Arabic can be disambiguated by the
|
|
use of resolved direction:
|
|
Unicode 0x0026 -> MacOS Arabic 0x26 (if L) or 0xA6 (if R)
|
|
|
|
Direction overrides are also used for some other purposes in mapping
|
|
MacOS Arabic characters to Unicode. For example, the single MacOS Arabic
|
|
ellipsis character has direction class right-left, while the Unicode
|
|
HORIZONTAL ELLIPSIS character has direction class neutral. When mapping
|
|
the MacOS ellipsis to Unicode, it is surrounded with a direction
|
|
override to help preserve proper text layout. However, resolved
|
|
direction is not needed or used when mapping the Unicode HORIZONTAL
|
|
ELLIPSIS back to MacOS Arabic.
|
|
|
|
b) Symmetric swapping
|
|
|
|
In loose mappings from Unicode to the MacOS Arabic character set, the
|
|
state of symmetric swapping (which may be changed by the Unicode
|
|
characters 0x206A, 0x206B) affects the mapping of paired characters such
|
|
as punctuation and brackets. This does not affect the strict mappings
|
|
given in the accompanying tables.
|
|
|
|
c) Horizontal or vertical display
|
|
|
|
As noted above, the MacOS standard Japanese character set (for
|
|
historical reasons) includes separately-encoded vertical forms for some
|
|
punctuation and kana. When Unicode characters in the CJK punctuation and
|
|
kana ranges are mapped to MacOS Japanese characters and (1) those
|
|
characters are intended for vertical display, (2) they will be displayed
|
|
in an environment that does not provide automatic vertical form
|
|
substitution, and (3) loose mappings are being used, a vertical display
|
|
attribute can be used to map certain Unicode characters to the
|
|
corresponding vertical form codes in the MacOS Japanese character set.
|
|
|
|
Note that this capability is only used for loose mappings, and does not
|
|
affect the strict mappings given in the accompanying tables. Also note
|
|
that this does not affect mapping of the Unicode vertical presentation
|
|
forms (which always map to the MacOS Japanese vertical form codes if
|
|
those codes are available in the specified variant). Finally, note that
|
|
the QuickDraw(TM) GX display environment does provide automatic vertical
|
|
forms substitution with appropriate fonts.
|
|
|
|
7. MacOS character sets
|
|
---------------------------
|
|
|
|
The MacOS can support multiple character sets. In the current MacOS
|
|
architecture these character sets are distinguished primarily by script
|
|
code: font family IDs are grouped into ranges, and each range is
|
|
associated with a script code.
|
|
|
|
In some cases, there are several variant encodings that share a single
|
|
script code. Usually these are minor variants. To distinguish these
|
|
variants, additional information is required, such as font name or
|
|
system localization code.
|
|
|
|
The encodings described here (and in the accompanying tables) are the
|
|
encodings used in MacOS versions 7.1 and later. In some cases, certain
|
|
earlier system versions have used variants of these encodings.
|
|
|
|
In all MacOS encodings, character codes 0x00-0x7F are identical to ASCII
|
|
(except for MacOS Japanese, which changes reverse solidus to yen sign).
|
|
Fonts used as "system" fonts (for menus, dialogs, etc.) have four glyphs
|
|
at code points 0x11-0x14 for transient use by the Menu Manager. These
|
|
glyphs are not intended as characters for use in normal text, and the
|
|
associated code points are not generally interpreted as associated with
|
|
these glyphs. (However, a "system font variant" mapping table could
|
|
provide mappings for these).
|
|
|
|
Note that in general, character sets cannot be determined from font
|
|
layouts (they are not the same thing!). This is most noticeable with
|
|
Arabic, Hebrew, and Devanagari.
|
|
|
|
The following is a list of current MacOS character sets. The
|
|
accompanying tables provide mappings from many of these encodings to
|
|
Unicode.
|
|
|
|
a) MacOS encodings for script code 0, smRoman.
|
|
|
|
* Standard Roman - this is the default for script code 0 (when the
|
|
special cases listed below do not apply). It covers several western
|
|
European languages, and includes math operators and various symbols.
|
|
|
|
* Symbol - this is the encoding for the font named "Symbol". It includes
|
|
Greek letters, math operators, and miscellaneous symbols. The layout
|
|
of the Symbol character set is identical to the layout of the Adobe
|
|
Symbol encoding vector, with the addition of the Apple logo at 0xF0.
|
|
The Symbol character set encodes some glyph fragments (of arrows,
|
|
brackets, etc.) as well as both serif and sans-serif forms for
|
|
copyright, registered, and trade mark sign; round-trip mapping of
|
|
these characters requires the use of corporate characters.
|
|
|
|
* Dingbats - this is the encoding for the font named "Zapf Dingbats".
|
|
The layout of the Dingbats character set is identical to or a superset
|
|
of the layout of the Adobe Zapf Dingbats encoding vector.
|
|
|
|
* Turkish - this is the encoding if the script code is 0 and the system
|
|
region code (system localization) is 24, verTurkey. It has 7 code
|
|
point differences from standard Roman.
|
|
|
|
* Croatian - this is the encoding if the script code is 0 and the system
|
|
region code is 68, verCroatia (or 25, verYugoCroatian, only used in
|
|
older systems). It has 20 code point differences from standard Roman,
|
|
but only 10 differences in repertoire.
|
|
|
|
* Icelandic - this is the encoding if the script code is 0 and the
|
|
system region code is 21, verIceland. It has 6 code point differences
|
|
from standard Roman.
|
|
|
|
* Romanian - this is the encoding if the script code is 0 and the system
|
|
region code is 39, verRomania . It has 6 code point differences from
|
|
standard Roman.
|
|
|
|
* Standard Greek (monotonic) - this is the encoding if the script code
|
|
is 0 and the system region code is 20, verGreece. Although a script
|
|
code is defined for Greek, the Greek localized system does not use it
|
|
(the font family IDs are in the smRoman range). This encoding is based
|
|
on the ISO/IEC 8859-7 repertoire with additional Roman characters for
|
|
French and German, as well as additional symbols.
|
|
Greek system 4.1 used a different encoding that matched 8859-7 code
|
|
points for Greek letters. Greek system 6.0.7 also used a variant of
|
|
the standard encoding, but it was quickly replaced by Greek system
|
|
6.0.7.1 which used the standard encoding.
|
|
NOTE- The Greek Language Kit, when released, will use the Greek script
|
|
code (its Greek fonts will have family IDs in the smGreek range); see
|
|
notes under script code 6 below.
|
|
|
|
See also the Central European Roman encoding under script code 29
|
|
below.
|
|
|
|
b) MacOS encodings for script code 1, smJapanese.
|
|
|
|
* Standard Japanese - this is the default for script code 1. As
|
|
described above, it is based on a Shift-JIS implementation of JIS
|
|
X0208-1990 ("fullwidth") and JIS X0201-1976 ("halfwidth"), with 5
|
|
additional one-byte characters and one modified character, a set of
|
|
Apple extension characters which include many industry standard
|
|
extensions, and separate codes for vertical forms of some punctuation
|
|
and kana.
|
|
|
|
There are two variants of standard Japanese associated with specific
|
|
fonts: (1) For MaruGothic and HonMincho TrueType fonts in system
|
|
software release J-7.1 and Japanese Language Kit 1.0, and (2) for the
|
|
PostScript fonts Gothic BBB and Ryumin, which are used with the screen
|
|
fonts ChuGothic and SaiMincho. Although they are supported by the
|
|
MacOS Unicode converter, these variants are not documented here or
|
|
in the accompanying tables. The MacOS Unicode converter also
|
|
supports some artificial variants which are just subsets of the
|
|
standard Japanese encoding.
|
|
|
|
c) MacOS encodings for script code 2, smTradChinese.
|
|
|
|
* Standard Traditional Chinese - this is an extension of Big-5.
|
|
|
|
d) MacOS encodings for script code 3, smKorean.
|
|
|
|
* Standard Korean - this is a "shifted" implementation of KSC 5601-1987
|
|
(0xA0 is added to the row and to the column), with some additional
|
|
characters.
|
|
|
|
e) MacOS encodings for script code 4, smArabic.
|
|
|
|
* Standard Arabic - This is based on the ISO/IEC 8859-6 repertoire, with
|
|
additional Arabic letters for Persian and Urdu and with additional
|
|
Roman letters for European languages. It has the interesting feature
|
|
mentioned above that certain ASCII punctuation and symbol characters
|
|
are encoded twice, once for each direction. Digit character codes
|
|
0x30-0x39 have left-to-right directionality, and may be displayed with
|
|
either European digit forms or Arabic digit forms depending on
|
|
context. Digit codes 0xB0-0xB9 have right-left directionality and are
|
|
always displayed with Arabic digit forms; these are used for special
|
|
layout situations such as part numbers.
|
|
|
|
f) MacOS encodings for script code 5, smHebrew.
|
|
|
|
* Standard Hebrew - This is based on the ISO/IEC 8859-8 Hebrew letter
|
|
repertoire, but adds Hebrew points, some Hebrew ligatures, some
|
|
additional Roman letters for European languages, and some non-ASCII
|
|
punctuation. As with standard Arabic, certain ASCII punctuation and
|
|
symbol characters are encoded twice, once for each direction. This
|
|
is also true for the European digits.
|
|
|
|
There is one minor variant of standard Hebrew associated with
|
|
certain fonts, in which LEFT SINGLE QUOTATION MARK at 0xD4 is
|
|
replaced by FIGURE SPACE.
|
|
|
|
g) MacOS encodings for script code 6, smGreek.
|
|
|
|
This script code will refer to the encoding used with the Greek
|
|
Language Kit, when released. It will either be the standard Greek
|
|
encoding described above, or a variant that supports polytonic Greek.
|
|
|
|
h) MacOS encodings for script code 7, smCyrillic.
|
|
|
|
* Standard Cyrillic - this is the default for script code 7 (when the
|
|
special cases listed below do not apply). It is based on the ISO/IEC
|
|
8859-5 Cyrillic character repertoire.
|
|
|
|
* Ukrainian - this is the encoding if the script code is 7 and the
|
|
system region code (system localization) is 62, verUkraine. It has 2
|
|
code point differences from standard Cyrillic (it adds a case pair
|
|
for GHE WITH UPTURN).
|
|
|
|
* Bulgarian -
|
|
|
|
An additional Cyrillic variant has been defined to cover the Cyrillic
|
|
characters needed for the languages of the central Asian republics
|
|
(plus Russian): Uzbek, Kazakh, Kirghiz, Azerbaijani, Turkmen, Tajik).
|
|
|
|
i) MacOS encodings for script code 9, smDevanagari.
|
|
|
|
* Standard Devanagari - This is an extension of IS 13194:1991 (ISCII-91)
|
|
but is not yet fully defined. The Devanagari encoding used in system
|
|
software versions 6.x was different, and was based on ISCII-88.
|
|
|
|
j) MacOS encodings for script code 21, smThai.
|
|
|
|
* Standard Thai - This is based on TIS 620-2533, except that three of
|
|
the TIS 620-2533 characters are replaced with other characters. Some
|
|
undefined code points in TIS 620-2533 are used for additional
|
|
punctuation characters.
|
|
|
|
k) MacOS encodings for script code 25, smSimpChinese.
|
|
|
|
* Standard simplified Chinese - this is a "shifted" implementation of
|
|
GB 2312-1980 (0xA0 is added to the row and to the column), with some
|
|
additional characters.
|
|
|
|
l) MacOS encodings for script code 29, smEastEurRoman.
|
|
|
|
* Standard Central European - This is similar to standard Roman, but
|
|
with a different (and larger) set of European characters and with
|
|
fewer symbols. It covers several Slavic languages (Czech, Polish,
|
|
Slovak, Slovenian), Hungarian, and the languages of the Baltic
|
|
republics (Estonian, Latvian, Lithuanian).
|
|
|
|
|
|
FILE LIST:
|
|
The file names here have been changed from the original files
|
|
previously published on the Unicode.org FTP server. The mapping
|
|
is as follows:
|
|
|
|
Original Name "8.3" Name
|
|
------------------- ------------
|
|
MacOS-ReadMe.txt 0README.TXT
|
|
MacOS_Cyrillic.txt CYRILLIC.TXT
|
|
MacOS_Japanese.txt JAPAN.TXT
|
|
MacOS_Turkish.txt TURKISH.TXT
|
|
MacOS_Arabic.txt ARABIC.TXT
|
|
MacOS_Dingbats.txt DINGBAT.TXT
|
|
MacOS_Roman.txt ROMAN.TXT
|
|
MacOS_Ukrainian.txt UKRAINE.TXT
|
|
MacOS_CentralEuro.txt CNTEURO.TXT
|
|
MacOS_Greek.txt GREEK.TXT
|
|
MacOS_Romanian.txt ROMANIA.TXT
|
|
MacOS_CorpChars.txt CORPCHR.TXT
|
|
MacOS_Hebrew.txt HEBREW.TXT
|
|
MacOS_Symbol.txt SYMBOL.TXT
|
|
MacOS_Croatian.txt CROATIAN.TXT
|
|
MacOS_Icelandic.txt ICELAND.TXT
|
|
MacOS_Thai.txt THAI.TXT
|
|
|