* Updated all ISO8859 and Vendor Unicode mappings from current Unicode.org information * John Cowan says that East Asia is obsolete * Delete obsolete file
200 lines
8.2 KiB
Plaintext
200 lines
8.2 KiB
Plaintext
This may be obsolete: (RMK 2021/08)
|
|
|
|
The following is a listing of the differences between IBM's mappings
|
|
of certain IBM code pages and Unicode, and the mappings for IBM code
|
|
pages that came from various sources and were earlier made available
|
|
at the Unicode ftp site.
|
|
|
|
The first set of differences refers to tables found originally in the
|
|
Microsoft directory at unicode.org. The DBCS tables were in the
|
|
EastAsiaMaps directory. The findings are based on the content of the
|
|
files as of April 26, 1996. Newer versions may result in fewer
|
|
differences.
|
|
|
|
Note that IBM has many conversion tables to Unicode available in a
|
|
binary (machine readable) format on a CD-ROM included with the
|
|
document SC09-2190-00 "CDRA Reference and Registry". These tables
|
|
are also available in a readable format from IBM. For licensing
|
|
information, please contact an IBM representative to the Unicode
|
|
council. Currently (March/96) these are:
|
|
|
|
Dr. V.S. (Uma) Umamaheswaran dorai@vnet.ibm.com
|
|
John Gioia gioia@vnet.ibm.com
|
|
Lisa Moore lisam@vnet.ibm.com
|
|
|
|
Or, you may send a note to nltc@vnet.ibm.com.
|
|
|
|
=======================================================================
|
|
Code Pages 00437 (US etc.)
|
|
00860 (Portugal)
|
|
00861 (Iceland)
|
|
00862 (Israel)
|
|
00863 (Canadian French)
|
|
00865 (Nordic)
|
|
|
|
Microsoft IBM
|
|
--------- -----
|
|
0x1A U001A U001C
|
|
0x1C U001C U007F
|
|
0x7F U007F U001A
|
|
0xE6 U00B5 (MICRO SIGN) U03BC (GREEK SMALL LETTER MU)
|
|
|
|
The "rotation" of the control characters at 0x1A, 0x1C and
|
|
0x7F is due to the frequent use of 0x1A as end-of-file by
|
|
PC file systems and applications.
|
|
|
|
=======================================================================
|
|
Code Page 00852
|
|
|
|
Microsoft IBM
|
|
--------- -----
|
|
0x1A U001A U001C
|
|
0x1C U001C U007F
|
|
0x7F U007F U001A
|
|
0xAA U00AC (NOT SIGN) ---- (Unassigned)
|
|
|
|
Note that Microsoft code page 852 has a "not sign" in position 0xAA
|
|
whereas IBM 852 has 0xAA as unassigned. In IBM mappings, unassigned
|
|
code points in a code page are generally mapped to SUB (U001A) in the
|
|
binary tables. The readable tables do not contain entries for these
|
|
points.
|
|
|
|
=======================================================================
|
|
Code Page 00850 (Latin 1)
|
|
00855 (Cyrillic)
|
|
00866 (Cyrillic - Russian)
|
|
|
|
Microsoft IBM
|
|
--------- -----
|
|
0x1A U001A U001C
|
|
0x1C U001C U007F
|
|
0x7F U007F U001A
|
|
|
|
|
|
=======================================================================
|
|
Code Page 00857 (Turkish)
|
|
|
|
Microsoft IBM
|
|
--------- -----
|
|
0x1A U001A U001C
|
|
0x1C U001C U007F
|
|
0x7F U007F U001A
|
|
|
|
|
|
=======================================================================
|
|
Code Page 00864 (Arabic)
|
|
|
|
Microsoft IBM
|
|
--------- -----
|
|
0x1A U001A U001C
|
|
0x1C U001C U007F
|
|
0x7F U007F U001A
|
|
0x9B U009B ---- (Unassigned)
|
|
0x9C U009C ---- (Unassigned)
|
|
0x9F U009F U200C (ZERO WIDTH NON-JOINER)
|
|
0xA6 UF8BE ---- (Unassigned)
|
|
0xA7 UF8BF ---- (Unassigned)
|
|
0xD7 UFEC1 (ISOLATE ARABIC TAH) UFEC3 (MEDIAL ARABIC TAH)
|
|
0xD8 UFEC5 (ISOLATE ARABIC DHAH) UFEC7 (MEDIAL ARABIC DHAH)
|
|
0xF1 U0651 (ARABIC SHADDAH) UFE7C (ARABIC SPACING SHADDAH)
|
|
|
|
|
|
=======================================================================
|
|
Code Page 00869 (Greek)
|
|
|
|
Microsoft IBM
|
|
--------- -----
|
|
0x1A U001A U001C
|
|
0x1C U001C U007F
|
|
0x7F U007F U001A
|
|
0x80 U0080 ---- (Unassigned)
|
|
0x81 U0081 ---- (Unassigned)
|
|
0x82 U0082 ---- (Unassigned)
|
|
0x83 U0083 ---- (Unassigned)
|
|
0x84 U0084 ---- (Unassigned)
|
|
0x85 U0085 ---- (Unassigned)
|
|
0x87 U0087 ---- (Unassigned)
|
|
0x88 U00B7 (MIDDLE DOT) U0387 (GREEK ANO TELEIA)
|
|
0x93 U0093 ---- (Unassigned)
|
|
0x94 U0094 ---- (Unassigned)
|
|
0xEF U0384 (GREEK TONOS) U00B4 (ACUTE ACCENT)
|
|
|
|
|
|
=======================================================================
|
|
Code Page 00874 (Thai)
|
|
|
|
Microsoft IBM
|
|
--------- -----
|
|
0x1A U001A U001C
|
|
0x1C U001C U007F
|
|
0x7F U007F U001A
|
|
0x91 U2018 ---- (Unassigned)
|
|
0x92 U2019 ---- (Unassigned)
|
|
0x93 U201C ---- (Unassigned)
|
|
0x94 U201D ---- (Unassigned)
|
|
0x95 U2022 ---- (Unassigned)
|
|
0x96 U2013 ---- (Unassigned)
|
|
0x97 U2014 ---- (Unassigned)
|
|
0xA0 U00A0 U0E48 (THAI CHARACTER MAI EK)
|
|
0xDB ---- (Unassigned) U0E49 (THAI CHARACTER MAI THO)
|
|
0xDC ---- (Unassigned) U0E4A (THAI CHARACTER MAI TRI)
|
|
0xDD ---- (Unassigned) U0E4B (THAI CHARACTER MAI CHATTAW)
|
|
0xDE ---- (Unassigned) U0E4C (THAI CHARACTER THANTHAKHAT)
|
|
0xFC ---- (Unassigned) U00A2 (CENT SIGN)
|
|
0xFD ---- (Unassigned) U00AC (NOT SIGN)
|
|
0xFE ---- (Unassigned) U00A6 (BROKEN BAR)
|
|
0xFF ---- (Unassigned) U00A0 (NO-BREAK SPACE)
|
|
|
|
|
|
=======================================================================
|
|
|
|
Regarding the mapping table between Unicode and Asian each country's
|
|
National Standard, there are several differences between the tables in
|
|
this directory and the table developed by IBM.
|
|
The following list shows these differences.
|
|
=======================================================================
|
|
Japanese (JIS X0208)
|
|
|
|
Unicode IBM
|
|
------- -----
|
|
(1) 0x213D U2015 U2014
|
|
|
|
Note:
|
|
(1) According to conversion table defined on JIS X0221 (ISO 10646-1
|
|
JIS version), JIS:0x213D is mapped to U2014.
|
|
=======================================================================
|
|
S-Chinese (GB 2312)
|
|
|
|
Unicode IBM
|
|
------- -----
|
|
(1) 0x212C U2225 U2016
|
|
(2) 0x2327 UFF07 U00B4
|
|
|
|
Note:
|
|
(1) According to GB2312-80 standard, the character name of
|
|
GB2312:0x212C is Double Vertical Line. And, U2016 is
|
|
Double Vertical Line.
|
|
(2) The character name of GB2312:0x2327 is Single Quotation
|
|
Mark Right. Though no same character name exists in Unicode,
|
|
glyph of U00B4 looks similar than UFF07.
|
|
|
|
=======================================================================
|
|
|
|
Please note that it is IBM policy to create a new code page identifier
|
|
when there are any changes (other than additions) to a code page.
|
|
This allows data or environment tagged with a specific identifier to
|
|
have a precise meaning even after many years, and allows migration of
|
|
environments (resources such as keyboard drivers, fonts etc..) to be
|
|
managed in an orderly manner. This may result in a mismatch between
|
|
an IBM identifier and the same identifier used by another vendor if
|
|
that other vendor has modified the contents of the code page without
|
|
changing the identifier.
|
|
|
|
Current known cases of a code page not matching the IBM code page using
|
|
the same identifier include:
|
|
|
|
Microsoft 932 (Japanese)
|
|
Microsoft 936 (Simplified Chinese)
|
|
Microsoft 949 (Korean)
|
|
Microsoft 852 (See above)
|