1
0
mirror of synced 2026-01-13 15:37:38 +00:00
rmkaplan f5205e23c6
Update unicode mapping tables (#440)
* Updated all ISO8859 and Vendor Unicode mappings from current Unicode.org information

* John Cowan says that East Asia is obsolete

* Delete obsolete file
2021-08-28 09:01:27 -07:00
..

This may be obsolete:  (RMK 2021/08)

The following is a listing of the differences between IBM's mappings
of certain IBM code pages and Unicode, and the mappings for IBM code
pages that came from various sources and were earlier made available
at the Unicode ftp site.

The first set of differences refers to tables found originally in the
Microsoft directory at unicode.org.  The DBCS tables were in the
EastAsiaMaps directory. The findings are based on the content of the
files as of April 26, 1996. Newer versions may result in fewer
differences.

Note that IBM has many conversion tables to Unicode available in a
binary (machine readable) format on a CD-ROM included with the
document SC09-2190-00 "CDRA Reference and Registry".  These tables
are also available in a readable format from IBM.  For licensing
information, please contact an IBM representative to the Unicode
council.  Currently (March/96) these are:

  Dr. V.S. (Uma) Umamaheswaran   dorai@vnet.ibm.com
  John Gioia                     gioia@vnet.ibm.com
  Lisa Moore                     lisam@vnet.ibm.com

Or, you may send a note to nltc@vnet.ibm.com.

=======================================================================
Code Pages 00437 (US etc.)
           00860 (Portugal)
           00861 (Iceland)
           00862 (Israel)
           00863 (Canadian French)
           00865 (Nordic)

      Microsoft                         IBM
      ---------                       -----
0x1A      U001A                       U001C
0x1C      U001C                       U007F
0x7F      U007F                       U001A
0xE6      U00B5 (MICRO SIGN)          U03BC (GREEK SMALL LETTER MU)

The "rotation" of the control characters at 0x1A, 0x1C and
0x7F is due to the frequent use of 0x1A as end-of-file by
PC file systems and applications.

=======================================================================
Code Page 00852

          Microsoft                         IBM
          ---------                       -----
0x1A      U001A                       U001C
0x1C      U001C                       U007F
0x7F      U007F                       U001A
0xAA      U00AC (NOT SIGN)             ----  (Unassigned)

Note that Microsoft code page 852 has a "not sign" in position 0xAA
whereas IBM 852 has 0xAA as unassigned.  In IBM mappings, unassigned
code points in a code page are generally mapped to SUB (U001A) in the
binary tables.  The readable tables do not contain entries for these
points.

=======================================================================
Code Page 00850 (Latin 1)
          00855 (Cyrillic)
          00866 (Cyrillic - Russian)

      Microsoft                         IBM
      ---------                       -----
0x1A      U001A                       U001C
0x1C      U001C                       U007F
0x7F      U007F                       U001A


=======================================================================
Code Page 00857 (Turkish)

      Microsoft                         IBM
      ---------                       -----
0x1A      U001A                       U001C
0x1C      U001C                       U007F
0x7F      U007F                       U001A


=======================================================================
Code Page 00864 (Arabic)

      Microsoft                         IBM
      ---------                       -----
0x1A      U001A                       U001C
0x1C      U001C                       U007F
0x7F      U007F                       U001A
0x9B      U009B                        ----  (Unassigned)
0x9C      U009C                        ----  (Unassigned)
0x9F      U009F                       U200C  (ZERO WIDTH NON-JOINER)
0xA6      UF8BE                        ----  (Unassigned)
0xA7      UF8BF                        ----  (Unassigned)
0xD7      UFEC1 (ISOLATE ARABIC TAH)  UFEC3  (MEDIAL ARABIC TAH)
0xD8      UFEC5 (ISOLATE ARABIC DHAH) UFEC7  (MEDIAL ARABIC DHAH)
0xF1      U0651 (ARABIC SHADDAH)      UFE7C  (ARABIC SPACING SHADDAH)


=======================================================================
Code Page 00869 (Greek)

      Microsoft                         IBM
      ---------                       -----
0x1A      U001A                       U001C
0x1C      U001C                       U007F
0x7F      U007F                       U001A
0x80      U0080                        ----  (Unassigned)
0x81      U0081                        ----  (Unassigned)
0x82      U0082                        ----  (Unassigned)
0x83      U0083                        ----  (Unassigned)
0x84      U0084                        ----  (Unassigned)
0x85      U0085                        ----  (Unassigned)
0x87      U0087                        ----  (Unassigned)
0x88      U00B7 (MIDDLE DOT)          U0387  (GREEK ANO TELEIA)
0x93      U0093                        ----  (Unassigned)
0x94      U0094                        ----  (Unassigned)
0xEF      U0384 (GREEK TONOS)         U00B4  (ACUTE ACCENT)


=======================================================================
Code Page 00874 (Thai)

      Microsoft                         IBM
      ---------                       -----
0x1A      U001A                       U001C
0x1C      U001C                       U007F
0x7F      U007F                       U001A
0x91      U2018                        ----  (Unassigned)
0x92      U2019                        ----  (Unassigned)
0x93      U201C                        ----  (Unassigned)
0x94      U201D                        ----  (Unassigned)
0x95      U2022                        ----  (Unassigned)
0x96      U2013                        ----  (Unassigned)
0x97      U2014                        ----  (Unassigned)
0xA0      U00A0                       U0E48  (THAI CHARACTER MAI EK)
0xDB       ----  (Unassigned)         U0E49  (THAI CHARACTER MAI THO)
0xDC       ----  (Unassigned)         U0E4A  (THAI CHARACTER MAI TRI)
0xDD       ----  (Unassigned)         U0E4B  (THAI CHARACTER MAI CHATTAW)
0xDE       ----  (Unassigned)         U0E4C  (THAI CHARACTER THANTHAKHAT)
0xFC       ----  (Unassigned)         U00A2  (CENT SIGN)
0xFD       ----  (Unassigned)         U00AC  (NOT SIGN)
0xFE       ----  (Unassigned)         U00A6  (BROKEN BAR)
0xFF       ----  (Unassigned)         U00A0  (NO-BREAK SPACE)


=======================================================================

Regarding the mapping table between Unicode and Asian each country's
National Standard, there are several differences between the tables in
this directory and the table developed by IBM.
The following list shows these differences.
=======================================================================
Japanese (JIS X0208)

            Unicode                       IBM
            -------                       -----
(1) 0x213D  U2015                         U2014

Note:
 (1) According to conversion table defined on JIS X0221 (ISO 10646-1
     JIS version), JIS:0x213D is mapped to U2014.
=======================================================================
S-Chinese (GB 2312)

            Unicode                       IBM
            -------                       -----
(1) 0x212C  U2225                         U2016
(2) 0x2327  UFF07                         U00B4

Note:
 (1) According to GB2312-80 standard, the character name of
     GB2312:0x212C is Double Vertical Line. And, U2016 is
     Double Vertical Line.
 (2) The character name of GB2312:0x2327 is Single Quotation
     Mark Right. Though no same character name exists in Unicode,
     glyph of U00B4 looks similar than UFF07.

=======================================================================

Please note that it is IBM policy to create a new code page identifier
when there are any changes (other than additions) to a code page.
This allows data or environment tagged with a specific identifier to
have a precise meaning even after many years, and allows migration of
environments (resources such as keyboard drivers, fonts etc..) to be
managed in an orderly manner.  This may result in a mismatch between
an IBM identifier and the same identifier used by another vendor if
that other vendor has modified the contents of the code page without
changing the identifier.

Current known cases of a code page not matching the IBM code page using
the same identifier include:

  Microsoft 932 (Japanese)
  Microsoft 936 (Simplified Chinese)
  Microsoft 949 (Korean)
  Microsoft 852 (See above)