1
0
mirror of synced 2026-04-26 04:08:08 +00:00
Files
Interlisp.medley/unicode/vendors/apple/arabic.txt
2020-11-15 19:22:14 -08:00

394 lines
18 KiB
Plaintext

#
# Name: MacOS_Arabic [to Unicode]
# Unicode versions: 1.1, 2.0
# Table version: 0.2 (from internal ufrm version <11>)
# Date: 15 April 1995
# Authors: Peter Edberg <edberg1@applelink.apple.com>
# Frank Tang
#
# Copyright (c) 1995 Apple Computer, Inc. All Rights reserved.
#
# Apple, the Apple logo, and Macintosh are trademarks of Apple
# Computer, Inc., registered in the United States and other countries.
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
# throughout this document, "Macintosh" can be used to refer to
# Macintosh computers and "Unicode" can be used to refer to the
# Unicode standard.
#
# Apple makes no warranty or representation, either express or
# implied, with respect to these tables, their quality, accuracy, or
# fitness for a particular purpose. In no event will Apple be liable
# for direct, indirect, special, incidental, or consequential damages
# resulting from any defect or inaccuracy in this document or the
# accompanying tables.
#
# These mapping tables and character lists are preliminary and
# subject to change. Updated tables will be available from the
# Unicode Inc. ftp site (unicode.org), the Apple Computer ftp site
# (ftp.info.apple.com), the Apple Computer World-Wide Web pages
# (http://www.info.apple.com), and possibly on diskette from APDA
# (Apple's mail-order distribution service for developers).
#
# Format:
# -------
#
# Three tab-separated columns;
# '#' begins a comment which continues to the end of the line.
# Column #1 is the MacOS Arabic code (in hex as 0xNN)
# Column #2 is the Unicode or Unicode sequence (in hex as 0xNNNN
# or 0xNNNN+0xNNNN+0xNNNN).
# Column #3 is the Unicode name (follows a comment sign, '#')
# Note: The abbreviations LRO, RLO, and PDF are used for
# LEFT-TO-RIGHT OVERRIDE, RIGHT-TO-LEFT OVERRIDE, and
# POP DIRECTIONAL FORMATTING, respectively.
#
# The entries are in MacOS Arabic code order.
#
# Note that in many cases, a single MacOS Arabic character maps
# to a sequence of Unicode characters: LRO or RLO plus some Unicode
# character + PDF. This is indicated by joining the Unicode
# characters with '+'. This happens when the direction class of
# the MacOS Arabic character is different than the direction class
# of the Unicode character (usually the MacOS Arabic character has
# a strong direction class and the corresponding Unicode character
# is neutral or has a wek direction class).
#
# Notes on MacOS Arabic:
# ----------------------
#
# 1. General
#
# The MacOS Arabic character set is used for the Arabic and Persian
# (Farsi) localizations.
#
# The MacOS Arabic character set is essentially a superset of ISO
# 8859-6. The 8859-6 code points that are interpreted differently
# in the MacOS Arabic set are as follows:
# 0xA0 is no-break space in 8859-6 and right-left space in MacOS
# Arabic; NBSP is 0x81 in MacOS Arabic.
# 0xA4 is currency sign in 8859-6 and right-left dollar sign in
# MacOS Arabic.
# 0xAD is soft hyphen in 8859-6 and right-left hyphen in MacOS
# Arabic.
# ISO 8859-6 specifies that codes 0x30-0x39 can be rendered either
# with European digit shapes or Arabic digit shapes. This is also
# true MacOS Arabic, which determines from context which digit shapes
# to use.
#
# The MacOS Arabic character set uses the C1 controls area and other
# code points which are undefined in ISO 8859-6 for additional
# graphic characters: additional Arabic letters for Persian and Urdu,
# some accented Roman letters for European languages (such as French),
# and duplicates of some of the punctuation, symbols, and digits in
# the ASCII block. The duplicate punctuation, symbol, and digit
# characters have right-left directionality, while the ASCII versions
# have left-right directionality. See the next section for more
# information on this.
#
# MacOS Arabic characters 0xEB-0xF2 are non-spacing/combining marks.
#
# 2. Directional characters and roundtrip fidelity
#
# The MacOS Arabic character set was developed in 1986-1987. At that
# time the bidirectional line line layout algorithm used in the MacOS
# Arabic system was fairly simple; it used only a few direction
# classes (instead of the 13 or so now used in the Unicode
# bidirectional algorithm). In order to permit users to handle some
# tricky layout problems, certain punctuation and symbol characters
# have duplicate code points, one with a left-right direction
# attribute and the other with a right-left direction attribute.
#
# For example, ampersand is encoded at 0x26 with a left-right
# attribute, and at 0xA6 with a right-left attribute. However, there
# is only one ampersand character in Unicode. We need to have a way
# to map both MacOS Arabic ampersand characters to Unicode and back
# again without loss of information. Mapping one of the MacOS Arabic
# ampersand characters to a code in the Unicode corporate use zone is
# undesirable, since both of the ampersand characters are likely to
# be used in text that is interchanged.
#
# The problem is solved with the use of direction override characters
# and direction-dependent mappings. When mapping from MacOS Arabic to
# Unicode, such problem characters are surrounded with an appropriate
# direction override:
# MacOS Arabic 0x26 ampersand (left) ->
# Unicode 0x202D (LRO) + 0x0026 (AMPERSAND) + 0x202C (PDF)
# MacOS Arabic 0xA6 ampersand (right) ->
# Unicode 0x202E (RLO) + 0x0026 (AMPERSAND) + 0x202C (PDF)
# When mapping from Unicode to MacOS Arabic, the MacOS Unicode
# converter uses the Unicode bidirectional algorithm to determine
# resolved directions. The mapping from Unicode to MacOS Arabic can
# then be disambiguated by the use of the resolved direction:
# Unicode 0x0026 -> MacOS Arabic 0x26 (if L) or 0xA6 (if R)
#
# However, note that this means we also need to discard the direction
# override characters when mapping from Unicode to MacOS Arabic.
#
# Even when direction overrides are not needed for roundtrip
# fidelity, they are sometimes used when mapping MacOS Arabic
# characters to Unicode in order to achieve similar text layout with
# the resulting Unicode text. For example, the single MacOS Arabic
# ellipsis character has direction class right-left,and there is no
# left-right version. However, the Unicode HORIZONTAL ELLIPSIS
# character has direction class neutral (which means it may end up
# with a resolved direction of left-right if surrounded by left-right
# characters). When mapping the MacOS Arabic ellipsis to Unicode, it
# is surrounded with a direction override to help preserve proper
# text layout. The resolved direction is not needed or used when
# mapping the Unicode HORIZONTAL ELLIPSIS back to MacOS Arabic.
#
# MacOS Arabic also has duplicate digit codes at 0xB0-0xB9. These
# have right-left direction and are always displayed with Arabic
# digit glyphs (unlike the 0x30-0x39 digits, which have left-right
# direction). The MacOS Arabic 0xB0-0xB9 digits are mapped to the
# Unicode Arabic digits U+0660-U+0669 and surrounded with direction
# overrides, since the Unicode Arabic digits have a weak left-right
# direction.
#
# 3. Problematic character assignments
#
# In the Cairo font, the characters at 0x2A and 0xAA are rendered as
# an asterisk (which normally has 6 points) and the character at 0xC0
# is rendered as something that looks like a large 8-pointed asterisk.
# This handling of 0x2A and 0xAA is consistent with (1) the general
# principle that in MacOS character sets, the ASCII part should be
# identical to ASCII (0x2A is asterisk in ASCII), and (2) in MacOS
# Arabic, the right-left duplicates have codes that are equal to the
# ASCII code of the left-right version plus 0x80. However, in all of
# the other MacOS Arabic fonts, 0x2A and 0xAA are rendered as
# multiply sign (U+00D7), and 0xC0 is rendered as asterisk (with 6
# points). Also note that Unicode has a character ARABIC FIVE POINTED
# STAR (U+066D), which is similar to an asterisk but has five points.
#
# For now the strict mappings treat 0x2A and 0xAA as asterisk; the
# loose mappings also map U+00D7 to 0xAA; and 0xC0 is treated as
# ARABIC FIVE POINTED STAR (until we find a better mapping).
#
##################
0x20 0x202D+0x0020+0x202C # LRO + SPACE + PDF
0x21 0x202D+0x0021+0x202C # LRO + EXCLAMATION MARK + PDF
0x22 0x202D+0x0022+0x202C # LRO + QUOTATION MARK + PDF
0x23 0x202D+0x0023+0x202C # LRO + NUMBER SIGN + PDF
0x24 0x202D+0x0024+0x202C # LRO + DOLLAR SIGN + PDF
0x25 0x0025 # PERCENT SIGN
0x26 0x202D+0x0026+0x202C # LRO + AMPERSAND + PDF
0x27 0x202D+0x0027+0x202C # LRO + APOSTROPHE + PDF
0x28 0x202D+0x0028+0x202C # LRO + LEFT PARENTHESIS + PDF
0x29 0x202D+0x0029+0x202C # LRO + RIGHT PARENTHESIS + PDF
0x2A 0x202D+0x002A+0x202C # LRO + ASTERISK + PDF
0x2B 0x202D+0x002B+0x202C # LRO + PLUS SIGN + PDF
0x2C 0x002C # COMMA
0x2D 0x202D+0x002D+0x202C # LRO + HYPHEN-MINUS + PDF
0x2E 0x202D+0x002E+0x202C # LRO + FULL STOP + PDF
0x2F 0x202D+0x002F+0x202C # LRO + SOLIDUS + PDF
0x30 0x0030 # DIGIT ZERO
0x31 0x0031 # DIGIT ONE
0x32 0x0032 # DIGIT TWO
0x33 0x0033 # DIGIT THREE
0x34 0x0034 # DIGIT FOUR
0x35 0x0035 # DIGIT FIVE
0x36 0x0036 # DIGIT SIX
0x37 0x0037 # DIGIT SEVEN
0x38 0x0038 # DIGIT EIGHT
0x39 0x0039 # DIGIT NINE
0x3A 0x202D+0x003A+0x202C # LRO + COLON + PDF
0x3B 0x003B # SEMICOLON
0x3C 0x202D+0x003C+0x202C # LRO + LESS-THAN SIGN + PDF
0x3D 0x202D+0x003D+0x202C # LRO + EQUALS SIGN + PDF
0x3E 0x202D+0x003E+0x202C # LRO + GREATER-THAN SIGN + PDF
0x3F 0x003F # QUESTION MARK
0x40 0x0040 # COMMERCIAL AT
0x41 0x0041 # LATIN CAPITAL LETTER A
0x42 0x0042 # LATIN CAPITAL LETTER B
0x43 0x0043 # LATIN CAPITAL LETTER C
0x44 0x0044 # LATIN CAPITAL LETTER D
0x45 0x0045 # LATIN CAPITAL LETTER E
0x46 0x0046 # LATIN CAPITAL LETTER F
0x47 0x0047 # LATIN CAPITAL LETTER G
0x48 0x0048 # LATIN CAPITAL LETTER H
0x49 0x0049 # LATIN CAPITAL LETTER I
0x4A 0x004A # LATIN CAPITAL LETTER J
0x4B 0x004B # LATIN CAPITAL LETTER K
0x4C 0x004C # LATIN CAPITAL LETTER L
0x4D 0x004D # LATIN CAPITAL LETTER M
0x4E 0x004E # LATIN CAPITAL LETTER N
0x4F 0x004F # LATIN CAPITAL LETTER O
0x50 0x0050 # LATIN CAPITAL LETTER P
0x51 0x0051 # LATIN CAPITAL LETTER Q
0x52 0x0052 # LATIN CAPITAL LETTER R
0x53 0x0053 # LATIN CAPITAL LETTER S
0x54 0x0054 # LATIN CAPITAL LETTER T
0x55 0x0055 # LATIN CAPITAL LETTER U
0x56 0x0056 # LATIN CAPITAL LETTER V
0x57 0x0057 # LATIN CAPITAL LETTER W
0x58 0x0058 # LATIN CAPITAL LETTER X
0x59 0x0059 # LATIN CAPITAL LETTER Y
0x5A 0x005A # LATIN CAPITAL LETTER Z
0x5B 0x202D+0x005B+0x202C # LRO + LEFT SQUARE BRACKET + PDF
0x5C 0x202D+0x005C+0x202C # LRO + REVERSE SOLIDUS + PDF
0x5D 0x202D+0x005D+0x202C # LRO + RIGHT SQUARE BRACKET + PDF
0x5E 0x202D+0x005E+0x202C # LRO + CIRCUMFLEX ACCENT + PDF
0x5F 0x202D+0x005F+0x202C # LRO + LOW LINE + PDF
0x60 0x0060 # GRAVE ACCENT
0x61 0x0061 # LATIN SMALL LETTER A
0x62 0x0062 # LATIN SMALL LETTER B
0x63 0x0063 # LATIN SMALL LETTER C
0x64 0x0064 # LATIN SMALL LETTER D
0x65 0x0065 # LATIN SMALL LETTER E
0x66 0x0066 # LATIN SMALL LETTER F
0x67 0x0067 # LATIN SMALL LETTER G
0x68 0x0068 # LATIN SMALL LETTER H
0x69 0x0069 # LATIN SMALL LETTER I
0x6A 0x006A # LATIN SMALL LETTER J
0x6B 0x006B # LATIN SMALL LETTER K
0x6C 0x006C # LATIN SMALL LETTER L
0x6D 0x006D # LATIN SMALL LETTER M
0x6E 0x006E # LATIN SMALL LETTER N
0x6F 0x006F # LATIN SMALL LETTER O
0x70 0x0070 # LATIN SMALL LETTER P
0x71 0x0071 # LATIN SMALL LETTER Q
0x72 0x0072 # LATIN SMALL LETTER R
0x73 0x0073 # LATIN SMALL LETTER S
0x74 0x0074 # LATIN SMALL LETTER T
0x75 0x0075 # LATIN SMALL LETTER U
0x76 0x0076 # LATIN SMALL LETTER V
0x77 0x0077 # LATIN SMALL LETTER W
0x78 0x0078 # LATIN SMALL LETTER X
0x79 0x0079 # LATIN SMALL LETTER Y
0x7A 0x007A # LATIN SMALL LETTER Z
0x7B 0x202D+0x007B+0x202C # LRO + LEFT CURLY BRACKET + PDF
0x7C 0x202D+0x007C+0x202C # LRO + VERTICAL LINE + PDF
0x7D 0x202D+0x007D+0x202C # LRO + RIGHT CURLY BRACKET + PDF
0x7E 0x007E # TILDE
#
0x80 0x00C4 # LATIN CAPITAL LETTER A WITH DIAERESIS
0x81 0x202E+0x00A0+0x202C # RLO + NO-BREAK SPACE + PDF
0x82 0x00C7 # LATIN CAPITAL LETTER C WITH CEDILLA
0x83 0x00C9 # LATIN CAPITAL LETTER E WITH ACUTE
0x84 0x00D1 # LATIN CAPITAL LETTER N WITH TILDE
0x85 0x00D6 # LATIN CAPITAL LETTER O WITH DIAERESIS
0x86 0x00DC # LATIN CAPITAL LETTER U WITH DIAERESIS
0x87 0x00E1 # LATIN SMALL LETTER A WITH ACUTE
0x88 0x00E0 # LATIN SMALL LETTER A WITH GRAVE
0x89 0x00E2 # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x8A 0x00E4 # LATIN SMALL LETTER A WITH DIAERESIS
0x8B 0x06BA # ARABIC LETTER NOON GHUNNA
0x8C 0x202E+0x00AB+0x202C # RLO + LEFT-POINTING DOUBLE ANGLE QUOTATION MARK + PDF
0x8D 0x00E7 # LATIN SMALL LETTER C WITH CEDILLA
0x8E 0x00E9 # LATIN SMALL LETTER E WITH ACUTE
0x8F 0x00E8 # LATIN SMALL LETTER E WITH GRAVE
0x90 0x00EA # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x91 0x00EB # LATIN SMALL LETTER E WITH DIAERESIS
0x92 0x00ED # LATIN SMALL LETTER I WITH ACUTE
0x93 0x202E+0x2026+0x202C # RLO + HORIZONTAL ELLIPSIS + PDF
0x94 0x00EE # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x95 0x00EF # LATIN SMALL LETTER I WITH DIAERESIS
0x96 0x00F1 # LATIN SMALL LETTER N WITH TILDE
0x97 0x00F3 # LATIN SMALL LETTER O WITH ACUTE
0x98 0x202E+0x00BB+0x202C # RLO + RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK + PDF
0x99 0x00F4 # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x9A 0x00F6 # LATIN SMALL LETTER O WITH DIAERESIS
0x9B 0x202E+0x00F7+0x202C # RLO + DIVISION SIGN + PDF
0x9C 0x00FA # LATIN SMALL LETTER U WITH ACUTE
0x9D 0x00F9 # LATIN SMALL LETTER U WITH GRAVE
0x9E 0x00FB # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x9F 0x00FC # LATIN SMALL LETTER U WITH DIAERESIS
0xA0 0x202E+0x0020+0x202C # RLO + SPACE + PDF
0xA1 0x202E+0x0021+0x202C # RLO + EXCLAMATION MARK + PDF
0xA2 0x202E+0x0022+0x202C # RLO + QUOTATION MARK + PDF
0xA3 0x202E+0x0023+0x202C # RLO + NUMBER SIGN + PDF
0xA4 0x202E+0x0024+0x202C # RLO + DOLLAR SIGN + PDF
0xA5 0x066A # ARABIC PERCENT SIGN
0xA6 0x202E+0x0026+0x202C # RLO + AMPERSAND + PDF
0xA7 0x202E+0x0027+0x202C # RLO + APOSTROPHE + PDF
0xA8 0x202E+0x0028+0x202C # RLO + LEFT PARENTHESIS + PDF
0xA9 0x202E+0x0029+0x202C # RLO + RIGHT PARENTHESIS + PDF
0xAA 0x202E+0x002A+0x202C # RLO + ASTERISK + PDF
0xAB 0x202E+0x002B+0x202C # RLO + PLUS SIGN + PDF
0xAC 0x060C # ARABIC COMMA
0xAD 0x202E+0x002D+0x202C # RLO + HYPHEN-MINUS + PDF
0xAE 0x202E+0x002E+0x202C # RLO + FULL STOP + PDF
0xAF 0x202E+0x002F+0x202C # RLO + SOLIDUS + PDF
0xB0 0x202E+0x0660+0x202C # RLO + ARABIC-INDIC DIGIT ZERO + PDF
0xB1 0x202E+0x0661+0x202C # RLO + ARABIC-INDIC DIGIT ONE + PDF
0xB2 0x202E+0x0662+0x202C # RLO + ARABIC-INDIC DIGIT TWO + PDF
0xB3 0x202E+0x0663+0x202C # RLO + ARABIC-INDIC DIGIT THREE + PDF
0xB4 0x202E+0x0664+0x202C # RLO + ARABIC-INDIC DIGIT FOUR + PDF
0xB5 0x202E+0x0665+0x202C # RLO + ARABIC-INDIC DIGIT FIVE + PDF
0xB6 0x202E+0x0666+0x202C # RLO + ARABIC-INDIC DIGIT SIX + PDF
0xB7 0x202E+0x0667+0x202C # RLO + ARABIC-INDIC DIGIT SEVEN + PDF
0xB8 0x202E+0x0668+0x202C # RLO + ARABIC-INDIC DIGIT EIGHT + PDF
0xB9 0x202E+0x0669+0x202C # RLO + ARABIC-INDIC DIGIT NINE + PDF
0xBA 0x202E+0x003A+0x202C # RLO + COLON + PDF
0xBB 0x061B # ARABIC SEMICOLON
0xBC 0x202E+0x003C+0x202C # RLO + LESS-THAN SIGN + PDF
0xBD 0x202E+0x003D+0x202C # RLO + EQUALS SIGN + PDF
0xBE 0x202E+0x003E+0x202C # RLO + GREATER-THAN SIGN + PDF
0xBF 0x061F # ARABIC QUESTION MARK
0xC0 0x066D # ARABIC FIVE POINTED STAR
0xC1 0x0621 # ARABIC LETTER HAMZA
0xC2 0x0622 # ARABIC LETTER ALEF WITH MADDA ABOVE
0xC3 0x0623 # ARABIC LETTER ALEF WITH HAMZA ABOVE
0xC4 0x0624 # ARABIC LETTER WAW WITH HAMZA ABOVE
0xC5 0x0625 # ARABIC LETTER ALEF WITH HAMZA BELOW
0xC6 0x0626 # ARABIC LETTER YEH WITH HAMZA ABOVE
0xC7 0x0627 # ARABIC LETTER ALEF
0xC8 0x0628 # ARABIC LETTER BEH
0xC9 0x0629 # ARABIC LETTER TEH MARBUTA
0xCA 0x062A # ARABIC LETTER TEH
0xCB 0x062B # ARABIC LETTER THEH
0xCC 0x062C # ARABIC LETTER JEEM
0xCD 0x062D # ARABIC LETTER HAH
0xCE 0x062E # ARABIC LETTER KHAH
0xCF 0x062F # ARABIC LETTER DAL
0xD0 0x0630 # ARABIC LETTER THAL
0xD1 0x0631 # ARABIC LETTER REH
0xD2 0x0632 # ARABIC LETTER ZAIN
0xD3 0x0633 # ARABIC LETTER SEEN
0xD4 0x0634 # ARABIC LETTER SHEEN
0xD5 0x0635 # ARABIC LETTER SAD
0xD6 0x0636 # ARABIC LETTER DAD
0xD7 0x0637 # ARABIC LETTER TAH
0xD8 0x0638 # ARABIC LETTER ZAH
0xD9 0x0639 # ARABIC LETTER AIN
0xDA 0x063A # ARABIC LETTER GHAIN
0xDB 0x202E+0x005B+0x202C # RLO + LEFT SQUARE BRACKET + PDF
0xDC 0x202E+0x005C+0x202C # RLO + REVERSE SOLIDUS + PDF
0xDD 0x202E+0x005D+0x202C # RLO + RIGHT SQUARE BRACKET + PDF
0xDE 0x202E+0x005E+0x202C # RLO + CIRCUMFLEX ACCENT + PDF
0xDF 0x202E+0x005F+0x202C # RLO + LOW LINE + PDF
0xE0 0x0640 # ARABIC TATWEEL
0xE1 0x0641 # ARABIC LETTER FEH
0xE2 0x0642 # ARABIC LETTER QAF
0xE3 0x0643 # ARABIC LETTER KAF
0xE4 0x0644 # ARABIC LETTER LAM
0xE5 0x0645 # ARABIC LETTER MEEM
0xE6 0x0646 # ARABIC LETTER NOON
0xE7 0x0647 # ARABIC LETTER HEH
0xE8 0x0648 # ARABIC LETTER WAW
0xE9 0x0649 # ARABIC LETTER ALEF MAKSURA
0xEA 0x064A # ARABIC LETTER YEH
0xEB 0x064B # ARABIC FATHATAN
0xEC 0x064C # ARABIC DAMMATAN
0xED 0x064D # ARABIC KASRATAN
0xEE 0x064E # ARABIC FATHA
0xEF 0x064F # ARABIC DAMMA
0xF0 0x0650 # ARABIC KASRA
0xF1 0x0651 # ARABIC SHADDA
0xF2 0x0652 # ARABIC SUKUN
0xF3 0x067E # ARABIC LETTER PEH
0xF4 0x0679 # ARABIC LETTER TTEH
0xF5 0x0686 # ARABIC LETTER TCHEH
0xF6 0x06D5 # ARABIC LETTER AE
0xF7 0x06A4 # ARABIC LETTER VEH
0xF8 0x06AF # ARABIC LETTER GAF
0xF9 0x0688 # ARABIC LETTER DDAL
0xFA 0x0691 # ARABIC LETTER RREH
0xFB 0x202E+0x007B+0x202C # RLO + LEFT CURLY BRACKET + PDF
0xFC 0x202E+0x007C+0x202C # RLO + VERTICAL LINE + PDF
0xFD 0x202E+0x007D+0x202C # RLO + RIGHT CURLY BRACKET + PDF
0xFE 0x0698 # ARABIC LETTER JEH
0xFF 0x06D2 # ARABIC LETTER YEH BARREE