* A revision to the font, Unicode, Tedit, and other modules to implement the MCCS character coding as the standard for internal text strings. MCCS is a variant of XCCS in which the arrows are switched with circumflex/underscore and $ is switched with currency, and it allows for additional code assignments over time. :MCCS replaces :XCCS as the default external format, especially for source files. The file XCCS is removed in favor of the file MCCS, which includes the XCCS external format for backward compatibility.
* This includes a single Medley-font-formatted font file for each family/size/face display font. The glyph assignments correspond to the MCCS character encoding, except for fonts with idiosyncratic encodings (Hippo, Symbol). All character sets from the legacy font files are included in each file, and the character sets and glyphs in each file have also been extended by offline coercion from related families (e.g. glyphs not in legacy Terminal are taken from legacy Modern). There should be fewer black boxes, and character display shouldn't change when you switch fonts.
* The Unicode mapping tables have been redefined to set up correspondences between Unicode and MCCS, not XCCS. Separate functions for mapping between XCCS and MCCS are provided in the file MCCS; they are no longer included in INTERPRESS.
* TEDIT converts characters in legacy fonts to their new MCCS codes as it reads formatted files, marks the file as MCCS-compatible, and preserves the new codes on writing.
* Default keyboard assignments produce the MCCS uparrow and leftarrow for shift-6 and shift-hyphen, and use Function-6 for circumflex and Function-10 for underscore.
See the documentation in FONTCODECHANGES.TEDIT, MCCS.TEDIT, and MEDLEYFONTFORMAT.TEDIT in docs/internal, and in library/UNICODE.TEDIT.
Medley UNICODE
UNICODE
By Ron Kaplan
This document was last edited in September 2025.
The UNICODE library package defines external file formats that enable Medley to read and write files in which 16-bit character codes are represented as UTF-8 byte sequences or UTF-16 byte pairs. It also provides for character codes to be converted on reading from Unicode codes to the equivalent codes in the Medley-internal character code standard (MCCS), and on writing from MCCS codes to the equivalent Unicode codes. MCCS is a minor variation of the Xerox Character Code Standard (XCCS), and the Medley-to-Unicode correspondences are derived for the most part from externally provided Xerox-to-Unicode mapping tables.
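The following sketch is written in Python as a language-neutral illustration. It is not Medley's implementation. It shows how a single 16-bit character code is laid out as a UTF-8 byte sequence and as a UTF-16 byte pair in either byte order. The function names are hypothetical. The sketch deliberately covers only codes 0 to FFFF and does not treat the surrogate range specially.

```python
def utf8_bytes(code: int) -> bytes:
    """Encode one 16-bit code (0..0xFFFF) as a UTF-8 byte sequence."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("this sketch only handles 16-bit codes")
    if code < 0x80:                       # 1 byte:  0xxxxxxx
        return bytes([code])
    if code < 0x800:                      # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code >> 6), 0x80 | (code & 0x3F)])
    return bytes([0xE0 | (code >> 12),    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
                  0x80 | ((code >> 6) & 0x3F),
                  0x80 | (code & 0x3F)])


def utf16_bytes(code: int, big_endian: bool) -> bytes:
    """Encode one 16-bit code as a 2-byte pair in the requested byte order."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("this sketch only handles 16-bit codes")
    high, low = code >> 8, code & 0xFF
    return bytes([high, low]) if big_endian else bytes([low, high])


# U+00E9 (e-acute) becomes C3 A9 in UTF-8, 00 E9 big-endian, E9 00 little-endian.
for c in (0x0041, 0x00E9, 0x2191):
    print(hex(c), utf8_bytes(c).hex(" "),
          utf16_bytes(c, True).hex(" "), utf16_bytes(c, False).hex(" "))
```

UTF-8 takes one to three bytes for a 16-bit code. UTF-16 always takes exactly two, and the two byte orders differ only in which byte comes first.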
Unicode external formats
Seven external formats are defined when the package is loaded:
:UTF-8 codes are represented as UTF-8 byte sequences and MCCS/Unicode character conversion takes place.
:UTF-16BE codes are represented as 2-byte pairs, with the high order byte appearing first in the file, and characters are converted.
:UTF-16LE codes are represented as 2-byte pairs, with the low order byte appearing first in the file, and characters are converted.
:UTF-8-SLUG A variant of :UTF-8 whose OUTCHARFN produces the Unicode slug code FFFD for MCCS codes whose mappings are not defined in the MCCS-to-Unicode tables.
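The difference between :UTF-8 and :UTF-8-SLUG on output can be sketched as a table lookup with an optional fallback. This is a minimal Python illustration under stated assumptions. The table MCCS_TO_UNICODE is a hypothetical one-entry stand-in for the real mapping tables, and the code 0x2345 is simply assumed to be unmapped in it. This section does not say what plain :UTF-8 does with an unmapped code, so the non-slug branch just raises as a placeholder.

```python
# Hypothetical one-entry excerpt standing in for the real MCCS-to-Unicode tables.
MCCS_TO_UNICODE = {0x0041: 0x0041}   # 'A', assumed identical in both standards
SLUG = 0xFFFD                        # Unicode replacement character


def mccs_to_unicode(code: int, slug: bool) -> int:
    """Map one MCCS code to Unicode; with slug=True, unmapped codes become U+FFFD."""
    if code in MCCS_TO_UNICODE:
        return MCCS_TO_UNICODE[code]
    if slug:
        return SLUG
    # Placeholder behavior (assumption): the documented :UTF-8 behavior for
    # unmapped codes is not described here, so the sketch raises instead.
    raise KeyError(f"no Unicode mapping for MCCS code {code:#06x}")


def write_utf8(codes, slug: bool = False) -> bytes:
    """Convert MCCS codes to Unicode, then encode the result as UTF-8 bytes."""
    return "".join(chr(mccs_to_unicode(c, slug)) for c in codes).encode("utf-8")


print(write_utf8([0x0041, 0x2345], slug=True))   # b'A\xef\xbf\xbd'
```

The slug variant guarantees that every output character is a valid Unicode code, at the cost of losing the identity of codes the tables do not cover.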
The three remaining external formats decode byte sequences into character codes but do not convert those codes between Unicode and MCCS. They allow Medley to see and process characters in their native encoding.
:UTF-8-RAW codes are represented as UTF-8 byte sequences, but character conversion does not take place.
:UTF-16BE-RAW codes are represented as big-endian 2-byte pairs, but there is no conversion.
:UTF-16LE-RAW codes are represented as little-endian 2-byte pairs, but there is no conversion.
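What the RAW formats do on input can be sketched as pure byte decoding with no table lookup: the codes that come out are whatever Unicode codes the file contains. This Python sketch is illustrative only. The function decode_raw is hypothetical, and the format names are used merely as string labels. UTF-16 input is split into 16-bit units, so a surrogate pair would appear as two separate codes.

```python
def decode_raw(data: bytes, fmt: str) -> list:
    """Decode bytes into character codes without any Unicode/MCCS conversion."""
    if fmt == ":UTF-8-RAW":
        return [ord(ch) for ch in data.decode("utf-8")]
    if fmt in (":UTF-16BE-RAW", ":UTF-16LE-RAW"):
        if len(data) % 2:
            raise ValueError("UTF-16 data must contain an even number of bytes")
        order = "big" if fmt == ":UTF-16BE-RAW" else "little"
        # Each 2-byte pair becomes one 16-bit code, high byte first for BE.
        return [int.from_bytes(data[i:i + 2], order) for i in range(0, len(data), 2)]
    raise ValueError(f"unknown format {fmt}")


print(decode_raw(b"\x21\x91", ":UTF-16BE-RAW"))   # [8593], i.e. U+2191
```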
All of these formats determine the end-of-line convention of the external file (which matters mostly for writing) from the variable EXTERNALEOL, whose value is LF, CR, or CRLF; it is initially set to LF.
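The three EXTERNALEOL values can be illustrated at the level of character codes rather than bytes. Working at that level matters because in UTF-16 a line feed occupies two bytes, not one. This is a hedged Python sketch. The helpers lines_to_codes and codes_to_utf16be are hypothetical, each line is assumed to be a list of already-converted codes, and the sketch assumes every line, including the last, is terminated.

```python
# Code sequences for each EXTERNALEOL setting (LF = 10, CR = 13).
EOL_CODES = {"LF": [0x0A], "CR": [0x0D], "CRLF": [0x0D, 0x0A]}


def lines_to_codes(lines, external_eol: str = "LF") -> list:
    """Flatten lines into one code stream, ending each with the chosen EOL codes."""
    eol = EOL_CODES[external_eol]
    out = []
    for line in lines:
        out.extend(line)
        out.extend(eol)
    return out


def codes_to_utf16be(codes) -> bytes:
    """Encode the code stream as big-endian UTF-16 (16-bit codes only)."""
    return "".join(map(chr, codes)).encode("utf-16-be")


print(codes_to_utf16be(lines_to_codes([[0x41]], "LF")))   # b'\x00A\x00\n'
```

Choosing the end-of-line before encoding means the same EXTERNALEOL setting works unchanged for the UTF-8 and UTF-16 formats.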
The external format can be specified as a parameter when a stream is opened:
(OPENSTREAM 'foo.txt 'INPUT 'OLD '((EXTERNALFORMAT :UTF-8)))
(CL:OPEN "foo.txt" :DIRECTION :INPUT :EXTERNAL-FORMAT :UTF-8)
The opening parameters may be overridden if the calling function (e.g. Tedit) invokes READBOM and READBOM detects a byte-order mark at the beginning of the file:
(READBOM STREAM) [Function]