Medley UNICODE
By Ron Kaplan
This document was last edited in March 2024.
The UNICODE library package defines external file formats that enable Medley to read and write files in which 16-bit character codes are represented as UTF-8 byte sequences or UTF-16 byte pairs. It also converts character codes: on reading, from Unicode codes to the equivalent codes in the Medley-internal Xerox Character Code Standard (XCCS), and on writing, from XCCS codes to the equivalent Unicode codes.
Unicode external formats
Six external formats are defined when the package is loaded. Three of them convert character codes between Unicode and XCCS:
:UTF-8 codes are represented as UTF-8 byte sequences and XCCS/Unicode character conversion takes place.
:UTF-16BE codes are represented as byte pairs, with the high-order byte appearing first in the file, and characters are converted.
:UTF-16LE codes are represented as byte pairs, with the low-order byte appearing first in the file, and characters are converted.
The three raw external formats translate byte sequences into codes but do not translate the codes themselves. These allow Medley to see and process characters in their native encoding.
:UTF-8-RAW codes are represented as UTF-8 byte sequences, but character conversion does not take place.
:UTF-16BE-RAW codes are represented as big-endian byte pairs, but there is no conversion.
:UTF-16LE-RAW codes are represented as little-endian byte pairs, but there is no conversion.
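To make the byte layouts concrete, the following minimal sketch shows the bytes that a single character code occupies under each layout. It uses Python's built-in codecs purely as a stand-in for illustration; it is not Medley code, the helper name encoded_bytes is hypothetical, and it shows only the byte representation, not the XCCS/Unicode conversion that distinguishes the converting formats from the raw ones.

```python
# Illustration only: Python codecs stand in for the byte layouts described above.
# This does not model Medley's XCCS/Unicode character conversion.

def encoded_bytes(code):
    """Return the hex bytes of one character code under each byte layout."""
    ch = chr(code)
    return {
        "UTF-8": ch.encode("utf-8").hex(" "),         # variable-length byte sequence
        "UTF-16BE": ch.encode("utf-16-be").hex(" "),  # high-order byte first
        "UTF-16LE": ch.encode("utf-16-le").hex(" "),  # low-order byte first
    }

if __name__ == "__main__":
    # U+00E9 (e with acute accent) and U+20AC (euro sign) as sample codes
    for code in (0x00E9, 0x20AC):
        for name, hexbytes in encoded_bytes(code).items():
            print(f"U+{code:04X} {name}: {hexbytes}")
```

For U+00E9 this gives c3 a9 in UTF-8, 00 e9 in UTF-16BE, and e9 00 in UTF-16LE, which is the difference in byte order that the BE and LE format names refer to.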
All of these formats take the end-of-line convention for external files (which matters mostly for writing) from the variable EXTERNALEOL, whose value may be LF, CR, or CRLF and is initially LF.
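As a rough illustration of what the three conventions mean at the byte level, the sketch below terminates each line with the chosen end-of-line characters before encoding. It is a Python stand-in rather than Medley code: the names EOL_CHARS and encode_lines are hypothetical, and the only details taken from the text are the convention names LF, CR, and CRLF and the default of LF.

```python
# Illustration only: how an end-of-line convention shows up in the written bytes.
# EOL_CHARS and encode_lines are hypothetical names, not part of Medley.

EOL_CHARS = {"LF": "\n", "CR": "\r", "CRLF": "\r\n"}

def encode_lines(lines, eol="LF", encoding="utf-8"):
    """Encode each line followed by the terminator for the chosen convention."""
    terminator = EOL_CHARS[eol]
    return b"".join((line + terminator).encode(encoding) for line in lines)

if __name__ == "__main__":
    for eol in ("LF", "CR", "CRLF"):
        print(eol, encode_lines(["ab", "cd"], eol).hex(" "))
```

Note that under a UTF-16 layout the terminator is itself written as byte pairs, so an LF becomes 00 0a in big-endian order rather than a single 0a byte.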
The external format can be specified as a parameter when a stream is opened:
(OPENSTREAM 'foo.txt 'INPUT 'OLD '((EXTERNALFORMAT :UTF-8)))
(CL:OPEN 'foo.txt :DIRECTION :INPUT :EXTERNAL-FORMAT :UTF-8)
The opening parameters may be overridden if the calling function (e.g., TEdit) invokes READBOM and it detects a byte-order mark at the beginning of the file:
(READBOM STREAM) [Function]