New architecture for character input-output and alternative external formats
Ron Kaplan, May 2021
The Medley system was built with the Xerox Character Code Standard (XCCS) as the target for multi-byte input and output and for the internal mapping of character codes to glyphs.
This is now quite out of date, and our goal is to move to more modern conventions like Unicode and UTF-8.
The coding conventions are embodied in macros that test whether a stream uses XCCS and, if it does, perform special open-coded processing (often with the help of locally bound variables that hold encoding information).
If the stream isn't XCCS, the macros instead apply functions obtained from fields in the stream. This arrangement is optimized for the default XCCS setup: in that case no separate function call is made, because the action itself is open-coded.
The new architecture recognizes that there may still be an advantage to a system default for character processing that avoids function calls, but it no longer depends on special support (binding special variables, as opposed to accessing stream fields on each call) to get that last measure of efficiency.
Thus, there are 4 generic macros corresponding to the 4 character IO operations:
\INCCODE
\OUTCHAR
\BACKCHAR
\PEEKCCODE
Each of these is defined to fetch the corresponding field from the stream (INCCODEFN, OUTCHARFN, BACKCHARFN, PEEKCCODEFN) and apply it. If that field is NIL, each falls through to a corresponding default macro:
\DEFAULTINCCODE
\DEFAULTOUTCHAR
\DEFAULTBACKCHAR
\DEFAULTPEEKCCODE
These default macros can then be redefined to make a wholesale switch of the default encoding standard.
The macro \OUTCHAR, for example, is defined as:
if the stream has an OUTCHARFN, apply it; otherwise, do the \DEFAULTOUTCHAR.
The other three macros are defined analogously.
For the current XCCS default, \DEFAULTOUTCHAR is defined to call \XCCSOUTCHARFN.
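The dispatch rule for the four generic operations can be sketched as a small Python model. This is not Medley code: the Stream class, its lowercase field names, and the one-byte-per-character default_* functions are hypothetical stand-ins for the stream record, the ...FN fields, and the \DEFAULT... macros (the real XCCS default, \XCCSOUTCHARFN and friends, is more involved).

```python
# A Python model (not Medley code) of the generic character-IO dispatch.
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Stream:
    data: List[int] = field(default_factory=list)  # underlying bytes
    pos: int = 0                                   # read position
    # Per-stream function fields; None plays the role of NIL.
    inccodefn: Optional[Callable] = None
    peekccodefn: Optional[Callable] = None
    backcharfn: Optional[Callable] = None
    outcharfn: Optional[Callable] = None


# Hypothetical stand-ins for \DEFAULTINCCODE etc.: one byte per character.
def default_inccode(s):
    code = s.data[s.pos]
    s.pos += 1
    return code


def default_peekccode(s):
    return s.data[s.pos]


def default_backchar(s):
    s.pos -= 1


def default_outchar(s, code):
    s.data.append(code & 0xFF)


# The four generic operations: apply the stream's own function if the
# field is set, otherwise fall through to the default.
def inccode(s):
    if s.inccodefn is not None:
        return s.inccodefn(s)
    return default_inccode(s)


def peekccode(s):
    if s.peekccodefn is not None:
        return s.peekccodefn(s)
    return default_peekccode(s)


def backchar(s):
    if s.backcharfn is not None:
        return s.backcharfn(s)
    return default_backchar(s)


def outchar(s, code):
    if s.outcharfn is not None:
        return s.outcharfn(s, code)
    return default_outchar(s, code)
```

Calling outchar on a plain Stream() appends a single byte through the default, while a stream created with its own outcharfn bypasses the default entirely. Because these are ordinary Python functions rather than macros, the sketch cannot show the key property of the Medley design: the default is expanded in place in every caller, which is why switching the default means redefining the macros and recompiling the callers.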
The corresponding stream fields can be set directly, but the preferred interface is to wrap up the 4 functions for a given format in an EXTERNALFORMAT data structure. The function
(\EXTERNALFORMAT stream formatname)
applies the information in the format to the stream. A particular (non-default) format can be specified as an optional parameter when a stream is opened, and each file device can have its own default external format. There is also a variable that holds the name of the system-wide default, currently :XCCS.
If the default external format is applied to a stream, the relevant function fields are set to NIL so that the default macro is used for each function; otherwise, each function is copied from the external format to the stream.
An external format has the following fields:
NAME
INCCODEFN
PEEKCCODEFN
BACKCHARFN
OUTCHARFN
EOL
The function (\INSTALL.EXTERNALFORMAT format) registers the given format under its name, so it can be retrieved when the name is given to \EXTERNALFORMAT.
If EOL is not NIL, it specifies an end-of-line convention that overrides whatever the stream would otherwise have by default. (The value of EOL is one of the constants LF.EOLC, CR.EOLC, or CRLF.EOLC.)
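The external-format record, its registry, and the rule \EXTERNALFORMAT follows when applying a format can be modeled in Python as below. This is a sketch under stated assumptions, not the Medley implementation: ExternalFormat, Stream, FORMATS, DEFAULT_FORMAT, install_external_format, and apply_external_format are hypothetical names, the EOL constants are string stand-ins (their real values are not assumed), and the stream's CR default is an arbitrary choice for illustration.

```python
# A Python model (not Medley code) of external formats and their registry.
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# String stand-ins for LF.EOLC, CR.EOLC, CRLF.EOLC.
LF_EOLC, CR_EOLC, CRLF_EOLC = "LF", "CR", "CRLF"

FN_FIELDS = ("inccodefn", "peekccodefn", "backcharfn", "outcharfn")


@dataclass
class ExternalFormat:
    # Mirrors the NAME, ...FN and EOL fields of the EXTERNALFORMAT record.
    name: str
    inccodefn: Optional[Callable] = None
    peekccodefn: Optional[Callable] = None
    backcharfn: Optional[Callable] = None
    outcharfn: Optional[Callable] = None
    eol: Optional[str] = None  # None (NIL): keep the stream's own EOL


@dataclass
class Stream:
    eol: str = CR_EOLC  # hypothetical per-device default
    inccodefn: Optional[Callable] = None
    peekccodefn: Optional[Callable] = None
    backcharfn: Optional[Callable] = None
    outcharfn: Optional[Callable] = None


FORMATS: Dict[str, ExternalFormat] = {}
DEFAULT_FORMAT = ":XCCS"  # models the system-wide default-format variable


def install_external_format(fmt):
    # Models \INSTALL.EXTERNALFORMAT: register FMT under its name.
    FORMATS[fmt.name] = fmt
    return fmt


def apply_external_format(stream, name):
    # Models (\EXTERNALFORMAT stream formatname); an unknown name raises
    # KeyError in this sketch.
    fmt = FORMATS[name]
    for f in FN_FIELDS:
        # The default format leaves the fields NIL so that the \DEFAULT...
        # macros are used; any other format's functions are copied.
        setattr(stream, f, None if name == DEFAULT_FORMAT else getattr(fmt, f))
    if fmt.eol is not None:
        stream.eol = fmt.eol  # a non-NIL EOL overrides the stream's default
    return stream
```

For example, after installing a format named :EXAMPLE (a made-up name) with its own outcharfn and eol=LF_EOLC, applying it copies that function into the stream and switches the stream to LF; applying the default :XCCS format instead leaves every function field None and keeps the stream's EOL.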
The system now includes external formats for
:XCCS (the global default)
:THROUGH (untransformed bytes)
It would probably make sense to also include a :KEYBOARD external format, to generalize keyboard input in the same way.
UNICODE defines external formats for UTF-8, with or without character translation, and for UTF-16 (big-endian and little-endian). When we finally make the swap, we would make :UTF-8 the default, redefine the macros, and recompile all the callers.
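What distinguishes the UTF-8 and the two UTF-16 external formats is essentially the OUTCHARFN byte layout, which can be sketched in Python. This is an illustration under assumptions, not the UNICODE module's code: the function names are hypothetical, each function writes to a bytearray rather than a Medley stream, and the character codes are assumed to already be Unicode code points (the no-translation case), so no XCCS-to-Unicode mapping is shown.

```python
# Hypothetical OUTCHARFN-style encoders; codes are assumed to be Unicode.
from typing import List


def utf16_units(code: int) -> List[int]:
    # BMP code points are one 16-bit unit; others become a surrogate pair.
    if code < 0x10000:
        return [code]
    v = code - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


def make_utf16_outcharfn(big_endian: bool):
    def outcharfn(out: bytearray, code: int) -> None:
        for unit in utf16_units(code):
            hi, lo = unit >> 8, unit & 0xFF
            # Big-endian writes the high byte first, little-endian the low byte.
            out.extend((hi, lo) if big_endian else (lo, hi))
    return outcharfn


utf16be_outcharfn = make_utf16_outcharfn(big_endian=True)
utf16le_outcharfn = make_utf16_outcharfn(big_endian=False)


def utf8_outcharfn(out: bytearray, code: int) -> None:
    # Standard UTF-8: 1 to 4 bytes depending on the size of the code point.
    if code < 0x80:
        out.append(code)
    elif code < 0x800:
        out.extend((0xC0 | (code >> 6), 0x80 | (code & 0x3F)))
    elif code < 0x10000:
        out.extend((0xE0 | (code >> 12),
                    0x80 | ((code >> 6) & 0x3F),
                    0x80 | (code & 0x3F)))
    else:
        out.extend((0xF0 | (code >> 18),
                    0x80 | ((code >> 12) & 0x3F),
                    0x80 | ((code >> 6) & 0x3F),
                    0x80 | (code & 0x3F)))
```

The two UTF-16 variants share everything except the order in which each 16-bit unit's bytes are written, which is why a single factory function covers both.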
The Japanese external formats that used to be included in the basic system are now provided by a JAPANESE module in the library.
Finally, there is another macro \INCHAR that applies \CHECKEOLC to the result of \INCCODE.
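The relationship between \INCHAR, \INCCODE, and \CHECKEOLC can be sketched in Python as an end-of-line normalization step applied after reading a raw code. This is a model under assumptions, not Medley's implementation: the function names are hypothetical, the internal EOL code is assumed to be 13, the EOL constants are string stand-ins, and returning a lone CR unchanged on a CRLF stream is this sketch's own choice.

```python
# A Python model (not Medley code) of \INCHAR = \CHECKEOLC applied to \INCCODE.
from dataclasses import dataclass
from typing import List, Optional

CR, LF = 13, 10
EOL = 13  # assumption: the internal end-of-line character code
LF_EOLC, CR_EOLC, CRLF_EOLC = "LF", "CR", "CRLF"  # stand-ins for the constants


@dataclass
class Stream:
    data: List[int]
    eol: str  # the stream's end-of-line convention
    pos: int = 0


def inccode(s: Stream) -> int:
    # Raw read: one byte per code in this sketch.
    code = s.data[s.pos]
    s.pos += 1
    return code


def peekccode(s: Stream) -> Optional[int]:
    return s.data[s.pos] if s.pos < len(s.data) else None


def check_eolc(code: int, s: Stream) -> int:
    # Map the stream's external end-of-line sequence to the internal EOL.
    if s.eol == CR_EOLC and code == CR:
        return EOL
    if s.eol == LF_EOLC and code == LF:
        return EOL
    if s.eol == CRLF_EOLC and code == CR and peekccode(s) == LF:
        inccode(s)  # consume the LF of the CR-LF pair
        return EOL
    return code


def inchar(s: Stream) -> int:
    return check_eolc(inccode(s), s)
```

Reading b"A\r\nB" from a CRLF stream with inchar yields 65, EOL, 66: the CR-LF pair is collapsed into a single EOL, whereas inccode alone would return the CR and LF separately.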