COMPARETEXT

By Mike Sannella. Tested in Medley by Larry Masinter, updated by Ron Kaplan 12/2021.

Uses TEDIT.LCOM, GRAPHER.LCOM

INTRODUCTION

COMPARETEXT is a rather non-standard text file comparison program that tries to address two problems: (1) detecting certain kinds of changes, such as a paragraph that has been moved to a different part of a document; and (2) showing the user what changes have been made in a document.

The text comparison algorithm is an adaptation of the one described in the article "A Technique for Isolating Differences Between Files" by Paul Heckel, CACM, Vol. 21, No. 4, April 1978. The main idea is to break each of the two text files into "chunks" (words, lines, paragraphs, ...), hash each chunk into a hash value, and match up chunks with the same hash value in the two files. Because chunks are matched by hash value rather than by position, this method detects when two chunks have been swapped, or when a chunk has been moved anywhere else in the document.
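
The matching step can be illustrated with a short Python sketch of Heckel's technique as published: chunks that occur exactly once in each file are paired first, and those pairs are then extended forward and backward to neighboring chunks that are equal. This is not COMPARETEXT's Interlisp implementation. The function name `heckel_match` and the use of plain strings as chunks (compared directly rather than through hash values) are assumptions made for illustration.

```python
from collections import Counter


def heckel_match(old, new):
    """Sketch of Heckel's matching: return a list mapping each index in
    `new` to the index of its counterpart in `old`, or None if unmatched."""
    old_count = Counter(old)
    new_count = Counter(new)
    # Position of each chunk in `old`; only used for chunks unique in `old`.
    old_index = {chunk: j for j, chunk in enumerate(old)}
    na = [None] * len(new)  # new index -> old index
    oa = [None] * len(old)  # old index -> new index

    # Anchors: chunks that occur exactly once in each file.
    for i, chunk in enumerate(new):
        if old_count[chunk] == 1 and new_count[chunk] == 1:
            j = old_index[chunk]
            na[i], oa[j] = j, i

    # Extend each matched pair forward to equal, still-unmatched neighbors.
    for i in range(len(new) - 1):
        j = na[i]
        if (j is not None and j + 1 < len(old)
                and na[i + 1] is None and oa[j + 1] is None
                and new[i + 1] == old[j + 1]):
            na[i + 1], oa[j + 1] = j + 1, i + 1

    # Extend each matched pair backward in the same way.
    for i in range(len(new) - 1, 0, -1):
        j = na[i]
        if (j is not None and j > 0
                and na[i - 1] is None and oa[j - 1] is None
                and new[i - 1] == old[j - 1]):
            na[i - 1], oa[j - 1] = j - 1, i - 1

    return na


print(heckel_match(["a", "b", "c", "d"], ["c", "d", "a", "b"]))  # [2, 3, 0, 1]
print(heckel_match(["a", "b", "c"], ["a", "z", "c"]))            # [0, None, 2]
```

In the swapped example every chunk is paired even though the two halves of the file changed places. A `None` entry marks a chunk with no counterpart, which corresponds to a graph node with no connecting line.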

COMPARING TEXT FILES

Two text files can be compared with the following function:

(COMPARETEXT FILE1 FILE2 HASH.TYPE REGION FILELABELS) [Function]

FILE1 and FILE2 are the names of the two files to compare. Their order does not matter, except that in the resulting graph the FILE1 information appears on the left and the FILE2 information on the right.

HASH.TYPE determines how "chunks" of text are defined, and therefore how fine-grained the comparison is. It can be PARA to hash by paragraphs (delimited by two consecutive CRs), LINE to hash by lines (delimited by one CR), or WORD to hash by words (delimited by any white space). If HASH.TYPE is NIL, it defaults to PARA.
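
The chunk boundaries described for HASH.TYPE can be sketched in Python with regular expressions, producing the (file pointer, length) pairs that later appear as NNN:MMM node labels. The sketch assumes that CR (`\r`) is the end-of-line character, as the description above implies, and that adjacent delimiters produce no empty chunks. The names `DELIMITERS` and `split_chunks` are hypothetical.

```python
import re

# Hypothetical delimiter patterns for each HASH.TYPE, assuming CR ("\r") ends a line.
DELIMITERS = {
    "PARA": re.compile(r"\r\r+"),  # two or more consecutive CRs
    "LINE": re.compile(r"\r"),     # a single CR
    "WORD": re.compile(r"\s+"),    # any run of white space
}


def split_chunks(text, hash_type=None):
    """Return (start, length) pairs for the chunks of text.

    hash_type is "PARA", "LINE", or "WORD"; None defaults to "PARA".
    Empty chunks between adjacent delimiters are skipped.
    """
    delimiter = DELIMITERS[hash_type or "PARA"]
    chunks = []
    start = 0
    for match in delimiter.finditer(text):
        if match.start() > start:
            chunks.append((start, match.start() - start))
        start = match.end()
    if start < len(text):
        chunks.append((start, len(text) - start))
    return chunks


text = "a b\rc\r\rd"
print(split_chunks(text))          # [(0, 5), (7, 1)]
print(split_chunks(text, "LINE"))  # [(0, 3), (4, 1), (7, 1)]
print(split_chunks(text, "WORD"))  # [(0, 1), (2, 1), (4, 1), (7, 1)]
```

With PARA, the single CR inside the first paragraph does not end a chunk; with LINE it does. Formatting each pair as start:length gives labels in the NNN:MMM style used in the graph.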

REGION is the region on the display screen used for the file comparison graph. If REGION=NIL, the system asks the user to specify a region, prompting with a region that is just wide enough for the graph. If REGION=T, a region in the lower left corner is used.

FILELABELS is an optional pair of labels that will appear over the columns of the difference graph instead of the (often overly long) full names of the files.

COMPARETEXT creates a graph with two columns. Each column is headed by the label for one of the files and lists the chunks from that file. Each chunk is represented by an atom NNN:MMM, where NNN is the file pointer of the beginning of the chunk within the file and MMM is the length of the chunk. Lines drawn from one column to the other show which chunks in one file are the same as chunks in the other file. Chunks with no lines going to them do not exist in the other file. [Note: a series of consecutive chunks in one file that is the same as a series of consecutive chunks in the other file is merged into one big chunk. A series of consecutive unconnected chunks is also merged.]
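
The merging rule in the note above can be sketched in Python. A chunk joins the previous node when both are unmatched, or when it matches the chunk immediately after the previous node's last counterpart. The names are hypothetical, and measuring a merged node from the start of its first chunk to the end of its last chunk is an assumption about how MMM is computed for merged nodes.

```python
def merge_runs(chunks, matches):
    """Merge adjacent chunks of one file into graph nodes.

    chunks  -- list of (start, length) pairs, in file order
    matches -- matches[i] is the index of chunk i's counterpart in the
               other file, or None if chunk i has no counterpart
    Returns (label, first_match) pairs, where label is "NNN:MMM".
    """
    nodes = []  # each entry: [start, end, first_match, last_match]
    for (start, length), m in zip(chunks, matches):
        end = start + length
        if nodes:
            prev = nodes[-1]
            # Matched run continues: counterpart follows the previous one.
            continues_match = (m is not None and prev[3] is not None
                               and m == prev[3] + 1)
            # Unmatched run continues: both chunks lack counterparts.
            continues_gap = m is None and prev[2] is None
            if continues_match or continues_gap:
                prev[1] = end
                prev[3] = m
                continue
        nodes.append([start, end, m, m])
    return [(f"{s}:{e - s}", first) for s, e, first, _ in nodes]


chunks = [(0, 5), (6, 4), (11, 3), (15, 2)]
print(merge_runs(chunks, [3, 4, None, None]))  # [('0:10', 3), ('11:6', None)]
```

Two matched chunks whose counterparts are not adjacent (for example matches `[3, 5]`) stay as separate nodes, since the merged node would not correspond to one contiguous run in the other file.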

Pressing the LEFT mouse button over one of the chunk nodes inverts the node and its counterpart in the other column, and opens read-only Tedit windows on the files with the corresponding text selected. If a Tedit window on a file is already active, the selection is simply moved. If COMPARETEXT.AUTOTEDIT is true (its initial value), regions for the Tedit windows are chosen automatically; otherwise the user must specify each region with the mouse, using a ghost region.

Pressing the MIDDLE mouse button over a chunk node brings up a pop-up menu with the items PARA, LINE, and WORD. If one of these is selected, COMPARETEXT is called to compare the chunk that was clicked with the last selected chunks (the ones that are boxed), using the selected hash type, and the result is displayed in a new graph window.

White space (space, tab, CR, LF) is used to delimit chunks, but is ignored when computing the hash value of a chunk. Therefore, if two paragraphs are identical except that one has a few extra CRs after it, they will be considered identical by COMPARETEXT.
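
A whitespace-insensitive chunk hash can be sketched in Python by deleting space, tab, CR, and LF before hashing. Whether COMPARETEXT deletes white space entirely or merely collapses it is not stated here, so the deletion, the choice of SHA-1 from Python's `hashlib`, and the name `chunk_hash` are all assumptions.

```python
import hashlib
import re

# Space, tab, CR, and LF: the white-space characters listed above.
_WHITESPACE = re.compile(r"[ \t\r\n]+")


def chunk_hash(chunk: str) -> str:
    """Hash a chunk after removing all white space (an assumed normalization)."""
    squeezed = _WHITESPACE.sub("", chunk)
    return hashlib.sha1(squeezed.encode("utf-8")).hexdigest()


# A paragraph followed by extra CRs hashes the same as the bare paragraph.
print(chunk_hash("Hello world.\r\r\r") == chunk_hash("Hello world."))  # True
print(chunk_hash("Hello world.") == chunk_hash("Hello World."))      # False
```

One consequence of deleting white space outright is that "ab c" and "a bc" also hash alike; collapsing each run to a single space instead would keep word boundaries significant.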

If the variable COMPARETEXT.ALLCHUNKS is NIL (initially T), then the graph is abbreviated so that nodes for identical chunks in the same position are not shown.
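
One reading of the abbreviated display can be sketched in Python: a chunk is hidden when it is matched to the chunk at the same index in the other file. Interpreting "same position" as the same index in each file's chunk list is an assumption, as is the name `visible_chunks`; the `all_chunks` flag stands in for COMPARETEXT.ALLCHUNKS.

```python
def visible_chunks(matches, all_chunks=True):
    """Return the indices of FILE2 chunks to draw in the graph.

    matches[i] is the index of the FILE1 chunk matched to FILE2 chunk i,
    or None if chunk i has no counterpart.  With all_chunks=False (like
    COMPARETEXT.ALLCHUNKS = NIL), a chunk matched to the FILE1 chunk at
    the same index is omitted.
    """
    if all_chunks:
        return list(range(len(matches)))
    return [i for i, m in enumerate(matches) if m != i]


print(visible_chunks([0, 2, 1, None, 4], all_chunks=False))  # [1, 2, 3]
```

Unmatched chunks are always kept, since they are exactly the differences the graph is meant to show.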