DIGEST - digestify a mailing list.

2026-02-04 23:54:37 +00:00 · 2019-11-27 14:35:33 +01:00
parent 11858e7188
commit 31ae9da89c
8 changed files with 4428 additions and 2 deletions
--- a/doc/digest/-read-.-this-
+++ b/doc/digest/-read-.-this-
@@ -0,0 +1,21 @@
+This directory is for the ITS mail digesting tool.
+
+DEFS > contains the digest definitions.
+
+DIGEST ORDER contains documentation on the definitions and other
+    information on how to use the digestifier.
+
+LOG * files contain digestifier log output.
+
+TS DIGEST is the installed binary of the digestifier.
+    (DRAGON;HOURLY DIGEST should be a link to DIGEST;TS DIGEST.)
+
+DIGEST > is the source for TS DIGEST.
+
+TS MBXLOC is a utility program for locking inboxes so that COMSAT and
+    the digestifier won't touch them (so that you can edit them
+    yourself, should that become necessary).
+
+MBXLOC > is the source for TS MBXLOC.
+
+DIGEST BUGS is the archive for BUG-DIGEST mail.
--- a/doc/digest/digest.bugs
+++ b/doc/digest/digest.bugs
--- a/doc/digest/digest.order
+++ b/doc/digest/digest.order
@@ -0,0 +1,334 @@
+                     Digestifying an ITS Mailing List
+
+Why Digestify?
+--------------
+
+First, what is digestifying and why do it?  A mailing list is used by a
+mailer program (such as ITS's COMSAT) to distribute messages to more than
+one address, translating the single list address given into the addresses
+of the intended recipients.  Normally this process occurs without any
+built-in delays; the mailer receives a message, checks the set of
+addressees for mailing lists to expand, performs such expansions, and
+immediately delivers the message to the desired recipients.
+
+However, some mailing lists have a consistently high volume of mail
+travelling to them.  Any message sent through an ITS machine ties up its
+COMSAT in proportion to the number of recipients and the number of messages
+sent, rather than in proportion to the total number of characters sent --
+for example, delivering any message to AI-List (one very large mailing
+list) takes two and a half hours!  Mail to hosts that are neither local to
+MIT nor directly connected to the central nodes of the Arpanet -- "weird"
+hosts -- is particularly expensive.  Thus, any mailing list which gets more
+than a few messages per day and which goes to more than about ten weird
+hosts imposes a large load on COMSAT.
+
+Also, the overhead to list members of reading mail sent to the list is a
+function of the number of messages received, as well as the number of
+characters.  For these two reasons, large mailing lists should be
+-digested-; that is, arrangements should be made to collect all the
+messages sent to such a list during each day, bundle them up as one single,
+long message, and send that message, the digest, to the list members.  The
+digest has a special format which can be undigested or burst -- broken up
+into individual messages -- by many mail-reading programs.
+
+This file explains how to get the digestifier daemon to digest an ITS
+mailing list automatically, using as an example a hypothetical mailing list
+called FOOBLATZ.
+
+
+How To Digestify
+----------------
+
+So now that you've decided your mailing list should be digested, how do
+you arrange that?
+
+First, all automatically digestified ITS mailing lists currently must
+reside on MC.LCS.MIT.EDU; if yours lives elsewhere, you must move it to MC.
+You can make forwarding pointers from other machines to MC; on another ITS
+machine, such a pointer would be a line like
+
+(FOOBLATZ (EQV-LIST FOOBLATZ@MC))
+
+in the other machine's .MAIL.;NAMES file.
+
+Second, you must alter the mailing list entry in MC:.MAIL.;NAMES > .  Mail
+sent to FOOBLATZ needs to be collected into a file, the inbox, for later
+digestification, rather than immediately sent out to members of the mailing
+list.  So the mailing list entry for FOOBLATZ should look like
+
+(FOOBLATZ (EQV-LIST ([COMAIL;FOO INBOX] (R-OPTION FAST-APPEND))))
+
+Note that the FAST-APPEND option makes COMSAT append new mail to the end of
+the inbox, which will cause the FOOBLATZ Digest to include the messages
+sent to FOOBLATZ in the chronological order in which they arrived at MC.
+Not including this option will cause the digest to include the messages in
+the reverse order of arrival, which will confuse many list members.
+Further, as explained below in the digestifier algorithm section, when
+discussion grows very brisk, the inbox may contain more than one digest's
+worth of messages; in this case the digestifier will create a digest
+starting at the beginning of the file and going until it reaches its size
+limit, so the FAST-APPEND option will ensure that the older part of the
+conversation is sent out first, and that very old messages don't accumulate
+unsent.
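A minimal Python sketch of the size-limited batching described above. It assumes the inbox has already been split into individual messages, oldest first, and uses the roughly 48000-character limit quoted in the algorithm section. The helper name `take_digest_batch` is hypothetical, and so is the rule that a single oversized message still goes out on its own; this illustrates the idea, not the digestifier's actual code.

```python
def take_digest_batch(messages, limit=48_000):
    """Hypothetical helper: split inbox messages (oldest first) into
    (messages for this digest, messages left in the inbox for later)."""
    batch = []
    size = 0
    for i, msg in enumerate(messages):
        # Stop once the next message would push the digest past the limit,
        # but always take at least one message (an assumed rule).
        if batch and size + len(msg) > limit:
            return batch, messages[i:]
        batch.append(msg)
        size += len(msg)
    return batch, []


now, later = take_digest_batch(["a" * 30000, "b" * 30000, "c" * 10])
print(len(now), len(later))  # 1 2
```

Because FAST-APPEND keeps the inbox in arrival order, the leftover list is always the newer part of the conversation, which is what lets old messages go out first.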
+
+Third, you must create an entry in the file MC:DIGEST;DEFS > for your
+digest.  Everything in that file up to the first ^_ (ASCII 037 octal) is a
+comment, and after that comes a series of digest definitions, separated by
+^_'s; to make life easy, put yours at the end.
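As a rough illustration of that layout, the following Python sketch splits the text of a DEFS file into its digest definitions. It assumes the whole file has been read into a string, and writes ^_ (code 037 octal) as the Python escape "\x1f"; the function name `split_defs` is hypothetical.

```python
def split_defs(text):
    """Hypothetical helper: return the digest definitions in a DEFS file."""
    # Everything before the first ^_ is commentary, so drop that chunk.
    definitions = text.split("\x1f")[1:]
    # Ignore empty chunks, e.g. one produced by a trailing ^_.
    return [d for d in definitions if d.strip()]


defs = split_defs("Comments here\x1fName: A\nInbox: A\n\x1fName: B\nInbox: B\n")
print(len(defs))  # 2
```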
+
+Each digest definition has a format that looks like an RFC822 mail header
+-- that is, it consists of a series of named fields of the form:
+
+Name: FooBlatz
+Inbox: COMAIL;FOO INBOX
+Administrivia: COMAIL;FOO ADMIN
+Record: COMAIL;FOO RECORD
+First-Issue-Number: 259
+AUTHOR: FOOBLATZ-REQUEST
+RCPT: (@FILE [COMAIL;FOO LIST])
+From: FooBlatz Daily Blast
+      <FooBlatz-Request%MC.LCS.MIT.EDU@Mintaka.LCS.MIT.EDU>
+Reply-To: FOOBLATZ%MC.LCS.MIT.EDU@Mintaka.LCS.MIT.EDU
+To: FOOBLATZ%MC.LCS.MIT.EDU@Mintaka.LCS.MIT.EDU
+
+Continuation lines, such as the second line of the From: field in the
+example above, -must- start with a space or a tab.  Blank lines between
+fields are ignored, so you can insert them in your digest entry if you
+like.  The order of the fields in the definition does not matter.
+Capitalization of field names does not matter, but capitalization of field
+values does -- if you want the From: field to look like "FooBlatz-Request"
+in the digests sent out, capitalize it that way.
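The parsing rules above can be sketched in Python as follows. This is an assumed reading of the rules, not the digestifier's parser: the helper name `parse_definition` is hypothetical, field names are folded to lower case, values keep their capitalization, blank lines are skipped, and a continuation line (one starting with a space or tab) is appended verbatim to the previous field's value so that folded headers such as From: survive unchanged.

```python
def parse_definition(text):
    """Hypothetical helper: map lower-cased field names to their values."""
    fields = {}
    current = None  # lower-cased name of the field that may be continued
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines between fields are ignored
        if line[0] in " \t":
            # Continuation line: keep it verbatim, including its indentation.
            if current is None:
                raise ValueError("continuation line before any field")
            fields[current] += "\n" + line
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed field line: {line!r}")
        current = name.strip().lower()   # field names ignore capitalization
        fields[current] = value.strip()  # field values keep it
    return fields


d = parse_definition(
    "Name: FooBlatz\n"
    "\n"
    "FROM: FooBlatz Daily Blast\n"
    "      <FooBlatz-Request%MC.LCS.MIT.EDU@Mintaka.LCS.MIT.EDU>\n"
)
print(d["name"])  # FooBlatz
```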
+
+Here is a catalog of all of the currently accepted fields, their
+meanings, and whether they are required for the digestifier to work:
+
+  Name:			(required)
+
+    The name of the digest, such as "FooBlatz".  This name is usually used
+    before the word "Digest", as in "FooBlatz Digest #259".
+
+  Inbox:		(required)
+
+    The name of the file that COMSAT delivers mail to for this digest.  The
+    device is defaulted to "DSK", the directory is defaulted to "COMAIL",
+    and the second filename is defaulted to "INBOX".  You -must- supply the
+    first filename.  Thus you can say just "Inbox: FOO" if your inbox is
+    DSK:COMAIL;FOO INBOX.
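A simplified Python sketch of how those defaults fill in a partial Inbox: value. It assumes the usual DEV:DIR;FN1 FN2 order and ignores ITS quoting conventions and other filename syntax; the helper name `parse_inbox` is hypothetical.

```python
def parse_inbox(spec):
    """Hypothetical helper: return (device, directory, fn1, fn2) for an
    Inbox: value, filling in DSK, COMAIL and INBOX when they are omitted."""
    device, directory, fn2 = "DSK", "COMAIL", "INBOX"
    rest = spec.strip()
    if ":" in rest:                      # explicit device, e.g. "DSK:"
        device, rest = rest.split(":", 1)
    if ";" in rest:                      # explicit directory, e.g. "COMAIL;"
        directory, rest = rest.split(";", 1)
    names = rest.split()
    if not names:
        raise ValueError("Inbox: the first filename must be supplied")
    if len(names) > 2:
        raise ValueError("Inbox: too many filenames")
    fn1 = names[0]
    if len(names) == 2:
        fn2 = names[1]
    return device.strip(), directory.strip(), fn1, fn2


print(parse_inbox("FOO"))  # ('DSK', 'COMAIL', 'FOO', 'INBOX')
```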
+
+  Administrivia:	(optional)
+
+    The name of the file that the digestifier should check for
+    administrative messages that should be inserted at the front of the
+    next digest.  By default this file has the same name as the inbox file,
+    but with the second filename of "ADMIN".  If you don't specify an
+    Administrivia field, then the digestifier will not look for an
+    administrivia file at all -- if you want to use the default file name,
+    you can simply give Administrivia: a blank field.
+
+    If this field exists, the digestifier will look for a file of the
+    specified name; if the file exists, the digestifier will include its
+    contents in the digest between the list of message topics and the first
+    message, and delete the file.  Note that the administrivia file is not
+    a mailbox -- its contents will be included in the digest exactly,
+    including all the headers and other (for this purpose) extraneous
+    nonsense of anything sent to the file as mail.  Spare your list members
+    by not mailing to this file; log in and write the file directly if
+    that's at all possible.  When you write the file, you don't need to explicitly
+    create white space around your text; the digestifier will automatically
+    provide blank lines before and after it.
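The three cases for the Administrivia: field (absent, blank, or an explicit name) can be summarized in a short Python sketch. It assumes the definition's fields are held in a dict keyed by lower-cased field names and that the inbox is already known as a (device, directory, fn1, fn2) tuple; the helper name `administrivia_spec` is hypothetical.

```python
from typing import Optional


def administrivia_spec(fields, inbox) -> Optional[str]:
    """Hypothetical helper: the administrivia file to check, or None."""
    if "administrivia" not in fields:
        return None  # field absent: never look for an administrivia file
    value = fields["administrivia"].strip()
    if value:
        return value  # explicit file name, used as given
    # Blank field: same name as the inbox, second filename ADMIN.
    device, directory, fn1, _fn2 = inbox
    return f"{device}:{directory};{fn1} ADMIN"


print(administrivia_spec({"administrivia": ""}, ("DSK", "COMAIL", "FOO", "INBOX")))
# DSK:COMAIL;FOO ADMIN
```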
+
+  Record:		(optional)
+
+    The name of the file that the digestifier uses to keep track of the
+    state of the digest.  This contains vital data like the current issue
+    number and the time that the most recent digest was mailed.  By default
+    this file has the same name as the inbox file, but with the second
+    filename of "RECORD".
+
+    Do not try to create this file yourself!  Doing so will only confuse
+    the situation.  The digestifier will create this file the first time it
+    processes your digest; if you don't specify a Record field, the
+    digestifier will use the default name for this file.
+
+  First-Issue-Number:	(optional, usually)
+
+    This field is -only- used by the digestifier when it creates a new
+    record file the first time it processes your digest.  This is used to
+    initialize the issue number stored in the record file so that the next
+    digest created will have the given number.  It should consist of a
+    string of digits (only digits!) representing a decimal number, like
+    "259".
+
+    If this field is not present and the digestifier can't find a record
+    file, then the digest definition is broken and no digest will be
+    produced.  For safety, you can remove the First-Issue-Number from your
+    digest definition after your record file is created; that way, if
+    someone accidentally deletes your record file, the digestifier won't
+    automatically recreate it and start duplicating issue numbers.
+
+    When converting an existing digest to use this digestifier, the initial
+    contents of this field should be set to the contents of the existing
+    "NUMBER" file associated with that digest, to preserve continuity of
+    issue numbers.
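The rules for First-Issue-Number amount to a small decision, sketched here in Python. The helper name `next_issue_number` and its `record_next_issue` parameter (the issue number read from an existing record file, or None when there is none) are hypothetical; the record file's real format is not described here.

```python
import re
from typing import Optional


def next_issue_number(record_next_issue: Optional[int], fields: dict) -> int:
    """Hypothetical helper: fields maps lower-cased field names to values."""
    if record_next_issue is not None:
        # Once a record file exists, First-Issue-Number is never consulted.
        return record_next_issue
    value = fields.get("first-issue-number")
    if value is None:
        raise ValueError("no record file and no First-Issue-Number: definition is broken")
    value = value.strip()
    if not re.fullmatch(r"[0-9]+", value):  # decimal digits only
        raise ValueError(f"First-Issue-Number must be decimal digits: {value!r}")
    return int(value)


print(next_issue_number(None, {"first-issue-number": "259"}))  # 259
```

Removing the field after the record file exists turns an accidentally deleted record file into the error branch, instead of a silent restart at an old issue number.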
+
+  RCPT:			(required)
+
+    This is the digest recipient, the address to which the digest will
+    actually be mailed (independent of what you may list in the To: and
+    From: fields!).  This has exactly the format of a recipient from
+    .MAIL.;NAMES >, except it cannot be continued onto a continuation line.
+    Typically you will keep the actual mailing list in a file, say
+    COMAIL;FOO LIST, and have a RCPT field like
+
+	RCPT: (@FILE [COMAIL;FOO LIST])
+
+    Defining the list in an indirect file is a good idea for large mailing
+    lists that change frequently, since it allows you to avoid recompiling
+    NAMES > and messing with the NAMED ERR file.
+
+    When converting an existing digest to use this digestifier, note that a
+    separate entry in NAMES > is no longer called for to hold the mailing
+    list which includes the actual list members.  You can use such a list,
+    but it's probably easier to simply specify the indirect file containing
+    the list members explicitly in this field.
+
+  AUTHOR:		(required)
+
+    This is where delivery error messages should be directed.
+
+    Unlike the RCPT: field, this is not a NAMES > style recipient; nor is
+    it an RFC822 style recipient (something of the form User@Host).  It can
+    only be a simple string -- "FOOBLATZ-REQUEST" in our example.  This
+    means that unless you put a single person's name here, you will have to
+    create a mailing list to receive the errors.
+
+    If you want to try to keep human-generated requests apart from
+    mailer-generated errors, you can create a mailing list separate from
+    your administrative list -- called, say, FooBlatz-Errors -- and put it
+    here in the AUTHOR: field.
+
+  From:			(required)
+  Reply-To:		(optional)
+  To:			(optional)
+
+    The values of these three fields are copied verbatim into the header of
+    all generated digests.  If the optional fields are not given, the
+    generated digests will not have these fields -- no default values are
+    generated for them.  Please be careful to specify only RFC822-legal
+    values for these fields.  Currently most digests use an address of the
+    form
+
+	From: FooBlatz Daily Blast
+	      <FooBlatz-Request%MC.LCS.MIT.EDU@Mintaka.LCS.MIT.EDU>
+    or
+	To: FOOBLATZ%MC.LCS.MIT.EDU@Mintaka.LCS.MIT.EDU
+
+    (By the way, there is no reason why MC's name has to appear here.  Your
+    subscribers don't need to know that MC is involved in producing the
+    digest as long as you give them -some- address that reaches your
+    inbox.)
+
+    Generally the From: field will contain the name of the mailing list's
+    auxiliary administration list -- FooBlatz-Request in our example.  This
+    is the address where people will generally send their administrative
+    requests.  It need -not- be the same as the address that appears in the
+    AUTHOR: field, although typically it is.
+
+    The To: and Reply-To: fields should contain the address of the mailing
+    list itself -- that is, the address where people send mail they want
+    included in the digest.  Mail sent to this address should eventually
+    reach your digest's inbox file.
+
+    Actually, many other well-known RFC822 header fields can be given as
+    fields in the digest definition, but most digests will want to use
+    exactly these three.  (See the source code if you want to know what
+    others will work.  Note that Date, Subject and Message-ID headers are
+    automatically generated for each issue of each digest by the
+    digestifier.)
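A hedged Python sketch of how an issue's header could be assembled from these fields: From: is required, Reply-To: and To: are copied only when present, and Date, Subject and Message-ID are generated. The helper name `digest_header`, the header order, the Message-ID domain, and the exact Subject wording (modeled on "FooBlatz Digest #259") are all assumptions; check the source for the real format.

```python
from email.utils import formatdate, make_msgid


def digest_header(fields, issue, domain="MC.LCS.MIT.EDU"):
    """Hypothetical helper: fields maps lower-cased field names to values."""
    lines = [f"From: {fields['from']}"]        # required: KeyError if missing
    for name in ("Reply-To", "To"):
        value = fields.get(name.lower())
        if value is not None:                  # optional: no default value
            lines.append(f"{name}: {value}")
    # Generated for every issue.  The Subject wording is an assumption.
    lines.append(f"Date: {formatdate(localtime=True)}")
    lines.append(f"Subject: {fields['name']} Digest #{issue}")
    lines.append(f"Message-ID: {make_msgid(domain=domain)}")
    return "\n".join(lines) + "\n"
```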
+
+
+Digestifier Algorithms
+----------------------
+
+The digestifier is run automatically once every hour.  It reads through the
+file DIGEST;DEFS > and considers each digest in turn, keeping a log of its
+actions in the file DIGEST;LOG >.
+
+For each digest, the digestifier considers a number of factors to determine
+whether or not it is going to produce a digest this time:
+
+1.  The current time of day.  The hours between 2AM and 7AM are "Prime
+    Time" and the digestifier prefers to create a digest then.
+
+2.  The current size of the digest's inbox.  The digestifier never produces
+    a digest larger than a certain size (around 48000 characters).  If the
+    inbox looks like it contains more than 1.5 digests worth of material,
+    then the inbox is "bloated" and the digestifier tries to create a
+    digest soon.
+
+3.  How long ago the previous digest was mailed.  The digestifier tries not
+    to produce digests so frequently that people and mailers are
+    overwhelmed with them, nor so infrequently that a message can sit in
+    the inbox for an unreasonably long time.
+
+The precise test is:
+
+  (AND <the previous digest was created more than 90 minutes ago>
+       (OR <the inbox is bloated>	; more than 1.5 digests pending
+	   (AND <it is prime time>	; between 2AM and 7AM
+		<the previous digest was created more than 18 hours ago>)))
+
+In English, this translates as follows.  If the last issue of this digest
+was sent out less than an hour and a half ago, do nothing until a later
+run.  If the last issue went out longer ago than that and the inbox is
+bloated, create a digest.  If the inbox isn't bloated, check whether it's
+prime time; if it is, and the last issue went out more than 18 hours ago,
+then create and send today's issue.
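The precise test above, transcribed into a Python sketch. The constants come from the text; reading "between 2AM and 7AM" as 02:00 through 06:59 is an assumption, and the sketch follows the test literally (for example, it says nothing about empty inboxes). The function names are hypothetical.

```python
from datetime import datetime, timedelta

MAX_DIGEST_CHARS = 48_000             # "around 48000 characters"
MIN_INTERVAL = timedelta(minutes=90)  # never two issues within 90 minutes
DAILY_INTERVAL = timedelta(hours=18)  # prime-time issue after 18 hours


def is_prime_time(now: datetime) -> bool:
    # Assumed reading of "between 2AM and 7AM": 02:00 through 06:59.
    return 2 <= now.hour < 7


def should_make_digest(now: datetime, last_digest: datetime, inbox_chars: int) -> bool:
    since_last = now - last_digest
    if since_last <= MIN_INTERVAL:
        return False
    bloated = inbox_chars > 1.5 * MAX_DIGEST_CHARS  # more than 1.5 digests pending
    return bloated or (is_prime_time(now) and since_last > DAILY_INTERVAL)


night = datetime(2024, 1, 2, 3, 0)
print(should_make_digest(night, night - timedelta(hours=20), 1_000))  # True
```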
+
+The various numbers and times are, of course, all subject to future
+adjustment.
+
+This digestifier should be fairly robust in the face of system crashes,
+being gunned down in the middle of processing, etc.  The worst that can
+happen is that a duplicate issue can be produced, and that can only occur
+if the digestifier is zapped during an extremely small window.  I'll be
+surprised if it ever happens.
+
+It is perfectly safe to run two digestifiers at once, since both the
+digestifier and COMSAT use the LOCK device to coordinate access to inboxes.
+
+In fact, if you edit the DEFS file, it is probably a good idea to run the
+digestifier once yourself and check the LOG file to see if you made any
+errors.  Even if your inbox is empty, this procedure will catch many
+possible problems with your digest definition.  You should be able to run
+the digestifier by typing:
+
+    :DIGEST;DIGEST
+
+to DDT.  This might take a couple of minutes to finish (the digestifier
+might decide to produce some digests!) so be patient.  Then you should look
+at what was appended to the end of the current DIGEST;LOG > file.
+
+This digestifier tries to be fairly civic-minded about cleaning up after
+itself.  If it encounters any errors during the processing of a digest it
+logs the error, then carefully deletes any partially-written output and
+either proceeds to the next digest, or kills itself (depending on the
+nature of the error).  Only a few amazingly unlikely errors should ever
+leave a dead disowned job as a corpse.
+
+Mail is always delivered through the bulk COMSAT (through the .BULK.
+directory).
+
+This digestifier is careful to check the messages included in the digests
+it builds for lines of "-"s that could confuse digest bursting or
+undigestifying tools, and modifies the first "-" in any suspect line to be
+a space.
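A minimal Python sketch of that escaping step. What makes a line "suspect" is not spelled out here beyond "lines of -s", so the threshold `min_dashes` (default 10) and the helper name `escape_dash_lines` are hypothetical; the rewrite itself follows the text by turning only the first "-" into a space.

```python
def escape_dash_lines(body, min_dashes=10):
    """Hypothetical helper: neutralize dash lines that could be mistaken
    for the digest's own message separators."""
    out = []
    for line in body.split("\n"):
        if line.startswith("-" * min_dashes):
            line = " " + line[1:]  # replace only the first dash with a space
        out.append(line)
    return "\n".join(out)
```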
+
+Send bug reports to Bug-DIGEST.
+
+
+This digestifier was written by Alan Bawden and supersedes previous
+digestifiers written by Rob Austein, David Wallace, and Chris Maeda.  A
+lot of the documentation was adapted from documentation written by David
+Chapman for the GUMBY digestifier.
+
--- a/doc/digest/new.admin
+++ b/doc/digest/new.admin
@@ -0,0 +1,32 @@
+Congratulations!  We're the lucky winners in the lottery to choose the
+lucky victim, er, test case for the new improved automatic digestifier,
+which has just been finished by the small but dedicated cadre of ITS
+hackers.  This change should, if anything, improve the situation for
+list members.  Specific differences you may notice:
+
+All messages included in the digest will now include all the To: and CC:
+fields they arrived here with.  So if a message was sent to some particular
+list member but cc'd to the whole list, or sent to several lists, that will
+now be obvious.  For those of us who use undigestifying or digest-bursting
+tools, this change will have the beneficial effect of causing each message
+burst from the digest to automatically have a legal mail header.
+
+There are now rules governing the maximum size of digests sent out.  If
+there is too much accumulated mail, the digestifier will send out the
+oldest section as the digest and save the newest section for later.
+Conversely, the digestifier also has the ability to produce more than one
+digest a day.  The effect of this change is that list members whose sites
+have difficulty swallowing very large mail should no longer run into that
+problem; at the same time, if the discussion grows extensive, it will get
+through in efficiently sized chunks.
+
+The digestifier will now check each included message for lines that begin
+with lots of dashes.  Such lines are created by the digestifier to separate
+messages from each other, so similar lines inside any message have the
+potential to confuse digest-bursting tools.  To guard against this problem,
+the digestifier will now change the initial dash of any such lines inside
+messages to a space.
+
+If this change causes any problems, please report them to us.
+
+SCA-REQUEST@MC.LCS.MIT.EDU
--- a/doc/programs.md
+++ b/doc/programs.md
@@ -82,6 +82,7 @@
 - DDTDOC, interactive DDT documentation.
 - DECUUO, TOPS-10 and WAITS emulator.
 - DFTP, Datacomputer file transfer.
+- DIGEST, digestify a mailing list.
 - DIRCPY, copy directory.
 - DIRDEV, list directories, sorted or subsetted.
 - DIRED, directory editor (independent from EMACS DIRED).