Digestifying an ITS Mailing List

Why Digestify?
--------------

First, what is digestifying and why do it? A mailing list is used by a
mailer program (such as ITS's COMSAT) to distribute messages to more
than one address, translating the single list address given into the
addresses of the intended recipients. Normally this process occurs
without any built-in delays; the mailer receives a message, checks the
set of addressees for mailing lists to expand, performs such
expansions, and immediately delivers the message to the desired
recipients.

However, some mailing lists have a consistently high volume of mail
travelling to them. Any message sent through an ITS machine ties up its
COMSAT in proportion to the number of recipients and the number of
messages sent, rather than in proportion to the total number of
characters sent -- for example, delivering any message to AI-List (one
very large mailing list) takes two and a half hours! Mail to hosts that
are neither local to MIT nor directly connected to the central nodes of
the Arpanet -- "weird" hosts -- is particularly expensive. Thus, any
mailing list which gets more than a few messages per day and which goes
to more than about ten weird hosts imposes a large load on COMSAT.

Also, the overhead to list members of reading mail sent to the list is
a function of the number of messages received, as well as the number of
characters.

For these two reasons, large mailing lists should be -digested-; that
is, arrangements should be made to collect all the messages sent to
such a list during each day, bundle them up as one single, long
message, and send that message, the digest, to the list members. The
digest has a special format which can be undigested or burst -- broken
up into individual messages -- by many mail-reading programs.

This file explains how to get the digestifier daemon to digest an ITS
mailing list automatically, using as an example a hypothetical mailing
list called FOOBLATZ.

How To Digestify
----------------

So now that you've decided your mailing list should be digested, how do
you arrange that?

First, all automatically digestified ITS mailing lists currently must
reside on MC.LCS.MIT.EDU; if yours lives elsewhere, you must move it to
MC. You can make forwarding pointers from other machines to MC; on
another ITS machine, such a pointer would be a line like

    (FOOBLATZ (EQV-LIST FOOBLATZ@MC))

in the other machine's .MAIL.;NAMES file.

Second, you must alter the mailing list entry in MC:.MAIL.;NAMES > .
Mail sent to FOOBLATZ needs to be collected into a file, the inbox, for
later digestification, rather than immediately sent out to members of
the mailing list. So the mailing list entry for FOOBLATZ should look
like

    (FOOBLATZ (EQV-LIST ([COMAIL;FOO INBOX] (R-OPTION FAST-APPEND))))

Note that the FAST-APPEND option makes COMSAT append new mail to the end
of the inbox, which will cause the FOOBLATZ Digest to include the
messages sent to FOOBLATZ in the chronological order in which they
arrived at MC. Not including this option will cause the digest to
include the messages in the reverse order of arrival, which will
confuse many list members. Further, as explained below in the
digestifier algorithm section, when discussion grows very brisk, the
inbox may contain more than one digest's worth of messages; in this
case the digestifier will create a digest starting at the beginning of
the file and going until it reaches its size limit, so the FAST-APPEND
option will ensure that the older part of the conversation is sent out
first, and that very old messages don't accumulate unsent.

Third, you must create an entry in the file MC:DIGEST;DEFS > for your
digest. Everything in that file up to the first ^_ (ascii 037) is a
comment, and after that comes a series of digest definitions, separated
by ^_'s; to make life easy, put yours at the end.
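
The layout of the DEFS file just described (a leading comment, then
definitions separated by ^_) can be illustrated with a short Python
sketch. This is not the digestifier's own code; the function name
split_defs and the second list name BarBlatz are hypothetical, and
skipping whitespace-only chunks is an assumption, since this file does
not say how empty definitions are treated.

```python
# Illustrative sketch only: split the text of a DEFS-style file into
# the raw text of each digest definition.

SEPARATOR = "\x1f"  # ^_ (ASCII 037 octal)


def split_defs(text):
    """Return the raw text of each digest definition in `text`.

    Everything before the first ^_ is a comment and is discarded.
    Assumption: chunks containing only whitespace are skipped.
    """
    chunks = text.split(SEPARATOR)
    return [chunk.strip() for chunk in chunks[1:] if chunk.strip()]


# Hypothetical file contents with one comment and two definitions.
sample = (
    "Free-form comments live here.\n"
    "\x1f\n"
    "Name: FooBlatz\n"
    "Inbox: COMAIL;FOO INBOX\n"
    "\x1f\n"
    "Name: BarBlatz\n"
    "Inbox: BAR\n"
)

for definition in split_defs(sample):
    # Print the first field line of each definition.
    print(definition.splitlines()[0])
```

Because the comment is simply whatever precedes the first separator, a
file that contains no ^_ at all yields no definitions in this sketch.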

Each digest definition has a format that looks like an RFC822 mail
header -- that is, it consists of a series of named fields of the form:

    Name: FooBlatz
    Inbox: COMAIL;FOO INBOX
    Administrivia: COMAIL;FOO ADMIN
    Record: COMAIL;FOO RECORD
    First-Issue-Number: 259
    AUTHOR: FOOBLATZ-REQUEST
    RCPT: (@FILE [COMAIL;FOO LIST])
    From: FooBlatz Daily Blast
    Reply-To: FOOBLATZ%MC.LCS.MIT.EDU@Mintaka.LCS.MIT.EDU
    To: FOOBLATZ%MC.LCS.MIT.EDU@Mintaka.LCS.MIT.EDU

Continuation lines, such as the second line of the From: field in the
example above, -must- start with a space or a tab. Blank lines between
fields are ignored, so you can insert them in your digest entry if you
like. The order of the fields in the definition does not matter.
Capitalization of field names does not matter, but capitalization of
field values does -- if you want the From: field to look like
"FooBlatz-Request" in the digests sent out, capitalize it that way.

Here is a catalog of all of the currently accepted fields, their
meanings, and whether they are required for the digestifier to work:

Name: (required)
    The name of the digest, such as "FooBlatz". This name is usually
    used before the word "Digest", as in "FooBlatz Digest #259".

Inbox: (required)
    The name of the file that COMSAT delivers mail to for this digest.
    The device is defaulted to "DSK", the directory is defaulted to
    "COMAIL", and the second filename is defaulted to "INBOX". You
    -must- supply the first filename. Thus you can say just
    "Inbox: FOO" if your inbox is DSK:COMAIL;FOO INBOX.

Administrivia: (optional)
    The name of the file that the digestifier should check for
    administrative messages that should be inserted at the front of
    the next digest. By default this file has the same name as the
    inbox file, but with the second filename of "ADMIN". If you don't
    specify an Administrivia field, then the digestifier will not look
    for an administrivia file at all -- if you want to use the default
    file name, you can simply give Administrivia: a blank field.
    If this field exists, the digestifier will look for a file of the
    specified name; if the file exists, the digestifier will include
    its contents in the digest between the list of message topics and
    the first message, and delete the file.

    Note that the administrivia file is not a mailbox -- its contents
    will be included in the digest exactly, including all the headers
    and other (for this purpose) extraneous nonsense of anything sent
    to the file as mail. Spare your list members by avoiding this
    action; log in and write the file directly if that's at all
    possible. When you write the file, you don't need to explicitly
    create white space around your text; the digestifier will
    automatically provide blank lines before and after it.

Record: (optional)
    The name of the file that the digestifier uses to keep track of
    the state of the digest. This contains vital data like the current
    issue number and the time that the most recent digest was mailed.
    By default this file has the same name as the inbox file, but with
    the second filename of "RECORD".

    Do not try to create this file yourself! Doing so will only
    confuse the situation. The digestifier will create this file the
    first time it processes your digest; if you don't specify a Record
    field, the digestifier will use the default name for this file.

First-Issue-Number: (optional, usually)
    This field is -only- used by the digestifier when it creates a new
    record file the first time it processes your digest. This is used
    to initialize the issue number stored in the record file so that
    the next digest created will have the given number. It should
    consist of a string of digits (only digits!) representing a
    decimal number, like "259". If this field is not present and the
    digestifier can't find a record file, then the digest definition
    is broken and no digest will be produced.
    For safety, you can remove the First-Issue-Number from your digest
    definition after your record file is created; that way, if someone
    accidentally deletes your record file, the digestifier won't
    automatically recreate it and start duplicating issue numbers.

    When converting an existing digest to use this digestifier, the
    initial contents of this field should be set to the contents of
    the existing "NUMBER" file associated with that digest, to
    preserve continuity of issue numbers.

RCPT: (required)
    This is the digest recipient, the address to which the digest will
    actually be mailed (independent of what you may list in the To:
    and From: fields!). This has exactly the format of a recipient
    from .MAIL.;NAMES >, except it cannot be continued onto a
    continuation line. Typically you will keep the actual mailing list
    in a file, say COMAIL;FOO LIST, and have a RCPT field like

        RCPT: (@FILE [COMAIL;FOO LIST])

    Defining the list in an indirect file is a good idea for large
    mailing lists that change frequently, since it allows you to avoid
    recompiling NAMES > and messing with the NAMED ERR file.

    When converting an existing digest to use this digestifier, note
    that a separate entry in NAMES > is no longer called for to hold
    the mailing list which includes the actual list members. You can
    use such a list, but it's probably easier to simply specify the
    indirect file containing the list members explicitly in this
    field.

AUTHOR: (required)
    This is where delivery error messages should be directed. Unlike
    the RCPT: field, this is not a NAMES > style recipient; nor is it
    an RFC822 style recipient (something of the form User@Host). It
    can only be a simple string -- "FOOBLATZ-REQUEST" in our example.
    This means that unless you put a single person's name here, you
    will have to create a mailing list to receive the errors.
    If you want to try to keep human-generated requests apart from
    mailer-generated errors, you can create a mailing list separate
    from your administrative list -- called, say, FooBlatz-Errors --
    and put it here in the AUTHOR: field.

From: (required)
Reply-To: (optional)
To: (optional)
    The values of these three fields are copied verbatim into the
    header of all generated digests. If the optional fields are not
    given, the generated digests will not have these fields -- no
    default values are generated for them. Please be careful to
    specify only RFC822-legal values for these fields. Currently most
    digests use an address of the form

        From: FooBlatz Daily Blast

    or

        To: FOOBLATZ%MC.LCS.MIT.EDU@Mintaka.LCS.MIT.EDU

    (By the way, there is no reason why MC's name has to appear here.
    Your subscribers don't need to know that MC is involved in
    producing the digest as long as you give them -some- address that
    reaches your inbox.)

    Generally the From: field will contain the name of the mailing
    list's auxiliary administration list -- FooBlatz-Request in our
    example. This is the address where people will generally send
    their administrative requests. It need -not- be the same as the
    address that appears in the AUTHOR: field, although typically it
    is.

    The To: and Reply-To: fields should contain the address of the
    mailing list itself -- that is, the address where people send mail
    they want included in the digest. Mail sent to this address should
    eventually reach your digest's inbox file.

    Actually, many other well-known RFC822 header fields can be given
    as fields in the digest definition, but most digests will want to
    use exactly these three. (See the source code if you want to know
    what others will work. Note that Date, Subject and Message-ID
    headers are automatically generated for each issue of each digest
    by the digestifier.)

Digestifier Algorithms
----------------------

The digestifier is run automatically once every hour.
It reads through the file DIGEST;DEFS > and considers each digest in
turn, keeping a log of its actions in the file DIGEST;LOG >.

For each digest, the digestifier considers a number of factors to
determine whether or not it is going to produce a digest this time:

1. The current time of day. The hours between 2AM and 7AM are "Prime
   Time" and the digestifier prefers to create a digest then.

2. The current size of the digest's inbox. The digestifier never
   produces a digest larger than a certain size (around 48000
   characters). If the inbox looks like it contains more than 1.5
   digests worth of material, then the inbox is "bloated" and the
   digestifier tries to create a digest soon.

3. How long ago the previous digest was mailed. The digestifier tries
   not to produce digests so frequently that people and mailers are
   overwhelmed with them, nor so infrequently that a message can sit
   in the inbox for an unreasonably long time.

The precise test, with each condition written as a descriptive
placeholder, is:

    (AND last-issue-at-least-1.5-hours-old
         (OR inbox-bloated            ; more than 1.5 digests pending
             (AND prime-time          ; between 2AM and 7AM
                  last-issue-went-out-yesterday)))

In English, this translates as:

    If the last issue of this digest was sent out less than an hour
    and a half ago, wait. If the last issue went out longer ago than
    that and the inbox is bloated, create a digest. But if the inbox
    isn't bloated, check whether it's prime time; if it is, and the
    last issue went out yesterday, then create and send today's issue.

The various numbers and times are all subject to future adjustment of
course.

This digestifier should be fairly robust in the face of system crashes,
being gunned down in the middle of processing, etc. The worst that can
happen is that a duplicate issue can be produced, and that can only
occur if the digestifier is zapped during an extremely small window.
I'll be surprised if it ever happens.

It is perfectly safe to run two digestifiers at once, since both the
digestifier and COMSAT use the LOCK device to coordinate access to
inboxes.
In fact, if you edit the DEFS file, it is probably a good idea to run
the digestifier once yourself and check the LOG file to see if you made
any errors. Even if your inbox is empty, this procedure will catch many
possible problems with your digest definition.

You should be able to run the digestifier by typing:

    :DIGEST;DIGEST

to DDT. This might take a couple of minutes to finish (the digestifier
might decide to produce some digests!) so be patient. Then you should
look at what was appended to the end of the current DIGEST;LOG > file.

This digestifier tries to be fairly civic-minded about cleaning up
after itself. If it encounters any errors during the processing of a
digest it logs the error, then carefully deletes any partially-written
output and either proceeds to the next digest, or kills itself
(depending on the nature of the error). Only a few amazingly unlikely
errors should ever leave a dead disowned job as a corpse.

Mail is always delivered through the bulk COMSAT (through the .BULK.
directory).

This digestifier is careful to check the messages included in the
digests it builds for lines of "-"s that could confuse digest bursting
or undigestifying tools, and modifies the first "-" in any suspect line
to be a space.

Send bug reports to Bug-DIGEST.

This digestifier was written by Alan Bawden and supersedes previous
digestifiers written by Rob Austein, David Wallace, and Chris Maeda. A
lot of the documentation was adapted from documentation written by
David Chapman for the GUMBY digestifier.
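
As an illustration of the scheduling test described under Digestifier
Algorithms, here is a minimal Python sketch. It is not taken from the
digestifier's source. The function and constant names are hypothetical,
the exact comparison boundaries are guesses, "the last issue went out
yesterday" is read here as "on an earlier calendar date", and the case
of an empty inbox is not modeled.

```python
from datetime import datetime, timedelta

# Illustrative constants taken from the prose; the real values are
# "subject to future adjustment".
DIGEST_SIZE_LIMIT = 48000                       # "around 48000 characters"
BLOAT_FACTOR = 1.5                              # more than 1.5 digests pending
MIN_INTERVAL = timedelta(hours=1, minutes=30)   # an hour and a half
PRIME_START_HOUR, PRIME_END_HOUR = 2, 7         # 2AM to 7AM


def should_digest(now, last_sent, inbox_size):
    """Return True if a digest should be produced on this run."""
    if now - last_sent < MIN_INTERVAL:
        return False  # last issue went out too recently: wait
    if inbox_size > BLOAT_FACTOR * DIGEST_SIZE_LIMIT:
        return True   # inbox is bloated: send soon
    prime_time = PRIME_START_HOUR <= now.hour < PRIME_END_HOUR
    # Assumption: "went out yesterday" means an earlier calendar date.
    return prime_time and last_sent.date() < now.date()


# A quiet list at 3AM whose last issue went out the previous day.
now = datetime(1988, 3, 2, 3, 0)
print(should_digest(now, datetime(1988, 3, 1, 4, 0), 12000))  # True
```

Checking the minimum interval first mirrors the outer AND of the test:
no amount of pending mail produces a second issue within an hour and a
half of the previous one.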