control-archive 1.9.1

(processing and archiving of Netnews control messages)
Maintained by Russ Allbery <eagle@eyrie.org>

Copyright 2002-2004, 2007-2014, 2016-2021 Russ Allbery <eagle@eyrie.org>. Copyright 2001 Marco d'Itri. Copyright 1996 UUNET Technologies, Inc.. This software is distributed under a BSD-style license. Please see the section LICENSE below for more information.

BLURB

This software generates an INN control.ctl configuration file from hierarchy configuration fragments, verifies control messages using GnuPG where possible, processes new control messages to update a newsgroup list, archives new control messages, and exports the list of newsgroups in a format suitable for synchronizing the newsgroup list of a Netnews news server. It is the software that maintains the control message and newsgroup lists available from ftp.isc.org.

DESCRIPTION

This package contains three major components:

Manual changes to the canonical newsgroup list are supported in a way that generates the same log messages and uses the same locking structure so that they can co-exist with automated changes and be included in the same reports.

This is the software that generates the active newsgroup lists [1] and control message archive [2] hosted on ftp.isc.org, and the source of the control.ctl file provided with INN.

[1] ftp://ftp.isc.org/pub/usenet/CONFIG/
[2] ftp://ftp.isc.org/pub/usenet/control/

For a web presentation of the information recorded here, as well as other useful information about Usenet hierarchies, please see the list of Usenet managed hierarchies [3].

[3] http://usenet.trigofacile.com/hierarchies/

REQUIREMENTS

Perl 5.6 or later plus the following additional Perl modules are required:

gzip [1] and bzip2 [2] are required. Both are generally available with current operating systems, possibly as supplemental packages.

[1] https://www.gnu.org/software/gzip/
[2] http://www.bzip.org/

GnuPG [3] 1.4.20 or later is required to import keys and verify signatures. This package does not yet work with GnuPG 2.x.

[3] https://gnupg.org/

process-control expects to be fed file names and message IDs of control messages on standard input and therefore needs to be run from a news server or some other source of control messages. A minimalist news server like tinyleaf is suitable for this (I wrote tinyleaf, available as part of INN [4], for this purpose).

[4] https://www.eyrie.org/~eagle/software/inn/

VERSIONING

This package uses a three-part version number. The first number will be incremented for major changes, major new functionality, incompatible changes to the configuration format (more than just adding new keys), or similar disruptive changes. For lesser changes, the second number will be incremented for any change to the code or functioning of the software. A change to the third part of the version number indicates a release with changes only to the configuration, PGP keys, and documentation files.

LAYOUT

The configuration data is in one file per hierarchy in the config directory. Each file has the format specified in FORMAT and is designed to be readable by INN's new configuration parser in case this can be further automated down the road. The config/special directory contains overrides, raw control.ctl fragments that should be used for particular hierarchies instead of automatically-generated entries (usually for special comments). Eventually, the format should be extended to handle as many of these cases as possible.

The keys directory contains the PGP public keys for every hierarchy that has one. The user IDs on these keys must match the signer expected by the configuration data for the corresponding hierarchy.

The forms directory contains the basic file structure for the three generated files.

The scripts directory contains all the software that generates the configuration and documentation files, processes control messages, updates the database, creates the newsgroup lists, and generates reports. Most scripts in that directory have POD documentation included at the end of the script, viewable by running perldoc on the script.

The templates directory contains templates for the control-summary script. These are the templates I use myself. Other installations should customize them.

The docs directory contains the extra documentation files that are distributed from ftp.isc.org in the control message archive and newsgroup list directories, plus the DocKnot metadata for this package.

INSTALLATION

This software is set up to run from /srv/control. To use a different location, edit the paths at the beginning of each of the scripts in the scripts directory to use different paths. By default, copying all the files from the distribution into a /srv/control directory is almost all that's needed. An install rule is provided to do this. To install the software, run:

    make install

You will need write access to /srv/control or permission to create it.

process-control and generate-files need a GnuPG keyring containing all of the honored hierarchy keys. To generate this keyring, run make install or:

    mkdir keyring
    gpg --homedir=keyring --allow-non-selfsigned-uid --import keys/*

from the top level of this distribution. process-control also expects a control.ctl file in /srv/control/control.ctl, which can be generated from the files included here (after creating the keyring as described above) by running make install or:

    scripts/generate-files

Both of these are done automatically as part of make install. process-control expects /srv/control/archive to exist and archives control messages there. It expects /srv/control/tmp to exist and uses it for temporary files for GnuPG control message verification.

To process incoming control messages, you need to run process-control on each message. process-control expects to receive, on standard input, lines consisting of a path to a file, a space, and a message ID. This input format is designed to work with the tinyleaf server that comes with INN 2.5 and later, but it should also work as a channel feed from pre-storage-API versions of INN (1.x). It will not work without modification via a channel feed from a current version of INN, since it doesn't understand the storage API and doesn't know how to retrieve articles by tokens. This could be easily added; I just haven't needed it.

If you're using tinyleaf, here is the setup process:

  1. Create a directory that tinyleaf will use to store incoming articles temporarily, the archive directory, and the logs directory and install the software:

        make install
  2. Run tinyleaf on some port, configuring it to use that directory and to run process-control. A typical tinyleaf command line would be:

        tinyleaf /srv/control/spool /srv/control/scripts/process-control

    I run tinyleaf using systemd, but any inetd implementation should work equally well.

  3. Set up a news feed to the system running tinyleaf that sends control messages of interest. You should be careful not to send cancel control messages or you'll get a ton of junk in your logs. The INN newsfeeds entry I use is:

        isc-control:control,control.*,!control.cancel:Tf,Wnm:

    combined with nntpsend to send the articles.

That should be all there is to it. Watch the logs directory to see what happens for incoming messages.

scripts/process-control just maintains a database file. To export that data in a format that's useful for other software, run scripts/export-control. This expects a /srv/control/export directory into which it stores active and newsgroups files, a copy of the control.ctl file, and all of the logs in a LOGS subdirectory. This export directory can then be made available on the web, copied to another system, or whatever else is appropriate. Generally, scripts/export-control should be run periodically from cron.

Reports can be generated using scripts/control-summary. This script needs configuration before running; see the top of the script and its included POD documentation. There is a sample template in the templates directory, and scripts/weekly-report shows a sample cron job for sending out a regular report.

BOOTSTRAPPING

This package is intended to provide all of the tools, configuration, and information required to duplicate the ftp.isc.org control message archive and newsgroup list service if you so desire. To set up a similar service based on that service, however, you will also want to bootstrap from the existing data. Here is the procedure for that:

  1. Be sure that you're starting from the latest software and set of configuration files. I will generally try to make a new release after committing a batch of changes, but I may not make a new release after every change. See the sections below for information about the Git repository in which this package is maintained. You can always clone that repository to get the latest configuration (and then merge or cherry-pick changes from my repository into your repository as you desire).

  2. Download the current newsgroup list from:

        ftp://ftp.isc.org/pub/usenet/CONFIG/newsgroups.bz2

    and then bootstrap the database from it:

        bzip2 -dc newsgroups.bz2 | scripts/update-control bulkload
  3. If you want the log information so that your reports will include changes made in the ftp.isc.org archive before you created your own, copy the contents of ftp://ftp.isc.org/pub/usenet/CONFIG/LOGS/ into /srv/control/logs.

  4. If you want to start with the existing control message repository, download the contents of ftp://ftp.isc.org/pub/usenet/control/ into /srv/control/archive. You can do this using a recursive download tool that understands FTP, such as wget, but please use the options that add delays and don't hammer the server to death.

After finishing those steps, you will have a copy of the ftp.isc.org archive and can start processing control messages, possibly with different configuration choices. You can generate the files that are found in ftp://ftp.isc.org/pub/usenet/CONFIG/ by running scripts/export-control as described above.

MAINTENANCE

To add a new hierarchy, add a configuration fragment in the config directory named after the hierarchy, following the format of the existing files, and run scripts/generate-files to create a new control.ctl file. See the documentation in scripts/generate-files for details about the supported configuration keys.

If the hierarchy uses PGP-signed control messages, also put the PGP key into the keys directory in a file named after the hierarchy. Then, run:

    gpg --homedir=keyring --import keys/<hierarchy>

to add the new key to the working keyring.

The first user ID on the key must match the signer expected by the configuration data for the corresponding hierarchy. If a hierarchy administrator sets that up wrong (usually by putting additional key IDs on the key), this can be corrected by importing the key into a keyring with GnuPG, using gpg --edit-key to remove the offending user ID, and exporting the key again with gpg --export --ascii.

When adding a new hierarchy, it's often useful to bootstrap the newsgroup list by importing the current checkgroups. To do this, obtain the checkgroups as a text file (containing only the groups without any news headers) and run:

    scripts/update-control checkgroups <hierarchy> < <checkgroups>

where <hierarchy> is the hierarchy the checkgroups is for and <checkgroups> is the path to the checkgroups file.

SUPPORT

The control-archive web page at:

    https://www.eyrie.org/~eagle/software/control-archive/

will always have the current version of this package, the current documentation, and pointers to any additional resources.

Configuration updates should be sent to usenet-config@isc.org.

For bug tracking, use the issue tracker on GitHub:

    https://github.com/rra/control-archive/issues

However, please be aware that I tend to be extremely busy and work projects often take priority. I'll save your report and get to it as soon as I can, but it may take me a couple of months.

SOURCE REPOSITORY

control-archive is maintained using Git. You can access the current source on GitHub at:

    https://github.com/rra/control-archive

or by cloning the repository at:

    https://git.eyrie.org/git/usenet/control-archive.git

or view the repository via the web at:

    https://git.eyrie.org/?p=usenet/control-archive.git

The eyrie.org repository is the canonical one, maintained by the author, but using GitHub is probably more convenient for most purposes. Pull requests are gratefully reviewed and normally accepted.

LICENSE

The control-archive package as a whole is covered by the following copyright statement and license:

  Copyright 2002-2004, 2007-2014, 2016-2021
      Russ Allbery <eagle@eyrie.org>
  Copyright 2001 Marco d'Itri
  Copyright 1996 UUNET Technologies, Inc.
  Permission is hereby granted, free of charge, to any person obtaining
  a copy of this software and associated documentation files (the
  "Software"), to deal in the Software without restriction, including
  without limitation the rights to use, copy, modify, merge, publish,
  distribute, sublicense, and/or sell copies of the Software, and to
  permit persons to whom the Software is furnished to do so, subject to
  the following conditions:
  The above copyright notice and this permission notice shall be
  included in all copies or substantial portions of the Software.
  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
  EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
  MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
  IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
  CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
  TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
  SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
  This product includes software developed by UUNET Technologies, Inc.

Some files in this distribution are individually released under different licenses, all of which are compatible with the above general package license but which may require preservation of additional notices. All required notices, and detailed information about the licensing of each file, are recorded in the LICENSE file.

Files covered by a license with an assigned SPDX License Identifier include SPDX-License-Identifier tags to enable automated processing of license information. See https://spdx.org/licenses/ for more information.

For any copyright range specified by files in this package as YYYY-ZZZZ, the range specifies every single year in that closed interval.