Russ Allbery > Software > control-archive |
Processing:
Verification of the PGP signature of control messages should extract the e-mail address from the key ID and use that for matching in control.ctl rather than just the first word of the key ID.
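A minimal sketch of the address extraction described above, in Python for illustration rather than the Perl the package uses. It assumes the signer string reported by PGP verification looks like a PGP user ID ("Name <address>" or a bare address); the function name signer_address is hypothetical and not part of control-archive.

```python
import re

# Address inside angle brackets, as in "Example Admin <control@example.org>".
_ANGLE_ADDR = re.compile(r"<([^<>\s@]+@[^<>\s@]+)>")
# A signer string that is nothing but an address.
_BARE_ADDR = re.compile(r"^[^<>\s@]+@[^<>\s@]+$")


def signer_address(user_id):
    """Return the e-mail address in a signer string, or None if it has none."""
    match = _ANGLE_ADDR.search(user_id)
    if match:
        return match.group(1)
    stripped = user_id.strip()
    if _BARE_ADDR.match(stripped):
        return stripped
    # No address present (for example, a signer named after a newsgroup).
    return None
```

When this returns None, a caller would presumably keep the existing first-word matching, since some signer strings contain no address at all.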
Control messages in unmanaged hierarchies (alt.*, free.*) should really be throttled somehow to deal with floods of useless newsgroup creations. It's not clear how best to do the throttling, though (what headers to look at, what rate at which to throttle, and how to configure it). The need for this seems to have decreased lately.
process-control should support using sm to retrieve the article if it gets a storage API token instead of a path. This would let it run as a channel feed from a current innd.
The chkscope parameter in checkgroups control messages is honored, but the body is still parsed to find the affected hierarchy rather than using the chkscope information. This also means that permissions are not applied properly in cases where multiple hierarchies with different permission rules are included in the same checkgroups message.
Support checkgroups serial numbers from the new USEPRO draft.
Support application/news-groupinfo in newgroup control messages. This requires parsing the MIME structure of the message (probably using one of the Perl MIME parsing libraries), looking for a part of type application/news-groupinfo, checking its charset, and parsing it.
Support an optional strongly-conforming mode which requires the tab in newsgroup descriptions and doesn't archive messages that aren't fully compliant with the current standards.
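As a rough illustration of the tab requirement only (not the wider question of what counts as fully compliant), a strict check on a single newsgroups line might look like the following Python sketch. The function name is hypothetical, and the rule that the group name contains no spaces is an assumption of this sketch.

```python
def has_conforming_description(line):
    """Check that a newsgroups line is "<group><TAB><description>"."""
    name, tab, description = line.rstrip("\n").partition("\t")
    if not tab:
        # No tab at all, e.g. the name and description separated by spaces.
        return False
    if not name or " " in name:
        return False
    # Extra tabs after the first are tolerated; an empty description is not.
    return bool(description.strip())
```

A strongly-conforming mode could call a check like this before archiving a message and skip the message when it fails.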
Support throttling of control messages by sender or NNTP-Posting-Host.
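One possible shape for such throttling is a token bucket per sender or posting host. The Python sketch below is only an illustration under assumed parameters: the class name, the rate and burst settings, and the choice of key are all hypothetical, and the open questions above (which headers to use, what rate, how to configure it) are not answered by it.

```python
import time


class ControlThrottle:
    """Token-bucket limiter keyed by sender address or NNTP-Posting-Host."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate      # tokens regained per second (assumed setting)
        self.burst = burst    # maximum tokens a key can accumulate
        self.clock = clock    # injectable so the behavior can be tested
        self.buckets = {}     # key -> (tokens, time of last update)

    def allow(self, key):
        """Return True if a control message from this key may be processed."""
        now = self.clock()
        tokens, last = self.buckets.get(key, (float(self.burst), now))
        # Refill for the elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[key] = (tokens - 1.0, now)
            return True
        self.buckets[key] = (tokens, now)
        return False
```

A caller would pass whichever header value it decides to throttle on as the key and skip or defer the message when allow() returns False.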
Configuration:
If PGP-enabled hierarchies marked PRIVATE included the verification configuration lines commented out, bofh.*, cl.*, and ka.* could be handled without special fragments.
If there were a way to add special rules for particular group patterns within the hierarchy, carleton.*, git.*, israel.*, iu.*, umn.*, and utexas.* could be handled without special fragments.
Add a way to designate hierarchies that are still in use but have no active maintainers.
Output:
Currently, the generated newsgroups file uses the traditional mixed encoding, where each hierarchy's descriptions are encoded in whatever local character set that hierarchy happens to use. This means that the file itself has no single valid encoding. Incoming control messages should be parsed for a character set and then newsgroup descriptions converted to UTF-8 so that a consistent UTF-8 newsgroups file can be generated.
Julien ÉLIE collected the following information about character sets for newsgroup descriptions in existing hierarchies:
Interpreting everything else as cp1252 seems to work currently.
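The conversion could be sketched as follows, in Python for illustration. The per-hierarchy table here holds a single made-up entry, not the information collected above, and the function name is hypothetical. Trying strict UTF-8 before falling back to cp1252 is an assumption of this sketch.

```python
# Hypothetical per-hierarchy defaults; the real table would come from the
# collected information about existing hierarchies.
HIERARCHY_CHARSETS = {"example": "iso-8859-2"}


def description_to_text(raw, group, declared=None):
    """Decode a raw newsgroup description so it can be written out as UTF-8."""
    hierarchy = group.split(".", 1)[0]
    # Prefer the charset declared in the control message, then the
    # hierarchy default, then strict UTF-8 (an assumption of this sketch).
    for charset in (declared, HIERARCHY_CHARSETS.get(hierarchy), "utf-8"):
        if not charset:
            continue
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            continue
    # Everything else is treated as cp1252; the few byte values that are
    # undefined in cp1252 become replacement characters instead of errors.
    return raw.decode("cp1252", errors="replace")
```

Writing the returned strings to a file opened with encoding="utf-8" would then give the newsgroups file a single valid encoding.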
Generate control.conf (for DNews) and controlperm (for C News) output files as well.
Defunct hierarchies should not be listed in the regular control.ctl when they've been defunct for more than six months or a year. They should instead be moved to something like docs/hierarchies, but that file should be easily parsable.
Documentation:
The template for README.html hasn't been looked at in years, has David's e-mail address as the contact information, and is probably hopelessly out of date in other respects. It's also a bit odd to put all the PGP keys into an HTML file. It should be rewritten from scratch to include currently relevant information, probably moved into the pgpcontrol package instead of maintained here, and just reference the text PGPKEYS file.
The format of the PGPKEYS file is a leftover from when it was generated from README.html using lynx -dump (after README.html was manually updated). There are certainly better formats that could be used, and it would be nice to include the gpg -k --fingerprint output for each key before the key for the benefit of human readers. Currently some keys have that in their included text file and others don't.
docs/hierarchies is fairly obsolete. Maintaining it to the degree that it used to be maintained is probably still more work than anyone wants to take on, but it needs an editing pass to remove information that merely duplicates the current control.ctl file or is obviously no longer correct.
Incorporate the Usenet Hierarchy Administration FAQ into this package and review and update it for current practices.
Style:
Update all Perl code to my current Perl coding style.
Use YAML for all configuration.