(Convert POD data to formatted *roff input)
    use Pod::Man;
    my $parser = Pod::Man->new (release => $VERSION, section => 8);

    # Read POD from STDIN and write to STDOUT.
    $parser->parse_file (\*STDIN);

    # Read POD from file.pod and write to file.1.
    $parser->parse_from_file ('file.pod', 'file.1');
Pod::Man is a module to convert documentation in the POD format (the preferred language for documenting Perl) into *roff input using the man macro set. The resulting *roff code is suitable for display on a terminal using nroff(1), normally via man(1), or printing using troff(1). It is conventionally invoked using the driver script pod2man, but it can also be used directly.
By default (on non-EBCDIC systems), Pod::Man outputs UTF-8. Its output should work with the man program on systems that use groff (most Linux distributions) or mandoc (most BSD variants), but may result in mangled output on older UNIX systems. To choose a different, possibly more backward-compatible output mangling on such systems, set the encoding option to roff (the default in earlier Pod::Man versions). See the encoding option and ENCODING for more details.
See COMPATIBILITY for the versions of Pod::Man with significant backward-incompatible changes (other than constructor options, whose versions are documented below), and the versions of Perl that included them.
Create a new Pod::Man object. ARGS should be a list of key/value pairs, where the keys are chosen from the following. Each option is annotated with the version of Pod::Man in which that option was added with its current meaning.
[1.00] Sets the centered page header for the .TH macro. The default, if this option is not specified, is User Contributed Perl Documentation.
[4.00] Sets the left-hand footer for the .TH macro. If this option is not set, the contents of the environment variable POD_MAN_DATE, if set, will be used. Failing that, the value of SOURCE_DATE_EPOCH, the modification date of the input file, or the current time if stat() can't find that file (which will be the case if the input is from STDIN) will be used. If taken from any source other than POD_MAN_DATE (which is used verbatim), the date will be formatted as YYYY-MM-DD and will be based on UTC (so that the output will be reproducible regardless of local time zone).
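The precedence described above can be sketched in plain Perl. This is only an illustration of the documented order, not Pod::Man's own code; the helper name pick_date and its date and file arguments are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Pod::Man: pick the left-hand footer date
# using the precedence described above.
sub pick_date {
    my (%args) = @_;

    # An explicit date option and POD_MAN_DATE are both used verbatim.
    return $args{date}        if defined $args{date};
    return $ENV{POD_MAN_DATE} if defined $ENV{POD_MAN_DATE};

    # Otherwise find a timestamp: SOURCE_DATE_EPOCH, then the modification
    # time of the input file, then the current time.
    my $time;
    if (defined $ENV{SOURCE_DATE_EPOCH} && $ENV{SOURCE_DATE_EPOCH} =~ /\A\d+\z/) {
        $time = $ENV{SOURCE_DATE_EPOCH};
    }
    if (!defined $time && defined $args{file}) {
        $time = (stat $args{file})[9];
    }
    $time = time() if !defined $time;

    # Format in UTC so the result does not depend on the local time zone.
    my ($day, $month, $year) = (gmtime $time)[3, 4, 5];
    return sprintf '%04d-%02d-%02d', $year + 1900, $month + 1, $day;
}

{
    local $ENV{SOURCE_DATE_EPOCH} = 86400;
    delete local $ENV{POD_MAN_DATE};
    print pick_date(), "\n";    # prints 1970-01-02
}
```

Only the values used verbatim (the date option and POD_MAN_DATE) skip the YYYY-MM-DD formatting step.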
[5.00] Specifies the encoding of the output. The value must be an encoding recognized by the Encode module (see Encode::Supported), or the special values roff or groff. The default on non-EBCDIC systems is UTF-8.
If the output contains characters that cannot be represented in this encoding, that is an error that will be reported as configured by the errors option. If error handling is other than die, the unrepresentable character will be replaced with the Encode substitution character (normally ?).
If the encoding option is set to the special value groff (the default on EBCDIC systems), or if the Encode module is not available and the encoding is set to anything other than roff, Pod::Man will translate all non-ASCII characters to \[uNNNN] Unicode escapes. These are not traditionally part of the *roff language, but are supported by groff and mandoc and thus by the majority of manual page processors in use today.
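To make the \[uNNNN] form concrete, here is a minimal sketch of that kind of translation applied to an already-decoded string. The helper to_groff_escapes is hypothetical and is not Pod::Man's implementation, which also handles the rest of *roff escaping.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Pod::Man: replace every non-ASCII
# character in a decoded string with a groff \[uNNNN] escape.
sub to_groff_escapes {
    my ($text) = @_;
    $text =~ s{([^\x00-\x7F])}{ sprintf '\\[u%04X]', ord $1 }ge;
    return $text;
}

print to_groff_escapes("na\x{EF}ve"), "\n";    # prints na\[u00EF]ve
```

The code point is written in uppercase hexadecimal with at least four digits, so characters outside the Basic Multilingual Plane simply produce longer escapes.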
If the encoding option is set to the special value roff, Pod::Man will do its historic transformation of (some) ISO 8859-1 characters into *roff escapes that may be adequate in troff and may be readable (if ugly) in nroff. This was the default behavior of versions of Pod::Man before 5.00. With this encoding, all other non-ASCII characters will be replaced with X. It may be required for very old troff and nroff implementations that do not support UTF-8, but its representation of any non-ASCII character is very poor and often specific to European languages.
If the output file handle has a PerlIO encoding layer set, setting encoding to anything other than groff or roff will be ignored and no encoding will be done by Pod::Man. It will instead rely on the encoding layer to make whatever output encoding transformations are desired.
WARNING: The input encoding of the POD source is independent from the output encoding, and setting this option does not affect the interpretation of the POD input. Unless your POD source is US-ASCII, its encoding should be declared with the =encoding command in the source. If this is not done, Pod::Simple will attempt to guess the encoding and may be successful if it's Latin-1 or UTF-8, but it will produce warnings. See perlpod(1) for more information.
[2.27] How to report errors. die says to throw an exception on any POD formatting error. stderr says to report errors on standard error, but not to throw an exception. pod says to include a POD ERRORS section in the resulting documentation summarizing the errors. none ignores POD errors entirely, as much as possible.
The default is pod.
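As a rough illustration of the die setting, the following sketch formats a deliberately broken document and catches the resulting exception. The POD text and the EXAMPLE name are made up for the example, and the exact wording of the messages printed to standard error depends on the installed Pod::Simple.

```perl
use strict;
use warnings;
use Pod::Man;

# POD with a deliberate error: =bogus is not a valid POD command.
my $pod = "=head1 NAME\n\nexample - broken POD\n\n=bogus directive\n";

my $parser = Pod::Man->new(name => 'EXAMPLE', section => 1, errors => 'die');
my $output;
$parser->output_string(\$output);

# With errors set to die, the formatting error becomes an exception.
my $ok = eval { $parser->parse_string_document($pod); 1 };
print "Formatting failed: $@" if !$ok;
```

Wrapping the parse call in eval keeps a build script in control of how the failure is reported.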
[1.00] The fixed-width font to use for verbatim text and code. Defaults to CW. Some systems prefer CR instead. Only matters for troff output.
[1.00] Bold version of the fixed-width font. Defaults to CB. Only matters for troff output.
[1.00] Italic version of the fixed-width font (something of a misnomer, since most fixed-width fonts only have an oblique version, not an italic version). Defaults to CI. Only matters for troff output.
[1.00] Bold italic (in theory, probably oblique in practice) version of the fixed-width font. Pod::Man doesn't assume you have this, and defaults to CB. Some systems (such as Solaris) have this font available as CX. Only matters for troff output.
[5.00] By default, Pod::Man applies some default formatting rules based on guesswork and regular expressions that are intended to make writing Perl documentation easier and require less explicit markup. These rules may not always be appropriate, particularly for documentation that isn't about Perl. This option allows turning all or some of it off.
The special value all enables all guesswork. This is also the default for backward compatibility reasons. The special value none disables all guesswork. Otherwise, the value of this option should be a comma-separated list of one or more of the following keywords:
Convert function references like foo() to bold even if they have no markup. The function name accepts valid Perl characters for function names (including :), and the trailing parentheses must be present and empty.
Make the first part (before the parentheses) of manual page references like foo(1) bold even if they have no markup. The section must be a single number optionally followed by lowercase letters.
If no guesswork is enabled, any text enclosed in C<> is surrounded by double quotes in nroff (terminal) output unless the contents are already quoted. When this guesswork is enabled, quote marks will also be suppressed for Perl variables, function names, function calls, numbers, and hex constants.
Convert Perl variable names to a fixed-width font even if they have no markup. This transformation will only be apparent in troff output, or some other output format (unlike nroff terminal output) that supports fixed-width fonts.
Any unknown guesswork name is silently ignored (for potential future compatibility), so be careful about spelling.
[5.00] Add commands telling groff that the input file is in the given language. The value of this setting must be a language abbreviation for which groff provides supplemental configuration, such as ja (for Japanese) or zh (for Chinese).
Specifically, this adds:
    .mso <language>.tmac
    .hla <language>
to the start of the file, which configure correct line breaking for the specified language. Without these commands, groff may not know how to add proper line breaks for Chinese and Japanese text if the manual page is installed into the normal manual page directory, such as /usr/share/man.
On many systems, this will be done automatically if the manual page is installed into a language-specific manual page directory, such as /usr/share/man/zh_CN. In that case, this option is not required.
Unfortunately, the commands added with this option are specific to groff and will not work with other troff and nroff implementations.
[4.08] Sets the quote marks used to surround C<> text. lquote sets the left quote mark and rquote sets the right quote mark. Either may also be set to the special value none, in which case no quote mark is added on that side of C<> text (but the font is still changed for troff output).
Also see the quotes option, which can be used to set both quotes at once. If both quotes and one of the other options is set, lquote or rquote overrides quotes.
[4.08] Set the name of the manual page for the .TH macro. Without this option, the manual name is set to the uppercased base name of the file being converted unless the manual section is 3, in which case the path is parsed to see if it is a Perl module path. If it is, a path like .../lib/Pod/Man.pm is converted into a name like Pod::Man. This option, if given, overrides any automatic determination of the name.
If generating a manual page from standard input, the name will be set to STDIN if this option is not provided. In this case, providing this option is strongly recommended to set a meaningful manual page name.
[2.27] Normally, L<> formatting codes that have both a URL and anchor text are formatted to show both the anchor text and the URL. In other words:
L<foo|http://example.com/>
is formatted as:
foo <http://example.com/>
This option, if set to a true value, suppresses the URL when anchor text is given, so this example would be formatted as just foo. This can produce less cluttered output in cases where the URLs are not particularly important.
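A small sketch of the difference, assuming the constructor option is spelled nourls: the same POD is formatted twice and each result is checked for the URL. The POD text and the EXAMPLE name are invented for the example.

```perl
use strict;
use warnings;
use Pod::Man;

my $pod = "=head1 DESCRIPTION\n\nSee L<foo|http://example.com/> for details.\n";

# Format the same POD with and without nourls and collect each result.
my %result;
for my $nourls (0, 1) {
    my $parser = Pod::Man->new(name => 'EXAMPLE', section => 1, nourls => $nourls);
    my $output;
    $parser->output_string(\$output);
    $parser->parse_string_document($pod);
    $result{$nourls} = $output;
}

for my $nourls (0, 1) {
    my $shown = $result{$nourls} =~ /example\.com/ ? 'shown' : 'suppressed';
    print "nourls => $nourls: URL $shown\n";
}
```

The anchor text foo appears in both results; only the URL after it is dropped.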
[4.00] Sets the quote marks used to surround C<> text. If the value is a single character, it is used as both the left and right quote. Otherwise, it is split in half, and the first half of the string is used as the left quote and the second is used as the right quote.
This may also be set to the special value none, in which case no quote marks are added around C<> text (but the font is still changed for troff output).
Also see the lquote and rquote options, which can be used to set the left and right quotes independently. If both quotes and one of the other options is set, lquote or rquote overrides quotes.
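The splitting rule for the quotes value can be sketched as follows. The helper split_quotes is hypothetical and only mirrors the rules stated in this document (including the requirement that longer specifications have an even length); it is not Pod::Man's own code.

```perl
use strict;
use warnings;

# Hypothetical helper, not Pod::Man's own code: turn a quotes specification
# into a (left, right) pair following the rules described above.
sub split_quotes {
    my ($spec) = @_;
    return ('', '') if $spec eq 'none';
    return ($spec, $spec) if length($spec) == 1;
    die qq{Invalid quote specification "$spec"\n}
        if length($spec) == 0 || length($spec) % 2 != 0;
    my $half = length($spec) / 2;
    return (substr($spec, 0, $half), substr($spec, $half));
}

my ($left, $right) = split_quotes('<<>>');
print "left=$left right=$right\n";    # prints left=<< right=>>
```

In this sketch an lquote or rquote setting would then simply replace the corresponding half of the pair.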
[1.00] Set the centered footer for the .TH macro. By default, this is set to the version of Perl you run Pod::Man under. Setting this to the empty string will cause some *roff implementations to use the system default value.
Note that some system an macro sets assume that the centered footer will be a modification date and will prepend something like Last modified:. If this is the case for your target system, you may want to set release to the last modified date and date to the version number.
[1.00] Set the section for the .TH macro. The standard section numbering convention is to use 1 for user commands, 2 for system calls, 3 for functions, 4 for devices, 5 for file formats, 6 for games, 7 for miscellaneous information, and 8 for administrator commands. There is a lot of variation here, however; some systems (like Solaris) use 4 for file formats, 5 for miscellaneous information, and 7 for devices. Still others use 1m instead of 8, or some mix of both. About the only section numbers that are reliably consistent are 1, 2, and 3.
By default, section 1 will be used unless the file ends in .pm, in which case section 3 will be selected.
[2.19] If set to a true value, send error messages about invalid POD to standard error instead of appending a POD ERRORS section to the generated *roff output. This is equivalent to setting errors to stderr if errors is not already set.
This option is for backward compatibility with Pod::Man versions that did not support errors. Normally, the errors option should be used instead.
[2.21] This option used to set the output encoding to UTF-8. Since this is now the default, it is ignored and does nothing.
As a derived class from Pod::Simple, Pod::Man supports the same methods and interfaces. See Pod::Simple for all the details. This section summarizes the most-frequently-used methods and the ones added by Pod::Man.
Direct the output from parse_file(), parse_lines(), or parse_string_document() to the file handle FH instead of STDOUT.
Direct the output from parse_file(), parse_lines(), or parse_string_document() to the scalar variable pointed to by REF, rather than STDOUT. For example:
    my $man = Pod::Man->new();
    my $output;
    $man->output_string(\$output);
    $man->parse_file('/some/input/file');
Be aware that the output in that variable will already be encoded in UTF-8.
Read the POD source from PATH and format it. By default, the output is sent to STDOUT, but this can be changed with the output_fh() or output_string() methods.
Read the POD source from INPUT, format it, and output the results to OUTPUT.
parse_from_filehandle() is provided for backward compatibility with older versions of Pod::Man. parse_from_file() should be used instead.
Parse the provided lines as POD source, writing the output to either STDOUT or the file handle set with the output_fh() or output_string() methods. This method can be called repeatedly to provide more input lines. An explicit undef should be passed to indicate the end of input.
This method expects raw bytes, not decoded characters.
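For example, a caller that receives POD a few lines at a time might use parse_lines() roughly like this. The POD text and the EXAMPLE name are made up; the final call with undef is what signals the end of input.

```perl
use strict;
use warnings;
use Pod::Man;

my $parser = Pod::Man->new(name => 'EXAMPLE', section => 1);
my $output = '';
$parser->output_string(\$output);

# Feed the document in several calls, as it might arrive from a stream.
$parser->parse_lines("=head1 NAME\n", "\n");
$parser->parse_lines("example - parse_lines demo\n");

# An explicit undef marks the end of the input.
$parser->parse_lines(undef);

print $output;
```

Because output_string() was called first, the formatted page accumulates in $output rather than going to STDOUT.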
Parse the provided scalar variable as POD source, writing the output to either STDOUT or the file handle set with the output_fh() or output_string() methods.
This method expects raw bytes, not decoded characters.
As of Pod::Man 5.00, the default output encoding for Pod::Man is UTF-8. This should work correctly on any modern system that uses either groff (most Linux distributions) or mandoc (Alpine Linux and most BSD variants, including macOS).
The user will probably have to use a UTF-8 locale to see correct output. This may be done by default; if not, set the LANG or LC_CTYPE environment variables to an appropriate locale. The locale C.UTF-8 is available on most systems if one wants correct output without changing the other things locales affect, such as collation.
The backward-compatible output format used in Pod::Man versions before 5.00 is available by setting the encoding option to roff. This may produce marginally nicer results on older UNIX versions that do not use groff or mandoc, but none of the available options will correctly render Unicode characters on those systems.
Below are some additional details about how this choice was made and some discussion of alternatives.
The default output encoding for Pod::Man has been a long-standing problem. troff and nroff predate Unicode by a significant margin, and their implementations for many UNIX systems reflect that legacy. It's common for Unicode to not be supported in any form.
Because of this, versions of Pod::Man prior to 5.00 maintained the highly conservative output of the original pod2man, which output pure ASCII with complex macros to simulate common western European accented characters when processed with troff. The nroff output was awkward and sometimes incorrect, and characters not used in western European scripts were replaced with X. This choice maximized backwards compatibility with man and nroff/troff implementations at the cost of incorrect rendering of many POD documents, particularly those containing people's names.
The modern implementations, groff (used in most Linux distributions) and mandoc (used by most BSD variants), do now support Unicode. Other UNIX systems often do not, but they're now a tiny minority of the systems people use on a daily basis. It's increasingly common (for very good reasons) to use Unicode characters for POD documents rather than using ASCII conversions of people's names or avoiding non-English text, making the limitations in the old output format more apparent.
Four options have been proposed to fix this:
1. Optionally support UTF-8 output but don't change the default. This is the approach taken since Pod::Man 2.1.0, which added the utf8 option. Some Pod::Man users use this option for better output on platforms known to support Unicode, but since the defaults have not changed, people continued to encounter (and file bug reports about) the poor default rendering.
2. Convert characters to troff \(xx escapes. This requires maintaining a large translation table and addresses only a tiny part of the problem, since many Unicode characters have no standard troff name. groff has the largest list, but if one is willing to assume groff is the formatter, the next option is better.
3. Convert characters to groff \[uNNNN] escapes. This is implemented as the groff encoding for those who want to use it, and is supported by both groff and mandoc. However, it is no better than UTF-8 output for portability to other implementations. See Testing results for more details.
4. Change the default output format to UTF-8 and ask those who want maximum backward compatibility to explicitly select the old encoding. This fixes the issue for most users at the cost of backwards compatibility. While the rendering of non-ASCII characters is different on older systems that don't support UTF-8, it's not always worse than the old output.
Pod::Man 5.00 and later makes the last choice. This arguably produces worse output when manual pages are formatted with troff into PostScript or PDF, but doing this is rare and normally manual, so the encoding can be changed in those cases. The older output encoding is available by setting encoding to roff.
Here are the results of testing encoding values of utf-8 and groff on various operating systems. The testing methodology was to create man/man1 in the current directory, copy encoding.utf8 or encoding.groff from the podlators 5.00 distribution to man/man1/encoding.1, and then run:
LANG=C.UTF-8 MANPATH=$(pwd)/man man 1 encoding
If the locale is not explicitly set to one that includes UTF-8, the Unicode characters were usually converted to ASCII (by, for example, dropping an accent) or deleted or replaced with <?> if there was no conversion.
Tested on 2022-09-25. Many thanks to the GCC Compile Farm project for access to testing hosts.
    OS                   UTF-8      groff
    ------------------   -------    -------
    AIX 7.1              no [1]     no [2]
    Alpine 3.15.0        yes        yes
    CentOS 7.9           yes        yes
    Debian 7             yes        yes
    FreeBSD 13.0         yes        yes
    NetBSD 9.2           yes        yes
    OpenBSD 7.1          yes        yes
    openSUSE Leap 15.4   yes        yes
    Solaris 10           yes        no [2]
    Solaris 11           no [3]     no [3]
I did not have access to a macOS system for testing, but since it uses mandoc, its behavior is probably the same as the BSD hosts.
Notes:
[1] Unicode characters were converted to one or two random ASCII characters unrelated to the original character.
[2] Unicode characters were shown as the body of the groff escape rather than the indicated character (in other words, text like [u00EF]).
[3] Unicode characters were deleted entirely, as if they weren't there. Using nroff -man instead of man to format the page showed the same results as Solaris 10. Using groff -k -man -Tutf8 to format the page produced the correct output.
PostScript and PDF output using groff on a Debian 12 system do not support combining accent marks or SMP characters due to a lack of support in the default output font.
Testing on additional platforms is welcome. Please let the author know if you have additional results.
(F) You specified a *roff font (using fixed, fixedbold, etc.) that wasn't either one or two characters. Pod::Man doesn't support *roff fonts longer than two characters, although some *roff extensions do (the canonical versions of nroff and troff don't either).
(F) The errors parameter to the constructor was set to an unknown value.
(F) The quote specification given (the quotes option to the constructor) was invalid. A quote specification must be either one character long or an even number (greater than one) characters long.
(F) The POD document being formatted had syntax errors and the errors option was set to die.
If set and Encode is not available, silently fall back to an encoding of groff without complaining to standard error. This environment variable is set during Perl core builds, which build Encode after podlators. Encode is expected to not (yet) be available in that case.
If set, this will be used as the value of the left-hand footer unless the date option is explicitly set, overriding the timestamp of the input file or the current time. This is primarily useful to ensure reproducible builds of the same output file given the same source and Pod::Man version, even when file timestamps may not be consistent.
If set, and POD_MAN_DATE and the date options are not set, this will be used as the modification time of the source file, overriding the timestamp of the input file or the current time. It should be set to the desired time in seconds since UNIX epoch. This is primarily useful to ensure reproducible builds of the same output file given the same source and Pod::Man version, even when file timestamps may not be consistent. See <https://reproducible-builds.org/specs/source-date-epoch/> for the full specification.
(Arguably, according to the specification, this variable should be used only if the timestamp of the input file is not available and Pod::Man uses the current time. However, for reproducible builds in Debian, results were more reliable if this variable overrode the timestamp of the input file.)
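A minimal sketch of pinning the footer date with SOURCE_DATE_EPOCH follows, assuming Pod::Man 4.00 or later. The POD text and the EXAMPLE name are made up, and the exact quoting of the fields on the .TH line varies between Pod::Man versions, so the example only prints that line.

```perl
use strict;
use warnings;
use Pod::Man;

my $pod = "=head1 NAME\n\nexample - reproducible date\n";

my $output;
{
    # Make sure POD_MAN_DATE does not take precedence, then pin the epoch.
    delete local $ENV{POD_MAN_DATE};
    local $ENV{SOURCE_DATE_EPOCH} = 86400;    # one day after the UNIX epoch

    my $parser = Pod::Man->new(name => 'EXAMPLE', section => 1);
    $parser->output_string(\$output);
    $parser->parse_string_document($pod);
}

# Show the .TH line, which carries the date as the left-hand footer.
my ($th) = $output =~ /^(\.TH .*)$/m;
print "$th\n";
```

Running this twice on different days should print the same .TH line, which is the point of the variable for reproducible builds.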
Pod::Man 1.02 (based on Pod::Parser) was the first version included with Perl, in Perl 5.6.0.
The current API based on Pod::Simple was added in Pod::Man 2.00. Pod::Man 2.04 was included in Perl 5.9.3, the first version of Perl to incorporate those changes. This is the first version that correctly supports all modern POD syntax. The parse_from_filehandle() method was re-added for backward compatibility in Pod::Man 2.09, included in Perl 5.9.4.
Support for anchor text in L<> links of type URL was added in Pod::Man 2.23, included in Perl 5.11.5.
parse_lines(), parse_string_document(), and parse_file() set a default output file handle of STDOUT if one was not already set as of Pod::Man 2.28, included in Perl 5.19.5.
Support for SOURCE_DATE_EPOCH and POD_MAN_DATE was added in Pod::Man 4.00, included in Perl 5.23.7, and generated dates were changed to use UTC instead of the local time zone. This is also the first release that aligned the module version and the version of the podlators distribution. All modules included in podlators, and the podlators distribution itself, share the same version number from this point forward.
Pod::Man 4.10, included in Perl 5.27.8, changed the formatting for manual page references and function names to bold instead of italic, following the current Linux manual page standard.
Pod::Man 5.00, included in Perl 5.37.7, changed the default output encoding to UTF-8, overridable with the new encoding option. It also fixed problems with bold or italic extending too far when used with C<> escapes, and began converting Unicode zero-width spaces (U+200B) to the \: *roff escape. It also dropped attempts to add subtle formatting corrections in the output that would only be visible when typeset with troff, which had previously been a significant source of bugs.
Pod::Man v6.0.0 and later unconditionally convert - to the \- *roff escape, representing an ASCII hyphen-minus. Earlier versions attempted to use heuristics to decide when a given - character should translate to a hyphen-minus or a true hyphen, but these heuristics were buggy and fragile. v6.0.0 and later also unconditionally convert ` and ' to ASCII grave accent and apostrophe marks instead of the default *roff behavior of interpreting them as paired quotes.
There are numerous bugs and language-specific assumptions in the nroff fallbacks for accented characters in the roff encoding. Since the point of this encoding is backward compatibility with the output from earlier versions of Pod::Man, and it is deprecated except when necessary to support old systems, those bugs are unlikely to ever be fixed.
Pod::Man doesn't handle font names longer than two characters. Neither do most troff implementations, but groff does as an extension. It would be nice to support longer font names as an option for those who want to use them.
Pod::Man copies the input spacing verbatim to the output *roff document. This means your output will be affected by how nroff generally handles sentence spacing.
nroff dates from an era in which it was standard to use two spaces after sentences, and will always add two spaces after a line-ending period (or similar punctuation) when reflowing text. For example, the following input:
    =pod

    One sentence.
    Another sentence.
will result in two spaces after the period when the text is reflowed. If you use two spaces after sentences anyway, this will be consistent, although you will have to be careful to not end a line with an abbreviation such as e.g. or Ms. Output will also be consistent if you use the *roff style guide (and XKCD 1285) recommendation of putting a line break after each sentence, although that will consistently produce two spaces after each sentence, which may not be what you want.
If you prefer one space after sentences (which is the more modern style), you will unfortunately need to ensure that no line in the middle of a paragraph ends in a period or similar sentence-ending punctuation. Otherwise, nroff will add two spaces after that sentence when reflowing, and your output document will have inconsistent spacing.
The *roff language distinguishes between two types of hyphens: -, which is a true typesetting hyphen (roughly equivalent to the Unicode U+2010 code point), and \-, which is the ASCII hyphen-minus (U+002D) that is used for UNIX command options and most filenames. Hyphens, where appropriate, produce better typesetting, but incorrectly using them for command names and options can cause problems with searching and cut-and-paste.
POD does not draw this distinction. Before podlators v6.0.0, Pod::Man attempted to translate - in the input into either a hyphen or a hyphen-minus, depending on context. However, this distinction proved impossible to do correctly with heuristics. Pod::Man therefore translates all - characters in the input to \- in the output, ensuring that command names and options are correct at the cost of somewhat inferior typesetting and line breaking issues with long hyphenated phrases.
To use true hyphens in the Pod::Man output, declare an input character set of UTF-8 (or some other Unicode encoding) and use Unicode hyphens. Pod::Man and *roff should handle those correctly with the default output format and most modern *roff implementations.
Similarly, Pod::Man disables the default *roff behavior of turning ` and ' characters into matched quotes, and pairs of those characters into matched double quotes, because there is no good way to tell from the POD input whether this interpretation is desired or whether the intent is to use a literal grave accent or neutral apostrophe. If you want paired quotes in the output, use Unicode and its paired quote characters.
Written by Russ Allbery <rra@cpan.org>, based on the original pod2man by Tom Christiansen <tchrist@mox.perl.com>.
The modifications to work with Pod::Simple instead of Pod::Parser were contributed by Sean Burke <sburke@cpan.org>, but I've since hacked them beyond recognition and all bugs are mine.
Copyright 1999-2020, 2022-2024 Russ Allbery <rra@cpan.org>
Substantial contributions by Sean Burke <sburke@cpan.org>.
This program is free software; you may redistribute it and/or modify it under the same terms as Perl itself.
Encode::Supported, Pod::Simple, perlpod(1), pod2man(1), nroff(1), troff(1), man(1), man(7)
Ossanna, Joseph F., and Brian W. Kernighan. "Troff User's Manual," Computing Science Technical Report No. 54, AT&T Bell Laboratories. This is the best documentation of standard nroff and troff. At the time of this writing, it's available at <http://www.troff.org/54.pdf>.
The manual page documenting the man macro set may be man(5) instead of man(7) on your system.
See perlpodstyle(1) for documentation on writing manual pages in POD if you've not done it before and aren't familiar with the conventions.
The current version of this module is always available from its web site at <https://www.eyrie.org/~eagle/software/podlators/>. It is also part of the Perl core distribution as of 5.6.0.