Russ Allbery > Software > Web Tools | spin Changes > |
(Translate thread, an HTML macro language, into XHTML)
spin [-dhv] [-e pattern ...] [-s url] [-o overrides] source [output]
spin [-s url] [-o overrides] -f
Perl 5.005 or later and the Image::Size and Text::Balanced modules. Also expects to find faq2html, cvs2xhtml, cl2xhtml, and pod2thread to convert certain types of files. The Git::Repository module is required to determine last change dates for thread source from Git history.
spin implements a fairly simple macro language that expands into XHTML. It also serves as a tool for maintaining a set of web pages: it updates a staging area with the latest versions, converts pages written in the macro language (named "thread"), and runs faq2html where directed.
When invoked with the -f option, spin works in filter mode, reading thread from stdin and writing the converted output to stdout. Some features, such as appending a signature or navigation links, are disabled in this mode.
If source is a regular file, output should be the name of the file into which to put the output, and spin will process only that one file (which is assumed to be thread). output may be omitted to send the output to standard output. The same features are disabled in this mode as in filter mode.
Otherwise, each file in the directory source is examined recursively. For each one, it is either copied verbatim into the same relative path under output, used as instructions to an external program (see the details on converters below), or converted to HTML. The HTML output for external programs or for converted pages is put under output with the same file name but with the extension changed to .html. Missing directories are created. If the -d flag is given, regular files in the output directory that do not correspond to files in the source directory will be deleted (directories are reported but not deleted).
Files that end in .th are assumed to be in thread and are turned into HTML. For the details of the thread language, see THREAD LANGUAGE below.
Files that end in various other extensions are taken to be instructions to run an external converter on a file. The first line of such a pointer file should be the path to the source file, the second line any arguments to the converter, and the third line the style sheet to use if not the default. Which converter to run is based on the extension of the file as follows:
.changelog  cl2xhtml
.faq        faq2html
.log        cvs log <file> | cvs2xhtml
.rpod       pod2thread <file> | spin -f
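The Python sketch below shows how the pointer-file layout and the extension table fit together. It is an illustration only, not spin's Perl code, and the function names are hypothetical. It reads a pointer file and looks up which converter pipeline applies, but it doesn't run anything. Treating a missing second or third line as empty is an assumption.

```python
from pathlib import PurePosixPath

# Converter pipeline for each pointer-file extension, copied from the table
# above. "<file>" stands for the path on the pointer file's first line.
CONVERTERS = {
    ".changelog": "cl2xhtml",
    ".faq": "faq2html",
    ".log": "cvs log <file> | cvs2xhtml",
    ".rpod": "pod2thread <file> | spin -f",
}


def read_pointer(text):
    """Split a pointer file into (source path, converter arguments, style sheet).

    Assumption: a missing second or third line is treated as empty.
    """
    lines = text.splitlines()
    lines += [""] * (3 - len(lines))
    source, args, style = (line.strip() for line in lines[:3])
    return source, args, style


def converter_for(pointer_name):
    """Return the converter pipeline for a pointer file, or None if none applies."""
    return CONVERTERS.get(PurePosixPath(pointer_name).suffix)


def output_name(pointer_name):
    """The HTML output keeps the relative path but gets a .html extension."""
    return str(PurePosixPath(pointer_name).with_suffix(".html"))


print(converter_for("software/changes.changelog"))  # cl2xhtml
print(output_name("software/changes.changelog"))    # software/changes.html
```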
All other files not beginning with a period are copied as-is, except that files or directories named CVS, Makefile, or RCS are ignored. As an exception, .htaccess files are also copied over.
spin looks for a file named .sitemap at the top of the source directory and reads it for navigation information to generate the navigation links at the top and bottom of each page. The format of this file is one line per web page, with indentation showing the tree structure, and with each line formatted as a partial URL, a colon, and a page description. If two pages at the same level aren't related, a line with three dashes should be put between them at the same indentation level. The partial URLs should start with / representing the top of the hierarchy (the source directory), but all generated links will be relative.
Here's an example of a simple .sitemap file:
/personal/: Personal Information
  /personal/contact.html: Contact Information
  ---
  /personal/projects.html: Current Projects
/links/: Links
  /links/lit.html: Other Literature
  /links/music.html: Music
  /links/sf.html: Science Fiction and Fantasy
This defines two sub-pages of the top page, /personal/ and /links/. /personal/ has two pages under it that are not part of the same set (and therefore shouldn't have links to each other). /links/ has three pages under it which are part of a set and should be linked between each other.
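The Python sketch below shows how indentation and the --- separators determine the up, previous and next relations. It is not spin's implementation. Two choices are assumptions: the dictionary layout, and treating top-level entries as children of / (the source directory). Converting the partial URLs into relative links is left out.

```python
def parse_sitemap(text):
    """Map each partial URL to its title and its up/prev/next neighbours.

    Top-level entries are treated as children of "/" (an assumption).
    A "---" line ends the run of related siblings at its indentation level.
    """
    pages = {}
    stack = []      # (indent, url) for the ancestors of the current line
    previous = {}   # parent url -> last related sibling seen under it
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        entry = line.strip()
        # Pop back to the nearest ancestor that is less indented.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1] if stack else "/"
        if entry == "---":
            previous[parent] = None  # unrelated pages: no prev/next link across
            continue
        url, _, title = entry.partition(":")
        prev = previous.get(parent)
        pages[url] = {"title": title.strip(), "up": parent, "prev": prev, "next": None}
        if prev is not None:
            pages[prev]["next"] = url
        previous[parent] = url
        stack.append((indent, url))
    return pages


SITEMAP = """\
/personal/: Personal Information
  /personal/contact.html: Contact Information
  ---
  /personal/projects.html: Current Projects
/links/: Links
  /links/lit.html: Other Literature
  /links/music.html: Music
  /links/sf.html: Science Fiction and Fantasy
"""

pages = parse_sitemap(SITEMAP)
print(pages["/links/music.html"])
# {'title': 'Music', 'up': '/links/', 'prev': '/links/lit.html', 'next': '/links/sf.html'}
```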
If .sitemap is present, this navigation information will also be put into the <head> section of the resulting HTML file as <link> tags. Some browsers will display this information as a navigation toolbar.
spin also looks for a file named .signature in the same directory as a thread file (and then at the top of the source tree if none is found in the current directory) and copies its contents verbatim into an <address> block at the end of the XHTML page (so the contents should be valid XHTML). After the .signature contents, spin adds information about when the page was last modified and generated.
spin looks for a file named .versions at the top of the source directory and reads it for version information. If it is present, each line should be of the form:
<product> <version> <date> <time> <files>
where <product> is the name of a product with a version number, <version> is the version, <date> and <time> specify the time of the last release (in ISO YYYY-MM-DD HH:MM:SS format and the local time zone), and <files> is any number of paths relative to source, separated by spaces, listing source thread files that use \version or \release for <product>. If there are more files than can be listed on one line, additional files can be listed on the next and subsequent lines so long as they all begin with whitespace (otherwise, they'll be taken to be other products). This information is not only used for the \version and \release commands, but also as dependency information. If the date of a release is newer than the timestamp of the output from one of the files listed in <files>, that file will be spun again even if it hasn't changed (to pick up the latest version and release information).
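The Python sketch below reads this format, including continuation lines, and applies the respin rule. The product name and paths in the example are invented. The sketch assumes the output timestamp is a naive local-time datetime. None of it is spin's own code.

```python
from datetime import datetime


def parse_versions(text):
    """Parse .versions into {product: (version, release time)} and
    {thread file: [products it uses]}.

    Lines starting with whitespace continue the previous product's file list.
    Release times are naive datetimes in local time, as in the file.
    """
    versions, users = {}, {}
    product = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace():
            if product is None:
                raise ValueError("continuation line before any product")
            files = line.split()
        else:
            product, version, date, time, *files = line.split()
            released = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
            versions[product] = (version, released)
        for path in files:
            users.setdefault(path, []).append(product)
    return versions, users


def needs_respin(path, output_mtime, versions, users):
    """True if a release used by path is newer than path's generated output."""
    return any(versions[p][1] > output_mtime for p in users.get(path, []))


# Hypothetical example data.
EXAMPLE = """\
example-tool 1.2 2009-01-03 12:00:00 software/tool/index.th
    software/tool/news.th
"""
versions, users = parse_versions(EXAMPLE)
print(users["software/tool/news.th"])  # ['example-tool']
```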
spin looks for a file named .rss in each directory it processes. If one is found, spin runs spin-rss on that file, passing the -b option to point to the directory about to be processed. spin does this before processing the files in that directory, so spin-rss can create or update files that will then be processed by spin as normal.
If there is a directory named .git at the top of the source tree, spin will assume that the source is a Git repository and will try to use git log to determine the last modification date of files.
-d
After populating the output tree with the results of converting or copying all the files in the source tree, delete all regular files in the output tree that do not have a corresponding file in the source tree. Directories will be mentioned in spin's output but will not be deleted.
-e pattern
Exclude files matching the regular expression pattern from being converted. This flag may be used multiple times.
-f
Run spin in filter mode rather than converting a whole tree of files. Thread source is read from stdin and the XHTML output is written to stdout. The signature and navigation links are disabled.
-h
Print out this documentation (which is done simply by feeding the script to perldoc -t).
-o overrides
Load the overrides file using the Perl do command. This file should contain Perl code that overrides or adds to the Perl code that's part of spin. It can be used to define new commands or change the behavior of existing commands.
-s url
The base URL for style sheets. All style sheets specified in \heading commands will be considered to be relative to this URL and this URL will be prepended to them (otherwise, they'll be referred to as if they're in the same directory as the generated file). This will similarly be used as the base URL for style sheets in the output of cl2xhtml, cvs2xhtml, and faq2html.
-v
Print out the version of spin and exit.
A thread file is mostly plain ASCII text with a blank line between paragraphs. There is no need to explicitly mark paragraphs; paragraph boundaries will be inferred from the blank line between them and the appropriate <p> tags will be added to the HTML output. There is no need to escape any character except \ (which should be written as \\) and an unbalanced [ or ] (which should be written as \entity[91] or \entity[93] respectively). Escaping [ or ] is not necessary if the brackets are balanced within the paragraph, and therefore is only rarely needed.
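The Python sketch below applies the paragraph rule and converts arbitrary plain text into thread-safe text. It only illustrates the rules just described. The exact whitespace spin puts around <p> is not known here, and commands are not expanded. The escaping function escapes every bracket, which is more than necessary but always safe.

```python
import re


def to_paragraphs(text):
    """Split thread text at blank lines and wrap each block in <p> tags.

    Only the paragraph rule is shown; commands are not expanded.
    """
    blocks = re.split(r"\n[ \t]*\n", text.strip())
    return "\n".join(f"<p>\n{block.strip()}\n</p>" for block in blocks if block.strip())


# str.translate makes a single pass, so the brackets inside the
# replacement text are not themselves escaped again.
THREAD_ESCAPES = str.maketrans({"\\": "\\\\", "[": r"\entity[91]", "]": r"\entity[93]"})


def escape_for_thread(plain):
    r"""Escape \ and every bracket (balanced ones need not be, but it is harmless)."""
    return plain.translate(THREAD_ESCAPES)


print(to_paragraphs("First paragraph.\n\nSecond\nparagraph.\n"))
print(escape_for_thread(r"C:\temp [draft"))  # C:\\temp \entity[91]draft
```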
Commands begin with \. For example, the command to insert a line break (corresponding to the <br> tag in HTML) is \break. If the command takes arguments, they are enclosed in square brackets after the command. If there are multiple arguments, each is enclosed in its own square brackets, one after another. Any amount of whitespace (but nothing else) is allowed between the command and the arguments, or between the arguments. So, for example, all of the following are entirely equivalent:
\link[index.html][Main page]
\link [index.html] [Main page]
\link[index.html] [Main page]
\link
    [index.html]
    [Main page]
(\link is a command that takes two arguments.)
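The argument rules above can be sketched as a small Python scanner. Arguments are bracketed, any amount of whitespace may separate them, and nested brackets are allowed when balanced. This follows only the stated rules and is not the parser spin uses. It ignores the optional parenthesized argument, the \= and \== macros, and any special treatment of \\ before a bracket.

```python
import re

COMMAND_NAME = re.compile(r"\\(\w+)")


def parse_command(text, pos=0):
    r"""Parse \name[arg][arg]... starting at text[pos].

    Whitespace (including newlines) may separate the name and the arguments.
    Brackets inside an argument must balance. Returns (name, args, end).
    """
    m = COMMAND_NAME.match(text, pos)
    if not m:
        raise ValueError(f"no command at position {pos}")
    name, end = m.group(1), m.end()
    args = []
    while True:
        start = end
        while start < len(text) and text[start].isspace():
            start += 1
        if start >= len(text) or text[start] != "[":
            break  # no further argument; whitespace after the command is left alone
        depth, i = 0, start
        while True:
            if i >= len(text):
                raise ValueError("unbalanced [ in argument")
            if text[i] == "[":
                depth += 1
            elif text[i] == "]":
                depth -= 1
                if depth == 0:
                    break
            i += 1
        args.append(text[start + 1:i])
        end = i + 1
    return name, args, end


forms = [
    r"\link[index.html][Main page]",
    r"\link [index.html] [Main page]",
    "\\link\n    [index.html]\n    [Main page]",
]
for form in forms:
    print(parse_command(form)[:2])  # ('link', ['index.html', 'Main page'])
```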
Commands can take multiple paragraphs of text as arguments in some cases (for things like list items). Commands can be arbitrarily nested.
Some commands take an additional optional argument which specifies the class attribute for that HTML tag, for use with style sheets, or the id attribute, for use with style sheets or as an anchor. That argument is enclosed in parentheses and placed before any other arguments. If the argument begins with #, it will be taken to be an id. Otherwise, it will be taken as a class. For example, a first-level heading is normally written as:
\h1[Heading]
(with one argument). Either of the following will add a class attribute of header to that HTML container that can be referred to in style sheets:
\h1(header)[Heading]
\h1 (header) [Heading]
and the following would add an id attribute of intro to the heading so that it could be referred to with the anchor #intro:
\h1(#intro)[Introduction]
Note that the heading commands have special handling for id attributes; see below for more details.
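The class-versus-id rule is small enough to show directly in Python. In this sketch, a parenthesized value starting with # becomes an id and anything else becomes a class. The function names are hypothetical. Deciding which commands accept the optional argument, and the extra heading handling, are both outside its scope.

```python
import re

OPTIONAL_ARG = re.compile(r"\s*\(([^)]*)\)")


def parse_optional(text, pos):
    """Look for "(value)" at text[pos], after optional whitespace.

    Returns (attribute, new position): ("id", name) if value starts with #,
    ("class", value) otherwise, or (None, pos) if there is no such argument.
    """
    m = OPTIONAL_ARG.match(text, pos)
    if not m:
        return None, pos
    value = m.group(1)
    if value.startswith("#"):
        return ("id", value[1:]), m.end()
    return ("class", value), m.end()


def render_attribute(attribute):
    """Format the attribute for insertion into an opening tag."""
    if attribute is None:
        return ""
    return f' {attribute[0]}="{attribute[1]}"'


source = r"\h1(#intro)[Introduction]"
attribute, after = parse_optional(source, len(r"\h1"))
print(f"<h1{render_attribute(attribute)}>")  # <h1 id="intro">
```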
There are two commands that are required to occur in every document. The first is \heading, which must occur before any regular page text. It takes two arguments, the first of which is the page title (the title that shows up in the window title bar for the browser and is the default text for bookmarks, not anything that's displayed as part of the body of the page) and the second of which is the style sheet to use. If there is no style sheet for this page, the second argument should be empty ([]).
The second required command is \signature, which must be the last command in the file. \signature will take care of appending the signature, appending navigation links, closing any open blocks, and any other cleanup that has to happen at the end of a generated HTML page.
It is also highly recommended, if you are using Subversion, CVS, or RCS for revision control, to put \id[$Id$] as the first command in each file. In Subversion, you will also need to enable keyword expansion with svn propset svn:keywords Id file. spin will then take care of putting the last modified date in the footer for you based on the Id timestamp (which may be more accurate than the last modified time of the thread file). If you are using Git, you don't need to include anything special in the thread source; as long as the source directory is the working tree of a Git repository, spin will use Git to determine the last modification date of the file.
You can include other files with the \include command, although it has a few restrictions. The \include command must appear either at the beginning of the file or after a blank line and should be followed by a blank line, and you should be careful not to include the same file recursively. Thread files will not be automatically respun when included files change, so you will need to touch the thread file to force it to be respun.
Block commands are commands that should occur in a paragraph by themselves, not contained in a paragraph with other text. They indicate high-level structural elements of the page. Three of them were already discussed above:
\heading[<title>][<style>]
As described above, this sets the page title to <title> and the style sheet to <style>. If the -s option was given, that base URL will be prepended to <style> to form the URL for the style sheet; otherwise, <style> will be used verbatim as a URL.
\id[<id>]
Tells spin the Subversion, CVS, or RCS revision number and time. This string is embedded verbatim in an HTML comment near the beginning of the generated output and is also used for the last modified information added by the \signature command. For this command to behave properly, it must be given before \heading.
\include[<file>]
Include <file> after the current paragraph. If multiple files are included in the same paragraph, they're included in reverse order, but this behavior may change in later versions of spin. It's strongly recommended to always put the \include command in its own paragraph. Don't put \heading or \signature into an included file; the results won't be correct.
Here are the rest of the block commands. Any argument of <text> can be multiple paragraphs and contain other embedded block commands (so you can nest a list inside another list, for example).
Put text in an indented block, equivalent to <blockquote> in HTML. Used primarily for quotations or things like license statements embedded in regular text.
\bullet[<text>]
<text> is formatted as an item in a bullet list. This is like <li> inside <ul> in HTML, but the surrounding list tags are inferred automatically and handled correctly when multiple \bullet commands are used in a row. Normally, <text> is treated like a paragraph.
If used with a class attribute of packed, such as with:
\bullet(packed)[First item]
then the <text> argument will not be treated as a paragraph and will not be surrounded in <p> tags. No block commands should be used inside this type of \bullet command. This variation will, on most browsers, not put any additional whitespace around the line and will look better for bulleted lists where each item is a single line.
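The list-inference rule can be sketched in Python. Consecutive \bullet items share one <ul>, and packed items skip the <p> wrapper. The (kind, text, packed) tuples are a hypothetical intermediate form. The exact markup spin emits may differ, for example in whether the packed class is carried onto the <li>.

```python
def render_items(items):
    """Render (kind, text, packed) tuples, where kind is "bullet" or "para".

    Runs of consecutive bullets are wrapped in a single <ul>. A packed
    bullet's text goes straight into <li>; otherwise it is wrapped in <p>.
    """
    out, in_list = [], False
    for kind, text, packed in items:
        is_bullet = kind == "bullet"
        if is_bullet and not in_list:
            out.append("<ul>")   # first bullet of a run opens the list
        elif not is_bullet and in_list:
            out.append("</ul>")  # first non-bullet after a run closes it
        in_list = is_bullet
        if is_bullet:
            out.append(f"<li>{text}</li>" if packed else f"<li><p>{text}</p></li>")
        else:
            out.append(f"<p>{text}</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)


print(render_items([
    ("bullet", "First item", True),
    ("bullet", "Second item", True),
    ("para", "Back to regular text.", False),
]))
```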
\desc[<heading>][<text>]
An element in a description list, where each item has a tag <heading> and an associated body text of <text>, like <dt> and <dd> in HTML. As with \bullet, the <dl> tags are inferred automatically.
\h1[<heading>] .. \h6[<heading>]
Level one through level six headings, just like <h1> .. <h6> in HTML. If given an id argument, such as:
\h1(#anchor)[Heading]
then not only will an id attribute be added to the <h1> container but the text of the heading will also be enclosed in an <a name> container to ensure that #anchor can be used as an anchor in a link even in older browsers that don't understand id attributes. This special handling only works with \h1 through \h6, not with other commands.
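Here is a Python sketch of the nesting just described: the id attribute sits on the heading element and an <a name> with the same value wraps the heading text. The function is hypothetical, and spin's exact output (attribute order, whitespace) may differ.

```python
def heading(level, text, anchor=None):
    r"""Render \h1 .. \h6. With an id, also wrap the text in <a name>
    so that #anchor works in browsers that ignore id attributes."""
    if not 1 <= level <= 6:
        raise ValueError("heading level must be 1 through 6")
    if anchor is None:
        return f"<h{level}>{text}</h{level}>"
    return f'<h{level} id="{anchor}"><a name="{anchor}">{text}</a></h{level}>'


print(heading(1, "Heading", "anchor"))
# <h1 id="anchor"><a name="anchor">Heading</a></h1>
```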
<text> is formatted as an item in a numbered list, like <li> inside <ol> in HTML. As with \bullet and \desc, the surrounding tags are inferred automatically. As with \bullet, a class attribute of packed will omit the paragraph tags around <text> for better formatting with a list of short items. See the description under \bullet for more information.
\pre[<text>]
Insert <text> preformatted, preserving spacing and line breaks. This uses the HTML <pre> tag, and therefore is normally also shown in a fixed-width font by the browser.
When using \pre inside indented blocks or lists, it's worth bearing in mind how browsers show indentation with \pre. Normally, the browser indents text inside \pre relative to the enclosing block, so you should only put as much whitespace before each line in \pre as those lines should be indented relative to the enclosing text. However, lynx unfortunately indents relative to the left margin, so it's difficult to use indentation that looks correct in both lynx and other browsers.
\quote[<text>][<author>][<work>]
Used for quotes at the top of a web page. The whole text will be enclosed in a <blockquote> tag with class quote for style sheets. <text> may be multiple paragraphs, and then a final paragraph will be added (with class attribution) containing <author>, a comma, and <work> inside <cite> tags. <work> can be omitted by passing an empty third argument.
If \quote is given a class argument of broken, <text> will be treated as a series of lines and a line break (<br />) will be added to the end of each line.
Indicates that this page has a corresponding RSS feed at the URL <url>. The title of the RSS feed (particularly important if a page has more than one feed) is given by <title>. The feed links are included in the page header output by \heading, so this command must be given before \heading to be effective.
A horizontal rule, <hr> in HTML.
\sitemap
Inserts an unordered list showing the structure of the whole site, provided that a .sitemap file was found at the root of the source directory and spin wasn't run as a filter or on a single file. If .sitemap wasn't found or if spin is running as a filter or on a single file, inserts nothing.
Be aware that spin doesn't know whether a file contains a \sitemap command and hence won't know to respin a file when the .sitemap file has changed. You will need to touch the source file to force it to be respun.
\table[<options>][<body>]
Creates a table. The <options> text is added verbatim to the <table> tag in the generated HTML, so it can be used to set various HTML attributes like cellpadding that aren't easily accessible in a portable fashion from style sheets. <body> is the body of the table, which should generally consist exclusively of \tablehead and \tablerow commands.
The descriptions are somewhat hard to read, so here's a sample table:
\table[rules="cols" border="1"][
  \tablehead [Older Versions]     [Webauth v3]
  \tablerow  [suauthSidentSrvtab] [WebAuthKeytab]
  \tablerow  [suauthFailAction]   [WebAuthLoginURL]
  \tablerow  [suauthDebug]        [WebAuthDebug]
  \tablerow  [suauthProxyHeader]  [(use mod_headers)]
]
The table support is currently preliminary. I've not yet found a good way of expressing tables, and it's possible that the syntax will change later.
\tablehead[<cell>][<cell>] ...
A heading row in a table. \tablehead takes any number of <cell> arguments, wraps them all in a <tr> table row tag, and puts each cell inside <th>. If a cell should have a certain class attribute, the easiest way to do that is to use a \class command around the <cell> text, and the class attribute will be "lifted" up to become an attribute of the enclosing <th> tag.
\tablerow[<cell>][<cell>] ...
A regular row in a table. \tablerow takes any number of <cell> arguments, wraps them all in a <tr> table row tag, and puts each cell inside <td>. If a cell should have a certain class attribute, the easiest way to do that is to use a \class command around the <cell> text, and the class attribute will be "lifted" up to become an attribute of the enclosing <td> tag.
Inline commands can be used in the middle of a paragraph intermixed with other text. Most of them are simple analogs to their HTML counterparts. All of the following take a single argument (the enclosed text) and map to simple HTML tags:
\bold     <b></b>            (usually use \strong)
\cite     <cite></cite>
\code     <code></code>
\emph     <em></em>
\italic   <i></i>            (usually use \emph)
\strike   <strike></strike>  (should use styles)
\strong   <strong></strong>
\sub      <sub></sub>
\sup      <sup></sup>
\under    <u></u>            (should use styles)
Here are the other inline commands:
\break
A forced line break, <br> in HTML.
\class[<text>]
Does nothing except wrap <text> in an HTML <span> tag. The only purpose of this command is to use it with a class argument that can be used in a style sheet. For example, you might write:
\class(red)[A style sheet can make this text red.]
so that the style sheet can then refer to class red and change its color.
\entity[<code>]
An HTML entity with code <code>. It becomes &<code>; in the generated HTML, or &#<code>; if <code> is entirely numeric. About the only time you'd need to use this is for non-ASCII characters (European names, for example) or if you need a literal [ or ] that isn't balanced.
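The numeric-versus-named rule is shown below in Python. The function is hypothetical, not spin's code. In ASCII, character 91 is [ and 93 is ], which is why \entity[91] and \entity[93] serve as bracket escapes.

```python
import re


def entity(code):
    """Render \\entity[code]: &#code; if code is all digits, else &code;."""
    if re.fullmatch(r"[0-9]+", code):
        return f"&#{code};"
    return f"&{code};"


for code in ["91", "93", "eacute"]:
    print(code, entity(code))
# 91 &#91;
# 93 &#93;
# eacute &eacute;
```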
Insert an inline image. <text> is the alt text for the image (which will be displayed on non-graphical browsers). Height and width attributes are added automatically, assuming that <url> is a relative URL in the same tree of files as the thread source.
\link[<url>][<text>]
Create a link to <url> with link text <text>, equivalent to <a href="<url>"><text></a> in HTML.
\release[<product>]
Replaced with the date portion of the version information for <product>, taken from the .versions file at the top of the source tree. The date will be returned in the UTC time zone, not the local time zone.
Replaced with the size of <file> in B, KB, MB, GB, or TB, whichever is most appropriate, without decimal places. The next larger unit is used whenever the value is larger than 1024 in the current unit, and 1024 is used as the scaling factor, not 1000.
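The scaling rule is sketched below in Python. Two details are assumptions rather than anything stated above: the sketch rounds to the nearest whole number, and it puts no space before the unit. To use it on a real file, pass os.path.getsize(path) as the byte count.

```python
def human_size(nbytes):
    """Format a byte count with 1024 scaling and no decimal places.

    The unit only steps up while the value is larger than 1024, so exactly
    1024 bytes stays "1024B". Rounding and the missing space before the
    unit are assumptions of this sketch.
    """
    units = ["B", "KB", "MB", "GB", "TB"]
    value, unit = float(nbytes), 0
    while value > 1024 and unit < len(units) - 1:
        value /= 1024
        unit += 1
    return f"{value:.0f}{units[unit]}"


print(human_size(5 * 1024 ** 2))  # 5MB
```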
\version[<product>]
Replaced with the version number for <product>, taken from the .versions file at the top of the source tree.
One of the important things that thread supports over HTML is the ability to define new macros on the fly. If there are particular constructs that are frequently used on the page, you can define a macro at the top of that page and then just use it repeatedly throughout the page.
A string can be defined with the command:
\=[<string>][<value>]
where <string> is the name that will be used (consisting only of alphanumerics and underscores) and <value> is the value that string will expand into. Any later occurrence of \=<string> in the file will be replaced with <value>. For example:
\=[HOME][http://www.stanford.edu/]
will cause any later occurrences of \=HOME in the file to be replaced with the text http://www.stanford.edu/. This can be useful for things like URLs for links, so that all the URLs can be collected at the top of the page for easy updating.
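The "later occurrences" behavior can be sketched in Python as a single left-to-right pass that records each definition as it goes. This is an illustration, not spin's code. Two behaviors are assumptions: values containing brackets are not supported, and a name used before its definition is left untouched.

```python
import re

# \=[name][value] defines a string; \=name uses it. Names are limited to
# ASCII letters, digits, and underscore. Values containing brackets are not
# handled by this sketch.
STRING_TOKEN = re.compile(
    r"\\=\[([A-Za-z0-9_]+)\]\s*\[([^\[\]]*)\]"  # definition
    r"|\\=([A-Za-z0-9_]+)"                      # use
)


def expand_strings(text):
    """Replace each \\=name with the value from the most recent earlier definition."""
    strings = {}

    def replace(m):
        if m.group(1) is not None:
            strings[m.group(1)] = m.group(2)
            return ""  # the definition itself produces no output
        # Assumption: a name used before it is defined is left untouched.
        return strings.get(m.group(3), m.group(0))

    # re.sub visits matches left to right, so only earlier definitions apply.
    return STRING_TOKEN.sub(replace, text)


page = "\\=[HOME][http://www.stanford.edu/]\n\\link[\\=HOME][Home page]"
print(expand_strings(page).strip())  # \link[http://www.stanford.edu/][Home page]
```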
A new macro can be defined with the command:
\==[<name>][<arguments>][<definition>]
where <name> is the name of the macro (again consisting only of alphanumerics or underscore), <arguments> is the number of arguments that it takes, and <definition> is the definition of the macro. When the macro is expanded, any occurrence of \1 in the definition is replaced with the first argument, any occurrence of \2 with the second argument, and so forth.
For example:
\==[bolddesc] [2] [\desc[\bold[\1]][\2]]
defines a new macro \bolddesc that takes the same arguments as the regular \desc command but always wraps the first argument, the heading, in \bold (and therefore in <b>).
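The placeholder substitution can be sketched in Python. Parsing the macro call and checking its argument count are left out, and only \1 through \9 are recognized. The design choice worth noting is the function replacement. re.sub returns a function's result literally, so backslashes in arguments (common in thread, as in \emph) are not treated as regex escapes.

```python
import re


def expand_macro(definition, args):
    r"""Substitute \1, \2, ... in a \== macro body with the call's arguments.

    Only \1 through \9 are recognized in this sketch.
    """
    def substitute(m):
        index = int(m.group(1)) - 1
        if index >= len(args):
            raise ValueError(f"definition uses argument {index + 1}, only {len(args)} given")
        return args[index]  # returned literally, so backslashes in args are safe

    return re.sub(r"\\([1-9])", substitute, definition)


body = r"\desc[\bold[\1]][\2]"
print(expand_macro(body, ["spin", "Converts thread to XHTML."]))
# \desc[\bold[spin]][Converts thread to XHTML.]
```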
Currently, the style sheets for cl2xhtml, cvs2xhtml, faq2html, and pod2thread are hard-coded into this program to fit my web pages. This makes this program awkward for others to use, since the style sheet has to be specified in every pointer file if they're using different names.
There is no way to configure how navigation links are added if the sitemap support is used.
\include needs some work to make it behave as expected without requiring that each \include be in its own paragraph. It should be possible to support \heading and \signature in included files without breaking the navigation link support.
\sitemap can only be used at the top of the web site; elsewhere, the links will be wrong. It needs to adjust the links to be relative to the current page.
The sitemap support currently only adds previous, next, up, and top links in the header of the generated web page. Most browsers that support this functionality also support first and last links, and the information is available in the sitemap file to generate those. They should also be included.
cl2xhtml(1), cvs2xhtml(1), faq2html(1), pod2thread(1), spin-rss(1)
The XHTML 1.0 standard at <http://www.w3.org/TR/xhtml1/>.
Current versions of this program are available from my web tools page at <http://www.eyrie.org/~eagle/software/web/>, as are copies of all of the above-mentioned programs.
Russ Allbery <rra@stanford.edu>
Copyright 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009 Russ Allbery <rra@stanford.edu>.
This program is free software; you may redistribute it and/or modify it under the same terms as Perl itself.