< faq2html Documentation | Russ Allbery > Software > Web Tools |
Fix recognition of paragraphs composed entirely of bullets, which was broken in some iteration on previous diffs.
Lines changed: +5 -3
Update copyright dates and my email address.
Lines changed: +4 -4
Support bullet lists with no blank lines between the entries but with some multi-line entries. Only bullet lists, not numbered lists, of this type are supported since I don't yet have any need for the latter and it's more complex to support them.
Lines changed: +34 -8
If a page title was given with -t, don't try to parse headers at the top of the document. This fixes formatting with Debian copyright-format 1.0 files (which need their own parser, but that's a problem for another day).
Lines changed: +10 -8
Support multiple subheadings at the start of the file. If the first line after a blank line following a subheading is centered, assume that it's the start of a second subheading and process it the same as the first. This prevents the indent from being set wrong for later text, which in turn prevents lots of lines from being erroneously considered headings.
Lines changed: +8 -2
Add -l option to request adding a last modified subheading based on the last modification timestamp, matching behavior that usually only happens when the file contains an RCS/CVS Id string. Always add a last modified subheading if there is last modified information available, either from an Id string or from the -l option.
Lines changed: +59 -24
Allow 34 characters instead of 33 in headings where no base indent has been established to allow for control-archive version headings.
Lines changed: +2 -2
Be less aggressive about using <pre> for single-line paragraphs if they look like regular text and end with a colon.
Lines changed: +4 -3
Suppress the revision number in last modified subheadings if it looks like a Subversion revision number, since those aren't very interesting.
Lines changed: +5 -4
Assume UTF-8 input and output for all pages instead of ISO 8859-1.
Lines changed: +13 -16
Allow 33 characters instead of 30 in headings when no base indent has been established to allow for pam-afs-session version headings. Bleh.
Lines changed: +2 -2
If we think part of a paragraph is a subheading, treat the whole paragraph as a subheading even if a line is too long for us to consider it centered.
Lines changed: +2 -2
CVS, in its infinite wisdom, has decided to start using dashes instead of slashes to separate parts of the date in the Id string.
Lines changed: +2 -2
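As a rough illustration of the Id string change above, the following sketch accepts either separator in the date. The regular expression, the function name id_date, and the sample Id strings are hypothetical and are not taken from faq2html itself.

```python
import re

# Hypothetical pattern for an expanded RCS/CVS Id string. The date may use
# either slashes (older CVS) or dashes (newer CVS) as separators.
ID_RE = re.compile(
    r"\$Id: (?P<file>\S+),v (?P<rev>[\d.]+) "
    r"(?P<y>\d{4})[-/](?P<m>\d{2})[-/](?P<d>\d{2}) "
)


def id_date(text):
    """Return the date from an expanded Id string as YYYY-MM-DD, or None."""
    match = ID_RE.search(text)
    if match is None:
        # Covers unexpanded strings such as "$Id$" as well.
        return None
    return f"{match['y']}-{match['m']}-{match['d']}"
```

An unexpanded string does not match, so a caller can simply skip it.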
Update copyright date.
Lines changed: +5 -3
Allow for ampersands in headings. Don't lose the indentation level at headings. Keep track of indentation properly for paragraphs that are all numbered elements. Fix the handling of broken bodies of description paragraphs. Close all state at the end of the document.
Lines changed: +12 -8
Rework how block structure elements (<blockquote> and lists) are handled internally so that they can finally be nested. Maintain a stack of indentation and block structure pairs and maintain it properly on changes of indentation in the source file.
Header recognition is now even more conservative, due to the false positives from mostly looking at short lines. In order for a header to be recognized as such, it now must be outdented, underlined, or in all caps. (Recognizing the first outdented header of a document is still rather tricky.)
Only use Id strings that have been expanded, getting rid of the strange results when converting a document with an Id tag that hasn't been committed to CVS yet. Get rid of a few other warnings in unusual situations. Simplify the handling of quoted paragraphs. Allow for From_ lines in documents that begin with mail headers. Improve the distinction between mail headers and internal documentation headers.
Lines changed: +100 -80
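The stack of indentation and block structure pairs described above might look something like the following sketch. It is only one way such a stack could work; the function names close_to and open_block and the exact closing rules are assumptions, not the script's actual code.

```python
def close_to(stack, indent):
    """Close every open block indented more deeply than indent.

    stack is a list of (indent, tag) pairs, innermost block last.
    Returns the closing tags in the order they should be printed.
    """
    closed = []
    while stack and stack[-1][0] > indent:
        _, tag = stack.pop()
        closed.append(f"</{tag}>")
    return closed


def open_block(stack, indent, tag):
    """Move to indent, opening a new block there if none is open yet."""
    output = close_to(stack, indent)
    if not stack or stack[-1][0] < indent:
        stack.append((indent, tag))
        output.append(f"<{tag}>")
    return output
```

Because each entry records the indentation at which its block was opened, a drop in indentation closes exactly the blocks nested more deeply than the new level, which is what makes nesting possible.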
Allow for description lists where the title has multiple lines. If there are several lines all with the same indentation at the beginning of a paragraph, and then the rest of the paragraph uses a different indentation consistently, this is now taken to be a description list (provided that it doesn't trigger one of the earlier checks). Hopefully this won't break anything.
Lines changed: +16 -9
Add code to handle all-numbered paragraphs similar to the code that handles all-bulleted paragraphs.
Lines changed: +24 -2
Allow / in headings for S/Ident, and be a bit more careful about establishing baselines, only doing it for paragraphs that aren't indented.
Lines changed: +5 -3
Add a -u option to say to use value attributes for <li> tags, and don't generate such attributes by default. This allows the output to be XHTML Strict unless this option is used. Don't use <strong> tags around the subheader; leave that for the style sheet.
Lines changed: +36 -16
Remove the <?xml-stylesheet> directive since it doesn't make much sense for Appendix C XHTML 1.0. Improve whitespace handling at the end of the generated file.
Lines changed: +3 -10
The information about whether the paragraph was broken obtained early on is invalid once we've discovered that we're looking at a description list, so recheck the actual body of the list rather than using it.
Lines changed: +2 -2
Add a -t option to specify the page title, overriding whatever is in the file (but intended primarily so that faq2html can convert text documents that don't have an internal title).
Lines changed: +12 -5
Suppress an undefined value warning when checking for sudden indentation changes.
Lines changed: +4 -3
Determine whether the paragraph is broken before we turn URLs into links, since that can artificially increase the length of lines. Don't consider a paragraph to be quoted if there's only one line.
Lines changed: +11 -8
Use dashes instead of slashes in the date reported as part of the version string and send version output to stdout rather than stderr.
Lines changed: +4 -2
Move the <meta> tag to declare the character set to the bottom of the <head> section to match my other pages and use lowercase iso-8859-1. Move the generator comments to below <head> to work around a bug in IE.
Lines changed: +7 -7
Strip off leading numbers or bullets when checking for offset paragraphs; we were misdetecting numbered paragraphs as offset after a long <pre> block.
Consider URLs a sentence so that indented URLs aren't detected as literal text and wrapped in <pre>; I prefer <blockquote>.
Lines changed: +20 -4
Strip indentation from <pre> blocks because at least Mozilla respects both their indentation and the indentation of the surrounding block and that feels like the more correct behavior to me. This is different than what lynx and links do, so this will cause the generated XHTML to look different in those.
Lines changed: +60 -25
Include an XML processing directive also specifying the style sheet for XML renderers, and indent the contents of the <head> tag appropriately.
Lines changed: +13 -4
Remove support for minidents, look for URLs in angle brackets without the URL: prefix, add a character set header to all generated pages specifying ISO 8859-1, and take a command-line option specifying the style sheet.
Lines changed: +46 -41
Convert to XHTML instead of HTML. The main user-noticeable differences are that formatting information is now not included in-line and instead an external style sheet is referenced, and section tags now start with "S" to fit the strictness requirements for anchors.
Lines changed: +78 -48
Fix the sentence detector to not think that "cvs commit ." looks like a sentence just because it ends in a period. Fix the heading detector so that all-lowercase short strings even with the standard indentation aren't considered headers.
Lines changed: +3 -2
Remember whether the last paragraph ended up being wrapped in <pre> and if so be a bit more aggressive about continuing to wrap things in <pre> (this fixes a problem with formatting an example header for a C program since it contained blank lines and the lines after the first looked more like block-quoted text). Don't put <p> tags around the <li> elements of lists where each element was a single line. Avoid recognizing broken paragraphs "quoted" with a single common character as block quotes, since this was erroneously detecting shell script comments as block quotes.
Lines changed: +30 -7
Break the header parsing out into its own functions and add rules to handle the documentation format that we use internally at Stanford. Also improve the -v/--version code to print out the last modified date.
Lines changed: +78 -36
Rewrap and fix section headings for 78 columns instead of 76. Always put <br> at the end of each line of a contents section, even if the lines aren't short enough for is_broken to trigger (in part because adding the <a href> tags pushes the lines long enough that it frequently doesn't).
Lines changed: +85 -86
Remove an unused variable. Bump revision; this is production.
Lines changed: +1 -2
Look for bulleted paragraphs before quoted paragraphs; this disallows paragraphs quoted with -, o, or *, but those should be uncommon. The advantage is that it recognizes bulleted lists better.
Lines changed: +9 -9
Improved the check for broken lines in a paragraph to always trigger on paragraphs that contain really short lines (not at the end of the paragraph). Allow two-line description list entries and hope it doesn't break anything else.
Lines changed: +3 -2
Be even more careful about detecting headings inside a contents block, where things that look like headings can easily be found in normal text.
Lines changed: +6 -3
Added support for paragraphs that are entirely bullet lines, made the heading detection code less sensitive so that lines over 30 characters are never considered headings (this may need revisiting), added support for wrapping in <pre> any paragraph that has varying internal indentation of greater magnitude than minidents, added support for URLs in headers.
Lines changed: +33 -5
To get accurate untabification, you have to start with the first tab on a line, not the last tab on a line, so C<.*> has to be C<.*?>.
Lines changed: +2 -2
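To see why the non-greedy match matters, here is a small sketch of the same idea. It is not the original Perl; the function name untabify and the tab stop of 8 are assumptions made for illustration.

```python
import re

# Non-greedy prefix: capture everything up to the FIRST tab on the line.
# A greedy ".*" would capture up to the LAST tab, so the prefix would still
# contain tabs and its length would not be the real column.
FIRST_TAB = re.compile(r"^(.*?)\t")


def untabify(line, tabstop=8):
    """Expand tabs to spaces, working left to right across one line."""

    def expand(match):
        prefix = match.group(1)
        # Pad out to the next multiple of tabstop.
        return prefix + " " * (tabstop - len(prefix) % tabstop)

    while "\t" in line:
        line = FIRST_TAB.sub(expand, line, count=1)
    return line
```

Each pass replaces only the leftmost tab, so the prefix never contains a tab and its length is the true column at which that tab starts.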
Allow periods in section headings, just not at the end of the heading.
Lines changed: +2 -2
Relax the requirements for is_centered and strengthen them for is_literal to avoid some false positives.
Lines changed: +3 -3
Added documentation and enough option parsing to handle -h and -v.
Lines changed: +173 -2
Require that centered text have at least ten spaces of whitespace at the beginning of the line, and drop the requirement that document titles pass the is_header check in favor of just the now-more-restrictive centered check. Also ignore all text after a signature.
Lines changed: +15 -4
Strip mailto: and news: off of link text for URLs.
Lines changed: +9 -3
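A minimal sketch of this behavior, assuming the link target keeps the full URL and only the visible text loses the scheme. The function name make_link is hypothetical.

```python
import html
import re


def make_link(url):
    """Build an <a> element whose visible text omits mailto: or news:."""
    # Only the displayed text is shortened; the href keeps the full URL.
    text = re.sub(r"^(?:mailto|news):", "", url)
    return f'<a href="{html.escape(url)}">{html.escape(text)}</a>'
```

URLs with any other scheme are shown unchanged.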
Fixed a bug in is_heading with even headers in the middle of a blockquote or list.
Lines changed: +2 -2
Actually parse subheaders, pulling out Original-author and HTML-title. The latter overrides Subject as the title of the document, and the former goes into the generated subheader if the document doesn't have one.
Lines changed: +21 -6
Fixed description list handling so that a sequence of description list entries would be recognized properly (including fixing an obvious bug), added support for minidents in description list entries, and fixed handling of whitespace-only lines. Also reordered some of the logic for producing the page title so that page titles derived from Subject headers would still have capitalization fixed.
Lines changed: +29 -22
Fixed a problem with output printing out stored whitespace twice, and fixed quoted <blockquote> handling so that it sets $INDENT properly.
Lines changed: +8 -3
Reorder processing of the main body somewhat so that <pre> literal text still gets HTML special characters escaped, but <hr> checks don't (so that rules with <, >, or & don't cause us fits).
Lines changed: +25 -24
Escape HTML special characters in headers occurring after a digest divider.
Lines changed: +2 -1
Add the RCS/CVS Id of the original document to the header if we know it and put more blank lines around our leading comments.
Lines changed: +5 -4
Clean up our input handling so that everything uses a global glob reference to read from, more things are handled by slurp, and we can now act as a filter if so desired. Also declare our DTD and brag about who we are in a comment.
Lines changed: +57 -50
Cleaned up the algorithm to detect headings to better distinguish between them and outdented <pre> sections, at the same time allowing more characters in headings. In order to be maximally flexible there, though, one has to outdent headers; otherwise, we're more conservative. Also tweaked the check for broken lines slightly, fixing one bug and considering lines to be broken even if they're not short if they contain no internal whitespace (and there's more than one line in the paragraph). Simplified the check for whether something's a sentence to disallow ending in commas and semicolons, since escaped characters end in semicolons. Improved our handling of literal <pre> sections by continuing to slurp data until we see a totally blank line, thus allowing full patches to be included in FAQs. Disallow indented description lists to disambiguate between them and <pre> examples.
Lines changed: +50 -27
Fixed the handling of subheadings yet *again* so that they should now work correctly in the presence of blank lines between parts of them. Added support for multilevel headings in the body (both <h2> and <h3>) and some black magic to figure out which to use. Expanded the cases in which we'll turn *text* into strong tags by allowing the trailing * to be followed by word-separator punctuation. We no longer close open structure elements on starting a <pre>, and we handle multiparagraph list items. Also determine the baseline of indentation for body text and use sudden changes of indentation to 0 when that's less than the baseline as a sign to wrap <pre> around things.
Lines changed: +55 -28
Initial support for turning *text* into <bold>text</bold> in a way that shouldn't mangle use of * as a wildcard. Also fixed <br> and </strong> tag handling for subheaders in a way that doesn't result in unnecessary extra <br> tags in the subheading.
Lines changed: +27 -18
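One way to recognize *text* without touching wildcards is to require whitespace (or the start of the line) before the opening asterisk and a word character immediately inside both asterisks. The sketch below follows that idea; the pattern, the function name emphasize, the set of allowed trailing punctuation, and the choice of <strong> as the output tag are all assumptions rather than the script's actual rules.

```python
import re

# Opening * must follow whitespace or start the string, and both asterisks
# must touch a word character. The closing * must be followed by whitespace,
# common punctuation, or the end of the string.
STRONG_RE = re.compile(r"(?<!\S)\*(\w(?:[^*]*\w)?)\*(?=$|[\s.,;:!?)])")


def emphasize(text):
    """Wrap *text* spans in <strong> tags, leaving wildcard uses of * alone."""
    return STRONG_RE.sub(r"<strong>\1</strong>", text)
```

Under these rules, shell globs such as *.c or foo* are left alone because the asterisk is either followed by a non-word character or preceded by a non-space character.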
Added support for bulleted lists and for tables of contents in sections with a header matching /\bcontents\b/i. Tweaked the broken lines test and the description list test to avoid false positives on two-line paragraphs with minidented second lines, and to avoid counting the last line of a paragraph in the check for broken lines. Allowed internal whitespace in a rule of '-' or '=' and fixed blank line handling for digest dividers. Added <br> tags for broken lines in ordered lists, description lists, and regular paragraphs.
Lines changed: +52 -13
Cleaned up the general output methods to provide better handling of space between elements; it's now stashed away until the next element prints and is put after any closing tags. This simplifies a lot of code that was attempting to do something similar in various incorrect ways. Also cleaned up the URL processing subs to make them simpler and fixed the output for ordered lists.
Lines changed: +64 -68
Added support for description lists and revised the way multiparagraph structure was handled so that list items could span multiple paragraphs the way they're supposed to be able to do. Also added <a name> anchors for section headings, if they start with a section number.
Lines changed: +48 -15
Added support for verbatim paragraphs, automatic blockquoting based on indent levels, recognition and handling of non-wrapping line breaks, horizontal rules, and digest section dividers. Expanded and cleaned up the subheading code, with support for multiple subheadings and fixing of revision subheaders. Added support for slightly indented URLs in paragraphs, and fixed the URL code so that it strips the surrounding <URL:...>. Tweaked the recognition of verbatim paragraphs left over from the first draft.
Lines changed: +129 -41
Added support for a subtitle containing the authorship and last modified date information, if available. Also tweaked the heading detection logic to exclude lines ending in colons (or a variety of other characters allowed inside lines).
Lines changed: +37 -4
Clean up the whitespace handling of multiparagraph containers so that the blank lines after the last paragraph in the container show up after the container close tag.
Lines changed: +26 -16
Added code to generate list items and list containers and to clean up after multiparagraph containers. We now support numbered paragraphs and generate the appropriate ordered list. Cleaned up the data flow in the body parser a good bit, separating things out and commenting them and cleaning up the loop boundaries and file reading code. Added support for an initial file heading, and setting the document title on the basis of it. Cleaned up the whitespace handling around containers and fixed bugs in a few regexes. This is the first version that can handle the Guidelines.
Lines changed: +128 -22
Added parsing of a leading RCS Id string, headers, and sub-headers for FAQs, and added generation of <html>, <head>, and <body> enclosures and some rudimentary <title> support. Added a check for headers and blank lines, expanded the check for a heading to include allowing for an underlined heading, and tweaked the check for literal spacing some. Also added subs to remove leading bullets or numbers on paragraphs and cleaned up the HTML generation helpers considerably. Switched away from paragraph mode to our own version of paragraph mode that handles lines with whitespace.
Lines changed: +104 -27
First working version, for simple cases.