CGI FPI and Pipeline

CGI Filter/Pipe Interface (CGI FPI) runs a set of modules when an HTML form is submitted, replacing traditional, monolithic CGI scripts.

by James Hoagland

CGI FPI

The CGI Filter/Pipe Interface (CGI FPI) replaces a monolithic ``do everything'' form-handling CGI script with an easy-to-maintain, easy-to-adapt set of modules. Each module performs a well-defined task and takes command-line-like arguments that configure its behavior. Modules are run from a single driver program, which parses the form input, builds a convenient internal representation of the form fields, and executes a set of modules in a specified order.

Environment variables and fields can both be modified and passed between modules. You can think of the modules as mini-CGI scripts; however, the details of obtaining the form data are hidden from the module developer. Perhaps a more useful analogy is that of Unix shell pipes with the modules corresponding to the programs executed.

CGI FPI has a number of benefits resulting from 1) increased modularity, 2) task subdivision, and 3) abstraction away from the details of CGI:

Easier to work with.
A set of simple modules means each module is easier to understand, debug and maintain.

More Options.
Being able to pick and choose among existing modules allows non-programmers more options.

Reusable.
Not all functionality has to be implemented from scratch---developers can re-use modules.

Streamlined.
Having clear arguments avoids needing to somehow encode user-selected options into the form.

Hidden ``dirty work''.
Hiding the CGI details allows the script developer to concentrate on functionality.

Portability.
OS-dependent operations and optimizations can be isolated in a few modules.

Pipeline is a freely available implementation of CGI FPI which runs on Unix and requires Perl version 5. Implementations are certainly possible for other platforms such as Macintosh, DOS, and MS Windows.

Uses of Pipeline

Pipeline has already been used in a number of different settings. In one, a simple feedback form-handler takes data and appends it to a file, e-mails it to someone, or both. The data can be the result of an extensive survey, or simply a handful of fields the user has filled out. Pipeline has also been used for chat-type pages, where the output of a form is processed, appended to an HTML page, and then re-displayed.

An HTML form-creating utility is a good example of the use of Pipeline to make HTML pages on the fly. Pipeline has been used to distribute software, based on the contents of an HTML form; a package of modules called the ``HTML Form Processing Modules'' (HFPM) distributes itself with HFPM Pipeline modules. Pipeline has also been used to handle things like registration at conferences, and other form-based tasks.

Of course Pipeline can be used anywhere a regular CGI script might be used; however, it is particularly suited for cases where somewhat separate operations take place (perhaps with priority restrictions), and for cases where existing modules can be re-used or easily adapted to provide some of the desired functionality.

Using Pipeline

To use Pipeline version 1.0 (February 1996) in your form, set the ACTION part of the FORM tag to be the URL of pipeline.pl. For example:

<FORM METHOD="post"
ACTION="/cgi-bin/pipeline.pl"> 

Pipeline works with both the POST and GET form submission methods, and uses three hidden fields to control processing:

Forms using Pipeline must contain a hidden _pipeline field whose VALUE attribute indicates what modules to run, in what order, and with what arguments. The contents of this field are commonly known as the ``pipeline.'' Sections are separated by the ``pipe'' character, ``|''. The first ``word'' in each section is the name of the module; subsequent ``words'' are arguments to be passed to the module. The modules are executed with their arguments in the order specified in the hidden field:

<INPUT TYPE="hidden" NAME="_pipeline"
VALUE="set.pl email=smith
| formatted_mail.pl /WWW/mailformat user@org.dom 
| showflds.pl"> 

In this example, the ``set.pl'' module is run first, with the argument email=smith, then ``formatted_mail.pl'' is run (with updated fields) with arguments /WWW/mailformat and user@org.dom, and finally ``showflds.pl'' is run with no arguments.

The optional _path field tells Pipeline where to look for modules.

<INPUT TYPE="hidden" NAME="_path"
VALUE="/htbin/pipeline/specialmods"> 

The alternative (which is available even if you use this field) is to specify the full path to a module in the pipeline itself:

<INPUT TYPE="hidden" NAME="_pipeline"
VALUE="/htbin/pipeline/set.pl email=smith"> 

The _path field is analogous to the path variable in Unix shells; it stores a whitespace-separated list of directories to look in for modules. Pipeline searches these directories in list order.
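
The lookup just described can be sketched in a few lines of Perl. This is only an illustration of the idea, not code taken from pipeline.pl: the find_module name is hypothetical, and the handling of full paths is an assumption based on the alternative shown above.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper, not taken from pipeline.pl: return the first existing
# file for a module name, trying each _path directory in list order.
sub find_module {
    my ($name, $path) = @_;
    if (File::Spec->file_name_is_absolute($name)) {   # full path given in the pipeline
        return -f $name ? $name : undef;
    }
    for my $dir (split ' ', $path) {                  # whitespace-separated list
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate;
    }
    return undef;                                     # not found anywhere
}

# Demonstration with two throwaway directories; only the second holds set.pl.
my ($first, $second) = (tempdir(CLEANUP => 1), tempdir(CLEANUP => 1));
open my $fh, '>', File::Spec->catfile($second, 'set.pl')
    or die "cannot create file: $!";
close $fh;

my $found = find_module('set.pl', "$first $second");
print "found: $found\n" if defined $found;
```

Because the list is split on whitespace, a directory name containing a space could not be expressed in such a list; the sketch makes no attempt to work around that.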

Note that whitespace is the separating character, so having ``set.pl message=Hi there'' in the pipeline results in Pipeline giving ``set.pl'' two arguments: ``message=Hi'' and ``there''. But what if you want ``message=Hi there'' as a single argument or if you want a ``|'' in some text? As with Unix shells, quoting is provided by Pipeline for the ``_pipeline'' field.

There are two quoting options available. The first is the backslash character (\) which quotes (or ``escapes'') the very next character. This can be used, for example, to insert a space or a pipe character. The second option is to enclose the text to be quoted in single-quotes ('). This can be used to force a range of text to be treated as a single ``word''. For example, either set.pl 'message=Hi there' (note the single quotes) or set.pl message=Hi\ there can be used to execute the ``set.pl'' module with argument ``message=Hi there''. If you need to pass a single quote ('), it must be escaped (\').
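
To make the quoting rules concrete, here is a small sketch of how a pipeline string could be split into modules and arguments. It is not Pipeline's own parser; the parse_pipeline name is hypothetical, and the sketch assumes that a backslash also escapes the next character inside single quotes, which the description above does not settle.

```perl
use strict;
use warnings;

# Hypothetical helper, not Pipeline's own parser: split a _pipeline value
# into stages, each stage being a reference to [module, arg, arg, ...].
sub parse_pipeline {
    my ($text) = @_;
    my @stages  = ([]);
    my $word    = '';
    my $in_word = 0;    # true once the current word has content or quoting
    my $quoted  = 0;    # true while inside '...'

    my @chars = split //, $text;
    while (@chars) {
        my $c = shift @chars;
        if ($c eq '\\' && @chars) {          # backslash quotes the next character
            $word .= shift @chars;
            $in_word = 1;
        } elsif ($c eq "'") {                # enter or leave single quotes
            $quoted  = !$quoted;
            $in_word = 1;
        } elsif ($quoted) {                  # inside quotes: take literally
            $word .= $c;
        } elsif ($c =~ /\s/ || $c eq '|') {  # unquoted separator ends the word
            push @{ $stages[-1] }, $word if $in_word;
            ($word, $in_word) = ('', 0);
            push @stages, [] if $c eq '|';   # a pipe also starts a new stage
        } else {
            $word .= $c;
            $in_word = 1;
        }
    }
    push @{ $stages[-1] }, $word if $in_word;
    return grep { @$_ } @stages;             # drop empty stages
}

my $pipeline = q{set.pl 'message=Hi there' who=a\|b | showflds.pl};
for my $stage (parse_pipeline($pipeline)) {
    print join(' ', map { "[$_]" } @$stage), "\n";
}
```

Run on the sample string, the sketch yields two stages: set.pl with the arguments ``message=Hi there'' and ``who=a|b'', followed by showflds.pl with no arguments.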

Pipeline also provides error reporting for both itself and any modules it runs. If the optional hidden _errorsto field is set to an e-mail address, any problems get reported via e-mail. If there are problems sending mail, or if _errorsto is empty, errors are sent to stderr.

An example--HFPM

The ``HTML Form Processing Modules'' package (HFPM) is a set of freely available modules for use with Pipeline. HFPM version 1.0 (February 1996) consists of 17 modules which demonstrate the potential of CGI FPI (and provide the full functionality of Getcomments version 2.2, a CGI script that is the predecessor of HFPM).

These modules are to set form fields and environmental variables:

These modules output to the user's client (browser):

These modules output to a file or to e-mail:

In addition to these modules, gc.pl is provided for backward compatibility with Getcomments version 2.2.

Full documentation on HFPM, including how to use it, is available online.

An on-line utility for creating forms that use Pipeline and HFPM is available publicly. At the time of this writing, three distinct types of forms can be created.

Other Modules

Other modules have been written besides those in HFPM. These include:

Some of these modules are in the public domain and some are not.

Module Ideas

There is plenty of room left for new modules to be developed.

For instance, validation of input is important in any sort of database application (and in other circumstances) where input needs some degree of integrity. A simple approach for doing sanity checks is to try to match the input to a specified regular expression. If the pattern doesn't match, the user could be asked to resubmit the data. More field-type-specific approaches can be taken as well.

For example, you might attempt to verify the form submitter's e-mail address. Numeric values could be checked for appropriate range in another module. Another module could check for consistency among entered values. You could also write a general module to check for appropriate mutual exclusion between fields or a module to check to make sure certain fields in a given set have been entered.
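
As a rough illustration of the regular-expression approach, the core check of such a module might look like the following. The failed_fields name and the field=regex argument style (borrowed from the dest=val style of set.pl) are assumptions made for this sketch, and a plain hash stands in for the form fields.

```perl
use strict;
use warnings;

# Hypothetical core of a validation module. Each spec has the form
# field=regex; a field fails when its whole value does not match.
sub failed_fields {
    my ($fields, @specs) = @_;
    my @failed;
    for my $spec (@specs) {
        my ($name, $pattern) = split /=/, $spec, 2;
        my $value = defined $fields->{$name} ? $fields->{$name} : '';
        push @failed, $name unless $value =~ /\A(?:$pattern)\z/;
    }
    return @failed;
}

my %fields = (zip => '95616', qty => 'three');
print join(',', failed_fields(\%fields, 'zip=\d{5}', 'qty=\d+')), "\n";
```

Here only ``qty'' is reported, since ``three'' is not a run of digits. A real module would then decide what to do with the list, for instance asking the user to resubmit.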

Modules could also be developed for performing transformations on a field. A module could be written to remove HTML tags from specified fields. This can be done, for example, in chat-type pages where you might want to prohibit inclusion of images. Another could remove certain text from specified field(s). This might be used to remove expletives from text. A fairly general module to search for a regular expression and to replace it with a set phrase could even be written.
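
The two transformations mentioned here could start out as simply as the sketch below. The function names are hypothetical, and the tag removal is deliberately naive: it deletes anything that looks like a tag rather than parsing HTML.

```perl
use strict;
use warnings;

# Naive tag removal: deletes anything of the form <...>. It is not a full
# HTML parser and will be fooled by a '>' inside an attribute value.
sub strip_tags {
    my ($text) = @_;
    $text =~ s/<[^>]*>//g;
    return $text;
}

# Replace every match of a regular expression with a set phrase.
sub replace_all {
    my ($text, $pattern, $phrase) = @_;
    $text =~ s/$pattern/$phrase/g;
    return $text;
}

print strip_tags('Hi <IMG SRC="x.gif">there, <B>all</B>'), "\n";
print replace_all('that darn form', 'darn', '****'), "\n";
```

In a module, each of these would be applied to the value of every field named in the pipeline arguments and the result stored back into the field.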

Modules could be written to perform calculations on input fields. One obvious possibility is a general module to perform mathematical computations on numeric input fields. With such a module, one could compute an order total by multiplying cost per unit by units ordered and summing the results. Another possibility is to do a string concatenation of textual fields. There are a large number of applications for modules in this area.

There is, of course, the need for an occasional application-specific module, although it is more desirable to produce reusable modules. For example, you could write a vote-tabulation module for voting and survey forms.

You could even write modules to interface with programs that are not modules. You could write a module to pass the form input to a non-CGI FPI compliant script via the POST or GET method, allowing non-modules to be run as well. Another module idea along these lines is a module to pipe form fields to an ordinary external program, e.g. ``htpasswd'', ``tar'', or a program that needs to be run as a side-effect of the form submission.

Writing Pipeline Modules

Writing your own Pipeline modules is fairly straightforward, especially if you use existing modules, such as those from HFPM, as a starting point. Writing modules is easier than writing raw CGI scripts, since the details of obtaining the field input are hidden. And because other modules already provide some of the functionality, you typically have less to write, which helps deter the time-consuming tendency toward creeping featurism.

An example module will help clarify the requirements of Pipeline modules. Below is the source for the ``set.pl'' HFPM module, which sets a field or environmental variable to a given value:

01 #!/usr/bin/perl
02
03 # Set.pl CGI Filter/Pipe Interface module to 
04 # set some fields and env. variables
05 # to an indicated value. These are specified as
06 # arguments in the form dest=val.
07 # The destination is indicated in the left hand
08 # side of arguments.
09 # A '%' prefix indicates the destination is an
10 # environmental variable and a '$'
11 # prefix means a field destination, which is 
12 # the default.
13 # See 
14 # "http://seclab.cs.ucdavis.edu/~hoagland/hfpm/set.html"
15 # for more information.
16 # copyright (c) 1995 by James A. Hoagland
17 # (hoagland@cs.ucdavis.edu).
18
19 sub process {
20     my ($input,@args)= @_;
21     my($dest,$val);
22     foreach (@args) {
23         ($dest,$val)= split('=',$_,2);
24         my $prefix= ($dest =~ s/^([\$\%])//) ? $1 : '$';
25         if ($prefix eq '%') { # env. var
26             $ENV{$dest}= $val;
27         } else { # a field
28             $input->param($dest,$val);
29         }
30     }
31 }
32
33 \&process; 

Pipeline modules are run when they are specified in the pipeline. The last thing in a Pipeline module must be either a reference to a subroutine to be invoked or a true value. The true value can be used in the rare case where the behavior of a module does not depend on the fields, their values, or any pipeline arguments.

If a subroutine reference is returned, as it is in line 33 above, it is interpreted as a subroutine to be invoked with certain arguments when the module is run from the pipeline. The first of these arguments is a CGI.pm object instance and the remaining arguments are the textual arguments to the module as specified on the pipeline. Line 20 shows the receipt of those arguments. CGI.pm is a Perl module written by Lincoln Stein that parses and stores field input, and provides means for accessing and modifying field values as well as convenient ways to output CGI and HTML. For more information about CGI.pm see http://www-genome.wi.mit.edu/ftp/pub/software/WWW/cgi_docs.html.
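
The contract between driver and module can be pictured with a short sketch. It is a guess at the mechanism rather than an excerpt from pipeline.pl: the run_module name is hypothetical, and loading the file with Perl's do operator is an assumption that happens to fit the ``last thing in the module'' rule.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical loader, sketched from the contract described above.
sub run_module {
    my ($file, $input, @args) = @_;
    my $result = do $file;                   # value of the file's last statement
    die "cannot load $file: " . ($@ || $!) unless defined $result;
    return ref($result) eq 'CODE'
        ? $result->($input, @args)           # subroutine reference: invoke it
        : $result;                           # plain true value: nothing to call
}

# Demonstration with a throwaway module that ends, like set.pl, in \&process.
my ($fh, $file) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print $fh 'sub process { my ($input, @args) = @_; return join(",", @args) } \&process;';
close $fh;

# undef stands in for the CGI.pm object, which this demo module never uses.
print run_module($file, undef, 'a', 'b'), "\n";
```

The demo module simply joins its arguments, so the call prints ``a,b''.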

Once the subroutine has been invoked, you can do almost anything that you want. You can look at and set field and environmental variable values, output to the client, return a URL to the client, output to a file, run programs, etc.

In the set.pl example above, lines 22-30 iterate over all the arguments in the pipeline. If an argument names an environmental variable to be set, then that is done on line 26. Environmental variables are inspected and modified in the usual way in Perl, through the %ENV hash. If the pipeline argument indicates that a field value is to be set, then that is done in line 28 by using the two-argument form of the param method of the CGI.pm instance. CGI.pm's param method performs one of three distinct operations depending on how many arguments it is given. The two-argument form assigns the value in the second argument to the field named in the first argument. Called with one argument, it returns the value of that field, and called with no arguments, it returns a list of all field names.
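
The three forms of param can be seen side by side in the sketch below. So that it runs on its own, a tiny stand-in class (DemoInput, purely hypothetical) plays the part of the CGI.pm object and mimics only these three calls; the upper-casing module is likewise made up for illustration.

```perl
use strict;
use warnings;

# Minimal stand-in for the CGI.pm object, for illustration only; a real
# module receives the genuine object from pipeline.pl.
package DemoInput;
sub new { my ($class, %fields) = @_; return bless { %fields }, $class }
sub param {
    my ($self, $name, $value) = @_;
    return sort keys %$self unless defined $name;   # no arguments: field names
    $self->{$name} = $value if @_ > 2;              # two arguments: assign
    return $self->{$name};                          # one argument: fetch
}

package main;

# Hypothetical module: upper-case the named fields, or every field when
# no arguments are given on the pipeline.
sub process {
    my ($input, @args) = @_;
    my @names = @args ? @args : $input->param;      # zero-argument form
    for my $name (@names) {
        my $value = $input->param($name);           # one-argument form
        next unless defined $value;
        $input->param($name, uc $value);            # two-argument form
    }
}

my $input = DemoInput->new(name => 'smith', city => 'davis');
process($input, 'name');
print "$_=", $input->param($_), "\n" for $input->param;
```

The stand-in returns field names in sorted order to keep the output predictable; that ordering is a property of the sketch, not of CGI.pm.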

You can use the methods of CGI.pm for returning the CGI header and outputting HTML to the client, or you can output using other methods, e.g. through a print to stdout. Note, however, that if multiple modules output to stdout, the result can be a jumble.

For the user's convenience (remember that some of the users of your module may not be very technically oriented), it is a good idea to perform sanity checks on the input and report errors. The reporterr subroutine provides a means of telling the user of your module of problems in configuration. The first argument is a string to use as the message. The second argument is 0 if the error is non-fatal and 1 if pipeline.pl should exit as a result. If the _errorsto field is set, the error message is e-mailed by reporterr to the e-mail address specified therein. Otherwise it is output to stdout.
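
A module might use reporterr along the following lines. The sketch defines its own stand-in for reporterr so that it can run outside of Pipeline; inside a real module that stand-in would be left out, and how the real routine is made visible to modules is assumed here rather than documented. The ``set_demo'' name is made up.

```perl
use strict;
use warnings;

# Stand-in for Pipeline's reporterr so the sketch runs on its own. The real
# routine mails or prints the message and exits when the second argument is 1.
my @reported;
sub reporterr {
    my ($message, $fatal) = @_;
    push @reported, [$message, $fatal];
}

# Hypothetical module body: every argument must look like dest=val.
sub process {
    my ($input, @args) = @_;
    reporterr("set_demo: no arguments given", 1) unless @args;
    for my $arg (@args) {
        if ($arg !~ /=/) {
            reporterr("set_demo: argument '$arg' is not of the form dest=val", 0);
            next;                            # non-fatal: skip this argument
        }
        # ... handle a well-formed argument here ...
    }
}

process(undef, 'email=smith', 'oops');
print "$_->[0] (fatal=$_->[1])\n" for @reported;
```

Naming the module in each message is a small courtesy: when several modules run in one pipeline, it tells the form's author where to look.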

Pipeline has a couple of facilities available for debugging. You can use the reporterr subroutine with the _errorsto field set to your e-mail address to report debugging statements to yourself. This is a good idea, since what has gone wrong usually isn't apparent after the output has gone through the server and client.

Pipeline also provides the ability to run pipeline.pl from the command line. Field input can be simulated with field=value type arguments to pipeline.pl. This facility allows you to see the exact output produced, including that of the Perl interpreter.

You can return the kindness of the free Pipeline and HFPM by making modules you write publicly available. Please let me know if you write some publicly available Pipeline modules. If you have some on-line documentation, I'll link to that from the Pipeline home page.

This should be enough to get you started writing modules and using Pipeline. For more information, you can check the online sources. A few things to remember:

  1. Write your modules so that they are reusable, for your benefit and that of others.

  2. Document your modules.

  3. Keep your style consistent. This will help the users of your modules.

  4. Reuse modules whenever possible.

James A. Hoagland (hoagland@cs.ucdavis.edu) is the author of Getcomments, HFPM, and Pipeline and is the originator of the CGI FPI idea. In real life, Jim is a Ph.D. student in the Computer Science Department at the University of California, Davis. There he does research in computer security under Professor Karl Levitt as a Research Assistant.
http://seclab.cs.ucdavis.edu/~hoagland/