
CGI Column

Reuven begins his tutorials on CGI with a gentle introduction to the GET method.

by Reuven Lerner

Hello out there in Webland, and welcome to CGI Programming. Every month, I hope to present you with new information on the world of CGI--from discussions of the CGI specification, to security issues, to reviews of new programs and books that make CGI program writing easier, to examples of clever hacks that you might want to incorporate in your programs.

Because this is the first installment of CGI Programming, I thought it would be useful to discuss the nature and origins of CGI, as well as to provide some skeleton programs that demonstrate some basic principles.

In the Beginning

The Web began as a very simple tool. Clients (aka "browsers") would connect to an HTTP server (aka "web server") and request documents by issuing a GET request. A GET request contained two essential elements: the word "GET", followed by the pathname of the desired document. Thus, asking your browser to retrieve the document at URL http://www.foo.com/ would effectively instruct the browser to open an HTTP (Hypertext Transfer Protocol) connection to www.foo.com. The browser would then try to retrieve the document named "/" by sending the following request to the server:

GET /

To retrieve the slightly more complicated URL http://www.foo.com/library/index.html, the browser would again connect to www.foo.com, but would send this request instead:

GET /library/index.html
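The translation from URL to request is mechanical enough to sketch in a few lines of Perl. The subroutine name url_to_request is hypothetical, the sketch assumes Perl 5, and it handles only simple http:// URLs with no port number, query, or fragment. It shows the idea, not what any particular browser does internally.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a simple http:// URL into the host to
# contact and the GET request to send once connected.
# Assumes no port number, user name, query, or fragment in the URL.
sub url_to_request {
    my ($url) = @_;
    my ($host, $path) = $url =~ m{^http://([^/]+)(/.*)?$}
        or return;              # not a URL this sketch understands
    $path = "/" unless defined $path;
    return ($host, "GET $path");
}

foreach my $url ("http://www.foo.com/",
                 "http://www.foo.com/library/index.html") {
    my ($host, $request) = url_to_request($url);
    print "Connect to $host, then send: $request\n";
}
```

Running it prints one line per URL, pairing www.foo.com with "GET /" and with "GET /library/index.html", matching the two requests shown above.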

This simple, elegant technique worked very well for a while, but it had one major drawback: the information users received was predetermined, with no way for it to vary. Alice's request for /library/index.html at midnight would produce the same HTML as Bob's request for /library/index.html at noon, until the HTML file itself was modified.

It was at this point that the notion of back-end web scripting was born. With the advent of scripting, a server's reply to a browser request could depend on the user's input, not just on the name of the file. In the case of a plain HTML file, the file itself would be sent to the browser. But in the case of a script, the file would be treated as a program, and the program's output would be sent to the browser. (The difference is subtle but significant: the text of a program relates to its output much as architectural plans relate to the house built from them.)

Note that this scheme required no changes on the browser side, since browsers were still receiving HTML. Servers, however, now had to distinguish between requests for files and requests for program execution, both to avoid giving out the source code of web scripts and to avoid trying to execute HTML files.

Web scripting proved to be so powerful and popular that several HTTP servers implemented it. But as often happens in a fast-moving technology, each HTTP server implemented scripting in a slightly different way, so scripts written for one server wouldn't necessarily work on any other.

The Birth of CGI

CGI, the Common Gateway Interface, was designed to solve this problem, and thus to make it easier to write portable web scripts. Once CGI was adopted by the major HTTP servers, programmers could write scripts (aka "gateways") that would work with any server supporting CGI, rather than with just one or two of them.

Perhaps the most important thing to remember about CGI is that it is a specification, rather than a programming language. CGI programs can be written in any language, from C to the Bourne shell. Perl, a freely distributable language written by Larry Wall, has become one of the most popular languages for developing CGI programs because of its powerful string- and file-handling capabilities, but it is by no means the only available option.

The CGI specification defines a set of inputs a web program can expect to receive, as well as the format in which those inputs will arrive. Once the program has processed those inputs, it produces output, usually HTML preceded by a short header announcing the type of content, which the server then sends to the user's browser. As far as the browser is concerned, there is no difference between HTML produced by a CGI program and HTML stored in a file: so long as the HTML is valid, the browser will parse and display it.
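Most of those inputs arrive as environment variables set by the server before it starts the program. As a minimal sketch (assuming Perl 5), the following CGI program reports a handful of the variables the CGI specification defines. Which of them your server actually sets, and with what values, will vary; the list below is only a sample.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# A few of the environment variables defined by the CGI specification.
my @cgi_vars = qw(SERVER_SOFTWARE SERVER_NAME GATEWAY_INTERFACE
                  REQUEST_METHOD SCRIPT_NAME QUERY_STRING REMOTE_ADDR);

# Escape the characters that are special in HTML, so that odd values
# cannot break the page we generate.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Build the whole reply: the Content-type header, a blank line, then HTML.
sub env_report {
    my $html = "Content-type: text/html\n\n";
    $html .= "<HTML><Head><Title>CGI inputs</Title></Head>\n<Body>\n<UL>\n";
    foreach my $name (@cgi_vars) {
        my $value = defined $ENV{$name} ? $ENV{$name} : "(not set)";
        $html .= "<LI>$name = " . html_escape($value) . "\n";
    }
    $html .= "</UL>\n</Body></HTML>\n";
    return $html;
}

print env_report();
```

Run from the command line, most entries will read "(not set)"; installed in a CGI directory and invoked from a browser, the same program shows the values your server passes in.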

Input to CGI Programs

The simplest way to invoke a CGI program is with the GET request we saw earlier. Since the distinction between HTML files and CGI programs is made entirely on the server side, browsers use the same syntax for both. Given a CGI program /cgi-bin/myprog on the server www.foo.com, a user would run that program by going to the URL http://www.foo.com/cgi-bin/myprog. The browser would open an HTTP connection to www.foo.com and issue the request

GET /cgi-bin/myprog

The server, realizing that myprog rests in cgi-bin (a directory that has been defined to contain programs, rather than HTML files), executes myprog and sends its output (which will presumably be in HTML) back to the browser. The browser doesn't know that it has just invoked a program; as far as it is concerned, the HTML might have come from a file on the server machine.
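The server's decision can be sketched as a simple check on the beginning of the requested path. This is not how any particular server is written: the subroutine name dispatch and the directory names below are hypothetical, and real servers read such mappings from their configuration files. The sketch assumes Perl 5.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical server layout: where HTML files and CGI programs live.
my $document_root = "/usr/local/www/htdocs";
my $cgi_prefix    = "/cgi-bin/";
my $cgi_directory = "/usr/local/www/cgi-bin/";

# Decide what to do with a requested path: run it as a program if it
# lies under the CGI prefix, otherwise send the file as it is.
# A real server must also refuse paths such as /cgi-bin/../etc/passwd;
# this sketch does not.
sub dispatch {
    my ($path) = @_;
    if (index($path, $cgi_prefix) == 0) {
        my $program = substr($path, length $cgi_prefix);
        return ("execute", $cgi_directory . $program);
    }
    return ("send", $document_root . $path);
}

foreach my $path ("/library/index.html", "/cgi-bin/myprog") {
    my ($action, $file) = dispatch($path);
    print "$path: $action $file\n";
}
```

The browser sees none of this; it receives HTML either way.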

There is, however, one problem with the invocation of myprog in the above example, namely the lack of user input. The output produced by myprog might vary according to the time of day, the number of users who have accessed a site, or the contents of an external file, but it cannot take any input from the user, which is often far more useful. The CGI specification's solution was to make the question mark (the "?" character) a special character, separating the name of the program from its arguments.

Thus if myprog were to take a single argument (say, a user name), we would specify it with the URL http://www.foo.com/cgi-bin/myprog?reuven, which would result in the following request being sent to www.foo.com:

GET /cgi-bin/myprog?reuven

The server would then invoke the program /cgi-bin/myprog, passing it the argument "reuven" in the environment variable QUERY_STRING. Perl makes environment variables available in the associative array %ENV, so the user's input is available as $ENV{"QUERY_STRING"}. Here is a short CGI program in Perl that says "hello" to the person named in the argument:

#!/usr/local/bin/perl
# Tell the browser that we are sending it HTML
print "Content-type: text/html\n\n";
# Print a nice HTML header
print "<HTML>\n";
print "<Head><Title>Hi there!</Title></Head>\n\n";
print "<Body>\n";
# Now print the message
print "<H1>Hi there!</H1>\n";
print "<P>Hello, $ENV{QUERY_STRING}!</P>\n";
# Close the HTML, and exit
print "</Body></HTML>\n";
exit;

If the above program were available as "/cgi-bin/myprog" on www.foo.com, we would get a greeting for "Barry" by going to URL http://www.foo.com/cgi-bin/myprog?Barry, which would translate into the GET request:

GET /cgi-bin/myprog?Barry

This request would in turn invoke myprog with an argument of "Barry", available via QUERY_STRING. Similarly, we could get a greeting for "Barbara" by going to URL http://www.foo.com/cgi-bin/myprog?Barbara, which would translate into the GET request:

GET /cgi-bin/myprog?Barbara

Again, this request would invoke myprog with an argument of "Barbara", available via QUERY_STRING.
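On the server side, the work just described amounts to splitting the request at the first question mark and placing everything after it in QUERY_STRING before starting the program, which inherits the server's environment. The following sketch assumes Perl 5.8 or later on Unix (for the list form of a piped open). The name split_request is hypothetical, and a one-line Perl program stands in for myprog so that the example is self-contained.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: split "GET /cgi-bin/myprog?Barry" into the
# program path and the query string (empty if there is no "?").
sub split_request {
    my ($request) = @_;
    my ($method, $target) = split ' ', $request, 2;
    my ($program, $query) = split /\?/, $target, 2;
    $query = "" unless defined $query;
    return ($program, $query);
}

# Run a stand-in "CGI program" with QUERY_STRING set, and collect its
# output the way a server would before sending it to the browser.
sub run_with_query {
    my ($query) = @_;
    local $ENV{QUERY_STRING} = $query;      # inherited by the child
    open(my $child, "-|", $^X, "-e", 'print "Hello, $ENV{QUERY_STRING}!\n"')
        or die "Cannot start program: $!";
    local $/;                               # read all output at once
    my $output = <$child>;
    close($child);
    return $output;
}

my ($program, $query) = split_request("GET /cgi-bin/myprog?Barry");
print "Program: $program\n";
print "QUERY_STRING: $query\n";
print run_with_query($query);
```

Because the argument travels in the environment rather than on a command line, the program needs no special parsing to find it; it simply reads $ENV{"QUERY_STRING"}.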

Here is another short Perl CGI program that demonstrates the use of QUERY_STRING. This program expects to get a single number as its input, and tells the user whether that number is even or odd.

#!/usr/local/bin/perl
# Tell the browser that we are sending it HTML
print "Content-type: text/html\n\n";
# Print a nice HTML header
print "<HTML>\n";
print "<Head><Title>Even or odd?</Title></Head>\n\n";
print "<Body>\n";
# Grab the user's argument and place it in a better-named variable
$number = $ENV{"QUERY_STRING"};
# Truncate any non-integer portion of $number
$number = int($number);
# Did the user give us zero. If so, produce an error message
if ($number == 0)
{
print "<H1>No number!</H1>\n";
print "<P>You did not enter a number as the argument to this\n";
print "program. Please try this URL again, using a number as\n";
print "an argument.</P>\n";
print "</Body></HTML>\n";
exit;
}
# Now determine if the number was even or odd by getting its modulus
if ($number % 2)   {
print "<H1>Odd!</H1>\n";
print "<P>Your number, $number, is odd.</P>\n";
print "</Body></HTML>\n";
exit;
}
else
{
print "<H1>Even!</H1>\n";
print "<P>Your number, $number, is even.</P>\n";
print "</Body></HTML>\n";
exit;
}

Installing These Programs

The above example programs should work on any computer running Unix (or a Unix-compatible operating system, such as Linux) that has the Perl language (version 4 or above) installed as /usr/local/bin/perl. If Perl is installed on your system in some other place, change the first line to reflect that difference.

In order to run the program, you will need access to one of the CGI directories on your system. If you are your system's Webmaster, adding CGI directories should be a simple matter of modifying the configuration file that came with your HTTP server--the NCSA server uses "ScriptAlias" directives in the srm.conf file, while the CERN server uses "Exec" directives in the httpd.conf file.

Once the program is placed in a CGI directory, make sure that it is readable and executable by all users on your system. This is important because most web servers run (or should run) with as little access as possible, which generally means that they use the "nobody" user ID. (CGI programs have the potential to be quite destructive, and should thus be run with the minimum possible access.) You should be able to make the program readable and executable by everyone with the command

chmod a+rx myprog

assuming, of course, that you have called the program myprog. I don't have space to describe all of the problems that you might encounter when first setting up CGI programs, so let me simply suggest that if you have trouble running myprog, you check (a) that you can execute the program from the command line, (b) that the program resides in a CGI-enabled directory, (c) that the program is readable and executable by the user ID under which CGI programs are run, and (d) that the "Content-type" header is printed correctly, including the two "\n" characters that follow it. Troubleshooting CGI programs will be addressed in a future issue of CGI Programming.
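Checks (c) and (d) lend themselves to a little automation. The following Perl 5 sketch is a hypothetical helper (call it cgicheck), not a standard tool. It looks at a program's permission bits to see whether all users may read and execute it, and checks that the program's output starts with a Content-type line followed by a blank line. It cannot know which user ID your server really uses, so it tests only the "everyone else" bits that chmod a+rx sets, on the assumption that the server's user (often "nobody") is neither the file's owner nor in its group.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Return a list of problems with the permissions on $file; an empty
# list means all other users may read and execute it.
sub permission_problems {
    my ($file) = @_;
    my @info = stat($file) or return ("$file does not exist");
    my $mode = $info[2];
    my @problems;
    push @problems, "$file is not readable by all users"   unless $mode & 0004;
    push @problems, "$file is not executable by all users" unless $mode & 0001;
    return @problems;
}

# Return true if $output begins with a Content-type header followed by
# the blank line (the second "\n") that ends the header.
sub header_ok {
    my ($output) = @_;
    return $output =~ /^Content-type: [^\n]+\n\n/i ? 1 : 0;
}

# Usage: perl cgicheck ./myprog
if (@ARGV) {
    my $program = $ARGV[0];
    my @problems = permission_problems($program);
    if (!@problems) {
        local $ENV{QUERY_STRING} = "";
        my $output = `$program`;    # run it and capture what it prints
        push @problems, "output lacks a Content-type header and blank line"
            unless defined $output && header_ok($output);
    }
    print @problems ? join("\n", @problems) . "\n" : "No problems found.\n";
}
```

Passing a path such as ./myprog, rather than a bare name, keeps the shell from searching your PATH for a different program with the same name.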

Next Time

Next month, we will look at how to parse multiple arguments in a GET request, as well as the ways in which the CGI specification allows us to pass parameters containing spaces, question marks, and other odd characters that could get misinterpreted. Following that, we will look at a more advanced method of passing arguments to CGI programs known as "POST". And hey, if you have any comments or questions about this column, or some ideas for future installments, please feel free to suggest them via e-mail to reuven@the-tech.mit.edu.


Reuven M. Lerner has been playing with the Web since early 1993, when it seemed like more of a fun toy than the world's Next Great Medium. He currently works from his apartment in Haifa, Israel as an independent Internet and Web consultant. When not working on the Web or informally volunteering with school-age children, he enjoys reading (on just about any subject, but especially computers, politics, and philosophy, separately and together), cooking, solving crossword puzzles, and hiking. You can reach him at reuven@the-tech.mit.edu or reuven@netvision.net.il.