[Documentation (by subject) | Documentation by type | Documentation and information]

The information in this document may be outdated or incorrect. ACS provides this documentation as a courtesy for the curious.

An instantaneous introduction to CGI scripts and HTML forms

World Wide Web (WWW) browsers display hypertext documents written in the Hypertext Markup Language (HTML). Web browsers can also display "HTML forms" that allow users to enter data. By using forms browsers can collect as well as display infomation.

When information is collected by a browser it is sent to a HyperText Transfer Protocol (HTTP) server specified in the HTML form, and that server starts a program, also specified in the HTML form, that can process the collected information. Such programs are known as "Common Gateway Interface" programs, or CGI scripts.

This document describes the Common Gateway Interface in some detail. It focuses on the ways in which a form, a client browser, a server, and the HTTP protocol work together. To understand this complex interaction, you must first understand how a client and a server work together to deliver a "normal" HTML document. This is the "canonical" Web activity; the "usual" Web function. Then you need to understand how scripts are executed in the Web environment without mediating forms. Once these two processes are clear, the forms interface is straight-forward.

For a different approach to this same topic consult its companion piece Building blocks for CGI scripts in Perl, which provides Perl code for many common CGI tasks, making script creation fairly simple.

The Canonical Browser-Server Interaction

During a "normal" document exchange a WWW client (Netscape, Mosaic, Lynx, etc.) requests a document from a WWW server and displays that document on a user display device. If that document contains a link to another document, and the user activates that link, the WWW client will then fetch and display the linked document.

The following diagram shows a WWW client running on a desktop system, Computer A, interacting with two servers: An HTTP server running on Computer B and an HTTP server running on Computer C.

Canonical File Exchange on the Web

The client running on Computer A gets a document, stored in a file named docu1.html, from the HTTP server running on Computer B. This document contains a link to another document, stored in a file named docu2.html on Computer C. The Uniform Resource Locator (URL) for that link might look something like:

http://ComputerC.domain/docu2.html

If the user activates that link, the client retrieves the file from the HTTP server running on Computer C and displays it on the monitor connected to Computer A.

The HyperText Transfer Protocol defines communication between the client and an HTTP server. The following example shows what an HTTP exchange between a Lynx client and an HTTP server running on Computer C might look like as the client fetches docu2.html.

The client sends the following text to server:

GET /docu2.html HTTP/1.0 Accept: www/source Accept: text/html Accept: image/gif User-Agent: Lynx/2.2 libwww/2.14 From: montulli@www.cc.ukans.edu * a blank line * The "GET" request indicates which file the client wants and announces that it is using HTTP version 1.0 to communicate. The client also lists the Multipurpose Internet Mail Extension (MIME) types it will accept in return, and identifies itself as a Lynx client. (The "Accept:" list has been truncated for brevity.) The client also identifies its user in the "From:" field.

Finally, the client sends a blank line indicating it has completed its request.

The server then responds by sending:

HTTP/1.0 200 OK Date: Wednesday, 02-Feb-94 23:04:12 GMT Server: NCSA/1.1 MIME-version: 1.0 Last-modified: Monday, 15-Nov-93 23:33:16 GMT Content-type: text/html Content-length: 2345 * a blank line * <HTML><HEAD><TITLE> . . . </TITLE> . . .etc. In this message the server agrees to use HTTP version 1.0 for communication and sends the status 200 indicating it has successfully processed the client's request. It then sends the date and identifies itself as an NCSA HTTP server. It also indicates it is using MIME version 1.0 to describe the information it is sending, and includes the MIME-type of the information about to be sent in the "Content-type:" header. Finally, it sends the number of characters it is going to send, followed by a blank line and the data itself.

Things to note here:

Executing "scripts"

An HTTP URL may identify a file that contains a program or script rather than an HTML document. That program may be executed when a user activates the link containing the URL.

The diagram below shows an hypertext document on Computer B with a link to a file on Computer C that holds the CGI program that will be executed if a user activates the link. This link is a "normal" http: link, but the file is stored in such a way that the HTTP server on Computer C can tell that the file contains a program that is to be run, rather than a document that is to be sent to the client as usual.

When the program runs, it prepares an HTML document on the fly, and sends that document to the client, which displays the document as it would any other HTML document.

Data Flow with an HTTP Script

Such programs are sometimes called HTTP scripts or "Common Gateway Interface" (CGI) scripts. Note that CGI scripts may be written in scripting languages (like Perl, TCL, etc.) or in any other programming language (like C, Pascal, Basic).

On some HTTP servers these CGI programs are stored in a directory called cgi-bin, and so they are also sometimes called "cgi-bin scripts."

Here is a simple AppleScript program that can be run by a MacHTTP server when it receives a request for the file containing the script. When it runs, this program builds an HTML document containing the current time and returns the document to the WWW client that requested it.

set crlf to (ASCII character 13) & (ASCII character 10) set header to "HTTP/1.0 200 OK" & crlf - & "Server: MacHTTP" & crlf set header to header & "MIME-Version: 1.0" - & crlf & "Content-type: text/html" set header to header & crlf & crlf - & "<title>Server Script</title>" set body to "<h2>The time is:</h2>" - & (current date) & "<p><p>" return header & body

The program is stored in a file named "date", in a folder called "scripts". When a user activates a link that points to this script, the Web client will generate an HTTP request that might look like:

GET /scripts/date HTTP/1.0 Accept: www/source Accept: text/html Accept: image/gif User-Agent: Lynx/2.2 libwww/2.14 From: montulli@www.cc.ukans.edu * a blank line * When the script runs it will generate an HTTP response that might look like: HTTP/1.0 200 OK" Server: MacHTTP" MIME-Version: 1.0 Content-type: text/html * blank line * <title>Server Script</title> <h2>The time is:</h2> September 15, 1994 3:15 pm <p><p> This looks just like any HTTP response from an HTTP server returning a normal HTML document. It just happens to have been generated on the fly.

Executing a Script via an HTML Form

The ability to process fill-out forms within the Web required modifications to HTML, Web clients, and Web servers (and eventually to HTTP, as well).

A set of tags was added to HTML to direct a WWW client to display a form to be filled out by a user and then forward the collected data to an HTTP server specified in the form.

Servers were modified so that they could then start the CGI program specified in the form and pass the collected data to that program, which could, in turn, prepare a response (possibly by consulting a pre-existing database) and return a WWW document to the user.

The following diagram shows the various components of the process.

Data Flow with an HTTP Form

In this diagram, the Web client running on Computer A acquires a form from some Web server running on Computer B. It displays the form, the user enters data, and the client sends the entered information to the HTTP server running on Computer C. There, the data is handed off to a CGI program which prepares a document and sends it to the client on Computer A. The client then displays that document.

HTML Tags Related to Forms Mode

The tags added to HTML to allow for HTML forms are:
<FORM>. . . </FORM>
Define an input form.
Attributes: ACTION, METHOD, ENCTYPE
<INPUT>
Define an input field.
Attributes: NAME, TYPE, VALUE, CHECKED, SIZE, MAXLENGTH
<SELECT> . . . </SELECT>
Define a selection list.
Attributes: NAME, MULTIPLE, SIZE
<OPTION>
Define a selection list selection (within a SELECT).
Attribute: SELECTED
<TEXTAREA> . . . </TEXTAREA>
Define a text input window.
Attribute: NAME, ROWS, COLS

An Example Form

This section presents a simple form and shows how it can be represented using the HTML forms facility, filled out by a user, passed to a server, and generate a reply. The form asks for information about using the World Wide Web. This is a practice form. Please help us to improve the World Wide Web by filling in the following questionaire: Your organization? _________________________________ Commercial? ( ) How many users? ____________________ Which browsers do you use? 1. Cello ( ) 2. Lynx ( ) 3. X Mosaic ( ) 4. Others ___________________________________ A contact point for your site: __________________________________________ Many thanks on behalf of the WWW central support team. Submit Reset Here is an HTML document that defines the Example Form just presented (courtesy of Dave Raggett, Hewlett-Packard, but modified to reflect the current implementation of HTML)

You can select this link to see what this form looks like from your browser

<html> <head> <title>This is a practice form.</title> </head> <body> <FORM METHOD=POST ACTION="http://www.cc.ukans.edu/cgi-bin/post-query"> Please help us to improve the World Wide Web by filling in the following questionaire: <P>Your organization? <INPUT NAME="org" TYPE=text SIZE="48"> <P>Commercial? <INPUT NAME="commerce" TYPE=checkbox> How many users? <INPUT NAME="users" TYPE=int> <P>Which browsers do you use? <OL> <LI>Cello <INPUT NAME="browsers" TYPE=checkbox VALUE="cello"> <LI>Lynx <INPUT NAME="browsers" TYPE=checkbox VALUE="lynx"> <LI>X Mosaic <INPUT NAME="browsers" TYPE=checkbox VALUE="mosaic"> <LI>Others <INPUT NAME="others" SIZE=40> </OL> A contact point for your site: <INPUT NAME="contact" SIZE="42"> <P>Many thanks on behalf of the WWW central support team. <P><INPUT TYPE=submit> <INPUT TYPE=reset> </FORM> <P>Comments: <UL> <LI> </UL> <P>Questions: <UL> <LI> </UL> </BODY></HTML>