Chapter 3: CGI Environment Variables
Environment variables are a series of hidden values that the web server sends
to every CGI program you run. Your program can parse them and use the data they
send. Environment variables are stored in a hash named %ENV:
Key | Value
DOCUMENT_ROOT | The root directory of your server
HTTP_COOKIE | The visitor's cookies for your site, if any are set
HTTP_HOST | The hostname the visitor requested
HTTP_REFERER | The URL of the page that linked to your program
HTTP_USER_AGENT | The visitor's browser (user agent) string
HTTPS | "on" if the program is being called through a secure server
PATH | The system path your server is running under
QUERY_STRING | The query string (see GET, below)
REMOTE_ADDR | The IP address of the visitor
REMOTE_HOST | The hostname of the visitor (if your server has reverse-name-lookups on; otherwise this is the IP address again, or it may not be set)
REMOTE_PORT | The port on the visitor's computer that the connection comes from
REMOTE_USER | The visitor's username (for .htaccess-protected pages)
REQUEST_METHOD | GET or POST
REQUEST_URI | The URL path of the requested document or CGI as the browser sent it, including any query string
SCRIPT_FILENAME | The full pathname of the current CGI
SCRIPT_NAME | The interpreted pathname of the current CGI (relative to the document root)
SERVER_ADMIN | The email address for your server's webmaster
SERVER_NAME | Your server's fully qualified domain name (e.g. www.cgi101.com)
SERVER_PORT | The port number your server is listening on
SERVER_SOFTWARE | The server software you're using (e.g. Apache 1.3)
Some servers set other environment variables as well; check your server
documentation for more information. Notice that some environment variables give
information about your server, and will never change (such as SERVER_NAME and
SERVER_ADMIN), while others give information about the visitor, and will be
different every time someone accesses the program.
Not all environment variables get set. REMOTE_USER is only set for pages in a
directory or subdirectory that's password-protected via a .htaccess file. (See
Chapter 20 to learn how to password protect a directory.) And even then,
REMOTE_USER will be the username as it appears in the .htaccess file; it's not
the person's email address. There is no reliable way to get a person's email
address, short of asking them for it with a web form.
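Because a variable like REMOTE_USER may be missing entirely, it's worth checking for it before you use it. Here is a minimal sketch; the visitor_name subroutine and the "guest" fallback are made up for this example and are not part of any library.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the .htaccess username if the server set REMOTE_USER,
# otherwise a placeholder ("guest" is an arbitrary default).
sub visitor_name {
    my $user = $ENV{REMOTE_USER};
    return (defined $user && $user ne '') ? $user : 'guest';
}

print "Hello, ", visitor_name(), "\n";
```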
You can print an environment variable the same way you would any other hash
value, by putting its name in curly braces after $ENV, as in $ENV{REMOTE_ADDR}.
Let's try printing some environment variables. Start a new file named
env.cgi:
Program 3-1: env.cgi - Print Environment Variables Program
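A minimal sketch of what env.cgi could contain, assuming CGI.pm is installed and following the same layout as Program 3-3 later in this chapter. It loops over the sorted keys of %ENV and prints each name and value. Passing them through CGI.pm's escapeHTML (so a value containing < or & can't break the page) is a choice made for this sketch, and the #! line leaves off the -T flag the other programs use so the file can also be run as perl env.cgi from the command line.

```perl
#!/usr/bin/perl -w
use strict;
use CGI qw(:standard);
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);

# Build one "KEY = value" line per environment variable, sorted by key.
# escapeHTML turns characters such as < and & into HTML entities.
sub env_lines {
    return map { escapeHTML($_) . " = " . escapeHTML($ENV{$_}) . "<br>\n" }
           sort keys %ENV;
}

print header;
print start_html("Environment");
print env_lines();
print end_html;
```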
Save the file, chmod 755 env.cgi, then try it in your web browser.
Compare the environment variables displayed with the table at the start of this chapter.
Notice which values show information about your server and CGI program, and
which ones give away information about you (such as your browser type, computer
operating system, and IP address).
Let's look at several ways to use some of this data.
Referring Page
When you click on a hyperlink on a web page, you're being referred to another
page. The web server for the receiving page keeps track of the referring page,
and you can access the URL for that page via the HTTP_REFERER environment
variable. Here's an example:
Program 3-2: refer.cgi - HTTP Referer Program
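A minimal sketch of refer.cgi, assuming CGI.pm. It reads HTTP_REFERER and prints a different message when the variable is empty or missing. The referring_message subroutine is hypothetical, introduced so the logic can be tried on its own, and the -T flag is omitted from the #! line for command-line testing.

```perl
#!/usr/bin/perl -w
use strict;
use CGI qw(:standard);
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);

# Describe where the visitor came from. HTTP_REFERER is absent when the
# URL was typed in or bookmarked, so handle that case first.
sub referring_message {
    my ($referer) = @_;
    return "You didn't come here from a link.<p>\n"
        unless defined $referer && $referer ne '';
    return "You were referred by " . escapeHTML($referer) . "<p>\n";
}

print header;
print start_html("Referring Page");
print referring_message($ENV{HTTP_REFERER});
print end_html;
```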
Remember, HTTP_REFERER only gets set when a visitor actually clicks on a link
to your page. If they type the URL directly (or use a bookmarked URL), then
HTTP_REFERER is blank or not set at all. To properly test your program, create
an HTML page with a link to refer.cgi, such as <a href="refer.cgi">Test the
referer</a> (adjust the path to match where refer.cgi lives on your server),
then click on the link.
HTTP_REFERER is not a foolproof method of determining what page is accessing
your program. It can easily be forged.
Remote Host Name and Hostname Lookups
You've probably seen web pages that greet you with a message like "Hello,
visitor from (yourhost)!", where (yourhost) is the hostname or IP address you're
currently logged in with. This is a pretty easy thing to do because your IP
address is stored in the %ENV hash.
If your web server is configured to do hostname lookups, then you can access
the visitor's actual hostname from the $ENV{REMOTE_HOST} value. Servers often
don't do hostname lookups automatically, though, because it slows down the
server. Since $ENV{REMOTE_ADDR} contains the visitor's IP address, you can
reverse-lookup the hostname from the IP address using the Socket module in Perl.
As with CGI.pm, you have to load the Socket module with a use statement:
use Socket;
(There is no need to add qw(:standard) for the Socket module.)
The Socket module offers numerous functions for socket programming (most of
which are beyond the scope of this book). We're only interested in the
reverse-IP lookup for now, though. Here's how to do the reverse lookup:
my $hostname = gethostbyaddr(inet_aton($ENV{REMOTE_ADDR}), AF_INET);
There are actually two functions being called here: gethostbyaddr
and inet_aton. gethostbyaddr is a built-in Perl function that returns the
hostname for a particular IP address. However, it requires the IP address to be
passed to it in a packed 4-byte format, along with the type of address (AF_INET,
meaning an IPv4 address, which the Socket module also provides). The Socket
module's inet_aton function does the packing for you.
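To see what the packed format looks like, the sketch below converts a dotted-quad address with inet_aton, checks that the result is four bytes long, and converts it back with inet_ntoa (also exported by Socket). It then wraps the reverse lookup in a helper that falls back to the IP address when no hostname is found; the remote_hostname name is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;   # exports inet_aton, inet_ntoa and AF_INET by default

# inet_aton turns "127.0.0.1" into a 4-byte packed string.
my $packed = inet_aton('127.0.0.1');
printf "packed length: %d bytes\n", length $packed;
print "unpacked again: ", inet_ntoa($packed), "\n";

# Reverse-lookup an IP address; return the IP itself if the lookup fails
# (for example, when the address has no reverse DNS entry).
sub remote_hostname {
    my ($ip) = @_;
    my $packed_ip = inet_aton($ip) or return $ip;
    my $name = gethostbyaddr($packed_ip, AF_INET);
    return (defined $name && $name ne '') ? $name : $ip;
}

print "127.0.0.1 is ", remote_hostname('127.0.0.1'), "\n";
```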
Let's try it in a CGI program. Start a new file called rhost.cgi, and enter
the following code:
Program 3-3: rhost.cgi - Remote Host Program
#!/usr/bin/perl -wT
use CGI qw(:standard);
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
use strict;
use Socket;
print header;
print start_html("Remote Host");
my $hostname = gethostbyaddr(inet_aton($ENV{REMOTE_ADDR}), AF_INET)
    || $ENV{REMOTE_ADDR};    # no hostname found: show the IP address instead
print "Welcome, visitor from $hostname!<p>\n";
print end_html;
Detecting Browser Type
The HTTP_USER_AGENT environment variable contains a string identifying the
browser (or "user agent") accessing the page. Unfortunately there is no standard
(yet) for user agent strings, so you will see a vast assortment of different
strings. Here's a sampling of some:
DoCoMo/1.0/P502i/c10 (Google CHTML Proxy/1.0)
Firefly/1.0 (compatible; Mozilla 4.0; MSIE 5.5)
Googlebot/2.1 (+http://www.googlebot.com/bot.html)
Mozilla/3.0 (compatible)
Mozilla/4.0 (compatible; MSIE 4.01; MSIECrawler; Windows 95)
Mozilla/4.0 (compatible; MSIE 5.0; MSN 2.5; AOL 8.0; Windows 98; DigExt)
Mozilla/4.0 (compatible; MSIE 5.0; Mac_PowerPC)
Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt; Hotbar 4.1.7.0)
Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows NT 5.1)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; DigExt)
Mozilla/4.0 WebTV/2.6 (compatible; MSIE 4.0)
Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.0.2) Gecko/20020924 AOL/7.0
Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.0.2) Gecko/20021120 Netscape/7.01
Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/85 (KHTML, like Gecko) Safari/85
Mozilla/5.0 (Windows; U; Win98; en-US; m18) Gecko/20010131 Netscape6/6.01
Mozilla/5.0 (Slurp/cat; slurp@inktomi.com; http://www.inktomi.com/slurp.html)
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5a) Gecko/20030718
Mozilla/5.0 (compatible; Konqueror/3.0-rc3; i686 Linux; 20020913)
NetNewsWire/1.0 (Mac OS X; Pro; http://ranchero.com/netnewswire/)
Opera/6.0 (Windows 98; U) [en]
Opera/7.10 (Linux 2.4.19 i686; U) [en]
Scooter/3.3
As you can see, sometimes the user agent string reveals what type of browser
and computer the visitor is using, and sometimes it doesn't. Some of these
aren't even browsers at all, like the search engine robots (Googlebot, Inktomi
and Scooter) and RSS reader (NetNewsWire). You should be careful about writing
programs (and websites) that do browser detection. It's one thing to collect
browser info for logging purposes; it's quite another to design your entire site
exclusively for a certain browser. Visitors will be annoyed if they can't access
your site because you think they have the "wrong" browser.
That said, here's an example of how to detect the browser type. This program
uses Perl's index function to see if a particular substring (such
as "MSIE") exists in the HTTP_USER_AGENT string. index takes the string to
search, followed by the substring to look for:
index($string, $substring)
It returns a numeric value indicating where in the string the substring
appears, or -1 if the substring does not appear in the string. We use an if/else
block in this program to see if the index is greater than -1.
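A short sketch of those return values, using one of the user-agent strings listed above. Positions are counted from 0, and the search is case-sensitive, so "msie" does not match "MSIE".

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; DigExt)";

# index returns the 0-based position of the first match, or -1 if none.
my $pos_msie  = index($ua, "MSIE");     # found
my $pos_opera = index($ua, "Opera");    # not in the string, so -1
my $pos_lower = index($ua, "msie");     # case-sensitive, so also -1

print "MSIE at $pos_msie, Opera at $pos_opera, msie at $pos_lower\n";
print "Looks like Internet Explorer\n" if $pos_msie > -1;
```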
Program 3-4: browser.cgi - Browser Detection Program
#!/usr/bin/perl -wT
use CGI qw(:standard);
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
use strict;
print header;
print start_html("Browser Detect");
my $ua = $ENV{HTTP_USER_AGENT} || '';
# escapeHTML keeps a crafted user-agent string from injecting HTML
print "User-agent: ", escapeHTML($ua), "<p>\n";
if (index($ua, "MSIE") > -1) {
    print "Your browser is Internet Explorer.<p>\n";
} elsif (index($ua, "Netscape") > -1) {
    print "Your browser is Netscape.<p>\n";
} elsif (index($ua, "Safari") > -1) {
    print "Your browser is Safari.<p>\n";
} elsif (index($ua, "Opera") > -1) {
    print "Your browser is Opera.<p>\n";
} elsif (index($ua, "Mozilla") > -1) {
    print "Your browser is probably Mozilla.<p>\n";
} else {
    print "I give up, I can't tell what browser you're using!<p>\n";
}
print end_html;
If you have several different browsers installed on your computer, try
testing the program with each of them.
We'll look more at if/else blocks in Chapter 5.
A Simple Form Using GET
There are two ways to send data from a web form to a CGI program: GET and
POST. These methods determine how the form data is sent to the server.
With the GET method, the input values from the form are sent as part of the
URL and saved in the QUERY_STRING environment variable. With the POST method,
data is sent as an input stream to the program. We'll cover POST in the next
chapter, but for now, let's look at GET.
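You can set the QUERY_STRING value in a number of ways. For example, you can
link directly to the env.cgi program with text after a question mark, using
URLs that end in env.cgi?test1, env.cgi?test2 and env.cgi?test3 (with the full
path to env.cgi on your own server).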
Try opening each of these in your web browser. Notice that the value for
QUERY_STRING is set to whatever appears after the question mark in the URL
itself. In the above examples, it's set to "test1", "test2", and "test3"
respectively.
You can also process simple forms using the GET method. Start a new HTML
document called envform.html, and enter this form:
Program 3-5: envform.html - Simple HTML Form Using GET
Save the form and upload it to your website. Remember you may need to change
the path to env.cgi depending on your server; if your CGI programs live in a
"cgi-bin" directory then you should use action="cgi-bin/env.cgi".
Bring up the form in your browser, then type something into the input field
and hit return. You'll notice that the value for QUERY_STRING now has the form
fieldname=value.
The string to the left of the equals sign is the name of the form field. The
string to the right is whatever you typed into the input box. Notice that any
spaces in the string you typed have been replaced with a +. Similarly, various
punctuation and other special non-alphanumeric characters have been replaced
with a %-code. This is called URL-encoding, and it happens with data
submitted through either GET or POST methods.
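To make URL-encoding concrete, here is a minimal hand-written decoder (the url_decode name is made up for this sketch). It turns each + back into a space and each %XX code back into the character with that hexadecimal value. CGI.pm, covered later in this chapter, does this decoding for you.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse URL-encoding: "+" becomes a space, "%21" becomes "!", etc.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;                               # plus signs are spaces
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;   # %XX hexadecimal codes
    return $s;
}

print url_decode("this+is+a+test%21"), "\n";
```

Decoding + before the %XX codes matters: an encoded plus sign (%2B) must end up as a literal +, not a space.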
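You can send multiple input data values with GET by giving your form more
than one input field. With two fields named fname and lname, for example, the
env.cgi program receives a query string such as fname=joe&lname=smith.
The two form values are separated by an ampersand (&). You can divide the
query string with Perl's split function:
my @values = split(/&/, $ENV{QUERY_STRING});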
split lets you break up a string into a list of strings,
splitting on a specific character. In this case, we've split on the "&"
character. This gives us an array named @values containing two elements:
("fname=joe", "lname=smith"). We can further split each string on the "="
character using a foreach loop:
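A minimal sketch of both steps, splitting first on & and then on =. The parse_query subroutine is hypothetical, and the fname=joe&lname=smith example is used as sample data when QUERY_STRING is empty. Passing a limit of 2 to split keeps any further = signs inside the value. The values are still URL-encoded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split "fname=joe&lname=smith" into [name, value] pairs.
# Values are left URL-encoded, exactly as they arrive.
sub parse_query {
    my ($query) = @_;
    my @pairs;
    my @values = split(/&/, $query);               # ("fname=joe", "lname=smith")
    foreach my $i (@values) {
        my ($fieldname, $data) = split(/=/, $i, 2);   # at most 2 pieces
        push @pairs, [ $fieldname, defined $data ? $data : '' ];
    }
    return @pairs;
}

# Sample data is used when run outside a web server.
my $query = $ENV{QUERY_STRING} || "fname=joe&lname=smith";
foreach my $pair (parse_query($query)) {
    print "$pair->[0] = $pair->[1]<br>\n";
}
```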
This prints out the field names and the data entered into each field in the
form. It does not do URL-decoding, however. A better way to parse QUERY_STRING
variables is with CGI.pm.
Using CGI.pm to Parse the Query String
If you're sending more than one value in the query string, it's best to use
CGI.pm to parse it. This requires that your query string be of the form
fieldname=value, or, for multiple values,
fieldname1=value1&fieldname2=value2&fieldname3=value3.
This will be the case if you are using a form, but if you're typing the URL
directly then you need to be sure to use a fieldname, an equals sign, then the
field value.
CGI.pm provides these values to you automatically with the param
function:
my $value = param('fieldname');
This returns the value entered in the fieldname field. It also does the
URL-decoding for you, so you get the exact string that was typed in the form
field.
You can get a list of all the fieldnames used in the form by calling
param with no arguments:
my @fieldnames = param();
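A minimal sketch of both calls. So that it runs outside a web server, it builds a CGI object from a literal query string with CGI->new and calls param as a method; in a CGI program using qw(:standard), $q->param('comment') is simply param('comment'). The field names and values are sample data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;

# Build a CGI object from a literal query string instead of the environment.
my $q = CGI->new('fname=joe&lname=smith&comment=hi%20there%21');

# One value, already URL-decoded. "scalar" asks for a single value.
my $comment = scalar $q->param('comment');

# With no arguments, param returns every field name, in form order.
my @fieldnames = $q->param;

print "comment = $comment\n";
print "fields: @fieldnames\n";
```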
param is NOT a Variable
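param is a function call, not a variable, so Perl will not interpolate it
inside a double-quoted string; writing param($p) inside the quotes prints that
literal text rather than the field's value. If you want to print the value of
param($p), you can print it by itself, or call param outside of the
double-quoted string and join the pieces together.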
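A sketch of the wrong and right ways, with a CGI object built from a literal query string standing in for a submitted form (sample data). Joining the pieces with the . operator also calls param in scalar context; recent CGI.pm releases print a warning when param('name') is called in list context, as happens when it sits directly in a print list.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;

# Stand-in for a submitted form (sample data).
my $q = CGI->new('fname=joe');
my $p = 'fname';

# Wrong: only $p is interpolated; "param(...)" stays literal text.
my $wrong = "$p = param($p)<br>\n";

# Right: call param outside the quotes. The "." operator gives it
# scalar context, so it returns a single value.
my $right = "$p = " . $q->param($p) . "<br>\n";

print $wrong, $right;
```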
You won't be able to use param('fieldname') inside a here-document either.
You may find it easier to assign the form values to individual variables:
my $fname = param('fname');
Another way would be to assign every form value to a hash, by looping over the
names returned by param() and setting $form{$p} = param($p) for each one.
You can achieve the same result by using CGI.pm's Vars function:
my %form = Vars();
The Vars function is not part of the "standard" set of CGI.pm
functions, so it must be included specifically in the use
statement:
use CGI qw(:standard Vars);
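A sketch of the three approaches side by side: individual variables, a hash filled by a loop, and Vars. It calls CGI.pm's methods on an object built from a literal query string (sample data) so it runs outside a web server; with the function interface you would load the module with use CGI qw(:standard Vars) and call param and Vars directly. The closing here-document shows why the hash helps: hash elements interpolate there, while param calls don't.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;   # function-style equivalent: use CGI qw(:standard Vars);

# Stand-in for a submitted form (sample data).
my $q = CGI->new('fname=joe&lname=smith');

# 1. One variable per field.
my $fname = $q->param('fname');
my $lname = $q->param('lname');

# 2. Copy every field into a hash by looping over the field names.
my %form;
foreach my $p ($q->param) {
    $form{$p} = $q->param($p);    # scalar assignment: one value per field
}

# 3. Let CGI.pm build the hash. In list context Vars returns a plain
#    hash; a field that appears more than once has its values joined
#    with "\0".
my %vars = $q->Vars;

# Hash elements, unlike function calls, interpolate in a here-document.
print <<END;
Hello, $form{fname} $form{lname}!
Vars agrees: $vars{fname} $vars{lname}
END
```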
Either way, after storing the field values in the %form hash,
you can refer to the individual field names by using $form{'fieldname'} .
(This will not work if you have a form with multiple fields having the same
field name.)
Let's try it now. Create a new form called getform.html:
Program 3-6: getform.html - Another HTML Form Using GET
Save and upload it to your webserver, then bring up the form in your web
browser.
Now create the CGI program called get.cgi:
Program 3-7: get.cgi Form Processing Program Using GET
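A minimal sketch of get.cgi, assuming CGI.pm is installed. Since the field names in getform.html aren't reproduced here, it simply prints every submitted field and its value. Putting the work in a field_lines subroutine that takes a CGI object, and escaping the output with escapeHTML, are choices made for this sketch; the #! line omits the usual -T so the file can also be run as perl get.cgi from the command line.

```perl
#!/usr/bin/perl -w
use strict;
use CGI;
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);

# Return one "name = value" line per form field, HTML-escaped.
sub field_lines {
    my ($q) = @_;
    my @lines;
    foreach my $p ($q->param) {
        my $value = $q->param($p);    # first value if the field repeats
        push @lines, $q->escapeHTML($p) . " = "
                   . $q->escapeHTML($value) . "<br>\n";
    }
    return @lines;
}

my $q = CGI->new;    # reads QUERY_STRING when run by the web server
print $q->header;
print $q->start_html("Get Form");
print field_lines($q);
print $q->end_html;
```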
Save and chmod 755 get.cgi. Now fill out the form in your browser and
press submit. If you encounter errors, refer back to
Chapter 1 for debugging.
Take a look at the full URL of get.cgi after you press submit. You should see
all of your form field names and the data you typed in as part of the URL. This
is one reason why GET is not the best method for handling forms; it isn't
secure.
GET is NOT Secure
GET is not a secure method of sending data. Don't use it for forms that send
password info, credit card data or other sensitive information. Since the data
is passed through as part of the URL, it'll show up in the web server's logfile
(complete with all the data). Server logfiles are often readable by other users
on the system. URL history is also saved in the browser and can be viewed by
anyone with access to the computer. Private information should always be sent
with the POST method, which we'll cover in the next chapter. (And if you're
asking visitors to send sensitive information like credit card numbers, you
should also be using a secure server in addition to the POST method.)
There may also be limits to how much data can be sent with GET. While the
HTTP protocol doesn't specify a limit to the length of a URL, certain web
browsers and/or servers may.
Despite this, the GET method is often the best choice for certain types of
applications. For example, if you have a database of articles, each with a
unique article ID, you would probably want a single article.cgi program to serve
up the articles. With the article ID passed in by the GET method (after the
question mark in the URL), the program would simply look at the query string to
figure out which article to display.
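A minimal sketch of that lookup, assuming links of the form article.cgi?ID where the ID is a plain number (both the numeric format and the article_id subroutine are assumptions for this example). Matching the query string against a strict pattern rejects anything unexpected, and capturing the digits with a regular expression is also how Perl untaints data when a program runs under -T.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Accept only an all-digit ID; anything else is rejected.
# The captured $1 is untainted when the program runs under perl -T.
sub article_id {
    my ($query) = @_;
    return undef unless defined $query;
    return $query =~ /\A(\d+)\z/ ? $1 : undef;
}

my $id = article_id($ENV{QUERY_STRING});
print defined $id ? "Showing article $id\n" : "No valid article ID given\n";
```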
We'll be revisiting that idea later in the book. For now, let's move on to
Chapter 4 where we'll see how to process forms using the POST method.