Chapter 1: Getting Started
Our programming language of choice for this book is Perl. Perl is a simple,
easy-to-learn language, yet powerful enough to accomplish very difficult and
complex tasks. It is widely available, and is probably already installed on your
Unix server.
You don't need to compile your Perl programs; you simply write your
code, save the file, and run it (or have the web server run it). The program
itself is a simple text file; the Perl interpreter does all the work. The
advantage to this is you can move your program with little or no changes to any
machine with a Perl interpreter. The disadvantage is you won't discover any bugs
in your program until you run it.
You can write and edit your CGI programs (which are often called scripts)
either on your local machine or in the Unix shell. If you're using Unix, try
pico, a very simple, easy-to-use text editor. Just type pico
filename to create or edit a file. Type man pico for more information
and help using pico. If you're not familiar with the Unix shell, see Appendix A
for a Unix tutorial and command reference.
You can also use a text editor on your local machine and upload the finished
programs to the web server. You should either use a plain text editor, such as
Notepad (PC) or BBEdit (Mac), or a programming-specific editor that provides
some error- and syntax-checking for you.
If you use a text editor, be sure to turn off special characters such as
"smartquotes." CGI files must be ordinary text.
Once you've written your program, you'll need to upload it to the web server
(unless you're using pico and writing it on the server already). You can use any
FTP or SCP (secure copy) program to upload your files.
It is imperative that you upload your CGI programs as plain text (ASCII)
files, and not binary. If you upload your program as a binary file, it may come
across with a lot of control characters at the end of the lines, and these will
cause errors in your program. You can save yourself a lot of time and grief by
just uploading everything as text (unless you're uploading pictures, such as
GIFs or JPEGs, or other true binary data). HTML and Perl CGI programs
are not binary; they are plain text.
Once your program is uploaded to the web server, you'll want to be sure to
move it to your cgi-bin (or public_html directory, wherever your ISP has told
you to put your CGI programs). Then you'll also need to change the permissions
on the file so that it is "executable" (or runnable) by the system. The Unix
shell command for this is:
chmod 755 filename
This sets the file permissions so that you can read, write, and execute the
file, and all other users (including the webserver) can read and execute it. See
Appendix A for a full description of chmod and its options.
Most FTP and SCP programs allow you to change file permissions; if you use
your FTP client to do this, you'll want to be sure that the file is readable and
executable by everyone, and writable only by the owner (you).
One final note: Perl code is case-sensitive, as are Unix commands and
filenames. Please keep this in mind as you write your first programs, because in
Unix "perl" is not the same as "PERL".
What Is This Unix Shell?
It's a command-line interface to the Unix machine, somewhat like DOS. You
have to use a Telnet or SSH (secure shell) program to connect to the shell.
Once you're logged in, you can use shell commands to
move around, change file permissions, edit files, create directories, move
files, and much more.
If you're using a Unix system to learn CGI, you may want to stop here and
look at Appendix A to familiarize yourself with the various shell commands.
Download a Telnet or SSH program and log in to your shell account, then try out
some of the commands so you feel comfortable navigating in the shell.
Throughout the rest of this book you'll see Unix shell commands listed in
bold to set them apart from HTML and CGI code. If you're using a Windows server,
you can ignore most of the shell commands, as they don't apply.
Basics of a Perl Program
You should already be familiar with HTML, and so you know that certain things
are necessary in the structure of an HTML document, such as the <head> and
<body> tags, and that other tags like links and images have a certain allowed
syntax. Perl is very similar; it has a clearly defined syntax, and if you follow
those syntax rules, you can write Perl as easily as you do HTML.
The first line of your program should look like this:
#!/usr/bin/perl -wT
The first part of this line, #!, indicates that this is a
script. The next part, /usr/bin/perl, is the location (or path)
of the Perl interpreter. If you aren't sure where Perl lives on your system, try
typing which perl or whereis perl in the shell. If the system can
find it, it will tell you the full path name to the Perl interpreter. That path
is what you should put in the above statement. (If you're using ActivePerl on
Windows, the path should be /perl/bin/perl instead.)
The final part contains optional flags for the Perl interpreter. Warnings are
enabled by the -w flag. Special user input taint checking is
enabled by the -T flag. We'll go into taint checks and program
security later, but for now it's good to get in the habit of using both of these
flags in all of your programs.
You'll put the text of your program after the above line.
Basics of a CGI Program
A CGI is simply a program that is called by the webserver, in response to
some action by a web visitor. This might be something simple like a page
counter, or a complex form-handler. Shopping carts and e-commerce sites are
driven by CGI programs. So are ad banners; they keep track of who has seen and
clicked on an ad.
CGI programs may be written in any programming language; we're just
using Perl because it's fairly easy to learn. If you're already an expert in
some other language and are just reading to get the basics, here it is: if
you're writing a CGI that's going to generate an HTML page, you must include
this statement somewhere in the program before you print out anything else:
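The statement itself does not appear in this text. A minimal sketch of it in Perl follows, assuming the usual form of the header: the Content-type line followed by one blank line. The variable name $header is used only for illustration.

```perl
use strict;
use warnings;

# The header line, then an empty line (hence the two newlines)
# to mark the end of the headers.
my $header = "Content-type: text/html\n\n";

# This must be the first thing the program prints.
print $header;
```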
This is a content-type header that tells the receiving web browser what sort
of data it is about to receive: in this case, an HTML document. If you forget
to include it, or if you print something else before printing this header,
you'll get an "Internal Server Error" when you try to access the CGI program.
Your First CGI Program
Now let's try writing a simple CGI program. Enter the following lines into a
new file, and name it "first.cgi". Note that even though the lines appear
indented on this page, you do not have to indent them in your file. The first
line (#!/usr/bin/perl) should start in column 1. The subsequent lines can start
in any column.
Program 1-1: first.cgi - Hello World Program
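The listing for Program 1-1 is missing from this text. The following is a minimal sketch consistent with the description, assuming the -wT flags and the content-type header introduced earlier:

```perl
#!/usr/bin/perl -wT
# first.cgi: print the content-type header, then the page text.
print "Content-type: text/html\n\n";
print "Hello, world!\n";
```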
Save (or upload) the file into your web directory, then chmod 755 first.cgi
to change the file permissions (or use your FTP program to change them). You
will have to do this every time you create a new program; however, if you're
editing an existing program, the permissions will remain the same and shouldn't
need to be changed again.
Now go to your web browser and type the direct URL for your new CGI. For
example:
Your actual URL will depend on your ISP. If you have an account on cgi101,
your URL is:
You should see a web page with "Hello, world!" on it. (If you get a "Page
Not Found" error, you have the URL wrong. If you get an "Internal Server Error",
see the "Debugging Your Programs" section at the end of this chapter.)
Let's try another example. Start a new file (or if you prefer, edit your
existing first.cgi) and add some additional print statements. It's up to your
program to print out all of the HTML you want to display in the visitor's
browser, so you'll have to include print statements for every HTML tag:
Program 1-2: second.cgi - Hello World Program 2
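The listing for Program 1-2 is also missing. A sketch of what it might look like, assuming a page title of "Hello World" and one print statement per line of HTML:

```perl
#!/usr/bin/perl -wT
# second.cgi: every HTML tag is written out with a print statement.
print "Content-type: text/html\n\n";
print "<html><head><title>Hello World</title></head>\n";
print "<body>\n";
print "<h2>Hello, world!</h2>\n";
print "</body></html>\n";
```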
Save this file, adjust the file permissions if necessary, and view it in your
web browser. This time you should see "Hello, world!" displayed in an H2-size
HTML header.
Now not only have you learned to write your first CGI program, you've also
learned your first Perl statement, the print function:
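The original example is missing here. As a rough sketch, print can be given a literal string, a variable, or a mix of both; the variable names below are made up for illustration.

```perl
use strict;
use warnings;

my $name     = "world";             # hypothetical variable
my $greeting = "Hello, $name!\n";   # variables interpolate inside double quotes

print "Hello, world!\n";            # a literal string
print $greeting;                    # a variable
print "Hello, ", $name, "!\n";      # a list of strings and variables
```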
This function will write out any string, variable, or combinations thereof to
the current output channel. In the case of your CGI program, the current output
is being printed to the visitor's browser.
The \n you printed at the end of each string is the newline
character. Newlines are not required, but they will make your program's output
easier to read.
You can write multiple lines of text without using multiple print statements
by using the here-document syntax:
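The syntax example is missing from this text. A minimal sketch, assuming an arbitrary marker named EndMarker:

```perl
use strict;
use warnings;

# Everything between the opening marker and the closing marker
# is printed exactly as written, line breaks included.
print <<EndMarker;
line 1
line 2
line 3
EndMarker
```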
You can use any word or phrase for the end marker (you'll see an example next
where we use "EndOfHTML" as the marker); just be sure that the closing marker
matches the opening marker exactly (it is case-sensitive), and also that the
closing marker is on a line by itself, with no spaces before or after the
marker.
Let's try it in a CGI program:
Program 1-3: third.cgi - Hello World Program, with here-doc
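The listing for Program 1-3 is missing. The sketch below assumes the same page as Program 1-2, printed with a single here-document that uses the "EndOfHTML" marker mentioned above:

```perl
#!/usr/bin/perl -wT
# third.cgi: the header is printed normally, the HTML with one here-document.
print "Content-type: text/html\n\n";

print <<EndOfHTML;
<html><head><title>Hello World</title></head>
<body>
<h2>Hello, world!</h2>
</body></html>
EndOfHTML
```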
When a closing here-document marker is on the last line of the file, be sure
you have a line break after the marker. If the end-of-file mark is on the same
line as the here-doc marker, you'll get an error when you run your program.
The CGI.pm Module
Perl offers a powerful feature to programmers: add-on modules. These are
collections of pre-written code that you can use to do all kinds of tasks. You
can save yourself the time and trouble of reinventing the wheel by using these
modules.
Some modules are included as part of the Perl distribution; these are called
standard library modules and don't have to be installed. If you have
Perl, you already have the standard library modules.
There are also many other modules available that are not part of the standard
library. These are typically listed on the Comprehensive Perl Archive Network
(CPAN), which you can search on the web at metacpan.org.
The CGI.pm module was part of the standard library from Perl version 5.004
through 5.20. It was removed from the core distribution in Perl 5.22, so on a
current Perl you may need to install it from CPAN. CGI.pm has a number of useful
functions and features for writing CGI programs. We'll be using it frequently
throughout the book.
Let's see how to use a module in your CGI program. First you have to actually
include the module via the use command. This goes after the #!/usr/bin/perl line
and before any other code:
use CGI qw(:standard);
Note we're not doing use CGI.pm but rather use CGI.
The .pm is implied in the use statement. The qw(:standard) part of
this line indicates that we're importing the "standard" set of functions from
CGI.pm.
Now you can call the various module functions by typing the function name
followed by any arguments:
If you aren't passing any arguments to the function, you can omit the
parentheses.
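As a small sketch of both forms, assuming CGI.pm is installed and using two of its functions that are introduced just below (the variable names are for illustration only):

```perl
use strict;
use warnings;
use CGI qw(:standard);

# With an argument: the function name, then the argument in parentheses.
my $top = start_html("Hello World");

# Without arguments the parentheses are optional; these two calls are equivalent.
my $with_parens    = header();
my $without_parens = header;

print $without_parens, $top;
```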
A function is a piece of code that performs a specific task; it may
also be called a subroutine or a method. Functions may accept
optional arguments (also called parameters), which are values
(strings, numbers, and other variables) passed into the function for it to use.
The CGI.pm module has many functions; for now we'll start by using these three:
header, start_html, and end_html.
The header function prints out the "Content-type" header. With
no arguments, the type is assumed to be "text/html". start_html
prints out the <html>, <head>, <title> and <body> tags. It also accepts optional
arguments. If you call start_html with only a single string argument, it's
assumed to be the page title. For example:
print start_html("Hello World");
will print out the following*:
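The output listing is missing from this text, and the exact markup depends on the installed CGI.pm version (it typically includes a DOCTYPE declaration ahead of the <html> tag). One way to see it for yourself, assuming CGI.pm is installed, is to run a short script from the shell:

```perl
use strict;
use warnings;
use CGI qw(:standard);

# Capture the generated markup in a variable so it can be inspected.
my $html = start_html("Hello World");

# Print it to the terminal; the single string argument becomes the <title>.
print $html, "\n";
```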
You can also set the page colors and background image with start_html:
print start_html(-title=>"Hello World",
-bgcolor=>"#cccccc", -text=>"#999999",
-background=>"bgimage.jpg");
Notice that with multiple arguments, you have to specify the name of each
argument with -title=>, -bgcolor=>, etc. This example generates the
same HTML as above, except that the body tag also carries the page colors and
background image.
The end_html function prints out the closing HTML tags, </body> and </html>.
So, as you can see, using CGI.pm in your CGI programs will save you some
typing. (It also has more important uses, which we'll get into later on.)
The Other Way To Use CGI.pm
or "There's More Than One Way To Do Things In Perl"
As you learn Perl you'll discover there are often many different ways
to accomplish the same task. CGI.pm exemplifies this; it can be used in
two different ways. The first way you've learned already:
function-oriented style. Here you must specify qw(:standard)
in the use line, but thereafter you can just call the
functions directly:
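The example is missing here. A minimal sketch of the function-oriented style, assuming CGI.pm is installed:

```perl
use strict;
use warnings;
use CGI qw(:standard);   # import the standard set of functions

# The imported functions are called directly by name.
print header;
print start_html("Hello World");
print end_html;
```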
The other way is object-oriented style, where you create an object
(or instance of the module) and use that to call the various functions
of CGI.pm:
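The example is missing here as well. A minimal sketch of the object-oriented style, assuming CGI.pm is installed; the variable name $cgi is arbitrary:

```perl
use strict;
use warnings;
use CGI;                 # no qw(:standard) import is needed for this style

# Create the object, then call each function as a method on it.
my $cgi = CGI->new;
print $cgi->header;
print $cgi->start_html("Hello World");
print $cgi->end_html;
```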
Which style you use is up to you. The examples in this book use the
function-oriented style, but feel free to use whichever style you're
comfortable with.
Let's try using CGI.pm in an actual program now. Start a new file and enter
these lines:
Program 1-4: fourth.cgi - Hello World Program, using CGI.pm
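The listing for Program 1-4 is missing here. Judging from Program 1-5, which is described as the same Hello World program with comments added, it presumably looks like this:

```perl
#!/usr/bin/perl -wT
use CGI qw(:standard);

print header;
print start_html("Hello World");
print "<h2>Hello, world!</h2>\n";
print end_html;
```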
Be sure to change the file permissions (chmod 755 fourth.cgi), then
test it out in your browser.
CGI.pm also has a number of functions that serve as HTML shortcuts. For
instance:
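The example line is missing here. A plausible sketch, assuming the h2 shortcut that CGI.pm exports in its standard function set:

```perl
use strict;
use warnings;
use CGI qw(:standard);

# h2() wraps its argument in <h2> and </h2> tags and returns the string.
print h2("Hello, world!"), "\n";
```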
Will print an H2-sized header tag.
Documenting Your Programs
Documentation can be embedded in a program using comments. A comment in Perl
is preceded by the # sign; anything appearing after the #
is a comment:
Program 1-5: fifth.cgi - Hello World Program, with Comments
#!/usr/bin/perl -wT
use CGI qw(:standard);
# This is a comment
# So is this
#
# Comments are useful for telling the reader
# what's happening. This is important if you
# write code that someone else will have to
# maintain later.
print header; # here's a comment. print the header
print start_html("Hello World");
print "<h2>Hello, world!</h2>\n";
print end_html; # print the footer
# the end.
You'll notice the first line (#!/usr/bin/perl) is a comment, but
it's a special kind of comment. On Unix, it indicates what program to use to run
the rest of the script.
There are several situations in Perl where a #-sign is not treated as a
comment. These depend on specific syntax, and we'll look at them later in the
book.
Any line that starts with a #-sign is a comment, and you can also put
comments at the end of a line of Perl code (as we did in the above example on
the header and end_html lines). Even though comments will only be seen by
someone reading the source code of your program, it's a good idea to add
comments to your code explaining what's going on. Well-documented programs are
much easier to understand and maintain than programs with no documentation.
Debugging Your Programs
A number of problems can happen with your CGI programs, and unfortunately the
default response of the webserver when it encounters an error (the "Internal
Server Error") is not very useful for figuring out what happened.
If you see the code for the actual Perl program instead of the desired output
page from your program, this probably means that your web server isn't properly
configured to run CGI programs. You'll need to ask your webmaster how to run CGI
programs on your server. And if you ARE the webmaster, check your server's
documentation to see how to enable CGI programs.
If you get an Internal Server Error, there's either a permissions problem
with the file (did you remember to chmod 755 the file?) or a bug in your
program. A good first step in debugging is to use the CGI::Carp module in your
program:
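The line itself is missing from this text. A sketch of how it is commonly written, assuming the fatalsToBrowser and warningsToBrowser options of CGI::Carp and reusing the Hello World program from earlier:

```perl
#!/usr/bin/perl -wT
use CGI qw(:standard);
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);

print header;
print start_html("Hello World");
# Warnings are held back until the header has been sent; this call
# releases them into the page as HTML comments.
warningsToBrowser(1);
print "<h2>Hello, world!</h2>\n";
print end_html;
```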
This causes all warnings and fatal error messages to be echoed in your
browser window. You'll want to remove this line after you're finished developing
and debugging your programs, because Carp errors can give away important
security info to potential hackers.
If you're using the Carp module and are still seeing the "Internal Server
Error", you can further test your program from the command line in the Unix
shell. This will check the syntax of your program without actually running it:
perl -cwT fourth.cgi
If there are syntax errors, Perl will report them:
% perl -cwT fourth.cgi
syntax error at fourth.cgi line 5, near "print"
fourth.cgi had compilation errors.
This tells you there's a problem on or around line 5; make sure you didn't
forget a closing semicolon on the previous line, and check for any other typos.
Also be sure you saved and uploaded the file as text; hidden control characters
or smartquotes can cause syntax errors, too.
Another way to get more info about the error is to look at the webserver log
files. Usually this will show you the same information that the CGI::Carp module
does, but it's good to know where the server logs are located, and how to look
at them. Some usual locations are /usr/local/etc/httpd/logs/error_log, or /var/log/httpd/error_log.
Ask your ISP if you aren't sure of the location. In the Unix shell, you can use
the tail command to view the end of the log file:
tail /var/log/apache/error_log
The last line of the file should be your error message (although if you're
using a shared webserver like an ISP, there will be other users' errors in the
file as well). Here are some example errors from the error log:
[Fri Jan 16 02:06:10 2004] access to /home/book/ch1/test.cgi failed for
205.188.198.46, reason: malformed header from script.
In string, @yahoo now must be written as \@yahoo at /home/book/ch1/test.cgi line
331, near "@yahoo"
Execution of /home/book/ch1/test.cgi aborted due to compilation errors.
[Fri Jan 16 10:04:31 2004] access to /home/book/ch1/test.cgi failed for
204.87.75.235, reason: Premature end of script headers
A "malformed header" or "premature end of script headers" can either mean
that you printed something before printing the "Content-type: text/html" line,
or your program died. An error usually appears in the log indicating where the
program died, as well.