Chapter 6: Reading and Writing Data Files
As you start to program more advanced CGI applications, you'll want to store
data so you can use it later.
Maybe you have a guestbook program and want to
keep a log of the names and email addresses of visitors, or a page counter that
must update a counter file, or a program that scans a flat-file database and
draws info from it to generate a page. You can do this by reading and writing
data files (often called file I/O).
File Permissions
Most web servers run with very limited permissions; this protects the server
(and the system it's running on) from malicious attacks by users or web
visitors. On Unix systems, the web process runs under its own userid, typically
the "web" or "nobody" user. Unfortunately this means the server doesn't have
permission to create files in your directory. In order to write to a data file,
you must usually make the file (or the directory where the file will be created)
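world-writable, or at least writable by the web process userid. In Unix a file
can be made world-writable using the chmod command, for example
chmod 666 myfile.txt (where myfile.txt stands for your data file).
To make a directory world-writable, you'd use chmod 777 followed by the directory name.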
See Appendix A for a chart of the various chmod permissions.
Unfortunately, if the file is world-writable, it can be written to (or even
deleted) by other users on the system. You should be very cautious about
creating world-writable files in your web space, and you should never
create a world-writable directory there. (An attacker could use this to install
their own CGI programs there.) If you must have a world-writable directory,
either use /tmp (on Unix) or a directory outside of your web space. For example,
if your web pages are in /home/you/public_html, set up your writable files and
directories in /home/you.
A much better solution is to configure the server to run your programs with
your userid. Some examples of this are CGIwrap (platform independent) and suEXEC
(for Apache/Unix). Both of these force CGI programs on the web server to run
under the program owner's userid and permissions. Obviously if your CGI program
is running with your userid, it will be able to create, read and write files in
your directory without needing the files to be world-writable.
The Apache web server also allows the webmaster to define what user and group
the server runs under. If you have your own domain, ask your webmaster to set up
your domain to run under your own userid and group permissions.
Permissions are less of a problem if you only want to read a file. If you set
the file permissions so that it is group- and world-readable, your CGI programs
can then safely read from that file. Use caution, though; if your program can
read the file, so can the webserver, and if the file is in your webspace,
someone can type the direct URL and view the contents of the file. Be sure not
to put sensitive data in a publicly readable file.
Opening Files
Reading and writing files is done by opening a file and associating it with a
filehandle. This is done with the statement:
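In general form the statement is open(FILEHANDLE, "filename");. The following is a minimal runnable sketch that assumes a hypothetical file called mydata.txt in the current directory. The sketch creates that file first so there is something to read. It uses Perl's built-in die for errors; a safer pattern is discussed later in this section.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# General form:  open(FILEHANDLE, "filename");
# mydata.txt is a hypothetical file name used only for this sketch.

# Create the file so the read below has something to open.
open(OUTF, ">mydata.txt") or die "Can't create mydata.txt: $!";
print OUTF "first line\n";
close(OUTF);

# Open it again, this time for reading (no prefix on the filename).
open(INF, "mydata.txt") or die "Can't open mydata.txt: $!";
my $line = <INF>;    # read one line through the INF filehandle
close(INF);

print "Read: $line";
```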
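The filename may be prefixed with a >, which means to overwrite anything
that's in the file now, or with a >>, which means to append to the end of the
existing file. With either prefix, Perl tries to create the file if it doesn't
already exist. If there is no prefix, the file is opened for reading only, and a
+< prefix opens an existing file for both reading and writing. Here are some
examples: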
open(INF,"mydata.txt"); # opens mydata.txt for reading
open(OUTF,">out.txt"); # opens out.txt for overwriting
open(OUTF,">>out.txt"); # opens out.txt for appending
open(FH, "+<out.txt"); # opens existing file out.txt for reading AND writing
The filehandles in these cases are INF, OUTF and FH. You can use just about
any name for the filehandle.
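The sketch below shows how overwriting differs from appending. It uses a hypothetical scratch file named out.txt. The file is written twice with > and then once with >>, and the sketch reads the result back.

```perl
use strict;
use warnings;

# out.txt is a hypothetical scratch file for this sketch.
open(OUTF, ">out.txt") or die "Can't open out.txt: $!";    # overwrite
print OUTF "one\n";
close(OUTF);

open(OUTF, ">out.txt") or die "Can't open out.txt: $!";    # overwrite again
print OUTF "two\n";
close(OUTF);

open(OUTF, ">>out.txt") or die "Can't open out.txt: $!";   # append
print OUTF "three\n";
close(OUTF);

open(INF, "out.txt") or die "Can't open out.txt: $!";      # read
my @lines = <INF>;
close(INF);

print @lines;    # "two" then "three"; "one" was overwritten
```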
Also, a warning: your web server might do strange things with the path your
programs run under, so it's possible you'll have to use the full path to the
file (such as /home/you/public_html/somedata.txt), rather than just the
filename. This is generally not the case with the Apache web server, but some
other servers behave differently. Try opening files with just the filename first
(provided the file is in the same directory as your CGI program), and if it
doesn't work, then use the full path.
One problem with the above code is that it doesn't check the return value of
open to ensure the file was really opened. open returns nonzero upon success, or
undef (which is a false value) otherwise. The safe way to open a file is as
follows:
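Here is a sketch of that pattern, which appends to a hypothetical out.txt. To keep the example self-contained, it includes a simplified stand-in for the Chapter 4 dienice subroutine. That version is an assumption and is not the Chapter 4 code itself.

```perl
use strict;
use warnings;

# Simplified stand-in for the dienice() subroutine from Chapter 4:
# print an error message as HTML, then stop the program.
sub dienice {
    my ($errmsg) = @_;
    print "<h2>Error</h2>\n";
    print "<p>$errmsg</p>\n";
    exit;
}

# open returns a true value on success and undef on failure, so the
# "or" branch (dienice) only runs when the open fails.
open(OUTF, ">>out.txt") or &dienice("Can't open out.txt: $!");
print OUTF "another line\n";
close(OUTF);
```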
This uses the "dienice" subroutine we wrote in Chapter 4 to display an error
message and exit if the file can't be opened. You should do this for all file
opens, because if you don't, your CGI program will continue running even if the
file isn't open, and you could end up losing data. It can be quite frustrating
to realize you've had a survey running for several weeks while no data was being
saved to the output file.
The $! in the above example is a special Perl variable that holds the system
error from the failed open statement. When printed as a string, it gives a readable
message such as "No such file or directory". Printing it may help you figure out
why the open failed.
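To see $! in action, the following sketch tries to open a file that is assumed not to exist. The name no_such_file.txt is hypothetical. The sketch then prints the reason for the failure.

```perl
use strict;
use warnings;

# no_such_file.txt is a hypothetical name assumed NOT to exist.
my $reason = "";
if (open(INF, "no_such_file.txt")) {
    close(INF);
} else {
    $reason = "$!";    # copy $! right away; later system calls may change it
    print "Couldn't open no_such_file.txt: $reason\n";
}
```

On a Unix system the message is typically "No such file or directory".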
Guestbook Form with File Write
Let's try this by modifying the guestbook program you wrote in Chapter 4. The
program already sends you e-mail with the information; we're going to have it
write its data to a file as well.
First you'll need to create the output file and make it writable, because
your CGI program probably can't create new files in your directory. If you're
using Unix, log into the Unix shell, cd to the directory where your
guestbook program is located, and type the following:
touch guestbook.txt
chmod 622 guestbook.txt
The Unix touch command, in this case, creates a new, empty file called
"guestbook.txt". (If the file already exists, touch simply updates the
last-modified timestamp of the file.) The chmod 622 command makes the file
read/write for you (the owner), and write-only for everyone else.
If you don't have Unix shell access (or you aren't using a Unix system), you
should create or upload an empty file called guestbook.txt in the directory
where your guestbook.cgi program is located, then adjust the file permissions on
it using your FTP program.
Now you'll need to modify guestbook.cgi to write to the file:
Program 6-1: guestbook.cgi - Guestbook Program With File Write
#!/usr/bin/perl -wT
use CGI qw(:standard);
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
use strict;
print header;
print start_html("Results");
# first print the mail message...
$ENV{PATH} = "/usr/sbin";
open (MAIL, "|/usr/sbin/sendmail -oi -t -odq") or
    &dienice("Can't fork for sendmail: $!\n");
print MAIL "To: recipient\@cgi101.com\n";
print MAIL "From: nobody\@cgi101.com\n";
print MAIL "Subject: Form Data\n\n";
foreach my $p (param()) {
    print MAIL "$p = ", param($p), "\n";
}
close(MAIL);
# now write (append) to the file
open(OUT, ">>guestbook.txt") or &dienice("Couldn't open output file: $!");
foreach my $p (param()) {
    print OUT param($p), "|";
}
print OUT "\n";
close(OUT);
print <<EndHTML;
<h2>Thank You</h2>
<p>Thank you for writing!</p>
<p>Return to our <a href="index.html">home page</a>.</p>
EndHTML
print end_html;
sub dienice {
    my($errmsg) = @_;
    print "<h2>Error</h2>\n";
    print "<p>$errmsg</p>\n";
    print end_html;
    exit;
}
Now go back to your browser and fill out the guestbook form again. If your
CGI program runs without any errors, you should see data added to the
guestbook.txt file. The resulting file will show the submitted form data in
pipe-separated form:
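The exact columns depend on the fields in your guestbook form. This sketch assumes three hypothetical fields (name, email and comments) with made-up values. A hash stands in for param(), and the loop builds one record the same way Program 6-1 does.

```perl
use strict;
use warnings;

# Hypothetical form fields and made-up values, standing in for param().
my @fields = ("name", "email", "comments");
my %form = (
    name     => "Jane Doe",
    email    => "jane\@example.com",
    comments => "Nice site!",
);

# Same idea as the guestbook loop: each value followed by a "|",
# then a newline to end the record.
my $record = "";
foreach my $p (@fields) {
    $record .= $form{$p} . "|";
}
$record .= "\n";

print $record;    # Jane Doe|jane@example.com|Nice site!|
```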
Ideally you'll have one line of data (or record) for each form that is filled
out. This is what's called a flat-file database.
Unfortunately, if the visitor enters multiple lines in the comments field,
you'll end up with multiple lines in the data file. To keep each record on a
single line, you should replace the newline characters (\n) and remove the hard
returns (\r) in each value. Perl has powerful pattern matching and replacement
capabilities; it can match the most complex patterns in a string using regular
expressions (see Chapter 13). The basic syntax for substitution is:
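That is, $mystring =~ s/pattern/replacement/;. The short sketch below uses made-up strings. It shows which side is replaced, and it also shows the difference between replacing the first match and replacing every match with /g.

```perl
use strict;
use warnings;

# Basic form:  $mystring =~ s/pattern/replacement/;
my $mystring = "Hello world";
$mystring =~ s/world/there/;     # the pattern "world" is replaced by "there"
print "$mystring\n";             # Hello there

# Without /g only the first match is replaced; with /g, every match is.
my $once = "one two one";
my $all  = "one two one";
$once =~ s/one/1/;
$all  =~ s/one/1/g;
print "$once\n";                 # 1 two one
print "$all\n";                  # 1 two 1
```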
This command replaces the first match of "pattern" in the scalar variable
$mystring with "replacement" (add a g after the final slash to replace every
match). Notice the operator is =~ (an equals sign followed by a tilde); this is
Perl's binding operator, and it indicates that a regular expression
match or substitution is about to follow.
Here is how to replace the end-of-line characters in your guestbook program:
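The following is a self-contained sketch of the changed file-writing loop. In guestbook.cgi the values come from param($p). Here, a hash of hypothetical form values stands in for param() so that the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical form data; in guestbook.cgi, use param($p) instead.
my @names = ("name", "comments");
my %form  = (
    name     => "Jane Doe",
    comments => "Line one\r\nLine two",
);

my $record = "";
foreach my $p (@names) {
    my $value = $form{$p};
    $value =~ s/\n/ /g;    # replace each newline with a space
    $value =~ s/\r//g;     # remove hard returns (carriage returns)
    $record .= "$value|";
}
$record .= "\n";
print $record;    # Jane Doe|Line one Line two|
```

In the real program you would print each cleaned value straight to the file (print OUT "$value|";) rather than collecting it in $record.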
Go ahead and change your program, then test it again in your browser. View
the guestbook.txt file in your browser or in a text editor and observe the
results.
File Locking
CGI processes on a Unix web server can run simultaneously, and if two
programs try to open and write the same file at the same time, the data can be
garbled or the file may even be erased, and you could lose all of your data. To
prevent this, you need to lock the files you are writing to. There are two types
of file locks:
- A shared lock allows more than one program (or other process) to access
the file at the same time. A program should use a shared lock when reading
from a file.
- An exclusive lock allows only one program or process to access the file
while the lock is held. A program should use an exclusive lock when writing
to a file.
File locking is accomplished in Perl using the Fcntl module (which is part of
the standard library), and the flock function. The use statement is
like CGI.pm's:
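In other words, you add use Fcntl qw(:flock); near the top of your program. The following minimal sketch imports the lock constants and prints two of them. The actual numbers may differ between systems.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);    # exports LOCK_SH, LOCK_EX, LOCK_NB and LOCK_UN

# The constants are ordinary named values; their numbers can differ
# between operating systems, so always refer to them by name.
print "LOCK_SH=", LOCK_SH, " LOCK_EX=", LOCK_EX, "\n";
```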
The Fcntl module provides symbolic values (like abbreviations) representing
the correct lock numbers for the flock function, but you must specify
:flock in the use statement in order for Fcntl to export
those values. The values are as follows:
Constant | Meaning
LOCK_SH  | shared lock
LOCK_EX  | exclusive lock
LOCK_NB  | non-blocking lock
LOCK_UN  | unlock
These abbreviations can then be passed to flock. The flock
function takes two arguments: the filehandle and the lock type, which is
typically a number. The number may vary depending on what operating system you
are using, so it's best to use the symbolic values provided by Fcntl. A file is
locked after you open it (because the filehandle doesn't exist before you open
the file):
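Here is a sketch of that sequence: open first, then lock. It appends a made-up record to a hypothetical file named out.txt.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# out.txt is a hypothetical data file; the record is made-up sample data.
open(OUTF, ">>out.txt") or die "Can't open out.txt: $!";
# The filehandle exists now, so it can be locked. LOCK_EX waits until
# no other process holds a lock on the file.
flock(OUTF, LOCK_EX) or die "Can't lock out.txt: $!";
print OUTF "Jane Doe|jane\@example.com|Nice site!|\n";
close(OUTF);    # closing the file also releases the lock
```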
The lock will be released automatically when you close the file or when the
program finishes.
Keep in mind that file locking is only effective if all of the programs that
read and write to that file also use flock. Programs that don't will ignore the
locks held by other processes.
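Since flock may force your CGI program to wait for another process to finish
writing to a file, the file may have changed by the time your program gets the
lock. You should therefore also reset the file pointer, using the seek function,
whose general form is seek(filehandle, offset, whence):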
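The following sketch locks a hypothetical out.txt for appending and then moves the pointer to the end of the file before writing. It uses an offset of 0 and the numeric whence value 2, which means end of file.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# out.txt is a hypothetical data file.
open(OUTF, ">>out.txt") or die "Can't open out.txt: $!";
flock(OUTF, LOCK_EX) or die "Can't lock out.txt: $!";
# While we waited for the lock, another process may have added data,
# so move the pointer to the current end of the file:
# offset 0 bytes from whence 2 (end of file).
seek(OUTF, 0, 2) or die "Can't seek in out.txt: $!";
print OUTF "another record|\n";
close(OUTF);
```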
offset is the number of bytes to move the pointer, relative to
whence, which is one of the following:
whence | Meaning
0      | beginning of file
1      | current file position
2      | end of file
So seek(OUTF,0,2) repositions the pointer to the end of the
file. If you were reading the file (say, through an INF filehandle) instead of
writing to it, you'd want to do seek(INF,0,0) to reset the pointer to the
beginning of the file.
The Fcntl module also provides symbolic values for the seek pointers:
Constant | Meaning
SEEK_SET | beginning of file
SEEK_CUR | current file position
SEEK_END | end of file
To use these, add :seek to the use Fcntl statement:
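The statement becomes use Fcntl qw(:flock :seek);. The sketch below imports both groups of constants and prints the seek values, which are 0, 1 and 2.

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek);   # lock constants plus SEEK_SET, SEEK_CUR, SEEK_END

print "SEEK_SET=", SEEK_SET, " SEEK_CUR=", SEEK_CUR, " SEEK_END=", SEEK_END, "\n";
# prints SEEK_SET=0 SEEK_CUR=1 SEEK_END=2
```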
Now you can use seek(OUTF,0,SEEK_END) to reset the file pointer
to the end of the file, or seek(OUTF,0,SEEK_SET) to reset it to the
beginning of the file.
Closing Files
When you're finished writing to a file, it's best to close the file, like so:
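In its simplest form, this is just close(OUTF);. The sketch below writes to a hypothetical out.txt and also checks the return value of close. That value is false if, for example, buffered data could not be written out.

```perl
use strict;
use warnings;

# out.txt is a hypothetical data file.
open(OUTF, ">>out.txt") or die "Can't open out.txt: $!";
print OUTF "last entry\n";
# close flushes any buffered output; it returns false if that fails.
close(OUTF) or die "Error closing out.txt: $!";
```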
Files are automatically closed when your program ends. File locks are
released when the file is closed, so it is not necessary to actually unlock the
file before closing it. (In fact, releasing the lock before the file is closed
can be dangerous and cause you to lose data.)
Reading Files
There are two ways you can handle reading data from a file: you can either
read one line at a time, or read the entire file into an array. Here's an
example:
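The following sketch reads guestbook.txt in both ways. So that the example can run on its own, it first creates a small guestbook.txt with made-up records, but only if no such file exists yet. An existing guestbook is never overwritten.

```perl
use strict;
use warnings;

# Create a small guestbook.txt with made-up records, only if it's missing.
unless (-e "guestbook.txt") {
    open(OUTF, ">guestbook.txt") or die "Can't create guestbook.txt: $!";
    print OUTF "Jane|jane\@example.com|Hi!|\n";
    print OUTF "Bob|bob\@example.com|Hello|\n";
    print OUTF "Ann|ann\@example.com|Great site|\n";
    close(OUTF);
}

open(INF, "guestbook.txt") or die "Can't open guestbook.txt: $!";
my $a = <INF>;    # scalar variable: reads ONE line
my @b = <INF>;    # array: reads ALL of the remaining lines
close(INF);

print "First line: $a";
print "Remaining lines: ", scalar(@b), "\n";
```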
If you were to use this code in your program, you'd end up with the first
line of guestbook.txt being stored in $a, and the remainder of the file in array
@b (with each element of @b containing one line of data from the file). The
actual read occurs with <filehandle>; the amount of data read
depends on the type of variable you save it into.
The following section of code shows how to read the entire file into an
array, then loop through each element of the array to print out each line:
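The following sketch reads the whole file into an array and then closes the file. If guestbook.txt doesn't exist, the sketch first creates it with made-up records.

```perl
use strict;
use warnings;

# Create a small guestbook.txt with made-up records, only if it's missing.
unless (-e "guestbook.txt") {
    open(OUTF, ">guestbook.txt") or die "Can't create guestbook.txt: $!";
    print OUTF "Jane|jane\@example.com|Hi!|\n", "Bob|bob\@example.com|Hello|\n";
    close(OUTF);
}

# Read the entire file into an array, then close it right away.
open(INF, "guestbook.txt") or die "Can't open guestbook.txt: $!";
my @lines = <INF>;
close(INF);

# The file is already closed; loop through the array and print each line.
foreach my $line (@lines) {
    print $line;
}
```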
This code minimizes the amount of time the file is actually open. The
drawback is it causes your CGI program to consume as much memory as the size of
the file. Obviously for very large files that's not a good idea; if your program
consumes more memory than the machine has available, it could crash the whole
machine (or at the very least make things extremely slow). To process data from
a very large file, it's better to use a while loop to read one line
at a time:
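The following sketch reads the file one line at a time. Each line is handled and then discarded, so memory use stays small no matter how large the file is. As in the previous sketches, guestbook.txt is created with made-up records only if it is missing.

```perl
use strict;
use warnings;

# Create a small guestbook.txt with made-up records, only if it's missing.
unless (-e "guestbook.txt") {
    open(OUTF, ">guestbook.txt") or die "Can't create guestbook.txt: $!";
    print OUTF "Jane|jane\@example.com|Hi!|\n", "Bob|bob\@example.com|Hello|\n";
    close(OUTF);
}

open(INF, "guestbook.txt") or die "Can't open guestbook.txt: $!";
my $count = 0;
while (my $line = <INF>) {    # one line per pass; the loop ends at end of file
    chomp($line);             # strip the trailing newline
    $count++;
    print "Record $count: $line\n";
}
close(INF);
```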
Poll Program
Let's try another example: a web poll. You've probably seen them on various
news sites. A basic poll consists of one question and several potential answers
(as radio buttons); you pick one of the answers, vote, then see the poll results
on the next page.
Start by creating the poll HTML form. Use whatever question and answer set
you wish.
Program 6-2: poll.html - Poll HTML Form
In this example we're using abbreviations for the radio button values. Our
CGI program will translate the abbreviations appropriately.
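One way to do that translation is with a hash. The following sketch maps the four abbreviations to the titles that results.cgi displays later in this chapter. The submitted value is hypothetical. The exists check rejects anything that isn't one of the four choices; this is an added precaution and is not part of the poll programs below.

```perl
use strict;
use warnings;

# Radio button values mapped to the titles shown to visitors
# (the same four choices that results.cgi counts).
my %titles = (
    fotr => "The Fellowship of the Ring",
    ttt  => "The Two Towers",
    rotk => "Return of the King",
    none => "didn't watch them",
);

my $pick = "ttt";    # hypothetical submitted value, e.g. from param('pick')
if (exists $titles{$pick}) {
    print "You voted for: $titles{$pick}\n";
} else {
    print "Unknown choice: $pick\n";
}
```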
Now the voting CGI program will write the result to a file. Rather than
having this program analyze the results, we'll simply use a redirect to bounce
the viewer to a third program (results.cgi). That way you won't need to write
the results code twice.
Here is how the voting program (poll.cgi) should look:
Program 6-3: poll.cgi - Poll Program
#!/usr/bin/perl -wT
use CGI qw(:standard);
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
use strict;
use Fcntl qw(:flock :seek);
my $outfile = "poll.out";
# only record the vote if they actually picked something
if (param('pick')) {
    open(OUT, ">>$outfile") or &dienice("Couldn't open $outfile: $!");
    flock(OUT, LOCK_EX);        # set an exclusive lock
    seek(OUT, 0, SEEK_END);     # then seek the end of file
    print OUT param('pick'),"\n";
    close(OUT);
} else {
    # this is optional, but if they didn't vote, you might
    # want to tell them about it...
    &dienice("You didn't pick anything!");
}
# redirect to the results.cgi.
# (Change to your own URL...)
print redirect("http://cgi101.com/book/ch6/results.cgi");
sub dienice {
    my($msg) = @_;
    print header;
    print start_html("Error");
    print h2("Error");
    print $msg;
    print end_html;
    exit;
}
Finally results.cgi reads the file where the votes are stored, totals the
overall votes as well as the votes for each choice, and displays them in table
format.
Program 6-4: results.cgi - Poll Results Program
#!/usr/bin/perl -wT
use CGI qw(:standard);
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
use strict;
use Fcntl qw(:flock :seek);
my $outfile = "poll.out";
print header;
print start_html("Results");
# open the file for reading
open(IN, "$outfile") or &dienice("Couldn't open $outfile: $!");
# set a shared lock
flock(IN, LOCK_SH);
# then seek the beginning of the file
seek(IN, 0, SEEK_SET);
# declare the totals variables (start the total at zero so an
# empty poll.out doesn't print an uninitialized value)
my $total_votes = 0;
my %results;
# initialize all of the counts to zero:
foreach my $i ("fotr", "ttt", "rotk", "none") {
    $results{$i} = 0;
}
# now read the file one line at a time:
while (my $rec = <IN>) {
    chomp($rec);
    $total_votes++;
    $results{$rec}++;    # ++ on an unexpected value starts from 0 without a warning
}
close(IN);
# now display a summary:
print <<End;
<b>Which was your favorite <i>Lord of the Rings</i> film?
</b><br>
<table border=0 width=50%>
<tr>
<td>The Fellowship of the Ring</td>
<td>$results{fotr} votes</td>
</tr>
<tr>
<td>The Two Towers</td>
<td>$results{ttt} votes</td>
</tr>
<tr>
<td>Return of the King</td>
<td>$results{rotk} votes</td>
</tr>
<tr>
<td>didn't watch them</td>
<td>$results{none} votes</td>
</tr>
</table>
<p>
$total_votes votes total
</p>
End
print end_html;
sub dienice {
    my($msg) = @_;
    print h2("Error");
    print $msg;
    print end_html;
    exit;
}