Chapter 1 Client-Side Scripting

Chapter 1 Browser Identification

CONTENTS

Server-Side Screening
- High Bandwidth/Low Bandwidth
- CGI Wrapping
Client-Side Screening
The Completed GateKeeper
From Here…

It's no surprise that, with new technology opening up the Web, the multitude of available browsers is starting to show the strain: each one supports a different level of Web interaction. The vast majority of Web surfers use Netscape Navigator or Internet Explorer, but not everyone uses (or chooses) a browser that supports everything. The end result of this explosive growth is an increased load on Web designers. Creating a site that both takes advantage of new technology and works well in older browsers is becoming increasingly difficult.

Fortunately, there is a collection of scripting tricks that can make your life a bit easier.

Server-Side Screening

Before the days of languages like VBScript and JavaScript, the only ways you could steer a surfer to the pages designed for them were:

Present a different link on your home page for each browser type, and let the user select the correct set of pages.

Use a CGI script to identify the user's browser and force-feed the proper pages.

High Bandwidth/Low Bandwidth

The first technique is still in use today on many sites (see fig. 1.1). Three valuable uses of it are:

Provide the option to switch to a text-only version of the site for speed surfing.
Provide "scripted" versus "non-scripted" access to your site, to make the best use of advanced browsers while at the same time allowing older browsers to surf.
Offer heavily graphic (for users with hardwired or T1 connections) and lightly graphic (for slower modems) paths through your site.

Figure 1.1: Using the "High bandwidth/Low bandwidth" technique lets the user decide whether or not to view the graphically intense version of your site.

This "high-speed/low-speed" option is necessary because there is no way to identify a user's access speed. Even once the world is wired with optical connections, you should still consider offering alternative routes through your site to accommodate "speed surfers" that use text-only browsers or "script surfers" that may be using any one of several different scripting languages.

The downside of using the first technique is that most people are overly curious. Invariably, someone clicks the wrong link and wanders through your site with pages not optimized for his or her browser. He or she perhaps encounters scripting errors or poorly presented pages and draws a negative conclusion about your ability as a Webmaster. He or she might even e-mail you to complain (yes, such people do exist!).

CGI Wrapping

The second technique requires access to the CGI level of your Web server and for you to maintain one page tree (directory) for each supported browser configuration. In essence, you set up your home page as a browser-generic gatekeeper that is accessible even to old browsers. Within the page, any defined links are written to point to a CGI script instead of the HTML file directly:

<A HREF="/cgi-bin/start.cgi">Enter the site</A>

Clicking a link forces the server to launch the CGI script, which then checks the HTTP_USER_AGENT environment variable to determine the browser type. Listing 1.1 shows a Perl fragment that does this.

Listing 1.1 Using CGI to Identify the Browser Type

#!/usr/local/bin/perl

$userAgent = $ENV{'HTTP_USER_AGENT'};

if ($userAgent =~ /Mosaic/) {
   $htmlFile = 'mosaicVersion.html';
} elsif ($userAgent =~ /Netscape/) {
   $htmlFile = 'netscapeVersion.html';
} else {
   $htmlFile = 'otherVersion.html';
}

$htmlFile = join ("/", $ENV{'DOCUMENT_ROOT'}, $htmlFile);

print "Content-type: text/html\n\n";

open (HTML, "<", $htmlFile) or die "Cannot open $htmlFile: $!";

while (<HTML>) {
   print;
}

close (HTML);
exit (0);

While the script in listing 1.1 is very simple, this method also creates some problems, such as:

You need to edit this script every time a new browser type appears. This may often require tracking down a copy of the browser itself or perusing the site's access logs to determine what the new browser transmits for a user agent ID.
Because this script, and the HTML page that calls it, rely on the user clicking the links, your carefully planned front door script can be completely bypassed by the user entering a specific URL.

Finally, you still have to maintain a separate site for each browser type you choose to support. It is a lot of effort for the end result. Fortunately, the introduction of client-side scripting tools has made this job easier.
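One way to soften the first problem (editing the script for every new browser) is to keep the agent-to-page mapping in a table, so that supporting a new browser means adding one entry. The sketch below shows the idea in JavaScript, for consistency with the rest of this chapter; the same approach carries over to Perl. The names agentRules and pageForAgent are hypothetical, and the substrings and file names simply mirror listing 1.1.

```javascript
// Hypothetical rule table: substrings and file names mirror listing 1.1.
var agentRules = [
   { match: "Mosaic",   page: "mosaicVersion.html" },
   { match: "Netscape", page: "netscapeVersion.html" }
];

// Return the page for the first rule whose substring appears in the
// user agent string, or the fallback page when no rule matches.
function pageForAgent(userAgent, rules, fallbackPage) {
   for (var i = 0; i < rules.length; i++) {
      if (userAgent.indexOf(rules[i].match) != -1) {
         return rules[i].page;
      }
   }
   return fallbackPage;
}
```

With this table, a call such as pageForAgent(agentString, agentRules, "otherVersion.html") selects the page, and unknown browsers fall through to the generic version.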

Client-Side Screening

Until the advent of JavaScript and VBScript, the ability to screen users was somewhat limited, particularly for those not running their own Web servers and those who didn't have access to the CGI services of their Web servers. Before discussing the new tricks, let's have a quick review of an old method: client-pull.

Client-Pull

Within the browser, the process of loading a new page (or reloading the current page) is initiated by the user through clicking a link or the Reload button. The concept of client-pull was developed by Netscape as an early method of automating this process. In essence, an HTML document contains additional information that instructs the browser to do either one of these:

Reload the current page after a period of time

Load a new page after a period of time

The client (browser) remains in control of the process and, when the specified time has elapsed, "pulls" a new page from the server, hence client-pull. To initiate client-pull, you need to utilize a special HTML tag: <META>.

The <META> tag, one of the more esoteric HTML tags, is used within the header block (as defined by the HTML <HEAD>...</HEAD> tag pair) and allows you to embed document "meta-information" that's not defined by other HTML tags. Once this information is in the document, a server or browser can extract it for various uses:

Identify a document's content or author

Index and catalog documents on a site

Control the loading or reloading of a page

The third case, client-pull, is of interest in the current discussion because it's the job of the client (browser) to request (pull) the next page based on the information in the tag.

An example <META> tag would be:

<META HTTP-EQUIV="Refresh" CONTENT="secs; URL=newURL">

The two attributes of the <META> tag are:

HTTP-EQUIV must be set to "Refresh" to activate client-pull.
CONTENT consists of a number that specifies the number of seconds to wait before pulling the next page, and a URL= element that identifies the page to pull.
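To make the layout of the CONTENT attribute concrete, here is a minimal sketch of a helper that assembles a client-pull tag from a delay and a destination. The function name refreshTag is hypothetical; it only concatenates strings in the format described above.

```javascript
// Build a client-pull <META> tag.
// secs: seconds to wait before pulling; url: the page to pull.
function refreshTag(secs, url) {
   return '<META HTTP-EQUIV="Refresh" CONTENT="' + secs +
          '; URL=' + url + '">';
}
```

Calling refreshTag(1, "http://www.visi.com/next.html") produces the same tag used in listing 1.2.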

Listing 1.2 shows a page that directs the user somewhere else; or, if his or her browser supports it, takes him or her there automatically after a one-second delay.

Listing 1.2 Client-Pull

<HTML>
<HEAD>
<TITLE>Welcome!</TITLE>
<META HTTP-EQUIV="Refresh" CONTENT="1; URL=http://www.visi.com/next.html">
</HEAD>
<BODY>
You really need to
<A HREF="http://www.visi.com/next.html">go here</A>
</BODY>
</HTML>

TIP

If you wish to experiment with client-pull and you're using Internet Explorer, you'll need to do your testing online; that is, you'll have to load your pages into your Web site and access them from the server rather than reading them from your local hard disk. While Navigator acknowledges client-pull requests from locally loaded (from disk) documents, Explorer does not.

For the purposes of browser identification, client-pull does have limitations. Only the oldest browsers don't support it, and they are the same ones that don't support much else. That leaves you with a collection of browsers that do support it but still have varying levels of support for other features. To more accurately identify different browsers and their capabilities, it's necessary to turn to client-side scripting.

Client-Side Scripting

Client-side scripting involves one of two languages: VBScript (from Microsoft) or JavaScript (from Netscape). In both, the scripting code is embedded directly into the HTML document, and contained within a <SCRIPT> tag block:

<SCRIPT LANGUAGE="JavaScript">
<!-- hide
   ...
// end hide -->
</SCRIPT>

When using VBScript, the LANGUAGE attribute is set to VBScript, but the general configuration of the block is identical.

Any browser that doesn't support scripting ignores the tag but still displays the script statements as text. To hide the script body from these browsers, wrap the entire body of the script in an HTML comment statement (<!-- ... -->).

CAUTION

JavaScript and VBScript handle the comment wrapping in different ways. After JavaScript encounters the first half of the comment tag (<!--), it ignores the rest of that line. The closing half (-->), however, needs to be prefaced with a JavaScript comment identifier (//) to prevent JavaScript from trying to interpret the tag as JavaScript.

VBScript, on the other hand, doesn't make this assumption. In fact, prefacing the closing comment tag with a VBScript comment identifier (') will cause VBScript to generate an error. Therefore, the proper structure of a VBScript block is:

<SCRIPT LANGUAGE="VBScript">

<!-- hide

...

-->

</SCRIPT>

Even though both Navigator and Explorer support JavaScript, Explorer doesn't fully implement the language. Therefore, if you're going to do heavy JavaScript work later in your site, you'll want to redirect Explorer users to another set of pages, because getting one page set to handle both browsers would require a good deal of conditional scripting. Fortunately, the redirection is done quite easily because both browsers support the JavaScript location object, which permits you to force the loading of a new page. Determine which page to load by examining the navigator object, which stores information about the browser. The conditional loading of different pages for Navigator and Explorer is demonstrated in listing 1.3.

Listing 1.3 JavaScripted Page Selection

<SCRIPT LANGUAGE="JavaScript">
<!--
   if(navigator.appName.indexOf("Netscape") != -1) {
      location.href = "netscape/index.html";
   } else {
      location.href = "microsoft/index.html";
   }
// -->
</SCRIPT>

At this point, there's only one more case to cover before the client-side gatekeeper is complete: browsers that don't support scripting (such as Mosaic). Because a page containing JavaScript has to be loaded and processed, it's desirable to prevent the no-script version from displaying in Navigator or Explorer. This is accomplished by using a <NOSCRIPT> tag.

Faking <NOSCRIPT>

With the release of Navigator 3.0, Netscape coined several new HTML tags, including <NOSCRIPT>. The purpose of a <NOSCRIPT> block is similar to that of <NOFRAMES>: it identifies an HTML block for processing by a browser that doesn't support scripting. However, <NOSCRIPT> is not part of the HTML standard and, because of this, many other browsers will probably not adopt it. Explorer doesn't support it, so it's not the best option for specifying scriptless HTML. Fortunately, there is another way that works across all browsers, and it relies on a unique attribute of JavaScript.

While the JavaScript interpreter looks inside the comment tag block for its script code, it has the unique behavior that once it encounters a comment-end (-->) tag, it ignores everything else on that line. Non-JavaScript browsers, on the other hand, pick up the processing after the comment closes and interpret the rest of the line as valid HTML.

This means that you can place an empty comment (<!-- -->) at the beginning of a line inside the <SCRIPT> tag and get an HTML statement that JavaScript (and therefore a scripting browser) ignores, but that non-scripting browsers display. This effectively creates the equivalent of a <NOSCRIPT> … </NOSCRIPT> block, as demonstrated in listing 1.4.

Listing 1.4 Creating a <NOSCRIPT> Block

<SCRIPT LANGUAGE="JavaScript">
<!-- begin hide
   // JavaScript browsers process this
   ...
// end hide -->
</SCRIPT>

<SCRIPT LANGUAGE="JavaScript">
<!-- -->Non-Script browsers will process this
</SCRIPT>

NOTE

Even though the text in the second <SCRIPT> block is intended for browsers that don't support scripting, it is necessary to specify JavaScript as the scripting language because of how Internet Explorer processes the script tag.

Additionally, while Explorer also supports VBScript, it doesn't permit "no-script" lines within a VBScript code block.

TIP

Because of a quirk in Microsoft's implementation of JavaScript, you can't mix the "no-script" technique and actual JavaScript code within the same script tag (a trick that Navigator allows). However, by simply moving the "no-script" code into a separate <SCRIPT> tag block, you avoid this problem.

This technique gives you the ability to code specific page objects, such as graphics, anchors, even plain text, that are displayed for different browsers, depending on what scripting language the browser supports, hence making it possible to handle both advanced and simple browsers within the same documents.
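Because every "no-script" line needs the same empty-comment prefix, the block can be generated mechanically. The following is a minimal sketch, assuming you keep the fallback HTML as an array of lines; the function name noScriptBlock is hypothetical.

```javascript
// Wrap plain HTML lines in a <SCRIPT> block, prefixing each line with
// an empty comment so that JavaScript ignores it while non-scripting
// browsers display it.
function noScriptBlock(htmlLines) {
   var out = '<SCRIPT LANGUAGE="JavaScript">\n';
   for (var i = 0; i < htmlLines.length; i++) {
      out += "<!-- -->" + htmlLines[i] + "\n";
   }
   // The closing tag is split so that this code could itself sit inside
   // a <SCRIPT> block without ending that block early.
   return out + "</" + "SCRIPT>";
}
```

The returned text is meant to be pasted into a page by hand or by a build step; writing it out with document.write() would defeat the purpose, since non-scripting browsers never run the script.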

However, not all browsers are alike. For example, Internet Explorer 2.1 for the Macintosh does support frames and other advanced HTML features, but does not support JavaScript. In fact, Explorer 2.1/Mac treats the <SCRIPT> tag differently from any other browser: it ignores everything within the tag, whether the tag contains JavaScript code or "no-script" statements.

As such, trying to support all possible current browsers within the same document set is simply not possible, unless you're willing to sacrifice many of the advanced browser features.

The Completed GateKeeper

If you want to use the latest and greatest Web technology and support the greatest number of browsers, one method is to split your site into two collections of documents, each optimized for a different type of browser. It's not necessary to create multiple versions of all documents, only of those that make extensive use of advanced features or multimedia files. To automatically direct surfers to the pages best suited for their browsers, you need a "gatekeeper" or "redirection script." Using the techniques demonstrated thus far, you can create a gatekeeper front page that will perform the following:

Send Navigator users to the Netscape-optimized part of your site.
Send Explorer users to the Microsoft-optimized part of your site.
Take users of other browsers through a third part of your site that works best for them.

Listing 1.5 demonstrates the overall structure of the gatekeeper.

Listing 1.5 A Gatekeeper Front Page

<HTML>
<HEAD>
   <TITLE>GateKeeper</TITLE>
</HEAD>
<BODY>

<SCRIPT LANGUAGE="JavaScript">
<!--
   if(navigator.appName.indexOf("Netscape") != -1) {
      location.href = "netscape/index.html";
   } else {
      location.href = "microsoft/index.html";
   }
// -->
</SCRIPT>

<SCRIPT LANGUAGE="JavaScript">
<!-- --><H1>Welcome!</H1>
<!-- --><HR>
<!-- -->This site utilizes the advanced features of
<!-- --><A HREF="http://www.microsoft.com/ie/">Internet Explorer</A>
<!-- -->and
<!-- --><A HREF="http://home.netscape.com/">Netscape Navigator</A>.
<!-- -->Download your copy now!
</SCRIPT>

</BODY>
</HTML>

NOTE

While this seems like extra work, depending on the site you're constructing, it might very well be worth it. If you're designing commercial sites, you'll probably encounter a client or two who wants everyone to have access to the site regardless of the browser they use, and still wants support for all the advanced features that make the Web so multimedia-rich.

You can, however, save yourself some work (and maintenance) with some judicious JavaScript coding. Because Navigator and Explorer both support the navigator object and its appName and appVersion properties, you can "wrap" browser-specific pieces of code with conditional checks that make certain code work for one browser or the other. This effectively reduces your work from three distinct sets to two: JavaScript and non-JavaScript.

Examples of this technique are found throughout the book.
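As a minimal sketch of such conditional wrapping, the helpers below test the appName property and pick browser-specific text. The function names are hypothetical, and the object is passed in as a parameter so that the logic can be exercised outside a browser; in a page you would pass the real navigator object.

```javascript
// True when the browser identifies itself as Netscape Navigator.
function isNavigator(nav) {
   return nav.appName.indexOf("Netscape") != -1;
}

// True when the browser identifies itself as Microsoft Internet Explorer.
function isExplorer(nav) {
   return nav.appName.indexOf("Microsoft") != -1;
}

// Choose a piece of browser-specific content within a single page.
function welcomeText(nav) {
   if (isNavigator(nav)) {
      return "Optimized for Netscape Navigator";
   }
   if (isExplorer(nav)) {
      return "Optimized for Internet Explorer";
   }
   return "Welcome";
}
```

Inside a page, document.write(welcomeText(navigator)) would emit the matching text, so one document serves both scripting browsers.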

If you want to take advantage of both the latest Web technologies and still make your site accessible to those not using the latest browsers, you'll need to use some form of gatekeeper.

From Here…

This chapter starts the adventure through the world of Web scripting by examining several techniques for screening surfers through a site based on the type of browser used. This provides an excellent basis for several other techniques. If you're interested in seeing browser screening in action, check out:

Chapter 2, "Plug-In Identification," examines how to tell if a plug-in is installed on a user's browser.
Chapter 7, "Animating Images," tells you how to use HTML, GIFs, and client-pull.
Chapter 24, "Designing an eZine," shows you how to create and publish your own electronic online magazine.

Chapter 2 Plug-In Identification

CONTENTS

Extending the Browser
Introduction to MIME Types
Placing Multimedia Objects: The <EMBED> Tag
Enhancing Control via JavaScript
JavaScript Under Navigator 3.0
From Here…

The Web has always been a place of graphics and sound, but until Netscape introduced the concept of a plug-in, you were limited to static pictures, simple sounds, and the use of helper applications whenever the online content got too fancy for the browser alone to handle.

Plug-ins require the installation of additional software that takes advantage of the new technology. Without it, a page "breaks," displaying a "cracked" icon in place of your graphic work.

Extending the Browser

Plug-ins and helper applications both view content that the browser wasn't designed to handle. They both start the same way: the browser checks each content object loaded against a master list of what program can handle the particular object. If a match is found, the corresponding plug-in or helper launches to handle the object. At this point, however, the similarity ends.

An Object-Oriented Web?

You'll encounter the term object often when dealing with the Web, and it bears a little explanation. As far as the Web is concerned, anything and everything that's included within an HTML page (graphics, hyperlinks, text, forms, multimedia files, anything) is considered an object.

For many of the common object types, such as .GIF, .JPG, and plain old HTML, the browser itself is considered a "helper," with the program to handle that particular object built in. This makes it possible to reconfigure the browser and have even these common types handled by other programs.

After helper applications launch, they run within their own windows like regular programs. The user must then switch between the browser window and the helper window as well as manually close the helper down.

Plug-ins, however, run within the browser's client area and appear to be part of the Web page. If there are any additional controls (buttons, gauges, sliders, and so on) needed by the plug-in, they become embedded in the Web page (see fig. 2.1).

Figure 2.1: A plug-in (such as ichat) takes total control of the browser and reconfigures it as an entirely new program.

Introduction to MIME Types

In order for the Web server to properly transmit an embedded object, it needs to know what MIME type identifier to send to the browser. Whenever a server sends an object, like an HTML file, an image file, or a QuickTime movie, it prefaces the object with a header that tells the browser what kind of data is about to arrive. MIME (Multipurpose Internet Mail Extensions) is a freely available specification that provides a way for computers to exchange:

Text in different character sets
Graphics
Sound
Multimedia
Just about anything else

A defined list of MIME types accomplishes these tasks. Additionally, the standard is open-ended, and additional types can be defined by anyone. When two computers exchange information (as in a Web session), MIME helps both sender (server) and receiver (browser) figure out what to do with the data.
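On the receiving side, the "master list" a browser consults can be pictured as a table keyed by MIME type. The sketch below is only an illustration of that lookup, not how any particular browser is implemented; handlerTable, its entries, and handlerFor are all hypothetical.

```javascript
// Hypothetical master list: MIME type -> what handles it.
var handlerTable = {
   "text/html":              "built-in",
   "image/gif":              "built-in",
   "application/x-director": "Shockwave plug-in"
};

// Return the handler registered for a MIME type, or null when nothing
// is registered (the case in which a browser would prompt the user).
function handlerFor(mimeType, table) {
   if (Object.prototype.hasOwnProperty.call(table, mimeType)) {
      return table[mimeType];
   }
   return null;
}
```

A lookup that comes back empty corresponds to the situation in which the browser has no plug-in or helper for the incoming data.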

In order for this system to work properly, the server must know the proper MIME type identifier to transmit to the browser. For most commercial systems, the Web administrator handles this through the server's configuration files. If that is not the case for you, specify the MIME-type declaration locally within your own directories by adding a type-identifier line to a configuration file. The required name of this file depends on the configuration of your server. It is often called .htaccess and is placed in the same directory as your home page. You may need to contact a Web administrator for further information. An example .htaccess file is:

Options All
AddType application/x-director .dcr

This instructs the server to transmit any file ending with .dcr to the browser with an application/x-director MIME type, and instructs the browser to feed the file to Shockwave, assuming it is installed.
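Conceptually, an AddType line adds one entry to a table that maps file extensions to MIME types. The following sketch models that mapping in JavaScript purely for illustration; it is not server code, and the names typeByExtension and mimeTypeFor, along with the default type passed in, are assumptions.

```javascript
// Hypothetical extension table; the .dcr entry mirrors the AddType line.
var typeByExtension = {
   ".dcr":  "application/x-director",
   ".html": "text/html",
   ".gif":  "image/gif"
};

// Return the MIME type for a file name based on its extension, or
// defaultType when the extension is missing or unknown.
function mimeTypeFor(fileName, table, defaultType) {
   var dot = fileName.lastIndexOf(".");
   if (dot == -1) {
      return defaultType;
   }
   var ext = fileName.substring(dot).toLowerCase();
   if (Object.prototype.hasOwnProperty.call(table, ext)) {
      return table[ext];
   }
   return defaultType;
}
```

For example, mimeTypeFor("myshock.dcr", typeByExtension, "application/octet-stream") yields "application/x-director", the type the browser needs to see before it hands the file to Shockwave.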

NOTE

The .htaccess file controls how the Web server works for the directory in which it is located and for all of that directory's subdirectories. Because of this, you may have multiple .htaccess files in different directories. This can be beneficial when trying to control access to parts of your site. Chapter 6, "Controlling User Access," covers this in more detail.

Each type of multimedia file has a different MIME type. To determine what the correct type identifier is for a particular file, contact your Web administrator or check out the home page of the company that developed the plug-in or helper application.

Placing Multimedia Objects: The <EMBED> Tag

After configuring the server to deliver the files correctly, all that's left is to inform the browser where to put the object (the multimedia file) on the page. You can accomplish this with the HTML <EMBED> tag:

<EMBED SRC="srcfile" WIDTH=x HEIGHT=y>

where:

SRC identifies the source file.
WIDTH specifies the width of the display area (in pixels).
HEIGHT specifies the height of the display area (in pixels).

CAUTION

The values of the WIDTH and HEIGHT attributes include the size of the multimedia image as well as the space taken up by any controls the plug-in creates. The QuickTime plug-in, for example, adds a control bar at the bottom of the frame. Add the size of the control bar to the height of the movie. Depending on the plug-in, if the attributes are set too small, part of the displayed file may be clipped.

When designing pages that utilize plug-ins, remember to test the pages to make certain that any controls are also visible.
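The sizing rule in the caution above amounts to one addition: the HEIGHT attribute is the content height plus whatever the plug-in's controls occupy. Here is a minimal sketch, assuming the control area sits below the content; the function name embedTag is hypothetical, and the control height must come from the plug-in's own documentation.

```javascript
// Build an <EMBED> tag whose HEIGHT leaves room for the plug-in's
// controls. controlHeight is plug-in specific and is an input here,
// not a value this sketch knows.
function embedTag(src, width, contentHeight, controlHeight) {
   var totalHeight = contentHeight + controlHeight;
   return '<EMBED SRC="' + src + '" WIDTH=' + width +
          ' HEIGHT=' + totalHeight + '>';
}
```

For a 160 by 120 movie and an assumed 24-pixel control bar, embedTag("movie.mov", 160, 120, 24) yields a tag with HEIGHT=144.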

Browsers that can't handle the <EMBED> tag simply ignore it and display nothing. To provide a viewing alternative for these browsers, you can include a <NOEMBED> tag block immediately after the <EMBED> tag:

<NOEMBED>
   You really should get the plug-in ...
</NOEMBED>

Browsers that support embedding will ignore any <NOEMBED> tag and its contents.

Enhancing Control via JavaScript

Netscape Navigator 1.1N recognizes the <EMBED> tag and treats it as an OLE link. This means that embedded objects show up as a "broken icon" even if there is a defined <NOEMBED> block. To make matters worse, Navigator 2.x and subsequent versions alert the user to the missing plug-in (fig. 2.2) and ask for help.

Figure 2.2: If you attempt to download a QuickTime movie without an installed plug-in, an alert dialog displays and requires the user to select an external helper application.

NOTE

OLE stands for Object Linking and Embedding. It's the mechanism that permits users to take one kind of file, like a graphic or spreadsheet, and "embed" it within another type of file, like a word processing document.

Fortunately, Netscape 1.1N doesn't recognize JavaScript, which makes it possible to wrap the EMBED tag with script code. This ensures that the tag is only interpreted by Navigator 2.0 or later, because Navigator 1.1N ignores everything within the body of the <SCRIPT> tag. The resulting code block is shown in listing 2.1:

Listing 2.1 Using JavaScript to Combat Netscape 1.1N

<!-- Embed -->
<SCRIPT LANGUAGE="JavaScript">
<!-- begin hide 
   document.write('<EMBED SRC="sourcefile" WIDTH=x HEIGHT=y>');
// end hide -->
</SCRIPT>

<NOEMBED>
   <!-- provide a visual placeholder for EMBED-less browsers -->
   <IMG SRC="imagefile" WIDTH=x HEIGHT=y>
</NOEMBED>

If the WIDTH and HEIGHT attributes of imagefile match those of sourcefile, the placeholder graphic occupies the same space and position as the missing embedded object, keeping your screen layout constant. There are a couple of caveats to this technique:

If someone is using Navigator 2.0 but does not have the plug-in that handles the specified file installed, a "broken" icon displays.
For many plug-ins, the WIDTH and HEIGHT attributes must be specified for the <EMBED> tag. Navigator crashes if you fail to do this. Specifying these attributes also decreases the load time for your page by reducing the amount of computation Navigator has to do.

JavaScript Under Navigator 3.0

With the release of Navigator 3.0, JavaScript has been extended in several ways. One extension is the addition of a plugins[] array to the navigator object. The plugins[] array is a complete list of the plug-ins currently installed in the browser, and its length property gives the number of entries. Within the array, each element has a name property, which is a string holding the name of the plug-in. This makes it possible to determine whether a particular plug-in is installed simply by scanning the array. An example of a JavaScript code fragment that searches for the Shockwave plug-in is shown in listing 2.2.

Listing 2.2 Identifying the Presence of a Plug-In

var nplug = navigator.plugins.length;
var shock = 0;   // set to 1 if a Shockwave plug-in is found
var i = 0;

while (i < nplug) {
   if (navigator.plugins[i].name.indexOf('Shockwave') != -1) {
      shock = 1;
   }

   i++;
}

NOTE

A sample document that displays all of a browser's installed plug-ins is included on the CD.
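The CD sample itself is not reproduced here, but the core of such a listing page can be sketched as a loop that collects each name property. The function name pluginNames is hypothetical, and the array is passed in as a parameter so the logic does not depend on a browser being present.

```javascript
// Collect the name of every entry in a plugins[]-style array.
// Any object with a length and indexed elements that have a name
// property will work, including navigator.plugins in a browser.
function pluginNames(plugins) {
   var names = [];
   for (var i = 0; i < plugins.length; i++) {
      names.push(plugins[i].name);
   }
   return names;
}
```

In a page running under a browser that supports the array, document.write(pluginNames(navigator.plugins).join("<BR>")) would print one plug-in name per line.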

The plugins[] property is supported only by Navigator 3. Because neither Internet Explorer nor earlier versions of Navigator support it, it's necessary to check whether Navigator is being run and, if so, which version. The version is found by scanning the appVersion property of the navigator object for a string such as 2. (that's a 2 followed by a period). By specifying the decimal point, you avoid matching 1.2, 3.2, or some other browser version you're not interested in. By checking the navigator object's appName property for the presence of Netscape, you determine whether Navigator or Explorer is being used.
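An alternative to substring scanning, and not the method the listings in this chapter use, is to read the leading number of appVersion with parseInt(), which stops at the first character that isn't part of an integer. The sketch below assumes appVersion begins with the version number; the function names are hypothetical.

```javascript
// Return the major version from an appVersion-style string.
// parseInt reads the leading digits and stops at the period.
function majorVersion(appVersion) {
   return parseInt(appVersion, 10);
}

// True when nav describes Netscape Navigator at the given major version.
function isNavigatorVersion(nav, major) {
   return nav.appName.indexOf("Netscape") != -1 &&
          majorVersion(nav.appVersion) == major;
}
```

Because only the leading number is read, a version string such as "3.2" can't be mistaken for version 2.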

NOTE

The JavaScript navigator object is a very handy component because it permits you to do very specific page configurations. These configurations were once restricted to CGI scripting. Because of this, you'll see navigator appear throughout the book, whenever client-side scripting involves browser-specific functions.

Pulling this all together, listing 2.3 gives you a generic function that identifies whether a given plug-in is installed. If running Navigator 3.0, it searches for the plug-in. If running Navigator 2.0 or Internet Explorer 3.0, it assumes the plug-in is installed. If running anything else, it assumes the plug-in is not installed.

Listing 2.3 A Generic Plug-In Checker

<SCRIPT LANGUAGE="JavaScript">
<!-- begin hide
function isPluginInstalled(strPlugin) {
   var fInstalled = false;

   if((navigator.appVersion.lastIndexOf('3.') != -1) && 
      (navigator.appName.indexOf('Netscape') != -1)) {
      var nplug = navigator.plugins.length;
      var i = 0;

      while (i < nplug) {
         if (navigator.plugins[i].name.indexOf(strPlugin) != -1) {
            fInstalled = true;
         }

         i++;
      }
   } 
   else
   if((navigator.appVersion.lastIndexOf('2.') != -1) ||
      (navigator.appName.indexOf('Microsoft') != -1)) {
      fInstalled = true;
   }

   return fInstalled;
}
// end hide -->
</SCRIPT>

NOTE

The decision to assume the installation of a plug-in under Internet Explorer is purely arbitrary; you can always choose to assume that an Explorer user has no plug-ins. The decision was made in anticipation of both the future support of the plugins[] property by Explorer and the growing number of plug-ins that are becoming available for Microsoft's browser.

With this little tool, you can customize your pages to display as cleanly (no "broken icons" or irritating "you need a plug-in" messages) as possible, regardless of whether a plug-in is installed or not. A code fragment that uses the isPluginInstalled() function to embed a Shockwave file is demonstrated in listing 2.4:

Listing 2.4 Testing for the Presence of Shockwave

<!-- Embed a Shockwave for Director file -->
<SCRIPT LANGUAGE="JavaScript">
<!-- begin hide
   if(isPluginInstalled('Shockwave')) {
      document.write('<EMBED SRC="myshock.dcr" WIDTH=100 HEIGHT=50>');
   } else {
      document.write('<IMG SRC="noshock.gif" WIDTH=100 HEIGHT=50>');
   }
// end hide -->
</SCRIPT>

<NOEMBED>
   <!-- provide a visual placeholder for EMBED-less browsers -->
   <IMG SRC="noshock.gif" WIDTH=100 HEIGHT=50>
</NOEMBED>

Remember that the IMG placeholder needs to be included twice: once for browsers that are JavaScript-enabled but don't have the plug-in (or that can't determine whether it's installed), and once for older browsers that support neither EMBED nor JavaScript.
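If several objects on a page need this treatment, the choice between the EMBED tag and its placeholder can be factored into one helper so that the two tags always share the same dimensions. This is a minimal sketch; the function name objectTag is hypothetical, and the installed flag would come from a check such as isPluginInstalled().

```javascript
// Return an <EMBED> tag when the plug-in is available, otherwise an
// <IMG> placeholder of identical size so the layout doesn't shift.
function objectTag(installed, src, placeholder, width, height) {
   var size = " WIDTH=" + width + " HEIGHT=" + height + ">";
   if (installed) {
      return '<EMBED SRC="' + src + '"' + size;
   }
   return '<IMG SRC="' + placeholder + '"' + size;
}
```

Inside a script block, document.write(objectTag(isPluginInstalled('Shockwave'), "myshock.dcr", "noshock.gif", 100, 50)) would reproduce the behavior of listing 2.4. The <NOEMBED> block is still needed for browsers without JavaScript.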

From Here…

This chapter examines the different methods used to identify plug-ins installed on a user's browser and gives suggestions for accommodating different browsers.

Chapter 1, "Browser Identification," explains the auto-load method. You can use this method to send users to a configuration page that allows them to easily download and install any plug-ins they may need.

Chapter 3 Tracking Hit Counts


CONTENTS

Server Access Logs
More Efficient Counting
Graphic Counters
Generating Server Statistics with wusage
User-Specific Access Tracking
- Baking Up a Batch of Cookies
From Here…

Now that you've started building "the ultimate site," you'll probably want to sate your ego by knowing exactly how many or how few people are bothering to stop by your little corner of the Web. If you're running a commercial site and get funding from advertisers, you'll need demographic information to prove to the people paying you that their money is well spent. In short, you'll need to track hits, or accesses, to your Web pages.

Server Access Logs

When someone's browser requests your Web page, that page is said to be hit, or accessed. Web servers track these hits in varying degrees and store the information in an access log somewhere within the server's directory structure. From the information in the access log, you can identify what pages on your site have been requested, how many times, and by whom.

Hit or Miss?

In current Web parlance, the term hit has a somewhat broader definition than access. While hit corresponds to the loading of a page and all the embedded objects it may contain, access corresponds to the loading of one object within a page.

For example, if you have a page with 10 graphics, a hit on that particular page generates 11 access entries in the log: one for the page itself and one for each graphic. For highly complex pages with multiple graphics, frames, server-side includes, and so on, the total access count inflates by the number of individual objects within the page. If 100 people visit your page and you have 10 graphics on that page, then you have 100 hits, but 1,100 accesses. Naturally, from an advertising standpoint, talking about accesses makes a site sound much more popular than it actually is.

In previous years, this was a closely guarded secret, known only to the Web administrators. It allowed unscrupulous administrators and marketers to claim incredible activity on their sites just by listing the accesses instead of individual user visits. However, in recent years the word's gotten out, advertisers are more savvy, CGI scripters have gotten smarter, and access statistics are more in line with actual user visits.
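The relationship between the two numbers is simple arithmetic: each hit produces one access for the page itself plus one for every embedded object. A minimal sketch follows, assuming every visitor loads every object (no caching and no interrupted loads); the function name accessCount is hypothetical.

```javascript
// Estimate log entries for a page: one access for the page itself
// plus one for each embedded object, multiplied by the number of hits.
function accessCount(hits, objectsPerPage) {
   return hits * (1 + objectsPerPage);
}
```

For instance, accessCount(50, 4) returns 250: fifty visits to a page with four graphics leave 250 entries in the log.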

The easiest way to count hits is to utilize the log files kept by your Web server. As with any other piece of software, what the file is named and where it's located varies. For example, the NCSA servers create a log file called access_log, which is, by default, stored in a logs/ subdirectory off the server's root. A sample of the information written to access_log is shown in listing 3.1, although the exact amount of information maintained can be configured. Consult the documentation for your server for more information on how to do this.

Listing 3.1 Sample from access_log

px1.mel.aone.net.au - - [24/Jun/1996:00:02:46 -0500] "GET /~sjwalter/javascript/ HTTP/1.0" 200 5245
px1.mel.aone.net.au - - [24/Jun/1996:00:02:50 -0500] "GET /~sjwalter/javascript/nn/index.html HTTP/1.0" 200 2793
px1.mel.aone.net.au - - [24/Jun/1996:00:02:55 -0500] "GET /~sjwalter/javascript/nn/index2.html HTTP/1.0" 200 666
px1.mel.aone.net.au - - [24/Jun/1996:00:02:57 -0500] "GET /~sjwalter/javascript/nn/que.html HTTP/1.0" 200 523
px1.mel.aone.net.au - - [24/Jun/1996:00:02:58 -0500] "GET /~sjwalter/javascript/nn/index.html HTTP/1.0" 200 2293

As you can see, the amount of information available in the access log is rather extensive. The times specified for each page's request are close because this is a framed site. Each individual HTML document generates another access entry. If more of the log were printed here, you'd also see an access entry for each graphic displayed on each page.
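
To make the layout of an access entry concrete, here is a minimal JavaScript sketch that splits one entry into named fields. It assumes that each entry occupies a single physical line in the real log file and follows the layout of the sample in listing 3.1; the function name parseLogLine and the field names are illustrative, not part of any server's software.

```javascript
// Parse one access-log entry (in the layout of listing 3.1) into named
// fields. Returns null when the line does not match that layout.
function parseLogLine(line) {
  const pattern =
    /^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) ([^"]+)" (\d{3}) (\d+|-)$/;
  const m = pattern.exec(line.trim());
  if (m === null) {
    return null;
  }
  return {
    host: m[1],        // who made the request
    timestamp: m[4],   // when, including the time-zone offset
    method: m[5],      // usually GET
    path: m[6],        // what was requested
    protocol: m[7],
    status: Number(m[8]),
    bytes: m[9] === "-" ? 0 : Number(m[9])
  };
}

const entry = parseLogLine(
  'px1.mel.aone.net.au - - [24/Jun/1996:00:02:46 -0500] ' +
  '"GET /~sjwalter/javascript/ HTTP/1.0" 200 5245'
);
console.log(entry.host, entry.path, entry.status, entry.bytes);
```

Once an entry is broken apart this way, the "what," "how many times," and "by whom" questions become simple comparisons on path and host.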

One thing worth noting is the GET request in the first line:

GET /~sjwalter/javascript/ HTTP/1.0

No file is specified because the user accessed the main page using aliasing. If the server is configured for it, specifying a URL with only a path and no file name causes a default file (often default.htm, index.htm, or index.html) to be handed back to the browser. A URL of that form looks like this:

http://www.visi.com/~sjwalter/

Because of this, if you want to scan the access log for hits, you need to look both for a specific page (your home page, for instance) and for an alias reference. To actually scan the log, use the UNIX grep command, which searches one or more files for a particular string. The general syntax for grep is:

grep pattern fileName

The string to search for is "pattern" and the file to search is "fileName." By default, grep prints out every matching line it finds in the specified file. With the addition of the -c parameter, you can instruct grep to suppress the normal output and just print a count of the matching lines. The simple CGI program that follows takes advantage of this and counts the number of home-page accesses in the specified directory (see listing 3.2). This listing assumes that the home page is named index.html.

Listing 3.2 A Simple Access Counter

#!/usr/local/bin/perl

$homePage = "/%7Esjwalter";
$logFile  = "/var/httpd/logs/access.log";

print "Content-type: text/html\n\n";
# Backticks run grep and capture its output; -c prints only the match count.
$num  = `grep -c 'GET $homePage/ HTTP'      $logFile`;   # alias requests
$num += `grep -c 'GET $homePage/index.html' $logFile`;   # explicit requests
print "$num\n";

NOTE

In listing 3.2, the $homePage variable contains the sequence %7E. This is the URL-encoded form of the tilde (~): a percent sign followed by the character's hexadecimal ASCII code. It is used because special characters, like the tilde, are often encoded this way when they are written out.

This is different from escaping text, where the character is preceded by a backslash (\) in order to render it as normal text, instead of a Perl metacharacter.
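
The counting that listing 3.2 delegates to grep can also be sketched directly in code. The JavaScript below is only an illustration of the same idea (count the lines that request either the alias or index.html); the function name countHomePageHits and the three sample log lines are made up for the example.

```javascript
// Count log lines that request the home page, either through the bare
// directory alias or through an explicit index.html.
function countHomePageHits(logText, homePage) {
  const alias = "GET " + homePage + "/ HTTP";
  const explicit = "GET " + homePage + "/index.html";
  let count = 0;
  for (const line of logText.split("\n")) {
    if (line.includes(alias) || line.includes(explicit)) {
      count += 1;
    }
  }
  return count;
}

// Made-up sample entries: two home-page requests and one graphic.
const sampleLog = [
  'host - - [24/Jun/1996:00:02:46 -0500] "GET /~sjwalter/ HTTP/1.0" 200 5245',
  'host - - [24/Jun/1996:00:02:50 -0500] "GET /~sjwalter/index.html HTTP/1.0" 200 2793',
  'host - - [24/Jun/1996:00:02:55 -0500] "GET /~sjwalter/logo.gif HTTP/1.0" 200 666'
].join("\n");

console.log(countHomePageHits(sampleLog, "/~sjwalter"));
```

For this sample the function reports 2, because the request for the graphic matches neither pattern.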

To use the counter in listing 3.2, include it in your home page through a server-side include, as listing 3.3 demonstrates. The result is a count similar to that shown at the bottom of figure 3.1.

Figure 3.1 : A simple CGI script can be implemented to create a text-based access counter.

Listing 3.3 Implementing a Simple Counter in HTML

<html>
<head><title>Welcome to My Home Page</title></head>
<body>
This page has been accessed
<!--#exec cgi="access1.cgi" -->
times.
</body>
</html>

Lack of efficiency is a problem with grepping the server's access log. It takes several seconds to read through the log file of an active server, and most users do not want to wait the additional seconds just to learn the access count. A more efficient technique is to maintain a separate file on the server that contains the access count.

More Efficient Counting

To circumvent the additional overhead of having to scan an entire server log file to compute each access, store the access count in a small, separate file on the server. Then, using a slightly different script, follow these steps:

Open the file.
Read the current counter value from the file.
Increment the counter value.
Write the new value back to the file, overwriting the old value.
Close the file.
Write the new value back through the server so it appears on the page in the user's browser.
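
The first five steps above can be sketched in a few lines. The example is written in JavaScript for Node.js purely as an illustration; the function name incrementCounter and the temporary file location are assumptions, and the sketch deliberately ignores simultaneous access.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Read the stored count, add one, write it back, and return the new value.
// A missing file or non-numeric contents are treated as a count of zero.
function incrementCounter(counterFile) {
  let count = 0;
  if (fs.existsSync(counterFile)) {
    const stored = parseInt(fs.readFileSync(counterFile, "utf8"), 10);
    count = Number.isNaN(stored) ? 0 : stored;
  }
  count += 1;
  fs.writeFileSync(counterFile, String(count)); // overwrites the old value
  return count;
}

// Hypothetical location for a demonstration counter file.
const counterFile = path.join(os.tmpdir(), "demo-counter-" + process.pid + ".txt");
console.log(incrementCounter(counterFile));
console.log(incrementCounter(counterFile));
fs.unlinkSync(counterFile); // clean up the demonstration file
```

The returned value is what the final step sends back through the server for display on the page.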

The process of opening, reading, writing, and closing a file is easily done with Perl. One additional factor, however, now comes into play. Because it's possible for more than one user to be accessing your site at any given time, it's possible for the counter file to be accessed from different connections simultaneously. If 10 users hit your page at the same time, each one could read the same access count, and whoever writes the file last sets the value for the next user, so most of those hits go uncounted. To register each simultaneous access as a new hit, some form of locking mechanism is necessary.

Sometimes referred to as a semaphore, a lock is a file whose existence signals that something is happening. In the case of counter access, whoever opens the counter file first writes out a lock file. Anyone else attempting to access the counter has to wait until the lock file is deleted (a matter of seconds at most). Once the lock file disappears, the next process in line establishes its own lock, and the cycle repeats until the last access has been handled. This maintains an accurate access count. An example of simple locking is shown in listing 3.4.

Listing 3.4 Implementing File Locking

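# $lockFile is assumed to hold the path of the lock file
while (-e $lockFile) {
   # four-argument select: pause for a tenth of a second
   select(undef, undef, undef, 0.1);
}

open(LOCKFILE, ">$lockFile");   # create the lock file
... # retrieve and increment the counter
close(LOCKFILE);
unlink($lockFile);              # release the lock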

TIP

The while() loop in listing 3.4 keeps checking for the existence of the lock file. If the file is found, meaning that someone else is accessing the counter file, the loop performs a dummy select, which is a relatively fast operation. This means that the loop will cycle very quickly, minimizing the wait a user would encounter on a busy site.

However, because this loop executes so fast (and continuously), it also takes its toll on system response, especially if your site is extremely busy. An alternative loop that isn't as hard on the system would be:

while (-e $lockFile) { sleep(1); }

This puts the process (the script, in this case) to sleep for 1 second, thus not using any system resources. While this loop executes more slowly (once each second), a one-second wait is unnoticed for the normal, modem-based user.
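
One weakness of testing for the lock file and then creating it is that two requests can both pass the test before either one creates the file. The sketch below shows a variation rather than the technique of listing 3.4: it asks the operating system to create the lock file only if it does not already exist, so testing and locking happen in a single step. It is written in JavaScript for Node.js as an illustration; the function names and the lock file location are assumptions.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Try to take the lock by creating the lock file exclusively.
// Returns true when this caller now holds the lock, and false when
// someone else already holds it.
function acquireLock(lockFile) {
  try {
    // The "wx" flag makes the open fail if the file already exists.
    fs.closeSync(fs.openSync(lockFile, "wx"));
    return true;
  } catch (err) {
    if (err.code === "EEXIST") {
      return false;
    }
    throw err; // any other failure is a real error
  }
}

// Release the lock so the next waiting request can proceed.
function releaseLock(lockFile) {
  fs.unlinkSync(lockFile);
}

// Hypothetical lock file location for the demonstration.
const lockFile = path.join(os.tmpdir(), "demo-counter-" + process.pid + ".lock");
if (acquireLock(lockFile)) {
  try {
    // ... retrieve and increment the counter here ...
  } finally {
    releaseLock(lockFile);
  }
}
```

A caller that receives false would wait briefly and try again, just as the loops above do.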

Graphic Counters

In the multimedia world of the Web, text-based access counters are somewhat bland. More often than not, the counters you find on pages are graphic, providing a more visually appealing display.

Converting your counter from a text-only to a graphic counter is simple. All you need is a collection of 10 image files, one for each digit from 0 through 9. Then, instead of printing out the access count as a number, you'd step through the number digit by digit and "print" out the corresponding image file. Listing 3.5 demonstrates a Perl fragment that handles this type of counter in a rather unusual way.

Listing 3.5 A Graphic Access Counter

...
# $count is assumed to have the current access count
print "<TABLE CELLPADDING=0 CELLSPACING=0 BORDER=0>";
print "<TR>";

for ($i=0; $i<length($count); $i++) {
   $digit = substr($count, $i, 1);
   print "<TD><IMG SRC=\"$imagedir/$digit\.gif\"></TD>";
}

print "</TR></TABLE>";

What's different about listing 3.5 from many graphic counters is that it doesn't construct a bit map dynamically. Rather, it generates HTML code that formats the individual digits into a table, and lets the browser do the work of requesting the appropriate images from the server.

TIP

While this technique is not as efficient as having your Perl code generate the entire count as a single bit map, it permits you to do special visual tricks with your counter. For example, each individual graphic could be a small animated GIF.

If, however, you'd rather have Perl do all the work in generating your counter, you'll find examples of bit-map construction on the companion CD-ROM.
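
For comparison, the digit-by-digit approach of listing 3.5 can be sketched as a small function that returns the table markup as a string. This JavaScript version is an illustration only; the function name counterTableHtml and the image directory are assumptions, and the images are expected to be named 0.gif through 9.gif.

```javascript
// Build the table markup for a graphic counter: one <IMG> cell per digit.
function counterTableHtml(count, imageDir) {
  const digits = String(count).split("");
  const cells = digits.map(function (digit) {
    return '<TD><IMG SRC="' + imageDir + "/" + digit + '.gif"></TD>';
  });
  return "<TABLE CELLPADDING=0 CELLSPACING=0 BORDER=0><TR>" +
    cells.join("") + "</TR></TABLE>";
}

console.log(counterTableHtml(407, "/images/digits"));
```

Because the function only builds a string, the same code could run on the server or, with document.write(), in the browser.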

Generating Server Statistics with wusage

While access counters track the number of hits a page takes, it's often more valuable to analyze the hit counts for patterns: at what times of day the most hits are recorded, from what domains they come, which pages in a site are hit the most, and so on. For those not interested in writing their own Web activity analysis program from the ground up, there is a wonderfully robust tool for generating server statistics: wusage. Available from http://www.boutell.com/wusage/, wusage generates weekly usage statistics covering the following information:

Total server usage
Response to Isindex pages, or "Index" usage
The top 10 sites by frequency of access
The top 10 documents accessed
A graph of server usage over many weeks
An icon version of the graph for your home page
Pie charts showing the usage of your server by domain

The only major requirement is that the program needs to be run on a periodic basis, usually once per week through a server maintenance script. An example of the output wusage can generate is shown in figure 3.2.

Figure 3.2 : The wusage statistics program generates a visual display of the activity of your Web server.
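
To give a feel for the kind of analysis involved, here is a minimal JavaScript sketch that tallies requests per document and lists the busiest ones first. It is not how wusage itself works; the function name topDocuments and the sample lines are made up for the illustration.

```javascript
// Tally GET requests per path and return [path, count] pairs,
// busiest first, limited to the requested number of entries.
function topDocuments(logLines, limit) {
  const counts = new Map();
  for (const line of logLines) {
    const m = /"GET (\S+) /.exec(line);
    if (m !== null) {
      counts.set(m[1], (counts.get(m[1]) || 0) + 1);
    }
  }
  return Array.from(counts.entries())
    .sort(function (a, b) { return b[1] - a[1]; })
    .slice(0, limit);
}

// Made-up sample entries.
const sampleLines = [
  'host - - [24/Jun/1996:00:02:46 -0500] "GET /index.html HTTP/1.0" 200 5245',
  'host - - [24/Jun/1996:00:03:10 -0500] "GET /news.html HTTP/1.0" 200 2793',
  'host - - [24/Jun/1996:00:04:02 -0500] "GET /index.html HTTP/1.0" 200 5245'
];

console.log(topDocuments(sampleLines, 10));
```

The same tallying approach extends to hosts or hours of the day by changing which part of the line is captured.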

User-Specific Access Tracking

The techniques covered so far in this chapter deal with how many times your site has been accessed. You can also track how many times a particular user visited through the use of cookies. While server tracking relies on logs stored on the server, cookies are stored with the user's browser.

Baking Up a Batch of Cookies

Cookies (or Persistent Client State HTTP Objects) are a mechanism through which both the server and the client (through JavaScript) can store and retrieve information on the client side of the connection. Every cookie has the following components:

NAME=VALUE: the only required component of a cookie, this is a sequence of characters that assigns a name and a value to the cookie. Neither NAME nor VALUE may contain semicolons, commas, or white space, because these characters separate cookie components. If you need to use them, encode them with URL-style %XX escapes.
expires=DATE: an optional component that defines the lifespan of the cookie. An example DATE string is:
Mon, 10-Jun-1996 23:14:25 GMT
If not specified, the cookie exists only until the browser is shut down.
domain=DOMAIN_NAME: identifies the valid cookie domain. By default, this is the domain of the server that generates the cookie response.
path=PATH: identifies the path and subdirectories within the valid cookie domain. By default, this is the path of the document associated with the cookie.
secure: an optional flag that indicates that the cookie is transmitted only if the connection between server and browser is a secure one, such as a connection that uses SSL, the Secure Sockets Layer. By default, cookies are not secure.

CAUTION

There are several limits imposed on cookies:

A maximum of 20 cookies can be created for any given domain. Any attempt to set additional cookies will cause the oldest cookies in the file to be overwritten.
A given client (browser) can only store a maximum of 300 cookies. Like the 20-cookie domain limit, exceeding the 300-cookie limit will result in old cookies being overwritten.
Each cookie cannot exceed 4K (4096 bytes) in size.

Cookie manipulation was originally limited to the server side, but JavaScript makes accessing cookies from within the browser a snap. The process for updating a cookie counter is similar to that of updating a server-side counter:

Read the cookie value or assume a value of 0 if the particular cookie doesn't exist.
Increment the counter value.
Write the cookie back out.
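
The three steps above can be sketched as follows. The logic is kept in plain functions that take the cookie string as a parameter, so in a browser you would pass in document.cookie and assign the result back to it. The cookie name visits and the function names are assumptions chosen for the illustration.

```javascript
// Read a named cookie out of a "name=value; name2=value2" string.
// Returns null when the cookie is not present.
function readCookie(cookieString, name) {
  const pairs = cookieString === "" ? [] : cookieString.split("; ");
  for (const pair of pairs) {
    const eq = pair.indexOf("=");
    if (eq !== -1 && pair.substring(0, eq) === name) {
      return decodeURIComponent(pair.substring(eq + 1));
    }
  }
  return null;
}

// Work out the next visit count and the cookie text that stores it.
function nextVisit(cookieString, expiresDate) {
  const stored = parseInt(readCookie(cookieString, "visits"), 10);
  const count = (Number.isNaN(stored) ? 0 : stored) + 1; // 0 if no cookie yet
  return {
    count: count,
    cookie: "visits=" + count + "; expires=" + expiresDate.toUTCString()
  };
}

// In a browser:
//   var result = nextVisit(document.cookie, someFutureDate);
//   document.cookie = result.cookie;
const result = nextVisit("theme=dark; visits=4", new Date(Date.UTC(1999, 0, 1)));
console.log(result.count);
```

A first-time visitor has no visits cookie, so the count starts at 1.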

The full source code for a cookie-based counter is available on the CD-ROM.

NOTE

Currently, client-side cookie manipulation (via JavaScript) is supported within Netscape Navigator, but not Internet Explorer.

For a trick that creates "pseudo-cookies" that work in both Navigator and Explorer, check out Chapter 27, "Power Scripting Toolkit."

From Here…

This chapter introduces the principles behind hits and access counters and how to implement them within your site. The techniques discussed can be extended in a variety of ways. For example, you can customize your access monitoring to create a special page for the one-millionth user access or, with user-specific tracking, display a special message (or link) to a user who is visiting for the 50th time.

Chapter 8, "Advertising with Billboards," demonstrates attaching a counter to an advertising banner to see how many times a particular advertisement is viewed.

Chapter 4 Saving Configurations with Cookies

CONTENTS

What Are Cookies?
- Cookies and Security
- Who Can Cook?
Cookie Specifics
Generating Cookies
Using JavaScript Cookies
From Here…

Even if you're running a framed site, you may find that you need to provide frame users the ability to turn off the frames for faster surfing. If you offer such an option, a nice addition for your site would be to have it "remember" that a particular user surfed without frames before, and then have it automatically return them to that mode the next time they visit. To pull this off, you need to be able to store information about the user. The Web mechanism that makes this possible is the persistent client-state HTTP object, more commonly referred to as a cookie.

What Are Cookies?

When you run a program on your computer, it may store information (window placement, the name of the last file loaded, and so on) for use the next time you fire up the same application. The Web can do a similar trick, storing information sent from the server for use during a future browser session. These little "tidbits" of data are called cookies, and they can literally consist of anything: a user ID and password, the number of times a person has visited a site, the date and time of the user's last visit, and so on.

With cookies, a Web master can do these three things:

Enhance the attractiveness of a site by using them to tailor the site to its visitors, therefore making the site more useful and enjoyable.
Track information internally to get a better idea of what people like and don't like on a site.
Add functionality and simplicity for the Web visitor.

Cookies are initially sent from the server to the browser, and are stored in a file by the browser until the next time you surf by the same page. The next time you drop by and your browser requests the page from the server, it also sends the server any cookies associated with that page (or page tree, as you'll see later).

Cookies and Security

With the growing concern about security and information privacy on the Web, there is a good deal of misinformation about exactly what cookies can and cannot do.

Because cookies are designed to store browser-specific (or user-specific) data, they can do the following:

Track your travel through a given site. Granted, a site doesn't need cookies to do this, but they make things a bit easier.
Help develop marketing or statistical information, but only if they store relevant data, such as pages visited, times visited, and so on.
Work through proxies and behind firewalls.
Remember configurations and other information that helps a commercial Web site better serve its visitors.

By themselves, cookies are not a security risk and cannot:

Get data from your hard drive.
Retrieve your e-mail address.
Steal credit card numbers, password files, or other sensitive information.

Of course, if you were to provide any of the above to an HTML form, it's not outside the bounds of the script that processes the form to turn around and write much of that data as a collection of cookies back to your computer.

Even if you do provide such information to a server and the server writes a cookie, the cookie is restricted (by design) to be related only to the server that wrote it. In other words, you can't write a server program that reads another server's cookies.

CAUTION

Some people recommend that you periodically delete the cookie file that your browser creates (for example, Navigator on the PC stores cookies in a file called cookies.txt in the same directory as the navigator.exe file) to ensure that sensitive information isn't stolen by unscrupulous servers.

While this does no damage to your system (if the browser can't find the cookie file, it simply starts up a new one), it constantly puts you in the position of being a "new user" for many sites that rely on cookies to help configure their site to your tastes.

Another tip is to "lock" your cookie file by making it read-only so no cookies can be written. This doesn't prevent cookies from being created, but it will prevent them from being saved.

TIP

If the thought of your server and browser exchanging information "behind your back" still bothers you, you can control cookies through a couple of different means: either through the browser directly or through a plug-in.

Internet Fast Forward is a plug-in that installs into Navigator and can be used to prevent or monitor cookie transmissions.

For control from within the browser itself, both Navigator 3.0 and Explorer 3.0 offer configuration options that allow the browser to warn you if a cookie is about to be exchanged and (optionally) not permit it.

Be cautioned, though, that some sites will simply not permit you to go any further should you refuse to store their cookie, as some of them use the cookie as a security access key or a tracking flag.

Who Can Cook?

While most of the people who surf the Web use Navigator, Explorer, or Mosaic, there are still many other browsers out there, and not all of them support cookies. Also, because of licensing, there are customized versions of even the popular browsers, and some don't support cookies.

Digital Equipment Corp. has put together a script that tests your browser for cookie support, as well as displays the results of its tests on a rather broad selection of browsers. Here's where you can find its script:

http://www.research.digital.com/nsl/formtest/stats-by-test/NetscapeCookie.html.

Browsers aren't the only restrictions to cookie use; several servers in use don't support cookies, either. For a list of servers that do, or servers that require specific configuration, check out this Web site:

http://www.illuminatus.com/cookie_pages/servers.html.

Cookie Specifics

Cookies are transmitted from the server to the browser within a document's header. If you were to look at the header block in transit, you'd see that a cookie has the following format:

Set-Cookie: name=value; expires=date; path=pathName; 
domain=domainName; secure

The five fields that make up a cookie are:

name=value: The name and value for the cookie, which can consist of anything. For example, if you were using a cookie to store the number of times a user has visited your site, you could use a name of Visits and a value that stores the number of hits, which could be incremented each time the user stops by.

TIP

While the name and value fields can contain any kind of data, it's recommended that you avoid spaces and special characters. If you need to embed spaces or special characters within a cookie, you should encode these characters using URL-style %XX coding, where characters are replaced with a percent sign (%) followed by their hexadecimal ASCII equivalent, such as %20 for a space.

expires=date: Specifies the lifetime of the cookie. The date is specified in the following format:
Wdy, DD-Mon-YYYY HH:MM:SS GMT
The date must be given in relation to GMT, so you must convert from local time to GMT before you set this field. If not specified, the cookie lasts only until the user closes the browser.

NOTE

While Internet Explorer requires the entire date string (including the time) before it recognizes a cookie as valid, Navigator, on the other hand, can deal with cookies whose expire strings are as short as 01-Jan-99 GMT.

path=pathName: Identifies the path on the server to which the cookie applies. Paths are defined "from the top down," meaning that the cookie will be good for all subdirectories below the specified directory. Most commonly, this is set to "/" (the root of the server), but if you are using another provider and running your site out of your own directories, you may wish to restrict the path to account for only your files. If not specified, path defaults to the path of the document that contains the cookie.

NOTE

Because of a bug in Netscape 1.1N, if you don't specify a path of at least "/" (the server root), the cookie won't get set.

domain=domainName: Specifies the domain for which the cookie will be returned. The value needs to have at least two or three periods (.) in it, depending on the top-level domain. Domains that end in ".com," ".edu," ".net," ".org," ".gov," ".mil," or ".int" require only two periods, while all other domains require three. For example, a domain of visi.com wouldn't work, but www.visi.com would. If not specified, domain defaults to the host name of the server that generated the cookie response.

NOTE

Requiring at least two periods keeps someone from creating a cookie that's good for all .com domains, for example.

secure: If present, indicates that the cookie should be transmitted only if the connection to the server is a secure one. If absent, the cookie will be sent regardless of the security of the connection.

CAUTION

Adding secure to the end of a cookie definition does not make the connection secure; it only keeps the cookie from being transmitted on a non-secure port. If you're not running a secure server (one that supports SSL) and you mark all your cookies as secure, none of them will be sent.
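
The date format is the field most easily gotten wrong, so a short sketch may help. The JavaScript below formats a date in the Wdy, DD-Mon-YYYY HH:MM:SS GMT layout and assembles a complete Set-Cookie line in the field order used in this chapter. The function names formatCookieDate and buildSetCookie are assumptions for the illustration, and the sketch encodes the value with URL-style %XX escapes.

```javascript
const DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Left-pad a number to two digits.
function pad2(n) {
  return (n < 10 ? "0" : "") + n;
}

// Format a Date as "Wdy, DD-Mon-YYYY HH:MM:SS GMT", always in GMT.
function formatCookieDate(date) {
  return DAYS[date.getUTCDay()] + ", " +
    pad2(date.getUTCDate()) + "-" +
    MONTHS[date.getUTCMonth()] + "-" +
    date.getUTCFullYear() + " " +
    pad2(date.getUTCHours()) + ":" +
    pad2(date.getUTCMinutes()) + ":" +
    pad2(date.getUTCSeconds()) + " GMT";
}

// Assemble a Set-Cookie header line; only name and value are required.
function buildSetCookie(name, value, options) {
  const opts = options || {};
  let header = "Set-Cookie: " + name + "=" + encodeURIComponent(value);
  if (opts.expires) { header += "; expires=" + formatCookieDate(opts.expires); }
  if (opts.path)    { header += "; path=" + opts.path; }
  if (opts.domain)  { header += "; domain=" + opts.domain; }
  if (opts.secure)  { header += "; secure"; }
  return header;
}

console.log(buildSetCookie("Visits", "12", {
  expires: new Date(Date.UTC(1996, 5, 10, 23, 14, 25)),
  path: "/"
}));
```

Because the formatting uses the UTC accessors of the Date object, no separate conversion from local time is needed.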

Of the various fields in a cookie, only name=value must be defined. Additionally, some other things to remember about cookies include:

Multiple cookies associated with a single document will be separated by "; " (semicolon-space).
The order you set a cookie's data fields in is important. Follow the order as listed within this chapter.
New cookies are written to the hard disk only when the user quits the browser. Modified cookies, however, are written out immediately.
To modify a cookie, the domain, path, and name portion of the data must match. Otherwise, it will make a new one.
According to the Netscape specification, a browser is required to hold only 300 cookies in total and only 20 cookies from the same server or domain. Browsers may choose to hold more cookies, but they aren't required to. If cookies are added beyond these limits, the oldest cookies in the file will be deleted.

Generating Cookies

Now that you know what a cookie is and what it does, it's time to look at how to "bake" your own. Cookies can be created in several different ways:

By sending a Set-Cookie header line in the header block that precedes the HTML document.
By embedding an HTML <META> tag within the document.
By manipulating the cookie string property of a document object.

The Set-Cookie Header

From the server side, probably the easiest way to create a cookie is to include a Set-Cookie header within the header block of an HTML object. You've already seen the Set-Cookie header line in the previous section:

Set-Cookie: name=value; expires=date; path=pathName; 
domain=domainName; secure

Listing 4.1 is an example of setting a cookie using the response header.

Listing 4.1 Set-Cookie

#!/usr/local/bin/perl

...
print "Content-type: text/html\n";
print "Set-Cookie: myCookie=NewCookie; expires=07-Sep-99 GMT\n\n";
print "<HTML><BODY>Cookie Set</BODY></HTML>";
...

NOTE

Within an HTML object's header block, the order of the headers (Content-type, Set-Cookie, and so on) isn't important. What is important is that the last header line has a blank line (an extra newline character) after it to inform the server that the header is finished and the object's body is coming next.

Deleting a cookie from an object is just as easy: you simply "set" the cookie, but make the expiration date sometime in the past:

print "Set-Cookie: myCookie=NewCookie; expires=01-Jan-70 GMT\n\n";

NOTE

Another way to delete a cookie is to "set" it, but leave the value attribute blank:

print "Set-Cookie: myCookie=; expires=07-Sep-99 GMT\n\n";

Unfortunately, Internet Explorer doesn't like this technique. If you attempt to delete a cookie in this manner, Explorer leaves the cookie untouched.

The HTML <META> tag

The HTML <META> tag provides one mechanism for setting cookies. To set a cookie using a <META> tag, you'd employ the following syntax:

<META HTTP-EQUIV="Set-Cookie" Content="...">

where the Content attribute would contain the name, value, expires, domain, path, and secure fields.

The downsides of using the <META> tag, however, include:

Currently, only Netscape Navigator supports cookie setting via the <META> tag.
Unless you use a server-side script to generate the document (and, therefore, the tag), the cookie value is fixed. While this may work for some applications, for counters it's not practical.

Because of the restrictions with the <META> tag, using the Set-Cookie header is the preferred method.

Retrieving Cookie Data

Once you've created a cookie or two, reading the data back from within Perl is no different from reading in HTML form data: you work through an environment variable. In the case of cookies, the variable is HTTP_COOKIE, and pulling it from the environment retrieves every cookie that applies to the document, including any cookies that were created for documents in directories above the particular document.

Because individual cookie fields (and the cookies themselves) are separated by a semicolon and a space, the Perl fragment in listing 4.2 easily creates an array of cookie data.

Listing 4.2 Retrieving Cookie Data

#!/usr/local/bin/perl

if(defined $ENV{HTTP_COOKIE}) {
   @cookieArray = split(/; /,$ENV{HTTP_COOKIE})
}

Once you've created your cookie array, scanning for a particular cookie is simple, as demonstrated in listing 4.3.

Listing 4.3 Getting a Cookie Value

# @cookieArray has been loaded previously
#
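sub GetCookie {
   my $cookieName  = $_[0];   # name of the cookie to look up
   my $cookieValue = undef;   # stays undefined if nothing matches

   foreach (@cookieArray) {
      # look for an entry that starts with "name="
      if ($_ =~ /^\Q$cookieName\E=/) {
         # a limit of 2 keeps any "=" inside the value intact
         (undef, $cookieValue) = split(/=/, $_, 2);
      }
   }

   $cookieValue;
}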

TIP

Listing 4.3 demonstrates a form of the Perl foreach statement that might be unfamiliar to some, because it has no "item variable" like the following version:

foreach $cookie (@cookieArray) { if($cookie =~ /$cookieName/) { ...

and instead uses the Perl special variable $_. $_ is Perl's default variable: when foreach is given no loop variable, each element of the list is assigned to $_ in turn, and a pattern match with no explicit target is applied to $_.
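
The two Perl fragments above can also be combined into a single lookup table. The JavaScript sketch below takes a string in the HTTP_COOKIE format and returns an object keyed by cookie name; the function name parseCookieHeader and the sample cookie names are assumptions for the illustration.

```javascript
// Split an HTTP_COOKIE-style string into an object keyed by cookie name.
function parseCookieHeader(header) {
  const cookies = {};
  if (!header) {
    return cookies; // no cookies were sent
  }
  for (const pair of header.split("; ")) {
    const eq = pair.indexOf("=");
    if (eq > 0) {
      // Everything after the first "=" belongs to the value.
      cookies[pair.substring(0, eq)] = pair.substring(eq + 1);
    }
  }
  return cookies;
}

const cookies = parseCookieHeader("Visits=12; frames=off");
console.log(cookies.Visits, cookies.frames);
```

Matching on the complete name, rather than on a pattern anywhere in the string, keeps a cookie named Visits from being confused with one named LastVisits.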

Browser Testing

Because some browsers don't support cookies, having a little script that can identify whether a user's browser does or doesn't is a nice little treat. You can then quietly direct them to the appropriate part of your site (cookie or cookie-less), and it demonstrates another Perl trick in the process.

Listing 4.4 is an example of such a "cookie taster." It works by:

Trying to set a test cookie.
Redirecting the browser to load the page again, with a query string switching the script into "taste test" mode.
Looking to see if the cookie previously set actually exists.
Redirecting the user to a different document, depending on whether the cookie exists.

This technique is worth including because it is transparent to the user; he or she just thinks that your first page takes a little too long to load.

Instead of printing a response, taste.cgi redirects the browser again to the appropriate page.

Listing 4.4 Cookie Taste Test

#!/usr/local/bin/perl

$me = 'taste.cgi';

if($ENV{'QUERY_STRING'} eq 'TEST') {
   if($ENV{'HTTP_COOKIE'} =~ /Cookie=Test/) {
      $newDoc = "cookieDoc.html";
   } else {
      $newDoc = "noCookies.html";
   }

   print "Location: $newDoc\n\n";
} else {
   # First pass: set a test cookie, then reload this script in "taste test" mode
   print "Location: $me?TEST\n";
   print "Set-Cookie: Cookie=Test\n\n";
   print "<HTML><BODY></BODY></HTML>";
}

NOTE

This will not work on all servers, because some servers optimize the header information by putting all header lines on one physical line by removing the newline characters between individual header fields. According to the HTTP specification, this is valid, but Netscape Navigator won't recognize a Set-Cookie directive unless it's on a line of its own.

For a list of servers that handle cookies properly (and those that don't, and why), check out:

http://www.illuminatus.com/cookie_pages/servers.html.

Using JavaScript Cookies

The cookie property of the document object is the JavaScript wrapper for the cookie interface. Just as cookies are a very long string in Perl, in JavaScript the cookie object is of the string type. Therefore, the manipulations to create, delete, and read cookies are very similar to their Perl counterparts.

Creating JavaScript Cookies

Listing 4.5 is a JavaScript function that creates a cookie. It takes advantage of a JavaScript function's ability to handle more parameters than are defined by testing the arguments property of the function.

Listing 4.5 Setting a Cookie with JavaScript

function SetCookie(name, value) {
   var argv    = SetCookie.arguments;
   var argc    = SetCookie.arguments.length;
   var expires = (argc > 2) ? new Date(argv[2]) : null;
   var path    = (argc > 3) ? argv[3] : null;
   var domain  = (argc > 4) ? argv[4] : null;
   var secure  = (argc > 5) ? argv[5] : false;

   document.cookie = name + "=" + escape(value)
      + ((expires == null) ? "" : ("; expires=" + expires.toGMTString()))
      + ((path == null) ? "" : ("; path=" + path))
      + ((domain == null) ? "" : ("; domain=" + domain))
      + ((secure == true) ? "; secure" : "");
}

You use this function as follows:

SetCookie(name, value [, expires, path, domain, secure]);

where "name," "value," "expires," "path," "domain," and "secure" correspond to the previously introduced cookie components. Note that the last four parameters are optional (as indicated by the square brackets). For example, to set a cookie named count to the number of times a user has visited your site, you could call SetCookie() as follows:

SetCookie("count", "5");

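which would make the count cookie available to the page that set it and to any documents below that page's directory on the same server, because "domain" and "path" revert to their default values. To make the cookie available to every page on your site, supply "/" as the path. The cookie itself, because the expires parameter wasn't specified, would exist only until the user closes his or her browser.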

Deleting a cookie through JavaScript is no different from deleting a cookie in Perl: simply set the cookie's expires parameter to a time in the past.
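
As a minimal sketch of that idea, the function below builds the cookie text that expires a cookie. The function name expiredCookieText is an assumption, as is the placeholder value "deleted"; in a browser you would assign the returned string to document.cookie. The path and domain, if they were given when the cookie was set, must be repeated so that the browser matches the same cookie.

```javascript
// Build the cookie text that deletes a cookie: the same name (plus the
// same path and domain, if any were used), with an expiry in the past.
function expiredCookieText(name, path, domain) {
  const past = new Date(0); // 1 January 1970, GMT
  return name + "=deleted" +
    "; expires=" + past.toUTCString() +
    (path ? "; path=" + path : "") +
    (domain ? "; domain=" + domain : "");
}

// In a browser: document.cookie = expiredCookieText("count");
console.log(expiredCookieText("count", "/"));
```

Keeping a non-empty placeholder value sidesteps the blank-value problem described for Internet Explorer earlier in the chapter.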

Retrieving JavaScript Cookies

Listing 4.6 demonstrates retrieving cookies through JavaScript. As with Perl, you scan through the cookie string looking for the substring name=, where name is the desired cookie. If the substring is found, everything after the equal sign and before the next semicolon will be the cookie's value.

Listing 4.6 Retrieving Cookies with JavaScript

function GetCookie(name) {
   var arg  = name + "=";
   var alen = arg.length;
   var clen = document.cookie.length;
   var i    = 0;

   while(i < clen) {
      var offset = i + alen;

      if(document.cookie.substring(i, offset) == arg) {
         var iEnd  = document.cookie.indexOf(";", offset);

         if(iEnd == -1) {
            iEnd = document.cookie.length;
         }

         return unescape(document.cookie.substring(offset, iEnd));
      }

      i = document.cookie.indexOf(" ", i) + 1;

      if(i == 0) {
         break;
      }
   }

   return null;
}

Accessing JavaScript Cookies in Internet Explorer

Internet Explorer supports cookies, but not from within JScript, Microsoft's name for its implementation of JavaScript. Fortunately, getting around this is relatively easy, if all you intend to use cookies for is keeping track of things during the current visit to your site.

Basically, you "wrap" the cookie functions with a browser test:

if(navigator.appName.indexOf("Netscape") != -1) {
   // Safe to use cookie object
} else {
   // no cookie object, go to Plan B
}

Even though Explorer doesn't support cookies from within JScript, you can still simulate them by doing the following:

Creating global variables of the same name as your cookies.
Using the conditional wrapping test shown. "Plan B" is a code block that either sets the global variable or retrieves its value.

This is actually easier than it sounds, thanks to the JavaScript eval() function, which takes its parameter and evaluates it as though it were a JavaScript statement. This means that

eval("myGlobal=10");

would set the global variable myGlobal to 10. This makes it possible to keep the cookie functions generic. Listing 4.7 is a code fragment that takes the GetCookie() and SetCookie() functions and sets them up to work within Explorer.

Listing 4.7 Explorer Cookies

function GetCookie(name) {
   if(navigator.appName.indexOf("Netscape") != -1) {
      // GetCookie manipulation code
   } else {
      return eval(name);
   }
}

function SetCookie(name, value) {
   if(navigator.appName.indexOf("Netscape") != -1) {
      // SetCookie manipulation code
   } else {
      eval(name + " = '" + value + "'");
   }
}

NOTE

It's important to point out that this technique works only if you've centralized your source code into a top-level frame. Once a page is unloaded, all the "cookie" data is lost. However, if you've located your JavaScript code within the parent document of your site, this trick has the same effect as creating cookies that last only for the duration of the user's browsing session, except that it's limited to your site instead of the entire browser session.
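
One caveat with the eval() approach is that the assignment is built from text, so a value containing a single quote (such as "Scotty's Place") produces a statement that won't parse. As an alternative sketch, the fallback values can be kept in a single object. The names fallbackStore, setFallback, and getFallback are hypothetical, and this swaps eval() for property access rather than reproducing the listing's technique.

```javascript
// Hypothetical stand-in for the eval()-based "Plan B": values live in
// one object instead of in separately named global variables.
var fallbackStore = {};

function setFallback(name, value) {
   // Cookies hold strings, so store a string here as well.
   fallbackStore[name] = String(value);
}

function getFallback(name) {
   // Mirror GetCookie(): return null when nothing is stored under name.
   return fallbackStore.hasOwnProperty(name) ? fallbackStore[name] : null;
}

setFallback("Title", "Scotty's Place");   // the quote causes no trouble
setFallback("count", 5);
```

The same limitation applies as with the global variables: the object must live in a document that stays loaded, such as the top-level frame.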

From Here…

This chapter presents a brief introduction to cookies, the mechanism by which you can store client-specific data on a user's computer. They can be helpful for many things, such as online ordering systems. An online ordering system could be developed using cookies that would remember what a person wants to buy. This way, if a person spends three hours ordering CDs at your site and suddenly has to get off the Internet, he or she could quit the browser and return weeks or even years later and still have those items in his or her shopping basket.

Site personalization is another use for cookies. This is one of the coolest uses. Suppose a person comes to your site but doesn't want to see any banner advertisements. You could allow him or her to select this as an option and from then on until the cookie expires, he or she wouldn't see them.

Also, you can use a cookie in conjunction with a server-side script to track the number of visits (or hits) to your site, or the number of times a single person visits.

For more information on related topics, check out:

Chapter 5, "Creating Personalized Home Pages," where you learn to design a home page using cookies.
Chapter 19, "Shopping Cart," where you examine how to retain product information as visitors "shop."

Chapter 5 Creating Personalized Home Pages

CONTENTS

There's No Place Like Home
Constructing Cookies
Homemade Perl
Enhancing Virtual Home Pages
From Here…

Everyone wants to carve out their own little niche on the World Wide Web, and accomplishing this is becoming easier every day. It started with service providers that offered space on their systems and an appropriately configured Web server for users to store Web pages, but that still required the user to do most of the scripting and construction.

One new twist on the personal home page is becoming more popular, especially with major sites like Microsoft, Netscape, and even Yahoo: a page that resides on their site that you can customize. Each person who visits that site will see a different page, depending on the options they chose.

There's No Place Like Home

If you look around the Web these days, you'll find many major services, such as Microsoft and Yahoo, that are offering you the option to create your own "home page" that's "hosted" on their systems. Unlike having an account on a provider, these home pages have a set number of configuration options, and as far as the server is concerned, they don't take up much physical disk space. In order to understand how this works, it's first necessary to look at what makes up a typical "home page."

When you stop and think about it, in the simplest sense, home pages are really nothing more than collections of information, with the choice of information left to the home page owner. Surf the Web a bit and look at some of the various home pages out there, then sit down and try to figure out what they all have in common. You'll probably come up with a list similar to this:

Page characteristics, like background color, text color, and so on.
Some sort of "welcome" message at the top.
Various little interesting bits of information (at least, information the owner deems interesting).
A collection of links the owner likes.

When you create your own home page, you have to write the HTML, find the graphics, and upload everything to your provider. But, if you make some arbitrary decisions about the content of a home page, you can reduce all this work to selecting various options from one or more lists. Effectively, you turn home page creation into an act of configuration instead of construction. Apart from storing the various parameters mentioned, all you need to add is a means for the user to customize or change his or her own configuration.

Constructing Cookies

One of the nice things about basing your home page generator entirely on cookies is that you can create the entire interface through JavaScript, with no need for server-side (Perl) access. In essence, all the parameters defining the page are stored as cookies, and the browser reads this data through JavaScript and dynamically builds the page each time it loads.

Before exploring the process involved, there are some limitations imposed by Netscape's cookie specification that need to be reviewed.

Limitations

Netscape's proposal for cookies places several limitations on the implementation. According to the specification, a browser only needs to be able to store:

300 cookies total stored in the cookie file (that's across all pages and domains).
20 cookies from any given domain (no matter how many pages may come from that).

With a little creativity, however, you can get around (or, at least, minimize) these limitations.

Custom Domain

One way to circumvent the maximum cookie count limitation is to create a custom subdomain by dedicating a server and unique IP address within your primary domain. For example, if your domain is:

myplace.com

dedicating a subdomain:

home.myplace.com

just for your cookie-controlled home pages gives you the maximum number of cookies possible (20) for your domain.

NOTE

If you create a custom subdomain, you must specify the complete domain when you create your cookies. Otherwise, the default behavior is to use the primary domain, which lumps your cookies together with all the others for that domain.
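
A minimal sketch of naming the domain explicitly is shown below. The helper cookieStringForDomain is hypothetical (it is not the SetCookie() function from chapter 4), and the path of "/" is an assumption chosen so the cookie is visible to every page on the subdomain.

```javascript
// Hypothetical helper: builds a cookie string that names its domain
// explicitly instead of relying on the browser's default.
function cookieStringForDomain(name, value, domain) {
   return name + "=" + escape(value) +
          "; domain=" + domain +
          "; path=/";
}

var example = cookieStringForDomain("Title", "My Page", "home.myplace.com");

// In a browser you would assign the result to document.cookie:
//    document.cookie = example;
```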

If you use a custom URL (or, even if you don't and simply use a special subdirectory instead), you can use one additional trick to make remembering the URL even simpler. Most servers are configured to retrieve a file with a particular name with only the domain and path given in the URL. This is called indexing, and it means that if the URL entered into the browser was:

http://home.myplace.com/

The document that is returned is (more than likely):

http://home.myplace.com/index.html

If "index.html" is the dynamic home page construction document, each user that logs in is presented with his or her own page; hence, no two surfers see the same page.

NOTE

Not all servers specify index.html as the default index file. Other popular names are: default.htm, index.htm, and index.shtml. Consult your server documentation or check with your provider to find out what the file name is for your system.

What if There's No Index File?

If the specified index file isn't found, or if the server isn't configured with default index names, the server will instead return a list of all the files within the specified directory, with each file configured as a hyperlink for opening or downloading. This can be a problem if you don't want to give users a look at the underlying structure of your site.

This is an important point to remember: If your server is configured to index, it will do it for any directory. This means that if you keep your graphics in a subdirectory (like images or gfx) and a user figures this out, which is possible by examining the code of your document, he or she can easily try to load:

http://home.myplace.com/images/

and will be presented with a list of all your graphics and any subdirectories below that as well. For the Web purist, this is an ugly option, and one that can easily be prevented.

One way to keep people from poking where you don't want them to is to place a small HTML file with the appropriate index name in each directory (except those directories where you're already providing such files) that consists of the following HTML code:

<HTML><HEAD></HEAD><BODY></BODY></HTML>

This effectively blanks the page, preventing the user from seeing what's there. However, this still indicates to a savvy user that the path he or she entered does exist; some people are just too curious, so a more drastic measure would be to have the file redirect the user back to your home page. You can do this by using the redirection techniques from chapter 1, "Browser Identification." This way, if they bounce to a subdirectory you don't want them in, you kick them back to where you do want them.

Understanding "Cookie Stacking"

Each individual cookie has a limit of its own. The cookie can't exceed 4K (4096 bytes) in size, including the name and the other information combined. This can be used to your advantage by squeezing more than one piece of data, or more than one record field, into the cookie. Just be careful not to exceed the 4096-byte barrier, as any data beyond that point will be truncated.
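
Stacking can be sketched as a pair of helpers that join several fields into one value and split them apart again, plus a rough size check. The function names are hypothetical, the asterisk separator is assumed never to appear inside a field, and the size check counts only the name, the equal sign, and the escaped value.

```javascript
var FIELD_SEPARATOR  = "*";    // assumed not to appear in any field
var MAX_COOKIE_BYTES = 4096;   // per-cookie limit

// Join several values into one string suitable for a single cookie.
function stackFields(fields) {
   return fields.join(FIELD_SEPARATOR);
}

// Split a stacked cookie value back into its individual fields.
function unstackFields(stacked) {
   return stacked.split(FIELD_SEPARATOR);
}

// Rough check that the name, "=", and escaped value fit in one cookie.
function fitsInCookie(name, value) {
   return (name.length + 1 + escape(value).length) <= MAX_COOKIE_BYTES;
}

var stacked = stackFields(["Netscape", "http://home.netscape.com/"]);
var fields  = unstackFields(stacked);   // fields[0] is the text, fields[1] the URL
```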

Overcoming Cookie Limitations

Whenever one of the cookie limitations is hit, whether it's the number of cookies per domain or the total number of cookies, the oldest cookies in the cookie file are deleted as needed to make space for new cookies. Therefore, the "fresher" your cookies are, the less chance they have of being thrown away.

One easy way to ensure freshness is to reset your cookies each time the user visits his or her home page.
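
As a rough sketch of such a refresh, the helper below takes the current cookie values and produces new cookie strings with a later expiration date. The name refreshedCookieStrings, the 30-day lifetime, and the sample values are all assumptions made for illustration.

```javascript
// Hypothetical helper: returns one cookie string per entry in values,
// each expiring the given number of days after the date passed in.
function refreshedCookieStrings(values, now, days) {
   var expires = new Date(now.getTime() + days * 24 * 60 * 60 * 1000);
   var result  = [];

   for (var name in values) {
      result.push(name + "=" + escape(values[name]) +
                  "; expires=" + expires.toUTCString());
   }

   return result;
}

// Sample values; in a browser these would come from GetCookie().
var visitDate = new Date(Date.UTC(1997, 0, 1));
var refreshed = refreshedCookieStrings({ Title: "My Page" }, visitDate, 30);

// In a browser, assign each string in turn to document.cookie.
```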

Implementing the Design

With the limitations and workarounds covered thus far in mind, it's time to start designing the "virtual home page." The first step is to get a general idea of what the page will look like, and the easiest way to do that is to write out the page in straight HTML first, then convert it to JavaScript.

Listing 5.1 shows the HTML that creates the simple home page shown in figure 5.1.

Figure 5.1: Building a dynamic home page is easiest if you first design the page with straight HTML then rewrite the dynamic parts into JavaScript. This method of "lay it out first, then code it," also works for developing Perl-based dynamic pages.

Listing 5.1 A Simple Home Page

<HTML>
<HEAD>
   <TITLE>Welcome to Scotty's Place</TITLE>
</HEAD>
<BODY BGCOLOR=#ffffff>

<CENTER>
   <H1>Welcome to Scotty's Place</H1>
</CENTER>

<HR>

<H2>Favorite Hang-outs:</H2>

<UL>
   <LI><A HREF="http://www.microsoft.com/">Microsoft</A></LI>
   <LI><A HREF="http://home.netscape.com/">Netscape</A></LI>
   <LI><A HREF="http://www.cnet.com/">C|Net Central</A></LI>
   <LI><A HREF="http://www.shareware.com/">Shareware.com</A></LI>
</UL>

<HR>

<A HREF="mailto:sjwalter@visi.com">Send Scotty email</A>

</BODY>
</HTML>

Looking at the HTML, you'll see that there are several different places for customization:

The user's name in the <TITLE> tag
One or more favorite links
The user's e-mail address at the bottom of the page

From this list, you can generate an HTML table/form structure that retrieves the data from the user and saves it as cookie information. The trick is that this table needs to be in the same physical file as the user's actual home page. In other words, the virtual home page document does double duty:

If the page hasn't been "configured," meaning no cookies are present, a default configuration page is displayed.
If the page has been configured (and cookies exist), the actual home page is shown.

This requires a little dynamic program control, and is best handled by structuring your page so that virtually all of the HTML is written by JavaScript. Listing 5.2 demonstrates the basic structure of the page. Note that listing 5.2 uses the GetCookie() function that was introduced in chapter 4, "Saving Configurations with Cookies."

Listing 5.2 Using JavaScript to Conditionally Display a Page

<HTML>
<HEAD>
<SCRIPT LANGUAGE="JavaScript">
<!--
...
if (GetCookie("Title") != null) {
   HomePage();
} else {
   ConfigurationPage();
}
// -->
</SCRIPT>
</HEAD>
</HTML>

The HomePage() and ConfigurationPage() functions do the dirty work of formatting the HTML code for the two possible HTML "documents" this page will create. Because the basic structure of the home page has already been presented (back in listing 5.1), the HomePage() function will be looked at first. This also helps to define what the various cookies are going to be called, which is necessary information when coding the ConfigurationPage() function.

The HomePage() Function

The HomePage() function, as shown in listing 5.3, pulls the various information from the cookies stored with the document and translates the data into a personal home page.

Listing 5.3 The HomePage() Function

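function HomePage() {
   // The <TITLE> tag must be opened before the title text is written.
   var tStr = "<HEAD>\n" + 
              "<TITLE>" + GetCookie("Title") + "</TITLE>" +
              "</HEAD>" +
              "<BODY>" +
              "<H1>" + GetCookie("Title") + "</H1>" +
              "<HR><UL>";

   for(var i=1; i<=3; i++) {
      // Configure() stacks each link as "text*URL" in a single cookie,
      // so split the value at the first asterisk.
      var tLink = GetCookie("Link" + i);
      var iStar = tLink.indexOf("*");

      tStr += "<LI>" + '<A HREF="' +
              tLink.substring(iStar + 1) + '">' +
              tLink.substring(0, iStar) + '</A></LI>';
   }

   tStr += "</UL>" +
           "<HR>" +
           '<A HREF="mailto:' + GetCookie("Email") +
           '">Send me email</A>';

   document.write(tStr);
   document.write("</BODY></HTML>");
}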

NOTE

One thing to keep in mind when building dynamic HTML like this: If you have any attributes within your tags that have "quoted" data (the value of the attribute being enclosed in quotation marks), you should probably use single quotes (') around the JavaScript string and double quotes (") around the value data.
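
A small illustration of that quoting convention follows. The helper name linkTag is hypothetical and is used here only to show the two kinds of quotes side by side.

```javascript
// Single quotes delimit the JavaScript strings, so the double quotes
// that surround the HREF value need no escaping.
function linkTag(url, text) {
   return '<A HREF="' + url + '">' + text + '</A>';
}

var tag = linkTag("http://home.netscape.com/", "Netscape");
```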

The ConfigurationPage() Function

For cookie data to be available to the HomePage() function, you need to have the user enter the information and "configure" the page. This is handled by the ConfigurationPage() function (listing 5.4).

Listing 5.4 The ConfigurationPage() Function

function ConfigurationPage() {
   var tStr = '<FORM>'
            + '   Title <INPUT TYPE=TEXT NAME="Title" SIZE=40>'
            + '   <P>'
            + '   Email <INPUT TYPE=TEXT NAME="Email" SIZE=40>'
            + '   <P>';

   for(var i=1; i<=3; i++) {
      tStr += 'Link #' + i +
              '<INPUT TYPE=TEXT NAME="Link' + i + 
              '" SIZE=30><BR>' +
              'URL  <INPUT TYPE=TEXT NAME="URL' + i +
              '" SIZE=40><P>';
   }

   tStr += '<INPUT TYPE=BUTTON VALUE="Configure" ' +
           'ONCLICK="Configure(this.form)">' +
           '<INPUT TYPE=RESET  VALUE="Reset">' +
           '</FORM>';

   document.write("<HTML><HEAD>" +
                  "<TITLE>Configuration</TITLE>" +
                  "</HEAD><BODY>");
   document.write("<H2>Configure your page</H2>");
   document.write(tStr);
   document.write("</BODY></HTML>");
}

All this function really does is build (through JavaScript) a document that contains an HTML form with all the fields necessary to create the appropriate cookies. Because this information isn't being sent to the server, you don't need to specify an ACTION attribute or a SUBMIT button; however, you do need to utilize the onClick event of a button on the form to fire another JavaScript function that actually sets the cookie data. This function, Configure(), is shown in listing 5.5.

Listing 5.5 The Configure() Function

function Configure(form) {
   SetCookie("Title", form.Title.value);
   SetCookie("Email", form.Email.value);

   for(var i=1; i<=3; i++) {
      var tTmp = eval("form.Link" + i + ".value")
               + "*"
               + eval("form.URL" + i + ".value");

      SetCookie("Link" + i, tTmp);
   }

   window.location.href = "index.htm";
}

Adding a Little Personality

You can give your virtual home page builder a little personality (or attitude, if you will) by using JavaScript to dynamically change the page each time the user drops by. One way to accomplish this is to take advantage of the JavaScript Date object to dynamically display a message to the user depending on the time of day he or she stops by. Listing 5.6 demonstrates this simple enhancement.

Listing 5.6 A Time-Varying Display

d = new Date();
hour = d.getHours();

if(hour < 5) {
   document.write("Doing a little late-night surfing, eh?");
} else if(hour < 6) {
   document.write("Up early, I see!  Do you have your coffee?");
} else if(hour < 12) {
   document.write("Good morning!");
} else if(hour < 18) {
   document.write("Good afternoon!");
} else {
   document.write("Good evening!");
}
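
The same decision can be sketched as a function that takes the hour as a parameter, which makes the boundaries easy to check without waiting for the clock. The name greetingForHour is hypothetical; the messages and cutoffs are copied from listing 5.6.

```javascript
// Hypothetical refactoring of listing 5.6: returns the message for a
// given hour (0 through 23) instead of writing it to the document.
function greetingForHour(hour) {
   if (hour < 5)  return "Doing a little late-night surfing, eh?";
   if (hour < 6)  return "Up early, I see!  Do you have your coffee?";
   if (hour < 12) return "Good morning!";
   if (hour < 18) return "Good afternoon!";

   return "Good evening!";
}

// In a page you would write the result into the document:
//    document.write(greetingForHour(new Date().getHours()));
```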

Homemade Perl

The cookie-based home page has the beauty of not requiring any fancy server-side manipulation because the entire process can be done within the browser through JavaScript. However, as you've seen, the limitations of the cookie specification, coupled with the paranoia about cookies that has spread through the Web community recently, make such an implementation practical for only a couple of audiences:

Casual surfers, meaning those who don't visit too many sites.
A company Intranet where surfing is restricted to the company server, as in a site that uses the browser as a server front-end application for employees.

If you want to make your home-page service available to the general public, a means is needed to store more information than can be kept in cookies and store it more permanently. This means Perl and a little server-side database work.

The Perl version of the page, available on the CD-ROM, isn't much different from the JavaScript version, except that instead of storing everything as a collection of cookies, it only stores one cookie: a unique file name that corresponds with a database file on the server containing the user's configuration information.

Enhancing Virtual Home Pages

The virtual home page designs presented here are simple, yet they should give you some ideas as to how to easily expand on their structure.

Here are some hints:

Extend the database to allow the user to specify a background image, perhaps from a collection of textures you keep on your site.

Let the user embed graphics within the page from a collection of images you provide.

Add a "quote of the day" or similar option, where you pull data at the time the user loads the page from another database and add it to the page.

Add the user's birthdate to the database, and have his or her home page wish him or her "Happy Birthday."

Extend the database to hold important dates and reminders.
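
As one example, the birthday hint could be sketched with a simple date comparison. The function name isBirthday is hypothetical, and how the birthdate is stored and retrieved is left open; note that a February 29 birthdate matches only in leap years with this test.

```javascript
// Hypothetical helper: both arguments are Date objects, and only the
// month and day of the month are compared.
function isBirthday(birthdate, today) {
   return birthdate.getMonth() == today.getMonth() &&
          birthdate.getDate()  == today.getDate();
}

// In a page you might use it like this:
//    if (isBirthday(userBirthdate, new Date())) {
//       document.write("Happy Birthday!");
//    }
```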

From Here…

This chapter took the tools from chapter 4, "Saving Configurations with Cookies," and extended them to create a "no-memory required" home page for your visitors.

For additional tips and tricks using these techniques, check out:

Chapter 15, "Managing a Database," which demonstrates how to create and maintain a database.
