
6. Webclient Command Line Flags

This section reviews some of the features controlled by command line flags.

6.1 Cookie Handling

There are two types of "cookies" in common usage on the web. The first kind follows the HTTP Cookie specification and is sent as part of the HTTP header. This section discusses this type of cookie. The second kind of "cookie" is a string that is embedded in the URL itself, and is passed from server to client by embedding it directly into the body of a web page (usually in some url-encoded form). This second type of cookie is also supported by webclient, and the mechanisms for dealing with it are discussed in further detail in the section URL-embedded State below.

webclient will automatically accept and cache any and all cookies returned by the server. The cookies will then be handled following the usual cookie semantics for a browser: if path names match, then the cookie will be returned to the server. webclient does *not* age cookies, and thus, they will not expire from the cache in that fashion. Also, webclient does not maintain a persistent store of cookies: once webclient exits, any cookies it had are lost.

In order to make webclient a more realistic multi-user stress tool, it will flush the cookie cache at the end of a session. That is, each new session is started with an empty cookie cache, simulating a new user with a recently restarted browser.
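The caching rules described above can be pictured with a small Python sketch. It is not webclient's implementation: the class name `CookieCache` and its methods are hypothetical, and path matching is assumed to be a simple prefix test.

```python
class CookieCache:
    """Hypothetical per-session cookie cache; not webclient's actual code."""

    def __init__(self):
        self._cookies = {}  # (path, name) -> value

    def store(self, path, name, value):
        # Accept every cookie the server returns; no expiry is tracked.
        self._cookies[(path, name)] = value

    def cookies_for(self, request_path):
        # Assumption: a cookie is returned when its path is a prefix
        # of the path being requested.
        return {
            name: value
            for (path, name), value in self._cookies.items()
            if request_path.startswith(path)
        }

    def flush(self):
        # Called at the end of each session, so the next one starts empty.
        self._cookies.clear()


cache = CookieCache()
cache.store("/shop", "sid", "abc123")
print(cache.cookies_for("/shop/cart"))  # cookie path matches
print(cache.cookies_for("/help"))       # no matching path
cache.flush()
print(cache.cookies_for("/shop/cart"))  # empty once the session has ended
```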

In order to help verify correct operation, webclient can be made to check for the presence of cookies on certain paths, and to print an error and exit if the server did not return a cookie for that path. The --cookie-path flag can be specified any number of times to add a path to the error checking code.

Some web and application servers refer to a state maintenance technique called url-encoding in connection to a discussion of cookies. Note that url-encoding does not use cookies in the sense in which the HTTP spec implies; rather, the server embeds unique, long strings directly into the urls in the body of the web page. These long strings are used by the server to provide a cookie-like function. webclient provides support for these types of "url-cookies", and is able to track them with the --handle flag described in the section URL-embedded State below.

By default, in order to maintain backwards compatibility, webclient will check for the presence of a cookie on the path /proclogin.ns. This is equivalent to giving the flag --cookie-path=/proclogin.ns on the command line. If any cookie path is explicitly specified, then the default /proclogin.ns is not set.

6.2 GIF Fetching

The -g and the -c flags enable the fetching and caching of images. webclient is able to scan a web page and fetch any images that it finds embedded in the page. It does so by scanning the returned page for references of the form IMG SRC= and extracting and fetching the specified URL. It uses a fairly sophisticated pattern-matching algorithm to find the URL, and is able to pick its way through some more obtuse quotation mark and white-space combinations, such as those that might occur in JavaScript. Note, however, that webclient does not provide a JavaScript interpreter, and that therefore it can get confused by more complex image-fetching JavaScript applets. It does not support images fetched with client-side Java applets. Images are fetched only if the -g option is set; by default, image-fetching is disabled.

Emulation of a browser's gif-cache is supported with the -c flag. That is, if webclient notices that it has previously fetched a given gif url this session, it will not fetch that url again. The result is that the number of gif files fetched by webclient should match the number of gif files fetched by the browser during an entire session, assuming that the gif cache was empty when the user requested the server's logon page. If the -c option is not specified, every gif is fetched every time the page is requested.

By default, webclient uses four threads and the HTTP/1.1 Persistent Connection protocol for fetching gif files in parallel over four sockets. The number of threads and the protocol used can be changed as explained below.

Note:

  1. The gif cache is cleared every time a session replay starts (e.g., when the logon request is issued).
  2. gifs are fetched on subsequent requests if they have not yet been fetched this session replay.
  3. All open sockets are closed when the session ends. This helps maintain the appearance that each session comes from a different web browser.
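A rough Python sketch of the scan-and-cache behaviour described in this section follows. The regular expression is a deliberately simplified assumption (webclient's own matcher is described as more sophisticated), and the function name `images_to_fetch` is hypothetical.

```python
import re

# Assumption: a simplified pattern; it handles quoted and unquoted SRC values only.
IMG_SRC = re.compile(r'<img\b[^>]*?\bsrc\s*=\s*["\']?([^"\'\s>]+)', re.IGNORECASE)


def images_to_fetch(page_html, session_cache, use_cache=True):
    """Return the image URLs to fetch for this page, in page order."""
    urls = []
    for url in IMG_SRC.findall(page_html):
        if use_cache and url in session_cache:
            continue  # already fetched this session (the -c behaviour)
        session_cache.add(url)
        urls.append(url)
    return urls


cache = set()  # a fresh, empty cache at the start of each session replay
page = '<IMG SRC="/a.gif"> <img src=/b.gif> <IMG SRC="/a.gif">'
print(images_to_fetch(page, cache))  # first visit to the page
print(images_to_fetch(page, cache))  # second visit: nothing left to fetch
```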

6.3 HTTP/1.0, HTTP/1.1, KeepAlive and Multi-Threading

By default, webclient uses four threads and the HTTP/1.1 Persistent Connection protocol for fetching gif files in parallel over four sockets. This behaviour can be modified with three flags: --no-keep-alive, --num-threads=nnn and --http-version=1.x

By default, the HTTP/1.1 protocol specifies that Persistent Connections are to be used when a browser talks to the web server. This means that once the browser has opened a socket to the server, it keeps that socket open for further URL requests, which eliminates the overhead of negotiating a new socket for each request. By default, webclient does the same, in order to better emulate a real web user. However, this behaviour can be disabled by specifying the --no-keep-alive flag. This flag causes the Connection: Close header field to be added to the HTTP header, and the socket to be closed after all of the data has been received.

The de facto industry-standard Netscape extensions to the HTTP/1.0 protocol had a similar concept, called Keep-Alive. webclient can be made to use this protocol by using the --http-protocol=1.0 flag. Currently, there are only two valid values that this flag can take: HTTP/1.0 and HTTP/1.1. When HTTP/1.0 is specified, webclient will try to use Keep-Alive by including the header field Connection: KeepAlive with each request (and keeping the socket open). This can again be disabled by using the --no-keep-alive flag.

To further improve performance, browsers open a number of sockets to the web server for fetching gifs in parallel. The default number of sockets is four for both Netscape(TM) Navigator and Microsoft(TM) Internet Explorer, although users can adjust this value from the control panel or preferences dialog. To emulate this behaviour, webclient maintains a pool of four threads for gif fetching. Each thread handles the i/o on one socket. The number of threads (and thus the number of sockets) that are used can be changed with the --num-threads=nnn flag.
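The idea of a fixed pool of fetcher threads can be sketched in Python with the standard `concurrent.futures` module. This is only an illustration: `fetch_all` and `fake_fetch` are hypothetical names, and unlike webclient the sketch does not bind each thread to its own persistent socket.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch_one, num_threads=4):
    """Fetch urls in parallel; results come back in the order of urls."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(fetch_one, urls))


def fake_fetch(url):
    # Stand-in for a real socket fetch, so the sketch runs offline.
    return (url, 200)


print(fetch_all(["/a.gif", "/b.gif", "/c.gif"], fake_fetch))
```

Changing `num_threads` plays the role that --num-threads=nnn plays for webclient.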

Note that once webclient has opened a socket to the server, it will keep it open indefinitely (as long as the --no-keep-alive flag wasn't specified). However, webservers have only a limited pool of connections, and busy webservers will routinely close the socket on unsuspecting browsers. webclient does notice when this occurs, and keeps statistics on how often it was able to reuse an open socket, and how often an open socket was unexpectedly closed by the server. These stats are printed as part of the normal stats output.

Note:

  1. All open sockets are closed at the end of the session. Sockets are not kept alive across sessions. This helps maintain the appearance that each session comes from a different web browser.

6.4 Substitution and Re-Writing

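webclient supports a number of substitution and re-writing modes. These include header modification and key-value substitution, tracking of URL-embedded state, and generic substitution in GET requests and POST bodies. Each of these is discussed in greater detail below.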

6.5 Header Modification and Key-Value Substitution

The HTTP headers generated and sent by webclient can be fully customized and rewritten. By default, webclient sends a simple, basic HTTP header. A fully customized header can be specified with the --header-file flag, or alternately, the header can be placed in the input file, using the <<HEADER>> directive.

Whether or not a custom header has been specified, key-value pairs in the header can be substituted for or added to the header with the --header-subst and --header-add flags. These flags are particularly useful when creating multi-user scripts, where each running copy of webclient needs to send a slightly different header. In particular, this is needed in order to perform HTTP-style authentication.

The default header that webclient currently sends should resemble:


User-Agent: webclient/WebLoad v4.0beta3 (Linux OpenSSL 0.9)
Host: webby.com:80
Referer: webby.com/page.html
Accept: */* 
Accept-Language: en
Accept-Charset: iso8859-1, *, utf-8

The User-Agent value will reflect the current actual version of webclient. It can be modified with the -U flag described below, or by specifying a custom header. It can be omitted by using a custom header which does not contain it.

The Host value is automatically generated and updated by webclient depending on the server being contacted. If this tag is present in the header, then webclient will always update its value as appropriate. It can be omitted by using a custom header which does not contain it.

The Referer tag will be automatically added and updated based on the most recent URL that webclient had requested. There is currently no way to disable the presence or automatic update of this tag.

The --header-file Flag

A fully customized HTTP header can be specified with the --header-file command-line flag. For example, the command webclient --header-file=some.file.name specifies a header that will be used for all fetches, including the fetching of gifs. A typical header file might look like the following:


Accept: image/gif, image/x-bitmap, image/jpeg, image/png
Accept-Language: en
Pragma: no-cache
Authorization: Basic amFtZXM6amQpMrT=

Note that the header file should not contain the HTTP method (viz. GET, POST); this is handled separately. Nor should it contain the body for a POST request, which is handled separately with the <<POSTDATA>> input file directive. The header file should not contain blank lines or comment lines. It will be parsed into key-value pairs which can be substituted for with the --header-subst and --header-add flags.

The --header-subst and --header-add Flags

Values in the HTTP header can be substituted for with the --header-subst flag. For example, webclient --header-subst="Accept-Language: fr" will change the value of the Accept-Language tag in the header to be fr. The substitution will only be made if the tag already appears in the header. If the tag does not appear, then the substitution will not be made.

The --header-add flag can be used to make a substitution for an existing value, or to add the tag-value pair if it is not already present.
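The difference between the two flags can be summarized in a short Python sketch. The function names are hypothetical, and the case-insensitive key comparison is an assumption about webclient (HTTP header names themselves are case-insensitive).

```python
def header_subst(headers, line):
    """Replace the value of an existing key; do nothing if the key is absent."""
    key, _, value = line.partition(":")
    key, value = key.strip(), value.strip()
    return [(k, value if k.lower() == key.lower() else v) for k, v in headers]


def header_add(headers, line):
    """Replace the value of an existing key, or append the pair if absent."""
    key, _, value = line.partition(":")
    key, value = key.strip(), value.strip()
    if any(k.lower() == key.lower() for k, _ in headers):
        return header_subst(headers, line)
    return headers + [(key, value)]


headers = [("Accept", "*/*"), ("Accept-Language", "en")]
print(header_subst(headers, "Accept-Language: fr"))  # value replaced
print(header_subst(headers, "Pragma: no-cache"))     # unchanged: key absent
print(header_add(headers, "Pragma: no-cache"))       # pair appended
```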

6.6 Example: HTTP Authorization

Some web sites require authentication using the HTTP 401 response code in conjunction with the Authorization header field. That is, the web server will deny access to a web page unless the browser (webclient) supplies a field of the form


Authorization: Basic amFtZXM6amQpMrT=

in the header sent with the URL request. The string of seemingly random letters is an encoded username-password pair. Appropriate values for the encoded string can be obtained by using the webmon tool with tracing enabled. These values can be placed in the webclient request header file. Alternately, it might be more convenient to specify these on the command line, using the --header-add flag. This is particularly the case when multiple copies of webclient must run, each with its own login. The following can be used to add the above line to the header:


webclient --header-add="Authorization: Basic amFtZXM6amQpMrT="
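For the Basic scheme, the encoded string is the Base64 encoding of the text `username:password`, so it can also be computed directly instead of being captured from a trace. The Python sketch below illustrates this; the credentials are hypothetical and the function name `basic_authorization` is made up for the example.

```python
import base64


def basic_authorization(username, password):
    """Build an Authorization header line for the HTTP Basic scheme."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Authorization: Basic " + token.decode("ascii")


# Hypothetical credentials, for illustration only.
print(basic_authorization("alice", "s3cret"))
```

The printed line is in the form expected as the value of the --header-add flag.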

The difference between the --header-subst and the --header-add flags is that the former will make the substitution only if the key is already present in the header, whereas the latter will either substitute or will add the key-value pair if it is not present.

6.7 Example: Setting the User-Agent Tag

By default, webclient sets the User-Agent tag in HTTP headers sent to the server to webclient 4.0pre0 (Linux) or similar. However, some web servers (in particular, the Netscape Enterprise Server) check the User-Agent type, and respond differently to different browser types. Sometimes the differences are subtle, and yet they can change overall behavior dramatically: things like redirects, socket close semantics and returned headers can change, and sometimes even bugs will be exhibited for some cases but not others.

To get webclient to trick the webserver into behaving more appropriately, the -U flag can be used to change the value of the User-Agent field.

Before you start, you must figure out what the browser you are trying to impersonate is sending. To do this, use webmon with the -t (trace) option. Run a few requests and then stop webmon. Look in the trace file for a line that begins with User-Agent:. The string that follows this is the string that must be specified on the -U option. For example, with Netscape 4.04, under AIX, the string is:


User-Agent: Mozilla/4.04 [en] (X11; AIX 4.1; Nav)

You would then pass this flag to webclient as shown below. Note the use of the single quote marks to delimit the string. The quotes are needed whenever there is embedded whitespace in the string, and also to delimit shell special characters, such as "(".

(Note: The DOS shell under Windows95/98/NT cannot use quote marks to delimit a string. In order to prevent the embedded blanks from causing a problem, convert them to hash marks (# signs). webclient will automatically convert them back into spaces).


webclient -U 'Mozilla/4.04 [en] (X11; AIX 4.1; Nav)'

Some other User-Agent strings:


-U 'Mozilla/3.0 (Win95; I)'     Netscape Version 3 for Windows 95
-U 'Mozilla/3.04 (Win95; U)'    Netscape Version 3 for Windows 95
-U 'Mozilla/2.02 (OS/2; U)'     Netscape Version 2 for OS/2
-U 'Mozilla/4.04 [en] (X11; U; AIX 4.2; Nav)'           NS for AIX
-U 'Mozilla/4.05 [en] (X11; U; Linux 2.0.32 i586)'      NS for Linux

Note that the -U flag is entirely equivalent to the longer, more verbose flag --header-add. The previous example is completely equivalent to the following:


webclient --header-add="User-Agent: Mozilla/4.04 [en] (X11; AIX 4.1; Nav)"

6.8 URL-embedded State (URL Cookies)

Some web-site designs embed customer-specific information into URLs as an alternative mechanism to "cookies" for maintaining state information. webclient can track this state information in an automated fashion, generating the appropriate URLs dynamically as it traverses a web site. There is a restriction: webclient assumes that the state information is url-encoded as a key-value pair in the URL.

This is best illustrated with an example. Suppose that when a user visits a website, they request the URL /cgi-bin/firstpage, and that the page that is issued in response contains the URL /cgi-bin/secondpage?this=that&token=qwertyuiop&up=down where the string qwertyuiop is generated dynamically and differs for every visitor to the site. Then webclient can be configured to navigate this site by using an input file similar to the following:


GET /cgi-bin/firstpage
GET /cgi-bin/secondpage?this=that&token=xxx&up=down

and using the command line


webclient --handle=token

This will cause webclient to scan each web page it receives for new values of the key "token", and substitute for its value in any subsequent GET or POST requests, including POST data bodies. The particular value "xxx" used in the input file does not matter. Substitutions for multiple handles can be done by specifying as many --handle= flags as needed.

Note that if a token appears multiple times on the same page with different values, webclient will record only the last value that it finds on the page. This may not be the desired behavior in some cases. Note also that an ampersand (&), white space, a (single or double) quote-mark, or a right angle bracket (>) is assumed to delimit the end of the token.
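The tracking rules above (last value on the page wins; the value ends at an ampersand, white space, a quote mark, or a right angle bracket) can be sketched in Python as follows. The function names `scan_handle` and `apply_handle` are hypothetical, and the regular expressions are an approximation rather than webclient's actual parser.

```python
import re

# Characters that end a value: '&', white space, quote marks, or '>'.
VALUE = r"[^&\s\"'>]"


def scan_handle(page, key, current=None):
    """Return the last value of key found in page, or current if none."""
    pattern = re.compile(r"(?<!\w)" + re.escape(key) + "=(" + VALUE + "+)")
    values = pattern.findall(page)
    return values[-1] if values else current


def apply_handle(request, key, value):
    """Replace the placeholder value of key in a request line or POST body."""
    pattern = re.compile(r"(?<!\w)(" + re.escape(key) + "=)" + VALUE + "*")
    return pattern.sub(lambda m: m.group(1) + value, request)


page = '<a href="/cgi-bin/secondpage?this=that&token=qwertyuiop&up=down">next</a>'
value = scan_handle(page, "token")
print(apply_handle("GET /cgi-bin/secondpage?this=that&token=xxx&up=down", "token", value))
```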

6.9 Substituting in GET Requests and POST Bodies

The flag --substitute can be used to make generic substitutions in the request URI and in the POST body. Thus, for example, if the input file to webclient contains a URL of the form GET /some/where/blort.html, and the client is started as webclient --substitute=blort:page001 then the actual URL that will be requested will be /some/where/page001.html.

This substitution mechanism allows webclient to be used in perl and shell scripts, where different URLs need to be fetched by different clients, but maintaining dozens or hundreds of client-specific URL files is not desired. Typically, this flag is used to substitute for user-names and passwords (see below). Substitutions are carried out in both the URLs and the POST bodies. As many --substitute flags can be specified as needed.
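As a minimal Python sketch of the mechanism, the flag value can be treated as an `old:new` pair that is applied as a plain text replacement. Splitting at the first colon is an assumption, and both function names are hypothetical.

```python
def parse_substitution(arg):
    """Split an 'old:new' flag value at the first colon (assumed format)."""
    old, sep, new = arg.partition(":")
    if not sep or not old:
        raise ValueError(f"expected old:new, got {arg!r}")
    return old, new


def substitute(text, pairs):
    """Apply each (old, new) pair, in order, to a URL or POST body."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text


pairs = [parse_substitution("blort:page001")]
print(substitute("GET /some/where/blort.html", pairs))
```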

6.10 Example: Substitution for <<USER>>, <<PIN>>, and <<PASSWD>>

When benchmarking password-protected web sites, each copy of webclient will typically need to use its own username/password pair. Authentication by web sites is usually handled in one of two different ways: either by using the HTTP Authorization mechanism or by embedding the username and password into the request or post data. The former approach was discussed above; the latter approach can be handled with the -u flag. Rather than creating a unique input file for each client, with a username/password hard-coded into the input file, a substitution can be performed. Thus, for example, if the input file contains the request


GET /path/to/cgi?login=<<USER>>&idcode=<<PIN>>&pwd=<<PASSWD>>

and you wanted to substitute the values linas, 1234 and r00tp4ssw0rd for the login, idcode and pwd, you could specify


webclient --substitute='<<USER>>:linas'  \
          --substitute='<<PIN>>:1234'  \
          --substitute='<<PASSWD>>:r00tp4ssw0rd'

on the command line. Alternately, you can use the abbreviated form with the -u flag, by merely specifying


webclient -u linas:1234:r00tp4ssw0rd
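The equivalence between the -u shorthand and the three --substitute flags can be sketched in Python. The function name `expand_user_flag` is hypothetical, and letting any further colons fall into the password field is an assumption.

```python
def expand_user_flag(arg):
    """Turn 'user:pin:passwd' into the three equivalent substitution pairs."""
    # Assumption: only the first two colons separate fields.
    user, pin, passwd = arg.split(":", 2)
    return [("<<USER>>", user), ("<<PIN>>", pin), ("<<PASSWD>>", passwd)]


request = "GET /path/to/cgi?login=<<USER>>&idcode=<<PIN>>&pwd=<<PASSWD>>"
for placeholder, value in expand_user_flag("linas:1234:r00tp4ssw0rd"):
    request = request.replace(placeholder, value)
print(request)
```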

6.11 Error Detection and Reporting

webclient contains a number of facilities to simplify error detection and reporting. Some of these are described below.

6.12 Clean Exit On Error

Some web site designs prevent a user from logging on more than once at the same time. There are a variety of reasons to design a web site in this way, and many websites enforce this. When using webclient to access such a site, it becomes desirable to log the user off in the case of an error, so that the user is not blocked from making future logins.

Note that simply logging off by running webclient a second time may not be an option because websites that enforce logins usually use cookies to keep track of the user. That is, a user cannot log-off unless they also present the right cookie. When webclient exits for any reason, the current cookie(s) are lost, and thus it can become impossible to log-off after webclient has exited. In order to work in this environment, a log-off script can be specified with the --clean-exit flag.

In the case of an error, or if it is interrupted, webclient can be made to send a series of HTTP requests by using the --clean-exit flag to specify a file containing the HTTP requests to run. The format for the clean-exit file is the same as the input file. Errors that can trigger a clean exit include any unexpected HTTP errors (such as 404 Not Found, 500 Server Error, etc), timeouts (due to the use of the -A flag), or an interrupt (ctrl-C from the terminal or SIGUSR1, or SIGINT from a controlling shell script). Note that this last usage simplifies the management of multiple copies of webclient via controlling scripts.
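The control flow of a clean exit can be sketched in Python. This is a schematic only: `run_session` and `fake_fetch` are hypothetical, and a real client would also need to map signals such as SIGUSR1 onto the same cleanup path.

```python
def run_session(requests, cleanup_requests, fetch):
    """Replay requests; on any error or interrupt, replay the cleanup list."""
    try:
        for request in requests:
            fetch(request)
    except (Exception, KeyboardInterrupt):
        # Still the same process, so cookies gathered so far remain usable.
        for request in cleanup_requests:
            fetch(request)
        raise


log = []


def fake_fetch(request):
    # Stand-in for a real HTTP fetch; fails on one URL to trigger cleanup.
    log.append(request)
    if request == "GET /broken":
        raise RuntimeError("500 Server Error")


try:
    run_session(["GET /login", "GET /broken", "GET /never"], ["GET /logoff"], fake_fetch)
except RuntimeError:
    pass
print(log)
```

The point of running the log-off requests inside the failing process is that they are sent while the session's cookies still exist.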

6.13 Page Validation with Check Sums

Webclient is designed to check the validity of the data that is returned for a particular request by calculating a check sum for that page. It then compares the check sum to the one that is stored in the session request file. If the check sum does not match, then webclient assumes that an error has occurred. Checksum mismatches normally cause webclient to print a detailed error message and trace information, and then stop. If instead, you want it to continue, and just print a warning message, then specify the --warn-checksums option.

However, checksums can be troublesome when a web page includes variable, changing data, such as the current date or time, or a rotating banner advertisement, or other data that changes daily and/or every time the web page is fetched.

To work around pesky checksum pages, validation can be disabled in one of two ways: on a per-URL basis, or for the entire run. Validation can be disabled on a per-URL basis simply by editing the input file, and setting the checksum value equal to "-1". This will cause validation for that page to be skipped. Validation of checksums for the entire run can be disabled by specifying the -i option to webclient. In general, it is important not to disable checksums globally, since if you do, the server could return completely bogus data and you will never find out that you are timing a bogus page.

The HTTP header is not included in the checksum calculation; therefore variations in the header due to cookies, expiration date pragmas, or server versions will not affect the checksum.
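The following Python sketch illustrates body-only validation and the "-1" escape value. The checksum algorithm webclient uses is not stated here, so `zlib.crc32` is used purely as a stand-in, and the function names are hypothetical.

```python
import zlib


def body_checksum(response):
    """Checksum the body only; everything up to the blank line is header."""
    _header, _sep, body = response.partition(b"\r\n\r\n")
    return zlib.crc32(body)  # stand-in algorithm, not webclient's own


def page_is_valid(response, expected):
    # An expected value of -1 disables validation for this request.
    return expected == -1 or body_checksum(response) == expected


a = b"HTTP/1.1 200 OK\r\nSet-Cookie: sid=1\r\n\r\n<html>hello</html>"
b = b"HTTP/1.1 200 OK\r\nSet-Cookie: sid=2\r\n\r\n<html>hello</html>"
# The headers differ but the bodies match, so the checksums agree.
print(body_checksum(a) == body_checksum(b))
```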

If the web pages are changing only infrequently, the -v flag can be used to recompute the check sums, and output a new session file with the new checksums in it. Alternately, the -v flag can be used to create checksums for a request file that does not already have them. (In normal operation, the session file will have been created by webmon, and webmon will have computed and written out the appropriate checksum. This is the preferred mode of operation, as the correctness of the web page can be visually inspected with webmon.)

6.14 Think Time Distributions

Webclient supports the concept of 'think time' in order to better simulate multi-user loads on a server. The think time is the amount of time that webclient pauses between URL requests, simulating a user who has stopped to read a web page before clicking on the next hyperlink. The think time may be specified either in the input file, or with the --think-time=<float> flag. The <float> parameter specifies the time, in seconds, as a floating point number.

Think-times that are fixed or are randomly distributed may be specified. By default, an exponentially random distribution is used, although a gaussian or a fixed distribution may also be specified. One of these mutually-exclusive distributions may be specified with the --think-fixed, --think-exponential or the --think-gaussian flags. The image below shows both random distributions, for a mean think time of ten seconds.

The exponential distribution is given by

P(t;m) = (1/m) exp (-t/m)
where m is the mean. The standard deviation of the exponential distribution is m.

The gaussian distribution is given by

P(t;m) = 2 L t exp (- L t^2)
where L = pi / (4 m^2), where pi = 3.14... and m is the mean. Note that the standard deviation is given by m sqrt (4/pi - 1) = 0.5227... m.
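Both densities can be sampled by inverting their cumulative distributions, which are 1 - exp(-t/m) and 1 - exp(-L t^2) respectively. The Python sketch below shows one way to do this; it is not taken from webclient, and the function names are hypothetical.

```python
import math
import random


def think_exponential(mean, rng):
    """Inverse-CDF sample of P(t;m) = (1/m) exp(-t/m)."""
    return -mean * math.log(1.0 - rng.random())


def think_gaussian(mean, rng):
    """Inverse-CDF sample of P(t;m) = 2 L t exp(-L t^2), L = pi / (4 m^2)."""
    big_l = math.pi / (4.0 * mean * mean)
    return math.sqrt(-math.log(1.0 - rng.random()) / big_l)


rng = random.Random(12345)  # fixed seed, for repeatable think times
n = 100_000
for sampler in (think_exponential, think_gaussian):
    sample_mean = sum(sampler(10.0, rng) for _ in range(n)) / n
    print(sampler.__name__, round(sample_mean, 1))
```

With a large sample, both sample means should come out close to the requested mean of ten seconds.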

The exponential distribution has been long accepted as an appropriate model for typing behaviour at a computer terminal keyboard. The gaussian distribution, with a small probability of a small think time, might more accurately describe web browser users.

6.15 Other Flags

The following flags are not documented above, but are still very important and useful:

Debugging & Tracing

-h, --help

Print a command summary and exit.

--version

Print webclient version info and exit.

-r, --report-file=<file>

Specify the file to which the webclient report will be written. What is written to the report file is nearly identical to what webclient writes to standard out, unless the --quiet flag has been specified.

--access-log=<file>

Write a webserver-style access log. The industry-standard logfile format is used. Note that the ip address written in the logfile is that of the server that was contacted. The result code is the result code that webclient received, and the length is the length (including the header) that was received. The timestamp that is printed is taken after the entire message has been received.

-t, --trace-file=<file>

Write a trace of all of the HTTP traffic to the indicated file.

--skip-stats

Don't collect or print summary performance statistics.

-e, --print-each

Print individual response time observations.

-x, --timestamps

Print request start and end timestamps.

--quiet

Minimize the number of messages written to standard out.

--show-progress

Write out each URL as it is fetched. Useful for visually inspecting the forward progress of webclient through a series of requests. Note that this can generate a lot of output on a fast system.

--no-bug-compat

Disable bug-compatibility mode. Currently, this flag disables only one bug: the 'Content-Length-off-by-two' bug. In this bug, the Netscape browser will send a POST body with an appended CRLF, and then will set the Content-Length in the header two bytes shorter than the actual message. Unfortunately, some servers, notably Sun url-decoding Java Servlets, depend on this incorrect length being set, and generate parse errors or NullPointerExceptions if it is not. The correct HTTP protocol for the Content-Length is documented in RFC 2616, Section 4.4. Note that by default, bug-compatibility is enabled, and a warning message will be printed whenever the bug occurs.
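The off-by-two behaviour can be made concrete with a few lines of Python. The function name `build_post_body` is hypothetical, and what webclient sends when the bug is disabled is an assumption (here, the body without a trailing CRLF and a matching length).

```python
def build_post_body(body, bug_compatible=True):
    """Return (content_length, bytes_sent) for a POST body."""
    if bug_compatible:
        # The trailing CRLF is sent but not counted: the length is two short.
        return len(body), body + b"\r\n"
    # Assumption: with the bug disabled, the length matches the bytes sent.
    return len(body), body


length, sent = build_post_body(b"login=linas&pwd=x")
print(length, len(sent))
```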

MultiUser Options

--num-sessions=<int>

Override number of times that the session will be run. Normally, the number of times that a session will be played is specified in the request input file. The value specified with this flag will override that value.

-W, --wait-interval=<seconds>

Pause after each session trial, before starting the next session. Normally, once a session has been completed, a new session is started immediately. You can use this flag to specify a delay between sessions. Alternately, you can specify a think-time after the last URL of the session (or before the first URL of the session), leading to the same effect.

-R, --random-seed=<seed>

Specify a seed value to be used with the random-number generator used to generate random think times. This flag is useful for getting repeatable think times and thus repeatable results.

--fork

Fork this process to run in the background after validating all of the command-line arguments. This is a handy feature for starting webclient from a shell script: if some obvious startup error occurs, the shell can deal with the failing client in the foreground. Otherwise, once past the initial startup, the client will move to the background, freeing the shell to start another client.

-m, --shmem=<child:shmkey>

Specify the common shared memory location for webclient to use. This flag is required when synchronizing multiple copies of webclient; it allows the ramp-up and statistics gathering phases to be appropriately synchronized, and allows some basic reporting back to the controlling program.

