
7. Webclient Request File Format

The list of URLs that webclient will fetch is read from an input file, also called a request file. Requests must include a checksum to validate the returned data, and may be interspersed with a variety of directives controlling the think times, the request header and body, and the way in which summary statistics are generated. Several examples of an input file are shown in a previous section.

7.1 Request File Basics

The request file is formatted in a quasi-XML markup language. It normally starts with a <<START>> tag and ends with an <<END>> tag. In between, each line specifies an HTTP request, supplies any additional POST data, states how often the request is to be run and whether it is part of a block of requests that must run together, and gives the checksum information. Lines beginning with the hash sign (#) are treated as comments.

The collection of requests between <<START>> and <<END>> is termed a "session", and is meant to model a typical user session, from logon, to doing some work, to logging off. When a session is played back with webclient, the requests are processed in the sequence specified in the file.

A session may be run multiple times. To simulate a real user's pauses between web pages, the playback can be adjusted so that there is a pause between requests. The pause can be either random (exponentially distributed) or a fixed amount of time.

The basic elements of the syntax are described below:

<<START>>

Indicates the start of a session description. All URLs up to the <<END>> input line are read, saved, and submitted as a session. Currently, only one such tag may appear in a request file.

<<END>> count think_time

Denotes the end of a session description. It is followed by two parameters:

count: integer, the number of times that the session will be replayed. Note that this value can be overridden with the --repeat-count command-line flag.

think_time: floating point, number of seconds. If zero, then webclient will play back the URIs in as rapid succession as possible. If negative, then webclient will wait exactly -think_time seconds between each URI. If positive, then webclient will wait a random, exponentially distributed amount of time between each URI; the mean of these think times will equal think_time. Note that this value can be overridden with the --think-time command-line option (see the Think Time Distributions section).

Currently, only one such tag may appear in a request file.
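
The sign convention for think_time can be sketched in a few lines of Python. This is only an illustration of the rules stated above, not webclient's actual code; the function name think_delay and the rng parameter are hypothetical.

```python
import random

def think_delay(think_time, rng=random):
    """Return the pause, in seconds, for one gap between requests.

    Hypothetical helper that mirrors the documented sign convention.
    """
    if think_time == 0:
        return 0.0                  # zero: play back as fast as possible
    if think_time < 0:
        return -think_time          # negative: fixed pause of -think_time seconds
    # Positive: exponentially distributed pause whose mean is think_time.
    return rng.expovariate(1.0 / think_time)
```

With think_time set to 8.3, individual pauses vary widely, but their long-run average approaches 8.3 seconds; with -8.3, every pause is exactly 8.3 seconds.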

GET url fraction checksuminfo

POST url body fraction checksuminfo

Requests may be formatted as a single input line each, as shown here. Alternatively, the body, fraction, and checksum information can be split across multiple lines by using the <<REQUEST>> directive and the related directives described below. Depending on the style and nature of the session, the single-line approach may be simpler and easier to read than the multi-line approach; this is left as a matter of taste.

If the single-line approach is used, then each request must really be a single input line. If the URL or data gets long, do not insert a CR or NL to split the line: the CR/NL will be interpreted as white space and the URL will end at that point. If the URL includes a query string (the data after the ?), you must include it as part of the URL without spaces.

The fraction and checksuminfo fields have the same meaning as those described below. Besides GET and POST, any valid HTTP method may be specified, including OPTIONS, HEAD, PUT, DELETE, TRACE and CONNECT. Note that a POST request must include the post body in the position shown in the second form.

<<REQUEST>> method url

Request the indicated url from the web server using method. The method should be a valid HTTP method; viz. one of GET, HEAD, POST, etc. The url may be a fully qualified URI (such as http://server.com/some/page.html) or a path fragment (such as /some/page.html). The former form allows a session to stretch across multiple servers; webclient will resolve and address each server in turn. If these two types of requests are intermixed, the most recently specified server will be used for the fragmentary requests. Alternatively, an initial web server can be specified with the -w or --webserver flag (this initial server will be overridden if and when a fully qualified URL appears in the input file).

Note that POST requests should be followed by a <<BODY>> tag specifying the body of the POST request.
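
The rule that path fragments inherit the most recently specified server can be illustrated with a short Python sketch. It is not webclient's implementation; resolve_urls and initial_server are hypothetical names, and the sketch assumes the initial server (the role played by -w or --webserver) is given as a scheme plus host.

```python
from urllib.parse import urlsplit

def resolve_urls(urls, initial_server=None):
    """Turn a mix of full URIs and path fragments into absolute URLs.

    initial_server is a hypothetical stand-in for the server named on
    the command line, e.g. "http://server.com".
    """
    current = initial_server
    resolved = []
    for url in urls:
        parts = urlsplit(url)
        if parts.scheme and parts.netloc:
            # Fully qualified URI: it becomes the server for later fragments.
            current = f"{parts.scheme}://{parts.netloc}"
            resolved.append(url)
        else:
            if current is None:
                raise ValueError(f"no server known for {url!r}")
            # Path fragment: use the most recently specified server.
            resolved.append(current + url)
    return resolved
```

For instance, the sequence http://a.example/x, /y, http://b.example/z, /w would address a.example for the first two requests and b.example for the last two.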

<<BODY>> <</BODY>>

<<POSTDATA>> <</POSTDATA>>

These two tags are synonyms. They are used to specify an HTTP body that is sent to the webserver along with the HTTP header. The end of the body should be marked with a <</BODY>> or <</POSTDATA>> tag appearing on a new line. This tag allows webclient to emulate not only the use of HTML forms, but also to support non-HTML-based HTTP protocols, such as OFX.

<<BODYFILE>> filename

The body of a POST may be specified out-of-line, as a separate file. The indicated file should contain the text of the POST body. This form is especially convenient when working with HTTP protocols, such as OFX, that have large bodies or whose bodies are available through an independent test suite.

<<COUNT>> fraction

The fraction controls randomization of the session. If the fraction is 1.0, then this URL request is submitted as part of every session that is run. If the fraction is between 0.0 and 1.0, then that value indicates the fraction of sessions for which this request will be run. For example, to run the request on 50% of all sessions, use a fraction of 0.50. For each time through the session, webclient will generate a random number to determine if the request should be run, using fraction as the probability.

Values greater than one are interpreted as run counts. Thus, a value of 1.5 will be interpreted to mean that the request should be run at least once, and possibly twice. (Note, however, that the summary timing statistics report might not be generated in the form you expect when using values greater than 1.0. This may change in future versions.)

A block of repeated URLs may be specified by using a negative number for the fraction. This usage is discussed in greater detail below.
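
One way to read the fraction rules is sketched below in Python. The handling of values between 0.0 and 1.0 follows the description above; the treatment of values greater than one (whole part as guaranteed runs, remainder as the probability of one more) is an assumption based on the 1.5 example, not a statement of how webclient computes it. The name run_count is hypothetical.

```python
import math
import random

def run_count(fraction, rng=random):
    """Number of times a request is issued in one pass through the session.

    Assumption: for fractions above 1.0, the whole part gives guaranteed
    runs and the remainder is the chance of one additional run.
    """
    if fraction < 0:
        # Negative fractions mark block members and are decided elsewhere.
        raise ValueError("negative fractions mark block members")
    whole = math.floor(fraction)
    remainder = fraction - whole
    extra = 1 if rng.random() < remainder else 0
    return whole + extra
```

Under this reading, a fraction of 0.50 yields one run in roughly half of all sessions, 1.0 yields exactly one run, and 1.5 yields either one or two runs.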

<<CKSUM>> int int int int int

Specifies a checksum against which the returned page will be validated. This checksum is normally computed by webmon when a session is being recorded; the checksum can also be recomputed by specifying the -v flag. To disable the use of checksum validation for the current request, specify a -1 for the first integer.

<<HEADER>> <</HEADER>>

Text in between the <<HEADER>> and the <</HEADER>> tags will be used to form the HTTP header. This header will remain in effect for the current and subsequent requests until a new header is specified (with either the <<HEADER>> or the <<HEADERFILE>> directive).

<<HEADERFILE>> filename

Specify a file that contains the request header to use. This tag performs the same function as the --header-file flag, except that it applies on a per-url basis. For greater detail, review the section on Header Substitution. The specified header will remain in effect for the current and subsequent requests until a new header is specified (with either the <<HEADER>> or the <<HEADERFILE>> directive).

<<THINK>>

<<THINK>> seconds

Specifies that a pause should occur between this and the next request. Used to emulate a user pausing to "think" between page fetches. If the number of seconds is specified, then the pause will be for that length; otherwise, the default think-time (specified on the <<END>> tag or with the --think-time flag) will be used. Note that, as elsewhere, times specified as a negative number denote a fixed think time, while those specified with a positive number denote an average for a random exponential distribution.

<<MARK>>

Specifies that performance statistics should be rolled up between this and the previous <<MARK>> and reported as a unit. This is useful when a single page view appears as multiple URLs in the request file, and means and deviations are wanted for the combination.

7.2 Specifying Blocks of Requests

Oftentimes, a number of URL requests need to be run as a group, with the same random choice being made for all members of the group. For example, suppose you want to run the "purchase product" transaction in 35% of all sessions. Well, in a real session, the user wouldn't be able to jump into the middle of the set of web pages that form the transaction. Instead, they must first pull up a services page, then select a service/product, then fill out a form, then verify their order before submission, etc. This sequence of pages must be treated as a block in order for them to make sense. If the first page of the block is randomly chosen to be run, then the whole block will be run. If the first page is randomly rejected, then none of the block is to be run.

Members of a block are indicated by specifying the fraction -1.0. If the fraction is negative, then that web page is treated as part of a block that begins with the most recent request whose fraction is non-negative. Thus, to group a series of pages together, do something like the following:


     GET /page1 0.75
     GET /page2 -1.0
     GET /page3 -1.0
     GET /page4 -1.0

This will cause pages 1 through 4 to be run together in 75% of all sessions. Note that the block ends at the next request that has a non-negative value for the fraction.
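
The all-or-nothing behavior of a block can be sketched in Python as follows. This is an illustration of the rule only, restricted to leader fractions between 0.0 and 1.0; select_requests is a hypothetical name, and skipping a negative-fraction request that has no preceding leader is an assumption of the sketch.

```python
import random

def select_requests(requests, rng=random):
    """Pick the URLs to run in one pass through the session.

    requests is a list of (url, fraction) pairs in file order.
    """
    chosen = []
    block_runs = False   # assumption: a member with no leader is skipped
    for url, fraction in requests:
        if fraction >= 0:
            # Block leader or standalone request: make a fresh random choice.
            block_runs = rng.random() < fraction
        # A negative fraction reuses the decision made for the leader.
        if block_runs:
            chosen.append(url)
    return chosen
```

Applied to the four-page example above, each pass returns either all four pages or none of them.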

(Extensions are planned for nested hierarchical blocks but have not been implemented).

7.3 Custom Think Times

Think times are used to emulate a user pausing between requests to "think" about what they are doing before issuing the next request. webclient has a number of ways of specifying the think time that should be applied between requests.

By default, the think time specified on the <<END>> line applies to each gap between input URLs. That is, there will be a pause before each URL is issued. This value can be overridden by specifying the --think-time flag; but again, the value specified applies to each gap. Both the location and the length of the gaps can be overridden by using the <<THINK>> directive.

The think times used can be fixed or randomly generated. A randomly generated think time is drawn from an exponential distribution whose mean is the specified time. The exponential distribution provides a more realistic model of actual human behavior, with some pauses taking longer than others, but all tending towards a mean. In all cases where a think time is specified to webclient, a positive number implies that the exponential distribution should be used, and a negative number implies that a fixed length of time should be used. A Gaussian distribution may also be specified. Further details are presented in the Think Time Distributions section.

For example, the input file


<<START>>
GET /pageone.html
GET /pagetwo.html
<<THINK>>
GET /pagethree.html
GET /pagefour.html
GET /more.html
<<THINK>>
GET /another.html
GET /andmore.html
<<THINK>>
GET /last.html
<<THINK>>
<<END>> 50 8.3

will result in no pause between the fetch of /pageone.html and /pagetwo.html, and a think-time of 8.3 seconds between the fetch of /pagetwo.html and /pagethree.html. That is, the think-time will be non-zero only where the <<THINK>> directive appears.

Optionally, independent think times can be specified, like so:


<<START>>
GET /pageone.html
GET /pagetwo.html
<<THINK>>  5.2
GET /pagethree.html
GET /pagefour.html
GET /more.html
<<THINK>>  
GET /another.html
GET /andmore.html
<<THINK>>  49.0 
GET /last.html
<<END>> 50 8.3

The first pause will last 5.2 seconds, the second will last 8.3 seconds (the default), and the last pause will last 49 seconds. In this example, there is no pause between the fetch of /last.html and the fetch of /pageone.html in the next go-around. A final <<THINK>> is needed if you want to avoid immediately going back to the beginning.

If the <<THINK>> directive never appears in the file, then the default think time will be applied between each and every URL. Thus, the input file


<<START>>
GET /pageone.html
GET /pagetwo.html
GET /pagethree.html
<<END>> 50 8.3

is equivalent to


<<START>>
GET /pageone.html
<<THINK>> 8.3
GET /pagetwo.html
<<THINK>> 8.3
GET /pagethree.html
<<THINK>> 8.3
<<END>> 50 8.3

Additional flexibility is provided by the --think-time flag. If this flag is used, it overrides the default value specified with the <<END>> tag.
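
The placement rules in this section can be summarized with a small Python sketch that maps each request line to the think time applied after it. It is a simplified illustration, not webclient's parser: think_schedule is a hypothetical name, only single-line requests and <<THINK>> are interpreted, and the default think time (from <<END>> or --think-time) is passed in as an argument rather than read from the file.

```python
def think_schedule(lines, default_think):
    """Return (request_line, think_time) pairs in file order.

    A think_time of 0.0 means no pause follows that request. The sign
    convention (positive = mean, negative = fixed) is left untouched.
    """
    requests = []        # each entry: [request_line, think_time_after]
    saw_think = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue     # blank lines and comments
        if line.startswith("<<THINK>>"):
            saw_think = True
            arg = line[len("<<THINK>>"):].strip()
            value = float(arg) if arg else default_think
            if requests:
                requests[-1][1] = value   # pause follows the previous request
        elif line.startswith("<<"):
            continue     # <<START>>, <<END>> and other directives are ignored here
        else:
            requests.append([line, 0.0])
    if not saw_think:
        # No <<THINK>> anywhere: the default applies after every request.
        for entry in requests:
            entry[1] = default_think
    return [(req, think) for req, think in requests]
```

Run against the second example above with a default of 8.3, the sketch assigns 5.2 after /pagetwo.html, 8.3 after /more.html, 49.0 after /andmore.html, and 0.0 everywhere else.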

