Runtime configuration management
[EPIK utils]

In the former KOJAK system, the different components accessed their (optional) environment variables in different ways. Several components used methods from the EPILOG module elg_env, e.g., elg_is_verbose(), which returns the boolean value of the ELG_VERBOSE environment variable. Additionally, EARL and EXPERT had several environment variables which were handled inside the corresponding modules themselves, i.e., these modules used their own methods to return the values of the respective environment variables.

These environment variables could be used to override internal defaults, and thereby provide runtime configurability. While their use was optional, and KOJAK could (generally) be used without them, when it was desirable to set/review/update several values this approach became more awkward (and more limiting when additional configurability was desired).

Furthermore, during the transfer of the underlying approach to metacomputing environments two new environment variables EPK_MACHINE_ID and EPK_MACHINE_NAME were introduced into the measurement system, and additional runtime configurability was desired for EPIK and other components.

A generic mechanism for runtime configuration specification and management, including environment variable handling, has therefore been introduced. (Only environment variables specific to Scalasca/KOJAK are currently managed by this scheme, not standard environment variables such as PATH.) The introduced approach and the influence on other modules are described below.

Revised runtime configuration management

The introduced approach aims to provide generic management of the runtime configuration (including internal defaults and optional re-configuration via files and environment variables) used during measurement and analysis. A new EPIK module epk_conf and an optional configuration file ("EPIK.CONF") are introduced.

The configuration file contains key-value pairs and comments in the following format

[#] key = value [# comment]

The keys correspond to environment variables (both former KOJAK environment variables and new ones) and accepted values similarly correspond to those of the environment variables. As with environment variables, values can be defined in terms of previously defined values. Optional comments can be provided as documentation.
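
As an illustration of the line format above, the following is a minimal sketch of splitting one line into its key and value. It is not the epk_conf parser: the names trim and parse_conf_line are hypothetical, and the sketch assumes that a leading # disables the whole line, that everything after a # is a comment, and that values never contain a # character. Substitution of previously defined values is not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Trim leading and trailing whitespace in place; returns the trimmed start. */
static char *trim(char *s)
{
    char *end;
    while (isspace((unsigned char)*s)) s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1])) end--;
    *end = '\0';
    return s;
}

/* Hypothetical helper: split "key = value # comment" in place.
 * Returns 1 and sets *key and *val for an assignment,
 * 0 for a blank or commented-out line, -1 for a syntax error. */
static int parse_conf_line(char *line, char **key, char **val)
{
    char *hash, *eq;

    assert(line && key && val);
    hash = strchr(line, '#');
    if (hash) *hash = '\0';          /* drop comment (or whole commented line) */
    line = trim(line);
    if (*line == '\0') return 0;     /* nothing left: blank or comment only */

    eq = strchr(line, '=');
    if (!eq) return -1;              /* text without '=' is malformed */
    *eq = '\0';
    *key = trim(line);
    *val = trim(eq + 1);
    return (**key != '\0') ? 1 : -1; /* an empty key is malformed */
}
```

Because the line is modified in place, the caller must pass a writable buffer (for example one filled by fgets), not a string literal.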

The module epk_conf internally maintains an array of data structures for the different configuration variables and provides external methods to read their values. This global array, epk_vars, has EPK_CONF_VARS elements of the type epk_data. Each epk_data structure holds a key, a default value, a comment and a current value.

typedef struct {
    char* key;
    char* def;
    char* comment;
    char* val;
} epk_data;

Therefore, the array epk_vars has a separate entry of the type epk_data for each configuration variable. This entry can be accessed by a corresponding index which is defined in an enumeration similar to the following:

typedef enum {
    EPK_CONF,
    EPK_MACHINE_ID,
    EPK_MACHINE_NAME,
    EPK_CONF_VARS
} epk_type_i;

One possible epk_vars specification is shown below:

static epk_data epk_vars[EPK_CONF_VARS] = {
    {"EPK_CONF", NULL,
        "E P I K configuration", (char*)EPK_CONF},
    {"EPK_MACHINE_ID", "0",
        "Unique identifier of the Machine", (char*)EPK_MACHINE_ID},
    {"EPK_MACHINE_NAME", "Machine",
        "Define the name of the Machine", (char*)EPK_MACHINE_NAME},
};

Note that the value field of each epk_data entry is initialized to its index (enumeration constant): this is checked the first time that epk_vars is used, to verify that the table entries and enumeration indices are consistent, before the values are set to their defaults. (On encountering an internal inconsistency, parsing and further execution are aborted.) Additionally, epk_data entries with NULL def values (e.g., EPK_CONF) are internal section placeholders, used to output the associated comment as a header between sections of variables when the configuration is output.
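
The consistency check described above could look roughly like the following sketch. The function name epk_vars_check_and_reset is hypothetical, the casts through intptr_t are added here only to keep compilers quiet on 64-bit platforms, and the sketch returns an error code where the text says the real code aborts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { char *key, *def, *comment, *val; } epk_data;
typedef enum { EPK_CONF, EPK_MACHINE_ID, EPK_MACHINE_NAME, EPK_CONF_VARS } epk_type_i;

/* Each val field initially carries the entry's own enumeration index. */
static epk_data epk_vars[EPK_CONF_VARS] = {
    {"EPK_CONF", NULL, "E P I K configuration",
        (char *)(intptr_t)EPK_CONF},
    {"EPK_MACHINE_ID", "0", "Unique identifier of the Machine",
        (char *)(intptr_t)EPK_MACHINE_ID},
    {"EPK_MACHINE_NAME", "Machine", "Define the name of the Machine",
        (char *)(intptr_t)EPK_MACHINE_NAME},
};

/* Hypothetical check: verify that entry i carries index i, then replace
 * every val with its default. Returns 0 on success, -1 on a mismatch
 * (in which case no entry is modified). */
static int epk_vars_check_and_reset(epk_data *vars, int n)
{
    int i;

    assert(vars != NULL);
    for (i = 0; i < n; i++)
        if ((intptr_t)vars[i].val != (intptr_t)i)
            return -1;               /* table and enumeration out of step */
    for (i = 0; i < n; i++)
        vars[i].val = vars[i].def;   /* placeholders keep a NULL value */
    return 0;
}
```

Storing the index in the value field costs nothing at runtime after the first use, and it catches the common mistake of adding an enumeration constant without adding the table entry in the same position.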

The module epk_conf also provides a method epk_get. This method returns the value of the configuration variable which corresponds to the index of the respective data structure epk_data within the array epk_vars.

extern const char *epk_get(int index);
epk_get(EPK_MACHINE_NAME) -> "Machine"

Thus, this method encapsulates access to the different configuration variables. In the following the underlying approach is described.

The first call to the method epk_get updates all entries contained in the array epk_vars in a generic manner. More precisely, the value entry of each epk_data structure within epk_vars is updated, taking several sources of values into account in turn.

First, the value entry of each epk_data structure is set to its default value. If a default value is defined in the respective Makefile.defs, it is used to update the value in the corresponding epk_vars entry: only EPK_MACHINE_NAME and EPK_LDIR are currently updated in this way.

Then the configuration files are parsed and used to update the values in epk_vars. Up to two configuration files may be read. If a configuration file exists in the Scalasca/KOJAK installation DOCDIR, it is read and processed first. An alternative default configuration file location can be specified at runtime with the special EPK_CONF environment variable: i.e., if EPK_CONF is set, its value is used instead of the installation DOCDIR to locate the default EPIK.CONF file. (It makes no sense, and is an error, to specify EPK_CONF within a configuration file.)

The parsed key-value pairs are used to update the corresponding values in epk_vars. If a configuration file also exists in the working directory, the same procedure is repeated for it. Finally, for each configuration variable whose environment variable is set, the value of that environment variable is used to update the corresponding epk_vars entry once more.

This processing is performed automatically with the first call made to the method epk_get. Subsequent calls return the values stored in the data structures epk_data within the static array epk_vars. Processing of configuration files and environment variables is therefore performed only once by each process, irrespective of how many variables are queried or how often. (The values can differ per process, according to their environment, though most will be identical.)
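
The one-time processing on first access can be sketched as follows. This is an illustration under stated assumptions, not the epk_conf source: conf_var is a simplified stand-in for epk_data, the names conf_get, conf_init and set_value are hypothetical, the Makefile.defs step is omitted, and the two configuration file steps are only marked by a comment.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for epk_data (comment field omitted). */
typedef struct { const char *key; const char *def; char *val; } conf_var;
enum { MACHINE_ID, MACHINE_NAME, N_VARS };

static conf_var vars[N_VARS] = {
    { "EPK_MACHINE_ID",   "0",       NULL },
    { "EPK_MACHINE_NAME", "Machine", NULL },
};
static int initialized = 0;

/* Replace the current value of one variable with a private copy. */
static void set_value(conf_var *v, const char *value)
{
    char *copy = malloc(strlen(value) + 1);
    if (copy == NULL) abort();
    strcpy(copy, value);
    free(v->val);
    v->val = copy;
}

/* Apply the sources in order of increasing precedence. */
static void conf_init(void)
{
    int i;

    for (i = 0; i < N_VARS; i++)          /* 1. internal defaults */
        set_value(&vars[i], vars[i].def);
    /* 2. the installed EPIK.CONF and 3. EPIK.CONF in the working directory
     *    would be parsed here, each parsed pair calling set_value(). */
    for (i = 0; i < N_VARS; i++) {        /* 4. environment overrides */
        const char *env = getenv(vars[i].key);
        if (env != NULL) set_value(&vars[i], env);
    }
    initialized = 1;
}

/* Hypothetical accessor: the first call triggers processing, later calls
 * only return the stored value. */
const char *conf_get(int index)
{
    assert(index >= 0 && index < N_VARS);
    if (!initialized) conf_init();
    return vars[index].val;
}
```

Because each later source simply overwrites the stored value, the precedence rule (last definition wins) falls out of the order of the steps and needs no separate logic.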

Typically only a few variables will be specified in local configuration files, and some variables may be more convenient to specify via environment variables. The configuration file permits specification of as few or as many variables as desired. Although entries are optional, if present their syntax is checked and problems reported. Whenever a variable is multiply defined, the last value takes precedence.

The epk_conf_print method can be used to print the effective configuration, after processing defaults, files and environment variables, to the provided stream.
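
A rough sketch of such a print routine is given below. The output layout is an assumption: it reuses the configuration file syntax so that the output could be fed back in as a file, and it prints placeholder entries (NULL def) as header comments. The names format_entry and print_conf are hypothetical and do not reflect the epk_conf_print signature.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct { char *key, *def, *comment, *val; } epk_data;

/* Format one entry into buf; a placeholder entry (NULL def) becomes a
 * header comment line. Returns the snprintf result. */
static int format_entry(char *buf, size_t size, const epk_data *e)
{
    assert(buf != NULL && e != NULL);
    if (e->def == NULL)
        return snprintf(buf, size, "# %s\n", e->comment);
    return snprintf(buf, size, "%s = %s  # %s\n", e->key, e->val, e->comment);
}

/* Write all n entries to the given stream, one line per entry. */
static void print_conf(FILE *stream, const epk_data *vars, int n)
{
    char line[512];
    int i;

    for (i = 0; i < n; i++) {
        format_entry(line, sizeof line, &vars[i]);
        fputs(line, stream);
    }
}
```

Calling print_conf(stdout, table, count) on a processed table would then list the effective values together with their descriptions.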

Additionally, the module epk_conf features two converters. They can be used to convert the returned string into a Boolean value or into an integer value, including appropriate handling of special strings such as "Yes", "TRUE" and "Unlimited".

epk_str2bool(epk_get(EPK_VERBOSE)) -> 0 (false)
epk_str2int(epk_get(ELG_COMPRESSION)) -> 6
epk_str2bool(epk_get(ELG_COMPRESSION)) -> 1 (true)
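
Converters of this kind might be written along the following lines. This is a sketch only: the names sketch_str2bool and sketch_str2int are hypothetical, the comparison is assumed to be case-insensitive, "Unlimited" is assumed to map to INT_MAX, and any non-zero number is assumed to count as true (consistent with the ELG_COMPRESSION example above).

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>

/* Case-insensitive string equality (portable; avoids POSIX strcasecmp). */
static int equal_nocase(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;                 /* equal only if both strings ended */
}

/* Convert a configuration string to an integer, clamped to the int range. */
static int sketch_str2int(const char *s)
{
    long n;

    assert(s != NULL);
    if (equal_nocase(s, "unlimited"))
        return INT_MAX;              /* assumed meaning of "Unlimited" */
    n = strtol(s, NULL, 10);         /* non-numeric text yields 0 */
    if (n > INT_MAX) return INT_MAX;
    if (n < INT_MIN) return INT_MIN;
    return (int)n;
}

/* Convert a configuration string to 0 (false) or 1 (true). */
static int sketch_str2bool(const char *s)
{
    assert(s != NULL);
    if (equal_nocase(s, "yes") || equal_nocase(s, "true"))
        return 1;
    return sketch_str2int(s) != 0;   /* any non-zero number counts as true */
}
```

Keeping the conversion separate from the lookup means that the table only ever stores strings, whatever type the caller eventually needs.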

Incorporating new configuration variables is a straightforward matter of adding new variable definitions and corresponding entries to the epk_vars table, which includes the default values and short descriptions. (If it is desired to support old names for variables, e.g., ELG_FILE_PREFIX=EPK_TITLE, these might ultimately be specified as an additional mapping table which is scanned to get the updated name prior to retrieving the value.)
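
The mapping table suggested in parentheses above could take a form like the following sketch. Only the ELG_FILE_PREFIX to EPK_TITLE pair comes from the text; the type name_alias and the function current_name are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* One entry maps a superseded variable name to its current name. */
typedef struct { const char *old_name; const char *new_name; } name_alias;

static const name_alias aliases[] = {
    { "ELG_FILE_PREFIX", "EPK_TITLE" },   /* the example pair from the text */
};

/* Return the current name for a possibly superseded variable name;
 * names without an alias are returned unchanged. */
static const char *current_name(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
        if (strcmp(name, aliases[i].old_name) == 0)
            return aliases[i].new_name;
    return name;
}
```

A lookup would translate the name first and then retrieve the value under the returned name, so old names keep working without duplicating table entries.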

The default configuration is reported by running the new epk_conf utility in an environment where no configuration files are installed and no environment variables are set. The settings which correspond to the currently active configuration can be output in the same way, based on the internal defaults updated with any located configuration files and any environment variables which are set. Such output is stored within the corresponding experiments for reference (epik_a/epik.conf). It can also be installed as is, or manually edited first, in the current directory (or the installation DOCDIR, if preferred) for use with subsequent experiments.

Summary

The presented approach was introduced to support generic extensible management of different runtime configuration variables within the various components of the Scalasca/KOJAK toolset.

As already mentioned, the module elg_env previously provided access to some environment variables for runtime configuration. Since the introduced module epk_conf provides more generic configuration management, while incorporating the prior functionality of elg_env, revised elg_env methods would simply call the corresponding epk_conf methods. Notably, in the absence of (installed or local) configuration files, the behaviour matches the previous behaviour, while the optional configuration files can be used to specify configuration variables in a syntax matching that of the environment variables themselves. Therefore, the module elg_env has been removed from the sources and calls to its methods have been replaced with calls to epk_conf.


SCALASCA    Copyright © 1998–2009 Forschungszentrum Jülich, Jülich Supercomputing Centre