CVoiceControl consists of three executables:
microphone_config
model_editor
cvoicecontrol
First of all, you have to calibrate your microphone for speech recognition.
Use microphone_config to do this. Once your sound hardware is prepared,
you can use model_editor to create speaker models. Speaker models are the
main objects needed for the speech recognition process. Finally,
cvoicecontrol is the actual speech recognition program.
These three components are described in the next three sections. The structure of the speaker models is also described in more detail.
Start microphone_config by entering the command
% microphone_config
at a command prompt (Linux console, xterm, kvt, etc.).
First, the tool generates a list of available mixer and audio devices.
If the tool fails at this time, it is because it
could not find any appropriate mixer and/or audio devices in your system.
In this case, make sure that your sound device is installed
correctly and that your sound driver is working properly.
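If microphone_config reports that no devices were found, it can help to check for the device files directly. The sketch below assumes OSS-style device names (mixer* and dsp* under /dev), which this manual does not state; the helper name list_sound_devices is hypothetical.

```shell
# Hypothetical helper: list OSS-style mixer and audio device files.
# Assumption: devices are named mixer* and dsp* (not stated in this manual).
# An optional argument overrides the directory to search (default: /dev).
list_sound_devices() {
    local dir="${1:-/dev}"
    local f
    for f in "$dir"/mixer* "$dir"/dsp*; do
        # An unmatched glob stays literal, so print only files that exist
        [ -e "$f" ] && echo "$f"
    done
    return 0
}
```

Running list_sound_devices without an argument searches /dev; empty output suggests that the sound driver is not loaded.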
The microphone calibration process is divided into five steps, which can be run from the main menu. A step that has been completed successfully is displayed in boldface, a step that has not been completed is displayed in normal type, and a step that cannot be run at this time is not displayed at all.
The five steps are:
Select Mixer Device
Select Audio Device
Adjust Mixer Levels
Calculate Recording Thresholds
Estimate Characteristics of Recording Channel
If microphone_config detects your audio hardware automatically, the first
two steps (Select Mixer Device and Select Audio Device) are displayed in
bold, i.e. marked as "completed successfully". (The selected device files
are shown in parentheses after the menu entries.)
In this case, you may continue with step three.
However, if you have more than one sound card installed or need to select
a non-default mixer or audio device, press Enter on the respective menu
item and select a device from the list.
If in doubt, stick with the suggested settings!
Next, run step three: Adjust Mixer Levels.
Here, we try to estimate good values for the mixer channels
MICROPHONE IN (MIC) and (if available) INPUT GAIN (IGAIN). You will be
guided through the process by detailed information dialogs.
To succeed, this step strongly relies on your cooperation!
Initially, the MIC level is set to the maximum and the IGAIN level (if available) is set to the minimum value.
If an IGAIN channel is available, its level is increased while you speak at conversational volume until the input signal is strong enough. Hint: Reasonable values for the IGAIN level on my system range between 1 and 8.
Next, the microphone level is reduced repeatedly while you speak at "maximum volume level" until the incoming signal no longer exceeds an upper limit. Hint: Reasonable values for the MIC level on my system range between 60 and 95.
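The two adjustment loops above can be sketched as follows. This is not CVoiceControl's code: the 0 to 100 level scale, the step sizes, the thresholds MIN_SIGNAL and MAX_SIGNAL, and the measurement command passed as the first argument are all hypothetical. The measurement command receives a phase (normal or loud), the MIC level and the IGAIN level, and prints the measured peak.

```shell
# Sketch of the mixer-level search described above; NOT CVoiceControl's code.
# Hypothetical assumptions: levels range 0..100, and "$1" names a command that
# prints the measured input peak (0..100) for a phase ("normal" or "loud"),
# a MIC level and an IGAIN level while the user speaks.
MIN_SIGNAL=30   # hypothetical: "strong enough" at conversational volume
MAX_SIGNAL=90   # hypothetical: upper limit at maximum volume

adjust_levels() {
    local measure="$1" have_igain="${2:-1}"
    local mic=100 igain=0     # start: MIC at maximum, IGAIN at minimum

    # Raise IGAIN (if present) until conversational speech is strong enough
    if [ "$have_igain" -eq 1 ]; then
        while [ "$igain" -lt 100 ] &&
              [ "$("$measure" normal "$mic" "$igain")" -lt "$MIN_SIGNAL" ]; do
            igain=$((igain + 1))
        done
    fi

    # Lower MIC until speech at maximum volume no longer exceeds the limit
    while [ "$mic" -gt 0 ] &&
          [ "$("$measure" loud "$mic" "$igain")" -gt "$MAX_SIGNAL" ]; do
        mic=$((mic - 5))
    done

    echo "MIC=$mic IGAIN=$igain"
}
```

Passing 0 as the second argument models a mixer without an IGAIN channel, so only the MIC loop runs.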
Upon successful completion of this step, the next two steps are available for selection from the main menu.
Next, select Calculate Recording Thresholds from the menu.
During this step, we try to find reasonable energy levels at which to start the automatic voice recording and at which to stop the recording. Again, you will be guided through the process by detailed information dialogs.
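To illustrate what the two energy levels are used for, the sketch below scans one energy value per audio frame and reports the first and last frame of the recorded segment. The rule used here (start at the first frame reaching the start threshold, stop at the first later frame below the stop threshold) and the function name find_segment are assumptions for illustration; real recognizers usually also wait a few frames before stopping, which this sketch omits.

```shell
# Sketch: find where automatic recording would start and stop, given one
# energy value per line (one line per audio frame) on stdin.
# Hypothetical assumptions: $1 = start threshold, $2 = stop threshold;
# recording starts at the first frame whose energy reaches $1 and stops
# at the first later frame whose energy falls below $2 (no hangover).
# Prints "first_frame last_frame" (1-based), or nothing if never started.
find_segment() {
    awk -v start="$1" -v stop="$2" '
        !recording && $1 >= start { recording = 1; first = NR; next }
        recording && $1 < stop    { print first, NR - 1; found = 1; exit }
        END { if (recording && !found) print first, NR }
    '
}
```

For example, printf '%s\n' 1 2 10 12 8 3 1 | find_segment 10 5 reports frames 3 to 5.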
In the next step, Estimate Characteristics of Recording Channel, the
characteristics of the recording channel (such as background noise) are
estimated. Again, online information guides you through the process.
If all five steps have been completed successfully, the item Write Configuration
becomes available in the main menu. Select it to store all the gathered
information in the file config in the directory .cvoicecontrol in your home
directory (i.e. ~/.cvoicecontrol/config). The directory .cvoicecontrol is
created if necessary.
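Before creating speaker models, you can quickly verify from the shell that the configuration file exists. The path follows the description above; the helper name check_cvc_config is hypothetical.

```shell
# Hypothetical helper: check whether microphone_config has written its
# configuration file (~/.cvoicecontrol/config, as described above).
check_cvc_config() {
    local cfg="$HOME/.cvoicecontrol/config"
    if [ -f "$cfg" ]; then
        echo "found: $cfg"
    else
        echo "missing: $cfg (run microphone_config first)" >&2
        return 1
    fi
}
```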
If the configuration has been saved successfully, you can leave the
configuration tool by selecting Exit from the main menu.
Congratulations, your microphone is set up for speech
recognition!
CVoiceControl is a template-matching based speech recognition system, i.e. each command that can be recognized needs a set of sample utterances against which an incoming utterance can be compared. All of this data is collected in a so-called speaker model.
A speaker model consists of a variable number of reference items, where each reference item corresponds to a command that can be recognized. A reference item consists of a label (a transcription of what is said), a command (a Unix command that is executed when this reference item is recognized) and a variable number of sample utterances.
Roughly speaking, to recognize an incoming utterance, it is compared to all sample utterances of all reference items in the active speaker model. If the sample utterances of one reference item are most similar to the incoming utterance (i.e. have the smallest distance score), this reference item will be chosen as recognition result.
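The decision rule can be sketched as a nearest-template search. In this sketch each input line holds a reference item label and the distance between the incoming utterance and one of that item's sample utterances; how the distances are computed is not shown. Scoring an item by its single closest sample is an assumption, and CVoiceControl may aggregate the sample distances differently. The function name best_item is hypothetical.

```shell
# Sketch of template matching: read "label distance" lines (one per sample
# utterance) and print the label owning the smallest distance.
# Assumptions: distances are precomputed; an item's score is the minimum
# distance over its samples (CVoiceControl may aggregate differently).
best_item() {
    awk '
        NR == 1 || $2 < best { best = $2 + 0; label = $1 }
        END { if (NR > 0) print label }
    '
}
```

For example, the input lines "yes 4.2", "no 3.1", "yes 2.5" and "no 7.0" select yes, because its second sample is closest.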
To launch the speaker model editor, open a console and type:
% model_editor
From the main menu of the editor you can reset the current speaker
model (New Speaker Model), load one from a file (Load Speaker Model),
edit the model (Edit Speaker Model), save it (Save Speaker Model) and
leave the editor (Exit).
Model Editor:
The model editor shows the reference items of the current speaker
model in a table view, one reference per line. A reference item
in the table can be highlighted (selected) using the up and down
cursor keys.
At the bottom of the dialog, a brief summary of keyboard commands is
displayed for your convenience. Press a to add a new reference item to
the model, d to delete the currently highlighted item, Enter to edit the
currently highlighted item, and b to return to the main menu.
For example, to add and edit a new reference item, press a followed by
Enter.
Edit Speaker Model Item:
Selecting a reference item by pressing Enter opens the item editor
dialog. This dialog displays the label and command of the selected item
as well as a list of its recorded sample utterances. A brief summary of
keyboard commands is displayed at the bottom.
Sample utterances in the list view can be highlighted using the up
and down cursor keys.
To record a new sample utterance, press r. The recording is then done
automatically, i.e. no further keyboard interaction is required to record
the utterance. Note: After pressing r, wait a second or so before you
start talking! An audio buffer needs to be filled before the actual
automatic recording can start.
To delete a highlighted sample utterance, press d; to play it, press Enter.
To edit the label string of the current item, press l.
To edit the command string, press c.
To leave the current dialog, press b.
Important: Listen to every utterance you record to make
sure that nothing has been cut off at the boundaries! If many
utterances are cut off, please rerun the microphone configuration
tool!
Note: To ensure good recognition quality, each reference item requires a minimum number of sample utterances. By default, this minimum is 4.
Note: Recognized commands are executed in the foreground by default. This means that the speech recognizer blocks until the executed command has finished! This behaviour is required because many sound cards do not allow recording and playback at the same time. So, if a command produces acoustic output through the sound card, the speech recognizer has to wait until the command has finished before continuing in automatic recording mode. If you want the speech recognizer to run a command in the background and continue with recognition, append an "&" to the command!
The command may also consist of a sequence of commands separated by ";".
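The effect of ";" and "&" can be tried out in a shell. This sketch assumes that the command string of a reference item is handed to /bin/sh -c, as the C library function system(3) does; that is an assumption about CVoiceControl's internals, not something this manual states. The helper name run_command is hypothetical.

```shell
# Hypothetical stand-in for how a recognized command might be run,
# ASSUMING the command string is passed to /bin/sh -c (like system(3)).
run_command() {
    sh -c "$1"
}

# Foreground sequence: the caller waits until both commands have finished
#   run_command 'echo first; echo second'
# Trailing "&": sh starts the job in the background and returns at once
#   run_command 'sleep 5 >/dev/null 2>&1 &'
```

Under this assumption, a command string such as 'play sound.wav; echo done' would block recognition until both parts finish, while appending "&" lets recognition resume immediately.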
Important: If a reference item has been recognized by the speech
recognizer, the associated command will be executed!
There is no guarantee that the recognition result is correct.
Also, the speech recognizer does not check whether executing a command
could harm your system (think of commands like rm). Thus, it is the
user's responsibility to define harmless commands in the speaker model
and to make sure that the reference items in a speaker model are not
easily confused!
Once you have finished editing the speaker model, save it to disk via
Save Speaker Model from the main menu. Note that speaker model files must
have the extension .cvc. If you do not specify this extension, it is
appended to the file name automatically!
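Scripts that later pass a model file to cvoicecontrol can apply the same naming rule. The helper name cvc_filename is hypothetical; it only mirrors the rule stated above.

```shell
# Hypothetical helper mirroring the rule above: append ".cvc" to a speaker
# model file name unless it already ends in ".cvc".
cvc_filename() {
    case "$1" in
        *.cvc) printf '%s\n' "$1" ;;
        *)     printf '%s\n' "$1.cvc" ;;
    esac
}
```

For example, cvc_filename yes-no and cvc_filename yes-no.cvc both yield yes-no.cvc.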
To start the speech recognizer, open a console and type:
% cvoicecontrol <model_file>
where <model_file> is the name of the speaker model you want to use.
The speech recognizer then enters automatic recording mode.
Note: Make sure that no other application needs access to the sound device at this time, as most sound devices only allow exclusive access!
After a command has been recognized successfully, the speech recognizer re-enters automatic recording mode, ready for the next spoken command.
To finish the program, you have to kill the speech recognizer explicitly
by pressing Ctrl-C in the console where you started the recognizer or by
issuing the command killall cvoicecontrol from any command prompt.
Hint: There is also a special command name that can be used in a speaker
model's reference item to finish cvoicecontrol. It is called
cvoicecontrol_off.
Note: The speech recognizer can be started in a special mode by
specifying the command line option --once, i.e. by starting it as
follows:
% cvoicecontrol --once <model_file>
In this case, the speech recognizer will exit automatically after the
first successful recognition run. The exit code of the program is set
to the id number of the reference item that has been recognized.
As an example, consider a speaker model yes-no.cvc that contains two
reference items, the first one being "Yes" and the second one being "No".
Invoked as
% cvoicecontrol --once yes-no.cvc
the speech recognizer returns 0 if "Yes" was recognized and 1 if "No" was
recognized. This makes it straightforward to use speech prompts in shell
scripts. Example:
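#!/usr/bin/tcsh
# The exit code is the ID of the recognized reference item
cvoicecontrol --once yes-no.cvc
set result = $status
if ($result == "0") then
echo "You said yes"
else if ($result == "1") then
echo "You said no"
else
# Any other exit code signals an error (-1 is seen by the shell as 255)
echo "Error!"
endif
exit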
Note: In a tcsh script, the shell variable status always contains the
exit code of the most recently executed command! To obtain the exit code
in a bash script, use the special parameter $? instead.
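For bash users, a sketch of the same yes/no prompt follows. It assumes the speaker model yes-no.cvc from the example above, reads $? immediately after cvoicecontrol returns, and treats every code other than 0 and 1 as an error (an exit value of -1 is observed by the shell as 255). The function name ask_yes_no is hypothetical.

```shell
# bash counterpart of the tcsh example, assuming the speaker model
# yes-no.cvc ("Yes" = item 0, "No" = item 1). $? is read right away,
# before any other command can overwrite it.
ask_yes_no() {
    cvoicecontrol --once yes-no.cvc
    local result=$?
    case "$result" in
        0) echo "You said yes" ;;
        1) echo "You said no" ;;
        *) echo "Error!" ;;    # any other code, e.g. 255 for exit(-1)
    esac
    return "$result"
}
```

Because the function returns the recognizer's exit code, a caller can also branch on it directly, e.g. if ask_yes_no; then ... fi runs the then branch only for "Yes".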
Have fun with CVoiceControl!