This section details how to practically encode dialogue domains for OpenDial using XML.
A dialogue domain in OpenDial follows the skeleton below:
<domain>
  <initialstate>
    <!-- (optional) initial state variables -->
  </initialstate>
  <parameters>
    <!-- (optional) prior distributions for rule parameters -->
  </parameters>
  <model trigger="trigger variables for model 1">
    <!-- probabilistic rules for model 1 -->
  </model>
  <model trigger="trigger variables for model 2">
    <!-- probabilistic rules for model 2 -->
  </model>
  ...
  <model trigger="trigger variables for model n">
    <!-- probabilistic rules for model n -->
  </model>
  <settings>
    <!-- (optional) domain-specific settings -->
  </settings>
</domain>
The settings, initial state and parameters can be left out of the domain specification if empty. The number of rule-structured models is arbitrary.
For more complex domains, the domain specification can be split across several files using the import element:
<import href="path to another file"/>
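As a rough sketch of how a split domain might look, the fragment below assumes that import elements are placed directly inside <domain>, alongside locally defined models. The file names are hypothetical, and the placement is an assumption rather than something stated in this section.

```xml
<domain>
  <!-- Hypothetical file names; each file is assumed to contain
       part of the domain specification -->
  <import href="my_parameters.xml"/>
  <import href="my_rules.xml"/>

  <!-- Models defined locally, presumably usable next to the imports -->
  <model trigger="trigger variables for a local model">
    <!-- probabilistic rules for the local model -->
  </model>
</domain>
```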
Numerous examples of dialogue domains can be found in the directories domains and test/domains of the base directory.
XML format for <domain>:
The initial state for the domain defines the variables included in the dialogue state upon starting the dialogue system. Each variable has a particular identifier and a probability distribution.
Variables with a discrete range of values are defined as categorical tables:
<variable id="variable_id">
  <value prob="probability for first value">first value</value>
  <value prob="probability for second value">second value</value>
  ...
  <value prob="probability for the nth value">nth value</value>
</variable>
Probability values must lie between 0 and 1. If the total probability amounts to less than 1, OpenDial automatically adds an empty value (None) for the remaining probability mass. If the prob attribute is omitted, the value is assumed to have a probability of 1.
Here is a simple example of a state variable:
<variable id="userIntention">
  <value prob="0.5">Want(Object_A)</value>
  <value prob="0.3">Want(Object_B)</value>
</variable>
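The two probabilities in this example sum to 0.8, so the remaining 0.2 falls to the empty value. Assuming that None can also be written out explicitly as a value (which its special meaning in OpenDial suggests, but which is not confirmed here), the same distribution could presumably be spelled out as:

```xml
<variable id="userIntention">
  <value prob="0.5">Want(Object_A)</value>
  <value prob="0.3">Want(Object_B)</value>
  <!-- Remaining mass 1 - (0.5 + 0.3) = 0.2, which OpenDial
       would otherwise add implicitly -->
  <value prob="0.2">None</value>
</variable>
```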
Probability distributions can also be defined for a continuous range, using the XML element <distrib type="..."> (see below).
XML format for <initialstate>:
XML format for <variable> in <initialstate>:
IMPORTANT NOTE:
Generally speaking, variables can have arbitrary identifiers, but a few special characters should be avoided. Variable identifiers should not include primes ('), curly brackets ({,}) or square brackets ([,]), as these are used internally in OpenDial. Furthermore, variables ending with ^p, ^t and ^o have a special function: ^p denotes predictive variables, ^t denotes temporary variables that are deleted immediately after each update loop, and ^o denotes observation variables for user simulators.
Some variable values also have a special meaning in OpenDial: "None" denotes an "empty" value, and values between square brackets [ ] denote sets of elements.
Probabilistic rules can include parameters whose values are initially unknown and must be estimated from data. As OpenDial adopts a Bayesian learning approach, each parameter must be associated with a prior distribution over its (usually continuous) range of possible values.
XML format for <parameters>:
Parameters are defined in exactly the same way as state variables. Their distributions are defined in a parametric manner:[1]
<variable id="uniform_example">
  <distrib type="uniform">
    <min>-1</min>
    <max>3</max>
  </distrib>
</variable>
<variable id="gaussian_example">
  <distrib type="gaussian">
    <mean>2</mean>
    <variance>4</variance>
  </distrib>
</variable>
<variable id="dirichlet_example">
  <distrib type="dirichlet">
    <alpha>1</alpha>
    <alpha>1</alpha>
    <alpha>2</alpha>
  </distrib>
</variable>
A dialogue model is essentially defined as a set of probabilistic rules combined with one or more "trigger variables" that define when the rules are to be applied:
<model trigger="trigger variable(s)">
  <rule id="rule 1">
    ...
  </rule>
  <rule id="rule 2">
    ...
  </rule>
  ...
  <rule id="rule n">
    ...
  </rule>
</model>
When a model has several trigger variables, they must be separated by commas. The rules can be either probability rules or utility rules, as explained below.
XML format for <model>:
Probability rules express how a subset of state variables (the "input variables" of the rule) affects the probability distribution over some other state variables (the "output variables"). The output variables may either already exist in the dialogue state (in which case their content is erased) or represent new variables to include in the dialogue state.
Probability rules are structured as an if...then...else construction:
if (condition c1) then
    P(effect e1) = ...
    P(effect e2) = ...
    ...
else if (condition c2) then
    ...
else
    ...
In XML, these probability rules are expressed as an (ordered) list of cases. Each case has a (possibly empty) condition and a list of alternative effects (each with a particular probability).
Here is a concrete example of a probability rule (corresponding to the rule r1 in Lison (2014), p. 65):
<rule id="r1">
  <case>
    <condition>
      <if var="Rain" value="false"/>
      <if var="Weather" value="hot"/>
    </condition>
    <effect prob="0.03">
      <set var="Fire" value="true"/>
    </effect>
    <effect prob="0.97">
      <set var="Fire" value="false"/>
    </effect>
  </case>
  <case>
    <effect prob="0.01">
      <set var="Fire" value="true"/>
    </effect>
    <effect prob="0.99">
      <set var="Fire" value="false"/>
    </effect>
  </case>
</rule>
Rule r1 simply indicates that the probability of a fire is 0.03 when there is no rain and the weather is hot, and 0.01 in all other cases.
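To show where such a rule sits within a complete domain file, here is a minimal sketch that combines the domain skeleton, two initial state variables and rule r1. The prior probabilities and the Weather value "mild" are illustrative assumptions, not taken from Lison (2014).

```xml
<domain>
  <initialstate>
    <!-- Illustrative priors (assumed values) -->
    <variable id="Rain">
      <value prob="0.2">true</value>
      <value prob="0.8">false</value>
    </variable>
    <variable id="Weather">
      <value prob="0.3">hot</value>
      <value prob="0.7">mild</value>
    </variable>
  </initialstate>

  <!-- Both input variables of r1 are listed as triggers,
       separated by a comma -->
  <model trigger="Rain,Weather">
    <rule id="r1">
      <case>
        <condition>
          <if var="Rain" value="false"/>
          <if var="Weather" value="hot"/>
        </condition>
        <effect prob="0.03">
          <set var="Fire" value="true"/>
        </effect>
        <effect prob="0.97">
          <set var="Fire" value="false"/>
        </effect>
      </case>
      <case>
        <!-- Empty condition: applies whenever the first case does not -->
        <effect prob="0.01">
          <set var="Fire" value="true"/>
        </effect>
        <effect prob="0.99">
          <set var="Fire" value="false"/>
        </effect>
      </case>
    </rule>
  </model>
</domain>
```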
In some circumstances, one may want to enforce a particular dominance hierarchy among the rules (in order to ensure that some rules have priority over others if they are triggered simultaneously). This can be specified using the priority attribute, taking an integer value (where 1 indicates the highest priority).
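A minimal sketch of such a hierarchy, assuming that priority is an attribute of the <rule> element itself. The rule identifiers are hypothetical and the rule bodies are elided.

```xml
<!-- Assumption: priority is set on <rule>; 1 is the highest priority -->
<rule id="preferred_rule" priority="1">
  ...
</rule>
<rule id="fallback_rule" priority="2">
  ...
</rule>
```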
XML format for <rule>:
XML format for <case>:
We now detail how the conditions and effects are practically specified.
Conditions
As exemplified in the rule above, the condition XML node is composed of a list of basic conditions.
XML format for <condition>:[2]
Each basic condition is written as an <if .../> markup with three basic attributes:
XML format for <if .../>:
Effects
Each case contains one or more (alternative) effects. Each effect has a particular probability of occurrence. This probability can be specified by hand, as in the example above:
<effect prob="0.03">
  <set var="Fire" value="true"/>
</effect>
When the effect does not specify any prob attribute, the effect is assumed to have a probability of 1. When the total probability for all effects is lower than 1, an empty effect is implicitly assumed to cover the remaining probability mass.
The probability of a particular effect can also be a parameter. In this case, each case with n alternative effects is associated with an n-dimensional Dirichlet distribution that expresses the possible values for the effect probabilities. For instance, the effect probabilities in rule r1 can be rewritten as:
<rule id="r1">
  <case>
    <condition>
      <if var="Rain" value="false"/>
      <if var="Weather" value="hot"/>
    </condition>
    <effect prob="firstdirichlet[0]">
      <set var="Fire" value="true"/>
    </effect>
    <effect prob="firstdirichlet[1]">
      <set var="Fire" value="false"/>
    </effect>
  </case>
  <case>
    <effect prob="seconddirichlet[0]">
      <set var="Fire" value="true"/>
    </effect>
    <effect prob="seconddirichlet[1]">
      <set var="Fire" value="false"/>
    </effect>
  </case>
</rule>
Note the square brackets after the parameter name, which refer to a specific dimension of the multivariate Dirichlet.
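For this rule to be usable, the two parameters would presumably need matching prior distributions in the <parameters> block of the domain. The sketch below follows the Dirichlet format used for parameters in this section; the alpha values are illustrative assumptions.

```xml
<parameters>
  <!-- Each case of r1 has two alternative effects, hence two
       alpha values per Dirichlet (the values are assumed) -->
  <variable id="firstdirichlet">
    <distrib type="dirichlet">
      <alpha>1</alpha>
      <alpha>1</alpha>
    </distrib>
  </variable>
  <variable id="seconddirichlet">
    <distrib type="dirichlet">
      <alpha>1</alpha>
      <alpha>1</alpha>
    </distrib>
  </variable>
</parameters>
```

Here firstdirichlet[0] and firstdirichlet[1] would correspond to the first and second alpha entry respectively.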
XML format for <effect> (for probability rules):
Inside each effect is a list of basic assignments of values to variables. Each assignment is defined by a <set .../> markup with two attributes: var and value.
XML format for <set .../> (for probability rules):
Rules can also be employed to express utility models. A utility rule defines the utility of particular actions (from the system perspective) depending on particular state variables. The general skeleton remains similar to probability rules, with the difference that effects are now associated with particular utilities instead of probabilities. Here is an example of a utility rule (rule r2 of Lison (2014), p. 69):
<rule id="r2">
  <case>
    <condition>
      <if var="Fire" value="true"/>
    </condition>
    <effect util="5">
      <set var="Tanker" value="drop-water"/>
    </effect>
    <effect util="-5">
      <set var="Tanker" value="wait"/>
    </effect>
  </case>
  <case>
    <effect util="-1">
      <set var="Tanker" value="drop-water"/>
    </effect>
    <effect util="0">
      <set var="Tanker" value="wait"/>
    </effect>
  </case>
</rule>
Rule r2 indicates that the utility of the drop-water action is +5 if there is a fire (and -1 otherwise), and that the utility of wait is -5 if there is a fire and 0 otherwise.
Conditions are defined similarly to probability rules. Effects also have a similar structure, with one exception: the prob attribute is replaced by util. The variables specified in the effect (Tanker in the above example) are action variables.
As with probability rules, utilities can be fixed or correspond to parameters to estimate. For instance, rule r2 can include four parameters that denote the respective utility of the system actions depending on the situation:
<rule id="r2">
  <case>
    <condition>
      <if var="Fire" value="true"/>
    </condition>
    <effect util="firstgaussian">
      <set var="Tanker" value="drop-water"/>
    </effect>
    <effect util="secondgaussian">
      <set var="Tanker" value="wait"/>
    </effect>
  </case>
  <case>
    <effect util="thirdgaussian">
      <set var="Tanker" value="drop-water"/>
    </effect>
    <effect util="fourthgaussian">
      <set var="Tanker" value="wait"/>
    </effect>
  </case>
</rule>
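The four utility parameters would presumably be declared in the <parameters> block, using the Gaussian format described for parameters in this section. In the sketch below, the prior means reuse the fixed utilities of the original r2 (5, -5, -1 and 0) and the variance of 1 is an arbitrary choice; both are assumptions made for illustration.

```xml
<parameters>
  <!-- Prior means mirror the hand-set utilities of r2;
       the variances are assumed values -->
  <variable id="firstgaussian">
    <distrib type="gaussian">
      <mean>5</mean>
      <variance>1</variance>
    </distrib>
  </variable>
  <variable id="secondgaussian">
    <distrib type="gaussian">
      <mean>-5</mean>
      <variance>1</variance>
    </distrib>
  </variable>
  <variable id="thirdgaussian">
    <distrib type="gaussian">
      <mean>-1</mean>
      <variance>1</variance>
    </distrib>
  </variable>
  <variable id="fourthgaussian">
    <distrib type="gaussian">
      <mean>0</mean>
      <variance>1</variance>
    </distrib>
  </variable>
</parameters>
```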
XML format for <effect> (for utility rules):
XML format for <set ... /> (for utility rules):
In addition to an initial state, parameters and rule-structured models, a dialogue domain can also include particular system settings to override the default values.[3]
The settings are defined as a simple list of elements:
<settings>
  <property1>value for property1</property1>
  <property2>value for property2</property2>
  ...
</settings>
These properties can also be modified through the GUI or by adding a -Dproperty=value flag to the command line.
XML format for <settings>:
(partial list, see Settings.java for all details)
[1] Multivariate Gaussian distributions can also be defined. In this case, the scalar values for the mean and variance are replaced by vector values in the form <mean>[v1,v2,..,vn]</mean>. Multivariate Gaussian distributions currently support only a diagonal covariance (i.e. independent Gaussians).
[2] Conditions can also include the nested operators <and>, <not> and <or> (cf. Advanced modelling: nested conditions).
[3] The default settings can be found in the file resources/settings.xml.