Frequently asked questions

Generalities
Dialogue domains
Miscellaneous
Technical issues
- I get an UnsupportedClassVersionError when I try to run OpenDial.
- Speech recognition with the Nuance API does not seem to work.

Generalities

What is the relationship between OpenDial and MDP/POMDP approaches?

As in existing statistical frameworks, OpenDial represents the dialogue state as a Bayesian network that is regularly updated with new observations and employed to calculate the utility of possible system actions. The key difference between OpenDial and traditional MDP/POMDP approaches is that the domain models (i.e. the transition, reward and observation models) are expressed via probabilistic rules instead of the usual representations for probability distributions and utility functions.

The rules are essentially high-level templates for probabilistic models. They provide an abstraction layer that allows the system designer to capture the domain in a concise and human-readable form, with a limited number of parameters. In other words, OpenDial can be seen as following a "structured POMDP" approach to dialogue. See Lison (2014) for a more detailed description of the formalism.

What exactly are the utility rules? Do they represent rewards, action-values, or something else?

It depends on the planning horizon that is employed for your domain. In the most common case, the planning horizon is set to 1 (i.e. limited to the present time step). In this setting, the action selected by OpenDial will simply correspond to the one that maximises the total utility in the current dialogue state. It is also possible to use planning horizons larger than 1, in which case online planning is performed to find the action that maximises the accumulated utilities from the present time up to the horizon limit (given a particular discount factor). Online planning is, however, a computationally expensive operation due to the combinatorial explosion of the number of possible interaction paths to consider. It also requires the specification of a full transition model.

For most dialogue domains, we therefore recommend keeping the planning horizon at its default value, and ensuring that the (handcrafted or learned) utilities reflect the long-term utility of each action, and not just its immediate reward.
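
With a planning horizon of 1, action selection amounts to picking the action with the highest total utility in the current state. The following self-contained Java sketch illustrates only that idea. The class GreedyActionSelector, the method selectAction and the example utilities are hypothetical and are not part of the OpenDial API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative sketch only: hypothetical class, not part of OpenDial.
class GreedyActionSelector {

    // Returns the action with the highest total utility,
    // or an empty Optional if no candidate action is available.
    static Optional<String> selectAction(Map<String, Double> utilities) {
        return utilities.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        // Made-up total utilities for three candidate actions.
        Map<String, Double> utilities = new LinkedHashMap<>();
        utilities.put("AskRepeat", 0.5);
        utilities.put("Move(Left)", 1.2);
        utilities.put("Move(Right)", -3.0);
        System.out.println(selectAction(utilities).orElse("no action"));
    }
}
```

Because no lookahead is performed in this setting, the utilities themselves have to encode the long-term value of each action.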

Why is there a distinction between "predictive" variables X^p and normal state variables X in OpenDial?

We sometimes want to use probability rules to provide a prior prediction on a future, currently unobserved variable (for instance, to predict the next user dialogue act following a particular system response). The suffix ^p allows OpenDial to distinguish such predictions from actually observed values. Once matched with actual observed values, the prediction acts as a prior for the observation. Technically, this is realised via an "equivalence node" inserted between the prediction and the observation; see Lison (2014), pp. 78-79, for details.

This explicit distinction between prediction and observations is necessary in OpenDial since both the predictions and the observations may be uncertain (and hence expressed as distinct probability distributions). Spoken dialogue systems must indeed frequently integrate observations that represent "soft" evidence, such as the ASR/NLU hypotheses of the user dialogue act.
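
One simplified way to read "the prediction acts as a prior" is as a normalised product of the predicted distribution and the soft evidence. The Java sketch below shows that reading under the assumption that prediction and observation must take the same value. It does not reproduce OpenDial's equivalence node implementation, and the class PredictionAsPrior, the method combine and the example values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: hypothetical class, not part of OpenDial.
class PredictionAsPrior {

    // Combines a predicted distribution (prior) with an uncertain
    // observation (soft evidence): posterior(x) is proportional to
    // prediction(x) * observation(x).
    static Map<String, Double> combine(Map<String, Double> prediction,
                                       Map<String, Double> observation) {
        Map<String, Double> posterior = new LinkedHashMap<>();
        double total = 0.0;
        for (Map.Entry<String, Double> e : observation.entrySet()) {
            double prior = prediction.getOrDefault(e.getKey(), 0.0);
            double weight = prior * e.getValue();
            posterior.put(e.getKey(), weight);
            total += weight;
        }
        if (total == 0.0) {
            throw new IllegalArgumentException(
                    "prediction and observation share no value");
        }
        final double z = total;
        posterior.replaceAll((value, weight) -> weight / z);
        return posterior;
    }

    public static void main(String[] args) {
        Map<String, Double> prediction = new LinkedHashMap<>();
        prediction.put("Request(A)", 0.7);
        prediction.put("Request(B)", 0.3);
        Map<String, Double> observation = new LinkedHashMap<>();
        observation.put("Request(A)", 0.4);
        observation.put("Request(B)", 0.6);
        System.out.println(combine(prediction, observation));
    }
}
```

In this made-up example the observation alone favours Request(B), but the prediction shifts the combined distribution towards Request(A).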

What are the differences between probability and utility rules?

Probability rules represent conditional probability models P(Y|X), where Y and X are arbitrary subsets of the state variables. In other words, probability rules express conditional, probabilistic relations between state variables. Utility rules, on the other hand, express the utility U(A,X) of particular system actions A depending on the state variables X. The utility rules encode the relative "desirability" (from the system's point of view) of executing particular actions depending on the current state.

In practice, probability rules are typically used to define the models used in language understanding (to capture e.g. the relation between the user utterances and their corresponding dialogue acts), dialogue state update (to capture the relation between the dialogue acts and other state variables such as the underlying user intentions) and in the prediction of future observations. The utility rules, for their part, are most often used in action selection (to find the next system action to execute) and generation (to find the best realisation of a particular communicative action).

Dialogue domains

How do I get the dialogue system to start first, before the user has uttered anything?

This can be done very easily: just add the starting system action in the initial dialogue state for the domain.

The calculation of marginal distributions sometimes gives imprecise results. Why?

If your domain includes continuous distributions (such as unknown parameter values), OpenDial will rely on sampling algorithms to perform probabilistic inference. Sampling algorithms are approximate algorithms, so the probability value will always be somewhat imprecise -- especially when dealing with multivariate continuous distributions, which are more difficult to sample. You can easily modify the number of samples in the Options -> Settings GUI window, or in the domain settings.
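
To see why sample-based estimates fluctuate and why more samples help, consider the generic Monte Carlo sketch below. It estimates a known probability (P(U < 0.3) for a uniform variable U) from a varying number of samples. It is not OpenDial's inference code, and the class SamplingPrecision and the method estimate are hypothetical names.

```java
import java.util.Random;

// Illustrative sketch only: hypothetical class, not part of OpenDial.
class SamplingPrecision {

    // Estimates P(U < threshold) for U uniform on [0, 1) from n samples.
    static double estimate(double threshold, int n, long seed) {
        Random rng = new Random(seed);
        int hits = 0;
        for (int i = 0; i < n; i++) {
            if (rng.nextDouble() < threshold) {
                hits++;
            }
        }
        return (double) hits / n;
    }

    public static void main(String[] args) {
        // The true value is 0.3; the estimates approach it as n grows.
        for (int n : new int[] {100, 10_000, 1_000_000}) {
            System.out.printf("n=%d  estimate=%.4f%n", n, estimate(0.3, n, 42L));
        }
    }
}
```

The error of such an estimate shrinks roughly with the square root of the number of samples, so a larger sample size gives more precise results at the cost of more computation.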

What is this special None value present in some distributions?

The None value is employed as a filler value to ensure that all probability distributions sum up to 1.0. For instance, if a user utterance is expressed as an N-best list with 2 elements, "move left" with probability 0.6 and "mow the left" with probability 0.2, the distribution over possible user utterances will have a None value with probability 0.2, corresponding to an empty utterance.

For system actions, the None value represents the void action (i.e. do nothing).
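
The arithmetic behind the filler value can be sketched in a few lines of Java. This is only an illustration of the N-best example above. The class NoneFiller and the method withNoneValue are hypothetical and do not correspond to OpenDial's own code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: hypothetical class, not part of OpenDial.
class NoneFiller {

    // Returns a copy of the N-best list in which the probability mass
    // missing from 1.0 (if any) is assigned to the filler value "None".
    static Map<String, Double> withNoneValue(Map<String, Double> nbest) {
        double total = 0.0;
        for (double p : nbest.values()) {
            total += p;
        }
        Map<String, Double> result = new LinkedHashMap<>(nbest);
        double remainder = 1.0 - total;
        if (remainder > 1e-9) {
            result.put("None", remainder);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> nbest = new LinkedHashMap<>();
        nbest.put("move left", 0.6);
        nbest.put("mow the left", 0.2);
        // "None" receives the remaining mass, approximately 0.2.
        System.out.println(withNoneValue(nbest));
    }
}
```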

How can I keep track of the dialogue history?

By default, the dialogue state will only contain the most recent value of the user- or system-related variables such as u_u (user utterance), a_u (user dialogue act), a_m (system action) or u_m (system utterance). However, one can easily record longer dialogue histories by creating new variables that capture previous dialogue acts. For instance, we can create a new variable a_u-prev that contains the next-to-last dialogue act from the user, and specify the following rule in the domain model updating the user dialogue act:

<rule>
  <case>
    <effect>
      <set var="a_u-prev" value="{a_u}" />
    </effect>
  </case>
</rule>

The same operation can of course be done for other state variables.

Alternatively, if you want to keep track of the complete dialogue history (without limit on the number of elements), you can also define the history as a list, and insert a new element after each update:

<rule>
  <case>
    <effect>
      <set var="a_u-history" value="{a_u-history}+{a_u}" />
    </effect>
  </case>
</rule>

For this last approach, you might need to reduce the number of hypotheses in the N-best lists in order to avoid a combinatorial explosion in the number of values in this history variable.

Miscellaneous

How do I integrate OpenDial as part of another application?

Integrating OpenDial into another Java application is straightforward. Add OpenDial to your classpath (if you do not need direct access to OpenDial's code, adding the OpenDial JAR file and its dependencies is enough), instantiate a new DialogueSystem object, provide it with a dialogue domain (and possibly some additional modules), and start the system:

import opendial.DialogueSystem;
import opendial.domains.Domain;
import opendial.readers.XMLDomainReader;

// Create the dialogue system from an XML domain file
Domain domain = XMLDomainReader.extractDomain("path/to/XML/domain/file");
DialogueSystem system = new DialogueSystem(domain);

// Attach new domain modules (optional)
system.attachModule(OneExampleOfNewModule.class);

// When used as part of another application, we often want to switch off the OpenDial GUI
system.getSettings().showGUI = false;

// Finally, start the system
system.startSystem();

Once started, you can simply update the dialogue state using the methods addUserInput(...) and addContent(...) in DialogueSystem. You can also query the current dialogue state at any time using the getContent(...) methods. Check the Javadoc API for details on how to control and monitor the dialogue system.

As long as your programming language allows you to import Java classes (this is the case for e.g. Jython or Scala), you can even integrate OpenDial into applications written in languages other than Java. Here is, for instance, how to start OpenDial using Jython:

>>> from opendial.readers import XMLDomainReader
>>> from opendial import DialogueSystem
>>> domain = XMLDomainReader.extractDomain("/path/to/domain/file")
>>> system = DialogueSystem(domain)
>>> system.startSystem()

How do I connect two OpenDial clients running on remote machines?

OpenDial includes functionality for connecting two remote machines (on the same network) with one another. This can be especially useful for Wizard-of-Oz experiments. To connect two OpenDial systems with one another, proceed as follows:

1. Start OpenDial on the two machines A and B (where A and B have mutually accessible IP addresses).
2. Click on Help -> About on machine A. Copy the local address (IP and port) for the machine.
3. Click on Interaction -> Connect to Remote Client on machine B. Copy the address and port for machine A into the field and click OK.
4. The connection between the two clients is now established. To use this remote connection for a Wizard-of-Oz dialogue, the machine playing the role of the Wizard should change its role by clicking on Interaction -> Interaction Role -> System.
5. At the end of the interaction, you can simply save the transcript into XML by clicking on Interaction -> Save Dialogue As ....

How can I perform incremental processing of user inputs in OpenDial?

OpenDial allows you to insert user inputs in an incremental manner. This can be useful if you have a speech recogniser that is able to output partial recognition hypotheses while the user is speaking. This way, you can get the dialogue system to react quickly and start processing the user inputs as soon as some partial hypotheses are available.

In practice, incremental user input is inserted via the method addIncrementalUserInput(...) in the class DialogueSystem. The method takes two arguments: a partial N-best list of user inputs, and a boolean flag followPrevious that indicates whether the new content is a continuation of the previous hypotheses or constitutes a new utterance. Here is an example of how you can use the method:

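import java.util.HashMap;
import java.util.Map;
import opendial.DialogueSystem;

DialogueSystem system = new DialogueSystem();
system.startSystem();

// First partial hypotheses: start of a new utterance (followPrevious = false)
Map<String,Double> nbestList1 = new HashMap<String,Double>();
nbestList1.put("this is", 0.7);
nbestList1.put("these", 0.3);
system.addIncrementalUserInput(nbestList1, false);

// Continuation of the previous hypotheses (followPrevious = true)
Map<String,Double> nbestList2 = new HashMap<String,Double>();
nbestList2.put("a screw", 0.6);
system.addIncrementalUserInput(nbestList2, true);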

The example starts by inserting the partial N-best list ["this is" (0.7), "these" (0.3)], and then expands these initial hypotheses with the N-best list ["a screw" (0.6), "" (0.4)], where the empty string receives the remaining probability mass. At the end, the full N-best list for the utterance will be ["this is a screw" (0.42), "this is" (0.28), "these a screw" (0.18), "these" (0.12)].
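
The probabilities in the final list are the products of the probabilities of the partial hypotheses. The Java sketch below reproduces only this arithmetic, assuming that the probability mass missing from the continuation list goes to the empty string. It is not OpenDial's internal implementation, and the class IncrementalNBest and the method expand are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: hypothetical class, not part of OpenDial.
class IncrementalNBest {

    // Extends every previous hypothesis with every continuation.
    // Mass missing from the continuation list goes to the empty string.
    static Map<String, Double> expand(Map<String, Double> previous,
                                      Map<String, Double> continuation) {
        double total = 0.0;
        for (double p : continuation.values()) {
            total += p;
        }
        Map<String, Double> options = new LinkedHashMap<>(continuation);
        if (1.0 - total > 1e-9) {
            options.merge("", 1.0 - total, Double::sum);
        }

        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> prev : previous.entrySet()) {
            for (Map.Entry<String, Double> next : options.entrySet()) {
                String text = next.getKey().isEmpty()
                        ? prev.getKey()
                        : prev.getKey() + " " + next.getKey();
                result.merge(text, prev.getValue() * next.getValue(), Double::sum);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> first = new LinkedHashMap<>();
        first.put("this is", 0.7);
        first.put("these", 0.3);
        Map<String, Double> second = new LinkedHashMap<>();
        second.put("a screw", 0.6);
        System.out.println(expand(first, second));
    }
}
```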

When operating in this incremental mode, the probabilistic rules and the external modules are triggered as usual, but the state variables can be modified at any time to reflect the insertion of new incremental content. If you want to perform incremental updates of variables other than the user inputs, you can do so with the method addIncrementalContent(...).

Technical issues

I get an UnsupportedClassVersionError when I try to run OpenDial.

All versions of OpenDial later than 0.95 require Java 8 in order to compile. Simply download and install the Java 8 JDK to resolve the issue.
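
To find out which Java version is actually in use, you can run java -version on the command line, or check the standard system property java.specification.version from within Java. The small sketch below does the latter. The class JavaVersionCheck and the method majorVersion are hypothetical helper names, not part of OpenDial.

```java
// Illustrative sketch only: hypothetical class, not part of OpenDial.
class JavaVersionCheck {

    // Converts a specification version such as "1.8" or "11"
    // into its major version number (8 or 11, respectively).
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        return (first == 1 && parts.length > 1)
                ? Integer.parseInt(parts[1])
                : first;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        int major = majorVersion(spec);
        if (major < 8) {
            System.out.println("Java " + major + " detected: Java 8 is required.");
        } else {
            System.out.println("Java " + major + " detected.");
        }
    }
}
```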

Speech recognition with the Nuance API does not seem to work.

You should first check that the correct input mixer is selected in the Options, that the volume bar is moving when you talk into the microphone, and that you are connected to the internet. You should also make sure that the speech data lasts longer than 2 seconds, as the Nuance API seems to have problems with speech data of shorter duration.

If the problem persists, look at the logs to see what may have gone wrong. If you get an unusual response status from the Nuance server, check the online documentation on the Nuance Mobile Developer website to get the meaning of that response status. Finally, you can directly listen to the last recorded speech input via the OpenDial GUI (go to the state monitor tab, right-click on the "s_u" node, and select "play sound"). If the sound is absent or distorted, this may indicate a problem with the sound capture on your machine.
