The most meaningful thing was the internship: working in industry and seeing what industry experience is really like. Not everything you dream of doing materializes. Can you succeed as a student, as a researcher, find a funding source, etc.? Any answers to those questions are meaningful.
Do something; in three months you will know a lot more. If it doesn't work, it can serve as a baseline. So working on things and making them better is the goal.
Learning many skills is important. Finding novel solutions to existing dialogue problems is hard. A first success was applying a well-known method to a different problem; that helps build credibility.
You cannot become a good researcher sitting at home. People were afraid to try new things. "Standing on the shoulders of giants" is meaningful; the environment is meaningful.
Dialogue picked me; I didn't pick dialogue. There was a point around the 2000s when I thought web access and smartphones were the thing, and I worried that my research would all go to waste. But I didn't switch to something else, so I had to be patient.
Neural nets have been around for quite some time, but only got attention later.
Having patience is important. It's like an obsession; it's like doing art. When SDS didn't work, I found myself analyzing human-human dialogue, which kept the motivation going. Then I started working with social media.
Don't follow the trend. Focus on the approach. Know that people are more interested in the approach than in the results, since results follow from a good approach. Build a good approach.
Gokhan: Let users explore. Do more and more things.
Traum: Go back to human conversation and see how people converse. You don't go to McDonald's and ask about integral calculus; there is an environment and a reason the system was built. Give people a chance to explore the system and let them know what it was built for. Draw on those models. How do you detect that the dialogue is not going anywhere, and how do you gracefully recover from there?
Walker: Can human-in-the-loop work?
Traum: It depends on the expertise of the people.
Gokhan: Deep learning?
Traum: (Converting dialogue into a machine-learning problem is not good.)
Walker: Broad-coverage people are important: an inquisitive mind, someone you can talk to about anything. The problem is that broad coverage has to be learnt from the right places, and finding those places is hard. Read a lot! Have an interest in and sensitivity to language. Language is a way of communicating, not just a signal.
Trung: So, a lot of different expertise?
Traum: We're focused on language, and a few people know a lot about language, more than us, but their contributions as a body of material are not quite useful to us. People from different disciplines don't look at the problem the same way.
Ethan: Give an example. Motivate well.
Marilyn: Think hard about the baselines. Ask people what a good baseline is and where the issues lie. Have a good baseline for your tasks. It is important to keep thinking about the problem and how you set it up.
Gokhan: It happens in ML that "my number is so much better" is less influential. What was your contribution that made the model better? How did your contributions affect the numbers?
Trung: Deep learning cannot explain why it is better: why it works better, and why it is a good solution.
Two major categories in SDS:
IoT provides a large number of use cases, e.g. AT&T systems, connected dialogue systems.
Ondrej: Maybe it is not dialogue; maybe it is an integration system. A navigation system should ask questions.
Commercial dialogue systems are usually task-oriented. The other category is entertainment systems: AI in a game, on a PlayStation, where you talk to the AI and all work as a team.
Is Siri a dialogue system?
Omilia: Lots of customers. Conversational systems do not eliminate jobs; the goal is not to lose jobs but to change them. Time is what matters. Orient agents toward providing a better customer experience.
Human-in-the-loop AI can help leverage humans for the more complex cases.
Card activation costs 2.5 euros if humans do the job; an automated agent call costs around 0.2 or 0.3 euros. That is roughly a tenfold cost reduction.
Agents need to be able to grab attention. Many people just speak while distracted, so the system doesn't quite hold their attention.
Context of the dialogue, e.g. "Hello" .. the customer missed it .. "Hello" again. Help the customers move on.
If the agent is confined to just finishing the conversation within a time bound, it cannot achieve optimality.
Customer service: dealing with a person is individual, but the company cares about volumes serviced and how busy the queue is.
Having a persona can help businesses perform better.
Imagine building an SDS for a fridge: a task-oriented system can be deployed without a persona. It is like a Siri use case.
People like talking to people. Siri: it depends on the use case.
Security systems should be smart, e.g. a robotic dog for security.
IoT: fridge, camera, and all the other devices. Ecosystem creation with SDS.
People attribute personality to a voice. Even a Roomba gets a personality attributed to it; automatic systems in general do.
Alexa with hands-free functionality is good, and such things don't quite need a personality yet.
Personality is probably needed not when the functionality performs well, but when it doesn't. Repeatedly saying "I am sorry, I didn't catch that" is not a good solution; that is where a personality is needed.
How to recover from errors is important, in commercial as well as research SDS. Siri's default is a Google search.
In Europe, customers have a completely different attitude to the system: as soon as they hear it is a system, they start swearing.
- We have a bunch of labeled data. What's the best way to add new labels? Slot and intent mapping: adding new intents and slots requires recollecting data and re-implementing the pipeline, i.e. re-training. Maybe generative models can be a solution to this.
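To make the re-training point concrete, here is a minimal sketch (hypothetical data and function names, not from the talk): a bag-of-words nearest-centroid intent classifier. Adding a new intent means recomputing the centroids, i.e. re-training the whole model.

```python
from collections import Counter

def train(examples):
    """examples: list of (utterance, intent). Returns intent -> word-count centroid."""
    centroids = {}
    for text, intent in examples:
        centroids.setdefault(intent, Counter()).update(text.lower().split())
    return centroids

def classify(centroids, utterance):
    words = Counter(utterance.lower().split())
    # score = bag-of-words overlap between the utterance and each intent centroid
    return max(centroids, key=lambda i: sum(words[w] * centroids[i][w] for w in words))

data = [("book a table for two", "restaurant"),
        ("reserve a table tonight", "restaurant"),
        ("play some jazz music", "music")]
model = train(data)
assert classify(model, "book a table") == "restaurant"

# Adding a new intent requires collecting examples and a full re-train,
# which is exactly the pain point raised in the notes:
data.append(("what is the weather tomorrow", "weather"))
model = train(data)
assert classify(model, "weather tomorrow") == "weather"
```

A real system would use learned embeddings rather than raw counts, but the pipeline-level cost of adding an intent is the same.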
- Including context in the language understanding.
- A small labeled dataset but a large unlabeled dataset: use self-taught learning.
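A minimal sketch of the self-taught idea, under a strong simplification: learn a representation (here just IDF weights) from the large unlabeled corpus, then classify with the small labeled set in that representation. All data and names below are illustrative.

```python
import math
from collections import Counter

def learn_idf(unlabeled_docs):
    """Learn word weights from unlabeled text only (the 'self-taught' step)."""
    df = Counter()
    for doc in unlabeled_docs:
        df.update(set(doc.lower().split()))
    n = len(unlabeled_docs)
    return {w: math.log(n / df[w]) for w in df}

def featurize(idf, text):
    return {w: c * idf.get(w, 0.0) for w, c in Counter(text.lower().split()).items()}

unlabeled = ["the the the cheap food", "the flight to boston",
             "the cheap hotel", "the flight is late"]
idf = learn_idf(unlabeled)   # "the" occurs everywhere, so its weight is 0

labeled = [("cheap food nearby", "restaurant"), ("flight to boston", "travel")]

def predict(text):
    v = featurize(idf, text)
    def score(example):
        u = featurize(idf, example[0])
        return sum(v.get(w, 0.0) * u.get(w, 0.0) for w in v)
    return max(labeled, key=score)[1]

assert predict("cheap flight to boston") == "travel"
```

The unlabeled corpus never sees a label; it only shapes the feature space, which is what lets a tiny labeled set go further.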
- Using annotations on the ASR engine.
- DSTC2 data is static; new data has variability and is much more dynamic. So does SLU need to be a service? There is a problem of conversion between state tracking and understanding.
- SLU is usually a task-oriented setting with slot filling. What about the non-task-oriented setting: understanding what a person is saying in open domain and responding accordingly? There the SLU task is much vaguer and depends on the domain.
- Generic understanding: AMR (Abstract Meaning Representation) is more powerful than FrameNet.
- Spoken vs. written: written language is much harder to parse, while spoken utterances are shorter and easier to parse.
- Even given a perfect AMR parser, how do you build the perfect SDS?
- The OpenTable API is a three-variable API. The query to the API is what fulfils the user request.
- Classification is an over-simplified version of SLU.
- High-precision grammars and disfluencies: disfluency clean-up doesn't actually clean up. Query rewriting is a big issue.
- Query rewriting is very dominant: synonyms and disjunctions are added.
- Query rewriting: speech is treated as a commodity.
- Incremental SLU can work by allowing barge-ins: annoying, but it works. Query success rates increased by performing incremental SLU.
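A hypothetical sketch of the incremental idea: score intents on each partial ASR hypothesis and commit as soon as one is confident enough, which is the point where the system could barge in. The keywords and threshold below are made up for illustration.

```python
INTENT_KEYWORDS = {
    "check_balance": {"balance", "account"},
    "transfer": {"transfer", "send", "money"},
}

def incremental_slu(words, threshold=2):
    """Consume words one at a time; return (n_words_consumed, intent)
    the moment an intent reaches the confidence threshold."""
    seen = set()
    for i, w in enumerate(words, start=1):
        seen.add(w.lower())
        for intent, kws in INTENT_KEYWORDS.items():
            if len(seen & kws) >= threshold:
                return i, intent       # commit early: barge-in point
    return len(words), None            # never confident: wait for end of utterance

utt = "i would like to transfer some money to savings".split()
print(incremental_slu(utt))  # -> (7, 'transfer'), i.e. after 7 of 9 words
```

The system can respond (or interrupt) before the user finishes, which is where the reported gain in query success comes from.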
- Retrain on real user utterances. Corpora are important.
- Residual nets may be equivalent to RNNs. Skipping layers in resnets doesn't quite have the same appeal due to depth and randomness.
- WaveNet generates signals. WaveNets vs. resnets? Generating words is very different from generating images, but it could still work. Recursive neural networks.
- Fitting to the API is important, e.g. "Show me movies of James Cameron". Build trees based on the parse tree. Recursive neural networks need more structured training data.
- Recursive NNs were more linguistically motivated.
- A bracketed representation fed to an LSTM provides the structure.
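The bracketed-representation trick can be sketched as follows: linearize a parse tree into a token sequence that a plain LSTM can consume, so the sequence model sees the structure explicitly. The tree below is illustrative.

```python
def linearize(tree):
    """tree = (label, children...) or a leaf string -> list of bracketed tokens."""
    if isinstance(tree, str):
        return [tree]
    label, *children = tree
    tokens = ["(" + label]          # open bracket carries the constituent label
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")")
    return tokens

parse = ("S", ("NP", "movies"), ("PP", "by", ("NP", "james", "cameron")))
print(" ".join(linearize(parse)))
# -> (S (NP movies ) (PP by (NP james cameron ) ) )
```

This is how tree structure is handed to sequence models without a recursive architecture: the brackets become ordinary vocabulary items.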
- Memory-augmented neural networks: memory augments learning by integrating context.
- Attention models vs memory networks.
- Memory networks may be better suited to QA than to dialogue.
- Direct linking works well for parsing lattices.
- The input layer is in the form of a lattice. [wordlets]
- Memory networks [QA systems] and the Facebook data set could be a good pointer. Use a memory network to represent the context.
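A toy illustration of the context-as-memory idea (not Facebook's actual implementation): past utterances are memory slots, and a query attends to them by similarity, with the best-matching slot supplying the answer context.

```python
import math
from collections import Counter

def bow(text):
    return Counter(text.lower().split())

def attend(memory, query):
    """Softmax attention over bag-of-words overlap; returns best slot + weights."""
    q = bow(query)
    scores = [sum(q[w] * bow(m)[w] for w in q) for m in memory]
    z = [math.exp(s) for s in scores]
    weights = [x / sum(z) for x in z]          # softmax attention weights
    best = max(range(len(memory)), key=lambda i: weights[i])
    return memory[best], weights

memory = ["john went to the kitchen",
          "mary picked up the ball",
          "mary grabbed the apple"]
slot, w = attend(memory, "where is john")
print(slot)  # -> "john went to the kitchen"
```

In a trained memory network the similarity and the readout are learned embeddings rather than word overlap, but the read-attend-answer loop is the same.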