Have you ever talked to Siri and asked yourself how one builds such a system? Some time ago, when I was pursuing my MPhil degree in Cambridge, Prof. Steve Young demonstrated a spoken dialogue system during a talk. I was fascinated by the idea that one could make a computer speak and understand human speech. I knew I had to get into this research, so I applied for a PhD at the Department of Engineering’s Dialogue Systems Group.
A spoken dialogue system normally has three parts: speech understanding, which decodes the meaning from the user’s speech; dialogue management, which tries to come up with a good response; and speech generation, which turns the answer into natural speech. All of these modules can be data-driven: machine learning methods allow us to build systems that become better at their tasks the more data they have. This is very exciting because in today’s world we are generating data at a faster pace than ever before.
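To make the three-part structure concrete, here is a minimal sketch in Python. It is only an illustration: the names `DialogueAct`, `understand`, `manage`, `generate` and `respond` are hypothetical, the input is assumed to be already transcribed text, and the hand-written rules stand in for the data-driven models a real system would learn.

```python
from dataclasses import dataclass


@dataclass
class DialogueAct:
    """A hypothetical, highly simplified representation of meaning."""
    intent: str
    slots: dict


def understand(utterance: str) -> DialogueAct:
    """Speech understanding: decode a meaning from the (transcribed) input."""
    text = utterance.lower()
    if "restaurant" in text:
        slots = {"food": "italian"} if "italian" in text else {}
        return DialogueAct("find_restaurant", slots)
    return DialogueAct("unknown", {})


def manage(act: DialogueAct) -> DialogueAct:
    """Dialogue management: choose a response to the decoded meaning."""
    if act.intent == "find_restaurant":
        if "food" in act.slots:
            return DialogueAct("offer", {"food": act.slots["food"]})
        return DialogueAct("request", {"slot": "food"})
    return DialogueAct("repeat", {})


def generate(act: DialogueAct) -> str:
    """Speech generation: turn the chosen response into a sentence."""
    if act.intent == "offer":
        return f"I found a nice {act.slots['food']} restaurant for you."
    if act.intent == "request":
        return "What kind of food would you like?"
    return "Sorry, could you say that again?"


def respond(utterance: str) -> str:
    """Run the three modules one after another."""
    return generate(manage(understand(utterance)))
```

Calling `respond("Find me a restaurant")` passes through all three stages and comes back with a question about the type of food. In a data-driven system, each of the three functions would be replaced by a model trained on examples.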
There are two distinct kinds of machine learning methods that we use for this research. One is called supervised learning. This is how we ourselves learn when a teacher provides examples: the system simply tries to imitate the teacher. The other is called reinforcement learning, and one can think of it as learning from interaction. In this approach, the system can explore different possibilities. Whenever it makes a good decision, it gets a reward from the user. Over time, it tries to maximise that reward, just as a child learns by trial and error.
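The idea of learning from rewards can be sketched with a very small Python example. This is an assumption-laden toy, not the method used in my research: it uses a simple epsilon-greedy strategy, ignores the state of the conversation, and replaces the human user with simulated reward probabilities. The function name `learn_by_interaction` and all the numbers are hypothetical.

```python
import random


def learn_by_interaction(reward_prob, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy learning over a fixed set of candidate responses.

    reward_prob[i] is the (hypothetical) chance that the simulated user
    rewards response i; the learner never sees these numbers directly.
    """
    rng = random.Random(seed)
    n = len(reward_prob)
    counts = [0] * n      # how often each response has been tried
    values = [0.0] * n    # running average reward of each response

    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)                         # explore
        else:
            action = max(range(n), key=lambda i: values[i])   # exploit
        # The simulated user gives a reward of 1 for a good decision.
        reward = 1.0 if rng.random() < reward_prob[action] else 0.0
        counts[action] += 1
        # Incremental update of the average reward for this response.
        values[action] += (reward - values[action]) / counts[action]

    return values


values = learn_by_interaction([0.2, 0.5, 0.9])
best = max(range(len(values)), key=lambda i: values[i])
```

After enough trials, the learner's estimates favour the response that the simulated user rewards most often. Note how many interactions even this tiny problem takes, which is part of what makes learning from real users difficult.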
This kind of learning through interaction in the context of dialogue systems really intrigues me. The problem is that such learning methods normally need a huge number of interactions before the system starts to behave reasonably well. So I’ve been working on ways to speed up this process, so that the system can learn directly from talking to a human. And indeed I was the first researcher to show that this is possible.
Applications for this technology include every area where we currently see human-computer interaction, and it will make such interaction possible in the future in areas where we can’t imagine it today. Currently, I am particularly interested in applications in the health sector. To support such applications, we need to develop algorithms capable of handling much more complex interactions than are possible today. But if successfully built, such systems would be of huge benefit to society.
Dr Milica Gašić
Lecturer in Dialogue Systems, Department of Engineering
Fellow, Murray Edwards College
See my interview for The Naked Scientists: http://www.thenakedscientists.com/HTML/interviews/interview/1001757/
Or check out my website: http://mi.eng.cam.ac.uk/~mg436/
References
M. Gašić and S. Young, "Gaussian Processes for POMDP-based dialogue manager optimisation", IEEE Transactions on Audio, Speech and Language Processing, 2014.
M. Gašić, F. Jurcicek, B. Thomson, K. Yu and S. Young, "On-line policy optimisation of spoken dialogue systems via live interaction with human subjects", ASRU, Hawaii, 2011.