The proposed project concerns interaction with digital systems through virtual characters that inhabit real, physical environments via a projected interface. These projected virtual characters will move around surfaces in a realistic manner and interact with people through speech, using a multimodal dialogue management system that includes both speech recognition and speech synthesis. The aim of this project is to design and build a prototype system that we can deploy in InSpace, enabling us to begin to investigate whether projected virtual characters can facilitate information exchange and provide an improved quality of experience compared with other approaches to interaction. In the medium term (and through follow-on funding we are in the process of applying for), this will enable us to investigate issues such as the benefits of having such a focus for interaction (compared with no focus, or a mobile device), whether two or more virtual characters result in an improved interaction, and the social presence of projected virtual characters in relation to their modality, persona and the overall context.
The motivation for this work is to develop new approaches to interaction with digital systems in both public and private spaces – freeing the user from interacting with a specific device (handheld or fixed) while maintaining a visual focus for the interaction. Projections are significantly easier to handle than robots, which would have to navigate around things and people in the environment, and they cannot accidentally hurt anyone. Yet they can be very expressive, using several means of nonverbal communication (facial expression, gaze, body language, the way they move around, ...) to make the experience of talking to an agent more complete, and they can be used to convey affect (personality, mood and emotion). The fact that these characters are projected implies that they are ‘living’ on the surfaces of the room. It therefore does not make sense to use human-like characters, which would simply appear too unnatural. Instead we plan to use cartoon versions of animals that can crawl on walls, ceilings, desks and so on, such as geckos or spiders.
During this project we plan to set up a working prototype in InSpace with two output devices: a controllable projection device to project the character, and a hypersonic directional speaker on a pan-and-tilt unit to make the character’s voice appear to come from its projected position. To recognise users’ speech we plan to use a microphone array. The project builds on several mature components, including the AMI distant speech recognition system, HTS speech synthesis, the JAST/Indigo multimodal dialogue system, and the controllable projector system developed by Jochen Ehnes under the NIPUI Marie Curie fellowship.
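To make the intended integration concrete, the following minimal sketch (in Python) shows how the components might be wired together in a single interaction loop. All class and method names here are hypothetical placeholders standing in for the AMI recogniser, the JAST/Indigo dialogue manager, HTS synthesis and the projector/speaker hardware; they are not the actual APIs of those systems.

# Hypothetical sketch of the main interaction loop. The five collaborator
# objects are placeholders for the real components named above.
class InteractionLoop:
    def __init__(self, asr, dialogue, tts, projector, speaker):
        self.asr = asr              # microphone-array speech recogniser
        self.dialogue = dialogue    # multimodal dialogue manager
        self.tts = tts              # speech synthesiser
        self.projector = projector  # steerable projector rendering the character
        self.speaker = speaker      # hypersonic speaker on a pan-and-tilt head

    def step(self):
        utterance = self.asr.recognise()        # user speech -> text hypothesis
        if not utterance:
            return
        reply = self.dialogue.respond(utterance)
        pose = self.projector.character_pose()  # where the gecko currently is
        self.speaker.aim(pose)                  # make the voice come from there
        audio, visemes = self.tts.synthesise(reply)
        self.projector.animate_mouth(visemes)   # lip-sync while the audio plays
        self.speaker.play(audio)

The point of the sketch is the data flow: recognised speech drives the dialogue manager, whose reply is synthesised and emitted from the character’s current projected position.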
Who am I?:
Jochen Ehnes, Colin Matheson, Jon Oberlander, Steve Renals, Simon Biggs (ECA)
How is it novel? What is exciting about it?:
Previous systems presented virtual characters on a computer screen. As a result, the experience was like watching a TV show and calling in to talk to the people in the studio, or, at best, like a video chat. The new and exciting thing about this project is that it finally brings the agents into the same room with us. We won’t even need a computer in the classical sense of screen, keyboard and mouse to interact with them.
While there has been previous work with projected characters, it was quite limited. For one thing, because those systems used human-like characters, it would not have been believable to let the characters walk between different locations across multiple surfaces and around obstacles such as doors. For that reason the characters were either morphed into a golden ball in order to be moved, or ‘beamed’ over onto a PDA to be transported to a new location. Furthermore, these systems used predefined animations and pre-recorded speech segments, which did not allow for very dynamic behaviour. In our project, however, we plan to create very dynamic behaviour and the impression that the characters really live in our environment. Instead of being pre-rendered, our characters will be simulated and rendered live, so they can move across the room in a believable way.
We are also excited that this system will provide a new opportunity to showcase research on speech technology and dialogue systems to a wider audience in InSpace, and that it will provide a means to examine affective computing in this context.
What will I do next? What opportunities will it open up?:
We are currently preparing a proposal for a follow-on EPSRC project; the IDEA Lab work will form a valuable pilot/prototype for this.
Simon Biggs (ECA) is a partner in the current proposal, and we believe that the prototype will be a valuable and concrete way to develop further and deeper collaborations with ECA, with the intention of developing joint proposals.
What constitutes success? How risky is it?:
Ultimate success would be to create virtual characters that seem to live in a space and can talk to and interact with people coming into that space. They should be believable enough that people interact with them naturally. This is not only hard to quantify but also very ambitious. For that reason we define success as having a system that projects a virtual gecko onto surfaces in InSpace, where it can move around in a believable way, meaning its feet stay in place while they are in contact with the wall. The gecko will also move its mouth in sync with the speech generated by the speech synthesiser (probably Festival), and the sound of the speech will be projected at the character’s position by a hypersonic speaker on a pan-and-tilt head. The system will also be able to understand speech and react to it using an extended dialogue system. Optionally, we will implement a video tracking system to allow the system to spot users and approach them automatically, but we recognise that it may be too ambitious to integrate this within the intended timeframe.
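To make the “feet stay in place” criterion concrete, one simple way to implement it is a foot-anchoring gait: each planted foot keeps a fixed contact point on the surface, and is lifted and re-planted only once the body has moved too far from it. The sketch below is purely illustrative; the names and the stride threshold are our own assumptions, not an existing implementation.

import math

STRIDE = 0.12  # metres a foot may drift from its rest position before swinging

class Gecko:
    def __init__(self, body, offsets):
        self.body = body  # (x, y) body centre on the projection surface
        # Each foot stores its nominal offset from the body and its current
        # anchor: the fixed contact point on the wall while it is planted.
        self.feet = [{"offset": o,
                      "anchor": (body[0] + o[0], body[1] + o[1])}
                     for o in offsets]

    def move_body(self, new_body):
        self.body = new_body
        for foot in self.feet:
            rest = (self.body[0] + foot["offset"][0],
                    self.body[1] + foot["offset"][1])
            # A planted foot keeps its anchor, so the rendered foot never
            # slides along the wall; it is re-planted only when the body has
            # moved more than one stride away from it.
            if math.hypot(rest[0] - foot["anchor"][0],
                          rest[1] - foot["anchor"][1]) > STRIDE:
                foot["anchor"] = rest

gecko = Gecko((0.0, 0.0), offsets=[(-0.1, 0.05), (0.1, 0.05),
                                   (-0.1, -0.05), (0.1, -0.05)])
gecko.move_body((0.05, 0.0))  # small move: all feet stay planted
gecko.move_body((0.30, 0.0))  # larger move: trailing feet are re-planted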
An obvious risk is that speech recognition will not work well enough with input from a microphone array. If that is the case, we plan to use close-talking microphones. If real-time speech recognition is still insufficient, we will fall back to keyword spotting.
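For illustration, the keyword-spotting fallback can be as simple as scanning the recogniser’s (possibly noisy) text hypotheses for a small set of command words; the keyword set and intent names below are invented for the example.

# Minimal keyword-spotting fallback over ASR text hypotheses. The keywords
# and intents are illustrative only.
KEYWORDS = {
    "hello": "greet",
    "weather": "report_weather",
    "goodbye": "farewell",
}

def spot_keywords(hypothesis):
    """Return the intents triggered by keywords in a recogniser hypothesis."""
    tokens = hypothesis.lower().split()
    return [intent for word, intent in KEYWORDS.items() if word in tokens]

# Even a partly misrecognised utterance can still trigger the right intent:
print(spot_keywords("uh what is the weather like"))  # -> ['report_weather']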
What resources do I bring to the project?:
The project builds on several mature software components, including the AMI distant speech recognition system (including microphone array audio capture), HTS speech synthesis (including the hypersonic directional speaker), the JAST/Indigo multimodal dialogue system, and the controllable projector system developed by Jochen Ehnes under the NIPUI Marie Curie fellowship.
What resources and expertise do I need?:
The project would support Jochen Ehnes, who will be co-funded by InSpace, plus a speech researcher from CSTR. We are requesting 4 person-months in total.
What shared resources, if any, will the project create?:
The project will create a prototype system for creating virtual characters. It will create all the components necessary for this and ensure that they integrate well. The system could then be used to further develop, test and evaluate individual components, such as the dialogue system, which might be improved with more affective behaviours.
As it consists of several components developed in-house, such as speech recognition, speech synthesis, dialogue systems and signal processing for microphone arrays, it can be used to demonstrate advances in all those systems to a wider audience (visitors to InSpace) while at the same time providing a real-life testbed for those systems.