Projected Virtual Characters (Or how to get people to talk to the wall)


The proposed project is about interacting with digital systems through virtual characters that are projected into real, physical environments. These projected virtual characters will move around surfaces in a realistic manner and interact with people through speech, using a multimodal dialogue management system that includes both speech recognition and speech synthesis. The aim of this project is to design and build a prototype system that we can deploy in InSpace, enabling us to begin investigating whether projected virtual characters can facilitate information exchange and provide a better quality of experience than other approaches to interaction. In the medium term (and through follow-on funding for which we are in the process of applying), this will enable us to investigate issues such as the benefits of having such a focus for interaction (compared with no focus, or with a mobile device), whether two (or more) virtual characters result in an improved interaction, and how the social presence of projected virtual characters relates to their modality, their persona and the overall context.

The motivation for this work is to develop new approaches for interacting with digital systems in both public and private spaces, freeing the user from interacting with a specific device (handheld or fixed) while maintaining a visual focus for the interaction. Projections are significantly easier to handle than robots, which would have to navigate their way around objects and people in the environment, and projections cannot accidentally hurt anyone. Yet they can be very expressive and use several means of nonverbal communication (facial expression, gaze, body language, the way they move around, ...) to make the experience of talking to an agent more complete. They can also be used to express affect (personality, mood and emotion). Because these characters are projected, they are effectively ‘living’ on the surfaces of the room. It therefore does not make sense to use human-like characters, which would simply appear too unnatural. Instead we plan to use cartoon versions of animals that can crawl on walls, ceilings, desks etc., such as geckos or spiders.

During this project we plan to set up a working prototype in InSpace. Its output devices will be a controllable projection device that projects the character and a hypersonic directional speaker on a pan-and-tilt device, which makes the character’s voice appear to come from its projected position. To recognise users’ speech we plan to use a microphone array. The project builds on several mature components, including the AMI distant speech recognition system, HTS speech synthesis, the JAST/Indigo multimodal dialogue system, and the controllable projector system developed by Jochen Ehnes under the NIPUI Marie Curie fellowship.

Who am I?: 
Jochen Ehnes, Colin Matheson, Jon Oberlander, Steve Renals, Simon Biggs (ECA)
How is it novel? What is exciting about it?: 
Previous systems for creating virtual characters presented them on a computer screen. As a result, the experience was like watching a TV show and calling in to talk to the people in the studio, or at best like a video chat. The new and exciting thing about this project is that it finally brings the agents into the same room with us. We will not even need a computer in the classical sense of screen, keyboard and mouse to interact with them. While there has been work with projected characters before, it was quite limited. For one thing, because it used human-like characters, it would not have been believable to let the characters walk between different locations across multiple surfaces and around obstacles such as doors. For that reason the characters were either morphed into a golden ball in order to be moved, or ‘beamed’ onto a PDA to be transported to a new location. Furthermore, that system used predefined animations as well as pre-recorded speech segments, which did not allow very dynamic behaviour. In our project, however, we plan to create very dynamic behaviour and the impression that the characters really live in our environment. Instead of being pre-rendered, our characters will be simulated and rendered live, so they can move across the room in a believable way. We are also excited that this system will give us a new opportunity to showcase research on speech technology and dialogue systems to a wider audience in InSpace, and that it will provide us with a means to examine affective computing in this context.
What will I do next? What opportunities will it open up?: 
We are currently preparing a proposal for a follow-on EPSRC project, for which the IDEA Lab work will form a valuable pilot/prototype. Simon Biggs (ECA) is a partner in the current proposal, and we believe that the prototype will be a valuable and concrete way to develop further and deeper collaborations with ECA, with the intention of developing joint proposals.
What constitutes success? How risky is it?: 
Success ultimately means creating virtual characters that seem to live in a space and can talk to and interact with people coming into that space. They should be believable enough that people interact with them naturally. Of course, this is not only hard to quantify but also very ambitious. We therefore define success as having a system that projects a virtual gecko onto surfaces in InSpace, which can move around in a believable way, meaning that its feet stay in place while they are in contact with the wall. It will also be able to move its mouth in sync with the speech generated by the speech synthesiser (probably Festival), and the sound of the speech will be projected at the character by a hypersonic speaker on a pan-and-tilt head. The system will also be able to understand speech and react to it using an extended dialogue system. Optionally, we will also implement a video tracking system that allows the system to spot users and approach them automatically, but we recognise that it might be too ambitious to integrate this within the intended timeframe. An obvious risk is that the speech recognition will not work well enough with input from a microphone array. If that is the case, we plan to use close-talking microphones. If real-time speech recognition is still insufficient, we may fall back to keyword spotting.
What resources do I bring to the project?: 
The project builds on several mature software components including the AMI distant speech recognition system (including microphone array audio capture), HTS speech synthesis (including hypersonic directional speaker), the JAST/Indigo multimodal dialogue system, and the controllable projector system developed by Jochen Ehnes under the NIPUI Marie Curie fellowship.
What resources and expertise do I need?: 
The project would support Jochen Ehnes, who will be co-funded by InSpace, plus a speech researcher from CSTR. We are requesting 4 person-months in total.
What shared resources, if any, will the project create?: 
The project will create a prototype system for virtual characters. It will create all the components necessary for that and ensure that they integrate well. This system could then be used to further develop, test and evaluate individual components, such as the dialogue system, which might be improved with more affective behaviours. As it consists of several components developed in house, such as speech recognition, speech synthesis, dialogue systems and signal processing for microphone arrays, it can be used to demonstrate advances in all those systems to a wider audience (visitors to InSpace) while at the same time providing a real-life testbed for them.
What is the timescale?: 
1 March – 31 July 2010

Project update

At the end of April we submitted our proposal to EPSRC. Preparing it took a lot of time, but we believe it is now a very solid proposal.

In recent weeks I first worked on the character representation. I programmed a body as well as upper and lower limbs, connected to the inverse kinematics, to create a gecko that walks across the surfaces. As the characters should be able to talk, I also added a text-to-speech system (for now the Apple text-to-speech system) and set it up so that the character's mouth moves in sync with the generated speech. As the standard speech synthesiser does not allow very characterful speech, we plan to use other software developed in house for this purpose. This should therefore be seen as a proof of concept; in particular, I wanted to see how well the synchronisation of voice and mouth movement would work.
Similarly, the motion of the body is still very basic. The torso does not bend at all; only the legs move to accommodate the simulated foot movements. I will have to work on a more sophisticated simulation that also moves the torso, head and tail in a more believable manner.
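Synchronising the mouth with synthesised speech generally comes down to sampling a mouth-opening value over time from the phoneme currently being spoken. The following is a minimal Python sketch of that idea. It assumes the text-to-speech engine reports timestamped phoneme events. The phoneme symbols, the OPENNESS table and mouth_openness are hypothetical illustrations and are not the project's code.

```python
from bisect import bisect_right

# Hypothetical mouth openness per phoneme (0 = closed, 1 = fully open).
# The values are illustrative assumptions, not taken from the project.
OPENNESS = {
    "sil": 0.0,                      # silence
    "p": 0.0, "b": 0.0, "m": 0.0,    # bilabials close the lips
    "f": 0.2, "v": 0.2,
    "s": 0.3, "t": 0.3, "d": 0.3, "n": 0.3,
    "i": 0.4, "e": 0.6, "o": 0.7, "a": 1.0,
}


def mouth_openness(events, t, blend=0.04):
    """Return the mouth openness at time t (seconds).

    events: list of (start_time, phoneme) sorted by start_time, as a
        speech synthesiser might report them while speaking (assumption).
    blend: time in seconds over which the mouth moves from the previous
        phoneme's target to the current one.
    Unknown phonemes default to a half-open mouth.
    """
    if not events:
        return 0.0
    starts = [start for start, _ in events]
    i = bisect_right(starts, t) - 1   # index of the phoneme active at time t
    if i < 0:
        return 0.0                    # speech has not started yet
    target = OPENNESS.get(events[i][1], 0.5)
    previous = OPENNESS.get(events[i - 1][1], 0.5) if i > 0 else 0.0
    # Interpolate linearly from the previous target during the blend window.
    alpha = min(1.0, (t - starts[i]) / blend) if blend > 0 else 1.0
    return previous + alpha * (target - previous)
```

With the events [(0.0, "sil"), (0.1, "m"), (0.2, "a")], for example, the mouth stays closed until 0.2 s and then opens over the 40 ms blend window. The blending stops the mouth from snapping between shapes from one animation frame to the next.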
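Keeping the feet planted while the body moves is a typical inverse kinematics task: in each frame the foot positions stay fixed, and the joint angles are solved for. The following is a minimal planar sketch for one leg. It assumes the leg is modelled as two rigid segments (upper and lower limb) and solves it analytically with the law of cosines. two_link_ik and forward are hypothetical names, and this is not necessarily how the project's inverse kinematics works.

```python
import math


def two_link_ik(x, y, l1, l2, bend=1):
    """Hip and knee angles (radians) that put the leg tip at (x, y).

    The hip is at the origin. The knee angle is measured relative to the
    upper limb. bend=+1 or -1 selects which way the knee bends. Targets
    out of reach are clamped to the nearest reachable distance.
    """
    d = math.hypot(x, y)
    d = max(abs(l1 - l2), min(l1 + l2, d))  # clamp to the reachable annulus
    # Interior angle at the knee from the law of cosines.
    cos_knee = (l1 * l1 + l2 * l2 - d * d) / (2 * l1 * l2)
    knee = bend * (math.pi - math.acos(max(-1.0, min(1.0, cos_knee))))
    # Angle between the upper limb and the hip-to-target line.
    cos_inner = (l1 * l1 + d * d - l2 * l2) / (2 * l1 * d) if d > 0 else 1.0
    hip = math.atan2(y, x) - bend * math.acos(max(-1.0, min(1.0, cos_inner)))
    return hip, knee


def forward(hip, knee, l1, l2):
    """Leg tip position for the given joint angles (used to check solutions)."""
    kx, ky = l1 * math.cos(hip), l1 * math.sin(hip)
    return kx + l2 * math.cos(hip + knee), ky + l2 * math.sin(hip + knee)
```

The bend parameter picks one of the two mirror-image solutions. It could be used, for instance, to keep all knees pointing consistently away from the body as the torso moves over planted feet.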

Recently I have been working on the control of the directional speaker, which should create the effect that the sound originates from the character's mouth. I met with people from the workshop, and Hugh Cameron is currently building a gimbal that will hold the speaker and will be moved by RC servos. In the meantime I programmed an Arduino board (ATmega328 microcontroller) to control the two RC servos based on data it receives via its USB/serial connection. I then added an interface class to my system that connects to the Arduino board and sends the position values for the servos. A second class represents the pan-and-tilt device and abstracts the physical setup away. An extension to the GUI allows the user to set the pan and tilt angles.
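The split between a low-level link class and a pan-and-tilt class might look like the following Python sketch. The text-line protocol is an assumption made for illustration: each message carries two integer servo angles in the 0 to 180 range, which is what the Arduino Servo library's write() accepts. The class names are also hypothetical, and the firmware side is not shown.

```python
class ArduinoServoLink:
    """Sends servo angles to a microcontroller over a byte stream.

    Hypothetical protocol: one ASCII line per command, "pan,tilt" followed
    by a newline, with integer servo angles in 0..180. `stream` is any
    object with a write(bytes) method.
    """

    def __init__(self, stream):
        self.stream = stream

    def send(self, pan_servo, tilt_servo):
        msg = f"{pan_servo},{tilt_servo}\n".encode("ascii")
        self.stream.write(msg)
        return msg


class PanTiltDevice:
    """Maps pan/tilt angles in degrees to servo commands, hiding the mechanics."""

    def __init__(self, link, pan_center=90, tilt_center=90,
                 pan_sign=1, tilt_sign=1):
        self.link = link
        self.pan_center = pan_center    # servo value when the pan angle is 0
        self.tilt_center = tilt_center  # servo value when the tilt angle is 0
        self.pan_sign = pan_sign        # +1 or -1 depending on servo mounting
        self.tilt_sign = tilt_sign

    @staticmethod
    def _to_servo(center, sign, angle):
        value = round(center + sign * angle)
        return max(0, min(180, value))  # clamp to the servo's range

    def set_angles(self, pan_deg, tilt_deg):
        pan = self._to_servo(self.pan_center, self.pan_sign, pan_deg)
        tilt = self._to_servo(self.tilt_center, self.tilt_sign, tilt_deg)
        return self.link.send(pan, tilt)
```

Accepting any object with a write(bytes) method keeps the classes testable with an in-memory buffer. With pyserial, a serial.Serial instance could be passed in directly.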

To complete the speaker gimbal control, I will have to replicate the parts of my system that are used to calibrate the projector's position, as well as the routines used to aim it at a particular spot in the room.
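Once the gimbal's position and orientation are known, aiming at a point reduces to expressing the target in the gimbal's own frame and reading off two angles. The following is a minimal sketch under an assumed axis convention: x points forward at zero pan and tilt, and z lies along the pan axis. aim is a hypothetical function, not the routine the projector system actually uses.

```python
import math


def aim(device_pos, device_rot, target):
    """Pan and tilt angles (degrees) that point the device at a room point.

    device_pos: (x, y, z) of the gimbal centre in room coordinates.
    device_rot: 3x3 rotation matrix (list of rows) mapping gimbal-frame
        vectors to room-frame vectors, e.g. obtained from calibration.
    Assumed convention: in the gimbal frame, x is forward at pan = tilt = 0,
    pan turns about z, and tilt is measured up from the x-y plane.
    If the target lies straight along the pan axis, the pan is arbitrary
    (atan2 returns 0).
    """
    v = [t - p for t, p in zip(target, device_pos)]
    # Room -> gimbal frame uses the transpose of device_rot.
    x, y, z = (sum(device_rot[r][c] * v[r] for r in range(3)) for c in range(3))
    pan = math.degrees(math.atan2(y, x))
    tilt = math.degrees(math.atan2(z, math.hypot(x, y)))
    return pan, tilt
```

In principle the same function can serve both the projector and the speaker gimbal, since only the calibrated pose differs between the two devices.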

As the controllable projector for InSpace arrived recently, we contacted the workshop to get it mounted in InSpace. A mounting plate is currently being built.

(Nearly) Final Report

The speaker gimbal has been built and integrated with the system, based on the work described in the last project update. To calibrate the speaker gimbal's position and orientation, one can easily aim it (with the help of a laser pointer mounted on it) at a few known calibration points in the room. At the press of a button, the system then calculates the position and orientation.
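One way to compute the pose from a few laser-pointer aims is to treat each aim as a ray through a known point, and then find the gimbal position where the rays meet. The Python sketch below simplifies the problem by assuming the gimbal is mounted level, so that its orientation reduces to an unknown yaw. The sketch scans candidate yaw values. For each one, it solves for the position by linear least squares and keeps the yaw with the smallest residual. The function names are hypothetical. A general calibration without the level-mount assumption would need a nonlinear solver over all three rotation angles, and that is not shown here.

```python
import math


def _direction(pan, tilt, yaw):
    """Unit beam direction in room coordinates for a level-mounted gimbal."""
    a = pan + yaw
    return (math.cos(tilt) * math.cos(a), math.cos(tilt) * math.sin(a),
            math.sin(tilt))


def _det3(a):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def _solve3(m, b):
    """Solve the 3x3 linear system m x = b with Cramer's rule."""
    d = _det3(m)
    return [_det3([[b[r] if c == k else m[r][c] for c in range(3)]
                   for r in range(3)]) / d for k in range(3)]


def _position_and_cost(points, angles, yaw):
    """Least-squares point closest to all beams for one yaw, plus the residual."""
    m = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    projectors = []
    for p, (pan, tilt) in zip(points, angles):
        d = _direction(pan, tilt, yaw)
        # I - d d^T projects onto the plane perpendicular to the beam.
        q = [[(1.0 if r == c else 0.0) - d[r] * d[c] for c in range(3)]
             for r in range(3)]
        projectors.append((q, p))
        for r in range(3):
            rhs[r] += sum(q[r][c] * p[c] for c in range(3))
            for c in range(3):
                m[r][c] += q[r][c]
    pos = _solve3(m, rhs)
    cost = 0.0
    for q, p in projectors:
        err = [sum(q[r][c] * (pos[c] - p[c]) for c in range(3)) for r in range(3)]
        cost += sum(e * e for e in err)
    return pos, cost


def calibrate(points, angles, steps=3600, iterations=60):
    """Estimate the gimbal position and yaw (radians) from aims at known points.

    points: room coordinates of at least three calibration targets in
        general position.
    angles: (pan, tilt) in radians read from the gimbal while the laser
        pointer rests on each target.
    """
    def cost(yaw):
        return _position_and_cost(points, angles, yaw)[1]

    step = 2 * math.pi / steps
    # Coarse scan over the full circle, then golden-section refinement.
    best = min((k * step for k in range(steps)), key=cost)
    lo, hi = best - step, best + step
    g = (math.sqrt(5) - 1) / 2
    for _ in range(iterations):
        x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
        if cost(x1) < cost(x2):
            hi = x2
        else:
            lo = x1
    yaw = (lo + hi) / 2
    return _position_and_cost(points, angles, yaw)[0], yaw
```

For a fixed yaw, finding the point nearest to a set of lines is a linear problem. The search can therefore be restricted to a single angle, which keeps the sketch short and avoids needing an initial guess.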

With the position and orientation known, the speaker can follow characters walking on surfaces, just as the projector does. While in general the sound appears to come from the character's position, there are some situations where the result is not as good as we had hoped. When the ultrasound beam from the speaker hits the wall at a very oblique angle, it seems to be reflected rather than creating a sound source there. The sound source then appears where the reflected beam hits a wall at a more direct angle. To improve this, I believe we should look into the interaction of the beam with different surface materials at different angles of incidence and reflection. The effect is probably related to the diffuse reflection of light.

Another important part I implemented recently is the automatic folding of Bezier paths onto the surface model. Until now one had to be very careful that the paths one created lay on a single surface and did not extend beyond it anywhere along the way. Now one can easily create paths that cross surface boundaries. When the start and end points of a path segment are on different surfaces, the system automatically unfolds the surfaces in a suitable way and draws the path on this unfolded plane. With this in place, it is now easy to let characters walk across different surfaces of a room. A behaviour simulation system should be able to do this automatically as well.
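The simplest case is a wall meeting the floor at a right angle. Here the unfolding can be sketched as follows: the floor is rotated about the shared edge into the plane of the wall, the cubic Bezier curve is evaluated in that flat plane, and each sample is folded back onto whichever surface it falls on. The sketch assumes exactly two perpendicular surfaces in a fixed, hypothetical coordinate layout. The system described above works on the room's general surface model, and the function names here are illustrative.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Point on a 2D cubic Bezier curve at parameter t in [0, 1]."""
    s = 1.0 - t
    return tuple(s ** 3 * a + 3 * s * s * t * b + 3 * s * t * t * c + t ** 3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))


def fold_to_room(u, v):
    """Map a point of the unfolded wall/floor plane back onto the 3D surfaces.

    Assumed geometry: the floor is z = 0 (y >= 0 going into the room), the
    wall is y = 0 (z >= 0), and they share the edge y = z = 0. In the
    unfolded plane, u runs along that edge, v >= 0 is the height on the wall
    and v < 0 is the distance out across the floor.
    """
    if v >= 0:
        return (u, 0.0, v)      # on the wall
    return (u, -v, 0.0)         # on the floor


def unfold_from_room(x, y, z):
    """Inverse of fold_to_room, for points lying on the wall or the floor."""
    return (x, z) if y == 0 else (x, -y)


def folded_path(p0, p1, p2, p3, samples=20):
    """Bezier path whose control points lie on the wall or floor (room coords).

    The control points are unfolded into one plane, the curve is evaluated
    there, and each sample is folded back onto the surface it lands on.
    """
    flat = [unfold_from_room(*p) for p in (p0, p1, p2, p3)]
    return [fold_to_room(*cubic_bezier(*flat, i / samples))
            for i in range(samples + 1)]
```

Because the unfolding is an isometry on each surface, distances measured along the curve in the flat plane match distances along the wall and floor. This helps keep a character's walking speed consistent as it crosses the edge.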

Finally, we started to integrate the dialogue system, which is based on SWI-Prolog, into the simulation and projection system in order to create a demonstrator of the whole system we intended to build. We were not quite able to finish this by the end of July, because the people involved were on holiday at different times over the last few weeks. However, we believe that it will be done quickly once everybody involved is back.
