The proposed project is about interacting with digital systems through virtual characters that exist in real, physical environments by means of a projected interface. These projected virtual characters will move around surfaces in a realistic manner and interact with people through speech, using a multimodal dialogue management system that includes both speech recognition and speech synthesis. The aim of this project is to design and build a prototype system that we can deploy in InSpace, and that enables us to begin to investigate whether projected virtual characters can facilitate information exchange and provide an improved quality of experience compared with other approaches to interaction. In the medium term (and through follow-on funding that we are in the process of applying for), this will enable us to investigate issues such as the benefits of having such a focus for the interaction (compared with no focus, or a mobile device), whether two or more virtual characters result in an improved interaction, and the social presence of projected virtual characters in relation to their modality, their persona and the overall context.
The motivation for this work is to develop new approaches for interaction with digital systems in both public and private spaces, freeing the user from interacting with a specific device (handheld or fixed) while maintaining a visual focus for the interaction. Projections are significantly easier to handle than robots, which would have to navigate around objects and people in the environment, and projections cannot hurt anyone accidentally. Yet they can be very expressive and can use several means of nonverbal communication (facial expression, gaze, body language, the way they move around, ...) to make the experience of talking to an agent more complete. They can also be used to convey affect (personality, mood and emotion). Because these characters are projected, they ‘live’ on the surfaces of the room. It therefore does not make sense to use human-like characters, which would appear too unnatural. Instead, we plan to use cartoon versions of animals that can crawl on walls, ceilings, desks and so on, such as geckos or spiders.
During this project we plan to set up a working prototype in InSpace. Its output devices are a controllable projection device that projects the character and a hypersonic directional speaker on a pan-and-tilt device that makes the character’s voice appear to come from its projected position. To recognise users’ speech we plan to use a microphone array. The project builds on several mature components, including the AMI distant speech recognition system, HTS speech synthesis, the JAST/Indigo multimodal dialogue system, and the controllable projector system developed by Jochen Ehnes under the NIPUI Marie Curie fellowship.
The speaker gimbal has been built and integrated with the system, based on the work described in the last project update. To calibrate the gimbal's position and orientation, one aims it (with the help of a laser pointer mounted on it) at a few known calibration points in the room. At the press of a button, the system then calculates the position and orientation.
With that known, the speaker can follow characters walking on surfaces, just as the projector does. In general the sound appears to come from the character's position, but there are some situations where the result is not as good as we had hoped. When the ultrasound beam from the speaker hits a wall at a very oblique angle, it seems to be reflected rather than to create a sound source there; the sound then appears to come from the place where the reflected beam hits another wall at a more direct angle. To improve this, I believe we should look into how the beam interacts with different surface materials at different angles of incidence and reflection. The effect is probably related to the diffuse reflection of light.
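To make the notion of an oblique angle concrete, the following Python sketch computes the angle of incidence between the speaker's beam and a surface. It is an illustration only: the function name, the coordinate convention (room coordinates as 3-tuples) and the idea of using the angle to flag problematic targets are assumptions, not part of the system described here.

```python
import math

def incidence_angle_deg(speaker_pos, target, surface_normal):
    """Angle between the beam and the surface normal, in degrees.

    0 means the beam hits the surface head-on; values close to 90
    mean a grazing (very oblique) hit.
    """
    beam = [t - s for s, t in zip(speaker_pos, target)]
    beam_len = math.sqrt(sum(c * c for c in beam))
    normal_len = math.sqrt(sum(c * c for c in surface_normal))
    if beam_len == 0.0 or normal_len == 0.0:
        raise ValueError("beam and normal must have non-zero length")
    dot = sum(b * n for b, n in zip(beam, surface_normal))
    # abs() makes the result independent of which way the normal points.
    cos_angle = min(1.0, abs(dot) / (beam_len * normal_len))
    return math.degrees(math.acos(cos_angle))
```

Such a value could be logged alongside listening tests to see at which angles, and on which materials, the perceived sound source starts to move away from the character.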
Another important part I implemented recently is the automatic folding of Bezier paths onto the surface model. Until now one had to be very careful that each path lay on one surface only and did not extend beyond that surface anywhere along the way; now one can easily create paths that cross surface boundaries. When the start and end points of a path segment are on different surfaces, the system automatically unfolds the surfaces in a suitable way and draws the path on the unfolded plane. With this in place, it is now easy to let characters walk across different surfaces of a room. A behaviour simulation system should be able to do that automatically as well.
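One way to picture the unfolding step is to rotate a point on the second surface about the shared edge until it lies in the plane of the first surface. The Python sketch below does this for two planar surfaces that meet along a straight edge. It is a minimal illustration under assumed names and inputs (`unfold_onto_first`, an edge given by a point and a direction, and an interior point of the first surface), not the routine used in the actual system.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def _scale(a, s):
    return tuple(x * s for x in a)

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _unit(a):
    length = math.sqrt(_dot(a, a))
    if length == 0.0:
        raise ValueError("cannot normalise a zero-length vector")
    return _scale(a, 1.0 / length)

def unfold_onto_first(p, edge_point, edge_dir, interior_first):
    """Map point p (on the second surface) into the plane of the first.

    edge_point, edge_dir: a point on the shared edge and its direction.
    interior_first: any point of the first surface that is not on the edge.
    """
    u = _unit(edge_dir)
    rel = _sub(p, edge_point)
    along = _dot(rel, u)                      # coordinate along the edge
    perp = _sub(rel, _scale(u, along))
    dist = math.sqrt(_dot(perp, perp))        # distance of p from the edge
    # In-plane direction of the first surface, pointing away from the edge.
    to_interior = _sub(interior_first, edge_point)
    w = _unit(_sub(to_interior, _scale(u, _dot(to_interior, u))))
    # The unfolded point lies on the far side of the edge, hence -dist.
    return _add(_add(edge_point, _scale(u, along)), _scale(w, -dist))
```

For a floor in the plane z = 0 and a wall in the plane x = 0 that share the y axis as an edge, the wall point (0, 2, 1.5) unfolds to (-1.5, 2, 0). A path segment from a floor point to that unfolded point can then be drawn in a single plane, and the part that crosses the edge is mapped back onto the wall.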
Finally, we started to integrate the dialogue system, which is based on SWI-Prolog, into the simulation and projection system in order to create a demonstrator of the whole system we intended to build. We were not quite able to finish this by the end of July because the people involved were on holiday at different times during the last few weeks, but we believe that it will be done quickly once everybody is back.
Project update
Submitted by jehnes on Wed, 05/19/2010 - 14:02.
At the end of April we submitted our proposal to EPSRC. This took us a lot of time, but we believe that it is now a very decent proposal.
In recent weeks I first worked on the character representation. First, I programmed a body as well as upper and lower limbs connected to the inverse kinematics, to create a gecko that walks across the surfaces. As the characters should be able to talk, I also added a text-to-speech system (for now the Apple text-to-speech system) and set it up so that the character's mouth moves in sync with the generated speech. As the standard speech synthesiser does not allow very characterful speech, we plan to use other software developed in house for this purpose. The current setup should therefore be seen as a proof of concept; in particular, I wanted to see how the synchronisation of voice and mouth movement would work.
Similarly, the motion of the body is still very basic. The torso does not bend at all; only the legs move to accommodate the simulated foot movements. I will have to work on a more sophisticated simulation that also moves the torso, head and tail in a more believable manner.
Recently I have been working on the control of the directed speakers, which should create the effect that the sound originates from the character's mouth. I met with people from the workshop, and Hugh Cameron is currently building a gimbal that will hold the speaker and will be moved using RC servos. In the meantime I programmed an Arduino board (ATmega328 microcontroller) to control the two RC servos based on data it receives via its USB/serial connection. I then added an interface class to my system that allows it to connect to the Arduino board and send the position values for the servos. A second class represents the pan-and-tilt device and abstracts the physical setup away. An extension to the GUI allows the pan and tilt angles to be set.
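The split into an interface class and a pan-and-tilt class could look roughly like the following Python sketch. Everything specific in it is an assumption made for illustration: the class names, the 3-byte frame (a 0xFF start marker followed by one byte per servo), the 0 to 180 servo range and the 90 degree centre offsets are hypothetical and do not describe the actual protocol between the system and the Arduino board. The link writes to any object with a `write` method, so a serial port object or an in-memory buffer can be used.

```python
import io

class ServoLink:
    """Sends servo positions over a byte stream such as a serial port."""

    def __init__(self, stream):
        self.stream = stream

    def send(self, pan_pos, tilt_pos):
        # Hypothetical frame: start marker 0xFF, then one byte per servo.
        for value in (pan_pos, tilt_pos):
            if not 0 <= value <= 180:
                raise ValueError("servo position must be within 0..180")
        self.stream.write(bytes([0xFF, pan_pos, tilt_pos]))

class PanTilt:
    """Hides the physical setup: callers work in pan and tilt angles."""

    def __init__(self, link, pan_offset=90.0, tilt_offset=90.0):
        self.link = link
        self.pan_offset = pan_offset    # assumed servo value at pan = 0
        self.tilt_offset = tilt_offset  # assumed servo value at tilt = 0

    @staticmethod
    def _to_servo(angle_deg, offset):
        # Round to whole degrees and clamp to the assumed servo range.
        return max(0, min(180, round(angle_deg + offset)))

    def aim(self, pan_deg, tilt_deg):
        pan_pos = self._to_servo(pan_deg, self.pan_offset)
        tilt_pos = self._to_servo(tilt_deg, self.tilt_offset)
        self.link.send(pan_pos, tilt_pos)
        return pan_pos, tilt_pos

# Usage with an in-memory buffer standing in for the serial connection.
buffer = io.BytesIO()
device = PanTilt(ServoLink(buffer))
device.aim(0.0, -30.0)
```

Keeping the angle-to-servo mapping in `PanTilt` means that a change of servos or mounting only affects that class, while the rest of the system continues to work in angles.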
To complete the speaker gimbal control, I will have to replicate the parts of my system that are used to calibrate the projector's position, as well as the routines used to aim at a certain spot in the room.
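As an illustration of what such an aiming routine has to compute, the Python sketch below derives pan and tilt angles from a calibrated gimbal pose and a target point. The conventions are assumptions chosen for the example (the gimbal's forward, left and up axes are given in room coordinates, pan is measured around the up axis and tilt is the elevation above the pan plane); the real device may use different axes and signs.

```python
import math

def aim_angles(gimbal_pos, gimbal_axes, target):
    """Return (pan, tilt) in degrees that point the gimbal at target.

    gimbal_pos: position of the gimbal in room coordinates.
    gimbal_axes: three unit vectors (forward, left, up) of the gimbal,
                 expressed in room coordinates.
    """
    direction = [t - p for p, t in zip(gimbal_pos, target)]
    if all(c == 0.0 for c in direction):
        raise ValueError("target coincides with the gimbal position")
    # Express the direction in the gimbal's own coordinate frame.
    x, y, z = (sum(a * d for a, d in zip(axis, direction))
               for axis in gimbal_axes)
    pan = math.degrees(math.atan2(y, x))
    tilt = math.degrees(math.atan2(z, math.hypot(x, y)))
    return pan, tilt
```

With the gimbal at the origin and its axes aligned with the room axes, a target at (1, 1, 0) gives a pan of 45 degrees and a tilt of 0 degrees, and a target at (1, 0, 1) gives a pan of 0 degrees and a tilt of 45 degrees.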
As the controllable projector for InSpace arrived recently, we have been in contact with the workshop to get it mounted in InSpace. A mounting plate is currently being built.