Managers spend most of their time in meetings, preparing for meetings, or acting on them. Documents come to them in all kinds of ways - email, intranets, version control repositories, memory sticks - and just getting these in some semblance of order so they can be found reliably can be a major headache. One coping strategy is to make sure that everything is at least somewhere on one machine, typically a laptop, and use, for instance, Spotlight to search it. This helps, especially when there's a clear choice of what text string to use, but what if Spotlight could find the most relevant documents on the machine automatically, just based on what meeting it was? In this project, we want to show that this is possible by combining components that we already have in-house - primarily, fast speech recognition for a whole room and content linking - with the kinds of diaries that managers already use. Our current partnerships show us that companies can be very eager collaborators - but they only really understand and respond to application concepts if they can see at least some simple demonstration of them. We have not yet attempted to put together anything that shows this new functionality. In fact, this is our first foray into personal applications rather than group technologies. The main thing this funding will achieve is proof of principle, enabling us to open up a whole new application area for joint proposals.
There are many ways of specifying an end user application in this space. For the sake of demonstration, our specific vision augments the manager's laptop with a "microcone" - a USB device that provides good recording for offices and small meeting rooms - and assumes that the manager controls their schedule using Google Calendar. As the day goes on, the application will recognize the speech (which can be done either on one core of the laptop or "in the cloud") and segment it against the manager's schedule. When the manager is ready to digest the meeting or do some work resulting from it, he can go back to it in his diary and find his hard drive already indexed with two things: the documents he consulted during the meeting, and an ordered list of the documents he didn't consult but that are the most relevant, judging from what was said. This kind of interface will make it faster to remember what happened, cut and paste into new documents, and complete his work.
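The two steps described above (segmenting recognized speech against the schedule, then ordering documents by relevance to what was said) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's content linking component: the `Event` type, the function names, the use of seconds-since-midnight timestamps, and the choice of TF-IDF with cosine similarity as the relevance measure are all hypothetical stand-ins.

```python
import math
from collections import Counter
from typing import NamedTuple


class Event(NamedTuple):
    """A calendar entry (hypothetical shape; times in seconds since midnight)."""
    title: str
    start: float
    end: float


def segment_by_schedule(words, events):
    """Assign each (timestamp, word) pair to the calendar event it falls in."""
    segments = {event.title: [] for event in events}
    for timestamp, word in words:
        for event in events:
            if event.start <= timestamp < event.end:
                segments[event.title].append(word.lower())
                break  # words outside every event are simply dropped
    return segments


def tfidf_vectors(token_lists):
    """Build one sparse TF-IDF vector (a dict) per token list."""
    n = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(meeting_tokens, documents):
    """Order documents (name -> text) by similarity to the meeting transcript."""
    names = list(documents)
    token_lists = [meeting_tokens] + [documents[n].lower().split() for n in names]
    vectors = tfidf_vectors(token_lists)
    scores = [(name, cosine(vectors[0], vec)) for name, vec in zip(names, vectors[1:])]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

In this sketch, the recognizer output is taken to be a list of `(timestamp, word)` pairs and the local documents a mapping from file name to plain text; a real system would need to extract text from heterogeneous file formats and cope with recognition errors.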
Who am I?:
Jean Carletta, Steve Renals
Submitted by jeanc on Mon, 08/16/2010 - 15:40
Project Outcomes:
* Created a prototype application that demonstrates the Ambient Spotlight concept by integrating results of speech recognition and higher level automatic annotations with a simple calendar display
* Demonstrated to industry contacts
* Submitted demo proposals to two conferences (both accepted)
Do the outcomes match the objectives:
Benefits for the future:
Having explored the space and developed a demonstrator, it will now be easier to develop strategies for future funding in the area of personal applications based on meeting speech recognition.
A large part of development involved integration of existing technology, and this project has helped to rationalize meeting processing, from recorded speech to meeting browsing and content-linking.
Opportunities for further research:
Specific research opportunities that arise from the demonstrator include investigating the effect that different parameters, such as ASR quality and the time period over which results are aggregated, have on system performance. Evaluation via field testing with a variety of users would be essential to assess the potential of these approaches more formally.
More generally, there are now opportunities to investigate the whole area of personal applications using meeting speech recognition.
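As a rough illustration of what varying the aggregation time period could mean in practice, the sketch below groups timestamped recognizer output into fixed-length windows, so that the window length becomes a parameter to experiment with. The function name and the `(timestamp, word)` input format are assumptions made for this example, not a description of the demonstrator.

```python
def aggregate_windows(words, window_seconds):
    """Group (timestamp, word) pairs into consecutive fixed-length windows.

    Returns the non-empty windows in time order, each as a list of words.
    """
    windows = {}
    for timestamp, word in words:
        index = int(timestamp // window_seconds)
        windows.setdefault(index, []).append(word)
    return [windows[i] for i in sorted(windows)]
```

Each window's words could then be used as one query against the document index; shorter windows react faster to topic changes, while longer ones give the query more context.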
Has the project created any new shared resources?:
Future research plans:
The Ambient Spotlight falls in one of the two application areas of a programme grant proposal currently being prepared (full proposal due September 2010).
Publications and presentations:
* Demonstration paper accepted to ACM Multimedia Workshop on Searching Spontaneous Conversational Speech:
J. Kilgour, J. Carletta and S. Renals, The Ambient Spotlight: Queryless Desktop Search From Meeting Speech, Proceedings of the ACM Multimedia Workshop on Searching Spontaneous Conversational Speech (SSCS10), 2010, to appear.
* Demonstration proposal accepted to appear at ICMI-MLMI10.
* Presentation to Cisco Systems group, April 27th 2010.
Collaborations:
Collaboration with Dev/Audio founder Iain McCowan, maker of the Microcone. He passed us new versions of the Microcone Recorder software and we reported issues back to him.
We also maintained collaboration with some of the AMIDA partners who contributed to the Content Linking demonstrator, particularly at IDIAP, Switzerland.
Use of funding:
The funding was used as five person-months of support for the researchers who carried out the work.
How is it novel? What is exciting about it?:
Although there are currently personal applications that involve speech recognition, they all involve capture of a single speaker, usually wearing a headset mic. For this reason, none of them operate ambiently as part of the user's work environment. Meanwhile, modern office communication methods mean that the sheer volume of email and documents intended for a manager's attention can be completely unwieldy. Textual search technologies for disorganized storage do exist, but it can be difficult to find usable keywords. Our method takes a completely different cognitive approach to the problem that could really benefit users, especially less technical ones who might have less idea why particular search strategies do and don't work.
What will I do next? What opportunities will it open up?:
Once we have a working pilot and understand any architectural constraints on the range of possible end user applications, we intend to show it to our existing industry contacts so that, with them, we can jointly develop a strategy for funding future work that would specify and evaluate our concept more formally. We will consider direct industry funding, the Technology Strategy Board, and more traditional grant proposals, with our contacts as active partners. Although we intend to use the money to prove the principle for a specific application, we expect what we learn to be useful for a range of possible personal applications, including ones in health and education. For instance, one could imagine assistance for some medical professions that pulls up the relevant notes and protocols, or for students reviewing a lecture or tutorial.
What constitutes success? How risky is it?:
Success ultimately for us is coming up with a convincing application that both works technically and that our contacts find attractive. There are a number of issues that we have to sort out to be successful. For instance, there is a trade-off between speech recognition speed and accuracy. One question is whether we can get good enough speech recognition running on one core of a dual-core laptop, or whether we need to resort to "the cloud". If it's the latter, then we need to sort out how we'll send the audio for processing without incurring delays that interfere with a manager's workflow. Similarly, indexing materials from heterogeneous sources and integrating interfaces with an existing calendaring framework might be challenging. Our feeling is that somewhere in this space, there's an idea that will work well, but we don't know exactly how to write the application yet.
What resources do I bring to the project?:
Microcone, meeting speech recognizer, content linking componentry; experience in the design and implementation of meeting support applications; data on which to demonstrate; half a dozen industry contacts likely to be interested.
What resources and expertise do I need?:
Over 5 calendar months, 4 person-months of Jonathan Kilgour running against maintenance tasks on other projects, and 1 person-month of Jean Carletta, running part-time. We also need around 1000 pounds of computing resource to cover the departmental costs for speech recognition, which can be very resource intensive. We may need a demonstration laptop if we can't find a spare - most staff would be unwilling to show all their laptop contents in a live demonstration.
What shared resources, if any, will the project create?:
The project will create an end user pilot application, and as part of that, better componentry for doing "content linking" on a laptop that can be reused in other applications.
What is the timescale?:
Five person-months over five calendar months split between two staff, starting mid February or early March 2010.