Larry Heck, Conversational Systems Lab, Microsoft Corporation
The Conversational Web
There is currently significant interest in the speech, language, and human-computer interaction (HCI) scientific communities in creating a conversational interface to the web. Through the combination of natural language (spoken and written) with gesture, touch, and gaze, this natural user interface (NUI) could help you complete online tasks, find what you want, and answer any question as naturally as having a conversation, anytime and anywhere. This talk explores the emergence of this Conversational Web and the fundamental trends in the industry that are providing significant opportunities, and in some cases challenges, for spoken language technologists. The first trend the talk will cover is the availability of massive data from the Web. A Web of Intents has emerged from the feedback loop of click streams in Web search engines and browsers, capturing many hundreds of millions of queries, page views, and clicks from billions of users every day. This Web of Intents has begun to radically increase the predictive power of the statistical learning algorithms underlying modern conversational systems. Second, the talk will discuss the emergence of the Web of Structured Knowledge. With the broadly adopted structuring of web sites through well-formed APIs, the Programmable Web has already enabled the emergence of mobile virtual personal assistants. And with the commitment to adopt standard semantic ontologies, the large search engine companies (Microsoft, Google) are finally making the structured "Semantic Web" a reality. The third trend is the "Applification" of the Web, which is decentralizing the web into highly structured but fragmented, specialized functional islands. The talk will explore all three trends, and propose some particular approaches to leveraging the first two and turning the third into an asset for the Conversational Web.
Larry Heck is a Microsoft Distinguished Engineer and the Chief Scientist for the Conversational Systems Research Center (CSRC) in Microsoft Research. As Chief Scientist, he is responsible for deciding the CSRC's long-term vision, direction, and research program. From 2005 to 2009, he was Vice President of Search & Advertising Sciences at Yahoo!, responsible for the creation, development, and deployment of the algorithms powering Yahoo! Search, Yahoo! Sponsored Search, Yahoo! Content Match, and Yahoo! display advertising. From 1998 to 2005, he was with Nuance Communications and served as Vice President of R&D, responsible for natural language processing, speech recognition, voice authentication, and text-to-speech synthesis technology. He began his career as a researcher at the Stanford Research Institute (1992-1998), initially in the field of acoustics and later in speech research with the Speech Technology and Research (STAR) Laboratory. Dr. Heck received his PhD in Electrical Engineering from the Georgia Institute of Technology in 1991.
Kevin Knight, Information Sciences Institute, University of Southern California
Structure Transformation for Machine Translation: Strings, Trees, and Graphs
String acceptors and transducers (Pereira and Riley, 1997) are a critical technology for NLP and speech systems. They flexibly capture many kinds of stateful left-to-right substitution; simple transducers can be composed into more complex ones; and they are trainable. Tree acceptors and transducers provide even more transformational power (Knight and Graehl, 2005). Still, strings and trees are both weak at representing linguistic structure involving semantics and reference ("who did what to whom"). Viewing semantic structures as directed acyclic graphs, we take a look at probabilistic acceptors and transducers for them, demonstrate some linguistic transformations, and point toward a foundation for semantics-based machine translation.
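As a rough illustration of the stateful, left-to-right substitution and cascading of simple transducers that the abstract refers to, here is a minimal Python sketch; the alphabet, states, and rewrite rules are invented for illustration and are not from the talk.

```python
# Minimal sketch of (unweighted) deterministic string transducers and their
# cascaded composition, in the spirit of Pereira and Riley (1997).
# Transitions map (state, input_symbol) -> (next_state, output_symbol).

def run(transitions, start, finals, s):
    """Run a deterministic transducer over string s; None if rejected."""
    state, out = start, []
    for ch in s:
        if (state, ch) not in transitions:
            return None  # no transition: input rejected
        state, o = transitions[(state, ch)]
        out.append(o)
    return "".join(out) if state in finals else None

# T1: stateless substitution -- rewrite every 'a' as 'b'.
T1 = {(0, "a"): (0, "b"), (0, "b"): (0, "b"), (0, "c"): (0, "c")}

# T2: stateful, left-to-right -- uppercase a 'b' only when it follows a 'b'.
T2 = {(0, "b"): (1, "b"), (0, "c"): (0, "c"),
      (1, "b"): (1, "B"), (1, "c"): (0, "c")}

def compose(t1, t2, s):
    """Compose by feeding T1's output into T2 (a simple transducer cascade)."""
    mid = run(t1, 0, {0}, s)
    return None if mid is None else run(t2, 0, {0, 1}, mid)

print(compose(T1, T2, "abca"))  # prints 'bBcb'
```

A real toolkit composes the machines themselves (not just their outputs) and attaches weights to transitions, which is what makes these models trainable.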
Kevin Knight is a Senior Research Scientist and Fellow at the Information Sciences Institute of the University of Southern California (USC), and a Research Professor in USC's Computer Science Department. He received a PhD in computer science from Carnegie Mellon University and a bachelor's degree from Harvard University. Professor Knight's research interests include natural language processing, machine translation, and decipherment. In 2001, he co-founded Language Weaver, Inc., which provides commercial machine translation solutions to business and government customers. In 2011, he served as President of the Association for Computational Linguistics. Dr. Knight has taught computer science courses at USC for more than fifteen years, and he led an influential intensive workshop on machine translation at Johns Hopkins University. He has authored over 70 research papers on natural language processing and has received several best paper awards. Prof. Knight also co-authored the widely adopted textbook, "Artificial Intelligence", published by McGraw-Hill.
Lillian Lee, Cornell University
Language as influence(d)
What effect does language have on people, and what effect do people have on language?
You might say in response, "Who are you to discuss these problems?" and you would be right to do so; these are Major Questions that science has been tackling for many years. But I think that, as a field, natural language processing and computational linguistics have much to contribute to the conversation, and I hope to encourage the community to further address these issues. To this end, I'll describe two efforts I've been involved in.
The first project provides evidence that in group discussions, power differentials between participants are subtly revealed by how much one individual immediately echoes the linguistic style of the person they are responding to. We consider multiple types of power: status differences (which are relatively static), and dependence (a more "situational" relationship). Using a precise probabilistic formulation of the notion of linguistic coordination, we study how conversational behavior can reveal power relationships in two very different settings: discussions among Wikipedians and arguments before the U.S. Supreme Court.
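One common way to make such a probabilistic formulation concrete is to ask how much more likely a reply is to contain a linguistic marker when the utterance it responds to contains that marker. The following is a hedged sketch of that idea only; the exact formulation used in the talk's experiments may differ, and the dialogue data below are invented.

```python
# Sketch of a coordination measure over (utterance, reply) pairs:
# P(marker in reply | marker in utterance) - P(marker in reply).
# A positive value suggests the replier echoes the marker when triggered.

def coordination(exchanges, has_marker):
    """exchanges: list of (utterance_a, reply_b) string pairs."""
    triggered = [has_marker(b) for a, b in exchanges if has_marker(a)]
    baseline = [has_marker(b) for a, b in exchanges]
    if not triggered or not baseline:
        return 0.0
    return sum(triggered) / len(triggered) - sum(baseline) / len(baseline)

# Toy marker class: first-person pronouns (one of several function-word
# classes typically used in coordination studies).
PRONOUNS = {"i", "me", "my", "we", "our"}
marker = lambda text: any(w in PRONOUNS for w in text.lower().split())

dialogue = [("I think we should merge", "I agree"),
            ("the page needs sources", "citations added"),
            ("my edit was reverted", "I restored it"),
            ("see the talk page", "done")]
print(coordination(dialogue, marker))  # prints 0.5
```

In the studies the talk describes, such per-marker scores, aggregated over many conversations, are what reveal the asymmetries between high- and low-power participants.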
Our second project is motivated by the question of what information achieves widespread public awareness. We consider whether, and how, the way in which the information is phrased --- the choice of words and sentence structure --- can affect this process. We introduce an experimental paradigm that seeks to separate contextual from language effects, using movie quotes as our test case. We find that there are significant differences between memorable and non-memorable quotes in several key dimensions, even after controlling for situational and contextual factors. One example is lexical distinctiveness: in aggregate, memorable quotes use less common word choices (as measured by statistical language models), but at the same time are built upon a scaffolding of common syntactic patterns.
Joint work with Justin Cheng, Cristian Danescu-Niculescu-Mizil, Jon Kleinberg, and Bo Pang.
ACL 2012 (http://www.cs.cornell.edu/~cristian/memorability.html)
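The lexical-distinctiveness measurement described above can be sketched with a toy unigram language model: score a quote by its average per-word log-probability under a model built from a background corpus. The corpus and quotes below are invented stand-ins; the actual study used far larger corpora and richer language models.

```python
# Lexical distinctiveness via an add-alpha-smoothed unigram language model:
# lower average log-probability = less common word choices = more distinctive.
import math
from collections import Counter

def unigram_lm(corpus_tokens, alpha=1.0):
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves mass for unseen words
    def logprob(word):
        return math.log((counts[word] + alpha) / (total + alpha * vocab))
    return logprob

background = "the cat sat on the mat and the dog sat on the log".split()
lp = unigram_lm(background)

def avg_logprob(quote):
    words = quote.lower().split()
    return sum(lp(w) for w in words) / len(words)

# A quote of common words scores higher (is less distinctive) than one
# built from rare words.
print(avg_logprob("the cat sat") > avg_logprob("frankly quixotic"))  # prints True
```

The finding quoted above is that memorable quotes sit low on this lexical scale while remaining high-probability under a model of syntactic patterns, i.e. unusual words on a common scaffolding.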
Lillian Lee is a professor of computer science at Cornell University. Her research interests include natural language processing, information retrieval, and machine learning. She is the recipient of the inaugural Best Paper Award at HLT-NAACL 2004 (joint with Regina Barzilay), a citation in "Top Picks: Technology Research Advances of 2004" by Technology Research News (also joint with Regina Barzilay), and an Alfred P. Sloan Research Fellowship. Her group's work has received several mentions in the popular press, including The New York Times, NPR's All Things Considered, and NBC's The Today Show.
Andrew Senior, Google
Deep Neural Networks for Large Vocabulary Speech Recognition
While neural networks were very popular in the early 1990s and were used for speech recognition tasks, Gaussian mixture models, with many powerful extensions, remained the dominant paradigm, and neural networks fell out of favor. Neural networks did come to be used for feature extraction, but in recent years there has been a resurgence of interest in neural networks for probabilistic modeling, with several groups around the world reporting striking accuracy improvements on a variety of tasks.
In this talk I will describe the current practice of deep neural networks for large vocabulary speech recognition including pre-training and distributed stochastic gradient descent, focusing on Google's experience and experiments, particularly on the scalability of training and techniques for serving live traffic in real time for Google's Voice Search and Voice Input. I will also summarize results from other groups and discuss neural networks for language modeling.
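As a toy illustration of the hybrid setup the talk describes, the sketch below trains a small feed-forward network by stochastic gradient descent to output posteriors over HMM states given an acoustic feature frame. All shapes and data here are invented; production systems add pre-training, many wide layers, and the distributed training the talk covers.

```python
# One-hidden-layer network emitting softmax posteriors over HMM states,
# trained with per-frame cross-entropy SGD. Toy dimensions throughout.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_states = 39, 16, 5   # e.g. a 39-dim MFCC frame -> 5 states

W1, b1 = rng.normal(0, 0.1, (n_in, n_hid)), np.zeros(n_hid)
W2, b2 = rng.normal(0, 0.1, (n_hid, n_states)), np.zeros(n_states)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    logits = h @ W2 + b2
    p = np.exp(logits - logits.max())
    return h, p / p.sum()             # hidden activations, state posteriors

def sgd_step(x, y, lr=0.1):
    """One cross-entropy SGD update for a single (frame, state-label) pair."""
    global W1, b1, W2, b2
    h, p = forward(x)
    d_logits = p.copy()
    d_logits[y] -= 1.0                     # softmax + cross-entropy gradient
    d_h = (W2 @ d_logits) * (1 - h ** 2)   # backprop through tanh
    W2 -= lr * np.outer(h, d_logits); b2 -= lr * d_logits
    W1 -= lr * np.outer(x, d_h);      b1 -= lr * d_h
    return -np.log(p[y])                   # cross-entropy loss on this frame

x, y = rng.normal(size=n_in), 2
losses = [sgd_step(x, y) for _ in range(50)]
print(losses[-1] < losses[0])  # prints True: loss falls on the training frame
```

At recognition time, such posteriors are divided by state priors to obtain scaled likelihoods for the HMM decoder; scaling this recipe to web traffic is the subject of the talk.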
Andrew Senior received his PhD from the University of Cambridge and is a Research Scientist at Google working on speech recognition. Before joining Google he worked at IBM Research in the areas of handwriting, audio-visual speech, face and fingerprint recognition, as well as video privacy protection and visual tracking. He edited the book "Privacy Protection in Video Surveillance" and coauthored Springer's "Guide to Biometrics". His research interests range across speech recognition, pattern recognition, and computer vision.