17th International Conference on Intelligent Virtual Agents (IVA 2017), Aug 27-30, Stockholm

Keynotes


Virtual and Physical: Two Frames of Mind
Bilge Mutlu, University of Wisconsin–Madison

Abstract. In creating interactive technologies, virtual and physical embodiments are often seen as two sides of the same coin. They utilize similar core technologies for perception, planning, and interaction and engage people in similar ways. Thus, designers consider these embodiments to be broadly interchangeable and choice of embodiment to primarily depend on the practical demands of an application. In this talk, I will make the case that virtual and physical embodiments elicit fundamentally different frames of mind in the users of the technology and follow different metaphors for interaction. These differences elicit different expectations, different forms of engagement, and eventually different interaction outcomes. I will discuss the design implications of these differences, arguing for different domains of interaction serving as appropriate context for virtual and physical embodiments.

Biography. Bilge Mutlu is an associate professor of computer science at the University of Wisconsin–Madison and the director of the Wisconsin Human-Computer Interaction Laboratory. His research mission is to develop human-centered principles and methods to design robotic technologies that help people communicate, work, and pursue personal goals and to facilitate the effective integration of these technologies into the human environment. In pursuing this mission, he combines technical, behavioral, and design perspectives in a transdisciplinary research process. Initially trained as a product designer, Dr. Mutlu completed his doctoral work in human-computer interaction at Carnegie Mellon University. He is a former Fulbright scholar and the recipient of several awards from the human-computer and human-robot interaction communities. More information about Dr. Mutlu is available at http://bilgemutlu.com.

Prosody – Modeling its Multifunctionality and Multimodality
Petra Wagner, Bielefeld University

Abstract. In a narrow linguistic definition, focusing on the level above the word, the term prosody denotes those aspects of the speech signal that convey (i) the impression of boundaries, and (ii) the impression of some linguistic units being prominent. A less narrow definition of prosody also considers discourse-related aspects of suprasegmental structure, i.e. where it helps to negotiate the conversational floor, where it expresses a speaker's attitudinal stance toward a situation, where it reveals issues related to speech planning (disfluencies, hesitations), or where it serves as a means to express inter-speaker relationships. While we assume that these functions of prosody are not only transported via the speech signal but are tightly interwoven with co-speech movements, the exact interplay of these two modalities is far from clear. In my talk I will present an overview of how speech and gesture convey prosodic functions, and make some tentative suggestions as to how they are linked and how this link can be modeled.

Biography. After finishing an MA in linguistics, psychology and English studies at Bielefeld University in 1998, Petra joined the phonetics department at the former Institute of Communication Science and Phonetics (University of Bonn). During the following years, her work focused on various aspects of speech synthesis, prosodic prominence and speech rhythm. She received her PhD in 2002 and successfully finished her habilitation thesis in 2008. In the same year, she was appointed full professor for phonetics and phonology at Bielefeld University. There, she established a work group with a strong interest in the prosody of communicative interaction, including its multimodal aspects and with a strong focus on timing.
More recently, she has expanded her work on cross-linguistic analyses, attitudinal expression and second language acquisition. While focusing on fundamental research, her work has always sustained a strong link to more applied aspects such as the usability of prosody in dialogue systems, general human-machine interaction and conversational speech synthesis.

Learning from Visual Speech
Iain Matthews, Oculus Research

Abstract. Moving faces are interesting and informative. As human observers, we are more perceptive of inconsistencies and errors the more lifelike a face is realized (the so-called Uncanny Valley effect). As a broad rule of thumb in content creation, the face represents at least the same effort and complexity as the rest of a character. When ultimate realism is required (e.g. a live action visual effects shot), facial performance capture, modeling, rigging, retargeting, animation, shading, and rendering each represent significant and costly efforts.

We use speech animation as a well-defined domain for learning to accurately represent and animate facial motion from data. A fixed mapping from acoustic phonemes to lip shapes (i.e. visemes) is a poor approximation to the complex, context-dependent relationship visual speech truly has with acoustic speech production. We introduced “dynamic visemes” as data-derived visual-only speech units associated with distributions of phone strings and demonstrated that they capture context and co-articulation. Further improvement in predicting speech animation can be achieved using an end-to-end deep learning approach, which removes the requirement for visual units and can accurately predict visual speech from phoneme labels or the audio signal directly.

Learning to animate full Virtual Humans, and their interaction with each other, using similar machine learning approaches means collecting much larger and richer data sets.
I will highlight some recent results from the “Panopticon” social interaction capture efforts at Carnegie Mellon University that resulted in the OpenPose full-human pose tracking system.

Biography. Iain Matthews is a Research Scientist at Oculus Research working on social virtual reality. His research interests include computer vision and facial tracking, modeling, and animation. Iain received a BEng degree in electronic engineering and a PhD in computer vision from the University of East Anglia. He then joined Carnegie Mellon University, first as a post-doctoral fellow, then as faculty in the Robotics Institute. Beginning in 2006, he spent two years at Weta Digital creating the facial motion capture system for the movies Avatar and Tintin, and was awarded a Scientific and Engineering Award (technical Oscar) for this work in 2017. He joined the newly formed Disney Research Pittsburgh in 2008 to lead the computer vision group. In 2013 he became the Associate Director of Disney Research Pittsburgh. Iain holds an adjunct faculty appointment in the Robotics Institute at Carnegie Mellon University and an Honorary Professor position at the University of East Anglia. He has published over 100 academic papers and has a dozen awarded patents.
Copyright © 2017 KTH. All Rights Reserved.