In-vehicle intelligent voice agent: Speech style vs. embodiment, which matters more?

Video: https://youtu.be/Z1W9aN5QkuQ

Which type of voice agent is more favorable in autonomous driving? To answer this question, the Mind Music Machine lab conducted a research project that explored the influence of two features of in-vehicle voice agents – speech style and embodiment – on drivers’ perception and perceived workload in a fully autonomous vehicle.

A 2 (speech style: informative vs. conversational) × 2 (embodiment: voice vs. robot) factorial design was adopted. The informative and conversational voice agents conveyed the same messages in different styles. For instance, when a pedestrian suddenly stepped onto the road, the informative voice said, “Jaywalker ahead”, while the conversational voice said, “Are you okay? A man suddenly popped out onto the road”. We used Nao, a humanoid robot, as our robot voice agent. We recruited 24 participants, who experienced fully automated driving scenarios with each of the four voice agents in the medium-fidelity Nervtech™ driving simulator. The in-vehicle intelligent agents provided information about road conditions and events, which helped drivers maintain situation awareness. Afterward, participants completed several questionnaires evaluating various aspects of human-agent interaction, such as anthropomorphism, competence, trust, and perceived workload. After completing all conditions, participants also indicated their preferences among the four voice agents.

Our analysis showed that both the conversational speech style and the embodiment of the agent increased its likability and warmth, with no interaction effect between these two features. We also found that speech style and embodiment influenced drivers’ perceptions differently. Compared to the informative agent, the conversational agent received higher anthropomorphism and animacy scores and lower cognitive demand and annoyance scores, regardless of its embodiment. Similarly, compared to the voice-only agent, the robot agent was perceived as more competent, regardless of its speech style. When accompanied by a robot agent, participants also reported lower overall workload, as measured by the NASA-TLX. Our findings suggest that conversational style determines whether an agent is perceived as human-like and alive, while a physical body is essential for an agent to be perceived as competent.

The combination of a conversational style and a physical body was preferred the most. Explaining why the conversational robot agent was their favorite, one participant commented, “When the voice said, ‘we’, it felt more natural as it was next to me.” A sense of companionship offers another explanation for why this agent was the most liked. As another participant stated, “Being in the vehicle alone was boring without being able to drive, although it was artificial at least the robot was a physical object that interacted with me.”

Our findings provide design insights into how to emphasize different features of in-vehicle voice agents to meet various user needs in highly intelligent autonomous vehicles. In this way, we can enhance the in-vehicle user experience through emotional design.

Back to ICAT Creativity + Innovation Day 2021