Evaluating Human-likeness with Modality Difference in Virtual Assistant (VA) Interaction

Eugene Cho (Ph.D. Candidate); Maria D. Molina (Ph.D. Candidate); Jinping Wang (Ph.D. Candidate)

Dr. S. Shyam Sundar


The use of virtual assistants (VAs) is spreading rapidly, with VA technologies expected to drive a $12 billion market by 2024 (Baron, 2017). However, despite efforts to build more efficient VAs, adherence rates and actual usage remain low (Santos-Perez, Gonzalez-Parada, & Cano-Garcia, 2013). One of the main reasons is likely the lack of emotional and human-like elements in these technologies. Yet little research has examined how modality affects human-likeness and user perceptions in interactions with VAs, which are primarily designed around voice recognition. This gap is particularly important because today’s most popular VAs, including Apple’s Siri, Microsoft’s Cortana, Amazon’s Alexa, and Google Home, are designed primarily to interact with users through audio input. In the present study, we explored the potential of modality interactivity, operationalized as the exchange of voice/text input and output, to improve user experience by affording more human-like interactions. Specifically, we focused on three dimensions of human-like characteristics: anthropomorphism (referred to as human-likeness throughout the study), perceptual bandwidth, and social presence, along with their mediating role in shaping attitudes toward the VA system. Beyond modality, we also tested whether device and task differences interact with modality, since many VA systems offer several device options (e.g., mobile phone, laptop) and usage contexts (e.g., informational use, entertainment) that could also affect user perceptions. Finally, we examined perceived intrusiveness as another potential mediator of attitudes toward the VA system.

H1: Voice interaction, compared to text interaction, will elicit higher levels of perceived (a) human-likeness, (b) social presence, and (c) perceptual bandwidth.

H2: Voice interaction, compared to text interaction, will indirectly increase positive attitudes toward the system, mediated by the levels of (a) human-likeness, (b) social presence, and (c) perceptual bandwidth.

RQ1: Will device type (mobile vs. laptop) moderate the relationship between voice (vs. text) interaction and the levels of (a) social presence, (b) human-likeness, and (c) perceptual bandwidth?

RQ2: Will task type (hedonic vs. utilitarian) moderate the relationship between voice (vs. text) interaction and the levels of (a) social presence, (b) human-likeness, and (c) perceptual bandwidth?

RQ3: What is the relationship between modality (voice vs. text), device (laptop vs. mobile), and task type (hedonic vs. utilitarian) and the perceived intrusiveness of interactions with the virtual assistant?

RQ4: Will modality, device, and task type have indirect effects on attitudes toward the system, mediated by perceived intrusiveness?

This study employed a 2 (modality: voice vs. text) × 2 (device: mobile vs. laptop) × 2 (task type: hedonic vs. utilitarian) mixed factorial experimental design, with modality and device as between-subject factors and task type as a within-subject factor. We used Microsoft’s Cortana to examine our research questions because it allows users to interact with the system through both voice and text input, on both mobile and computer devices. Eighty-four undergraduate students (12 male, 72 female) participated in a computer lab and were randomly assigned to one of four conditions (mobile voice, N = 21; mobile text, N = 22; laptop voice, N = 20; laptop text, N = 21). Each participant then interacted with Cortana on two types of task sets (hedonic and utilitarian) for five minutes each, with the order of the two task types randomized. After each five-minute interaction, participants moved to a desktop PC next to their seat to complete an online questionnaire evaluating that interaction.
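The assignment procedure described above can be sketched as follows. This is a minimal illustration, not the authors’ procedure: it assumes block randomization of the four between-subject cells (modality × device) plus a per-participant shuffle of the within-subject task order, and the helper names `assign_participants` and `cell_counts` are hypothetical.

```python
import itertools
import random
from collections import Counter

# Between-subject factors: modality x device (four cells).
MODALITIES = ("voice", "text")
DEVICES = ("mobile", "laptop")
# Within-subject factor: every participant completes both task sets.
TASKS = ("hedonic", "utilitarian")


def assign_participants(n, seed=None):
    """Hypothetical helper: block-randomize n participants into the four
    modality x device cells and randomize each person's task order."""
    rng = random.Random(seed)
    cells = list(itertools.product(MODALITIES, DEVICES))
    plan = []
    while len(plan) < n:
        block = cells[:]
        rng.shuffle(block)  # each block of 4 uses every cell once
        for modality, device in block[: n - len(plan)]:
            order = list(TASKS)
            rng.shuffle(order)  # randomize hedonic/utilitarian order
            plan.append({
                "participant": len(plan) + 1,
                "modality": modality,
                "device": device,
                "task_order": tuple(order),
            })
    return plan


def cell_counts(plan):
    """Number of participants in each between-subject cell."""
    return Counter((p["modality"], p["device"]) for p in plan)


if __name__ == "__main__":
    plan = assign_participants(84, seed=1)
    print(cell_counts(plan))
    print(plan[0])
```

With 84 participants, this block scheme yields exactly 21 per cell; the study’s reported cell sizes (21, 22, 20, 21) are not perfectly equal, so the sketch should not be read as a reconstruction of how participants were actually assigned.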

The results partially supported H1: interacting with voice (vs. text) significantly increased social presence and perceptual bandwidth, but the modality effect failed to reach significance for human-likeness. When we examined the moderating effects of device (RQ1), no significant interaction effects appeared for human-likeness or social presence. A marginally significant interaction emerged for perceptual bandwidth, showing that voice (vs. text) interaction tended to elevate perceptual bandwidth only in the laptop (vs. mobile) condition. Task type (hedonic vs. utilitarian) moderated the modality effect on social presence, with no significant moderation on either human-likeness or perceptual bandwidth (RQ2). In particular, voice (vs. text) interaction enhanced the feeling of social presence, but only in utilitarian (vs. hedonic) tasks. No significant main effect of modality on perceived intrusiveness emerged, and none of the possible two-way or three-way interactions among the three independent variables had a significant effect on intrusiveness (RQ3-4). Regarding mediation (H2), the effect of voice (vs. text) interaction on attitudes toward Cortana was mediated by higher perceived human-likeness and social presence, but only in the utilitarian (vs. hedonic) task condition. The hypothesized mediation through perceptual bandwidth was supported only in the utilitarian (vs. hedonic) condition when participants used laptops (vs. mobile phones).
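To make the conditional mediation logic above concrete, here is a sketch of one standard approach: the product-of-coefficients indirect effect (a × b) with a percentile bootstrap interval, estimated separately within one task condition. The abstract does not state which estimation procedure the authors used, so this is an illustration under assumptions, not their analysis. The row keys (`voice`, `task`, `attitude`, and mediator names such as `social_presence`) and all function names are hypothetical.

```python
import random
from statistics import mean


def a_path(x, m):
    """a path: effect of modality (x: 1 = voice, 0 = text) on mediator m.
    For a 0/1 predictor the OLS slope equals the difference in group means."""
    voice = [mi for xi, mi in zip(x, m) if xi == 1]
    text = [mi for xi, mi in zip(x, m) if xi == 0]
    return mean(voice) - mean(text)


def b_path(x, m, y):
    """b path: coefficient of m in the OLS regression y ~ 1 + x + m,
    solved from centered sums of squares and cross-products (Cramer's rule)."""
    mx, mm, my = mean(x), mean(m), mean(y)
    xc = [v - mx for v in x]
    mc = [v - mm for v in m]
    yc = [v - my for v in y]
    sxx = sum(v * v for v in xc)
    smm = sum(v * v for v in mc)
    sxm = sum(u * v for u, v in zip(xc, mc))
    sxy = sum(u * v for u, v in zip(xc, yc))
    smy = sum(u * v for u, v in zip(mc, yc))
    return (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)


def indirect_effect(x, m, y):
    """Product-of-coefficients indirect effect a * b."""
    return a_path(x, m) * b_path(x, m, y)


def bootstrap_ci(x, m, y, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a * b."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < reps:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [x[i] for i in idx]
        bm = [m[i] for i in idx]
        by = [y[i] for i in idx]
        try:
            estimates.append(indirect_effect(bx, bm, by))
        except (ZeroDivisionError, ValueError):
            # Resample lacked one modality group (mean of an empty list raises
            # StatisticsError, a ValueError subclass) or the b-path system was
            # singular; draw another resample.
            continue
    estimates.sort()
    lo = estimates[int(reps * alpha / 2)]
    hi = estimates[int(reps * (1 - alpha / 2)) - 1]
    return lo, hi


def conditional_indirect(rows, task, mediator, **kwargs):
    """Indirect effect of voice (vs. text) on attitude through `mediator`,
    estimated within one task condition (e.g. "utilitarian")."""
    sub = [r for r in rows if r["task"] == task]
    x = [r["voice"] for r in sub]
    m = [r[mediator] for r in sub]
    y = [r["attitude"] for r in sub]
    return indirect_effect(x, m, y), bootstrap_ci(x, m, y, **kwargs)
```

Because task was a within-subject factor, each participant contributes one row per task condition; filtering to a single condition keeps one observation per participant, which is why the sketch estimates the indirect effect separately for each task rather than pooling both.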

The effect of voice on attitudes toward Cortana was mediated by perceived human-like characteristics, but only in the utilitarian condition. This might be explained by the fact that voice input is associated with greater efficiency, and efficiency is a more salient goal in utilitarian tasks. In addition, voice mattered for the laptop VA: when the laptop “speaks,” novelty effects may bring about more positive user reactions. Overall, our study indicates that modality in virtual assistant interactions makes a difference to the perceived human-like features of VAs, and that this effect can vary further with device type and task context.

