Bootcamp

From idea to product, one lesson at a time. To submit your story: https://tinyurl.com/bootspub1

Voice and Gesture in the Age of AI: Implications for UX Design

A photo of an AR headset and touch controls sitting on a surface.
Photo by Vinicius "amnx" Amano on Unsplash

As product designers, it’s important to stay up-to-date with emerging trends and technologies that can shape the user experience (UX) of the products we work on. One such trend that’s gaining momentum is the use of voice and gesture-based interfaces, powered by artificial intelligence (AI) and machine learning. In this article, we’ll explore the rise of voice and gesture-based interfaces, the key considerations for designing them, and the implications for UX design and product development.

The Rise of Voice and Gesture-Based Interfaces

Voice and gesture-based interfaces have been around for decades. Voice recognition dates to 1952, when Bell Labs introduced the first voice recognition system, capable of recognizing digits spoken by a single voice. IBM followed with the first commercial speech recognition system in the 1970s, and Dragon Dictate arrived in the 1990s, a software application that allowed users to dictate text using their voice. On the gesture-based side, touch screens emerged in 1982, allowing users to interact with a computer using finger touches. Nintendo’s Power Glove followed in 1989, giving users a gesture-based input device for gaming. The landmark device is, of course, the iPhone, which made touch screens mainstream starting in 2007.

While these technologies have been gaining steam on their own for years, the emergence of AI and machine learning has taken these interfaces to a new level. With the ability to process natural language and recognize patterns in gesture recognition, these interfaces are becoming far more sophisticated and user-friendly. In addition, they offer many benefits over traditional interfaces, including hands-free interaction, improved accessibility for users with disabilities, and the ability to provide more personalized experiences. The key is designing them well.

Designing for Voice and Gesture-Based Interfaces

Designing for voice and gesture-based interfaces requires a different set of considerations than traditional interfaces.

One key consideration is natural language processing and speech recognition, which are essential for voice-based interfaces. It’s important to design interfaces that can recognize different accents, dialects, and languages, and that can handle ambiguous and complex commands. It’s easy to fall into the bias trap with these technologies: testing on small, homogeneous groups leaves designs weak in the face of ambiguity and novel input. User testing and targeted user research are key when working on new technologies, even more so than in design generally. Determine who your end users are. If that demographic is wide, consider localizing your voice interface from the start, mastering a specific subset of languages and dialects well before broadening. Work with local experts and language specialists to ensure the accuracy and effectiveness of your localized voice interface.
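To make the ambiguity point concrete, here is a minimal sketch of one way a voice interface can avoid guessing on unclear commands. The intent names, keywords, and thresholds are all hypothetical; a real product would use an NLU service rather than keyword overlap, but the design principle carries over: when confidence is low, or two intents score too closely, ask the user to clarify instead of acting.

```python
# Hypothetical keyword-based intent matcher illustrating ambiguity handling.
INTENTS = {
    "set_timer": {"set", "start", "timer", "countdown"},
    "set_alarm": {"set", "wake", "alarm"},
    "play_music": {"play", "music", "song"},
}

def match_intent(utterance: str, min_score: float = 0.5, margin: float = 0.15):
    """Return (intent, action), where action is 'execute' or 'clarify'."""
    words = set(utterance.lower().split())
    # Score each intent by the fraction of its keywords present in the utterance.
    scores = {
        name: len(words & keywords) / len(keywords)
        for name, keywords in INTENTS.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    # Low confidence, or two intents too close together: ask, don't guess.
    if best[1] < min_score or best[1] - runner_up[1] < margin:
        return best[0], "clarify"
    return best[0], "execute"
```

The clarification fallback is the accessibility-relevant part: an interface that silently picks the wrong intent is far more frustrating than one that asks a short follow-up question.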

For gesture-based interfaces, designers need to consider how users will interact with the interface and design gestures that are intuitive and easy to perform. Understand the context in which the gesture-based interface will be used: the user’s environment, their physical position and posture, and the tasks they will be performing. Use this information to design gestures that are appropriate and effective for that context. A classic example is the way a phone sits in the hand. Since most thumbs can only reach the bottom two-thirds of the screen in this position, primary functions should sit within that area; anything in the top third is out of thumb reach and requires the user to adjust their grip.
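The thumb-zone rule above can be sketched as a simple layout audit. The two-thirds cutoff is the article’s heuristic rather than a platform constant, and the function names are illustrative; coordinates are measured from the top of the screen, with y growing downward, as is conventional in UI frameworks.

```python
def in_thumb_zone(y: float, screen_height: float) -> bool:
    """True if a control at vertical position y is within easy thumb reach."""
    return y >= screen_height / 3  # the top third is out of reach

def audit_layout(controls: dict[str, float], screen_height: float) -> list[str]:
    """Return the names of controls placed outside the thumb zone."""
    return [name for name, y in controls.items()
            if not in_thumb_zone(y, screen_height)]
```

For a 2400px-tall screen, a submit button at y=2100 passes the audit, while a search field at y=200 would be flagged for a hand-position adjustment.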

Accessibility and Voice and Gesture-Based Interfaces

Voice and gesture-based interfaces have the potential to improve accessibility for users with disabilities, such as those who have mobility impairments or are visually impaired. For example, a voice-based interface can allow a user with limited mobility to control a device without using their hands. However, it’s important to ensure that these interfaces are designed with accessibility in mind and that they are usable for all users.

Providing the option for multiple input types across voice, gesture, and text is important here to accommodate users with different abilities and preferences. Multi-modal feedback is also necessary. Think of all the feedback mechanisms a product like Apple Pay uses to confirm an important user action: a visual confirmation of payment on the screen, an audio notification that dings, a haptic vibration, and a push notification stating the charge amount. Together, these forms of feedback ensure that all users receive confirmation of their actions.
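The fan-out pattern in the Apple Pay example can be sketched as follows. The channel functions here are hypothetical stand-ins; in a real product each would call the platform’s UI, audio, haptic, and notification APIs. The point is structural: one user action is broadcast to every modality, so users who cannot perceive one channel still receive confirmation through another.

```python
def confirm_payment(amount: str, channels=None) -> list[str]:
    """Send a payment confirmation over every available feedback channel."""
    if channels is None:
        # Illustrative stand-ins for platform-specific feedback APIs.
        channels = {
            "visual": lambda msg: f"[screen] checkmark: {msg}",
            "audio": lambda msg: f"[speaker] ding: {msg}",
            "haptic": lambda msg: "[vibration] short pulse",
            "push": lambda msg: f"[notification] {msg}",
        }
    message = f"Payment of {amount} confirmed"
    # Fan the same event out to all modalities rather than relying on one.
    return [send(message) for send in channels.values()]
```

Passing channels as a parameter also makes it easy to respect user preferences, for example dropping the audio channel when the device is muted.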

Challenges and Future of Voice and Gesture-Based Interfaces

While voice and gesture-based interfaces offer several benefits, there are also challenges associated with designing for these interfaces. Some of the common pitfalls when designing these new interfaces include:

Not providing adequate feedback:

  • Voice and gesture-based interfaces that do not provide feedback to the user can lead to confusion and frustration.
  • For example, a voice interface that does not provide confirmation messages or audible cues when a command is recognized can leave the user uncertain whether the interface understood their command.

Establishing standards of poor user experience:

  • We have come to rely on a wealth of design standards across the interfaces we interact with daily. Establishing an equally rigorous set of standards for voice and gesture-based interfaces will be imperative to creating positive user experiences.
  • A gesture-based interface that requires complex or unintuitive gestures creates a frustrating experience for the user and hurts learnability across devices and other products.

Not considering privacy concerns:

  • Voice interfaces that are always listening can raise concerns about privacy and security.
  • For example, there have been several reports of voice assistants recording private conversations and sending them to third-party companies for analysis. At the core of this issue is a breakdown of trust between users and these products. Without trust, users who are slow to adopt new technologies will delay adoption even longer, and with enough erosion of trust, the market may decide the risk is not worth the investment.
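One common mitigation for the always-listening concern can be sketched as a gate: nothing leaves the device until a wake word is detected locally. The string match below is a trivial stand-in for a real on-device wake-word model, and the function names are hypothetical, but the privacy-preserving structure is the point — transcription of ambient audio is discarded locally, and only the post-wake-word command is forwarded.

```python
WAKE_WORD = "hey device"  # illustrative wake phrase

def process_audio(transcript: str, send_to_cloud) -> bool:
    """Forward audio to the cloud only after the wake word is heard locally."""
    if transcript.lower().startswith(WAKE_WORD):
        command = transcript[len(WAKE_WORD):].strip()
        send_to_cloud(command)  # only the command after the wake word leaves the device
        return True
    return False                # everything else is discarded on the device
```

Designs like this make the privacy promise auditable: the boundary between local and remote processing is a single, testable code path rather than a policy statement.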

By being aware of these common pitfalls, product designers can take steps to avoid them and create voice and gesture-based interfaces that are effective, engaging, and accessible to all users.

Looking to the future, there are several emerging trends and technologies that could shape the future of voice and gesture-based interfaces. For example, advances in natural language processing and machine learning could make these interfaces even more intuitive and personalized for individual users. In addition, the integration of voice and gesture-based interfaces with other emerging technologies, such as augmented reality and virtual reality, could open up new possibilities for product design and UX.

Implications for UX Design and Product Development

The rise of voice and gesture-based interfaces has important implications for UX design and product development. As product designers, we need to consider how these interfaces can improve the user experience and provide more personalized and accessible experiences. We also need to ensure that these interfaces are designed with accessibility in mind and that they are easy to use and understand. Ultimately, the key to success with voice and gesture-based interfaces is to put the user at the center of the design process and create interfaces that are intuitive, user-friendly, and enjoyable to use. The nature of the design process and design thinking will continue to hold true here, but even more intentionality must be placed on new technologies, especially given the speed and scale of the AI revolution we’re facing.

Conclusion

Voice and gesture-based interfaces are an exciting trend in the world of UX design and product development, and they offer several benefits over traditional interfaces. As product designers, it’s important to stay up-to-date with these emerging technologies and consider how they can be used to improve the user experience of our products. With the right approach, we can create interfaces that are intuitive, accessible, and enjoyable to use.


Written by Eric Vilanova

I'm Eric Vilanova, a product/ux design geek on a mission to create innovative solutions that add value and delight to businesses and users.
