Imagining possibilities with Apple Vision Pro

Lucian Tucker
Published in Bootcamp · 7 min read · Jun 11, 2023

Profile view of the Apple Vision Pro with its external battery pack attached.

This week, Apple announced its new “spatial computing” device, Vision Pro. It’s a mixed reality headset that seamlessly blends elements of augmented reality (AR) and virtual reality (VR), allowing users to superimpose digital elements over the real world or fully immerse themselves in a virtual space. As Tim Cook stated during the WWDC keynote announcement, “It’s the first Apple product you look through, and not at.”

Unlike other headsets that focus more on gaming (and the metaverse), Apple’s Vision Pro feels more like a potential replacement for your computer, tablet, and television, thus earning the “spatial computing” label.

It’s difficult to explain without seeing it, so here’s a 9-minute video. I’ll wait.

A good summary of the device’s features.

While there are still some unanswered questions about the Vision Pro’s functionality and specifications, after watching and reading about it over the past few days, I’ve come up with some initial ideas for how Apple could improve this first-generation device.

Note: It’s possible that these ideas are features that have yet to be announced or marketed.

EyeSight

One of the Vision Pro’s most distinctive features is called EyeSight. When someone is nearby, a display on the outside of the device projects a real-time capture of your eyes, creating the illusion that people looking at you can see through the headset to your face.

Closeup of a person wearing an Apple Vision Pro headset. EyeSight is activated as they smile at another person.
EyeSight closeup (from Apple.com)

During app usage, such as reading a text message, the display of your eyes is overlaid with transparent colors to indicate your focus is partially elsewhere. When fully immersed in an experience, where the outside world is not visible, the colors completely cover your eyes.

When I saw this, I immediately thought it would be worth exploring additional ways to utilize this external display.

Personalization

Instead of displaying realistic eyes, what if you could personalize it with fun animations? Imagine LED-style hearts, words or phrases, or emojis.

Facial Expressions

Taking it a step further, what if the display changes based on your facial expressions? For instance, if you look sleepy, it could show Z’s. If shocked by a photo sent to you, it could display X’s.

App-Based

Currently, the display changes depending on whether you are fully immersed or not. However, what if app developers could customize what is shown based on what’s being experienced? While watching the show The Mandalorian, it could display images of stars and galaxies. If enjoying the movie Dune, it could show a desert environment. And if you’re playing a Batman video game, it could reveal the iconic “Bat Signal.”

The trick here, though, is that you might not want people to have any idea what you’re doing on your headset, so if this were actually implemented, there would need to be a setting to disable developer-controlled animations.
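
To make the idea concrete: Apple exposes no EyeSight API to developers today, so everything below is imaginary. The sketch pretends visionOS offered an ExternalDisplayScene that an app could populate, and that the system would render only when the wearer has opted in.

```swift
import SwiftUI

// Entirely hypothetical — visionOS does not expose EyeSight to apps.
// Sketch: the app declares content for the front panel, and the system
// shows it only if the wearer has allowed developer-controlled EyeSight.
struct BatmanGameApp: App {
    var body: some Scene {
        WindowGroup {
            GameView() // the app's normal in-headset interface (hypothetical)
        }

        // Imaginary scene type for the external EyeSight display.
        ExternalDisplayScene {
            Image("bat-signal")
                .resizable()
                .scaledToFit()
        }
    }
}
```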

A person watching television with Vision Pro.
A person watching the show Foundation alone (from Apple.com).

Entertainment

With the Vision Pro headset, you can recreate the experience of a movie theater right at home. You have the ability to enlarge a floating video player to your desired size, with the background dimmed and a glow effect illuminating the room. Coupled with spatial audio, it may actually rival even the fanciest home entertainment systems.

While it looked great during the keynote, my concern is that it’s a solitary experience for those around you. Although the headset allows people to see your eyes and enables you to see others, they can’t see what you’re watching. Part of the joy of watching a movie is sharing it with others, so what can be done?

Shared Vision Pro Experiences

Granted, the number of households that can comfortably afford more than one Vision Pro (priced at $3,499) might be limited, but let’s put that aside for now. Instead, let’s imagine a scenario where friends who each own a Vision Pro come together.

In the past, my friends and I would gather once a week to watch Game of Thrones. If each of us had a Vision Pro, it would be amazing if we could bring them along and share the same experience side by side. It would create the illusion of a single floating video screen fixed in space, much like a television or theater screen.
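
There’s no announced way to pin one shared, world-locked screen between headsets in the same room, but Apple’s real GroupActivities framework (the machinery behind SharePlay) already handles the synchronized-playback half. A minimal sketch, with a placeholder identifier and title:

```swift
import GroupActivities

// A sketch built on Apple's real GroupActivities framework; the
// identifier and title are placeholders, and wiring the synced state
// into an actual video player is left to the app.
struct WatchPartyActivity: GroupActivity {
    static let activityIdentifier = "com.example.watch-party" // placeholder

    var metadata: GroupActivityMetadata {
        var meta = GroupActivityMetadata()
        meta.title = "Weekly Watch Party" // placeholder
        meta.type = .watchTogether
        return meta
    }
}

// Offer to start the shared session, e.g. when playback begins.
func startWatchParty() async {
    let activity = WatchPartyActivity()
    switch await activity.prepareForActivation() {
    case .activationPreferred:
        _ = try? await activity.activate()
    case .activationDisabled, .cancelled:
        break // fall back to watching alone
    @unknown default:
        break
    }
}
```

Keeping everyone’s playback in sync is what the framework already provides; anchoring the screen at the same spot in the shared room is the piece Apple would still need to add.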

Projecting Audio/Video

Consider a situation where someone brings a partner to one of these Game of Thrones gatherings who doesn’t have their own headset. What then? Here, Apple could allow videos (and apps) to be mirrored to other devices, like an Apple TV. With this feature enabled, everyone in the room could face the same direction, with some looking at the television screen and others viewing the larger screen within their Vision Pro. The audio would need to be shared as well, likely by muting the headset’s speakers so sound could play through the room’s speakers without an echo.

During the keynote, Apple demonstrated how you can connect your Vision Pro to a MacBook laptop simply by looking at it, turning off the laptop screen and projecting the display within your headset. I envision a similar experience with an Apple TV device, but with the option to keep the television screen on at the same time.

It’s important to note that this approach differs from what Apple refers to as spatial SharePlay, which focuses on creating a shared “virtual experience” between individuals who may not be in the same physical environment, utilizing their Spatial Personas.

A person wearing a Vision Pro headset while also using a keyboard and trackpad.
You can use external input devices, like a Magic Trackpad and Magic Keyboard (from Apple.com).

Accessibility/Interaction Enhancements

After watching the keynote and noticing its heavy reliance on eye and hand interactions, I initially thought, “This device may not be accessible to everyone. If you’re a person with low vision, blindness, or limited finger dexterity, this won’t be for you.” However, upon delving into the developer information for visionOS, the operating system behind the device, I was pleasantly surprised by the level of consideration given to accessibility, encompassing vision, motor, cognitive, and hearing concerns.

I do have some suggestions, though, that could further improve accessibility and benefit everyone.

Voice

In one part of the keynote, a person is shown using voice input to enter text in the Safari address bar. Afterwards, a voiceover notes how Siri can be used to “open and close apps, play media, and more.” Let’s dig into the “and more” part.

It would be fantastic if there were a way to control the Vision Pro entirely through voice commands, if desired. For example, instead of relying on pinch gestures for clicking and scrolling, you should be able to simply say “scroll down” or “open” while looking at the appropriate parts of an app. Notice that I’m not including “Hey Siri” before these commands.
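
To illustrate, here’s a rough sketch of the command-mapping half using Apple’s existing Speech framework. The scrollFocusedElement and activateGazedElement helpers are invented for this example, and whether visionOS would allow an app to listen continuously, without a “Hey Siri” trigger, is an open question.

```swift
import Speech

func scrollFocusedElement(by points: Double) {
    // Hypothetical: scroll whatever element currently has gaze focus.
}

func activateGazedElement() {
    // Hypothetical: "click" the element the user is looking at.
}

// Map a recognized phrase to an action.
func handle(transcript: String) {
    switch transcript.lowercased() {
    case "scroll down": scrollFocusedElement(by: -200)
    case "scroll up":   scrollFocusedElement(by: 200)
    case "open":        activateGazedElement()
    default:            break
    }
}

func startListening() {
    SFSpeechRecognizer.requestAuthorization { status in
        guard status == .authorized else { return }
        let recognizer = SFSpeechRecognizer()
        let request = SFSpeechAudioBufferRecognitionRequest()
        // Microphone plumbing omitted: feed buffers into `request` via
        // request.append(_:) from an AVAudioEngine tap.
        _ = recognizer?.recognitionTask(with: request) { result, _ in
            if let result, result.isFinal {
                handle(transcript: result.bestTranscription.formattedString)
            }
        }
    }
}
```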

Hands

In the demo, hand input primarily involved “clicking” by tapping or holding the thumb and index finger together. Other options, such as a virtual or physical keyboard, were also showcased. However, depending on the experience, additional hand gestures could prove useful.

If watching or editing video, imagine turning an invisible knob left and right to rewind and fast forward. If you’re using several apps at once and want to temporarily move them out of the way to reveal your home screen, you could use one or both of your hands to wave away content. And then of course there’s the occasional need to right-click, which could be triggered by combining your thumb with a finger other than your index finger.

Based on my understanding of the developer documentation, these gestures are possible if they are designed into an app’s experience. What I’m hoping for though are some basic standards or best practices that broadly apply to every app.
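
For what it’s worth, visionOS’s ARKit does expose per-joint hand tracking, so the “right-click” pinch could be prototyped today. Below is a rough sketch using the real HandTrackingProvider; the 2 cm touch threshold and the showContextMenu() action are my own assumptions, and hand tracking requires user permission inside an immersive space.

```swift
import ARKit
import simd

// Sketch: watch for the thumb touching the middle fingertip and treat
// it as a secondary ("right") click.
func watchForSecondaryPinch() async throws {
    let session = ARKitSession()
    let handTracking = HandTrackingProvider()
    try await session.run([handTracking])

    for await update in handTracking.anchorUpdates {
        guard let skeleton = update.anchor.handSkeleton else { continue }

        // Fingertip positions in hand-anchor space.
        let thumb = skeleton.joint(.thumbTip).anchorFromJointTransform.columns.3
        let middle = skeleton.joint(.middleFingerTip).anchorFromJointTransform.columns.3
        let distance = simd_distance(
            SIMD3(thumb.x, thumb.y, thumb.z),
            SIMD3(middle.x, middle.y, middle.z)
        )

        if distance < 0.02 { // fingertips within ~2 cm: treat as a pinch
            showContextMenu()
        }
    }
}

func showContextMenu() {
    // Hypothetical: present a right-click-style context menu.
}
```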

A person wearing a Vision Pro seemingly gesturing confusion.
A woman gesturing, though probably not in a way the headset recognizes (from Apple.com).

Advanced Gesture Recognition

In order for Vision Pro and similar devices to truly feel magical, I believe they need to learn how to recognize commands and gestures that were not initially programmed. They should be able to intuit the intentions of individuals.

User-Centric Interactions

Let’s revisit the hand gestures I proposed for scrubbing through videos. While turning an invisible knob seems like a great solution for rewinding and fast-forwarding, what if another person believes it would be better to hold their hand straight and slowly move it from left to right? Or to roll one or both hands towards or away from their body? Or perhaps make large waving motions to the left or right, with the speed of the skipping determined by the intensity of the wave?

Admittedly, some of these gestures may require more energy than others. However, it should be left up to each individual to decide how they want to interact with the headset, while the headset itself should be capable of understanding the user’s intentions.

Machine Learning-Powered

In practice, this would function similarly to recent machine learning systems that interpret natural instructions. Instead of translating text into outputs, as ChatGPT does using large language models (LLMs), the headset would translate voice and gesture commands, along with the user’s gaze, into the intended action. We could think of this approach as utilizing a “large gesture model,” a phrase I just came up with.
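
To make that concrete, here’s a purely hypothetical sketch of what such a model’s interface might look like: it takes gaze, hand motion, and speech as context and returns its best guess at intent, much as an LLM maps a prompt to a response. Every name here is invented.

```swift
// Entirely hypothetical: a thought experiment, not an API.
struct InteractionContext {
    var gazeTarget: String              // e.g. "video-player"
    var handTrajectory: [SIMD3<Float>]  // recent palm positions over time
    var spokenPhrase: String?           // e.g. "skip ahead"
}

enum Intent {
    case scrub(seconds: Double)  // rewind / fast-forward
    case dismissContent          // wave apps out of the way
    case none
}

protocol GestureModel {
    // Returns the model's best guess at what the user meant, regardless
    // of which particular motion they personally chose to perform.
    func inferIntent(from context: InteractionContext) -> Intent
}
```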

Conclusion

Without having had the opportunity to use the device myself yet, it’s difficult for me to determine what will truly work. Additionally, since Apple’s Vision Pro was only just announced and is scheduled for release in early 2024, there are likely visionOS features that have yet to be revealed.

Still, this doesn’t stop me from imagining the exciting possibilities that lie ahead with this new device.

