The Zoom Gaze

Video conferencing offers an illusory sense of unilateral control over conversations

Since the pandemic began, the seemingly mundane protocols of Zoom have become a significant part of many people’s daily lives: finding the right link, setting up the peripherals, managing the glitches and slippages in this supposedly “synchronous” form of communication. At first, of course, video conferencing was a godsend — a way that things could continue to go on with some semblance of normal. But it quickly became clear that video conferencing is not simply a substitute for face-to-face encounters. It incurs effects of its own.

Zoom not only opened our homes to unanticipated scrutiny and our schedules to an all-day influx of appointments; it also quickly proved far more tiring than meeting in person. As of this writing, the term Zoom fatigue returns almost 700,000 hits on Google, many of which are listicles on how to combat it. But others try to explain it. One theory is that the hiccups in synchronicity due to bad connections can cause false starts and interruptions, which create communicative friction and frustration that make it hard to maintain conversational etiquette. L.M. Sacasas speculates that the fatigue stems from dealing with reflections and projections of ourselves, making up for the work that bodies in space do. Zoom makes us work harder to convey and receive subtle signals from one another over video. Geert Lovink lays out a meta-analysis of proposed reasons, including what he terms “video vertigo,” a downward spiral that comes from compounding work and leisure in the same space: You need that planned happy hour video call with friends to re-up your energy from so many work calls, but you are too exhausted from work calls to get on another call for happy hour.

But fatigue is not the only consequence. As Zoom shifts the nature of the relationship between viewing and being viewed, it also shifts our awareness of it: It makes us more conscious of how visibility is mediated by technologies in general. That is, it calls our attention to what theorists describe as “the gaze,” which analyzes the power relations in looking and being seen and how these are consolidated in a particular way of seeing that may come to seem natural. Right now, our new conditions call attention to the different power dynamics that come into play as face-to-face interactions shift to online video spaces — what we might call the Zoom gaze (though, of course, it would apply to video telephony in general). It is critical to understand the Zoom gaze now, before it becomes so familiar that it seems immutable — just the way things are.

Film scholar Laura Mulvey theorized a “male gaze” that was structured and reproduced through cinematography, presuming a male hetero viewer and depicting women primarily as sexual objects rather than subjects. In an interview, Toni Morrison describes how she rejected centering the “white gaze” in her fiction: the presumption of a white audience and the white perspective as neutral. If Foucault used the idea of a “medical gaze” to describe how doctors objectify patients’ bodies to treat them, and the “panoptic” gaze to explore how carceral discipline is internalized, what might we say the Zoom gaze accomplishes? Whose perspective does it seek to naturalize? Whose subjectivity does it center, and in what sorts of forms? What does it condition us to see?

Zoom, like most video-conferencing systems, defaults to presenting you with an image of yourself staring back at yourself (assuming you grant it access to your device’s camera). This immediately confronts you with your own visibility: That is, you are forced to see yourself being seen. In a sense, the screen becomes a mirror, invoking earlier encounters with mirrors that (according to Lacanian theory) lay the foundation for you to subjectively recognize yourself as an object for others. In a Zoom call, however, this effect is magnified, because other people are not theoretical but right there, seeing the objectified you as well. This reflected self persists, accompanying us through our interactions unless we deliberately dismiss it. You watch yourself as you speak, as you move … oops, that piece of hair is out of place. You are self-aware and self-correcting in real time. Does my face look funny when I say “core competency”?

This foregrounded sense of our visibility can make us acutely conscious of matters of self-presentation, opening a gap between how we wish to be perceived and how we know ourselves to actually be. It can posit the idea of an “authentic” or “real” self that is showing a strategic or artificial self to others. This is one aspect of the Zoom gaze: By defaulting to and normalizing a kind of self-surveillance, the platform routinizes this kind of alienation.

But the objectification of the self doesn’t stop with the live image of you the camera captures. Being on camera turns the space you inhabit into a personal stage and everything that appears in it (including who you share space with) into props. The background you choose or the environment you are in inevitably communicates something about your identity; on Zoom these will likely be interpreted as deliberate choices. Even if you turn your camera off, your little square might become a profile picture — another choice.

At every turn, Zoom presumes that we wish to be persistent objects of perception and invites the idea that everything about our appearance can be customized and personally controlled. Its defaults create the impression that we are free to choose how we appear. We can even choose virtual backgrounds that widely expand what we might want to signal about our identity. But this technology is far from perfect. At times, virtual backgrounds in Zoom were erasing Black skin altogether. It is hard to be in control of how you’re perceived when the software renders your head invisible. But even when the tech works as expected, it can’t correct for how others see you. It can only expose you to endless interpretation. This is another aspect of the Zoom gaze: It imposes an illusion of individual control over conversational conditions that actually vary from person to person, and conceals some of the interpersonal dynamics and prejudices that may be in play.

Some of this plays out at the level of the interface. Although products like Zoom offer lots of choices about how to view ourselves and others — how we position the squares, how they are sized, who is full-screen and who is thumbnailed, who is pinned onscreen, who is spotlighted, whether someone is visible at all — this means that any participant has less control or awareness of how others are viewing them. There is no necessarily shared visual order to the conversation. On Zoom, the meeting settings alone consist of 68 different on/off switches, many of which, when activated, open additional options. Webinars and recording options further complicate matters. All these possibilities may be controlled by individual account holders, meaning that each time you enter a Zoom session, you are confronted by a new configuration of permissions, which may be based on how someone else assigns roles to participants.

Zoom already allows hosts to control whose faces get blown up to full-size or show up in the top left corner of everyone’s grid. In May 2020 the company removed the “unmute all” setting for hosts due to privacy concerns but now has brought it back as a nuanced “unmute with consent,” which allows a host to unmute an individual participant’s microphone at any time in any of the host’s meetings once given permission. But this framing of consent is problematic to say the least. Can you refuse if the host is your boss? What if they not only have authority over you but abusive intent? An upcoming feature promises to allow hosts to unilaterally establish an “immersive scene” for all participants — essentially, a shared cartoon environment. There seems to be no mention of consent or any ability to opt out, but its example use cases include classrooms and courtrooms, spaces where power dynamics are especially in play. All of this undercuts the control of the camera, microphone, and background you might otherwise believe you have.

The Zoom gaze institutionalizes such dynamics in ways that may be newly obscure or impactful. Think about the positioning around a conference table with the management always at the head: This power dynamic could be re-enacted and reinforced in an immersive scene, aided by a host selectively muting individual microphones and spotlighting cameras to enforce adherence to the agenda. Such features may obscure who is focused on the meeting and who can conceal their drifting attention. Reports accessible to hosts after meetings include “attention tracking,” which measures whether attendees clicked away from the main Zoom window for more than 30 seconds. (Hope you didn’t need to reference that email with last quarter’s numbers!) Add layers of artificial intelligence that could track eye movements and speaking times to create engagement scores, and it becomes clear how disciplinary the Zoom gaze can become.

Also, there is no way to know who is having a side chat in another program (with someone in the meeting, or even someone inside or outside the organization) or who could be recording the meeting with additional software or an external camera. Unlike with face-to-face encounters, there can be meetings within meetings within meetings. With so much unknown and so much personal control taken away, it is easy for meetings to feel uneasy and anxious. The Zoom gaze instantiates an intensified paranoia about how conversations are administered, who is paying attention, and who will control the documentation of discussions that can no longer be off the record.

The power dynamics of a conversation are complex. In video conferencing, the software itself can assign power relations that may or may not map onto existing social relations. The Zoom gaze ultimately comprises how the software’s programmers see users in the abstract, a perspective that can condition all the other possible perspectives within a video conference. Software envisions us through the programmers’ decisions about what to allow and restrict, and what the defaults are. It encodes who the company regards as the primary customers for its product by prioritizing certain ways of seeing and normalizing certain assumptions about how users should behave.

With Zoom, it seems clear that the technology is created for environments of hierarchical control. Those who created it decided to differentiate permissions between hosts, co-hosts, and participants. What if video conferencing tools worked more like a telephone in that everyone on the call has equal permissions? Big video conferencing platforms like Zoom always value and give the most power to those who established the meeting. The platform’s design seems to assume that this person is benevolent and has only the best of intentions, but there is no guarantee of that. The truth is that the host is simply the customer (or employee of the customer) who has purchased a tool to administer the control the software affords.

This plays out not only in who has permission to do what but in how the software normalizes particular postures of looking. Because we are typically looking at eyes on the screen instead of the camera, eye contact can be askew, which may send an unintentional message that we are inattentive, bored, and not engaging. Apple now offers a feature that autocorrects this physical reality of your gaze with augmented reality, imposing a norm of (simulated) eye contact. In a twisted bit of doublespeak, the language describing this setting in the interface claims it will “establish natural eye contact while on FaceTime,” even though this eye contact is not natural at all.

But the gaze imposed by software is also a matter of the risks that engineers have overlooked. As the pandemic intensified and more people started videoconferencing, “Zoombombing” incidents rose. This form of trolling was often emboldened by default settings that allowed anyone to enter any room without a password or admittance from a host. “Join” links could be passed around on social media and discussion boards dedicated to Zoombombing, allowing for coordinated attacks. Bombers could even try to guess links randomly by trying different combinations of letters and numbers. Other default settings that allowed anyone in the call to screenshare enabled them to take over the visuals of a meeting, letting them effectively seize the desktop space of everyone by playing loud music or yelling into the microphone.

When these settings were called out as problematic, Zoom’s CEO Eric Yuan apologized and promised to make changes. But in the same breath he also pointed out that the product was being used in ways that the company hadn’t imagined, as if this were an excuse. The short-sightedness was a choice: Zoom anticipated only certain use cases and built the product for certain users — “large institutions with full IT support.” With some threat modeling and even a mild consideration of marginalized perspectives, some of the most problematic cases — which included incidents of racism, sexism, anti-Semitism, and homophobia — could have been avoided.

The various permutations of settings across different video platforms are virtually innumerable. When you enter a meeting will your camera and microphone automatically be on without notice? What if you mostly just want to listen and are in your PJs? Will you be able to text-chat during the meeting? Will it be recorded? It’s impossible to know in advance, and there are no established cultural norms that push meeting hosts to communicate such nuances beforehand. Instead, the Zoom gaze currently institutionalizes uncertainty as the norm.

Power dynamics shift over time as platforms are updated and the companies’ view of us changes. In a recent blog post, Zoom revealed new features, including a video waiting room that could introduce more asymmetries between the watchers and the watched. The company is developing a marketplace that may make money a more direct factor in who can afford which sorts of privileges within meetings. And artificial intelligence tools may soon be able to scrape details of recorded meetings to make “highlight reels” that can recontextualize performances in unanticipated ways and replicate existing biases — as machine learning techniques based on historical data inevitably do, and as the many stories about AI replicating gender and racial biases show.

Although abuse and fatigue are facets, the Zoom gaze is broader than that. Yes, it is that light that we see in grandma’s eyes when she sees the grandchildren who can’t come visit and all the happiness that the commercials and advertising for teleconferencing promise us. But it is also the embarrassment, shame, and perhaps loss of employment that come from doing something inappropriate when you thought the camera was off. It is the student sobbing after taking a video-proctored exam where the built-in artificial intelligence falsely flagged them for cheating. It is the judicial system that becomes more corrupt and less just due to remote video trials leaving out court watchers. It is the erosion of freedoms that comes when teleconferencing corporations’ policies are used to make decisions about who gets to have a meeting and who does not.

Even though the Zoom gaze existed pre-pandemic, its effects are now amplified, thanks not only to the increased volume of video calls but also the diversity of situations in which they have been adopted. As the pandemic pushes us to use these technologies for what we can’t do in person, let’s not forget what we are giving up to do so. Thinking about the gaze — who is watching and how we are watched; who controls the watching environment and how power dynamics are systematized — allows us to look beyond how companies would like us to see their products. Zoom would like to habituate us to these new power alignments until we regard them as normal and natural, but we do not have to accept this uncritically. We should question these alignments and resist such habituation now, so that we may more thoughtfully shape what we want togetherness to look like when the social is no longer distant.

Autumm Caines is a liminal space. She works as an instructional designer at the University of Michigan – Dearborn, spells her first name with two M’s, and you can find out more about her on autumm.org.