Speech Bubbles — Real Life

The many evils that the internet is said to have unleashed on the world are often mourned under the umbrella grievance of the “lost art of conversation.” It is a widely held belief that the face-to-face forms of communication that prevailed before the digital age were “purer” and more productive than conversations that involve any kind of digital intermediary. This sense of loss is vague but powerful. It encompasses anxieties about the decline of civic life and atomization of community, the proliferation of hate speech, the polarization of society, the monopolization of media, the erosion of privacy, the algorithmic hand, and the perceived shortening of attention spans and of imaginative faculties. In many accounts, the decline of in-person conversation is seen not only as symptomatic of these changes, but as a key agent of them.

The ideal of public conversation operates on the premise that speech — unlike the image, or the written word — is automatically generative: it is assumed that the free-flow vocal exchange of ideas will always lead to “change” for the better, to vaguely positive values such as “unity,” “innovation” and “creativity.” Synchronous forms of communication are seen as more productive and more authentic than asynchronous forms, as though the real self can only ever operate in real time. Amid the oversaturation of text and images, the voice becomes a hopeful vessel for healthy discourse; a kind of communication that precedes the distorting influences of the internet. If on a private level, the voice is perceived as the seat of the soul, then on a public level, the voice is the perceived historic engine of democracy and of human “progress” before communicative signals were scrambled by technological influence, its imagined pinnacle being the oratory of Ancient Rome.

Since at least 2013, Silicon Valley has been lustily trying to conjure a platform that would harness the supposed purity of the voice and ideals of public conversational fora. But where technologies that exploit more intimate functions of audio tend to succeed (podcasts, voice-notes, ASMR), public — or semipublic — audio-only social media platforms seem to continually fail. In 2020, a new app called Clubhouse — which advertised itself as a space “where people around the world come together to talk, listen and learn from each other in real time” — enjoyed a brief moment of popularity before mainstream interest waned. In the months after Clubhouse was launched, Twitter announced a new function called Spaces, a “new way to have live audio conversations on Twitter” (and a self-described “experiment focused on the intimacy of the human voice”). Just over a year on, Twitter Spaces, too, seems to have stalled.

Social audio is the next iteration of tech’s ambition to replace the “town square” of old. In general, the ideals of “communication” that it evokes are as vague and unconvincing as old invocations of “connectedness.” Nevertheless, the nostalgia for public life that social audio is tapping into is a powerful force. While trust in the image and in text as sources of authenticity or truth has been degraded, the voice has managed to retain a privileged position as a medium associated with a less mediated version of the self, and therefore more proximate to truth. This, paired with the voice’s strong affective power, means that it may be a grave mistake to dismiss the strange space germinating between audio and social media.

The beta version of Clubhouse came onto the market in March 2020, as the pandemic was officially declared. Initially, the app was only available to friends of those who were already on it: each new user was given two invite codes to share by default, except for those whom Clubhouse wanted to promote, who were given more. The userbase bloomed outwards from Silicon Valley: in a profile of the app from May 2020, Erin Griffith and Taylor Lorenz described it as “where venture capitalists have gathered to mingle with one another while they are quarantined in their homes.” Like Raya — the infamously exclusive dating app — Clubhouse took on a VIP air. Drake, Oprah, Jared Leto and Ashton Kutcher showed up. At one point, invite codes were selling for hundreds of dollars online. During the pandemic, footage of celebrities at home became too familiar, but a voice suggested a less constructed insight. They could be calling in from anywhere: the supermarket, the bathtub, the toilet.

Initially, the conversations on Clubhouse were carefully curated. To host a conversation, you had to apply via a special form to become an admin of a “club” within Clubhouse. From there, you could start a room, and invite others to speak. Attendance in a room would vary widely depending on the profile of those speaking, ranging from single figures to an initial upper limit of 5,000 attendees. As an audience member, you could ask questions in the chat, and if your question was smart or provocative enough, you might be given the floor by the hosts (celebrities lurking in the audience might be “spotted” and invited up “onstage”). In July 2021, Clubhouse was opened to the public. From then on, anyone could start a room, and decide whether it was visible publicly, only to followers, or only accessible through a private link; though it remained the case that within rooms, only room admins were able to decide who speaks.

Two years after its launch, the app has turned into a weird, horny graveyard. Rooms that aren’t saturated with audio porn are largely dominated by “MLM Schemes and crypto enthusiasts” (as the Ranveer Show summarized), with the odd room devoted to spirituality or a shared hobby. In some ways, this is a natural endpoint of associations between intimacy and the voice (Pornhub, understanding that phone sex will never die, started advertising “moan rooms”) — but in other ways, it’s a puzzle of conflicting ideas about what exactly the role of spoken conversation should be in relation to technology today. As recently as February, Elon Musk reportedly appeared in the app, which relaxed its attendee limit for the occasion. This hasn’t done much to revive interest in Clubhouse, but an article in TechCrunch noted that Musk “at times managed to sound far more nuanced than his meme-propelled, trolling Twitter feed,” suggesting a purposeful attempt on Musk’s part to reiterate the association between the voice and more “wholesome” forms of discourse.

At its zenith, the overarching tone of a Clubhouse room sat halfway between the formal and the informal, the public and the private. Clubhouse actively positions itself as the peripheral informal space at the edge of the formal event: the drinks party at the end of the TEDx convention, the Q&A at the end of the panel discussion, the show afterparty. Previously, its creators had worked on an app that would show you the bio of anyone in your vicinity; a kind of augmented reality version of LinkedIn in which the potential value of an interaction is measurable according to the social capital of each individual, and the likelihood of the interaction to generate more ideas, more capital. Likewise, the presence of “big names” on Clubhouse wasn’t positioned so much as an end in itself as a means to an end: more “valuable” participants will lead to more valuable conversations.

All audio media appeal to the promise of intimacy in some form, whether it is the sentimental intimacy of a voice note, the parasocial intimacy of a podcast, or the sexually-charged intimacy of personal assistants like Siri. Clubhouse offers the kind of intimacy that is tied to the feeling of being inside the club. Unlike radio, or podcasts, it offers the chance to have one’s name glimpsed by the speaker as they scroll through the list of room attendees (even if you are never deemed important enough to be “handed the mic”). On an affective level, though, not much feels intimate about Clubhouse. This is because the “intimacy” of the voice isn’t an innate quality, but rather an impression that arises from the interplay of presence and absence. In a way, the more absences or unknowns there are (what are they doing? Where are they calling or listening from? Who are they?), the more intimate the voice is made to feel. Clubhouse is not built for anonymity: there are no handles, only full names, and every room you join is made public. Being seen to be part of the Clubhouse room is an essential part of the experience. In contrast to the experience of radio, which is essentially an experience of unconditional belonging, the experience of social audio feels intensely atomized, each participant individualized and differentiated by a series of public parameters that ultimately are translatable into capital.

The rise of Clubhouse and other forms of social audio has to be understood within the history of the internet talk, a niche on which TED has had an immeasurable influence.

The fact that the rise of social audio coincides with the decline of the TED Talk hints at the former’s attempt to capitalize on the latter’s success: Last year, it was announced that Clubhouse was going to partner with TED to make TED Talks available to Clubhouse users via the app. TED and Clubhouse share more than just an interest in the voice: they are both highly popular with the Silicon Valley entrepreneurs who determined their format, and they are both ostensibly driven by the vague faith that ideas plus (the right kinds of) people equals change. (“When ideas and people come together to engage and debate,” said a TED representative in a press statement on the Clubhouse X TED partnership, “that’s when the real impact happens.”)

In an article for The Drift on the legacy and demise of the TED Talk, Oscar Schwartz draws attention to the rhetorical style of the medium, which he calls the “inspiresting” — an “aesthetic of populist elitism” which is “smart but not quite intellectual, personal but not sincere, jokey but not funny” (other examples Schwartz gives of the inspiresting include Burning Man, Malcolm Gladwell, the blog Brain Pickings, Alain de Botton, Oliver Sacks, and This American Life). The TED Talk often involves the formulaic pairing of a serious global issue with an unlikely (often technological) solution: it performs a deliberately vague progressivism which envisages change as happening “without any serious transfers of power.” The function of conversation in this context — and of the voice more generally — is to “manifest a new world,” but the “better future” that TED envisages is so nebulous that it rarely materializes.

The promise of TED was aligned with the promise of early forms of social media. As with Zuckerberg’s vision of Facebook facilitating a “global community,” TED sought to bring together the “greatest minds” to facilitate the global proliferation of the “brightest ideas.” The waning of the TED Talk was thus inevitably bound to the waning public faith that “connectedness,” in the context of social technologies, had any significant positive outcome. But if the techno-optimism of TED is out of fashion, its emphasis on the voice as a site for engineering meaningful change is alive and well.

The TED Talk achieved cultural ubiquity in the mid-2000s, around the same time that the “podcast boom” was beginning. The use of video was a crucial part of the TED Talk’s efforts to establish its legitimacy, and to stand out amid a wave of new audio content. Its set conjures a non-specific atmosphere of prosperity: high-contrast lighting; the red circular carpet and three signature letters adding up to a literal and figurative stage-of-the-world. Whereas the podcast’s audio-only format meant that — as Suzannah Showler writes — its content would come to be mapped onto the listener’s current surroundings, TED tied itself to a visualization of the metaphorical “public forum,” with the present’s greatest minds taking their turn to be heard. The TED Talk relied heavily on an atmosphere of formal conviviality and polite liveliness: the speaker pacing around dynamically onstage, the clip-on mic freeing up their hands to gesticulate and wield a slide clicker; and, of course, the shots of the audience — beaming, weeping, laughing, clapping. It turned the experience of listening into a communal event, evoking the feeling of participation in an audience even if you were watching the talk asynchronously from the privacy of your own screen.

Clubhouse reproduces the atmosphere of TED through a linguistic architecture. Each talk happens within a “room,” speakers are invited to take the “stage,” and to exit a room, you click a button that says “leave quietly,” as though the app were simulating the experience of brushing past knees in an auditorium. The chat that operates alongside the talk is called the “hallway,” a term that is also used to represent the app as a whole: “Bounce around the hallways of the internet and meet incredible people,” reads Clubhouse’s slogan. The app taps into widespread fears about the loss of public life, while imagining the “public” as something more resembling an exclusive conference than any actual town square. It also hints at the erosion of a boundary between the speaking class and the listening class, promising to open up the prestige of the TED stage to anyone with an idea to share. While few people still believe that written and photographic forms of social media function as forces for democratization, technologies of the voice somehow remain shrouded in the optimism that characterized the early internet; perhaps because the voice is rarely seen as part of the internet at all.

In many ways, social audio positions itself as more adjacent to radio than other forms of social media. Writing of attempts to reinvigorate radio in the ’70s and ’80s, Susan Douglas charts how NPR and Talk Radio spoke to contradictory currents that nonetheless shared “a profound sense of public exclusion from and increasing distrust with the mainstream media in general and TV news in particular.” Traditionally, broadcast radio was seen to extend public life — and the experience of belonging — into private spheres (the home, the car) rather than seeking to replace it. But with the gradual loss of public infrastructure and the atomization of communities, radio programs became “electronic surrogates for the town common […] where people imagined their grandparents — even their parents, for that matter — might have gathered with others to chat, however briefly, about the state of the town, the country, the world.” Talk radio and NPR both attempted to correct the “intermittent, often distracted listening” that characterized radio in the ’80s by creating more engaging and participatory formats, and repositioning listening as an active rather than a passive exercise. But whereas NPR sought to reactivate notions of citizenship and participatory democracy, talk radio developed a financial dependence on a style of macho populism, harnessing the affective dimensions of radio to deliberately rile up listeners: a model that Joe Rogan has since carried into the realm of podcasting.

In March, in the lead-up to his bid to take over Twitter, Elon Musk tweeted a poll asking whether his followers felt Twitter’s moderation policies should be looser, considering it serves as “the de facto public town square.” Given Musk’s dreams for a social media app free of content moderation, it is hard to believe that Twitter Spaces — which has already become a haven for the far-right — would not be a factor in his desire for the platform. Musk has been toying with social audio for a while: his appearance on Clubhouse in February came a year after he invited Vladimir Putin to “chat” on the social audio app. In mid-April, to explain his Twitter bid, Musk spoke to TED curator Chris Anderson in a live TED event (a loaded choice, given that TED and Clubhouse are in partnership and competing with Twitter Spaces). He invoked vague ideals of free-flowing conversation: “if in doubt,” he said, “let speech exist.”

The idea of audio is an incredibly attractive one to tech entrepreneurs because it still carries the charge of nostalgia for radio’s actual public function. It projects an image of public inclusion, while at the same time facilitating a form of interaction that aligns with the “free speech movement,” both idealistically and pragmatically. Because it unfolds in real time, social audio is notoriously hard to moderate. As one article notes, in its beta phase, Clubhouse did not seem to be interested in trialing any means for users to report harassment. A few months after Clubhouse launched, a session was leaked in which several venture capitalists and entrepreneurs — many involved with the founding of the app — gathered to lament the power that journalists had to “cancel people” (the conversation revolved largely around a Twitter conflict between New York Times reporter Taylor Lorenz and entrepreneur Balaji Srinivasan). A write-up in Vice reported that the participants in the call “seemed to conceive of themselves as humble citizens preyed upon by corrupted elites cravenly lusting after money and power,” an especially baffling sentiment considering that the app at the time was still invite-only, and largely seen as a private playground for the rich and famous.

Douglas highlights how the participatory ethos of talk radio — its suggestion that it would counter the tendency towards centralization of power — was undercut by the reality that its expansion was the result of government deregulation and increasing corporate control. Debates around the public function of talk radio in the ’70s and ’80s drew attention to the very real threat it posed to broadcast media, but were also charged with vague concerns around “a decline of ‘civility’ and the collapse of ‘civil discourse’” — concerns which were ultimately about what it meant to rebuild a public sphere, “and about just whose public sphere it’s going to be — the educated bourgeoisie’s or the rabble rousers’.”

The controversial Clubhouse session taps into dichotomies between new (audio) and old (written) media, positioning social audio as a zone of unadulterated and democratic truth in opposition to the “gagged” state of mainstream media, and Twitter in particular. The supposed “purity” of the voice — its unmediated, unconstructed quality — works to support the narrative that social audio holds the key to a regenerated civil life, and that the values of civil life are in line with the values of “free speech.”

Up until now, audio has largely been seen as a more stripped-back form of technological engagement — pure consumption, without the bells and whistles of screen-interactivity.

Clubhouse and other forms of social audio attempt to harness lingering public trust in audio as a traditionally “pure” and uncorrupted zone of human interrelation, but they also reapply the demands from which the relatively “passive” experience of audio has traditionally provided an escape. This is probably why the social functions of Spotify, too, are relatively underused — in some ways listening has recently provided a refuge from a culture that requires people to be consistently public, which means no one really wants their friends to be tracking their listening habits.

This is one possible reason why emerging forms of live audio media might have failed to garner mainstream interest so far; another, perhaps more obvious reason is that “social audio” doesn’t yet really know what it wants to market itself as. At times it positions itself as a zone for the polite sharing of information, at other times a site for lively (but “healthy”) debate, and at other times an instrument for self-declared “free speech absolutists” like Musk. Social audio attempts to tap into a sense of loss of civic life and community, but its entanglement with the forces that have led to the decimation of public life in the first place demonstrates that this is really nothing but a marketing strategy.

The promise of social audio taps into a seething web of nostalgia, alienation and grief. It harnesses desire for some form of public life beyond atomized social circles and the nuclear family unit, panic about the “state of democracy” and about human interrelationships. The fact that Clubhouse did not succeed in garnering widespread support does not indicate anything inherently incompatible between social formats and the voice. The voice’s contrasting associations — between our most intimate moments of relation, and the public-facing self — are perhaps in momentary conflict, but they both operate on the notion that there is something “pure” and “untainted” about spoken forms of communication, whether they express some sort of authentic self, or beam down some panacea for humanity. We must be careful about the ways we invoke nostalgia for spoken communication, because — as with nearly anything else that is widely understood as being a matter of “purity” — the supposed power of the voice can easily be coopted for conservative ends. And as long as every other form of public infrastructure is continually undermined, the dream of the “electronic agora” remains an empty promise.