“This call may be recorded for quality assurance.” Surely you’ve encountered this signature phrase of the call-center industry, possibly after a long wait on hold trying to get an answer to why your internet won’t work or why your medical bills weren’t covered. The disclaimer is a kind of sonic terms-of-service agreement — uninteresting, unimportant, and compulsively accepted. The recording represents a promise from the company to the customer: You deserve the best service possible, and we are listening to make sure that you get it.
But it is neither an extension of customer service nor a meaningless technicality. Quality-assurance systems are a form of surveillance, and new technologies for processing all the information they collect are expanding their reach and implications. Algorithmic vocal-tone analysis uses machine learning to purportedly identify and quantify the affect of call-center agents and customers alike and turn it into a profitable data set for companies. These systems include Cogito, which, as Wired reported in 2018, claims to measure and interpret data about speakers’ energy, empathy, participation, tone, and pace in real time. It displays directive icons to the agent — coffee cups when the data says they sound sluggish; hearts when their customer sounds upset — and sends alerts to their supervisors. Another tonal-analysis system, CallMiner, sends three to five notifications a minute to an agent on a typical call, ranging from, as Wired noted, “messages of congratulation and cute animal photos when software suggests a customer is satisfied” to “a suggestion to ‘calm down’ and a list of soothing talking points” when caller frustration is detected. The aim is to identify the feeling being generated in real time between customers and agents and then harness it, shifting it into patterns of interaction that assure operational efficiency and profit maximization. This then can be passed off as the essence of “good customer service.”
Customers may become unwilling snitches
Representatives from both companies claim that their systems allow agents to be more empathetic, but in practice, they offer emotional surveillance suitable for disciplining workers and manipulating customers. The data they collect can be correlated with customer retention rates and used to identify underperforming workers. Agents’ awkward pauses, over-talking, and unnatural pacing can and will be used against them.
In analyzing the aural tone of recorded phone interactions, call-center QA systems appear to merely pick up from where earlier systems left off. IBM’s Watson Tone Analyzer, for instance, promises to detect “joy, fear, sadness, anger, analytical, confident and tentative tones” in transcribed text of customer service calls to assess customers’ experience. But vocal-tone-analysis systems claim to move beyond words to the emotional quality of the customer-agent interaction itself.
It has long been a management goal to rationalize and exploit this relationship, and extract a kind of affective surplus. In The Managed Heart, sociologist Arlie Hochschild described this quest as “the commercialization of human feeling,” documenting how “emotional labor” is demanded from employees and the toll that demand takes on them. Sometimes also referred to as the “smile economy,” this quintessentially American demand for unwavering friendliness, deference, and self-imposed tone regulation from workers is rooted in an almost religious conviction that positive experiences are a customer’s inalienable right. But as Hochschild and others have shown, the positivity mandate correlates with negative health outcomes for workers; for instance, it may contribute to increased alcohol consumption. Not coincidentally, call centers have notoriously high turnover rates, and employee testimonials describe frequent sudden terminations.
Systems like Cogito and CallMiner describe themselves as “assistive technologies” that provide real-time information to make such emotional labor more efficient. In practice, they extend its domain while deskilling it and making it more inescapable. Vocal-tone analysis holds employees accountable for aspects of customer service that may be beyond their direct conscious control. Because they can’t be sure how the monitoring system will interpret their behavior in advance, employees are reduced to reshaping it on the fly in response to the algorithm’s feedback: No promotion until your positivity quota is met! Didn’t you see all those coffee-cup and frowny-face emojis?
The premise of using affect as a job-performance metric would be problematic enough if the process were accurate. But the machinic systems that claim to objectively analyze emotion rely on data sets rife with systemic prejudice, which affects search engine results, law enforcement profiling, and hiring, among many other areas. For vocal-tone-analysis systems, the biased data set is customers’ voices themselves. How pleasant or desirable a particular voice is found to be is influenced by listener prejudices; call-center agents perceived as nonwhite, women or feminine, queer or trans, or “non-American” are at an entrenched disadvantage, which the datafication process will only serve to reproduce while lending it a pretense of objectivity. The entire process turns customers’ racial, sexist, xenophobic, and ableist prejudices into profitable data, used to justify further marginalization and economic precaritization of groups already most likely to experience them. This closed feedback loop will strip away relevant context and mask discrimination, further reifying the idea that the “customer is always right,” no matter how racist they are.
How we want to feel and how we think we feel are both irrelevant to the goals and operations of algorithmic vocal-tone analysis. Our emotional state during a recorded interaction with a call-center agent only matters insofar as it can be translated into data that can be used to predict future behavior. Whatever I say about why I, for instance, canceled my internet after calling customer service has little bearing on how my tone will be datafied, processed, and affixed to the call-center agent with whom I interacted. It won’t matter whether I have a vocal disability that renders my voice incongruent with the tones and tenors typically processed by algorithmic voice analysis; nor will it matter if I am having an extremely bad day that has nothing to do with the call center, or if my phone picks up my child screaming in the background and how agitated that makes me. The quality-assurance companies discussed here do not publicly provide figures about the efficacy of their systems, nor is it clear how their accuracy could even be established, given the tautological ways they define emotion. Accuracy is irrelevant to these systems’ potential usefulness — that is, their potential profitability.
How we want to feel and how we think we feel are both irrelevant to the goals of algorithmic vocal-tone analysis
The irrelevance of “true” customer feelings to algorithmic quality-assurance systems is part of a broader trend toward using data to objectify consumers and sideline their professed opinions. As John Cheney-Lippold details in We Are Data: Algorithms and the Making of Our Digital Selves, marketing has trended toward ever more fine-grained individual consumer profiles incorporating not just traditional information such as age, gender, race, page views, and purchases but also the supposed emotional states underlying their behavior, gleaned from various forms of external surveillance rather than self-reports. Motivating this trend is a faith in behaviorism over persuasion: that customers can more easily be manipulated on the basis of nonconscious “tells” than they can be consciously convinced, particularly if they are enclosed in a feedback loop that uses these nonconscious signals to shape what a customer perceives.
In addition to vocal-tone analysis, services like Amazon’s Rekognition face-detection service purport to automatically detect (that is, ascribe) eight distinct emotions on human faces. Axis Communications’ retail surveillance service offers store-wide camera feeds of retail locations that provide “heat maps of customer behavior, giving insight into their movement patterns and showing their path to purchase.” A theoretical marriage of Rekognition with Axis would allow the production of the same sort of “affect scores” for employees that vocal-tone analysis currently promises.
Biometric data from FitBits, phones, and other devices could be used to measure the affective effect we have on each other in a wide range of retail and office-related scenarios. Uber drivers who raise the heart rate of everyone who comes into their car — which might be interpreted as proof of poor driving practices or a confrontational interpersonal style — could theoretically have their rating dropped. Service workers may find their emotional labor explicitly quantified, and customers may become unwilling snitches. Our bodies, through their supposed physiological disclosure of our internal affective states, become profitable sites of data creation for systems whose aims we might fully reject.
In such a world of ubiquitous affect surveillance, it may seem that a consumer who wanted to resist would need to be hyperaware of themselves, affecting not just a cheery voice but a friendly face and posture to avoid bringing down punishment on an unsuspecting employee. But in reality, how any piece of customer behavior will be interpreted is entirely opaque; since iterative machine learning develops inexplicable rationales for its decisions, it is impossible to know which sorts of behavior will jam the system.
It may be too strong a claim, however, to say that we are inevitably conscripted into the racist, sexist, ableist, and xenophobic systems of algorithmic analysis — this runs the risk of excusing people’s prejudices on the grounds that they are already anticipated and encoded. It would create an overly fatalistic view of our inability to resist both prejudicial behavior and its datafication. Nonetheless, vocal-tone analysis highlights what it means to exist in an increasingly datafied, algorithmic society. We are becoming, in our very materiality, sources of profitable data that can be weaponized against one another, particularly against the most economically precarious among us. Our best hope at resistance may be to starve the machine not of data but of grievance — a complicated task that would require a widespread divestment from the entitlements of the smile economy.