Non-Verbal Communication and the Advent of Artificial Intelligence


We Know How You Feel


Raffi Khatchadourian

The New Yorker,

January 19, 2015 Issue





By scanning your face, computers can decode your unspoken reaction to a movie, a political debate, even a video call with a friend. Illustration by Bryan Christie


Affectiva is the most visible among a host of competing boutique startups: Emotient, Realeyes, Sension. After Kaliouby and I sat down, she told me, “I think that, ten years down the line, we won’t remember what it was like when we couldn’t just frown at our device, and our device would say, ‘Oh, you didn’t like that, did you?’ ” She took out an iPad containing a version of Affdex, her company’s signature software, which was simplified to track just four emotional “classifiers”: happy, confused, surprised, and disgusted. The software scans for a face; if there are multiple faces, it isolates each one. It then identifies the face’s main regions—mouth, nose, eyes, eyebrows—and it ascribes points to each, rendering the features in simple geometries. When I looked at myself in the live feed on her iPad, my face was covered in green dots. “We call them deformable and non-deformable points,” she said. “Your lip corners will move all over the place—you can smile, you can smirk—so these points are not very helpful in stabilizing the face. Whereas these points, like this at the tip of your nose, don’t go anywhere.” Serving as anchors, the non-deformable points help judge how far other points move.
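The anchoring idea Kaliouby describes can be sketched in a few lines of code. This is purely illustrative, not Affdex's implementation: the landmark names and coordinates are invented, and the point is simply that measuring a deformable point's motion relative to stable anchors makes the measurement independent of where the face sits in the frame.

```python
import math

def distance(a, b):
    """Euclidean distance between two (x, y) landmark positions."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def normalized_lip_motion(frame_prev, frame_curr):
    """Displacement of a lip corner, scaled by a stable anchor distance.

    Each frame is a dict mapping hypothetical landmark names to (x, y)
    pixel positions. Dividing by the inter-eye distance (two
    non-deformable anchors) makes the measure invariant to how close
    the face is to the camera.
    """
    scale = distance(frame_curr["eye_left"], frame_curr["eye_right"])
    moved = distance(frame_prev["lip_corner_left"],
                     frame_curr["lip_corner_left"])
    return moved / scale

# Two consecutive frames: the eyes stay put; the lip corner drifts
# up and out as a smile begins.
prev = {"eye_left": (100, 80), "eye_right": (160, 80),
        "lip_corner_left": (105, 140)}
curr = {"eye_left": (100, 80), "eye_right": (160, 80),
        "lip_corner_left": (101, 136)}
print(round(normalized_lip_motion(prev, curr), 3))  # → 0.094
```

A real tracker would use dozens of points and a learned face model, but the normalization trick is the same.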

Affdex also scans for the shifting texture of skin—the distribution of wrinkles around an eye, or the furrow of a brow—and combines that information with the deformable points to build detailed models of the face as it reacts. The algorithm identifies an emotional expression by comparing it with countless others that it has previously analyzed. “If you smile, for example, it recognizes that you are smiling in real time,” Kaliouby told me. I smiled, and a green bar at the bottom of the screen shot up, indicating the program’s increasing confidence that it had identified the correct expression. “Try looking confused,” she said, and I did. The bar for confusion spiked. “There you go,” she said.
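The rising green bar in the demo can be thought of as a classifier's confidence updating frame by frame. The toy scorer below is an assumption-laden sketch: the feature names and weights are invented, and real systems learn such weights from thousands of labeled faces rather than hard-coding them.

```python
def smile_confidence(lip_corner_raise, cheek_raise):
    """Toy confidence score for the 'happy' classifier, in [0, 1].

    Inputs are hypothetical normalized feature magnitudes; the linear
    weights are illustrative, not learned.
    """
    score = 3.0 * lip_corner_raise + 2.0 * cheek_raise
    return min(1.0, max(0.0, score))

# As a smile grows over successive frames, the "green bar" climbs.
for lip, cheek in [(0.0, 0.0), (0.1, 0.05), (0.25, 0.2)]:
    print(round(smile_confidence(lip, cheek), 2))
# → 0.0, then 0.4, then 1.0
```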

Like every company in this field, Affectiva relies on the work of Paul Ekman, a research psychologist who, beginning in the sixties, built a convincing body of evidence that there are at least six universal human emotions, expressed by everyone’s face identically, regardless of gender, age, or cultural upbringing. Ekman worked to decode these expressions, breaking them down into combinations of forty-six individual movements, called “action units.” From this work, he compiled the Facial Action Coding System, or facs—a five-hundred-page taxonomy of facial movements. It has been in use for decades by academics and professionals, from computer animators to police officers interested in the subtleties of deception.
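The structure of FACS lends itself to a simple illustration: an observed expression is a set of numbered action units, and an emotion label corresponds to a known combination. The mappings below follow commonly cited FACS examples (happiness as AU6, "cheek raiser," plus AU12, "lip corner puller"); treat them as a tiny illustrative subset, not Ekman's full five-hundred-page taxonomy.

```python
# Illustrative subset of FACS emotion codings: each key is a set of
# action-unit numbers, each value the emotion that combination signals.
EMOTION_CODES = {
    frozenset({6, 12}): "happiness",        # cheek raiser + lip corner puller
    frozenset({1, 2, 5, 26}): "surprise",   # raised brows, lids, dropped jaw
    frozenset({9, 15}): "disgust",          # nose wrinkler + corner depressor
}

def label_expression(active_units):
    """Map a set of observed action units to an emotion label, if known."""
    return EMOTION_CODES.get(frozenset(active_units), "unlabeled")

print(label_expression({6, 12}))  # → happiness
```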

Ekman has had critics, among them social scientists who argue that context plays a far greater role in reading emotions than his theory allows. But context-blind computers appear to support his conclusions. By scanning facial action units, computers can now outperform most people in distinguishing social smiles from those triggered by spontaneous joy, and in differentiating between faked pain and genuine pain. They can determine if a patient is depressed. Operating with unflagging attention, they can register expressions so fleeting that they are unknown even to the person making them. Marian Bartlett, a researcher at the University of California, San Diego, and the lead scientist at Emotient, once ran footage of her family watching TV through her software. During a moment of slapstick violence, her daughter, for a single frame, exhibited ferocious anger, which faded into surprise, then laughter. Her daughter was unaware of the moment of displeasure—but the computer had noticed. Recently, in a peer-reviewed study, Bartlett’s colleagues demonstrated that computers scanning for “micro-expressions” could predict when people would turn down a financial offer: a flash of disgust indicated that the offer was considered unfair, and a flash of anger prefigured the rejection.
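Why a frame-by-frame scanner catches what people miss can be shown with a short sketch. Here a "micro-expression" is simply any non-neutral label that persists for only a frame or two; the labels, sequence, and threshold are invented for illustration, not drawn from Bartlett's study.

```python
def find_micro_expressions(frame_labels, max_frames=2):
    """Return non-neutral labels that last no more than max_frames frames."""
    runs, start = [], 0
    for i in range(1, len(frame_labels) + 1):
        # Close out a run when the label changes or the sequence ends.
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start]:
            runs.append((frame_labels[start], i - start))
            start = i
    return [label for label, length in runs
            if label != "neutral" and length <= max_frames]

# A single frame of anger -- one thirtieth of a second at 30 fps,
# invisible to a human viewer -- stands out to the scanner.
frames = ["neutral"] * 5 + ["anger"] + ["surprise"] * 3 + ["laughter"] * 6
print(find_micro_expressions(frames))  # → ['anger']
```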


Kaliouby often emphasizes that this technology can read only facial expressions, not minds, but Affdex is marketed as a tool that can make reliable inferences about people’s emotions—a tap into the unconscious. The potential applications are vast. CBS uses the software at its Las Vegas laboratory, Television City, where it tests new shows. During the 2012 Presidential elections, Kaliouby’s team used Affdex to track more than two hundred people watching clips of the Obama-Romney debates, and concluded that the software was able to predict voting preference with seventy-three-per-cent accuracy. Affectiva is working with a Skype competitor, Oovoo, to integrate it into video calls. “People are doing more and more videoconferencing, but all this data is not captured in an analytic way,” she told me. Capturing analytics, it turns out, means using the software—say, during a business negotiation—to determine what the person on the other end of the call is not telling you. “The technology will say, ‘O.K., Mr. Whatever is showing signs of engagement—or he just smirked, and that means he was not persuaded.’ ”