MIT Technology Review
Sponsored by Owl Labs 
The Algorithm
Artificial intelligence, demystified
Cause and effect
Hello Algorithm readers,

This week, the AI research community has gathered in New Orleans for the International Conference on Learning Representations (ICLR, pronounced “eye-clear”), one of its major annual conferences. With more than 3,000 attendees and 1,500 paper submissions, it is one of the most important forums for exchanging new ideas within the field.

This year the talks and accepted papers are heavily focused on tackling four major challenges in deep learning: fairness, robustness, generalizability, and causality. If you’ve been following along with The Algorithm, you’ll likely recognize the first three. We’ve talked about how machine-learning algorithms in their current state are biased, susceptible to adversarial attacks, and incredibly limited in their ability to generalize the patterns they find in a training dataset for multiple applications. Now the research community is busy trying to mitigate these weaknesses to evolve the technology into a more sophisticated form.

What we haven’t talked about much is the final challenge: causality, a gold standard that researchers have puzzled over for some time. Machine learning is great at finding correlations in data, but can it ever figure out causation? Such an achievement would be a huge milestone: if algorithms could help us shed light on the causes and effects of different phenomena in complex systems, it would deepen our understanding of the world and unlock more powerful tools to influence it.

Yesterday, to a packed room, acclaimed researcher Léon Bottou, now at Facebook’s AI research unit and New York University, laid out a new framework for how we might get there. I’m dedicating today’s issue to summarizing his talk. You can also watch it in full here, beginning around 12:00.


Caption: sample images from the MNIST dataset

Let’s begin with Bottou’s first big idea: a new way of thinking about causality. Say you want to build a computer vision system that recognizes handwritten numbers. (This is a classic introductory problem that uses the widely available “MNIST” dataset pictured above.) You’d train a neural network on tons of images of handwritten numbers, each labeled with the number it represents, and end up with a pretty decent system for recognizing new ones it had never seen before.

But let’s say your training dataset is slightly modified and each of the handwritten numbers also has a color, red or green, associated with it. Suspend your disbelief for a moment and imagine that you don't know whether the color or the shape of the markings is a better predictor for the digit. The standard practice today is to simply label each piece of training data with both features and feed them into the neural network for it to decide.


Caption: samples from a colored MNIST dataset

Here’s where things get interesting. The “colored MNIST” dataset is purposely misleading. Back in the real world we know that the color of the markings is completely irrelevant, but in this particular dataset, the color is in fact a stronger predictor for the digit than its shape. So our neural network learns to use it as the primary predictor of the digit. That’s fine when we then use the network to recognize other handwritten numbers that follow the same coloring patterns. But its performance completely tanks when we reverse the colors of the numbers. (When Bottou played out this thought experiment with real training data and a real neural network, he achieved an 84.3% recognition accuracy in the former scenario and a 10% accuracy in the latter.)
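
To make the failure mode concrete, here is a minimal sketch in Python. It is not Bottou's experiment: it replaces images with two hypothetical binary features, "shape" and "color," and the agreement rates (75% for shape, 90% for color in training, 10% for color once the colors are reversed) are assumptions chosen only for illustration. A "model" that simply picks whichever single feature predicts best on the training data ends up relying on color, and its accuracy collapses when the coloring is reversed.

```python
import numpy as np

def make_env(n, color_agreement, shape_agreement=0.75, seed=0):
    """Toy stand-in for colored MNIST (hypothetical, not the real dataset).

    Each sample has a binary label plus two binary features that match
    the label with the given probabilities.
    """
    rng = np.random.default_rng(seed)
    label = rng.integers(0, 2, size=n)
    shape = np.where(rng.random(n) < shape_agreement, label, 1 - label)
    color = np.where(rng.random(n) < color_agreement, label, 1 - label)
    return {"shape": shape, "color": color, "label": label}

def accuracy(env, feature):
    """Accuracy of predicting the label directly from one feature."""
    return float(np.mean(env[feature] == env["label"]))

# Training data: color matches the label more often than shape does.
train = make_env(20000, color_agreement=0.9, seed=1)
# Test data with the color pattern reversed.
reversed_test = make_env(20000, color_agreement=0.1, seed=2)

# "Training": keep whichever single feature predicts best on the training set.
chosen = max(("shape", "color"), key=lambda f: accuracy(train, f))

train_acc = accuracy(train, chosen)
reversed_acc = accuracy(reversed_test, chosen)
print(f"chosen feature: {chosen}")
print(f"accuracy, same color pattern: {train_acc:.2f}")
print(f"accuracy, reversed colors:    {reversed_acc:.2f}")
```

The exact numbers depend on the assumed rates and the random seed, but the pattern mirrors the one described above: strong performance while the color pattern holds, and performance well below chance once it flips.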

In other words, the neural network found what Bottou calls a “spurious correlation,” which makes it completely useless outside of the narrow context within which it was trained. In theory, if you could get rid of all the spurious correlations in a machine-learning model, you would be left with only the “invariant” ones—those that hold true regardless of context.

Invariance would in turn allow you to understand causality, explains Bottou. If you know the invariant properties of a system and know the intervention performed on a system, you should be able to infer the consequence of that intervention. For example, if you know the shape of a handwritten digit always dictates its meaning, then you can infer that changing its shape (cause) would change its meaning (effect). Or, another example: if you know that all objects are subject to the law of gravity, then you can infer that when you let go of a ball (cause), it will fall to the ground (effect).

Obviously, these are simple cause-and-effect examples based on invariant properties we already know, but they hint at the potential of finding invariant properties for much more complex systems that we don’t yet understand.

So how do we get rid of these spurious correlations? This is Bottou’s second big idea. In current machine-learning practice, the default intuition is to amass as much diverse and representative data as possible into a single training dataset. But Bottou says this approach does a disservice. Different data that comes from different contexts—whether collected at different times, in different locations, or under different experimental conditions—should be preserved as separate datasets rather than mixed and combined. When they are consolidated, as they are now, important contextual information gets lost, leading to a much higher likelihood of spurious correlations.

With multiple context-specific datasets, the nature of training a neural network changes. The network can no longer find the correlations that only hold true in one single diverse training dataset; it must find the correlations that are invariant across all of the diverse datasets. And if those datasets are selected smartly from a full spectrum of contexts, the final correlations should also closely match the invariant properties of the ground truth.

So, let’s return to our simple colored MNIST example one more time. Based on his theory for finding invariant properties, Bottou reran his original experiment. This time he used two colored MNIST datasets, each with different color patterns. He then trained his neural network to find the correlations that held true across both groups. When he tested this improved model on new numbers with the same and reversed color patterns, it achieved a 70% recognition accuracy for both, indicating that the neural network had learned to disregard color and focus on the markings' shapes alone.
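
A toy version of this two-dataset setup can be sketched in Python. To be clear about the assumptions: this is not the training procedure from the talk. It swaps in a much simpler heuristic (keep the feature whose agreement with the label varies least across datasets), it uses hypothetical binary "shape" and "color" features instead of images, and the agreement rates (75% for shape; 90% and 80% for color in the two training datasets) are illustrative choices. The sketch also shows why pooling hides the problem: in the merged data, color still looks like the better predictor.

```python
import numpy as np

def make_env(n, color_agreement, shape_agreement=0.75, seed=0):
    """Toy stand-in for one colored MNIST dataset (hypothetical)."""
    rng = np.random.default_rng(seed)
    label = rng.integers(0, 2, size=n)
    shape = np.where(rng.random(n) < shape_agreement, label, 1 - label)
    color = np.where(rng.random(n) < color_agreement, label, 1 - label)
    return {"shape": shape, "color": color, "label": label}

def agreement(env, feature):
    """Fraction of samples where the feature equals the label."""
    return float(np.mean(env[feature] == env["label"]))

features = ("shape", "color")

# Two training datasets kept separate, each with a different color pattern.
train_envs = [make_env(20000, 0.9, seed=1), make_env(20000, 0.8, seed=2)]

# Pooled approach: merge everything, then keep the best single predictor.
pooled = {k: np.concatenate([e[k] for e in train_envs])
          for k in ("shape", "color", "label")}
pooled_choice = max(features, key=lambda f: agreement(pooled, f))

# Separate approach: keep the feature whose agreement with the label
# changes least from one dataset to the other.
def spread(feature):
    rates = [agreement(e, feature) for e in train_envs]
    return max(rates) - min(rates)

invariant_choice = min(features, key=spread)

# Evaluate the invariant choice on familiar and reversed color patterns.
same_test = make_env(20000, 0.9, seed=3)
reversed_test = make_env(20000, 0.1, seed=4)
acc_same = agreement(same_test, invariant_choice)
acc_reversed = agreement(reversed_test, invariant_choice)

print(f"pooled data picks:     {pooled_choice}")
print(f"separate data picks:   {invariant_choice}")
print(f"accuracy, same colors: {acc_same:.2f}")
print(f"accuracy, reversed:    {acc_reversed:.2f}")
```

Under these assumptions the pooled data favors color while the separated data favors shape, and the shape-based predictor scores about the same whether or not the colors are reversed. That is the same qualitative result as the experiment above, though the heuristic here only works because the toy features are already separated by hand; a neural network has to discover them.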

Lingering questions

In our last issue, I asked whether you thought the proposed discipline “machine behavior” offered a good framework for studying AI. As a reader quickly pointed out, this framework has existed in other fields for quite some time. In the field of science, technology, and society, for example, it’s known as the “actor-network theory.” The researchers also acknowledge that their ideas are not new but rather a synthesis of existing ones. I updated the story to reflect this.

This week, send me any lingering questions you have on Léon Bottou’s talk. I’d also like to know what kinds of causal relationships you’d investigate with AI.


More from TR

Will Knight, senior AI editor, on an AI chip demoed at Jeff Bezos’ secret tech conference: “The microchips are designed to squeeze more out of the ‘deep-learning’ AI algorithms that have already turned the world upside down. And in the process, they may inspire those algorithms themselves to evolve. ‘We need new hardware because Moore’s law has slowed down,’ [MIT researcher Vivienne] Sze says, referring to the axiom coined by Intel cofounder Gordon Moore that predicted that the number of transistors on a chip will double roughly every 18 months—leading to a commensurate performance boost in computer power.” Read more here.

Bits and Bytes

The Trump administration is seeking comment on AI technical standards
It is part of the American AI Initiative, set forth in the president’s executive order in February. (US Federal Register)
+ Our coverage of the executive order (TR)

The first lawsuit over automated investment losses is going to court
In the absence of a legal framework to sue the algorithm, the plaintiff is targeting the salesman who persuaded him to entrust his money with it instead. (Bloomberg)

A GAN is helping neuroscientists decode the brain
The algorithm creates images fine-tuned to stimulate specific neurons, offering a window into how the brain parses the visual world. (The Atlantic)
+ Our explainer on generative adversarial networks, or GANs (TR)

AI is learning to pull a funny
Puns get to the heart of the devilishly hard challenge of teaching machines natural language. (WIRED)

Chinese food factories are using robots to “taste-test” their food
The machines are trained to monitor color, smell, and other sensory information to ensure consistent quality. (SCMP)


When designed for profit-making alone, algorithms necessarily diverge from the public interest.

Yochai Benkler, law professor and co-director of the Berkman Klein Center for Internet & Society at Harvard University, on why tech companies need to be regulated

Karen Hao
Hello! You made it to the bottom. Now that you're here, fancy sending us some feedback? You can also follow me for more AI content and whimsy at @_KarenHao, and share this issue of the newsletter here.