The main insight of Google’s research is not to see the different interpretability techniques in isolation, but as composable building blocks of larger models that help explain the behavior of neural networks.
For instance, feature visualization is a very effective technique for understanding the information processed by individual neurons, but it fails to connect that insight with the overall decision made by the neural network.
Knowing that neuron-12345 fired five times is relevant, but not terribly useful at the scale of the entire network.
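To make the idea of feature visualization concrete, here is a toy sketch of its core mechanic, activation maximization: searching for the input that most strongly activates a unit. The "neuron" below is a hypothetical stand-in, just a linear unit, chosen because its maximally activating input is known in closed form, so the sketch can check itself. Real feature visualization runs the same kind of gradient ascent through a full network.

```python
import numpy as np

# Hypothetical toy sketch of activation maximization: find the input
# that most strongly activates a unit. Here the "neuron" is a linear
# unit w @ x, so the optimal input on the unit sphere is w / ||w||.
rng = np.random.default_rng(0)
w = rng.normal(size=8)            # fixed weights of the unit under study

x = rng.normal(size=8)            # start from a random input
x /= np.linalg.norm(x)
for _ in range(200):
    grad = w                      # d(w @ x)/dx for a linear unit
    x += 0.1 * grad               # gradient ascent on the input
    x /= np.linalg.norm(x)        # keep the input on the unit sphere

# x converges to the direction the neuron is "looking for"
print(np.allclose(x, w / np.linalg.norm(w), atol=1e-3))
```

In a real network the gradient is taken through many nonlinear layers, but the principle is the same: the optimized input is a picture of what the unit responds to.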
Research on understanding decisions in neural networks has focused on three main areas: feature visualization, attribution, and dimensionality reduction.
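Attribution, the second of these areas, asks how much each input feature contributed to an output. A minimal sketch, assuming a simple gradient-times-input rule on a hypothetical linear scoring function (where the rule is exact and the contributions sum to the score):

```python
import numpy as np

# Hypothetical sketch of attribution via gradient-times-input.
# For a linear score f(x) = w @ x, the gradient w.r.t. x is w, so each
# feature's contribution is w_i * x_i, and contributions sum to f(x).
w = np.array([0.5, -2.0, 1.0])   # made-up model weights
x = np.array([4.0, 1.0, 3.0])    # made-up input

score = float(w @ x)             # model output
attribution = w * x              # per-feature contribution

print(attribution)                           # [ 2. -2.  3.]
print(np.isclose(attribution.sum(), score))  # contributions explain the score
```

For deep nonlinear networks the decomposition is only approximate, which is exactly why attribution remains an active research area.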
How exactly does the new Google model for interpretability work?
Well, the main innovation, in my opinion, is that it analyzes the decisions made by different components of a neural network at different levels: individual neurons, connected groups of neurons, and complete layers.
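A rough sketch of what inspecting a layer at those three granularities can look like, assuming a hypothetical recorded activation tensor of shape (channels, height, width) from some convolutional layer:

```python
import numpy as np

# Hypothetical activation tensor from one convolutional layer:
# 16 channels over a 4x4 spatial grid (made-up data for illustration).
rng = np.random.default_rng(1)
acts = rng.normal(size=(16, 4, 4))

# Level 1 - individual neuron: one unit at one spatial position.
single_neuron = acts[3, 2, 1]

# Level 2 - connected group of neurons: summarize a whole channel.
channel_view = acts[3].mean()

# Level 3 - complete layer: one summary value per channel.
layer_view = acts.mean(axis=(1, 2))

print(layer_view.shape)  # (16,)
```

The interesting questions live in how these views relate: a channel summary can reveal a concept that no single neuron's firing count would, which is the composability argument the paper makes.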
Google’s research on deep neural network interpretability is not just a theoretical exercise.
The research group accompanied the paper with the release of Lucid, a neural network visualization library that allows developers to produce the sort of lucid feature visualizations that illustrate the decisions made by individual segments of a neural network.