Machine-learning explainability: Black-box models
MIT researchers have developed a way to quantify how well humans can understand the explanations offered for a machine-learning model's predictions.
Modern machine-learning models, such as neural networks, are frequently referred to as "black boxes" because they are so intricate that even the researchers who create them do not truly comprehend how they make predictions.
Yilun Zhou, a graduate student in electrical engineering and computer science at the Computer Science and Artificial Intelligence Laboratory (CSAIL), is the lead author of the paper introducing the framework, with co-authors Marco Tulio Ribeiro, a senior researcher at Microsoft Research, and Julie Shah, a professor of aeronautics and astronautics and the director of the Interactive Robotics Group at CSAIL.
To provide some insight, researchers use explanation methods that describe individual model decisions. They may, for example, highlight words in a movie review that influenced the model's decision that the review was favourable.
However, these explanations are of little use if humans cannot easily understand them, or worse, misunderstand them. As a result, MIT researchers developed a mathematical framework for formally quantifying and evaluating the understandability of machine-learning model explanations. This can help identify insights about model behaviour that might otherwise be missed if the researcher only evaluates a handful of individual explanations in an attempt to understand the entire model.
Understanding local explanations
Finding another model that mimics a machine-learning model's predictions but uses transparent reasoning patterns is one way to understand it. However, because modern neural network models are so complex, this technique frequently fails. Instead, researchers rely on local explanations that concentrate on individual inputs. These explanations frequently highlight words in the text to emphasise their significance to one prediction made by the model.
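To make this concrete, the sketch below shows one simple way such a local, word-level explanation could be produced: a leave-one-out attribution computed over a toy sentiment scorer. The scorer, word lists, and review text are hypothetical stand-ins for illustration only, not the models or attribution methods used in this research.

```python
# Hypothetical illustration of a local, word-level explanation:
# leave-one-out importance scores for a toy sentiment scorer.

POSITIVE = {"great", "wonderful", "enjoyable"}
NEGATIVE = {"dull", "boring", "awful"}

def toy_sentiment_score(words):
    """Toy 'model': fraction of positive words minus fraction of negative words."""
    if not words:
        return 0.0
    return (sum(w in POSITIVE for w in words)
            - sum(w in NEGATIVE for w in words)) / len(words)

def leave_one_out_attribution(words):
    """Each word's importance = drop in the score when that word is removed."""
    base = toy_sentiment_score(words)
    return [(w, base - toy_sentiment_score(words[:i] + words[i + 1:]))
            for i, w in enumerate(words)]

review = "a wonderful and enjoyable film with a slightly dull ending".split()
for word, importance in leave_one_out_attribution(review):
    print(f"{word:>10}: {importance:+.3f}")
```

Running this prints positive importances for words such as "wonderful" and a negative importance for "dull", which is exactly the kind of per-prediction highlighting that local explanation methods provide.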
The researchers created ExSum (short for explanation summary), a framework that formalises these sorts of claims into rules that can be assessed using quantifiable metrics. ExSum evaluates a rule across an entire dataset instead of only the single instance for which it was constructed.
Using ExSum, the user can check whether a rule holds up according to three specific metrics: coverage, validity, and sharpness. Coverage measures how broadly the rule applies across the entire dataset. Validity is the percentage of individual instances that agree with the rule. Sharpness describes how precise the rule is; a highly valid rule may be so generic that it is not useful for understanding the model.
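As an illustration only, the sketch below shows how a rule of this kind might be scored over a small dataset of precomputed word attributions. The `Rule` structure, the example predicates, and the reduction of sharpness to the width of an allowed score range are simplifying assumptions made for this example; they are not the ExSum implementation.

```python
# Rough sketch (not the authors' code) of evaluating an ExSum-style rule
# over a dataset of precomputed (word, attribution score) instances.

from dataclasses import dataclass
from typing import Callable, List, Tuple

Instance = Tuple[str, float]  # (word, attribution score in [-1, 1])

@dataclass
class Rule:
    applies: Callable[[Instance], bool]    # which instances the rule covers
    behaviour: Callable[[Instance], bool]  # the behaviour the rule claims
    behaviour_width: float                 # width of the allowed score range

def evaluate(rule: Rule, dataset: List[Instance]):
    covered = [x for x in dataset if rule.applies(x)]
    coverage = len(covered) / len(dataset)              # how broadly the rule applies
    validity = (sum(rule.behaviour(x) for x in covered) / len(covered)
                if covered else 0.0)                    # how often the claim holds
    sharpness = 1.0 - rule.behaviour_width / 2.0        # narrower claim = sharper
    return coverage, validity, sharpness

# Example rule: "negation words receive attribution below -0.1".
negations = {"not", "never", "no"}
rule = Rule(
    applies=lambda x: x[0] in negations,
    behaviour=lambda x: x[1] < -0.1,
    behaviour_width=0.9,  # allowed range (-1, -0.1) out of the full (-1, 1)
)

dataset = [("not", -0.4), ("great", 0.7), ("never", -0.2), ("film", 0.0)]
print(evaluate(rule, dataset))  # -> (0.5, 1.0, 0.55)
```

In this toy example the rule covers half the instances, holds on every instance it covers, and makes a moderately specific claim, which mirrors how coverage, validity, and sharpness pull in different directions.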
Extending the framework
Zhou intends to build on this work by applying the notion of understandability to other criteria and forms of explanation, such as counterfactual explanations. For now, the researchers have focused on feature attribution methods, which describe the individual features a model used to make a prediction.
Furthermore, he wants to improve the framework and user interface so that people can create rules more quickly. Writing rules can require hours of human effort, and some human involvement will always be needed, because humans must ultimately be able to understand the explanations, but AI assistance could speed up the process.