Interpretability and Modularity

Developing a formal understanding of interpretability through structured models, GAMs, and interaction effects.

In scientific and high-stakes domains, we need AI systems that are not only accurate but also intelligible. Our research investigates the theoretical underpinnings of interpretability, focusing on how model structure affects human understanding.

We explore interaction effects as a key barrier to interpretability. Interaction effects arise when two or more input components must be considered jointly to affect the output, meaning that no single component is informative on its own (Lengerich et al., 2020); the XOR function is the canonical example, since either input in isolation tells you nothing about the output. Because humans tend to reason compositionally and hierarchically, we find that additive representations, which isolate effects into separable components, make complex models more understandable.
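
To make the additive structure concrete, here is the schematic form of a generalized additive model with pairwise interactions (a GA2M), written in functional-ANOVA notation; the display is illustrative rather than quoted from any one paper:

```latex
% Additive decomposition of a prediction into separable components:
% \beta_0 is an intercept, each f_i is a one-dimensional effect curve,
% and each f_{ij} is a pairwise interaction surface.
f(x) = \beta_0 + \sum_i f_i(x_i) + \sum_{i < j} f_{ij}(x_i, x_j)
```

Without further constraints this decomposition is not unique, because effect mass can be shifted freely between an interaction f_{ij} and the main effects f_i and f_j. Purification (Lengerich et al., 2020) makes it identifiable by requiring each component to be zero-mean with respect to the marginal distribution of each of its inputs, which pushes everything that can be explained additively into the lower-order terms.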

To advance this perspective, we have:

  1. developed an efficient algorithm for purifying interaction effects with the functional ANOVA, recovering identifiable additive models (Lengerich et al., 2020; see the sketch after this list);
  2. shown that dropout acts as a regularizer of interaction effects (AISTATS 2022);
  3. helped develop Neural Additive Models, which give neural networks an additive, component-wise structure (NeurIPS 2021);
  4. evaluated how interpretable and trustworthy GAMs are in practice (KDD 2021);
  5. applied interpretable models in clinical settings, from heterogeneous COVID-19 treatment effectiveness (2022) to postpartum hemorrhage risk (2024).

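As a minimal illustration of the purification step mentioned above: when a pairwise effect is tabulated on a grid of feature bins with uniform bin weights, purification reduces to the classical two-way ANOVA decomposition. The sketch below is a toy version under that assumption (the published algorithm handles arbitrary weighted marginals and higher-order terms), and the array F is made-up data:

```python
import numpy as np

# Toy pairwise effect f_ij tabulated on a 3x4 grid of feature bins.
F = np.array([[1.0, 2.0, 0.5, 1.5],
              [0.0, 3.0, 1.0, 2.0],
              [2.0, 1.0, 0.0, 1.0]])

# Purification with uniform bin weights: split F into an intercept,
# two main effects, and a "pure" interaction whose row and column
# means are all zero (the functional-ANOVA identifiability condition).
mu = F.mean()                      # intercept
main_i = F.mean(axis=1) - mu       # main effect of feature i
main_j = F.mean(axis=0) - mu       # main effect of feature j
pure = F - mu - main_i[:, None] - main_j[None, :]

# The components reassemble F exactly, and the interaction is centered.
assert np.allclose(mu + main_i[:, None] + main_j[None, :] + pure, F)
assert np.allclose(pure.mean(axis=0), 0)
assert np.allclose(pure.mean(axis=1), 0)
```

Everything that one feature can explain on its own lands in main_i or main_j, so a reader can inspect the one-dimensional effect curves first and consult the pure interaction only when it is non-negligible.
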
Our goal is to formalize what makes models understandable—and build models that are easy to reason about without sacrificing performance.



References

2024

  1. Interpretable Machine Learning Predicts Postpartum Hemorrhage with Severe Maternal Morbidity in a Lower Risk Laboring Obstetric Population
    Benjamin J Lengerich, Rich Caruana, Ian Painter, and 3 more authors
    American Journal of Obstetrics & Gynecology MFM, 2024

2022

  1. Automated interpretable discovery of heterogeneous treatment effectiveness: A COVID-19 case study
    Benjamin J Lengerich, Mark E Nunnally, Yin Aphinyanaphongs, and 2 more authors
    Journal of Biomedical Informatics, 2022
  2. Dropout as a Regularizer of Interaction Effects
    Benjamin Lengerich, Eric P Xing, and Rich Caruana
    In Proceedings of the Twenty-Fifth International Conference on Artificial Intelligence and Statistics (AISTATS), 2022

2021

  1. Neural Additive Models: Interpretable Machine Learning with Neural Nets
    Rishabh Agarwal, Levi Melnick, Nicholas Frosst, and 4 more authors
    Advances in Neural Information Processing Systems, 2021
  2. How Interpretable and Trustworthy are GAMs?
    Chun-Hao Chang, Sarah Tan, Ben Lengerich, and 2 more authors
    In Proceedings of the 27th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), 2021

2020

  1. Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additive Models
    Ben Lengerich, Sarah Tan, Chun-Hao Chang, and 2 more authors
    In Proceedings of the Twenty-Third International Conference on Artificial Intelligence and Statistics (AISTATS), 2020