Escaping Plato's Cave:
Robust Conceptual Reasoning through
Interpretable 3D Neural Object Volumes

1Max Planck Institute for Informatics, Saarland Informatics Campus, Germany 2University of Freiburg, Germany
*Equal senior advisorship
[Teaser figure]

Figure 1. We introduce CAVE - Concept Aware Volumes for Explanations. Left: We learn 3D object volumes, here cuboids, with concept representations. Each concept captures distinct local features of objects. Right: At inference, these concepts are matched with 2D image features, achieving robust and interpretable image classification.

Abstract

With the rise of neural networks, especially in high-stakes applications, these networks need two properties to ensure their safety: (i) robustness and (ii) interpretability. Recent advances in classifiers with 3D volumetric object representations have demonstrated greatly improved robustness on out-of-distribution data. However, these 3D-aware classifiers have not been studied from the perspective of interpretability. We introduce CAVE - Concept Aware Volumes for Explanations - a new direction that unifies interpretability and robustness in image classification. We design an inherently interpretable and robust classifier by extending existing 3D-aware classifiers with concepts extracted from their volumetric representations for classification. On an array of quantitative interpretability metrics, we compare against different concept-based approaches across the explainable AI literature and show that CAVE discovers well-grounded concepts that are used consistently across images, while achieving superior robustness.

Motivation

Existing image classifiers are either inherently interpretable or robust, but not both. We introduce CAVE - Concept Aware Volumes for Explanations - a new direction that unifies inherent interpretability and robustness in image classification.

Method

CAVE: A 3D-Aware Inherently Interpretable Classifier

[Method overview figure]

We introduce CAVE, a framework that allows for robust conceptual reasoning and classification through interpretable 3D-aware neural object volumes (NOVs). (1) Given the NOV of class Car, CAVE first extracts concepts via clustering on the Gaussian features gCar(k) and represents the mean feature of each Gaussian cluster as a concept hCar(t). Note that the clusters shown here are visually refined for illustrative purposes to better convey our method. For classification, CAVE combines image features Fx in (2) and interpretable concept-aware NOVs H in (3) through a bag-of-words concept matching step (4), where each feature fi ∈ Fx is matched to its best-aligned concept in H by cosine similarity. The logit for class y is computed as the sum of cosine similarities over Fx, considering only the features matched to concepts of class y (5).
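To make the concept extraction and matching steps concrete, below is a minimal PyTorch-style sketch. The function names (extract_concepts, class_logits), tensor shapes, and the choice of k-means clustering are illustrative assumptions for exposition, not the released CAVE implementation.

# Sketch: concepts as cluster means of a class's Gaussian features, and
# classification via bag-of-words concept matching with cosine similarity.
# Names, shapes, and the clustering choice are assumptions, not CAVE's code.
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans


def extract_concepts(gaussian_features: torch.Tensor, num_concepts: int) -> torch.Tensor:
    """Cluster one class's Gaussian features g_y(k), shape (K, C), and return the
    mean feature of each cluster as a concept h_y(t), shape (T, C)."""
    labels = KMeans(n_clusters=num_concepts, n_init=10).fit_predict(
        gaussian_features.detach().cpu().numpy())
    labels = torch.from_numpy(labels)
    concepts = torch.stack([gaussian_features[labels == t].mean(dim=0)
                            for t in range(num_concepts)])
    return F.normalize(concepts, dim=-1)


def class_logits(image_features: torch.Tensor, concepts: torch.Tensor,
                 concept_class: torch.Tensor) -> torch.Tensor:
    """image_features F_x: (N, C) 2D features f_i; concepts H stacked over all
    classes: (T, C); concept_class: (T,) class index of each concept. Each
    feature is matched to its best-aligned concept by cosine similarity, and the
    logit of class y sums the similarities of features matched to y's concepts."""
    f = F.normalize(image_features, dim=-1)
    sims = f @ F.normalize(concepts, dim=-1).T        # (N, T) cosine similarities
    best_sim, best_idx = sims.max(dim=1)              # best concept per feature
    logits = torch.zeros(int(concept_class.max()) + 1)
    logits.index_add_(0, concept_class[best_idx], best_sim)
    return logits

Because each feature contributes only through its single best-matching concept, the prediction decomposes into a sparse sum of concept matches that can be inspected directly.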

LRP with Conservation for CAVE

Layer-wise Relevance Propagation (LRP) is, however, only defined for standard architectures and does not yield meaningful attributions for NOV-based classifiers, attributing relevance to only a few pixels rather than to whole object features. We therefore introduce an LRP variant for CAVE-like architectures. Top: At an upsampling layer U, feature maps Av and Av+l from non-consecutive layers are concatenated after padding for dimensional consistency. The relevance score R is split into Rv and Rv+l, where Rv+l is masked to exclude padding contributions. Bottom: We ensure spatial consistency by mapping the relevance RM from the matching layer to the corresponding feature fi ∈ Fx, and then distributing it channel-wise with NOV-weighted feature importance.

[LRP figure]
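The masked relevance split at the concatenation can be pictured with the short sketch below. Tensor shapes and, in particular, the rescaling used to keep the total relevance conserved after masking are illustrative assumptions, not necessarily the exact propagation rule used in the paper.

# Sketch: split relevance R arriving at a concatenation of A_v and a padded
# A_{v+l}, excluding padding contributions while conserving total relevance.
import torch


def split_relevance_at_concat(R: torch.Tensor, c_v: int, pad_mask: torch.Tensor):
    """R: (B, c_v + c_vl, H, W) relevance at the concatenated block.
    pad_mask: (B, c_vl, H, W), 1 at real entries of the padded A_{v+l}, 0 at padding.
    Returns (R_v, R_vl) with padding excluded and sum(R_v) + sum(R_vl) == sum(R)."""
    R_v, R_vl = R[:, :c_v], R[:, c_v:]
    R_vl = R_vl * pad_mask                            # mask out padding contributions
    # Rescale so the relevance removed at padded positions is redistributed and
    # conservation holds; this particular rescaling is an assumed choice.
    scale = R.sum() / (R_v.sum() + R_vl.sum()).clamp_min(1e-9)
    return R_v * scale, R_vl * scale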

Evaluation

3D Consistency of Concepts

We evaluate the consistency of concepts with our novel metric, 3D Consistency, in which a concept is mapped to an object part on 3D ground-truth CAD models. Here we show 6 concepts of the class Bicycle, corresponding to those in Fig. 1, each illustrating the concept relevance aggregated across 100 test images and projected onto the mesh surface.

[3D CAD consistency figure]
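How the per-image concept relevance ends up on the mesh surface can be sketched as follows; the pixel-to-vertex correspondences are assumed to come from rendering the ground-truth CAD model in each image's pose, and all names are hypothetical.

# Sketch: aggregate 2D concept relevance onto CAD mesh vertices across images.
# The correspondence source and all names are assumptions for illustration.
import torch


def aggregate_relevance_on_mesh(relevance_maps, vertex_maps, num_vertices: int) -> torch.Tensor:
    """relevance_maps: list of (H, W) float relevance maps for one concept.
    vertex_maps: list of (H, W) long maps giving, per pixel, the index of the
    visible mesh vertex (-1 for background or occluded pixels).
    Returns a (num_vertices,) tensor of relevance aggregated over all images."""
    acc = torch.zeros(num_vertices)
    for rel, vmap in zip(relevance_maps, vertex_maps):
        visible = vmap >= 0
        acc.index_add_(0, vmap[visible], rel[visible])
    return acc / max(len(relevance_maps), 1)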

Robustness & Interpretability of Concepts

We evaluate the robustness of concepts by measuring model accuracy across diverse out-of-distribution (OOD) settings. CAVE performs best in both in-distribution and OOD scenarios, experiencing only a slight deterioration in performance compared to other baseline methods. For interpretability, our metrics quantify key properties that address the question: to what extent are our explanations aligned with human-annotated object parts? These include IoU, Local Coverage, Global Coverage, and Pureness. Here, we show a qualitative example of a car with 40% occlusion, where existing methods fail to provide reliable explanations due to their sensitivity to missing object parts. CAVE focuses on more informative regions despite the occlusion, demonstrating better resilience to missing object parts. This highlights the importance of robust explanations when dealing with OOD challenges.

 
[Occlusion example figure]
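As one concrete example of these metrics, the sketch below computes IoU between a binarized concept relevance map and a human-annotated part mask. The quantile-based binarization threshold is an illustrative assumption and may differ from the exact protocol used in the paper.

# Sketch: IoU between a binarized concept relevance map and a part annotation.
# The thresholding scheme is an assumed choice, not the paper's exact protocol.
import torch


def concept_part_iou(relevance: torch.Tensor, part_mask: torch.Tensor, q: float = 0.9) -> float:
    """relevance: (H, W) float concept relevance map; part_mask: (H, W) bool
    human-annotated object-part mask. Binarize the relevance at its q-th
    quantile and return the intersection over union."""
    thr = torch.quantile(relevance.flatten(), q)
    pred = relevance >= thr
    inter = (pred & part_mask).sum().float()
    union = (pred | part_mask).sum().float()
    return (inter / union.clamp_min(1.0)).item()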

CAVE's Concept Visualisation

Example concepts learned by CAVE for different classes.

Acknowledgments

Nhi Pham was funded by the International Max Planck Research School on Trustworthy Computing (IMPRS-TRUST) program. We thank Christopher Wewer for his support with the NOVUM codebase, insightful discussions on 3D consistency evaluation, and careful proofreading of our paper. We also thank Artur Jesslen for his help with NOVUM codebase issues. Additionally, we thank Ada Görgün and Amin Parchami-Araghi for their helpful discussions.

BibTeX

@inproceedings{pham25cave,
  title     = {Escaping Plato's Cave: Robust Conceptual Reasoning through Interpretable 3D Neural Object Volumes},
  author    = {Pham, Nhi and Schiele, Bernt and Kortylewski, Adam and Fischer, Jonas},
  booktitle = {arXiv},
  year      = {2025},
}