As neural networks are increasingly deployed in high-stakes applications, they need two properties to ensure their safety: (i) robustness and (ii) interpretability. Recent advances in classifiers with 3D volumetric object representations have demonstrated greatly enhanced robustness on out-of-distribution data. However, these 3D-aware classifiers have not been studied from the perspective of interpretability. We introduce CAVE - Concept Aware Volumes for Explanations - a new direction that unifies interpretability and robustness in image classification. We design an inherently interpretable and robust classifier by extending existing 3D-aware classifiers with concepts extracted from their volumetric representations for classification. Across an array of quantitative interpretability metrics, we compare against concept-based approaches from the explainable-AI literature and show that CAVE discovers well-grounded concepts that are used consistently across images, while achieving superior robustness.
Existing image classifiers are either inherently interpretable or robust, but not both. We introduce CAVE - Concept Aware Volumes for Explanations - a new direction that unifies inherent interpretability and robustness in image classification.
We introduce CAVE, a framework that allows for robust conceptual reasoning and classification through interpretable 3D-aware neural object volumes (NOVs). (1) Given the NOV of class Car, CAVE first extracts concepts by clustering the Gaussian features g_Car(k) and represents the mean feature of each Gaussian cluster as a concept h_Car(t). Note that the clusters shown here are visually refined for illustrative purposes to better convey our method. For classification, CAVE combines the image features F_x in (2) and the interpretable concept-aware NOVs H in (3) through a bag-of-words concept matching step (4), where each feature f_i ∈ F_x is aligned with its best-matching concept in H by cosine similarity. The logit for class y is computed as the sum of cosine similarities over F_x, considering only the features mapped to its clusters (5).
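Below is a minimal sketch, in PyTorch, of the concept extraction and bag-of-words matching described in this caption. The choice of k-means for clustering, the tensor shapes, and the function names are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

# Hedged sketch; shapes and clustering choice are assumptions, not the paper's exact code.
def build_concepts(gaussian_features, num_concepts):
    # Cluster the per-Gaussian features of one class's NOV (K x D) and use
    # the mean feature of each cluster as a concept vector (T x D).
    km = KMeans(n_clusters=num_concepts, n_init=10).fit(gaussian_features.cpu().numpy())
    labels = torch.from_numpy(km.labels_).to(gaussian_features.device)
    return torch.stack([gaussian_features[labels == t].mean(dim=0)
                        for t in range(num_concepts)])

def cave_logits(F_x, H):
    # F_x: (N, D) image features; H: (C, T, D) concepts per class.
    # Each feature is matched to its single best concept across all classes by
    # cosine similarity; a class logit sums the similarities of the features
    # matched to that class's concepts.
    C, T, D = H.shape
    Fn = F.normalize(F_x, dim=-1)                   # (N, D)
    Hn = F.normalize(H.reshape(C * T, D), dim=-1)   # (C*T, D)
    sims = Fn @ Hn.T                                # (N, C*T) cosine similarities
    best_sim, best_idx = sims.max(dim=1)            # best concept per feature
    best_class = torch.div(best_idx, T, rounding_mode='floor')  # owning class
    logits = torch.zeros(C, device=F_x.device)
    logits.scatter_add_(0, best_class, best_sim)    # sum similarities per class
    return logits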
Layer-wise Relevance Propagation (LRP) is, however, only defined for standard architectures and does not yield meaningful attributions for NOV-based classification, attributing relevance to only a few pixels rather than correctly to whole object features. We therefore introduce LRP for CAVE-like architectures. Top: At an upsampling layer U, feature maps A_v and A_{v+l} from non-consecutive layers are concatenated after padding for dimensional consistency. The relevance score R is split into R_v and R_{v+l}, where R_{v+l} is masked to exclude padding contributions. Bottom: We ensure spatial consistency by mapping the relevance R_M from the matching layer to the corresponding feature f_i ∈ F_x, then distributing it channel-wise with NOV-weighted feature importance.
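The following is a hedged sketch of the two propagation rules this caption describes: splitting relevance at the padded concatenation, and redistributing matching-layer relevance channel-wise. Function names, shapes, and the proportional redistribution rule are assumptions for illustration.

import torch

# Hedged sketch; the exact LRP rules used by the authors may differ.
def split_relevance_at_concat(R, A_v, A_vl_padded, pad_mask):
    # R: (B, C_v + C_vl, H, W) relevance at the concatenated feature map.
    # A_v: (B, C_v, H, W); A_vl_padded: (B, C_vl, H, W) skip features after padding.
    # pad_mask: (B, 1, H, W), 1 where A_vl_padded holds real activations, 0 on padding.
    C_v = A_v.shape[1]
    R_v = R[:, :C_v]          # relevance routed back to layer v
    R_vl = R[:, C_v:]         # relevance routed back to layer v+l
    R_vl = R_vl * pad_mask    # padded positions never receive attribution
    return R_v, R_vl

def distribute_matching_relevance(R_M, F_x, matched_concepts, eps=1e-9):
    # R_M: (N,) relevance of each feature f_i at the matching layer.
    # F_x: (N, D) image features; matched_concepts: (N, D) NOV concept matched to each feature.
    # Relevance is redistributed channel-wise in proportion to each channel's
    # contribution to the match (an epsilon-style rule; assumed, not verbatim).
    contrib = F_x * matched_concepts
    weights = contrib / (contrib.sum(dim=-1, keepdim=True) + eps)
    return weights * R_M.unsqueeze(-1)   # (N, D) per-channel feature relevance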
We evaluate the consistency of concepts with our novel metric, 3D Consistency, in which each concept is mapped to an object part on 3D ground-truth CAD models. Here we show six concepts of class Bicycle, corresponding to those in Fig. 1; each visualization shows the concept relevance aggregated across 100 test images and projected onto the mesh surface.
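As a rough illustration, a 3D consistency score for one concept could be computed as follows, assuming per-image vertex relevance maps and per-vertex part labels (non-negative integer ids) are available; the exact definition used in the paper may differ.

import numpy as np

# Hedged sketch; assumed aggregation, not the paper's exact metric definition.
def concept_3d_consistency(vertex_relevance, part_labels):
    # vertex_relevance: (num_images, num_vertices) relevance of one concept
    # projected onto the CAD mesh for each test image.
    # part_labels: (num_vertices,) integer part id per mesh vertex.
    parts = np.unique(part_labels)
    per_image_part = np.array([
        parts[int(np.argmax([rel[part_labels == p].sum() for p in parts]))]
        for rel in vertex_relevance
    ])
    # A concept is consistent to the degree it lands on the same part across
    # images: fraction of images agreeing with the majority part.
    majority = np.bincount(per_image_part).argmax()
    return float((per_image_part == majority).mean())

def aggregate_relevance_on_mesh(vertex_relevance):
    # Per-vertex relevance averaged over images, as visualized on the mesh.
    return vertex_relevance.mean(axis=0)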
We evaluate the robustness of concepts by measuring model accuracy across diverse OOD settings. CAVE performs best in both in-distribution and OOD scenarios, experiencing only a slight performance drop compared to the other baseline methods. For interpretability, our metrics are designed to quantify key properties that address the question: to what extent are our explanations aligned with human-annotated object parts? These metrics are IoU, Local Coverage, Global Coverage, and Pureness. Here, we show a qualitative example of a car with 40% occlusion, where existing methods fail to provide reliable explanations due to their sensitivity to missing object parts. CAVE focuses on more informative regions despite the occlusion, demonstrating better resilience to missing object parts. This highlights the importance of robust explanations when dealing with OOD challenges.
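As one concrete example of these metrics, an IoU between a binarized concept attribution map and an annotated part mask could be computed as sketched below; the thresholding and normalization are assumptions, and the coverage and pureness metrics are defined analogously in the paper.

import numpy as np

# Hedged sketch; threshold and normalization are assumed.
def concept_part_iou(attribution, part_mask, threshold=0.5):
    # attribution: (H, W) concept relevance map, normalized to [0, 1].
    # part_mask: (H, W) binary human-annotated mask of an object part.
    concept_mask = attribution >= threshold
    part_mask = part_mask.astype(bool)
    intersection = np.logical_and(concept_mask, part_mask).sum()
    union = np.logical_or(concept_mask, part_mask).sum()
    return float(intersection) / float(union) if union > 0 else 0.0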
Example concepts learned by CAVE for different classes.
@inproceedings{pham25cave,
  title     = {Escaping Plato's Cave: Robust Conceptual Reasoning through Interpretable 3D Neural Object Volumes},
  author    = {Pham, Nhi and Schiele, Bernt and Kortylewski, Adam and Fischer, Jonas},
  booktitle = {arXiv},
  year      = {2025},
}