Escaping Plato's Cave:
Robust Conceptual Reasoning through
Interpretable 3D Neural Object Volumes

1Max Planck Institute for Informatics, Saarland Informatics Campus, Germany 2University of Freiburg, Germany
*Equal senior advisorship
[Teaser figure]

Figure 1. We introduce CAVE - Concept Aware Volumes for Explanations. Left: We learn 3D object volumes, here cuboids, with concept representations. Each concept captures distinct local features of objects. Right: At inference, these concepts are matched with 2D image features, achieving robust and interpretable image classification.

Abstract

With the rise of neural networks, especially in high-stakes applications, these networks need two properties to ensure their safety: (i) robustness and (ii) interpretability. Recent advances in classifiers with 3D volumetric object representations have demonstrated greatly improved robustness on out-of-distribution data. However, these 3D-aware classifiers have not been studied from the perspective of interpretability. We introduce CAVE - Concept Aware Volumes for Explanations - a new direction that unifies interpretability and robustness in image classification. We design an inherently interpretable and robust classifier by extending existing 3D-aware classifiers with concepts extracted from their volumetric representations for classification. On an array of quantitative interpretability metrics, we compare against different concept-based approaches across the explainable AI literature and show that CAVE discovers well-grounded concepts that are used consistently across images, while achieving superior robustness.

Motivation

Existing image classifiers are either inherently interpretable or robust, but not both. We introduce CAVE - Concept Aware Volumes for Explanations - a new direction that unifies inherent interpretability and robustness in image classification.

Method

CAVE: A 3D-Aware Inherently Interpretable Classifier

[Method overview figure]

We introduce CAVE, a framework that allows for robust conceptual reasoning and classification through interpretable 3D-aware neural object volumes (NOVs). (1) Given the NOV of class Car, CAVE first extracts concepts via clustering on the Gaussian features gCar(k) and represents the mean feature of each Gaussian cluster as a concept hCar(t). Note that the clusters shown here are visually refined for illustrative purposes to better convey our method. For classification, CAVE combines image features Fx in (2) and interpretable concept-aware NOVs H in (3) through a bag-of-words concept matching step (4), where each feature fi ∈ Fx is matched to its best-aligned concept in H by cosine similarity. The logit for class y is computed as the sum of cosine similarities over Fx, considering only the features matched to concepts of class y (5).
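To make the concept extraction and matching steps concrete, below is a minimal PyTorch-style sketch. The function names (extract_concepts, class_logits), tensor shapes, and the choice of k-means clustering are illustrative assumptions for exposition, not the released CAVE implementation.

# Sketch: concepts as cluster means of a class's Gaussian features, and
# classification via bag-of-words concept matching with cosine similarity.
# Names, shapes, and the clustering choice are assumptions, not CAVE's code.
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans


def extract_concepts(gaussian_features: torch.Tensor, num_concepts: int) -> torch.Tensor:
    """Cluster one class's Gaussian features g_y(k), shape (K, C), and return the
    mean feature of each cluster as a concept h_y(t), shape (T, C)."""
    labels = KMeans(n_clusters=num_concepts, n_init=10).fit_predict(
        gaussian_features.detach().cpu().numpy())
    labels = torch.from_numpy(labels)
    concepts = torch.stack([gaussian_features[labels == t].mean(dim=0)
                            for t in range(num_concepts)])
    return F.normalize(concepts, dim=-1)


def class_logits(image_features: torch.Tensor, concepts: torch.Tensor,
                 concept_class: torch.Tensor) -> torch.Tensor:
    """image_features F_x: (N, C) 2D features f_i; concepts H stacked over all
    classes: (T, C); concept_class: (T,) class index of each concept. Each
    feature is matched to its best-aligned concept by cosine similarity, and the
    logit of class y sums the similarities of features matched to y's concepts."""
    f = F.normalize(image_features, dim=-1)
    sims = f @ F.normalize(concepts, dim=-1).T        # (N, T) cosine similarities
    best_sim, best_idx = sims.max(dim=1)              # best concept per feature
    logits = torch.zeros(int(concept_class.max()) + 1)
    logits.index_add_(0, concept_class[best_idx], best_sim)
    return logits

Because each feature contributes only through its single best-matching concept, the prediction decomposes into a sparse sum of concept matches that can be inspected directly.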

LRP with Conservation for CAVE

Layer-wise Relevance Propagation (LRP) is, however, only defined for standard architectures and does not yield meaningful attributions for NOV-based classifiers, attributing relevance to only a few pixels rather than to whole object features. We therefore introduce an LRP variant for CAVE-like architectures. Top: At an upsampling layer U, feature maps Av and Av+l from non-consecutive layers are concatenated after padding for dimensional consistency. The relevance score R is split into Rv and Rv+l, where Rv+l is masked to exclude padding contributions. Bottom: We ensure spatial consistency by mapping the relevance RM from the matching layer to the corresponding feature fi ∈ Fx, and then distributing it channel-wise with NOV-weighted feature importance.

[LRP figure]
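The masked relevance split at the concatenation can be pictured with the short sketch below. Tensor shapes and, in particular, the rescaling used to keep the total relevance conserved after masking are illustrative assumptions, not necessarily the exact propagation rule used in the paper.

# Sketch: split relevance R arriving at a concatenation of A_v and a padded
# A_{v+l}, excluding padding contributions while conserving total relevance.
import torch


def split_relevance_at_concat(R: torch.Tensor, c_v: int, pad_mask: torch.Tensor):
    """R: (B, c_v + c_vl, H, W) relevance at the concatenated block.
    pad_mask: (B, c_vl, H, W), 1 at real entries of the padded A_{v+l}, 0 at padding.
    Returns (R_v, R_vl) with padding excluded and sum(R_v) + sum(R_vl) == sum(R)."""
    R_v, R_vl = R[:, :c_v], R[:, c_v:]
    R_vl = R_vl * pad_mask                            # mask out padding contributions
    # Rescale so the relevance removed at padded positions is redistributed and
    # conservation holds; this particular rescaling is an assumed choice.
    scale = R.sum() / (R_v.sum() + R_vl.sum()).clamp_min(1e-9)
    return R_v * scale, R_vl * scale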

Evaluation

3D Consistency of Concepts

We evaluate the consistency of concepts with our novel metric, 3D Consistency, in which a concept is mapped to an object part on 3D ground-truth CAD models. Here we show 6 concepts of the class Bicycle, corresponding to those in Fig. 1, each illustrating the concept relevance aggregated across 100 test images and projected onto the mesh surface.

[3D CAD consistency figure]
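How the per-image concept relevance ends up on the mesh surface can be sketched as follows; the pixel-to-vertex correspondences are assumed to come from rendering the ground-truth CAD model in each image's pose, and all names are hypothetical.

# Sketch: aggregate 2D concept relevance onto CAD mesh vertices across images.
# The correspondence source and all names are assumptions for illustration.
import torch


def aggregate_relevance_on_mesh(relevance_maps, vertex_maps, num_vertices: int) -> torch.Tensor:
    """relevance_maps: list of (H, W) float relevance maps for one concept.
    vertex_maps: list of (H, W) long maps giving, per pixel, the index of the
    visible mesh vertex (-1 for background or occluded pixels).
    Returns a (num_vertices,) tensor of relevance aggregated over all images."""
    acc = torch.zeros(num_vertices)
    for rel, vmap in zip(relevance_maps, vertex_maps):
        visible = vmap >= 0
        acc.index_add_(0, vmap[visible], rel[visible])
    return acc / max(len(relevance_maps), 1)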

Robustness & Interpretability of Concepts

We evaluate the robustness of concepts by measuring model accuracy across diverse out-of-distribution (OOD) settings. CAVE performs best in both in-distribution and OOD scenarios, experiencing only a slight deterioration in performance compared to other baseline methods. For interpretability, our metrics quantify key properties that address the question: to what extent are our explanations aligned with human-annotated object parts? These include IoU, Local Coverage, Global Coverage, and Pureness. Here, we show a qualitative example of a car with 40% occlusion, where existing methods fail to provide reliable explanations due to their sensitivity to missing object parts. CAVE focuses on more informative regions despite the occlusion, demonstrating better resilience to missing object parts. This highlights the importance of robust explanations when dealing with OOD challenges.

 
[Occlusion example figure]
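As one concrete example of these metrics, the sketch below computes IoU between a binarized concept relevance map and a human-annotated part mask. The quantile-based binarization threshold is an illustrative assumption and may differ from the exact protocol used in the paper.

# Sketch: IoU between a binarized concept relevance map and a part annotation.
# The thresholding scheme is an assumed choice, not the paper's exact protocol.
import torch


def concept_part_iou(relevance: torch.Tensor, part_mask: torch.Tensor, q: float = 0.9) -> float:
    """relevance: (H, W) float concept relevance map; part_mask: (H, W) bool
    human-annotated object-part mask. Binarize the relevance at its q-th
    quantile and return the intersection over union."""
    thr = torch.quantile(relevance.flatten(), q)
    pred = relevance >= thr
    inter = (pred & part_mask).sum().float()
    union = (pred | part_mask).sum().float()
    return (inter / union.clamp_min(1.0)).item()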

CAVE's Concept Visualisation

Example concepts learned by CAVE for different classes.

Acknowledgments

Nhi Pham was funded by the International Max Planck Research School on Trustworthy Computing (IMPRS-TRUST) program. We thank Christopher Wewer for his support with the NOVUM codebase, insightful discussions on 3D consistency evaluation, and careful proofreading of our paper. We also thank Artur Jesslen for his help with NOVUM codebase issues. Additionally, we thank Ada Görgün and Amin Parchami-Araghi for their helpful discussions.

BibTeX

@inproceedings{pham25cave,
  title     = {Escaping Plato's Cave: Robust Conceptual Reasoning through Interpretable 3D Neural Object Volumes},
  author    = {Pham, Nhi and Schiele, Bernt and Kortylewski, Adam and Fischer, Jonas},
  booktitle = {arXiv},
  year      = {2025},
}