CAVE: Interpretable 3D Neural Object Volumes for Robust Conceptual Reasoning

1Max Planck Institute for Informatics, Saarland Informatics Campus, Germany 2University of Freiburg, Germany 3CISPA Helmholtz Center for Information Security, Germany
*Equal senior advising
Accepted at ICLR 2026
[Figure: teaser]

CAVE - Concept Aware Volumes for Explanations. (a) We learn 3D object volumes (left), here ellipsoids with concept representations. Each concept captures distinct local features of objects (color coded). At inference (right), these concepts are matched with 2D image features, achieving robust and interpretable image classification. (b) CAVE achieves the best robustness vs. interpretability tradeoff across methods (higher is better on both axes). Here, we measure robustness with OOD accuracy (%) and interpretability with concept spatial localisation.

Abstract

With the rise of deep neural networks, especially in safety-critical applications, robustness and interpretability are crucial to ensure their trustworthiness. Recent advances in 3D-aware classifiers that map image features to a volumetric representation of objects, rather than relying solely on 2D appearance, have greatly improved robustness on out-of-distribution (OOD) data. However, such classifiers have not yet been studied from the perspective of interpretability. Meanwhile, current concept-based XAI methods often neglect OOD robustness. We aim to address both aspects with CAVE - Concept Aware Volumes for Explanations - a new direction that unifies interpretability and robustness in image classification. We design CAVE as a robust and inherently interpretable classifier that learns sparse concepts from 3D object representations. We further propose 3D Consistency (3D-C), a metric to measure the spatial consistency of concepts. Unlike existing metrics that rely on human-annotated object parts in images, 3D-C leverages ground-truth object meshes as a common surface onto which explanations from different concept-based methods are projected and compared. CAVE achieves competitive classification performance while discovering consistent and meaningful concepts across images in various OOD settings.

Concepts Grounded in 3D Geometry

Our intuition is simple: humans naturally form high-level concepts that are consistent across different viewpoints and appearances of objects. Inspired by this, we leverage object-centric 3D geometry to produce explanations that remain consistent under distribution shifts. By grounding concepts in neural object volumes, our explanations are model-faithful and robust to occlusions and viewpoint changes. Our approach, CAVE, represents objects with ellipsoid neural volumes, which provide a compact yet expressive 3D shape prior without requiring detailed CAD supervision.

[Figure: motivation]

Unlike NOVUM (Jesslen et al., 2024), which relies on dense, opaque Gaussian features distributed across the object surface, CAVE organises these features into a sparse set of semantically meaningful concepts. This structured grouping makes the underlying representation more interpretable by explicitly linking regions of the 3D volume to high-level concepts, while still preserving the robustness of volumetric modeling under OOD settings.
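To make this grouping concrete, below is a minimal sketch of a concept-aware neural object volume: per-vertex features on an ellipsoid surface together with a small set of learnable concept vectors, where each vertex is assigned to its closest concept. All names, shapes, and the hard-assignment rule are illustrative assumptions, not the exact implementation from the paper.

import torch
import torch.nn as nn

class ConceptAwareNOV(nn.Module):
    """Illustrative sketch (assumed names/shapes): an ellipsoid neural object
    volume whose per-vertex features are grouped into a sparse set of
    learnable concept vectors."""

    def __init__(self, num_vertices: int = 1024, num_concepts: int = 16, feat_dim: int = 128):
        super().__init__()
        # One learnable feature per ellipsoid surface vertex (NOVUM-style volume).
        self.vertex_features = nn.Parameter(torch.randn(num_vertices, feat_dim))
        # Small set of concept vectors shared across the object class.
        self.concepts = nn.Parameter(torch.randn(num_concepts, feat_dim))

    def vertex_to_concept(self) -> torch.Tensor:
        """Assign every vertex to its closest concept (hard assignment, for illustration)."""
        v = nn.functional.normalize(self.vertex_features, dim=-1)
        c = nn.functional.normalize(self.concepts, dim=-1)
        sim = v @ c.t()                 # (num_vertices, num_concepts) cosine similarities
        return sim.argmax(dim=-1)       # concept index per vertex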

[Figure: NOVUM vs CAVE comparison]

Method

[Figure: method]

We introduce CAVE, a framework for robust conceptual reasoning and classification through 3D-aware, concept-based ellipsoid neural object volumes (NOVs). In this visual illustration, colors indicate the top-5 concepts within each class. For classification, CAVE combines (a) extracted image features F_x and (b) interpretable, sparse concept-aware NOVs through bag-of-words concept matching (c), where each feature f_i ∈ F_x is matched to its best-aligned concept by cosine similarity. Correct classification happens when many image features activate Car concepts, while concepts of other classes fail to align with any feature (crossed-out arrows).
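The following sketch illustrates this bag-of-words style scoring under simplifying assumptions (hypothetical function and variable names, mean aggregation of per-feature best matches): every image feature is matched to its best-aligned concept per class, and the class with the highest aggregate alignment wins.

import torch
import torch.nn.functional as F

def classify_by_concept_matching(image_feats: torch.Tensor,
                                 class_concepts: dict[str, torch.Tensor]) -> str:
    """Illustrative bag-of-words concept matching (assumed aggregation rule).
    image_feats: (N, D) features from the image backbone.
    class_concepts[c]: (K_c, D) concept vectors of class c."""
    feats = F.normalize(image_feats, dim=-1)
    scores = {}
    for cls, concepts in class_concepts.items():
        c = F.normalize(concepts, dim=-1)
        sim = feats @ c.t()                     # (N, K_c) cosine similarities
        best_per_feature, _ = sim.max(dim=-1)   # best-aligned concept per image feature
        scores[cls] = best_per_feature.mean().item()
    return max(scores, key=scores.get)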

3D Consistency of Concepts

[Figure: 3dc]

We introduce 3D consistency (3D-C), a geometry-based metric that measures whether a concept consistently maps to the same semantic region of an object across viewpoints and distribution shifts. By projecting concept attributions onto a common 3D object surface, 3D-C evaluates concept stability without relying on human-annotated parts, providing a principled measure of concept consistency. A higher 3D-C score indicates that a concept maps more consistently to the same region.
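The sketch below conveys the idea, not the paper's exact formula: for one concept, splat its 2D attribution onto mesh vertices in every view using an assumed pixel-to-vertex correspondence (from the ground-truth mesh and pose), then score consistency as the average pairwise similarity of the resulting per-vertex distributions across views.

import numpy as np

def three_d_consistency(attr_maps, pixel_to_vertex, num_vertices):
    """Illustrative 3D consistency score (assumed formulation).
    attr_maps:       list of (H, W) attribution maps for ONE concept, one per view.
    pixel_to_vertex: list of (H, W) integer maps giving the visible mesh vertex
                     at each pixel (-1 for background), assumed precomputed."""
    per_view = []
    for attr, p2v in zip(attr_maps, pixel_to_vertex):
        acc = np.zeros(num_vertices)
        mask = p2v >= 0
        np.add.at(acc, p2v[mask], attr[mask])     # project attribution onto vertices
        per_view.append(acc / (np.linalg.norm(acc) + 1e-8))
    # Average pairwise cosine similarity across views: higher = more consistent.
    sims = [per_view[i] @ per_view[j]
            for i in range(len(per_view)) for j in range(i + 1, len(per_view))]
    return float(np.mean(sims)) if sims else 1.0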

Attributing Concepts with NOV-aware LRP

[Figure: lrp]

To obtain input-level explanations for our neural object volume (NOV) concepts, we adapt layer-wise relevance propagation (LRP) to volumetric architectures. Our NOV-aware redistribution rule preserves the LRP conservation property through the concept-matching step, enabling faithful pixel-level concept attributions. Please refer to our paper for the full derivation and more qualitative examples.
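As a rough, hedged sketch of what a conservation-preserving step through a cosine-similarity match can look like, the snippet below uses a generic epsilon-style rule: the relevance assigned to a matched concept is redistributed over the feature dimensions in proportion to their contribution to the similarity score. This is not the paper's exact NOV-aware rule; names and the stabiliser are assumptions.

import torch

def lrp_through_cosine_matching(feature: torch.Tensor,
                                concept: torch.Tensor,
                                relevance: float,
                                eps: float = 1e-6) -> torch.Tensor:
    """Epsilon-style LRP step through a cosine-similarity concept match (sketch)."""
    f = feature / (feature.norm() + eps)
    c = concept / (concept.norm() + eps)
    contributions = f * c                 # per-dimension contribution to the cosine similarity
    z = contributions.sum() + eps         # stabilised total score
    R = relevance * contributions / z     # redistributed relevance, summing (approx.) to `relevance`
    return R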

[Figure: lrp_ab]

Our NOV-aware LRP correctly attributes concepts and yields localised explanations, even under different OOD settings: snow and 40-60% (heavy) occlusion. Colors indicate the top-5 class-wise concepts per row and are not comparable across rows.

Acknowledgement

We thank Christopher Wewer for insightful discussions on 3D consistency evaluation and for careful proofreading of our paper. Additionally, our codebase builds upon NOVUM by Artur Jesslen and the LRP implementation for ResNet by Seitaro Otsuki. We also use Orient-Anything by Zehan Wang to generate 3D pose annotations for our model training.

BibTeX

If you find our work useful in your research, please consider citing:

@inproceedings{pham26interpretable,
    title     = {Robust Conceptual Reasoning through Interpretable 3D Neural Object Volumes},
    author    = {Pham, Nhi and Jesslen, Artur and Schiele, Bernt and Kortylewski, Adam and Fischer, Jonas},
    booktitle = {The Fourteenth International Conference on Learning Representations},
    year      = {2026},
    url       = {https://openreview.net/forum?id=VSPLa2Sito}
}