1. LCA distance robustly predicts models' OOD performance.
2. LCA distance suggests how to improve models' generalization.
3. LCA distance offers insights into why VLMs generalize so well.
We tackle the challenge of predicting models' out-of-distribution (OOD) performance from in-distribution (ID) measurements, without requiring OOD data. Existing evaluations based on "effective robustness", which use ID accuracy as an indicator of OOD accuracy, break down when models are trained with diverse supervision and data distributions, such as class labels (Vision Models, VMs, on ImageNet) versus textual descriptions (Vision-Language Models, VLMs, on LAION): VLMs often generalize better to OOD data than VMs despite similar or lower ID performance. To better predict OOD performance from ID measurements, we introduce the Lowest Common Ancestor (LCA)-on-the-Line framework. It revisits the established concept of LCA distance, which measures the hierarchical distance between labels and predictions within a predefined class hierarchy, such as WordNet. We assess 75 models using ImageNet as the ID dataset and five significantly shifted OOD variants, uncovering a strong linear correlation between ID LCA distance and OOD top-1 accuracy. Our method offers a compelling alternative for understanding why VLMs tend to generalize better. Additionally, we propose a technique to construct a taxonomic hierarchy on any dataset using K-means clustering, demonstrating that LCA distance is robust to the constructed hierarchy. Finally, we show that aligning model predictions with the class taxonomy, through soft labels or prompt engineering, can enhance model generalization.
Our method estimates a model's generalization from the semantic severity of its in-distribution mistakes. We use the Lowest Common Ancestor (LCA) distance to measure how far the model's prediction lies from the ground-truth class within a predefined taxonomic hierarchy, such as WordNet. The LCA distance is proportional to the shortest path from the prediction to the ground-truth class in the hierarchy.
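The idea above can be sketched in a few lines. Below is a minimal, self-contained illustration of a shortest-path-through-LCA distance on a toy parent-pointer hierarchy; the `parent` map is a made-up stand-in for a real taxonomy like WordNet, and the exact distance used in the paper may differ in normalization.

```python
# Toy sketch of an LCA-based distance between a predicted class and the
# ground-truth class in a taxonomic hierarchy (hypothetical mini-taxonomy).

def ancestors(node, parent):
    """Return the path from `node` up to the root (inclusive)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def lca_distance(pred, label, parent):
    """Shortest path pred -> LCA -> label; larger means a semantically
    more severe mistake."""
    pred_anc = ancestors(pred, parent)
    label_anc = ancestors(label, parent)
    lca = next(a for a in label_anc if a in set(pred_anc))
    return pred_anc.index(lca) + label_anc.index(lca)

# Toy hierarchy: entity -> animal -> {dog, cat}; entity -> vehicle -> car.
parent = {"dog": "animal", "cat": "animal",
          "animal": "entity", "car": "vehicle", "vehicle": "entity"}

print(lca_distance("cat", "dog", parent))  # near miss within "animal"
print(lca_distance("car", "dog", parent))  # severe miss across "entity"
```

Averaging this distance over a model's ID predictions gives the single scalar we correlate with OOD accuracy.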
We show that LCA distance, a metric measuring prediction performance with respect to an ontology/hierarchy on in-distribution (ID) data, robustly predicts a model's out-of-distribution (OOD) performance. It provides a single measure of generalization that unifies Vision Models (VMs) and Vision-Language Models (VLMs) across different modalities and training data sources, outperforming "accuracy-on-the-line."
The following plot shows that LCA distance achieves a consistently strong linear correlation across multiple ImageNet-OOD datasets: when VMs and VLMs are evaluated together, ID accuracy is not on the line, while ID LCA distance is.
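The "on-the-line" check itself is just a linear fit across models. The sketch below uses made-up placeholder numbers (not results from the paper) to show how one would measure the correlation between models' mean ID LCA distance and their OOD top-1 accuracy.

```python
import numpy as np

# Synthetic illustration (placeholder values, not the paper's results):
# each entry is one model's mean ID LCA distance and its OOD top-1 accuracy.
id_lca  = np.array([4.2, 4.0, 3.7, 3.5, 3.1, 2.8])
ood_acc = np.array([0.21, 0.24, 0.30, 0.33, 0.41, 0.47])

# Pearson correlation: a strongly negative value means lower ID LCA distance
# (less severe mistakes) goes with higher OOD accuracy.
r = np.corrcoef(id_lca, ood_acc)[0, 1]

# Least-squares line used to predict OOD accuracy from ID LCA alone.
slope, intercept = np.polyfit(id_lca, ood_acc, 1)
print(f"Pearson r = {r:.3f}, fit: ood_acc ~ {slope:.3f} * lca + {intercept:.3f}")
```

In the paper this fit is computed over 75 VMs and VLMs for each of the five OOD variants.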
We show that LCA distance can also be used to improve models' OOD performance: the hierarchy exposes class-pairwise distances that we convert into soft labels, and we train with a cross-entropy loss augmented by a soft-label loss.
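One simple way to realize this, sketched below under assumed hyperparameters (`alpha`, `temperature` are illustrative, and `lca_dist` is a toy pairwise-distance matrix, not the paper's exact recipe): blend the one-hot label with a distribution that puts more mass on taxonomically closer classes, then apply cross-entropy against the softened target.

```python
import numpy as np

def soft_targets(lca_dist, label, alpha=0.8, temperature=1.0):
    """Blend a one-hot label with a soft distribution favoring classes
    that are taxonomically close to the ground truth."""
    sim = np.exp(-lca_dist[label] / temperature)  # closer class -> higher weight
    sim = sim / sim.sum()
    one_hot = np.eye(len(sim))[label]
    return alpha * one_hot + (1 - alpha) * sim

def cross_entropy(logits, target):
    """Cross-entropy between softmax(logits) and a (soft) target."""
    logp = logits - np.log(np.exp(logits).sum())
    return -(target * logp).sum()

# Toy 3-class taxonomy: classes 0 and 1 share a parent; class 2 is far away.
lca_dist = np.array([[0., 1., 3.],
                     [1., 0., 3.],
                     [3., 3., 0.]])
target = soft_targets(lca_dist, label=0)
loss = cross_entropy(np.array([2.0, 0.5, -1.0]), target)
print(target, loss)
```

The softened target still peaks at the true class but penalizes a near-miss (class 1) less than a distant error (class 2).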
We can construct a latent hierarchy by clustering per-class mean features extracted from a foundation model such as CLIP, and show that this latent hierarchy performs as well as WordNet.
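A minimal sketch of that construction, assuming per-class mean features are already extracted: recursively split the classes with K-means, so each recursion level becomes one level of the latent taxonomy. The tiny `kmeans` below is a toy stand-in (e.g. for `sklearn.cluster.KMeans`), and the features are made-up 2-D points rather than CLIP embeddings.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Tiny Lloyd's-algorithm stand-in; returns a cluster id per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = x[assign == j].mean(0)
    return assign

def build_hierarchy(class_ids, feats, k=2, depth=2):
    """Recursively split classes by K-means; returns a nested list of ids."""
    if depth == 0 or len(class_ids) <= k:
        return list(class_ids)
    assign = kmeans(feats, k)
    return [build_hierarchy([c for c, a in zip(class_ids, assign) if a == j],
                            feats[assign == j], k, depth - 1)
            for j in range(k)]

# Toy per-class mean features: two well-separated groups of classes.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
tree = build_hierarchy(["dog", "cat", "car", "truck"], feats)
print(tree)
```

LCA distances can then be read off this nested structure exactly as with WordNet.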
Using soft labels from latent hierarchies generated by VLMs yields better OOD results than those generated by VMs, suggesting that VLMs have a more human-aligned feature distribution, i.e., the hierarchies derived from their features better match a human-world ontology such as WordNet.
@inproceedings{shilca,
  title={LCA-on-the-Line: Benchmarking Out of Distribution Generalization with Class Taxonomies},
  author={Shi, Jia and Gare, Gautam Rajendrakumar and Tian, Jinjin and Chai, Siqi and Lin, Zhiqiu and Vasudevan, Arun Balajee and Feng, Di and Ferroni, Francesco and Kong, Shu},
  booktitle={Forty-first International Conference on Machine Learning},
  year={2024}
}