Figure 1: VLMs fail on six counting tasks (a–e & g) and one low-level vision task (f).
Figure 2: Given a subject (e.g., the Adidas logo), we first confirm that all VLMs have sufficient knowledge about the subject via ID and counting sanity-check questions (a). Then, we test VLMs on the counterfactual image (b) and report their accuracy on the counting tasks (Q1 & Q2) and a Y/N identification task (Q3). For all tasks, we test the hypothesis that visual bias cues in the background (c) may be so strong that they cause VLMs to ignore the modified object and default to biased answers.
Figure 3: VLMs fail to detect subtle changes in counterfactuals (CF) and default to biased answers.
We investigate how memorized knowledge in large language models (LLMs), while useful for many downstream tasks, can also bias vision-language models (VLMs) and lead them toward incorrect answers on otherwise straightforward visual recognition problems. In this work, we specifically examine how prior knowledge about popular subjects interferes with VLM accuracy on objective visual tasks such as counting and identification.
Our findings reveal that state-of-the-art VLMs exhibit striking biases. For example, when shown a counterfactual Adidas logo with four stripes instead of the usual three, the models often fail to recognize the alteration. On average, they achieve only 17.05% accuracy on counting tasks (such as enumerating stripes, bars, or patterns) across seven diverse domains, including animals, logos, chess, board games, optical illusions, and patterned grids. When textual hints, such as inserting the word “Adidas”, are added to these counterfactual images, VLM performance drops even further, indicating a strong pull of memorized associations over actual visual evidence.
Attempts to mitigate these biases through prompting strategies, such as instructing models to double-check answers or to rely solely on image content, improve counting accuracy by only about +2 percentage points on average, demonstrating how deeply ingrained these biases are.
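For illustration, the debiasing instructions we refer to take roughly the following form. This is a minimal sketch: the prompt wording below is illustrative only, not our exact prompts, which are available on the project page.

```python
# Illustrative counting prompt plus debiasing variants (wording is
# hypothetical; see the project page for the exact prompts used).
BASE_PROMPT = (
    "How many stripes does the logo in this image have? "
    "Answer with a single number."
)

DEBIAS_SUFFIXES = {
    "double_check": "Double-check your answer by recounting before responding.",
    "image_only": (
        "Rely only on what is visible in the image, "
        "not on prior knowledge about the subject."
    ),
}

def build_prompt(variant: str | None = None) -> str:
    """Compose the counting question with an optional debiasing instruction."""
    if variant is None:
        return BASE_PROMPT
    return f"{BASE_PROMPT} {DEBIAS_SUFFIXES[variant]}"
```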
Our work not only uncovers a novel failure mode in VLMs but also introduces an automated framework for testing VLM biases, providing a foundation for deeper investigation into the intersection of memorization, reasoning, and perception in multimodal AI systems.
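As a rough sketch of what such a framework does, the core evaluation loop pairs each counterfactual image with its counting question and scores the model against the count actually shown in the image. The `query_vlm` function below is a hypothetical stand-in for whichever VLM API is under test; the real pipeline also covers the Y/N identification task and more robust answer parsing.

```python
import re
from dataclasses import dataclass

@dataclass
class CounterfactualExample:
    image_path: str    # counterfactual image, e.g. a four-stripe "Adidas" logo
    question: str      # counting question (Q1/Q2 in Figure 2)
    ground_truth: int  # the count actually shown in the modified image

def query_vlm(image_path: str, prompt: str) -> str:
    """Hypothetical stand-in: send (image, prompt) to the VLM under test."""
    raise NotImplementedError("Wire this to your VLM API of choice.")

def parse_count(answer: str) -> int | None:
    """Extract the first integer from the model's free-form answer."""
    match = re.search(r"\d+", answer)
    return int(match.group()) if match else None

def counting_accuracy(examples: list[CounterfactualExample]) -> float:
    """Fraction of counterfactual images whose count the VLM gets right."""
    correct = 0
    for ex in examples:
        predicted = parse_count(query_vlm(ex.image_path, ex.question))
        correct += int(predicted == ex.ground_truth)
    return correct / len(examples)
```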
Code, data, and an interactive demo are available at: https://vlmsarebiased.github.io
🌟 Try it yourself on the project page using our exact prompts and images!