Institution

University of California, Davis

A public research university with active groups in natural language processing, multimodal learning, and trustworthy AI.

Multimodal Models · University of California, Davis

When Vision Speaks for Sound: The Audio-Visual Clever Hans Effect

Top video models appear to listen to the audio track, but in practice they guess it from the visual frames. The paper's THUD probes expose this shortcut, and a 10K-sample fix lifts audio grounding by 28 points.