What Moves the Eyes: Doubling Mechanistic Model Performance Using Deep Networks to Discover and Test Cognitive Hypotheses

Tübingen AI Center, University of Tübingen
EPFL

*Shared Senior Authors

Abstract

Understanding how humans move their eyes to gather visual information is a central question in neuroscience, cognitive science, and vision research. While recent deep learning (DL) models achieve state-of-the-art performance in predicting human scanpaths, their underlying decision processes remain opaque. At the opposite end of the modeling spectrum, cognitively inspired mechanistic models aim to explain scanpath behavior through interpretable cognitive mechanisms but lag far behind in predictive accuracy. In this work, we bridge this gap by using a high-performing deep model, DeepGaze III, to discover and test mechanisms that improve a leading mechanistic model, SceneWalk. By identifying individual fixations where DeepGaze III succeeds and SceneWalk fails, we isolate behaviorally meaningful discrepancies and use them to motivate targeted extensions of the mechanistic framework. These include time-dependent temperature scaling, saccadic momentum, and an adaptive cardinal attention bias: simple, interpretable additions that substantially boost predictive performance. With these extensions, SceneWalk's explained variance on the MIT1003 dataset doubles from 35% to 70%, setting a new state of the art in mechanistic scanpath prediction. Our findings show how performance-optimized neural networks can serve as tools for cognitive model discovery, offering a new path toward interpretable and high-performing models of visual behavior.

TL;DR

A systematic fixation-level comparison of a performance-optimized DNN scanpath model and a mechanistic cognitive model reveals behaviourally relevant mechanisms that can be added to the mechanistic model to substantially improve performance.



Overview
(a) We systematically compare the prediction performance of a mechanistic scanpath model (SceneWalk) to that of a high-performing DNN model (DeepGaze III) to find fixations that could be predicted well but are not by the mechanistic model. Inspecting these extreme cases results in ideas for effects that can be confirmed through further analyses and give rise to mechanisms that are added to the mechanistic model. (b) This process yields three new mechanisms that double SceneWalk's predictive performance on MIT1003, substantially narrowing the explainability gap.

Video Presentation

Introduction

Science often faces a choice:
Build models primarily designed to predict, or models that compactly explain. But what if we used them in synergy?

Our paper tackles this head-on. We combine a deep network (DeepGaze III) with an interpretable mechanistic model (SceneWalk).

💡 Our idea: Use the deep model not just to chase performance, but as a tool for scientific discovery.
We isolate "controversial fixations" where DeepGaze's likelihood vastly exceeds SceneWalk's. These reveal where the mechanistic model fails to capture predictable patterns.


Controversial fixations
Controversial Fixations: the fixations in the dataset where SceneWalk loses the most performance compared to DeepGaze III in terms of LLD or WLLD. Numbers above the predictions indicate the log-likelihood achieved for the fixation in question, relative to a uniform model.
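
To make the selection criterion concrete, here is a minimal sketch, assuming per-fixation log-likelihoods (relative to a uniform baseline) have already been computed for both models; the function name, the top-k cutoff, and the toy numbers are illustrative, not the paper's exact pipeline.

import numpy as np

def controversial_fixations(ll_deepgaze, ll_scenewalk, k=50):
    # Per-fixation log-likelihood difference (LLD): how much SceneWalk
    # loses relative to DeepGaze III on each individual fixation.
    lld = np.asarray(ll_deepgaze) - np.asarray(ll_scenewalk)
    order = np.argsort(lld)[::-1]        # largest discrepancies first
    return order[:k], lld[order[:k]]

# Toy example (log-likelihoods in bits relative to a uniform model):
idx, gaps = controversial_fixations([2.1, 0.4, 3.0], [1.9, 0.5, 0.2], k=2)
print(idx, gaps)  # fixation 2 shows the largest gap (2.8 bits)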


From these systematic failures, we isolated three mechanisms SceneWalk was missing. The data pointed to known cognitive principles, but revealed important new nuances. Our method showed us not just what was missing, but how to formulate it to match human behavior. 👇


New mechanisms
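
As a rough illustration of how the three mechanisms named above could act on a predicted fixation density, here is a hedged sketch; the functional forms and parameters (the temperature schedule, the momentum concentration kappa, the cardinal weight w_cardinal) are assumptions for exposition, not the fitted SceneWalk equations.

import numpy as np

def temperature_scale(p, fixation_index, t0=1.5, decay=0.2):
    # Time-dependent temperature scaling: flatten the density early in the
    # scanpath and let it sharpen over time (assumed exponential schedule).
    temperature = 1.0 + (t0 - 1.0) * np.exp(-decay * fixation_index)
    q = p ** (1.0 / temperature)
    return q / q.sum()

def directional_bias(shape, prev_fix, prev_angle, kappa=2.0, w_cardinal=0.5):
    # Saccadic momentum plus cardinal attention bias: up-weight locations that
    # continue the previous saccade direction or lie along horizontal/vertical axes.
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    angles = np.arctan2(ys - prev_fix[0], xs - prev_fix[1])
    momentum = np.exp(kappa * np.cos(angles - prev_angle))              # von Mises-like lobe
    cardinal = 1.0 + w_cardinal * 0.5 * (1.0 + np.cos(4.0 * angles))    # peaks at 0/90/180/270 deg
    return momentum * cardinal

def apply_mechanisms(p, fixation_index, prev_fix, prev_angle):
    # Combine the mechanisms multiplicatively and renormalize to a probability map.
    p = temperature_scale(p, fixation_index)
    p = p * directional_bias(p.shape, prev_fix, prev_angle)
    return p / p.sum()

In the actual model, such terms would be fitted jointly with the existing SceneWalk parameters rather than set by hand as above.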

Results

These three mechanisms double SceneWalk's explained variance on the MIT1003 dataset (from 35% to 70%)! That closes over 56% of the gap to deep networks and sets a new state of the art for mechanistic scanpath prediction.
Conceptually: Deep neural networks should be viewed as scientific instruments. They tell us what is predictable in human behavior. We then use that information to ask why, building fully interpretable models that approach the performance of their black-box counterparts.
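
A small sketch of the metric behind these numbers, assuming it is an information-gain-style ratio of mean log-likelihoods; the gold-standard reference and the DeepGaze III score in the toy calculation are hypothetical placeholders, not values from the paper.

import numpy as np

def fraction_explained(ll_model, ll_baseline, ll_gold):
    # Share of the explainable log-likelihood gain (gold standard over a uniform
    # baseline) that a given model captures, all in mean bits per fixation.
    return (np.mean(ll_model) - np.mean(ll_baseline)) / (np.mean(ll_gold) - np.mean(ll_baseline))

# Toy gap-closure calculation with a hypothetical DeepGaze III score:
scenewalk_old, scenewalk_new, deepgaze = 0.35, 0.70, 0.98   # fractions explained
print((scenewalk_new - scenewalk_old) / (deepgaze - scenewalk_old))  # share of the gap closed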



Results
Contributions of the different added mechanisms on the MIT1003 and DAEMONS datasets.

BibTeX

@inproceedings{dagostino2025what,
  title={What Moves the Eyes: Doubling Mechanistic Model Performance Using Deep Networks to Discover and Test Cognitive Hypotheses},
  author={D'Agostino, Federico and Schwetlick, Lisa and Bethge, Matthias and Kümmerer, Matthias},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2025}
}