What Moves the Eyes: Doubling Mechanistic Model Performance Using Deep Networks to Discover and Test Cognitive Hypotheses
Abstract
Understanding how humans move their eyes to gather visual information is a central question in neuroscience, cognitive science, and vision research. While recent deep learning (DL) models achieve state-of-the-art performance in predicting human scanpaths, their underlying decision processes remain opaque. At the opposite end of the modeling spectrum, cognitively inspired mechanistic models aim to explain scanpath behavior through interpretable cognitive mechanisms but lag far behind in predictive accuracy. In this work, we bridge this gap by using a high-performing deep model, DeepGaze III, to discover and test mechanisms that improve a leading mechanistic model, SceneWalk. By identifying individual fixations where DeepGaze III succeeds and SceneWalk fails, we isolate behaviorally meaningful discrepancies and use them to motivate targeted extensions of the mechanistic framework. These include time-dependent temperature scaling, saccadic momentum, and an adaptive cardinal attention bias: simple, interpretable additions that substantially boost predictive performance. With these extensions, SceneWalk's explained variance on the MIT1003 dataset doubles from 35% to 70%, setting a new state of the art in mechanistic scanpath prediction. Our findings show how performance-optimized neural networks can serve as tools for cognitive model discovery, offering a new path toward interpretable and high-performing models of visual behavior.
TL;DR
A systematic fixation-level comparison of a performance-optimized DNN scanpath model and a mechanistic cognitive model reveals behaviorally relevant mechanisms that can be added to the mechanistic model to substantially improve performance.
Introduction
Science often faces a choice: build models primarily designed to predict, or models that compactly explain. But what if we used them in synergy?
Our paper tackles this head-on. We combine a deep network (DeepGaze III) with an interpretable mechanistic model (SceneWalk).
💡 Our idea: Use the deep model not just to chase performance, but as a tool for scientific discovery.
We isolate "controversial fixations": fixations where DeepGaze's likelihood vastly exceeds SceneWalk's. These reveal where the mechanistic model fails to capture predictable patterns.
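As a concrete illustration, the selection step can be as simple as ranking fixations by the difference in per-fixation log-likelihood under the two models. A minimal sketch in Python, assuming per-fixation log-likelihoods have already been evaluated and saved (the file names and the cutoff are hypothetical, not from the paper):

import numpy as np

# Hypothetical files: per-fixation log-likelihoods of the same scanpaths
# under each model, aligned index by index.
ll_deepgaze = np.load("ll_deepgaze3.npy")   # shape: (n_fixations,)
ll_scenewalk = np.load("ll_scenewalk.npy")  # shape: (n_fixations,)

# A fixation is "controversial" when DeepGaze III assigns it far more
# probability than SceneWalk does.
delta = ll_deepgaze - ll_scenewalk
most_controversial = np.argsort(delta)[::-1][:500]  # inspect these first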
From these systematic failures, we identified three critical mechanisms that SceneWalk was missing. The data pointed to known cognitive principles, but revealed critical new nuances. Our method showed us not just what was missing, but how to formulate it to match human behavior. 👇
New Mechanisms
Time-dependent temperature scaling: We found that DeepGaze shows higher confidence (lower entropy) early on (a) and predicts fixations at more salient locations (b), in agreement with the empirical data. SceneWalk does not show this effect. To address this, we introduced fixation-index-dependent temperature scaling (c), modeled with an exponential decay (d), which improves predictions for both early and late fixations (e).
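To make the idea concrete, here is a minimal sketch of such a scaling. The specific parameterization (a temperature that starts below 1, sharpening early predictions, and relaxes exponentially toward 1) and all parameter values are illustrative assumptions, not the fitted values from the paper:

import numpy as np

def temperature(fix_index, tau0=0.6, rate=0.3):
    # Starts at tau0 (< 1 sharpens the density, mirroring the higher
    # confidence on early fixations) and decays exponentially toward 1,
    # i.e. no rescaling late in the scanpath.
    return 1.0 - (1.0 - tau0) * np.exp(-rate * fix_index)

def apply_temperature(density, fix_index):
    # Raise a 2D fixation density to the power 1/T and renormalize so it
    # remains a proper probability distribution.
    scaled = density ** (1.0 / temperature(fix_index))
    return scaled / scaled.sum()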
Saccadic momentum: Controversial fixations suggested that DeepGaze sometimes prefers saccades that continue in the same direction. We confirmed such a saccadic momentum effect, especially after long saccades (a) and later in scanpaths (b). These effects remain even after controlling for the distribution of salient objects (dashed lines), indicating a genuine directional bias. We added a saccadic momentum and return mechanism to SceneWalk (c), modulated by previous saccade length (d) and fixation index (e), improving predictions for the "ongoing saccades" controversial fixations (f).
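One way to express such a mechanism is as a multiplicative directional weight map over candidate fixation locations. The sketch below uses a von Mises bump around the previous saccade direction plus a weaker bump in the opposite direction for return saccades; the functional form and all parameter values are our illustrative assumptions, not the paper's fitted model:

import numpy as np

def momentum_weights(shape, fix_xy, prev_angle, prev_len, fix_index,
                     kappa=2.0, len_scale=100.0, idx_scale=5.0, w_return=0.3):
    # Angle from the current fixation to every pixel in the image.
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    angle = np.arctan2(ys - fix_xy[1], xs - fix_xy[0])
    # Momentum: prefer continuing in the previous saccade direction;
    # return: a weaker preference for going straight back.
    momentum = np.exp(kappa * np.cos(angle - prev_angle))
    back = np.exp(kappa * np.cos(angle - prev_angle - np.pi))
    # Strength grows with previous saccade length and with fixation index,
    # as the controversial-fixation analysis suggests.
    strength = (1 - np.exp(-prev_len / len_scale)) * (1 - np.exp(-fix_index / idx_scale))
    weights = 1 + strength * (momentum + w_return * back)
    return weights / weights.mean()  # multiply into SceneWalk's priority map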
Horizontal and leftward attention bias: Controversial fixations suggested that DeepGaze predicts early saccades to go to the left. This is indeed the case in the data (a) and in the DeepGaze predictions (b). The effect persists when DeepGaze is run with uniform saliency maps (c), while SceneWalk (d) shows an effect that is too weak and increases over time. We therefore added a cardinal attention bias to SceneWalk that can additionally have a left asymmetry (e) and can adapt over time (f, g). This improves predictions on the relevant controversial fixations (f).
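A simple way to realize such a bias is a directional prior over saccade angles built from von Mises components at the cardinal directions, with an extra leftward component whose weight relaxes over the scanpath. Again, the exact form and all values below are illustrative assumptions:

import numpy as np

def cardinal_bias(angle, fix_index, kappa=1.5, w_horiz=1.0, w_vert=0.3,
                  left0=0.8, rate=0.4):
    # Horizontal (0 and pi) and vertical (+-pi/2) von Mises components.
    horiz = np.exp(kappa * np.cos(angle)) + np.exp(kappa * np.cos(angle - np.pi))
    vert = np.exp(kappa * np.cos(angle - np.pi / 2)) + np.exp(kappa * np.cos(angle + np.pi / 2))
    # Extra weight on the leftward direction (angle pi), strongest for
    # early fixations and decaying as the scanpath unfolds.
    left = np.exp(kappa * np.cos(angle - np.pi))
    prior = w_horiz * horiz + w_vert * vert + left0 * np.exp(-rate * fix_index) * left
    return prior / prior.sum()  # normalize over the discretized angles

# Example: evaluate the prior on a grid of saccade directions for fixation 1.
angles = np.linspace(-np.pi, np.pi, 360, endpoint=False)
prior = cardinal_bias(angles, fix_index=1)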
Results
These three mechanisms double SceneWalk's explained variance on the MIT1003 dataset (from 35% to 70%)! We closed over 56% of the gap to deep networks, setting a new state of the art for mechanistic scanpath prediction.
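For readers who want to check the gap figure: "fraction of the gap closed" is measured relative to the deep model's explained variance. A back-of-the-envelope sketch (the deep-model number below is implied by the stated figures, not quoted from the paper):

# gap_closed = (ev_new - ev_old) / (ev_deep - ev_old)
ev_old, ev_new, gap_closed = 0.35, 0.70, 0.56
# Solving for the deep model's explained variance implied by "over 56%":
ev_deep_implied = ev_old + (ev_new - ev_old) / gap_closed
print(f"implied deep-model explained variance: {ev_deep_implied:.2f}")  # ~0.97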
Conceptually: Deep neural networks should be viewed as scientific instruments. They tell us what is predictable in human behavior. We then use that information to ask why, building fully interpretable models that approach the performance of their black-box counterparts.
BibTeX
@inproceedings{dagostino2025what,
  title={What Moves the Eyes: Doubling Mechanistic Model Performance Using Deep Networks to Discover and Test Cognitive Hypotheses},
  author={D'Agostino, Federico and Schwetlick, Lisa and Bethge, Matthias and Kümmerer, Matthias},
  booktitle={Advances in Neural Information Processing Systems},
  year={2025}
}