[Online] Can you memorize the images you just encountered? A study of visual recognition memory


Memory researchers have long tried to predict whether, and how well, an image will be remembered. Some have proposed that memorability is an intrinsic property of the image itself (Bainbridge, 2019), while others have argued that the context in which the image is learned also matters (Konkle et al., 2010). In particular, a stimulus that faces interference from more competitors in the same category is remembered less well. Although both the intrinsic and the extrinsic effects are supported by a large body of empirical evidence, neither has been framed within a mechanistic model of recognition memory. Here we propose applying the global matching model, which postulates that recognition is based on the summed similarity between the representation of the target stimulus and the representations of all other stimuli in memory. Although the model has been shown to capture both intrinsic and contextual effects (Nosofsky et al., 2011), it is limited by the difficulty of generating accurate representations for common images. Existing evidence suggests that representations generated by deep learning networks are a promising remedy for this limitation.
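The summed-similarity idea can be illustrated with a minimal sketch. It is not the study's implementation: the exponential similarity kernel, the sensitivity parameter `c`, the criterion `k`, and the two-dimensional toy vectors are all assumptions made here for illustration. In the study itself the vectors would come from a CNN.

```python
import math

def similarity(x, y, c=1.0):
    """Similarity that decays exponentially with Euclidean distance (assumed kernel)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return math.exp(-c * dist)

def familiarity(probe, memory, c=1.0):
    """Global match: similarity of the probe summed over every stored item."""
    return sum(similarity(probe, item, c) for item in memory)

def old_probability(probe, memory, c=1.0, k=1.0):
    """Probability of responding 'old'; k is a hypothetical criterion parameter."""
    f = familiarity(probe, memory, c)
    return f / (f + k)

# Toy vectors (hypothetical): one studied exemplar versus three from the same category.
short_list = [[0.0, 0.0]]
long_list = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
lure = [0.05, 0.05]  # unseen image from the studied category

print(old_probability(lure, short_list), old_probability(lure, long_list))
```

Under these assumptions, an unseen lure from a category with more studied exemplars accumulates more summed similarity, so the sketch predicts more false alarms for longer category lists.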

Research Questions / Hypotheses

This study aims to advance the theoretical understanding of variations in visual memorability by

  1. replicating the category length effect, in which memory performance decreases as the number of studied exemplars from the same category increases;
  2. examining how well the global matching model, with CNN representations as its input, predicts variations in memory performance (average hit rate, HR, and false alarm rate, FAR, in the human data) across image categories.
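One simple way to quantify how well model predictions track human performance across categories is a correlation between predicted and observed rates. The sketch below is an assumption about how such a comparison could be done, not the study's stated analysis; the function name and the toy numbers are hypothetical and are not data from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Toy per-category values for illustration only (NOT results from the study).
toy_predicted_hr = [0.60, 0.70, 0.80, 0.90]
toy_observed_hr = [0.55, 0.72, 0.78, 0.88]

r = pearson_r(toy_predicted_hr, toy_observed_hr)
print(r)
```

The same comparison could be run separately for HR and FAR, with one value per image category.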


Forty-four participants completed the study; five further participants did not show up, for unclear reasons.


We asked participants to perform an online recognition task that required them to study and remember a large set of common images. After a simple math distraction task, participants completed a series of repeat-detection trials, indicating for each image whether or not they had seen it before. Three types of stimulus could appear in the test phase: a target (an image that was presented in the study phase), a related lure (an unseen image that belongs to the same category as a target), or an unrelated lure (an unseen image that does not belong to any of the categories seen in the study phase).
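The hit rate (HR) and false alarm rate (FAR) follow directly from these three trial types. The sketch below assumes a hypothetical trial format, a list of `(trial_type, said_old)` pairs, and uses toy trials rather than data from the study: the proportion of "old" responses is the HR for targets and the FAR for each lure type.

```python
def old_response_rates(trials):
    """Proportion of 'old' responses per trial type.

    For targets this is the hit rate; for lures it is the false alarm rate.
    `trials` is a list of (trial_type, said_old) pairs (assumed format).
    """
    counts = {}
    for trial_type, said_old in trials:
        n_old, n_total = counts.get(trial_type, (0, 0))
        counts[trial_type] = (n_old + int(said_old), n_total + 1)
    return {t: n_old / n_total for t, (n_old, n_total) in counts.items()}

# Toy trials for illustration only (NOT data from the study).
toy_trials = [
    ("target", True), ("target", True), ("target", False), ("target", True),
    ("related_lure", True), ("related_lure", False),
    ("related_lure", False), ("related_lure", False),
    ("unrelated_lure", False), ("unrelated_lure", False),
]

rates = old_response_rates(toy_trials)
print(rates)
```

Keeping related and unrelated lures separate makes it possible to compare false alarms driven by category similarity with baseline false alarms.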


This is an ongoing study. Data analysis will mainly involve modelling the data with a variety of CNNs (e.g., ResNet50) and computing global matching similarity.


The results of the study will speak to the mechanisms of human recognition memory. The study has the potential to advance our understanding of how the cognitive model predicts memorability and how computational models can be refined to predict memorability scores. Such advances have real-life implications for image selection (e.g., which images educators should select to promote memorization).