Fake or News? Individual Differences in the Detection of Deepfake Videos

Background

Realistic fake videos generated through artificial intelligence (AI), known as deepfakes, are a relatively new and potentially dangerous form of online disinformation, challenging our trust in visual information as a reliable indicator of truth. Much research in recent years has focused on the development of AI methods of deepfake detection; human ability to detect deepfakes, however, remains almost entirely unexplored. Recent studies have explored the influence of personality on both visual and auditory perception. In particular, higher trait-level positive schizotypy has been found to be associated with poorer performance in visual discrimination tasks involving noisy stimuli.

Research Questions / Hypotheses

We used a video stimulus set drawn from the Deepfake Detection Challenge database to investigate (1) the extent to which humans can detect state-of-the-art deepfakes; and (2) whether personality influences the ability to do so. We hypothesised that higher positive schizotypy would be associated with poorer detection performance.

Participants

A total of 104 participants undertook the study. Pairwise exclusion was applied to participants who had viewed less than 80% of the stimuli in the pairs and/or singles detection tasks. On this basis, one participant was excluded from the singles analyses and six from the pairs analyses.
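The exclusion rule can be illustrated with a minimal sketch. It assumes the 10 pairs and 50 single videos described in the Methods; the record layout and the names `ViewingRecord` and `included_in` are hypothetical and are not taken from the study's own analysis code.

```python
from dataclasses import dataclass

# Stimulus counts as stated in the Methods; the threshold is the 80% rule.
TASK_TOTALS = {"pairs": 10, "singles": 50}
MIN_VIEWED_PERCENT = 80


@dataclass
class ViewingRecord:
    """Hypothetical per-participant record of how many stimuli were viewed."""
    participant_id: str
    pairs_viewed: int
    singles_viewed: int


def included_in(records, task):
    """Return IDs of participants retained for one task's analyses.

    Exclusion is pairwise: a participant dropped from one task
    can still contribute data to the other.
    """
    if task not in TASK_TOTALS:
        raise ValueError(f"unknown task: {task!r}")
    total = TASK_TOTALS[task]
    kept = []
    for record in records:
        viewed = record.pairs_viewed if task == "pairs" else record.singles_viewed
        # Integer comparison avoids floating-point edge cases at exactly 80%.
        if viewed * 100 >= MIN_VIEWED_PERCENT * total:
            kept.append(record.participant_id)
    return kept
```

Because each task is filtered separately, the retained sample can differ between the pairs and singles analyses, which is consistent with the different exclusion counts reported above.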

Methods

Participants completed a 60-minute online survey. First, they were asked to identify the fake in each of 10 pairs of videos, each pair comprising an original and its manipulated version; second, they were asked to indicate whether each of 50 single videos was real or fake. Participants also completed the Big Five Aspect Scales (BFAS) as a measure of the Big Five personality dimensions, and the short Oxford-Liverpool Inventory of Feelings and Experiences (O-LIFE) as a measure of trait-level schizotypal sub-types.

Results

We found people could identify deepfakes approximately 63% of the time, but there was a wide range of ability with some people performing very poorly (at chance) and some achieving above 80% accuracy. Based on qualitative data, people who performed comparatively poorly in the detection tasks appeared to be relying on video characteristics which were not relevant to the authenticity judgement. Overall, participants found identification of fakes more challenging in poorer quality videos. No relationship between positive schizotypy and detection performance was identified. We did, however, find a negative association between Introvertive Anhedonia (a negative schizotypal trait) and detection performance in the pairs task.
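To make "at chance" concrete, the sketch below computes a one-sided exact binomial p-value for a single participant's score, assuming each judgement is a two-alternative choice with a 50% guessing rate. The source does not state how chance-level performance was assessed, so this is an illustrative assumption rather than the study's method, and the function name `p_value_above_chance` is hypothetical.

```python
from math import comb


def p_value_above_chance(correct, n_trials, chance=0.5):
    """One-sided exact binomial p-value: P(X >= correct) under pure guessing.

    Assumes independent trials with the same guessing probability,
    e.g. real/fake judgements on the 50 single videos.
    """
    if not 0 <= correct <= n_trials:
        raise ValueError("correct must be between 0 and n_trials")
    return sum(
        comb(n_trials, k) * chance**k * (1 - chance) ** (n_trials - k)
        for k in range(correct, n_trials + 1)
    )


# Example: a participant with 25 of 50 correct versus one with 40 of 50.
at_chance = p_value_above_chance(25, 50)
high_scorer = p_value_above_chance(40, 50)
```

A large p-value means the score cannot be distinguished from guessing, whereas a very small one indicates performance reliably above chance. With only 10 trials in the pairs task, the same test has much less power to separate the two.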

Implications

Results suggest people struggle to identify deepfakes and are therefore vulnerable to this form of disinformation. Human performance was found to be broadly comparable to current AI detection rates. Around 2% of our sample were "super-detectors" who were highly sensitive to artefacts in the stimuli and were correct around 80% of the time. At the other end of the spectrum were people who performed at chance level and who either did not notice artefacts or treated them as less important to their veracity judgement, focusing instead on the credibility of the speaker, which was not relevant to the task. The extent to which expertise influenced performance was not a focus of the study but would be interesting to explore.

The absence of a relationship between positive schizotypy and detection performance may be due to the complexity of our task. The work most relevant to our study is that of Partos et al. (2016), who found that those higher in trait-level positive schizotypy made more errors in a discrimination task using face stimuli embedded in noise. Based on this earlier work, we speculated that those higher in the trait might also make more errors in our detection tasks, in the form of both false alarms and misses. We did not find this to be the case. Although our tasks also involved determining whether a signal was present in an inherently noisy stimulus, they were considerably more complex, requiring consideration of a range of factors including consistency in lighting and sound as well as comparison of facial features across frames.

Our results indicated that people higher in trait Introvertive Anhedonia performed more poorly on the detection tasks. Introvertive Anhedonia is a measure of an individual’s solitariness and capacity to experience sensory pleasure (Claridge et al., 1996). Previous research has found that Social Anhedonia (a comparable construct) is associated with deficits in the processing of emotional expression in faces, and that these deficits may be due to the adoption of a different approach to processing facial stimuli: face perception is understood to rely mainly on configural processing, whereas higher Introvertive Anhedonia is associated with a tendency to process visual stimuli using a featural rather than configural approach. We might expect both approaches to be important in our detection tasks: configural processing to identify anomalies in the spatial relationships between facial features, and featural processing to identify oddities in individual features. This finding nonetheless highlights the importance of the configural approach in the task. Further, it supports the view that higher Introvertive Anhedonia might be associated with greater vulnerability to deepfakes, which opens up new and interesting lines of inquiry.