Navigating the Complexity of Hateful Meme Detection

26 Apr 2024


(1) Rui Cao, Singapore Management University;

(2) Ming Shan Hee, Singapore University of Design and Technology;

(3) Adriel Kuek, DSO National Laboratories;

(4) Wen-Haw Chong, Singapore Management University;

(5) Roy Ka-Wei Lee, Singapore University of Design and Technology;

(6) Jing Jiang, Singapore Management University.

Abstract and Introduction

Related Work


Proposed Method


Conclusion and References


Memes, typically intended to be humorous or sarcastic, are increasingly being exploited to spread hateful content, which has given rise to the challenging task of online hateful meme detection [5, 12, 27]. To combat the spread of hateful memes, one line of work treats hateful meme detection as a multimodal classification task: researchers take pre-trained vision-language models (PVLMs) and fine-tune them on meme detection data [20, 26, 34, 37]. To improve performance, some have tried model ensembling [20, 26, 34]. Another line of work combines pre-trained models (e.g., BERT [4] and CLIP [29]) with task-specific model architectures and tunes them end-to-end [13, 14, 28]. Recently, the authors of [2] converted all meme information into text and prompted language models, to better leverage the contextual background knowledge present in language models. This approach achieves state-of-the-art results on two hateful meme detection benchmarks. However, it describes the image with a generic image-captioning method, which often ignores factors that are important for hateful meme detection. In this work, we seek to address this issue through probe-based captioning: we prompt pre-trained vision-language models with hateful content-centric questions in a zero-shot visual question answering (VQA) manner.
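The probe-based captioning idea can be illustrated with a minimal sketch. This is not the authors' implementation: the probing questions, the function names (`probe_captions`, `build_text_input`, `stub_vqa`), and the `" . "` separator are all hypothetical, and the vision-language model is replaced by a stub that returns canned answers. In practice, any frozen model that answers a question about an image could be passed in as the `vqa` callable.

```python
from typing import Callable, Dict, List

# Hypothetical probing questions. The text only says the questions are
# "hateful content-centric"; this particular list is an assumption.
PROBE_QUESTIONS: Dict[str, str] = {
    "race": "What is the race of the person in the image?",
    "gender": "What is the gender of the person in the image?",
    "religion": "What is the religion of the person in the image?",
}

# Any callable mapping (image, question) -> answer can stand in for a
# frozen vision-language model queried in a zero-shot VQA manner.
VqaFn = Callable[[object, str], str]


def probe_captions(image: object, vqa: VqaFn) -> List[str]:
    """Ask every probing question and keep the non-empty answers."""
    answers = []
    for question in PROBE_QUESTIONS.values():
        answer = vqa(image, question).strip()
        if answer:
            answers.append(answer)
    return answers


def build_text_input(meme_text: str, generic_caption: str, probes: List[str]) -> str:
    """Join meme text, generic caption, and probe answers into one string."""
    parts = [meme_text, generic_caption, *probes]
    return " . ".join(p for p in parts if p)


def stub_vqa(image: object, question: str) -> str:
    """Toy stand-in for a real model: returns canned answers (assumption)."""
    canned = {
        "What is the gender of the person in the image?": "a woman",
    }
    return canned.get(question, "")


if __name__ == "__main__":
    probes = probe_captions(image=None, vqa=stub_vqa)
    print(build_text_input("meme text here", "a person standing outside", probes))
```

The resulting string combines the meme text, a generic caption, and the probe answers, so a text-only language model can see the image details that a generic caption tends to omit.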

This paper is available on arXiv under a CC 4.0 license.