AI-GenBench:

A New Ongoing Benchmark for AI-Generated Image Detection

Lorenzo Pellegrini1, Davide Cozzolino2, Serafino Pandolfini1, Davide Maltoni1,
Matteo Ferrara1, Luisa Verdoliva2, Marco Prati3, Marco Ramilli3

1 MI@BioLab - Department of Computer Science and Engineering, University of Bologna, Cesena, Italy
2 GRIP - Department of Electrical Engineering and Information Technologies, University of Naples Federico II, Naples, Italy
3 IdentifAI, Italy

Presented at Verimedia workshop, IJCNN 2025



Abstract

The rapid advancement of generative AI has revolutionized image creation, enabling high-quality synthesis from text prompts while raising critical challenges for media authenticity. We present AI-GenBench, a novel benchmark designed to address the urgent need for robust detection of AI-generated images in real-world scenarios. Unlike existing solutions that evaluate models on static datasets, AI-GenBench introduces a temporal evaluation framework where detection methods are incrementally trained on synthetic images, historically ordered by their generative models, to test their ability to generalize to new generative models, such as the transition from GANs to diffusion models. Our benchmark focuses on high-quality, diverse visual content and overcomes key limitations of current approaches, including arbitrary dataset splits, unfair comparisons, and excessive computational demands. AI-GenBench provides a comprehensive dataset, a standardized evaluation protocol, and accessible tools for both researchers and non-experts (e.g., journalists, fact-checkers), ensuring reproducibility while maintaining practical training requirements. By establishing clear evaluation rules and controlled augmentation strategies, AI-GenBench enables meaningful comparison of detection methods and scalable solutions. Code and data are publicly available to ensure reproducibility and to support the development of robust forensic detectors that keep pace with the rise of new synthetic generators.

Framework

Unlike traditional approaches that evaluate models on static datasets, AI-GenBench introduces a temporal evaluation framework for AI-generated image detection. In this setting, detection models are incrementally trained on synthetic images ordered by the historical release of their generative models. This setup tests how well detectors can generalize to new generation techniques, such as the transition from GANs to diffusion models.
The goal of AI-GenBench is to provide a benchmark for assessing the robustness of detection models across both past and future image generation methods. It includes training and evaluation datasets covering a wide range of image generators released between 2017 and 2024, spanning from early GANs to the latest diffusion-based models. The benchmark also offers a PyTorch Lightning–based framework for training and evaluating detection models, publicly released and maintained on GitHub.
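The sliding-window protocol described above can be sketched as follows. This is an illustrative, simplified sketch, not the official AI-GenBench code: the generator names, release years, and window size are placeholder assumptions, and the real benchmark covers many more generators released between 2017 and 2024.

```python
# Hypothetical subset of generators with their release years (illustrative only).
generators = [
    ("ProGAN", 2017),
    ("StyleGAN", 2018),
    ("StyleGAN2", 2019),
    ("Stable Diffusion", 2022),
    ("Midjourney v5", 2023),
    ("FLUX", 2024),
]

def sliding_windows(gens, window=2):
    """Yield (past, next) generator splits in historical order.

    At each step the detector is incrementally trained on all generators
    released so far (the "past period") and evaluated both on that period
    and on the unseen "next period" of newer generators.
    """
    ordered = sorted(gens, key=lambda g: g[1])
    for i in range(window, len(ordered), window):
        past = [name for name, _ in ordered[:i]]
        future = [name for name, _ in ordered[i:i + window]]
        yield past, future

for past, future in sliding_windows(generators):
    print(f"train on {past} -> evaluate on {future}")
```

The key property this captures is that evaluation always includes generators the detector has never seen, mimicking the real-world situation where new synthesis methods appear after a detector is deployed.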

Generators included in the framework

Leaderboard

This leaderboard reports evaluation results on the AI-GenBench benchmark. To submit a candidate algorithm for evaluation, please contact us! The only requirements are that both:

  • the method's codebase, and
  • a report or paper describing the method

are publicly available. Please note that you may freely use the dataset to train and evaluate your model without following the sliding-window benchmark protocol; however, only methods that follow the benchmark protocol will be included in the leaderboard. Below we report the Area Under the ROC Curve (AUROC) of the methods that have been evaluated on the benchmark so far.

Model Name        Author / Team                 Submission Date   # Parameters   AUROC (Past / Next / Whole Period)   References
ViT-L/14 DINOv2   Baseline from paper authors   Jul/2025          304M           99.1% / 94.2% / 97.9%
ViT-L/14 CLIP     Baseline from paper authors   Jul/2025          304M           98.1% / 92.0% / 97.0%
ResNet-50 CLIP    Baseline from paper authors   Jul/2025          38M            89.9% / 81.8% / 88.9%

BibTeX

If you use this benchmark and/or code in your research, please cite our paper:


@misc{pellegrini2025aigenbenchnewongoingbenchmark,
      title={AI-GenBench: A New Ongoing Benchmark for AI-Generated Image Detection}, 
      author={Lorenzo Pellegrini and Davide Cozzolino and Serafino Pandolfini and Davide Maltoni and Matteo Ferrara and Luisa Verdoliva and Marco Prati and Marco Ramilli},
      year={2025},
      eprint={2504.20865},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.20865}, 
}