An intensively collected deepfake dataset in-the-wild with metadata
In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM ’23)
In this paper, we thoroughly investigate deepfake videos across various platforms and introduce a newly collected deepfake dataset, called RWDF-23. Additionally, we conduct insightful analyses on different aspects of the data, considering the perspectives of both creators and viewers. We believe that our findings significantly contribute to future research by enabling the development of more robust and effective deepfake detection techniques in real-world scenarios, which are often challenging and unpredictable from various perspectives.
We provide the most comprehensive, diverse, and up-to-date dataset of real-world deepfake videos, collected from YouTube, TikTok, Bilibili, and Reddit using queries in four different languages. The videos were created in 21 different countries and reflect the latest deepfake generation methods.
We perform an in-depth empirical analysis and characterization of how non-pornographic real-world deepfakes are created, used, and shared for different purposes across platforms, countries, races, and genders.
We lay the initial groundwork for capturing and analyzing user responses and reactions to deepfakes from multiple perspectives, considering factors such as audience impression, sentiment, and the impact of deepfakes on trust and perception.