Description
The Pornography database contains nearly 80 hours of video: 400 pornographic and 400 non-pornographic videos. For the pornographic class, we browsed websites that host only that kind of material (which settles, in a way, the question of purpose). The database covers several genres of pornography and depicts actors of many ethnicities, including multi-ethnic ones. For the non-pornographic class, we browsed general-purpose public video networks and selected two samples: 200 videos chosen at random (which we call "easy") and 200 videos selected through textual search queries such as "beach", "wrestling", and "swimming", which we knew would be particularly challenging for the detector (which we call "difficult"). In the figure below, we illustrate the diversity of the pornographic videos (top row) and the challenges of the "difficult" non-pornographic ones (middle row). The "easy" cases are shown in the bottom row. The huge diversity of cases in both the pornographic and non-pornographic videos makes this task very challenging.

Illustration of the diversity of the pornographic videos (top row), the challenges of the "difficult" non-pornographic ones (middle row), and the "easy" cases (bottom row).

A summary of the Pornography database.
Class                     Videos   Hours   Shots per video
Porn                      400      57      15.6
Non-porn ("easy")         200      11.5    33.8
Non-porn ("difficult")    200      8.5     17.5
All videos                800      77      20.6

Ethnic diversity of the pornographic videos.
Ethnicity       % of videos
Asians          16%
Blacks          14%
Whites          46%
Multi-ethnic    24%

Data Preprocessing
We preprocess the database by segmenting the videos into shots, using an industry-standard segmentation software, the STOIK Video Converter. As is often done in video analysis, a key frame is selected to summarize the content of each shot as a static image. Although there are sophisticated ways to choose the key frame, we opted to simply select the middle frame of each video shot. In total, there are 16,727 video segments.
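The middle-frame rule described above can be sketched in a few lines of Python. This is only an illustration, not the code used to build the database: it assumes that each shot is given as a pair of inclusive frame indices, and the names middle_frame and key_frames, as well as the shot boundaries, are hypothetical.

```python
from typing import List, Tuple


def middle_frame(start: int, end: int) -> int:
    """Return the index of the middle frame of a shot.

    Assumption: the shot spans the frame indices start..end, both inclusive.
    For shots with an even number of frames, the earlier of the two
    central frames is returned.
    """
    if end < start:
        raise ValueError("shot end must not precede shot start")
    return start + (end - start) // 2


def key_frames(shots: List[Tuple[int, int]]) -> List[int]:
    """Pick one key frame (the middle frame) for every shot."""
    return [middle_frame(start, end) for start, end in shots]


# Hypothetical shot boundaries, for illustration only.
shots = [(0, 99), (100, 149), (150, 150)]
print(key_frames(shots))  # prints [49, 124, 150]
```

With this rule, every shot contributes exactly one image, so the number of key frames equals the number of video segments.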

Evaluation
The experimental protocol is a classical 5-fold cross-validation. We report image classification performance as Mean Average Precision (MAP) and video classification performance as Accuracy Rate, where the final label of a video is obtained by majority voting over the labels of its images. A confusion table is also used to illustrate the results.
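As a rough sketch of the video-level evaluation, the following Python snippet derives a video label by majority voting over per-image labels and then computes the accuracy rate. It is not the evaluation code of the database: the label strings, the video identifiers, and the tie-breaking rule (ties resolved to a configurable tie_label) are assumptions made for illustration, since the text does not say how ties are handled.

```python
from collections import Counter
from typing import Dict, List


def video_label(image_labels: List[str], tie_label: str = "porn") -> str:
    """Return the majority label among the images of one video.

    Assumption: when the two most frequent labels are tied, tie_label
    is returned; the original protocol does not specify a tie rule.
    """
    if not image_labels:
        raise ValueError("a video needs at least one image label")
    ranked = Counter(image_labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_label
    return ranked[0][0]


def accuracy(predicted: Dict[str, str], truth: Dict[str, str]) -> float:
    """Fraction of videos whose predicted label matches the ground truth."""
    correct = sum(1 for vid, label in truth.items() if predicted.get(vid) == label)
    return correct / len(truth)


# Hypothetical per-image predictions for three videos.
image_predictions = {
    "v1": ["porn", "porn", "nonporn"],
    "v2": ["nonporn", "nonporn"],
    "v3": ["porn", "nonporn"],  # tie, resolved by tie_label
}
truth = {"v1": "porn", "v2": "nonporn", "v3": "nonporn"}

predicted = {vid: video_label(labels) for vid, labels in image_predictions.items()}
print(predicted)
print(accuracy(predicted, truth))
```

In a 5-fold setting, the same computation would be repeated on the test videos of each fold and the five accuracy values averaged.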

Disclaimer 
THIS DATABASE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. The videos, segments, and images provided were produced by third parties, who may have retained copyrights. They are provided strictly for non-profit research purposes, under limited, controlled distribution, intended to fall under the fair-use limitation. We make no guarantees and take no responsibility, whatsoever, arising out of any copyright issue. Use at your own risk.

Downloads
To get access to the data (videos, video segments, and frames), you must sign a license agreement, which you can find here. Please print it, sign it, and send a scanned copy to Arnaldo Araújo <arnaldo [at] dcc [dot] ufmg [dot] br>; see also the instructions page in the document for more information.

We have computed several visual features. They are freely available for download:
In order to make the comparison possible, the training and test folds are available for download:
Citation
If you make use of the Pornography database, please cite the following reference: 
  • Sandra Avila, Nicolas Thome, Matthieu Cord, Eduardo Valle, Arnaldo de A. Araújo. Pooling in Image Representation: the Visual Codeword Point of View. Computer Vision and Image Understanding (CVIU), volume 117, issue 5, p. 453-465, 2013. [ DOI | BibTex ]
Literature
Papers reporting results on the Pornography database:
  • Mauricio Perez, Sandra Avila, Daniel Moreira, Daniel Moraes, Vanessa Testoni, Eduardo Valle, Siome Goldenstein, Anderson Rocha. Video Pornography Detection through Deep Learning Techniques and Motion Information. Neurocomputing, volume 230, p. 279-293, 2017. [ DOI | PDF ]  
  • Daniel Moreira, Sandra Avila, Mauricio Perez, Daniel Moraes, Vanessa Testoni, Eduardo Valle, Siome Goldenstein, Anderson Rocha. Pornography Classification: The Hidden Clues in Video Space-Time. Forensic Science International, volume 268, p. 46-61, 2016. [ DOI | PDF ]
  • Carlos Caetano, Sandra Avila, William R. Schwartz, Silvio Guimarães, Arnaldo de A. Araújo. A Mid-Level Video Representation based on Binary Descriptors: A Case Study for Pornography Detection. Neurocomputing, volume 213, p. 102-114, 2016. [ DOI | PDF ]
  • Mohamed N. Moustafa. Applying Deep Learning to Classify Pornographic Images and Videos. In: 7th Pacific-Rim Symposium on Image and Video Technology (PSIVT), Auckland, New Zealand, 2015. [ PDF ]
  • Carlos Caetano, Sandra Avila, Silvio Guimarães, Arnaldo de A. Araújo. Pornography Detection using BossaNova Video Descriptor. In: 22nd European Signal Processing Conference (EUSIPCO), p. 1681-1685, Lisbon, Portugal, 2014. [ PDF ]
  • Carlos Caetano, Sandra Avila, Silvio Guimarães, Arnaldo de A. Araújo. Representing Local Binary Descriptors with BossaNova for Visual Recognition. In: 29th ACM Symposium on Applied Computing (SAC), p. 49-54, Gyeongju, Korea, 2014. [ PDF ]
  • Sandra Avila, Nicolas Thome, Matthieu Cord, Eduardo Valle, Arnaldo de A. Araújo. Pooling in Image Representation: the Visual Codeword Point of View. Computer Vision and Image Understanding (CVIU), volume 117, issue 5, p. 453-465, 2013. [ DOI ]
  • Fillipe Souza, Eduardo Valle, Guillermo Cámara-Chávez, Arnaldo de A. Araújo. An Evaluation on Color Invariant based Local Spatiotemporal Features for Action Recognition. In: 25th Conference on Graphics, Patterns and Images (SIBGRAPI), Ouro Preto, Brazil, 2012. [ PDF ]
  • Eduardo Valle, Sandra Avila, Fillipe Souza, Marcelo Coelho, Arnaldo de A. Araújo. Content-based Filtering for Video Sharing Social Networks. In: 12th Brazilian Symposium on Information and Computer System Security (SBSeg), Workshop on Computer Forensics, p. 625-638, Curitiba, Brazil, 2012. [ PDF ]
  • Sandra Avila, Nicolas Thome, Matthieu Cord, Eduardo Valle, Arnaldo de A. Araújo. BOSSA: Extended BoW Formalism for Image Classification. In: 18th International Conference on Image Processing (ICIP), p. 2966-2969, Brussels, Belgium, 2011. [ PDF ]
Organizers
  • Sandra Avila, Professor at the Institute of Computing, UNICAMP, Brazil, sandra [at] ic [dot] unicamp [dot] br
  • Eduardo Valle, Professor at the School of Electrical and Computer Engineering, UNICAMP, Brazil, dovalle [at] dca [dot] fee [dot] unicamp [dot] br
  • Arnaldo de A. Araújo, Professor at the Computer Science Department, UFMG, Brazil, arnaldo [at] dcc [dot] ufmg [dot] br
Please feel free to contact us if you have any questions or comments.

Acknowledgements
This work is supported by CAPES, CNPq, FAPEMIG, and FAPESP.

Last updated in February 2017.