DOCUMENTATION
________________
Dataset Description
Appropriate and Inappropriate Content
The dataset contains over 140,000 videos from 160 YouTube channels. It includes 80 channels that comprise different categories of videos that are deemed unsuitable for children as per YouTube guidelines and FTC’s Children’s Online Privacy Protection Act (COPPA). These categories of videos include: classic cartoons edited to be inappropriate, wrongly-categorized gaming content for adults, inappropriate live content, and deceptive channels targeting children.
The other 80 channels are aimed at children and are chosen because their video categories comply with YouTube's guidelines and the FTC’s Children’s Online Privacy Protection Act (COPPA). Categories that are considered appropriate for children include: nursery rhymes, gameplay videos without any inappropriate content, kids’ toy demonstrations, toy ratings, and children's music or dance performances.
Table 1. Categories and examples of appropriate and inappropriate channels that are annotated by five of our co-authors
Manual Annotation Process
Five co-authors independently annotate this dataset. First, we define the categories of videos that are appropriate for kids to watch and those that are inappropriate. Then, for each category, each of the five annotators individually identifies some popular channels through simple manual YouTube searches and by watching videos from those channels. Next, each annotator independently labels the channels that were identified by the other annotators.
Inter-Annotator Agreement Process
We compute the agreement between raters using Bennett et al.'s S score. The S score we obtain is 0.85, which indicates strong agreement between raters. While Cohen's Kappa measures the agreement between exactly two raters, Bennett et al.'s S score is a common technique for calculating inter-annotator agreement that also applies to more than two raters, as in our case. Like Cohen's Kappa, it accounts for the agreement that might be expected by chance, rather than reporting only the simple percentage agreement between raters.
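For q rating categories and observed agreement P_o, the S score is S = (q * P_o - 1) / (q - 1). The following is a minimal Python sketch of that formula, not the authors' script. It assumes that P_o for more than two raters is the mean fraction of agreeing rater pairs per item (a common generalization; the text does not state how P_o was computed), and the function names and example labels are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def observed_agreement(ratings: Sequence[Sequence[str]]) -> float:
    """Mean fraction of agreeing rater pairs per item.

    ratings[i] holds the labels that all raters assigned to item i.
    Assumption: pairwise agreement is used to generalize to >2 raters.
    """
    per_item = []
    for labels in ratings:
        pairs = list(combinations(labels, 2))
        per_item.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(per_item) / len(per_item)


def s_score(ratings: Sequence[Sequence[str]], num_categories: int) -> float:
    """Bennett et al.'s S: observed agreement corrected for uniform chance (1/q)."""
    p_o = observed_agreement(ratings)
    return (num_categories * p_o - 1) / (num_categories - 1)


# Hypothetical labels from five raters for four channels.
example = [
    ["appropriate"] * 5,
    ["inappropriate"] * 5,
    ["appropriate"] * 4 + ["inappropriate"],
    ["inappropriate"] * 5,
]
print(round(s_score(example, num_categories=2), 2))
```

With two categories the formula reduces to S = 2 * P_o - 1, so under that assumption an S score of 0.85 would correspond to an observed agreement of 0.925.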
Crawling Data
Because every YouTube video has a unique identification code (ID), we can use that ID to download the video's metadata to our local machine. Specifically, we create an API key on the Google Cloud Platform to access the YouTube service. We then use the googleapiclient library in Python to send a request containing a video’s ID to Google Cloud. Finally, from the service's response, we extract the video’s metadata: channel title, video title, thumbnail photos, tags, duration, video category, and the numbers of views, likes, dislikes, and comments. Separately, we obtain the video’s subtitles in a similar way.
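The request described above might look like the following sketch. It assumes the YouTube Data API v3 `videos.list` endpoint accessed through `googleapiclient`; the helper names `fetch_video_item` and `extract_metadata` and the output keys are hypothetical, and this is not the authors' crawler. Fields that the API omits for a given video (for example, the dislike count) come back as `None`.

```python
from typing import Any, Dict, Optional


def fetch_video_item(api_key: str, video_id: str) -> Optional[Dict[str, Any]]:
    """Request one video's resource from the YouTube Data API v3."""
    # Imported here so extract_metadata() can be used without the client installed.
    from googleapiclient.discovery import build

    youtube = build("youtube", "v3", developerKey=api_key)
    response = youtube.videos().list(
        part="snippet,contentDetails,statistics", id=video_id
    ).execute()
    items = response.get("items", [])
    return items[0] if items else None  # None if the ID is unknown or private


def extract_metadata(item: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten the fields listed in the text from one video resource."""
    snippet = item.get("snippet", {})
    stats = item.get("statistics", {})
    return {
        "channel_title": snippet.get("channelTitle"),
        "video_title": snippet.get("title"),
        "thumbnails": snippet.get("thumbnails", {}),
        "tags": snippet.get("tags", []),
        "duration": item.get("contentDetails", {}).get("duration"),  # ISO 8601 string
        "category_id": snippet.get("categoryId"),
        "views": stats.get("viewCount"),
        "likes": stats.get("likeCount"),
        "dislikes": stats.get("dislikeCount"),  # may be absent from the response
        "comments": stats.get("commentCount"),
    }
```

A call such as `extract_metadata(fetch_video_item(API_KEY, video_id))` would then yield one flat record per video, where `API_KEY` is the key created on the Google Cloud Platform.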
Ethics
We collect only data that is publicly available on the Web and do not (1) interact with online users in any way or (2) simulate any logged-in activity on YouTube or other platforms. Therefore, IRB approval was not required.
Pre-Processing Subtitles
First, we split our dataset into a training set and a test set such that their channel IDs do not overlap. Furthermore, the numbers of appropriate and inappropriate videos in each set have to be approximately equal. In particular, our training dataset has 49.2 thousand videos in total, including 26.4 thousand videos of each type. Likewise, the test set has 20.8 thousand videos in total, with 10.4 thousand videos of each type. For the training phase, we split every video's subtitle into chunks of 100 words and drop the remaining words; we then randomly select chunks to form a training dataset that has 341,509 appropriate and 378,142 inappropriate sequences. For the test set, we pre-process the subtitles in the same way as for training, except that we keep the remainder of each subtitle. Therefore, test videos can have different numbers of chunks. After the pre-processing step, our machine-learning model is applied.
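The chunking and channel-disjoint split described above can be sketched as follows. This is an illustrative sketch only: the function names, the whitespace tokenization, the `test_fraction` value, and the random seed are assumptions, and the class-balancing step is not shown.

```python
import random
from typing import Dict, List, Tuple

CHUNK_WORDS = 100  # chunk length stated in the text


def chunk_subtitle(text: str, size: int = CHUNK_WORDS,
                   keep_remainder: bool = False) -> List[str]:
    """Split a subtitle into consecutive chunks of `size` words.

    Training: keep_remainder=False drops the final short chunk.
    Test: keep_remainder=True keeps it, so chunk counts vary per video.
    Assumption: words are separated by whitespace.
    """
    words = text.split()
    chunks = []
    for start in range(0, len(words), size):
        piece = words[start:start + size]
        if len(piece) < size and not keep_remainder:
            break
        chunks.append(" ".join(piece))
    return chunks


def split_by_channel(video_to_channel: Dict[str, str],
                     test_fraction: float = 0.3,
                     seed: int = 0) -> Tuple[List[str], List[str]]:
    """Assign whole channels to train or test so channel IDs never overlap."""
    channels = sorted(set(video_to_channel.values()))
    random.Random(seed).shuffle(channels)
    n_test = max(1, round(len(channels) * test_fraction))
    test_channels = set(channels[:n_test])
    train = [v for v, c in video_to_channel.items() if c not in test_channels]
    test = [v for v, c in video_to_channel.items() if c in test_channels]
    return train, test
```

For a 250-word subtitle, the training setting yields two 100-word chunks, while the test setting yields three chunks, the last holding the remaining 50 words.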
Table 2: Number of videos in each category of training and test set in our dataset.
Table 3: Number of videos with and without subtitles per class.
Figure 2. Architecture diagram of our two-stage classification approach. In the first stage (a), a BERT model is trained to enrich the subtitles' representation vectors. The loss "Lpre" can be either a contrastive loss or a binary cross-entropy loss. In the classification training stage (b), we propose to apply a GRU module to aggregate features from the different input types of one video.
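To make stage (b) concrete, the following PyTorch sketch shows one way a GRU could aggregate a sequence of per-input feature vectors into a single video-level prediction. It is a hedged illustration and not the authors' model: the class name `GRUAggregator`, the hidden size, the single-logit output, and the feature dimension of 768 (the hidden size of BERT-base) are all assumptions not stated in the caption.

```python
import torch
from torch import nn


class GRUAggregator(nn.Module):
    """Aggregate a sequence of feature vectors into one logit per video."""

    def __init__(self, feature_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        # batch_first=True means inputs are (batch, sequence, feature).
        self.gru = nn.GRU(feature_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, sequence_length, feature_dim), e.g. one vector
        # per subtitle chunk or per input type of the same video.
        _, last_hidden = self.gru(features)      # (num_layers, batch, hidden_dim)
        logits = self.classifier(last_hidden[-1])  # (batch, 1)
        return logits.squeeze(-1)                  # (batch,)


# Hypothetical usage: 4 videos, 5 feature vectors each.
model = GRUAggregator()
logits = model(torch.randn(4, 5, 768))
labels = torch.tensor([0.0, 1.0, 1.0, 0.0])
loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
```

Using the final hidden state as the video representation is one simple design choice; pooling over all GRU outputs would be an equally plausible alternative.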
References
[1] Sultan Alshamrani. 2020. Detecting and Measuring the Exposure of Children and Adolescents to Inappropriate Comments in YouTube. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 3213–3216.
[2] Camila Souza Araújo, Gabriel Magno, Wagner Meira, Virgilio Almeida, Pedro Hartung, and Danilo Doneda. 2017. Characterizing videos, audience and advertising in Youtube channels for kids. In International Conference on Social Informatics. Springer, 341–359.
[3] BBC. 2021. Alexa tells 10-year-old girl to touch live plug with penny. https://www.bbc.com/news/technology-59810383.
[4] Edward M Bennett, R Alpert, and AC Goldstein. 1954. Communications through limited-response questioning. Public Opinion Quarterly 18, 3 (1954), 303–308.
[5] Marina Buzzi. 2011. Children and YouTube: access to safe content. In Proceedings of the 9th ACM SIGCHI Italian Chapter International Conference on Computer-Human Interaction: Facing Complexity. 125–131.
[6] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A simple framework for contrastive learning of visual representations. In International conference on machine learning. PMLR, 1597–1607.
[7] Common-sense. 2020. YOUNG KIDS AND YOUTUBE. https://d2e111jq13me73.cloudfront.net/sites/default/files/uploads/research/2020youngkidsyoutube-reportfinal-releaseforweb.pdf.
[8] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM conference on recommender systems. 191–198.
[9] Alexandre Ashade Lassance Cunha, Melissa Carvalho Costa, and Marco Aurélio C Pacheco. 2019. Sentiment analysis of youtube video comments using deep neural networks. In International Conference on Artificial Intelligence and Soft Computing. Springer, 561–570.
[10] Brian Dean. 2021. How Many People Use YouTube in 2021. https://backlinko.com/youtube-users [Online; accessed 01-December-2021].
[11] Carsten Eickhoff and Arjen P de Vries. 2010. Identifying suitable YouTube videos for children. 3rd Networked and electronic media summit (NEM) (2010).
[12] FTC. 2021. YouTube channel owners: Is your content directed to children? https://www.ftc.gov/newsevents/blogs/business-blog/2019/11/youtube-channel-owners-your-content-directed-children.
[13] Sam Gutelle. 2017. YouTube’s Algorithm Recommends Disturbing Videos To Kids, But Viewers Can Help Clean It Up. https://www.tubefilter.com/2017/11/09/youtube-kids-algorithm-videos/. [Online; accessed 01-December-2021].
[14] Kilem L Gwet. 2014. Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters. Advanced Analytics, LLC.
[15] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. 2020. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9729–9738.
[16] Internet-Matters. 2021. Learn about it. https://www.internetmatters.org/issues/inappropriate-content/learn-about-it/.
[17] Akari Ishikawa, Edson Bollis, and Sandra Avila. 2019. Combating the elsagate phenomenon: Deep learning architectures for disturbing cartoons. In 2019 7th International Workshop on Biometrics and Forensics (IWBF). IEEE, 1–6.
[18] Julia Jacobo. 2019. YouTube Kids video featuring suicide instructions removed after reports from parenting blog. https://abcnews.go.com/US/youtube-kids-video-featuring-suicide-instructions-removed-reports/story?id=61326717. [Online; accessed 01-December-2021].
[19] Rishabh Kaushal, Srishty Saha, Payal Bajaj, and Ponnurangam Kumaraguru. 2016. KidsTube: Detection, characterization and analysis of child unsafe content & promoters on YouTube. In 2016 14th Annual Conference on Privacy, Security and Trust (PST). IEEE, 157–164.
[20] Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[21] Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017).
[22] Kostantinos Papadamou, Antonis Papasavva, Savvas Zannettou, Jeremy Blackburn, Nicolas Kourtellis, Ilias Leontiadis, Gianluca Stringhini, and Michael Sirivianos. 2020. Disturbed YouTube for kids: Characterizing and detecting inappropriate videos targeting young children. In Proceedings of the international AAAI Conference on web and social media, Vol. 14. 522–533.
[23] Nils Reimers and Iryna Gurevych. 2019. Sentence-bert: Sentence embeddings using siamese bert networks. arXiv preprint arXiv:1908.10084 (2019).
[24] Florian Schroff, Dmitry Kalenichenko, and James Philbin. 2015. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition. 815–823.
[25] Shubham Singh, Rishabh Kaushal, Arun Balaji Buduru, and Ponnurangam Kumaraguru. 2019. KidsGUARD: Fine grained approach for child unsafe video representation and detection. In Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing. 2104–2111.
[26] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition. 1–9.
[27] The-New-York-Times. 2017. On YouTube Kids, Startling Videos Slip Past Filters. https://www.nytimes.com/2017/11/04/business/media/youtube-kids-paw-patrol.html.
[28] YouTube. 2021. YouTube channel owners: Is your content directed to children? https://support.google.com/youtube/answer/9528076?#zippy=%5C%2Chow-do-i-know-if-i-should-set-my-content-as-made-for-kids.
[29] Yukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In Proceedings of the IEEE international conference on computer vision. 19–27.