About
In the project titled "Samba: Identifying Inappropriate Videos for Young Children on YouTube", we propose a fusion model, called Samba, which uses both metadata and video subtitles for content classification. Incorporating subtitles helps the model better infer the true nature of a video, improving classification accuracy. On a large-scale, comprehensive dataset of 70K videos, we show that Samba achieves 95% accuracy, outperforming a metadata-only classifier by a large margin. We publicly release our dataset.
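To make the fusion idea concrete, below is a minimal sketch of a late-fusion ensemble that combines a metadata-only classifier with a subtitle-text classifier and averages their predicted probabilities. It assumes scikit-learn and a pandas DataFrame; the column names, feature choices, and classifiers are illustrative placeholders, not the exact Samba architecture or the released dataset schema.

```python
# Minimal late-fusion sketch (illustrative only, not the exact Samba architecture).
# Assumes a pandas DataFrame with hypothetical columns:
# "duration_sec", "view_count", "channel_category", "subtitle_text", "label".
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Metadata-only branch: numeric and categorical fields from the video page.
metadata_branch = Pipeline([
    ("features", ColumnTransformer([
        ("numeric", StandardScaler(), ["duration_sec", "view_count"]),
        ("category", OneHotEncoder(handle_unknown="ignore"), ["channel_category"]),
    ])),
    ("clf", RandomForestClassifier(n_estimators=200, random_state=0)),
])

# Subtitle-only branch: bag-of-words classifier over the subtitle track.
subtitle_branch = Pipeline([
    ("features", ColumnTransformer([
        ("tfidf", TfidfVectorizer(max_features=20000), "subtitle_text"),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])

def fit_fusion(train_df):
    """Fit both branches on the same labeled DataFrame."""
    y = train_df["label"]
    metadata_branch.fit(train_df, y)
    subtitle_branch.fit(train_df, y)

def predict_fusion(df):
    """Late fusion: average the class probabilities of the two branches."""
    proba = (metadata_branch.predict_proba(df) + subtitle_branch.predict_proba(df)) / 2
    return metadata_branch.classes_[np.argmax(proba, axis=1)]
```

The intent of the late-fusion design is that a publisher who manipulates metadata (title, thumbnail, tags) still has to contend with the subtitle branch, which reflects what is actually said in the video.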
The dataset used to build this model contains both metadata and subtitle data for appropriate and inappropriate videos.
We formulated our definitions for each category based on YouTube's guidelines and the FTC's Children's Online Privacy Protection Act (COPPA): videos labeled "appropriate" have content relevant to the interests of toddlers and preschoolers (aged 1-5 years). Examples of appropriate content include videos centered on education, animals, and nursery rhymes. Videos labeled "inappropriate" have adult-specific content, including graphic violence, nudity, and profanity. Visual examples of inappropriate videos, along with their titles, are shown below:
Our contributions are summarized as follows:
Collection and release of a comprehensive, labeled dataset of YouTube videos that are appropriate and inappropriate for young children. We created this dataset systematically, devising various content categories that may be appropriate or inappropriate based on publicly available sources for content categorization. We then sampled YouTube channels in these categories, labeled them manually, and mined their content for our dataset. We ended up with 142K videos, from which we randomly sampled a balanced evaluation dataset with equal proportions of appropriate and inappropriate videos (a sampling sketch is shown after this list).
A novel classification model for videos inappropriate for young children. We propose a subtitle- and metadata-based ensemble architecture, Samba, that achieves high classification accuracy and high robustness to manipulation by publishers. Compared with closely related work (e.g., Papadamou et al. (2020)), our model achieves 8% higher classification accuracy.
Analysis and Recommendations. In addition to YouTube videos, we use our model to classify advertisements shown alongside appropriate children's videos. We also provide comprehensive recommendations for a safer YouTube experience for young viewers, which include: (1) automatically identifying and tagging the targeted demographic, kids vs. adults, during video upload; (2) restricting videos on YouTube as a default setting; (3) aligning advertisements with video content and classification; and (4) removing third-party advertisements from children's videos. Finally, we discuss how our work could be extended to other platforms and AI assistant devices, such as Alexa.
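As referenced in the first contribution above, the balanced evaluation dataset is obtained by random sampling with equal class proportions from the full labeled pool. The following is a minimal sketch of such balanced sampling, assuming pandas; the "label" column name and the helper function are assumptions for illustration, not the exact tooling used to build the released dataset.

```python
# Minimal balanced-sampling sketch (illustrative; the "label" column and this
# helper are assumptions, not part of the released dataset tooling).
import pandas as pd

def balanced_sample(df: pd.DataFrame, label_col: str = "label", seed: int = 0) -> pd.DataFrame:
    """Randomly sample an equal number of videos from each class and shuffle."""
    per_class = df[label_col].value_counts().min()  # size of the smaller class
    parts = [
        group.sample(n=per_class, random_state=seed)
        for _, group in df.groupby(label_col)
    ]
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)

# evaluation_df = balanced_sample(full_df)  # full_df: the 142K labeled videos
```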