This part identifies high-risk behaviours, platform weaknesses, and patterns of misuse. It covers data extraction and preparation, followed by analysis of online abuse patterns and high-risk attributes.
Facts from the Publicly Available Datasets
Source Summary:
This dataset compiles National Crime Records Bureau (NCRB) data on cyber‑crimes against children in India from 2017 through 2021. It covers child pornography and other cyber‑offences, reported by year and by state or union territory.
Rapid Increase Over Time
NCRB reported that cyber‑crime against children jumped sharply in certain years. For example, 2020 saw an increase of over 400% compared with 2019, largely due to offences involving publishing or transmitting sexually explicit content involving minors.
Counts were already rising in earlier years: from 79 cases in 2017 to 117 in 2018, with a further rise in 2019 before the large jump in 2020.
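To make the scale of these changes concrete, the sketch below computes the percentage change between the two yearly counts quoted above (79 and 117). The function names percent_change and growth_multiple are illustrative assumptions, and the 2019 and 2020 counts are left out because they are not stated here.

```python
def percent_change(old: float, new: float) -> float:
    """Return the percentage change from `old` to `new`."""
    if old == 0:
        raise ValueError("percentage change is undefined for a zero baseline")
    return (new - old) / old * 100.0


def growth_multiple(pct_increase: float) -> float:
    """Convert a percentage increase into a multiple of the baseline."""
    return 1.0 + pct_increase / 100.0


# Only the two counts stated in the text are used.
cases = {2017: 79, 2018: 117}

change = percent_change(cases[2017], cases[2018])
print(f"2017 -> 2018: {change:.1f}% increase")

# An increase of 400% corresponds to five times the baseline count.
print(f"A 400% increase is a {growth_multiple(400):.0f}x multiple")
```

On these figures the 2017 to 2018 rise is about 48.1%, and an increase of over 400% implies that the 2020 count was more than five times the 2019 count.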
Types of Offences
A large share of the recorded cyber‑crimes against children involved child pornography or other sexual material online.
These crimes include creation, sharing or possession of explicit material involving minors as well as other cyber offences where children are victims.
Geographic Patterns
States such as Uttar Pradesh, Karnataka, Maharashtra, Kerala and Odisha reported the highest numbers of cyber‑crimes against children in specific years.
Patterns suggest that more populous or more digitally connected states tend to report higher numbers, though reporting practices and awareness also influence these figures.
Much of the increase aligns with greater online access during the pandemic years when children spent more time online for schooling and social connection.
The bulk of recorded crimes involved the distribution or transmission of illicit content, which points to exploitation rather than purely technical hacking or fraud.
Source Summary:
This Office for National Statistics (ONS) dataset covers online habits, contact with strangers online, sexual messaging and bullying among 10–15‑year-olds in England and Wales.
High Internet Engagement
92.6% of children aged 10–15 went online daily or almost daily.
58.1% spent three or more hours a day online, up from 47.6% three years earlier.
Interactions with Unknown Contacts
35% accepted friend requests from people they did not know.
19.2% had exchanged messages with someone they had never met in person, and 4.4% had met such a contact offline.
Nearly half (49.6%) of those interacting with unknown online contacts had no mutual connection, increasing risk.
Sexual Messaging
9.5% of children aged 13–15 received sexual messages in the past year, mostly via social media.
Of those who received them, 76.7% did so more than once.
Bullying and Harassment
19.1% experienced online bullying behaviours in the past year, and 34.9% experienced in‑person bullying.
Online bullying behaviours were more common among girls (22.5%) than boys (16.0%).
A notable share of victims did not report the bullying to anyone.
Online activity is pervasive and often unsupervised; even with parental awareness, children are engaging with unknown contacts and risky content.
Sexual messages and online bullying remain persistent issues, not just isolated events.
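The survey percentages above can be combined into a few derived figures. The sketch below is illustrative only: the variable and function names are assumptions, and it assumes that the 76.7% repeat rate applies to the same 13–15 age group as the 9.5% figure.

```python
# Percentages quoted in the ONS summary above.
received_sexual_messages = 9.5   # % of children aged 13-15
received_more_than_once = 76.7   # % of those recipients
online_bullying_girls = 22.5     # % of girls aged 10-15
online_bullying_boys = 16.0      # % of boys aged 10-15
three_plus_hours_now = 58.1      # % online 3+ hours a day
three_plus_hours_before = 47.6   # % three years earlier


def share_of_share(outer_pct: float, inner_pct: float) -> float:
    """Percentage of the whole group that falls in a nested subgroup."""
    return outer_pct * inner_pct / 100.0


# Children aged 13-15 who received sexual messages more than once.
repeat_recipients = share_of_share(received_sexual_messages,
                                   received_more_than_once)

# How much more common online bullying was among girls than boys.
girl_boy_ratio = online_bullying_girls / online_bullying_boys

# Change in heavy internet use, in percentage points (not percent).
heavy_use_change_pp = three_plus_hours_now - three_plus_hours_before

print(f"Repeat recipients: {repeat_recipients:.1f}% of 13-15-year-olds")
print(f"Girl-to-boy ratio: {girl_boy_ratio:.2f}")
print(f"Heavy use change: {heavy_use_change_pp:.1f} percentage points")
```

Under these assumptions, roughly 7.3% of children aged 13–15 received sexual messages more than once, girls reported online bullying about 1.4 times as often as boys, and heavy daily use rose by 10.5 percentage points.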
Source Summary:
This dataset (widely referenced in research and similar to ones hosted on OpenDataBay/Kaggle) contains tens of thousands of tweets, each labelled either with a type of cyberbullying or as not cyberbullying.
Size and Labels: Around 47,000 tweets in six classes: gender, age, religion, ethnicity, other cyberbullying, and not cyberbullying.
Use Case: Designed for training and evaluating models that can automatically detect different forms of online harassment and abusive language.
Cyberbullying detection models need to distinguish between contextual abuse and everyday language.
Categories like gender or ethnicity harassment reflect real dimensions of online abuse that affect women and minority groups disproportionately.
These annotated datasets are key for understanding patterns in language that reflect harassment rather than innocuous chatter.
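As a minimal sketch of the data preparation step for a labelled tweet dataset like this one, the code below normalises tweet text and computes the share of each label. The column names tweet_text and cyberbullying_type are assumptions about the file layout, and the sample rows are neutral placeholders, not records from the dataset.

```python
import csv
import io
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Strip URLs and @mentions, lowercase, and collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text.lower()).strip()


def label_distribution(rows, label_col="cyberbullying_type"):
    """Return each label's share of the rows as a fraction of 1."""
    counts = Counter(row[label_col] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Placeholder rows for illustration only (assumed column names).
SAMPLE_CSV = """tweet_text,cyberbullying_type
"Hello @user check https://example.com  NOW",not_cyberbullying
"placeholder text A",gender
"placeholder text B",gender
"placeholder text C",age
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    row["clean_text"] = clean_tweet(row["tweet_text"])

print(rows[0]["clean_text"])
print(label_distribution(rows))
```

Checking the label distribution before training helps show whether the classes are balanced, which affects how a detection model should be evaluated.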
Online Exposure Increases Risk.
Greater internet usage among children is associated with a higher likelihood of risky interactions, whether bullying, sexual messaging, or predation.
Exploitation Often Involves Repetition and Reach.
The cyberbullying tweet dataset and the UK survey both indicate that online harassment often involves repeated abusive language and can spread widely on social platforms.
Gendered Patterns Appear.
In England and Wales, girls experience higher rates of online bullying.
The gender harassment category in the tweet dataset points to vulnerabilities that intersect with societal biases.
Provide digital literacy programmes that teach children to judge strangers online critically and to recognise grooming or unsafe requests.
Apply age‑appropriate privacy settings across all devices and platforms.
Encourage children to report bullying or inappropriate contact to trusted adults immediately.
Set clear guidelines for screen time and app use.
Engage in regular conversations about online experiences without judgement.
Use monitoring tools where appropriate, balanced with trust.
Integrate comprehensive online safety education into curricula.
Run workshops on identifying cyberbullying and healthy digital communication.
Ensure reporting systems are confidential and supportive.
Develop stronger reporting and takedown processes for harmful content involving minors.
Mandate age verification safeguards and limit minors’ exposure to high‑risk interactions.
Support research‑informed AI detection of abuse (using datasets like the tweet classification one) with privacy safeguards.
Run public awareness campaigns about cyber‑crime reporting, especially for child sexual exploitation.
Establish specialised units trained in digital evidence and child protection.
Provide 24/7 helplines and victim support services.