Datasets

Dataset Overview

This dataset supports our ASONAM 2025 study on the authenticity of LinkedIn profiles. It contains 4,200 profiles spread across the four categories described below.

It is the first dataset designed to test fake-profile detection models against a mix of real profiles, manually created fakes, and LLM-generated adversarial examples.

Profile Categories

1. Legitimate LinkedIn Profiles (LLPs)

2. Manual Fake Profiles (FLPs)

3. GPT-3.5 Fake Profiles (GPT3.5Ps)

4. GPT-4 Fake Profiles (GPT4Ps)
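For model training, the four categories can be used as a four-way label or collapsed into a binary legitimate-vs-fake label. The sketch below is one possible encoding, assuming only what the category names state: LLPs are legitimate, while the other three are fake, and two of those are LLM-generated. The names ProfileCategory, GPT35P, is_fake and is_llm_generated are hypothetical and are not part of the released dataset.

```python
from enum import Enum


class ProfileCategory(Enum):
    # Hypothetical label encoding for the four profile categories.
    # GPT35P stands in for "GPT3.5Ps", which is not a valid identifier.
    LLP = "Legitimate LinkedIn Profile"
    FLP = "Manual Fake Profile"
    GPT35P = "GPT-3.5 Fake Profile"
    GPT4P = "GPT-4 Fake Profile"

    @property
    def is_fake(self) -> bool:
        # Binary view: every category except LLP is a fake profile.
        return self is not ProfileCategory.LLP

    @property
    def is_llm_generated(self) -> bool:
        # Only the GPT-3.5 and GPT-4 categories were produced by an LLM.
        return self in (ProfileCategory.GPT35P, ProfileCategory.GPT4P)


# Example: derive a binary label (1 = fake) from a category.
label = int(ProfileCategory.GPT4P.is_fake)
```

Keeping the four-way enum as the source of truth lets the same data serve both a binary detector and a finer-grained analysis of manual versus LLM-generated fakes.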

Data Structure and Features

Each profile includes both textual and numeric information.

All profiles are consistently structured and formatted for model ingestion or manual inspection.

Similarity Metrics and Preprocessing

Cosine Similarity

We measured textual similarity between generated and legitimate profiles:
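Cosine similarity treats each text as a vector and computes cos(a, b) = (a · b) / (‖a‖ ‖b‖), giving 1 for texts with identical term proportions and 0 for texts that share no terms. The sketch below is a minimal illustration only, assuming raw term-count vectors over lowercase alphanumeric tokens; the text representation used in the study (for example TF-IDF weights or embeddings) is not specified here, and the helper names tokenize and cosine_similarity are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: lowercase and keep alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the term-count vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    # Dot product only needs the terms the two texts share.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Convention: similarity with an empty text is 0.
    return dot / (norm_a * norm_b)


# Hypothetical example texts, not drawn from the dataset.
generated = "Experienced data scientist skilled in Python"
legitimate = "Data scientist with experience in Python and SQL"
score = cosine_similarity(generated, legitimate)
```

Because cosine similarity normalizes by vector length, a long profile and a short one can still score highly if they use terms in similar proportions, which suits comparing summaries of different lengths.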