Data

The datasets are composed of texts written by multiple users, and a single user may have contributed multiple posts.

Format

The data is distributed as one XML-like file per genre, with one sample per element. Each element carries attributes specifying an id, the topic, the gender (male|female), and the age range ([0,19], [20,29], [30,39], [40,49], [50,100]). This is a sample: