The training set consists of ≃ 5700 Reddit posts (main posts or comments) from Argentina, Chile, Colombia, and México. Each post contains at least one keyword from our LGBTQ+ lexicon: trans, lgbt, gay/gays, lesbiana/as, bisexual/es, asexual/es, transexual/les, travesti/is, queer/s, transgénero, pansexual/es, intersexual/es.
The training set has been released and is available on our Codabench page.
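To illustrate the selection criterion ("at least one keyword from the lexicon"), here is a minimal sketch of a keyword filter. It is not the organizers' actual procedure, which is not described here. It assumes case-insensitive, whole-word matching, and it expands the lexicon's slash notation (e.g. "lesbiana/as") into explicit singular and plural forms. The names `LEXICON`, `contains_keyword` and `matched_keywords` are hypothetical.

```python
import re
import unicodedata

# Hypothetical expansion of the lexicon's slash notation into explicit forms
# (e.g. "lesbiana/as" -> "lesbiana", "lesbianas").
LEXICON = [
    "trans", "lgbt", "gay", "gays", "lesbiana", "lesbianas",
    "bisexual", "bisexuales", "asexual", "asexuales",
    "transexual", "transexuales", "travesti", "travestis",
    "queer", "queers", "transgénero", "pansexual", "pansexuales",
    "intersexual", "intersexuales",
]


def _normalize(text: str) -> str:
    # NFC-normalize so composed and decomposed accents ("é") compare equal,
    # then casefold for case-insensitive matching.
    return unicodedata.normalize("NFC", text).casefold()


# Assumption: whole-word matching. With str patterns, \b is Unicode-aware,
# so accented letters such as "é" count as word characters.
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(_normalize(k)) for k in sorted(LEXICON, key=len, reverse=True))
    + r")\b"
)


def contains_keyword(post: str) -> bool:
    """Return True if the post contains at least one lexicon keyword."""
    return _PATTERN.search(_normalize(post)) is not None


def matched_keywords(post: str) -> list[str]:
    """Return the distinct lexicon keywords found in the post, sorted."""
    return sorted(set(_PATTERN.findall(_normalize(post))))


if __name__ == "__main__":
    posts = [
        "Marcha del orgullo LGBT en Santiago",
        "El transporte público está colapsado",
        "Soy lesbiana y vivo en Bogotá",
    ]
    kept = [p for p in posts if contains_keyword(p)]
    print(kept)
```

Whole-word matching avoids false positives such as "transporte" for the keyword "trans". However, it also misses variants such as "LGBTQ+". A substring match would capture those variants at the cost of more noise.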
The development set contains ≃ 1400 Reddit posts (main posts or comments) from Argentina, Chile, Colombia, and México. Each post contains at least one keyword from our LGBTQ+ lexicon.
The development set has been released and is available on our Codabench page.