Habash, Nizar, Muhammed AbuOdeh, Dima Taji, Reem Faraj, Jamila El Gizuli, and Omar Kallas. 2022. Camel Treebank: An Open Multi-genre Arabic Dependency Treebank. In Proceedings of the Language Resources and Evaluation Conference (LREC). Marseille, France. [link]
Habash, Nizar, Reem Faraj, and Dima Taji. The Camel Treebank Guidelines. (V 0.9, 22Jun2022) [link]
In Habash et al. (2022), we introduced a dependency treebank evaluation metric that can compare trees with different numbers of words, which may result from spelling correction (merges and splits) as well as from different tokenizations. Camel-depeval compares two CoNLL-X or CoNLL-U files or directories and reports the tokenization F-score and POS tag accuracy, as well as the LAS, UAS, and label scores. [link]
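To make the scores named above concrete, here is a minimal Python sketch of how UAS, LAS, label accuracy, and a tokenization F-score are commonly defined. It is not the Camel-depeval implementation: the function names, the `(form, head, label)` tuple layout, and the example labels are all hypothetical. The attachment scores assume both trees share the same tokenization, and the tokenization F-score assumes both token sequences concatenate to the same character string, so neither handles the spelling-correction merges and splits that the actual metric is designed for.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical token layout: (form, head index, dependency label).
Token = Tuple[str, int, str]


def attachment_scores(gold: List[Token], pred: List[Token]) -> Dict[str, float]:
    """UAS, LAS and label accuracy for two trees over identical tokens."""
    if len(gold) != len(pred):
        raise ValueError("this sketch assumes identical tokenization")
    n = len(gold)
    if n == 0:
        return {"UAS": 0.0, "LAS": 0.0, "label": 0.0}
    # UAS: same head; label: same relation; LAS: both at once.
    uas = sum(g[1] == p[1] for g, p in zip(gold, pred))
    label = sum(g[2] == p[2] for g, p in zip(gold, pred))
    las = sum(g[1] == p[1] and g[2] == p[2] for g, p in zip(gold, pred))
    return {"UAS": uas / n, "LAS": las / n, "label": label / n}


def char_spans(forms: List[str]) -> Set[Tuple[int, int]]:
    """Map each token to its (start, end) character offsets."""
    spans = set()
    start = 0
    for form in forms:
        spans.add((start, start + len(form)))
        start += len(form)
    return spans


def tokenization_f1(gold_forms: List[str], pred_forms: List[str]) -> float:
    """F-score over token spans; assumes the same underlying characters."""
    gold_spans = char_spans(gold_forms)
    pred_spans = char_spans(pred_forms)
    matched = len(gold_spans & pred_spans)
    if matched == 0:
        return 0.0
    precision = matched / len(pred_spans)
    recall = matched / len(gold_spans)
    return 2 * precision * recall / (precision + recall)
```

For example, if the gold tokens are `w`, `ktb`, `Alwld` and the predicted tokens are `wktb`, `Alwld`, only one span matches, giving a precision of 1/2, a recall of 1/3, and an F-score of 0.4.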
Getting the data is easy!
Just fill out this Google form and you'll be redirected to the data on Google Drive.
Palmyra is a platform-independent graphical tool for syntactic dependency annotation supporting languages that require complex morphological tokenization. Palmyra was used for developing the Camel Treebank. [link]