Warning: this page may contain samples of offensive language in Turkish.

This page describes the annotation of a Turkish offensive language corpus. The corpus consists of randomly sampled tweets, annotated in a manner similar to OffensEval and GermEval.

For more details, see the paper cited below.

The data

We distribute the data in a few alternative formats.

The annotations are distributed under the terms of the Creative Commons Attribution License (CC BY). Please cite the following paper if you use this resource.

@inproceedings{coltekin2020lrec,
 author  = {\c{C}\"{o}ltekin, \c{C}a\u{g}r{\i}},
 year  = {2020},
 title  = {A Corpus of Turkish Offensive Language on Social Media},
 booktitle  = {Proceedings of The 12th Language Resources and Evaluation Conference},
 pages  = {6174--6184},
 address  = {Marseille, France},
 url  = {https://www.aclweb.org/anthology/2020.lrec-1.758},
}