This page describes the annotation of a Turkish offensive language corpus. The corpus consists of randomly sampled tweets, annotated in a way similar to OffensEval and GermEval.

For more details, see the paper cited below.

# The data

We distribute the data in two alternative formats:

• troff-v1.0.tsv.gz contains the full data set as described in the paper. It is a TSV file with four fields: id, timestamp, text, and label. Note that this file uses quoting and preserves the newlines in the original tweets, so a single record may span several lines. The labeling here is "flat", with the following labels:
  • non: not offensive
  • prof: profanity, or non-targeted offense
  • grp: offense towards a group
  • indv: offense towards an individual
  • oth: offense towards another (non-human) entity, often an event or organization
• offenseval2020-turkish.zip contains the data as used in the OffensEval 2020 shared task. Please see the enclosed README file and the official OffensEval web page for further information.
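Because the TSV file quotes fields and keeps newlines inside tweets, it should be read with a quote-aware parser rather than split line by line. The following is a minimal sketch, not part of the release, that reads troff-v1.0.tsv.gz with Python's standard library. It assumes UTF-8 encoding, standard double-quote quoting, and no header row by default. It also shows one way to recover a hierarchical view from the flat labels, assuming they correspond to the three OffensEval (OLID-style) levels as their descriptions suggest: non-targeted offense as UNT, and the three target types as IND/GRP/OTH. That mapping is inferred, not taken from the data release. The names `Tweet`, `read_troff`, `FLAT_TO_HIERARCHICAL`, `to_hierarchical`, and `binary_counts` are hypothetical.

```python
import csv
import gzip
from collections import Counter
from typing import Iterator, NamedTuple, Optional, Tuple

# Hypothetical mapping (inferred from the label descriptions, not from the
# release) of the five flat labels to an OffensEval/OLID-style hierarchy:
# (level A: offensive?, level B: targeted?, level C: target type).
# None means the level does not apply.
FLAT_TO_HIERARCHICAL = {
    "non": ("NOT", None, None),
    "prof": ("OFF", "UNT", None),
    "grp": ("OFF", "TIN", "GRP"),
    "indv": ("OFF", "TIN", "IND"),
    "oth": ("OFF", "TIN", "OTH"),
}


class Tweet(NamedTuple):
    id: str
    timestamp: str
    text: str
    label: str


def read_troff(path: str, has_header: bool = False) -> Iterator[Tweet]:
    """Yield one Tweet per record of troff-v1.0.tsv.gz (or an uncompressed copy).

    Assumes double-quote quoting, so a quoted tweet text may span several
    physical lines; the csv module reassembles such records.
    """
    opener = gzip.open if path.endswith(".gz") else open
    # newline="" is needed so csv sees newlines embedded in quoted fields.
    with opener(path, "rt", encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter="\t", quotechar='"')
        if has_header:
            next(reader, None)  # skip a header row, if the file has one
        for row in reader:
            if not row:
                continue  # tolerate blank lines
            if len(row) != 4:
                raise ValueError(
                    f"line {reader.line_num}: expected 4 fields, got {len(row)}"
                )
            tweet = Tweet(*row)
            if tweet.label not in FLAT_TO_HIERARCHICAL:
                raise ValueError(
                    f"line {reader.line_num}: unknown label {tweet.label!r}"
                )
            yield tweet


def to_hierarchical(label: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Map a flat label to (level A, level B, level C) under the assumed scheme."""
    return FLAT_TO_HIERARCHICAL[label]


def binary_counts(path: str, has_header: bool = False) -> Counter:
    """Count OFF vs. NOT after collapsing every flat label to level A."""
    return Counter(to_hierarchical(t.label)[0] for t in read_troff(path, has_header))
```

For example, `binary_counts("troff-v1.0.tsv.gz")` would give a binary offensive/not-offensive view comparable to OffensEval sub-task A. If your copy of the file turns out to have a header row, pass `has_header=True`.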

@inproceedings{coltekin2020lrec,