GLAD: Groningen Lightweight Authorship Detection

Abstract

We present a simple and effective approach to authorship verification for Dutch, English, Spanish and Greek, which can be easily ported to yet other languages. We train a binary linear classifier both on the features describing known and unknown documents individually, and on the joint features comparing these two types of documents. The list of feature types includes, among others, character n-grams, the lexical overlap, visual text properties and a compression measure. We obtain competitive results that outperform the baseline and position our system among the top PAN shared task participants.

Topics

6 Figures and Tables

Download Full PDF Version (Non-Commercial Use)