Improving automated content analysis with news-specific word embeddings for medium-resourced languages



In this contribution, we investigate whether training a custom embedding model is worth the effort compared with relying on the limited set of available pre-trained models. For Dutch, few embedding models are available, and those that exist are trained on general-purpose text from the World Wide Web. Such models capture the specifics of news articles less well and are therefore likely to be less suitable for studying and understanding the dynamics of this domain.

Feb 7, 2019 9:30 AM
Felicia Loecherbach
Assistant Professor of Political Communication and Journalism

My research interests include understanding online news consumption, drawing on theories from political communication and journalism. I use computational methods to study digital trace data. I publish research and tools only as open source.