Guided Topic Modeling with Word2Vec: A Technical Note

19 September 2024

Authors

Dr. Stefan Salbrechter, Prof. Dr. Thomas Dangl

Abstract

We propose Guided Topic Modeling (GTM), an algorithm for the fast and flexible generation of comprehensive topic clusters from a seed word or a pair of seed words. The algorithm is unsupervised: it performs clustering in the word-embedding space, and several hyperparameters allow the characteristics of the resulting topic clusters to be adjusted. Applications include information retrieval, classification, and the calculation of topic indices from news feeds.
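To make the idea of growing a topic cluster from seed words concrete, the following is a minimal sketch. It is not the GTM algorithm itself, whose details are not given in this abstract. It assumes a plain nearest-neighbour expansion around the centroid of the seed vectors, measured by cosine similarity. The function `expand_topic`, its hyperparameters `top_n` and `min_similarity`, and the three-dimensional toy vectors are all hypothetical and chosen only for illustration; in practice the vectors would come from a trained Word2Vec model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_topic(seeds, embeddings, top_n=3, min_similarity=0.8):
    """Hypothetical helper: return up to top_n words closest to the seed centroid.

    Assumption: this is plain centroid-based neighbour search, used here as a
    stand-in for the clustering step described in the abstract.
    """
    dim = len(next(iter(embeddings.values())))
    # Average the seed vectors to get a single query point.
    centroid = [sum(embeddings[s][i] for s in seeds) / len(seeds) for i in range(dim)]
    # Score every non-seed word against the centroid.
    scored = [
        (word, cosine(vec, centroid))
        for word, vec in embeddings.items()
        if word not in seeds
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # Keep the best top_n candidates that also pass the similarity threshold.
    return [word for word, sim in scored[:top_n] if sim >= min_similarity]


# Toy vectors invented for this example, not taken from any trained model.
TOY_EMBEDDINGS = {
    "inflation": [0.9, 0.1, 0.0],
    "prices": [0.8, 0.2, 0.1],
    "cpi": [0.85, 0.15, 0.05],
    "wages": [0.7, 0.3, 0.1],
    "football": [0.0, 0.1, 0.9],
    "goal": [0.1, 0.0, 0.95],
}

topic = expand_topic(["inflation", "prices"], TOY_EMBEDDINGS)
print(topic)
```

With the seed pair "inflation" and "prices", the sketch keeps the nearby words and drops the sports vocabulary. Raising `min_similarity` narrows the cluster, while raising `top_n` and lowering the threshold broadens it, which is one simple way hyperparameters could shape a topic cluster.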

Further Information