Guided Topic Modeling with Word2Vec: A Technical Note



Dr. Stefan Salbrechter, Prof. Dr. Thomas Dangl


We propose GTM (Guided Topic Modeling), an algorithm for the fast and flexible generation of comprehensive topic clusters from (a pair of) seed words. The unsupervised algorithm performs clustering in the word-embedding space and lets the characteristics of the resulting topic clusters be adjusted through several hyperparameters. Applications of this methodology include information retrieval, classification, and the calculation of various topic indices from news feeds.
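The abstract does not spell out GTM's procedure. To make "clustering in the word-embedding space from seed words" concrete, here is a minimal Python sketch of one simple way such a cluster could be grown: start from the seed words and repeatedly add the vocabulary word whose vector is most cosine-similar to the current cluster centroid. This is not the GTM algorithm. The functions `expand_topic`, `cosine`, and `centroid`, the two hyperparameters `max_size` and `min_similarity`, and the toy three-dimensional vectors are all hypothetical stand-ins for a trained Word2Vec model.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def expand_topic(
    embeddings: Dict[str, Vector],
    seeds: List[str],
    max_size: int = 10,
    min_similarity: float = 0.5,
) -> List[str]:
    """Hypothetical greedy expansion of a topic cluster from seed words.

    Each step adds the out-of-cluster word whose vector is most
    cosine-similar to the current cluster centroid. Expansion stops when
    the cluster reaches max_size or no candidate reaches min_similarity.
    """
    cluster = [w for w in seeds if w in embeddings]  # ignore unknown seeds
    if not cluster:
        return []
    while len(cluster) < max_size:
        center = centroid([embeddings[w] for w in cluster])
        candidates = [
            (cosine(center, vec), word)
            for word, vec in embeddings.items()
            if word not in cluster
        ]
        if not candidates:
            break
        best_sim, best_word = max(candidates)  # most similar candidate
        if best_sim < min_similarity:
            break
        cluster.append(best_word)
    return cluster


# Toy 3-d vectors, made up for illustration; real Word2Vec vectors
# would come from a model trained on a news corpus.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "oil": [1.0, 0.1, 0.0],
    "gas": [0.9, 0.2, 0.0],
    "crude": [0.95, 0.0, 0.1],
    "opec": [0.8, 0.3, 0.1],
    "football": [0.0, 0.1, 1.0],
    "goal": [0.1, 0.0, 0.9],
}

if __name__ == "__main__":
    # prints ['oil', 'gas', 'crude', 'opec']
    print(expand_topic(TOY_EMBEDDINGS, ["oil", "gas"], max_size=4, min_similarity=0.8))
```

In this sketch, `min_similarity` and `max_size` play the role of tuning knobs: raising `min_similarity` to 0.99 stops the toy example at the two seeds, while lowering it lets the cluster grow further. A practical version would load vectors from a trained model and use vectorized nearest-neighbor search instead of a pure-Python loop.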
