What you can do
topica is a general topic-modeling toolkit. This section tours what it does. If your goal is a publishable analysis, pair it with Publishing in a journal.
-
:material-shape: The models
Thirteen model families, from LDA through STM, HDP, dynamic and supervised topics, to short-text models, all with one consistent API.
-
:material-broom: Preprocessing
Tokenize, build a Corpus, prune the vocabulary, detect phrases, and split long documents while preserving metadata.
-
:material-chart-bell-curve: Covariates & STM
Relate topics to document metadata: prevalence and content covariates, effect estimation, clustered standard errors, and GLM links.
-
:material-check-decagram: Diagnostics & validation
Coherence, exclusivity, intrusion tests, stability, alignment, FREX labels, and pyLDAvis, all model-agnostic.
-
:material-compare: Distinguishing words
Fighting Words: find which words separate two corpora, with a significance score for each word.
-
:material-message-text: Short text
Models built for tweets, headlines, and survey answers (PT, GSDMM).
-
:material-arrow-right-circle: Held-out inference
transform new documents onto a fitted model, across every model family.
Everything returns NumPy arrays, fits are deterministic for a fixed seed, and
the variational models parallelize across cores automatically.
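
To make the Distinguishing words card concrete, here is a minimal standard-library sketch of the idea behind Fighting Words: a log-odds ratio with a Dirichlet prior, scaled to a z-score per word. It is not topica's API. The function name `fighting_words` and the `prior` parameter are hypothetical, and the uniform prior is a simplifying assumption (an informative prior estimated from a background corpus is a common alternative).

```python
import math
from collections import Counter


def fighting_words(tokens_a, tokens_b, prior=0.01):
    """Return {word: z-score}. Positive scores favour corpus A, negative favour B.

    Hypothetical helper for illustration only; uses a uniform Dirichlet prior.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    vocab = set(counts_a) | set(counts_b)
    if len(vocab) < 2:
        raise ValueError("need at least two distinct words across both corpora")

    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    prior_total = prior * len(vocab)  # total pseudo-count added to each corpus

    scores = {}
    for word in vocab:
        # Smoothed counts: observed count plus the prior pseudo-count.
        y_a = counts_a[word] + prior
        y_b = counts_b[word] + prior
        # Log-odds of the word within each corpus.
        log_odds_a = math.log(y_a / (n_a + prior_total - y_a))
        log_odds_b = math.log(y_b / (n_b + prior_total - y_b))
        # Approximate variance of the log-odds difference.
        variance = 1.0 / y_a + 1.0 / y_b
        scores[word] = (log_odds_a - log_odds_b) / math.sqrt(variance)
    return scores


# Toy corpora: already tokenized, one flat token list per corpus.
corpus_a = "tax cut tax growth jobs".split()
corpus_b = "health care health jobs growth".split()

z = fighting_words(corpus_a, corpus_b)
for word in sorted(z, key=z.get, reverse=True):
    print(f"{word:>8s} {z[word]:+.3f}")
```

Words used equally often in both corpora score near zero, while words concentrated in one corpus move toward the extremes. With corpora this small the scores stay far from conventional significance thresholds; the scaling by the variance is what keeps rare words from dominating the ranking.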