What you can do

topica is a general topic-modeling toolkit. This section tours what it does. If your goal is a publishable analysis, pair it with Publishing in a journal.

The models

More than a dozen model families, from LDA through STM and its STS sentiment-discourse extension, HDP, dynamic and supervised topics, to short-text and embedding-based models, all with one consistent API.
Preprocessing

Tokenize, build a Corpus, prune the vocabulary, detect phrases, and split long documents while preserving metadata.
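
To make the pruning step concrete, here is a minimal standard-library sketch of tokenizing and then dropping terms by document frequency. It does not use topica's own API: the names `tokenize`, `prune_vocabulary`, `min_df`, and `max_df_ratio` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def prune_vocabulary(docs, min_df=2, max_df_ratio=0.9):
    """Keep terms that occur in at least min_df documents and in at most
    max_df_ratio * len(docs) documents. Returns (pruned_docs, sorted_vocab)."""
    n_docs = len(docs)
    # Document frequency: count each term once per document.
    df = Counter(term for doc in docs for term in set(doc))
    keep = {t for t, c in df.items() if min_df <= c <= max_df_ratio * n_docs}
    pruned = [[t for t in doc if t in keep] for doc in docs]
    return pruned, sorted(keep)


raw = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "The cat chased the dog.",
    "A bird sang.",
]
tokens = [tokenize(d) for d in raw]
pruned, vocab = prune_vocabulary(tokens, min_df=2, max_df_ratio=0.7)
print(vocab)
print(pruned)
```

In this toy corpus "the" is dropped for being too common and single-document words are dropped for being too rare, which leaves the last document empty. A real pipeline has to decide what to do with such documents.
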
Covariates & STM

Relate topics to document metadata: prevalence and content covariates, effect estimation, clustered standard errors, and GLM link functions.
Diagnostics & validation

Coherence, exclusivity, intrusion tests, stability, alignment, ensemble consensus across runs, FREX labels, and pyLDAvis, all model-agnostic.
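
As an illustration of what a coherence score measures, the sketch below computes the UMass coherence of a topic's top words from document co-occurrence counts. This is one common definition, written with the standard library only. It is not necessarily the variant topica implements, and the function name `umass_coherence` is hypothetical.

```python
import math


def umass_coherence(top_words, docs):
    """UMass coherence: sum over ranked word pairs of
    log((D(w_m, w_l) + 1) / D(w_l)), where w_l is ranked above w_m and
    D counts the documents containing all the given words.
    Assumes every top word occurs in at least one document."""
    doc_sets = [set(d) for d in docs]

    def doc_freq(*words):
        return sum(all(w in s for w in words) for s in doc_sets)

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            joint = doc_freq(top_words[m], top_words[l])
            score += math.log((joint + 1) / doc_freq(top_words[l]))
    return score


docs = [
    ["cat", "dog", "pet"],
    ["cat", "dog"],
    ["stock", "market"],
    ["stock", "market", "trade"],
]
coherent = umass_coherence(["cat", "dog"], docs)   # words that co-occur
mixed = umass_coherence(["cat", "stock"], docs)    # words that never co-occur
print(coherent, mixed)
```

Higher (less negative) values indicate that a topic's top words tend to appear in the same documents. Because the score needs only top words and a corpus, it can be applied to any model's output.
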
Distinguishing words

Fighting Words: find which words separate two corpora, with a significance score for each word.
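
The idea behind Fighting Words can be sketched as a log-odds ratio with a Dirichlet prior, standardized to a z-score per word. The version below is a simplified standard-library illustration that assumes a flat prior (`prior`, the same pseudo-count for every word). The function name `fighting_words` is hypothetical and is not taken from topica's API.

```python
import math
from collections import Counter


def fighting_words(tokens_a, tokens_b, prior=0.01):
    """Return {word: z-score}. Positive scores lean toward corpus A,
    negative scores toward corpus B. Uses a flat Dirichlet prior and
    assumes the combined vocabulary has at least two words."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    vocab = sorted(set(counts_a) | set(counts_b))
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    prior_total = prior * len(vocab)

    z_scores = {}
    for w in vocab:
        y_a = counts_a[w] + prior  # smoothed count in corpus A
        y_b = counts_b[w] + prior  # smoothed count in corpus B
        # Difference of smoothed log-odds between the two corpora.
        delta = (math.log(y_a / (n_a + prior_total - y_a))
                 - math.log(y_b / (n_b + prior_total - y_b)))
        # Approximate variance of the log-odds difference.
        variance = 1.0 / y_a + 1.0 / y_b
        z_scores[w] = delta / math.sqrt(variance)
    return z_scores


corpus_a = "tax cut tax cut jobs economy".split()
corpus_b = "health care health care jobs economy".split()
z = fighting_words(corpus_a, corpus_b)
for word, score in sorted(z.items(), key=lambda kv: kv[1]):
    print(f"{word:8s} {score:+.2f}")
```

Words used equally in both corpora score near zero, while words concentrated in one corpus receive large positive or negative scores. The prior keeps rare words from dominating the ranking.
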
Short text

Models built for tweets, headlines, and survey answers (PT, GSDMM).
Held-out inference

Call transform to project new documents onto a fitted model, for every model family.
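
Conceptually, held-out inference keeps the fitted topic-word distributions fixed and estimates only the topic mixture of each new document. The sketch below shows one simple way to do that, a maximum-likelihood "fold-in" by EM. It is an assumption-laden illustration rather than topica's actual procedure, and `fold_in` and its arguments are hypothetical names.

```python
def fold_in(doc_counts, topic_word, n_iter=50):
    """Estimate topic proportions for one unseen document.

    doc_counts: {word_id: count} for the new document.
    topic_word: K rows, each a probability distribution over the vocabulary;
                these stay fixed, as they would for an already fitted model.
    """
    k = len(topic_word)
    theta = [1.0 / k] * k  # start from a uniform mixture
    for _ in range(n_iter):
        expected = [0.0] * k
        for w, count in doc_counts.items():
            # E-step: responsibility of each topic for word w.
            resp = [theta[t] * topic_word[t][w] for t in range(k)]
            norm = sum(resp)
            if norm == 0.0:
                continue  # word has zero probability under every topic
            for t in range(k):
                expected[t] += count * resp[t] / norm
        total = sum(expected)
        if total == 0.0:
            return theta  # nothing in the document is explained by the model
        # M-step: renormalize expected counts into proportions.
        theta = [x / total for x in expected]
    return theta


# Two toy topics over a four-word vocabulary.
topic_word = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
pure = fold_in({0: 3, 1: 1}, topic_word)    # only topic-0 words
mixed = fold_in({0: 2, 2: 2}, topic_word)   # half from each topic
print(pure, mixed)
```

The returned vector plays the role of a document-topic row for the unseen document. Bayesian models would typically add a prior on the mixture instead of using plain maximum likelihood.
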

Everything returns NumPy arrays, fits are deterministic for a fixed seed, and the variational models parallelize across cores automatically.