What you can do

topica is a general topic-modeling toolkit. This section tours what it does. If your goal is a publishable analysis, pair it with Publishing in a journal.

  • :material-shape: The models

    Thirteen model families, from LDA through STM, HDP, dynamic and supervised topics, to short-text models, all with one consistent API.

  • :material-broom: Preprocessing

    Tokenize, build a Corpus, prune the vocabulary, detect phrases, and split long documents while preserving metadata.

  • :material-chart-bell-curve: Covariates & STM

    Relate topics to document metadata: prevalence and content covariates, effect estimation, clustered standard errors, GLM links.

  • :material-check-decagram: Diagnostics & validation

    Coherence, exclusivity, intrusion tests, stability, alignment, FREX labels, and pyLDAvis, all model-agnostic.

  • :material-compare: Distinguishing words

    Fighting Words: find which words separate two corpora, with a significance score for each word.

  • :material-message-text: Short text

    Models built for tweets, headlines, and survey answers (PT, GSDMM).

  • :material-arrow-right-circle: Held-out inference

    Use transform to project new documents onto a fitted model, in every model family.

Everything returns NumPy arrays, fits are deterministic for a fixed seed, and the variational models parallelize across cores automatically.
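
To make the "Distinguishing words" card concrete, here is a minimal NumPy sketch of the Fighting Words idea (the log-odds ratio with a Dirichlet prior, reported as a z-score per word). It does not use topica's API: the function name `fighting_words_z`, the flat prior of 0.01, and the toy counts are all assumptions made for illustration.

```python
import numpy as np

def fighting_words_z(counts_a, counts_b, prior=0.01):
    """Hypothetical helper, not part of topica.

    Returns one z-score per word: positive values lean toward corpus A,
    negative values toward corpus B.
    """
    y_a = np.asarray(counts_a, dtype=float)
    y_b = np.asarray(counts_b, dtype=float)
    alpha = np.full_like(y_a, prior)   # flat Dirichlet prior (assumed)
    a0 = alpha.sum()
    n_a, n_b = y_a.sum(), y_b.sum()

    # Smoothed log-odds of each word within each corpus.
    log_odds_a = np.log((y_a + alpha) / (n_a + a0 - y_a - alpha))
    log_odds_b = np.log((y_b + alpha) / (n_b + a0 - y_b - alpha))
    delta = log_odds_a - log_odds_b

    # Approximate variance of the log-odds difference.
    variance = 1.0 / (y_a + alpha) + 1.0 / (y_b + alpha)
    return delta / np.sqrt(variance)

# Toy word counts for two small corpora over a shared vocabulary.
vocab = ["tax", "health", "the", "border"]
z = fighting_words_z([30, 5, 100, 2], [4, 28, 100, 3])
for word, score in zip(vocab, z):
    print(f"{word:>8s}  {score:+.2f}")
```

Words with |z| above roughly 1.96 differ between the corpora at about the 5% level. In this toy data "tax" leans toward the first corpus and "health" toward the second, while "the" does not separate them.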