Diagnostics

Model-agnostic quality, interpretation, and validation tools. They take any fitted model's topic_word / doc_topic (or raw arrays), so they work the same across every model family. All are available at the top level (topica.<name>) and in topica.diagnostics.

Quality

topica.coherence

Topic coherence and diversity diagnostics.

Windowed PMI-based coherence measures (Röder, Both & Hinneburg, Exploring the Space of Topic Coherence Measures, WSDM 2015) alongside UMass (Mimno et al. 2011) and topic diversity (Dieng, Ruiz & Blei 2020), exposed through a single gensim-style coherence_type= switch:

  • "u_mass" — document co-occurrence, intrinsic; range roughly (-inf, 0].
  • "c_uci" — pairwise PMI over a sliding window (Newman et al. 2010).
  • "c_npmi" — pairwise normalized PMI; range [-1, 1].
  • "c_v" — the indirect-cosine/NPMI measure that correlates best with human judgements in Röder et al.; range roughly [0, 1].

Every measure scores each topic's top words against a reference corpus of tokenized documents. By default that is your training corpus but, as with gensim's CoherenceModel, you can pass any external reference (e.g. a Wikipedia dump) via texts for a more human-aligned signal. topic_diversity reports the fraction of unique words across all topics' top-N, the standard companion to coherence in modern topic-model papers.

These are pure-Python/numpy and work with any model here: pass a fitted model (its top words are read automatically) or an explicit list of word lists.
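To make the windowed measures concrete, here is a minimal sketch of normalized PMI for a single word pair, using boolean sliding-window co-occurrence. It is an illustration under simplifying assumptions, not topica's implementation: the helper names sliding_windows and npmi are hypothetical, documents shorter than the window count as one window, and unseen words score 0.0.

```python
import math

def sliding_windows(doc, window_size):
    """Yield the set of words in each sliding window over one tokenized document."""
    if len(doc) <= window_size:
        # Short documents contribute a single window (an assumption of this sketch).
        yield set(doc)
        return
    for start in range(len(doc) - window_size + 1):
        yield set(doc[start:start + window_size])

def npmi(word_a, word_b, texts, window_size=10, epsilon=1e-12):
    """Normalized PMI of two words, estimated from window-level co-occurrence."""
    n_windows = count_a = count_b = count_ab = 0
    for doc in texts:
        for window in sliding_windows(doc, window_size):
            n_windows += 1
            has_a, has_b = word_a in window, word_b in window
            count_a += has_a
            count_b += has_b
            count_ab += has_a and has_b
    if count_a == 0 or count_b == 0:
        return 0.0  # undefined for unseen words; this sketch returns 0.0
    p_a, p_b, p_ab = count_a / n_windows, count_b / n_windows, count_ab / n_windows
    pmi = math.log((p_ab + epsilon) / (p_a * p_b))
    return pmi / -math.log(p_ab + epsilon)

texts = [["cat", "dog"], ["cat", "dog"], ["sun", "moon"], ["sun", "moon"]]
together = npmi("cat", "dog", texts)   # words that always co-occur
apart = npmi("cat", "sun", texts)      # words that never co-occur
```

A per-topic c_npmi score would average this quantity over all pairs of the topic's top words; c_uci uses the unnormalized PMI in the same way.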


coherence

coherence(topics, texts, *, coherence_type='c_v', topn=10, window_size=None, epsilon=1e-12)

Per-topic coherence against a reference corpus.

Parameters:

  • topics (required): a fitted model, or a list of topics (each a list of words, or of (word, prob) pairs).
  • texts (required): list of tokenized documents (list[list[str]]), the reference corpus. Pass your training documents, or an external corpus.
  • coherence_type (default "c_v"): one of "u_mass", "c_uci", "c_npmi", "c_v".
  • topn (default 10): number of top words per topic to score.
  • window_size (default None): sliding-window width for the windowed measures; None uses the per-measure default (110 for c_v, 10 for c_uci/c_npmi). Ignored by u_mass.

Returns:

numpy.ndarray of shape (num_topics,): the coherence of each topic. Take .mean() for the overall model score.

topica.topic_diversity

topic_diversity(topics, topn=25)

Fraction of unique words across all topics' top-topn words (Dieng, Ruiz & Blei 2020). 1.0 means every top word is unique to its topic; low values indicate topics that recycle the same words.

topics is a fitted model or a list of word lists.
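The definition above reduces to a few lines. This is a sketch of the metric on an explicit list of word lists, not topica's code; the name topic_diversity_sketch is hypothetical.

```python
def topic_diversity_sketch(topics, topn=25):
    """Fraction of unique words among all topics' top-`topn` words."""
    top_words = [word for topic in topics for word in topic[:topn]]
    return len(set(top_words)) / len(top_words)

# Two topics sharing one of their two top words: 3 unique words out of 4.
shared = topic_diversity_sketch([["a", "b"], ["b", "c"]], topn=2)
# Two topics with disjoint top words.
disjoint = topic_diversity_sketch([["a", "b"], ["c", "d"]], topn=2)
```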

topica.exclusivity

exclusivity(model_or_phi, *, n=10)

Per-topic exclusivity, shape (num_topics,).

For each topic, the mean over its top-n words (by probability) of the exclusivity φ_{t,v} / Σ_k φ_{k,v} — how concentrated a word is in this topic rather than shared across topics. Pair with per-topic coherence (e.g. a model's coherence(n)) to make stm's coherence-vs-exclusivity quality plot: good topics sit toward the upper-right (coherent and distinctive).

model_or_phi is a fitted model (uses its topic_word) or a (K, V) array.
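As an illustration of the formula, the sketch below computes per-topic exclusivity from a (K, V) matrix given as plain nested lists. It assumes every word carries some probability in at least one topic; exclusivity_sketch is a hypothetical name and this is not topica's implementation.

```python
def exclusivity_sketch(phi, n=10):
    """Mean of phi[t][v] / sum_k phi[k][v] over each topic's top-n words."""
    vocab_size = len(phi[0])
    column_sums = [sum(row[v] for row in phi) for v in range(vocab_size)]
    scores = []
    for row in phi:
        # Indices of the n most probable words in this topic.
        top = sorted(range(vocab_size), key=lambda v: row[v], reverse=True)[:n]
        scores.append(sum(row[v] / column_sums[v] for v in top) / len(top))
    return scores

phi = [[0.8, 0.2, 0.0],
       [0.0, 0.2, 0.8]]
top1 = exclusivity_sketch(phi, n=1)  # each topic's best word is fully exclusive
top2 = exclusivity_sketch(phi, n=2)  # the shared middle word pulls the mean down
```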

topica.quality_frontier

quality_frontier(model, *, n=10, texts=None, coherence_type='u_mass', plot=False)

Per-topic coherence, exclusivity, and prevalence — the data behind stm's classic coherence-vs-exclusivity quality plot.

Returns a dict of equal-length arrays: topic, coherence, exclusivity, prevalence (mean θ). By default coherence is the fast per-topic UMass score; pass texts and a windowed coherence_type (e.g. "c_v") for the human-aligned measure. Feed the dict straight to pandas / matplotlib; with plot=True (and matplotlib installed) a labeled scatter Figure is returned alongside the dict as (data, fig).

Interpretation

topica.label_topics

label_topics(topic_word, vocabulary, *, n=10)

stm-style topic labels: prob, FREX, lift, and score word lists per topic.

Returns a list (per topic) of dicts with keys prob, frex, lift, score, each a list of (word, value) pairs.

topica.llm_topic_labels

llm_topic_labels(model, texts=None, *, call=None, llm_model='gpt-4o-mini', n_words=12, n_docs=3, max_chars=300, instructions=None, set_labels=False)

A short, human-readable label for each topic, generated by an LLM.

For each topic, assembles a prompt from its top words and representative documents (see topic_label_prompts) and asks a model for a concise label. Returns a list of labels, one per topic.

Supply the model one of two ways:

  • call: any callable str(prompt) -> str(label) — your own client, ollama, whatever. Zero extra dependencies; you own determinism.
  • otherwise llm_model names a model used through llm_backend (the topica[llm] extra). call takes precedence when both are given.

With set_labels=True the labels are stored via topica.set_topic_labels, so they flow into topica.topic_info, topica.topic_labels, and topica.plot_report.

LLM labels are a convenience, not a reproducible measurement: pin the model and set temperature to 0, and keep topica.label_topics (FREX / probability / lift) for the defensible descriptors.

topica.llm_backend

llm_backend(model='gpt-4o-mini', *, system=None, **options)

A str -> str callable backed by the llm library, for the call= argument of llm_topic_labels.

model names any model llm can reach — OpenAI, Anthropic, or local models through plugins such as llm-ollama. options pass through to llm (e.g. temperature=0 for reproducible labels where the provider supports it). Requires the optional llm package (pip install llm or pip install "topica[llm]").

topica.topic_label_prompts

topic_label_prompts(model, texts=None, *, n_words=12, n_docs=3, max_chars=300, instructions=None)

One labeling prompt per topic — exactly the text a model is asked to label.

Each prompt lists the topic's top n_words words and, when texts is given, up to n_docs representative documents (each whitespace-collapsed and truncated to max_chars). instructions overrides the default task framing. Returns a list of prompt strings, one per topic.

This is the plumbing behind llm_topic_labels; build it yourself to see or adjust what the model sees, or to drive a model topica does not know about.

topica.frex

frex(topic_word, vocabulary, *, w=0.5, n=10)

FREX (FRequency–EXclusivity) top words per topic.

For each topic, words are scored by the weighted harmonic mean of the ECDF rank of their probability (frequency) and the ECDF rank of their exclusivity φ_{t,v} / Σ_k φ_{k,v} — the same combination stm uses. w weights frequency vs exclusivity. Returns a list (per topic) of (word, frex).
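A rough sketch of the scoring described above, on a (K, V) matrix given as nested lists. Two assumptions are made here that the text does not settle: the ECDF is taken over the words within each topic, and w is placed on the exclusivity term as in stm (with the default w=0.5 the two readings coincide). The names ecdf_ranks and frex_sketch are hypothetical.

```python
def ecdf_ranks(values):
    """Empirical CDF value of each entry: share of entries less than or equal to it."""
    n = len(values)
    return [sum(other <= v for other in values) / n for v in values]

def frex_sketch(phi, w=0.5):
    """Weighted harmonic mean of exclusivity rank and frequency rank, per topic."""
    vocab_size = len(phi[0])
    column_sums = [sum(row[v] for row in phi) for v in range(vocab_size)]
    scores = []
    for row in phi:
        exclusivity = [row[v] / column_sums[v] for v in range(vocab_size)]
        freq_rank = ecdf_ranks(row)
        excl_rank = ecdf_ranks(exclusivity)
        # Assumption: w weights exclusivity, (1 - w) weights frequency.
        scores.append([1.0 / (w / e + (1.0 - w) / f)
                       for e, f in zip(excl_rank, freq_rank)])
    return scores

phi = [[0.6, 0.3, 0.1],
       [0.1, 0.3, 0.6]]
frex_scores = frex_sketch(phi)
```

Sorting each row of frex_scores in descending order and mapping indices through the vocabulary would give the (word, frex) lists.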

topica.relevance

relevance(topic_word, vocabulary, *, topic=None, lam=0.6, n=10, term_frequency=None)

LDAvis relevance of words to topics (Sievert & Shirley 2014):

relevance(w | t) = λ·log p(w|t) + (1-λ)·log[p(w|t) / p(w)]

λ=1 ranks by probability; λ=0 by lift (exclusivity); the LDAvis default 0.6 balances them. p(w) is the corpus word marginal — pass term_frequency (word counts in vocabulary order) for the empirical marginal, else the topic-averaged φ is used. Returns (word, relevance) lists per topic, or for one topic.
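The formula can be checked by hand on a toy topic. The sketch below evaluates it for one topic row and a word marginal, assuming all probabilities are strictly positive; relevance_sketch is a hypothetical helper, not part of topica.

```python
import math

def relevance_sketch(phi_t, p_w, lam=0.6):
    """lam * log p(w|t) + (1 - lam) * log[p(w|t) / p(w)] for every word."""
    return [lam * math.log(p) + (1.0 - lam) * math.log(p / q)
            for p, q in zip(phi_t, p_w)]

phi_t = [0.5, 0.5]    # p(w|t): both words equally probable in the topic
p_w = [0.5, 0.25]     # p(w): the second word is rarer in the corpus

by_prob = relevance_sketch(phi_t, p_w, lam=1.0)  # pure probability: a tie
by_lift = relevance_sketch(phi_t, p_w, lam=0.0)  # pure lift: rarer word wins
```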

topica.find_thoughts

find_thoughts(doc_topic, texts=None, *, topic, n=3)

The n documents most associated with topic (≈ stm's findThoughts).

Returns a list of (doc_index, proportion, text) sorted by descending topic proportion; text is None when texts is not supplied.
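The ranking itself is simple enough to sketch on a (D, K) matrix given as nested lists. This mirrors the documented return shape but is an illustration only; find_thoughts_sketch is a hypothetical name.

```python
def find_thoughts_sketch(theta, texts=None, *, topic, n=3):
    """Top-n documents by proportion of `topic`, as (doc_index, proportion, text)."""
    order = sorted(range(len(theta)), key=lambda d: theta[d][topic], reverse=True)[:n]
    return [(d, theta[d][topic], texts[d] if texts is not None else None)
            for d in order]

theta = [[0.8, 0.2],
         [0.6, 0.4],
         [0.1, 0.9]]
without_text = find_thoughts_sketch(theta, topic=1, n=2)
with_text = find_thoughts_sketch(theta, ["doc a", "doc b", "doc c"], topic=1, n=2)
```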

topica.find_thoughts_html

find_thoughts_html(model, texts, *, topics=None, n_docs=3, n_words=8, max_chars=400, markdown=False)

Render each topic's most representative documents for close reading, with the topic's top words highlighted in the document text.

Distant reading (top words) is only half of topic validation; the other half is reading the actual documents a topic loads on. This builds a self-contained HTML snippet (or Markdown) you can display in a notebook: per topic, its top words followed by its n_docs highest-θ documents, each truncated to max_chars with the topic's words marked.

model is any fitted model exposing topic_word, doc_topic and vocabulary; texts are the original document strings, aligned to the rows of doc_topic. Returns a string (HTML unless markdown=True).

topica.topic_correlation

topic_correlation(doc_topic, *, threshold=0.05)

Topic-correlation network (≈ stm's topicCorr "simple" method).

Correlates topic proportions across documents; topic pairs whose correlation exceeds threshold become network edges. Returns a TopicCorrelation with the correlation matrix, a 0/1 adjacency matrix (zero diagonal), and the edge list.
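The thresholding step can be sketched as follows, using Pearson correlation between the columns of a (D, K) matrix held as nested lists. Pearson is an assumption of this sketch, as are the hypothetical names pearson and correlation_edges; a constant column would divide by zero here.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_edges(theta, threshold=0.05):
    """Topic pairs (i, j, r) whose doc-topic correlation r exceeds `threshold`."""
    num_topics = len(theta[0])
    columns = [[row[k] for row in theta] for k in range(num_topics)]
    edges = []
    for i in range(num_topics):
        for j in range(i + 1, num_topics):
            r = pearson(columns[i], columns[j])
            if r > threshold:
                edges.append((i, j, r))
    return edges

theta = [[0.5, 0.4, 0.1],
         [0.1, 0.2, 0.7],
         [0.4, 0.4, 0.2],
         [0.1, 0.1, 0.8]]
edges = correlation_edges(theta)  # topics 0 and 1 rise and fall together
```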

topica.prepare_pyldavis

prepare_pyldavis(model, docs, **kwargs)

Build the LDAvis intertopic-distance visualization for a fitted model.

docs are the tokenized training documents (list[list[str]]), used for document lengths and term frequencies. If pyLDAvis is installed this returns its PreparedData (pass to pyLDAvis.display / save_html); otherwise it returns a PyLDAvisInputs you can feed to pyLDAvis.prepare later. Extra kwargs go to pyLDAvis.prepare (e.g. sort_topics=False).

Validation

topica.word_intrusion

word_intrusion(model_or_phi, vocabulary=None, *, n_words=5, seed=0)

Build a word intrusion test for human topic validation.

For each topic, take its top n_words words and splice in one intruder — a word that ranks highly in some other topic but has low probability in this one. A coherent topic is one where a human can reliably spot the intruder (Chang et al. 2009, "Reading Tea Leaves"). Returns a list (per topic) of dicts with:

  • topic — the topic index,
  • words — the n_words + 1 words in shuffled, presentation order,
  • intruder — the intruder word,
  • intruder_index — its position in words (the answer key).

model_or_phi is a fitted model (uses its topic_word / vocabulary) or a (K, V) array (then pass vocabulary). Deterministic for a fixed seed.
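One way such a test item could be assembled is sketched below. The rule for "low probability in this topic" (bottom half of the topic's ranking) is an assumption made for illustration, and word_intrusion_sketch is a hypothetical name; topica's own selection rule is not specified here. The sketch raises IndexError if no candidate intruder exists.

```python
import random

def word_intrusion_sketch(phi, vocabulary, n_words=5, seed=0):
    """Per topic: top words plus one intruder, shuffled, with the answer key."""
    rng = random.Random(seed)
    vocab_size = len(vocabulary)
    # Word indices of each topic, most probable first.
    ranked = [sorted(range(vocab_size), key=lambda v: row[v], reverse=True)
              for row in phi]
    items = []
    for t, order in enumerate(ranked):
        top = order[:n_words]
        low_here = set(order[vocab_size // 2:])  # assumed "low probability" rule
        # Intruder candidates: top words of other topics that rank low here.
        candidates = [v for u, other in enumerate(ranked) if u != t
                      for v in other[:n_words] if v in low_here]
        intruder = vocabulary[rng.choice(candidates)]
        words = [vocabulary[v] for v in top] + [intruder]
        rng.shuffle(words)
        items.append({"topic": t, "words": words, "intruder": intruder,
                      "intruder_index": words.index(intruder)})
    return items

vocabulary = ["goal", "match", "team", "vote", "party", "law"]
phi = [[0.40, 0.30, 0.20, 0.05, 0.03, 0.02],
       [0.02, 0.03, 0.05, 0.20, 0.30, 0.40]]
items = word_intrusion_sketch(phi, vocabulary, n_words=2, seed=0)
```

Show annotators only words; keep intruder and intruder_index aside to score their answers.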

topica.document_intrusion

document_intrusion(model_or_theta, texts=None, *, n_docs=3, seed=0)

Build a document intrusion test for human topic validation.

For each topic, take the n_docs documents with the highest proportion of that topic and splice in one intruder — a document where the topic is nearly absent (and another topic dominates). A topic that captures real document similarity is one where a human can spot the intruder. Returns a list (per topic) of dicts with:

  • topic — the topic index,
  • doc_indices — the n_docs + 1 document indices in shuffled order,
  • intruder_index — the intruder's position in doc_indices,
  • texts — the corresponding text previews (only if texts is given).

model_or_theta is a (D, K) θ array (or a fitted model, whose doc_topic is used). Deterministic for a fixed seed.

topica.bootstrap_stability

bootstrap_stability(docs, *, k, n_boot=20, topn=10, seed=0, model_factory=None, **fit_kwargs)

Flag fragile topics by refitting on bootstrap resamples of the corpus.

The standard defense against "topic modeling is a fishing expedition": fit a reference model on the full corpus, then refit on n_boot resamples of the documents (drawn with replacement). Each bootstrap model's topics are matched to the reference's by top-word overlap, and a reference topic's stability is the mean Jaccard overlap of its top-topn words with its matched bootstrap topic. Topics that dissolve under resampling score low.

Matching is on the top words as strings, so it is correct even though each resample is fit as a fresh corpus with its own vocabulary indexing.

Parameters:

  • docs (required): the corpus (list[list[str]] or a Corpus).
  • k (required): number of topics.
  • n_boot (default 20): number of bootstrap resamples.
  • model_factory (default None): callable(seed) -> unfitted model. Defaults to LDA(num_topics=k, seed=seed). Use it to bootstrap any model.
  • fit_kwargs (optional keyword arguments): forwarded to each model's fit (e.g. iterations=500).

Returns:

dict with topic (indices), stability (per-topic mean Jaccard in [0, 1]), mean (overall), and reference (the full-corpus model).
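The stability score itself can be illustrated without fitting anything. The sketch below takes the reference topics and the topics of each bootstrap run as lists of top words and, as a simplification, matches each reference topic to its best-overlapping bootstrap topic rather than enforcing a one-to-one matching. The names jaccard and stability_sketch are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two word lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def stability_sketch(reference_topics, bootstrap_runs):
    """Mean best-match Jaccard of each reference topic across bootstrap runs."""
    per_topic = []
    for ref in reference_topics:
        overlaps = [max(jaccard(ref, cand) for cand in run) for run in bootstrap_runs]
        per_topic.append(sum(overlaps) / len(overlaps))
    return per_topic

reference = [["a", "b", "c"], ["x", "y", "z"]]
runs = [
    [["x", "y", "z"], ["a", "b", "d"]],  # run 1: second topic drifted slightly
    [["a", "b", "c"], ["x", "q", "r"]],  # run 2: "x" topic mostly dissolved
]
stability = stability_sketch(reference, runs)
```

Matching on the word strings, as here, is what makes the comparison independent of each resample's vocabulary indexing.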

topica.search_k

search_k(docs, ks, *, model='lda', prevalence=None, held_out=None, iterations=500, em_iters=30, num_samples=3, sample_interval=10, seed=42, coherence_n=10)

Fit a model for each K and report quality metrics (stm's searchK).

With model="lda" (default) fits a topica.LDA per K. With model="stm" fits a topica.STM per K; pass prevalence (a covariate design matrix) to scan K for the model you'll actually report.

Returns a list of dicts (one per K) with k, coherence (mean UMass), exclusivity (mean top-word exclusivity), and, for model="lda" with held_out, perplexity (held-out). The coherence/exclusivity trade-off is the signal: there is rarely a single best K, so read it alongside interpretability (see the K guide).

topica.check_residuals

check_residuals(model, docs, *, tol=0.01)

Residual-dispersion test for whether K is too small (Taddy 2012), a faithful port of R stm's checkResiduals.

Under a correctly specified model the multinomial residuals have dispersion σ² = 1. A dispersion well above 1 (small p-value) is evidence the latent topics cannot absorb the overdispersion, i.e. K is too low. Run it alongside search_k. docs are the tokenized training documents aligned to model.doc_topic's rows.

Returns a ResidualCheck with dispersion (σ²), pvalue (χ² test of σ²=1 vs σ²>1), and df.

topica.align_topics

align_topics(a, b, *, metric='cosine')

Match the topics of two fits one-to-one by minimal total distance (Hungarian on the cross-fit topic-word distance matrix). Use it to compare runs across seeds, across K, or train vs. resample — your fits are deterministic, so the matching is reproducible.

a, b are fitted models or K×V topic-word arrays (same vocabulary order). metric is "cosine" or "js" (Jensen-Shannon). Returns a list of (topic_a, topic_b, distance) sorted by topic_a.
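For small K the minimal-total-distance matching can be sketched by brute force over permutations, which gives the same assignment a Hungarian solver would. This is an illustration with cosine distance on nested lists, assuming both fits have the same K; cosine_distance and align_sketch are hypothetical names. For realistic K, scipy.optimize.linear_sum_assignment solves the same problem efficiently.

```python
import math
from itertools import permutations

def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def align_sketch(a, b):
    """One-to-one topic matching minimizing total cosine distance (brute force)."""
    k = len(a)
    cost = [[cosine_distance(a[i], b[j]) for j in range(k)] for i in range(k)]
    best = min(permutations(range(k)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(k)))
    return [(i, best[i], cost[i][best[i]]) for i in range(k)]

fit_a = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
fit_b = [[0.1, 0.3, 0.6], [0.6, 0.3, 0.1]]  # same topics, listed in swapped order
matches = align_sketch(fit_a, fit_b)
```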

topica.topic_stability

topic_stability(runs, *, topn=10, metric='cosine')

Term-centric stability of topics across multiple fits (Greene, O'Callaghan & Cunningham 2014): a "how robust is this K?" score.

runs is a list of fitted models or topic-word arrays over the same vocabulary (e.g. fits at different seeds, or on bootstrap resamples). Each later run's topics are matched to the first run's, and stability is the mean Jaccard overlap of their top-topn words. Returns a float in [0, 1]; higher means more reproducible topics.

Reporting

Model-neutral summaries that work on any fitted model.

topica.plot_report

plot_report(model, *, texts=None, timestamps=None, groups=None, n=8, coherence_type='c_v', title=None, figsize=None)

A one-figure overview of a fitted model, composed from topica's diagnostics.

Panels are adaptive: each is drawn only when its inputs and the model support it, so the report works across every model. Always included is the topic prevalence bar (mean doc_topic per topic, labelled with each topic's top words). Added when available:

  • topic quality — coherence vs exclusivity (the stm quality frontier); a windowed coherence_type is used when texts is given (raw strings or token lists are both accepted), else UMass;
  • topic correlation — the doc_topic correlation heatmap (K in 2..40);
  • topics over time — mean prevalence per distinct timestamps value;
  • topics per class — mean prevalence within each level of groups.

Returns a matplotlib Figure; save it with fig.savefig("report.png") or .pdf. Requires matplotlib (the only added dependency).

topica.topic_info

topic_info(model, texts=None, *, n=8, labels=None) -> list

One summary row per topic — the headline table for a fitted model.

Each row is a dict with topic (id), label, size (hard assignments), prevalence (mean of the topic's doc_topic column), and top_words (the top-n words, via model.top_words when available else the raw topic-word row). When texts is given each row also carries representative_docs, its n highest-loading documents. On a clustering model with outliers a final topic=-1 row reports the outlier count and carries no words. Rows are sorted by topic id.

labels overrides the labels for this table only; otherwise topic_labels (custom labels over topic_names) is used.

topica.topics_over_time

topics_over_time(model, timestamps, *, normalize=True) -> dict

Mean topic prevalence at each distinct timestamp value.

timestamps is one value per document. For each distinct timestamp we average doc_topic over the documents stamped with it, giving a topic prevalence trajectory you can plot directly. With normalize=True each row is rescaled to sum to one (so it reads as a topic share at that time).

Returns {"labels": [sorted distinct timestamps], "prevalence": (T, K) array}.
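The averaging described above can be sketched on a (D, K) matrix held as nested lists. It follows the documented return shape but is not topica's code; topics_over_time_sketch is a hypothetical name and the prevalence rows are plain lists rather than an array.

```python
def topics_over_time_sketch(theta, timestamps, normalize=True):
    """Mean doc-topic row per distinct timestamp, optionally rescaled to sum to one."""
    labels = sorted(set(timestamps))
    prevalence = []
    for label in labels:
        rows = [row for row, stamp in zip(theta, timestamps) if stamp == label]
        mean = [sum(col) / len(rows) for col in zip(*rows)]
        if normalize:
            total = sum(mean)
            mean = [m / total for m in mean]
        prevalence.append(mean)
    return {"labels": labels, "prevalence": prevalence}

theta = [[0.8, 0.2],
         [0.6, 0.4],
         [0.1, 0.9]]
trajectory = topics_over_time_sketch(theta, [2020, 2020, 2021])
```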

topica.topics_per_class

topics_per_class(model, groups, *, ci=0.95)

Mean topic prevalence within each level of a grouping variable.

A thin wrapper over topica.by_strata on model.doc_topic: groups is one label per document, and the result is a list of per-stratum prevalence records (mean and confidence interval per topic).