Diagnostics
Model-agnostic quality, interpretation, and validation tools. They take any
fitted model's topic_word / doc_topic (or raw arrays), so they work the same
across every model family. All are available at the top level (topica.<name>) and
in topica.diagnostics.
Quality
topica.coherence
Topic coherence and diversity diagnostics.
Windowed PMI-based coherence measures (Röder, Both & Hinneburg, Exploring the
Space of Topic Coherence Measures, WSDM 2015) alongside UMass (Mimno et al.
2011) and topic diversity (Dieng, Ruiz & Blei 2020), exposed through a single
gensim-style coherence_type= switch:
- "u_mass": document co-occurrence, intrinsic; range roughly (-inf, 0].
- "c_uci": pairwise PMI over a sliding window (Newman et al. 2010).
- "c_npmi": pairwise normalized PMI; range [-1, 1].
- "c_v": the indirect-cosine/NPMI measure that correlates best with human judgements in Röder et al.; range roughly [0, 1].
Every measure scores each topic's top words against a reference corpus of
tokenized documents. By default that is your training corpus, but, as with
gensim's CoherenceModel, you can pass any external reference (e.g. a
Wikipedia dump) via texts for a more human-aligned signal. topic_diversity
reports the fraction of unique words across all topics' top-N, the standard
companion to coherence in modern topic-model papers.
These are pure-Python/numpy and work with any model here: pass a fitted model (its top words are read automatically) or an explicit list of word lists.
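To make two of these measures concrete, here is a minimal pure-Python sketch of UMass and of NPMI over a boolean sliding window. It is an illustration under stated assumptions, not topica's implementation: umass_sketch and npmi_sketch are hypothetical helpers, UMass is averaged over word pairs (Mimno et al. sum them), every top word is assumed to occur in the reference corpus, the window default of 10 is an arbitrary choice, and c_v is not shown.

```python
import math
from itertools import combinations


def umass_sketch(topic, docs):
    """Mean UMass score of one topic's ordered top words (hypothetical helper)."""
    doc_sets = [set(d) for d in docs]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(all(w in s for w in words) for s in doc_sets)

    scores = []
    for i in range(1, len(topic)):
        for j in range(i):
            # log((D(w_i, w_j) + 1) / D(w_j)); assumes D(w_j) > 0.
            scores.append(math.log((doc_freq(topic[i], topic[j]) + 1) / doc_freq(topic[j])))
    return sum(scores) / len(scores)


def npmi_sketch(topic, docs, window_size=10):
    """Mean pairwise NPMI over a boolean sliding window (hypothetical helper)."""
    windows = []
    for d in docs:
        if len(d) <= window_size:
            windows.append(set(d))  # a short document is a single window
        else:
            windows.extend(set(d[i:i + window_size]) for i in range(len(d) - window_size + 1))
    n = len(windows)

    def prob(*words):
        # Fraction of windows containing every word in `words`.
        return sum(all(w in win for w in words) for win in windows) / n

    scores = []
    for a, b in combinations(topic, 2):
        p_ab = prob(a, b)
        if p_ab == 0:
            scores.append(-1.0)  # never co-occur: NPMI lower bound
        elif p_ab == 1:
            scores.append(1.0)   # co-occur in every window: NPMI upper bound
        else:
            pmi = math.log(p_ab / (prob(a) * prob(b)))
            scores.append(pmi / -math.log(p_ab))
    return sum(scores) / len(scores)
```

Dropping the normalization (returning pmi instead of pmi / -log p_ab) gives the c_uci-style score over the same windows.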
coherence
Per-topic coherence against a reference corpus.
Parameters:
| Name | Description | Default |
|---|---|---|
| topics | A fitted model, or a list of topics (each a list of words, or of …). | required |
| texts | List of tokenized documents (list[list[str]]): the reference corpus. Pass your training documents, or an external corpus. | required |
| coherence_type | One of "u_mass", "c_uci", "c_npmi", "c_v". | 'c_v' |
| topn | Number of top words per topic to score. | 10 |
| window_size | Sliding-window width for the windowed measures; None uses the per-measure default (110 for …). | None |
Returns:
| Type | Description |
|---|---|
| numpy.ndarray | Shape (num_topics,): the coherence of each topic. Take .mean() for the overall model score. |
topica.topic_diversity
Fraction of unique words across all topics' top-topn words (Dieng,
Ruiz & Blei 2020). 1.0 means every top word is unique to its topic; low
values indicate topics that recycle the same words.
topics is a fitted model or a list of word lists.
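The definition reduces to a set count. A minimal sketch on plain word lists; topic_diversity_sketch is a hypothetical helper, and topn=25 follows Dieng et al. rather than any documented topica default.

```python
def topic_diversity_sketch(topics, topn=25):
    """Share of unique words among all topics' top-`topn` words (hypothetical helper)."""
    tops = [list(t)[:topn] for t in topics]
    total = sum(len(t) for t in tops)           # K * topn when every topic has topn words
    unique = len({w for t in tops for w in t})  # distinct words across all topics
    return unique / total if total else 0.0
```

For example, two topics sharing one of their two top words score 3 / 4 = 0.75.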
topica.exclusivity
Per-topic exclusivity, shape (num_topics,).
For each topic, the mean over its top-n words (by probability) of the
exclusivity φ_{t,v} / Σ_k φ_{k,v} — how concentrated a word is in this
topic rather than shared across topics. Pair with per-topic coherence (e.g.
a model's coherence(n)) to make stm's coherence-vs-exclusivity quality
plot: good topics sit toward the upper-right (coherent and distinctive).
model_or_phi is a fitted model (uses its topic_word) or a (K, V)
array.
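A minimal numpy sketch of the formula above for a (K, V) φ array. exclusivity_sketch and its default n=10 are hypothetical, and every column of φ is assumed to be positive (as with smoothed topic-word estimates) so the division is defined.

```python
import numpy as np


def exclusivity_sketch(phi, n=10):
    """Mean exclusivity of each topic's top-n words (hypothetical helper)."""
    phi = np.asarray(phi, dtype=float)
    excl = phi / phi.sum(axis=0, keepdims=True)            # phi[t, v] / sum_k phi[k, v]
    top = np.argsort(-phi, axis=1, kind="stable")[:, :n]   # top-n words by probability
    return np.take_along_axis(excl, top, axis=1).mean(axis=1)
```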
topica.quality_frontier
Per-topic coherence, exclusivity, and prevalence — the data behind stm's classic coherence-vs-exclusivity quality plot.
Returns a dict of equal-length arrays: topic, coherence,
exclusivity, prevalence (mean θ). By default coherence is the fast
per-topic UMass score; pass texts and a windowed coherence_type (e.g.
"c_v") for the human-aligned measure. Feed the dict straight to pandas /
matplotlib; with plot=True (and matplotlib installed) a labeled scatter
Figure is returned alongside the dict as (data, fig).
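Because the result is plain arrays, post-processing is straightforward. As one hedged example, this sketch flags topics on the upper-right Pareto frontier, i.e. topics that no other topic beats on both coherence and exclusivity. frontier_mask is a hypothetical helper, not part of topica.

```python
import numpy as np


def frontier_mask(coherence, exclusivity):
    """True for topics not dominated on both axes by another topic (hypothetical helper)."""
    c = np.asarray(coherence, dtype=float)
    e = np.asarray(exclusivity, dtype=float)
    mask = np.ones(len(c), dtype=bool)
    for i in range(len(c)):
        # j dominates i if it is at least as good on both axes and strictly better on one.
        dominated = (c >= c[i]) & (e >= e[i]) & ((c > c[i]) | (e > e[i]))
        mask[i] = not dominated.any()
    return mask
```

With data = topica.quality_frontier(model), frontier_mask(data["coherence"], data["exclusivity"]) marks the candidates worth reading first.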
Interpretation
topica.label_topics
stm-style topic labels: prob, FREX, lift, and score word lists per topic.
Returns a list (per topic) of dicts with keys prob, frex, lift,
score, each a list of (word, value) pairs.
topica.llm_topic_labels
llm_topic_labels(model, texts=None, *, call=None, llm_model='gpt-4o-mini', n_words=12, n_docs=3, max_chars=300, instructions=None, set_labels=False)
A short, human-readable label for each topic, generated by an LLM.
For each topic, assembles a prompt from its top words and representative
documents (see topic_label_prompts) and asks a model for a concise
label. Returns a list of labels, one per topic.
Supply the model one of two ways:
- call: any callable str(prompt) -> str(label), such as your own client or ollama. Zero extra dependencies; you own determinism.
- otherwise, llm_model names a model used through llm_backend (the topica[llm] extra).
call takes precedence when both are given.
With set_labels=True the labels are stored via
topica.set_topic_labels, so they flow into topica.topic_info,
topica.topic_labels, and topica.plot_report.
LLM labels are a convenience, not a reproducible measurement: pin the model
and set temperature to 0, and keep topica.label_topics (FREX /
probability / lift) for the defensible descriptors.
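Since call owns determinism, one practical pattern is to cache each label by prompt so reruns reuse earlier answers instead of re-querying the model. A minimal sketch; cached_call and the JSON cache file are hypothetical, not part of topica.

```python
import hashlib
import json
from pathlib import Path


def cached_call(call, cache_path):
    """Wrap a str -> str labeler so each distinct prompt is sent only once (hypothetical helper)."""
    path = Path(cache_path)
    cache = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}

    def wrapped(prompt: str) -> str:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = call(prompt)
            # Persist after every new label so a later run reuses it.
            path.write_text(json.dumps(cache, ensure_ascii=False), encoding="utf-8")
        return cache[key]

    return wrapped
```

Usage would look like llm_topic_labels(model, texts, call=cached_call(my_client, "labels.json")), where my_client is your own str -> str function.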
topica.llm_backend
A str -> str callable backed by the llm library, for the call=
argument of llm_topic_labels.
model names any model the llm library can reach: OpenAI, Anthropic, or local
models through plugins such as llm-ollama. options pass through to
llm (e.g. temperature=0 for reproducible labels where the provider
supports it). Requires the optional llm package (pip install llm or
pip install "topica[llm]").
topica.topic_label_prompts
One labeling prompt per topic — exactly the text a model is asked to label.
Each prompt lists the topic's top n_words words and, when texts is
given, up to n_docs representative documents (each whitespace-collapsed
and truncated to max_chars). instructions overrides the default task
framing. Returns a list of prompt strings, one per topic.
This is the plumbing behind llm_topic_labels; call it directly to see
or adjust what the model sees, or to drive a model topica does not know about.
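The assembly described above can be sketched as follows. build_label_prompt and its default framing sentence are hypothetical (topica's own default wording is not shown here); n_docs=3 and max_chars=300 mirror the llm_topic_labels signature.

```python
def build_label_prompt(top_words, docs=(), n_docs=3, max_chars=300, instructions=None):
    """Assemble one topic-labeling prompt (hypothetical helper)."""
    framing = instructions or "Give a short label (2-5 words) for the topic described below."
    lines = [framing, "", "Top words: " + ", ".join(top_words)]
    for i, doc in enumerate(list(docs)[:n_docs], start=1):
        text = " ".join(doc.split())[:max_chars]  # collapse whitespace, then truncate
        lines.append(f"Document {i}: {text}")
    return "\n".join(lines)
```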
topica.frex
FREX (FRequency–EXclusivity) top words per topic.
For each topic, words are scored by the weighted harmonic mean of the ECDF
rank of their probability (frequency) and the ECDF rank of their exclusivity
φ_{t,v} / Σ_k φ_{k,v} — the same combination stm uses. w weights
frequency vs exclusivity. Returns a list (per topic) of (word, frex).
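A minimal numpy sketch of the scoring. frex_sketch is hypothetical; it assumes w weights the frequency term and 1 - w the exclusivity term, computes ECDF ranks as rank / V with ties broken by position (stm averages ties), and assumes positive φ columns.

```python
import numpy as np


def frex_sketch(phi, vocabulary, w=0.5, n=10):
    """Top-n (word, frex) pairs per topic (hypothetical helper)."""
    phi = np.asarray(phi, dtype=float)
    K, V = phi.shape
    excl = phi / phi.sum(axis=0, keepdims=True)  # exclusivity phi[t, v] / sum_k phi[k, v]

    def ecdf(row):
        # Rank of each value within the row, scaled to (0, 1].
        ranks = np.empty(V)
        ranks[np.argsort(row, kind="stable")] = np.arange(1, V + 1)
        return ranks / V

    out = []
    for t in range(K):
        freq, ex = ecdf(phi[t]), ecdf(excl[t])
        score = 1.0 / (w / freq + (1.0 - w) / ex)  # weighted harmonic mean
        top = np.argsort(-score, kind="stable")[:n]
        out.append([(vocabulary[v], float(score[v])) for v in top])
    return out
```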
topica.relevance
LDAvis relevance of words to topics (Sievert & Shirley 2014):
relevance(w | t) = λ·log p(w|t) + (1-λ)·log[p(w|t) / p(w)]
λ=1 ranks by probability; λ=0 by lift (exclusivity); the LDAvis default 0.6
balances them. p(w) is the corpus word marginal — pass term_frequency
(word counts in vocabulary order) for the empirical marginal, else the
topic-averaged φ is used. Returns (word, relevance) lists per topic, or a single
such list when only one topic is requested.
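The formula vectorizes directly over a (K, V) φ array. A minimal sketch; relevance_sketch is hypothetical, lam stands for λ, and φ is assumed strictly positive so the logs are finite.

```python
import numpy as np


def relevance_sketch(phi, lam=0.6, term_frequency=None):
    """(K, V) matrix of lam*log p(w|t) + (1-lam)*log[p(w|t)/p(w)] (hypothetical helper)."""
    phi = np.asarray(phi, dtype=float)
    if term_frequency is None:
        p_w = phi.mean(axis=0)  # topic-averaged phi as the word marginal
    else:
        tf = np.asarray(term_frequency, dtype=float)
        p_w = tf / tf.sum()     # empirical marginal from word counts
    log_phi = np.log(phi)
    return lam * log_phi + (1.0 - lam) * (log_phi - np.log(p_w))
```

Sorting each row in descending order gives the per-topic word ranking.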
topica.find_thoughts
The n documents most associated with topic (≈ stm's findThoughts).
Returns a list of (doc_index, proportion, text) sorted by descending
topic proportion; text is None when texts is not supplied.
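On a (D, K) θ array the ranking is a single argsort. A minimal sketch; find_thoughts_sketch is a hypothetical helper that returns the same (doc_index, proportion, text) shape.

```python
import numpy as np


def find_thoughts_sketch(theta, topic, n=3, texts=None):
    """The n documents with the highest proportion of `topic` (hypothetical helper)."""
    col = np.asarray(theta, dtype=float)[:, topic]
    order = np.argsort(-col, kind="stable")[:n]  # highest proportion first
    return [(int(d), float(col[d]), texts[d] if texts is not None else None) for d in order]
```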
topica.find_thoughts_html
find_thoughts_html(model, texts, *, topics=None, n_docs=3, n_words=8, max_chars=400, markdown=False)
Render each topic's most representative documents for close reading, with the topic's top words highlighted in the document text.
Distant reading (top words) is only half of topic validation; the other half
is reading the actual documents a topic loads on. This builds a self-contained
HTML snippet (or Markdown) you can display in a notebook: per topic, its
top words followed by its n_docs highest-θ documents, each truncated to
max_chars with the topic's words marked.
model is any fitted model exposing topic_word, doc_topic and
vocabulary; texts are the original document strings, aligned to the
rows of doc_topic. Returns a string (HTML unless markdown=True).
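The highlighting step can be approximated as: truncate, HTML-escape, then wrap whole-word, case-insensitive matches in <mark>. This is a rough sketch of the idea, not topica's markup; highlight is a hypothetical helper and assumes top words are plain tokens.

```python
import html
import re


def highlight(text, words, max_chars=400):
    """Truncate, escape, and mark whole-word matches of `words` (hypothetical helper)."""
    snippet = html.escape(text[:max_chars])
    if not words:
        return snippet
    # Longest words first so a longer token wins over its prefix.
    alternatives = "|".join(re.escape(html.escape(w)) for w in sorted(words, key=len, reverse=True))
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(r"<mark>\1</mark>", snippet)
```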
topica.topic_correlation
Topic-correlation network (≈ stm's topicCorr "simple" method).
Correlates topic proportions across documents; topic pairs whose correlation
exceeds threshold become network edges. Returns a
TopicCorrelation with the correlation matrix, a 0/1 adjacency
matrix (zero diagonal), and the edge list.
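The "simple" method amounts to thresholding the column correlations of θ. A minimal sketch returning the three pieces as a tuple instead of a TopicCorrelation; topic_corr_sketch is hypothetical, and no default threshold is assumed.

```python
import numpy as np


def topic_corr_sketch(theta, threshold):
    """Correlation matrix, 0/1 adjacency, and edge list (hypothetical helper)."""
    cor = np.corrcoef(np.asarray(theta, dtype=float), rowvar=False)  # (K, K) over topic columns
    adj = (cor > threshold).astype(int)
    np.fill_diagonal(adj, 0)  # no self-loops
    K = cor.shape[0]
    edges = [(i, j) for i in range(K) for j in range(i + 1, K) if adj[i, j]]
    return cor, adj, edges
```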
topica.prepare_pyldavis
Build the LDAvis intertopic-distance visualization for a fitted model.
docs are the tokenized training documents (list[list[str]]), used for
document lengths and term frequencies. If pyLDAvis is installed this
returns its PreparedData (pass to pyLDAvis.display / save_html);
otherwise it returns a PyLDAvisInputs you can feed to
pyLDAvis.prepare later. Extra kwargs go to pyLDAvis.prepare
(e.g. sort_topics=False).
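The two corpus-derived inputs are simple counts. A minimal sketch; ldavis_counts is hypothetical. Its outputs correspond to pyLDAvis.prepare's doc_lengths and term_frequency arguments, alongside topic_term_dists, doc_topic_dists and vocab.

```python
from collections import Counter


def ldavis_counts(docs, vocabulary):
    """Document lengths and vocabulary-ordered term frequencies (hypothetical helper)."""
    doc_lengths = [len(d) for d in docs]
    counts = Counter(w for d in docs for w in d)
    term_frequency = [counts[w] for w in vocabulary]  # 0 for words absent from docs
    return doc_lengths, term_frequency
```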
Validation
topica.word_intrusion
Build a word intrusion test for human topic validation.
For each topic, take its top n_words words and splice in one intruder
— a word that ranks highly in some other topic but has low probability in
this one. A coherent topic is one where a human can reliably spot the
intruder (Chang et al. 2009, "Reading Tea Leaves"). Returns a list (per
topic) of dicts with:
- topic: the topic index,
- words: the n_words + 1 words in shuffled, presentation order,
- intruder: the intruder word,
- intruder_index: its position in words (the answer key).
model_or_phi is a fitted model (uses its topic_word / vocabulary)
or a (K, V) array (then pass vocabulary). Deterministic for a fixed
seed.
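A minimal sketch of how such a test can be built. word_intrusion_sketch is hypothetical, and "low probability in this topic" is approximated here as "at or below the topic's median φ", which is an assumption rather than topica's rule.

```python
import random

import numpy as np


def word_intrusion_sketch(phi, vocabulary, n_words=5, seed=0):
    """One intrusion item per topic, seeded for determinism (hypothetical helper)."""
    phi = np.asarray(phi, dtype=float)
    rng = random.Random(seed)
    top = np.argsort(-phi, axis=1, kind="stable")[:, :n_words]  # (K, n_words) word ids
    items = []
    for t in range(phi.shape[0]):
        own = set(top[t].tolist())
        low = np.median(phi[t])
        # Candidates: other topics' top words that sit in this topic's bottom half.
        candidates = sorted({v for k in range(phi.shape[0]) if k != t
                             for v in top[k].tolist() if v not in own and phi[t, v] <= low})
        if not candidates:
            continue  # no valid intruder for this topic
        intruder = vocabulary[rng.choice(candidates)]
        words = [vocabulary[v] for v in top[t].tolist()] + [intruder]
        rng.shuffle(words)
        items.append({"topic": t, "words": words, "intruder": intruder,
                      "intruder_index": words.index(intruder)})
    return items
```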
topica.document_intrusion
Build a document intrusion test for human topic validation.
For each topic, take the n_docs documents with the highest proportion of
that topic and splice in one intruder — a document where the topic is
nearly absent (and another topic dominates). A topic that captures real
document similarity is one where a human can spot the intruder. Returns a
list (per topic) of dicts with:
- topic: the topic index,
- doc_indices: the n_docs + 1 document indices in shuffled order,
- intruder_index: the intruder's position in doc_indices,
- texts: the corresponding text previews (only if texts is given).
model_or_theta is a (D, K) θ array (or a fitted model, whose
doc_topic is used). Deterministic for a fixed seed.
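The document version follows the same pattern on θ. A minimal sketch; document_intrusion_sketch is hypothetical, and "nearly absent" is assumed to mean a share of at most max_share=0.1 with some other topic as the document's argmax.

```python
import random

import numpy as np


def document_intrusion_sketch(theta, n_docs=4, seed=0, max_share=0.1):
    """One document-intrusion item per topic (hypothetical helper)."""
    theta = np.asarray(theta, dtype=float)
    rng = random.Random(seed)
    dominant = theta.argmax(axis=1)  # each document's dominant topic
    items = []
    for t in range(theta.shape[1]):
        top = np.argsort(-theta[:, t], kind="stable")[:n_docs].tolist()
        candidates = [d for d in range(theta.shape[0])
                      if d not in top and theta[d, t] <= max_share and dominant[d] != t]
        if not candidates:
            continue
        intruder = rng.choice(candidates)
        doc_indices = top + [intruder]
        rng.shuffle(doc_indices)
        items.append({"topic": t, "doc_indices": doc_indices,
                      "intruder_index": doc_indices.index(intruder)})
    return items
```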
topica.bootstrap_stability
Flag fragile topics by refitting on bootstrap resamples of the corpus.
The standard defense against "topic modeling is a fishing expedition": fit a
reference model on the full corpus, then refit on n_boot resamples of the
documents (drawn with replacement). Each bootstrap model's topics are matched
to the reference's by top-word overlap, and a reference topic's stability
is the mean Jaccard overlap of its top-topn words with its matched bootstrap
topic. Topics that dissolve under resampling score low.
Matching is on the top words as strings, so it is correct even though each resample is fit as a fresh corpus with its own vocabulary indexing.
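The resampling and scoring steps can be sketched independently of any model. All helper names are hypothetical; fitting is left out, and each reference topic simply takes its best-overlapping bootstrap topic, a simplification of the one-to-one matching described above.

```python
import numpy as np


def jaccard(a, b):
    """Jaccard overlap of two word collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def bootstrap_resample(docs, seed):
    """Same number of documents, drawn with replacement."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(docs), size=len(docs))
    return [docs[i] for i in idx]


def mean_stability(ref_tops, boot_runs):
    """Per-reference-topic mean of the best Jaccard overlap across bootstrap runs."""
    per_run = np.array([[max(jaccard(r, b) for b in run) for r in ref_tops]
                        for run in boot_runs])
    return per_run.mean(axis=0)
```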
Parameters:
| Name | Description | Default |
|---|---|---|
| docs | The corpus (list[list[str]] or a Corpus). | required |
| k | Number of topics. | required |
| n_boot | Number of bootstrap resamples. | 20 |
| model_factory | callable(seed) -> unfitted model. Defaults to … | None |
| fit_kwargs | Forwarded to each model's fit (e.g. iterations=500). | required |
Returns:
| Type | Description |
|---|---|
| dict | topic (indices), stability (per-topic mean Jaccard in [0, 1]), mean (overall), and reference (the full-corpus model). |
topica.search_k
search_k(docs, ks, *, model='lda', prevalence=None, held_out=None, iterations=500, em_iters=30, num_samples=3, sample_interval=10, seed=42, coherence_n=10)
Fit a model for each K and report quality metrics (stm's searchK).
With model="lda" (default) fits a topica.LDA per K. With
model="stm" fits a topica.STM per K; pass prevalence
(a covariate design matrix) to scan K for the model you'll actually report.
Returns a list of dicts (one per K) with k, coherence (mean UMass),
exclusivity (mean top-word exclusivity), and — for model="lda" with
held_out — perplexity (held-out). The coherence/exclusivity trade-off
is the signal: there is rarely a single best K, so read it alongside
interpretability (see the K guide).
topica.check_residuals
Residual-dispersion test for whether K is too small (Taddy 2012), a faithful
port of R stm's checkResiduals.
Under a correctly specified model the multinomial residuals have dispersion
σ² = 1. A dispersion well above 1 (small p-value) is evidence the latent
topics cannot absorb the overdispersion — i.e. K is too low. Run it alongside
search_k. docs are the tokenized training documents aligned to
model.doc_topic's rows.
Returns a ResidualCheck with dispersion (σ²), pvalue (χ²
test of σ²=1 vs σ²>1), and df.
topica.align_topics
Match the topics of two fits one-to-one by minimal total distance (Hungarian on the cross-fit topic-word distance matrix). Use it to compare runs across seeds, across K, or train vs. resample — your fits are deterministic, so the matching is reproducible.
a, b are fitted models or K×V topic-word arrays (same vocabulary order).
metric is "cosine" or "js" (Jensen-Shannon). Returns a list of
(topic_a, topic_b, distance) sorted by topic_a.
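For intuition, here is a cosine-distance version with an exhaustive one-to-one search. align_small is hypothetical; brute force reaches the same optimum as the Hungarian method but is only practical for small K, and for realistic K scipy.optimize.linear_sum_assignment on the same distance matrix is the usual tool. It assumes the first fit has no more topics than the second.

```python
from itertools import permutations

import numpy as np


def align_small(a, b):
    """Minimum-total-cosine-distance matching of a's topics to b's (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    dist = 1.0 - a @ b.T  # (Ka, Kb) cosine distances
    ka, kb = dist.shape
    if ka > kb:
        raise ValueError("sketch assumes a has no more topics than b")
    best = min(permutations(range(kb), ka),
               key=lambda p: sum(dist[i, p[i]] for i in range(ka)))
    return [(i, best[i], float(dist[i, best[i]])) for i in range(ka)]
```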
topica.topic_stability
Term-centric stability of topics across multiple fits (Greene, O'Callaghan & Cunningham 2014): a "how robust is this K?" score.
runs is a list of fitted models or topic-word arrays over the same
vocabulary (e.g. fits at different seeds, or on bootstrap resamples). Each
later run's topics are matched to the first run's, and stability is the mean
Jaccard overlap of their top-topn words. Returns a float in [0, 1];
higher means more reproducible topics.
Reporting
Model-neutral summaries that work on any fitted model.
topica.plot_report
plot_report(model, *, texts=None, timestamps=None, groups=None, n=8, coherence_type='c_v', title=None, figsize=None)
A one-figure overview of a fitted model, composed from topica's diagnostics.
Panels are adaptive: each is drawn only when its inputs and the model support
it, so the report works across every model. Always included is the topic
prevalence bar (mean doc_topic per topic, labelled with each topic's top
words). Added when available:
- topic quality: coherence vs exclusivity (the stm quality frontier); a windowed coherence_type is used when texts is given (raw strings or token lists are both accepted), else UMass;
- topic correlation: the doc_topic correlation heatmap (K in 2..40);
- topics over time: mean prevalence per distinct timestamps value;
- topics per class: mean prevalence within each level of groups.
Returns a matplotlib Figure; save it with fig.savefig("report.png") or
.pdf. Requires matplotlib (the only added dependency).
topica.topic_info
One summary row per topic — the headline table for a fitted model.
Each row is a dict with topic (id), label, size (hard
assignments), prevalence (mean of the topic's doc_topic column), and
top_words (the top-n words, via model.top_words when available
else the raw topic-word row). When texts is given each row also carries
representative_docs, its n highest-loading documents. On a clustering
model with outliers a final topic=-1 row reports the outlier count and
carries no words. Rows are sorted by topic id.
labels overrides the labels for this table only; otherwise
topic_labels (custom labels over topic_names) is used.
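The core columns can be reproduced from θ and φ alone. A minimal sketch; topic_summary is hypothetical, takes "hard assignment" to mean each document's argmax topic (clustering models may instead use their own cluster labels), and omits labels, representative_docs and the outlier row.

```python
import numpy as np


def topic_summary(theta, phi, vocabulary, n=5):
    """topic / size / prevalence / top_words rows (hypothetical helper)."""
    theta = np.asarray(theta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    K = theta.shape[1]
    size = np.bincount(theta.argmax(axis=1), minlength=K)  # documents per argmax topic
    prevalence = theta.mean(axis=0)                          # mean doc_topic column
    rows = []
    for t in range(K):
        top = np.argsort(-phi[t], kind="stable")[:n]
        rows.append({"topic": t, "size": int(size[t]), "prevalence": float(prevalence[t]),
                     "top_words": [vocabulary[v] for v in top]})
    return rows
```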
topica.topics_over_time
Mean topic prevalence at each distinct timestamp value.
timestamps is one value per document. For each distinct timestamp we
average doc_topic over the documents stamped with it, giving a topic
prevalence trajectory you can plot directly. With normalize=True each
row is rescaled to sum to one (so it reads as a topic share at that time).
Returns {"labels": [sorted distinct timestamps], "prevalence": (T, K)
array}.
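The grouping is a masked mean per distinct value. A minimal sketch returning the same {"labels", "prevalence"} shape; topics_over_time_sketch is a hypothetical helper.

```python
import numpy as np


def topics_over_time_sketch(theta, timestamps, normalize=False):
    """Mean doc_topic per distinct timestamp, as a (T, K) array (hypothetical helper)."""
    theta = np.asarray(theta, dtype=float)
    stamps = np.asarray(timestamps)
    labels = sorted(set(timestamps))
    prevalence = np.vstack([theta[stamps == s].mean(axis=0) for s in labels])
    if normalize:
        prevalence = prevalence / prevalence.sum(axis=1, keepdims=True)  # rows sum to one
    return {"labels": labels, "prevalence": prevalence}
```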
topica.topics_per_class
Mean topic prevalence within each level of a grouping variable.
A thin wrapper over topica.by_strata on model.doc_topic:
groups is one label per document, and the result is a list of
per-stratum prevalence records (mean and confidence interval per topic).