Diagnostics¶

Model-agnostic quality, interpretation, and validation tools. They take any fitted model's topic_word / doc_topic (or raw arrays), so they work the same across every model family. All are available at the top level (topica.<name>) and in the topica.validation module.

One-call table¶

topica.diagnostics ¶

diagnostics(model, texts=None, *, n=10, coherence_type=None, stability=False, n_boot=20, model_factory=None, seed=0)

One per-topic diagnostics table for a fitted model.

Consolidates the quality numbers people otherwise gather one function at a time — coherence, exclusivity, FREX words, size, prevalence, top words, and (optionally) bootstrap stability — into a single row-per-topic table. It reads a model's analysis surface, so it works for every model and you never pass a raw matrix where a model is wanted, or vice versa.

Parameters:

Name	Description	Default
`model`	A fitted topica model.	required
`texts`	The reference corpus for windowed coherence (a `Corpus`, raw strings, or token lists). Without it, coherence falls back to the model's own UMass score. Required when `stability=True`.	`None`
`n`	Top-word count used for coherence, exclusivity, FREX, and the word lists.	`10`
`coherence_type`	Override the coherence metric (`"c_v"` by default when `texts` is given, `"u_mass"` otherwise).	`None`
`stability`	Also report per-topic bootstrap stability (mean top-word Jaccard over `n_boot` refits, matched back to this model). Off by default since it refits the model; needs `texts` (the documents) to resample.	`False`
`model_factory`	`callable(seed) -> unfitted model` for the stability refits; defaults to rebuilding the model's own type as `type(model)(num_topics=K, seed=seed)`. Pass your own for models whose constructor needs more.	`None`

Returns:

A pandas `DataFrame` indexed by topic (columns: `label`, `size`, `prevalence`, `coherence`, `exclusivity`, `stability`, `top_words`, `frex`), or a list of row dicts when pandas is not installed.

topica.perplexity ¶

perplexity(model, held_out, *, seed=0)

Document-completion held-out perplexity for a generative model.

For each held-out document, half its tokens (even positions) estimate the document's topic mixture through the model's transform, and the other half (odd positions) are scored under that mixture, p(w) = sum_k theta_k * topic_word[k, w]. Returns exp(-sum log p / N_eval); lower is better.

Because the scored tokens are held out from the mixture estimate, this does not trivially fall as K grows the way in-sample likelihood does, so it is a fair quantity to compare across K when justifying a topic count. It works for any model with a generative transform(documents) and a topic_word distribution (LDA, DMR, CTM, STM, HDP, keyATM, ...). The embedding-cluster models have no document likelihood; compare those with coherence or diversity.

(LDA additionally offers the more rigorous Wallach et al. left-to-right estimator as LDA.perplexity / LDA.evaluate.)

Parameters:

Name	Description	Default
`model`	A fitted generative model.	required
`held_out`	Documents the model was not trained on (token lists or a `Corpus`).	required
`seed`	RNG seed for the Gibbs `transform` (ignored by the variational models).	`0`
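
The document-completion scheme described above can be illustrated with a small numpy sketch. It is not topica's implementation: `completion_perplexity` is a hypothetical name, documents are assumed to be lists of integer word ids, `topic_word` is assumed strictly positive, and a fixed-topic EM loop stands in for the model's own `transform`.

```python
import numpy as np


def completion_perplexity(topic_word, docs, n_iter=50):
    """Document-completion perplexity; docs are lists of integer word ids."""
    topic_word = np.asarray(topic_word, dtype=float)
    num_topics = topic_word.shape[0]
    total_log_p, n_eval = 0.0, 0
    for doc in docs:
        observed = np.asarray(doc[0::2], dtype=int)  # even positions: estimate theta
        scored = np.asarray(doc[1::2], dtype=int)    # odd positions: held out, scored
        if observed.size == 0 or scored.size == 0:
            continue
        # Fixed-topic EM for the mixture weights (assumed stand-in for transform).
        theta = np.full(num_topics, 1.0 / num_topics)
        likelihood = topic_word[:, observed]         # shape (K, n_observed)
        for _ in range(n_iter):
            resp = theta[:, None] * likelihood
            resp /= resp.sum(axis=0, keepdims=True)
            theta = resp.mean(axis=1)
        # p(w) = sum_k theta_k * topic_word[k, w] for each held-out token
        total_log_p += np.log(theta @ topic_word[:, scored]).sum()
        n_eval += scored.size
    if n_eval == 0:
        raise ValueError("no document has tokens to score")
    return float(np.exp(-total_log_p / n_eval))


uniform = np.full((2, 4), 0.25)
peaked = np.array([[0.97, 0.01, 0.01, 0.01], [0.01, 0.01, 0.01, 0.97]])
held_out = [[0, 0, 0, 0], [3, 3, 3, 3]]
ppl_uniform = completion_perplexity(uniform, held_out)  # uniform topics: equals V = 4
ppl_peaked = completion_perplexity(peaked, held_out)    # topics that fit the docs score lower
```

With uniform topics every held-out token has probability 1/V, so the perplexity equals the vocabulary size; that makes a convenient sanity check for any implementation of this estimator.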

Quality¶

topica.coherence ¶

Topic coherence and diversity diagnostics.

Windowed PMI-based coherence measures (Röder, Both & Hinneburg, Exploring the Space of Topic Coherence Measures, WSDM 2015) alongside UMass (Mimno et al. 2011) and topic diversity (Dieng, Ruiz & Blei 2020), exposed through a single gensim-style coherence_type= switch:

"u_mass" — document co-occurrence, intrinsic; range roughly (-inf, 0].
"c_uci" — pairwise PMI over a sliding window (Newman et al. 2010).
"c_npmi" — pairwise normalized PMI; range [-1, 1].
"c_v" — the indirect-cosine/NPMI measure that correlates best with human judgements in Röder et al.; range roughly [0, 1].

Every measure scores each topic's top words against a reference corpus of tokenized documents. By default that is your training corpus, but, as with gensim's CoherenceModel, you can pass any external reference (e.g. a Wikipedia dump) via texts for a more human-aligned signal. topic_diversity reports the fraction of unique words across all topics' top-N, the standard companion to coherence in modern topic-model papers.

These are pure-Python/numpy and work with any model here: pass a fitted model (its top words are read automatically) or an explicit list of word lists.
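
To make the windowed measures concrete, here is a minimal pure-Python sketch of pairwise normalized PMI over boolean sliding windows. It is an illustration under stated assumptions, not the module's code: `npmi_sketch` is a hypothetical name, documents shorter than the window count as a single window, and a pair that never co-occurs is scored -1 directly instead of being smoothed with an epsilon.

```python
import math
from itertools import combinations


def npmi_sketch(topic_words, texts, window_size=10):
    """Mean pairwise NPMI of one topic's words over boolean sliding windows."""
    windows = []
    for doc in texts:
        if len(doc) <= window_size:
            windows.append(set(doc))
        else:
            for start in range(len(doc) - window_size + 1):
                windows.append(set(doc[start:start + window_size]))
    n_windows = len(windows)

    def prob(*words):
        # Fraction of windows that contain every given word.
        return sum(all(w in win for w in words) for win in windows) / n_windows

    scores = []
    for a, b in combinations(topic_words, 2):
        p_ab = prob(a, b)
        if p_ab == 0.0:
            scores.append(-1.0)          # never co-occur: lower bound of NPMI
            continue
        denom = -math.log(p_ab)
        if denom == 0.0:
            scores.append(1.0)           # co-occur in every window
            continue
        scores.append(math.log(p_ab / (prob(a) * prob(b))) / denom)
    return sum(scores) / len(scores)


reference = [["cat", "dog"], ["cat", "dog"], ["sun", "moon"]]
always_together = npmi_sketch(["cat", "dog"], reference)
never_together = npmi_sketch(["cat", "sun"], reference)
```

Words that always appear together reach the upper bound of the range and words that never share a window reach the lower bound, which matches the [-1, 1] range quoted for "c_npmi".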

ALIGN_IRRELEVANT_PROMPT `module-attribute` ¶

ALIGN_IRRELEVANT_PROMPT = 'You are a helpful assistant evaluating how well a topic\'s words describe a document. {dataset}Identify which of the topic words are NOT relevant to the document. Reply with a comma-separated list of the irrelevant words, or "none".\n\nDocument:\n{document}\n\nTopic words: {words}'

ALIGN_MISSING_PROMPT `module-attribute` ¶

ALIGN_MISSING_PROMPT = 'You are a helpful assistant evaluating how well a topic\'s words cover a document\'s themes. {dataset}Identify significant themes present in the document that are NOT captured by the topic words. Reply with a comma-separated list of the missing themes, or "none".\n\nDocument:\n{document}\n\nTopic words: {words}'

DIVERSITY_PROMPT `module-attribute` ¶

DIVERSITY_PROMPT = 'You are a helpful assistant comparing two topics from a topic model. {dataset}Rate the thematic distinctiveness between the two groups of words from 1 to 3, where 1 = partially overlapping themes and 3 = highly distinctive themes. Reply with a single number.\n\nGroup 1: {words_a}\nGroup 2: {words_b}'

DUPLICATE_PROMPT `module-attribute` ¶

DUPLICATE_PROMPT = 'You are a helpful assistant evaluating the top words of a topic model output for a given topic. {dataset}Identify pairs of words that refer to the exact same concept or idea (not merely related or similar). Reply with a comma-separated list of pairs like (word1, word2), or "none".\n\n{words}'

INTRUSION_PROMPT `module-attribute` ¶

INTRUSION_PROMPT = 'You are a helpful assistant evaluating the top words of a topic model output for a given topic. {dataset}Select which word is the least related to all other words. If multiple words do not fit, choose the word that is most out of place. Reply with a single word.\n\n{words}'

LABEL_PROMPT `module-attribute` ¶

LABEL_PROMPT = 'You are a helpful assistant labeling documents by their main theme. {dataset}{research}Read the document below and annotate it with a {granularity} label naming its single main theme.{examples} Reply with only the label, a single word or short phrase.\n\nDocument:\n{document}'

LLM_EVAL_PROMPTS `module-attribute` ¶

LLM_EVAL_PROMPTS = {'rating': RATING_PROMPT, 'intrusion': INTRUSION_PROMPT, 'label': LABEL_PROMPT, 'outlier': OUTLIER_PROMPT, 'repetitive_rate': REPETITIVE_RATE_PROMPT, 'duplicate': DUPLICATE_PROMPT, 'diversity': DIVERSITY_PROMPT, 'align_irrelevant': ALIGN_IRRELEVANT_PROMPT, 'align_missing': ALIGN_MISSING_PROMPT}

Maps each evaluation task key to its prompt template. Each value is the same string as the corresponding module-level constant documented in this section (for example, 'rating' holds the RATING_PROMPT text).

OUTLIER_PROMPT `module-attribute` ¶

OUTLIER_PROMPT = 'You are a helpful assistant evaluating the top words of a topic model output for a given topic. {dataset}Identify the words that do not semantically belong to the same conceptual theme as the others. Reply with a comma-separated list of only those words, or "none".\n\n{words}'

RATING_PROMPT `module-attribute` ¶

RATING_PROMPT = 'You are a helpful assistant evaluating the top words of a topic model output for a given topic. {dataset}Please rate how related the following words are to each other on a scale from 1 to 3 ("1" = not very related, "2" = moderately related, "3" = very related). Reply with a single number, indicating the overall appropriateness of the topic.\n\n{words}'

REPETITIVE_RATE_PROMPT `module-attribute` ¶

REPETITIVE_RATE_PROMPT = 'You are a helpful assistant evaluating the top words of a topic model output for a given topic. {dataset}Evaluate whether there are semantically equivalent (redundant) words. Rate the repetitiveness from 1 to 3, where 1 = highly repetitive with significant semantic overlap and 3 = minimal repetition with diverse, distinctive words. Reply with a single number.\n\n{words}'

CoherenceCI ¶

Per-topic coherence with a bootstrap standard error and interval.

estimate/se/ci_low/ci_high are each (num_topics,) arrays: the coherence on the full reference corpus, the bootstrap standard error, and the lower/upper percentile bounds.

coherence ¶

coherence(topics, texts, *, coherence_type='c_v', topn=10, window_size=None, epsilon=1e-12)

Per-topic coherence against a reference corpus.

Parameters:

Name	Description	Default
`topics`	A fitted model, or a list of topics (each a list of words, or of `(word, prob)` pairs).	required
`texts`	List of tokenized documents (`list[list[str]]`): the reference corpus. Pass your training documents, or an external corpus.	required
`coherence_type`	One of `"u_mass"`, `"c_uci"`, `"c_npmi"`, `"c_v"`.	`'c_v'`
`topn`	Number of top words per topic to score.	`10`
`window_size`	Sliding-window width for the windowed measures; `None` uses the per-measure default (110 for `c_v`, 10 for `c_uci`/`c_npmi`). Ignored by `u_mass`.	`None`

Returns:

A numpy.ndarray of shape `(num_topics,)`: the coherence of each topic. Take `.mean()` for the overall model score.

coherence_ci ¶

coherence_ci(topics, texts, *, coherence_type='c_v', topn=10, window_size=None, n_boot=200, ci=0.9, seed=0, epsilon=1e-12)

Bootstrap standard errors and a percentile interval for topic coherence.

Coherence is a corpus statistic with no model likelihood or posterior behind it, so its uncertainty is obtained by bootstrap: hold each topic's top words fixed, resample the reference documents with replacement n_boot times, recompute coherence on each resample, and report the per-topic standard error and percentile interval. The topics never change, so there is no refit and no topic-alignment step — the interval reflects how much a topic's coherence score would wobble under a different sample of the reference corpus, the right answer to "is topic A's coherence reliably higher than topic B's?".

estimate is the coherence on the full corpus (the conventional point summary); because resampling documents estimates the sampling distribution of that same statistic, the percentile interval is centered on it (unlike the posterior-draw intervals elsewhere).

Parameters:

Name	Description	Default
`topics`	A fitted model, or a list of topics (each a list of words / `(word, prob)` pairs). The top words are extracted once and held fixed.	required
`texts`	List of tokenized documents: the reference corpus to resample.	required
`coherence_type`	As in `coherence`.	`'c_v'`
`topn`	As in `coherence`.	`10`
`window_size`	As in `coherence`.	`None`
`epsilon`	As in `coherence`.	`1e-12`
`n_boot`	Number of bootstrap resamples (each recomputes co-occurrence, so this is O(n_boot x corpus size); the windowed measures (`c_v` etc.) are the costliest).	`200`
`ci`	Central interval mass (0.9 gives a 90% interval).	`0.9`
`seed`	Seed for the document resampling.	`0`

Returns:

Type	Description
`CoherenceCI`	`(estimate, se, ci_low, ci_high)`, each `(num_topics,)`.
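
The resampling procedure can be sketched generically: hold the topics fixed, resample documents with replacement, and summarize the recomputed statistic. The sketch below is an assumption-laden illustration rather than the library's code: `bootstrap_interval` and `cooccurrence_rate` are hypothetical names, and a toy co-occurrence rate stands in for a real coherence measure.

```python
import numpy as np


def bootstrap_interval(statistic, docs, n_boot=200, ci=0.9, seed=0):
    """Percentile bootstrap over documents for a per-topic statistic."""
    rng = np.random.default_rng(seed)
    estimate = np.asarray(statistic(docs), dtype=float)   # full-corpus point summary
    n_docs = len(docs)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_docs, size=n_docs)        # resample with replacement
        draws.append(statistic([docs[i] for i in idx]))
    draws = np.asarray(draws, dtype=float)                # shape (n_boot, num_topics)
    alpha = (1.0 - ci) / 2.0
    return {
        "estimate": estimate,
        "se": draws.std(axis=0, ddof=1),
        "ci_low": np.quantile(draws, alpha, axis=0),
        "ci_high": np.quantile(draws, 1.0 - alpha, axis=0),
    }


def cooccurrence_rate(docs):
    # Toy one-topic statistic: share of documents containing both "a" and "b".
    return [np.mean(["a" in d and "b" in d for d in docs])]


corpus = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
result = bootstrap_interval(cooccurrence_rate, corpus, n_boot=100, seed=0)
```

Because the topics never change between resamples, the only source of variation is which documents were drawn, which is why no refit or topic alignment is needed.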

topic_diversity ¶

topic_diversity(topics, topn=25)

Fraction of unique words across all topics' top-topn words (Dieng, Ruiz & Blei 2020). 1.0 means every top word is unique to its topic; low values indicate topics that recycle the same words.

topics is a fitted model or a list of word lists.
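
As a quick illustration of the definition, the following sketch computes the same ratio for explicit word lists. `diversity_sketch` is a hypothetical name used here to avoid implying it is the library function.

```python
def diversity_sketch(topics, topn=25):
    """Unique top words divided by total top words across all topics."""
    top_words = [list(topic)[:topn] for topic in topics]
    total = sum(len(words) for words in top_words)
    unique = len({word for words in top_words for word in words})
    return unique / total


# "b" is shared, so 3 unique words out of 4 slots.
shared = diversity_sketch([["a", "b"], ["b", "c"]])       # 0.75
disjoint = diversity_sketch([["a", "b"], ["c", "d"]])     # 1.0
```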

topic_semantic_diversity ¶

topic_semantic_diversity(topics, topn=25)

Fraction of unique top-word pairs across all topics (Wu, Nguyen & Luu 2024, "A Survey on Neural Topic Models", Eq. 18). Where topic_diversity counts unique single words, this counts unique pairs drawn from each topic's top-topn words: a pair occurrence is "unique" when that unordered pair appears in exactly one topic's top words. 1.0 means every top-word pair is unique to its topic; higher = more diverse. A pair disambiguates word sense, so this is "semantic-aware" — no embeddings are needed.

topics is a fitted model or a list of word lists. topn must be an integer >= 2 (pairs require at least two words).
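
One reading of the pair-based definition above, sketched for explicit word lists: count every unordered pair drawn from a topic's top words, and call a pair occurrence unique when no other topic contains the same pair. `pair_diversity_sketch` is a hypothetical name, and the exact normalization in the cited equation is assumed to be the total number of pair occurrences.

```python
from collections import Counter
from itertools import combinations


def pair_diversity_sketch(topics, topn=25):
    """Share of top-word pair occurrences that appear in exactly one topic."""
    if topn < 2:
        raise ValueError("topn must be >= 2: pairs need at least two words")
    pair_sets = [
        {frozenset(pair) for pair in combinations(dict.fromkeys(list(t)[:topn]), 2)}
        for t in topics
    ]
    counts = Counter(pair for pairs in pair_sets for pair in pairs)
    total = sum(len(pairs) for pairs in pair_sets)
    unique = sum(1 for pairs in pair_sets for pair in pairs if counts[pair] == 1)
    return unique / total


# Both topics contain the pair {a, b}; the other 4 of 6 pair occurrences are unique.
overlapping = pair_diversity_sketch([["a", "b", "c"], ["a", "b", "d"]])
# Sharing a single word but no pair still counts as fully diverse.
one_shared_word = pair_diversity_sketch([["a", "b", "c"], ["a", "d", "e"]])
```

The second example shows the difference from topic_diversity: the word "a" is repeated, yet no pair is, so the pair-based score stays at 1.0.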

exclusivity ¶

exclusivity(model_or_phi, *, n=10, w=0.7)

Per-topic exclusivity, shape (num_topics,) — stm's exclusivity.

For each topic, the FREX summary over its top-n words (by probability): the sum of each word's frequency–exclusivity score (the rank harmonic mean of probability and exclusivity φ_{t,v} / Σ_k φ_{k,v}, weighted by w, stm's default 0.7). Higher means the topic's top words are more distinctive. Pair with per-topic coherence to make stm's coherence-vs-exclusivity quality plot: good topics sit toward the upper-right (coherent and distinctive).

The scores come from the single stm-faithful implementation in topica's Rust core (topica-core's inspect), shared with faSTM and the Stata plugin.

Note: This is stm's exclusivity (a sum of FREX scores over the top n words, roughly in [0, n]), not a mean exclusivity in [0, 1]. The scale changed in the move to the shared stm-faithful core.

model_or_phi is a fitted model (uses its topic_word) or a (K, V) array.
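
The FREX summary can be sketched in numpy as follows. This is an illustration of the formula described above, not the Rust core: `exclusivity_sketch` and `_rank_fraction` are hypothetical names, and ties are ranked ordinally here, which may differ from the tie handling in stm or topica.

```python
import numpy as np


def _rank_fraction(values):
    # Ordinal ranks 1..V scaled into (0, 1]; a simple empirical-CDF stand-in.
    ranks = np.empty(len(values), dtype=float)
    ranks[np.argsort(values, kind="stable")] = np.arange(1, len(values) + 1)
    return ranks / len(values)


def exclusivity_sketch(phi, n=10, w=0.7):
    """Sum of FREX scores over each topic's top-n words (by probability)."""
    phi = np.asarray(phi, dtype=float)
    excl = phi / phi.sum(axis=0, keepdims=True)   # phi[t, v] / sum_k phi[k, v]
    scores = np.empty(phi.shape[0])
    for k in range(phi.shape[0]):
        # Weighted harmonic mean of the exclusivity rank and the probability rank.
        frex = 1.0 / (w / _rank_fraction(excl[k]) + (1.0 - w) / _rank_fraction(phi[k]))
        top = np.argsort(phi[k])[::-1][:n]
        scores[k] = frex[top].sum()
    return scores


phi = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
top1 = exclusivity_sketch(phi, n=1)   # each topic's best word ranks first on both axes
top3 = exclusivity_sketch(phi, n=3)
```

Each per-word FREX score lies in (0, 1], so the per-topic sum stays within (0, n], consistent with the scale described in the note.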

semantic_coherence ¶

semantic_coherence(model_or_phi, texts, vocabulary=None, *, n=10)

Per-topic semantic coherence, shape (num_topics,) — stm's semCoh1beta.

The UMass document-co-occurrence coherence over each topic's top-n words, with stm's 0.01 smoothing (higher = better). This is stm's exact semantic coherence, from topica's Rust core (topica-core's inspect), shared with faSTM and the Stata plugin. For the broader, gensim-aligned coherence measures (c_v, c_npmi, u_mass) use coherence instead.

model_or_phi is a fitted model (uses its topic_word / vocabulary) or a (K, V) array (then pass vocabulary). texts is the reference corpus: a topica.Corpus, or a list of token lists (the words per document).
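
For intuition, a UMass-style score for a single topic can be sketched in pure Python. This follows the Mimno et al. form with the 0.01 smoothing mentioned above; `umass_sketch` is a hypothetical name, the skip for words absent from the corpus is an assumption, and the exact core implementation may differ in details.

```python
import math


def umass_sketch(top_words, texts, smoothing=0.01):
    """Sum over ranked word pairs of log((D(w_m, w_l) + smoothing) / D(w_l))."""
    doc_sets = [set(doc) for doc in texts]

    def doc_freq(*words):
        # Number of documents containing every given word.
        return sum(all(w in doc for w in words) for doc in doc_sets)

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):                       # w_l is ranked above w_m
            denom = doc_freq(top_words[l])
            if denom == 0:
                continue                         # assumed: skip words missing from the corpus
            joint = doc_freq(top_words[m], top_words[l])
            score += math.log((joint + smoothing) / denom)
    return score


reference = [["a", "b"], ["a", "b"], ["c"]]
together = umass_sketch(["a", "b"], reference)   # log(2.01 / 2)
apart = umass_sketch(["a", "c"], reference)      # log(0.01 / 2)
```

Pairs that share documents score near zero and pairs that never co-occur score strongly negative, so higher (closer to zero) means more coherent.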

word_intrusion ¶

word_intrusion(model_or_phi, vocabulary=None, *, n_words=5, seed=0)

Build a word intrusion test for human topic validation.

For each topic, take its top n_words words and splice in one intruder — a word that ranks highly in some other topic but has low probability in this one. A coherent topic is one where a human can reliably spot the intruder (Chang et al. 2009, "Reading Tea Leaves"). Returns a list (per topic) of dicts with:

topic — the topic index,
words — the n_words + 1 words in shuffled, presentation order,
intruder — the intruder word,
intruder_index — its position in words (the answer key).

model_or_phi is a fitted model (uses its topic_word / vocabulary) or a (K, V) array (then pass vocabulary). Deterministic for a fixed seed.
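
The construction can be sketched for a raw (K, V) array as below. It is one plausible reading of the description, not the library's algorithm: `word_intrusion_sketch` is a hypothetical name, "low probability" is assumed to mean the bottom half of the topic's ranking, and `n_words` is assumed to be at most half the vocabulary size.

```python
import numpy as np


def word_intrusion_sketch(phi, vocabulary, n_words=5, seed=0):
    """Build one intrusion task per topic; assumes n_words <= V // 2."""
    phi = np.asarray(phi, dtype=float)
    rng = np.random.default_rng(seed)
    num_topics, vocab_size = phi.shape
    order = np.argsort(-phi, axis=1, kind="stable")     # word ids by descending probability
    tasks = []
    for k in range(num_topics):
        top = order[k, :n_words]
        low = set(order[k, vocab_size // 2:].tolist())  # bottom half of topic k
        elsewhere = {int(v) for j in range(num_topics) if j != k
                     for v in order[j, :n_words]}       # top words of the other topics
        candidates = sorted(elsewhere & low) or [int(order[k, -1])]
        intruder = int(rng.choice(candidates))
        ids = np.append(top, intruder)
        rng.shuffle(ids)                                # presentation order
        words = [vocabulary[i] for i in ids]
        tasks.append({"topic": k, "words": words,
                      "intruder": vocabulary[intruder],
                      "intruder_index": words.index(vocabulary[intruder])})
    return tasks


phi = np.array([[0.40, 0.30, 0.10, 0.10, 0.05, 0.05],
                [0.05, 0.05, 0.10, 0.10, 0.30, 0.40]])
tasks = word_intrusion_sketch(phi, list("abcdef"), n_words=2, seed=0)
```

Seeding the generator once and drawing in a fixed order is what makes the task list reproducible for a given seed.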

document_intrusion ¶

document_intrusion(model_or_theta, texts=None, *, n_docs=3, seed=0)

Build a document intrusion test for human topic validation.

For each topic, take the n_docs documents with the highest proportion of that topic and splice in one intruder — a document where the topic is nearly absent (and another topic dominates). A topic that captures real document similarity is one where a human can spot the intruder. Returns a list (per topic) of dicts with:

topic — the topic index,
doc_indices — the n_docs + 1 document indices in shuffled order,
intruder_index — the intruder's position in doc_indices,
texts — the corresponding text previews (only if texts is given).

model_or_theta is a (D, K) θ array (or a fitted model, whose doc_topic is used). Deterministic for a fixed seed.

llm_coherence ¶

llm_coherence(model, *, backend, n_words=10, scale=(1, 3), dataset_description=None, seed=0, n_samples=1, shuffle=True, prompts=None)

LLM-rated topic coherence (Stammbach et al. 2023): the headline LLM metric.

For each topic, the top n_words words are shuffled and an LLM rates how related they are on a scale (default 1-3). Returns a per-topic numpy array of mean ratings (higher = more coherent). This is the metric that beats NPMI / c_v at tracking human judgment in the paper; it sits beside coherence, topic_diversity, and topic_semantic_diversity, but is LLM-bounded: it calls an external model and is not bit-deterministic.

Parameters:

Name	Type	Description	Default
`model`	fitted model or list of word lists	Anything `_extract_topics` accepts.	required
`backend`	callable `str -> str` or model-name str	The LLM. Pass `topica.llm.backend(name, temperature=0)` or a model name.	required
`n_words`	int	The number of top words shown.	`10`
`scale`	tuple	The rating range.	`(1, 3)`
`dataset_description`	optional str	A one-line corpus description added to the prompt (small reported gains).	`None`
`seed`	int	Seeds the per-topic word shuffles (reproducible task; the LLM is not).	`0`
`n_samples`	int	Calls the LLM this many times per topic and averages (tames non-determinism; the paper uses temperature=1 to mimic annotator variation).	`1`
`prompts`	optional dict	Override the editable templates (key `"rating"`); defaults to `LLM_EVAL_PROMPTS`.	`None`
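
The rating loop can be illustrated with a stub backend, since any `str -> str` callable is accepted. Everything below is a hypothetical sketch: `rate_topic_sketch`, `RATING_TEMPLATE_SKETCH`, and the reply parsing are assumptions, as is reshuffling the words on every sample; the real prompt is the "rating" template in LLM_EVAL_PROMPTS.

```python
import random
import re

# Shortened stand-in for the real "rating" template (assumption).
RATING_TEMPLATE_SKETCH = (
    "{dataset}Rate how related these words are from {low} to {high}. "
    "Reply with a single number.\n\n{words}"
)


def rate_topic_sketch(top_words, backend, scale=(1, 3), seed=0, n_samples=1,
                      dataset_description=None):
    """Mean LLM rating for one topic; backend is any callable str -> str."""
    rng = random.Random(seed)
    low, high = scale
    dataset = f"{dataset_description} " if dataset_description else ""
    ratings = []
    for _ in range(n_samples):
        words = list(top_words)
        rng.shuffle(words)                       # seeded, so the task is reproducible
        reply = backend(RATING_TEMPLATE_SKETCH.format(
            dataset=dataset, low=low, high=high, words=", ".join(words)))
        match = re.search(r"\d+", reply)
        if match and low <= int(match.group()) <= high:
            ratings.append(int(match.group()))   # ignore replies outside the scale
    return sum(ratings) / len(ratings) if ratings else float("nan")


canned = iter(["2", "Rating: 3"])                # stub replies instead of a real LLM
mean_rating = rate_topic_sketch(["tax", "budget", "deficit"],
                                backend=lambda prompt: next(canned), n_samples=2)
```

A stub like this is also handy in tests, because it removes the non-determinism of a live model while exercising the same shuffle, prompt, and averaging path.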

llm_intrusion ¶

llm_intrusion(model, vocabulary=None, *, backend, n_words=5, dataset_description=None, seed=0, n_samples=1, prompts=None)

LLM word-intrusion accuracy (Stammbach et al. 2023).

Builds the intrusion task with word_intrusion (top n_words words plus one intruder, shuffled), asks the LLM to pick the intruder, and scores it against the answer key. Returns {"accuracy": float, "per_topic": [...]} where each per-topic dict has topic, intruder, picked, and correct.

The paper finds an LLM matches human accuracy on this task (~72%), but rating (llm_coherence) tracks human topic rankings better, so lead with llm_coherence and report this alongside. LLM-bounded; see llm_coherence for the shared backend / n_samples semantics.

llm_select_k ¶

llm_select_k(models, docs, *, backend, n_docs=10, granularity='broad', example_labels=None, research_question=None, criterion='knee', tol=0.03, seed=0, n_samples=1, max_chars=1500, prompts=None)

Choose the number of topics by LLM document-label purity (Stammbach et al. 2023). For each candidate fitted model, take each topic's top n_docs documents, have an LLM assign each a theme label, and score the topic by label purity — the fraction of its documents sharing the majority label. The model's score is the mean per-topic purity.

This is the paper's working number-of-topics signal: doc-label purity tracks ground-truth cluster quality (ARI), whereas rating the top words across K does not (their negative result). Complements :func:search_k (coherence / exclusivity / perplexity) with a human-aligned, llm-bounded criterion.

.. note:: Purity rises then plateaus as K grows — over-splitting one theme into two topics yields two same-labelled, still-pure topics — so the raw maximum tends to over-split (the mirror of coherence's bias toward small K; cf. :func:search_k's frontier). The default criterion="knee" therefore returns the smallest K whose purity is within tol of the best (the plateau onset), not the bare argmax. Always read the full scores curve; criterion="max" restores the literal highest-purity pick.

Parameters:

Name	Type	Description	Default
`models`	`sequence of fitted models`	Candidates, typically the same corpus fit at different `num_topics`.	required
`docs`	`Corpus \| list of str \| list of token lists`	The documents, in the order the models were fit on (their `doc_topic` rows).	required
`backend`	callable ``str -> str`` or model-name str	The LLM (see :func:`llm_coherence`).	required
`n_docs`	`int`	Top documents per topic to label.	`10`
`granularity`	`(broad, narrow)`	Whether to ask for a broad or a narrow theme label.	`"broad"`
`example_labels`	`optional sequence of str`	Example label vocabulary shown to the model (steers granularity/format).	`None`
`research_question`	`optional str`	A one-line framing ("label by the policy area discussed", ...).	`None`
`criterion`	`(knee, max)`	How `best` is chosen from the purity curve. `"knee"` (default) returns the smallest `K` within `tol` of the best purity (the plateau onset); `"max"` returns the highest-purity model (which tends to over-split).	`"knee"`
`tol`	`float`	Purity tolerance for the knee (default 0.03).	`0.03`
`n_samples`	`int`	Majority-vote the label over this many calls per document.	`1`
`max_chars`	`int`	Truncate each document to this many characters in the prompt.	`1500`

Returns:

Type	Description
`dict`	`best` (the chosen model's `num_topics`), `best_index`, and `scores` (a list of `{"num_topics", "purity", "per_topic_purity"}` dicts, one per model).
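
The purity score and the two selection criteria can be illustrated without an LLM. The sketch below is not topica's implementation: `topic_purity`, `pick_k`, and the hand-written label lists are hypothetical stand-ins for the labels an LLM would assign, and the result dict only mirrors the documented keys.

```python
from collections import Counter


def topic_purity(labels):
    """Fraction of a topic's labeled documents that share the majority label."""
    return max(Counter(labels).values()) / len(labels)


def pick_k(candidates, criterion="knee", tol=0.03):
    """candidates: list of (num_topics, per_topic_labels) pairs, where
    per_topic_labels holds one list of assigned labels per topic (hypothetical input)."""
    scores = []
    for k, per_topic_labels in candidates:
        per_topic = [topic_purity(labels) for labels in per_topic_labels]
        scores.append({
            "num_topics": k,
            "purity": sum(per_topic) / len(per_topic),
            "per_topic_purity": per_topic,
        })
    best_purity = max(s["purity"] for s in scores)
    # "max" keeps only the top score; "knee" admits every model within tol of it.
    slack = 0.0 if criterion == "max" else tol
    eligible = [i for i, s in enumerate(scores) if s["purity"] >= best_purity - slack]
    # Among the eligible models, prefer the smallest number of topics.
    best_index = min(eligible, key=lambda i: scores[i]["num_topics"])
    return {"best": scores[best_index]["num_topics"], "best_index": best_index, "scores": scores}


def pure(label):
    return [label] * 10


# Made-up labels: K=4 over-splits "sports" but stays almost perfectly pure.
candidates = [
    (2, [["econ"] * 5 + ["health"] * 5, ["sports"] * 6 + ["econ"] * 4]),
    (3, [pure("econ"), pure("health"), ["sports"] * 9 + ["econ"]]),
    (4, [pure("econ"), pure("health"), pure("sports"), ["sports"] * 9 + ["econ"]]),
]
knee = pick_k(candidates)
literal_max = pick_k(candidates, criterion="max")
print(knee["best"], literal_max["best"])
```

In this toy curve the K=4 model has the highest purity, but K=3 is within `tol` of it, so the knee rule picks the smaller model while `criterion="max"` picks the over-split one.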

llm_outlier ¶

llm_outlier(model, *, backend, n_words=10, n_samples=5, threshold=3, dataset_description=None, seed=0, prompts=None)

Unsupervised semantic-outlier detection (Tan & D'Souza 2025, C_outlier).

For each topic, asks the LLM to list the words that do not fit the topic, over n_samples runs, and keeps a word flagged in at least threshold runs (the paper's 3-of-5 vote). Returns a per-topic list of dicts with topic, outliers (the flagged words), and count. Unlike :func:llm_intrusion there is no planted answer — this surfaces which words make a topic incoherent. llm-bounded; see :func:llm_coherence for backend/n_samples semantics.
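
The vote over runs can be sketched as a simple count. This is an illustration under assumptions, not the library's code: `vote_outliers` is a hypothetical helper, the per-run word lists are invented, and `count` is assumed here to mean the number of flagged words.

```python
from collections import Counter


def vote_outliers(runs, threshold=3):
    """runs: one list of flagged words per LLM run (hypothetical input).
    A word is kept when at least `threshold` runs flagged it."""
    # set(run) so a word repeated inside one run still counts as one vote.
    votes = Counter(word for run in runs for word in set(run))
    outliers = sorted(word for word, n in votes.items() if n >= threshold)
    return {"outliers": outliers, "count": len(outliers)}


# Five invented runs for one topic: only "banana" reaches 3 of 5 votes.
runs = [["banana", "tax"], ["banana"], ["banana", "vote"], ["tax"], []]
result = vote_outliers(runs, threshold=3)
print(result)
```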

llm_repetitiveness ¶

llm_repetitiveness(model, *, backend, n_words=10, n_samples=1, dataset_description=None, seed=0, prompts=None)

LLM repetitiveness (Tan & D'Souza 2025): is apparent coherence just redundancy?

Returns a per-topic list of dicts with rate (R_rate: 1 = highly repetitive, 3 = diverse/distinctive; averaged over n_samples), duplicate_pairs (R_duplicate: word pairs the LLM judges the same concept), and duplicate_count. A robust coherent topic has a high rate and a low duplicate count. Complements :func:topic_semantic_diversity on the LLM side. llm-bounded.

llm_diversity ¶

llm_diversity(model, *, backend, n_words=10, n_samples=1, max_pairs=None, dataset_description=None, seed=0, prompts=None)

Cross-topic LLM diversity (Tan & D'Souza 2025, D_rate).

Rates the thematic distinctiveness of every pair of topics 1-3 (1 = overlapping, 3 = distinctive) and averages. Returns {"mean": float, "pairwise": [...]} with one {"topics": (i, j), "rate": r} per scored pair. O(K²) calls; pass max_pairs to score a deterministic random subset. The LLM analog of :func:topic_diversity / :func:topic_semantic_diversity. llm-bounded.

llm_adversarial ¶

llm_adversarial(model, *, backend, intruder='shakespeare', n_words=10, n_samples=5, threshold=3, dataset_description=None, seed=0, prompts=None)

Gold-free adversarial self-check (Tan & D'Souza 2025, AdvT_outlier).

Plants a known-unrelated word (default "shakespeare") into each topic's top words and measures how often the LLM's :func:llm_outlier detection flags it. This validates the metric and the model's capability without human-gold data, on any corpus — a low detection rate means the model is too weak for these tasks. Returns {"detection_rate": float, "intruder": str, "per_topic": [...]}.

llm_alignment ¶

llm_alignment(model, docs, *, backend, n_words=10, n_docs=5, dataset_description=None, seed=0, prompts=None, max_chars=1500)

Topic-document alignment (Tan & D'Souza 2025, A_ir-topic / A_missing-theme).

For each topic, takes its top n_docs documents and asks the LLM, per document, (1) how many topic words are irrelevant to it (overrepresentation) and (2) how many document themes are missing from the topic words (underrepresentation), averaging over the documents. Returns a per-topic list of dicts with topic, irrelevant (mean count) and missing (mean count); lower is better on both. Needs the documents and O(K·n_docs) calls. llm-bounded.

topica.coherence_ci ¶

coherence_ci(topics, texts, *, coherence_type='c_v', topn=10, window_size=None, n_boot=200, ci=0.9, seed=0, epsilon=1e-12)

Bootstrap standard errors and a percentile interval for topic coherence.

Coherence is a corpus statistic with no model likelihood or posterior behind it, so its uncertainty is obtained by bootstrap: hold each topic's top words fixed, resample the reference documents with replacement n_boot times, recompute coherence on each resample, and report the per-topic standard error and percentile interval. The topics never change, so there is no refit and no topic-alignment step — the interval reflects how much a topic's coherence score would wobble under a different sample of the reference corpus, the right answer to "is topic A's coherence reliably higher than topic B's?".

estimate is the coherence on the full corpus (the conventional point summary); because resampling documents estimates the sampling distribution of that same statistic, the percentile interval is centered on it (unlike the posterior-draw intervals elsewhere).

Parameters:

Name	Type	Description	Default
`topics`	`fitted model or list of topics`	A fitted model, or a list of topics (each a list of words or `(word, prob)` pairs). The top words are extracted once and held fixed.	required
`texts`	`list of token lists`	The tokenized reference corpus to resample.	required
`coherence_type`		As in :func:`coherence`.	`'c_v'`
`topn`		As in :func:`coherence`.	`10`
`window_size`		As in :func:`coherence`.	`None`
`epsilon`		As in :func:`coherence`.	`1e-12`
`n_boot`	`int`	Number of bootstrap resamples. Each one recomputes co-occurrence, so the cost is O(n_boot x corpus size); the windowed measures (`c_v` etc.) are the costliest.	`200`
`ci`	`float`	Central interval mass (0.9 gives a 90% interval).	`0.9`
`seed`	`int`	Seed for the document resampling.	`0`

Returns:

Type	Description
`CoherenceCI`	`(estimate, se, ci_low, ci_high)`, each `(num_topics,)`.
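
The resampling scheme described above (fixed top words, documents resampled with replacement) can be sketched in plain Python. This is a minimal illustration, not topica's code: `umass_like` is a simplified document co-occurrence score standing in for the real coherence measures, and `bootstrap_coherence` and the toy corpus are hypothetical.

```python
import math
import random


def umass_like(top_words, docs, smoothing=1.0):
    """Simplified UMass-style score: mean of log((D(wi, wj) + smoothing) / D(wj))."""
    doc_sets = [set(d) for d in docs]
    total, n_pairs = 0.0, 0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = sum(1 for s in doc_sets if top_words[j] in s)
            d_ij = sum(1 for s in doc_sets if top_words[i] in s and top_words[j] in s)
            # max(d_j, 1) guards against a word vanishing from a resample.
            total += math.log((d_ij + smoothing) / max(d_j, 1))
            n_pairs += 1
    return total / n_pairs


def bootstrap_coherence(top_words, docs, n_boot=200, ci=0.9, seed=0):
    rng = random.Random(seed)
    estimate = umass_like(top_words, docs)  # point summary on the full corpus
    draws = []
    for _ in range(n_boot):
        # Resample documents with replacement; the topic's words never change.
        sample = [docs[rng.randrange(len(docs))] for _ in docs]
        draws.append(umass_like(top_words, sample))
    draws.sort()
    mean = sum(draws) / n_boot
    se = math.sqrt(sum((x - mean) ** 2 for x in draws) / (n_boot - 1))
    low = draws[math.floor((1 - ci) / 2 * (n_boot - 1))]
    high = draws[math.ceil((1 + ci) / 2 * (n_boot - 1))]
    return estimate, se, low, high


docs = [
    ["tax", "budget", "deficit"], ["tax", "budget"], ["goal", "match"],
    ["tax", "deficit", "vote"], ["match", "goal", "team"], ["budget", "deficit"],
]
estimate, se, ci_low, ci_high = bootstrap_coherence(["tax", "budget", "deficit"], docs)
print(estimate, se, ci_low, ci_high)
```

Because only the documents are resampled, no model is refit and no topics need to be matched between resamples.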

topica.semantic_coherence ¶

semantic_coherence(model_or_phi, texts, vocabulary=None, *, n=10)

Per-topic semantic coherence, shape (num_topics,) — stm's semCoh1beta.

The UMass document-co-occurrence coherence over each topic's top-n words, with stm's 0.01 smoothing (higher = better). This is stm's exact semantic coherence, from topica's Rust core (topica-core's inspect), shared with faSTM and the Stata plugin. For the broader, gensim-aligned coherence measures (c_v, c_npmi, u_mass) use :func:coherence instead.

model_or_phi is a fitted model (uses its topic_word / vocabulary) or a (K, V) array (then pass vocabulary). texts is the reference corpus: a :class:topica.Corpus, or a list of token lists (the words per document).

topica.topic_diversity ¶

topic_diversity(topics, topn=25)

Fraction of unique words across all topics' top-topn words (Dieng, Ruiz & Blei 2020). 1.0 means every top word is unique to its topic; low values indicate topics that recycle the same words.

topics is a fitted model or a list of word lists.
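
As a worked example of the definition, the following sketch computes the unique-word fraction for two toy topics. `unique_word_fraction` is a hypothetical stand-in written for illustration, not the library function.

```python
def unique_word_fraction(topics, topn=25):
    """Unique words divided by total words across all topics' top-`topn` lists."""
    all_words = [word for topic in topics for word in topic[:topn]]
    return len(set(all_words)) / len(all_words)


# "bank" appears in both topics, so 5 of the 6 top-word slots are unique.
topics = [["bank", "river", "water"], ["bank", "money", "loan"]]
score = unique_word_fraction(topics, topn=3)
print(score)
```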

topica.topic_semantic_diversity ¶

topic_semantic_diversity(topics, topn=25)

Fraction of unique top-word pairs across all topics (Wu, Nguyen & Luu 2024, "A Survey on Neural Topic Models", Eq. 18). Where topic_diversity counts unique single words, this counts unique pairs drawn from each topic's top-topn words: a pair occurrence is "unique" when that unordered pair appears in exactly one topic's top words. 1.0 means every top-word pair is unique to its topic; higher = more diverse. A pair disambiguates word sense, so this is "semantic-aware" — no embeddings are needed.

topics is a fitted model or a list of word lists. topn must be an integer >= 2 (pairs require at least two words).
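
One reading of the pair-based definition can be sketched as follows. This is an assumption-laden illustration rather than topica's implementation: `unique_pair_fraction` is a hypothetical helper, and it takes the score to be the share of pair occurrences whose unordered pair appears in exactly one topic.

```python
from collections import Counter
from itertools import combinations


def unique_pair_fraction(topics, topn=25):
    """Share of top-word pair occurrences whose pair belongs to a single topic."""
    pair_sets = [
        {frozenset(pair) for pair in combinations(dict.fromkeys(topic[:topn]), 2)}
        for topic in topics
    ]
    # How many topics contain each unordered pair.
    counts = Counter(pair for pairs in pair_sets for pair in pairs)
    total = sum(len(pairs) for pairs in pair_sets)
    unique = sum(1 for pairs in pair_sets for pair in pairs if counts[pair] == 1)
    return unique / total


# "bank" is shared, but no pair is: the two senses of "bank" are told apart.
senses = [["bank", "river", "water"], ["bank", "money", "loan"]]
# Here the pair (bank, money) is shared, so 4 of the 6 pair occurrences are unique.
overlap = [["bank", "money", "loan"], ["bank", "money", "credit"]]
print(unique_pair_fraction(senses, topn=3), unique_pair_fraction(overlap, topn=3))
```

The first example scores 1.0 here even though a single-word count would penalize the shared word, which is the sense in which pairs disambiguate.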

topica.exclusivity ¶

exclusivity(model_or_phi, *, n=10, w=0.7)

Per-topic exclusivity, shape (num_topics,) — stm's exclusivity.

For each topic, the FREX summary over its top-n words (by probability): the sum of each word's frequency–exclusivity score (the rank harmonic mean of probability and exclusivity φ_{t,v} / Σ_k φ_{k,v}, weighted by w, stm's default 0.7). Higher means the topic's top words are more distinctive. Pair with per-topic coherence to make stm's coherence-vs-exclusivity quality plot: good topics sit toward the upper-right (coherent and distinctive).

The scores come from the single stm-faithful implementation in topica's Rust core (topica-core's inspect), shared with faSTM and the Stata plugin.

.. note:: This is stm's exclusivity (a sum of FREX scores over the top n words, roughly in [0, n]), not a mean exclusivity in [0, 1]. The scale changed in the move to the shared stm-faithful core.

model_or_phi is a fitted model (uses its topic_word) or a (K, V) array.

topica.quality_frontier ¶

quality_frontier(model, *, n=10, texts=None, coherence_type='u_mass', plot=False)

Per-topic coherence, exclusivity, and prevalence — the data behind stm's classic coherence-vs-exclusivity quality plot.

Returns a dict of equal-length arrays: topic, coherence, exclusivity, prevalence (mean θ). By default coherence is the fast per-topic UMass score; pass texts and a windowed coherence_type (e.g. "c_v") for the human-aligned measure. Feed the dict straight to pandas / matplotlib; with plot=True (and matplotlib installed) a labeled scatter Figure is returned alongside the dict as (data, fig).

External validation¶

When you have gold (or partially gold) labels for your documents, agreement scores how well the discovered topics recover them — the check that actually tracks recovery, where coherence can mislead.

topica.agreement ¶

External validation: score a topic assignment against gold labels.

:func:agreement answers the most basic validation question a topic model can be asked — given documents I have hand-labeled, how well do the discovered topics recover those labels? It reports the standard partition-comparison metrics (ARI, NMI, homogeneity, completeness, V-measure, and cluster purity), computed from the two label vectors alone.

This complements :func:topica.coherence. Coherence rates the interpretability of a topic's top words; it does not tell you whether documents were assigned to the right topic, and for embedding-based cluster models it can be actively misleading (a model can keep tight, coherent top-words while the document partition drifts). When you have labels to check against, agreement is the number that tracks recovery.

The metrics are label-agnostic (invariant to how the cluster/class ids are named), so they work whether pred is 0..k cluster ids and gold is category codes, or any other integer labeling. The formulas match scikit-learn's adjusted_rand_score, normalized_mutual_info_score (arithmetic averaging), and homogeneity_completeness_v_measure; agreement needs only numpy.

agreement ¶

agreement(pred, gold, *, noise='keep')

Score a topic/cluster assignment against gold labels.

Parameters:

Name	Type	Description	Default
`pred`	`array-like of int`	Predicted topic/cluster per document — e.g. `model.labels` for a cluster model, or `model.doc_topic.argmax(1)` for a mixture model.	required
`gold`	`array-like of int`	Reference (hand-coded) label per document, aligned one-to-one with `pred`. To score a partially-labeled corpus, pass only the labeled subset: `agreement(pred[mask], gold[mask])`.	required
`noise`	`(keep, drop)`	How to treat documents that `pred` left unassigned (label `-1`, the HDBSCAN/BERTopic noise bucket). `"keep"` scores `-1` as its own topic (so a large noise bucket is penalized honestly). `"drop"` excludes those documents before scoring (scores only the assigned ones). Documents with a gold label of `-1` are dropped under `"drop"` as well.	`"keep"`

Returns:

Type	Description
`dict`	`{"ari", "nmi", "homogeneity", "completeness", "v_measure", "purity"}`. ARI is adjusted for chance (0 = random, 1 = identical partitions, and it can go slightly negative); the others lie in `[0, 1]`. `purity` is asymmetric — it measures how class-pure the predicted clusters are — and, unlike the others, is not penalized for splitting one class across many clusters.

Notes

All metrics are invariant to how the labels are named. Values match scikit-learn (normalized_mutual_info_score with arithmetic averaging). Pair with :func:topica.coherence: coherence for whether the top words read as a theme, agreement for whether the document partition is right.
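
To make two of the reported metrics concrete, here is a small stdlib sketch of purity and the adjusted Rand index computed from two label vectors. It is an illustration of the standard formulas, not topica's code; `purity_and_ari` is a hypothetical helper and assumes at least two documents.

```python
from collections import Counter
from math import comb


def purity_and_ari(pred, gold):
    """Cluster purity and adjusted Rand index from two aligned label vectors."""
    n = len(pred)
    joint = Counter(zip(pred, gold))  # contingency table as (pred, gold) -> count
    pred_sizes, gold_sizes = Counter(pred), Counter(gold)

    # Purity: each predicted cluster is credited with its majority gold class.
    majority = {}
    for (p, _), count in joint.items():
        majority[p] = max(majority.get(p, 0), count)
    purity = sum(majority.values()) / n

    # ARI: pair-counting agreement, corrected for the chance-expected value.
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_pred = sum(comb(c, 2) for c in pred_sizes.values())
    sum_gold = sum(comb(c, 2) for c in gold_sizes.values())
    expected = sum_pred * sum_gold / comb(n, 2)
    max_index = 0.5 * (sum_pred + sum_gold)
    ari = 1.0 if max_index == expected else (sum_joint - expected) / (max_index - expected)
    return purity, ari


# Same partition under different label names: both scores are 1.
renamed = purity_and_ari([0, 0, 1, 1, 2, 2], [5, 5, 7, 7, 9, 9])
# Every document in its own cluster: purity stays 1, ARI drops to 0.
oversplit = purity_and_ari([0, 1, 2, 3, 4, 5], [0, 0, 0, 1, 1, 1])
print(renamed, oversplit)
```

The second case shows why purity is best read next to ARI: splitting a class across many clusters leaves purity untouched.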

Interpretation¶

topica.label_topics ¶

label_topics(topic_word, vocabulary=None, *, n=10, word_counts=None, corpus=None)

stm-style topic labels: prob, FREX, lift, and score word lists per topic.

Returns a list (per topic) of dicts with keys prob, frex, lift, score, each a list of (word, value) pairs. FREX, lift, and score all come from the single stm-faithful implementation in topica's Rust core (topica-core's inspect), so they cannot drift from faSTM / the Stata plugin.

lift is stm's lift, log P(w|topic) − log P(w), where P(w) is the empirical word frequency. Pass word_counts (a length-V array) or corpus (a :class:topica.Corpus, whose word counts are read for you) for the exact value; without either, P(w) is estimated from the topic-word matrix's column marginal (lift depends only on relative word frequency, so the ranking matches). word_counts / corpus also enable stm's James-Stein FREX shrinkage (see :func:frex).

topic_word is a fitted model (uses its topic_word and vocabulary) or a (K, V) array, in which case pass vocabulary.

topica.llm_topic_labels ¶

llm_topic_labels(model, texts=None, *, backend=None, llm_model='gpt-4o-mini', n_words=12, n_docs=3, max_chars=300, instructions=None, set_labels=False)

A short, human-readable label for each topic, generated by an LLM.

For each topic, assembles a prompt from its top words and representative documents (see :func:topic_label_prompts) and asks a model for a concise label. Returns a list of labels, one per topic.

Supply the model one of two ways:

backend: any callable that maps a prompt string to a label string, such as your own client, an ollama wrapper, or the callable returned by :func:topica.llm_backend / :func:topica.llm.backend. Zero extra dependencies; you own determinism.
otherwise, llm_model names a model used through :func:llm_backend (the topica[llm] extra). backend takes precedence when both are given.

With set_labels=True the labels are stored via :func:topica.set_topic_labels, so they flow into :func:topica.topic_info, :func:topica.topic_labels, and :func:topica.plot_report.

LLM labels are a convenience, not a reproducible measurement: pin the model and set temperature to 0, and keep :func:topica.label_topics (FREX / probability / lift) for the defensible descriptors.

topica.llm_backend ¶

llm_backend(model='gpt-4o-mini', *, key=None, system=None, **options)

A str -> str callable backed by the llm library, for the backend= argument of :func:llm_topic_labels.

model names any model llm can reach — OpenAI, Anthropic, or local models through plugins such as llm-ollama. By default the API key is resolved by llm itself: a stored llm keys value, else the provider's environment variable (OPENAI_API_KEY for OpenAI). Pass key to override that with an explicit key. options pass through to llm (e.g. temperature=0 for reproducible labels where the provider supports it). Requires the optional llm package (pip install llm or pip install "topica[llm]").

topica.topic_label_prompts ¶

topic_label_prompts(model, texts=None, *, n_words=12, n_docs=3, max_chars=300, instructions=None)

One labeling prompt per topic — exactly the text a model is asked to label.

Each prompt lists the topic's top n_words words and, when texts is given, up to n_docs representative documents (each whitespace-collapsed and truncated to max_chars). instructions overrides the default task framing. Returns a list of prompt strings, one per topic.

This is the plumbing behind :func:llm_topic_labels; build it yourself to see or adjust what the model sees, or to drive a model topica does not know about.

topica.frex ¶

frex(topic_word, vocabulary=None, *, w=0.5, n=10, word_counts=None, corpus=None)

FREX (FRequency–EXclusivity) top words per topic.

For each topic, words are scored by the weighted harmonic mean of the rank of their probability (frequency) and the rank of their exclusivity φ_{t,v} / Σ_k φ_{k,v} — stm's calcfrex. w weights frequency vs exclusivity. Returns a list (per topic) of (word, frex).

The scores come from the single, stm-faithful implementation in topica's Rust core (topica-core's inspect module — the same one faSTM and the Stata plugin use), so the FREX definition can never drift between languages.

Pass word_counts (a length-V array of corpus word frequencies) or corpus (a :class:topica.Corpus, whose word counts are read for you) to apply stm's James-Stein exclusivity shrinkage, which is stm's default; it damps the exclusivity of rare words that appear in only one topic by chance. Without either (the default here) no shrinkage is applied.

topic_word is a fitted model (uses its topic_word and vocabulary) or a (K, V) array, in which case pass vocabulary.

topica.mmr ¶

mmr(topic_word, word_embeddings, vocabulary=None, *, n=10, diversity=0.3, n_candidates=None)

Maximal-marginal-relevance top words, to cut redundant near-synonyms.

For each topic, take the top n_candidates words by topic_word weight and greedily reselect n of them, each pick maximizing

``(1 - diversity) * relevance(word) - diversity * max_cos(word, picked)``

where relevance is the (per-topic, max-normalized) topic_word weight and the redundancy term is the cosine between word embeddings. diversity=0 returns the plain top words; higher trades relevance for variety, like BERTopic's MaximalMarginalRelevance(diversity=...).

Parameters:

Name	Type	Description	Default
`topic_word`	`fitted model or (K, V) array`	A fitted model (uses its `topic_word` and `vocabulary`) or a `(K, V)` array, in which case pass `vocabulary`.	required
`word_embeddings`	`(V, E) array`	The word vectors, aligned to the vocabulary (for Top2Vec, the ones you fit with; otherwise embed the vocabulary with your embedding model, as BERTopic's MMR does internally).	required
`n`	`int`	Words returned per topic.	`10`
`diversity`	`float`	In `[0, 1]`; 0 is the plain top words, higher is more diverse.	`0.3`
`n_candidates`	`optional int`	How many top words to rerank (default `max(5 * n, n)`).	`None`

Returns:

Type	Description
`list`	Per topic, a list of `(word, topic_word_weight)` pairs, like `top_words`.
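
The greedy reselection can be sketched for a single topic in plain Python. This is a minimal illustration of the scoring rule quoted above, not topica's implementation: `mmr_select`, the dict-based inputs, and the toy two-dimensional embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(weights, embeddings, n=3, diversity=0.3):
    """weights: word -> topic weight; embeddings: word -> vector (toy inputs)."""
    top_weight = max(weights.values())
    relevance = {w: x / top_weight for w, x in weights.items()}  # max-normalized
    candidates = sorted(weights, key=weights.get, reverse=True)
    picked = []
    while candidates and len(picked) < n:
        def score(word):
            # Redundancy is the highest cosine to any word already picked.
            redundancy = max((cosine(embeddings[word], embeddings[p]) for p in picked),
                             default=0.0)
            return (1 - diversity) * relevance[word] - diversity * redundancy
        best = max(candidates, key=score)
        picked.append(best)
        candidates.remove(best)
    return [(word, weights[word]) for word in picked]


weights = {"car": 0.30, "cars": 0.28, "engine": 0.20, "road": 0.15}
embeddings = {"car": [1.0, 0.0], "cars": [0.99, 0.1], "engine": [0.5, 0.8], "road": [0.0, 1.0]}
plain = mmr_select(weights, embeddings, n=2, diversity=0.0)
diverse = mmr_select(weights, embeddings, n=2, diversity=0.7)
print(plain, diverse)
```

With `diversity=0.0` the near-synonym "cars" follows "car"; with a high diversity weight the second pick moves to a word whose embedding points elsewhere.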

topica.relevance ¶

relevance(topic_word, vocabulary=None, *, topic=None, lam=0.6, n=10, term_frequency=None)

LDAvis relevance of words to topics (Sievert & Shirley 2014):

relevance(w | t) = λ·log p(w|t) + (1-λ)·log[p(w|t) / p(w)]

λ=1 ranks by probability; λ=0 by lift (exclusivity); the LDAvis default 0.6 balances them. p(w) is the corpus word marginal — pass term_frequency (word counts in vocabulary order) for the empirical marginal, else the topic-averaged φ is used. Returns (word, relevance) lists per topic, or for one topic.

topic_word is a fitted model (uses its topic_word and vocabulary) or a (K, V) array, in which case pass vocabulary.
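
A quick numeric check of the formula shows how λ moves the ranking between probability and lift. The helper `relevance_scores` and the two-word probabilities below are made up for illustration and are not part of the library.

```python
import math


def relevance_scores(p_w_given_t, p_w, lam=0.6):
    """lam * log p(w|t) + (1 - lam) * log(p(w|t) / p(w)) for each word."""
    return {
        word: lam * math.log(p) + (1 - lam) * math.log(p / p_w[word])
        for word, p in p_w_given_t.items()
    }


def ranked(scores):
    return sorted(scores, key=scores.get, reverse=True)


# "the" is frequent everywhere; "goalkeeper" is rarer but concentrated in this topic.
p_w_given_t = {"the": 0.10, "goalkeeper": 0.02}
p_w = {"the": 0.10, "goalkeeper": 0.001}

by_probability = ranked(relevance_scores(p_w_given_t, p_w, lam=1.0))
by_lift = ranked(relevance_scores(p_w_given_t, p_w, lam=0.0))
balanced = ranked(relevance_scores(p_w_given_t, p_w, lam=0.6))
print(by_probability, by_lift, balanced)
```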

topica.find_thoughts ¶

find_thoughts(doc_topic, texts=None, *, topic, n=3)

The n documents most associated with topic (≈ stm's findThoughts).

Returns a list of (doc_index, proportion, text) sorted by descending topic proportion; text is None when texts is not supplied.

doc_topic is a fitted model (uses its doc_topic) or a (D, K) array.
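
The lookup amounts to sorting documents by one column of the document-topic matrix. A minimal sketch, with the hypothetical helper `top_documents` operating on a plain list-of-lists θ rather than a fitted model:

```python
def top_documents(doc_topic, topic, n=3, texts=None):
    """Return (doc_index, proportion, text) for the n documents highest on `topic`."""
    order = sorted(range(len(doc_topic)), key=lambda d: doc_topic[d][topic], reverse=True)
    return [
        (d, doc_topic[d][topic], texts[d] if texts is not None else None)
        for d in order[:n]
    ]


theta = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.05, 0.95]]
texts = ["budget vote", "cup final", "tax and sport", "league table"]
with_text = top_documents(theta, topic=1, n=2, texts=texts)
without_text = top_documents(theta, topic=1, n=2)
print(with_text, without_text)
```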

topica.find_thoughts_html ¶

find_thoughts_html(model, texts, *, topics=None, n_docs=3, n_words=8, max_chars=400, markdown=False)

Render each topic's most representative documents for close reading, with the topic's top words highlighted in the document text.

Distant reading (top words) is only half of topic validation; the other half is reading the actual documents a topic loads on. This builds a self-contained HTML snippet (or Markdown) you can display in a notebook: per topic, its top words followed by its n_docs highest-θ documents, each truncated to max_chars with the topic's words marked.

model is any fitted model exposing topic_word, doc_topic and vocabulary; texts are the original document strings, aligned to the rows of doc_topic. Returns a string (HTML unless markdown=True).

topica.topic_correlation ¶

topic_correlation(doc_topic, *, threshold=0.05)

Topic-correlation network (≈ stm's topicCorr "simple" method).

Correlates topic proportions across documents; topic pairs whose correlation exceeds threshold become network edges. Returns a :class:TopicCorrelation with the correlation matrix, a 0/1 adjacency matrix (zero diagonal), and the edge list.

This is the raw across-document theta correlation, matching stm's topicCorr default ("simple") method. Raw theta correlation is compositionally biased (the simplex constraint induces spurious negative correlation); for the closure-corrected alternatives use viz.topic_correlation(model, method="clr") (the viz layer's default) or method="partial"/"eta".

doc_topic is a fitted model (uses its doc_topic) or a (D, K) array.
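
The raw-correlation network can be sketched with a hand-rolled Pearson correlation. This is an illustration only: `correlation_network` is a hypothetical helper that returns plain lists rather than a TopicCorrelation object, and the θ rows are invented.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0


def correlation_network(doc_topic, threshold=0.05):
    k = len(doc_topic[0])
    columns = [[row[t] for row in doc_topic] for t in range(k)]
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    # An edge joins two distinct topics whose correlation exceeds the threshold.
    adjacency = [[int(i != j and corr[i][j] > threshold) for j in range(k)] for i in range(k)]
    edges = [(i, j) for i in range(k) for j in range(i + 1, k) if adjacency[i][j]]
    return corr, adjacency, edges


# Each row sums to 1; topics 0 and 1 rise and fall together across documents.
theta = [
    [0.45, 0.45, 0.10],
    [0.40, 0.40, 0.20],
    [0.10, 0.10, 0.80],
    [0.05, 0.15, 0.80],
    [0.30, 0.30, 0.40],
]
corr, adjacency, edges = correlation_network(theta)
print(edges, round(corr[0][2], 3))
```

Topic 2 correlates negatively with both others here simply because the rows must sum to one, which is the compositional bias the closure-corrected methods address.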

topica.prepare_pyldavis ¶

prepare_pyldavis(model, docs, **kwargs)

Build the LDAvis intertopic-distance visualization for a fitted model.

docs are the tokenized training documents (list[list[str]]), used for document lengths and term frequencies. If pyLDAvis is installed this returns its PreparedData (pass to pyLDAvis.display / save_html); otherwise it returns a :class:PyLDAvisInputs you can feed to pyLDAvis.prepare later. Extra kwargs go to pyLDAvis.prepare (e.g. sort_topics=False).

Validation¶

topica.word_intrusion ¶

word_intrusion(model_or_phi, vocabulary=None, *, n_words=5, seed=0)

Build a word intrusion test for human topic validation.

For each topic, take its top n_words words and splice in one intruder — a word that ranks highly in some other topic but has low probability in this one. A coherent topic is one where a human can reliably spot the intruder (Chang et al. 2009, "Reading Tea Leaves"). Returns a list (per topic) of dicts with:

topic — the topic index,
words — the n_words + 1 words in shuffled, presentation order,
intruder — the intruder word,
intruder_index — its position in words (the answer key).

model_or_phi is a fitted model (uses its topic_word / vocabulary) or a (K, V) array (then pass vocabulary). Deterministic for a fixed seed.
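
The task construction can be sketched as follows. This is a simplified illustration, not the library's selection rule: `build_word_intrusion` is hypothetical, and it assumes an intruder is any word in another topic's top list that falls in the lower half of this topic's ranking.

```python
import random


def build_word_intrusion(topic_word, vocabulary, n_words=5, seed=0):
    rng = random.Random(seed)  # fixed seed makes the task reproducible
    n_topics, n_vocab = len(topic_word), len(vocabulary)

    def ranking(t):
        return sorted(range(n_vocab), key=lambda v: topic_word[t][v], reverse=True)

    tasks = []
    for t in range(n_topics):
        ranked = ranking(t)
        top = ranked[:n_words]
        low_here = set(ranked[n_vocab // 2:])  # assumed meaning of "low probability"
        pool = sorted({v for o in range(n_topics) if o != t
                       for v in ranking(o)[:n_words] if v in low_here})
        if not pool:
            continue  # no usable intruder for this topic in this simplified rule
        intruder = vocabulary[rng.choice(pool)]
        words = [vocabulary[v] for v in top] + [intruder]
        rng.shuffle(words)
        tasks.append({"topic": t, "words": words, "intruder": intruder,
                      "intruder_index": words.index(intruder)})
    return tasks


vocabulary = ["tax", "budget", "vote", "team", "goal", "match"]
phi = [
    [0.40, 0.30, 0.10, 0.10, 0.05, 0.05],
    [0.05, 0.05, 0.10, 0.10, 0.30, 0.40],
]
tasks = build_word_intrusion(phi, vocabulary, n_words=2, seed=0)
print(tasks)
```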

topica.document_intrusion ¶

document_intrusion(model_or_theta, texts=None, *, n_docs=3, seed=0)

Build a document intrusion test for human topic validation.

For each topic, take the n_docs documents with the highest proportion of that topic and splice in one intruder — a document where the topic is nearly absent (and another topic dominates). A topic that captures real document similarity is one where a human can spot the intruder. Returns a list (per topic) of dicts with:

topic — the topic index,
doc_indices — the n_docs + 1 document indices in shuffled order,
intruder_index — the intruder's position in doc_indices,
texts — the corresponding text previews (only if texts is given).

model_or_theta is a (D, K) θ array (or a fitted model, whose doc_topic is used). Deterministic for a fixed seed.

LLM-based evaluation (`topica.llm`)¶

topica.llm.coherence ¶

coherence(model, *, backend, n_words=10, scale=(1, 3), dataset_description=None, seed=0, n_samples=1, shuffle=True, prompts=None)

LLM-rated topic coherence (Stammbach et al. 2023): the headline LLM metric.

For each topic, the top n_words words are shuffled and an LLM rates how related they are on a scale (default 1-3). Returns a per-topic numpy array of mean ratings (higher = more coherent). This is the metric that beats NPMI / c_v at tracking human judgment in the paper; it sits beside :func:coherence, :func:topic_diversity, and :func:topic_semantic_diversity, but is llm-bounded -- it calls an external model and is not bit-deterministic.

Parameters:

Name	Type	Description	Default
`model`	`fitted model or list of word lists`	Anything :func:`_extract_topics` accepts.	required
`backend`	`callable str -> str, or model-name str`	The LLM. Pass `topica.llm.backend(name, temperature=0)` or a model name.	required
`n_words`	`int`	The number of top words shown to the LLM.	`10`
`scale`	`(low, high)`	The rating range.	`(1, 3)`
`dataset_description`	`optional str`	A one-line corpus description added to the prompt (small reported gains).	`None`
`seed`	`int`	Seeds the per-topic word shuffles (reproducible task; the LLM is not).	`0`
`n_samples`	`int`	Calls the LLM this many times per topic and averages (tames non-determinism; the paper uses temperature=1 to mimic annotator variation).	`1`
`prompts`	`optional dict`	Override the editable templates (key `"rating"`); defaults to :data:`LLM_EVAL_PROMPTS`.	`None`

topica.llm.intrusion ¶

intrusion(model, vocabulary=None, *, backend, n_words=5, dataset_description=None, seed=0, n_samples=1, prompts=None)

LLM word-intrusion accuracy (Stammbach et al. 2023).

Builds the intrusion task with :func:word_intrusion (top n_words words plus one intruder, shuffled), asks the LLM to pick the intruder, and scores it against the answer key. Returns {"accuracy": float, "per_topic": [...]} where each per-topic dict has topic, intruder, picked, and correct.

The paper finds an LLM matches human accuracy on this task (~72%), but rating (:func:llm_coherence) tracks human topic rankings better -- lead with llm_coherence and report this alongside. llm-bounded; see :func:llm_coherence for the shared backend / n_samples semantics.

topica.llm.select_k ¶

select_k(models, docs, *, backend, n_docs=10, granularity='broad', example_labels=None, research_question=None, criterion='knee', tol=0.03, seed=0, n_samples=1, max_chars=1500, prompts=None)

Choose the number of topics by LLM document-label purity (Stammbach et al. 2023). For each candidate fitted model, take each topic's top n_docs documents, have an LLM assign each a theme label, and score the topic by label purity — the fraction of its documents sharing the majority label. The model's score is the mean per-topic purity.

This is the paper's working number-of-topics signal: doc-label purity tracks ground-truth cluster quality (ARI), whereas rating the top words across K does not (their negative result). Complements :func:search_k (coherence / exclusivity / perplexity) with a human-aligned, llm-bounded criterion.

.. note:: Purity rises then plateaus as K grows — over-splitting one theme into two topics yields two same-labelled, still-pure topics — so the raw maximum tends to over-split (the mirror of coherence's bias toward small K; cf. :func:search_k's frontier). The default criterion="knee" therefore returns the smallest K whose purity is within tol of the best (the plateau onset), not the bare argmax. Always read the full scores curve; criterion="max" restores the literal highest-purity pick.

Parameters:

Name	Type	Description	Default
`models`	`sequence of fitted models`	Candidates, typically the same corpus fit at different `num_topics`.	required
`docs`	`Corpus \| list of str \| list of token lists`	The documents, in the order the models were fit on (their `doc_topic` rows).	required
`backend`	callable ``str -> str`` or model-name str	The LLM (see :func:`llm_coherence`).	required
`n_docs`	`int`	Top documents per topic to label.	`10`
`granularity`	`(broad, narrow)`	Whether to ask for a broad or a narrow theme label.	`"broad"`
`example_labels`	`optional sequence of str`	Example label vocabulary shown to the model (steers granularity/format).	`None`
`research_question`	`optional str`	A one-line framing ("label by the policy area discussed", ...).	`None`
`criterion`	`(knee, max)`	How `best` is chosen from the purity curve. `"knee"` (default) returns the smallest `K` within `tol` of the best purity (the plateau onset); `"max"` returns the highest-purity model (which tends to over-split).	`"knee"`
`tol`	`float`	Purity tolerance for the knee (default 0.03).	`0.03`
`n_samples`	`int`	Majority-vote the label over this many calls per document.	`1`
`max_chars`	`int`	Truncate each document to this many characters in the prompt.	`1500`

Returns:

Type	Description
dict with ``best`` (the chosen model's ``num_topics``), ``best_index``, and
``scores`` (a list of ``{"num_topics", "purity", "per_topic_purity"}`` per model).
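The purity score and the two selection criteria can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: `topic_purity`, `model_purity`, and `pick_k` are hypothetical helpers, and the purity values are made up to show a plateau.

```python
from collections import Counter

def topic_purity(labels):
    """Fraction of a topic's labelled documents that share the majority label."""
    return Counter(labels).most_common(1)[0][1] / len(labels)

def model_purity(labels_per_topic):
    """Mean per-topic purity for one candidate model."""
    return sum(topic_purity(labels) for labels in labels_per_topic) / len(labels_per_topic)

def pick_k(scores, criterion="knee", tol=0.03):
    """scores: list of (num_topics, purity) pairs, one per candidate model."""
    best = max(purity for _, purity in scores)
    if criterion == "max":
        return max(scores, key=lambda s: s[1])[0]
    # "knee": the smallest K whose purity is within tol of the best.
    return min(k for k, purity in scores if purity >= best - tol)

# Illustrative purity curve: rises, then plateaus.
scores = [(5, 0.70), (10, 0.89), (20, 0.90), (40, 0.91)]
knee_k = pick_k(scores, criterion="knee", tol=0.03)
max_k = pick_k(scores, criterion="max")
print(knee_k, max_k)
```

On this curve the bare argmax picks the largest K, while the knee returns the first K on the plateau.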

topica.llm.outlier ¶

outlier(model, *, backend, n_words=10, n_samples=5, threshold=3, dataset_description=None, seed=0, prompts=None)

Unsupervised semantic-outlier detection (Tan & D'Souza 2025, C_outlier).

For each topic, asks the LLM to list the words that do not fit the topic, over n_samples runs, and keeps a word flagged in at least threshold runs (the paper's 3-of-5 vote). Returns a per-topic list of dicts with topic, outliers (the flagged words), and count. Unlike :func:llm_intrusion there is no planted answer — this surfaces which words make a topic incoherent. llm-bounded; see :func:llm_coherence for backend/n_samples semantics.
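The vote itself is a simple count. A minimal sketch, assuming each run returns the list of words the LLM flagged; `vote_outliers` is a hypothetical helper name.

```python
from collections import Counter

def vote_outliers(runs, threshold=3):
    """Keep a word flagged in at least `threshold` of the runs."""
    # set(run) so that a word repeated within one run counts once.
    counts = Counter(word for run in runs for word in set(run))
    return sorted(word for word, c in counts.items() if c >= threshold)

# Five illustrative runs for one topic.
runs = [["yeast"], ["yeast", "coach"], [], ["yeast"], ["coach"]]
outliers = vote_outliers(runs, threshold=3)
print(outliers)
```

Here "yeast" is flagged in three of five runs and kept, while "coach" is flagged only twice and dropped.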

topica.llm.repetitiveness ¶

repetitiveness(model, *, backend, n_words=10, n_samples=1, dataset_description=None, seed=0, prompts=None)

LLM repetitiveness (Tan & D'Souza 2025): is apparent coherence just redundancy?

Returns a per-topic list of dicts with rate (R_rate: 1 = highly repetitive, 3 = diverse/distinctive; averaged over n_samples), duplicate_pairs (R_duplicate: word pairs the LLM judges the same concept), and duplicate_count. A robustly coherent topic has a high rate and a low duplicate count. Complements :func:topic_semantic_diversity on the LLM side. llm-bounded.

topica.llm.diversity ¶

diversity(model, *, backend, n_words=10, n_samples=1, max_pairs=None, dataset_description=None, seed=0, prompts=None)

Cross-topic LLM diversity (Tan & D'Souza 2025, D_rate).

Rates the thematic distinctiveness of every pair of topics on a 1-3 scale (1 = overlapping, 3 = distinctive) and averages the ratings. Returns {"mean": float, "pairwise": [...]} with one {"topics": (i, j), "rate": r} per scored pair. O(K²) calls; pass max_pairs to score a deterministic random subset. The LLM analog of :func:topic_diversity / :func:topic_semantic_diversity. llm-bounded.
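The pair enumeration and seeded subsampling can be sketched as follows. This is an assumption about how a deterministic subset could be drawn, not the library's code; `sample_topic_pairs`, `mean_diversity`, and `toy_rate` are hypothetical, and `toy_rate` stands in for the LLM rating.

```python
import random
from itertools import combinations

def sample_topic_pairs(num_topics, max_pairs=None, seed=0):
    """All K*(K-1)/2 unordered topic pairs, or a seeded random subset of them."""
    pairs = list(combinations(range(num_topics), 2))
    if max_pairs is None or max_pairs >= len(pairs):
        return pairs
    return sorted(random.Random(seed).sample(pairs, max_pairs))

def mean_diversity(pairs, rate):
    """rate: callable(i, j) -> rating in {1, 2, 3}."""
    pairwise = [{"topics": (i, j), "rate": rate(i, j)} for i, j in pairs]
    mean = sum(row["rate"] for row in pairwise) / len(pairwise)
    return {"mean": mean, "pairwise": pairwise}

def toy_rate(i, j):
    # Hypothetical stand-in for the LLM: topics 0 and 1 overlap, the rest are distinct.
    return 1 if (i, j) == (0, 1) else 3

all_pairs = sample_topic_pairs(4)
subset = sample_topic_pairs(4, max_pairs=3, seed=0)
result = mean_diversity(all_pairs, toy_rate)
print(len(all_pairs), len(subset), round(result["mean"], 3))
```

Because the subset is drawn from a seeded generator, repeating the call with the same seed scores the same pairs.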

topica.llm.alignment ¶

alignment(model, docs, *, backend, n_words=10, n_docs=5, dataset_description=None, seed=0, prompts=None, max_chars=1500)

Topic-document alignment (Tan & D'Souza 2025, A_ir-topic / A_missing-theme).

For each topic, takes its top n_docs documents and asks the LLM, per document, (1) how many topic words are irrelevant to it (overrepresentation) and (2) how many document themes are missing from the topic words (underrepresentation), averaging over the documents. Returns a per-topic list of dicts with topic, irrelevant (mean count) and missing (mean count); lower is better on both. Needs the documents and O(K·n_docs) calls. llm-bounded.

topica.llm.adversarial ¶

adversarial(model, *, backend, intruder='shakespeare', n_words=10, n_samples=5, threshold=3, dataset_description=None, seed=0, prompts=None)

Gold-free adversarial self-check (Tan & D'Souza 2025, AdvT_outlier).

Plants a known-unrelated word (default "shakespeare") into each topic's top words and measures how often the LLM's :func:llm_outlier detection flags it. This validates the metric and the model's capability without human-gold data, on any corpus — a low detection rate means the model is too weak for these tasks. Returns {"detection_rate": float, "intruder": str, "per_topic": [...]}.

topica.bootstrap_stability ¶

bootstrap_stability(docs, *, k=None, n_boot=20, topn=10, seed=0, model_factory=None, reference=None, **fit_kwargs)

Flag fragile topics by refitting on bootstrap resamples of the corpus.

The standard defense against "topic modeling is a fishing expedition": fit a reference model on the full corpus, then refit on n_boot resamples of the documents (drawn with replacement). Each bootstrap model's topics are matched to the reference's by top-word overlap, and a reference topic's stability is the mean Jaccard overlap of its top-topn words with its matched bootstrap topic. Topics that dissolve under resampling score low.

Matching is on the top words as strings, so it is correct even though each resample is fit as a fresh corpus with its own vocabulary indexing.

Parameters:

Name	Type	Description	Default
`docs`	`list[list[str]]` or `Corpus`	The corpus.	required
`k`		Number of topics. Required unless `reference` is given (then taken from it).	`None`
`n_boot`		Number of bootstrap resamples.	`20`
`model_factory`	`callable(seed) -> unfitted model`	Defaults to `LDA(num_topics=k, seed=seed)`. Use it to bootstrap any model.	`None`
`reference`		An already-fitted model to measure the stability of. When given, the resample topics are matched back to it (rather than to a fresh full-corpus fit), so the per-topic stability lines up with that model's topic indices. `model_factory` should rebuild the same model type.	`None`
`fit_kwargs`		Forwarded to each model's `fit` (e.g. `iters=500`).	optional

Returns:

Type	Description
dict with ``topic`` (indices), ``stability`` (per-topic mean Jaccard in
``[0, 1]``), ``mean`` (overall), and ``reference`` (the reference model).
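The resample-and-match logic can be illustrated without fitting any model. In this sketch the bootstrap topics are hand-written stand-ins, and each reference topic simply takes its best-overlapping bootstrap topic, which is a simplification; the library's exact matching rule is not shown here. All function names are hypothetical.

```python
import random

def bootstrap_resample(docs, seed=0):
    """Draw len(docs) documents with replacement."""
    rng = random.Random(seed)
    return rng.choices(docs, k=len(docs))

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def match_stability(reference_topics, boot_topics_list):
    """Per reference topic: mean Jaccard with its best-matching topic in each refit."""
    stability = []
    for ref in reference_topics:
        best_per_refit = [max(jaccard(ref, topic) for topic in boot_topics)
                          for boot_topics in boot_topics_list]
        stability.append(sum(best_per_refit) / len(best_per_refit))
    return stability

# Top words as strings, so vocabulary indexing per resample does not matter.
reference = [["goal", "team", "coach"], ["tax", "vote", "bill"]]
boot_topics_list = [
    [["tax", "vote", "senate"], ["goal", "team", "coach"]],  # refit 1 (topic order differs)
    [["goal", "team", "league"], ["tax", "vote", "bill"]],   # refit 2
]
stability = match_stability(reference, boot_topics_list)
resample = bootstrap_resample(["d0", "d1", "d2", "d3"], seed=0)
print(stability)
```

A topic that reappears with the same top words in every refit scores 1.0; one that dissolves under resampling drifts toward 0.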

topica.search_k ¶

search_k(docs, ks, *, model='lda', prevalence=None, content=None, held_out=None, iters=500, num_samples=3, sample_interval=10, seed=42, coherence_n=10, coherence_type='u_mass')

Fit a model for each K and report quality metrics (stm's searchK).

With model="lda" (default) fits an :class:~topica.LDA per K. With model="stm" fits an :class:~topica.STM per K — pass prevalence (a covariate design matrix) and optional content (group labels) to scan K for the model you'll actually report.

Returns a :class:SearchKResult (a list of per-K dicts) with k, coherence (mean of selected coherence type, default "u_mass"), exclusivity (mean top-word exclusivity), and — when held_out is supplied — a held-out quality metric. The result also carries .directions (whether higher or lower is better per metric) and a .best_k(metric=...) selector. best_k defaults to the held-out metric when one is supplied, otherwise to a coherence/exclusivity frontier (a knee), because bare UMass coherence is roughly monotone in K and would just return the smallest K scanned.

Two held-out paths are supported, determined by the type of held_out:

Heldout object (from :func:make_heldout): scored with :func:eval_heldout; results stored under "heldout_loglik" (mean_per_doc_loglik, higher / less negative is better). Use this path for the standard within-corpus word-heldout diagnostic.
Corpus or token lists (legacy): scored with :func:perplexity; results stored under "perplexity" (lower is better). This is the document-completion perplexity on a separate held-out set.

Parameters:

Name	Type	Description	Default
`docs`	`list[list[str]]` or `Corpus`	Training documents.	required
`ks`		Sequence of topic counts to scan.	required
`model`		`"lda"` (default) or `"stm"`.	`'lda'`
`prevalence`		Covariate design matrix for `model="stm"`; ignored otherwise.	`None`
`content`		Optional content group labels (sequence of str/int) for `model="stm"`.	`None`
`held_out`		Optional held-out set. Pass a :class:`Heldout` (from :func:`make_heldout`) or a separate corpus / token lists.	`None`
`iters`		Training iterations per fit.	`500`
`num_samples`		Gibbs samples per fit (LDA only).	`3`
`sample_interval`		Iterations between Gibbs samples (LDA only).	`10`
`seed`		RNG seed for every fit and transform call.	`42`
`coherence_n`		Top-word count used for coherence and exclusivity.	`10`
`coherence_type`		One of `"u_mass"`, `"c_uci"`, `"c_npmi"`, `"c_v"`.	`'u_mass'`

topica.check_residuals ¶

check_residuals(model, docs, *, tol=0.01)

Residual-dispersion test for whether K is too small (Taddy 2012), a faithful port of R stm's checkResiduals.

Under a correctly specified model the multinomial residuals have dispersion σ² = 1. A dispersion well above 1 (small p-value) is evidence the latent topics cannot absorb the overdispersion — i.e. K is too low. Run it alongside :func:search_k. docs are the tokenized training documents aligned to model.doc_topic's rows.

Returns a :class:ResidualCheck with dispersion (σ²), pvalue (χ² test of σ²=1 vs σ²>1), and df.

topica.document_residuals ¶

document_residuals(model, docs, *, floor=1e-12)

How poorly the fitted model explains each document, for outlier hunting.

Reconstructs each document's expected word distribution as theta_d @ beta and compares it to the document's actual word counts. A high residual marks a document the current topics cannot account for: an off-topic intruder, an anomaly, or a sign the model is missing a theme. This is the per-document complement to :func:check_residuals, which collapses the whole corpus into one "is K too small?" dispersion statistic.

docs are the tokenized documents aligned row-for-row to model.doc_topic (the corpus the model was fit on). To score new documents, get their theta with model.transform first.

Returns a list of per-document dicts sorted by descending novelty (most anomalous first). Each has doc (row index), novelty (the headline score: OOV-aware per-word cross-entropy), cross_entropy (the length-robust in-vocabulary-only per-word log-loss; nan if the document has no in-vocab tokens), kl (KL(actual || recon); length-confounded, use with care), cosine_dist (1 - cosine), oov (out-of-vocabulary token fraction), n_tokens and n_invocab.

A pure cross-entropy residual can only see in-vocabulary tokens, so a document written entirely in unknown words would otherwise look perfectly explained; novelty folds the OOV mass back in, which is what makes off-topic-vocabulary intruders rank at the top.
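The in-vocabulary cross-entropy and the OOV fraction can be sketched directly from the description. This is a minimal pure-Python illustration with a hypothetical `doc_residual` helper; it does not reproduce the novelty score, whose exact formula is not given here.

```python
import math
from collections import Counter

def doc_residual(tokens, theta, beta, vocabulary, floor=1e-12):
    """Per-word cross-entropy of a document under the reconstruction theta @ beta."""
    index = {word: j for j, word in enumerate(vocabulary)}
    counts = Counter(t for t in tokens if t in index)
    n_tokens = len(tokens)
    n_invocab = sum(counts.values())
    oov = 1.0 - n_invocab / n_tokens if n_tokens else 0.0
    if n_invocab == 0:
        cross_entropy = float("nan")  # nothing in-vocabulary to score
    else:
        total = 0.0
        for word, count in counts.items():
            j = index[word]
            # Expected probability of this word: sum_k theta[k] * beta[k][j].
            p = sum(theta[k] * beta[k][j] for k in range(len(theta)))
            total -= count * math.log(max(p, floor))
        cross_entropy = total / n_invocab
    return {"cross_entropy": cross_entropy, "oov": oov,
            "n_tokens": n_tokens, "n_invocab": n_invocab}

vocabulary = ["goal", "team", "tax", "vote"]
beta = [[0.5, 0.5, 0.0, 0.0],   # topic 0: sport
        [0.0, 0.0, 0.5, 0.5]]   # topic 1: politics
theta = [1.0, 0.0]              # document assigned entirely to topic 0

on_topic = doc_residual(["goal", "team", "goal"], theta, beta, vocabulary)
off_topic = doc_residual(["tax", "vote"], theta, beta, vocabulary)
half_oov = doc_residual(["zebra", "goal"], theta, beta, vocabulary)
all_oov = doc_residual(["zebra"], theta, beta, vocabulary)
print(on_topic["cross_entropy"], off_topic["cross_entropy"], half_oov["oov"])
```

The all-OOV document returns `nan` for cross-entropy, which is exactly the blind spot that an OOV-aware score is meant to cover.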

topica.flag_topics ¶

flag_topics(model, texts, *, n=10, coherence_type='c_v')

Score every topic on cheap quality features and flag likely junk.

A quick "are these topics real, or did I forget to clean my corpus?" check. For each topic it gathers :func:topica.coherence, :func:topica.exclusivity, the normalized topic-word entropy (1.0 = a perfectly flat, uninformative topic), corpus prevalence, and the fraction of its top words that are stopwords, then flags a topic as junk when any of:

stopword-soup — at least 40% of the top words are stopwords;
dead/tiny — prevalence below half its uniform share (0.5 / K);
incoherent+flat — coherence in the run's bottom quartile and topic-word entropy in its top quartile.

The thresholds are relative to the run, so the flag reads as "junk for this model". texts are the tokenized documents (used only for coherence; they need not align to doc_topic).
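The three rules above can be sketched over precomputed per-topic features. This is an illustration, not the library's code: `flag_junk` is a hypothetical helper, and using `statistics.quantiles` for the quartile cut-offs is an assumption, since the exact quartile rule is not specified.

```python
import statistics

def flag_junk(rows):
    """rows: per-topic dicts with coherence, beta_entropy, prevalence, stopword_frac."""
    k = len(rows)
    # Quartile cut-offs are relative to this run (assumed quartile method).
    coherence_q1 = statistics.quantiles([r["coherence"] for r in rows], n=4)[0]
    entropy_q3 = statistics.quantiles([r["beta_entropy"] for r in rows], n=4)[2]
    flagged = []
    for r in rows:
        reasons = []
        if r["stopword_frac"] >= 0.4:
            reasons.append("stopword-soup")
        if r["prevalence"] < 0.5 / k:
            reasons.append("dead/tiny")
        if r["coherence"] <= coherence_q1 and r["beta_entropy"] >= entropy_q3:
            reasons.append("incoherent+flat")
        flagged.append({**r, "junk": bool(reasons), "reasons": reasons})
    return flagged

# Illustrative feature values for a 4-topic model.
rows = [
    {"topic": 0, "coherence": 0.60, "beta_entropy": 0.40, "prevalence": 0.40, "stopword_frac": 0.0},
    {"topic": 1, "coherence": 0.50, "beta_entropy": 0.50, "prevalence": 0.35, "stopword_frac": 0.1},
    {"topic": 2, "coherence": 0.55, "beta_entropy": 0.45, "prevalence": 0.20, "stopword_frac": 0.5},
    {"topic": 3, "coherence": 0.10, "beta_entropy": 0.95, "prevalence": 0.05, "stopword_frac": 0.0},
]
flagged = flag_junk(rows)
for row in flagged:
    print(row["topic"], row["junk"], row["reasons"])
```

A topic can trip more than one rule, which is why the result carries a list of reasons rather than a single label.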

Returns a list of per-topic dicts (in topic order) with topic, coherence, exclusivity, beta_entropy, prevalence, stopword_frac, junk (bool), reasons (list of str), and top_words.

topica.topic_dendrogram ¶

topic_dendrogram(model, *, metric='js', method='average', n_topwords=20)

Agglomeratively merge a fitted model's topics into a multi-resolution tree.

A post-hoc, no-refit answer to "are these K topics really a handful of super-themes, and are any of them near-duplicates?". It builds a K x K topic distance and runs hierarchical clustering, returning a :class:TopicDendrogram you can :meth:~TopicDendrogram.cut at any resolution or query for :meth:~TopicDendrogram.merge_candidates. This is the flat-model counterpart to :class:~topica.HLDA (which fits a topic tree directly) and to :func:topica.ensemble (which merges across runs).

Works on any fitted model exposing topic_word and vocabulary.

Parameters:

Name	Type	Description	Default
`model`		A fitted topica model.	required
`metric`	`(js, hellinger, cosine, doctopic)`	Topic distance. `js` (Jensen-Shannon) and `hellinger` compare the full topic-word distributions; `cosine` compares top-`n_topwords` indicator sets; `doctopic` uses `1 - correlation` of the `doc_topic` columns (how often topics co-occur in documents).	`"js"`
`method`	`str`	SciPy linkage method ("average", "ward", "complete", ...).	`"average"`
`n_topwords`	`int`	Words per topic for the `cosine` metric and for leaf labels.	`20`

Returns:

Type	Description
:class:`TopicDendrogram`.

Notes

Requires SciPy (pip install 'topica[viz]' or scipy).
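The K x K distance that feeds the clustering can be sketched for the `js` metric. This is a pure-Python illustration with hypothetical helper names; it uses the square root of the base-2 Jensen-Shannon divergence, which may differ from the library's exact normalization, and it only finds the closest pair rather than running the full linkage.

```python
import math
from itertools import combinations

def js_distance(p, q):
    """Jensen-Shannon distance (sqrt of the base-2 divergence), in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)
    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negative rounding

def closest_pair(topic_word):
    """The two topics an agglomerative merge would join first."""
    pairs = combinations(range(len(topic_word)), 2)
    return min(pairs, key=lambda ij: js_distance(topic_word[ij[0]], topic_word[ij[1]]))

topic_word = [
    [0.50, 0.50, 0.00, 0.00],
    [0.45, 0.55, 0.00, 0.00],  # near-duplicate of topic 0
    [0.00, 0.00, 0.50, 0.50],  # disjoint support
]
first_merge = closest_pair(topic_word)
print(first_merge, js_distance(topic_word[0], topic_word[2]))
```

Topics with disjoint support sit at the maximum distance of 1, while near-duplicates sit close to 0 and merge first.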

topica.align_topics ¶

align_topics(a, b, *, metric='cosine', threshold=0.3, depth=50, p=0.9, word_embeddings=None) -> AlignmentResult

Match the topics of two fits one-to-one by minimal total distance (Hungarian on the cross-fit topic-word distance matrix). Use it to compare runs across seeds, across K, or train vs. resample.

a, b are fitted models or K×V topic-word arrays (same vocabulary order, or automatically intersected if .vocabulary is available). metric is "cosine", "js" (Jensen-Shannon), "rbo" (rank-biased overlap), or "emd"/"ot" (Earth Mover's Distance). Returns an AlignmentResult object that behaves as a list of (topic_a, topic_b, distance) tuples sorted by topic_a and also exposes the attributes matches, splits, merges, unaligned_a, unaligned_b, and similarity_matrix.
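One-to-one matching by minimal total distance can be illustrated on small inputs. This sketch swaps the Hungarian algorithm for a brute-force search over permutations, which gives the same optimum but is only practical for small K; `cosine_distance` and `align` are hypothetical helpers, not the library's functions.

```python
import math
from itertools import permutations

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def align(a, b):
    """Return (topic_a, topic_b, distance) tuples minimizing the total distance."""
    k = len(a)
    dist = [[cosine_distance(a[i], b[j]) for j in range(k)] for i in range(k)]
    # Brute force over all K! assignments (stand-in for Hungarian matching).
    best = min(permutations(range(k)),
               key=lambda perm: sum(dist[i][perm[i]] for i in range(k)))
    return [(i, best[i], dist[i][best[i]]) for i in range(k)]

# Two fits over the same 3-word vocabulary; the topic order is swapped in b.
a = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
b = [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]
matches = align(a, b)
for topic_a, topic_b, distance in matches:
    print(topic_a, topic_b, round(distance, 3))
```

The result is sorted by `topic_a`, mirroring the list-of-tuples shape described above.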

topica.topic_stability ¶

topic_stability(runs, *, topn=10, metric='cosine')

Term-centric stability of topics across multiple fits (Greene, O'Callaghan & Cunningham 2014): a "how robust is this K?" score.

runs is a list of fitted models or topic-word arrays over the same vocabulary (e.g. fits at different seeds, or on bootstrap resamples). Each later run's topics are matched to the first run's, and stability is the mean Jaccard overlap of their top-topn words. Returns a float in [0, 1]; higher means more reproducible topics.

topica.ensemble ¶

Ensemble topic modeling: combine several independent fits into one consensus.

A single topic-model fit is a draw from a noisy procedure — change the seed or a hyperparameter and the topics shift, sometimes a lot (Hoyle et al. 2022, "Are Neural Topic Models Broken?"). Combining several independent runs is more reliable than any one run: across Hoyle et al.'s experiments the ensemble improves on the median run in 97% of contexts and never loses to the worst. This module builds that consensus.

It is the natural follow-on to :func:~topica.select_model, which fits N runs at a fixed K. Instead of picking the best run with plot_models, ensemble combines all of them.

Three methods are available:

method="cluster" (default) reproduces Hoyle et al. §6. Pool the topics from every run (m runs of K topics each give m·K topics), measure the pairwise distance between them — a blend lambda_·D(topic-word) + (1-lambda_)·D(doc-topic) using a top-weighted rank distance (Rank-Biased Overlap, or average Jaccard) — cluster the pooled topics into K groups, and take the element-wise mean within each cluster. Clustering does not force a one-to-one match, so a topic that splits or merges across runs is handled naturally, and a cluster only a few runs contributed to is flagged as low-support.

method="align" is a lighter, fully deterministic alternative (the Miller & McCoy 2017 / Mäntylä et al. 2018 lineage): align every run's topics one-to-one to a single reference run (Hungarian matching on the topic-word distributions) and average the aligned topics. No clustering, no Θ, no λ.

method="stable" reimplements gensim's EnsembleLda (Brigl 2019). It does not fix K: it pools the topics, measures an asymmetric masked-cosine distance between them, runs Checkback DBSCAN (CBDBSCAN) to find dense, reproducible "cores", and keeps only the clusters with enough cores as stable topics (averaging their members). Unstable topics — those that do not recur densely across runs — are discarded as noise rather than averaged in, so the number of consensus topics is discovered from the data. Validated against gensim in parity/.

The result duck-types as a fitted model for the model-neutral analysis surface (it exposes topic_word, doc_topic, and vocabulary), so the consensus flows straight into :func:~topica.coherence, the diagnostics, and the rest. Each ensemble topic carries a stability score and a reliable flag, so a consensus topic the individual runs do not actually agree on is marked, not silently trusted.

EnsembleResult ¶

Consensus of several topic-model fits, returned by :func:ensemble.

Exposes topic_word, doc_topic, and vocabulary so it can be passed wherever a fitted model is accepted by the model-neutral analysis functions (:func:~topica.coherence, the diagnostics surface, :func:~topica.align_topics).

Attributes:

Name	Type	Description
`topic_word`	`np.ndarray`	`(K, V)` averaged, row-normalized topic-word matrix.
`doc_topic`	`np.ndarray \| None`	`(D, K)` averaged document-topic matrix, or `None` when the runs were not fit on the same documents in the same order.
`vocabulary`	`list \| None`	The shared vocabulary, or `None` when raw arrays were passed.
`stability`	`np.ndarray`	`(K,)` per-topic consistency in `[0, 1]`. For `"cluster"` and `"stable"` it is one minus the mean pairwise distance among the run topics that formed the cluster; for `"align"` it is the mean top-word Jaccard with the matched run topics. 1.0 means every run produced the same topic.
`support`	`np.ndarray`	`(K,)` how well-backed each topic is. For `"cluster"` and `"stable"` it is the fraction of runs that contributed a topic to the cluster (1.0 = all runs found it); for `"align"` it is the match margin over the next-best run topic. A small value means few runs really support the topic.
`reliable`	`np.ndarray`	`(K,)` bool: `stability >= 0.5` and well-supported. An unreliable topic is a consensus the individual runs do not agree on; treat it with suspicion. (`"stable"` topics are reproducible by construction, so this is usually all `True`.)
`agreement`	`float`	Scalar mean of `stability`, an overall "how reproducible is this K?" number (`nan` if `"stable"` found no topics).
`method`	`str`	`"cluster"`, `"align"`, or `"stable"`.
`cluster_sizes`	`np.ndarray \| None`	`(K,)` number of run topics in each cluster (`"cluster"` and `"stable"`; `None` for `"align"`).
`reference`	`int \| None`	Index of the reference run (`"align"` only; `None` for `"cluster"`).
`n_runs`	`int`	Number of fits combined.
`runs`	`list`	The input fits, in the order given.
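How the per-topic summaries relate to one another for the `"cluster"` method can be sketched from cluster membership alone. This is an illustration under assumptions: `summarize_clusters` is a hypothetical helper, the stability values are given rather than computed, and the `min_support` cut-off for "well-supported" is a made-up value, since the actual threshold is not stated.

```python
def summarize_clusters(cluster_runs, stability, n_runs, min_support=0.5):
    """cluster_runs[c] lists the run index of every pooled topic in cluster c."""
    # Fraction of distinct runs that contributed at least one topic to the cluster.
    support = [len(set(runs)) / n_runs for runs in cluster_runs]
    cluster_sizes = [len(runs) for runs in cluster_runs]
    # min_support is a hypothetical cut-off for "well-supported".
    reliable = [s >= 0.5 and sup >= min_support
                for s, sup in zip(stability, support)]
    agreement = sum(stability) / len(stability) if stability else float("nan")
    return {"support": support, "cluster_sizes": cluster_sizes,
            "reliable": reliable, "agreement": agreement}

# Four runs pooled into three clusters (illustrative values).
cluster_runs = [[0, 1, 2, 3], [0, 1], [2]]
stability = [0.9, 0.6, 0.3]
summary = summarize_clusters(cluster_runs, stability, n_runs=4)
print(summary["support"], summary["reliable"], round(summary["agreement"], 3))
```

The third cluster is the kind of consensus topic the `reliable` flag is meant to mark: only one of four runs contributed to it and its stability is low.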

top_words ¶

top_words(n=10)

Top-n (term, probability) pairs per ensemble topic, matching the fitted-model contract so the result drops into the analysis surface. The term is a word when a vocabulary is known, else the integer term index.

ensemble ¶

ensemble(runs, *, method='cluster', num_topics=None, lambda_=0.5, distance='rbo', topn=10, reference='medoid', metric='cosine', weights=None, eps=0.1, min_samples=None, min_cores=None, masking='mass', masking_threshold=None)

Combine several topic-model fits into one consensus model.

The consensus is typically more reliable than a single run: it beats the median run and rarely loses to the best (Hoyle et al. 2022). This is the natural follow-on to `topica.select_model`: fit N runs, then combine them here instead of picking one.

Parameters:

Name	Description	Default
`runs`	A list of fitted models (or `(K, V)` topic-word arrays sharing a vocabulary), or a `topica.validation.SelectModelResult`. All runs must share the same K and vocabulary.	required
`method`	`"cluster"` (default) reproduces Hoyle et al. §6: pool the topics from all runs, cluster them, and average within each cluster. `"align"` matches every run's topics to one reference run and averages the aligned topics (simpler, deterministic, no document-topic distance). `"stable"` is configured by the gensim `EnsembleLda` knobs listed below (`eps`, `min_samples`, `min_cores`, `masking`, `masking_threshold`).	`'cluster'`
`num_topics`	Number of consensus topics for `"cluster"` (default: the runs' K). Ignored by `"align"`, which always returns K.	`None`
`lambda_`	`"cluster"` only. Weight on the topic-word distance when pooling topics; `1 - lambda_` weights the document-topic distance. Falls back to `1.0` when the runs were not fit on the same documents.	`0.5`
`distance`	`"cluster"` only. The top-weighted rank distance between topics: `"rbo"` (Rank-Biased Overlap, default) or `"jaccard"` (average Jaccard).	`'rbo'`
`topn`	Top-word (and top-document) count for the distances and diagnostics.	`10`
`reference`	`"align"` only. Which run anchors the matching: `"medoid"` (default) picks the run that aligns most cheaply to all others; `"first"` uses run 0; an int uses that run.	`'medoid'`
`metric`	`"align"` only. Topic-word distance for the matching: `"cosine"` (default) or `"js"`.	`'cosine'`
`weights`	Optional per-run weights (length `n_runs`) for a weighted average, e.g. to down-weight low-coherence runs. `None` (default) weights equally. Used by `"cluster"` and `"align"`.	`None`
`eps`	`"stable"` only (gensim `EnsembleLda` knob). The CBDBSCAN neighbor radius. A larger `eps` yields more (looser) stable topics.	`0.1`
`min_samples`	`"stable"` only (gensim `EnsembleLda` knob). The neighbors needed to be a core. `None` resolves to `int(n_runs/2)`.	`None`
`min_cores`	`"stable"` only (gensim `EnsembleLda` knob). The cores a cluster needs to count as a stable topic. `None` resolves to `min(3, n_runs//4 + 1)`. A smaller `min_cores` yields more (looser) stable topics.	`None`
`masking`	`"stable"` only (gensim `EnsembleLda` knob). `"mass"` (default) or `"rank"`.	`'mass'`
`masking_threshold`	`"stable"` only (gensim `EnsembleLda` knob). The cutoff used by `masking`. `None` uses the gensim defaults (0.95 / 0.11).	`None`
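
The `"jaccard"` option is described above only as "average Jaccard". As a rough illustration of what a top-weighted rank distance of that kind looks like, the sketch below averages the Jaccard similarity of the top-d prefixes for d = 1..`topn` and subtracts it from one. The function names are hypothetical and the exact formula topica uses is an assumption; this is not the library's implementation.

```python
def average_jaccard(ranked_a, ranked_b, topn=10):
    """Mean Jaccard similarity of the top-d prefixes, d = 1..topn.

    Early ranks appear in every prefix, so disagreements near the top
    of the lists cost more than disagreements near the bottom.
    """
    depth = min(topn, len(ranked_a), len(ranked_b))
    if depth == 0:
        raise ValueError("both rankings need at least one term")
    total = 0.0
    for d in range(1, depth + 1):
        a, b = set(ranked_a[:d]), set(ranked_b[:d])
        total += len(a & b) / len(a | b)
    return total / depth


def average_jaccard_distance(ranked_a, ranked_b, topn=10):
    """Distance in [0, 1]: 0 for identical rankings, 1 for disjoint ones."""
    return 1.0 - average_jaccard(ranked_a, ranked_b, topn)


# Illustrative top-word lists (made up for this example).
topic_a = ["tax", "budget", "vote", "senate"]
topic_b = ["budget", "tax", "vote", "senate"]  # top two words swapped

d_same = average_jaccard_distance(topic_a, topic_a)
d_swap = average_jaccard_distance(topic_a, topic_b)
print(d_same, d_swap)
```

Only the depth-1 prefix differs between the two lists, so the swapped pair sits at a small but non-zero distance, which is the behavior one wants from a rank-aware comparison of top words.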

Returns:

Type	Description
`EnsembleResult`	Exposes `topic_word`, `doc_topic`, and `vocabulary`, so it passes straight into `topica.coherence`, the diagnostics, and other model-neutral analyses. Per-topic `stability` and `reliable` flags mark consensus topics the individual runs do not agree on.
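
To make the `"align"` idea concrete (match every run's topics to one reference run, then average the matched topics), here is a small standard-library sketch. It is an illustration under stated assumptions, not topica's code: the helper names are hypothetical, the matching is a brute-force search over permutations that is only practical for a handful of topics, and a real implementation would more likely use an assignment solver.

```python
from itertools import permutations
from math import sqrt


def cosine_distance(u, v):
    """1 - cosine similarity between two topic-word rows."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def align_to_reference(reference, run):
    """Return perm such that run[perm[k]] is the match for reference[k].

    Brute force over all K! permutations: fine for a demo, not for large K.
    """
    k = len(reference)
    best = min(
        permutations(range(k)),
        key=lambda perm: sum(
            cosine_distance(reference[i], run[perm[i]]) for i in range(k)
        ),
    )
    return list(best)


def align_and_average(runs, reference=0):
    """Average the aligned topic-word rows and renormalize each row."""
    ref = runs[reference]
    k, v = len(ref), len(ref[0])
    sums = [[0.0] * v for _ in range(k)]
    for run in runs:
        perm = align_to_reference(ref, run)
        for i in range(k):
            for j in range(v):
                sums[i][j] += run[perm[i]][j]
    return [[x / sum(row) for x in row] for row in sums]


# Two toy (K=2, V=3) fits whose topic labels are switched.
run_a = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
run_b = [[0.2, 0.1, 0.7], [0.6, 0.3, 0.1]]

perm_b = align_to_reference(run_a, run_b)
consensus = align_and_average([run_a, run_b])
print(perm_b, consensus)
```

The alignment step undoes the label switch before averaging; averaging the rows without it would blur two different topics into one.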

cross_ensemble ¶

cross_ensemble(models, texts=None, *, method='cluster', num_topics=None, lambda_=0.5, distance='rbo', topn=10, weights=None) -> EnsembleResult

Combine several topic-model fits from different architectures into one consensus.

Unlike `ensemble`, `cross_ensemble` can combine models with different architectures (e.g. classical parametric models such as LDA/STM and neural embedding models such as BERTopic). It automatically intersects/aligns their vocabularies if they expose a `.vocabulary` attribute.

Parameters:

Name	Description	Default
`models`	List of fitted model instances.	required
`texts`	Optional text corpus / tokenized documents, used for validating document counts.	`None`
`method`	`"cluster"` (default) pools and clusters the topics.	`'cluster'`
`num_topics`	Number of consensus topics (default: median K of the input models).	`None`
`lambda_`	Weight on topic-word distance vs document-topic distance.	`0.5`
`distance`	Distance metric for clustering (`"rbo"` or `"jaccard"`).	`'rbo'`
`topn`	Number of top words to use for distance calculation.	`10`
`weights`	Optional per-model weights.	`None`

Returns:

Type	Description
`EnsembleResult`	The consensus of the input models.

topica.EnsembleResult ¶

Consensus of several topic-model fits, returned by `ensemble`.

Exposes `topic_word`, `doc_topic`, and `vocabulary` so it can be passed wherever a fitted model is accepted by the model-neutral analysis functions (`topica.coherence`, the diagnostics surface, `topica.align_topics`).

Attributes:

Name	Description
`topic_word`	`(K, V)` averaged, row-normalized topic-word matrix.
`doc_topic`	`(D, K)` averaged document-topic matrix, or `None` when the runs were not fit on the same documents in the same order.
`vocabulary`	The shared vocabulary, or `None` when raw arrays were passed.
`stability`	`(K,)` per-topic consistency in `[0, 1]`. For `"cluster"` and `"stable"` it is one minus the mean pairwise distance among the run topics that formed the cluster; for `"align"` it is the mean top-word Jaccard with the matched run topics. 1.0 means every run produced the same topic.
`support`	`(K,)` how well-backed each topic is. For `"cluster"` and `"stable"` it is the fraction of runs that contributed a topic to the cluster (1.0 = all runs found it); for `"align"` it is the match margin over the next-best run topic. A small value means few runs really support the topic.
`reliable`	`(K,)` bool: `stability >= 0.5` and well-supported. An unreliable topic is a consensus the individual runs do not agree on; treat it with suspicion. (`"stable"` topics are reproducible by construction, so this is usually all `True`.)
`agreement`	Scalar mean of `stability`: an overall "how reproducible is this K?" number (`nan` if `"stable"` found no topics).
`method`	`"cluster"`, `"align"`, or `"stable"`.
`cluster_sizes`	`(K,)` number of run topics in each cluster (`"cluster"` and `"stable"`; `None` for `"align"`).
`reference`	Index of the reference run (`"align"` only; `None` for `"cluster"`).
`n_runs`	Number of fits combined.
`runs`	The input fits, in the order given.
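
The relationship between `stability`, `support`, `reliable`, and `agreement` can be sketched in a few lines. The `stability >= 0.5` cutoff comes from the description above; the `min_support` cutoff standing in for "well-supported" is a hypothetical value chosen for illustration, as is the function name.

```python
import math


def summarize_consensus(stability, support, min_stability=0.5, min_support=0.5):
    """Derive per-topic reliable flags and the scalar agreement.

    min_support is an assumed threshold; the source only says a reliable
    topic must also be "well-supported" without giving a number.
    """
    reliable = [
        s >= min_stability and p >= min_support
        for s, p in zip(stability, support)
    ]
    # agreement is the mean stability; nan when there are no topics at all.
    agreement = sum(stability) / len(stability) if stability else math.nan
    return reliable, agreement


# Made-up per-topic values for three consensus topics.
stability = [0.9, 0.4, 0.7]
support = [1.0, 0.8, 0.2]

reliable, agreement = summarize_consensus(stability, support)
print(reliable, round(agreement, 3))
```

Topic 1 fails on stability and topic 2 fails on support, so only topic 0 would be trusted here, even though the overall agreement looks moderate.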

top_words ¶

top_words(n=10)

Top-n (term, probability) pairs per ensemble topic, matching the fitted-model contract so the result drops into the analysis surface. The term is a word when a vocabulary is known, else the integer term index.

MCMC convergence (`topica.mcmc`)¶

Single-chain autocorrelation and effective sample size for the collapsed-Gibbs models, computed from the retained log-likelihood trace and theta_draws. See the convergence section of the diagnostics guide.

topica.mcmc_diagnostics ¶

mcmc_diagnostics(model, *, warn: bool = True) -> McmcDiagnostics

Single-chain MCMC diagnostics from a fitted Gibbs model's retained traces.

Reads the model's log-likelihood history and thinned theta_draws and reports the autocorrelation and effective sample size of each -- the honest "has the chain mixed?" companion to the convergence_tol plateau check.

Parameters:

Name	Type	Description	Default
`model`	`a fitted topica model`	Must expose `theta_draws` (fit with `keep_theta_draws=True`, the default). The log-likelihood diagnostics also need a non-empty `log_likelihood_history` / `fit_history`.	required
`warn`	`bool`	Warn when the model is not a Gibbs sampler. The variational models (STM, CTM, ...) converge a bound and have no MCMC chain; these diagnostics do not apply to them.	`True`

Returns:

Type	Description
`McmcDiagnostics`

Raises:

Type	Description
`ValueError`	If the model retained no `theta_draws`.

topica.McmcDiagnostics ¶

Single-chain MCMC diagnostics for a fitted Gibbs model.

Attributes:

Name	Type	Description
`model`	`str`	The model class name.
`inference`	`str or None`	The model's inference engine from the registry (`"gibbs"` for the samplers these diagnostics are meant for).
`n_draws`	`int`	Number of retained `theta_draws`.
`loglik_autocorr`	`ndarray or None`	Autocorrelation of the log-likelihood trace, or `None` when the model recorded no trace (e.g. the WarpLDA / CVB0 sampler paths).
`loglik_tau`	`float or None`	Integrated autocorrelation time of the log-likelihood trace.
`loglik_ess`	`float or None`	Effective sample size of the log-likelihood trace (`len(trace) / tau`).
`theta_ess`	`ndarray`	Per-element effective sample size of `theta_draws`, shaped `(num_docs, num_topics)`.

summary ¶

summary() -> str

A short human-readable table of the diagnostics.

topica.effective_sample_size ¶

effective_sample_size(chain)

Effective sample size ESS = N / tau of a chain.

Parameters:

Name	Type	Description	Default
`chain`	`1-D or 2-D array`	A single scalar chain of length `N`, or an `(N, params)` matrix of `params` independent chains sharing the `N` draws (e.g. `theta_draws` flattened over docs and topics).	required

Returns:

Type	Description
`float or ndarray`	A scalar for a 1-D chain, or a `(params,)` array of per-column ESS for a 2-D chain. A degenerate (constant) column yields `0.0`.

topica.integrated_autocorr_time ¶

integrated_autocorr_time(x) -> float

Integrated autocorrelation time tau = 1 + 2 * sum_{t>=1} rho_t.

Uses Geyer's (1992) initial-positive-sequence estimator: the pair sums Gamma_m = rho_{2m} + rho_{2m+1} are summed until the first non-positive pair, which truncates the noisy tail of the empirical autocorrelation. Floored at 1 (an independent chain), so ESS never exceeds N.

A degenerate (constant) chain returns inf -- there is no information to resample from, so its effective sample size is zero.
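
The estimator described above can be written out directly. Because Gamma_0 = rho_0 + rho_1 already contains rho_0 = 1, summing the positive pairs gives tau = 2 * sum(Gamma_m) - 1, which is the same quantity as 1 + 2 * sum_{t>=1} rho_t truncated at the first non-positive pair. The sketch below is a plain-Python illustration with hypothetical function names and an assumed (biased, lag-0 normalized) autocorrelation estimate; it is not the library's code.

```python
import math


def tau_sketch(x):
    """Integrated autocorrelation time via Geyer's initial positive sequence."""
    n = len(x)
    if n == 0:
        raise ValueError("empty chain")
    mean = sum(x) / n
    centered = [v - mean for v in x]
    var = sum(c * c for c in centered)
    if var == 0.0:
        return math.inf  # constant chain: no information to resample from
    # Empirical autocorrelation at every lag (O(n^2); fine for a sketch).
    rho = [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / var
        for lag in range(n)
    ]
    total, m = 0.0, 0
    while 2 * m + 1 < n:
        pair = rho[2 * m] + rho[2 * m + 1]  # Gamma_m
        if pair <= 0.0:
            break  # first non-positive pair truncates the noisy tail
        total += pair
        m += 1
    return max(2.0 * total - 1.0, 1.0)  # floored at 1, so ESS <= N


def ess_sketch(x):
    """Effective sample size N / tau; a constant chain gives 0.0."""
    return len(x) / tau_sketch(x)


tau_trend = tau_sketch([1.0, 2.0, 3.0, 4.0])
ess_alternating = ess_sketch([1.0, -1.0] * 4)
ess_constant = ess_sketch([2.0, 2.0, 2.0])
print(tau_trend, ess_alternating, ess_constant)
```

The alternating chain is anti-correlated, so its raw tau would fall below 1; the floor keeps its ESS at N rather than letting it exceed the number of draws.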

topica.autocorrelation ¶

autocorrelation(x, max_lag: int | None = None) -> np.ndarray

Autocorrelation function of a 1-D trace at lags 0..max_lag.

Parameters:

Name	Type	Description	Default
`x`	`1-D sequence`	The trace (e.g. a log-likelihood history or a single scalar chain).	required
`max_lag`	`int`	Highest lag to return. Defaults to `len(x) - 1`. Lag 0 is always 1.	`None`

Returns:

Type	Description
`ndarray`	`rho[0..max_lag]` with `rho[0] == 1`. A constant trace (zero variance) returns all zeros except `rho[0]`.
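
As a minimal sketch of the contract above (lags `0..max_lag`, `rho[0] == 1`, zeros for a constant trace), the following standard-library function computes the empirical autocorrelation of a short trace. The name `acf_sketch` is hypothetical, and normalizing every lag by the lag-0 sum of squares is an assumption about the estimator rather than something the source states.

```python
def acf_sketch(x, max_lag=None):
    """Empirical autocorrelation of a 1-D trace at lags 0..max_lag."""
    n = len(x)
    if n == 0:
        raise ValueError("empty trace")
    if max_lag is None:
        max_lag = n - 1
    mean = sum(x) / n
    centered = [v - mean for v in x]
    var = sum(c * c for c in centered)  # lag-0 sum of squares
    rho = [1.0]  # lag 0 is always 1
    for lag in range(1, max_lag + 1):
        if var == 0.0:
            rho.append(0.0)  # constant trace: no correlation structure
        else:
            cov = sum(centered[t] * centered[t + lag] for t in range(n - lag))
            rho.append(cov / var)
    return rho


rho_trend = acf_sketch([1.0, 2.0, 3.0, 4.0])
rho_const = acf_sketch([2.0, 2.0, 2.0])
print(rho_trend, rho_const)
```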

Multi-chain Gelman-Rubin R-hat and cross-chain ESS, from several fits of the same model at different seeds. See the R-hat section of the diagnostics guide.

topica.multichain_diagnostics ¶

multichain_diagnostics(chains, *, warmup: float = 0.5, metric: str = 'cosine', reference: int = 0, warn: bool = True) -> MultiChainDiagnostics

Gelman-Rubin diagnostics across several fitted Gibbs models.

Fit the same model at several seeds on the same corpus, pass the fitted models here, and this reports whether the chains agree. Two views are computed: R-hat and cross-chain ESS on the permutation-invariant log-likelihood trace, and per-topic R-hat on each topic's prevalence after the topics are aligned across chains (topic indices are label-switched across seeds, so alignment comes first).

Parameters:

Name	Type	Description	Default
`chains`	`sequence of fitted topica models`	At least two fits of the same model class on the same corpus at different seeds. The topic-level statistics need `theta_draws` on every chain (fit with `keep_theta_draws=True`, the default).	required
`warmup`	`float`	Fraction of each log-likelihood trace to discard from the front as burn-in before computing R-hat. The `theta_draws` are already the last post-warmup thinned samples, so the warmup fraction applies to the log-likelihood trace only.	`0.5`
`metric`	`str`	Topic-word distance metric for the cross-chain alignment (passed to `topica.align_topics`).	`"cosine"`
`reference`	`int`	Index of the chain whose topic order the others are aligned to.	`0`
`warn`	`bool`	Warn when the chains are not Gibbs samplers, disagree on class, or lack the traces a statistic needs.	`True`

Returns:

Type	Description
`MultiChainDiagnostics`

Raises:

Type	Description
`ValueError`	If fewer than two chains are supplied.
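
For readers unfamiliar with R-hat, the sketch below computes the classic Gelman-Rubin statistic on a set of scalar traces, discarding a `warmup` fraction from the front of each trace as described for the log-likelihood above. It is an illustration only: the function name is hypothetical, and whether topica uses this classic form or a split/rank-normalized variant is not stated in the source.

```python
from math import sqrt
from statistics import mean, variance


def rhat_sketch(traces, warmup=0.5):
    """Classic Gelman-Rubin R-hat across several equal-length scalar traces."""
    if len(traces) < 2:
        raise ValueError("need at least two chains")
    # Drop the leading warmup fraction of every trace as burn-in.
    kept = [list(t)[int(len(t) * warmup):] for t in traces]
    n = len(kept[0])
    if n < 2 or any(len(t) != n for t in kept):
        raise ValueError("chains need equal post-warmup length of at least 2")
    within = mean([variance(t) for t in kept])       # W: mean within-chain variance
    between = n * variance([mean(t) for t in kept])  # B: scaled variance of chain means
    if within == 0.0:
        return float("nan")  # all chains constant; the ratio is undefined
    pooled = (n - 1) / n * within + between / n
    return sqrt(pooled / within)


agree = rhat_sketch([[1, 2, 3, 4], [1, 2, 3, 4]], warmup=0.0)
disagree = rhat_sketch([[1, 2, 3, 4], [11, 12, 13, 14]], warmup=0.0)
print(agree, disagree)
```

Chains that explore the same region give a value near 1, while chains stuck in different regions inflate the between-chain term and push R-hat well above 1.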

topica.MultiChainDiagnostics ¶

Multi-chain (Gelman-Rubin) diagnostics for a set of fitted Gibbs models.

Attributes:

Name	Type	Description
`model`	`str`	The model class name (all chains must share it).
`inference`	`str or None`	The model's inference engine from the registry.
`n_chains`	`int`	Number of chains compared.
`n_draws`	`int`	Retained `theta_draws` per chain used for the topic-level statistics.
`loglik_rhat`	`float or None`	R-hat of the (post-warmup) log-likelihood trace across chains -- the permutation-invariant "did the chains agree?" headline. `None` when the chains recorded no usable log-likelihood trace.
`loglik_ess`	`float or None`	Cross-chain effective sample size of the log-likelihood trace.
`loglik_n`	`int or None`	Post-warmup trace length per chain used for the log-likelihood statistics.
`topic_rhat`	`ndarray or None`	Per-topic R-hat of each aligned topic's per-draw prevalence, shape `(num_topics,)`. `None` when the chains retained no `theta_draws`.
`topic_ess`	`ndarray or None`	Per-topic cross-chain ESS of the aligned topic prevalence.
`topic_alignment`	`ndarray or None`	Per-topic alignment quality: the minimum top-word Jaccard of the topic to its reference-chain match across the other chains. Low values flag topics whose R-hat compares topics that did not line up.
`reference`	`int or None`	Index of the chain used as the alignment reference.

annotations `class-attribute` ¶

__annotations__ = {'model': 'str', 'inference': 'str | None', 'n_chains': 'int', 'n_draws': 'int', 'loglik_rhat': 'float | None', 'loglik_ess': 'float | None', 'loglik_n': 'int | None', 'topic_rhat': 'np.ndarray | None', 'topic_ess': 'np.ndarray | None', 'topic_alignment': 'np.ndarray | None', 'reference': 'int | None'}

Multi-chain (Gelman-Rubin) diagnostics for a set of fitted Gibbs models.

Attributes:

Name	Type	Description
`model`	`str`	The model class name (all chains must share it).
`inference`	`str` or `None`	The model's inference engine from the registry.
`n_chains`	`int`	Number of chains compared.
`n_draws`	`int`	Retained `theta_draws` per chain used for the topic-level statistics.
`loglik_rhat`	`float` or `None`	R-hat of the (post-warmup) log-likelihood trace across chains: the permutation-invariant "did the chains agree?" headline. `None` when the chains recorded no usable log-likelihood trace.
`loglik_ess`	`float` or `None`	Cross-chain effective sample size of the log-likelihood trace.
`loglik_n`	`int` or `None`	Post-warmup trace length per chain used for the log-likelihood statistics.
`topic_rhat`	`numpy.ndarray` or `None`	Per-topic R-hat of each aligned topic's per-draw prevalence, shape `(num_topics,)`. `None` when the chains retained no `theta_draws`.
`topic_ess`	`numpy.ndarray` or `None`	Per-topic cross-chain ESS of the aligned topic prevalence.
`topic_alignment`	`numpy.ndarray` or `None`	Per-topic alignment quality: the minimum top-word Jaccard of the topic to its reference-chain match across the other chains. Low values flag topics whose R-hat compares topics that did not line up.
`reference`	`int` or `None`	Index of the chain used as the alignment reference.


converged `property` ¶

converged

Whether every reported R-hat clears the conventional 1.01 threshold.

summary ¶

summary() -> str

A short human-readable table of the multi-chain diagnostics.

topica.rhat ¶

rhat(chains, *, split: bool = True, rank_normalize: bool = True) -> float

Gelman-Rubin R-hat (potential scale reduction) across MCMC chains.

Compares the variance between chains to the variance within each chain. At convergence the chains are draws from one distribution and R-hat -> 1; a value above roughly 1.01 means the chains have not mixed to a common target and the run needs more sweeps (Vehtari et al. 2021).

Parameters:

Name	Type	Description	Default
`chains`	`2-D array or sequence of 1-D chains`	`(n_chains, n_draws)`, or a list of per-chain traces. Unequal-length chains are truncated to the shortest. At least two chains are required.	required
`split`	`bool`	Split each chain in half before comparing (split-R-hat), so a single chain that has not stopped drifting is caught as two disagreeing halves.	`True`
`rank_normalize`	`bool`	Rank-normalize the pooled draws first (the improved, tail-robust R-hat). Set `False` for the classical Gelman-Rubin statistic on the raw scale.	`True`

Returns:

Type	Description
`float`	The R-hat statistic. `inf` if the chains are constant but disagree; `1.0` if they share a single constant value.
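
To make the between-chain versus within-chain comparison concrete, here is a minimal NumPy sketch of the classical split R-hat on the raw scale (roughly the `rank_normalize=False` case). It is an illustration under stated assumptions, not topica's implementation: the name `split_rhat` is hypothetical, chains are assumed to be equal length, and rank normalization is omitted.

```python
import numpy as np

def split_rhat(chains):
    """Classical split R-hat on the raw scale (hypothetical helper, no rank normalization)."""
    chains = np.asarray(chains, dtype=float)        # shape (n_chains, n_draws)
    half = chains.shape[1] // 2
    # Split each chain in half, giving 2 * n_chains sequences of length `half`.
    parts = np.concatenate([chains[:, :half], chains[:, half:2 * half]], axis=0)
    n = parts.shape[1]
    within = parts.var(axis=1, ddof=1).mean()       # W: mean within-sequence variance
    between = n * parts.mean(axis=1).var(ddof=1)    # B: n times variance of sequence means
    if within == 0.0:
        # Constant sequences: agree on one value -> 1.0, disagree -> inf.
        return 1.0 if between == 0.0 else float("inf")
    var_plus = (n - 1) / n * within + between / n   # pooled variance estimate
    return float(np.sqrt(var_plus / within))

rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))                  # four chains from one distribution
stuck = mixed + np.arange(4)[:, None]               # same draws, shifted per chain
r_mixed = split_rhat(mixed)                         # close to 1
r_stuck = split_rhat(stuck)                         # well above the 1.01 threshold
```

The shifted chains have the same within-chain variance as the mixed ones, so only the between-chain term grows, which is exactly what pushes the statistic above 1.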

Held-out likelihood¶

Build a within-corpus word-heldout set — the analogue of R stm's make.heldout — and score it under a fitted model to get document-completion log-likelihood.

topica.make_heldout ¶

make_heldout(corpus, *, prop_docs=0.5, prop_words=0.5, seed=0)

Build a within-corpus word-heldout set (R stm's make.heldout).

We sample floor(prop_docs * D) documents and remove floor(prop_words * len(doc)) randomly chosen token positions from each. The remaining tokens stay in the corpus; the removed tokens form the heldout set. Fit a model on .documents and score it with eval_heldout.

Documents too short to split (fewer than 2 tokens, or those for which the split would leave 0 retained or 0 held-out tokens) are silently skipped rather than raising an error; the sampled set may therefore be slightly smaller than floor(prop_docs * D).

Parameters:

Name	Description	Default
`corpus`	A `Corpus` (its `.documents()` method is called), a list of raw strings (split on whitespace), or a list of token lists.	required
`prop_docs`	Fraction of documents to sample.	`0.5`
`prop_words`	Fraction of tokens to hold out per sampled document.	`0.5`
`seed`	NumPy Generator seed for reproducibility.	`0`

Returns:

Type	Description
`Heldout`	A dataclass. Pass `.documents` to `model.fit` and the whole object to `eval_heldout`.
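
The sampling procedure described above can be sketched in a few lines. This is an illustrative reimplementation on plain token lists, not the library's code: `split_heldout` is a hypothetical helper, and it returns a plain `(kept, held)` pair rather than the `Heldout` dataclass.

```python
import math
import numpy as np

def split_heldout(docs, prop_docs=0.5, prop_words=0.5, seed=0):
    """Hold out a share of token positions from a sample of documents (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    kept = [list(d) for d in docs]                  # copy so the input is not mutated
    n_sample = math.floor(prop_docs * len(kept))
    sampled = rng.choice(len(kept), size=n_sample, replace=False)
    held = {}
    for i in sorted(int(j) for j in sampled):
        n_out = math.floor(prop_words * len(kept[i]))
        # Skip documents that cannot be split into non-empty kept and held-out parts.
        if len(kept[i]) < 2 or n_out == 0 or n_out == len(kept[i]):
            continue
        out = set(rng.choice(len(kept[i]), size=n_out, replace=False).tolist())
        held[i] = [tok for pos, tok in enumerate(kept[i]) if pos in out]
        kept[i] = [tok for pos, tok in enumerate(kept[i]) if pos not in out]
    return kept, held

docs = [["a", "b", "c", "d"], ["e"], ["f", "g"], ["h", "i", "j", "k", "l", "m"]]
kept, held = split_heldout(docs, prop_docs=1.0, prop_words=0.5)
```

With `prop_docs=1.0` every document is sampled, but the one-token document is skipped, so the held-out set covers three documents instead of four.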

topica.eval_heldout ¶

eval_heldout(model, heldout, *, seed=0)

Score held-out words from make_heldout under a fitted model (R stm's eval.heldout).

We infer each sampled document's topic mixture from its retained tokens (heldout.documents[doc_index]) via the model's transform, then score the withheld tokens under p(w) = sum_k theta_k * phi[k, w].

Requires that model was fit on heldout.documents (the training corpus returned by make_heldout). Works for any generative model that exposes transform and topic_word: LDA, DMR, CTM, STM, HDP, LabeledLDA, and SupervisedLDA. The keyword/anchored Gibbs models (keyATM, SeededLDA, SAGE, PA, PT) do not expose transform and so fall outside this diagnostic, and the embedding-cluster models (BERTopic, Top2Vec) define no document likelihood; both raise a clear error.

Parameters:

Name	Description	Default
`model`	A fitted generative model (must have been fit on `heldout.documents`).	required
`heldout`	A `Heldout` returned by `make_heldout`.	required
`seed`	RNG seed for the Gibbs `transform` (variational models ignore it).	`0`

Returns:

Type	Description
`HeldoutResult`	A dataclass. The headline metric is `.mean_per_doc_loglik`; higher (less negative) is better.
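
The scoring rule p(w) = sum_k theta_k * phi[k, w] is a single matrix-vector product. The sketch below works through it on toy arrays; `heldout_loglik` is a hypothetical helper, and the `theta` and `phi` values are made up for illustration rather than produced by a topica model.

```python
import numpy as np

def heldout_loglik(theta, phi, heldout_ids):
    """Sum of log p(w) over held-out word ids, with p(w) = sum_k theta[k] * phi[k, w]."""
    theta = np.asarray(theta, dtype=float)          # (K,) document-topic mixture
    phi = np.asarray(phi, dtype=float)              # (K, V) topic-word probabilities
    word_probs = theta @ phi                        # (V,) mixture probability per word
    return float(np.log(word_probs[np.asarray(heldout_ids)]).sum())

phi = np.array([[0.7, 0.2, 0.1],                    # topic 0 favours word 0
                [0.1, 0.2, 0.7]])                   # topic 1 favours word 2
even = heldout_loglik([0.5, 0.5], phi, [0, 0])      # p(word 0) = 0.4
leaning = heldout_loglik([0.9, 0.1], phi, [0, 0])   # p(word 0) = 0.64
```

A mixture that leans toward the topic generating the held-out words gets the higher (less negative) score, which is why better-fitting topic mixtures raise the per-document log-likelihood.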

Estimator conformance¶

Check any fitted model or model class against the topica estimator contract; returns a list of violation strings (empty means fully conformant).

topica.check_conformance ¶

check_conformance(model_or_class) -> list[str]

Check model_or_class against the topica estimator contract.

Returns a list of violation strings. An empty list means the model satisfies every applicable tier requirement (or has a valid exemption). Does NOT look up KNOWN_GAPS or EXEMPT; it reports all raw violations so the conformance test can categorize them. Call this from the test or from your own CI after adding a new estimator.

Parameters:

Name	Type	Description	Default
`model_or_class`	`an estimator instance or class.`		required

Returns:

Type	Description
`list of str`	Each entry is a human-readable description of the violation, e.g. `"missing class attribute: topic_names"`.

Reporting¶

Model-neutral summaries that work on any fitted model.

topica.plot_report ¶

plot_report(model, *, texts=None, timestamps=None, groups=None, n=8, coherence_type='c_v', title=None, figsize=None)

A one-figure overview of a fitted model, composed from topica's diagnostics.

Panels are adaptive: each is drawn only when its inputs and the model support it, so the report works across every model. Always included is the topic prevalence bar (mean doc_topic per topic, labelled with each topic's top words). Added when available:

topic quality — coherence vs exclusivity (the stm quality frontier); a windowed coherence_type is used when texts is given (raw strings or token lists are both accepted), else UMass;
topic correlation — the doc_topic correlation heatmap (K in 2..40);
topics over time — mean prevalence per distinct timestamps value;
topics per class — mean prevalence within each level of groups.

Returns a matplotlib Figure; save it with fig.savefig("report.png") or .pdf. Requires matplotlib (the only added dependency).

topica.topic_info ¶

topic_info(model, texts=None, *, n=8, labels=None) -> list

One summary row per topic — the headline table for a fitted model.

Each row is a dict with topic (id), label, size (hard assignments), prevalence (mean of the topic's doc_topic column), and top_words (the top-n words, via model.top_words when available else the raw topic-word row). When texts is given each row also carries representative_docs, its n highest-loading documents. On a clustering model with outliers a final topic=-1 row reports the outlier count and carries no words. Rows are sorted by topic id.

labels overrides the labels for this table only; otherwise topic_labels (custom labels over topic_names) is used.
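
As a rough illustration of how the `size` and `prevalence` columns relate to `doc_topic`, the sketch below computes both from a toy matrix. It assumes a "hard assignment" means each document's highest-loading topic (argmax); the library may define it differently, for example on clustering models with outliers.

```python
import numpy as np

# Toy doc_topic matrix: 4 documents x 3 topics, each row sums to one.
doc_topic = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
])

n_topics = doc_topic.shape[1]
hard = doc_topic.argmax(axis=1)                     # assumed hard assignment per document
size = np.bincount(hard, minlength=n_topics)        # documents assigned to each topic
prevalence = doc_topic.mean(axis=0)                 # mean of each doc_topic column

rows = [
    {"topic": k, "size": int(size[k]), "prevalence": float(prevalence[k])}
    for k in range(n_topics)
]
```

Topics 0 and 1 end up with the same prevalence but different sizes, which shows why the table reports both numbers.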

topica.topics_over_time ¶

topics_over_time(model, timestamps, *, normalize=True) -> dict

Mean topic prevalence at each distinct timestamp value.

timestamps is one value per document. For each distinct timestamp we average doc_topic over the documents stamped with it, giving a topic prevalence trajectory you can plot directly. With normalize=True each row is rescaled to sum to one (so it reads as a topic share at that time).

Returns {"labels": [sorted distinct timestamps], "prevalence": (T, K) array}.
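
The per-timestamp averaging can be sketched directly on a `doc_topic` array. `prevalence_by_timestamp` is a hypothetical stand-in written for illustration; it mirrors the documented return shape but is not the library's implementation, and it assumes every row has positive mass when normalizing.

```python
import numpy as np

def prevalence_by_timestamp(doc_topic, timestamps, normalize=True):
    """Average doc_topic rows within each distinct timestamp (hypothetical helper)."""
    doc_topic = np.asarray(doc_topic, dtype=float)  # (D, K)
    stamps = np.asarray(timestamps)                 # (D,) one value per document
    labels = sorted(set(timestamps))
    rows = np.vstack([doc_topic[stamps == t].mean(axis=0) for t in labels])
    if normalize:
        # Rescale each row to sum to one so it reads as a topic share.
        rows = rows / rows.sum(axis=1, keepdims=True)
    return {"labels": labels, "prevalence": rows}

doc_topic = [[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.1, 0.9]]
result = prevalence_by_timestamp(doc_topic, [2020, 2020, 2021, 2021])
```

Each row of `result["prevalence"]` lines up with the matching entry in `result["labels"]`, so the array can be plotted as one line per topic.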

topica.topics_per_class ¶

topics_per_class(model, groups, *, ci=0.95)

Mean topic prevalence within each level of a grouping variable.

A thin wrapper over topica.by_strata on model.doc_topic: groups is one label per document, and the result is a list of per-stratum prevalence records (mean and confidence interval per topic).

topica.contrastive_topics ¶

contrastive_topics(model, texts, groups, *, prior=0.01, informative=False, min_count=5, n_words=10, group_order=None)

Which topics most separate two groups, and the words that shift inside each.

This is the topic-conditional extension of topica.fighting_words. A plain Fighting Words contrast pools the whole corpus into two bags of words; here we hold the topic fixed and ask, within topic t, how the two groups word it differently. Each document's word counts are weighted by its responsibility for the topic (model.doc_topic[d, t]), the weighted counts are split by group, and the Monroe-Colaresi-Quinn z-score is computed per topic. We report two complementary signals, since a topic both groups use equally can still split sharply on how they word it:

usage_diff — mean doc_topic for group A minus group B: which topics one side simply talks about more.
vocab_shift — the root-mean-square topic-conditional z over the words it keeps: how much the two groups diverge in their wording of the topic.

Works on any fitted model that exposes doc_topic and vocabulary (LDA, STM, DMR, CTM, keyATM, ...). texts must be the same documents, in the same order, that produced model.doc_topic.

Parameters:

Name	Type	Description	Default
`model`		A fitted topica model with `doc_topic` (D x K) and `vocabulary`.	required
`texts`	`list[list[str]]`	Sequence of token lists, one per document, aligned row-for-row with `model.doc_topic`.	required
`groups`		Sequence of one label per document. Must take exactly two distinct values (a binary contrast).	required
`prior`	`float`	Dirichlet pseudocount passed to the z-score; see `fighting_words`.	`0.01`
`informative`	`bool`	Use Monroe et al.'s informative (frequency-scaled) prior.	`False`
`min_count`	`int`	Within a topic, ignore words whose responsibility-weighted count across both groups is below this. Keeps the per-topic word lists and `vocab_shift` from being dominated by near-zero-mass words.	`5`
`n_words`	`int`	How many distinctive words to return per side, per topic.	`10`
`group_order`	`(a, b)`	Fix which group is A (positive z, positive `usage_diff`). Defaults to the two labels sorted, so the result is deterministic.	`None`

Returns:

Type	Description
`list[dict]`	One row per topic, sorted by descending `abs(usage_diff)`. Each row has `topic` (id), `name` (effective label), `a_label`/`b_label` (the two groups), `usage_diff`, `leans` (the label that uses the topic more), `vocab_shift`, and `a_words`/`b_words` (lists of `(word, z)`, each side's most distinctive within-topic words).
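
The two signals can be worked through for a single topic with NumPy. This is a sketch under stated assumptions rather than the library's code: `topic_contrast` is a hypothetical helper, it takes a document-term count matrix instead of token lists, it uses the uninformative Dirichlet prior only, and it follows the standard Monroe-Colaresi-Quinn log-odds z-score.

```python
import numpy as np

def topic_contrast(counts, doc_topic, is_a, topic, prior=0.01, min_count=5):
    """usage_diff, vocab_shift and per-word z for one topic (hypothetical helper)."""
    counts = np.asarray(counts, dtype=float)        # (D, V) document-term counts
    is_a = np.asarray(is_a, dtype=bool)             # (D,) True for group A
    w = np.asarray(doc_topic, dtype=float)[:, topic]   # responsibilities for the topic
    # Responsibility-weighted word counts, split by group.
    y_a = (counts[is_a] * w[is_a, None]).sum(axis=0)
    y_b = (counts[~is_a] * w[~is_a, None]).sum(axis=0)
    keep = (y_a + y_b) >= min_count                 # drop near-zero-mass words
    a0 = prior * counts.shape[1]                    # total pseudocount over the vocabulary
    log_odds_a = np.log((y_a + prior) / (y_a.sum() + a0 - y_a - prior))
    log_odds_b = np.log((y_b + prior) / (y_b.sum() + a0 - y_b - prior))
    z = (log_odds_a - log_odds_b) / np.sqrt(1 / (y_a + prior) + 1 / (y_b + prior))
    usage_diff = float(w[is_a].mean() - w[~is_a].mean())
    vocab_shift = float(np.sqrt(np.mean(z[keep] ** 2))) if keep.any() else 0.0
    return usage_diff, vocab_shift, np.where(keep, z, 0.0)

counts = np.array([[10, 0, 5], [8, 1, 5], [0, 10, 5], [1, 8, 5]])
doc_topic = np.array([[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.5, 0.5]])
is_a = np.array([True, True, False, False])
usage_diff, vocab_shift, z = topic_contrast(counts, doc_topic, is_a, topic=0)
```

In this toy data word 0 belongs to group A and word 1 to group B, so their z-scores take opposite signs, while word 2 is used at nearly the same rate by both groups and scores close to zero.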

