
keyATM toolkit

keyATM-specific workflow helpers live in topica.keyatm. The general post-hoc diagnostics (top words, representative documents, coherence, pyLDAvis, covariate effects, …) are model-agnostic and work on a fitted KeyATM directly: see the Diagnostics and STM toolkit pages.

The fitted model also exposes, as KeyATM.log_likelihood_history, the convergence trace that keyATM's plot_modelfit reports: a list of (iteration, per-token log-likelihood) pairs. The corresponding perplexity is exp(-log_likelihood).
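To make the relationship concrete, here is a minimal pure-Python sketch. It turns a trace of that shape into perplexities and applies a crude plateau check. The history values are toy numbers, not real keyATM output. perplexity_trace, has_plateaued and the tol threshold are hypothetical helpers written for this example; they are not part of topica.

```python
import math

def perplexity_trace(history):
    """Convert (iteration, per-token log-likelihood) pairs to (iteration, perplexity)."""
    # Perplexity is exp(-log_likelihood), so a rising log-likelihood means falling perplexity.
    return [(it, math.exp(-ll)) for it, ll in history]

def has_plateaued(history, tol=0.05):
    """Crude check: did the last two recorded log-likelihoods differ by less than tol?"""
    if len(history) < 2:
        return False
    (_, prev), (_, last) = history[-2], history[-1]
    return abs(last - prev) < tol

# Toy trace in the documented shape (made-up values for illustration only).
history = [(0, -8.9), (50, -7.6), (100, -7.4), (150, -7.38)]
trace = perplexity_trace(history)
converged = has_plateaued(history)
```

The plateau rule only compares the last two recorded points. That is enough to illustrate the idea, but it is not a substitute for inspecting the whole trace.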

topica.keyatm.top_topics

top_topics(model_or_theta, *, n=2, topic_names=None)

The n most prevalent topics in each document (≈ keyATM::top_topics).

Returns a list (one per document) of (topic_name, proportion) pairs, sorted by descending document-topic proportion. Pass a fitted topica.KeyATM (topic names are taken from it) or a raw theta array.
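As an illustration of the ranking that top_topics describes, the sketch below sorts each row of a document-topic matrix and keeps the first n entries. It is a stand-in written for this page, not topica's implementation. top_topics_sketch, the topic names and the theta values are all hypothetical. Here theta is a plain list of rows rather than an array.

```python
def top_topics_sketch(theta, topic_names, n=2):
    """For each document row of theta, return the n (name, proportion) pairs with the largest proportions."""
    result = []
    for row in theta:
        # Pair each proportion with its topic name, then sort by proportion, largest first.
        ranked = sorted(zip(topic_names, row), key=lambda pair: pair[1], reverse=True)
        result.append(ranked[:n])
    return result

# Hypothetical two-document, three-topic example.
theta = [
    [0.10, 0.60, 0.30],
    [0.50, 0.20, 0.30],
]
names = ["economy", "health", "other"]
top = top_topics_sketch(theta, names, n=2)
# → [[('health', 0.6), ('other', 0.3)], [('economy', 0.5), ('other', 0.3)]]
```

Python's sort is stable, so tied proportions keep their column order. This page does not say whether topica breaks ties the same way.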

topica.keyatm.by_strata

by_strata(model_or_theta, strata, *, ci=0.95, topic_names=None)

Mean topic prevalence within each level of a document covariate (≈ keyATM::by_strata_DocTopic).

Splits documents by their value in strata (one label per document). For each level, it reports the mean of each topic's proportion with a normal-approximation confidence interval on that mean. This is keyATM's descriptive answer to "how does topic prevalence differ across groups"; for a regression with uncertainty propagated from the topic estimates, use topica.stm.estimate_effect with posterior draws instead.

Returns a list of StrataPrevalence objects, one per unique stratum (sorted). [s.as_dict() for s in result] turns the result into a list of row dicts, ready to build a table.
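The computation that by_strata describes can be sketched in a few lines of pure Python. This is a stand-in built on several assumptions, not topica's code:

- by_strata_sketch is hypothetical.
- It returns a plain dict of (mean, lower, upper) tuples per topic instead of StrataPrevalence objects.
- It uses the sample standard deviation for the standard error.
- It reports NaN bounds for a stratum containing a single document.

```python
import math
from collections import defaultdict
from statistics import NormalDist, mean, stdev

def by_strata_sketch(theta, strata, ci=0.95):
    """Per stratum: each topic's mean proportion with a normal-approximation CI on that mean."""
    # Two-sided critical value, about 1.96 for ci=0.95.
    z = NormalDist().inv_cdf(0.5 + ci / 2)
    groups = defaultdict(list)
    for row, label in zip(theta, strata):
        groups[label].append(row)
    result = {}
    for label in sorted(groups):  # one entry per unique stratum, sorted
        rows = groups[label]
        per_topic = []
        for k in range(len(rows[0])):
            values = [row[k] for row in rows]
            m = mean(values)
            # Standard error of the mean; undefined for a single document (NaN here).
            se = stdev(values) / math.sqrt(len(values)) if len(values) > 1 else float("nan")
            per_topic.append((m, m - z * se, m + z * se))
        result[label] = per_topic
    return result

# Hypothetical four documents, two topics, two strata.
theta = [[0.2, 0.8], [0.4, 0.6], [0.9, 0.1], [0.7, 0.3]]
strata = ["a", "a", "b", "b"]
summary = by_strata_sketch(theta, strata)
```

The interval describes uncertainty in the group mean given the point estimates of theta. As the paragraph above notes, it does not propagate uncertainty from the topic estimates themselves.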

topica.keyatm.visualize_keywords

visualize_keywords(docs, keywords)

Corpus frequency of each keyword (≈ keyATM::visualize_keywords).

For every keyword in every set, reports how common it is in docs. This lets you catch keywords that are too rare to anchor a topic, or so frequent that they dominate it. keyATM asks you to run this diagnostic before fitting.

Returns a dict mapping each keyword-set name to a list of dicts {"keyword", "count", "proportion", "doc_freq"} sorted by descending proportion, where proportion is the keyword's share of all corpus tokens and doc_freq is the number of documents containing it.
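Here is a minimal sketch of the counts that visualize_keywords reports. It assumes two things:

- docs is a list of already-tokenized documents (lists of strings).
- keywords maps each set name to a list of words.

keyword_frequencies_sketch is a hypothetical helper, not topica's implementation. topica may tokenize raw text differently.

```python
from collections import Counter

def keyword_frequencies_sketch(docs, keywords):
    """Per keyword set: dicts with keyword, count, proportion and doc_freq, by descending proportion."""
    token_counts = Counter()
    doc_counts = Counter()
    for doc in docs:
        token_counts.update(doc)
        doc_counts.update(set(doc))  # a document counts once per distinct token
    total = sum(token_counts.values())
    report = {}
    for name, words in keywords.items():
        rows = [
            {
                "keyword": w,
                "count": token_counts[w],  # Counter returns 0 for unseen words
                "proportion": token_counts[w] / total if total else 0.0,
                "doc_freq": doc_counts[w],
            }
            for w in words
        ]
        rows.sort(key=lambda r: r["proportion"], reverse=True)
        report[name] = rows
    return report

# Hypothetical toy corpus and keyword sets.
docs = [["tax", "budget", "tax"], ["vaccine", "budget"], ["tax"]]
keywords = {"economy": ["budget", "tax", "inflation"], "health": ["vaccine"]}
report = keyword_frequencies_sketch(docs, keywords)
```

In this toy corpus, "tax" accounts for half of all tokens while "inflation" never appears. These are exactly the two extremes the diagnostic is meant to surface.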

topica.keyatm.refine_keywords

refine_keywords(docs, keywords, *, min_count=2, min_doc_freq=1, verbose=False)

Drop keywords too rare to anchor a topic (≈ keyATM::refine_keywords).

Removes any keyword whose corpus count is below min_count or whose document frequency is below min_doc_freq (so out-of-vocabulary keywords, with count 0, always go). Keyword sets that end up empty are dropped, since a keyword topic needs at least one surviving keyword.

Returns (refined, dropped) where refined is the cleaned keyword dict and dropped maps each set name to the list of removed keywords. Set verbose=True to print a short report.
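The filtering rule above can be sketched the same way, again assuming tokenized docs. refine_keywords_sketch is hypothetical. This page does not say whether topica's dropped dict includes sets with nothing removed. This sketch records a (possibly empty) list for every set.

```python
from collections import Counter

def refine_keywords_sketch(docs, keywords, *, min_count=2, min_doc_freq=1):
    """Drop keywords below either threshold; drop keyword sets left with no keywords."""
    token_counts = Counter(tok for doc in docs for tok in doc)
    doc_freqs = Counter(tok for doc in docs for tok in set(doc))
    refined, dropped = {}, {}
    for name, words in keywords.items():
        keep, removed = [], []
        for w in words:
            # Out-of-vocabulary keywords have count 0, so they always fail min_count >= 1.
            if token_counts[w] >= min_count and doc_freqs[w] >= min_doc_freq:
                keep.append(w)
            else:
                removed.append(w)
        if keep:  # a keyword topic needs at least one surviving keyword
            refined[name] = keep
        dropped[name] = removed
    return refined, dropped

docs = [["tax", "budget", "tax"], ["vaccine", "budget"], ["tax"]]
keywords = {"economy": ["budget", "tax", "inflation"], "health": ["vaccine"]}
refined, dropped = refine_keywords_sketch(docs, keywords)
```

With these toy documents, "inflation" (count 0) and "vaccine" (count 1) fall below min_count=2. The health set therefore empties out and is left out of refined.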