keyATM toolkit
keyATM-specific workflow helpers live in topica.keyatm. The general post-hoc
diagnostics (top words, representative documents, coherence, pyLDAvis, covariate
effects, …) are model-agnostic and work on a fitted KeyATM directly: see the
Diagnostics and STM toolkit pages.
The fitted model also exposes the convergence trace that keyATM's plot_modelfit
reports, as KeyATM.log_likelihood_history: a list of (iteration, per-token
log-likelihood) pairs. Perplexity is exp(-log_likelihood).
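The relationship between the trace and perplexity can be sketched in a few lines. This is a minimal illustration, assuming only the (iteration, per-token log-likelihood) shape described above; the history values and the helper name perplexity_trace are made up for the example and are not part of topica.

```python
import math

# Hypothetical trace in the shape described above:
# (iteration, per-token log-likelihood). The numbers are invented.
history = [(10, -7.90), (20, -7.45), (30, -7.31)]


def perplexity_trace(history):
    """Convert (iteration, log-likelihood) pairs to (iteration, perplexity)."""
    return [(iteration, math.exp(-ll)) for iteration, ll in history]


trace = perplexity_trace(history)
for iteration, ppl in trace:
    print(f"iter {iteration:>3}: perplexity {ppl:.1f}")
```

A rising per-token log-likelihood corresponds to falling perplexity, so either series can be used to judge convergence.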
topica.keyatm.top_topics
The n most prevalent topics in each document (≈ keyATM::top_topics).
Returns a list (one per document) of (topic_name, proportion) pairs,
sorted by descending document-topic proportion. Pass a fitted
topica.KeyATM (topic names are taken from it) or a raw theta
array.
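The ranking described above can be sketched in plain Python. This is an illustration of the idea on a raw theta matrix, not topica's implementation; the function name top_topics_sketch, its parameters, and the topic names are assumptions made for the example.

```python
def top_topics_sketch(theta, topic_names, n=2):
    """For each document row, return the n largest (topic_name, proportion) pairs."""
    result = []
    for row in theta:
        # Pair each topic name with its proportion, then sort by proportion, largest first.
        ranked = sorted(zip(topic_names, row), key=lambda pair: pair[1], reverse=True)
        result.append(ranked[:n])
    return result


# Hypothetical document-topic proportions: one row per document, rows sum to 1.
theta = [
    [0.70, 0.20, 0.10],
    [0.05, 0.35, 0.60],
]
names = ["economy", "health", "other"]

tops = top_topics_sketch(theta, names, n=2)
for doc_index, pairs in enumerate(tops):
    print(doc_index, pairs)
```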
topica.keyatm.by_strata
Mean topic prevalence within each level of a document covariate
(≈ keyATM::by_strata_DocTopic).
Splits documents by their value in strata (one label per document) and,
for each level, reports the mean of each topic's proportion with a
normal-approximation confidence interval on that mean. This is keyATM's
descriptive answer to "how does topic prevalence differ across groups"; for
a regression with uncertainty propagated from the topic estimates, use
topica.stm.estimate_effect with posterior draws instead.
Returns a list of StrataPrevalence objects, one per unique stratum (sorted).
[s.as_dict() for s in result] builds a table.
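The per-stratum summary can be sketched as follows. This is a rough illustration, not the library's code: it assumes a 95% interval (z = 1.96) and the sample standard deviation for the standard error, neither of which is stated above, and the name by_strata_sketch and its return shape are hypothetical.

```python
import math
from statistics import mean, stdev


def by_strata_sketch(theta, strata, z=1.96):
    """Mean topic proportion per stratum with a normal-approximation interval.

    Returns {label: [(mean, lower, upper), ...]} with one tuple per topic.
    """
    groups = {}
    for row, label in zip(theta, strata):
        groups.setdefault(label, []).append(row)

    out = {}
    for label in sorted(groups):
        rows = groups[label]
        n_docs = len(rows)
        per_topic = []
        for k in range(len(rows[0])):
            values = [row[k] for row in rows]
            m = mean(values)
            # Standard error of the mean; with a single document it is
            # undefined, so this sketch falls back to 0.0 (an assumption).
            se = stdev(values) / math.sqrt(n_docs) if n_docs > 1 else 0.0
            per_topic.append((m, m - z * se, m + z * se))
        out[label] = per_topic
    return out


# Hypothetical data: four documents, two topics, one stratum label per document.
theta = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9], [0.3, 0.7]]
strata = ["a", "a", "b", "b"]

summary = by_strata_sketch(theta, strata)
for label, topics in summary.items():
    print(label, [tuple(round(x, 3) for x in t) for t in topics])
```

The interval here reflects only the spread of proportions across documents in a stratum, which is why it is descriptive rather than a regression estimate.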
topica.keyatm.visualize_keywords
Corpus frequency of each keyword (≈ keyATM::visualize_keywords).
For every keyword in every set, reports how common it is in docs so you
can catch keywords that are too rare to anchor a topic or so frequent they
dominate it — the diagnostic keyATM asks you to run before fitting.
Returns a dict mapping each keyword-set name to a list of dicts
{"keyword", "count", "proportion", "doc_freq"} sorted by descending
proportion, where proportion is the keyword's share of all corpus tokens
and doc_freq is the number of documents containing it.
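The quantities in the returned dicts can be computed along these lines. This is a minimal sketch that assumes docs is a list of already-tokenized documents (lists of strings); the function name keyword_frequencies and the sample corpus are hypothetical, and the real function may accept other input types.

```python
from collections import Counter


def keyword_frequencies(docs, keywords):
    """Per keyword set: count, share of all tokens, and document frequency."""
    token_counts = Counter(tok for doc in docs for tok in doc)
    # set(doc) ensures each document contributes at most once per token.
    doc_counts = Counter(tok for doc in docs for tok in set(doc))
    total = sum(token_counts.values())

    report = {}
    for name, words in keywords.items():
        rows = [
            {
                "keyword": w,
                "count": token_counts[w],
                "proportion": token_counts[w] / total if total else 0.0,
                "doc_freq": doc_counts[w],
            }
            for w in words
        ]
        rows.sort(key=lambda r: r["proportion"], reverse=True)
        report[name] = rows
    return report


# Hypothetical tokenized corpus and keyword sets.
docs = [["tax", "budget", "tax"], ["hospital", "tax"], ["budget", "vote"]]
keywords = {"economy": ["budget", "tax", "tariff"]}

report = keyword_frequencies(docs, keywords)
for row in report["economy"]:
    print(row)
```

In this toy corpus "tariff" never occurs, so it shows up with a count of 0, which is the kind of keyword the diagnostic is meant to surface before fitting.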
topica.keyatm.refine_keywords
Drop keywords too rare to anchor a topic (≈ keyATM::refine_keywords).
Removes any keyword whose corpus count is below min_count or whose
document frequency is below min_doc_freq (so out-of-vocabulary keywords,
with count 0, always go). Keyword sets that end up empty are dropped, since
a keyword topic needs at least one surviving keyword.
Returns (refined, dropped) where refined is the cleaned keyword dict
and dropped maps each set name to the list of removed keywords. Set
verbose=True to print a short report.
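The filtering rule can be sketched as below. This illustrates the logic only and is not the library's code: it assumes tokenized documents, the default thresholds of 1 are hypothetical, and whether dropped lists every set name or only those that lost keywords is a guess (this sketch lists every set).

```python
from collections import Counter


def refine_keywords_sketch(docs, keywords, min_count=1, min_doc_freq=1):
    """Drop keywords below either threshold; drop sets that end up empty."""
    token_counts = Counter(tok for doc in docs for tok in doc)
    doc_counts = Counter(tok for doc in docs for tok in set(doc))

    refined, dropped = {}, {}
    for name, words in keywords.items():
        kept = [
            w for w in words
            if token_counts[w] >= min_count and doc_counts[w] >= min_doc_freq
        ]
        dropped[name] = [w for w in words if w not in kept]
        if kept:  # a keyword topic needs at least one surviving keyword
            refined[name] = kept
    return refined, dropped


# Hypothetical tokenized corpus and keyword sets.
docs = [["tax", "budget", "tax"], ["hospital", "tax"], ["budget", "vote"]]
keywords = {"economy": ["tax", "tariff"], "trade": ["tariff", "export"]}

refined, dropped = refine_keywords_sketch(docs, keywords, min_count=2)
print(refined)
print(dropped)
```

Here the "trade" set disappears from refined entirely because none of its keywords occur in the corpus.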