Fitting a model

Fit a structural topic model and build the prevalence design.

stm()
Fit a structural topic model (fast Rust backend, stm-compatible object)
s()
Spline term for prevalence formulas
makeDesignMatrix()
Build a (sparse) design matrix for new data (stm-compatible)

Covariate effects

Effect estimation by the method of composition, which carries uncertainty in the topic proportions into the estimates. Supports weights, cluster-robust standard errors, and random effects, plus marginal effects and effect plots.

estimateEffect()
Estimate covariate effects on topic prevalence (method of composition)
ame()
Average marginal effects from an estimateEffect fit
effect_estimates()
Extract estimateEffect estimates as a tidy data.frame (no plotting)
posterior_theta_samples()
Draw from the per-document topic-proportion posterior
plot(<faSTM_effect>)
Plot estimated covariate effects on topic prevalence

Inspecting topics

Labels, representative documents, FREX, and topic correlations.

label_topics()
Label topics by top words (prob, FREX, lift, score)
sage_labels()
Labels for a content (SAGE) model
find_thoughts()
Representative documents for each topic
find_topic()
Find topics whose top words include given words
topic_terms()
Top terms per topic, with their numeric scores (tidy)
topic_proportions()
Expected topic proportions (the numbers behind the summary plot)
content_topics()
Marginal content words by one content covariate
frex_scores()
FREX scores for every word and topic
topic_correlation()
Topic correlation graph (positive correlations of topic proportions)
topic_corr_graph()
Topic-correlation network as an igraph graph
plot(<faSTM>)
Plot a fitted model
plot_topic_network()
Topic correlation network

Topic quality

Semantic coherence (Mimno, NPMI, c_v), exclusivity, and residual diagnostics.

coherence()
Topic coherence (Mimno / NPMI / c_v)
semantic_coherence()
Semantic coherence (Mimno et al. 2011)
exclusivity()
Topic exclusivity (FREX-summary, frexw default 0.7)
check_residuals()
Residual dispersion check (is K large enough?)
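
For orientation, the Mimno et al. (2011) coherence score named above can be sketched in base R. This is a minimal illustration of the published formula on made-up data, not the package's implementation: `mimno_coherence()` and the toy matrix `dtm` are hypothetical names introduced here, and the package functions may differ in smoothing, the number of top words, and input types.

```r
# Toy document-term count matrix: 4 documents x 4 terms (made-up data).
dtm <- matrix(
  c(2, 1, 0, 0,
    1, 3, 0, 0,
    0, 0, 1, 2,
    1, 0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("tax", "budget", "war", "troops"))
)

# Hypothetical helper (not part of the package API).
# top_words must be ordered from most to least probable within the topic.
mimno_coherence <- function(dtm, top_words) {
  # 1 if the word occurs in the document, 0 otherwise
  present <- (dtm[, top_words, drop = FALSE] > 0) * 1
  # co_doc[i, j] = number of documents containing both word i and word j;
  # the diagonal holds each word's document frequency
  co_doc <- crossprod(present)
  score <- 0
  for (m in seq_along(top_words)[-1]) {
    for (l in seq_len(m - 1)) {
      # log((D(v_m, v_l) + 1) / D(v_l)), summed over all ordered pairs
      score <- score + log((co_doc[m, l] + 1) / co_doc[l, l])
    }
  }
  score
}

coherent   <- mimno_coherence(dtm, c("tax", "budget"))  # words that co-occur
incoherent <- mimno_coherence(dtm, c("tax", "war"))     # words that never co-occur
```

Scores closer to zero indicate top words that tend to appear in the same documents; the pair that never co-occurs receives the more negative score.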

Choosing the number of topics

Held-out evaluation and model selection across K.

search_k()
Search over the number of topics K
select_model()
Fit several models and keep the ones on the quality frontier
select_best()
Pick one model from a select_model run
many_topics()
Select models across a range of K
multi_stm()
Cross-run topic stability
make_heldout()
Create a held-out version of a corpus for document-completion validation
eval_heldout()
Evaluate held-out log-likelihood of a fit on a held-out set
permutation_test()
Permutation test for a binary covariate's effect on topics
topic_lasso()
Predict a document-level outcome from topic proportions (lasso)
plot(<faSTM_searchk>)
Plot search_k diagnostics
as.data.frame(<faSTM_searchk>)
Convert search_k diagnostics to long form for plotting

Out-of-sample inference

Infer topic proportions for new documents.

fit_new_documents()
Infer topic proportions for new documents
predict(<faSTM>)
Predict topic proportions for new documents

Tidy (broom) interface

tidy(<faSTM>)
Tidy a faSTM fit (topic-term or document-topic distributions)
tidy(<faSTM_effect>)
Tidy an estimateEffect fit (one row per term per topic)
glance(<faSTM>)
One-row model summary for a faSTM fit
augment(<faSTM>)
Augment: most-likely topic for each document-term token
reexports tidy glance augment
Objects exported from other packages

Corpus & text preparation

Read prepared text from quanteda / tidytext and convert corpora.

as_corpus()
Build a faSTM corpus from prepared text
align_corpus()
Align a new corpus to a fitted model's vocabulary
from_tidy()
Build a faSTM corpus from a tidy (long) term-count table
make_dt()
Document-topic proportions as a data frame
read_ldac() write_ldac()
Read/write a corpus in LDA-C (Blei) sparse format
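
As background for the LDA-C entry above: in that format each line is one document, written as the number of unique terms followed by `index:count` pairs with zero-based term indices. The base-R sketch below parses a single line into an stm-style two-row matrix with one-based indices. `parse_ldac_line()` is a hypothetical helper for illustration only; the actual arguments and return value of `read_ldac()` are not shown in this index and may differ.

```r
# Hypothetical helper (not part of the package API): parse one LDA-C line.
parse_ldac_line <- function(line) {
  fields <- strsplit(trimws(line), "[[:space:]]+")[[1]]
  n_unique <- as.integer(fields[1])            # declared number of unique terms
  pairs <- strsplit(fields[-1], ":", fixed = TRUE)
  idx <- as.integer(vapply(pairs, `[`, character(1), 1))
  cnt <- as.integer(vapply(pairs, `[`, character(1), 2))
  stopifnot(length(idx) == n_unique)           # header must match the pairs
  # Shift zero-based LDA-C indices to one-based vocabulary positions
  rbind(term = idx + 1L, count = cnt)
}

doc <- parse_ldac_line("3 0:2 4:1 7:3")
```

Here `doc` is a 2 x 3 integer matrix whose first row holds vocabulary positions and whose second row holds the matching counts.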

Datasets

poliblog
CMU 2008 Political Blog Corpus (poliblog5k)
congress
U.S. Congressional Speeches (Party x Chamber, 1987-2011)

stm-compatibility shims

Aliases that keep stm-style call sites working unmodified.

alignCorpus()
Align a new corpus to a reference vocabulary (stm-compatible)
asSTMCorpus()
Coerce inputs into an stm-style corpus (stm-compatible)
convertCorpus()
Convert documents/vocab between corpus formats (stm-compatible)
fitNewDocuments()
Infer topics for new documents (stm-compatible signature)
checkBeta()
Flag words that load almost entirely on one topic