A drop-in replacement for stm::stm()'s fitting step. Accepts the same
documents / vocab / prevalence / content inputs, fits with topica's
Rust core, and returns an object compatible with the stm package so that
stm::labelTopics(), stm::plot.STM(), stm::findThoughts(),
stm::sageLabels(), and stm::toLDAvis() work unmodified. Use
estimateEffect() from this package for covariate effects that propagate
topic-estimation uncertainty.
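The documents / vocab inputs named above are normally produced by stm::prepDocuments(), but the layout is simple enough to build by hand. The following is a minimal base R sketch of a toy corpus in that layout; the words, the counts, the `meta` columns, and the helper `is_stm_doc()` are all made up for illustration and are not part of either package.

```r
# Toy vocabulary; each term is addressed by its 1-based position.
vocab <- c("economy", "health", "school", "tax")

# Each document is a 2 x n_d integer matrix:
#   row 1 = 1-based word id into vocab, row 2 = count of that word.
documents <- list(
  doc1 = rbind(c(1L, 4L), c(2L, 1L)),         # "economy" x2, "tax" x1
  doc2 = rbind(c(2L, 3L), c(1L, 3L)),         # "health" x1, "school" x3
  doc3 = rbind(c(1L, 2L, 3L), c(1L, 1L, 1L))  # one of each of three terms
)

# Document metadata, one row per document, in the same order.
meta <- data.frame(treatment = c(0L, 1L, 1L), age = c(34, 51, 27))

# Hypothetical helper: checks the shape described above for one document.
is_stm_doc <- function(d, n_vocab) {
  is.matrix(d) && is.integer(d) && nrow(d) == 2L &&
    all(d[1, ] >= 1L & d[1, ] <= n_vocab) &&
    all(d[2, ] > 0L)
}

ok <- vapply(documents, function(d) is_stm_doc(d, length(vocab)), logical(1))
stopifnot(all(ok), nrow(meta) == length(documents))
```

Checking the shape up front is only a convenience for hand-built inputs; output from stm::prepDocuments() should already satisfy it.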
Usage
stm(
  documents,
  vocab,
  K,
  prevalence = NULL,
  content = NULL,
  data = NULL,
  max.em.its = 500L,
  emtol = 1e-05,
  init.type = c("Spectral", "Random", "LDA", "Custom"),
  init.beta = NULL,
  model = NULL,
  gamma.prior = c("Pooled", "L1"),
  gamma.l1.alpha = 0.001,
  sigma.prior = 0,
  seed = 1L,
  inference = c("batch", "svi"),
  batch_size = 256L,
  tau = 64,
  kappa = 0.7,
  num_threads = 0L,
  verbose = TRUE,
  ...
)
Arguments
- documents
stm-format documents: a named list of 2 x n_d integer matrices (row 1 = 1-based word id into vocab, row 2 = count). Produced by stm::prepDocuments().
- vocab
Character vector of vocabulary terms.
- K
Number of topics.
- prevalence
A right-hand-side formula (e.g. ~ treatment + s(age)) or a design matrix; topic prevalence covariates. data supplies the variables.
- content
A right-hand-side formula naming a single categorical variable, or a factor; the SAGE content covariate. data supplies the variable.
- data
A data.frame of document metadata (the meta from stm::prepDocuments()), aligned to documents.
- max.em.its
Maximum EM iterations (batch) / epochs (svi).
- emtol
Relative-bound convergence tolerance.
- init.type
Topic initialization: "Spectral" (stm's default), "Random", "LDA" (seed from a quick CVB0 LDA, like stm's collapsed-Gibbs init), or "Custom" (seed from init.beta or a supplied model).
- init.beta
Optional K x V topic-word probability matrix to start the fit from a given initialization (overrides init.type). Supplying R stm's exact spectral beta here reproduces that run, which gives a guaranteed "replicate the original" mode (topica #234/#235).
- model
A fitted model whose topic-word matrix seeds init.type = "Custom".
- gamma.prior
Prevalence-coefficient prior: "Pooled" (ridge, stm default) or "L1".
- sigma.prior
Shrinkage applied to the topic covariance off-diagonal.
- seed
Integer seed (batch fit is reproducible from it).
- inference
"batch" (default, parity-validated) or "svi" (stochastic variational; scales to large corpora; requires a topica build with STM-SVI).
- batch_size, tau, kappa
SVI controls (minibatch size; Robbins-Monro (tau + t)^(-kappa) step schedule). Ignored when inference = "batch".
- num_threads
Worker threads for the parallel variational E-step. 0 (default) uses all cores; >= 1 pins a scoped pool. Results are identical regardless of thread count.
- verbose
Logical; print progress.
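
To get a feel for the SVI step schedule named under batch_size, tau, kappa, the formula (tau + t)^(-kappa) can be evaluated directly. This is a minimal base R sketch using the default values from the Usage block; the helper name `svi_step()` is hypothetical, and it assumes t counts minibatch updates starting at 1, which the text above does not state.

```r
# Robbins-Monro step size rho_t = (tau + t)^(-kappa).
# Defaults mirror the Usage block (tau = 64, kappa = 0.7).
# Assumption: t is the minibatch update counter, starting at 1.
svi_step <- function(t, tau = 64, kappa = 0.7) {
  (tau + t)^(-kappa)
}

t <- c(1, 10, 100, 1000)
rho <- svi_step(t)

# Show how quickly the step size decays over updates.
print(data.frame(t = t, rho = signif(rho, 3)))

# A larger tau damps the earliest updates; a larger kappa decays faster.
rho_damped <- svi_step(t, tau = 1024)
rho_fast   <- svi_step(t, kappa = 0.9)

# The schedule is strictly decreasing in t for tau >= 0 and kappa > 0.
stopifnot(all(diff(rho) < 0))
```

In the stochastic-approximation literature, kappa in (0.5, 1] is the usual range that satisfies the Robbins-Monro conditions; whether this function enforces that range is not stated here.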