Fits the model for each value in K and reports diagnostics for choosing among them:
held-out likelihood (document completion), semantic coherence, exclusivity,
and the variational bound. Unlike stm::searchK, the fits for different values of K
run in parallel (a long-standing request, bstewart/stm#262), and each fit is itself
fast (implemented in Rust), so a sweep that took minutes takes seconds.
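A minimal base-R sketch of what a forked sweep over K can look like. It uses parallel::mclapply() and splits the thread budget following the rule stated for the cores argument: one thread per fit when fits run side by side, all cores for a single fit. fit_one_k() and sweep_k() are hypothetical stand-ins, not faSTM functions, and the placeholder fit only echoes its inputs.

```r
library(parallel)

# Hypothetical stand-in for one per-K fit (NOT a faSTM function); it
# returns a placeholder record so the sketch runs without the package.
fit_one_k <- function(k, threads) {
  list(K = as.integer(k), threads = as.integer(threads))
}

# Sketch of a forked sweep: mclapply() spreads the values of K over up
# to `cores` forked workers. Forking is unavailable on Windows, where
# mc.cores must stay at 1.
sweep_k <- function(K, cores = 1L) {
  # Thread budget: one thread per fit when fits run in parallel,
  # otherwise the single running fit gets every core.
  all_cores <- max(1L, detectCores(), na.rm = TRUE)
  threads <- if (cores > 1L) 1L else all_cores
  fits <- mclapply(K, fit_one_k, threads = threads, mc.cores = cores)
  data.frame(
    K = vapply(fits, function(f) f$K, integer(1)),
    threads = vapply(fits, function(f) f$threads, integer(1))
  )
}

sweep_k(c(5, 10, 15), cores = 1L)
```

Splitting threads this way keeps the total number of busy threads near the core count whether the parallelism comes from the sweep or from inside each fit.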
Usage
search_k(
corpus,
K,
prevalence = NULL,
content = NULL,
heldout = TRUE,
proportion = 0.5,
residuals = FALSE,
cores = 1L,
M = 10L,
seed = 1L,
measure = c("mimno", "npmi", "c_v"),
verbose = FALSE,
...
)
Arguments
- corpus
A faSTM_corpus (from as_corpus()).
- K
Integer vector of topic counts to try.
- prevalence, content
Optional covariate formulas (see stm()).
- heldout
Logical; compute held-out likelihood via document completion.
- proportion
Held-out token fraction (passed to make_heldout()).
- cores
Number of K values to fit in parallel (forked processes; 1 = sequential). When cores > 1, each fit runs single-threaded to avoid oversubscription; when cores == 1, each fit uses all cores.
- M
Number of top words per topic used for coherence and exclusivity.
- seed
RNG seed for the held-out split and the fits.
- ...
Passed to stm() (e.g. max.em.its, init.type).
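The default measure = "mimno" presumably refers to the semantic coherence of Mimno et al. (2011), which is computed over each topic's top M words. Below is a base-R sketch of that standard formula. It assumes a dense documents-by-vocabulary count matrix and is not faSTM's implementation. For ordered top words v_1, ..., v_M, the formula sums log((D(v_m, v_l) + 1) / D(v_l)) over all pairs with l < m. Here D counts documents.

```r
# Mimno et al. (2011) semantic coherence for one topic (sketch).
# dtm: documents x vocabulary matrix of counts.
# top: column indices of the topic's top M words, most probable first.
mimno_coherence <- function(dtm, top) {
  # 1 if the word appears in the document at all, else 0
  present <- (dtm[, top, drop = FALSE] > 0) * 1
  # co[i, j] = number of documents containing both top[i] and top[j];
  # the diagonal co[i, i] is the document frequency of top[i]
  co <- crossprod(present)
  score <- 0
  for (m in seq_along(top)[-1]) {
    for (l in seq_len(m - 1)) {
      # +1 smoothing in the numerator; top words are assumed to occur
      # in at least one document, so co[l, l] > 0
      score <- score + log((co[m, l] + 1) / co[l, l])
    }
  }
  score
}

dtm <- matrix(c(2, 0,
                1, 0,
                1, 3), nrow = 3, byrow = TRUE)
mimno_coherence(dtm, top = c(1, 2))  # equals log(2/3)
mimno_coherence(dtm, top = c(2, 1))  # equals log(2)
```

The two calls show that the measure depends on word order: each pair is normalized by the document frequency of the higher-ranked word.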
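Document completion hides a fraction of each document's tokens (the proportion argument). The model is then judged by how well it predicts the hidden tokens from topic proportions inferred on the rest. The sketch below shows only the splitting step. It assumes documents are stored as vectors of word indices and, for simplicity, splits every document. split_heldout() is hypothetical; which documents make_heldout() actually selects is not stated here.

```r
# Sketch of a document-completion split (hypothetical helper).
# docs: list of integer vectors of word indices, one entry per token.
split_heldout <- function(docs, proportion = 0.5, seed = 1L) {
  set.seed(seed)  # note: this resets the global RNG stream
  lapply(docs, function(doc) {
    n_out <- floor(proportion * length(doc))
    # a logical mask avoids the doc[-integer(0)] pitfall when n_out == 0
    held <- seq_along(doc) %in% sample.int(length(doc), n_out)
    list(observed = doc[!held], heldout = doc[held])
  })
}

docs <- list(c(1L, 2L, 3L, 4L), c(5L, 6L), 7L)
parts <- split_heldout(docs, proportion = 0.5)
```

Because the seed fixes the split, every value of K is scored on the same held-out tokens, which keeps the held-out likelihoods comparable across the sweep.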