Benchmarks
How fast topica is, measured honestly. Every number here is fit time only (model construction and import excluded), on fixed-seed synthetic corpora, on one machine (Apple M-series, 14 cores), and reproducible with the command shown. Speed depends on corpus size, vocabulary, the number of topics, and hardware, so read these as orders of magnitude, not guarantees.
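Here is a minimal sketch of what "fit time only" means in practice. It assumes a benchmark that can build an unfitted model and fit it as two separate calls. The make_model/fit split and the dummy workload are hypothetical stand-ins, not topica's API. Construction happens outside the timed region. In this sketch, the median of a few repeats damps one-off noise.

```python
import time
from statistics import median
from typing import Any, Callable


def time_fit(make_model: Callable[[], Any], fit: Callable[[Any], Any], repeats: int = 3) -> float:
    """Return the median wall-clock seconds spent in `fit` alone.

    `make_model` builds a fresh, unfitted model before the clock starts,
    so construction (and any import it triggers) is never counted.
    """
    timings = []
    for _ in range(repeats):
        model = make_model()                 # construction: not timed
        start = time.perf_counter()
        fit(model)                           # only this call is timed
        timings.append(time.perf_counter() - start)
    return median(timings)


# Hypothetical stand-in workload; replace with a real constructor and fit call.
def make_dummy():
    return {"data": list(range(200_000))}


def fit_dummy(model):
    model["total"] = sum(x * x for x in model["data"])


if __name__ == "__main__":
    print(f"fit time: {time_fit(make_dummy, fit_dummy):.4f}s")
```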
STM vs R stm¶
This is the comparison that matters for social scientists. R's stm is the
field standard. On the largest corpus below, a fit that takes about half a
minute in R takes under three seconds in topica.
Both engines run the same number of EM iterations from a spectral
initialization (R with emtol=0 so it does not stop early), so this measures
per-iteration cost, not time to convergence.
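This is why emtol=0 matters. An EM loop with a positive tolerance stops once the relative change in its objective drops below that tolerance, so two engines can end up doing different amounts of work. The toy driver below is hypothetical, not stm's or topica's code. It shows that a zero tolerance pins the run to the full iteration budget. That fixed count is what makes total time divided by iterations a fair per-iteration comparison.

```python
import time


def run_em(step, state, max_iter, tol):
    """Toy EM driver: stop early once the relative change in the objective
    falls below `tol`. Returns (state, iterations_run, seconds)."""
    start = time.perf_counter()
    prev = None
    it = 0
    for it in range(1, max_iter + 1):
        state, objective = step(state)
        # With tol == 0 this test can never pass, so all max_iter steps run.
        if prev is not None and abs(objective - prev) / abs(prev) < tol:
            break
        prev = objective
    return state, it, time.perf_counter() - start


def toy_step(x):
    # A contraction toward 2.0; the objective approaches 10.0 geometrically.
    x = 0.5 * (x + 2.0)
    return x, 10.0 - (x - 2.0) ** 2


for tol in (1e-6, 0.0):
    _, iters, secs = run_em(toy_step, 0.0, max_iter=50, tol=tol)
    print(f"tol={tol}: {iters} iterations, {secs / iters * 1e6:.2f} µs per iteration")
```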
All cores (topica parallelizes the variational E-step; R stm is single-threaded):
| docs | vocab | K | topica | R stm | speedup (R / topica) |
|---|---|---|---|---|---|
| 1,000 | 500 | 10 | 0.14s | 3.16s | 22.5× |
| 2,000 | 2,000 | 10 | 0.49s | 6.60s | 13.5× |
| 5,000 | 5,000 | 20 | 2.75s | 26.9s | 9.8× |
Single core (apples-to-apples, RAYON_NUM_THREADS=1):
| docs | vocab | K | topica | R stm | speedup (R / topica) |
|---|---|---|---|---|---|
| 1,000 | 500 | 10 | 0.50s | 3.03s | 6.0× |
| 2,000 | 2,000 | 10 | 1.44s | 6.51s | 4.5× |
| 5,000 | 5,000 | 20 | 8.97s | 26.3s | 2.9× |
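The speedup columns are plain ratios: R stm time divided by topica time. The few lines below recompute them from the tabulated seconds and add topica's own gain from one core to all cores. The timings themselves are rounded, so a recomputed ratio can differ from the listed one in the last digit. For example, 3.16 / 0.14 ≈ 22.6, against the listed 22.5×.

```python
# Timings in seconds from the two STM tables above: (docs, topica, R stm).
all_cores = [(1_000, 0.14, 3.16), (2_000, 0.49, 6.60), (5_000, 2.75, 26.9)]
single_core = [(1_000, 0.50, 3.03), (2_000, 1.44, 6.51), (5_000, 8.97, 26.3)]


def speedups(rows):
    """Speedup = R stm seconds / topica seconds, keyed by corpus size."""
    return {docs: r_secs / topica_secs for docs, topica_secs, r_secs in rows}


multi = speedups(all_cores)
single = speedups(single_core)
# topica's own gain from one core to all cores (same corpus, same iterations).
thread_gain = {s_row[0]: s_row[1] / m_row[1] for s_row, m_row in zip(single_core, all_cores)}

for docs in multi:
    print(f"{docs:>6,} docs: {single[docs]:4.1f}x single-core, {multi[docs]:4.1f}x all cores, "
          f"topica thread gain {thread_gain[docs]:.1f}x")
```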
So topica is roughly 3 to 6 times faster single-threaded and 10 to 23 times on
all cores, and it produces the same fit (the content and prevalence models are
validated against R stm). Reproduce:
```shell
python benchmarks/bench_stm.py                      # all cores
RAYON_NUM_THREADS=1 python benchmarks/bench_stm.py  # single core
```
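Rayon reads RAYON_NUM_THREADS when its global thread pool is first built. Changing the variable after topica's Rust code has already run in a session has no effect. The reliable way to pin a run to one core is to set the variable in the environment of a fresh process, as the single-core command does. Below is a sketch in Python. The run_with_threads helper is hypothetical, and a trivial child command stands in for the benchmark script.

```python
import os
import subprocess
import sys


def run_with_threads(cmd, threads=None):
    """Run `cmd` in a child process, optionally pinning Rayon's pool size.

    The variable is set in the child's environment before it starts, so it is
    in place when the Rust side first builds its global thread pool.
    """
    env = os.environ.copy()
    if threads is None:
        env.pop("RAYON_NUM_THREADS", None)   # let Rayon size the pool itself
    else:
        env["RAYON_NUM_THREADS"] = str(threads)
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


# Stand-in child that reports what it sees; swap in
# [sys.executable, "benchmarks/bench_stm.py"] to run the real benchmark.
probe = [sys.executable, "-c", "import os; print(os.environ.get('RAYON_NUM_THREADS', 'unset'))"]
print(run_with_threads(probe, threads=1).stdout.strip())   # 1
print(run_with_threads(probe).stdout.strip())              # unset
```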
What this is, and is not
Per-iteration fit time, not time to convergence (the two engines may need a
different number of iterations to converge). One machine, synthetic corpora.
R stm is single-threaded by design. The all-cores table reflects topica's
automatic parallelism, which is the speed you actually get.
LDA: MALLET's algorithm without the JVM
topica's LDA is MALLET's SparseLDA collapsed-Gibbs sampler, reproduced
bit-for-bit (it matches MALLET's train output exactly). Compared with R, JVM MALLET,
and pure-Python gensim, that means a large speedup and no JVM startup cost.
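This is where SparseLDA's speed comes from. Take a token of word w in document d. Its collapsed-Gibbs conditional is p(z = k) ∝ (α_k + n_dk)(β + n_wk) / (βV + n_k), and that expression splits exactly into three buckets:

- a smoothing bucket, s = α_k β / (βV + n_k);
- a document bucket, r = n_dk β / (βV + n_k), which is nonzero only for topics already in the document;
- a word bucket, q = (α_k + n_dk) n_wk / (βV + n_k), which is nonzero only for topics the word already has.

The NumPy sketch below illustrates that split for a single token. It is not MALLET's or topica's code. A real sampler caches s and updates the buckets incrementally rather than rebuilding them per token.

```python
import numpy as np


def buckets(alpha, beta, V, n_dk, n_wk, n_k):
    """Split the collapsed-Gibbs conditional into SparseLDA's s, r, q buckets.
    All counts exclude the token being resampled."""
    denom = beta * V + n_k
    s = alpha * beta / denom                       # dense, but cacheable across tokens
    doc_topics = np.flatnonzero(n_dk)              # topics present in this document
    r = n_dk[doc_topics] * beta / denom[doc_topics]
    word_topics = np.flatnonzero(n_wk)             # topics this word type currently has
    q = (alpha[word_topics] + n_dk[word_topics]) * n_wk[word_topics] / denom[word_topics]
    return s, (doc_topics, r), (word_topics, q)


def draw_topic(rng, alpha, beta, V, n_dk, n_wk, n_k):
    s, (doc_topics, r), (word_topics, q) = buckets(alpha, beta, V, n_dk, n_wk, n_k)
    u = rng.uniform(0.0, s.sum() + r.sum() + q.sum())
    # q usually carries most of the mass, so it is checked first, then r, then s.
    for topics, weights in ((word_topics, q), (doc_topics, r), (np.arange(s.size), s)):
        if weights.size == 0:
            continue
        cdf = np.cumsum(weights)
        if u < cdf[-1]:
            return int(topics[np.searchsorted(cdf, u, side="right")])
        u -= cdf[-1]
    return s.size - 1                              # reached only through round-off


# Example: K = 5 topics, V = 1,000 word types, beta = 0.01.
alpha = np.full(5, 0.1)
n_dk = np.array([3, 0, 0, 7, 0])
n_wk = np.array([0, 0, 4, 2, 0])
n_k = np.array([500, 800, 300, 650, 900])
rng = np.random.default_rng(0)
print(draw_topic(rng, alpha, 0.01, 1_000, n_dk, n_wk, n_k))
```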
Against tomotopy, a C++/SIMD library in the same performance tier, plain LDA is a wash: which one wins depends on threading. Both tables below use 200 Gibbs iterations and report fit time only.
Single core (exact, num_threads=1 / workers=1): tomotopy's tighter inner
loop puts it about 20% ahead (topica takes 16 to 26% longer).
| docs | vocab | K | topica | tomotopy |
|---|---|---|---|---|
| 2,000 | 1,000 | 20 | 1.84s | 1.59s |
| 5,000 | 2,000 | 50 | 6.58s | 5.37s |
| 10,000 | 3,000 | 50 | 14.2s | 11.3s |
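All cores (both use approximate parallel Gibbs): topica's document-partitioned parallelism gets more out of the extra cores. It runs 3.8× to 5.0× faster than its own single-core time, versus 2.2× to 4.1× for tomotopy. So topica is ahead at 2,000 and 5,000 docs and roughly level at 10,000, where tomotopy is slightly faster.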
| docs | vocab | K | topica | tomotopy |
|---|---|---|---|---|
| 2,000 | 1,000 | 20 | 0.49s | 0.73s |
| 5,000 | 2,000 | 50 | 1.54s | 1.82s |
| 10,000 | 3,000 | 50 | 2.83s | 2.73s |
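In its document-partitioned form, approximate parallel Gibbs works like this. Each worker sweeps its own documents against a private, slightly stale copy of the shared word-topic counts. Afterwards, every worker's count changes are summed back into the shared state. The merged totals are exact. The approximation is that each worker sampled without seeing the others' moves during the sweep.

The sketch below is in the style of AD-LDA, with workers simulated one after another. The sharding, merge schedule, and toy corpus are assumptions, not topica's or tomotopy's internals.

```python
import numpy as np


def gibbs_sweep(rng, docs, z, n_dk, n_wk, n_k, alpha, beta):
    """One collapsed-Gibbs sweep over `docs`, updating the given counts in place."""
    V = n_wk.shape[0]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            # Remove the current token from every count it contributes to.
            n_dk[d, k] -= 1
            n_wk[w, k] -= 1
            n_k[k] -= 1
            p = (alpha + n_dk[d]) * (beta + n_wk[w]) / (beta * V + n_k)
            k = rng.choice(p.size, p=p / p.sum())
            z[d][i] = k
            # Add it back under the newly sampled topic.
            n_dk[d, k] += 1
            n_wk[w, k] += 1
            n_k[k] += 1


def parallel_sweep(rng, shards, n_wk, n_k, alpha, beta):
    """Approximate sweep: every shard samples against the same pre-sweep counts."""
    deltas = []
    for docs, z, n_dk in shards:                     # concurrent in a real system
        local_wk, local_k = n_wk.copy(), n_k.copy()  # private copy, stale by design
        gibbs_sweep(rng, docs, z, n_dk, local_wk, local_k, alpha, beta)
        deltas.append((local_wk - n_wk, local_k - n_k))
    for d_wk, d_k in deltas:                         # merge: totals come out exact
        n_wk += d_wk
        n_k += d_k


def init_shard(rng, docs, K, n_wk, n_k):
    """Random initial topics for one shard; adds its tokens to the global counts."""
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    n_dk = np.zeros((len(docs), K), dtype=np.int64)
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d, k] += 1
            n_wk[w, k] += 1
            n_k[k] += 1
    return docs, z, n_dk


rng = np.random.default_rng(0)
K, V = 4, 30
corpus = [list(rng.integers(V, size=20)) for _ in range(40)]    # toy corpus of word ids
n_wk = np.zeros((V, K), dtype=np.int64)
n_k = np.zeros(K, dtype=np.int64)
shards = [init_shard(rng, corpus[i::2], K, n_wk, n_k) for i in range(2)]   # two "workers"
for _ in range(10):
    parallel_sweep(rng, shards, n_wk, n_k, alpha=np.full(K, 0.1), beta=0.01)
```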
We report this straight: for plain LDA the two are interchangeable on speed. topica's advantage is the STM, covariate-effect, and diagnostics stack built around the sampler, not raw LDA throughput.
keyATM vs R keyATM
topica's keyATM reproduces the R package's keyword-assisted model and is
validated against it: the same keyword topics,
the same per-sweep asymmetric-α estimation, the same model_fit log-likelihood.
On speed it matches R's C++ sampler single-threaded and adds a
document-partitioned parallel sweep that R has no equivalent of. Same keywords,
same number of Gibbs sweeps, α learned each sweep on both sides; fit time only.
| docs | vocab | K | sweeps | topica (1 core) | topica (4 cores) | R keyATM |
|---|---|---|---|---|---|---|
| 2,000 | 2,632 | 10 | 1,000 | 25.9s | 12.1s | 24.5s |
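keyATM re-estimates its asymmetric α every sweep with a slice sampler. The sketch below is an illustration only. It applies a univariate slice sampler, with stepping out and shrinkage, to each α_k in log space. Candidates are scored with the Dirichlet-multinomial likelihood of the document-topic counts plus, as an assumption, a Gamma(1, 1) prior. keyATM's actual prior, bounds, and update order may differ, and none of these function names come from topica.

```python
import math
import random


def log_post(log_a, k, alpha, counts, shape=1.0, rate=1.0):
    """Log posterior of log(alpha[k]) with the other alphas held fixed.

    Dirichlet-multinomial likelihood of the per-document topic counts plus an
    assumed Gamma(shape, rate) prior on alpha[k]; terms that do not depend on
    alpha[k] are dropped.
    """
    a = math.exp(log_a)
    total = sum(alpha) - alpha[k] + a
    ll = 0.0
    for row in counts:                        # one list of topic counts per document
        ll += math.lgamma(total) - math.lgamma(total + sum(row))
        ll += math.lgamma(a + row[k]) - math.lgamma(a)
    log_prior = (shape - 1.0) * log_a - rate * a
    return ll + log_prior + log_a             # + log_a: Jacobian of alpha = exp(log_a)


def slice_sample(f, x0, rng, width=1.0, max_steps=50):
    """One univariate slice-sampling update: stepping out, then shrinkage."""
    log_y = f(x0) - rng.expovariate(1.0)      # log of a uniform height under exp(f(x0))
    left = x0 - width * rng.random()
    right = left + width
    for _ in range(max_steps):                # step out until both ends leave the slice
        if f(left) < log_y:
            break
        left -= width
    for _ in range(max_steps):
        if f(right) < log_y:
            break
        right += width
    while True:                               # shrink toward x0 until a point lands inside
        x1 = rng.uniform(left, right)
        if f(x1) >= log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1


def update_alpha(alpha, counts, rng):
    """One per-sweep pass: slice-sample each alpha_k in turn."""
    alpha = list(alpha)
    for k in range(len(alpha)):
        def target(log_a, k=k):
            return log_post(log_a, k, alpha, counts)
        alpha[k] = math.exp(slice_sample(target, math.log(alpha[k]), rng))
    return alpha


rng = random.Random(0)
counts = [[30, 5, 1], [25, 8, 0], [40, 2, 3]]    # toy counts: 3 documents, K = 3
alpha = [1.0, 1.0, 1.0]
for _ in range(200):                             # stands in for 200 Gibbs sweeps
    alpha = update_alpha(alpha, counts, rng)
print([round(a, 3) for a in alpha])
```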
So topica is roughly at parity with R single-threaded (25.9s vs 24.5s) and about 2× faster on four
cores. If you do not need the R-matching asymmetric prior, set estimate_alpha=False.
This fixes a symmetric α and skips the per-sweep slice sampler, cutting fit time by a further
15 to 20% (more at larger K). This row, with the STM and LDA comparisons above, is
reproducible in one command:
Large-K sampling: SparseLDA vs LightLDA
LightLDA's O(1)-per-token alias sampler is built for very large K. At the corpus sizes
typical of social science, SparseLDA stays faster because its buckets remain
sparse: per-token work grows mostly with the number of topics present in the document
and assigned to the word, rather than with K. LightLDA's flatter scaling in K only
pulls ahead past roughly K = 1,000. Use the default sampler="sparse" unless you have
a specific large-K reason.
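The alias table behind LightLDA's O(1) draws works as follows. Building it for a K-outcome distribution costs O(K). After that, each draw is one uniform column pick plus one coin flip, regardless of K. The catch is that the table goes stale as counts change, which is why LightLDA pairs it with Metropolis-Hastings corrections. The code below is a sketch of Vose's alias method, a generic construction rather than LightLDA's or topica's code.

```python
import random


def build_alias(probs):
    """Vose's alias method: O(K) setup for a discrete distribution over K outcomes."""
    k = len(probs)
    total = sum(probs)
    scaled = [p * k / total for p in probs]   # mean of scaled is exactly 1
    prob, alias = [0.0] * k, [0] * k
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        sm, big = small.pop(), large.pop()
        prob[sm], alias[sm] = scaled[sm], big
        scaled[big] -= 1.0 - scaled[sm]       # big donates the mass that fills sm's column
        (small if scaled[big] < 1.0 else large).append(big)
    for i in small + large:                   # leftovers equal 1 up to round-off
        prob[i] = 1.0
    return prob, alias


def alias_draw(prob, alias, rng):
    """O(1) draw: pick a column uniformly, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


weights = [0.5, 0.3, 0.15, 0.05]
prob, alias = build_alias(weights)
rng = random.Random(0)
draws = [alias_draw(prob, alias, rng) for _ in range(100_000)]
print([round(draws.count(k) / len(draws), 3) for k in range(4)])
```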
Coherence
c_v and the other windowed coherence measures are computed in the Rust core,
counting only the word pairs within a topic's top-N rather than a full
vocabulary-by-vocabulary matrix. A 500-topic c_v that took minutes in a
pure-Python loop now takes a fraction of a second, which is what makes coherence
practical for model selection at large K.
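Here is a minimal pure-Python sketch of the idea. It counts boolean sliding windows, but only for the topic's top-N words, then averages pairwise NPMI. This is the simpler C_NPMI measure rather than full c_v, which adds an indirect cosine over the same NPMI values. The window size, epsilon, and short-document handling are assumptions, so its numbers will not match topica's or gensim's exactly.

```python
import math
from itertools import combinations


def windowed_counts(docs, top_words, window=10):
    """Boolean sliding-window counts restricted to one topic's top-N words.

    Only those N words are tracked, so the pair table holds at most N*(N-1)/2
    entries instead of a vocabulary-by-vocabulary matrix.
    """
    top = set(top_words)
    n_windows = 0
    single = {w: 0 for w in top}
    pair = {}
    for doc in docs:
        for start in range(max(1, len(doc) - window + 1)):   # a short doc is one window
            present = sorted(top.intersection(doc[start:start + window]))
            n_windows += 1
            for w in present:
                single[w] += 1
            for a, b in combinations(present, 2):
                pair[(a, b)] = pair.get((a, b), 0) + 1
    return n_windows, single, pair


def npmi_coherence(docs, top_words, window=10, eps=1e-12):
    """Mean pairwise NPMI over a topic's top-N words (C_NPMI, not full c_v)."""
    n, single, pair = windowed_counts(docs, top_words, window)
    scores = []
    for a, b in combinations(sorted(set(top_words)), 2):
        p_a, p_b = single[a] / n, single[b] / n
        p_ab = pair.get((a, b), 0) / n
        if p_a == 0 or p_b == 0:
            scores.append(0.0)                # a word never seen: no evidence either way
        elif p_ab >= 1.0:
            scores.append(1.0)                # both words in every window
        else:
            pmi = math.log((p_ab + eps) / (p_a * p_b))
            scores.append(pmi / -math.log(p_ab + eps))
    return sum(scores) / len(scores)


docs = [["tax", "budget", "deficit", "vote"], ["budget", "deficit", "spending"],
        ["court", "judge", "ruling"], ["tax", "vote", "election"]]
print(round(npmi_coherence(docs, ["budget", "deficit", "tax"], window=3), 3))
```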