Skip to content

Dynamic keyATM: U.S. Supreme Court opinions

Source. Eshima, S., Imai, K., & Sasaki, T. (2024). Keyword-Assisted Topic Models. American Journal of Political Science, 68(2), 730–750. doi:10.1111/ajps.12779. Replication data: Harvard Dataverse, doi:10.7910/DVN/RKNNVL.

The paper's third application fits a dynamic keyATM to U.S. Supreme Court opinions and shows that anchoring topics with keywords yields more interpretable topics and better document classification than weighted LDA, while a hidden Markov model tracks how topic prevalence shifts over time. We refit that analysis with topica's KeyATM and compare against the paper's published tables.

The corpus is 17,245 opinions from 1946 to 2012, drawn from the Supreme Court Database, with a human-coded primary topic per opinion and a keyword list per topic. We use the same 14 topics, a 5-state HMM (the paper's choice), and the default information-theory token weighting.

Fit

import topica

# `docs` are the tokenized opinions; `years` is the filing year of each opinion;
# `seeds` maps each of the 14 topic labels to its keyword list.
model = topica.KeyATM(seeds, num_topics=14, seed=2020)
model.fit(docs, timestamps=years, num_states=5, iters=1000, num_threads=8)

The documents are sorted by year internally, so doc_topic returns in the original order and time_prevalence follows time_labels (1946 … 2012).

Same data

Before comparing model output, we confirm we are on the same corpus. topica's gold-label counts match the paper's Table 5 exactly, every topic:

Topic topica count paper Table 5
Criminal procedure 4,268 4,268
Economic activity 3,062 3,062
Civil rights 2,855 2,855
Judicial power 1,964 1,964
First amendment 1,795 1,795
Due process 738 738
Federalism 720 720
Unions 664 664

Topic interpretability (paper Table 6)

The estimated top words match the paper's keyATM column nearly word for word:

Topic topica paper keyATM
Criminal procedure trial jury defendant evidence criminal sentence petitioner conviction trial jury defendant evidence criminal sentence petitioner right conviction counsel
First amendment public amendment first speech government religious right may public amendment first speech government may interest right political religious
Unions employee union labor employer board agreement employment contract employee union labor employer board agreement party contract employment bargaining
Federal taxation tax property income pay payment amount fund benefit tax property benefit income interest …

Our Privacy topic (search officer police arrest warrant amendment fourth seizure) is a clean Fourth-Amendment cluster, if anything tighter than the paper's.

Document classification (paper Figure 4)

The paper measures classification by the area under the ROC curve for each topic's posterior topic proportion predicting the human label, and reports that keyATM beats weighted LDA on every topic except Privacy. topica's per-topic AUROC averages 0.79 (median 0.80), and Privacy is the lone weak topic at 0.57, the same exception the paper singles out and attributes to uninformative Privacy keywords.

Strong (AUROC ≥ 0.88) Middle Weak
Unions 0.98, Federal taxation 0.98, First amendment 0.95, Miscellaneous 0.93, Criminal procedure 0.90, Attorneys 0.89 Judicial power 0.82, Civil rights 0.79, Interstate relations 0.75 Due process 0.67, Federalism 0.66, Economic activity 0.62, Privacy 0.57

Single-label classification accuracy is 0.43, about six times the 0.07 chance rate over 14 topics. (Private action has only three labeled opinions and carries no signal, so its AUROC is not meaningful.)

Direct comparison with R keyATM

Matching the published tables is one bar. A stronger one is matching the reference software's own output. We fit R's keyATM and topica's KeyATM on the same documents with the same settings (14 topics, 5 states, 300 sweeps, seed 2020, information-theory weighting), topica single-threaded so both run the exact sequential sampler, and compare directly. Because keywords pin each topic to a fixed label in both engines, topic k is the same topic in both, so we compare them position by position.

The two are different samplers in different languages with different random number streams, so they cannot agree bit for bit. They agree to the level that matters:

  • Topic word distributions. Mean cosine 0.93 across the 14 topics (median 0.93, minimum 0.86). The estimated topics are the same.
  • Classification. Mean AUROC 0.82 in R and 0.82 in topica, with a mean absolute per-topic difference of 0.017. The largest gap, 0.055, is on Interstate relations, which has only 119 labeled opinions.

This is what "topica reproduces keyATM" means in practice: hand both engines the same corpus and they return the same topics and the same classification performance.

Time trend

The HMM partitions 1946–2012 into regimes and recovers the expected movements in the docket: Judicial power falls steeply across the period, Economic activity and Unions rise, and Criminal procedure and First amendment decline. time_prevalence gives the smoothed proportion per year and time_state the regime each year occupies.

What differs

Two differences are worth stating plainly. First, this is a single chain of 1,000 sweeps, where the paper averages five chains of 3,000; the topics and the AUROC ordering are stable, but the exact numbers will move at the third decimal across seeds. Second, the change-point HMM makes prevalence piecewise-constant within a regime, so a year-by-year curve has steps at the estimated change points rather than the fully smooth path some readers may expect.

Speed

Per-sweep cost on this corpus (22.4M weighted tokens, 14 topics, 5 states), measured at steady state:

Engine s/sweep vs R
R keyATM (single-thread) 1.8 1.0×
topica (single-thread) 1.7 1.05× faster
topica (8 threads) 0.47 3.8× faster

topica matches R single-threaded and pulls ahead with threads. The single-thread parity came from one change: the sampler had been probing a per-topic hash map for every keyword topic on every token, hundreds of millions of lookups per sweep, which an inverted word-to-topic index removes without changing the sampled distribution. R's keyATM has no multi-threading, so the 8-thread column is the margin available on a multi-core machine.