faSTM does not do its own tokenization — it reads an already-prepared
document-term representation from the tools the field already uses
(quanteda, tidytext) or a plain sparse matrix. as_corpus() normalizes
any of these into the structure stm() consumes, dropping empty documents
and re-indexing the vocabulary, with metadata kept aligned.
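
The normalization described above can be sketched in a few lines of R. This is not faSTM's implementation, only a minimal illustration of the same idea using the Matrix package: drop documents with no tokens, drop terms that no remaining document uses (so term indices are renumbered), and subset the metadata with the same row mask so it stays aligned. The toy matrix, the `year` column, and all variable names are assumptions made up for the example.

```r
library(Matrix)

# Toy document-term matrix: 4 documents (rows) x 5 terms (columns).
# "doc3" has no tokens and the term "gamma" is never used.
dtm <- sparseMatrix(
  i = c(1, 1, 2, 4, 4),
  j = c(1, 2, 2, 4, 5),
  x = c(2, 1, 3, 1, 1),
  dims = c(4, 5),
  dimnames = list(
    paste0("doc", 1:4),
    c("alpha", "beta", "gamma", "delta", "epsilon")
  )
)

# Hypothetical metadata, one row per document, in the same order as dtm.
meta <- data.frame(
  year = c(2001, 2002, 2003, 2004),
  row.names = rownames(dtm)
)

# Keep only documents with at least one token and terms with at least one use.
keep_docs  <- unname(rowSums(dtm) > 0)
keep_terms <- unname(colSums(dtm) > 0)

dtm_clean  <- dtm[keep_docs, keep_terms, drop = FALSE]

# Apply the same row mask to the metadata so rows stay aligned with documents.
meta_clean <- meta[keep_docs, , drop = FALSE]

# Column positions in dtm_clean now act as the re-indexed vocabulary.
vocab <- colnames(dtm_clean)
```

With faSTM itself, the corresponding call would presumably be `as_corpus(dtm, meta = meta)`, following the arguments documented below; the exact structure it returns is not shown in this section.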
Arguments
- x
A quanteda dfm, a document-term Matrix/matrix (documents in rows, terms in columns, with colnames), or an existing faSTM_corpus. For a tidy (long) term table use from_tidy().
- meta
Optional data.frame of document metadata, one row per document, aligned to x. For a dfm, defaults to quanteda::docvars(x).
- ...
Unused.