R package for (almost) language-agnostic sentence tokenization
sentenceR is a language-agnostic utility for sentence tokenization of raw text. Using the UDPipe POS tagging pipeline, the package automatically extracts sentences together with their indexes (hence the “crowbar” logo, a reference to extraction). The package works with any of the 100+ language models natively provided by the UDPipe package (for more information and installation instructions, see the GitHub repository).
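The underlying idea can be sketched with the udpipe package directly. This is a minimal illustration, not sentenceR's own interface (which is documented in its repository): the example text and the `doc1` identifier are made up, and the first call downloads an English model, so it needs network access.

```r
library(udpipe)

# Download and load a pretrained model; any language supported by udpipe works here.
dl <- udpipe_download_model(language = "english")
model <- udpipe_load_model(file = dl$file_model)

# Hypothetical input text, named so that the name becomes the document id.
txt <- c(doc1 = "This is the first sentence. Here is a second one. And a third.")

# Only tokenization and sentence splitting are needed, so tagging and parsing are skipped.
ann <- udpipe_annotate(model, x = txt, doc_id = names(txt),
                       tagger = "none", parser = "none")
tokens <- as.data.frame(ann)

# The annotation has one row per token; keep one row per sentence with its index.
sentences <- unique(tokens[, c("doc_id", "sentence_id", "sentence")])
print(sentences)
```

Swapping the `language` argument for another model name is all that changes between languages, which is what makes this approach (almost) language-agnostic.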
R package for bootstrapping wordscores models
bws is a bootstrapping utility designed for stabilizing scaling scores across different reference documents. Built on top of quanteda’s wordscores function, the package automatically scales multiple wordscores models using user-defined pairs of reference documents and averages the results as stabilized scaling scores (for more information and installation instructions, see the GitHub repository).
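The averaging logic described above can be sketched in base R. This is an illustration under stated assumptions, not the bws interface: the toy count matrix, the `wordscores()` helper, the document names, and the reference scores of -1 and 1 are all hypothetical, and the helper is a plain reimplementation of the basic wordscores formula rather than a call to quanteda.

```r
# Toy document-feature matrix (hypothetical counts): rows are documents, columns are words.
dfm <- matrix(
  c(10, 5, 1,  0,
     8, 6, 2,  1,
     1, 2, 6,  9,
     0, 1, 5, 10,
     4, 4, 4,  4),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("ref_a", "ref_b", "ref_c", "ref_d", "virgin"),
                  c("w1", "w2", "w3", "w4"))
)

# Basic wordscores for one set of reference documents (hypothetical helper).
# ref_scores is a named vector: names are reference documents, values their known positions.
wordscores <- function(dfm, ref_scores) {
  refs <- dfm[names(ref_scores), , drop = FALSE]
  rel_freq <- refs / rowSums(refs)              # share of each word within each reference text
  p_ref <- t(t(rel_freq) / colSums(rel_freq))   # P(reference | word)
  word_score <- colSums(p_ref * ref_scores)     # expected position of each word
  doc_freq <- dfm / rowSums(dfm)                # word shares within every document
  drop(doc_freq %*% word_score)                 # document score = weighted mean of word scores
}

# User-defined pairs of reference documents: first element scored -1, second scored 1.
pairs <- list(c("ref_a", "ref_c"), c("ref_a", "ref_d"),
              c("ref_b", "ref_c"), c("ref_b", "ref_d"))

# Fit one model per pair and collect the score of the unscored document.
scores_by_pair <- sapply(pairs, function(p) {
  wordscores(dfm, setNames(c(-1, 1), p))["virgin"]
})

# Averaging across pairs gives the stabilized score.
stabilized <- mean(scores_by_pair)
print(scores_by_pair)
print(stabilized)
```

Because each model depends on a single pair of reference texts, individual scores can shift noticeably from pair to pair; averaging over several pairs dampens that dependence on any one choice.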