Package-level declarations

Types

A default cluster TextTokenizer that uses the best tokenization capabilities available on each Android version.

@RequiresApi(value = 24)
class IcuTextTokenizer(language: Language?, unit: TextUnit) : Tokenizer<String, IntRange>

Implementation of a TextTokenizer that uses ICU components to perform the actual tokenization while taking language-specific rules into account.

A naive Tokenizer relying on java.text.BreakIterator to split the content.
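
As a rough illustration of the approach described above, the following sketch walks the boundaries reported by java.text.BreakIterator and collects word ranges. The function name `naiveWordRanges`, the `Locale.ROOT` default, and the letter-or-digit filter are assumptions made for this example; this is not the library's implementation.

```kotlin
import java.text.BreakIterator
import java.util.Locale

// Hypothetical helper for illustration only; not part of the documented API.
fun naiveWordRanges(text: String, locale: Locale = Locale.ROOT): List<IntRange> {
    val iterator = BreakIterator.getWordInstance(locale)
    iterator.setText(text)

    val ranges = mutableListOf<IntRange>()
    var start = iterator.first()
    var end = iterator.next()
    while (end != BreakIterator.DONE) {
        // BreakIterator also reports whitespace and punctuation segments,
        // so keep only segments that contain a letter or a digit.
        if ((start until end).any { text[it].isLetterOrDigit() }) {
            ranges += start until end
        }
        start = end
        end = iterator.next()
    }
    return ranges
}

fun main() {
    val text = "Hello, world!"
    for (range in naiveWordRanges(text)) {
        println("$range -> ${text.substring(range)}")
    }
}
```

Returning ranges rather than substrings keeps each token tied to its position in the original String, which matches the `IntRange` token type shown in the signatures on this page.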

A tokenizer splitting a String into range tokens (e.g. words, sentences, etc.).

A text token unit which can be used with a TextTokenizer.

fun interface Tokenizer<D, T>

A tokenizer splits a piece of data D into a list of T tokens.
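
Because Tokenizer is declared as a `fun interface`, an implementation can be written as a lambda. The sketch below declares a stand-in interface so that it compiles on its own; the method name `tokenize` and the `whitespaceTokenizer` value are assumptions for this example, since this page does not show the interface's members.

```kotlin
// Stand-in for the documented `fun interface Tokenizer<D, T>`.
// The method name `tokenize` is an assumption; the page does not list it.
fun interface Tokenizer<D, T> {
    fun tokenize(data: D): List<T>
}

// Hypothetical tokenizer: every run of non-whitespace characters becomes
// one token, reported as its range in the input String.
val whitespaceTokenizer = Tokenizer<String, IntRange> { text ->
    Regex("\\S+").findAll(text).map { it.range }.toList()
}

fun main() {
    val text = "Hello tokenizer world"
    for (range in whitespaceTokenizer.tokenize(text)) {
        println("$range -> ${text.substring(range)}")
    }
}
```

Keeping D and T generic lets the same contract describe tokenizers over other data and token types, while the text tokenizers on this page fix them to String and IntRange.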