//readium-shared/org.readium.r2.shared.util.tokenizer
Package-level declarations
Types
Name | Summary |
---|---|
DefaultTextContentTokenizer | [androidJvm] `class DefaultTextContentTokenizer : Tokenizer<String, IntRange>` A default cluster TextTokenizer that takes advantage of the best capabilities of each Android version. |
IcuTextTokenizer | [androidJvm] `@RequiresApi(value = 24) class IcuTextTokenizer(language: Language?, unit: TextUnit) : Tokenizer<String, IntRange>` An implementation of TextTokenizer that uses ICU components to perform the actual tokenization, taking language-specific rules into account. |
NaiveTextTokenizer | [androidJvm] `class NaiveTextTokenizer(unit: TextUnit) : Tokenizer<String, IntRange>` A naive Tokenizer relying on java.text.BreakIterator to split the content. |
TextTokenizer | [androidJvm] `typealias TextTokenizer = Tokenizer<String, IntRange>` A tokenizer that splits a String into range tokens (e.g. words or sentences). |
TextUnit | [androidJvm] `enum TextUnit : Enum<TextUnit>` A text token unit that can be used with a TextTokenizer. |
Tokenizer | [androidJvm] `fun interface Tokenizer<D, T>` A tokenizer splits a piece of data D into a list of T tokens. |
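
To illustrate how a `Tokenizer<String, IntRange>` can turn a String into range tokens, here is a minimal, self-contained sketch. It does not use the Readium classes: the `Tokenizer` interface below is a local stand-in that mirrors the documented shape, the method name `tokenize` is an assumption, and `wordTokenizer` is a hypothetical helper built on `java.text.BreakIterator` (the class the NaiveTextTokenizer summary mentions). The library's actual implementations may differ.

```kotlin
import java.text.BreakIterator
import java.util.Locale

// Local stand-in mirroring the documented `fun interface Tokenizer<D, T>`.
// The method name `tokenize` is an assumption, not taken from the library.
fun interface Tokenizer<D, T> {
    fun tokenize(data: D): List<T>
}

// Same alias shape as documented: a String split into IntRange tokens.
typealias TextTokenizer = Tokenizer<String, IntRange>

// Hypothetical helper: splits text into word ranges with java.text.BreakIterator.
fun wordTokenizer(locale: Locale = Locale.ROOT): TextTokenizer =
    Tokenizer<String, IntRange> { text ->
        val iterator = BreakIterator.getWordInstance(locale)
        iterator.setText(text)

        val ranges = mutableListOf<IntRange>()
        var start = iterator.first()
        var end = iterator.next()
        while (end != BreakIterator.DONE) {
            // BreakIterator also reports whitespace and punctuation segments;
            // keep only segments that contain at least one letter or digit.
            if (text.substring(start, end).any { it.isLetterOrDigit() }) {
                ranges += start until end
            }
            start = end
            end = iterator.next()
        }
        ranges
    }

fun main() {
    val text = "Hello, world!"
    val tokens = wordTokenizer().tokenize(text)
    // Each token is a range into the original string, so the text itself is never copied
    // until a caller asks for the substring.
    println(tokens.map { text.substring(it) })
}
```

Returning `IntRange` values rather than substrings keeps each token tied to its position in the source text, which is useful when a caller needs to highlight or locate the token afterwards.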