//readium-shared/org.readium.r2.shared.util.tokenizer

Package-level declarations

Types

Name Summary
DefaultTextContentTokenizer [androidJvm]
class DefaultTextContentTokenizer : Tokenizer<String, IntRange>
A default cluster TextTokenizer taking advantage of the best capabilities of each Android version.
IcuTextTokenizer [androidJvm]
@RequiresApi(value = 24)
class IcuTextTokenizer(language: Language?, unit: TextUnit) : Tokenizer<String, IntRange>
Implementation of a TextTokenizer that uses ICU components to perform the actual tokenization, taking language-specific rules into account.
NaiveTextTokenizer [androidJvm]
class NaiveTextTokenizer(unit: TextUnit) : Tokenizer<String, IntRange>
A naive Tokenizer relying on java.text.BreakIterator to split the content.
TextTokenizer [androidJvm]
typealias TextTokenizer = Tokenizer<String, IntRange>
A tokenizer splitting a String into range tokens (e.g. words or sentences).
TextUnit [androidJvm]
enum TextUnit : Enum<TextUnit>
A text token unit which can be used with a TextTokenizer.
Tokenizer [androidJvm]
fun interface Tokenizer<D, T>
A tokenizer splits a piece of data D into a list of T tokens.
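
The declarations above can be illustrated with a minimal sketch of how a Tokenizer<String, IntRange> might be implemented on top of java.text.BreakIterator. The interface and typealias are redeclared locally so the snippet is self-contained. The method name `tokenize` is an assumption, since this page does not show the interface body, and `wordTokenizer` is a hypothetical helper, not part of the library.

```kotlin
import java.text.BreakIterator
import java.util.Locale

// Local stand-ins mirroring the signatures listed on this page.
// ASSUMPTION: the single abstract method is called `tokenize`.
fun interface Tokenizer<D, T> {
    fun tokenize(data: D): List<T>
}

typealias TextTokenizer = Tokenizer<String, IntRange>

// Hypothetical helper (not part of the library): returns the index range of
// each word in the text, using java.text.BreakIterator to find boundaries.
fun wordTokenizer(locale: Locale = Locale.ENGLISH): TextTokenizer =
    Tokenizer<String, IntRange> { text ->
        val iterator = BreakIterator.getWordInstance(locale)
        iterator.setText(text)

        val ranges = mutableListOf<IntRange>()
        var start = iterator.first()
        var end = iterator.next()
        while (end != BreakIterator.DONE) {
            // BreakIterator also reports whitespace and punctuation segments;
            // keep only segments containing at least one letter or digit.
            if (text.substring(start, end).any { it.isLetterOrDigit() }) {
                ranges.add(start until end)
            }
            start = end
            end = iterator.next()
        }
        ranges
    }
```

Under these assumptions, `wordTokenizer().tokenize("Hello, world!")` yields the ranges `0..4` and `7..11`, which map back to "Hello" and "world" through `String.substring(range)`. Returning ranges rather than substrings keeps each token tied to its position in the original text.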