readium-shared/org.readium.r2.shared.publication.services.content/TextContentTokenizer

TextContentTokenizer

class TextContentTokenizer(language: Language?, overrideContentLanguage: Boolean = false, contextSnippetLength: Int = 50, textTokenizerFactory: (Language?) -> TextTokenizer) : ContentTokenizer

A ContentTokenizer using a TextTokenizer to split the text of the Content.Element into smaller portions.

Parameters

contextSnippetLength

Length of the before and after context snippets included in the produced Locators.

overrideContentLanguage

If true, language overrides any language information available in the content. If false, language is used only as a default when the content carries no language information of its own.
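The rule described for overrideContentLanguage can be sketched as a small selection function. This is a minimal illustration under stated assumptions, not the library's code: the Language class below is a hypothetical stand-in for the real type, and resolveLanguage is a name invented for this example.

```kotlin
// Hypothetical stand-in for the library's Language type, for illustration only.
data class Language(val code: String)

// Sketch of the documented rule (an assumption about the behavior, not the library's code).
fun resolveLanguage(
    tokenizerLanguage: Language?,
    contentLanguage: Language?,
    overrideContentLanguage: Boolean
): Language? =
    if (overrideContentLanguage) {
        // The tokenizer's language wins, even when the content declares one.
        tokenizerLanguage
    } else {
        // The content's own language wins; the tokenizer's is only a fallback.
        contentLanguage ?: tokenizerLanguage
    }

fun main() {
    val en = Language("en")
    val fr = Language("fr")
    println(resolveLanguage(en, fr, overrideContentLanguage = false))   // prints Language(code=fr)
    println(resolveLanguage(en, fr, overrideContentLanguage = true))    // prints Language(code=en)
    println(resolveLanguage(en, null, overrideContentLanguage = false)) // prints Language(code=en)
}
```

In this sketch, setting overrideContentLanguage to true with a null language yields null, which would leave the choice of language entirely to the tokenizer factory.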

Constructors

TextContentTokenizer

constructor(language: Language?, overrideContentLanguage: Boolean = false, contextSnippetLength: Int = 50, textTokenizerFactory: (Language?) -> TextTokenizer)

constructor(language: Language?, unit: TextUnit, overrideContentLanguage: Boolean = false)

A ContentTokenizer using the default TextTokenizer to split the text of the Content.Element.
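The two constructors differ only in where the TextTokenizer comes from: the first takes a factory of type (Language?) -> TextTokenizer, the second takes a TextUnit and uses the default tokenizer. One plausible way to relate them is a factory built from a unit, sketched below. Every type here is a hypothetical stand-in, the regex-based tokenizer is deliberately naive, and naiveTokenizer and factoryFor are names invented for this example; the real default tokenizer may behave quite differently.

```kotlin
// Hypothetical stand-ins for the library's types, for illustration only.
data class Language(val code: String)
enum class TextUnit { Word, Sentence, Paragraph }

fun interface TextTokenizer {
    // Returns the character ranges of the tokens found in the text.
    fun tokenize(text: String): List<IntRange>
}

// A deliberately naive, language-agnostic tokenizer based on regular expressions.
fun naiveTokenizer(unit: TextUnit): TextTokenizer {
    val pattern = when (unit) {
        TextUnit.Word -> Regex("""\S+""")
        // A sentence starts at a non-space character and runs through its terminators.
        TextUnit.Sentence -> Regex("""[^.!?\s][^.!?]*[.!?]*""")
        TextUnit.Paragraph -> Regex("""[^\n]+""")
    }
    return TextTokenizer { text -> pattern.findAll(text).map { it.range }.toList() }
}

// Builds a factory with the shape expected by the first constructor.
// The language argument is ignored by this naive sketch.
fun factoryFor(unit: TextUnit): (Language?) -> TextTokenizer =
    { _ -> naiveTokenizer(unit) }

fun main() {
    val tokenizer = factoryFor(TextUnit.Sentence)(Language("en"))
    val text = "Hello world. How are you?"
    println(tokenizer.tokenize(text).map { text.substring(it) })
    // prints [Hello world., How are you?]
}
```

A real factory would normally use the Language argument to pick language-aware segmentation rules, which is the reason the factory receives it.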

Functions

tokenize

open override fun tokenize(data: Content.Element): List<Content.Element>
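To illustrate how contextSnippetLength could shape the output of tokenize, the sketch below pairs a token range with its surrounding context. It assumes the length is counted in characters, which this page does not state. Snippet and contextFor are hypothetical names for this example, not part of the library.

```kotlin
// Hypothetical holder for a token and its surrounding context, for illustration only.
data class Snippet(val before: String, val highlight: String, val after: String)

// Extracts at most contextSnippetLength characters on each side of the token range.
// Assumes the range is non-empty and lies within the text.
fun contextFor(text: String, range: IntRange, contextSnippetLength: Int = 50): Snippet {
    val beforeStart = (range.first - contextSnippetLength).coerceAtLeast(0)
    val afterStart = range.last + 1
    val afterEnd = (afterStart + contextSnippetLength).coerceAtMost(text.length)
    return Snippet(
        before = text.substring(beforeStart, range.first),
        highlight = text.substring(range),
        after = text.substring(afterStart, afterEnd)
    )
}

fun main() {
    val text = "Hello world. How are you?"
    // Token ranges as a sentence tokenizer might report them.
    val ranges = listOf(0..11, 13..24)
    ranges.forEach { println(contextFor(text, it, contextSnippetLength = 5)) }
}
```

Clamping both ends keeps the snippets valid for tokens at the very start or end of the text, where less context is available than requested.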