TextContentTokenizer

constructor(language: Language?, overrideContentLanguage: Boolean = false, contextSnippetLength: Int = 50, textTokenizerFactory: (Language?) -> TextTokenizer)

Parameters

contextSnippetLength

Length of the before and after context snippets included in the produced Locators.

overrideContentLanguage

If true, language overrides any language information available in the content. If false, language is used only as a default when the content provides no language information of its own.
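
The two parameters above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the Language class below is a stand-in, and resolveLanguage and contextSnippets are hypothetical helpers introduced only for illustration. The sketch also assumes that a null language never overrides the content language and that snippet length is counted in characters; neither detail is stated in this page.

```kotlin
// Hypothetical stand-in for the library's Language type (assumption).
data class Language(val code: String)

// Sketch of how the language passed to the text tokenizer might be chosen.
// tokenizerLanguage is the constructor's `language` argument;
// contentLanguage is whatever the content itself declares, if anything.
fun resolveLanguage(
    tokenizerLanguage: Language?,
    contentLanguage: Language?,
    overrideContentLanguage: Boolean
): Language? =
    // With override enabled, the constructor language wins when it is non-null.
    // Otherwise the content language is preferred and the constructor language
    // serves only as a fallback.
    tokenizerLanguage.takeIf { overrideContentLanguage }
        ?: contentLanguage
        ?: tokenizerLanguage

// Sketch of the before/after context kept around a token.
// Each snippet holds at most contextSnippetLength characters (assumed unit).
fun contextSnippets(
    text: String,
    range: IntRange,
    contextSnippetLength: Int = 50
): Pair<String, String> {
    val before = text.substring(0, range.first).takeLast(contextSnippetLength)
    val after = text.substring(range.last + 1).take(contextSnippetLength)
    return before to after
}

fun main() {
    val en = Language("en")
    val fr = Language("fr")

    // Content declares French; the tokenizer was created with English.
    println(resolveLanguage(en, fr, overrideContentLanguage = false)) // Language(code=fr)
    println(resolveLanguage(en, fr, overrideContentLanguage = true))  // Language(code=en)
    // No language in the content: the constructor language is the default.
    println(resolveLanguage(en, null, overrideContentLanguage = false)) // Language(code=en)

    // Token "quick" occupies indices 4..8 of the text.
    println(contextSnippets("The quick brown fox", 4..8, contextSnippetLength = 3)) // (he ,  br)
}
```

A larger contextSnippetLength gives each Locator more surrounding text at the cost of longer strings per token. Leaving overrideContentLanguage at false fits mixed-language publications in which individual elements carry their own language tags.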


constructor(language: Language?, unit: TextUnit, overrideContentLanguage: Boolean = false)

A ContentTokenizer using the default TextTokenizer to split the text of the Content.Element.