Introduction of a new publication service providing a way to search for a text excerpt within the content of a publication.
Being able to search through a publication’s content is a useful feature, often expected by end users. We can offer a unified API for the wide variety of publication formats supported by Readium to make it easy for reading apps to implement such a feature.
To ensure interoperability, this new Search Service will use the Locator model. This means that a mobile or desktop app could – with the same code – display a search interface for a remote Web Publication, if the Publication Server implements the proper Search Web Service.
Search can be implemented in many different ways, so being able to switch implementations without touching the UX layer would be valuable. For example, a reading app might want to use a full-text search database to improve search performance and search across multiple publications in the user's bookshelf.
To begin a search interaction, call `Publication::search()` with a text query. It returns a search iterator which can be used to crawl through `Locator` occurrences with `next()`.
```
let searchIterator: SearchIterator = await publication.search("orange")
let page1: LocatorCollection = await searchIterator.next()
navigator.go(page1.locators[0])
```
The search iterator may also provide the total number of occurrences with `resultCount`. This property is optional because it might not be available with all search algorithms.

```
"Found ${searchIterator.resultCount} occurrences"
```
A plain search can be an expensive operation. To keep resource usage under control, search results are paginated by the search iterator. You can move forward through the pages with the `next()` function, which returns a `LocatorCollection` object.
One of the usual ways to present the results is as a scrollable list of occurrences. You can use `next()` to implement the infinite scroll pattern by loading the next page of results when the user reaches the end of the scroll view.
After reaching the end of the publication, any subsequent call to `next()` will return `null`.
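A minimal sketch of the infinite scroll pattern in TypeScript, assuming simplified `Locator`, `LocatorCollection` and `SearchIterator` shapes modeled on the interfaces described in this proposal. `SearchResultsModel` and `loadMore()` are hypothetical names for a view model a reading app might own: it stops requesting pages once `next()` returns `null`, and falls back to the number of loaded items when `resultCount` is not available.

```typescript
// Simplified shapes modeled on this proposal (assumption, not a toolkit API).
interface Locator {
  href: string;
  text?: { before?: string; highlight?: string; after?: string };
}

interface LocatorCollection {
  locators: Locator[];
}

interface SearchIterator {
  resultCount?: number;
  next(): Promise<LocatorCollection | null>;
  close(): void;
}

// Hypothetical view model backing a scrollable list of search results.
class SearchResultsModel {
  readonly items: Locator[] = [];
  private readonly iterator: SearchIterator;
  private reachedEnd = false;
  private loading = false;

  constructor(iterator: SearchIterator) {
    this.iterator = iterator;
  }

  // Called when the user scrolls to the end of the list. Resolves to false
  // once next() has returned null, i.e. the end of the publication.
  async loadMore(): Promise<boolean> {
    if (this.reachedEnd) return false;
    if (this.loading) return true; // a page is already being fetched
    this.loading = true;
    try {
      const page = await this.iterator.next();
      if (page === null) {
        this.reachedEnd = true;
        return false;
      }
      this.items.push(...page.locators);
      return true;
    } finally {
      this.loading = false;
    }
  }

  // resultCount is optional, so fall back to the number of loaded items.
  get label(): string {
    const count = this.iterator.resultCount;
    return count !== undefined
      ? `Found ${count} occurrences`
      : `Found ${this.items.length} occurrences so far`;
  }
}
```

The `loading` flag guards against a scroll event firing twice while a page is still being fetched, which would otherwise call `next()` concurrently.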
You don’t have any control over the number of items returned in a page. This depends on the implementation of the Search Service used. For example, a full-text search might return a constant number of locators per page, while a plain crawling search could return one page per publication resource.
The Search Service might keep some resources allocated for your search query, such as a cursor. To make sure they are released when the user is done with the search, do not forget to call `close()` on the search iterator.

```
searchIterator.close()
```
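Because forgetting `close()` is easy, an app could wrap each search in a `try`/`finally`. The TypeScript sketch below assumes a minimal `SearchIterator` shape; `withSearchIterator` is a hypothetical helper, not part of the proposed API.

```typescript
interface LocatorCollection {
  locators: { href: string }[];
}

// Minimal iterator shape assumed from this proposal.
interface SearchIterator {
  next(): Promise<LocatorCollection | null>;
  close(): void;
}

// Hypothetical helper: runs `body` with the iterator and always calls close()
// afterwards, even if `body` throws, so that resources held by the Search
// Service (such as a database cursor) are released.
async function withSearchIterator<T>(
  iterator: SearchIterator,
  body: (iterator: SearchIterator) => Promise<T>,
): Promise<T> {
  try {
    return await body(iterator);
  } finally {
    iterator.close();
  }
}
```

For example, `withSearchIterator(iterator, (it) => it.next())` fetches the first page and releases the iterator right away. An app showing an infinite scroll list would instead keep the iterator open and call `close()` when the search screen is dismissed.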
Depending on the search algorithm, the Search Service might be able to offer options to customize how results are found. Query `publication.searchOptions` to know which options are available for the publication.

When searching for a query, you can customize some of the supported search options.
```
let searchIterator = await publication.search("recette kiwi", options: SearchOptions(
    caseSensitive: false,
    language: "fr"
))
```
Each option has an associated value, such as a boolean, which determines its behavior. The options in `publication.searchOptions` hold the default values for the service. If you omit an option from the search query, its default value will be used.

You should adapt the user interface according to the available search options and their default values.
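```
// Show the checkbox only if the option is supported, checked according to its default value.
diacriticCheckbox.visible = publication.searchOptions.diacriticSensitive != null
diacriticCheckbox.checked = publication.searchOptions.diacriticSensitive ?: false
```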
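The rule that an omitted option falls back to the service default can be sketched as a merge. This TypeScript sketch assumes `SearchOptions` properties are optional, with `undefined` in the service options meaning the option is not supported (mirroring the nullable properties of `SearchOptions`). `pick`, `effectiveOptions` and `isSupported` are hypothetical helpers, and silently dropping options the service does not declare is a design choice of this sketch.

```typescript
// Assumed TypeScript shape of SearchOptions; undefined means "not set".
interface SearchOptions {
  caseSensitive?: boolean;
  diacriticSensitive?: boolean;
  wholeWord?: boolean;
  exact?: boolean;
  language?: string;
  regularExpression?: boolean;
}

// Keeps the requested value only when the service declares the option,
// otherwise falls back to the service default.
function pick<T>(serviceDefault: T | undefined, requested: T | undefined): T | undefined {
  if (serviceDefault === undefined) return undefined; // option not supported
  return requested ?? serviceDefault;
}

// Hypothetical helper computing the options actually used for a query.
function effectiveOptions(defaults: SearchOptions, requested: SearchOptions = {}): SearchOptions {
  return {
    caseSensitive: pick(defaults.caseSensitive, requested.caseSensitive),
    diacriticSensitive: pick(defaults.diacriticSensitive, requested.diacriticSensitive),
    wholeWord: pick(defaults.wholeWord, requested.wholeWord),
    exact: pick(defaults.exact, requested.exact),
    language: pick(defaults.language, requested.language),
    regularExpression: pick(defaults.regularExpression, requested.regularExpression),
  };
}

// Whether a UI control for an option should be shown at all.
function isSupported(defaults: SearchOptions, option: keyof SearchOptions): boolean {
  return defaults[option] !== undefined;
}
```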
This new proposal does not impact any existing API. The Kotlin toolkit already provides a search feature implemented with mark.js, but its code lives entirely in the test app, so it is out of scope for the R2 modules. Reading apps are free to keep the implementation using mark.js and ignore the new Search Service.
SearchService

Interface (implements `Publication.Service`)

- `options: SearchOptions`
- `search(query: String, options: SearchOptions? = null) -> SearchIterator`
  - `query`: the text query to search for.
  - If an option is omitted when calling `.search()`, its value is assumed to be the default one.
  - Returns a `SearchIterator` used to crawl through the results, or an error if the search could not be handled.

Publication Helpers

- `isSearchable: Boolean = findService(SearchService::class) != null`
- `searchOptions: SearchOptions = findService(SearchService::class)?.options ?: SearchOptions()`
- `search(query: String, options: SearchOptions? = null) -> SearchIterator = findService(SearchService::class).search(query, options)`
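These helpers boil down to looking up an optional service. Here is a TypeScript sketch with a hypothetical, simplified service registry: `PublicationService`, the `kind` discriminant and the error thrown when no `SearchService` is available are assumptions, not how the toolkits implement `findService`.

```typescript
interface PublicationService {
  readonly kind: string;
}

interface SearchOptions {
  caseSensitive?: boolean;
  language?: string;
}

interface SearchIterator {
  next(): Promise<unknown>;
  close(): void;
}

interface SearchService extends PublicationService {
  readonly kind: "search";
  readonly options: SearchOptions;
  search(query: string, options?: SearchOptions): Promise<SearchIterator>;
}

class Publication {
  private readonly services: PublicationService[];

  constructor(services: PublicationService[]) {
    this.services = services;
  }

  // Stand-in for findService(SearchService::class).
  private findSearchService(): SearchService | undefined {
    return this.services.find((s): s is SearchService => s.kind === "search");
  }

  get isSearchable(): boolean {
    return this.findSearchService() !== undefined;
  }

  // Empty options when the publication is not searchable.
  get searchOptions(): SearchOptions {
    return this.findSearchService()?.options ?? {};
  }

  async search(query: string, options?: SearchOptions): Promise<SearchIterator> {
    const service = this.findSearchService();
    if (service === undefined) {
      throw new Error("This publication is not searchable");
    }
    return service.search(query, options);
  }
}
```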
SearchOptions

Class: Holds the available search options and their current values.

Property | Type | JSON key |
---|---|---|
caseSensitive | Boolean? | case-sensitive |
diacriticSensitive | Boolean? | diacritic-sensitive |
wholeWord | Boolean? | whole-word |
exact | Boolean? | exact |
language | String? | language |
regularExpression | Boolean? | regex |
otherOptions | Map<String, String> | |

Custom options can be declared by a Search Service in `otherOptions`. Such extensions should use a reverse domain name notation (e.g. `com.company.x`) as JSON key to avoid conflicts.
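To exchange options with a Search Web Service, the properties above have to be mapped to their JSON keys. A TypeScript sketch of that mapping, assuming `undefined` marks an unset option; `optionsToJSON` is a hypothetical helper, and custom options in `otherOptions` are copied as-is under their reverse domain name keys.

```typescript
interface SearchOptions {
  caseSensitive?: boolean;
  diacriticSensitive?: boolean;
  wholeWord?: boolean;
  exact?: boolean;
  language?: string;
  regularExpression?: boolean;
  otherOptions?: Record<string, string>;
}

// JSON keys from the SearchOptions table.
const JSON_KEYS = {
  caseSensitive: "case-sensitive",
  diacriticSensitive: "diacritic-sensitive",
  wholeWord: "whole-word",
  exact: "exact",
  language: "language",
  regularExpression: "regex",
} as const;

type KnownOption = keyof typeof JSON_KEYS;

// Hypothetical serializer: unset options are omitted, custom options keep
// their reverse domain name keys (e.g. "com.company.x").
function optionsToJSON(options: SearchOptions): Record<string, boolean | string> {
  const json: Record<string, boolean | string> = { ...options.otherOptions };
  for (const property of Object.keys(JSON_KEYS) as KnownOption[]) {
    const value = options[property];
    if (value !== undefined) {
      json[JSON_KEYS[property]] = value;
    }
  }
  return json;
}
```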
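SearchIterator

Interface: Iterates through search results.

- `resultCount: Int?`: total number of occurrences, if available.
- `next() -> LocatorCollection?`: retrieves the next page of results. Returns `null` when reaching the end of the publication, or an error in case of failure.
- `close()`: releases the resources allocated for the search query.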
LocatorCollection

Object: Represents a sequential list of `Locator` objects. For example, a search result or a list of positions.

- `metadata`
  - `title: LocalizedTitle?` – A user-facing title representing this collection of locators.
  - `numberOfItems: Int?` – Indicates the total number of locators in the collection.
  - `otherMetadata: [String: Any]` – Additional metadata for extensions, as a JSON dictionary.
- `links: [Link]`
- `locators: [Locator]`
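As an illustration of how these types fit together, here is a TypeScript sketch of a `SearchIterator` over results that are all known up front (for example rows returned by a full-text index), served in pages of constant size. `PagedSearchIterator`, the default page size of 20 and reporting the total number of results in `numberOfItems` are assumptions of this sketch.

```typescript
interface Locator {
  href: string;
  type: string;
  text?: { before?: string; highlight?: string; after?: string };
}

interface LocatorCollection {
  metadata: { title?: string; numberOfItems?: number; otherMetadata?: Record<string, unknown> };
  links: { rel: string; href: string; type?: string }[];
  locators: Locator[];
}

interface SearchIterator {
  readonly resultCount?: number;
  next(): Promise<LocatorCollection | null>;
  close(): void;
}

class PagedSearchIterator implements SearchIterator {
  private readonly results: Locator[];
  private readonly pageSize: number;
  private offset = 0;
  private closed = false;

  constructor(results: Locator[], pageSize = 20) {
    if (pageSize < 1) throw new Error("pageSize must be positive");
    this.results = results;
    this.pageSize = pageSize;
  }

  // Known immediately here; other algorithms may not provide it.
  get resultCount(): number {
    return this.results.length;
  }

  async next(): Promise<LocatorCollection | null> {
    if (this.closed) throw new Error("The search iterator is closed");
    if (this.offset >= this.results.length) return null; // end of the publication
    const locators = this.results.slice(this.offset, this.offset + this.pageSize);
    this.offset += locators.length;
    return {
      metadata: { numberOfItems: this.results.length },
      links: [],
      locators,
    };
  }

  close(): void {
    this.closed = true;
  }
}
```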
Example implementations which should be provided by the Readium toolkits.
StringSearchService
A rather naive implementation iterating over each resource of the publication and searching in its sanitized text content.
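A TypeScript sketch of such a naive search, assuming resources have already been reduced to sanitized plain text. It returns one page per resource containing matches, matches case-insensitively with `toLowerCase()` (a simplification that ignores the search options and is not language-aware), and approximates `progression` as the offset of the occurrence divided by the length of the resource text. `TextResource`, `findOccurrences`, `NaiveStringSearchIterator` and the 40-character context length are hypothetical.

```typescript
interface Locator {
  href: string;
  type: string;
  locations: { progression: number };
  text: { before: string; highlight: string; after: string };
}

// A publication resource whose text was already extracted and sanitized.
interface TextResource {
  href: string;
  type: string;
  text: string;
}

function findOccurrences(resource: TextResource, query: string, contextLength: number): Locator[] {
  // Caveat: toLowerCase() can change the length of a few characters (e.g. "İ"),
  // which would shift offsets; a real implementation would rely on ICU.
  const haystack = resource.text.toLowerCase();
  const needle = query.toLowerCase();
  const locators: Locator[] = [];
  if (needle.length === 0) return locators;
  for (let start = haystack.indexOf(needle); start !== -1; start = haystack.indexOf(needle, start + needle.length)) {
    const end = start + needle.length;
    locators.push({
      href: resource.href,
      type: resource.type,
      locations: { progression: start / resource.text.length },
      text: {
        before: resource.text.slice(Math.max(0, start - contextLength), start),
        highlight: resource.text.slice(start, end),
        after: resource.text.slice(end, end + contextLength),
      },
    });
  }
  return locators;
}

// Crawls the resources in reading order, one page per resource with matches.
class NaiveStringSearchIterator {
  private readonly resources: TextResource[];
  private readonly query: string;
  private readonly contextLength: number;
  private index = 0;

  constructor(resources: TextResource[], query: string, contextLength = 40) {
    this.resources = resources;
    this.query = query;
    this.contextLength = contextLength;
  }

  async next(): Promise<{ locators: Locator[] } | null> {
    while (this.index < this.resources.length) {
      const resource = this.resources[this.index++];
      const locators = findOccurrences(resource, this.query, this.contextLength);
      if (locators.length > 0) return { locators };
    }
    return null; // end of the publication
  }

  close(): void {
    this.index = this.resources.length;
  }
}
```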
WebSearchService
A facade to the JSON Web Service described in the following section.
search

Relation | Route | Media type |
---|---|---|
search | /~readium/search{?query} | application/vnd.readium.locators+json |

`query` is the percent-encoded text query to search.
OPTIONS response

When using the `OPTIONS` HTTP method without any query parameters, the server returns the supported search options as a JSON object.

```
OPTIONS https://publication-server.com/search
```

```json
{
  "options": {
    "case-sensitive": false,
    "diacritic-sensitive": false,
    "com.company.regex-type": "perl"
  }
}
```
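A client such as the `WebSearchService` facade could turn this response into `SearchOptions`. A TypeScript sketch, assuming booleans are sent as JSON booleans as in the example above and that any key not listed in the `SearchOptions` table is a custom option; `parseOptionsResponse` is hypothetical.

```typescript
interface SearchOptions {
  caseSensitive?: boolean;
  diacriticSensitive?: boolean;
  wholeWord?: boolean;
  exact?: boolean;
  language?: string;
  regularExpression?: boolean;
  otherOptions: Record<string, string>;
}

// JSON keys of the options defined by SearchOptions; anything else is custom.
const KNOWN_KEYS = ["case-sensitive", "diacritic-sensitive", "whole-word", "exact", "language", "regex"];

function parseOptionsResponse(body: string): SearchOptions {
  const parsed: unknown = JSON.parse(body);
  const raw: Record<string, unknown> =
    typeof parsed === "object" && parsed !== null && "options" in parsed
      ? ((parsed as { options?: Record<string, unknown> }).options ?? {})
      : {};
  const bool = (key: string): boolean | undefined =>
    typeof raw[key] === "boolean" ? (raw[key] as boolean) : undefined;

  // Custom options (reverse domain name keys) are kept as strings.
  const otherOptions: Record<string, string> = {};
  for (const key of Object.keys(raw)) {
    if (KNOWN_KEYS.indexOf(key) === -1) otherOptions[key] = String(raw[key]);
  }

  return {
    caseSensitive: bool("case-sensitive"),
    diacriticSensitive: bool("diacritic-sensitive"),
    wholeWord: bool("whole-word"),
    exact: bool("exact"),
    language: typeof raw["language"] === "string" ? (raw["language"] as string) : undefined,
    regularExpression: bool("regex"),
    otherOptions,
  };
}
```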
GET response

The `GET` HTTP method is used to perform the search. It expects the `query` parameter as well as one additional parameter per customized search option, for example:

```
GET https://publication-server.com/search?query=orange&case-sensitive=1&com.company.regex-type=icu
```

A valid Search Web Service must support integer representations for boolean query options.
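A TypeScript sketch of building such a request URL, encoding booleans as `1`/`0` since integer representations must be supported. `searchRequestURL` is a hypothetical helper; it takes option values already keyed by their JSON names.

```typescript
type OptionValue = boolean | string;

function searchRequestURL(
  endpoint: string,
  query: string,
  options: Record<string, OptionValue> = {},
): string {
  const params = [`query=${encodeURIComponent(query)}`];
  for (const key of Object.keys(options)) {
    const value = options[key];
    // Booleans are sent as integers, which every valid service must accept.
    const encoded = typeof value === "boolean" ? (value ? "1" : "0") : value;
    params.push(`${encodeURIComponent(key)}=${encodeURIComponent(encoded)}`);
  }
  return `${endpoint}?${params.join("&")}`;
}
```

With the options of the example above, this produces exactly `https://publication-server.com/search?query=orange&case-sensitive=1&com.company.regex-type=icu`.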
Status Code | Description | Format |
---|---|---|
200 | Returns the first page of results | LocatorCollection object |
400 | Invalid search query or options | Problem Details object |
LocatorCollection Object

In `metadata`, the collection MAY contain the following elements:

Key | Definition | Format |
---|---|---|
numberOfItems | Indicates the total number of results for this search | Integer |
title | A user-facing title representing this collection of locators | Localized String |

In `links`, the following relations MAY be used:

Relation | Definition | Reference |
---|---|---|
self | Refers to the current page of results | RFC4287 |
next | Refers to the next page of results, if the end of the publication has not been reached yet | HTML |
```json
{
  "metadata": {
    "title": "Searching <riddle> in Alice in Wonderland - Page 1",
    "numberOfItems": 42
  },
  "links": [
    {"rel": "self", "href": "/978-1503222687/search?query=riddle", "type": "application/vnd.readium.locators+json"},
    {"rel": "next", "href": "/978-1503222687/search?query=riddle&page=2", "type": "application/vnd.readium.locators+json"}
  ],
  "locators": [
    {
      "href": "/978-1503222687/chap7.html",
      "type": "application/xhtml+xml",
      "title": "Chapter 1",
      "locations": {
        "fragments": [
          ":~:text=riddle,-yet%3F'"
        ],
        "progression": 0.43
      },
      "text": {
        "before": "'Have you guessed the ",
        "highlight": "riddle",
        "after": " yet?' the Hatter said, turning to Alice again."
      }
    },
    {
      "href": "/978-1503222687/chap7.html",
      "type": "application/xhtml+xml",
      "title": "Chapter 1",
      "locations": {
        "fragments": [
          ":~:text=in%20asking-,riddles"
        ],
        "progression": 0.47
      },
      "text": {
        "before": "I'm glad they've begun asking ",
        "highlight": "riddles",
        "after": ".--I believe I can guess that,"
      }
    }
  ]
}
```
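On the client side, a `SearchIterator` for this Web Service can follow the `next` link until it disappears. The TypeScript sketch below abstracts HTTP behind a hypothetical `HttpGet` function (a real app might adapt `fetch` to it), resolves only absolute URLs and absolute-path links such as the ones in the example above, and maps a 400 response to an error carrying the `title` of the Problem Details object (RFC 7807). `WebSearchIterator` and `resolveHref` are hypothetical names.

```typescript
interface Link {
  rel: string;
  href: string;
  type?: string;
}

interface LocatorCollection {
  metadata?: { title?: string; numberOfItems?: number };
  links?: Link[];
  locators: unknown[];
}

// Hypothetical HTTP abstraction keeping the sketch independent of any client.
type HttpGet = (url: string) => Promise<{ status: number; body: string }>;

// Resolves absolute URLs and absolute paths ("/...") against the base URL.
function resolveHref(base: string, href: string): string {
  if (/^[a-z][a-z0-9+.-]*:/i.test(href)) return href;
  const origin = /^[a-z][a-z0-9+.-]*:\/\/[^/]+/i.exec(base);
  if (href.startsWith("/") && origin !== null) return origin[0] + href;
  throw new Error(`Unsupported relative link: ${href}`);
}

class WebSearchIterator {
  resultCount?: number;
  private readonly httpGet: HttpGet;
  private nextURL: string | null;

  constructor(httpGet: HttpGet, firstPageURL: string) {
    this.httpGet = httpGet;
    this.nextURL = firstPageURL;
  }

  async next(): Promise<LocatorCollection | null> {
    const url = this.nextURL;
    if (url === null) return null; // no `next` link: end of the publication
    const response = await this.httpGet(url);
    if (response.status === 400) {
      // Invalid query or options, described by a Problem Details object.
      const problem = JSON.parse(response.body) as { title?: string };
      throw new Error(`Invalid search: ${problem.title ?? "unknown problem"}`);
    }
    if (response.status !== 200) {
      throw new Error(`Unexpected HTTP status ${response.status}`);
    }
    const page = JSON.parse(response.body) as LocatorCollection;
    this.resultCount = page.metadata?.numberOfItems ?? this.resultCount;
    const nextLink = (page.links ?? []).find((link) => link.rel === "next");
    this.nextURL = nextLink !== undefined ? resolveHref(url, nextLink.href) : null;
    return page;
  }

  close(): void {
    this.nextURL = null;
  }
}
```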
Locator Objects

Providing a `title` for locators is useful to group search results in the user interface. A common choice is to use the title of the table of contents entry where the occurrence is located.
A valid `Locator` object returned by a Search Service must have at least a `text` context. With sufficiently long `before` and `after` snippets (more than 30 characters), a Navigator is able to locate the search occurrence in most cases.
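A Search Service could check its own output against these guidelines before returning it. In this TypeScript sketch, `checkSearchLocator` is hypothetical, treating a `text` object with a non-empty `highlight` as the minimum requirement is an assumption, and the 30-character threshold comes from the paragraph above.

```typescript
interface Locator {
  href: string;
  title?: string;
  text?: { before?: string; highlight?: string; after?: string };
}

const MIN_CONTEXT_LENGTH = 30;

interface LocatorCheck {
  hasTextContext: boolean; // minimum requirement for a search result
  wellAnchored: boolean; // context long enough for a Navigator to locate it
}

function checkSearchLocator(locator: Locator): LocatorCheck {
  const context = locator.text;
  const hasTextContext = context !== undefined && (context.highlight ?? "").length > 0;
  const wellAnchored =
    hasTextContext &&
    (context?.before ?? "").length > MIN_CONTEXT_LENGTH &&
    (context?.after ?? "").length > MIN_CONTEXT_LENGTH;
  return { hasTextContext, wellAnchored };
}
```

Applied to the first locator of the JSON example, this reports a valid but loosely anchored result, since its `before` snippet is only 22 characters long.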
The `text` is also used in the search user interface to display additional context to the user. As such, it should be sanitized by:
The `progression` and `totalProgression` locations are not mandatory, but they are a very useful addition to display in the user interface.
A text fragment such as `:~:text=in%20asking-,riddles` may be provided to improve interoperability in a web browser context.
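Such a fragment can be derived from the `text` context. A TypeScript sketch, assuming the last word(s) of `before` and the first word(s) of `after` are enough as prefix and suffix; `textFragment` and `encodeTerm` are hypothetical helpers. Note that `encodeURIComponent()` leaves `-` unescaped, while text fragment terms must percent-encode it because it is part of the directive syntax.

```typescript
// Percent-encodes a text fragment term, including "-" which is reserved
// by the `prefix-,textStart,-suffix` syntax.
function encodeTerm(term: string): string {
  return encodeURIComponent(term).replace(/-/g, "%2D");
}

// Builds `:~:text=[prefix-,]textStart[,-suffix]` from a Locator text context,
// using the last/first `words` words of before/after (words >= 1).
function textFragment(
  text: { before?: string; highlight: string; after?: string },
  words = 1,
): string {
  const prefix = (text.before ?? "").trim().split(/\s+/).filter(Boolean).slice(-words).join(" ");
  const suffix = (text.after ?? "").trim().split(/\s+/).filter(Boolean).slice(0, words).join(" ");
  let directive = encodeTerm(text.highlight);
  if (prefix) directive = `${encodeTerm(prefix)}-,${directive}`;
  if (suffix) directive = `${directive},-${encodeTerm(suffix)}`;
  return `:~:text=${directive}`;
}
```

Using both a prefix and a suffix makes the directive more specific than the ones in the JSON example, at the cost of a longer fragment.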
A potential alternative, currently implemented in the Kotlin test app, is to crawl through each resource with a Web View and use mark.js to locate search results.

On the plus side, this solution ensures accurate results and “free” highlighting thanks to mark.js. Unfortunately, it is very resource intensive and slow, and may lose the current navigator location.
Some rendering SDKs (e.g. Web Views, PDF viewers, etc.) provide native search APIs which might offer more accurate search results.
There are a few drawbacks when using such APIs:
However, in some cases (e.g. PDF) it might still be beneficial to use them. In that case, we could wrap the native API into its own `SearchService`, which would only be usable with a `Publication` loaded in a Navigator.
The main potential issue is with locators containing only a text context in reflowable publications, which is the case with the default `StringSearchService` implementation and probably with FTS-based solutions. We cannot guarantee locations as accurate as those based on CFI or DOM ranges, and locating an occurrence might fail in some very specific publications.
However, I feel like this drawback is outweighed by the ease of implementation of text-only locations and the fact that they are less fragile. In practice, I did not notice any positioning errors during early implementation. Other solutions like Hypothesis have been using text-based locations for a while with success.
An implementation based on a full-text search database would be an exciting solution for reading apps, since it offers near-instant results, cross-bookshelf search and advanced features like stemming.
SQLite ships with an FTS extension making it easy to implement on most platforms without too much overhead.
While implementing the basic `StringSearchService` on mobile toolkits, I identified three important pieces:

- `StringSearchService` itself, which:
  - `Locator` collections from ranges of occurrences.
- `ResourceContentExtractor` is a component which extracts and sanitizes the text of a resource. `StringSearchService` is provided with a factory which will create a new `ResourceContentExtractor` for each resource, according to the media type declared in its `Link` object.
- `StringSearchService`.
  - `ICU` (API 24), using the International Components for Unicode library to offer language-aware search and case/diacritic sensitivity options.
  - `Naive` (fallback), which performs a simple exact search with no options.
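To make the extractor factory and the naive fallback more concrete, here is a TypeScript sketch. Only `StringSearchService` and `ResourceContentExtractor` come from the text above; `extractText`, `ExtractorFactory`, `defaultExtractorFactory`, `SearchAlgorithm`, `findRanges` and the crude tag-stripping HTML extractor are hypothetical simplifications. The naive algorithm performs an exact, case-sensitive match, in line with the description of `Naive`.

```typescript
interface Link {
  href: string;
  type?: string;
}

interface ResourceContentExtractor {
  // Returns the sanitized text content of the resource.
  extractText(raw: string): string;
}

type ExtractorFactory = (link: Link) => ResourceContentExtractor | null;

// Very rough HTML extractor: drops tags and collapses whitespace. A real
// implementation would parse the document and decode entities.
const htmlExtractor: ResourceContentExtractor = {
  extractText: (raw) => raw.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim(),
};

// Picks an extractor according to the media type declared in the Link.
const defaultExtractorFactory: ExtractorFactory = (link) => {
  switch (link.type) {
    case "application/xhtml+xml":
    case "text/html":
      return htmlExtractor;
    case "text/plain":
      return { extractText: (raw) => raw };
    default:
      return null; // unsupported media type: the resource is skipped
  }
};

// Pluggable search algorithm returning [start, end) ranges of occurrences.
interface SearchAlgorithm {
  findRanges(text: string, query: string): Array<[number, number]>;
}

// Naive fallback: exact, case-sensitive matching with no options.
const naiveAlgorithm: SearchAlgorithm = {
  findRanges(text, query) {
    const ranges: Array<[number, number]> = [];
    if (query.length === 0) return ranges;
    for (let i = text.indexOf(query); i !== -1; i = text.indexOf(query, i + query.length)) {
      ranges.push([i, i + query.length]);
    }
    return ranges;
  },
};
```

Keeping the extractor and the algorithm separate means an ICU-based algorithm could replace `naiveAlgorithm` without changing how resources are read.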