Readium Speech is a TypeScript library for implementing a read aloud feature with Web technologies. It follows best practices gathered through interviews with members of the digital publishing industry.
While this project is still in a very early stage, it is meant to power the read aloud feature for two different Readium projects: Readium Web and Thorium.
Readium Speech was spun out as a separate project in order to facilitate its integration as a shared component, but also because of its potential outside of the realm of ebook reading apps.
For our initial work on this project, we focused on voice selection based on recommended voices.
The outline of this work has been explored in a GitHub discussion and through a best practices document.
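To make "voice selection based on recommended voices" concrete, here is a minimal sketch. It assumes a recommended-voice list in which each entry has a name, a BCP 47 language tag and a quality level. The types, the quality scale and `pickVoice` are hypothetical and are not the @readium/speech API. The sketch keeps recommended voices that are actually installed, prefers an exact language-tag match and then higher quality, and otherwise falls back to any installed voice for the language.

```typescript
// Hypothetical shapes for illustration only; these are not @readium/speech types.
interface AvailableVoice {
  name: string;
  lang: string; // BCP 47 tag, e.g. "en-US"
}

interface RecommendedVoice {
  name: string;
  lang: string;
  quality: "high" | "normal" | "low"; // hypothetical quality scale
}

const QUALITY_RANK: Record<RecommendedVoice["quality"], number> = { high: 3, normal: 2, low: 1 };

function isExactMatch(lang: string, requested: string): boolean {
  return lang.toLowerCase() === requested.toLowerCase();
}

// True when the tags are equal or share the same primary language subtag ("en")
function matchesLanguage(lang: string, requested: string): boolean {
  return (
    isExactMatch(lang, requested) ||
    lang.toLowerCase().split("-")[0] === requested.toLowerCase().split("-")[0]
  );
}

function pickVoice(
  available: AvailableVoice[],
  recommended: RecommendedVoice[],
  requested: string,
): AvailableVoice | undefined {
  const installed = new Set(available.map((v) => v.name));
  // Keep recommended voices that are installed and match the language,
  // ordered by exact tag match first and quality second
  const candidates = recommended
    .filter((r) => installed.has(r.name) && matchesLanguage(r.lang, requested))
    .sort((x, y) => {
      const byExactness =
        Number(isExactMatch(y.lang, requested)) - Number(isExactMatch(x.lang, requested));
      return byExactness !== 0 ? byExactness : QUALITY_RANK[y.quality] - QUALITY_RANK[x.quality];
    });
  if (candidates.length > 0) {
    const best = candidates[0];
    return available.find((v) => v.name === best.name);
  }
  // Fallback: any installed voice for the requested language
  return available.find((v) => matchesLanguage(v.lang, requested));
}

// Usage with made-up voice names
const available: AvailableVoice[] = [
  { name: "Generic English", lang: "en-US" },
  { name: "Voice A", lang: "en-US" },
  { name: "Voice B", lang: "en-GB" },
];
const recommended: RecommendedVoice[] = [
  { name: "Voice A", lang: "en-US", quality: "normal" },
  { name: "Voice B", lang: "en-GB", quality: "high" },
  { name: "Voice C", lang: "en-US", quality: "high" }, // recommended but not installed
];
const chosen = pickVoice(available, recommended, "en-US");
console.log(chosen?.name);
```

Ranking exact matches above quality is one possible policy; a reader asking for "en-US" probably expects a US voice even if a higher-quality British voice is installed.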
In the second phase, we focused on implementing a Web Speech API-based solution with an architecture designed for future extensibility:
- ReadiumSpeechPlaybackEngine
- ReadiumSpeechNavigator
Key features include advanced voice selection, cross-browser playback control, flexible content loading, and comprehensive event handling for UI feedback. The architecture is designed to be extensible to different TTS backends while maintaining TypeScript-first development practices.
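One way to picture this split is an engine interface that knows how to speak a single chunk of text, and a navigator that only talks to that interface. The sketch below uses hypothetical `PlaybackEngine` and `SketchNavigator` types; it is not the actual shape of ReadiumSpeechPlaybackEngine or ReadiumSpeechNavigator. A recording test double stands in for a Web Speech or cloud TTS backend.

```typescript
// Hypothetical interface; the real ReadiumSpeechPlaybackEngine may look different.
interface PlaybackEngine {
  speak(text: string): void;
  pause(): void;
  resume(): void;
  stop(): void;
}

// Navigation logic depends only on the interface, so the backend can be swapped.
class SketchNavigator {
  private index = 0;
  private readonly engine: PlaybackEngine;
  private readonly utterances: string[];

  constructor(engine: PlaybackEngine, utterances: string[]) {
    this.engine = engine;
    this.utterances = utterances;
  }

  play(): void {
    if (this.index < this.utterances.length) {
      this.engine.speak(this.utterances[this.index]);
    }
  }

  pause(): void {
    this.engine.pause();
  }

  resume(): void {
    this.engine.resume();
  }

  // Stop the current utterance and move to the next one (stays on the last one)
  next(): void {
    this.engine.stop();
    if (this.index < this.utterances.length - 1) {
      this.index += 1;
    }
    this.play();
  }
}

// Test double standing in for a real TTS backend: it records every call
class RecordingEngine implements PlaybackEngine {
  readonly calls: string[] = [];
  speak(text: string): void {
    this.calls.push(`speak:${text}`);
  }
  pause(): void {
    this.calls.push("pause");
  }
  resume(): void {
    this.calls.push("resume");
  }
  stop(): void {
    this.calls.push("stop");
  }
}

const engine = new RecordingEngine();
const nav = new SketchNavigator(engine, ["First sentence.", "Second sentence."]);
nav.play();
nav.pause();
nav.resume();
nav.next();
console.log(engine.calls);
```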
Two live demos are available:
The first demo showcases the following features:
The second demo focuses on in-context reading with seamless voice selection (voices grouped by region and sorted by quality) and playback control, providing an optional read-along experience that integrates naturally with the content.
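Grouping by region can be derived from each voice's language tag. This sketch assumes every voice exposes a BCP 47 `lang` and a numeric `quality` score; the `quality` field and `groupByRegion` are hypothetical, and the demo's actual quality data may be structured differently. The region is taken from the first subtag after the primary language that has BCP 47 region shape (two letters such as "US", or three digits such as "419").

```typescript
// Hypothetical voice shape: `quality` is an assumed numeric score (higher is better).
interface VoiceInfo {
  name: string;
  lang: string; // BCP 47 tag, e.g. "en-US", "es-419", "zh-Hant-TW"
  quality: number;
}

// BCP 47 region subtags are two letters ("US") or three digits ("419")
const REGION_SUBTAG = /^([A-Za-z]{2}|\d{3})$/;

function groupByRegion(voices: VoiceInfo[]): Record<string, VoiceInfo[]> {
  const groups: Record<string, VoiceInfo[]> = {};
  for (const voice of voices) {
    // Skip the primary language subtag, then take the first region-shaped subtag
    const region = voice.lang.split("-").slice(1).find((part) => REGION_SUBTAG.test(part));
    const key = region ? region.toUpperCase() : "Other";
    if (!groups[key]) {
      groups[key] = [];
    }
    groups[key].push(voice);
  }
  // Within each region: best quality first, ties broken alphabetically by name
  for (const key of Object.keys(groups)) {
    groups[key].sort((a, b) => b.quality - a.quality || a.name.localeCompare(b.name));
  }
  return groups;
}

// Usage with made-up voices
const grouped = groupByRegion([
  { name: "B", lang: "en-GB", quality: 1 },
  { name: "A", lang: "en-US", quality: 2 },
  { name: "C", lang: "en-US", quality: 3 },
  { name: "D", lang: "en", quality: 1 },
  { name: "E", lang: "es-419", quality: 2 },
]);
console.log(grouped);
```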
Install the package using npm:
npm install @readium/speech
Or using yarn:
yarn add @readium/speech
import { WebSpeechVoiceManager, WebSpeechReadAloudNavigator } from "@readium/speech";
// Initialize the voice manager (top-level await requires an ES module)
const voiceManager = await WebSpeechVoiceManager.initialize({
  languages: ["en", "fr", "es"] // List of languages to fetch voices for
});
// Get the best available voice for a specific language
const voice = await voiceManager.getDefaultVoice("en-US");
// Create a navigator instance
const navigator = new WebSpeechReadAloudNavigator();
await navigator.setVoice(voice);
// Handle playback events
navigator.on("play", () => console.log("Playback started"));
navigator.on("pause", () => console.log("Playback paused"));
navigator.on("end", () => console.log("Playback completed"));
// Load and play content, failing early if the target element is missing
const content = document.getElementById("content");
if (!content) {
  throw new Error('Element with id "content" not found');
}
navigator.loadContent(content);
navigator.play();
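The `on()` calls above suggest an event-emitter pattern. As an illustration only, here is a minimal typed emitter in which each event name is tied to a payload type and `on()` returns an unsubscribe function. `TypedEmitter`, `PlaybackEvents` and the `boundary` event (modeled loosely on the Web Speech API's `boundary` event, which reports a `charIndex`) are hypothetical and not part of the documented @readium/speech API.

```typescript
// Hypothetical event map: event name -> payload type
type PlaybackEvents = {
  play: Record<string, never>;
  pause: Record<string, never>;
  end: Record<string, never>;
  boundary: { charIndex: number }; // e.g. word position for highlighting
};

type Listener<T> = (payload: T) => void;

class TypedEmitter<Events> {
  private listeners: { [K in keyof Events]?: Array<Listener<Events[K]>> } = {};

  // Register a listener; the returned function removes it again
  on<K extends keyof Events>(event: K, listener: Listener<Events[K]>): () => void {
    const list = this.listeners[event] ?? [];
    list.push(listener);
    this.listeners[event] = list;
    return () => {
      this.listeners[event] = (this.listeners[event] ?? []).filter((l) => l !== listener);
    };
  }

  // Call every listener registered for `event` with a correctly typed payload
  emit<K extends keyof Events>(event: K, payload: Events[K]): void {
    (this.listeners[event] ?? []).forEach((l) => l(payload));
  }
}

const emitter = new TypedEmitter<PlaybackEvents>();
const log: string[] = [];
const off = emitter.on("play", () => log.push("play"));
emitter.on("boundary", ({ charIndex }) => log.push(`boundary:${charIndex}`));
emitter.emit("play", {});
emitter.emit("boundary", { charIndex: 6 });
off(); // the "play" listener is removed, so the next emit is ignored
emitter.emit("play", {});
console.log(log);
```

Returning an unsubscribe function keeps UI components from leaking listeners when they are torn down.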
The documentation provides guides for:
We try to follow a test-driven development approach as much as possible, writing tests before implementing the code. Currently, this applies to the WebSpeechVoiceManager class: it deals primarily with voice selection and management, where mocking is straightforward.
The playback logic is more complex and may not be suited to this approach yet: it involves intricate state management and user interactions that are difficult to reproduce with mock objects, especially since browsers vary significantly in their implementations of the Web Speech API.
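To illustrate the kind of state management involved, here is a hypothetical transition table for a read-aloud player; the states and actions are assumptions, not the library's internal model. Pure logic like this is easy to test in isolation. The harder part is that the events driving these transitions come from browser implementations that vary, which a table alone cannot capture.

```typescript
// Hypothetical states and actions, not the library's actual model
type PlaybackState = "idle" | "playing" | "paused" | "ended";
type PlaybackAction = "play" | "pause" | "resume" | "stop" | "finish";

// Allowed transitions; any action missing for a state is rejected
const TRANSITIONS: Record<PlaybackState, Partial<Record<PlaybackAction, PlaybackState>>> = {
  idle: { play: "playing" },
  playing: { pause: "paused", stop: "idle", finish: "ended" },
  paused: { resume: "playing", stop: "idle" },
  ended: { play: "playing", stop: "idle" },
};

function transition(state: PlaybackState, action: PlaybackAction): PlaybackState {
  const next = TRANSITIONS[state][action];
  if (next === undefined) {
    throw new Error(`Invalid action "${action}" in state "${state}"`);
  }
  return next;
}

// Walk through a typical session: play, pause, resume, reach the end
let current: PlaybackState = "idle";
const session: PlaybackAction[] = ["play", "pause", "resume", "finish"];
for (const action of session) {
  current = transition(current, action);
}
console.log(current);
```

Making illegal sequences (such as resuming while idle) throw makes inconsistent event ordering show up loudly instead of leaving the player in an undefined state.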
To build the library:
npm run build
This will compile the TypeScript code and generate the following outputs in the build/ directory:
- index.js (ES modules)
- index.cjs (CommonJS)
The project includes two demo applications that can be served locally:
npm run start
For ChromeOS development, the project includes a debug mode that mocks the Web Speech API with the set of voices exported from the ChromeOS browser:
Open the debug page: http://localhost:8080/debug
The debug page loads mock voices from a JSON file containing a snapshot of ChromeOS voices.
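A mock along these lines can be sketched by parsing the snapshot and exposing a `getVoices()` method, named after `speechSynthesis.getVoices()`. The fields below mirror the properties of the standard `SpeechSynthesisVoice` interface (`voiceURI`, `name`, `lang`, `localService`, `default`), but the real snapshot file's format is an assumption here, and `createMockVoiceSource` is a hypothetical helper.

```typescript
// Assumed snapshot entry: mirrors SpeechSynthesisVoice's properties
interface VoiceSnapshot {
  voiceURI: string;
  name: string;
  lang: string;
  localService: boolean;
  default: boolean;
}

// Minimal stand-in for the part of speechSynthesis that voice selection reads
interface VoiceSource {
  getVoices(): VoiceSnapshot[];
}

function createMockVoiceSource(json: string): VoiceSource {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) {
    throw new Error("Voice snapshot must be a JSON array");
  }
  // No per-field validation in this sketch; entries are trusted to match VoiceSnapshot
  const voices = parsed as VoiceSnapshot[];
  // Return copies so callers cannot mutate the shared snapshot
  return { getVoices: () => voices.map((v) => ({ ...v })) };
}

// Usage with made-up entries
const snapshot = JSON.stringify([
  { voiceURI: "Example Voice 1", name: "Example Voice 1", lang: "en-US", localService: false, default: true },
  { voiceURI: "Example Voice 2", name: "Example Voice 2", lang: "fr-FR", localService: true, default: false },
]);
const source = createMockVoiceSource(snapshot);
const frenchVoices = source.getVoices().filter((v) => v.lang.split("-")[0] === "fr");
console.log(frenchVoices.map((v) => v.name));
```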
To run the test suite for WebSpeechVoiceManager:
npm test
This project builds on work initially done by Hadrien Gardeur in the web-speech-recommended-voices repository.
Hundreds of voices have been documented as JSON and released under a CC0 license.