Readium Speech is a TypeScript library for implementing a read aloud feature with Web technologies. It follows best practices gathered through interviews with members of the digital publishing industry.
While this project is still in a very early stage, it is meant to power the read aloud feature for two different Readium projects: Readium Web and Thorium.
Readium Speech was spun out as a separate project in order to facilitate its integration as a shared component, but also because of its potential outside of the realm of ebook reading apps.
For our initial work on this project, we focused on voice selection based on recommended voices.
The outline of this work has been explored in a GitHub discussion and through a best practices document.
In the second phase, we focused on implementing a WebSpeech API-based solution with an architecture designed for future extensibility:
ReadiumSpeechPlaybackEngine
ReadiumSpeechNavigator
Key features include advanced voice selection, cross-browser playback control, flexible content loading, and comprehensive event handling for UI feedback. The architecture is designed to be extensible for different TTS backends while maintaining TypeScript-first development practices.
Two live demos are available:
The first demo showcases the following features:
The second demo focuses on in-context reading with seamless voice selection (grouped by region and sorted based on quality), and playback control, providing an optional read-along experience that integrates naturally with the content.
git clone https://github.com/readium/speech.git
cd speech
npm install
npm run build
npm link
# Then in your project directory:
# npm link readium-speech
import { WebSpeechVoiceManager } from "readium-speech";

async function setupVoices() {
  try {
    // Initialize the voice manager
    const voiceManager = await WebSpeechVoiceManager.initialize();

    // Get all available voices
    const allVoices = voiceManager.getVoices();
    console.log("Available voices:", allVoices);

    // Get voices with filters
    const filteredVoices = voiceManager.getVoices({
      language: ["en", "fr"],
      gender: "female",
      quality: "high",
      offlineOnly: true,
      excludeNovelty: true,
      excludeVeryLowQuality: true
    });

    // Get voices grouped by language
    const voices = voiceManager.getVoices();
    const groupedByLanguage = voiceManager.groupVoices(voices, "language");

    // Get a test utterance for a specific language
    const testText = voiceManager.getTestUtterance("en");
  } catch (error) {
    console.error("Error initializing voice manager:", error);
  }
}

await setupVoices();
The documentation provides guides for:
The main class for managing Web Speech API voices with enhanced functionality.
static initialize(maxTimeout?: number, interval?: number): Promise<WebSpeechVoiceManager>
Creates and initializes a new WebSpeechVoiceManager instance. This static factory method must be called to create an instance.
maxTimeout: Maximum time in milliseconds to wait for voices to load (default: 10000ms)
interval: Interval in milliseconds between voice loading checks (default: 100ms)
voiceManager.getVoices(options?: VoiceFilterOptions): ReadiumSpeechVoice[]
Fetches all available voices that match the specified filter criteria.
interface VoiceFilterOptions {
  language?: string | string[]; // Filter by language code(s) (e.g., "en", "fr")
  source?: TSource; // Filter by voice source ("json" | "browser")
  gender?: TGender; // "female" | "male" | "neutral"
  quality?: TQuality | TQuality[]; // "veryLow" | "low" | "normal" | "high" | "veryHigh"
  offlineOnly?: boolean; // Only return voices available offline
  provider?: string; // Filter by voice provider
  excludeNovelty?: boolean; // Exclude novelty voices, true by default
  excludeVeryLowQuality?: boolean; // Exclude very low quality voices, true by default
}
voiceManager.filterVoices(voices: ReadiumSpeechVoice[], options: VoiceFilterOptions): ReadiumSpeechVoice[]
Filters voices based on the specified criteria.
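To make the filter semantics concrete, here is a minimal, self-contained sketch of how a subset of these criteria could be applied to a list of voices. It is not the library's implementation: `VoiceLike`, `FilterLike`, and `filterVoicesSketch` are hypothetical names used only for illustration, and matching "en" against "en-US" by primary subtag is an assumption.

```typescript
// Simplified local stand-ins for the documented types (assumptions for this sketch).
type Quality = "veryLow" | "low" | "normal" | "high" | "veryHigh";
type Gender = "female" | "male" | "neutral";

interface VoiceLike {
  name: string;
  language: string; // BCP-47 tag, e.g. "en-US"
  gender?: Gender;
  quality?: Quality[];
  provider?: string;
}

interface FilterLike {
  language?: string | string[];
  gender?: Gender;
  quality?: Quality | Quality[];
  provider?: string;
}

// Hypothetical helper: keeps the voices that satisfy every criterion that was provided.
function filterVoicesSketch(voices: VoiceLike[], options: FilterLike): VoiceLike[] {
  // concat() flattens, so a single value and an array are handled the same way.
  const languages = ([] as string[])
    .concat(options.language ?? [])
    .map((l) => l.toLowerCase());
  const qualities = ([] as Quality[]).concat(options.quality ?? []);

  return voices.filter((voice) => {
    const tag = voice.language.toLowerCase();
    // Assumption: "en" matches "en" and any "en-*" tag.
    const languageOk =
      languages.length === 0 ||
      languages.some((l) => tag === l || tag.startsWith(l + "-"));
    const genderOk = options.gender === undefined || voice.gender === options.gender;
    // A voice passes if any of its quality levels is among the requested ones.
    const qualityOk =
      qualities.length === 0 ||
      (voice.quality ?? []).some((q) => qualities.indexOf(q) !== -1);
    const providerOk =
      options.provider === undefined || voice.provider === options.provider;
    return languageOk && genderOk && qualityOk && providerOk;
  });
}
```

Omitted criteria are treated as "no restriction", which mirrors the optional fields of VoiceFilterOptions.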
voiceManager.groupVoices(voices: ReadiumSpeechVoice[], groupBy: "language" | "region" | "gender" | "quality" | "provider"): VoiceGroup
Organizes voices into groups based on the specified criteria. The available grouping options are:
"language": Groups voices by their language code
"region": Groups voices by their region
"gender": Groups voices by gender
"quality": Groups voices by quality level
"provider": Groups voices by their provider
voiceManager.sortVoices(voices: ReadiumSpeechVoice[], options: SortOptions): ReadiumSpeechVoice[]
Arranges voices according to the specified sorting criteria. The SortOptions interface allows you to sort by various properties and specify sort order.
interface SortOptions {
  by: "name" | "language" | "gender" | "quality" | "region";
  order?: "asc" | "desc";
}
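Grouping and sorting are often combined, for example to list voices by region with the best quality first. The sketch below illustrates that idea on plain data. It is an approximation rather than the library's code: `VoiceLike`, `regionOf`, `groupByRegion`, and `sortByQuality` are hypothetical, the result shape (a record keyed by group name) is an assumption about VoiceGroup, and the quality ranking simply follows the order of the TQuality union.

```typescript
type Quality = "veryLow" | "low" | "normal" | "high" | "veryHigh";

interface VoiceLike {
  name: string;
  language: string; // BCP-47 tag, e.g. "en-GB"
  quality?: Quality[];
}

// Assumed ranking, lowest to highest.
const QUALITY_ORDER: Quality[] = ["veryLow", "low", "normal", "high", "veryHigh"];

// Rank of the best quality level a voice offers (-1 when none is listed).
function bestQualityRank(voice: VoiceLike): number {
  return Math.max(-1, ...(voice.quality ?? []).map((q) => QUALITY_ORDER.indexOf(q)));
}

// Second subtag of the tag, upper-cased ("en-GB" -> "GB").
// Simplified: does not handle script subtags such as "zh-Hant-TW".
function regionOf(language: string): string {
  const region = language.split("-")[1];
  return region ? region.toUpperCase() : "unknown";
}

// Hypothetical grouping: one bucket of voices per region.
function groupByRegion(voices: VoiceLike[]): Record<string, VoiceLike[]> {
  const groups: Record<string, VoiceLike[]> = {};
  for (const voice of voices) {
    const key = regionOf(voice.language);
    const bucket = groups[key] ?? [];
    bucket.push(voice);
    groups[key] = bucket;
  }
  return groups;
}

// Hypothetical sort by best available quality, mirroring the by/order options.
function sortByQuality(voices: VoiceLike[], order: "asc" | "desc" = "asc"): VoiceLike[] {
  const direction = order === "asc" ? 1 : -1;
  // Copy first so the caller's array is left untouched.
  return [...voices].sort(
    (a, b) => direction * (bestQualityRank(a) - bestQualityRank(b))
  );
}
```

Sorting each bucket returned by `groupByRegion` with `sortByQuality(bucket, "desc")` yields a region-grouped list with the highest quality voices first.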
voiceManager.getTestUtterance(language: string): string
Retrieves a sample text string suitable for testing text-to-speech functionality in the specified language. If no sample text is available for the specified language, it returns an empty string.
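Because an empty string signals that no sample exists, callers will usually want a fallback. The following sketch shows one possible approach. `previewTextFor` and `TestUtteranceSource` are hypothetical names, and retrying with the primary language subtag (for example "en" for "en-US") is an assumption about what callers may find useful, not documented library behavior.

```typescript
// Minimal shape of the single method this sketch relies on.
interface TestUtteranceSource {
  getTestUtterance(language: string): string;
}

// Hypothetical helper: try the full tag, then the primary subtag,
// then fall back to a caller-supplied default.
function previewTextFor(
  source: TestUtteranceSource,
  language: string,
  fallback: string
): string {
  const dash = language.indexOf("-");
  const primary = dash === -1 ? language : language.slice(0, dash);

  for (const candidate of [language, primary]) {
    const text = source.getTestUtterance(candidate);
    // An empty string means no sample is available for this language.
    if (text !== "") return text;
  }
  return fallback;
}
```

A voice manager instance can be passed as `source` directly, since it exposes a method with this signature.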
ReadiumSpeechVoice
interface ReadiumSpeechVoice {
  source: TSource; // "json" | "browser"

  // Core identification (required)
  label: string; // Human-friendly label for the voice
  name: string; // JSON Name (or Web Speech API name if not found)
  originalName: string; // Original name of the voice
  voiceURI?: string; // For Web Speech API compatibility

  // Localization
  language: string; // BCP-47 language tag
  localizedName?: TLocalizedName; // Localization pattern (android/apple)
  altNames?: string[]; // Alternative names (mostly for Apple voices)
  altLanguage?: string; // Alternative BCP-47 language tag
  otherLanguages?: string[]; // Other languages this voice can speak
  multiLingual?: boolean; // If voice can handle multiple languages

  // Voice characteristics
  gender?: TGender; // Voice gender ("female" | "male" | "neutral")
  children?: boolean; // If this is a children's voice

  // Quality and capabilities
  quality?: TQuality[]; // Available quality levels for this voice ("veryLow" | "low" | "normal" | "high" | "veryHigh")
  pitchControl?: boolean; // Whether pitch can be controlled

  // Performance settings
  pitch?: number; // Current pitch (0-2, where 1 is normal)
  rate?: number; // Speech rate (0.1-10, where 1 is normal)

  // Platform and compatibility
  browser?: string[]; // Supported browsers
  os?: string[]; // Supported operating systems
  preloaded?: boolean; // If the voice is preloaded on the system
  nativeID?: string | string[]; // Platform-specific voice ID(s)

  // Additional metadata
  note?: string; // Additional notes about the voice
  provider?: string; // Voice provider (e.g., "Microsoft", "Google")

  // Allow any additional properties that might be in the JSON
  [key: string]: any;
}
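The pitch and rate ranges noted in the comments above (0-2 and 0.1-10, with 1 as normal) can be enforced before values reach a speech engine. This is a small sketch of such a guard. `safePitch` and `safeRate` are hypothetical helpers, and falling back to 1 for missing or non-finite values is an assumption.

```typescript
// Ranges taken from the ReadiumSpeechVoice comments: pitch 0-2, rate 0.1-10.
const PITCH_RANGE = { min: 0, max: 2 };
const RATE_RANGE = { min: 0.1, max: 10 };
const NORMAL = 1;

function clamp(value: number, min: number, max: number): number {
  // Assumption: NaN and +/-Infinity fall back to the neutral value.
  if (!Number.isFinite(value)) return NORMAL;
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helpers: undefined means "not set", so use the neutral value.
function safePitch(pitch: number | undefined): number {
  return clamp(pitch ?? NORMAL, PITCH_RANGE.min, PITCH_RANGE.max);
}

function safeRate(rate: number | undefined): number {
  return clamp(rate ?? NORMAL, RATE_RANGE.min, RATE_RANGE.max);
}
```

For instance, `safeRate(voice.rate)` would give a value that is always inside the documented range, whatever the voice data contains.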
LanguageInfo
interface LanguageInfo {
  code: string;
  label: string;
  count: number;
}
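As an illustration of how a LanguageInfo list could be derived from a set of voices, the sketch below counts voices per language. It rests on assumptions that this document does not confirm: that `count` is the number of voices for a language, and that `code` is the primary subtag of the BCP-47 tag. `summarizeLanguages` and `VoiceLike` are hypothetical names.

```typescript
interface LanguageInfo {
  code: string;
  label: string;
  count: number;
}

interface VoiceLike {
  language: string; // BCP-47 tag, e.g. "en-US"
}

// Hypothetical helper: one entry per primary language subtag, sorted by code.
// labelFor lets the caller supply display names; by default the code is reused.
function summarizeLanguages(
  voices: VoiceLike[],
  labelFor: (code: string) => string = (code) => code
): LanguageInfo[] {
  const counts = new Map<string, number>();
  for (const voice of voices) {
    const dash = voice.language.indexOf("-");
    const code = (dash === -1 ? voice.language : voice.language.slice(0, dash)).toLowerCase();
    counts.set(code, (counts.get(code) ?? 0) + 1);
  }

  const result: LanguageInfo[] = [];
  counts.forEach((count, code) => {
    result.push({ code, label: labelFor(code), count });
  });
  return result.sort((a, b) => a.code.localeCompare(b.code));
}
```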
TQuality
type TQuality = "veryLow" | "low" | "normal" | "high" | "veryHigh";
TGender
type TGender = "female" | "male" | "neutral";
TSource
type TSource = "json" | "browser";
interface ReadiumSpeechNavigator {
  // Voice Management
  getVoices(): Promise<ReadiumSpeechVoice[]>;
  setVoice(voice: ReadiumSpeechVoice | string): Promise<void>;
  getCurrentVoice(): ReadiumSpeechVoice | null;

  // Content Management
  loadContent(content: ReadiumSpeechUtterance | ReadiumSpeechUtterance[]): void;
  getCurrentContent(): ReadiumSpeechUtterance | null;
  getContentQueue(): ReadiumSpeechUtterance[];

  // Playback Control
  play(): void;
  pause(): void;
  stop(): void;

  // Navigation
  next(): boolean;
  previous(): boolean;
  jumpTo(utteranceIndex: number): void;

  // Playback Parameters
  setRate(rate: number): void;
  getRate(): number;
  setPitch(pitch: number): void;
  getPitch(): number;
  setVolume(volume: number): void;
  getVolume(): number;

  // State
  getState(): ReadiumSpeechPlaybackState;
  getCurrentUtteranceIndex(): number;

  // Events
  on(
    event: ReadiumSpeechPlaybackEvent["type"],
    listener: (event: ReadiumSpeechPlaybackEvent) => void
  ): void;

  // Cleanup
  destroy(): void;
}
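A typical session with an object implementing this interface might register listeners, load content, adjust the rate, and start playback. The sketch below shows that sequence against a reduced interface so that it stays self-contained. `NavigatorLike` and `startReading` are hypothetical, `U` stands in for ReadiumSpeechUtterance (whose shape is not shown here), and the call order is an assumption rather than a documented requirement.

```typescript
// Subset of the ReadiumSpeechNavigator interface used by this sketch.
interface NavigatorLike<U> {
  loadContent(content: U | U[]): void;
  setRate(rate: number): void;
  play(): void;
  on(
    event: string,
    listener: (event: { type: string; detail?: unknown }) => void
  ): void;
}

// Hypothetical helper: wires basic feedback, then starts reading.
function startReading<U>(
  nav: NavigatorLike<U>,
  content: U[],
  rate: number,
  onStatus: (message: string) => void
): void {
  // Register listeners before starting so that no early event is missed.
  nav.on("start", () => onStatus("started"));
  nav.on("end", () => onStatus("ended"));
  nav.on("error", (event) => onStatus("error: " + String(event.detail)));

  nav.loadContent(content);
  nav.setRate(rate);
  nav.play();
}
```

The `onStatus` callback keeps the sketch independent of any UI framework; an application could use it to update a status line or an ARIA live region.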
type ReadiumSpeechPlaybackEvent = {
  type:
    | "start" // Playback started
    | "pause" // Playback paused
    | "resume" // Playback resumed
    | "end" // Playback ended naturally
    | "stop" // Playback stopped manually
    | "skip" // Skipped to another utterance
    | "error" // An error occurred
    | "boundary" // Reached a word/sentence boundary
    | "mark" // Reached a named mark in SSML
    | "idle" // No content loaded
    | "loading" // Loading content
    | "ready" // Ready to play
    | "voiceschanged"; // Available voices changed
  detail?: any; // Event-specific data
};
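Since the event types form a closed union, a listener can handle them with an exhaustive switch that the compiler checks. The sketch below maps each event type to a short status message for UI feedback. `PlaybackEventType` and `statusMessage` are hypothetical, and the wording of the messages, as well as which events are treated as silent, are illustrative choices.

```typescript
// Local copy of the event type union described above.
type PlaybackEventType =
  | "start" | "pause" | "resume" | "end" | "stop" | "skip" | "error"
  | "boundary" | "mark" | "idle" | "loading" | "ready" | "voiceschanged";

// Hypothetical helper: returns a status line, or null when the event
// should not change what the UI displays.
function statusMessage(type: PlaybackEventType): string | null {
  switch (type) {
    case "start":
    case "resume":
      return "Reading";
    case "pause":
      return "Paused";
    case "end":
    case "stop":
      return "Stopped";
    case "error":
      return "Something went wrong";
    case "loading":
      return "Loading";
    case "ready":
      return "Ready";
    case "idle":
      return "No content";
    case "skip":
    case "boundary":
    case "mark":
    case "voiceschanged":
      return null;
    default: {
      // Compile-time exhaustiveness check: fails if a new type is unhandled.
      const unreachable: never = type;
      return unreachable;
    }
  }
}
```

A listener registered through `on()` could pass `event.type` to this function and update the interface only when the result is not null.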
type ReadiumSpeechPlaybackState = "playing" | "paused" | "idle" | "loading" | "ready";
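The playback state is enough to drive a single play/pause button. Here is a minimal sketch of that pattern. `PlaybackControls`, `buttonLabel`, and `togglePlayback` are hypothetical names, and the sketch assumes that calling `play()` while paused resumes playback, which this document does not state explicitly.

```typescript
type PlaybackState = "playing" | "paused" | "idle" | "loading" | "ready";

// Subset of the navigator interface needed for a play/pause toggle.
interface PlaybackControls {
  getState(): PlaybackState;
  play(): void;
  pause(): void;
}

// Label for a single toggle button.
function buttonLabel(state: PlaybackState): "Play" | "Pause" {
  return state === "playing" ? "Pause" : "Play";
}

// Hypothetical toggle: pause when playing, play when paused or ready,
// and do nothing while idle or loading. Returns the state afterwards.
function togglePlayback(nav: PlaybackControls): PlaybackState {
  const state = nav.getState();
  if (state === "playing") {
    nav.pause();
  } else if (state === "paused" || state === "ready") {
    // Assumption: play() resumes when the navigator is paused.
    nav.play();
  }
  return nav.getState();
}
```

Ignoring clicks in the idle and loading states avoids starting playback before any content is available.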