Readium Speech

Readium Speech is a TypeScript library for implementing a read aloud feature with Web technologies. It follows best practices gathered through interviews with members of the digital publishing industry.

While this project is still in a very early stage, it is meant to power the read aloud feature for two different Readium projects: Readium Web and Thorium.

Readium Speech was spun out as a separate project in order to facilitate its integration as a shared component, but also because of its potential outside of the realm of ebook reading apps.

Scope

Current focus

For our initial work on this project, we focused on voice selection based on recommended voices.

The outline of this work has been explored in a GitHub discussion and through a best practices document.

In the second phase, we focused on implementing a Web Speech API-based solution with an architecture designed for future extensibility.

Key features include advanced voice selection, cross-browser playback control, flexible content loading, and comprehensive event handling for UI feedback. The architecture is designed to be extensible for different TTS backends while maintaining TypeScript-first development practices.

Demos

Two live demos are available:

  1. Voice selection with playback demo
  2. In-context demo

The first demo showcases the following features:

The second demo focuses on in-context reading with voice selection (grouped by region and sorted by quality) and playback control. It provides an optional read-along experience that integrates with the content.

QuickStart

Prerequisites

Installation

  1. Clone the repository:
    git clone https://github.com/readium/speech.git
    cd speech
    
  2. Install dependencies:
    npm install
    
  3. Build the package:
    npm run build
    
  4. Link the package locally (optional, for development):
    npm link
    # Then in your project directory:
    # npm link readium-speech
    

Basic Usage

import { WebSpeechVoiceManager } from "readium-speech";

async function setupVoices() {
  try {
    // Initialize the voice manager
    const voiceManager = await WebSpeechVoiceManager.initialize();
    
    // Get all available voices
    const allVoices = voiceManager.getVoices();
    console.log("Available voices:", allVoices);
    
    // Get voices with filters
    const filteredVoices = voiceManager.getVoices({
      language: ["en", "fr"],
      gender: "female",
      quality: "high",
      offlineOnly: true,
      excludeNovelty: true,
      excludeVeryLowQuality: true
    });
    
    // Get voices grouped by language
    const voices = voiceManager.getVoices();
    const groupedByLanguage = voiceManager.groupVoices(voices, "language");
    
    // Get a test utterance for a specific language
    const testText = voiceManager.getTestUtterance("en");
    
  } catch (error) {
    console.error("Error initializing voice manager:", error);
  }
}

await setupVoices();

Docs

The documentation provides guides for:

API Reference

Class: WebSpeechVoiceManager

The main class for managing Web Speech API voices with enhanced functionality.

Initialize the Voice Manager

static initialize(maxTimeout?: number, interval?: number): Promise<WebSpeechVoiceManager>

Creates and initializes a new WebSpeechVoiceManager instance. This static factory method must be called to create an instance.

Get Available Voices

voiceManager.getVoices(options?: VoiceFilterOptions): ReadiumSpeechVoice[]

Fetches all available voices that match the specified filter criteria.

interface VoiceFilterOptions {
  language?: string | string[];  // Filter by language code(s) (e.g., "en", "fr")
  source?: TSource;  // Filter by voice source ("json" | "browser")
  gender?: TGender;  // "female" | "male" | "neutral"
  quality?: TQuality | TQuality[];  // "veryLow" | "low" | "normal" | "high" | "veryHigh"
  offlineOnly?: boolean;  // Only return voices available offline
  provider?: string;  // Filter by voice provider
  excludeNovelty?: boolean;  // Exclude novelty voices, true by default
  excludeVeryLowQuality?: boolean;  // Exclude very low quality voices, true by default
}

Filter Voices

voiceManager.filterVoices(voices: ReadiumSpeechVoice[], options: VoiceFilterOptions): ReadiumSpeechVoice[]

Filters voices based on the specified criteria.
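
To make the filter criteria more concrete, here is a minimal sketch of how matching on language, gender, and quality could work. It is not the library's implementation: `VoiceLike`, `FilterLike`, `matchesLanguage`, and `filterVoicesSketch` are hypothetical names, and the rule that "en" also matches regional tags such as "en-US" is an assumption.

```typescript
// Local stand-ins for the documented types; only the fields used here are included.
type TQuality = "veryLow" | "low" | "normal" | "high" | "veryHigh";
type TGender = "female" | "male" | "neutral";

interface VoiceLike {
  name: string;
  language: string; // BCP-47 tag, e.g. "en-US"
  gender?: TGender;
  quality?: TQuality[];
}

interface FilterLike {
  language?: string | string[];
  gender?: TGender;
  quality?: TQuality | TQuality[];
}

// Assumption: a filter of "en" matches "en" itself and any "en-*" regional tag.
function matchesLanguage(tag: string, wanted: string[]): boolean {
  const lower = tag.toLowerCase();
  return wanted.some((w) => {
    const prefix = w.toLowerCase();
    return lower === prefix || lower.startsWith(prefix + "-");
  });
}

// Hypothetical helper: keeps voices that satisfy every criterion present in `options`.
function filterVoicesSketch(voices: VoiceLike[], options: FilterLike): VoiceLike[] {
  // Normalize "single value or array" options to arrays (or undefined when absent).
  const languages =
    typeof options.language === "string" ? [options.language] : options.language;
  const qualities =
    typeof options.quality === "string" ? [options.quality] : options.quality;

  return voices.filter((voice) => {
    if (languages && !matchesLanguage(voice.language, languages)) return false;
    if (options.gender && voice.gender !== options.gender) return false;
    // A voice passes the quality check if it offers at least one requested level.
    if (qualities && !(voice.quality ?? []).some((q) => qualities.includes(q))) return false;
    return true;
  });
}
```

Criteria that are left out of `options` are simply not applied, so an empty options object returns every voice in this sketch.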

Group Voices

voiceManager.groupVoices(voices: ReadiumSpeechVoice[], groupBy: "language" | "region" | "gender" | "quality" | "provider"): VoiceGroup

Organizes voices into groups based on the specified criteria. The available grouping options are "language", "region", "gender", "quality", and "provider".
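
The shape of VoiceGroup is not spelled out in this README. As an illustration only, the sketch below assumes it behaves like a plain object that maps a group key to the voices in that group. `VoiceLike`, `VoiceGroupSketch`, `groupBySketch`, `primaryLanguage`, and `providerOf` are hypothetical names.

```typescript
// Minimal stand-in for ReadiumSpeechVoice with only the fields needed here.
interface VoiceLike {
  name: string;
  language: string; // BCP-47 tag, e.g. "en-US"
  provider?: string;
}

// Assumption: a group maps a key (such as "en") to the voices sharing that key.
type VoiceGroupSketch = Record<string, VoiceLike[]>;

// Generic grouping: `keyOf` decides which bucket each voice belongs to.
function groupBySketch(
  voices: VoiceLike[],
  keyOf: (voice: VoiceLike) => string
): VoiceGroupSketch {
  const groups: VoiceGroupSketch = {};
  for (const voice of voices) {
    const key = keyOf(voice);
    // Create the bucket on first use, then append.
    (groups[key] ??= []).push(voice);
  }
  return groups;
}

// Key functions for two of the documented grouping options.
const primaryLanguage = (voice: VoiceLike): string => {
  const [primary = ""] = voice.language.split("-");
  return primary.toLowerCase();
};
const providerOf = (voice: VoiceLike): string => voice.provider ?? "unknown";
```

With these helpers, `groupBySketch(voices, primaryLanguage)` puts "en-US" and "en-GB" voices in the same "en" bucket, while `groupBySketch(voices, providerOf)` collects voices without a provider under "unknown".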

Sort Voices

voiceManager.sortVoices(voices: ReadiumSpeechVoice[], options: SortOptions): ReadiumSpeechVoice[]

Arranges voices according to the specified sorting criteria. The SortOptions interface allows you to sort by various properties and specify sort order.

interface SortOptions {
  by: "name" | "language" | "gender" | "quality" | "region";
  order?: "asc" | "desc";
}
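
Sorting by quality needs an ordering over the TQuality levels. The sketch below shows one way this could be done; it is an illustration, not the library's code. It assumes that a voice is ranked by the best quality level it offers and that voices without quality data rank lowest. `VoiceLike`, `QUALITY_RANK`, `bestQualityRank`, and `sortByQualitySketch` are hypothetical names.

```typescript
type TQuality = "veryLow" | "low" | "normal" | "high" | "veryHigh";

// Minimal stand-in for ReadiumSpeechVoice.
interface VoiceLike {
  name: string;
  quality?: TQuality[];
}

// Numeric rank for each documented quality level, lowest to highest.
const QUALITY_RANK: Record<TQuality, number> = {
  veryLow: 0,
  low: 1,
  normal: 2,
  high: 3,
  veryHigh: 4,
};

// Assumption: rank a voice by its best level; -1 when no quality is listed.
function bestQualityRank(voice: VoiceLike): number {
  return Math.max(-1, ...(voice.quality ?? []).map((q) => QUALITY_RANK[q]));
}

function sortByQualitySketch(
  voices: VoiceLike[],
  order: "asc" | "desc" = "asc"
): VoiceLike[] {
  const direction = order === "asc" ? 1 : -1;
  // Copy first so the caller's array is not mutated; break ties by name.
  return [...voices].sort(
    (a, b) =>
      direction * (bestQualityRank(a) - bestQualityRank(b)) ||
      a.name.localeCompare(b.name)
  );
}
```

Returning a copy keeps the function safe to call on arrays that are also used elsewhere, for example the unsorted list shown in a settings panel.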

Testing

Get Test Utterance

voiceManager.getTestUtterance(language: string): string

Retrieves a sample text string suitable for testing text-to-speech functionality in the specified language. If no sample text is available for the specified language, it returns an empty string.
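
Because an empty string signals that no sample exists, callers may want a fallback. The sketch below is one possible approach: it tries the full tag, then the primary language subtag, then a caller-supplied default. Whether the library itself resolves regional tags is not stated here, so the retry is an assumption, and `GetTestUtterance` and `sampleTextFor` are hypothetical names.

```typescript
// Function type matching the documented getTestUtterance(language: string): string.
type GetTestUtterance = (language: string) => string;

function sampleTextFor(
  getTestUtterance: GetTestUtterance,
  languageTag: string,
  fallback: string
): string {
  // Try the full tag first ("fr-CA"), then its primary subtag ("fr").
  const [primary = languageTag] = languageTag.split("-");
  for (const candidate of [languageTag, primary]) {
    const text = getTestUtterance(candidate);
    if (text !== "") return text;
  }
  // No sample available for this language.
  return fallback;
}
```

In an application, the first argument could be `(lang) => voiceManager.getTestUtterance(lang)`, so the helper stays independent of how the manager was created.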

Interfaces

ReadiumSpeechVoice

interface ReadiumSpeechVoice {
  source: TSource;        // "json" | "browser"

  // Core identification (required)
  label: string;          // Human-friendly label for the voice
  name: string;           // JSON Name (or Web Speech API name if not found)
  originalName: string;   // Original name of the voice
  voiceURI?: string;      // For Web Speech API compatibility
  
  // Localization
  language: string;       // BCP-47 language tag
  localizedName?: TLocalizedName; // Localization pattern (android/apple)
  altNames?: string[];     // Alternative names (mostly for Apple voices)
  altLanguage?: string;    // Alternative BCP-47 language tag
  otherLanguages?: string[]; // Other languages this voice can speak
  multiLingual?: boolean;  // If voice can handle multiple languages
  
  // Voice characteristics
  gender?: TGender;       // Voice gender ("female" | "male" | "neutral")
  children?: boolean;     // If this is a children's voice
  
  // Quality and capabilities
  quality?: TQuality[];    // Available quality levels for this voice ("veryLow" | "low" | "normal" | "high" | "veryHigh")
  pitchControl?: boolean;  // Whether pitch can be controlled
  
  // Performance settings
  pitch?: number;         // Current pitch (0-2, where 1 is normal)
  rate?: number;          // Speech rate (0.1-10, where 1 is normal)
  
  // Platform and compatibility
  browser?: string[];     // Supported browsers
  os?: string[];          // Supported operating systems
  preloaded?: boolean;    // If the voice is preloaded on the system
  nativeID?: string | string[]; // Platform-specific voice ID(s)
  
  // Additional metadata
  note?: string;          // Additional notes about the voice
  provider?: string;      // Voice provider (e.g., "Microsoft", "Google")
  
  // Allow any additional properties that might be in the JSON
  [key: string]: any;
}
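
The comments above give ranges for pitch (0-2) and rate (0.1-10). A small sketch of keeping user-supplied values inside those ranges before applying them could look like this. `clamp` and `normalizePlaybackSettings` are hypothetical helpers, and defaulting to 1 when a value is missing is an assumption based on "1 is normal".

```typescript
// Restrict a number to the inclusive range [min, max].
function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

interface PlaybackSettings {
  pitch?: number;
  rate?: number;
}

// Apply the ranges from the ReadiumSpeechVoice comments: pitch 0-2, rate 0.1-10.
// Assumption: a missing value falls back to 1, the documented "normal" setting.
function normalizePlaybackSettings(
  settings: PlaybackSettings
): Required<PlaybackSettings> {
  return {
    pitch: clamp(settings.pitch ?? 1, 0, 2),
    rate: clamp(settings.rate ?? 1, 0.1, 10),
  };
}
```

For example, `normalizePlaybackSettings({ pitch: 3, rate: 0 })` yields a pitch of 2 and a rate of 0.1.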

LanguageInfo

interface LanguageInfo {
  code: string;
  label: string;
  count: number;
}

Enums

TQuality

type TQuality = "veryLow" | "low" | "normal" | "high" | "veryHigh";

TGender

type TGender = "female" | "male" | "neutral";

TSource

type TSource = "json" | "browser";

Playback API

ReadiumSpeechNavigator

interface ReadiumSpeechNavigator {
  // Voice Management
  getVoices(): Promise<ReadiumSpeechVoice[]>;
  setVoice(voice: ReadiumSpeechVoice | string): Promise<void>;
  getCurrentVoice(): ReadiumSpeechVoice | null;
  
  // Content Management
  loadContent(content: ReadiumSpeechUtterance | ReadiumSpeechUtterance[]): void;
  getCurrentContent(): ReadiumSpeechUtterance | null;
  getContentQueue(): ReadiumSpeechUtterance[];
  
  // Playback Control
  play(): void;
  pause(): void;
  stop(): void;
  
  // Navigation
  next(): boolean;
  previous(): boolean;
  jumpTo(utteranceIndex: number): void;
  
  // Playback Parameters
  setRate(rate: number): void;
  getRate(): number;
  setPitch(pitch: number): void;
  getPitch(): number;
  setVolume(volume: number): void;
  getVolume(): number;
  
  // State
  getState(): ReadiumSpeechPlaybackState;
  getCurrentUtteranceIndex(): number;
  
  // Events
  on(
    event: ReadiumSpeechPlaybackEvent["type"],
    listener: (event: ReadiumSpeechPlaybackEvent) => void
  ): void;
  
  // Cleanup
  destroy(): void;
}
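
To illustrate the navigation part of this interface, the sketch below models a queue with `next()`, `previous()`, and `jumpTo()`. It is a stand-alone toy, not the navigator's implementation. It assumes that the boolean results mean "the position changed" and that an out-of-range index is rejected with an error. `UtteranceQueueSketch` is a hypothetical name.

```typescript
class UtteranceQueueSketch<T> {
  private readonly items: readonly T[];
  private index = 0;

  constructor(items: readonly T[]) {
    this.items = items;
  }

  // Current item, or null when the queue is empty.
  current(): T | null {
    return this.items[this.index] ?? null;
  }

  getCurrentIndex(): number {
    return this.index;
  }

  // Assumption: returns false when already at the last item.
  next(): boolean {
    if (this.index >= this.items.length - 1) return false;
    this.index += 1;
    return true;
  }

  // Assumption: returns false when already at the first item.
  previous(): boolean {
    if (this.index <= 0) return false;
    this.index -= 1;
    return true;
  }

  // Assumption: an invalid index throws instead of being silently ignored.
  jumpTo(utteranceIndex: number): void {
    const valid =
      Number.isInteger(utteranceIndex) &&
      utteranceIndex >= 0 &&
      utteranceIndex < this.items.length;
    if (!valid) {
      throw new RangeError(`No utterance at index ${utteranceIndex}`);
    }
    this.index = utteranceIndex;
  }
}
```

A UI can use the boolean results to disable "previous" and "next" buttons at either end of the queue.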

Events

ReadiumSpeechPlaybackEvent

type ReadiumSpeechPlaybackEvent = {
  type: 
    | "start"           // Playback started
    | "pause"           // Playback paused
    | "resume"          // Playback resumed
    | "end"             // Playback ended naturally
    | "stop"            // Playback stopped manually
    | "skip"            // Skipped to another utterance
    | "error"           // An error occurred
    | "boundary"        // Reached a word/sentence boundary
    | "mark"            // Reached a named mark in SSML
    | "idle"            // No content loaded
    | "loading"         // Loading content
    | "ready"           // Ready to play
    | "voiceschanged";   // Available voices changed
  detail?: any;  // Event-specific data
};
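
The `on(event, listener)` method pairs with this event type. As a rough sketch of the listener pattern, and not the library's internals, the class below stores listeners per event type and calls only those registered for the emitted type. `PlaybackEventsSketch` and its `emit` method are hypothetical, and `detail` is typed as `unknown` here rather than `any`.

```typescript
type PlaybackEventType =
  | "start" | "pause" | "resume" | "end" | "stop" | "skip" | "error"
  | "boundary" | "mark" | "idle" | "loading" | "ready" | "voiceschanged";

interface PlaybackEventSketch {
  type: PlaybackEventType;
  detail?: unknown; // event-specific data
}

type PlaybackListener = (event: PlaybackEventSketch) => void;

class PlaybackEventsSketch {
  private readonly listeners = new Map<PlaybackEventType, Set<PlaybackListener>>();

  // Register a listener for one event type.
  on(type: PlaybackEventType, listener: PlaybackListener): void {
    const registered = this.listeners.get(type) ?? new Set<PlaybackListener>();
    registered.add(listener);
    this.listeners.set(type, registered);
  }

  // Hypothetical: deliver an event to the listeners registered for its type.
  emit(event: PlaybackEventSketch): void {
    this.listeners.get(event.type)?.forEach((listener) => listener(event));
  }
}
```

A reading app might register a "boundary" listener to move a highlight and an "error" listener to show a message, without either listener seeing unrelated events.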

ReadiumSpeechPlaybackState

type ReadiumSpeechPlaybackState = "playing" | "paused" | "idle" | "loading" | "ready";
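
Since the state is a closed union, a UI can map it exhaustively and let the compiler flag any state that is added later. The sketch below is an example of that pattern. The `ButtonAction` type, the `primaryActionFor` function, and the specific state-to-action mapping are hypothetical UI choices, not behavior defined by the library.

```typescript
type ReadiumSpeechPlaybackState = "playing" | "paused" | "idle" | "loading" | "ready";

// Hypothetical: what the main transport button should offer in each state.
type ButtonAction = "play" | "pause" | "none";

function primaryActionFor(state: ReadiumSpeechPlaybackState): ButtonAction {
  switch (state) {
    case "playing":
      return "pause";
    case "paused":
    case "ready":
      return "play";
    case "idle":
    case "loading":
      // Nothing to play yet, so the button is disabled.
      return "none";
    default: {
      // Compile-time exhaustiveness check: fails if a new state is not handled.
      const unreachable: never = state;
      return unreachable;
    }
  }
}
```

Calling `primaryActionFor(navigator.getState())` after each playback event would keep the button in sync, assuming `navigator` is a ReadiumSpeechNavigator instance.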