
Media Type

Summary

This proposal introduces a dedicated API to easily figure out a file format.

While a Publication is independent of any particular format, knowing the format of a publication file is necessary to:

This API is not tied to Publication, so it can be used as a general-purpose tool to guess a file format, e.g. during HTTP requests or in the LCP library.

Developer Guide

You can use the Media Type API whenever you need to determine the format of a file or of raw bytes.

To use this API efficiently, you should:

Sniffing the Media Type of Raw Bytes

You can use MediaType.of() directly to sniff raw bytes (e.g. the body of an HTTP response). It takes a closure that returns the bytes lazily, so they are read only if a sniffer needs them.

let feedLink: Link
let response = httpClient.get(feedLink.href)

let mediaType = MediaType.of(
    bytes: { response.body },
    // You can give several file extension and media type hints, which will be sniffed in order.
    fileExtensions: [feedLink.href.pathExtension],
    mediaTypes: [response.headers["Content-Type"], feedLink.type]
)

In the case of an HTTP response, this can be simplified by using the HTTPResponse.sniffMediaType() extension:

let feedLink: Link
let response = httpClient.get(feedLink.href)

let mediaType = response.sniffMediaType(mediaTypes: [feedLink.type])

Sniffing the Media Type of a File

For local files, you can provide an absolute path to MediaType.of(). To improve sniffing speed, you should also provide a media type hint when possible, for example one you previously stored in a database.

let dbBook = database.get(bookId)

let mediaType = MediaType.of(
    path: dbBook.path,
    mediaTypes: [dbBook.mediaType]
)

Supporting a Custom Media Type

Reading apps are welcome to extend this API with their own media types. To declare a custom media type, you need to:

  1. Create a MediaType constant, optionally in the MediaType namespace (through an extension).
  2. Create a sniffer function to recognize your media type from a MediaType.SnifferContext.
  3. Then, either:
    1. add your sniffer to the MediaType.sniffers shared list to be used globally,
    2. or use your sniffer on a case-by-case basis, by passing it to the sniffers argument of MediaType.of().

Here’s an example with Adobe’s ACSM media type.

// 1. Create the `MediaType` instance.
// The initializer is failable, so force-unwrap this known-valid media type.
private let acsmMediaType = MediaType(
    "application/vnd.adobe.adept+xml",
    name: "Adobe Content Server Message",
    fileExtension: "acsm"
)!

extension MediaType {
    static var ACSM: MediaType { acsmMediaType }
}

// 2. Create the sniffer function.
func sniffACSM(context: MediaType.SnifferContext) -> MediaType? {
    if
        context.hasMediaType("application/vnd.adobe.adept+xml") ||
        context.hasFileExtension("acsm") ||
        context.contentAsXML?.documentElement?.localName == "fulfillmentToken"
    { 
        return MediaType.ACSM
    }

    return nil
}

// 3.1. Declare the sniffer globally.
MediaType.sniffers.add(sniffACSM)
let mediaType = MediaType.of(path: acsmPath)

// 3.2. Or use the sniffer on a case-by-case basis. 
let mediaType = MediaType.of(path: acsmPath, sniffers: MediaType.sniffers + [sniffACSM])

Reference Guide

File formats are represented by MediaType instances, which can be used to get the file extension, name and media type string.

However, some formats can be identified by several media type aliases. For example, the canonical type of CBZ is application/vnd.comicbook+zip, but it also has the historical alias application/x-cbz. In this case, you should store only the canonical type in a database. You can resolve the canonical version of a known media type using mediaType.canonicalized.

All Readium APIs already return canonical media types, so this is useful only if you create your own MediaType from strings.

let fileExtension = MediaType("text/plain")?.canonicalized.fileExtension
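Canonicalization can be pictured as a lookup from known aliases to canonical types. The following sketch is hypothetical. `canonicalAliases` and `canonicalize` are illustrative names, not part of Readium. The table holds only the CBZ alias mentioned above. Inputs are assumed to carry no parameters, because the whole string is lowercased, which would be wrong for case-sensitive parameter values.

```swift
import Foundation

// Hypothetical alias table mapping historical aliases to canonical media types.
// Only the CBZ alias comes from this document; a real table would hold more.
let canonicalAliases: [String: String] = [
    "application/x-cbz": "application/vnd.comicbook+zip",
]

/// Returns the canonical form of a parameter-less media type string.
/// Unknown types are returned trimmed and lowercased, since type and subtype
/// are case-insensitive.
func canonicalize(_ mediaType: String) -> String {
    let normalized = mediaType
        .trimmingCharacters(in: .whitespaces)
        .lowercased()
    return canonicalAliases[normalized] ?? normalized
}

// Store only the canonical form, e.g. in a database column.
let storedType = canonicalize("application/x-cbz")
print(storedType) // application/vnd.comicbook+zip
```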

Sniffers are functions with the type MediaType.Sniffer whose job is to resolve a MediaType from bytes or metadata. Each supported MediaType must have at least one matching sniffer to be recognized. Therefore, a reading app should provide its own sniffers to support custom publication formats.
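To make the sniffer contract concrete, here is a minimal sketch in which a sniffer is just a function from a context to an optional media type. `SketchMediaType`, `HintContext` and `sniff(_:with:)` are hypothetical stand-ins for MediaType, MediaType.SnifferContext and MediaType.of(), reduced to file extension and media type hints. The sketch also shows why a format with no matching sniffer is not recognized: `sniff` simply returns nil.

```swift
// Hypothetical, simplified stand-in for MediaType.
struct SketchMediaType: Equatable {
    let string: String
    let fileExtension: String?
}

// Hypothetical, hints-only stand-in for MediaType.SnifferContext.
struct HintContext {
    let fileExtensions: [String]
    let mediaTypes: [String]

    func hasFileExtension(_ ext: String) -> Bool {
        fileExtensions.contains { $0.lowercased() == ext.lowercased() }
    }

    func hasMediaType(_ mediaType: String) -> Bool {
        mediaTypes.contains { $0.lowercased() == mediaType.lowercased() }
    }
}

// A sniffer returns a media type when it recognizes the context, nil otherwise.
typealias Sniffer = (HintContext) -> SketchMediaType?

let epub = SketchMediaType(string: "application/epub+zip", fileExtension: "epub")

let sniffEPUB: Sniffer = { context in
    if context.hasMediaType("application/epub+zip") || context.hasFileExtension("epub") {
        return epub
    }
    return nil
}

// Returns the result of the first sniffer that recognizes the context.
func sniff(_ context: HintContext, with sniffers: [Sniffer]) -> SketchMediaType? {
    for sniffer in sniffers {
        if let mediaType = sniffer(context) {
            return mediaType
        }
    }
    return nil
}

// A PDF is not recognized here, because no PDF sniffer is registered.
let pdfResult = sniff(HintContext(fileExtensions: ["pdf"], mediaTypes: []), with: [sniffEPUB])
```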

MediaType class

Represents a document format, identified by a unique RFC 6838 media type.

MediaType handles:

Comparing media types is more complicated than it looks, since they can contain parameters such as charset=utf-8. We can’t ignore them because some formats use parameters in their media type, for example application/atom+xml;profile=opds-catalog for an OPDS 1 catalog.
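One way to honor parameters is to parse the type, subtype and parameters, then compare type and subtype case-insensitively. The parameters of the more general media type must also be present, with the same values, in the more specific one. This is a sketch under those assumptions. `ParsedMediaType` and the semantics of its `contains(_:)` method are hypothetical, and the parser ignores quoted parameter values.

```swift
import Foundation

// Hypothetical parsed form of an RFC 6838 media type with parameters.
// Quoted parameter values (which may contain ";") are not handled.
struct ParsedMediaType {
    let type: String
    let subtype: String
    let parameters: [String: String]

    init?(_ string: String) {
        // Split "type/subtype;name=value;..." into trimmed components.
        let components = string
            .split(separator: ";")
            .map { $0.trimmingCharacters(in: .whitespaces) }
        guard let essence = components.first else { return nil }

        let parts = essence.split(separator: "/", omittingEmptySubsequences: false)
        guard parts.count == 2, !parts[0].isEmpty, !parts[1].isEmpty else { return nil }
        type = parts[0].lowercased()
        subtype = parts[1].lowercased()

        var parameters: [String: String] = [:]
        for component in components.dropFirst() {
            let pair = component.split(separator: "=", maxSplits: 1)
            guard pair.count == 2 else { continue }
            // Parameter names are case-insensitive; values are kept as-is.
            let name = pair[0].trimmingCharacters(in: .whitespaces).lowercased()
            parameters[name] = pair[1].trimmingCharacters(in: .whitespaces)
        }
        self.parameters = parameters
    }

    /// True when `other` has the same type and subtype and carries every
    /// parameter of `self` with the same value (it may carry more).
    func contains(_ other: ParsedMediaType) -> Bool {
        guard type == other.type, subtype == other.subtype else { return false }
        return parameters.allSatisfy { other.parameters[$0.key] == $0.value }
    }
}

let opds1 = ParsedMediaType("application/atom+xml;profile=opds-catalog")!
let feed = ParsedMediaType("application/atom+xml; profile=opds-catalog; charset=utf-8")!
let atom = ParsedMediaType("application/atom+xml")!
```

With these semantics, a plain Atom type contains an OPDS 1 feed, but the OPDS 1 type does not contain a plain Atom document. Extra parameters such as charset=utf-8 are tolerated, while meaningful ones like profile are never silently dropped.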

Constructor

Properties

Methods

Helpers

Computed properties for convenience. More can be added as needed.

Constants

Static constants are provided in MediaType for well-known media types. These are MediaType instances, not strings.

Constant | Media Type | Extension | Name
AAC | audio/aac | aac |
ACSM | application/vnd.adobe.adept+xml | acsm | Adobe Content Server Message
AIFF | audio/aiff | aiff |
AVI | video/x-msvideo | avi |
AVIF | image/avif | avif |
Binary | application/octet-stream | |
BMP | image/bmp | bmp |
CBZ | application/vnd.comicbook+zip | cbz | Comic Book Archive
CSS | text/css | css |
DiViNa | application/divina+zip | divina | Digital Visual Narratives
DiViNaManifest | application/divina+json | json | Digital Visual Narratives
EPUB | application/epub+zip | epub | EPUB
GIF | image/gif | gif |
GZ | application/gzip | gz |
JavaScript | text/javascript | js |
JPEG | image/jpeg | jpeg |
JXL | image/jxl | jxl |
HTML | text/html | html |
JSON | application/json | json |
LCPProtectedAudiobook | application/audiobook+lcp | lcpa | LCP Protected Audiobook
LCPProtectedPDF | application/pdf+lcp | lcpdf | LCP Protected PDF
LCPLicenseDocument | application/vnd.readium.lcp.license.v1.0+json | lcpl | LCP License
LCPStatusDocument | application/vnd.readium.license.status.v1.0+json | |
LPF | application/lpf+zip | lpf |
MP3 | audio/mpeg | mp3 |
MPEG | video/mpeg | mpeg |
NCX | application/x-dtbncx+xml | ncx |
Ogg | audio/ogg | oga |
Ogv | video/ogg | ogv |
Opus | audio/opus | opus |
OPDS1 | application/atom+xml;profile=opds-catalog | |
OPDS1Entry | application/atom+xml;type=entry;profile=opds-catalog | |
OPDS2 | application/opds+json | |
OPDS2Publication | application/opds-publication+json | |
OPDSAuthentication | application/opds-authentication+json | |
OTF | font/otf | otf |
PDF | application/pdf | pdf | PDF
PNG | image/png | png |
ReadiumAudiobook | application/audiobook+zip | audiobook | Readium Audiobook
ReadiumAudiobookManifest | application/audiobook+json | json | Readium Audiobook
ReadiumWebPub | application/webpub+zip | webpub | Readium Web Publication
ReadiumWebPubManifest | application/webpub+json | json | Readium Web Publication
SMIL | application/smil+xml | smil |
SVG | image/svg+xml | svg |
Text | text/plain | txt |
TIFF | image/tiff | tiff |
TTF | font/ttf | ttf |
W3CWPUBManifest | (non-existent) application/x.readium.w3c.wpub+json | json | Web Publication
WAV | audio/wav | wav |
WebMAudio | audio/webm | webm |
WebMVideo | video/webm | webm |
WebP | image/webp | webp |
WOFF | font/woff | woff |
WOFF2 | font/woff2 | woff2 |
XHTML | application/xhtml+xml | xhtml |
XML | application/xml | xml |
ZAB | (non-existent) application/x.readium.zab+zip | zab | Zipped Audio Book
ZIP | application/zip | zip |

MediaType.Sniffer Function Type

Determines if the provided content matches a known media type.

Definition

MediaType.SnifferContext Interface

A companion type of MediaType.Sniffer that holds the type hints (file extensions and media types) and provides access to the file content.
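Heavy sniffing may never be needed, so a context can defer reading the content until a sniffer first asks for it and then cache the result. This follows the same idea as the lazy closure passed to MediaType.of(bytes:). The class below is a hypothetical sketch: `LazyContentContext` is not a Readium type, and it assumes that content requested as a string is decoded as UTF-8.

```swift
import Foundation

// Hypothetical sniffer context that stores the hints and reads the content
// lazily, at most once, the first time a sniffer asks for it.
final class LazyContentContext {
    let fileExtensions: [String]
    let mediaTypes: [String]
    private let loadBytes: () -> Data?
    // Outer optional: not loaded yet. Inner optional: loaded, but no content.
    private var cachedBytes: Data??

    init(fileExtensions: [String] = [], mediaTypes: [String] = [], bytes: @escaping () -> Data?) {
        self.fileExtensions = fileExtensions
        self.mediaTypes = mediaTypes
        self.loadBytes = bytes
    }

    /// Raw content, loaded on first access and cached afterwards.
    var content: Data? {
        if let cached = cachedBytes {
            return cached
        }
        let bytes = loadBytes()
        cachedBytes = .some(bytes)
        return bytes
    }

    /// Content decoded as UTF-8 text (an assumption of this sketch).
    var contentAsString: String? {
        content.flatMap { String(data: $0, encoding: .utf8) }
    }
}

// Usage: the closure runs only once, however many sniffers read the content.
var reads = 0
let context = LazyContentContext(fileExtensions: ["txt"]) {
    reads += 1
    return Data("hello".utf8)
}
let first = context.contentAsString   // triggers the read
let second = context.contentAsString  // served from the cache
```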

Examples of concrete implementations:

Properties

Methods

HTTP Response Extension

It’s useful to be able to resolve a format from an HTTP response. Therefore, implementations should, when possible, provide an extension to the native HTTP response type.

This extension will create a MediaType.BytesSnifferContext using the following information:

Sniffing Strategy

It’s important to have consistent results across platforms, so all implementations need to use the same sniffing strategy.

Sniffing a format is done in two rounds, because we want to give all sniffers an opportunity to return a MediaType quickly before any of them inspects the content itself:

  1. Light Sniffing checks only the provided file extension or media type hints.
  2. Heavy Sniffing reads the bytes to perform more advanced sniffing.

To do that, MediaType.of() will iterate over all the sniffers twice, first with a MediaType.SnifferContext containing only extensions and media types, and the second time with a context containing the content, if available.
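The two rounds can be sketched as two passes over the same sniffer list. In this hypothetical version, `Context`, `Sniffer` and `sniffMediaType` are illustrative names, and media types are plain strings. The second round is skipped when no content is available, since it would see the same hints as the first. The example PDF sniffer checks the `%PDF-` file header during heavy sniffing.

```swift
import Foundation

// Hypothetical, simplified context: hints, plus content only during heavy sniffing.
struct Context {
    let fileExtensions: [String]
    let mediaTypes: [String]
    let content: (() -> Data?)?
}

typealias Sniffer = (Context) -> String?

func sniffMediaType(
    fileExtensions: [String],
    mediaTypes: [String],
    bytes: (() -> Data?)?,
    sniffers: [Sniffer]
) -> String? {
    // Round 1: light sniffing, every sniffer sees only the hints.
    let light = Context(fileExtensions: fileExtensions, mediaTypes: mediaTypes, content: nil)
    for sniffer in sniffers {
        if let result = sniffer(light) { return result }
    }

    // Round 2: heavy sniffing, only when there is content to inspect.
    guard let bytes = bytes else { return nil }
    let heavy = Context(fileExtensions: fileExtensions, mediaTypes: mediaTypes, content: bytes)
    for sniffer in sniffers {
        if let result = sniffer(heavy) { return result }
    }
    return nil
}

// Example sniffer: hints first, then the "%PDF-" file header if content is available.
let sniffPDF: Sniffer = { context in
    if context.fileExtensions.contains("pdf") || context.mediaTypes.contains("application/pdf") {
        return "application/pdf"
    }
    if let data = context.content?(), data.starts(with: Array("%PDF-".utf8)) {
        return "application/pdf"
    }
    return nil
}

// Usage: count how often the (possibly expensive) content is read.
var reads = 0
let pdfBytes: () -> Data? = {
    reads += 1
    return Data("%PDF-1.7".utf8)
}
let byHint = sniffMediaType(fileExtensions: ["pdf"], mediaTypes: [], bytes: pdfBytes, sniffers: [sniffPDF])
let byContent = sniffMediaType(fileExtensions: [], mediaTypes: [], bytes: pdfBytes, sniffers: [sniffPDF])
```

Here `byHint` is resolved in the first round without touching the bytes. `byContent` needs the second round, so `reads` ends up at 1.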

Default Sniffers

Sniffers can encapsulate the detection of several media types to share similar detection logic. For example, the following default sniffers were identified. Their order is important, because some formats are subsets of others.

  1. HTML
  2. OPDS 1
  3. OPDS 2
  4. LCP License Document
  5. Bitmap (BMP, GIF, JPEG, JXL, PNG, TIFF, WebP, AVIF)
  6. Readium Web Publication (WebPub, Audiobook, DiViNa, RWPM, LCPA and LCPDF)
  7. W3C Web Publication
  8. EPUB
  9. LPF
  10. Free-form ZIP (CBZ and ZAB)
  11. PDF
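The first sniffer that returns a media type wins, so a sniffer for a more specific format must run before a sniffer for a format that contains it. The following hypothetical sketch shows this with a Readium Audiobook manifest, which is also a valid Readium Web Publication manifest. The type names are illustrative, and checking only for `metadata` and `readingOrder` is a simplification. The audiobook profile URL is an assumption and is not taken from this document.

```swift
import Foundation

// Hypothetical context exposing the content parsed as a JSON object.
struct JSONContext {
    let json: [String: Any]?

    init(_ text: String) {
        json = (try? JSONSerialization.jsonObject(with: Data(text.utf8))) as? [String: Any]
    }

    // Profiles declared in metadata.conformsTo (a string or an array of strings).
    var conformsTo: [String] {
        let value = (json?["metadata"] as? [String: Any])?["conformsTo"]
        if let single = value as? String { return [single] }
        return value as? [String] ?? []
    }
}

typealias Sniffer = (JSONContext) -> String?

// Assumed audiobook profile URL, used here for illustration.
let audiobookProfile = "https://readium.org/webpub-manifest/profiles/audiobook"

// Specific: only manifests declaring the audiobook profile.
let sniffAudiobookManifest: Sniffer = { context in
    context.conformsTo.contains(audiobookProfile) ? "application/audiobook+json" : nil
}

// General: any object with "metadata" and "readingOrder" (a simplification).
let sniffWebPubManifest: Sniffer = { context in
    guard let json = context.json, json["metadata"] != nil, json["readingOrder"] != nil else {
        return nil
    }
    return "application/webpub+json"
}

// The first sniffer returning a media type wins.
func firstMatch(_ context: JSONContext, _ sniffers: [Sniffer]) -> String? {
    for sniffer in sniffers {
        if let result = sniffer(context) { return result }
    }
    return nil
}

let manifest = JSONContext(#"{"metadata": {"conformsTo": "https://readium.org/webpub-manifest/profiles/audiobook"}, "readingOrder": []}"#)
let rightOrder = firstMatch(manifest, [sniffAudiobookManifest, sniffWebPubManifest])
let wrongOrder = firstMatch(manifest, [sniffWebPubManifest, sniffAudiobookManifest])
```

With the general sniffer first, the audiobook is misreported as a generic Web Publication manifest (`wrongOrder`).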

In the case of bitmap formats, the default Readium sniffers don’t perform any heavy sniffing, because we only need to detect these formats using file extensions in ZIP entries or media types in a manifest. If needed, a reading app could add additional sniffers doing heavy sniffing of bitmap files.
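A light-only bitmap sniffer can be a simple table lookup on hints that never reads the content. This is a hypothetical sketch: `Hints`, `bitmapFormats` and `sniffBitmap` are illustrative names. The media types come from the constants table above, and the `jpg` and `tif` spellings are added as common alternate extensions.

```swift
// Hypothetical hints passed to a light sniffer (no content access at all).
struct Hints {
    let fileExtensions: [String]
    let mediaTypes: [String]
}

// Bitmap formats from the constants table; "jpg" and "tif" are added as
// common alternate extensions.
let bitmapFormats: [(extensions: Set<String>, mediaType: String)] = [
    (extensions: ["avif"], mediaType: "image/avif"),
    (extensions: ["bmp"], mediaType: "image/bmp"),
    (extensions: ["gif"], mediaType: "image/gif"),
    (extensions: ["jpeg", "jpg"], mediaType: "image/jpeg"),
    (extensions: ["jxl"], mediaType: "image/jxl"),
    (extensions: ["png"], mediaType: "image/png"),
    (extensions: ["tiff", "tif"], mediaType: "image/tiff"),
    (extensions: ["webp"], mediaType: "image/webp"),
]

/// Light-only bitmap sniffer: it matches on hints and never reads the bytes.
func sniffBitmap(_ hints: Hints) -> String? {
    let extensions = hints.fileExtensions.map { $0.lowercased() }
    let mediaTypes = hints.mediaTypes.map { $0.lowercased() }
    for format in bitmapFormats {
        let extensionMatches = extensions.contains { format.extensions.contains($0) }
        if extensionMatches || mediaTypes.contains(format.mediaType) {
            return format.mediaType
        }
    }
    return nil
}

// E.g. an entry "pages/001.JPG" in a CBZ archive yields the hint "JPG".
let pageType = sniffBitmap(Hints(fileExtensions: ["JPG"], mediaTypes: []))
```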

Media Types

Audiobook (Readium)
Audiobook Manifest (Readium)
BMP
CBZ
DiViNa
DiViNa Manifest
EPUB
GIF
HTML
JPEG
JPEG XL
OPDS 1 Feed
OPDS 1 Entry
OPDS 2 Feed
OPDS 2 Publication
OPDS Authentication Document
LCP Protected Audiobook
LCP Protected PDF
LCP License Document
LPF (Lightweight Packaging Format)
PDF
PNG
Web Publication (Readium)
Web Publication Manifest (Readium)
Web Publication Manifest (W3C)
TIFF
WebP
AVIF
ZAB (Zipped Audio Book)