This proposal introduces a dedicated API to easily figure out a file format.
While a Publication is independent of any particular format, knowing the format of a publication
file is necessary to:
This API is not tied to Publication, so it can be used as a general purpose tool to guess a file
format, e.g. during HTTP requests or in the LCP library.
You can use the Media Type API every time you need to figure out the format of a file or bytes.
To use this API efficiently, you should:
You can use `MediaType.of()` directly to sniff raw bytes (e.g. the body of an HTTP response). It
takes a closure returning the bytes lazily.

    let feedLink: Link
    let response = httpClient.get(feedLink.href)
    let mediaType = MediaType.of(
        bytes: { response.body },
        // You can give several file extension and media type hints, which will be sniffed in order.
        fileExtensions: [feedLink.href.pathExtension],
        mediaTypes: [response.headers["Content-Type"], feedLink.type]
    )

In the case of an HTTP response, this can be simplified by using the HTTPResponse.sniffMediaType()
extension:

    let feedLink: Link
    let response = httpClient.get(feedLink.href)
    let mediaType = response.sniffMediaType(mediaTypes: [feedLink.type])

For local files, you can provide an absolute path to MediaType.of(). To improve sniffing speed,
you should also provide a media type hint if possible – for example if you previously stored it in a
database.

    let dbBook = database.get(bookId)
    let mediaType = MediaType.of(
        path: dbBook.path,
        mediaTypes: [dbBook.mediaType]
    )

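Reading apps are welcome to extend this API with their own media types. To declare a custom media type, you need to:

1. Create a `MediaType` constant, optionally in the `MediaType.` namespace.
2. Create a sniffer function that recognizes your media type from a `MediaType.SnifferContext`.
3. Then, either:
   1. add your sniffer to the `MediaType.sniffers` shared list to be used globally,
   2. or use it on a case-by-case basis, by passing it to the `sniffers` argument of `MediaType.of()`.

Here’s an example with Adobe’s ACSM media type.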

    // 1. Create the `MediaType` instance.
    private let acsmMediaType = MediaType(
        "application/vnd.adobe.adept+xml",
        name: "Adobe Content Server Message",
        fileExtension: "acsm"
    )

    extension MediaType {
        static var ACSM: MediaType { acsmMediaType }
    }

    // 2. Create the sniffer function.
    func sniffACSM(context: MediaType.SnifferContext) -> MediaType? {
        if
            context.hasMediaType("application/vnd.adobe.adept+xml") ||
            context.hasFileExtension("acsm") ||
            context.contentAsXML?.documentElement?.localName == "fulfillmentToken"
        {
            return MediaType.ACSM
        }
        return nil
    }

    // 3.1. Declare the sniffer globally.
    MediaType.sniffers.append(sniffACSM)
    let mediaType = MediaType.of(path: acsmPath)

    // 3.2. Or use the sniffer on a case-by-case basis.
    let mediaType = MediaType.of(path: acsmPath, sniffers: MediaType.sniffers + [sniffACSM])

File formats are represented by MediaType instances, which can be used to get the file
extension, name and media type string.
However, some formats can be identified by several media type aliases. For example, CBZ has the
canonical type `application/vnd.comicbook+zip` but also a historical alias, `application/x-cbz`. In
this case, you should only store the canonical type in a database. You can resolve the canonical
version of a known media type using `mediaType.canonicalized`.
All Readium APIs already return canonical media types, so this is useful only if you
create your own MediaType from strings.
let fileExtension = MediaType("text/plain")?.canonicalized.fileExtension
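
As a rough illustration of alias resolution before storing a type in a database, here is a minimal sketch based on a lookup table. The `canonicalTypes` table and the `canonicalString(for:)` helper are hypothetical names introduced for this example and are not part of the proposed API. Only the CBZ pair comes from the text above.

```swift
// Hypothetical alias table (assumption): maps known aliases to their canonical type.
// Only the CBZ pair is taken from the proposal text.
let canonicalTypes: [String: String] = [
    "application/x-cbz": "application/vnd.comicbook+zip",
]

/// Returns the canonical media type string for `string`,
/// or `string` itself when it is not a known alias.
func canonicalString(for string: String) -> String {
    canonicalTypes[string] ?? string
}

// The value to store in the database for a file declared with the historical alias.
let storedType = canonicalString(for: "application/x-cbz")
```

In this sketch an unknown type is returned unchanged, which mirrors the idea that canonicalization only applies to known media types.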
Sniffers are functions with the type MediaType.Sniffer whose job is to resolve a MediaType from
bytes or metadata. Each supported MediaType must have at least one matching sniffer to be
recognized. Therefore, a reading app should provide its own sniffers to support custom publication
formats.
MediaType class

Represents a document format, identified by a unique RFC 6838 media type.

MediaType handles:
Comparing media types is more complicated than it looks, since they can contain
parameters such as charset=utf-8. We can’t ignore them
because some formats use parameters in their media type, for example
application/atom+xml;profile=opds-catalog for an OPDS 1 catalog.

- `MediaType(string: String, name: String? = null, fileExtension: String? = null)`
  - Creates a `MediaType` from its string representation and an optional name and file extension.
- `string: String` (or `toString()` if more idiomatic)
  - … `charset` parameter, which is uppercased.
- `name: String?`
- `fileExtension: String?`
- `type: String`
  - The type component, e.g. `application` in `application/epub+zip`.
- `subtype: String`
  - The subtype component, e.g. `epub+zip` in `application/epub+zip`.
- `parameters: Map<String, String>`
  - The parameters in the media type, such as `charset=utf-8`.
- `structuredSyntaxSuffix: String?`
  - The structured syntax suffix, e.g. `+zip` in `application/epub+zip`.
- `encoding: Encoding?`
  - Encoding declared in the `charset` parameter, if there’s any.
  - Uses the `Encoding` type provided by the platform, for convenience.
- `canonicalized: MediaType`
  - The canonical version of this media type, if it is known. For example, `application/x-cbz` is an alias of the
    canonical `application/vnd.comicbook+zip`.
  - Equivalent to `MediaType.of(string) || this`.
- `contains(other: MediaType) -> Boolean`, `contains(other: String) -> Boolean`
  - Returns whether the given `other` media type is included in this media type.
  - For example, `text/html` contains `text/html;charset=utf-8`.
  - `other` must match the parameters in the `parameters` property, but extra parameters are ignored.
  - `image/*` contains `image/png` and `*/*` contains everything.
- `matches(other: MediaType) -> Boolean`, `matches(other: String) -> Boolean`
  - Returns whether this media type and `other` are the same, ignoring parameters that are not in both media types.
  - For example, `text/html` matches `text/html;charset=utf-8`, but `text/html;charset=ascii`
    doesn’t. This is basically like `contains`, but working in both directions.
- `==` (equality)

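
To make the comparison rules more concrete, here is a minimal sketch of `contains` and `matches` on a simplified type. `SimpleMediaType` is a hypothetical stand-in written for this example, not the proposed `MediaType` class. It assumes that `matches` is simply `contains` applied in both directions, and it does not handle quoted parameter values.

```swift
import Foundation

/// Hypothetical, simplified stand-in for `MediaType` (assumption),
/// only meant to illustrate the comparison rules.
struct SimpleMediaType: Equatable {
    let type: String
    let subtype: String
    let parameters: [String: String]

    init?(_ string: String) {
        // Split "type/subtype;name=value;..." into its components.
        let components = string
            .split(separator: ";")
            .map { $0.trimmingCharacters(in: .whitespaces) }
        guard let essence = components.first else { return nil }
        let types = essence.split(separator: "/", omittingEmptySubsequences: false)
        guard types.count == 2, !types[0].isEmpty, !types[1].isEmpty else { return nil }

        type = types[0].lowercased()
        subtype = types[1].lowercased()

        var params: [String: String] = [:]
        for component in components.dropFirst() {
            let pair = component.split(separator: "=", maxSplits: 1)
            guard pair.count == 2 else { continue }
            let name = pair[0].lowercased()
            let value = String(pair[1])
            // The charset value is uppercased, other values keep their case.
            params[name] = (name == "charset") ? value.uppercased() : value
        }
        parameters = params
    }

    /// Returns whether `other` is included in this media type.
    /// Wildcards are honored and extra parameters in `other` are ignored.
    func contains(_ other: SimpleMediaType) -> Bool {
        guard type == "*" || type == other.type,
              subtype == "*" || subtype == other.subtype
        else { return false }
        // Every parameter declared on `self` must be present in `other`.
        return parameters.allSatisfy { other.parameters[$0.key] == $0.value }
    }

    /// Like `contains`, but working in both directions.
    func matches(_ other: SimpleMediaType) -> Bool {
        contains(other) || other.contains(self)
    }
}

let html = SimpleMediaType("text/html")!
let htmlUTF8 = SimpleMediaType("text/html;charset=utf-8")!
let htmlASCII = SimpleMediaType("text/html; charset=ascii")!
```

With these values, `html.contains(htmlUTF8)` holds while the reverse does not, and `htmlASCII.matches(htmlUTF8)` fails because the `charset` values differ.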
- `of(mediaTypes: List<String> = [], fileExtensions: List<String> = [], sniffers: List<Sniffer> = MediaType.sniffers) -> MediaType?`
  - `mediaTypes: List<String> = []`
    - Media type hints, for example from a `Link.type`, from a `Content-Type` HTTP header or from a database.
  - `fileExtensions: List<String> = []`
  - `sniffers: List<Sniffer> = MediaType.sniffers`
    - Sniffers used to resolve the media type. A reading app can add its own with `MediaType.sniffers + [customSniffer]`.
- `ofFile(file: String, mediaTypes: List<String> = [], fileExtensions: List<String> = [], sniffers: List<Sniffer> = MediaType.sniffers) -> MediaType?`
  - `file: String`
- `ofBytes(bytes: () -> ByteArray, mediaTypes: List<String> = [], fileExtensions: List<String> = [], sniffers: List<Sniffer> = MediaType.sniffers) -> MediaType?`
  - `bytes: () -> ByteArray`

Computed properties for convenience. More can be added as needed.
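
- `isZIP: Boolean`
- `isJSON: Boolean`
- `isOPDS: Boolean`
  - Matches `OPDS1`, `OPDS1Entry`, `OPDS2` or `OPDS2Publication`.
- `isHTML: Boolean`
  - Matches `HTML` or `XHTML`.
- `isBitmap: Boolean`
  - Matches `BMP`, `GIF`, `JPEG`, `JXL`, `PNG`, `TIFF`, `WebP` or `AVIF`.
- `isAudio: Boolean`
- `isPublication: Boolean`

Link Helpers

- `mediaType: MediaType`
  - Media type of the linked resource. Falls back on `application/octet-stream` if the type can’t be determined.
  - Equivalent to `MediaType.of(link.type) ?? MediaType.binary`.
- `sniffers: List<Sniffer>`
  - Static property of `MediaType`: the shared list of default sniffers used to resolve a `MediaType`.

Static constants are provided in `MediaType` for well known media types. These are `MediaType`
instances, not `String`.
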
| Constant | Media Type | Extension | Name |
|---|---|---|---|
| AAC | audio/aac | aac | |
| ACSM | application/vnd.adobe.adept+xml | acsm | Adobe Content Server Message |
| AIFF | audio/aiff | aiff | |
| AVI | video/x-msvideo | avi | |
| AVIF | image/avif | avif | |
| Binary | application/octet-stream | | |
| BMP | image/bmp | bmp | |
| CBZ | application/vnd.comicbook+zip | cbz | Comic Book Archive |
| CSS | text/css | css | |
| DiViNa | application/divina+zip | divina | Digital Visual Narratives |
| DiViNaManifest | application/divina+json | json | Digital Visual Narratives |
| EPUB | application/epub+zip | epub | EPUB |
| GIF | image/gif | gif | |
| GZ | application/gzip | gz | |
| JavaScript | text/javascript | js | |
| JPEG | image/jpeg | jpeg | |
| JXL | image/jxl | jxl | |
| HTML | text/html | html | |
| JSON | application/json | json | |
| LCPProtectedAudiobook | application/audiobook+lcp | lcpa | LCP Protected Audiobook |
| LCPProtectedPDF | application/pdf+lcp | lcpdf | LCP Protected PDF |
| LCPLicenseDocument | application/vnd.readium.lcp.license.v1.0+json | lcpl | LCP License |
| LCPStatusDocument | application/vnd.readium.license.status.v1.0+json | | |
| LPF | application/lpf+zip | lpf | |
| MP3 | audio/mpeg | mp3 | |
| MPEG | video/mpeg | mpeg | |
| NCX | application/x-dtbncx+xml | ncx | |
| Ogg | audio/ogg | oga | |
| Ogv | video/ogg | ogv | |
| Opus | audio/opus | opus | |
| OPDS1 | application/atom+xml;profile=opds-catalog | | |
| OPDS1Entry | application/atom+xml;type=entry;profile=opds-catalog | | |
| OPDS2 | application/opds+json | | |
| OPDS2Publication | application/opds-publication+json | | |
| OPDSAuthentication | application/opds-authentication+json | | |
| OTF | font/otf | otf | |
| PDF | application/pdf | | |
| PNG | image/png | png | |
| ReadiumAudiobook | application/audiobook+zip | audiobook | Readium Audiobook |
| ReadiumAudiobookManifest | application/audiobook+json | json | Readium Audiobook |
| ReadiumWebPub | application/webpub+zip | webpub | Readium Web Publication |
| ReadiumWebPubManifest | application/webpub+json | json | Readium Web Publication |
| SMIL | application/smil+xml | smil | |
| SVG | image/svg+xml | svg | |
| Text | text/plain | txt | |
| TIFF | image/tiff | tiff | |
| TTF | font/ttf | ttf | |
| W3CWPUBManifest | (non-existent) application/x.readium.w3c.wpub+json | json | Web Publication |
| WAV | audio/wav | wav | |
| WebMAudio | audio/webm | webm | |
| WebMVideo | video/webm | webm | |
| WebP | image/webp | webp | |
| WOFF | font/woff | woff | |
| WOFF2 | font/woff2 | woff2 | |
| XHTML | application/xhtml+xml | xhtml | |
| XML | application/xml | xml | |
| ZAB | (non-existent) application/x.readium.zab+zip | zab | Zipped Audio Book |
| ZIP | application/zip | zip | |

MediaType.Sniffer Function Type

Determines if the provided content matches a known media type.

`MediaType.Sniffer = (context: MediaType.SnifferContext) -> MediaType?`

- `context` holds the file metadata and cached content, which are shared among the sniffers.

MediaType.SnifferContext Interface

A companion type of `MediaType.Sniffer` holding the type hints (file extensions, types) and providing access to the file content.

Examples of concrete implementations:

- `MediaType.FileSnifferContext` to sniff a local file.
- `MediaType.BytesSnifferContext` to sniff a bytes array.
- `MediaType.MetadataSnifferContext` to sniff only the media type and file extension hints.

A context exposes the following properties and methods:

- `mediaTypes: List<String>`
- `fileExtensions: List<String>`
- `encoding: Encoding?`
  - The first `Encoding` declared in the media types’ `charset` parameter.
  - Uses the `Encoding` type provided by the platform, for convenience.
- `contentAsString: String?`
  - Uses the `charset` parameter from the media type hints to figure out an encoding. Otherwise, falls back on UTF-8.
- `contentAsXML: XMLDocument?`
- `contentAsArchive: Archive?`
- `contentAsJSON: JSONObject?`
- `contentAsRWPM: Publication?`
- `hasFileExtension(fileExtensions: String...) -> Boolean`
  - Returns whether any of the given extensions is in the `fileExtensions` array.
- `hasMediaType(mediaTypes: String...) -> Boolean`, `hasMediaType(mediaTypes: MediaType...) -> Boolean`
  - Returns whether any of the given media types is in the `mediaTypes` array, using `MediaType` to handle the comparison.
- `stream() -> Stream?`
- `read(range: Range<Int>? = null) -> ByteArray?`
  - Reads the content bytes, optionally limited to the given `range`.
- `close()`

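
The `contentAsString` behavior described above (use the `charset` from the media type hints, otherwise fall back on UTF-8) could be sketched as follows. All function names here are hypothetical and written for this example. The charset table is deliberately tiny, and a real implementation would rely on the platform's charset registry.

```swift
import Foundation

/// Hypothetical helper (assumption): maps a few IANA charset names to `String.Encoding`.
func encoding(forCharset charset: String) -> String.Encoding? {
    switch charset.lowercased() {
    case "utf-8": return .utf8
    case "utf-16": return .utf16
    case "us-ascii", "ascii": return .ascii
    case "iso-8859-1", "latin1": return .isoLatin1
    default: return nil
    }
}

/// Finds the first supported encoding declared in the `charset` parameter
/// of the given media type hints.
func declaredEncoding(in mediaTypes: [String]) -> String.Encoding? {
    for mediaType in mediaTypes {
        // Skip the "type/subtype" part, then look at each "name=value" parameter.
        for parameter in mediaType.split(separator: ";").dropFirst() {
            let pair = parameter
                .split(separator: "=", maxSplits: 1)
                .map { $0.trimmingCharacters(in: .whitespaces) }
            if pair.count == 2, pair[0].lowercased() == "charset",
               let resolved = encoding(forCharset: pair[1]) {
                return resolved
            }
        }
    }
    return nil
}

/// Decodes `data` with the declared encoding, falling back on UTF-8.
func contentAsString(_ data: Data, mediaTypes: [String]) -> String? {
    String(data: data, encoding: declaredEncoding(in: mediaTypes) ?? .utf8)
}
```

Because the hints are scanned in order, the first media type carrying a recognized `charset` wins, which matches the "sniffed in order" behavior of the hints.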
It’s useful to be able to resolve a format from an HTTP response. Therefore, implementations should provide, when possible, an extension to the native HTTP response type.

`HTTPResponse.sniffMediaType(mediaTypes: List<String> = [], fileExtensions: List<String> = [], sniffers: List<Sniffer> = MediaType.sniffers): MediaType?`

- `mediaTypes`
- `fileExtensions`
- `sniffers`

This extension will create a `MediaType.BytesSnifferContext` using the following information:

- `mediaTypes`, in order:
  - the value of the `Content-Type` HTTP header,
  - the provided `mediaTypes`, for example to use the value of `Link.type`.
- `fileExtensions`, in order:
  - the suggested filename extension from the `Content-Disposition` header,
- `bytes`: the response’s body

It’s important to have consistent results across platforms, so we need to use the same sniffing strategy.
Sniffing a format is done in two rounds, because we want to give an opportunity to all sniffers to
return a MediaType quickly before inspecting the content itself:
To do that, MediaType.of() will iterate over all the sniffers twice, first with a
MediaType.SnifferContext containing only extensions and media types, and the second time with a
context containing the content, if available.
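
The two rounds can be sketched as follows. This is a simplified illustration under stated assumptions: `SketchContext`, `SketchSniffer`, `sniff(...)` and `pdfSniffer` are hypothetical names, media types are plain strings, and the content is a lazily loaded byte array rather than a full `MediaType.SnifferContext`.

```swift
/// Hypothetical, simplified context (assumption): `content` is nil during the first round.
struct SketchContext {
    let mediaTypes: [String]
    let fileExtensions: [String]
    let content: [UInt8]?
}

typealias SketchSniffer = (SketchContext) -> String?

/// Iterates over all the sniffers twice: first with the hints only,
/// then with the content, which is loaded lazily and only if needed.
func sniff(
    mediaTypes: [String] = [],
    fileExtensions: [String] = [],
    content: (() -> [UInt8]?)? = nil,
    sniffers: [SketchSniffer]
) -> String? {
    // Round 1: only extensions and media type hints are available.
    let light = SketchContext(mediaTypes: mediaTypes, fileExtensions: fileExtensions, content: nil)
    for sniffer in sniffers {
        if let mediaType = sniffer(light) { return mediaType }
    }

    // Round 2: the content is inspected, if available.
    guard let bytes = content?() else { return nil }
    let heavy = SketchContext(mediaTypes: mediaTypes, fileExtensions: fileExtensions, content: bytes)
    for sniffer in sniffers {
        if let mediaType = sniffer(heavy) { return mediaType }
    }
    return nil
}

/// Example sniffer handling both rounds for PDF.
let pdfSniffer: SketchSniffer = { context in
    if context.mediaTypes.contains("application/pdf") || context.fileExtensions.contains("pdf") {
        return "application/pdf"
    }
    if let content = context.content, content.starts(with: "%PDF-".utf8) {
        return "application/pdf"
    }
    return nil
}
```

Passing the content as a closure means that a file whose hints are already conclusive is never read, which is the reason for running the hint-only round first.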
Sniffers can encapsulate the detection of several media types to factor out similar detection logic. For example, the following sniffers were identified. The order of the sniffers is important, because some formats are subsets of others.
In the case of bitmap formats, the default Readium sniffers don’t perform any heavy sniffing, because we only need to detect these formats using file extensions in ZIP entries or media types in a manifest. If needed, a reading app could add additional sniffers doing heavy sniffing of bitmap files.
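
- `ReadiumAudiobook`
  - Extension: `audiobook`
  - Media type: `application/audiobook+zip`
  - Content: a `manifest.json` entry, parsed as an RWPM with either:
    - `metadata.@type == http://schema.org/Audiobook`, or
    - a `Link` with an audio type, checked with `MediaType::isAudio`
- `ReadiumAudiobookManifest`
  - Media type: `application/audiobook+json`
  - Content: parsed as an RWPM with either:
    - `metadata.@type == http://schema.org/Audiobook`, or
    - a `Link` with an audio type, checked with `MediaType::isAudio`
- `BMP`
  - Extension: `bmp` or `dib`
  - Media type: `image/bmp` or `image/x-bmp`
- `CBZ`
  - Extension: `cbz`
  - Media type: `application/vnd.comicbook+zip`, `application/x-cbz` or `application/x-cbr`
  - Content: only entries with the extensions `acbf`, `gif`, `jpeg`, `jpg`, `jxl`, `png`, `tiff`, `tif`, `webp`, `avif` or `xml`. Entries starting with a `.` and `Thumbs.db` are ignored
- `DiViNa`
  - Extension: `divina`
  - Media type: `application/divina+zip`
  - Content: a `manifest.json` entry parsed as an RWPM, with a reading order containing only bitmap images, checked using `MediaType.isBitmap` on each `Link.type`
- `DiViNaManifest`
  - Media type: `application/divina+json`
  - Content: parsed as an RWPM, with a reading order containing only bitmap images, checked using `MediaType.isBitmap` on each `Link.type`
- `EPUB`
  - Extension: `epub`
  - Media type: `application/epub+zip`
  - Content: a `mimetype` entry containing strictly `application/epub+zip`, encoded in US-ASCII
- `GIF`
  - Extension: `gif`
  - Media type: `image/gif`
- `HTML`
  - Extension: `htm`, `html`, `xht` or `xhtml`
  - Media type: `text/html` or `application/xhtml+xml`, checked using `MediaType.isHTML`
  - Content: an `<html>` root node
- `JPEG`
  - Extension: `jpg`, `jpeg`, `jpe`, `jif`, `jfif` or `jfi`
  - Media type: `image/jpeg`
- `JXL`
  - Extension: `jxl`
  - Media type: `image/jxl`
- `OPDS1`
  - Media type: `application/atom+xml;profile=opds-catalog`
  - Content: a `<feed>` root node with the XML namespace `http://www.w3.org/2005/Atom`
- `OPDS1Entry`
  - Media type: `application/atom+xml;type=entry;profile=opds-catalog`
  - Content: an `<entry>` root node with the XML namespace `http://www.w3.org/2005/Atom`
- `OPDS2`
  - Media type: `application/opds+json`
  - Content: a `Link` with `self` rel and `application/opds+json` type
- `OPDS2Publication`
  - Media type: `application/opds-publication+json`
  - Content: a `Link` with a `rel` starting with `http://opds-spec.org/acquisition`
- `OPDSAuthentication`
  - Media type: `application/opds-authentication+json` or `application/vnd.opds.authentication.v1.0+json`
  - Content: a JSON object with the keys `id`, `title` and `authentication`
- `LCPProtectedAudiobook`
  - Extension: `lcpa`
  - Media type: `application/audiobook+lcp`
  - Content: a `license.lcpl` entry and a `manifest.json` entry, parsed as an RWPM with either:
    - `metadata.@type == http://schema.org/Audiobook`, or
    - a `Link` with an audio type, checked with `MediaType::isAudio`
- `LCPProtectedPDF`
  - Extension: `lcpdf`
  - Media type: `application/pdf+lcp`
  - Content: a `license.lcpl` entry and a `manifest.json` entry, parsed as an RWPM with a reading order containing only `Link` with `application/pdf` type
- `LCPLicenseDocument`
  - Extension: `lcpl`
  - Media type: `application/vnd.readium.lcp.license.v1.0+json`
  - Content: a JSON object with the keys `id`, `issued`, `provider` and `encryption`
- `LPF`
  - Extension: `lpf`
  - Media type: `application/lpf+zip`
  - Content, either:
    - a `publication.json` entry, containing at least `https://www.w3.org/ns/pub-context` in the `@context` string/array property, or
    - an `index.html` entry
- `PDF`
  - Extension: `pdf`
  - Media type: `application/pdf`
  - Content: starts with `%PDF-`
- `PNG`
  - Extension: `png`
  - Media type: `image/png`
- `ReadiumWebPub`
  - Extension: `webpub`
  - Media type: `application/webpub+zip`
  - Content: a `manifest.json` entry parsed as an RWPM
- `ReadiumWebPubManifest`
  - Media type: `application/webpub+json`
  - Content: a `Link` with `self` rel and `application/webpub+json` type
- `W3CWPUBManifest`
  - Content: `https://www.w3.org/ns/wp-context` in the `@context` string/array property
- `TIFF`
  - Extension: `tiff` or `tif`
  - Media type: `image/tiff` or `image/tiff-fx`
- `WebP`
  - Extension: `webp`
  - Media type: `image/webp`
- `AVIF`
  - Extension: `avif`
  - Media type: `image/avif`
- `ZAB`
  - Extension: `zab`
  - Content: only entries with the following extensions. Entries starting with a `.` and `Thumbs.db` are ignored
    - `aac`, `aiff`, `alac`, `flac`, `m4a`, `m4b`, `mp3`, `ogg`, `oga`, `mogg`, `opus`, `wav` or `webm`
    - `asx`, `bio`, `m3u`, `m3u8`, `pla`, `pls`, `smil`, `vlc`, `wpl`, `xspf` or `zpl`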
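
As an example of a content-based check from the list above, the key test given for LCP License Documents (`id`, `issued`, `provider` and `encryption`) might look like the sketch below. The `sniffLCPLicense` function is a hypothetical name, it works on raw `Data` instead of a `MediaType.SnifferContext`, and it assumes the keys are looked up at the root of the JSON object.

```swift
import Foundation

/// Hypothetical content sniffer (assumption): returns the LCP License Document media type
/// when `data` is a JSON object containing all the expected root keys.
func sniffLCPLicense(_ data: Data) -> String? {
    guard
        let json = try? JSONSerialization.jsonObject(with: data),
        let object = json as? [String: Any]
    else { return nil }

    let requiredKeys: Set<String> = ["id", "issued", "provider", "encryption"]
    guard requiredKeys.isSubset(of: Set(object.keys)) else { return nil }
    return "application/vnd.readium.lcp.license.v1.0+json"
}

// Sample payloads with made-up values, for illustration only.
let licenseJSON = Data(
    #"{"id": "1", "issued": "2020-01-01T00:00:00Z", "provider": "https://example.com", "encryption": {}}"#.utf8
)
let otherJSON = Data(#"{"id": "1", "title": "Catalog"}"#.utf8)
```

Only the presence of the keys is checked here, not their values, which keeps the sniffer cheap compared to fully parsing the license.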