About RMIDI

spessasus edited this page Aug 7, 2024 · 23 revisions

Official SF2 RMIDI Specification

Preamble

MIDI files have long faced a significant challenge: the same file sounds different on different devices.

SF2 + MIDI combinations address this issue partially by ensuring that playing both files through an SF2-compliant synth results in the same sound being produced. The RMIDI format is not new; it was originally developed by Microsoft as a RIFF wrapper for MIDI files and later expanded by the MIDI Manufacturers Association to support embedding DLS soundfonts. However, DLS is not widely used today, whereas the SoundFont2 (SF2) format serves a similar purpose and remains quite popular.

The SF2 RMIDI format integrates MIDI and SF2 files into a single file, augmented with additional metadata. This document serves as the official specification for this format.

This version of RMIDI was created by Zoltán Bacskó of Falcosoft and implemented in Falcosoft SoundFont Midi Player 6. I am in contact with Zoltán, who granted permission to use this as the official specification.

If you find any part of this specification unclear, please reach out via this thread or file a GitHub issue in this repository.

Specification

Terminology

This specification assumes familiarity with the SoundFont2 format and the Standard MIDI File (SMF) format. Additional terminology used in this specification includes:

  • The software: Refers to software compliant with this specification.
  • Bit: The most basic data structure element, either 0 or 1.
  • Byte: A data structure element of eight bits, with no defined meaning to those bits.
  • SoundFont: A SoundFont2 compliant binary.
  • Embedded SoundFont: The SoundFont bank embedded in an RMIDI file.
  • Main SoundFont: The regular SoundFont bank loaded by the software prior to loading the RMIDI file.
  • Bank: MIDI controller 0 Bank Select MSB and the bank number of a soundfont preset.
  • RIFF: Resource Interchange File Format. A file container format for storing data in tagged chunks.
  • Chunk: The top-level division of a RIFF file.
  • Little Endian: Byte ordering in memory with the least significant byte at the lowest address.
  • GM: General MIDI system, ignoring all Bank select messages.
  • XG: Yamaha eXtended General MIDI, an extension to the General MIDI standard created by Yamaha.
  • GS: Roland General Standard, an extension to the General MIDI standard created by Roland.
  • Encoding: Assigning numbers to graphical characters.
  • ASCII: American Standard Code for Information Interchange, a character encoding standard for electronic communication.

Extension

The file extension is .rmi, and the MIME type is audio/rmid. The file type should be referred to as MIDI with embedded SF2 or Embedded MIDI.

RIFF Chunk

The RMIDI format uses RIFF chunks to structure the data.

Each RIFF chunk in an RMIDI file follows this format:

  • 4 bytes: Chunk header in ASCII (e.g., RIFF)
  • 4 bytes: Chunk size as a 32-bit unsigned little-endian number
  • Chunk data: Optionally, the first 4 bytes of the data represent the chunk type in ASCII (e.g., sfbk)

NOTE: Chunk data must occupy an even number of bytes. If the data length is odd, a single padding byte of 0 must be appended at the end. The chunk's size field does not include this padding byte.

IMPORTANT: This constraint applies only to RIFF chunks within the RMIDI file and does not affect RIFF chunks within the soundfont chunk.
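
The chunk layout above can be sketched as a small reader. This is a minimal illustration rather than part of the specification: the names Chunk and read_chunk are hypothetical, and the padded flag is an assumed way to switch padding off when walking chunks inside the soundfont.

```python
import struct
from typing import NamedTuple, Tuple


class Chunk(NamedTuple):
    header: str   # four-character ASCII identifier, e.g. "RIFF"
    size: int     # size field as stored (excludes any padding byte)
    data: bytes   # chunk data without the padding byte


def read_chunk(buf: bytes, offset: int = 0, padded: bool = True) -> Tuple[Chunk, int]:
    """Read one chunk at `offset`; return it with the offset of the next chunk."""
    if offset + 8 > len(buf):
        raise ValueError("truncated chunk header")
    header = buf[offset:offset + 4].decode("ascii")
    # 32-bit unsigned little-endian size
    (size,) = struct.unpack_from("<I", buf, offset + 4)
    start = offset + 8
    end = start + size
    if end > len(buf):
        raise ValueError("chunk size exceeds available data")
    # Skip the padding byte that follows odd-sized data.
    next_offset = end + (size % 2 if padded else 0)
    return Chunk(header, size, buf[start:end]), next_offset
```

Calling read_chunk repeatedly with the returned offset walks a sequence of sibling chunks.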

RMID File Structure

An RMIDI file consists of:

  • RIFF chunk (main chunk)
    • RMID ASCII string
    • data chunk containing the complete MIDI file (MThd, MTrk, etc.)
    • Optional LIST chunk of type INFO: Metadata for the file, similar to SF2's INFO chunk
    • RIFF chunk: Complete soundfont binary. The first 4 bytes of this chunk should be sfbk, indicating a soundfont. SF3 compressed soundfonts are allowed.
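
As a rough sketch of how this structure might be split into its parts, the function below walks the main chunk and picks out the MIDI data, the INFO list and the soundfont. The name parse_rmidi and the returned dictionary keys are hypothetical. The sketch accepts chunks in any order, which the specification permits but does not require.

```python
import struct


def parse_rmidi(buf: bytes) -> dict:
    """Split an RMIDI file into its MIDI data, INFO list and soundfont."""
    if buf[0:4] != b"RIFF" or buf[8:12] != b"RMID":
        raise ValueError("not an RMIDI file")
    (riff_size,) = struct.unpack_from("<I", buf, 4)
    end = min(8 + riff_size, len(buf))
    result = {"midi": None, "info": None, "soundfont": None}
    pos = 12  # first chunk after the RMID string
    while pos + 8 <= end:
        header = buf[pos:pos + 4]
        (size,) = struct.unpack_from("<I", buf, pos + 4)
        data = buf[pos + 8:pos + 8 + size]
        if header == b"data":
            result["midi"] = data            # complete MIDI file (MThd, MTrk, ...)
        elif header == b"LIST" and data[:4] == b"INFO":
            result["info"] = data[4:]        # INFO sub-chunks, without the type
        elif header == b"RIFF" and data[:4] == b"sfbk":
            # Keep the whole nested RIFF chunk so it is a complete soundfont binary.
            result["soundfont"] = buf[pos:pos + 8 + size]
        # Any other chunk is ignored here.
        pos += 8 + size + (size % 2)         # skip the padding byte if present
    return result
```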

Handling Differences

When the file structure deviates from the above:

  1. Any additional chunks should be ignored and preserved as-is.
  2. If the chunk order differs from this specification, the file should be rejected. While software may support arbitrary chunk orders, compliance with this specification does not require it.
  3. If no soundfont bank is present, the file should use the main soundfont and assume a bank offset of 0, ignoring the DBNK chunk.
  4. If the soundfont bank uses the older DLS format, software not capable of reading DLS should reject the file. Software that supports DLS should use the contained DLS and assume a bank offset of 0, ignoring the DBNK chunk.

INFO Chunk

The INFO chunk describes file metadata and the soundfont's bank offset and follows these rules:

  • Any additional RIFF chunks within the INFO chunk should be ignored and preserved as-is.
  • The chunk size must be even, as specified in the general RIFF structure.

The INFO chunk may contain the following optional chunks:

  • DBNK chunk: Soundfont's bank offset. See DBNK Chunk for details.
  • INAM chunk: Song name. Ideally matches the MIDI file name but is not required.
  • ICOP chunk: Copyright. String of any length.
  • IART chunk: Artist (MIDI creator). String of any length.
  • ICRD chunk: Creation date. String of any length.
  • IPRD chunk: Album name. String of any length.
  • IPIC chunk: Attached picture (e.g., album cover). Binary picture data. PNG or JPEG recommended.
  • IGNR chunk: Song genre. String of any length.
  • ICMT chunk: Comment/description. String of any length.
  • IENG chunk: Engineer (soundfont creator). String of any length.
  • ISFT chunk: Software used to create the file. String of any length.
  • IENC chunk: Encoding used for other INFO chunks, string. Not case-sensitive, but lowercase is preferred (e.g., utf-8). Software capable of reading the IENC chunk must support the encodings listed under IENC Chunk Requirements. Note that this field itself must use basic ASCII encoding.

Chunk Rules

The following rules apply to the INFO chunk:

  1. The order of chunks in INFO is arbitrary.
  2. Chunks of length 0 should be discarded.
  3. Unknown INFO chunks should be ignored and preserved as-is.
  4. If the IENC chunk is not specified, software can use any encoding, but assuming utf-8 is recommended.
  5. If the software can display the song's name, it should use either the track name in MIDI or the INAM chunk, preferring the INAM chunk.
  6. Compatible software may ignore all INFO chunks except the DBNK chunk and still remain compliant.
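
The rules above might be applied as in the following sketch. The names read_info and song_name are hypothetical, and stripping trailing zero bytes from strings is an assumption that this specification does not state.

```python
import struct
from typing import Dict, Optional


def read_info(info: bytes) -> Dict[str, bytes]:
    """Collect INFO sub-chunks as raw bytes, keyed by their four-character name."""
    fields: Dict[str, bytes] = {}
    pos = 0
    while pos + 8 <= len(info):
        name = info[pos:pos + 4].decode("ascii", errors="replace")
        (size,) = struct.unpack_from("<I", info, pos + 4)
        data = info[pos + 8:pos + 8 + size]
        if size > 0:             # rule 2: zero-length chunks are discarded
            fields[name] = data  # rules 1 and 3: any order, unknown names are kept
        pos += 8 + size + (size % 2)
    return fields


def song_name(fields: Dict[str, bytes],
              midi_track_name: Optional[str] = None,
              encoding: str = "utf-8") -> Optional[str]:
    """Rule 5: prefer the INAM chunk over the MIDI track name."""
    raw = fields.get("INAM")
    if raw is None:
        return midi_track_name
    # Assumption: strings may carry trailing zero bytes, which are stripped.
    return raw.rstrip(b"\x00").decode(encoding, errors="replace")
```

The encoding parameter defaults to utf-8, following rule 4 for files without an IENC chunk.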

IENC Chunk Requirements

For Level 3 compatibility, software must support the following encodings (both lowercase and uppercase):

  • utf-8
  • shift-jis or Shift_JIS (equivalent encodings)
  • windows-1250 (Central Europe)
  • windows-1251 (Cyrillic)
  • windows-1252 (Western)
  • windows-1253 (Greek)
  • windows-1254 (Turkish)
  • windows-1255 (Hebrew)
  • windows-1256 (Arabic)
  • windows-1257 (Baltic)
  • windows-1258 (Vietnamese)

Software may decode other encodings but is not required to.
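
One possible way to map IENC labels onto decoders is shown below, using Python's built-in codec names. The names REQUIRED_ENCODINGS, resolve_encoding and decode_info_string are hypothetical. Falling back to utf-8 for an unrecognized label and stripping trailing zero bytes are assumptions of this sketch, not requirements.

```python
from typing import Optional

# IENC label (lowercase) -> Python codec name
REQUIRED_ENCODINGS = {
    "utf-8": "utf-8",
    "shift-jis": "shift_jis",
    "shift_jis": "shift_jis",
    # windows-1250 through windows-1258 map to cp1250 through cp1258
    **{f"windows-{n}": f"cp{n}" for n in range(1250, 1259)},
}


def resolve_encoding(ienc: bytes) -> str:
    """Turn the raw IENC chunk data into a Python codec name."""
    # IENC itself is plain ASCII and not case-sensitive.
    label = ienc.decode("ascii", errors="replace").strip("\x00 ").lower()
    # Assumption: unknown labels fall back to utf-8.
    return REQUIRED_ENCODINGS.get(label, "utf-8")


def decode_info_string(data: bytes, ienc: Optional[bytes] = None) -> str:
    """Decode an INFO string chunk using the encoding named by IENC, if any."""
    codec = resolve_encoding(ienc) if ienc is not None else "utf-8"
    return data.rstrip(b"\x00").decode(codec, errors="replace")
```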

IPIC Chunk Requirements

For Level 4 compatibility, software must support the following image formats:

  • Portable Network Graphics (PNG)
  • Joint Photographic Experts Group (JPEG)

Other formats (e.g., gif, webp, ico) may also be supported but are not required.
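
Because the IPIC chunk carries raw picture data without a format label, software needs some way to tell the formats apart. A common approach, sketched here with the hypothetical helper sniff_image_format, is to check the well-known leading signature bytes of PNG and JPEG data.

```python
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"   # fixed 8-byte PNG signature
JPEG_SIGNATURE = b"\xff\xd8\xff"       # start-of-image marker plus next marker prefix


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return "png" or "jpeg" for the two required formats, otherwise None."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return None
```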

DBNK Chunk

The DBNK chunk is an optional RIFF chunk within the RMIDI INFO List.

It always has a length of 2 bytes, with these bytes forming a 16-bit unsigned little-endian offset for the soundfont banks. If the chunk's length is not 2 bytes or the number is out of range, the file should be rejected.

Current boundaries are: min: 0 and max: 127. Since the maximum fits in the low byte, the high byte is currently always 0 and is reserved for future use. If no DBNK is specified, an offset of 1 is assumed by default.

For general use, a bank offset of 0 is recommended as it allows bundling the soundfont and the MIDI without modification.
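
The DBNK rules can be sketched as follows. The name parse_dbnk is hypothetical, and raising ValueError stands in for rejecting the file.

```python
import struct
from typing import Optional

DEFAULT_BANK_OFFSET = 1  # assumed when no DBNK chunk is present


def parse_dbnk(data: Optional[bytes]) -> int:
    """Return the bank offset from the DBNK chunk data, or the default if absent."""
    if data is None:
        return DEFAULT_BANK_OFFSET
    if len(data) != 2:
        raise ValueError("DBNK chunk must be exactly 2 bytes long")
    # 16-bit unsigned little-endian number
    (offset,) = struct.unpack("<H", data)
    if not 0 <= offset <= 127:
        raise ValueError("DBNK offset out of range (0-127)")
    return offset
```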

Bank Offset

The bank offset adjusts every bank in the soundfont, except for bank 128.

Bank offset 0

A bank offset of 0 has a few special characteristics:

  1. If the software has a main soundfont, presets in the embedded soundfont override the main presets.
  2. On drum channels, the bank is 0. For XG MIDIs, drum channels use bank 127.
  3. The MIDI file can use the GM system and not contain any bank selects at all.
  4. If the selected bank has not been found, the channel should fall back to the first preset with the given program number of the embedded soundfont, rather than the main one.
  5. If the selected program has not been found, the channel should fall back to the first preset of the embedded soundfont, rather than the main one.
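
The fallback order in points 4 and 5 might look like the sketch below. The Preset type and select_preset function are hypothetical, and the sketch deliberately leaves out drum channel handling and the main soundfont.

```python
from typing import List, NamedTuple, Optional


class Preset(NamedTuple):
    bank: int
    program: int
    name: str


def select_preset(embedded: List[Preset], bank: int, program: int) -> Optional[Preset]:
    """Pick a preset from the embedded soundfont for a bank offset of 0."""
    # Exact match on bank and program.
    for preset in embedded:
        if preset.bank == bank and preset.program == program:
            return preset
    # Bank not found: first preset with the requested program number.
    for preset in embedded:
        if preset.program == program:
            return preset
    # Program not found: first preset of the embedded soundfont.
    return embedded[0] if embedded else None
```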

Other bank offsets

For example, bank offset of 1 means:

  1. Every bank in the soundfont is incremented by 1.
  2. For drums, the bank is 1. For XG MIDIs, drum behavior is undefined; the software might expect bank 1 or bank 127. Using a bank offset of 0 in that case is recommended and defined.
  3. The MIDI file must reflect the change as well: all bank selects are incremented by 1 when compared to the original composition.
  4. The MIDI must use valid banks and presets, as the software may fall back to the main soundfont instead of defaulting to preset 0 of the embedded soundfont. Behavior when the MIDI sequence references presets missing from the embedded soundfont is undefined and should be avoided.
  5. The system cannot be GM since the bank is ignored. GS or XG are valid. This requires sanitizing the MIDI and setting either GS or XG at the start.

For a DBNK value of 0, only constraint 3 applies.

The MIDI file must reflect this shift:

  • For example, if DBNK is 1 and the MIDI requests preset 001:080, it should call bank select 2 instead of 1.
  • If DBNK is 0, no offset is applied, and the MIDI remains unchanged.
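
The arithmetic behind the shift is small enough to show directly. The function names below are hypothetical, and the sketch does not address what should happen if a shifted bank exceeds 127, which this specification leaves open.

```python
DRUM_BANK = 128  # the bank offset never applies to bank 128


def offset_soundfont_bank(bank: int, dbnk: int) -> int:
    """Bank number an embedded preset answers to once the offset is applied."""
    if bank == DRUM_BANK:
        return bank
    return bank + dbnk


def midi_bank_select(original_bank: int, dbnk: int) -> int:
    """Bank select value the MIDI file must send for a preset's original bank."""
    return original_bank + dbnk


# With DBNK = 1, preset 001:080 is reached with bank select 2.
shifted = midi_bank_select(1, 1)
```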

Program and Bank Correction

As stated in constraints 4 and 5 under Other bank offsets, the MIDI must use valid banks and presets, as the software may fall back to the main soundfont instead of defaulting to preset 0 of the embedded soundfont. The system cannot be GM, since GM ignores the bank. GS or XG are valid. This requires sanitizing the MIDI and setting either GS or XG at the start.

Software Requirements

Not all chunks in the file must be read for the file to play correctly. Software compatibility with the RMIDI format is categorized into levels:

Level 1

Basic RMIDI compatibility. The software must:

  • Read and interpret the RMID ASCII string as the file indicator.
  • Handle the data chunk containing the MIDI data.
  • Process the DBNK chunk within the INFO chunk and correctly offset the soundfont (or bank selects in the MIDI) based on this value.
  • Read the RIFF chunk with the soundfont data.

Level 2

This level requires basic interpretation of the INFO chunk. The software must:

  • Read all Level 1 chunks.
  • Interpret all metadata chunks (INAM, IPRD, ICRD, ICOP, etc.) as ASCII or utf-8.

Level 3

This level requires support for the IENC chunk. The software must:

  • Read all Level 1 and Level 2 chunks.
  • Interpret the IENC chunk and support the required encodings.

As of 2024-08-07, Falcosoft Midi Player meets this level of compatibility.

Level 4

This level requires support for the IPIC chunk. The software must:

  • Read all Level 1, Level 2, and Level 3 chunks.
  • Interpret the IPIC chunk and support the required image formats.

As of 2024-08-06, SpessaSynth meets this level of compatibility.
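
Since each level builds on the previous one, a piece of software's level is the highest level for which it supports every requirement of that level and all levels below it. A small sketch, with hypothetical feature labels chosen only for this example:

```python
from typing import List, Set

# Requirements introduced at each level, in order (Level 1 first).
LEVEL_FEATURES: List[Set[str]] = [
    {"rmid", "data", "dbnk", "soundfont"},  # Level 1
    {"info_text"},                          # Level 2: metadata as ASCII or utf-8
    {"ienc"},                               # Level 3: IENC and required encodings
    {"ipic"},                               # Level 4: IPIC and required image formats
]


def compatibility_level(supported: Set[str]) -> int:
    """Return the highest level fully covered by `supported` (0 if none)."""
    level = 0
    for required in LEVEL_FEATURES:
        if not required <= supported:
            break  # a gap at this level caps the result
        level += 1
    return level
```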

Recommendations for Writing RMIDI Files

The following recommendations are not required for file validity but are advised:

  1. Trim the soundfont to include only presets used in the file.
  2. Ensure the MIDI file references only banks and programs that are present in the soundfont, to avoid undefined behavior from missing presets.
  3. Always include the DBNK chunk, even if the offset is 1.
  4. Include the IENC chunk to ensure correct encoding is used.
  5. Omit metadata chunks rather than writing them with a length of 0 if not applicable.
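
A writer that follows recommendations 3 to 5 could be sketched like this. The names make_chunk and build_rmidi are hypothetical. The sketch assumes the soundfont argument is already a complete RIFF sfbk binary and that the IENC label is also a valid Python codec name. Trimming the soundfont is out of scope here.

```python
import struct
from typing import Dict, Optional


def make_chunk(header: bytes, data: bytes) -> bytes:
    """Build a chunk; odd-sized data gets a zero padding byte not counted in the size."""
    pad = b"\x00" if len(data) % 2 else b""
    return header + struct.pack("<I", len(data)) + data + pad


def build_rmidi(midi: bytes, soundfont: bytes, bank_offset: int = 0,
                info: Optional[Dict[str, str]] = None,
                encoding: str = "utf-8") -> bytes:
    """Assemble an RMIDI file: RMID, data, LIST INFO, then the soundfont."""
    # Always write DBNK and IENC (recommendations 3 and 4).
    sub = make_chunk(b"DBNK", struct.pack("<H", bank_offset))
    sub += make_chunk(b"IENC", encoding.encode("ascii"))
    for name, text in (info or {}).items():
        if text:  # omit empty metadata instead of writing length 0
            sub += make_chunk(name.encode("ascii"), text.encode(encoding))
    sf_pad = b"\x00" if len(soundfont) % 2 else b""
    body = (b"RMID"
            + make_chunk(b"data", midi)
            + make_chunk(b"LIST", b"INFO" + sub)
            + soundfont + sf_pad)
    return make_chunk(b"RIFF", body)
```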

Reference Implementation

Below is SpessaSynth's implementation of the format in JavaScript, which may be useful for developers: