ESP32_Host_MIDI
ESP32 + W5500 Wired Ethernet MIDI Hub: Overview of the ESP32_Host_MIDI Project (AppleMIDI / RTP-MIDI)
This article presents ESP32_Host_MIDI, an open-source library that enables an ESP32 to operate as a general-purpose, multi-transport MIDI hub. The library consolidates multiple MIDI transports—USB Host, BLE MIDI, Wi-Fi RTP-MIDI, OSC, DIN-5 (UART), ESP-NOW, USB Device, and MIDI 2.0 / UMP over UDP—under a unified programming interface, supporting rapid development of MIDI routers, bridges, and controllers for a wide range of system architectures.
Project Objectives and Use Cases
Modern MIDI environments frequently require interoperability across heterogeneous interfaces and protocols. Typical requirements include:
Connecting a USB MIDI keyboard to a DAW via wired or wireless networking
Bridging BLE MIDI applications on iPad/iPhone with studio equipment (e.g., DIN-5 synthesizers and DAWs)
Centralizing MIDI routing in setups that combine multiple devices and transports
ESP32_Host_MIDI addresses these requirements by abstracting transport-specific behavior and enabling the ESP32 to function as a central routing and translation node, supporting flexible integration for instruments, software tools, and interactive installations.
Representative Application Scenarios
The library can be applied to the following system-level scenarios:
Studio rack bridge: DIN-5 (legacy) synthesizer → ESP32 → DAW via Ethernet MIDI
Live / rehearsal systems: multiple ESP32 nodes connected through ESP-NOW, with MIDI forwarded to the front-of-house (FOH) position via USB or a network transport
Custom MIDI controllers: sensor/knob/fader inputs mapped to MIDI Notes/CC and transmitted via USB, BLE, or network
Media art and software integration: OSC ↔ MIDI bridging for environments such as Max/MSP or TouchOSC
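The custom-controller scenario above comes down to scaling a sensor reading into the 7-bit MIDI data range and wrapping it in a Control Change message. The following is a minimal sketch of that arithmetic, written in Python only to illustrate the byte layout; the function names are hypothetical and are not part of the ESP32_Host_MIDI API, and the 12-bit ADC width is an assumption.

```python
def adc_to_cc_value(raw: int, adc_bits: int = 12) -> int:
    """Scale a raw ADC reading to the 7-bit MIDI data range (0-127).

    adc_bits=12 is an assumption for illustration; it must be >= 7.
    """
    max_raw = (1 << adc_bits) - 1
    raw = max(0, min(raw, max_raw))   # clamp out-of-range readings
    return raw >> (adc_bits - 7)      # keep the 7 most significant bits


def control_change(channel: int, controller: int, value: int) -> bytes:
    """Build a 3-byte MIDI Control Change message (channel is 0-15)."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0-15")
    if not (0 <= controller <= 127 and 0 <= value <= 127):
        raise ValueError("controller and value must be 0-127")
    return bytes([0xB0 | channel, controller, value])


# A mid-scale fader reading mapped to CC 7 (channel volume) on channel 0.
message = control_change(0, 7, adc_to_cc_value(2048))
```

The resulting three bytes are transport-independent, so the same message could be handed to USB, BLE, or a network transport.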
System Overview
A common deployment model aggregates MIDI input sources (USB/BLE/DIN-5) at the ESP32 and forwards MIDI as network MIDI (AppleMIDI / RTP-MIDI) over wired Ethernet using the W5500.
  [USB MIDI Keyboard]   [BLE MIDI (iPhone/iPad)]   [DIN-5 Synth]
            \                      |                    /
             \                     |                   /
              ----------> [ESP32_Host_MIDI] <----------
                   (routing / filtering / bridging)
                                   |
                               SPI |
                                   v
                                [W5500]
                                   |
                               Ethernet
                                   |
                              [Mac / DAW]
                   (Audio MIDI Setup - Network MIDI)

Ethernet Implementation and Rationale for W5500
The project includes a wired Ethernet MIDI configuration (“Ethernet-MIDI”). In this approach, the ESP32 interfaces with a W5500 SPI Ethernet module and runs AppleMIDI (RTP-MIDI) over wired Ethernet.
Implementation summary
Hardware: ESP32 + W5500 (SPI) module + RJ-45
Software: Arduino Ethernet library + lathoub’s Arduino-AppleMIDI-Library (v3.x)
Ports: typically a consecutive UDP port pair, such as 5004 / 5005 (implementation-dependent and configurable)
Rationale for W5500
Hardwired TCP/IP with integrated MAC/PHY: The W5500 implements the TCP/IP stack (including UDP) in hardware alongside a 10/100 Ethernet MAC and PHY. The ESP32 only exchanges socket data over SPI, so it can stay focused on time-critical work (e.g., MIDI handling, USB processing, input scanning).
More predictable timing on wired Ethernet: For timing-sensitive MIDI workloads, wired Ethernet often shows lower and more stable latency and jitter than Wi-Fi in studio and installation environments.
Protocol Overview: AppleMIDI vs RTP-MIDI (Including Stack and Port Model)
Although the terms are often used together, AppleMIDI and RTP-MIDI represent different functional layers.
1) AppleMIDI vs RTP-MIDI
RTP-MIDI (standardized in RFC 6295) specifies how MIDI messages are encapsulated within RTP packets, i.e., the payload format used to transport Note On/Off, Control Change, and other MIDI messages over RTP/UDP.
AppleMIDI provides the session management and synchronization framework used by macOS/iOS Network MIDI, including session invitation/acceptance, connection lifecycle control, clock synchronization, and feedback mechanisms required for stable real-time operation.
In practical terms:
RTP-MIDI defines the data representation on the wire.
AppleMIDI defines the session and synchronization behavior required to operate that stream reliably.
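To make the "data representation on the wire" concrete, here is a minimal sketch that assembles one RTP-MIDI packet carrying a single Note On. It assumes the short (one-byte) MIDI command section header with no recovery journal and no leading delta time. Payload type 97 is an assumption (a dynamic value commonly seen in AppleMIDI sessions), and the marker bit is left to the caller rather than hard-coded. The function name is hypothetical.

```python
import struct

RTP_VERSION = 2
PAYLOAD_TYPE = 97  # assumption: dynamic payload type often used for RTP-MIDI


def rtp_midi_packet(seq: int, timestamp: int, ssrc: int,
                    midi: bytes, marker: bool = False) -> bytes:
    """Build a 12-byte RTP header followed by a short-form MIDI command section."""
    if len(midi) > 15:
        raise ValueError("short-form header holds at most 15 bytes of MIDI")
    first = RTP_VERSION << 6                        # V=2, P=0, X=0, CC=0
    second = (0x80 if marker else 0) | PAYLOAD_TYPE
    header = struct.pack("!BBHII", first, second,
                         seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF,
                         ssrc & 0xFFFFFFFF)
    # Command section header: flags B=J=Z=P=0 (high nibble), LEN (low nibble).
    section = bytes([len(midi) & 0x0F]) + midi
    return header + section


note_on = bytes([0x90, 60, 100])  # Note On, channel 0, middle C, velocity 100
packet = rtp_midi_packet(seq=1, timestamp=0, ssrc=0x12345678, midi=note_on)
```

A real session would normally also carry a recovery journal and running timestamps; both are omitted here to keep the layout visible.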
2) Protocol Stack (Layer View)
A typical network MIDI data path can be described as:
MIDI messages → RTP-MIDI payload → RTP → UDP → IP → Ethernet (wired, via W5500)
In this model, W5500 provides the Ethernet/IP/UDP foundation, while AppleMIDI and RTP-MIDI operate at higher layers.
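As a worked example of the layering, the per-layer header sizes for a single 3-byte MIDI message can be added up. This sketch assumes an Ethernet II header, IPv4 without options, an RTP header without CSRC entries, and a short-form RTP-MIDI command section with no recovery journal. It excludes the Ethernet preamble, frame check sequence, and any minimum-frame padding.

```python
# Header sizes in bytes, outermost layer first (assumptions noted above).
HEADER_BYTES = {
    "Ethernet II header": 14,
    "IPv4 header (no options)": 20,
    "UDP header": 8,
    "RTP header (no CSRC)": 12,
    "RTP-MIDI command section header (short form)": 1,
}

MIDI_MESSAGE_BYTES = 3  # e.g., one Note On: status, note, velocity

overhead = sum(HEADER_BYTES.values())
total = overhead + MIDI_MESSAGE_BYTES

for layer, size in HEADER_BYTES.items():
    print(f"{layer:<46}{size:>3}")
print(f"{'MIDI message':<46}{MIDI_MESSAGE_BYTES:>3}")
print(f"{'Total (before padding/FCS)':<46}{total:>3}")
```

Under these assumptions, the headers account for 55 of the 58 bytes. This is why the per-packet cost of network MIDI is dominated by framing rather than by the MIDI data itself.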
3) Two-Port Model (Control / Data)
AppleMIDI commonly uses two consecutive UDP ports (N and N+1):
Control Port (e.g., 5004)
Handles session control traffic such as invitation/accept/deny, session teardown, and related feedback/control messages.
Data Port (e.g., 5005)
Carries the RTP-MIDI stream (Notes, CC, etc.); clock-synchronization exchanges are also performed on this port.
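The timing exchange on the data port can be sketched as follows. This assumes the commonly documented AppleMIDI clock-sync ("CK") layout: a 0xFFFF signature, the ASCII command "CK", the sender SSRC, a one-byte count, three padding bytes, and three 64-bit timestamps. It also assumes the commonly cited offset formula. The function names are hypothetical, and the layout should be checked against a packet capture before relying on it.

```python
import struct

SIGNATURE = 0xFFFF
CMD_CK = 0x434B           # ASCII "CK"
CK_FORMAT = "!HHIB3xQQQ"  # signature, command, SSRC, count, pad, ts1, ts2, ts3


def pack_ck(ssrc: int, count: int, ts1: int = 0, ts2: int = 0, ts3: int = 0) -> bytes:
    """Serialize one clock-sync packet (36 bytes)."""
    return struct.pack(CK_FORMAT, SIGNATURE, CMD_CK, ssrc, count, ts1, ts2, ts3)


def unpack_ck(data: bytes):
    """Parse a clock-sync packet into (ssrc, count, ts1, ts2, ts3)."""
    sig, cmd, ssrc, count, ts1, ts2, ts3 = struct.unpack(CK_FORMAT, data)
    if sig != SIGNATURE or cmd != CMD_CK:
        raise ValueError("not a CK packet")
    return ssrc, count, ts1, ts2, ts3


def offset_estimate(ts1: int, ts2: int, ts3: int) -> int:
    """Estimate the clock offset once all three timestamps are known.

    ts1 and ts3 come from the initiator's clock, ts2 from the responder's.
    """
    return (ts1 + ts3) // 2 - ts2


# Three-step exchange: count=0 (ts1), count=1 (adds ts2), count=2 (adds ts3).
final = pack_ck(ssrc=0xCAFEBABE, count=2, ts1=1000, ts2=5000, ts3=1020)
_, _, t1, t2, t3 = unpack_ck(final)
offset = offset_estimate(t1, t2, t3)
```

The midpoint of ts1 and ts3 approximates the initiator's clock at the moment the responder stamped ts2, so the difference gives the offset between the two clocks.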
Keeping session control on its own port isolates invitation and teardown traffic from the time-critical RTP-MIDI stream, which makes each kind of traffic easier to handle and debug independently.
Operational note: In NAT/firewall environments, the port pair must be considered as a unit; opening or forwarding only one port is typically insufficient.
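One reason the pair behaves as a unit: in the commonly documented AppleMIDI handshake, the initiator sends an invitation ("IN") to the control port and then a second invitation to the data port, so a session cannot be established if either port is blocked. The sketch below assumes the commonly documented invitation layout (0xFFFF signature, two-character command, protocol version 2, initiator token, SSRC, NUL-terminated session name). All function names are hypothetical.

```python
import struct

SIGNATURE = 0xFFFF
PROTOCOL_VERSION = 2
HEADER = "!HHIII"  # signature, command, version, initiator token, SSRC


def build_invitation(token: int, ssrc: int, name: str) -> bytes:
    """Serialize an "IN" packet with a NUL-terminated UTF-8 session name."""
    command = int.from_bytes(b"IN", "big")
    head = struct.pack(HEADER, SIGNATURE, command, PROTOCOL_VERSION, token, ssrc)
    return head + name.encode("utf-8") + b"\x00"


def parse_session_packet(data: bytes) -> dict:
    """Parse an invitation-style packet (IN/OK/NO/BY share this header)."""
    sig, cmd, version, token, ssrc = struct.unpack_from(HEADER, data)
    if sig != SIGNATURE:
        raise ValueError("missing 0xFFFF signature")
    raw_name = data[struct.calcsize(HEADER):].split(b"\x00", 1)[0]
    return {
        "command": cmd.to_bytes(2, "big").decode("ascii"),
        "version": version,
        "token": token,
        "ssrc": ssrc,
        "name": raw_name.decode("utf-8"),
    }


def invitation_targets(host: str, control_port: int = 5004):
    """Return the two UDP endpoints an initiator must reach, in order."""
    return [(host, control_port), (host, control_port + 1)]


invite = build_invitation(token=0x1234, ssrc=0xCAFEBABE, name="ESP32")
parsed = parse_session_packet(invite)
targets = invitation_targets("192.168.1.50")  # example address, not from the project
```

When writing firewall or port-forwarding rules, the list returned by `invitation_targets` is the minimum set of UDP endpoints that must be reachable.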
Q&A
Q1. What is ESP32_Host_MIDI?
ESP32_Host_MIDI is an open-source library that turns an ESP32 into a multi-transport MIDI hub. It unifies USB Host, BLE MIDI, Wi-Fi RTP-MIDI, OSC, DIN-5 (UART), ESP-NOW, USB Device, and MIDI 2.0/UMP over UDP behind a single API.
Q2. What can you build with ESP32_Host_MIDI?
You can build MIDI routers, bridges, and custom controllers that connect multiple MIDI interfaces in one device. Typical builds include studio Ethernet MIDI bridges, live rig routing nodes, sensor-based controllers, and OSC↔MIDI gateways.
Q3. What is RTP-MIDI?
RTP-MIDI is a method for carrying MIDI messages inside RTP packets over UDP/IP networks. It defines how Note/CC and other MIDI events are packed and transmitted on a network link.
Q4. What is AppleMIDI?
AppleMIDI is the session and synchronization layer used by macOS/iOS Network MIDI for RTP-MIDI streams. It manages session invitation/accept/teardown, clock synchronization, and feedback so the RTP-MIDI stream operates reliably in real time.
Q5. What is the difference between AppleMIDI and RTP-MIDI?
RTP-MIDI defines the data format, while AppleMIDI defines the session and synchronization behavior. In practice, RTP-MIDI describes how MIDI is packaged, and AppleMIDI describes how devices connect, maintain timing, and manage the stream.


