Wiznet makers

lawrence

Published February 26, 2026


ESP32_Host_MIDI


COMPONENTS

Hardware components

Espressif - ESP32 x 1

WIZnet - W5500 x 1


PROJECT DESCRIPTION

ESP32 + W5500 Wired Ethernet MIDI Hub: Overview of the ESP32_Host_MIDI Project (AppleMIDI / RTP-MIDI)

This article presents ESP32_Host_MIDI, an open-source library that enables an ESP32 to operate as a general-purpose, multi-transport MIDI hub. The library consolidates multiple MIDI transports—USB Host, BLE MIDI, Wi-Fi RTP-MIDI, OSC, DIN-5 (UART), ESP-NOW, USB Device, and MIDI 2.0 / UMP over UDP—under a unified programming interface, supporting rapid development of MIDI routers, bridges, and controllers for a wide range of system architectures.


Project Objectives and Use Cases

Modern MIDI environments frequently require interoperability across heterogeneous interfaces and protocols. Typical requirements include:

Connecting a USB MIDI keyboard to a DAW via wired or wireless networking

Bridging BLE MIDI applications on iPad/iPhone with studio equipment (e.g., DIN-5 synthesizers and DAWs)

Centralizing MIDI routing in setups that combine multiple devices and transports

ESP32_Host_MIDI addresses these requirements by abstracting transport-specific behavior and enabling the ESP32 to function as a central routing and translation node, supporting flexible integration for instruments, software tools, and interactive installations.


Representative Application Scenarios

The library can be applied to the following system-level scenarios:

Studio rack bridge: DIN-5 (legacy) synthesizer → ESP32 → DAW via Ethernet MIDI

Live / rehearsal systems: multiple ESP32 nodes connected through ESP-NOW, with MIDI forwarded to FOH via USB or network transport

Custom MIDI controllers: sensor/knob/fader inputs mapped to MIDI Notes/CC and transmitted via USB, BLE, or network

Media art and software integration: OSC ↔ MIDI bridging for environments such as Max/MSP or TouchOSC
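For the custom controller scenario above, mapping a sensor reading to a MIDI value is mostly a scaling step. The following is a minimal host-side sketch in Python, not code from ESP32_Host_MIDI; the function names `adc_to_cc` and `control_change` are hypothetical, and the 12-bit default assumes the ESP32 ADC's usual resolution.

```python
def adc_to_cc(raw: int, adc_bits: int = 12) -> int:
    """Scale a raw ADC reading to a 7-bit MIDI value (0..127)."""
    if adc_bits < 7:
        raise ValueError("adc_bits must be at least 7")
    max_raw = (1 << adc_bits) - 1
    raw = max(0, min(raw, max_raw))   # clamp out-of-range readings
    return raw >> (adc_bits - 7)      # keep the 7 most significant bits


def control_change(channel: int, controller: int, value: int) -> bytes:
    """Build a 3-byte MIDI Control Change message (channel is 1..16)."""
    if not 1 <= channel <= 16:
        raise ValueError("channel must be 1..16")
    if not 0 <= controller <= 127 or not 0 <= value <= 127:
        raise ValueError("controller and value must be 0..127")
    return bytes([0xB0 | (channel - 1), controller, value])


# A fader at full scale on channel 1, controller 7 (channel volume).
message = control_change(1, 7, adc_to_cc(4095))
```

In a real controller the reading would usually be smoothed and sent only when the 7-bit value changes, so that ADC noise does not flood the link with redundant CC messages.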


System Overview

A common deployment model aggregates MIDI input sources (USB, BLE, DIN-5) at the ESP32 and forwards the merged stream as network MIDI (AppleMIDI / RTP-MIDI) over wired Ethernet through the W5500.

[USB MIDI Keyboard]   [BLE MIDI (iPhone/iPad)]   [DIN-5 Synth]
        \                     |                    /
         \                    |                   /
          ----->          [ESP32_Host_MIDI]  -----
                          (routing / filtering / bridging)
                                   |
                              SPI  |
                                   v
                                 [W5500]
                                   |
                                Ethernet
                                   |
                                [Mac / DAW]
                      (Audio MIDI Setup - Network MIDI)
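
The fan-in shown in the diagram can be modeled as a small routing table: each named source is connected to one or more sinks, and every incoming message is handed to the sinks registered for its source. This is a conceptual sketch only; `MidiRouter`, `connect`, and `dispatch` are hypothetical names and do not represent the ESP32_Host_MIDI API.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Sink = Callable[[bytes], None]


class MidiRouter:
    """Forward raw MIDI messages from named sources to registered sinks."""

    def __init__(self) -> None:
        self._routes: Dict[str, List[Sink]] = defaultdict(list)

    def connect(self, source: str, sink: Sink) -> None:
        self._routes[source].append(sink)

    def dispatch(self, source: str, message: bytes) -> int:
        """Deliver one message; return how many sinks received it."""
        sinks = self._routes.get(source, [])
        for sink in sinks:
            sink(message)
        return len(sinks)


# Stand-in for the Ethernet (W5500) output: a list that collects messages.
ethernet_out: List[bytes] = []

router = MidiRouter()
for source in ("usb", "ble", "din5"):
    router.connect(source, ethernet_out.append)

router.dispatch("usb", bytes([0x90, 60, 100]))   # Note On from the keyboard
router.dispatch("din5", bytes([0x80, 60, 0]))    # Note Off from the synth
```

Filtering fits the same shape: a sink can be wrapped in a function that drops or rewrites messages before passing them on.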

Ethernet Implementation and Rationale for W5500

The project includes a wired Ethernet MIDI configuration (“Ethernet-MIDI”). In this approach, the ESP32 interfaces with a W5500 SPI Ethernet module and runs AppleMIDI (RTP-MIDI) over wired Ethernet.

Implementation summary

Hardware: ESP32 + W5500 (SPI) module + RJ-45

Software: Arduino Ethernet library + lathoub’s Arduino-AppleMIDI-Library (v3.x)

Ports: typically a consecutive UDP port pair, such as 5004 / 5005 (implementation-dependent and configurable)

Rationale for W5500

Hardwired TCP/IP with integrated MAC/PHY: The W5500 implements the TCP/IP stack in hardware and integrates the Ethernet MAC and PHY, so the ESP32 does not have to run the wired network stack in software. The MCU can remain focused on time-critical processing (e.g., MIDI handling, USB processing, input scanning) while driving Ethernet through a straightforward SPI interface.

More predictable behavior on wired Ethernet: For timing-sensitive MIDI workloads, wired Ethernet typically shows lower and more stable latency and jitter than Wi-Fi in studio and installation environments, because it avoids radio contention and retransmissions.


Protocol Overview: AppleMIDI vs RTP-MIDI (Including Stack and Port Model)

Although the terms are often used together, AppleMIDI and RTP-MIDI represent different functional layers.

1) AppleMIDI vs RTP-MIDI

RTP-MIDI specifies how MIDI messages are encapsulated within RTP packets—i.e., the payload format used to transport Note/CC/etc. over RTP/UDP.

AppleMIDI provides the session management and synchronization framework used by macOS/iOS Network MIDI, including session invitation/acceptance, connection lifecycle control, clock synchronization, and feedback mechanisms required for stable real-time operation.

In practical terms:

RTP-MIDI defines the data representation on the wire.

AppleMIDI defines the session and synchronization behavior required to operate that stream reliably.
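
To make the "data representation" side concrete, the sketch below packs a single Note On into an RTP-MIDI packet: a 12-byte RTP header followed by a one-byte MIDI command section header and the MIDI bytes. It is a simplified illustration that assumes the short header form (no journal, no delta time, at most 15 bytes of MIDI) and the dynamic payload type 97 commonly seen with AppleMIDI. The marker-bit default follows RFC 6295 as understood here; implementations differ, so it is worth comparing against a packet capture from the target peer. The function name `rtp_midi_packet` is hypothetical.

```python
import struct

RTP_VERSION_BYTE = 0x80   # V=2, P=0, X=0, CC=0
PAYLOAD_TYPE = 0x61       # 97, assumed dynamic payload type for RTP-MIDI


def rtp_midi_packet(seq: int, timestamp: int, ssrc: int,
                    midi: bytes, marker: bool = True) -> bytes:
    """Wrap a short MIDI command list in an RTP-MIDI packet."""
    if not 0 < len(midi) <= 15:
        raise ValueError("short form carries 1..15 bytes of MIDI")
    m_pt = (0x80 if marker else 0x00) | PAYLOAD_TYPE
    header = struct.pack("!BBHII", RTP_VERSION_BYTE, m_pt,
                         seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF,
                         ssrc & 0xFFFFFFFF)
    # Command section header: B=0, J=0, Z=0, P=0, LEN in the low 4 bits.
    section_header = len(midi)
    return header + bytes([section_header]) + midi


note_on = bytes([0x90, 60, 100])   # channel 1, middle C, velocity 100
packet = rtp_midi_packet(seq=1, timestamp=0, ssrc=0x12345678, midi=note_on)
```

The resulting packet is 16 bytes long: 12 bytes of RTP header, 1 byte of command section header, and the 3-byte Note On.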

2) Protocol Stack (Layer View)

A typical network MIDI data path can be described as:

MIDI messages → RTP-MIDI payload → RTP → UDP → IP → Ethernet (wired, via W5500)

In this model, W5500 provides the Ethernet/IP/UDP foundation, while AppleMIDI and RTP-MIDI operate at higher layers.
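
The layering also determines how many bytes travel on the wire for each MIDI event. The arithmetic below is a rough sketch that assumes IPv4 without options, no VLAN tag, an RTP header without extensions, and the one-byte short form of the MIDI command section with no journal; real traffic may differ. The name `frame_bytes` is hypothetical.

```python
ETHERNET_HEADER = 14       # destination MAC + source MAC + EtherType
IPV4_HEADER = 20           # no options
UDP_HEADER = 8
RTP_HEADER = 12            # no CSRC entries, no extension
MIDI_SECTION_HEADER = 1    # short form, no journal
MIN_FRAME_NO_FCS = 60      # 64-byte Ethernet minimum minus the 4-byte FCS


def frame_bytes(midi_len: int) -> int:
    """Frame size (without FCS) for one packet carrying midi_len MIDI bytes."""
    raw = (ETHERNET_HEADER + IPV4_HEADER + UDP_HEADER
           + RTP_HEADER + MIDI_SECTION_HEADER + midi_len)
    return max(raw, MIN_FRAME_NO_FCS)   # short frames are padded to the minimum


note_on_len = 3
frame = frame_bytes(note_on_len)        # 58 raw bytes, padded to 60
overhead = frame - note_on_len          # 57 bytes of headers and padding
```

Under these assumptions a 3-byte Note On occupies a minimum-size Ethernet frame, which is one reason implementations may bundle several MIDI commands into one packet when events arrive close together.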


3) Two-Port Model (Control / Data)

AppleMIDI commonly uses two consecutive UDP ports (N and N+1):

Control Port (e.g., 5004)
Handles session control traffic such as invitation/accept/deny, session teardown, and related feedback/control messages.

Data Port (e.g., 5005)
Carries the RTP-MIDI stream (Notes, CC, etc.), and timing-related exchanges are also performed on the data side.

This separation improves operational clarity and helps maintain stability for time-critical streaming by isolating session control traffic from the data stream.
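
The timing exchange mentioned for the data port is commonly described as a three-step timestamp exchange (command `CK`): the initiator sends timestamp 1, the responder answers with timestamp 2, and the initiator closes with timestamp 3. The sketch below packs such a packet and computes the usual offset estimate. The field layout and the 100-microsecond tick follow publicly available descriptions of AppleMIDI and should be treated as assumptions; `build_ck` and `offset_and_round_trip` are hypothetical names.

```python
import struct

# Signature, command, SSRC, count, 3 padding bytes, three 64-bit timestamps.
CK_FORMAT = "!H2sIB3xQQQ"


def build_ck(ssrc: int, count: int, ts1: int, ts2: int = 0, ts3: int = 0) -> bytes:
    """Pack a clock synchronization packet (count is 0, 1, or 2)."""
    return struct.pack(CK_FORMAT, 0xFFFF, b"CK", ssrc, count, ts1, ts2, ts3)


def offset_and_round_trip(ts1: int, ts2: int, ts3: int) -> tuple:
    """Estimate clock offset and round-trip time, both in timestamp ticks.

    Assumes the network delay is the same in both directions.
    """
    offset = (ts1 + ts3) // 2 - ts2
    round_trip = ts3 - ts1
    return offset, round_trip


first = build_ck(ssrc=0x12345678, count=0, ts1=1000)

# Initiator sent at 1000 and received the reply at 1040 (its own clock);
# the responder stamped its reply at 5020 (the responder's clock).
offset, round_trip = offset_and_round_trip(1000, 5020, 1040)
```

With a 100-microsecond tick, the round trip of 40 ticks in this example corresponds to 4 ms, and the offset of -4000 ticks is the estimated difference between the two clocks.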

Operational note: In NAT/firewall environments, the port pair must be considered as a unit; opening or forwarding only one port is typically insufficient.
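
One reason the pair has to be treated as a unit is that the session handshake touches both ports: in commonly documented AppleMIDI behavior, the initiator sends an invitation to the control port and then repeats it on the data port. The sketch below derives the port pair and builds an invitation packet. The packet layout (0xFFFF signature, two-letter command, protocol version 2, initiator token, SSRC, NUL-terminated name) follows public protocol descriptions and should be treated as an assumption; the helper names are hypothetical.

```python
import struct

SIGNATURE = 0xFFFF
PROTOCOL_VERSION = 2


def port_pair(control_port: int = 5004) -> tuple:
    """Return (control, data) as two consecutive UDP ports."""
    if not 1 <= control_port <= 65534:
        raise ValueError("control port must leave room for control + 1")
    return control_port, control_port + 1


def build_invitation(initiator_token: int, ssrc: int, name: str) -> bytes:
    """Pack an invitation ('IN') with a NUL-terminated session name."""
    head = struct.pack("!H2sIII", SIGNATURE, b"IN",
                       PROTOCOL_VERSION, initiator_token, ssrc)
    return head + name.encode("utf-8") + b"\x00"


def command_of(packet: bytes) -> str:
    """Return the two-letter session command, e.g. 'IN', 'OK', 'NO', 'BY'."""
    signature, command = struct.unpack_from("!H2s", packet)
    if signature != SIGNATURE:
        raise ValueError("not an AppleMIDI session packet")
    return command.decode("ascii")


control, data = port_pair(5004)
invite = build_invitation(0xDEADBEEF, 0x12345678, "ESP32")
```

A firewall rule or port forward would then need to cover both `control` and `data` for the same host.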

 

Q&A (Answer Engine Optimization)

Q1. What is ESP32_Host_MIDI?

ESP32_Host_MIDI is an open-source library that turns an ESP32 into a multi-transport MIDI hub. It unifies USB Host, BLE MIDI, Wi-Fi RTP-MIDI, OSC, DIN-5 (UART), ESP-NOW, USB Device, and MIDI 2.0/UMP over UDP behind a single API.

Q2. What can you build with ESP32_Host_MIDI?

You can build MIDI routers, bridges, and custom controllers that connect multiple MIDI interfaces in one device. Typical builds include studio Ethernet MIDI bridges, live rig routing nodes, sensor-based controllers, and OSC↔MIDI gateways.

Q3. What is RTP-MIDI?

RTP-MIDI is a method for carrying MIDI messages inside RTP packets over UDP/IP networks. It defines how Note/CC and other MIDI events are packed and transmitted on a network link.

Q4. What is AppleMIDI?

AppleMIDI is the session and synchronization layer used by macOS/iOS Network MIDI for RTP-MIDI streams. It manages session invitation/accept/teardown, clock synchronization, and feedback so the RTP-MIDI stream operates reliably in real time.

Q5. What is the difference between AppleMIDI and RTP-MIDI?

RTP-MIDI defines the data format, while AppleMIDI defines the session and synchronization behavior. In practice, RTP-MIDI describes how MIDI is packaged, and AppleMIDI describes how devices connect, maintain timing, and manage the stream.
