Wiznet makers

Benjamin

Published March 03, 2026

Async DMA-Driven W5500 Driver for STM32 + FreeRTOS with BSD Socket API

Interrupt-driven W5500 Ethernet driver using SPI DMA command queue, exposing BSD-style sockets and DNS/NTP/WebSocket clients on STM32 + FreeRTOS.

COMPONENTS

Hardware components

WIZnet - W5500 x 1
STMicroelectronics - STM32F103RCT6 x 1

Software Apps and online services

WIZnet - WIZnet io Library x 1

PROJECT DESCRIPTION

A Custom, Interrupt-Driven W5500 Driver Built for Real-Time STM32 Applications

Most developers reaching for the WIZnet W5500 rely on the official ioLibrary — a polling-based driver that works well for simple projects but becomes a bottleneck when an RTOS-based system demands non-blocking, high-throughput Ethernet. This project takes a fundamentally different approach: a fully asynchronous, DMA-driven W5500 driver that serializes all SPI access through a command queue and never blocks an RTOS task during I/O.

The result is a standalone C/C++ library (~5,600 lines) that exposes a BSD-style socket API (socket, connect, send, recv, sendto, recvfrom, disconnect, close) along with four application-layer clients — DNS resolver, NTP time sync, WebSocket, and a UDP convenience wrapper — all built on top of the asynchronous core.

The Command Queue: Heart of the Architecture

The central design element is a circular command queue (1,000 slots) that serializes every SPI transaction to the W5500. The application layer never touches SPI directly; instead, it pushes command structs onto the queue. A single running_cmd variable tracks the in-flight DMA transfer. When a transfer completes, the DMA callback processes the result and immediately pops the next command.

Commands enter from three sources. RTOS tasks push to the back at normal priority, protected by taskENTER_CRITICAL(). The W5500 INT pin (via GPIO EXTI) pushes a READ_SIR command to the front, ensuring interrupt events are handled before queued bulk transfers. A hardware timer ISR running at ~50 Hz periodically checks TX free-space registers and re-reads the interrupt register as a safety net against missed INT edges.

From Src/w5500.cpp:L253-L268:

void wiznetInterruptCallback(void)
{
    uint32_t isrm = taskENTER_CRITICAL_FROM_ISR();
    command_t cmd;
    generateGetRegCmd(&cmd, 0xFF, W5500_SIR, &common_regs.SIR, 1);
    cmd.cmd_type = READ_SIR;
    // Push to front of queue for immediate processing
    if (!queuePushFront(&command_queue, cmd)) {
        enqueueFailsInISR++;
    }
    taskEXIT_CRITICAL_FROM_ISR(isrm);
}
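
The front/back priority scheme described above can be illustrated with a small ring buffer. This is a minimal host-side sketch, not the project's queue.hpp: the demo_ names, the 8-slot size, and the one-field command struct are all assumptions made for illustration, and the critical sections that guard the real queue are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_QUEUE_SLOTS 8u  /* the article's queue has 1,000 slots */

/* Hypothetical, simplified command; the real command_t carries more fields. */
typedef struct {
    uint8_t cmd_type;
} demo_cmd_t;

typedef struct {
    demo_cmd_t slots[DEMO_QUEUE_SLOTS];
    size_t head;   /* index of the next command to pop */
    size_t count;  /* number of queued commands */
} demo_queue_t;

/* Normal-priority enqueue: append behind everything already waiting. */
static bool demo_push_back(demo_queue_t *q, demo_cmd_t cmd)
{
    assert(q != NULL);
    if (q->count == DEMO_QUEUE_SLOTS) return false;  /* queue full */
    size_t tail = (q->head + q->count) % DEMO_QUEUE_SLOTS;
    q->slots[tail] = cmd;
    q->count++;
    return true;
}

/* High-priority enqueue: move head back one slot so this command pops first. */
static bool demo_push_front(demo_queue_t *q, demo_cmd_t cmd)
{
    assert(q != NULL);
    if (q->count == DEMO_QUEUE_SLOTS) return false;  /* queue full */
    q->head = (q->head + DEMO_QUEUE_SLOTS - 1u) % DEMO_QUEUE_SLOTS;
    q->slots[q->head] = cmd;
    q->count++;
    return true;
}

/* Dequeue the command at the head; returns false when the queue is empty. */
static bool demo_pop(demo_queue_t *q, demo_cmd_t *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0u) return false;
    *out = q->slots[q->head];
    q->head = (q->head + 1u) % DEMO_QUEUE_SLOTS;
    q->count--;
    return true;
}
```

With two task commands pushed to the back and one interrupt command pushed to the front, the interrupt command is popped first and the task commands follow in their original order.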

Zero-Copy DMA Transfers and Buffer Management

Every SPI transaction uses STM32 HAL DMA (HAL_SPI_Transmit_DMA, HAL_SPI_TransmitReceive_DMA), meaning the CPU is free during transfers. Each socket has its own ring-buffer pair (TX and RX) in STM32 RAM, managed by segment descriptor queues. The getTXBufferIndex() and getRXBufferIndex() functions implement wraparound-aware allocation:

From Src/w5500.cpp:L338-L373:

int16_t getTXBufferIndex(socket_t* socket, uint16_t len)
{
    if (queueIsEmpty(&socket->tx_buf_queue)) {
        return (len <= socket->tx_buf_len) ? 0 : -1;
    }
    int16_t queue_start = queueFront(&socket->tx_buf_queue)->start_index;
    int16_t queue_end = queueBack(&socket->tx_buf_queue)->end_index;

    if (queue_end >= queue_start) {
        int16_t space_at_end = socket->tx_buf_len - queue_end;
        int16_t space_at_beginning = queue_start;
        if (len <= space_at_end) return queue_end;
        else if (len <= space_at_beginning) return 0;
        else return -1;
    } else {
        if (len <= queue_start - queue_end) return queue_end;
        else return -1;
    }
}

This approach avoids dynamic memory allocation entirely — critical for deterministic real-time behavior.
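
To make the three allocation cases concrete, here is a host-side sketch that mirrors the branch structure of getTXBufferIndex() on a flattened model. The demo_ring_t struct and its field names are hypothetical stand-ins for the socket's segment queue, and the buffer sizes in the example are arbitrary.

```c
#include <stdint.h>

/* Hypothetical flattened view of one socket's TX segment queue. */
typedef struct {
    int16_t buf_len;  /* total ring-buffer size in bytes */
    int16_t start;    /* start_index of the oldest queued segment */
    int16_t end;      /* end_index of the newest queued segment */
    int     empty;    /* non-zero when no segments are queued */
} demo_ring_t;

/* Returns the offset where a len-byte segment can be placed, or -1 if none. */
static int16_t demo_alloc_index(const demo_ring_t *r, uint16_t len)
{
    if (r->empty) {
        return (len <= r->buf_len) ? 0 : -1;
    }
    if (r->end >= r->start) {
        if (len <= r->buf_len - r->end) return r->end;  /* fits in the tail */
        if (len <= r->start) return 0;                  /* wrap to the front */
        return -1;                                      /* neither gap is big enough */
    }
    /* Data has already wrapped: only the gap between end and start is free. */
    return (len <= r->start - r->end) ? r->end : -1;
}
```

For a 1024-byte buffer with queued data spanning offsets 256 to 896, a 100-byte segment lands at 896, a 200-byte segment wraps to offset 0, and a 300-byte segment is rejected because neither free region can hold it contiguously.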

 

Why W5500 Fits This Architecture

The W5500's hardware TCP/IP stack is what makes this async architecture viable. The chip handles retransmission, ARP, ICMP, and TCP state management internally, so the STM32 only needs to move data in and out of the chip's 32 KB buffer via SPI. This offloads protocol processing that would otherwise consume significant MCU cycles and RAM.

The 8 independent hardware sockets map directly to the driver's sockets[8] array, each with separate TX/RX ring buffers. The W5500's SIR (Socket Interrupt Register) bitmap allows the driver to identify which sockets need attention in a single register read — the READ_SIR → READ_SOC two-step sequence handles this efficiently.
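
Decoding the SIR bitmap is a simple bit scan, since bit n of SIR corresponds to socket n. The sketch below shows one way to turn a SIR value into a list of socket indices to service; the function name and output array are illustrative and are not taken from the driver.

```c
#include <stdint.h>

#define DEMO_SOCKET_COUNT 8u

/* Writes the indices of sockets flagged in sir into out[]; returns how many. */
static uint8_t demo_pending_sockets(uint8_t sir, uint8_t out[DEMO_SOCKET_COUNT])
{
    uint8_t n = 0;
    for (uint8_t i = 0; i < DEMO_SOCKET_COUNT; i++) {
        if (sir & (uint8_t)(1u << i)) {
            out[n++] = i;  /* bit i set: socket i has a pending interrupt */
        }
    }
    return n;
}
```

A SIR value of 0x05, for example, yields sockets 0 and 2, each of which would then get its own follow-up read of the per-socket interrupt register.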

A notable detail is the CR (Command Register) busy-check in dmaTXCompleteCallback() (Src/w5500.cpp:L444-L479): before writing a socket command, the driver checks if the previous command has been processed by the W5500. If the CR register is still non-zero, it re-enqueues the write and reads CR again. This prevents command collisions that could silently drop socket operations.
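
The busy-check can be modeled on a host with a fake chip whose command register clears after a few reads. This is only a sketch of the decision logic: the driver re-enqueues commands instead of looping, and every name here (demo_chip_t, demo_issue_command, the max_reads limit) is an assumption introduced for the example.

```c
#include <stdint.h>

typedef enum { DEMO_WRITE_CR, DEMO_RECHECK_CR } demo_action_t;

/* Decide the next step from the latest command-register value read back. */
static demo_action_t demo_next_step(uint8_t cr_readback)
{
    return (cr_readback == 0u) ? DEMO_WRITE_CR : DEMO_RECHECK_CR;
}

/* Host-side stand-in for the chip: the first busy_reads reads see CR busy. */
typedef struct {
    uint8_t cr;
    uint8_t busy_reads;
} demo_chip_t;

static uint8_t demo_read_cr(demo_chip_t *chip)
{
    uint8_t value = chip->cr;
    if (chip->busy_reads > 0u && --chip->busy_reads == 0u) {
        chip->cr = 0u;  /* the chip has finished the previous command */
    }
    return value;
}

/* Returns the number of CR reads needed before writing, or -1 if still busy.
 * The loop stands in for the driver's "re-enqueue write, read CR again" cycle. */
static int demo_issue_command(demo_chip_t *chip, uint8_t new_cmd, int max_reads)
{
    for (int reads = 1; reads <= max_reads; reads++) {
        if (demo_next_step(demo_read_cr(chip)) == DEMO_WRITE_CR) {
            chip->cr = new_cmd;
            return reads;
        }
    }
    return -1;
}
```

The point of the pattern is that a new command is only written after a zero has been read back, so a command that the chip has not yet consumed is never overwritten.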

Application-Layer Clients

Four protocol clients are built on top of the socket layer, each demonstrating a different use case for the asynchronous core:

| Client | File | Lines | Description |
|---|---|---|---|
| DNS | dns_client.c | 620 | Full resolver with A record and CNAME following. Manually constructs/parses DNS packets with label compression pointer support. Uses UDP sockets. |
| NTP | ntp_client.c | 491 | RFC 5905 v4 compliant time sync. Handles epoch offset conversion and validates server stratum/version in responses. |
| WebSocket | websocket_client.c | 560 | Client-side WebSocket over TCP: HTTP Upgrade handshake with Sec-WebSocket-Key, binary frame construction with masking, ping/pong keepalive, and close handshake. Includes its own Base64 encoder. |
| UDP | udp_client.c | 145 | Thin convenience wrapper around sendto() / recvfrom(). |

All clients use the non-blocking socket API internally. DNS and NTP operate over UDP sockets, while WebSocket runs over a TCP connection. The WebSocket client requires the user to provide an external ws_rand() function for frame masking keys.
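
For context on why a random source is needed: RFC 6455 requires every client-to-server frame to be XOR-masked with a fresh 4-byte key. The sketch below shows the masking step and one possible way to derive a key from a 32-bit random value. The article does not show the prototype of ws_rand(), so the assumption that it yields a 32-bit value, and both demo_ function names, are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Split a 32-bit random value (assumed to come from ws_rand()) into a key. */
static void demo_ws_key_from_u32(uint32_t r, uint8_t key[4])
{
    key[0] = (uint8_t)(r >> 24);
    key[1] = (uint8_t)(r >> 16);
    key[2] = (uint8_t)(r >> 8);
    key[3] = (uint8_t)r;
}

/* XOR the payload in place with the 4-byte key, cycling through the key.
 * Applying the same key a second time restores the original bytes. */
static void demo_ws_mask(uint8_t *payload, size_t len, const uint8_t key[4])
{
    for (size_t i = 0; i < len; i++) {
        payload[i] ^= key[i % 4u];
    }
}
```

With the key 0x37 0xfa 0x21 0x3d, the text "Hello" masks to 0x7f 0x9f 0x4d 0x51 0x58, which matches the masked-frame example in RFC 6455. On an STM32 with a hardware RNG, that peripheral would be a natural source for the key, since the RFC expects it to be unpredictable.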

Current Limitations

This project is a driver library, not a turnkey application. Several gaps are worth noting:

  • No build system: No Makefile, CMakeLists.txt, or platformio.ini. Users must manually integrate the source and header files into their own STM32 project.
  • STM32H7-specific timer config: The timer prescaler is hardcoded for a 275 MHz clock (WIZNET_TIM_CLK 275000000 in w5500.cpp:L56), which corresponds to the STM32H7 series. Other families (F4, L4, etc.) require recalculation.
  • No TCP server mode: connect() (client) is implemented, but there is no listen() or accept(). The SOCKET_LISTEN status exists in the enum but is not wired to any functionality.
  • No usage examples: There is no main.c or sample application demonstrating initialization, socket setup, or protocol client usage.
  • No top-level license: Only w5500_macros.h carries a license header (WIZnet BSD). The rest of the codebase has no explicit license.
  • AI-assisted components: queue.hpp and some client headers list "AI Assistant" as author, indicating portions were generated with AI assistance.

FAQ

Q1. Which STM32 families are compatible? Any STM32 with SPI + DMA and FreeRTOS (CMSIS-RTOS) support. The only family-specific code is the timer prescaler in setWiznetHardware(), hardcoded for ~275 MHz. For STM32F4 (168 MHz) or similar, recalculate WIZNET_TIM_CLK and the PSC/ARR values accordingly.
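
The recalculation in Q1 follows from the standard STM32 timer relation, update rate = timer clock / ((PSC + 1) × (ARR + 1)). The sketch below computes a PSC/ARR pair for the ~50 Hz safety-net timer. The 10 kHz intermediate counter rate is an assumption chosen for round numbers; the article does not state which PSC/ARR values the driver uses, and the real timer clock depends on the bus configuration of the target part.

```c
#include <stdint.h>

typedef struct {
    uint32_t psc;  /* prescaler register value */
    uint32_t arr;  /* auto-reload register value */
} demo_tim_cfg_t;

/* Derive PSC/ARR so the counter runs at counter_hz and overflows at update_hz.
 * Assumes tim_clk_hz is a multiple of counter_hz, and counter_hz of update_hz. */
static demo_tim_cfg_t demo_tim_cfg(uint32_t tim_clk_hz, uint32_t counter_hz,
                                   uint32_t update_hz)
{
    demo_tim_cfg_t cfg;
    cfg.psc = tim_clk_hz / counter_hz - 1u;
    cfg.arr = counter_hz / update_hz - 1u;
    return cfg;
}

/* Recompute the resulting update rate to check a configuration. */
static uint32_t demo_tim_update_hz(uint32_t tim_clk_hz, demo_tim_cfg_t cfg)
{
    return tim_clk_hz / ((cfg.psc + 1u) * (cfg.arr + 1u));
}
```

Under these assumptions a 275 MHz timer clock gives PSC = 27499 and ARR = 199, while a 168 MHz clock gives PSC = 16799 and ARR = 199; both fit a 16-bit prescaler and produce a 50 Hz update rate.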

Q2. How does this differ from WIZnet's official ioLibrary? The ioLibrary uses a polling model where SPI transactions block until complete. This driver queues all SPI access through a DMA command pipeline — no RTOS task ever blocks on I/O. The tradeoff is higher RAM usage (command queue + per-socket ring buffers) and more complex initialization.

Q3. What happens when the command queue overflows? Task-context functions return SOCKERR_QUEUE_FULL (-3). In ISR context, the command is silently dropped and the enqueueFailsInISR counter increments. The 1,000-slot queue is generously sized for typical use, but sustained high throughput across many active sockets could saturate it.

Q4. Can this run without FreeRTOS? Not without modification. The driver depends on taskENTER_CRITICAL() / taskEXIT_CRITICAL() for thread safety and osDelay() / osKernelGetTickCount() for blocking waits. A bare-metal port would need equivalent primitives replacing these calls.

Q5. Is TCP server (listen) mode supported? Not currently. The socket API only implements client-side connect(). Adding server mode would require implementing listen(), accept(), and handling the SOCK_LISTEN → SOCK_ESTABLISHED state transition via interrupts.

Q6. How much RAM does the driver consume? The command queue alone occupies 1000 × sizeof(command_t) — roughly 15–17 KB depending on alignment. Each socket requires user-provided TX/RX ring buffers (at least W5500 socket buffer size + 3 bytes each). A minimal single-socket configuration needs ~20 KB; a full 8-socket setup with 2 KB buffers per socket would require 50+ KB total.
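
The figures in Q6 can be reproduced with a few lines of arithmetic. The sketch below assumes sizeof(command_t) is 16 bytes (the article only implies a range of about 15 to 17 bytes) and counts just the command queue and the TX/RX ring buffers, so socket descriptors and other driver state are not included.

```c
#include <stddef.h>

#define DEMO_QUEUE_SLOTS  1000u
#define DEMO_CMD_SIZE     16u  /* assumed sizeof(command_t) */
#define DEMO_BUF_OVERHEAD 3u   /* the "+ 3 bytes" per buffer from the article */

/* Estimated bytes for the command queue plus TX and RX buffers per socket. */
static size_t demo_ram_bytes(size_t sockets, size_t chip_buf_bytes)
{
    size_t queue = (size_t)DEMO_QUEUE_SLOTS * DEMO_CMD_SIZE;
    size_t per_socket = 2u * (chip_buf_bytes + DEMO_BUF_OVERHEAD);  /* TX + RX */
    return queue + sockets * per_socket;
}
```

With 2 KB chip buffers this gives 20,102 bytes for one socket and 48,816 bytes for eight, which lines up with the ~20 KB estimate and sits just under the 50 KB figure before any remaining driver state is added.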
