SipServer
A lightweight SIP registrar/proxy server running on ESP32-S3, with W5500 Ethernet/PoE support for building small wired VoIP networks.
Pocket Dial / SipServer — A Compact SIP Phone Server with ESP32-S3 and W5500 Ethernet
Recommended Components
- WIZnet W5500
- Waveshare ESP32-S3-ETH + PoE
- ESP32-S3
- SIP softphone / IP phone
- Ethernet switch / router
- ESP-IDF or Arduino IDE
- FreeRTOS / lwIP
PROJECT DESCRIPTION
📌 Overview
The pocket-dial repository implements a lightweight SIP registrar/proxy server called SipServer. According to the README, it can be built for Linux, Windows, and ESP32-S3, and on ESP32-S3 it can run either as a standalone Wi-Fi SoftAP server or through a W5500 Ethernet path.
This server lets SIP clients register themselves with REGISTER and then exchange SIP signaling such as INVITE, 180 Ringing, 200 OK, ACK, and BYE. In other words, it is a call signaling server rather than a VoIP media server that handles voice data directly.
The original README focuses mainly on ESP32-S3 Wi-Fi SoftAP mode, but the repository also includes a W5500 Ethernet / PoE server path. According to the changelog, SipServerETH.ino and main/esp_main_eth.cpp were added so the server can run on a Waveshare ESP32-S3-ETH + PoE board over W5500 Ethernet.
📌 What is a SIP Server?
A SIP Server is, in simple terms, a telephone operator for internet phones.
Just as old telephone operators helped connect one caller to another, a SIP Server manages the process that allows internet phones or softphones to find each other and start a call.
For example, imagine an office with phones using extension numbers such as 1001, 1002, and 1003. Each phone first registers itself with the SIP Server. The server then remembers something like, “extension 1001 is currently available at this IP address.”
When extension 1001 calls extension 1002, the SIP Server handles the connection process:
- It finds where extension 1002 is located.
- It tells 1002 that someone is calling.
- If the other side answers, it passes the call-start signals between both sides.
- When the call ends, it passes the call-termination signal.
The important point is that a SIP Server usually does not carry the voice itself. It manages the process of starting, accepting, and ending calls. Once the call is connected, the actual voice data can flow directly between the two phones.
More technically, SIP (Session Initiation Protocol) is an application-layer signaling protocol used to create, modify, and terminate internet voice, video, and multimedia sessions. SIP can use proxy servers to route requests to a user’s current location, while registration lets users tell the server where they are currently reachable. (datatracker.ietf.org)
SIP calls often use SDP (Session Description Protocol) as well. If SIP is responsible for “how to start and end the call,” SDP describes “what media will be used and which address and port should be used for that media.” SDP is a format for describing media details and transport addresses needed for multimedia sessions. (datatracker.ietf.org)
SIP Servers are commonly used in:
- Office or school extension phone systems
- Internet phone services
- IP phone and softphone networks
- Hotel, hospital, and office internal phone systems
- Intercom systems or temporary field communication networks
- Experimental VoIP networks
This project is a compact implementation of that SIP Server idea. The ESP32-S3 acts like a small telephone operator, allowing multiple SIP phones to register and call each other. With the W5500 Ethernet version, this small SIP Server can run over a wired LAN or PoE network instead of relying only on Wi-Fi.
📌 What This Project Does
The core function of this project is SIP endpoint registration and call signaling proxying.
A SIP client sends a REGISTER request to the server, and the server stores the extension number and client address in an in-memory registry. Later, when one client calls another extension, the server forwards the INVITE to the destination client and routes signaling messages such as 180 Ringing, 200 OK, ACK, and BYE between both sides. The README explains that the server handles REGISTER, INVITE, BYE, and other SIP messages over UDP, along with SDP media negotiation.
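As a rough illustration of how an extension number might be pulled out of a From or To header, the sketch below looks for a `sip:` URI and returns the user part before `@`. The function name `extractExtension` is hypothetical and is not taken from the repository; real SIP URIs also allow `sips:` schemes, display names, and parameters that this sketch does not handle.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the repository): returns the user part of the
// first "sip:" URI in a header line, e.g. "1001" from "<sip:1001@192.168.4.2>".
// Returns an empty string when no usable URI is found.
std::string extractExtension(const std::string& header) {
    const std::string scheme = "sip:";
    const std::size_t start = header.find(scheme);
    if (start == std::string::npos) {
        return std::string();  // no SIP URI in this header
    }
    const std::size_t userBegin = start + scheme.size();
    const std::size_t at = header.find('@', userBegin);
    if (at == std::string::npos || at == userBegin) {
        return std::string();  // URI has no user part
    }
    return header.substr(userBegin, at - userBegin);
}
```

Given a line such as `From: <sip:1001@192.168.4.2>;tag=abc`, this returns `1001`, which is the kind of key a registrar could store alongside the sender's address.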
The actual voice media is not relayed by the server. In the successful call flow shown in the README, the SIP server handles call setup, while RTP media flows directly between the caller and callee. This allows a small MCU to focus on call signaling without having to process real-time voice streams.
On ESP32-S3, the project has two operating modes. The first is Wi-Fi SoftAP mode, where the ESP32-S3 creates an open AP named esp32-sipserver and runs the SIP server at 192.168.4.1:5060. The second is W5500 Ethernet mode, where the board obtains an IP address through DHCP or static fallback and binds the SIP server to port 5060 on that Ethernet address.
📌 Features
SIP registration server
When a client sends a REGISTER request, the server stores its extension number and network address. The implementation also updates the address on re-registration, which helps when a client’s address changes.
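The registration behavior described above can be sketched as a small in-memory map from extension to address. The `Registry` and `ClientAddress` names are assumptions made for this example and the repository's own classes may look different; the changelog also mentions thread safety, which this sketch leaves out.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical types for illustration only.
struct ClientAddress {
    std::string ip;
    std::uint16_t port;
};

class Registry {
public:
    // Stores the address for an extension, overwriting any earlier entry.
    // Returns true for a new extension and false for a re-registration.
    bool registerClient(const std::string& extension, const ClientAddress& addr) {
        auto it = clients_.find(extension);
        if (it == clients_.end()) {
            clients_.emplace(extension, addr);
            return true;
        }
        it->second = addr;  // re-registration: same extension, refreshed address
        return false;
    }

    // Returns the stored address, or nullptr when the extension is unknown.
    const ClientAddress* lookup(const std::string& extension) const {
        auto it = clients_.find(extension);
        return it == clients_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::string, ClientAddress> clients_;
};
```

Overwriting on re-registration is what lets a client that comes back with a different IP address or port stay reachable under the same extension.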
Call signaling proxy
The server handles SIP methods and responses such as INVITE, ACK, BYE, CANCEL, 180 Ringing, 200 OK, 486 Busy Here, 480 Temporarily Unavailable, and 487 Request Terminated. Requests are dispatched through the RequestsHandler table.
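A table-driven dispatcher of this kind can be sketched as a map from a method or response name to a callable. The `Dispatcher` class, its `on` and `dispatch` members, and the string return values are placeholders invented for this example; they do not reproduce the repository's RequestsHandler.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical dispatcher; handler names and return values are placeholders.
class Dispatcher {
public:
    using Handler = std::function<std::string(const std::string& message)>;

    // Registers (or replaces) the handler for a method or response name.
    void on(const std::string& key, Handler handler) {
        handlers_[key] = std::move(handler);
    }

    // Runs the handler registered for `key`. Unknown keys are reported
    // as "unhandled" rather than being silently dropped.
    std::string dispatch(const std::string& key, const std::string& message) const {
        auto it = handlers_.find(key);
        if (it == handlers_.end()) {
            return "unhandled";
        }
        return it->second(message);
    }

private:
    std::unordered_map<std::string, Handler> handlers_;
};
```

Keeping the routing in a table means a new method only needs a new entry, and the receive path stays a single lookup.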
SDP parsing
SIP messages containing an application/sdp body are parsed as SipSdpMessage. The server extracts RTP port information from SDP and records it in the call session, while the media stream itself remains direct between endpoints.
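To make the RTP port extraction concrete, here is a minimal sketch that reads the port from the first `m=audio` line of an SDP body. The helper name `extractAudioPort` is an assumption for this example, and the repository's SipSdpMessage may parse more fields than this.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns the port from the first "m=audio" line of an
// SDP body, or -1 when there is no such line or the port is not a valid number.
int extractAudioPort(const std::string& sdp) {
    std::istringstream lines(sdp);
    std::string line;
    const std::string prefix = "m=audio ";
    while (std::getline(lines, line)) {
        if (line.compare(0, prefix.size(), prefix) != 0) {
            continue;  // not a media line for audio
        }
        std::istringstream fields(line.substr(prefix.size()));
        int port = -1;
        if (fields >> port && port > 0 && port <= 65535) {
            return port;
        }
        return -1;  // audio line present but port unusable
    }
    return -1;
}
```

For a body containing `m=audio 49170 RTP/AVP 0`, the function returns 49170. The server only needs to record this value; the RTP packets themselves never pass through it.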
Cross-platform codebase
The project uses POSIX sockets on Linux, Winsock2 on Windows, and lwIP sockets on ESP32. UdpServer separates Linux/ESP_PLATFORM and Windows socket handling through conditional compilation.
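The conditional-compilation approach can be sketched roughly as follows. This is not the repository's UdpServer: the names `SocketHandle`, `kInvalidSocket`, `closeSocket`, and `openUdpSocket` are invented for the example, and the choice of POSIX-style headers for the non-Windows branch is an assumption (ESP-IDF projects often include `lwip/sockets.h` instead).

```cpp
#include <cstdint>
#include <cstring>

#ifdef _WIN32
  #include <winsock2.h>
  #include <ws2tcpip.h>
  using SocketHandle = SOCKET;
  const SocketHandle kInvalidSocket = INVALID_SOCKET;
  inline void closeSocket(SocketHandle s) { closesocket(s); }
#else
  // POSIX-style headers (assumed to cover the Linux and ESP32 builds).
  #include <arpa/inet.h>
  #include <netinet/in.h>
  #include <sys/socket.h>
  #include <unistd.h>
  using SocketHandle = int;
  const SocketHandle kInvalidSocket = -1;
  inline void closeSocket(SocketHandle s) { close(s); }
#endif

// Opens a UDP socket bound to ip:port and returns kInvalidSocket on failure.
// On Windows, WSAStartup must already have been called by the application.
SocketHandle openUdpSocket(const char* ip, std::uint16_t port) {
    SocketHandle s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    if (s == kInvalidSocket) {
        return kInvalidSocket;
    }
    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        bind(s, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
        closeSocket(s);  // do not leak the handle on a bad address or bind error
        return kInvalidSocket;
    }
    return s;
}
```

Hiding the handle type and the close call behind one small shim keeps the rest of the code free of platform checks.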
W5500 Ethernet / PoE support
The repository includes W5500 Ethernet / PoE server support. Both an Arduino sketch and an ESP-IDF entry point are provided, targeting the Waveshare ESP32-S3-ETH + PoE board.
📌 System Architecture
The architecture is relatively simple. At the bottom is UdpServer, a UDP socket abstraction. It receives UDP datagrams and passes the raw payload to the SIP server callback. When sending, it transmits UDP packets back to the destination sockaddr_in.
Above that is the SIP message parsing layer. SipMessageFactory checks whether the payload contains application/sdp and creates either a regular SipMessage or a SipSdpMessage. The parser then extracts SIP headers, From/To numbers, Call-ID, CSeq, Contact, and SDP media lines.
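The factory decision described here can be illustrated with a short sketch. `Message`, `SdpMessage`, and `createMessage` are stand-in names chosen for this example rather than the repository's SipMessage, SipSdpMessage, and SipMessageFactory, and the content-type check is a plain case-sensitive substring search.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for the repository's message classes.
struct Message {
    explicit Message(std::string text) : raw(std::move(text)) {}
    virtual ~Message() = default;
    virtual bool hasSdp() const { return false; }
    std::string raw;  // unparsed datagram payload
};

struct SdpMessage : Message {
    using Message::Message;  // reuse the base constructor
    bool hasSdp() const override { return true; }
};

// Picks the message type by looking for an SDP content type in the payload.
std::unique_ptr<Message> createMessage(const std::string& payload) {
    if (payload.find("application/sdp") != std::string::npos) {
        return std::unique_ptr<Message>(new SdpMessage(payload));
    }
    return std::unique_ptr<Message>(new Message(payload));
}
```

A REGISTER without a body would come back as a plain `Message`, while an INVITE carrying an SDP offer would come back as an `SdpMessage` that later stages can query for media details.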
The main logic is handled by RequestsHandler. This object manages registered clients and active sessions. REGISTER updates the client registry, while INVITE checks whether the caller and callee are registered and then creates a session. BYE, CANCEL, BUSY, and UNAVAILABLE update session state and forward messages to the other endpoint.
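The session handling can be pictured as a small state machine. The sketch below is an assumption-laden simplification: the state names, the `Session` struct, and the `handleInvite` and `applyEvent` functions are invented for illustration, and the event strings simply mirror the message names used in the paragraph above.

```cpp
#include <set>
#include <string>

// Hypothetical states; the repository's own session states may differ.
enum class SessionState { Invited, Connected, Cancelled, Busy, Unavailable, Ended };

struct Session {
    std::string caller;
    std::string callee;
    SessionState state;
};

// Creates a session only when both parties are registered. Returns false
// otherwise, which is where a server would answer with an error response.
bool handleInvite(const std::set<std::string>& registered,
                  const std::string& caller, const std::string& callee,
                  Session& out) {
    if (registered.count(caller) == 0 || registered.count(callee) == 0) {
        return false;
    }
    out = Session{caller, callee, SessionState::Invited};
    return true;
}

// Applies one signaling event to the session state. Unknown events are ignored.
void applyEvent(Session& session, const std::string& event) {
    if (event == "200 OK") {
        if (session.state == SessionState::Invited) {
            session.state = SessionState::Connected;
        }
    } else if (event == "CANCEL") {
        session.state = SessionState::Cancelled;
    } else if (event == "BUSY") {
        session.state = SessionState::Busy;
    } else if (event == "UNAVAILABLE") {
        session.state = SessionState::Unavailable;
    } else if (event == "BYE") {
        session.state = SessionState::Ended;
    }
}
```

In a real proxy each transition would also forward the message to the other endpoint; the sketch tracks only the state side of that work.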
In the ESP32-S3 W5500 Ethernet path, the board initializes the W5500, obtains an IP address, and starts SipServer on that IP address and SIP port 5060. In other words, once network bring-up is complete, the existing SIP engine is reused as-is.
📌 Role and Application of the WIZnet Chip
WIZnet chip used: W5500
In this project, the W5500 allows the ESP32-S3 to operate as a wired Ethernet-based SIP server. In Wi-Fi SoftAP mode, the ESP32-S3 creates its own access point. In W5500 Ethernet mode, the board connects to an existing LAN or PoE-based network so SIP clients on that network can reach the server.
The Arduino entry point, SipServerETH.ino, defines the W5500 SPI pin mapping and initializes the chip with ETH.begin(ETH_PHY_W5500, ...). It waits for DHCP, applies a static IP fallback if needed, and then binds the SIP server to port 5060 using ETH.localIP().
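The DHCP-then-static-fallback decision can be expressed independently of any board API. The following sketch is hypothetical: `chooseAddress`, `AddressChoice`, and the polling callback are invented for this example and stand in for whatever the sketch file does with the Arduino `ETH` object.

```cpp
#include <functional>
#include <string>

// Hypothetical result type; not taken from the repository.
struct AddressChoice {
    std::string ip;
    bool fromDhcp;
};

// Calls `pollDhcp` up to `maxAttempts` times. The callback returns the leased
// address, or an empty string while no lease is available yet. When every
// attempt comes back empty, the configured static address is used instead.
AddressChoice chooseAddress(const std::function<std::string()>& pollDhcp,
                            int maxAttempts, const std::string& staticIp) {
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        const std::string leased = pollDhcp();
        if (!leased.empty()) {
            return AddressChoice{leased, true};
        }
        // On real hardware a delay between polls would go here.
    }
    return AddressChoice{staticIp, false};
}
```

Whichever address comes back is the one the SIP server would bind to on port 5060, so the rest of the startup code does not need to know how the address was obtained.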
The ESP-IDF entry point, esp_main_eth.cpp, shows a lower-level bring-up sequence. It initializes the SPI bus, creates the W5500 MAC and PHY using ETH_W5500_DEFAULT_CONFIG, esp_eth_mac_new_w5500, and esp_eth_phy_new_w5500, installs the Ethernet driver, attaches it to esp_netif, starts Ethernet, waits for DHCP, and then runs SipServer in a FreeRTOS task.
The network stack detail matters here. This project’s W5500 Ethernet path is not a TOE (TCP/IP Offload Engine) design that uses the W5500 hardwired socket API directly. Instead, the W5500 is attached through the ESP32’s esp_eth / esp_netif / lwIP network interface. main/CMakeLists.txt also shows that the Ethernet build requires the esp_eth, driver, lwip, and esp_netif components.
So, in this project, the W5500 provides the wired Ethernet interface for the SIP server, while UDP/SIP socket handling runs on the ESP32-side lwIP socket API. A wired link avoids Wi-Fi range and interference issues, and on a PoE board the same cable can also supply power to the call signaling appliance.
📌 Implementation Notes
The W5500 Ethernet version targets the Waveshare ESP32-S3-ETH + PoE module. Both the changelog and source headers refer to this board. The default SPI pin mapping is SCLK=12, MISO=13, MOSI=11, CS=10, INT=14, and RST=-1.
In the ESP-IDF build, the SIP_TRANSPORT variable selects either the eth or wifi path. The default is eth, which uses esp_main_eth.cpp; the Wi-Fi path uses the original esp_main.cpp. This lets the same SIP engine run through either Wi-Fi SoftAP or W5500 Ethernet.
The SIP server itself is mostly transport-independent. SipServer receives an IP address and port, opens a UDP server, and connects the message factory with the request handler. Whether packets arrive through Wi-Fi or Ethernet, the upper SIP logic is reused.
The README mainly explains ESP32-S3 SoftAP mode, but the current repository also includes W5500 Ethernet / PoE support. Therefore, the project should be understood as supporting both a Wi-Fi standalone SIP server path and a W5500 wired SIP server path.
📌 Market & Application Value
The most direct application is a small standalone VoIP/SIP network. For example, several SIP softphones or IP phones can register to the ESP32-S3 server and call each other without requiring a large PBX system.
Possible use cases include:
- Small lab VoIP test networks
- Educational SIP signaling practice
- Temporary field voice-network prototypes
- PoE-powered compact SIP server appliances
- Local calling systems for network-failure scenarios
- ESP32-S3-based VoIP signaling research
The W5500 + PoE path is especially useful for always-on deployment. A SIP server is usually expected to stay online, and running it from a wired LAN with PoE power can simplify installation and maintenance. Compared with Wi-Fi SoftAP mode, Ethernet mode fits more naturally into an existing LAN where IP phones or softphones are already connected.
However, this project is not a full PBX. According to the README scope, features such as user authentication, account management, voicemail, PSTN integration, advanced NAT traversal, and RTP relay are not its main focus. It is best understood as a lightweight SIP registrar/proxy.
📌 External Indicators
The README describes build targets for Linux, Windows, ESP32-S3 with ESP-IDF, and ESP32-S3 with Arduino IDE. This means the server can first be tested on a desktop, then moved to the embedded ESP32-S3 target.
The changelog documents ESP32-S3 support, Arduino IDE support, dual-build CMake, platform abstraction, thread safety, and W5500 Ethernet / PoE support. The W5500 Ethernet path is explicitly listed as a separate addition, with both ESP-IDF and Arduino entry points.
The README also includes a SIP method/response table and call-flow diagrams, making it easier to understand the signaling behavior beyond the code itself. Successful calls, cancelled calls, and destination-not-found cases are each documented.
📌 WIZnet Strategic Value
This project shows W5500 being used not just for generic IoT data transfer, but as the Ethernet interface for a VoIP signaling appliance. Through W5500, the ESP32-S3 joins a wired LAN and runs a UDP-based SIP registrar/proxy on top of it.
When combined with a PoE board, the result can be a small SIP server where both power and networking are handled by a single cable. That fits well with maker projects, educational network equipment, field communication experiments, and compact embedded server prototypes.
Technically, this is not an example that directly uses the W5500 hardwired socket engine. Its value is that it implements a SIP server on top of the ESP-IDF Ethernet driver and lwIP socket path. W5500 provides a stable Ethernet network interface, while the ESP32-S3 software stack handles SIP processing.
📌 Summary
pocket-dial is a lightweight SIP registrar/proxy server that can run on Linux, Windows, and ESP32-S3. It supports SIP client registration, INVITE forwarding, SDP-based RTP port extraction, session state management, and BYE/CANCEL handling.
The key WIZnet-related feature is W5500 Ethernet / PoE support. On a Waveshare ESP32-S3-ETH + PoE board, the project initializes W5500, obtains an IP address through DHCP or static fallback, and runs the SIP server on UDP port 5060.
The W5500 path does not directly use the TOE/socket-library model. It uses the ESP-IDF esp_eth / esp_netif / lwIP Ethernet-interface path, so W5500 provides the wired network foundation while SIP/UDP socket handling is performed by the ESP32-S3 software stack.
📌 FAQ
Q1. What is pocket-dial?
According to the README, it is a lightweight SIP registrar/proxy server project called SipServer. It handles SIP client registration and call signaling proxying.
Q2. What is a SIP Server in simple terms?
It is like a telephone operator for internet phones. It remembers where each phone or softphone is located and manages the process of starting and ending calls.
Q3. Does the server handle the actual voice data?
No. The server proxies SIP signaling, while the README call flow shows RTP media flowing directly between caller and callee.
Q4. Which WIZnet chip is used?
The project uses W5500. W5500 Ethernet / PoE support was added for the Waveshare ESP32-S3-ETH + PoE board.
Q5. What does W5500 do here?
It provides the Ethernet interface that allows the ESP32-S3 to run as a SIP server on a wired LAN. The server binds to UDP port 5060 on the assigned IP address.
Q6. Does this project directly use the W5500 TOE/socket engine?
No. The ESP-IDF path uses esp_eth, esp_netif, and lwIP components. W5500 is attached as an Ethernet interface, while SIP/UDP socket handling is done by the ESP32 software stack.
Q7. Does it support both Wi-Fi and Ethernet?
Yes. The project has separate Wi-Fi SoftAP and W5500 Ethernet entry points. In the ESP-IDF build, SIP_TRANSPORT can select either wifi or eth.
Q8. What is this project useful for?
It is useful for small SIP experiments, educational VoIP signaling practice, PoE-powered compact SIP servers, and temporary local calling network prototypes.
