Industrial Offline Cache Gateway with WIZnet W5500: Why MQTT QoS2 Alone Cannot Prevent Data Loss
The Engineering Conflict Between Network Failures and Retransmission: The Reliability Dilemma in Industrial IoT and Practical Solutions
1) Introduction
Industrial IoT gateways are often deployed in environments where network quality is unpredictable: factories with heavy electromagnetic noise, mines, oil fields, ships, outdoor substations, and remote pumping stations. In these environments, a gateway must continue collecting PLC, sensor, meter, and alarm data even when the cloud connection is temporarily unavailable.
A common assumption is that MQTT QoS2 solves this problem completely. MQTT QoS2 is defined as an “exactly once” delivery level, while QoS0 is “at most once” and QoS1 is “at least once.” However, QoS is a protocol-level delivery mechanism between MQTT endpoints. It does not automatically solve every engineering problem inside an industrial gateway.
The reference article highlights three practical issues:
- Embedded gateways have limited RAM and may run out of memory when unsent messages accumulate.
- Cloud or broker-side sessions can expire or behave differently depending on service configuration.
- Industrial data is time-series data, so out-of-order retransmission can be as dangerous as data loss.
Therefore, this project designs a more reliable industrial collection gateway using WIZnet W5500 Ethernet, local persistent storage, message sequence numbers, timestamping, and controlled retransmission. The goal is not to replace MQTT QoS, but to combine MQTT with an application-level offline cache.
2) Required Components
Core Hardware
| Component | Recommended Part | Purpose |
|---|---|---|
| Ethernet Controller | WIZnet W5500 / W5500 module | Stable wired Ethernet connection |
| MCU | RP2040, STM32H7, ESP32, or similar | Sensor acquisition and cache control |
| Storage | microSD, eMMC, FRAM + SD | Offline message persistence |
| RTC | DS3231 or MCU RTC + NTP sync | Accurate timestamps |
| Sensor / PLC Input | Modbus RTU, ADC, UART sensor, digital input | Industrial data source |
| Power Supply | 5V or 12V industrial power module | Gateway power |
| Protection | TVS diode, fuse, isolated RS485 transceiver | Field reliability |
Recommended WIZnet-Based Options
A W5500-based board is suitable because W5500 integrates a hardwired TCP/IP stack, 10/100 Ethernet MAC/PHY, SPI host interface, 8 independent sockets, and 32KB internal TX/RX buffer. WIZnet documents W5500 as a hardwired TCP/IP Ethernet controller with SPI up to 80MHz.
Possible board configurations:
| Option | Description |
|---|---|
| W5500-EVB-Pico | RP2040 + W5500 development platform |
| MCU + W5500 module | Flexible design for STM32, ESP32, RP2040 |
| Custom gateway PCB | W5500 + MCU + eMMC + RS485 + isolated power |
3) Hardware Setup
The gateway consists of four functional layers.
Layer 1: Data Acquisition
The MCU collects data from sensors, PLCs, meters, or controllers. Typical interfaces include:
- UART / RS485 Modbus RTU
- SPI sensors
- I2C sensors
- ADC current loop input through external ADC
- Digital input for alarm events
Each data sample should be converted into a compact message format before storage.
Example payload:
{
  "seq": 10293,
  "ts": 1713600205123,
  "dev": "pump-01",
  "type": "vibration",
  "value": 0.83,
  "unit": "g"
}
For constrained gateways, keep each payload small. The reference article recommends limiting message payload size and adding a sequence identifier as part of the business-layer reliability design.
Layer 2: Network Interface
The MCU communicates with W5500 through SPI. W5500 handles Ethernet MAC/PHY and TCP/IP functions internally, reducing MCU-side TCP/IP workload. This is useful for industrial applications where deterministic behavior and stable wired connectivity are preferred.
Layer 3: Local Offline Cache
When the MQTT broker is reachable, the gateway publishes data normally.
When the broker is unreachable, the gateway writes messages to local storage.
Recommended cache fields:
| Field | Description |
|---|---|
| seq | Monotonic sequence number |
| timestamp | Acquisition time |
| topic | MQTT topic |
| qos | MQTT QoS level |
| payload | Sensor data |
| retry_count | Number of resend attempts |
| status | pending, sent, failed |
| crc | Payload integrity check |
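The fields above could be laid out as a fixed-size record, sketched below under stated assumptions. The struct layout, the buffer sizes (64 and 256 bytes), and the choice of the standard CRC-32 (reflected polynomial 0xEDB88320) are illustrative; the reference article does not prescribe a specific CRC or record layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

enum class CacheStatus : std::uint8_t { Pending, Sent, Failed };

// Hypothetical fixed-size cache record mirroring the table above.
struct CacheRecord {
    std::uint32_t seq;          // monotonic sequence number
    std::uint64_t timestampMs;  // acquisition time
    char          topic[64];    // MQTT topic, NUL-terminated
    std::uint8_t  qos;          // MQTT QoS level
    char          payload[256]; // sensor data, NUL-terminated
    std::uint8_t  retryCount;   // number of resend attempts
    CacheStatus   status;       // pending, sent, failed
    std::uint32_t crc;          // CRC-32 of the payload text
};

// Bitwise CRC-32 (init 0xFFFFFFFF, final XOR 0xFFFFFFFF); slow but table-free.
std::uint32_t crc32Ieee(const void* data, std::size_t len) {
    const auto* p = static_cast<const std::uint8_t*>(data);
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Mask is all ones when the low bit is set, zero otherwise.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// True when the stored CRC still matches the payload read back from storage.
bool payloadIsIntact(const CacheRecord& r) {
    return crc32Ieee(r.payload, std::strlen(r.payload)) == r.crc;
}
```

A record that fails this check after being read back is better marked as failed than resent, since a corrupted payload cannot be told apart from real data by the server.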
Layer 4: Controlled Retransmission
When the network recovers, the gateway does not blindly flood the broker. Instead, it resends cached messages in sequence order and rate-limits publishing.
Recommended rules:
- Send oldest data first.
- Preserve timestamp and sequence number.
- Do not overwrite new real-time alarms.
- Use exponential backoff after failed publish.
- Use application-level ACK if the cloud platform supports it.
- Delete local cache only after publish success or server acknowledgment.
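The exponential backoff rule can be sketched as a small pure function. The name backoffDelayMs and the default values (1 second base, 60 second cap) are assumptions chosen for illustration, not values from the reference article.

```cpp
#include <algorithm>
#include <cstdint>

// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
std::uint32_t backoffDelayMs(std::uint32_t attempt,
                             std::uint32_t baseMs = 1000,
                             std::uint32_t maxMs = 60000) {
    std::uint64_t delay = baseMs;
    // Stop doubling once the cap is reached so the value cannot overflow.
    for (std::uint32_t i = 0; i < attempt && delay < maxMs; ++i) {
        delay *= 2;
    }
    return static_cast<std::uint32_t>(std::min<std::uint64_t>(delay, maxMs));
}
```

A gateway would typically reset the attempt counter to zero after the first successful publish, so a recovered link returns to the normal resend rate immediately.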
4) Interface Explanation
SPI: MCU to W5500
W5500 uses SPI as the host interface. SPI is simple and widely supported by embedded MCUs. A typical connection requires:
- SCK
- MOSI
- MISO
- CS
- RESET
- INT, optional but recommended
W5500 supports high-speed SPI and provides hardwired TCP/IP protocols including TCP, UDP, ICMP, IPv4, ARP, IGMP, and PPPoE.
SPI: MCU to microSD
If a microSD card is used, it can share the same SPI bus with W5500 as long as each device has a separate CS pin. Only one device should be selected at a time.
Important design notes:
- Pull CS pins high during boot.
- Use short SPI traces.
- Add proper decoupling capacitors.
- Avoid sharing SPI with noisy external cables.
- Use industrial-grade microSD or eMMC for long-term operation.
UART / RS485: PLC or Sensor Input
For Modbus RTU or industrial meters, use an isolated RS485 transceiver. The gateway reads register values, timestamps them, and stores them before publishing.
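Before a Modbus RTU response is timestamped and stored, its CRC should be checked. The sketch below shows the standard Modbus CRC-16 (initial value 0xFFFF, reflected polynomial 0xA001, low byte transmitted first). The function names are illustrative, and UART reception is assumed to happen elsewhere.

```cpp
#include <cstddef>
#include <cstdint>

// Standard Modbus RTU CRC-16 over `len` bytes.
std::uint16_t modbusCrc16(const std::uint8_t* data, std::size_t len) {
    std::uint16_t crc = 0xFFFF;
    for (std::size_t i = 0; i < len; ++i) {
        crc = static_cast<std::uint16_t>(crc ^ data[i]);
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 1u) {
                crc = static_cast<std::uint16_t>((crc >> 1) ^ 0xA001u);
            } else {
                crc = static_cast<std::uint16_t>(crc >> 1);
            }
        }
    }
    return crc;
}

// Checks a complete RTU frame whose last two bytes are the CRC (low byte first).
bool modbusFrameIsValid(const std::uint8_t* frame, std::size_t len) {
    if (len < 4) {
        return false;  // address + function + 2 CRC bytes is the minimum
    }
    const std::uint16_t received =
        static_cast<std::uint16_t>(frame[len - 2] | (frame[len - 1] << 8));
    return modbusCrc16(frame, len - 2) == received;
}
```

Frames that fail the check should be discarded and re-polled rather than cached, so that line noise on the RS485 bus never reaches the offline cache.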
MQTT over TCP
MQTT runs over a TCP connection between the gateway and broker. QoS2 adds protocol-level handshaking, but broker support and cloud-specific behavior must be checked. For example, AWS IoT Core documents that its MQTT implementation differs from the MQTT specification in several respects and that it does not support publishing or subscribing with QoS 2.
This is why the gateway should not assume that QoS2 alone guarantees end-to-end industrial data persistence.
5) Wiring Table
Example wiring for an MCU + W5500 + microSD design:
| MCU Pin | W5500 Pin | microSD Pin | Description |
|---|---|---|---|
| 3.3V | VCC | VCC | 3.3V power |
| GND | GND | GND | Common ground |
| SPI SCK | SCK | SCK | SPI clock |
| SPI MOSI | MOSI | MOSI | MCU to device |
| SPI MISO | MISO | MISO | Device to MCU |
| GPIO17 | CS | - | W5500 chip select |
| GPIO5 | - | CS | microSD chip select |
| GPIO20 | RESET | - | W5500 reset |
| GPIO21 | INT | - | W5500 interrupt, optional |
Recommended reset behavior:
- Hold W5500 RESET low during MCU boot.
- Release RESET after power is stable.
- Initialize SPI.
- Initialize Ethernet.
- Initialize storage.
- Connect MQTT.
6) Software Environment Setup
Arduino-Based Prototype
Install the following libraries:
- Ethernet
- ArduinoMqttClient
- SD
- SPI
ArduinoMqttClient provides beginMessage() with retain and QoS parameters, and it can work with a generic Client implementation such as EthernetClient.
MQTT Broker
For local testing:
- Mosquitto broker on PC, Raspberry Pi, or industrial edge PC
- MQTTX or mosquitto_sub for monitoring
- Static IP network recommended
Example broker topics:
factory/line1/gateway01/telemetry
factory/line1/gateway01/alarm
factory/line1/gateway01/status
factory/line1/gateway01/ack
Cache File Format
For MCU prototypes, a line-based JSON cache is easy to inspect:
{"seq":1,"ts":1713600001000,"topic":"factory/line1/gateway01/telemetry","qos":1,"payload":{"temp":25.3},"sent":false}
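{"seq":2,"ts":1713600002000,"topic":"factory/line1/gateway01/telemetry","qos":1,"payload":{"temp":25.4},"sent":false}
For production gateways, SQLite or a fixed-size binary ring buffer is recommended. Note that the prototype code in section 7 stores a simpler pipe-delimited line (seq|ts|qos|topic|payload), which avoids the need for a JSON parser on the MCU when the cache is read back.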
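A fixed-size ring buffer could look like the in-memory sketch below. The Slot layout and the RingCache class are hypothetical; a production version would map each slot to a fixed offset in a file or raw flash region and persist the head and count values, which is not shown here.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size record; real gateways would add topic, qos, and crc.
struct Slot {
    std::uint32_t seq;
    std::uint64_t tsMs;
    float         value;
};

// FIFO ring of N slots. When full, the oldest record is overwritten and counted.
template <std::size_t N>
class RingCache {
    static_assert(N > 0, "capacity must be positive");
public:
    void push(const Slot& s) {
        buf_[(head_ + count_) % N] = s;  // when full this index equals head_
        if (count_ < N) {
            ++count_;
        } else {
            head_ = (head_ + 1) % N;     // oldest record was overwritten
            ++dropped_;
        }
    }
    // Copies the oldest record into `out` without removing it.
    bool peekOldest(Slot& out) const {
        if (count_ == 0) return false;
        out = buf_[head_];
        return true;
    }
    // Removes the oldest record; call only after the publish was confirmed.
    void popOldest() {
        if (count_ == 0) return;
        head_ = (head_ + 1) % N;
        --count_;
    }
    std::size_t size() const { return count_; }
    std::size_t dropped() const { return dropped_; }
private:
    std::array<Slot, N> buf_{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
    std::size_t dropped_ = 0;
};
```

Separating peekOldest from popOldest matches the rule of deleting cached data only after a successful publish, and the dropped counter makes silent overwrites visible to monitoring.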
7) Full Code Examples
The following Arduino-style example demonstrates the core logic:
- W5500 Ethernet initialization
- MQTT connection
- Sensor message generation
- Offline cache to SD card
- Reconnection
- Sequential resend
This is a reference implementation for prototyping. Pin numbers should be adjusted for the selected board.
#include <SPI.h>
#include <Ethernet.h>
#include <ArduinoMqttClient.h>
#include <SD.h>
#define PIN_W5500_CS 17
#define PIN_SD_CS 5
#define SENSOR_INTERVAL_MS 5000
#define RESEND_INTERVAL_MS 1000
#define MAX_PAYLOAD_SIZE 256
byte mac[] = { 0x02, 0x08, 0xDC, 0x55, 0x00, 0x01 };
IPAddress brokerIp(192, 168, 1, 10);
const int brokerPort = 1883;
const char mqttClientId[] = "wiznet-offline-gateway-01";
const char telemetryTopic[] = "factory/line1/gateway01/telemetry";
const char statusTopic[] = "factory/line1/gateway01/status";
EthernetClient ethClient;
MqttClient mqttClient(ethClient);
unsigned long lastSensorMs = 0;
unsigned long lastResendMs = 0;
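// Note: seqNo restarts at 1 after every reboot. For production, persist the last
// used value (for example in FRAM or on the SD card) so it stays monotonic.
uint32_t seqNo = 1;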
const char cacheFile[] = "/cache.txt";
const char tempFile[] = "/cache_tmp.txt";
bool mqttOnline = false;
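unsigned long nowMs() {
  // millis() is uptime since boot, not wall-clock time. For production,
  // return epoch time from an RTC or NTP-synchronized clock instead.
  return millis();
}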
float readSensorValue() {
// Replace with Modbus, ADC, I2C, or real sensor code.
static float value = 25.0;
value += 0.1;
if (value > 30.0) value = 25.0;
return value;
}
void selectNoneOnSpiBus() {
digitalWrite(PIN_W5500_CS, HIGH);
digitalWrite(PIN_SD_CS, HIGH);
}
bool initStorage() {
selectNoneOnSpiBus();
if (!SD.begin(PIN_SD_CS)) {
Serial.println("SD init failed");
return false;
}
if (!SD.exists(cacheFile)) {
File f = SD.open(cacheFile, FILE_WRITE);
if (f) f.close();
}
Serial.println("SD init OK");
return true;
}
bool initEthernet() {
selectNoneOnSpiBus();
Ethernet.init(PIN_W5500_CS);
Ethernet.begin(mac);
delay(1000);
IPAddress ip = Ethernet.localIP();
Serial.print("Ethernet IP: ");
Serial.println(ip);
if (ip == IPAddress(0, 0, 0, 0)) {
Serial.println("Ethernet DHCP failed");
return false;
}
return true;
}
bool connectMqtt() {
if (mqttClient.connected()) {
mqttOnline = true;
return true;
}
mqttClient.setId(mqttClientId);
mqttClient.setKeepAliveInterval(60);
Serial.print("Connecting MQTT... ");
if (!mqttClient.connect(brokerIp, brokerPort)) {
Serial.print("failed, error=");
Serial.println(mqttClient.connectError());
mqttOnline = false;
return false;
}
Serial.println("connected");
mqttOnline = true;
mqttClient.beginMessage(statusTopic, false, 1);
mqttClient.print("{\"status\":\"online\"}");
mqttClient.endMessage();
return true;
}
bool publishMqtt(const char* topic, const char* payload, uint8_t qos) {
if (!mqttClient.connected()) {
mqttOnline = false;
return false;
}
mqttClient.beginMessage(topic, false, qos);
mqttClient.print(payload);
int result = mqttClient.endMessage();
mqttClient.poll();
if (result == 0) {
mqttOnline = false;
return false;
}
return true;
}
void appendCacheLine(const char* topic, const char* payload, uint8_t qos) {
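  // FILE_WRITE appends in the classic Arduino SD library. On cores where
  // FILE_WRITE truncates (for example ESP32), use FILE_APPEND here instead.
  File f = SD.open(cacheFile, FILE_WRITE);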
if (!f) {
Serial.println("Failed to open cache file");
return;
}
f.print(seqNo);
f.print("|");
f.print(nowMs());
f.print("|");
f.print(qos);
f.print("|");
f.print(topic);
f.print("|");
f.println(payload);
f.close();
Serial.print("Cached seq=");
Serial.println(seqNo);
}
bool parseCacheLine(String line, uint32_t &seq, unsigned long &ts, uint8_t &qos, String &topic, String &payload) {
int p1 = line.indexOf('|');
int p2 = line.indexOf('|', p1 + 1);
int p3 = line.indexOf('|', p2 + 1);
int p4 = line.indexOf('|', p3 + 1);
if (p1 < 0 || p2 < 0 || p3 < 0 || p4 < 0) {
return false;
}
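  // toInt() returns a signed long; strtoul() keeps the full unsigned 32-bit range.
  seq = strtoul(line.substring(0, p1).c_str(), nullptr, 10);
  ts = strtoul(line.substring(p1 + 1, p2).c_str(), nullptr, 10);
  qos = (uint8_t)line.substring(p2 + 1, p3).toInt();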
topic = line.substring(p3 + 1, p4);
payload = line.substring(p4 + 1);
return true;
}
void resendCachedMessages() {
if (!mqttClient.connected()) {
return;
}
if (!SD.exists(cacheFile)) {
return;
}
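  File in = SD.open(cacheFile, FILE_READ);
  if (!in) {
    Serial.println("Cache read failed");
    return;
  }
  // Nothing pending: skip the rewrite to avoid needless SD card wear.
  if (in.size() == 0) {
    in.close();
    return;
  }
  // Remove a stale temp file from an interrupted resend (FILE_WRITE may append).
  if (SD.exists(tempFile)) {
    SD.remove(tempFile);
  }
  File out = SD.open(tempFile, FILE_WRITE);
  if (!out) {
    Serial.println("Temp file open failed");
    in.close();
    return;
  }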
bool stopResend = false;
while (in.available()) {
String line = in.readStringUntil('\n');
line.trim();
if (line.length() == 0) {
continue;
}
uint32_t cachedSeq;
unsigned long cachedTs;
uint8_t qos;
String topic;
String payload;
if (!parseCacheLine(line, cachedSeq, cachedTs, qos, topic, payload)) {
out.println(line);
continue;
}
if (!stopResend) {
bool ok = publishMqtt(topic.c_str(), payload.c_str(), qos);
if (ok) {
Serial.print("Resent seq=");
Serial.println(cachedSeq);
delay(50);
} else {
stopResend = true;
out.println(line);
}
} else {
out.println(line);
}
}
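  in.close();
  out.close();
  // Not every SD library provides rename(): the classic Arduino SD library
  // does not, while the ESP32 SD library does. A power loss between remove()
  // and rename() leaves only the temp file, so check for it at boot.
  SD.remove(cacheFile);
  SD.rename(tempFile, cacheFile);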
}
void createAndSendSensorMessage() {
float value = readSensorValue();
char payload[MAX_PAYLOAD_SIZE];
snprintf(
payload,
sizeof(payload),
"{\"seq\":%lu,\"ts\":%lu,\"dev\":\"gateway01\",\"type\":\"temperature\",\"value\":%.2f,\"unit\":\"C\"}",
(unsigned long)seqNo,
nowMs(),
value
);
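  uint8_t qos = 1;
  bool sent = false;
  // Publish live only when the cache is empty. Otherwise queue the sample behind
  // the pending backlog so the broker still receives messages in sequence order.
  bool backlog = false;
  File c = SD.open(cacheFile, FILE_READ);
  if (c) {
    backlog = c.size() > 0;
    c.close();
  }
  if (!backlog && mqttClient.connected()) {
    sent = publishMqtt(telemetryTopic, payload, qos);
  }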
if (!sent) {
appendCacheLine(telemetryTopic, payload, qos);
} else {
Serial.print("Published seq=");
Serial.println(seqNo);
}
seqNo++;
}
void setup() {
Serial.begin(115200);
delay(1000);
pinMode(PIN_W5500_CS, OUTPUT);
pinMode(PIN_SD_CS, OUTPUT);
selectNoneOnSpiBus();
Serial.println("WIZnet W5500 Offline Cache Gateway Start");
bool storageOk = initStorage();
bool ethOk = initEthernet();
if (!storageOk) {
Serial.println("Warning: offline cache unavailable");
}
if (ethOk) {
connectMqtt();
}
}
void loop() {
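  mqttClient.poll();
  unsigned long now = millis();
  // Retry the broker connection at most every 5 seconds so that a blocking
  // connect attempt does not starve sensor sampling and caching.
  static unsigned long lastConnectMs = 0;
  if (!mqttClient.connected() && (lastConnectMs == 0 || now - lastConnectMs >= 5000UL)) {
    lastConnectMs = now;
    connectMqtt();
  }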
if (now - lastSensorMs >= SENSOR_INTERVAL_MS) {
lastSensorMs = now;
createAndSendSensorMessage();
}
if (now - lastResendMs >= RESEND_INTERVAL_MS) {
lastResendMs = now;
if (mqttClient.connected()) {
resendCachedMessages();
}
}
}
Optional MQTT Test Subscriber
Run this on a PC to monitor incoming messages:
mosquitto_sub -h 192.168.1.10 -t "factory/line1/gateway01/#" -v
Optional Network Failure Test
Disconnect the Ethernet cable for several minutes, then reconnect it. During the offline period, the gateway should append messages to cache.txt. After reconnection, it should resend cached messages in sequence order.
8) Testing Steps
Test 1: Normal Online Publish
- Connect W5500 Ethernet to the same network as the MQTT broker.
- Power on the gateway.
- Open serial monitor.
- Confirm DHCP IP address.
- Confirm MQTT connection.
- Subscribe to factory/line1/gateway01/#.
- Check that telemetry messages arrive every 5 seconds.
Expected result:
factory/line1/gateway01/telemetry {"seq":1,"ts":5012,"dev":"gateway01","type":"temperature","value":25.10,"unit":"C"}
Test 2: Offline Cache
- Disconnect Ethernet cable.
- Keep the gateway running for 1 to 5 minutes.
- Confirm serial log shows cached sequence numbers.
- Inspect SD card if needed.
Expected result:
Cached seq=15
Cached seq=16
Cached seq=17
Test 3: Reconnection and Resend
- Reconnect Ethernet cable.
- Wait for MQTT reconnection.
- Confirm cached messages are resent.
- Check subscriber output.
Expected result:
Resent seq=15
Resent seq=16
Resent seq=17
Published seq=18
Test 4: Order Verification
The cloud or subscriber should verify that sequence numbers are monotonically increasing.
If the subscriber receives:
101, 102, 103, 104
the resend order is correct.
If it receives:
101, 104, 102, 103
the retransmission logic must be fixed.
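On the subscriber side, this check can be automated. The sketch below is one possible helper; the name firstOutOfOrderIndex is an assumption, and it expects the sequence numbers of a single gateway in arrival order.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the index of the first sequence number that is not strictly greater
// than its predecessor, or -1 when the whole series is strictly increasing.
std::ptrdiff_t firstOutOfOrderIndex(const std::vector<std::uint32_t>& seqs) {
    for (std::size_t i = 1; i < seqs.size(); ++i) {
        if (seqs[i] <= seqs[i - 1]) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```

For the two series above, the first returns -1 and the second returns 2, the position of the value 102 that arrived after 104. A repeated value is also reported, which catches duplicates as well as reordering.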
Test 5: Long Offline Duration
Run a longer test:
| Offline Duration | Expected Result |
|---|---|
| 10 minutes | All data cached and resent |
| 1 hour | Cache file grows but system remains stable |
| 24 hours | Storage capacity and wear behavior should be reviewed |
| 72 hours | Production-level endurance test |
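To size the storage for these durations, a rough estimate is the number of samples taken during the outage times the bytes stored per record. The sketch below assumes the 5-second sample interval used in the example code and about 160 bytes per cached line; the 160-byte figure is an assumption for illustration and should be measured on the real cache file.

```cpp
#include <cstdint>

// Estimated cache size in bytes for an outage of `offlineSeconds`.
// Returns 0 when the interval is 0 to avoid dividing by zero.
constexpr std::uint64_t cacheBytesNeeded(std::uint64_t offlineSeconds,
                                         std::uint64_t sampleIntervalSeconds,
                                         std::uint64_t bytesPerRecord) {
    return sampleIntervalSeconds == 0
               ? 0
               : (offlineSeconds / sampleIntervalSeconds) * bytesPerRecord;
}
```

Under these assumptions, a 10-minute outage needs about 19 KB (120 records), 24 hours needs about 2.8 MB (17,280 records), and 72 hours needs about 8.3 MB (51,840 records). Capacity is rarely the limit on a microSD card; write endurance and the time needed to rewrite the file during resend matter more.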
The reference article argues that industrial reliability should be designed as a multi-layer system rather than relying only on MQTT QoS2.
9) Troubleshooting
Problem: Ethernet does not get an IP address
Possible causes:
- DHCP server unavailable
- Wrong W5500 CS pin
- SPI wiring error
- W5500 reset pin floating
- LAN cable or switch issue
Fix:
- Try static IP.
- Confirm CS pin with Ethernet.init(PIN_W5500_CS).
- Check 3.3V power stability.
- Add pull-up to CS.
- Confirm link LED.
Problem: SD card fails to initialize
Possible causes:
- SD CS pin conflict with W5500 CS
- SD card not formatted as FAT32
- Poor-quality card
- Insufficient power during writes
Fix:
- Ensure W5500 CS is HIGH before SD access.
- Use separate CS pins.
- Use industrial-grade microSD.
- Add bulk capacitor near SD socket.
Problem: MQTT reconnects repeatedly
Possible causes:
- Broker IP or port incorrect
- Client ID conflict
- Keepalive too short
- Broker rejects authentication
- Network instability
Fix:
- Use unique MQTT client ID.
- Increase keepalive interval.
- Check broker logs.
- Test with a local Mosquitto broker first.
Problem: Cached messages are duplicated
Possible causes:
- Gateway resends messages that were already delivered because the confirmation was lost before the cache entry was cleared.
- MQTT QoS1 can deliver duplicates.
- Application does not deduplicate by sequence number.
Fix:
- Use seq as an idempotency key.
- Deduplicate on the server side.
- Delete cache only after publish success or application ACK.
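Server-side deduplication with seq as the idempotency key can be sketched as follows. The Deduplicator class is hypothetical, and it keys on the gateway ID together with seq because sequence numbers are only unique per gateway.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <utility>

// Remembers every (gatewayId, seq) pair it has accepted.
class Deduplicator {
public:
    // Returns true the first time a pair is seen, false for a duplicate.
    bool accept(const std::string& gatewayId, std::uint32_t seq) {
        return seen_.emplace(gatewayId, seq).second;
    }
private:
    std::set<std::pair<std::string, std::uint32_t>> seen_;
};
```

This sketch keeps every key forever. A real service would bound the set, for example by tracking only a recent window per gateway, and would also need to handle sequence numbers that restart after a gateway reboot.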
Problem: Data arrives out of order
Possible causes:
- Real-time messages and cached messages are published through separate paths.
- Resend queue does not use FIFO order.
- Multiple gateway instances share the same topic without source ID.
Fix:
- Add gateway ID and sequence number.
- Resend oldest cached data first.
- Separate topics for real-time alarms and historical telemetry.
- Reorder on the server side using timestamp and sequence number.
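Server-side reordering of a received batch can be sketched with a stable sort. The Sample struct and the sortForReplay function are illustrative names; ordering by gateway, then timestamp, then sequence number is one reasonable choice and assumes timestamps come from a synchronized clock.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical server-side view of one received message.
struct Sample {
    std::string   gatewayId;
    std::uint32_t seq;
    std::uint64_t tsMs;
};

// Sorts a batch by gateway, then acquisition time, then sequence number.
// stable_sort keeps arrival order for entries that compare equal.
void sortForReplay(std::vector<Sample>& batch) {
    std::stable_sort(batch.begin(), batch.end(),
                     [](const Sample& a, const Sample& b) {
                         return std::tie(a.gatewayId, a.tsMs, a.seq) <
                                std::tie(b.gatewayId, b.tsMs, b.seq);
                     });
}
```

If gateway timestamps are only uptime counters, as in the prototype code, sorting by sequence number alone is the safer choice.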
10) Use Cases & Market Potential
Factory Monitoring
A W5500 offline-cache gateway can collect machine temperature, vibration, motor current, and production count data. When the factory network fails, the gateway stores local data and resends it after recovery.
Energy and Utility Metering
Electricity, water, gas, and steam meters often require periodic readings. Missing data can affect billing or analysis. Local cache improves reliability during network outages.
Oil, Gas, and Mining
Remote industrial sites can lose connectivity for long periods. A gateway with persistent storage is more suitable than a simple MQTT publisher.
Building Automation
HVAC, access control, and energy monitoring systems need stable data logs. Wired Ethernet through W5500 is useful where Wi-Fi is unreliable.
Agriculture and Outdoor Equipment
Greenhouses, pump stations, and irrigation controllers can buffer sensor data during router or backhaul failures.
Market Potential
This architecture is relevant for:
- Industrial IoT gateway vendors
- Smart factory solution providers
- Predictive maintenance platforms
- Energy monitoring companies
- Remote telemetry systems
- Edge computing devices
The key market value is not just “sending MQTT data.” The value is reliable data continuity under real industrial failure conditions.
11) Module/Chip Technical Overview
WIZnet W5500
W5500 is a hardwired TCP/IP Ethernet controller. It integrates Ethernet MAC, PHY, and TCP/IP stack, allowing an external MCU to implement Ethernet applications through SPI. It supports 10BaseT/100BaseTX Ethernet, 8 independent sockets, 32KB internal TX/RX buffer, SPI mode 0/3, and Wake-on-LAN.
Why W5500 Fits This Project
Industrial gateways benefit from predictable wired networking. W5500 reduces the amount of networking code that the MCU must handle because TCP/IP functions are implemented in hardware. This is useful when the MCU also needs to manage sensors, local storage, watchdog, and retransmission logic.
Recommended Design Pattern
| Layer | Recommended Design |
|---|---|
| Network | W5500 Ethernet |
| Transport | TCP |
| Messaging | MQTT |
| Reliability | QoS1 or QoS2 depending on broker support |
| Persistence | SD/eMMC/FRAM |
| Application Integrity | Sequence number, timestamp, CRC, ACK |
| Recovery | FIFO resend with rate limit |
QoS Selection Strategy
| Data Type | Recommended QoS | Cache Required | Reason |
|---|---|---|---|
| Non-critical telemetry | QoS0 or QoS1 | Optional | Occasional loss may be acceptable |
| Production metrics | QoS1 | Yes | Must survive short outages |
| Alarm events | QoS1 + local cache + ACK | Yes | Duplicates are easier to handle than loss |
| Configuration changes | QoS2, if supported | Yes | Exactly-once behavior may be useful |
| Historical bulk upload | QoS1 + sequence | Yes | Order and completeness matter |
MQTT QoS2 can reduce duplicate delivery at the MQTT protocol level, but industrial reliability still needs local persistence, broker compatibility checks, and application-level ordering.
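The selection table can be expressed as a small lookup, sketched below. The DataClass and DeliveryPolicy names are hypothetical. Two choices are assumptions beyond the table: non-critical telemetry is mapped to QoS0 without cache, and configuration changes fall back to QoS1 when the broker does not support QoS2.

```cpp
#include <cstdint>

enum class DataClass {
    NonCriticalTelemetry,
    ProductionMetric,
    AlarmEvent,
    ConfigChange,
    HistoricalBulk
};

struct DeliveryPolicy {
    std::uint8_t qos;            // MQTT QoS level to publish with
    bool         cacheRequired;  // persist locally before publishing
    bool         appAck;         // wait for an application-level ACK
};

// Maps a data class to a delivery policy following the table above.
DeliveryPolicy policyFor(DataClass c, bool brokerSupportsQos2) {
    switch (c) {
        case DataClass::NonCriticalTelemetry:
            return {0, false, false};  // assumed: lowest-cost option from the table
        case DataClass::ProductionMetric:
            return {1, true, false};
        case DataClass::AlarmEvent:
            return {1, true, true};
        case DataClass::ConfigChange:
            // Assumed fallback: QoS1 when the broker lacks QoS2 support.
            return {static_cast<std::uint8_t>(brokerSupportsQos2 ? 2 : 1), true, false};
        case DataClass::HistoricalBulk:
            return {1, true, false};
    }
    return {1, true, false};  // conservative default for unknown values
}
```

Keeping the policy in one function makes the broker compatibility check explicit: the QoS2 capability is passed in as a parameter rather than assumed.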
12) Conclusion
MQTT QoS2 is useful, but it is not a complete offline reliability strategy for industrial IoT gateways. In real deployments, a gateway can lose network connectivity for minutes, hours, or even days. During that time, RAM-only buffering is risky, broker sessions may expire, and message order can become inconsistent.
A better architecture is a layered design:
- Use WIZnet W5500 for stable wired Ethernet.
- Add persistent local cache using SD, eMMC, or FRAM.
- Add sequence numbers and timestamps to every message.
- Resend cached messages in FIFO order.
- Select MQTT QoS based on data importance.
- Use server-side deduplication and ACK when possible.
- Monitor cache size, storage health, reconnect count, and resend failures.
This design moves a simple MQTT gateway toward a production-ready industrial data acquisition gateway. It is especially valuable for smart factories, energy monitoring, remote telemetry, and industrial automation systems where data continuity matters more than simply being “connected.”
