Make a W5300 Network Camera for Real-Time AI with STM32H7

A network camera that uses the W5300 to stream uncompressed original video over the network and applies AI image processing to it

COMPONENTS

Hardware components

WIZnet - WIZ830MJ x 1

Software Apps and online services

OpenCV - OpenCV x 1

Microsoft - Visual Studio 2017 x 1

PROJECT DESCRIPTION

Introduction

A typical network camera sends compressed images because network bandwidth is limited. Compression degrades image quality, which can be a problem for image processing. Using the W5300, we built a camera that transmits uncompressed original images over the network so that those images can be processed.

Because the W5300 supports high-speed network transmission of over 80 Mbps, it is well suited to streaming original camera images in real time.
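
To see why link speed matters for uncompressed video, the raw bit rate of a stream can be estimated from the frame geometry. The sketch below is a rough calculation, not a measurement: the 320x240 resolution and 2 bytes per pixel used in the note that follows are assumed values for illustration (the frame size is not stated here), the function names are hypothetical, and protocol overhead is ignored.

```cpp
#include <cstdint>

// Bits needed to send one uncompressed frame.
constexpr std::uint64_t bitsPerFrame(std::uint32_t width, std::uint32_t height,
                                     std::uint32_t bytesPerPixel)
{
    return static_cast<std::uint64_t>(width) * height * bytesPerPixel * 8u;
}

// Upper bound on frames per second for a given link rate.
// Payload only: TCP/IP and packet header overhead are ignored.
constexpr double maxFramesPerSecond(double linkBitsPerSecond, std::uint32_t width,
                                    std::uint32_t height, std::uint32_t bytesPerPixel)
{
    return linkBitsPerSecond /
           static_cast<double>(bitsPerFrame(width, height, bytesPerPixel));
}
```

Under these assumptions, a 320x240 frame at 2 bytes per pixel is 1,228,800 bits, so an 80 Mbps link could carry at most about 65 such frames per second. A 640x480 frame would reduce that to about 16.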

The transmitted images can be used not only for general image processing but also, with AI applied, for smarter applications. AI-enabled network cameras play an important role in monitoring the environment through real-time detection and analysis, crime prevention, accident detection, resource optimization, and more. They can help create smarter and safer environments and build efficient security and monitoring systems.

Advantages of Network Camera with W5300

Fast data transmission and network performance keep real-time video and data transfer smooth.
Low-power operation extends battery life and reduces power costs.
Support for various network protocols gives high compatibility and connectivity.
Data security can be enhanced, and secure data transmission is possible.
Development tools and libraries make software and hardware development easy.
Various input/output ports and functions make it convenient to scale the design and add features.

Hardware

Schematic circuit diagram:

The MCU is from the STM32H7 series and runs at 480 MHz.
Because the W5300's interface is an address/data bus, we chose the STM32H743 from the STM32H7 series, which provides an external data bus alongside the camera interface.

Because STM32 parts share the same pin map, the design can easily be scaled to a higher-performance MCU.

Firmware

Network Performance Test

W5300 (https://www.wiznet.io/ko/product-item/w5300/) is an embedded hardwired TCP/IP Ethernet controller chip with network performance of up to 80 Mbps, which makes it a good choice for a camera that requires high-speed network communication.

We used the iperf program to check the network transmission rate through the W5300, and the measured result was 90 Mbps.

Transferring images

The firmware transmits image data line by line, driven by the line and frame signals generated by the camera module.

  while (1)
  {
    /* USER CODE END WHILE */

    /* USER CODE BEGIN 3 */
    ProcessCamTcps(_CAM_SOCK_NUM, ethBuf0, dDestport);

    // End of frame: send the frame command when the frame interrupt has fired
    if (gCameraFrameFlag > 0)
    {
      gCameraFrameFlag = 0;
      CamImagTransferFrame(_CMD_CAM_FRAME, gLineCnt);

      printf(">%d, %d\r\n", gFrameCnt, gLineCnt);
      printf("-------------------\r\n");
      gFrameCnt++;
      gLineCnt = 0;
    }

    // New line ready: send it when the line interrupt has fired
    if (gCameraLineFlag > 0)
    {
      gCameraLineFlag = 0;

      CamImagTransferLine(_CMD_CAM_LINE, gLineCnt);
      gLineCnt++;
    }
  }
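
The packet layout is not shown in the firmware excerpt, but the PC-side receiver in this article reads the command from byte 0, a 16-bit big-endian counter from bytes 2 and 3, and pixel data from byte 4 onward. The following is a minimal sketch of building such a packet, assuming byte 1 is unused. `buildLinePacket`, `kHeaderSize`, and any command value passed in are hypothetical and are not taken from the project's firmware.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical header size, inferred from the receiver reading pixels at offset 4.
constexpr std::size_t kHeaderSize = 4;

// Build one packet: [cmd][reserved][counter high][counter low][pixel bytes...]
std::vector<std::uint8_t> buildLinePacket(std::uint8_t cmd, std::uint16_t lineIndex,
                                          const std::uint8_t* pixels,
                                          std::size_t pixelBytes)
{
    std::vector<std::uint8_t> packet(kHeaderSize + pixelBytes);
    packet[0] = cmd;
    packet[1] = 0;                                            // assumed unused
    packet[2] = static_cast<std::uint8_t>(lineIndex >> 8);    // big-endian high byte
    packet[3] = static_cast<std::uint8_t>(lineIndex & 0xFF);  // big-endian low byte
    for (std::size_t i = 0; i < pixelBytes; ++i)
        packet[kHeaderSize + i] = pixels[i];                  // copy the pixel payload
    return packet;
}
```

Keeping the header a fixed size means the receiver can locate the pixel data with a constant offset, which matches how the PC code indexes `pDataBuffer[4]`.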

Software

The PC software receives and displays images from the W5300 network camera server and applies image processing algorithms to them. Deep learning algorithms can also be applied to detect objects and build various applications.

Basic Network Video Output

It works as a TCP client and displays the video sent by the W5300 network camera server.

When data arrives over the network, it is stored in the image buffer line by line, according to the command in each packet.

int CNetImagePlayDlg::OnReceive(unsigned char* pDataBuffer)
{
	// Payload bytes per packet: 2 bytes per pixel, scaled by _UDP_TX_BUF_RATE
	// (assumed to be the number of lines carried in one packet)
	unsigned int size = _IMAGE_SIZE_X * 2 * _UDP_TX_BUF_RATE;

	// Header: byte 0 is the command, bytes 2-3 hold a big-endian counter
	int cmd = pDataBuffer[0];
	int page = pDataBuffer[2] << 8 | pDataBuffer[3];

	if (cmd == _CMD_CAM_LINE)
	{
		if (m_gCamBufferIndex < _IMAGE_SIZE_Y)
		{
			// Copy the pixel payload, which starts after the 4-byte header, into the image buffer
			memcpy((unsigned char*)&gImageBuffer[m_gCamBufferIndex * _IMAGE_SIZE_X * 1 * _UDP_TX_BUF_RATE], (unsigned char*)&pDataBuffer[4], size);
			m_gCamBufferIndex++;
		}
	}
	else if (cmd == _CMD_CAM_FRAME)
	{
		// End of frame: log the counters, display the image, and reset for the next frame
		str.Format(L"%x, %d, 0, %d", cmd, page, m_gCamBufferIndex);
		gstr_buf += str + L"\r\n";

		SetDlgItemText(IDC_EDIT2, gstr_buf);

		m_gCamBufferIndex = 0;

		DisplayCamImag(gImageBuffer, _IMAGE_SIZE_Y, _IMAGE_SIZE_X);
		memset(gImageBuffer, 0, _IMAGE_SIZE_X * 1 * _IMAGE_SIZE_Y);
	}
	return 0;
}
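
TCP delivers a byte stream, so a single receive call may return part of a packet or several packets at once. If each packet has a fixed size (an assumption here, since the excerpt above does not show how `pDataBuffer` is framed), the receiver can accumulate bytes and hand out complete packets. `PacketAssembler` is a hypothetical helper, sketched in standard C++ without the MFC socket classes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Collects bytes from a stream and returns fixed-size packets once complete.
class PacketAssembler
{
public:
    explicit PacketAssembler(std::size_t packetSize) : packetSize_(packetSize) {}

    // Append newly received bytes to the pending buffer.
    void feed(const std::uint8_t* data, std::size_t length)
    {
        pending_.insert(pending_.end(), data, data + length);
    }

    // If a full packet is buffered, move it into 'out' and return true.
    bool nextPacket(std::vector<std::uint8_t>& out)
    {
        if (packetSize_ == 0 || pending_.size() < packetSize_)
            return false;
        const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(packetSize_);
        out.assign(pending_.begin(), pending_.begin() + n);
        pending_.erase(pending_.begin(), pending_.begin() + n);
        return true;
    }

private:
    std::size_t packetSize_;
    std::vector<std::uint8_t> pending_;
};
```

Each packet returned by `nextPacket` could then be passed to a handler such as `OnReceive`, which would always see a whole header and payload.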

Image display and video processing run in their own thread so that they can be handled in real time.

int CNetImagePlayDlg::OnThreadProc()
{
	Mat gray_image;
	Mat in_image;
	
	cv::rotate(m_DisplayImg, in_image, ROTATE_90_COUNTERCLOCKWISE);
	cv::waitKey(10);
	
	if (gFlag_Ai == 1)
	{
		// image processing 1 ...
	}
	else if (gFlag_Ai == 2)
	{
		// image processing 2 ...
	}
	else if (gFlag_Ai == 3)
	{
		// image processing 3 ...
	}
	else imshow("Sensor", in_image);
	
	return 0;
}
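
One common way to keep a display or processing thread from falling behind the network thread is to share only the most recent frame and let older ones be dropped. The sketch below shows that idea with a mutex-protected slot. `LatestFrame` is a hypothetical helper; the project's own thread synchronization is not shown in the excerpt above and may differ.

```cpp
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Holds the newest frame only; a slow consumer skips frames instead of queueing them.
class LatestFrame
{
public:
    // Called by the receiving thread when a full frame is ready.
    void publish(std::vector<std::uint8_t> frame)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        frame_ = std::move(frame);
        fresh_ = true;
    }

    // Called by the processing thread; returns false if nothing new has arrived.
    bool take(std::vector<std::uint8_t>& out)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!fresh_)
            return false;
        out = std::move(frame_);
        fresh_ = false;
        return true;
    }

private:
    std::mutex mutex_;
    std::vector<std::uint8_t> frame_;
    bool fresh_ = false;
};
```

Dropping stale frames trades completeness for latency, which is usually the right choice for a live view: the processing thread always works on the newest image rather than an ever-growing backlog.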

Digital Image Processing

Here a simple image processing algorithm is applied to the transmitted image.
The example uses the contour extraction functions provided by OpenCV to extract outlines and mark the boundaries.

Outline Extraction Test Code

    Mat img_edge;

    // convert the image from color to grayscale
    cvtColor(in_image, gray_image, COLOR_BGR2GRAY);

    Mat thresh;
    threshold(gray_image, thresh, gThrVal, 255, THRESH_BINARY);

    imshow("Sensor", thresh);

    // detect the contours
    vector<vector<Point>> contours;
    vector<Vec4i> hierarchy;
    findContours(thresh, contours, hierarchy, RETR_TREE, CHAIN_APPROX_NONE);
    
    // draw contours on the original image
    Mat image_copy = in_image.clone();
    drawContours(image_copy, contours, -1, Scalar(0, 255, 0), 2);
    imshow("result", image_copy);
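
For reference, `THRESH_BINARY` sets each output pixel to the maximum value when the input pixel is greater than the threshold, and to 0 otherwise. The sketch below reproduces that rule on a plain byte buffer in standard C++, so the effect of a threshold such as `gThrVal` can be reasoned about without OpenCV. `binaryThreshold` is a hypothetical name used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-pixel binary threshold: out = maxValue if gray > thresh, else 0.
std::vector<std::uint8_t> binaryThreshold(const std::vector<std::uint8_t>& gray,
                                          std::uint8_t thresh, std::uint8_t maxValue)
{
    std::vector<std::uint8_t> out(gray.size());
    for (std::size_t i = 0; i < gray.size(); ++i)
        out[i] = (gray[i] > thresh) ? maxValue : 0;
    return out;
}
```

Because the comparison is strict, a pixel exactly equal to the threshold becomes 0. Contour detection then runs on this two-level image, so the threshold value directly controls which regions are outlined.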

AI Image Processing (Deep Learning)

The deep learning model used here is in Caffe format. Caffe is one of several open-source deep learning frameworks; others include TensorFlow, PyTorch, and Darknet.

Example of object recognition/classification with OpenCV DNN algorithm
For a simple test, we used the following network. Thanks to Katsuya Hyodo.

https://github.com/PINTO0309/MobileNet-SSD-RealSense/tree/master/caffemodel/MobileNetSSD

Object detection and display test examples

    //Loading deep learning models
    Net net = readNetFromCaffe("MobileNetSSD_deploy.prototxt.txt", "MobileNetSSD_deploy.caffemodel");

    //object detection
    Mat blob = blobFromImage(in_image, 0.007843, Size(300, 300), Scalar(127.5, 127.5, 127.5));
    net.setInput(blob);
    Mat detection = net.forward();

    //Output detected object information
    Mat detectionMat(detection.size[2], detection.size[3], CV_32F, detection.ptr<float>());
    for (int i = 0; i < detectionMat.rows; i++)
    {
        float confidence = detectionMat.at<float>(i, 2);

        if (confidence > 0.4)
        {
            int id = (int)(detectionMat.at<float>(i, 1));
            int x1 = static_cast<int>(detectionMat.at<float>(i, 3) * in_image.cols);
            int y1 = static_cast<int>(detectionMat.at<float>(i, 4) * in_image.rows);
            int x2 = static_cast<int>(detectionMat.at<float>(i, 5) * in_image.cols);
            int y2 = static_cast<int>(detectionMat.at<float>(i, 6) * in_image.rows);

            //Display detected object locations
            rectangle(in_image, Point(x1, y1), Point(x2, y2), Scalar(0, 255, 0), 2);

            String label = format("%s: %.2f", obj_class[id].c_str(), confidence);

            int base;
            Size label_size = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, &base);

            //Display the detected object box, type, and accuracy
            rectangle(in_image, Point(x1, y1 - label_size.height), Point(x1 + label_size.width, y1), Scalar(0, 255, 0), -1);
            putText(in_image, label, Point(x1, y1), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 0, 0), 1);
        }
    }

    //Output the resulting image
    imshow("result", in_image);
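
The loop above relies on an `obj_class` table that is not shown, and it uses the box coordinates exactly as returned by the network, which can fall slightly outside the image. The sketch below is one possible way to handle both in standard C++. The 21 labels are the PASCAL VOC classes that the commonly distributed MobileNet-SSD Caffe model is trained on, which is an assumption about the model used here. `kVocClasses`, `classLabel`, and `toPixel` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string>

// Assumed label table: background plus the 20 PASCAL VOC classes.
static const std::array<const char*, 21> kVocClasses = {{
    "background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
    "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
    "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"}};

// Look up a class name, guarding against ids outside the table.
std::string classLabel(int id)
{
    if (id < 0 || id >= static_cast<int>(kVocClasses.size()))
        return "unknown";
    return kVocClasses[static_cast<std::size_t>(id)];
}

// Convert a normalized coordinate (nominally 0..1) to a pixel index
// clamped to the range [0, extent - 1].
int toPixel(float normalized, int extent)
{
    const int value = static_cast<int>(normalized * static_cast<float>(extent));
    return std::max(0, std::min(value, extent - 1));
}
```

With helpers like these, `classLabel(id)` could stand in for `obj_class[id]`, and `toPixel(detectionMat.at<float>(i, 3), in_image.cols)` could stand in for the unclamped cast, so a malformed id or an out-of-range box would not index past the table or the image.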

Example of face recognition with OpenCV DNN algorithm

Changing the DNN network allows for different AI processing.

For a simple test, we used the following network:

https://github.com/keyurr2/face-detection

Conclusion

The W5300 is a hardwired TCP/IP high-speed networking solution and an excellent chip for showcasing networking performance on a low-cost MCU. It is especially well suited to image processing applications that need original, uncompressed images, because it makes high-speed data transfer simple.

Documents

STM32H7_NET_CAM Firmware and PC Software
Network camera project source code for deep learning image processing
STM32H7-RP HW Schematic
Schematics
Blog Post
STM32H7-RP Board Test
Various test examples for STM32 and W5300
