Machine learning algorithm partially reconfigured on FPGA for an image edge detection system

Gracieth Cavalcanti Batista; Johnny Öberg; Osamu Saotome; Haroldo F. de Campos Velho; Elcio Hideiti Shiguemori; Ingemar Söderquist

doi:10.1016/j.jnlest.2024.100248

Journal of Electronic Science and Technology, Volume 22, Issue 2, 100248 (2024)

Gracieth Cavalcanti Batista¹,²,*, Johnny Öberg¹, Osamu Saotome², Haroldo F. de Campos Velho³, Elcio Hideiti Shiguemori²,⁴, and Ingemar Söderquist¹,⁵

Author Affiliations

¹Division of Electronic and Embedded Systems, KTH Royal Institute of Technology, Stockholm 164 40, Sweden

²Electronic Engineering Division, Aeronautics Institute of Technology, São José dos Campos SP 12228-900, Brazil

³Laboratory of Applied Computing and Mathematics, National Institute for Space Research, São José dos Campos SP 12227-900, Brazil

⁴Department of C4ISR, Institute of Advanced Studies, São José dos Campos SP 12228-001, Brazil

⁵Saab AB, Linköping 581 88, Sweden

Figures & Tables (28)

Fig. 1. Flowchart of the procedure to estimate the UAV position with the proposed SVR technique.

Fig. 2. Edge and non-edge image patterns for the training phase input.

Fig. 3. Block diagram of the SVR prediction phase (standardized to the SVM classification phase).

Fig. 4. Details of all machines of the SVM classifier.

Fig. 5. Designed neuron for the proposed project.

Fig. 6. Designed circuit used to load the α values.

Fig. 7. Proposed SVM architecture (FSM with the pipelined datapath).

Fig. 8. FSM specification responsible for controlling the SVM classifier datapath.

Fig. 9. Stages and output variables of the FSM controller.

Fig. 10. Diagram of the accumulator counter block.

Fig. 11. Diagram of the machine counter block in Architecture N#1.

Fig. 12. Illustration of the reconfigurable region and the static area of the three architectures.

Fig. 13. Designed FIFOs used to load (a) SV and (b) Test values into the neurons.

Fig. 14. Designed shifter used to load the SV and Test values into the neurons in Architecture N#3.

Fig. 15. Diagram of the machine counter block in Architecture N#3.

Fig. 16. Design of the circuit that loads the SV and Test values into the neurons in Architecture N#9.

Fig. 17. Planned UAV trajectory marked with the red line.

Fig. 18. SVM classification results with different kernel functions.

Fig. 19. Neuron’s grain size: (a) reference block and (b) report on the neuron’s cell usage from Vivado Design Suite.

Fig. 20. Block diagrams of the proposed setup: (a) Option A where the user can choose between Architecture N#1 and Architecture N#3 and (b) Option B where the user can choose between Architecture N#1 and Architecture N#9.

Table 1. Overview of the representative related studies.

Applications	Characteristics	Methods
Edge detection using residual learning	A residual deep neural network based on the VGG-16 architecture with deep supervision is developed.	DCNN [29]
Edge detection using two pyramid networks	A down-sampling pyramid network and a lightweight up-sampling pyramid network are constructed to enrich the multi-scale representation from the encoder and decoder, respectively.	Multi-stream learning approach [31]
Real-time image filtering and edge detection	The image information is collected by the camera, Gaussian filtering is applied to remove noise, then Sobel processing is performed, and the image edge processing is finally realized.	Gaussian filtering and Sobel edge processing algorithms implemented on FPGA [35]
Real-time image filtering and edge detection	Image filtering and edge detection are investigated and analyzed; a LUT is applied instead of a multiplier, and a distributed algorithm is used in hardware.	Method based on FPGA [36]
Integrated navigation systems	The proposed metaheuristic algorithms are reviewed and compared with GA and PSO algorithms.	Metaheuristic algorithms [8]
Edge detection	A real-time data-driven fire propagator is used to support wildfire fighting operation and to facilitate the risk assessment and decision-making process.	Mono-dimensional noise-resistant algorithm [37]
Edge detection	The acquisition, storage, and image display of image data are completed by an FPGA-based image processing system, and the Sobel edge detection algorithm is processed and implemented.	Sobel edge detection algorithm implemented on FPGA [39]
Edge detection	The RFD mask used for edge detection is obtained by using various interpolation methods. The mask size is selected based on the figure of merit and edge preservation index. The edges obtained with the proposed approach in the FrFT domain are further used for image enhancement.	RFD in the FrFT domain [38]
Processing of colored UAV images	A novel guiding equation is used to optimize the positions of the improved cuckoo algorithm before the Lévy flight. After the Lévy flight, a novel disturbance equation is applied to obtain a varied location for the next iteration.	Novel quaternion-based improved cuckoo algorithm [40]
Edge extraction	This is the original strategy of applying image convolution from segmented images.	Sobel’s algorithm [41]
Image convolution for UAV positioning estimation	In terms of image edge identification, Sobel’s and Canny’s algorithms are compared with MLP-NN.	Sobel’s, Canny’s, and MLP-NN algorithms, where the neural network is implemented on both CPU and FPGA [27]

Table 2. Description of Algorithm 1.

Algorithm 1: SVR prediction phase
Require: SV; Alpha; Bias; Sigma; Test
for cont = 1:size(Test, 1) do
    for j = 1:size(SV, 1) do
        for i = 1:size(SV, 2) do
            if (i ≥ 1) && (i < size(SV, 2)) then
                aux = (SV(j, i) - Test(cont, i))²
                aux1 = (SV(j, i+1) - Test(cont, i+1))²
                SqDiff(j, i) = sqrt(aux + aux1)
            else
                aux = (SV(j, i) - Test(cont, i))²
                aux1 = (SV(j, 1) - Test(cont, 1))²
                SqDiff(j, i) = sqrt(aux + aux1)
            end if
            EXPin(j, i) = -SqDiff(j, i) / Sigma(i)
            EXPout(j, i) = exp(EXPin(j, i))
            AlphaMult(j, i) = Alpha(j) * EXPout(j, i)
        end for
    end for
    adderTree(cont, 1) = sum(AlphaMult)
    BiasSum(cont, 1) = adderTree(cont, 1) + Bias
end for
for i = 1:size(BiasSum, 1) do
    if BiasSum(i, 1) ≥ 0 then
        Class(i, 1) = 1
    else
        Class(i, 1) = 0
    end if
end for
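
Algorithm 1 can be traced in software with a minimal Python sketch such as the one below. It is a direct transcription under stated assumptions, not the authors' implementation: indices are 0-based, `sum(AlphaMult)` is read as the sum over every entry of the matrix, and the function name `svr_predict` and the toy inputs are hypothetical.

```python
import math

def svr_predict(sv, alpha, bias, sigma, test):
    """Transcribe Algorithm 1: return one 0/1 class per row of `test`."""
    n_feat = len(sv[0])
    classes = []
    for sample in test:                      # loop over cont
        total = 0.0                          # assumed: sum over all AlphaMult entries
        for row, a in zip(sv, alpha):        # loop over j
            for i in range(n_feat):          # loop over i
                nxt = (i + 1) % n_feat       # the last column pairs with the first
                aux = (row[i] - sample[i]) ** 2
                aux1 = (row[nxt] - sample[nxt]) ** 2
                sq_diff = math.sqrt(aux + aux1)
                total += a * math.exp(-sq_diff / sigma[i])
        classes.append(1 if total + bias >= 0 else 0)
    return classes

# Toy data (hypothetical): one support vector at the origin, three features.
toy_sv = [[0.0, 0.0, 0.0]]
toy_test = [[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]]
print(svr_predict(toy_sv, [1.0], -1.5, [1.0, 1.0, 1.0], toy_test))  # [1, 0]
```

The first toy sample coincides with the support vector, so each of the three kernel terms equals 1 and the biased sum is positive. The second sample is far away, so the exponentials vanish and the bias alone decides the class.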

Table 3. Description of control signals and the corresponding functions.

Signal	Function
Load_SV	It loads FIFOs with the SV values.
Load_Test	It loads FIFOs with the Test values.
Clear_FIFOs	It is the command to clear all FIFOs.
Load_Square	It loads the D-flip-flop registers of S2: Square difference.
Load_AdderEXP	It loads the D-flip-flop registers of S3: Adder + EXP_function.
Load_AlphaMult	It loads the D-flip-flop registers of S4: Alpha_Mult.
Load_Accum	It loads the D-flip-flop registers of S5: Accumulator.
Load_AdderSGN	It loads the D-flip-flop registers of S6: Adder_Bias + SGN.
Clear_Accum	It clears the accumulator.
Reset_ALL_Regs	It resets all datapath registers.

Table 4. Summarized features of the hardware implementation.

Item	Feature
Input data	88 frames of 3×3 pixels
Classification type	Frame by frame
Kernel function	Gaussian—using the exponential function
Multi-class technique	One-vs-all
Word size and type	18-bit fixed-point
Architectures	FSM + pipelined datapath
Result	Binary
Description language	VHDL
Simulation and synthesis	Vivado 2019.1
FPGA device	Xilinx ZYNQ-7 ZC702

Table 5. Comparison of the results from the SVM classifier processed in CPU, GPU, and both.

Parameter	CPU	GPU	CPU + GPU
Frequency	5020 MHz	420 MHz	5372 MHz
Processing time	86.06 ms	83.1 s	64.7 μs

Table 6. Comparison of the results from Architecture N#1, Architecture N#3, and Architecture N#9 without and with DPR.

Architecture	Feature	Without DPR	With DPR
N#1	Clock period	100 ns	50 ns
	Latency	0.19 ms	96 μs
	LUTs-static area	969	86
	Flip-flops-static area	535	111
	LUTs-reconfigurable region	/	924
	Flip-flops-reconfigurable region	/	169
	Power consumption	5 mW	7 mW
N#3	Clock period	50 ns	120 ns
	Latency	32.10 μs	77.04 μs
	LUTs-static area	2865	88
	Flip-flops-static area	637	110
	LUTs-reconfigurable region	/	927
	Flip-flops-reconfigurable region	/	273
	Power consumption	19 mW	9 mW
N#9	Clock period	50 ns	100 ns
	Latency	10.95 μs	21.60 μs
	LUTs-static area	8328	138
	Flip-flops-static area	792	80
	LUTs-reconfigurable region	/	955
	Flip-flops-reconfigurable region	/	275
	Power consumption	7 mW	4 mW
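
One way to read the clock-period and latency rows of Table 6 together is to express each latency as a number of clock periods. The Python sketch below does this for the entries whose units are unambiguous. It assumes that latency is a whole number of clock cycles, which the source does not state; the names `TABLE6` and `cycles` are illustrative only.

```python
# (clock period in ns, latency in microseconds), copied from Table 6
TABLE6 = {
    "N#1 with DPR": (50, 96.0),
    "N#3 without DPR": (50, 32.10),
    "N#3 with DPR": (120, 77.04),
    "N#9 without DPR": (50, 10.95),
    "N#9 with DPR": (100, 21.60),
}

def cycles(period_ns, latency_us):
    """Latency expressed as a whole number of clock periods."""
    return round(latency_us * 1000.0 / period_ns)

cycle_counts = {name: cycles(p, lat) for name, (p, lat) in TABLE6.items()}
for name, n in cycle_counts.items():
    print(f"{name}: {n} cycles")
```

Under this reading, Architecture N#3 takes 642 cycles both with and without DPR, so its longer latency with DPR follows from the longer clock period rather than from extra cycles. Architecture N#9 behaves similarly, at 219 and 216 cycles.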

Table 7. DPR information of Architecture N#1, Architecture N#3, and Architecture N#9.

Architecture	Total DPR run time	Partial bitstream size
N#1	74.10 μs	875 KB
N#3	26.00 μs	913 KB
N#9	7.38 μs	831 KB

Table 8. Performance comparison of our proposed architecture with the one reported in Ref. [54].

Features	Proposal in Ref. [54]	N#1 with DPR	N#9 without DPR
Clock frequency	50 MHz	20 MHz	20 MHz
Latency	1.231 ms	2.79 s	0.29 s
Image size	512×512	29127 times one frame of 3×3	29127 times one frame of 3×3
FPGA device	Altera’s Cyclone IV E: EP4CE10F17C8	Xilinx ZYNQ-7 ZC702	Xilinx ZYNQ-7 ZC702
Edge detection technique	Improved Canny algorithm	ML	ML
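
The frame count in Table 8 can be related to the 512×512 image of Ref. [54] with a short Python check. The sketch assumes that the total latency is simply the per-frame latency from Table 6 multiplied by the number of frames, ignoring pipeline overlap and reconfiguration time. That is a simplification not stated in the source, and the helper `total_latency_s` is hypothetical.

```python
FRAMES = 29127                         # frame count from Table 8
pixels_in_frames = FRAMES * 3 * 3      # pixels covered by 3x3 frames
pixels_in_image = 512 * 512            # image size used in Ref. [54]

def total_latency_s(per_frame_us, frames=FRAMES):
    """Assumed model: total latency = frames x per-frame latency."""
    return frames * per_frame_us * 1e-6

n1_dpr = total_latency_s(96.0)         # N#1 with DPR, per-frame latency from Table 6
n9_no_dpr = total_latency_s(10.95)     # N#9 without DPR, per-frame latency from Table 6

print(pixels_in_frames, pixels_in_image)
print(f"N#1 with DPR: {n1_dpr:.3f} s")
print(f"N#9 without DPR: {n9_no_dpr:.3f} s")
```

The 29127 frames cover 262143 pixels, one short of the 262144 pixels in a 512×512 image, so the two workloads are essentially the same size. The N#1 product, about 2.80 s, is close to the 2.79 s listed in Table 8. The N#9 product, about 0.32 s, is somewhat above the listed 0.29 s, so the published figure may rest on a different accounting than this simple model.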

Gracieth Cavalcanti Batista, Johnny Öberg, Osamu Saotome, Haroldo F. de Campos Velho, Elcio Hideiti Shiguemori, Ingemar Söderquist. Machine learning algorithm partially reconfigured on FPGA for an image edge detection system[J]. Journal of Electronic Science and Technology, 2024, 22(2): 100248

Paper Information

Received: Aug. 15, 2023

Accepted: Mar. 30, 2024

Published Online: Aug. 8, 2024

Corresponding author email: Gracieth Cavalcanti Batista (gracieth@kth.se)
