1 Introduction
Unmanned aerial vehicles (UAVs) began to be used during World War II to perform functions that put soldiers' lives at risk, such as territory reconnaissance, and they were sometimes also used for attacks [1]. In the past, UAVs mainly played a surveillance role; today, because they can be controlled from a long distance, they play a more active role in many areas such as agriculture and livestock, the wind industry, advertising, civil construction, workplace safety, and border security [1].
UAVs are often employed in high-risk environments. The ability to sense and avoid obstacles and to rebuild their flight paths is therefore an important feature that UAVs should possess, and the corresponding algorithms should be embedded in their guidance and control systems [2]. Among the onboard subsystems, the navigation system has top priority because a fast and well-developed autonomous navigation system makes UAV operators' work easier.
Autonomous navigation requires a robust and reliable self-localization ability. One strategy for UAV autonomous navigation is to combine global navigation satellite systems (GNSSs), such as the global positioning system (GPS), with an onboard inertial navigation system (INS) sensor for pose and attitude estimation. Complementing the INS with satellite positioning information is important because the INS accumulates drift error, which could lead to a great divergence between the estimated position and the actual position [3]. Many techniques can be used to address GNSS-INS data fusion, such as sequential Bayesian filtering, including Kalman filters [4,5], extended Kalman filters [6], and particle filters [7]. Metaheuristic optimization approaches have also been applied to design integrated navigation systems based on GNSS-INS [8]. However, GNSS-INS fusion can suffer reliability issues, mainly if there is any obstacle between the UAV and the emitting satellites or in the presence of intentional or unintentional interference. Furthermore, the most common malicious attacks to disrupt the position, navigation, and time derived from GNSSs are spoofing (allowing the attacker to take control and/or making the receiver calculate a false position) and jamming (overpowering GPS satellite signals locally so that the receiver can no longer operate).
In addition, natural phenomena can interfere with the propagation of GNSS signals. The ionosphere is one such atmospheric layer: ionized by solar radiation, it extends from about 48 km up to 965 km in altitude [9]. It is not, however, a homogeneous layer. Disturbances can appear in it that negatively affect the propagation of electromagnetic waves, generating random fluctuations in their amplitude and phase. This kind of disturbance is called scintillation, and it can fully disrupt the emitted GNSS signal [10]. One permanent ionospheric disturbance is the equatorial ionization anomaly [11], sometimes also named the equatorial fountain, which increases the electron density over the regions north and south of the magnetic equator; another factor associated with ionospheric scintillation is ionospheric bubbles [12–14]. The effects of ionospheric bubbles over the Brazilian territory can extend up to 30 degrees of south latitude, and this is a seasonal phenomenon occurring mainly during the summer period in the Southern Hemisphere [15].
Vision-based localization and vision-aided localization have become the predominant solutions to replace or supplement GNSS-INS fusion in recent years [16]. However, the main disadvantage of these approaches is the computational load due to the vast amount of data to analyze and the complexity of interpreting them. Vision-based UAV localization covers relative visual localization (RVL) and absolute visual localization (AVL), also sometimes called frame-to-frame localization and frame-to-reference localization, respectively [16]. AVL is performed by matching or registering the current view of the UAV against a visual memory built from previously acquired data, a process called image-matching. Immunity to drift is achieved by ensuring complete independence between successive pose estimates. The most used image-matching techniques are template matching, feature point matching, deep learning matching, and visual odometry matching.
In this paper, the AVL concept is studied by using images from a UAV equipped with synthetic aperture radar (SAR) sensors. AI-based image processing techniques in this area are well-known and outperform traditional image-matching methods. Numerous studies have focused on convolutional neural networks [17,18], weightless neural network architectures [19], self-configured neural networks [20,21], radial basis function neural networks (RBF-NNs) [22], deep learning [23,24], training of kernels [25], Siamese networks [26], and others. However, to the best of our knowledge, there have been no reports on machine learning (ML) techniques based on statistical learning theory (SLT).
Here, to address the challenge of image edge detection in an image-matching system, UAV pose estimation based on ML was applied, and the support vector machine (SVM) regression model (also known as SVR) was used to predict "edge" and "non-edge" patterns. The regression model of the proposed method was developed in MATLAB from a binary image database. The prediction phase was designed for a field programmable gate array (FPGA) device, with 18-bit fixed-point data output from the training phase. This design was implemented with dynamic partial reconfiguration (DPR). The SVR prediction phase used DPR in its synchronous datapath to modify the hardware granularity and present an adaptive layout, which has not been reported previously. As a result, the granularity of the reconfigurable region was gradually increased, and three architectures (Architecture N#1, Architecture N#3, and Architecture N#9) dynamically reconfigured in a Zedboard ZYNQ-7 device were realized.
2 Related work
UAV pose estimation techniques based on image edge detection have been widely investigated, including AI-based ones [27,28]. However, only a few studies have applied deep learning-based and bio-inspired methods to realize edge detection [8,29]. Yang and co-authors overviewed the representative edge and object contour detection methods reported in the past two decades [30]. Among them, two methods have gained much attention. One was proposed by Al-Amaren et al. [29], who applied a new visual geometry group network-16 layers (VGG-16)-based deep convolutional neural network (DCNN) for edge detection with residual learning and demonstrated that this methodology outperforms all the existing VGG-16-based techniques with low complexity. The other is the multi-scale representation. A bi-directional pyramid network was constructed by combining two pyramid networks (a down-sampling pyramid network and a lightweight up-sampling pyramid network) with a backbone network [31]. This contributed to a higher training speed and equivalent test accuracy. However, intense computing is a prerequisite for the designed neural network architectures, which indicates that high-performance computing systems are required.
One of the UAV pose estimation techniques integrates INS with GNSS, where metaheuristic algorithms have been proposed to achieve pose estimation [8]. The corresponding simulation results are superior to those of the genetic algorithm (GA) and particle swarm optimization (PSO) in terms of navigation performance.
Due to its superiority, the FPGA has gained increasing interest for real-time, reliable, and low-cost embedded systems, because FPGAs offer higher computation efficiency than central processing units (CPUs) and graphics processing units (GPUs), especially for ML [32]. Its reconfigurability makes the FPGA a promising candidate to realize high-speed operation. It can perform well with less execution time at a low cost and low power consumption. Besides, it has various unique design advantages including reliability, long-term maintenance, and flexibility [33]. In particular, DPR, as a modern capability of FPGAs, enables the user to reconfigure part of the device dynamically while the rest stays in normal operation [34].
In a real-time image filtering and edge detection system, which needs a real-time display of the image, Sun et al. used Gaussian filtering and the Sobel edge processing algorithm to process the image edges [35]. In Ref. [36], the same problem was addressed by image filtering and edge detection, where a look-up table (LUT), instead of a multiplier, and a distributed algorithm were applied. However, there is still not enough information about the FPGA implementation parameters.
Vivo et al. developed a mono-dimensional noise-resistant algorithm for edge detection [37]. Such an algorithm guarantees fast computation, making it very attractive for real-time image processing, remote sensing, and UAV surveillance. Kaur et al. [38] proposed an edge detection technique based on the Riesz fractional derivative (RFD) in the fractional Fourier transform (FrFT) domain and demonstrated that the proposed approach is highly efficient. However, both were only validated in simulation, and no experimental implementation was conducted. Although Zhang et al. applied the Sobel edge detection algorithm in an FPGA-based image processing system [39], it cannot process in real time. Moreover, neither the power and area consumption in the FPGA device nor the time needed to output the processed image was investigated. A quaternion-based improved cuckoo algorithm for faster processing was proposed and experimentally demonstrated, and its processing time and quality were evaluated [40]. However, the edges of the output images were not refined.
Based on image convolution from segmented images, Conte and Doherty used Sobel's algorithm for edge extraction [41]. Similarly, with the same UAV flight experimental data, Braga et al. [42] performed image convolution but adopted an optimized neural network to execute the edge extraction. Here, an optimization problem was solved by calculating the hyperparameters with the metaheuristic called the multi-particle collision algorithm [43]. Such an edge detection method is capable of localizing the UAV more precisely (with a smaller error) and faster than Sobel's algorithm [42]. Braga applied convolution between segmented images/data for UAV positioning estimation [27] and found that the multi-layer perceptron neural network (MLP-NN) identified image edges better than both Sobel's and Canny's algorithms. The neural network was implemented on CPU and FPGA. Two pieces of FPGA-embedded hardware, Raspberry PI Model B-1 (FPGA Spartan-6 LX9) and Zybo Zynq 7000 (FPGA Artix-7), were employed. The results verify that the FPGA Artix-7 processes faster, while the FPGA Spartan-6 LX9 consumes less energy. Employing image convolution for UAV positioning with SAR images, Silva et al. [28] also evaluated the edge detection performance of three different algorithms: Canny's algorithm, RBF-NN, and a fuzzy system. Their results show that RBF-NN performs best, with a positioning error of 34.8 m, smaller than the 63.6 m of Canny's algorithm and the 40.9 m of the fuzzy system. In Ref. [44], a system was developed with only a downward-facing monocular RGB camera on the UAV, where the UAV imagery is compared and aligned with pre-existing satellite imagery of the flight location. This results in an average positioning error smaller than 8 m.
In brief, the representative studies overviewed in this paper are summarized in Table 1, where their applications, characteristics, and methods are analyzed. Evidently, no programmable hardware implementation has yet applied an edge detection algorithm to UAV pose estimation while reducing the cost, power consumption, and response time of the system. Also, no flexible system is available.

Table 1. Overview of the representative related studies.
Applications | Characteristics | Methods
Edge detection using residual learning | A residual deep neural network based on the VGG-16 architecture with deep supervision is developed. | DCNN [29]
Edge detection using two pyramid networks | A down-sampling pyramid network and a lightweight up-sampling pyramid network are constructed to enrich the multi-scale representation from the encoder and decoder, respectively. | Multi-stream learning approach [31]
Real-time image filtering and edge detection | The image information is collected by the camera, Gaussian filtering is applied to remove noise, then Sobel processing is performed, and the image edge processing is finally realized. | Gaussian filtering and Sobel edge processing algorithms implemented on FPGA [35]
Real-time image filtering and edge detection | The image filtering and edge detection are investigated and analyzed, where an LUT is applied instead of a multiplier and a distributed algorithm is used in terms of hardware. | Method based on FPGA [36]
Integrated navigation systems | The proposed metaheuristic algorithms are reviewed and compared with the GA and PSO algorithms. | Metaheuristic algorithms [8]
Edge detection | A real-time data-driven fire propagator is used to support wildfire fighting operations and to facilitate the risk assessment and decision-making process. | Mono-dimensional noise-resistant algorithm [37]
Edge detection | The acquisition, storage, and display of image data are completed by an FPGA-based image processing system, and the Sobel edge detection algorithm is processed and implemented. | Sobel edge detection algorithm implemented on FPGA [39]
Edge detection | The RFD mask used for edge detection is obtained by using various interpolation methods. The mask size is selected based on the figure of merit and edge preservation index. The edges obtained with the proposed approach in the FrFT domain are further used for image enhancement. | RFD in the FrFT domain [38]
Processing of colored UAV images | A novel guiding equation is used to optimize the positions of the improved cuckoo algorithm before the Levi flight, and after the Levi flight, a novel disturbance equation is applied to obtain a varied location for the next step. | Novel quaternion-based improved cuckoo algorithm [40]
Edge extraction | This is the original strategy of applying image convolution from segmented images. | Sobel's algorithm [41]
Image convolution for UAV positioning estimation | In terms of image edge identification, Sobel's and Canny's algorithms are compared with MLP-NN. | Sobel's, Canny's, and MLP-NN algorithms, where the neural network is implemented on both CPU and FPGA [27]
3 Methodology
3.1 Image processing
The UAV coordinates can be identified from the maximum convolution between the segmented images from the UAV (in the thermal infrared band) and the georeferenced Google Earth images. To estimate the UAV position, the procedure shown in Fig. 1 is conducted in this paper; a short code sketch of the pipeline follows the procedure list below.

Figure 1. Flowchart of the procedure to estimate the UAV position with the proposed SVR technique.
1) The RGB-color images are mapped into gray-scale images, according to (1) [45].
$ \mathrm{grayscale}=0.2989R+0.5870G+0.1140B $ (1)
where $ R $, $ G $, and $ B $ are the pixel intensities from the red, green, and blue channels, respectively, and grayscale is the intensity of the obtained gray-scale image after mapping.
2) The obtained gray-scale image is then transformed into a binary image, i.e., each pixel takes one of two values, 0 (black) or 1 (white). To decide whether each grayscale value falls below the threshold or not, the image histogram is first calculated, and then a threshold value is selected to minimize the variance within the grayscale-level classes of the image [46].
3) Based on the binary images, an edge detection scheme, the SVR approach, is applied to produce segmented images, which will be described in subsection 3.2 in detail.
4) Finally, image convolution between two segmented images is computed by
$ c\left(s,\;t\right)=\sum _{x}\sum _{y}f(x,\;y)\,w(x-s,\;y-t) $ (2)
where $ f(x,\;y) $ is the georeferenced image (from Google Earth) and $ w(x-s,\;y-t) $ is the UAV image; $x$ and $y$ are the line and column of the pixel position in the two-dimensional (2D) matrix, respectively; $s$ and $t$ are the line and column displacements, respectively; $ c\left(s,\;t\right) $ is the convolution output related to the correlation of the two input 2D images, whose maximum value identifies the UAV position.
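To make the pipeline concrete, the sketch below implements steps 1), 2), and 4) in Python/NumPy under stated assumptions: step 3 (the SVR edge detector of subsection 3.2) is omitted, the threshold search is our Otsu-style reading of the histogram-based binarization in Ref. [46], and the correlation of (2) is realized with scipy.signal.correlate2d. The paper's own implementation is in MATLAB; all function names here are illustrative.

```python
import numpy as np
from scipy.signal import correlate2d

def to_grayscale(rgb):
    # Step 1), Eq. (1): weighted sum of the R, G, and B channels
    return 0.2989 * rgb[..., 0] + 0.5870 * rgb[..., 1] + 0.1140 * rgb[..., 2]

def binarize(gray):
    # Step 2): pick the threshold maximizing the between-class variance
    # of the histogram (equivalently, minimizing the variance within
    # the two grayscale-level classes)
    hist, _ = np.histogram(gray, bins=256, range=(0.0, 256.0))
    total = gray.size
    mu_total = np.dot(np.arange(256), hist) / total
    best_t, best_var, w_b, mu_b = 0, 0.0, 0.0, 0.0
    for t in range(256):
        w_b += hist[t] / total        # cumulative background weight
        mu_b += t * hist[t] / total   # cumulative background moment
        if w_b == 0.0 or w_b == 1.0:
            continue
        var_between = (mu_total * w_b - mu_b) ** 2 / (w_b * (1.0 - w_b))
        if var_between > best_var:
            best_var, best_t = var_between, t
    return (gray > best_t).astype(np.uint8)  # 0 = black, 1 = white

def estimate_position(georef_seg, uav_seg):
    # Step 4), Eq. (2): the argmax of the correlation surface c(s, t)
    # between the two segmented images identifies the UAV position
    c = correlate2d(georef_seg.astype(float), uav_seg.astype(float), mode="valid")
    return np.unravel_index(np.argmax(c), c.shape)
```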
As a validation, an image of 810×2000 pixels was adopted, and a window of 80×140 pixels was extracted. In the segmentation process, the individual pixels within this window were examined to detect edges. With the image processing method proposed in Ref. [28], a new image was generated in which each pixel was labeled as either 0 or 1. The training dataset was composed of an image separated into 208 frames of 3×3 pixels each. Among them, 24 different examples were selected to represent "edge" patterns and 2 examples to represent "non-edge" patterns, as shown in Fig. 2.

Figure 2. Edge and non-edge image patterns for the training phase input.
The classification phase dataset comprises 88 frames of 3×3 pixels each and is independently and identically distributed, i.e., it is entirely different from the training dataset. The classification phase on hardware was then designed to process frames of 3×3 pixels at a time because images can easily be fitted into this frame size. Moreover, the hardware implementation speed is proportional to how often each frame is processed.
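How the 3×3 frames are tiled out of the segmented window is not fully specified above; a non-overlapping tiling is one plausible reading, sketched below with illustrative names.

```python
import numpy as np

def extract_frames(binary_img, size=3):
    # Tile a binary image into non-overlapping size x size frames; each
    # frame is later flattened into a row of 9 pixels (see (4) below)
    h, w = binary_img.shape
    return [binary_img[r:r + size, c:c + size]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]
```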
3.2 SVR modeling
In ML, Vapnik's group developed SVM at the end of the 20th century [47,48]. Based on SLT [49] and the Vapnik-Chervonenkis theory [50], SVM is known as one of the most robust generalization algorithms for assigning new examples to one category or another. It is a non-probabilistic binary linear classifier, which builds a maximum-margin hyperplane to separate the two classes in classification. It can solve non-linear problems through kernel functions (RBF, polynomial function, and hyperbolic tangent function) and classify even more than two classes with multiclass techniques.
As one of the SVM versions for the regression problem [51], SVR has also been proven to be an effective tool in real-value function estimation. As a supervised-learning approach, SVR trains using a symmetrical loss function that penalizes high and low misestimates equally. Using Vapnik's ε-insensitive approach, a flexible tube of minimal radius is formed symmetrically around the estimated function, such that absolute errors less than a certain threshold ε are ignored both above and below the estimate. In this manner, points outside the tube are penalized, but those within the tube, either above or below the function, receive no penalty. One of the main advantages of SVR is that its computational complexity is independent of the dimensionality of the input space. Additionally, it has an excellent generalization capability with high prediction accuracy [52].
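As a one-line illustration of the ε-insensitive idea (the function name and the value of ε here are ours, purely for illustration):

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    # Zero inside the tube of radius eps, linear penalty outside it
    return np.maximum(0.0, np.abs(y_true - y_pred) - eps)

# Errors of 0.05 and 0.30 with eps = 0.1 yield losses of 0.0 and 0.2
```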
In the SVR training, the objective function $ \mathop {{\mathrm{min}} }\limits_{\boldsymbol{\omega}} \;({1}/{2}){\left\| {\boldsymbol{\omega}}\right\|}^{2} $ should be found subject to (3), where $\left\|\boldsymbol{\omega}\right\|$ is the magnitude of the normal vector to the surface being approximated by this minimization and $\boldsymbol{\omega}$ is the weight vector.
$ \left|{y}_{i}-\left\langle{\boldsymbol{\omega}},\;{x}_{i}\right\rangle-b\right| < \epsilon $ (3)
where $ {x}_{i} $ is a training sample with the labeled value $y_i$. The inner product plus intercept, $ \left\langle{\boldsymbol{\omega}},\;{x}_{i}\right\rangle+b $, is the prediction $ f({x}_{i}) $ (technically known as the multivariate regression) for that sample, and $ b $ is a bias.
The SVR algorithm used in this paper consists of two phases: the training phase, which fits an SVR model, and the prediction or classification phase, which returns a vector of predicted class labels for the predictor data.
The SVR accuracy was analyzed for this image edge detection system, and the kernel function with the best response was identified. The training and classification phases were processed in MATLAB to test the linear, RBF (also known as Gaussian), and polynomial kernel functions. The Gaussian function was found to be the best kernel function, yielding as training output 71 support vectors (SVs), 71 α values (α is the Lagrange multiplier that comes from the training phase as part of the SVR mathematical modeling), 9 σ values (σ is the parameter that adjusts the Gaussian curve drawing the separation between the two classes, and it also comes from the training phase), and one bias value.
The input data X with 208 frames of 3×3 pixels each is organized in a matrix, as shown in (4).
$ {{\mathbf{X}}_{208 \times 9}} = \left[ \begin{matrix} F_{1}^{1} & F_{2}^{1} & \cdots & F_{9}^{1} \\ F_{1}^{2} & F_{2}^{2} & \cdots & F_{9}^{2} \\ \vdots & \vdots & \ddots & \vdots \\ F_{1}^{208} & F_{2}^{208} & \cdots & F_{9}^{208} \end{matrix} \right] $ (4)
where each row corresponds to one frame F and each column to one pixel of that frame; F is the representation of each frame, and it has 9 pixels.
Moreover, as supervised learning, the ML method is labeled by a matrix of desired outputs, as shown in (5).
$ {{\mathbf{Y}}_{208 \times 1}} = \left[ \begin{matrix} Y_{{\mathrm{d}}1} \\ Y_{{\mathrm{d}}2} \\ \vdots \\ Y_{{\mathrm{d}}208} \end{matrix} \right] $ (5)
where each element of the matrix, Ydi, i = 1, 2, ···, 208, is equal to '1' or '0'. If it is '1', then the corresponding frame of the matrix X is an "edge" pattern; if it is '0', then the corresponding frame of the matrix X is a "non-edge" pattern. The rows of the two matrices are correlated. For example, the input data $ {F}_{P}^{47} $ is related to the desired output Yd47, where the 47th frame is analyzed, and P indexes the respective vector of nine pixels representing this 47th frame.
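A minimal sketch of assembling (4) and (5) from the 3×3 frames (variable names are illustrative):

```python
import numpy as np

def build_training_matrices(frames, labels):
    # X: one row per frame, one column per pixel, shape (208, 9) as in (4)
    X = np.array([f.reshape(9) for f in frames])
    # Y: desired outputs, 1 for "edge" and 0 for "non-edge", as in (5)
    Y = np.array(labels, dtype=int).reshape(-1, 1)
    assert X.shape[0] == Y.shape[0]  # rows of X and Y stay correlated
    return X, Y
```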
The pseudocode of the SVR prediction phase is shown step by step in Table 2 (a runnable sketch follows the table). The "for" loops of lines 1–3 navigate the matrices SV and Test position by position. The square difference is calculated in lines 5–11, the exponential function is calculated in line 14, and the result is multiplied by α (Alpha in Table 2) in line 15. After the "for" loops over the SV matrix finish, the adder tree in line 18 sums up all the results of the exponential computation, and the bias value is added to the adder tree result in line 19. Line 20 ends the "for" loop over the Test matrix because each SV matrix is wholly processed for each Test datapoint. Finally, the result is compared with '0' in lines 21–26: if it is not smaller than '0', then the respective Test datapoint belongs to the class processed by the respective SV matrix; otherwise, it does not.

Table 2. Description of Algorithm 1.
Algorithm 1: SVR prediction phase
Require: SV; Alpha; Bias; Sigma; Test
1: for cont = 1:size(Test, 1) do
2:   for j = 1:size(SV, 1) do
3:     for i = 1:size(SV, 2) do
4:       if (i ≥ 1) && (i < size(SV, 2)) do
5:         aux = (SV(j, i) – Test(cont, i))²
6:         aux1 = (SV(j, i+1) – Test(cont, i+1))²
7:         SqDiff(j, i) = sqrt(aux + aux1)
8:       else
9:         aux = (SV(j, i) – Test(cont, i))²
10:        aux1 = (SV(j, 1) – Test(cont, 1))²
11:        SqDiff(j, i) = sqrt(aux + aux1)
12:      end if
13:      EXPin(j, i) = –SqDiff(j, i) / Sigma(i)
14:      EXPout(j, i) = exp(EXPin(j, i))
15:      AlphaMult(j, i) = Alpha(j) * EXPout(j, i)
16:    end for
17:  end for
18:  adderTree(cont, 1) = sum(AlphaMult)
19:  BiasSum(cont, 1) = adderTree(cont, 1) + Bias
20: end for
21: for i = 1:size(BiasSum, 1) do
22:   if BiasSum(i, 1) ≥ 0 then
23:     Class(i, 1) = 1
24:   else
25:     Class(i, 1) = 0
26:   end if
27: end for
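For readers who prefer runnable code, below is a direct Python translation of Algorithm 1. It keeps the wrap-around pairing of dimensions from lines 4–12 and interprets the MATLAB-style sum of line 18 as a sum over all entries of AlphaMult; it is a sketch of our reading, not the deployed MATLAB/VHDL code.

```python
import math

def svr_predict(SV, alpha, bias, sigma, test):
    # SV: 71 support vectors of 9 values each; alpha: 71 Lagrange
    # multipliers; bias: scalar b; sigma: 9 per-dimension kernel
    # parameters; test: frames of 3x3 pixels flattened to 9 values
    n_dim = len(SV[0])
    classes = []
    for frame in test:                              # line 1
        acc = 0.0
        for j, sv in enumerate(SV):                 # line 2
            for i in range(n_dim):                  # line 3
                # Lines 4-12: pair dimension i with its neighbor,
                # wrapping the last dimension around to the first
                k = i + 1 if i < n_dim - 1 else 0
                sq_diff = math.sqrt((sv[i] - frame[i]) ** 2 +
                                    (sv[k] - frame[k]) ** 2)
                # Lines 13-15: kernel evaluation scaled by alpha
                acc += alpha[j] * math.exp(-sq_diff / sigma[i])
        bias_sum = acc + bias                       # lines 18-19
        classes.append(1 if bias_sum >= 0 else 0)   # lines 21-26
    return classes
```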
4 FPGA implementation: DPR based on the SVR prediction phase
The mathematical formulation of the proposed SVR datapath design is based on Ref. [53], where the SVM datapath is reformulated for the regression problem; on this basis, the UAV pose estimation system is realized.
To standardize the nomenclature, we refer to the SVR prediction phase as the SVM classification phase because MATLAB itself uses this name equivalence. Furthermore, in hardware, the computation done for the classification phase is similar to that for the prediction phase, with only minor modifications due to the kernel being applied per dimension on the prediction datapath; the components are the same. This can be explained and confirmed by the calculation details of the SVR prediction phase shown in Fig. 3.

Figure 3. Block diagram of the SVR prediction phase (standardized to the SVM classification phase).
According to Fig. 3, all 71 machines are calculated, accumulated, and added to the bias so that the decision-maker SGN can settle to which class the related vector Test (one frame of 3×3 pixels) belongs. SGN denotes the sign function, whose output is 1 or 0 depending on the input condition; based on the most significant bit of the result, the SGN block assigns the class to an "edge" or a "non-edge" pattern. In other words, if the result is 1, then the datapoint representing the input data Test is inside the "prediction maker tube" of the SVR datapath; otherwise, it is outside the tube.
Figs. 4 and 5 describe the internal processes of each machine and each neuron, respectively. Notably, the neurons used in this project differ from those typically found in neural networks because SVM is a non-probabilistic algorithm, as explained in subsection 3.2. However, for simplicity and ease of understanding, these components are referred to as neurons. Each neuron processes one dimension/feature of the input data, because SVM solves non-linear mapping by transforming the input space into a high-dimensional feature space. In other words, each feature represents one dimension, and each datapoint is analyzed inside a coordinate space whose axes represent the data features.

Figure 4. Details of all machines of the SVM classifier.

Figure 5. Designed neuron for the proposed project.
In this case, SVM builds 9 dimensions because each frame is represented by 9 pixels, so there are 9 neurons. Besides, 71 machines are adopted because the function fitrsvm, used in MATLAB for training, finds this to be the optimum number of machines for Bayesian optimization to reduce the generalization error.
The Gaussian kernel function is calculated through the EXP block, which is based on LUTs and parametrized by σ values, according to (6).
$ K\left({\mathbf{X}}_{i},\;{\mathbf{X}}_{j}\right)=\mathrm{exp}\left(-\frac{{\left\|{\mathbf{X}}_{i}-{\mathbf{X}}_{j}\right\|}^{2}}{2{\sigma }^{2}}\right) $ (6)
where X is the input data of the kernel function K, represented in the Cartesian coordinate system by i and j. SV and Test are input through two first-in first-outs (FIFOs) because there are nine values of SV and nine values of Test. The α values, however, are input through one register because they are loaded just once for each machine, as shown in Fig. 6. The signal Load_Alpha depends on the signal Count_Mach, which counts every time a machine is completed; a new α value is then loaded into the circuit (see Figs. 6 and 7). The σ values are not loaded into the architecture because they are intrinsic to the EXP block. The complete architecture of the SVM classifier is thus composed of a finite state machine (FSM) and a pipelined datapath, as shown in Fig. 7. Ten control signals from the FSM control the datapath, each with its respective function, as shown in Table 3. The FSM has seven states, as shown in Fig. 8. The corresponding stages and output variables are shown in Fig. 9, where each stage is responsible for controlling part of the SVM datapath: S0 initializes the entire circuit; S1 loads the FIFOs; S2 calculates the square difference of the neuron (see Fig. 5); S3 calculates the adder (+) and EXP blocks of the neuron (see Fig. 5); S4 calculates the multiplication (×) by the alpha block of the neuron (see Fig. 5); S5 calculates the accumulator block of the datapath (see Fig. 3); S6 calculates the adder (+) and SGN blocks, i.e., the decision-maker block, of the datapath (see Fig. 3).
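Returning to the EXP block: a behavioral sketch of a LUT-based exponential is given below. It follows the datapath form EXPin = –SqDiff/Sigma(i) from Algorithm 1; the number of table entries and the bounded input range are illustrative assumptions, not values taken from the design.

```python
import math

def build_exp_lut(sigma, n_entries=1024, max_in=8.0):
    # Precompute exp(-d / sigma) for quantized distances d in [0, max_in);
    # one table per dimension bakes its sigma value into the LUT contents
    step = max_in / n_entries
    return [math.exp(-(k * step) / sigma) for k in range(n_entries)]

def exp_lut(lut, d, max_in=8.0):
    # Nearest-entry lookup, saturating at the last table entry
    idx = min(int(d / max_in * len(lut)), len(lut) - 1)
    return lut[idx]
```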

Figure 6. Designed circuit used to load the α values.

Figure 7. Proposed SVM architecture (FSM with the pipelined datapath).

Table 3. Description of control signals and the corresponding functions.
Signal | Function
Load_SV | It loads FIFOs with the SV values.
Load_Test | It loads FIFOs with the Test values.
Clear_FIFOs | It is the command to clear all FIFOs.
Load_Square | It loads the D-flip-flop registers of S2: Square difference.
Load_AdderEXP | It loads the D-flip-flop registers of S3: Adder + EXP_function.
Load_AlphaMult | It loads the D-flip-flop registers of S4: Alpha_Mult.
Load_Accum | It loads the D-flip-flop registers of S5: Accumulator.
Load_AdderSGN | It loads the D-flip-flop registers of S6: Adder_Bias + SGN.
Clear_Accum | It clears the accumulator.
Reset_ALL_Regs | It resets all datapath registers.

Figure 8. FSM specification responsible for controlling the SVM classifier datapath.

Figure 9. Stages and output variables of the FSM controller.
The signal Full_FIFOs checks whether both FIFOs are full. The signal Count_Accum checks whether the accumulator counter block is done, i.e., whether it has been processed 71 times, because it depends on the process of each machine, as shown in Fig. 10. The accumulator counter block depends on the machine counter block of Fig. 11, which verifies whether the neuron has processed nine times in Architecture N#1 or three times in Architecture N#3. To summarize, after nine or three neuron processes in each machine (depending on the architecture), Count_Mach rises, and the accumulator starts to count. The accumulator finishes when Count_Mach has risen 71 times, i.e., a total of 71 machines have been processed. In Architecture N#9, the Count_Mach signal does not exist because the nine neurons inside each machine process simultaneously, so the accumulator counter block accumulates after each machine is processed.
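The counter chain can be summarized with a small behavioral model (a software sketch of Figs. 10 and 11, not the VHDL itself):

```python
def count_accum_done(neuron_done_events, neurons_per_machine=9, n_machines=71):
    # Count_Mach rises once every `neurons_per_machine` neuron completions
    # (9 in Architecture N#1, 3 in Architecture N#3); Count_Accum rises
    # after 71 machines, signaling that the accumulator has finished
    neuron_count, machine_count = 0, 0
    for _ in range(neuron_done_events):
        neuron_count += 1
        if neuron_count == neurons_per_machine:   # Count_Mach rises
            neuron_count = 0
            machine_count += 1
            if machine_count == n_machines:       # Count_Accum rises
                return True
    return False

# Architecture N#1 finishes after 9 * 71 neuron completions:
# count_accum_done(9 * 71) -> True
```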

Figure 10. Diagram of the accumulator counter block.

Figure 11. Diagram of the machine counter block in Architecture N#1.
Three architectures are used because the DPR feature of the FPGA is activated, which allows the granularity of the application to be analyzed. These architectures are identical in the static area, while they differ in the reconfigurable region, as shown in Fig. 12. The reconfigurable regions of Architecture N#1, Architecture N#3, and Architecture N#9 have different FIFO circuits and additionally require more repetitions of the neuron circuit.

Figure 12. Illustration of the reconfigurable region and the static area of the three architectures.
In terms of granularity, Architecture N#9 has the biggest grain, with nine neurons and two register bank circuits in the reconfigurable region. Architecture N#3 has three neurons and two shifter circuits in the reconfigurable region. Architecture N#1, with just one neuron and two FIFOs, has the smallest grain.
1) Architecture N#1: In this architecture, FIFOs for the SV and Test values follow the behaviors shown in Fig. 13 (a) and (b), respectively. The Count_Mach signal goes up when each neuron processes nine times, as shown in Fig. 11.

Figure 13. Designed FIFOs used to load (a) SV and (b) Test values into the neurons.
2) Architecture N#3: In this architecture, FIFOs for the SV and Test values follow the behavior shown in Fig. 14. The Count_Mach signal goes up when the three neurons process three times, as shown in Fig. 15.

Figure 14. Designed shifter used to load the SV and Test values into the neurons in Architecture N#3.

Figure 15. Diagram of the machine counter block in Architecture N#3.
3) Architecture N#9: In this architecture, the FIFOs for the SV and Test values follow the behavior shown in Fig. 16, and the machine counter is not required.

Figure 16. Design of the circuit that loads the SV and Test values into the neurons in Architecture N#9.
The features of the hardware implementation in the classification phase are briefly summarized in Table 4.

Table 4. Summarized features of the hardware implementation.
Item | Feature
Input data | 88 frames of 3×3 pixels
Classification type | Frame by frame
Kernel function | Gaussian, using the exponential function
Multi-class technique | One-vs-all
Word size and type | 18-bit fixed-point
Architectures | FSM + pipelined datapath
Result | Binary
Description language | VHDL
Simulation and synthesis | Vivado 2019.1
FPGA device | Xilinx ZYNQ-7 ZC702
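Since the datapath operates on an 18-bit fixed-point word (Table 4), the conversion can be pictured with the sketch below; the split between integer and fractional bits is an assumption for illustration, as it is not specified here.

```python
def to_fixed18(x, frac_bits=12):
    # Quantize a real value to an 18-bit two's-complement word with
    # `frac_bits` fractional bits (the 12-bit fraction is illustrative)
    q = int(round(x * (1 << frac_bits)))
    lo, hi = -(1 << 17), (1 << 17) - 1   # representable 18-bit range
    return max(lo, min(hi, q))

def from_fixed18(q, frac_bits=12):
    # Recover the approximate real value from the quantized word
    return q / (1 << frac_bits)
```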
5 Results and discussion
The performance of UAV positioning for a planned trajectory in Brazil, shown in Fig. 17, was experimentally evaluated. A larger georeferenced satellite image covering the studied region was selected to conduct the experiment. Both the georeferenced and UAV images were pre-processed into gray-scale and then binary images, after which edge extraction was performed. Finally, convolution between the two segmented images was used to estimate the UAV position.

Figure 17. Planned UAV trajectory marked with the red line.
5.1 Software implementation
The simulated results with different kernel functions are compared in Fig. 18. The Gaussian function performs best, with a minimum square error of 0.0146; the corresponding values obtained with the linear and polynomial functions are 0.2801 and 0.0586, respectively. Since a dataset of 88 frames is applied in the simulation, it is feasible to analyze whether the Gaussian kernel function (blue trace) behaves the same as the observed data function (black trace). The blue trace follows the behavior of the black trace, which means that the Gaussian function shows the desired behavior with 100% accuracy. Besides, the response time of the SVM classifier is 0.745 ms on an Intel Core i7-6500 CPU @ 2.50 GHz (64 bits).
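The kernel comparison performed with MATLAB's fitrsvm can be approximated in open-source tooling, e.g., with scikit-learn's SVR as sketched below; the resulting errors will not exactly reproduce the MATLAB figures.

```python
import numpy as np
from sklearn.svm import SVR

def compare_kernels(X, y):
    # X: the (208, 9) frame matrix; y: 0/1 labels, as in Section 3
    for name, model in [("linear", SVR(kernel="linear")),
                        ("rbf", SVR(kernel="rbf")),
                        ("poly", SVR(kernel="poly", degree=3))]:
        model.fit(X, y)
        mse = np.mean((model.predict(X) - y) ** 2)
        print(f"{name}: training MSE = {mse:.4f}")
```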

Figure 18. SVM classification results with different kernel functions.
The SVM classifier was also compiled with MATLAB and processed on a CPU (13th Gen Intel Core i9-13900KF, 24 cores), a GPU (NVIDIA GeForce RTX 4090), and both together. The obtained results are shown in Table 5. They demonstrate that the SVM classifier is processed fastest when the CPU and GPU are applied simultaneously because the CPU uses the GPU as a parallel processor.

Table 5. Comparison of the results from the SVM classifier processed in CPU, GPU, and both.
Parameter | CPU | GPU | CPU + GPU
Frequency | 5020 MHz | 420 MHz | 5372 MHz
Processing time | 86.06 ms | 83.1 s | 64.7 μs
5.2 Hardware implementation
The SVM architecture was described in the VHSIC hardware description language (VHDL) and targeted to the Xilinx ZYNQ-7 ZC702. The same dataset of 88 frames applied for the software implementation in subsection 5.1 was used as test data for the hardware implementation. The test results achieved 84.6% accuracy in classifying "edge" and "non-edge" patterns.
DPR was implemented in the blocks of neuron and FIFO. Thus, there are two partially reconfigured (PR) regions and three reconfigurable modules (RMs) (one neuron and two FIFOs) in Architecture N#1. The accumulator and decision-maker were implemented into a static region. In Architecture N#3, there are two PR regions and five RMs (three neurons and two shifter circuits). In Architecture N#9, there are two PR regions and eleven RMs (nine neurons and two register bank circuits).
Table 6 compares the results of Architecture N#1, Architecture N#3, and Architecture N#9 implemented without and with DPR. The results cover the clock period, latency, area, and power consumption; with DPR, the area is divided into static and reconfigurable regions, whereas without DPR there is only a static area. The circuit area and power consumption can be significantly decreased with DPR. In terms of the static area, Architecture N#1 with DPR is 7.6 times smaller than without DPR; similarly, Architecture N#3 and Architecture N#9 are 17.68 times and 41.83 times smaller, respectively. Regarding power consumption, even though Architecture N#1 with DPR exhibits a 40% increase, Architecture N#3 and Architecture N#9 are reduced to 47.36% and 57.14% of their original values, respectively. However, in Architecture N#3 and Architecture N#9, a drawback emerges in the time response related to latency, because the clock period of the implementation with DPR is two times or more that without DPR. Architecture N#1 with DPR is almost 2000 times faster than its counterpart without DPR, whereas Architecture N#3 and Architecture N#9 with DPR are 2.4 times and 1.97 times slower, respectively.

Table 6. Comparison of the results from Architecture N#1, Architecture N#3, and Architecture N#9 without and with DPR.
Architecture | Feature | Without DPR | With DPR
N#1 | Clock period | 100 ns | 50 ns
N#1 | Latency | 0.19 s | 96 μs
N#1 | LUTs, static area | 969 | 86
N#1 | Flip-flops, static area | 535 | 111
N#1 | LUTs, reconfigurable region | / | 924
N#1 | Flip-flops, reconfigurable region | / | 169
N#1 | Power consumption | 5 mW | 7 mW
N#3 | Clock period | 50 ns | 120 ns
N#3 | Latency | 32.10 μs | 77.04 μs
N#3 | LUTs, static area | 2865 | 88
N#3 | Flip-flops, static area | 637 | 110
N#3 | LUTs, reconfigurable region | / | 927
N#3 | Flip-flops, reconfigurable region | / | 273
N#3 | Power consumption | 19 mW | 9 mW
N#9 | Clock period | 50 ns | 100 ns
N#9 | Latency | 10.95 μs | 21.60 μs
N#9 | LUTs, static area | 8328 | 138
N#9 | Flip-flops, static area | 792 | 80
N#9 | LUTs, reconfigurable region | / | 955
N#9 | Flip-flops, reconfigurable region | / | 275
N#9 | Power consumption | 7 mW | 4 mW
Table 6 also reveals the influence of the grain size of the reconfigurable region with DPR. The findings show an unexpected pattern in the power consumption of Architecture N#3 compared with Architecture N#1: It is higher, due to the clock of Architecture N#3 operating approximately 2.5 times slower, and this discrepancy in the clock speed results in the higher power consumption.
DPR uses the device's internal configuration access port (ICAP). Table 7 shows the total run time and partial bitstream sizes of each architecture. Here, RMs are partially reconfigured 9×71 times, 3×71 times, and 1×71 times in Architecture N#1, Architecture N#3, and Architecture N#9, respectively. Both the total DPR run time and the partial bitstream size are the smallest in Architecture N#9 because its implementation requires fewer reconfiguration repetitions than the others.

Table 7. DPR information of Architecture N#1, Architecture N#3, and Architecture N#9.
Architecture | Total DPR run time | Partial bitstream size
N#1 | 74.10 μs | 875 KB
N#3 | 26.00 μs | 913 KB
N#9 | 7.38 μs | 831 KB
Fig. 19 (a) shows the neuron's RM in Architecture N#1, which has the minimum grain size and is used as a reference. Fig. 19 (b) exhibits the cells used per instance of one neuron in Architecture N#1. Correspondingly, the neuron processes of Architecture N#3 and Architecture N#9 are three times and nine times more extensive, respectively, than that of Architecture N#1.

Figure 19. Neuron's grain size: (a) reference block and (b) report on the neuron's cell usage from Vivado Design Suite.
The block diagrams of the project setup are shown in Fig. 20. Two options are provided, and in each option the user can choose between two architectures using the signal Sel_NeuronMode.

Figure 20. Block diagrams of the proposed setup: (a) Option A where the user can choose between Architecture N#1 and Architecture N#3 and (b) Option B where the user can choose between Architecture N#1 and Architecture N#9.
Option A: If the user sets Sel_NeuronMode = 0, then Architecture N#1 is active, and the goal is to decrease the power consumption; if the user sets Sel_NeuronMode = 1, then Architecture N#3 is active, and the goal is to decrease the response time (Fig. 20 (a)). The areas of Architecture N#1 and Architecture N#3 are almost the same; a slight difference exists between the FIFOs in Architecture N#1 and the shifters in Architecture N#3, where the shifters occupy a larger area than the FIFOs. The power consumption is also higher in Architecture N#3 because its clock frequency is lower than that of Architecture N#1.
Option B: If the user sets Sel_NeuronMode = 0, then Architecture N#1 is active, and the goal is to realize the smallest area occupation; if the user sets Sel_NeuronMode = 1, then Architecture N#9 is active, and the goal is to reduce the power consumption and response time simultaneously (Fig. 20 (b)). Architecture N#9 occupies a larger area than Architecture N#1 because it runs nine neurons at a time, whereas Architecture N#1 runs only one neuron at a time. In addition, the power consumption of Architecture N#1 is larger because it performs nine times more iterations for each machine, while Architecture N#9 performs just one iteration per machine. The clock period of Architecture N#1 is half that of Architecture N#9.
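The two options reduce to a simple selection table, modeled below purely for illustration:

```python
# (option, Sel_NeuronMode) -> (active architecture, design goal), per Fig. 20
ARCHITECTURE_MODES = {
    ("A", 0): ("N#1", "lower power consumption"),
    ("A", 1): ("N#3", "lower response time"),
    ("B", 0): ("N#1", "smallest area occupation"),
    ("B", 1): ("N#9", "lower power and response time"),
}

def select_architecture(option, sel_neuron_mode):
    return ARCHITECTURE_MODES[(option, sel_neuron_mode)]
```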
Therefore, it can be reasonably concluded that the best scenario is Option B because it shows the best solutions for the occupied area (Architecture N#1), latency (Architecture N#9), and power consumption (Architecture N#9), simultaneously.
As mentioned above, the performance of the SVR model applied to an image edge detection system has been investigated in an FPGA device, where the granularity has been analyzed through the device's newest feature, DPR. As far as we know, no reports have addressed this topic except a similar one published recently [54]. In Ref. [54], a real-time edge detection system is implemented in an FPGA (Altera's Cyclone IV E: EP4CE10F17C8) with an improved Canny algorithm, using a gray-scale image of 512×512 pixels from a standard literature database. This is equivalent to around 29127 frames of 3×3 pixels in the system proposed in this paper, so an approximate comparison with our architecture can be made. The comparison focuses on the response time because it is the only information provided in Ref. [54] (see Table 8). The fastest response time of 0.29 s is achieved in the scenario of Architecture N#9 without DPR. Although this value is longer than the 1.231 ms reported in Ref. [54], it is not reasonable to conclude that our proposed method is inferior, because no other criteria, such as power consumption, can be compared; moreover, the applied FPGA devices and clock frequencies are different.

Table 8. Performance comparison of our proposed architecture with the one reported in Ref. [54].
Features | Proposal in Ref. [54] | N#1 with DPR | N#9 without DPR
Clock frequency | 50 MHz | 20 MHz | 20 MHz
Latency | 1.231 ms | 2.79 s | 0.29 s
Image size | 512×512 | 29127 frames of 3×3 | 29127 frames of 3×3
FPGA device | Altera's Cyclone IV E: EP4CE10F17C8 | Xilinx ZYNQ-7 ZC702 | Xilinx ZYNQ-7 ZC702
Edge detection technique | Improved Canny algorithm | ML | ML
6 Conclusion
UAV positioning estimation should be associated with a strategy alternative to the GNSS signal, particularly for critical missions. This estimation is even more relevant over regions under strong effects of ionospheric scintillation. Developing embedded systems for UAV positioning estimation is valuable, and image edge detection is one of its main sub-topics. Here, the SVR algorithm, more precisely its classifier datapath, was used to address this problem and thus obtain "edge" and "non-edge" patterns, implemented in an FPGA with DPR. An FSM controlled the proposed datapath of the SVR classifier, described in VHDL and targeted to the Xilinx Zedboard ZYNQ-7000 using an 18-bit fixed-point word. This SVR algorithm not only achieved accuracy as good as other algorithms that have been applied to image edge detection, but was also simpler to implement on hardware than neural networks, because SVR tends to converge to the optimal response faster with a smaller computational load. During the DPR design process, efficient ways to decrease the power consumption, occupied area, and latency were explored depending on the layout implementation (through Architecture N#1, Architecture N#3, and Architecture N#9), and two options that can effectively balance energy, area, and execution time were offered.
Although the software implementation exhibited a slower processing speed than the hardware implementation in this system, it achieved a higher success rate. This is primarily due to the hardware limitations, including processing power, memory capacity, and computational efficiency (i.e., the hardware's performance dilemma). The software was specifically designed to optimize functionality and achieved a superior success rate within these constraints. Although the execution time is another drawback of the hardware implementation, it can be alleviated by using DPR. The application of DPR successfully reduced both the occupied area and the dynamic power. In detail, as the grain size and circuit complexity increased, the clock frequency and occupied area also increased, while the power consumption and latency decreased. It was also found that as the complexity of the datapath in the RMs increased, the clock period increased.
Much work remains for future study. For example, the classification accuracy might be enhanced by applying an optimizer in the training phase (such as a globally asynchronous locally synchronous scheme in the architecture), although this also leads to reduced latency and higher power consumption. This SVR algorithm can potentially be applied to other data fusion tasks in UAV scenarios, such as i) positioning with the fusion of coordinate estimation by computer vision and INS via SVR and ii) implementing the convolution routine in the FPGA and exploring its processing-system side (the ARM Cortex). Our proposed methodology can also be helpful in other applications. For example, in service-oriented networks (SONs), services are treated as independent entities that can be accessed and utilized by users or other services, and the network infrastructure is built to enable efficient and secure communications between different services, allowing them to interact and exchange data. SONs often utilize web services, application programming interfaces, or service-oriented architectures to facilitate the integration and interoperability of various services. Our methodology could be beneficial for adapting service nodes that are damaged/attacked in real time or for processing similar tasks in the same FPGA area, making it possible to reduce resources while maintaining comparative power efficiency. In Ref. [55], an FPGA-based technology was proposed to implement the embedded architecture of a biologically inspired SON; our reconfigurable FPGA method could transform this proposal into a more robust and self-organized system. Nevertheless, it remains challenging to realize an embedded, low-cost, reliable system for upcoming ML-based hardware implementations.
Disclosures
The authors declare no conflicts of interest.