Real-time all-directional 3D recognition and multidistortion correction via prior diffraction neural networks

Min Huang; Bin Zheng; Ruichen Li; Yijun Zou; Xiaofeng Li; Chao Qian; Huan Lu; Rongrong Zhu; Hongsheng Chen

doi:10.1117/1.AP.7.5.056005

1 Introduction

The growing demand for advanced computer vision systems in fields such as autonomous navigation, industrial automation, and human–computer interaction has underscored the critical importance of three-dimensional (3D) object recognition.1^–3 Unlike traditional two-dimensional (2D) imaging, 3D recognition provides a more comprehensive understanding of object shapes and spatial relationships, which are essential to the accurate performance of the system in dynamic environments. However, several substantial challenges remain inherent to the application of current 3D recognition methods due to their heavy reliance on scanning techniques that are sensitive to factors such as deformation, changes in orientation, occlusion, and noise.4^–6 Incomplete or distorted data, often produced by unilateral scanning or environmental disturbances, pose significant impediments to recognition accuracy. Furthermore, the high cost of acquiring, storing, and processing 3D scan data, along with uneven data distribution, limits the scalability and practical applications of the technology in real-world scenarios.7^–10 Overcoming these limitations is essential to enable reliable and efficient 3D recognition in complex environments. A key challenge in 3D object recognition is the limitations of current systems, which often depend on fixed viewing angles, rigid object orientations, and predefined patterns. In dynamic environments, such constraints severely hinder the ability to achieve accurate recognition, particularly when addressing occlusions, irregular shapes, and varying poses.

Metasurfaces are a promising solution to this issue. These ultrathin two-dimensional structures, capable of manipulating electromagnetic waves at nanosecond speeds, have already demonstrated their potential as components in lenses,10^,11 cloaking,12^–14 intelligent sensors,15^,16 and direction-of-arrival estimation.17^–19 Metasurfaces can generate random point clouds in full space,20 thereby offering a novel mechanism through which 3D recognition can be enhanced21^–26 by eliminating the need for conventional scanning methods. Real-time recognition27^–33 becomes possible through direct interaction with electromagnetic waves by integrating metasurface technology with deep learning models, particularly in the form of diffraction neural networks. This approach provides high-speed, light-driven object identification while drastically reducing data processing demands. Diffraction neural networks, modeled after the behavior of electromagnetic waves passing through metasurfaces, process vast amounts of data at the speed of light,34^–36 offering advantages such as low power consumption,37^–39 immediate imaging, and real-time object recognition.40^,41 Despite these advancements, current systems still face challenges in handling objects with complex geometries or multiple orientations, making further development essential to realize the full potential of metasurface-driven 3D recognition for real-world applications.

We present a deep knowledge prior diffraction neural network (DNN) for real-time all-directional 3D object recognition and multidistortion correction. It excels in accurately identifying diverse object forms in complex environments, including those with dynamically changing postures. As demonstrated in Fig. 1, the network accurately recognizes challenging scenarios, such as curled-up kittens or children crossing a road, contributing to enhanced safety in intelligent driving applications. This system effectively addresses hardware and data constraints, enabling accurate object identification and the correction of complex distortions. The deep knowledge prior generates optimized metasurface parameters with minimal training data by leveraging random Gaussian noise (the deep knowledge prior part in Fig. 1), whereas the diffraction neural network refines these parameters to improve accuracy when comparing distorted and real images (2D/3D). Experimental results using distorted 2D patterns and 3D models demonstrated the ability of deep knowledge prior diffraction neural networks to accurately correct distortions and recognize objects from various perspectives in real time. The system effectively handled multiple distortions, including both planar and 3D distortions, accounting for changes in the viewpoint of the observer. In 3D recognition, experimental results show that the integrated system accurately identifies both dynamic and static airplanes. It continuously monitors the electric field at specific positions during orientation changes, enabling analysis of the airplane structure. Stable electric field variations indicate that dynamic movement and arbitrary orientation changes do not affect recognition results, enhancing the suitability for complex and dynamic environments. This scanning-free method exhibits the versatility necessary for a wide variety of potential applications, including in various imaging systems, and provides autonomous and efficient 2D/3D object recognition. The deep knowledge prior diffraction neural network sets a new benchmark in physical neural network training paradigms, expanding their target capabilities and setting a precedent for innovations in intelligent imaging, advanced recognition technologies, and next-generation metasurface applications.

$Application of a real-time all-directional 3D recognition multiple distortion correction system in automatic driving. The metasurface array supported by the deep knowledge prior diffraction neural network can recognize obstacles or people with different poses in a large field of view.$

Figure 1.Application of a real-time all-directional 3D recognition multiple distortion correction system in automatic driving. The metasurface array supported by the deep knowledge prior diffraction neural network can recognize obstacles or people with different poses in a large field of view.

Download full size

View all figures

2 Results

2.1 Principle and Architecture of Deep Knowledge Prior Diffraction Neural Network

In conventional DNNs, the phase distribution of a metasurface serves as the equivalent of a weight—a trainable parameter. However, the depth of these networks is inherently limited by the number of metasurface layers, which constrains their ability to form sufficiently deep architectures.42^–44 To overcome this limitation while preserving the physical implementation, we introduced a deep convolutional DNN. This approach was implemented using two distinct components: a deep generator network responsible for generating the phase array and a DNN that handles distortion correction and object recognition. The deep generator network produces phase distributions by utilizing them as prior knowledge to guide the training process.45^–48 The objective of the network is to optimize this distribution to achieve the desired transmission characteristics by treating the transmission coefficient distribution as a 2D image. We assume that $x$ is the image generated in the optimization process, $x_{0}$ is the degraded image [Fig. 2(a)], and the knowledge prior generator network continuously optimizes from $x_{0}$ and generates the optimal solution $x^{*}$ , transforming the inverse problem of image recovery into an energy minimization problem $x^{*} = \min_{x} E (x; x_{0}) + R (x) .$ (1)The problem was divided into task-related data items and regular items. In the data item $E (x; x_{0}) = ‖ x - x_{0} ‖^{2} .$ (2)The regular term $R (x)$ corresponds to a prior in the generated network. This item makes the iterative image $x$ approach $x_{0}$ continuously so that the objective function has a certain smoothing effect on $x$ and will not overfit. Here, convolutional networks are used to implement the image restoration (transmission coefficient generation) task, so the generator network $x = f_{θ} (z)$ maps the random vector $z$ to the image $x$ . Replace $R (x)$ with prior, extracted by convolutional neural network implicitly, $θ^{*} = \arg \min_{θ} E (f_{θ} (z); x_{0}),$ (3) $x^{*} = f_{θ^{*}} (z),$ (4)where $z$ is a fixed random code of the initial input network, such as a random sequence of Gaussian distribution; $θ$ is the network parameter initialized randomly at the beginning; and $θ^{*}$ is the optimal solution of the parameters obtained through training, corresponding to the optimal output $x^{*}$ of the network. The implementation of the deep prior generator network is described in detail in Note 1 in the Supplementary Material. In this approach, the phase is treated as an optimization target rather than as a directly trained parameter. Notably, the knowledge prior generator network does not require a large training dataset; instead, it iteratively updates the phase prior starting from an initial random sequence. Because the generator network is decoupled from the implementation of the physical system, its architecture can be tailored with flexibility, allowing for the customization of the number of layers, functions, and operations. This design increased the width and depth of the network, enhancing the ability of the model to represent and abstract complex information. Consequently, the network can learn more intricate patterns and capture higher-order features. This flexible architecture provides a robust foundation for a DNN to leverage deep knowledge priors, enabling it to extract substantial sample features and achieve superior generalization performance.

$Design of prior diffraction neural network. (a) Working process of the correction–recognition diffraction neural network based on deep knowledge prior. (b) Detailed network model of prior diffraction neural network.$

Figure 2.Design of prior diffraction neural network. (a) Working process of the correction–recognition diffraction neural network based on deep knowledge prior. (b) Detailed network model of prior diffraction neural network.

Download full size

View all figures

The deep knowledge prior neural network comprises 10 convolutional layers and employs an encoding–decoding architecture with nonlinear mapping functions [Fig. 2(b)]. The input was a random sequence following a Gaussian distribution, and the output was the optimized transmission coefficient for the metasurface. Three independently optimized deep knowledge prior neural networks were paired with DNNs, each consisting of three fully connected layers. In fact, the three generator networks can employ different architectures as they do not directly influence one another. Rather than relying on predefined optimal phase distributions, the networks iteratively refine the phase combination across all layers, ensuring their collective interaction precisely shapes the wavefront and enhances object recognition. This parallel design enhances modularity, avoids gradient entanglement, and allows task-specific customization, leading to a more stable and scalable optimization framework. Each metasurface array layer was composed of $64 \times 64$ units, with supercell partitioning employed to ensure the phase accuracy between the units, which is crucial for maintaining imaging fidelity. During training, the loss function used was the mean squared error, whereas image quality was assessed using the structural similarity index (SSIM) (see Appendix for detailed definitions). The input to the DNN, which consists of a metasurface array, is a plane wave that passes through a metal plate with a hollow pattern, effectively generating the diffraction pattern of the object. To ensure the accuracy of the input, we simulated and extracted the diffraction pattern of each object under plane wave illumination using CST Microwave Studio.

2.2 Inspiration for Samples and Training Results of Deep Knowledge Prior Diffraction Neural Network

The quality of the images captured by machine vision systems is crucial for enabling algorithms to efficiently and accurately extract information that is essential to the process of decision-making. These systems must operate within real-world coordinate systems and measurement units for precise measurement and control; this necessitates the pre-establishment of a mapping system between the pixel coordinates and real-world coordinates to ensure accuracy throughout the process. However, real-world conditions are often far from ideal. Equipment limitations and challenges during system integration can introduce image distortion. Distortions confined to the 2D plane are referred to as planar distortions, whereas those affecting 3D objects are classified as spatial or 3D distortions.

Under conditions in which it is impossible to position the camera perpendicular to the target, perspective distortion is introduced into the image [Fig. 3(a)]. In addition, imperfections in the optical properties of the camera introduce radial distortions such as barrel and pincushion (pillow) distortions [Fig. 3(b)]. Barrel distortion causes the image to bulge outward, creating a barrel-like appearance, whereas pincushion distortion pulls the image inward, narrowing it toward the center. An improper shooting angle can also result in the pattern rotating and shifting away from the center of the view. These distortions can overlap and compound the overall image deformation. Barrel and pincushion distortions affect the 2D plane, whereas perspective distortion deforms the image in 3D space [Fig. 3(c)]. Such distortions significantly impair the accuracy of machine vision systems, making it essential not only to ensure the precise calculation of the transformation between pixel and real-world coordinates but also to apply effective image correction techniques. In 3D space, random factors such as object orientation and observer perspective further complicate recognition [Fig. 3(d)]. To simulate real-world conditions, aircraft were displayed with varying orientations within the diffraction system during the experiment [Fig. 3(e)]. Notably, the correction–recognition integrated system based on electromagnetic diffraction eliminates the need for 3D-to-2D coordinate transformation and minimizes the lens distortion at its source.

$Dynamic and static three-dimensional recognition in the actual scene. (a) The camera lens is not parallel to the picture, resulting in perspective distortion. Moreover, the object has a different degree of plane distortion due to the inappropriate shooting angle. (b) Specific plane distortions caused by subjective and objective causes. Under nonideal shooting conditions, the picture has single or double plane random variation. (c) Different attitudes of the aircraft from the same perspective. The superimposed random distortion of the plane and three-dimensional space caused by the shooting angle. (d) Aircraft at different attitudes observed from different viewing angles. (e) To simulate the identification of aircraft by observers in the actual scene, the aircraft with different attitudes are recognized by the diffraction system in the experiment. (f) The training results of the deep phase generation network show that the phase loss of each metasurface is close to 0. (g) The loss of the training and test sets versus epoch. After thousands of epochs, the correction–recognition integrated system achieves an accuracy rate of 97.7% and 94.6% for the training and test sets, respectively. (h) The training results for the conventional diffraction neural network over the epoch. It is in an underfitting state, and the accuracy is only 55.2% for the test set.$

Figure 3.Dynamic and static three-dimensional recognition in the actual scene. (a) The camera lens is not parallel to the picture, resulting in perspective distortion. Moreover, the object has a different degree of plane distortion due to the inappropriate shooting angle. (b) Specific plane distortions caused by subjective and objective causes. Under nonideal shooting conditions, the picture has single or double plane random variation. (c) Different attitudes of the aircraft from the same perspective. The superimposed random distortion of the plane and three-dimensional space caused by the shooting angle. (d) Aircraft at different attitudes observed from different viewing angles. (e) To simulate the identification of aircraft by observers in the actual scene, the aircraft with different attitudes are recognized by the diffraction system in the experiment. (f) The training results of the deep phase generation network show that the phase loss of each metasurface is close to 0. (g) The loss of the training and test sets versus epoch. After thousands of epochs, the correction–recognition integrated system achieves an accuracy rate of 97.7% and 94.6% for the training and test sets, respectively. (h) The training results for the conventional diffraction neural network over the epoch. It is in an underfitting state, and the accuracy is only 55.2% for the test set.

Download full size

View all figures

Based on the above, the dataset includes samples with single random distortions (e.g., planar distortion only), double random distortions (planar and 3D), and aircraft in various flight attitudes. Planar distortions include barrel distortion, pincushion distortion, rotation, scaling up, and scaling down. For 3D distortions, the digital metal plate was tilted along the $x$ - and $y$ -axes relative to the metasurface. Notably, multiple distortions can occur simultaneously in both the numerical data and physical patterns. The recognition task must handle aircraft with varying orientations, positions, and distances from the system (see Note 2 in the Supplementary Material for simulation settings of the aircraft models in different poses). To better demonstrate the effectiveness of distortion correction and object recognition, the training and experimental samples were limited to single-object cases. However, the system is inherently capable of handling multiple-object scenarios, provided that the training dataset includes corresponding multi-object samples. This enables simultaneous distortion correction and recognition for multiple targets. To emulate real-world conditions more accurately and account for experimental errors, Gaussian noise was introduced to control variations in attitude, distance, and other parameters. Of the dataset, 70% was used to generate gradients, whereas the remaining 30% was reserved for testing (see Note 3 in the Supplementary Material for the detailed dataset composition). The standard for defining correction and recognition accuracy was set to an $SSIM \geq 0.85$ , indicating well-recovered and easily recognizable imaging results. Figures 3(f) and 3(g) illustrate the training process, showing a significant decline in loss, with accuracy rates reaching $\sim 98 %$ for the training set and 95% for the test set. For comparison, we performed the same correction–recognition task using conventional DNNs. Compared with the conventional DNN with the same number of layers, the prior DNN achieved clearer distortion correction and 3D recognition results with the same number of iterations, whereas the conventional network struggled to distinguish the results (see Note 4 in the Supplementary Material). Despite the increase in the number of layers, the traditional DNN cannot achieve accurate identification, as shown in Figs. S4(g) and S4(h) in the Supplementary Material. Although the loss continued to decrease, the average SSIM reached only 0.8, indicating room for improvement [Fig. 3(h)].

2.3 Experimental Results of Multiple Distortion Correction and Dynamic Three-Dimensional Recognition

We designed a prior DNN implemented in a three-layer metasurface array operating at 10.5 GHz [Fig. 4(a)]. The unit cell had a period of 6 mm, and the metasurface measured $384 mm \times 384 mm$ (see Note 5 in the Supplementary Material for details on the metasurface design). The distance between the metasurfaces and the imaging plane was 200 mm. Notably, any unit with high transmittance and a phase coverage of $\sim 2 π$ is applicable in this design. We adopted a supercell strategy in the metasurface design to minimize the coupling between the units, where each supercell represents a locally optimized block of unit cells arranged to suppress near-field interactions and enhance overall stability. One port of the vector network analyzer was connected to a standard gain horn antenna operating in the 8 to 12 GHz frequency band, whereas the other port was connected to a probe mounted on a sweeping platform. The probe scanned the imaging plane with a step size of 4 mm. The distortion experiment samples included both 3D distorted digits and planar distorted patterns to demonstrate the correction accuracy of the system and highlight the capacity advantages of the deep knowledge prior network. Figure 4(b) shows an example in which two planar distortions, scaling (scale-up/scale-down) and rotation, occur simultaneously. A planar distorted pattern was positioned against the metasurface. To simulate 3D distortion, a planar distorted digital metal plate was placed on an acrylic shelf at an angle of 30 deg from the metasurface in both directions (see Note 6 in the Supplementary Material for the experimental setup for the 3D distortion correction). As shown in Fig. 4(c), the imaging surface displays a recognizable standard pattern despite the double distortion. The average SSIM between the recovered images (for both single and double random distortions) and target images reached 0.95. The clarity of the numbers and patterns on the output plane demonstrates the excellent correction capability of the integrated correction–recognition system. Furthermore, Fig. S7(a) in the Supplementary Material shows that the distortions occurring at varying distances and orientations yield good correction results.

$Experimental results of multiple distortion correction and static aircraft real-time recognition. (a) Experimental setup for planar and three-dimensional distortion correction and three-dimensional recognition. The aircraft is suspended on a graduated rack to simulate different flight attitudes. (b), (c) Experimental results of image restoration with double random distortion, including stereoscopic distortion. (d) The experimental recognition results are identified according to the diffraction distribution under different attitudes. The orientation of the aircraft is demonstrated and labeled with (θ,φ) in the left coordinate system.$

Figure 4.Experimental results of multiple distortion correction and static aircraft real-time recognition. (a) Experimental setup for planar and three-dimensional distortion correction and three-dimensional recognition. The aircraft is suspended on a graduated rack to simulate different flight attitudes. (b), (c) Experimental results of image restoration with double random distortion, including stereoscopic distortion. (d) The experimental recognition results are identified according to the diffraction distribution under different attitudes. The orientation of the aircraft is demonstrated and labeled with $(θ, φ)$ in the left coordinate system.

Download full size

View all figures

Next, we verified the ability of the integrated correction–recognition system to recognize aircraft from different viewing angles [the left part marked with $(θ, φ)$ in Fig. 4(d)]. The aircraft models were mounted on an acrylic shelf, and their flight attitudes were simulated by rotating them in 3D space. The acrylic shelf was marked with pitch and azimuth scales on the top and sides, respectively. In the experiment, three distinct flight attitudes were tested: aircraft flying toward, away from, and near the metasurface array [Fig. 4(d)]. Electromagnetic waves were irradiated onto the metal aircraft, passing through the metasurface array to form complete and clear images of the aircraft on the imaging surface. The average SSIM between the experimental identification and training results was 0.95. For comparison, the electric field distribution measured without the system gives a completely different result [Fig. S6(d) in the Supplementary Material]. Although the manually controlled attitude, distance, and position of the aircraft were not ideal, showing deviations of single or double digits from the set targets, the high accuracy of the system demonstrated its robustness. To validate the universality of the system further, we tested aircraft samples with distances and positions that deviated from the ideal setup (see Note 7 in the Supplementary Material for additional results on the effects of distance and position on 3D aircraft recognition) and achieved an average SSIM of 0.93.

In autonomous driving scenarios, pedestrians, cyclists, and motor vehicles are often in motion. Real-time monitoring and analysis of moving humans or objects provide essential information for autonomous driving systems, enhancing both safety and intelligence. To demonstrate the recognition capability of the proposed correction-recognition integrated system for dynamic objects, an experiment was conducted using a 2D turntable to control the dynamic variation of pitch and azimuth angles of a suspended airplane. The pitch angle varied from $- 15$ to 15 deg, and the azimuth angle ranged from $- 45$ to 45 deg. The imaging plane was stably scanned with a 4 mm step size to measure the electric field distribution. Continuous field strength monitoring was conducted at six key positions on the imaging plane corresponding to the head, airfoil, and empennage of the airplane. These six critical locations clearly outline the shape of the airplane. As shown in Figs. 5(a) and 5(b), the recognition results remain stable and accurately depict the shape of the airplanes during continuous changes in their flight attitudes. The six highlighted positions consistently maintain an electric field strength of approximately $- 60 dB$ , whereas nonhighlighted regions (e.g., position 7) correspond to background noise. Videos 1 and 2 provide full animations of the dynamic recognition results for two airplanes. These results demonstrate that dynamic attitude transformations do not interfere with the recognition performance of the correction–recognition integrated system.

Figure 5.Experimental demonstration of dynamic three-dimensional recognition. (a) Measured electric field distribution with dynamic left-right steering ( $φ$ change) (Video 1, MP4, 1.92 MB [URL: https://doi.org/10.1117/1.AP.7.5.056005.s1]). (b) Measured electric field distribution with dynamic forward and backward tilt ( $θ$ change) (Video 2, MP4, 1.70 MB [URL: https://doi.org/10.1117/1.AP.7.5.056005.s2]). (c) Measured electric field distribution with dynamic movement in all directions (Video 3, MP4, 1.84 MB [URL: https://doi.org/10.1117/1.AP.7.5.056005.s3]).

Download full size

View all figures

To further simulate the motion of road users, the performance of the system in recognizing objects with dynamic displacement was also evaluated. A third airplane was suspended and moved in four directions—up, down, left, and right. As shown in Fig. 5(c), the six highlighted positions displayed stable field strength distributions, in contrast to the environmental noise observed in nonhighlighted areas. The complete recognition process of the airplane lateral movement is shown in Video 3. The clear airplane shapes on the output plane demonstrate the potential of the correction–recognition integrated system to identify walking pedestrians, cycling riders, and driving vehicles. These findings confirm that the proposed system can effectively recognize humans and objects in various motion states, making it highly suitable for diverse and dynamic environments.

3 Discussion and Conclusion

We proposed a high-capacity, integrated system for distortion correction and object recognition, capable of simultaneously correcting 2D/3D image distortions and recognizing objects from multiple angles in real time through wave-based processing. An independent deep knowledge prior neural network gathers low-level statistical prior information before learning, allowing it to predict parameter distributions using only one image. We solved the issue of linking the depth of the diffraction network to the number of physical layers by combining a deep knowledge prior network with a DNN, thereby increasing the capacity for trainable samples. The fuzzy distribution of the metasurface transmission coefficients was obtained by training random sequences under a Gaussian distribution. These prior values are then input into the DNN, enabling backward propagation through both networks, which shortens the convergence time and improves the training efficiency. Deep knowledge prior networks offer a novel solution to the network depth limitations imposed by physical architecture and hardware constraints, driving the development of electromagnetic neural networks. Furthermore, although the system enables real-time 3D recognition, the quantitative characterization of axial spatial resolution and its associated confidence interval is not included in the current work. These aspects will be addressed in future studies through controlled measurements and comparative error analysis across varying depth positions. The highly integrated passive distortion correction–recognition system can independently recognize 3D objects and serve as a preprocessing unit for imaging or recognition systems, significantly lowering equipment costs. With broad applications in robotics, intelligent transportation, public safety, military security, and identity authentication, real-time object recognition has significant potential for commercial applications.

Our integrated distortion correction and all-directional recognition system excels in static recognition and is highly capable in terms of the real-time recognition of moving objects, including high-speed targets. The system performance is unaffected by object distortion or changes in orientation owing to motion, making it ideal for dynamic target recognition. Traditional systems exhibit slow recognition speeds, low success rates, and declining accuracy for moving objects, thereby limiting their practical applications. By contrast, our system, which is based on a passive metasurface array, offers real-time recognition enabled by the intrinsic propagation speed of electromagnetic waves, along with large data storage capacity, wide bandwidth, and high accuracy. We cautiously note that the apparent resolution of $\sim λ / 19$ , obtained from simulations, does not imply a direct physical surpassing of the diffraction limit. Instead, it reflects the system’s capability in resolving spatial patterns under near-field model assumptions and ideal sampling.34^,42^,49 The theoretical origin of this phenomenon will be further investigated in follow-up studies. In addition, it minimizes redundant data generation, transmission, and processing, significantly reducing the feature extraction time and improving the recognition efficiency. We acknowledge that the current study does not include experimental validation in reflection mode. Extending the sensing system to operate effectively in reflection-mode configurations is an important future direction. This will involve not only reconfiguring the measurement setup but also evaluating the signal-to-noise performance under real-world conditions with limited optical access.

4 Appendix: Methods

4.1 Data Collection

Patterns that distort only in a 2D plane form an input matrix of 0 to 1. An electromagnetic wave passes through the pattern at a certain angle or distance from the metasurface array in 3D space and reaches the first metasurface to generate diffraction. We simulated the diffraction process of the 3D distorted pattern reaching the metasurface in CST Microwave Studio and used the diffraction electric-field distribution as the input of the neural network. To characterize the uncertainty of the measurement system, we monitored field amplitude variations across repeated scans with varied object positions and orientations. Although a full quantitative uncertainty analysis is not yet included in this work, initial observations suggest standard deviation in electric field intensity measurements remains within $\pm 2 dB$ at critical sensing points. A more detailed assessment of axial and lateral uncertainty will be conducted in follow-up work.

4.2 SSIM Function

The SSIM is an index for measuring the similarity of two images, mainly used to detect the similarity of two images of the same size or to detect the degree of image distortion. The SSIM calculation is based on three comparative measures between samples $x$ and $y$ : luminance, contrast, and structure, $S (x, y) = f (I (x, y), c (x, y), s (x, y)) .$ (5)

Brightness is measured using the average gray level of the image. The brightness contrast function $I (x, y)$ is a function of $μ_{x}$ and $μ_{y}$ , $μ_{x} = \frac{1}{N} \sum_{i - 1}^{N} x_{i},$ (6) $I (x, y) = \frac{2 μ_{x} μ_{y} + c_{1}}{μ_{x}^{2} + μ_{y}^{2} + c_{1}} .$ (7)

Contrast is measured using the standard deviation of the image. The contrast function $c (x, y)$ is a function of $σ_{x}$ , $σ_{y}$ , $σ_{x} = {(\frac{1}{N - 1} \sum_{i - 1}^{N} (x_{i} - μ_{x})^{2})}^{\frac{1}{2}},$ (8) $c (x, y) = \frac{2 σ_{x} σ_{y} + c_{2}}{σ_{x}^{2} + σ_{y}^{2} + c_{2}} .$ (9)

Structure contrast is an image divided by its own standard deviation. The structural contrast function $s (x, y)$ is a function of $\frac{x - μ_{x}}{σ_{x}}$ and $\frac{y - μ_{y}}{σ_{y}}$ , $s (x, y) = \frac{σ_{x y} + c_{3}}{σ_{x} σ_{y} + c_{3}},$ (10) $σ_{x y} = \frac{1}{N - 1} \sum_{i - 1}^{n} (x_{i} - μ_{x}) (y_{i} - μ_{y}) .$ (11)The constants $c_{1}$ , $c_{2}$ , and $c_{3}$ are used to avoid instability problems when the denominator is 0. Considering the above three quantities, the general equation of the SSIM is expressed as follows: $SSIM (x, y) = {(I (x, y))}^{α} {(c (x, y))}^{β} {(s (x, y))}^{γ} .$ (12)In general, $α$ , $β$ , and $γ$ take 1, and let $c_{3} = c_{2} / 2$ obtain a simplified SSIM formula, $SSIM (x, y) = \frac{(2 μ_{x} μ_{y} + c_{1}) (2 σ_{x y} + c_{2})}{(μ_{x}^{2} + μ_{y}^{2} + c_{1}) (σ_{x}^{2} + σ_{y}^{2} + c_{2})} .$ (13)The SSIM values range from 0 to 1. The larger the value, the smaller the gap between the output image and the undistorted image; that is, the better the image quality.

4.3 Training and Experiment Details

The prior DNN was trained with Python (v3.7) and PyTorch (v1.9.0) on a server (NVIDIA Quadro RTX 5000 GPU and Intel(R) Xeon(R) Gold 6226R CPU @ 2.90 GHz with 256 GB RAM). In the experiment, the transmitting source was generated by the VNA (Ceyear 3672L), and the transmitted wave signal was collected using a probe fixed to the sweep platform and processed by the VNA.

Acknowledgments

Acknowledgment. The work was supported by the National Key Research and Development Program of China (Grant Nos. 2022YFA1404704, 2022YFA1405200, and 2022YFA1404902), the National Natural Science Foundation of China (NNSFC) (Grant Nos. 61975176 and 62071423), the Key Research and Development Program of Zhejiang Province (Grant Nos. 2022C01036 and 2024C01160), the Natural Science Foundation of Zhejiang Province (Grant No. LR23F010004), the Top-Notch Young Talent of Zhejiang Province, and the Fundamental Research Funds for the Central Universities.

Min Huang received the BS degree from Southwest Jiaotong University, Chengdu, China, in 2020, and the PhD from Zhejiang University, Hangzhou, China, in 2025. Her research interests include metasurfaces, intelligent electromagnetic wave manipulation, and neural networks.

Bin Zheng received the PhD from Zhejiang University, Hangzhou, China, in 2015. He is a professor with the College of Information Science and Electronic Engineering, Zhejiang University. He has been awarded the Top-Notch Young Talent of China, and the Excellent Youth Foundation of Zhejiang Province. His research interests include transformation optics, metamaterials, metasurfaces, and invisibility cloaks.

Ruichen Li received the BS degree from Xidian University, Xi’an, China, in 2020, and the PhD from Zhejiang University, Hangzhou, China, in 2025. Her research interests include electromagnetic invisibility, artificial electromagnetic materials, and metasurfaces.

Yijun Zou received the BS degree from Zhejiang University, Hangzhou, China, in 2020. He is currently pursuing the PhD with the School of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China. His research interests include metasurfaces, holographic imaging, and machine learning.

Xiaofeng Li received the BS degree from Air Force Engineering University, Xi’an, China, in 2019, and the PhD from the same university, in 2024. His research interests include metasurfaces, holographic imaging, and novel transmit/reflect array antennas.

Chao Qian received a PhD from Zhejiang University, Hangzhou, in 2020. He was a visiting PhD student at California Institute of Technology from 2019 to 2020. In 2021, he joined the Zhejiang University/University of Illinois at Urbana-Champaign Institute as an assistant professor. His research interests are on metamaterials/metasurfaces, electromagnetic scattering, inverse design, and deep learning.

Huan Lu received the MS degree from Xidian University, Xi’an, China, in 2019, and the PhD from Zhejiang University, Hangzhou, China, in 2023. Her research interests include metamaterials, electromagnetic scattering, artificial intelligence, and intelligent electromagnetic cloaks.

Rongrong Zhu received the PhD from College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China, in 2019. She is an assistant professor in Hangzhou City University now. Her main research interest is intelligent metasurface.

Hongsheng Chen (fellow, IEEE) received the BSc and PhD degrees from Zhejiang University, Hangzhou, China, in 2000 and 2005, respectively. In 2005, he joined the College of Information Science and Electronic Engineering, Zhejiang University, and was promoted to full professor in 2011. In 2014, he was honored with the distinguished Cheung-Kong Scholar award. Currently, he serves as the Dean of College of Information Science and Electronic Engineering, Zhejiang University. He received the National Science Foundation for Distinguished Young Scholars of China in 2016, and the Natural Science Award (first class) from the Ministry of Education, China, in 2020. His current research interests are in the areas of metamaterials, invisibility cloaking, transformation optics, and topological electromagnetics.

Category: Research Articles

Received: Jan. 22, 2025

Accepted: Jul. 15, 2025

Published Online: Aug. 21, 2025

The Author Email: Rongrong Zhu (rorozhu@hzcu.edu.cn), Hongsheng Chen (hansomchen@zju.edu.cn)

DOI:10.1117/1.AP.7.5.056005

CSTR:32187.14.1.AP.7.5.056005