Advanced Photonics, Volume 7, Issue 5, 056005 (2025)
Real-time all-directional 3D recognition and multidistortion correction via prior diffraction neural networks
Robust three-dimensional (3D) recognition across different viewing angles is crucial for dynamic applications such as autonomous navigation and augmented reality; however, its practical application remains challenging owing to factors such as orientation, deformation, and noise. Wave-based analog computing, particularly diffraction neural networks (DNNs), constitutes a scan-free, energy-efficient means of mitigating these issues with strong resilience to environmental disturbances. Herein, we present a real-time all-directional 3D object recognition and distortion correction system using a deep knowledge prior DNN. Our approach effectively addressed complex two-dimensional (2D) and 3D distortions by optimizing the metasurface parameters with minimal training data and refining them using DNNs. Experimental results demonstrate that the system can effectively rectify distortions and recognize objects in real time, even under varying perspectives and multiple complex distortions. In 3D recognition, the prior DNN reliably identifies both dynamic and static objects, maintaining stable performance despite arbitrary orientation changes, highlighting its adaptability to complex and dynamic environments. Our system can function either as a preprocessing tool for imaging platforms or as a stand-alone solution, facilitating 3D recognition tasks such as motion sensing and facial recognition. It offers a scalable solution for high-speed recognition tasks in dynamic and resource-constrained applications.
1 Introduction
The growing demand for advanced computer vision systems in fields such as autonomous navigation, industrial automation, and human–computer interaction has underscored the critical importance of three-dimensional (3D) object recognition.1
Metasurfaces are a promising solution to this issue. These ultrathin two-dimensional structures, capable of manipulating electromagnetic waves at nanosecond speeds, have already demonstrated their potential as components in lenses,10,11 cloaking,12 and related devices.
We present a deep knowledge prior diffraction neural network (DNN) for real-time all-directional 3D object recognition and multidistortion correction. It excels in accurately identifying diverse object forms in complex environments, including those with dynamically changing postures. As demonstrated in Fig. 1, the network accurately recognizes challenging scenarios, such as curled-up kittens or children crossing a road, contributing to enhanced safety in intelligent driving applications. This system effectively addresses hardware and data constraints, enabling accurate object identification and the correction of complex distortions. The deep knowledge prior generates optimized metasurface parameters with minimal training data by leveraging random Gaussian noise (the deep knowledge prior part in Fig. 1), whereas the diffraction neural network refines these parameters to improve accuracy when comparing distorted and real images (2D/3D). Experimental results using distorted 2D patterns and 3D models demonstrated the ability of deep knowledge prior diffraction neural networks to accurately correct distortions and recognize objects from various perspectives in real time. The system effectively handled multiple distortions, including both planar and 3D distortions, accounting for changes in the viewpoint of the observer. In 3D recognition, experimental results show that the integrated system accurately identifies both dynamic and static airplanes. It continuously monitors the electric field at specific positions during orientation changes, enabling analysis of the airplane structure. Stable electric field variations indicate that dynamic movement and arbitrary orientation changes do not affect recognition results, enhancing its suitability for complex and dynamic environments. This scanning-free method exhibits the versatility necessary for a wide variety of potential applications, including in various imaging systems, and provides autonomous and efficient 2D/3D object recognition. The deep knowledge prior diffraction neural network sets a new benchmark in physical neural network training paradigms, expanding their capabilities and paving the way for innovations in intelligent imaging, advanced recognition technologies, and next-generation metasurface applications.
Figure 1. Application of the real-time all-directional 3D recognition and multidistortion correction system in autonomous driving. The metasurface array, supported by the deep knowledge prior diffraction neural network, can recognize obstacles or people with different poses over a large field of view.
2 Results
2.1 Principle and Architecture of Deep Knowledge Prior Diffraction Neural Network
In conventional DNNs, the phase distribution of a metasurface serves as the equivalent of a weight—a trainable parameter. However, the depth of these networks is inherently limited by the number of metasurface layers, which constrains their ability to form sufficiently deep architectures.42
Figure 2. Design of the prior diffraction neural network. (a) Working process of the correction–recognition diffraction neural network based on deep knowledge prior. (b) Detailed network model of the prior diffraction neural network.
The deep knowledge prior neural network comprises 10 convolutional layers and employs an encoding–decoding architecture with nonlinear mapping functions [Fig. 2(b)]. The input was a random sequence following a Gaussian distribution, and the output was the optimized transmission coefficient for the metasurface. Three independently optimized deep knowledge prior neural networks were paired with DNNs, each consisting of three fully connected layers. In fact, the three generator networks can employ different architectures as they do not directly influence one another. Rather than relying on predefined optimal phase distributions, the networks iteratively refine the phase combination across all layers, ensuring their collective interaction precisely shapes the wavefront and enhances object recognition. This parallel design enhances modularity, avoids gradient entanglement, and allows task-specific customization, leading to a more stable and scalable optimization framework. Each metasurface array layer was composed of individual units, with supercell partitioning employed to ensure phase accuracy between the units, which is crucial for maintaining imaging fidelity. During training, the loss function used was the mean squared error, whereas image quality was assessed using the structural similarity index (SSIM) (see Appendix for detailed definitions). The input to the DNN, which consists of a metasurface array, is a plane wave that passes through a metal plate with a hollow pattern, effectively generating the diffraction pattern of the object. To ensure the accuracy of the input, we simulated and extracted the diffraction pattern of each object under plane wave illumination using CST Microwave Studio.
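To make this pairing concrete, the sketch below couples three convolutional generators (fed fixed Gaussian-noise seeds) to a differentiable three-layer diffraction forward model in PyTorch, the framework the authors report using. The layer widths, the propagation matrix `W`, and the training targets are illustrative placeholders rather than the paper's actual parameters; a faithful model would replace `W` with Rayleigh–Sommerfeld or angular-spectrum propagation weights between metasurface planes.

```python
import math
import torch
import torch.nn as nn

class PriorGenerator(nn.Module):
    """Deep knowledge prior: maps a fixed Gaussian-noise seed to a metasurface
    phase map through a convolutional encoder-decoder (widths illustrative)."""
    def __init__(self):
        super().__init__()
        chans = [1, 16, 32, 64, 128, 128, 64, 32, 16, 8]
        layers = []
        for cin, cout in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(chans[-1], 1, 3, padding=1)]  # 10 conv layers total
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        # Sigmoid bounds the output; scaling yields a full 2*pi phase range.
        return 2 * math.pi * torch.sigmoid(self.net(z))

n = 32                                                # units per side (illustrative)
gens = [PriorGenerator() for _ in range(3)]           # one generator per metasurface
seeds = [torch.randn(1, 1, n, n) for _ in range(3)]   # fixed Gaussian inputs
# Placeholder complex propagation matrix between planes (not a physical model).
W = torch.randn(n * n, n * n, dtype=torch.cfloat) / n

def forward_system(x):
    """Pass a complex input field through three phase layers plus propagation."""
    for g, z in zip(gens, seeds):
        phase = g(z).reshape(-1)
        x = (x * torch.exp(1j * phase)) @ W           # imprint phase, propagate
    return x.abs() ** 2                               # intensity on the image plane

x_in = torch.randn(n * n, dtype=torch.cfloat)         # stand-in diffraction field
target = torch.rand(n * n)                            # stand-in undistorted image

opt = torch.optim.Adam([p for g in gens for p in g.parameters()], lr=1e-3)
for step in range(200):
    opt.zero_grad()
    out = forward_system(x_in)
    loss = nn.functional.mse_loss(out / out.detach().max(), target)  # MSE loss
    loss.backward()                                   # gradients flow through both
    opt.step()                                        # networks jointly
```

Because the seeds are fixed, all trainable capacity lives in the generators; the diffraction layers themselves remain passive, which mirrors the paper's point that network depth is decoupled from the number of physical metasurface layers.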
2.2 Inspiration for Samples and Training Results of Deep Knowledge Prior Diffraction Neural Network
The quality of the images captured by machine vision systems is crucial for enabling algorithms to efficiently and accurately extract the information essential for decision-making. These systems must operate within real-world coordinate systems and measurement units for precise measurement and control; this requires establishing a mapping between pixel coordinates and real-world coordinates in advance to ensure accuracy throughout the process. However, real-world conditions are often far from ideal. Equipment limitations and challenges during system integration can introduce image distortion. Distortions confined to the 2D plane are referred to as planar distortions, whereas those affecting 3D objects are classified as spatial or 3D distortions.
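The pixel-to-world mapping mentioned above is commonly expressed, for a planar scene, as a 3×3 homography applied in homogeneous coordinates. The sketch below is our generic illustration, not the calibration procedure of any specific system; the matrix `H` is a hypothetical calibration result.

```python
import numpy as np

def apply_homography(H, pts):
    """Map Nx2 pixel coordinates to world coordinates through a 3x3
    homography, using homogeneous coordinates and perspective division."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])   # lift to homogeneous
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]              # perspective divide

# Hypothetical calibration result: near-identity with a mild perspective term.
H = np.array([[1.0,  0.02,  5.0],
              [0.01, 1.0,  -3.0],
              [1e-4, 2e-4,  1.0]])
world = apply_homography(H, np.array([[100.0, 50.0], [320.0, 240.0]]))
```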
Under conditions in which it is impossible to position the camera perpendicular to the target, perspective distortion is introduced into the image [Fig. 3(a)]. In addition, imperfections in the optical properties of the camera introduce radial distortions such as barrel and pincushion (pillow) distortions [Fig. 3(b)]. Barrel distortion causes the image to bulge outward, creating a barrel-like appearance, whereas pincushion distortion pulls the image inward, narrowing it toward the center. An improper shooting angle can also result in the pattern rotating and shifting away from the center of the view. These distortions can overlap and compound the overall image deformation. Barrel and pincushion distortions affect the 2D plane, whereas perspective distortion deforms the image in 3D space [Fig. 3(c)]. Such distortions significantly impair the accuracy of machine vision systems, making it essential not only to ensure the precise calculation of the transformation between pixel and real-world coordinates but also to apply effective image correction techniques. In 3D space, random factors such as object orientation and observer perspective further complicate recognition [Fig. 3(d)]. To simulate real-world conditions, aircraft were displayed with varying orientations within the diffraction system during the experiment [Fig. 3(e)]. Notably, the correction–recognition integrated system based on electromagnetic diffraction eliminates the need for 3D-to-2D coordinate transformation and minimizes the lens distortion at its source.
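Barrel and pincushion distortions are conventionally modeled with a radial polynomial that displaces each pixel according to its distance from the image center. The sketch below is our illustration of the single-coefficient model; sign conventions vary between references, so the mapping of `k1` signs to barrel versus pincushion follows the convention stated in the code comments.

```python
import numpy as np

def radial_distort(img, k1):
    """Warp an image with the single-coefficient radial model
    x_d = x_u * (1 + k1 * r^2). Under this (inverse-sampling) convention,
    negative k1 bulges the image outward (barrel-like) and positive k1
    pinches it inward (pincushion-like)."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # Normalized coordinates in [-1, 1] about the image center.
    xn, yn = (xs - w / 2) / (w / 2), (ys - h / 2) / (h / 2)
    r2 = xn**2 + yn**2
    # Inverse mapping: for each output pixel, find the source location.
    xs_src = xn * (1 + k1 * r2) * (w / 2) + w / 2
    ys_src = yn * (1 + k1 * r2) * (h / 2) + h / 2
    # Nearest-neighbor resampling, clipped to stay inside the image.
    xi = np.clip(np.round(xs_src).astype(int), 0, w - 1)
    yi = np.clip(np.round(ys_src).astype(int), 0, h - 1)
    return img[yi, xi]

pattern = np.zeros((64, 64)); pattern[16:48, 16:48] = 1.0  # toy square target
barrel = radial_distort(pattern, k1=-0.3)
pincushion = radial_distort(pattern, k1=+0.3)
```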
Figure 3. Dynamic and static three-dimensional recognition in the actual scene. (a) The camera lens is not parallel to the picture, resulting in perspective distortion; in addition, the object exhibits varying degrees of planar distortion due to an inappropriate shooting angle. (b) Typical planar distortions arising from subjective and objective causes. Under nonideal shooting conditions, the picture exhibits single or double random planar variations. (c) Different attitudes of the aircraft from the same perspective, with superimposed random distortions of the plane and of three-dimensional space caused by the shooting angle. (d) Aircraft at different attitudes observed from different viewing angles. (e) To simulate the identification of aircraft by observers in the actual scene, aircraft with different attitudes are recognized by the diffraction system in the experiment. (f) The training results of the deep phase generation network show that the phase loss of each metasurface is close to 0. (g) The loss of the training and test sets versus epoch. After thousands of epochs, the correction–recognition integrated system achieves accuracy rates of 97.7% and 94.6% for the training and test sets, respectively. (h) The training results for the conventional diffraction neural network over the epochs. The network is underfitting, and the accuracy is only 55.2% for the test set.
Based on the above, the dataset includes samples with single random distortions (e.g., planar distortion only), double random distortions (planar and 3D), and aircraft in various flight attitudes. Planar distortions include barrel distortion, pincushion distortion, rotation, scaling up, and scaling down. For 3D distortions, the digital metal plate was tilted along two orthogonal in-plane axes relative to the metasurface. Notably, multiple distortions can occur simultaneously in both the numerical data and physical patterns. The recognition task must handle aircraft with varying orientations, positions, and distances from the system (see Note 2 in the Supplementary Material).
2.3 Experimental Results of Multiple Distortion Correction and Dynamic Three-Dimensional Recognition
We designed a prior DNN implemented in a three-layer metasurface array operating at 10.5 GHz [Fig. 4(a)]. The unit cell had a period of 6 mm (see Note 5 in the Supplementary Material for the metasurface dimensions).
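As a quick sanity check on the stated geometry (our arithmetic, not a figure from the paper), the free-space wavelength at the operating frequency is

$$\lambda = \frac{c}{f} = \frac{3 \times 10^{8}\ \mathrm{m/s}}{10.5 \times 10^{9}\ \mathrm{Hz}} \approx 28.6\ \mathrm{mm},$$

so the 6 mm unit period corresponds to roughly $0.21\lambda$, comfortably subwavelength, which is consistent with treating each unit as an independently phased element.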
Figure 4. Experimental results of multiple distortion correction and static aircraft real-time recognition. (a) Experimental setup for planar and three-dimensional distortion correction and three-dimensional recognition. The aircraft is suspended on a graduated rack to simulate different flight attitudes. (b), (c) Experimental results of image restoration with double random distortion, including stereoscopic distortion. (d) The experimental recognition results are identified according to the diffraction distribution under different attitudes. The orientation of the aircraft is demonstrated and labeled with the corresponding pitch and azimuth angles.
Next, we verified the ability of the integrated correction–recognition system to recognize aircraft from different viewing angles [the left part of Fig. 4(d)]. The aircraft models were mounted on an acrylic shelf, and their flight attitudes were simulated by rotating them in 3D space. The acrylic shelf was marked with pitch and azimuth scales on the top and sides, respectively. In the experiment, three distinct flight attitudes were tested: aircraft flying toward, away from, and near the metasurface array [Fig. 4(d)]. Electromagnetic waves were irradiated onto the metal aircraft, passing through the metasurface array to form complete and clear images of the aircraft on the imaging surface. The average SSIM between the experimental identification and training results was 0.95. For comparison, the electric field distribution measured without the system gives a completely different result [Fig. S6(d) in the Supplementary Material].
In autonomous driving scenarios, pedestrians, cyclists, and motor vehicles are often in motion. Real-time monitoring and analysis of moving humans or objects provide essential information for autonomous driving systems, enhancing both safety and intelligence. To demonstrate the recognition capability of the proposed correction–recognition integrated system for dynamic objects, an experiment was conducted using a 2D turntable to control the dynamic variation of the pitch and azimuth angles of a suspended airplane. The pitch angle was varied up to 15 deg, and the azimuth angle up to 45 deg. The imaging plane was stably scanned with a 4 mm step size to measure the electric field distribution. Continuous field strength monitoring was conducted at six key positions on the imaging plane corresponding to the head, airfoil, and empennage of the airplane. These six critical locations clearly outline the shape of the airplane. As shown in Figs. 5(a) and 5(b), the recognition results remain stable and accurately depict the shape of the airplanes during continuous changes in their flight attitudes. The six highlighted positions consistently maintain an approximately constant electric field strength, whereas nonhighlighted regions (e.g., position 7) correspond to background noise. Videos 1 and 2 provide full animations of the dynamic recognition results for two airplanes. These results demonstrate that dynamic attitude transformations do not interfere with the recognition performance of the correction–recognition integrated system.
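The stability criterion used above can be stated compactly: the intensity trace at each of the six key points should stay within a tolerance band around its mean over all frames, while background points remain below the weakest key trace. The sketch below is our hypothetical formalization; the point indices, tolerance, and data layout are illustrative and not taken from the paper.

```python
import numpy as np

def check_recognition_stability(frames, key_points, bg_points, tol=0.2):
    """frames: (T, H, W) array of measured field-intensity maps over time.
    key_points / bg_points: lists of (row, col) pixel indices.
    Returns True when every key-point trace stays within +/- tol of its
    own mean and every background trace stays below the weakest key trace."""
    key = np.stack([frames[:, r, c] for r, c in key_points])   # (6, T)
    bg = np.stack([frames[:, r, c] for r, c in bg_points])     # (Nbg, T)
    mean = key.mean(axis=1, keepdims=True)
    stable = np.all(np.abs(key - mean) <= tol * mean)          # bounded traces
    separated = bg.max() < key.min()                           # clear contrast
    return bool(stable and separated)

# Synthetic demo: six bright, stable points over a noisy background.
frames = np.random.rand(50, 64, 64) * 0.1
key_pts = [(10, 20), (10, 44), (32, 32), (50, 20), (50, 44), (32, 10)]
for r, c in key_pts:
    frames[:, r, c] = 1.0 + 0.05 * np.random.randn(50)
print(check_recognition_stability(frames, key_pts, bg_points=[(5, 5)]))
```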
Figure 5. Experimental demonstration of dynamic three-dimensional recognition. (a) Measured electric field distribution with dynamic left–right steering.
To further simulate the motion of road users, the performance of the system in recognizing objects with dynamic displacement was also evaluated. A third airplane was suspended and moved in four directions: up, down, left, and right. As shown in Fig. 5(c), the six highlighted positions displayed stable field strength distributions, in contrast to the environmental noise observed in nonhighlighted areas. The complete recognition process of the airplane's lateral movement is shown in Video 3. The clear airplane shapes on the output plane demonstrate the potential of the correction–recognition integrated system to identify walking pedestrians, cycling riders, and driving vehicles. These findings confirm that the proposed system can effectively recognize humans and objects in various motion states, making it highly suitable for diverse and dynamic environments.
3 Discussion and Conclusion
We proposed a high-capacity, integrated system for distortion correction and object recognition, capable of simultaneously correcting 2D/3D image distortions and recognizing objects from multiple angles in real time through wave-based processing. An independent deep knowledge prior neural network gathers low-level statistical prior information before learning, allowing it to predict parameter distributions using only one image. We solved the issue of linking the depth of the diffraction network to the number of physical layers by combining a deep knowledge prior network with a DNN, thereby increasing the capacity for trainable samples. The fuzzy distribution of the metasurface transmission coefficients was obtained by training random sequences under a Gaussian distribution. These prior values are then input into the DNN, enabling backward propagation through both networks, which shortens the convergence time and improves the training efficiency. Deep knowledge prior networks offer a novel solution to the network depth limitations imposed by physical architecture and hardware constraints, driving the development of electromagnetic neural networks. Furthermore, although the system enables real-time 3D recognition, the quantitative characterization of axial spatial resolution and its associated confidence interval is not included in the current work. These aspects will be addressed in future studies through controlled measurements and comparative error analysis across varying depth positions. The highly integrated passive distortion correction–recognition system can independently recognize 3D objects and serve as a preprocessing unit for imaging or recognition systems, significantly lowering equipment costs. With broad applications in robotics, intelligent transportation, public safety, military security, and identity authentication, real-time object recognition has significant potential for commercial applications.
Our integrated distortion correction and all-directional recognition system excels in static recognition and is highly capable of real-time recognition of moving objects, including high-speed targets. The system performance is unaffected by object distortion or changes in orientation owing to motion, making it ideal for dynamic target recognition. Traditional systems exhibit slow recognition speeds, low success rates, and declining accuracy for moving objects, thereby limiting their practical applications. By contrast, our system, which is based on a passive metasurface array, offers real-time recognition enabled by the intrinsic propagation speed of electromagnetic waves, along with large data storage capacity, wide bandwidth, and high accuracy. We cautiously note that the apparent resolution obtained from simulations does not imply a direct physical surpassing of the diffraction limit. Instead, it reflects the system’s capability in resolving spatial patterns under near-field model assumptions and ideal sampling.34,42,49 The theoretical origin of this phenomenon will be further investigated in follow-up studies. In addition, the system minimizes redundant data generation, transmission, and processing, significantly reducing the feature extraction time and improving the recognition efficiency. We acknowledge that the current study does not include experimental validation in reflection mode. Extending the sensing system to operate effectively in reflection-mode configurations is an important future direction. This will involve not only reconfiguring the measurement setup but also evaluating the signal-to-noise performance under real-world conditions with limited optical access.
4 Appendix: Methods
4.1 Data Collection
Patterns that distort only in a 2D plane form an input matrix with values between 0 and 1. An electromagnetic wave passes through the pattern at a certain angle or distance from the metasurface array in 3D space and reaches the first metasurface to generate diffraction. We simulated the diffraction process of the 3D distorted pattern reaching the metasurface in CST Microwave Studio and used the diffraction electric-field distribution as the input of the neural network. To characterize the uncertainty of the measurement system, we monitored field amplitude variations across repeated scans with varied object positions and orientations. Although a full quantitative uncertainty analysis is not yet included in this work, initial observations suggest that the standard deviation of the electric field intensity measurements remains small at the critical sensing points. A more detailed assessment of axial and lateral uncertainty will be conducted in follow-up work.
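For readers without access to CST Microwave Studio, a scalar angular-spectrum propagator provides a quick numerical approximation of the diffraction field reaching the first metasurface. The following sketch is our stand-in under scalar-field assumptions, not the full-wave simulation the authors performed; the aperture pattern and propagation distance are illustrative.

```python
import numpy as np

def angular_spectrum(field, wavelength, dx, z):
    """Propagate a complex scalar field over distance z (meters) with the
    angular-spectrum method: FFT, multiply by the free-space transfer
    function, inverse FFT. Evanescent components are discarded."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)                 # spatial frequencies
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * kz * z) * (arg > 0)          # drop evanescent waves
    return np.fft.ifft2(np.fft.fft2(field) * H)

# Toy example at 10.5 GHz: a hollow-plate stand-in propagated 10 cm.
wavelength = 3e8 / 10.5e9                        # about 28.6 mm
aperture = np.zeros((128, 128), dtype=complex)
aperture[48:80, 48:80] = 1.0                     # illustrative aperture pattern
E_at_metasurface = angular_spectrum(aperture, wavelength, dx=6e-3, z=0.1)
```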
4.2 SSIM Function
The SSIM is an index for measuring the similarity of two images, mainly used to compare two images of the same size or to quantify the degree of image distortion. The SSIM calculation is based on three comparative measures between samples $x$ and $y$: luminance, contrast, and structure,

$$\mathrm{SSIM}(x, y) = l(x, y)\, c(x, y)\, s(x, y).$$

Luminance is measured using the average gray level of the image. The luminance comparison function is a function of the means $\mu_x$ and $\mu_y$,

$$l(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}.$$

Contrast is measured using the standard deviation of the image. The contrast comparison function is a function of $\sigma_x$ and $\sigma_y$,

$$c(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}.$$

The structure term compares each image normalized by its own standard deviation. The structure comparison function is a function of $(x - \mu_x)/\sigma_x$ and $(y - \mu_y)/\sigma_y$, which reduces to

$$s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3},$$

where $\sigma_{xy}$ is the cross-covariance of $x$ and $y$, and $C_1$, $C_2$, and $C_3$ are small constants that stabilize the divisions.
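A direct NumPy transcription of these terms, evaluated over a single global window, is given below. The paper does not state its window scheme or stabilizing constants, so the standard defaults $C_1 = (0.01L)^2$, $C_2 = (0.03L)^2$, and $C_3 = C_2/2$ are assumptions.

```python
import numpy as np

def ssim(x, y, L=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equal-size images with dynamic range L.
    Uses the common constants C1 = (k1*L)^2, C2 = (k2*L)^2, C3 = C2/2."""
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    c3 = c2 / 2
    mx, my = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = ((x - mx) * (y - my)).mean()            # cross-covariance
    l = (2 * mx * my + c1) / (mx**2 + my**2 + c1)   # luminance term
    c = (2 * sx * sy + c2) / (sx**2 + sy**2 + c2)   # contrast term
    s = (sxy + c3) / (sx * sy + c3)                 # structure term
    return l * c * s

a = np.random.rand(64, 64)
print(ssim(a, a))                                 # identical images -> 1.0
```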
4.3 Training and Experiment Details
The prior DNN was trained with Python (v3.7) and PyTorch (v1.9.0) on a server (NVIDIA Quadro RTX 5000 GPU and Intel(R) Xeon(R) Gold 6226R CPU @ 2.90 GHz with 256 GB RAM). In the experiment, the transmitting source was generated by the VNA (Ceyear 3672L), and the transmitted wave signal was collected using a probe fixed to the sweep platform and processed by the VNA.
Acknowledgments
The work was supported by the National Key Research and Development Program of China (Grant Nos. 2022YFA1404704, 2022YFA1405200, and 2022YFA1404902), the National Natural Science Foundation of China (NNSFC) (Grant Nos. 61975176 and 62071423), the Key Research and Development Program of Zhejiang Province (Grant Nos. 2022C01036 and 2024C01160), the Natural Science Foundation of Zhejiang Province (Grant No. LR23F010004), the Top-Notch Young Talent of Zhejiang Province, and the Fundamental Research Funds for the Central Universities.
Min Huang received the BS degree from Southwest Jiaotong University, Chengdu, China, in 2020, and the PhD from Zhejiang University, Hangzhou, China, in 2025. Her research interests include metasurfaces, intelligent electromagnetic wave manipulation, and neural networks.
Bin Zheng received the PhD from Zhejiang University, Hangzhou, China, in 2015. He is a professor with the College of Information Science and Electronic Engineering, Zhejiang University. He has been awarded the Top-Notch Young Talent of China, and the Excellent Youth Foundation of Zhejiang Province. His research interests include transformation optics, metamaterials, metasurfaces, and invisibility cloaks.
Ruichen Li received the BS degree from Xidian University, Xi’an, China, in 2020, and the PhD from Zhejiang University, Hangzhou, China, in 2025. Her research interests include electromagnetic invisibility, artificial electromagnetic materials, and metasurfaces.
Yijun Zou received the BS degree from Zhejiang University, Hangzhou, China, in 2020. He is currently pursuing the PhD with the School of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China. His research interests include metasurfaces, holographic imaging, and machine learning.
Xiaofeng Li received the BS degree from Air Force Engineering University, Xi’an, China, in 2019, and the PhD from the same university, in 2024. His research interests include metasurfaces, holographic imaging, and novel transmit/reflect array antennas.
Chao Qian received a PhD from Zhejiang University, Hangzhou, in 2020. He was a visiting PhD student at California Institute of Technology from 2019 to 2020. In 2021, he joined the Zhejiang University/University of Illinois at Urbana-Champaign Institute as an assistant professor. His research interests include metamaterials/metasurfaces, electromagnetic scattering, inverse design, and deep learning.
Huan Lu received the MS degree from Xidian University, Xi’an, China, in 2019, and the PhD from Zhejiang University, Hangzhou, China, in 2023. Her research interests include metamaterials, electromagnetic scattering, artificial intelligence, and intelligent electromagnetic cloaks.
Rongrong Zhu received the PhD from the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China, in 2019. She is currently an assistant professor at Hangzhou City University. Her main research interest is intelligent metasurfaces.
Hongsheng Chen (fellow, IEEE) received the BSc and PhD degrees from Zhejiang University, Hangzhou, China, in 2000 and 2005, respectively. In 2005, he joined the College of Information Science and Electronic Engineering, Zhejiang University, and was promoted to full professor in 2011. In 2014, he was honored with the distinguished Cheung-Kong Scholar award. Currently, he serves as the Dean of College of Information Science and Electronic Engineering, Zhejiang University. He received the National Science Foundation for Distinguished Young Scholars of China in 2016, and the Natural Science Award (first class) from the Ministry of Education, China, in 2020. His current research interests are in the areas of metamaterials, invisibility cloaking, transformation optics, and topological electromagnetics.
Min Huang, Bin Zheng, Ruichen Li, Yijun Zou, Xiaofeng Li, Chao Qian, Huan Lu, Rongrong Zhu, Hongsheng Chen, "Real-time all-directional 3D recognition and multidistortion correction via prior diffraction neural networks," Adv. Photon. 7, 056005 (2025)
Category: Research Articles
Received: Jan. 22, 2025
Accepted: Jul. 15, 2025
Published Online: Aug. 21, 2025
The Author Email: Rongrong Zhu (rorozhu@hzcu.edu.cn), Hongsheng Chen (hansomchen@zju.edu.cn)