Gradient-descent optimization of metasurfaces based on one deep-enhanced ResNet

Yi Xu; Fu Li; Jianqiang Gu; Quan Xu; Zhen Tian; Jiaguang Han; Weili Zhang

doi:10.3788/COL202523.083601

1. Introduction

As artificial planar materials composed of sub-wavelength structures with tailored electromagnetic properties, metasurfaces offer powerful and flexible control over electromagnetic responses, including amplitude, phase, polarization, and frequency. Metasurfaces have recently attracted intense research interest because their performance and ease of fabrication surpass those of many conventional devices^[1]. Just as the properties of natural materials depend on their atoms and molecules and on how these are arranged, the properties of a metasurface are determined by its constituent units (hereafter denoted as meta-atoms) and their arrangement^[2]. Metasurfaces have facilitated the development of diverse functional devices, such as polarizers^[3], waveplates^[4], beam deflectors^[5], metalenses^[6], vector beam generators^[7], and holograms^[8].

The common metasurface design methodology is a trial-and-error process that relies on brute-force simulations to solve Maxwell’s equations, which has severely hampered the development of the field. Moreover, as the number of meta-atom design parameters increases, the efficiency of conventional electromagnetic simulation based on the finite-difference time-domain or finite-element method decreases significantly. Currently, the most commonly used scheme is to scan the space of all meta-atom design parameters with a fixed step size until the desired criteria are met. Some researchers have attempted to integrate optimization algorithms to accelerate the search for suitable parameters, including particle swarm optimization^[9], the ant colony algorithm^[10], and topological optimization^[11], yet these approaches remain inefficient because they still depend on extensive, time-consuming electromagnetic simulations. Machine learning, especially deep learning, has revolutionized fields such as text generation^[12], image recognition^[13], object classification^[14], and speech translation^[15]. Unlike classical algorithms, deep learning usually learns under supervision from a large dataset. By constantly adjusting its trainable parameters during training to establish an abstract nonlinear relationship between the inputs and outputs of each batch of data, a neural network eventually learns to extract features. This provides a powerful tool for metasurface design that reduces the reliance on professional physical insight and experience^[16].

Currently, a number of deep learning-based schemes have been used to achieve end-to-end metasurface design, including the variational auto-encoder (VAE)^[17], the generative adversarial network (GAN)^[18,19], and statistical machine learning^[20], which directly connect the structural parameters of each meta-atom with the target metasurface performance. Most of them incorporate several deep learning models that implement the distinct functions of discrete modules, such as feature extraction and representation, forward prediction, and inverse retrieval. These models are then integrated into a loop, which increases configuration complexity and thus training cost. It also makes problems hard to diagnose when the network fails to converge.

To find a solution as straightforward as possible to the metasurface design problem, we propose an end-to-end paradigm based on only one deep learning model that implements both the forward and inverse processes. This model is a deep-enhanced residual neural network that maps meta-atom structures to their electromagnetic responses, enabling accurate forward prediction. We employ one-hot encoding to label and parameterize multiple categories of meta-atoms in a unified way. Unlike image representation^[18,21], this encoding ensures that all patterns conform to micro-nano fabrication requirements. We present an inverse method grounded in the gradient-descent algorithm for the global optimization of metasurfaces. By treating the entire metasurface as the design unit and the light field distribution under arbitrary polarizations and frequencies as the design target, we can optimize the design efficiently. Specifically, with the well-trained neural network fixed in place, the derivatives of the difference between the actual and target responses with respect to the parameters of every meta-atom are calculated simultaneously by leveraging the parallel computing capabilities of graphics processing units (GPUs). As proof-of-concept applications, we experimentally demonstrate several multifunctional metasurface devices with different types of meta-atoms, including a polarization-independent vortex beam generator, a polarization-multiplexing hologram, and a polarization/frequency-multiplexing metalens. The consistency between the network output and the measured results confirms the reliability and universality of our design scheme and meta-atom configuration. The scheme is poised to advance the intelligent, efficient, and generalized design of metasurfaces, thereby fostering the broader application of deep learning in fields such as meta-optics and on-chip computation.

2. Results

2.1. Design principle

Figure 1 illustrates the proposed end-to-end metasurface design framework with a deep learning model as the core. This data-driven model is carefully constructed to deterministically predict the optical response of a given meta-atom. The training details of this model are given in Sec. 2.2. In the forward process, the structural class and parameters of the meta-atom are fed into the network to yield the spectrum within the desired frequency band. In the inverse design, given the target transmission responses at a few specific frequency points and polarization states, the optimal geometry of each meta-atom, rather than the required phase and amplitude at each spatial location of the metasurface, is solved directly by minimizing the objective function via the gradient-descent algorithm. A typical design process starts from the intended function of the target metasurface. The target far-field spectra of each meta-atom involved in the loss computation are selected and derived using the Rayleigh–Sommerfeld diffraction integral formula. On the other end of the loop, in the first optimization cycle the classes and structural parameters of all meta-atoms are randomly generated within the design parameter space. They are then input into the well-trained forward prediction neural network to obtain the current spectra, which are compared with the target spectra to compute the loss. With the trained network parameters frozen, the loss is differentiated backward layer by layer, so the partial derivatives with respect to the category and dimension variables of each meta-atom are obtained through the chain rule. Once the gradients are available, every meta-atom on the metasurface is updated in the same step. The updated parameters enter the next iteration, and the cycle repeats until the defined objectives are reached.
Because gradient descent often becomes trapped in a local optimum, every 20 steps we randomly resample the meta-atom with the largest loss.
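The optimization loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the trained ResNet is replaced by a hypothetical one-parameter analytic `surrogate`, the gradient is written out by hand instead of being obtained by automatic differentiation, and the names `surrogate`, `loss_and_grad`, and `optimize` are invented for this example. Keeping a snapshot of the best design is also an addition of this sketch, so that a resampling step cannot discard a good solution.

```python
import math
import random

def surrogate(p):
    """Hypothetical frozen forward model: parameter p in [0, 1] -> (Re t, Im t)."""
    amp = 0.5 + 0.5 * p
    phase = 2.0 * math.pi * p
    return amp * math.cos(phase), amp * math.sin(phase)

def loss_and_grad(p, target):
    """Squared error to the target response and its derivative with respect to p."""
    amp = 0.5 + 0.5 * p
    phase = 2.0 * math.pi * p
    err_re = amp * math.cos(phase) - target[0]
    err_im = amp * math.sin(phase) - target[1]
    # Chain rule through the surrogate (autodiff would do this for a real network).
    dre = 0.5 * math.cos(phase) - amp * 2.0 * math.pi * math.sin(phase)
    dim = 0.5 * math.sin(phase) + amp * 2.0 * math.pi * math.cos(phase)
    loss = err_re ** 2 + err_im ** 2
    grad = 2.0 * (err_re * dre + err_im * dim)
    return loss, grad

def optimize(targets, steps=400, lr=0.01, resample_every=20, seed=0):
    """Update all meta-atoms together; periodically resample the worst one."""
    rng = random.Random(seed)
    params = [rng.random() for _ in targets]      # random initial design
    best_params, best_total = list(params), float("inf")
    for step in range(1, steps + 1):
        results = [loss_and_grad(p, t) for p, t in zip(params, targets)]
        losses = [r[0] for r in results]
        total = sum(losses)
        if total < best_total:                    # snapshot of the best design so far
            best_params, best_total = list(params), total
        # One gradient-descent step for every meta-atom, clipped to the design range.
        params = [min(1.0, max(0.0, p - lr * g))
                  for p, (_, g) in zip(params, results)]
        if step % resample_every == 0:
            # Resample the meta-atom that had the largest loss at this step.
            worst = max(range(len(params)), key=lambda i: losses[i])
            params[worst] = rng.random()
    return best_params, best_total
```

In the real framework the per-meta-atom gradients would come from backpropagation through the frozen network, evaluated for the whole metasurface as one batch on a GPU.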

Figure 1. Schematic diagram of the proposed metasurface design framework based on only one forward neural network.

Thanks to the parallel capacity of GPUs and automatic differentiation^[22], the gradients with respect to arbitrary input parameters are computed quickly and efficiently. The meta-atoms are all-silicon structures operating in transmission mode, each containing between one and three pillars along the horizontal and vertical directions within one lattice. Each pillar sits at the center of its equal share of the lattice area. Meta-atoms of this kind, which harness interference and coupling among adjacent pillars within a unit, have been deployed to increase the modulation degrees of freedom and thereby the design flexibility^[23]. Hereafter, we refer to meta-atoms with the same number and arrangement of pillars as the same class; there are nine classes in total, as shown in Fig. 2(a). We selected three representative classes of meta-atoms, and their amplitude-phase coverage at 0.7 and 1.0 THz is shown in Fig. 2(b). The results indicate that they have distinct amplitude-phase distributions, which together cover a large part of the amplitude-phase space. After each cycle of updating the category and structural parameters, it is necessary to check whether the updated meta-atom still belongs to its original category (in this paper, we use the Softmax function to judge the category). If it does not, its category and structural parameters are resampled. Similarly, meta-atoms that have been trapped in a local optimum are resampled randomly.
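The category check can be illustrated as follows. The text states only that the Softmax function is used to judge the category; taking the class with the largest Softmax probability over the nine category variables is an assumption of this sketch, and the function names are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable Softmax over a list of numbers."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def predicted_class(category_vars):
    """Index of the class with the highest Softmax probability (assumed rule)."""
    probs = softmax(category_vars)
    return max(range(len(probs)), key=probs.__getitem__)

def needs_resampling(category_vars, original_class):
    """True if the gradient update has pushed the meta-atom out of its class."""
    return predicted_class(category_vars) != original_class
```

For example, category variables that started as the one-hot vector of class 3 and drifted to `[0.1, 0.0, 0.2, 0.4, 0.7, 0.0, 0.0, 0.0, 0.0]` would now be judged as class 4, so that meta-atom would be resampled.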

Figure 2. Illustration of the all-silicon meta-atom library. (a) Schematic diagram of meta-atoms for nine classes of silicon pillars. (b) Transmissive amplitude-phase spectra of 5000 randomly sampled meta-atoms of structure types 1 × 1 (top row), 1 × 2 (middle row), and 1 × 3 (bottom row) under x- and y-polarized incidences at frequencies of 0.7 and 1.0 THz, respectively.

2.2. Design and evaluation of forward neural networks

The architecture and settings of the proposed deep learning model are illustrated in Fig. 3(a). This deep-enhanced feed-forward residual neural network consists of 19 fully connected layers, including 15 residual blocks. Each residual block uses a skip connection to bypass two fully connected layers, and the block input is summed element-wise with the output of the residual path. We introduce these residual blocks to mitigate vanishing and exploding gradients and to ease information flow in such a deep model. The input layer takes a tensor of size 27, comprising 9 category entries and 18 geometry variables; the categories use one-hot encoding, as detailed in the encoding table in Sec. 2 of the Supplementary Material. The network outputs a $4004 \times 1$ tensor containing the real and imaginary parts of the complex transmission coefficients for $x$- and $y$-polarized incidence, sampled from 0.3 to 1.2 THz at an interval of 0.0009 THz. The total dataset contains 170,747 samples after data augmentation by mirroring the structures and is divided into training, validation, and test sets with a 1:1:1 split. Supervised training minimizes the mean squared error (MSE) between the network prediction and the ground truth provided by full-wave simulations. The neural network is trained with a learning rate of 0.0001 and a batch size of 64. A stair-step learning rate decay with a factor $γ$ of 0.9 every 100 epochs is also employed. The last activation function is LeakyReLU, which allows the output to take negative values. All other activation functions are ReLU to make the network converge faster. In addition, we quantify and compare three scales of the model, referred to as base, medium, and large, which differ only in the dimensionality of all hidden layers: 500, 2000, and 5000, respectively.
The training and validation losses under the same dataset and hyperparameters are shown in Fig. 3(b). As the dimensionality of the hidden layers increases, the accuracy of the network improves. The minimum validation loss of the large model is $\sim 4 \times 10^{- 6}$ after $4 \times 10^{6}$ epochs. To the best of our knowledge, this is the highest accuracy reported to date for a model trained on both real and imaginary parts. We randomly selected two sets of meta-atom parameters from the test set and computed the corresponding electromagnetic spectra using the trained large model. As shown in Fig. 3(c), the network predictions overlap almost perfectly with the ground truth, confirming the high precision of the network. More random sampling results from the test set are given in Sec. 4 of the Supplementary Material.
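The input and output sizes and the learning rate schedule quoted above can be checked with a short sketch. Several points are assumptions of this example rather than details from the paper: the actual encoding table is in the Supplementary Material, so placing the 9 one-hot entries first and zero-padding unused geometry slots is only one plausible layout; the count of 1001 frequency points is inferred from the 0.0009 THz interval over 0.3 to 1.2 THz; and all function names are hypothetical.

```python
NUM_CLASSES = 9     # one-hot category entries
NUM_GEOMETRY = 18   # geometry variables

def encode_meta_atom(class_index, geometry):
    """Build the 27-entry input: one-hot class followed by padded geometry."""
    if not 0 <= class_index < NUM_CLASSES:
        raise ValueError("class_index out of range")
    if len(geometry) > NUM_GEOMETRY:
        raise ValueError("too many geometry variables")
    one_hot = [0.0] * NUM_CLASSES
    one_hot[class_index] = 1.0
    # Assumed layout: unused geometry slots are filled with zeros.
    padded = list(geometry) + [0.0] * (NUM_GEOMETRY - len(geometry))
    return one_hot + padded

def frequency_grid(f_start=0.3, f_stop=1.2, num_points=1001):
    """Frequencies in THz; 1001 points give a 0.0009 THz interval."""
    step = (f_stop - f_start) / (num_points - 1)
    return [f_start + i * step for i in range(num_points)]

def output_length(num_points=1001):
    """Real and imaginary parts for two polarizations at every frequency."""
    return num_points * 2 * 2

def step_lr(epoch, base_lr=1e-4, gamma=0.9, step_size=100):
    """Stair-step decay: multiply the rate by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)
```

With these assumptions, `output_length()` reproduces the stated output size of 4004, and `step_lr(250)` gives the learning rate after two decay steps.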

Figure 3. Framework and characterization of the deep learning model. (a) Components and hyperparameters of each layer of the neural network. (b) Training and validation loss curves for three model configurations: base, medium, and large. (c) True and predicted amplitude and phase values for two randomly selected parameter sets from the test dataset.

2.3. Design and experimental demonstration of versatile metasurfaces

To verify the efficacy and universality of the proposed single-model end-to-end metasurface design framework and the multi-class meta-atom library, we designed and demonstrated several metasurface devices with different functions in the terahertz band. The first prototype is a polarization-multiplexing holographic metasurface working at 1.0 THz. The target holograms of the letters “A” and “I” are displayed at a distance of $z = 10\ \mathrm{mm}$ from the metasurface under $x$- and $y$-polarized terahertz incidence, respectively.

Holographic metasurface design typically involves the following sequential steps: 1) Solve for the desired amplitude and phase of each meta-atom on the metasurface corresponding to the given target pattern using the inverse Rayleigh–Sommerfeld diffraction integral formula. 2) Identify the meta-atom in the available library that most closely approximates the calculated amplitude and phase. 3) Employ the Gerchberg–Saxton (GS) algorithm^[24] or alternative optimization techniques^[25] to enhance the holographic performance when the library lacks meta-atoms with the requisite amplitude-phase combinations. In contrast, once the design requirements and hyperparameters are set, the proposed end-to-end paradigm automatically optimizes each meta-atom and arranges them into the whole metasurface based on the amplitude-phase coverage of the meta-atom library. Detailed information on the paradigm-optimized configuration is given in Sec. 5 of the Supplementary Material. It is worth emphasizing that even when designing the same pattern under identical conditions, the paradigm produces a different design on nearly every run while maintaining high accuracy. The metasurface designed by the neural network, which contains $50 \times 50$ units, is used directly for fabrication. The overall scanning electron microscope (SEM) image of the fabricated metasurface and a local magnification are shown in Fig. 4(a). We characterized the fabricated samples using a home-built terahertz time-domain near-field scanning imaging system. The intensity and phase of the incident terahertz spot, used as the reference, are illustrated in Fig. 4(b). Details of the sample fabrication and experimental setup are described in Appendix A. The paradigm-optimized, electromagnetically simulated, and experimentally measured results are depicted in Fig. 4(c) and compared with the target pattern.
The measured results achieve the target functions while remaining highly consistent with the results designed by our end-to-end framework. Specifically, the holographic metasurface generates the distinct letter patterns “A” and “I” under $x$- and $y$-polarized incidence, respectively, in good accordance with the design objectives. The measured efficiencies are 8.8% for the “A” projection under $x$-polarized incidence and 10.5% for the “I” projection under $y$-polarized incidence, where the efficiency is defined as the ratio of the total power at the image plane to that of the incident beam within the same $8.0\ \mathrm{mm} \times 8.0\ \mathrm{mm}$ square area. The fidelities of the paradigm-optimized patterns “A” and “I” are 0.013 and 0.010, while the experimental measurements yield corresponding values of 0.054 and 0.050. This correspondence between network predictions and experimental outcomes indicates good agreement with the design specifications. (Detailed information can be found in Sec. 8 of the Supplementary Material.) Further, the results indicate that the proposed paradigm has an advantage over the GS algorithm: the paradigm can optimize both amplitude and phase according to the coverage of the library, whereas the GS algorithm is suited to phase-only modulation, in which the input plane has a uniform amplitude distribution (i.e., amplitude = 1) and only the phase distribution is optimized to achieve the desired intensity profile at the output plane.
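The efficiency definition above amounts to a ratio of two summed intensity maps. The sketch below assumes that both maps were scanned on the same pixel grid over the same window, so the pixel area cancels; the function name and the toy numbers are illustrative only and are not measured data.

```python
def efficiency(image_intensity, incident_intensity):
    """Total power in the image plane divided by the incident power.

    Both arguments are 2D lists of intensity samples taken on the same
    grid over the same square window (an assumption of this sketch).
    """
    if len(image_intensity) != len(incident_intensity) or any(
        len(a) != len(b) for a, b in zip(image_intensity, incident_intensity)
    ):
        raise ValueError("intensity maps must have the same shape")
    image_power = sum(sum(row) for row in image_intensity)
    incident_power = sum(sum(row) for row in incident_intensity)
    if incident_power <= 0:
        raise ValueError("incident power must be positive")
    return image_power / incident_power

# Toy 2 x 2 maps: image power 10 against incident power 40.
print(efficiency([[1, 2], [3, 4]], [[10, 10], [10, 10]]))  # prints 0.25
```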

Figure 4. Design and characterization results of the designed holographic metasurface. (a) The SEM image of the whole sample and the magnification of the selected area. (b) The intensity and phase of the incident terahertz beams scanned without samples at 1.0 THz. (c) The target, paradigm-optimized, electromagnetically simulated, and experimentally measured results of the hologram for x- (top row) and y-polarizations (bottom row), respectively.

We also designed and fabricated a polarization-independent focused vortex beam generator metasurface. As with the holographic metasurface, we first derived the ideal phase of each meta-atom constituting the metasurface according to Eq. (1), and the design paradigm then searched for structures that meet the requirements. Equation (1) reads

$$φ = φ_{vortex} (x, y) + φ_{lens} (x, y), \tag{1}$$

where $φ_{vortex} (x, y) = l θ (x, y)$ is the helical phase distribution for vortex beam generation and $φ_{lens} (x, y) = \frac{ω}{c} (\sqrt{x^{2} + y^{2} + f^{2}} - f)$ is the phase distribution of the focusing lens. Here, $(x, y)$ are the Cartesian coordinates of a meta-atom, $θ (x, y)$ is the corresponding azimuthal angle in polar coordinates, $l$ is the topological charge of the vortex beam, $c$ is the speed of light in vacuum, $ω$ is the angular frequency, and $f$ is the focal length. The metasurface consists of $51 \times 51$ meta-atoms with a lattice size of 200 µm and is designed to generate a second-order vortex beam ($l = + 2$) at $f = 8\ \mathrm{mm}$ for both $x$- and $y$-polarized incidence at 0.7 THz. The target and paradigm-optimized phase profiles of the metasurface under $x$- and $y$-polarizations are depicted in Fig. 5(a). We also fabricated the metasurface; the overall SEM image and a local magnification are shown in Fig. 5(b). The magnified SEM image clearly shows that the polarization-independent vortex metasurface designed by the proposed paradigm does not consist entirely of $C_{4}$-symmetric meta-atoms, whereas traditional methods usually prefer $C_{4}$-symmetric meta-atoms when designing polarization-independent devices^[26,27]. This result challenges the conventional understanding of how dielectric pillars modulate electromagnetic waves.
For further investigation, we performed simulation verification and experimental characterization, and the results are shown in Figs. 5(d) and 5(e), respectively. Figure 5(c) shows the scanning results of the terahertz beam without the sample as the reference. The simulated and experimental results consistently demonstrate that the designed vortex metasurface achieves the intended function. The measured efficiencies are 52.5% for $x$-polarized incidence and 33% for $y$-polarized incidence. The purities of the target orbital angular momentum (OAM) mode reach 93.3% for $x$ polarization and 90.6% for $y$ polarization, far exceeding the weights of the other OAM modes, which demonstrates the high fidelity of the proposed vortex beam generator. (Detailed information can be found in Sec. 8 of the Supplementary Material.) Note that the experiment achieved optimal performance at $z = 8.5\ \mathrm{mm}$ rather than at the designed focal length of 8 mm. This slight deviation is attributed to imperfect experimental conditions and sample defects.
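The target phase of Eq. (1) can be evaluated directly on the meta-atom lattice. The sketch below uses the values quoted in the text ($l = 2$, $f = 8$ mm, 0.7 THz, $51 \times 51$ meta-atoms, 200 µm lattice). Centering the lattice on the origin and wrapping the phase to $[0, 2π)$ are assumptions of this example, and the function names are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def target_phase(x, y, l=2, f=8e-3, freq=0.7e12):
    """Eq. (1): helical phase plus focusing-lens phase, wrapped to [0, 2*pi)."""
    omega = 2.0 * math.pi * freq
    phi_vortex = l * math.atan2(y, x)   # l * theta(x, y)
    phi_lens = omega / C * (math.sqrt(x * x + y * y + f * f) - f)
    return (phi_vortex + phi_lens) % (2.0 * math.pi)

def phase_map(n=51, pitch=200e-6, **kwargs):
    """Target phase at every meta-atom of an n x n lattice centered on the origin."""
    half = (n - 1) / 2
    return [
        [target_phase((ix - half) * pitch, (iy - half) * pitch, **kwargs)
         for ix in range(n)]
        for iy in range(n)
    ]
```

Because $l = 2$, two points placed symmetrically about the center receive the same target phase; for $l = 1$ they would differ by $π$.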

Figure 5. Design and characterization results of the designed vortex metasurface. (a) Target and paradigm-optimized phases of the metasurface for x- and y-polarizations, respectively. (b) The SEM image of the whole sample and magnification of the selected area. (c) The intensity and phase of terahertz beams scanned without samples at 0.7 THz. (d) The phase and intensity of the xoy plane at z = 8 mm, as well as the intensity of xoz and yoz sections calculated by electromagnetic simulation software for x-polarization (top row) and y-polarization (bottom row), respectively. (e) The phase and intensity of the xoy plane at z = 8.5 mm, as well as the intensity of xoz and yoz sections measured experimentally for x-polarization (top row) and y-polarization (bottom row), respectively.

3. Discussion and Conclusion

In summary, we have proposed a deep learning-based end-to-end metasurface design paradigm and experimentally demonstrated its application to versatile metasurfaces. Compared with previous deep learning methods, this work relies on only one neural network, a deep-enhanced fully connected ResNet, which serves both as a surrogate for electromagnetic simulation software in forward spectrum prediction and as the path for gradient descent in backward computation. By integrating this model into the optimization loop, the structures of all meta-atoms on the metasurface can be inversely retrieved automatically and straightforwardly. One-hot encoding is introduced to label and parameterize multiple categories of meta-atoms as a one-dimensional tensor in a unified way, enabling multi-class design within the same paradigm. A phase-modulated vortex generator and an amplitude-phase-modulated hologram metasurface working at terahertz frequencies are designed as prototypes, and the close consistency between the measured results and the preset targets demonstrates the capability of the proposed paradigm. The streamlined design paradigm proposed here will expedite the research and development (R&D) of terahertz metasurfaces. Furthermore, the universality of our model will also accelerate the development of planar optical devices in other wavelength bands, thereby fostering a deeper integration of artificial intelligence (AI) and optics.

Category: Research Articles

Received: Jan. 24, 2025

Accepted: Apr. 3, 2025

Published Online: Jul. 15, 2025

The Author Email: Jianqiang Gu (gjq@tju.edu.cn), Weili Zhang (weili.zhang@okstate.edu)

CSTR:32184.14.COL202523.083601
