Chinese Optics Letters, Volume 23, Issue 8, 083401 (2025)

Enhanced profile reconstruction of small-angle X-ray scattering measurement via correlation learning

Hairui Yang1,2,3, Zhaolong Wu2, and Hong Yu1,2,*
Author Affiliations
  • 1Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, Shanghai 201800, China
  • 2Zhangjiang Laboratory, Shanghai 201210, China
  • 3University of Chinese Academy of Sciences, Beijing 100049, China

    Small-angle X-ray scattering (SAXS) is a promising metrology technology for complex nanostructures in semiconductor manufacturing. However, parameter reconstruction based on SAXS measurement often faces challenges in achieving high precision and repeatability due to the increasing complexity of structures and the demands for precise measurement. To address these challenges, a correlation learning-based method is proposed to enhance the accuracy and reduce the uncertainty of the profile reconstruction in SAXS measurement. This method leverages the long short-term memory (LSTM) mechanism to capture and learn inherent parameter correlation effectively. The precision and reliability of the proposed method are demonstrated through the simulations of synthetic Si gratings. Our method exhibits remarkable measurement accuracy with an improvement of at least 13.9%, and the measurement repeatability is nearly 1.4 times higher compared to the previous learning-based methods. We expect that our approach will provide a novel solution for SAXS measurement, enabling accurate and reliable profile reconstruction of nanostructures.

    1. Introduction

    Small-angle X-ray scattering (SAXS), as a non-contact, non-destructive, and high-resolution measurement scheme[1,2], has gradually become one of the most important techniques for measuring diverse nanostructures, especially in materials science and biology[3–5]. It is also applied to inline critical dimension metrology in semiconductor manufacturing to ensure product quality and optimize the fabrication process[6–8]. However, SAXS measurement is a typical ill-posed inverse problem[9] and is not straightforward, as it involves a complicated process of estimating structural parameters from the scattering patterns[10,11]. Low signal-to-noise ratios often lead to inaccurate or even erroneous reconstruction results and high uncertainty[12]. These drawbacks become more severe as the structures grow more complex[13], which seriously interferes with the correct understanding of the nanostructures and the fabrication process. Thus, it is crucial to employ methods that enhance measurement accuracy and reduce uncertainty[12].

    Incorporating certain priors into the reconstruction process can improve the measurement precision in various semiconductor measurement techniques[11,12,14,15]. This strategy is also beneficial for improving reconstruction accuracy and reducing uncertainty in the SAXS inverse problem, for example by providing a fin-shaped mirrored prior[16,17]. However, conventional iteration-based methods[3,7,11,18,19], despite incorporating certain prior knowledge, still suffer from inherent inefficiency because they must repeatedly call the predefined forward physical model to calculate the scattering patterns and assess many candidate solutions[13,20]. Deep learning-based methods have gained significant attention in semiconductor metrology[21–24] due to their exceptional capabilities in rapid inference and robust feature extraction[25]. Recently, researchers have extended their application to parameter reconstruction for nanostructured samples in SAXS measurement, including the tilt of memory hole structures, the width and height of simple trapezoidal gratings, and the profile of complex profile gratings[22,26,27]. These approaches can be regarded as model-dependent neural networks[28], providing prior knowledge from the perspective of the physical forward model. These successes provide a valuable insight: by integrating reliable prior knowledge into the reconstruction algorithm, or by guiding the algorithm to learn such prior knowledge, the accuracy and reliability of the reconstruction process can be greatly enhanced[29]. In semiconductor manufacturing, the etching process often leads to the aspect ratio-dependent etching (ARDE) effect[30] or significant tilt variation near the edge of the wafer[31]. These phenomena also result in strong nonlinear inter-layer correlation for the nanostructure[32]. Hence, guiding the reconstruction process to learn this correlation is a potential enhancement scheme.

    In this work, we propose a correlation learning-based method for profile reconstruction in SAXS measurement. Utilizing the long short-term memory (LSTM) mechanism[33], our approach can actively learn the inter-layer correlation of profile structural parameters. Based on this characteristic, we design and evaluate a novel network model to reconstruct the profile parameters of synthetic complex Si gratings. The analysis results demonstrate that this learning-based approach significantly enhances accuracy and reduces uncertainty in nanostructure profile measurement.

    2. Theory and Method

    2.1. X-ray scattering principle

    SAXS measurement can extract the structural parameters of nanostructured samples; its schematic diagram is shown in Fig. 1. Figure 1(a) illustrates the fundamental process of small-angle X-ray scattering. A monochromatic X-ray beam impinges upon the sample at an angle of incidence (AOI), and the scattering pattern within a small-angle region is recorded by a detector. After exposures at multiple AOIs, the recorded scattering patterns are integrated to obtain the scattering map of the sample[16], as shown in Figs. 1(b) and 1(c). According to the kinematical approximation, if the relative electron density $\rho(\mathbf{r})$ of a sample is known, its Fourier transform $F(\mathbf{q})$ is
    $$F(\mathbf{q})=\int\rho(\mathbf{r})\exp(-i\mathbf{q}\cdot\mathbf{r})\,\mathrm{d}\mathbf{r},\tag{1}$$
    where $\mathbf{r}$ is the spatial vector in real space and $\mathbf{q}$ is the transferred wave vector in reciprocal space, i.e., the difference between the scattered wave vector $\mathbf{k}_s$ and the incident wave vector $\mathbf{k}_i$, $\mathbf{q}=\mathbf{k}_s-\mathbf{k}_i$. According to the first-order Born approximation, the scattering map is expressed as $I(\mathbf{q})\propto\langle F(\mathbf{q}),F^{*}(\mathbf{q})\rangle_{\Omega}=|F(\mathbf{q})|^{2}$. Considering the periodicity of the samples, the surface roughness, and the noise, the scattering map is further expressed as[1,16]
    $$I(\mathbf{q})=|F(\mathbf{q})|^{2}S(\mathbf{q})\exp(-q^{2}\sigma_{\mathrm{DW}}^{2})+I_{\mathrm{noise}}(\mathbf{q}),\tag{2}$$
    where $S(\mathbf{q})$ is the structure factor, $\exp(-q^{2}\sigma_{\mathrm{DW}}^{2})$ is the Debye–Waller factor, and $\sigma_{\mathrm{DW}}$ is the root mean square of the surface roughness amplitude. $I_{\mathrm{noise}}(\mathbf{q})$ is the noise, including the background noise and Poisson noise.

    Figure 1. (a) Schematic diagram of the SAXS measurement, (b) scattering patterns of multiple AOIs, and (c) scattering map of the sample.

    Figure 2(a) shows the top and side views of a complex profile grating sample. The sample is characterized by an inter-line distance $L$, which is called the grating pitch. Hence, the structure factor can be expressed as $S(\mathbf{q})=\sum_{a}\delta\left(|q_x|-\frac{2\pi a}{L}\right)$, where $a$ is the diffraction order. In the side view, the sample can be modeled as a stack of rectangles. Its profile is described by a structural profile parameter set $t=\{CD_n,CP_n\}_{n\le N}$, where $N$ is the number of layers in the structure, and $CD_n$ and $CP_n$ are the critical dimension and center position of the $n$th layer, respectively[34].
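    The stack-of-rectangles model lends itself to a compact numerical illustration. The following Python sketch evaluates the kinematic form factor of such a stack in closed form and the noise-free intensity $|F|^{2}\exp(-q^{2}\sigma_{\mathrm{DW}}^{2})$ at the diffraction orders $q_x=2\pi a/L$. It is a minimal sketch under stated assumptions, not the forward model used in this work: the function names, the unit electron density contrast, and the omission of noise and instrument effects are illustrative choices, while the numerical values (75 nm layers, $L$ = 170 nm, $\sigma_{\mathrm{DW}}$ = 2 nm) are those quoted in Secs. 2.1 and 2.3. The sign convention of the Fourier phase does not affect the intensity.

```python
import cmath
import math


def sinc(x):
    """Unnormalised sinc sin(x)/x, with the x -> 0 limit handled explicitly."""
    return 1.0 if abs(x) < 1e-12 else math.sin(x) / x


def stack_form_factor(qx, qz, cds, cps, layer_height):
    """Fourier transform of a stack of rectangles with unit density contrast.

    Layer n has width cds[n], lateral centre cps[n], and spans
    z = n * layer_height .. (n + 1) * layer_height.
    """
    total = 0j
    for n, (cd, cp) in enumerate(zip(cds, cps)):
        z_centre = (n + 0.5) * layer_height
        amplitude = (cd * sinc(qx * cd / 2.0)
                     * layer_height * sinc(qz * layer_height / 2.0))
        total += amplitude * cmath.exp(-1j * (qx * cp + qz * z_centre))
    return total


def bragg_intensity(order, qz, cds, cps, layer_height, pitch, sigma_dw):
    """Noise-free |F|^2 times the Debye-Waller factor at q_x = 2*pi*order/pitch."""
    qx = 2.0 * math.pi * order / pitch
    q_squared = qx * qx + qz * qz
    form_factor = stack_form_factor(qx, qz, cds, cps, layer_height)
    return abs(form_factor) ** 2 * math.exp(-q_squared * sigma_dw ** 2)


# Ideal 40-layer profile: CD = 75 nm and CP = 0 nm in every layer.
ideal_cds = [75.0] * 40
ideal_cps = [0.0] * 40
for order in range(1, 4):
    value = bragg_intensity(order, 0.0, ideal_cds, ideal_cps,
                            layer_height=75.0, pitch=170.0, sigma_dw=2.0)
    qx = 2.0 * math.pi * order / 170.0
    print(f"order {order}: q_x = {qx:.4f} nm^-1, I = {value:.3e}")
```

    Shifting every layer laterally by the same amount changes only a global phase, so the intensity is unchanged; it is the relative CP offsets between layers that shape the scattering map.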

    Figure 2. (a) Top and side views of a complex profile grating sample. The profile is described by a structural parameter set $t=\{CD_n,CP_n\}_{n\le N}$. (b) Various types of phenomena, including tilting, twisting, bowing, tapering, and potential combinations thereof. (c) Kendall τ correlation coefficients for the CDs and CPs. (d),(e) Correlation ellipses for the CDs and CPs, which demonstrate that the parameters of two adjacent layers are more correlated than those of the distant layers.

    Under ideal conditions, the CDs of all layers are equal, and the CPs of all layers are zero. However, in the fabrication processes (e.g., the etching or multi-patterning processes)[35], nanostructures may exhibit various types of phenomena[7], including tilting, twisting, bowing, tapering, and potential combinations thereof, which are depicted in Fig. 2(b). To investigate the impact of these phenomena on the profile parameters, we generated 100,000 samples. The samples consist of basic structures that include the above-mentioned basic profile phenomena[7]. These basic structures are randomly stacked to form different grating samples. Each grating sample is a stack of 40 rectangles, each 75 nm thick. Through interpolation, the rectangle size of each layer varies smoothly with the height of a sample[34]. The mean value of CD at each height is about 75 nm, with a range of ±20 nm, and the mean value of CP is 0 nm, with a range of ±5 nm. The correlation ellipses for the CDs and CPs are shown in Figs. 2(d) and 2(e), which illustrate that the parameters of two adjacent layers are more correlated than those of two distant layers, and that this correlation may exhibit nonlinear characteristics. We calculate the Kendall τ correlation coefficient to quantify this relationship in Fig. 2(c). The further apart two layers are, the smaller the correlation coefficient between their corresponding parameters. The correlation between the CPs of two distant layers even becomes negative. Similar characteristics can also be observed using Spearman correlation analysis. Utilizing this natural knowledge of the inherent inter-layer correlation may offer a way to improve SAXS measurement accuracy and reduce uncertainty.
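    To make the correlation analysis concrete, the following Python sketch computes the Kendall τ coefficient between the same parameter in two different layers across a set of profiles. It is only an illustration: the profile generator `toy_cp_profile` (a random tilt plus a random bow) is a hypothetical stand-in for the sample generator described above, and the τ-a variant without tie correction is an assumed choice.

```python
import random


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                score += 1
            elif product < 0:
                score -= 1
    return score / (n * (n - 1) / 2)


def toy_cp_profile(rng, layers=40):
    """Toy centre-position profile: random tilt plus random bow (assumed model)."""
    tilt = rng.uniform(-5.0, 5.0)
    bow = rng.uniform(-5.0, 5.0)
    heights = [n / (layers - 1) - 0.5 for n in range(layers)]
    return [tilt * z + bow * z * z for z in heights]


rng = random.Random(0)
profiles = [toy_cp_profile(rng) for _ in range(300)]


def layer(index):
    """Collect the CP of one layer across all toy profiles."""
    return [profile[index] for profile in profiles]


tau_adjacent = kendall_tau(layer(0), layer(1))
tau_distant = kendall_tau(layer(0), layer(39))
print(f"tau(layer 0, layer 1)  = {tau_adjacent:.3f}")
print(f"tau(layer 0, layer 39) = {tau_distant:.3f}")
```

    In this toy model, a tilt moves the top and bottom layers in opposite directions, which is one simple way in which distant CPs can become negatively correlated while adjacent layers remain strongly positively correlated.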

    2.2. Correlation learning-based method

    We propose a novel method that learns the inter-layer correlation to enhance SAXS measurement accuracy and reduce uncertainty. For the SAXS inverse problem of reconstructing structural parameters from the scattering map, the scattering map (input) can be regarded as an angular series, while the structural parameters (output) can be seen as a layered series. The LSTM mechanism can effectively convey and express information across different series and easily learn the inherent correlations within each series[36]. Hence, using the LSTM mechanism to solve this inverse problem has distinct advantages. Specifically, the LSTM achieves this through gate structures. The gate structures enable selective transmission of information by passing it through a Sigmoid function layer followed by point-by-point multiplication. The Sigmoid function regulates the weight of information exchange between neural layers. There are three types of gate structures in an LSTM cell, namely, the forget gate, input gate, and output gate.

    The forget gate determines which information from the last state needs to be forgotten. The forget gate takes the hidden state $H_{t-1}$ from the last state and the input information $x_t$ of the current state, and then outputs a weight $f_t$ for the last cell state $C_{t-1}$. The Sigmoid function limits $f_t$ to between 0 and 1, which sets the extent to which the last cell state $C_{t-1}$ is forgotten. In this way, the LSTM can retain important information for a long period, and the memory can be dynamically adjusted with the current input information $x_t$. The weight $f_t$, as well as its forgetting effect $k_t$ on $C_{t-1}$, can be written as
    $$\begin{cases}f_t=\sigma(w_f[H_{t-1},x_t]+b_f)\\k_t=C_{t-1}\odot f_t\end{cases},\tag{3}$$
    where $w_f$ and $b_f$ are the weights and biases of the forget gate, respectively, and $\odot$ is the Hadamard product.

    The input gate determines how many components of the current input information $x_t$ are added into the cell. The tanh function extracts the valid information from $x_t$. Using the Sigmoid function, the input gate controls the extent $i_t$ of the valid information. In other words, the input gate filters the extracted valid information and scores each component from 0 to 1. The higher the score, the more of the component $j_t$ enters the current cell state $C_t$. The extent $i_t$ of the valid information and the component $j_t$ added into the cell from the current input information $x_t$ are given by
    $$\begin{cases}i_t=\sigma(w_i[H_{t-1},x_t]+b_i)\\j_t=\tanh(w_g[H_{t-1},x_t]+b_g)\odot i_t\end{cases},\tag{4}$$
    where $w_i$, $w_g$, $b_i$, and $b_g$ are the weights and biases of the input gate, respectively.

    The output gate calculates the output of the current state. Like the forget and input gates, the output gate contains a Sigmoid function, which helps to further extract the information from the current cell state $C_t$. In addition, the current cell state $C_t$ is mapped to the interval $(-1,1)$ by the tanh function to form the current hidden state $H_t$. The current cell state $C_t$, output $o_t$, and hidden state $H_t$ are expressed as
    $$\begin{cases}C_t=k_t+j_t\\o_t=\sigma(w_o[H_{t-1},x_t]+b_o)\\H_t=\tanh(C_t)\odot o_t\end{cases},\tag{5}$$
    where $w_o$ and $b_o$ are the weight and bias of the output gate, respectively.
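    The three gates described above can be written out as a single update step. The following Python sketch implements one LSTM step with plain lists and then runs it over a 40-step series. It is a minimal sketch for illustration only: the hidden size of 8, the random weights, the zero biases, and the names `lstm_step` and `random_params` are assumptions, and a practical model would rely on a deep learning framework rather than hand-written loops.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, vector):
    """Return weights @ vector + bias for list-based matrices."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params maps a gate name to a (weights, bias) pair."""
    concat = h_prev + x_t  # concatenation [H_{t-1}, x_t]
    f_t = [sigmoid(z) for z in affine(*params["f"], concat)]
    k_t = [c * f for c, f in zip(c_prev, f_t)]           # forget gate
    i_t = [sigmoid(z) for z in affine(*params["i"], concat)]
    g_t = [math.tanh(z) for z in affine(*params["g"], concat)]
    j_t = [g * i for g, i in zip(g_t, i_t)]              # input gate
    c_t = [k + j for k, j in zip(k_t, j_t)]              # new cell state
    o_t = [sigmoid(z) for z in affine(*params["o"], concat)]
    h_t = [math.tanh(c) * o for c, o in zip(c_t, o_t)]   # output gate
    return h_t, c_t


def random_params(input_size, hidden_size, rng):
    """Small random weights and zero biases for the four transforms (illustrative)."""
    def gate():
        weights = [[rng.uniform(-0.1, 0.1)
                    for _ in range(hidden_size + input_size)]
                   for _ in range(hidden_size)]
        return weights, [0.0] * hidden_size
    return {name: gate() for name in ("f", "i", "g", "o")}


rng = random.Random(0)
params = random_params(input_size=25, hidden_size=8, rng=rng)
sequence = [[rng.random() for _ in range(25)] for _ in range(40)]  # 40 x 25 series
h, c = [0.0] * 8, [0.0] * 8
outputs = []
for x_t in sequence:
    h, c = lstm_step(x_t, h, c, params)
    outputs.append(h)
print(len(outputs), len(outputs[0]))
```

    Because the cell state is carried from one step to the next, the hidden state emitted at a given step depends on all earlier steps, which is the property exploited when the steps are identified with layers of the profile.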

    The overall framework (named CLNet) consists of a residual module and a correlation learning module, as shown in Fig. 3. The residual module first extracts features from the scattering map, and the correlation learning module then reconstructs the profile parameters from the output of the residual module. The residual module is based on the backbone of ResNet34 and has a separate convolution layer at the front end, followed by four large blocks, which comprise three, four, six, and three groups of small blocks, respectively. Each of the small blocks contains two convolutional layers. An adaptive average pooling layer follows these blocks and extracts a 40×25 angular series. The correlation learning module comprises two distinct channels dedicated to capturing the inherent parameter correlations of the CDs and CPs, respectively. The channels directly reconstruct the CDs and CPs, respectively, from the 40×25 series, as shown in Fig. 3. In each channel, the series is processed by an LSTM cell over 40 time steps. The complete operation of the LSTM cell is also illustrated in the local enlargement of Fig. 3, and the mechanism is described by Eqs. (3)–(5). The bidirectional LSTM structure, which involves two sets of LSTM cells that process the series in the forward and backward directions, is employed for accurate identification and reconstruction[37]. The bidirectional LSTM structure produces an output at each time step, and all the outputs form a new layered series that is fed into a fully connected layer with a Sigmoid function at the end of each channel. After the outputs of both channels are linearly scaled to their respective reconstruction ranges, the reconstructed CDs and CPs are obtained. To suppress the overfitting caused by the large number of model weights, batch and layer normalization are applied after each convolution layer and LSTM layer[38,39].

    Figure 3. Framework architecture of the correlation learning-based method. The framework, named CLNet, consists of a residual module and a correlation learning module. The residual module is a modified ResNet34, which has an adaptive average pooling layer at the beginning and another one at the end. The correlation learning module is a Bi-LSTM structure, which can directly reconstruct the profile parameters.
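    The final step of each channel, a Sigmoid output that is linearly scaled to the reconstruction range, can be sketched as follows. The ranges used here (55 to 95 nm for CD and −5 to 5 nm for CP) are assumptions inferred from the sample statistics quoted in Sec. 2.1; the ranges actually used in CLNet are not stated, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def scale_to_range(unit_value, lower, upper):
    """Map a Sigmoid output in (0, 1) linearly onto [lower, upper]."""
    return lower + unit_value * (upper - lower)


def to_unit_interval(value, lower, upper):
    """Inverse mapping, e.g. for normalising training targets."""
    return (value - lower) / (upper - lower)


# Assumed reconstruction ranges in nm, inferred from the statistics in Sec. 2.1.
CD_RANGE = (55.0, 95.0)
CP_RANGE = (-5.0, 5.0)

raw_outputs = [-2.0, 0.0, 2.0]  # hypothetical pre-activation values for three layers
cds = [scale_to_range(sigmoid(z), *CD_RANGE) for z in raw_outputs]
cps = [scale_to_range(sigmoid(z), *CP_RANGE) for z in raw_outputs]
print([round(v, 2) for v in cds])
print([round(v, 2) for v in cps])
```

    Bounding the output in this way guarantees that every reconstructed value lies inside the range covered by the training data, at the cost of being unable to represent values outside it.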

    2.3. Data generation and training

    We generate the scattering maps of the grating samples mentioned in Sec. 2.1. The sample material is Si. The grating pitch $L$ is 170 nm, and $\sigma_{\mathrm{DW}}$ is set to 2 nm for the surface roughness. The X-ray photon energy is 24 keV. Each sample is exposed at 100 AOIs, uniformly spanning from −20° to 20°. The data recorded by the 512 × 512 pixel detector only contain information in the middle few rows of pixels[27]. Hence, we extract 4 × 512 pixels from the scattering pattern of each AOI. The data of all AOIs are concatenated into a map of 400 × 512 pixels. Poisson noise is added to the scattering data, where the noise level is related to the photon number. The photon number is 10 times the number of pixels in the scattering data, which is set according to actual experimental conditions[40]. The background noise level is a relative intensity of $1\times10^{-6}$[10]. In principle, the method can also be applied at other X-ray photon energies.
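    The assembly of the scattering map and the addition of noise can be sketched as follows. This is an illustration under stated assumptions rather than the data pipeline of this work: the photon budget is interpreted as 10 photons per pixel of the assembled map on average, the background is modeled as a flat offset of $10^{-6}$ times the peak intensity, the Poisson sampler is a simple textbook routine, and all function names are hypothetical. A small 16 × 8 detector with 5 AOIs is used so that the example runs quickly; with 100 AOIs and 4 rows each, the same procedure yields the 400 × 512 map.

```python
import math
import random


def middle_rows(pattern, rows_kept=4):
    """Keep the central rows of one detector frame (a list of rows)."""
    start = (len(pattern) - rows_kept) // 2
    return pattern[start:start + rows_kept]


def assemble_map(patterns, rows_kept=4):
    """Concatenate the central rows of every AOI frame into one scattering map."""
    scattering_map = []
    for pattern in patterns:
        scattering_map.extend(middle_rows(pattern, rows_kept))
    return scattering_map


def sample_poisson(mean, rng):
    """Poisson sample: Knuth's method for small means, normal approximation otherwise."""
    if mean > 30.0:
        return max(0, round(rng.gauss(mean, math.sqrt(mean))))
    threshold, count, product = math.exp(-mean), 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def add_noise(scattering_map, photons_per_pixel=10, background=1e-6, rng=None):
    """Add a flat background, scale to the photon budget, and draw Poisson counts."""
    rng = rng or random.Random(0)
    pixels = sum(len(row) for row in scattering_map)
    peak = max(max(row) for row in scattering_map)
    with_background = [[value + background * peak for value in row]
                       for row in scattering_map]
    total = sum(sum(row) for row in with_background)
    scale = photons_per_pixel * pixels / total
    return [[sample_poisson(value * scale, rng) for value in row]
            for row in with_background]


rng = random.Random(1)
frames = [[[rng.random() for _ in range(8)] for _ in range(16)] for _ in range(5)]
clean = assemble_map(frames)
noisy = add_noise(clean, rng=rng)
print(len(clean), len(clean[0]))
```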

    The dataset includes the scattering maps and the corresponding structural parameters of the samples. The training, validation, and test sets are split in an 8:1:1 ratio. The model is implemented using PyTorch and trained on a workstation with two NVIDIA Quadro RTX 8000 GPUs. The Adam optimizer is used to update the network weights with an initial learning rate of $lr=1\times10^{-4}$, $\beta_1=0.9$, and $\beta_2=0.999$. The loss function is the L1 loss with the physical symmetry prior[27]. The batch size is 16. These hyperparameters are chosen to ensure that the CLNet model can sufficiently understand and express the underlying information in the dataset. The model is trained for 200 epochs, taking about 25 h in total.
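    For readers unfamiliar with the optimizer, the following Python sketch shows a plain L1 loss and a single Adam update with the hyperparameters quoted above. It is a generic textbook illustration and not the training code of this work: the physical symmetry prior of the loss is omitted because its form is not given here, the value $\epsilon=10^{-8}$ is an assumed default, and the one-parameter fitting problem is purely hypothetical.

```python
import math


def l1_loss(predicted, target):
    """Mean absolute difference between predicted and target parameters."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)


def adam_step(theta, grad, state, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a list of parameters; state holds step count and moments."""
    state["t"] += 1
    t = state["t"]
    updated = []
    for index, (value, g) in enumerate(zip(theta, grad)):
        state["m"][index] = beta1 * state["m"][index] + (1.0 - beta1) * g
        state["v"][index] = beta2 * state["v"][index] + (1.0 - beta2) * g * g
        m_hat = state["m"][index] / (1.0 - beta1 ** t)  # bias-corrected first moment
        v_hat = state["v"][index] / (1.0 - beta2 ** t)  # bias-corrected second moment
        updated.append(value - lr * m_hat / (math.sqrt(v_hat) + eps))
    return updated


# Hypothetical one-parameter problem: move theta towards a target CD of 75 nm.
target = [75.0]
theta = [74.99]
state = {"t": 0, "m": [0.0], "v": [0.0]}
for _ in range(50):
    grad = [math.copysign(1.0, theta[0] - target[0])]  # gradient of |theta - target|
    theta = adam_step(theta, grad, state)
print(l1_loss(theta, target))
```

    With a constant-sign gradient, each Adam step moves the parameter by approximately the learning rate, which is why the choice of $lr$ directly sets the scale of the updates early in training.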

    3. Results and Discussion

    Figure 4 shows the reconstructed results and the corresponding fitted scattering data of one randomly selected instance in the test set. The reconstructed results for profiles CD and CP are shown in Figs. 4(a)–4(c), respectively. They show that our method can precisely estimate the structural parameters. Figure 4(d) shows the measured scattering map of this sample recorded by the detector. To further evaluate the accuracy of our method, we recalculated the fitted data and chose six q-slices to compare with the map in Fig. 4(d). Figures 4(e)–4(j) show the measured data, fitted data, and corresponding residuals at $q_x$ = 0.073, 0.147, 0.220, 0.293, and 0.366 nm$^{-1}$ and at $q_z$ = 0 nm$^{-1}$. The fitted scattering data at these slices are highly consistent with the measured data. The proposed method achieves high fitting accuracy in the low and medium q regions, where the measured scattering intensity is significant and the signal-to-noise ratio is high. However, in the high q regions, the noise impact becomes more significant, leading to increased residuals and reduced accuracy.

    Figure 4. Reconstructed results and corresponding fitted scattering data of a randomly selected sample. (a)–(c) Reconstructed results for profiles CD and CP, respectively. (d) Measured $q_x$-$q_z$ scattering map obtained from the scattering patterns at 100 AOIs, uniformly spanning from −20° to 20°. (e)–(j) Measured data, fitted data, and the corresponding residuals at the q-slices, which are indicated by the dashed green lines in (d). In the high q regions, the residual increases due to the low signal-to-noise ratio of the measured data.

    Reconstruction can be achieved for diverse samples, including phenomena such as tilting, bowing, and tapering, as illustrated in Fig. 5. Figures 5(a) and 5(b) show the reconstructed CDs and CPs of three randomly selected samples in the test set, respectively. In addition to CLNet, the results of well-trained ResNet34[41], ResNet50[27,41], AlexNet[22,42], EffNet-b2[43], and TinyViT-5m[44] are also shown in Fig. 5 for ablation and comparison experiments. The hyperparameters used by these models are consistent with those described in Sec. 2.3. The CLNet results follow the ground truth more closely than those of the compared models. To quantitatively analyze and compare the performance of these six models, we use three key metrics to evaluate 10,000 instances in the test set: mean absolute error (MAE), R-squared value ($R^2$), and slope,
    $$\begin{cases}\mathrm{MAE}=\dfrac{1}{M}\sum_{m=1}^{M}|U_m-V_m|\\[2mm]R^2=1-\dfrac{\sum_{m=1}^{M}(U_m-V_m)^2}{\sum_{m=1}^{M}(U_m-\bar{U})^2}\\[2mm]\mathrm{slope}=\dfrac{\sum_{m=1}^{M}U_mV_m-M\bar{U}\bar{V}}{\sum_{m=1}^{M}U_m^2-M\bar{U}^2}\end{cases},\tag{6}$$
    where $U$ is the ground truth, $V$ is the reconstructed parameter, $\bar{U}$ and $\bar{V}$ are their mean values, and $M$ is the total number of samples.
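    The three metrics can be computed directly from paired lists of ground-truth and reconstructed values. The following Python sketch is a straightforward transcription of the definitions above, with the slope taken as the least-squares slope of the reconstructed values regressed on the ground truth; the function name and the small example data are illustrative assumptions.

```python
def regression_metrics(truth, predicted):
    """Return (MAE, R^2, slope) for paired ground-truth and reconstructed values."""
    count = len(truth)
    mean_u = sum(truth) / count
    mean_v = sum(predicted) / count
    mae = sum(abs(u - v) for u, v in zip(truth, predicted)) / count
    ss_res = sum((u - v) ** 2 for u, v in zip(truth, predicted))
    ss_tot = sum((u - mean_u) ** 2 for u in truth)
    r_squared = 1.0 - ss_res / ss_tot
    cross = sum(u * v for u, v in zip(truth, predicted)) - count * mean_u * mean_v
    spread = sum(u * u for u in truth) - count * mean_u ** 2
    return mae, r_squared, cross / spread


truth = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
mae, r_squared, slope = regression_metrics(truth, predicted)
print(f"MAE = {mae:.3f}, R^2 = {r_squared:.3f}, slope = {slope:.3f}")
```

    A slope below 1 indicates that the reconstruction compresses the spread of the ground truth, which MAE and $R^2$ alone do not reveal.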

    Figure 5. Reconstructed results of three randomly selected samples in the test set. Each grating consists of 40 layers. (a), (b) Corresponding reconstructed CDs and CPs, respectively. Besides our method, the plots also include results from five other models, namely, ResNet34, ResNet50, AlexNet, EffNet-b2, and TinyViT-5m. These are used for ablation and comparative experiments.

    Since each grating sample has 40 layers, it is essential to distinguish three types of MAE, denoted as the per-layer MAE, per-sample MAE, and overall MAE[27]. The per-layer MAE refers to the MAE of each layer over all samples, and the per-sample MAE indicates the MAE of each sample over all 40 layers. The MAE of the entire test set is denoted as the overall MAE. The per-layer MAE results for CD and CP are shown in Figs. 6(a) and 6(c), respectively. Our method has the minimum error at each layer. The fitting results of CLNet for the edge layers are significantly better than those of AlexNet and EffNet-b2, whereas the difference is less pronounced for ResNet34, ResNet50, and TinyViT-5m. Figures 6(b) and 6(d) illustrate the cumulative error distribution of the per-sample MAE for CD and CP, respectively. Compared with the other models, the cumulative curves of CLNet reach 100% most quickly, indicating that our method achieves high accuracy in reconstructing structural parameters. The overall MAE of CLNet for the CD is 0.110 nm, while those of ResNet34, ResNet50, AlexNet, EffNet-b2, and TinyViT-5m are 0.132, 0.142, 0.510, 0.456, and 0.156 nm, respectively. The overall MAE of our method for the CP is 0.031 nm, while the values of ResNet34, ResNet50, AlexNet, EffNet-b2, and TinyViT-5m are 0.036, 0.041, 0.098, 0.087, and 0.045 nm, respectively. The results indicate that our method achieves a 20.0% improvement in accuracy for reconstructing the CD and a 13.9% improvement for reconstructing the CP relative to the second-best results, which are achieved by ResNet34. The maximum per-sample MAE of CLNet is 1.981 nm for CD and 0.631 nm for CP, whereas TinyViT-5m and ResNet34 yield 2.264 nm for CD and 0.764 nm for CP, respectively.
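    The distinction between the three MAE variants amounts to averaging the same absolute-error array along different axes. The following Python sketch illustrates this for a small array indexed as `[sample][layer]`; the function names and the two-sample, two-layer data are assumptions made for illustration.

```python
def absolute_errors(truth, predicted):
    """Element-wise |truth - predicted| for arrays indexed as [sample][layer]."""
    return [[abs(u - v) for u, v in zip(truth_row, predicted_row)]
            for truth_row, predicted_row in zip(truth, predicted)]


def per_layer_mae(errors):
    """Average over samples, giving one value per layer."""
    return [sum(column) / len(column) for column in zip(*errors)]


def per_sample_mae(errors):
    """Average over layers, giving one value per sample."""
    return [sum(row) / len(row) for row in errors]


def overall_mae(errors):
    """Average over every layer of every sample."""
    flat = [value for row in errors for value in row]
    return sum(flat) / len(flat)


truth = [[1.0, 2.0], [3.0, 4.0]]
predicted = [[1.0, 3.0], [5.0, 4.0]]
errors = absolute_errors(truth, predicted)
print(per_layer_mae(errors), per_sample_mae(errors), overall_mae(errors))
```

    The maximum per-sample MAE reported in Table 1 corresponds to `max(per_sample_mae(errors))` in this notation.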

    Figure 6. (a), (c) Per-layer MAE for CD and CP in the test set, respectively. (b), (d) Cumulative error distributions of per-sample MAE for CD and CP, respectively. Per-layer MAE represents the mean absolute errors for each layer of all samples, and per-sample MAE represents the mean absolute errors for individual samples.

    It should be noted that in Fig. 6(d), the cumulative error distribution curve rises quickly within 0.01 nm. Under the ideal sample situation, where the ground truth CPs of each layer are equal to zero, the CLNet’s reconstructed results for CP are almost perfect, which is shown in Fig. 7. In other words, our method fully utilizes the LSTM mechanism to learn the prior knowledge about this kind of inter-layer correlation.

    Figure 7. (a), (b) Reconstructed CP and the corresponding per-layer MAE, respectively, under the ideal situation where the ground truth CPs of each layer are equal to zero.

    The $R^2$ and slope are widely used to evaluate the fitting performance of a method. The $R^2$ of our proposed method for the CD is 0.9995, while those of the compared models are all at most 0.9993. For CP, the $R^2$ of CLNet is 0.9953, while those of the other methods are at most 0.9942. This indicates that our method has superior fitting ability. The slopes of CLNet are 0.9993 for CD and 0.9942 for CP. Among the compared models, the slopes closest to 1 are 0.9997 for CD (TinyViT-5m) and 0.9920 for CP (ResNet34). The slope achieved by TinyViT-5m for the CD is closer to 1 than that of our method, but its slope for the CP is only 0.9897. The detailed evaluation results are shown in Table 1. Notably, our method still demonstrates good fitting ability at a background level of $10^{-5}$, with both the $R^2$ and the slope approaching 1.

      Table 1. Evaluation Comparison of Six Types of Methods

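      | Method | Overall MAE, CD (nm) | Overall MAE, CP (nm) | Max per-sample MAE, CD (nm) | Max per-sample MAE, CP (nm) | R², CD (arb. units) | R², CP (arb. units) | Slope, CD (arb. units) | Slope, CP (arb. units) |
      |---|---|---|---|---|---|---|---|---|
      | CLNet (our method) | 0.110 | 0.031 | 1.982 | 0.631 | 0.9995 | 0.9953 | 0.9993 | 0.9942 |
      | ResNet34 | 0.132 | 0.036 | 2.396 | 0.764 | 0.9993 | 0.9942 | 0.9978 | 0.9920 |
      | ResNet50 | 0.142 | 0.041 | 2.967 | 0.645 | 0.9992 | 0.9926 | 0.9995 | 0.9867 |
      | AlexNet | 0.510 | 0.098 | 3.871 | 1.358 | 0.9933 | 0.9644 | 0.9934 | 0.9558 |
      | EffNet-b2 | 0.456 | 0.087 | 2.606 | 1.065 | 0.9948 | 0.9716 | 0.9968 | 0.9625 |
      | TinyViT-5m | 0.156 | 0.045 | 2.264 | 0.742 | 0.9991 | 0.9921 | 0.9997 | 0.9897 |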

    Uncertainty quantification is important because it enables the estimation of method repeatability and facilitates comparison across diverse approaches[45,46]. To quantify the model uncertainty of our method, we sample the posterior distribution using stochastic gradient Langevin dynamics (SGLD)[47] and perform Bayesian inference averaging on the obtained samples. Typically, uncertainty is represented by three times the standard deviation (3σ)[48]. Figure 8 illustrates the uncertainty (3σ) quantifications for CD and CP of one randomly selected sample, using our method and the five methods of comparison. Besides accurately obtaining the structural parameters, our method achieves narrower confidence intervals than the other methods. To further evaluate the uncertainty of our method, we calculated the mean 3σ over the whole test set. The mean 3σ of CLNet for CD is 0.322 nm, while those of ResNet34, ResNet50, AlexNet, EffNet-b2, and TinyViT-5m are 0.527, 0.570, 0.673, 0.795, and 0.424 nm, respectively. The mean 3σ of our method for CP is 0.116 nm, while the values of ResNet34, ResNet50, AlexNet, EffNet-b2, and TinyViT-5m are 0.204, 0.155, 0.216, 0.231, and 0.189 nm, respectively. Relative to the best of the compared models, CLNet reduces the uncertainty for CD and CP by 24.1% and 25.2%, respectively. The results demonstrate that our method is not only superior in terms of accuracy but also enhances the measurement repeatability by nearly 1.4 times.
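    The summary statistics used above can be sketched in a few lines of Python. The sketch assumes that a set of posterior samples has already been obtained (the SGLD sampling itself is not reproduced here) and that each sample provides one predicted value per layer; the function names, the use of the sample standard deviation, and the three-sample example data are assumptions made for illustration.

```python
import statistics


def bayesian_summary(posterior_predictions):
    """Average predictions over posterior samples and report 3 sigma per layer.

    posterior_predictions[s][n] is the value predicted for layer n by sample s.
    """
    layers = list(zip(*posterior_predictions))
    means = [statistics.fmean(values) for values in layers]
    three_sigma = [3.0 * statistics.stdev(values) for values in layers]
    return means, three_sigma


def relative_reduction(ours, baseline):
    """Fractional reduction of a metric relative to the baseline value."""
    return (baseline - ours) / baseline


# Hypothetical CD predictions (nm) for two layers from three posterior samples.
predictions = [[75.0, 74.0], [75.2, 74.1], [74.8, 73.9]]
means, three_sigma = bayesian_summary(predictions)
print([round(v, 3) for v in means], [round(v, 3) for v in three_sigma])

# Mean 3 sigma values quoted above: CLNet versus the best compared model.
print(f"CD: {100 * relative_reduction(0.322, 0.424):.1f}%")
print(f"CP: {100 * relative_reduction(0.116, 0.155):.1f}%")
```

    Applied with the same definition to the rounded overall MAE values of Table 1, `relative_reduction(0.031, 0.036)` gives about 13.9% for CP and `relative_reduction(0.110, 0.132)` gives about 16.7% for CD; the 20.0% quoted for CD corresponds to normalising the same difference by the CLNet value, 0.022/0.110.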

    Figure 8. (a), (b) Model uncertainty quantification for CD and CP, respectively, based on six types of methods, including CLNet (our method), ResNet34, ResNet50, AlexNet, EffNet-b2, and TinyViT-5m. Our method performs best in terms of reconstruction accuracy and repeatability.

    4. Conclusion

    In this work, we propose a correlation learning-based method for reconstructing complex profile nanostructures in SAXS measurement. Considering the inter-layer correlation of the profile structural parameters and the nature of SAXS signals, we have designed and evaluated the CLNet framework, which incorporates an LSTM mechanism. CLNet actively learns the inter-layer correlations of samples, significantly enhancing the accuracy and reducing the uncertainty of profile reconstruction. Notably, our approach improves reconstruction accuracy by at least 13.9% and exhibits a measurement repeatability that is nearly 1.4 times better than the best of five other learning-based methods. In the future, we hope to extend our method to a broader range of nanostructures, such as 3D hole nanostructures. Additionally, the proposed approach may have potential applications in other profile metrology techniques, such as optical critical dimension (OCD) metrology, which needs further exploration.

    [7] M. Wormington, A. Ginsburg, I. Reichental et al. X-ray critical dimension metrology solution for high aspect ratio semiconductor structures. Metrology, Inspection, and Process Control for Semiconductor Manufacturing XXXV, 11611, 150(2021).

    [13] J. Reche, Y. Blancquaert, G. Freychet et al. Dimensional control of line gratings by small angle X-ray scattering: Shape and roughness extraction. 2020 31st Annual SEMI Advanced Semiconductor Manufacturing Conference (ASMC), 1(2020).

    [18] R. Ciesielski, L. M. Lohr, H. Mertens et al. Pushing the boundaries of EUV scatterometry: reconstruction of complex nanostructures for next-generation transistor technology. Metrology, Inspection, and Process Control XXXVII, 12496, 447(2023).

    [22] S. Liu, T. Yang, J. Zhang et al. X-ray scatterometry using deep learning. Tenth International Symposium on Precision Mechanical Measurements, 12059, 481(2021).

    [24] B. Dey, D. Cerbu, K. Khalil et al. Unsupervised machine learning based CD-SEM image segregator for OPC and process window estimation. Design-Process-Technology Co-optimization for Manufacturability XIV, 11328, 317(2020).

    [26] A. Baranovskiy, I. Grinberg, M. G. Greene et al. Deep learning for the analysis of X-ray scattering data from high aspect ratio structures. Metrology, Inspection, and Process Control XXXVII, 12496, 837(2023).

    [29] S. S. Ginosar. Modeling Visual Minutiae: Gestures, Styles, and Temporal Patterns(2020).

    [30] J. Zhang, T. Lan, Y. Gao et al. CDSAXS study of 3d NAND channel hole etch pattern edge effects and etched hole pattern variance. Metrology, Inspection, and Process Control XXXVIII, 12955, 813(2024).

    [35] C. Petti. 3d memory: etch is the new litho. Advanced Etch Technology for Nanopatterning VII, 10589, 1058904(2018).

    [37] S. Siami-Namini, N. Tavakoli, A. S. Namin. The performance of LSTM and BiLSTM in forecasting time series. 2019 IEEE International Conference on Big Data (Big Data), 3285(2019).

    [38] S. Santurkar, D. Tsipras, A. Ilyas et al. How does batch normalization help optimization? Advances in Neural Information Processing Systems, 31(2018).

    [39] J. L. Ba, J. R. Kiros, G. E. Hinton. Layer normalization(2016).

    [40] C. Settens, B. Bunday, B. Thiel et al. Critical dimension small angle X-ray scattering measurements of FinFET and 3D memory structures. Metrology, Inspection, and Process Control for Microlithography XXVII, 8681, 200(2013).

    [41] K. He, X. Zhang, S. Ren et al. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770(2016).

    [42] A. Krizhevsky, I. Sutskever, G. E. Hinton. Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25(2012).

    [43] M. Tan, Q. Le. Efficientnet: Rethinking model scaling for convolutional neural networks. International Conference on Machine Learning, 6105(2019).

    [44] K. Wu, J. Zhang, H. Peng et al. Tinyvit: Fast pretraining distillation for small vision transformers. European Conference on Computer Vision, 68(2022).

    [45] B. Bunday, G. Orji. Metrology. 2021 IEEE International Roadmap for Devices and Systems Outbriefs, 1(2021).

    [47] C. Li, C. Chen, D. Carlson et al. Preconditioned stochastic gradient Langevin dynamics for deep neural networks. Proceedings of the AAAI Conference on Artificial Intelligence, 30(2016).

    Hairui Yang, Zhaolong Wu, Hong Yu, "Enhanced profile reconstruction of small-angle X-ray scattering measurement via correlation learning," Chin. Opt. Lett. 23, 083401 (2025)

    Paper Information

    Category: X-ray Optics

    Received: Dec. 26, 2024

    Accepted: Apr. 14, 2025

    Published Online: Jul. 23, 2025

    The Author Email: Hong Yu (yuhong@zjlab.ac.cn)

    DOI:10.3788/COL202523.083401

    CSTR:32184.14.COL202523.083401
