Advanced Photonics, Volume 6, Issue 5, 056006 (2024)
Nested deep transfer learning for modeling of multilayer thin films
Machine-learning techniques have gained popularity in nanophotonics research, being applied to predict optical properties and to inversely design structures. However, one limitation is the cost of acquiring training data, as complex structures require time-consuming simulations. To address this, researchers have explored transfer learning, where pretrained networks can facilitate convergence with fewer data for related tasks, but its application to more difficult tasks is still limited. In this work, a nested transfer learning approach is proposed, training models to predict structures of increasing complexity, with transfer between each model and few data used at each step. This allows modeling thin-film stacks with higher optical complexity than previously reported. For the forward model, a bidirectional recurrent neural network is utilized, which excels at modeling sequential inputs. For the inverse model, a convolutional mixture density network is employed. In both cases, a relaxed choice of materials at each layer is introduced, making the approach more versatile. The final nested transfer models display high accuracy in retrieving complex arbitrary spectra and in matching idealized spectra for specific application-focused cases, such as selective thermal emitters, while keeping data requirements modest. Our nested transfer learning approach represents a promising avenue for addressing data acquisition challenges.
1 Introduction
In recent years, machine-learning (ML)-based techniques have surged in popularity as tools for addressing problems in optics and photonics.1
One major limitation of DL-based inverse design, however, is the exceptionally high cost of acquiring high-quality labeled data. Most sufficiently complex structures require full-wave simulations to predict their optical responses. In some circumstances, simulating even a single structure can take on the order of hours, and one may need hundreds of thousands to millions of samples to train a single model accurately, posing the largest bottleneck for building and scaling nanophotonic inverse design models. Moreover, for a given task, the model is trained with certain constraints and assumptions about the design being predicted, such as a fixed material for the substrate or cladding of a metasurface and predefined geometries of the resonant elements. Once trained, the model can only give useful design suggestions within that specific set of limitations. Introducing additional geometric parameters and including variables of drastically different natures (e.g., indicators of material choices) both steeply raise the complexity of the task.
One approach to addressing the above issue is transfer learning. For some types of structures, although tackling a complete version with enough complexity for practical applications is exceedingly slow, simulating thousands of data points for a simplified toy version with reduced degrees of freedom can be relatively fast and manageable. Projecting this difference onto the training of DL models, rather than initializing the weight values in a network randomly, the initial layers' values are taken from another network that has already been trained on a similar, and in many cases simpler, task.37,38 In a sense, instead of learning a complex task from scratch, the model needs to learn just enough to account for the differences between the two data sets. As such, transfer learning can potentially allow for faster convergence to accurate predictions using fewer data. This relies on the assumption that the features and relations learned for the first task will also have high predictive power for the next one. Previous work has shown the ability to transfer knowledge between inverse39,40 and forward41 models.
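To make the weight-transfer step concrete, the following is a minimal sketch in Keras, assuming two models whose shared layers have matching shapes; the file name simple_task.keras, the constructor build_harder_model, and the arrays x_hard and y_hard are hypothetical placeholders, not names from this work.

```python
import tensorflow as tf

# Load a network already trained on the simpler task (hypothetical file name).
pretrained = tf.keras.models.load_model("simple_task.keras")
# Build a fresh network for the harder task with the same trunk (assumed helper).
model = build_harder_model()

# Copy weights for every layer except the task-specific output head;
# the copied layers must have identical shapes in both models.
for src, dst in zip(pretrained.layers[:-1], model.layers[:-1]):
    dst.set_weights(src.get_weights())

# Training now only has to learn the differences between the two data sets,
# which typically converges faster and needs fewer samples.
model.fit(x_hard, y_hard, validation_split=0.3, epochs=100)
```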
In this work, we report a nested transfer learning approach wherein models are trained to predict structures of gradually increasing complexity, with transfer operations performed between each model [Fig. 1(a)]. This "nesting" strategy effectively allows for a small amount of data per network and can model a significantly higher level of optical complexity than previously reported works. We demonstrate this on both forward prediction and inverse design of multilayer thin-film structures [Fig. 1(b)]. Since both neural networks and thin-film structures use the term "layers" for their components, hereafter we refer to them as "network layers" and "structure layers," respectively, to distinguish them. For forward modeling, we build transfer models up to 30 structure layers deep using a bidirectional recurrent neural network (RNN) architecture. For the inverse model, we build up to 10 structure layers using a convolutional mixture density network (MDN) [Fig. 1(c)]. For each data set, the material used at each structure layer is randomly selected from a prechosen list. We stress that the high degree of freedom in the design variables represents a significant challenge for modeling, surpassing that of previously reported thin-film inverse design tasks.36 Because photonic devices composed of building blocks with regular shapes are typically described by a vector of discrete variables, the complexity of their design processes is somewhat comparable. It is thus reasonable to assume that conclusions drawn from the study of thin films are applicable to devices showing a higher visual complexity, such as many metasurfaces and photonic/plasmonic crystals. The combination of free material choice at every layer and fully continuous thickness values within a wide range results in a design-to-response mapping that is highly sensitive to small changes. Despite this, our nested transfer approach achieves accurate retrieval without a significant increase in data requirements. We evaluate the forward and inverse models on arbitrary random spectra and can recreate closely matching designs. For the inverse model, we implement a postprocessing optimization method using the architecture to further improve results. Finally, as a proof-of-concept demonstration of the model's ability to address realistic problems, we present the design of a selective thermal emitter conceptually similar to multilayer metamaterials for thermophotovoltaics.47
Figure 1.(a) Schematic illustrating the principle of nested transfer. Left to right: Weights from the previous model are taken after training (box with dashed lines) and used for initialization of the weights of the next model (solid-colored lines between neurons), as the complexity of the output gradually increases. (b) Diagram of a multilayer thin-film structure. The structure has a choice of any of four materials at each layer, with the constraint that no two neighboring layers are the same material. (c) Architecture of the mixed convolutional MDN used for the inverse design. There are initial pairs of convolutional and max pooling layers leading to fully connected layers. The output is split into categorical channels predicting the material choice at each layer and a final MDN channel representing probability distributions of the layer thicknesses.
2 Materials and Methods
For the forward model, we use a bidirectional RNN [Fig. 2(a)]. In an RNN, rather than information flowing strictly to the next network layer as in a feedforward network, information can also flow from one layer back to itself.48 A standard fully connected network does not assume from initialization that relations between any given features are more important than others, and input variables can be arranged in any order provided they are consistent throughout the data set. In RNNs, by contrast, neurons store an internal state, or memory, that affects how they process subsequent inputs. This allows them to handle inputs of arbitrary length, processing them differently based on recent inputs and learned context. RNNs therefore excel at handling sequential data, as in time-series forecasting and natural language processing.49
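As an illustrative sketch (not the exact architecture, which is given in Sec. S1 in the Supplementary Material), a bidirectional-LSTM forward model of this kind could be assembled as follows; the per-timestep encoding of each structure layer as a one-hot material code plus a thickness, and all hidden-layer widths, are assumptions for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_forward_model(n_layers, n_feat=5, n_pts=300):
    """One timestep per structure layer; here each step is assumed to carry
    a one-hot material code (4 values) plus a thickness (1 value). The
    output is the discretized 300-point reflectance spectrum."""
    return tf.keras.Sequential([
        layers.Input((n_layers, n_feat)),
        layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
        layers.Bidirectional(layers.LSTM(64)),  # reads the stack in both directions
        layers.Dense(256, activation="relu"),
        layers.Dense(n_pts, activation="sigmoid"),  # reflectance values in [0, 1]
    ])
```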
Figure 2.(a) Diagram of a bidirectional RNN used for the forward model. (b) Training curves for nested transfer forward prediction for thin-film structures of increasing complexity (see legends), up to 30 layers. (c) Comparison of requested ground-truth spectrum (blue) and the spectrum predicted by the forward model (orange) for a randomly chosen case in the test data set.
The final network architecture for forward modeling uses a series of bidirectional LSTM layers to initially process the input data before connecting to fully connected layers that produce the final output. The full architecture is shown in Sec. S1 in the Supplementary Material. The transfer proceeds from 6 structure layers all the way up to 30 structure layers, yielding accurate predictions at a higher level of complexity than previously reported. For the loss function, we use the root mean squared error (RMSE). A separate data set is generated for each number of structure layers by calculating transmission and reflection with the Fresnel equations.55,56 Most previous demonstrations of ML on thin-film optics have had fixed choices of materials at each layer based on the researchers' physics-based intuition. This makes the modeling task much simpler, as material choice interacts with the other design variables at a very fundamental level. Here, we instead allow free material choice at all structure layers to demonstrate the ability to learn a more complex mapping while using less data overall. The material at each structure layer is drawn randomly from a prechosen set of four oxides, with the constraint that no two neighboring structure layers are the same material. For a 10-layer structure, this represents 4 × 3⁹, or over 75,000, possible combinations for material choice alone. The structure is placed on a semi-infinite glass substrate and surrounded by air cladding. We calculate the reflectance spectrum between 400 and 2500 nm, with the wavelength range discretized into 300 points.
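A minimal sketch of this data-generation step is shown below, using the standard characteristic-matrix (transfer-matrix) formulation of the Fresnel equations at normal incidence. For simplicity it assumes non-dispersive refractive indices, whereas a full implementation would use wavelength-dependent material data; the index and thickness ranges in the usage example are likewise illustrative.

```python
import numpy as np

def reflectance(n_layers, d_layers, wavelengths, n_in=1.0, n_sub=1.5):
    """Normal-incidence reflectance of a thin-film stack on a semi-infinite
    substrate (air cladding, glass substrate) via the characteristic-matrix
    method. n_layers: refractive indices; d_layers: thicknesses (same units
    as wavelengths)."""
    R = np.empty(len(wavelengths))
    for i, lam in enumerate(wavelengths):
        M = np.eye(2, dtype=complex)
        for n, d in zip(n_layers, d_layers):   # ordered from the incidence side
            delta = 2 * np.pi * n * d / lam    # phase thickness of the layer
            M = M @ np.array([[np.cos(delta), 1j * np.sin(delta) / n],
                              [1j * n * np.sin(delta), np.cos(delta)]])
        B, C = M @ np.array([1.0, n_sub])
        r = (n_in * B - C) / (n_in * B + C)    # amplitude reflection coefficient
        R[i] = abs(r) ** 2
    return R

# Example: a random 10-layer stack evaluated on the 400-2500 nm, 300-point grid.
wl = np.linspace(400, 2500, 300)
spectrum = reflectance(np.random.uniform(1.4, 2.5, 10),
                       np.random.uniform(50, 500, 10), wl)
```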
The inverse model, unlike the forward network, needs to predict both categorical variables (the material choice at each layer) and continuous variables (the layer thicknesses). This complicates the modeling significantly; to deal with it, the model branches into multiple outputs. The initial layers are three pairs of convolutional and max pooling layers that first aim to learn key spectral features, such as the location and shape of peaks. This is followed by a series of fully connected layers that learn the relations between, and relative importance of, these spectral features. All the initial layers use the ReLU activation function. The network then branches into N + 1 sets of outputs for a structure with N layers [Fig. 1(c)]. There are four output neurons for each of the first N sets, representing the relative likelihood of each of the four possible material choices for each structure layer. The last output set, connected to an MDN layer, comprises a series of neurons encoding parameters for several probability distributions over the possible range of thicknesses. Each mixture component is parametrized by a mean μ and a variance σ², as well as a mixing weight π. For the inverse modeling here, 32 mixtures were used. The MDN is chosen for the layer thicknesses rather than a typical fully connected layer for the final output because of its demonstrated ability to converge accurately on multimodal data.57 The final outputs model two types of data, the categorical material choices and the continuous layer thicknesses, so two different loss functions are used. The outputs representing the material choices use categorical cross-entropy as the loss function, together with a SoftMax activation so that the four outputs sum to one; the values can then be interpreted as estimated probabilities for the four choices. The MDN output representing the continuous layer thicknesses uses the negative log likelihood as its loss function, which measures how well the probability distribution produced by the model matches the actual distribution of the data. No activation function is used for the final MDN layer.
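The following sketch shows one way to realize this branched architecture and its two loss functions in Keras; the filter counts, dense width, and the packing of the MDN parameters (mean, log standard deviation, and mixing-weight logits per component) are illustrative assumptions rather than the exact configuration.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

N_LAYERS, N_MATS, K_MIX, N_PTS = 10, 4, 32, 300  # values stated in the text

def mdn_nll(y_true, y_pred):
    """Negative log likelihood of a Gaussian mixture.
    y_pred packs (mu, log_sigma, pi_logits) per thickness:
    shape (batch, N_LAYERS*3*K_MIX); y_true: (batch, N_LAYERS)."""
    p = tf.reshape(y_pred, (-1, N_LAYERS, 3, K_MIX))
    mu, log_sig, logit_pi = p[:, :, 0], p[:, :, 1], p[:, :, 2]
    log_pi = tf.nn.log_softmax(logit_pi, axis=-1)
    y = tf.expand_dims(y_true, -1)
    log_gauss = (-0.5 * ((y - mu) / tf.exp(log_sig)) ** 2
                 - log_sig - 0.5 * np.log(2.0 * np.pi))
    return -tf.reduce_mean(tf.reduce_logsumexp(log_pi + log_gauss, axis=-1))

# Trunk: three pairs of convolution + max pooling, then fully connected (ReLU).
x_in = layers.Input((N_PTS, 1))
x = x_in
for filters in (32, 64, 128):
    x = layers.Conv1D(filters, 5, activation="relu", padding="same")(x)
    x = layers.MaxPool1D(2)(x)
x = layers.Dense(256, activation="relu")(layers.Flatten()(x))

# Branch 1: one softmax head per structure layer for the material choice.
mats = [layers.Dense(N_MATS, activation="softmax", name=f"mat{i}")(x)
        for i in range(N_LAYERS)]
# Branch 2: MDN head with no activation, encoding all thickness distributions.
mdn = layers.Dense(N_LAYERS * 3 * K_MIX, name="mdn")(x)

model = Model(x_in, mats + [mdn])
model.compile("adam", loss=["categorical_crossentropy"] * N_LAYERS + [mdn_nll])
```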
3 Results
For the forward model, transfer was tested under different conditions to compare which gave the best results at 30 structure layers. The data for two comparative studies are presented in Sec. S2 in the Supplementary Material. We conclude that a step size of four structure layers, transferring all but the final network layer of weights, yields the best results once the 30-layer model converges, and this defines the final nested transfer protocol. For this, 26,000 samples are first generated for a six-layer structure, of which 70% are used for training, and a model is trained from scratch for 500 epochs. After training, a new equal-sized data set is generated for a 10-layer structure, a new model is initialized with weights from the pretrained six-layer model following the protocol, and it is trained for the same number of epochs, reaching convergence similarly [Fig. 2(b)]. The transfer process is repeated, increasing the size of the thin-film structure by four layers at a time, with information accumulating across the successive transfers. Finally, a model for 30-layer structures is trained, which reaches an RMSE on test data of 0.05, accurately reproducing arbitrary spectra [Fig. 2(c)]. Across all models used in the final nested transfer procedure, a total of 127,400 (i.e., 7 × 18,200) samples are used for training. Previous works have reported on transfer learning for inverse design in thin-film optics; however, they used transfer to better learn the simpler forward model and then used that forward model in conjunction with other optimization techniques.43,44 Alternatively, transfer has been used to more efficiently learn internal variables, such as mixture density parameters, for a single model.40 The results showcased here fully model both the forward and inverse directions and, on top of that, deal with a larger number of structure layers, more material options, and a broader band of wavelengths while achieving comparable prediction accuracy. For other types of optical structures, such as metasurfaces, transfer learning has been used with full inverse design models, albeit with simple constraints on the possible designs.39 We show that the nested transfer method can model a significantly higher degree of complexity than existing benchmarks while keeping data requirements modest. To demonstrate how much the addition of the transfer protocol improves training, we compare two models, with and without transfer, using different regression metrics. The results are shown in Sec. S2 in the Supplementary Material. These improvements afforded by nested transfer enable the bidirectional RNN architecture to accurately retrieve optical spectra of high complexity.
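A sketch of this forward nested-transfer loop, under the assumptions of the earlier model sketch, could look as follows; make_dataset is an assumed wrapper around the Fresnel-equation data generator.

```python
import tensorflow as tf

def rmse(y_true, y_pred):
    return tf.sqrt(tf.reduce_mean(tf.square(y_true - y_pred)))

prev = None
for n in range(6, 31, 4):                       # 6, 10, 14, ..., 30 structure layers
    X, Y = make_dataset(n, 26_000)              # assumed data-generation helper
    model = build_forward_model(n)              # bidirectional-LSTM sketch from above
    if prev is not None:                        # nested transfer step
        for src, dst in zip(prev.layers[:-1], model.layers[:-1]):
            dst.set_weights(src.get_weights())  # all but the final network layer
    model.compile("adam", loss=rmse)
    model.fit(X, Y, validation_split=0.3, epochs=500)  # 70% train / 30% validation
    prev = model
```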
As the inverse modeling is considerably more complex than the forward modeling, more data are needed. The optical complexity rises exponentially with the number of layers. We account for this, while still using less data, by scaling the size of the data sets linearly with the number of layers. We also start at a lower initial layer number to allow for more consecutive transfers and the learning of simpler tasks. Here, an initial two-layer data set is generated and a model is trained on it before the weights are transferred to a three-layer case. The two-layer case trains on a data set of 20,000 samples, with 70% of the data used directly for training and the remaining 30% used for validation. For the three-layer case, the weights from the initial convolutional and pooling layers are transferred, and a new data set is generated with 30,000 total samples. This process is then repeated for each layer transfer up to 10 layers, which uses 70,000 samples for training and 30,000 for validation, for a total data set size of 100,000 samples. We find that, unlike in the simpler case of forward transfer, increasing the structure layer number by more than one at a time can cause overfitting on the test data. This may be because the significantly higher complexity of the inverse modeling typically requires larger data sets to train from scratch.
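In code, this inverse schedule might look like the sketch below; make_inverse_dataset and build_inverse_model are assumed helpers (the latter corresponding to the conv-MDN sketch above), and the indexing of the transferred network layers is illustrative.

```python
prev = None
for n in range(2, 11):                          # one structure layer at a time
    spectra, y_mats, y_thk = make_inverse_dataset(n, 10_000 * n)  # 20k, ..., 100k
    model = build_inverse_model(n)              # conv-MDN network, compiled as above
    if prev is not None:
        # Transfer the shared trunk (convolution, pooling, and first dense layers).
        for src, dst in zip(prev.layers[:8], model.layers[:8]):
            dst.set_weights(src.get_weights())
    model.fit(spectra, y_mats + [y_thk], validation_split=0.3, epochs=300)
    prev = model
```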
The final configuration transfers from two layers to three, three to four, and so on, with eight weight layers in the network transferred at each step. Across all models, a total of 378,000 training samples are used. Each model is trained for 300 epochs using an Adam optimizer with a learning rate of 0.01 and a scheduler that reduces the learning rate by 70% when the average loss across all outputs does not decrease for 10 consecutive epochs [Fig. 3(a)]. The loss values are not easily interpretable by themselves, but we can estimate the accuracy of proposed designs by simulating them and calculating the RMSE of the produced spectra against the original ground truths. For the MDN's output, we take, for each parameter, the mean of the mixture component with the highest likelihood, as the distributions calculated by the MDN for this data set tend to be unimodal or quasi-unimodal. For a 10-layer case, simply taking this single mean output for each layer thickness and the highest-probability value for each material choice, we obtain a response RMSE of 0.15. A selected case from the test data set is shown in Fig. 3(b), showing decent agreement on most features. The remaining discrepancy, most likely caused by this naive sampling strategy, can be drastically reduced by implementing a postprocessing procedure. For this, a forward model needs to be trained to act as an estimator of the proposed candidate designs' viability. Using the same type of network as in the forward nested transfer protocol, we train a model for 10-layer forward prediction on the same data set used for inverse modeling. This model is trained for 500 epochs without any prior transfer and reaches a test-set RMSE below 0.01, yielding accurate estimates of the optical responses of arbitrary candidate designs.
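A sketch of this single-shot decoding, assuming the MDN parameter packing used in the earlier model sketch, is:

```python
import numpy as np

N_LAYERS, K_MIX = 10, 32

def decode_design(model, spectrum):
    """Single-shot decoding: argmax material per layer, and the mean of the
    highest-weight mixture component for each layer thickness."""
    outs = model.predict(spectrum[None, :, None])      # list: N_LAYERS mats + MDN
    mats = [int(np.argmax(o[0])) for o in outs[:-1]]   # material index per layer
    p = outs[-1].reshape(N_LAYERS, 3, K_MIX)           # (mu, log_sigma, pi_logits)
    mu, logit_pi = p[:, 0], p[:, 2]
    thk = mu[np.arange(N_LAYERS), logit_pi.argmax(axis=1)]  # dominant-component mean
    return mats, thk
```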
Figure 3. (a) Training curves for nested transfer inverse design up to 10 layers. The categorical loss (left panel) refers to the outputs representing the material choices at each layer, while the continuous loss (right panel) refers to the negative log likelihood for the MDN predicting the thickness of each layer. (b) Comparison of the requested ground-truth spectrum (blue) and the design suggested by the model for a randomly chosen case in the test data set without postprocessing.
The postprocessing procedure involves sampling the MDN output distributions for the design variables one at a time and fixing the best estimated value before moving on to the next variable. Full details of the procedure are given in Sec. S3 in the Supplementary Material. A comparison of a random requested spectrum, the initial model-suggested design, and the design after postprocessing is shown in Fig. 3(c); in this relatively rare case, the initial design deviates wildly from the ground truth, yet it is recovered through postprocessing. An unexpected phenomenon observed for this data set is that the retrieved designs do not necessarily stick to the 10-layer structure. As shown in Fig. 3(c), it is not rare for two adjacent layers to take the same material, effectively reducing the layer number. Other than the obvious cause that the distinctness constraint was applied only to data generation and not to inverse design, another possible reason is the close refractive index values of the chosen oxides. Although both issues could be avoided in the implementation, the current model offers flexibility in finding equivalent designs with fewer physical layers, partially a consequence of the weights transferred from the nine- and eight-layer models. For the task under study, the large thickness ranges and free material choice at each layer represent a significant challenge for modeling, and the complete network can still retrieve accurate solutions. In the selected case in Fig. 3(c), postprocessing gives a 77% reduction in the RMSE between the requested and model-suggested spectra. We stress again that previous works using transfer learning on thin-film structures have primarily studied transfer between forward models and at significantly lower layer numbers and modeling complexity than we report here (see the comparison in Table S2 in the Supplementary Material). We compare these models based on the maximum number of layers, the number of material choices, and the length of the design vector to demonstrate the increased difficulty of the modeling task. Modeling both the forward and inverse directions with free material choice at each layer gives our method more flexibility to tackle design requests for complex real-world applications.
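As a rough sketch of the coordinate-wise idea (not the exact algorithm, which is given in Sec. S3 in the Supplementary Material), one could refine the single-shot design as follows; predict_spectrum is an assumed helper that encodes a candidate design and runs the pretrained 10-layer forward model, and the candidate count is illustrative.

```python
import numpy as np

K_MIX = 32

def spectrum_rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

def postprocess(p, mats, fwd, target, n_samp=100):
    """Sample candidates for one thickness at a time from its mixture, score
    each with the forward model, fix the best value, then move on.
    p: MDN parameters, shape (n_layers, 3, K_MIX)."""
    mu, log_sig, logit_pi = p[:, 0], p[:, 1], p[:, 2]
    pi = np.exp(logit_pi - logit_pi.max(1, keepdims=True))
    pi /= pi.sum(1, keepdims=True)                 # mixing weights per layer
    thk = mu[np.arange(len(mu)), pi.argmax(1)]     # single-shot starting point
    for i in range(len(thk)):
        comp = np.random.choice(K_MIX, size=n_samp, p=pi[i])
        cands = np.random.normal(mu[i, comp], np.exp(log_sig[i, comp]))
        best_err = spectrum_rmse(predict_spectrum(fwd, mats, thk), target)
        for c in cands:
            old, thk[i] = thk[i], c
            err = spectrum_rmse(predict_spectrum(fwd, mats, thk), target)
            if err < best_err:
                best_err = err                     # keep the improvement
            else:
                thk[i] = old                       # revert this candidate
    return thk
```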
We demonstrate the flexibility of our nested transfer procedure on a specific application: the use of thin-film structures for selective thermal emission. Thin-film stacks can be used as optical filters to enhance transmission, reflection, or absorption over large bandwidths and with high contrast.36,56 At infrared wavelengths, these properties have well-established connections to the thermal emissivity of materials.58
Figure 4. (a) Diagram of the thin-film structure used for selective thermal emission. (b) Comparison between the requested ground-truth spectrum and the spectrum produced by the model-suggested design for an arbitrary test data set sample.
4 Discussion
We propose a method of iterative nested transfer learning to gradually build forward prediction and inverse design models of increasing complexity while using small data sets at each step. The forward model can accurately reproduce arbitrary spectra for 30-layer thin-film stacks. The approach is extended to inverse design models, which are built up from 2 to 10 layers of thin-film stacks while allowing a free material choice at each layer. A postprocessing method using a pretrained forward network further reinforces the design accuracy. The forward model uses a bidirectional RNN-based architecture, and the inverse model uses a convolutional MDN architecture. The complexity arising from the broad wavelength range and free material choice makes these some of the most challenging tasks demonstrated in DL-based inverse design. The accuracy of the forward model in dealing with up to 30 structure layers with the same free choice of materials is also among the most complex modeling results previously shown. Despite the high degree of complexity, the nested transfer method combined with postprocessing allows for accurate recreations of arbitrary spectra while keeping data requirements modest. Finally, the same architecture and training approach are applied to a modified data set for predicting designs for selective thermal emitters, generating close approximations of idealized, physically unrealizable spectra for thermophotovoltaic applications. While the results here are restricted to thin-film stacks, the same approach of gradually building complexity with transfer learning can be extended to a wider variety of structures for which generating a suitably large data set at the desired degree of complexity may not be computationally feasible. Even for structures that require full-wave simulations, larger data sets can be generated quickly for simplified versions with reduced degrees of freedom, and small data sets can continue to be used as the complexity and number of design variables increase. In another vein, beyond transferring information to cope with increasing layer numbers, generalization of geometry to, e.g., multilayer core shells has proved viable.41 If the problem is formulated properly, it might also be possible to ease the augmentation of materials, benefiting the search across all dimensions of the design space. The use of RNNs in optical modeling, especially for structures and processes that can be described by sequential inputs/outputs, is also worth further exploration. We foresee that this will enable high-performance inverse design models that previously would have been computationally unfeasible, allowing new application-specific designs to be searched for.
Acknowledgment
The authors acknowledge the financial support of the National Institute of General Medical Sciences of the National Institutes of Health (1R01GM146962-01).
Rohit Unni received his PhD in materials science from the University of Texas at Austin in 2024 and his bachelor's degree from Washington University in St. Louis in 2016. His research interests center on bridging nanophotonics and machine learning from multiple directions, including inverse design, computer vision, and next-generation foundation models.
Kan Yao is currently a research fellow at the University of Texas at Austin. He received his PhD in electrical engineering from Northeastern University, Boston, USA, in 2017. His research interests span various areas of photonics, such as plasmonics, metamaterials and metasurfaces, light-matter interactions, chiroptics, quantum photonics, and device design.
Yuebing Zheng is a professor at the University of Texas at Austin. He holds the Cullen Trust for Higher Education Endowed Professorship in Engineering. He received his PhD from Pennsylvania State University in 2010 and did postdoctoral research at the University of California, Los Angeles from 2010 to 2013. His research is at the forefront of optics and photonics, where his group innovates optical manipulation and measurement to transform scientific research and tackle pressing global challenges.
[3] K. Yao, Y. Zheng. Nanophotonics and Machine Learning: Concepts, Fundamentals, and Applications (2023).
[5] E. Alpaydin. Introduction to Machine Learning (2014).
[50] Ö. Batur Dinler, N. Aydin. An optimal feature parameter set based on gated recurrent unit recurrent neural networks for speech segment detection. Appl. Sci., 10, 1273 (2020).
[52] S. Hochreiter et al. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. A Field Guide to Dynamical Recurrent Neural Networks, 237-243 (2001).
[56] M. Born, E. Wolf. Principles of Optics: Electromagnetic Theory of Propagation, Interference and Diffraction of Light (1999).
Rohit Unni, Kan Yao, Yuebing Zheng, "Nested deep transfer learning for modeling of multilayer thin films," Adv. Photon. 6, 056006 (2024)
Category: Research Articles
Received: Mar. 31, 2024
Accepted: Sep. 11, 2024
Posted: Sep. 12, 2024
Published Online: Oct. 24, 2024
Author email: Yuebing Zheng (zheng@austin.utexas.edu)