Advanced Photonics, Volume 7, Issue 2, 024001 (2025)

Symbiotic evolution of photonics and artificial intelligence: a comprehensive review

Fu Feng1,†, Dewang Huo1,2, Ziyang Zhang1, Yijie Lou1, Shengyao Wang3, Zhijuan Gu4, Dong-Sheng Liu5,6, Xinhui Duan1, Daqian Wang1, Xiaowei Liu1, Ji Qi1,*, Shaoliang Yu1, Qingyang Du1,*, Guangyong Chen7,*, Cuicui Lu3,*, Yu Yu4,*, Xifeng Ren5,6,*, and Xiaocong Yuan1,*
Author Affiliations
  • 1Zhejiang Lab, Research Center for Frontier Fundamental Studies, Hangzhou, China
  • 2Westlake Institute for Optoelectronics, Zhejiang Key Laboratory of 3D Micro/Nano Fabrication and Characterization, Hangzhou, China
  • 3Beijing Institute of Technology, School of Physics, Center for Interdisciplinary Science of Optical Quantum and NEMS Integration, Key Laboratory of Advanced Optoelectronic Quantum Architecture and Measurements of Ministry of Education, Beijing Key Laboratory of Nanophotonics and Ultrafine Optoelectronic Systems, Beijing, China
  • 4Huazhong University of Science and Technology, Wuhan National Laboratory for Optoelectronics, Wuhan, China
  • 5University of Science and Technology of China, CAS Key Laboratory of Quantum Information, Hefei, China
  • 6University of Science and Technology of China, CAS Center for Excellence in Quantum Information and Quantum Physics, Hefei, China
  • 7Zhejiang Lab, Research Center for Life Sciences Computing, Hangzhou, China

    The rapid advancement of artificial intelligence (AI) has significantly impacted photonics, creating a symbiotic relationship that accelerates the development and applications of both fields. From the perspective of AI aiding photonics, deep-learning methods and various intelligent algorithms have been developed for designing complex photonic structures, where traditional design approaches fall short. AI’s capability to process and analyze large data sets has enabled the discovery of novel materials, such as for photovoltaics, leading to enhanced light absorption and efficiency. AI is also significantly transforming the field of optical imaging with improved performance. In addition, AI-driven techniques have revolutionized optical communication systems by optimizing signal processing and enhancing the bandwidth and reliability of data transmission. Conversely, the contribution of photonics to AI is equally profound. Photonic technologies offer unparalleled advantages in the development of AI hardware, providing solutions to overcome the bottlenecks of electronic systems. The implementation of photonic neural networks, leveraging the high speed and parallelism of optical computing, demonstrates significant improvements in the processing speed and energy efficiency of AI computations. Furthermore, advancements in optical sensors and imaging technologies not only enrich AI applications with high-quality data but also expand the capabilities of AI in fields such as autonomous vehicles and medical imaging. We provide comprehensive knowledge and a detailed analysis of the current state of the art, addressing both challenges and opportunities at the intersection of AI and photonics. The multifaceted interactions between AI and photonics will be explored, illustrating how AI has become an indispensable tool in the development of photonics and how photonics, in turn, facilitates advancements in AI. 
Through a collection of case studies and examples, we underscore the potential of this interdisciplinary approach to drive innovation, proposing challenges and future research directions that could further harness the synergies between AI and photonics for scientific and technological breakthroughs.


    1 Introduction to Photonics and AI

    The development of artificial intelligence (AI) has undergone multiple transformations since its inception in the 1940s, when McCulloch and Pitts introduced the concept of neural networks.1 In its early stages, AI focused on symbolic AI based on logic, resembling the logical circuits of hardware systems such as computers.2 During the latter part of the 20th century, the emphasis shifted to Bayesian AI, grounded in statistics and recognizing the inherent randomness of the natural world.3 This era saw the development of notable algorithms such as support vector machines (SVMs)4 and AdaBoost,5 which spurred significant progress in AI research. These approaches were effective for data with simpler distributions but struggled with more complex ones. The rise of neural AI, driven by the advent of deep learning and fueled by big data and immense computational power, marked a pivotal paradigm shift in the early 21st century.6 Initially, deep-learning models involved relatively fewer parameters compared with the later, more complex architectures, but a breakthrough came in 2012 when Krizhevsky et al.7 successfully trained a deeper neural network (AlexNet) using large data sets, demonstrating the superior performance of deep learning over traditional machine learning in handling complex data sets. Deep learning, particularly with a specific architecture of convolutional neural networks (CNNs), revolutionized domains such as image recognition,8,9 natural language processing,10 and speech recognition,11 with many algorithms being commercialized. Unlike earlier machine-learning techniques, deep learning requires substantial data and significant computational resources, leading to the development of AI-oriented hardware solutions such as graphics processing units, tensor processing units, and neural network processing units. 
These technologies have significantly accelerated model training and inference, enabling the construction of models with large numbers of parameters capable of tackling more complex tasks. Recently, the field of AI has experienced transformative developments with the emergence of generative AI and large models such as generative pretrained transformers (GPTs).12 By learning from vast amounts of natural text produced by humans, GPT has demonstrated an advanced understanding of the intricate structures and logic underlying human language, successfully passing the Turing test. However, the exponential growth in the size of these models, with parameter counts increasing 10-fold annually, far outpaces advancements in computing power in the post-Moore era. This discrepancy underscores the urgent need for new computing paradigms to support modern generative models, as the current rate of progress in hardware development may not suffice to keep up with the demands of evolving AI.

The blossoming of AI has profoundly impacted fundamental disciplines, and its close ties to photonics have been evident since the inception of neural networks.13 Deep-learning methods and various intelligent algorithms have been applied to design complex photonic structures14 where traditional design approaches fall short. AI’s capability to process and analyze large data sets has enabled the discovery of novel materials, such as for photovoltaics, leading to enhanced light absorption and efficiency.15 With the rapid advancement of optical imaging technologies—exemplified by the integration of high-performance charge-coupled device (CCD) sensors into smartphones—capturing high-quality images has become a standard and effortless process. In turn, as these detailed visuals are shared online, they frequently appear alongside textual descriptions, location data, and other contextual information, transforming the Internet into a rich source of multimodal data. This expansive, multimodal environment has subsequently fueled the growth of deep-learning methodologies, including CNN-based models, that leverage these diverse data streams for more nuanced analysis and understanding. The integration of AI and photonics has not only enhanced image-processing capabilities but also revolutionized fields such as medical imaging,16 astronomy,17 and autonomous driving.18 High-resolution and high-speed image-capture technologies have enabled precise diagnostic tools in healthcare, improved the clarity of celestial observations,16 and enhanced the safety and efficiency of autonomous vehicles.18 Benefiting from the progress in CNN technology, sophisticated network architectures such as U-net have been proposed to address tasks closely related to photonics, including segmentation, deblurring, superresolution, and image registration.
U-net, with its encoder–decoder structure, has proven highly effective in segmenting medical images, thus aiding early disease detection and treatment planning.19 Similarly, superresolution techniques have been pivotal in enhancing the details of satellite imagery, providing better data for environmental monitoring and urban planning.20 These tasks constitute a major part of the low-level tasks in machine vision, forming the foundation for more complex applications such as object detection, scene understanding, and augmented reality.21 AI also profoundly influences optical communication,22 optical performance monitoring (OPM),23 and optical sensing signal processing,24 where it helps tackle complicated physical phenomena. As AI continues to evolve, its synergy with photonic technologies promises to unlock new potential, driving innovation and improving accuracy and efficiency across various domains.

    Photonics, the science of light generation, manipulation, and detection, plays a pivotal role in technological advancements across various sectors. The invention of the laser in 1960 marked a revolutionary breakthrough in the field of photonics,25 followed by the rise of fiber-optic communications in the 1970s, allowing information to be transmitted and manipulated at the speed of light through fine optical fibers.26 In the 1980s, breakthroughs in nonlinear optics, such as frequency conversion technology, expanded the applications of lasers.27 The discovery of photonic crystals in the 1990s promoted the development of new optical devices,28 and the advent of superresolution imaging techniques in the 2000s allowed optical microscopes to surpass the traditional diffraction limit.29 The development of photonic integrated circuit (PIC) technology in the 2010s enabled the integration of photonic devices onto a single chip, achieving higher speed and lower power consumption.30,31 With the rapid development of photonics and computing power in the 2020s, AI has injected new vitality into photonics research and applications,32 moving beyond pattern classification33 and image reconstruction,34 which date back to the 1990s. AI has greatly enhanced the functionality and application scope of photonics in areas such as optical design optimization, complex data processing, and intelligent sensing systems. In the future, photonics, aided by AI, is expected to revolutionize information processing and transmission, further driving technological and societal progress.

    The advancement of photonics is also revolutionizing the development of AI with its merits of fast speed and low power. The development of integrated photonics is at the forefront, with research focusing on creating compact, efficient, and scalable photonic circuits that can perform complex computations.32 Photonic circuits, leveraging the unique properties of light, offer a promising solution for efficient matrix computation. Matrix computation based on coherent photonic circuits utilizes the phase and amplitude of light waves to perform highly parallel and energy-efficient computations, surpassing traditional electronic methods in both performance and efficiency. In the photonic circuit, the 2×2 Mach–Zehnder interferometer (MZI) is one of the principal computational units. In 2017, Shen et al.32 experimentally demonstrated a two-layer optical neural network utilizing an MZI mesh construction and showed its utility for vowel recognition. In 2023, Wu et al.35 used an incoherent light source as input signals and designed a simplified MZI mesh to perform real-valued matrix-vector multiplication to ease challenges for packaging and calibration with only half the number of phase shifters. Matrix computation based on incoherent photonic circuits provides an alternative approach by utilizing the intensity of light, offering robust and scalable solutions despite the lack of coherence. Advances in optical computing based on PICs represent a significant leap in the integration and miniaturization of photonic components, enabling compact and high-performance optical processors capable of handling complex AI tasks. However, challenges such as hybrid integration of the photonics circuits,36 integration with electronic components,37 managing heat dissipation,38 and overcoming material limitations39 remain. From the material perspective, silicon photonics has promoted photonic computing with a compact, low-cost, and highly reliable chip. 
Limited by its inherent material properties, however, silicon alone cannot excel in every aspect of photonic computing. Emergent materials, such as lithium niobate on insulator,40 semiconductor optical amplifiers (SOAs),41 nonvolatile phase-change materials,42 ferroelectrics,43 and quantum dots,44 have been explored to augment silicon photonics toward a faster, smarter, and more efficient engine capable of solving advanced tasks.
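To make the MZI building block concrete, the sketch below models a single 2×2 MZI in numpy as two 50:50 couplers enclosing an internal phase shifter, preceded by an external phase shifter. The phase-shifter placement is one common convention, taken here as an assumption; the cited works may use a different parameterization.

```python
import numpy as np

def mzi(theta, phi):
    """2x2 transfer matrix of a Mach-Zehnder interferometer: an
    external phase shifter (phi) on one input, then two 50:50
    couplers with an internal phase shifter (theta) between them.
    Phase-shifter placement conventions vary across the literature."""
    bs = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)  # 50:50 coupler
    internal = np.diag([np.exp(1j * theta), 1.0])
    external = np.diag([np.exp(1j * phi), 1.0])
    return bs @ internal @ bs @ external

# Each MZI is unitary, so a mesh of them composes into a larger
# lossless unitary that performs optical matrix-vector products.
T = mzi(0.7, 1.3)
x = np.array([0.6, 0.8])
y = T @ x  # a 2-mode optical matrix-vector multiplication
```

Because every MZI is unitary, cascading them in a triangular or rectangular mesh realizes a larger unitary matrix, which is what lets such circuits implement the weight matrices of an optical neural network without optical loss in the ideal case.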

The diffraction principle offers an alternative route to optical computing that handles large matrix computations effectively, allowing light to propagate freely in space and substantially enhancing computing density and scalability. This attribute has attracted numerous researchers to apply the diffraction scheme to neural networks, which often require large-scale matrix computations. Using light diffraction for optical convolution can achieve fast and energy-efficient image processing. This has led to the exploration of more flexible computing schemes for diffractive optics, paralleling the principles of artificial neural networks (ANNs) in the development of optical neural networks that exploit diffractive properties. Optical neural networks offer potential advantages over electronic neural networks, such as high throughput, light-speed processing, low power consumption, and high parallelism.45–48 In 2018, the Ozcan group pioneered the use of the diffractive properties of light to perform neural network inference, introducing this approach as the diffractive deep neural network (D2NN).13 Since then, optical neural networks have been widely explored to fulfill complex computing tasks, such as feature detection,13,49–51 object classification,13,52–56 speech recognition,32 and optical element design.57–59
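The core operation of one diffractive layer can be sketched as a learnable phase mask followed by free-space propagation. Below is a numpy illustration using the angular-spectrum method; all parameters (grid size, wavelength, pixel pitch, propagation distance) are illustrative assumptions, not values from the cited works.

```python
import numpy as np

def angular_spectrum(field, wavelength, dx, z):
    """Free-space propagation of a 2D complex field over distance z
    via the angular-spectrum (FFT-based) method."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx)
    # Propagating-wave kernel; evanescent components get zero phase.
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    kz = 2 * np.pi / wavelength * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * kz * z)
    return np.fft.ifft2(np.fft.fft2(field) * H)

def diffractive_layer(field, phase_mask, wavelength, dx, z):
    """One D2NN-style layer: a trainable phase mask, then diffraction."""
    return angular_spectrum(field * np.exp(1j * phase_mask), wavelength, dx, z)

# Illustrative numbers: a 64x64 phase mask illuminated by a plane wave.
rng = np.random.default_rng(0)
field = np.ones((64, 64), dtype=complex)
mask = rng.uniform(0, 2 * np.pi, (64, 64))
out = diffractive_layer(field, mask, wavelength=750e-9, dx=40e-6, z=0.03)
```

In a trained diffractive network, several such layers are cascaded and the phase masks are the trainable parameters; note that both the phase mask and the unit-modulus propagation kernel conserve optical energy, which is why the computation is passive.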

    Photonics and AI are both at the vanguard of technological evolution, charting a course toward unprecedented innovations across disciplines. The synergy between AI and photonics is illustrated in Fig. 1. In the category of AI for photonics, the symbiosis, intertwining the manipulation of light with intelligent algorithms, has catalyzed advancements in photonic design, optical imaging, optical communication, and optical data analysis, among others. In the complementary category of photonics for AI, the adoption of photonic platforms and novel materials to implement AI computations promises unprecedented performance gains for AI. As this review paper unfolds, we delve into the intricate dance between photonics and AI, exploring how their combined force is reshaping the technological landscape across various fields, how AI can enhance photonic technologies, and how photonic methods can boost the performance of AI systems.


    Figure 1. Schematic of the synergy between photonics and AI.

    Section 2 focuses on the role of AI in enhancing photonic systems. It explores how AI contributes to the design and optimization of photonic devices, utilizing methods such as deep learning, gradient-based inverse design, and global optimization. This section also examines AI’s impact on optical imaging, covering topics such as phase, polarization, and spectral imaging. Furthermore, it highlights AI’s contributions to optical data acquisition and analysis, particularly in areas such as optical signal processing, optical communication nonlinear compensation, performance monitoring, and sensing.

    Section 3 shifts the focus to how photonic technologies are advancing AI. It emphasizes the use of photonic circuits to accelerate matrix computation, a key operation in AI algorithms, and explores optical computing based on light diffraction, which holds the potential for faster and more energy-efficient AI systems. In addition, this section addresses emergent materials for photonic computing, including new photonic components and structures that could potentially improve the performance of AI systems.

    In Sec. 4, a brief discussion about the challenges and future directions is given, highlighting the development of the integration of quantum technologies with AI and photonics and the profound impact of AI and photonics in healthcare. Together, these sections provide a comprehensive review of the mutually beneficial relationship between AI and photonics, demonstrating how each field drives the progress of the other.

    2 AI for Photonics

    AI has made significant contributions to a wide range of fields, with photonics being one of the most impactful areas of application. The ability of AI to handle vast amounts of data, optimize complex systems, and make predictions with high accuracy has catalyzed advancements in photonic technologies. In particular, AI has shown great promise in the design, optimization, and analysis of photonic systems, facilitating the development of new materials, devices, and imaging techniques. By leveraging deep-learning algorithms and optimization methods, AI is driving innovation in photonics, enabling faster, more efficient designs and novel applications.

    This section will explore the role of AI in photonics, focusing on key areas where AI is making a transformative impact. First, we will discuss the design and optimization of photonic systems using AI, highlighting methods such as deep-learning methods, gradient-based inverse design, global optimization, and individually inspired algorithms. Next, we will examine how AI is revolutionizing optical imaging techniques, including phase, polarization, and spectral imaging. Finally, this section will cover AI’s contributions to optical data acquisition and analysis, particularly in optical signal processing, optical communication, performance monitoring, and sensing parameter analysis. Through these discussions, we will demonstrate the profound impact of AI on photonics.

    2.1 Design and Optimization

    Designing and optimizing advanced photonic devices with excellent properties has been a long-term goal in the field of photonics. Over time, researchers have faced challenges in achieving high performance and multifunctionality, which prompted the exploration of novel approaches. The introduction of AI for the design and optimization of these devices has brought about innovative functionalities and dramatic improvements in performance.

This achievement not only demonstrates the great potential of AI in photonics but also highlights important directions for further exploration and optimization in the future. In this section, we will introduce the history, design principles, and typical applications of various intelligent algorithms in photonics. These methodologies encompass deep-learning methods, gradient-based inverse design, global optimization, and individually inspired algorithms, each offering unique advantages for addressing complex design and optimization challenges. Figure 2 provides an overview of these approaches, categorizing their applications and methodologies. The specific details of these approaches are further elaborated in Secs. 2.1.1–2.1.4.


    Figure 2. Artificial intelligence for photonic devices.

    2.1.1 Deep-learning methods

Deep learning, an exceptional and transformative subdivision of machine learning, is the driving force behind the rapid advancement of AI. Backed by neural networks, deep learning has become one of the most widely used machine-learning frameworks. Its algorithms more closely mimic the human learning process, and as a result, it has achieved notable success in areas such as speech recognition, image processing, and classification. An inverse-design workflow based on deep learning consists mainly of two parts: the forward design process and the inverse design process of neural networks. The core of deep-learning technology lies in the construction of ANNs.

Artificial neurons are the basic information-processing units of deep neural networks (DNNs), formed by mimicking the workings of biological neurons. An artificial neuron consists of input nodes, output nodes, weights $W_i$, a bias $b$, a weighted sum of the inputs, and an activation function, as shown in Fig. 3(a). Input and output nodes are equivalent to the dendrites and synapses of biological neurons, respectively, and are responsible for signal reception and transmission. The weights represent the modulation performed by the axon when it receives a stimulus, and the bias represents the cell body’s own resting potential; the weighted sum of the inputs and the activation function together implement the function of the cell body. Commonly used activation functions include the sigmoid, tanh, and ReLU functions. Neurons receive messages from different sources through their activation states and pass them on to other neurons. In other words, deep learning is a hierarchical approach to representation learning: it is constructed by sequentially combining simple yet nonlinear modules, each designed to transform a low-level representation into a more abstract, higher-level one. Through combinations of such transformations, deep learning can learn complicated functional representations.60 An ANN is a machine-learning model inspired by the structure and function of neuronal networks in the human brain. It consists of a series of interconnected nodes (i.e., artificial neurons) organized in a hierarchical structure. ANNs typically contain an input layer, hidden layers, and an output layer, where the neurons in each layer are responsible for receiving, processing, and transmitting information. The input layer is the first layer of the network and receives raw data as input to the neural network, such as image pixels, time-series data, or physical signals.
The output layer is the last layer of the network and generates the final results based on the task requirements, such as classification labels, regression values, generated images, or other structured data. The intermediate layers between the input and output layers are the hidden layers, which contain multiple neurons that process the input data through nonlinear activation functions to extract complex features and underlying patterns. The number and size of the hidden layers directly determine the depth and expressiveness of the network, which in turn affect its learning and inference ability. The layers are fully connected to each other, forming the structure of a DNN, as shown in Fig. 3(b).
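As a minimal illustration of the structure just described, the sketch below implements a single artificial neuron and a small fully connected network in numpy. The layer sizes and the choice of a sigmoid activation are arbitrary assumptions for the example.

```python
import numpy as np

def sigmoid(a):
    """A common activation function; tanh or ReLU could be used instead."""
    return 1.0 / (1.0 + np.exp(-a))

def neuron(x, w, b):
    """One artificial neuron: weighted sum of inputs plus bias,
    passed through a nonlinear activation."""
    return sigmoid(np.dot(w, x) + b)

def mlp_forward(x, layers):
    """Fully connected network: each layer is a (W, b) pair. Hidden
    layers extract features; the final layer produces the output."""
    for W, b in layers:
        x = sigmoid(W @ x + b)
    return x

rng = np.random.default_rng(1)
# 4 inputs -> 8 hidden neurons -> 2 outputs, randomly initialized.
layers = [(rng.normal(size=(8, 4)), np.zeros(8)),
          (rng.normal(size=(2, 8)), np.zeros(2))]
y = mlp_forward(rng.normal(size=4), layers)
```

Stacking more `(W, b)` pairs deepens the network, which is all that distinguishes a DNN structurally from this two-layer example.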


    Figure 3. ANN modeling. (a) Artificial neuron structure, (b) ANN model structure, and (c) photonic devices are described by two types of labels: physical variables x and physical responses y.

To train the ANN, a large training data set is first generated through electromagnetic simulation. These training data are used to iteratively adjust the weights of the neurons until the network accurately captures the data distribution of the training set. At initialization, the weights of all neurons are randomly assigned. For the network to learn the input–output relationships correctly, the weights must be adjusted iteratively so that the network’s input–output mapping gradually approaches the target values in the training set. The weights are updated through the backpropagation algorithm,61,62 which optimizes network performance by minimizing the loss function. The loss function measures the deviation between the network output and the true values in the training set. An ANN can use a variety of loss functions, with the most appropriate type chosen according to the task; common choices include mean squared error loss, cross-entropy loss, and mean absolute error loss. Among these, mean squared error loss is widely recognized as one of the most commonly used loss functions due to its excellent performance on continuous numerical prediction problems:

$$L(w_i, b_k) = \frac{1}{N}\sum_{l=1}^{N}\left[y_{\mathrm{train},l} - y_{\mathrm{ANN}}(x_{\mathrm{train},l})\right]^2,$$

where $x_{\mathrm{train}}$ and $y_{\mathrm{train}}$ are $N$ random samples from the training data, $y_{\mathrm{ANN}}$ is the network’s prediction for $x_{\mathrm{train}}$, and $N$ is called the batch size. If $N$ equals the size of the training set, the entire training set is used at each iteration, a method called batch gradient descent. If $N$ equals 1, one training sample is randomly selected at each iteration, which is called stochastic gradient descent. If $N$ lies between 1 and the size of the training set, a portion of the training samples is randomly selected at each iteration, which is called mini-batch gradient descent. Mini-batch gradient descent is the most widely used training method in practice because it effectively approximates the gradient computed over the entire training set while balancing the computational cost of training. Backpropagation is used to compute the adjustment of the network weights so that the loss decreases along its gradient $\nabla_w L$. Consider a discriminative network consisting of a single neuron [shown in Fig. 3(a)] that is trained using stochastic gradient descent. Starting from the gradient $\partial L/\partial y$ of the loss function with respect to the output $y$, this gradient is backpropagated to the weights $w$ by the chain rule. To compute $\partial L/\partial w$, one first computes $\partial L/\partial a = (\partial L/\partial y)\,(\partial y/\partial a)$, which specifies how the pre-activation $a$ in $y = \varphi(a)$ should be adjusted to reduce the loss. Then $\partial L/\partial w = (\partial L/\partial a)\,(\partial a/\partial w)$ is computed, which further indicates how the weights $w$ in $a(w)$ should be adjusted. Backpropagation generalizes readily to deep networks containing multiple layers of connected neurons, enabling the computation of the gradient $\nabla_w L$ for each neuron. Once $\nabla_w L$ is computed for all neurons, all weight vectors are updated by gradient descent, $w \leftarrow w - \alpha \nabla_w L$, where $\alpha$ is the learning rate.
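The single-neuron training procedure just described can be sketched in a few lines of numpy. The toy task (learning logical OR) and the hyperparameters are illustrative assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_neuron(X, t, alpha=0.5, batch=2, steps=5000, seed=0):
    """Mini-batch gradient descent for a single sigmoid neuron with
    MSE loss, using the chain rule dL/dw = dL/dy * dy/da * da/dw."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    b = 0.0
    for _ in range(steps):
        idx = rng.choice(len(X), size=batch, replace=False)
        Xb, tb = X[idx], t[idx]
        a = Xb @ w + b                  # weighted sum of inputs
        y = sigmoid(a)                  # neuron output phi(a)
        dL_dy = 2.0 * (y - tb) / batch  # gradient of the MSE loss
        dy_da = y * (1.0 - y)           # derivative of the sigmoid
        delta = dL_dy * dy_da           # dL/da by the chain rule
        w -= alpha * Xb.T @ delta       # w <- w - alpha * grad_w L
        b -= alpha * delta.sum()
    return w, b

# Toy task (an illustrative assumption): learn logical OR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0.0, 1.0, 1.0, 1.0])
w, b = train_neuron(X, t)
pred = sigmoid(X @ w + b)
```

With `batch=2` this is mini-batch gradient descent; setting `batch=1` or `batch=len(X)` recovers the stochastic and batch variants discussed above.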

    Photonic devices modeled by neural networks can be described by two types of labels, as shown in Fig. 3(c). The first type of label describes the physical variables of the device, including geometry, material properties, and electromagnetic excitation source, which are denoted by x. The second type of label describes the physical responses of the device, i.e., a series of outputs related to spectral or performance characteristics, which are denoted by y. In electromagnetism, the physical response is usually a single-valued function of the physical variables, so that a given input is uniquely mapped to an output. For example, a metasurface consisting of nanopillars with a fixed geometry and material configuration will produce a unique transmission spectrum. However, this mapping relationship is irreversible. For most electromagnetics problems, a given physical response usually corresponds to more than one possible input. For example, a transmission spectrum can correspond to different arrangements of nanopillars that can form different metasurfaces. To solve this problem, forward and inverse neural networks are usually used.

Forward networks model the direct mapping from input to output, offering high computational efficiency and accuracy in predicting physical responses. They are particularly well suited to deterministic problems where a single-valued relationship exists between input and output. In addition, forward networks can serve as surrogate solvers to replace computationally expensive simulations, significantly reducing the time required for tasks such as device optimization or design validation. However, their primary limitation is their inability to handle inverse problems, in which multiple possible inputs may correspond to a given output. This limits their effectiveness in applications that require exploring the entire solution space or identifying all possible inputs for a given output. Conversely, inverse networks attempt to solve the inverse mapping from output to input, where the challenge lies in capturing the inherent one-to-many ambiguities. A major difficulty for discriminative networks in such inverse problems is that they learn direct input–output mappings and thus struggle to cope with the multivaluedness and ambiguity of inverse problems. Lacking a generative mechanism, discriminative networks usually fail to capture the complex multimodal relationships between inputs and outputs and therefore perform poorly on inverse problems. This limitation is reflected mainly in the optimization objective: discriminative networks typically minimize the mean squared error, which tends to produce a blurred average solution while ignoring the diversity of the solution space.

    By contrast, tandem networks, which integrate forward and inverse networks in a single framework, effectively address one-to-many ambiguities,63 as shown in Fig. 4(a). Specifically, a tandem network first trains a forward model, ensuring the physical consistency of predictions. On top of this forward model, an inverse model is trained to explore the solution space, leveraging the forward model to constrain and validate the inverse predictions. This method provides more stable and diversified solutions for inverse problems, reducing ambiguity in the solution space. Such an approach is particularly suitable for complex photonic device design tasks with multiple solutions. Therefore, depending on the type of device labeling handled by the neural network, different classes of architectures—forward, inverse, or tandem networks—should be considered to adapt to the characteristics of the forward problem from input to output and the inverse problem from output to input.
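The tandem idea, first fitting a forward model and then training an inverse model through the frozen forward model, can be illustrated with linear stand-ins for both networks. Everything below (the wide linear "simulator", the least-squares forward fit, the learning rate and step counts) is an illustrative assumption, not the architecture of any cited work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "simulator": a wide linear map, so several designs x give
# the same response y -- the one-to-many ambiguity of inverse design.
A = rng.normal(size=(2, 5)) * 0.5    # 5 design variables -> 2 responses
X = rng.normal(size=(400, 5))        # training designs
Y = X @ A.T                          # simulated responses

# Step 1: fit the forward model F(x) = Wf @ x (here by least squares;
# in practice this is itself a trained neural network).
Wf = np.linalg.lstsq(X, Y, rcond=None)[0].T    # shape (2, 5)

# Step 2: train the inverse model I(y) = Wi @ y through the *frozen*
# forward model, minimizing the tandem loss ||y - F(I(y))||^2.
Wi = rng.normal(size=(5, 2)) * 0.1
alpha = 0.1
for _ in range(5000):
    R = Y - (Y @ Wi.T) @ Wf.T                  # residual y - F(I(y))
    Wi += alpha * Wf.T @ (R.T @ Y) / len(Y)    # gradient step on Wi

# The inverse model now proposes a design whose simulated response
# matches the target, even though the inverse mapping is non-unique.
y_target = Y[0]
x_design = Wi @ y_target
```

The key design choice mirrored here is that only the inverse weights are updated: the forward model constrains the inverse predictions to be physically consistent, so the trainer never has to decide between the many designs that produce the same response.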

    As for the design and evaluation of nanophotonic devices, predicting the optical response usually involves solving Maxwell’s equations numerically, and the process is often time-consuming. However, a trained ANN can rapidly predict the optical response through forward propagation (FP). Moreover, a trained neural network can efficiently aid in the design of nanophotonic devices. In recent years, the application of neural networks in photonics has made great progress, benefiting from the development of deep learning itself and open-source software libraries such as TensorFlow.9,64
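The surrogate-solver role of a trained forward network can be sketched with a toy example: a small network trained by backpropagation to reproduce a smooth response curve standing in for an expensive simulation. The target function, network size, and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an expensive electromagnetic simulation: a smooth,
# single-valued response curve (illustrative only, not a Maxwell solver).
x = np.linspace(-1, 1, 200)[:, None]
y = np.sin(3 * x)

# One-hidden-layer surrogate network trained by full-batch
# gradient descent with backpropagation.
W1 = rng.normal(size=(1, 16)); b1 = np.zeros(16)
W2 = rng.normal(size=(16, 1)) * 0.1; b2 = np.zeros(1)
alpha = 0.1
for _ in range(10000):
    h = np.tanh(x @ W1 + b1)          # hidden layer
    p = h @ W2 + b2                   # predicted response
    g_p = 2 * (p - y) / len(x)        # MSE gradient at the output
    g_W2 = h.T @ g_p
    g_b2 = g_p.sum(0)
    g_h = g_p @ W2.T * (1 - h ** 2)   # backpropagate through tanh
    g_W1 = x.T @ g_h
    g_b1 = g_h.sum(0)
    W2 -= alpha * g_W2; b2 -= alpha * g_b2
    W1 -= alpha * g_W1; b1 -= alpha * g_b1

# Once trained, evaluating the surrogate costs a few matrix products
# instead of a full simulation run.
pred = np.tanh(x @ W1 + b1) @ W2 + b2
```

This is the forward-propagation speedup described above: training is paid for once against simulated data, after which each prediction is effectively instantaneous compared with re-solving Maxwell’s equations.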

Subsequently, we introduce the applications of deep-learning methods for nanophotonic devices,14,63,65–128 such as metamaterials/metasurfaces,63,65,80–82,93–113,115–120 nanoparticles,67,79,114 photonic crystals,66,77,121–123 grating couplers,72,128 power splitters,73,124–126 microwave cloaks or antennae,68,78 silicon color design,71,127 optical storage,69 optical switches,75 soliton microcombs,70 and plasmonic nanodimers,74 as shown in Fig. 5. In addition, deep learning has proven effective in optimizing focused ion beam nanofabrication processes129 and enhancing the performance of coherent beam combining.130


    Figure 4.Different neural network architectures. (a) Tandem networks: these consist of several modules connected in series, and the different modules are connected to each other through an intermediate layer to form an overall network structure. (b) CNNs: these consist of multiple convolutional, pooling, and fully connected layers. The convolutional layer extracts the local features of the image, the pooling layer is used to reduce the dimensionality and enhance the generalization ability of the model, and the fully connected layer maps the extracted features to the output of the final task. (c) GANs: these consist of a generator and a discriminator. The generator produces fake data, and the discriminator judges the authenticity of the data; through adversarial training, the two are continually optimized until the generator can produce samples that closely resemble the real data. (d) Variational autoencoders: these consist of an encoder, which maps the input data to a probability distribution in the latent space, and a decoder, which reconstructs the data from the samples in the latent space. (e) Physics-informed neural networks: PINNs fit input–output relationships through neural networks while embedding physical equations (e.g., partial differential equations, initial and boundary conditions) as constraint terms in the loss function. During the training process, the network uses physical constraints to guide learning, realizing the integration of data-driven and physical models.

    We present several typical examples to introduce modeling architectures for ANNs. Malkiel et al.113 trained and tested a two-way DNN comprising two interconnected networks: one for geometric prediction and the other for spectral prediction. Once trained, the inverse network can be queried to derive the geometry of nanostructures based on measured or predicted transmission spectra. The resulting geometry can then be used as input to a trained direct network, which in turn computes the predicted transmission spectrum. In most cases, neural networks with more layers tend to show better performance. However, fully connected deep neural networks (FCDNNs) often face the gradient vanishing problem, so increasing the depth of FCDNN does not necessarily improve the performance. Tang et al.124 solved this problem using a residual DNN, which extends the training depth of the forward and inverse problems to eight hidden layers.
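The vanishing-gradient effect, and why residual connections mitigate it, can be illustrated numerically. The sketch below uses small random matrices as stand-ins for trained layer Jacobians (an assumption for illustration only): backpropagation multiplies layer Jacobians, which shrink the signal in a plain chain but not when each residual layer contributes I + W.

```python
import numpy as np

# Hedged sketch: backpropagated gradients pass through the product of layer
# Jacobians. With small random weights W_k, a plain chain multiplies by W_k
# (the signal shrinks layer by layer), whereas a residual layer y = x + W_k x
# contributes I + W_k, keeping the gradient alive through many layers.
rng = np.random.default_rng(1)
n_layers, width, sigma = 8, 16, 0.1
Ws = [rng.normal(0.0, sigma, (width, width)) for _ in range(n_layers)]

jac_plain = np.eye(width)
jac_res = np.eye(width)
for W in Ws:
    jac_plain = W @ jac_plain                 # plain chain: product of W_k
    jac_res = (np.eye(width) + W) @ jac_res   # residual chain: product of (I + W_k)

norm_plain = np.linalg.norm(jac_plain)
norm_res = np.linalg.norm(jac_res)
print(norm_plain, norm_res)  # the residual chain retains a far larger gradient
```

This is the mechanism that lets the residual DNN of Tang et al. extend training to eight hidden layers where a plain fully connected stack stalls.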

    CNNs, as a foundational architecture in deep learning, have revolutionized fields such as image recognition, object detection, and natural language processing.131 CNNs are now increasingly being applied in scientific domains, including the design and optimization of nanophotonic devices, due to their ability to process high-dimensional structured data efficiently. The core concept of CNNs lies in their unique architectural features: convolutional layers, pooling layers, and fully connected layers, as shown in Fig. 4(b). Convolutional layers apply learnable filters or kernels to input data to extract hierarchical features. These filters slide across the input, performing convolution operations that preserve spatial relationships and focus on localized patterns, such as edges or textures. Pooling layers then downsample the data, reducing its spatial dimensions while retaining the most critical information, which enhances computational efficiency and robustness to variations in the input. Two major advantages of CNNs over FCDNNs are parameter sharing and sparsity of connections. Parameter sharing: unlike FCDNNs, where each connection has a unique weight, CNNs reuse the same convolutional filters across different regions of the input, which dramatically reduces the number of trainable parameters, lowering the risk of overfitting and making the model more computationally efficient. Sparsity of connections: in CNNs, neurons in one layer are connected only to a small, localized region of the previous layer called the receptive field, rather than to all neurons. This local connectivity enables CNNs to focus on spatially related features, capturing hierarchical structures within the data. These features are particularly advantageous in the design and optimization of nanophotonic devices, where the data often exhibit spatial dependencies. 
For example, CNNs can efficiently process and analyze device geometries and material configurations to predict optical responses. Furthermore, the hierarchical feature extraction mechanism enables CNNs to learn multiscale representations, making them robust for tasks involving complex and diverse design parameters. Ma et al.105 proposed a CNN model that contains two bidirectional neural networks for the automatic design and optimization of three-dimensional (3D) chiral metamaterials with strong chiral optical response at specific wavelengths, as shown in Fig. 6(a).
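The scale of the parameter-sharing advantage is easy to quantify; the layer sizes below are illustrative assumptions, not values from any cited work.

```python
# Parameter counts for mapping a 64x64 single-channel field map to a
# same-sized feature map: a fully connected layer needs one weight per
# input-output pair, while a convolutional layer reuses one small kernel
# at every spatial position.
H = W = 64
fc_params = (H * W) * (H * W)   # dense weight matrix: 4096 x 4096
conv_params = 3 * 3             # a single shared 3x3 kernel
print(fc_params, conv_params)   # 16777216 vs 9
```

Even with many channels and kernels, a convolutional stack remains orders of magnitude smaller than its dense counterpart, which is what makes training on image-like device layouts tractable.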


    Figure 5.Application of deep-learning methods. (a) Metamaterials: demonstrate the process of metamaterial image evolution during a certain number of training steps.65 (b) Photonic crystal: mode switching among different bulk modes in a topologically trivial lattice designed by an ANN.66 (c) Nanoparticles: simultaneous inverse design of structural parameters and material information of core-shell nanoparticles from given electric and magnetic dipole extinction spectra using deep learning.67 (d) Microwave cloak: at 8.2-GHz frequency, the reflection spectrum shows that the spectrum predicted based on ANNs matches well the real spectrum obtained by simulation.68 (e) Optical storage: sketches of different geometric models encoding 2, 3, 4, or 5 bit sequences using ANNs to store the encoded information.69 (f) Soliton microcomb: second-order and higher-order dispersion is obtained from the target microcomb using the Lugiato–Lefever equation and genetic algorithm, and the microcavity geometry is obtained using a pretrained forward DNN coupled with GA.70 (g) Silicon color design: schematic of silicon nanostructures and generated colors.71 (h) Grating coupler: schematic diagram of the grating coupler structure, in which the guided light incident from the left is vertically diffracted by a column with a periodic staggered height of 220 nm and a grating with an L-shaped cross section partially etched to 110 nm.72 (i) Power splitter: forward and inverse modeling of nanophotonic devices using deep-learning networks, which can take the device topology design as input and the spectral response of components as labels and vice versa.73 (j) Plasmonic nanodimers: based on the analysis of Born–Kuhn-type plasmonic nanodimers, neural networks capable of successfully predicting chiral properties and further inverse design of the plasmonic structure to achieve the desired circular dichroism were designed.74 (k) Optical switch: all-optical plasmonic switches use neural networks to predict spectra through hidden layers after inputting geometric details.75


    Figure 6.Typical examples of nanophotonic devices based on deep-learning methods. (a) 3D chiral metamaterial: schematic of designed 3D chiral metamaterials and their predicted reflection and circular dichroism spectra.105 (b) Topology-optimized metasurface: schematic diagram of metasurface inverse design based on training of the GAN and topology optimization. The generated devices can be fed back to the neural network for retraining and optimization.99 (c) Power splitter: inverse design of power splitter based on GAN combined with simulation neural network and self-attention mechanism.125

    Another neural network with rapidly expanding applications is the generative adversarial network (GAN). GANs solve the challenging task of generating new samples that are statistically similar to a given data set. This process is inherently more difficult than discriminative modeling because it involves learning the data distribution rather than simply classifying or regressing data points.132 GANs comprise two components: a generator that creates new data instances and a discriminator that evaluates whether these instances are real (from the training set) or fake (from the generator), as shown in Fig. 4(c). These networks engage in a zero-sum game; the generator tries to fool the discriminator, whereas the discriminator aims to correctly distinguish real data from generated data.133 Through iterative training, the generator gradually produces increasingly realistic samples. Jiang et al.99 combined a GAN with an accompanying optimization framework for generating topologically optimized metasurface inverse designs, as shown in Fig. 6(b). Although GANs are very effective, the following challenges are often encountered. Mode collapse occurs when the generator produces a limited set of outputs, failing to capture the diversity present in the training data. Non-convergence occurs when the model fails to stabilize during training, persistently oscillating instead of reaching a steady solution. Vanishing gradients occur when the discriminator becomes too confident in distinguishing real and fake data, causing the gradients used to update the generator to diminish significantly and hindering its learning process. Exploding gradients occur when the gradients become excessively large during training, destabilizing the optimization process and preventing model convergence. To address these limitations, researchers have explored complementary models such as variational autoencoders (VAEs). 
VAEs are generative models that encode input data into a continuous latent space, allowing for smooth interpolation and probabilistic sampling, as shown in Fig. 4(d). Unlike GANs, VAEs maximize the likelihood of the data by optimizing both the reconstruction error and a regularization term that enforces a prior distribution on the latent space.134 This dual objective makes VAEs well suited for capturing complex data distributions. Ma et al.65 proposed the use of a VAE as a probabilistic model to design devices, and by employing a semisupervised learning strategy, the method effectively improves the performance of the model. In addition, GANs can be used in conjunction with other generative models to enhance the stability of the network. In the application of power splitter design, Xu et al.125 proposed an inverse design method that combines GANs, a simulation neural network, and a self-attention mechanism. The simulation neural network improves design accuracy through spectral comparison, whereas the self-attention mechanism extracts detailed features of the spectrum by exploring its global interconnections, as shown in Fig. 6(c).125
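The VAE's dual objective can be written down concretely. The snippet below is a minimal sketch (toy vectors, Gaussian encoder with diagonal covariance, squared-error reconstruction), not any specific model from the cited works; the KL term has a well-known closed form for a Gaussian posterior against a unit-Gaussian prior.

```python
import numpy as np

# Gaussian encoder q(z|x) = N(mu, diag(sigma^2)) against a unit-Gaussian
# prior: the VAE objective combines a reconstruction term with the
# closed-form KL divergence -0.5 * sum(1 + log_var - mu^2 - exp(log_var)).
def vae_loss(x, x_recon, mu, log_var):
    recon = np.sum((x - x_recon) ** 2)                             # reconstruction error
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))  # KL regularizer
    return recon + kl

x = np.array([1.0, 0.0])
# Perfect reconstruction with a posterior matching the prior: zero loss.
print(vae_loss(x, x, np.zeros(2), np.zeros(2)))   # 0.0
# Shifting the posterior mean away from the prior is penalized by the KL term.
print(vae_loss(x, x, np.ones(2), np.zeros(2)))    # 1.0
```

The KL term is what regularizes the latent space into the smooth, samplable distribution that makes interpolation between designs meaningful.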

    The importance of GANs and VAEs becomes particularly evident when addressing the curse of dimensionality. This phenomenon occurs in high-dimensional spaces, where data sparsity increases exponentially, making it challenging to analyze or model effectively. Traditional methods often fail to capture the underlying structure of such data due to their inability to handle these sparsity issues and the complex relationships between features. By leveraging adversarial training, GANs can model complex, high-dimensional data distributions effectively. The generator in GANs learns to produce data samples that mimic the real data distribution, whereas the discriminator guides this process by distinguishing between real and generated samples. This iterative approach allows GANs to learn detailed and realistic representations, which are invaluable for generating high-dimensional data such as images, videos, or even 3D models. VAEs approach the problem differently by learning a probabilistic latent space representation of the data. This latent space effectively reduces dimensionality while preserving the essential features of the high-dimensional data. VAEs use a combination of encoder–decoder architecture and a probabilistic framework to ensure that the generated samples are both realistic and diverse. This makes VAEs particularly useful for applications requiring smooth interpolation in high-dimensional spaces or probabilistic modeling of the data. By utilizing GANs and VAEs, we address the limitations imposed by the curse of dimensionality, enabling effective modeling, generation, and interpretation of high-dimensional data. This capability is indispensable for various fields, including image synthesis and medical data modeling.

    However, GANs and VAEs rely on data-driven approaches and may struggle in situations where data are scarce or when physical laws need to be explicitly incorporated into the modeling process. For problems governed by well-defined physical principles, physics-informed neural networks (PINNs) provide a complementary approach. PINNs are a class of neural networks that incorporate physical laws, expressed as partial differential equations (PDEs), into their architecture and loss functions, as shown in Fig. 4(e). Unlike traditional neural networks, which rely purely on data, PINNs leverage governing equations to guide the training process.135 This enables PINNs to solve forward and inverse problems for PDEs with high accuracy, even in scenarios where data are scarce or noisy.94,110,115,135 A key feature of PINNs is their ability to embed physical constraints directly into the loss function. For example, a PINN solving a PDE will include terms in its loss function that penalize deviations from the equations of motion, boundary conditions, and initial conditions. This ensures that the network outputs solutions consistent with the underlying physics. By incorporating physics-based priors, PINNs can often generalize better than purely data-driven models, especially in high-dimensional and sparse data regimes. Chen et al.94 solved inverse scattering problems in metamaterials using PINNs, as shown in Figs. 7(a) and 7(b). Riganti and Dal Negro115 developed and employed auxiliary PINNs to solve forward, inverse, and coupled integrodifferential problems of radiative transfer theory, as shown in Fig. 7(c). Chen and Dal Negro110 used the finite-element method (FEM) combined with PINNs to realize the complex electric field reconstruction and the inversion of the complex dielectric function. 
Figure 7(d) shows the real part of the contour used in the FEM simulation; the real and imaginary parts of the complex electric field $E_z$ used as the training data of the PINN (the blank area indicates that the data inside the cylinder are excluded); the inversion results for the complex dielectric function as a function of the number of iterations; and the real and imaginary parts of the complex electric field $E_z$ reconstructed by the PINN after $10^4$ iterations. These results validate the effectiveness of PINNs in complex electric field reconstruction and complex dielectric function inversion. Despite their strengths, PINNs also face challenges. One common issue is the optimization difficulty due to competing loss terms (e.g., data-driven loss terms and physical constraint loss terms), which can lead to imbalance during training. Moreover, solving stiff equations or handling complex boundary conditions can result in poor convergence or suboptimal solutions. In addition, the computational cost can be high, as PINNs often require solving expensive integrals or employing fine-grained sampling in the solution domain.
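To make the loss construction concrete, here is a hedged one-parameter sketch for a toy ODE rather than any of the cited PDE problems: the physics residual and the initial condition enter the loss exactly as constraint terms, and minimizing the combined loss recovers the known solution.

```python
import numpy as np

# Trial model u(t; a) = exp(a*t) stands in for the network; the loss adds a
# physics residual for du/dt + u = 0 to an initial-condition term u(0) = 1,
# mirroring how PINNs embed PDE constraints in the loss function.
t = np.linspace(0.0, 1.0, 50)

def pinn_loss(a):
    u = np.exp(a * t)
    du_dt = a * np.exp(a * t)                 # analytic derivative of the trial model
    physics = np.mean((du_dt + u) ** 2)       # PDE residual term
    initial = (np.exp(a * 0.0) - 1.0) ** 2    # initial-condition term
    return physics + initial

# A coarse parameter scan stands in for gradient-based training.
grid = np.linspace(-2.0, 0.0, 201)
best = grid[np.argmin([pinn_loss(a) for a in grid])]
print(best)  # recovers a = -1, i.e., u = exp(-t)
```

No solution data appear in the loss at all; the physics residual alone pins down the answer, which is why PINNs remain usable in data-scarce regimes.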


    Figure 7.Applications of PINN in nanophotonics. (a) Schematic of a PINN for solving inverse problems in photonics based on partial differential equations.94 (b) PINN reconstruction of the dielectric constant profile from a data set of known scattered field profiles.94 (c) Schematic of the auxiliary PINNs solution to the radiative transfer theory problem.115 (d) Contours for finite-element method forward scattering simulations, inversion results for the complex dielectric function, real and imaginary parts of the complex electric field Ez, and the complex electric field Ez reconstructed from PINNs.110

    Deep-learning methods exhibit significant advantages over traditional algorithms. First, once a deep-learning model is trained, it typically runs in far less time than traditional algorithms and is more likely to find better local optima. Second, deep-learning methods make inverse design easier to implement than traditional optimization algorithms. Third, deep learning provides a diverse set of neural network architectures with great flexibility, and the appropriate network can be chosen according to the needs of the device being designed. Many of the problems that arise during training can also be solved by properly adjusting the hyperparameters or network structure. In addition, as one of the frontier technologies in computational science, deep learning still offers substantial room for further applications.

    2.1.2 Gradient-based inverse design

    Gradient-based inverse design has been in the spotlight since the late 1990s. Initially, Dobson and Cox136 pioneered the application of gradient optimization algorithms to the bandgap optimization problem for photonic crystals. At the beginning of the 21st century, Sigmund et al. used topology algorithms to optimize the design of many photonic devices.137,138 Subsequently, Vučković et al.139,140 and Huan et al.141 proposed the objective first method139,140 and steepest descent,141 respectively, to further improve gradient optimization algorithms. In recent years, research on gradient optimization algorithms has experienced rapid growth,142–161 demonstrating the intense scientific interest and application potential of the field.

    In the inverse design problem of optical devices, gradient-based algorithms specify the function of the device by quantifying the efficiency of mode conversion between a predetermined set of input and output modes. These input and output modes are set by the user and remain constant during the optimization process. The input modes $i=1,2,3,\ldots,M$, at frequencies $\omega_i$, can be described by the equivalent current density distributions $J_i$, and the electromagnetic field $E_i$ generated by each input mode $i$ satisfies Maxwell's equations $\nabla\times\mu_0^{-1}\nabla\times E_i-\omega_i^2\varepsilon E_i=i\omega_i J_i$, where $\varepsilon$ denotes the distribution of dielectric constants and $\mu_0$ represents the permeability of free space. For each input mode $i$, specify a set of output modes $j=1,2,3,\ldots,N_i$ whose amplitude is restricted between $\alpha_{ij}$ and $\beta_{ij}$. If the output modes are waveguide-guided modes with modal electric field $E_{ij}$ and magnetic field $H_{ij}$, the orthogonality of the modes can be used to construct the constraints $\alpha_{ij}\le\left|\int\left(E_i\times H_{ij}+E_{ij}\times H_i\right)\cdot\hat{n}\,\mathrm{d}r\right|\le\beta_{ij}$. Here, $\hat{n}$ is a unit vector pointing in the direction of propagation, and $r$ denotes a coordinate perpendicular to the direction of propagation. Using $\nabla\times E_i=i\omega\mu_0 H_i$, one obtains $\alpha_{ij}\le\left|\int\left(E_i\times H_{ij}+E_{ij}\times\frac{1}{i\omega\mu_0}\nabla\times E_i\right)\cdot\hat{n}\,\mathrm{d}r\right|\le\beta_{ij}$.

    More generally, the output mode amplitude can be specified via a linear function $L_{ij}$ of the electric field $E_i$ of input mode $i$, $\alpha_{ij}\le\left|L_{ij}(E_i)\right|\le\beta_{ij}$, where $V=\{E:\mathbb{R}^3\to\mathbb{C}^3\}$ is the space of all possible electric-field distributions and $L_{ij}:V\to\mathbb{C}$ is a linear function mapping an electric-field distribution to a complex scalar. Here, $\mathbb{R}$ is the set of real numbers and $\mathbb{C}$ is the set of complex numbers. The gradient-based inverse design algorithm constructs the desired optics by numerically solving the system of Maxwell's equations and combining it with numerical optimization techniques.

    The gradient-based inverse design algorithm includes two main methods, the objective first method139,140 and the steepest descent method;141 the flow chart is shown in Fig. 8(a). In the objective first method, the electric field $E_i$ is required to satisfy the performance constraints imposed through $L_{ij}(E_i)$; the alternating-direction method of multipliers is then applied to minimize the violation of the physics equations. As for the steepest-descent method, researchers further improved the algorithm by introducing the adjoint method, which computes the gradients efficiently using only a single additional, time-reversed electromagnetic simulation. Optimizing the parameters of a system usually relies on known physical laws, which are mostly expressed in the form of PDEs. Such problems, defined as PDE-constrained optimization problems, have a wide range of application scenarios.162,163 The adjoint method provides an effective means of solving them.164


    Figure 8.(a) Flow chart of the gradient-based inverse design algorithm. (b) Flow chart of the adjoint method.

    The adjoint method has become a well-established technique for optimizing parameters and solving practical problems, with wide-ranging applications. Time-dependent adjoint methods are equally applicable to optimal control problems. For complicated control problems whose state transitions admit no closed-form solution, gradient descent under the given constraints is a good choice. Moreover, when dealing with optimal control problems, randomness in the state-transition process must be considered, and adjoint methods can handle this type of randomness effectively. Letting $t$ denote time, we consider optimization over the interval $0\le t\le T$.

    Suppose we have a system $g[x(0),p]=0$, $h(x,\dot{x},p,t)=0$: the first equation determines the initial state $x(0)$, and the second is the ordinary differential equation (ODE) that governs the subsequent evolution of $x$.

    The loss function is $F(x,p)=\min_p\int_0^T f(x,p,t)\,\mathrm{d}t$, an integral over time. Similarly, define the Lagrange function $L=\int_0^T\left[f(x,p,t)+\lambda^T h(x,\dot{x},p,t)\right]\mathrm{d}t+\mu^T g[x(0),p]$. Differentiation gives $\mathrm{d}_p L=\int_0^T\left[\partial_x f\,\mathrm{d}_p x+\partial_p f+\lambda^T\left(\partial_x h\,\mathrm{d}_p x+\partial_{\dot{x}}h\,\mathrm{d}_p\dot{x}+\partial_p h\right)\right]\mathrm{d}t+\mu^T\left[\partial_{x(0)}g\,\mathrm{d}_p x(0)+\partial_p g\right]$. The $\mathrm{d}_p\dot{x}$ term is then eliminated by integration by parts, and the result is $\mathrm{d}_p L=\int_0^T\left[\partial_x f+\lambda^T\partial_x h-\dot{\lambda}^T\partial_{\dot{x}}h-\lambda^T\tfrac{\mathrm{d}}{\mathrm{d}t}\left(\partial_{\dot{x}}h\right)\right]\mathrm{d}_p x\,\mathrm{d}t+\int_0^T\left(\partial_p f+\lambda^T\partial_p h\right)\mathrm{d}t+\lambda^T\partial_{\dot{x}}h\,\mathrm{d}_p x\big|_T+\left(\mu^T\partial_{x(0)}g-\lambda^T\partial_{\dot{x}}h\right)\big|_0\,\mathrm{d}_p x(0)+\mu^T\partial_p g$. The multipliers $\lambda$ and $\mu$ can be chosen so that both bracketed terms equal 0. A single gradient-descent step is then completed in only three stages. (1) For the current $p$, compute $x(0)$ from $g[x(0),p]=0$, and solve $h(x,\dot{x},p,t)=0$ for all $x(t)$. (2) Write down the ODE satisfied by the multipliers and solve for $\lambda(t)$ and $\mu$ using the condition that the two bracketed terms equal 0. (Note that this adjoint ODE is integrated backward in time.) (3) Calculate the gradient $\mathrm{d}_p F=\int_0^T\left(\partial_p f+\lambda^T\partial_p h\right)\mathrm{d}t+\mu^T\partial_p g$, where the multipliers have already been computed, and then take the gradient-descent step.
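A hedged numerical sketch of these three steps for a toy system (a forward-Euler discretization of dx/dt = −p·x with loss ∫x dt; all names and sizes are illustrative, not from the cited works) shows the adjoint gradient matching a finite-difference check:

```python
import numpy as np

# Step 1: forward solve of dx/dt = -p*x, x(0) = 1, by forward Euler;
# the loss is F(p) = integral of x over [0, T], discretized as dt*sum(x_k).
def forward(p, x0=1.0, T=1.0, n=1000):
    dt = T / n
    xs = [x0]
    for _ in range(n):
        xs.append(xs[-1] * (1.0 - p * dt))   # x_{k+1} = x_k + dt*(-p*x_k)
    return np.array(xs), dt

def loss(p):
    xs, dt = forward(p)
    return dt * np.sum(xs)

def adjoint_grad(p):
    xs, dt = forward(p)
    n = len(xs) - 1
    lam = dt            # Step 2: adjoint variable lam_N = dF/dx_N, solved backward
    grad = 0.0
    for k in range(n - 1, -1, -1):
        grad += lam * (-dt * xs[k])          # Step 3: accumulate dF/dp
        lam = dt + (1.0 - p * dt) * lam      # lam_k from lam_{k+1}, backward in time
    return grad

p, eps = 0.7, 1e-6
g_adj = adjoint_grad(p)
g_fd = (loss(p + eps) - loss(p - eps)) / (2 * eps)   # finite-difference check
print(g_adj, g_fd)
```

One forward pass plus one backward (adjoint) pass yields the exact gradient of the discrete loss, regardless of how many parameters p contains; a finite-difference approach would need a separate simulation per parameter.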

    Thus, in each step of the gradient descent, only a finite number of simulations need to be performed,161 and a set of ODEs is solved, which dramatically reduces the computational efforts; the flow chart is shown in Fig. 8(b). There are two main areas of application of the adjoint method to optimization problems with constraints.164 First, for systems with unknown parameters, the system parameters can be estimated after collecting input and output data. The loss here is the deviation between the output of the system and the actual measured output. Second, in designing a system with a specific function, the loss reflects the degree of deviation of the system output from the objective function, followed by the optimization of the system parameters to complete the inverse design.

    In the improved gradient-based inverse algorithm, the adjoint algorithm is used to calculate the gradient and optimize the parameters and structure.165–167 With the development of AI and information technology, more and more types of nanophotonic devices have recently been designed by the gradient-based inverse algorithm.140,142–160 Examples include spatial mode multiplexers,140,142–146 wavelength demultiplexers,143,144,148–150 power splitters,142–144,147 Fano resonators,151 grating couplers,152–155 diamond devices,156 SPIN software,157 and metalenses,158–160 as shown in Fig. 9.


    Figure 9.Nanophotonic device by gradient-based inverse design. (a) Spatial mode multiplexer: optimal design patterns and simulated field (Ey) evolution for the spatial mode multiplexer.145 (b) Power splitter: scanning electron microscopy (SEM) image of the fabricated broadband 1×3 power splitter and the electromagnetic energy density in the device at 1550 nm.143 (c) Wavelength demultiplexer: simulated electromagnetic energy density of a three-channel wavelength multiplexer at three operating wavelengths.149 (d) Grating couplers based on diamond design: inverse-designed vertical coupler with analog field superimposed in red.156 (e) Fano resonators: SEM image of a cascaded Fano–Lorentzian resonator. The enlarged image shows the inverse-designed reflector on the silicon waveguide in the resonator-waveguide coupling region.151 (f) Grating couplers: the electric field in the structure of the grating coupler with a target bandwidth of 120 nm is simulated at 1550 nm.154 (g) SPIN software optimization process: (1) continuous optimization; (2) discretization; (3) discrete optimization. Fabrication constraints are enforced at this time.157 (h) Metalens: the metalenses are illuminated by normally incident x-polarized plane waves. The incident field outside the aperture of the metalens is blocked by a layer of perfect electrical conductors.159

    Although gradient-based optimization methods are effective when the design space is relatively smooth and of modest dimensionality, topology optimization shows unique advantages for problems that require exploring complex structural shapes or material distributions. Topology optimization is a mathematical method that optimizes the material distribution in each region for a given load, constraints, and performance metrics, thereby effectively improving the overall performance of the design. It has become a frontier research direction with wide design freedom in the field of structural optimization. The concept was first proposed by Bendsøe and Kikuchi in 1988 to solve the problem of how to distribute materials in each region to achieve optimal structural performance.168 The method has since been extended to a variety of fields such as mechanical engineering.169 Sigmund introduced topology optimization to optics and designed and optimized a number of photonic devices.137,138,170–176 With the development of design methods in nanophotonics, different types of topology optimization have been introduced into the field, including the variable density method,177 the level set method,178 and bidirectional evolutionary structure optimization;179 the flow charts are shown in Fig. 10. In recent years, topology optimization algorithms have been applied to the design of a variety of nanophotonic devices, including metasurfaces,180–182 metalenses,176,183,184 nanoparticles,185 topological phases,186,187 mode converters,188 metal reflectors,176 waveguide crossings,145 quantum logic gates,189 and dispersive materials.190

    Figure 10.(a) Flow chart of the variable density method. (b) Flow chart of the level set method. (c) Flow chart of the bidirectional evolutionary structure optimization.

    For the gradient-based inverse algorithm, we present some examples of designing and optimizing photonic devices. The spatial mode multiplexer, which separates the fundamental TE00 and second-order TE10 modes of the 750-nm-wide multimode input waveguide and routes them to separate 400-nm-wide single-mode output waveguides, is shown in Fig. 11(a).144 The device has a compact footprint of only 3.55 μm × 2.55 μm, as shown in Figs. 11(b) and 11(c). To test the spatial mode multiplexer, two multiplexers were placed back-to-back and connected by an 80-μm section of multimode waveguide. The measurements of its S parameters are shown in Fig. 11(d). Next, the three-channel wavelength demultiplexer is designed to separate the wavelengths 1500, 1540, and 1580 nm, as shown in Fig. 11(e).144 It is compact, with a footprint of only 5.5 μm × 4.5 μm [as shown in Figs. 11(f) and 11(g)]. The measured S parameters are shown in Fig. 11(h). The insertion loss of the three output channels is 3.0 dB at 1500 nm, 3.1 dB at 1540 nm, and only 1.2 dB at 1580 nm; the channels achieve 8.3, 12.6, and 12.3 dB, respectively. The third example is the broadband three-way power splitter, which distributes the power from the input waveguide equally among the three output waveguides [Fig. 11(i)].144 The final design has a footprint of 3.8 μm × 2.5 μm, as shown in Figs. 11(j) and 11(k). The measured S parameters of the three-way power splitter are shown in Fig. 11(l). Over the entire operating bandwidth, the splitter has an insertion loss of 0.4 dB and a power imbalance of 4.4%. Very recently, a multifunctional integrated photonic platform with an ultracompact footprint was realized based on inverse design.147 The inverse design algorithm combines the adjoint gradient algorithm with a geometric restriction algorithm. Based on inverse design, a high-performance coupler with a footprint of only 4 μm × 2 μm was realized, as shown in Fig. 11(m).
The multifunctional photonic platform includes 86 inverse-designed fixed couplers and 91 phase shifters, shown in Figs. 11(n)–11(p). The footprint of the whole photonic platform is only 3 mm × 0.2 mm, an order of magnitude smaller than that of previous integrated photonic platforms. The integrated photonic platform can be used to perform quantum simulations and machine-learning tasks, exhibiting excellent scalability.

    Figure 11.Nanophotonic devices by the gradient-based inverse design. (a) Spatial mode multiplexer.144 (b) Inverse design results (silicon regions are shown in black and silica regions in white). (c) Optical microscope image of the final fabricated device. (d) Experimentally measured S parameters of the back-to-back test structure. (Shaded areas indicate the minimum and maximum values from three different measured devices from three dies, and solid lines indicate the average values.) (e) Three-channel wavelength demultiplexer.144 (f) Inverse design results. (g) Optical microscope image of the final fabricated device. (h) Experimentally measured S parameters. (i) Three-way power splitter.144 (j) Inverse design results. (k) Optical microscope image of the final fabricated device. (l) Experimentally measured S parameters (dashed line indicates perfect 1/3 beam splitting ratios). (m) SEM image of the inverse-designed fixed coupler.147 (n) Schematic diagram of the computing platform, consisting of input generator, photonic processor, and complex output.147 (o) Optical microscope image of photonic platform.147 (p) Photograph of the photonic platform and wire bonding [the red square marks one platform detail in panel (o)].147

    The gradient-based inverse design algorithm can automate the design of photonic devices, freeing users from tedious design work. With this algorithm, the user only needs to provide high-level parameter guidance. The algorithm searches an extensive parameter space to produce a full set of parameters for constructing manufacturable devices, and it is suitable for designing any type of passive, linear photonic component.

    2.1.3 Global optimization

    In photonics, the oscillatory nature of the underlying wave physics produces objective landscapes with many local minima, which makes global optimization algorithms particularly important. Global optimization algorithms are computational methods that can find a globally optimal solution within a given search range, or at least obtain a solution close to the global optimum.191 In recent years, a variety of global optimization techniques have been applied to the inverse design of photonic devices, including the genetic algorithm (GA),192–198 particle swarm optimization (PSO),199–204 the ant colony algorithm,205–207 the artificial bee colony,208,209 the bat algorithm,210–212 cuckoo search,213,214 the differential evolution algorithm,215–217 the covariance matrix adaptation evolution strategy,218–220 and Bayesian optimization.221–224 These methods have demonstrated their effectiveness and value in the inverse design of photonic devices.

    We give a few examples to introduce global optimization algorithms. As one of the most classical algorithms in the field of optimization, the GA draws on the mechanisms of inheritance and evolution in nature to obtain optimal solutions.225 The GA was originally proposed by Holland in the 1960s226 and further developed in subsequent work.227 The fundamental process of the GA can be broken down into the following steps. (1) Initialization: a random population of solutions is generated, with each solution represented as a chromosome; the genes on the chromosome correspond to potential solutions to the problem. (2) Fitness evaluation: a fitness value is calculated for each individual in the population, reflecting its relative strength in solving the problem; the design of the fitness function is important for the performance of the algorithm. (3) Selection: a probability-based selection mechanism prioritizes individuals with high fitness values for inclusion in the subsequent generation. (4) Crossover: a genetic crossover operation combines the characteristics of selected individuals to produce new individuals, simulating the genetic recombination that occurs in biological reproduction. (5) Mutation: genetic mutation is applied to specific individuals to introduce novel gene combinations and enhance the diversity of the population. (6) Replacement: based on the results of the preceding operations, a new generation of the population is formed. (7) Iteration: the steps are repeated until a termination condition is met, such as reaching a predetermined number of iterations or finding a satisfactory solution. The basic flow of a GA is shown in Fig. 12(a). The performance and efficiency of a GA depend heavily on the chromosome representation, the fitness function, and the design of the selection, crossover, and mutation strategies.
After decades of development, the GA has been widely used in scientific research and engineering problems. Especially in recent years, with the emergence of intelligently designed nanophotonic devices via inverse design, GAs have been successfully introduced into the inverse design of nanophotonic devices; Fig. 13 shows typical application examples.192–198
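The seven steps above can be condensed into a minimal binary-coded GA. The following Python sketch is illustrative only: the toy fitness function (counting silicon "1" pixels) stands in for a full electromagnetic simulation, and the parameter values are assumptions rather than recommendations.

```python
import random

def genetic_algorithm(fitness, n_genes, pop_size=30, generations=100,
                      crossover_rate=0.8, mutation_rate=0.02):
    """Minimal binary GA: each chromosome is a list of 0/1 'pixels'."""
    # (1) Initialization: random population of binary chromosomes
    pop = [[random.randint(0, 1) for _ in range(n_genes)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):                      # (7) iterate
        scores = [fitness(c) for c in pop]            # (2) fitness evaluation
        # (3) Selection: tournament of size 2, fitter individual wins
        def select():
            i, j = random.randrange(pop_size), random.randrange(pop_size)
            return pop[i] if scores[i] >= scores[j] else pop[j]
        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = select(), select()
            # (4) Crossover: single-point, applied with probability crossover_rate
            if random.random() < crossover_rate:
                cut = random.randrange(1, n_genes)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            # (5) Mutation: flip each gene with probability mutation_rate
            for child in (p1, p2):
                new_pop.append([1 - g if random.random() < mutation_rate else g
                                for g in child])
        pop = new_pop[:pop_size]                      # (6) replacement
        best = max(pop + [best], key=fitness)         # track best-ever design
    return best

# Toy fitness: maximize the number of silicon ('1') pixels
best = genetic_algorithm(lambda c: sum(c), n_genes=20)
```

In a real inverse-design loop, the lambda would be replaced by a call to a full-wave solver that returns the device figure of merit.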

    Figure 12.(a) Flow chart of the GA. (b) Coding method.

    Figure 13.Nanophotonic device designed based on GA. (a) Polarization route: SEM image of a 970  nm×1240  nm polarization router.197 (b) Metasurface absorber: schematic of the optimized binary pattern A0 in the crystal cell and SEM image of the pattern A0 array.193 (c) Chiral plasmonic metasurface: top view of design pattern A and SEM image of chiral metasurface.194 (d) Broadband absorption optimization: structural schematics and absorption spectra of the different generations.195 (e) Metasurface design: different combinations of coefficients on the pattern of light produced.196 (f) Optical frequency microcombs: SEM image of the photonic-crystal resonators. The inset on the right highlights a section of the chirped corrugation.198

    In the GA, chromosome representation is a critical step that defines how the features or parameters of the optimized region are encoded for computational processing. This representation directly impacts the efficiency and success of the optimization. In the inverse design of optical devices, the chromosome representation can take many forms depending on the nature of the problem. The first is spatially distributed representation, in which the optimized area is discretized into cells such as pixels, grids, or blocks, each encoded in a specific format. For example, a binary code can represent two materials (e.g., 0 for material A and 1 for material B), whereas a multilevel code can be used for more complex systems (e.g., 0 for air, 1 for silicon, and 2 for silicon nitride). The second is geometric parameter representation, in which the shape of the optimized region is defined by geometric parameters (e.g., width, height, and spacing) that are directly assigned as gene values within the chromosome. The third is continuous variable representation, in which, for continuous optimization problems, genes encode continuous design variables such as refractive index, material thickness, or other physical properties. In the GA-based inverse design process, these encoding methods are integrated into the core steps: selecting a suitable chromosome representation, which structures the design variables influencing device performance; defining and tuning a fitness function to evaluate the designs; and iteratively applying the genetic operators of selection, crossover, and mutation. The iterative process continues until convergence criteria are met, yielding the optimal design.

    When building the model, the chromosome is digitized using binary digits: “0” represents a pixel occupied by an air column, and “1” represents a silicon pixel. This encoding allows one chromosome to represent the state of all pixel points in the simulated region, with each gene representing the state of one pixel, as shown in Fig. 12(b). In the GA, a fitness function is chosen to evaluate the performance of each chromosome; stronger fitness means a greater chance of survival, and the fitness function is set according to the evaluation metrics of the scenario at hand. In nanophotonic device design, it is typically defined as the figure of merit that measures device performance.
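As a concrete illustration of this pixel encoding, the short sketch below (with hypothetical region dimensions) flattens a small 2D design region into a binary chromosome and reconstructs it, so each gene holds the state of exactly one pixel:

```python
# Hypothetical 3x4 design region: 0 = air-column pixel, 1 = silicon pixel
rows, cols = 3, 4
pattern = [[1, 0, 0, 1],
           [0, 1, 1, 0],
           [1, 1, 0, 0]]

# Encode: flatten the 2D pixel grid row by row into one chromosome
chromosome = [pixel for row in pattern for pixel in row]

# Decode: reshape the chromosome back into the simulated region
decoded = [chromosome[r * cols:(r + 1) * cols] for r in range(rows)]
assert decoded == pattern   # round trip preserves the design
```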

    In the GA, the genetic operator consists of three operations: selection, crossover, and mutation. The selection operation carries the good performers of the current population into the next generation and eliminates the poor performers based on individual fitness. Commonly used selection methods include roulette-wheel selection, tournament selection, and elitist selection. The crossover operator generates new individuals by exchanging or recombining parts of the genes of two parents, introducing new regions of the solution space and enhancing the search capability of the GA. The crossover process includes determining the crossover points and selecting a gene exchange strategy such as single-point crossover, multipoint crossover, or uniform crossover. For example, in a single-point crossover, after the crossover point is randomly selected, the chromosome segments after that point are exchanged between the parents, as shown in Fig. 14(a). The mutation (variation) operator explores further solutions by changing the values of genes at certain locations on the chromosome: individuals are randomly selected, and genes are flipped with a given mutation probability, producing a new individual, as shown in Fig. 14(b).
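The two operations of Fig. 14 can be written in a few lines. This is a generic sketch for binary chromosomes, not the specific implementation used in the cited works:

```python
import random

def single_point_crossover(parent_a, parent_b):
    """Exchange the chromosome segments after a randomly chosen cut point."""
    cut = random.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]

def mutate(chromosome, rate=0.05):
    """Flip each binary gene independently with probability `rate`."""
    return [1 - g if random.random() < rate else g for g in chromosome]

a, b = [0] * 8, [1] * 8
child1, child2 = single_point_crossover(a, b)
# Each child carries a prefix from one parent and a suffix from the other,
# so at every position the two children together hold one 0 and one 1.
```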

    Figure 14.(a) Crossover operator and (b) variation operator.

    When applying GAs to device inverse design, iterations cannot continue indefinitely; clear termination conditions must be set so that the algorithm stops at the appropriate time and the results match expectations. There are two main termination criteria: the number of iterations and the fitness level. There is no fixed optimal number of iterations; too many iterations incur excessive computational load, whereas too few may fail to find the global optimum, so the choice needs to be adjusted to the specific problem. The other criterion is fitness: when the fitness of an individual reaches a preset target value, that solution can be regarded as optimal and the optimization terminated. Setting both the number of iterations and a target fitness level makes it clear when the iteration process should stop.

    GAs are well suited to complex optimization problems because of their adaptability. Such problems often involve the simultaneous optimization of many system parameters and sometimes lack a clearly defined or unique optimal solution. GAs not only handle single-objective optimization problems effectively but also possess a natural advantage for more complex multiobjective problems. Through the selection, crossover, and mutation operators, they can rapidly find an optimal solution.

    Another algorithm is PSO. First proposed by Kennedy and Eberhart, PSO is a swarm intelligence algorithm built on a model of social behavior.228 In PSO, each point in the solution space is regarded as a particle with neither volume nor mass; the particles move through the search space at their own velocities while their performance on the optimization problem is evaluated by a fitness value. The particles follow the currently known best solutions while searching the solution space, starting from a randomly placed group of particles. At each iteration, a particle is updated according to two extreme values: the best solution found by the particle itself, i.e., the particle best (p), and the global best solution in the current population, i.e., the global best (g).

    We suppose that a population of m particles travels at certain velocities in D-dimensional space. The position of particle i is denoted $x_i=(x_{i1},x_{i2},\ldots,x_{iD})$; its velocity is $v_i=(v_{i1},v_{i2},\ldots,v_{iD})$; the best position experienced by the particle is $p_i=(p_{i1},p_{i2},\ldots,p_{iD})$; and the best position visited by any particle in the population is $p_g=(p_{g1},p_{g2},\ldots,p_{gD})$. The positions and velocities of the particles are updated according to

$$v_{ij}^{k+1}=v_{ij}^{k}+c_1 r_1\left(p_{ij}^{k}-x_{ij}^{k}\right)+c_2 r_2\left(p_{gj}^{k}-x_{ij}^{k}\right), \quad (2)$$

$$x_{ij}^{k+1}=x_{ij}^{k}+v_{ij}^{k+1}, \quad (3)$$

where i indexes the particle, j indexes the dimension of velocity or position, k is the iteration number, and v and x are the velocity and position of the particle, both constrained to specific ranges. $c_1$ and $c_2$ are the learning factors, and $r_1$ and $r_2$ are random numbers between 0 and 1. The PSO iterative process is shown in Fig. 15(a).

    Figure 15.(a) PSO iteration process. (b) Flow chart of PSO.

    Although PSO involves a small number of parameters, several key parameters have a decisive impact on the performance and convergence of the algorithm. The theoretical study of PSO is still in its infancy, so parameter settings still rely largely on experience.229 The parameters to be set include the number of particles, the particle dimension, the search range, the maximum velocity $V_{\max}$, the learning factors, and the termination condition.

    The basic steps of PSO begin with initializing the population: randomly setting the velocities, weights, and positions of the particles, and choosing the population size and number of iterations. Next, the fitness function is used to calculate the fitness of each particle, and each particle's best fitness value is assigned to its local optimal position $p_i$. The best of all local optimal positions is then assigned to the global optimal position $p_g$. The positions and velocities of the particles are updated according to Eqs. (2) and (3) and constrained to their allowed ranges. Finally, the termination condition is checked: if it is satisfied, the iteration stops; otherwise, the fitness values are recalculated and the process repeats. The PSO flow chart is shown in Fig. 15(b).
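The steps above can be sketched as a short PSO routine following the update rule of Eqs. (2) and (3). The sphere function stands in for a device figure of merit, and the parameter values (20 particles, a velocity clamp of 1.0) are illustrative assumptions:

```python
import random

def pso(fitness, dim, n_particles=20, iters=100, c1=2.0, c2=2.0,
        x_range=(-5.0, 5.0), v_max=1.0):
    """Minimal PSO (minimization) following the velocity/position update rule."""
    lo, hi = x_range
    # Initialization: random positions, zero velocities
    x = [[random.uniform(lo, hi) for _ in range(dim)]
         for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p_best = [xi[:] for xi in x]                      # personal best positions
    p_val = [fitness(xi) for xi in x]
    g_best = min(zip(p_val, p_best))[1][:]            # global best position
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = random.random(), random.random()
                # Eq. (2): velocity update with personal and global attraction
                v[i][j] += c1 * r1 * (p_best[i][j] - x[i][j]) \
                         + c2 * r2 * (g_best[j] - x[i][j])
                v[i][j] = max(-v_max, min(v_max, v[i][j]))   # clamp velocity
                # Eq. (3): position update, constrained to the search range
                x[i][j] = max(lo, min(hi, x[i][j] + v[i][j]))
            f = fitness(x[i])
            if f < p_val[i]:                          # update particle best
                p_val[i], p_best[i] = f, x[i][:]
                if f < fitness(g_best):               # update global best
                    g_best = x[i][:]
    return g_best

# Toy problem: minimize the sphere function sum(x_j^2)
best = pso(lambda x: sum(xj * xj for xj in x), dim=3)
```

Note that this basic form omits the inertia weight used in many later PSO variants; adding one typically improves convergence on harder landscapes.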

    PSO has few parameters, simple calculations, strong optimization capability, high accuracy, and fast convergence. As an effective optimization method, PSO is widely used in the design of nanophotonic devices,199–204 as shown in Fig. 16.

    Figure 16.Nanophotonic device designed based on PSO. (a) Power splitter: binary particle swarm optimized 2×2 power splitter.199 (b) Nanosensor: schematic diagram of a nanosensor consisting of periodic gold nanoridges.201 (c) Optical coupler: structure of the proposed multisegment directional coupler.200 (d) Photonic crystal: simulation results of p-polarized incident wave.204 (e) Varifocal lens: schematic of the varifocal lens. The inset shows a single-cell sample.202

    The GA and PSO are commonly used optimization methods in the inverse design of optical devices. The GA is particularly suitable for solving complex, multipeaked, and nonlinear optimization problems by simulating crossover, mutation, and selection operations in the natural selection process to evolve populations. By contrast, PSO, which simulates the behavior of a flock of birds or a school of fish and updates the position through the experience of the group and individuals, shows faster convergence and simpler implementation and excels especially in continuous optimization problems. Table 1 provides a detailed comparison of the characteristics of GA and PSO, including their search capability, convergence speed, applicability, and usability. The selection of the appropriate algorithm depends on the characteristics of the specific problem, such as the dimensionality of the problem, the presence of multiple peaks, and the presence of complex constraints. To determine the optimal algorithm and parameter configuration, a series of experiments may be required to evaluate the performance of different algorithms on specific problems.

      Table 1. Comparison of GA and PSO features.

      Features | Genetic algorithm (GA) | Particle swarm optimization (PSO)
      Search capability | Powerful global search capability for high-dimensional multipeak problems; population diversity is maintained through crossover and mutation. | Weaker global search capability and prone to local optima in complex problems; fast convergence, with possible premature convergence.
      Convergence speed | Slower, especially for complex problems, requiring more iterations and computational resources. | Faster, especially for continuous optimization problems.
      Computational complexity | High, especially for large populations, with multiple operations and fitness evaluations per generation. | Low; only particle fitness needs to be evaluated at each update.
      Applicability | Suited to discrete or combinatorial optimization problems; can handle nonlinear constraints and multiobjective problems. | Suited to continuous optimization problems, particularly parameter optimization of optical components.
      Usability | Complex to implement; requires careful tuning of parameters such as population size and crossover and mutation probabilities. | Simple to implement, with few major parameters, such as the particle velocity and position update factors.

    Compared with the traditional local optimization algorithm, the global optimization algorithm has a greater probability of discovering the global optimal solution of the problem during the solution process. It can explore a broader design space, thus identifying more refined design solutions. A distinctive feature of the global optimization algorithm is its strong adaptability and flexibility, which makes it unrestricted by problem-specific constraints and enables it to be widely used in various types of complex optimization scenarios.

    2.1.4 Individually inspired algorithms

    Individually inspired algorithms are search processes that start from a single candidate solution and continuously optimize and improve it in subsequent iterations. Typical representatives include the simulated annealing algorithm,230–237 the hill-climbing algorithm,238–241 and tabu search.242–244 Through gradual adjustment and optimization, these algorithms can efficiently approach or reach the optimal solution to the problem.

    The simulated annealing algorithm was first proposed by Kirkpatrick et al.235 in 1983 and has been used mainly to solve discrete optimization problems. The algorithm is inspired by the physical process of metal annealing, in which a solid is slowly cooled from a high temperature and gradually forms an ordered crystal structure. The overall flow of the simulated annealing algorithm is shown in Fig. 17(a). The method starts with an initial temperature $T_0$ and an initial iteration length $N_0$ and gradually decreases the temperature as the iterative process proceeds. At each fixed temperature, the algorithm generates a new solution through some perturbation mechanism, such as random fine-tuning or small changes to the current solution; the perturbation can be a random change in the parameters of the current solution or a local search for new candidate solutions. The algorithm then compares the newly generated solution with the current solution and evaluates its performance using the given objective function. If the new solution has a better objective value, it is always accepted. If not, according to the Metropolis criterion, the worse solution is still accepted with probability

$$P(S_i,S_j)=\begin{cases}1, & f(S_j)<f(S_i),\\ \exp\{[f(S_i)-f(S_j)]/T\}, & f(S_j)\ge f(S_i),\end{cases}$$

where $f(S_i)$ and $f(S_j)$ are the objective values of the current solution and the new solution, respectively, and T is the temperature parameter. The choice of the objective function is crucial because it directly determines whether the algorithm can effectively reach the desired optimization goal. As the temperature T decreases, the algorithm gradually reduces the probability of accepting an inferior solution.
Because of the simulated annealing algorithm's structural simplicity, its exploration of high-dimensional parameter spaces can be relatively limited, especially when new solutions are drawn at random from the solution space; increasing the number of parameters reduces both the search efficiency and the likelihood of obtaining an optimal solution.
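The cooling loop and Metropolis acceptance rule can be sketched as follows. This is a generic minimization sketch with a toy one-dimensional objective; the schedule parameters are illustrative assumptions:

```python
import math
import random

def simulated_annealing(f, x0, neighbor, t0=1.0, cooling=0.95,
                        steps_per_t=50, t_min=1e-3):
    """Minimal simulated annealing (minimization) with Metropolis acceptance."""
    x, fx = x0, f(x0)
    best, fbest = x, fx
    t = t0
    while t > t_min:
        for _ in range(steps_per_t):          # fixed-length chain at each T
            y = neighbor(x)                   # random perturbation of x
            fy = f(y)
            # Metropolis rule: always accept improvements; accept a worse
            # solution with probability exp(-(fy - fx) / T)
            if fy < fx or random.random() < math.exp(-(fy - fx) / t):
                x, fx = y, fy
                if fx < fbest:                # track the best-ever solution
                    best, fbest = x, fx
        t *= cooling                          # geometric cooling schedule
    return best, fbest

# Toy problem: minimize (x - 2)^2 over the real line
x_opt, f_opt = simulated_annealing(
    f=lambda x: (x - 2.0) ** 2,
    x0=10.0,
    neighbor=lambda x: x + random.uniform(-0.5, 0.5))
```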

    Figure 17.(a) Flow chart of the simulated annealing algorithm. Nanophotonic device based on simulated annealing algorithm optimized design. (b) Metasurface: simulated near-electric field distribution under x-polarized normal incidence.233 (c) Spin Hall device: schematic of an on-chip broadband photonic spin element, where the incident light is coupled into different waveguides according to its spin states.234

    The simulated annealing algorithm performs well in a wide range of application scenarios because of its simplicity, which requires no knowledge specific to the problem and enhances the algorithm's robustness to random initial guesses. Although the convergence of the algorithm is supported by rigorous mathematical arguments,236 it is not always guaranteed to reach a globally optimal solution. A key parameter of the algorithm is the critical temperature, which represents an equilibrium point at which a new solution to the objective function is likely to be accepted while the algorithm can still explore other possible solutions. Determining an appropriate initial temperature can be challenging when in-depth knowledge about the specific problem is lacking; for this reason, Basu and Frazer237 suggested that the initial temperature be determined experimentally.

    The simulated annealing algorithm was first introduced to optical inverse design by Hara et al.230 in 1996 to optimize the design of p–n–p–n (pnpn) differential optical switches, a structure consisting of alternating layers of p-type and n-type semiconductor materials, and significant results were achieved even when fabrication errors in the structural parameters were considered. In recent years, the simulated annealing algorithm has been applied to optimize the design of various types of nanophotonic devices,231–234 dramatically improving device performance, as shown in Figs. 17(b) and 17(c).

    The hill-climbing algorithm is a local search algorithm. It can reach the highest point in the solution space without traversing it, greatly streamlining the search by heuristically selecting candidate nodes with higher values.241 While searching for the optimal solution, the algorithm does not need to record previous paths, reducing storage requirements in large-scale parameter spaces. The hill-climbing algorithm is modeled on the steps of climbing a mountain: an initial position is randomly selected, and the climber repeatedly moves in the direction of greater height until no further ascent is possible [see Fig. 18(a)]. Starting from a node, the algorithm compares it with its neighbors; if the current node is the highest, it finishes; otherwise, it moves to the higher neighboring node and continues. The optimal node in each neighborhood is selected as the new current solution until the final solution is determined.
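The climbing loop can be sketched in a few lines of Python. This is a generic steepest-ascent sketch with a toy one-dimensional objective, not the implementation from the cited works:

```python
def hill_climb(f, x0, neighbors, max_steps=1000):
    """Steepest-ascent hill climbing: move to the best neighbor until no
    neighbor improves on the current solution (a local maximum)."""
    x, fx = x0, f(x0)
    for _ in range(max_steps):
        best = max(neighbors(x), key=f)   # best candidate in the neighborhood
        if f(best) <= fx:                 # no neighbor is higher: stop
            break
        x, fx = best, f(best)             # climb to the higher neighbor
    return x, fx

# Toy problem: maximize -(x - 3)^2 on a discrete grid with step 0.1
x_opt, f_opt = hill_climb(
    f=lambda x: -(x - 3.0) ** 2,
    x0=0.0,
    neighbors=lambda x: [x - 0.1, x + 0.1])
```

Because the climb stops at the first local maximum, practical use often restarts the algorithm from several random initial positions.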


    Figure 18.(a) Flow chart of hill-climbing algorithm. Optimized design of nanophotonic device based on hill-climbing algorithm. (b) Graphene metasurfaces: structure of the first optimized metasurface.238 (c) One-dimensional photonic crystal split-beam nanocavity: schematic diagram of symmetrical cavity design.239

    Because it does not require a traversal process to reach the highest point in the solution space and greatly improves efficiency by heuristically selecting nodes with higher values, and because of its fundamental and easy-to-use nature, the hill-climbing algorithm has been widely used in the optimization of nanophotonic devices.238–240 For example, Khajeh et al.238 used a graphene metasurface optimally designed by the hill-climbing algorithm to successfully implement a graphene-based broadband polarization converter, as shown in Fig. 18(b). Lin et al.239 used the algorithm to fine-tune the radii of the three air holes at the center of a one-dimensional photonic crystal beam-splitting cavity, continuously optimizing the quality factor of the resonant modes; the result was a significant improvement in the quality factor of the second-order TE mode to 1.99×10⁴, as shown in Fig. 18(c).

    Direct binary search is an iterative search algorithm that was first used in the synthesis of digital holograms.245 In binary digital hologram synthesis, the core challenge is to find the binary transmittance function of the hologram. Direct binary search directly manipulates the transmittance of the hologram to produce an optimal reconstruction, finding the binary transmittance function that minimizes the mean-squared error between the reconstructed image and the original object.246 With the development of intelligent optimization algorithms, direct binary search has been applied more widely, and improved variants have been developed. The improved direct binary search operates iteratively. The device is first discretized into pixels, each of which has two possible states corresponding to two different materials, denoted by 1 and 0. During each iteration, a randomly selected pixel is switched between these two states, and the FOM (objective function) of the perturbed device is calculated. If the FOM improves, the perturbation is kept, the next parameter is perturbed, and the FOM is evaluated again. If the FOM does not improve, the perturbation is discarded; at this point, the opposite perturbation can be applied and the FOM reevaluated. This perturbation loop continues until all parameters have been visited, which completes one iteration of the direct binary search. Such iterations continue until the FOM converges to a stable value; an upper limit on the total number of iterations and a minimum change in the FOM are defined to force numerical convergence.247–251 The direct binary search flow chart is shown in Fig. 19.
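    The pixel-flipping loop described above can be sketched as follows; the toy figure of merit (matching a target pattern) stands in for the electromagnetic simulation a real design would call:

```python
import random

def direct_binary_search(fom, n_pixels, seed=0, max_iters=50, tol=1e-9):
    """Direct binary search over a binary pixel pattern.

    `fom` maps a list of 0/1 pixel states to a figure of merit to be
    maximized. Each iteration perturbs every pixel once, in random
    order, keeping only flips that improve the FOM.
    """
    rng = random.Random(seed)
    pixels = [rng.randint(0, 1) for _ in range(n_pixels)]
    best = fom(pixels)
    for _ in range(max_iters):
        prev = best
        order = list(range(n_pixels))
        rng.shuffle(order)          # pixels are perturbed in random order
        for i in order:
            pixels[i] ^= 1          # flip pixel between its two states
            trial = fom(pixels)
            if trial > best:
                best = trial        # keep the improving perturbation
            else:
                pixels[i] ^= 1      # discard it: flip back
        if abs(best - prev) < tol:  # FOM converged to a stable value
            break
    return pixels, best

# Toy FOM: reward agreement with a target pattern (hypothetical stand-in
# for a solver-evaluated device response).
target = [1, 0, 1, 1, 0, 0, 1, 0]
fom = lambda p: sum(a == b for a, b in zip(p, target))
design, score = direct_binary_search(fom, n_pixels=8)
```

    The binary output pattern maps directly onto two-material layouts, which is why, as noted below, such designs suit focused ion beam milling or electron beam lithography.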


    Figure 19.Direct binary search flow chart.

    From the above description, direct binary search is a special case of the hill-climbing algorithm in which the solution space consists of binary variables. In both algorithms, improved solutions are accepted, and worse solutions are rejected. In direct binary search, the neighbors of a solution in the binary solution space are the patterns obtained by flipping a single pixel; by flipping each pixel in turn, all neighbor nodes can be traversed.

    Direct binary search is a simple iterative algorithm for nanophotonic device design. The discrete structures generated by direct binary search are more conducive to fabrication using conventional techniques such as focused ion beam milling or electron beam lithography. Direct binary search has been widely used for the design optimization of nanophotonic devices such as mode converters,252 power splitters,252–255 and polarization splitter-rotators,256 as shown in Fig. 20.


    Figure 20.Optimized design of nanophotonic devices based on direct binary search. (a) Mode converter: optimized layout of TE1−TE0 mode converters and optimized optical field distribution for the mode-order converter.252 (b) Power splitter: SEM image of the entire fabricated device, consisting of a dual-mode 3 dB power splitter and three mode multiplexers.255 (c) Polarization splitter-rotator: TM0−TE0 mode simulated light field and TM0−TE0 mode cross-sectional light field at the input and output ports.256

    The direct binary search offers notable advantages in photonic device inverse design. Like the classical binary search from which it derives, it progressively narrows the search range and can approximate optimal solutions in fewer steps: for a single-dimensional search, the time complexity is O(logN), significantly more efficient than linear search [O(N)] when handling large parameter spaces. The method is simple to implement and is well suited for single-dimensional parameter optimization, as it avoids reliance on complex mathematical models. However, direct binary search has notable limitations. It is effective only for single-peaked or monotonic functions and may fail to find global optima for complex, nonmonotonic functions. As a one-dimensional search algorithm, it is not inherently applicable to multidimensional problems, and transforming such problems into multiple one-dimensional searches often proves inefficient. In addition, the algorithm’s performance is sensitive to the choice of the initial search range, and inappropriate selections can lead to inefficiency or failure. Although it converges quickly, achieving high accuracy in photonic device design may require additional iterations, increasing runtime. In summary, the direct binary search is efficient and straightforward in suitable scenarios, particularly for single-dimensional and monotonic optimization problems; its limitations in multidimensional and complex function optimization, however, often necessitate combining it with other algorithms to enhance performance.

    The tabu search, first proposed by Glover, is a metaheuristic stochastic search algorithm mainly used for combinatorial optimization problems.253 The algorithm improves on the hill-climbing algorithm by first selecting, from among multiple possible search directions, the one that brings the greatest improvement to the objective function. To prevent the algorithm from repeatedly searching the same path, tabu search employs a memory technique called the “tabu list” to record and identify previous search trajectories and guide subsequent search directions. The length of the tabu list can be fixed or adjusted during the iteration process; it essentially serves as a window that limits recent search moves to avoid undoing previous moves. During the iterations of the tabu search, situations may arise in which all candidate moves are in the tabu list, or in which releasing a move that is currently tabu would significantly improve the objective value. To break this limitation, certain tabu items may be reconsidered, a strategy called aspiration; the corresponding rule is called the aspiration criterion. Although the tabu search may fall into a loop, the search process can be terminated by setting a termination criterion (e.g., a predetermined number of iterations); the tabu search flow chart is shown in Fig. 21(a).
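    The tabu list and aspiration criterion described above can be sketched as follows, assuming a toy one-dimensional objective with a local and a global optimum (the objective values and neighborhood are hypothetical illustrations):

```python
from collections import deque

def tabu_search(objective, start, neighbors, tabu_len=5, max_iters=100):
    """Tabu search: always move to the best admissible neighbor; a short
    memory (the tabu list) blocks recently visited solutions so the
    search can escape local optima instead of cycling."""
    current = start
    best, best_val = start, objective(start)
    tabu = deque([start], maxlen=tabu_len)   # fixed-length tabu list
    for _ in range(max_iters):
        candidates = []
        for nb in neighbors(current):
            v = objective(nb)
            # Aspiration criterion: a tabu move is allowed if it beats
            # the best solution found so far.
            if nb not in tabu or v > best_val:
                candidates.append((v, nb))
        if not candidates:                   # termination: no admissible move
            break
        v, nb = max(candidates)              # best-improving direction
        current = nb
        tabu.append(current)
        if v > best_val:
            best, best_val = current, v
    return best, best_val

# Toy objective on a 1D grid: local optimum at x = 2, global optimum at x = 8.
vals = [0, 1, 3, 1, 0, 2, 4, 6, 9, 5]
obj = lambda x: vals[x]
nbs = lambda x: [n for n in (x - 1, x + 1) if 0 <= n <= 9]
best_x, best_v = tabu_search(obj, start=0, neighbors=nbs)
```

    Unlike plain hill climbing, the search keeps moving through the worsening region around x = 3 and 4 because backtracking to recently visited nodes is tabu, which is exactly the local-optimum-skipping mechanism discussed below.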


    Figure 21.(a) Tabu search flow chart. Nanophotonic device based on tabu search optimized design. (b) Polarization filters based on photonic lattices: optimized holes-in-slab configuration (57 scatterers).243 (c) Beam shaping of 2D photonic lattices: photonic lattice used for the beam-shaping problem. The dashed line indicates the plane used to calculate the desired beam.244

    The advantages of the tabu search are its efficient mechanism for escaping local optima and its fast convergence, which enable it to find the optimal solution in a small number of iterations. However, because the tabu search may not cover the entire parameter space, there remains a risk of ending at a locally optimal solution. The selection of the search path depends on the neighborhood structure, i.e., the mapping relationship between the initial solution and its neighborhood, which has a significant impact on the results. In recent years, the tabu search algorithm has been applied to the design optimization of a variety of nanophotonic devices,242–244 for example, by Gagnon et al. in the design optimization of polarization filters based on photonic lattices243 and in the beam shaping of two-dimensional (2D) photonic lattices,244 as shown in Figs. 21(b) and 21(c).

    In Sec. 2.1, we have reviewed the history, design principles, and typical applications of representative intelligent algorithms in the design and optimization of photonic devices. Different approaches such as deep learning, gradient-based inverse design, global optimization, and individual-inspired algorithms can be combined to enhance the innovation and functionality of photonic devices, thus opening new opportunities for both photonics research and engineering applications. Intelligent algorithms have demonstrated tremendous potential in photonic device design and optimization; however, several challenges remain when solving complicated problems or targeting demanding applications. In the future, the diversification and combination of algorithms, improvement of computational efficiency, enhancement of multiobjective optimization capabilities, resolution of constraint-handling issues, and improvement of algorithm interpretability will be key factors in further advancing AI in the field of photonics.

    2.2 AI for Optical Imaging

    Optical imaging is a pivotal technology, encompassing cameras, telescopes, microscopes, and endoscopes. It extends human visual capabilities, enabling the digitization, recording, and analysis of scenes, and facilitates the visualization of objects that are too far, too small, difficult to access, or invisible to the naked eye. State-of-the-art optical imaging involves a set of methods to obtain images by detecting light across various properties, including intensity, phase, polarization, and wavelength. Light intensity detection is foundational, as photoelectric detectors can measure it directly. Image sensors, or photodetector arrays, commonly capture images formed by lenses. Advanced intensity-based imaging techniques, such as single-pixel imaging, lensless imaging, optical tomography, and non-line-of-sight imaging, are computationally reconstructed from indirect measurements.257,258 Likewise, polarization-, phase-, and wavelength-sensitive imaging techniques typically infer their quantities through reconstruction from light-intensity data. Subsequent image processing—denoising, demosaicking, deblurring, resolution enhancement, and style transfer—is often necessary before further analysis. Analytical techniques enable advanced functionalities such as segmentation, tracking, post-capture refocusing, image stitching, odometry, depth estimation, and 3D surface reconstruction. These processes are invaluable in fields ranging from computational optical imaging and computer vision to virtual reality, consumer optics, biomedicine, industrial inspection, remote sensing, target recognition, augmented reality, and astronomy.259–262

    As demands on optical imaging methods and technologies escalate, challenges emerge throughout the pipeline—from image acquisition and reconstruction to processing, analysis, and application. These include the complexity of mathematical modeling, the ill-posed nature of inverse problems, the efficiency of optimization methods, and the robustness required for algorithm adaptation to specific applications.263–266 AI, epitomized by DNNs, constitutes one of the most dynamically evolving research domains and is also significantly transforming the field of optical imaging. A CNN is a deep-learning architecture specifically designed to process grid-like data, such as images. Unlike traditional neural networks, CNNs automatically learn spatial hierarchies of features, making them ideal for image tasks such as classification, object detection, and segmentation. Training a CNN involves feeding labeled images into the network, adjusting the weights based on the error between the predicted and actual outputs, and updating the network using backpropagation and optimization techniques (e.g., stochastic gradient descent). This iterative process allows the CNN to learn the best filters and patterns for image recognition. CNNs are excellent at classifying images into categories (e.g., identifying animals or objects) and can not only identify objects in an image but also locate them, which is essential for tasks such as facial recognition or autonomous driving. In medical imaging or satellite imagery, CNNs can be used to segment and identify specific regions, such as tumors in X-rays or land areas in aerial views. Moreover, CNNs automatically extract features from images, eliminating the need for manual feature engineering.
By understanding these core concepts and techniques, readers can effectively use CNNs to address various image-based challenges, ranging from simple classification tasks to complex segmentation and detection problems.267–269 This section is structured around individual optical properties, detailing AI’s role in augmenting phase, polarization, and spectral imaging, with a focus on the photonic aspects of AI for optical imaging. It excludes general intensity and color imaging, as these areas are predominantly addressed by the digital image processing, computer vision, and computer graphics communities.270–272
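    At the heart of a CNN is the convolution that turns learned kernels into spatial feature maps. A minimal numpy sketch with a hand-crafted vertical-edge kernel (a stand-in for a filter a CNN would learn from labeled data) illustrates the idea:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation: the basic operation a CNN layer
    applies, with learned kernels, to build spatial feature maps."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hand-crafted vertical-edge filter (a trained CNN learns such kernels
# via backpropagation rather than having them specified).
image = np.zeros((6, 6))
image[:, 3:] = 1.0                       # left half dark, right half bright
edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)
feature_map = conv2d(image, edge_kernel)
# ReLU activation keeps only positive responses (the detected edge).
activation = np.maximum(feature_map, 0.0)
```

    Stacking many such convolution-plus-activation layers, with kernels adjusted by gradient descent, is what lets a CNN build the spatial feature hierarchies described above.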

    2.2.1 Phase imaging

    In wave optics, coherent optical wave fields are characterized by a complex amplitude function, where the term “phase” denotes the phase of the wave. Neither the human eye nor conventional imaging devices can discern this phase information because light waves oscillate at frequencies approaching 10¹⁵ Hz, far exceeding the response capabilities of these observers. Phase objects, defined as entities with uniform amplitude transmission but spatially varying refractive index or thickness, necessitate the retrieval of phase information. This is particularly critical in domains such as optical metrology, material science, adaptive optics, X-ray crystallography, electron microscopy, and biomedical imaging, where most specimens are phase objects.273,274 Consequently, phase-imaging techniques, notably quantitative phase imaging (QPI), have become essential in advancing these disciplines.81,275–280 QPI is categorized into interferometric and noninterferometric phase measurements, with the former centering on phase unwrapping and the latter on phase retrieval. Historically, these methods have depended on intricate optical configurations and labor-intensive image analysis, hindering their broad application and scalability. Nonetheless, the advent of AI has addressed these impediments by streamlining the image-processing workflow and facilitating the extraction of quantitative data with unparalleled speed and precision.

    Interferometric phase measurement

    Interferometric imaging techniques rely on measuring phase differences between coherent signals to extract valuable information about the underlying scene or object. However, the measured phase is typically wrapped within the range [−π, π] or [0, 2π], leading to ambiguities and discontinuities known as phase wraps. Phase unwrapping is the process of reconstructing the unwrapped phase from these wrapped measurements, a critical step in obtaining accurate and meaningful information from interferometric data. Traditional phase-unwrapping methods often struggle with noisy or complex data,281–283 motivating the exploration of AI-driven approaches to address these challenges.
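    The wrapping ambiguity and the classical one-dimensional remedy can be illustrated with numpy's `np.unwrap` (Itoh's method, which assumes neighboring samples differ by less than π); the quadratic test phase below is an arbitrary example:

```python
import numpy as np

# A smooth "true" phase ramp whose total excursion exceeds 2*pi.
x = np.linspace(0, 1, 200)
true_phase = 12.0 * x**2                 # radians, ~12 rad total

# An interferometric measurement only yields the phase modulo 2*pi,
# wrapped into (-pi, pi]: the raw data contain artificial jumps.
wrapped = np.angle(np.exp(1j * true_phase))

# Classical 1D unwrapping: wherever the difference between neighbors
# exceeds pi, add or subtract the appropriate multiple of 2*pi.
unwrapped = np.unwrap(wrapped)

# In the noise-free, well-sampled case this recovers the true phase.
err = unwrapped - true_phase
```

    It is precisely when this neighbor-difference assumption breaks down, under noise, undersampling, or genuine discontinuities, that the AI-driven unwrapping methods discussed in this section become attractive.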

    AI-powered phase-unwrapping algorithms leverage machine-learning and deep-learning techniques to overcome the limitations of traditional methods. CNNs, RNNs, and GANs are among the AI architectures commonly employed for phase-unwrapping tasks.284–286 These algorithms learn complex patterns and relationships from training data, enabling them to effectively unwrap phase maps with high accuracy and robustness, even in the presence of noise, discontinuities, or artifacts. Zhang et al.287 proposed a CNN-based approach that transforms phase unwrapping into a multiclass classification problem. They introduced an efficient segmentation network to identify classes, as shown in Fig. 22(a), which can be used to identify phase-discontinuity locations and improve performance.287 Park et al.288 introduced a deep-learning model that combines digital holography with a Pix2Pix GAN to automatically reconstruct unwrapped focused-phase images. This model surpasses numerical phase-unwrapping methods by addressing challenges such as abrupt phase changes, enabling faster phase-unwrapping rates. The cross section and 3D representation of one cell with wrapped and unwrapped signals are shown in Fig. 22(b). The model demonstrates robust performance across various cell images, outperforming recent U-net models, and holds promise for real-time observation of biological cell morphology and movement.288 Zheng et al.291 demonstrated a method for the diagnosis of lung squamous cell carcinoma using a TI-DIC microscope and a deep convolutional neural network (DCNN). The DCNN classifier applied to the optical-property maps exhibits high accuracy, significantly outperforming the same DCNN classifier on the DIC images.
Label-free quantitative phase microscopy combined with deep learning is thus emerging as a promising approach for in situ rapid cancer diagnosis.291 Zhang et al.292 published a DCNN-based two-dimensional phase-unwrapping method consisting of three steps: segmentation, summation, and refinement. The key advantage of this method lies in the introduction of DNNs, which provide both a coarse segmentation of the wrapped phase map and a degree of antinoise capability; the method can obtain a satisfactory unwrapping result even under severe noise conditions.292 Lu et al.293 proposed an enhanced antispeckle deep neural unwrapping network (E-ASDNUN) approach to achieve high-quality absolute phase reconstruction for coherent digital holography. This method designs a special network-based noise filter and embeds it into a deep neural unwrapping network to enhance the antinoise capacity of the image feature recognition and extraction process. It also demonstrates much better robustness than the typical U-net neural network and traditional phase-unwrapping algorithms in reconstructing phase images with high wrapping densities and high noise levels.293


    Figure 22.(a) Network architecture for phase unwrapping.287 (b) One quantitative phase image of multiple lung cancer cells. The images are focused manually and then unwrapped by the quality-guided unwrapping algorithm. The unwrapped focused-phase images are used for labeled training in the model. The cross section and 3D representation of one cell with wrapped and unwrapped signals are shown.288 (c) The DNN blindly outputs artifact-free phase and amplitude images of the object using only one hologram intensity. This DNN is composed of convolutional layers, residual blocks, and upsampling blocks and rapidly processes a complex-valued input image in a parallel, multiscale manner.289 (d) (i) The intensity data are captured by illuminating the sample from different angles with an LED array. (ii) Training CNN to reconstruct high-resolution phase images. The input to the CNN is low-resolution intensity images; the output of the CNN is the ground-truth phase image reconstructed using the traditional FPM algorithm. The network is then trained by optimizing the network’s parameters that minimize a loss function calculated based on the network’s predicted output and the ground truth. (iii) The network is fully trained using the first data set at 0 min and then can be used to predict phase videos of dynamic cell samples frame by frame.290

    Noninterferometric phase measurement

    Phase retrieval, the process of recovering the phase information of a wavefront from intensity measurements, is a fundamental problem encountered in various imaging modalities, including optical microscopy, X-ray crystallography, and electron microscopy. Traditional phase-retrieval algorithms, such as the Gerchberg–Saxton and Fienup algorithms, iteratively update the phase estimate until convergence to the true solution.294–297 However, these methods often suffer from slow convergence rates, sensitivity to initialization, and susceptibility to local minima, particularly in the presence of noise and artifacts.298–300 In recent years, the integration of AI techniques, particularly deep learning, has emerged as a promising approach to address these challenges and enhance phase-retrieval performance.
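    The Gerchberg–Saxton iteration mentioned above alternates between the two planes where amplitudes are known, replacing the amplitude in each plane while keeping the current phase estimate. A minimal numpy sketch, with an arbitrary smooth test phase and unit source amplitude as illustrative assumptions, is:

```python
import numpy as np

def gerchberg_saxton(source_amp, target_amp, n_iters=200, seed=0):
    """Gerchberg-Saxton iteration: recover a phase profile from the
    field amplitudes known in two Fourier-conjugate planes."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0, 2 * np.pi, source_amp.shape)  # random init
    for _ in range(n_iters):
        # Enforce the known amplitude in the source plane.
        field = source_amp * np.exp(1j * phase)
        far = np.fft.fft2(field)
        # Enforce the known amplitude in the far field, keep its phase.
        far = target_amp * np.exp(1j * np.angle(far))
        phase = np.angle(np.fft.ifft2(far))
    return phase

# Usage sketch: recover a phase that reproduces a known far-field amplitude.
n = 32
src = np.ones((n, n))
true_phase = np.outer(np.linspace(0, np.pi, n), np.linspace(0, np.pi, n))
target = np.abs(np.fft.fft2(src * np.exp(1j * true_phase)))
rec = gerchberg_saxton(src, target)
residual = np.abs(np.abs(np.fft.fft2(src * np.exp(1j * rec))) - target)
```

    The sensitivity to the random initialization and the tendency to stagnate at a nonzero residual are exactly the weaknesses, noted above, that motivate learned alternatives.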

    AI-enhanced phase-retrieval techniques leverage deep-learning algorithms to improve the efficiency, accuracy, and robustness of phase-recovery processes. CNNs have been employed to learn complex mappings between intensity measurements and phase distributions, enabling faster convergence and enhanced reconstruction quality compared to traditional iterative methods. In addition, generative models, such as VAEs and GANs, have been utilized to generate high-quality phase estimates from noisy or incomplete intensity data, circumventing the limitations of traditional algorithms. These AI-driven approaches offer the potential to revolutionize phase retrieval in diverse applications, including biomedical imaging, materials science, and astronomy. Rivenson et al.289 reported a CNN-based method, trained through deep learning, that can perform phase recovery and holographic imaging reconstruction using a single hologram intensity. They validated this approach by reconstructing the complex-valued images of various samples, such as blood and Papanicolaou (Pap) smears as well as thin sections of human tissue samples, all of which demonstrated successful elimination of the twin-image and self-interference-related spatial artifacts that arise due to lost phase information during the hologram detection process. This DNN is shown in Fig. 22(c). It is composed of convolutional layers, residual blocks, and upsampling blocks and rapidly processes a complex-valued input image in a parallel, multiscale manner.289 Wang et al.301 implemented the “one-to-all” self-attention armed convolutional neural network (SACNN) in speckle reconstruction. SACNN effectively extracts local and global speckle properties from diverse sparse patterns, including unseen glass diffusers and untrained detection positions. 
This advancement opens avenues for enhancing the field of view (FOV) and depth of field in imaging applications, particularly in deep-tissue imaging with complex scatterers.301 Shimobaba et al.302 proposed a dynamic-range compression and decompression scheme for digital holograms that uses a DNN. The proposed scheme uses a DNN to predict the original gradation holograms from the binary holograms, and the error-diffusion algorithm of the binarization process contributes significantly to training the DNN. The performance of the scheme exceeds that of modern compression techniques such as JPEG 2000 and high-efficiency video coding.302 Ma et al.303 developed a deep-learning-based method for compressed ultrafast photography (CUP) reconstruction that substantially improves the image quality and reconstruction speed. This method decomposes a large 3D event data cube (x,y,t) into massively parallel 2D imaging subproblems, which are much simpler to solve with a DNN.303 Nguyen et al.290 presented a novel conditional GAN to reconstruct video sequences of dynamic live cells captured through Fourier ptychographic microscopy (FPM). The workflow of the FPM video reconstruction method is shown in Fig. 22(d). They demonstrated the reconstruction of high-resolution dynamic cell videos using only the initial FPM data set, achieving a 50-fold speedup compared with traditional methods. This technique presents a promising deep-learning approach for continuous monitoring of large live-cell populations, capturing spatial and temporal information with subcellular resolution over extended periods.290

    The integration of AI in phase imaging offers several advantages over conventional methods. AI-powered algorithms can adaptively learn from diverse data sets and generalize well to unseen data, making them suitable for a wide range of imaging scenarios and applications. Moreover, these algorithms often exhibit superior performance in handling complex data, leading to more reliable and accurate phase reconstructions. AI-powered phase imaging finds applications in fields such as terrain mapping, deformation monitoring, medical imaging (e.g., magnetic resonance imaging), and materials science.

    Despite the promising advancements, several challenges remain in the development and deployment of AI-powered phase-imaging algorithms. Robustness to variations in imaging conditions, scalability to large data sets, and interpretability of AI models are among the key research directions for future investigations. In addition, the integration of domain knowledge and physics-based constraints into AI algorithms could further enhance their performance and reliability in real-world applications.

    2.2.2 Polarization imaging

    Polarization imaging, which captures the polarization state of light reflected or transmitted by objects, provides multidimensional information about objects’ physical and optical properties, thereby laying the foundation for enhanced imaging quality. Polarization imaging is typically employed in complex imaging scenarios where issues such as overexposure, noise, and low contrast are common, alongside challenges posed by scattering media and specular reflections. The intensities along multiple polarization directions, as well as parameters such as the degree of polarization (DoP) and the angle of polarization (AoP) provided by polarization imaging, often mitigate adverse imaging conditions. Furthermore, in applications such as target detection, segmentation, medical diagnostics, and 3D imaging under complex imaging conditions, the physical structures, compositions, and characteristic differences between objects and backgrounds exhibit distinct polarization properties. These differences manifest as variations in photon polarization states after propagation through the imaging environment, ultimately reflected in collected polarization parameters such as the DoP, the AoP, and the elements of the Mueller matrix. This capability enables polarization imaging to provide information beyond what conventional optical imaging methods can offer, thereby crucially enhancing imaging quality. Once equipped with these rich polarization data, the focus of polarization information processing lies in effectively utilizing them to identify application-specific polarization parameters and maximize the benefits of polarization imaging.
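    The standard way DoLP and AoLP are derived from intensities measured behind polarizers at four angles (the usual linear-Stokes convention, not specific to any cited work) can be sketched as follows; the fully polarized 30-deg test input is a synthetic example:

```python
import numpy as np

def dolp_aolp(i0, i45, i90, i135):
    """Linear Stokes parameters from four polarizer-angle intensity
    measurements, and the derived degree and angle of linear polarization."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity
    s1 = i0 - i90                        # 0/90 deg preference
    s2 = i45 - i135                      # 45/135 deg preference
    dolp = np.sqrt(s1**2 + s2**2) / np.maximum(s0, 1e-12)
    aolp = 0.5 * np.arctan2(s2, s1)      # radians
    return dolp, aolp

# Malus's law check: fully linearly polarized light at 30 deg should give
# DoLP = 1 and AoLP = 30 deg.
theta = np.deg2rad(30.0)
angles = np.deg2rad([0.0, 45.0, 90.0, 135.0])
i0, i45, i90, i135 = (np.cos(theta - a) ** 2 for a in angles)
dolp, aolp = dolp_aolp(i0, i45, i90, i135)
```

    The nonlinear square root and arctangent in these expressions are what amplify intensity noise in the derived DoLP and AoLP maps, which is the denoising problem the learning-based methods below are designed to address.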

    The integration of polarization imaging with AI enables more sophisticated and intelligent analysis of polarization information captured in polarization imaging, leading to improved performance in a wide range of applications. Traditional nonlearning-based methods for processing polarization information often rely on theoretical models and typically introduce assumptions and prior information in computations to simplify problems and reduce computational complexity. However, these assumptions and prior information may pose challenges regarding the applicability of the scene, thereby affecting the imaging result. In recent years, AI-assisted approaches have been introduced into polarization imaging due to their significant advantages in recognizing and analyzing large volumes and high-dimensional complex polarization information, extracting image structures and features, enhancing processing efficiency, and fusing multisource information. Therefore, this combination can further enhance the advantages of polarization imaging and has already proven effective in various fields, including image enhancement, reflection separation, image segmentation and detection, imaging through scattering media, and 3D reconstruction.

    AI-assisted polarization image enhancement

    Enhancing, merging, and reconstructing the multidimensional information obtained from polarization imaging often improves image quality, laying the foundation for better performance in subsequent tasks such as object detection, recognition, classification, and tracking. Polarization acquisition devices enhance information content while simultaneously attenuating intensity information, thereby reducing the signal-to-noise ratio of the imaging process. Crucial polarization parameters such as DoP and AoP are derived from intensity measurements using nonlinear operators. This process can amplify noise in the intensity measurements, causing polarization information to be submerged within the noise and consequently degrading the performance of polarization imaging. Therefore, denoising has become a crucial preprocessing step for polarization information. Most nonlearning denoising methods treat noise as additive white Gaussian noise, and their processing relies on prior knowledge, which may not effectively address practical applications with various materials and conditions. Hence, deep learning has been introduced into polarization image denoising to enhance performance. Li et al.304 introduced a polarization-denoising residual dense network (PDRDN) designed to capture rich hierarchical representations. The network leverages residual dense blocks, in which features generated by preceding layers are integrated using local feature fusion and further combined through global feature fusion,304 as shown in Fig. 23(a). The proposed method can thus restore polarization information, including the degree of linear polarization (DoLP) and angle of linear polarization (AoLP), from a strongly noisy background.
Hu et al.309 presented a deep-transfer-learning network built on the residual dense network for polarimetric image denoising, which fine-tunes a denoising model, originally pretrained on a large-scale color image data set, with a smaller polarimetric image data set. Experimental results demonstrate that the method effectively denoises and restores polarization details in polarimetric images. Li et al. proposed a noise modeling method for realistic data synthesis and a four-stage U-shape network structure310 inspired by the vision transformer.311 The architecture enables efficient multiresolution feature processing at low computational cost, leveraging successive transformer blocks at each stage. Experimental evaluation on a real-world polarized color image data set of paired noisy and reference images demonstrates its effectiveness for data synthesis and denoising.310
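The nonlinear operators that make polarization parameters noise-sensitive are easy to see in the standard four-angle Stokes estimate. Below is a minimal NumPy sketch, assuming polarizer orientations of 0 deg, 45 deg, 90 deg, and 135 deg (variable names are illustrative):

```python
import numpy as np

def stokes_from_intensities(i0, i45, i90, i135):
    """Estimate the linear Stokes parameters from four polarizer-angle images."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                        # horizontal vs. vertical component
    s2 = i45 - i135                      # diagonal component
    return s0, s1, s2

def dolp_aolp(s0, s1, s2, eps=1e-12):
    """DoLP and AoLP; the square root and arctangent are the nonlinear,
    noise-amplifying steps discussed above."""
    dolp = np.sqrt(s1**2 + s2**2) / (s0 + eps)
    aolp = 0.5 * np.arctan2(s2, s1)      # in (-pi/2, pi/2]
    return dolp, aolp

# Fully horizontally polarized light: Malus's law gives i45 = i135 = 0.5
s0, s1, s2 = stokes_from_intensities(1.0, 0.5, 0.0, 0.5)
dolp, aolp = dolp_aolp(s0, s1, s2)
```

Because small intensity errors propagate through the square root and arctangent, denoising before or on these derived quantities is what the networks above target.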


    Figure 23. Examples of network structure for AI-assisted polarization imaging. (a) Architectures of polarization denoising residual dense network (PDRDN) and residual dense block (RDB).304 (b) Architecture of FIPNet, which consists of three parts: feature extraction layer, fusion layer, and reconstruction layer.305 (c) A reflection separation network takes a cascaded architecture with three modules: semireflector orientation estimation, polarization-guided separation, and separated layers refinement.306 (d) A network tailored to polarization-based dehazing pipeline, which consists of two stages: transmitted light estimation and original scene radiance reconstruction.307 (e) A network with multibranch architecture to handle different hierarchical inputs. The physics-based prior confidence map for the weighted fusion of different inputs and the self-supervised AoLP loss to force the network to learn the prior knowledge between the normal and AoLP.308

    DoLP, AoLP, and intensity images offer complementary information from various perspectives. For instance, DoLP assists in observing surface geometry, edges, and roughness features, whereas AoLP demonstrates advantages under low-light conditions. To achieve improved imaging results, image fusion is necessary, but direct fusion may compromise the image structure. To address this issue, Meng et al.305 proposed a polarization mapping paradigm as an alternative to improve feature utilization and information interpretability and to reduce noise. Their network for the fusion of intensity and polarization images (FIPNet) achieves fusion that extends to more multidimensional polarization images, as shown in Fig. 23(b). The proposed method effectively captures detailed information on shadows, texture, albedo, and surface orientation, demonstrating robustness across a wide variety of near-infrared, indoor, and outdoor scene images.305
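A toy sketch illustrates why direct pixel blending can compromise structure while feature-aware fusion preserves it. This is not FIPNet; the gradient-based weighting below is a hand-crafted stand-in for the learned fusion layer:

```python
import numpy as np

def naive_fuse(intensity, dolp, alpha=0.5):
    """Direct pixel-wise blending; can wash out structure from either modality."""
    return alpha * intensity + (1 - alpha) * dolp

def gradient_weighted_fuse(intensity, dolp, eps=1e-8):
    """Weight each modality by local gradient energy so the locally sharper
    cue dominates the fused result."""
    def grad_energy(img):
        gy, gx = np.gradient(img)
        return gx**2 + gy**2
    wi, wd = grad_energy(intensity), grad_energy(dolp)
    w = wi / (wi + wd + eps)          # per-pixel weight in [0, 1]
    return w * intensity + (1 - w) * dolp

rng = np.random.default_rng(0)
intensity = rng.random((8, 8))
dolp = rng.random((8, 8))
fused = gradient_weighted_fuse(intensity, dolp)
```

Since the fused value is a per-pixel convex combination, it never overshoots either input, unlike some naive sharpening-based fusions.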

    The division-of-focal-plane polarization acquisition method leaves spatial gaps between pixels that capture the same spectral polarization feature. To recover the missing color and polarization signatures and enhance image resolution, image demosaicking, an interpolation of the missing pixel values, is essential. Sun et al.312 proposed a color polarization demosaicking CNN with a two-branch design that ensures polarization fidelity, enhances resolution, and preserves image details while providing accurate polarization information. Liu et al.313 presented a novel AoP loss calculation method and a multibranch network for color polarization demosaicking, with R, G, B, and polarization modules operating in parallel to enhance stability and polarization fidelity. The proposed approach improves network convergence speed by a factor of 3 and significantly enhances image quality. In addition, the AoP loss method addresses optimization challenges and demonstrates versatility for tasks such as denoising and dehazing in polarimetric imaging.
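A nearest-neighbor baseline makes the interpolation problem concrete. The 2 × 2 superpixel layout below is an assumption for illustration; learned demosaicking networks are designed to beat this crude recovery on real scenes:

```python
import numpy as np

def dofp_mosaic(i0, i45, i90, i135):
    """Simulate a division-of-focal-plane sensor: each 2x2 superpixel samples
    polarizer angles [[0, 45], [135, 90]] (layout is an assumption)."""
    h, w = i0.shape
    mosaic = np.empty((h, w))
    mosaic[0::2, 0::2] = i0[0::2, 0::2]
    mosaic[0::2, 1::2] = i45[0::2, 1::2]
    mosaic[1::2, 0::2] = i135[1::2, 0::2]
    mosaic[1::2, 1::2] = i90[1::2, 1::2]
    return mosaic

def demosaic_nearest(mosaic):
    """Nearest-neighbor recovery of the four channels -- the crude baseline
    that CNN demosaicking methods improve on."""
    i0   = np.repeat(np.repeat(mosaic[0::2, 0::2], 2, 0), 2, 1)
    i45  = np.repeat(np.repeat(mosaic[0::2, 1::2], 2, 0), 2, 1)
    i135 = np.repeat(np.repeat(mosaic[1::2, 0::2], 2, 0), 2, 1)
    i90  = np.repeat(np.repeat(mosaic[1::2, 1::2], 2, 0), 2, 1)
    return i0, i45, i90, i135

# On a spatially constant scene, nearest-neighbor recovery is exact
i0 = np.full((4, 4), 1.0); i45 = np.full((4, 4), 0.6)
i90 = np.full((4, 4), 0.2); i135 = np.full((4, 4), 0.6)
m = dofp_mosaic(i0, i45, i90, i135)
r0, r45, r90, r135 = demosaic_nearest(m)
```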

    Traditional digital camera sensors capture only a limited portion of the real-world dynamic range. The resulting low-dynamic-range images often contain overexposed or underexposed areas, failing to match our perceptual ability to discern details in both bright and dark regions of a scene. Because the orientation of the polarizing filters varies, the attenuation of natural light also differs, so the multiple images captured by a polarization camera can be treated as a set taken at different exposure times. Ting et al.314 presented a deep framework to address the snapshot high dynamic range (HDR) reconstruction problem using polarization images. It first estimates HDR luminance and then predicts the multiexposure images, followed by a merge and tone-mapping step to output an HDR image. A polarized HDR data set is also created, and the method is evaluated both quantitatively and qualitatively, demonstrating its ability to reconstruct details and produce visually appealing textures in the outputs.314
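The idea of treating polarization channels as exposures can be sketched with a classic weighted radiance merge. This is a hand-crafted stand-in for the learned merge step; the exposure ratios and hat weighting are illustrative assumptions:

```python
import numpy as np

def merge_exposures(images, exposures, eps=1e-8):
    """Debevec-style radiance merge: weight well-exposed pixels most and
    average (image / exposure) across the stack."""
    images = np.asarray(images, dtype=float)
    exposures = np.asarray(exposures, dtype=float).reshape(-1, 1, 1)
    weights = 1.0 - np.abs(2.0 * images - 1.0)   # hat weight, peaks at 0.5
    radiance = (weights * images / exposures).sum(0) / (weights.sum(0) + eps)
    return radiance

# Two polarization channels of one scene, acting as 1x and 0.5x exposures
radiance_gt = np.array([[0.2, 0.4], [0.3, 0.25]])
stack = [radiance_gt, 0.5 * radiance_gt]
est = merge_exposures(stack, [1.0, 0.5])
```

With consistent exposures and no clipping, the merge recovers the underlying radiance exactly; the learned framework above additionally handles saturation and tone mapping.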

    AI-assisted polarization reflection separation

    Reflection separation to enhance image quality is of vital importance for both human and machine perception. One intriguing property of reflection is that reflected light is often polarized, which can facilitate reflection removal. Nonlearning methods rely on handcrafted priors derived from observations of specific natural images, but these priors can be violated in real scenarios where the expected properties are only weakly observed. Learning-based methods, leveraging the comprehensive modeling capabilities of deep networks, hold the promise of further improving polarization-based reflection removal. Lyu et al.306 derived a semireflector orientation constraint that establishes a well-posed physical image formation model for reliable layer separation. They trained an end-to-end DNN with a gradient loss to effectively suppress undesired reflections in the transmission layer and recover a high-quality reflection layer,306 as shown in Fig. 23(c). Li et al.315 proposed a framework comprising two cascaded networks, PolarNet and FusionNet, for processing single polarized images and combining multiple separated results for refinement, together with a polarization simulation engine that generates polarized image sets by physically simulating light traveling through a transparent medium. Owing to the natural suitability of polarization for separating reflections and the well-designed training networks, high-quality separation is achieved, as demonstrated by extensive experiments on both synthetic data and real-world captures. Lei et al.316 proposed a polarized reflection removal model with a two-stage architecture for estimating reflection and transmission components based on the special relationship between reflection and polarized light. A general decomposition loss called perceptual normalized cross-correlation is proposed to minimize the correlation between the estimated reflection and transmission at different feature levels. The method handles different types of reflection well and removes the reflection without introducing artifacts.
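A pixel-space sketch conveys the decorrelation idea behind the normalized cross-correlation loss; the published perceptual version applies this at deep feature levels, and raw pixels are used here only for illustration:

```python
import numpy as np

def ncc(a, b, eps=1e-8):
    """Normalized cross-correlation of two images (flattened, zero-mean)."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def decorrelation_loss(reflection, transmission):
    """Penalize statistical similarity between the two estimated layers:
    identical layers give ~1, independent layers give ~0."""
    return ncc(reflection, transmission) ** 2

rng = np.random.default_rng(1)
r = rng.random((16, 16))
t = rng.random((16, 16))
same = decorrelation_loss(r, r)    # maximal penalty for identical layers
indep = decorrelation_loss(r, t)   # small penalty for independent layers
```

Minimizing this term pushes the estimated reflection and transmission apart, which is exactly the behavior the decomposition loss is designed to encourage.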

    AI-assisted polarization image segmentation and detection

    Incorporating polarization into image segmentation and detection provides richer information beyond contextual or boundary details. By inversely inferring differences in the material and structure of targets from the variations in polarized characteristics obtained from the scene, target classification and detection become more effective. Because such challenges often require the effective and dynamic fusion of multispectral and multimodal cues, deep-learning methods are particularly well suited to this problem. Zhu et al.317 proposed a physics-based multiscale image fusion cascaded object detection neural network (PMIF-CODNN) and a polarized object detection benchmark data set (PODB). The network incorporates a multiscale fusion module and a multidecision adaptive attention mechanism, where the polarization-guided multiscale image fusion technique effectively extracts and merges polarization and intensity features. The proposed method improves the accuracy and precision (by 10%) of object detection in adverse weather conditions. In the medical field, Dong et al.318 introduced a dual-modality framework combining polarization and traditional microscopy imaging for cervical intraepithelial neoplasia grading. The approach integrates polarimetry features to quantitatively capture microstructural variations in hematoxylin-eosin-stained pathological sections, demonstrating accuracy, generalizability, and interpretability.

    Material segmentation presents heightened challenges compared with typical image segmentation tasks, primarily due to the absence of clearly distinguishable visual features in conventional RGB appearances, especially with transparent objects that lack texture and color information. Existing nonlearning methods often rely on assumptions such as contextual information319 or boundary detection;320 however, these assumptions do not hold in all scenarios, thereby limiting the effectiveness of material segmentation. To better leverage polarized information, the incorporation of deep-learning methods is imperative for achieving robust multimodal glass segmentation. Qiao et al.321 introduced a polarization video glass segmentation network (PGVS-Net) that exploits historical multiview spectral polarization cues to segment glass areas. PGVS-Net utilizes RGB-polarization memory and a polarization-guided integration module to align input frames with view-dependent polarization cues from prior spectral data. This approach surpasses glass segmentation performance on RGB-only video sequences and delivers more robust results compared with per-frame RGB-P single-image segmentation methods. Mei et al.322 introduced a learning-based glass segmentation network, PGSNet, which leverages multiscale pixel-wise dependencies to dynamically enhance local contextual cues while utilizing global cross-domain contextual information to achieve robust segmentation. The validation results demonstrate the effectiveness and robustness of integrating trichromatic color and polarization cues. Kalra et al.323 proposed a framework with a novel backbone tailored to the unique textures of polarization imaging. This backbone can be integrated with architectures such as mask R-CNN (e.g., polarized mask R-CNN) to deliver accurate and robust instance segmentation of transparent objects, even in cluttered scenes and across various background conditions.323 Liang et al.324 proposed MCubeSNet, a novel DNN that learns to focus on the most informative combinations of imaging modalities (RGB, polarization, and near-infrared) for each material class using a newly derived region-guided filter selection layer. The experimental results clearly show the importance of multimodal imaging for outdoor material segmentation.

    AI-assisted polarization imaging through scattering media

    Polarization-based image restoration methods analyze the disparity in polarization characteristics between the object and the background light, estimating alterations in their irradiance. These methods excel at eliminating background scattered light and enabling target recognition and 3D reconstruction. Although traditional model-based polarization reconstruction methods are effective, the linear assumptions of the image degradation models and the assumptions or prior information used in model parameter estimation often cannot guarantee universality across diverse scenarios. In recent years, deep learning has advanced rapidly and is regarded as an effective way to outperform traditional methods and boost performance in polarization-based image restoration. Zhou et al.307 proposed a generalized physical formation model of hazy images and introduced a polarization-based dehazing pipeline that does not rely on specific clues, adopting deep learning to estimate the necessary physical parameters (infinite airlight and the DoP of both transmitted light and airlight); they designed a two-stage U-net architecture that leverages semantic and contextual information to handle spatially variant real-world scattering and improve the clarity of the recovered scene radiance, as shown in Fig. 23(d). Qi et al.325 proposed U2R-pGAN, an unsupervised polarimetric underwater image recovery method based on the CycleGAN architecture. By eliminating the reliance on paired data, the method integrates the advantages of unsupervised learning and polarization modulation, introducing polarization loss functions to enhance detail restoration. Experimental results across various objects and conditions demonstrate its effectiveness and potential for practical underwater applications.
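The physical formation model that such networks parameterize can be illustrated with the classic two-polarization dehazing inversion (a simplified Schechner-style model; the learned pipeline above replaces the hand-set airlight DoP and infinite airlight with network estimates):

```python
import numpy as np

def polar_dehaze(i_max, i_min, p_air, a_inf, eps=1e-6):
    """Classic two-polarization dehazing: i_max/i_min are images through the
    polarizer at the best/worst orientations, p_air is the airlight DoP,
    and a_inf is the airlight at infinite distance."""
    total = i_max + i_min                            # total intensity
    airlight = (i_max - i_min) / max(p_air, eps)     # scattered-light estimate
    t = np.clip(1.0 - airlight / a_inf, eps, 1.0)    # transmission map
    radiance = (total - airlight) / t                # recovered scene radiance
    return radiance

# Forward-simulate haze, then invert it exactly
L = np.array([[0.8, 0.4], [0.6, 0.2]])        # true scene radiance
t_true = np.array([[0.7, 0.5], [0.9, 0.6]])   # transmission
a_inf, p_air = 1.0, 0.5
A = a_inf * (1 - t_true)                      # airlight
total = L * t_true + A
i_max = 0.5 * (total + p_air * A)
i_min = 0.5 * (total - p_air * A)
rec = polar_dehaze(i_max, i_min, p_air, a_inf)
```

With perfectly known parameters the inversion is exact; in practice p_air and a_inf vary spatially, which is precisely what the learned estimators address.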

    AI-assisted polarization 3D reconstruction

    Surface normals provide detailed 3D information about object surfaces. Because the polarization of light changes differently when light interacts with surfaces of different shapes and materials, light reflected off an object carries a polarization state that encodes shape. Previous shape from polarization (SfP) methods have high error rates and limited generalization to mixed materials and lighting conditions; combining SfP with deep learning has therefore proven to improve performance. Ba et al.326 were among the first to integrate deep learning into solving the SfP problem. They merged the established physical model with deep learning by feeding both polarization images and ambiguous normal maps into the network to learn surface normal information from polarization data.326 This approach led to a notable reduction in errors, demonstrating its efficacy. Zou et al.327 addressed the challenge of estimating the 3D body shape of clothed humans from single polarized 2D images. They proposed a dedicated two-stage deep-learning approach based on polarization images: stage one infers the fine-detailed body surface normal, and stage two reconstructs the 3D body shape with clothing details.327 Empirical evaluations on synthetic and real-world data sets demonstrate the qualitative and quantitative performance of the approach in estimating human poses and shapes. Shao et al.308 presented a learning-based method for transparent surface estimation from a single-view polarization image. A multibranch architecture is proposed to handle different hierarchical inputs rather than directly concatenating them,308 as shown in Fig. 23(e). A physics-based prior confidence map is defined from the raw polarization images and used for the weighted fusion of the different inputs. In addition, a self-supervised AoLP loss is proposed that exploits the confidence map to force the network to learn the prior relationship between the normal and AoLP.
Lei et al.328 introduced a scene-level approach for surface normal estimation from a single polarization image in real-world settings. They presented a data set with paired polarization images and ground-truth normal maps for complex scenes and proposed a learning-based framework incorporating viewing encoding, a multihead self-attention module, and an efficient polarization representation. The method demonstrates the ability to generate high-quality normal maps from polarization images, effectively generalizing from near-field to far-field outdoor scenes.328
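The physics that SfP networks build on can be sketched with the diffuse-reflection DoLP model: the zenith angle follows from DoLP, while AoLP leaves a pi-ambiguous azimuth that the networks learn to resolve. The refractive index n = 1.5 and the brute-force inversion are illustrative assumptions:

```python
import numpy as np

def diffuse_dolp(theta, n=1.5):
    """Diffuse-reflection DoLP as a function of zenith angle (Fresnel model)."""
    s = np.sin(theta)
    num = (n - 1.0 / n) ** 2 * s ** 2
    den = (2 + 2 * n ** 2 - (n + 1.0 / n) ** 2 * s ** 2
           + 4 * np.cos(theta) * np.sqrt(n ** 2 - s ** 2))
    return num / den

def zenith_from_dolp(rho, n=1.5):
    """Invert the monotonic model by brute-force lookup over zenith angles;
    the azimuth (phi or phi + pi from AoLP) remains ambiguous and is what
    learned SfP methods disambiguate."""
    thetas = np.linspace(0.0, np.pi / 2 - 1e-3, 2000)
    return thetas[np.argmin(np.abs(diffuse_dolp(thetas, n) - rho))]

theta_true = 0.6
rho = diffuse_dolp(theta_true)
theta_est = zenith_from_dolp(rho)
```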

    2.2.3 Spectral imaging

    Spectral imaging (SI) provides both spectral and spatial information at high resolution, serving as a significant tool for material identification and chemical analysis, with applications in diverse fields such as remote sensing, clinical diagnosis, and machine vision. Classical SI methods acquire the high-dimensional spectral-spatial data cube through time-division or focal-plane-division schemes, which degrade imaging speed or spatial resolution and typically rely on bulky combinations of relay lens groups and scanning components. In recent years, with the development of AI, the framework of SI has been revolutionized by combining advanced optical system design with AI reconstruction, and various snapshot, high-resolution, and compact SI methods have been developed. On the other hand, AI also greatly enhances SI-based detection and classification, approaching the information-theoretic limit owing to its extraordinary data processing and analysis capability.

    AI-assisted SI system

    The AI-assisted SI system can be divided into two groups, where the modification is applied to the lens329–333 or to the detector,334–337 which are the two essential elements of a typical imaging system.

    In past decades, by inserting conventional dispersion elements (gratings or prisms) and a spatially variant mask into the lens group, the spatial-spectral data were compressively encoded in one intensity image, and a snapshot high-throughput hyperspectral image could be reconstructed with an AI (deep-learning) algorithm based on compressive-sensing theory.338,339 In 2019, researchers from KAIST took advantage of recent advances in diffractive optical technology and presented a much more compact, diffraction-based snapshot hyperspectral imaging method, using only a novel diffractive optical element (DOE) in front of a conventional, bare image sensor,329,330 as shown in Fig. 24(a). By leveraging the wavelength-dependent property of the diffractive element, a spectrally varying PSF is generated, as shown in Fig. 24(b). They developed a network, shown in Fig. 24(c), to separate the spectral channels from one blurred snapshot image according to the spectrally dependent PSF, outputting a hyperspectral image with high spatial resolution and 25 spectral channels, with its reconstructed spectrum close to the ground truth [Fig. 24(d)]. To further increase the spectral channel count and reconstruction precision, researchers from Stanford University and Universidad Industrial de Santander proposed a system composed of a DOE and a color-coded aperture (CCA),333 as shown by the schematic setup in Fig. 24(e). The spectrally dependent PSF generated by the DOE is then filtered by the CCA, located close to the sensor, to enable a shift-variant spectrally dependent response that increases the modulation freedom. Under the AI-assisted framework, the DOE and the CCA are designed end to end, incorporating the recovery network, as demonstrated in Fig. 24(e). A hyperspectral image with 49 spectral channels can be reconstructed from one snapshot intensity image, with improved precision compared with the pure DOE method [Fig. 24(f)], whereas the cost is more PSF calibration work due to the shift-variant property.


    Figure 24. AI-assisted snapshot compact SI. (a)–(d) Results of combining the AI reconstruction and the DOE design with diffractive rotation.329 (a) The fabricated DOE that generates spectrally varying PSFs for SI. Inset: a camera installed with the DOE. (b) The PSFs at different wavelengths. (c) Overview of the network architecture. (d) The RGB image of a reconstructed SI and the comparison between the reconstructed spectrum and the ground truth of point 1 in the scene. (e)–(g) Results of the shift-variant color-coded diffractive SI system.333 (e) Optimization of the optical elements is carried out using an end-to-end AI approach. (f) RGB image of a reconstructed hyperspectral image and the comparison between the reconstructed spectrum and the ground truth of point 1 in the scene. SCCD types 1 to 3 denote three different types of CCA utilized in the system. Spiral denotes a system without CCA. (h)–(j) Different types of pixelated filter array: (h) Fabry–Perot filter;335 (i) freeform-shaped metasurface filter;336 (j) film filter.337 (k)–(m) Results of computational SI with CMOS-compatible random array of Fabry–Perot filters shown in panel (h).335 (k) Performance of hyperspectral image reconstruction simulated for three hyperspectral image data sets, including the RGB show of reconstruction and the error map between the reconstruction and the ground truth. (l) Experimental results of the SI for a standard color sample. (m) The dependence of the frame rate on the image resolution for AI-based reconstruction and the iterative reconstruction with 50 iteration steps.

    By closely attaching a pixelated filter array to the detector surface, with the filter transmission functions designed to be highly uncorrelated across a wide band, snapshot hyperspectral imaging can be achieved through AI reconstruction from the image formed by a classical lens on the decorated detector. Various pixelated filter arrays have been fabricated, based on the Fabry–Perot filter,335 freeform-shaped metasurfaces,336 and even a cheap film337 [shown in Figs. 24(h)–24(j)]. By attaching a Fabry–Perot filter array consisting of 64 types of highly uncorrelated filter functions with a pixel pitch perfectly matching the image sensor [Fig. 24(h)], researchers from the Panasonic Holdings Corporation achieved a video-rate hyperspectral camera capable of collecting spectral information on real-world scenes with sensitivities and spatial resolutions (3 pixels) comparable with those of a typical RGB camera. The high spectral precision is verified in simulation by an error map within 10% on various hyperspectral image data sets in Fig. 24(k) and in experiment by the reconstructed spectrum of a standard color sample, close to the ground truth, shown in Fig. 24(l). Moreover, with assistance from AI, the reconstruction achieves a frame rate of up to 34.4 frames per second at full-HD resolution (1920 pixels × 1080 pixels), much faster than the traditional iterative scheme, as shown in Fig. 24(m).
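Why the filter functions should be highly uncorrelated can be seen from the conditioning of the per-superpixel linear system y = F x. The filter sets below are synthetic; real Fabry–Perot transmission curves are smoother than these random curves:

```python
import numpy as np

rng = np.random.default_rng(3)
n_bins = 64

# 64 spectrally uncorrelated (random) filter curves vs. 64 nearly identical ones
filters_good = rng.random((64, n_bins))
filters_bad = np.tile(rng.random(n_bins), (64, 1)) + 0.01 * rng.random((64, n_bins))

# Per-superpixel recovery solves y = F @ x; the condition number of F sets
# how strongly measurement noise is amplified in the recovered spectrum x.
cond_good = np.linalg.cond(filters_good)
cond_bad = np.linalg.cond(filters_bad)
```

A nearly identical filter set is close to rank one, so tiny measurement noise swamps the recovered spectrum; decorrelated filters keep the inverse problem well posed.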

    AI-assisted SI-based application

    Besides upgrading the scheme for acquiring the hyperspectral data cube with enhanced spatial resolution and response speed, AI also helps to extract more underlying information from the signal spectrum and greatly expands the function of spectral imaging in machine vision, clinical diagnosis, and remote sensing.340–342

    In 2023, researchers from Purdue University proposed a passive heat-assisted detection and ranging (HADAR) method,340 which utilizes AI to cleanly separate the self-emitted and scattered thermal radiation within the infrared hyperspectral imaging cube and to decompose the TeX (T, temperature, physical status; e, emissivity, material fingerprint; X, texture, surface geometry) information about the scene, which is mixed in the photon streams, as illustrated in Fig. 25(a). Consequently, HADAR not only overcomes the “ghost effect” that has long plagued thermal vision, revealing texture through the darkness as clearly as if it were day, but also provides physical attributes beyond the scope of daylight RGB vision, as shown in Fig. 25(b). With the extracted texture, HADAR can recover the depth of objects, a critical scene attribute for autonomous navigation. As shown in Fig. 25(c), the ranging precision of HADAR at night is comparable to that of daylight RGB stereovision and surpasses conventional thermal ranging without AI-assisted decomposition.
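The TeX decomposition rests on the radiation model S = e B(T) + (1 - e) X, with B the Planck blackbody spectrum. A brute-force per-pixel fit illustrates the inverse problem that HADAR's network solves at scale; the wavelength band, search grids, and known environment term X are illustrative assumptions:

```python
import numpy as np

H, C, KB = 6.626e-34, 2.998e8, 1.381e-23

def planck(wl, temp):
    """Spectral radiance of a blackbody (Planck's law), wl in meters."""
    return (2 * H * C**2 / wl**5) / (np.exp(H * C / (wl * KB * temp)) - 1.0)

def fit_T_e(signal, wl, x_env):
    """Brute-force fit of temperature T and emissivity e in the TeX model
    S = e*B(T) + (1 - e)*X, assuming the environment term X is known."""
    best = (None, None, np.inf)
    for temp in np.linspace(250, 350, 201):
        for e in np.linspace(0.0, 1.0, 101):
            model = e * planck(wl, temp) + (1 - e) * x_env
            err = np.sum((model - signal) ** 2)
            if err < best[2]:
                best = (temp, e, err)
    return best[0], best[1]

wl = np.linspace(8e-6, 14e-6, 20)          # long-wave infrared band
x_env = planck(wl, 270.0)                  # scattered environment radiation
signal = 0.8 * planck(wl, 300.0) + 0.2 * x_env
T_hat, e_hat = fit_T_e(signal, wl, x_env)
```

Because the spectral shape of B(T) varies with temperature, the fit is well posed over a band of wavelengths even though T, e, and X are entangled at any single wavelength.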


    Figure 25. Heat-assisted detection and ranging (HADAR) with AI-assisted decomposition.340 (a) Pipeline of HADAR: HADAR takes thermal photon streams as input, records hyperspectral-imaging heat cubes, addresses the ghosting effect through AI-assisted TeX decomposition, and generates TeX vision for improved detection and ranging. (b) TeX vision demonstrated on the database and the outdoor experiments, showing that HADAR sees textures through the darkness with a comprehensive understanding of the scene. (c)–(h) Ranging based on the raw thermal images (c), (d), AI reconstructed images in the HADAR technique at night (e), (f) and daylight RGB vision (g), (h).

    Besides machine vision, AI also greatly expands the function of SI in biomedicine and the clinic. Multiple imaging modes suitable for unstained tissues, such as photoacoustic imaging and autofluorescence imaging, allow the distinction of endogenous components based on their absorption or emission spectra at high spatial resolution.343,344 With the assistance of AI, it is possible to enable virtual staining with the same diagnostic capability as real staining (the gold standard) and avoid the laborious and expensive staining process. In 2024, researchers from Verily Life Sciences LLC proposed an end-to-end platform for digital pathology using hyperspectral autofluorescence microscopy and deep-learning-based virtual histology.343 The AI converts the autofluorescence spectral imaging into two classical stained images, with H&E and Masson’s trichrome, and then performs automated scoring. The end-to-end pipeline is illustrated in Fig. 26(a). Averaging over a limited spectral band, as captured by conventional autofluorescence imaging microscopes, did not allow easy differentiation among tissue features, whereas spectral projections of the hyperspectral image with tuned channel-related weights, based on a correlation calculation, could differentiate features, as shown in Figs. 26(b)–26(i), demonstrating that the endogenous-component information is encoded in the spectral signal from the unstained slice. By developing a neural network [Fig. 26(j)], the researchers achieved virtual staining images very similar to the bright-field images of the real stained slices [Fig. 26(k)]. The scores estimated by AI from the virtual stained images show a high correlation with the scores estimated by experts from the real stained images [Figs. 26(l)–26(o)], demonstrating the high reliability of the AI-assisted end-to-end automated pathology platform.
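The channel-weighted spectral projection can be sketched as linear unmixing: weights orthogonal to one component's spectrum isolate the other's abundance map. The spectra and abundance maps below are synthetic stand-ins:

```python
import numpy as np

rng = np.random.default_rng(4)
n_ch = 16
spec_a = rng.random(n_ch)                  # emission spectrum of component A
spec_b = rng.random(n_ch)                  # emission spectrum of component B
map_a = rng.random((8, 8))                 # spatial abundance of A
map_b = rng.random((8, 8))                 # spatial abundance of B
cube = map_a[..., None] * spec_a + map_b[..., None] * spec_b

# Channel weights orthogonal to spec_b cancel component B, so the weighted
# projection recovers A's abundance map; a uniform projection (plain
# autofluorescence intensity) would mix both components.
w = spec_a - (spec_a @ spec_b) / (spec_b @ spec_b) * spec_b
proj = cube @ w / (spec_a @ w)
```

Real autofluorescence spectra overlap and tissues contain many components, which is why correlation-tuned weights, and ultimately the learned virtual stainer, outperform such simple projections.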


    Figure 26. AI-assisted end-to-end platform for digital pathology using hyperspectral autofluorescence microscopy and deep-learning-based virtual histology.343 (a) Automated workflow with virtual staining and AI scoring that mimics the current pathology workflow. (b)–(e) Classical H&E stained images (b) or the immunofluorescence images [(c) elastin + α-SMA, (d) nuclei, and (e) CD68] of a tissue slice. (f)–(i) Images of the adjacent slice generated by a linear projection of the autofluorescence spectral image with different channel-related weights to enhance different components [(f) a uniform projection mimicking the autofluorescence intensity imaging result, (g) extracellular matrix, (h) nuclei, (i) macrophages]. (j) Neural network architecture of the generator of virtual stainer. AF, autofluorescence; BF, bright field. (k) BF real and virtual images stained with H&E. (l)–(o) Correlation of the slide level nonalcoholic steatohepatitis feature attributes predicted by segmentation models on real stains versus virtual stains [(l) percent steatosis, (m) percent lobular inflammation, (n) log-normalized hepatocyte balloon count, (o) fibrosis density].

    There are still limitations to the current AI-assisted spectral imaging technique. (1) A large workload is required to prepare a comprehensive data set for the designed spectral imaging system. Public hyperspectral data cubes can be utilized to simulate the captured images and generate the training data set automatically; in such cases, the established system should be carefully calibrated, including the noise model and the PSF across the FOV. (2) The output becomes less reliable when the system is applied to a scene that differs greatly from those in the training data set, a common problem also faced by deep-learning networks in other fields. Incorporating the physical model into the AI computation can help improve the robustness of the output. Even so, with the coming age of massive data and ultrahigh computational capability, AI-assisted spectral imaging is poised to become a powerful detection tool that accelerates advances in precision diagnosis, industry, and human–robot interaction, among others.

    In Sec. 2.2, we have explored the applications of AI for optical imaging, particularly its role in phase, polarization, and spectral imaging. AI is one of the most dynamically evolving research domains and has introduced significant transformations in the field of optical imaging. However, several challenges, such as robustness to variations in imaging conditions, scalability to large data sets, and interpretability of AI models, still remain in the development and deployment of AI-powered optical imaging. In the future, the integration of domain knowledge and physics-based constraints into AI could further enhance their performance and drive advancements in healthcare, industry, and human–machine interaction.

    2.3 AI for Optical Data Acquisition and Analysis

    The rapid development of AI has significantly expanded its applications in optical signal and data processing. Beyond its transformative impact on optical imaging, AI has become a cornerstone in optical data acquisition and analysis, significantly enhancing system performance and enabling new possibilities for advanced applications. In optical communication, AI-driven deep-learning algorithms excel at modeling and predicting nonlinear effects, enabling efficient real-time compensation and substantially improving transmission performance. Similarly, in optical performance monitoring (OPM), AI employs big data analysis and deep learning to extract valuable insights from extensive monitoring data, facilitating intelligent monitoring and fault prediction. This enhances the accuracy and efficiency of performance assessments while reducing maintenance costs. In addition, in optical sensing parameter analysis, AI increases system sensitivity and accuracy through intelligent data analysis, allowing for precise monitoring and analysis in diverse applications such as environmental monitoring and the biomedical field. Overall, AI in optical signal processing enhances system performance and introduces new technological breakthroughs in optical communication, monitoring, and sensing, demonstrating a broad spectrum of potential applications and the transformative impact of AI in photonics.

    2.3.1 AI methods for optical signal processing

    Photons as information carriers enable emerging applications unreachable by electrons, owing to their advantages of high bandwidth, low latency, and low loss. The utilization of optical signals for data transmission, sensing, and processing is widely applied in fields such as medical healthcare, military reconnaissance, and optical communications.22,345,346 With the development of big data and the Internet of Things, there is a growing demand for larger communication capacity, higher perception precision, and faster computing speed. Consequently, effective processing and analysis of optical signals have become crucial. Benefiting from the rapid development of AI, intelligent learning methods, particularly recurrent neural networks (RNNs), are extensively employed for optical signal processing.14

    RNNs have outstanding performance in handling time-series data and adapting to dynamic changes in optical signals, providing robust technological support for optical signal processing. Different from traditional neural networks, which consider only the input at the current moment [Fig. 27(a)], RNNs introduce a recurrent structure that enables the network not only to consider the current input but also to retain a memory of previous inputs [Fig. 27(b)]. The basic network structure of an RNN consists of the input, hidden, and output layers. X(t), h(t), and o(t) are the input, hidden, and output states, respectively, where t denotes time. The model parameters W1, W2, and Wr represent the input, output, and recurrent weight matrices, respectively. The RNN can be unfolded in time into a multilayer network, and in the hidden layer, each node has a recurrent connection, allowing information to propagate through the sequence. Specifically, at each time step, the RNN accepts the current input and the hidden state from the previous time step as input and produces the output and the hidden state of the current time step. However, RNNs suffer from vanishing or exploding gradients due to weight sharing, making it challenging to learn long-term dependencies. To overcome these issues, a special RNN architecture called long short-term memory (LSTM) has emerged.347 In LSTM networks, decisions are made regarding whether to forget, delete, or store information based on the importance assigned to the information. The main structure of an LSTM consists of the forget gate z_f, input gate z_i, output gate z_o, and cell state c_t [Fig. 27(c)]. The forget gate determines which past information from c_{t-1} needs to be forgotten, the input gate determines which current information needs to be stored in c_t, and the output gate determines which information from c_t needs to be passed to the next unit. By better storing and accessing information, the LSTM can more accurately model and learn time-series data and their long-term dependencies.
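    The gating logic described above can be made concrete with a minimal NumPy sketch of a single LSTM time step. The stacked weight layout and the dimensions D and H below are illustrative assumptions, not taken from the original.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b stack the parameters of the
    forget (f), input (i), output (o), and candidate (g) transforms."""
    z = W @ x + U @ h_prev + b        # pre-activations, shape (4*H,)
    H = h_prev.size
    f = sigmoid(z[0*H:1*H])           # forget gate: what to drop from c_prev
    i = sigmoid(z[1*H:2*H])           # input gate: what new info to store
    o = sigmoid(z[2*H:3*H])           # output gate: what to expose as h
    g = np.tanh(z[3*H:4*H])           # candidate cell update
    c = f * c_prev + i * g            # new cell state
    h = o * np.tanh(c)                # new hidden state
    return h, c

rng = np.random.default_rng(0)
D, H = 3, 4                           # input and hidden sizes (assumed)
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(5):                    # unroll over a short input sequence
    h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)
```

Because the output gate and the tanh squashing both lie in bounded ranges, the hidden state stays bounded regardless of sequence length, which is one reason LSTMs avoid the exploding activations of plain RNNs.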


    Figure 27.Schematic diagram of RNN. (a) Traditional neural network architecture with input, hidden, and output layers. (b) RNN architecture and an unfolding structure with t time steps. X(t): input state. h(t): hidden state. o(t): output state. W1, W2, and Wr represent input, output, and recurrent weight matrices, respectively. (c) LSTM cell architecture with forget, input, output, and cell states.

    2.3.2 Optical communication nonlinear compensation

    As next-generation mobile communication systems and cloud computing infrastructure continue to expand, the demand for communication capacity is steadily rising. Although high-order modulation formats and multiplexing techniques have been employed to increase capacity, they have also introduced more complex signal impairments. The primary bottleneck limiting capacity remains the nonlinear Shannon capacity limit of transmitted information. Conventional compensation methods such as digital backpropagation348 and optical phase conjugation349 rely on deterministic models to solve the nonlinear Schrödinger equation for impairment compensation. However, optical fibers typically exhibit complex nonlinear transformations that are challenging to model accurately with mathematical equations due to dispersion, nonlinear effects, and noise interference. These traditional approaches have limited compensation capabilities and struggle to compensate for the complex impairments in fiber-optic communication systems.

    Neural networks possess powerful fitting capabilities and can approximate any nonlinear function, making them widely used for optimizing channel impairments in fiber-optic systems.22,350,351 These algorithms directly compensate for nonlinear impairments by learning from received data, rather than solving the nonlinear Schrödinger equation. Compared with feedforward neural networks, RNNs are better suited to nonlinear compensation in optical fiber links because they learn sequence information. LSTM has been utilized to compensate for fiber nonlinearities in digital coherent systems, providing superior performance compared with digital backpropagation.352 Dai et al.353 proposed a nonlinear equalization technique enabled by LSTM at the end of offline digital signal processing. To enhance the nonlinear processing capability of LSTM, bidirectional long short-term memory (Bi-LSTM) has been proposed for nonlinear equalization of optical fiber354 [Fig. 28(a)]. It extends the traditional unidirectional LSTM to model the dependence of the present state on both past and future states, consisting of forward and backward LSTM layers. Deligiannidis et al.355 evaluated three bidirectional RNN models as post-processing units for the compensation of fiber nonlinearities in digital coherent systems carrying different multiplexed signals [Fig. 28(b)], suggesting that all of them are promising nonlinearity compensators. Some studies have combined CNNs with Bi-LSTM to increase the processing capacity of the network, showing good nonlinear compensation performance356,357 [Fig. 28(c)]. However, almost all equalization schemes based on neural networks have high computational complexity. To reduce this complexity, the center-oriented LSTM358 (Co-LSTM) and combinations with attention mechanisms359 have been proposed.
The application of neural networks in fiber optical communication systems provides new insights and methods for solving complex impairment compensation problems, promising to further enhance the capacity of optical communication links.
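    As a minimal illustration of how such sequence equalizers consume data (a hypothetical sketch, not taken from the cited works), the received symbol stream is typically framed into sliding windows whose neighboring symbols carry the channel memory that an RNN or Bi-LSTM equalizer exploits:

```python
import numpy as np

def make_windows(rx, k):
    """Frame a received symbol sequence into sliding windows of 2k+1
    neighbors; a sequence-model equalizer then predicts the center
    symbol of each window from its intersymbol-interference context."""
    padded = np.pad(rx, k, mode="edge")   # replicate edge symbols at the ends
    return np.array([padded[i:i + 2 * k + 1] for i in range(len(rx))])

rx = np.arange(10.0)       # stand-in for received (distorted) symbols
X = make_windows(rx, k=2)  # one training example per transmitted symbol
```

The window half-width k plays the role of the channel memory length; larger k captures more nonlinear ISI at the cost of model complexity.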


    Figure 28.Functions of RNN in nonlinear compensation for optical communication. (a) Schematic diagram of LSTM based on sliding window.354 The autoencoder is represented by the blocks Tx BRNN, channel, and Rx BRNN. (b) The principle of Bi-RNN models.355 The Bi-RNN model processes distorted symbols with intersymbol dependencies to estimate bitwise BER, optimizing complexity, and performance for 16-QAM and 32-QAM. (c) Architecture of LSTM combined with CNN for nonlinear compensation.356 The feature maps yf from the convolutional layer are fed into either two dense layers (forming the CNN + MLP structure, with the number of layers determined by the Bayesian optimizer) or a single Bi-LSTM layer.

    2.3.3 Optical performance monitoring

    Real-time monitoring of various channel impairments in complex optical networks is essential to ensure their reliable operation. OPM technology serves as a cornerstone for maintaining the reliability of optical networks. By enabling early detection of physical impairments in the network, OPM allows technicians to perform maintenance in advance and prevent network failures.360 A large number of frequency-domain, time-domain, and polarization-domain monitoring techniques have been proposed, such as the differential pilot-aided technique and the time-domain pilot-aided technique with fractional Fourier transformation.361,362 However, all these methods require prior knowledge based on precise physical models, and it is difficult to obtain accurate results owing to complex channel relationships and inevitable human error.

    Compared with traditional methods, neural network models, with their excellent nonlinear modeling capabilities, can automatically learn internal relationships by adjusting weights during training. In the training stage, the inputs of the models are impairment-indicator feature vectors of eye diagrams, asynchronous delay-tap plots, or amplitude histograms, with their corresponding labels used as the target. The models then learn the mapping between input features and labels and can be used for real-time monitoring. Neural networks have been successfully applied to monitor various impairments in optical networks, such as optical signal-to-noise ratio (OSNR), chromatic dispersion (CD), and polarization-mode dispersion, which can change with temperature and path reconfiguration.363,364 Some researchers have used CNNs to estimate OSNR by training them to extract features from eye diagrams or constellation diagrams. However, as OSNR increases, the changes in constellation or eye diagrams become less significant, decreasing the accuracy of CNN-based recognition methods. Compared with CNNs and other models, RNNs, especially LSTM models, excel in handling time-series data, making them advantageous for extracting features from time-varying signal transmission data.23,365 Wang et al.366 proposed using an LSTM-NN to establish relationships between time-varying transmission data and the corresponding OSNR without manual feature extraction, significantly reducing the input data size compared with CNN models.
    LSTM models combined with CNNs and attention mechanisms have also been proposed to further enhance network-monitoring capability.367,368 LSTM combined with an attention mechanism has been used to develop a high-precision performance-monitoring framework for optical networks.369 Zhang et al.370 introduced an attention mechanism into Bi-LSTM to establish an optical network fault-identification model, demonstrating its reliability and accuracy through multiple comparative experiments. These studies demonstrate that AI techniques can effectively enhance the accuracy and reliability of optical network performance prediction and will play a crucial role in future high-capacity, dynamic, and flexible optical networks.
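    One of the feature representations mentioned above, the amplitude histogram, is simple to construct; the sketch below is illustrative (the BPSK-like test signal, noise level, and bin settings are assumptions, not from the cited works):

```python
import numpy as np

def amplitude_histogram(signal, bins=32):
    """Normalized amplitude-histogram feature vector of a received
    waveform; a common low-dimensional input for OPM classifiers."""
    hist, _ = np.histogram(np.abs(signal), bins=bins, range=(0.0, 2.0))
    return hist / hist.sum()

rng = np.random.default_rng(0)
symbols = rng.choice([-1.0, 1.0], size=4096) + 0j         # toy BPSK stream
noise = 0.1 * (rng.normal(size=4096) + 1j * rng.normal(size=4096))
feat = amplitude_histogram(symbols + noise)               # monitoring feature
```

A neural network then maps such histograms (or stacks of them over time) to impairment labels such as OSNR bins, rather than working on raw samples.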

    2.3.4 Optical sensing parameter analysis

    In the field of optical data acquisition and analysis, the application of AI is rapidly evolving, particularly in optical sensing. These advancements not only enhance data-processing efficiency but also expand the applications of optical sensing. Optical sensing involves receiving light signals and converting them into electrical signals or other forms of output to perceive the characteristics of objects or environmental conditions. It is widely used in biomedical, transportation, and environmental monitoring fields.371 The data generated by optical sensors are typically complex time-series or spatial information that requires further analysis and processing to derive useful insights. Analyzing and processing optical sensor parameters are crucial for achieving high-performance perception. Traditional methods often rely on prior knowledge and model-based analysis of physical relationships.372,373 However, in scenarios such as autonomous driving and virtual reality, where a large number of optical parameters of different dimensions need processing, traditional methods exhibit limitations in establishing accurate mathematical models, resulting in low accuracy and slow perception speed.

    Fortunately, neural networks learn complex features and relationships directly from raw data without strict reliance on physical models, making them well suited for handling complex environmental changes and resisting interference in optical sensor parameter processing.374,375 Fiber-optic sensing technology, a cornerstone of modern optical sensing, leverages optical fibers to detect various physical parameters such as temperature, pressure, and vibration. AI enhances these systems by enabling smarter data analysis and pattern recognition, leading to more accurate predictions and real-time decision-making. Recent advancements include the development of multifunctional fiber-optic sensors that can simultaneously detect multiple signals, such as temperature and humidity, with enhanced accuracy thanks to AI-driven algorithms that efficiently interpret complex data patterns. In environmental awareness applications, the combination of fiber-optic stress sensors with the SSA-LSTM model has been proposed to improve measurement accuracy, effectively reducing the impact of surrounding environments and light-source fluctuations.24 To further enhance neural network processing capabilities, a pattern recognition strategy based on LSTM and CNN has been proposed for vibration sensing376 [Fig. 29(a)]. In the biomedical field, LSTM has been used to evaluate multiwavelength diffuse correlation spectroscopy for direct assessment of blood flow and oxygen saturation. A study employing LSTM-CNN algorithms with fiber-optic sensors measured heart rate and respiratory signals during surgery, showing performance superior to that of traditional algorithms such as the Fourier transform377 [Fig. 29(b)]. Perception based on the LSTM model is also widely used in transportation. Sabih and Vishwakarma378 combined Bi-LSTM with CNN to monitor crowded abnormal scenes by assessing temporal and spatial features of optical-flow frames in videos [Fig. 29(c)].
    Other research proposed an RNN-LSTM model with hyperbolic tangent activations in the hidden layers to predict potential vibrations of high-speed trains.379 These advancements highlight the effectiveness of RNN models in processing optical sensor parameters, overcoming traditional limitations and enabling more accurate and adaptive optical sensing systems.


    Figure 29.Various optical-sensing applications implemented using LSTM. (a) LSTM-CNN model for vibration sensing.376 The optical cable is installed directly above the PCCP pipe and fixed with fixtures. Different signals exhibit distinct characteristics across the frequency band and more pronounced local features in the time-frequency domain. Based on LSTM and CNN architectures, a neural network was designed using time-domain waveforms along with their DWT and STFT as inputs. This integrated feature set enables effective pattern recognition. (b) Optical fiber sensing based on the LSTM-CNN model in the surgery.377 The LSTM-CNN framework is utilized to process perioperative heart rate (HR) and respiratory rate (RR) frequency signals. Trends are extracted from HR and RR, whereas CNN and LSTM are employed for feature extraction and processing, respectively. (c) Crowded abnormal scene detection using Bi-LSTM and CNN.378 The proposed methodology utilizes optical flow features to capture frame-level spatial information. Temporal information across the data set is modeled using a Bi-LSTM. The key components of the proposed architecture include constructing an optical feature matrix, integrating a CNN with a Bi-LSTM, and implementing a novel inference mechanism.

    Moreover, integrated photonic sensors utilize compact photonic chips to perform complex detection and analysis tasks.380 AI significantly enhances the functionality of these devices by enabling the rapid interpretation of vast data streams. For instance, AI is crucial in the development of silicon-based photonic chips for rapid biomolecular detection, where it enhances optical signals through surface plasmon resonance and accelerates data processing, significantly improving both detection sensitivity and throughput. The emerging field of label-free optical biosensors allows for the chemical-free optical examination of biological samples. AI plays a pivotal role here by employing advanced algorithms to analyze and interpret optical data from complex biological samples, facilitating real-time monitoring of cellular behavior and molecular interactions. In addition, AI-assisted light-field sensing technology is widely used in integrated imaging sensors. Recently, metasurfaces, gratings, dispersive media, and other structures have been used to realize the encoding of light-field information.381,382 Then, AI-assisted algorithms are used to realize the processing of output data and finally obtain the real physical world information. This AI integration not only increases the accuracy of such measurements but also enables the continuous learning and adaptation of sensor responses based on newly gathered data.

    In summary, Sec. 2.3 has discussed the application of RNN models as powerful tools for optical signal processing across various optical systems. With the advent of the information age, we firmly believe that neural networks will continue to expand their applications in intelligent optical signal processing. Furthermore, exploring the implementation of neural network models through optical hardware holds promise for achieving lower energy consumption and real-time information processing.383–385

    3 Photonics for AI

    The intersection of photonics and AI has paved the way for significant advancements in computational technology. Following the discussion on AI for photonics, particularly in photonics design, optimization, imaging, and data processing, we now focus on how photonics facilitates AI by enabling high-performance computing. Large models such as GPT are becoming more intelligent, with parameter sizes expanding 10-fold annually—far outpacing post-Moore computing advancements. This gap underscores the critical need for transformative computing paradigms, with optical computing poised to usher in a revolutionary shift in addressing these challenges.

    In recent years, in addition to leveraging light to address the high-speed interconnection bottleneck in high-performance computing, optical computing has also emerged as a compelling alternative to traditional electronic computing, garnering increasing interest due to its unique capabilities. Optical computing has been researched for nearly half a century. It has begun to flourish recently due to the hunger for computing resources in the AI era.

    The advantages of optical computing stem from the differences in the physical properties of photons and electrons. First, the bandwidth of optics (∼100 THz) is much larger than that of electronic circuits (∼5 GHz), which enables optical data transmission to be significantly faster than electrical transmission, especially when utilizing the full optical bandwidth. Second, light offers multiple dimensions for encoding and processing information, such as space, wavelength, and polarization. Through multiplexing and demultiplexing techniques, highly parallel information processing can be achieved. Third, whereas electrical transmission generates substantial heat, causing energy loss, light propagation in media results in negligible thermal effects. When scattering losses are ignored, the energy loss of light is much lower than that of electricity.

    There are typically three metrics used to assess the performance of optical computing: latency (s), throughput (OP/s), and energy efficiency (J/OP). Optical computing’s low latency benefits from its large bandwidth, whereas its high throughput is driven by high parallelism. Energy efficiency is determined by both energy consumption and the number of operations. On the one hand, light’s low energy consumption is primarily due to the analog nature of optical computing, where computations are performed via propagation, and its low-loss characteristics allow for the use of low-power light sources. On the other hand, the large number of operations benefits from the high parallelism inherent in optical systems.
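    The interplay of these three metrics can be illustrated with back-of-envelope arithmetic; the matrix dimension, clock rate, and power below are illustrative assumptions, not measured values:

```python
# Back-of-envelope throughput and energy efficiency for an NxN photonic
# matrix-vector multiplier clocked at the modulator rate f.
N = 64               # matrix dimension (assumed)
f = 10e9             # 10 GHz modulation rate (assumed)
P = 1.0              # total system power in watts (assumed)

ops_per_pass = 2 * N * N           # N^2 multiplies + N^2 accumulates per vector
throughput = ops_per_pass * f      # operations per second (OP/s)
energy_per_op = P / throughput     # joules per operation (J/OP)
```

The key point is that throughput scales with N^2 for a single pass of light through the mesh, so energy per operation falls as the matrix grows, which is the parallelism advantage described above.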

    In this section, we specifically focus on AI-related optical computing tasks, with a particular emphasis on three important topics: matrix computation performed by PICs, optical neural networks leveraging optical diffraction characteristics, and emergent materials for photonic computing.

    3.1 Photonic Circuits Empower Matrix Computation

    Optical computing based on photonic circuits is one of the most competitive candidates for performing complex computational tasks by confining and manipulating light within waveguides. By designing photonic circuits, light propagation can be precisely controlled, enabling tasks such as matrix-vector multiplication,32 convolution,386 and matrix inversion.387 This approach provides several key advantages. First, the confinement of light within waveguides ensures controlled propagation, resulting in high computational precision. Second, the programmability of on-chip photonic devices allows for the implementation of arbitrary matrices, making it a highly versatile and general-purpose computing method. Third, its compact size and compatibility with CMOS technology enable seamless integration of optical and electronic components, paving the way for hybrid systems. Together, these features make photonic circuit-based optical computing a strong candidate for large-scale computational tasks in the context of optical-electronic hybrid architectures.

    In this section, we discuss how photonic circuits empower matrix computation, a fundamental operation in many AI algorithms. Photonic circuits, leveraging the unique properties of light, offer a promising solution for efficient matrix computation. First, matrix computation based on coherent photonic circuits utilizes the phase and amplitude of light waves to perform highly parallel and energy-efficient computations, surpassing traditional electronic methods in both performance and efficiency. Following this, various calibration strategies for enhancing the accuracy and efficiency of photonic computing meshes are examined. In addition, we introduce matrix computation based on incoherent photonic circuits, which provides an alternative approach by utilizing the intensity of light, offering robust and scalable solutions. Advances in optical computing based on PICs represent a significant leap in the integration and miniaturization of photonic components, enabling compact and high-performance optical processors capable of handling complex AI tasks. These advancements highlight the transformative potential of photonic-enhanced AI systems, underscoring a promising future for the synergy between photonics and AI.

    3.1.1 Matrix computation based on coherent photonic circuits

    In coherent photonic circuits, light transmission involves complex interference phenomena determined by both the amplitude and phase. Control over these amplitudes and phases is essential for manipulating the interference outcomes. The 2×2 MZI is one of the principal computational units in these circuits. As illustrated in Fig. 30(a), the MZI comprises two 50:50 directional couplers, an internal phase shifter, and an external phase shifter. These phase shifters modulate the amplitude and phase, respectively, allowing the MZI to perform arbitrary 2×2 unitary [U(2)] transformations. The transfer matrix for an MZI unit is obtained by sequentially multiplying the transfer matrices of the 50:50 directional couplers $M_{cp}$, the internal phase shifter $M_{ps}(\theta)$, and the external phase shifter $M_{ps}(\phi)$:

$$T_{\mathrm{mzi}} = M_{cp}\,M_{ps}(\theta)\,M_{cp}\,M_{ps}(\phi) = j e^{j\theta}\begin{pmatrix} e^{j\phi}\sin\theta & \cos\theta \\ e^{j\phi}\cos\theta & -\sin\theta \end{pmatrix}.$$
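    This construction can be checked numerically with a small NumPy sketch, using one common convention for the coupler and phase-shifter matrices. Note that, depending on how the internal phase is defined, the splitting angle appears as θ or θ/2; the convention below yields sin(θ/2), which may differ from the document's θ by a factor of 2.

```python
import numpy as np

def mzi_transfer(theta, phi):
    """2x2 MZI transfer matrix: coupler, internal phase shifter,
    coupler, external phase shifter (one common convention)."""
    Mcp = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)   # 50:50 directional coupler
    Mps = lambda p: np.diag([np.exp(1j * p), 1.0])    # phase shift on the upper arm
    return Mcp @ Mps(theta) @ Mcp @ Mps(phi)

theta, phi = 0.7, 1.3
T = mzi_transfer(theta, phi)
# Lossless interferometer: the transfer matrix is unitary
assert np.allclose(T @ T.conj().T, np.eye(2))
# The internal phase alone sets the power-splitting ratio
assert np.isclose(abs(T[0, 0])**2, np.sin(theta / 2)**2)
```

Unitarity is what guarantees that a mesh of such units implements a lossless linear transformation, which the decomposition schemes below rely on.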


    Figure 30.Matrix computation using an MZI mesh. (a) Legend for interpreting the symbols used in other subgraphs. Two predominant methods are illustrated: (b) the Reck scheme388 and (c) the Clement scheme.389 The left side of the figure displays the spatial layout of the MZIs, with the number in each yellow block indicating the order of light manipulation by each MZI. The red dashed arrows denote the sequence for decomposing the unitary matrix. The colors blue and green surrounding the red arrows indicate column and row eliminations, respectively. The right side of the figure shows the corresponding elimination order of unitary matrix elements. (d) MZI mesh for universal complex-valued matrix through SVD decomposition.

    Now, let us consider a case where the 2×2 MZI unit is incorporated into an N-channel waveguide array and connected to channels $i$ and $i+1$. The MZI unit at this location affects only channels $i$ and $i+1$ for both input and output, leaving all other channels unchanged. Consequently, the transfer matrix $T_i$ for this N-channel waveguide array can be viewed as embedding the MZI transfer matrix $T_{\mathrm{mzi}}$ within an $N\times N$ identity matrix, specifically in the region spanning rows $i$ to $i+1$ and columns $i$ to $i+1$, as shown in Eq. (6):

$$T_i=\begin{pmatrix} 1 & \cdots & 0 & 0 & \cdots & 0\\ \vdots & \ddots & \vdots & \vdots & & \vdots\\ 0 & \cdots & u_{11} & u_{12} & \cdots & 0\\ 0 & \cdots & u_{21} & u_{22} & \cdots & 0\\ \vdots & & \vdots & \vdots & \ddots & \vdots\\ 0 & \cdots & 0 & 0 & \cdots & 1 \end{pmatrix},\tag{6}$$

where $u_{ij}$ is an element of the matrix $T_{\mathrm{mzi}}$. A universal unitary matrix can be decomposed into a sequence of matrices $T_i$ and a diagonal matrix $D$. This decomposition can be carried out by an element-elimination process: left multiplying the target matrix by $T_i$ performs row elimination, and right multiplying by $T_i^{-1}$ performs column elimination. Through sequential elimination operations, the off-diagonal elements of the target matrix are zeroed, converting it into a diagonal matrix. Several element-elimination procedures have been proposed for implementing universal unitary matrices. In 1994, Reck et al.388 first proved that an $N\times N$ universal unitary matrix can be physically implemented by cascading MZI units. A column-only elimination procedure transforms the unitary matrix $M$ into a diagonal matrix $D$, as shown in Fig. 30(b). The elimination process can be written as Eq. (7), where the superscript of $T_i$ indicates the sequence of the matrix decomposition, corresponding to the red circled numbers in Fig. 30:

$$M\,[T_1^{(1)}]^{-1}[T_2^{(1)}]^{-1}[T_3^{(1)}]^{-1}[T_1^{(2)}]^{-1}[T_2^{(2)}]^{-1}[T_1^{(3)}]^{-1}=D.\tag{7}$$
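    The embedding and row-elimination steps can be sketched numerically. The Givens-like choice of the 2×2 unitary block below is a standard construction shown here for illustration; it is not the specific parameterization of the cited schemes.

```python
import numpy as np

def embed_2x2(G, N, i):
    """Embed a 2x2 unitary block into an NxN identity at channels i, i+1
    (0-indexed), mimicking an MZI placed on adjacent waveguides."""
    T = np.eye(N, dtype=complex)
    T[i:i + 2, i:i + 2] = G
    return T

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))

# Null element M[2, 0] by mixing rows 1 and 2 with a unitary 2x2 block
a, b = M[1, 0], M[2, 0]
r = np.sqrt(abs(a)**2 + abs(b)**2)
G = np.array([[a.conj(), b.conj()], [-b, a]]) / r
T = embed_2x2(G, 4, 1)
out = T @ M
assert np.isclose(out[2, 0], 0)                 # target element eliminated
assert np.allclose(T @ T.conj().T, np.eye(4))   # T remains unitary
assert np.allclose(out[[0, 3]], M[[0, 3]])      # other rows untouched
```

Repeating such eliminations (on rows by left multiplication, on columns by right multiplication with inverses) drives the target matrix to diagonal form, exactly as in the decomposition procedures above.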

    In 2016, Clements et al.389 proposed an alternative elimination procedure, which incorporates both row and column eliminations performed in an alternating sequence, as shown in Fig. 30(c). The mathematical form representing this process is

$$T_3^{(2)}T_2^{(2)}\,M\,[T_1^{(1)}]^{-1}[T_3^{(3)}]^{-1}[T_2^{(3)}]^{-1}[T_1^{(3)}]^{-1}=D.$$

    By moving all the elimination matrices from M's side of the equation to the other side (multiplying by their inverses), the decomposition formula of the target unitary matrix M is obtained. Different decomposition methods correspond to different MZI mesh structures: Reck's method yields the triangular mesh, whereas Clements' method yields the rectangular mesh. The rectangular mesh offers several advantages, including path-length uniformity, compactness, and loss insensitivity, and has become the predominant structure for MZI meshes.390
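    The element-elimination idea itself can be demonstrated with complex Givens rotations standing in for the embedded MZI transfer matrices (a sketch; a physical mesh realizes such rotations up to phase factors). Left-multiplying a random unitary by embedded 2×2 rotations zeroes its below-diagonal entries, and a triangular unitary is necessarily diagonal:

```python
import numpy as np

def givens(x, y):
    """2x2 unitary mapping (x, y) -> (r, 0); a stand-in for one
    MZI transfer matrix (up to phases)."""
    r = np.hypot(abs(x), abs(y))
    c, s = np.conj(x) / r, np.conj(y) / r
    return np.array([[c, s], [-np.conj(s), np.conj(c)]])

rng = np.random.default_rng(7)
# Random 4x4 unitary from the QR decomposition of a complex Gaussian matrix.
M, _ = np.linalg.qr(rng.standard_normal((4, 4))
                    + 1j * rng.standard_normal((4, 4)))

D = M.copy()
for col in range(3):                  # eliminate below-diagonal entries
    for row in range(3, col, -1):     # bottom-up within each column
        G = np.eye(4, dtype=complex)
        G[row - 1:row + 1, row - 1:row + 1] = givens(D[row - 1, col],
                                                     D[row, col])
        D = G @ D                     # row elimination (left multiply)

# A triangular unitary is diagonal, with unit-modulus diagonal entries.
assert np.allclose(D, np.diag(np.diag(D)))
```

The exact ordering of the rotations is what distinguishes the Reck (triangular) and Clements (rectangular) schemes; the sketch above only illustrates that sequential 2×2 eliminations diagonalize a unitary.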

    A universal complex matrix can also be implemented optically through singular value decomposition (SVD), expressed as M=UΣV†, where U and V are unitary matrices, V† is the conjugate transpose of V, and both U and V† can be realized using MZI meshes. The matrix Σ is diagonal with nonnegative elements, which can be represented by an array of MZI attenuators. Thus, the MZI mesh for a universal matrix can be constructed as illustrated in Fig. 30(d). Based on this principle, Shen et al.32 first experimentally demonstrated a two-layer optical neural network in 2017, utilizing an MZI mesh construction, and showed its utility for vowel recognition. The optical micrograph of the photonic circuit is depicted in Fig. 31(a).
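    In numpy, the factorization underlying Fig. 30(d) looks as follows. The normalization of singular values to at most 1 reflects an assumption that the attenuator array is passive and cannot amplify; a physical implementation would fold the scale factor into electronic post-processing:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

# SVD: M = U @ diag(s) @ Vh, with U and Vh unitary (two MZI meshes)
# and s >= 0 (realizable as an array of attenuators after scaling).
U, s, Vh = np.linalg.svd(M)
assert np.allclose(U @ np.diag(s) @ Vh, M)

# Passive optics cannot amplify: scale so all attenuation levels <= 1.
scale = s.max()
Sigma = np.diag(s / scale)   # programmed attenuations; M = scale * U Sigma Vh
```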


    Figure 31.Various photonic circuits designed for matrix-vector multiplication. (a) Micrograph of a photonic circuit engineered to compute unitary matrices.32 Different methods for realizing real-valued matrix computations through coherent MZI mesh structures are shown: (b) using an incoherent laser source with power detection35 and (c) constructing the real part of a unitary matrix.391

    An arbitrary N-dimensional complex matrix has 2N² degrees of freedom, which theoretically requires 2N² phase shifters for full representation. By contrast, real-valued matrices, which are more commonly used in deep learning, have N² degrees of freedom, theoretically requiring only half the number of phase shifters. Directly applying the above implementation method would therefore introduce redundant phase shifters, so researchers have explored various optical implementations of real-valued matrices to reduce energy consumption and minimize the chip's footprint. Wu et al.35 used an incoherent light source for the input signals and designed a simplified MZI mesh to perform real-valued matrix-vector multiplication, as shown in Fig. 31(b). This approach requires detecting only the power of the output light rather than its phase, thereby reducing measurement complexity. To represent negative weights, they designed N+1 channels for N×N matrix-vector multiplication, where the extra port provides a reference signal that is subtracted from the optical power signals of the other ports to yield real-valued results. Tian et al.391 demonstrated that any real-valued matrix can be associated with a unitary matrix whose real part matches the original matrix. Based on this principle, there is no need to use two MZI meshes to realize the corresponding unitary matrices (U, V); instead, a single MZI mesh realizing one unitary matrix suffices, as shown in Fig. 31(c), significantly reducing the number of MZI units and the chip size. During measurement, they detected both the power and the phase of the output signal to compute the real part of the output results.
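    One conceptual way an N+1-channel reference-port scheme can recover signed results from nonnegative optical powers is to shift the weights into the nonnegative range and subtract the common offset electronically. This is a sketch of the principle only; the circuit of Ref. 35 may differ in detail:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.uniform(-1, 1, (3, 3))   # signed target weights
x = rng.uniform(0, 1, 3)         # inputs are nonnegative optical powers

# Shift weights so every physically applied transmission is nonnegative.
c = max(0.0, -W.min())
W_pos = W + c                    # programmed (nonnegative) weights

y_ports = W_pos @ x              # powers detected at the N signal ports
y_ref = c * x.sum()              # power detected at the reference port
y = y_ports - y_ref              # electronic subtraction restores signs

assert np.allclose(y, W @ x)
```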

    3.1.2 Calibration method in coherent photonic circuits

    Due to fabrication errors, the phase delays for internal and external phase shifters have random offsets. Therefore, all MZI units need to be calibrated before programming them for the given matrix. A common strategy involves sequentially calibrating the MZI units one by one, with monitors placed at the output port of each MZI unit. As the internal phase shifter θ is related to the intensity at the MZI output port, it can be configured by recording the output intensity and the corresponding bias voltage applied to the phase shifter. The voltage offset can then be obtained by fitting the theoretical model to the measured data. However, variations in the external phase shifter ϕ do not result in corresponding changes in light intensity, making it challenging to configure. To address this, Prabhu et al.392 constructed a meta-MZI structure that includes four adjacent MZIs. In this meta-MZI, the external phase shifter behaves similarly to an internal phase shifter and can thus be configured in a similar manner.
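    The intensity-fitting step for the internal phase shifter can be sketched numerically. The quadratic voltage-to-phase response, its coefficient, the cross-port intensity model, and the noise level below are all illustrative assumptions, not the model of any specific device:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, phi0_true = 0.8, 1.3            # assumed thermo-optic response and offset
V = np.linspace(0, 3, 60)              # swept bias voltages
I = 0.5 * (1 - np.cos(alpha * V**2 + phi0_true))   # cross-port intensity
I += 0.01 * rng.standard_normal(V.size)            # detection noise

# Fit the unknown phase offset by matching the theoretical model
# to the measured intensity curve (simple grid search here).
candidates = np.linspace(0, 2 * np.pi, 4001)
residuals = [np.sum((0.5 * (1 - np.cos(alpha * V**2 + p)) - I)**2)
             for p in candidates]
phi0_est = candidates[int(np.argmin(residuals))]
```

In practice the fit would also estimate the voltage-to-phase coefficient and use a proper nonlinear least-squares routine; the grid search simply makes the recovery of the random offset explicit.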

    Based on the above calibration procedures, several efficient self-configuring algorithms have been developed, enabling MZI meshes to quickly adapt to environmental temperature fluctuations. The calibration process is usually progressive, meaning it is completed once each element is sequentially adjusted, requiring only local feedback for minimization on one variable at a time. As shown in Fig. 32(a), Miller393,394 proposed a progressive calibration method based on a device that can take an arbitrary monochromatic input beam and automatically couple it into a single-mode guide or beam. This method can be applied to calibrate rectangular or triangular meshes but requires inline photodetectors, adding extra hardware complexity. Hamerly et al.395 analyzed algorithmic stability in self-configuration for large-scale meshes and proposed the direct method and ratio method to calibrate triangular meshes using only external detectors, as shown in Fig. 32(b). These methods cannot be directly applied to rectangular meshes but require modifications to incorporate drop ports and detectors along the diagonal. Similar reversed local light interference methods shown in Figs. 32(c) and 32(d) were proposed, which generalize to any feedforward mesh.396 By exploiting a graph-topological approach, nodes that can be parallel calibrated are identified, enhancing the efficiency of mesh calibration.397


    Figure 32.Self-configuring strategies in optical systems. (a) A self-aligning universal beam coupler.393,394 (b) Application of the ratio method for calibrating triangular meshes.395 (c), (d) Use of the reversed local light interference method to calibrate universal feedforward meshes.396,397

    Although the random phase offset caused by fabrication errors can be mitigated with the aforementioned calibration methods, other factors such as insertion loss, photodetection noise, and coupling coefficient drift can lead to inaccuracies in calculations. To address these challenges and further enhance the precision of optical matrix computation, various training algorithms have been designed to fine-tune the MZI mesh. Although calibration methods focus on correcting fabrication errors by fine-tuning individual MZI elements, training algorithms provide additional optimization strategies to compensate for other sources of inaccuracies. Below, we discuss some of the widely used training algorithms for enhancing optical neural network performance.

    One category of these algorithms can be referred to as derivative-free algorithms. Shao et al.398 theoretically simulated an imperfect MZI mesh by modeling four types of noise and then used a genetic algorithm (GA) to improve computational precision, as shown in Fig. 33(a). The GA is a widely used optimization strategy comprising fitness evaluation, selection, mutation, and crossover. In the selection stage, the roulette method is adopted, meaning that individuals with higher fitness are more likely to be chosen, and an individual can be selected repeatedly within one evolution stage. By continuously generating new individuals, the optimum individual, defined as the solution with the highest fitness score, compensates for the modeled imprecisions so that the average accuracy approaches the ideal one. Zhang et al.399 experimentally realized on-chip training using the GA and verified its effectiveness on various tasks, including the crossbar switch and Iris classification, as shown in Fig. 33(b). Another training algorithm, bacterial foraging optimization (BFO), was implemented to configure the photonic circuit,400 as shown in Fig. 33(c). BFO is a nature-inspired algorithm emulating the behavior of bacteria foraging for nutrients. During the training process, the state vector moves along a randomly selected direction in high-dimensional space, providing a possibility of escaping from local minima.


    Figure 33.Some gradient-free calibration methods. (a) Execution process of GA.398 (b) Whole pipeline for MZI mesh calibration using GA.399 (c) Bacterial foraging training algorithm is implemented on MZI mesh.400

    Another category for training photonic circuits is gradient-based algorithms. These algorithms measure the deviation between the real (photonic circuit) output and the ideal output to define a loss function. The gradient for each reconfigurable component is then estimated, and the voltage applied to the component is updated based on the gradient. Gradient-based algorithms can be divided into two types: forward propagation (FP) and backward propagation (BP), with the primary difference lying in the method used to compute the gradient. In FP, the gradient is typically obtained by measuring the change in the system's output given a small phase change in each phase shifter. Specifically, two FP steps are performed, with perturbations of +δφ and −δφ applied to the phase shifter. The loss functions for these two cases, L(φ+δφ) and L(φ−δφ), are computed, and the phase gradient is estimated from their difference: [L(φ+δφ)−L(φ−δφ)]/(2δφ). BP, on the other hand, relies on the analytical expression of the FP process and computes the phase gradient through the chain rule of differentiation. The FP algorithm is the more intuitive approach for on-chip training. In the original FP algorithm, each iteration involves traversing all phase shifters and performing two FP steps for each, as outlined above. Shen et al.32 simulated the FP algorithm and demonstrated its effectiveness. Wu et al.401 experimentally realized on-chip training through the FP algorithm and found that the convergence speed can be greatly improved using the Adam optimization method. However, a photonic mesh with N parameters needs to perform 2N calculations per iteration, which is inefficient and time-consuming. Bandyopadhyay et al.402 proposed a stochastic optimization approach in which all parameters are first perturbed at once by a random vector Π and then by −Π. The corresponding losses for the two states are L(Θ+Π) and L(Θ−Π), respectively.
The gradient with respect to the ith parameter of Θ can then be estimated as [L(Θ+Π)−L(Θ−Π)]/(2Π_i), so each iteration requires only two calculations. They proved that, over a large number of iterations, the average gradient of the stochastic optimization approach equals the gradient of the original FP algorithm. Compared with the FP algorithm, the BP algorithm is a more efficient method for calculating gradients and is widely used in deep learning. However, BP is not directly applicable to photonic circuits. One straightforward approach is therefore to construct a physical model of the photonic circuit, train it in silico using BP, and then deploy the trained parameters to the photonic circuit. However, due to discrepancies between the theoretical model and the real photonic circuit, significant accuracy loss can occur. An alternative is optoelectronic hybrid training, where optics handle the forward computation and electronics manage the BP. One representative work on optoelectronic hybrid training demonstrated a dual adaptive training approach.403 To mitigate the effect of inherent systematic errors, a systematic error prediction network (SEPN) is introduced on top of the ideal propagation model. The calibration process consists of four main steps: (1) each training sample is optically encoded and input into the photonic neural network (PNN) physical system for forward inference; (2) the same sample is digitally encoded and input into a numerical model, which includes both the ideal physical model and the SEPNs; (3) the output difference between the physical system and the numerical model is reduced by optimizing the SEPN parameters; and (4) the physical parameters of the numerical model are optimized by minimizing the task-specific loss and deployed to the physical system. The final training method is all-optical in situ training, which is highly challenging as both the forward and backward computation processes occur entirely in the optical domain.
In 2018, inspired by the adjoint method, Hughes et al.404 demonstrated a method for performing BP based on photonic circuits, where the gradient measurement procedure is shown in Fig. 34(a). They proved that the gradient of the permittivity ε for each phase shifter can be regarded as the overlap of the original and adjoint fields over the phase-shifter positions. Through two FPs and one BP, one can obtain the gradient for every phase shifter by detecting the optical power behind it. Furthermore, in 2023, they experimentally realized the training of MZI mesh-based optical neural networks using the same principle,405 as shown in Fig. 34(b). They applied a grating tap monitor and infrared camera to capture the light intensity and achieved the same performance as electrical training. Although the time overhead of this training method is low, the considered physical model is ideal, not accounting for insertion loss, fabrication errors, or backscattering. In addition to photonic circuits, many works have also applied these training methods to free-space optical computing systems, which we will discuss in more detail in Sec. 3.2.2.
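    The two FP-based gradient estimators discussed above, the per-phase-shifter parameter shift and the simultaneous random perturbation of Ref. 402, can be compared on a toy quadratic loss standing in for a measured on-chip loss (the loss function and dimensions here are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
Q = A.T @ A                              # symmetric positive semidefinite
loss = lambda t: 0.5 * t @ Q @ t         # toy stand-in for the measured loss
theta = rng.standard_normal(5)
g_exact = Q @ theta                      # analytic gradient for reference
d = 1e-4                                 # perturbation magnitude

# Parameter shift: two forward passes PER parameter (2N evaluations).
g_fd = np.zeros(5)
for i in range(5):
    e = np.zeros(5); e[i] = d
    g_fd[i] = (loss(theta + e) - loss(theta - e)) / (2 * d)

# Simultaneous perturbation: 2 evaluations per iteration, averaged
# over many random +/- perturbation vectors Pi.
g_spsa = np.zeros(5)
n_iter = 10000
for _ in range(n_iter):
    Pi = rng.choice([-1.0, 1.0], 5) * d
    g_spsa += (loss(theta + Pi) - loss(theta - Pi)) / (2 * Pi)
g_spsa /= n_iter
```

The per-parameter estimate converges immediately but costs 2N measurements; the stochastic estimate costs 2 measurements per iteration and approaches the same gradient only on average, which is the trade-off described in the text.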


    Figure 34.In situ training method is proposed to realize the BP algorithm in photonic circuits. (a) Procedure of in situ training404 and (b) experimental verification for in situ training.405

    3.1.3 Matrix computation based on incoherent photonic circuits

    Real-valued matrix-vector multiplication in the optical domain is an attractive topic. MZI-based meshes are typically coherent networks that execute matrix computations in the complex-valued domain. In this computing paradigm, real-valued matrix-vector multiplication can be achieved through SVD, which requires implementing two unitary matrices and one diagonal matrix in the photonic circuit. For an N×N real-valued matrix, the degrees of freedom number N², but this approach necessitates 2N² phase shifters, leading to significant redundancy.

    Several approaches have been proposed to reduce the complexity of coherent MZI-based mesh structures, but the relationship between phase shifters and matrix elements is not intuitive. Noncoherent architectures, which only modulate amplitude, are more suitable for implementing real-valued matrix computations. Unlike coherent architecture, noncoherent architecture has a one-to-one correspondence between matrix elements and physical devices.

    One type of noncoherent architecture is the crossbar array, inspired by memristive crossbar arrays used in analog in-memory computing. In this architecture, the vector is encoded in the optical amplitude and fed into different input channels, each with a different wavelength to ensure incoherence across channels. The output channels receive the incoherent superposition of optical fields from different channels, with corresponding amplitude modulation to complete the N-dimensional matrix-vector multiplication.

    The crossbar array architecture can be divided into three parts: ingress, middle, and egress.387 In the ingress part, the optical input signal in each row is evenly split into each column. Directional couplers or MZIs can be implemented in this stage, with the split ratio for the ith column given by 1/(N+1−i).406,407 In the middle part, amplitude modulation is implemented, corresponding to the elements of the target matrix. Reconfigurable MZI units can be used to achieve arbitrary amplitude modulation. Alternatively, phase-change materials (PCMs) can tune the optical amplitude by controllably switching fractions of amorphous and crystalline material in the PCM cell. In the egress part, all signals within one column are consolidated through 2-by-1 multimode interference devices or directional couplers, and the light intensity at each output is measured to derive the final computational results.
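    Assuming ideal lossless couplers, a few lines confirm that the 1/(N+1−i) split rule delivers an equal 1/N share of the row's power to every column:

```python
import numpy as np

N = 8
P = 1.0                                # power entering one row bus
tapped = []
for i in range(1, N + 1):              # columns i = 1 .. N
    ratio = 1.0 / (N + 1 - i)          # coupling ratio at column i
    tapped.append(P * ratio)           # power dropped into this column
    P *= 1.0 - ratio                   # power remaining on the row bus

assert np.allclose(tapped, 1.0 / N)    # every column gets an equal share
```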

    In addition, the microring array represents another noncoherent optical matrix computing architecture. Owing to its compact nature, this structure notably minimizes the footprint of photonic circuits. The operational principle of microring array-based matrix-vector multiplication is depicted in Fig. 35(a). Initially, the optical signal at a different wavelength in each input channel is independently modulated, encoding the values of the vector elements. These modulated signals are then merged and channeled through a bus waveguide, forming a wavelength division multiplexing (WDM) signal. This WDM signal is subsequently split equally into N channels and processed by the microring array module. Each microring, acting as a reconfigurable filter, independently weights the signal corresponding to its wavelength. In each channel, signals of different wavelengths are weighted in parallel and summed through power detection, effectively mimicking matrix-vector multiplication. In 2013, Yang et al.408 experimentally demonstrated a 4×4 matrix-vector multiplication using a microring array. Notably, this setup confined the matrix multiplication to the nonnegative number domain, as it solely utilized the through port to detect the power of each channel. Subsequent research extended this operation to the real number domain by simultaneously detecting the output power at both the through port and the drop port and taking the difference between these measurements, as shown in Fig. 35(b). The microring array has recently been applied to photonic spiking neural networks, facilitating massively parallel interconnection between photonic spiking neurons.411–413 For example, Feldmann et al.51 constructed a spiking neural circuit including a ring resonator with an integrated PCM cell.
The PCM cell in the circuit can switch between states based on the incoming pulses, thereby changing the optical resonance of the ring resonator and mimicking the behavior of synaptic connections in biological systems.51
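    In the incoherent power domain, the weight-and-sum principle of the microring array reduces to an ordinary nonnegative matrix-vector product. The sketch below ignores ring lineshapes, crosstalk, and detector noise and treats each ring simply as a programmable transmission:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 4
x = rng.uniform(0, 1, N)           # powers encoded on N wavelengths
W = rng.uniform(0, 1, (N, N))      # ring transmissions (nonnegative weights)

# In output channel m, the ring tuned to wavelength n passes a fraction
# W[m, n] of that wavelength's power; the detector sums all wavelengths.
y = np.array([sum(W[m, n] * x[n] for n in range(N)) for m in range(N)])

assert np.allclose(y, W @ x)
```

Extending this to signed weights corresponds to the through-minus-drop differential detection described in the text, i.e., detecting both ports and subtracting.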


    Figure 35.Incoherent optical computing circuit architectures. (a) A 4×4 nonnegative matrix is realized using a microring array.408 (b) In the microring array, output power in both the through port and drop port is detected to realize real-valued matrix computation.409 (c) A recursive structure named SDDLN is used to realize matrix-vector multiplication.410

    Huang et al.410 proposed a novel noncoherent architecture to perform vector-matrix multiplication for nonnegative and real-valued matrices, as shown in Fig. 35(c). The core element of this network is the 2×2 SDDLN (similar to a dilated double-layer network), which comprises four MZIs, each characterized by its beam-splitting ratio a₁, b₁, c₁, or d₁. Owing to its noncoherent nature, the external phase shifter typically found in MZI units is eliminated, leaving only a single internal phase shifter. This modification not only reduces the chip size but also simplifies the calibration process, enhancing the practicality of deploying the photonic tensor core. To achieve a large-scale N×N SDDLN, a recursive decomposition approach is employed: an n×n SDDLN in the ith recursion is decomposed into four parallel (n/2)×(n/2) SDDLN modules. These subnetworks operate independently, with no interconnections between them, allowing scalable expansion and straightforward implementation in photonic integrated circuits (PICs).

    3.1.4 Advances in optical computing based on PICs

    Photonic circuits hold great potential for enabling optical computing, but certain limitations must be addressed to fully realize their capabilities. First, because waveguides confine light spatially, input signals are encoded in a 1D spatial configuration, which limits the use of spatial multiplexing for high-throughput data input. Second, for an N×N matrix, the number of photonic circuit components required increases quadratically with N, leading to significantly higher loss and noise in the computing network and restricting large-scale matrix computations. Lightmatter Inc. claims to have developed the largest (64×64) matrix computing chip based on the MZI mesh architecture to date.414 However, to the best of our knowledge, no comprehensive publications have detailed the chip's architecture or experimental measurements conducted on it.

    To address these challenges, recent efforts have focused on mitigating these limitations through advancements in three key areas: improved error tolerance, higher-density integration, and higher-dimensional data processing. In the following sections, we will review the latest research progress in these domains, exploring how these strategies are advancing the field of photonic circuit-based optical computing.

    Improved error tolerance

    In optical computing, fabrication errors can lead to deviations from ideal computational outcomes, making it crucial for photonic structure designs to consider their susceptibility to such errors. Several studies have proposed network structures that are more robust than the predominant rectangular MZI mesh. One such study415 compared two MZI-based meshes, GridNet and FFTNet, in their ability to classify handwritten digits under varying levels of component error. In this work, GridNet refers to the traditional rectangular MZI mesh, whereas FFTNet, proposed in Ref. 416, was designed to mimic the recursive fast Fourier transform (FFT) process based on the Cooley–Tukey algorithm. Compared with GridNet, FFTNet is a nonuniversal network that cannot represent an arbitrary unitary matrix; in exchange, the number of MZI layers is reduced from N to log₂N. This work showed that GridNet is more accurate under ideal conditions, whereas FFTNet [Fig. 36(a)] exhibits greater fault tolerance in the presence of errors, partly because the reduced number of MZIs in FFTNet accumulates less noise than GridNet. Further exploration of the impact of fabrication errors on MZI meshes is presented in Ref. 422. This work highlights how errors can cause significant precision loss when implementing random unitary matrices. The analysis reveals that most MZIs must approach a near-crossbar state; however, the inherent imperfection in the 50:50 split ratio of the 3 dB couplers restricts their ability to cover the full 0 to 1 range of splitting ratios. To enhance the representational capabilities of these devices, Pai et al.417 suggested two improvements to the MZI mesh structure. First, the addition of redundant tunable MZIs introduces extra degrees of freedom to help overcome the limitations imposed by fabrication errors on the splitting ratios.
Second, incorporating a crossing structure within the MZI mesh helps in shuffling the outputs, thus preventing the mesh from defaulting to a near crossbar state during the implementation of random unitary matrices. The optimized MZI mesh structure can be seen in Fig. 36(b).


    Figure 36.Some advances in recent optical computing circuits. The first column [(a), (b)] shows fault-tolerant computing architectures: (a) stacked FFT,415 (b) redundant rectangular mesh and permuting rectangular mesh.417 The second column [(c), (d)] shows miniaturization strategies for computing devices: (c) 3D arrangement of an MZI mesh for matrix computation,418 (d) PBWs replace MZIs as programmable units to minimize the footprint.419 The third column [(e)–(g)] demonstrates that computing parallelism can be enlarged via WDM,420 FDM,407 and MDM421 technologies.

    Higher-density integration

    The footprint of photonic circuits is a critical factor in realizing large-scale matrix multiplication operations. For instance, implementing a 256×256 matrix multiplication typically requires the use of an entire 4-inch wafer with the existing MZI mesh structure. To address this challenge, several innovations aimed at improving computing density have been introduced. As shown in Fig. 36(d), the authors presented an alternative approach to MZI-based programmable photonic circuits by utilizing slow-light-enhanced periodic bimodal waveguides (PBWs) as programmable units.419 This approach demonstrates low-loss, short-tuning elements that offer significant improvements in integration density over conventional MZIs. In addition, Liu et al.418 proposed a 3D-unfolding method, illustrated in Fig. 36(c), which uses the third dimension to dramatically reduce footprints. Bell and Walmsley423 explored the use of symmetric MZI units, which lack external phase shifters and thus lead to more compact photonic circuits. This adaptation allows for the implementation of matrix multiplication under the Reck or Clements scheme, showcasing the potential of symmetric MZI configurations in reducing the physical space required for photonic computations.

    Higher-dimensional data processing

    The traditional MZI mesh can execute parallel matrix-vector multiplication operations, which is foundational for neural network computations. However, higher-dimensional data processing is required in deep learning. For instance, in CNNs, 1D or 2D convolutions are typically transformed into matrix–matrix multiplications through the image-to-column (im2col) algorithm to enhance computational efficiency. In addition, the training stage of neural networks necessitates processing high-dimensional data, where multiple independent training samples, selected randomly in each iteration, are reorganized into a high-dimensional tensor. This tensor then serves as input for neural network computation, and the corresponding losses are computed. Photonic computing, with its ability to modulate numerous dimensions of light, holds the potential for handling these high-dimensional data processing tasks in parallel, thereby boosting computational power and efficiency. The wavelength dimension, in particular, is frequently leveraged to enhance this parallelism. At the input stage, data are encoded onto different wavelengths, which are then combined and fed into the optical computing network via multiplexing. At the output stage, these wavelengths are separated, allowing independent detection of the signal at each wavelength through demultiplexing. Wang et al.420 utilized WDM to process high-dimensional data on an MZI-based architecture, as shown in Fig. 36(e), employing a soliton microcomb source to encode signals at various wavelengths. To demonstrate the system's effectiveness, the MNIST data set was simultaneously applied on two wavelengths, with independent output signal detection for each, showing nearly identical accuracies. Similarly, Xu et al.55 demonstrated a fiber-based WDM structure for matrix–matrix multiplication.
By carefully leveraging time-wavelength interleaving, the network reaches an extraordinary 11 TOPS of computing power, capable of performing image processing as well as handwritten digit recognition tasks.55 In addition, WDM can be applied in the incoherent crossbar architecture, where data can be encoded onto different wavelengths at each input channel, enabling matrix–matrix multiplication.406 Despite the enhanced parallelism afforded by multiple wavelengths, the number of wavelengths generated by a multiwavelength source is inherently limited by the decrease in power at wavelengths distant from the carrier wavelength, prompting researchers to explore additional degrees of freedom of light.
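    The image-to-column transformation mentioned above can be sketched in a few lines of numpy. This is the generic, unoptimized version of the algorithm; deep-learning frameworks use heavily optimized variants:

```python
import numpy as np

def im2col(img, k):
    """Gather every k x k patch of img into one column of a matrix."""
    H, W = img.shape
    cols = [img[r:r + k, c:c + k].ravel()
            for r in range(H - k + 1) for c in range(W - k + 1)]
    return np.array(cols).T        # shape (k*k, num_patches)

rng = np.random.default_rng(5)
img = rng.standard_normal((6, 6))
kern = rng.standard_normal((3, 3))

# The 2D convolution (cross-correlation, as used in CNNs) becomes a
# single matrix product between the flattened kernel and the columns.
out = (kern.ravel() @ im2col(img, 3)).reshape(4, 4)

# Direct sliding-window reference for comparison.
ref = np.array([[np.sum(img[r:r + 3, c:c + 3] * kern) for c in range(4)]
                for r in range(4)])
assert np.allclose(out, ref)
```

With a batch of kernels stacked as rows, the same product becomes the matrix–matrix multiplication that the WDM schemes above parallelize across wavelengths.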

    Mode division multiplexing (MDM) emerges as a promising technology to increase channel capacity. Yin et al.421 combined WDM and MDM technologies using a microring resonator array to substantially increase computing scalability, as shown in Fig. 36(g). This setup, incorporating three modes and four wavelengths, separates light by mode order through a three-mode demultiplexer, tunes the amplitude of light individually through a microring resonator array, and then combines them to perform multiply-accumulate (MAC) operations. Moreover, Dong et al.407 explored the application of frequency division multiplexing (FDM) for high-dimensional data processing, as shown in Fig. 36(f). This approach involves a photonic crossbar array where matrix–matrix multiplication is achieved by multiplexing vectors carried by radio-frequency components and then feeding them to a photonic tensor core. At the measurement stage, the output time data are transformed to the frequency domain, enabling the demodulation of information from the frequency spectrum.

    In summary, Sec. 3.1 reviews the potential and advancements of photonic circuits for optical computing, particularly in matrix operations central to AI tasks. We covered how MZI meshes realize universal complex and real-valued matrices, how calibration techniques enhance accuracy, and how incoherent architectures and microring arrays provide intuitive and scalable alternatives. With CMOS compatibility, compact form factors, and ultrafast operation, photonic circuits pave the way for large-scale, complex optical processors. Looking ahead, improving error tolerance, increasing integration density, and leveraging multidimensional and multimodal data processing will further elevate performance and applicability. As AI and signal processing demands intensify, photonic circuits are poised to deliver high-efficiency, high-performance computations in increasingly large-scale and high-dimensional domains, opening new frontiers for photonic-electronic hybrid architectures.

    3.2 Optical Computing Based on Diffraction of Light

    Due to its high integration, reconfigurability, and controllability, optical computing based on photonic circuits presents a promising approach to achieving high-performance and energy-efficient matrix computations. However, it has yet to be scaled up for large matrix computations effectively. The challenge arises as the scale of the matrix increases, causing the number of modulators required to grow quadratically. This expansion leads to reduced integration density and increased chip size, complicating packaging efforts. In addition, the overall loss within the computing network escalates, diminishing the signal-to-noise ratio.

    The diffraction principle provides an alternative approach to optical computing. Unlike photonic circuits, which confine optical signals within waveguides, diffraction enables light to propagate freely in space, allowing signals to be densely encoded on a 2D input plane. Based on the spatial bandwidth limit of the angular spectrum, the minimum spatial resolution can reach the scale of the wavelength. Currently available commercial spatial light modulators (SLMs) offer 10⁶ to 10⁷ pixels for encoding optical signals,424 enabling a remarkably high data throughput.

    Early research on free-space optical computing primarily focused on the implementation of linear operators. For instance, Von Bieren et al.425 demonstrated that a single lens could perform Fourier transforms. Building on this property, researchers employed frequency-domain filtering techniques to design various filters that enabled convolution operations. It was further discovered that more general linear operations, such as matrix-vector multiplications426 and matrix-matrix multiplications,427 could be realized within lens systems. On the other hand, with the advent of deep learning, researchers began mimicking the computation process of fully connected neural networks and harnessed the diffraction properties of light to develop diffractive deep neural networks (D2NNs). In recent years, extensive studies have analyzed the operational mechanisms of D2NNs, and their applications have been explored in various domains.

    Compared with photonic circuits, free-space optical computing also presents several challenges. First, achieving reconfigurability is more difficult in free-space systems. Photonic circuits can easily adjust configurations, but diffractive layers in D2NNs are typically passive. Although devices such as SLMs and digital micromirror devices (DMDs) provide some tunability, their modulation frequencies are generally restricted to kHz,428 falling short of the high-speed demands of modern computing systems. Some recent studies have begun to address this limitation.429 Second, free-space optical computing systems are more vulnerable to system errors. One of the most critical issues is the requirement for pixel-level alignment precision in the diffractive layers of D2NNs. Even a slight misalignment of a single pixel can result in substantial deviations in computational outcomes. To mitigate this, training methods have been developed to compensate for errors introduced by system imperfections. These training strategies are discussed in detail in Sec. 3.2.2. Last, the reliance on diffractive elements for manipulating light inherently makes free-space optical computation bulky, posing challenges for integration into compact systems. Addressing the bulkiness of these components is essential for the practical deployment of free-space optical computing technologies. Ongoing efforts to miniaturize diffractive elements and integrate them into more compact architectures are explored in Sec. 3.2.3.

    In this section, we focus on the research achievements in free-space optical computing. The discussion is divided into two main parts. In the first part, we explore the implementation of linear operators. We begin by introducing various methods for realizing convolution operations, along with brief discussions on specific operations such as differentiation and integration, which can be considered special cases of convolution. Subsequently, we explore works related to matrix computation. The earliest optical implementation of matrix computation can be traced back to 1978, when Goodman et al.426 used an incoherent optical method to perform discrete Fourier transforms. In recent years, novel optical matrix computation systems have continued to be designed and applied in various machine-vision applications. The second part of this section highlights research on D2NN. We start by introducing the fundamental principles of D2NN and comparing the data-processing methodologies with those of traditional ANNs. Subsequently, we review recent improvements and extensions to D2NN technology, which have broadened its applicability and enhanced its performance. Finally, AI-related applications of D2NN are reviewed.

    3.2.1 Linear operator realization in free space

    Convolution is an important linear operator widely used across many fields, and free-space optics is naturally suited to realizing it. The propagation of monochromatic light in free space obeys the Rayleigh–Sommerfeld diffraction formulation, which relates the field distribution at the source plane E_in(ξ,η) to that at the destination plane E_out(x,y), separated by a propagation distance d. This relationship can be represented as a convolution operation: E_out(x,y) = ∬ h(x−ξ, y−η) E_in(ξ,η) dξ dη, with the impulse response function h = [d/(2πr)] (1/r − jk) exp(jkr)/r; here, k is the wavenumber, and r = √[(x−ξ)² + (y−η)² + d²]. This indicates that convolution is naturally achieved through light propagating in free space. However, the convolution kernel h has a fixed form and cannot be designed to a desired form to fulfill specific computing tasks.
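The convolutional character of free-space propagation can be checked numerically: sampling the Rayleigh–Sommerfeld impulse response h on a grid and applying it via FFTs, a shift of the input field produces an identical shift of the output, the defining property of convolution. All parameters below (wavelength, distance, grid) are illustrative:

```python
import numpy as np

# Illustrative parameters for free-space propagation as a convolution
# with the Rayleigh-Sommerfeld impulse response h.
wavelength = 0.5e-6
k = 2 * np.pi / wavelength
d = 50e-6                              # propagation distance
N, dx = 64, 1e-6                       # grid size and pixel pitch

x = (np.arange(N) - N // 2) * dx
X, Y = np.meshgrid(x, x)
r = np.sqrt(X**2 + Y**2 + d**2)
h = (d / (2 * np.pi * r)) * (1 / r - 1j * k) * np.exp(1j * k * r) / r

def propagate(E_in):
    # Convolution E_out = h * E_in, evaluated via FFTs
    # (ifftshift centers the sampled kernel at the origin).
    return np.fft.ifft2(np.fft.fft2(E_in) * np.fft.fft2(np.fft.ifftshift(h)))

E = np.zeros((N, N), complex)
E[N // 2, N // 2] = 1.0                # point source

# Shift invariance: shifting the input shifts the output identically.
out_shifted = propagate(np.roll(E, 5, axis=1))
shifted_out = np.roll(propagate(E), 5, axis=1)
assert np.allclose(out_shifted, shifted_out)
```

The kernel h here is fixed by physics, which is precisely why the engineered-kernel systems discussed next are needed.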

    To address this, extensive research has been directed toward designing specific convolution kernel functions. The 4f system is a prevalent physical setup used to implement these convolution operations.50 It employs two lenses to perform Fourier transforms on the optical field: the first lens converts the field from the spatial domain to the spectral domain, whereas the second transforms it back to the spatial domain. Within the spectral domain, an optical element is introduced to perform element-wise multiplication with both amplitude and phase modulation. Due to the properties of Fourier transforms, this element-wise multiplication in the spectral domain equates to a convolution operation in the spatial domain. Figure 37 shows recent work on optical 4f systems with different configurations.


    Figure 37.All-optical convolution using a 4f-system under various configurations: coherent light sources in panels (a)430 and (b)431 and incoherent light sources in panels (c)432 and (d).433 Panels (a) and (c) utilize amplitude-only masks, whereas panels (b) and (d) employ phase-only masks.
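The 4f principle reduces to three steps: Fourier transform, element-wise mask multiplication, inverse transform. A short sketch with an arbitrary 3×3 kernel (not taken from any of the cited setups) verifies that the result equals a direct circular convolution:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 32
image = rng.uniform(size=(N, N))

# Hypothetical 3x3 kernel, zero-padded onto the full aperture.
kernel = np.zeros((N, N))
kernel[:3, :3] = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]

# First lens: spatial -> spectral domain (Fourier transform).
spectrum = np.fft.fft2(image)

# Mask in the Fourier plane: element-wise amplitude/phase modulation.
mask = np.fft.fft2(kernel)

# Second lens: spectral -> spatial domain. The product of spectra
# is a (circular) convolution in the spatial domain.
output = np.fft.ifft2(spectrum * mask).real

# Reference: direct circular convolution by explicit summation.
direct = sum(kernel[i, j] * np.roll(np.roll(image, i, 0), j, 1)
             for i in range(3) for j in range(3))
assert np.allclose(output, direct)
```

Choosing the mask as the Fourier transform of the desired kernel is exactly the design freedom the 4f architecture provides.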

    The single-lens imaging system presents an alternative optical setup for realizing programmable convolution operations, offering a more space-efficient solution than the traditional 4f system; here, the convolution kernel can be approximated as the inverse Fourier transform of the lens's pupil function.434 Thus, the convolution kernel can be tailored by placing programmable optical elements immediately adjacent to the lens.

    The potential for realizing optical convolution without lenses presents a significant opportunity to reduce the physical dimensions of the devices involved. There are two notable methods for achieving this. The first is based on spectral-domain realization. This approach mirrors the mathematical principle of the 4f system but simplifies the process by directly filtering the optical spectrum, eliminating the need for dual Fourier transformations of the image. Resonance-based structures that can be engineered to perform multiple types of filtering are a common choice for specific convolution operations, such as differentiation435–440 and integration,436,441,442 as shown in Fig. 38. These structures, usually fabricated from nanostructures such as multilayer slabs and gratings, are often smaller than the working wavelength, thus offering a highly space-efficient solution. The second is based on spatial-domain realization. Shi et al.443,444 developed a method using a passive mask to perform optical convolution based on geometrical optics. Because diffraction effects are disregarded, the modulation by the mask is the same for light from any position, which aligns with the shift-invariance characteristic of convolution. Owing to its reliance on geometric optics, this straightforward optical device is well suited to the incoherent and broadband light signals prevalent in natural scenes.


    Figure 38.All-optical differentiator (a)–(c) and integrator (d)–(f) based on compact resonance structures. The phase-shifted Bragg grating can be designed to realize optical (a) differentiation435 and (d) integration.442 (b), (e) Ruan et al. theoretically demonstrated differentiation and integration can be reconfigured in the same device by controlling the propagating loss of surface plasmon polariton.436 (c) Experimental realization of optical differentiation on surface plasmonic structure.437 (f) Integration is presented using a dielectric slab.441
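A resonance-based spatial differentiator can be understood as a transfer function of the form H(kx) ∝ j·kx applied in the spatial-frequency domain. A one-dimensional sketch of this ideal filter (the physical devices above only approximate it over a finite bandwidth):

```python
import numpy as np

# Ideal first-order spatial differentiator: H(kx) = j*kx applied in the
# spatial-frequency domain, the operation a resonance-based filter
# approximates optically.
N, L = 256, 2 * np.pi
x = np.linspace(0, L, N, endpoint=False)
field = np.sin(3 * x)                      # input field profile

kx = np.fft.fftfreq(N, d=L / N) * 2 * np.pi
H = 1j * kx                                # differentiation filter
d_field = np.fft.ifft(H * np.fft.fft(field)).real

# For a band-limited periodic input, the result is the exact derivative.
assert np.allclose(d_field, 3 * np.cos(3 * x), atol=1e-8)
```

An integrator corresponds to the reciprocal filter H(kx) ∝ 1/(j·kx) away from kx = 0, which is why the same resonant platform can be reconfigured between the two operations.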

    In addition to convolution, researchers have explored matrix-related operations, which are more general for many computational tasks. Over the past decade, with the rise of DNNs, matrix-related operations have gained significant importance. In the following, we discuss some representative works on the implementation of matrix-vector multipliers in free-space optical systems.

    One of the earliest implementations of matrix-vector multiplication in optics dates back to 1978, when Goodman et al.426 proposed a lens-based system to perform discrete Fourier transforms. The simplified system shown in Fig. 39(a) follows three main stages. (1) Fan-out: the first lens spreads the light encoding the input vector vertically to fill an entire column of the matrix mask, effectively replicating the vector along the column dimension. (2) Element-wise multiplication: the matrix mask consists of N×N pixels, where each pixel's transparency is proportional to the corresponding matrix element, enabling multiplication between the vector and the matrix elements. (3) Fan-in: the second lens focuses the light horizontally, summing the values along the row dimension to complete the matrix-vector multiplication.


    Figure 39.Free-space optical matrix-vector multiplier. (a) Schematic diagram for matrix-vector multiplication proposed by Goodman.426 (b) Convolution realization through two metasurfaces.445 (c) Coherent system for realizing matrix computation.446 (d) Matrix-vector multiplier applied to imaging sensing for optical encoding.382 (e) Experimental verification of dot product operation close to the shot-noise limit of detected photons.56 (f) CMOS-compatible matrix processor supporting large input vector size.447 (g) Spatial-temporal multiplexed matrix computing system, where matrix elements and input vector are encoded via VCSEL arrays, exhibiting efficient electro-optic conversion and compact footprint.448
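The three stages of the Goodman architecture map directly onto array operations (orientation conventions vary between implementations; here the vector is replicated so that every row of the mask sees the same copy):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 5
M = rng.uniform(0, 1, (N, N))      # mask transparencies (nonnegative)
v = rng.uniform(0, 1, N)           # input vector encoded on the light

# (1) Fan-out: the first lens replicates the vector, illuminating every
#     row of the mask with the same copy.
fan_out = np.tile(v, (N, 1))       # shape (N, N)

# (2) Element-wise multiplication by the mask transparencies.
modulated = M * fan_out

# (3) Fan-in: the second lens sums along the row dimension.
y = modulated.sum(axis=1)

assert np.allclose(y, M @ v)
```

The nonnegativity of the transparencies is exactly the limitation that coherent-source designs such as Spall et al.'s later removed.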

    Recently, some improvements have been made to the aforementioned system. One such advancement, introduced by Spall et al.,446 utilizes a DMD and SLM to encode input vectors and matrix, respectively, enabling full reconfiguration, as shown in Fig. 39(c). In addition, unlike earlier designs that used LED arrays, this system employs a coherent light source, overcoming the limitations of nonnegative elements and significantly expanding the computational range. With an SLM resolution of 340  pixels×340  pixels, this system can handle vector sizes up to 56. In contrast to SLMs, metasurfaces offer a more compact and efficient approach to manipulating light. For instance, Zheng et al.445 proposed a convolution computation system using two metasurfaces, as depicted in Fig. 39(b). This system, in principle, also supports matrix-vector multiplication. The first metasurface acts as a multichannel microlens array, performing the fan-out operation by replicating the input object into multiple copies. The second metasurface is designed for the element-wise multiplication step based on the Pancharatnam–Berry phase principle.

    Although these earlier works demonstrated the potential for optical matrix-vector operations, the size of the matrices in these systems remains limited to small-scale applications (tens of elements). However, modern neural networks often require handling much larger matrices. To enable matrix-vector multipliers to be applicable to mainstream neural network models, several efforts have been made to scale up the matrix sizes. A notable example is Bernstein's work,447 where a system using three SLMs is proposed, as shown in Fig. 39(f). The first SLM encodes the input signal. The second SLM, placed in the Fourier plane of a 4f optical system, generates a phase pattern that transforms the point spread function into a spot array at the image plane. According to Fourier optics principles, this spot array convolves with the input image, producing replicated copies. The third SLM encodes the matrix weights, and the replicated images undergo pixel-wise multiplication with the corresponding weights. A photodetector then converts the optical signals into electrical currents, which are summed in the electronic domain to complete the fan-in process. The size of the matrix that can be handled in this system is directly determined by the number of pixels in each SLM. Specifically, the pixel count in the third SLM limits the maximum size of the weight matrix, whereas the pixel count in the second SLM determines the number of vector copies. In this setup, matrix sizes up to 1000×1000 can be handled.

    Matrix computation in the optical domain has been shown to achieve better energy efficiency than traditional electronic neural networks when the computational scale is sufficiently large. Some studies suggested that for large-scale matrix-vector multiplications, the optical energy cost can be reduced to less than 1 photon (∼10⁻¹⁹ J) per scalar multiplication, offering several orders of magnitude of energy advantage compared with digital electronic processors. However, theoretical analyses pointed out that due to the inherent shot noise in photodetectors, the energy consumption per MAC operation has a lower bound.449 To evaluate the precision of computations near this lower bound of energy consumption, Wang et al.56 experimentally verified the accuracy of dot products in the subphoton-per-multiplication regime, as shown in Fig. 39(e). By controlling the detector's integration time, they demonstrated that using 3.1 photons per multiplication could achieve a 99% classification accuracy on the MNIST data set, whereas 0.66 photons per multiplication still yielded 90% accuracy. Apart from reducing energy consumption per MAC operation, enhancing the electro-optic conversion efficiency is another way to improve energy efficiency. Vertical-cavity surface-emitting lasers (VCSELs), which can achieve very low threshold currents and power conversion efficiencies larger than 50%, offer a competitive light source for optical computing.450 Leveraging this feature, Chen et al. designed a matrix computing system based on VCSEL arrays,448,449 as shown in Fig. 39(g). Each VCSEL emits 100 μW of light with a wall-plug efficiency of 25%, exhibiting efficient electro-optic conversion (<5 aJ per symbol, with a π-phase-shift voltage of Vπ = 4 mV). This system uses spatial-temporal multiplexing, a significant departure from the methods mentioned above. First, matrix weight and input vector elements are both encoded onto VCSEL arrays. Through homodyne detection, the element-wise multiplication of matrix and vector is achieved. Finally, a time-integrating charge amplifier is used to accumulate the homodyne photocurrents.
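The shot-noise floor can be illustrated with a simple Monte Carlo sketch: each scalar product contributes a Poisson-distributed photon count, so the relative error of the summed dot product shrinks roughly as 1/√(photons per multiplication). The photon budgets and vector sizes below are illustrative, not the experimental values:

```python
import numpy as np

rng = np.random.default_rng(3)

def noisy_dot(w, x, photons_per_mult, trials=2000):
    # Each scalar product w_i*x_i is carried by an optical power whose
    # detection is Poisson-distributed (shot noise); the detector
    # integrates (sums) the photon counts.
    mean_counts = photons_per_mult * np.outer(np.ones(trials), w * x)
    counts = rng.poisson(mean_counts)          # shot-noise-limited detection
    return counts.sum(axis=1) / photons_per_mult

w = rng.uniform(0, 1, 100)
x = rng.uniform(0, 1, 100)
exact = w @ x

# Relative error of the optical dot product at two photon budgets.
err_low = np.std(noisy_dot(w, x, 0.5) - exact) / exact
err_high = np.std(noisy_dot(w, x, 50.0) - exact) / exact

# More photons per multiplication -> lower shot-noise error.
assert err_high < err_low
```

Because the detector sums before the noise matters, long dot products tolerate subphoton budgets per multiplication, consistent with the subphoton-regime accuracies reported above.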

    Although most matrix-vector multipliers have demonstrated their advantages in vision tasks where digitalized images are processed, real-world image perception involves direct optical domain image sensing. Therefore, performing matrix computation directly in the optical domain is a more efficient approach, as it eliminates the need for electro-optic conversion. Aware of this, Wang et al.382 demonstrated a multilayer optical neural network that includes both fully connected layers, shown in Fig. 39(d), and optical nonlinear layers. This system showed significant capabilities across a range of applications, such as machine-vision benchmarks, flow-cytometry image classification, and object identification in a 3D printed real scene.

    3.2.2 Diffractive optical neural network construction and its training strategy

    In the field of free-space optical computing, in addition to research focused on the implementation of specific operators, there is also a large amount of work exploring how to simulate neural network inference in the optical domain. In 2018, the Ozcan group proposed a D2NN structure, which uses diffractive properties of light to perform neural network inference.13 The D2NN consists of several diffractive layers that modulate the spatial characteristics of incoming light. Each diffractive layer is an artificial surface, where each point corresponds to a complex-valued transmission or reflection coefficient, which is used to control the amplitude and phase of the optical field at each point. In the following, we discuss how the optical field evolves as it passes through a diffractive layer. In the first step, the optical field at each point on the lth layer undergoes independent modulation of phase and amplitude. This alteration of the light wave’s characteristics as it passes through the diffractive elements of the layer can be mathematically expressed as Y(l)=X(l)·B(l).

    Here, X(l) (Y(l)) represents the optical field distribution before (after) the lth layer as a column vector, and B(l) denotes the vector modifying the phase and/or amplitude at this layer. The operation · indicates a Hadamard product, applying element-wise multiplication.

    In the second step, following the modulation, the altered light field Y(l) propagates to the (l+1)th layer. According to the Huygens–Fresnel principle, each point on the diffractive layer acts as a secondary source, emitting spherical wavelets that travel to the next layer and interfere to form the new optical field. This propagation and interference process is described by the following equation: X(l+1) = V(l) Y(l), where the matrix V(l) = (v1(l), …, vi(l), …, vn(l)) describes the optical field evolution in free space, and the ith column vi(l) represents the propagation mode for the spherical wave emitted by the ith point located at (xi, yi), determined by the Rayleigh–Sommerfeld equation.434 Combining the modulation and propagation equations yields X(l+1) = V(l)(X(l)·B(l)).

    It is worth noting that Eq. (11) can also be rearranged as follows: X(l+1) = (V(l)·B̃(l)) X(l) = Ṽ(l) X(l).

    The matrix B̃(l) above is formed by transposing the column vector B(l) into a row vector and then stacking it repeatedly along the row dimension. From Eq. (12), we can consider that the optical fields at the lth and (l+1)th layers are related through the matrix Ṽ(l).

    The operation of a D2NN is similar to that of a traditional electronic ANN, which comprises multiple fully connected layers. For the lth layer in an ANN, the input data X(l) is first linearly transformed by a matrix, then a bias term is added, and finally, a nonlinear activation function σ(·) is employed. The output data X(l+1) can be expressed as X(l+1) = σ(W(l) X(l) + B(l)).

    Equations (12) and (13) exhibit similar data operations. However, there are two key differences that make ANNs more expressive than D2NNs. The first difference lies in the matrices Ṽ(l) for the D2NN and W(l) for the ANN. In ANNs, the matrix W(l) of size N×N has N² degrees of freedom, as each element of the matrix is a learnable parameter. By contrast, for D2NNs, the matrix Ṽ(l) is formed from the fixed matrix V(l) and the learnable matrix B̃(l). As B̃(l) has N degrees of freedom, Ṽ(l) inherits only N degrees of freedom. Although increasing the number of diffraction layers can increase the degrees of freedom in D2NNs to some extent, it still does not match the flexibility of ANNs. The second difference is that an ANN has a nonlinear activation function, whereas nonlinear operations are not easy to integrate into a D2NN. In recent years, some works have proposed methods to introduce nonlinearities in D2NNs. For example, Wang et al.382 used a saturating image intensifier to realize a nonlinear activation layer in optical neural networks. Yildirim et al.451 presented a framework for achieving nonlinear operations using linear optics.
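A single D2NN layer and its effective-matrix rearrangement can be checked in a few lines. The propagation matrix below is a random unitary standing in for the Rayleigh–Sommerfeld propagation modes (illustrative only); the point is that the effective layer matrix, formed from the fixed propagation matrix and the learnable modulation vector, carries only N learnable degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 16                                  # points per diffractive layer

# Learnable phase-only modulation B(l) (N degrees of freedom).
B = np.exp(1j * rng.uniform(0, 2 * np.pi, n))

# Fixed free-space propagation matrix V(l); a random complex unitary
# stands in for the physical propagation modes here.
V, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))

X = rng.normal(size=n) + 1j * rng.normal(size=n)   # incident field

# Step 1 (modulation) then step 2 (propagation): X(l+1) = V(l) (X(l) . B(l)).
X_next = V @ (X * B)

# Rearrangement: stacking B as identical rows gives the effective matrix,
# so the layer is (V elementwise-times stacked-B) applied to X.
B_stacked = np.tile(B, (n, 1))
assert np.allclose(X_next, (V * B_stacked) @ X)
```

Stacking the same row vector N times is what limits the effective matrix to N, rather than N², degrees of freedom per layer.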

    Once the D2NN is trained on a computer, the parameters of the digital model are transferred to the physical systems for inference in the optical domain. Direct deployment often results in performance degradation due to various sources of errors, such as position aberrations among diffractive layers and fabrication errors. To mitigate these issues, researchers have developed training methods to directly optimize the diffractive layers within the physical systems. Inspired by the in situ training method used in MZI meshes, Zhou et al.452 applied this approach to train D2NNs. They numerically demonstrated that the gradients of parameters in a D2NN could be obtained by measuring the forward and backward-propagated optical fields. The whole training procedure is shown in Fig. 40(a). However, the proposed method requires a complex field generation module for backpropagating light, and both the amplitude and phase of the forward and backward-propagated optical information must be measured through imaging systems. The gradient measurement may contain errors due to the complexity of the system and procedures, which can inevitably slow down the convergence speed. These challenges make this method hard to implement experimentally. Later on, Zhou et al.455 proposed a new adaptive training strategy where diffractive layers are sequentially calibrated to compensate for device errors. Specifically, they experimentally recorded output data in the current layer and used it as the input for the next layer. Subsequent diffractive layers are then adapted based on the physically generated input data from previous layers, allowing systematic errors from earlier layers to be alleviated by subsequent ones. Although effective, this layer-by-layer training approach is time-consuming. Spall et al.454 introduced a hybrid training scheme, as shown in Fig. 40(d), which simplifies the training procedure. 
They noted that the gradient matrix in each layer is the outer product of the corresponding activation and error vectors. The activation vectors are obtained through optical forward propagation and measured by photodetectors, whereas the error vectors are acquired through digital backpropagation. Another hybrid training scheme is called physics-aware training (PAT), as depicted in Fig. 40(c).453 The PAT method involves a two-step process to determine the gradient of physical parameters. Initially, training data are fed into the real physical system, and the output is compared with the ideal result to derive the corresponding error signal. Subsequently, this error signal is backpropagated through a differentiable digital model to obtain the gradient. Although the forward and backward processes are mismatched, the gradients contain information about the actual physical system, ensuring effective convergence. Zheng et al.403 improved the PAT by incorporating a SEPN into their numerical model to better align it with the physical model, as shown in Fig. 40(b). The inclusion of the SEPN enhances the alignment between the digital model and the physical system, thereby improving the accuracy of gradients and accelerating the convergence process.
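The logic of physics-aware training can be sketched on a toy linear "optical" layer y = (W + E)x, where E is an unknown systematic error of the hardware and W holds the trainable parameters. The forward pass runs on the (imperfect) physical system; the gradient is computed from that physical error through the idealized digital model y = Wx. Everything below is a schematic stand-in, not the cited experiments:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 8
W = rng.normal(size=(n, n)) * 0.1          # trainable parameters
E = rng.normal(size=(n, n)) * 0.05         # hidden hardware imperfection
target = np.eye(n)                         # want the system to act as identity

def physical_forward(x):                   # the real (imperfect) system
    return (W + E) @ x

for _ in range(1000):
    x = rng.normal(size=n)
    # 1) Forward pass on the *physical* system; error from physical output.
    err = physical_forward(x) - target @ x
    # 2) Backpropagate err through the *digital* model y = W x:
    #    dL/dW = err x^T (digital gradient, fed with physical error).
    W -= 0.02 * np.outer(err, x)

# The trained W absorbs -E, so the physical system matches the target.
assert np.linalg.norm(W + E - target) < 1e-2
```

Because the error signal is measured on hardware, the update silently compensates for E even though the backward model never sees it, which is the essence of PAT's robustness to system imperfections.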


    Figure 40.Training methods for D2NN. (a) In situ training procedure of D2NN includes four steps: FP, error calculation, BP, and gradient update.452 (b) The flow chart for dual adaptive training method.403 (c) The data flow for physics-aware training.453 (d) The conceptual illustration for hybrid training of the optical neural network.454

    3.2.3 Advances in diffractive optical neural network

    In recent years, D2NNs have achieved substantial progress in several key areas. First, significant theoretical work has been undertaken to analyze D2NNs, focusing on optimal hyperparameter selection to enhance their performance. Second, researchers have developed a variety of light sources and D2NN architectures tailored to different application scenarios and specific challenges, broadening the versatility of these networks. Another major advancement addresses the inherent bulkiness of diffractive elements used in free-space optical computation. Numerous miniaturization strategies have been proposed, enabling the deployment of D2NNs in more practical and space-efficient formats. Last, the computational parallelism of D2NNs has been significantly boosted through the adoption of multiplexing technologies.

    In the remainder of this section, we delve into each of these advancements in detail, highlighting the theoretical improvements, innovative architectures, miniaturization efforts, and multiplexing techniques that are driving the evolution of free-space optical computing.

    Theoretical analysis of D2NN

    Several theoretical analyses have been conducted to improve the performance of D2NN. Mengu et al.53 identified the gradient vanishing problem commonly associated with D2NN training. This issue arises because the trainable parameters are represented by latent variables using a sigmoid function, which is a bounded function with a long tail that limits the range of optimization for these variables. By replacing the sigmoid function with an unbounded function, they found that accuracy for the Fashion-MNIST data set was significantly improved. Kulce et al.456 investigated the relationship between the number of diffractive layers and the capacity to perform general unitary transformations. They demonstrated that the dimensionality of the linear transformation space is linearly proportional to the number of diffractive surfaces, up to a limit determined by the extent of the input and output FOVs.456 Chen et al.457 analyzed various hyperparameters that impact the performance of D2NNs, such as neuron size, the number of neurons on each diffractive surface, and the spacing between adjacent diffractive layers. Their study revealed that for a fully connected D2NN, the diffraction angle of all neurons must be large enough to optically cover the diffractive surfaces in subsequent layers. Li et al.52 pointed out that the optoelectronic detectors used in D2NN can only measure nonnegative values of the output field intensity. They proposed a novel differential detection scheme where each class is associated with two optoelectronic detectors. The final inference results for each class are obtained by differentiating the signals from these two detectors, effectively expanding the range of output results from the nonnegative domain to the real-valued domain.
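The differential detection idea can be stated compactly: a detector measures only nonnegative intensity, but any real-valued class score s splits as s = s⁺ − s⁻ with both parts nonnegative, each routed to its own detector. A schematic decomposition (not the trained network's actual intensities):

```python
import numpy as np

rng = np.random.default_rng(6)
s = rng.normal(size=10)                   # desired real-valued class scores

# Split each score into nonnegative parts, one per detector of the pair.
I_plus = np.maximum(s, 0)                 # "+" detector intensities
I_minus = np.maximum(-s, 0)               # "-" detector intensities

assert (I_plus >= 0).all() and (I_minus >= 0).all()
assert np.allclose(I_plus - I_minus, s)   # real-valued scores recovered
```

Subtracting paired detector readings thus lifts the output range from the nonnegative to the real-valued domain at the cost of doubling the detector count.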

    Different types of optical sources

    Commonly, D2NNs primarily utilize monochromatic coherent light sources. Recent explorations have broadened the types of light sources used, as summarized in Figs. 41(a)–41(c), enhancing the functionality and application scope of these networks. Several studies458,461 have explored the use of D2NNs under monochromatic spatially incoherent light illumination. Unlike coherent light, which can achieve arbitrary complex-valued linear transformations, incoherent light can perform arbitrary intensity transformations. This computational scheme is particularly promising for applications involving natural light scenes in machine-vision tasks, offering a more adaptable approach to real-world lighting conditions. Furthermore, the use of temporally broadband sources for illuminating D2NNs has been investigated.57,459 In such setups, D2NNs process a continuum of wavelengths simultaneously, as opposed to monochromatic sources. Each wavelength within the temporally broadband light undergoes an independent transformation by the D2NN, with unique complex modulation. This capability allows broadband diffractive neural networks to execute a variety of tasks, including single- or dual-passband filtering, wavelength demultiplexing, pulse shaping, and machine vision.

    Figure 41.(a)–(c) Types of light sources used in D2NN, including (a) monochromatic light source,13 (b) spatially incoherent monochromatic light source,458 and (c) broadband pulse source.459 (d)–(f) Types of D2NN structures, including (d) Fourier-space diffractive DNN,49 (e) ensemble learning of diffractive neural network,460 and (f) diffractive network in network and diffractive RNN.455

    Different types of D2NN structures

    To address the diverse requirements of optical computing tasks, various structures of D2NNs have been proposed, as illustrated in Figs. 41(d)–41(f), each tailored to enhance performance for specific applications. Yan et al.49 introduced a Fourier space D2NN (F-D2NN), as shown in Fig. 41(d). In this innovative architecture, the D2NN is placed at the Fourier plane of an optical system. This placement naturally preserves spatial correspondence better than traditional D2NN configurations, offering enhanced accuracy and efficiency in processing optical signals. Zhou et al.455 developed an optoelectronic fused computing architecture, as shown in Fig. 41(f). This architecture is based on a reconfigurable diffractive processing unit (DPU), which includes two primary components: an input data generation module and a single diffraction layer for light modulation. This versatile DPU allows for the construction of various types of ANN architectures, such as D-NIN-1 and D-RNN, providing flexibility and adaptability in deploying optical computing solutions. In addition, Rahman et al.460 proposed a feature engineering and ensemble learning framework for D2NNs, as shown in Fig. 41(e). In this framework, a large number of independent D2NNs are constructed, each trained with different features extracted from the same image but through various methods. At the back end, a pruning algorithm is applied to ensemble high-level features from different D2NNs. This method has demonstrated a 16% relative performance improvement on the CIFAR-10 data set, showcasing the potential of ensemble strategies in enhancing the accuracy of optical neural networks.
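
    The ensemble step can be sketched in a few lines; the snippet below uses random stand-in scores and plain averaging (the actual framework of Rahman et al.460 additionally applies feature engineering and a pruning algorithm, which are omitted here):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical output scores from K independently trained diffractive
# networks, each fed a differently engineered feature of the same image
# (10 classes, as in CIFAR-10). Shape: (K, 10).
K = 14
per_network_scores = rng.random((K, 10))

# Simple ensembling: average the per-network class scores, then take the
# argmax over classes as the joint prediction.
ensemble_scores = per_network_scores.mean(axis=0)
prediction = int(np.argmax(ensemble_scores))
```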

    Miniaturization of D2NN

    D2NNs traditionally operate in the terahertz region, resulting in bulky optical architectures. There are currently two primary approaches to miniaturizing the system. One approach involves shifting the working wavelength of the D2NN to shorter wavelengths, which necessitates higher precision in manufacturing the diffraction layers. Lu et al.462 made advancements in this area by moving the operating frequency of D2NNs to the long-wave infrared region, leading to an 80-fold reduction in the feature size of the network. They used germanium to fabricate the diffractive grating; the fabrication process is shown in Fig. 42(a). Further progress has been made in harnessing optical neural networks within the visible light spectrum. The Gu group successfully fabricated a diffractive unit device operating in the near-infrared band using nanolithography technology, as depicted in Fig. 42(b). This device boasts an effective neuron density of 5×10⁶ mm⁻² and can be integrated in front of a CMOS sensor for applications such as optical decryption.463 In addition, Duan et al. designed a polarization-multiplexed metasurface structure capable of conducting multitask neural network inference at a wavelength of 532 nm, shown in Fig. 42(c). This structure achieves an effective neuron density of 6.25×10⁶ mm⁻² and has also been integrated into a CMOS imaging sensor, marking a significant step toward realizing chip-scale optical neural networks.464

    Figure 42.Diffracted layers are miniaturized by reducing working wavelength or designing on-chip diffracted structures. (a) Fabrication procedure of germanium-based diffraction grating.462 (b) Optical machine-learning decryptor is physically 3D printed by galvo-dithered two-photon nanolithography, and integrated with a CMOS chip.463 (c) Exploded schematic diagram of metasurface-based diffractive neural network integrated with a CMOS chip.464 (d) Scanning electron microscope image of an on-chip metalens.465 (e) Schematic of on-chip DONN. The diffractive unit composed of three identical silicon slots is used to modulate the amplitude and phase of the optical wave.466 (f) The electric field distribution (left) and refractive index distribution (right) of the coherent photonic device that performs unitary matrix computation.467 (g) Schematic of metastructures in a SiPh platform using an inverse-design method based on the effective index approximation with low-index contrast constraint.468

    Another approach to miniaturizing the D2NN involves integrating the diffractive layer of the D2NN onto a chip, shrinking the input dimension from 2D to 1D. In 2019, Wang et al. successfully minimized the scattering loss for on-chip lenses. They demonstrated the functionality of Fourier transform and differentiation by cascading three layers of high-contrast transmitarrays, as shown in Fig. 42(d).465 Following this development, the Chen group realized an on-chip diffractive optical neural network (DONN) based on a 1D dielectric metaline consisting of a series of silicon slots, which represent the hidden layer in the DONN, as depicted in Fig. 42(e).466 Subsequently, several works have expanded the capabilities of DONNs in various aspects. Liu et al.469 proposed a deep mapping regression model to characterize the process of light propagation in metalines, significantly enhancing the integration level of DONNs. Fu et al.470 addressed the limitation of input dimensions by utilizing space–time interleaving technology. Sun et al.471 introduced a multimode DONN architecture and accurately described the optical field’s evolution in DONNs through eigenmode analysis methods. Beyond metalines that mimic the D2NN computing diagram, photonic metastructures have been explored to meet diverse optical computational requirements. Khoram et al.472 employed the adjoint method to train a 2D metastructure to perform coherent optical neural network computing. Estakhri et al.473 designed a closed-loop coherent optical network, where input and output channels in metastructures are interconnected through waveguides. This network is capable of solving integral and differential equations and performing matrix inversion.473 In addition, Qu et al.467 theoretically demonstrated an incoherent optical neural network on 2D metastructures to realize stochastic matrix computation, as shown in Fig. 42(f). 
2D metastructures can help reduce the simulation burden, but real-world implementation requires the design of 3D on-chip metastructures. Given the computational challenges associated with 3D simulation, the size of such metastructures is limited. To address these challenges, Nikkhah et al.468 employed a propagation-based 2D effective index approximation for 3D planar structures, significantly reducing computational resources and time, as shown in Fig. 42(g). With this approach, they designed a metastructure that supports large-scale matrix computation and experimentally verified the results. Because of the sensitivity of on-chip metastructures, high-precision lithography is essential to minimize computational errors. Moreover, as the chip size is scaled up, reconfigurability becomes crucial to compensate for lithographic imperfections that may cause severe deviations from the expected computing results. Considering this issue, Wu et al.474 introduced a lithography-free paradigm for an integrated photonic processor that is reconfigurable through the imaginary part of the refractive index, fully programmed by an external pumping pattern generated by a spatial light modulator.

    Parallel computing capability of D2NN

    The computing parallelism of D2NNs can be significantly enhanced through the utilization of extra optical dimensionality and multiplexing technologies. Currently, methods such as polarization multiplexing, wavelength multiplexing, and orbital angular momentum (OAM) multiplexing are being explored to expand the capabilities of these networks.

    The Ozcan group introduced polarization multiplexing to D2NNs, enabling a single optical processor to perform multiple, arbitrarily selected linear transformations at the same time.475 The architecture of the optical processor, shown in Fig. 43(a), incorporates a polarizer array positioned between trainable diffractive layers. This arrangement responds uniquely to different input polarization states, allowing the network to concurrently process a different linear transformation for each input polarization. The Lin group leveraged wavelength multiplexing to facilitate the simultaneous processing of data from multiple data sets on the same optical diffractive neural network, as shown in Fig. 43(b).476 By encoding data from different data sets onto different wavelengths, passing the encoded light through the diffractive neural network, and then using corresponding filters to demultiplex the wavelengths, they effectively separated and detected the output intensity for each data set. Furthermore, Huang et al.477 explored the potential of OAM multiplexing to enhance the data-processing capabilities of D2NNs, as shown in Fig. 43(c). By mixing vortex beams carrying different OAMs and feeding them to a D2NN, they demonstrated that the D2NN could diffract each vortex beam in a specific direction, enabling hybrid-OAM mode identification.
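
    A toy numerical picture of polarization multiplexing, under the simplifying assumption that the two orthogonal polarizations propagate without cross-coupling (all matrices and vectors below are hypothetical stand-ins), is:

```python
import numpy as np

rng = np.random.default_rng(2)

# The x- and y-polarized field components carry two independent input
# vectors, and the trained system applies a different linear transform
# to each polarization in a single pass through the same hardware.
A_x = rng.standard_normal((4, 4))   # transform seen by x polarization
A_y = rng.standard_normal((4, 4))   # transform seen by y polarization

v_x = rng.standard_normal(4)        # input encoded on x polarization
v_y = rng.standard_normal(4)        # input encoded on y polarization

# Orthogonal polarizations do not interfere, so the two matrix-vector
# products are evaluated simultaneously.
out_x = A_x @ v_x
out_y = A_y @ v_y
```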

    Figure 43.High-parallelism D2NN inference using (a) polarization multiplexing,475 (b) wavelength multiplexing,476 and (c) OAM multiplexing477 technologies.

    3.2.4 Applications of optical diffractive neural network in AI

    The D2NN represents a novel paradigm for neural network inference within the optical domain, characterized by its high throughput, energy efficiency, and capability for highly parallel computing. This approach has seen various applications in AI-related fields, with numerous groundbreaking works contributing to its development.

    In the initial stages of D2NN research, applications predominantly focused on image recognition tasks such as handwritten digit recognition using the MNIST data set and more complex scenarios such as fashion product recognition using the Fashion-MNIST data set, as shown in Figs. 44(a) and 44(b). In 2018, the Ozcan group achieved significant milestones by numerically attaining accuracies of 91.75% and 81.13% on the MNIST and Fashion-MNIST data sets, respectively, utilizing five phase-only diffractive layers. Subsequently, Mengu et al.53 improved the training strategies and numerically verified that the accuracy was boosted to 97.18% for MNIST and 89.13% for Fashion-MNIST. These data sets have since become benchmarks for assessing the performance of D2NNs across various configurations.52,53,455,457,462,464,472,476,481 Beyond static image recognition, D2NNs have been extended to dynamic video content. Zhou et al.,455 for instance, developed a diffractive RNN specifically designed for video-based human action recognition, as shown in Fig. 44(c). This extension demonstrates the versatility of D2NNs in handling both still and moving images, highlighting their potential to revolutionize fields that require real-time processing of visual data.
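
    The forward pass of a phase-only D2NN of this kind can be sketched with the angular spectrum method; the snippet below uses illustrative terahertz-scale parameters and untrained random phase layers (so it demonstrates only the propagation physics, not a trained classifier):

```python
import numpy as np

def angular_spectrum(field, wavelength, dx, z):
    """Propagate a 2D complex field a distance z through free space."""
    N = field.shape[0]
    fx = np.fft.fftfreq(N, d=dx)        # spatial frequencies, cycles/m
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * kz * z) * (arg > 0)  # evanescent waves suppressed
    return np.fft.ifft2(np.fft.fft2(field) * H)

rng = np.random.default_rng(3)
N, wavelength, dx, z = 64, 0.75e-3, 0.4e-3, 30e-3  # illustrative values

field = np.ones((N, N), dtype=complex)             # plane-wave input
phase_layers = [np.exp(1j * 2 * np.pi * rng.random((N, N)))
                for _ in range(5)]                 # 5 phase-only layers

# Forward pass: modulate at each layer, then diffract to the next plane.
for layer in phase_layers:
    field = angular_spectrum(field * layer, wavelength, dx, z)

intensity = np.abs(field) ** 2  # what the output-plane detectors measure
```

    Training replaces the random phases with values optimized (by error backpropagation through this differentiable model) so that each class focuses light onto its assigned detector region.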

    Figure 44.AI-related applications for all-optical D2NN. (a) Handwritten digit recognition.53 (b) Fashion product recognition.53 (c) Video-based human action recognition.455 (d) Image reconstruction.478 (e) Subwavelength phase imaging.479 (f) All-optical image encryption using incoherent illumination.461 (g) Superresolution display.480 (h) All-optical decryptors using coherent illumination.463

    D2NN applications extend far beyond simple image recognition tasks. As D2NN technology has evolved, its application range has expanded into more sophisticated generative AI tasks, further broadening the potential uses of diffractive neural networks. Figures 44(f) and 44(h) show that D2NNs can be applied to image encryption.461,463 The Ozcan group developed a remarkable technique for image reconstruction, successfully processing images obscured by unknown diffusers without the need for digital computing,478 as illustrated in Fig. 44(d). Recently, this group demonstrated subwavelength phase imaging, which has wide applications in biomedical imaging, sensing, and material characterization. The imaging system, shown in Fig. 44(e), consists of all-optical diffractive encoder and decoder parts. The encoder converts high-frequency information about the object into low-frequency features for transmission through air, whereas the decoder reconstructs the phase of the object.479 Işıl et al.480 introduced an architecture combining an electronic encoder with an optical decoder to achieve superresolution image display, as shown in Fig. 44(g). This system utilizes a D2NN for a complex light-field transformation that surpasses traditional lens-based imaging systems, which are typically limited to simple convolution operations. In 2023, Chen et al.482 proposed a photonic encoder–decoder scheme aiming to enhance the efficiency of image transmission. This approach utilizes D2NNs for both the encoder and decoder. The encoder transforms optical input information into a compressed and encrypted optical latent space, whereas the decoder reconstructs images from the distorted signals transmitted from the encrypted domain.

    Compared with traditional ANNs, D2NNs face certain limitations in representational capacity for two reasons. (1) As mentioned in Sec. 3.2.2, the degrees of freedom of a D2NN are limited compared with those of ANNs. (2) Realizing nonlinear operations in the optical domain remains challenging for D2NNs. These two factors restrict D2NNs from handling more complex AI tasks. To overcome these limitations, opto-electrical hybrid neural networks have been developed, combining the strengths of optical and electronic components. In the hybrid architecture, the D2NN serves as the front end, performing optical feature extraction, whereas the back end consists of a lightweight neural network operating in the digital domain. This configuration has been widely applied in machine-vision tasks, demonstrating its versatility and effectiveness. Pad et al.432 designed a compact D2NN system that includes an amplitude-only transmittance mask and double lenses at the front end, capable of processing incoherent and broadband light, as shown in Fig. 45(a). They successfully demonstrated this system's capabilities in tasks such as handwritten digit recognition and static gesture recognition. Similar systems were also proposed by Colburn et al.431 and Chang et al.433 Further applications of these hybrid systems include more complex real-world tasks such as 3D real-scene recognition382 and specimen detection [Fig. 45(b)].483 In the field of computational imaging, many studies have drawn inspiration from these hybrid systems to achieve various imaging tasks. Liutkus et al.484 utilized scattering material to perform random matrix-vector multiplication in the optical domain, illustrated in Fig. 45(c), enhancing the capability to take a scalable number of measurements in parallel, thus significantly reducing the acquisition time in compressive imaging. Sitzmann et al.485 optimized the point spread function of diffractive optical elements to achieve depth and chromatic invariance, as depicted in Fig. 45(d). 
Markley et al.486 used random multifocal lenslets as diffusers to encode 3D fluorescence information into 2D, which was then recovered by reconstruction algorithms, shown in Fig. 45(e). Moreover, as shown in Fig. 45(f), several works applied opto-electrical hybrid systems to depth estimation tasks,487,488 illustrating the broad potential of opto-electrical hybrid systems in diverse computational fields.
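
    The division of labor in such hybrid systems can be sketched as follows, with a random nonnegative mask standing in for the diffractive optical front end and a single linear layer standing in for the lightweight digital back end (all names and values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)

def optical_front_end(image, masks):
    # Detectors see intensities: nonnegative weighted sums of |field|^2.
    # A fixed set of masks stands in for a trained diffractive encoder.
    return masks @ (np.abs(image.ravel()) ** 2)

masks = rng.random((32, 28 * 28))   # hypothetical optical feature masks
W = rng.standard_normal((10, 32))   # digital back-end weights (trainable)

image = rng.random((28, 28))        # stand-in input image
features = optical_front_end(image, masks)  # computed "at the speed of light"
logits = W @ features               # lightweight electronic classifier
prediction = int(np.argmax(logits))
```

    The point of the split is that the expensive high-dimensional projection happens optically, so the electronic stage only handles a short feature vector.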

    Figure 45.Hybrid opto-electrical computing system empowers the machine-vision field. (a) Handwritten digit recognition through optical-digital implementation.432 (b) Malaria parasite detection using learned sensing network.483 (c) Imaging compression using a multiply scattering medium and reconstruction by sparse optimization techniques.484 (d) End-to-end computational camera design paradigm to realize achromatic extended depth of field.485 (e) Joint optimization of microscope point spread function and differentiable reconstruction algorithm to achieve 3D information reconstruction.486 (f) The flow chart for depth map estimation using a phase-coded aperture camera.487

    Although opto-electrical hybrid computing systems are capable of handling more complex tasks, some limitations remain to be solved. First, the inevitable analog-to-digital (AD) and digital-to-analog (DA) conversions in hybrid computing systems diminish the advantages of optical computing, such as energy efficiency and high computation speed. Improving conversion efficiency and speed and reducing the number of AD/DA conversions are two potential ways to optimize the system performance. Chen et al.489 introduced an all-analog chip that merges electronic and optical computing, named ACCEL, which avoids AD conversions. The workflow of ACCEL can be seen in Fig. 46(a). This system utilizes diffractive optical computing as an optical encoder for feature extraction, where the light-induced photocurrents are directly used for further computation in an integrated analog computing chip. As a result, ACCEL achieves remarkable computing performance with an energy efficiency of 74.8 peta-operations per second per watt and a computing speed of 4.6 peta-operations per second. This chip has been successfully applied in practical scenarios, such as time-lapse video recognition tasks, including moving direction recognition for various vehicles, demonstrating even higher accuracy than that achieved by a digital three-layer neural network.489 The second limitation concerns the scalability and reconfigurability of optical computing, which are not on par with those of electrical computing. To address this issue, Xu et al. designed the Taichi photonic chiplet, demonstrating the advantages of optical computing in supporting artificial general intelligence (AGI).490 The Taichi leverages a distributed computing architecture, where complex tasks are decomposed into multiple independent subtasks and processed in parallel. The execution units in Taichi combine the benefits of optical diffraction computing for large-scale operations and interference computing for reconfigurability. 
The DPU functions as both an information compressor (encoder) and a reconstructor (decoder), whereas the interference processing unit handles reconfigurable arbitrary matrix multiplications. Consequently, this Taichi photonic chiplet offers programmability in handling large-scale AI-related tasks, such as 1000-category-level classification and advanced versatile content generation, including music and image generation, shown in Figs. 46(b) and 46(c).

    Figure 46.Recent high-performance optical computing chips to support advanced AI tasks. (a) The data flow of the all-analog photoelectronic chip, which can support energy-efficient and ultrahigh-speed vision tasks.489 (b), (c) Large-scale photonic chiplets are proposed to deploy large models for AGI tasks490 such as (b) music generation and (c) image generation.

    In summary, Sec. 3.2 shows how free-space diffraction-based optical computing transforms light propagation into large-scale, efficient computation. By spreading signals in two dimensions rather than confining them in waveguides, these systems exploit diffraction to perform complex operations such as Fourier transforms, convolutions, and neural network inference with extraordinary parallelism. Although realizing nonlinearities, ensuring compact integration, and refining training remain challenging, recent innovations—from device miniaturization and new architectures to advanced calibration and learning strategies—are rapidly advancing the field. As these approaches mature, free-space optics is poised to deliver high-throughput, low-energy solutions for AI and beyond.

    3.3 Emergent Materials for Photonic Computing

    Building on the advances in photonic circuits and light diffraction platforms for AI, we now turn our attention to the emerging materials that are driving the next wave of innovation in photonic computing. Although silicon photonics has transformed photonic computing from a bulky, expensive, and complex system into a compact, low-cost, and highly reliable chip, its implementations still mostly address linear MAC operations. Limited by its inherent material properties, silicon alone cannot excel in every aspect of photonic computing. Therefore, extensive research has focused on developing novel materials and their integration technologies to advance silicon photonics toward a faster, smarter, and more efficient engine capable of solving advanced tasks. This section is structured around the functions of these novel materials in AI systems, detailing materials for implementing weight elements, for nonvolatile storage enabling in-memory and neuromorphic computing, and for introducing nonlinearity.

    3.3.1 Implementing weight element

    MAC operations occupy most of the computational effort in AI-related tasks. In free space, MAC operations can be implemented passively via a transparent phase mask13 and actively via liquid crystals372 or SLMs.491 Yet, these approaches are strongly limited by a lack of flexibility and by operation speeds incompatible with high-speed MAC computing. On the other hand, on-chip solutions, such as silicon photonics, allow fast, reconfigurable adjustment of the weight elements by phase modulation of silicon MZIs or microring resonators.32,405,492–494 Although successful, it is worth pointing out that these works adopted a thermo-optic tuning scheme that results in a large device footprint; severe thermal cross talk and high power consumption become notable when scaled up.32,495 To address these issues, researchers have found lithium niobate on insulator (LNOI) to be a viable material platform for better signal modulation. LNOI takes advantage of its noncentrosymmetric crystal structure to exhibit a large Pockels effect, a second-order electro-optic effect that linearly modulates the material's refractive index. Compared with silicon thermo-optic modulators, LNOI modulators are an order of magnitude more efficient, with power consumption as low as several milliwatts.496 More importantly, the fast E/O response of the LNOI modulator guarantees high-speed, reconfigurable weight implementation, promoting a processor with greater computational power. In addition, the low propagation loss of <3 dB/m further provides a high-fidelity, scalable photonic MAC platform.497 A representative LNOI modulator structure and its performance are shown in Figs. 47(a) and 47(b). With the above-mentioned merits, Zheng et al.40 demonstrated the first photonic neural network on an LNOI photonic chip. The network chip, named ZEN-1, delivers an attractive computing speed of 0.6 TOPS with merely 33.5 fJ/OP energy consumption and a 98.5% task-solving accuracy. 
The successful validation of performing AI tasks such as iris flower and handwritten digit recognition on ZEN-1 with low power consumption and small propagation loss would clearly spark the evolution of LNOI photonic computing chips.
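
    The mapping from a desired weight to a drive voltage for an electro-optic MZI follows directly from the linear (Pockels) phase-voltage relation; a minimal sketch, assuming an illustrative half-wave voltage `V_pi`, is:

```python
import numpy as np

# The Pockels effect makes the phase shift linear in applied voltage,
#     delta_phi = pi * V / V_pi,
# and the intensity transmission of a Mach-Zehnder interferometer is
#     T(V) = cos^2(delta_phi / 2),
# so any weight in [0, 1] maps to a drive voltage. V_pi is an assumed,
# illustrative half-wave voltage, not a measured device parameter.
V_pi = 2.0  # volts (illustrative)

def mzi_transmission(V):
    return np.cos(np.pi * V / (2 * V_pi)) ** 2

def voltage_for_weight(w):
    # Invert T(V) = w on the monotonic branch V in [0, V_pi].
    return (2 * V_pi / np.pi) * np.arccos(np.sqrt(w))

V = voltage_for_weight(0.25)  # drive voltage realizing weight 0.25
```

    Because the phase responds electro-optically rather than thermally, such a weight can be updated at gigahertz rates without the cross talk of resistive heaters.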

    (a) Structure of an LNOI modulator. (b) Modulation depth with voltage.372" target="_self" style="display: inline;">372 (c) Architecture of an SOA-based neural network.41" target="_self" style="display: inline;">41

    Figure 47.(a) Structure of an LNOI modulator. (b) Modulation depth with voltage.372 (c) Architecture of an SOA-based neural network.41

    On the other hand, amplitude modulation is another important technology for weight element implementation. Instead of leveraging coherent interference, amplitude modulation directly absorbs or amplifies optical signals for MAC operations. It is usually combined with wavelength-division multiplexing, where each wavelength channel is treated independently and combined incoherently, avoiding complex coherent decomposition of matrices. This feature allows highly parallel computation; however, it places a stringent requirement on the spectral flatness of the light source.498 To demonstrate amplitude modulation with accurate control of loss and gain on chip, SOAs are the most mature and commonly applied technology. This approach heterogeneously integrates a III–V chip, usually InP, onto silicon photonics via flip-chip bonding and addresses each weight by controlling the absorption coefficient with a corresponding externally applied current. In addition, SOAs can preserve on-chip signal levels through light amplification. There has been significant progress in SOA-based photonic neural networks. Shi et al.499 first demonstrated an SOA photonic feed-forward neural network in 2019. It consists of eight weight channels per chip and eight chips in the system. The system was further used for Iris flower classification and achieved a prediction accuracy of 85.8%. Later, in 2022, the same group systematically analyzed the performance of such an 8 by 8 monolithically integrated SOA chip architecture with handwritten digit recognition.41 The architecture and chip layout are illustrated in Fig. 47(c). Its accuracy was determined to be 89.5% with an energy efficiency of 12 pJ/MAC. 
    Most recently, a fully integrated generic photonic linear operator processor was demonstrated.500 Based on a recursive computing architecture and reconfigurable weights, the all-optical linear operator processor can address complex computational tasks, ranging from matrix inversion to solving differential equations. The total accuracy is >97%, with operation speeds reaching the gigahertz-to-terahertz range. Besides these exciting achievements in SOA-based computing engines, there are other application-specific photonic computing architectures that leverage SOAs, such as reservoir computing and full-function Pavlov associative learning.501,502 The emergence of both generic and application-specific photonic computing chips clearly indicates that SOA-based weight elements are a viable route toward fast, large-scale, and reliable computing systems.
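
    The incoherent WDM weighting scheme reduces to a power-domain dot product; a minimal sketch with hypothetical channel powers and gains is:

```python
import numpy as np

rng = np.random.default_rng(5)

# Each wavelength channel carries one input value as optical power; an
# SOA (or absorber) scales it by a gain/loss weight; the photodetector
# then sums the powers of all channels, yielding a dot product without
# any coherent interference. Values below are purely illustrative.
x = rng.random(8)          # input powers, one per wavelength channel
w = rng.random(8) * 2.0    # per-channel gains (>1 means SOA amplification)

detector_current = np.sum(w * x)   # incoherent summation at the photodiode
```

    Because the channels add in power rather than in field, the spectral flatness of the source directly limits the weighting accuracy, as noted above.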

    3.3.2 Nonvolatile storage for in-memory and neuromorphic computing

    The traditional von Neumann computing architecture separates data processing and storage blocks. However, data retrieval and writing have become the bottleneck of high-speed computation, especially in AI-related tasks. Therefore, in-memory computing, where data are processed directly on a memory unit, has come under the spotlight. In photonics, such in-memory computing is mostly achieved by introducing nonvolatile phase-change materials (PCMs).42,503–506 PCMs are a group of materials that exhibit a large refractive index contrast (Δn>1) between their amorphous and crystalline states and undergo fast phase transitions upon external heat stimulus. Once switched, the phase state is preserved even after the stimulus is removed. This nonvolatile nature makes PCM an ideal material platform for photonic memory cells. To switch a PCM from the crystalline to the amorphous state, it should be heated above its melting temperature and quickly quenched (100 ns to 10 μs) to freeze any crystallization processes. Conversely, keeping the PCM between its crystallization temperature and melting temperature for an extended period (10 μs to 10 ms) allows nucleation and growth for the transition from the amorphous to the crystalline state. In general, switching of the PCM state can be achieved both optically and electrically. In optical switching, an ultrafast laser pulse illuminates the PCM, either off-chip or through an on-chip waveguide.507 The PCM or its cladding layer absorbs the light and generates the heat required for the phase transition. In electrical switching, an integrated on-chip electro-thermal heater is adopted instead; the PCM crystallization state is thus controlled by externally applied voltages.508 Figure 48 shows several exemplary demonstrations of PCMs in photonic computing. Ríos et al.509 first realized on-chip nonvolatile multilevel photonic memory leveraging the PCM GST in 2015, where each memory bit is a PCM pad on a waveguide. 
Different bit levels were achieved by recording the relative transmission of the PCM-coated device, which was erased and written repeatedly with on-chip ultrafast stimulus light pulses. Based on the very same device structure, the same group defined PCM synaptic weights for neuromorphic computing in 2017510 and introduced a new mechanism for photonic scalar and vector multiplication using PCM as an in-memory computing element in 2019.511 In these works, MAC operations are conducted by modulating the relative intensity of the input light transmitted through the PCM-coated devices and then combining the outputs at the photodetector. The multiplier vectors can be flexibly reconfigured by erasing/writing the PCM pads with merely nanojoules of energy per pulse. Since then, PCMs have attracted increasing attention in photonic computing.51,406,512–514 A multimode photonic computing core consisting of an array of programmable mode converters based on on-waveguide metasurfaces made of PCMs, as shown in Fig. 48(a), was demonstrated.513 This computing core utilizes the refractive index change of the PCM Ge2Sb2Te5 to control the spatial mode of the waveguide, achieving high-precision mode conversion for the matrix-vector multiplications in neural network algorithms. By constructing a prototype optical CNN, high-precision image processing and recognition tasks were realized. The most representative work on parallel convolution operations is a fully integrated photonic tensor core.406 In this work, PCMs are integrated into a 16×16 photonic crossbar array with individually encoded weight elements, as illustrated in Fig. 48(b). Convolution operations were demonstrated for both image edge detection and handwritten digit recognition. The photonic tensor core presented exceptional performance, with an ultrafast computing speed of 2×10¹² MAC/s and an extremely low power consumption of merely 17 fJ/MAC.
The successful demonstration of PCMs as nonvolatile photonic memory cells certainly brings new possibilities in power-efficient and reconfigurable photonic in-memory computing.
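The crossbar principle described above can be sketched numerically: each PCM cell stores a weight as a multilevel optical transmission, and the photodetector sum of the transmitted intensities on each output waveguide realizes one row of a matrix-vector product, performed "in memory". A minimal sketch (the 4×4 size, 16-level quantization, and normalization are illustrative, not the parameters of the cited tensor core):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4x4 crossbar: each element is a PCM cell whose optical
# transmission encodes one weight (0 = fully crystalline, 1 = fully
# amorphous in this toy normalization), quantized to 16 levels.
n_levels = 16
weights = rng.integers(0, n_levels, size=(4, 4)) / (n_levels - 1)

# Input vector encoded as light intensities on the input waveguides.
x = rng.random(4)

# Each output waveguide accumulates the transmitted intensities at its
# photodetector: this weighted sum *is* the MAC operation.
y_photonic = weights @ x

# Reference electronic computation agrees with the photonic sum.
y_reference = np.dot(weights, x)
assert np.allclose(y_photonic, y_reference)
print(y_photonic)
```

Reprogramming the "matrix" then amounts to rewriting the PCM transmission levels, which is exactly what the nanojoule erase/write pulses provide.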

    Figure 48. Several representative works of PCMs as nonvolatile memory and weight elements. (a) A waveguide-integrated PCM metasurface.513 (b) A PCM-integrated crossbar array for parallel convolution.406 (c) A PCM pad array as a neural synapse.510 (d) A PCM-integrated all-optical abacus.512

    Research on neuromorphic computing, which mimics the brain's simultaneous processing and storage of information, has drawn great attention. Electronic devices have been studied to achieve synaptic functions, such as the resistance change of phase-change chalcogenides515 induced by electrical stimulation, alongside research on photonic synapses516 and optoelectronic synapses,517 but these approaches face challenges such as integration difficulties, speed limitations, and power consumption. Cheng et al.510 developed an all-photonic chip synapse, fully implemented in the optical domain using PIC methods, with simple and robust functionality for variable synaptic plasticity, as shown in Fig. 48(c). The photonic synapses exhibit different synaptic weights as the GST structures transition between crystalline and amorphous states. This research paves the way for next-generation neuromorphic computing architectures. Feldmann et al.512 demonstrated an all-optical abacus computing unit based on PCMs embedded in nanophotonic waveguides, as shown in Fig. 48(d). Using optical pulses for arithmetic operations, including addition, subtraction, multiplication, and division, they replicated the functionality of a traditional abacus. The technology combines the stepwise crystallization of nanoscale PCMs with nanophotonic waveguides, allowing optical switching between amorphous and crystalline states to achieve simultaneous computation and storage.
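The abacus principle can be illustrated with a toy model: each waveguide-coupled PCM cell has a fixed number of distinguishable crystallization levels, each write pulse advances one level, and a saturated cell is amorphized back to zero while a carry pulse advances the next cell, yielding base-n arithmetic. The cell count and level count below are illustrative, not those of the cited device:

```python
class PCMAbacus:
    """Toy stepwise-crystallization counter (assumed parameters)."""

    def __init__(self, n_cells=4, n_levels=10):
        self.n_levels = n_levels
        self.cells = [0] * n_cells   # crystallization level of each cell

    def pulse(self, count=1):
        """Apply `count` partial-crystallization pulses to the unit cell."""
        for _ in range(count):
            i = 0
            self.cells[i] += 1
            while self.cells[i] == self.n_levels:   # cell saturated
                self.cells[i] = 0                   # amorphizing reset pulse
                i += 1
                self.cells[i] += 1                  # carry pulse to next cell

    def value(self):
        # Read-out: the cell levels form digits of a base-n_levels number.
        return sum(c * self.n_levels**i for i, c in enumerate(self.cells))

abacus = PCMAbacus()
abacus.pulse(7)          # add 7
abacus.pulse(8)          # add 8, carrying into the second cell
print(abacus.value())    # 15
print(abacus.cells)      # [5, 1, 0, 0]
```

Because the crystallization level persists without power, the running total is simultaneously the computation result and its nonvolatile storage.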

    Apart from PCMs, other groups of materials and devices, such as ferroelectrics and charge-trapped MOS gates, exhibit nonvolatile memory. Typical ferroelectrics include lithium niobate, lead zirconate titanate, and barium titanate. Unlike PCMs, which undergo a phase transition, ferroelectrics switch their electrical domains to modulate the refractive index (Δn ≈ 10⁻⁴) under an externally applied voltage and exhibit hysteresis when the voltage is removed,518 whereas charge-trapped devices change their refractive index (Δn ≈ 10⁻³) by accumulating and depleting charge carriers in a floating-gate structure.43 It is worth noting that although such devices have been successfully demonstrated as photonic memories, their refractive index modulation is several orders of magnitude smaller than that of PCMs and can only be exploited with resonant structures. Therefore, their applications in photonic computing are significantly limited.
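A back-of-the-envelope comparison shows why such small index changes demand resonant structures. For a microring, the resonance shift scales roughly as Δλ ≈ λΔn/n_g and becomes useful when comparable to the linewidth λ/Q, whereas a non-resonant phase shifter needs a length of about λ/(2Δn) for a π phase shift. The group index and quality factor below are assumed, representative values, and Δn is treated directly as the effective-index change for simplicity:

```python
lam = 1.55e-6   # operating wavelength (m)
n_g = 4.2       # assumed group index of a silicon microring
Q = 1e5         # assumed loaded quality factor

for dn in (1e-4, 1e-3, 1.0):          # ferroelectric, charge-trap, PCM-like
    shift = lam * dn / n_g            # ring resonance shift (m)
    linewidth = lam / Q               # resonance FWHM (m)
    L_pi = lam / (2 * dn)             # non-resonant length for a pi shift
    print(f"dn={dn:g}: shift/linewidth={shift / linewidth:.1f}, "
          f"L_pi={L_pi * 1e3:.4f} mm")
```

With these numbers, Δn ≈ 10⁻⁴ shifts a high-Q resonance by a few linewidths (enough to switch), but a non-resonant device would need a millimeter-scale phase shifter, while a PCM-like Δn ≈ 1 switches over sub-micrometer lengths without any resonator.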

    3.3.3 Introducing nonlinearity

    The nonlinear activation layer plays an indispensable role in DNNs. According to linear algebra, a cascade of MAC layers without any nonlinear activation is equivalent to a single MAC layer. The adoption of nonlinear activation ensures that the network is capable of solving complex tasks and enhances overall accuracy. Incorporating a nonlinear activation layer into photonic neural networks has been a recurring challenge.519 One common approach bypasses optical nonlinearity by employing O-E-O conversion, where the optical signal is first converted to the electrical domain and this electrical signal is then applied back to the optical device to trigger nonlinearity.520,521 This approach has been successfully demonstrated in photonic neural networks with accurate completion of complex identification and inference tasks. However, the conversion between the electrical and optical domains introduces time delays and energy consumption, weakening the high-speed advantage of photonic computing.522 Another approach seeks direct nonlinear photon interactions for computing. Although optical nonlinearity is not rare, it usually requires fairly large intensities to trigger interactions with nonlinear materials, and the conversion efficiency is relatively small, making it incompatible with the power-efficient, highly accurate, and scalable requirements of photonic neural networks. Therefore, finding materials with low nonlinear thresholds, high nonlinear conversion efficiency, fast response, and ease of integration becomes the major task for introducing nonlinearities into photonic neural networks. In contrast to the aforementioned O-E-O conversion leveraging Ge detectors, direct integration of Ge onto a silicon waveguide can induce free-carrier absorption nonlinearity.523 Its structure and nonlinear light transmittance are shown in Figs. 49(a) and 49(b).
Such nonlinearity has been harnessed in an all-optical neural network, demonstrating 97.3% accuracy on open machine-learning tasks. In addition, low-dimensional materials stand out, as they exhibit unconventional properties that traditional materials lack. The saturable and reverse-saturable absorption responses of 2D materials allow the construction of different nonlinear activation functions. Hazan et al.524 exploited MXene-based 2D materials as nonlinear activation units. The saturation intensity is 8 to 10 mW for a femtosecond pulsed laser, with a transmittance modulation depth of 30% to 50%. This nonlinear activation unit was subsequently incorporated into a DNN and yielded an attractive accuracy of 97.54% on the MNIST and 88.01% on the Fashion-MNIST data sets. Most recently, Shi et al.44 adopted quantum dots as the nonlinear platform, as shown in Figs. 49(c) and 49(d). The quantum dots generate excitons that recombine through nonradiative relaxation and only become luminous once the incident light intensity reaches a certain threshold. This process creates a response curve that closely resembles the ReLU function. With the addition of this quantum dot layer, enhanced recognition accuracies of 86.6%, 78.66%, and 84.74% are achieved for hand signs, hand-drawn images, and traffic signs, respectively. With the emerging technology of monolithic integration onto silicon photonics,525 quantum dots exhibit great potential as an ideal on-chip nonlinear activation material. Despite significant efforts in optical nonlinear activation, establishing a platform with minimal time delay, a low nonlinear threshold, and low loss using mature integration technology remains challenging. Such platforms are nevertheless essential, as they would complete all the components of an all-optical, fully integrated photonic DNN chip with ultrafast computing speed and high energy efficiency.
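Both points above can be verified in a few lines: cascaded linear (MAC) layers collapse into one, and a saturable-absorption response of the kind reported for 2D materials restores the nonlinearity a network needs. The transmission model and its parameters are illustrative, not fitted to any cited device:

```python
import numpy as np

rng = np.random.default_rng(1)

# 1) Without activation, two stacked MAC layers equal one merged layer.
W1, W2 = rng.standard_normal((8, 16)), rng.standard_normal((4, 8))
x = rng.standard_normal(16)
assert np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x)

# 2) Toy saturable-absorber activation: transmission rises with input
# intensity and saturates (T0, depth, and I_sat are assumed values).
def sa_activation(I, T0=0.5, depth=0.4, I_sat=10.0):
    transmission = T0 + depth * I / (I + I_sat)
    return transmission * I          # transmitted (output) intensity

I = np.array([1.0, 10.0, 100.0])
out = sa_activation(I)

# Nonlinearity check: doubling the input does not double the output.
assert not np.allclose(sa_activation(2 * I), 2 * out)
print(out)
```

The same check explains why the ReLU-like quantum dot threshold response is sufficient: any input-output curve that breaks proportionality prevents the layer collapse demonstrated in step 1.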

    Figure 49. Nonlinear activation units. (a) A Ge-on-Si nonlinear activation unit structure and (b) its nonlinear response curve.523 (c) An image-recognition neural network with a quantum dot nonlinear activation layer and (d) the ReLU-like response of a quantum dot activation unit.44

    The successful demonstration of various photonic computing platforms has validated that such systems can complete diverse tasks and accelerate computation. However, the lack of a universal material integration platform limits these demonstrations to the device level. We wish to highlight the importance of developing novel materials, as well as their integration technologies, to exploit the full functionality of a photonic computing system. With ever-developing heterogeneous integration technology, we foresee the possibility of building an electro-photonic hybrid computing system that can read/write local memory, perform high-speed parallel computing, and accomplish AI inference or even generative tasks.

    4 Challenges and Future Directions

    The intricate synergy between AI and photonics not only paves the way for groundbreaking advancements but also introduces a series of formidable challenges and rich avenues for future research. AI’s impact on foundational and cutting-edge research in photonics is profound. AI has revolutionized the analysis of complex optical phenomena, leading to new insights in optical theory and novel physical concepts. Deep learning is being employed to tackle previously intractable problems in photonics, such as predicting light–matter interactions and optimizing nanophotonic structures that significantly accelerate theoretical research and experimental validations. The future research landscape is likely to focus on automated AI tools for the design and optimization of photonic structures and circuits, AI-enhanced fabrication technologies to improve manufacturing precision, and the development of integrated photonic AI systems that embed AI functionality directly onto photonic platforms, such as photonic chips. The advancements in on-chip photonic processing aim to enhance AI computational speed while reducing energy consumption. In addition, exploring novel materials for photonics computing and neuromorphic photonics can endow AI systems with unprecedented capabilities in terms of speed, security, and efficiency. Photonics also offers vital support to AI by facilitating ultrafast data processing and transmission, critical for real-time AI applications and large-scale AI cloud services.

    The fusion of quantum technology with AI and photonics is also revolutionizing these interdisciplinary fields, paving the way for groundbreaking advancements. Quantum machine learning (QML), combining quantum computing and machine learning, is poised to significantly enhance data analysis and pattern recognition capabilities. Classical machine-learning methods, such as neural networks, are already being applied to quantum tasks,526 demonstrating their utility in identifying quantum phase transitions,527,528 optimizing ultracold atom experiments,529 and computing quantum state overlaps.530

    Quantum generalizations of classical models, including quantum support vector machines,531 quantum decision trees,532 parameterized quantum circuits,533 quantum Hopfield neural networks,534 and quantum reinforcement learning,535 extend machine learning into the quantum domain. The Harrow–Hassidim–Lloyd (HHL) algorithm536 for solving linear systems of equations with exponential speedup has been widely used as a subroutine in many QML algorithms. Recently, an HHL-type quantum algorithm for solving the gradient descent dynamics of large-scale classical sparse neural networks has been proposed, where quantum enhancement is shown to be possible at the early stage of training.537 Despite theoretical quantum speedups for various QML models,538 practical challenges remain due to the current limitations of quantum hardware. Most QML models rely on universal fault-tolerant quantum computers and efficient quantum state preparation that are not yet feasible on a large scale. These issues may make the quantum speedup impractical539 at present. Thus, hybrid quantum-classical algorithms are utilized on noisy intermediate-scale quantum (NISQ) devices.540 These algorithms leverage quantum feature mapping and classical model training. As quantum systems grow, classical optimization becomes harder, highlighting the need for QML algorithms trained directly on quantum hardware. Experimental demonstrations on various quantum platforms, including photonic quantum processors and superconducting systems, show the potential of QML.541–544 Quantum kernel methods and parameterized quantum circuits based on continuous variables542,545 and boson modes544 have been experimentally validated using photonic quantum processors, showing promising results. A comprehensive review of parametric quantum circuits in quantum optics platforms is available in Ref. 546.
Quantum reservoir computing based on continuous variables547 and boson modes548 and quantum extreme learning machines549 have also been proposed in the integrated photonics platform, offering simplified training processes and potential resilience to noise, further expanding the applicability of QML. Larger and more universal photonic quantum circuits are desirable for advancing QML research. Continued advancements in experimental techniques and algorithm design will be crucial in overcoming current limitations and fully harnessing the potential of QML.
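The quantum kernel idea mentioned above can be sketched classically: a feature map encodes a data vector into a quantum state, and each kernel entry is the squared state overlap, which a photonic processor would estimate by interference. A minimal statevector simulation with a hypothetical angle-encoding map (one single-qubit rotation per feature; not the encoding of any cited experiment):

```python
import numpy as np

def feature_map(x):
    """Angle-encode features into a product state, one RY(theta) qubit
    per feature: |q> = cos(theta/2)|0> + sin(theta/2)|1>."""
    state = np.array([1.0])
    for theta in x:
        qubit = np.array([np.cos(theta / 2), np.sin(theta / 2)])
        state = np.kron(state, qubit)
    return state

def quantum_kernel(x, y):
    # k(x, y) = |<phi(x)|phi(y)>|^2, the state overlap a photonic
    # processor would estimate interferometrically.
    return float(abs(feature_map(x) @ feature_map(y)) ** 2)

X = [np.array([0.1, 0.5]), np.array([1.2, 0.3]), np.array([2.0, 2.5])]
K = np.array([[quantum_kernel(a, b) for b in X] for a in X])
print(np.round(K, 3))   # symmetric Gram matrix with ones on the diagonal
```

The resulting Gram matrix can be handed to any classical kernel method (e.g., a support vector machine), which is exactly the hybrid quantum-classical division of labor used on NISQ devices: quantum feature mapping, classical model training.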

    Integrating photonics with AI presents a diverse array of technical challenges and ethical considerations. Technically, the integration complexity necessitates advanced fabrication techniques for constructing photonic circuits that interface seamlessly with electronic systems, addressing issues related to miniaturization, scalable fabrication techniques, power consumption, and heat dissipation. In addition, the challenge encompasses the real-time processing of substantial volumes of data produced by photonic systems, the selection of suitable materials, and the transition from prototype stages to commercially viable products. In the realm of QML, additional challenges persist, such as designing efficient data encoding550 and measurement551,552 schemes and addressing issues such as the barren plateau problem in parameterized quantum circuits.553,554

    Ethically, concerns regarding bias and fairness are prominent, as AI systems have the potential to perpetuate and amplify biases present in training data sets. This necessitates comprehensive testing and validation procedures. Moreover, the rapid data processing capabilities of photonic AI systems introduce significant privacy concerns, demanding stringent adherence to privacy standards. Technological advancements may also lead to job displacement, highlighting the need for robust reskilling and educational programs. Furthermore, ensuring transparency and accountability in photonic AI applications requires the development of explainable AI models and the establishment of regulatory frameworks.

    The interdisciplinary applications of photonics and AI span several sectors. In healthcare, these technologies may enhance diagnostic precision and personalize medical treatment.279 In optical communications, they improve data transmission and information capacity.353 Environmental monitoring benefits from these advancements by providing more accurate data for climate change and pollution analysis.20 In the manufacturing sector, photonics and AI contribute to the optimization of production processes through intelligent technologies.129

    Among these sectors, healthcare stands out as a particularly transformative area for the integration of photonics and AI. By combining the strengths of advanced optical systems and AI-driven analysis, photonics has achieved significant breakthroughs in biomedicine, offering innovative solutions for diagnostics and therapeutics. For instance, in cell imaging,279 label-free and high-resolution detection is often advantageous. AI-driven microscopy can enhance image resolution and thereby the visualization of cellular and subcellular structures and functions, enabling detailed analysis of cellular morphology and behavior. In pathology,555–558 advanced optical imaging tools powered by AI can help pathologists quickly identify abnormalities at the microscopic level, speeding up diagnosis. Regarding in vivo tissue imaging,559–561 real-time performance is essential to address the problems caused by tissue motion. AI-powered photonics may significantly accelerate image acquisition and enable clinicians to monitor patients in real time, track disease progression, and evaluate treatment effectiveness with negligible motion artifacts. Beyond these specific applications, the integration of multiple optical modalities promises to further expand diagnostic and therapeutic capabilities, providing comprehensive insights into biological and pathological processes.

    An important trend is to integrate more optical modalities, and thereby more information, to aid diagnosis and monitor therapy. As reviewed in Sec. 2.2, phase microscopy provides innovative approaches for live-cell imaging and pathological section analysis. AI for phase imaging allows for the collection of large amounts of quantitative data, providing insights into cellular dynamics without the need for staining or contrast agents.273,562,563 This noninvasive approach enhances diagnostic capabilities while maintaining cell integrity for further analysis. Polarization imaging is demonstrating growing potential in surgical procedures and pathological examinations. By integrating AI, polarization imaging enables real-time, precise information acquisition during surgery, enhancing the detection of subtle tissue changes and the identification of cancerous cells.318,564 Polarimetric imaging has been shown to improve cancer detection during minimally invasive surgeries by offering high-definition, wide-field, and label-free characterization of collagen, an imaging biomarker. Spectral imaging with AI provides precise monitoring of tissue composition and function,81 delivering high sensitivity and specificity. By capturing spectral data across multiple wavelengths, it distinguishes between different tissue types and abnormalities, empowering clinicians to identify early-stage diseases that might otherwise remain undetected.565–567 These advancements are redefining the standards of healthcare diagnostics and treatments, laying a solid foundation for future innovations in personalized medicine.

    These photonics technologies collectively represent a transformative leap in healthcare diagnostics, paving the way for personalized medicine and enabling more accurate, real-time decision-making in both diagnosis and treatment. In the future, the integration of AI and photonics will enhance the efficiency and reliability of healthcare practices by accelerating data acquisition, automating data analysis, minimizing human errors, and improving overall performance.

    The societal impacts of photonic AI integration are also profound. Improved healthcare outcomes, enhanced connectivity driving economic growth, better environmental monitoring and protection, and the creation of new industries and job opportunities all showcase the potential of these technologies. It should be pointed out that realizing this potential necessitates careful consideration of the technical and ethical challenges, fostering interdisciplinary collaboration, and addressing societal impacts to harness the full promise of photonic AI.

    5 Conclusion

    The rapid advancements in AI have profoundly impacted the field of photonics, establishing a synergistic relationship that propels the progress of both disciplines. This review has explored the intricate interplay between AI and photonics, demonstrating how AI has become a pivotal tool for advancing photonic technologies and theories, and how photonics, in turn, holds immense potential to enhance AI capabilities. AI has significantly improved the design, optimization, and performance of photonic systems, leading to great progress and even breakthroughs in optical imaging, signal processing, and data acquisition and processing. Concurrently, the development of photonics enables AI to achieve faster processing speed, reduced energy consumption, and enhanced security, thus expanding the horizons of technological innovation. Despite the current technical and ethical challenges, the future of this interdisciplinary field appears promising. Continued research is expected to focus on developing advanced AI tools to optimize the design and fabrication processes of photonic systems. In addition, the exploration of advanced photonic platforms, such as PICs, quantum photonics, and neuromorphic photonics, is anticipated to bestow AI with unprecedented performance and novel functionalities. The intertwined development of AI and photonics is poised to drive significant innovations in various applications, including optical detection, healthcare monitoring, environmental monitoring, and optical communications. The ongoing collaboration between these fields promises to unlock new potentials and pave the way for future technological advancements.

    Fu Feng has been engaged in fundamental theory and applied research in the fields of multidimensional light field manipulation, all-optical computing, and intelligent photonics. He has achieved a series of innovative results in areas such as all-optical computing, directional coupling of surface waves, surface wave sensing, and the detection and information transmission of singularity light fields. In the past 5 years, as the first or corresponding author, he has published more than 50 SCI papers in internationally renowned journals such as Nature Communications, Light: Science & Applications, Advanced Optical Materials, ACS Photonics, Nano Research, ACS Applied Materials & Interfaces, Advanced Functional Materials, Optics Letters, and Optics Express.

    Dewang Huo received his doctorate degree in physics from Harbin Institute of Technology, Harbin, China, in 2021. He is currently an associate researcher at the Westlake Institute for Optoelectronics in Hangzhou, China, where his work focuses on advancing metasurface intelligent design and its innovative applications, as well as laser ultrasonic microimaging, all-optical computing, and related fields.

    Ziyang Zhang received his doctorate and master’s degrees from the Institute of Modern Optics, Nankai University, Tianjin, China, in 2022. Currently, he is a basic science researcher at Zhejiang Lab, Hangzhou, China. His current research interests include multi-physics mixed light field modulation, optical neural networks, all-optical computing, and related fields.

    Yijie Lou is a researcher at Zhejiang Lab. He received his PhD in optics from Zhejiang University. His work focuses on optical computing, with research interests in optical differentiator design, optical neural network modeling and training, and programmable on-chip optical networks. He has published multiple papers in these areas.

    Shengyao Wang received his BS degree in microelectronics science and engineering from Changchun University of Science and Technology, Changchun, China, in 2024. He is currently working toward an MS degree at the School of Physics of Beijing Institute of Technology, Beijing, China. His research interests include nanophotonics and integrated photonic devices.

    Zhijuan Gu received her bachelor’s degree in opto-electronics information science and engineering in 2022 from the School of Optical and Electronic Information, Huazhong University of Science and Technology, Wuhan, China, where she is currently working toward a PhD with the Wuhan National Laboratory for Optoelectronics. Her research interests include silicon photonics and optical signal processing.

    Dong-Sheng Liu received his bachelor’s degree in photoelectric information science and engineering from the Department of Physics, University of Science and Technology of China (USTC) in 2020. Currently, he is a PhD candidate in physics at USTC. His research focuses on quantum machine learning in noisy intermediate-scale quantum (NISQ) systems, with a particular interest in quantum reservoir computing.

    Xinhui Duan is currently an assistant researcher at Zhejiang Lab. She received her PhD in optical engineering from Beijing Institute of Technology in 2020. Prior to joining Zhejiang Lab, she was an optical engineer at Xiaomi Technology. Her primary research interests include computational holography, 3D imaging, and phase imaging, where she focuses on advancing techniques for improving imaging systems and exploring new applications in optical science.

    Daqian Wang is currently a postdoctoral research fellow at the Research Center for Frontier Fundamental Studies, Zhejiang Lab, Hangzhou, China. He received his PhD in signal and information processing from Hefei University of Technology, Hefei, China, in 2022. He was an occasional student at Imperial College London, London, United Kingdom, from 2019 to 2021, sponsored by the China Scholarship Council. His research interests include computer vision, machine learning, and computational optical imaging.

    Xiaowei Liu is currently an associate researcher in the Research Center for Frontier Fundamental Studies, Zhejiang Lab. She received her PhD in the College of Optical Science and Engineering, Zhejiang University. Her research interests focus on label-free super-resolution imaging, computational spectral imaging, and their applications in bio-medicine. Her academic achievement in super-resolution imaging has been selected as one of the Top Ten Advances in Optics in China.

    Ji Qi is a research scientist at Zhejiang Lab. He received his PhD in medical optics from Imperial College London in 2015. Prior to joining Zhejiang Lab, he was an R&D Scientist at Elekta and held the position of a senior research fellow at the University College London. His primary research interest is in endoscopic imaging methods and technologies, including polarization endoscopy, fluorescence endoscopy, and computational endoscopy.

    Shaoliang Yu obtained his BS and PhD degrees from Huazhong University of Science and Technology and Zhejiang University, respectively, before starting a postdoctoral position at MIT in 2017. Currently, he is a principal investigator at Zhejiang Lab, with a research focus on optical interconnects, photonics computing, and co-packaged optics. He serves as an editor for Advanced Photonics Nexus and as a track chair of Photonics Asia.

    Qingyang Du is currently a principal investigator at Zhejiang Lab. He received his PhD in materials science and engineering from the Massachusetts Institute of Technology in 2018. His research focuses on chalcogenide glass heterogeneously integrated silicon photonic chips; exploring on-chip nonlinear wavelength conversion, phase change nonvolatile switches, as well as magneto-optical non-reciprocity for tunable light source, monolithic on-chip isolators, chemical sensing, and photonic computing.

    Guangyong Chen is currently a researcher at Hangzhou Institute of Medicine, Chinese Academy of Sciences, and once served as the deputy director of the Research Center for Life Sciences Computing, Zhejiang Lab. His research is centered on robust learning for life sciences. He led the development of an intelligent drug discovery platform, chaired many projects, and published over 40 papers in top journals, such as Nature series journals, and top AI conferences, such as ICML, ICLR, and NeurIPS.

    Cuicui Lu is a professor at School of Physics, Beijing Institute of Technology. She received her PhD from Peking University in 2015. Her research interests include topological photonics and nanophotonics. She has published 68 papers as the first author or corresponding author in journals such as Physical Review Letters, Nature Communications, Science Advances, and Advanced Photonics. She has been serving as a Topical Editor of the Optics Letters since 2020. She also serves as a young editorial board member for journals including Chinese Optics Letters, Chip, and APL Photonics.

    Yu Yu received his PhD from the Huazhong University of Science and Technology (HUST), Wuhan, China, in 2009. From 2009 to 2010, he was a research associate with the Centre for Photonic Systems, Department of Engineering, University of Cambridge, Cambridge, United Kingdom. He is currently a professor with the Wuhan National Laboratory for Optoelectronics, HUST. His research interests include silicon devices and all-optical signal processing.

    Xifeng Ren is a professor at University of Science and Technology of China. His research is centered around quantum micro/nano photonics, including quantum photonic integrated circuits, quantum optics, and nanophotonics. He has published more than 100 SCI papers in these research areas, including 1 in Science, 1 in Nature, 7 in Physical Review Letters, 3 in Nature Communications, 3 in Light: Science & Applications, and 4 in Optica.

    Xiaocong Yuan is a member of Academia Europaea (MAE), Chief Scientist of Frontier Fundamental Studies of Zhejiang Lab, and Chair Professor at Shenzhen University. He is a National Distinguished Professor and a fellow of several prestigious organizations, including OPTICA, SPIE, and the Institute of Physics. His research mainly focuses on fundamental studies of light field manipulation, surface wave sensing, photoacoustic pathological diagnosis using light fields, light field mode multiplexing mechanisms, and photonic interconnection applications. He has published over 600 SCI-indexed papers in related fields in top journals, including Science, Nature Photonics, Nature Physics, Nature Communications, Science Advances, Physical Review Letters, and PNAS. He currently serves as the editor-in-chief of the prestigious international journal Advanced Photonics.


    Citation
    Fu Feng, Dewang Huo, Ziyang Zhang, Yijie Lou, Shengyao Wang, Zhijuan Gu, Dong-Sheng Liu, Xinhui Duan, Daqian Wang, Xiaowei Liu, Ji Qi, Shaoliang Yu, Qingyang Du, Guangyong Chen, Cuicui Lu, Yu Yu, Xifeng Ren, Xiaocong Yuan, "Symbiotic evolution of photonics and artificial intelligence: a comprehensive review," Adv. Photon. 7, 024001 (2025)

    Paper Information

    Category: Reviews

    Received: Sep. 8, 2024

    Accepted: Jan. 24, 2025

    Published Online: Apr. 3, 2025

    Corresponding authors: Ji Qi (ji.qi@zhejianglab.org), Qingyang Du (qydu@zhejianglab.org), Guangyong Chen (gychen@zhejianglab.org), Cuicui Lu (cuicuilu@bit.edu.cn), Yu Yu (yuyu@mail.hust.edu.cn), Xifeng Ren (renxf@ustc.edu.cn), Xiaocong Yuan (xcyuan@zhejianglab.org)

    DOI:10.1117/1.AP.7.2.024001

    CSTR:32187.14.1.AP.7.2.024001