Compact high-robustness diffractive neural network chip for water-immersed optical inference

Haitao Luan; Long Chen; Yibo Dong; Min Gu; Qiming Zhang

doi:10.3788/COL202422.120002

1. Introduction

Technological advances have enabled humans to explore polar regions, space, and the deep ocean^[1-3]. The ocean, which covers 71% of the Earth's surface, is a strategically important space for global ecological, resource, economic, and security development. Optical techniques play an important role in deep-ocean detection and research^[4,5]. The ocean contains many extreme environments, with high pressure and seawater corrosion, which place high demands on the robustness, speed, and energy consumption of the components or chips used. Optical chips, an emerging technology, have received wide attention because they compute at the speed of light with ultra-low energy consumption^[6,7]. Compared with electronic chips, optical chips are more resistant to interference, because the optical properties of chip materials (such as the refractive index^[8]) are generally less sensitive to environmental perturbations than their electrical properties (such as carrier mobility^[9]).

In recent years, diffractive neural networks (DNNs), a type of three-dimensional (3D) optical network, have been widely investigated^[10,11]. Applications including image recognition^[12-14], optical computing^[15], phase imaging^[16,17], and scattered image reconstruction^[18] have been demonstrated within the DNN framework, showing its potential in various fields. In DNNs, neural connections are formed by light propagation and diffraction in free space, eliminating the need for optical waveguides; the resulting simple chip structure helps improve robustness. For deep-ocean research, DNNs enable direct optical processing of images as they are received, which can increase the speed of image information acquisition. Furthermore, DNNs can process light information that is invisible to intensity detectors, such as phase^[19] and polarization^[20], which may provide more comprehensive details of the detected target.

Unfortunately, the materials, architectures, and designs of existing DNNs do not yet support their use in the ocean or other water environments. Currently, DNNs are mainly fabricated by 3D printing of organic materials, which yields structures with low mechanical strength and robustness^[21,22]. Moreover, the diffractive layers of DNNs are usually spatially separated^[23]; this lack of compactness may cause functional failures due to structural changes during long-term operation. In addition, reported DNNs are designed to work only in air and have not been optimized for use in water^[24-28].

Here, we experimentally demonstrate a compact DNN chip for water-immersed optical inference. The DNN consists of two cascaded diffractive layers integrated on the two surfaces of a quartz plate. We fabricated the chip by double-sided photolithography followed by dry etching. This integration keeps the spacing and relative positions of the diffractive layers stable, thus enhancing robustness. Underwater optics may face unforeseen circumstances, such as water ingress through leaking system seals, which can render the device inoperable. To address this problem, we designed the DNN chip to work directly in both water and air. Through optimization of the initial training values and multi-objective training, the chip achieves high-accuracy inference in both media. We performed handwritten digit recognition and fashion product recognition with the chip. The recognition accuracies for four classes of handwritten digits (0–3) are 91.5% in air and 91.4% in water, and those for four classes of fashion products (T-shirts, trousers, bags, and shoes) are 94.6% in air and 92.6% in water. Our strategy provides a route to underwater applications of smart photonic devices and can also be extended to other extreme environments.

2. Experiments and Methods

2.1. Principle

Figure 1(a) shows a schematic of optical inference with our bilayer DNN chip, which can operate in both water and air. As depicted in Fig. 1(b), optical images generated by coherent light propagate by diffraction through the medium from the input layer to the DNN chip, are processed optically by the two diffractive layers, and finally propagate to the output layer. In this way, a feedforward optical neural network is constructed. The inference results of the chip are displayed on the output layer: for each task, there are four circled regions on the output layer corresponding to the four input classes [Fig. 1(a)], and the light intensity distribution among them represents the recognition result. The weights carried by the neurons are encoded as pixel-wise phase modulations set by the pixel depth. Thus, through the diffraction and interference of the incident light, the matrix multiplication of inputs and weights is realized [Fig. 1(c)]. Figure 1(d) shows a photograph of the chip. The substrate is a fused quartz plate, whose high stability ensures that the chip can withstand corrosion in various underwater environments. On the chip, we fabricated four DNNs with two different functions: handwritten digit recognition and fashion product recognition. Each diffractive layer has $500 \times 500$ pixels with a pixel size of $8\ μm \times 8\ μm$, giving 250,000 neurons per layer.


Figure 1. Bilayer DNN chip integrated on a quartz substrate. (a) A schematic diagram of the chip capable of operating in both air and water. Optical images enter the DNN, and the final recognition results are reflected on the output layer through the distribution of light intensity. (b) The schematic diagram illustrating the propagation of light in the chip. (c) A network description of the physical computation process of the DNN chip. Dataset images are generated at the input layer and then propagate through two diffractive layers with optical operation based on coherent superposition. In our experiment, the diffractive layer is designed with binary phase modulation. (d) The digital image of the DNN chip. Scale bar, 5 mm.


2.2. TensorFlow-Based DNN Training

Artificial neural networks mimic synaptic signal transmission by multiplying the inputs with a weight matrix^[29]. In DNNs, this matrix multiplication is realized through the propagation and coherent superposition of coherent light waves. Therefore, designing a DNN requires a model of light propagation between the diffractive layers.

Here, we used angular spectrum diffraction to simulate the propagation of the incident light^[30]:

$$U(z) = \mathcal{F}^{-1}\left\{ A(k_{x}, k_{y}) \exp\left[ i k z \sqrt{1 - \left(\frac{λ k_{x}}{2π}\right)^{2} - \left(\frac{λ k_{y}}{2π}\right)^{2}} \right] \right\}, \tag{1}$$

where $U(z)$ denotes the complex amplitude of the wavefront at a distance $z$, $A(k_{x}, k_{y})$ denotes the angular spectrum (Fourier transform) of the incident complex amplitude, $k_{x}$ and $k_{y}$ are the spatial angular frequencies along $x$ and $y$ in the Fourier domain, $λ$ is the wavelength of light in the propagation medium, $k = 2π/λ$ is the wavenumber, and $\mathcal{F}^{-1}$ denotes the inverse Fourier transform. Because angular spectrum diffraction is computed with Fourier transforms, it is well suited to training DNNs with large numbers of neurons.
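To make Eq. (1) concrete, the following is a minimal NumPy sketch of angular spectrum propagation. It is not the authors' TensorFlow code: the function name `angular_spectrum_propagate`, the grid size, and the example aperture are illustrative assumptions. The field is not zero-padded, so it wraps around the grid edges; a practical simulation over the 10 cm and 25 cm distances used here would pad the grid.

```python
import numpy as np


def angular_spectrum_propagate(u0, wavelength, dx, z):
    """Propagate the sampled complex field u0 by a distance z using Eq. (1).

    wavelength: wavelength in the propagation medium (vacuum wavelength / n)
    dx: pixel pitch of the sampling grid; all lengths share one unit (here m)
    """
    ny, nx = u0.shape
    # Angular spatial frequencies k_x, k_y in the FFT output ordering
    kx = 2 * np.pi * np.fft.fftfreq(nx, d=dx)
    ky = 2 * np.pi * np.fft.fftfreq(ny, d=dx)
    KX, KY = np.meshgrid(kx, ky)
    k = 2 * np.pi / wavelength
    # Argument of the square root in Eq. (1); negative entries are evanescent waves
    arg = 1 - (wavelength * KX / (2 * np.pi)) ** 2 - (wavelength * KY / (2 * np.pi)) ** 2
    kz = k * np.sqrt(arg.astype(complex))  # imaginary root gives exponential decay
    A = np.fft.fft2(u0)                     # angular spectrum A(k_x, k_y)
    return np.fft.ifft2(A * np.exp(1j * kz * z))


# Example (hypothetical): 64 x 64 square aperture, 8 um pixels, 532 nm light, 1 cm in air
u0 = np.zeros((64, 64), dtype=complex)
u0[24:40, 24:40] = 1.0
u1 = angular_spectrum_propagate(u0, 532e-9, 8e-6, 0.01)
```

For propagation in water, the same function would be called with the in-medium wavelength, e.g. `532e-9 / 1.33`. With 8 µm pixels at 532 nm no sampled frequency is evanescent, so the transfer function has unit modulus and the total power is conserved.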

Figure 2(a) illustrates the forward-propagation and error-backpropagation models used in training. Training was implemented in the TensorFlow 2.0 framework (Google Inc.) with a learning rate of 0.03 for 400 epochs. We trained the DNNs on modified versions of the Modified National Institute of Standards and Technology (MNIST)^[31] and Fashion-MNIST^[32] databases. Each training set contains 20,000 images, 5000 for each class. The distances from the input layer to the DNN chip and from the DNN chip to the output layer are 10 and 25 cm, respectively.

Figure 2. The simulation results of the DNN chip. (a) The training process diagram of the DNN chip. (b)–(e) Simulated outputs of the DNNs and corresponding normalized light intensities in the 4 circled regions. (b) Task: handwritten digit recognition. Medium: air. (c) Task: fashion product recognition. Medium: air. (d) Task: handwritten digit recognition. Medium: water. (e) Task: fashion product recognition. Medium: water.


To ensure accurate recognition in both air and water, we performed multi-objective training, which optimizes multiple losses simultaneously^[33]. The images in the datasets are fed into the DNN, and their propagation is simulated in water and in air. Comparing the outputs in water and air ($U_{water}$ and $U_{air}$) with the target output ($U_{target}$) yields two loss values (${Loss}_{water}$ and ${Loss}_{air}$), whose weighted sum forms the total loss. Based on this total loss, the phase distribution of each diffractive layer is optimized by backpropagation and gradient descent. Note that in the water test, the DNN chip was placed in a quartz container filled with water: the light propagates from air into the container, through the DNN, and finally out of the container. Therefore, training in water still includes propagation in air [Fig. 2(a)]. The propagation distance of light in water equals the thickness of the quartz container (about 1 cm).
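The multi-objective loss described above can be sketched in NumPy as a forward model evaluated twice, once per medium. This is only an illustration under several assumptions that are not stated in the paper: the 1 cm water path is split evenly on the two sides of the chip, the container walls and Fresnel losses are ignored, the per-medium loss is a mean squared error on power-normalized intensity, and the weights `w_air` and `w_water` are hypothetical. The gradient step itself (done with TensorFlow's automatic differentiation in the paper) is not shown.

```python
import numpy as np

LAM0 = 532e-9  # vacuum wavelength (m)
DX = 8e-6      # pixel pitch (m)


def propagate(u, wavelength, z, dx=DX):
    """Angular spectrum propagation as in Eq. (1); wavelength is in the medium."""
    fx = np.fft.fftfreq(u.shape[1], d=dx)
    fy = np.fft.fftfreq(u.shape[0], d=dx)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    kz = 2 * np.pi / wavelength * np.sqrt(arg.astype(complex))
    return np.fft.ifft2(np.fft.fft2(u) * np.exp(1j * kz * z))


def chip_intensity(u_in, phi1, phi2, medium, d_in=0.10, d_out=0.25,
                   d_water=0.01, n_quartz=1.5, n_water=1.33):
    """Output intensity of the bilayer chip; phi1 and phi2 are phases in air."""
    scale = 1.0 if medium == "air" else 1.0 / 3.0   # phi_water = phi_air / 3
    w = d_water / 2 if medium == "water" else 0.0    # assumed water path per side
    u = propagate(u_in, LAM0, d_in - w)
    if w:
        u = propagate(u, LAM0 / n_water, w)
    u = u * np.exp(1j * scale * phi1)               # first diffractive layer
    u = propagate(u, LAM0 / n_quartz, 1e-3)         # 1 mm inside the quartz plate
    u = u * np.exp(1j * scale * phi2)               # second diffractive layer
    if w:
        u = propagate(u, LAM0 / n_water, w)
    u = propagate(u, LAM0, d_out - w)
    return np.abs(u) ** 2


def multi_objective_loss(u_in, phi1, phi2, target, w_air=0.5, w_water=0.5):
    """Weighted sum of the air and water losses (hypothetical MSE loss)."""
    losses = {}
    for medium in ("air", "water"):
        out = chip_intensity(u_in, phi1, phi2, medium)
        out = out / out.sum()                       # normalize total power
        losses[medium] = np.mean((out - target) ** 2)
    total = w_air * losses["air"] + w_water * losses["water"]
    return total, losses
```

Both media share the same trainable phase maps; only the phase scale and the propagation paths differ, which is what lets a single gradient step improve the chip in both environments.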

To simulate light propagation in water and air, we first need to determine how the phase modulation differs between the two environments. We achieve transmission-type phase modulation by fabricating pixels with different depths. The phase modulation difference $Δφ$ between two pixels is

$$Δφ = \frac{2π}{λ} Δn\, Δd, \tag{2}$$

where $Δn$ is the refractive index difference between quartz and the surrounding medium (water or air) and $Δd$ is the height difference between the two pixels. Because $Δn$ between quartz and air is about three times that between quartz and water, the phase modulation of the DNN chip in air is about 3 times that in water. Accordingly, during training we set the initial phase range $φ_{air}$ of the DNNs in air to 0–6π, so that the phase modulation in water spans 0–2π ($φ_{water} = φ_{air}/3$). We also tried an initial phase range of 0–2π in air, but the results were poor: although the accuracy of the DNN in air still exceeded 96%, the accuracy in water dropped to about 59%. We therefore speculate that, to train a DNN for working in several media, the initial phase range should be the least common multiple of the ranges that produce a full $2π$ modulation in each medium (here, 2π for air and 6π for water, giving 6π).
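A quick arithmetic check of the factor of 3 follows from Eq. (2). The sketch below assumes $n_{quartz} \approx 1.5$ (the value used for Eq. (3)) and $n_{water} \approx 1.33$; the 1.6 µm step height is a hypothetical example, and it cancels out of the ratio together with the wavelength.

```python
import math

# Approximate refractive indices (assumed values)
N_QUARTZ, N_AIR, N_WATER = 1.5, 1.0, 1.33


def phase_step(delta_n, depth, wavelength):
    """Eq. (2): phase difference between two pixels whose heights differ by depth."""
    return 2 * math.pi / wavelength * delta_n * depth


depth = 1.6e-6   # hypothetical height difference (m)
lam = 532e-9     # wavelength (m)
phi_air = phase_step(N_QUARTZ - N_AIR, depth, lam)
phi_water = phase_step(N_QUARTZ - N_WATER, depth, lam)

# Depth and wavelength cancel: ratio = (n_q - n_air) / (n_q - n_water)
ratio = phi_air / phi_water
print(f"air/water phase ratio = {ratio:.2f}")          # about 2.94, i.e. roughly 3
print(f"6*pi in air -> {6 / ratio:.2f}*pi in water")   # about 2.04*pi
```

With these indices the ratio is about 2.94, which the training rounds to 3, so a 6π range in air maps onto roughly one full 2π cycle in water.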

To facilitate chip fabrication, we binarize the phase values, discretizing the range 0–6π into the two levels 0 and $3π$. The substrate is a commercial fused quartz plate with a thickness of 1 mm, so the layer spacing of the DNN is fixed at 1 mm. The DNN was trained for operation at a wavelength of 532 nm.
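The binarization step can be sketched as nearest-level quantization of the 0–6π range. The paper does not state its rounding rule, so the function below (`quantize_phase`, a hypothetical name) assumes nearest-level rounding with wrap-around, which is physically allowed because 6π is a multiple of 2π in air and corresponds to 2π in water. The same function produces the 4-, 8-, and 256-level maps of Table 1 by changing `levels`.

```python
import numpy as np


def quantize_phase(phi, levels, phi_max=6 * np.pi):
    """Round phases to `levels` equally spaced values in [0, phi_max).

    Values are wrapped modulo phi_max (assumed rounding rule), since 6*pi is
    equivalent to 0 both in air (3 x 2*pi) and in water (1 x 2*pi).
    """
    step = phi_max / levels
    idx = np.round(np.mod(phi, phi_max) / step) % levels  # nearest level, wrapped
    return idx * step


# Binarize a random continuous phase map (illustrative data only)
rng = np.random.default_rng(1)
phi_map = rng.uniform(0, 6 * np.pi, (10, 10))
binary_map = quantize_phase(phi_map, levels=2)
```

Because the phase scales by 1/3 in water, the binary levels 0 and 3π in air become 0 and π in water, so the etched pattern still acts as a nontrivial binary phase mask in both media.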

Additionally, the amplitude fields in the four output-layer regions representing the different digits or fashion products were optimized to follow a Gaussian distribution. Compared with a typical uniform distribution, a Gaussian distribution concentrates the intensity, which increases the peak intensity density in these regions. The camera can therefore capture useful signals more easily, allowing shorter exposure times and/or lower laser power, which reduces noise and saves energy.
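The benefit of a Gaussian target can be illustrated numerically. In the sketch below, the region centers, the width `sigma`, and the comparison disk of radius 2σ are all hypothetical choices; each target carries the same total power, and the code compares peak intensities.

```python
import numpy as np


def gaussian_target(n, center, sigma):
    """Normalized target intensity whose amplitude is a Gaussian of width sigma."""
    y, x = np.mgrid[0:n, 0:n]
    cy, cx = center
    amp = np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
    inten = amp ** 2
    return inten / inten.sum()


def uniform_disk_target(n, center, radius):
    """Normalized target intensity that is uniform inside a circular region."""
    y, x = np.mgrid[0:n, 0:n]
    cy, cx = center
    disk = (((x - cx) ** 2 + (y - cy) ** 2) <= radius ** 2).astype(float)
    return disk / disk.sum()


n, sigma = 500, 20.0                                         # 500 x 500 output grid
centers = [(150, 150), (150, 350), (350, 150), (350, 350)]   # hypothetical layout
targets = np.stack([gaussian_target(n, c, sigma) for c in centers])

# Compare with a uniform disk of radius 2*sigma carrying the same total power
disk = uniform_disk_target(n, centers[0], 2 * sigma)
gain = targets[0].max() / disk.max()
print(f"peak intensity gain of Gaussian over uniform disk: {gain:.1f}x")
```

Under these assumptions, the Gaussian target reaches a peak intensity roughly four times that of the uniform disk, which is the kind of concentration that allows shorter exposure times.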

2.3. Fabrication of the DNN Chip

Optical elements made of quartz generally have extremely long lifetimes owing to the high chemical stability and mechanical strength of silica^[34]. Therefore, our chips should in principle operate stably in various environments for a long time. Figure S1 (Supporting Information) shows the fabrication process. We used a SUSS MA6 UV photolithography machine, which can align patterns on both sides of the substrate. Plasma dry etching was carried out in a SENTECH inductively coupled plasma etching system with ${SF}_{6}$ gas flow. The etching depth, which determines the phase modulation of the pixels, is controlled by the etching time.

3. Results

3.1. Analysis of Training Results

Figures 2(b)–2(e) display the simulation results for the two datasets (MNIST and Fashion-MNIST) in air and water. The target region corresponding to the input image exhibits the highest light intensity, indicating that the DNN successfully recognizes the input image. Because the binary phase distribution cannot fully modulate the incident light, the diffraction efficiency decreases. As a result, a residual image of the input remains visible in the output images.

We analyzed the effect of different phase discretization levels on DNN performance (Table 1). The DNN can be trained to reach high accuracies ($> 95 \%$) when operating in both water and air. This is partly due to the multi-objective optimization training. Another important reason is the bilayer-integrated chip architecture, which ensures that the propagation medium between the two layers is not affected by the outside environment. Thus, we only need to account for the change in phase modulation between environments, not for changes in the optical path length between the two layers, which reduces the training constraints. After phase binarization, the accuracy decreases only slightly (by 2.4% in air and 1.7% in water), which we consider acceptable.

Table 1. The Simulated Accuracy of Fashion Product Recognition with Different Phase Discretization Levels

| Phase discretization | Test-set accuracy (air) | Test-set accuracy (water) |
| --- | --- | --- |
| 256-level | 98.7% | 98.5% |
| 8-level | 98.1% | 98.1% |
| 4-level | 97.4% | 97.5% |
| 2-level | 96.3% | 96.8% |

The distance at which a target can be recognized is an important metric, so we analyzed how the accuracy of the DNNs changes with the recognition distance, defined as the distance from the input layer to the DNN chip. The designed distance in the experiment was 10 cm, and the results (Fig. S2, Supporting Information) show that the DNN has the highest accuracy at 10 cm. As the distance deviates from this value, the accuracy gradually decreases; when the recognition distance exceeds 500 mm, the accuracy in water falls below 90%. If effective recognition is defined as an accuracy greater than 90%, the detection distance range of our chip is therefore approximately 20–500 mm.

3.2. Robustness Analysis

Figure 3(a) shows the binary phase distributions obtained for the DNN for fashion product recognition; those for handwritten digit recognition are shown in Fig. S3 (Supporting Information). We used double-sided photolithography followed by plasma dry etching to engrave the DNN on the two surfaces of a quartz plate. This process is compatible with current complementary metal–oxide–semiconductor (CMOS) manufacturing processes, which enables high-volume fabrication of DNN chips.

Figure 3. Characterization and fabrication error analysis of the chip for fashion product recognition. (a) The obtained phase map of the DNN after training. (b) The optical images of the two surfaces of the DNN chip captured by a 4f optical system. Scale bars, 1 mm. (c) The simulated impact of phase modulation errors caused by etching on the accuracy of the DNN. (d) and (e) The simulated impact of alignment errors caused by double-sided photolithography on the accuracy of the DNN working in (d) air and (e) water, respectively.


The optical images in Fig. 3(b) show that the morphology of the sample matches the ideal phase map well. In addition, scanning electron microscope (SEM) images of the sample (Fig. S4, Supporting Information) demonstrate that the fabrication process has high patterning accuracy. To achieve a $3 π$ phase modulation of the 532 nm laser in air, the ideal etching depth is about 1.596 µm, calculated by^[35]

$$φ(λ) = \frac{2π (n_{quartz} - n_{air}) d}{λ}, \tag{3}$$

where $φ(λ)$ is the target phase modulation for incident light of wavelength $λ$, $n_{quartz}$ and $n_{air}$ are the refractive indices of quartz (about 1.5) and air (about 1), respectively, and $d$ is the etching depth. Height profiling of the sample with a step profiler indicates that the actual etching depth is approximately 1.65 µm, corresponding to an actual phase modulation of $3.1 π$. We analyzed the impact of this error on the performance of the DNN by simulation. As depicted in Fig. 3(c), the accuracy of the DNN remains almost unchanged within a phase error range of $\pm 0.6 π$. Apart from etching errors, alignment between the diffractive layers also poses significant technical challenges, so we also simulated the impact of alignment errors. As illustrated in Figs. 3(d) and 3(e), an offset of approximately 2–3 pixels (16–24 µm) can lead to a significant drop in accuracy. Because double-sided photolithography keeps the alignment error within 1–2 µm, this error does not affect the chip's performance.
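The etching-depth numbers above follow directly from Eq. (3). This short sketch, assuming $n_{quartz} \approx 1.5$ and $n_{air} \approx 1$ as in the text, inverts Eq. (3) for the ideal depth and then evaluates the phase produced by the measured 1.65 µm depth.

```python
import math


def etch_depth(phase, wavelength, n_quartz=1.5, n_medium=1.0):
    """Invert Eq. (3): etch depth d that produces the requested phase."""
    return phase * wavelength / (2 * math.pi * (n_quartz - n_medium))


def phase_from_depth(depth, wavelength, n_quartz=1.5, n_medium=1.0):
    """Eq. (3): phase modulation produced by an etch depth d."""
    return 2 * math.pi * (n_quartz - n_medium) * depth / wavelength


lam = 532e-9
d_ideal = etch_depth(3 * math.pi, lam)        # target 3*pi modulation in air
phi_actual = phase_from_depth(1.65e-6, lam)   # measured depth from the step profiler

print(f"ideal depth  = {d_ideal * 1e6:.3f} um")      # 1.596 um
print(f"actual phase = {phi_actual / math.pi:.2f} pi")  # 3.10 pi
```

The resulting error of about 0.1π lies well inside the ±0.6π tolerance found in Fig. 3(c).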

3.3. Performance of the DNN Chip

Figure 4(a) depicts the experimental optical setup. A laser emits light at 532 nm, and its power is adjusted by a half-wave plate (HWP) and a polarizing beam splitter (PBS). We used lenses L1 and L2 to form a 4f system for beam expansion. A pinhole is placed at the focal plane of L1 to filter the beam, allowing only the Gaussian beam to pass through. The beam then enters the digital micromirror device (DMD) to generate the input optical images at a distance of 10 cm in front of the DNN chip. Finally, the output results are displayed in a plane 25 cm behind the chip and captured by a charge-coupled device (CCD) camera. For testing in water, the DNN chip was held in place by a clamp and immersed in a quartz container filled with water [Fig. 4(b)].

Figure 4. Experimental results of the DNN chip. (a) Experimental optical setup. HWP: half-wave plate, PBS: polarizing beam splitter, BS: beam splitter, CCD: charge-coupled device, DMD: digital micromirror device. (b) The digital image of the DNN chip working in water. (c) Confusion matrices for handwritten digit recognition in air and water. (d) Confusion matrices for fashion product recognition in air and water. (e)–(h) Recorded outputs of the DNNs and corresponding normalized light intensities in the 4 circled regions. Scale bars, 2 mm. (e) Task: handwritten digit recognition. Medium: air. (f) Task: fashion product recognition. Medium: air. (g) Task: handwritten digit recognition. Medium: water. (h) Task: fashion product recognition. Medium: water.


We tested the DNN chip with 1000 images from the test sets (250 images per class). As shown in Fig. 4(c), the DNN reaches accuracies of 91.5% in air and 91.4% in water for handwritten digit recognition, while for fashion product recognition the accuracies are 94.6% in air and 92.6% in water [Fig. 4(d)]. These values are slightly lower than the simulated values in Table 1, which may be due to fabrication and measurement errors, as well as the different numbers of test images. These experimental results demonstrate that our DNN chip works in both water and air with high accuracy ($> 90 \%$). Figures 4(e)–4(h) show the images recorded by the camera at the output layer. When an optical image is input into the DNN, the corresponding circled region on the output layer exhibits the maximum intensity. The input optical images are not fully modulated, which is consistent with the simulation results. The normalized signals show that the DNNs have a relatively high signal-to-noise ratio: the light intensity in the three non-target regions is significantly lower than in the target region.
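The readout described above (the predicted class is the circled region with the maximum intensity, and accuracy is read from the confusion matrix) can be sketched as follows. The region centers, the radius, and the synthetic frame are hypothetical stand-ins for a recorded CCD image, and integrating intensity over each circle is an assumed scoring rule.

```python
import numpy as np


def region_scores(intensity, centers, radius):
    """Integrated intensity in each circled region, normalized to the maximum."""
    y, x = np.mgrid[0:intensity.shape[0], 0:intensity.shape[1]]
    scores = np.array([
        intensity[(x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2].sum()
        for cy, cx in centers
    ])
    return scores / scores.max()


def predict(intensity, centers, radius):
    """Predicted class = region with the largest integrated intensity."""
    return int(np.argmax(region_scores(intensity, centers, radius)))


def confusion_matrix(true_labels, predicted, n_classes=4):
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted):
        cm[t, p] += 1  # rows: true class, columns: predicted class
    return cm


def accuracy(cm):
    return np.trace(cm) / cm.sum()


# Synthetic 100 x 100 output frame with a bright spot in region 2 (illustration only)
centers = [(25, 25), (25, 75), (75, 25), (75, 75)]  # hypothetical region layout
frame = np.full((100, 100), 0.01)
frame[70:81, 20:31] += 1.0
print(predict(frame, centers, radius=8))  # 2
```

For the 1000-image test described above, 915 correct predictions would give `accuracy(cm) == 0.915`, i.e. 91.5%.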

4. Discussion

We have realized an integrated DNN chip whose two diffractive layers are fabricated on the two surfaces of a quartz plate. Because it relies on CMOS-compatible double-sided photolithography, this approach is suitable for large-scale fabrication of DNN chips with various functions. To cope with unexpected situations when working underwater, we trained DNNs that work in both water and air through multi-objective optimization. The integrated chip architecture reduces the training constraints, allowing the chip to maintain high accuracy in both media. The chip can be integrated with a camera, enabling direct, light-speed analog processing of optical images. Its ability to work directly in water may enable direct recognition of, or information extraction from, target objects at close range underwater. It may also be used for underwater scattering imaging; for instance, in turbid water it could help improve the quality of captured images. Furthermore, because DNNs can process optical information that is invisible to intensity detectors, such as phase^[22] and polarization^[20], the chip could be used to study invisible targets underwater or in air, such as turbulence. The high chemical stability of quartz ensures that the chip can withstand different extreme environments. Because existing DNNs are mainly designed for working in air, the design strategy shown in this work may promote the direct application of DNNs in other media, especially in extreme environments. In the future, beyond recognition tasks, the DNN chip could be designed to implement other tasks, such as image feature extraction and underwater beam shaping, further expanding its applications.

Special Issue: SPECIAL ISSUE ON OPTICAL INTERCONNECT AND INTEGRATED PHOTONIC CHIP TECHNOLOGIES FOR HYPER-SCALE COMPUTING SYSTEMS

Received: Mar. 30, 2024

Accepted: Jun. 3, 2024

Posted: Jun. 3, 2024

Published Online: Dec. 26, 2024

The Author Email: Yibo Dong (dyb@usst.edu.cn), Min Gu (gumin@usst.edu.cn), Qiming Zhang (qimingzhang@usst.edu.cn)


CSTR:32184.14.COL202422.120002
