1Northwestern Polytechnical University, School of Physical Science and Technology, Ministry of Industry and Information Technology, Key Laboratory of Light Field Manipulation and Information Acquisition, Xi’an, China
2Southeast University, School of Automation, Nanjing, China
3Ministry of Education, Key Laboratory of Measurement and Control of Complex Systems of Engineering, Nanjing, China
Analog neural networks, which mimic numerical computations through energy-efficient physical transformations in hardware architectures, typically achieve lower accuracies than digital neural networks. We explore the potential of photonic neural networks to outperform digital counterparts. Unlike traditional analog computing, our extreme learning machine (ELM)-based photonic neural network operates with physical synaptic connections without relying on mathematical descriptions. Noteworthy accuracy enhancements are achieved through photonic multi-synaptic connections, going beyond conventional notions of network depth or nonlinearity. Experimental results on MNIST, Fashion-MNIST, and CIFAR-10 datasets demonstrate classification accuracies up to 99.79%, 98.26%, and 90.29%, respectively, outperforming digital counterparts and most reported hardware architectures. This underscores the transformative impact of photonic neural networks, especially with the pivotal role of photonic multi-synapses, in advancing intelligent devices and signal processing.
Artificial intelligence (AI) has seen extensive applications in recent decades, spanning from facial recognition to autonomous driving and natural language processing.1 However, the exponential surge in data poses challenges for traditional digital computing, leading to issues such as the “memory wall” and high energy consumption in AI tasks. The waning effect of Moore’s Law further highlights the limitations of traditional digital computing.2 Leveraging light as an information carrier presents a promising alternative, sparking strong interest in integrating photonics into AI applications.3
Traditional artificial neural networks on electronic computers execute mathematical algorithms through numerical calculations in digital form. Efforts have been made to convert these calculations into more energy-efficient analog forms via physical transformations, which can be electronic or photonic. Many photonic neural networks have adopted deep learning models, such as diffractive deep neural networks,4–8 spatial convolution networks,9 and on-chip networks based on Mach-Zehnder interferometers (MZIs)10–12 or micro-ring resonators.13–17 These networks perform synaptic connections and nonlinear activations, for which matrix operations are executed through photonic transformations. Feature extraction is accomplished by optimizing connection strengths, akin to digital neural networks, with weights computed on computers and then applied to physical devices. High experimental test accuracies of diffractive deep neural networks have been reported on the MNIST dataset,4,18,19 and a recent hybridization of a diffractive network with subsequent electronic digital networks achieved a classification test accuracy of 97.1%.20 In on-chip networks, the MZI architecture typically employs singular value decomposition for matrix operations, achieving a classification test accuracy of 90.5% on the MNIST dataset.11 The micro-ring architecture can implement matrix operations using a crossbar configuration via coupling relationships. A phase-change material-based micro-ring network has demonstrated an accuracy of 95.3% on the MNIST dataset.15
Experimental accuracy of hardware-architected networks is usually lower than that of digital computation due to errors in the mathematical description of physical transformations, device fabrication, and weight quantization. Most hardware architectures are designed for analog computing based on an isomorphic mathematics relationship with digital computing. Analog computing is generally considered less accurate than digital computing because digital computing operates with discrete values that can be precisely represented, whereas analog computing deals with continuous signals that are more susceptible to noise and other sources of error. Achieving software-equivalent accuracy will require improvement in both devices and hardware-aware training techniques for analog accelerators.21
However, what if an artificial neural network employs a physical transformation without relying on mathematical descriptions? The so-called physical neural network employs physics-aware training based on a hybrid in situ-in silico algorithm. A physical–digital hybrid architecture employing broadband optical second harmonic generation performed the MNIST task with 97% test accuracy.22 Extreme learning machines (ELMs),23–25 featuring an untrained hidden layer of random neurons and a trained output layer, can adopt real physical synapses without requiring mathematical descriptions, and the training is naturally physics-aware. Unlike software-architected numerical synapses, such physical synaptic connections are not analog computation, warranting exploration for accuracy performance compared with digital synapses. Photonic ELM networks26–28 harnessing free-space optical diffraction or multiple optical scattering have achieved accuracies of 92.18%27 or even close to 98%28 on the MNIST dataset.
This study delves into the accuracy performance of physical synapses compared with digital synapses in ELM neural networks. Free-space optical diffraction establishes the physical synaptic connections between input neurons and hidden neurons in the photonic neural network, whereas in the digital ELM networks, synaptic connections are realized using a random number template (RNT) and the angular-spectrum diffraction model. Tests on the MNIST,29 Fashion-MNIST,30 and CIFAR-1031 datasets reveal that the photonic network outperforms digital ELM networks. A strategy of photonic multi-synapses is demonstrated to show significant enhancement in accuracy. An input image is duplicated into multiple copies in a spatial array and projected onto the sensing layer of a camera via free-space diffraction. A hidden neuron thus receives multi-synaptic connections from an input neuron through many pathways. Experimental results show classification accuracies of 99.79% for MNIST, 98.26% for Fashion-MNIST, and 90.29% for CIFAR-10. These results are comparable to state-of-the-art digital deep learning algorithms and surpass most hardware architectures while also outperforming their digital counterparts. In addition, the photonic architecture achieves fast training speeds in the order of seconds, 2.89 TOPs/s computing speed, and low optical energy consumption at the attojoule-per-MAC (multiply-accumulate) level. The photonic multi-synapse neural network offers a simple and high-performance hardware architecture. By utilizing real physical synaptic connections, the system incorporates numerous real-world factors into the inference and training processes, bringing new opportunities for enhancing network accuracy. Photonic multi-synapses present a mechanism beyond nonlinearity and network depth and have the potential to play an important role in advancing intelligent devices and signal processing.
2 Results
2.1 Photonic Multi-synapse Neural Network Based on Input Duplicates
As illustrated in Fig. 1(a), the photonic multi-synapse neural network implements random hidden neurons via optical diffraction between the input layer and the hidden layer within the ELM framework. Photonic multi-synapses are established from an input neuron $x_j$ to a hidden neuron $h_i$ through multiple pathways, such that the synaptic connection weight is the sum of the weights across all paths, $w_{ij} = \sum_k w_{ij}^{(k)}$. This setup involves duplicating an input image using a spatial light modulator to generate multiple copies that reach the camera through distinct spatial paths, as shown in Fig. 1(b). Through optoelectronic conversion, the optical information is transformed into the digital domain, leading to hidden neurons producing output $h_i = g(\sum_j w_{ij} x_j)$, with $g$ representing the nonlinear activation function. Subsequent processes involve down-sampling of the hidden neurons to a reduced set using a mean method, followed by matrix multiplication with the tensor core $\boldsymbol{\beta}$, and cognitive decision-making through digital computation. In the experimental schematic illustrated in Fig. 1(c), a grayscale input image is encoded as a phase image in the duplicate array, which is read by the laser light on the spatial light modulator. The region of interest (ROI) window defines the portion of the diffraction pattern on the camera used for further processing. Detailed experimental and training procedures are outlined in Methods, Supplementary Note 3, and Fig. S5 in the Supplementary Material.
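The duplicate-array construction can be sketched numerically. The following Python snippet is a hypothetical illustration, not the authors' code; the function name, interface, and zero-valued gap pixels are our own assumptions. It tiles an input image into a square spatial array with an optional blank interval between copies:

```python
import numpy as np

def make_duplicate_array(img, reps, interval=0):
    """Tile a 2D input image into a reps x reps spatial array.

    `interval` is the blank (zero-phase) gap, in pixels, inserted between
    neighboring copies; the paper found a null interval to perform best.
    """
    h, w = img.shape
    # pad one gap onto each copy, tile, then trim the trailing gap
    padded = np.zeros((h + interval, w + interval), dtype=img.dtype)
    padded[:h, :w] = img
    tiled = np.tile(padded, (reps, reps))
    return tiled[: reps * h + (reps - 1) * interval,
                 : reps * w + (reps - 1) * interval]
```

With a null interval, for example, a 28-pixel-wide MNIST image tiled into a 3-by-3 array yields an 84-pixel-wide phase pattern for the SLM.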
Figure 1.Photonic multi-synapse neural network. (a) Schematic of network architecture. (b) Photonic multi-synaptic connection implemented by input duplicates and multi-pathway diffractive propagation. (c) Experimental schematic.
Three datasets, MNIST, Fashion-MNIST, and CIFAR-10, were employed for the image classification experiments. Binary images in MNIST and Fashion-MNIST were phase-encoded using 0 and $\pi$, whereas grayscale images in CIFAR-10 were linearly mapped within a fixed phase range, as illustrated in Fig. 2(a). The diffractive projections of all images in the datasets were captured by a camera and subsequently processed for training and classification by computers. Original images, formatted as $n \times n$ pixels (with $n$ being 28 for MNIST and Fashion-MNIST, and 32 for CIFAR-10), were duplicated into spatial arrays for establishing photonic multi-synaptic connections. Increasing the duplicate interval [Fig. 2(b)] led to reduced test accuracy, as seen in Fig. 2(c), prompting the adoption of null intervals in subsequent investigations. The reason for this effect is explained in Supplementary Note 4 and Fig. S11 in the Supplementary Material. In addition, for grayscale images as illustrated in Fig. 2(d), optimal performance was achieved with the widest phase encoding range, leading to the subsequent adoption of the maximum achievable phase modulation. The impact of varying the ROI size of the camera on test accuracy is highlighted in Fig. 2(e), with the ROI size then fixed for further experiments. Camera exposure gain, affecting the nonlinear activation functions, was set to 100% for optimization purposes, as depicted in Fig. 2(f). Furthermore, the introduction of color channels in the original CIFAR-10 dataset demonstrated enhanced test accuracy, validated by aggregating projection results for RGB channels, as shown in Figs. 2(g) and 5(e).
Figure 2. Experimental image classification. (a) Image examples from the MNIST, Fashion-MNIST, and CIFAR-10 (grayscale) datasets in phase-encoded form. (b) Schematic diagram of the input duplicate array. (c) Test accuracy versus duplicate interval. (d) Reduction rate of classification error with respect to mono-synaptic connections. (e) Impact of the ROI window on test accuracy. (f) Test accuracy under different camera exposure settings, with an exposure time of 3 ms and exposure gains of 100% and 4000%. (g) Comparison of test accuracy between grayscale and color images considering joint training of RGB channels for CIFAR-10. (h) Test accuracies for the three datasets in the three duplicate array formats with 10,000 hidden neurons. (i)–(k) Confusion matrices for the three datasets in the largest duplicate array format with 10,000 hidden neurons. (l)–(n) Test accuracy under different numbers of hidden neurons for the three datasets. CIFAR-10 (grayscale) is used in panels (c)–(f), and CIFAR-10 (RGB) is used in panels (h), (k), and (n).
Ultimately, image classification experiments using multi-synaptic connections were carried out on the three datasets [Fig. 2(h)]. The introduction of photonic multi-synaptic connections via input duplicate arrays led to a notable enhancement in test accuracy compared with mono-synaptic connections (a single input copy), with a more pronounced effect observed for more complex images. Test accuracies of 99.79%, 98.26%, and 90.29% were achieved for MNIST, Fashion-MNIST, and CIFAR-10, respectively, by employing the largest input duplicate array, with their respective confusion matrices shown in Figs. 2(i)–2(k). Furthermore, as the number of hidden neurons increased, the test accuracy gradually improved and approached saturation [Figs. 2(l)–2(n)]. A network comprising 900 hidden neurons demonstrated stable performance, with training times on the order of seconds. Notably, employing multi-synaptic connections with input duplicates on the CIFAR-10 dataset enhanced test accuracy by 19.94% compared with using mono-synaptic connections with 10,000 hidden neurons, surpassing traditional digital ELM networks. These image classification results are comparable to state-of-the-art deep digital networks and outperform most hardware-architected networks.
ELMs facilitate the random generation of hidden neurons without deliberate control over their connection strengths from input neurons. In traditional digital ELM networks, an RNT is utilized to define synaptic connection strengths for projecting an input neuron to hidden neurons. Conversely, photonic neural networks leverage diffractive optical propagation for physical synaptic connections, enabling the direct implementation of a physical transformation without the need for mathematical representation. On the other hand, mathematical models such as the angular-spectrum diffraction model can be employed to construct digital ELM networks by simulating diffractive optical propagation numerically. For the three datasets, the model-based and the RNT-based networks exhibit similar test accuracy. A noteworthy finding from Figs. 3(a)–3(c) indicates that physical synapses outperform their numerical counterparts, particularly evident with more complex image datasets. This suggests that the mathematical transformation introduces a loss of information fidelity due to errors inherent in the numerical method and digital computation. These errors act as noise impacting feature extraction within the hidden layer. By contrast, the physical transformation is immune to all these issues, facilitating high-fidelity feature extraction.
Figure 3.Test accuracy comparison. (a)–(c) Test accuracies on the three datasets for networks with mono-synaptic connections. (d) Test accuracies on the CIFAR-10 (grayscale images) dataset comparing the multi-synaptic connection effects.
The diffractive propagation responsible for synaptic connections involves transforming the spatial frequency spectrum from the input plane to the camera sensing plane. However, because the input is limited in area, crosstalk noise can arise in the frequency spectrum. Arranging the input into a duplicate array helps alleviate this crosstalk noise, as elaborated in Supplementary Note 4 and Fig. S10 in the Supplementary Material. This could explain the improvement in accuracy observed with multi-synaptic connections, demonstrated by both photonic experimentation and optical model-based numerical calculations in Fig. 3(d), but not with RNT-based numerical synaptic connections. With multi-synaptic connections, the optical model-based network achieves higher test accuracy than the RNT-based network [Figs. S1(d) and S3(c) in the Supplementary Material]. This observation suggests that optical model-based numerical synapses excel at feature extraction compared with RNT-based synapses. Although model-based numerical synaptic connections show improved accuracy with an increased number of duplicates, further increments do not yield additional benefits due to numerical implementation errors. Multi-synaptic connections offer more consistent and reliable feature extraction compared with mono-synaptic connections, with physical implementation once again demonstrating superior performance in enhancing accuracy through multi-synaptic configurations (Fig. S7 in the Supplementary Material).
3 Discussion
Our findings reveal that physical synapses outperform digital synapses in accuracy within ELM neural networks. The hardware-architected physical transformation, involved in both training and inference, circumvents the numerical errors and device parameter issues encountered in hardware-mimicking mathematical transformation. By introducing photonic multi-synapses in the physical transformation configuration, feature mapping capabilities can be significantly boosted. This mechanism, extending beyond traditional network depth and nonlinearity concepts, can further enhance accuracy performance through physical transformation. Through the input duplicates scheme implementing multi-synaptic connections, crosstalk noise in the spatial frequency spectrum is suppressed. Achieving classification accuracies of 99.79%, 98.26%, and 90.29% on the benchmark datasets MNIST, Fashion-MNIST, and CIFAR-10, respectively, this photonic architecture competes effectively with deep digital networks using a single layer and outperforms its digital ELM counterparts (Table S4 in the Supplementary Material). Operating at a computing speed of 2.89 TOPs/s, with a direct energy efficiency of 1.53 POPs/J and training times in seconds, our hardware-architected photonic multi-synapse neural network showcases its efficiency. Optical computing dominates over 99% of the computational composition, featuring optical energy consumption at the attojoule-per-MAC level (Supplementary Note 5 and Table S5 in the Supplementary Material). This innovative architecture proves to be hardware-friendly, enabling feature extraction solely through diffraction propagation.
Enhancing the accuracy of neural networks is crucial for practical applications. Neural networks based on physical transformation bridge the simulation-reality gap, addressing limitations inherent in analog computation. Controllable physical systems hold significant potential for executing AI tasks. The strategy of multi-synaptic connections achieves high accuracy performance in photonic neural networks, with lower complexity in physical implementation compared with digital computation. We anticipate that multi-synaptic connections will serve as a versatile strategy applicable to neural networks implemented through physical transformation. The proposed architecture is poised to inspire application-oriented AI development, drive innovation in modern computing paradigms, and cater to various fields requiring advanced computational capabilities.
4 Appendix: Methods
4.1 Experimental Setup
The experimental setup of the photonic neural network, as illustrated in Fig. 4, utilizes a continuous laser operating at a wavelength of 532 nm. The light beam was set to horizontal polarization using a polarizer, with the light power adjusted using an attenuator. After spot shaping by a spatial filter, the light was reflected by a beam splitter onto the surface of a spatial light modulator (SLM, E19×12-500-1200, Meadowlark), a reflective liquid-crystal-on-silicon phase-type SLM whose modulation units have a fixed pixel pitch and 8-bit phase modulation precision. The SLM was calibrated to guarantee a linear phase response over its full modulation range at a wavelength of 532 nm. An input image was written on the SLM in a phase-encoded single or duplicate array format, and after passing through the SLM, the light carrying the phase-encoded field underwent 18-cm diffractive propagation before reaching the camera (CMOS, E3ISPM09000KPB, TOUPCAM). The selection of the diffraction distance and the duplicate array formats is described in Supplementary Note 3 and shown in Figs. S6 and S9 in the Supplementary Material.
Figure 4.Experimental setup for photonic neural network.
The ELM’s random hidden neurons were generated through diffraction propagation in the optical path and optoelectronic conversion by the camera. In the neural network, the former functioned as the matrix operation and the latter as the nonlinear activation. The camera was operated with a fixed resolution and ROI window, an exposure time of 3 ms, and an exposure gain of 100%. The image data collected by the camera were transmitted as a hidden layer matrix to the subsequent digital network. The acquisition time for all samples in one dataset is approximately 4 h, whereas the fastest acquisition time is determined by the maximum refresh rate of the SLM, which is 60 Hz. The digital network, performing image downsampling, matrix operation, and network training, was run on a computer equipped with an Intel(R) Core(TM) i9-14900KF CPU @ 3.20 GHz processor and 64 GB RAM.
4.2 Neural Network Architectures
4.2.1 General ELM neural network
A general ELM is a feedforward neural network with a single hidden layer employing random and untrained neurons [Fig. 5(a)]. For $N$ arbitrary distinct samples, the input of the $i$th sample can be represented as $\mathbf{x}_i$ and its label as $\mathbf{t}_i$. The hidden layer output is $\mathbf{h}_i = g(\mathbf{W}\mathbf{x}_i + \mathbf{b})$, where $\mathbf{W}$ is the hidden layer weight matrix, $\mathbf{b}$ is the hidden layer bias vector, and $g$ is the nonlinear activation function. The output can be represented as $\mathbf{y}_i = \mathbf{h}_i\boldsymbol{\beta}$. In ELM neural networks, $\mathbf{W}$ and $\mathbf{b}$ are randomly generated and untrained, with only the output layer weight matrix $\boldsymbol{\beta}$ being trained. By adjusting $\boldsymbol{\beta}$, the network optimization goal is to minimize the error between the predicted labels and their target labels for all samples. The output layer matrix for $N$ samples with $L$ hidden neurons is defined as $\mathbf{H} = [\mathbf{h}_1^{\mathsf T}, \mathbf{h}_2^{\mathsf T}, \ldots, \mathbf{h}_N^{\mathsf T}]^{\mathsf T} \in \mathbb{R}^{N \times L}$.
Figure 5.Neural network architectures. (a) General ELM neural network. (b) RNT-based neural network. (c) Optical model-based neural network. (d) Photonic neural network for grayscale image classification. (e) Photonic neural network for color image classification.
Given the target label matrix $\mathbf{T} = [\mathbf{t}_1^{\mathsf T}, \ldots, \mathbf{t}_N^{\mathsf T}]^{\mathsf T}$, training corresponds to solving the problem $\min_{\boldsymbol{\beta}}\|\mathbf{H}\boldsymbol{\beta} - \mathbf{T}\|$. The least-square solution to the problem is $\boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{T}$, where $\mathbf{H}^{\dagger}$ is the Moore–Penrose generalized inverse of $\mathbf{H}$. Unlike the iterative training approach in deep learning, the training weights are determined analytically by solving a linear system in ELMs, empowering fast training. To make the solution more robust and achieve better generalization performance, the learning parameter is determined by applying ridge regression theory to the training through a regularization coefficient $C$: $\boldsymbol{\beta} = (\mathbf{H}^{\mathsf T}\mathbf{H} + \mathbf{I}/C)^{-1}\mathbf{H}^{\mathsf T}\mathbf{T}$ and $\boldsymbol{\beta} = \mathbf{H}^{\mathsf T}(\mathbf{H}\mathbf{H}^{\mathsf T} + \mathbf{I}/C)^{-1}\mathbf{T}$, where $\mathbf{I}$ is the identity matrix.
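As a minimal numerical sketch of this training step (assuming NumPy; the function names are ours, not the authors'), the ridge-regularized output weights can be computed in one linear solve:

```python
import numpy as np

def train_elm_output(H, T, C=1.0):
    """Solve the ELM output weights by ridge regression.

    H : (N, L) hidden-layer matrix, T : (N, classes) target matrix.
    Uses beta = (H^T H + I/C)^{-1} H^T T, the form suited to N > L.
    """
    L = H.shape[1]
    return np.linalg.solve(H.T @ H + np.eye(L) / C, H.T @ T)

def predict(H, beta):
    """Class decision: index of the largest output-layer response."""
    return np.argmax(H @ beta, axis=1)
```

For $N < L$, the dual form $\boldsymbol{\beta} = \mathbf{H}^{\mathsf T}(\mathbf{H}\mathbf{H}^{\mathsf T} + \mathbf{I}/C)^{-1}\mathbf{T}$ is cheaper, since the matrix to invert is then $N \times N$.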
4.2.2 Random number template (RNT)-based neural network
As shown in Fig. 5(b), random projection is achieved by convolving the 2D input matrix with an RNT that maps the connection strengths from an input neuron to hidden neurons; no bias is involved. After convolution, a square function serves as the activation function for nonlinear mapping to generate the hidden neurons. Downsampling is executed using a mean method to reduce the matrix size of the hidden neurons to $5\times5$, $10\times10$, $15\times15$, $30\times30$, $60\times60$, $80\times80$, and $100\times100$, corresponding to 25, 100, 225, 900, 3600, 6400, and 10,000 neurons, respectively. A mean method for downsampling is applied to adjust the size of a matrix (see Downsampling method). The reduced matrix is flattened into 1D format and then connected to the fully connected output layer. Network parameters for the RNT-based neural network are provided in Fig. S2 in the Supplementary Material.
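A compact sketch of the RNT projection follows (our own Python illustration, not the published MATLAB code; an FFT-based linear convolution stands in for the 2D convolution, and the Gaussian template statistics are an assumption):

```python
import numpy as np

def conv2d_full(a, b):
    """Full 2D linear convolution of two real arrays via zero-padded FFTs."""
    s0 = a.shape[0] + b.shape[0] - 1
    s1 = a.shape[1] + b.shape[1] - 1
    return np.fft.irfft2(np.fft.rfft2(a, (s0, s1)) * np.fft.rfft2(b, (s0, s1)),
                         (s0, s1))

def rnt_hidden_layer(img, rnt):
    """Random projection with a random number template, followed by the
    square nonlinearity used as the activation function."""
    return conv2d_full(img, rnt) ** 2
```

For MNIST-sized inputs, for instance, `rnt = np.random.randn(28, 28)` would serve as one random number template; the resulting hidden map is then mean-downsampled as described above.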
4.2.3 Optical model-based neural network
As shown in Fig. 5(c), the network structure is similar to the RNT-based neural network, except that the transform relation based on the angular spectrum diffraction theory is adopted to replace the RNT for projection. It involves a numerical implementation of a mathematical transformation based on an optical model and thus is a digital neural network as well.
In the angular spectrum diffraction theory,32 the spatial frequency spectrum of an incident optical field $u_0(x, y)$ is given by the Fourier transform $U_0(f_x, f_y) = \mathcal{F}\{u_0(x, y)\}$, where $f_x$ and $f_y$ are the spatial frequencies. The diffraction transfer function for free-space propagation is $H(f_x, f_y) = \exp\left(i2\pi z\sqrt{1/\lambda^2 - f_x^2 - f_y^2}\right)$, where $z$ is the diffraction distance and $\lambda$ is the wavelength. In the spatial frequency domain, the spatial frequency spectrum at distance $z$ can be calculated as $U_z(f_x, f_y) = U_0(f_x, f_y)\,H(f_x, f_y)$.
The distribution of the diffraction field at distance $z$ is calculated by the inverse Fourier transform of the spatial frequency spectrum, $u_z(x, y) = \mathcal{F}^{-1}\{U_z(f_x, f_y)\}$, resulting in an intensity distribution $I_z(x, y) = |u_z(x, y)|^2$.
However, discrete numerical implementation of the above mathematical transformation is more complicated, being governed by the Nyquist sampling theorem to avoid aliasing:33 $\frac{1}{2\pi}\left|\frac{\partial \phi_H}{\partial f_x}\right| \le \frac{1}{2\Delta f_x}$, where $\phi_H$ is the phase of $H(f_x, f_y)$ and $\Delta f_x$ is the sampling interval of $f_x$. According to Eq. (9), the number of sampling points $N$ should satisfy $N \ge \lambda z/\Delta x^2$, where $\Delta x$ is the sampling interval of $x$. Our calculation adopts the experimental values of the wavelength, diffraction distance, and sampling interval; accordingly, the number of samples for the incident field is increased by padding the surrounding of the 2D input matrix with zeros to satisfy Eq. (10).
The discrete Fourier transform of the incident field is $U_0[p, q] = \sum_{m=0}^{N-1}\sum_{n=0}^{N-1} u_0[m, n]\exp\left[-i2\pi\left(\frac{mp}{N} + \frac{nq}{N}\right)\right]$, where $\Delta x$ and $\Delta f_x = 1/(N\Delta x)$ are the sampling intervals in the spatial and spatial frequency domains, and $m$, $n$, $p$, $q = 0, 1, \ldots, N-1$. The discretized transfer function is represented as $H[p, q] = \exp\left(i2\pi z\sqrt{1/\lambda^2 - (p\Delta f_x)^2 - (q\Delta f_y)^2}\right)$, and the spatial frequency spectrum of the diffraction field at distance $z$ can be expressed as $U_z[p, q] = U_0[p, q]\,H[p, q]$.
The diffraction field can be obtained through the inverse discrete Fourier transform, $u_z[m, n] = \frac{1}{N^2}\sum_{p=0}^{N-1}\sum_{q=0}^{N-1} U_z[p, q]\exp\left[i2\pi\left(\frac{mp}{N} + \frac{nq}{N}\right)\right]$.
Based on the above formalism, the fast Fourier transform (FFT) algorithm is applied to calculate the diffraction field, $u_z = \mathrm{IFFT}\{\mathrm{FFT}\{u_0\}\cdot H\}$, where FFT and IFFT represent the fast Fourier transform operation and the inverse fast Fourier transform operation, respectively. The intensity of the diffraction field is further adopted to calculate the hidden neurons in the ELM network. The subsequent procedure is identical to that of the RNT-based neural network. Network parameters for the optical model-based neural network are provided in Fig. S4 in the Supplementary Material.
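The discrete angular-spectrum propagation above can be sketched in Python (our own illustration, not the authors' code; the padding factor and interface are assumptions, and evanescent components are simply truncated):

```python
import numpy as np

def angular_spectrum(u0, wavelength, z, dx, pad=2):
    """Propagate a sampled field u0 over distance z via the angular-spectrum
    method: zero-pad to reduce aliasing, multiply the spectrum by the
    free-space transfer function, and transform back."""
    n0 = u0.shape[0]
    n = pad * n0
    u = np.zeros((n, n), dtype=complex)
    s = (n - n0) // 2
    u[s:s + n0, s:s + n0] = u0                      # zero-padded input
    fx = np.fft.fftfreq(n, d=dx)                    # spatial frequencies
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * kz * z) * (arg > 0)             # drop evanescent waves
    uz = np.fft.ifft2(np.fft.fft2(u) * H)
    return uz[s:s + n0, s:s + n0]                   # crop back to input size
```

The hidden neurons of the model-based network would then be `np.abs(uz)**2`, mirroring the intensity detection of the camera.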
4.2.4 Photonic neural network
In the network structures shown in Figs. 5(d) and 5(e) for classifying grayscale and color image samples, respectively, the projection from input neurons to hidden neurons is obtained solely through experimental measurements; this physical transformation does not rely on mathematical formalism, which distinguishes it from the above numerical models. The hidden neurons are experimentally represented by the intensity of the diffraction field captured in the camera’s ROI window. This completes the optical computation of the photonic neural network, whereas the structure of the backend digital network remains the same as in the RNT-based neural network. It should be noted that for the network classifying RGB images, after downsampling the diffraction intensities of the RGB data individually, the intensities of the three channels are summed to obtain the hidden layer matrix. Network parameters for the photonic neural networks are provided in Fig. S8 in the Supplementary Material.
4.3 Downsampling Method
To reduce the matrix size for hidden neurons, a simple mean interpolation method is implemented as the MATLAB “ds” function (https://github.com/zacgogogo/-ds-function), which compresses a matrix from one size to another. This is a custom image compression method, whereas other algorithms, such as various pooling methods, can also be used for downsampling.
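Since the MATLAB ds function itself is linked above, here is only a simplified Python equivalent for the common case of integer block ratios (a sketch under that assumption, not the published code, which handles general sizes):

```python
import numpy as np

def mean_downsample(x, out_shape):
    """Compress a 2D matrix to out_shape by block averaging.

    Assumes each input dimension is an integer multiple of the
    corresponding output dimension.
    """
    m, n = out_shape
    p, q = x.shape
    assert p % m == 0 and q % n == 0, "non-integer block ratio"
    return x.reshape(m, p // m, n, q // n).mean(axis=(1, 3))
```

For example, a camera ROI of 600 by 600 pixels can be reduced to a 30 by 30 hidden-neuron matrix by averaging 20 by 20 blocks.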
4.4 Training and Testing for ELM Neural Networks
In this work, we employed the complete datasets of MNIST, Fashion-MNIST, and CIFAR-10 for training and testing. Both the MNIST and Fashion-MNIST datasets contain 60,000 training samples and 10,000 testing samples, whereas the CIFAR-10 dataset contains 50,000 training samples and 10,000 testing samples of color images. The grayscale images of CIFAR-10 used in this work are obtained by weighted summation of the RGB pixel values.
In our network structure, the hidden neurons are flattened into 1D form after downsampling. The hidden layer matrix is constructed from the flattened hidden neuron vectors of all training samples, and Eq. (3) is applied for network training. The exponent of the regularization coefficient is an integer ranging from −10 to 10, yielding 21 candidate regularization coefficients for optimization. Ultimately, we selected the output weight matrix corresponding to the best-performing regularization coefficient as the final result of the network training. The selected regularization coefficients are listed in Tables S1–S3 in the Supplementary Material. For testing, the testing samples are fed into the trained network to perform inference. In addition, the training was conducted on a computer with the configuration mentioned in the experimental setup, with the “tic-toc” command used for measuring code execution time in MATLAB.
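The coefficient selection can be sketched as a simple sweep (Python; the base-2 grid C = 2**p is an assumption — the paper specifies only 21 integer exponents — and selecting by held-out accuracy is our own simplification of "best-performing"):

```python
import numpy as np

def sweep_regularization(H_train, T_train, H_val, y_val,
                         exponents=range(-10, 11)):
    """Try C = 2**p over a grid of exponents and keep the output weights
    with the best validation accuracy. Returns (accuracy, C, beta)."""
    L = H_train.shape[1]
    G = H_train.T @ H_train          # reused across all C values
    HtT = H_train.T @ T_train
    best = (-1.0, None, None)
    for p in exponents:
        C = 2.0 ** p
        beta = np.linalg.solve(G + np.eye(L) / C, HtT)
        acc = np.mean(np.argmax(H_val @ beta, axis=1) == y_val)
        if acc > best[0]:
            best = (acc, C, beta)
    return best
```

Precomputing the Gram matrix once makes the 21-coefficient sweep nearly as cheap as a single training run, consistent with the seconds-scale training times reported above.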
Zhuonan Jia is a PhD student in optical engineering at the School of Physical Science and Technology, Northwestern Polytechnical University. His research interest focuses on optical neural networks.
Haopeng Tao is a PhD student in optical engineering at the School of Physical Science and Technology, Northwestern Polytechnical University. His research interest focuses on optical neural networks.
Guang-Bin Huang is a chair professor at Southeast University, China, and the founder of Mind PointEye, Singapore. He was a full professor at the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. His two works on extreme learning machines (ELMs) were listed by Google Scholar in 2017 as second and seventh, respectively, among the top ten in artificial intelligence in its “Classic Papers: Articles that Have Stood the Test of Time.” He has been listed by Thomson Reuters as a “Highly Cited Researcher” (in two fields: engineering and computer science) since 2014. He received the Best Paper Award from IEEE Transactions on Neural Networks and Learning Systems (2013).
Ting Mei is a professor at the School of Physical Science and Technology, Northwestern Polytechnical University. He received his BS and MS degrees in optical engineering from Zhejiang University and his PhD in electrical engineering from the National University of Singapore. Previously, he held the position of tenured associate professor at Nanyang Technological University and served as the director of the Institute of Optoelectronic Materials and Technology at South China Normal University. His research specializes in nanophotonics and semiconductor optoelectronics.