Opto-Electronic Advances, Volume 4, Issue 11, 200060-1 (2021)
All-optical computing based on convolutional neural networks
The rapid development of information technology has fueled an ever-increasing demand for ultrafast and ultralow-energy-consumption computing. Existing computing instruments are predominantly electronic processors, which use electrons as information carriers and follow the von Neumann architecture, characterized by the physical separation of storage and processing. The scaling of computing speed is limited not only by data transfer between memory and processing units, but also by the RC delay associated with integrated circuits. Moreover, excessive heating due to Ohmic losses is becoming a severe bottleneck for both speed and power consumption scaling. Using photons as information carriers is a promising alternative. Owing to the weak third-order optical nonlinearity of conventional materials, building integrated photonic computing chips under the traditional von Neumann architecture has been a challenge. Here, we report a new all-optical computing framework to realize ultrafast and ultralow-energy-consumption all-optical computing based on convolutional neural networks. The device is constructed from cascaded silicon Y-shaped waveguides with side-coupled silicon waveguide segments, which we term “weight modulators”, to enable complete phase and amplitude control in each waveguide branch. The generic device concept can be used for equation solving, multifunctional logic operations, and many other mathematical operations. Multiple computing functions, including transcendental equation solvers, multifarious logic gate operators, and half-adders, were experimentally demonstrated to validate the all-optical computing performance. The time-of-flight of light through the network structure corresponds to an ultrafast computing time of the order of several picoseconds, with an ultralow energy consumption of dozens of femtojoules per bit.
Our approach can be further expanded to fulfill other complex computing tasks based on non-von Neumann architectures and thus paves a new way for on-chip all-optical computing.
Introduction
The demand for ultrahigh-speed and energy-efficient computing
Here, we report a new strategy to realize ultrafast and ultralow-energy-consumption all-optical computing, including equation solving and multifunctional logic operations, based on an optical convolutional neural network (CNN). Inspired by biological brains
The optical CNN consists of cascaded silicon Y-shaped waveguides with side-coupled silicon waveguide segments designed to control the amplitude and phase of light in the waveguide branches. This conceptually and architecturally simple design affords both ultrafast computing time and low energy consumption. Importantly, the design is also scalable to handle CNNs with arbitrary network complexity. Our scalable optical CNN architecture presents a universal platform for implementing CNN-related functions, leveraging the large base of algorithms that have matured in computer science research (Supplementary information Section 1). Another important advantage of CNNs is that, because they contain only local connections, they protect signals from distortion better than fully connected neural networks. As a proof of concept, we experimentally implemented the network design in several computation tasks, including transcendental equation solvers, multifunctional logic gate operators, and half-adders.
Results and discussion
Scalable network configuration
To realize CNNs in an on-chip platform, we designed an all-optical network to emulate the convolutional operations (as shown in Fig. 1(a)). The signals fed into the network are encoded in the form of light amplitude distribution in discrete input waveguides. The network weights optimized to yield the target solutions are implemented through convolution operation between layers, i.e.
Figure 1.
The CNN is constructed from cascaded element structures comprising Y-shaped silicon waveguides side-coupled with silicon weight modulators. As an example, the schematic structure of the all-optical transcendental equation solver based on the CNN is shown in Fig. 1(b). There are three layers of element structure arrays. Each element structure is connected to two adjacent element structures in the adjacent layers. Weight modulators are used to regulate the weights of the network according to coupled mode theory. The weight modulator waveguide (as shown in Fig. 2(a)) has the same width as the transmission waveguide to ensure efficient coupling and large amplitude modulation. As Fig. 2(b) shows, the magnitude of weight
Figure 2.
It is worth mentioning that other complex mathematical operations can be systematically designed into the unified optical CNN architecture by cascading the Y-shaped element structures. In the following, we describe several examples of our optical CNN design implemented as transcendental equation solvers, multifarious logic gate operators, and half-adders. It should be noted that the signals processed in the network are the complex amplitudes of the light field, whereas what is measured in the experiment is the light intensity at the output ports. Therefore, although our networks contain only convolutional layers, nonlinearity is introduced by the measurement process, and this nonlinearity enables the various functionalities of the devices.
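The distinction between linear propagation of complex amplitudes and square-law detection of intensity can be illustrated with a minimal sketch. The two-tap kernel below is a hypothetical example, not a set of weights from the fabricated devices; each tap stands for one branch weight with its own magnitude and phase.

```python
import cmath

def conv_layer(amplitudes, kernel):
    """Valid 1D convolution (cross-correlation form) of complex field
    amplitudes with a complex kernel; this step is linear in the fields."""
    k = len(kernel)
    return [
        sum(kernel[j] * amplitudes[i + j] for j in range(k))
        for i in range(len(amplitudes) - k + 1)
    ]

def detect(amplitudes):
    """Square-law detection: a camera records intensity |E|^2, not the field."""
    return [abs(a) ** 2 for a in amplitudes]

# Hypothetical kernel (assumption): equal magnitudes, second branch pi out of phase.
kernel = [0.5, 0.5 * cmath.exp(1j * cmath.pi)]

out_first_only = detect(conv_layer([1.0, 0.0], kernel))   # only first input lit
out_second_only = detect(conv_layer([0.0, 1.0], kernel))  # only second input lit
out_both = detect(conv_layer([1.0, 1.0], kernel))         # both inputs lit, in phase

print(out_first_only, out_second_only, out_both)
```

With this kernel, each single input yields a detected intensity of 0.25, while both inputs together interfere destructively and yield essentially zero. The detected intensity is therefore not the sum of the individual intensities, even though the field-level operation is linear.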
All-optical transcendental equation solver
Since equations are effective tools for describing system states and processes, solving equations
A transcendental equation in the form of a trigonometric function is selected because, in general, an arbitrarily complex mathematical expression can be decomposed into trigonometric functions by Fourier decomposition, which means that, in principle, other transcendental equations can be solved in the same way. The all-optical transcendental equation solver is used to solve the equation with a variable parameter
We choose to represent the input waveform (in this case
where
Figure 3.
where
Besides excellent solution accuracy, the all-optical equation solver also features ultrafast and energy-efficient computation. The total computing time, characterized by the time-of-flight of light through the entire structure (including the waveform discretization section), is 9.4
The optical CNN architecture presented here also offers the unique potential of crosstalk elimination. Crosstalk in optical analog computing is generally caused by light backscattering between adjacent layers in a densely integrated platform. Based on our device design, the crosstalk is expected to be naturally eliminated by means of the error back propagation optimization process. Stability analysis of our network further demonstrates its high fault tolerance to defects such as weight deviation and waveguide damage (Supplementary information Section 5).
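The time-of-flight computing time quoted above for the equation solver follows from the standard relation t = n_g L / c, where L is the optical path length and n_g is the group index of the guided mode. The sketch below evaluates this relation; the path length and group index used in the example are illustrative assumptions, not values reported for the fabricated devices.

```python
C_UM_PER_PS = 299.792458  # speed of light in vacuum, in micrometres per picosecond

def time_of_flight_ps(path_length_um, group_index):
    """Transit time (ps) of a pulse over path_length_um at the given group index."""
    return group_index * path_length_um / C_UM_PER_PS

# Hypothetical values (assumptions): a 500 um path and a group index of 4.0.
example_ps = time_of_flight_ps(path_length_um=500.0, group_index=4.0)
print(f"{example_ps:.2f} ps")
```

Under these assumed values the transit time is a few picoseconds, which is the same order of magnitude as the computing times discussed in the text.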
Multifarious logic gate operators
All-optical logic gates constitute the basic building blocks for ultra-high-speed all-optical chips, as any complex optical logic circuit can be composed of these logic gates. In addition, logic operation sets the foundation for more complex optical signal processing functions, such as addressing
We leverage the scalability of our network to optimize on-chip all-optical multifarious logic devices. The design has 6 input ports (2 signal input terminals and 4 control bits) and a total of 5 layers (as shown in Fig. 4(a, b)). As with the all-optical equation solver, the fixed network weights were optimized using the iterative algorithm. Sixteen logic functions (representing exhaustive combinations of output results corresponding to all four possible input signals 11, 10, 01, and 00 is
Figure 4.
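The count of sixteen logic functions follows from the four possible input signals: each input can map to output 0 or 1, giving 2^4 = 16 truth tables. The sketch below enumerates them. For illustration only, it assumes that a 4-bit word directly lists the outputs for the inputs 11, 10, 01, and 00; in the device, the correspondence between the control bits and the selected function is fixed by the trained weights and need not be this mapping.

```python
from itertools import product

# Input order used in the text: 11, 10, 01, 00.
INPUTS = [(1, 1), (1, 0), (0, 1), (0, 0)]

def logic_function(truth_bits):
    """Return a two-input Boolean function whose outputs for
    11, 10, 01, 00 are given by truth_bits (assumed mapping)."""
    table = dict(zip(INPUTS, truth_bits))
    return lambda a, b: table[(a, b)]

# All 2**4 = 16 two-input Boolean functions, keyed by their truth table.
all_functions = {bits: logic_function(bits) for bits in product((0, 1), repeat=4)}

AND = all_functions[(1, 0, 0, 0)]
XOR = all_functions[(0, 1, 1, 0)]
NOR = all_functions[(0, 0, 0, 1)]

print(len(all_functions), AND(1, 1), XOR(1, 0), NOR(0, 0))
```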
Half-adder
An all-optical half-adder adds two input data bits and yields a Sum bit and a Carry bit in an all-optical implementation (Fig. 5(a)). The half-adder is a basic unit of optical circuits for arithmetic logic operations: for example, a full-adder can be realized by cascading two half-adders. Here we demonstrate an all-optical half-adder based on our optical CNN platform. We use 2D convolutional layers to train our CNNs for the half-adder as well as for the multifarious logic gate operators, because shared weights cannot meet the demands of these two scenarios. After the training process, only the weights corresponding to the non-zero positions are extracted (Supplementary information Section 1). Here, 12 network weights are determined through the algorithm optimization, and an SEM image of the half-adder is shown in Fig. 5(b). The arithmetic logic operations of “1” + “1” = (Sum “0”, Carry “1”), “0” + “1” = (Sum “1”, Carry “0”), and “1” + “0” = (Sum “1”, Carry “0”) are realized. The average optical intensity contrast between logic states 0 and 1 is 14.2 dB (Fig. 5(c)). The time-of-flight computing time is 2.7
Figure 5.
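The Boolean behavior targeted by the half-adder, and the cascading of two half-adders into a full-adder mentioned above, can be sketched as follows. This is a logic-level illustration only and does not model the optical device. The OR operation that merges the two carry bits in the full-adder is part of the standard textbook construction and is an addition here, since the text mentions only the two half-adders.

```python
def half_adder(a, b):
    """Return (sum_bit, carry_bit): Sum is XOR, Carry is AND."""
    return a ^ b, a & b

def full_adder(a, b, carry_in):
    """Full-adder from two cascaded half-adders plus an OR on the carries."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2

for a, b in [(1, 1), (0, 1), (1, 0)]:
    sum_bit, carry_bit = half_adder(a, b)
    print(f"{a} + {b} -> Sum {sum_bit}, Carry {carry_bit}")
```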
Moreover, with this element structure, the desired phase distribution can be obtained at the output ports by adjusting the network weights, so that a spatial filtering system can be constructed to realize the Fourier transformation of the input signal. Similarly, the input function can be expressed as a linear combination of multiple monomials at a given point to achieve series expansion. In addition, by defining the input-output relationship in advance for network training, the output signals corresponding to different input signals can be assigned to specific code groups, thereby implementing an encoder. A number of signal processing functions can therefore be implemented on the proposed platform, which benefits the broader field of nanophotonics. The performance benchmark and significance of this work are presented in Supplementary information Section 7.
Conclusion
In this paper, we experimentally demonstrated the first physically fixed CNN for all-optical computing based on silicon waveguides. Our optical CNN is formed by cascading a simple, universal element structure comprising Y-shaped silicon waveguides side-coupled with silicon weight modulators. We implemented the design to realize all-optical transcendental equation solvers, multifarious logic gate operators, and half-adders, all of which exhibit picosecond-scale ultrafast operation and ultralow energy consumption of the order of tens of femtojoules per bit. This optical network architecture is readily scalable and can be extended to execute other complex computing tasks simply by cascading the basic element structures. Furthermore, this platform offers the possibility of parallel computing using wavelength multiplexing. Our work therefore points to a promising direction for next-generation all-optical computing systems.
Methods
Theoretical analysis and numerical simulation.
PyTorch, an open-source Python package widely used for machine learning, was used to construct the theoretical model of our optical neural networks. The calculations were based on a 1D CNN for the equation solver and a 2D CNN for the logic devices and half-adder. PyTorch optimizers applying stochastic gradient descent (SGD) were then used in the learning process to compute the network parameters and to minimize, as far as possible, the loss function that measures the model’s performance. The simulation results were obtained with the finite element method (using the commercial software COMSOL Multiphysics).
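The SGD update rule used in the learning process can be illustrated with a minimal, dependency-free sketch. It is written in plain Python rather than PyTorch and is not the training code used in this work. It fits a hypothetical real-valued two-tap kernel to noiseless synthetic data under a squared-error loss; the target weights, learning rate, and training inputs are all assumptions chosen for illustration.

```python
import random

def sgd_fit(samples, lr=0.1, epochs=200, seed=0):
    """Fit weights w of the linear model y = w[0]*x0 + w[1]*x1 by
    per-sample stochastic gradient descent on the squared error."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the samples in a random order each epoch
        for (x0, x1), target in data:
            err = w[0] * x0 + w[1] * x1 - target
            # Gradient of err**2 with respect to w is 2 * err * x.
            w[0] -= lr * 2.0 * err * x0
            w[1] -= lr * 2.0 * err * x1
    return w

TRUE_W = (0.3, -0.7)  # hypothetical target weights (assumption)
inputs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, -1.0)]
samples = [((x0, x1), TRUE_W[0] * x0 + TRUE_W[1] * x1) for x0, x1 in inputs]

fitted = sgd_fit(samples)
print(fitted)
```

In the actual workflow described above, the same role is played by PyTorch optimizers acting on the 1D and 2D CNN parameters.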
Device fabrication.
Devices were fabricated leveraging standard silicon microfabrication technologies. A 6% hydrogen silsesquioxane (HSQ) electron beam resist was spun onto a double-side polished silicon-on-insulator (SOI) wafer and was patterned by an Elionix ELS-F125 electron beam lithography (EBL) tool. Development of the resist was performed by immersing the chip into 25% tetramethylammonium hydroxide solution for 150 seconds. The chip was subsequently etched in an RIE tool (PlasmaTherm Inc.) with chlorine gas at a power of 200 W and a pressure of 5 mTorr (1 Torr = 133.322 Pa). After stripping the electron beam resist in HF, an additional EBL step was conducted to pattern the waveguide grating couplers with ZEP resist on the same EBL tool (etching depth of grating couplers is different from that of transmission waveguide to obtain higher coupling efficiency). The chip was developed in ZED-N50 developer and etched in the same RIE tool under identical conditions. Finally, the resist was stripped by soaking in N-Methyl-2-Pyrrolidone (NMP) overnight.
Optical measurement.
Devices were tested on a microspectroscopy measurement system. A laser beam from a home-built femtosecond pulse fiber laser system was used as the light source. The laser central wavelength was 1560 nm, with a repetition rate of 100 MHz and a pulse width of 80 fs (the results are stable across the wavelength broadening of the femtosecond pulse). The signal light, with adjustable spot size, was focused onto the input-coupling port of the sample. The output signal was collected with a long-working-distance objective lens (Mitutoyo 20, NA = 0.58) and imaged onto a charge-coupled device (CCD) camera (Xenics, XS-4407, Belgium).
Kun Liao, Ye Chen, Zhongcheng Yu, Xiaoyong Hu, Xingyuan Wang, Cuicui Lu, Hongtao Lin, Qingyang Du, Juejun Hu, Qihuang Gong. All-optical computing based on convolutional neural networks[J]. Opto-Electronic Advances, 2021, 4(11): 200060-1
Category: Original Article
Received: Sep. 29, 2020
Accepted: Dec. 14, 2020
Published Online: Mar. 16, 2022
The Author Email: Hu Xiaoyong (xiaoyonghu@pku.edu.cn), Wang Xingyuan (wang_xingyuan@mail.buct.edu.cn), Lin Hongtao (hometown@zju.edu.cn)