Advanced Photonics, Volume 7, Issue 1, 016004 (2025)

Training neural networks with end-to-end optical backpropagation

James Spall1,2,†, Xianxin Guo1,2,*, and Alexander I. Lvovsky1,2,*
Author Affiliations
  • 1University of Oxford, Clarendon Laboratory, Oxford, United Kingdom
  • 2Lumai Ltd., Wood Centre for Innovation, Oxford, United Kingdom
    Figures & Tables (5)
    Figure 1. Illustration of optical training. (a) Network architecture of the ONN used in this work, which consists of two fully connected linear layers and a hidden layer. (b) Simplified experimental schematic of the ONN. Each linear layer performs optical MVM with a cylindrical lens and an SLM that encodes the weight matrix. Hidden-layer activations are computed using SA in an atomic vapor cell. Light propagates in both directions during optical training. (c) Working principle of the SA activation. The forward beam (pump) is shown by solid red arrows and the backward beam (probe) by purple wavy arrows. The probe transmission depends on the strength of the pump and approximates the gradient of the SA function. For high forward intensity (top panel), a large fraction of the atoms is excited to the upper level; stimulated emission from these atoms largely compensates for the absorption due to the atoms in the ground level. For a weak pump (bottom panel), the excited-level population is small and the absorption is significant. (d) NN training procedure. (e) Optical training procedure. Signal and error propagation in the two directions are both implemented fully optically; the loss-function calculation and parameter update are left to electronics without interrupting the optical information flow.
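    The training procedure of panels (d) and (e) can be written out as a short digital sketch. The NumPy code below is a minimal stand-in, not the calibrated experimental system: the `sa` activation uses a generic two-level saturable-absorber transmission with assumed parameters `alpha` and `i_sat`, and the 2-5-2 layer sizes follow Table 1. The backward pass computes exactly the quantities the optics supply in hardware: the transposed matrix-vector product (MVM-2b) and the activation gradient delivered by the probe transmission.

```python
import numpy as np

def sa(z, alpha=2.0, i_sat=1.0):
    """Stand-in SA activation: a two-level absorber whose transmission
    saturates as the pump intensity |z|^2 grows (assumed model)."""
    return z * np.exp(-alpha / (1.0 + z**2 / i_sat))

def sa_grad(z, eps=1e-6):
    """Slope of the SA activation; in the experiment this is the quantity
    the weak backward probe measures through the pumped vapor cell."""
    return (sa(z + eps) - sa(z - eps)) / (2 * eps)

rng = np.random.default_rng(0)
w1 = rng.normal(scale=0.5, size=(5, 2))   # first layer: 2 inputs -> 5 hidden
w2 = rng.normal(scale=0.5, size=(2, 5))   # second layer: 5 hidden -> 2 outputs
lr = 0.01                                 # learning rate (Table 1, "Rings")

def train_step(x, target):
    """One forward/backward pass with an electronic parameter update."""
    global w1, w2
    z1 = w1 @ x                # MVM-1: first layer, forward
    a1 = sa(z1)                # SA activation in the vapor cell
    y = w2 @ a1                # MVM-2a: second layer, forward
    err2 = y - target          # output error (computed electronically)
    # MVM-2b: the error travels backward through the second layer; the
    # probe transmission multiplies it by the activation gradient.
    err1 = (w2.T @ err2) * sa_grad(z1)
    w2 -= lr * np.outer(err2, a1)   # electronic parameter updates
    w1 -= lr * np.outer(err1, x)
    return 0.5 * np.sum(err2**2)    # quadratic loss (assumed form)
```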
    Figure 2. Multilayer ONN characterization. (a) Scatterplots of measured-against-theory results for MVM-1 (first layer, forward), MVM-2a (second layer, forward), and MVM-2b (second layer, backward). All three MVM results are taken simultaneously. Histograms of the signal and noise error for each MVM are displayed underneath. (b) First-layer activations a_meas^(1) measured after the vapor cell, plotted against the theoretically expected linear MVM-1 output z_theory^(1) before the cell. The green line is a best-fit curve of the theoretical SA nonlinear function. (c) Amplitude of a weak constant probe passed backward through the vapor cell, as a function of the pump z_theory^(1). Measurements for the forward and backward beams are taken simultaneously.
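    Panel (c) illustrates that the weak-probe transmission approximates the gradient of the SA function; this can be checked numerically with the same assumed two-level model used in the sketch above (`alpha` and `i_sat` are illustrative values, not the calibrated cell parameters):

```python
import numpy as np

alpha, i_sat = 2.0, 1.0
z = np.linspace(0.0, 3.0, 200)                    # pump amplitude (a.u.)
t_pump = np.exp(-alpha / (1.0 + z**2 / i_sat))    # saturated transmission
a1 = z * t_pump                                   # SA activation output

exact_grad = np.gradient(a1, z)    # true slope of the activation
probe = t_pump                     # what a weak constant probe transmits

# The two curves track each other closely for this model.
print(np.corrcoef(exact_grad, probe)[0, 1])
```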
    Figure 3. Optical training performance. (a) Decision-boundary charts of the ONN inference output for three different classification tasks, after the ONN has been trained optically (top) or in silico (bottom). (b) Learning curves of the ONN for classification of the “Rings” dataset, showing the mean and standard deviation of the validation loss and accuracy averaged over five repeated training runs. Shown above are decision-boundary charts of the ONN output for the test set after different epochs. (c) Evolution of the output neuron values and output errors for the training-set inputs of the two classes. (d) Comparison between optically measured and digitally calculated gradients. Each panel shows the gradients for one of the 10 weight matrix elements.
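    The digitally calculated gradients in panel (d) follow from ordinary backpropagation on the 2-5-2 model. Reusing `sa`, `sa_grad`, `w1`, and `w2` from the sketch under Figure 1, a finite-difference check (purely illustrative) confirms that the outer-product expression used there is the true loss gradient:

```python
x, target = np.array([0.3, -0.7]), np.array([1.0, 0.0])

def loss(w1_, w2_):
    y = w2_ @ sa(w1_ @ x)
    return 0.5 * np.sum((y - target) ** 2)

# Backprop (chain-rule) gradient of the loss with respect to w1.
z1 = w1 @ x
err2 = w2 @ sa(z1) - target
analytic = np.outer((w2.T @ err2) * sa_grad(z1), x)

# Brute-force finite differences over all 10 elements of w1.
eps = 1e-6
numeric = np.zeros_like(w1)
for i in range(w1.shape[0]):
    for j in range(w1.shape[1]):
        dw = np.zeros_like(w1)
        dw[i, j] = eps
        numeric[i, j] = (loss(w1 + dw, w2) - loss(w1 - dw, w2)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))   # tiny, ~1e-8 or below
```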
    • Table 1. Summary of network architecture and hyperparameters used in both optical and digital training. (A training-loop sketch using the “Rings” settings appears after Table 2.)

      Dataset | Input neurons | Hidden neurons | Output neurons | Learning rate | Epochs | Batches per epoch | Batch size
      Rings   | 2             | 5              | 2              | 0.01          | 16     | 20                | 20
      XOR     | 2             | 5              | 2              | 0.005         | 30     | 20                | 20
      Arches  | 2             | 5              | 2              | 0.01          | 25     | 20                | 20
    • Table 2. Generalization of the optical training scheme.

      Network layer   | Function       | Implementation example
      Linear layer    | MVM            | Free-space optical multiplier; photonic crossbar array
      Linear layer    | Diffraction    | Programmable optical mask
      Linear layer    | Convolution    | Lens Fourier transform
      Nonlinear layer | SA             | Atomic vapor cell; semiconductor absorber; graphene
      Nonlinear layer | Saturable gain | EDFA; SOA; Raman amplifier
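    As a usage note, the Table 1 hyperparameters map directly onto the training-step sketch under Figure 1: the “Rings” row calls for 16 epochs of 20 batches of 20 samples at a learning rate of 0.01. The dataset generator below is a hypothetical stand-in (one class inside a ring, one outside); the paper's actual samples are not reproduced here, `train_step` and the imports come from the earlier sketch, and updates are applied per sample for brevity rather than averaged over each batch.

```python
rng = np.random.default_rng(2)

def make_rings(n):
    """Hypothetical 'Rings'-style data: label by distance from the origin."""
    x = rng.uniform(-1.0, 1.0, size=(n, 2))
    labels = (np.linalg.norm(x, axis=1) > 0.5).astype(int)
    return x, np.eye(2)[labels]          # one-hot targets, 2 output neurons

lr, epochs, batches, batch_size = 0.01, 16, 20, 20   # Table 1, "Rings" row
for epoch in range(epochs):
    for _ in range(batches):
        xb, tb = make_rings(batch_size)
        for x_i, t_i in zip(xb, tb):     # per-sample updates (simplification)
            train_step(x_i, t_i)
```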
    Citation

    James Spall, Xianxin Guo, and Alexander I. Lvovsky, “Training neural networks with end-to-end optical backpropagation,” Adv. Photon. 7, 016004 (2025).
    Paper Information

    Category: Research Articles

    Received: Aug. 7, 2024

    Accepted: Dec. 11, 2024

    Posted: Dec. 11, 2024

    Published Online: Feb. 10, 2025

    Author emails: Xianxin Guo (xianxin.guo@lumai.co.uk), Alexander I. Lvovsky (alex.lvovsky@physics.ox.ac.uk)

    DOI: 10.1117/1.AP.7.1.016004

    CSTR: 32187.14.1.AP.7.1.016004
