End-to-end all-optical nonlinear activator enabled by a Brillouin fiber amplifier

Caihong Teng; Qihao Sun; Shengkun Chen; Yixuan Huang; Lingjie Zhang; Aobo Ren; Jiang Wu

doi:10.1364/PRJ.559966

1. INTRODUCTION

As transistor miniaturization approaches its physical limits, traditional computing hardware will struggle to meet the computing power and energy consumption requirements of artificial neural networks (ANNs) during training and inference [1]. Benefiting from the intrinsic high speed and parallelism of optics, optical neural networks (ONNs), as a novel computing paradigm, are expected to play a significant role in enabling ANN models for computation and signal processing [1–3]. Existing ONNs mainly rely on optoelectronic hybrid computing, where linear and nonlinear operations are realized in the optical and electrical domains, respectively. In this process, frequent optical-electrical-optical conversions increase system latency, limit response bandwidth, and consume additional power, thus undermining the most significant advantages of optical computing [4–6]. Therefore, an all-optical approach is needed to enable large-scale, high-speed ONNs.

One of the challenges in realizing all-optical neural networks (AONNs) is the lack of scalable optical nonlinear devices, as linear operations alone cannot meet computational demands [7]. Several schemes have been proposed to enable the necessary nonlinear processes, including electro-absorption modulators [8], Fabry–Perot lasers [9], saturable absorbers [10], micro-ring resonators [11], and electromagnetically induced transparency [12]. A common limitation of these approaches is the reduction in transmission power after interaction with the nonlinear devices, which hinders the scalability of ONNs. Thus, it is essential to develop nonlinear activators that are compatible with computational systems. Optical phenomena, particularly stimulated Brillouin scattering (SBS), offer a promising solution. SBS is an efficient nonlinear amplification mechanism that delivers significant gain with relatively low pump power [13]. However, the application of this mechanism in ONNs has yet to be fully validated.

In this work, we present an all-optical nonlinear activator based on the Brillouin fiber amplifier (NABA) to overcome the challenges associated with transmission loss and nonlinear operations. The core of our approach is to leverage the transfer features of Brillouin scattering for efficient signal amplification. We analyze the optical characteristics of NABA under various experimental conditions and derive the corresponding nonlinear activation functions (NAFs). Experimental results illustrate that our device exhibits exceptional optical nonlinear properties, a broad response bandwidth (over 40 GHz), an ultra-low threshold (24 nW), high stability, and superior energy transfer efficiency (ETE). On this basis, various deep learning tasks are built to validate the effectiveness of the experiment-based activation model. Specifically, the classification accuracies for the MNIST and Fashion-MNIST data are 97.64% and 87.84%, respectively, while the symbol error rate drops to 0% in the regression task. Moreover, the ends of the NABA are equipped with standard single-mode fibers (SMFs), facilitating seamless integration with existing optical computing chips. Therefore, this work contributes to the realization of intelligent, high-speed, and low-power AONNs.

2. ALL-OPTICAL HARDWARE ACCELERATOR

Figure 1(a) illustrates the schematic diagram of the proposed AONN, which consists of an input layer, multiple hidden layers, and an output layer. Each ONN layer incorporates both linear and nonlinear operations, and layers can be cascaded to build ANNs of arbitrary dimension and depth [14]. It has been shown theoretically that any real-valued matrix can be decomposed into a product of diagonal and unitary matrices through singular value decomposition, which enables optical matrix multiplication with an interference structure [16], as shown in Fig. 1(b). In this configuration, the weight coefficient of each neuron is updated by controlling the voltages applied to the two thermal tuning electrodes of the Mach–Zehnder interferometer (MZI) [14], as shown in Fig. 1(d). After the optical linear operations, the optical signals pass through the optical nonlinearity unit (ONU) for nonlinear processing, as shown in Fig. 1(c). During this process, the neurons iteratively process information layer by layer using learnable parameters $w$ [14]. The output of the ONU is the final calculated result, that is, $I_{out} = f(I_{in})$, where $f(\cdot)$ is the NAF, and $I_{in}$ and $I_{out}$ are the input and output light intensities, respectively.

Figure 1. General architecture of the AONN [14,15]. (a) Decomposition of the neural network into sequential cascades of optical interference and nonlinearity units. $X$ and $Y$ denote the input and output signal vectors. $W_{i}$ and $f_{i}$ represent the weight coefficient and activation function of the $i$-th hidden layer. (b) Optical interference unit (OIU) structure for matrix multiplication. For linear operations, a real-valued matrix $M$ may be decomposed as $M = U \Sigma V^{\dagger}$, where $U$ is an $m \times m$ unitary matrix, $\Sigma$ is an $m \times n$ diagonal matrix, and $V^{\dagger}$ is the conjugate transpose of the $n \times n$ unitary matrix $V$ [14]. Each unitary matrix can then be mapped to an MZI network arranged in a triangular or rectangular mesh [15]. (c) ONU structure for nonlinear activation. Information propagates through an OIU followed by an ONU. (d) Typical MZI structure. The $2 \times 2$ unitary matrix transformation is achieved by adjusting the phase of each MZI. (e) Typical NABA framework. The core is the energy conversion from pump light to backscattered waves, and its efficiency is related to the pump power.
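As a minimal sketch of the $2 \times 2$ building block in Fig. 1(d), the snippet below composes one common MZI parameterization (two 50:50 couplers, an internal phase `theta`, and an external phase `phi`) and checks numerically that the result is unitary. The coupler convention, the function names, and the phase values are assumptions chosen for illustration and are not taken from the chip described in [14].

```python
import cmath
import math

def matmul2(a, b):
    """Multiply two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mzi_matrix(theta, phi):
    """Transfer matrix of an MZI: phase phi, coupler, phase theta, coupler."""
    s = 1 / math.sqrt(2)
    coupler = [[s, 1j * s], [1j * s, s]]           # assumed symmetric 50:50 coupler
    inner = [[cmath.exp(1j * theta), 0], [0, 1]]   # internal phase shifter
    outer = [[cmath.exp(1j * phi), 0], [0, 1]]     # external phase shifter
    return matmul2(coupler, matmul2(inner, matmul2(coupler, outer)))

def is_unitary(u, tol=1e-12):
    """Check U U^dagger = I element by element."""
    u_dag = [[u[j][i].conjugate() for j in range(2)] for i in range(2)]
    prod = matmul2(u, u_dag)
    return all(abs(prod[i][j] - (1 if i == j else 0)) < tol
               for i in range(2) for j in range(2))

U = mzi_matrix(theta=0.7, phi=1.3)   # arbitrary example phases
print(is_unitary(U))
```

With this convention the power routed to the first output port varies as $\sin^{2}(\theta / 2)$, which is how tuning the internal phase sets an effective weight.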

Implementing nonlinear operations on optical platforms is essential for the training and decision-making processes in ONNs. The essential structure of the prepared NABA is depicted in Fig. 1(e). Herein, a narrow linewidth laser (NLL) serves as the pump source to drive the nonlinear processes. An optical circulator (OCI) directs the transmission path. A fixed-length SMF acts as the nonlinear medium to generate optical nonlinearities, while an optical isolator (ISO) ensures unidirectional propagation of the signal. In this setup, the electrostrictive effect causes periodic modulation of the medium refractive index, leading to mutual gain between the acoustic and scattered waves. This positive feedback mechanism results in an exponential gain of the scattered light, forming a backward Stokes wave that is downshifted relative to the pump light frequency, thus achieving effective signal amplification [17].

In theory, the Brillouin scattering dynamics in SMF can be described by the classical parametric-coupling model [18]. Previous studies have demonstrated that when fiber losses are considered, no exact analytical solution exists, and the system behavior can only be represented through numerical fitting [19]. More details are provided in Appendix A. Accounting for the saturation properties of Brillouin scattering, the intensity transmission of NABA in the analog domain can be modeled as [20]

$T(I_{in}) = \frac{A_{1}}{1 + {(I_{in} / I_{s})}^{p}} + A_{2},$ (1)

where $I_{in}$ and $I_{s}$ are the input optical and saturation intensities, respectively, $p$ is the steepness of the transition curve, and $A_{1}$ and $A_{2}$ are the saturable and non-saturable absorption components, respectively. Note that these parameters are fitted to the experimental data.

Ultimately, the activation model of the prepared activator can be expressed as $I_{out} = T (I_{in}) \times I_{in} .$ (2)

The ONU is achieved using the proposed NABA, which can be directly cascaded into the existing optical computing chips [14]. For a given injection intensity, the output intensity can be described by the nonlinear function in Eq. (2). Note that the above model is an objective quantization of NABA, depending on the physical properties of the prepared device.
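The model in Eqs. (1) and (2) can be sketched in a few lines of code. The coefficient values below are hypothetical placeholders chosen only to make the saturation behavior visible; they are not the fitted device values, and the sketch assumes that `i_in` and `i_s` are expressed in the same unit.

```python
def transmission(i_in, a1, a2, i_s, p):
    """Eq. (1): saturable transmission T(I_in)."""
    return a1 / (1.0 + (i_in / i_s) ** p) + a2

def activation(i_in, a1, a2, i_s, p):
    """Eq. (2): output intensity I_out = T(I_in) * I_in."""
    return transmission(i_in, a1, a2, i_s, p) * i_in

# Hypothetical coefficients for illustration only (not the fitted values).
params = dict(a1=10.0, a2=1.0, i_s=1e-6, p=1.0)

for i_in in (1e-9, 1e-6, 1e-3):          # input intensity, same unit as i_s
    print(i_in, transmission(i_in, **params), activation(i_in, **params))
```

At low input the transmission approaches $A_{1} + A_{2}$, at $I_{in} = I_{s}$ it equals $A_{1} / 2 + A_{2}$, and at high input it settles toward $A_{2}$, which is the saturation behavior the model is meant to capture.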

3. PERFORMANCE CHARACTERIZATION

A. Dynamic Transmission Characteristics

The experimental setup used to verify the dynamic behavior of the prepared device is shown in Appendix B.1. In this configuration, the microwave signal source output is first disabled to analyze the energy transfer features of NABA. As shown in Fig. 2(a), the Stokes wave is shifted by about 0.09 nm toward longer wavelengths, corresponding to a Brillouin frequency downshift of approximately 11.25 GHz. With the help of the Brillouin fiber amplifier (BFA), the input signal is significantly enhanced. It should be emphasized that effective amplification requires the signal carrier to align with the Brillouin frequency shift.
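The correspondence between the 0.09 nm wavelength shift and a frequency shift of roughly 11 GHz follows from the small-shift relation $\Delta \nu \approx c \, \Delta \lambda / \lambda^{2}$. The sketch below evaluates it; the 1550 nm pump wavelength is an assumption, since the pump wavelength is not stated in this section.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def wavelength_shift_to_frequency(delta_lambda_m, center_lambda_m):
    """Small-shift approximation: delta_nu = c * delta_lambda / lambda^2."""
    return C * delta_lambda_m / center_lambda_m ** 2

# 0.09 nm is from the text; the 1550 nm pump wavelength is an assumption.
shift_hz = wavelength_shift_to_frequency(0.09e-9, 1550e-9)
print(f"{shift_hz / 1e9:.2f} GHz")
```

Under that assumption the result is about 11.2 GHz, in line with the value quoted above given the precision of the 0.09 nm reading.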

Figure 2. NABA dynamic performance. (a) Basic mechanism of BFAs. Nonlinear amplification of the signal is realized during the energy conversion process. Time-domain waveforms of the demodulated signal at two different frequencies, (b) 100 MHz and (c) 40 GHz. For observation, data spanning multiple cycles are captured. Herein, the experiment and reference signals are data with and without NABA processing, respectively.

The response bandwidth of the proposed nonlinear activator is subsequently examined. After passing through the NABA, the output spectra of the demodulated signals at various frequencies are measured (see Appendix B.1 for details). Figures 2(b) and 2(c) illustrate the device performance under two extreme experimental conditions. For observation, the collected signals with multiple periods are normalized. Experimental results indicate that: (I) the prepared activator exhibits no significant distortion within a 40-GHz bandwidth, which is several orders of magnitude higher than the previous 100 kHz [11]; (II) due to the response characteristics of the photoelectric conversion device, the output noise increases as the signal frequency decreases; (III) the performance of the device is influenced by experimental conditions. Specifically, the bandwidths of the photodetector (PD) and Mach–Zehnder modulator (MZM) are both 40 GHz, which restricts the capability of the proposed activator. These results demonstrate that our activator supports dynamic data transmission.

B. Static Transmission Characteristics

We then analyze the static traits of the fabricated devices to develop an experiment-based activation model. To do this, a simple double-balance detection experimental setup is built, as shown in Appendix B.2. During the experiment, the performance of NABA is closely related to the pump source. By adjusting the pump power, the prepared activator is made to be in different excitation states to achieve the desired nonlinear operation.

The nonlinear amplification curves of NABA at three typical excitation states, 5.37 mW (M1), 9.18 mW (M2), and 37.08 mW (M3), are presented in Figs. 3(a)–3(c). For observation, the optical characteristics of the prepared device are measured over an input range from −54.57 to 10.38 dBm. It can be seen that: (I) the transmission of the NABA decreases sharply at a specific threshold power and becomes almost constant at higher powers, exhibiting a typical saturation phenomenon; this behavior aligns with the nonlinear model described in Eq. (1); (II) the performance of the proposed nonlinear activator is sensitive to the pump power; in the strong-pump regime, the BFA exhibits greater amplification than in the weak-pump regime; (III) different NAFs can be achieved by setting the pump intensity, indicating that a single pump source can simultaneously drive multiple devices.
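Because the measurement range is quoted in dBm while the pump and output powers are quoted in mW, a short conversion helper may be useful when comparing the two. The sketch below uses the standard definition of dBm (power referenced to 1 mW); the function names are illustrative.

```python
import math

def dbm_to_mw(p_dbm):
    """Convert optical power from dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)

def mw_to_dbm(p_mw):
    """Convert optical power from milliwatts to dBm."""
    return 10.0 * math.log10(p_mw)

low_mw = dbm_to_mw(-54.57)    # lower end of the measured input range
high_mw = dbm_to_mw(10.38)    # upper end of the measured input range
print(f"{low_mw * 1e6:.2f} nW to {high_mw:.2f} mW")
```

The range therefore spans from a few nanowatts to roughly ten milliwatts, that is, about 65 dB of input dynamic range.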

Figure 3. NABA static performance. (a)–(c) Mapping curves of amplification factor versus input power under different pump powers; all curves show obvious nonlinear effects. Nonlinear mapping model under three different pump powers of (d) 5.37 mW, (e) 9.18 mW, and (f) 37.08 mW. The experimental data (dots) agree closely with the theoretical analysis (curve).

The above nonlinear amplification process is then mapped to the transmission response of the device, and the output is obtained by multiplying the transmission by the input power, as depicted in Figs. 3(d)–3(f). The experimental results show the following. (I) The performance of the BFA improves with increasing pump power. Specifically, for an injection power of 10.39 mW, the output powers for M1, M2, and M3 are 0.89 mW, 1.28 mW, and 15.15 mW, and the corresponding ETEs are 8.57%, 12.32%, and 145.81%, respectively. (II) Our activator exhibits high robustness and stability, with only minimal perturbations. For instance, the standard deviations (SDs) of the perturbations at 10.39 mW for M1, M2, and M3 are $0.54 \times 10^{-3}$, $1.89 \times 10^{-3}$, and $9.71 \times 10^{-3}$, respectively. These small perturbations, likely caused by the experimental conditions, attest to the stability of our device. (III) The nonlinearity of the experiment-based model is enhanced with increasing pump power. Here, the standard mean square error (MSE) function is introduced to quantify the error between the experimental data and the linear model [21]. The error values $δ$ for M1, M2, and M3 are $1.42 \times 10^{-7}$, $1.52 \times 10^{-4}$, and $4.05 \times 10^{-2}$, respectively, and a larger $δ$ implies stronger nonlinearity. Note that the 10.39-mW input power is chosen as a benchmark because it represents the minimum functionality of the prepared device, both in terms of amplification performance and stability. The above results provide key data to support the quantification of activator performance.
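The quoted ETE values can be reproduced from the input and output powers if ETE is taken as the ratio of output to input power. That definition is an assumption inferred from the numbers rather than one stated explicitly in this section; the sketch below checks it.

```python
def energy_transfer_efficiency(p_out_mw, p_in_mw):
    """ETE as the ratio of output to input power, in percent (assumed definition)."""
    return 100.0 * p_out_mw / p_in_mw

P_IN_MW = 10.39                                        # benchmark injection power
outputs_mw = {"M1": 0.89, "M2": 1.28, "M3": 15.15}     # output powers from the text

ete = {name: energy_transfer_efficiency(p_out, P_IN_MW)
       for name, p_out in outputs_mw.items()}
for name, value in ete.items():
    print(name, f"{value:.2f}%")
```

An ETE above 100%, as for M3, simply means that the output exceeds the injected signal power because energy is transferred from the pump.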

Table 1 summarizes the detailed parameters of several representative nonlinear models (see Appendix B.2 for a discussion of the key nonlinear parameters of NABA). It should be emphasized that no preprocessing is applied during the experiments, and the final experimental data represent the average results from multiple repeated trials. Therefore, the activation model derived from these results accurately reflects the actual device function.

Table 1. Nonlinear Coefficients for Three Typical Activation Models

		Nonlinear Coefficients
Model	Pump (mW)	$A_{1}$	$A_{2}$	$I_{s}$	$p$	ETE	SD
M1	5.37	45.54	−10.82	24 nW	0.48	8.57%	$0.54 \times 10^{-3}$
M2	9.18	98.12	−23.61	0.65 μW	0.19	12.32%	$1.89 \times 10^{-3}$
M3	37.08	349.41	−137.86	2.62 μW	0.05	145.81%	$9.71 \times 10^{-3}$

Table 2 summarizes the key parameters of the ONN-related nonlinear units. Our work demonstrates clear advantages over previous studies, as outlined below. (I) All-optical framework: our approach fully leverages the inherent features of NABA to perform nonlinear operations without the need for optical-to-electrical conversion, thereby enhancing the overall performance of the ONN. (II) Ultra-low threshold: the proposed NABA threshold is as low as 24 nW, enabling the excitation of nonlinear states with minimal input. (III) High bandwidth: for practical hardware accelerators, the computing power of an ONN is directly proportional to the transmission bandwidth. Therefore, a higher bandwidth is crucial for fully exploiting the potential of optical computing. This work represents an optimization of the previous scheme based on the stimulated Brillouin scattering activator [22]. Compared to prior research, our approach offers notable improvements in transmission bandwidth, saturation threshold, and transmission efficiency, thereby providing key device support for the realization of multi-layer cascaded AONNs.

Table 2. Performance Comparison of ONN-Related Nonlinear Activators

Structure	Type	Integrated	Reconfigurable	Threshold	Bandwidth	Reference
MZM	Opto-electronic	Yes	Yes	0.1 mW	75 MHz	[23]
Photodiode	Opto-electronic	Yes	No	1.1 mW	20 GHz	[15]
SOA^a	Opto-electronic	Yes	No	1 mW	10 GHz	[24]
PPLN^b	All-optical	Yes	No	4 μW	250 MHz	[25]
DFB-LD^c	All-optical	No	No	26 μW	1 GHz	[9]
MoTe$_{2}$/OWG^d	All-optical	Yes	No	0.94 μW	2.08 THz	[26]
Fano lasers	All-optical	Yes	No	0.5 mW	1 GHz	[27]
MRR^e	All-optical	Yes	Yes	0.74 mW	100 kHz	[11]
SBS	All-optical	No	No	2.29 mW	11.24 GHz	[22]
MRR	All-optical	Yes	Yes	3.16 mW	1 GHz	[28]
Ge/Si	All-optical	Yes	No	5.1 mW	70 MHz	[29]
Si/graphene	All-optical	Yes	No	5.49 mW	10 GHz	[30]
NABA	All-optical	No	No	24 nW	>40 GHz	This work

^a SOA, semiconductor optical amplifier.

^b PPLN, periodically poled lithium niobate.

^c DFB-LD, distributed feedback laser diode.

^d OWG, optical waveguide.

^e MRR, microring resonator.

At this point, the optical characteristics of NABA have been discussed in detail. Our activator features an ultra-low threshold, wide response bandwidth, strong robustness, and high ETE. These advantages provide technical support for the physical realization of AONNs. However, there are still areas for improvement in our scheme. (I) Transmission latency: the present design employs a 20-km SMF as the gain medium, resulting in a signal delay exceeding 90 μs. This latency could be significantly reduced by utilizing alternative media with higher nonlinear coefficients, such as photonic crystal fibers [31] or highly nonlinear fibers [32], which would enable equivalent nonlinear operations over shorter distances. (II) Reconfigurability: like most all-optical nonlinear activators that rely on light-matter interactions, the optical properties of NABA become fixed after fabrication, limiting flexibility for ONN applications. However, our experiments reveal that adjusting the pump power modifies the nonlinear amplification characteristics, thereby enabling partial reconfiguration of the activation model. (III) Integrability: while the fabricated NABA can be directly connected to existing computing chips, challenges remain in developing fully integrated standalone devices. Notably, inducing and suppressing SBS in chip-level devices has been demonstrated, which offers promising pathways for realizing on-chip NABA implementations [33–35]. Owing to experimental constraints, this work is left for future research. In the next section, we consider representative deep learning tasks to showcase the performance of our activator.
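The latency figure in point (I) can be checked with a one-line propagation-delay estimate, $t = n_{g} L / c$. In the sketch below the group index of 1.468 is an assumed typical value for standard SMF near 1550 nm and is not taken from the text.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def fiber_delay_s(length_m, group_index=1.468):
    """Propagation delay through a fiber; the default group index is an assumption."""
    return length_m * group_index / C

delay = fiber_delay_s(20e3)          # 20-km SMF from the text
print(f"{delay * 1e6:.1f} us")
```

Since the delay scales linearly with length, a medium that provides the same Brillouin gain in a tenth of the length would cut the latency by the same factor.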

4. CASE STUDY

A. Classification Task

To evaluate the performance of our activator, an ONN with saturating nonlinearity is first compared with the state-of-the-art ANN. For illustration, the standard MNIST and Fashion-MNIST databases are used as benchmarks [10]. Each dataset consists of 70,000 grayscale images, comprising 60,000 training and 10,000 test samples. Furthermore, all data are normalized to follow the same distribution, with the intensity center of gravity positioned at the center of the $28 \times 28$-pixel images [36]. Then, a simple network architecture is utilized for the classification tasks, as shown in Fig. 4(a). The proposed network is a two-layer fully connected framework. The input layer consists of 784 neurons, corresponding to the $28 \times 28$ inputs. The hidden layer contains 128 neurons, resulting in $784 \times 128$ connections. The output layer has 10 neurons, representing the 10 output classes for handwritten digits or fashion items.
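The size of this network can be tallied directly from the layer widths given above. The sketch below counts only the connection weights; bias terms are left out because the text does not say whether they are used.

```python
# Layer widths taken from the text: 784 inputs, 128 hidden neurons, 10 outputs.
layer_sizes = [784, 128, 10]

def count_weights(sizes):
    """Number of connections between each pair of consecutive layers."""
    return [n_in * n_out for n_in, n_out in zip(sizes, sizes[1:])]

per_layer = count_weights(layer_sizes)
print(per_layer, sum(per_layer))
```

Nearly all of the weights sit between the input and hidden layers, which is the part an optical interference unit would have to implement at the largest scale.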

Figure 4. Performance on image classification. (a) Typical fully connected neural network framework. Learning curves for (b) MNIST and (c) Fashion-MNIST datasets under different NAFs. Herein, experiment-based nonlinear models are compared with existing NAFs. The M3 activation function is used to compute the confusion matrices for the (d) MNIST and (e) Fashion-MNIST datasets.

The objective of training ANNs is to find a set of parameters that minimizes the discrepancy between the ground-truth labels and the predicted outputs [21]. This is achieved by defining the loss function as the MSE, which is optimized using the backpropagation and stochastic gradient descent algorithms [21] (see Appendix C for details). After training, the performance of ANNs is evaluated on different test sets to assess their generalization ability. For comparison, other classical NAFs are used as benchmarks to verify the capability of our activator.
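For reference, the standard textbook forms of the MSE loss and the gradient-descent update named above can be written as follows. This is a generic sketch: the symbols $N$ (number of samples in a batch), $y_{n}$ and $\hat{y}_{n}$ (label and prediction vectors), and $\eta$ (learning rate) are introduced here for illustration, and the exact normalization used in Appendix C may differ.

$L = \frac{1}{N} \sum_{n = 1}^{N} {\| y_{n} - \hat{y}_{n} \|}^{2}, \quad w \leftarrow w - \eta \frac{\partial L}{\partial w} .$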

Figures 4(b) and 4(c) demonstrate the capability of ONNs with different nonlinear models. It can be observed that: (I) M3 performs the best among all NAFs with accuracies of 97.64% and 87.84% on the MNIST and Fashion-MNIST datasets, representing an improvement of 6.97% and 5.59% over the linear model; (II) the experiment-based activation models exhibit excellent performance comparable to the NAFs commonly used in computers, and the training convergence occurs even slightly faster; (III) among the experimental activation models, M1 performs the worst, and M3 performs the best. From M1 to M3, the nonlinear trend of the mapping model gradually increases [depicted in Figs. 3(d)–3(f)], and the stronger nonlinear effect enables the network to fit the target data better.

For visualization, the confusion matrix is used to evaluate the performance of the classification algorithm, as shown in Figs. 4(d) and 4(e). Each row and column of the matrix correspond to instances in the actual and predicted categories, with the diagonal elements representing correctly classified instances [10]. It can be seen that: (I) for the MNIST and Fashion-MNIST classification tasks, the sum of the diagonal elements for the M3 model is 9764 and 8784, corresponding to recognition accuracies of 97.64% and 87.84%; (II) the recognition error rate for Fashion-MNIST is higher than that for MNIST, which can be attributed to the increased complexity of the Fashion-MNIST dataset.
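The relation between the diagonal sum and the accuracy can be made explicit with a small helper. The $3 \times 3$ matrix below is a hypothetical toy example, not data from Fig. 4.

```python
def accuracy_from_confusion(matrix):
    """Accuracy = sum of diagonal entries / sum of all entries."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical 3-class confusion matrix (rows: actual, columns: predicted).
toy = [[8, 1, 1],
       [0, 9, 1],
       [2, 0, 8]]
print(accuracy_from_confusion(toy))   # 25 correct out of 30 samples
```

Applied to the 10,000-sample test sets, diagonal sums of 9764 and 8784 give the accuracies of 97.64% and 87.84% quoted above.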

In this section, we use benchmark datasets commonly used in ANNs to validate the performance of the prepared activators in classification tasks. These results reveal the feasibility of NABA as an ONU. Next, a more challenging deep learning task is considered to explore the limits of the achievable performance using NABA.

B. Regression Task

Optical fiber is widely employed in optical transmission systems (OTSs) thanks to its inherent advantages. Regrettably, optical nonlinearities associated with dispersion can limit the capability of OTSs, including symbol error rate (SER), power level, and transmission bandwidth, especially in long-distance transmission systems [37]. Here, we explore the feasibility of using ONNs to address the above bottlenecks. To do this, an experiment-based OTS is developed to generate realistic training and testing data, as shown in Appendix D.

Our solution consists of three components: pre-processing, a fully connected ANN, and error quantization, as shown in Fig. 5(a). During the pre-processing stage, the original data is first transformed into a one-dimensional vector using standard algorithms. Specifically, the oscilloscope (OSC) captures 100,000 data points per cycle. Down-sampling is applied by selecting every 10th data point, resulting in a 10,000-dimensional column vector. The preprocessed data is then fed into the ANN, which consists of an input layer, a hidden layer, and an output layer. Both the input and output layers contain 10,000 neurons, corresponding to the number of symbols. The hidden layer consists of 256 neurons, thereby reducing network complexity. Finally, a simple decision mechanism is used to quantize the data, where the midrange of the data serves as the threshold. If the input exceeds the threshold, the output is “1”; otherwise, it is “0,” as shown in the error quantization module (EQM). To compare the performance of the nonlinear activation models under the same decision conditions, the decision threshold of the EQM is set to 0.5 in the simulations.
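The pre-processing and decision steps described above can be sketched as follows. The waveform, the stride, and the function names are hypothetical and serve only to illustrate down-sampling by a fixed stride and thresholding at the midrange.

```python
def downsample(samples, stride):
    """Keep every `stride`-th sample, starting from the first."""
    return samples[::stride]

def midrange(values):
    """Midpoint between the largest and smallest value."""
    return (max(values) + min(values)) / 2.0

def decide(values, threshold):
    """Map each value to 1 if it exceeds the threshold, otherwise 0."""
    return [1 if v > threshold else 0 for v in values]

# Hypothetical oversampled waveform with 4 samples per symbol.
raw = [0.1, 0.1, 0.2, 0.1, 0.9, 0.8, 0.9, 0.9,
       0.2, 0.1, 0.1, 0.0, 0.7, 0.9, 0.8, 0.9]
symbols = downsample(raw, stride=4)
bits = decide(symbols, midrange(symbols))
print(symbols, bits)
```

Using the midrange adapts the threshold to the received signal levels, whereas fixing it at 0.5 puts all activation models under the same decision condition.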

Figure 5. Performance on regression task. (a) Data processing flowchart. In this diagram, the preprocessing module downsamples the experimental data, the fully connected ANN implements the regression function, and the error quantization module makes decisions based on the predicted data to yield the final result. SER (b) without and (c)–(e) with neural network processing. For comparison, different activation functions including (c) M1, (d) M2, and (e) M3 are applied in the ONNs.

Figure 5(b) demonstrates the results of directly injecting the preprocessed data into the EQM, which serves as the baseline. In this case, 783 out of 10,000 symbols are misclassified, resulting in an SER of 7.83%. For comparison, the input vector is processed by an ANN before being fed into the EQM. During the training phase, data measured at fiber lengths of 2 m, 3 km, and 10 km are used as training data, while data measured at 20 km are used for testing. Note that the algorithm of this approach follows the classification task, with the primary distinction being that in regression tasks, the output layer neurons do not apply NAF.
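The SER used as the figure of merit here is simply the fraction of decided symbols that differ from the transmitted ones. A minimal sketch, with hypothetical four-symbol sequences for illustration:

```python
def symbol_error_rate(decided, reference):
    """Fraction of positions where the decided symbol differs from the reference."""
    if len(decided) != len(reference):
        raise ValueError("sequences must have the same length")
    errors = sum(1 for d, r in zip(decided, reference) if d != r)
    return errors / len(reference)

# Hypothetical sequences: one error in four symbols.
print(symbol_error_rate([0, 1, 1, 0], [0, 1, 0, 0]))
```

With 783 errors in 10,000 symbols, the same ratio gives the 7.83% baseline reported above.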

Figures 5(c)–5(e) illustrate the distribution of the demodulated signal after processing by the ANNs. It can be seen that: (I) with the help of the regression algorithm, the SER is reduced to 0%, an improvement of 7.83 percentage points over the baseline result; (II) although all NAFs can effectively separate the symbols “0” and “1,” the performance of the ANN differs between nonlinear models; the predicted values of M1 fluctuate the most, followed by M2, and those of M3 fluctuate the least; (III) the predictions of M3 are closest to the original labels, followed by M2, and M1 is the worst. Specifically, the midranges of M1, M2, and M3 are 0.47, 0.49, and 0.50, respectively. This progression illustrates that M3 achieves the best convergence, with its midrange closest to the ideal threshold of 0.5 among the three models. These results show that stronger nonlinearity yields a better fit to the data, again supporting the previous conclusion.

To date, two distinct tasks have been designed to demonstrate the feasibility of the prepared NABA as an ONU. The results confirm that our model meets the requirements of different applications. Of course, the system capabilities can be further improved by optimizing the structure and algorithm, which can be carried out in subsequent stages of the research.

5. CONCLUSION

The lack of scalable optical nonlinearity and the associated losses in photonic devices pose obstacles to the development of ONNs. To overcome these challenges, we propose and demonstrate an end-to-end all-optical nonlinear activator based on a BFA. The key to our approach is leveraging the energy-transfer properties of Brillouin scattering to achieve the desired transformations. Experimental results show that our activator exhibits excellent nonlinear characteristics, an ultra-low threshold, wide bandwidth, strong robustness, and high ETE. These advantages are critical for the realization of ONUs. As a proof of concept, a simulation-based ANN is built to mimic the physical implementation of ONNs and validate the feasibility of using the NABA as a nonlinear unit. Simulations show that our model performs comparably to traditional NAFs in both classification and regression tasks. This work provides strong support for the realization of AONNs.

Category: Nonlinear Optics

Received: Feb. 18, 2025

Accepted: May 9, 2025

Published Online: Jul. 25, 2025

The Author Email: Jiang Wu (jiangwu@uestc.edu.cn)


CSTR:32188.14.PRJ.559966