Photonics Research, Volume. 13, Issue 8, 2145(2025)

End-to-end all-optical nonlinear activator enabled by a Brillouin fiber amplifier

Caihong Teng1, Qihao Sun1, Shengkun Chen2, Yixuan Huang1, Lingjie Zhang2, Aobo Ren1, and Jiang Wu1,*
Author Affiliations
  • 1Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China (UESTC), Chengdu 610054, China
  • 2School of Optoelectronic Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu 610054, China

    The rapid growth of deep learning applications has sparked a revolution in computing paradigms, with optical neural networks (ONNs) emerging as a promising platform for achieving ultra-high computing power and energy efficiency. Despite great progress in analog optical computing, the lack of scalable optical nonlinearities and losses in photonic devices pose considerable challenges for power levels, energy efficiency, and signal latency. Here, we report an end-to-end all-optical nonlinear activator that utilizes the energy conversion of Brillouin scattering to perform efficient nonlinear processing. The activator exhibits an ultra-low activation threshold (24 nW), a wide transmission bandwidth (over 40 GHz), strong robustness, and high energy transfer efficiency. These advantages provide a feasible solution to overcome the existing bottlenecks in ONNs. As a proof-of-concept, a series of tasks is designed to validate the capability of the proposed activator as an activation unit for ONNs. Simulations show that the experiment-based nonlinear model outperforms classical activation functions in classification (97.64% accuracy for MNIST and 87.84% for Fashion-MNIST) and regression (with a symbol error rate as low as 0%) tasks. This work provides valuable insights into the innovative design of all-optical neural networks.

    1. INTRODUCTION

    As transistor miniaturization approaches its physical limits, traditional computing hardware will struggle to meet the computing power and energy consumption requirements of artificial neural networks (ANNs) during training and inference [1]. Benefiting from the intrinsic high speed and parallelism advantages of optics, optical neural networks (ONNs), as a novel computing paradigm, are expected to play a significant role in enabling ANN models for computation and signal processing [1–3]. Existing ONNs mainly rely on optoelectronic hybrid computing methods, where linear and nonlinear operations are realized in the optical and electrical domains, respectively. In this process, frequent optical-electrical-optical conversions increase system latency, limit response bandwidth, and generate additional power consumption, thus undermining the most significant advantages of optical computing [4–6]. Therefore, an all-optical approach is needed to enable large-scale scaling and high-speed applications of ONNs.

    One of the challenges in realizing all-optical neural networks (AONNs) is the lack of scalable optical nonlinear devices, as linear operations alone cannot meet computational demands [7]. Several schemes have been proposed to enable the necessary nonlinear processes, including electro-absorption modulators [8], Fabry–Perot lasers [9], saturable absorbers [10], micro-ring resonators [11], and electromagnetically induced transparency [12]. A common limitation of these approaches is the reduction in transmission power after interaction with the nonlinear devices, which hinders the scalability of ONNs. Thus, it is essential to develop nonlinear activators that are compatible with computational systems. Optical phenomena, particularly stimulated Brillouin scattering (SBS), offer a promising solution. SBS is an efficient nonlinear amplification mechanism that delivers significant gain with relatively low pump power [13]. However, the application of this mechanism in ONNs has yet to be fully validated.

    In this work, we present an all-optical nonlinear activator based on the Brillouin fiber amplifier (NABA) to overcome the challenges associated with transmission loss and nonlinear operations. The core of our approach is to leverage the transfer features of Brillouin scattering for efficient signal amplification. We analyze the optical characteristics of NABA under various experimental conditions and derive the corresponding nonlinear activation functions (NAFs). Experimental results illustrate that our device exhibits exceptional optical nonlinear properties, a broad response bandwidth (over 40 GHz), an ultra-low threshold (24 nW), high stability, and superior energy transfer efficiency (ETE). On this basis, various deep learning tasks are built to validate the effectiveness of the experiment-based activation model. Specifically, the classification accuracies for the MNIST and Fashion-MNIST data are 97.64% and 87.84%, respectively, while the symbol error rate drops to 0% in the regression task. Moreover, the ends of the NABA are equipped with standard single-mode fibers (SMFs), facilitating seamless integration with existing optical computing chips. Therefore, this work contributes to the realization of intelligent, high-speed, and low-power AONNs.

    2. ALL-OPTICAL HARDWARE ACCELERATOR

    Figure 1(a) illustrates the schematic diagram of the proposed AONNs, consisting of an input layer, multiple hidden layers, and an output layer. Each ONN layer incorporates both linear and nonlinear operations that can be cascaded to achieve ANNs of arbitrary dimensions and depth [14]. It has been theoretically shown that any real-valued matrix can be decomposed into the product of diagonal and unitary matrices through singular value decomposition, facilitating optical matrix multiplication via an essential interference structure [16], as shown in Fig. 1(b). In this configuration, the weight coefficient of each neuron can be updated by controlling the voltage applied to the two thermal tuning electrodes of the Mach–Zehnder interferometer (MZI) [14], as shown in Fig. 1(d). After the optical linear operations, the optical signals pass through the optical nonlinearity unit (ONU) for nonlinear processing, as shown in Fig. 1(c). During this process, the neuron iteratively processes information layer by layer using learnable parameters w [14]. The output of the ONU is the final calculated result, that is, Iout = f(Iin), where f(·) is the NAF, and Iin and Iout are the input and output light intensities.
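
    As a minimal illustration of why a phase-tuned MZI can serve as the 2×2 unitary building block of the interference unit, the sketch below composes an MZI from two couplers and two phase shifters and checks that the result is unitary. The coupler convention, the placement of the phase shifters, and all function names are assumptions of this sketch (idealized, lossless components), not details taken from the device in Fig. 1(d).

```python
import cmath
import math

def matmul2(a, b):
    """Multiply two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger2(a):
    """Conjugate transpose of a 2x2 complex matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def phase_shifter(phi):
    """Phase shift phi applied to the upper arm only (assumed layout)."""
    return [[cmath.exp(1j * phi), 0], [0, 1]]

# Ideal lossless 50:50 directional coupler (assumed convention).
COUPLER = [[1 / math.sqrt(2), 1j / math.sqrt(2)],
           [1j / math.sqrt(2), 1 / math.sqrt(2)]]

def mzi(theta, phi):
    """Light sees: external phase phi, coupler, internal phase theta, coupler."""
    m = matmul2(COUPLER, phase_shifter(theta))
    m = matmul2(m, COUPLER)
    return matmul2(m, phase_shifter(phi))

def is_unitary(u, tol=1e-12):
    """Check that (u^dagger) u equals the identity within tol."""
    prod = matmul2(dagger2(u), u)
    return all(abs(prod[i][j] - (1 if i == j else 0)) < tol
               for i in range(2) for j in range(2))

U = mzi(theta=0.7, phi=1.3)
print(is_unitary(U))
```

    In this convention, theta = 0 gives the cross state and theta = π gives the bar state, so the internal phase sets the power splitting while the external phase sets the relative phase of the outputs.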

    Figure 1. General architecture of the AONN [14,15]. (a) Decomposition of the neural networks into sequential cascades of optical interference and nonlinearity units. X and Y denote the input and output signals in vectors. Wi and fi represent the weight coefficient and activation function of the i-th hidden layer. (b) Implementing the optical interference unit (OIU) structure for matrix multiplication. For linear operations, real-valued matrix M may be decomposed as M=UΣV†, where U is an m×m unitary matrix, Σ is an m×n diagonal matrix, and V† is the conjugate transpose of the n×n unitary matrix V [14]. The unitary matrix can then be equated to an MZI network in a triangular or rectangular mesh [15]. (c) Implementing the ONU structure for nonlinear activation. Information propagates by an OIU followed by the application of an ONU. (d) Typical MZI structure. The 2×2 unitary matrix transformation is achieved by adjusting the phase of each MZI. (e) Typical NABA framework. The core is the energy conversion from pump light to backscattered waves, and its efficiency is related to the pump power.

    Implementing nonlinear operations on optical platforms is essential for the training and decision-making processes in ONNs. The essential structure of the prepared NABA is depicted in Fig. 1(e). Herein, a narrow linewidth laser (NLL) serves as the pump source to drive the nonlinear processes. An optical circulator (OCI) directs the transmission path. A fixed-length SMF acts as the nonlinear medium to generate optical nonlinearities, while an optical isolator (ISO) ensures unidirectional propagation of the signal. In this setup, the electrostrictive effect causes periodic modulation of the medium refractive index, leading to mutual gain between the acoustic and scattered waves. This positive feedback mechanism results in an exponential gain of the scattered light, forming a backward Stokes wave that is downshifted relative to the pump light frequency, thus achieving effective signal amplification [17].

    In theory, the Brillouin scattering dynamics in SMF can be described by the classical parametric-coupling model [18]. Previous studies have demonstrated that when fiber losses are considered, no exact analytical solution exists, and the system behavior can only be represented through numerical fitting [19]. More details are provided in Appendix A. Accounting for the saturation properties of Brillouin scattering, the intensity transmission of NABA in the analog domain can be modeled as [20]

    $$T(I_{\mathrm{in}}) = \frac{A_1}{1 + (I_{\mathrm{in}}/I_s)^p} + A_2, \qquad (1)$$

    where $I_{\mathrm{in}}$ and $I_s$ are the input optical and saturation intensities, respectively, $p$ is the steepness of the transition curve, and $A_1$ and $A_2$ are the saturable and non-saturable absorption components, respectively. Note that these parameters are fitted according to the experimental data.

    Ultimately, the activation model of the prepared activator can be expressed as

    $$I_{\mathrm{out}} = T(I_{\mathrm{in}}) \times I_{\mathrm{in}}. \qquad (2)$$

    The ONU is realized using the proposed NABA, which can be directly cascaded into the existing optical computing chips [14]. For a given injection intensity, the output intensity can be described by the nonlinear function in Eq. (2). Note that the above model is an empirical characterization of the NABA and depends on the physical properties of the prepared device.
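
    The saturation model can be sketched in a few lines of code, reading Eq. (1) as a saturable term A1/(1 + (Iin/Is)^p) plus a constant A2, and Eq. (2) as the product of this transmission with the input. The parameter values below are hypothetical and chosen only to make the limits easy to check; they are not the fitted coefficients of the device, and the units of T are left unspecified here.

```python
def transmission(i_in, a1, a2, i_s, p):
    """Eq. (1) as read here: T = A1 / (1 + (I_in / I_s)**p) + A2."""
    return a1 / (1.0 + (i_in / i_s) ** p) + a2

def activation(i_in, a1, a2, i_s, p):
    """Eq. (2): I_out = T(I_in) * I_in."""
    return transmission(i_in, a1, a2, i_s, p) * i_in

# Hypothetical, illustrative parameters (not the fitted values of the paper).
A1, A2, I_S, P = 9.0, 1.0, 1.0, 2.0

for i_in in (0.01, 1.0, 100.0):
    t = transmission(i_in, A1, A2, I_S, P)
    print(i_in, t, activation(i_in, A1, A2, I_S, P))
```

    Three limits summarize the curve: T approaches A1 + A2 for inputs far below Is, equals A1/2 + A2 at Iin = Is, and approaches A2 for inputs far above Is. A larger p makes the transition between the two plateaus steeper.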

    3. PERFORMANCE CHARACTERIZATION

    A. Dynamic Transmission Characteristics

    The experimental setup used to verify the dynamic behaviors of the prepared device is shown in Appendix B.1. In this configuration, the microwave signal source output is first disabled to analyze the energy transfer features of NABA. As shown in Fig. 2(a), the Stokes wave is shifted by about 0.09 nm toward longer wavelengths relative to the pump, corresponding to a Brillouin frequency downshift of 11.25 GHz. With the help of Brillouin fiber amplifiers (BFAs), the input signal is significantly enhanced. It should be emphasized that effective amplification requires the signal carrier to be offset from the pump by the Brillouin frequency shift.
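
    The link between the 0.09-nm wavelength offset and the Brillouin frequency shift can be checked with a short calculation. The sketch below uses the pump and carrier wavelengths quoted in Appendix B.1 (1550.64 nm and 1550.73 nm); the function name is illustrative, and because the wavelengths are rounded to 0.01 nm the result is only expected to agree with the reported 11.25 GHz to within a few percent.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def frequency_offset_hz(lambda_pump_m, lambda_stokes_m):
    """Optical frequency difference between pump and Stokes wavelengths."""
    return C / lambda_pump_m - C / lambda_stokes_m

pump = 1550.64e-9    # pump wavelength from Appendix B.1, m
stokes = 1550.73e-9  # signal carrier (Stokes) wavelength from Appendix B.1, m

shift_ghz = frequency_offset_hz(pump, stokes) / 1e9
print(f"Brillouin shift estimate: {shift_ghz:.2f} GHz")
```

    The estimate comes out at roughly 11.2 GHz, consistent with the measured shift given the rounding of the wavelengths.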

    Figure 2. NABA dynamic performance. (a) Basic mechanism of BFAs. Nonlinear amplification of the signal is realized during the energy conversion process. Time-domain waveforms of the demodulated signal at two different frequencies, (b) 100 MHz and (c) 40 GHz. For observation, the data with multiple cycles is captured. Herein, the experiment and reference signals are data with and without NABA processing, respectively.

    The response bandwidth of the proposed nonlinear activator is subsequently examined. After passing through the NABA, the output spectra of the demodulated signals at various frequencies are measured (see Appendix B.1 for details). Figures 2(b) and 2(c) illustrate the device performance under two extreme experimental conditions. For observation, the collected signals with multiple periods are normalized. Experimental results indicate that: (I) the prepared activator exhibits no significant distortion within a 40-GHz bandwidth, which is several orders of magnitude higher than the previous 100 kHz [11]; (II) due to the response characteristics of the photoelectric conversion device, the output noise increases as the signal frequency decreases; (III) the performance of the device is influenced by experimental conditions. Specifically, the bandwidths of the photodetector (PD) and Mach–Zehnder modulator (MZM) are both 40 GHz, which restricts the capability of the proposed activator. These results demonstrate that our activator supports dynamic data transmission.

    B. Static Transmission Characteristics

    We then analyze the static traits of the fabricated devices to develop an experiment-based activation model. To do this, a simple double-balance detection experimental setup is built, as shown in Appendix B.2. During the experiment, the performance of NABA is closely related to the pump source. By adjusting the pump power, the prepared activator is made to be in different excitation states to achieve the desired nonlinear operation.

    The nonlinear amplification curves of NABA at three typical excitation states, 5.37 mW (M1), 9.18 mW (M2), and 37.08 mW (M3), are presented in Figs. 3(a)–3(c). For observation, the optical characteristics of the prepared device are measured over a range from −54.57 to 10.38 dBm. It can be seen that: (I) the transmission of the NABA decreases sharply at a specific threshold power and becomes almost constant at higher powers, exhibiting a typical saturation phenomenon; this behavior aligns with the nonlinear model described in Eq. (1); (II) the performance of the proposed nonlinear activator is sensitive to the pump power; in the strong-pump regime, the BFA exhibits greater amplification than in the weak-pump regime; (III) different NAFs can be achieved by setting the pump intensity, indicating that a single pump source can simultaneously drive multiple devices.

    Figure 3. NABA static performance. (a)–(c) Mapping curve of amplification factor and input power under different pump powers; all curves show obvious nonlinear effects. Nonlinear mapping model under three different pump powers of (d) 5.37 mW, (e) 9.18 mW, and (f) 37.08 mW. Clearly, there is an excellent consistency between the experimental data (dots) and theoretical analysis (curve).

    The above nonlinear amplification process is then mapped to the transmission response of the device, and the output is generated by multiplying the input power, as depicted in Figs. 3(d)–3(f). Experimental results show the following. (I) The performance of the BFA improves with increasing pump power. Specifically, for an injection power of 10.39 mW, the output powers for M1, M2, and M3 are 0.89 mW, 1.28 mW, and 15.15 mW, and the corresponding ETEs are 8.57%, 12.32%, and 145.81% in that order. (II) Our activator exhibits high robustness and stability due to minimal perturbations. For instance, the standard deviations (SDs) of perturbations at 10.39 mW for M1, M2, and M3 are 0.54×10⁻³, 1.89×10⁻³, and 9.71×10⁻³, respectively. These small perturbations, likely caused by the experimental conditions, reflect the high stability of our device. (III) The nonlinearity of the experiment-based model is enhanced with increasing pump power. Here, the standard mean square error (MSE) function is introduced to quantify the error between the experimental data and the linear model [21]. The error values δ for M1, M2, and M3 are 1.42×10⁻⁷, 1.52×10⁻⁴, and 4.05×10⁻², respectively, and a larger δ implies stronger nonlinearity. Note that 10.39-mW input power is chosen as a benchmark because it represents the minimum functionality of the prepared device, both in terms of amplification performance and stability. The above results provide key data to support the quantification of activator performance.
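
    The ETE values follow directly from the quoted input and output powers, and the nonlinearity metric can be illustrated in the same way. In the sketch below, the ETE is taken as the ratio of output to input power; the "linear model" used for δ is assumed to be an ordinary least-squares straight line through the data, which is an assumption of this sketch since the text does not specify how the linear reference is constructed.

```python
def ete_percent(p_out_mw, p_in_mw):
    """Energy transfer efficiency as output power over input power, in percent."""
    return 100.0 * p_out_mw / p_in_mw

def mse_from_linear_fit(x, y):
    """MSE between y and its least-squares straight-line fit (assumed reference)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y)) / n

P_IN_MW = 10.39  # benchmark injection power from the text
for name, p_out in (("M1", 0.89), ("M2", 1.28), ("M3", 15.15)):
    print(name, round(ete_percent(p_out, P_IN_MW), 2))

# Toy data only: a straight line gives (numerically) zero, a curve does not.
xs = [0.0, 1.0, 2.0, 3.0]
print(mse_from_linear_fit(xs, [1.0, 3.0, 5.0, 7.0]))
print(mse_from_linear_fit(xs, [0.0, 1.0, 4.0, 9.0]))
```

    An ETE above 100%, as for M3, simply means that the Stokes signal leaves the fiber with more power than was injected, the difference being supplied by the pump.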

    Table 1 summarizes the detailed parameters of several representative nonlinear models (see Appendix B.2 for a discussion of the key nonlinear parameters of NABA). It should be emphasized that no preprocessing is applied during the experiments, and the final experimental data represent the average results from multiple repeated trials. Therefore, the activation model derived from these results accurately reflects the actual device function.

    Table 1. Nonlinear Coefficients for Three Typical Activation Models

    Model   Pump (mW)   A1       A2        Is        p      ETE       SD
    M1      5.37        45.54    −10.82    24 nW     0.48   8.57%     0.54×10⁻³
    M2      9.18        98.12    −23.61    0.65 μW   0.19   12.32%    1.89×10⁻³
    M3      37.08       349.41   −137.86   2.62 μW   0.05   145.81%   9.71×10⁻³

    Table 2 summarizes the key parameters of the ONN-related nonlinear units. Our work demonstrates clear advantages over previous studies, as outlined below. (I) All-optical framework: our approach fully leverages the inherent features of NABA to perform nonlinear operations without the need for optical-to-electrical conversion, thereby enhancing the overall performance of the ONN. (II) Ultra-low threshold: the proposed NABA threshold is as low as 24 nW, enabling the excitation of nonlinear states with minimal input. (III) High bandwidth: for practical hardware accelerators, the computing power of an ONN is directly proportional to the transmission bandwidth. Therefore, a higher bandwidth is crucial for fully exploiting the potential of optical computing. This work represents an optimization of the previous scheme based on the stimulated Brillouin scattering activator [22]. Compared to prior research, our approach offers notable improvements in transmission bandwidth, saturation threshold, and transmission efficiency, thereby providing key device support for the realization of multi-layer cascaded AONNs.

    Table 2. Performance Comparison of ONN-Related Nonlinear Activators

    Structure      Type              Integrated   Reconfigurable   Threshold   Bandwidth   Reference
    MZM            Opto-electronic   Yes          Yes              0.1 mW      75 MHz      [23]
    Photodiode     Opto-electronic   Yes          No               1.1 mW      20 GHz      [15]
    SOA^a          Opto-electronic   Yes          No               1 mW        10 GHz      [24]
    PPLN^b         All-optical       Yes          No               4 μW        250 MHz     [25]
    DFB-LD^c       All-optical       No           No               26 μW       1 GHz       [9]
    MoTe₂/OWG^d    All-optical       Yes          No               0.94 μW     2.08 THz    [26]
    Fano lasers    All-optical       Yes          No               0.5 mW      1 GHz       [27]
    MRR^e          All-optical       Yes          Yes              0.74 mW     100 kHz     [11]
    SBS            All-optical       No           No               2.29 mW     11.24 GHz   [22]
    MRR            All-optical       Yes          Yes              3.16 mW     1 GHz       [28]
    Ge/Si          All-optical       Yes          No               5.1 mW      70 MHz      [29]
    Si/graphene    All-optical       Yes          No               5.49 mW     10 GHz      [30]
    NABA           All-optical       No           No               24 nW       >40 GHz     This work

    ^a SOA, semiconductor optical amplifier.
    ^b PPLN, periodically poled lithium niobate.
    ^c DFB-LD, distributed feedback laser diode.
    ^d OWG, optical waveguide.
    ^e MRR, microring resonator.

    At this point, the optical characteristics of NABA have been discussed in detail. Our activator features an ultra-low threshold, wide response bandwidth, strong robustness, and high ETE. These advantages provide technical support for the physical realization of AONNs. However, there are still areas for improvement in our scheme. (I) Transmission latency: the present design employs a 20-km SMF as the gain medium, resulting in a signal delay exceeding 90 μs. This latency could be significantly reduced by utilizing alternative media with higher dispersion coefficients, such as photonic crystal fibers [31] or highly nonlinear fibers [32], which would enable equivalent nonlinear operations over shorter distances. (II) Reconfigurability: like most all-optical nonlinear activators that rely on light-matter interactions, the optical properties of NABA become fixed after fabrication, limiting flexibility for ONN applications. Interestingly, our experiments reveal that adjusting the pump power can modify the nonlinear amplification characteristics, thereby enabling partial reconfiguration of the activation model. (III) Integrability: while the fabricated NABA can be directly connected to existing computing chips, challenges remain in developing fully integrated standalone devices. Notably, inducing and suppressing SBS in chip-level devices has been demonstrated, which offers promising pathways for realizing on-chip NABA implementations [33–35]. Unfortunately, due to experimental constraints, related work will be addressed in future research. In the next section, we consider representative deep learning tasks to showcase the performance of our activator.
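
    The quoted latency can be reproduced with a one-line propagation estimate. The group index of 1.468 used below is a typical value for standard single-mode fiber near 1550 nm and is an assumption of this sketch, not a measured property of the fiber used in the experiment.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def fiber_delay_us(length_m, group_index=1.468):
    """One-way propagation delay through a fiber, in microseconds."""
    return length_m * group_index / C * 1e6

delay_20km = fiber_delay_us(20e3)
print(f"20 km: {delay_20km:.1f} us")
print(f" 1 km: {fiber_delay_us(1e3):.1f} us")
```

    Since the delay scales linearly with length, a gain medium that delivers the same Brillouin gain over a shorter length would reduce the latency in the same proportion.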

    4. CASE STUDY

    A. Classification Task

    To evaluate the performance of our activator, an ONN with saturating nonlinearity is first compared with the state-of-the-art ANN. For illustration, standard databases including MNIST and Fashion-MNIST are used as benchmark comparisons [10]. Both datasets consist of 70,000 grayscale images, including 60,000 training and 10,000 test samples. Furthermore, all data are normalized to follow the same distribution, with the intensity center of gravity positioned at the center of the 28×28-pixel images [36]. Then, a simple network architecture is utilized for classification tasks, as shown in Fig. 4(a). The proposed network is a two-layer fully connected framework. The input layer consists of 784 neurons, corresponding to the 28×28 inputs. The hidden layer contains 128 neurons, resulting in 784×128 connections. The output layer has 10 neurons, representing the 10 output classes for handwritten digits or fashion items.

    Figure 4. Performance on image classification. (a) Typical fully connected neural network frame. Learning curves for (b) MNIST and (c) Fashion-MNIST datasets under different NAFs. Herein, experiment-based nonlinear models are compared with existing NAFs. The M3 activation function is used to compute the confusion matrix for (d) MNIST and (e) Fashion-MNIST datasets.

    The objective of training ANNs is to find a set of parameters that minimizes the discrepancy between the ground-truth labels and the predicted outputs [21]. This is achieved by defining the loss function as the MSE, which is optimized using the backpropagation and stochastic gradient descent algorithms [21] (see Appendix C for details). After training, the performance of ANNs is evaluated on different test sets to assess their generalization ability. For comparison, other classical NAFs are used as benchmarks to verify the capability of our activator.
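
    To make the simulated network concrete, the following sketch runs a single forward pass through a bias-free 784-128-10 fully connected network with a saturating activation of the form in Eqs. (1) and (2) and evaluates the MSE loss. It is a sketch under stated assumptions rather than the authors' implementation: the activation parameters are hypothetical, negative pre-activations are handled by using their magnitude inside the transmission term, the output layer is kept linear for brevity, the weights are random, and the backpropagation and gradient-descent steps are omitted.

```python
import random

def saturating_activation(x, a1=9.0, a2=1.0, x_s=1.0, p=2.0):
    """x * T(|x|) with T = a1 / (1 + (|x| / x_s)**p) + a2 (hypothetical parameters)."""
    return (a1 / (1.0 + (abs(x) / x_s) ** p) + a2) * x

def dense(x, weights):
    """Bias-free fully connected layer: one weight row per output neuron."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def forward(x, w_hidden, w_out, act):
    """Hidden layer with activation, then a linear output layer (sketch choice)."""
    hidden = [act(v) for v in dense(x, w_hidden)]
    return dense(hidden, w_out)

def mse(pred, target):
    """Mean squared error between prediction and target vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

random.seed(0)
N_IN, N_HIDDEN, N_OUT = 784, 128, 10
w_hidden = [[random.gauss(0.0, 0.05) for _ in range(N_IN)] for _ in range(N_HIDDEN)]
w_out = [[random.gauss(0.0, 0.05) for _ in range(N_HIDDEN)] for _ in range(N_OUT)]

image = [random.random() for _ in range(N_IN)]  # stand-in for a normalized 28x28 image
label = [0.0] * N_OUT
label[3] = 1.0                                  # one-hot target, class chosen arbitrarily

output = forward(image, w_hidden, w_out, saturating_activation)
n_weights = N_IN * N_HIDDEN + N_HIDDEN * N_OUT
print(len(output), n_weights, mse(output, label))
```

    Training would repeat this forward pass over mini-batches and update the two weight matrices along the negative gradient of the MSE; swapping `saturating_activation` for another function is all that is needed to compare activation models on equal terms.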

    Figures 4(b) and 4(c) demonstrate the capability of ONNs with different nonlinear models. It can be observed that: (I) M3 performs the best among all NAFs with accuracies of 97.64% and 87.84% on the MNIST and Fashion-MNIST datasets, representing an improvement of 6.97% and 5.59% over the linear model; (II) the experiment-based activation models exhibit excellent performance comparable to the NAFs commonly used in computers, and the training convergence occurs even slightly faster; (III) among the experimental activation models, M1 performs the worst, and M3 performs the best. From M1 to M3, the nonlinear trend of the mapping model gradually increases [depicted in Figs. 3(d)–3(f)], and the stronger nonlinear effect enables the network to fit the target data better.

    For visualization, the confusion matrix is used to evaluate the performance of the classification algorithm, as shown in Figs. 4(d) and 4(e). Each row and column of the matrix correspond to instances in the actual and predicted categories, with the diagonal elements representing correctly classified instances [10]. It can be seen that: (I) for the MNIST and Fashion-MNIST classification tasks, the sum of the diagonal elements for the M3 model is 9764 and 8784, corresponding to recognition accuracies of 97.64% and 87.84%; (II) the recognition error rate for Fashion-MNIST is higher than that for MNIST, which can be attributed to the increased complexity of the Fashion-MNIST dataset.
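
    The relation between the confusion matrix and the reported accuracy is simply the trace divided by the total number of samples. The sketch below shows this on a small hypothetical three-class matrix and then repeats the arithmetic for the diagonal sums quoted above; the toy matrix is illustrative only.

```python
def accuracy_from_confusion(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

# Hypothetical 3-class example: rows are actual classes, columns are predictions.
toy_cm = [[5, 1, 0],
          [0, 4, 1],
          [1, 0, 3]]
toy_accuracy = accuracy_from_confusion(toy_cm)
print(toy_accuracy)

# Diagonal sums reported for M3 on the 10,000-image test sets.
N_TEST = 10_000
mnist_accuracy = 9764 / N_TEST
fashion_accuracy = 8784 / N_TEST
print(mnist_accuracy, fashion_accuracy)
```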

    In this section, we use benchmark datasets commonly used in ANNs to validate the performance of the prepared activators in classification tasks. These results reveal the feasibility of NABA as an ONU. Next, a more challenging deep learning task is considered to explore the limits of the achievable performance using NABA.

    B. Regression Task

    Optical fiber is widely employed in optical transmission systems (OTSs) thanks to its inherent advantages. Regrettably, optical nonlinearities associated with dispersion can limit the capability of OTSs, including symbol error rate (SER), power level, and transmission bandwidth, especially in long-distance transmission systems [37]. Here, we explore the feasibility of using ONNs to address the above bottlenecks. To do this, an experiment-based OTS is developed to generate realistic training and testing data, as shown in Appendix D.

    Our solution consists of three components: pre-processing, a fully connected ANN, and error quantization, as shown in Fig. 5(a). During the pre-processing stage, the original data is first transformed into a one-dimensional vector using standard algorithms. Specifically, the OSC captures 100,000 data points per cycle. Down-sampling is applied by selecting every 100th data point, resulting in a 10,000-dimensional column vector. The preprocessed data is then fed into the ANN, which consists of an input layer, a hidden layer, and an output layer. Both the input and output layers contain 10,000 neurons, corresponding to the number of symbols. The hidden layer consists of 256 neurons, thereby reducing network complexity. Ultimately, a simple decision mechanism is used to quantify the data, where the midrange of the data serves as a threshold. If the input exceeds the threshold, the output is “1;” otherwise, it is “0,” as shown in the error quantization module (EQM). To compare the nonlinear activation models under the same decision conditions, the decision threshold of the EQM is set to 0.5 during the simulation process.
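
    The pre-processing and decision steps described above can be sketched as follows. The helper names, the toy waveform, and the down-sampling step of 2 are illustrative assumptions; the snippet only demonstrates the mechanics of down-sampling, midrange thresholding, and counting symbol errors, not the experimental data.

```python
def downsample(samples, step):
    """Keep every step-th sample, starting from the first."""
    return samples[::step]

def midrange(values):
    """Midpoint between the largest and smallest value."""
    return (max(values) + min(values)) / 2.0

def decide(values, threshold):
    """Hard decision: 1 if the value exceeds the threshold, otherwise 0."""
    return [1 if v > threshold else 0 for v in values]

def symbol_error_rate(decisions, labels):
    """Fraction of symbols whose decision differs from the label."""
    errors = sum(d != l for d, l in zip(decisions, labels))
    return errors / len(labels)

# Toy waveform: the samples at even indices carry the symbols.
raw = [0.1, 9.0, 0.9, 9.0, 0.2, 9.0, 0.8, 9.0]
symbols = downsample(raw, 2)
threshold = midrange(symbols)
decisions = decide(symbols, threshold)
ser = symbol_error_rate(decisions, [0, 1, 1, 1])
print(symbols, threshold, decisions, ser)
```

    Fixing the threshold at a common value instead of the data-dependent midrange, as done in the simulations, only changes the `threshold` argument passed to `decide`.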

    Figure 5. Performance on regression task. (a) Data processing flowchart. In this diagram, the preprocessing module downsamples the experimental data, the fully connected ANN implements the regression function, and the error quantization module makes decisions based on the predicted data to yield the final result. SER (b) without and (c)–(e) with neural network processing. For comparison, the different activation functions including (c) M1, (d) M2, and (e) M3 are applied in ONNs.

    Figure 5(b) demonstrates the results of directly injecting the preprocessed data into the EQM, which serves as the baseline. In this case, 783 out of 10,000 symbols are misclassified, resulting in an SER of 7.83%. For comparison, the input vector is processed by an ANN before being fed into the EQM. During the training phase, data measured at fiber lengths of 2 m, 3 km, and 10 km are used as training data, while data measured at 20 km are used for testing. Note that the algorithm of this approach follows the classification task, with the primary distinction being that in regression tasks, the output layer neurons do not apply NAF.

    Figures 5(c)–5(e) illustrate the distribution of the demodulated signal after processing by the ANNs. It can be seen that: (I) with the help of the regression algorithm, the SER is reduced to 0%, an improvement of 7.83 percentage points over the baseline result; (II) although all NAFs can effectively separate the symbols “0” and “1,” the performance of the ANN differs between nonlinear models; the predicted values of M1 fluctuate the most, followed by M2, and those of M3 fluctuate the least; (III) the predictions of M3 are closest to the original labels, followed by M2, and M1 is the worst. Specifically, the midranges of M1, M2, and M3 are 0.47, 0.49, and 0.50 in that order. This progression illustrates that M3 achieves the best convergence, with its midrange closest to the ideal threshold of 0.5 among the three models. These results show that the stronger the nonlinearity, the better the fit to the data, again supporting the earlier conclusion.

    To date, two distinct tasks have been designed to demonstrate the feasibility of the prepared NABA as an ONU. The results confirm that our model meets the requirements of different applications. Of course, the system capabilities can be further improved by optimizing the structure and algorithm, which can be carried out in subsequent stages of the research.

    5. CONCLUSION

    The lack of scalable optical nonlinearity and the associated losses in photonic devices pose obstacles to the development of ONNs. To overcome these challenges, we propose and demonstrate an end-to-end all-optical nonlinear activator based on a BFA. The key to our approach is leveraging the energy-transfer properties of Brillouin scattering to achieve the desired transformations. Experimental results show that our activator exhibits excellent nonlinear properties, an ultra-low threshold, wide bandwidth, strong robustness, and high ETE. These advantages are critical for the realization of ONUs. As a proof-of-concept, a simulation-based ANN is built to mimic the physical implementation of ONNs and validate the feasibility of using the NABA as nonlinear units. Simulations show that our model performs comparably to traditional NAFs in both classification and regression tasks. This work provides strong support for the realization of AONNs.

    APPENDIX A: BRILLOUIN AMPLIFICATION MECHANISM

    Optical fiber, as a typical non-uniform medium, exhibits various scattering effects during light propagation due to inhomogeneities in the refractive index of the medium [32]. Among these scattering phenomena, Brillouin scattering dominates as the strongest optical nonlinear process, surpassing Kerr-induced nonlinearity by orders of magnitude [38]. Theoretically, SBS arises from electrostriction, a physical phenomenon where light induces a change in the density of a medium [39], as shown in Fig. 6. In this process, the input wave generates an acoustic wave via electrostriction, simultaneously modulating the refractive index of the nonlinear medium to form a dynamic grating. This grating diffracts the pump wave, generating a Stokes light wave with a downshifted frequency. Meanwhile, the Stokes and input waves enhance electrostriction through interference, strengthening the moving grating in the gain medium, thereby amplifying the Stokes wave and establishing positive feedback. As a result, the pump wave’s energy is efficiently converted via photon-phonon interactions, enabling nonlinear amplification [39].

    Figure 6. Basic principle of Brillouin scattering. The energy transfer process in Brillouin amplification can be described as a coupled three-wave interaction involving a pump wave, a Stokes wave, and an acoustic wave [40]. The Stokes wave primarily propagates in the direction opposite to that of the pump wave due to the negligible forward scattering in the fiber [40].

    Evidently, SBS is a typical nonlinear process whose key mechanism is the transfer of light wave energy through interaction with acoustic waves. In this process, the gain medium interacts with light waves via acoustic excitation, transferring energy from the light waves to the acoustic waves through a nonlinear process. This energy transfer is closely related to the input power [19]. Specifically, at low input power, light wave energy is effectively transferred to acoustic waves, resulting in a significant Brillouin scattering gain. As the input power increases, the excited acoustic waves induce an attenuation effect that inhibits further energy transfer, saturating the scattering gain. This saturation is a key feature of the SBS process: the Brillouin gain levels off as the input power increases, and the output power then grows only gradually [19]. Therefore, the transmission characteristics of NABA can be described by a typical saturation model [41].
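
    The saturation behavior described above can be illustrated with a short sketch. It assumes a generic saturable-gain law, G = G0 / (1 + P_in / P_sat), which is a common textbook form and is not the fitted model of Ref. [41] or the parameter set of Fig. 10. The function names and the values of `g0` and `p_sat` are hypothetical and chosen only to make the trend visible.

```python
# Illustrative only: a generic saturable-gain law, NOT the paper's fitted model.
# g0 (small-signal gain) and p_sat (saturation power, in watts) are assumed values.

def saturable_gain(p_in, g0=100.0, p_sat=1e-6):
    """Amplification factor that falls off as the input power grows."""
    return g0 / (1.0 + p_in / p_sat)

def output_power(p_in, g0=100.0, p_sat=1e-6):
    """Output power = gain x input power; it levels off near g0 * p_sat."""
    return saturable_gain(p_in, g0, p_sat) * p_in

# Sweep the input power over four decades (1 nW to 10 uW).
inputs_w = [1e-9, 1e-8, 1e-7, 1e-6, 1e-5]
gains = [saturable_gain(p) for p in inputs_w]
outputs = [output_power(p) for p in inputs_w]

for p, g, out in zip(inputs_w, gains, outputs):
    print(f"P_in = {p:.0e} W  gain = {g:7.2f}  P_out = {out:.3e} W")
```

    Under these assumptions the gain is close to `g0` for weak inputs and halves at `p_in = p_sat`, while the output power keeps rising but never exceeds `g0 * p_sat`. That is the qualitative shape expected of a saturating activation.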

    APPENDIX B: EXPERIMENTAL CONFIGURATION

    1. Dynamic Transmission Characteristics

    A simple experimental setup is used to investigate the dynamic behavior of SBS and quantify the response bandwidth of NABA, as shown in Fig. 7. The optical carrier from the NLL1 (PureSpectrum, Teraxion), with a wavelength of 1550.73 nm and a power of 12.02 dBm, is coupled into a 40-GHz MZM (IML-1550-40-G, Optilab) through a polarization controller (PC). The PC minimizes polarization-dependent loss, and the microwave signal source (MSS; SMB 100A, Rohde & Schwarz) generates the baseband signal. The intensity-modulated signal is then transmitted through a 20-km SMF via an isolator, which blocks the counter-propagating optical wave. Simultaneously, the pump light at 1550.64 nm is emitted from NLL2 and injected into the nonlinear medium through an optical circulator (OCI) to introduce Brillouin amplification. Finally, the amplified signal is filtered to remove out-of-band noise using a tunable optical filter (TOF; OTF-350, Santec) and injected into a 40-GHz PD (KG-PR-40G-AC-FA, Conquer) for signal recovery. The spectra of the generated signals are measured using an electrical spectrum analyzer (ESA; FSU, Rohde & Schwarz).
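
    As a quick consistency check on the quoted wavelengths, the sketch below converts the pump and signal wavelengths to optical frequencies and converts the carrier power from dBm to mW. The helper names are hypothetical. Because the wavelengths are quoted to 0.01 nm (about 1.25 GHz at 1550 nm), the resulting offset is only approximate.

```python
# Convert the quoted wavelengths and power into frequency offset and milliwatts.
# Helper names are illustrative; the numeric inputs are taken from the text.

C_M_PER_S = 299_792_458.0  # speed of light in vacuum

def wavelength_to_frequency(wavelength_m):
    """Optical frequency (Hz) of a vacuum wavelength given in meters."""
    return C_M_PER_S / wavelength_m

def dbm_to_mw(p_dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)

signal_wl_m = 1550.73e-9  # optical carrier from NLL1
pump_wl_m = 1550.64e-9    # pump light from NLL2

# The pump has the shorter wavelength, so it sits at the higher frequency.
offset_hz = wavelength_to_frequency(pump_wl_m) - wavelength_to_frequency(signal_wl_m)
carrier_mw = dbm_to_mw(12.02)

print(f"pump-signal offset: {offset_hz / 1e9:.2f} GHz")
print(f"carrier power: {carrier_mw:.2f} mW")
```

    The offset comes out at roughly 11 GHz, with the signal on the low-frequency (Stokes) side of the pump. This is the order of magnitude of the Brillouin frequency shift usually quoted for silica fiber near 1550 nm.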

    Figure 7. Schematic diagram of the NABA dynamic characterization experiment. In this configuration, the signal propagates in the direction opposite to that of the pump source.

    Figure 8. Wideband response characteristics. (a) Spectral information of demodulated signals at different modulation frequencies. (b) 40-GHz signal spectrum.

    Figure 9. (a) Schematic diagram of double-balanced detection. During the experiment, all data are automatically recorded by the computer, minimizing human error in the measurements. (b) Mapping curve of amplification factor and input power under different pump powers; all curves show obvious nonlinear effects.

    Figure 10. Fitting curves of the four core parameters (a) $A_1$ and $A_2$, and (b) $I_S$ and $p$ versus pump power. Herein, all parameters exhibit saturation behavior.

    APPENDIX C: TRAINING AND INFERENCE MECHANISM

    1. Forward Propagation

    Feedforward neural networks (FNNs) are the simplest type of deep learning model. In this architecture, signals propagate unidirectionally from the input layer to the output layer through an intermediate hidden layer, without feedback loops or intra-layer connections. During this process, neurons use trainable parameters to process the input data sequentially. Additionally, each neuron in a given layer is directly connected to all neurons in the subsequent layer [42]. Combining this with the typical structure of artificial neurons, the forward propagation process can be described as $$a_j^m = f\left(\sum_k w_{jk}^m a_k^{m-1} + b_j^m\right), \tag{C1}$$ where $f$ is the activation function, $a_j^m$ is the output of the $j$-th neuron in layer $m$, $w_{jk}^m$ is the weight connecting the $k$-th neuron in layer $m-1$ to the $j$-th neuron in layer $m$, and $b_j^m$ is the bias of the $j$-th neuron in layer $m$.

    Neural network models employed in engineering typically omit the bias term, as they are essentially superpositions of linear functions. In practice, the weighted connections between neighboring layers of an ANN can be represented by a matrix whose elements correspond to the weight values [42]. On this basis, the vector form of Eq. (C1) can be written as $$a^m = f\left(w^m a^{m-1}\right).$$

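    For a two-layer FNN, the forward propagation process can be described as follows: $$\begin{aligned} &\text{Step 1: } a_1 = w_1 x_1, \\ &\text{Step 2: } z_1 = f(a_1), \\ &\text{Step 3: } a_2 = w_2 z_1, \\ &\text{Step 4: } y = f(a_2). \end{aligned}$$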

    Note that NAFs are crucial for enhancing the capabilities of ANNs. The primary objective of this work is to use the experiment-based activation model to achieve the desired nonlinear operation. Other commonly used activation functions in ANNs include the sigmoid function $\sigma(x)$, the rectified linear unit (ReLU), and the hyperbolic tangent $\tanh(x)$ [21].
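
    The four forward-propagation steps of the bias-free two-layer FNN can be sketched in NumPy as follows. The layer sizes, the random weights, and the use of a sigmoid as the default `f` are assumptions made only for illustration; the experiment-based activation model itself is not reproduced here.

```python
import numpy as np

def sigmoid(x):
    """Classical sigmoid activation, used here as a stand-in for f."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, w1, w2, f=sigmoid):
    """Bias-free two-layer forward pass following Steps 1-4."""
    a1 = w1 @ x   # Step 1: weighted sum in the hidden layer
    z1 = f(a1)    # Step 2: hidden-layer activation
    a2 = w2 @ z1  # Step 3: weighted sum in the output layer
    y = f(a2)     # Step 4: output activation
    return y

# Assumed toy dimensions: 4 inputs, 3 hidden neurons, 2 outputs.
rng = np.random.default_rng(0)
x = rng.random(4)
w1 = rng.standard_normal((3, 4))
w2 = rng.standard_normal((2, 3))

y = forward(x, w1, w2)
print(y)
```

    Because the activation is passed in as an argument, a different nonlinearity (for example `np.tanh`, or a function fitted to measured data) can be swapped in without touching the linear steps.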

    2. Backward Propagation

    To evaluate the performance of ANNs, the loss function is introduced to quantify the difference between the predicted and actual results. In this work, the classical MSE is used as the loss function [42]: $$C = \frac{1}{N}\sum_i \left(y_i - \bar{y}_i\right)^2,$$ where $N$ is the number of training samples, and $y_i$ and $\bar{y}_i$ are the predicted and true label values, respectively.
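
    A minimal sketch of the MSE loss defined above, written in plain Python. The function name and the example values are hypothetical.

```python
def mse(predicted, target):
    """Mean squared error between predicted values and true labels."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    n = len(predicted)
    return sum((y - t) ** 2 for y, t in zip(predicted, target)) / n

# Four binary symbols, one of which is predicted incorrectly.
loss = mse([1.0, 0.0, 1.0, 1.0], [1.0, 0.0, 0.0, 1.0])
print(loss)
```

    With one unit-sized error in four samples, the squared errors sum to 1 and the mean is 1/4.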

    The objective of training an ANN is to identify a set of trainable parameters that minimizes the loss function $C$. The gradient is a vector that points in the direction in which the function increases fastest, and its magnitude gives the rate of that increase. In theory, repeatedly subtracting the gradient from the parameters drives the function toward a minimum. To achieve this, it is necessary to compute the partial derivative of $C$ with respect to each trainable parameter in the network [43]. The most widely used approach is stochastic gradient descent, with the required derivatives obtained through the chain rule [44]. Specifically, the impact of changes in weights and biases on the loss function can be expressed by the following equations [42]. To simplify the calculation, two intermediate variables are introduced: $$\begin{cases} z^l = w^l a^{l-1} + b^l, \\ \delta^l = \dfrac{\partial C}{\partial z^l}, \end{cases}$$ where $\delta^l$ is the error of the neurons in the $l$-th layer. Therefore, the error of the output layer $L$ can be written as $$\delta^L = \nabla_a C \odot f'(z^L),$$ where the vector $\nabla_a C$ describes the rate of change of $C$ with respect to the output activations, and $\odot$ denotes the Hadamard product, i.e., the element-wise product of two vectors.

    Then, the error of the current layer can be expressed in terms of the error of the next layer as $$\delta^l = \left[(w^{l+1})^T \delta^{l+1}\right] \odot f'(z^l),$$ where $T$ denotes the matrix transpose. Equations (C5) and (C6) outline the core concept of the backpropagation process. During this process, the difference between the actual output and the expected result is propagated backward from the output layer to the input layer. The intermediate errors of each layer are retained, allowing the error for any given layer to be computed. The rate of change of $C$ with respect to the biases and weights of any layer in the network can be represented as $$\begin{cases} \dfrac{\partial C}{\partial b_j^l} = \delta_j^l, \\ \dfrac{\partial C}{\partial w_{jk}^l} = a_k^{l-1}\,\delta_j^l. \end{cases}$$

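    Ultimately, the update rule for the trainable parameters can be expressed as $$\begin{cases} w^l \rightarrow w^l - \dfrac{\eta}{N}\sum_x \delta^{x,l}\left(a^{x,l-1}\right)^T, \\ b^l \rightarrow b^l - \dfrac{\eta}{N}\sum_x \delta^{x,l}, \end{cases}$$ where the sum runs over the training samples $x$ and the hyperparameter $\eta$ is the learning rate, which controls the step size during network training. The complete process of ANN training is elaborated in Eqs. (C1)–(C8), and the pseudo-code of the corresponding algorithm is shown in Fig. 11.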
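
    The backpropagation equations and the update rule can be sketched in NumPy as below. This is an illustrative implementation under stated assumptions, not the code behind Fig. 11: it uses a sigmoid activation, a per-sample cost $C_x = \lVert a^L - \bar{y} \rVert^2$ (so that $\nabla_a C_x = 2(a^L - \bar{y})$), and toy layer sizes. All function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop(x, target, weights, biases):
    """Per-sample gradients (dC/dw, dC/db) for C_x = ||a^L - target||^2."""
    # Forward pass, storing z^l and a^l for every layer.
    a, activations, zs = x, [x], []
    for w, b in zip(weights, biases):
        z = w @ a + b
        a = sigmoid(z)
        zs.append(z)
        activations.append(a)
    # Output-layer error: delta^L = grad_a C (Hadamard) f'(z^L).
    delta = 2.0 * (activations[-1] - target) * sigmoid_prime(zs[-1])
    grad_w = [None] * len(weights)
    grad_b = [None] * len(biases)
    grad_w[-1] = np.outer(delta, activations[-2])
    grad_b[-1] = delta
    # Backward recursion: delta^l = [(w^{l+1})^T delta^{l+1}] (Hadamard) f'(z^l).
    for l in range(len(weights) - 2, -1, -1):
        delta = (weights[l + 1].T @ delta) * sigmoid_prime(zs[l])
        grad_w[l] = np.outer(delta, activations[l])
        grad_b[l] = delta
    return grad_w, grad_b

def sgd_step(batch, weights, biases, eta):
    """One update: subtract (eta / N) times the gradients summed over the batch."""
    n = len(batch)
    sum_w = [np.zeros_like(w) for w in weights]
    sum_b = [np.zeros_like(b) for b in biases]
    for x, target in batch:
        gw, gb = backprop(x, target, weights, biases)
        sum_w = [s + g for s, g in zip(sum_w, gw)]
        sum_b = [s + g for s, g in zip(sum_b, gb)]
    new_w = [w - (eta / n) * s for w, s in zip(weights, sum_w)]
    new_b = [b - (eta / n) * s for b, s in zip(biases, sum_b)]
    return new_w, new_b

def cost(batch, weights, biases):
    """Batch-averaged cost, used only to monitor training."""
    total = 0.0
    for x, target in batch:
        a = x
        for w, b in zip(weights, biases):
            a = sigmoid(w @ a + b)
        total += float(np.sum((a - target) ** 2))
    return total / len(batch)

# Assumed toy network (2 inputs, 3 hidden neurons, 1 output) and random data.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 2)), rng.standard_normal((1, 3))]
biases = [rng.standard_normal(3), rng.standard_normal(1)]
batch = [(rng.random(2), rng.random(1)) for _ in range(8)]

new_weights, new_biases = sgd_step(batch, weights, biases, eta=0.1)
print(cost(batch, weights, biases), cost(batch, new_weights, new_biases))
```

    The errors $\delta^l$ are computed once per sample during the backward sweep and reused for both the weight and bias gradients, which is what makes the procedure efficient compared with differentiating each parameter separately.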

    Figure 11. Completed pseudo-code [45]. dim is the abbreviation of dimension; x and y are the training sample and the desired output, respectively. C and θ are the loss function and learning rate, while w and b are the weight and bias vectors. m is the sample size; H and O are the number of neurons in each layer.

    APPENDIX D: REGRESSION TASK DATASET GENERATION

    To generate realistic training and testing data, an experiment-based OTS was developed, as illustrated in Fig. 12(a). In this experiment, a CW light source with a power level of 12.02 dBm and a wavelength of 1550.73 nm was emitted by an NLL and coupled into a 40-GHz MZM. A fixed-length SMF was embedded between the ISO and the 40-GHz PD, serving both as an optical delay line to achieve low phase noise and as a dispersive element to introduce power fading. To systematically analyze the impact of fiber length on the OTS, the dispersion-induced power fading for different fiber lengths was measured using a vector network analyzer (VNA; N5235A, Keysight), as shown in Fig. 12(b). The results indicate that longer transmission distances significantly degrade system performance. To address this issue, we propose the implementation of an ANN to enhance system performance.
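
    The dependence of power fading on fiber length can be estimated with the standard small-signal result for chirp-free, double-sideband intensity modulation with direct detection, in which the detected RF power scales as $\cos^2(\pi D L \lambda^2 f^2 / c)$. The sketch below applies it to the four fiber lengths used in this work. The dispersion value of 17 ps/(nm·km) is an assumed typical figure for standard SMF near 1550 nm and is not taken from the paper, so the numbers are indicative only and are not a substitute for the measured curves of Fig. 12(b).

```python
import math

C_M_PER_S = 299_792_458.0    # speed of light in vacuum
WAVELENGTH_M = 1550.73e-9    # carrier wavelength quoted in the text
DISPERSION_S_PER_M2 = 17e-6  # ASSUMED: 17 ps/(nm km), typical for standard SMF

def fading_phase(freq_hz, length_m):
    """Dispersion-induced phase between the carrier and each sideband."""
    return (math.pi * DISPERSION_S_PER_M2 * length_m
            * WAVELENGTH_M ** 2 * freq_hz ** 2 / C_M_PER_S)

def fading_db(freq_hz, length_m):
    """Relative detected RF power in dB (0 dB means no fading)."""
    amplitude = abs(math.cos(fading_phase(freq_hz, length_m)))
    return 20.0 * math.log10(max(amplitude, 1e-12))  # floor avoids log10(0)

def first_null_hz(length_m):
    """Lowest frequency at which the two sidebands cancel (phase = pi / 2)."""
    return math.sqrt(C_M_PER_S / (2.0 * DISPERSION_S_PER_M2 * length_m
                                  * WAVELENGTH_M ** 2))

lengths_m = {"2 m": 2.0, "3 km": 3e3, "10 km": 10e3, "20 km": 20e3}
nulls_ghz = {name: first_null_hz(length) / 1e9 for name, length in lengths_m.items()}

for name, null in nulls_ghz.items():
    print(f"{name:>6}: first fading null near {null:.1f} GHz")
```

    Under the assumed dispersion, the first null for 20 km falls at roughly 13.5 GHz, well inside the 40-GHz bandwidth of the modulator and PD, whereas for the 2-m link it lies far outside the band. This is consistent with the observation that longer transmission distances degrade system performance.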

    Figure 12. Training data generation. (a) Schematic of the typical OTS architecture. (b) Power fading curves for different fiber lengths.

    Figure 13. Flowchart of ONN dataset generation for OTSs. (a) Complete cycle data captured by an OSC. (b) Down-sampling of the original data. In this process, the orange and blue data represent the sampled values for symbol “1” and symbol “0,” respectively. Notably, normalization is not required as the sampled values fall within the appropriate range.

    The data, after removing the flag bit, are considered valid and consist of 100,000 data points within a complete signal period. Preprocessing operations are then applied to the data. During preprocessing, the original data are transformed into a one-dimensional vector using standard algorithms. Specifically, the OSC (DPO75002SX, Tektronix) captures 100,000 data points per cycle. Down-sampling is performed by selecting every 10th data point (corresponding to a 2-ns time interval), resulting in a 10,000-dimensional column vector. In this process, the value corresponding to the midpoint of each symbol is selected as the final input data, as shown in Fig. 13(b).
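
    The down-sampling step can be sketched as a simple strided selection. The placeholder record and the `offset` parameter are assumptions for illustration: the text states only that the value at the midpoint of each symbol is kept, so the offset that aligns the stride with the symbol midpoints depends on the actual capture and is not specified here.

```python
def downsample(samples, step=10, offset=0):
    """Keep every `step`-th sample, starting at index `offset`."""
    if not 0 <= offset < step:
        raise ValueError("offset must satisfy 0 <= offset < step")
    return samples[offset::step]

# Placeholder record standing in for one captured cycle of 100,000 points.
raw = [float(i) for i in range(100_000)]

reduced = downsample(raw, step=10)
print(len(raw), "->", len(reduced))
```

    Selecting every 10th point reduces the 100,000-point record to 10,000 values, matching the vector length stated above.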

    Ultimately, 80 datasets (4×20=80) are measured at four distinct fiber lengths: 2 m, 3 km, 10 km, and 20 km. From these, four datasets are randomly selected as the inference dataset, while the remaining 76 datasets are utilized as the training dataset. Each dataset consists of inputs and labels represented as 100,000-dimensional column vectors. Note that the experimental data are collected under independent conditions, ensuring the reliability of the results. Additionally, the input data and label information are strictly paired to maintain consistency.
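
    The 76/4 split described above can be sketched as follows. The identifiers, the fixed seed, and the fully random draw are assumptions; the text does not state whether the four inference datasets were constrained to cover each fiber length.

```python
import random

FIBER_LENGTHS = ["2 m", "3 km", "10 km", "20 km"]
RUNS_PER_LENGTH = 20  # 4 x 20 = 80 datasets in total

# Each dataset is identified by its fiber length and run index (hypothetical labels).
dataset_ids = [(length, run) for length in FIBER_LENGTHS for run in range(RUNS_PER_LENGTH)]

rng = random.Random(0)  # fixed seed so the split is reproducible
inference_ids = rng.sample(dataset_ids, 4)
training_ids = [d for d in dataset_ids if d not in inference_ids]

print(len(training_ids), "training datasets,", len(inference_ids), "inference datasets")
```

    Holding the inference datasets out before training keeps them independent of the data used to fit the network.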

    Caihong Teng, Qihao Sun, Shengkun Chen, Yixuan Huang, Lingjie Zhang, Aobo Ren, Jiang Wu, "End-to-end all-optical nonlinear activator enabled by a Brillouin fiber amplifier," Photonics Res. 13, 2145 (2025)

    Paper Information

    Category: Nonlinear Optics

    Received: Feb. 18, 2025

    Accepted: May. 9, 2025

    Published Online: Jul. 25, 2025

    The Author Email: Jiang Wu (jiangwu@uestc.edu.cn)

    DOI:10.1364/PRJ.559966

    CSTR:32188.14.PRJ.559966
