Advanced Imaging, Volume 2, Issue 3, 031003 (2025)

Compressive single-pixel imaging with low-order nonlinear neural networks

Huaijian Chen, Xiao Wang, Botao Hu, Aiping Fang, Ruifeng Liu*, Pei Zhang*, and Fuli Li
Author Affiliations
  • Ministry of Education Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, Shaanxi Province Key Laboratory of Quantum Information and Quantum Optoelectronic Devices, School of Physics, Xi’an Jiaotong University, Xi’an, China

    As a novel imaging paradigm, single-pixel imaging (SPI) has shown significant potential across various fields. However, intricate nonlinear reconstruction algorithms, such as compressive sensing or deep neural networks, are crucial to enable real-time imaging. In general, the former incurs long reconstruction times, while the latter often entails substantial energy consumption. Optical neural networks (ONNs) offer a promising alternative because of the intrinsic alignment between the sampling process of SPI and the fully connected layers in ONNs. Nevertheless, achieving nonlinear reconstruction via ONNs remains challenging, since nonlinear activation functions still pose significant difficulties. Here, we propose an all-optical ONN architecture with controllable nonlinear order that mitigates the reliance of ONNs on nonlinear activation functions. Following the scheme of compressive SPI, we demonstrate the fitting capability of the structure on two linear tasks—compressive SPI and linear-edge extraction—and two nonlinear tasks—nonlinear-edge extraction and handwritten dataset classification. Experimental results show that this structure has good fitting capability for both regression and classification tasks, even at low order.

    1. Introduction

    Single-pixel imaging (SPI), which retrieves a scene by calculating the second-order correlation between modulation patterns and the measured light intensity, has emerged as a compelling alternative to conventional camera-based approaches. Because of its advantages in detection sensitivity, dark counts, and spectral range over traditional camera imaging, SPI has shown significant potential in various fields such as 3D imaging[1,2], photoacoustic imaging[3,4], spectral imaging[5,6], and wavefront sensing[7–9]. Nevertheless, due to its inherently correlation-based measurement mechanism, SPI faces significant challenges in meeting the demands of real-time imaging. To address this issue, the mainstream approach is to reduce the number of measurements, which in turn requires intricate nonlinear algorithms such as compressive sensing (CS)[10,11] or deep neural networks (DNNs)[12–14] in the reconstruction process. Generally, CS algorithms are computationally intensive, often incurring substantial reconstruction time. In contrast, DNNs, with growing model complexity and the availability of enormous data resources, are hungry for high-end graphics processing units to execute parallel accelerated computations, and thus result in substantial energy consumption.

    In recent years, optical neural networks (ONNs) have undergone significant development with advances in artificial intelligence[15] and optoelectronics. Generally, ONNs serve as the optical counterparts of DNNs; while inheriting the powerful fitting capabilities of DNNs, they also offer advantages such as low power consumption[16], high computational speed, and inherent parallelism[17–19]. The key to ONNs is incorporating some or all network layers that perform linear or nonlinear mapping in optical form[20–22]. Currently, a fully connected layer in optical networks, in other words, an optical matrix-vector multiplier, is capable of executing the measurement process of SPI, which essentially performs only a single linear transformation of the light intensity distribution of the target. For example, Shen et al. showed that a matrix-vector multiplier could be built from a topological cascade of Mach–Zehnder interferometers[23], and more recently, Wang et al. achieved a single linear transformation of the input two-dimensional optical signal through a combination of a microlens array and a spatial light modulator[21].

    However, incorporating an all-optical nonlinear activation function (ONAF) into ONNs remains a key requirement for practical use. Generally, constructing an ONAF involves optical nonlinear effects, which are characterized by the need for a strong pump light to generate a usable nonlinear response[24]; this implies low energy efficiency and high thresholds. Besides, given the demand for high-computation-speed ONNs, the response rate of the ONAF must be sufficiently high. Today, these challenges have been addressed to some extent, with creative approaches emerging as potential solutions, such as saturable absorbers[23,25], micro-ring resonators[26–29], semiconductor optical amplifiers[30–32], and electromagnetically induced transparency[20]. Consequently, if one wants to conduct compressive SPI (CSPI) with ONNs, there remains a need for an efficient method to execute optical nonlinearity in the reconstruction process.

    In this paper, we introduce a novel ONN structure that performs a multi-order polynomial expansion to realize mathematical fitting and avoids direct nonlinear operations. Following the scheme of CSPI, each element of the label, or objective feature, is viewed as a polynomial function of all elements of the bucket signal. We demonstrate its nonlinearity-fitting capability in silico with low-order neural networks on four tasks: intensity image reconstruction, linear-edge imaging, nonlinear-edge imaging, and image classification. In addition, the robustness against noise and a specific optical hardware implementation are further discussed.

    2. Results

    2.1. Compressive single-pixel imaging

    The schematic diagram of SPI is depicted in Fig. 1. A static object is illuminated by an incoherent light source and imaged onto the surface of the digital micromirror device (DMD) by lens L1. Then, the DMD modulates the incident light by displaying a set of preloaded patterns. Each pattern induces an element-wise multiplicative modulation of the image of the object. The modulated light propagates through L2 and converges onto the photosensitive surface of a single-pixel detector. After the DMD finishes the playback, the captured signal, $z$, can be expressed as the inner product between the object and the patterns [see Fig. 1(b)]:

    $$z = [\Phi_1^T; \Phi_2^T; \dots; \Phi_M^T]\, x, \tag{1}$$

    where $z \in \mathbb{R}^{M \times 1}$ is the bucket signal, $\Phi_i^T \in \mathbb{R}^{1 \times N}$ is the row-vector representation of the $i$th modulation pattern, $x \in \mathbb{R}^{N \times 1}$ is the column vector composed of all pixels of the object, and $M$ and $N$ represent the number of modulation patterns and the total pixel count of the object, respectively.
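
    The measurement in Eq. (1) can be sketched in a few lines of NumPy. The sizes below are hypothetical, and random binary patterns stand in for whatever patterns are actually trained or chosen:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 64 * 64   # total pixel count of the object (hypothetical size)
M = 256       # number of modulation patterns, giving beta = M / N = 6.25%

x = rng.random((N, 1))                               # object as a column vector
Phi = rng.integers(0, 2, size=(M, N)).astype(float)  # one binary pattern per row

# Each bucket value is the inner product of one pattern with the object,
# i.e., the matrix-vector product z = Phi x of Eq. (1)
z = Phi @ x
```

    In hardware, each row of Phi is displayed on the DMD in turn, and the single-pixel detector records the corresponding entry of z.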

    Figure 1.Schematic of the experimental setup and reconstruction principle. (a) Schematic diagram of the experimental setup. An incoherent light source illuminates the object, which is then imaged onto the DMD surface with L1. Once modulated by patterns loaded on the DMD, the total intensity of the modulated light field is captured by a single-pixel detector. (b) The first fully connected layer that corresponds to the physical process of SPI. (c) The 1st-order PNN structure is chosen as the reconstruction algorithm of two linear tasks: compressive single-pixel imaging and linear-edge extraction. (d), (e) Simulation results of two linear tasks with a sampling ratio of 6.25%. (f) The 2nd-order PNN is selected as the reconstruction algorithm of two nonlinear tasks: nonlinear-edge extraction and handwritten dataset classification; the symbol * denotes the Hadamard product. (g), (h) Simulation results of two nonlinear tasks with a sampling ratio of 6.25%, and the classification accuracy is 98.03%. L1, lens 1; L2, lens 2; DMD, digital micromirror device.

    Here, the sampling ratio is defined as $\beta = M/N$. When $\beta > 1$, as in the case of SPI with random speckle patterns, the image can be reconstructed using the second-order intensity correlation of the light field[33,34]. When $\beta = 1$, orthogonal bases such as Hadamard[11] and Fourier patterns[35] can serve as modulation patterns, and the image can be directly reconstructed by an orthogonal transformation applied to the bucket signal[36]. It is worth noting that both the second-order intensity correlation operation and the orthogonal transformation are linear reconstruction algorithms. When $\beta < 1$, however, also known as under-sampling or sub-sampling, the image must be reconstructed by solving the $l_1$-minimization problem

    $$\hat{x} = \arg\min \|x\|_1 \quad \text{s.t.} \quad z = \Phi x, \tag{2}$$

    where $\|\cdot\|_1$ represents the $l_1$ norm. It has been demonstrated that resolving such an optimization problem often requires intricate nonlinear algorithms such as CS[10,11] or DNNs[12–14].
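
    For reference, such $l_1$ problems are commonly attacked in software with iterative shrinkage-thresholding (ISTA) applied to the unconstrained LASSO form. The sketch below is illustrative only and is not the reconstruction method proposed in this work; the sizes and regularization weight are arbitrary:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(Phi, z, lam=0.01, n_iter=1000):
    """Minimize 0.5 * ||z - Phi x||^2 + lam * ||x||_1 by ISTA."""
    L = np.linalg.norm(Phi, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros((Phi.shape[1], 1))
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - z)         # gradient of the quadratic term
        x = soft_threshold(x - grad / L, lam / L)
    return x

# Under-sampled demo: recover a 3-sparse vector from M < N random measurements
rng = np.random.default_rng(1)
N, M = 100, 40
x_true = np.zeros((N, 1))
x_true[[5, 37, 80], 0] = [1.0, -0.5, 2.0]
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x_hat = ista(Phi, Phi @ x_true, lam=0.005, n_iter=2000)
```

    The iteration count and computational cost of such solvers are exactly what motivates the learned, feed-forward reconstruction pursued in this paper.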

    However, for ONNs that solve this optimization problem optically, the routine establishment of an ONAF typically involves a nonlinear optical process: an all-optical hardware component is designed whose response curve aligns precisely with the nonlinear activation function used during training[20,23,25–32]. Presently, this approach often encounters issues such as high threshold requirements and slow response rates. In such a dilemma, one strategy is to avoid introducing the ONAF while maintaining the nonlinear fitting capacity of the ONN. For a typical ONN with a cascade configuration of optical linear layers followed by ONAFs, simply removing the ONAFs reduces the network to a mere linear optical computation. Instead, we can change the structure of the ONN so that it performs a mathematical approximation, thereby sidestepping the need for direct nonlinear optical processes. An intuitive idea is to regard the label as a multivariate function of all elements of the input sample and employ the ONN to execute a polynomial fit of this function without introducing direct nonlinear mappings.

    2.2. Polynomial neural network

    Here, we introduce a novel network, the polynomial neural network (PNN)[37], serving as the reconstruction algorithm for four CSPI tasks. Unlike traditional DNNs, the PNN is trained to seek a polynomial function approximator that can fit the complex relationship between the input ($z \in \mathbb{R}^d$) and label ($y \in \mathbb{R}^o$) without introducing nonlinear activation functions. In general, $y_j$, any element in $y$, can be regarded as a multivariate function of all elements in $z$, and the goal of the PNN is to learn a function $G: \mathbb{R}^d \to \mathbb{R}^o$ of order $N \in \mathbb{N}$ such that

    $$y_j = G(z)_j = \gamma_j + w_j^{[1]T} z + z^T W_j^{[2]} z + W_j^{[3]} \times_1 z \times_2 z \times_3 z + \cdots + W_j^{[N]} \prod_{n=1}^{N} \times_n z, \tag{3}$$

    where $\gamma_j \in \mathbb{R}$ and $\{W_j^{[n]} \in \mathbb{R}^{\prod_{m=1}^{n} \times_m d}\}_{n=1}^{N}$ are the learnable parameters, and $W^{[N]} \times_m z$ is the mode-$m$ vector product of the $N$-dimensional tensor $W^{[N]}$ with the vector $z$. A more compact expression is obtained by vectorizing the label,

    $$y = G(z) = \sum_{n=1}^{N} \left( W^{[n]} \prod_{j=2}^{n+1} \times_j z \right) + \gamma, \tag{4}$$

    where $\gamma \in \mathbb{R}^o$ and $\{W^{[n]} \in \mathbb{R}^{o \times \prod_{m=1}^{n} \times_m d}\}_{n=1}^{N}$ are the learnable parameters. However, the number of parameters grows as $O(d^N)$.
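
    The $O(d^N)$ growth can be made concrete with a short count of the naive parameterization; the sizes $d = 256$ and $o = 4096$ below match the nonlinear tasks discussed later:

```python
def naive_param_count(d, o, N):
    """Parameters of the naive order-N expansion: gamma (o entries) plus
    one tensor W^[n] with o * d**n entries for each order n = 1..N."""
    return o + sum(o * d ** n for n in range(1, N + 1))

d, o = 256, 4096
for order in (1, 2, 3):
    print(order, naive_param_count(d, o, order))
# order 1 stays near 1e6, order 2 already exceeds 2.6e8, order 3 exceeds 6.8e10
```

    Even the 2nd-order tensor is already impractical to store naively, which is why the decomposed formulation below matters.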

    Many methods have been proposed to reduce the parameters[38–40]. Here we adopt the method of Ref. [37], which demonstrates the equivalence between an iterative relationship and the $N$th-order polynomial approximation based on coupled tensor decomposition[39]. The iterative relationship is defined as

    $$y_n = (U_n z) * y_{n-1} + y_{n-1}, \tag{5}$$

    where $n = 2, \dots, N$ with $y_1 = U_1 z$, and the output is $y = C y_N + \beta$. Here, the symbol $*$ represents the Hadamard product; $C$, $\beta$, and $U_n$ for $n = 1, \dots, N$ are the learnable parameters, $z$ is the input, and $N$ is the polynomial order of the PNN. This structure can be further understood with the schematic diagrams presented in Figs. 1(c) and 1(f). For example, Fig. 1(f) illustrates the 2nd-order PNN structure, which incorporates two fully connected layers to model a 2nd-order polynomial function. This function can be written as

    $$y_1 = U_1 z, \quad y_2 = (U_2 z) * y_1 + y_1, \tag{6}$$

    where $U_i$ and $y_i$ for $i = 1, 2$ are the weight matrix and output of the $i$th fully connected layer, respectively. The output $y = C y_2$ is given by the last fully connected layer with $C$ as the weight matrix.
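
    The iteration of Eq. (5) is easy to express in code. The following NumPy sketch (with arbitrary small sizes and untrained random weights) runs a 2nd-order forward pass; note that no nonlinear activation function appears anywhere:

```python
import numpy as np

def pnn_forward(z, U_list, C, beta):
    """PNN forward pass: y_1 = U_1 z, then y_n = (U_n z) * y_{n-1} + y_{n-1}
    for n = 2..N, and finally y = C y_N + beta, as in Eq. (5)."""
    y = U_list[0] @ z                        # y_1 = U_1 z
    for U_n in U_list[1:]:
        y = (U_n @ z) * y + y                # Hadamard product raises the order
    return C @ y + beta

# Hypothetical 2nd-order instance with small sizes
rng = np.random.default_rng(0)
d, hidden, o = 16, 32, 8
U_list = [rng.standard_normal((hidden, d)) for _ in range(2)]
C = rng.standard_normal((o, hidden))
beta = rng.standard_normal((o, 1))
y = pnn_forward(rng.standard_normal((d, 1)), U_list, C, beta)
```

    Each matrix $U_n$ or $C$ corresponds to one fully connected layer in Figs. 1(c) and 1(f), and the elementwise product is the Hadamard product marked by * there.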

    2.3. Simulation results

    In the simulations, we employed the aforementioned structure to tackle four CSPI tasks. The tasks can be categorized into two groups: linear tasks and nonlinear tasks (see the Supplement 1). As shown in Figs. 1(b), 1(c), and 1(f), the entire network is exclusively constructed from fully connected layers without bias terms. In Fig. 1(b), the first layer emulates the physical process of CSPI. The weight matrix, denoted by $\Phi$, dictates the modulation patterns, and its output acts as the bucket signal $z$ in Eq. (1). Furthermore, the subsequent fully connected layers, shown in Figs. 1(c) or 1(f), constitute the PNN, designed to extract linear and nonlinear features from the bucket signal $z$.

    As shown in Figs. 1(b) and 1(c), we used the 1st-order PNN as the reconstruction algorithm for two linear tasks: CSPI and linear-edge extraction. Both tasks are trained on the MNIST and Fashion MNIST datasets, and the network is composed of two fully connected layers. In the first layer, after undergoing interpolation and arrangement, the object $x \in \mathbb{R}^{16{,}384 \times 1}$ serves as the network input. The object $x$ is then modulated by the weight matrix $\Phi \in \mathbb{R}^{1024 \times 16{,}384}$ to produce the bucket signal $z \in \mathbb{R}^{1024 \times 1}$ with sampling ratio $\beta = 6.25\%$. After that, the second layer acts as a 1st-order PNN that extracts the object feature from $z$ with the weight matrix $C \in \mathbb{R}^{16{,}384 \times 1024}$. The simulation results for CSPI and linear-edge extraction are presented in Figs. 1(d) and 1(e), where the first row shows the simulation results and the second row shows the ground-truth images (processed by the linear-edge operator for the edge-extraction task, unprocessed for CSPI). The average peak signal-to-noise ratios (PSNRs) on the MNIST and Fashion MNIST datasets for CSPI in Fig. 1(d) are 32.81 and 35.76 dB, and the average PSNRs for linear-edge extraction in Fig. 1(e) are 34.16 and 37.23 dB.
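
    The PSNR values quoted throughout follow the standard definition; a minimal sketch with synthetic image data, assuming a peak value of 1 for images normalized to [0, 1]:

```python
import numpy as np

def psnr(recon, truth, peak=1.0):
    """Peak signal-to-noise ratio in dB of a reconstruction against ground truth."""
    mse = np.mean((recon.astype(float) - truth.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# A reconstruction that matches the ground truth up to small noise scores highly
rng = np.random.default_rng(0)
truth = rng.random((128, 128))
recon = np.clip(truth + 0.01 * rng.standard_normal(truth.shape), 0.0, 1.0)
```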

    Apart from that, a 2nd-order PNN was chosen as the reconstruction algorithm for two nonlinear tasks: nonlinear-edge extraction and handwritten dataset classification. Similarly, the networks tailored for the nonlinear tasks can also be divided into two parts. The first part remains a fully connected layer aligned with the physical process of SPI in Fig. 1(b), with weight matrix $\Phi \in \mathbb{R}^{256 \times 4096}$, input $x \in \mathbb{R}^{4096 \times 1}$, and output $z \in \mathbb{R}^{256 \times 1}$, so the sampling ratio is again $\beta = 6.25\%$. In the second part, three fully connected layers constitute a 2nd-order PNN whose learnable parameters are identical to those described in Eq. (6), with $U_i \in \mathbb{R}^{4096 \times 256}$ for $i = 1, 2$ and $C \in \mathbb{R}^{4096 \times 4096}$ (nonlinear-edge extraction) or $C \in \mathbb{R}^{10 \times 4096}$ (handwritten dataset classification). The simulation results of nonlinear-edge extraction are illustrated in Fig. 1(g), and the average PSNRs on the MNIST and Fashion MNIST datasets are 30.81 and 29.76 dB, respectively. In addition, Fig. 1(h) shows the confusion matrix of the classification task, with an accuracy of 98.03%. These simulation results show that, even for a CSPI process with an extremely low sampling rate, a 2nd-order nonlinear algorithm alone can effectively extract the nonlinear features of the image from the bucket detection values.

    2.4. Experimental results

    Here, we further demonstrate experimentally that the PNN with only a low-order nonlinear structure can be used as the reconstruction algorithm for four CSPI tasks. For better comparison, the network architecture, sampling ratio, and parameter configuration of each task are consistent with the above simulations. Figure 2 illustrates the experimental results of three regression tasks: imaging, linear-edge extraction, and nonlinear-edge extraction. The images reconstructed from the complete Hadamard basis patterns are set as the ground truth for the imaging task; after undergoing the linear and nonlinear-edge operators, the same images serve as the ground truth for the linear and nonlinear-edge extraction tasks, respectively. The quantitative assessment metrics of the restored images, the PSNR and the structural similarity index measure (SSIM), are marked in the lower right corner. As shown in Fig. 2, the average PSNR/SSIM values of the three regression tasks are 28.98 dB/0.97, 26.98 dB/0.91, and 22.18 dB/0.83, respectively. The experimental results of the three regression tasks presented in Fig. 2 are consistent with the simulations shown in Fig. 1.

    Figure 2.Experimental results of three regression tasks. For each task, images in the first row represent the experimental results, and the second-row images are the full-sampling results of Hadamard patterns undergoing respective mathematical processing defined in the Supplement 1. The sampling rate for all tasks is 6.25%. The lower right corner of reconstructed images shows the PSNR and SSIM, respectively.

    Moreover, the experimental results of the classification task are depicted in Fig. 3, with an accuracy of 99%. Regardless of whether the feature (label) to be extracted is linear or nonlinear, the low-order PNN is sufficient to achieve satisfactory results without introducing a nonlinear activation function, even at a low sampling rate of 6.25%. To further validate the fitting capability of the PNN, we conducted experiments on the two nonlinear tasks with 3rd-order PNNs; the results are presented in the Supplement 1.

    Figure 3.Experimental results of classification task. The confusion matrix of handwritten dataset classification has an accuracy rate of 99%.

    3. Discussion and Conclusion

    In both Figs. 2 and 3, we report the reconstruction of linear and nonlinear features based on different-order PNNs. However, the more complex the extracted feature, the higher the order of PNN required. Indeed, a high-order PNN often possesses superior fitting capability compared to a low-order one, and thus yields a more faithful reconstruction. But for ONNs that execute the trained PNN optically, although the PNN theoretically circumvents the implementation of the ONAF, a higher order implies a more complex optical structure, which poses additional experimental difficulty. Therefore, there is a delicate trade-off between accuracy and experimental complexity. Moreover, it can be observed from Eq. (5) that an ONN based on the trained PNN introduces optical pump modulation, which will be discussed below.

    Here, we report on successfully addressing four CSPI tasks by using a special nonlinear structure, PNN, without introducing the nonlinear activation function. We believe that, for the challenge of realizing the ONAF in ONNs, such a structure could be a promising solution, especially when there is a weak nonlinearity between the input and label. However, there are still two issues that need further consideration.

    One of the issues is the robustness of the PNN to noise. To assess it, we added noise at different levels to the measurements and ran simulations of the four tasks with different-order PNNs. The assessment of the simulation results is presented in Fig. 4. As shown in Figs. 4(a) and 4(b), for the two linear regression tasks at a noise level of 0, a higher-order PNN yields better reconstruction quality. However, starting from the third order, a distinct drop-off emerges as noise is added, which suggests that higher-order PNNs are more sensitive to noise. This point is further illustrated by Fig. 4(c): for example, when the noise level is 0.03, there is a clear drop from the 2nd-order PNN to the 3rd-order one. For the nonlinear regression tasks, multi-order PNNs exhibit a substantial improvement in reconstruction quality over the 1st-order PNN, in contrast to the linear regression tasks, illustrating the superior fitting capability of multi-order PNNs for nonlinear regression. Finally, the simulation results for the classification task are presented in Fig. 4(d); the behavior is similar to that of the nonlinear regression tasks, except that higher-order PNNs were hardly affected by added noise. A reasonable inference is that the classification task is inherently more tolerant of noise. Overall, 2nd-order PNNs exhibit commendable performance in both robustness and reconstruction quality, which ensures that future ONNs based on this structure can have a simple and stable architecture.
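
    The noise injection used in these robustness simulations can be sketched as follows; Gaussian noise is assumed here for illustration, with its amplitude scaled to a fraction of the standard deviation of the bucket signal $z$:

```python
import numpy as np

def add_measurement_noise(z, level, rng):
    """Perturb bucket signals during training; `level` is the noise amplitude
    as a fraction of the standard deviation of z (e.g. 0.03 for the 3% case)."""
    return z + level * np.std(z) * rng.standard_normal(z.shape)

rng = np.random.default_rng(0)
z = rng.random((1024, 1))            # hypothetical bucket signal
z_noisy = add_measurement_noise(z, 0.03, rng)
```

    Drawing a fresh perturbation at every training step, rather than fixing one noisy copy of the data, is what makes the trained PNN tolerant of measurement noise at test time.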

    Figure 4.Quantitative assessment of the robustness of different-order PNNs with four CSPI tasks and β=6.25%. Three types of random noise are dynamically added to the measurements z (the input of PNN) during the training process, and the respective ranges are 0, 3%, and 6% of the standard deviation of z. (a)–(c) present the simulation results regarding the robustness of three regression tasks, while the corresponding results for the classification task are displayed in (d). The shaded bands represent the reconstruction standard deviations at different noise levels.

    The other crucial issue is the optical realization of the PNN structure. As detailed in Eq. (5), it is necessary to execute optical matrix-vector multiplication (OMVM) and the Hadamard product in the $n$th ($n \geq 2$) iteration. First, for conventional artificial neural networks, matrix-vector multiplication often occupies a significant portion of the computational workload, and as a result of focused research efforts toward ONNs, there are already relatively mature experimental schemes for OMVM. These can be divided into three categories by implementation method: spatial light structures[20–22], on-chip coherent principles[23,41], and wavelength-division multiplexing technology[29,42]. Furthermore, the Hadamard product between two light fields ($U_n z$ and $y_{n-1}$) needs to be executed to generate the high-order terms, that is, modulating light by light. A hybrid electro-optical scheme could be efficient, but it is not the best choice because information must be converted back and forth between the optical and electronic domains[43], which limits computation speed. Optically addressed spatial light modulators (OASLMs), on the other hand, may be a good choice, as they allow amplitude, phase, or polarization modulation directly by light[44,45]. For example, Rayko et al. established an optically pumped THz modulator based on a photoconductive layer and completed real-time terahertz SPI[46].

    Generally, an OASLM primarily consists of two functional components: the photoconductive layer and the light-modulating layer. Liquid-crystal layers have traditionally been the most widespread choice for the latter because of their large fill factor, resolution, active area, and wide optical bandwidth. A prevalent instance is the Hughes liquid crystal light valve, which accepts a low-intensity input spatial image and converts readout light from a different source into an output image[47,48]. In this case, the intensity profile of the output image is modulated point-by-point by the input spatial image, that is, a Hadamard product between two light fields. For further understanding, a schematic diagram of the ONN based on the 2nd-order PNN, together with the OMVM and the Hughes liquid crystal light valve, is shown in Fig. S2 in the Supplement 1. In recent times, with major improvements in optoelectronics and silicon photonics, creative OASLM schemes based on different modulation mechanisms have been proposed, such as the optical metasurface[49] and the photonic crystal cavity array[50].

    In conclusion, we have shown a special nonlinear structure, the PNN, to computationally tackle four CSPI tasks with a sampling rate of 6.25%. We have demonstrated that the structure works well on both linear and nonlinear tasks, even with only 2nd-order nonlinearity. Importantly, this structure does not rely on nonlinear activation functions and is adaptable to various classification and regression tasks, which may open a promising avenue toward the all-optical network.

    Methods

    Experimental setup

    A white LED lamp illuminated the object, and the reflected light was imaged onto the working area of the DMD (Texas Instruments DLP7000) by an imaging lens with a focal length of 150 mm. After DMD modulation of the incident light field by a set of preloaded patterns, the intensity signals were sequentially collected by a single-pixel detector (Thorlabs PDA100A-EC) through a focusing lens with a focal length of 50 mm.

    We randomly selected 10 images from the respective test set of two datasets as experimental objects for three regression tasks, and for the classification task, 10 images were randomly chosen from each class in the MNIST test set.

    Each modulation pattern obtained from the first fully connected layer underwent positive and negative separation before being loaded into the DMD memory. After that, the gray level of the patterns could be chosen according to the experimental accuracy requirement. For the linear tasks, these patterns were binarized by the Floyd–Steinberg dithering algorithm and then loaded onto the DMD, which worked at a frequency of 20 kHz. For nonlinear tasks such as nonlinear-edge extraction, however, the modulation patterns were quantized to 8 bits and then loaded onto the DMD at a decreased working frequency of 200 Hz.
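
    The positive and negative separation mentioned above is the standard trick for displaying signed patterns on an intensity-only DMD; a minimal sketch follows, in which the differential recombination of the two bucket values is our assumption about the usual practice rather than a detail stated here:

```python
import numpy as np

def split_pattern(phi):
    """Split a real-valued pattern into non-negative parts phi_pos and phi_neg
    with phi = phi_pos - phi_neg, so each part can be displayed on the DMD."""
    phi_pos = np.maximum(phi, 0.0)
    phi_neg = np.maximum(-phi, 0.0)
    return phi_pos, phi_neg

rng = np.random.default_rng(0)
phi = rng.standard_normal((64, 64))      # hypothetical trained pattern
pos, neg = split_pattern(phi)

# The signed bucket value can then be recovered differentially:
# <phi, x> = <phi_pos, x> - <phi_neg, x>
```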

    Dataset processing

    Before training, each raw image in both datasets was preprocessed. For the linear tasks with the 1st-order PNN, every raw image was interpolated to 128 × 128 pixels. For the nonlinear tasks with the 2nd-order PNN, each image was interpolated to 64 × 64 pixels, given hardware memory and training-speed constraints. The whole training process was implemented in the PyTorch framework and accelerated by an NVIDIA RTX 3090 GPU.

    [30] B. Shi, N. Calabretta, and R. Stabile, "First demonstration of a two-layer all-optical neural network by using photonic integrated chips and SOAs," 1 (2019).

    [37] G. G. Chrysos et al., "P-Nets: deep polynomial neural networks," 7325 (2020).

    [38] J. Frankle and M. Carbin, "The lottery ticket hypothesis: finding sparse, trainable neural networks" (2018).

    [40] C. Ding et al., "CirCNN: accelerating and compressing deep neural networks using block-circulant weight matrices," 395 (2017).

    Paper Information

    Category: Research Article

    Received: Apr. 7, 2025

    Accepted: May 29, 2025

    Published Online: Jun. 30, 2025

    The Author Email: Ruifeng Liu (ruifeng.liu@mail.xjtu.edu.cn), Pei Zhang (zhang.pei@mail.xjtu.edu.cn)

    DOI:10.3788/AI.2025.10007
