Photonics Research, Vol. 13, Issue 7, 1925 (2025)

Simultaneous multicolor imaging using off-axis spectral encoding in a single camera without sacrificing frame rate

Jiangjiang Zhao1,†, Jing Zhang1,2,†, Zhangheng Ding1,†, Bolin Lu1, Ke Peng1, Jie Yang1, Hui Gong1,3, Qingming Luo1,3,4, and Jing Yuan1,3,*
Author Affiliations
  • 1Britton Chance Center for Biomedical Photonics, Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430074, China
  • 2MoE Key Laboratory for Biomedical Photonics, Innovation Institute, Huazhong University of Science and Technology, Wuhan 430074, China
  • 3HUST-Suzhou Institute for Brainsmatics, JITRI Institute for Brainsmatics, Suzhou 215123, China
  • 4School of Biomedical Engineering, Hainan University, Haikou 570228, China

    Multicolor imaging is widely applied across biological and medical applications and is especially essential for probing diverse biological structures. However, existing multicolor imaging methods often sacrifice either simultaneity or speed, making simultaneous imaging of more than three fluorophores challenging. Here, we propose off-axis spectral encoding multicolor microscopy (OSEM), which uses a single camera to simultaneously capture encoded multicolor signals and reconstruct monochromatic images by decoding. Based on the natural intensity modulation difference of a single illumination spot across off-axis detection positions, we adjusted the multicolor excitation beams with distinct off-axis offsets from the same detection position to achieve spectral encoding. The method achieves simultaneous multicolor imaging on a single camera without sacrificing frame rate. We evaluated OSEM’s imaging performance by imaging multicolor synthetic samples and fluorescent microbeads. We also demonstrated that OSEM reduced imaging time by 5.8 times and achieved 99% accuracy in classifying and counting multicolor fluorescent bacteria, outperforming sequential imaging. We obtained four-color fluorescent optical-sectioning images of a mouse brain slice at a speed of 2.85 mm²/s, demonstrating its effectiveness for high-throughput multicolor imaging of large tissue samples. These results indicate that OSEM offers a reliable and efficient tool for multicolor fluorescence imaging of large biological tissues.

    1. INTRODUCTION

    The multicolor imaging ability of fluorescence microscopy is essential for exploring the structure and function of various fluorescence-labeled biological organisms [1]. However, the conventional multicolor imaging strategies of sequential and parallel detection have practical limitations. Sequential detection in commercial microscopes switches among different excitation and detection channels [2,3]; it is time-consuming and cannot image multiple colors simultaneously. Parallel detection usually separates multicolor fluorescent signals onto different detectors with dispersive devices to achieve simultaneous imaging [4,5]. However, high-throughput imaging using line or area scanning is typically limited to three camera detectors due to spatial constraints, spectral crosstalk, cost, and the need for precise detector alignment. Therefore, expanding the number of simultaneous imaging channels in fluorescence microscopy remains a challenge.

    Simultaneous multicolor imaging on a single camera provides an alternative solution with common regional, spectral, and modulated imaging strategies. Imaging different colors in separate areas of the same camera is the most straightforward and practical approach [6,7]. However, this method needs channels spaced far enough apart to avoid crosstalk, which reduces the field of view (FOV) and demands image registration across camera regions. Spectral imaging introduces dispersive elements into the optical path to acquire spectral information, enabling simultaneous imaging of more fluorophores with spectral overlap [8–12]. Nevertheless, the massive spectral scanning data lead to a reduction in spatial scanning speed. Modulation multiplexing imaging identifies multicolor signals by temporal modulation of multicolor excitation beams [13–15] or point spread function (PSF) modulation of fluorescent emission signals [16]. Yet, temporal modulation restricts imaging speed due to the need for repeated captures, and PSF modulation increases signal density but reduces intensity, restricting its use to sparsely distributed, high-intensity targets. Thus, achieving simultaneous multicolor imaging on a single camera without compromising speed remains challenging.

    Here, we present off-axis spectral encoding multicolor microscopy (OSEM) on a single camera at the maximum frame rate. Based on the natural intensity modulation difference of a single illumination spot across off-axis detection positions, we adjusted the multicolor excitation beams with distinct off-axis lateral offsets from the same detection position to achieve spectral encoding. We also developed PSF linear and deep learning decoding methods for samples of different thicknesses. We then evaluated the simultaneous multicolor imaging ability of OSEM by using simulation tests and experimentally imaging fluorescent microbeads. We also classified and counted mixed bacterial samples with various concentrations, demonstrating a speed improvement of 5.8 times and highlighting the potential for microbial research. We acquired four-color optical-sectioning images of mouse brain slices at 2.85 mm²/s. OSEM potentially provides a new tool for simultaneous multicolor imaging of diverse biological structures and functions with high throughput and excellent quality, especially for large biological samples.

    2. PRINCIPLE AND METHOD

    A. Off-Axis Spectral Encoding

    We developed an off-axis spectral encoding method to detect the multicolor signals on a single array detector without sacrificing the frame rate. As reported, we can consider the spatial intensity distribution of the illumination PSF in point- or line-scanning imaging as a natural modulation [17–20]. The sub-detectors of the array detector are each conjugated to distinct regions of the illumination PSF. As the sample traverses different portions of the illumination, the corresponding sub-detectors capture signals modulated by their respective conjugated PSF regions at different off-axis positions, referred to as spatiotemporal multiplexing off-axis detection [21]. Here, we introduced a wavelength-specific lateral offset within the PSF into multicolor illumination. Then, we employed the off-axis detection to obtain multicolor spectral-encoded images with distinct mixing ratios at different off-axis positions. This approach avoids additional sub-detectors, so we can obtain the spectral-encoded raw data without compromising the imaging resolution or field of view.

    In single-wavelength excitation, the illumination spot and single detector align at the center of the objective’s FOV. The illumination spot’s finite width enables recording fluorescence signals across multiple sub-detectors with varying off-axis displacements along the x-direction from the spot’s center, as shown in Fig. 1(a). This results in signal variations across different sub-detectors. The signal differences can be quantified by the PSF of each sub-detector as follows:

    $$H_i = \left[ S(x,y) \otimes h(x,y,z,\lambda_{\mathrm{ex}_0}) \right] \times \left[ h(x,y,z,\lambda_{\mathrm{em}_0}) \otimes D(x + d_i, y) \right]. \tag{1}$$


    Figure 1. Principle of off-axis spectral encoding. (a) Schematic of off-axis intensity encoding under single-wavelength excitation. The detection and illumination axes of different line detectors have different off-axis distances. Line illumination beams with Gaussian distribution excite the sample, and each sub-detector of the multi-line detector records the corresponding modulated fluorescence signals at different off-axis positions. (b) PSF properties corresponding to each sub-detector’s imaging in (a). (c) Schematic of off-axis spectral encoding under multi-wavelength excitation. Each sub-detector of the multi-line detector captures all fluorescence signals with distinct spectral mixing ratios due to wavelength-specific lateral offsets of each line-illumination beam. Different colors represent the excitation wavelengths of different channels. (d) PSF properties corresponding to each sub-detector’s imaging in (c). (e) Modulation transfer function (MTF) curves for coaxial detection, off-axis detection, and diffraction limit. Scale bar: 500 nm.

    $H_i$ and $h$ represent the PSF of the $i$th sub-detector and the objective, respectively. $S$ and $D$ are the shapes of illumination and detection, respectively. $x$, $y$, and $z$ are the three-dimensional (3D) spatial coordinates. $\lambda_{\mathrm{ex}_0}$ and $\lambda_{\mathrm{em}_0}$ are the excitation and emission wavelengths of the fluorescence label, respectively. $d_i$ represents the off-axis displacement between the $i$th sub-detector and the center of the FOV. $\otimes$ represents the two-dimensional (2D) convolution operation.

    Figure 1(b) shows the simulated lateral profiles of the 2D PSFs of the first to fourth sub-detectors under single-wavelength excitation. It reveals the variations in signal intensity detected by different sub-detectors: the signal intensity decreases, and side lobes appear, as the off-axis displacement increases. We name this phenomenon off-axis intensity encoding.
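    As a minimal numerical illustration of this falloff, one can compute the overlap between a Gaussian illumination line and laterally displaced Gaussian sub-detector footprints. The widths, offsets, and profile shapes below are illustrative assumptions, not the system's calibrated values.

```python
import numpy as np

# 1-D sketch of off-axis intensity encoding: the recorded signal falls off
# as the detection axis moves away from the illumination axis.
# All widths and offsets are illustrative assumptions.
x = np.linspace(-5.0, 5.0, 2001)              # lateral coordinate, μm
sigma_ill, sigma_det = 0.4, 0.3               # assumed profile widths, μm
illum = np.exp(-x**2 / (2 * sigma_ill**2))    # illumination line profile

def subdetector_signal(d_i):
    """Relative signal of a sub-detector displaced by d_i from the
    illumination axis: overlap of illumination and detection profiles."""
    det = np.exp(-((x + d_i) ** 2) / (2 * sigma_det**2))
    return float(np.sum(illum * det))

offsets = [0.0, 0.325, 0.650, 0.975]          # one camera pixel (0.325 μm) apart
signals = [subdetector_signal(d) for d in offsets]
```

    The monotonic decrease of `signals` with off-axis displacement is the intensity code that distinguishes the sub-detectors.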

    All fluorophore signals are mixed in each sub-detector under multi-wavelength homocentric excitation, resulting in severe spectral crosstalk. Moreover, aligned multicolor illumination spots with off-axis imaging result in minimal PSF changes with wavelength. Fortunately, we noticed that slightly offsetting the multi-wavelength illumination spots laterally could introduce off-axis spectral encoding through the PSF, as shown in Fig. 1(c). For the $j$th excitation wavelength, the PSF of the $i$th sub-detector, $H_{i,j}$, can be expressed as

    $$H_{i,j} = \left[ S(x + d_j, y) \otimes h(x,y,z,\lambda_{\mathrm{ex}_j}) \right] \times \left[ h(x,y,z,\lambda_{\mathrm{em}_j}) \otimes D(x + d_i, y) \right], \tag{2}$$

    where $\lambda_{\mathrm{ex}_j}$ and $\lambda_{\mathrm{em}_j}$ are the excitation and emission wavelengths of the $j$th kind of fluorescence label, respectively, and $d_j$ is the off-axis displacement between the $j$th excitation-wavelength illumination spot and the center of the FOV. The other parameters are the same as in Eq. (1). Each sub-detector then collects a mixture of the multicolor signals with different mixing ratios determined by Eq. (2). In Fig. 1(d), we show the simulated lateral profiles of the PSFs of the first to fourth sub-detectors with four excitation wavelengths. The results indicate signal differences among the excitation wavelengths across the sub-detectors due to the off-axis spectral encoding.
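    The wavelength-specific offsets give each sub-detector a distinct mixture of the channels. A toy sketch of the resulting mixing-ratio matrix, under assumed Gaussian profiles and arbitrary offsets (none of these values come from the measured system), looks like:

```python
import numpy as np

# Mixing ratios implied by the wavelength-specific offsets: entry (i, j) is
# the relative signal of wavelength channel j (illumination offset d_j) on
# sub-detector i (detection offset d_i). Widths/offsets are illustrative.
x = np.linspace(-5.0, 5.0, 2001)
sigma = 0.4                                    # effective line width (assumed), μm
d_det = [0.0, 0.325, 0.650, 0.975]             # sub-detector offsets d_i, μm
d_ill = [0.0, 0.2, 0.4, 0.6]                   # channel illumination offsets d_j (assumed), μm

def overlap(d_i, d_j):
    """Overlap of the j-th illumination line with the i-th detection footprint."""
    ill = np.exp(-((x + d_j) ** 2) / (2 * sigma**2))
    det = np.exp(-((x + d_i) ** 2) / (2 * sigma**2))
    return float(np.sum(ill * det))

Hmix = np.array([[overlap(di, dj) for dj in d_ill] for di in d_det])
Hmix /= Hmix.max()
# Each row mixes the four channels in distinct ratios, so the 4 x 4 system
# is full-rank and the channels can later be decoded.
```

    The key property is that the rows are linearly independent: without the wavelength-specific offsets d_j, all columns would coincide and no decoding would be possible.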

    We simulated the off-axis detection in Zemax with a 0.8-μm offset between the illumination and detection axes at the imaging focal plane. Figure 1(e) shows the overlap of modulation transfer function (MTF) curves for coaxial and off-axis detections. It indicates that the aberrations caused by off-axis detection are negligible.

    With no extra dispersive or modulated elements required, we can obtain off-axis spectral-encoded data on a single camera, achieving multicolor imaging without sacrificing the frame rate. In practice, line-scanning confocal microscopy outperforms point scanning in imaging speed [18,22–24], which is advantageous for imaging large biological samples. Hence, our experiments and analyses in this paper are all conducted on a line-scanning microscope. When scanning large samples, we move the sample through the encoding imaging area in sequence. We record the information of various sample points at different off-axis positions simultaneously; conversely, we record the off-axis information of the same sample point at different moments. Thus, the approach occupies the minimum number of sub-detectors and maintains the highest speed for simultaneous multicolor imaging.

    B. PSF Linear Decoding

    To extract monochromatic fluorescent signals from the off-axis spectral-encoded raw mixed images, we developed the PSF linear decoding method using the measured PSFs, as shown in Fig. 2. The image of each kind of fluorescence can be represented as the convolution of the actual signal distribution with the 3D effective PSF of the imaging system. To improve the efficiency of the PSF linear decoding, we approximate the 3D PSF with the 2D PSF at the focal plane, which carries most of the relevant information.


    Figure 2. Principle of PSF linear decoding of off-axis spectral-encoded data. $I_i$ represents the off-axis spectral-encoded raw image detected by Row $i$ of the multi-line detector. $H_{i,j}$ represents the PSF distribution of the $j$th wavelength channel in Row $i$ of the multi-line detector. $X_j$ represents the decoded $j$th wavelength channel image by the PSF linear decoding.

    For samples containing $M$ kinds of fluorescent labels with distinct excitation spectra, we performed the off-axis spectral illumination with $M$ excitation wavelengths and used $N$ sub-detectors to record $N$ off-axis spectral encoding raw images. Each image is a fluorescent mixture image of all $M$ kinds of fluorescent labels [25]. Thus, the $i$th sub-detector records the off-axis spectral encoding raw mixed image $I_i(x,y)$ as follows:

    $$I_i(x,y) = \sum_{j=1}^{M} H_{i,j}(x,y) \otimes X_j(x,y), \tag{3}$$

    where $H_{i,j}$ represents the 2D PSF distribution of the $j$th excitation-wavelength fluorescence signal in the $i$th sub-detector, and $X_j$ represents the actual distribution of the $j$th excitation-wavelength fluorescence signal.

    $H_{i,j}$ is determined by measuring in-focus images of fluorescent microspheres under a single excitation wavelength for each sub-detector. Hence, Eq. (3) can be regarded as a system of first-order equations with each fluorescent signal distribution $X_j$ as an unknown variable. As long as $N \geq M$, we can obtain the monochromatic images of each type of fluorescent signal by performing simple PSF linear decoding on the $N$ raw mixed images. We apply a Fourier transform to Eq. (3) to boost computational efficiency, converting convolutions to multiplications in the frequency domain. By matrix operations, we can derive the solution for the fluorescent signal distributions $X_j$ as follows:

    $$\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_M \end{bmatrix} = \mathcal{F}^{-1} \left\{ \begin{bmatrix} \mathcal{F}(H_{1,1}) & \mathcal{F}(H_{1,2}) & \cdots & \mathcal{F}(H_{1,M}) \\ \mathcal{F}(H_{2,1}) & \mathcal{F}(H_{2,2}) & \cdots & \mathcal{F}(H_{2,M}) \\ \vdots & \vdots & \ddots & \vdots \\ \mathcal{F}(H_{N,1}) & \mathcal{F}(H_{N,2}) & \cdots & \mathcal{F}(H_{N,M}) \end{bmatrix}^{-1} \begin{bmatrix} \mathcal{F}(I_1) \\ \mathcal{F}(I_2) \\ \vdots \\ \mathcal{F}(I_N) \end{bmatrix} \right\}, \tag{4}$$

    where $\mathcal{F}$ and $\mathcal{F}^{-1}$ represent the Fourier and inverse Fourier transforms, respectively.
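    A compact numerical sketch of Eqs. (3) and (4) for the N = M = 4 case is given below. The synthetic product-of-Gaussians PSFs, their widths and offsets, and the small diagonal guard added before inversion are all illustrative assumptions standing in for the measured PSFs:

```python
import numpy as np

# Sketch of PSF linear decoding (Eqs. (3)-(4)) with N = M = 4, using
# synthetic PSFs in place of the measured ones; all numbers are assumed.
M = N = 4
H_px = W_px = 64
yy, xx = np.mgrid[0:H_px, 0:W_px]
cy, cx = H_px / 2, W_px / 2
sig_ill, sig_det = 1.2, 1.0        # illumination / detection widths (assumed), px
d_det = [0.0, 1.0, 2.0, 3.0]       # sub-detector offsets d_i, px
d_ill = [0.0, 0.7, 1.4, 2.1]       # per-wavelength illumination offsets d_j, px

def psf(d_i, d_j):
    """H_ij as a product of shifted illumination and detection profiles;
    the overlap amplitude encodes the channel's mixing ratio."""
    ill = np.exp(-((xx - cx + d_j) ** 2) / (2 * sig_ill**2))
    det = np.exp(-((xx - cx + d_i) ** 2) / (2 * sig_det**2))
    ax = np.exp(-((yy - cy) ** 2) / (2 * sig_det**2))
    return np.fft.ifftshift(ill * det * ax)   # move PSF origin to (0, 0)

Hij = [[psf(d_det[i], d_ill[j]) for j in range(M)] for i in range(N)]

# Ground-truth channel images: disjoint point sources per channel
X = [np.zeros((H_px, W_px)) for _ in range(M)]
for j in range(M):
    X[j][12 * j + 8, [10, 20, 30, 40]] = 1.0

# Forward model, Eq. (3): each sub-detector records a PSF-mixed image
fft2, ifft2 = np.fft.fft2, np.fft.ifft2
I = [sum(np.real(ifft2(fft2(Hij[i][j]) * fft2(X[j]))) for j in range(M))
     for i in range(N)]

# Decoding, Eq. (4): invert the matrix of PSF spectra at every frequency
FH = np.moveaxis(np.array([[fft2(Hij[i][j]) for j in range(M)]
                           for i in range(N)]), (0, 1), (2, 3))  # (H, W, N, M)
FI = np.moveaxis(np.array([fft2(I[i]) for i in range(N)]), 0, 2)[..., None]
guard = 1e-9 * np.eye(M)            # small guard against near-singular
Xf = np.linalg.solve(FH + guard, FI)[..., 0]  # frequencies (an assumption)
X_dec = [np.real(ifft2(Xf[..., j])) for j in range(M)]
```

    Each decoded channel recovers only its own point sources, with negligible crosstalk from the other three, mirroring the behavior shown in Fig. 2.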

    To facilitate understanding, Fig. 2 uses synthetic images to demonstrate the process of PSF linear decoding for four-color off-axis spectral-encoded mixed images in the case where M and N are equal to 4. Specifically, we used four letters, H, U, S, and T, to represent four distinct fluorescent signals. According to Eq. (3), we synthesized the raw mixed images from the four sub-detectors illustrated in Fig. 1(c). We then successfully decoded the monochromatic images of the four fluorescent signals using Eq. (4).

    To theoretically confirm the feasibility of the PSF linear decoding, we performed simulation tests on a standard sample model composed of four fluorescent cones with distinct spatial orientations and excitation spectra, as shown in Fig. 3(a). The size of the virtual sample is 80 μm × 80 μm × 10 μm. In the simulation, we located the imaging focal plane at the central layer of the sample at z = 5 μm. We calculated the corresponding 3D PSF distribution for a 20×/NA 1.0 water immersion objective at a 0.325-μm sampling interval. Then, we generated the raw mixed images for each sub-detector by convolving the sample model with the 3D PSF.


    Figure 3. Simulation of the PSF linear decoding using the virtual sample. (a) Off-axis spectral encoding data generation. The virtual sample containing four kinds of fluorescent labels is convolved with the 3D PSF of the off-axis spectral encoding to generate the raw mixed images of each sub-detector with serious crosstalk. (b) Single-wavelength-excited images of each kind of fluorescence label in the corresponding channels, serving as the ground truth (GT) for the decoded results. (c) PSF linear-decoded results in four channels of the virtual samples with thicknesses of 0.5, 1, and 2 μm. Scale bar: 1 μm in (a), 20 μm in (b) and (c).

    To evaluate the impact of sample thickness on the PSF linear decoding, we extracted sample slices with 0.5 μm, 1 μm, and 2 μm thickness using the central layer as a reference plane. Because the extracted slices are close to the central layer and vary slowly, we adopted the single-wavelength-excited image from the middle layer of the sample model as the ground truth (GT), as shown in Fig. 3(b). We performed the PSF linear decoding on these sample slices and obtained the decoded images of all channels, as shown in Fig. 3(c). We found no crosstalk among the different channels of the decoded images of the 0.5-μm-thick sample slice. In contrast, the other results indicated that the crosstalk among the different channels of the decoded images increased with the sample thickness, likely resulting from the out-of-focus PSF and optical heterogeneities in the samples. Although 3D PSF measurement and linear decoding could in principle handle thick tissue, they are time-consuming and sensitive to multiple factors. These results indicate that the PSF linear decoding method for the off-axis spectral encoding images is suitable for thin samples, and that spectral decoding methods for thick samples require further advancement.

    C. Deep Learning Decoding

    To correctly decode the raw mixed images of thick samples, we introduced a deep learning decoding strategy based on a ResUNet architecture [26,27], as depicted in Fig. 4(a). We trained the network on a dataset of 3000 pairs of images with a size of 512×512 pixels. Each training pair included 16-bit off-axis spectral encoding raw mixed images from four sub-detectors as inputs and sequentially excited monochromatic images of each channel as outputs. We used the mean squared error (MSE) as the loss function during training. We conducted the training on a workstation with a single NVIDIA GeForce RTX 3090 card and an Intel(R) Xeon(R) Gold 5222 CPU for 12 epochs, which took about 1 h.


    Figure 4. Simulation of deep learning decoding for the virtual sample. (a) Schematic diagram of the decoding network. (b) Deep-learning-decoded results of raw mixed images for the virtual samples in Fig. 3(a) with thicknesses of 0.5 μm, 1 μm, 2 μm, and 10 μm from top to bottom. (c) Normalized intensities along the white lines in (b) and at the same positions in Fig. 3(c). Three peaks indicated by the three arrows correspond to the crosstalk from the three respective color channels. (d) Comparison of time consumption between PSF linear decoding and deep learning decoding. Scale bar: 20 μm.

    Figure 4(b) shows the deep learning decoding results for the virtual samples spanning thicknesses from 0.5 to 10 μm. The deep-learning-decoded images remain stable with increasing sample thickness. Compared with the PSF linear decoding in Fig. 3(c), the deep learning decoding method significantly reduces the spectral crosstalk among the different channels of the decoded images, regardless of the sample thickness.

    To evaluate the thickness-dependent performance of the deep learning and PSF linear decoding on the off-axis spectral encoding data, we plotted the normalized grayscale curves along the white lines in Fig. 4(b) and at the same positions in Fig. 3(c), as shown in Fig. 4(c). For sample thicknesses under 1 μm, both deep learning and 2D PSF linear decoding can effectively decode the fluorescence signals with low crosstalk. As the sample thickness grows, the crosstalk among the channels increases in the PSF linear decoding results but remains close to zero in the deep learning decoding results. The colored arrows in Fig. 4(c) mark crosstalk from the respective color channels, with only the leftmost peak representing the actual signal. This demonstrates that deep learning decoding suppresses crosstalk more effectively than PSF linear decoding when decoding the off-axis spectral encoding data of thick samples.

    To compare the calculation speeds of the PSF linear and deep learning decoding, we recorded the time consumption for the preparation and computation processes of the two decoding methods, as shown in Fig. 4(d). It took 20 s to measure the PSFs of the different excitation wavelengths at the corresponding sub-detectors for the PSF linear decoding. The preparation for the deep learning decoding included collecting the training data and training the network, taking 75 and 3600 s, respectively. The total preparation time for the deep learning decoding was 3675 s, about 184 times longer than that of the PSF linear decoding. The PSF linear and deep learning decoding took 8.1 and 0.7 s, respectively, for a four-channel raw mixed image set of 2048×26,889 pixels. Deep learning thus offers a significant speed advantage in the decoding step itself. However, the total decoding time equals the sum of the preparation and decoding times, with the latter increasing with sample volume while the former stays constant. Consequently, deep learning decoding is particularly advantageous for large-volume sample imaging. These results demonstrate that the PSF linear decoding is ideal for thin samples and deep learning decoding for thick ones, each with its own pros and cons.
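    From the timing figures above, the break-even sample size at which deep learning decoding overtakes PSF linear decoding in total time can be worked out directly:

```python
# Break-even point between the two decoding strategies, using the reported
# timings: preparation 20 s vs 3675 s; 8.1 s vs 0.7 s per four-channel set.
prep_linear, per_set_linear = 20.0, 8.1
prep_dl, per_set_dl = 3675.0, 0.7

def total_time(prep, per_set, n_sets):
    """Total decoding time: fixed preparation plus per-image-set cost."""
    return prep + per_set * n_sets

# Deep learning wins once n exceeds (3675 - 20) / (8.1 - 0.7) ~ 494 sets.
n_break_even = (prep_dl - prep_linear) / (per_set_linear - per_set_dl)
```

    For small acquisitions the 20-s PSF calibration dominates in favor of linear decoding; beyond roughly 494 image sets, the 3675-s training cost is amortized and deep learning decoding becomes faster overall.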

    3. EXPERIMENTS AND RESULTS

    A. Four-Color Off-Axis Spectral Encoding Imaging System

    We constructed an off-axis spectral encoding imaging system for four-color simultaneous line confocal imaging on a single camera, as shown in Fig. 5. Four lasers (405-06-01-0100, 488-06-01-0100, 561-06-01-0100, 633-06-01-0080, Cobolt) cover the commonly used visible excitation wavelengths in fluorescence imaging. We combine the four laser beams using three dichroic mirrors (DM1, ZT568rdc-25×36; DM2, ZT502rdc-25×36; DM3, ZT442rdc-25×36, Chroma) and then expand the beam with a telescope system of L1 (f=7.5  mm, AC050-008-A-ML, Thorlabs) and L2 (f=100  mm, AC254-100-A, Thorlabs). The system utilizes a cylindrical lens (CL, ACY254-100-A, Thorlabs) to transform the expanded laser beams into line beams at the lens’s rear focal plane and then projects the line beams onto the objective’s focal plane using the illumination tube lens (TL1, AC254-150-A-ML, Thorlabs) and the objective lens (20×, NA 1.0, XLUMPLFLN20XW, Olympus). To adjust the off-axis positions of the illumination beams with sub-pixel accuracy, we choose to change the beam positions after the beam expansion rather than before it. To independently adjust each wavelength, we split the illumination beams into four separate paths using polarizing beam splitters (PBSs, CCM1-PBS251/M, Thorlabs) and dichroic mirrors (DM4 and DM5, Di03-R405-t3-25×36; DM6 and DM7, Di02-R514-25×36, Semrock). We control the transverse position of the corresponding line illumination beam by adjusting the angle of the reflection mirror (PF10-03-P01, Thorlabs) in each path.


    Figure 5. Schematic of the four-color off-axis spectral encoding line scanning microscope. The optical path mainly consists of three parts: beam combination and expansion, independent adjustment of the positions for each beam, and detection of multicolor fluorescence signals.

    The fluorescent signals from the sample are projected onto the sCMOS detector (ORCA-Flash 4.0, Hamamatsu) by the objective lens and detection tube lens (TL2, U-TLU, Olympus). The sample is positioned on a 2D translation stage (X-axis, ABL20020; Y-axis, ANT130, Aerotech) to achieve the scanning of the entire imaging range.

    We first capture the off-axis spectral-encoded mixed images under the four-wavelength simultaneous excitation in the image acquisition. Then, we acquire single-wavelength-excitation images for each kind of fluorescent label under the corresponding wavelength illumination and emission filter as the GT images. All the subsequent experiments utilize only a minimal six-row subarray of the sCMOS detector to acquire the images with a pixel size of 0.325 μm. For convenience, we named each channel according to its excitation wavelength.

    B. Multicolor Imaging of Fluorescent Beads

    To evaluate the imaging performance of the off-axis spectral encoding, we imaged tricolor fluorescent beads with a diameter of 200 nm (Fig. 6). Because blue beads are excited inefficiently under visible light, the sample is a smear of a mixture of three types of fluorescent beads (F8811, F8810, and F8806, Thermo Fisher) with excitation/emission peaks at 505/515 nm, 580/605 nm, and 625/645 nm, respectively.


    Figure 6. Multicolor imaging of 200-nm-diameter fluorescent beads. (a) The raw-mixed (Raw, left), linear-decoded (Decoded, middle), and ground-truth (GT, right) images. The four rows are the images of the three detection channels and their merged image. 488 nm, 561 nm, and 633 nm correspond to the excitation wavelength of each channel, and the corresponding decoded results are shown in cyan, yellow, and purple, respectively. The white boxed area shows the typical signal distribution of the three types of beads. (b) The line profiles and corresponding Gaussian fittings through the beads along a 45° diagonal from the images in (a). (c) The numbers of the fluorescent beads in the raw-mixed, linear-decoded, and GT images. Scale bar: 10 μm.

    After completing the image acquisition as described, we performed PSF linear decoding on the off-axis spectral-encoded mixed images using Eq. (4) to obtain the linear-decoded images. We used the single-wavelength excitation images as the GT. In Fig. 6(a), the first column presents the raw mixed images from the three sub-detectors and their merged image from top to bottom. The second and third columns show the linear-decoded images and the GT, with the monochromatic and merged images of the three fluorescent bead colors from top to bottom. The white boxes in Fig. 6(a) highlight a typical area containing all three colors of fluorescent beads. As expected, the raw-mixed images from each sub-detector captured signals from all bead colors, yet with variations in signal intensity. The linear-decoded images contained only the beads of the corresponding color in each channel. Moreover, the signal distribution of the linear-decoded images was consistent with that of the GT, preliminarily confirming the feasibility of the method.

    To assess the resolution of the decoded images, we measured the full width at half maximum (FWHM) along a 45° diagonal for five randomly selected beads per color from the raw-mixed, linear-decoded, and GT images [Fig. 6(b)]. The FWHMs of the 488-nm-excited beads in the three types of images are 0.45±0.02 μm, 0.44±0.03 μm, and 0.42±0.02 μm (mean±s.d., n=5). The FWHMs of the 561-nm-excited beads are 0.54±0.02 μm, 0.54±0.02 μm, and 0.52±0.02 μm. The FWHMs of the 633-nm-excited beads are 0.58±0.01 μm, 0.58±0.01 μm, and 0.57±0.01 μm. The FWHMs of beads in the decoded images deviate by less than 5% compared to those in the raw-mixed and GT images per channel. Additionally, the FWHMs of the beads increase with the corresponding excitation wavelengths, which is consistent with the trend of optical resolution varying with wavelength. Figure 6(b) also shows the Gaussian fitting curves of the intensity distributions for the beads of each color across the three types of images; the three curves are generally consistent. These results indicate that the linear decoding process can correctly recover the corresponding fluorescent labels from the raw-mixed images without extra resolution loss compared with the GT.
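    The FWHM measurement can be illustrated with a short sketch: sample a Gaussian bead profile, locate the half-maximum crossings by linear interpolation, and compare with the analytic Gaussian relation FWHM = 2·sqrt(2 ln 2)·σ ≈ 2.355·σ. The profile width and sampling step below are illustrative, not the measured values.

```python
import math

# FWHM of a sampled Gaussian line profile via interpolated half-max crossings.
# sigma and step are illustrative assumptions (~0.45 μm FWHM, fine sampling).
sigma_um, step_um = 0.19, 0.02
xs = [i * step_um for i in range(-50, 51)]
profile = [math.exp(-x * x / (2 * sigma_um**2)) for x in xs]

def fwhm(xs, ys):
    """Full width at half maximum via linearly interpolated half-max crossings."""
    half = max(ys) / 2.0
    crossings = []
    for i in range(len(ys) - 1):
        y0, y1 = ys[i], ys[i + 1]
        if (y0 - half) * (y1 - half) < 0:     # profile crosses the half level
            t = (half - y0) / (y1 - y0)
            crossings.append(xs[i] + t * (xs[i + 1] - xs[i]))
    return crossings[-1] - crossings[0]

measured = fwhm(xs, profile)
expected = 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma_um
```

    For a well-sampled profile the interpolated width agrees with the analytic value to well under one camera pixel, which is why FWHM comparisons at the few-percent level, as above, are meaningful.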

    To analyze the accuracy of the PSF linear decoding, we counted the three types of beads in the raw-mixed, linear-decoded, and GT images with a size of 2048×2048 pixels [Fig. 6(c)]. We binarized the images in each channel and counted the connected components as beads. The bead counts (raw-mixed/linear-decoded/GT) in the three channels are 410/43/42, 413/190/192, and 411/180/182, respectively. The raw-mixed images show nearly uniform bead counts across the three channels, suggesting that each detection channel of the raw images acquired signals from all three types of fluorescent beads due to spectral crosstalk. In contrast, the linear-decoded and GT images yielded almost the same numbers of fluorescent beads in each of the three channels, indicating that the PSF linear decoding method can accurately identify the differently labeled fluorescent beads. These results show that the PSF linear decoding is feasible and accurate on this classic multicolor sample, with minimal loss in image quality.
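    The counting step (binarize, then count connected components) can be sketched with a simple 4-neighbor flood fill; the paper does not specify the connectivity or software used, so this stand-in is illustrative:

```python
# Bead counting sketch: binarize an intensity image, then count connected
# components with a 4-neighbor flood fill (connectivity choice is assumed).
def count_beads(image, threshold):
    """Count connected components of above-threshold pixels."""
    h, w = len(image), len(image[0])
    binary = [[v > threshold for v in row] for row in image]
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if binary[r][c] and not seen[r][c]:
                count += 1
                stack = [(r, c)]              # flood-fill one component
                seen[r][c] = True
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count

# Two bright blobs separated by background pixels
frame = [[0, 9, 9, 0, 0],
         [0, 9, 0, 0, 8],
         [0, 0, 0, 8, 8]]
```

    Each above-threshold blob, however many pixels it spans, contributes exactly one count, so crosstalk appears directly as extra components in the wrong channel.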

    C. Multicolor Imaging of E. coli

    To assess the feasibility of OSEM on biological samples, we performed four-color imaging on mixtures of E. coli Arctic strains (230191, Agilent Technologies) labeled with EBFP (cyan), EGFP (green), mScarlet (yellow), and smURFP (red), as shown in Fig. 7. To prepare the mixed E. coli samples, we centrifuged the bacterial suspensions at 8000 r/min for 2 min (M1324, RWD Life Science), removed the supernatant, and resuspended the pellets in 100 μL PBS. Next, we diluted 5 μL of this suspension with 95 μL PBS and pipetted 4 μL onto a microscope slide. We then covered it with a coverslip and sealed the edges with nail polish to prevent drying. In total, we prepared four such mixtures of the four E. coli types in different proportions.

    Figure 7.Multicolor imaging of the mixture of four E. coli strains with different fluorescent labels. (a) PSF linear-decoded image of the mixture of four E. coli strains labeled by EBFP (cyan), EGFP (green), mScarlet (yellow), and smURFP (red). (b) Zoomed merged images of the white box in (a) using raw-mixed, linear-decoded, and ground-truth data. (c) Correlation between the proportion of each E. coli species detected from the PSF linear-decoded and the GT images. Data points of different colors and shapes correspond to different types of E. coli. (d) Comparing the data acquisition time of the GT images by four-color sequential imaging and those of the decoded images of off-axis spectral encoding four-color simultaneous imaging. Scale bar: 100 μm (a), 20 μm (b).

    We collected the off-axis spectral encoding raw images across 2048×6000 pixels and performed PSF linear decoding following Eq. (3) to obtain monochromatic images for each channel. Figure 7(a) shows a merged image of the PSF linear-decoded monochromatic images over a selected 2048×2048-pixel region of the first sample. Different colors in the image represent bacteria with different fluorescent labels. Figure 7(b) shows zoomed views of the merged images of all channels using the raw-mixed, PSF linear-decoded, and GT data of the white box in Fig. 7(a). The linear-decoded image identified all E. coli exactly as the GT image did, demonstrating that PSF linear decoding can extract monochromatic images from the off-axis spectral encoding data. To assess the accuracy of PSF linear decoding in biological imaging, we further calculated the relative proportion of each E. coli species in the four mixtures from the decoded images. We randomly selected 2048×2048-pixel regions in the four samples and derived 16 proportions across the four channels. We then paired the ratios calculated from the decoded and GT images to form 16 data points, with different colors representing different E. coli species, as depicted in Fig. 7(c). We linearly fitted the data points to obtain a correlation coefficient; the closer it is to one, the higher the agreement between the decoded and GT values. The experimental correlation coefficient exceeds 0.99, validating the accuracy of the linear-decoded images.
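PSF linear decoding of the kind referenced by Eq. (3) amounts to per-pixel linear unmixing of the channel images. Since Eq. (3) is not reproduced in this excerpt, the sketch below assumes a generic calibrated mixing matrix `M` (rows: detection channels, columns: fluorophores) and solves the per-pixel least-squares problem; the matrix values are hypothetical.

```python
import numpy as np

# Hypothetical 3-channel mixing matrix; in OSEM this would be calibrated
# from the off-axis PSFs, not chosen by hand as here.
M = np.array([[1.0, 0.3, 0.1],
              [0.2, 1.0, 0.3],
              [0.1, 0.2, 1.0]])

def linear_decode(mixed, M):
    """Recover fluorophore images from mixed channel images by solving
    mixed = M @ true per pixel (least squares, clipped to non-negative)."""
    c, h, w = mixed.shape
    flat = mixed.reshape(c, -1)
    decoded, *_ = np.linalg.lstsq(M, flat, rcond=None)
    return np.clip(decoded, 0, None).reshape(c, h, w)

# Forward-simulate two point sources, then decode them back
true = np.zeros((3, 8, 8)); true[0, 2, 2] = 1.0; true[1, 5, 5] = 1.0
mixed = np.einsum('ij,jhw->ihw', M, true)
rec = linear_decode(mixed, M)
print(np.allclose(rec, true, atol=1e-8))  # True
```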

    To highlight the speed benefit of off-axis spectral encoding, we compared its imaging speed with sequential monochromatic imaging over a 0.67 mm × 1.95 mm area. Our method took 31 s, 5.8 times faster than the 180 s required for sequential imaging [Fig. 7(d)]. The improvement exceeds the anticipated factor of four because sequential imaging also requires time for switching wavelengths and filters. These results demonstrate that OSEM enables efficient and reliable simultaneous multicolor imaging of thin biological samples.

    D. Multicolor Imaging of Brain Slice

    To evaluate the ability of OSEM to image thick tissues, we performed four-color simultaneous imaging on a 5-μm-thick BALB/c (000651, Jackson Laboratory) mouse brain slice by off-axis spectral encoding, as shown in Fig. 8. The sample contains four fluorescence signals: nuclei (blue), neuronal cytoplasm (green), neuronal nuclei (magenta), and astrocytes (red), labeled by DAPI, IF488, IF555, and IF647, respectively.

    Figure 8.Multicolor imaging of a 5-μm-thick BALB/c mouse brain slice. (a) Coronal image of the brain slice reconstructed by deep learning decoding, which contains four fluorescence signals of the nucleus (blue), neuron cytoplasm (green), neuron nucleus (magenta), and astrocyte (red). (b) Zoomed images of the yellow rectangular box in (a), showing the raw-mixed, PSF linear-decoded, deep-learning-decoded, and GT images from left to right. (c) Statistical results of structural similarity (SSIM) of the raw-mixed, linear-decoded, and deep-learning-decoded images with GT images. (d) Zoomed images of the square in (b). From left to right are the raw-mixed, linear-decoded, wide-field, line confocal, deep-learning-decoded, and GT images. The merged images and the decoded images of 405-, 488-, 561-, and 633-nm-excitation channels are from top to bottom. (e) Signal intensity profiles along the corresponding white lines in (d). Two black arrows indicate a bright and a weak protrusion. The black box indicates another weak protrusion, as indicated by a white arrow in (d). Scale bar: 1 mm (a), 100 μm (b), 15 μm (d).

    We captured the off-axis spectral encoding in-focus data for the 8.7  mm×5.7  mm area in 17.4 s, achieving a four-color simultaneous imaging speed of 2.85  mm2/s. Then, we performed PSF linear decoding on the raw-mixed images by Eq. (3). We trained a deep learning model on a small sample area of 6.84  mm2, dividing it into 3000 blocks of 512×512 pixels to form the training dataset. Then, we used the well-trained network to decode the entire coronal brain slice image, as shown in Fig. 8(a). We also applied LiMo [18] on monochromatic images to obtain optical-sectioning GT by subtracting edge sub-detector images from center sub-detector images.
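The LiMo subtraction step described above can be sketched as follows. This is a simplified single-edge-image form with an assumed scaling weight `alpha`, not the published LiMo algorithm [18], which operates on multiple sub-detector images.

```python
import numpy as np

def limo_section(center_img, edge_img, alpha=1.0):
    """Optical sectioning in the spirit of LiMo [18]: subtract the edge
    sub-detector image (dominated by out-of-focus background) from the
    center sub-detector image. `alpha` is an assumed scaling weight."""
    return np.clip(center_img - alpha * edge_img, 0, None)

# In-focus signal appears only in the center sub-detector; a uniform
# out-of-focus background of 0.25 appears in both.
center = np.full((4, 4), 0.25); center[1, 1] += 1.0
edge = np.full((4, 4), 0.25)
sectioned = limo_section(center, edge)
print(sectioned[1, 1], sectioned[0, 0])  # 1.0 0.0
```

The subtraction removes the shared background while preserving the in-focus signal, which is why the GT images in this section retain optical sectioning.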

    To illustrate the difference between the two decoding methods, Fig. 8(b) shows enlarged four-color merged images from the raw-mixed, linear-decoded, deep-learning-decoded, and GT data of the yellow box in Fig. 8(a). The four-channel merged image of the raw data shows white signals, indicating significant spectral crosstalk among the sub-detectors. PSF linear decoding achieved only partial unmixing, leaving significant residual crosstalk, likely because the brain slice's thickness exceeds the range of validity of the 2D PSF approximation. In contrast, deep learning decoding accurately separated the four fluorescently labeled structures, matching the GT.

    We calculated the structural similarity (SSIM) [28] of the raw-mixed, linear-decoded, and deep-learning-decoded images with the GT to quantitatively assess the spectral crosstalk on five randomly selected images with a size of 1000×1000 pixels. The SSIMs of the raw-mixed, linear-decoded, and deep-learning-decoded images were 0.360±0.010, 0.430±0.020, and 0.910±0.004, respectively, as shown in Fig. 8(c). These results demonstrate the feasibility and necessity of deep learning decoding for thick biological samples, compared with PSF linear decoding.
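The SSIM comparison above can be illustrated with a simplified implementation. Note this is a single-window (global-statistics) form of the SSIM index [28], so its values differ from the sliding-Gaussian-window SSIM used in image-processing libraries; the constants follow the standard choice of K1 = 0.01, K2 = 0.03.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Simplified single-window SSIM: global means/variances instead of the
    usual sliding Gaussian window, so values differ from library SSIM."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(0)
gt = rng.random((64, 64))
print(round(global_ssim(gt, gt), 4))  # 1.0 for identical images
print(global_ssim(gt, 1 - gt) < 0.5)  # True: low similarity when anti-correlated
```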

    To demonstrate the high quality of the deep-learning-decoded images, we compared four channel images of the raw-mixed, linear-decoded, wide-field, line confocal, deep-learning-decoded, and LiMo images of the yellow box in Fig. 8(b), as shown in Fig. 8(d). We summed the raw decoded images of the sub-detectors to generate the wide-field images.

    From the sequential monochromatic acquisitions, we selected the raw data of the middle sub-detector as the line confocal image. Figure 8(d) shows the merged images at the top, followed by the images of the individual channels. We calculated the signal-to-background ratio (SBR) of each image by segmenting signal and background regions through binarization and taking the ratio of their mean values. The SBRs of the raw-mixed, linear-decoded, wide-field, line confocal, deep-learning-decoded, and LiMo images for the four channels were 2.7/6.5/2.9/1.8, 2.8/8.2/1.7/5.1, 2.1/7.4/5.2/9.3, 3.7/10.4/8.5/18.0, 5.0/13.8/12.9/18.0, and 6.0/15.5/14.4/18.8, respectively. The deep-learning-decoded images are the closest in quality to the LiMo images and surpass the wide-field and line confocal images, indicating that the deep learning decoding method effectively decodes the spectrally encoded data and inherits LiMo's optical sectioning ability.
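The SBR metric described above can be sketched as follows. The thresholding rule (mean plus two standard deviations) is an assumption for illustration; the paper only states that signal and background are separated by binarization.

```python
import numpy as np

def signal_to_background(img, threshold=None):
    """SBR as described in the text: binarize the image, then take the ratio
    of the mean intensity of signal pixels to that of background pixels.
    The default threshold (mean + 2*std) is an assumed choice."""
    if threshold is None:
        threshold = img.mean() + 2 * img.std()
    signal = img[img > threshold]
    background = img[img <= threshold]
    return signal.mean() / background.mean()

# Toy frame: background level 10 with a bright 3x3 feature of level 100
img = np.full((32, 32), 10.0)
img[10:13, 10:13] = 100.0
print(round(signal_to_background(img), 1))  # 10.0
```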

    To highlight the advantage of high SBR, we plotted the intensity profiles of the astrocyte signals along the dashed lines in Fig. 8(d), as shown in Fig. 8(e). Although all methods resolve the bright protrusion on the far left, its adjacent faint protrusion is indistinguishable from the background in the wide-field image due to the lack of optical sectioning. The other methods resolve this weak protrusion, though the raw-mixed and linear-decoded images exhibit higher residual background and spectral crosstalk. Deep learning decoding achieves background suppression closest to that of the GT and superior to the line confocal scheme.

    The black box in Fig. 8(e) shows an astrocyte protrusion with an FWHM of 0.93 μm, indicated by the white arrow in Fig. 8(d). To assess the resolution of OSEM on finer biological structures, we randomly selected five similar protrusions and measured their FWHMs. The values in the raw-mixed, deep-learning-decoded, and GT images were 2.76±0.43 μm, 0.82±0.08 μm, and 0.82±0.09 μm, respectively. The consistency between deep learning decoding and the GT demonstrates that OSEM expands the number of simultaneous imaging channels without sacrificing the resolution of the GT in biological samples. These results demonstrate that line-illumination off-axis spectral encoding imaging with deep learning decoding enables high-quality four-color simultaneous imaging of biological tissues using a single camera.

    4. CONCLUSION

    We proposed the off-axis spectral encoding imaging method to achieve four-color simultaneous imaging in a single camera without sacrificing imaging speed or quality. We exploited a single illumination spot's natural intensity modulation difference across off-axis detection positions to generate spectral encoding by adjusting the multicolor excitation beams with distinct off-axis offsets. We theoretically derived the process of off-axis spectral encoding and demonstrated two decoding methods, PSF linear decoding and deep learning decoding, through simulations. Our imaging of fluorescent beads, bacterial mixtures, and a four-color labeled brain slice demonstrates OSEM's ability to image four colors simultaneously on a single camera without additional time consumption. These results indicate that off-axis spectral encoding is reliable and efficient for high-quality multicolor simultaneous microscopy without extra dispersive devices, modulators, or cameras.

    The fabrication process of the sCMOS camera constrains its subarray mode to a minimum of eight sub-detectors. Theoretically, OSEM can therefore image eight excitation-spectrum-specific colors simultaneously by focusing each illumination wavelength on a different sub-detector in the minimum subarray, without sacrificing frame rate or imaging resolution. The challenge is to precisely and efficiently adjust the positions of eight excitation beams; dispersive elements such as diffraction gratings could be designed to simplify the illumination path. The deep learning decoding may require retraining when the signal features or imaging parameters change; future advances in transfer learning or other deep learning techniques may address this issue and improve the generalizability of the decoding. OSEM still requires prior information or network training to decode the raw images, and we plan to develop decoding algorithms that avoid these preparation steps. With these improvements, OSEM could become a powerful high-throughput tool for biological research, such as mRNA imaging, microbial community analysis, and whole-brain optical imaging.

    Jiangjiang Zhao, Jing Zhang, Zhangheng Ding, Bolin Lu, Ke Peng, Jie Yang, Hui Gong, Qingming Luo, Jing Yuan, "Simultaneous multicolor imaging using off-axis spectral encoding in a single camera without sacrificing frame rate," Photonics Res. 13, 1925 (2025)

    Paper Information

    Category: Imaging Systems, Microscopy, and Displays

    Received: Jan. 9, 2025

    Accepted: Apr. 19, 2025

    Published Online: Jul. 1, 2025

    The Author Email: Jing Yuan (yuanj@hust.edu.cn)

DOI: 10.1364/PRJ.555248

CSTR: 32188.14.PRJ.555248