Event cameras detect intensity changes rather than absolute intensity, recording variations as a stream of “events.” Intensity reconstruction from these sparse events remains a significant challenge. Previous approaches focused on transforming motion-induced events into videos or achieving intensity imaging for static scenes through modulation devices at the acquisition end. In this paper, we present inter-event interval microscopy (IEIM), a paradigm-shifting technique enabling static and dynamic fluorescence imaging through photon flux-to-temporal encoding, which integrates a pulsed-light modulation device into a microscope equipped with an event camera. We also develop the inter-event interval (IEI) reconstruction algorithm for IEIM, which quantifies the time intervals between consecutive events at each pixel. With a fixed threshold in the event camera, this time interval directly encodes intensity. The integration of pulse modulation enables IEIM to achieve static and dynamic fluorescence imaging with a fixed event camera. We evaluate the state-of-the-art performance of IEIM on simulated and real-world data in both static and dynamic scenes. We also demonstrate that IEIM achieves high-dynamic-range, high-speed imaging at 800 Hz in mimetic dynamic mouse brain tissues. Furthermore, we show that IEIM enables imaging the movements of in vivo freshwater euglenae at 500 Hz.
1. INTRODUCTION
In recent years, a novel type of neuromorphic sensor, known as an event camera, has been developed to mimic the dynamic perception capabilities of the retinal periphery. Unlike conventional cameras that capture scene intensity, event cameras independently detect intensity changes at each pixel and record these changes as a stream of “events.” Once the intensity change at a pixel exceeds a predefined threshold, the event camera outputs an event as a four-dimensional tuple, including a timestamp, pixel coordinates, and event polarity. This innovative design circumvents the exposure time limitations of traditional cameras, providing extremely high temporal resolution. Additionally, compared to traditional cameras, event cameras offer advantages such as a high dynamic range and low power consumption, demonstrating significant potential for applications in microscopy [1,2]. However, while the sparse event output of event cameras reduces the requirements for transmission bandwidth, it also leads to the loss of original intensity information, thus constraining their further application in certain microscopy scenarios [3]. To mitigate this limitation, the dynamic and active pixel vision sensor (DAVIS) [4] was developed by incorporating an active pixel sensor into the event camera, allowing it to output both events and intensity images. By leveraging this fused data, numerous traditional computer vision tasks, such as image restoration [5], video interpolation [6], and object detection and tracking [7], have experienced significant performance enhancements. This improvement mainly utilizes the key advantages of the event stream, such as the high temporal resolution and extensive dynamic range, to compensate for the limitations of conventional frame-based images.
Nevertheless, integrating two sensors into a single chip in DAVIS leads to a concomitant reduction in sensitivity and resolution [8], both of which are crucial for microscopic observations that typically require high sensitivity and high resolution.
Some research efforts have attempted to directly convert event streams to videos, which can be broadly categorized into three main types. (1) Traditional methods. These methods primarily rely on gradient information provided by events [9], constraints from the optical flow equation [10], and strong assumptions or prior knowledge [11]. Additionally, some studies [12] have also explored the direct integration of events for reconstruction, which offers good time efficiency. However, all these methods invariably suffer from severe artifacts, edge losses, and intensity distortions. (2) Learning-based methods. With the rapid development of deep learning, it has gradually been introduced into the field of event-based reconstruction [13–16], significantly improving performance in dynamic scenes compared to traditional methods. However, neural networks require large amounts of data for training, and obtaining such large datasets is often challenging. To address this problem, early researchers developed event simulators like ESIM [17] to generate events. These generated events, along with corresponding image frames, can be used to train networks. Rebecq et al. [18] trained a convolutional neural network model, named E2VID, on synthetic data for end-to-end event-based reconstruction, greatly enhancing the quality of reconstructed videos. Cadena et al. [14] enhanced the reconstruction details of E2VID, while Zhang et al. [19] extended its performance in low-light scenarios. Other studies [15,20,21] have improved event-based reconstruction from different directions. Some have even achieved event-based reconstruction through self-supervised methods [22]. However, these approaches still face the reconstruction quality issues typical in traditional methods, lacking perceptual realism. Additionally, they are generally limited to events generated by motion, whether from camera or object movement. (3) Photography-based methods. 
Due to the inherent loss of intensity information in event cameras, researchers have explored incorporating traditional photography methods during the imaging process to enhance the event camera’s imaging capabilities. He et al. [23] significantly improved the perceptual capability of event cameras by introducing a rotating prism at the acquisition end, which enhanced reconstruction quality to a certain extent. Bao et al. [24] introduced a controllable aperture at the acquisition end to regulate the intensity changes perceived by the event camera, achieving high-quality imaging of static scenes with event cameras. However, these methods can adversely impact acquisition efficiency, as the modulation at the acquisition end results in the loss of some fluorescence signals excited from the sample, making them unsuitable for microscopy. Moreover, the mechanical structure of the modulation devices limits imaging speed. Therefore, developing an efficient method that enables event cameras to achieve both static and dynamic imaging in microscopy remains a challenging problem.
In fluorescence microscopy, the bit depth of the sensor determines the maximum range of detectable fluorescence signals, defining the minimum and maximum intensities that can be detected. However, the distribution and concentration of fluorescent protein expression in biological samples can vary across several orders of magnitude [25]. This difference can cause the actual dynamic range of the excited fluorescence to exceed the sensor’s dynamic range. For instance, in neuronal imaging, the size disparity between cell bodies and neurons, as well as varying densities among cell clusters, can result in a scene with a high dynamic range [26]. In such scenarios, traditional imaging methods may fail, leading to indistinguishable structures in saturated areas and the loss of crucial structural information within noise. Therefore, expanding the dynamic range of imaging is crucial for fluorescence microscopy. The most intuitive method is to use a multi-exposure acquisition strategy [27], where multiple images of the same scene are captured at different exposure levels and then combined using fusion algorithms to reconstruct a high-dynamic-range image from several low-dynamic-range images. This method has essentially become a standard in photography [28], and its basic principles have been extended to fluorescence microscopy. Vinegoni et al. [26] introduced a multi-exposure acquisition strategy in confocal two-photon microscopy, enabling high-dynamic-range imaging without additional acquisition time. However, this approach faces challenges in aligning multiple low-dynamic-range images [29], with the additional difficulty of achieving high-speed, high-dynamic-range imaging. Therefore, there is an urgent need for a method that can achieve high-speed and high-dynamic-range microscopy.
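The multi-exposure acquisition strategy described above can be sketched in a few lines. This is a minimal illustration under our own simplifying assumptions (a hypothetical `saturation` cutoff and plain averaging of per-exposure radiance estimates), not the fusion algorithm of Refs. [26–28]:

```python
def fuse_hdr(exposures, times, saturation=0.95):
    """Naive multi-exposure HDR fusion: for each pixel, average the
    radiance estimates (pixel value / exposure time) over all exposures
    in which that pixel is not saturated. `exposures` is a list of 2D
    lists with values scaled to [0, 1]; `times` are exposure times."""
    h, w = len(exposures[0]), len(exposures[0][0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            samples = [img[i][j] / t
                       for img, t in zip(exposures, times)
                       if img[i][j] < saturation]
            # if saturated everywhere, fall back to the shortest exposure
            out[i][j] = (sum(samples) / len(samples)) if samples else 1.0 / min(times)
    return out
```

With a short and a long exposure of the same scene, non-saturated pixels agree on the radiance estimate, while saturated long-exposure pixels are excluded and recovered from the short exposure.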
Fluorescence microscopy exploits fluorophore excitation-emission dynamics to resolve sample structures [30], with conventional intensity-based modalities [31] encoding molecular density through photon flux modulation. However, an event stream does not contain any intensity information, with the only intensity-related element being the recorded timestamp. Event-driven imaging presents a paradigm shift by encoding structural information temporally rather than spatially. We present inter-event interval microscopy (IEIM), a novel modality leveraging the asynchronous temporal resolution of neuromorphic vision sensors to map fluorophore density distributions through precisely timed excitation pulses and event interval analysis. In IEIM, the excitation light intensity of the sample is modulated in high-frequency, low-amplitude pulses rather than remaining constant. By leveraging the high-speed capability of the event camera, data are collected in the form of an event stream. The time intervals between adjacent events can reflect the structural information of the sample. By employing this strategy, IEIM enables both static and dynamic fluorescence imaging using a fixed event camera. Experiments on both real-world and simulated data in static and dynamic scenes demonstrate the state-of-the-art (SOTA) performance of IEIM. Compared to traditional frame-based camera methods, it offers a higher dynamic range, lower bandwidth, and higher speed. Furthermore, IEIM achieves high-speed and high-dynamic-range imaging at 800 Hz in mimetic dynamic mouse brain tissues. Additionally, we demonstrate its capability to capture the movements of in vivo freshwater euglenae at 500 Hz.
2. PRINCIPLES AND METHODS
A. Optical Setup
A custom-designed pulse-modulation-based event-driven fluorescence microscope was built around an Olympus IX-73 microscope stand (IX-73, Olympus America); the schematic of the optical setup is presented in Fig. 1. The excitation was performed using a continuous-wave laser at a wavelength of 488 nm (OBIS 488 nm LS 150 mW, Coherent) or 642 nm (2RU-VFL-P-2000-642-B1R, MPB Communications). The excitation beam passed through an acousto-optic tunable filter (AOTF, AOTFnC-400.650-TN, AA Opto-Electronic) and an excitation filter (FF01-390/482/563/640-25, Semrock) for power control and beam cleanup, respectively. Subsequently, the excitation passed through a beam expander module and illuminated the fluorescence sample over a certain field of view. The illumination modulation was realized via a data acquisition card (National Instruments, USB-6343), which sends a voltage signal to the blanking channel of the AOTF to control the output power. Corresponding control codes were custom-written in the LabVIEW environment (National Instruments, 64 bit, LabVIEW 2020). The fluorescence collection was carried out using an Olympus UPLAPO100XOHR oil immersion objective or LUCPlanFLN air objective, and imaged onto the event camera (EVK4 HD, Prophesee) or the reference sCMOS camera (Fusion-BT, Hamamatsu) by a relay lens module including an emission filter (ZET405/488/561/640mv2, Chroma).
Figure 1. Pipeline of IEIM. (a) The IEIM data collection device. It employs a periodic pulsed modulation of light intensity with a period $T$. The collected events are processed according to the (b) inter-event interval (IEI) principle, where intervals between events reflect the intensity. (c) IEI calculations are performed on all event streams, and an image is selected from each cycle to represent the intensity at that specific moment.
B. Principles of Event Cameras
Event cameras differ from traditional cameras in that they do not operate at a fixed frame rate. Instead, all pixels on the sensor work asynchronously, independently responding to changes in intensity at each pixel and recording these changes in the form of events. An event $e_k$ is represented as a four-dimensional tuple $e_k = (x_k, y_k, t_k, p_k)$, with $k$ indicating the $k$-th event, $(x_k, y_k)$ representing the pixel’s coordinates on the sensor, and $t_k$ indicating the time when the event was triggered. Once the intensity change on a pixel exceeds a predefined threshold $C$, an event will be triggered, which can be expressed as
$$\Delta L(x_k, y_k, t_k) = L(x_k, y_k, t_k) - L(x_k, y_k, t_k - \Delta t_k) = p_k\,C, \quad (1)$$
where $L = \log(I)$ represents the log-transformed pixel intensity, $\Delta t_k$ is the duration between the current event and the previous event at the same pixel, and $p_k \in \{+1, -1\}$ is the polarity signifying the intensity change. Here, $p_k = +1$ represents an increase in intensity and $p_k = -1$ represents a decrease. The event list of pixel $(x, y)$ in the time domain can be represented as $E_{x,y} = \{e_1, e_2, \ldots, e_N\}$. This unique design of recording and data output equips event cameras with several advantages, including high temporal resolution, low bandwidth, low latency, and high dynamic range. These features enable high-speed recording of dynamic scene information while maintaining relatively low bandwidth.
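The trigger rule of Eq. (1) can be made concrete with a minimal single-pixel simulation. This is a sketch under our own assumptions (no refractory period, no noise, unit-free intensities); `generate_events` and its defaults are illustrative, not Prophesee's pixel model:

```python
import math

def generate_events(intensity, times, threshold=0.2):
    """Simulate one event-camera pixel per Eq. (1): fire an event whenever
    the log intensity moves by `threshold` relative to the level latched
    at the previous event. Returns a list of (timestamp, polarity)."""
    events = []
    log_ref = math.log(intensity[0])      # log level latched at the last event
    for I, t in zip(intensity[1:], times[1:]):
        delta = math.log(I) - log_ref
        while abs(delta) >= threshold:    # a large change fires several events
            p = 1 if delta > 0 else -1
            events.append((t, p))
            log_ref += p * threshold      # advance the reference level
            delta = math.log(I) - log_ref
    return events
```

For an exponentially rising intensity, the log intensity grows linearly, so the pixel fires positive events at a constant rate; this dependence of event timing on the rate of intensity change is exactly what IEIM exploits.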
C. Principles of Inter-event Interval Microscopy
According to the principles introduced in Eq. (1), an event camera responds only to changes in brightness, including any factor that causes such changes. Common reconstruction algorithms, like E2VID [18], typically recover intensity images from events generated by motion. However, event data inherently does not record light intensity, with the only information related to intensity being the timestamp recorded in the event data. This raises the question: can we directly infer the structural information of a sample from the timestamps? To answer this, it is essential to consider the fundamental principles of fluorescence microscopy, which rely on inconsistencies in the distribution density of fluorescent molecules that result in different intensities of fluorescence being excited by different structures within the sample. The intensity difference is merely a visualization of the densities of fluorescent molecules in the sample. Therefore, the core idea is to establish a relationship between the densities of fluorescent molecules and the event’s timestamps.
Suppose that the fluorescence efficiency of a single fluorescent molecule is fixed. When the excitation light intensity suddenly changes, differences in densities of fluorescent molecules mainly affect the rate of intensity change in the sample. Event cameras, which are sensitive to these changes, can effectively capture this variation. With a fixed threshold in an event camera, the rate of intensity change corresponds to the time interval between consecutive events in the event stream or the firing frequency of the events. Consider a scenario where the initial light intensity applied to the sample is zero, meaning the sample is not yet excited. At a certain moment, a light intensity with an instantaneous increase is applied to excite the sample. Since the excitation fluorescence cannot increase instantaneously, the intensity will rise more rapidly in regions with higher densities of fluorescent molecules and more slowly in regions with lower densities. In the event records, regions where intensity changes rapidly will show smaller time intervals between consecutive events, i.e., higher firing frequencies, whereas regions with slower intensity changes will have larger time intervals, i.e., lower firing frequencies. Thus, we can directly represent the different densities of fluorescent molecules in the sample using the time intervals between consecutive events, without the need for traditional intensity reconstruction.
D. Implementation of IEI
The critical component of our method is the pulse modulation of the light source, which involves varying the excitation light intensity in the form of high-frequency pulses, similar to rapidly switching the light source on and off. In our setup, an AOTF is employed as the pulse modulation device, as illustrated in Fig. 1. Additionally, other devices with similar functionalities, such as electro-optic modulators (EOMs) and acousto-optic modulators (AOMs), can also be employed. The AOTF functions as an electronic switch, enabling high-frequency, stable on-off operations. When integrated into the excitation light path, it modulates the excitation light to produce a pulsed variation on the sample. The temporal fluctuation in light intensity on the sample can be described by the following equation:
$$I_e(t) = \begin{cases} I_0, & nT \le t < nT + T/2, \\ 0, & nT + T/2 \le t < (n+1)T, \end{cases} \qquad n = 0, 1, 2, \ldots, \quad (2)$$
where $I_0$ represents the maximum excitation light intensity, and $T$ denotes the period of the pulse modulation. By adjusting the value of $T$, the acquisition speed of IEIM can be controlled, while the value of $I_0$ should be appropriately calibrated based on the specific sample. The intensity $I_0$ must exceed the light intensity threshold required to trigger an event, but it should not be excessively high. A detailed derivation will be provided below.
Suppose the fluorescence efficiency of a single fluorescent molecule is fixed at $\eta$, where fluorescence efficiency refers to the ratio of the number of fluorescent photons emitted by the fluorophore to the number of excitation photons absorbed, and the density of fluorescent molecules varies across different structural regions within a biological sample. Such variations are essential for revealing the structural information of the sample in microscopic imaging. For the imaging sensor plane, let $\rho(x, y)$ denote the density of fluorescent molecules in the sample region that is imaged at the pixel position $(x, y)$. The primary objective of fluorescence imaging is to determine $\rho(x, y)$. Under an excitation light intensity of $I_0$, the fluorescence intensity detected by the imaging sensor plane in traditional imaging can be expressed as
$$F(x, y) = \eta\,\rho(x, y)\,I_0, \quad (3)$$
where $F(x, y)$ represents the fluorescence intensity at the pixel position $(x, y)$ on the sensor plane. Since $\eta$ and $I_0$ are typically constant within a given scene, $F(x, y)$ is directly proportional to $\rho(x, y)$. Therefore, the intensity image recorded by the sensor effectively reflects the density of fluorescent molecules, which corresponds to the structural information of the sample. However, in event cameras, the absence of direct intensity recording makes it challenging to reconstruct high-quality structural information of the sample.
In IEIM, we introduce modulation of the excitation light intensity so that during imaging, the light intensity no longer remains constant at $I_0$ but varies between zero and $I_0$ in a pulsed manner. When the light intensity abruptly shifts from zero to $I_0$, the sensor detects a continuous increase in intensity due to the bandwidth limitations of the photodiode and the fluorescence delay time, rather than an instantaneous change. Since the intensity change at the detection end is equivalent to the change at the excitation end, we can incorporate the light intensity variation into the equation, and express it as follows:
$$F(x, y, t) = \eta\,\rho(x, y)\,I_e(t), \qquad I_e(t) = k\,t, \quad (4)$$
where $I_e(t)$ represents the excitation intensity that continuously increases over time with a rate $k$, and $F(x, y, t)$ denotes the fluorescence intensity at the pixel position $(x, y)$ on the sensor plane at time $t$. In event cameras, this intensity information is not recorded directly. According to Eq. (1), we can provide a circuit-level explanation for IEIM as follows:
$$C = \log\!\big(F(x, y, t_2) + \epsilon\big) - \log\!\big(F(x, y, t_1) + \epsilon\big), \quad (5)$$
where $C_d$ represents the capacitance of the photodiode, and $\epsilon$ denotes the dark-level offset of the logarithmic photoreceptor, which is determined by $C_d$ and the dark current; during the rising edge of each pulse only positive events are triggered, so $p = +1$. When the value of $F(x, y, t)$ is kept as small as possible, approaching zero (i.e., $F \ll \epsilon$), Eq. (5) can be simplified to
$$C = \frac{F(x, y, t_2) - F(x, y, t_1)}{\epsilon}. \quad (6)$$
Substituting Eq. (4) into Eq. (6) yields
$$C = \frac{\eta\,\rho(x, y)\,k\,(t_2 - t_1)}{\epsilon}. \quad (7)$$
In the event stream, $t_1$ and $t_2$ correspond to adjacent event timestamps $t_i$ and $t_{i+1}$ at pixel position $(x, y)$, where $t_i$ denotes the timestamp of the $i$-th event at pixel position $(x, y)$ and $t_{i+1}$ denotes the timestamp of the $(i+1)$-th event at pixel position $(x, y)$. Substituting $t_i$ and $t_{i+1}$ into $t_1$ and $t_2$, Eq. (7) can be further expressed as
$$C = \frac{\eta\,\rho(x, y)\,k\,(t_{i+1} - t_i)}{\epsilon}. \quad (8)$$
It can be further written as
$$\rho(x, y) = \frac{C\,\epsilon}{\eta\,k}\cdot\frac{1}{t_{i+1} - t_i}. \quad (9)$$
Since $C$, $\epsilon$, $\eta$, and $k$ are constants, the density of fluorescent molecules $\rho(x, y)$ is inversely proportional to the time interval $t_{i+1} - t_i$ between adjacent events at position $(x, y)$. Thus, the time interval between adjacent events can be used to represent the structural information of the sample.
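The inverse-interval relation derived above reduces to a few lines per pixel. A minimal sketch of the idea (our own, not the released IEIM implementation; averaging all intervals within one excitation pulse is an assumption, and real data would additionally need per-cycle windowing as in Fig. 1(c)):

```python
def iei_intensity(timestamps):
    """Estimate the relative fluorophore density at one pixel from the
    sorted timestamps of its positive events during one excitation pulse:
    density is proportional to 1 / (inter-event interval)."""
    if len(timestamps) < 2:
        return 0.0  # fewer than two events: no interval, treat as background
    intervals = [t2 - t1 for t1, t2 in zip(timestamps, timestamps[1:])]
    return 1.0 / (sum(intervals) / len(intervals))
```

A pixel firing every 1 ms is thus assigned four times the value of a pixel firing every 4 ms, with no conventional intensity reconstruction involved.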
According to the assumptions underlying Eq. (6), IEIM requires the light intensity to fluctuate at high frequencies near zero. When the light intensity is relatively high, that is, $F(x, y, t) \gg \epsilon$ (the dark-level offset in Eq. (5)), Eq. (5) is no longer approximately equal to Eq. (6) but instead approximates the following expression:
$$C = \log F(x, y, t_2) - \log F(x, y, t_1). \quad (10)$$
Substituting Eq. (4) into Eq. (10), it can be rewritten as
$$C = \log\frac{\eta\,\rho(x, y)\,k\,t_{i+1}}{\eta\,\rho(x, y)\,k\,t_i} = \log\frac{t_{i+1}}{t_i}, \qquad \text{i.e.,} \quad t_{i+1} = e^{C}\,t_i. \quad (11)$$
According to Eq. (11), the event data in this scenario no longer correlate with the density of fluorescent molecules within the sample, but instead conform to a fixed event generation pattern in which consecutive timestamps form a geometric sequence. Therefore, in fluorescence microscopy, to directly extract structural information from the event stream, it is essential to modulate the excitation light intensity with high-frequency fluctuations around zero.
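The contrast between the two regimes (equal spacing proportional to 1/density near zero intensity; a fixed geometric timestamp pattern at high intensity) can be checked numerically. The constants below are arbitrary illustrative values, not calibrated parameters:

```python
import math

def intervals_low_regime(rho, C=0.2, eps=1.0, eta=1.0, k=1.0, n=4):
    """Low-intensity regime: events are equally spaced, and the interval
    eps*C/(eta*rho*k) is inversely proportional to the density rho."""
    dt = eps * C / (eta * rho * k)
    return [dt] * n

def timestamps_high_regime(C=0.2, t1=1e-4, n=4):
    """High-intensity regime: t_{i+1} = exp(C) * t_i, a geometric
    pattern carrying no information about rho."""
    ts = [t1]
    for _ in range(n - 1):
        ts.append(ts[-1] * math.exp(C))
    return ts
```

Doubling the density halves the interval in the first regime, whereas in the second regime the timestamp ratio is fixed by the threshold alone.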
3. EXPERIMENTS
A. Simulated Data
To validate the theoretical performance of the IEIM method, we collected both static and dynamic fluorescence images. The static images included actin, membrane, mitochondria, and nucleus images from Ref. [32], while the dynamic images included Byn protein, sourced from Ref. [33]. These datasets were used to simulate time series modulated by pulsed light, which were then fed into the event simulator DVS-Voltmeter [34] to generate synthetic event streams. Since methods for network reconstruction are typically based only on events generated by motion, we also generated a time series by periodically shaking the images without light modulation. These sequences were similarly input into the DVS-Voltmeter to generate event streams for network reconstruction.
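The pulsed-light time series fed to the event simulator can be sketched as below. This is our own minimal version: the linear rising edge is a stand-in for the continuous intensity rise seen at the detector, and the cycle/step counts are placeholders rather than the simulation parameters actually used with DVS-Voltmeter:

```python
def modulated_sequence(image, n_cycles=4, steps_per_cycle=20):
    """Build a pulsed-light frame sequence from a static scene: within
    each modulation cycle, intensity ramps linearly from dark to `image`
    (rising edge) and is dark during the second half-period.
    `image` is a 2D list of floats in [0, 1]."""
    frames = []
    half = steps_per_cycle // 2
    for _ in range(n_cycles):
        for s in range(steps_per_cycle):
            gain = s / half if s < half else 0.0
            frames.append([[v * gain for v in row] for row in image])
    return frames
```

Each resulting frame list can then be handed to an event simulator, which converts the per-pixel brightness ramps into synthetic event streams.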
B. Performance Evaluation of the IEIM in Static and Dynamic Scenes Using Simulated Data
We have successfully achieved event-based fluorescence microscopy for both static and dynamic scenes, a capability that currently has no equivalent in microscopy. In contrast, several macroscopic methods, such as E2VID [18], E2VID+ [15], FireNet+ [15], and ET-Net [20], can convert motion-generated events to intensity. Therefore, we compared our method with these approaches using synthesized data. For the comparison, we utilized synthesized data without light modulation and employed pre-trained public models. The quantitative comparison results are presented in Table 1, demonstrating that our method significantly outperforms others, yielding results closest to the original images. Our method achieved an improvement of over 97% in MSE, at least 20% in LPIPS, and more than 14% in SSIM. The relatively lower improvement in the SSIM metric can be attributed to the intrinsic background signal filtering characteristic of IEIM, thereby limiting the extent of enhancement. However, background signals in fluorescence microscopy are typically regarded as noise, making this filtering beneficial for fluorescence microscopy.
Table 1. Comparison of Methods on Different Metrics^a

| Data | E2VID (MSE / SSIM / LPIPS) | E2VID+ (MSE / SSIM / LPIPS) | ET-Net (MSE / SSIM / LPIPS) | FireNet+ (MSE / SSIM / LPIPS) | IEIM (Ours) (MSE / SSIM / LPIPS) |
| --- | --- | --- | --- | --- | --- |
| mem(norm_s) | 0.224 / 0.072 / 0.607 | 0.153 / 0.059 / 0.689 | 0.236 / 0.050 / 0.748 | 0.247 / 0.053 / 0.779 | **0.001 / 0.888 / 0.152** |
| mit(norm_s) | 0.199 / 0.182 / 0.550 | 0.191 / 0.153 / 0.630 | 0.192 / 0.160 / 0.617 | 0.218 / 0.145 / 0.603 | **0.002 / 0.838 / 0.139** |
| act(ivert_s) | 0.153 / 0.652 / 0.475 | 0.249 / 0.583 / 0.541 | 0.160 / 0.645 / 0.473 | 0.141 / 0.593 / 0.486 | **0.003 / 0.907 / 0.162** |
| nuc(ivert_s) | 0.177 / 0.652 / 0.450 | 0.243 / 0.604 / 0.560 | 0.141 / 0.691 / 0.496 | 0.177 / 0.642 / 0.504 | **0.003 / 0.748 / 0.343** |
| byn(norm_d) | 0.514 / 0.037 / 0.811 | 0.245 / 0.035 / 0.653 | 0.323 / 0.048 / 0.837 | 0.654 / 0.037 / 0.665 | **0.007 / 0.915 / 0.106** |

^a The best-performing result in each group is highlighted in bold; “mem” stands for the first three letters of the sample name, “norm” indicates normal images, “ivert” refers to inverted-color images, “s” denotes static scenes, and “d” signifies dynamic scenes. For example, “mem(norm_s)” denotes a normal static image of the sample “membrane.”
Qualitative comparison results on synthesized data are shown in Figs. 2 and 3, covering static and dynamic scenes, respectively. In both scenarios, our method achieved the most accurate grayscale reconstruction. Additionally, due to the differences between macroscopic imaging and fluorescence microscopy, we applied a color reversal to the fluorescence images to make them more suitable for comparison with SOTA models. The results, as seen in the actin and nucleus samples in Fig. 2, further demonstrate that our method significantly outperforms SOTA methods. Current SOTA methods exhibit intensity distortions in their results, with the most critical issue being the completely inaccurate reconstruction of background intensity. We further compared the reconstruction results of state-of-the-art methods directly on the modulated data, which can be found in Appendix B.
Figure 2.Comparison in synthesized static scenes. (a) Qualitative comparison results on normal synthesized data: membrane and mitochondria. (b) Qualitative comparison results on color-reversed synthesized data: actin and nucleus. Our method significantly outperforms SOTA methods.
Figure 3.Comparison in synthesized motion scenes. Our method significantly outperforms SOTA methods. The red arrow in the GT image points to the area with significant motion. Scale bar, 5 μm.
C. IEIM Enables Event-Based Fluorescence Microscopy in Static Scenes
To evaluate the performance of the IEIM method on real-world data, we collected event streams from the sample under varying modulation frequencies and light powers. To compare with SOTA methods, we also collected event streams relying solely on slow sample movement without light modulation for model reconstruction. The models used for comparison were the publicly available pre-trained models. The qualitative results are shown in Fig. 4. As the modulation frequency increases while the light power remains constant, the imaging details of IEIM gradually decrease, as depicted in the first row of Fig. 4, where the details in the zoomed-in images are progressively lost. This is primarily due to the event camera’s sampling capability being limited by light power. When the light power is fixed, increasing the frequency shortens the exposure time, leading to insufficient data capture. This phenomenon is similar to the information loss in traditional cameras when the exposure time is too short. However, this issue can be mitigated by increasing the light power. As shown in the second row of Fig. 4, when the modulation frequency is fixed and the light power is gradually increased, we observe that the imaging details increase and the dynamic range is also enhanced. This implies that achieving high-speed imaging requires matching the light power appropriately. Furthermore, when employing SOTA models for reconstruction, as shown in the last row of Fig. 4, the reconstruction quality of our method significantly outperforms these methods, which suffer from notable power distortion issues and completely inaccurate background reconstruction.
Figure 4.Comparison in real-world static scenes. (a) When the power is fixed, the imaging quality varies with the modulation frequency. Excessively high frequencies can lead to a decline in imaging quality, as evident from the zoomed-in details in the second row, indicated by the arrow. (b) When the modulation frequency is fixed, the imaging quality varies with the power. Increasing power can improve imaging quality, but too much power can cause detail loss, as evident from the zoomed-in details in the second row, indicated by the arrow. (c) The reconstruction results of SOTA models on data with a power of 6 mW and a modulation frequency of 500 Hz. The reference image is a region captured by the sCMOS camera that overlaps with the field of view of the event camera. Imaging with IEIM requires appropriate calibration between modulation frequency and light power, and the resulting image quality far exceeds that of SOTA methods. The power in the lower right corner corresponds to the laser power at the output of the objective lens. Scale bar, 2 μm in (a)–(c).
D. IEIM Enables Fluorescence Microscopy in Dynamic Scenes
To evaluate the dynamic imaging capabilities of the IEIM method, we collected event streams from rapidly moving samples by systematically translating the sample stage. We then compared the reconstruction performance of our method with SOTA methods. Due to the limitations of the light power in our acquisition system, we collected event streams from the fast-moving samples at a maximum modulation frequency of 800 Hz. Theoretically, this frequency could be further increased with enhanced light power. Since the events generated by the fast-moving samples in dynamic scenes are similar to the data used in network training, the same data was used for both network reconstruction and our method in this experiment. As shown in Fig. 5, our method achieves a dynamic imaging speed of 800 frames per second in dynamic scenes while maintaining high imaging quality. Compared to the SOTA methods, our method demonstrates superior reconstruction quality. The curve shown in Fig. 5(e) corresponds to the yellow solid line in Figs. 5(a)–5(d). This region contains clear structural edges that are sensitive to reconstruction quality. It can be observed that the grayscale curve reconstructed by IEIM exhibits higher gradients and clearer structural transitions, indicating good edge preservation, whereas other methods show obvious smoothing or structural distortion in this area, resulting in significant detail loss. As shown in Fig. 5(f), the selected position corresponds to the white dashed line in Fig. 5(a), recording the intensity response of this region over time. This location lies on the trajectory of a moving sample. The curve obtained by IEIM clearly reflects the brightness changes during the sample’s passage, demonstrating good temporal resolution and consistent variation trends. In contrast, comparison methods exhibit discontinuous signals, and fail to accurately characterize the dynamic process. 
Additionally, we compared the reconstruction performance of the network methods at different temporal resolutions. The results show that in dynamic scenes, increasing the reconstruction’s temporal resolution significantly reduces motion blur, but this improvement comes at the cost of losing many details. In fluorescence microscopy, the observed samples typically do not exhibit uniform global motion. Typically, only a small portion of the structure is in rapid motion, while other areas remain static or move slowly. In such cases, network reconstruction methods present a trade-off between temporal resolution and reconstruction detail. In contrast, our method enables dynamic imaging with high global temporal resolution, effectively addressing this challenge.
Figure 5.Comparison in real-world motion scenes. (a) Dynamic imaging results of IEIM at a modulation frequency of 800 Hz and a light power of 9 mW. The second row of images shows detailed views of the white boxed areas in the first row of images, with the dashed lines indicating fixed positions. (b) Reconstruction results of E2VID at a temporal resolution of 1.25 ms. (c) Reconstruction results of E2VID at a temporal resolution of 0.62 ms. (d) Reconstruction results of E2VID at a temporal resolution of 0.31 ms. Our method achieves high-quality imaging with high temporal resolution without the trade-off between temporal resolution and reconstruction detail seen in methods like E2VID. The values in parentheses indicate the temporal resolution of the reconstruction, and the arrow points to the significant difference caused by the variation in reconstruction time resolution. (e) Intensity variation curve along the yellow solid line in (a)–(d). (f) Intensity variation over time at the white dashed line position in (a). Scale bar, 2 μm in (a)–(d).
E. IEIM Achieves Dynamic Fluorescence Imaging of Euglenae
To further validate the performance of IEIM on in vivo samples, we captured the motion of microorganisms using the system described in Section 2.A. At a modulation frequency of 500 Hz, we recorded event streams of common freshwater euglenae movement and applied IEIM for image reconstruction. The results are shown in Fig. 6, which illustrates the motion sequence within 60 ms at different temporal resolutions (6 ms, 4 ms, and 2 ms). In this scenario, the euglenae exhibited heterogeneous motion: some remained relatively static, while others moved rapidly. Compared with deep-learning-based approaches, which struggled to maintain reconstruction quality across both slow and fast motion regimes, as discussed in Section 3.D, IEIM successfully achieved high-speed imaging of euglenae movement while preserving the integrity of static spatial details. We also compared the reconstruction results of state-of-the-art methods, which can be found in Appendix D.
Figure 6.Dynamic imaging of common freshwater euglenae. The dynamic processes are presented at three different temporal resolutions, with a minimum of 2 ms, demonstrating that IEIM can effectively image dynamic processes. The images in the second row are magnified views of the regions enclosed by the white boxes in the first row, with dashed lines serving as reference markers to provide a more intuitive visualization of the motion.
F. Modulation in IEIM Enables High-Quality Event-Based Fluorescence Microscopy
To validate the effectiveness of our modulation method, we applied our proposed reconstruction method directly to both non-modulated and modulated light-intensity data, including both synthesized and real-world data. The results are shown in Fig. 7: only the data acquired with light modulation yield a high-quality reconstruction. This is primarily because our reconstruction method relies on the light intensity starting from zero and then varying in a pulsed manner. In motion-generated events, however, the pixel already carries a nonzero intensity at the start, which violates the assumption in Eq. (6) that the intensity approaches zero; as a result, the reconstruction deviates from the expected outcome.
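The role of the zero-start assumption can be illustrated numerically. In the toy model below (a linear-ramp sketch with illustrative names, not the system's actual response), events fire whenever the log signal rises by a fixed threshold: with an unknown nonzero baseline, inter-event intervals no longer encode absolute intensity, whereas a common near-zero starting level, as enforced by pulsed modulation, restores the encoding.

```python
import numpy as np

def event_times(base, brightness, threshold=0.2, n=8):
    """Toy model: signal s(t) = base + brightness * t; an event fires each
    time log(s) rises by `threshold` (fixed contrast threshold)."""
    ks = np.arange(1, n + 1)
    return base * (np.exp(threshold * ks) - 1.0) / brightness

def iei_estimate(times):
    """Intensity estimate as the inverse median inter-event interval."""
    return 1.0 / np.median(np.diff(times))

# Motion-generated events: the baseline at t = 0 is nonzero and unknown,
# so equal brightness can yield very different estimates.
a = iei_estimate(event_times(base=1.0, brightness=5.0))
b = iei_estimate(event_times(base=3.0, brightness=5.0))
print(a / b)  # ~3.0 despite identical brightness: the estimate is biased

# Pulsed modulation resets every pixel to the same near-zero baseline,
# so the ratio of estimates recovers the true brightness ratio.
c = iei_estimate(event_times(base=0.01, brightness=5.0))
d = iei_estimate(event_times(base=0.01, brightness=15.0))
print(d / c)  # ~3.0, matching the 15/5 brightness ratio
```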
Figure 7.Comparison experiments with and without the modulation device. p.d.: pulsed modulation device. (a)–(c) Results in synthesized data. (d)–(f) Results in real-world data. The reference image is a region captured by the sCMOS camera that overlaps with the field of view of the event camera. Scale bar, 5 μm.
In this paper, we have established the first imaging framework for event-based spatiotemporal fluorescence microscopy, named inter-event interval microscopy (IEIM), achieving both static and dynamic imaging through a fundamental reengineering of both the acquisition physics and the reconstruction algorithm. Our proposed IEIM incorporates three key innovations: (1) a novel event-driven fluorescence imaging architecture that integrates a synchronized pulsed modulation device into the excitation pathway, enabling precise temporal control of the illumination intensity; (2) a temporal reconstruction algorithm based on inter-event time intervals that achieves high-dynamic-range imaging with microsecond-level temporal resolution, eliminating the extensive training datasets required by supervised-learning-based reconstruction approaches; (3) a unified event-based computational framework that preserves the event camera's high-temporal-resolution capture of dynamic biological processes while enabling static structure recording, a capability that conventional event-driven imaging paradigms have long struggled to achieve. Experiments on both static and dynamic data demonstrate that the pulsed modulation strategy significantly enhances the imaging performance of the event camera and enables both static and dynamic fluorescence imaging with a fixed event camera. Furthermore, experiments on both simulated and real-world data demonstrate the state-of-the-art performance of IEIM and its ability to achieve high-speed, high-dynamic-range imaging at 800 Hz. IEIM greatly expands the potential imaging scenarios for event cameras. Future research will focus on collecting imaging data under varying modulation frequencies and powers, establishing the relationship between imaging quality and these parameters, and predicting the optimal imaging parameters for different samples.
Despite these advantages, the performance of IEIM is subject to several practical constraints. A key limitation lies in the trade-off between modulation power and frequency: increasing the imaging speed via a higher modulation frequency requires a proportional rise in modulation power to avoid signal degradation or information loss. This balance is further constrained by the fluorescence lifetime, which sets a physical limit on how fast the modulation can occur without compromising signal integrity. Additionally, the event camera's data throughput imposes an upper bound on the imaging rate, particularly for dense samples, where excessive event rates may lead to system instability or failure. These factors necessitate careful calibration of the system parameters based on the sample characteristics and hardware capabilities.
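The throughput constraint above can be sanity-checked before an experiment by comparing the expected event rate (active pixels × events per modulation cycle × modulation frequency) against the camera's rated peak event rate. A back-of-the-envelope helper; all numbers below are illustrative placeholders, not specifications of the system used in this work.

```python
def event_budget(active_pixels, events_per_cycle, mod_freq_hz, camera_peak_eps):
    """Estimated event rate and whether it fits the camera's throughput.

    camera_peak_eps: rated peak event rate of the camera, in events/second.
    """
    rate = active_pixels * events_per_cycle * mod_freq_hz  # events per second
    return rate, rate <= camera_peak_eps

# Example: 100k active pixels, 5 events per pulse, 800 Hz modulation,
# checked against a hypothetical 1 Geps peak throughput.
rate, ok = event_budget(100_000, 5, 800, 1e9)
print(f"{rate:.0f} events/s, feasible: {ok}")
```

Dense samples raise the effective `active_pixels` and `events_per_cycle` terms, which is why the imaging rate must be reduced (or the field of view cropped) when the budget is exceeded.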
APPENDIX A: SAMPLE PREPARATION AND MOUNTING
1. Mice Brain Tissues
The collection of mice brain tissues used in this study was approved by the Institutional Animal Care and Use Committee of the Westlake University (Approval No. 24-136-ZYD). To obtain slices with a fixed thickness, adult Thy1-eGFP-M mice [Tg(Thy1-EGFP)MJrs/J, Stock No. 007788, The Jackson Laboratory] were deeply anaesthetized by an intraperitoneal injection of pentobarbital (8 μL/g) and transcardially perfused with phosphate buffered saline (PBS) and 4% paraformaldehyde (PFA). The brains were removed and post-fixed in PFA for 24 h, after which the tissue was stored in . 50-μm-thick coronal slices were cut on a vibratome (VT1200S, Leica Microsystems). The slices were washed three times, for 10 min each time, in with a gentle shake. Then, slices were incubated in blocking buffer [3% BSA (Jackson ImmunoResearch, Catalog No. 001000-162) and 0.2% Triton X-100 in ] for 3 h at room temperature. The blocked slices were incubated with rabbit anti-GFP antibody [A11122, Thermo Fisher; diluted to 1:500 in dilution buffer (1% BSA and 0.2% Triton X-100 in )] at 4°C for one day. After washing three times in the wash buffer (0.1% Triton X-100 in ), the slices were incubated with goat anti-rabbit Alexa Fluor 647 (A21245, Thermo Fisher; diluted to 1:500 in dilution buffer) at 4°C overnight. Following another three washes with wash buffer, slices were securely affixed to pre-cleaned high-precision coverslips, and PBS-immersed samples were subsequently mounted in a custom-designed sample holder. The fixed samples were excited by a 642 nm laser and imaged using a UPLAPO100XOHR oil immersion objective.
2. Euglenae
Common freshwater euglenae were cultured in water using open containers, with continuous illumination provided and ambient temperature maintained above 24°C. Before imaging, 1 mL of euglenae-containing liquid was collected and subjected to centrifugation. Following centrifugation, euglenae at the bottom of the tube were resuspended in 100 μL deionized water through repeated pipette mixing. The homogeneous suspension was then drop-cast onto pre-cleaned high-precision coverslips. These mounted living samples were excited by a 488 nm laser and imaged with a LUCPlanFLN air objective.
APPENDIX B: DIRECT COMPARISON ON MODULATED EVENTS
To further compare the performance of our method with state-of-the-art methods such as E2VID [18], E2VID+ [15], FireNet+ [15], and ET-Net [20], we conducted additional tests directly on the light-modulated data using open-source pre-trained models. The results on synthetic data are shown in Fig. 8, and the results on real data are shown in Fig. 9. The reconstruction results of these networks are similar to those obtained from the shaking-based data, exhibiting noticeable issues such as intensity distortion and loss of details. In contrast, our method significantly outperforms the state-of-the-art methods in terms of reconstruction quality.
Figure 8.Comparison of our method and existing network-based methods on synthesized static data with light modulation.
To further eliminate the potential influence of pre-training data on the models, we retrained the SSL-E2VID [22] and SPADE-E2VID [14] using light-modulated data. The comparison of the original network, the retrained network, and our method is presented in Fig. 10. The results indicate that although the retrained SSL-E2VID network performs slightly better than the original network and reconstructs more details, the reconstructed intensity still exhibits significant distortion. In contrast, the retrained SPADE-E2VID shows substantial enhancement compared to its pretrained counterpart under the same conditions. Nevertheless, it remains inferior to IEIM in terms of structural fidelity, edge preservation, and robustness against noise. Overall, our method continues to deliver the best performance, closely resembling the original images.
Figure 10.Comparison of the original network, the retrained network, and our method.
APPENDIX D: COMPARISON OF DIFFERENT METHODS ON IMAGING THE MOVEMENTS OF IN VIVO FRESHWATER EUGLENAE
We used IEIM to capture event streams of common freshwater euglenae movements and compared it with state-of-the-art methods, including E2VID [18], E2VID+ [15], SPADE-E2VID [14], and SSL-E2VID [22]. We first applied the open-source models directly to reconstruct the modulated data, and we also retrained SSL-E2VID and SPADE-E2VID on the modulated data before using them for reconstruction. As shown in Fig. 11, the default open-source models struggled with our modulated data, leading to severe intensity distortion. The curve in Fig. 11(b) corresponds to the yellow solid line crossing the region in Fig. 11(a) that contains a typical freshwater euglena structure, the contractile vacuole. The IEIM method accurately restores the intensity fluctuations of this microstructure, presenting continuous and clear intensity variations, whereas the other methods fail to preserve this structural feature effectively, displaying considerable blurring or signal attenuation. Similarly, Fig. 11(c) presents the temporal intensity curve at the red dashed line position in Fig. 11(a), which records two consecutive passages of a euglena's tail through that position. The IEIM method fully preserves the details of both signal changes with a high signal-to-noise ratio and a continuous response, while the other methods clearly fail to capture the signal. Although the retrained models significantly improved reconstruction quality and restored more details compared to the original versions, they still exhibited intensity distortion. In contrast, IEIM achieves superior reconstruction quality.
Figure 11.Comparison of different methods on dynamic imaging of in vivo freshwater euglenae. (a) Reconstruction motion sequences spanning over 60 ms at temporal resolutions of 6 ms, 4 ms, and 2 ms. The arrow points to the fastest-moving part. (b) Intensity variation curve along the yellow solid line in (a). (c) Intensity variation over time at the red dashed line position in (a). Scale bar, 5 μm.
APPENDIX E: COMPARING SPATIAL RESOLUTION AND SENSITIVITY WITH DAVIS CAMERAS
The DAVIS camera is a hybrid sensor that combines grayscale imaging and event detection within each pixel. While this enables simultaneous frame and event capture, it introduces two main limitations: (1) shared pixel resources reduce event detection precision, as the dual-function design compromises sensitivity and temporal resolution compared to dedicated event cameras; (2) limited spatial resolution, constrained by the hardware design (e.g., the DAVIS346 offers only 346 × 260 pixels), makes it unsuitable for high-resolution imaging tasks.
In contrast, the Prophesee event camera used in IEIM features a dedicated event-based architecture with an optimized pixel design, offering superior temporal resolution (microsecond-level), higher spatial resolution (up to 1280 × 720 pixels), and greater photon sensitivity.
To compare their performance, we built a synchronized acquisition system using a 50:50 beam splitter to mount both sensors on a microscope imaging the same field of view. As shown in Fig. 12, under a 30 ms accumulation window, the Prophesee camera captures sharper structures and finer edges [Fig. 12(a)], while the DAVIS yields blurrier results with missing details [Fig. 12(b)]. Over a 1 s interval, the Prophesee camera also produces a denser and more uniform event distribution [Fig. 12(c)], demonstrating its superior responsiveness and lower noise. Overall, the Prophesee camera outperforms the DAVIS in both design and practical imaging quality; combined with IEIM, it enables high-fidelity imaging in both static and dynamic scenarios.
Figure 12.Comparison of imaging performance between DAVIS346 and Prophesee EVK4 cameras. (a) Event accumulation map from the Prophesee camera over a 30 ms time window. (b) Event map and grayscale image from the DAVIS camera under the same conditions. (c) Event density maps generated by both cameras over a 1 s duration. The lower-left corners of (a) and (b) indicate the image resolution.
APPENDIX F: COMPLEXITY ANALYSIS OF THE IEIM ALGORITHM
Assume the total number of events is $N$, the spatial resolution of the event camera is $H \times W$, and the total temporal range is $T$, with event timestamps $t_i$ falling within the interval $[0, T]$. According to the IEIM processing pipeline, the events are first divided into frames based on a fixed temporal interval $\Delta t$, giving $F = \lceil T/\Delta t \rceil$ frames.
Each event is then assigned to a frame by computing its corresponding index $k_i = \lfloor t_i/\Delta t \rfloor$, which has a computational complexity of $O(N)$.
Subsequently, for each frame and each pixel, the temporal interval between consecutive events must be calculated. This involves searching for the nearest previous and next events per pixel location, which is efficiently implemented via a binary search. The complexity of this step is $O(FHW \log n_{x,y})$, where $n_{x,y}$ is the number of events occurring at a given pixel position across frames, and in the worst case $n_{x,y} = N$.
Before a binary search can be performed, a preprocessing step is required to collect and organize the events for each pixel across all frames, which has a complexity of $O(N)$. Therefore, the overall time complexity of the IEIM algorithm is $O(N + FHW \log N)$.
In practice, the computation of inter-event intervals for each pixel can be parallelized across frames and pixels. If all pixels in each frame are processed in parallel, the effective complexity can be significantly reduced to $O(F \log N)$. Since the number of events within a single modulation cycle is typically not large, real-time reconstruction is potentially achievable through parallel acceleration.
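The pipeline above can be sketched directly in NumPy: the per-event frame assignment is the $O(N)$ step, and `np.searchsorted` supplies the per-pixel binary search. This is an illustrative sketch; the function name, event layout, and microsecond timestamps are our assumptions, not the released implementation.

```python
import numpy as np

def iei_frames(x, y, t, T, dt, H, W):
    """Per-frame intensity estimate from inter-event intervals.

    x, y : integer pixel coordinates of each event
    t    : event timestamps in microseconds, within [0, T)
    Returns an array of shape (n_frames, H, W) holding 1 / (latest
    inter-event interval) per pixel; zero where a pixel has seen
    fewer than two events by the end of a frame.
    """
    n_frames = int(np.ceil(T / dt))
    frames = np.zeros((n_frames, H, W))

    # O(N) per-event frame index, as in the complexity analysis above
    # (shown for reference; the per-frame search below uses timestamps).
    k = (t / dt).astype(int)

    # Preprocessing: order events so each pixel's timestamps are sorted.
    order = np.lexsort((t, y, x))
    x, y, t = x[order], y[order], t[order]

    for px, py in {(int(a), int(b)) for a, b in zip(x, y)}:
        ts = t[(x == px) & (y == py)]
        for f in range(n_frames):
            # Binary search: last event at or before the end of frame f.
            i = np.searchsorted(ts, (f + 1) * dt, side="right") - 1
            if i >= 1:  # need two events to form an interval
                frames[f, py, px] = 1.0 / (ts[i] - ts[i - 1])
    return frames

# Two pixels: one event every 1000 us (bright) vs every 4000 us (dim).
t_bright = np.arange(0.0, 20000.0, 1000.0)
t_dim = np.arange(0.0, 20000.0, 4000.0)
t = np.concatenate([t_bright, t_dim])
x = np.concatenate([np.zeros(len(t_bright), int), np.ones(len(t_dim), int)])
y = np.zeros(len(t), int)

out = iei_frames(x, y, t, T=20000.0, dt=5000.0, H=1, W=2)
print(out[-1, 0, 0] / out[-1, 0, 1])  # 4.0: the bright pixel is 4x the dim one
```

The outer loop over pixels is embarrassingly parallel, which is what the reduction to $O(F \log N)$ per parallel worker refers to.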
[3] B. Xiong, C. Su, Z. Lin. Real-time parameter evaluation of high-speed microfluidic droplets using continuous spike streams. Proceedings of the 32nd ACM International Conference on Multimedia, 6833-6841(2024).
[5] L. Sun, C. Sakaridis, J. Liang. Event-based fusion for motion deblurring with cross-modal attention. European Conference on Computer Vision, 412-428(2022).
[6] L. Sun, C. Sakaridis, J. Liang. Event-based frame interpolation with ad-hoc deblurring. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 18043-18052(2023).
[7] J. Zhang, X. Yang, Y. Fu. Object tracking by jointly exploiting frame and event domain. Proceedings of the IEEE/CVF International Conference on Computer Vision, 13043-13052(2021).
[10] P. Bardow, A. J. Davison, S. Leutenegger. Simultaneous optical flow and intensity estimation from an event camera. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 884-892(2016).
[12] C. Scheerlinck, N. Barnes, R. Mahony. Continuous-time intensity estimation using event cameras. Asian Conference on Computer Vision, 308-324(2018).
[13] H. Rebecq, R. Ranftl, V. Koltun. Events-to-video: bringing modern computer vision to event cameras. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3857-3866(2019).
[15] T. Stoffregen, C. Scheerlinck, D. Scaramuzza. Reducing the sim-to-real gap for event cameras. Computer Vision–ECCV 2020: 16th European Conference, 534-549(2020).
[16] Q. Liang, X. Zheng, K. Huang. Event-diffusion: event-based image reconstruction and restoration with diffusion models. Proceedings of the 31st ACM International Conference on Multimedia, 3837-3846(2023).
[17] H. Rebecq, D. Gehrig, D. Scaramuzza. ESIM: an open event camera simulator. Conference on Robot Learning, 969-982(2018).
[19] S. Zhang, Y. Zhang, Z. Jiang. Learning to see in the dark with events. Computer Vision–ECCV 2020: 16th European Conference, 666-682(2020).
[20] W. Weng, Y. Zhang, Z. Xiong. Event-based video reconstruction using transformer. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2563-2572(2021).
[21] L. Wang, Y.-S. Ho, K.-J. Yoon. Event-based high dynamic range image and very high frame rate video generation using conditional generative adversarial networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10081-10090(2019).
[22] F. Paredes-Vallés, G. C. De Croon. Back to event basics: self-supervised learning of image reconstruction for event cameras via photometric constancy. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3446-3455(2021).
[34] S. Lin, Y. Ma, Z. Guo. DVS-voltmeter: stochastic process-based event simulator for dynamic vision sensors. European Conference on Computer Vision, 578-593(2022).