Event cameras detect intensity changes rather than absolute intensity, recording variations as a stream of “events.” Intensity reconstruction from these sparse events remains a significant challenge. Previous approaches focused on transforming motion-induced events into videos or achieving intensity imaging for static scenes through modulation devices at the acquisition end. In this paper, we present inter-event interval microscopy (IEIM), a paradigm-shifting technique enabling static and dynamic fluorescence imaging through photon flux-to-temporal encoding, which integrates a pulsed-light modulation device into a microscope equipped with an event camera. We also develop the inter-event interval (IEI) reconstruction algorithm for IEIM, which quantifies the time intervals between consecutive events at each pixel. With a fixed threshold in the event camera, this time interval directly encodes intensity. The integration of pulse modulation enables IEIM to achieve static and dynamic fluorescence imaging with a fixed event camera. We evaluate the state-of-the-art performance of IEIM on simulated and real-world data in both static and dynamic scenes. We also demonstrate that IEIM achieves high-dynamic-range, high-speed imaging at 800 Hz in mimetic dynamic mouse brain tissues. Furthermore, we show that IEIM enables imaging the movements of in vivo freshwater euglenae at 500 Hz.
1. INTRODUCTION
In recent years, a novel type of neuromorphic sensor, known as an event camera, has been developed to mimic the dynamic perception capabilities of the retinal periphery. Unlike conventional cameras that capture scene intensity, event cameras independently detect intensity changes at each pixel and record these changes as a stream of “events.” Once the intensity change at a pixel exceeds a predefined threshold, the event camera outputs an event as a four-dimensional tuple, including a timestamp, pixel coordinates, and event polarity. This innovative design circumvents the exposure time limitations of traditional cameras, providing extremely high temporal resolution. Additionally, compared to traditional cameras, event cameras offer advantages such as a high dynamic range and low power consumption, demonstrating significant potential for applications in microscopy [1,2]. However, while the sparse event output of event cameras reduces the requirements for transmission bandwidth, it also leads to the loss of original intensity information, thus constraining their further application in certain microscopy scenarios [3]. To mitigate this limitation, the dynamic and active pixel vision sensor (DAVIS) [4] was developed by incorporating an active pixel sensor into the event camera, allowing it to output both events and intensity images. By leveraging this fused data, numerous traditional computer vision tasks, such as image restoration [5], video interpolation [6], and object detection and tracking [7], have experienced significant performance enhancements. This improvement mainly utilizes the key advantages of the event stream, such as the high temporal resolution and extensive dynamic range, to compensate for the limitations of conventional frame-based images.
Nevertheless, integrating two sensors into a single chip in DAVIS leads to a concomitant reduction in sensitivity and resolution [8], both of which are crucial for microscopic observations that typically require high sensitivity and high resolution.
Some research efforts have attempted to directly convert event streams to videos, which can be broadly categorized into three main types. (1) Traditional methods. These methods primarily rely on gradient information provided by events [9], constraints from the optical flow equation [10], and strong assumptions or prior knowledge [11]. Additionally, some studies [12] have also explored the direct integration of events for reconstruction, which offers good time efficiency. However, all these methods invariably suffer from severe artifacts, edge losses, and intensity distortions. (2) Learning-based methods. With the rapid development of deep learning, it has gradually been introduced into the field of event-based reconstruction [13–16], significantly improving performance in dynamic scenes compared to traditional methods. However, neural networks require large amounts of data for training, and obtaining such large datasets is often challenging. To address this problem, early researchers developed event simulators like ESIM [17] to generate events. These generated events, along with corresponding image frames, can be used to train networks. Rebecq et al. [18] trained a convolutional neural network model, named E2VID, on synthetic data for end-to-end event-based reconstruction, greatly enhancing the quality of reconstructed videos. Cadena et al. [14] enhanced the reconstruction details of E2VID, while Zhang et al. [19] extended its performance in low-light scenarios. Other studies [15,20,21] have improved event-based reconstruction from different directions. Some have even achieved event-based reconstruction through self-supervised methods [22]. However, these approaches still face the reconstruction quality issues typical in traditional methods, lacking perceptual realism. Additionally, they are generally limited to events generated by motion, whether from camera or object movement. (3) Photography-based methods. 
Due to the inherent loss of intensity information in event cameras, researchers have explored incorporating traditional photography methods during the imaging process to enhance the event camera’s imaging capabilities. He et al. [23] significantly improved the perceptual capability of event cameras by introducing a rotating prism at the acquisition end, which enhanced reconstruction quality to a certain extent. Bao et al. [24] introduced a controllable aperture at the acquisition end to regulate the intensity changes perceived by the event camera, achieving high-quality imaging of static scenes with event cameras. However, these methods can adversely impact acquisition efficiency, as the modulation at the acquisition end results in the loss of some fluorescence signals excited from the sample, making them unsuitable for microscopy. Moreover, the mechanical structure of the modulation devices limits imaging speed. Therefore, developing an efficient method that enables event cameras to achieve both static and dynamic imaging in microscopy remains a challenging problem.
In fluorescence microscopy, the bit depth of the sensor determines the maximum range of detectable fluorescence signals, defining the minimum and maximum intensities that can be detected. However, the distribution and concentration of fluorescent protein expression in biological samples can vary across several orders of magnitude [25]. This difference can cause the actual dynamic range of the excited fluorescence to exceed the sensor’s dynamic range. For instance, in neuronal imaging, the size disparity between cell bodies and neurons, as well as varying densities among cell clusters, can result in a scene with a high dynamic range [26]. In such scenarios, traditional imaging methods may fail, leading to indistinguishable structures in saturated areas and the loss of crucial structural information within noise. Therefore, expanding the dynamic range of imaging is crucial for fluorescence microscopy. The most intuitive method is to use a multi-exposure acquisition strategy [27], where multiple images of the same scene are captured at different exposure levels and then combined using fusion algorithms to reconstruct a high-dynamic-range image from several low-dynamic-range images. This method has essentially become a standard in photography [28], and its basic principles have been extended to fluorescence microscopy. Vinegoni et al. [26] introduced a multi-exposure acquisition strategy in confocal two-photon microscopy, enabling high-dynamic-range imaging without additional acquisition time. However, this approach faces challenges in aligning multiple low-dynamic-range images [29], with the additional difficulty of achieving high-speed, high-dynamic-range imaging. Therefore, there is an urgent need for a method that can achieve high-speed and high-dynamic-range microscopy.
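The multi-exposure acquisition strategy described above can be sketched in a few lines. This is a minimal illustration under our own simplifying assumptions (a hypothetical `saturation` cutoff and plain averaging of per-exposure radiance estimates), not the fusion algorithm of Refs. [26–28]:

```python
def fuse_hdr(exposures, times, saturation=0.95):
    """Naive multi-exposure HDR fusion: for each pixel, average the
    radiance estimates (pixel value / exposure time) over all exposures
    in which that pixel is not saturated. `exposures` is a list of 2D
    lists with values scaled to [0, 1]; `times` are exposure times."""
    h, w = len(exposures[0]), len(exposures[0][0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            samples = [img[i][j] / t
                       for img, t in zip(exposures, times)
                       if img[i][j] < saturation]
            # if saturated everywhere, fall back to the shortest exposure
            out[i][j] = (sum(samples) / len(samples)) if samples else 1.0 / min(times)
    return out
```

With a short and a long exposure of the same scene, non-saturated pixels agree on the radiance estimate, while saturated long-exposure pixels are excluded and recovered from the short exposure.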
Fluorescence microscopy exploits fluorophore excitation-emission dynamics to resolve sample structures [30], with conventional intensity-based modalities [31] encoding molecular density through photon flux modulation. However, an event stream does not contain any intensity information, with the only intensity-related element being the recorded timestamp. Event-driven imaging presents a paradigm shift by encoding structural information temporally rather than spatially. We present inter-event interval microscopy (IEIM), a novel modality leveraging the asynchronous temporal resolution of neuromorphic vision sensors to map fluorophore density distributions through precisely timed excitation pulses and event interval analysis. In IEIM, the excitation light intensity of the sample is modulated in high-frequency, low-amplitude pulses rather than remaining constant. By leveraging the high-speed capability of the event camera, data are collected in the form of an event stream. The time intervals between adjacent events can reflect the structural information of the sample. By employing this strategy, IEIM enables both static and dynamic fluorescence imaging using a fixed event camera. Experiments on both real-world and simulated data in static and dynamic scenes demonstrate the state-of-the-art (SOTA) performance of IEIM. Compared to traditional frame-based camera methods, it offers a higher dynamic range, lower bandwidth, and higher speed. Furthermore, IEIM achieves high-speed and high-dynamic-range imaging at 800 Hz in mimetic dynamic mouse brain tissues. Additionally, we demonstrate its capability to capture the movements of in vivo freshwater euglenae at 500 Hz.
2. PRINCIPLES AND METHODS
A. Optical Setup
A custom-designed pulse-modulation-based event-driven fluorescence microscope was built around an Olympus IX-73 microscope stand (IX-73, Olympus America); the schematic of the optical setup is presented in Fig. 1. The excitation was performed using a continuous-wave laser at a wavelength of 488 nm (OBIS 488 nm LS 150 mW, Coherent) or 642 nm (2RU-VFL-P-2000-642-B1R, MPB Communications). The excitation beam passed through an acousto-optic tunable filter (AOTF, AOTFnC-400.650-TN, AA Opto-Electronic) and an excitation filter (FF01-390/482/563/640-25, Semrock) for power control and beam cleanup, respectively. Subsequently, the excitation passed through a beam expander module and illuminated the fluorescence sample over a certain field of view. The illumination modulation was realized via a data acquisition card (National Instruments, USB-6343), which sends a voltage signal to the blanking channel of the AOTF to control the output power. Corresponding control codes were custom-written in the LabVIEW environment (National Instruments, 64 bit, LabVIEW 2020). The fluorescence collection was carried out using an Olympus UPLAPO100XOHR oil immersion objective or LUCPlanFLN air objective, and imaged onto the event camera (EVK4 HD, Prophesee) or the reference sCMOS camera (Fusion-BT, Hamamatsu) by a relay lens module including an emission filter (ZET405/488/561/640mv2, Chroma).
Figure 1. Pipeline of IEIM. (a) The IEIM data collection device. It employs a periodic pulsed modulation of light intensity with a period $T$. The collected events are processed according to the (b) inter-event interval (IEI) principle, where intervals between events reflect the intensity. (c) IEI calculations are performed on all event streams, and an image is selected from each cycle to represent the intensity at that specific moment.
B. Principles of Event Cameras
Event cameras differ from traditional cameras in that they do not operate at a fixed frame rate. Instead, all pixels on the sensor work asynchronously, independently responding to changes in intensity at each pixel and recording these changes in the form of events. An event $e_k$ is represented as a four-dimensional tuple $e_k = (x_k, y_k, t_k, p_k)$, with $k$ indicating the $k$-th event, $(x_k, y_k)$ representing the pixel’s coordinates on the sensor, and $t_k$ indicating the time when the event was triggered. Once the intensity change on a pixel exceeds a predefined threshold $C$, an event will be triggered, which can be expressed as
$$\Delta L(x_k, y_k, t_k) = L(x_k, y_k, t_k) - L(x_k, y_k, t_k - \Delta t_k) = p_k\,C, \quad (1)$$
where $L = \log(I)$ represents the log-transformed pixel intensity, $\Delta t_k$ is the duration between the current event and the previous event at the same pixel, and $p_k \in \{+1, -1\}$ is the polarity signifying the intensity change. Here, $p_k = +1$ represents an increase in intensity and $p_k = -1$ represents a decrease. The event list of pixel $(x, y)$ in the time domain can be represented as $E_{x,y} = \{e_1, e_2, \ldots, e_N\}$. This unique design of recording and data output equips event cameras with several advantages, including high temporal resolution, low bandwidth, low latency, and high dynamic range. These features enable high-speed recording of dynamic scene information while maintaining relatively low bandwidth.
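The trigger rule of Eq. (1) can be made concrete with a minimal single-pixel simulation. This is a sketch under our own assumptions (no refractory period, no noise, unit-free intensities); `generate_events` and its defaults are illustrative, not Prophesee's pixel model:

```python
import math

def generate_events(intensity, times, threshold=0.2):
    """Simulate one event-camera pixel per Eq. (1): fire an event whenever
    the log intensity moves by `threshold` relative to the level latched
    at the previous event. Returns a list of (timestamp, polarity)."""
    events = []
    log_ref = math.log(intensity[0])      # log level latched at the last event
    for I, t in zip(intensity[1:], times[1:]):
        delta = math.log(I) - log_ref
        while abs(delta) >= threshold:    # a large change fires several events
            p = 1 if delta > 0 else -1
            events.append((t, p))
            log_ref += p * threshold      # advance the reference level
            delta = math.log(I) - log_ref
    return events
```

For an exponentially rising intensity, the log intensity grows linearly, so the pixel fires positive events at a constant rate; this dependence of event timing on the rate of intensity change is exactly what IEIM exploits.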
C. Principles of Inter-event Interval Microscopy
According to the principles introduced in Eq. (1), an event camera responds only to changes in brightness, including any factor that causes such changes. Common reconstruction algorithms, like E2VID [18], typically recover intensity images from events generated by motion. However, event data inherently does not record light intensity, with the only information related to intensity being the timestamp recorded in the event data. This raises the question: can we directly infer the structural information of a sample from the timestamps? To answer this, it is essential to consider the fundamental principles of fluorescence microscopy, which rely on inconsistencies in the distribution density of fluorescent molecules that result in different intensities of fluorescence being excited by different structures within the sample. The intensity difference is merely a visualization of the densities of fluorescent molecules in the sample. Therefore, the core idea is to establish a relationship between the densities of fluorescent molecules and the event’s timestamps.
Suppose that the fluorescence efficiency of a single fluorescent molecule is fixed. When the excitation light intensity suddenly changes, differences in densities of fluorescent molecules mainly affect the rate of intensity change in the sample. Event cameras, which are sensitive to these changes, can effectively capture this variation. With a fixed threshold in an event camera, the rate of intensity change corresponds to the time interval between consecutive events in the event stream or the firing frequency of the events. Consider a scenario where the initial light intensity applied to the sample is zero, meaning the sample is not yet excited. At a certain moment, a light intensity with an instantaneous increase is applied to excite the sample. Since the excitation fluorescence cannot increase instantaneously, the intensity will rise more rapidly in regions with higher densities of fluorescent molecules and more slowly in regions with lower densities. In the event records, regions where intensity changes rapidly will show smaller time intervals between consecutive events, i.e., higher firing frequencies, whereas regions with slower intensity changes will have larger time intervals, i.e., lower firing frequencies. Thus, we can directly represent the different densities of fluorescent molecules in the sample using the time intervals between consecutive events, without the need for traditional intensity reconstruction.
D. Implementation of IEI
The critical component of our method is the pulse modulation of the light source, which involves varying the excitation light intensity in the form of high-frequency pulses, similar to rapidly switching the light source on and off. In our setup, an AOTF is employed as the pulse modulation device, as illustrated in Fig. 1. Additionally, other devices with similar functionalities, such as electro-optic modulators (EOMs) and acousto-optic modulators (AOMs), can also be employed. The AOTF functions as an electronic switch, enabling high-frequency, stable on-off operations. When integrated into the excitation light path, it modulates the excitation light to produce a pulsed variation on the sample. The temporal fluctuation in light intensity on the sample can be described by the following equation:
$$I_e(t) = \begin{cases} I_0, & nT \le t < nT + T/2, \\ 0, & nT + T/2 \le t < (n+1)T, \end{cases} \qquad n = 0, 1, 2, \ldots, \quad (2)$$
where $I_0$ represents the maximum excitation light intensity, and $T$ denotes the period of the pulse modulation. By adjusting the value of $T$, the acquisition speed of IEIM can be controlled, while the value of $I_0$ should be appropriately calibrated based on the specific sample. The intensity $I_0$ must exceed the light intensity threshold required to trigger an event, but it should not be excessively high. A detailed derivation will be provided below.
Suppose the fluorescence efficiency of a single fluorescent molecule is fixed at $\eta$, where fluorescence efficiency refers to the ratio of the number of fluorescent photons emitted by the fluorophore to the number of excitation photons absorbed, and the density of fluorescent molecules varies across different structural regions within a biological sample. Such variations are essential for revealing the structural information of the sample in microscopic imaging. For the imaging sensor plane, let $\rho(x, y)$ denote the density of fluorescent molecules in the sample region that is imaged at the pixel position $(x, y)$. The primary objective of fluorescence imaging is to determine $\rho(x, y)$. Under an excitation light intensity of $I_0$, the fluorescence intensity detected by the imaging sensor plane in traditional imaging can be expressed as
$$F(x, y) = \eta\,\rho(x, y)\,I_0, \quad (3)$$
where $F(x, y)$ represents the fluorescence intensity at the pixel position $(x, y)$ on the sensor plane. Since $\eta$ and $I_0$ are typically constant within a given scene, $F(x, y)$ is directly proportional to $\rho(x, y)$. Therefore, the intensity image recorded by the sensor effectively reflects the density of fluorescent molecules, which corresponds to the structural information of the sample. However, in event cameras, the absence of direct intensity recording makes it challenging to reconstruct high-quality structural information of the sample.
In IEIM, we introduce modulation of the excitation light intensity so that during imaging, the light intensity no longer remains constant at $I_0$ but varies between zero and $I_0$ in a pulsed manner. When the light intensity abruptly shifts from zero to $I_0$, the sensor detects a continuous increase in intensity due to the bandwidth limitations of the photodiode and the fluorescence delay time, rather than an instantaneous change. Since the intensity change at the detection end is equivalent to the change at the excitation end, we can incorporate the light intensity variation into the equation, and express it as follows:
$$F(x, y, t) = \eta\,\rho(x, y)\,I_e(t), \qquad I_e(t) = k\,t, \quad (4)$$
where $I_e(t)$ represents the excitation intensity that continuously increases over time with a rate $k$, and $F(x, y, t)$ denotes the fluorescence intensity at the pixel position $(x, y)$ on the sensor plane at time $t$. In event cameras, this intensity information is not recorded directly. According to Eq. (1), we can provide a circuit-level explanation for IEIM as follows:
$$C = \log\!\big(F(x, y, t_2) + \epsilon\big) - \log\!\big(F(x, y, t_1) + \epsilon\big), \quad (5)$$
where $C_d$ represents the capacitance of the photodiode, and $\epsilon$ denotes the dark-level offset of the logarithmic photoreceptor, which is determined by $C_d$ and the dark current; during the rising edge of each pulse only positive events are triggered, so $p = +1$. When the value of $F(x, y, t)$ is kept as small as possible, approaching zero (i.e., $F \ll \epsilon$), Eq. (5) can be simplified to
$$C = \frac{F(x, y, t_2) - F(x, y, t_1)}{\epsilon}. \quad (6)$$
Substituting Eq. (4) into Eq. (6) yields
$$C = \frac{\eta\,\rho(x, y)\,k\,(t_2 - t_1)}{\epsilon}. \quad (7)$$
In the event stream, $t_1$ and $t_2$ correspond to adjacent event timestamps $t_i$ and $t_{i+1}$ at pixel position $(x, y)$, where $t_i$ denotes the timestamp of the $i$-th event at pixel position $(x, y)$ and $t_{i+1}$ denotes the timestamp of the $(i+1)$-th event at pixel position $(x, y)$. Substituting $t_i$ and $t_{i+1}$ into $t_1$ and $t_2$, Eq. (7) can be further expressed as
$$C = \frac{\eta\,\rho(x, y)\,k\,(t_{i+1} - t_i)}{\epsilon}. \quad (8)$$
It can be further written as
$$\rho(x, y) = \frac{C\,\epsilon}{\eta\,k}\cdot\frac{1}{t_{i+1} - t_i}. \quad (9)$$
Since $C$, $\epsilon$, $\eta$, and $k$ are constants, the density of fluorescent molecules $\rho(x, y)$ is inversely proportional to the time interval $t_{i+1} - t_i$ between adjacent events at position $(x, y)$. Thus, the time interval between adjacent events can be used to represent the structural information of the sample.
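The inverse-interval relation derived above reduces to a few lines per pixel. A minimal sketch of the idea (our own, not the released IEIM implementation; averaging all intervals within one excitation pulse is an assumption, and real data would additionally need per-cycle windowing as in Fig. 1(c)):

```python
def iei_intensity(timestamps):
    """Estimate the relative fluorophore density at one pixel from the
    sorted timestamps of its positive events during one excitation pulse:
    density is proportional to 1 / (inter-event interval)."""
    if len(timestamps) < 2:
        return 0.0  # fewer than two events: no interval, treat as background
    intervals = [t2 - t1 for t1, t2 in zip(timestamps, timestamps[1:])]
    return 1.0 / (sum(intervals) / len(intervals))
```

A pixel firing every 1 ms is thus assigned four times the value of a pixel firing every 4 ms, with no conventional intensity reconstruction involved.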
According to the assumptions underlying Eq. (6), IEIM requires the light intensity to fluctuate at high frequencies near zero. When the light intensity is relatively high, that is, $F(x, y, t) \gg \epsilon$ (the dark-level offset in Eq. (5)), Eq. (5) is no longer approximately equal to Eq. (6) but instead approximates the following expression:
$$C = \log F(x, y, t_2) - \log F(x, y, t_1). \quad (10)$$
Substituting Eq. (4) into Eq. (10), it can be rewritten as
$$C = \log\frac{\eta\,\rho(x, y)\,k\,t_{i+1}}{\eta\,\rho(x, y)\,k\,t_i} = \log\frac{t_{i+1}}{t_i}, \qquad \text{i.e.,} \quad t_{i+1} = e^{C}\,t_i. \quad (11)$$
According to Eq. (11), the event data in this scenario no longer correlate with the density of fluorescent molecules within the sample, but instead conform to a fixed event generation pattern in which consecutive timestamps form a geometric sequence. Therefore, in fluorescence microscopy, to directly extract structural information from the event stream, it is essential to modulate the excitation light intensity with high-frequency fluctuations around zero.
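The contrast between the two regimes (equal spacing proportional to 1/density near zero intensity; a fixed geometric timestamp pattern at high intensity) can be checked numerically. The constants below are arbitrary illustrative values, not calibrated parameters:

```python
import math

def intervals_low_regime(rho, C=0.2, eps=1.0, eta=1.0, k=1.0, n=4):
    """Low-intensity regime: events are equally spaced, and the interval
    eps*C/(eta*rho*k) is inversely proportional to the density rho."""
    dt = eps * C / (eta * rho * k)
    return [dt] * n

def timestamps_high_regime(C=0.2, t1=1e-4, n=4):
    """High-intensity regime: t_{i+1} = exp(C) * t_i, a geometric
    pattern carrying no information about rho."""
    ts = [t1]
    for _ in range(n - 1):
        ts.append(ts[-1] * math.exp(C))
    return ts
```

Doubling the density halves the interval in the first regime, whereas in the second regime the timestamp ratio is fixed by the threshold alone.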
3. EXPERIMENTS
A. Simulated Data
To validate the theoretical performance of the IEIM method, we collected both static and dynamic fluorescence images. The static images included actin, membrane, mitochondria, and nucleus images from Ref. [32], while the dynamic images included Byn protein, sourced from Ref. [33]. These datasets were used to simulate time series modulated by pulsed light, which were then fed into the event simulator DVS-Voltmeter [34] to generate synthetic event streams. Since methods for network reconstruction are typically based only on events generated by motion, we also generated a time series by periodically shaking the images without light modulation. These sequences were similarly input into the DVS-Voltmeter to generate event streams for network reconstruction.
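The pulsed-light time series fed to the event simulator can be sketched as below. This is our own minimal version: the linear rising edge is a stand-in for the continuous intensity rise seen at the detector, and the cycle/step counts are placeholders rather than the simulation parameters actually used with DVS-Voltmeter:

```python
def modulated_sequence(image, n_cycles=4, steps_per_cycle=20):
    """Build a pulsed-light frame sequence from a static scene: within
    each modulation cycle, intensity ramps linearly from dark to `image`
    (rising edge) and is dark during the second half-period.
    `image` is a 2D list of floats in [0, 1]."""
    frames = []
    half = steps_per_cycle // 2
    for _ in range(n_cycles):
        for s in range(steps_per_cycle):
            gain = s / half if s < half else 0.0
            frames.append([[v * gain for v in row] for row in image])
    return frames
```

Each resulting frame list can then be handed to an event simulator, which converts the per-pixel brightness ramps into synthetic event streams.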
B. Performance Evaluation of the IEIM in Static and Dynamic Scenes Using Simulated Data
We have successfully achieved event-based fluorescence microscopy for both static and dynamic scenes, a capability that currently has no equivalent in microscopy. In contrast, several macroscopic methods, such as E2VID [18], E2VID+ [15], FireNet+ [15], and ET-Net [20], can convert motion-generated events to intensity. Therefore, we compared our method with these approaches using synthesized data. For the comparison, we utilized synthesized data without light modulation and employed pre-trained public models. The quantitative comparison results are presented in Table 1, demonstrating that our method significantly outperforms others, yielding results closest to the original images. Our method achieved an improvement of over 97% in MSE, at least 20% in LPIPS, and more than 14% in SSIM. The relatively lower improvement in the SSIM metric can be attributed to the intrinsic background signal filtering characteristic of IEIM, thereby limiting the extent of enhancement. However, background signals in fluorescence microscopy are typically regarded as noise, making this filtering beneficial for fluorescence microscopy.
Table 1. Comparison of Methods on Different Metrics^a

| Data | E2VID (MSE / SSIM / LPIPS) | E2VID+ (MSE / SSIM / LPIPS) | ET-Net (MSE / SSIM / LPIPS) | FireNet+ (MSE / SSIM / LPIPS) | IEIM (Ours) (MSE / SSIM / LPIPS) |
| --- | --- | --- | --- | --- | --- |
| mem(norm_s) | 0.224 / 0.072 / 0.607 | 0.153 / 0.059 / 0.689 | 0.236 / 0.050 / 0.748 | 0.247 / 0.053 / 0.779 | **0.001 / 0.888 / 0.152** |
| mit(norm_s) | 0.199 / 0.182 / 0.550 | 0.191 / 0.153 / 0.630 | 0.192 / 0.160 / 0.617 | 0.218 / 0.145 / 0.603 | **0.002 / 0.838 / 0.139** |
| act(ivert_s) | 0.153 / 0.652 / 0.475 | 0.249 / 0.583 / 0.541 | 0.160 / 0.645 / 0.473 | 0.141 / 0.593 / 0.486 | **0.003 / 0.907 / 0.162** |
| nuc(ivert_s) | 0.177 / 0.652 / 0.450 | 0.243 / 0.604 / 0.560 | 0.141 / 0.691 / 0.496 | 0.177 / 0.642 / 0.504 | **0.003 / 0.748 / 0.343** |
| byn(norm_d) | 0.514 / 0.037 / 0.811 | 0.245 / 0.035 / 0.653 | 0.323 / 0.048 / 0.837 | 0.654 / 0.037 / 0.665 | **0.007 / 0.915 / 0.106** |

^a The best-performing result in each group is highlighted in bold; “mem” stands for the first three letters of the sample name, “norm” indicates normal images, “ivert” refers to inverted-color images, “s” denotes static scenes, and “d” signifies dynamic scenes. For example, “mem(norm_s)” denotes a normal static image of the sample “membrane.”
Qualitative comparison results on synthesized data are shown in Figs. 2 and 3, covering static and dynamic scenes, respectively. In both scenarios, our method achieved the most accurate grayscale reconstruction. Additionally, due to the differences between macroscopic imaging and fluorescence microscopy, we applied a color reversal to the fluorescence images to make them more suitable for comparison with SOTA models. The results, as seen in the actin and nucleus samples in Fig. 2, further demonstrate that our method significantly outperforms SOTA methods. Current SOTA methods exhibit intensity distortions in their results, with the most critical issue being the completely inaccurate reconstruction of background intensity. We further compared the reconstruction results of state-of-the-art methods directly on the modulated data, which can be found in Appendix B.
Figure 2.Comparison in synthesized static scenes. (a) Qualitative comparison results on normal synthesized data: membrane and mitochondria. (b) Qualitative comparison results on color-reversed synthesized data: actin and nucleus. Our method significantly outperforms SOTA methods.
Figure 3.Comparison in synthesized motion scenes. Our method significantly outperforms SOTA methods. The red arrow in the GT image points to the area with significant motion. Scale bar, 5 μm.
C. IEIM Enables Event-Based Fluorescence Microscopy in Static Scenes
To evaluate the performance of the IEIM method on real-world data, we collected event streams from the sample under varying modulation frequencies and light powers. To compare with SOTA methods, we also collected event streams relying solely on slow sample movement without light modulation for model reconstruction. The models used for comparison were the publicly available pre-trained models. The qualitative results are shown in Fig. 4. As the modulation frequency increases while the light power remains constant, the imaging details of IEIM gradually decrease, as depicted in the first row of Fig. 4, where the details in the zoomed-in images are progressively lost. This is primarily due to the event camera’s sampling capability being limited by light power. When the light power is fixed, increasing the frequency shortens the exposure time, leading to insufficient data capture. This phenomenon is similar to the information loss in traditional cameras when the exposure time is too short. However, this issue can be mitigated by increasing the light power. As shown in the second row of Fig. 4, when the modulation frequency is fixed and the light power is gradually increased, we observe that the imaging details increase and the dynamic range is also enhanced. This implies that achieving high-speed imaging requires matching the light power appropriately. Furthermore, when employing SOTA models for reconstruction, as shown in the last row of Fig. 4, the reconstruction quality of our method significantly outperforms these methods, which suffer from notable power distortion issues and completely inaccurate background reconstruction.
Figure 4.Comparison in real-world static scenes. (a) When the power is fixed, the imaging quality varies with the modulation frequency. Excessively high frequencies can lead to a decline in imaging quality, as evident from the zoomed-in details in the second row, indicated by the arrow. (b) When the modulation frequency is fixed, the imaging quality varies with the power. Increasing power can improve imaging quality, but too much power can cause detail loss, as evident from the zoomed-in details in the second row, indicated by the arrow. (c) The reconstruction results of SOTA models on data with a power of 6 mW and a modulation frequency of 500 Hz. The reference image is a region captured by the sCMOS camera that overlaps with the field of view of the event camera. Imaging with IEIM requires appropriate calibration between modulation frequency and light power, and the resulting image quality far exceeds that of SOTA methods. The power in the lower right corner corresponds to the laser power at the output of the objective lens. Scale bar, 2 μm in (a)–(c).
D. IEIM Enables Fluorescence Microscopy in Dynamic Scenes
To evaluate the dynamic imaging capabilities of the IEIM method, we collected event streams from rapidly moving samples by systematically translating the sample stage. We then compared the reconstruction performance of our method with SOTA methods. Due to the limitations of the light power in our acquisition system, we collected event streams from the fast-moving samples at a maximum modulation frequency of 800 Hz. Theoretically, this frequency could be further increased with enhanced light power. Since the events generated by the fast-moving samples in dynamic scenes are similar to the data used in network training, the same data was used for both network reconstruction and our method in this experiment. As shown in Fig. 5, our method achieves a dynamic imaging speed of 800 frames per second in dynamic scenes while maintaining high imaging quality. Compared to the SOTA methods, our method demonstrates superior reconstruction quality. The curve shown in Fig. 5(e) corresponds to the yellow solid line in Figs. 5(a)–5(d). This region contains clear structural edges that are sensitive to reconstruction quality. It can be observed that the grayscale curve reconstructed by IEIM exhibits higher gradients and clearer structural transitions, indicating good edge preservation, whereas other methods show obvious smoothing or structural distortion in this area, resulting in significant detail loss. As shown in Fig. 5(f), the selected position corresponds to the white dashed line in Fig. 5(a), recording the intensity response of this region over time. This location lies on the trajectory of a moving sample. The curve obtained by IEIM clearly reflects the brightness changes during the sample’s passage, demonstrating good temporal resolution and consistent variation trends. In contrast, comparison methods exhibit discontinuous signals, and fail to accurately characterize the dynamic process. 
Additionally, we compared the reconstruction performance of the network methods at different temporal resolutions. The results show that in dynamic scenes, increasing the reconstruction’s temporal resolution significantly reduces motion blur, but this improvement comes at the cost of losing many details. In fluorescence microscopy, the observed samples typically do not exhibit uniform global motion. Typically, only a small portion of the structure is in rapid motion, while other areas remain static or move slowly. In such cases, network reconstruction methods present a trade-off between temporal resolution and reconstruction detail. In contrast, our method enables dynamic imaging with high global temporal resolution, effectively addressing this challenge.
Figure 5.Comparison in real-world motion scenes. (a) Dynamic imaging results of IEIM at a modulation frequency of 800 Hz and a light power of 9 mW. The second row of images shows detailed views of the white boxed areas in the first row of images, with the dashed lines indicating fixed positions. (b) Reconstruction results of E2VID at a temporal resolution of 1.25 ms. (c) Reconstruction results of E2VID at a temporal resolution of 0.62 ms. (d) Reconstruction results of E2VID at a temporal resolution of 0.31 ms. Our method achieves high-quality imaging with high temporal resolution without the trade-off between temporal resolution and reconstruction detail seen in methods like E2VID. The values in parentheses indicate the temporal resolution of the reconstruction, and the arrow points to the significant difference caused by the variation in reconstruction time resolution. (e) Intensity variation curve along the yellow solid line in (a)–(d). (f) Intensity variation over time at the white dashed line position in (a). Scale bar, 2 μm in (a)–(d).
E. IEIM Achieves Dynamic Fluorescence Imaging of Euglenae
To further validate the performance of IEIM on in vivo samples, we captured the motion of microorganisms using the system described in Section 2.A. At a modulation frequency of 500 Hz, we recorded event streams of common freshwater euglenae movement and applied IEIM for image reconstruction. The results are shown in Fig. 6, which illustrates the motion sequence within 60 ms at different temporal resolutions (6 ms, 4 ms, and 2 ms). In this scenario, the euglenae exhibited heterogeneous motion: some remained relatively static, while others moved rapidly. Compared with deep-learning-based approaches, which struggled to maintain reconstruction quality across both slow and fast motion regimes, as discussed in Section 3.D, IEIM successfully achieved high-speed imaging of euglenae movement while preserving the integrity of static spatial details. We also compared the reconstruction results of state-of-the-art methods, which can be found in Appendix D.
Figure 6.Dynamic imaging of common freshwater euglenae. The dynamic processes are presented at three different temporal resolutions, with a minimum of 2 ms, demonstrating that IEIM can effectively image dynamic processes. The images in the second row are magnified views of the regions enclosed by the white boxes in the first row, with dashed lines serving as reference markers to provide a more intuitive visualization of the motion.
F. Modulation in IEIM Enables High-Quality Event-Based Fluorescence Microscopy
To validate the effectiveness of our modulation method, we applied our proposed reconstruction method directly to both non-modulated and modulated light-intensity data, including both synthesized and real-world data. The results are shown in Fig. 7: only the data acquired with light modulation yield a high-quality reconstruction. This is primarily because our reconstruction method relies on the light intensity starting from zero and then varying in a pulsed manner. In motion-generated events, however, the pixel already carries a nonzero intensity at the start, which violates the assumption in Eq. (6) that the intensity approaches zero; as a result, the reconstruction deviates from the expected outcome.
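The role of the zero-start assumption can be illustrated numerically. In the toy model below (a linear-ramp sketch with illustrative names, not the system's actual response), events fire whenever the log signal rises by a fixed threshold: with an unknown nonzero baseline, inter-event intervals no longer encode absolute intensity, whereas a common near-zero starting level, as enforced by pulsed modulation, restores the encoding.

```python
import numpy as np

def event_times(base, brightness, threshold=0.2, n=8):
    """Toy model: signal s(t) = base + brightness * t; an event fires each
    time log(s) rises by `threshold` (fixed contrast threshold)."""
    ks = np.arange(1, n + 1)
    return base * (np.exp(threshold * ks) - 1.0) / brightness

def iei_estimate(times):
    """Intensity estimate as the inverse median inter-event interval."""
    return 1.0 / np.median(np.diff(times))

# Motion-generated events: the baseline at t = 0 is nonzero and unknown,
# so equal brightness can yield very different estimates.
a = iei_estimate(event_times(base=1.0, brightness=5.0))
b = iei_estimate(event_times(base=3.0, brightness=5.0))
print(a / b)  # ~3.0 despite identical brightness: the estimate is biased

# Pulsed modulation resets every pixel to the same near-zero baseline,
# so the ratio of estimates recovers the true brightness ratio.
c = iei_estimate(event_times(base=0.01, brightness=5.0))
d = iei_estimate(event_times(base=0.01, brightness=15.0))
print(d / c)  # ~3.0, matching the 15/5 brightness ratio
```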
Figure 7.Comparison experiments with and without the modulation device. p.d.: pulsed modulation device. (a)–(c) Results in synthesized data. (d)–(f) Results in real-world data. The reference image is a region captured by the sCMOS camera that overlaps with the field of view of the event camera. Scale bar, 5 μm.
In this paper, we have established the first imaging framework for event-based spatiotemporal fluorescence microscopy, named inter-event interval microscopy (IEIM), achieving both static and dynamic imaging through a fundamental reengineering of both the acquisition physics and the reconstruction algorithm. Our proposed IEIM incorporates three key innovations: (1) a novel event-driven fluorescence imaging architecture that integrates a synchronized pulsed modulation device into the excitation pathway, enabling precise temporal control of the illumination intensity; (2) a temporal reconstruction algorithm based on inter-event time intervals that achieves high-dynamic-range imaging with microsecond-level temporal resolution, eliminating the extensive training datasets required by supervised-learning-based reconstruction approaches; (3) a unified event-based computational framework that preserves the event camera's high-temporal-resolution capture of dynamic biological processes while enabling static structure recording, a capability that conventional event-driven imaging paradigms have long struggled to achieve. Experiments on both static and dynamic data demonstrate that the pulsed modulation strategy significantly enhances the imaging performance of the event camera and enables both static and dynamic fluorescence imaging with a fixed event camera. Furthermore, experiments on both simulated and real-world data demonstrate the state-of-the-art performance of IEIM and its ability to achieve high-speed, high-dynamic-range imaging at 800 Hz. IEIM greatly expands the potential imaging scenarios for event cameras. Future research will focus on collecting imaging data under varying modulation frequencies and powers, establishing the relationship between imaging quality and these parameters, and predicting the optimal imaging parameters for different samples.
Despite these advantages, the performance of IEIM is subject to several practical constraints. A key limitation lies in the trade-off between modulation power and frequency: increasing the imaging speed via a higher modulation frequency requires a proportional rise in modulation power to avoid signal degradation or information loss. This balance is further constrained by the fluorescence lifetime, which sets a physical limit on how fast the modulation can occur without compromising signal integrity. Additionally, the event camera's data throughput imposes an upper bound on the imaging rate, particularly for dense samples, where excessive event rates may lead to system instability or failure. These factors necessitate careful calibration of the system parameters based on the sample characteristics and hardware capabilities.
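The throughput constraint above can be sanity-checked before an experiment by comparing the expected event rate (active pixels × events per modulation cycle × modulation frequency) against the camera's rated peak event rate. A back-of-the-envelope helper; all numbers below are illustrative placeholders, not specifications of the system used in this work.

```python
def event_budget(active_pixels, events_per_cycle, mod_freq_hz, camera_peak_eps):
    """Estimated event rate and whether it fits the camera's throughput.

    camera_peak_eps: rated peak event rate of the camera, in events/second.
    """
    rate = active_pixels * events_per_cycle * mod_freq_hz  # events per second
    return rate, rate <= camera_peak_eps

# Example: 100k active pixels, 5 events per pulse, 800 Hz modulation,
# checked against a hypothetical 1 Geps peak throughput.
rate, ok = event_budget(100_000, 5, 800, 1e9)
print(f"{rate:.0f} events/s, feasible: {ok}")
```

Dense samples raise the effective `active_pixels` and `events_per_cycle` terms, which is why the imaging rate must be reduced (or the field of view cropped) when the budget is exceeded.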
APPENDIX A: SAMPLE PREPARATION AND MOUNTING
1. Mice Brain Tissues
The collection of mice brain tissues used in this study was approved by the Institutional Animal Care and Use Committee of the Westlake University (Approval No. 24-136-ZYD). To obtain slices with a fixed thickness, adult Thy1-eGFP-M mice [Tg(Thy1-EGFP)MJrs/J, Stock No. 007788, The Jackson Laboratory] were deeply anaesthetized by an intraperitoneal injection of pentobarbital (8 μL/g) and transcardially perfused with phosphate buffered saline (PBS) and 4% paraformaldehyde (PFA). The brains were removed and post-fixed in PFA for 24 h, after which the tissue was stored in . 50-μm-thick coronal slices were cut on a vibratome (VT1200S, Leica Microsystems). The slices were washed three times, for 10 min each time, in with a gentle shake. Then, slices were incubated in blocking buffer [3% BSA (Jackson ImmunoResearch, Catalog No. 001000-162) and 0.2% Triton X-100 in ] for 3 h at room temperature. The blocked slices were incubated with rabbit anti-GFP antibody [A11122, Thermo Fisher; diluted to 1:500 in dilution buffer (1% BSA and 0.2% Triton X-100 in )] at 4°C for one day. After washing three times in the wash buffer (0.1% Triton X-100 in ), the slices were incubated with goat anti-rabbit Alexa Fluor 647 (A21245, Thermo Fisher; diluted to 1:500 in dilution buffer) at 4°C overnight. Following another three washes with wash buffer, slices were securely affixed to pre-cleaned high-precision coverslips, and PBS-immersed samples were subsequently mounted in a custom-designed sample holder. The fixed samples were excited by a 642 nm laser and imaged using a UPLAPO100XOHR oil immersion objective.
2. Euglenae
Common freshwater euglenae were cultured in water using open containers, with continuous illumination provided and ambient temperature maintained above 24°C. Before imaging, 1 mL of euglenae-containing liquid was collected and subjected to centrifugation. Following centrifugation, euglenae at the bottom of the tube were resuspended in 100 μL deionized water through repeated pipette mixing. The homogeneous suspension was then drop-cast onto pre-cleaned high-precision coverslips. These mounted living samples were excited by a 488 nm laser and imaged with a LUCPlanFLN air objective.
APPENDIX B: DIRECT COMPARISON ON MODULATED EVENTS
To further compare the performance of our method with state-of-the-art methods such as E2VID [18], E2VID+ [15], FireNet+ [15], and ET-Net [20], we conducted additional tests directly on the light-modulated data using open-source pre-trained models. The results on synthetic data are shown in Fig. 8, and the results on real data are shown in Fig. 9. The reconstruction results of these networks are similar to those obtained from the shaking-based data, exhibiting noticeable issues such as intensity distortion and loss of details. In contrast, our method significantly outperforms the state-of-the-art methods in terms of reconstruction quality.
Figure 8.Comparison of our method and existing network-based methods on synthesized static data with light modulation.
To further eliminate the potential influence of pre-training data on the models, we retrained the SSL-E2VID [22] and SPADE-E2VID [14] using light-modulated data. The comparison of the original network, the retrained network, and our method is presented in Fig. 10. The results indicate that although the retrained SSL-E2VID network performs slightly better than the original network and reconstructs more details, the reconstructed intensity still exhibits significant distortion. In contrast, the retrained SPADE-E2VID shows substantial enhancement compared to its pretrained counterpart under the same conditions. Nevertheless, it remains inferior to IEIM in terms of structural fidelity, edge preservation, and robustness against noise. Overall, our method continues to deliver the best performance, closely resembling the original images.
Figure 10.Comparison of the original network, the retrained network, and our method.
APPENDIX D: COMPARISON OF DIFFERENT METHODS ON IMAGING THE MOVEMENTS OF IN VIVO FRESHWATER EUGLENAE
We used IEIM to capture event streams of common freshwater euglenae movements and compared it with state-of-the-art methods, including E2VID [18], E2VID+ [15], SPADE-E2VID [14], and SSL-E2VID [22]. We first applied the open-source models directly to reconstruct the modulated data, and we also retrained SSL-E2VID and SPADE-E2VID on the modulated data before using them for reconstruction. As shown in Fig. 11, the default open-source models struggled with our modulated data, leading to severe intensity distortion. The curve in Fig. 11(b) corresponds to the yellow solid line crossing the region in Fig. 11(a) that contains a typical freshwater euglena structure, the contractile vacuole. The IEIM method accurately restores the intensity fluctuations of this microstructure, presenting continuous and clear intensity variations, whereas the other methods fail to preserve this structural feature effectively, displaying considerable blurring or signal attenuation. Similarly, Fig. 11(c) presents the temporal intensity curve at the red dashed line position in Fig. 11(a), which records two consecutive passages of a euglena's tail through that position. The IEIM method fully preserves the details of both signal changes with a high signal-to-noise ratio and a continuous response, while the other methods clearly fail to capture the signal. Although the retrained models significantly improved reconstruction quality and restored more details compared to the original versions, they still exhibited intensity distortion. In contrast, IEIM achieves superior reconstruction quality.
Figure 11.Comparison of different methods on dynamic imaging of in vivo freshwater euglenae. (a) Reconstruction motion sequences spanning over 60 ms at temporal resolutions of 6 ms, 4 ms, and 2 ms. The arrow points to the fastest-moving part. (b) Intensity variation curve along the yellow solid line in (a). (c) Intensity variation over time at the red dashed line position in (a). Scale bar, 5 μm.
APPENDIX E: COMPARING SPATIAL RESOLUTION AND SENSITIVITY WITH DAVIS CAMERAS
The DAVIS camera is a hybrid sensor that combines grayscale imaging and event detection within each pixel. While this enables simultaneous frame and event capture, it introduces two main limitations: (1) shared pixel resources reduce event detection precision, as the dual-function design compromises sensitivity and temporal resolution compared to dedicated event cameras; (2) limited spatial resolution, constrained by the hardware design (e.g., the DAVIS346 offers only 346 × 260 pixels), makes it unsuitable for high-resolution imaging tasks.
In contrast, the Prophesee event camera used in IEIM features a dedicated event-based architecture with an optimized pixel design, offering superior temporal resolution (microsecond-level), higher spatial resolution (up to 1280 × 720 pixels), and greater photon sensitivity.
To compare their performance, we built a synchronized acquisition system using a 50:50 beam splitter to mount both sensors on a microscope imaging the same field of view. As shown in Fig. 12, under a 30 ms accumulation window, the Prophesee camera captures sharper structures and finer edges [Fig. 12(a)], while the DAVIS yields blurrier results with missing details [Fig. 12(b)]. Over a 1 s interval, the Prophesee camera also produces a denser and more uniform event distribution [Fig. 12(c)], demonstrating its superior responsiveness and lower noise. Overall, the Prophesee camera outperforms the DAVIS in both design and practical imaging quality; combined with IEIM, it enables high-fidelity imaging in both static and dynamic scenarios.
Figure 12.Comparison of imaging performance between DAVIS346 and Prophesee EVK4 cameras. (a) Event accumulation map from the Prophesee camera over a 30 ms time window. (b) Event map and grayscale image from the DAVIS camera under the same conditions. (c) Event density maps generated by both cameras over a 1 s duration. The lower-left corners of (a) and (b) indicate the image resolution.
APPENDIX F: COMPLEXITY ANALYSIS OF THE IEIM ALGORITHM
Assume the total number of events is $N$, the spatial resolution of the event camera is $H \times W$, and the total temporal range is $T$, with event timestamps $t_i$ falling within the interval $[0, T]$. According to the IEIM processing pipeline, the events are first divided into frames based on a fixed temporal interval $\Delta t$, giving $F = \lceil T/\Delta t \rceil$ frames.
Each event is then assigned to a frame by computing its corresponding index $k_i = \lfloor t_i/\Delta t \rfloor$, which has a computational complexity of $O(N)$.
Subsequently, for each frame and each pixel, the temporal interval between consecutive events must be calculated. This involves searching for the nearest previous and next events per pixel location, which is efficiently implemented via a binary search. The complexity of this step is $O(FHW \log n_{x,y})$, where $n_{x,y}$ is the number of events occurring at a given pixel position across frames, and in the worst case $n_{x,y} = N$.
Before a binary search can be performed, a preprocessing step is required to collect and organize the events for each pixel across all frames, which has a complexity of $O(N)$. Therefore, the overall time complexity of the IEIM algorithm is $O(N + FHW \log N)$.
In practice, the computation of inter-event intervals for each pixel can be parallelized across frames and pixels. If all pixels in each frame are processed in parallel, the effective complexity can be significantly reduced to $O(F \log N)$. Since the number of events within a single modulation cycle is typically not large, real-time reconstruction is potentially achievable through parallel acceleration.
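The pipeline above can be sketched directly in NumPy: the per-event frame assignment is the $O(N)$ step, and `np.searchsorted` supplies the per-pixel binary search. This is an illustrative sketch; the function name, event layout, and microsecond timestamps are our assumptions, not the released implementation.

```python
import numpy as np

def iei_frames(x, y, t, T, dt, H, W):
    """Per-frame intensity estimate from inter-event intervals.

    x, y : integer pixel coordinates of each event
    t    : event timestamps in microseconds, within [0, T)
    Returns an array of shape (n_frames, H, W) holding 1 / (latest
    inter-event interval) per pixel; zero where a pixel has seen
    fewer than two events by the end of a frame.
    """
    n_frames = int(np.ceil(T / dt))
    frames = np.zeros((n_frames, H, W))

    # O(N) per-event frame index, as in the complexity analysis above
    # (shown for reference; the per-frame search below uses timestamps).
    k = (t / dt).astype(int)

    # Preprocessing: order events so each pixel's timestamps are sorted.
    order = np.lexsort((t, y, x))
    x, y, t = x[order], y[order], t[order]

    for px, py in {(int(a), int(b)) for a, b in zip(x, y)}:
        ts = t[(x == px) & (y == py)]
        for f in range(n_frames):
            # Binary search: last event at or before the end of frame f.
            i = np.searchsorted(ts, (f + 1) * dt, side="right") - 1
            if i >= 1:  # need two events to form an interval
                frames[f, py, px] = 1.0 / (ts[i] - ts[i - 1])
    return frames

# Two pixels: one event every 1000 us (bright) vs every 4000 us (dim).
t_bright = np.arange(0.0, 20000.0, 1000.0)
t_dim = np.arange(0.0, 20000.0, 4000.0)
t = np.concatenate([t_bright, t_dim])
x = np.concatenate([np.zeros(len(t_bright), int), np.ones(len(t_dim), int)])
y = np.zeros(len(t), int)

out = iei_frames(x, y, t, T=20000.0, dt=5000.0, H=1, W=2)
print(out[-1, 0, 0] / out[-1, 0, 1])  # 4.0: the bright pixel is 4x the dim one
```

The outer loop over pixels is embarrassingly parallel, which is what the reduction to $O(F \log N)$ per parallel worker refers to.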
[3] B. Xiong, C. Su, Z. Lin. Real-time parameter evaluation of high-speed microfluidic droplets using continuous spike streams. Proceedings of the 32nd ACM International Conference on Multimedia, 6833-6841(2024).
[5] L. Sun, C. Sakaridis, J. Liang. Event-based fusion for motion deblurring with cross-modal attention. European Conference on Computer Vision, 412-428(2022).
[6] L. Sun, C. Sakaridis, J. Liang. Event-based frame interpolation with ad-hoc deblurring. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 18043-18052(2023).
[7] J. Zhang, X. Yang, Y. Fu. Object tracking by jointly exploiting frame and event domain. Proceedings of the IEEE/CVF International Conference on Computer Vision, 13043-13052(2021).
[10] P. Bardow, A. J. Davison, S. Leutenegger. Simultaneous optical flow and intensity estimation from an event camera. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 884-892(2016).
[12] C. Scheerlinck, N. Barnes, R. Mahony. Continuous-time intensity estimation using event cameras. Asian Conference on Computer Vision, 308-324(2018).
[13] H. Rebecq, R. Ranftl, V. Koltun. Events-to-video: bringing modern computer vision to event cameras. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3857-3866(2019).
[15] T. Stoffregen, C. Scheerlinck, D. Scaramuzza. Reducing the sim-to-real gap for event cameras. Computer Vision–ECCV 2020: 16th European Conference, 534-549(2020).
[16] Q. Liang, X. Zheng, K. Huang. Event-diffusion: event-based image reconstruction and restoration with diffusion models. Proceedings of the 31st ACM International Conference on Multimedia, 3837-3846(2023).
[17] H. Rebecq, D. Gehrig, D. Scaramuzza. ESIM: an open event camera simulator. Conference on Robot Learning, 969-982(2018).
[19] S. Zhang, Y. Zhang, Z. Jiang. Learning to see in the dark with events. Computer Vision–ECCV 2020: 16th European Conference, 666-682(2020).
[20] W. Weng, Y. Zhang, Z. Xiong. Event-based video reconstruction using transformer. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2563-2572(2021).
[21] L. Wang, Y.-S. Ho, K.-J. Yoon. Event-based high dynamic range image and very high frame rate video generation using conditional generative adversarial networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10081-10090(2019).
[22] F. Paredes-Vallés, G. C. De Croon. Back to event basics: self-supervised learning of image reconstruction for event cameras via photometric constancy. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3446-3455(2021).
[34] S. Lin, Y. Ma, Z. Guo. DVS-voltmeter: stochastic process-based event simulator for dynamic vision sensors. European Conference on Computer Vision, 578-593(2022).