
Single-shot ultrahigh-speed mapping photography is essential for analyzing fast dynamic processes across various scientific disciplines. Among available techniques, optical diffraction has recently been implemented as a nanosecond time gate for mapping photography. Despite attractive light throughput and cost efficiency, existing systems based on this approach can sense only light intensity and offer limited sequence depth and imaging speed. To overcome these limitations, we develop diffraction-gated real-time ultrahigh-speed mapping schlieren (DRUMS) photography. Using a digital micromirror device as a coded dynamic two-dimensional blazed grating, DRUMS photography records schlieren images of transient events in real time at an imaging speed of 9.8 million frames per second and a sequence depth of 13 frames. We present the working principle of DRUMS photography through both theoretical derivation and numerical simulation, and we apply it to single-shot real-time video recording of laser-induced breakdown in water.
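To make the time-to-space mapping concrete, the following minimal numpy sketch illustrates the generic idea behind mapping photography: successive temporal frames are routed to distinct, non-overlapping regions of a single sensor exposure and recovered afterwards by cropping. This is an illustration only; the actual DRUMS optical chain steers frames with a DMD acting as a dynamic blazed grating rather than by software tiling, and the tile layout below is an assumption.

```python
import numpy as np

def map_frames_to_sensor(frames):
    """Tile each temporal frame into its own region of one sensor exposure."""
    n, h, w = frames.shape
    cols = int(np.ceil(np.sqrt(n)))
    rows = int(np.ceil(n / cols))
    sensor = np.zeros((rows * h, cols * w))
    for t, frame in enumerate(frames):
        r, c = divmod(t, cols)
        sensor[r * h:(r + 1) * h, c * w:(c + 1) * w] = frame
    return sensor

def unmap_sensor_to_frames(sensor, n, h, w):
    """Recover the frame sequence by cropping the tiles back out."""
    cols = int(np.ceil(np.sqrt(n)))
    frames = []
    for t in range(n):
        r, c = divmod(t, cols)
        frames.append(sensor[r * h:(r + 1) * h, c * w:(c + 1) * w])
    return np.stack(frames)

# 13 synthetic 64 x 64 frames (matching the reported sequence depth) mapped
# onto one exposure and recovered without loss.
frames = np.random.default_rng(0).random((13, 64, 64))
exposure = map_frames_to_sensor(frames)
recovered = unmap_sensor_to_frames(exposure, 13, 64, 64)
assert np.allclose(recovered, frames)
```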
Three-dimensional (3D) integral imaging (InIm) is an active research topic in underwater optical imaging and sensing systems. 3D InIm is a passive method for visualizing the 3D information of a scene. The method is attractive for imaging in degraded environments and can benefit short- to long-range applications. In this paper, we present an overview of previously published works on underwater optical imaging and sensing systems that demonstrate improved visualization and detection capabilities in turbid water using 3D InIm. Images or video sequences captured by a set of image sensors are digitally reconstructed with a multidimensional InIm computational reconstruction algorithm for tasks such as visualizing the 3D information of a scene at a particular depth, detecting signals, or mitigating the effects of turbidity and partial occlusion. 3D InIm sensing combined with dedicated computational approaches such as statistical image processing, correlation-based filtering, neural networks, and deep learning makes it possible to improve information recovery and detection in scenes degraded by turbidity and partial occlusion.
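As a point of reference, the sketch below implements the textbook shift-and-average computational reconstruction used in integral imaging under a simple pinhole model: each elemental image is shifted in proportion to its index and the chosen depth, and overlapping pixels are averaged to bring that depth plane into focus. The geometry and parameter names are illustrative assumptions, not the exact multidimensional algorithm of the works surveyed.

```python
import numpy as np

def reconstruct_depth_plane(elemental, pitch_px, depth, focal_px):
    """Shift-and-average reconstruction of one depth plane.

    elemental: array of shape (K, L, H, W) from a K x L camera/lenslet grid.
    pitch_px:  camera pitch expressed in pixels.
    depth:     reconstruction distance (same units as focal_px's reference).
    focal_px:  focal length expressed in pixels (pinhole model).
    """
    K, L, H, W = elemental.shape
    shift = pitch_px * focal_px / depth          # per-index pixel shift for this depth
    canvas_h = H + int(round((K - 1) * shift))
    canvas_w = W + int(round((L - 1) * shift))
    acc = np.zeros((canvas_h, canvas_w))
    cnt = np.zeros((canvas_h, canvas_w))
    for k in range(K):
        for l in range(L):
            dy, dx = int(round(k * shift)), int(round(l * shift))
            acc[dy:dy + H, dx:dx + W] += elemental[k, l]
            cnt[dy:dy + H, dx:dx + W] += 1
    return acc / np.maximum(cnt, 1)              # average where elemental images overlap

# Toy usage: a 3 x 3 grid of 64 x 64 elemental images reconstructed at one depth.
ei = np.random.default_rng(0).random((3, 3, 64, 64))
plane = reconstruct_depth_plane(ei, pitch_px=10.0, depth=100.0, focal_px=50.0)
```

Objects lying at the chosen depth add up coherently across the shifted elemental images, while out-of-plane content and partial occlusions are averaged out, which is the basis of the turbidity- and occlusion-mitigation results discussed above.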
With advancements in artificial intelligence, face recognition technology has significantly improved in accuracy and reliability, yet concerns over privacy and data security persist. Current methods for addressing these privacy issues operate at the software or hardware level and face challenges in system power consumption, computational complexity, and recognition performance. We propose a novel privacy-preserving face recognition system that safeguards privacy at the optical level, before signals reach the sensor. The approach employs a mask-encoded microlens array to perform optical convolution, effectively protecting privacy while enabling feature extraction for face recognition with a backend electronic neural network. Built on this passive optical convolution under incoherent illumination, our system achieves superior spatial resolution, higher light throughput, and improved recognition performance. An end-to-end training strategy with a dual cosine similarity-based loss function balances privacy protection and recognition performance. Our system demonstrates a recognition accuracy of 95.0% in simulation and 92.3% in physical experiments, validating its effectiveness and practical applicability.
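The dual cosine similarity-based loss can be pictured as two competing terms: one that keeps identity embeddings extracted from the encoded image aligned with a reference embedding, and one that penalizes visual similarity between the optically encoded measurement and the raw face. The numpy sketch below is a hedged illustration of that trade-off; the exact formulation and the weighting term `alpha` are assumptions, not the paper's definition.

```python
import numpy as np

def cosine(a, b, eps=1e-8):
    """Row-wise cosine similarity between two batches of vectors."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return num / den

def dual_cosine_loss(feat_enc, feat_ref, img_enc, img_raw, alpha=1.0):
    """Trade off recognition (feature alignment) against privacy (image dissimilarity)."""
    recog = 1.0 - cosine(feat_enc, feat_ref).mean()                 # pull same-identity features together
    priv = cosine(img_enc.reshape(len(img_enc), -1),
                  img_raw.reshape(len(img_raw), -1)).mean()         # push encoded image away from raw face
    return recog + alpha * priv

# Toy batch: 4 identities with 128-D embeddings and 32 x 32 images.
rng = np.random.default_rng(0)
loss = dual_cosine_loss(rng.normal(size=(4, 128)), rng.normal(size=(4, 128)),
                        rng.random((4, 32, 32)), rng.random((4, 32, 32)))
print(float(loss))
```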
Wavefront shaping enables the transformation of disordered speckles into ordered optical foci through active modulation, offering a promising approach for optical imaging and information delivery. However, practical implementation faces significant challenges, particularly the dynamic variation of speckles over time, which necessitates fast wavefront shaping systems. This study presents a coded self-referencing wavefront shaping system capable of fast wavefront measurement and control. By encoding both the signal and reference lights within a single beam that probes the complex medium, the method addresses key limitations of previous approaches, such as interference noise in interferometric holography, loss of controllable elements in coaxial interferometry, and the computational burden of non-holographic phase retrieval. Experimentally, we demonstrated optical focusing through complex media, including unfixed multimode fibers and stacked ground-glass diffusers. The system achieved runtimes of 21.90 and 76.26 ms for 256 and 1024 controllable elements with full-field modulation, respectively, with corresponding average per-mode times of 85.54 and 74.47 µs, pushing the system to its hardware limits. The system's robustness against dynamic scattering was further demonstrated by focusing light through moving diffusers with correlation times as short as 21 ms. These results highlight the potential of this system for real-time applications in optical imaging, communication, and sensing, particularly in complex and dynamic scattering environments.
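For context, the sketch below simulates the generic sequential phase-stepping principle behind feedback-based wavefront shaping: each controllable element is cycled through four phases against the remaining elements, which act as the reference, its optimal phase is read from the resulting intensity fringe, and the conjugate pattern is then displayed. This is the classic stepwise approach, not the coded self-referencing scheme introduced here, and the random transmission vector is a stand-in for a real scattering medium.

```python
import numpy as np

rng = np.random.default_rng(0)
n_modes = 256                                   # controllable modulator elements (modes)
t = (rng.normal(size=n_modes) + 1j * rng.normal(size=n_modes)) / np.sqrt(2)
# `t` plays the role of the medium's transmission coefficients to the target spot.

def focus_intensity(phases):
    """Intensity at the target when the modulator displays `phases`."""
    return np.abs(np.sum(np.exp(1j * phases) * t)) ** 2

steps = np.array([0.0, np.pi / 2, np.pi, 3 * np.pi / 2])
optimal = np.zeros(n_modes)
for m in range(n_modes):
    intensities = []
    for s in steps:
        phases = np.zeros(n_modes)
        phases[m] = s                            # step one mode against the rest
        intensities.append(focus_intensity(phases))
    I = np.array(intensities)
    # Four-step fringe analysis gives this mode's phase relative to the reference.
    optimal[m] = -np.arctan2(I[3] - I[1], I[0] - I[2])

print("intensity before shaping:", focus_intensity(np.zeros(n_modes)))
print("intensity after shaping: ", focus_intensity(optimal))
```

The coded self-referencing scheme described above avoids the main cost visible in this sketch, namely the separate reference arm or the element-by-element probing overhead, which is what enables the reported microsecond-scale average per-mode times.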
Describing a scene in language is a challenging multi-modal task, as it requires understanding diverse and complex scenes and then transforming them into sentences. Among such tasks, video captioning (VC) has attracted much attention from researchers. For machines, traditional VC follows the “imaging-compression-decoding-and-then-captioning” pipeline, in which compression is pivotal for storage and transmission. However, this pipeline has inherent shortcomings: information redundancy, which results in low efficiency, and information loss during the sampling process for captioning. To address these problems, we propose a novel VC pipeline that generates captions directly from the compressed measurement captured by a snapshot compressive sensing camera, and we dub our model SnapCap. More specifically, benefiting from signal simulation, we have access to abundant measurement-video-annotation data pairs for training our model. In addition, to better extract language-related visual representations from the compressed measurement, we distill knowledge from videos via a pretrained contrastive language-image pretraining (CLIP) model, whose rich language-vision associations guide the learning of SnapCap. To demonstrate the effectiveness of SnapCap, we conduct experiments on three widely used VC datasets. Both the qualitative and quantitative results verify the superiority of our pipeline over conventional VC pipelines.
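For readers unfamiliar with snapshot compressive sensing, the sketch below shows the forward model such a camera is commonly assumed to implement: a short burst of video frames is modulated by per-frame coding masks and summed into a single 2D measurement, which is what a SnapCap-style model would consume directly. The frame count, frame size, and binary masks are illustrative choices, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
B, H, W = 8, 128, 128                                 # frames per snapshot and frame size (assumed)
video = rng.random((B, H, W))                         # stand-in for a short video clip
masks = (rng.random((B, H, W)) > 0.5).astype(float)   # per-frame binary coding masks

# Single-shot measurement: elementwise modulation followed by temporal summation.
measurement = np.sum(masks * video, axis=0)

# A captioning model in this pipeline would take `measurement` (and, if needed,
# the known masks) as input, skipping explicit video reconstruction.
print(measurement.shape)                              # -> (128, 128)
```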