Advanced Imaging, Volume 1, Issue 1, 012001 (2024)

Future-proof imaging: computational imaging

Jinpeng Liu1,2,†, Yi Feng1, Yuzhi Wang1, Juncheng Liu1, Feiyan Zhou1, Wenguang Xiang1, Yuhan Zhang1, Haodong Yang1, Chang Cai1, Fei Liu1,2,*, and Xiaopeng Shao3,*
Author Affiliations
  • 1School of Optoelectronic Engineering, Xidian University, Xi’an, China
  • 2Xi’an Key Laboratory of Computational Imaging, Xi’an, China
  • 3Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an, China

    Computational imaging overcomes the limitations of traditional optical imaging by incorporating encoding and decoding. It represents a paradigm shift in imaging technology, leveraging the manipulation and interpretation of the light field to extract richer information than was previously attainable. This review explores the emergence and development history of computational imaging. By analyzing the essence of computational imaging from the perspective of the light field, it categorizes the entire technological roadmap of the field according to the imaging framework. The review can serve as a reference for researchers, producers, and policymakers on the main trends, frontiers, and future directions of computational imaging.


    1. Introduction

    Curiosity serves as an intrinsic driving force behind the advancement of human society. We continuously explore the external world and the micro-depths of our own bodies. Just as infants instinctively seek out brightness, humanity relies on light as its most vital sensing tool. The benefits of using light include the ability to observe stars hundreds of millions of light-years away with a telescope or nanoscale viruses with a microscope. Light carries considerable information about the physical world; it naturally spans multiple physical dimensions, including intensity, spectrum, polarization, time, and phase. Optical imaging’s ability to capture and process information using light provides numerous advantages across various fields. Its high resolution enables the detection of minute details with precision. Optical imaging also provides real-time imaging capabilities, enabling dynamic observations and analyses. Its non-invasive nature makes it suitable for studying living organisms and delicate materials without causing damage. Optical imaging techniques are powerful tools for applications including photography, entertainment, academic research, industrial production, and healthcare.

    Optical imaging can be traced back to the Spring and Autumn Period in ancient China. Mozi (468–376 BC) recorded phenomena such as the straight-line propagation of light, the formation of shadows, and pinhole imaging in Mo Jing. Progress in optics was gradual until the 16th and early 17th centuries, when the establishment of the laws of reflection and refraction led to the development of precise optical components such as lenses and mirrors, giving rise to telescopes and microscopes and advancing astronomy, navigation, and biology. The studies of J. Kepler, W. Snell, R. Descartes, and P. de Fermat in the mid-17th century laid the groundwork for geometric optics. As the foundational framework for optical design, it has been in continuous use to this day, evident in imaging devices ranging from smartphone cameras to satellite remote sensing and astronomical telescopes[1]. However, imaging based on geometric optics is also limited by these very principles. Typically, to improve imaging resolution, one must increase the aperture of the optical system; however, the aperture cannot be expanded indefinitely. Additionally, only a two-dimensional (2D) intensity picture of the scene within the depth of field (DOF) can be obtained; the original depth information is lost. Advances in information theory, computer science, and optical modulation have raised questions about the optimality of current optical imaging methods and the need for new imaging models. Computer science researchers took the first step earlier than optics researchers. In computer vision, various image enhancement, segmentation, and reconstruction algorithms are applied to improve traditional optical imaging quality for ease of computer processing, analysis, and display. In this process, the imaging results obtained through “recreation” were far superior to direct optical imaging results. For example, image detail resolution improved, objects were reconstructed in three dimensions, and previously almost invisible details were observed. These three points have each evolved into forefront technology areas: super-resolution imaging, 3D imaging, and scattering imaging.

    In the 1990s, computational imaging emerged as a transformative paradigm, revolutionizing our approach to capturing, processing, and interpreting visual information. The burden of imaging was no longer borne solely by the physical optical system. Front-end optics and post-detection signal processing are jointly designed and optimized according to information transmission rather than energy. Using computational imaging, one can design a system to obtain optical measurements from which images can be derived with information content surpassing the physical limits of traditional optics. The transformative potential of computational imaging reverberates across a myriad of domains, from biomedical imaging and remote sensing to augmented reality and autonomous driving. Moreover, computational imaging holds promise for addressing societal challenges such as environmental monitoring, disaster response, and cultural heritage preservation. In this study, we review the rapidly developing field of computational imaging (CI). The entire field is sorted and classified according to the links of the imaging process. We define computational imaging as a unified concept and delineate it from closely related optical disciplines such as photography, astronomy, machine vision, and image processing. We follow this in Sec. 2 with a brief history of optical imaging up to the present. We outline the essentials of computational imaging from the perspective of light fields in Sec. 3. We begin with the relationship between the light field and imaging, consider the physical meanings of different light field projections, and end with a dimensionality augmentation theory of computational imaging. According to the imaging framework, existing computational imaging methods are classified by the link they act on: the illumination, the medium, the optical system, the detector, and the processing. In each section, we present examples to illustrate the imaging principle and applications. The examples we have curated aim to be representative rather than exhaustive. We conclude by summarizing the advantages and disadvantages of computational imaging technology and providing insights into its future development directions.

    2. Development of Optical Imaging

    The earliest written records of light can be traced back to the Western Zhou Dynasty (11th century BC to about 770 BC), when ancient people knew how to use bronze mirrors to focus light and make fire. Although this practice relied on the principle of focusing light, humans at the time did not understand its essence. It was not until the discovery of electromagnetic waves thousands of years later that the mystery of light was truly solved. Throughout the history of optical imaging (Fig. 1), we can divide its development into two aspects. One is the manipulation of light, which mainly describes the use of lenses to change the direction of light propagation (refraction and reflection), its polarization state, and so on, to achieve magnification, microscopy, distortion correction, focal length adjustment, and other functions; these belong to geometric imaging. The other is the perception of light. The amplitude, frequency, phase, and other physical characteristics of light contain a wealth of information that can be used only if it is detected. If this information is recorded, extracted, and interpreted, it can greatly extend the function of the human eye, so that the world we observe is no longer limited to the visible band. The development of optical imaging rests on the development of both light manipulation and light perception. The manipulation of light frees us from the aberrations introduced by imaging devices and lets us selectively obtain the required light. The perception of light enables us to record and interpret the physical information of the light field. These two aspects complement each other and jointly promote the development of computational imaging, as described in the sections that follow.


    Figure 1.Development of optical imaging.

    2.1. History of light manipulation

    In our daily lives, we use glasses to correct nearsightedness or farsightedness and magnifying glasses to observe fine details. These applications use lenses to manipulate the direction of light to meet the imaging needs of focusing, magnification, and microscopy. The earliest history of the manipulation of light can be traced back to the Spring and Autumn and the Warring States Periods. The founder of the Mohist school, Mo Di (Mozi, about 468 BC to about 376 BC), recorded many phenomena and laws related to light, such as the linear propagation of light, reflection, and refraction, in the book Mo Jing. The most famous of these is the discussion of pinhole imaging, which uses the straight-line propagation of light to form an image of an object through a small aperture; it is the earliest recorded study and discussion of pinhole imaging.

    Burning glasses are mentioned in Aristophanes’ play Clouds (424 BC), but crystal lenses were not made until the 11th and 12th centuries. Between 1260 and 1290, Italian artisans made spectacles from rock crystal, rose quartz, and topaz, and lenses gradually became prevalent. As the law of refraction was further explored, the microscope and telescope were invented by Zacharias Janssen and Galileo Galilei in the late 16th and early 17th centuries, and people began to explore smaller and more distant worlds. Kepler compiled the existing knowledge on optics and published his book Dioptrice in 1611. He proposed the illuminance law for a point light source, stating that the illuminance of a surface is inversely proportional to the square of its distance from the source. He also designed several new telescopes, particularly the Keplerian astronomical telescope composed of two convex lenses, and found that when light strikes an interface at a small angle, the angle of incidence and the angle of refraction are approximately proportional. The exact formula for the law of refraction was proposed by Snell (1580–1626) and Descartes (1596–1650). In 1621, Snell showed in an unpublished manuscript that the ratio of the cosecant of the angle of incidence to the cosecant of the angle of refraction is a constant, whereas Descartes gave the now-familiar law of refraction in terms of the sine function in his Dioptrics (1637). Fermat (1601–1665) then pointed out, in 1657, the principle that light propagating through a medium follows the path whose optical length takes an extreme value; the laws of reflection and refraction can be derived from this principle. In summary, by the middle of the 17th century, the foundation of optics as we know it had been laid.

    The British scientist Isaac Newton proposed the theory of dispersion at the end of the 17th century, revealing that light disperses when passing through a prism. The concept of the spectrum gradually became known, and people moved from the black-and-white world of light and shadow into the colored world of the spectrum. Although the laws of refraction and reflection allowed a higher level of manipulation of light, that manipulation was still limited to changing the direction of propagation. In the early 19th century, the French physicist Augustin-Jean Fresnel published his wave theory of light. His theory explained the interference and diffraction of light, and scientists began to focus on the nature and propagation of light. Faraday gave a preliminary definition of the light field in 1846. With the concept of the light field proposed, the research perspective of optics moved from one-dimensional (1D) and 2D descriptions to multiple dimensions, and the study of optics became systematic and standardized. In 2016, based on the concept of the metasurface, the team of Professor Federico Capasso[2] demonstrated the first high-efficiency metalens at visible wavelengths, which can flexibly and accurately regulate the phase, polarization, and amplitude of light, making true light field regulation possible.

    2.2. History of light perception

    Although current technology can control the multidimensional physical quantities of light, without the perception of this information, manipulating light is futile for imaging. The perception of light began with the ability to record the intensity of light waves, starting from the concept of imaging through a small hole. After many centuries of development, pinhole images could be observed and applied but not recorded; Renaissance painters of the 15th and 16th centuries, for example, used camera obscura “drawing boxes” purely as observation aids. It was not until scientists gradually discovered light-sensitive materials capable of responding to different wave bands that the intensity of light could be recorded. By improving the camera obscura, the Niépce brothers began experiments in France in 1816 on recording images with light-sensitive materials, and the world’s first recognized photograph was taken. In 1886, Eastman developed roll film, and the dream of bringing photography “to the masses” finally became a reality. After the world’s first color film based on a two-color process came out in 1933, the camera entered the color era, and a great variety of cameras and related equipment emerged. Although film made photography accessible to countless households, it was limited by its photosensitive material and physical form and could not be used to process images. The discovery of the photoelectric effect, however, gave rise to photoelectric sensing technology, and with the development of computers, the perception of light entered the digital and information age.

    The photoelectric effect was first discovered in 1887 by the German physicist Heinrich Rudolf Hertz in an experiment intended to prove the wave theory of light; the phenomenon was not explained until 1905 by Einstein’s light quantum (photon) hypothesis, which successfully built a bridge between light and electricity. Application of the photoelectric effect eventually led to the invention of the charge-coupled device (CCD)[3,4]. In 1969, the scientists Willard Boyle and George Smith at Bell Labs creatively combined the video phone with semiconductor bubble memory technology to invent a device that could transfer electric charge along the surface of a semiconductor, initially called the charge bubble device (CBD). The device collected the charge generated by the photoelectric effect and recorded the image signal, and it eventually led to the CCD. After the advent of the CCD, its significant value for imaging was quickly recognized; the first commercial CCD, with a resolution of 100 pixel × 100 pixel, was released by Fairchild Imaging in 1973. In 1975, Kodak built the first complete CCD camera. In 1978, by adding a color filter array in front of the CCD, Bayer et al. made the otherwise “color-blind” CCD color aware, producing the first single-chip CCD sensor able to record color images. CCDs can convert optical images into digital signals, marking the transformation of optical imaging from image recording to image processing. Simultaneously, with the rapid development of computer technology, people began to use computers to store and process digital images. Common methods include image inversion, image enhancement, image segmentation, and other mathematical transformations. However, these methods only process the intensity information of the image and do not consider the phase, spectrum, polarization, or other properties of the light wave. Although they can improve the visual effect and enhance boundary details, they analyze the image from a purely mathematical point of view and ignore the physical process of light propagation. Therefore, achieving 3D imaging, non-line-of-sight imaging, image defogging, and other capabilities with such methods is challenging.

    In the 1960s, scientists measured the X-ray radiation emitted by celestial bodies. X-rays are barely refracted by glass, so conventional imaging lenses cannot be used to observe them. Researchers therefore introduced the coded aperture, and the computational imaging revolution began. In 2003, Mait et al.[5] first used the term “computational imaging.”

    With the rapid development of new technologies such as multifunctional sensors and ever-increasing computing power, a new type of computational imaging technology emerged that integrates optics, mathematics, and signal processing and abandons the link-by-link characterization of the imaging process used by traditional photoelectric imaging. By jointly considering the illumination, light transmission through the medium, the optical system, the detector, the imaging circuits, and the display, it describes optical imaging systematically from a global perspective.

    3. Essence of Computational Imaging

    The light field, which captures the intensity of light and its direction, is the core of computational imaging; it enables novel functionalities such as refocusing and depth estimation. This section delves into the theoretical foundations of computational imaging, discussing key principles such as multi-view imaging, compressive sensing, and inverse problem formulation. By elucidating the mathematical underpinnings and physical principles, researchers gain a deeper understanding of the capabilities and limitations of computational imaging systems.

    3.1. Relationship between the light field and computational imaging

    Under ideal conditions, the target light field can be regarded as lossless during transmission. However, owing to the limitations of the lens and detector, part of the original light field is lost when passing through the imaging system, and only the information that remains constitutes the imaging result. The more information that passes through the imaging system, the better the imaging quality. The information flux of an imaging system can be measured by the space–bandwidth product (SBP), which is the number of pixels that can be resolved across the system’s field of view (FOV). The SBP is limited mainly by two factors: the pixel size and pixel count of the detector, and the imaging FOV. The larger the SBP, the richer the information that the system can transmit.
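    To make the notion concrete, the sketch below estimates the SBP of a diffraction-limited system as the number of resolvable spots across its FOV. It is a back-of-the-envelope illustration only; the Abbe-type resolution formula and all numerical values are assumptions for demonstration and are not taken from this review.

        def space_bandwidth_product(fov_mm, wavelength_um, numerical_aperture):
            """Rough SBP estimate: number of resolvable spots over a square FOV.

            Assumes an Abbe-type resolution limit d = lambda / (2 * NA); real systems
            are further limited by aberrations and by the detector pixel count.
            """
            resolution_um = wavelength_um / (2 * numerical_aperture)  # smallest resolvable feature
            fov_um = fov_mm * 1000.0
            spots_per_axis = fov_um / resolution_um
            return spots_per_axis ** 2

        # Illustrative example: a 0.1-NA objective covering a 20 mm field at 0.5 um illumination.
        print(f"SBP ~ {space_bandwidth_product(20, 0.5, 0.10):.2e} resolvable spots")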

    In a traditional imaging system, the SBP is usually increased by enlarging the imaging FOV, which essentially means extending the performance of the imaging lens. For example, simple optical systems rely on increasing the lens diameter or curving the lens surface to achieve a large FOV; however, owing to complex objective design and demanding manufacturing processes, the lens diameter cannot be increased indefinitely. The FOV can instead be expanded by using multiple imaging lenses, as in microlens arrays, multi-detector stitching, bionic compound-eye imaging, and multiscale imaging.

    Computational optical imaging is a process of recovering image details and other information from the detected light field information, and it focuses on improving the SBP by improving the resolution of the system. Applications include super-resolution imaging, synthetic apertures, lensless holographic microscopy, and Fourier ptychographic imaging. Lensless imaging technology successfully decouples the resolution from the FOV, enabling the system to obtain the same SBP with less hardware and more computational assistance. During imaging, different light field information can be retained for different application scenarios and imaging requirements: a few required physical quantities are selected, the selection is optimized according to the amount of information they carry, the projections along different dimensions are kept mutually orthogonal, and the required information is then recovered by inversion. The maximum information flux of the whole imaging system, that is, its SBP, is limited. Therefore, within the computational imaging paradigm, we can selectively abandon unnecessary information and broaden the information limit as far as the physical limit allows.

    Here we define “computational imaging” as an image-forming technique that uses various forms of optical modulation together with substantial algorithmic processing to capture and process light field information. It differs from conventional digital image processing and machine vision in that it globally optimizes the front-end (optics) and back-end (electronics) processing to expand the amount of useful information that is transmitted.

    3.2. Light field projection

    The light field is the carrier of information transmission and the starting point for multidimensional information coordination and nonlinear imaging ideas. It can be divided into the geometric and the physical light field. From the perspective of physics, the light field can be expressed in terms of both the particle and the wave properties of light. The light field information described from the physical perspective is therefore more comprehensive, and computational imaging interprets the complete light field using projections of this physical light field. Thus, projections of the “full” light field information onto the physical, spatial, and time dimensions can realize polarization imaging, three-dimensional (3D) imaging, spectral imaging, and more. The physical properties of light, such as the intensity, phase, spectrum, and polarization of the light field, together with the geometric properties, such as the propagation direction and the spatial coordinates (x, y, z), constitute the “full” light field information.

    Optical imaging is the projection of the original light field onto different space–time dimensions (Fig. 2). For example, an image is the projection of intensity (color) information onto a plane; it was originally defined with reference to human vision and expresses the luminous flux from the source in each direction, representing the amount of light energy that passes through. Video is the projection of intensity (color) information onto the plane and the time dimension, representing the shape of the light field at different times; the time-domain information captures the time-varying characteristics of the light field and is conducive to its further analysis. Polarization imaging is the projection of the intensity (color), degree of polarization, and angle of polarization onto the plane. The polarization projection expresses the phenomenon that the vibration vector of the light wave, a transverse wave perpendicular to the propagation direction, is biased toward certain directions, indicating that light is an electromagnetic wave; changes in the polarization state can be used to infer changes in material, environment, and other parameters. Phase projection makes further use of the electromagnetic wave nature of light and expresses the state of the light wave at a specific moment: when the phase takes a certain value, the field is in a corresponding state. The spectrum arises because light can be dispersed into monochromatic components of different wavelengths. The red, green, and blue (RGB) space is the first octant of the 3D space formed by a nonlinear transformation of the spectral space; the spectral response of different materials differs from band to band, and the intensities of the three primary colors perceived by the human eye can be understood as a coarse energy spectrum of light. The light field camera performs an intensity (color) projection onto 3D space. The earliest studies of light concerned its spatial dimension, that is, its geometric properties; the spatial information of light represents the most basic property, the direction of propagation. Research on the spatial information of light and the invention of lenses, superlenses, and other optical devices laid the foundation for the development of optical technology.
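    As a purely illustrative sketch, the snippet below treats a discretized “full” light field as a multidimensional array and forms several of the projections described above by integrating out the unused dimensions. The axis layout, array sizes, and random data are assumptions made for demonstration; they do not correspond to any specific device.

        import numpy as np

        # Hypothetical discretized light field L[x, y, theta, phi, wavelength, time]:
        # spatial position, propagation direction, spectral channel, and time.
        rng = np.random.default_rng(0)
        L = rng.random((64, 64, 8, 8, 5, 3))

        # A conventional photograph: intensity projected onto the (x, y) plane,
        # i.e., every non-spatial dimension is integrated out.
        image = L.sum(axis=(2, 3, 4, 5))        # shape (64, 64)

        # A video: keep the time axis, integrate direction and wavelength.
        video = L.sum(axis=(2, 3, 4))           # shape (64, 64, 3)

        # A spectral data cube: keep wavelength, integrate direction and time.
        spectral_cube = L.sum(axis=(2, 3, 5))   # shape (64, 64, 5)

        print(image.shape, video.shape, spectral_cube.shape)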


    Figure 2.Light field projection.

    4. Advancements in Computational Imaging

    The physical process assumed by the traditional photoelectric imaging model is linear, and the imaging and reconstruction processes are treated discretely, which introduces approximation errors and cannot accurately reflect the complex and changing imaging process. Image processing is based on real-valued transformations, so information dimensions are lost, and the multidimensional physical information of the real imaging process cannot be represented.

    To solve these problems, computational imaging establishes a new imaging model that unifies the physical imaging process and image processing and maps the multidimensional regulation of the light field onto every link of the imaging chain according to how information propagates in the light field. As an analogy, the whole light field can be compared to a painter’s palette (Fig. 3). Different colors on the palette correspond to different dimensions of light field information (intensity, phase, and spectrum). Because the pigments are limited, we must mix different colors to obtain the required hue when painting. Likewise, in the imaging chain, we regulate different light field dimensions according to the imaging requirements, comprehensively consider the relationships between the multidimensional light field and the light source, the medium, the optical system design, the detector, the computation, and the other imaging links, and finally achieve an accurate description of the real physical scene.


    Figure 3.Relationship between light field and imaging link.

    Therefore, by establishing a new imaging model and developing new detectors that span the physical and information domains, full-link optimization enables computational imaging technology to describe the light source, the propagation path, the optical system, and the processing circuitry from a global perspective, breaking the discrete, link-by-link characterization of traditional photoelectric imaging. Moving from the isolated calculation and independent optimization of individual imaging links to full-link design optimization allows the limitations of traditional photoelectric imaging to be overcome through multiple channels. The following sections detail the specific techniques used for computational imaging in each link.

    4.1. Computational light source

    Illumination is an indispensable part of the imaging chain and directly or indirectly affects the imaging quality. In the simplest case, a dark environment significantly reduces the sensitivity of the human photoreceptor cells, and the surroundings can be recognized clearly only by increasing the amount of light; in this case, illumination improves the signal-to-noise ratio (SNR) of the imaging system. Through proper modulation of the light source, specific performance limits of the imaging system can therefore be improved or overcome. A so-called computational light source encodes the spatial, temporal, and physical dimensions on the illumination side of the imaging system. As the number of encoded dimensions increases, so does the number of degrees of freedom available to modulate the light field; this is a key feature of computational imaging, namely, that the dimensionality of the light field can be increased to solve problems that cannot be solved at low dimensionality. Among the physical dimensions, intensity is the basis of all modulation methods, and every modulation method ultimately affects the intensity at the detector surface. Common intensity-based modulation methods include phase modulation, wavelength modulation, light vector modulation, time modulation, and coherent imaging (Fig. 4).

    4.1.1. Phase modulation

    Phase is a high-dimensional physical quantity that encodes a large amount of high-dimensional information, such as position and resolution information. By modulating the phase of the light source, this high-dimensional information can be encoded and projected onto lower dimensions, thereby improving the imaging performance. Structured illumination 3D imaging and structured illumination microscopy (SIM) are common phase-modulated imaging techniques. The phase function determines the wavefront structure of a beam; plane and spherical waves, for example, are named after their wavefront structure. So-called structured light can therefore be obtained by modulating the phase of the light source. Structured illumination 3D imaging obtains 3D information about a target by interpreting an encoded structured light pattern that is modulated by the target. Depending on the scene, the coding of structured light can be divided into speckle coding[6,7], binary coding[8], and phase coding. Phase-coding methods benefit from good adaptability, high data density, and high imaging accuracy; hence, they are widely used, and we introduce only phase-coded structured illumination 3D imaging here. It is usually realized by projecting fringes, so the method is also called fringe projection profilometry. In 1983, Takeda and Mutoh[9] proposed Fourier transform profilometry, which uses the Fourier transform to retrieve the deformation phase so that the 3D shape of an object can be measured from a single shot. In 1984, Srinivasan et al.[10] proposed a 3D measurement method based on phase-shift interference, also known as phase-measuring profilometry. This method uses a series of grating patterns with known phase differences as the light source to illuminate the measured object and recover the phase information; it achieves accurate measurements even with a coarse projected grating and a low-density image sensor array. However, the recovered phase is wrapped into the range 0–2π, which severely limits the measurable height range of 3D objects. Therefore, in 1999, Sansoni et al.[11] combined the phase-shift method with Gray code, as shown in Fig. 5(a). Gray code extends the height measurement range without affecting the measurement accuracy of the phase-shift method; hence, the two methods are complementary. The combined Gray-code and phase-shift method is essentially an intensity coding method; therefore, it is easily affected by factors such as ambient light, noise, and surface contrast, which can cause errors when judging the fringe order.
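    As a minimal sketch of the generic N-step phase-shifting calculation (a textbook formula rather than the specific algorithms of Refs. [9]–[11]; the synthetic fringe data below are purely illustrative):

        import numpy as np

        def wrapped_phase(frames):
            """Generic N-step phase-shifting algorithm (N >= 3).

            frames: images I_n = A + B * cos(phi + 2*pi*n/N), n = 0 .. N-1.
            Returns the wrapped phase phi in (-pi, pi].
            """
            N = len(frames)
            shifts = 2 * np.pi * np.arange(N) / N
            num = -sum(I * np.sin(d) for I, d in zip(frames, shifts))
            den = sum(I * np.cos(d) for I, d in zip(frames, shifts))
            return np.arctan2(num, den)

        # Synthetic demonstration with a known linear phase ramp.
        x = np.linspace(0, 4 * np.pi, 256)
        phi_true = np.tile(x, (256, 1))
        frames = [1.0 + 0.5 * np.cos(phi_true + 2 * np.pi * n / 4) for n in range(4)]
        phi_wrapped = wrapped_phase(frames)   # equals phi_true wrapped into (-pi, pi]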


    Figure 4.Computational light source. (a) Light vector modulation: ptychographic iterative engine[59] and Fourier ptychographic microscopy[73]. (b) Phase modulation: structured-light 3D imaging[110] and structured illumination microscopy[111,112]. (c) Coherent imaging: optical coherence tomography[103,329] and holography[109]. (d) Time modulation: coded exposure[330] and time of flight[331]. (e) Wavelength modulation: stochastic optical reconstruction microscopy[48] and synthetic wavelength holography[57].


    Figure 5.Coding methods and experimental results of structured illumination 3D-imaging. (a) Example of the pattern sequence that combines gray code and phase-shift projection[11]. (b) Novel phase-coding method for absolute phase retrieval[12]. (b1) The sinusoidal fringe pattern and the wrapped phase obtained from it. (b2) Phase-coding fringe and the codewords extracted from it. (c) Comparison of projection results between the method based on the phase-coding and the traditional phase-shifted method[12]. (c1)–(c3) Three sinusoidal phase-shifted fringe images. (c4) Wrapped phase map. (c5)–(c7) Three phase encoded fringe patterns. (c8) Wrapped stair phase map. (d) The phase-measuring profilometry based on the composite color-coding method[15]. (d1) Schematic of the feature points mapping-based principle. (d2) 3D shape of a stair model. (d3) Experimental result.

    In 2012, Wang et al.[12] proposed a phase-coding method in which code words were embedded into phases. In this method, a set of sinusoidal fringes and stepped phase-coded fringes are used to illuminate the object to be measured, and the fringe order is determined from the stepped phase, as shown in Fig. 5(b). The phase unwrapping is shown in Fig. 5(c). Owing to the limited number of code words, this method can cause problems when dealing with high-frequency fringes. In the same year, Zheng et al.[13] proposed a two-step phase-coding method that removes the limit on the number of code words. By encoding two sets of phase information, two sets of code words can be embedded in the lighting fringe, which increases the number of code words. However, this method requires more fringe patterns. To reduce the number of projected fringe patterns and improve the code word recognition rate, in 2015, Zhou et al.[14] proposed a method where sinusoidal and stepped phase-coded fringes are encoded into red and blue channels, respectively, to form color fringes. These color fringes can be used to illuminate objects to measure their 3D morphology. Based on the idea of multi-channel coding, in 2019, Zhou et al.[15] embedded sinusoidal fringes, phase-coded fringes, and grayscale coding into the red, green, and blue channels of color lighting, which reduced the number of projected patterns. A schematic diagram of this principle and the experimental results are shown in Fig. 5(d). In 2020, Chen et al.[16] proposed an S-shaped piecewise phase-coding method. This method uses Gray code to encode the piecewise number of the phase code and uses the S-shaped design to provide constraints for fringe order judgment, which reduces the error rate and improves the measurement accuracy. In 2022, Gui et al.[17] designed an improved dual-frequency phase-coded fringe projection method to improve the accuracy when processing a large number of code words, and they realized absolute phase recovery.
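    Whatever coding scheme is used to recover the fringe order (Gray code, phase code, or color-channel multiplexing), the final step is the same: the wrapped phase is converted to an absolute phase using the per-pixel order. A minimal sketch of this temporal unwrapping step is given below; how the order map k(x, y) is decoded depends on the specific method in Refs. [11]–[17] and is simply assumed to be available here.

        import numpy as np

        def unwrap_with_order(phi_wrapped, fringe_order):
            """Absolute phase from a wrapped phase map and a fringe-order map.

            phi_wrapped:  wrapped phase in (-pi, pi], e.g., from a phase-shift algorithm.
            fringe_order: integer map k(x, y) decoded from the auxiliary code patterns.
            The absolute phase is then mapped to height through the calibrated
            phase-to-height relation of the particular projector-camera setup.
            """
            return phi_wrapped + 2 * np.pi * np.asarray(fringe_order)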

    In addition, the phase information of an object can be obtained by phase retrieval methods to acquire its 3D structure. Besides iterative algorithms such as the Gerchberg–Saxton (GS) algorithm[18,19] and the hybrid input–output (HIO) algorithm[20], there are also non-iterative phase recovery methods such as the transport of intensity equation (TIE). The TIE is a second-order elliptic partial differential equation that relates the variation of the intensity along the optical axis to the phase of the optical field in the plane perpendicular to the optical axis. Using the TIE, the phase of the target can be quantitatively recovered from intensity distributions recorded at different propagation distances. The TIE was first derived by Teague in 1982; he obtained it from the Helmholtz equation under the paraxial approximation and gave a Green’s-function solution in 1983[21]. In 1995, Gureyev et al.[22] proved the solvability and uniqueness of the TIE under different boundary conditions. Subsequently, the Zernike polynomial method[23], the multigrid method[24], the fast Fourier transform method[12], and other numerical solution methods were proposed. The fast-Fourier-transform-based method, proposed by Gureyev et al. in 1996[25], implicitly assumes periodic boundary conditions and can solve the TIE quickly and effectively; it is the most widely used approach. There have also been many improvements on the Fourier transform method: Paganin et al.[26] extended it to the case of a non-uniform light field, and in 2014, Zuo et al.[27] proposed a method based on the discrete cosine transform (DCT), which enables a fast solution under non-homogeneous boundary conditions. The establishment of the mathematical theory and numerical solutions of the TIE has promoted its application in many fields, such as adaptive optics (AO)[28], optical phase microscopy[29], and X-ray diffraction imaging[30].
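    For reference, the standard paraxial form of the TIE as it is commonly written in the literature (the notation below is conventional and is supplied here for context rather than reproduced from this review) is

        -k \, \frac{\partial I(x, y, z)}{\partial z}
            = \nabla_{\!\perp} \cdot \left[ I(x, y, z) \, \nabla_{\!\perp} \varphi(x, y, z) \right],

    where k = 2π/λ is the wavenumber, I is the intensity, φ is the phase, z is the coordinate along the optical axis, and ∇⊥ is the gradient in the transverse plane. The Fourier-transform solvers mentioned above invert the right-hand side in the frequency domain, which is what makes the non-iterative recovery fast.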

    Structured illumination based on phase modulation can also be used for super-resolution imaging. From the perspective of Fourier optics, the imaging resolution is limited by the bandwidth of the system; therefore, the resolution can be improved by extending the optical transfer function. Exploiting the Moiré effect, the structured illumination microscope mixes the high-frequency information of the object with the illumination pattern in the frequency domain and thereby shifts it into the detection passband of the optical system. This indirectly extends the bandwidth and enables super-resolution optical microscopy beyond the diffraction limit. In 2000, Gustafsson et al.[31] proposed the classical super-resolution microscopy technique using two-beam interference fringes as the light source. This method extends the lateral resolution to twice the classical diffraction limit; however, it improves only the lateral resolution. In 2008, Gustafsson et al.[32] used three-beam interference structured light to achieve 3D imaging, which improves both the lateral and the axial resolution and deterministically removes defocus blur.
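    The frequency mixing at the heart of SIM can be written compactly. For a sample s(r) illuminated by a sinusoidal pattern I(r) = I_0 [1 + m cos(2π p·r + φ)] and imaged through an optical transfer function H(k), the detected spectrum is (standard SIM notation, assumed here for illustration)

        \tilde{D}(\mathbf{k}) = I_0 \left[ \tilde{S}(\mathbf{k})
            + \tfrac{m}{2} e^{+i\varphi} \, \tilde{S}(\mathbf{k} - \mathbf{p})
            + \tfrac{m}{2} e^{-i\varphi} \, \tilde{S}(\mathbf{k} + \mathbf{p}) \right] H(\mathbf{k}),

    so object frequencies near k ± p that lie outside the passband of H are folded into it; recording several images with different pattern phases φ allows the three terms to be separated and reassembled into an extended spectrum.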

    Illumination is an important part of SIM, and since its inception researchers have made various improvements to the illumination scheme to improve the imaging performance. For example, in contrast to the traditional two- or three-beam interference illumination, in 2008, Shao et al.[33] proposed SIM based on six-beam interference, in which the original three beams are turned into six beams by two opposing objectives. This method can achieve high-resolution 3D imaging; however, it is complicated, and the optical path is difficult to construct experimentally. Therefore, in 2020, Manton et al.[34], considering the practicality of the system structure, used a mirror to reflect the central beam back into the interference region and thus achieved four-beam SIM without interferometric detection, as shown in Fig. 6(a). This method improves the axial resolution, as shown in Fig. 6(b). In 2021, Xu et al.[35] proposed an asymmetric three-beam interference method based on the traditional three-beam approach and increased the 3D imaging speed by optimizing the modulation scheme and the acquisition timing. In 2023, Li et al.[36] proposed a simple method for generating four-beam interference and combined it with deep learning to achieve an isotropic resolution of 120 nm. In addition, to reduce the system complexity caused by interference, many lattice-based illumination methods have been proposed.


    Figure 6.Common SIM scheme and experimental results. (a) Schematic of the four-beam experimental setup[34]. (b) Simulated imaging performance on a fibrous ground truth test image, shown as an xz slice[34]. (b1) Ground truth. (b2) Three-beam SIM. (b3) I5S (dual-objective six-beam SIM + interferometric detection). (b4) Dual-objective four-beam SIM (without interferometric detection). (c) Key steps in implementing instant structured illumination[39]. (c1) A converging microlens array is used to produce a multifocal excitation. (c2) Out-of-focus fluorescence is rejected with a pinhole array that is matched to the microlens array. (c3) A twofold local contraction of each pinhole fluorescence emission is achieved with the aid of a second, matched microlens array. (c4) A galvo serves to raster multifocal excitation and sum multifocal emission, producing a super-resolution image during each camera exposure. (d) Comparison between traditional SIM and cSIM[40]. (d1) Conventional SIM relies on a high-NA objective lens for both excitation and collection. (d2) cSIM harnesses interference in a waveguide to excite the specimen via evanescent fields, decoupling the excitation and collection light paths.

    In 2010, Müller and Enderlein[37] proposed image-scanning microscopy, which scans the sample with a single laser focus. To improve the imaging speed of single-point scanning, in 2012, York et al.[38] proposed multifocal structured illumination microscopy (MSIM), which uses digital micromirror devices (DMDs) to generate a multifocal excitation pattern for scanning. In 2013, York et al.[39] further improved MSIM by adding a pinhole array matched to the illumination lattice behind the micromirror array and using a galvanometer scanner to sweep the excitation light over the sample, as shown in Fig. 6(c). This further increased the imaging speed of the SIM system and enabled real-time imaging of living cells. Following the development of photonic integrated circuits (PICs), in 2022, Helle et al.[40] proposed 2D SIM based on photonic chips (cSIM). In this approach, a planar photonic chip acts as both the sample carrier and the light source, replacing the traditional glass slide, as illustrated in Fig. 6(d). The optical waveguide array on the chip generates interference patterns at different angles, and the evanescent field is used to illuminate the sample. This greatly simplifies the illumination alignment and reduces the complexity of the imaging system.

    4.1.2. Wavelength modulation

    Wavelength modulation usually refers to modulation of the wavelength λ or of the wave band Δλ. For example, specific wavelengths are used for fluorescence microscopy, and broad-spectrum light sources are used to suppress speckle noise in holograms. From the perspective of the diffraction limit, the wavelength partly determines the resolution of the imaging system; therefore, the resolution can be improved by appropriately modulating the wavelength of the light source. From the perspective of the spectrum, different wavelengths can encode different information, and correlations exist between them; reasonable use of the correlation between spectral components at different wavelengths can also improve the performance of the imaging system. Common computational imaging methods based on wavelength modulation include stimulated emission depletion (STED) microscopy, stochastic optical reconstruction microscopy (STORM), and dual-wavelength holographic scattering imaging.

    STED controls the excitation and depletion of fluorescent molecules by adjusting the wavelengths of the light sources, reduces the size of the spot that a point source produces on the image plane, and thereby modulates the point spread function (PSF) of the optical system; hence, it overcomes the diffraction limit. STED microscopy was first proposed in 1994 by Hell and Wichmann[41]. A STED system requires two illuminating beams with different wavelengths, used as the excitation and depletion light. When the excitation light irradiates the sample, all the fluorescent molecules within the Airy disk are excited; the depletion light is superimposed on the periphery of the spot so that the excited fluorophores at the periphery are quenched, leaving a smaller effective emission spot and thus reshaping the PSF of the system. Choosing suitable wavelengths for the excitation and depletion light is essential for optimizing the performance of a STED system. The wavelength of the excitation light should be near the peak of the fluorescence excitation spectrum to ensure that its energy is fully absorbed. The wavelength of the depletion light should be close to the long-wavelength tail of the fluorescence emission spectrum. However, this choice reduces the stimulated emission cross-section at the depletion wavelength; consequently, the corresponding saturation intensity increases, which raises the required depletion intensity and can cause severe bleaching of the sample.
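    The trade-off described above is captured by the commonly cited STED resolution scaling law (a standard result from the STED literature, quoted here for context rather than taken from this review):

        d \approx \frac{\lambda}{2\,\mathrm{NA}\,\sqrt{1 + I_{\mathrm{STED}} / I_{\mathrm{sat}}}},

    where NA is the numerical aperture, I_STED is the peak intensity of the depletion beam, and I_sat is the saturation intensity of the fluorophore at the depletion wavelength. A smaller stimulated emission cross-section raises I_sat, so a correspondingly higher depletion intensity is needed to reach the same resolution d.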

    If the wavelength of the depletion light is close to the emission spectrum, then the stimulated emission cross-section will increase; however, the depletion light will cause a secondary excitation of the sample, which will interfere with the experimental results. Therefore, the interference caused by secondary excitation must be prevented. In 2012, Vicidomini et al.[42] considered the depletion light intensity as undesirable background light. They obtained the final light intensity by considering the difference between the images with conventional STED and with the depletion light, which improved the imaging efficiency and the contrast of the image. The effectiveness of this method has been verified experimentally; however, the technology is not mature enough for practical applications. In 2017, Gao et al.[43] proposed stimulated emission double depletion (STEDD) imaging technology, as shown in Fig. 7(a). Two pulses were used to perform traditional STED and center intensity quenching, as shown in Fig. 7(b). The traditional STED and background intensity images were obtained, and the result was obtained by subtracting the images. This method reduces the effect that the wavelength has on the optical power and improves the imaging resolution, as shown in Fig. 7(c).


    Figure 7.Scheme of STEDD microscopy[43]. (a) Sketch of the STEDD, including the sequence of excitation and depletion pulses. (b) Detailed temporal sequence of fluorescence excitation. Shortly after the excitation pulse, the first STED1 pulse (intensity profile visualized in the xz plane) depletes the majority of excited fluorophores except for those near the center. A fraction of fluorophores in peripheral regions of the observation volume still escape depletion or are re-excited by the STED beam. The second weaker STED2 pulse (intensity profile also visualized in the xz plane) depletes excited fluorophores near the center but leaves those in the periphery unaffected. (c) Combined confocal and STEDD image of a COS-7 cell expressing the mGarnet–RITA fusion protein as a microtubule marker.

    STED essentially reshapes the PSF of the system by modulating the wavelengths of the light sources, thereby shrinking the effective Airy spot to achieve super-resolution imaging. Wavelength modulation can also separate imaging in time, thereby reducing the overlap of Airy spots. Using light of different wavelengths, STORM switches sparse, random subsets of single fluorescent molecules in the sample on and off and then localizes the activated molecules precisely. After many such cycles, all the localizations are superimposed to reconstruct the structure of the entire sample.
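    The per-frame localization step can be sketched as follows. Real STORM pipelines fit a 2D Gaussian (or a calibrated astigmatic PSF) to each blinking emitter; the intensity-weighted centroid used here is only a simplified stand-in to show how sub-pixel coordinates are extracted frame by frame and accumulated into a super-resolved image. The threshold value and the frame data are assumptions for illustration.

        import numpy as np

        def localize_spot(frame, threshold):
            """Toy single-emitter localization by intensity-weighted centroid."""
            mask = frame > threshold
            if not mask.any():
                return None                      # no emitter active in this frame
            ys, xs = np.nonzero(mask)
            w = frame[ys, xs]
            return np.average(xs, weights=w), np.average(ys, weights=w)

        # Accumulate sub-pixel localizations over a long sequence of sparse frames,
        # then render them on a fine grid to form the super-resolution image.
        # localizations = [localize_spot(f, threshold=50) for f in frames]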

    STORM was first proposed in 2006 by Rust et al.[44], and it can be used to image biological structures with resolution below the diffraction limit. To further study molecular structure and the interactions between biological macromolecules, STORM has been extended to multicolor imaging. In 2007, Bates et al.[45] proposed a series of probe pairs with different absorption and emission spectra, each consisting of a reporter and an activator; these probe pairs can be used as fluorescent switches. Pulsed light in different wave bands activates the activator to realize multicolor STORM. The advantage of this method is the high imaging resolution of each channel; however, non-specific activation by the lasers causes crosstalk between the color channels. In 2015, Zhang et al.[46] added a prism to one of the optical paths of a dual-objective imaging system to obtain laterally dispersed fluorescence spectra, while performing conventional STORM on the other path. They thus obtained a super-resolution image containing both spatial and spectral information. This method requires only one excitation beam to achieve multicolor imaging with high resolution; however, the data processing is more complex.

    In 2016, Shechtman et al.[47] used a spectrally dependent phase plate, based on PSF engineering technology, to directly encode spectral information and change the shape of different color PSFs, as shown in Fig. 8(a). Thus, they achieved efficient two-color imaging, as shown in Figs. 8(b) and 8(c). In terms of 3D imaging, the most commonly used techniques are astigmatism, biplanar imaging, and double helix structures. In 2008, Huang et al.[48] used optical astigmatism to obtain PSFs for multiple planes near the focal plane, and they determined the axial and lateral positions of single fluorophores according to the deformation of the PSF. Through the iterative and random activation of the optical switch, high-precision 3D positioning of each probe can be achieved, which facilitates the construction of 3D images without scanning the samples. In the same year, the team of Huang[49] combined 3D STORM with multicolor STORM and conducted multi-structure imaging of whole cells. Thus, they successfully imaged the mitochondrial network, which is beneficial when studying the spatial relationship between mitochondria and surrounding microtubules.


    Figure 8.PSF-based multicolor STORM and deep learning-based STORM. (a)–(c) Multicolor STORM[47]. (a) Raw data from the recorded super-resolution imaging movie. Insets: two enlarged example PSFs of a green label (horizontally elongated, top) and a red label (vertically elongated, bottom) with arrows indicating the elongation direction. (b) Super-resolution image obtained by localizing each emitter in the movie and assigning its color (red, microtubules; green, mitochondria). Inset: diffraction-limited data. (c) Histogram of all of the localizations within the dotted white box surrounding an ∼2 μm-long microtubule section in (b) (dark gray, FWHM = 53 nm) and the diffraction-limited intensity cross-section from the same region (light gray, FWHM = 329 nm). (d) FD-DeepLoc inference process[53].

    The biplane method proposed by Juette et al.[50] in 2008 and the double helix PSF method proposed by Pavani et al.[51] in 2009 both achieved 3D imaging. Compared with the astigmatism and biplane methods, the double helix method has higher positioning accuracy and a greater DOF. In 2012, Xu et al.[52] used the dual objective lens method to achieve STORM with super-resolution imaging. The number of fluorescence photons collected during the imaging process doubled, and the resolution was increased to 1.4 times that of a single objective lens. In traditional 3D single-molecule imaging, the aberrations of fluorescent molecules far from the central optical axis are difficult to correct, which limits the FOV. In 2020, Fu et al.[53] used advances in deep learning and proposed the FD-DeepLoc network, which can accurately locate dense emission units within a large FOV. Its inference process is shown in Fig. 8(d). This method has spatial awareness and can achieve high-throughput 3D super-resolution imaging of whole cells with a large FOV and DOF. Compared to traditional astigmatism-based 3D imaging, FD-DeepLoc has a greater DOF and FOV, which increases the throughput by approximately 100 times and enables high-precision, high-fidelity imaging of biological structures across the entire camera frame (180×180×53).

    Another computational imaging method based on wavelength modulation is scattering imaging using dual-wavelength holography. In contrast to traditional holography, dual-wavelength holography records digital holograms at two different wavelengths, obtains their wrapped phase maps numerically, and takes their difference to obtain the phase map at the equivalent (synthetic) wavelength, thereby realizing phase unwrapping. Dual-wavelength holography avoids the time-consuming computation and poor stability of traditional phase unwrapping algorithms. In 2000, Wagner et al.[54] combined dual-wavelength interferometry with digital holography for the first time; the approach requires no complex phase unwrapping and can accurately measure the topography of millimeter-scale objects.

    The combination of dual-wavelength holography and the spectral correlation of the scattering medium can also be used to realize imaging through scattering media and non-line-of-sight (NLOS) imaging. In 2006, Hayasaki et al.[55] introduced dual-wavelength holography for imaging through scattering media. Using light at two wavelengths λ1 and λ2 and a CCD camera, they recorded two holograms at different wavelengths and computed a synthetic hologram and the synthetic wavelength. The synthetic hologram is numerically back-propagated along the optical path at the synthetic wavelength to obtain an image of the obscured object. This approach greatly promoted the development of digital holography in the field of scattering imaging. In 2019, Willomitzer et al.[56] used dual-wavelength holography and the spectral correlation of the scattered light to restore the holographic image of an obscured object with high spatial and temporal resolution over a wide-angle FOV. In 2021, Willomitzer et al.[57] further showed that a scattering medium can be irradiated with two beams of different wavelengths so that the scattered light illuminates the hidden target; the synthetic-wavelength hologram is then computed and back-propagated along the optical path to realize NLOS imaging of the object. The schematics and experimental results are shown in Figs. 9(a) and 9(b), respectively.

    Figure 9.Schematics and experimental results of synthetic wavelength holography (SWH) for NLoS imaging through scattering media[57]. (a) SWH image formation and reconstruction. The synthetic wavelength Λ = λ1λ2/|λ1 − λ2| is used in the reconstruction process. (b) Experimental results. (b1)–(b4) Reconstructions of measurements taken through the ground glass diffuser for different SWLs Λ. (b5)–(b8) Reconstructions of measurements taken through the milky plastic plate for different SWLs Λ.

    4.1.3. Light vector modulation

    The light vector of the illumination can be modulated by adjusting the incidence direction of the light source. Multi-angle illumination provides additional constraints for an otherwise underdetermined problem, ensuring that the solution is unique, or encodes target information into the illumination direction so that it can be recovered by decoding. The main applications of light vector modulation are the ptychographic iterative engine (PIE) and Fourier ptychographic microscopy (FPM).

    PIE is based on coherent diffraction imaging. Introducing light vector modulation adds constraints to the underdetermined problem, which mitigates the slow convergence, local optima, and stagnation that arise in traditional coherent diffraction imaging algorithms. In 2004, Rodenburg and Faulkner[58] proposed lensless PIE, in which the illumination is scanned across the sample, together with the corresponding iterative algorithm. The method records a series of diffraction patterns as the illumination direction is changed; because adjacent illumination positions partially overlap, the resulting redundancy (including the interference encoded in the overlap) constrains the reconstruction, so the complex-valued (amplitude and phase) object can be recovered without restrictive assumptions about the sample surface. This approach has the advantages of a large imaging FOV, strong algorithm robustness, and high error tolerance. Moreover, it offers the possibility of lensless transmission microscopy with subatomic resolution using electrons, X-rays, or nuclear particles.
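
    The core of a PIE-type reconstruction is the overlap-constrained update of the object estimate at each illumination position. The following schematic single sweep uses an ePIE-style update and plain far-field (FFT) propagation as a stand-in for the original algorithm; the array layout, the step size alpha, and the propagation model are illustrative assumptions rather than the exact formulation of Ref. [58].

```python
import numpy as np

def pie_sweep(obj, probe, positions, meas_amps, alpha=1.0):
    """One sweep of an ePIE-style ptychographic update (schematic).
    obj:       current complex object estimate, 2D array
    probe:     complex illumination patch, 2D array smaller than obj
    positions: (row, col) top-left corners of the overlapping probe positions
    meas_amps: measured diffraction amplitudes (same shape as probe), one per position
    Far-field propagation is modelled by a plain FFT."""
    ph, pw = probe.shape
    denom = np.max(np.abs(probe)) ** 2 + 1e-12
    for (r, c), amp in zip(positions, meas_amps):
        patch = obj[r:r + ph, c:c + pw]
        exit_wave = probe * patch
        far = np.fft.fft2(exit_wave)
        far = amp * np.exp(1j * np.angle(far))  # enforce the measured modulus
        new_exit = np.fft.ifft2(far)
        # Overlap-weighted object update confined to the illuminated region
        obj[r:r + ph, c:c + pw] = patch + alpha * np.conj(probe) * (new_exit - exit_wave) / denom
    return obj
```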

    The imaging performance of PIE can be improved by modulating and encoding the illumination probe. In 2018, Zhou et al.[59] proposed a lensless imaging scheme with multi-angle light-emitting diode (LED) illumination, in which a pinhole is placed between the LED array and the object and the diffraction pattern is captured in a single shot. The optical setup and corresponding forward model are shown in Figs. 10(a) and 10(b), respectively. Figures 10(c) and 10(d) show the measurement and recovery results of the method. The information in the overlapping regions is demultiplexed by an optimization algorithm to reconstruct the image, which avoids a time-consuming sequential measurement process. In 2021, Lu et al.[60] suggested that the pinhole between the LED array and the object could be replaced by a random optical mask to avoid dependence on mechanical scanning accuracy and improve the imaging resolution. The LED array provides illumination at different angles to project translated structured patterns without mechanical scanning. The forward imaging model of mask-modulated lensless imaging is shown in Fig. 10(e). The method uses a regularized ptychographic iterative engine to obtain higher-resolution, better-quality images and to improve imaging efficiency, as shown in Fig. 10(f). According to the Nyquist sampling theorem, the resolution of the reconstructed object is limited by the size of the detector pixels. Therefore, in 2024, Lan et al.[61] suggested replacing the original planar illumination with divergent illumination, which improves the resolution by a factor of 2 using only 30 diffraction patterns and 10 iterations.

    Figure 10.Multi-angle illumination lensless imaging and mask-modulated lensless imaging. (a)–(d) Multi-angle illumination lensless imaging[59]. (a) The optical setup of multi-angle illumination lensless imaging system. (b) The corresponding forward model expression. (c) The corresponding single-shot measurement. (d) Recovered results of a USAF-1951 resolution chart. (e)–(f) Mask-modulated lensless imaging[60]. (e) Forward imaging model of the mask-modulated lensless imaging. (f) Comparison of the recovered images using the USAF-1951 resolution target.

    Light field information can also be encoded and interpreted in the Fourier domain using the concept of ptychographic imaging. In 2013, Zheng et al.[62] proposed FPM, which integrates the concepts of phase retrieval and synthetic aperture. The method achieves high-resolution imaging with a large FOV by iteratively stitching together, in Fourier space, low-resolution intensity images captured under different illumination angles, as shown in Fig. 11(a). In traditional FPM, the sampling speed is limited by the number of acquisitions and by the relatively long exposure time required to capture dark-field images. To make Fourier ptychographic imaging suitable for real-time applications, the illumination scheme can be adjusted to reduce the sampling time. Intuitively, the most direct ways to do so are to reduce the number of acquisitions or to shorten the exposure time of each acquisition.
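
    The spirit of the FPM recovery loop can be sketched as follows: each low-resolution measurement constrains one sub-region of the high-resolution Fourier spectrum. This is a schematic version only; the exact update, normalization, and pupil-recovery steps of Ref. [62] and its successors are omitted, the pupil is assumed binary, and the spectrum is assumed to be stored with its DC term centered.

```python
import numpy as np

def fpm_recover(spectrum, pupil, offsets, low_res_imgs, n_iter=10):
    """Schematic FPM spectrum-stitching loop (binary pupil assumed, DC centred).
    spectrum:     estimate of the high-resolution Fourier spectrum (N x N, complex)
    pupil:        0/1 pupil function of the low-NA objective (M x M)
    offsets:      (row, col) top-left corner of the M x M window for each LED angle
    low_res_imgs: measured low-resolution intensity images (M x M each)"""
    M = pupil.shape[0]
    for _ in range(n_iter):
        for (r, c), I_meas in zip(offsets, low_res_imgs):
            window = spectrum[r:r + M, c:c + M]
            sub = window * pupil
            low = np.fft.ifft2(np.fft.ifftshift(sub))
            low = np.sqrt(I_meas) * np.exp(1j * np.angle(low))  # keep phase, replace amplitude
            new_sub = np.fft.fftshift(np.fft.fft2(low))
            # Replace the spectrum inside the pupil support, keep it elsewhere
            spectrum[r:r + M, c:c + M] = window * (1 - pupil) + new_sub * pupil
    return spectrum
```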

    Figure 11.FPM and corresponding illumination improvement strategies. (a) Iterative recovery procedure of FPM (five steps)[62]. (b) Multiplexed coded illumination for FP with an LED array microscope[66]. (Top) Four randomly chosen LEDs are turned on for each measurement. (Middle) The captured images corresponding to each LED pattern. (Bottom) Fourier coverage of the sample’s Fourier space for each of the LED patterns (drawn to scale). (c) Experimental setup of FP based on the laser illumination source[72].

    The methods commonly used to reduce the number of acquisitions include adaptive illumination and multiplexed sampling. Content-based adaptive illumination was proposed in 2014 by Bian et al.[63], Fourier ptychographic imaging based on self-learning was proposed in 2015 by Zhang et al.[64], and Fourier ptychographic imaging based on a predictive search algorithm was proposed in 2017 by Li et al.[65]. All these methods are based on the idea of adaptive illumination; that is, the illumination angles are selected according to the spectral energy distribution in Fourier space, which reduces the number of acquisitions and the sampling time. However, this approach is only effective for samples with highly structured Fourier spectra.

    In multiplexing, multiple LEDs with different angles or colors illuminate the sample simultaneously. In 2014, Tian et al.[66] proposed a multiplexed illumination strategy in which several LEDs are randomly turned on for each captured image. Each LED corresponds to a different region of Fourier space, as shown in Fig. 11(b); therefore, the total number of images can be reduced significantly without affecting the image quality. In the same year, Dong et al.[67] proposed state-multiplexed Fourier ptychographic imaging to address the partial coherence of the light source and realize color imaging. In 2018, Sun et al.[68] proposed a single-shot quantitative phase imaging (QPI) method based on color-multiplexed FPM, in which the sample is simultaneously illuminated by the R, G, and B channels of a programmable LED array; this overcomes pixel aliasing, improves the accuracy of phase recovery, and increases the efficiency of image acquisition. The main ways of shortening the sampling time by reducing the exposure time involve the use of spherical light sources[69,70] and lasers[71,72] for illumination. The experimental setup for FPM based on a laser illumination source is shown in Fig. 11(c).

    4.1.4. Time modulation

    Time modulation refers to modulating the light source over time. Such modulation can separate the target from the background signal in the time domain to improve the SNR of the image, or determine the distance between the target and the detector from the time difference to produce a 3D reconstruction of the target. Common computational imaging methods based on time modulation include time-of-flight (ToF) and coded exposure technologies.

    ToF can be broadly understood as a technique that infers properties of an object or medium by measuring the time required for a particle or wave to travel a given distance in a fixed medium. ToF is mainly realized via pulse modulation, and it has a broad range of applications. In the imaging field, ToF is mainly used for laser range-gated imaging and scattering imaging.
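
    The basic ToF relations used throughout this subsection are straightforward; a minimal sketch is given below (the optional refractive index for underwater or tissue propagation is an assumption added here for generality, and the numerical values are illustrative).

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def tof_to_range(delta_t_s: float, n_medium: float = 1.0) -> float:
    """Round-trip time of flight -> one-way range in metres."""
    return (C / n_medium) * delta_t_s / 2.0

def gate_window(range_min_m: float, range_max_m: float, n_medium: float = 1.0):
    """Opening/closing delays (seconds after the laser pulse) of a range gate that
    accepts only photons returning from the interval [range_min_m, range_max_m]."""
    v = C / n_medium
    return 2.0 * range_min_m / v, 2.0 * range_max_m / v

if __name__ == "__main__":
    print(tof_to_range(5.5e-6))       # ~824 m for a 5.5 us round trip
    print(gate_window(820.0, 840.0))  # gate delays around ~5.5 us
```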

    Laser range-gated imaging separates signals according to the difference between the target and background signals in the time domain to improve the SNR. It has developed rapidly since the 1990s owing to advances in laser, photodetector, and electronic signal processing technologies and is used for 3D imaging applications in the field of national defense. In 2004, Busck et al.[74] from the Danish Defense Research Department proposed a laser 3D imaging method based on range-gated time slicing. This method acquires multiple image frames by taking slices at time intervals within the gated distance range and then reconstructs one frame of the 3D image using the gray-level information from the multiple frames. In 2006, Andersson et al.[75] used this time-slice method to conduct laser 3D imaging experiments. Multi-frame slices were obtained using a 532 nm pulsed light source and an ICCD camera, which resulted in measurements with an accuracy of 6 m for a target at a distance of 830 m, as shown in Fig. 12(a).

    Figure 12.3D imaging and scattering imaging based on ToF. (a) Experimental results of range-gated laser imaging based on the time slice[75]. Terrain vehicle imaged from ranges of 1.9 km (left) and 7.2 km (right). (b) The imaging results of range-gated laser 3D imaging based on intensity correlation at different distances[76]. (c) 3D structure of the towers derived from the polarization-modulated 3D imaging lidar[78]. (d) Principle and results of imaging through realistic fog with a SPAD camera[82].

    In 2007, Laurenzis et al.[76] proposed a super-resolution intensity-correlation laser 3D imaging method. This method illuminates a scene hundreds of meters away using microsecond laser pulses and sensor gate widths, and the scene is recorded as a single image. The trapezoidal range–intensity profile is analyzed to determine the reflectivity and depth of the scene, as shown in Fig. 12(b). This method requires only two intensity maps to generate a range map, has a high imaging speed, and involves a small amount of data. However, the laser pulse width must be twice the gate width, and the light source must be linear, which is difficult to achieve in practice. Therefore, in 2008, Zhang et al.[77] proposed a linear–linear gain pulse modulation model based on the linear–constant gain pulse modulation model, which ensures the linearity of the light source and effectively improves the anti-interference ability and ranging resolution of the system. In 2018, Chen et al.[78] proposed a polarization-modulated 3D imaging method to achieve remote 3D imaging with high resolution and low-light sensitivity, simplify the data acquisition process, and reduce the acquisition time; the results are shown in Fig. 12(c). This addresses the instability of intensity-based indirect ToF measurement when the echo signal is very weak. In 2021, Liu et al.[79] proposed a polarization-modulated photon-counting 3D imaging method based on a negative parabolic pulse model (NPPM). The number of photons received after each laser pulse is used to measure the weak signal, exploiting the relationship between the ToF of the photons and the polarization-modulated state controlled by the phase shift. The photon rates are estimated from the received photon counts via the Poisson negative log-likelihood function and used to determine the distance. Even when the average number of echo photons per laser pulse is less than 0.05, this method achieves millimeter accuracy and good 3D imaging performance.
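
    As a hedged illustration of the photon-counting statistics mentioned above (a generic estimator, not the specific NPPM formulation of Ref. [79]), the maximum-likelihood photon rate per pulse for a detector that registers at most one count per pulse can be computed as follows.

```python
import numpy as np

def photon_rate_mle(n_detections: int, n_pulses: int) -> float:
    """Maximum-likelihood photon rate per pulse for a single-photon detector that
    reports at most one detection per pulse. Under Poisson statistics the per-pulse
    detection probability is p = 1 - exp(-lam), so lam = -ln(1 - p_hat)."""
    p_hat = n_detections / n_pulses
    p_hat = min(p_hat, 1.0 - 1e-12)  # guard against log(0)
    return -np.log(1.0 - p_hat)

if __name__ == "__main__":
    # 40 detections over 1000 pulses -> ~0.041 photons per pulse, slightly above
    # the naive ratio 0.04 because multiple photons in one pulse yield one count.
    print(photon_rate_mle(40, 1000))
```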

    Imaging through scattering media based on ToF is essentially a correction problem involving multipath interference. In a scattering environment, the detector receives an aliased mixture of the target signal and the scattered signal, which biases the results. By analyzing time-domain information and distinguishing photon arrival times, ToF technology can enhance the target signal and suppress the scattered signal to realize imaging through scattering media. In 1991, Wang et al.[80] used a pulse-modulated light source to separate ballistic light and scattered light in the time domain for the first time and thereby realized submillimeter imaging through a variety of scattering media. The proportion of ballistic photons can be increased by adjusting the gate, which improves the contrast. This work verified the validity of the time-domain separation method and has been an important reference for subsequent work.

    In 2007, Laurenzis et al.[76] used microsecond laser pulses and sensor gate widths under foggy conditions to suppress the scattered signal and achieve high-resolution 3D scattering imaging. However, the imaging contrast was low owing to the small number of ballistic photons, and this method cannot be applied to complex scattering scenes. In 2012, Laurenzis et al.[81] used a 3D laser range-gated imaging system to model the scattering imaging based on ToF, and they analyzed the effect of the scattering environment on the depth resolution of the imaging system. The work provided a theoretical basis for subsequent work and helped suppress the influence of multipath interference from the scattering mechanism. To solve the problem of low imaging contrast caused by traditional gating, in 2018, Satat et al.[82] used a single-photon avalanche diode (SPAD) camera to record all the information about photons, including the time domain information, pixel by pixel. The distribution of the scattered and non-scattered light in the time profile was modeled to distinguish between the background and signal photons to achieve high-contrast and high-resolution imaging through scattering media, as shown in Fig. 12(d).

    In 2020, Yin et al.[83] studied ToF imaging for underwater scenes, used time-gating to suppress backscattering, and adopted the Bayesian probability model to distinguish reflected pulses from the return signals affected by forward scattering. Using information about neighboring pixels to reconfigure the depth information, they were able to recover depth information from objects 7–10 m away from the camera in gulf, coastal, and deep-sea underwater environments. In 2021, Kijima et al.[84] further refined the fog imaging model by adding time gating. In the refined model, the gate is opened immediately after the light source transmits the signal so that it only accepts the scattered component of the signal. Therefore, the scattering properties of the fog can be determined, and the intensity and depth image can be determined for a target in a foggy scene with a depth of 10 m.

    Motion blur is a basic problem in the imaging field. Essentially, relative motion during the exposure acts as a band-limiting (low-pass) filter, degrading image quality during acquisition. Coded exposure technology expands this limited band by controlling the exposure so that more high-frequency information is preserved and a sharp image can be recovered. The most common coded exposure mode is temporal coding, in which the single continuous exposure is replaced, according to a specific code, by a sequence of shorter exposures; hence, more high-frequency information about the original target image is retained. Time-coded exposure was proposed by Agrawal and Raskar[85] in 2006. This approach rapidly opens and closes the shutter during the exposure according to a prefabricated binary sequence, which broadens the preserved frequency band and improves PSF deconvolution performance, so that a sharp image can be restored from a blurred one. Selecting an optimal code word is key to successful time-coded exposure. In 2012, McCloskey et al.[86] proposed an optimal code word selection criterion that depends on the moving speed of the object, which further improved coded exposure performance, as shown in Fig. 13(a). However, the effective PSF is determined by the object speed, and it becomes non-invertible when the object moves too fast.
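
    The benefit of a fluttered shutter can be seen directly in the modulation transfer function (MTF) of the exposure code. The sketch below compares a conventional box exposure with an illustrative random binary code; this is not the optimized code of Refs. [85] or [86], which would raise the minimum MTF further.

```python
import numpy as np

def mtf(kernel: np.ndarray, n: int = 256) -> np.ndarray:
    """Magnitude of the transfer function of a normalized 1D blur kernel."""
    k = kernel / kernel.sum()
    return np.abs(np.fft.rfft(k, n))

if __name__ == "__main__":
    length = 32
    box = np.ones(length)                   # conventional continuous exposure
    rng = np.random.default_rng(1)
    code = rng.integers(0, 2, size=length)  # illustrative binary flutter code
    code[0] = code[-1] = 1                  # keep the ends of the exposure open
    # A box exposure has near-zeros in its MTF (frequencies irrecoverably lost,
    # deconvolution ill-posed); a binary code keeps the minimum MTF above zero.
    print(f"min |MTF| box:   {mtf(box).min():.4f}")
    print(f"min |MTF| coded: {mtf(code).min():.4f}")
```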

    Figure 13.Experimental results of different methods of deblurring. (a) Coded exposure that depends on the speed of an object’s motion[86]. Column 1: input images. Column 2: matching metric versus velocity. Column 3: deblurred results using the estimated velocity. (b) Comparison of the deblurring performance with different sequence lengths under the same exposure[88]. (b1) Sequence length = 40, 1 chop duration = 3 ms. (b2) Sequence length = 120, 1 chop duration = 1 ms.

    In 2015, He et al.[87] introduced coded exposure imaging into the restoration of remote sensing images degraded by platform flutter; using the optimal coding sequence selection criterion, the remote sensing images were restored quickly. In 2017, Jeon et al.[88] explored the effects of binary coding and proposed an optimal time-coding design for different application conditions. By studying motion deblurring based on coded exposure, they found that the binary code should behave like a random sequence. A comparison of the deblurring performance with different sequence lengths under the same exposure is shown in Fig. 13(b). Jeon et al. also proposed two algorithms for generating short and long binary sequences, with the code length selected according to the external conditions to improve the recovery. In the same year, Jeon et al.[89] also used the low cross-correlation between multiple sets of codes to form complementary code sets, which safeguarded the information acquisition process.

    4.1.5. Coherent imaging

    Coherent imaging refers to imaging under coherent light. In contrast to incoherent imaging, coherent imaging can be used to determine the phase information of a target; therefore, it has been used in the fields of microscopic imaging, 3D imaging, and optical defect detection. Common coherent imaging methods include optical coherence tomography (OCT) and holography.

    OCT is a 3D tomography technology based on the principle of low-coherence interference. A beam splitter divides the low-coherence source into reference and sample beams; multiply scattered light from the sample loses its coherence with the reference light, so only light backscattered from within the coherence gate contributes to the interference pattern. The small coherence length of the source ensures precise separation between the target light and the scattered light. In 1991, Huang et al.[90] used optical time-domain interferometry to achieve 2D cross-sectional imaging of the retina and coronary artery in vitro for the first time. Time-domain OCT (TDOCT), shown in Fig. 14(a), is considered the first generation of OCT technology. However, the imaging speed of TDOCT is limited by the scanning speed of the reference arm, which means that it cannot be used for real-time imaging. In 1995, Fercher et al.[91] proposed the concept of frequency-domain OCT (FDOCT), which uses spectral-domain OCT (SDOCT) measurements instead of single-point measurements in the time domain. Hence, FDOCT can obtain all the depth information from a single measurement by Fourier transforming the spectrum of the interference signal. In contrast to TDOCT, SDOCT does not require a scanning reference arm, which greatly improves the measurement speed and makes real-time OCT imaging possible. In 1997, building on FDOCT, Chinn et al.[92] proposed mapping the spectral information to time using a widely tunable swept laser as the source, further increasing the speed of OCT by exploiting the fast response of single-point detection in the time domain. This type of OCT is called swept-source OCT (SSOCT).
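
    The FDOCT principle described above reduces, in its simplest form, to a Fourier transform of the spectral interferogram. The following sketch assumes uniform sampling in wavenumber and omits k-linearization and dispersion compensation; all numerical values are illustrative.

```python
import numpy as np

def ascan_from_spectrum(interferogram: np.ndarray) -> np.ndarray:
    """Recover a depth profile (A-scan) from a spectral-domain OCT interferogram
    sampled uniformly in wavenumber k. Only DC removal and a Hanning window are
    applied; dispersion compensation and k-linearization are omitted for clarity."""
    sig = interferogram - interferogram.mean()
    sig = sig * np.hanning(sig.size)
    return np.abs(np.fft.rfft(sig))

if __name__ == "__main__":
    n = 2048
    k0, dk = 7.0e6, 1.0e6                     # start and span of the k sweep, rad/m
    k = k0 + dk * np.arange(n) / n            # uniform sampling in wavenumber
    z = 200e-6                                # single reflector depth (illustrative)
    spectrum = 1.0 + 0.8 * np.cos(2 * k * z)  # DC term + interference term
    ascan = ascan_from_spectrum(spectrum)
    peak_bin = int(np.argmax(ascan))
    # For this sampling the peak bin maps back to depth via z ~= pi * bin / dk.
    print(f"peak at bin {peak_bin}, depth ~= {np.pi * peak_bin / dk * 1e6:.1f} um")
```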

    Figure 14.TDOCT structures and metasurface-based bijective illumination collection imaging (BICI). (a) Simplified block diagram of the TDOCT method[103]. (b) Incorporation of BICI through one arm of an interferometer (orange lines represent a single-mode fiber)[102]. (c) Tissue imaging comparison of BICI and a conventional approach[102]. Imaging swine tracheobronchial tissue specimens using a plano-convex lens with common illumination and collection paths (c1, c2, c5, and c6) and BICI (c3, c4, c7, and c8). (c9) Corresponding histology image of the tissue imaged using the conventional approach.

    Figure 15.(a) Several examples of scattering imaging using ballistic light[117,125,141]. (b) Several examples of computational light field restoration techniques based on scattered light[146,152,161]. (c) Non-line-of-sight imaging (NLOS)[169].

    The lateral and axial resolutions of OCT systems are independent. The lateral resolution is determined by the focusing conditions of the beam used to illuminate the sample, whereas the axial resolution is determined by the central wavelength of the light source and the full width at half-maximum (FWHM) of its spectrum. Therefore, OCT can achieve imaging with both high lateral and high axial resolution.
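
    For reference, the expressions commonly quoted for a Gaussian source spectrum and a diffraction-limited objective are given below; δz is the axial resolution, δx the lateral resolution, λ0 the center wavelength, Δλ the spectral FWHM, and NA the numerical aperture. These are standard textbook approximations rather than results of the cited works.

```latex
\[
  \delta z = \frac{2\ln 2}{\pi}\,\frac{\lambda_0^{2}}{\Delta\lambda},
  \qquad
  \delta x \approx \frac{0.61\,\lambda_0}{\mathrm{NA}}.
\]
```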

    Considering the axial resolution, as high-power light sources such as mode-locked lasers have been developed, many methods of using them to generate supercontinuum spectra to improve the axial resolution have been proposed. In 2002, Povazay et al.[93] generated a supercontinuum light source using a femtosecond titanium–sapphire laser and realized an OCT imaging system with an axial resolution of 0.75 µm, indicating that the measurement accuracy of OCT has reached the subcellular level. In addition to ultra-broadband light sources, superluminescent diode (SLD) sources with different center wavelengths and bandwidths can be combined to obtain a wider spectral range. This idea was adopted by Liu et al. in 2011[94], who proposed µOCT, and by Gui et al. in 2014[95], who used two infrared diode light sources to obtain broad-spectrum infrared illumination. The axial resolution of an OCT system scales with the square of the central wavelength divided by the spectral bandwidth, so shorter wavelengths permit finer axial resolution. Therefore, in 2017, Fuchs et al.[96] used an extreme ultraviolet light source to achieve nanoscale axial resolution. However, owing to the substantial absorption of ultraviolet light in biological tissues, this approach is not suitable for biological observations, and ultraviolet OCT is mainly used to measure semiconductor materials. In 2019, Israelsen et al.[97] used a high-brightness mid-infrared supercontinuum light source to achieve micron-scale axial resolution OCT in the mid-infrared band, which can be used for real-time nondestructive testing. In 2020, Jerwick et al.[98] achieved high-resolution ophthalmic imaging using a 1060 nm light source.

    Considering the lateral resolution, in 2008, Ralston et al.[99] proposed that a synthetic aperture could be used to prevent lateral resolution deterioration with increasing measurement depth, and this approach enabled large-depth imaging with high resolution. In 2011, Blatter et al.[100] used a Bessel beam as the light source, which alleviated the constraint between the imaging depth and lateral resolution and improved the lateral resolution. In 2019, Zhou et al.[101] proposed optical coherence refraction tomography (OCRT), which extends superior axial resolution to the lateral dimension and reconstructs undistorted cross-sectional images from multiple conventional images with angular diversity. Moreover, this approach corrects the distortion caused by refraction and improves the lateral resolution by a factor greater than 3. In 2022, Pahlevaninezhad et al.[102] proposed a metasurface-based dual-beam illumination imaging technique that is capable of imaging over a relatively large depth range without affecting the lateral resolution. The experimental principles and results are shown in Figs. 14(b) and 14(c), respectively.

    Holography reconstructs the complex-valued field of a target via interference recording and diffractive reproduction. Since Gabor[104] first proposed holography in 1948, it has mostly been based on highly coherent laser sources. However, when a laser is used as the illumination source, coherent noise such as speckle and parasitic interference fringes appears in the hologram and degrades image quality. Therefore, holography based on partially coherent illumination has begun to attract attention. In 2008, Kemper et al.[105] proposed a temporal phase-shifting digital holographic microscopy technique using LED illumination. The phase noise of LED and laser sources was tested under the same conditions, which proved that an LED source can effectively reduce phase noise and improve phase measurement accuracy. In 2011, Choi et al.[106] showed that a dynamic speckle field could be obtained by rotating a ground-glass diffuser to destroy the coherence of a highly coherent laser. They then implemented a digital holographic technique based on dynamic speckle illumination, which greatly improved the lateral and axial spatial resolution owing to the wide angular spectrum of the scattered waves. In 2018, Cho et al.[107] proposed a low-coherence dual-wavelength off-axis digital holographic interferometry system based on an LED source. Their method used two diffraction gratings to adjust the central wavelength and bandwidth of the outgoing LED beam and used filtering to extend the coherence of the broadband source to improve the SNR of the results. In 2020, Mann et al.[108] proposed a coaxial white-light phase-shifting interferometer that separates the R, G, and B components from a single white-light interference pattern, thereby providing simultaneous multispectral information about a sample.

    In summary, illumination is the starting point of the imaging chain. When illumination is introduced into an imaging system as coded information, the high-dimensional physical information of the light field can be fully utilized. This approach can increase the resolution of the imaging system, extend its operating distance, and improve its ability to adapt to the environment.

    4.2. Computational medium

    The computational medium is one of the most important parts of a computational optical imaging system. When light passes through a scattering medium, the target information is hidden in the light field, and the difficulty of interpreting that information is directly determined by the scattering medium. When light is scattered, the photons received by the detector are generally divided into two categories: ballistic and scattered photons. When the light is scattered by an evenly distributed static medium and the degree of scattering is small, the unscattered ballistic light dominates, the target information in the scattered light field is more prominent, and the light field is easy to interpret. However, when the scattering medium is thick, the light is scattered multiple times, the ballistic light almost disappears, and only severely scattered light remains in the final light field. In that case, the target information must be mined from the scattered light field before target interpretation can be conducted.

    When the degree of scattering is low, the ballistic and scattered light can be separated, and the ballistic light can be used directly to extract the target information. In general, methods such as range gating and dark channels can be used to extract the ballistic photons, remove the scattered photons, and obtain the target information. However, when the degree of scattering is severe, the target information must be extracted from the scattered light itself, which can be achieved using methods such as deconvolution, speckle autocorrelation, wavefront shaping, and transmission matrix calibration.

    Nonvisible imaging technology images targets that cannot be observed directly, which is achieved by extracting target information from scattered light. Nonvisible imaging can be divided into active and passive technologies according to whether a light source must be provided. Physical dimensions of the scattered light field, such as intensity, polarization, and phase, are analyzed and extracted to solve the nonvisible imaging problem (Fig. 15).

    4.2.1. Scattering imaging technique based on ballistic light

    There are currently three main methods for extracting effective information from scattered light: range gating, which recovers the effective information in the light field from a small number of photons; dark-channel methods, which remove the scattering contributions of water vapor, haze, and other suspended particles; and polarization-based methods, which extract and analyze the target signal from the polarization information of the light field.

    “Seeing farther and clearer” is the primary demand for active imaging in remote sensing and target recognition. Active optical imaging systems use their own light source to recover information about the scene. To suppress the photon noise inherent to optical detection, many photons usually need to be detected. However, in remote sensing of dynamic scenes and microscopic imaging of biological samples, many photons cannot be collected owing to limitations on the luminous flux and integration time. Therefore, a key challenge in such scenarios is to accurately recover scene information from a small number of photons. Furthermore, for any fixed total acquisition time, serial acquisition through raster scanning reduces the number of photons detected per pixel. The low number of returned photons, strong background noise, and limited range of action are the main problems that must be addressed to realize remote active sensing.

    Previous studies have shown that single-photon detection offers high sensitivity and high time resolution. Light in flight was first captured by Abramson in 1978[113], who used holographic techniques to record the wavefront of a pulse scattered by a white screen placed in its path. This high-speed recording technique can be used for dynamic observation of phenomena such as reflection, interference, and focusing. In 2007, Kubota et al.[114] performed light-in-flight recording in a scattering medium, using a streak camera with picosecond resolution to capture light propagating in the medium and thereby eliminating the need for interferometry and coherent illumination. However, this approach required additional hardware to raster scan 2D scenes, which increased the acquisition time to several hours. In 2015, Gariepy et al.[115] simplified data acquisition and reduced the acquisition time by achieving full imaging capability and low-light sensitivity with picosecond temporal resolution, thereby providing an imaging solution for acquiring spatial and temporal information simultaneously. Their approach used a 2D complementary metal–oxide–semiconductor (CMOS) array of SPAD detectors, with each pixel operating in time-correlated single-photon counting (TCSPC) mode.

    These technological breakthroughs were based on 2D planes; however, single-photon detection can also be used to obtain 3D shapes remotely by accurately measuring the ToF, with applications in fields such as Earth science, construction, and defense. Therefore, considerable effort has been dedicated to developing single-photon light detection and ranging (LiDAR) for long-range 3D imaging. However, in practical applications, the working range of single-photon LiDAR systems is still limited by high background noise and weak echo signals, and it typically cannot exceed several tens of kilometers through the Earth's atmosphere. To improve the photon recovery rate and noise tolerance, in 2014, Kirmani et al.[116] developed first-photon imaging, a low-flux imaging technique. The simultaneous acquisition of accurate range and reflectance information in the presence of high background noise is of great value for microscopy and remote sensing.

    Xu et al.[117] proposed and demonstrated a photon-efficient LiDAR method for long-range super-resolution single-photon imaging at 8.2 km. In 2019, this was the longest-distance single-photon imaging in the world, and the results were selected as one of the top 10 advances in Chinese optics. This validation of a photon-efficient, noise-tolerant approach demonstrates the feasibility of fast, long-range, low-power LiDAR imaging. When the imaging device is shifted precisely at the subpixel scale to capture a series of low-resolution images (fine subpixel scanning), the subpixel displacements between the images suppress frequency aliasing, which can be exploited to improve the imaging resolution of the system. Xu et al.[118] developed a 3D deconvolution method that retrieves subpixel-resolution information from the fine-scanning results to achieve image reconstruction; the experimental setup is shown in Fig. 16(e). They applied this approach to single-photon LiDAR for long-range 3D imaging and extended the distance limit to improve the imaging range. In 2020, Li et al.[119] realized active single-photon 3D imaging at a distance of 45 km in an urban environment. To meet the challenge of ultra-long-distance imaging, they developed and optimized an efficient, low-noise coaxial scanning system, which effectively collected the small number of echo photons and suppressed background noise, as shown in Fig. 17(a). The depths reconstructed by various imaging algorithms are shown in Fig. 17(b). They demonstrated that the algorithm successfully recovered the fine features of the building; therefore, scenes with multi-layer distributions can be accurately identified.

    Figure 16.(a)–(d) Imaging results of single-photon LiDAR at 8.2 km[117]. (e) Schematic diagram of the experimental setup[118]. (f) Ideal analogue resolution charts[117]. (g) Simulation under low-light and low-brightness conditions[117].

    Figure 17.(a) Aerial view of the remote active imaging experiment. (b) Results obtained based on different imaging algorithms. (c) Long-range 3D imaging over 45 km[119].

    These results clearly show that the algorithm proposed by Xu's team is the most effective for spatial and depth reconstruction of remote targets. By applying the micro-scanning method, the lateral resolution was shown to be approximately 0.6 m at a range of 45 km, as shown in Fig. 17(c). This method generates 3D images at the level of a single photon per pixel, which allows objects to be identified and recognized at very low light levels. The proposed high-efficiency coaxial single-photon LiDAR system, noise suppression methods, and advanced computational algorithms provide new opportunities for low-power LiDAR remote imaging. In 2021, Xu's team continued to extend the imaging range: Li et al.[120] proposed a compact coaxial single-photon LiDAR system for 3D imaging at distances of up to 201.5 km. With efficient optics for photon collection and detection and new noise suppression techniques, such systems open further possibilities for remote active imaging applications.

    Single-photon detectors have great potential in remote observation and target recognition, especially for single-photon LiDAR. By continuously improving detection efficiency, reducing noise levels, optimizing reconstruction algorithms, realizing integrated and compact detection systems, and advancing interdisciplinary research, single-photon detection will achieve longer-range and higher-precision 3D imaging, providing strong technical support for a variety of practical applications.

    Factors such as climate change and air pollution make hazy images difficult to avoid. In various indoor and outdoor scenes with fog and haze, varying concentrations of suspended particulates, such as water droplets and dust, significantly reduce image quality, as shown in Fig. 18(a). As light propagates, it is scattered and absorbed by particles in the air, which weakens the light reflected from the target object. Simultaneously, atmospheric light scattered by the air particles becomes part of the image. These factors cause color attenuation, contrast reduction, increased blur, and detail loss, which degrade the visual effect to different degrees and hamper subsequent image processing. Images captured in hazy weather can be described by atmospheric degradation models.
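
    In its most widely used form, the atmospheric degradation model underlying the dehazing methods discussed below can be written as follows, where I(x) is the observed hazy image, J(x) the scene radiance, A the global atmospheric light, t(x) the medium transmission, β the scattering coefficient, and d(x) the scene depth.

```latex
\[
  I(x) = J(x)\,t(x) + A\,\bigl[1 - t(x)\bigr], \qquad t(x) = e^{-\beta d(x)}.
\]
```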

    Figure 18.(a) Haze imaging model[120]. (b) Flow chart of the DCP dehazing algorithm[121]. (c) Comparison of the effects of other dehazing algorithms and the Lu Z dark channel dehazing algorithm[125].

    The atmospheric light intensity and its value at infinity are relatively easy to estimate, which makes it easier to recover the reflected light from the target unaffected by haze scattering. The atmospheric light and the transmittance must be estimated accurately to optimize the defogging result. The dark channel prior (DCP) dehazing algorithm proposed by He et al. in 2009[121] is based on the atmospheric degradation model and is one of the most widely used image restoration methods. The prior comes from statistics of haze-free images: in most non-sky local patches, some pixels always have at least one color channel with a very low intensity; this minimum is called the dark channel. The steps of the DCP algorithm are shown in Fig. 18(b). Estimates of the atmospheric light and the transmission map are combined with the restoration model to defog and sharpen the image. DCP is a classic image processing algorithm and has been widely studied owing to its excellent image quality and its easily understood and implemented procedure. It represents a breakthrough in the field of image defogging and provides new avenues for improving image quality.
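
    A compact sketch of the core DCP steps is given below; the transmission-map refinement (e.g., soft matting or guided filtering) used in practice is omitted, and the patch size, ω, and t0 values are commonly used defaults rather than prescriptions from Ref. [121].

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(img, patch=15):
    """Per-pixel minimum over color channels and a local patch (the dark channel)."""
    return minimum_filter(img.min(axis=2), size=patch)

def dehaze_dcp(img, patch=15, omega=0.95, t0=0.1):
    """Minimal dark-channel-prior dehazing for a float RGB image in [0, 1].
    Estimates atmospheric light A from the brightest dark-channel pixels, estimates
    the transmission t, and inverts the model I = J*t + A*(1 - t)."""
    dc = dark_channel(img, patch)
    # Atmospheric light: mean color of the brightest 0.1% dark-channel pixels
    n_top = max(1, dc.size // 1000)
    idx = np.unravel_index(np.argsort(dc, axis=None)[-n_top:], dc.shape)
    A = img[idx].mean(axis=0)
    # Transmission estimate from the dark channel of the A-normalized image
    t = 1.0 - omega * dark_channel(img / A, patch)
    t = np.clip(t, t0, 1.0)[..., None]
    J = (img - A) / t + A
    return np.clip(J, 0.0, 1.0), t.squeeze()
```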

    Since its introduction in 2009, substantial research has been done to improve the performance of the DCP algorithm. Bi et al.[122] combined observations of outdoor images with and without fog with multi-scale guided filtering, effectively reducing problems such as inaccurate color recovery, halo artifacts, and block effects caused by the DCP algorithm; their method avoided artifacts caused by sudden changes in scene depth and produced high-quality fog-free images, although the recovered images still showed color distortion in sky regions. Zhu et al.[123] proposed an artifact-removal method based on the DCP algorithm that uses an energy minimization function to estimate the transmission, smoothing term, and boundary term. Kim et al.[124] used a simple stretching method to estimate the relationship between the transmission and the brightness and saturation of a scene, which effectively improved image detail and clarity; however, the overall brightness still had to be enhanced with additional algorithms. Based on the observation that dark channels are closely related to saturation and brightness, Lu et al.[125] proposed a novel algorithm that does not require explicit calculation of the DCP and avoids undersaturation when restoring hazy images, as shown in Fig. 18(c).

    Methods based on prior knowledge have achieved good results in image defogging; however, their universality still requires improvement, and this is an important topic in current research. Further exploration is needed to make prior-knowledge-based methods more widely applicable and able to cope with different scenes and types of fog. In general, the dark channel method based on prior knowledge is a promising research direction, but its universality and robustness require further study.

    When light waves propagate through weather conditions such as fog and haze, through water, or in biological tissue, a large number of tiny particles in the translucent medium absorb and scatter the light, changing the propagation direction of part of the light, attenuating its energy, and causing a loss of light field information and a decline in imaging quality. Therefore, effectively removing the negative effects of scattering during imaging, recovering the lost light field, and improving the imaging quality have become hot issues. Currently, there are two main approaches to mitigating the scattering of light by turbid media: image processing (not detailed here) and physical optical imaging technology, including polarization techniques. Polarization descattering technology is primarily divided into difference-based, atmospheric-model-based, Stokes-based, and Mueller-matrix-based polarization imaging.

    Polarization differential imaging technology highlights parts of the image information using the difference between two images with orthogonal polarizations. The basic principle is to enhance image details by exploiting the difference between two orthogonally polarized images, improving image contrast and reducing the impact of reflection, scattering, and absorption on the image.

    It can be assumed that the background scattered light and the target light are completely polarized and that their polarization directions differ according to the scattering medium. As shown in Fig. 19(a), a polarizer is rotated so that its transmission axis makes a 45° angle with the polarization direction of the background scattered light, and two images with orthogonal polarizations are acquired. Taking the difference between the two images removes the contribution of the background scattered light and isolates the target light.
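
    A minimal sketch of this differencing operation is given below; the optional sum-normalization is an added assumption, not part of the basic scheme described above.

```python
import numpy as np

def polarization_difference(i_parallel: np.ndarray, i_perpendicular: np.ndarray,
                            normalize: bool = False) -> np.ndarray:
    """Polarization-difference image from two orthogonally polarized acquisitions.
    With the analyzer axes oriented at +/-45 deg to the background polarization,
    the polarized background contributes equally to both images and cancels in the
    difference, while the differently polarized target light remains. Optionally
    normalize by the sum image to reduce dependence on the illumination level."""
    pd = i_parallel.astype(float) - i_perpendicular.astype(float)
    if normalize:
        pd = pd / np.maximum(i_parallel.astype(float) + i_perpendicular.astype(float), 1e-9)
    return pd
```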

    Figure 19.(a) Principle of polarization difference[127]. (b) Differences between conventional imaging and polarization differential imaging[128]. (c) A is the imaging effect of the traditional Tyo model, and B is the imaging effect of active linearly polarized illumination[129]. In (d)[130], (d1) is a polarization image, (d2) is a polarization angle image, (d3) is the imaging effect of traditional polarization differential imaging, and (d4) is the imaging effect of adaptive polarization differential imaging.

    The concept of polarization differential imaging was proposed in 1995 by Cameron et al. at the University of Pennsylvania, inspired by the polarized vision of animals[126]. Through a series of experiments, they found that polarization differential imaging can substantially magnify details that are barely visible in conventional images. In 1996, Tyo's team, also at the University of Pennsylvania, began research on polarization differential imaging of macroscopic scenes[127]. They found that, compared with traditional imaging in a scattering environment, polarization differential imaging better reveals image details, enhances image contrast, and achieves higher image quality[128], as shown in Fig. 19(b).

    Because the reflected light of the target and the backscattered light from the scattering medium both degrade imaging under passive illumination, Kirmani et al. applied active linearly polarized illumination on top of the traditional model to eliminate this effect. Experiments proved that active illumination improves the imaging and significantly reduces the background noise of the image[129], as shown in Fig. 19(c). In 2006, Yemely et al. at the University of Pennsylvania proposed an adaptive polarization differential imaging method based on principal component analysis of scene polarization statistics[130]. Yusheng and Pucheng of the Anhui Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, proposed an adaptive polarization difference method based on minimum mutual information, combining information theory to address the problems of traditional polarization difference methods. The Stokes vector is used to obtain the polarization information of the target, polarization images in each direction are calculated, and, by comparing the mutual information between the images, the two least-correlated images are selected for differential imaging. The results show that this method greatly improves image quality, as shown in Fig. 19(d).

    Polarization differential imaging technology can improve the clarity of target information in turbid water and has transformative applications in many fields, such as underwater rescue, resource exploration, and biomedicine. Current imaging technology research focuses on the separation of the target and background scattered light using the difference in polarization and has performed well in practical applications.

    However, comparatively little attention has been paid to the exact distributions of the polarization characteristics of the target and background, which may affect the accuracy and efficacy of the imaging. In traditional imaging processes, the same polarization direction is usually used for the whole image, which limits the flexibility of imaging in practice. Moreover, the problem of extracting the optimal imaging angle when the background scattered light has different polarization directions at different positions has not been sufficiently solved, which may lead to unsatisfactory imaging results. In addition, polarization difference images typically have low brightness, which limits their application in low-light environments. Most existing techniques address static targets only; imaging of moving targets requires further research. These limitations constrain the application of polarization differential imaging in some fields, and addressing them will be the primary focus of future development.

    The first atmospheric model was proposed in 1975 by McCartney[131]. Narasimhan and Nayar derived a monochromatic atmospheric scattering model based on McCartney’s attenuation and ambient light models[132,133].

    According to their model, the light intensity received by a camera sensor mainly comprises two parts: the directly transmitted light emitted by the target and the scattered atmospheric light, as shown in Fig. 20(a). Because the scattering particles in the atmosphere also scatter the radiation from the Sun, the component of scattered light reflected by these particles into the camera is called atmospheric light, and its intensity will increase with increasing propagation distance. This is the primary cause of image degradation. Inspired by the atmospheric scattering model, in 2001, Schechner et al. began a series of studies on polarization descattering in atmospheric scattering media at the Israel Institute of Technology. They established a set of classical polarization descattering models, as shown in Fig. 20(b). The models obtain two orthogonal polarization images by rotating the polarizer in front of the lens and produce a clear image according to the difference[134,135].

    Figure 20.(a) Schematic diagram of the atmospheric scattering model[132,133]. In (b), A and B are the best and worst polarization images, respectively, and C is the effect of dehazing using the Y. Y. Schechner method[134,135]. In (c), A is the original intensity image under dense haze conditions, and B is the rendering after multi-scale polarization imaging trans-haze algorithm[136]. (d) Comparison before underwater scattering imaging[137].

    In the Schechner model, the light scattered by atmospheric particles is usually partially polarized, whereas the light radiated by the target is completely unpolarized. Taking the polarization direction of the atmospheric light as a reference, images polarized parallel and perpendicular to this direction are collected, and the airlight at infinity A∞ and the degree of polarization p of the airlight are estimated to achieve scene defogging. The model weakens the attenuation and absorption of scattered particles during light transmission, simplifies the calculation, and achieves a good descattering effect. Their method requires a pair of orthogonal polarization images, namely the best polarization image I_best and the worst polarization image I_worst, which are difficult to obtain in practice.
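
    A schematic implementation of this recovery is given below; the degree of polarization of the airlight p_air and the airlight at infinity a_inf are assumed to be calibrated from a sky region, as in the original method, and the variable names are illustrative.

```python
import numpy as np

def polarization_dehaze(i_best, i_worst, p_air, a_inf, t_min=0.1):
    """Schechner-style polarization dehazing (schematic).
    i_best, i_worst: co-registered images through the polarizer orientations that
                     minimize / maximize the airlight (float arrays).
    p_air:           degree of polarization of the airlight (e.g., from a sky patch).
    a_inf:           airlight value at infinite distance (e.g., from the horizon).
    Returns the estimated haze-free scene radiance."""
    i_total = i_best + i_worst
    airlight = (i_worst - i_best) / max(p_air, 1e-6)  # estimated airlight map
    t = np.clip(1.0 - airlight / a_inf, t_min, 1.0)   # transmission estimate
    return (i_total - airlight) / t
```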

    Based on this model, Fei of Xidian University conducted a multi-scale analysis of the images using frequency information; different polarization states corresponding to different weather conditions were classified and solved to obtain clear target images, as shown in Fig. 20(c). In addition, the depth of the target was estimated from the interaction between the light wave and the scattering medium. In 2006, Schechner et al. applied their model to an underwater scattering environment and achieved good results[136]. However, the model does not consider the effect of wavelength on the scattered light. Therefore, Fei et al. plotted the scattering and absorption coefficients of pure seawater in the visible range, showing that scattering increases with decreasing wavelength. They thus proposed an active polarization imaging method based on wavelength selection and used red-light illumination to minimize scattering during light propagation. This method successfully transformed undetectable underwater targets in highly turbid water into detectable targets[137], as shown in Fig. 20(d). Different teams have since optimized these models for speed and for the treatment of scattered light, maximizing image quality.

    The imaging efficacy of traditional differential polarization imaging technology depends largely on the choice of the two orthogonal polarization images, and it is necessary to ensure that the scattered light has the same intensity in the two images. However, rotating the polarizer is a complicated operation, and it is difficult to obtain two appropriate polarization images. Heng et al. proposed a method to obtain polarization difference images using the Stokes vector[138], as shown in Fig. 21(a).


    Figure 21.(a) Principle of polarization imaging based on the Stokes vector[138]. In (b)[139], b1 is the imaging result based on Stokes vector interpolation, and b2 is the result of traditional differential imaging. In (c)[140], (c1) is the original polarization image, and (c2) is the result of the polarization dehazing method based on polarization angle distribution analysis. In (d)[141], 1 is the original intensity image, 2 is the reconstructed target image, and 3 is the estimate of the backscattered light.

    The polarization difference image can be obtained based on the azimuth of the scattered light and the Stokes vector of the scene. In 2018, Guan et al. analyzed the interaction between scattered light and polarizers using the Stokes–Mueller matrix form and proposed an interpolation method based on the Stokes vector to replace the rotating polarizer. Experiments have proved that, compared with direct imaging, Stokes vector-based imaging can effectively reduce the influence of scattered light and enhance image contrast[139], as shown in Fig. 21(b).
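    A minimal sketch of the idea of replacing physical polarizer rotation with Stokes-vector arithmetic is given below; the 0°/45°/90°/135° acquisition set and the global estimate of the background polarization angle are assumptions made for this illustration, not the specific interpolation scheme of Ref. [139].

```python
import numpy as np

def linear_stokes(I0, I45, I90, I135):
    """Linear Stokes parameters from four analyzer angles (0, 45, 90, 135 degrees)."""
    S0 = 0.5 * (I0 + I45 + I90 + I135)   # total intensity
    S1 = I0 - I90
    S2 = I45 - I135
    return S0, S1, S2

def analyzer_image(S0, S1, S2, theta):
    """Intensity behind an ideal linear polarizer at angle theta (radians),
    synthesized from the Stokes vector instead of physically rotating a polarizer."""
    return 0.5 * (S0 + S1 * np.cos(2 * theta) + S2 * np.sin(2 * theta))

def best_worst_pair(S0, S1, S2):
    """Best/worst polarization images, using a global estimate of the
    background angle of polarization as the reference direction."""
    aop = 0.5 * np.arctan2(np.median(S2), np.median(S1))   # background AoP
    I_worst = analyzer_image(S0, S1, S2, aop)              # scattered light maximized
    I_best = analyzer_image(S0, S1, S2, aop + np.pi / 2)   # scattered light minimized
    return I_best, I_worst
```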

    In addition to improving the defects of traditional differential imaging, the Stokes vector is also used in other polarization imaging methods. In 2021, Guan introduced low-pass filtering into a polarization-angle-based polarization imaging model and overcame the disadvantage of noise sensitivity when estimating polarization angle values. Also, in 2021, Pingli et al. of Xidian University established an underwater polarization imaging model to describe light transmission through polarization information derived from the Stokes vector. In addition, based on the independent characteristics of the target and backscattered light, an optimization function was designed to estimate target and backscattering field information. Experiments and mean square error analysis verified that this method can accurately remove backscattered light and recover target information[140], as shown in Fig. 21(c). In the same year, Jin et al. solved the polarization estimation problem of target and backscattered light in underwater polarization imaging, using the Stokes vector method to calculate the polarization of target light at each pixel point in the scene[141]. This global pixel calculation method has a marked effect on the image detail recovery of underwater scenes and realizes clear underwater vision; the effect of backscattered light on underwater imaging is effectively suppressed, and the image contrast is significantly improved, as shown in Fig. 21(d).

    From the above analysis, the Stokes-based polarization defogging imaging method is superior to the polarization differential defogging imaging method in performance. This is because Stokes vectors contain richer and more useful polarization information, such as the degree of polarization (DoP) and angle of polarization (AoP). This information can more accurately reflect the characteristics and laws of the background scattered light caused by particles in the scattering medium.

    In the process of defogging imaging, it is important to accurately obtain the polarization information in the scattering medium to recover a clear image. By analyzing the DoP and AoP in the Stokes vector, the Stokes-based polarization defogging imaging method can suppress the background scattered light more effectively, thus improving the image quality. In contrast, the polarization differential defogging imaging method can only use the differential component of the polarization information, which may not be the complete polarization information in the scattering medium, resulting in a slightly inferior defogging effect. To control the polarization state of the light source more accurately, researchers have also used a Mueller matrix to characterize the polarization information of the light field.

    To suppress scattering noise effectively, Morgan proposed the rotationally orthogonal polarization imaging method[142]; however, it has drawbacks, such as the dependence of detection on the chosen polarization axis and the low efficiency of rotating the polarization axis. In Mueller-matrix-based methods, polarization information derived from the Mueller matrix replaces the polarization filtering performed by a rotating polarizer, which allows the illumination polarization angle to be controlled more accurately and speeds up the imaging process.

    Considering the influence of the polarization state of active illumination on the target, in 2021, Wang et al. proposed a computational polarization-difference descattering method based on a 3×3 Mueller matrix and active illumination modulation, as shown in Fig. 22(a). The influence of the active illumination polarization state on image quality was analyzed, and the traditional polarization difference method was combined with active polarized-illumination filtering. The result is a comprehensive descattering method operating on two dimensions of the system: the input (active illumination modulation during imaging) and the output (polarization processing after imaging). This method can significantly improve the global quality of the descattered image[143], as shown in Fig. 22(b). In 2022, Fei et al. developed a polarization descattering method using the Mueller matrix, as shown in Fig. 22(c). By studying the backscattering distributions at different turbidities (in nephelometric turbidity units), they found that the background intensity is strongly and linearly correlated with the depolarization index. They therefore derived the depolarization index from the Mueller matrix to characterize the scattering medium and combined it with a dedicated optimization function to estimate the transmittance map. By quantifying light attenuation with the transmittance map, a clear view of the target can be restored. This method enhances underwater vision across a range of turbidities using only information from the scattering medium[144].
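    For reference, the depolarization index that can be derived from a Mueller matrix is commonly written in the Gil–Bernabeu form sketched below; this textbook definition for a full 4×4 matrix is shown only as an illustration, whereas Ref. [144] works with a 3×3 matrix and its own optimization function.

```python
import numpy as np

def depolarization_index(M):
    """Gil-Bernabeu depolarization index of a 4x4 Mueller matrix M.
    Returns 1 for a non-depolarizing medium and 0 for an ideal depolarizer."""
    M = np.asarray(M, dtype=float)
    return np.sqrt((np.sum(M**2) - M[0, 0]**2) / (3.0 * M[0, 0]**2))
```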


    Figure 22.(a) Principle of polarization difference based on the Mueller matrix[143]. In (b)[143], (b1), (b2), and (b3) are the intensity images of three targets in a highly concentrated scattering medium, (b4), (b5), and (b6) are the descattering images of the three targets under the worst linearly polarized illumination, and (b7), (b8), and (b9) are the descattering images of the three targets under the optimal linearly polarized illumination. In (c)[144], (c1) is the intensity image, (c2) is the image recovered with the proposed descattering method, and (c3) and (c4) are magnified views of the regions of interest marked with red rectangles in (c1) and (c2), respectively.

    4.2.2. Computational light field restoration technology based on scattered light

    Computational light field restoration based on scattered light refers to the integrated consideration of ballistic and scattered photons by analyzing the imaging response of the target at the sensor plane after transmission through the scattering medium. Although the imaging response of a scattering medium is complex and difficult to obtain in full, within a certain FOV the medium exhibits spatial translation invariance that can be exploited for local analysis of the imaging response; this property is called the "memory effect" of the scattering medium. Based on it, correlation imaging techniques can be developed around the spatial characteristics of scattering media. Alternatively, the imaging response can be modeled fully: when some parameters of the scattering medium are known, the imaging response can be estimated and an inverse imaging problem solved, as with the atmospheric transmission equation. Further, the scattering imaging response can be calibrated directly as a prior, for example through scattering transmission matrix calibration; the calibrated response then enables a more accurate reconstruction of the target. All of these methods are based on the spatial characteristics of the scattering medium. Through analysis and modeling of the imaging response, the target object can be reconstructed, improving accuracy and expanding the application range of imaging to provide more reliable imaging solutions for various fields.

    Image deconvolution is an image-quality improvement technique that estimates the original, undistorted image from distorted and degraded observations, reducing or eliminating image distortion and noise as much as possible. It is highly useful for advanced imaging research and applications, such as feature extraction, automatic recognition, and image analysis. Over the past 50 years, image deconvolution has been applied in many scientific and technological fields, including astronomical observation, medical imaging, space exploration, military science, remote sensing and telemetry, biological science, forensic investigation, and industrial vision.

    The convolution of the PSF with the input image is equivalent to filtering the input image, suppressing or discarding its high-frequency components. Deconvolution is the inverse process, and its result may deviate from the true solution; to obtain a solution as realistic as possible, an appropriate compromise must be made between restoring the image and amplifying the noise. By measuring the PSF of a scattering imaging system and using the convolutional relationship between the speckle generated by the target and the imaging system, the object can be reconstructed through a deconvolution operation. In 2016, Edrei et al. demonstrated a new microscopy technique that utilizes the optical memory effect (OME)[145], as shown in Fig. 23(a). This technique can image through turbid media with resolution beyond the diffraction limit of the optical system. It uses deconvolution image processing, estimating the system PSF either blindly or with guide stars embedded in the target-plane medium, so that no iterative focusing, scanning, or phase recovery procedures are required. However, their method can only image in the object plane and cannot obtain axial information about the object. In 2018, Xie et al. studied and exploited the axial properties of scattered light[146] to recover objects over a large FOV with depth resolution. They described the relationship between the PSFs of thin scattering media at different reference points, as shown in Fig. 23(b). By rescaling one PSF, the PSFs of other object planes can be inferred, enabling the layer-by-layer reconstruction of a three-layer object outside the original DOF through thin scattering media. With the lateral and axial information of static targets now obtainable, information from moving targets has become a primary research direction. To achieve super-resolution imaging through scattering media, Dong et al. proposed random optical scattering localization imaging (SOSLI) in 2021, as shown in Fig. 24[147]. Their method exploits the speckle correlation properties of the scattering medium to retrieve images with a resolution of 100 nm, eight times better than the diffraction limit. The above methods all operate within the OME range. In 2022, Lei[148] used matrix decomposition and fingerprint-based reconstruction to demix the speckle patterns emitted by fluorescent objects under unknown random illumination. The image retrieval process was based on deconvolution rather than the previously used phase retrieval, and the FOV covered by this method can reach three times the OME range. The simplicity of this non-invasive imaging technique, which requires neither a spatial light modulator (SLM) nor a guide star, opens a promising avenue for deep fluorescence imaging in highly scattering media and can be extended to a variety of incoherent contrast mechanisms and illumination schemes.
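    Once the PSF of the scattering system has been measured, the deconvolution step can, for example, be a simple Wiener filter as sketched below under the shift-invariance (memory-effect) assumption; this is a generic stand-in, not the specific pipelines of Refs. [145–148].

```python
import numpy as np

def wiener_deconvolve(speckle, psf, nsr=1e-2):
    """Recover the object from a camera speckle image given the measured PSF.

    speckle : recorded 2D intensity pattern (object convolved with the PSF)
    psf     : measured point spread function, sampled like the speckle
    nsr     : assumed noise-to-signal power ratio, acting as regularization
    """
    H = np.fft.fft2(np.fft.ifftshift(psf), s=speckle.shape)   # PSF spectrum
    G = np.fft.fft2(speckle)
    W = np.conj(H) / (np.abs(H) ** 2 + nsr)                   # Wiener filter
    obj = np.real(np.fft.ifft2(W * G))
    return np.clip(obj, 0.0, None)
```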


    Figure 23.(a) Edrei et al. demonstrated a new microscopy technique that utilized the optical memory effect (OME)[145]. (b) Xie et al. described the relationship between the PSFs of thin scattering media at different reference points[146].


    Figure 24.(a) Super-resolution imaging through scattering media with SOSLI in comparison to other imaging techniques. (b) Principle and simulation results of SOSLI. (c) Experimental results of imaging through a ground glass diffuser with different techniques. (d) Experimental demonstration of three techniques for imaging several complex objects hidden behind a ground glass diffuser[147].

    Although research on and applications of deconvolution-based scattering imaging have made great progress, several problems must still be solved before widespread adoption. For example, as the thickness of the scattering medium increases, the memory-effect range shrinks significantly, which can cause the deconvolution method to fail. The current process also needs to be accelerated to cope with dynamic scattering environments and to further improve resolution, and obtaining the PSF with high precision in complex environments remains an open question. Solving these problems requires in-depth study of scattering physics, further technical and theoretical development, and multidisciplinary research.

    The deconvolution method requires intrusive calibration of the scattering medium, that is, obtaining the PSF of the system. However, in practical applications, this is often difficult to achieve, such as in biological tissues, where invasive calibration can destroy cell activity. Furthermore, it is very difficult to obtain the PSF in unknown scenarios. Therefore, there is an effort to find a non-invasive scattering imaging method that does not need to calibrate the scattering medium and only needs to use the scattering light field information obtained by the detector to obtain and interpret the target information.

    The discovery of the OME has promoted the development of non-invasive scattering imaging techniques. In 1988, Feng et al. discovered the angular optical memory effect (AME): a small tilt of the incoming wave vector does not change the speckle pattern formed by the outgoing wave vector on the detector but only shifts it in the opposite direction. Based on the AME, in 2012, Bertolotti et al.[149] proposed a scanning-based non-invasive speckle reconstruction technique, referred to as scanning-based speckle correlation. It requires neither calibration of the scattering medium parameters nor knowledge of the medium's microscopic characteristics; however, it requires scanning the sample within the AME range, which takes a long time. To overcome the long scanning time, more efficient non-invasive scattering imaging was pursued. In 2014, Katz et al. of Israel observed that, within the memory-effect range, the autocorrelation of the PSF is a sharply peaked function, and on this basis proposed single-shot speckle correlation (SSC)[150]. The Wiener–Khinchin theorem can then be used to obtain the Fourier amplitude of the target from the speckle autocorrelation, and the hybrid input–output (HIO) and error-reduction phase retrieval algorithms proposed by Fienup et al.[151] can be used to reconstruct the structure of the imaged target. The results are shown in Fig. 25(a).
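    The SSC pipeline can be summarized in a few lines: compute the speckle autocorrelation via the Wiener–Khinchin theorem, take the square root of its Fourier transform as the object's Fourier magnitude, and run an iterative phase retrieval. The sketch below uses a plain error-reduction loop as a stand-in for the HIO/ER combination of Ref. [151]; the support mask and iteration count are illustrative assumptions.

```python
import numpy as np

def speckle_autocorrelation(speckle):
    """Autocorrelation of a speckle frame via the Wiener-Khinchin theorem."""
    I = speckle - speckle.mean()
    F = np.fft.fft2(I)
    return np.fft.fftshift(np.real(np.fft.ifft2(np.abs(F) ** 2)))

def error_reduction(fourier_mag, support, n_iter=500, seed=0):
    """Minimal error-reduction loop (one of the Fienup-type algorithms).

    fourier_mag : object Fourier magnitude, i.e., the square root of the Fourier
                  transform of the speckle autocorrelation (negative values clipped)
    support     : boolean mask constraining where the object may be nonzero
    """
    rng = np.random.default_rng(seed)
    g = rng.random(fourier_mag.shape) * support
    for _ in range(n_iter):
        G = np.fft.fft2(g)
        G = fourier_mag * np.exp(1j * np.angle(G))    # enforce Fourier magnitude
        g = np.real(np.fft.ifft2(G))
        g = np.where(support & (g > 0), g, 0.0)       # enforce support and positivity
    return g
```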


    Figure 25.(a) Single-shot speckle correlation[151]. (b) 3D imaging using a diffusing medium via spatial speckle intensity cross-covariance[152]. (c) Superposed reconstruction to enlarge the limited FOV[153]. (d) Pipeline for multitarget imaging through scattering media regardless of OME[154].

    Owing to the single-frame nature of SSC, a single exposure of between 10 ms and 2 s is sufficient to acquire the experimental data, which greatly improves the temporal resolution of scattering imaging; the technique has therefore received extensive attention. As originally proposed, SSC considered only 2D planar binary objects. To image axially extended targets, Takasaki et al. introduced phase-space measurement into scattering imaging in 2014. Because the spatio-angular correlation is determined by the transmission distance, the axial depth of a target can be determined[152]. This compensates for the limitation of the AME, which concerns only angular correlation and cannot distinguish the axial position of the target. The imaging optical path and experimental results are shown in Fig. 25(b).

    Limited by the AME range of the scattering medium, the effective imaging FOV of SSC is often less than 1° for typical biological tissues. Moreover, because the statistical characteristics of the speckle field improve with the number of measurements, improving the SNR of the PSF measurement can require thousands of speckle images in advance. Tang et al.[153], at Nanyang Technological University in Singapore, divided a large FOV into discrete parts, each small enough to lie within the memory-effect range so that the autocorrelation of its PSF approximates a sharply peaked function. Exploiting the translation invariance of the PSF within the memory-effect range, they decoupled and reconstructed the different parts of the target and spatially fused them into the whole target. The imaging device and results are shown in Fig. 25(c). To image targets beyond the AME range, Li et al. proposed a multi-target super-AME imaging technique based on independent component analysis[154]. The method takes full advantage of the statistical independence of speckles that do not lie within a common AME range. By changing the illumination mode, multiple speckle patterns containing different proportions of the target information are obtained; independent component analysis is then used to de-alias the target speckles in these patterns, realizing multi-target imaging through scattering media. The reconstruction flow chart is shown in Fig. 25(d).
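    The ICA de-aliasing step of the multi-target approach can be illustrated roughly as follows; the stacking of frames, the use of scikit-learn's FastICA, and the assumption that the number of components equals the number of frames are simplifications of ours, not the exact algorithm of Ref. [154].

```python
import numpy as np
from sklearn.decomposition import FastICA

def demix_speckles(speckle_stack):
    """Separate the speckle contributions of targets lying outside a common
    AME range, assuming their speckles are statistically independent.

    speckle_stack : array of shape (n_frames, H, W); each frame is recorded under
                    a different illumination so the targets mix with different weights.
    Returns an array of the same shape with the demixed components, each of which
    can then be fed to the autocorrelation + phase retrieval steps above.
    """
    n, h, w = speckle_stack.shape
    X = speckle_stack.reshape(n, h * w).T      # pixels as samples, frames as features
    ica = FastICA(n_components=n, random_state=0)
    S = ica.fit_transform(X)                   # (pixels, components)
    return S.T.reshape(n, h, w)
```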

    The autocorrelation scattering imaging method uses the acquired speckle pattern to achieve non-invasive imaging of the target, which is of great significance for imaging through unknown scenes and living media, such as biological tissues. Moreover, it has the potential of real-time scattering imaging to interpret the target information from a single-frame speckle pattern. Autocorrelation scattering imaging is not only limited by the OME but is also sensitive to the illumination spectrum width and time-varying characteristics of the scattering medium. When the illumination spectrum width increases or the scattering medium changes with time, the contrast of the collected speckle pattern decreases, and the method cannot be applied. The realization of wide-spectrum illumination or non-invasive imaging through dynamic scattering media will be a great step toward the practical application of autocorrelation scattering imaging.

    Worldwide advances in optical devices have promoted the development of image processing and computational imaging technology, and many breakthroughs have been achieved in scattering imaging research. With the advent of devices such as SLMs, phase control of light fields became possible. Wavefront shaping (WFS) technology uses such devices to compensate for the random phase imposed by the scattering medium, focusing the scattered light field and imaging the target.

    The theory of wavefront modulation was first developed by Freund et al. at Bar-Ilan University in Israel[155] and was first experimentally verified by Vellekoop et al. of the Netherlands[156–159], as shown in Fig. 26. By adding an SLM or another phase modulation device to the optical path, the additional phase imposed by the scattering medium can be compensated, producing a focused light field through the medium. At this point, the scattering medium can be viewed as a conventional geometric lens, and imaging through the medium is realized within the OME range. Feedback-based wavefront optimization divides the SLM into N modulation regions; the intensity of the target region recorded by the camera serves as the feedback signal, and the optimal phase of each region is the one that maximizes this intensity. The SLM then applies the optimized phases to all N regions to focus the light field.
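    A stepwise version of this feedback loop can be sketched as follows; the callback measure_intensity, the number of trial phases, and the segment-by-segment search are illustrative assumptions standing in for the various optimization strategies reported in Refs. [156–159].

```python
import numpy as np

def feedback_wavefront_shaping(measure_intensity, n_segments, n_phases=8):
    """Stepwise feedback optimization over an N-segment SLM (sketch).

    measure_intensity(phases) -> float is a user-supplied callback that uploads the
    phase pattern (one value per segment, in radians) to the SLM and returns the
    measured intensity at the chosen target pixel behind the scattering medium.
    """
    phases = np.zeros(n_segments)
    trial = np.linspace(0.0, 2.0 * np.pi, n_phases, endpoint=False)
    for k in range(n_segments):                    # optimize one segment at a time
        scores = []
        for p in trial:
            test = phases.copy()
            test[k] = p
            scores.append(measure_intensity(test))
        phases[k] = trial[int(np.argmax(scores))]  # keep the phase with maximal intensity
    return phases
```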


    Figure 26.(a) Wavefront shaping technology[160]. (a1) Experimental setup. (a2) System with a layer of airbrush paint present and an unmodified incident wavefront. (a3) The wave was shaped to achieve constructive interference at the target. (b) Spatiotemporal focusing by optimizing a two-photon fluorescence (2PF) signal[156]. (b1) Experimental setup. (b2) 2PF images before optimization at the optimized plane (x–y). (b3) 2PF images after optimization at the optimized plane (x–y). (c) Scattered light fluorescence microscopy[157]. (c1) Experimental setup. (c2) Seen through the scattering layer with a wide-field fluorescence microscope. (c3) Seen through the scattering layer with an SLM microscope.

    The wavefront shaping technique shows that the scattering medium can be treated as a "black box" without knowledge of its properties. The interference of the scattering medium can be overcome by modulating the phase of the incident wavefront so that the light passes through the medium and is focused with a very high enhancement factor. In 2010, building on traditional imaging, Mosk added a scattering medium to the imaging path and used feedback-optimized wavefront shaping to obtain a focal spot one-tenth the size of the focus of a traditional lens[160]. This demonstrates that scattering can be exploited to achieve high-resolution imaging and is not always detrimental: the resulting focus and image can be even sharper than in transparent media, as shown in Fig. 26(a).

    With wavefront modulation, the optical path can treat the scattering medium as a traditional lens and focus the image on the detector. However, wavefront-modulated imaging requires intrusive calibration of the system, and the feedback optimization process is time consuming. Beyond imaging, wavefront modulation technology also has important prospects in biomedicine, optogenetics, and other fields.

    The wavefront shaping technique compensates for the phase distortion caused by the scattering medium using a phase modulation device when the transmission characteristics of the medium are unknown; the medium is treated as a black box. In contrast, the transmission matrix calibration method characterizes the influence of the scattering medium in advance. Once the relationship between the input and output light fields through the medium is known, the input field corresponding to a measured scattered output field can be solved for. At this point, the scattering medium is no longer a black box, and its effects are explicitly known.

    To clarify the influence of the scattering medium on the input light field, in 2010, Popoff et al.[161] of France proposed measuring the transmission matrix of the scattering medium, positing that the amplitude and phase relationship between the input and output light fields passing through the medium is deterministic. They proposed a single-arm experimental setup for measuring the optical transmission matrix of a scattering medium, as shown in Fig. 27(a). The SLM area was divided into a central control region and a peripheral reference region, the latter providing the reference signal needed to measure the transmission matrix. To improve the SNR of the output-field intensity distribution recorded by the CMOS camera after passing through the medium, the Hadamard basis was adopted for the input light field, achieving imaging through an 80-μm-thick ZnO sample. Choi et al.[162] of South Korea instead determined the transmission matrix of the scattering medium as a function of the incident-light spatial frequency, using a 2D galvanometer to scan the input. Compared with the pattern recorded without the scattering medium, the image reconstructed from the frequency-domain transmission matrix remains clear, as shown in Fig. 27(b).
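    The single-arm, phase-shifting measurement can be sketched as follows; the callback measure_intensity, the four-step phase shifting, and the change of basis from Hadamard to canonical inputs are a simplified reading of the scheme in Ref. [161], with details such as reference normalization and averaging omitted.

```python
import numpy as np
from scipy.linalg import hadamard

def measure_transmission_matrix(measure_intensity, n_modes):
    """Measure a transmission matrix with a Hadamard input basis and four-step
    phase shifting against a co-propagating reference (single-arm scheme, sketch).

    measure_intensity(pattern, phase) -> 2D camera image, where `pattern` is the
    Hadamard vector displayed on the controlled SLM region and `phase` is a global
    phase offset added to it. n_modes must be a power of two.
    """
    H = hadamard(n_modes).astype(float)
    cols = []
    for h_vec in H:                                  # one Hadamard vector at a time
        I = [measure_intensity(h_vec, p) for p in (0.0, np.pi / 2, np.pi, 3 * np.pi / 2)]
        # Four-step phase shifting recovers the complex output field (up to the
        # unknown reference field, which is common to every column).
        E = (I[0] - I[2]) / 4.0 + 1j * (I[3] - I[1]) / 4.0
        cols.append(E.ravel())
    K_hadamard = np.stack(cols, axis=1)              # response to the Hadamard basis
    return K_hadamard @ (H.T / n_modes)              # change of basis to canonical inputs
```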


    Figure 27.(a) Measuring the transmission matrix in the spatial domain[161]. (a1) Experimental setup. (a2) Initial grayscale image. (a3) Reconstructed image using scattered input. (b) Measuring the transmission matrix in the frequency domain[162]. (b1) Experimental setup. (b2) Pattern before inserting the scattering medium. (b3) Reconstructed image using scattered input.

    The obtained transmission matrix reflects the transmission characteristics of the scattering medium, elucidating the corresponding relationship between the input and output light fields. Compared with other scattering imaging methods, transmission matrix calibration imaging can image complex targets and has a strong descattering ability and higher reconstructed target definition. However, the resolution of the phase modulation device and detector determines the obtained matrix dimensions, which also limits the FOV. The acquisition of the transmission matrix requires an invasive calibration of the medium, and because the calibration process is long (on the order of minutes) this method cannot be applied to dynamic media such as biological tissues.

    4.2.3. Non-line-of-sight imaging technology

    Owing to the absence of light field information and the limited imaging models, traditional imaging techniques cannot image objects outside the FOV. In recent years, with the development of new photoelectric sensors and improvements in information computing power, NLOS imaging technology has developed rapidly. According to the imaging mechanism, NLOS imaging can be divided into two types: active and passive imaging.

    Active NLOS imaging requires a modulated light source for detection and imaging. The modulated light source (usually a laser) is synchronized with the detector; the emitted photons are reflected by an intermediate surface toward the target, reflected a second time at the target surface, and reflected a third time at the intermediate surface before being received by the detector. The recovery and reconstruction of the target are primarily based on calculating the photon time of flight. In 2012, the MIT Media Lab used femtosecond pulses and a streak camera to achieve, for the first time, the reconstruction of hidden objects in NLOS scenes[163]. In the experiment, a pulsed laser with a pulse width of 50 fs illuminated the scene, and a streak camera recorded the photon flight times, as shown in Fig. 28(a). Compared with other ToF measurement methods, this method offers better imaging resolution, reaching the centimeter level. However, streak cameras are expensive and suffer from drawbacks such as low quantum efficiency and severe noise.


    Figure 28.(a) NLOS imaging based on a streak camera[163]. (a1) The process of capturing photons. (a2) An example of streak images sequentially collected. (a3) The 2D projected view of the hidden object. (b) NLOS imaging based on SPAD[164]. (b1) Experimental setup. (b2) Objects in the scene to be reconstructed. (b3) Reconstruction of the letter T. (c) NLOS imaging based on ToF[165]. (c1) Experimental setup. (c2) Unknown object. (c3) Reconstructed depth (volume as probability). (c4) Reconstructed depth (strongest peak).

    A SPAD is biased above its breakdown voltage so that a single photon can trigger an avalanche, converting photon arrivals into current pulses; it is usually combined with single-photon counting electronics to measure the ToF of individual photons. In 2015, Buttafava et al. collected target photon information with such a setup and reconstructed the target shape using a back-projection algorithm[164], as shown in Fig. 28(b). SPADs have higher quantum efficiency (up to 40%) and relatively low cost but lower temporal resolution than streak cameras. ToF cameras typically use sinusoidally amplitude-modulated light sources to illuminate the NLOS scene and recover the propagation path of the photons from the phase difference between the received signal and the transmitted modulation. They cost less than other photon detection devices, typically only a few hundred dollars, which has led to commercial products such as the Microsoft Kinect camera. In 2014, Heide, Hullin, and colleagues[165] in Germany used ToF cameras to achieve target reconstruction in NLOS scenes, as shown in Fig. 28(c).
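    The back-projection idea common to these transient methods can be sketched as follows; the confocal-style data layout and the absence of filtering or attenuation weighting are simplifications of ours, not the exact algorithm of Ref. [164].

```python
import numpy as np

C = 3e8  # speed of light, m/s

def backproject(transients, laser_pts, sensor_pts, voxels, bin_width):
    """Naive ellipsoidal back-projection for transient NLOS data (illustrative sketch).

    transients : (n_pairs, n_bins) photon counts versus time bin
    laser_pts, sensor_pts : (n_pairs, 3) illuminated/observed points on the relay wall
    voxels     : (n_voxels, 3) candidate positions in the hidden volume
    bin_width  : temporal bin width in seconds
    """
    volume = np.zeros(len(voxels))
    for hist, xl, xs in zip(transients, laser_pts, sensor_pts):
        # wall -> voxel -> wall path length for every candidate voxel
        d = np.linalg.norm(voxels - xl, axis=1) + np.linalg.norm(voxels - xs, axis=1)
        bins = np.clip((d / (C * bin_width)).astype(int), 0, hist.shape[0] - 1)
        volume += hist[bins]      # smear each histogram back along its ellipsoid
    return volume
```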

    Passive NLOS field imaging requires the use of photon information reflected from the intermediate surface to restore images. In contrast to the active method, passive imaging does not need to modulate the lighting source and only needs to collect ambient light to complete the recovery and reconstruction of hidden targets beyond the line of sight, thereby simplifying the experimental equipment and becoming more suitable for practical applications. However, owing to the use of natural light illumination, the detection equipment can collect very little photon information, rendering the image quality poor and the noise large. Using ambient light to realize the passive imaging of NLOS objects is of great significance to extending the application range of NLOS imaging technology.

    Owing to scattering from the intermediate surface, the outgoing photons become random and disordered and lose the original target information. However, some coherent features are retained in the residual photon information, which can be used for target reconstruction in NLOS field scenes.

    Imaging based on spatial coherence information typically uses high-precision interferometers to measure the coherence characteristics of photons in different spatial positions and computes and reconstructs the geometry information of hidden objects using the measured spatial coherence function. In 2018, Batarseh et al. used a dual-phase Sagnac interferometer (DuPSaI) to successfully reconstruct target information in NLOS field scenarios based on the above theory[166]. As shown in Fig. 29(a), an object irradiated by incoherent light was received by DuPSaI after diffuse reflection of the intermediate surface, and the phase information of the spatial coherence function of the object was obtained by changing the position. The geometric shape of the target was reconstructed, and the positional information of the target was estimated. Based on this, in 2019, Tamasan et al. used the four-dimensional (4D) spatial coherence function to achieve target reconstruction. By integrating the intensity of the NLOS field and spatial coherence information at different scales, NLOS imaging was considered a multi-criterion convex optimization problem[167], as shown in Fig. 29(b). Based on the sparsity of the image, the optical field transmission model was constructed, and an alternate direction multiplier algorithm was proposed to solve the convex optimization problem effectively.


    Figure 29.(a) Shape recovery from coherence measurements[166]. (a1) Experimental setup. (a2), (a3) Plots of real and imaginary components of SCF measured for the square and equilateral triangle objects, respectively. (b) NLOS imaging based on multimodal data fusion[167]. (b1) Experimental scene. (b2) The intensity sample. (b3) The reconstruction using this intensity sample alone. (b4) The additional measurement of scattered coherence. (b5) The reconstruction when both the intensity and coherence measurements are used.

    Based on the measured spatial coherence function, these NLOS imaging techniques construct a transformation model on the observation plane to solve for the target information, combining spatial coherence and intensity measurements into a multi-criterion optimization problem. In the time dimension, the location of the target can be obtained by interpreting the cross-correlation of the ToF information.

    Light intensity is the most intuitive and readily available type of data: it is what ordinary cameras capture. This allows it to play a key role in passive NLOS imaging. Because this technology aims to capture information about objects without a direct line of sight, light intensity makes it possible to infer the presence and state of objects even when the line of sight is blocked. Therefore, research on techniques that can effectively use light intensity information is crucial for applying passive NLOS imaging in practical engineering projects. In 2017, Bouman et al. at the Massachusetts Institute of Technology designed a system known as the edge camera, which uses consumer-grade cameras to capture shadows in the environment[168]; recovering the movement of objects from these shadows is akin to reverse engineering and makes visible movements hidden behind corners. In 2018, Tancik et al.[169] used data-driven techniques to locate and image objects in non-visual scenes. Their method breaks the limitations of traditional imaging, allowing imaging without a direct light source and extracting subtle information that is difficult for the eye to detect, as shown in Fig. 30(a). To reduce the amount of data to be collected and improve imaging efficiency, in 2019, Saunders et al. of Boston University proposed a new occluder-aided NLOS imaging method named computational periscopy[170], shown in Fig. 30(b). When the occluder shape is known, this technique can estimate the occluder position and reconstruct the hidden scene from a single photograph of the relay wall taken by an ordinary camera. Yedidia et al. extended occluder-aided NLOS imaging to a more general setting, abstracting the whole system into a convolutional model[171] by assuming that the observed scene, the occluders, and the hidden NLOS scene lie on parallel planes, as shown in Fig. 30(c). Using video frames of the observed scene, the shape and size of the occluders were estimated by blind deconvolution.


    Figure 30.In (a)[169], (a1) is the example scenario, (a2) shows the recovered light fields for a simulated scene and different occluders, and (a3) is the recovered light field of another simulated scene. In (b)[170], (b1) is the experimental setup for computational periscopy, (b2) is the reconstruction procedure, and (b3) shows the reconstructions of different hidden scenes. In (c)[171], (c1) is the model of the scenario, and (c2) shows the still frames from reconstructed videos under a variety of different experimental settings.

    The intensity information captured by an ordinary camera can thus be used to reconstruct targets in NLOS scenes. These methods construct a transformation (light transport) matrix between the NLOS object and the detection plane and recover the hidden image by inverting it. Using objects such as occluders or corners to increase the sparsity of the transformation matrix between the object and the observation plane improves the conditioning of this inverse problem.
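    A minimal version of this inversion is sketched below; Tikhonov-regularized least squares is used here as a generic stand-in for the solvers and priors of Refs. [168–171], and the construction of the light transport matrix A from the occluder geometry is assumed to be given.

```python
import numpy as np

def reconstruct_hidden_scene(A, y, lam=1e-2):
    """Recover a (vectorized) hidden scene x from a relay-wall photograph y,
    given a light transport matrix A built from the occluder/corner geometry.
    Tikhonov-regularized least squares serves as a generic stand-in solver."""
    n = A.shape[1]
    x = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)
    return np.clip(x, 0.0, None)
```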

    In NLOS scenarios, solving for the object from intensity information alone requires occluders to provide the sparsity needed by the inversion. In 2019, Hassan proposed using the polarization-domain information of the scattered light field to satisfy the sparsity condition and solve the inverse problem[172]. When the intensity information alone could not be interpreted, a polarizer was placed in front of the camera to obtain polarization information, and the diffusely reflecting material was modeled as a collection of numerous micromirrors to reconstruct NLOS targets. The results are shown in Fig. 31(a). Based on the same hypothesis, in 2020, Tanaka et al. obtained the polarization information of the NLOS light field and introduced a polarization-related leakage coefficient, dependent on the incident direction, exit direction, and transmission axis of the polarizer, into the transmission matrix, thereby modulating the matrix, as shown in Fig. 31(b)[173].


    Figure 31.In (a)[172], (a1) is the geometry used in the Cook–Torrance model, (a2) is the experimental setup with a monitor illuminating diffuse surface and DSLR camera with a polarizing filter imaging illuminated region on a wall, and (a3) shows two similar monitor images with largely different effects on NLOS imaging. (b) NLOS imaging/enhanced system based on polarization information[173].

    In contrast to intensity-only methods, which must rely on occluders to improve the sparsity of the transmission matrix, polarimetric NLOS imaging uses the polarization of the light field to condition the matrix, removing the dependence on scene structure for this modulation. However, the method is limited by the polarization state of the scene illumination, and its applicability when the light source is not linearly polarized still needs to be improved.

    In addition to the use of visible information in NLOS scenes, the use of nonvisible band information, such as infrared light, can also realize the reconstruction of target information. When visible light cannot penetrate obstacles or little visible light information is reflected, the advantages of infrared bands become obvious.

    Compared with visible-light detection in NLOS scenes, target information in the long-wave infrared band exhibits stronger specular reflection characteristics. In 2019, Maeda proposed NLOS thermal imaging based on the long-wave infrared spectral range[174]. A thermal camera was used to record the long-wave infrared radiation emitted by NLOS objects after a single reflection, and target reconstruction was achieved with a new light transport model, as shown in Figs. 32(a)–32(c). When the intermediate surface was, for example, a copper plate or a marble slab, the position of the object could be determined and its shape reconstructed. To improve the imaging resolution, Divitt et al. used dual-spectrum and phase retrieval methods to achieve speckle imaging of targets in NLOS scenes in the mid-wave infrared (MWIR) range[175]. This method requires no external light source and achieves higher resolution than longer wavelengths at the same aperture. In addition, for media that visible light cannot penetrate but MWIR can, hidden targets can still be imaged, as shown in Figs. 32(d)–32(g).


    Figure 32.(a) Corner setup. (b) Comparing HOG features in the raw frames and the denoised frames. (c) Reconstruction algorithm for 2D shape recovery and 3D localization. (d) A general diagram of the experiments. (e) A schematic diagram of the speckle correlation imaging setup with a monochromatic, pseudothermal source object in an around-the-corner geometry. (f) Image recovery under the pseudothermal setup. (g) A comparison of results under line-of-sight and NLOS conditions using the setup[174,175].


    Figure 33.General framework for computational optical systems. (a) Metalens. The use of metalenses can meet the needs of miniaturization and integration of optical systems[332]. (b) Simplified optical system. The simplified optical system seeks to achieve optimal performance of the entire system[214]. (c) Adaptive optical system. The adaptive optical imaging system is designed to eliminate the interference of complex environments with the amplitude and phase of the imaging light field[334]. (d) Coded aperture. The introduction of a coded aperture increases the dimensionality of the collected information, enabling super-resolution and high-speed imaging[210]. (e) Single-pixel imaging. Only a single-pixel detector is used for spatial imaging; the advantages are high SNR and low cost[248]. (f) Wide-area optical system. Wide-area optical systems can achieve both a large FOV and high resolution[189].

    In 2020, Wu et al.[176] proposed hardware and software solutions to realize long-range NLOS imaging. They developed a fully integrated InGaAs/InP negative-feedback SPAD and built an efficient optical receiver, combining a telescope with high coating efficiency and a single-photon detector with a large photosensitive area, to improve collection efficiency. The confocal system uses a binocular optical design to improve the SNR. Finally, a forward model and a custom deconvolution algorithm were derived that include the effects of spatiotemporal broadening over long ranges. The result was NLOS imaging and tracking with centimeter-scale resolution at a range of 1.43 km.

    The strong specular reflectance ratio and penetrating ability of the infrared allow this technology to realize target information interpretation when visible light information cannot be obtained. Using blackbody radiation, the target object can also be regarded as self-luminous, which opens a new direction for NLOS imaging technology.

    In this section, we analyzed the scattering of light in media from the viewpoint of computing the medium parameters, and described various technological breakthroughs in recovering target information from the perspectives of separating and of using scattered light. Under the same experimental conditions, imaging by separating scattered light has more demanding experimental requirements and greater limitations, and breakthroughs are difficult to achieve. The methods that use scattered light are simpler and more widely applied, but most of them require prior information for calibration. NLOS imaging technology is divided into active and passive NLOS imaging. To circumvent the limitations of traditional imaging methods, multidimensional physical information such as phase, polarization, and infrared wavelengths is introduced on top of intensity information to analyze and interpret the target information carried by the light, realizing passive NLOS imaging. The imaging process of active NLOS imaging technology is more complicated; moreover, the natural-light signal is weaker than the background light and the background noise is significant, which makes technical breakthroughs and further development difficult. Therefore, it remains challenging to make breakthroughs in applying scattered-light-based computational light field restoration technology to active NLOS imaging.

    4.3. Computational optical systems

    New optical system design based on computational imaging involves systematically integrating the concept of full-link global optimization into the description of the imaging process. This approach allows the correction of optical aberrations, traditionally addressed within the optical system, to be shifted to other stages. The use of metalenses in optical systems meets the demands for miniaturization and integration. Wide-field optical systems break the conventional trade-off between a wide FOV and high resolution, with progress toward achieving both simultaneously. Introducing coded apertures increases the dimensionality of the collected information, enabling super-resolution and high-speed imaging. With advances in the computational performance of electronic chips, end-to-end design methods maximize the combined roles of optical system design and algorithmic correction, organically merging the two so that part of the burden of optical aberration correction is shifted to the image restoration process. This achieves high-quality imaging while reducing hardware complexity and precision constraints, leading to minimalist optical system imaging. Computational field-adaptive optical systems aim to hierarchically eliminate the dual interference of complex environments with the amplitude and phase of the imaging light field through computational imaging methods. Single-pixel imaging relaxes detector requirements relative to traditional imaging technologies, offering advantages such as a high SNR, wide spectral range, low cost, and higher detection efficiency. There are significant application prospects across various research areas, including multi-level depth imaging, 3D information modeling, obstacle imaging, full-field multi-view wavefront detection, and multi-level depth phase recovery (Fig. 33).

    4.3.1. Metalenses

    A metasurface is a planar 2D metamaterial that is different from traditional optical elements. By designing the structure and arrangement of meta-atoms appropriately, metasurfaces can flexibly control optical field parameters within a 2D plane, possessing superior optical field manipulation capabilities beyond traditional components.

    Lenses made from metasurfaces that focus light are called metalenses. Metalenses offer advantages such as thinner volumes, lighter weight, lower cost, better imaging, and easier integration. Introducing metalenses into optical systems can meet the demands for miniaturization and integration. Moreover, by adjusting parameters such as the shape, orientation, and height of the structure, control over properties of light such as polarization, phase, and amplitude can be achieved.

    There are three basic phase control methods for metalenses: resonant phase control, propagation phase control, and geometric phase control (also known as Pancharatnam–Berry phase control).

    Resonant phase control achieves phase discontinuities by altering the resonant frequency, which is controlled by the geometric shape of the nanostructures. However, resonant phase metasurfaces, typically made of metals such as gold, silver, or aluminum, inevitably suffer from ohmic losses, making efficient optical field manipulation challenging. This issue can be effectively addressed using metasurface lenses made from low-loss dielectric materials. In 2018, Hsiao et al. optimized integrated resonant units in metalenses, constructing a multifunctional polarization converter, as shown in Fig. 34(a)[177]. Through experiments, they demonstrated that achromatic metalenses with different numerical apertures exhibit consistent focal lengths across the visible bandwidth. Furthermore, these metalenses showed high focusing efficiency, significantly enhancing the conversion efficiency from visible to near-infrared light.

    Propagation phase control arises from the optical path difference accumulated as an electromagnetic wave propagates, which enables phase manipulation. With λ the wavelength, n the effective refractive index of the medium, d the propagation distance in the uniform medium (the height of the structure), and k0 = 2π/λ the free-space wave vector, the accumulated propagation phase is φ = n k0 d. When the height of the micro–nanostructures is fixed, metasurfaces designed on this principle are tuned through the shape, size, and periodicity of the structures. These metasurfaces typically consist of isotropic, highly symmetric micro–nanostructures; consequently, they are polarization insensitive, meaning that the phase response is independent of the polarization of the incident light, which makes them suitable for most applications. In 2015, researchers at the Harvard John A. Paulson School of Engineering and Applied Sciences utilized dielectric ridge waveguides as phase-shifting elements in metasurfaces[178]. They achieved the desired phase accumulation through propagation over subwavelength distances, realizing high-resolution metagratings with broadband, efficient routing (splitting and bending) into a single diffraction order and overcoming the limitations of conventional gratings. Additionally, as shown in Fig. 34(b), they demonstrated polarization beam splitting with high suppression ratios.

    Geometric phase control adjusts the rotation angle of micro–nanostructures with identical dimensions to achieve phase discontinuities in light waves, thereby enabling artificial control over phase gradients or distributions. This significantly reduces the complexity of designing and fabricating metasurfaces. One advantage of geometric phase modulation is that it is unaffected by material dispersion, structural dimensions, or structural resonances. In 2021, Jisha et al. highlighted that the geometric phase is a unifying concept across physics (including optics) and demonstrated how to utilize it to generate a novel type of waveguide without requiring any refractive index gradient, as shown in Fig. 34(c)[179]. Leveraging the sensitivity of circular polarization to the geometric phase, Shalaev et al. proposed using the photonic spin Hall effect for chiral optical polarization and spectral analysis on plasmonic metasurfaces[180]. When left-handed circularly polarized (LCP) and right-handed circularly polarized (RCP) light are incident, opposite geometric phases are produced, resulting in additional phase gradients on the reflecting surface; LCP and RCP light are therefore reflected at symmetrical angles. By measuring the two reflection angles and the intensities of the reflected light, the spectral components and polarization information of the reflected light can be obtained. Using the same theory, Gao et al. achieved high mode purity and background-free vortex beam generation[181]: orthogonal circularly polarized light generates vortex beams with opposite topological charges at symmetrical positions. Addressing chromatic aberration in metalenses, Capasso proposed compensating for lens dispersion using resonant mode coupling in dielectric gratings[182], thereby achieving broadband achromatic focusing and offering a new approach to achromatic metalens implementation. Subsequently, Chen et al. realized achromatic transmission metalenses with a large bandwidth by rationally designing surface nano-fins while controlling the phase, group delay, and group delay dispersion of light[183]. Lin et al. proposed a metalens array made of gallium nitride (GaN) nanoantennas[184]; this full-color, achromatic light field camera finds applications in fields such as robotics, autonomous vehicles, and virtual and augmented reality. Kivshar et al. controlled the mode intensity of dielectric scatterers to construct a low-reflection-loss Huygens metasurface[185], enabling efficient grayscale holography in the near-infrared spectrum and further enhancing metasurface efficiency. Research on these new phenomena and applications of metasurface beam manipulation demonstrates the rich mechanisms and applications of metasurfaces at subwavelength scales, further advancing the development of integrated micro–nano-optical devices.
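    To make the propagation-phase picture concrete, the sketch below evaluates the textbook hyperbolic target phase of a flat lens and the accumulated phase φ = n_eff k0 d of a nanopillar; a real metalens design would map these target phases onto a library of simulated nanostructures, which is beyond this illustration.

```python
import numpy as np

def metalens_target_phase(x, y, focal_length, wavelength):
    """Hyperbolic phase profile a flat lens must imprint to focus a normally
    incident plane wave at distance f (textbook target profile)."""
    r2 = x ** 2 + y ** 2
    return -(2.0 * np.pi / wavelength) * (np.sqrt(r2 + focal_length ** 2) - focal_length)

def propagation_phase(n_eff, height, wavelength):
    """Accumulated propagation phase phi = n_eff * k0 * d of a nanopillar,
    the quantity tuned in propagation-phase metalens design."""
    k0 = 2.0 * np.pi / wavelength
    return n_eff * k0 * height

# Example: target phase map on a 100 um x 100 um aperture, f = 200 um, 532 nm light.
xs = np.linspace(-50e-6, 50e-6, 501)
X, Y = np.meshgrid(xs, xs)
phi = np.mod(metalens_target_phase(X, Y, 200e-6, 532e-9), 2.0 * np.pi)
```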


    Figure 34.Three basic phase control methods of metalenses. (a) Resonance phase control[177]. (b) Propagation phase control[178]. (c) Geometric phase control[179].

    Currently, metalenses have tremendous prospects in numerous areas of modern optical imaging. In 2022, Jian et al. from Tsinghua University designed a real-time hyperspectral imaging chip based on reconfigurable metasurfaces[186]. As shown in Fig. 35(a), the chip contains 150,000 micro-spectrometers. During spectral reconstruction of an object, adjacent metasurface units within a metasurface superunit dynamically combine to form a reconfigurable, image-adaptive micro-spectrometer with ultra-high center-wavelength accuracy and spectral resolution. Figure 35(b) shows a schematic diagram of the basic modulation unit of the chip, including the metasurface, a microlens (for increased quantum efficiency), and a CMOS image sensor. Moreover, they seamlessly integrated the reconfigurable metasurface superunits with a commercial camera to avoid system incompatibility issues and achieve real-time dynamic spectral measurements in general optical imaging systems, as shown in Fig. 35(d). In 2024, Aun et al. demonstrated the potential of metasurfaces for polarization detection. They used metasurfaces to create a compact Mueller matrix imaging system, consisting of one metasurface for generating structured polarized illumination and another for polarization analysis. This system can capture all 16 components of the spatially varying Mueller matrix of an object in a single shot. The optical path diagram of this compact Mueller matrix imaging system and the Mueller matrix information obtained from single-reflection imaging are shown in Fig. 36[187]. This work is most practical in applications requiring compact, single-shot polarization imaging and holds potential in fields such as food, pharmaceuticals, biomedical imaging, nanoscale structure characterization, and fundamental scientific research.
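
    As a minimal sketch of the quantity such a system recovers, Mueller calculus relates input and output Stokes vectors by S_out = M·S_in at every pixel; the example below uses a textbook polarizer matrix and toy dimensions rather than anything from Ref. [187].

```python
import numpy as np

# Mueller calculus: S_out = M @ S_in, where S = [S0, S1, S2, S3] is a Stokes vector.
# Textbook Mueller matrix of an ideal horizontal linear polarizer (illustrative element).
M_polarizer = 0.5 * np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
], dtype=float)

S_rcp = np.array([1.0, 0.0, 0.0, 1.0])   # right-circularly polarized input
S_out = M_polarizer @ S_rcp
print("output Stokes vector:", S_out)     # half the power, horizontally polarized

# A Mueller-matrix imager assigns one such 4x4 matrix to every pixel:
H, W = 4, 4                               # toy image size (assumed)
mueller_image = np.tile(M_polarizer, (H, W, 1, 1))
stokes_out = np.einsum('hwij,j->hwi', mueller_image, S_rcp)
print("per-pixel output Stokes shape:", stokes_out.shape)   # (4, 4, 4)
```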


    Figure 35.The operational status of the hyperspectral imaging device[186]. (a) Schematic diagram of the structure of the device. (b) Schematic diagram of the basic modulation unit, including, from top to bottom, the metasurface, microlens (used to increase quantum efficiency), and CMOS image sensor. (c) Snapshot of spectral imaging. The light from the object to be imaged is incident on the metasurface superunit. (d) Hyperspectral imaging chip with reconfigurable metasurface superunits placed on top of the camera.


    Figure 36.Mueller matrix imaging reflection results[187]. (a) Imaging of the Mueller matrix placed in the “Fourier plane” using a 4f imaging system, conjugated with two metasurfaces. Metasurface 1 generates structured polarized light illuminating the object, while metasurface 2 diffracts and analyzes the resulting field imaged onto the CMOS sensor. The aperture is placed in the Fourier domain to limit the FOV, and the zero-order block is placed to prevent sensor saturation. (b) Chrysina gloriosa, commonly known as the “chirality beetle,” illuminated by right-circularly polarized (RCP) and left-circularly polarized (LCP) lights and imaged with a standard digital camera. (c) Original image of the chiral beetle captured using the compact Mueller matrix imaging system. (d) Full Stokes image derived from the original image. (e) Mueller matrix image obtained from full Stokes image using a no-reference method (demodulation and normalization).

    In summary, the introduction of metalenses not only meets the demands for miniaturization and integration in optical systems but also enables flexible control of the optical field phase, effectively overcoming the effects of material dispersion, structural dimensions, and other factors. Thus, metalenses have tremendous prospects in advanced optical imaging technologies, including integrated micro–nano-optical devices, hyperspectral imaging, and polarization detection. However, because metalenses are diffractive lenses, achieving a wide achromatic spectral bandwidth remains an open problem. Additionally, precisely aligning nanoscale components on centimeter-scale chips is difficult, leading to high manufacturing costs. Moreover, metalenses typically have micron-scale apertures, limiting the amount of light they can capture and resulting in relatively low transmission efficiency. Therefore, there is still a long way to go before metalenses can generate high-quality images.

    4.3.2. Wide-area optical system

    In traditional imaging, optical systems cannot simultaneously meet the demands for wide-FOV and high-resolution imaging because, for a single optical system, wide FOV and high resolution are mutually constraining. If the FOV increases, the focal length of the optical system decreases. Consequently, if the detector pixel size remains constant, each pixel subtends a larger angle in object space, reducing system resolution. Conversely, if resolution is increased, the FOV must be reduced.
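
    As a quick numerical illustration of this trade-off, the sketch below fixes a detector and varies the focal length: the FOV widens while the angular sampling per pixel coarsens. The detector parameters are illustrative.

```python
import numpy as np

# Fixed detector: 4096 x 4096 pixels with 3.45 um pitch (illustrative values).
n_pix = 4096
pixel_pitch = 3.45e-6                      # meters
sensor_width = n_pix * pixel_pitch         # ~14.1 mm

def fov_and_ifov(focal_length):
    """Full field of view (deg) and per-pixel angular sampling (urad) for a given focal length."""
    fov = 2 * np.degrees(np.arctan(sensor_width / (2 * focal_length)))
    ifov = pixel_pitch / focal_length * 1e6   # instantaneous FOV per pixel, in microradians
    return fov, ifov

for f in (0.025, 0.05, 0.1, 0.2):          # focal lengths in meters
    fov, ifov = fov_and_ifov(f)
    print(f"f = {f*1000:5.0f} mm  ->  FOV = {fov:5.1f} deg,  IFOV = {ifov:5.1f} urad/pixel")
```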

    The emergence of computational imaging technologies has enabled high-performance imaging with optical systems. Compared with traditional imaging, computational imaging combines the efficient processing power of computers with optical systems to achieve wide-area, high-resolution imaging. Computational optical imaging systems are now widely used in applications including drone monitoring, remote sensing mapping, machine vision, biomedicine, and intelligent monitoring. Ultra-high-pixel imaging with a large FOV can be achieved using the following systems: single-lens scanning, multi-scale imaging, multi-detector splicing, and multi-lens splicing systems.

    The single-lens scanning system usually installs a single high-resolution camera on a pan/tilt mount, controls the mount to change the imaging area of the camera, and uses image stitching to combine the captured images into a wide-FOV, high-resolution result. In 2007, Kopf et al. used a single-lens reflex (SLR) camera with a long lens to scan and stitch images, thereby obtaining a wide-area, high-resolution billion-pixel image[188], as shown in Fig. 37(a). However, because each shot relies on the camera's automatic exposure mode and the DOF provided by a long lens is extremely shallow, this approach is not well suited to scenes containing both near and distant objects. Moreover, rotating a single lens requires a long shooting time, and post-capture image stitching introduces a further delay, limiting the application scope of this technology. Generally, this imaging method is only suitable for wide-area, high-resolution imaging of static or quasi-static scenes and is not suitable for dynamic scenes or high-frame-rate video imaging.

    The multi-scale imaging system collects light energy through a large-scale main optical system and performs relay imaging through multiple levels of small-scale optical systems. The large-scale main optical system and multiple small-scale optical systems are cascaded and combined with image stitching to achieve wide-area, high-resolution imaging. Xidian University has conducted extensive research on multi-scale imaging. In 2019, Liu et al.[189] developed a prototype of a multi-scale, wide-field, high-resolution computational optical imaging system using concentric spherical lenses, as shown in Fig. 37(b). The main imaging system comprises a 113.8-mm-diameter four-laminated spherical lens, and the secondary imaging system comprises six sets of nine-piece double-Gaussian structures with a length of 62 mm. The total system length is 295 mm, with an F-number of 3.3 and a focal length of 47 mm. Each small camera has a full FOV of 8°, and a total of 399 small cameras are arranged in a hexagonal pattern on the first image plane of the primary imaging system. Combined with post-capture image stitching, the system achieves wide-field, high-resolution imaging over a 120°×90° field with a resolution of 5 cm at a distance of 5 km, totaling up to 3.2 billion pixels. This imaging system effectively realizes the engineering application of concentric multi-scale systems, enabling high-definition, distortion-free imaging of targets within 0 to 5 km, with good real-time performance and high adaptability.
In 2018, the Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, led by Shen et al., developed a distributed multi-focal-length multi-scale imaging system[190], which offers low distortion across the entire FOV and high imaging quality. Moreover, the resolution fall-off across the FOV is approximately 50% less than that of traditional design schemes. Numerous research achievements on multi-scale imaging systems have now been reported. Using a large-scale primary optical system to collect light energy and perform initial aberration correction, combined with small-scale secondary optical systems that image onto the detectors, effectively resolves the traditional conflict between large FOV and high resolution, making this an effective route to wide-field, high-resolution imaging. However, the size and complexity of the secondary optical systems depend mainly on the type and magnitude of the aberrations that must be corrected, and balancing information capacity against lens complexity during the design of the secondary imaging optics remains challenging.

    The multi-detector splicing system splits the internal image plane of the optical system, projects the light onto the photosensitive surfaces of several detectors, and finally uses digital image processing algorithms to assemble the full spliced image, thus achieving wide-area imaging. UltraCam-D (UCD) is an array aerial camera introduced by the Austrian company Vexcel in 2003. It comprises eight independent cameras: four panchromatic and four multispectral cameras[191]. The latter correspond to the red, green, blue, and infrared bands, with a focal length of only 28 mm, covering the FOV of the panchromatic cameras. The former, used to capture black-and-white images, employ a multi-detector stitching method—the panchromatic-band stitching scheme shown in Fig. 38(a1). The four panchromatic cameras are arranged in parallel, with 4, 2, 2, and 1 detectors placed on their focal planes for image acquisition, achieving 3×3 array imaging. Compared with traditional detector-stitching methods, placing adjacent detectors under different cameras effectively eliminates interference between detector pins. The Kepler telescope is an imaging system designed by the National Aeronautics and Space Administration (NASA) for searching for exoplanets orbiting Sun-like stars[192,193]. The optical system of this telescope mainly comprises a spherical primary mirror, a Schmidt corrector plate, and a focal plane array (FPAA) component, as shown in Fig. 38(a2). The FPAA comprises 21 modules with a total of 42 CCD detectors. These systems employ a seamed stitching method; seamless stitching is an alternative approach. The Defense Advanced Research Projects Agency and BAE Systems plc developed the autonomous real-time ground ubiquitous surveillance-imaging system (ARGUS-IS)[194], as shown in Fig. 38(a3). This system, equipped with a 1.8-billion-pixel airborne pod, achieves wide-field, high-resolution Earth observation. As shown in Fig. 38(a4), image stitching is used to seamlessly combine sub-images into a full-frame image with sufficient clarity to identify and track vehicles and pedestrians from an altitude of 6500 m. However, detector splicing suffers from gaps at the seams, which degrade image quality.
In addition, the optical system is bulky and expensive, which limits the practical application of this technology.

    The multi-lens splicing system arranges multiple small cameras according to specific rules and combines them with post-processing computational imaging to obtain wide-area, high-resolution images. Unlike multi-detector splicing, which splits the image plane of a single optical system and relays it onto the photosensitive surfaces of multiple detectors, multi-lens splicing is an object-side FOV splicing: the object-side FOV is divided into multiple sub-fields of view that are imaged separately, and image stitching is then used to achieve wide-area, high-resolution imaging. The Swiss Federal Institute of Technology in Lausanne has made significant progress in multi-lens panoramic imaging. In 2012, they introduced a multi-camera system inspired by the vision system of flying insects, known as the Panoptic Camera[195]. Building upon the Panoptic Camera, in 2013 they proposed a super-high-resolution light field imaging and recording system using panoramic methods[196]. The system can record omnidirectional video at 30 frame/s with a resolution exceeding 9000 pixel × 2400 pixel. It captures the surrounding light field within the FOV, creating room for various post-processing techniques such as quality-enhanced 3D cinematography, super-resolution depth map estimation, and applications requiring beyond-standard stitching and panorama generation with high dynamic range (HDR). In 2017, they introduced a miniaturized high-definition vision system inspired by insect eyes, matching the size and resolution of its natural counterparts[197]. Using distributed illumination, this system can operate in dark environments, making it suitable for endoscopic and adjacent imaging applications. In 2021, they proposed a hybrid synthetic imaging system combining the advantages of fisheye and compound eyes: it uses a single spherical lens as the objective, followed by a series of miniature cameras, enabling high-resolution imaging over a wide FOV. Shao et al. from Xidian University designed a multi-aperture imaging system[198], as shown in Fig. 38(b). The total FOV of the system is 123.5°×38.5°, with over 100 million pixels. It enables real-time global viewing of images and videos, supporting functions such as viewing and exporting local detail information. However, because multiple lenses are used, the entire system has a large volume and mass.
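
    A minimal sketch of the post-processing step shared by the scanning and splicing schemes above: stitching overlapping sub-images into one wide-field mosaic, here with OpenCV's high-level stitcher. The tile file names are placeholders, and the SCANS mode is an assumption suited to planar scanning geometries.

```python
import cv2

# Load overlapping sub-images captured by scanning a single camera or by a lens array
# (file names are placeholders for this sketch).
paths = ["tile_00.jpg", "tile_01.jpg", "tile_02.jpg", "tile_03.jpg"]
tiles = [cv2.imread(p) for p in paths]
assert all(t is not None for t in tiles), "missing input tiles"

# OpenCV's Stitcher estimates pairwise homographies from matched features,
# warps the tiles onto a common surface, and blends the seams.
stitcher = cv2.Stitcher_create(cv2.Stitcher_SCANS)   # SCANS mode suits flat/planar scanning setups
status, panorama = stitcher.stitch(tiles)

if status == cv2.Stitcher_OK:
    cv2.imwrite("wide_field_mosaic.jpg", panorama)
    print("mosaic size:", panorama.shape)
else:
    print("stitching failed with status", status)
```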


    Figure 37.Wide-field optical imaging method. (a) Single-lens scanning imaging system. Meade LX200 mount and its imaging effects[188]. (b) Multi-scale computational optical imaging system and imaging renderings[189].


    Figure 38.Wide-field optical imaging method. (a) Multi-detector splicing system[191,192,194]. (a1) UltraCam-D (UCD) camera detector splicing scheme. (a2) Complete focal plane array assembly of the Kepler telescope. (a3) ARGUS-IS imaging system. (a4) Full FOV image. (b) Multi-aperture imaging system prototype and imaging effect[198].

    In summary, wide-field, high-resolution computational optical imaging systems are essential in areas such as aerial reconnaissance, ecological monitoring, and support for social activities owing to their broad imaging range and high imaging performance. However, the limitations of the aforementioned four imaging technologies represent key issues that computational optical imaging systems need to address in the future.

    4.3.3. Coded aperture

    From an information perspective, computational imaging can be viewed as an optical imaging method that encodes information. Along the imaging chain, encoding can be introduced at almost any position. Typical approaches include encoded illumination (structured light, self-healing beams, and Fourier ptychographic imaging), encoded media (water and the atmosphere), encoded optical elements (coded apertures, coded shutters, and coded phase plates), and encoded detectors (polarization detectors and spectral detectors). By organically introducing physical optical information, with geometric optical imaging and information transmission as a guideline, higher-dimensional information can be obtained through information interpretation.

    Coded apertures were introduced to increase the optical throughput of optical systems without reducing resolution. In the mid-1950s, the French scientist Gilbert Malusi first proposed the coded aperture technique. Its key feature is adjusting the aperture to change, and thereby encode, the PSF. The coded aperture technique inserts a mask with a specific structure into the traditional optical aperture, thus addressing the limitations of traditional imaging systems and achieving high performance. Coded apertures originated in the field of astronomy in the 1960s: in X-ray and gamma-ray imaging, conventional optical elements such as lenses are ineffective for high-energy radiation, so a mask is instead carved into a material opaque to X-rays and gamma rays, and the scene is encoded indirectly for computational reconstruction. Such masks are the origin of coded apertures. In the late 1980s, Tilotta et al. began to use liquid crystal SLMs as coding templates and designed a Hadamard transform spectrometer using a fixed coding template[199]. The spectrometer provides a fully solid-state system without moving parts for spectral analysis. Encoding of the spatial or temporal domain of the light field during the signal acquisition process yields encoded, compressed measurements that, combined with back-end image reconstruction algorithms, enable high-resolution imaging[200] and high-speed imaging[201]. Dowski et al. designed an optical-digital system using wavefront coding[202]: by modifying the phase mask and digitally processing the intermediate image, the system provides near-diffraction-limited imaging performance over a large DOF. In 2004, Dowski and Cathey from the University of Colorado introduced a wavefront coding optical element, termed the phase mask[203], into optical imaging systems. By placing an odd-symmetry phase mask at the aperture of the optical system, light rays at the image plane do not converge to a point but form uniform thin beams within a certain defocus range, rendering the system insensitive to defocus and achieving extended DOF. The schematic of the system is shown in Fig. 39(c). The challenge of wavefront coding technology is designing phase masks with complex surface shapes that achieve defocus invariance over a sufficient range.


    Figure 39.(a) Coded aperture mask used in gamma-ray imaging. (b) Comparison of traditional sampling and coded exposure sampling[204]. (c) Wavefront coding imaging system[210]. (d) Schematic diagram of the coded aperture snapshot spectral imaging (CASSI) physical system[205]. (e) Schematic diagram of the CASSI imaging process[205].

    In 2006, Professor Ramesh Raskar from the Massachusetts Institute of Technology (MIT) introduced coded apertures into the field of computational imaging to address the challenge of achieving extended DOF[204]. Additionally, he used coded exposure in the temporal domain, combined with PSF estimation and image deconvolution, to address motion blur. The key idea of coded exposure is to sample motion in time while minimizing the loss of spatial frequencies. As shown in Fig. 39(b), conventional exposure leads to motion blur and the loss of high spatial frequencies, whereas coded exposure retains these frequencies, as is evident from the recovered results.
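
    A small 1D simulation of the coded ("fluttered") exposure idea: motion blur corresponds to convolution with the shutter pattern, so a conventional box shutter has spectral nulls that destroy high frequencies, whereas a pseudorandom binary code keeps its spectrum bounded away from zero and the blur remains invertible. The particular code below is illustrative, not the one used in Ref. [204].

```python
import numpy as np

# Motion blur over an exposure of N time slots is a convolution with the shutter pattern.
N = 32
box_shutter = np.ones(N)                                    # conventional exposure: open the whole time
rng = np.random.default_rng(0)
coded_shutter = rng.integers(0, 2, size=N).astype(float)    # illustrative pseudorandom flutter code
coded_shutter[0] = 1.0                                       # keep the shutter open at least once

def spectral_conditioning(shutter, n_freq=256):
    """Smallest magnitude of the shutter's frequency response; near-zero values
    mean those spatial frequencies are lost and deconvolution is ill-posed."""
    spectrum = np.abs(np.fft.rfft(shutter, n_freq))
    return spectrum.min()

print("box shutter   min |H(f)| =", round(spectral_conditioning(box_shutter), 4))
print("coded shutter min |H(f)| =", round(spectral_conditioning(coded_shutter), 4))
# The coded shutter's minimum is (typically) much larger, which is why the blur it
# produces can be inverted without amplifying noise at the nulled frequencies.
```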

    The essence of computational imaging lies in the manipulation of the light field, with encoding playing an essential role in achieving optimal results. Encoding fundamentally involves modulating the light field through various means to enhance certain mathematical characteristics in specific projection dimensions. Within the light field dimensions, spectral information carries essential details about object composition, structure, and material properties, rendering it invaluable in applications such as aerospace remote sensing, medical diagnostics, and machine vision. Traditional spectral imaging techniques typically scan along the spatial or spectral dimension to sequentially acquire spectral information from the surface of the target object. However, because of long exposure times, traditional spectral imaging methods are unsuitable for capturing dynamic scenes. In 2007, the research team of Professor David Brady at Duke University introduced a novel linear imaging system that revolutionized this approach. The hardware setup includes an objective lens with a coded aperture, a relay lens, filters, a dispersive prism, and a monochrome camera, as shown in Fig. 39(d)[205]. This system, known as coded aperture snapshot spectral imaging (CASSI), enables the acquisition of complete spectral images in a single exposure, as shown in Fig. 39(e). This capability offers a significant advantage for rapidly capturing spectral information from dynamic scenes. In 2013, researchers at the University of Delaware in the United States used a DMD as a spatially varying coded aperture and employed a prism as the dispersing element to develop a CASSI system[206]. Their research focused on compressive sensing algorithms to reconstruct spectral images and proposed a higher-order computational model to improve reconstruction quality. In 2014, Arce et al. also developed a compressed coded aperture spectral imaging device, thus addressing the limitations of traditional spectrometers, which require proportional scanning of multiple regions[207]. Uniquely, the device requires only a few FPAA measurements to sense the entire data cube, and in some cases only one FPAA measurement is required. Yuan et al.[208] and Wang et al.[209] proposed a hybrid dual-camera system consisting of CASSI and an RGB camera; by fusing the scene coding information obtained from the CASSI branch and the color information acquired from the RGB branch, they achieved high-fidelity image reconstruction. Coded aperture compressed spectral imaging benefits from using simple optical sensing elements to control compressed projections, yielding remarkable efficiency and thus holding promise for many applications in remote sensing and surveillance. Its strength lies in combining well-designed coded apertures with computational imaging theory. Although CASSI spectral imagers naturally embody the synergy of these domains, new spectral imagers and versatile multidimensional imaging sensors continue to be developed by utilizing advanced optics and photonics devices as sensor elements. Coded aperture optimization and optical sensing show great potential for multimodal and multidimensional imaging in the near future, providing a basis for further signal processing exploration.
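
    To make the CASSI measurement process above concrete, the following is a minimal sketch of the single-disperser forward model: each spectral band is masked by the coded aperture, sheared by a band-dependent shift that stands in for the prism dispersion, and summed on the monochrome detector. The cube size and the one-pixel-per-band shear are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

H, W, L = 64, 64, 8                              # spatial size and number of spectral bands (assumed)
cube = rng.random((H, W, L))                     # ground-truth spectral data cube
mask = (rng.random((H, W)) > 0.5).astype(float)  # binary coded aperture

def cassi_measurement(cube, mask, shift_per_band=1):
    """Single-disperser CASSI forward model: code, shear along one axis, integrate over bands."""
    H, W, L = cube.shape
    detector = np.zeros((H, W + (L - 1) * shift_per_band))
    for l in range(L):
        coded_band = cube[:, :, l] * mask                 # coded aperture applied to band l
        s = l * shift_per_band                            # dispersion shift for band l
        detector[:, s:s + W] += coded_band                # accumulate on the 2D detector
    return detector

y = cassi_measurement(cube, mask)
print("data cube:", cube.shape, "-> single snapshot:", y.shape)
# Reconstruction inverts this many-to-one mapping, typically with sparsity priors
# (compressive sensing) or learned priors, as discussed above.
```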

    4.3.4. Minimalist optical system

    Minimalist optical systems have smaller volume and mass, lower system complexity, reduced assembly difficulty, and higher energy transmittance than traditional optical systems comprising multiple optical lenses. These advantages are conducive to shortening the manufacturing steps of optical lenses.

    The concept of single-lens computational imaging was first proposed by Heide et al. in 2013. They used a camera containing only a single glass element to capture images and employed a series of back-end computational imaging techniques to eliminate the effects of optical aberrations, thus simplifying the front-end optics to make the system light and cost-effective[211]. The imaging effect is shown in Fig. 40(a). Building on this work, many studies have investigated how to achieve high-quality imaging with a single lens. In 2015, Li et al. improved single-lens imaging by capturing images with a single lens in place of a complex lens assembly and employing computational photography techniques to remove the resulting imaging artifacts, thereby enhancing the deblurring quality of single-lens imaging[212]. In 2017, they further combined single-lens optics with capture-and-correction methods based on computational photography[213]. This work further improved the lens design by correcting chromatic aberration, after which simple image deconvolution methods could effectively produce high-quality images.


    Figure 40.(a) Single-lens camera input image and deblurring results[211]. (b) Joint end-to-end optimization of the optical design framework[214]. (c) Diagram of the principle of the diffractive telescope imaging system experimental platform and comparison of the results with and without the image recovery point target[220].

    The global optimization of optical systems and image processing has been extensively investigated. In 2006, Robinson et al. from Ricoh Innovations made a new attempt at optical-digital co-design[214]. They performed algorithmic restoration of blurred images during optical system design, calculated the per-pixel root mean square error between the restored image and the ideal image, and used this as an evaluation metric to iteratively optimize the optical system. This method was validated via the simulation of a single-lens system and compared with single-lens and double-lens systems designed using traditional sequential optical-then-digital methods. The design framework is shown in Fig. 40(b), indicating that joint design improves image restoration and simplifies the optical system. In 2008, Mirani et al. proposed a similar approach of jointly designing optical systems and image restoration in computational imaging[215]. They established the optical system transfer function and the image processing transfer function and used the mean square error between the reconstructed and original target scene images as a performance metric for optimization, thereby achieving end-to-end optimization. This method treats the optical transfer function as a special filtering function from the perspective of image restoration; however, it did not extensively investigate the effects of optical aberrations, and the treatment of optical aberrations when translating the physical process into a mathematical model was insufficient. Dowski and Cathey pioneered a new imaging method termed wavefront encoding, which utilizes freeform optics and signal processing to reduce system complexity and provide high-quality images[216,217]. Robinson and Stork introduced a novel framework for designing digital imaging systems based on an end-to-end evaluation function using the pixel mean square error[218]. Building upon these methods, in 2018, researchers including Xiaopeng Shao from Xidian University proposed a new holistic optimization design model, from the optical system through to image processing, termed SWaP (Size, Weight & Power/Price)[219]. The optical SWaP computational imaging method can address the limitations of traditional optical imaging, significantly reducing the size, weight, power consumption, and cost of military electro-optical systems. This method can be widely applied in areas such as wide-area surveillance and alarm systems, airborne electro-optical equipment, ground-based early warning systems, and high-resolution Earth observation systems. In 2019, Yang et al. applied the concept of global optimization in computational imaging and employed an adaptive Wiener filtering algorithm to perform image deconvolution on a practical diffractive telescope imaging system, effectively improving the imaging quality of a simple optical system[220]. The diffractive telescope imaging system is shown in Fig. 40(c).
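
    As a rough illustration of the restoration stage whose error these joint designs iterate on, the following sketch applies frequency-domain Wiener deconvolution to a synthetically blurred, noisy image and reports the per-pixel RMSE. The Gaussian PSF and noise level are assumptions, not the PSF of any system in Refs. [214]–[220].

```python
import numpy as np

rng = np.random.default_rng(2)

def gaussian_psf(size, sigma):
    """Normalized 2D Gaussian PSF (illustrative stand-in for residual lens aberrations)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    psf = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return psf / psf.sum()

def wiener_deconvolve(blurred, psf, nsr):
    """Frequency-domain Wiener filter: X = conj(H) / (|H|^2 + NSR) * Y."""
    H = np.fft.fft2(np.fft.ifftshift(psf), s=blurred.shape)
    Y = np.fft.fft2(blurred)
    X = np.conj(H) / (np.abs(H)**2 + nsr) * Y
    return np.real(np.fft.ifft2(X))

# Simulate blur + noise on a random test scene, then restore it.
scene = rng.random((128, 128))
psf = gaussian_psf(size=128, sigma=2.0)
blurred = np.real(np.fft.ifft2(np.fft.fft2(scene) * np.fft.fft2(np.fft.ifftshift(psf))))
noisy = blurred + 0.01 * rng.standard_normal(blurred.shape)

restored = wiener_deconvolve(noisy, psf, nsr=1e-3)
rmse = np.sqrt(np.mean((restored - scene) ** 2))
print(f"per-pixel RMSE after restoration: {rmse:.4f}")   # the kind of metric iterated on in joint design
```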

    In the end-to-end design of ultra-compact optical imaging systems, neural networks are employed to optimize the system parameters. Characteristic PSFs are derived from these optical system parameters; the imaging degradation caused by the PSF and the correction capability of the algorithm are then computed, with the final corrected image quality serving as the evaluation metric. This iterative process yields optimal parameters for the optical system, achieving high-quality imaging in ultra-compact optical imaging systems. With the increasing demand for miniaturized and lightweight optical systems, novel optical design technologies based on computational imaging theory offer lower processing difficulty, shorter manufacturing cycles, and lower manufacturing costs than traditional optical systems while maintaining imaging clarity. Using these technologies, the weight of optical systems can be significantly reduced compared with traditional designs, offering promising development prospects. In 2020, Metzler proposed an end-to-end method that jointly optimized diffractive optical elements (DOEs) and neural networks to achieve single-shot HDR imaging[221]. In the same year, Dun et al. proposed a snapshot HDR imaging method that uses DOEs to map saturated highlights into adjacent unsaturated areas by learning an HDR coding in a single image[222]. They introduced a novel rank-1 DOE parameterization, significantly reducing the optical search space while effectively encoding high-frequency details, and proposed a customized reconstruction network tailored to this rank-1 parameterization to recover clipped information from the encoded measurements. The proposed end-to-end framework was validated through simulation and real experiments, achieving a peak signal-to-noise ratio (PSNR) improvement of over 7 dB compared with state-of-the-art end-to-end designs. In 2023, Wei Shijie from Xidian University proposed an optimized encoding method for phase plates within a deep learning framework, relaxing the requirements for full-field aberration correction[223]. As shown in Fig. 41, compared with traditional Cooke triplet and doublet lens systems, the residual aberrations, together with the encoding mask, form an optical encoding combination that can be digitally decoded, reducing the optical complexity of traditional systems. Results indicate that this method ultimately obtains images with the best resolution, and the DOF of the system is increased 13-fold, which is significant for high-precision detection and placement of small parts in machine vision.
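
    Figure 41 scores the three systems with SSIM and PSNR across the defocus range. The snippet below is a minimal sketch of computing these two metrics with scikit-image; the synthetic arrays stand in for the ground-truth and restored images and are not the paper's data.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

rng = np.random.default_rng(3)

# Placeholder images standing in for the ground truth and a restored, defocused capture.
reference = rng.random((256, 256))
degraded = np.clip(reference + 0.05 * rng.standard_normal(reference.shape), 0, 1)

psnr = peak_signal_noise_ratio(reference, degraded, data_range=1.0)
ssim = structural_similarity(reference, degraded, data_range=1.0)
print(f"PSNR = {psnr:.2f} dB, SSIM = {ssim:.3f}")
# In an end-to-end design loop, such scores evaluated across the defocus range
# drive the joint update of the optical (phase mask) and algorithmic parameters.
```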


    Figure 41.Imaging results and image quality evaluation of Cooke triplet, doublet lens, and deep-learning-based wavefront-encoding optical systems[223]. (a) Optical structure models of the three optical systems. (b)–(f) The imaging results of different systems at defocus distances of −0.2, −0.1, 0, 0.1, and 0.2 mm, respectively. (g) Structural similarity (SSIM) values of different systems within the defocus range. (h) Peak signal-to-noise ratio (PSNR) values for different systems within the defocus range.

    At this stage, minimalist optical systems have achieved many results. The optical system structure can be simplified through joint optical-algorithm design, and simple surface shapes can be used to realize optical parameters that are convenient for image processing. The optical-image joint design method adopts the idea of global optimization, and the automatic iteration of joint optimization can achieve appropriate and complementary aberration correction. However, minimalist optical systems still have several limitations. (1) Poor environmental adaptability: only an accurate imaging model can recover the location, shape, size, and other information of the target, but the accuracy of the model is significantly affected by environmental conditions. (2) The algorithms require priors and pre-training: to ensure correctness and effectiveness, assumptions or preprocessing must be built into the algorithm, and this preparation is difficult. (3) The real-time performance of imaging needs to be improved: data processing, analysis, and image reconstruction are time-consuming, and complex algorithms and large-scale data increase processing time.

    4.3.5. Adaptive optics system

    To address the influence of complex channels on optical imaging, adaptive optics (AO) technology is usually adopted to detect and compensate for random interference in the environment, thus achieving optical imaging with near-diffraction-limited resolution. AO is a sophisticated technology that integrates modern optics, optoelectronics, computer science, automatic control, functional materials, and precision machinery, focusing on the study of wavefront aberration; it measures and corrects wavefront aberration in real time[224]. An adaptive optical imaging system mainly comprises three parts: wavefront sensing, wavefront control, and wavefront correction. Wavefront sensing devices measure the dynamic wavefront aberration in real time, a fast electronic system performs the control calculations, and wavefront correction devices then correct the aberration in real time, enabling the optical system to automatically adapt to changes in the external environment, counteract dynamic disturbances, and remain in good working condition. Finally, high-resolution light-intensity recording devices are used to record and image the object.

    To overcome the interference of atmospheric turbulence and observe the true appearance of stars from the ground, in the 1950s the astronomer Horace Babcock proposed using a sensor to measure the wavefront distortion in the light beam and then compensating for it with a deformable optical element to restore the original wavefront, thus eliminating the influence of atmospheric turbulence and improving image clarity[225]. This marked the beginning of the AO era. The idea overturned the traditional belief that higher resolution can only be obtained by improving the manufacturing accuracy of optical instruments, enabling optical systems to actively adapt to external errors and maintain high resolution. However, because of the technological limitations of the time, this idea could not be implemented in engineering. It was not until 1972 that Itek Corporation in the United States developed a deformable mirror and an interferometric wavefront sensor and, based on these, built the world's first adaptive optical system, which could effectively correct aberrations introduced by atmospheric turbulence[226]. In 2013, Wu et al. investigated the transmission of light beams using a dual adaptive optical system in a turbulent atmosphere[227]. They established a typical model of the dual adaptive optical system and analyzed the working principle of the system and the theory of beam propagation through a turbulent atmosphere. They introduced the power efficiency of the received beam and the beam quality to evaluate the performance of the optical system. To eliminate the impact of atmospheric dispersion on astronomical observations with telescopes, in 2023, Gao et al. leveraged an adaptive optical system and a linear fitting method to conveniently measure atmospheric dispersion from scientific images and control the atmospheric dispersion corrector (ADC) for dispersion correction, thereby enabling the full width at half-maximum of the final image to approach the diffraction limit of the telescope[228].

    In telescopes, light from a natural or artificial guide star passing through the atmosphere is collected by the telescope, reflected off a deformable mirror, and directed onto a wavefront sensor, as shown in Fig. 42(a). The wavefront sensor determines the pointwise phase of the received wavefront, and this information is used to set the shape of the deformable mirror so as to minimize aberrations and achieve the best resolution. Thanks to AO, the Keck Observatory can resolve stars near the supermassive black hole Sagittarius A* at the Galactic Center, and a similar optical geometry can be applied to microscopes. Unlike in telescope systems, isolated self-luminous objects rarely occur naturally in biological samples. Therefore, researchers generate guide-star-like sources by exciting fluorescence or by using backscattered excitation light. For a wide-field fluorescence microscope that illuminates an extended volume simultaneously, it is usually only necessary to correct for aberrations in the emitted fluorescence. Azucena et al.[229] injected fluorescent beads into fruit fly embryos and measured the aberrations of fluorescence passing through the embryos using a Shack–Hartmann (SH) wavefront sensor in a wide-field microscope. Jorand et al.[230] integrated fluorescent beads into 3D multicellular tumor spheroids, correcting aberrations caused by the spheroids in the detection path of selective plane illumination microscopy. In both examples, a closed loop between the wavefront sensor and the deformable mirror minimizes the detected wavefront errors, thus improving image quality, as shown in Figs. 42(b) and 42(c).
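
    A minimal numerical sketch of the sense-fit-correct loop described above: a distorted wavefront is fitted with a few low-order modes (tilts and defocus) by least squares and the fit is subtracted, emulating the role of the deformable mirror. The modes and the wavefront itself are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

# Pupil grid and a few low-order correction modes (tilt x, tilt y, defocus) on a unit disk.
N = 64
y, x = np.mgrid[-1:1:N*1j, -1:1:N*1j]
pupil = (x**2 + y**2) <= 1.0
modes = np.stack([x, y, 2*(x**2 + y**2) - 1])          # illustrative low-order basis

# "Measured" distorted wavefront: a random combination of the modes plus small residual ripple.
true_coeffs = rng.standard_normal(3)
wavefront = np.tensordot(true_coeffs, modes, axes=1) + 0.05 * rng.standard_normal((N, N))

# Least-squares fit of the modes inside the pupil, then subtract (the correction step).
A = modes[:, pupil].T                                   # (n_pixels, n_modes)
coeffs, *_ = np.linalg.lstsq(A, wavefront[pupil], rcond=None)
corrected = wavefront - np.tensordot(coeffs, modes, axes=1)

print("fitted coefficients:", np.round(coeffs, 3))
print("rms before:", wavefront[pupil].std().round(3), " rms after:", corrected[pupil].std().round(3))
```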


    Figure 42.Adaptive optics using direct wavefront sensing[334]. (a) The distortion of the wavefront (blue lines) is directly measured with a wavefront sensor and minimized by a wavefront modulator (e.g., a deformable mirror) to improve the image quality of a telescope. Sgr A*, Sagittarius A*. (b) Beads inside a Drosophila embryo. (c) Neurons in a zebrafish larval brain obtained without and with AO correction.

    AO was first applied to biological imaging in the human eye: direct wavefront sensing measures the ocular aberrations using light reflected back from the retina as the guide star, and the aberrations are then corrected to achieve high-resolution retinal imaging[231]. In 2014, Yang et al. achieved optical stabilization and digital image registration in the adaptive optics scanning laser ophthalmoscope (AOSLO). Through real-time digital image registration, residual eye movements remaining after optical stabilization can be corrected, thereby efficiently obtaining high-resolution retinal images[232]. In 2023, Soohyun Lee proposed a high-speed AO confocal ophthalmoscope combining a DMD with a high-speed 2D CMOS camera[233]. This system can easily control the trade-off between image acquisition rate and contrast by applying different illumination patterns on the DMD. The camera is synchronized with the DMD, which projects AO-pre-corrected multi-point patterns onto the human retina for parallel scanning. Compared with standard flood illumination, the multi-point scheme enables frame acquisition rates of up to 250 frame/s and a 2–3 times improvement in contrast.

    Currently, AO systems are widely applied in research fields such as astronomical observation, laser beam shaping, laser precision engraving, human retinal imaging, biomedicine, microscopy, wireless laser communication, and photolithography. Although significant improvements have been achieved, these systems still face technical challenges, including system complexity, difficulty of widespread adoption, high cost, the control difficulties of aberration compensation devices such as deformable mirrors, and an inability to handle interference caused by obstacles in complex imaging environments.

    The computational light field adaptive optical imaging system aims to mitigate the dual interference of complex environments on the amplitude and phase of the imaging light field using hierarchical processing methods in computational imaging. This system involves measuring the overall light field of the target and interference. By exploiting the distribution characteristics of the 4D light field information of the target and interference, computational methods are used to effectively distinguish and filter out the interference. In the field of computational imaging, the emergence of the light field camera has provided a novel solution for integrating imaging systems. The light field camera, developed by Stanford University, can capture 4D light field imaging information within a large DOF range. It enables effective imaging at different depths via post-processing of the data after a single shot. Although the resolution of the image recorder is reduced, the light field camera simultaneously achieves a resolution in the depth direction of the spatial light field. Additionally, the massive amount of data in a single image contains sufficient light field information, offering high autonomy in post-processing. This approach offers advantages in terms of a large detecting FOV, using extended objects as wavefront information-solving beacons, replacing mechanisms such as deformable mirrors for aberration compensation, and providing a large dynamic range for aberration detection and compensation. The system is compact and cost-effective, and it effectively eliminates the impact of obstacles on imaging in high-dimensional light fields while compensating for environmental wavefront distortions. This method shows great potential in various research areas, such as multi-level depth imaging[234], 3D information modeling[235,236], imaging through obstacles, full-field multi-view wavefront detection[237,238], and multi-level phase recovery[239]. Additionally, the potential applications of AO in flow field 3D structure detection, image enhancement, lucky imaging, describing and guiding incoherent illumination, describing the output of light fields through resonant cavities, multiplexing in laser communication, and high-energy fiber laser mode decomposition in various research areas are also worth exploring.
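
    A minimal shift-and-add sketch of the digital refocusing ("capture now, focus later") operation enabled by 4D light field data: each sub-aperture view is shifted in proportion to its position in the aperture and the views are averaged, synthesizing focus at different depths. The light field here is a random placeholder, and the integer-pixel shifts are a simplification.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy 4D light field L[u, v, y, x]: a 5x5 grid of sub-aperture views, each 64x64 pixels.
U = V = 5
H = W = 64
lightfield = rng.random((U, V, H, W))   # placeholder data standing in for a captured light field

def refocus(lightfield, alpha):
    """Shift-and-add refocusing: each view (u, v) is shifted by alpha*(u-uc, v-vc) and averaged."""
    U, V, H, W = lightfield.shape
    uc, vc = (U - 1) / 2, (V - 1) / 2
    out = np.zeros((H, W))
    for u in range(U):
        for v in range(V):
            dy = int(round(alpha * (u - uc)))
            dx = int(round(alpha * (v - vc)))
            out += np.roll(lightfield[u, v], shift=(dy, dx), axis=(0, 1))
    return out / (U * V)

# Different alpha values synthesize focus at different depths from the same capture.
for alpha in (-2, 0, 2):
    img = refocus(lightfield, alpha)
    print(f"alpha = {alpha:+d}: refocused image mean = {img.mean():.3f}")
```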

    Multi-level depth imaging: Because the full-field data record the intricacies of the light field, distribution information can be obtained at different levels of the light field using the full-field data, combined with computational imaging principles, as shown in Fig. 43(b). With the multi-level depth distribution of the light field, digital refocusing imaging of targets at different depths can be achieved, enabling a “capture now, focus later” functionality.


    Figure 43.(a) Schematic diagram of the light field camera structure[235]. (b) All-light images and detailed information[239]. (c) Optical model of the light field microscope[240].

    3D information modeling: The full-field data captured by a light field camera can be considered equivalent to the data collected by an imaging array system. Using the information from the imaging array along with the principle of parallax, depth information for each point of the imaged target can be obtained. The light field model for this is shown in Fig. 43(a).

    Imaging through obstacles: Through the application of AO principles and the analysis of the distribution of the obstacle in the 4D light field information, the light field information of the layer where obstacles occur can be weakened while enhancing the light field information of the target layer. This enables imaging through obstacles.

    Full-field multi-view wavefront detection: Leveraging the AO theory using full-field data enables complex phase information to be solved. Compared with traditional wavefront sensing methods, this approach utilizes few microlenses to perform wavefront detection in a single view direction. By partitioning the entire sub-aperture plane, it is possible to simultaneously achieve large-field, multi-view wavefront detection, where the wavefront detection of each view direction is independent and unaffected by other directions.

    Multi-level phase recovery: Using the full-field data and AO theory, complex phase information can be calculated. Combined with tomographic principles, multi-level phase distribution can be obtained. This system is simpler, has better synchronization and uniformity, and is cheaper than traditional multi-conjugate AO.

    4.3.6. Single-pixel imaging

    Unlike traditional high-resolution cameras, single-pixel imaging uses only a single-pixel detector for spatial imaging. During imaging, the target scene is spatially sampled using structured illumination, and the corresponding reflected or transmitted light intensity values are synchronously recorded by the single-pixel detector. The detection signal is then correlated with the distribution of the structured illumination to reconstruct the target image. Single-pixel imaging can be further divided into passive and active single-pixel imaging. The most typical example of a passive single-pixel imaging device is an optical compressive imaging camera. Active single-pixel imaging generally refers to computational ghost imaging. Both imaging modes originate from different research fields but have similar operational principles and basic mechanisms.
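
    A minimal simulation of the measure-and-correlate loop described above, using a complete Hadamard basis as the structured illumination and an ideal differential (±1) detector signal; with a full orthogonal basis the image is recovered by a simple weighted sum of the patterns (real systems typically use fewer, often compressive, measurements). scipy's Hadamard helper is used for the basis.

```python
import numpy as np
from scipy.linalg import hadamard

# Scene to image: 16x16 = 256 pixels (a bright square on a dark background).
n = 16
scene = np.zeros((n, n))
scene[4:12, 4:12] = 1.0
x = scene.ravel()                      # scene as a vector of length n*n

# Structured illumination: rows of a 256x256 Hadamard matrix (+1/-1 patterns).
Hmat = hadamard(n * n).astype(float)

# Single-pixel measurements: total intensity returned for each pattern.
# +1/-1 patterns are realized in practice as two complementary binary patterns;
# here the ideal differential signal is used directly.
measurements = Hmat @ x

# Reconstruction: correlate measurements with the patterns (inverse Hadamard transform).
x_hat = (Hmat.T @ measurements) / (n * n)
print("max reconstruction error:", np.abs(x_hat - x).max())
```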

    Passive single-pixel imaging is most commonly realized with optical compressive imaging cameras. In 2006, Donoho et al. introduced the concept of compressive sensing in the field of signal processing[241]. The theory of compressive sensing states that sparse signals can be accurately reconstructed at sampling rates far below the Nyquist requirement. In 2007, to further validate the theory of compressive sensing, Takhar et al. built an imaging system using a single-pixel detector and a DMD[242]. The schematic of this camera is shown in Fig. 44(a). In this system, the target scene is imaged onto the DMD, and spatial coding of the image is performed by displaying a series of randomly distributed patterns on the DMD. The light intensity after each coding is recorded by the single-pixel detector: each mirror pattern produces a voltage on the single photodiode that constitutes one measured value. Using the distribution information of the coding patterns together with the detection signals and compressive sensing algorithms, the scene image can be reconstructed. Importantly, the number of pixels in the reconstructed image is significantly larger than the number of measurements taken during imaging. They termed this imaging system optical compressive imaging.


    Figure 44.(a) Frame diagram of the compressed imaging (CI) camera and its imaging results[242]. (b) Hyperspectral “ghost imaging” camera experimental schematic and experimental results[248].

    Active single-pixel imaging refers to computational ghost imaging. In 2001, researchers at Boston University used entangled photon pairs to achieve ghost imaging[243]. However, a year later, researchers at the University of Rochester demonstrated ghost imaging using classical light sources. Their experiment was controversial in the academic community and sparked a debate about whether ghost imaging is a quantum phenomenon or belongs to classical theory. In 2004, the Lugiato group from Italy theoretically proved the possibility of ghost imaging with incoherent thermal light sources by comparing the correlation properties of entangled and thermal light[244]. In 2005, the research group led by Yanhua Shih at the University of Maryland achieved the first experimental demonstration of ghost imaging using pseudothermal light sources[245]. These scientific verifications established that ghost imaging is not a purely quantum phenomenon but an imaging technique based on second-order correlations of the optical field. In 2008, Shapiro from MIT introduced SLMs to generate customized structured light fields in ghost imaging experiments[246]. This eliminated the need for a reference light path to measure the distribution of the optical field and achieved computational ghost imaging with a single optical path, revolutionizing the ghost imaging system and significantly improving its practicality. In 2009, high-order correlation reconstruction algorithms for ghost imaging were confirmed theoretically and experimentally, and high-order correlation imaging was reported to enhance the SNR and contrast of reconstructed images[247]. In 2018, Liu et al. from the Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, used a spectral camera based on ghost imaging via sparsity constraints to obtain a 3D spatial–spectral data cube of a target in a single snapshot with a 2D detector. The experimental schematic and results of the camera are shown in Fig. 44(b)[248]. The results show that the new system can modulate the spatial and spectral resolutions separately and makes it possible to optimize the light field fluctuations at different wavelengths depending on the imaging scene.
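
    A small simulation in the spirit of the computational ghost imaging described above: known random patterns play the role of the SLM-generated illumination, a bucket (single-pixel) detector records one value per pattern, and the image is recovered from the second-order correlation between the bucket signal and the patterns. The sizes and pattern count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

# Object transmission function (a simple cross) on a 32x32 grid.
n = 32
obj = np.zeros((n, n))
obj[14:18, :] = 1.0
obj[:, 14:18] = 1.0

# Computational ghost imaging: K known random patterns, one bucket value each.
K = 10000
patterns = rng.random((K, n, n))                              # illumination patterns (assumed statistics)
bucket = np.tensordot(patterns, obj, axes=([1, 2], [0, 1]))   # total light collected per pattern

# Second-order correlation: G(x, y) = <B * I(x, y)> - <B><I(x, y)>
ghost = np.tensordot(bucket, patterns, axes=([0], [0])) / K - bucket.mean() * patterns.mean(axis=0)
ghost = (ghost - ghost.min()) / (ghost.max() - ghost.min())   # normalize for display

# Correlation with the true object as a crude quality check.
corr = np.corrcoef(ghost.ravel(), obj.ravel())[0, 1]
print(f"correlation between reconstruction and object: {corr:.2f}")
```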

    Single-pixel imaging offers a unique sampling approach. Whereas modern digital cameras use pixelated detector arrays to capture images, single-pixel imaging samples a scene using a series of masks and correlates the content of these masks with the corresponding intensity measurements of a single-pixel detector to reconstruct the image. Typically, a series of structured light patterns is projected onto the object and the light intensity is measured using a single-pixel detector. Initially, these light patterns were generated by rotating ground glass; subsequently, SLMs and DMDs were used to create illumination patterns artificially. However, sampling at the Nyquist rate requires a very large number of illumination patterns, and reducing the sample count sacrifices SNR or resolution[249]. Additionally, the single-pixel video method[250] depends largely on the modulation speed of the DMD and on high data compression rates. Therefore, choosing an appropriate approach to compress the large data volume, or to flexibly balance SNR, resolution, and imaging speed, is crucial. Regions of interest (ROIs) are highly valuable in image and video encoding[251]: their use optimizes coding performance, reduces processing time and bandwidth usage, and enhances accuracy in specific regions. In 2017, Phillips et al. adopted a novel strategy that emulates the foveal vision system of animals to achieve dynamic region-of-interest sampling in single-pixel imaging[252]. In 2019, Ye et al. demonstrated the application of secure regions of interest (SROIs) in single-pixel imaging through two experiments[253]. Experiment A used a DMD to generate binary illumination patterns based on the Hadamard matrix, as shown in Figs. 45(a) and 45(c); Experiment B used a digital light projector to display composite colored illumination patterns based on the Hadamard matrix, as shown in Figs. 45(b) and 45(d). Hadamard matrices were linearly combined into illumination patterns with different size and color distributions, thereby enhancing the resolution and spectral information of the region-of-interest imaging. Moreover, this linear mapping method has high randomness, which they implemented as a multicolor cipher pattern so that only users possessing the cipher pattern can decrypt the correct image. Various illumination patterns can be displayed on the object, the light intensity is recorded using a single-pixel detector, and a simple, fast algorithm can then be employed to reconstruct the object. This approach has applications in fields such as single-pixel imaging of biological tissues, real-time imaging of moving targets, and multispectral image fusion, confirming its feasibility.


    Figure 45.(a), (b) Schematic diagrams of the two experimental setups. (c) Schematic diagram of generating a composite light pattern (64×64) in Experiment A. (d) Schematic diagram of generating a composite color illumination pattern (64×64) in Experiment B[253].

    Single-pixel imaging utilizes an SLM to encode the spatial information of an object into a 1D optical signal, which is then decoded to reconstruct the image using a non-scanning point sensor device. Specifically, in projection-based single-pixel imaging, the object is illuminated with pre-programmed 2D spatial coding patterns, and the reflected or transmitted light signals are collected. A single-pixel detector retrieves the spatial information of the target along with the fine details of the object from these light signals. The single-pixel detector serves as the imaging device, detecting the object either by providing a temporally varying detection end signal using an SLM or providing temporally varying structured illumination to the scene. In addition to cost-effectiveness, other advantages of single-pixel imaging include low dark current, high sensitivity, and high-quality images. Additionally, introducing compressive sensing theory into single-pixel imaging enables reconstruction with few measurements. In certain cases, single-pixel imaging techniques have shown competitive advantages over traditional cameras in practical application scenarios owing to the breakthrough in detector requirements. These advantages include a high SNR, wide spectral range, low cost, higher detection efficiency, low dark count, and faster temporal response. Over the past decade, single-pixel imaging has attracted significant attention for applications in various fields such as infrared imaging, gas imaging, photoacoustic imaging, 3D imaging, terahertz imaging, tomography, neutron imaging, encryption imaging, and lensless imaging.

    In the field of infrared imaging, in 2018, Zeng et al. pixelated a hybrid graphene metasurface[254] to create an SLM prototype for high-frame-rate single-pixel imaging, demonstrating an order-of-magnitude improvement over traditional liquid crystal or micromirror SLMs. The introduction of single-pixel imaging provides possibilities for wavefront engineering in infrared technology.

    In the field of photoacoustic imaging, in 2019, researchers at University College London experimented using a single-pixel camera for 3D compressed sensing photoacoustic tomography[255]. The experimental setup and results are shown in Fig. 46, demonstrating the ability to reduce data acquisition time and the required amount of data, thus providing high-resolution images with large fields of view.


    Figure 46.Experimental setup and results of photoacoustic imaging[255]. (a) Experimental setup diagram. (b) Experimental phantom for photoacoustic imaging—distorted black polymer ribbon. (c) The z-y slice images of the polymer ribbon.

    Single-pixel imaging is also essential in 3D imaging. In 2015, Sun et al. proposed a method for single-pixel 3D reconstruction, designing an imaging system based on short-pulse structured illumination and high-speed photodiodes in a single-pixel camera[256]. The imaging system is depicted in Fig. 47(a), and an overview of the reconstruction algorithm is shown in Figs. 47(b)–47(h). It reconstructs the 3D image of a scene from the time-varying backscattered intensity (measured for each output pulse of the laser) and a correlated set of N structured illumination patterns. The incident laser pulse [Fig. 47(b)] scatters back from the scene, and the amplified analog signal [Fig. 47(c)] is converted into discrete data points [Fig. 47(d)] by a high-speed digitizer and then processed by a computer algorithm. The algorithm uses M discrete intensity samples from the time-varying signal to reconstruct M 2D images, resulting in an x, y, z image cube [Fig. 47(h)]. In the image cube, each lateral pixel (x, y) has an intensity distribution along the vertical axis (z) [Fig. 47(g)] that depends on the temporal shape of the pulse, the detector response, the readout digitization, the pixel depth, and the reflectivity. This system could reconstruct 3D scenes with a resolution of 128 pixel × 128 pixel within a range of 5 m, achieving an accuracy of up to 3 mm. Furthermore, by employing a compressed sensing strategy, continuous real-time 3D video at frame rates up to 12 Hz could be obtained.


    Figure 47.(a) Single-pixel 3D imaging system. (b) Illumination laser pulses backscattered from the scene are measured as (c) broadened signals. (d) Image cubes containing images of different depths are obtained using measurement signals. (e) Each lateral position has an intensity distribution along the vertical axis, indicating depth information. (f) Reflectance and (g) depth maps can be estimated from the image cube and then used to reconstruct a 3D image of the (h) scene[256].
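
    The depth-extraction step of such time-resolved single-pixel schemes can be illustrated with a few lines of array code. The cube below is synthetic, and the conversion from sample index to range simply assumes a round-trip time of flight; both are illustrative assumptions rather than the authors' actual processing chain.

```python
import numpy as np

# Illustrative sketch: extracting reflectance and depth maps from an
# (M, n, n) image cube, where axis 0 indexes the time samples of the
# backscattered pulse for every lateral pixel.
c = 3.0e8          # speed of light, m/s
dt = 1.0e-10       # assumed digitizer sampling interval, s (100 ps)

M, n = 200, 128
rng = np.random.default_rng(0)
# Synthetic cube: each pixel has a Gaussian return centered at a random delay.
true_delay = rng.integers(40, 160, size=(n, n))
t_axis = np.arange(M)[:, None, None]
cube = np.exp(-0.5 * ((t_axis - true_delay[None]) / 3.0) ** 2)

reflectance = cube.max(axis=0)              # peak return strength per pixel
peak_idx = cube.argmax(axis=0)              # time-of-flight sample index
depth = peak_idx * dt * c / 2.0             # halve the round-trip distance

print("depth range (m):", depth.min(), depth.max())
```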

    In the field of medical microscopic imaging, in 2021, Deng et al. proposed a transmission liquid-crystal-modulated single-pixel microscope[257]. This microscope employs a partially transparent liquid crystal spatial light modulator (LC-SLM) to realize a transmission optical system, which simplifies the optical path of single-pixel microscopic imaging: it obviates the need for a complex optical path to project Fourier patterns and eliminates the 4f Fourier filter otherwise required to mitigate diffraction effects. Microscopic imaging at different sampling rates showed that acceptable image quality could be achieved by reconstructing only 10% of the Fourier spectrum, thereby reducing the number of measurements. Multispectral images of cancer tissue were obtained under illumination at different wavelengths and evaluated for contrast, and, using a ground glass diffuser as the scattering medium, the influence of the thickness and position of the scattering medium on microscopic imaging was analyzed. This prototype of a transmissive single-pixel microscope is expected to find wide application in microscopic imaging through scattering media and in medical imaging.

    Single-pixel cameras have also been utilized for terahertz and neutron imaging. In 2020, researchers at the University of Hong Kong exploited the cost-effectiveness and exceptional durability of single-pixel cameras[258], using a single-pixel fiber-coupled system with photocurrent-based terahertz detectors to demonstrate real-time display of terahertz video (32 pixel × 32 pixel, 6 frame/s). This design achieved fast, noise-resistant imaging without lengthy post-processing and without reducing the time-resolving capability of the terahertz spectrometer. The following year, researchers at the Institute of Physics, Chinese Academy of Sciences, presented neutron single-pixel imaging with specifically designed masks[259]. This approach used a single-pixel detector to obtain images of complex objects with high spatial and temporal resolution, and the experimental setup proved simple, low-cost, and easy to operate.

    In summary, optical systems are a crucial component of imaging, responsible for light field modulation and information collection. The design of computational optical systems is likewise a critical aspect of the computational imaging pipeline. To address the limitations of traditional photonic imaging, computational optical systems are evolving toward smaller size, simpler structure, lower cost, larger FOV, higher resolution, stronger adaptability to different environments, and the ability to acquire more dimensions of information.

    4.4. Computational detector

    A computational detector is designed around the ideas of computational optical imaging and can acquire multidimensional information across spatial, temporal, and other physical dimensions. Compared with traditional detectors, computational detectors offer many revolutionary advantages: they significantly improve imaging quality (SNR, contrast, and dynamic range), simplify the imaging system (lensless operation, reduced volume, and reduced cost), break through the physical limitations of optical systems and image acquisition devices (imaging dimension, resolution, and field size), and substantially enhance the information acquisition capability, functionality, and performance of the imaging system (phase, coherence, 3D shape, depth extension, blur restoration, and refocusing). Current computational detectors include non-uniform, curved, multidimensional physical quantity, and ultra-high-speed detectors (Fig. 48).

    4.4.1. Non-uniform sampling detector

    Uniform detectors resample the scene with globally equal weights, which leads to oversampling of unimportant information, redundancy and waste, and undersampling of the regions that matter most, hindering information collection. Non-uniform sampling can effectively address this issue by selectively collecting the required information, thereby improving sampling efficiency and saving sampling resources. Non-uniform sampling detectors were developed on this principle, using non-uniform sampling methods to collect information.

    Notably, signals with specific structures can be sampled below the Nyquist rate. On the algorithmic side, Landau showed in 1967 that a multi-band signal occupying N disjoint bands can be represented by samples taken at an average rate no less than twice its total occupied bandwidth. In 2006, Candès et al. extended this concept to sparse signals, such as signals that are sparse in the discrete Fourier domain. These results are termed compressive sensing (CS). In 2017, Michaël et al. proposed a novel compressed-sensing-based converter architecture for cognitive radio frequency receivers[260], as shown in Fig. 49(a). This method, termed non-uniform wavelet sampling, combines wavelet preprocessing with non-uniform sampling to address the challenges of existing converters, such as signal noise, aliasing, and strict clock constraints, thereby supporting a wide range of target feature extraction tasks. However, the method is subject to various real-world limitations, such as noise folding, low sensitivity, aliasing, and limited flexibility. In 2020, Golowicz et al. used sparse (non-uniform) sampling techniques to significantly shorten experimental time by omitting most of the data during the measurement process and reconstructing them mathematically[261].


    Figure 48.(a) Non-uniform detector[260]. (b) Curved surface detector[264]. (c) Multidimensional physical quantity detector[335]. (d) Ultra-high-speed detector[336,337].


    Figure 49.(a) Empirical phase transition graph of non-uniform wavelet bandpass sampling (NUWBS) for multi-band signal acquisition compared to the theoretical ℓ1-norm phase transition for a Gaussian measurement ensemble (shown with the dashed purple line)[260]. (b) Nyquist real-time sampling and hybrid sampling[262]. (c) 3D detection results for the left and right images and the corresponding bird's-eye-view results[263].
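
    To make the compressive sensing idea above concrete, the short sketch below recovers a sparse signal from far fewer random measurements than the Nyquist count using plain iterative soft thresholding (ISTA). The signal, the Gaussian measurement matrix, and the threshold schedule are all illustrative assumptions and do not reproduce the NUWBS hardware pipeline.

```python
import numpy as np

# Illustrative sketch: recover a k-sparse signal from m << N random
# measurements with ISTA (iterative soft thresholding), the basic
# proximal-gradient solver for l1-regularized least squares.
rng = np.random.default_rng(1)
N, m, k = 512, 128, 10

x_true = np.zeros(N)
x_true[rng.choice(N, k, replace=False)] = rng.standard_normal(k)

A = rng.standard_normal((m, N)) / np.sqrt(m)   # compressive measurement matrix
y = A @ x_true                                 # m noiseless measurements

lam = 0.01
L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of the gradient
x = np.zeros(N)
for _ in range(500):
    grad = A.T @ (A @ x - y)
    z = x - grad / L
    x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```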

    In terms of hardware, non-uniform sampling detectors have also made new progress. In 2022, Wang et al. built a single-pixel camera using an array detection imaging system that applies CS theory for high-pixel-count detection[262]. This system reduces the number of measurements required to reconstruct high-quality images and handles situations where the target may appear anywhere in the FOV without increasing the number of detectors. In the same year, Wang et al. proposed a novel spaceborne high-resolution synthetic aperture radar (SAR) system using non-uniform hybrid sampling. As shown in Fig. 49(b), non-uniform hybrid sampling can optimize the timing of SAR signal transmission and reception; exploiting the oversampling requirement of SAR imaging in the azimuth direction, a theoretical model relating the non-uniform hybrid sampling parameters to the relative velocity between the SAR system and spatial targets was established. In 2023, Gao et al. proposed a new shape-aware non-uniform sampling strategy[263]. As shown in Fig. 49(c), dense sampling is performed in the peripheral area and sparse sampling in the internal area instead of uniform sampling, so that more points are drawn from external regions and useful features are extracted for 3D detection. However, this technology still has limitations: it cannot be directly extended to general object detection with a single non-uniform sampling model, making it difficult to detect multiple occluded targets.

    Non-uniform sampling detectors have been an important research direction in the fields of image processing and signal processing in recent years. By unevenly allocating sampling points during the sampling process, non-uniform sampling detectors can significantly reduce the sampling rate while maintaining image quality, thereby reducing data transmission and storage costs and improving system efficiency and performance. Non-uniform sampling detectors are expected to be widely applied in fields such as medical imaging, video surveillance, and remote sensing.

    4.4.2. Curved surface detector

    Curved image surfaces are naturally suited to image sensors: the retina of the human eye is curved, whereas the 35 mm film of traditional analog cameras and contemporary digital image sensors are flat. Flat sensors suffer from vignetting, in which image quality degrades from the center toward the edges of the detector because photons strike the peripheral pixels at an angle. To mitigate vignetting, a series of optical lenses is often employed, but this results in a bulky optical system. Consequently, detectors designed with a retina-like curved surface are well suited to applications demanding a wide FOV, high resolution, and real-time performance.
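
    A rough sense of why flat sensors vignette can be obtained from the classical cos-fourth illumination falloff. The sketch below compares that falloff with an idealized curved sensor whose pixels are assumed to stay normal to, and equidistant from, the chief ray, leaving only a single cosine factor; this is a back-of-envelope illustration under those simplifying assumptions, not a model of any specific detector.

```python
import numpy as np

# Back-of-envelope illustration of natural vignetting.
# Flat sensor: relative irradiance falls off roughly as cos^4(theta).
# Idealized curved sensor (pixels normal to and equidistant from the
# chief ray): only one cosine factor is assumed to remain.
theta_deg = np.array([0, 10, 20, 30, 40])
theta = np.radians(theta_deg)

flat = np.cos(theta) ** 4
curved = np.cos(theta)               # simplifying assumption, see text

for d, f, c in zip(theta_deg, flat, curved):
    print(f"field angle {d:2d} deg: flat {f:.2f}, curved (idealized) {c:.2f}")
```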

    Many researchers have examined the properties of the retina, as illustrated in Figs. 50(a)–50(f), and have developed curved imaging theories and systems that mimic it. The surface of these curved detectors is perpendicular to the incident light, which significantly enhances performance by eliminating vignetted corners, improving overall brightness, and enhancing image quality at the periphery.


    Figure 50.(a) Schematic of the human visual system. (b) The human eye and (c) the retina. (d) Schematic of our eyes’ imaging system. (e) The working mechanism of eyes. (f) Perovskite nanowires and their crystal structures[265]. (g) Schematic of the test bench used for characterization of the curved digital X-ray detector, showing the X-ray source, bone phantom, and curved digital X-ray detector[264]. (h) Imaging results acquired by the adaptive imager for objects at different distances[266].

    To enhance image quality, Albert et al. introduced a technique in 2020 that employs organic photodetectors (OPDs) to fabricate high-resolution curved detectors on thin plastic substrates[264]. These curved detectors offer more uniform image quality than flat digital detectors and, when paired with 3D reconstruction algorithms, provide improved 3D visualization; their integration has also halved the size of 3D X-ray imaging systems. In the same year, Gu et al. unveiled a biomimetic electrochemical eye[265] featuring a hemispherical retina composed of a high-density, light-sensitive perovskite nanowire array that mimics the photoreceptors of the biological retina, as illustrated in Fig. 50(f). The design of this device closely resembles the human eye, and electrically addressing individual nanowires can yield high imaging resolution. The biomimetic electrochemical eye offers high responsivity, fast response times, a low detection threshold, and a broad FOV. Its image-sensing capability was demonstrated by reconstructing optical patterns projected onto it. This technology paves the way for the broad adoption of biomimetic optical sensing devices.

    To achieve focused views of objects at various distances, Rao et al. introduced a curved, shape-adaptive imager with a high pixel fill factor in 2021[266]. The design combines a concave imager with an adjustable mirror to maintain focus across different distances. Although successful, as shown in Fig. 50(h), the curved imager has limitations: its flat origami structure cannot fold into a perfect hemisphere, leading to increased optical aberrations, image-stitching errors, and a more complex readout circuit.

    Furthermore, to enable multispectral imaging over a large FOV, Zhang et al. developed a novel biomimetic multispectral curved compound eye camera in 2023[267]. This camera system, designed for aerial multispectral imaging, offers a maximum FOV of 120° and captures images in seven spectral bands from visible to near-infrared wavelengths. The technology has proven effective for large-field aerial multispectral imaging and shows considerable potential for long-range detection applications based on aerial imaging.

    In summary, curved detectors hold promising prospects across various fields, including optical imaging, radar imaging, and medical imaging. Unlike traditional flat detectors, which struggle with imaging distortions when handling curved or irregularly shaped objects, curved surface detectors can effectively overcome these issues due to their larger optical receiving area. By modifying their shape and curvature, these detectors are capable of multi-point focusing, allowing them to adapt and optimize imaging for different scenarios. Looking ahead, advancements in curved detectors are expected to deliver higher resolution, broader fields of view, and quicker imaging speeds, significantly enhancing convenience and efficiency in both everyday life and professional settings.

    4.4.3. Multidimensional physical quantity detector

    Since Maxwell first theorized over 150 years ago that light is an electromagnetic wave, it has been acknowledged that amplitude, polarization, phase, and frequency are fundamental parameters of light waves. However, like human eyes, current photoelectric imaging detectors primarily capture only intensity information, missing multidimensional physical properties such as spectrum, polarization, and phase. This limitation results in a significant loss of light field information during imaging. To address this, multidimensional physical quantity detectors have been developed, including filter-based detectors, quantum dot thin films, and metasurfaces.

    Metasurfaces, which are artificially structured materials with thicknesses smaller than the wavelength of light, allow flexible and effective control over the characteristics of electromagnetic waves, including polarization, amplitude, phase, and propagation mode. In 2018, Mitrofanov et al. exploited this capability by designing an optically thin photoconductive channel as an all-dielectric metasurface[268]. The metasurface achieves enhanced optical absorption and has been integrated into photoconductive terahertz detectors, yielding high efficiency and sensitivity. With only 0.5 mW of optical excitation, the metasurface detector produces a photocurrent an order of magnitude higher than comparable detectors with unstructured surfaces and exhibits the high dark resistance essential for low-noise detection in terahertz time-domain spectroscopy and imaging. At such low excitation power, the metasurface detector achieves an exceptionally high SNR of 10⁶.

    Terahertz (THz) photoconductive devices, which are instrumental in generating, detecting, and modulating terahertz waves, operate by switching conductivity on a sub-picosecond timescale using optical pulses. In 2019, Siday et al. enhanced the efficiency of this conductivity switching using an electrically connected network of nanoscale GaAs resonators, creating a fully absorbed photoconductive metasurface[269]. This type of metasurface, when integrated with terahertz antennas, forms efficient photoconductive terahertz detectors. The perfectly absorbed photoconductive metasurface paves the way for the development of various efficient optoelectronic devices, optimizing optical and electronic performance through a network of nanostructured resonators. In the same year, Li et al. developed an intelligent metasurface imager and recognizer utilizing artificial neural network (ANN) technology for adaptive control of data flow[270]. This system incorporates three ANNs in a hierarchical structure to process microwave data into comprehensive human body images, classify specific anatomical regions (such as hands and chest), and instantly recognize human hand gestures at a 2.4 GHz Wi-Fi frequency.

    Metasurfaces, created by scanning a focused laser beam within a glass substrate, can be seamlessly integrated with conventional optical components. In 2019, Zhou et al. introduced an edge detection mechanism using metasurfaces[271]. Through experiments, they demonstrated the use of a specifically designed dielectric metasurface that achieved high optical efficiency for broadband edge detection. This technology has significant applications in real-time image processing monitoring, high-contrast microscopy, and compact optical platforms like smartphones and smart cameras.

    The discovery of entangled photons has significantly enhanced the imaging capabilities of metasurfaces. In 2020, Zhou et al. proposed and experimentally demonstrated the use of polarized entangled photon sources to selectively activate or deactivate the optical edge detection mode in imaging systems equipped with efficient dielectric metasurfaces[272]. As illustrated in Figs. 51(a) and 51(b), this experiment broadens the scope of metasurfaces and quantum optics, offering a promising avenue for quantum edge detection and image processing with improved SNRs. Metasurfaces, whether engineered from dielectric or metallic structures, hold substantial potential for advancing quantum edge detection and image processing.


    Figure 51.(a) Schematic of metasurface-enabled quantum edge detection. (b) The ON/OFF switch state of the heralding arm. When the idler photon in the heralding arm is projected onto the state |H⟩, the switch is OFF and a solid cat image is captured; when it is projected onto the state |V⟩, the switch is ON and an edge-enhanced contour of the cat is obtained. (c) Edge-detection experiments with red and green HeNe laser sources[274].

    In the realm of 3D computer vision technology, LiDAR is regarded as a benchmark for robotic vision at the industrial level. Despite ongoing advancements in LiDAR integration and optimization, commercial devices often suffer from slow frame rates and low resolution, primarily due to limitations in mechanical or solid-state deflection systems. In 2022, Martins et al. introduced an advanced LiDAR technique that leverages ultrafast low FOV deflectors combined with large-area metasurfaces. This configuration enables a broad FOV (150°) and high frame rates (kHz)[273], allowing for simultaneous imaging of peripheral and central areas. Integrating this innovative LiDAR technology with sophisticated learning algorithms offers a new method for enhancing the perception and decision-making capabilities in advanced driver assistance systems (ADAS) and robotic systems.

    Additionally, significant progress was made in 2023 with metasurface detectors for edge detection, as depicted in Fig. 51(c). Tanriover et al. proposed and experimentally demonstrated Fourier-optics-based metasurfaces that achieve high transmission efficiency for 2D isotropic, polarization-independent, broadband edge detection across the visible spectrum under both coherent and incoherent illumination[274].
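
    In software, the optical operation performed by such edge-detection metasurfaces is analogous to isotropic high-pass filtering in the Fourier domain. The sketch below applies a Laplacian-like |k|² transfer function with NumPy's FFT as a purely digital analogue; the transfer function and toy input are illustrative stand-ins, not the metasurface's actual optical response.

```python
import numpy as np

def isotropic_edge_filter(image):
    """Digital analogue of isotropic optical differentiation: multiply the
    spectrum by |k|^2 (a Laplacian), which suppresses smooth regions and
    keeps edges regardless of their orientation."""
    ny, nx = image.shape
    ky = np.fft.fftfreq(ny)[:, None]
    kx = np.fft.fftfreq(nx)[None, :]
    transfer = kx**2 + ky**2                 # isotropic, orientation-agnostic
    spectrum = np.fft.fft2(image)
    return np.real(np.fft.ifft2(spectrum * transfer))

# Toy input: a bright disk; the filtered output highlights its rim.
yy, xx = np.mgrid[:128, :128]
disk = ((xx - 64)**2 + (yy - 64)**2 < 30**2).astype(float)
edges = isotropic_edge_filter(disk)
print("center response:", abs(edges[64, 64]), " peak rim response:", np.abs(edges).max())
```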

    A multidimensional physical quantity detector is a device capable of simultaneously detecting various physical quantities, including polarization, phase, spectrum, and light intensity. This capability enhances the efficiency and accuracy of data acquisition. Unlike traditional detectors that measure a single physical quantity and require multiple assessments to gather comprehensive information, multidimensional detectors can capture data from several parameters in a single measurement. This not only conserves time and resources but also provides multidimensional spatial information about the target. Consequently, these detectors enable a thorough description and precise analysis of the target’s state.

    4.4.4. Ultra-high-speed computing detector

    Ultra-high-speed computing detectors are designed for high-speed imaging, offering minimal delay and a broad dynamic range, and they present significant research potential. These detectors fall into two main functional categories: global perception across the full array and dynamic perception of rapid changes, as exemplified by event cameras.

    Ultrafast and efficient single-photon detectors are crucial in modern quantum optics and quantum communication. Although their detection efficiency is often hampered by imperfect mode matching and limited photon absorption, Pernice et al. made a significant advancement in 2012 by demonstrating a superconducting nanowire detector atop a nanophotonic waveguide[275]. This design markedly increases the absorption length of incident photons, enabling high on-chip single-photon detection efficiencies up to 91% at telecommunications wavelengths. It also allows for replication across multiple chips, maintaining a low dark count rate without compromising detection efficiency. Additionally, these detectors offer high temporal resolution, making them suitable for on-chip implementation.

    To enhance the detection system's SNR, Chen et al. developed an algorithm in 2018 aimed at accelerating target recognition[276]. The method is based on the assumption that the mean of the eigenvalues of the Wigner matrix is zero, effectively eliminating background-noise eigenvalues for ultra-high-speed target detection. Although it can identify target speed from the distribution and mean of the eigenvalues of the additive Wigner matrix, the method has limitations: it loses information on the variability of the eigenvalues at different speeds, which hampers effective speed differentiation. To address this issue, Li et al. introduced a new hardware-oriented algorithm in 2019, illustrated in Figs. 52(a) and 52(b)[277]. Designed for implementation on field-programmable gate arrays, this algorithm supports high-speed vision platforms and is tailored for high-frame-rate, high-data-throughput, highly parallel processing of low-latency video streams, effectively distinguishing between different speeds.


    Figure 52.(a) Detection results in the multi-object detection experiment. (b) Object numbers in the multi-object detection experiment[277]. (c) Video reconstructions of high-speed physical phenomena[278]. (d) Data processing flow. (e) The event denoising results of the dataset overlaid on the corresponding image[280].


    Figure 53.Computational processing. (a) Image fusion [283,286,288]. (b) Computational image enhancement[292,297,302,309]. (c) Super-resolution reconstruction[317,318].

    Recent advancements have also been made in event cameras, a novel type of sensor that captures brightness changes as asynchronous "event" streams instead of conventional intensity frames. Event cameras offer several advantages over conventional cameras, including high temporal resolution, HDR, and the absence of motion blur. In 2019, Rebecq et al. introduced a recurrent network that reconstructs videos from event streams, trained on a large dataset of simulated event data[278]; the method also extends to synthesizing color images from color event streams. Experiments demonstrated that the network can generate high-frame-rate videos (over 5000 frames per second) of high-speed phenomena, such as bullets striking objects, and can provide HDR reconstruction even under challenging lighting conditions. As depicted in Fig. 52(c), reconstructed intensity images serve as an effective intermediate representation, allowing conventional computer vision algorithms, such as object classification and visual-inertial odometry, to be applied to event data.
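
    The asynchronous "event" stream mentioned above follows a simple generative rule: a pixel emits an event whenever its log-intensity changes by more than a contrast threshold. The sketch below turns a pair of frames into such events; the threshold value and the frame-pair formulation are illustrative simplifications of how real event sensors operate asynchronously per pixel.

```python
import numpy as np

def frames_to_events(prev_frame, curr_frame, timestamp, threshold=0.2):
    """Emit (x, y, t, polarity) tuples wherever the log-intensity change
    between two frames exceeds the contrast threshold. A real event camera
    does this asynchronously per pixel; the frame pair is a simplification."""
    eps = 1e-6                                   # avoid log(0)
    d = np.log(curr_frame + eps) - np.log(prev_frame + eps)
    ys, xs = np.nonzero(np.abs(d) >= threshold)
    polarity = np.sign(d[ys, xs]).astype(int)    # +1 brighter, -1 darker
    return [(int(x), int(y), timestamp, int(p)) for x, y, p in zip(xs, ys, polarity)]

# Toy example: a bright square shifts by two pixels between frames.
f0 = np.zeros((64, 64)); f0[20:30, 20:30] = 1.0
f1 = np.zeros((64, 64)); f1[20:30, 22:32] = 1.0
events = frames_to_events(f0, f1, timestamp=0.001)
print(len(events), "events generated at the moving edges")
```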

    In a more recent development, Glover et al. introduced a new corner-detection method in 2021, named Find Event Harris, which applies the Harris algorithm to achieve high accuracy while improving event throughput[279]. The algorithm minimizes the computation performed per event and runs the computationally intensive convolutions only as computing resources become available. The result is an effective, real-time corner detector that operates more than 2.6 times faster than the previous state of the art.

    Building on previous research, Baldwin et al. proposed a method in 2022 for representing the temporal aspects of events[280]. As illustrated in Figs. 52(d) and 52(e), the approach compactly stores the original spike-timing information with minimal loss. The biomimetic design features high memory efficiency and rapid processing, avoids fixed, predefined frame rates, and incorporates "local memory" to retain past data, enhancing performance in applications including event denoising, image reconstruction, classification, and human pose estimation.

    Ultrafast detectors are crucial for observing and capturing extremely rapid phenomena in the microscopic world. With response times of milliseconds or even sub-milliseconds, these detectors enable the high-speed detection and recording of various physical processes, such as light, electricity, and magnetism. Providing precise time resolution, ultra-high-speed detectors offer critical data support for studying transient behaviors of physical processes and tracking rapidly moving objects. They hold vast potential in fields like engineering testing, medical diagnostics, and communication technology.

    The emergence of computational detectors has revolutionized imaging systems by not only simplifying them but also enhancing their imaging quality. These detectors transcend the physical limitations of traditional optical detectors, significantly boosting the information acquisition capabilities and performance metrics of imaging systems. As these technologies continue to evolve, computational detectors are expected to advance further, providing even more robust support for scientific research and exploration.

    4.5. Computational processing

    As science and technology advance, a variety of new detectors have emerged. However, in certain situations such as overly strong or weak illumination, or insufficient resolution of equipment, these detectors still cannot directly produce satisfactory images. Computational processing serves as the final step in the imaging chain, where image data captured by detectors are algorithmically processed to adjust images to levels more suitable for human observation. Computational processing can generally be categorized into three areas: image fusion, image enhancement, and super-resolution (Fig. 53).

    4.5.1. Image fusion

    Image fusion involves using computer algorithms to combine source images captured by different types of detectors into a single image that contains rich details from the original sources, making it easier for the human visual system to observe. Compared to individual source images, a fused image can more clearly capture the scene information of the target, significantly improving the quality and clarity of the image. Image fusion can be tailored for specific applications, including multi-focus image fusion, infrared-visible light fusion, and multispectral hyperspectral fusion.

    In daily life, when using cameras, people strive to capture clear images of entire scenes. However, due to the limited DOF of camera lenses, not all areas can be in focus simultaneously, resulting in some parts of the image being sharp while others are blurred. Multi-focus image fusion technology addresses this issue by combining multiple images, each focused on different areas of the same scene, into a single clear image. This significantly enhances the effective utilization of the information captured in the images.

    In multi-focus imaging, challenges such as anisotropic blur and registration errors frequently arise from movement of the objects or the camera, and they considerably degrade the quality of the fused images. In 2014, Zhou et al. introduced an improved fusion method that uses a weighted-gradient approach to suppress artifacts caused by anisotropic blur and misalignment[281]. This method outperforms traditional fusion techniques in handling anisotropic blur and registration errors, and it also requires less memory. Also in 2014, Liu et al. developed a multi-focus image fusion method employing dense scale-invariant feature transform (SIFT), which aligns misregistered pixels across the source images to improve fusion quality[282]. These methods rely on block-based techniques to detect focused areas; however, the fixed block size introduces blocking artifacts at the boundaries of the fused image, degrading the fusion result.
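
    A minimal classical baseline for multi-focus fusion (not the weighted-gradient or dense-SIFT methods cited above) simply keeps, at every pixel, the source whose local gradient energy is higher. The window size and the synthetic blur used below are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

def fuse_multifocus(img_a, img_b, window=9):
    """Pixel-wise selection by local focus measure: at each pixel keep the
    source image whose gradient energy in a small window is larger."""
    def focus_measure(img):
        gy, gx = np.gradient(img)
        return uniform_filter(gx**2 + gy**2, size=window)
    mask = focus_measure(img_a) >= focus_measure(img_b)
    return np.where(mask, img_a, img_b)

# Toy scene: left half sharp in A, right half sharp in B.
rng = np.random.default_rng(2)
sharp = rng.random((128, 128))
img_a = sharp.copy(); img_a[:, 64:] = gaussian_filter(sharp, 3)[:, 64:]   # right half defocused
img_b = sharp.copy(); img_b[:, :64] = gaussian_filter(sharp, 3)[:, :64]   # left half defocused
fused = fuse_multifocus(img_a, img_b)
print("fusion error vs. all-in-focus reference:", np.abs(fused - sharp).mean())
```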

    With the advancement of deep learning, its significance in image processing has grown. In 2021, Zhang et al. proposed an unsupervised generative adversarial network (GAN), named MFF-GAN, with adaptive and gradient joint constraints for multi-focus image fusion[283]. The architecture of this network is illustrated in Fig. 54. MFF-GAN not only achieves good overall clarity but also preserves local details, particularly near the junctions of focused and defocused areas.


    Figure 54.MFF-GAN[283]. (a) Overall fusion framework. (b) Illustration of the decision block. (c) Network architecture of the discriminator. (d) Network architecture of the generator.

    Infrared imaging is notable for its robust anti-interference capabilities, strong target recognition, and all-weather functionality. However, infrared images often suffer from low contrast, blurred edges, a low SNR, and complex components. Conversely, visible light imaging boasts rich spectral information, high resolution, and a wide dynamic range, but it struggles with low contrast in night vision and low visibility environments. Using either infrared or visible images alone has significant limitations. Visible and infrared image fusion (VIF) combines the infrared radiation information with the detailed information of visible light, finding applications in fields such as industry, daily life, military, and surveillance, and it is a key research area in image fusion. One objective of combining infrared and visible images is to merge the complementary information from both to provide a comprehensive view of a scene from different perspectives. Existing fusion methods based on GANs often fail to identify and enhance the most distinctive regions of the images. Addressing this, in 2021, Li et al. introduced an end-to-end infrared and visible image fusion method known as Attention FGAN[284]. This method enables the generator and discriminator to focus on the foreground target information in infrared images and the prominent details in visible images, ensuring that the fusion retains the intensity and texture information of the original images effectively.

    Moreover, infrared and visible image fusion aims to create a composite image that not only highlights prominent targets and preserves rich texture details but also supports advanced visual tasks. Existing fusion algorithms often focus solely on the visual quality and statistical metrics of the composite image, neglecting the demands of higher-level visual tasks. To address this, in 2022, Tang et al. developed a semantic-aware real-time image fusion network, SeAFusion[285], which bridges the gap between image fusion and advanced visual tasks, improving the performance of visual tasks on fused images and strengthening the network's ability to capture spatial details. The algorithm is efficient and suitable for real-time preprocessing in advanced visual tasks. Nevertheless, like most existing fusion algorithms, it does not consider illumination in its modeling.

    Additionally, considering the challenges of extreme lighting conditions, Tang et al. in 2022 proposed the PIAFusion network, a progressive image fusion framework based on illumination perception[286]. This network, depicted in Fig. 55, adaptively maintains the intensity distribution of prominent targets and preserves the texture details in the background by integrating meaningful information from the source images around the clock based on varying illumination conditions.


    Figure 55.PIAFusion network[286]. (a) The framework of PIAFusion network. (b) Visualized results of images and feature maps in the nighttime scenario. The first column shows the infrared image, visible image, and fused image, respectively. The following three columns present the feature maps corresponding to the infrared, visible, and fused images in various channel dimensions.

    Owing to the limitations of optical imaging, image acquisition equipment usually involves a trade-off between spatial and spectral information. Hyperspectral images (HSIs) are rich in spectral information, allowing precise identification and classification of targets, while multispectral images (MSIs) provide detailed geometric features thanks to their rich spatial information. The fusion of multispectral and hyperspectral images aims to combine high-resolution multispectral (HrMS) and low-resolution hyperspectral (LrHS) images into high-resolution hyperspectral (HrHS) images. In 2021, Dian et al. proposed a fusion method for HSI and MSI based on subspace representation and a convolutional neural network (CNN) denoiser, termed CNN-Fus[287]. The method requires only an initial training on more readily available gray-level images and can be applied to any HS and MS dataset without retraining, outperforming state-of-the-art fusion methods. In 2022, Xie et al. developed a network architecture called MHF-net for the MS/HS fusion task and introduced two variants for common real-world scenarios: consistent MHF-net and blind MHF-net[288]. The former is designed for cases in which the spectral and spatial responses of the training and test data are consistent, while the latter handles cases where these responses do not match, ensuring successful fusion. The structure and experimental results are illustrated in Fig. 56.


    Figure 56.MHF-net[288]. (a), (b) Illustrations of the observation models for HrMS and LrHS images, respectively. (c) Illustration of how the training data are created when HrHS images are unavailable. (d) Illustration of the blind MS/HS fusion net. (e) Experimental results.
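
    The two observation models sketched in Figs. 56(a) and 56(b) can be written in a few lines: the LrHS image is a spatially blurred and downsampled version of the latent HrHS cube, while the HrMS image is its projection through the multispectral sensor's spectral response. The dimensions, blur, downsampling factor, and random spectral response below are illustrative assumptions, not MHF-net's trained operators.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Illustrative forward models for MS/HS fusion.
H, W, B = 64, 64, 31        # latent HrHS cube: 64x64 pixels, 31 spectral bands
b, ratio = 3, 4             # 3 multispectral bands, 4x spatial downsampling

rng = np.random.default_rng(3)
hrhs = rng.random((H, W, B))                       # latent high-res hyperspectral cube

# LrHS: spatial blur followed by downsampling (spectral dimension untouched).
blurred = gaussian_filter(hrhs, sigma=(1.5, 1.5, 0))
lrhs = blurred[::ratio, ::ratio, :]                # shape (16, 16, 31)

# HrMS: spectral response matrix mixes the 31 bands down to 3 broad bands.
R = rng.random((B, b))
R /= R.sum(axis=0, keepdims=True)                  # each MS band integrates the spectrum
hrms = hrhs @ R                                    # shape (64, 64, 3)

print("LrHS:", lrhs.shape, " HrMS:", hrms.shape)
```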

    However, hyperspectral images often contain significant noise, including Gaussian, stripe, and mixed noise, due to factors such as adverse weather or aging sensors, which degrades the quality of the fused images. In 2022, Sun et al. introduced a multi-scale low-rank deep back-projection fusion network (MLR-DBPFN), which effectively removes spectral noise and achieves high-quality HS fusion under noisy conditions[289]. Although MLR-DBPFN demonstrates robust fusion performance and noise removal, it is currently limited to datasets with a spatial resolution ratio of 4, and extending such methods to higher spatial-resolution ratios remains an important research direction.

    4.5.2. Computational image enhancement

    Computational image enhancement involves emphasizing important information in an image according to specific needs while reducing or eliminating unnecessary details. For instance, images captured at night often suffer from low contrast and dark colors. Image enhancement techniques can adjust these images to make them more suitable for human observation. This section will cover four key areas: contrast enhancement, low light enhancement, HDR imaging, and virtual histological staining.

    Contrast is a crucial visual feature in digital image processing, referring to the degree of brightness difference within an image. High-contrast images show a clear distinction between bright and dark areas, whereas low-contrast images do not display these differences distinctly. Contrast enhancement involves adjusting the image’s brightness distribution to amplify the differences between bright and dark areas and enhance the gray level differences across various parts of the image, making it clearer and easier to observe and analyze. In 2013, Lee et al. proposed a contrast enhancement algorithm based on the hierarchical differential representation of a 2D histogram[290]. This method enhances image contrast by enlarging the gray level differences between adjacent pixels, effectively improving both the objective and subjective quality of the image.

    Previous single-image contrast enhancement (SICE) methods typically adjusted the image’s tone curve to correct contrast. However, limited by the information available in a single image, these methods have often failed to reveal detailed image features. In 2018, Cai et al. utilized a CNN to train a SICE enhancer using a large-scale multi-exposure image dataset[291]. This approach allows the CNN to enhance the contrast of underexposed or overexposed images effectively. However, in cases of severe overexposure, where little usable information remains, these methods struggle to reconstruct the lost details in highly overexposed areas.

    Histogram equalization (HE), a common technique for enhancing contrast, does not consider the neighborhood information around each pixel, which can introduce noise into the output image. To address this issue, in 2022, Agrawal et al. introduced new joint histogram equalization (JHE) technology[292]. This technique utilizes the relationships between each pixel and its adjacent pixels to improve image contrast more effectively than traditional HE methods. Importantly, it also works well for images with low dynamic ranges. Figure 57 illustrates the enhancement results achieved using these three algorithms.


    Figure 57.Results of three methods. (a) Lee’s method[290]. (b) SICE[291]. (c) JHE[292].
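
    For reference, classical global histogram equalization, the baseline that JHE extends with neighborhood information, can be written directly from the cumulative histogram. The 8-bit grayscale input below is an illustrative assumption.

```python
import numpy as np

def histogram_equalize(img_u8):
    """Classical global histogram equalization for an 8-bit grayscale image:
    map each gray level through the normalized cumulative histogram."""
    hist = np.bincount(img_u8.ravel(), minlength=256)
    cdf = hist.cumsum().astype(float)
    cdf_min = cdf[np.nonzero(hist)[0][0]]                   # first occupied bin
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255.0)
    return np.clip(lut, 0, 255).astype(np.uint8)[img_u8]

# Toy low-contrast image occupying only gray levels 100-140.
rng = np.random.default_rng(4)
low_contrast = rng.integers(100, 141, size=(64, 64)).astype(np.uint8)
equalized = histogram_equalize(low_contrast)
print("input range:", low_contrast.min(), low_contrast.max(),
      "-> output range:", equalized.min(), equalized.max())
```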

    Existing CNN architectures have not achieved the best results in either the performance or the application scope of infrared image enhancement tasks. To address this, in 2018, Kuang et al. proposed a deep learning method for single-infrared-image enhancement (IE-CGAN)[293], introducing a conditional generative adversarial network into the optimization framework to avoid amplifying background noise while enhancing contrast and detail. The method outperforms existing image enhancement algorithms in contrast and detail enhancement. The network structure and enhancement results are shown in Fig. 58.


    Figure 58.IE-CGAN[293]. (a) An overview of IE-CGAN. (b) Results of two methods.

    Owing to the absorption and scattering of light, captured underwater images usually suffer from severe color distortion and reduced contrast. To address these problems, in 2020, Fu et al. combined the advantages of deep learning and traditional image enhancement and proposed a two-branch network that compensates for global color distortion and local contrast reduction, respectively[294]. The method generates realistic results without over-enhancement or additional computational burden.

    Due to unavoidable environmental or technical limitations, such as insufficient lighting and limited exposure time, images captured under poor lighting conditions often suffer aesthetically and perform poorly in advanced visual tasks. Low-light enhancement processes these images through algorithms to improve their visibility and suitability for advanced visual tasks. This technique has broad applications across various fields, including visual monitoring, autonomous driving, and computational photography.

    In surveillance and tactical reconnaissance, collecting and accurately processing visual information from dynamic environments is crucial, but cameras often struggle to capture clear images or videos in low-light conditions. In 2017, Lore et al. introduced a deep autoencoder-based method (LLNet) that identifies signal features in low-light images and adaptively brightens them without over-amplifying or saturating the brighter areas of HDR images[295]. Enhancing low-light images involves not only restoring brightness but also addressing complex issues such as color distortion and noise, for which simple brightness adjustment is insufficient. In 2021, Lv et al. proposed an attention-guided enhancement scheme that uses an attention map and a noise map to guide the enhancement in a region-adaptive manner[296]. In 2022, Li et al. developed a deep network for low-light image enhancement called Zero-DCE, which supports end-to-end training without reference images and is notable for its lightweight design and fast inference, giving it strong practical value[297].

    Among existing enhancement technologies, Retinex-based and learning-based enhancement methods are at the forefront of research. To bridge the gap between these two approaches, in 2022, Zhao et al. introduced a new Retinex decomposition strategy termed RetinexDIP, which reinterprets the decomposition process as a generation problem and performs Retinex decomposition without relying on external images[298]. This method allows for easy adjustment of estimated illuminance for enhancement, though it is limited by lengthy optimization times. Figure 59 displays the results from the aforementioned methods.


    Figure 59.Imaging results of four methods. (a) RetinexDIP[298]. (b) LLNet[295]. (c) Zero-DCE[297]. (d) Lv’s method[296].

    Because of large variations in brightness and contrast, low-light aerial images pose a challenging problem. In 2022, Singh et al. proposed a new architecture called RNet to enhance aerial images captured in low light[299]. RNet uses multi-scale feature fusion, extracting rich local semantic information from high-resolution features while using low-resolution representations to capture the global context. The proposed network outperforms other deep learning-based methods and traditional enhancement techniques.

    Low-light image enhancement aims to improve the visual quality of images taken under low illumination. However, existing low-light enhancement methods suffer from problems such as poor robustness to diverse low-light conditions or sacrificing computational efficiency for performance, which hinders their practical application. To address these problems, in 2024, Li et al. proposed an enhancement method called pixel-wise gamma correction mapping (PWGCM)[300], which combines pixel-wise gamma correction (GC) with deep learning and can handle a variety of low-light scenes at extremely high speed and low computational cost. The network structure and enhancement results are shown in Fig. 60.


    Figure 60.PWGCM[300]. (a) Overview of PWGCM. (b) Visualization of gamma correction map and the results in each iteration. (c) Results of several methods.
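
    The core operation behind such gamma-correction-map approaches is inexpensive: every pixel of a normalized image is raised to its own exponent, with exponents below 1 brightening dark regions. The sketch below builds the per-pixel gamma map from a blurred luminance estimate purely for illustration; PWGCM itself predicts the map with a network rather than the heuristic used here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pixelwise_gamma_enhance(img, gamma_dark=0.4, gamma_bright=1.0):
    """Apply a per-pixel gamma correction: dark regions receive a small
    exponent (strong brightening), bright regions stay close to identity.
    The gamma map here is a heuristic derived from blurred luminance;
    learning-based methods such as PWGCM predict it with a network."""
    img = np.clip(img, 0.0, 1.0)
    luminance = gaussian_filter(img, sigma=5)                   # smooth brightness estimate
    gamma_map = gamma_dark + (gamma_bright - gamma_dark) * luminance
    return img ** gamma_map

# Toy low-light image: values concentrated near zero.
rng = np.random.default_rng(5)
dark = rng.random((64, 64)) * 0.15
enhanced = pixelwise_gamma_enhance(dark)
print("mean brightness:", dark.mean(), "->", enhanced.mean())
```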

    In the field of low-light image enhancement, existing deep learning methods face three challenges: inaccurate estimation of the reflection component, limited enhancement capability, and high computational cost. In 2024, Yang et al. proposed ULENet, an ultra-lightweight and efficient neural network for low-light image enhancement[301]. In complex low-light scenes, ULENet clearly outperforms other state-of-the-art low-light enhancement methods in speed, accuracy, and adaptability, although noise becomes prominent in images processed under extremely low-light conditions.

    In the future, advancements in low-light image enhancement should aim not only to enhance image quality but also to increase optimization speed. Achieving real-time imaging under low-light conditions will better support advanced visual tasks such as unmanned driving and visual monitoring.

    Histological staining is the gold standard for tissue examination in clinical pathology and life-science research. It uses color dyes or fluorescent markers to visualize tissue and cell structures, making samples easy to observe. However, current histological staining requires complicated sample preparation steps, specialized laboratory infrastructure, and trained technicians, which makes it expensive and time-consuming. Virtual histological staining instead generates stained images directly through neural networks, bypassing the time-consuming and labor-intensive staining procedure. In 2019, Rivenson et al. used CNNs to convert wide-field fluorescence images of unlabeled tissue sections into histologically stained versions of the same samples[302]. Certified pathologists performed a blind comparison between this virtual staining method and standard histological staining and found little difference for human tissue sections of the salivary gland, thyroid, kidney, liver, and lung stained with different types of stains. The network structure and the lung staining results are shown in Fig. 61.


    Figure 61.Rivenson’s method[302]. (a) The schematic outlines the steps in the standard (top) and virtual (bottom) staining techniques. (b) Virtual staining GAN architecture. (c), (d) Virtual staining results match Masson’s trichrome stain for lung tissue sections.

    Histological analysis of arterial tissue samples is a widely used method for diagnosing and quantifying cardiovascular diseases, but labor-intensive staining procedures hinder histological image analysis. In 2020, Li et al. developed a deep learning-based method[303] that transforms bright-field microscope images of unlabeled tissue sections into equivalent bright-field images of histologically stained versions of the same samples. In evaluations by professional pathologists, there was no obvious difference between the virtually stained and standard histologically stained images of rat carotid artery tissue sections, and the method can be combined with other label-free microscopic imaging modalities.

    Histological analysis of tissue samples is the basis for diagnosing the risk and severity of ovarian cancer. The commonly used hematoxylin-eosin (H&E) staining method involves complicated steps and strict requirements, which seriously hampers histological studies of ovarian cancer, and GAN-based virtual histological staining offers a feasible alternative. In 2021, Meng et al. proposed a weakly supervised learning method that generates virtual H&E-stained images from autofluorescence images of unstained ovarian tissue sections[304]. In evaluations by physicians, the accuracy of the generated ovarian cancer images reached 93%, providing a more efficient solution for H&E staining of ovarian cancer pathology sections.

    The dynamic range of natural scenes is extensive, but the dynamic range of commonly used cameras is limited. A single shot from a camera can only capture a restricted interval of the natural scene’s dynamic range, often resulting in the loss of some scene information. To address this, HDR imaging has been developed. This section briefly discusses multi-exposure image fusion and learning-based HDR imaging technologies.

    Multi-exposure image fusion is a vital technique for reconstructing HDR images without hardware changes, radiance restoration, or complex workflows, and it has been widely adopted across various fields. Based on fusion rules, it merges images taken at different exposures to produce images with a rich dynamic range that capture as much of the natural scene as possible. In 1997, at the ACM Special Interest Group on Computer Graphics and Interactive Techniques (SIGGRAPH) conference, Debevec et al. presented a seminal paper titled "Recovering high dynamic range radiance maps from photographs"[305]. It described taking multiple photographs of the same scene at different exposure settings and combining them into a single HDR image that spans from dark shadows to bright light sources or strong reflections.
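
    Assuming a linear sensor response (a simplification of the calibrated response curve recovered in that work), merging an exposure stack into a radiance map reduces to a weighted average of each frame divided by its exposure time, with weights that discount under- and over-exposed pixels. The tent-shaped weight and the synthetic stack below are illustrative choices, not the published algorithm.

```python
import numpy as np

def merge_exposures(frames, exposure_times):
    """Merge differently exposed frames (values in [0, 1], linear response
    assumed) into a relative radiance map via a weighted average of
    frame / exposure_time, down-weighting clipped pixels."""
    frames = np.asarray(frames, dtype=float)
    times = np.asarray(exposure_times, dtype=float)[:, None, None]
    weights = 1.0 - np.abs(2.0 * frames - 1.0)          # tent weight: peaks at mid-gray
    weights = np.clip(weights, 1e-3, None)              # keep the sum strictly positive
    return (weights * frames / times).sum(axis=0) / weights.sum(axis=0)

# Synthetic scene with a 1000:1 dynamic range, captured at three exposures.
rng = np.random.default_rng(6)
scene = 10.0 ** rng.uniform(-2, 1, size=(64, 64))       # true relative radiance
times = [0.01, 0.1, 1.0]
stack = [np.clip(scene * t, 0.0, 1.0) for t in times]   # each shot clips differently
hdr = merge_exposures(stack, times)
print("recovered dynamic range:", hdr.max() / hdr.min())
```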

    Eliminating ghosting in HDR images of scenes with motion poses a challenge. In 2012, Takao et al. proposed a multi-exposure fusion method that compensates for motion, occlusion, and saturated areas, enabling the production of HDR images free from motion blur[306]. More recently, in 2022, Han et al. introduced a deep perceptual enhancement network for multi-exposure fusion, known as DPE-MEF[307]. The structure and experimental results of this network are depicted in Fig. 62. The network comprises two sub-modules: the detail enhancement module (DEM), which preserves details and structure in the fused image, and the color enhancement module (CEM), which enhances the vividness of colors. However, camera and subject movement can cause misalignment between the foreground and background across the exposure sequence, potentially leading to unsatisfactory results when static fusion methods are used.


    Figure 62.DPE-MEF[307]. (a) The architecture of the detail enhancement module. The numbers indicate the channel amounts. (b) The architecture of the color enhancement module. The numbers indicate the channel amounts. (c) Imaging results of DPE-MEF.

    HDR imaging based on deep learning combines neural networks with HDR reconstruction: the network learns to predict the missing brightness information of a scene and can therefore generate more realistic images. Compared with multi-exposure fusion, learning-based HDR offers higher computational efficiency, better imaging quality, and greater adaptability and flexibility in handling complex image data. In 2017, Eilertsen et al.[308] addressed the problem of predicting the information lost in saturated image areas to realize HDR reconstruction from a single exposure. The method reconstructs high-resolution and visually convincing HDR results in most cases, but when large regions are saturated in all color channels it cannot infer their structure and details. In 2021, Niu et al.[309] proposed HDR-GAN, a GAN-based HDR model that addresses the artifacts caused by large object motion in the scene and produces plausible content in regions where information is missing. The specific network structure and experimental results are shown in Fig. 63.

    Figure 63.HDR-GAN[309]. (a) Illustration of the proposed framework. (b) Imaging results of HDR-GAN.
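
    As a rough illustration of the single-exposure reconstruction problem addressed by Eilertsen et al.[308], the sketch below blends a network-predicted radiance estimate into the linearized input only where the input is saturated; the prediction function, gamma value, threshold, and mask definition are assumptions for illustration, not the published implementation.

```python
import numpy as np

def reconstruct_hdr(ldr, predict_highlights, gamma=2.2, tau=0.95):
    """Blend predicted radiance into the saturated regions of a single LDR frame.

    ldr                : float RGB image in [0, 1]
    predict_highlights : callable returning a radiance estimate for the frame
                         (e.g., a trained CNN); treated here as a black box
    tau                : saturation threshold above which the input is not trusted
    """
    linear = np.power(ldr, gamma)                        # approximate inverse display gamma
    alpha = np.clip((ldr.max(axis=-1, keepdims=True) - tau) / (1.0 - tau), 0.0, 1.0)
    predicted = predict_highlights(ldr)                  # network output in the radiance domain
    return (1.0 - alpha) * linear + alpha * predicted    # keep well-exposed pixels unchanged
```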

    Because benchmark datasets and solutions for dynamic scenes have been lacking, learning-based multi-exposure fusion (MEF) has mainly focused on static scenes and readily produces ghosting artifacts in the more common case where the input images contain motion. In 2023, Tan et al.[310] created a dynamic-scene MEF dataset to fill this gap, containing multi-exposure image sequences and their corresponding high-quality reference images. They further proposed a deep dynamic MEF (DDMEF) framework that reconstructs a high-quality, ghost-free image from only two differently exposed images of a dynamic scene.

    With advances in consumer electronics, the resolution of captured photographs has become very high. For existing CNN-based models, reconstructing HDR images directly from such high-resolution inputs is an arduous task because of limited memory resources. Future work should therefore pursue models that can perform HDR imaging on high-resolution images both efficiently and effectively.

    4.5.3. Super-resolution reconstruction

    Image resolution is a critical performance parameter used to assess the amount of detailed information an image contains. High-resolution (HR) images, compared to low-resolution (LR) images, typically feature more pixels per inch, richer texture details, and higher fidelity. However, due to various constraints such as the limitations of imaging equipment, environmental factors, network transmission mediums, and bandwidth, as well as the inherent flaws in the image degradation models, obtaining ideal high-resolution images directly is often not feasible. The most straightforward method to enhance image resolution involves upgrading the optical hardware in the acquisition system. However, significant improvements in manufacturing processes are challenging to achieve and often come with high costs. Therefore, the focus has shifted toward software and algorithmic solutions, where specific algorithms are employed to convert a given low-resolution image into a corresponding high-resolution one. This approach, known as image super-resolution reconstruction, has become a prominent area of research across various fields, including image processing and computer vision.
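
    For reference, single-image super-resolution is usually analyzed with the standard degradation model (a generic formulation, not one taken from the specific references cited here): the LR observation $y$ is modeled as $y = (x \otimes k)\downarrow_{s} + n$, where $x$ is the latent HR image, $k$ is a blur kernel, $\otimes$ denotes convolution, $\downarrow_{s}$ is downsampling by the scale factor $s$, and $n$ is additive noise. Super-resolution reconstruction seeks to invert this many-to-one mapping, which is why strong priors or learned mappings are required.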

    Traditional super-resolution reconstruction techniques can be broadly classified into three categories: interpolation-based, reconstruction-based, and learning-based methods. The interpolation approach primarily utilizes the known grayscale information of pixel points in low-resolution (LR) images, employing interpolation formulas to enhance the grayscale information between pixel points and achieve image enlargement. Reconstruction-based methods leverage probability theory and set theory, using LR images and prior knowledge to establish an optimal solution model. Shallow learning methods, on the other hand, depend on rule constraints and mapping relationships. These methods learn the transformation from LR to high-resolution (HR) images from a large number of training samples and apply this learned relationship to predict HR images from LR images.

    In 2001, Rajan et al. proposed a generalized interpolation scheme for image expansion and super-resolution image generation[311]. This scheme excels at preserving regional uniformity and local variations in scene reflectivity during the interpolation process. By 2006, Zhang et al. developed an edge-guided linear minimum mean square error estimation technique for image interpolation[312], which avoids interpolating in the direction of edges, thus significantly reducing ringing and other visual artifacts. In 2011, Wu et al. introduced a learning-based super-resolution method that utilizes a KPLS regression model to generate an initial super-resolution image, which is then enhanced by compensating with a residual HR image before fusing the original and residual images to produce the final super-resolution image[313]. The principles and experimental results of this method are displayed in Fig. 64. In 2013, Wang et al. proposed an edge-oriented single-image super-resolution (SISR) algorithm[314]. This method estimates a clear HR gradient field directly from the input LR image, and this gradient is then used as a constraint to reconstruct the HR image, preserving fine details and sharp edges while minimizing blurry artifacts.

    Figure 64.Experimental results of Wu’s method[313]. (a) shows sample HR images including a wall image and a grape image, which are downsampled by factor 4 to get the corresponding LR images for testing. (b)–(d) show the experimental results conducted on the low-resolution image.

    Traditional sparse representation models (SRMs) often struggle with image interpolation because the data fidelity term imposes no structural constraints on the missing pixels. In response, Dong et al. in 2013 introduced a nonlocal autoregressive model (NARM) and integrated it with the SRM to make it more effective for image interpolation[315]. This integration significantly reduces the coherence between the sampling matrix and the sparse dictionary, improving the SRM's performance. In 2014, Liu et al. proposed an adaptive Bayesian method for video super-resolution that estimates the underlying motion, blur kernel, and noise level while reconstructing the original HR frames[316]; it produces high-quality super-resolution results and adapts to various noise levels and blur kernels.

    Despite these advancements, the limited data retrieved from LR images still poses a challenge in restoring clear, detailed, and artifact-free images. In 2018, Yang et al. proposed a SISR method based on adaptive fractional step interpolation and reconstruction[317]. This approach effectively synthesizes clear edges while preserving texture information, with results demonstrated in Fig. 65.

    Figure 65.Experimental result of Yang’s method[317]. (a) Low-resolution image. (b) The result of bicubic interpolation. (c) Results of the proposed method.
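
    The bicubic result shown in Fig. 65(b), like the bicubic baselines used throughout the SR literature, can be reproduced in a few lines; the sketch below uses Pillow and serves only as the interpolation-based reference point, not as any of the cited methods.

```python
from PIL import Image

def bicubic_upscale(path_in, path_out, factor=4):
    """Interpolation-based upscaling: the standard baseline that SR methods are compared against."""
    lr = Image.open(path_in)
    hr = lr.resize((lr.width * factor, lr.height * factor), resample=Image.BICUBIC)
    hr.save(path_out)

# bicubic_upscale("butterfly_lr.png", "butterfly_bicubic_x4.png", factor=4)
```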

    Among traditional methods, interpolation-based approaches handle abrupt pixel changes such as edges and textures poorly and are prone to jagged and blocky artifacts; reconstruction-based methods cannot model real scenes well; and shallow learning methods suit only small datasets and require complicated hand-crafted features. Deep-learning-based methods instead use a large amount of training data to learn the mapping between low-resolution and high-resolution images and then predict the HR image corresponding to a given LR input from the learned mapping, realizing the super-resolution reconstruction process. Beyond changing how image features are extracted and reconstructed through deep network structures, these algorithms also address the problems that arise as networks deepen, such as over-fitting, vanishing or exploding gradients, sharply increasing parameter counts, non-convergence or instability, and the need for self-optimization of parameters, thereby capturing multi-scale, multi-detail image information. In 2016, Dong et al.[318] proposed the super-resolution convolutional neural network (SRCNN), which directly learns an end-to-end mapping between low- and high-resolution images and achieves both good imaging quality and fast imaging speed.
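
    A minimal PyTorch sketch of the three-layer SRCNN architecture[318] is given below (patch extraction, non-linear mapping, and reconstruction applied to a bicubic-upscaled input); the 9-5-5 filter sizes follow one of the configurations reported in the paper, but the code is an orientation aid rather than a reproduction of the original implementation.

```python
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    """Three-layer SR network in the spirit of Ref. [318]: the LR image is first upscaled
    to the target size by bicubic interpolation and then refined by the CNN."""

    def __init__(self, channels=1, n1=64, n2=32, f1=9, f2=5, f3=5):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, n1, kernel_size=f1, padding=f1 // 2),  # patch extraction and representation
            nn.ReLU(inplace=True),
            nn.Conv2d(n1, n2, kernel_size=f2, padding=f2 // 2),        # non-linear mapping
            nn.ReLU(inplace=True),
            nn.Conv2d(n2, channels, kernel_size=f3, padding=f3 // 2),  # reconstruction
        )

    def forward(self, x):
        return self.body(x)

# y = SRCNN()(torch.rand(1, 1, 96, 96))   # refine a bicubic-upscaled luminance patch
```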

    Most existing CNN-based SR models demand high computing power and rarely exploit the intermediate features that can aid the final image restoration. To address this, in 2021, Lan et al.[319] proposed MADNet, which improves performance while using few multiply-add operations and parameters. Although CNN-based methods perform well, their ability is limited for large-scale super-resolution tasks such as remote sensing imagery; in 2021, Dong et al.[320] therefore developed a dense-sampling super-resolution network (DSSR) to explore large-scale SR reconstruction of remote sensing images. Deep CNN-based image SR also often suffers from unstable training, which degrades performance. To solve this problem, in 2021, Tian et al.[321] proposed a coarse-to-fine super-resolution CNN (CFSRCNN). It combines low- and high-resolution features by cascading several types of modular blocks to prevent the unstable training and performance degradation caused by the up-sampling operation, and it uses a feature fusion scheme based on heterogeneous convolution to significantly improve computational efficiency without sacrificing the visual quality of the reconstructed images.

    Super-resolution CNNs usually require extremely deep architectures and long training times, yet they do not exploit features at multiple scales and weight features either equally or only at static scales, which limits their learning ability. In 2022, Anwar et al.[322] proposed the densely residual Laplacian network (DRLN). The network places cascaded residuals within the residual structure so that the low-frequency information flow can concentrate on learning middle- and high-order features, while densely connected residual blocks provide deep supervision and help the network learn from high-level complex features. The specific structure and experimental results of the network are shown in Fig. 66.

    Figure 66.DRLN[322]. (a) The detailed network architecture of DRLN. (b) Results of different methods. The key contrast parts in the red rectangle are magnified to display on the right. The LR image used for reconstruction is obtained by downsampling the HR image by a factor of 4.

    Super-resolution (SR) of remote sensing images can compensate for the limited resolution of the original data. However, because LR images lack much of the original information, SISR is an inherently ill-posed problem. In 2022, Dong et al.[323] established a benchmark dataset and proposed RRSGAN, an end-to-end network with a gradient-assisted feature alignment (GAFA) module and a texture converter, in which aligned reference-image (Ref) features are used to effectively reconstruct the fine textures of LR images.

    As CNN-based super-resolution methods continue to improve, their parameter counts and computational cost keep growing, making them difficult to deploy on devices with limited computing power. To address this, Zhu et al.[324] proposed a lightweight SISR network with an expectation-maximization attention mechanism (EMASRN) in 2022 to better balance performance and applicability. Compared with existing lightweight SISR methods, EMASRN reduces the number of parameters by nearly one-third. Figure 67 shows the structure and results of the network.

    Figure 67.EMASRN[324]. (a) An overview of the EMASRN network. (b) Results of different methods. The key contrast parts in the red rectangle are magnified to display on the right. The LR image used for reconstruction is obtained by downsampling the HR image by a factor of 4.

    CNN-based SR of remote sensing images mostly performs magnification through an up-sampling layer at the end of the model, which neglects feature extraction in the high-dimensional space and thus limits SR performance. To address this, in 2022, Lei et al.[325] proposed a new remote sensing image SR framework, TransENet, which enhances the high-dimensional feature representation after the up-sampling layer. TransENet can be combined with the traditional SR framework to fuse multi-scale high- and low-dimensional features, improving super-resolution results and exhibiting superior performance. In the same year, Zhu et al.[326] proposed a cross-view capture network (CVCnet) for stereoscopic image super-resolution, which exploits the global context and local features extracted from both views.

    In the field of high-resolution image reconstruction, ghost imaging (GI) typically requires a large number of single-pixel samples, which constrains its practical application. To address this, in 2022, Wang et al. developed a far-field super-resolution GI technology named GIDC[327]. This method combines the physical model of GI with a deep neural network to create a hybrid system that does not require pre-training on any dataset and can reconstruct far-field images surpassing the diffraction limit. The experimental setup and comparative results are illustrated in Fig. 68.

    Figure 68.Experimental comparisons of differential ghost imaging (DGI), GISC (GI using sparsity constraint), and GIDC in terms of both the sampling ratio and reconstruction SNR[327]. (a) Schematic diagram of the experimental setup. (b) Experimental results for binary objects. (c) Experimental results for a grayscale object. (d) Experimental results on a flying drone.
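
    For context, the correlation-based reconstructions that GIDC is compared against in Fig. 68 can be written in a few lines of numpy; the sketch below implements conventional GI and differential GI from random illumination patterns and bucket signals, and it is not the GIDC algorithm itself, which couples this physical forward model with an untrained neural network.

```python
import numpy as np

def gi_and_dgi(patterns, buckets):
    """Correlation reconstructions from M illumination patterns and single-pixel signals.

    patterns : array of shape (M, H, W), the speckle/illumination patterns
    buckets  : array of shape (M,), the corresponding single-pixel (bucket) values
    Returns the (GI, DGI) images, each of shape (H, W).
    """
    P = patterns.astype(np.float64)
    B = buckets.astype(np.float64)
    R = P.sum(axis=(1, 2))                     # total intensity of each pattern

    gi = np.tensordot(B - B.mean(), P - P.mean(axis=0), axes=1) / len(B)
    dgi = (np.tensordot(B, P, axes=1)
           - (B.mean() / R.mean()) * np.tensordot(R, P, axes=1)) / len(B)
    return gi, dgi

# Synthetic example: a bright bar imaged with 5000 random patterns.
# M, H, W = 5000, 64, 64
# obj = np.zeros((H, W)); obj[20:44, 28:36] = 1.0
# pats = np.random.rand(M, H, W)
# bks = (pats * obj).sum(axis=(1, 2))
# img_gi, img_dgi = gi_and_dgi(pats, bks)
```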

    The aim of lightweight network design is to strike a balance between computational efficiency and performance adaptability. Historically, network structures have been designed manually as complex, fixed configurations that require extensive experimentation and offer limited flexibility to adapt to the statistics of varied input images. In 2023, Park et al. introduced a dynamic residual self-attention network (DRSAN) for lightweight SISR[328]. The network adapts to input statistics through different combinations of residual features and incorporates a residual self-attention (RSA) module that boosts performance in conjunction with the existing structure, all without additional modules. This design approach and attention mechanism can be integrated seamlessly into other residual networks without requiring a complex network structure.

    However, given that image super-resolution reconstruction primarily relies on post-processing data, the results may differ from actual values. Bridging super-resolution reconstruction more closely with the imaging process to achieve genuine super-resolution is a promising direction for future research.

    As the final component of computational imaging technology, computational processing refines images through various algorithms to meet practical application demands. The evolution of computational imaging technology shifts the reliance from detectors to post-processing, potentially achieving the performance of higher-end detectors at reduced costs and even surpassing them, which holds significant implications for both scientific research and practical applications.

    5. Summary

    Computational optical imaging promotes the organic combination of traditional optical imaging and information processing. Driven by imaging information transmission and guided by the imaging purpose, it coordinates the integrated design of the whole imaging chain, upgrades the dimensionality of light-field information, and enhances the utilization and interpretation of that information. In doing so, it achieves revolutionary advantages that traditional optical imaging can hardly obtain, improving resolution, extending imaging distance, and enlarging the imaging FOV along different dimensions, and it is expected to enable disruptive applications such as optical cloud-penetrating imaging, depth imaging of living biological tissue, and NLOS imaging. The development of computational imaging technology is thus an effective way to break through the limitations of traditional photoelectric imaging and an inevitable choice for the future development of photoelectric imaging technology. This review has analyzed the acquisition and utilization of light-field information of different dimensions across the whole imaging chain, from the computational light source, computational medium, computational optical system, and computational detector to computational processing, and has systematically organized a complete research framework for the development of computational imaging technology and its influencing factors.

    Despite the rapid development of computational imaging technology, four problems still need to be solved urgently. (1) The basic theory of computational imaging is insufficient: most computational imaging technologies are still built on traditional photoelectric imaging theory, and a nonlinear complex-field and multidimensional physical-quantity detection theory specific to computational imaging has not yet been formed, leaving technological development without theoretical guidance. (2) The development direction is unclear, and the common basic problems and key technologies of computational imaging lack systematic exploration. (3) Research is fragmented: technical studies are isolated and poorly connected, understanding of computational imaging is confined to narrow fields, and development is easily limited to partial solutions. (4) Technology is difficult to translate into applications: there is no overarching framework that systematically guides the path from theory to technology to application, and the connection between technology supply and application demand is not smooth. In the future, by addressing these four problems and drawing on rapidly developing technologies such as freeform optics and deep learning, computational imaging can become a truly future-oriented imaging technology.

    [1] M. Born, E. Wolf. Principles of Optics: Electromagnetic Theory of Propagation, Interference and Diffraction of Light(2013).

    [3] W. S. Boyle. Information storage devices. U.S. patent(1974).

    [4] M. F. Tompsett. Charge transfer imaging devices. U.S. patent(1978).

    [18] R. W. Gerchberg. A practical algorithm for the determination of phase from image and diffraction plane pictures. Optik, 35, 237(1972).

    [19] R. W. Gerchberg, W. Saxton. Phase determination from image and diffraction plane pictures in the electron microscope. Optik, 34, 275(1971).

    [56] F. Willomitzer et al. Synthetic wavelength holography: an extension of Gabor’s holographic principle to imaging with scattered wavefronts(2019).

    [82] G. Satat, M. Tancik, R. Raskar. Towards photography through realistic fog(2018).

    [108] P. Mann et al. White light interference microscopy with color fringe analysis for quantitative phase imaging and 3-D step height measurement, JW2A.13(2020).

    [131] E. J. McCartney. Optics of the Atmosphere: Scattering by Molecules and Particles(1975).

    [134] Y. Y. Schechner, S. G. Narasimhan, S. K. Nayar. Instant dehazing of images using polarization(2001).

    [165] F. Heide et al. Diffuse mirrors: 3D reconstruction from diffuse indirect illumination using inexpensive time-of-flight sensors, 3222(2014).

    [168] K. L. Bouman et al. Turning corners into cameras: principles and methods, 2289(2017).

    [169] M. Baradad et al. Inferring light fields from shadows, 6267(2018).

    [171] A. B. Yedidia et al. Using unknown occluders to recover hidden scenes, 12223(2019).

    [172] B. Hassan. Polarization-Informed Non-Line-of-Sight Imaging on Diffuse surfaces(2019).

    [173] K. Tanaka, Y. Mukaigawa, A. Kadambi. Polarized non-line-of-sight imaging, 2133(2020).

    [174] T. Maeda et al. Thermal non-line-of-sight imaging, 1(2019).

    [187] A. Zaidi et al. Metasurface-enabled single-shot and complete Mueller matrix imaging. Nat. Photonics, 18, 704(2024).

    [191] J.-P. Souchon et al. Is there an ideal digital aerial camera?. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences(2006).

    [209] L. Wang et al. Compressive hyperspectral imaging with complementary RGB measurements(2016).

    [219] X. Shao et al. Study on optical swap computational imaging method, 119(2020).

    [221] C. A. Metzler et al. Deep optics for single-shot high-dynamic-range imaging, 137(2020).

    [234] R. Ng. Digital Light Field Photography(2006).

    [236] L. Kyle. Development of a 3-D Fluid Velocimetry Technique Based on Light Field Imaging(2011).

    [239] J. M. Rodríguez et al. The CAFADIS camera: a new tomographic wavefront sensor for adaptive optics, 05011(2010).

    [305] P. E. Debevec, J. M. Malik. Recovering high dynamic range radiance maps from photographs, 369(1997).

    [318] C. Dong et al. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell., 38, 29(2016).

    [330] Y. W. Tai et al. Coded exposure imaging for projective motion deblurring. IEEE Computer Society Conference on Computer Vision and Pattern Recognition(2010).

    [337] M. Gehrig et al. Event-based angular velocity regression with spiking networks. IEEE International Conference on Robotics and Automation (ICRA)(2020).

    Paper Information

    Category: Review Article

    Received: May 19, 2024

    Accepted: Jun. 20, 2024

    Published Online: Jul. 17, 2024

    The Author Email: Liu Fei (feiliu@xidian.edu.cn), Shao Xiaopeng (xpshao@opt.ac.cn)

    DOI:10.3788/AI.2024.20003
