Photonics Insights, Volume 3, Issue 3, R06 (2024)

Image reconstruction from photoacoustic projections

Chao Tian1,2,3,4, Kang Shen2, Wende Dong5, Fei Gao6,7, Kun Wang8, Jiao Li9, Songde Liu2,3, Ting Feng10, Chengbo Liu11, Changhui Li12,13, Meng Yang14,*, Sheng Wang3,*, and Jie Tian8,15,16,*
Author Affiliations
  • 1Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
  • 2School of Engineering Science, University of Science and Technology of China, Hefei, China
  • 3Department of Anesthesiology, the First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
  • 4Anhui Province Key Laboratory of Biomedical Imaging and Intelligent Processing, Hefei, China
  • 5College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China
  • 6School of Information Science and Technology, ShanghaiTech University, Shanghai, China
  • 7Shanghai Clinical Research and Trial Center, Shanghai, China
  • 8CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, China
  • 9School of Precision Instruments and Optoelectronics Engineering, Tianjin University, Tianjin, China
  • 10Academy for Engineering and Technology, Fudan University, Shanghai, China
  • 11Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  • 12Department of Biomedical Engineering, College of Future Technology, Peking University, Beijing, China
  • 13National Biomedical Imaging Center, Peking University, Beijing, China
  • 14Departments of Ultrasound, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
  • 15School of Engineering Medicine, Beihang University, Beijing, China
  • 16Key Laboratory of Big Data-Based Precision Medicine, Beihang University, Ministry of Industry and Information Technology, Beijing, China

    Photoacoustic computed tomography (PACT) is a rapidly developing biomedical imaging modality and has attracted substantial attention in recent years. Image reconstruction from photoacoustic projections plays a critical role in image formation in PACT. Here we review six major classes of image reconstruction approaches developed in the past three decades, including delay and sum, filtered back projection, series expansion, time reversal, iterative reconstruction, and deep-learning-based reconstruction. The principal ideas and implementations of the algorithms are summarized, and their reconstruction performances under different imaging scenarios are compared. Major challenges, future directions, and perspectives for the development of image reconstruction algorithms in PACT are also discussed. This review provides a self-contained reference guide for beginners and specialists in the photoacoustic community, to facilitate the development and application of novel photoacoustic image reconstruction algorithms.


    1 Introduction

    Photoacoustic tomography (PAT), also referred to as opto-acoustic tomography, is a cross-sectional or three-dimensional (3D) biomedical imaging technique. The physical foundation for PAT is the photoacoustic effect discovered in 1880 by Alexander Graham Bell[1], who found that modulated light can excite sound waves in materials. Modern PAT typically uses a short-pulsed laser to illuminate biological tissues. The laser energy is absorbed by tissue chromophores and converted into heat. Under the conditions of stress confinement and thermal confinement, acoustic signals are generated due to the thermoelastic effect and are recorded by ultrasound detectors to recover the optical absorption property of tissues[2–5]. By combining optical excitation and acoustic detection, PAT has the advantages of rich optical contrast and high ultrasonic resolution in deep tissues and has found unique applications in a range of biomedical fields[6–9].

    PAT has three major embodiments: photoacoustic computed tomography (PACT), photoacoustic microscopy (PAM), and photoacoustic endoscopy (PAE). Among them, PACT uses diffused light to illuminate biological tissues and can achieve hundred-micron-level imaging resolution at centimeter-level tissue depth[10–13]. Analogous to X-ray computed tomography, high-quality image formation in PACT relies on sophisticated image reconstruction procedures, without which the details in an image may not be properly resolved. In 1981, Bowen studied thermoacoustic imaging of soft tissues using ionizing radiation (high-energy electrons, X-ray photons, neutrons, and other charged particles) and non-ionizing radiation (radio waves, microwaves, and ultrasonic waves)[14–16] and presented one-dimensional (1D) depth-resolved thermoacoustic signals from a human upper arm using radio-wave excitation. However, the results are only 1D depth-resolved and have no lateral resolution. In the mid-1990s, Oraevsky[17–19] and Kruger[20,21] independently studied laser-induced photoacoustic imaging of biological tissues and presented experimental 1D signals and two-dimensional (2D) scan images (see Fig. 1). The results are depth-resolved but poorly laterally resolved due to the lack of image reconstruction procedures. To obtain cross-sectional or 3D images resolved both axially and laterally, Kruger et al. in 1995 developed an approximate filtered back projection (FBP) image reconstruction algorithm for laser-induced PACT analogous to the FBP algorithm used in X-ray computed tomography[22]. The results suggest that the approximate FBP algorithm helps resolve imaging targets both in the axial and lateral directions (see Fig. 1).


    Figure 1.Key events in the development of PACT image reconstruction algorithms.

    High-performance image reconstruction is critical to image formation in modern PACT imaging systems. Ever since the publication of the approximate FBP algorithm by Kruger and colleagues, several novel image reconstruction approaches have been proposed to reconstruct high-quality PACT images. According to whether the emerging deep learning technique is involved, image reconstruction approaches in PACT can be divided into two main categories: conventional image reconstruction methods and deep learning (DL)-based image reconstruction methods. Conventional image reconstruction methods do not involve deep learning and mainly include five classes of algorithms: FBP, delay and sum (DAS), series expansion (SE), time reversal (TR), and iterative reconstruction (IR).

    FBP is a class of analytical image reconstruction algorithms originating from X-ray computed tomography that project filtered photoacoustic signals back to target regions according to the time of flight (TOF) of photoacoustic signals. Kruger adopted this idea and proposed an approximate version of FBP for image reconstruction in PACT in 1995, as mentioned previously[22]. However, this algorithm can only achieve approximate image reconstruction. To reduce reconstruction error, Finch et al. developed an exact FBP algorithm for image reconstruction in a spherical detection geometry in 2004[23], and Xu and Wang developed a universal FBP algorithm for exact image reconstruction in infinite planar, cylindrical, and spherical detection geometries in 2005[24]. FBP algorithms are simple yet efficient and have been widely used for image reconstruction in PACT.

    DAS-based reconstruction is essentially a simplified version of the FBP algorithm, which directly projects measured photoacoustic signals back to target regions without filtering. It is one of the most widely used beamforming methods in ultrasound imaging and can potentially be adapted for image reconstruction in PACT. In 1998, Hoelen et al. first used a simple weighted DAS algorithm for image reconstruction in PACT and achieved good reconstruction results[25,26]. Several modified algorithms were subsequently developed to improve the reconstruction performance of DAS[27–30]. DAS-based reconstruction algorithms are typically fast and easy to implement. However, they are intuitive and empirical image reconstruction techniques and can only achieve approximate image reconstruction.
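    The core DAS operation described above can be sketched in a few lines of Python. The following is a minimal, unweighted 2D sketch (the function name, discretization, and nearest-sample interpolation are our own illustrative choices, not a specific published implementation): each pixel sums the detector samples whose TOF matches the pixel-detector distance.

```python
import numpy as np

def das_reconstruct(signals, det_pos, grid, v0, fs):
    """Naive delay-and-sum: sum, for every pixel, the sample measured by
    each detector at the time of flight from the pixel to that detector.

    signals: (n_det, n_t) array of measured photoacoustic signals
    det_pos: (n_det, 2) detector coordinates [m]
    grid:    (n_pix, 2) pixel coordinates [m]
    v0:      speed of sound [m/s];  fs: sampling rate [Hz]
    """
    n_det, n_t = signals.shape
    image = np.zeros(len(grid))
    for i, r in enumerate(grid):
        d = np.linalg.norm(det_pos - r, axis=1)   # pixel-detector distances
        idx = np.round(d / v0 * fs).astype(int)   # TOF in samples
        valid = idx < n_t
        image[i] = signals[np.where(valid)[0], idx[valid]].sum()
    return image
```

Weighting each sample by solid angle or detector directivity before the sum, as in the modified algorithms cited above, is a straightforward extension of this loop.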

    SE is a class of image reconstruction algorithms that use mathematical series to represent the image to be reconstructed. In 2001, Köstli et al. derived an exact SE-based inverse formula in the Fourier domain for image reconstruction in a planar detection geometry[31]. Kunyansky took a step further and derived SE reconstruction formulas for image reconstruction in circular, spherical, cylindrical, and cubic geometries[32,33]. Compared with other image reconstruction algorithms in PACT, SE algorithms may be computationally efficient for certain detection geometries (e.g., planar geometry) due to the application of the fast Fourier transform (FFT) algorithm during computation[31].

    TR algorithms recover photoacoustic images by running a forward numerical acoustic propagation model backward and re-transmitting the photoacoustic signals measured by each detector in a temporally reversed order. In 2004, Xu and Wang presented an analytical TR model for image reconstruction in arbitrary closed detection geometries[34]. Burgholzer et al. developed a numerical TR algorithm for image reconstruction in arbitrary detection geometries based on the finite difference method[35]. To improve computational efficiency and image quality, Treeby et al. further developed a numerical k-space pseudospectral TR algorithm for image reconstruction in heterogeneous media[34]. The TR algorithms can couple the acoustic properties of media (e.g., SOS, density, dispersion, and absorption) and can be used for image reconstruction in arbitrary closed detection geometries. Therefore, they are regarded as the “least restrictive” image reconstruction algorithms in PACT.

    The last class of conventional image reconstruction methods is IR. It solves the image reconstruction problem by iteratively seeking the optimal solution that minimizes the error between measured projection data and the estimate from constructed mathematical models. In 2002, Paltauf et al. proposed the first IR method to improve image reconstruction quality under non-ideal imaging conditions and achieved 3D image reconstruction with reduced artifacts[36]. After that, much research was conducted by different groups to improve the performance of IR, such as improving the computational accuracy of the system matrix in the model[37,38], compensating for the response of detectors[39,40], coupling the acoustic property of the media[41], and accelerating the reconstruction process[42,43]. Compared with other algorithms, IR algorithms are typically slow but can yield high-quality imaging results when only limited projection data are available. Therefore, they are more suitable for non-ideal imaging scenarios, such as limited-view imaging, sparse-view imaging, and imaging in acoustically heterogeneous media.
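    The model-based idea behind IR can be made concrete with a toy linear inverse problem. In this hypothetical Python sketch, a random matrix stands in for the PACT system matrix A, and a plain Landweber iteration x ← x + λA^T(y − Ax) minimizes the data mismatch ||Ax − y||²; real IR methods add regularization and physically accurate forward models.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(60, 20))            # toy system matrix (forward model)
x_true = rng.normal(size=20)             # ground-truth "image"
y = A @ x_true                           # simulated projection data

x = np.zeros(20)                         # initial image estimate
step = 1.0 / np.linalg.norm(A, 2) ** 2   # step size ensuring convergence
for _ in range(1000):
    x = x + step * A.T @ (y - A @ x)     # Landweber (gradient) update
```

With noisy data, the iteration is stopped early or a regularization term R(x) is added to the objective.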

    The second major category of image reconstruction approaches in PACT is fast-developing DL-based methods, which are inspired by the successful use of DL in a range of fields. DL-based methods typically train neural networks and use them to automatically transform input data into output photoacoustic images. Compared with conventional image reconstruction approaches, DL methods are more efficient and can handle more complicated scenarios. In 2018, Antholzer et al. first proposed a deep convolutional neural network (CNN)-based method for PACT image reconstruction under sparse-view sampling and opened new opportunities for intelligent image reconstruction in PACT[44]. After that, a series of DL-based methods was developed for image reconstruction using both simulated and experimental datasets[45–47]. State-of-the-art DL methods are powerful enough to achieve preprocessing in the data domain, postprocessing in the image domain, hybrid processing in both the data and image domains, learned IR, and direct image reconstruction from the data domain to the image domain[48–50]. They have been used to address a range of image reconstruction problems in PACT, such as detector bandwidth expansion[51,52], resolution enhancement[53,54], artifact removal[55], ultralow-laser-energy imaging[56], reconstruction acceleration[57], and reconstruction enhancement in sparse-view and limited-view imaging[44,46,47,58,59].

    Over the past three decades, great achievements have been made in PACT image reconstruction. To date, there have been a few excellent reviews devoted to summarizing the research progress in this field[60–63]. However, most review papers in the literature are from more than 10 years ago and cannot reflect the latest research advances[60–62]. Moreover, some review papers cover only particular subjects of the field, such as IR[63,64] or DL-based reconstruction[48–50,65–68]. There is thus a pressing need for new work that systematically reviews the recent achievements in this field. For the above reasons, we prepared this review, which differs from other works in the following aspects. First, the review was prepared from a historical perspective. We surveyed the development of each type of reconstruction algorithm and put them into the context of the entire history of PACT. Second, the review is comprehensive. It not only contains the five classes of conventional reconstruction algorithms (FBP, DAS, SE, TR, and IR) but also covers the state-of-the-art DL-based reconstruction algorithms. Third, the review contains comparative studies. A dedicated part (Sec. 5) was prepared to compare the performance of each type of image reconstruction algorithm in terms of image reconstruction quality, reconstruction speed, and memory footprint. Finally, the review is expected to be beginner-friendly. The entire review contains 51 figures, 14 tables, 126 mathematical equations, and comparative studies of each algorithm and is easy for novices to understand.

    The remainder of this review is organized as follows. Section 2 describes the forward problem of PACT, including photoacoustic signal generation, propagation, detection, and the mathematical foundation (i.e., the Radon transform) for image reconstruction. Section 3 reviews the basic principles of the five classes of conventional image reconstruction algorithms, namely, FBP, DAS, SE, TR, and IR. Section 4 surveys the DL-based image reconstruction methods from the perspective of preprocessing, postprocessing, hybrid processing, learned IR, and direct image reconstruction. Section 5 provides comparative studies of major image reconstruction algorithms in PACT. Section 6 highlights the major challenges and future directions for PACT image reconstruction. Finally, Sec. 7 offers concluding remarks. Figure 2, Table 1, and Table 2 list major topics, abbreviations, and symbols used in this review, respectively.


    Figure 2.Major topics discussed in this review.

    • Table 1. Abbreviations Used in this Review


      1D: One-dimensional
      2D: Two-dimensional
      3D: Three-dimensional
      CF: Coherence factor
      CNN: Convolutional neural network
      CNR: Contrast-to-noise ratio
      CT: Computed tomography
      DAS: Delay and sum
      DMAS: Delay multiply and sum
      DL: Deep learning
      EIR: Electrical impulse response
      FBP: Filtered back projection
      FFT: Fast Fourier transform
      GPU: Graphics processing unit
      IR: Iterative reconstruction
      MV: Minimum variance
      PACT: Photoacoustic computed tomography
      PAT: Photoacoustic tomography
      ROI: Region of interest
      SE: Series expansion
      SIR: Spatial impulse response
      SLSC: Short-lag spatial coherence
      SNR: Signal-to-noise ratio
      SOS: Speed of sound
      TOF: Time of flight
      TR: Time reversal
    • Table 2. Symbols Used in this Review


      Lowercase English letters
      b(r_d, t): Back-projection term
      b(r): The Kaiser–Bessel function
      d_c: Characteristic size of the heated region
      dσ: Element of a detection surface S
      dΩ: Solid angle subtended by dσ
      f: Frequency
      g(s, θ): Radon transform
      h_EIR: EIR of a detector
      h̃_EIR: Fourier transform of the EIR of a detector
      h_SIR: SIR of a detector
      h̃_SIR: Fourier transform of the SIR of a detector
      i: Imaginary unit
      j_n: The spherical Bessel function of the first kind of order n
      k_x, k_y, k_z: Spatial wavenumbers in the x, y, and z directions
      n: A general variable
      p_0(x, y, z): Initial photoacoustic pressure (image to be reconstructed)
      p(r, t): Photoacoustic signal at position r and time t
      p(r_d, t): Real photoacoustic signal measured by a detector
      p_ideal(r_d, t): Ideal photoacoustic signal measured by a detector
      s_i(t): Photoacoustic signal measured by the ith detector at time t
      t: Time
      u_k(r): Normalized eigenfunctions of the Dirichlet Laplacian
      v_0: Sound speed
      Uppercase English letters
      C_p: Specific heat capacity at constant pressure
      C_v: Specific heat capacity at constant volume
      F: Optical fluence
      G(r_d, t; r_s, t_s): The Green's function
      H: Heat deposited per unit volume
      H^(1)_|k|: The Hankel function of the first kind of order |k|
      I_n: The modified Bessel function of the first kind of order n
      J_|k|: The Bessel function of the first kind of order |k|
      K: Sampling length
      M: Total number of detectors
      N: Total number of image grid points
      N_x, N_y, N_z: Numbers of image grids along the x, y, and z axes
      P_0(k_x, k_y, k_z): Spatial Fourier transform of p_0(x, y, z)
      P(r_d, ω): Temporal Fourier transform of p(r_d, t)
      P_Ω: Poisson operator of harmonic extension
      R(x): Regularization
      S: A detection surface
      S_DAS: Image reconstructed by DAS
      T: Temperature
      V: Volume
      W(ω): Window function in the frequency domain
      Lowercase Greek letters
      α_0: Acoustic absorption coefficient
      α_th: Thermal diffusivity
      β: Thermal coefficient of volume expansion
      δ: Dirac delta function
      ϕ(r_d, t): Velocity potential
      η_th: Photothermal conversion efficiency
      η(r): Dispersion proportionality coefficient
      κ: Isothermal compressibility
      λ_m²: Eigenvalues of the Dirichlet Laplacian
      μ_a: Optical absorption coefficient
      ρ(r): Distribution of mass density
      ρ(r, t): Acoustic density
      ρ_0(r): Ambient density
      τ: Laser pulse duration
      τ_th: Thermal relaxation time
      τ_s: Stress relaxation time
      τ(r): Absorption proportionality coefficient
      ω: Angular frequency
      ψ(r): Expansion function
      Uppercase Greek letters
      Φ_λk: Free-space rotationally invariant Green's function
      Γ: Grüneisen parameter
      Ω: Solid angle of a detection surface or domain defined by a detection surface
      Δf: Frequency sampling interval
      Δp: Change in pressure
      Δt: Temporal sampling interval
      ΔT: Change in temperature
      ΔV: Change in volume
      Vectors or matrices
      n_d: Unit normal vector of a detector surface pointing to a photoacoustic source
      p: Photoacoustic signal in matrix form
      r = (x, y, z): Rectangular coordinates in space
      r = (R, φ, θ): Spherical coordinates in space
      r_d: Detector position
      r_s: Photoacoustic source position
      u(r, t): Particle velocity
      x: Photoacoustic image in matrix form
      A: System matrix
      A†: Pseudo-inverse matrix of A
      A*: Adjoint matrix of A
      Ã: The Fourier transform of A
      A^H: Conjugate transpose of A
      A^T: Transpose of A
      D: Differential matrix
      Ẽ: Fourier transform of the EIR of a detector
      G: Spherical Radon transformation matrix
      I: Identity matrix
      P: The Fourier transform of p
      R: Covariance matrix
      Other symbols
      A: Forward acoustic propagation operator
      A^TR_modify: Modified TR operator
      F: The Fourier transform
      F^(-1): The inverse Fourier transform
      H(r, t): Heating function
      ∇: Nabla operator

    It is worth pointing out that image reconstruction in PACT involves two aspects: acoustic inversion for determining the distribution of initial acoustic pressure and optical inversion for determining the map of optical absorption[61,62,69]. This review focuses on the acoustic inversion problem, which is important for image formation in PACT. Moreover, the mathematical foundation for image reconstruction in PACT is equivalent to that in thermoacoustic tomography[70–73]. Therefore, the image reconstruction algorithms discussed in this review can in principle be used for thermoacoustic tomography.

    2 The Forward Problem

    2.1 Photoacoustic Signal Generation

    To achieve efficient photoacoustic signal excitation and generation, two conditions, i.e., thermal confinement and stress confinement, should be satisfied. The two conditions are related to two timescales, i.e., the thermal relaxation time $\tau_{th}$ and the stress relaxation time $\tau_s$. The thermal relaxation time $\tau_{th}$ characterizes the diffusion of heat from the region heated by laser pulses, while the stress relaxation time $\tau_s$ describes the propagation of acoustic waves from the heated region. $\tau_{th}$ is given by[74]
    $$\tau_{th} = \frac{d_c^2}{\alpha_{th}},$$
    where $d_c$ is the characteristic dimension of the heated region and $\alpha_{th}$ is the thermal diffusivity (m²/s). $\tau_s$ is given by[74]
    $$\tau_s = \frac{d_c}{v_0},$$
    where $v_0$ is the speed of sound (SOS). If the laser pulse duration is less than the thermal relaxation time, i.e., $\tau < \tau_{th}$, the thermal diffusion is negligible during laser heating, in which case the thermal confinement condition can be considered to be satisfied. Similarly, if the laser pulse duration is less than the stress relaxation time, i.e., $\tau < \tau_s$, the pressure propagation can be ignored during pulse heating, in which case the stress confinement condition can be considered to be satisfied. In PACT, the thermal and stress confinement conditions should be satisfied simultaneously. For instance, assuming that in soft tissues, the SOS is 1500 m/s, the thermal diffusivity is 0.14 mm²/s, and the dimension of the heated region is 50 µm, the thermal relaxation time $\tau_{th}$ is calculated to be 18 ms, and the stress relaxation time $\tau_s$ is 33 ns. This indicates that the laser used for imaging should have a pulse duration of less than 33 ns to achieve efficient photoacoustic signal excitation and generation.
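    The soft-tissue example above can be verified directly. A minimal Python check using the stated parameter values:

```python
d_c = 50e-6           # characteristic dimension of the heated region [m]
alpha_th = 0.14e-6    # thermal diffusivity: 0.14 mm^2/s in m^2/s
v0 = 1500.0           # speed of sound in soft tissue [m/s]

tau_th = d_c**2 / alpha_th   # thermal relaxation time [s]
tau_s = d_c / v0             # stress relaxation time [s]
print(f"tau_th = {tau_th*1e3:.0f} ms, tau_s = {tau_s*1e9:.0f} ns")
# prints: tau_th = 18 ms, tau_s = 33 ns
```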

    In PACT, the local fractional volume expansion ΔV/V induced by the photoacoustic effect can be described as[74]
    $$\frac{\Delta V}{V} = \beta\Delta T - \kappa\Delta p,$$
    where ΔT and Δp represent the changes in temperature and pressure, respectively, β is the thermal coefficient of volume expansion, and κ is the isothermal compressibility, which can be written as
    $$\kappa = \frac{C_p}{\rho v_0^2 C_v},$$
    where ρ is the mass density and $C_p$ and $C_v$ are the specific heat capacities at constant pressure and volume, respectively.

    Under the conditions of thermal and stress confinement, the fractional volume expansion in Eq. (3) is negligible (i.e., ΔV/V = 0). The initial photoacoustic pressure $p_0 = \Delta p$ can thus be written as
    $$p_0 = \frac{\beta\Delta T}{\kappa},$$
    where the temperature rise can be calculated by[75]
    $$\Delta T = \frac{\eta_{th} H}{\rho C_v}.$$

    Here $\eta_{th}$ is the percentage of the laser energy converted into heat, and H is the heat energy deposited per unit volume, which is defined as the product of the absorption coefficient $\mu_a$ and the optical fluence F (i.e., $H = \mu_a F$). The initial acoustic pressure $p_0$ can thus be rewritten as
    $$p_0 = \left(\frac{\beta v_0^2}{C_p}\right)\eta_{th}H = \Gamma\eta_{th}H,$$
    where Γ is the Grüneisen parameter, a dimensionless constant representing the efficiency of the conversion of heat to pressure. For water at body temperature (37°C), Γ ≈ 0.2, which indicates that 20% of the thermal energy deposited by a laser in water couples into acoustic energy.

    The generation of photoacoustic signals can be illustrated using a numerical example. Assume that the laser employed for imaging has a fluence of 10 mJ/cm² (F = 10 mJ/cm²), which is within the American National Standards Institute (ANSI) safety limit[76], and the biological tissue being imaged has the following physical parameters: Γ = 0.2, $\mu_a$ = 0.1 cm⁻¹, ρ = 1.0 g/cm³, and $C_v$ = 4.0 J g⁻¹ K⁻¹. The factor $\eta_{th}$ can be set to 1 because the fluorescence and nonthermal absorption of biological tissues are typically weak. In this way, the temperature rise ΔT is calculated to be 0.25 mK and the initial acoustic pressure $p_0$ is 200 Pa, which indicates that a 1 mK temperature rise can produce an acoustic pressure rise of 800 Pa.
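    The same numerical example, written out in Python with the parameter values assumed above:

```python
Gamma = 0.2      # Grueneisen parameter of water at body temperature
mu_a = 0.1       # optical absorption coefficient [1/cm]
F = 10e-3        # laser fluence [J/cm^2] (10 mJ/cm^2, within the ANSI limit)
rho = 1.0        # mass density [g/cm^3]
C_v = 4.0        # specific heat capacity at constant volume [J/(g K)]
eta_th = 1.0     # photothermal conversion efficiency

H = mu_a * F                     # deposited energy density [J/cm^3]
dT = eta_th * H / (rho * C_v)    # temperature rise [K]
p0 = Gamma * eta_th * H * 1e6    # initial pressure [Pa]; 1 J/cm^3 = 1e6 Pa
print(f"dT = {dT*1e3:.2f} mK, p0 = {p0:.0f} Pa")
# prints: dT = 0.25 mK, p0 = 200 Pa
```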

    2.2 Photoacoustic Signal Propagation

    2.2.1 The photoacoustic wave equations

    The generation and propagation of photoacoustic waves can be mathematically modeled by the photoacoustic wave equation[77]. Generally, three first-order equations, including the linearized equation of motion, the linearized equation of continuity, and the thermal elastic equation, can be used to model the properties of acoustic wave generation and propagation[78]. If the medium is lossless and the thermal conductivity can be ignored, the three equations can be written as[79]
    $$\frac{\partial}{\partial t}\mathbf{u}(\mathbf{r},t) = -\frac{1}{\rho(\mathbf{r})}\nabla p(\mathbf{r},t),$$
    $$\nabla\cdot\mathbf{u}(\mathbf{r},t) = -\frac{1}{\rho(\mathbf{r})v_0^2(\mathbf{r})}\frac{\partial}{\partial t}p(\mathbf{r},t) + \beta\frac{\partial}{\partial t}\Delta T(\mathbf{r},t),$$
    $$\rho(\mathbf{r})C_p\frac{\partial}{\partial t}T(\mathbf{r},t) = H(\mathbf{r},t),$$
    where u(r,t) is the particle velocity, $v_0(\mathbf{r})$ is the SOS, ρ(r) is the mass density, p(r,t) is the acoustic pressure at position r and time t, T(r,t) and ΔT(r,t) are the temperature and temperature rise, respectively, and H(r,t) is the heating function that represents the thermal energy deposited per unit volume and per unit time. The second-order photoacoustic wave equation can be obtained by eliminating the variable u(r,t) from the three equations as[78,79]
    $$\left\{\rho(\mathbf{r})\nabla\cdot\left[\frac{1}{\rho(\mathbf{r})}\nabla\right] - \frac{1}{v_0^2(\mathbf{r})}\frac{\partial^2}{\partial t^2}\right\}p(\mathbf{r},t) = -\frac{\beta}{C_p}\frac{\partial H(\mathbf{r},t)}{\partial t}.$$

    Equation (11) describes the relationship between the acoustic pressure p(r,t) and the photoacoustic source associated with the heating function H(r,t). The source term [right side of Eq. (11)] is proportional to the first derivative of the heating function H(r,t), which indicates that the heating function H(r,t) should be time-varying to achieve efficient photoacoustic signal generation.

    When the medium is acoustically homogeneous, the photoacoustic wave equation in Eq. (11) reduces to[78,79]
    $$\left(\nabla^2 - \frac{1}{v_0^2}\frac{\partial^2}{\partial t^2}\right)p(\mathbf{r},t) = -\frac{\beta}{C_p}\frac{\partial H(\mathbf{r},t)}{\partial t}.$$

    2.2.2 Forward Green’s function solution

    The forward solution for the wave equation in a homogeneous medium in Eq. (12) can be obtained using Green's function[75,80] as
    $$p(\mathbf{r}_d,t) = \frac{\beta}{C_p}\int_{-\infty}^{+\infty}\int_V G(\mathbf{r}_d,t;\mathbf{r}_s,t_s)\frac{\partial H(\mathbf{r}_s,t_s)}{\partial t_s}\,\mathrm{d}\mathbf{r}_s\,\mathrm{d}t_s,$$
    where $p(\mathbf{r}_d,t)$ is the acoustic pressure detected at position $\mathbf{r}_d$ and time t, and $\mathbf{r}_s$ and $t_s$ represent the location and time of the photoacoustic source, respectively. The Green's function $G(\mathbf{r}_d,t;\mathbf{r}_s,t_s)$ in an infinite 3D space has the following form:
    $$G(\mathbf{r}_d,t;\mathbf{r}_s,t_s) = \frac{\delta(t-t_s-|\mathbf{r}_d-\mathbf{r}_s|/v_0)}{4\pi|\mathbf{r}_d-\mathbf{r}_s|},$$
    where δ is the Dirac delta function.

    Under the condition of stress confinement, the heating function can be decomposed as $H(\mathbf{r}_s,t_s) = H(\mathbf{r}_s)\delta(t_s)$, where $H(\mathbf{r}_s)$ is the heat deposited per unit volume. In this way, Eq. (13) becomes
    $$p(\mathbf{r}_d,t) = \frac{\beta}{C_p}\int_{-\infty}^{+\infty}\int_V G(\mathbf{r}_d,t;\mathbf{r}_s,t_s)H(\mathbf{r}_s)\delta'(t_s)\,\mathrm{d}\mathbf{r}_s\,\mathrm{d}t_s,$$
    where δ′ is the derivative of the Dirac delta function. Using the property $\int \delta'(t-t_0)f(t)\,\mathrm{d}t = -f'(t_0)$, Eq. (15) becomes
    $$p(\mathbf{r}_d,t) = \frac{\beta}{C_p}\int_V H(\mathbf{r}_s)\left.\frac{\partial G(\mathbf{r}_d,t;\mathbf{r}_s,t_s)}{\partial t}\right|_{t_s=0}\mathrm{d}\mathbf{r}_s.$$

    Applying the relation in Eq. (7) ($\eta_{th}$ is set to 1),
    $$p_0(\mathbf{r}_s) = \Gamma H(\mathbf{r}_s) = \left(\frac{\beta v_0^2}{C_p}\right)H(\mathbf{r}_s),$$
    and substituting the Green's function [Eq. (14)] into Eq. (16), we obtain the general forward solution of the photoacoustic wave equation, i.e.,
    $$p(\mathbf{r}_d,t) = \frac{1}{4\pi v_0^2}\frac{\partial}{\partial t}\int_V \frac{p_0(\mathbf{r}_s)}{|\mathbf{r}_d-\mathbf{r}_s|}\,\delta\!\left(t-\frac{|\mathbf{r}_d-\mathbf{r}_s|}{v_0}\right)\mathrm{d}\mathbf{r}_s.$$

    The measured pressure $p(\mathbf{r}_d,t)$ is associated with the velocity potential $\phi(\mathbf{r}_d,t)$ via[81]
    $$p(\mathbf{r}_d,t) = -\rho\frac{\partial \phi(\mathbf{r}_d,t)}{\partial t}.$$

    The velocity potential can be written as
    $$\phi(\mathbf{r}_d,t) = -\frac{1}{4\pi v_0^2\rho}\int_V \frac{p_0(\mathbf{r}_s)}{|\mathbf{r}_d-\mathbf{r}_s|}\,\delta\!\left(t-\frac{|\mathbf{r}_d-\mathbf{r}_s|}{v_0}\right)\mathrm{d}\mathbf{r}_s.$$

    Equations (18) and (20) indicate that the measured pressure $p(\mathbf{r}_d,t)$ and velocity potential $\phi(\mathbf{r}_d,t)$ are linearly related to the initial acoustic pressure $p_0(\mathbf{r}_s)$ and are inversely proportional to the distance between the detector and the source, i.e., $|\mathbf{r}_d-\mathbf{r}_s|$.

    2.2.3 Photoacoustic signal of a spherical absorber

    The concepts of acoustic pressure and velocity potential can be illustrated through a simple example[81]. Suppose that the photoacoustic source is a uniform sphere and that the detector is a point detector. We examine the velocity potential and acoustic pressure received by the point detector in two situations. In the first situation, the point detector is located at the center of the photoacoustic source, as shown in Fig. 3(a). The velocity potential can be obtained as [Eq. (20)]
    $$\phi(\mathbf{r}_d,t) = \begin{cases} -\dfrac{p_0(\mathbf{r}_s)}{v_0\rho}\,v_0 t, & \text{if } v_0 t \le r_a,\\[4pt] 0, & \text{if } v_0 t > r_a,\end{cases}$$
    where $r_a$ is the radius of the spherical source. In the second situation, the point detector is located outside the photoacoustic source, as shown in Fig. 3(d). The velocity potential can be calculated as
    $$\phi(\mathbf{r}_d,t) = \begin{cases} \dfrac{p_0(\mathbf{r}_s)}{4v_0\rho d}\left[(d-v_0 t)^2 - r_a^2\right], & \text{if } d-r_a \le v_0 t \le d+r_a,\\[4pt] 0, & \text{otherwise},\end{cases}$$
    where d is the distance between the point detector and the center of the photoacoustic source. The acoustic pressure $p(\mathbf{r}_d,t)$ in these two cases can be accordingly calculated using the relationship in Eq. (19).


    Figure 3.Velocity potential and acoustic pressure generated from a 4-mm-diameter spherical source. The first and second rows show the results when the detector is located at the center of the source and is 10 mm away from the center of the source, respectively. (a), (d) Schematic diagrams showing the point detector and the spherical source. (b), (e) Negative velocity potentials at the point detector. (c), (f) Corresponding acoustic pressures.

    To quantify the velocity potential and acoustic pressure measured by the point detector, we make the following assumptions: the radius of the spherical source is 4 mm ($r_a$ = 4 mm), the mass density of the medium is 1000 kg/m³ (ρ = 1000 kg/m³), the SOS is 1480 m/s ($v_0$ = 1480 m/s), the Grüneisen parameter at room temperature is 0.12 (Γ = 0.12), the optical absorption coefficient $\mu_a$ is 100 cm⁻¹, and the optical fluence is 10 mJ/cm² (F = 10 mJ/cm²). Therefore, the heat energy H deposited at time zero is 10⁶ J/m³, and the initial acoustic pressure $p_0(\mathbf{r}_s)$ [Eq. (7)] is 1.2×10⁵ Pa.

    Figures 3(b) and 3(c) show the velocity potential and acoustic pressure measured by the point detector in the first case (the point detector is located at the center of the source). The results show that the negative velocity potential ϕ(rd,t) measured by the point detector first linearly increases as the shell of the integration [Eq. (20)] increases with time and then suddenly drops to zero due to no energy deposition outside the source. Consequently, the acoustic pressure p(rd,t) (i.e., the first derivative of the velocity potential) is a non-zero constant at the beginning and becomes zero with a negative impulse caused by the sudden drop of velocity potential to zero. Figures 3(e) and 3(f) show the velocity potential and acoustic pressure measured by the point detector in the second case (the point detector is located 10 mm away from the center of the source). The results show that the negative velocity potential ϕ(rd,t) measured by the point detector first increases as the incremental hemispherical shell of the integration [Eq. (20)] advances to the center of the source, and then decreases with time as the shell advances to the rear of the source. Consequently, the acoustic pressure p(rd,t) is initially positive, passes through zero, then becomes negative, and has an N-shaped waveform.
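    The N-shaped waveform of the second case can be reproduced numerically from the velocity potential via p = −ρ∂ϕ/∂t. Below is a sketch using the parameter values stated above (source radius 4 mm, detector 10 mm from the center); the time grid and the use of numerical differentiation in place of the analytical derivative are our own choices.

```python
import numpy as np

p0, ra, d = 1.2e5, 4e-3, 10e-3   # initial pressure [Pa], source radius [m], detector distance [m]
v0, rho = 1480.0, 1000.0         # speed of sound [m/s], mass density [kg/m^3]

t = np.linspace(0, 12e-6, 4000)  # time axis [s]
R = v0 * t                       # radius of the integration shell [m]

# velocity potential of Eq. (22): parabolic within d - ra <= v0*t <= d + ra
phi = np.where((R >= d - ra) & (R <= d + ra),
               p0 / (4 * v0 * rho * d) * ((d - R)**2 - ra**2),
               0.0)
p = -rho * np.gradient(phi, t)   # acoustic pressure, Eq. (19)
```

The resulting pressure rises to +p0·ra/(2d) ≈ 24 kPa at the leading edge, crosses zero when the shell passes the source center, and mirrors to about −24 kPa at the trailing edge, i.e., the N-shape of Fig. 3(f).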

    Based on the example in Fig. 3, we can also deduce that the size of a photoacoustic source affects the characteristics of the photoacoustic signals received by a detector in the time domain. To illustrate this, Fig. 4(a) shows three time-domain photoacoustic signals measured by a point detector 3 mm away from the centers of three uniform spherical sources with diameters of 1 mm, 200 µm, and 50 µm. The time-domain photoacoustic signals all have an N-shaped waveform, as expected. However, the amplitudes and durations of the photoacoustic signals are distinct. The photoacoustic signals produced by sources with a smaller size have smaller amplitudes and shorter durations, which correspond to higher frequencies and broader bandwidths in the frequency domain, as shown in Fig. 4(b). The center frequency $f_c$ of the photoacoustic signals generated by a spherical source can be estimated by $f_c = v_0/(3r_a)$, where $r_a$ is the radius of the spherical source[82]. This indicates that detectors with a high center frequency should be employed to image small sources.
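    The estimate $f_c = v_0/(3r_a)$ applied to the three sources of Fig. 4 (assuming a SOS of 1500 m/s) gives:

```python
v0 = 1500.0                        # speed of sound [m/s]
diameters = (1e-3, 200e-6, 50e-6)  # source diameters in Fig. 4 [m]
fcs = [v0 / (3 * (dia / 2)) for dia in diameters]
for dia, fc in zip(diameters, fcs):
    print(f"{dia*1e6:.0f} um source: fc = {fc/1e6:.1f} MHz")
# prints: 1000 um source: fc = 1.0 MHz
#         200 um source: fc = 5.0 MHz
#         50 um source: fc = 20.0 MHz
```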

    Figure 4. Signals of spherical photoacoustic sources with different sizes and their Fourier spectra. (a) Time-domain N-shaped photoacoustic signals generated from three spherical sources with diameters of 1 mm, 200 µm, and 50 µm. (b) Normalized Fourier spectra of the corresponding photoacoustic signals. Reprinted from Ref. [82] with permission.

    2.2.4 Photoacoustic field visualization: the k-Wave toolbox

    Numerical simulation of the forward propagation of photoacoustic fields helps visualize the sound fields in complex media and solve the acoustic inversion problem in PACT. Numerical simulation can be implemented using the powerful open-source k-Wave toolbox[83], which was developed by the Photoacoustic Imaging Group at University College London (UCL). In the k-Wave toolbox, the propagation of photoacoustic fields is modeled by three coupled first-order partial differential equations[84], which are fundamentally the same as the three equations in Eqs. (8)–(10). The propagation model considers the acoustic properties of media, such as acoustic speed, dispersion, and absorption, and can characterize the acoustic propagation problem in heterogeneous media. The k-Wave toolbox solves the propagation model via a k-space pseudospectral method[83,85,86], which can perform fast and accurate computations with reduced memory. Figure 5 is an example showing the propagation of the photoacoustic fields of a 2D disk and a 3D sphere using the k-Wave toolbox. The results visualize the instant characteristics of the photoacoustic fields during propagation.
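The heart of the k-space pseudospectral approach can be illustrated, under strong simplifications, for a lossless homogeneous medium: the second-order wave equation ptt = c²∇²p advances exactly in the spatial-frequency domain via the recurrence P^{n+1} = 2cos(c|k|Δt)P^n − P^{n−1}. Note that k-Wave itself solves the coupled first-order equations with absorption and heterogeneous media, so this numpy sketch is only a conceptual miniature, not a substitute for the toolbox:

```python
import numpy as np

# Grid and medium (nondimensional units for illustration)
N, dx, c, dt, steps = 128, 1.0, 1.0, 0.5, 80

# Initial pressure: a small disk at the grid center
yy, xx = np.mgrid[:N, :N]
p_init = ((xx - N//2)**2 + (yy - N//2)**2 <= 4**2).astype(float)

# Wavenumber magnitude |k| on the FFT grid
k1 = 2*np.pi*np.fft.fftfreq(N, d=dx)
k = np.sqrt(k1[None, :]**2 + k1[:, None]**2)

# Exact stepping of p_tt = c^2 * laplacian(p) in Fourier space:
# P^{n+1} = 2*cos(c*|k|*dt)*P^n - P^{n-1}, with dp/dt = 0 at t = 0
prop = 2*np.cos(c*k*dt)
P_prev = np.fft.fft2(p_init)
P_curr = 0.5*prop*P_prev            # first step: P^1 = cos(c*|k|*dt)*P^0
for _ in range(steps - 1):
    P_prev, P_curr = P_curr, prop*P_curr - P_prev

p = np.real(np.fft.ifft2(P_curr))   # pressure field at t = steps*dt
```

After 80 steps the field shows an outgoing circular wavefront near radius c·t = 40 grid points, qualitatively matching the 2D disk propagation shown in Fig. 5(a); the Fourier-space stepping is unconditionally stable and dispersion-free for this homogeneous case, which is the property the k-space method exploits.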

    Figure 5. Photoacoustic field visualization using the k-Wave toolbox. (a) Propagation of photoacoustic fields generated from a 2D disk (diameter: 2 mm). Red: positive pressure; blue: negative pressure. (b) Propagation of photoacoustic fields generated from a 3D sphere (diameter: 2 mm). For observation, negative pressure is not displayed in this case.

    2.3 Photoacoustic Signal Detection

    The photoacoustic signals propagating outward from a photoacoustic source need to be captured by ultrasound detectors for image formation. Ideally, detector arrays that can perfectly record the original signals in time and space should be used for photoacoustic signal detection. However, such perfect detector arrays do not exist in practice. The characteristics of a practical detector array, such as the detector aperture, detector bandwidth, detector number, and view angle, affect the detected photoacoustic signals and the final images.

    Aperture and bandwidth are two fundamental characteristics of an ultrasound detector and have important impacts on measured photoacoustic signals. An ideal ultrasound detector should have a point-like aperture so that its response is omnidirectional. Moreover, an ideal ultrasound detector should also have an infinite bandwidth so that it can respond to all frequency contents of a signal. Neither of the two conditions, however, is attainable in practice. A practical ultrasound detector always has a finite aperture size and a finite bandwidth. The finite aperture averages photoacoustic signals in space, resulting in a smoothed spatial impulse response (SIR), while the finite bandwidth affects the conversion of photoacoustic signals to electrical signals, leading to a degraded electrical impulse response (EIR). Taking the SIR and EIR of an ultrasound detector into account, the photoacoustic signal measured by a practical ultrasound detector can be mathematically formulated as[4] p(rd,t) = pideal(rd,t) * hSIR(rd,rs,t) * hEIR(t), where pideal(rd,t) is the ideal photoacoustic signal, * represents temporal convolution, and hSIR(rd,rs,t) and hEIR(t) represent the SIR and EIR of a detector, respectively. The non-ideal SIR and EIR of an ultrasound detector distort photoacoustic signals measured in the time domain, as illustrated in Fig. 6, which eventually degrades the quality of reconstructed photoacoustic images.
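The EIR part of this convolution model is easy to illustrate numerically. The sketch below convolves an idealized N-shaped signal with an assumed Gaussian-modulated sinusoid standing in for the EIR; the 1 MHz center frequency, pulse widths, and sampling rate are illustrative choices, not values taken from this review:

```python
import numpy as np

fs = 50e6                              # sampling rate: 50 MHz
t = np.arange(0, 4e-6, 1/fs)

# Idealized N-shaped signal (bipolar ramp), duration ~0.67 us
t0, T = 2e-6, 0.67e-6
ideal = np.where(np.abs(t - t0) < T/2, -(t - t0)/(T/2), 0.0)

# Assumed EIR: Gaussian-modulated sinusoid with a 1 MHz center frequency
f0, sigma = 1e6, 0.5e-6
te = np.arange(-2e-6, 2e-6, 1/fs)
eir = np.sin(2*np.pi*f0*te)*np.exp(-te**2/(2*sigma**2))

# Measured signal: ideal signal convolved (temporally) with the EIR
measured = np.convolve(ideal, eir, mode="same")/fs

# The band-limited EIR pulls the spectral peak toward its center frequency
freqs = np.fft.rfftfreq(t.size, 1/fs)
peak_f = freqs[np.argmax(np.abs(np.fft.rfft(measured)))]
```

The convolved waveform rings at roughly the detector's center frequency instead of preserving the sharp N-shape, which is the kind of time-domain distortion shown in Fig. 6(c).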

    Figure 6. Effects of the SIR and EIR of a detector on photoacoustic signals. (a) Schematic diagram showing a photoacoustic source and a circular detector array. The photoacoustic source is a 1-mm-diameter sphere and is 10 mm away from the center of the detector array, which has a radius of 25 mm. (b) Effect of the finite aperture size (SIR) on the photoacoustic signals. In this case, the height and width of each detector are set to 10 and 5 mm, respectively. (c) Effect of finite bandwidth on the photoacoustic signals. In this case, the center frequency and fractional bandwidth of the detector are set to 1 MHz and 100%, respectively.

    In addition to the aperture and bandwidth of a detector, the detector number and view angle of a detector array also play important roles in photoacoustic signal detection. Ideally, the number of detectors in a detector array used for photoacoustic signal measurement should meet the spatial Nyquist sampling theorem for complete signal acquisition[4]. The view angle of a detector array should be 4π steradian (full 3D view) to record complete photoacoustic signals in 3D space. However, these two conditions are actually unattainable in practice due to the high fabrication cost of a large number of detectors and the configuration constraints of an imaging system (e.g., a separate space is required for laser illumination). Violating these conditions leads to the problems of image reconstruction from sparse-view and limited-view projections, which are mathematically challenging and will be discussed later in this review. Figure 7 showcases commonly used 2D and 3D detection geometries in PACT[87,88], which have limited view angles except for the closed spherical array.
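As a rough illustration of the spatial Nyquist requirement, the sketch below estimates how many elements a full-ring array would need so that the element pitch stays below half the shortest acoustic wavelength; the 25 mm radius, 5 MHz maximum frequency, and 1500 m/s sound speed are assumed example values, not system specifications from the text:

```python
import math

v0 = 1500.0     # speed of sound (m/s)
f_max = 5e6     # assumed maximum signal frequency (Hz)
R = 25e-3       # assumed full-ring array radius (m)

lam_min = v0/f_max              # shortest wavelength: 0.3 mm
pitch_max = lam_min/2           # spatial Nyquist pitch: 0.15 mm
n_detectors = math.ceil(2*math.pi*R/pitch_max)
print(n_detectors)              # on the order of a thousand elements
```

Element counts of this magnitude are exactly why practical arrays undersample in space, leading to the sparse-view reconstruction problem mentioned above.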

    Figure 7. Commonly used detection geometries in PACT. (a) Linear array. (b) Curved array. (c) Circular array. (d) Planar array. (e) Cylindrical array. (f) Hemispherical array. (g) Closed spherical array. Reprinted from Refs. [87,88] with permission.

    2.4 Radon Transform

    The forward signal propagation and detection processes in PACT can be described by the well-known Radon transform[89], which is the mathematical foundation of computed tomography (CT).

    Before discussing the Radon transform in PACT, we first introduce the linear Radon transform in X-ray CT, which is defined as the integral of a function along a straight line. Specifically, the linear Radon transform in X-ray CT is written as[90] g(s,θ) = ∬ f(x,y) δ(x cos θ + y sin θ − s) dx dy, where f(x,y) is the original function, g(s,θ) is the sinogram or projection data of the function f(x,y) along the straight line defined in the Dirac delta function δ, and (s,θ) are the two parameters of the normal equation of the straight line in the delta function. The inverse Radon transform reverses the forward process and recovers the original function f(x,y) from measured sinograms g(s,θ). A schematic representation of the linear Radon transform is shown in Fig. 8(a).
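The line-integral definition can be discretized directly: sample f along the line x cos θ + y sin θ = s and sum. The sketch below does this with bilinear interpolation; for a disk centered at the origin the projections are the same at every angle, which serves as a sanity check (grid size and disk radius are arbitrary choices):

```python
import numpy as np

def radon_line(f, s_vals, thetas, n_u=256):
    """Discrete linear Radon transform: sample f along the line
    x*cos(theta) + y*sin(theta) = s (bilinear interpolation) and sum."""
    ny, nx = f.shape
    cx, cy = (nx - 1)/2, (ny - 1)/2
    u = np.linspace(-max(nx, ny)/2, max(nx, ny)/2, n_u)
    du = u[1] - u[0]
    g = np.zeros((len(s_vals), len(thetas)))
    for j, th in enumerate(thetas):
        for i, s in enumerate(s_vals):
            # points (s cos th - u sin th, s sin th + u cos th) along the line
            xs = cx + s*np.cos(th) - u*np.sin(th)
            ys = cy + s*np.sin(th) + u*np.cos(th)
            x0f, y0f = np.floor(xs), np.floor(ys)
            ok = (x0f >= 0) & (x0f < nx - 1) & (y0f >= 0) & (y0f < ny - 1)
            x0 = np.clip(x0f, 0, nx - 2).astype(int)
            y0 = np.clip(y0f, 0, ny - 2).astype(int)
            wx, wy = xs - x0, ys - y0
            val = ((1 - wx)*(1 - wy)*f[y0, x0] + wx*(1 - wy)*f[y0, x0 + 1]
                   + (1 - wx)*wy*f[y0 + 1, x0] + wx*wy*f[y0 + 1, x0 + 1])
            g[i, j] = np.sum(val*ok)*du
    return g

# Projections of a centered disk are (nearly) identical at every angle
N = 64
y, x = np.mgrid[:N, :N]
disk = ((x - (N - 1)/2)**2 + (y - (N - 1)/2)**2 <= 10**2).astype(float)
sino = radon_line(disk, s_vals=np.linspace(-20, 20, 41),
                  thetas=np.linspace(0, np.pi, 6, endpoint=False))
```

The central projection value approximates the chord length through the disk center (20 pixels for a radius-10 disk), as the line integral of an indicator function should.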

    Figure 8. Forward and inverse Radon transforms in X-ray CT and PACT. (a) Linear Radon transform and its inverse in X-ray CT. (b), (c) Circular and spherical Radon transforms and their inverses in PACT.

    In PACT and thermoacoustic tomography, the Radon transform integrates a function along a circle (2D) or a sphere (3D) instead of a straight line. The 2D circular Radon transform can be written as[91] g(s,θ) = ∬ f(x,y) δ[s − √((x − r cos θ)² + (y − r sin θ)²)] dx dy, where (r,θ) represents the position of the detector in a polar coordinate system and s is the radius of the integral circle. Similarly, the 3D spherical Radon transform g(s,θ) can be mathematically written as[92] g(s,θ) = ∭ f(x,y,z) δ[s − √((x − r sin φ cos θ)² + (y − r sin φ sin θ)² + (z − r cos φ)²)] dx dy dz, where (r, φ, θ) represents the position of the detector in a spherical coordinate system and s is the radius of the spherical shell to be integrated. Schematic representations of the circular and spherical Radon transforms are shown in Figs. 8(b) and 8(c).

    Using the definition in Eq. (26), the spherical Radon transform in PACT can be obtained from Eq. (18) and has the following form: g(rd,t) = ∫ p0(rs) δ(t − |rd − rs|/v0) drs.

    Equation (27) is in the form of the spherical Radon transform [Eq. (26)], representing the integral over a spherical shell with a radius of v0t centered at the detector position rd. The projection data g(rd,t) are related to the measured photoacoustic signal p(rd,t) via g(rd,t) = 4πv0²|rd − rs| ∫₀ᵗ p(rd,τ) dτ.

    The projection data g(rd,t) have a definition similar to that of the velocity potential in Eq. (19) and can be calculated from the measured photoacoustic signal p(rd,t). The task of image reconstruction in PACT is to find the inverse spherical Radon transform and recover the original function f(x,y) or p0(rs) from the measured photoacoustic signal p(rd,t), which is the focus of the following sections.
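The circular Radon transform above can likewise be discretized by binning pixels into thin shells around each radius s centered at the detector. In the sketch below (all positions and sizes are illustrative), the projection of a small disk source peaks at the shell whose radius equals the detector-to-source distance, mirroring the TOF interpretation of the spherical-shell integral:

```python
import numpy as np

def circular_radon(f, det, radii, dr=1.0):
    """Discrete circular Radon transform: sum image f over thin shells
    of radius s (width dr) centered at the detector position det = (x, y)."""
    ny, nx = f.shape
    yy, xx = np.mgrid[:ny, :nx]
    dist = np.hypot(xx - det[0], yy - det[1])
    return np.array([f[np.abs(dist - s) < dr/2].sum() for s in radii])

# Small disk source and a detector outside it
N = 128
yy, xx = np.mgrid[:N, :N]
f = ((xx - 80)**2 + (yy - 64)**2 <= 3**2).astype(float)   # source near (80, 64)
det = (64.0, 4.0)                                         # detector at (64, 4)
radii = np.arange(1.0, 100.0, 1.0)
g = circular_radon(f, det, radii)

# The projection peaks at the shell passing through the source
s_peak = radii[np.argmax(g)]
```

Since each pixel falls into exactly one shell, the total of all projections equals the total image intensity, a simple conservation check for the discretization.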

    3 Conventional Approaches

    3.1 DAS-Type Algorithms

    Delay and sum (DAS)-based beamforming is a commonly used image reconstruction technique in ultrasound imaging[93]. Due to similar image formation processes, DAS-based beamforming is also widely used in PACT, where it reconstructs an image by summing the delayed raw photoacoustic signals of each detector. The time delay between each channel is calculated according to the acoustic TOF of the photoacoustic signals from the point of interest to each ultrasound detector. To yield high-quality images, preprocessing of raw photoacoustic signals and/or postprocessing of reconstructed images are usually needed. Here we review the five most commonly used DAS-type image reconstruction algorithms: DAS, delay multiply and sum (DMAS), short-lag spatial coherence (SLSC), minimum variance (MV), and coherence factor (CF), which adopt different preprocessing and/or postprocessing strategies. The basic workflows of these methods are first illustrated in Fig. 9, and detailed descriptions are presented below.

    Figure 9. Workflows of different DAS-based image reconstruction algorithms in PACT. (a) DAS. (b) DMAS. (c) SLSC (M=3, L=1, l=1, t1=t2=t). (d) MV. (e) CF-DAS; sign denotes the signum function and sqrt denotes the square root.

    3.1.1 Delay and sum

    DAS is the most basic beamforming algorithm in ultrasound imaging due to its simplicity, speed, and robustness[94–96]. Hoelen et al. introduced a DAS method for 3D PACT imaging of blood vessels based on a planar detection geometry in 1998[26,97]. Feng et al. applied a DAS method in linear-scanning thermoacoustic tomography in 2001[98]. The DAS method can be mathematically formulated as SDAS(x,z) = Σ_{i=1}^{M} si(t), where SDAS is the reconstructed image, si(t) is the photoacoustic signal measured by the ith detector at time t, M is the total number of detectors, and (x,z) represents the position in a coordinate system. The variable t in si(t) denotes the TOF of the photoacoustic signals from position (x,z) to the ith detector. The workflow and principle of the DAS algorithm are illustrated in Figs. 9(a) and 10, respectively.
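The DAS formula can be sketched end to end in a few lines: signals from a simulated point source are delayed according to the TOF from each pixel to each detector and summed. All geometry values below (array width, source position, sampling rate, idealized delta-pulse signals) are arbitrary illustrative choices:

```python
import numpy as np

# --- Simulate photoacoustic signals from one point source (toy model) ---
v0 = 1.0                             # speed of sound (arbitrary units)
fs = 40.0                            # temporal sampling rate
det_x = np.linspace(-20, 20, 64)     # linear array along z = 0
src = (3.0, 15.0)                    # point source at (x, z)

n_t = 1600
signals = np.zeros((det_x.size, n_t))
for i, xd in enumerate(det_x):
    tof = np.hypot(src[0] - xd, src[1])/v0
    signals[i, int(round(tof*fs))] = 1.0     # idealized delta pulse

# --- DAS: S(x, z) = sum_i s_i(TOF_i(x, z)) ---
xs = np.linspace(-10, 10, 81)
zs = np.linspace(5, 25, 81)
image = np.zeros((zs.size, xs.size))
for iz, z in enumerate(zs):
    for ix, xp in enumerate(xs):
        idx = np.round(np.hypot(xp - det_x, z)/v0*fs).astype(int)
        image[iz, ix] = signals[np.arange(det_x.size), idx].sum()

zi, xi = np.unravel_index(np.argmax(image), image.shape)
```

The brightest pixel lands at the source position, where the delayed samples from all 64 channels coincide; elsewhere only a few channels line up, which is the mechanism behind both the main lobe and the side lobes discussed below.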

    Figure 10. Principle of DAS-based image reconstruction.

    The performance of the DAS algorithm was evaluated using a designed numerical phantom with multiple point sources equally distributed along the longitudinal centerline [Fig. 11(a)]. In the evaluation, a linear detector array with a width of 200 mm and 128 elements was placed at the top of the phantom to receive the photoacoustic signals generated from the sources. Figure 11(b) shows the image reconstructed by the DAS method. The results show that DAS can successfully reconstruct the structural information of photoacoustic sources but fails to reproduce the correct amplitude. The reconstructed image has both positive and negative components and is bipolar in magnitude. A unipolar image can be obtained by finding the envelope of the bipolar image, as shown in Fig. 11(c). Log transformation can be further used to improve the contrast of the reconstructed image, as shown in Fig. 11(d). Since DAS treats the delayed photoacoustic signals si(t) from different detectors equally and is non-adaptive, significant side lobes are present in the reconstructed image in Fig. 11(d), which degrades the spatial resolution of the image.

    Figure 11. An example showing DAS-based image reconstruction in PACT. (a) Ground truth. (b) Image reconstructed by DAS. (c) Envelope of (b). (d) Log transform of (c). The detector array is at the top of the image.

    3.1.2 Delay multiply and sum

    Delay multiply and sum (DMAS) is a variant of the DAS algorithm that can provide improved contrast, signal-to-noise ratio (SNR), and resolution. Similar to DAS, DMAS also sums photoacoustic signals measured by different detectors according to calculated time delays. However, the measured photoacoustic signals in DMAS need to be combinatorically coupled and multiplied before summation. Therefore, DMAS is essentially a nonlinear algorithm.

    The DMAS algorithm was first proposed by Lim et al. for confocal microwave detection of breast cancer in 2008 and showed improved identification of embedded malignant tumors in a variety of numerical breast phantoms compared with DAS[99]. In 2015, Matrone et al. modified this method for B-mode ultrasound imaging and demonstrated that DMAS can provide higher contrast resolution than DAS[100]. Inspired by the success of DMAS in confocal microwave imaging and ultrasound imaging, several research groups worldwide have conducted in-depth studies of DMAS-based photoacoustic image reconstruction. For example, Alshaya et al. introduced DMAS to the field of PACT in 2016 and proposed a filter DMAS to improve the SNR of reconstructed images[101]. To further improve the performance of DMAS, in 2018, Mozaffarzadeh et al. developed a double-stage DMAS algorithm that can produce images with improved quality compared with DAS and DMAS but at the expense of greater computational cost[27]. In the same year, Kirchner et al. proposed a signed DMAS algorithm that can provide linear image reconstruction with increased image quality[102]. In 2022, Mulani et al. presented a high-order DMAS method, in which multi-term (e.g., three, four, or five terms) multiplication is used to replace two-term multiplication in the original DMAS algorithm[103].

    Generally, the original DMAS algorithm can be mathematically written as[100] SDMAS(x,z) = Σ_{i=1}^{M−1} Σ_{j=i+1}^{M} xij(t), where SDMAS is the reconstructed image, M is the total number of detectors, and xij(t) is given by xij(t) = sign[si(ti)sj(tj)]·√(|si(ti)sj(tj)|), where sign(·) denotes the signum function. The workflow of the DMAS algorithm is illustrated in Fig. 9(b). If the center frequency of the photoacoustic signals si(t) and sj(t) is f0, the multiplication operation in Eq. (31) shifts the center frequency of the resultant signal to 0 and 2f0. As a result, the output signal SDMAS needs to be filtered by a bandpass filter centered at 2f0 to extract the second harmonic component 2f0 while removing the direct current (DC) component. Therefore, DMAS is also called filtered-DMAS. Compared with DAS, DMAS uses both the amplitude and spatial correlation of delayed photoacoustic signals from different detectors for image reconstruction. For this reason, it can partially overcome the drawbacks of DAS and reconstruct images with improved spatial resolution and reduced side lobes[100].
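The combinatorial sum can be transcribed directly, and, because sign(si·sj)·√|si·sj| factorizes as ŝi·ŝj with ŝi = sign(si)·√|si|, it also collapses to a closed form that avoids the double loop. The sketch below checks the two forms against each other; this algebraic shortcut is a well-known identity, not a method described in the text:

```python
import numpy as np

def dmas_pixel(s):
    """DMAS output for one pixel: sum over i < j of the signed square-root
    products sign(s_i*s_j)*sqrt(|s_i*s_j|), as in the xij(t) definition."""
    total = 0.0
    for i in range(len(s) - 1):
        for j in range(i + 1, len(s)):
            prod = s[i]*s[j]
            total += np.sign(prod)*np.sqrt(np.abs(prod))
    return total

def dmas_pixel_fast(s):
    """Equivalent closed form: with sh_i = sign(s_i)*sqrt(|s_i|),
    sum_{i<j} sh_i*sh_j = ((sum sh)^2 - sum sh^2)/2."""
    sh = np.sign(s)*np.sqrt(np.abs(s))
    return 0.5*(sh.sum()**2 - (sh**2).sum())

rng = np.random.default_rng(0)
s = rng.normal(size=32)     # delayed samples from 32 hypothetical channels
```

The O(M) form hints at why later variants (e.g., the modified-CF DMAS of Jeon et al.) focus on removing the combinatorial multiplication cost.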

    A downside of DMAS is that the combinational multiplication in the algorithm increases the computational complexity. To solve this problem, in 2019, Jeon et al. proposed a DMAS algorithm with a modified CF, which avoids combinatorial multiplication in DMAS and significantly reduces the total number of multiplication operations[104]. In 2022, Paul et al. proposed a simplified-delay-multiply-and-standard-deviation (SDMASD) method[105], which is based on the measurement of the standard deviation of delayed and multiplied signals instead of normal delayed signals. Compared with native DAS and DMAS algorithms, the SDMASD algorithm can achieve real-time imaging using graphics processing units (GPUs) and produce improved image quality.

    3.1.3 Short-lag spatial coherence

    Short-lag spatial coherence (SLSC) is a beamforming technique that was initially developed by Lediju et al. for ultrasound imaging in 2011[106]. SLSC reconstructs an ultrasound image by calculating the spatial coherence of measured signals, and the reconstructed image is thus independent of the amplitude of the signals. As a result, SLSC can eliminate adverse effects caused by the different strengths of scatterers in ultrasound imaging. It has been demonstrated that compared with the conventional DAS beamforming algorithm, SLSC can achieve image reconstruction with considerable improvements in terms of resolution, contrast-to-noise ratio (CNR), and SNR.

    In PACT, Bell et al. applied the SLSC algorithm to achieve imaging of prostate brachytherapy seeds[28,107] and demonstrated that the SLSC algorithm can enhance photoacoustic image quality compared with DAS, especially when the intensity of laser illumination is insufficient. Graham and Bell later developed a spatial coherence theory based on the van Cittert–Zernike theorem, a classical theorem in statistical optics, to explore the strengths and limitations of the SLSC algorithm[108,109]. In 2021, Mora et al. combined SLSC with DMAS and proposed a generalized spatial coherence algorithm for PACT, which can preserve relative signal magnitudes and improve the CNR and SNR of a reconstructed image[110].

    Similar to DMAS, the photoacoustic signals in SLSC are also first delayed according to the TOF from the point of interest to each ultrasound detector and then combinatorically coupled and multiplied before summation. The SLSC algorithm can be formulated as[106] R̂p(l) = [1/(M − l)] Σ_{i=1}^{M−l} [Σ_{t=t1}^{t2} si(t)si+l(t)] / √[Σ_{t=t1}^{t2} si²(t) · Σ_{t=t1}^{t2} si+l²(t)], where R̂p(l) is the normalized spatial coherence of the signals measured by detectors separated by l elements, M is the total number of detectors, and l represents the number of intervals between the two detectors used to calculate the spatial coherence and is called the lag. The final SLSC image is obtained by summing all R̂p(l) terms, i.e., SSLSC(x,z) = Σ_{l=1}^{L} R̂p(l), where L is the total number of lags. A large L helps improve the lateral resolution but decreases the CNR and SNR. Therefore, the value of L needs to be elaborately selected to achieve the best tradeoff between key image quality metrics, such as lateral spatial resolution, CNR, and SNR. The workflow of the SLSC algorithm is illustrated in Fig. 9(c).
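A direct transcription of the SLSC formula makes its normalization obvious: for perfectly coherent channels every R̂p(l) equals one, so the SLSC value equals the number of lags L. A minimal sketch, where the channel count, time window, and lag count are arbitrary choices:

```python
import numpy as np

def slsc_pixel(S, L, t1, t2):
    """SLSC value for one pixel from delayed channel data S (M x T):
    R(l) averages the normalized correlation of channels i and i+l over
    the window [t1, t2]; the SLSC value sums R(l) for l = 1..L."""
    M = S.shape[0]
    out = 0.0
    for l in range(1, L + 1):
        a = S[:M - l, t1:t2 + 1]
        b = S[l:, t1:t2 + 1]
        num = (a*b).sum(axis=1)
        den = np.sqrt((a**2).sum(axis=1)*(b**2).sum(axis=1))
        out += np.mean(num/den)
    return out

# Perfectly coherent channels: every R(l) = 1, so the SLSC value equals L
rng = np.random.default_rng(1)
wave = rng.normal(size=100)
S = np.tile(wave, (16, 1))      # 16 identical channels
val = slsc_pixel(S, L=5, t1=10, t2=60)
```

Because the numerator and denominator share the same amplitude scaling, multiplying all channels by a constant leaves the value unchanged, which is the amplitude independence noted above.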

    3.1.4 Minimum variance

    Minimum variance (MV) is a weighted DAS method that was originally devised by Capon in 1969 for narrowband signal processing applications such as sonar, radar, and wireless communication[111]. In 2002, Mann and Walker used a constrained adaptive beamformer (the MV method) for medical ultrasound imaging[112] and demonstrated its effectiveness in spatial resolution and contrast enhancement. Due to the remarkable improvement in spatial resolution, a large number of MV-based methods have been developed for ultrasound imaging[113–116].

    MV-based image reconstruction has also been studied in PACT in recent years[30,117–120]. The MV reconstruction formula can be written as[114,116] SMV(x,z) = Σ_{i=1}^{M} wi(t)si(t) = w(t)ᴴs(t), where M is the total number of detectors, wi(t) is the optimal weight for the photoacoustic signal si(t) measured by the ith detector, s(t) is a vector containing delayed photoacoustic signals from all detectors, w(t) is a vector containing optimal weights for the delayed photoacoustic signals in s(t), and the superscript H denotes the conjugate transpose. The workflow of the MV algorithm is illustrated in Fig. 9(d). The weight w(t) can be adaptively calculated by minimizing the variance of the output SMV while maintaining unit signal gain at the focal imaging point. The optimization problem can be mathematically written as[114] min_w wᴴRw, subject to wᴴa = 1, where R = E[ssᴴ] is the spatial covariance matrix of the delayed photoacoustic signals s(t), E denotes the expectation, and a is the equivalent of the steering vector in narrowband applications[114,116]. When the photoacoustic signals of each detector are delayed, a is a simple all-one vector. The optimization problem in Eq. (35) can be solved by the method of Lagrange multipliers[121], which gives w = R⁻¹a/(aᴴR⁻¹a).

    By substituting Eq. (36) into Eq. (34), MV-based image reconstruction can be achieved.
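The closed-form weights are straightforward to compute numerically. The sketch below estimates R from delayed-signal snapshots, adds a small diagonal loading term for numerical stability (a common practical safeguard, not part of the formula as stated), and solves for w; the unit-gain constraint wᴴa = 1 can then be verified directly:

```python
import numpy as np

def mv_weights(snapshots, loading=1e-3):
    """MV weights w = R^-1 a / (a^H R^-1 a) from delayed-signal snapshots
    (detectors x time samples). Diagonal loading (an assumed practical
    safeguard) keeps the covariance estimate invertible; a is the all-one
    steering vector appropriate for pre-delayed signals."""
    M, T = snapshots.shape
    R = snapshots @ snapshots.conj().T / T           # sample covariance
    R = R + loading*np.trace(R).real/M*np.eye(M)     # diagonal loading
    a = np.ones(M)
    Ria = np.linalg.solve(R, a)
    return Ria/(a.conj() @ Ria)

rng = np.random.default_rng(2)
snaps = rng.normal(size=(16, 40))    # 16 detectors, 40 time samples (toy data)
w = mv_weights(snaps)
```

By construction the output variance wᴴRw is no larger than that of the uniform DAS weights 1/M, since the uniform weights also satisfy the constraint; that is the sense in which MV is "minimum variance."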

    To improve the robustness of the MV method, the covariance matrix R can be calculated based on a spatial smoothing strategy, where a detector array is divided into a group of overlapping subarrays, and the covariance matrices of all subarrays are calculated and averaged to form the final covariance matrix[114]. In this way, the covariance matrix R is given as[114] R = [1/(Nd − L + 1)] Σ_{l=1}^{Nd−L+1} sl(t)sl(t)ᴴ, where L is the number of detectors in a subarray and l is the index of the subarray. To estimate the covariance matrix more accurately and enhance the contrast of MV, temporal averaging of multiple samples can be used together with spatial smoothing[116]. Once the covariance matrix R is estimated, the optimal weights w can be obtained by Eq. (36). The final image reconstructed by the MV algorithm can be written as[114] SMV(x,z) = [1/(Nd − L + 1)] Σ_{l=1}^{Nd−L+1} w(t)ᴴsl(t).

    Although the MV method can produce narrower main lobes (i.e., the lobe located at the target) and higher spatial resolution than algorithms such as DAS and DMAS, its performance in terms of side lobe (i.e., the lobes adjacent to the main lobe) suppression and image contrast enhancement is limited. Therefore, many studies have been devoted to the development of MV-based hybrid beamforming algorithms. In 2008, Park et al. imposed additional CF weights on MV and achieved enhanced spatial resolution, contrast, and side lobe suppression[117]. In 2018, Mozaffarzadeh et al. developed an MV-based DMAS method[118] and an eigenspace-based MV method combined with DMAS for resolution improvement and side lobe reduction[119]. In 2021, Paul et al. proposed an adaptive-weighting-based MV to address the side lobe issue in MV beamformed images. It was demonstrated that the weighted MV approach can improve SNR while reducing main lobe width and side lobe intensity and has the potential for use in PACT imaging systems with a limited number of ultrasound detectors[30].

    3.1.5 Coherence factor

    The coherence factor (CF) is a pixel-level adaptive weighting factor that can improve the performance of DAS-based beamforming methods in side lobe suppression and spatial resolution improvement. CF was originally proposed by Mallart and Fink in 1994 for the evaluation of phase aberration correction techniques in scattering media[122] and was later used as an image quality metric for ultrasound imaging by Hollman et al.[123]. In 2003, Li et al. presented a generalized CF for ultrasound beamforming in heterogeneous media and showed that the combination of generalized CF and DAS-based beamformers could yield improved image quality[124].

    In PACT, Liao et al. incorporated CF into DAS in 2004 and demonstrated the superiority of the CF-weighted DAS method in improving spatial resolution and SNR compared with DAS[29]. Wang and Li considered the local SNR in the formulation of CF and developed an SNR-dependent CF for ultrasound and photoacoustic imaging in 2014[125]. Wang et al. integrated CF with a focal-line-based image formation method to improve the contrast and elevational resolution of 3D photoacoustic imaging in 2016[126]. Mozaffarzadeh et al. proposed a high-resolution CF weighting technique and achieved improved resolution, SNR, and side lobe suppression in 2019[127]. Paul et al. considered the noise level variations of raw ultrasound data in the formulation of CF and achieved improvements in image resolution, side lobe reduction, SNR, and contrast in 2021[128]. In the same year, Mukaddim and Varghese extended CF from the spatial domain to the spatiotemporal domain[129]. This extension helps cancel out signals with low spatial and temporal coherence and results in higher background noise cancellation while preserving the main features of interest in reconstructed images.

    Mathematically, the CF is defined as the ratio of the coherent sum of photoacoustic signals across detectors to the incoherent sum and can be formulated as[123] CF(x,z) = |Σ_{i=1}^{M} si(t)|² / [M Σ_{i=1}^{M} |si(t)|²], where si(t) is the photoacoustic signal measured by the ith detector at time t and M is the total number of detectors. The value of the CF ranges from zero to one. A large value means that the signals at that point are highly focused and that the point can be reconstructed with high quality. In contrast, a small CF value indicates that the signals are weakly focused and will result in lower image quality. The CF-weighted DAS method can be expressed as SCF-DAS(x,z) = CF(x,z)·SDAS(x,z), where SCF-DAS(x,z) is the reconstructed image. The term SDAS(x,z) can be replaced by the outputs of other beamforming methods, such as DMAS, SLSC, and MV. The workflow of the CF algorithm is illustrated in Fig. 9(e).
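The CF definition can be transcribed directly; fully coherent channels give CF = 1 (the Cauchy–Schwarz upper bound), while channels of alternating sign give CF = 0. A minimal sketch with arbitrary example amplitudes:

```python
import numpy as np

def coherence_factor(s):
    """CF = |sum_i s_i|^2 / (M * sum_i |s_i|^2) for one pixel/time sample."""
    s = np.asarray(s, dtype=float)
    denom = s.size*np.sum(s**2)
    return 0.0 if denom == 0 else np.sum(s)**2/denom

cf_coherent = coherence_factor(np.full(32, 0.7))             # identical channels
cf_incoherent = coherence_factor(0.7*(-1.0)**np.arange(32))  # alternating signs
```

Multiplying a DAS pixel by this factor leaves well-focused points untouched and suppresses pixels built from poorly aligned signals, which is exactly the side-lobe suppression mechanism described above.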

    Each beamforming method mentioned above has advantages and disadvantages. To improve the performance of image reconstruction in PACT, they can be combined to yield hybrid beamforming methods such as DMASD plus DAS/DMAS[105], SLSC plus filter DMAS[110], MV plus DMAS[118,119], CF plus DMAS[104,130], and CF plus MV[117,127,131].

    Representative work on DAS-based image reconstruction in PACT is summarized in Table 3. For completeness, Table 3 also includes relevant work in ultrasound or microwave imaging.

    Table 3. Representative DAS-Type Algorithms Used for Image Reconstruction in PACT

      Method | Author | Year | Variant | Source
      ------ | ------ | ---- | ------- | ------
      DAS | Ma et al. | 2020 | Multiple DAS with enveloping | [132]
      DAS | Hoelen et al. | 2000 | Modified DAS | [97]
      DAS | Hoelen et al. | 1998 | Modified DAS | [25,26]
      DMAS | Mulani et al. | 2022 | High-order DMAS | [103]
      DMAS | Jeon et al. | 2019 | CF-weighted DMAS | [104]
      DMAS | Mozaffarzadeh et al. | 2018 | Double-stage DMAS | [27]
      DMAS | Kirchner et al. | 2018 | Signed DMAS | [102]
      DMAS | Alshaya et al. | 2016 | Filter DMAS | [101]
      DMAS | Lim et al. | 2008 | DMAS (microwave imaging) | [99]
      SLSC | Graham et al. | 2020 | Photoacoustic spatial coherence theory for SLSC | [109]
      SLSC | Bell et al. | 2013 | SLSC (for PACT) | [28]
      SLSC | Lediju et al. | 2011 | SLSC (for ultrasound) | [106]
      MV | Asl & Mahloojifar | 2009 | Modified MV (for ultrasound) | [116]
      MV | Synnevag et al. | 2007 | MV (for ultrasound) | [114]
      MV | Mann & Walker | 2002 | Constrained adaptive beamformer | [112]
      CF | Mao et al. | 2022 | Spatial coherence + polarity coherence | [133]
      CF | Mukaddim et al. | 2021 | Spatiotemporal CF | [129]
      CF | Paul et al. | 2021 | Variational CF | [128]
      CF | Wang et al. | 2014 | SNR-dependent CF | [125]
      CF | Liao et al. | 2004 | CF-weighted DAS | [29]
      CF | Li et al. | 2003 | Generalized CF | [124]
      CF | Mallart & Fink | 1994 | CF | [122]
      Hybrid | Paul et al. | 2022 | SDMASD + DAS/DMAS | [105]
      Hybrid | Mora et al. | 2021 | SLSC + Filter DMAS | [110]
      Hybrid | Mozaffarzadeh et al. | 2019 | CF + MV | [127,131]
      Hybrid | Mozaffarzadeh et al. | 2018 | CF + DMAS | [104,130]
      Hybrid | Mozaffarzadeh et al. | 2018 | MV + DMAS | [118,119]

    3.2 Filtered Back Projection

    DAS-type algorithms can achieve approximate photoacoustic image reconstruction and are inexact reconstruction techniques. To achieve exact image reconstruction, advanced algorithms are needed. Filtered back projection (FBP) is a class of algorithms that are based on the Radon transform (see Sec. 2.4). It achieves image reconstruction by first filtering measured photoacoustic signals and then back-projecting the filtered signals into the image domain. The back-projection operation in FBP is similar to the reconstruction procedure in DAS-type algorithms. Therefore, the native DAS algorithm can be regarded as a simplified version of FBP, which achieves image reconstruction by directly back-projecting original photoacoustic signals into the image domain.

    3.2.1 Approximate filtered back projection

    Early FBP algorithms were developed based on the condition of far-field approximation[35], which states that if the distance between a detector and a photoacoustic source is much greater than the size of the photoacoustic source itself (i.e., |rs − rd| ≫ d, where d is the source size), the approximation |rd − rs| ≈ |rd| − rs·(rd/|rd|) holds (see Fig. 12). Under the condition of far-field approximation, the photoacoustic integral over a spherical shell can be approximated by the integral over its tangential plane. Consequently, image reconstruction in PACT can be achieved by inverting the linear Radon transform. Kruger et al. proposed an approximate FBP algorithm in 1995[22], which is the first formal image reconstruction method in PACT. Xu and Wang developed approximate FBP algorithms from a more rigorous perspective in 2002[134,135] and 2003[136]. They deduced exact solutions to the image reconstruction problem and proposed approximate time-domain FBP algorithms for circular[134], spherical[135], planar, and cylindrical detection geometries[136].

    Figure 12. Schematic diagram showing the signal detection and image reconstruction geometry in FBP. The forward problem and the image reconstruction problem in PACT correspond to the spherical Radon transform and its inverse, respectively. Under the condition of the far-field approximation, the integral over a spherical shell can be approximated by the integral over its tangential plane.

    The preceding approximate FBP algorithms were generally developed for specific detection geometries. In 2007, Burgholzer and Matt extended the approximate FBP algorithms to image reconstruction in arbitrary closed detection geometries under the assumption of the far-field approximation. The extended FBP reconstruction formula can be written as[35] p0(rs) ≈ (1/4π) ∫S b(rd,t) δ(t − |rs − rd|/v0) dΩ, where p0(rs) is the reconstructed photoacoustic image, δ is the Dirac delta function, b(rd,t) is the back-projection term given by b(rd,t) = −2t ∂p(rd,t)/∂t, and dΩ is the solid angle subtended by the element dσ of a detection surface and can be calculated by dΩ = [dσ/|rs − rd|²]·[nd·(rs − rd)/|rs − rd|]. Here nd is the unit normal vector of the detector surface pointing to the region of interest (ROI).

    The time-domain first derivative in Eq. (42) can be interpreted in the frequency domain as
    $$\frac{\partial p(\mathbf{r}_d,t)}{\partial t}=\mathcal{F}^{-1}\{i\omega\,\mathcal{F}\{p(\mathbf{r}_d,t)\}\},\tag{44}$$
    where $\mathcal{F}$ and $\mathcal{F}^{-1}$ denote the forward and inverse Fourier transforms, respectively, and the factor $i\omega$ acts as a ramp filter, which suppresses the low-frequency content of the measured photoacoustic signal $p(\mathbf{r}_d,t)$ and amplifies high-frequency information. Since $\omega$ extends from $-\infty$ to $+\infty$, the ramp filter is not integrable, and the inverse Fourier transform in Eq. (44) is undefined. To solve this problem, the ramp filter can be band-limited by a window function, and Eq. (44) becomes
    $$\frac{\partial p(\mathbf{r}_d,t)}{\partial t}=\mathcal{F}^{-1}\{i\omega W(\omega)\,\mathcal{F}\{p(\mathbf{r}_d,t)\}\},\tag{45}$$
    where $W(\omega)$ represents the window function. In practice, a smooth window such as the Hanning function is preferred over a box function because the latter may introduce undesirable ringing artifacts in the image domain.
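    As a quick numerical illustration of Eqs. (44) and (45), the windowed ramp filter can be applied in a few lines of NumPy. The function name and the Hanning-style window shape below are illustrative choices for this sketch, not part of the original algorithm.

```python
import numpy as np

def filtered_derivative(p, dt, window=True):
    """Band-limited time derivative F^{-1}{ i*w*W(w) * F{p} }, cf. Eqs. (44)-(45)."""
    n = p.size
    omega = 2 * np.pi * np.fft.fftfreq(n, d=dt)    # angular frequency axis
    ramp = 1j * omega                              # i*omega acts as a ramp filter
    if window:
        # a smooth (Hanning-type) window over the band suppresses ringing artifacts
        W = 0.5 * (1 + np.cos(np.pi * omega / np.abs(omega).max()))
        ramp = ramp * W
    return np.real(np.fft.ifft(ramp * np.fft.fft(p)))

# sanity check against the analytic derivative of a periodic low-frequency tone
n, dt = 1024, 1e-3
t = np.arange(n) * dt
f0 = 8 / (n * dt)                                  # integer number of cycles
d = filtered_derivative(np.sin(2 * np.pi * f0 * t), dt)
exact = 2 * np.pi * f0 * np.cos(2 * np.pi * f0 * t)
```

    For a low-frequency signal the window is close to unity, so the filtered derivative matches the analytic derivative; near the Nyquist frequency the window rolls off and the result is deliberately attenuated.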

    One benefit of the far-field approximation in FBP is that it allows simplified calculation of the solid angle $\mathrm{d}\Omega$. In other words, under the far-field approximation, $\mathrm{d}\Omega$ in Eq. (43) reduces to
    $$\mathrm{d}\Omega\approx\frac{\mathrm{d}\sigma}{|\mathbf{r}_d|^2}\left(\mathbf{n}_d\cdot\frac{-\mathbf{r}_d}{|\mathbf{r}_d|}\right).\tag{46}$$

    Compared with Eq. (43), Eq. (46) involves only the detector position $\mathbf{r}_d$ when calculating $\mathrm{d}\Omega$ and is independent of the source position $\mathbf{r}_s$, thereby reducing the computational complexity. For example, assuming that the size of a 3D image to be reconstructed is $N_x\times N_y\times N_z$ voxels ($N_x=N_y=N_z=n$) and the number of detectors used for imaging is $M$ ($M=n\times n$), the computational complexity of evaluating the solid-angle weights is $O(n^5)$ for Eq. (43) but only $O(n^2)$ for Eq. (46). Furthermore, if the detection geometry is spherical, $\mathrm{d}\Omega$ in Eq. (46) becomes a constant ($\mathrm{d}\Omega=\mathrm{d}\sigma/|\mathbf{r}_d|^2$), reducing the complexity to $O(1)$.
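    The quality of this simplification is easy to check numerically. The sketch below compares the exact and far-field solid-angle weights for illustrative numbers (a detector 100 mm from the origin, a source a few millimeters off-center), assuming the detector normal points toward the ROI.

```python
import numpy as np

def domega_exact(r_s, r_d, n_d, dsigma):
    """Solid angle of a surface element seen from the source, cf. Eq. (43)."""
    diff = r_s - r_d
    dist = np.linalg.norm(diff)
    return dsigma / dist**2 * np.dot(n_d, diff / dist)

def domega_far_field(r_d, n_d, dsigma):
    """Far-field approximation, cf. Eq. (46): independent of the source position."""
    rd = np.linalg.norm(r_d)
    return dsigma / rd**2 * np.dot(n_d, -r_d / rd)

r_d = np.array([0.0, 0.0, 0.1])        # detector 100 mm from the origin
n_d = np.array([0.0, 0.0, -1.0])       # unit normal pointing to the ROI
exact = domega_exact(np.array([3e-3, 0.0, 0.0]), r_d, n_d, 1e-6)
approx = domega_far_field(r_d, n_d, 1e-6)
```

    For a source 3 mm off-axis and a detector 100 mm away, the two weights agree to well below one percent, which is why the cheaper Eq. (46) is acceptable in the far field.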

    3.2.2 Exact filtered back projection

    The approximate FBP algorithms are based on the condition of the far-field approximation. This condition, however, may not be fully met in practice, considering that photoacoustic signals attenuate with propagation distance and that signal detection in space may be restricted. To solve this problem, in 2004, Finch et al. derived an exact FBP formula for image reconstruction in spherical detection geometries with odd dimensions ($n\ge 3$)[23]. In 2005, Xu and Wang presented a universal FBP formula for image reconstruction in infinite planar, infinite cylindrical, and closed spherical detection geometries[24]. The reconstruction formula is given as
    $$p_0(\mathbf{r}_s)=\int_{\Omega} b(\mathbf{r}_d,t)\,\delta\!\left(t-\frac{|\mathbf{r}_s-\mathbf{r}_d|}{v_0}\right)\frac{\mathrm{d}\Omega}{\Omega},\tag{47}$$
    where the back-projection term is
    $$b(\mathbf{r}_d,t)=2\left[p(\mathbf{r}_d,t)-t\,\frac{\partial p(\mathbf{r}_d,t)}{\partial t}\right].\tag{48}$$
    Here $\Omega$ is the solid angle of the detection surface with respect to the reconstruction point and equals $2\pi$ for the infinite planar geometry and $4\pi$ for the spherical and cylindrical geometries. FBP is one of the most commonly used algorithms in PACT.

    Comparing Eqs. (47) and (41) shows that the exact and approximate FBP formulas are very similar. Both consist of three reconstruction steps: filtering, back projection (the $\delta$ function), and summation. Their major difference lies in the back-projection terms: $b(\mathbf{r}_d,t)$ in the exact FBP algorithm contains an extra term $p(\mathbf{r}_d,t)$, which is absent in the approximate FBP algorithm. The reason is as follows. According to the forward solution of the photoacoustic wave equation [Eq. (18)], the amplitude of the photoacoustic signal received by a detector is proportional to the size of the source and inversely proportional to the detection distance, i.e., $p(\mathbf{r}_d,t)\propto d/|\mathbf{r}_d-\mathbf{r}_s|$. Under the far-field approximation, i.e., $|\mathbf{r}_d-\mathbf{r}_s|\gg d$, the amplitude of $p(\mathbf{r}_d,t)$ is small enough compared with the derivative term $t\,\partial p(\mathbf{r}_d,t)/\partial t$ in Eq. (48) to be ignored without significant loss of reconstruction accuracy. Therefore, the back-projection term in the approximate FBP algorithm does not involve $p(\mathbf{r}_d,t)$.
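    The reconstruction steps of Eqs. (47) and (48) can be sketched as a simple 2D delay-and-sum loop. The code below is a minimal illustration with a uniform solid-angle weight and nearest-neighbor time sampling, not the fully weighted algorithm; the detector ring, grid, and synthetic N-shaped pulse model are illustrative stand-ins for measured data.

```python
import numpy as np

def universal_back_projection(p, det_pos, grid, v0, dt):
    """Minimal 2D delay-and-sum sketch of FBP, cf. Eqs. (47)-(48).

    p       : (M, T) array, p[m, k] is the signal of detector m at time k*dt
    det_pos : (M, 2) detector coordinates; grid: (N, 2) reconstruction points
    """
    M, T = p.shape
    t = np.arange(T) * dt
    b = 2.0 * (p - t * np.gradient(p, dt, axis=1))      # back-projection term
    img = np.zeros(len(grid))
    for m in range(M):
        dist = np.linalg.norm(grid - det_pos[m], axis=1)
        idx = np.clip(np.round(dist / (v0 * dt)).astype(int), 0, T - 1)
        img += b[m, idx]                                # sample b at t = |r_s - r_d| / v0
    return img / M                                      # uniform weighting (sketch)

# synthetic test: N-shaped pulses from a point-like source at (5 mm, 0)
v0, dt, T, M = 1500.0, 2e-8, 1200, 64
ang = 2 * np.pi * np.arange(M) / M
det_pos = 0.02 * np.column_stack([np.cos(ang), np.sin(ang)])   # 20 mm radius ring
src = np.array([5e-3, 0.0])
t = np.arange(T) * dt
sigma = 2e-7
t0 = np.linalg.norm(det_pos - src, axis=1) / v0
u = (t[None, :] - t0[:, None]) / sigma
p = -u * np.exp(-u**2)                                  # bipolar (N-shaped) pulses
grid = np.column_stack([np.linspace(-0.01, 0.01, 41), np.zeros(41)])
img = universal_back_projection(p, det_pos, grid, v0, dt)
```

    The back-projected contributions add coherently only where the delays of all detectors match, so the image peaks at the source position.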

    Figure 13 presents an example demonstrating the principle of FBP-based image reconstruction. In this example, the photoacoustic source is a 5-mm-diameter uniform spherical absorber, and the photoacoustic signals generated from the source are recorded by a 40-mm-diameter circular detector array [Fig. 13(a)]. Figures 13(b) and 13(c) show the representative photoacoustic signal p(rd,t) measured by a detector and the corresponding back-projection signal b(rd,t), respectively. Figure 13(d) shows the projection images of detectors at different positions. Figures 13(e)–13(g) are the reconstruction results obtained by summing the projection images of four, 16, and 256 detectors, respectively. The results demonstrate that the FBP algorithm can effectively reconstruct the structure and amplitude information of the photoacoustic source. Note that the artifacts in the reconstructed images are not caused by the FBP algorithm but by the limited view angle of the circular detector array[137].


    Figure 13. Illustration of the principle of the FBP algorithm. (a) Schematic diagram showing a spherical photoacoustic source (diameter: 5 mm) and an array of point-like detectors uniformly distributed over a circle (diameter: 40 mm). (b) N-shaped photoacoustic signal recorded by a detector on the detection circle. (c) Back-projection signal [Eq. (48)]. (d) Projection images produced by the detectors at different positions. (e)–(g) Images reconstructed using 4, 16, and 256 detectors, respectively. Adapted from Ref. [137] with permission.

    To evaluate the performance of the FBP algorithm for different detection geometries, a group of image reconstruction simulations was conducted. In the simulations, a multi-sphere phantom is used as the photoacoustic source. The phantom contains nine spherical absorbers with unit intensity, among which eight with diameters uniformly varying from 1 mm to 2 mm are evenly distributed on a circle with a diameter of 10 mm, while the ninth one, with a diameter of 1.6 mm, is seated at the origin. Figures 14(a)–14(c) show the relative positions of the multi-sphere phantom and a finite planar, finite cylindrical, and closed spherical detection surface. The planar detection surface is located 12 mm below the x–y plane, and the cylindrical and spherical detection surfaces are centered at the origin. The three detection surfaces have the same number (32768) of evenly distributed point-like detectors and approximately equal detection areas (plane: 150 mm × 150 mm; cylinder: 70 mm base diameter, 100 mm height; sphere: 84 mm diameter). The reconstructed images of the x–z and x–y cross sections of the phantom are displayed in Figs. 14(d)–14(f) and 14(g)–14(i), respectively. The results show that the FBP algorithm achieves stable image reconstruction for all three detection geometries. However, the image reconstructed in the closed spherical detection surface has much fewer artifacts than those reconstructed in the planar and cylindrical detection surfaces. This is because the spherical detection surface is closer to an ideal detection geometry than the finite planar and finite cylindrical surfaces.


    Figure 14. Image reconstruction by FBP in three common detection geometries. (a)–(c) Schematic diagrams showing a multi-sphere phantom and planar, cylindrical, and spherical detection surfaces. The three detection surfaces have the same number of point-like detectors (32768) and approximately equal detection areas. Please refer to the text for more simulation settings. (d)–(f) Reconstructed images in the x–z plane. (g)–(i) Reconstructed images in the x–y plane. Adapted from Ref. [137] with permission.

    To extend the application scenarios of FBP, many in-depth studies have also been carried out[138]. For example, in 2007, Kunyansky proposed a set of FBP-type formulas for image reconstruction in spherical detection geometries with arbitrary dimensions ($n\ge 2$)[139]. In the same year, Finch et al. also developed a set of FBP-type inverse formulas for spherical detection geometries with even dimensions[140]. In 2009, Nguyen derived a family of inverse formulas for thermoacoustic tomography, in which many previously known FBP inverse formulas can be obtained as special cases[141]. In addition, the exact FBP algorithms mentioned above are primarily used for planar, cylindrical, and spherical detection geometries with point-like detectors. In 2007, Burgholzer et al. developed a two-step FBP algorithm for integrating line detectors[142]. In 2012, Natterer proposed a novel FBP algorithm for elliptical detection geometries[143], which was further developed by Palamodov in 2012[144], Haltmeier in 2014[145,146], and Salman in 2014[147].

    Moreover, the FBP algorithm is well suited for parallel computing. GPUs have been used to achieve real-time FBP image reconstruction in both 2D and 3D imaging[42,148153]. A field-programmable gate array (FPGA) was also employed to accelerate image reconstruction in low-cost PACT systems[154].

    Table 4 summarizes representative work on FBP-based image reconstruction in PACT.

    • Table 4. Representative Work on FBP-Based Image Reconstruction in PACT


      | Author | Year | Method | Detection Geometry | Dimension | Source |
      |---|---|---|---|---|---|
      | Haltmeier | 2014 | FBP | Elliptical | Arbitrary | [145,146] |
      | Salman | 2014 | FBP | Elliptical | 2D and 3D | [147] |
      | Natterer | 2012 | FBP | Elliptical | 3D | [143] |
      | Palamodov | 2012 | FBP | Elliptical | Arbitrary | [144] |
      | Nguyen | 2009 | FBP | Spherical | Arbitrary | [141] |
      | Burgholzer & Matt | 2007 | Approximate FBP | Arbitrary | 3D | [35] |
      | Burgholzer et al. | 2007 | FBP | Arbitrary closed detection curve | 2D or 3D | [142] |
      | Kunyansky | 2007 | FBP | Spherical | Arbitrary | [139] |
      | Finch et al. | 2007 | FBP | Spherical | Even | [140] |
      | Xu & Wang | 2005 | FBP | Planar, cylindrical, and spherical | 3D | [24] |
      | Finch et al. | 2004 | FBP | Spherical | Odd | [23] |
      | Xu & Wang | 2003 | Approximate FBP | Planar, cylindrical, and spherical | 3D | [136] |
      | Xu & Wang | 2002 | Approximate FBP | Circular | 3D | [134] |
      | Xu & Wang | 2002 | Approximate FBP | Spherical | 3D | [135] |
      | Xu et al. | 2001 | Approximate FBP | Circular | 3D | [155] |
      | Kruger et al. | 1995 | Approximate FBP | Circular | 3D | [22] |

    3.3 Series Expansion

    The basic principle of series expansion (SE) algorithms is to approximate the image to be reconstructed using mathematical series. Compared with other reconstruction algorithms in PACT, SE algorithms are simple and fast for specific detection geometries such as planar geometries because they can be implemented using FFT.

    3.3.1 Series expansion for planar detection geometries

    Norton et al. proposed exact SE methods for the reconstruction of acoustic reflectivity in circular geometries in 1980[156] and in planar, cylindrical, and spherical geometries in 1981[157]. Several groups have studied similar ideas for image reconstruction in PACT. Köstli et al. in 2001 proposed an exact SE algorithm for image reconstruction in planar detection geometries[31], and Xu et al. in 2002 reported a similar algorithm[158].

    Mathematically, SE-based PACT image reconstruction for planar detection geometries can be formulated as[31,83]
    $$P_0(k_x,k_y,\omega)=\frac{v_0^2 k_z}{2\omega}\,\mathcal{F}_{x,y,t}\{p(x,y,t)\},\tag{49}$$
    $$P_0(k_x,k_y,\omega)\ \xrightarrow{\ \omega^2=v_0^2(k_x^2+k_y^2+k_z^2)\ }\ P_0(k_x,k_y,k_z),\tag{50}$$
    $$p_0(x,y,z)=\mathcal{F}_{x,y,z}^{-1}\{P_0(k_x,k_y,k_z)\},\tag{51}$$
    where $p(x,y,t)$ is the photoacoustic signal measured by a planar surface at position $(x,y)$ and time $t$; $k_x$, $k_y$, and $k_z$ are the spatial wavenumber components in each Cartesian direction; $\omega$ is the temporal frequency; $v_0$ is the SOS; $\rightarrow$ represents the interpolation operation between the temporal and spatial frequencies $\omega$ and $k_z$; $p_0(x,y,z)$ and $P_0(k_x,k_y,k_z)$ are the initial photoacoustic pressure and its spatial Fourier transform; and $\mathcal{F}$ and $\mathcal{F}^{-1}$ represent the forward and inverse Fourier transforms, respectively. The reconstruction procedure above involves one interpolation and two Fourier transforms and has a computational complexity of $O(n^3\log n)$ for a 3D image with a size of $n\times n\times n$ voxels by use of FFT[159]. The numerical implementation of the SE algorithm is available in the k-Wave toolbox[83].
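    The interpolation step [Eq. (50)] is the only non-FFT operation in the procedure. The sketch below implements just that single step for a 2D slice (one detector-wavenumber axis), with an illustrative function name, grids, and test spectrum.

```python
import numpy as np

def map_omega_to_kz(P_w, kx, omega, kz, v0):
    """Interpolate P(kx, omega) onto a (kx, kz) grid via omega^2 = v0^2 (kx^2 + kz^2)."""
    P_k = np.zeros((len(kx), len(kz)), dtype=complex)
    for i, kxi in enumerate(kx):
        w_target = v0 * np.sqrt(kxi**2 + kz**2)   # dispersion relation for this kx row
        P_k[i] = np.interp(w_target, omega, P_w[i].real) \
               + 1j * np.interp(w_target, omega, P_w[i].imag)
    return P_k

# demo on a known spectrum: if P(kx, omega) = omega, the mapped values
# must equal v0 * sqrt(kx^2 + kz^2) wherever the target frequency stays in band
omega = np.linspace(0.0, 100.0, 201)
kx = np.array([0.0, 1.0, 2.0])
kz = np.linspace(0.0, 30.0, 31)
v0 = 3.0
P_w = np.tile(omega, (3, 1)).astype(complex)
P_k = map_omega_to_kz(P_w, kx, omega, kz, v0)
```

    In a full implementation this mapping is sandwiched between a forward FFT over $(x,y,t)$ and an inverse FFT over $(x,y,z)$, which is what yields the overall $O(n^3\log n)$ cost.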

    Figure 15 is a numerical example showing SE-based PACT image reconstruction using the k-Wave toolbox. The photoacoustic source in the x–y plane contains nine spherical absorbers with unit intensity. Among them, eight absorbers with diameters uniformly varying from 1 mm to 2 mm are evenly distributed on a circle with a diameter of 10 mm, while the ninth absorber with a diameter of 1.6 mm is seated at the origin. The signal detection geometry is a planar surface 12 mm below the x–y plane. The planar surface has 364 × 364 point detectors uniformly distributed over a physical aperture of 36 mm × 36 mm. Figures 15(b) and 15(c) are the x–z and x–y cross sections of the photoacoustic source, respectively, and Figs. 15(d) and 15(e) are the corresponding reconstructed images, which show that the SE algorithm can recover the major structures of the source. Note that the distortion and blurring in the reconstructed images are due to the finite size of the planar detection surface.


    Figure 15. Example of SE-based image reconstruction in PACT. (a) Schematic diagram of a planar detection geometry and a multi-sphere photoacoustic source. (b), (c) x–z and x–y cross sections of the source. (d), (e) x–z and x–y cross sections of the reconstructed source. Please refer to the text for the simulation settings.

    3.3.2 Series expansions for circular, cylindrical, and spherical detection geometries

    In 2002, the Wang group reported exact SE algorithms for PACT image reconstruction in cylindrical and spherical geometries[135,160]. However, the image reconstruction procedures in these two cases involve complicated mathematical calculations, which prevents implementation via FFT and makes them time-consuming[135,160]. Based on Norton’s pioneering work on circular geometries[156], Haltmeier and Scherzer in 2007 proposed a 3D reconstruction algorithm for cylindrical detection geometries, where photoacoustic signals are recorded by linear integrating detectors[161]. The proposed algorithm has a computational complexity of $O(n^4)$ for a 3D image with a size of $n\times n\times n$ voxels, which is higher than that of the SE algorithm in planar geometries [$O(n^3\log n)$][159] but lower than that of FBP [$O(n^5)$][24].

    In 2012, Kunyansky proposed fast image reconstruction algorithms suitable for circular, cylindrical, and spherical detection geometries[33]. In Kunyansky’s work, image reconstruction is based on the Fourier transforms of the initial photoacoustic pressure $p_0(\mathbf{r}_s)$ and the measured photoacoustic signals $p(\mathbf{r}_d,t)$, as summarized below. Suppose that the detection geometry is a circle, as shown in Fig. 7(c). By expanding the temporal Fourier transform $P(\mathbf{r}_d,\omega)$ of the measured photoacoustic signal $p(\mathbf{r}_d,t)$ and the initial photoacoustic pressure $p_0(\mathbf{r}_s)$ into Fourier series in $\varphi$ and $\theta$, we have[33]
    $$P(\mathbf{r}_d,\omega)=\sum_{k=-\infty}^{\infty}P_k(\omega)e^{ik\varphi},\quad \mathbf{r}_d=(R\cos\varphi,R\sin\varphi),\tag{52}$$
    $$p_0(\mathbf{r}_s)=\sum_{k=-\infty}^{\infty}p_{0k}(r_s)e^{ik\theta},\quad \mathbf{r}_s=(r_s\cos\theta,r_s\sin\theta),\tag{53}$$
    where $i$ is the imaginary unit, $\omega$ is the angular frequency, $R=|\mathbf{r}_d|$ is the radius of the detection circle, $r_s=|\mathbf{r}_s|$ is the distance from the reconstruction point to the center of the circular detection geometry, and $P_k(\omega)$ and $p_{0k}(r_s)$ are expansion coefficients given by[33]
    $$P_k(\omega)=\frac{1}{2\pi}\int_0^{2\pi}P(\mathbf{r}_d,\omega)e^{-ik\varphi}\,\mathrm{d}\varphi,\tag{54}$$
    $$p_{0k}(r_s)=\frac{2}{\pi}\int_0^{\infty}\frac{P_k(\omega)}{H_{|k|}^{(1)}(\omega R)}J_{|k|}(\omega r_s)\,\mathrm{d}\omega,\tag{55}$$
    where $H_{|k|}^{(1)}$ is the Hankel function of the first kind of order $|k|$ and $J_{|k|}$ is the Bessel function of the first kind of order $|k|$[162]. Equation (55) is somewhat similar to an expression in Norton’s work[156], where the Bessel function $J_{|k|}$ rather than the Hankel function $H_{|k|}^{(1)}$ appears in the denominator and a term corresponding to the real part of $P_k(\omega)$ appears in the numerator.
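    The expansion and coefficient formulas [Eqs. (52) and (54)] amount to a discrete Fourier series in the detector angle. A small self-check, with synthetic single-frequency data on the circle (variable names illustrative):

```python
import numpy as np

M = 256
phi = 2 * np.pi * np.arange(M) / M                 # detector angles on the circle
P = 1.5 + np.cos(2 * phi) - 0.5 * np.sin(3 * phi)  # synthetic data at one frequency

def P_k(k):
    # Eq. (54) as a Riemann sum: P_k = (1/2pi) * int_0^{2pi} P(phi) e^{-ik phi} dphi
    return np.sum(P * np.exp(-1j * k * phi)) / M

# Eq. (52): resynthesize the data from a few circular harmonics
recon = sum(P_k(k) * np.exp(1j * k * phi) for k in range(-5, 6))
```

    Because the data contain only harmonics up to order 3, a handful of coefficients reproduces them exactly; in the full algorithm each such coefficient is then divided by the Hankel factor of Eq. (55).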

    By discretizing Eq. (55), we can obtain $p_{0k}(r_s)$ and then the initial photoacoustic pressure $p_0(\mathbf{r}_s)$ [Eq. (53)]. However, direct discretization of Eq. (55) results in a computational complexity of $O(n^2)$ for each $p_{0k}(r_s)$ and $O(n^3)$ for the whole reconstruction. To achieve fast reconstruction, substituting Eq. (55) into Eq. (53) yields[33]
    $$p_0(\mathbf{r}_s)=\frac{1}{2\pi}\int_{\mathbb{R}^2}P_0(\mathbf{\Lambda})e^{i\mathbf{r}_s\cdot\mathbf{\Lambda}}\,\mathrm{d}\mathbf{\Lambda},\quad \mathbf{\Lambda}=(\omega\cos\varphi,\omega\sin\varphi),\tag{56}$$
    where $P_0(\mathbf{\Lambda})$ is given by
    $$P_0(\mathbf{\Lambda})=\begin{cases}\dfrac{2}{\pi}\displaystyle\sum_{k=-\infty}^{\infty}\dfrac{(-i)^k P_k(\omega)}{\omega H_{|k|}^{(1)}(\omega R)}e^{ik\varphi}, & \mathbf{\Lambda}\neq 0,\\[2ex]\displaystyle\int_0^{\infty}\dfrac{2P_{k=0}(\omega)}{\pi\omega H_0^{(1)}(\omega R)}RJ_1(\omega R)\,\mathrm{d}\omega, & \mathbf{\Lambda}=0,\end{cases}\tag{57}$$
    where $R$ is the radius of the circular detection geometry. Equation (56) indicates that the initial photoacoustic pressure $p_0(\mathbf{r}_s)$ can be obtained by calculating the 2D inverse Fourier transform of $P_0(\mathbf{\Lambda})$, which has a computational complexity of $O(n^2\log n)$, lower than that of the direct-discretization-based reconstruction [$O(n^3)$].

    For cylindrical detection geometries with linear integrating detectors, image reconstruction can be readily realized by combining the fast image reconstruction procedure for 2D circular geometries and the 3D inverse Fourier transform[33]. This procedure can yield fast reconstruction with a computational complexity of O(n3logn). For spherical detection geometries, the derivation of the image reconstruction procedure is similar to those for circular geometries [Eqs. (52)–(57)]. A prominent difference is that p0(rs) and P(rd,ω) are expanded into spherical harmonics. In addition, the fast spherical harmonics (FSH) transform is adopted to achieve fast reconstruction[163], which results in a reconstruction complexity of O(n3log2n)[33].

    Wang and Anastasio in 2012 demonstrated that a mapping relationship exists between the spatial frequency components of initial photoacoustic pressure p0(rs) and the temporal frequency components of measured photoacoustic signals p(rd,t)[164]. They thus proposed a Fourier-transform-based image reconstruction algorithm whose computational complexity is O(n2logn) for 2D image reconstruction in circular geometries and O(n3logn) for 3D image reconstruction in spherical geometries[165]. The reconstruction formula does not involve series expansion of special functions or multi-dimensional interpolation operations in Fourier space, which are commonly used in previous work.

    3.3.3 Series expansion for general detection geometries

    Kunyansky in 2007 presented a generalized SE method for image reconstruction in an arbitrary closed detection geometry, provided that the eigenfunctions of the Dirichlet Laplacian are explicitly known[32].

    Assuming that $\lambda_k^2$ and $u_k(\mathbf{r})$ are the eigenvalues and normalized eigenfunctions of the Dirichlet Laplacian in the interior $\Omega$ of a closed detection surface $S$, we have[32]
    $$\nabla^2 u_k(\mathbf{r})+\lambda_k^2 u_k(\mathbf{r})=0,\ \ \mathbf{r}\in\Omega,\ \Omega\subset\mathbb{R}^n;\quad u_k(\mathbf{r})=0,\ \ \mathbf{r}\in S;\quad \|u_k\|_2^2\equiv\int_{\Omega}|u_k(\mathbf{r})|^2\,\mathrm{d}\mathbf{r}=1,\tag{58}$$
    where $n$ denotes the spatial dimension. Since the eigenfunctions $u_k(\mathbf{r})$ are orthogonal, the initial photoacoustic pressure $p_0(\mathbf{r})$ can be formulated in series form as
    $$p_0(\mathbf{r})=\sum_{k=0}^{\infty}\alpha_k u_k(\mathbf{r}),\tag{59}$$
    where the coefficients $\alpha_k$ can be obtained from the measured projection data $p(\mathbf{r}_d,t)$, and $u_k(\mathbf{r})$ are the known eigenfunctions of the Dirichlet Laplacian determined by the detection geometry. $u_k(\mathbf{r})$ is the solution of the Dirichlet problem for the Helmholtz equation with zero boundary conditions and has the following Helmholtz representation[32]:
    $$u_k(\mathbf{r})=\int_S \Phi_{\lambda_k}(|\mathbf{r}-\mathbf{r}_d|)\,\frac{\partial}{\partial n_d}u_k(\mathbf{r}_d)\,\mathrm{d}S,\quad \mathbf{r}\in\Omega,\tag{60}$$
    where $\Phi_{\lambda_k}$ is a free-space rotationally invariant Green’s function[166], $\mathbf{n}_d$ is the unit normal vector of the detector surface pointing to the ROI, and $\mathbf{r}_d$ is the position of the detector. The coefficients $\alpha_k$ can be calculated by[32]
    $$\alpha_k=\int_{\Omega}u_k(\mathbf{r})\,p_0(\mathbf{r})\,\mathrm{d}\mathbf{r}=\int_S I(\mathbf{r}_d,\lambda_k)\,\frac{\partial}{\partial n_d}u_k(\mathbf{r}_d)\,\mathrm{d}S,\tag{61}$$
    where $I(\mathbf{r}_d,\lambda_k)$ is given by
    $$I(\mathbf{r}_d,\lambda_k)=\int_0^{\operatorname{diam}(\Omega)/v_0} g(\mathbf{r}_d,t)\,\Phi_{\lambda_k}(v_0 t)\,v_0\,\mathrm{d}t.\tag{62}$$

    Here diam(Ω) denotes the diameter of Ω, and g(rd,t) is the integral over a spherical shell [Eq. (27)] centered at the detector position rd, which can be calculated from the measured projection data p(rd,t). With the calculated coefficients αk and eigenfunctions uk(r), the initial photoacoustic pressure p0(r) can be reconstructed [Eq. (59)].

    If the eigenfunctions of the Dirichlet Laplacian of a detection geometry such as a sphere, a half-sphere, a finite cylinder, or a cube are explicitly known, the eigenfunction-based SE method can provide exact reconstruction for any closed detection geometry. In particular, when the detection geometry is a cube, the reconstruction procedure can be implemented using 3D FFT, resulting in efficient computation with a complexity of O(n3logn)[32]. Furthermore, unlike the FBP algorithm, this technique can provide theoretically exact reconstruction within the region enclosed by the detection geometry even if part of a photoacoustic source is outside the region.
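    In 1D the idea is easy to visualize: on an interval, the Dirichlet-Laplacian eigenfunctions are sines, and Eq. (59) becomes an ordinary sine series. A minimal sketch with an illustrative source profile that vanishes on the boundary:

```python
import numpy as np

# 1D illustration: the Dirichlet-Laplacian eigenfunctions on (0, L) are
# u_k(x) = sqrt(2/L) sin(k pi x / L); a source p0 vanishing at the ends
# can be expanded as p0(x) = sum_k alpha_k u_k(x), cf. Eq. (59).
L = 1.0
N = 400
x = np.linspace(0, L, N + 2)[1:-1]       # interior grid points
dx = x[1] - x[0]
p0 = x * (L - x) * np.exp(-5 * x)        # smooth source, zero on the boundary

def u(k):
    return np.sqrt(2 / L) * np.sin(k * np.pi * x / L)

# alpha_k = <u_k, p0> by quadrature, cf. the first integral in Eq. (61)
alpha = np.array([np.sum(u(k) * p0) * dx for k in range(1, 60)])
recon = sum(a * u(k) for k, a in zip(range(1, 60), alpha))
```

    A few dozen terms reproduce the smooth source to well below 0.1% of its peak; in the full method the coefficients are instead obtained from the boundary data via Eqs. (61) and (62).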

    The image reconstruction procedures discussed for planar, circular, cylindrical, spherical, and arbitrarily closed geometries are based on the assumption that the acoustic medium is homogeneous[32]. For the image reconstruction problem in heterogeneous media, Agranovsky and Kuchment proposed an analytic reconstruction formula that works for any closed detection geometry and variable SOS satisfying a non-trapping condition[167]. This algorithm can be regarded as a generalization of the eigenfunction-based SE method for arbitrary closed detection geometries [Eqs. (58)–(62)][167].

    Table 5 lists representative work on SE-based fast image reconstruction algorithms and Table 6 summarizes representative work on SE-based image reconstruction in PACT.

    • Table 5. Representative Work on SE-Based Fast Image Reconstruction in PACT


      | Author | Year | Detection Geometry | Dimension | Complexity | Source |
      |---|---|---|---|---|---|
      | Kunyansky | 2012 | Circular | 2D | $O(n^2\log n)$ | [33] |
      | | | Spherical | 3D | $O(n^3\log^2 n)$ | [33] |
      | | | Cylindrical | 3D | $O(n^3\log n)$ | [33] |
      | Wang et al. | 2012 | Circular | 2D | $O(n^2\log n)$ | [165] |
      | | | Spherical | 3D | $O(n^3\log n)$ | [165] |
      | Kunyansky | 2007 | Cubic | 3D | $O(n^3\log n)$ | [32] |
      | Xu et al. | 2002 | Planar | 3D | $O(n^3\log n)$ | [158] |
      | Köstli et al. | 2001 | Planar | 3D | $O(n^3\log n)$ | [31] |
    • Table 6. Representative Work on SE-Based Image Reconstruction in PACT


      | Author | Year | Detection Geometry | Detector | SOS | Source |
      |---|---|---|---|---|---|
      | Kunyansky | 2012 | Circular, spherical, and cylindrical | Point-like detectors for circles and spheres; linear integrating detectors for cylinders | Constant | [33] |
      | Wang et al. | 2012 | Circular and spherical | Point-like detectors | Constant | [165] |
      | Zangerl et al. | 2009 | Cylindrical | Circular integrating detectors | Constant | [168,169] |
      | Haltmeier et al. | 2007 | Cylindrical | Linear integrating detectors | Constant | [161] |
      | Kunyansky | 2007 | Arbitrary closed geometry | Point-like detectors | Constant | [32] |
      | Agranovsky & Kuchment | 2007 | Arbitrary closed geometry | Point-like detectors | Constant or variable | [167] |
      | Xu & Wang | 2002 | Spherical | Point-like detectors | Constant | [135] |
      | Xu et al. | 2002 | Planar | Point-like detectors | Constant | [158] |
      | Xu et al. | 2002 | Cylindrical | Point-like detectors | Constant | [160] |
      | Köstli et al. | 2001 | Planar | Point-like detectors | Constant | [31] |

    3.4 Time Reversal

    Time reversal (TR) algorithms recover the initial photoacoustic pressure by numerically running a forward acoustic propagation model backward: the photoacoustic signals measured by each detector are re-transmitted into the image domain in temporally reversed order. TR algorithms can couple the acoustic properties of media, such as SOS, density, dispersion, and absorption, into the acoustic propagation model and can be used for image reconstruction in arbitrary closed detection geometries[170]. They are thus regarded as the “least restrictive” image reconstruction algorithms in PACT[170,171].

    3.4.1 Standard time reversal

    In 2004, Xu and Wang proposed a TR algorithm for PACT and derived an analytical expression for image reconstruction based on the Green’s function with a homogeneous Dirichlet boundary condition[34]. Under the far-field approximation [see Sec. 3.2.1], the analytical expression derived in Ref. [34] reduces to the FBP reconstruction formula [Eq. (47)]. Subsequent research on TR-based image reconstruction primarily focused on coupling the acoustic properties of heterogeneous media into the TR model to improve reconstruction performance[35,170–172]. For example, Hristova et al. studied TR-based image reconstruction in heterogeneous media, in even dimensions, and with partially closed detection surfaces[171]. Treeby et al. considered the effect of acoustic absorption and proposed a TR algorithm for absorbing acoustic media[170].

    The TR algorithms are based on the Huygens principle[171]. When the SOS of a medium is constant and the spatial dimension of the acoustic field is odd, the Huygens principle indicates that the acoustic wave field inside the detection region vanishes after a finite time T[171], so TR algorithms enable exact image reconstruction for a sufficiently large T. When the SOS is variable or the spatial dimension is even, the Huygens principle no longer holds, and TR algorithms can provide only approximate reconstructions[171].

    In a lossy medium, the photoacoustic wave equation can be formulated as an initial value problem comprising three coupled acoustic equations and corresponding initial conditions[170]. The coupled acoustic equations, namely the linearized equation of motion, the linearized equation of continuity, and the adiabatic equation of state, are given as[170]
    $$\frac{\partial}{\partial t}\mathbf{u}(\mathbf{r},t)=-\frac{1}{\rho_0(\mathbf{r})}\nabla p(\mathbf{r},t),\tag{63}$$
    $$\frac{\partial}{\partial t}\rho(\mathbf{r},t)=-\rho_0(\mathbf{r})\nabla\cdot\mathbf{u}(\mathbf{r},t),\tag{64}$$
    $$p(\mathbf{r},t)=v_0(\mathbf{r})^2\left\{1+\tau(\mathbf{r})\frac{\partial}{\partial t}(-\nabla^2)^{y/2-1}+\eta(\mathbf{r})(-\nabla^2)^{(y+1)/2-1}\right\}\rho(\mathbf{r},t),\tag{65}$$
    which are subject to the following initial conditions:
    $$p(\mathbf{r},t)\big|_{t=0}=p_0(\mathbf{r}),\quad \mathbf{u}(\mathbf{r},t)\big|_{t=0}=0.\tag{66}$$
    Here $p(\mathbf{r},t)$ is the acoustic pressure at time $t$ and position $\mathbf{r}$ inside the imaging region, $\mathbf{u}(\mathbf{r},t)$ is the acoustic particle velocity, $\rho(\mathbf{r},t)$ is the acoustic density, $\rho_0(\mathbf{r})$ is the ambient density, and $\tau(\mathbf{r})$ and $\eta(\mathbf{r})$ are the absorption and dispersion proportionality coefficients given by
    $$\tau(\mathbf{r})=-2\alpha_0 v_0(\mathbf{r})^{y-1},\quad \eta(\mathbf{r})=2\alpha_0 v_0(\mathbf{r})^{y}\tan(\pi y/2),\tag{67}$$
    where $\alpha_0$ is the absorption coefficient ($\mathrm{dB\,MHz^{-y}\,cm^{-1}}$) and $y$ is a real constant representing the power-law exponent.

    TR-based image reconstruction is achieved by running the same photoacoustic forward model in temporally reversed order and thus can be formulated by the same set of equations [Eqs. (63)–(67)], except that the initial conditions in Eq. (66) are replaced with
    $$p(\mathbf{r},t)\big|_{t=0}=0,\quad \mathbf{u}(\mathbf{r},t)\big|_{t=0}=0,\quad p(\mathbf{r}_d,t)=p_m(\mathbf{r}_d,T-t),\tag{68}$$
    where $p_m(\mathbf{r}_d,t)$ is the photoacoustic signal measured at detector position $\mathbf{r}_d$ and $T$ is the total measurement time; the last condition re-transmits the measured signals as a time-varying boundary condition.

    Figure 16 illustrates the basic principle of TR-based image reconstruction.


    Figure 16. Illustration of TR-based image reconstruction. (a) Cross section of a spherical absorber and a spherical detector array. (b) Photoacoustic signal measured by a detector. (c) Temporal reversion of the measured signal in (b). (d)–(i) Acoustic wave fields in the detection region at different moments during backward propagation of the time-reversed signal.

    Generally, there is no explicit analytical solution to the partial differential equations in Eqs. (63)–(65). Numerical methods are required to solve these equations. Finite element and finite difference are two commonly used methods for computing numerical solutions to partial differential equations due to their flexibility, accuracy, and rigorous mathematical foundations. However, these approaches require many mesh points per wavelength and small time steps to achieve accurate calculations, which is computationally expensive, especially for high-frequency and large-scale imaging models.

    To solve this problem, Cox et al. developed a k-space pseudospectral approach that achieves accurate computation with fewer mesh points and larger time steps[173]. The k-space-pseudospectral TR algorithm has been implemented in the k-Wave toolbox[83] and is based on the discrete form of the coupled acoustic equations [Eqs. (63)–(65)][170], which can be written as
    $$\frac{\partial}{\partial\xi}p(\mathbf{r},t)=\mathcal{F}^{-1}\{ik_\xi\kappa_k\,\mathcal{F}\{p(\mathbf{r},t)\}\},\tag{69}$$
    $$u_\xi(\mathbf{r},t+\Delta t)=u_\xi(\mathbf{r},t)-\frac{\Delta t}{\rho_0(\mathbf{r})}\frac{\partial}{\partial\xi}p(\mathbf{r},t),\tag{70}$$
    $$\frac{\partial}{\partial\xi}u_\xi(\mathbf{r},t+\Delta t)=\mathcal{F}^{-1}\{ik_\xi\kappa_k\,\mathcal{F}\{u_\xi(\mathbf{r},t+\Delta t)\}\},\tag{71}$$
    $$\rho_\xi(\mathbf{r},t+\Delta t)=\rho_\xi(\mathbf{r},t)-\Delta t\,\rho_0(\mathbf{r})\frac{\partial}{\partial\xi}u_\xi(\mathbf{r},t+\Delta t),\tag{72}$$
    $$p(\mathbf{r},t+\Delta t)=v_0(\mathbf{r})^2\Big[\sum_\xi\rho_\xi(\mathbf{r},t+\Delta t)+\mathrm{abs}+\mathrm{disp}\Big],\tag{73}$$
    where $i$ is the imaginary unit; $\xi$ denotes the Cartesian coordinates [$\xi=x$ represents 1D space, $\xi=(x,y)$ represents 2D space, $\xi=(x,y,z)$ represents 3D space]; $k_\xi$ is the spatial wavenumber component; $\Delta t$ is the time step; $\kappa_k=\operatorname{sinc}[v_0(\mathbf{r})k\Delta t/2]$ is a k-space adjustment to the spatial derivative[170]; and $\mathcal{F}$ and $\mathcal{F}^{-1}$ represent the forward and inverse Fourier transforms, respectively. Equations (69) and (71) are gradient calculations, and Eqs. (70) and (72) are update equations. Equation (73) is the equation of state, including the absorption and dispersion of the medium, which are given by
    $$\mathrm{abs}=\tau(\mathbf{r})\,\mathcal{F}^{-1}\Big\{(k\kappa_k)^{y-2}\,\mathcal{F}\Big\{\rho_0(\mathbf{r})\sum_\xi\frac{\partial}{\partial\xi}u_\xi(\mathbf{r},t+\Delta t)\Big\}\Big\},\tag{74}$$
    $$\mathrm{disp}=\eta(\mathbf{r})\,\mathcal{F}^{-1}\Big\{(k\kappa_k)^{y-1}\,\mathcal{F}\Big\{\sum_\xi\rho_\xi(\mathbf{r},t+\Delta t)\Big\}\Big\}.\tag{75}$$

    The above equations are calculated iteratively and the entire acoustic field is updated for each time step.
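    The gradient step of Eq. (69) is a one-liner per axis. A 1D sketch with illustrative grid and pulse parameters follows; note that NumPy's sinc is normalized, `np.sinc(x) = sin(pi*x)/(pi*x)`, so the k-space argument must be divided by pi.

```python
import numpy as np

def kspace_derivative(p, dx, v0, dt):
    """Spectral derivative F^{-1}{ i*k*kappa*F{p} } with kappa = sinc(v0*k*dt/2)."""
    k = 2 * np.pi * np.fft.fftfreq(p.size, d=dx)        # spatial wavenumbers
    kappa = np.sinc(v0 * k * dt / (2 * np.pi))          # k-space adjustment, Eq. (69)
    return np.real(np.fft.ifft(1j * k * kappa * np.fft.fft(p)))

# check on a periodic tone: the derivative of sin(k0 x) is k0 cos(k0 x)
# and kappa is close to 1 for a small time step
N, dx, v0, dt = 128, 1e-3, 1500.0, 1e-8
x = np.arange(N) * dx
k0 = 2 * np.pi * 4 / (N * dx)                           # four periods in the domain
d = kspace_derivative(np.sin(k0 * x), dx, v0, dt)
```

    For larger time steps the $\kappa_k$ factor deliberately damps the highest wavenumbers, which is what keeps the time stepping stable and accurate with coarse temporal sampling.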

    One of the most prominent features of the TR algorithm is that it is well suited for image reconstruction in acoustically heterogeneous media. To demonstrate this, a simulation is presented. Figure 17(a) shows an acoustically heterogeneous phantom consisting of a uniform background, several blood vessels, and a bone, mimicking the cross section of a human finger. The background and blood vessels have a mass density of 1000 kg/m3 and an SOS of 1500 m/s. In contrast, the acoustically heterogeneous region (i.e., the bone) has a density of 1850 kg/m3 and an SOS of 3800 m/s. A 512-element full-ring detector array with a diameter of 50 mm enclosing the phantom is used for imaging. Figure 17(b) shows the image reconstructed by the TR algorithm under the assumption that the medium is homogeneous, with a typical SOS of 1505 m/s and a density of 1050 kg/m3. In this case, extensive artifacts are present in the reconstructed image due to the inaccurate SOS and density settings of the TR model. As a comparison, Fig. 17(c) presents the image reconstructed by the TR algorithm incorporating the true SOS and density of the heterogeneous media. The results show that, by incorporating the acoustic properties of the media, TR algorithms can achieve improved image reconstruction in PACT.


    Figure 17. TR-based image reconstruction in acoustically heterogeneous media. (a) A numerical phantom consisting of multiple blood vessels and a bone mimicking the cross section of a human finger. A 512-element full-ring detector array (dashed circle) with a diameter of 50 mm enclosing the phantom is used for imaging. (b) Image reconstructed by TR using a constant SOS (1505 m/s) and a constant density (1050 kg/m3). (c) Image reconstructed by TR coupling the true SOS and density of the media. Please refer to the text for the simulation settings.

    Another important feature of TR is that it can address the image reconstruction problem in arbitrary closed detection geometries. Figures 18(a)–18(c) show an imaging example, where a 2D multi-disk phantom is enclosed by three different closed detection geometries: a square, an octagon, and a circle. The octagon and the circle in Figs. 18(b) and 18(c) can be circumscribed by the square in Fig. 18(a). Figures 18(d)–18(f) show the corresponding images reconstructed by the TR algorithm, which demonstrates that TR can produce good results for all three detection geometries.


    Figure 18. TR-based image reconstruction under different detection geometries. (a)–(c) Schematic diagrams showing a phantom and three different detection geometries. The square detection geometry has a side length of 50 mm, the octagonal geometry has a side length of 20.7 mm, and the circular geometry has a diameter of 50 mm. (d)–(f) Corresponding images reconstructed by TR.

    3.4.2 Modified time reversal based on the Neumann series

    The standard TR algorithms discussed above are unable to find analytical solutions for the wave equations when the SOS is variable or the spatial dimension of an acoustic field is even. To address this problem, a Neumann-series-based TR algorithm was proposed by Stefanov and Uhlmann[174] and further developed by Qian et al.[175]. The basic idea of the Neumann-series-based TR algorithm is to first modify the measurement data at $t=T$ using a harmonic extension operator, then perform TR reconstruction, and finally repeat the TR process iteratively[175]. The Neumann-series-based TR algorithm can yield exact reconstruction if the detected pressure $p(\mathbf{r}_d,t)$ is known on the whole boundary $\Omega_0$ and the measurement time $T$ is greater than a stability threshold[175].

    In the Neumann-series-based TR algorithm, the initial conditions [Eq. (68)] for the coupled wave equations are modified to[63,175]
$$p(\mathbf{r},t)|_{t=0}=P_\Omega p(\mathbf{r},T),\quad \mathbf{u}(\mathbf{r},t)|_{t=0}=0,\quad p(\mathbf{r}_d,t)=p(\mathbf{r}_d,T-t),$$
where $P_\Omega$ denotes the Poisson operator of the harmonic extension defined as $P_\Omega h(\mathbf{r},T)=\phi(\mathbf{r})$, and $\phi(\mathbf{r})$ is the solution of the elliptic boundary value problem
$$\nabla^2\phi(\mathbf{r})=0,\quad \phi(\mathbf{r})|_{\mathbf{r}\in\Omega_0}=h(\mathbf{r},T)|_{\mathbf{r}\in\Omega_0},$$
where $\Omega$ is the domain defined by a detection surface and $\Omega_0$ is the boundary of the domain.

    By introducing a forward acoustic propagation operator $\mathcal{A}$ and a modified TR reconstruction operator $\mathcal{A}_{\mathrm{TR}}^{\mathrm{modify}}$, the photoacoustic forward problem and the modified TR reconstruction can be expressed as
$$p(\mathbf{r}_d,t)=\mathcal{A}p_0(\mathbf{r}),\qquad p_0(\mathbf{r})=\mathcal{A}_{\mathrm{TR}}^{\mathrm{modify}}p(\mathbf{r}_d,t).$$

    The Neumann-series-based TR reconstruction can be expressed as[175]
$$p_0(\mathbf{r})=\sum_{j=0}^{\infty}\left(I-\mathcal{A}_{\mathrm{TR}}^{\mathrm{modify}}\mathcal{A}\right)^{j}\mathcal{A}_{\mathrm{TR}}^{\mathrm{modify}}p(\mathbf{r}_d,t),\tag{80}$$
where $I$ is an identity operator. Theoretically, Eq. (80) can provide an exact reconstruction if the summation index $j$ runs from zero to $+\infty$. However, in practice, $j$ is always finite, and exact reconstruction is impossible. If the series is truncated at a particular value $m$, the reconstruction formula in Eq. (80) can be written as
$$p_0^{m}(\mathbf{r})=\sum_{j=0}^{m}\left(I-\mathcal{A}_{\mathrm{TR}}^{\mathrm{modify}}\mathcal{A}\right)^{j}\mathcal{A}_{\mathrm{TR}}^{\mathrm{modify}}p(\mathbf{r}_d,t).\tag{81}$$

    Equation (81) can be further formulated in the form of iterative TR reconstruction[176], i.e.,
$$p_0^{m+1}(\mathbf{r})=p_0^{m}(\mathbf{r})+\mathcal{A}_{\mathrm{TR}}^{\mathrm{modify}}\left[p(\mathbf{r}_d,t)-\mathcal{A}p_0^{m}(\mathbf{r})\right].\tag{82}$$

    The estimated initial photoacoustic pressure p0m(r) usually contains pixels with negative intensity due to non-ideal imaging conditions, such as finite detector bandwidth and limited view angle. To improve reconstruction results, non-negative constraints can be imposed on the solutions during iteration. Moreover, the Poisson operator of harmonic extension can be neglected in some cases, e.g., when the spatial dimension of an acoustic field is even and the medium is homogeneous or when the time T is large enough to guarantee that acoustic waves inside a detection region decay sufficiently. Figure 19 shows an example illustrating Neumann-series-based TR reconstruction in limited-view imaging.
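As a rough illustration of the iterative update in Eq. (82), the sketch below runs the scheme on a small toy problem. The forward operator A and the imperfect time-reversal operator A_tr are random stand-in matrices (hypothetical, not a wave-propagation model); a non-negativity constraint is imposed after each update, as described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (hypothetical): a forward operator A and an approximate
# time-reversal operator A_tr that inverts A only imperfectly.
N = 50
A = np.eye(N) + 0.05 * rng.standard_normal((N, N))            # forward model
A_tr = np.linalg.inv(A) + 0.02 * rng.standard_normal((N, N))  # imperfect inverse

x_true = np.clip(rng.standard_normal(N), 0, None)  # non-negative source
p = A @ x_true                                     # measured projections

# Iterative TR reconstruction, Eq. (82): x <- x + A_tr (p - A x),
# with a non-negativity constraint imposed after each update.
x = A_tr @ p
for _ in range(20):
    x = x + A_tr @ (p - A @ x)
    x = np.clip(x, 0, None)

print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))  # small relative error
```

Because A_tr only approximates the inverse of A, a single pass leaves a residual; repeating the update drives the data residual p − Ax toward zero, mirroring the Neumann-series truncation.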


    Figure 19. Iterative TR-based image reconstruction for limited-view imaging. (a) Schematic diagram of a phantom and a limited-view detector array. The detector array has 455 detectors uniformly distributed over a 50-mm-diameter partial circle with a view angle of 3/4π. (b)–(d) Images reconstructed by the Neumann-series-based TR algorithm after 3, 5, and 10 iterations.

    Table 7 summarizes representative work on TR-based image reconstruction in PACT.


      Table 7. Representative Work on TR-Based Image Reconstruction in PACT

      | Author | Year | Solution | Media | Source |
      | --- | --- | --- | --- | --- |
      | Qian et al. | 2011 | Numerical | Heterogeneous (exact solution) | [175] |
      | Treeby et al. | 2010 | Numerical | Heterogeneous, absorptive, and dispersive | [170] |
      | Stefanov & Uhlmann | 2009 | Numerical | Heterogeneous (exact solution) | [174] |
      | Hristova et al. | 2008 | Numerical | Heterogeneous | [171] |
      | Burgholzer et al. | 2007 | Numerical | Heterogeneous | [35] |
      | Xu & Wang | 2004 | Analytical | Heterogeneous | [34] |

    3.5 Iterative Reconstruction

    The last class of conventional image reconstruction algorithms in PACT is iterative reconstruction (IR). IR algorithms construct a set of linear equations based on a discrete photoacoustic imaging model and iteratively seek numerical solutions by minimizing the error between the measured projection data and their estimates (see Fig. 20). Compared with the aforementioned DAS, FBP, SE, and TR algorithms, IR algorithms can yield improved photoacoustic image quality under non-ideal imaging scenarios, such as sparse-view and limited-view imaging. They can also incorporate the physical models of an imaging system, such as transducer responses and acoustic heterogeneity, to achieve improved imaging. The downside of IR algorithms is that they are slow and computationally expensive due to the numerical optimization involved.


    Figure 20. Principle of the IR-based image reconstruction model.

    3.5.1 Discrete forward imaging model

    The basic idea of IR algorithms is to establish a mapping relationship between the photoacoustic image to be reconstructed and the projection data using a set of linear equations and then solve them iteratively. To determine this relationship, a discrete photoacoustic imaging model is established, as illustrated in Fig. 21. In the discrete model, the 2D photoacoustic image is represented by $n\times n$ evenly distributed grid points, and the value of the $j$th grid point is $x_j$. The photoacoustic signal $p(\mathbf{r}_d,t)$ measured by a detector is discretely sampled, with $K$ temporal samples in total. The $k$th sample of the photoacoustic signal, denoted as $p_k$, relates to the spherical shell integral of the photoacoustic image over the $k$th detection shell, whose radius equals the TOF of the photoacoustic signal ($k\Delta t$, where $\Delta t$ is the temporal sampling interval) multiplied by the SOS $v_0$. The goal of IR-based image reconstruction is to solve for $x_j$ from the projection data $p_k$.


    Figure 21. Discrete photoacoustic imaging model in IR. The photoacoustic image is discretely represented by n×n evenly distributed grid points. The projection data pk measured by a detector correspond to the spherical shell integral of the photoacoustic image over the kth detection shell.

    Based on the discrete imaging model, the relationship between the unknowns $x_j$ and the projection data $p_k$ can be described by the following set of linear equations:
$$\begin{cases}a_{1,1}x_1+a_{1,2}x_2+\cdots+a_{1,N}x_N=p_1,\\ a_{2,1}x_1+a_{2,2}x_2+\cdots+a_{2,N}x_N=p_2,\\ \qquad\vdots\\ a_{K,1}x_1+a_{K,2}x_2+\cdots+a_{K,N}x_N=p_K,\end{cases}\tag{83}$$
which can be written as
$$\sum_{j=1}^{N}a_{kj}x_j=p_k,\quad k=1,2,\ldots,K,\tag{84}$$
where $a_{kj}$ is a weighting factor representing the contribution of the $j$th grid point to the $k$th detection shell, $N=n\times n$ is the total number of grid points, and $K$ is the total number of sampling points of a detector. If we consider the projection data measured by all $M$ detectors and denote
$$\mathbf{A}=\begin{bmatrix}a_{1,1}&a_{1,2}&\cdots&a_{1,N}\\ a_{2,1}&a_{2,2}&\cdots&a_{2,N}\\ \vdots&\vdots&&\vdots\\ a_{K,1}&a_{K,2}&\cdots&a_{K,N}\\ a_{K+1,1}&a_{K+1,2}&\cdots&a_{K+1,N}\\ \vdots&\vdots&&\vdots\\ a_{MK,1}&a_{MK,2}&\cdots&a_{MK,N}\end{bmatrix},\quad \mathbf{x}=\begin{bmatrix}x_1\\x_2\\\vdots\\x_j\\\vdots\\x_N\end{bmatrix},\quad \mathbf{p}=\begin{bmatrix}p_1\\p_2\\\vdots\\p_K\\p_{K+1}\\\vdots\\p_{MK}\end{bmatrix},\tag{85}$$
Eq. (83) can be written in matrix form as
$$\mathbf{A}\mathbf{x}=\mathbf{p},\tag{86}$$
where $\mathbf{A}$ is the system matrix that transforms the initial photoacoustic pressure $\mathbf{x}$ to the measured projection data $\mathbf{p}$. To solve Eq. (86), the system matrix $\mathbf{A}$ should be constructed first.
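The dimensions in Eq. (86) can be made concrete with a toy single-detector example. The binary shell weighting below is an illustrative simplification (a realistic system matrix uses the expansion functions discussed in Sec. 3.5.2); it only shows how grid points, detection shells, and the matrix fit together.

```python
import numpy as np

# Toy 2D discrete imaging model (Eq. (86)): one detector, n x n grid.
# a_{kj} = 1 if grid point j falls in the k-th detection shell
# (a TOF bin of width v0*dt around the detector), else 0.
n = 64
v0, dt = 1500.0, 1e-7          # SOS (m/s), temporal sampling interval (s)
pitch = 1e-4                   # grid spacing (m)
det = np.array([-2e-3, 0.0])   # detector position (m)

ys, xs = np.meshgrid(np.arange(n) * pitch, np.arange(n) * pitch, indexing="ij")
dist = np.hypot(xs - det[0], ys - det[1]).ravel()   # |r_j - r_d| for all N points

K = int(dist.max() / (v0 * dt)) + 2                 # number of time samples
shell = (dist / (v0 * dt)).astype(int)              # shell index of each grid point

N = n * n
A = np.zeros((K, N))
A[shell, np.arange(N)] = 1.0                        # binary system matrix

x = np.zeros(N)
x[N // 2] = 1.0                                     # single point source
p = A @ x                                           # projection data, Eq. (86)

# The single source contributes to exactly one shell: its own TOF bin.
k_expected = int(dist[N // 2] / (v0 * dt))
print(p.sum(), p[k_expected])
```

Stacking the per-detector blocks of M such detectors row-wise yields the full MK × N system matrix of Eq. (85).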

    3.5.2 System matrix construction

    A. System matrix construction for acoustically homogeneous media

    If the imaging medium is acoustically homogeneous, the photoacoustic wave equations have an explicit analytical solution, as presented in Eq. (18). The solution describes how the measured photoacoustic signal $p(\mathbf{r}_d,t)$ relates to a photoacoustic source $p_0(\mathbf{r}_s)$. Discretizing the solution [Eq. (18)], the system matrix $\mathbf{A}$ can be constructed as
$$[\mathbf{p}]_{(m-1)K+k}=\left[\frac{1}{4\pi v_0^2}\frac{\partial}{\partial t}\int\frac{p_0(\mathbf{r}_s)}{|\mathbf{r}_s-\mathbf{r}_d^m|}\,\delta\!\left(t-\frac{|\mathbf{r}_s-\mathbf{r}_d^m|}{v_0}\right)\mathrm{d}\mathbf{r}_s\right]_{t=k\Delta t},\tag{87}$$
where $\mathbf{r}_d^m$ denotes the position of the $m$th detector, $\Delta t$ is the temporal sampling interval, and $[\mathbf{p}]_{(m-1)K+k}$ is the photoacoustic signal measured by the $m$th detector at time $k\Delta t$. To calculate Eq. (87), $p_0(\mathbf{r}_s)$ can be expanded using a set of basis functions as[38]
$$p_0(\mathbf{r}_s)\approx\sum_{j=1}^{N}x(\mathbf{r}_s^j)\,\psi(\mathbf{r}_s^j),\tag{88}$$
where $\mathbf{r}_s^j$ represents the position of the $j$th grid point in the discrete image $\mathbf{x}$, and $\psi(\mathbf{r}_s^j)$ is the basis function at the $j$th grid point. Substituting Eq. (88) into Eq. (87), a discrete imaging model can be obtained as
$$[\mathbf{p}]_{(m-1)K+k}=\left\{\sum_{j=1}^{N}\left[\frac{1}{4\pi v_0^2}\frac{\partial}{\partial t}\int\frac{\psi(\mathbf{r}_s^j)}{|\mathbf{r}_s-\mathbf{r}_d^m|}\,\delta\!\left(t-\frac{|\mathbf{r}_s-\mathbf{r}_d^m|}{v_0}\right)\mathrm{d}\mathbf{r}_s\right]x(\mathbf{r}_s^j)\right\}_{t=k\Delta t}.\tag{89}$$

    Comparing Eqs. (84) and (89), the elements of the system matrix $\mathbf{A}$ can be obtained as
$$[\mathbf{A}]_{(m-1)K+k,j}=\left[\frac{1}{4\pi v_0^2}\frac{\partial}{\partial t}\int\frac{\psi(\mathbf{r}_s^j)}{|\mathbf{r}_s-\mathbf{r}_d^m|}\,\delta\!\left(t-\frac{|\mathbf{r}_s-\mathbf{r}_d^m|}{v_0}\right)\mathrm{d}\mathbf{r}_s\right]_{t=k\Delta t},\tag{90}$$
where $[\mathbf{A}]_{(m-1)K+k,j}$ denotes the element in the $[(m-1)K+k]$th row and $j$th column of the system matrix $\mathbf{A}$.

    To construct a system matrix with sufficient accuracy, it is necessary to choose a suitable expansion function ψ. Several expansion functions are available for this purpose. Among them, a spherical-voxel function[36,177–179], Kaiser–Bessel functions[38], and linear interpolation functions[37,40,42,178,180] are the most commonly used. Figure 22 shows a schematic of the three expansion-function-based discrete grid models.


    Figure 22. Illustration of the discrete grid models based on different expansion functions. (a) Discrete grid model based on a spherical voxel. (b) Discrete grid model based on the Kaiser–Bessel function. (c) Discrete grid model based on the bilinear interpolation. The red dot in (c) represents the point to be interpolated.

    The expansion function ψ can help interpret the meaning of the system matrix A. Suppose that a photoacoustic image contains only a single source defined by an expansion function ψ at grid point j. The element $[\mathbf{A}]_{(m-1)K+k,j}$ of the system matrix is then the response of the mth detector at time kΔt, as illustrated in Fig. 23. This implies that to construct the system matrix, we can simply compute the photoacoustic signals measured by each detector and arrange them as illustrated in Fig. 23. Since the system matrix is only associated with the discrete photoacoustic imaging model (Fig. 21) and the expansion function ψ [Eq. (90)], it can be reused for different experiments once constructed for a particular imaging system.


    Figure 23. Schematic diagram illustrating the meaning of the elements in a system matrix. (a) Spherical-voxel-function-based discrete imaging model. The photoacoustic signals generated from the spherical voxel at the jth grid point are recorded by detectors. (b) Elements of the system matrix. The jth column of the system matrix corresponds to the photoacoustic signal of the jth spherical voxel in (a).

    A1. Spherical-voxel-based system matrix

    A spherical voxel function is defined as[181]
$$\psi(\mathbf{r}_j)=\begin{cases}1,&\text{for } |\mathbf{r}-\mathbf{r}_j|\le\varepsilon/2,\\ 0,&\text{for } |\mathbf{r}-\mathbf{r}_j|>\varepsilon/2,\end{cases}\tag{91}$$
where $\varepsilon$ is the spacing between two grid points, and $\mathbf{r}_j=(x_j,y_j,z_j)^T$ denotes the coordinate of the $j$th grid point. In a spherical-voxel-based discrete imaging model, each grid point in the image to be reconstructed is regarded as a uniform spherical source[36,177–179]. It is possible to directly construct the system matrix $\mathbf{A}$ based on this definition. However, because a spherical voxel typically has a small radius, the generated photoacoustic signals are short in time and require a high temporal sampling rate, which increases computational complexity or introduces aliasing artifacts[181]. To address this problem, the spherical-voxel-based system matrix can instead be constructed in the frequency domain[181].

    Define $\mathbf{P}$ and $\tilde{\mathbf{A}}$ as the discrete Fourier transforms of the measured projection data $\mathbf{p}$ and the system matrix $\mathbf{A}$, respectively. The matrix form of the imaging model in Eq. (86) can be rewritten as[38,42]
$$\tilde{\mathbf{A}}\mathbf{x}=\mathbf{P},\tag{92}$$
and the elements of the system matrix are given by[38,42]
$$[\tilde{\mathbf{A}}]_{(m-1)K+k,j}=P_0(f)\,\tilde{h}_{\mathrm{SIR}}^{m}(\mathbf{r}_s^j,f)\big|_{f=k\Delta f},\tag{93}$$
where $f$ denotes the frequency and $\Delta f$ is the frequency sampling interval. $\tilde{h}_{\mathrm{SIR}}^{m}(\mathbf{r}_s^j,f)$ is the Fourier transform of the SIR of the $m$th detector and can be written as[38,42]
$$\tilde{h}_{\mathrm{SIR}}^{m}(\mathbf{r}_s^j,f)=\frac{1}{2\pi|\mathbf{r}_s^j-\mathbf{r}_d^m|}\exp\!\left(-\frac{i2\pi f|\mathbf{r}_s^j-\mathbf{r}_d^m|}{v_0}\right),\tag{94}$$
and $P_0(f)$ is the Fourier transform of the signal generated from a photoacoustic source defined by a spherical voxel and is given as[38,42]
$$P_0(f)=\frac{iv_0}{f}\left[\frac{\varepsilon}{2v_0}\cos\!\left(\frac{\pi f\varepsilon}{v_0}\right)-\frac{1}{2\pi f}\sin\!\left(\frac{\pi f\varepsilon}{v_0}\right)\right].\tag{95}$$

    Combining Eqs. (93)–(95), the spherical-voxel-based system matrix A can be constructed in the frequency domain. This frequency domain approach can solve the sampling problem of high-frequency photoacoustic signals and make it easier to incorporate detector responses (see Sec. 3.5.3).
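A minimal numerical sketch of this frequency-domain construction, using the reconstructed forms of Eqs. (93)–(95); the parameter values are arbitrary illustrative choices. A useful sanity check is the low-frequency limit: expanding Eq. (95) for small f gives P0(f) → −iπ²fε³/(6v₀²), i.e., the voxel radiates no DC component.

```python
import numpy as np

# Frequency-domain system-matrix element for a spherical-voxel source,
# per Eqs. (93)-(95). Parameter values are illustrative only.
v0 = 1500.0          # SOS (m/s)
eps = 1e-4           # grid spacing / voxel diameter (m)

def P0(f):
    """Spectrum of the signal from a spherical-voxel source, Eq. (95)."""
    return (1j * v0 / f) * (eps / (2 * v0) * np.cos(np.pi * f * eps / v0)
                            - np.sin(np.pi * f * eps / v0) / (2 * np.pi * f))

def h_sir(f, d):
    """Point-detector SIR spectrum at source-detector distance d, Eq. (94)."""
    return np.exp(-1j * 2 * np.pi * f * d / v0) / (2 * np.pi * d)

# Low-frequency sanity check: P0(f) ~ -i pi^2 f eps^3 / (6 v0^2) as f -> 0.
f_low = 1e3
print(P0(f_low), -1j * np.pi**2 * f_low * eps**3 / (6 * v0**2))

# One system-matrix element, Eq. (93): product of voxel spectrum and SIR.
element = P0(5e6) * h_sir(5e6, 0.02)
```

In a full implementation, `element` would be evaluated on the frequency grid f = kΔf for every detector-voxel pair.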

    A2. Kaiser–Bessel-function-based system matrix

    The Kaiser–Bessel (KB) functions are radially symmetric functions defined as[182,183]
$$b(r)=\begin{cases}\left(\sqrt{1-r^2/a^2}\right)^{n}\dfrac{I_n\!\left(\gamma\sqrt{1-r^2/a^2}\right)}{I_n(\gamma)},&\text{if } 0\le r\le a,\\[4pt] 0,&\text{otherwise},\end{cases}\tag{96}$$
where $I_n$ represents the modified Bessel function of the first kind of order $n$[184], $a$ is the radius of support, and $\gamma$ is a shape parameter governing the width of the radial profile. When $n=0$, $a=\varepsilon/2$, and $\gamma=0$, the KB function reduces to the spherical voxel function [Eq. (91)]. The KB-function-based expansion function can be expressed as
$$\psi(\mathbf{r}_j)=b(|\mathbf{r}-\mathbf{r}_j|),\tag{97}$$
where $\mathbf{r}_j=(x_j,y_j,z_j)^T$ denotes the coordinate of the $j$th grid point.

    The elements of the KB-function-based system matrix have an analytical solution and can be calculated in the frequency domain[38]. The derivation is similar to that of the spherical-voxel-based system matrix described in Eqs. (92)–(94) except that Eq. (95) should be replaced with
$$P_0(f)=\frac{i4\pi^2fa^3}{v_0^2\,I_n(\gamma)}\,\frac{j_{n+1}\!\left(\sqrt{4\pi^2a^2f^2/v_0^2-\gamma^2}\right)}{\left(4\pi^2a^2f^2/v_0^2-\gamma^2\right)^{(n+1)/2}}.\tag{98}$$
Here $f$ denotes the frequency, $P_0(f)$ is the Fourier transform of the signal generated from the photoacoustic source defined by a KB function, and $j_{n+1}$ is the spherical Bessel function of the first kind of order $n+1$[184].

    In the spherical-voxel-based imaging model, the expansion function ψ is a sphere with uniform intensity, which is not differentiable at the boundary. Therefore, generated photoacoustic signals have an infinite bandwidth and may induce numerical instabilities when calculating the system matrix using Eq. (90). In the KB-function-based imaging model, the expansion function ψ is also a sphere but with a smoothly varying intensity. It is differentiable at any position and thus can avoid numerical instabilities encountered by the spherical-voxel-based imaging model.

    The shape of the KB function can be adjusted as needed. The parameter n affects the differentiability of the KB function and is typically set to a value of at least 2 (n ≥ 2) to guarantee the continuity of derivatives at the boundary. The parameter a determines the effective size of a voxel and can be selected according to the desired spatial resolution. The parameter γ describes the smoothness of the KB function and impacts the width of the radial profile. It typically has a value of 10.4 for image reconstruction in PACT but may vary depending on application scenarios. Figure 24 illustrates the profiles of a family of KB functions for a given set of parameters. More details about the KB functions can be found in Ref. [185].
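The KB profile of Eq. (96) is easy to evaluate directly. The sketch below uses a power-series evaluation of the modified Bessel function I_n to stay dependency-free; the default parameters follow the values quoted above (n = 2, γ = 10.4, unit support radius).

```python
import math

def bessel_i(n, z, terms=30):
    """Modified Bessel function of the first kind I_n(z) via its power series."""
    return sum((z / 2) ** (2 * m + n) / (math.factorial(m) * math.factorial(m + n))
               for m in range(terms))

def kb(r, a=1.0, n=2, gamma=10.4):
    """Kaiser-Bessel expansion function b(r) of Eq. (96)."""
    if r > a:
        return 0.0
    s = math.sqrt(1.0 - (r / a) ** 2)
    return s ** n * bessel_i(n, gamma * s) / bessel_i(n, gamma)

# The profile equals 1 at the center, decays smoothly, and vanishes at r = a,
# which is what makes it differentiable at the support boundary.
print(kb(0.0), kb(0.5), kb(1.0))
```

Varying γ stretches or narrows the radial profile, matching the family of curves shown in Fig. 24(a).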


    Figure 24. KB functions with different parameters. (a) Profiles of the KB functions with four groups of parameters. (b) 3D visualization of the KB function with a=1, n=2, and γ=10.4.

    A3. Interpolation-function-based system matrix

    In addition to the spherical-voxel- and KB-function-based approaches, the system matrix can also be constructed using linear interpolation functions. Several different interpolation functions can be used as the expansion function to describe the mapping relation between a continuous image and its discrete counterpart. Generally, a high-order interpolation function helps produce high-accuracy results but involves more calculations. Considering the trade-off between accuracy and efficiency, bilinear and trilinear interpolation functions are commonly used for 2D[43] and 3D[38,40,42] IR models, respectively. Specifically, when the trilinear interpolation method is employed in 3D space, the expansion function can be expressed as[42]
$$\psi(\mathbf{r}_j)=\begin{cases}\left(1-\frac{|x-x_j|}{\varepsilon}\right)\left(1-\frac{|y-y_j|}{\varepsilon}\right)\left(1-\frac{|z-z_j|}{\varepsilon}\right),&\text{for } |x-x_j|,|y-y_j|,|z-z_j|\le\varepsilon,\\ 0,&\text{otherwise},\end{cases}\tag{99}$$
where $\mathbf{r}_j=(x_j,y_j,z_j)^T$ denotes the coordinate of the $j$th grid point, and $\varepsilon$ is the spacing between two grid points. The equation implies that the non-zero values of the expansion function $\psi$ exist only in the $\varepsilon$-neighborhood of position $\mathbf{r}_j$. The image value at any position can be interpolated from its four neighbors in 2D space or eight neighbors in 3D space.
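A quick check of Eq. (99): the trilinear weights of the eight grid nodes surrounding any point form a partition of unity, which is what makes the expansion consistent with ordinary interpolation.

```python
import numpy as np

def trilinear_psi(r, rj, eps):
    """Trilinear expansion function psi of Eq. (99): a product of 1D hat functions."""
    d = np.abs(np.asarray(r, dtype=float) - np.asarray(rj, dtype=float))
    if np.any(d > eps):
        return 0.0
    return float(np.prod(1.0 - d / eps))

# Weights of the 8 surrounding grid nodes sum to 1 for any point in the cell.
eps = 1.0
r = (0.3, 0.7, 0.2)   # arbitrary point inside the unit cell
corners = [(i, j, k) for i in (0, 1) for j in (0, 1) for k in (0, 1)]
w = [trilinear_psi(r, c, eps) for c in corners]
print(sum(w))  # ~1.0
```

Dropping the z factor gives the bilinear (2D) case with four neighbors.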

    The interpolation-function-based system matrix $\mathbf{A}$ can be constructed in two steps as[42]
$$\mathbf{p}=\mathbf{A}\mathbf{x}\approx\mathbf{D}\mathbf{G}\mathbf{x},\tag{100}$$
where the matrix $\mathbf{G}$ represents the spherical Radon transform in 3D [Eq. (27)], and the matrix $\mathbf{D}$ represents the differential operation in Eq. (18). The matrix $\mathbf{G}$ can be obtained by accumulating all image grid points intersecting with an integrating spherical shell and satisfies the following relationship[42]:
$$[\mathbf{G}\mathbf{x}]_{(m-1)K+k}=\varepsilon^2\sum_{j=1}^{N}[\mathbf{x}]_j\sum_{p=1}^{N_p}\sum_{q=1}^{N_q}\psi(\mathbf{r}_{k,p,q}^{j})\approx[\mathbf{g}]_{(m-1)K+k},\tag{101}$$
where $[\mathbf{g}]_{(m-1)K+k}\triangleq g(\mathbf{r}_d^m,t=k\Delta t)$ is the approximation of the spherical Radon transform in Eq. (27), and $N_p$ and $N_q$ are the numbers of angular divisions over the polar and azimuth directions, respectively, on a local spherical coordinate system centered at $\mathbf{r}_d^m$[42]. The differential matrix $\mathbf{D}$ can be obtained using a difference operation and satisfies the following relationship[42]:
$$[\mathbf{D}\mathbf{g}]_{(m-1)K+k}=\frac{1}{8\pi v_0^2\Delta t^2}\left[\frac{[\mathbf{g}]_{(m-1)K+k+1}}{k+1}-\frac{[\mathbf{g}]_{(m-1)K+k-1}}{k-1}\right]\approx[\mathbf{p}]_{(m-1)K+k}.\tag{102}$$

    The preceding procedure implies that the system matrix A can be constructed via the two matrices G and D, while the matrix elements in A do not need to be explicitly calculated.

    In addition to the implicit method discussed above, the interpolation-function-based system matrix can also be constructed by explicitly calculating its elements via analytical methods[37,40,42,43]. The analytical methods typically have better computational stability due to more accurate calculations of derivatives.

    B. System matrix construction for acoustically heterogeneous media

    The system matrix discussed above is primarily constructed based on the analytical solution of the photoacoustic wave equations [Eq. (18)], where the acoustic medium is assumed to be homogeneous. When the acoustic media are heterogeneous, the photoacoustic discrete imaging model can still be described by the coupled photoacoustic wave equations [Eqs. (63)–(67)], which, however, have no analytical solutions. Based on the k-space pseudospectral method (see Sec. 3.4.1), Huang et al. proposed a method for constructing the system matrix A in acoustically heterogeneous media[41], which is briefly reviewed below.

    Define a vector containing all photoacoustic field variables at the time step $k\Delta t$ as[41]
$$\mathbf{v}_k=\left(\mathbf{u}_k^1,\mathbf{u}_k^2,\mathbf{u}_k^3,\boldsymbol{\rho}_k^1,\boldsymbol{\rho}_k^2,\boldsymbol{\rho}_k^3,\mathbf{p}_k\right)^T,\tag{103}$$
where $\mathbf{v}_k$ is a $7N\times1$ vector, $N$ is the total number of grid points in an image, $\mathbf{u}_k^i$ and $\boldsymbol{\rho}_k^i$ ($i=1,2,3$) denote the particle velocity and acoustic density in the $i$th direction on a 3D Cartesian grid at the $k$th time step, $\mathbf{p}_k$ is the acoustic pressure at the grid points, and the superscript $T$ denotes the matrix transpose. The update of the field variables from $t=0$ to $t=(K-1)\Delta t$ is described by
$$[\mathbf{v}_0,\mathbf{v}_1,\ldots,\mathbf{v}_{K-1}]^T=\mathbf{T}_{K-1}\mathbf{T}_{K-2}\cdots\mathbf{T}_1[\mathbf{v}_0,\mathbf{0}_{7N\times1},\ldots,\mathbf{0}_{7N\times1}]^T,\tag{104}$$
where $\mathbf{T}$ is a $7NK\times7NK$ matrix, and $K$ is the total number of time steps. The vector $[\mathbf{v}_0,\mathbf{0}_{7N\times1},\ldots,\mathbf{0}_{7N\times1}]$ represents the values of the field variables at $t=0$ and can be calculated from the initial photoacoustic pressure $\mathbf{p}_0$ as
$$[\mathbf{v}_0,\mathbf{0}_{7N\times1},\ldots,\mathbf{0}_{7N\times1}]^T=\mathbf{T}_0\mathbf{p}_0,\tag{105}$$
where $\mathbf{T}_0$ is a $7NK\times N$ matrix that maps the initial pressure $\mathbf{p}_0$ to the initial value of the field variables at time $t=0$[41]. Since detectors generally do not coincide with image grid points, the measured photoacoustic projection data $\mathbf{p}$ can be related to the field quantities via interpolation as[41]
$$\mathbf{p}=\mathbf{S}[\mathbf{v}_0,\mathbf{v}_1,\ldots,\mathbf{v}_{K-1}]^T,\tag{106}$$
where $\mathbf{S}$ denotes the interpolation operation, such as the Delaunay-triangulation-based interpolation. Combining Eqs. (104)–(106), we have
$$\mathbf{p}=\mathbf{S}\mathbf{T}_{K-1}\cdots\mathbf{T}_1\mathbf{T}_0\mathbf{p}_0.\tag{107}$$

    By comparing Eq. (107) with Eq. (86), the system matrix can be obtained as
$$\mathbf{A}=\mathbf{S}\mathbf{T}_{K-1}\cdots\mathbf{T}_1\mathbf{T}_0.\tag{108}$$

    As mentioned above, the jth column of system matrix A corresponds to the response of a detector to the source defined by an expansion function ψ at grid point j. Therefore, the system matrix A in heterogeneous media can be constructed by computing the response of a detector when the photoacoustic source defined by an expansion function ψ traverses all image grid points, which can be calculated using the k-Wave toolbox[186].

    In a homogeneous medium, the construction of the system matrix A can be simpler. The response of a detector to an expansion-function-defined source at the first (reference) image grid point is computed and serves as the first column of A. The remaining columns of A can be obtained by time-shifting the elements in the first column while taking into account the signal attenuation according to the relative position between the current and the reference grid points. This can greatly improve the construction speed of the system matrix A.
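The column-shifting shortcut described above can be sketched as follows. The pulse shape here is a hypothetical Gaussian stand-in for the true expansion-function response, and spherical spreading is modeled as a simple 1/d amplitude decay; the delay between the two grid points is chosen to be an integer number of samples so that a plain array shift suffices.

```python
import numpy as np

# Hypothetical stand-in: detector response to a point-like expansion-function
# source at distance d, modeled as a delayed Gaussian pulse with 1/d decay.
v0, dt = 1500.0, 1e-8
K = 2048
t = np.arange(K) * dt

def column(d, sigma=5e-8):
    """Directly computed detector signal (system-matrix column) for distance d (m)."""
    return np.exp(-((t - d / v0) ** 2) / (2 * sigma**2)) / d

d_ref = 3e-3                       # reference grid point
col_ref = column(d_ref)

# Column for another grid point: time-shift the reference column by the extra
# TOF and rescale by the relative spherical-spreading attenuation d_ref / d.
d2 = d_ref + 120 * v0 * dt         # delay differs by exactly 120 samples
shift = round((d2 - d_ref) / (v0 * dt))
col_shifted = np.roll(col_ref, shift) * (d_ref / d2)

print(np.max(np.abs(col_shifted - column(d2))))  # ~0: shifted column matches
```

For non-integer sample delays, the shift would be done by interpolation or by a linear phase ramp in the frequency domain.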

    3.5.3 Transducer response modeling

    The IR model discussed above is established based on the assumption that an ultrasound detector is point-like in shape and has an infinite bandwidth. However, a real ultrasound detector always has a finite aperture size and a limited bandwidth, which makes the IR model less accurate. To address this problem, it is necessary to incorporate the SIR and EIR of an ultrasound detector (see Sec. 2.3) to make the system matrix A as realistic as possible.

    A. EIR modeling

    When the system matrix is constructed in the time domain, the discrete photoacoustic imaging model [Eq. (86)] incorporating the EIR of a detector can be written as[37]
$$\mathbf{F}^{-1}\tilde{\mathbf{E}}\mathbf{F}\mathbf{A}\mathbf{x}=\mathbf{p},\tag{109}$$
where the matrix $\tilde{\mathbf{E}}$ denotes the Fourier transform of the EIR of a detector, and $\mathbf{F}$ and $\mathbf{F}^{-1}$ represent the forward and inverse Fourier transforms, respectively. In contrast, when the system matrix is constructed in the frequency domain using the spherical voxel or the KB function, the EIR of a detector can be incorporated into the system matrix via[39]
$$[\tilde{\mathbf{A}}]_{(m-1)K+k,j}=P_0(f)\,\tilde{h}_{\mathrm{SIR}}^{m}(\mathbf{r}_s^j,f)\,\tilde{h}_{\mathrm{EIR}}(f)\big|_{f=k\Delta f}.\tag{110}$$
Here, $P_0$ is the Fourier transform of the acoustic pressure generated from the photoacoustic source defined by the spherical voxel or the KB function, $f$ is the frequency, $\Delta f$ is the frequency sampling interval, $k$ is the index of sampling points, and $\tilde{h}_{\mathrm{SIR}}^{m}$ and $\tilde{h}_{\mathrm{EIR}}$ are the Fourier transforms of the SIR and EIR of a detector, which can be modeled or measured in practice.
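The filtering in Eq. (109) amounts to multiplying the spectrum of the ideal model output Ax by the EIR and transforming back. The sketch below uses a hypothetical Gaussian band-pass EIR centered at 5 MHz; real EIRs are measured or modeled per detector.

```python
import numpy as np

# EIR modeling per Eq. (109): the ideal model output A x is filtered by the
# detector EIR in the frequency domain. The EIR here is a hypothetical
# Gaussian band-pass; the "ideal" signal is a broadband rectangular pulse.
dt = 1e-8
K = 1024
t = np.arange(K) * dt
ideal = np.where((t > 2e-6) & (t < 2.2e-6), 1.0, 0.0)   # ideal model output

f = np.fft.rfftfreq(K, dt)
f0, bw = 5e6, 3e6
eir = np.exp(-((f - f0) ** 2) / (2 * bw**2))             # EIR magnitude spectrum

measured = np.fft.irfft(eir * np.fft.rfft(ideal), n=K)   # F^-1 E F (A x)

# Band-limiting never adds energy: the filter gain is at most 1 everywhere.
print(measured.shape, float(np.sum(measured**2) / np.sum(ideal**2)))
```

In a full reconstruction, this filter would be applied per detector channel (or folded into the system matrix as in Eq. (110)).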

    B. SIR modeling

    B1. Far-field SIR

    Several SIR models have been proposed and used for image reconstruction in PACT[187,188]. When a detector has a flat and rectangular surface with an area of $a\times b$ [see Fig. 25(a)] and the condition of far-field approximation[189] is met, the temporal Fourier transform of the SIR of a detector can be expressed as[187]
$$\tilde{h}_{\mathrm{SIR}}^{m}(\mathbf{r}_s^j,f)=\frac{ab\exp\!\left(-\frac{i2\pi f|\mathbf{r}_s^j-\mathbf{r}_d^m|}{v_0}\right)}{2\pi|\mathbf{r}_s^j-\mathbf{r}_d^m|}\,\mathrm{sinc}\!\left(\frac{\pi fa\,x_{\mathrm{det},j}^{m}}{v_0|\mathbf{r}_s^j-\mathbf{r}_d^m|}\right)\mathrm{sinc}\!\left(\frac{\pi fb\,y_{\mathrm{det},j}^{m}}{v_0|\mathbf{r}_s^j-\mathbf{r}_d^m|}\right),\tag{111}$$
where $x_{\mathrm{det},j}^{m}$ and $y_{\mathrm{det},j}^{m}$ represent the coordinates of the $j$th grid point in a local coordinate system at the $m$th detector. Other variables have the same meanings as those defined in Eq. (90). With Eq. (111), the system matrix $\mathbf{A}$ can be constructed according to Eq. (93).
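Eq. (111) can be evaluated directly; note that NumPy's normalized `np.sinc(u) = sin(πu)/(πu)` already supplies the factor of π. The geometry below is an arbitrary illustrative choice.

```python
import numpy as np

def h_sir_far(f, src, a, b, v0=1500.0):
    """Far-field SIR spectrum (Eq. (111)) of a flat a x b rectangular detector.
    src = (x, y, z): source position in the detector's local coordinate frame."""
    x, y, z = src
    d = np.sqrt(x * x + y * y + z * z)            # |r_s - r_d|
    delay = np.exp(-1j * 2 * np.pi * f * d / v0)  # propagation delay
    # np.sinc(u) = sin(pi*u)/(pi*u), so the pi of Eq. (111) is supplied by np.sinc
    return (a * b * delay / (2 * np.pi * d)
            * np.sinc(f * a * x / (v0 * d))
            * np.sinc(f * b * y / (v0 * d)))

f = 5e6
src = (1e-3, 1e-3, 20e-3)                 # off-axis source, 2 cm away
h_small = h_sir_far(f, src, 1e-6, 1e-6)   # tiny aperture ~ point detector
h_large = h_sir_far(f, src, 5e-3, 5e-3)   # finite aperture
# Per unit area, the tiny aperture tends to the point-detector response
# 1/(2 pi d), while the finite aperture attenuates the off-axis response.
print(abs(h_small) / 1e-12, abs(h_large) / 2.5e-5)
```

This aperture-dependent sinc roll-off is precisely why modeling the SIR matters for large-element detectors.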


    Figure 25. Detector models for building SIR. (a) Detector model under the condition of far-field approximation. (b) Detector model based on patches (n=2). (c) Detector model based on discrete detection elements.

    B2. Patch-based SIR

    When the far-field approximation condition does not hold, the far-field SIR model may lead to inaccurate reconstruction results. To address this problem, the surface of a detector can be divided into small patches [Fig. 25(b)]. For each small patch, the aforementioned far-field assumption is approximately met. The SIR of each small patch can be characterized by Eq. (111), and the SIR of the whole detector is obtained by summing the SIRs of all small patches. Suppose that a detector has an area of $a\times b$ and is divided into $n\times n$ patches. Denoting $a'$ and $b'$ as the length and width of each small patch (i.e., $a'=a/n$, $b'=b/n$), we have
$$\tilde{h}_{\mathrm{SIR}}^{m}(\mathbf{r}_s^j,f)\approx\frac{1}{n^2}\sum_{l=1}^{n^2}\tilde{h}_{\mathrm{SIR}}^{m,l}(\mathbf{r}_s^j,f).\tag{112}$$

    Here $\tilde{h}_{\mathrm{SIR}}^{m,l}(\mathbf{r}_s^j,f)$ is the SIR of the $l$th patch, which can be calculated using Eq. (111).
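One way to sketch the patch idea: sum the far-field SIRs of the n×n sub-patches, each evaluated in its own local frame with the patch dimensions a/n × b/n. In this variant of the bookkeeping each patch contribution carries its own area factor, so no extra 1/n² normalization is applied; for a source deep in the far field the patch sum agrees closely with the single far-field formula, while in the near field the patches additionally capture the curvature of the wavefront.

```python
import numpy as np

def h_patch(f, src, a, b, v0=1500.0):
    """Far-field SIR (Eq. (111)) of a flat a x b element centered at the
    origin of its own local frame; src = (x, y, z) in that frame."""
    x, y, z = src
    d = np.sqrt(x * x + y * y + z * z)
    delay = np.exp(-1j * 2 * np.pi * f * d / v0)
    return (a * b * delay / (2 * np.pi * d)
            * np.sinc(f * a * x / (v0 * d)) * np.sinc(f * b * y / (v0 * d)))

def h_patched(f, src, a, b, n):
    """Sum of the far-field SIRs of n x n sub-patches (each of size a/n x b/n)."""
    xs = (np.arange(n) - (n - 1) / 2) * (a / n)   # patch center offsets
    ys = (np.arange(n) - (n - 1) / 2) * (b / n)
    total = 0.0
    for xc in xs:
        for yc in ys:
            total += h_patch(f, (src[0] - xc, src[1] - yc, src[2]), a / n, b / n)
    return total

f, a, b = 5e6, 5e-3, 5e-3
far_src = (2e-3, 1e-3, 1.0)          # source deep in the far field
print(abs(h_patched(f, far_src, a, b, 4) - h_patch(f, far_src, a, b)))
```

Increasing n refines the approximation at the cost of proportionally more SIR evaluations per matrix element.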

    B3. Discrete-detection-element-based SIR

    The far-field and patch-based SIR models are easy to incorporate into the system matrix based on the spherical voxel function or the KB function but are difficult to incorporate into the system matrix based on linear interpolation or the k-space pseudospectral method. Huang et al.[41] and Ding et al.[40] solved this problem by dividing the entire detector surface into a set of detection elements with equal areas, as shown in Fig. 25(c). In this case, the acoustic pressure $p(\mathbf{r}_d,t)$ recorded by a detector can be approximately written as[40]
$$p(\mathbf{r}_d,t)\approx\sum_{l=1}^{L}p(\mathbf{r}_d^l,t)\,\sigma,\tag{113}$$
where $L$ is the total number of detection elements, $\sigma$ is the normalized area of a detection element (dimensionless), and $\mathbf{r}_d^l$ is the position of the $l$th detection element. $p(\mathbf{r}_d^l,t)$ is the acoustic pressure recorded at position $\mathbf{r}_d^l$ and time $t$ and can be written in matrix form as [Eq. (86)]
$$\mathbf{A}^{l}\mathbf{x}=\mathbf{p}^{l},\tag{114}$$
where $\mathbf{p}^{l}$ is the acoustic pressure recorded by the $l$th detection element, and $\mathbf{A}^{l}$ is the system matrix for the $l$th detection element. Substituting Eq. (114) into Eq. (113), we have
$$\sum_{l=1}^{L}\mathbf{A}^{l}\mathbf{x}\,\sigma\approx\mathbf{p}.\tag{115}$$

    The system matrix A in the discrete imaging model in Eq. (86) is replaced with a weighted summation of L system matrices Al in the detection-element-based SIR model. To achieve accurate representation, the size of the discrete detection element should be small, preferably much less than the acoustic wavelength[41]. Therefore, the discrete-detection-element-based SIR model involves more computations and is generally more time-consuming than the far-field- and patch-based SIR models.

    3.5.4 Iterative reconstruction

    Once the system matrix $\mathbf{A}$ of a PACT imaging system is constructed, the initial photoacoustic pressure $\mathbf{x}$ can be recovered from the measured projection data $\mathbf{p}$ by solving Eq. (86). In principle, the linear photoacoustic imaging model in Eq. (86) can be solved by the pseudo-inverse matrix method, i.e.,
$$\mathbf{x}=\mathbf{A}^{\dagger}\mathbf{p},\tag{116}$$
where $\mathbf{A}^{\dagger}=(\mathbf{A}^H\mathbf{A})^{-1}\mathbf{A}^H$ is the Moore–Penrose pseudo-inverse matrix of $\mathbf{A}$, and the superscript $H$ denotes the conjugate transpose. However, the pseudo-inverse method is not commonly used in practice because the system matrix $\mathbf{A}$ is generally too large to be inverted[61]. Alternatively, the imaging model can be solved iteratively. In this way, the image reconstruction problem is equivalent to solving the least-squares problem
$$\hat{\mathbf{x}}=\arg\min_{\mathbf{x}\ge0}\frac{1}{2}\|\mathbf{p}-\mathbf{A}\mathbf{x}\|_2^2,\tag{117}$$
where $\|\cdot\|_2$ represents the L2 norm, and $\hat{\mathbf{x}}$ is the approximate solution of the least-squares problem. Since the initial acoustic pressure is non-negative, a non-negativity constraint is imposed on Eq. (117). Generally, the optimization problem in Eq. (117) is ill-posed under non-ideal imaging conditions. To address this problem, Eq. (117) is typically regularized by a penalty function and can be rewritten as
$$\hat{\mathbf{x}}=\arg\min_{\mathbf{x}\ge0}\frac{1}{2}\|\mathbf{p}-\mathbf{A}\mathbf{x}\|_2^2+\lambda R(\mathbf{x}),\tag{118}$$
where $\|\mathbf{p}-\mathbf{A}\mathbf{x}\|_2^2$ is the data fidelity term, $R(\mathbf{x})$ is a regularization term, and $\lambda$ is a regularization coefficient controlling the weight of regularization, which can be automatically optimized by methods such as the generalized cross-validation (GCV) method[190] and the L-curve method[191,192].

    One popular regularization technique used in Eq. (118) is Tikhonov regularization, which can be expressed as[63]
$$R(\mathbf{x})=\|\Gamma\mathbf{x}\|_2^2,\tag{119}$$
where $\Gamma$ is the Tikhonov matrix. In many cases, the Tikhonov matrix is chosen as the identity matrix ($\Gamma=\mathbf{I}$), giving preference to solutions with smaller norms. When the regularization term $R(\mathbf{x})$ is convex and differentiable, such as in Tikhonov regularization, the optimization problem in Eq. (118) can be solved by optimization methods such as stochastic gradient descent (SGD)[193,194], least-squares QR (LSQR)[195], and conjugate gradient (CG)[196]. When a gradient descent method[197] is used, the optimization problem in Eq. (118) can be solved by
$$\mathbf{x}^{k+1}=\mathbf{x}^{k}-\alpha\left[\mathbf{A}^{*}(\mathbf{A}\mathbf{x}^{k}-\mathbf{p})+\lambda\frac{\partial R(\mathbf{x}^{k})}{\partial\mathbf{x}^{k}}\right],\tag{120}$$
where $k$ is the iteration number, $\alpha$ is the step size controlling the update rate, $\mathbf{A}^{*}(\mathbf{A}\mathbf{x}^{k}-\mathbf{p})$ is the gradient of the fidelity term $\frac{1}{2}\|\mathbf{A}\mathbf{x}-\mathbf{p}\|_2^2$, and $\mathbf{A}^{*}$ is the adjoint of the matrix $\mathbf{A}$.
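A toy instance of the update in Eq. (120) with Γ = I and a non-negativity projection after each step; the matrix and data are random stand-ins, not a PACT system matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy IR with Tikhonov regularization (Gamma = I, so grad R = 2x) solved by
# projected gradient descent per Eq. (120).
M, N = 120, 60
A = rng.standard_normal((M, N)) / np.sqrt(M)       # stand-in system matrix
x_true = np.clip(rng.standard_normal(N), 0, None)  # non-negative ground truth
p = A @ x_true + 0.01 * rng.standard_normal(M)     # noisy projections

lam, alpha = 1e-3, 0.1
x = np.zeros(N)
for _ in range(500):
    grad = A.T @ (A @ x - p) + lam * 2 * x         # fidelity + Tikhonov gradients
    x = np.clip(x - alpha * grad, 0, None)         # gradient step + projection

print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```

The step size must stay below 2 divided by the Lipschitz constant of the gradient (here governed by the largest eigenvalue of AᵀA) for the iteration to converge.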

    Tikhonov regularization emphasizes the smoothness of an image and tends to produce images with blurred edges. An alternative to Tikhonov regularization is sparse regularization, which is non-smooth and aims to find a solution with the minimum number of non-zero elements in certain transform domains[198–200]. L1-norm-based regularization is one of the most commonly used sparse regularization methods and has the following form[63,201]:
$$R(\mathbf{x})=\|\Phi\mathbf{x}\|_1,\tag{121}$$
where $\Phi$ is a transformation matrix that can be constructed using sparse basis functions, such as the wavelet function and finite difference operators. When a finite difference operator is employed, the L1-norm regularization becomes total variation (TV) regularization[181], which can be written as
$$R(\mathbf{x})=\|\mathbf{x}\|_{\mathrm{TV}}=\sum_{j=1}^{N}\sqrt{([\mathbf{x}]_j-[\mathbf{x}]_{j_x-1})^2+([\mathbf{x}]_j-[\mathbf{x}]_{j_y-1})^2+([\mathbf{x}]_j-[\mathbf{x}]_{j_z-1})^2},\tag{122}$$
where $j_x-1$, $j_y-1$, and $j_z-1$ denote the neighboring grid points of the $j$th grid point along the x-axis, y-axis, and z-axis, respectively, and $N$ is the total number of image grid points. Sparse regularization involves non-smooth cost functions and can be solved using proximal point algorithms, such as the iterative shrinkage thresholding algorithm (ISTA) and its accelerated version, fast ISTA (FISTA)[202], the iteratively reweighted least squares (IRLS) method[203], and the alternating direction method of multipliers (ADMM)[204]. Specifically, the optimization problem in Eq. (118) can be solved by a proximal-gradient-descent scheme[45]:
$$\mathbf{x}^{k+1}=\mathrm{prox}_{R,\lambda}\left[\mathbf{x}^{k}-\alpha\mathbf{A}^{*}(\mathbf{A}\mathbf{x}^{k}-\mathbf{p})\right],\tag{123}$$
where $\mathrm{prox}_{R,\lambda}$ is a proximal operator related to the regularization term $R(\mathbf{x})$ and the parameter $\lambda$. From Eqs. (120) and (123), we know that the update process of an IR algorithm is related to the current reconstructed image $\mathbf{x}^{k}$, the regularization term $R(\mathbf{x})$, and the gradient term $\mathbf{A}^{*}(\mathbf{A}\mathbf{x}^{k}-\mathbf{p})$, as shown in Fig. 20.
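A toy instance of the proximal scheme in Eq. (123) for the L1 penalty with Φ = I, whose proximal operator is elementwise soft-thresholding; the underdetermined random system loosely mimics sparse-view sampling, and all sizes and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Proximal gradient descent (ISTA), Eq. (123), for R(x) = ||x||_1 (Phi = I).
M, N = 80, 200                                    # underdetermined (sparse-view-like)
A = rng.standard_normal((M, N)) / np.sqrt(M)      # stand-in system matrix
x_true = np.zeros(N)
x_true[rng.choice(N, 8, replace=False)] = rng.uniform(0.5, 1.5, 8)  # sparse source
p = A @ x_true

lam, alpha = 2e-3, 0.1
x = np.zeros(N)
for _ in range(2000):
    z = x - alpha * A.T @ (A @ x - p)             # gradient step on fidelity term
    x = np.sign(z) * np.maximum(np.abs(z) - alpha * lam, 0.0)  # prox of lam*||x||_1

print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```

For a nontrivial Φ (wavelets, finite differences) the proximal step no longer separates elementwise, which is where ADMM or FISTA with an inner solver comes in.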

Combining different regularization strategies may help achieve improved image quality compared with using only a single type of regularization[205]. When dual regularization is imposed, the regularization term can be written as
$R(x) = \lambda_1 R_1(x) + \lambda_2 R_2(x)$,
where $R_1(x)$ and $R_2(x)$ represent two types of regularization, and $\lambda_1$ and $\lambda_2$ are the corresponding coefficients. Based on this idea, Li et al. proposed non-local means and sparse-wavelet-based dual regularization and achieved image reconstruction with enhanced SNR and contrast[205].
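As a sketch of how two regularizers act jointly in Eq. (124), the following toy Python example performs gradient descent on a 1D denoising problem (A = I) with R1 taken as a Tikhonov term and R2 as a first-difference smoothness term; the signal and weights are hypothetical illustrations, not the dual regularization of Ref. [205].

```python
# Gradient descent for min_x (1/2)||x - p||_2^2 + l1 * ||x||_2^2 + l2 * R2(x),
# where R2(x) = sum_j (x_j - x_{j-1})^2. Toy 1D denoising illustration of
# dual regularization R(x) = lambda_1 R1(x) + lambda_2 R2(x), Eq. (124).

def grad_smooth(x):
    """Gradient of R2(x) = sum_j (x_j - x_{j-1})^2."""
    n = len(x)
    g = [0.0] * n
    for j in range(n):
        if j > 0:
            g[j] += 2.0 * (x[j] - x[j - 1])
        if j < n - 1:
            g[j] -= 2.0 * (x[j + 1] - x[j])
    return g

def dual_reg_denoise(p, l1=0.1, l2=0.2, alpha=0.2, iters=300):
    x = list(p)
    for _ in range(iters):
        grad = [(xi - pi) + 2.0 * l1 * xi + l2 * gi
                for xi, pi, gi in zip(x, p, grad_smooth(x))]
        x = [xi - alpha * gi for xi, gi in zip(x, grad)]
    return x

p = [0.0, 1.0, 0.0, 1.0, 0.0]    # oscillating "noisy" toy signal
x = dual_reg_denoise(p)          # smoother, slightly shrunk version of p
```

At convergence x satisfies the stationarity condition $(x - p) + 2\lambda_1 x + \lambda_2 \nabla R_2(x) = 0$, balancing data fidelity against both penalties.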

    Figure 26 shows an example of IR-based image reconstruction in PACT. Figure 26(a) shows a numerical blood vessel phantom. The photoacoustic signals generated from the phantom are received by a 50-mm-diameter full-ring detector array with 64 elements enclosing the phantom. Figure 26(b) is the image reconstructed by FBP [Eq. (47)]. Severe streak artifacts are present in the reconstructed image due to the limited number of ultrasound detectors. Figure 26(c) is the image reconstructed by TV-based IR. The results show that the IR algorithm can produce images with fewer artifacts, demonstrating its superior performance under non-ideal imaging scenarios. The intensity profiles of the reconstructed images indicated by the white arrow are shown in Fig. 26(d) for comparison of the results of IR and FBP.


Figure 26. Image reconstruction in sparse-view imaging by IR. (a) Numerical blood vessel phantom. The photoacoustic signals generated from the phantom are received by a full-ring detector array with 64 elements enclosing the phantom. (b) Image reconstructed by FBP [Eq. (47)]. (c) Image reconstructed by TV-based IR. (d) Intensity profiles of the reconstructed images indicated by the arrow in (a). GT: ground truth.

    Table 8 summarizes representative work on IR-based image reconstruction in PACT.

Table 8. Representative Work on IR-Based Image Reconstruction in PACT

Author | Year | System Matrix Model | Media | Detector Response | Regularization | Optimization | Dim | Source
Yalavarthy et al. | 2021 | k-space PSTD | Heterog | EIR | Non-local means + TV | IRLS | 3D | [206]
Biton et al. | 2019 | Interpolation | Homog | – | Adaptive anisotropic TV | Chambolle-Pock | 2D | [201]
Li et al. | 2019 | Interpolation | Homog | – | Non-local means + L1-norm | GD, FISTA | 2D | [205]
Prakash et al. | 2019 | Interpolation | Homog | – | Second-order TV | CG | 2D | [207]
Ding et al. | 2017 | Interpolation | Homog | SIR | – | LSQR | 3D | [40]
Ding et al. | 2016 | Interpolation | Homog | – | – | LSQR | 2D | [43]
Liu et al. | 2016 | Curve-driven | Homog | – | – | LSQR | 2D | [208]
Javaherian & Holman | 2016 | k-space PSTD | Homog | – | TV | Multi-grid ISTA, FISTA | 2D | [209]
Wang et al. | 2014 | KB function | Homog | EIR, SIR | Quadratic smoothness penalty/Tikhonov | Linear CG | 3D | [38]
Mitsuhashi et al. | 2014 | Spherical voxel | Homog | EIR, SIR | Quadratic smoothness penalty | Fletcher-Reeves CG | 3D | [189]
Huang et al. | 2013 | k-space PSTD | Heterog | EIR, SIR | TV | FISTA | 2D | [41]
Wang et al. | 2012 | Spherical voxel | Homog | EIR, SIR | Quadratic smoothness Laplacian/TV | Fletcher-Reeves CG, FISTA | 3D | [181]
Deán-Ben et al. | 2012 | Interpolation | Heterog (weak) | – | Second-order Tikhonov | LSQR | 2D | [210]
Deán-Ben et al. | 2012 | Interpolation | – | – | Tikhonov | LSQR | 3D | [178]
Rosenthal et al. | 2011 | Interpolation | Homog | SIR | – | Pseudo-inverse, LSQR | 2D | [211]
Wang et al. | 2011 | Spherical voxel | Homog | EIR, SIR | Quadratic smoothness penalty | Fletcher-Reeves CG | 3D | [39]
Rosenthal et al. | 2010 | Interpolation | Homog | EIR | – | Pseudo-inverse, LSQR | 2D | [37]
Provost & Lesage | 2009 | – | Homog | EIR | L1-norm | LASSO | 2D | [198]
Paltauf et al. | 2002 | Spherical voxel | Homog | – | – | Algebraic iteration | 3D | [36]

    4 Deep Learning Approaches

The aforementioned conventional approaches can achieve high-quality image reconstruction under ideal imaging conditions. However, they may face challenges under non-ideal imaging conditions, such as limited detector bandwidth, finite detector aperture, limited view angle, sparse detector arrangement, and inhomogeneous acoustic media. Inspired by the successful applications of DL in fields such as computer vision[212], natural language processing[213], and biomedical engineering[214], DL-based image reconstruction in tomography has drawn considerable attention in recent years[215–217]. DL has been successfully used for image reconstruction in CT, magnetic resonance imaging (MRI), positron emission tomography (PET), ultrasound, and other imaging modalities[215,217,218], where it can improve both image quality and computational efficiency. In PACT, DL has been used to address a range of image reconstruction problems[48,50], such as detector bandwidth expansion[51,52], resolution enhancement[53,54], low-power/cost imaging[219,220], artifact removal[55], reconstruction acceleration[57], and reconstruction enhancement in sparse-view and limited-view imaging[44,46,47,58]. Specifically, the applications of DL in PACT image reconstruction fall mainly into five categories: signal preprocessing in the data domain, image postprocessing in the image domain, hybrid-domain processing in both the data and image domains, learned iterative reconstruction, and direct reconstruction.

    4.1 Preprocessing in the Data Domain

    Under non-ideal imaging conditions, the raw photoacoustic projection data measured by a detector array may be incomplete and contain noise. Data-domain signal preprocessing attempts to transform incomplete photoacoustic projection data to nearly complete projection data using neural networks before image reconstruction, as shown in Fig. 27.


Figure 27. DL-based projection data preprocessing in the data domain.

    Many related studies have emerged based on this idea[221]. Gutta et al. proposed a simple five-layer fully connected network to expand the bandwidth of measured photoacoustic projection data and achieved image reconstruction with improved contrast and quality[51]. Awasthi et al. developed a UNet-based model for super-resolution, denoising, and bandwidth enhancement of photoacoustic projection data in sparse-view imaging and achieved improved image reconstruction, as shown in Fig. 28[52]. Here, UNet is a specially designed U-shaped convolutional neural network, which has been widely used in various image processing tasks such as image denoising, image deconvolution, and image reconstruction. Zhang et al. designed a network based on a channel-wise attention mechanism to extract feature relations between sparse data channels and achieved full-channel projection data recovery from sparse input[222]. Zhang et al. proposed a UNet-based network to correct photoacoustic projection data with skull-induced aberration in brain imaging and demonstrated that the aberration could be effectively removed after preprocessing[223]. These studies show that preprocessing projection data in the data domain can help enhance image reconstruction quality.


Figure 28. DL-based preprocessing helps correct photoacoustic projection data and enhances image reconstruction quality. The photoacoustic projection data in this case were recorded by sparsely distributed detectors with finite bandwidths. (a) Architecture of the UNet used for signal preprocessing. (b) Reconstructed image of a living rat brain using the raw bandwidth-limited projection data of 100 detectors. (c) Reconstructed image using the interpolated projection data of 200 detectors. (d) Reconstructed image using the interpolated projection data denoised by automated wavelet denoising (AWD). (e) Reconstructed image using the interpolated projection data (c) processed by the super-resolution CNN (SRCNN). (f) Reconstructed image using the interpolated projection data (c) processed by the UNet in (a). Adapted from Ref. [52] with permission.

    Table 9 lists representative work on DL-based signal preprocessing in PACT.

Table 9. Representative Work on DL-Based Signal Preprocessing in PACT

Author | Year | Problem | Network | Dataset | Source
Zhang et al. | 2023 | Skull-induced aberration correction | UNet | Simulation | [223]
Zhang et al. | 2022 | Sparse-view imaging | Transformer | Simulation | [222]
Awasthi et al. | 2020 | Bandwidth expansion and sparse-view imaging | UNet | Simulation, phantom (test), in vivo (test) | [52]
Gutta et al. | 2017 | Detector bandwidth expansion | Five fully connected layers | Simulation, phantom (test) | [51]

    4.2 Postprocessing in the Image Domain

In addition to signal preprocessing in the data domain, another application of DL in PACT is image postprocessing in the image domain, where deep neural networks serve as postprocessing tools for image enhancement (see Fig. 29). This approach is especially useful for image artifact removal in non-ideal imaging scenarios.


Figure 29. DL-based image postprocessing in the image domain.

To mitigate the image artifacts that often appear in sparse-view and/or limited-view PACT images, a variety of DL-based postprocessing methods have been proposed. For example, Antholzer et al. proposed a UNet-based architecture to process images reconstructed by FBP in sparse-view imaging and demonstrated the effectiveness of the model in artifact removal[44]. Farnia et al. developed a UNet-based network to process images reconstructed by TR and achieved artifact suppression with limited projection data[224]. Davoudi et al. proposed a UNet-based framework for image quality recovery from sparse and limited projection data and demonstrated its performance with whole-body mouse imaging in vivo, as shown in Fig. 30[46].


Figure 30. DL-based postprocessing for image quality improvement in sparse-view PACT. (a) Modified UNet for image postprocessing. (b) Reference images reconstructed using 512-channel projection data and their close-ups. (c) Images reconstructed with 32-channel projection data and their postprocessed versions by the modified UNet. (d) Images reconstructed with 128-channel projection data and their postprocessed versions by the modified UNet. Adapted from Ref. [46] with permission.

    In addition to the classic UNet, its variants are also commonly used as postprocessing tools in PACT. Guan et al. developed a fully dense UNet (FD-UNet) for image artifact removal in sparse-view PACT imaging and demonstrated its superiority over the conventional UNet[225]. Zhang et al. proposed a dual-domain UNet (DuDoUNet) with a specially designed information-sharing block, which takes both time-domain DAS images and k-space images as inputs, and verified its superiority with a public clinical database[226]. Guan et al. proposed a 3D modified UNet called dense dilation UNet (DD-UNet) and demonstrated its effectiveness in improving the imaging quality in 3D sparse-view and limited-view imaging[227]. Choi et al. reported a 3D progressive UNet for multi-parameter dynamic volume imaging and demonstrated that it could improve imaging speed and reduce image artifacts in limited-view PACT imaging, as shown in Fig. 31[47].


Figure 31. DL-based postprocessing for imaging quality enhancement in limited-view 3D PACT. (a) Architecture of the 3D progressive UNet. (b) Reconstructed images in full-view imaging, cluster (limited) view imaging, and DL-enhanced cluster imaging. Adapted from Ref. [47] with permission.

In addition to UNet and its variants, generative adversarial networks (GANs) are another class of deep neural networks commonly used for image postprocessing in PACT[228]. GANs show excellent image processing capabilities and have been successfully used in many applications, such as image style conversion, image enhancement, and image restoration. A GAN consists of two sub-networks, i.e., a generator and a discriminator, which are trained adversarially until the discriminator cannot distinguish the result produced by the generator from the ground truth. Vu et al. explored the application of a Wasserstein GAN with gradient penalty (WGAN-GP) in limited-view and limited-bandwidth PACT imaging and achieved image artifact removal[229]. Lu et al. proposed a GAN-based DL approach (LV-GAN) to recover high-quality images in limited-view imaging and achieved artifact removal and high recovery accuracy under view angles of less than 60°[58]. Shahid et al. developed a residual-network-based GAN (ResGAN) to improve image quality by denoising and removing image artifacts in sparse-view imaging[230]. The deep neural networks in the above studies generally require supervised training with a large number of paired images, which are often difficult to obtain, especially in experiments. To address this problem, Lu et al. proposed an unsupervised cyclic GAN (CycleGAN) that needs only unpaired training data and successfully achieved artifact removal in sparse-view and limited-view PACT images[231].

    In addition to solving the image enhancement problem in sparse-view and limited-view imaging, DL-based postprocessing has also been used to address a variety of other imaging problems, such as reflection artifact removal and aberration mitigation. For example, Shan et al. incorporated a deep neural network into conventional iterative algorithms and successfully corrected reflection artifacts caused by planar echogenic structures outside imaging tissues[232]. Jeon et al. developed a hybrid deep neural network based on SegNet and UNet and achieved SOS aberration mitigation, streak artifact removal, and temporal resolution improvement in sparse-view imaging[233]. Gao et al. proposed a modified encoder-decoder UNet to learn the mapping relationship between speckle patterns and target images in thick porous media and solved the scattering problem in transcranial photoacoustic brain imaging[234]. Shijo et al. proposed a shifted-window transformer to restore artifact-free images from artifact-heavy images reconstructed using a TR algorithm[235].

    Deep neural networks have also been used to improve the spatial resolution of PACT images. For example, Rajendran and Pramanik employed a fully dense U-Net termed TARES to improve the spatially variant tangential resolution in circular scanning PACT imaging systems[53]. Zhang et al. exploited a fully dense UNet named Deep-E to improve the elevational resolution of linear-detector-array-based PACT imaging[54]. Based on this work, the same research group proposed a new Deep-E combining a 2D Deep-E and a 3D Deep-E and demonstrated its performance in elevational resolution improvement and deep vessel recovery in linear-detector-array-based PACT imaging[236].

    In addition, DL can also be used to reduce noise in low-fluence light emitting diode (LED)-based PACT imaging. For example, Anas et al. designed a deep neural network consisting of a CNN and a recurrent neural network (RNN) to improve the SNR of noise-corrupted images in LED-based PACT imaging[219]. Hariri et al. proposed a multi-level wavelet CNN (MWCNN) model to enhance image contrast in low-fluence PACT imaging[220]. The above work demonstrates that DL-based postprocessing can enhance photoacoustic images in different aspects, including artifact removal, contrast boost, resolution improvement, and aberration mitigation.

    In addition to the above problems, DL-based postprocessing methods have also been used to address other problems in PACT, such as quantitative imaging[237242], image fusion[243], image classification and segmentation[244246], and band-frequency separation[247]. Table 10 summarizes representative work on DL-based image postprocessing in PACT.

Table 10. Representative Work on DL-Based Image Postprocessing in PACT

Author | Year | Problem | Network | Dataset | Source
Image enhancement from incomplete projection data:
Choi et al. | 2022 | Limited-view imaging | 3D progressive UNet | In vivo | [47]
Shahid et al. | 2022 | Sparse-view imaging | ResGAN | In vivo (public dataset) | [230]
Shahid et al. | 2021 | Sparse-view imaging | 3-layer CNN, UNet, ResUNet | In vivo | [248]
Zhang et al. | 2021 | Sparse-view imaging | DuDoUNet | Simulation | [226]
Guan et al. | 2021 | Sparse-view and limited-view imaging | DD-UNet | Simulation | [227]
Godefroy et al. | 2021 | Limited-view and finite-bandwidth imaging | UNet | Simulation, phantom | [249]
Lu et al. | 2021 | Limited-view imaging | LV-GAN | Simulation, phantom, in vivo (test) | [58]
Lu et al. | 2021 | Sparse-view imaging | CycleGAN | Simulation, phantom, in vivo | [231]
Guan et al. | 2020 | Sparse-view imaging | FD-UNet | Simulation | [225]
Farnia et al. | 2020 | Sparse-view imaging | UNet | Simulation, in vivo (test) | [224]
Vu et al. | 2020 | Limited-view and finite-bandwidth imaging | WGAN-GP | Simulation, phantom (test), in vivo (test) | [229]
Zhang et al. | 2020 | Sparse-view and limited-view imaging | RADL-Net (10-layer CNN) | Simulation, phantom (test), in vivo (test) | [250]
Davoudi et al. | 2019 | Sparse-view and limited-view imaging | UNet | Simulation, phantom, in vivo | [46]
Antholzer et al. | 2018 | Sparse-view imaging | UNet | Simulation | [44]
Inhomogeneous acoustic media:
Gao et al. | 2022 | Thick porous media | UNet | Simulation, phantom, ex vivo | [234]
Jeon et al. | 2020 | SOS aberration | SegUNet | Simulation, phantom, in vivo (test) | [233]
Shan et al. | 2019 | Reflection artifact correction | UNet | Simulation | [232]
Resolution enhancement:
Zheng et al. | 2022 | Elevational resolution enhancement | Deep-E (2D and 3D FD-UNet) | Simulation, phantom (test), in vivo (test) | [236]
Zhang et al. | 2021 | Elevational resolution enhancement | Deep-E (2D FD-UNet) | Simulation, phantom (test), in vivo (test) | [54]
Rajendran & Pramanik | 2020 | Tangential resolution enhancement | TARES (FD-UNet) | Simulation, phantom (test), in vivo (test) | [53]
Low-energy imaging:
Hariri et al. | 2020 | Low-fluence imaging | Multi-level wavelet UNet | Phantom, in vivo (test) | [220]
Anas et al. | 2018 | LED imaging | RNN | Phantom, in vivo (test) | [219]
Image classification and segmentation:
Lafci et al. | 2021 | Segmentation | UNet | In vivo | [246]
Chlis et al. | 2020 | Segmentation | Sparse UNet | In vivo | [245]
Zhang et al. | 2019 | Classification and segmentation | AlexNet and GoogLeNet | Simulation | [244]
Others:
González et al. | 2023 | Band-frequency separation | FD-UNet | Simulation | [247]
Rajendran & Pramanik | 2022 | Imaging speed improvement | UNet | Simulation, phantom (test), in vivo (test) | [251]
Rajendran & Pramanik | 2021 | Radius calibration | RACOR-PAT (FD-UNet) | Simulation, phantom, in vivo | [252]
Awasthi et al. | 2019 | Image fusion | PA-Fuse (DeepFuse) | Simulation, phantom (test), in vivo (test) | [243]

    4.3 Hybrid-Domain Processing

Data-domain preprocessing and image-domain postprocessing extract feature information from only one domain. To achieve improved imaging results, hybrid-domain processing, which attempts to extract feature information from both the data domain and the image domain (Fig. 32), is also commonly used.


Figure 32. DL-based hybrid-domain processing. Neural networks 1 and 2 extract feature information from the signal domain and the image domain, respectively, while neural network 3 fuses the outputs of the preceding two networks to generate the final images.

    Based on this idea, Lan et al. proposed a hybrid deep neural network termed knowledge-infused GAN (Ki-GAN) for image enhancement in sparse-view PACT imaging[253]. The Ki-GAN employs raw photoacoustic signals and DAS-reconstructed photoacoustic images as input and can produce images with better quality than conventional DAS-type algorithms and a UNet-based postprocessing method for both fully and sparsely sampled data. Based on this work, the authors further developed a new hybrid-domain DL model named Y-Net for image enhancement in limited-view PACT imaging[254]. The Y-Net consists of two encoders and one decoder. The two encoders are used to separately process raw photoacoustic signals and DAS-reconstructed photoacoustic images, while the decoder is employed to fuse the outputs of the two encoders to generate final images. Davoudi et al. developed a hybrid-domain CNN model using both time-resolved photoacoustic signals and reconstructed images as input to enhance the image quality in limited-view PACT imaging[255] and achieved excellent results as shown in Fig. 33. Moreover, Guo et al. proposed a hybrid-domain attention steered network (AS-Net) for image enhancement in sparse-view PACT imaging[256]. The AS-Net also takes raw photoacoustic signals and DAS-reconstructed images as inputs, but the photoacoustic signals are first transformed into a 3D square matrix before being fed into the network to ensure compatibility with the input of the AS-Net. The AS-Net adopts a self-attention mechanism for semantic feature extraction and fusion and is very efficient.


Figure 33. DL-based hybrid-domain processing for enhancing the imaging quality in limited-view PACT imaging. (a) Architecture of the dual-domain CNN. (b) Reconstructed results of an in vivo human finger. Adapted from Ref. [255] with permission.

In addition to exploiting raw projection data directly, preprocessed projection data can also be used as the input of a hybrid-domain network. For example, Li et al. proposed a hybrid-domain CNN model called PRNet to improve the quality of sparse-view photoacoustic images[257]. The PRNet uses not only raw photoacoustic projection data but also their derivatives as input. Lan et al. proposed a joint feature fusion framework (JEFF-Net) to enhance photoacoustic images reconstructed from limited-view projection data[258]. The JEFF-Net takes as input both the sub-DAS images generated from each channel's raw projection data and the whole DAS images generated from all projection data, as shown in Fig. 34.


Figure 34. JEFF-Net for image enhancement in limited-view PACT imaging. RGC-Net: residual global context subnetwork; SCTM: space-based calibration and transition module; GT: ground truth. Reprinted from Ref. [258] with permission.

    Table 11 lists representative work on DL-based hybrid-domain processing in PACT.

Table 11. Representative Work on DL-Based Hybrid-Domain Processing in PACT

Author | Year | Problem | Network | Dataset | Source
Inputs: raw data + reconstructed image:
Guo et al. | 2022 | Sparse-view imaging | AS-Net | Simulation, in vivo | [256]
Davoudi et al. | 2021 | Limited-view imaging | CNN | In vivo | [255]
Lan et al. | 2020 | Limited-view imaging | Y-Net | Simulation, ex vivo (test), in vivo (test) | [254]
Lan et al. | 2019 | Sparse-view imaging | Ki-GAN | Simulation | [253]
Inputs: preprocessed raw data + reconstructed image:
Lan et al. | 2023 | Limited-view imaging | JEFF-Net | Simulation, in vivo | [258]
Li et al. | 2021 | Sparse-view imaging | PR-Net | Simulation, phantom | [257]

    4.4 Learned Iterative Reconstruction

Conventional DL approaches generally employ deep neural networks as independent modules for signal preprocessing, image postprocessing, or hybrid-domain processing. In contrast, learned iterative reconstruction algorithms fuse deep neural networks into an IR model to improve the quality and efficiency of image reconstruction. Specifically, learned IR algorithms employ deep neural networks to learn the regularization term (regularizer) R(x)[259,260] or an entire iteration process[45,261], as shown in Fig. 35.


Figure 35. Learned IR method in PACT.

A regularizer is important for IR-based image reconstruction, especially with limited projection data. Conventional regularizers such as Tikhonov and TV are not necessarily optimal, so data-driven DL-based regularizers have been proposed to replace them. Li et al.[259] and Antholzer et al.[260] used a neural network to learn the Tikhonov regularizer to improve the image quality in sparse-view PACT imaging. However, the Tikhonov regularizer is trained before the IR loop and is not updated during iteration, so the learned regularizer may not be optimal for the image reconstruction problem in PACT. Wang et al. considered the entire IR process and proposed a learned IR model based on Eq. (120)[197], as shown in Fig. 36. The learned IR model takes the current reconstructed image $x_k$ and the gradient term $A^T(Ax_k - p)$ as inputs and updates the regularizer and the hyperparameters α and λ in each iteration. The experimental results show that the learned IR model is effective at improving the quality of sparse-view PACT images and converges faster than conventional IR algorithms. Moreover, Lan et al. proposed a novel untrained CNN-based compressed sensing regularizer for sparse-view PACT imaging and achieved improved image quality compared with conventional IR algorithms[262].


Figure 36. Learning a regularizer. (a) Dual-path network for regularizer learning. (b) Cross-sectional images of a mouse reconstructed by IR with Tikhonov, TV, and learned regularizers. The learned regularizer yields the highest image quality in terms of detail preservation and artifact suppression. Adapted from Ref. [197] with permission.

    In addition to learning a regularizer, it is also possible to learn an entire iteration process [Eqs. (120) and (123)] using deep neural networks. For example, Hauptmann et al. proposed a deep gradient descent (DGD) algorithm to reconstruct high-resolution 3D images from restricted photoacoustic projection data in limited-view PACT imaging[45]. In the DGD algorithm, a CNN model was designed to learn the proximal operator in Eq. (123), which yields high-quality images at a fast convergence speed, as shown in Fig. 37. Similarly, Yang et al. proposed using recurrent inference machines (RIMs) to learn an iteration process in IR and achieved accelerated reconstruction[261]. Lan et al. developed a CNN network combining a variational model to learn an iteration process and achieved robust and accelerated reconstruction in limited-view PACT imaging[263]. Shan et al. proposed a simultaneous reconstruction network (SR-Net) to update the initial pressure x and the sound speed v0 at each iteration of IR reconstruction for PACT imaging in acoustically heterogeneous media[264]. Moreover, Boink et al. proposed a learned primal-dual (L-PD) framework to learn an iteration process in IR and achieved simultaneous high-quality image reconstruction and segmentation in limited-view and sparse-view PACT imaging[265]. Recently, diffusion models in DL have attracted great attention in the field of biomedical image processing because of their powerful generative capabilities[266,267]. In PACT, some studies have incorporated diffusion models into iterative optimization frameworks, which show superior performance compared with conventional IR algorithms[268271].
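The unrolled structure shared by these learned IR schemes can be illustrated schematically. In the Python sketch below, the learned operator of Eq. (123) is stood in by a fixed non-negativity projection (a hand-chosen prior, not a trained CNN), and the system matrix is a toy placeholder; in DGD, each unrolled iteration would instead apply a trained network.

```python
# Schematic unrolled iteration in the spirit of DGD / Eq. (123):
# x_{k+1} = G(x_k - alpha * A^T (A x_k - p)), where G is a learned operator.
# Here G is a hypothetical stand-in (a non-negativity projection), NOT a
# trained network, and A, p are toy placeholders.

def unrolled_reconstruction(A, p, learned_op, alpha=0.5, n_iters=50):
    At = [list(col) for col in zip(*A)]
    mv = lambda M, v: [sum(a * b for a, b in zip(row, v)) for row in M]
    x = [0.0] * len(A[0])
    for _ in range(n_iters):
        residual = [ax - pi for ax, pi in zip(mv(A, x), p)]
        step = [xi - alpha * gi for xi, gi in zip(x, mv(At, residual))]
        x = learned_op(step)   # in DGD: a CNN applied at each unrolled step
    return x

# Stand-in "learned" operator: project onto non-negative values, mimicking
# a prior that the initial pressure distribution is non-negative.
relu_prior = lambda v: [max(vi, 0.0) for vi in v]

A = [[1.0, 0.0], [0.0, 1.0]]   # toy system matrix
p = [1.0, -0.3]                # the negative entry plays the role of noise
x = unrolled_reconstruction(A, p, relu_prior)
```

The learned operator replaces a hand-crafted proximal map; training it end-to-end across a small, fixed number of unrolled iterations is what gives learned IR its fast convergence.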


Figure 37. Learning 3D IR based on DGD. (a) Structure of the CNN representing one iteration of DGD. (b) From left to right: initialization of the network, DGD result with five iterations, TV result with 50 iterations, and reference image. The proposed learned IR has a faster convergence speed than the conventional TV-based IR algorithm. Adapted from Ref. [45] with permission.

Generally, compared with conventional IR algorithms, learned IR methods can improve reconstruction efficiency by reducing the number of iterations. However, they remain time-consuming because the reconstruction model [Eqs. (120) and (123)] needs to be solved repeatedly. To accelerate convergence, Hsu et al. proposed a fast iterative reconstruction (FIRe) algorithm that simultaneously learns the discrete forward photoacoustic imaging model in Eq. (86) and an entire iteration process to reduce the reconstruction time[57]. Results showed that the FIRe algorithm can produce images with quality comparable to that of learned IR and conventional IR algorithms while reducing reconstruction time by factors of nine and 620, respectively.

    Table 12 lists representative work on DL-based IR in PACT.

Table 12. Representative Work on DL-Based IR in PACT

Author | Year | Problem | Network | Dataset | Source
Regularizer learning:
Wang et al. | 2022 | Sparse-view imaging | CNN | Simulation, in vivo | [197]
Lan et al. | 2021 | Sparse-view imaging | Untrained CNN (deep image prior) | Simulation (test), phantom (test), in vivo (test) | [262]
Antholzer et al. | 2019 | Sparse-view imaging | ResUNet | Simulation, phantom (test) | [260]
Li et al. | 2018 | Sparse-view imaging | Encoder-decoder | Simulation | [259]
Entire IR learning:
Hsu et al. | 2023 | Reconstruction acceleration | FIRe | Simulation | [57]
Lan et al. | 2022 | Limited-view imaging | CNN | Simulation, in vivo (public dataset) | [263]
Boink et al. | 2020 | Image reconstruction and segmentation | CNN | Simulation, phantom | [265]
Yang et al. | 2019 | Reconstruction acceleration | RIM | Simulation | [261]
Shan et al. | 2019 | Joint reconstruction | SR-Net | Simulation | [264]
Hauptmann et al. | 2018 | Limited-view imaging and acceleration | CNN | Simulation, phantom, in vivo | [45]
Diffusion model:
Guo et al. | 2024 | Limited-view imaging | UNet | Simulation, in vivo | [268]
Dey et al. | 2024 | Limited-view and sparse-view imaging | CNN | Simulation, in vivo | [270]

    4.5 Direct Reconstruction

    In the aforementioned DL-based approaches, deep neural networks generally perform only certain functions, such as preprocessing, postprocessing, and regularizer learning, and cannot independently reconstruct images from raw projection data without the use of conventional algorithms. Nevertheless, deep neural networks can perform direct image reconstruction alone by learning the mapping relationship between raw photoacoustic projection data and reconstructed images (see Fig. 38).

    Figure 38. DL-based direct image reconstruction in PACT.

    In 2018, Waibel et al. used a UNet-like model to reconstruct images from limited-view photoacoustic projection data and demonstrated the feasibility of neural-network-based direct image reconstruction through numerical simulation[272]. Lan et al. proposed a UNet-based deep neural network (DUNet) to reconstruct PACT images from multi-frequency photoacoustic projection data and demonstrated its superiority over conventional image reconstruction algorithms[273]. Feng et al. developed a residual-block-based end-to-end UNet (Res-UNet) to reconstruct PACT images from raw photoacoustic projection data and achieved high-quality image reconstruction with sharp edges and suppressed artifacts[274]. Lan et al. designed an encoder-decoder network to transform superimposed four-channel photoacoustic projection data into reconstructed images and achieved real-time PACT imaging with a single-channel data acquisition (DAQ) system[275]. Recently, Shen et al. developed a physics-driven DL-based filtered back projection (dFBP) framework for direct image reconstruction from raw projection data[276]. The dFBP network is constructed based on the physical model of the analytical FBP algorithm [Eq. (47)] and consists of a filtering module, a back-projection module, and a fusion module, as shown in Fig. 39. The proposed dFBP is robust and flexible, can achieve direct signal-to-image transformation with enhanced accuracy, and reconstruct high-quality, artifact-suppressed images from sparse-view, limited-view, and acoustic heterogeneity-contaminated projection data. Moreover, Lan et al. designed a self-supervised deep-learning framework for image reconstruction from extremely sparse projection data. The success of the network can be mainly attributed to the adoption of a transformer-based self-attention mechanism[277].

    Figure 39. Direct image reconstruction using dFBP. (a) Physical model of the analytical FBP [Eq. (47)]. (b) Architecture of the dFBP, which consists of a filtering module, a back-projection module, and a fusion module. (c), (d) Images separately reconstructed by FBP and dFBP using 128-channel photoacoustic projection data. (e), (f) Images separately reconstructed by FBP and dFBP from limited-view photoacoustic projection data (view angle: 3π/4). Reprinted from Ref. [276] with permission.

    To reduce the complexity of network training, raw photoacoustic projection data can be preprocessed to produce high-level features before being fed into an image reconstruction network. Guan et al. proposed UNet-based pixel-wise deep learning (Pixel-DL) that uses pixel-interpolated data as input and realized direct image reconstruction in limited-view and sparse-view PACT imaging[278]. Similarly, Kim et al. developed a deep convolutional network (upgUNet), which takes multi-channel features extracted from raw photoacoustic projection data as input, and achieved direct image reconstruction in limited-view PACT imaging[279], as shown in Fig. 40(a). Inspired by the principle of FBP, Tong et al. designed a feature projection network (FPNet) that takes raw photoacoustic projection data and their first-order derivative as input [Fig. 40(b)] and achieved high reconstruction quality from limited-view data with sparse measurements[280]. Recently, Dehner et al. proposed a deep neural network named DeepMB for real-time model-based image reconstruction with adjustable SOS[281,282]. DeepMB learns a conventional IR algorithm using domain-transformed projection data as input and can reconstruct high-quality images approximately 3000 times faster than conventional IR algorithms (less than 10 ms per image).
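
    The feature-extraction idea can be sketched directly: alongside the raw sinogram, compute its first-order time derivative and the FBP-style filtered term p(t) − t·dp/dt, and stack them as input channels for the network. The channel layout below is an illustrative assumption, not FPNet's actual preprocessing.

```python
import numpy as np

def fbp_style_features(sinogram, dt):
    """Stack raw channel data, its first-order time derivative, and the
    FBP-style filtered term p(t) - t * dp/dt as network input channels.
    `sinogram` has shape (n_detectors, n_time_samples); the three-channel
    layout is an illustrative assumption."""
    deriv = np.gradient(sinogram, dt, axis=-1)       # dp/dt per detector channel
    t = np.arange(sinogram.shape[-1]) * dt           # time axis
    filtered = sinogram - t * deriv                  # FBP-style filter term
    return np.stack([sinogram, deriv, filtered], axis=0)

# Example: 128 detector channels, 1024 time samples at 25 ns
sino = np.random.default_rng(1).standard_normal((128, 1024))
features = fbp_style_features(sino, dt=25e-9)        # shape (3, 128, 1024)
```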

    Figure 40. DL-based direct image reconstruction using high-level features extracted from raw photoacoustic projection data. (a) Architecture of the upgUNet proposed by Kim and coworkers. Reprinted from Ref. [279] with permission. (b) Architecture of the FPNet + UNet proposed by Tong and coworkers (top) and representative reconstruction results (bottom). Adapted from Ref. [280] with permission.

    Table 13 lists representative work on DL-based direct image reconstruction in PACT.

      Table 13. Representative Work on DL-Based Direct Signal-to-Image Reconstruction in PACT

      Author | Year | Problem | Network | Dataset | Source
      Input: raw data
      Shen et al. | 2024 | Sparse-view/limited-view imaging | Learned FBP | Simulation, in vivo | [276]
      Lan et al. | 2023 | Channel data decoupling | Encoder-decoder | Simulation, in vivo | [275]
      Feng et al. | 2020 | Direct image reconstruction | Res-UNet | Simulation, phantom (test) | [274]
      Lan et al. | 2019 | Multi-frequency image reconstruction | DUNet | Simulation | [273]
      Allman et al. | 2018 | Image reconstruction with source detection and reflection artifact removal | Deep fully convolutional network + Fast R-CNN | Simulation, phantom | [283]
      Waibel et al. | 2018 | Limited-view imaging | UNet | Simulation | [272]
      Input: high-level features extracted from raw data
      Dehner et al. | 2022 | IR algorithm acceleration | DeepMB | Simulation, in vivo (test) | [281]
      Guan et al. | 2020 | Sparse-view and limited-view imaging | FD-UNet | Simulation | [278]
      Kim et al. | 2020 | Limited-view imaging | upgUNet | Simulation, phantom (test), in vivo (test) | [279]
      Tong et al. | 2020 | Sparse-view and limited-view imaging | FPNet + UNet | Simulation, phantom, in vivo | [280]

    5 Performance Comparisons

    Thus far, we have reviewed the principles of the six classes of image reconstruction algorithms in PACT, namely, DAS, FBP, SE, TR, IR, and DL. To choose the most appropriate algorithm in practice, it is necessary to understand the characteristics and performance of each method under different imaging scenarios. In this section, we discuss and compare the six classes of algorithms, focusing on image reconstruction quality and image reconstruction speed. Table 14 first summarizes the overall characteristics and performance of each algorithm when evaluated under different circumstances, which will be discussed in detail below.

      Table 14. Comparison of Different Image Reconstruction Algorithms in PACT^a

      Circumstance | DAS | FBP | SE | TR | IR | DL
      Detector EIR modeling^b | Difficult | Difficult | Difficult | Difficult | OK | OK
      Detector SIR modeling^c | Difficult | Difficult | Difficult | Difficult | OK | OK
      Performance in sparse-view imaging | Poor | Poor | Poor | Poor | Good | Excellent
      Performance in limited-view sampling | Poor | Poor | Poor | Poor | Good | Excellent
      Media property coupling | Difficult | Difficult | Difficult | OK | OK | OK
      Speed | Fast | Fast | Very fast | Slow | Very slow | Fast
      Memory footprint | Very low | Very low | Low | High | Very high | High or very high

    5.1 Image Reconstruction Quality under Various Imaging Scenarios

    5.1.1 Ideal imaging conditions

    To achieve perfect image reconstruction or acoustic inversion in PACT, photoacoustic signal detection should be performed under ideal conditions: (1) the ultrasound detectors used for signal detection should have an infinite bandwidth; (2) the ultrasound detectors should have a point-like aperture; (3) the ultrasound detectors should be densely arranged in space; (4) the detector array formed by individual detectors should have a full view angle with respect to a sample; and (5) the acoustic media should be homogeneous. When these conditions are satisfied, all six classes of image reconstruction algorithms are expected to produce high-quality images.

    However, practical photoacoustic signal detection is unlikely to be ideal. In the following section, we discuss and compare the performances of the image reconstruction algorithms under non-ideal imaging scenarios.

    5.1.2 Limited detector bandwidth

    The ultrasound detectors used for photoacoustic signal detection should ideally have an infinite bandwidth so that they can respond to all frequency contents of a signal[4]. However, a practical detector always has a limited bandwidth and a non-ideal EIR, which distorts measured photoacoustic signals and blurs reconstructed images. To deal with this problem, a reconstruction algorithm should consider the non-ideal EIR. The six classes of algorithms reviewed in this paper have different characteristics in addressing this problem. It is generally difficult for DAS, FBP, SE, and TR algorithms to incorporate the non-ideal EIR of a detector into their reconstruction models but easy for IR- and DL-based algorithms, as summarized in Table 14. Specifically, IR algorithms can easily solve the non-ideal EIR problem by incorporating it into the system matrix A [Eq. (110)], and DL algorithms can solve the problem by training a reconstruction network with EIR-corrected signal-image datasets.
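
    In matrix form, this EIR coupling is straightforward: a non-ideal EIR acts as a temporal convolution on each detector channel, so the measured data equal the ideal signals convolved with the impulse response, i.e., A_eir = H A with H a Toeplitz convolution matrix. The NumPy sketch below illustrates the idea for a single detector with toy dimensions and an assumed three-tap EIR (not a measured response).

```python
import numpy as np

def eir_system_matrix(A, eir):
    """Fold a detector's electrical impulse response (EIR) into the system
    matrix: recorded data are ideal signals convolved with the EIR, so
    A_eir = H @ A, where H is a lower-triangular Toeplitz convolution
    matrix. All dimensions here are toy assumptions."""
    n_t = A.shape[0]
    H = np.zeros((n_t, n_t))
    for k, h in enumerate(eir):
        H += h * np.eye(n_t, k=-k)     # place eir[k] on the k-th subdiagonal
    return H @ A

# Toy single-detector model: 64 time samples x 16 image pixels
rng = np.random.default_rng(2)
A = rng.standard_normal((64, 16))
eir = np.array([0.25, 0.5, 0.25])      # short band-limiting EIR (assumed)
A_eir = eir_system_matrix(A, eir)
```

    Iterating with A_eir in place of A then yields EIR-corrected reconstructions without any change to the IR solver itself.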

    To understand how different image reconstruction algorithms behave in the case of non-ideal EIR, an example is given in Fig. 41. In this example, the photoacoustic source is a numerical blood vessel phantom distributed in the xy plane [Fig. 41(a)]. The detector array used for imaging is a full ring with a diameter of 50 mm and has 512 evenly distributed elements, each of which has a point-like shape. To simulate the limited bandwidth case, the detector array is assigned a center frequency of 2 MHz and is assumed to have a Gaussian-like EIR with a fractional bandwidth of 80%. Figures 41(b)–41(d) show the images reconstructed by the FBP, TR, and EIR-corrected IR algorithms, respectively. Figures 41(e)–41(g) are the images of Figs. 41(b)–41(d) with negative values removed (negative values in a photoacoustic image have no physical meaning). The results show that the images reconstructed by FBP and TR have apparent artifacts, while the image reconstructed by IR has far fewer artifacts owing to the correction of the detector EIR. Note that since DAS and SE typically have similar performance to FBP and are commonly adopted in PACT systems with linear and planar detector arrays, they are not compared in this example.

    Figure 41. Performance comparison of FBP, TR, and IR when the detectors used for signal detection have a limited bandwidth. (a) A full-ring detector array in 3D space and a numerical blood vessel phantom used for simulation. (b)–(d) Images reconstructed by FBP, TR, and EIR-corrected IR algorithms, respectively. (e)–(g) Images in (b)–(d) with negative values removed and close-up views of the images in the red dashed box. In this example, the detector array is assumed to have a Gaussian-like EIR with a bandwidth of 80%. Other simulation settings can be found in the text.

    Although DAS, FBP, SE, and TR algorithms are generally not ready to incorporate the EIR of a practical detector, their input signals can be preprocessed by deconvolution or DL to correct the non-ideal EIR. In deconvolution, the EIR of a detector is first experimentally measured using a point object and then incorporated into the signal detection model [Eq. (23)] for correction. In DL-based preprocessing, the EIR of a detector can be compensated for by training a neural network using EIR-corrected photoacoustic signals. Figure 42 shows an example demonstrating that signal preprocessing helps correct the non-ideal EIR of a detector and improves image reconstruction results[51]. In this example, the photoacoustic source is a numerical blood vessel, as shown in Fig. 42(a). Photoacoustic signals were measured by a full-ring detector array consisting of 100 elements with a radius of 37.02 mm, a center frequency of 2.25 MHz, and a fractional bandwidth of 70%. Figures 42(b) and 42(c) are images reconstructed using full and limited-bandwidth photoacoustic signals, respectively. Figures 42(d) and 42(e) are images reconstructed using signals separately preprocessed by deconvolution and a deep neural network, which were both designed to compensate for the EIR effect of the detector array. Figures 42(f)–42(j) show the corresponding results for a Derenzo phantom. The simulation results show that deconvolution and DL-based signal preprocessing can effectively correct non-ideal EIRs and improve image reconstruction quality.
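
    As a concrete illustration of the deconvolution route, the sketch below applies a standard frequency-domain Wiener deconvolution to an EIR-blurred signal; the N-shaped test signal, three-tap EIR, and SNR value are all assumptions for illustration, not the settings of Ref. [51].

```python
import numpy as np

def wiener_deconvolve(signal, eir, snr=100.0):
    """Correct a detector's non-ideal EIR by Wiener deconvolution in the
    frequency domain. `snr` is an assumed signal-to-noise power ratio
    that regularizes frequencies where the EIR has little energy."""
    n = len(signal)
    H = np.fft.rfft(eir, n)                        # EIR spectrum (zero-padded)
    W = np.conj(H) / (np.abs(H) ** 2 + 1.0 / snr)  # Wiener inverse filter
    return np.fft.irfft(np.fft.rfft(signal) * W, n)

# Toy check: blur an ideal N-shaped PA signal with a short EIR, then deconvolve
t = np.linspace(-1, 1, 256)
ideal = np.where(np.abs(t) < 0.3, -t, 0.0)         # idealized N-shaped signal
eir = np.array([0.2, 0.6, 0.2])                    # assumed band-limiting EIR
blurred = np.convolve(ideal, eir, mode="full")[:256]
recovered = wiener_deconvolve(blurred, eir, snr=1e6)
```

    With a realistic (noisy, strongly band-limited) EIR, a smaller `snr` value trades residual blur for noise amplification.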

    Figure 42. Signal preprocessing helps correct the non-ideal EIR of a detector and improves image reconstruction quality. First row: (a) numerical blood vessel used for the test. (b), (c) Images reconstructed using full and limited-bandwidth photoacoustic signals, respectively. (d), (e) Images reconstructed using signals preprocessed by deconvolution and a deep neural network, respectively. Second row: (f)–(j) a Derenzo phantom and corresponding results. Detailed simulation settings can be found in the text. Adapted from Ref. [51] with permission.

    5.1.3 Finite detector aperture size

    In addition to having an infinite bandwidth, an ideal ultrasound detector used for photoacoustic signal detection should also have a point-like shape so that it can respond to photoacoustic signals from all directions[4]. However, a practical detector always has a finite aperture size and a non-ideal SIR, which distorts measured photoacoustic signals and blurs reconstructed images. To address this problem, a reconstruction algorithm should consider the non-ideal SIR. The six classes of algorithms reviewed in this paper have different characteristics in addressing this problem. It is generally difficult for DAS, FBP, SE, and TR algorithms to incorporate the non-ideal SIR of a detector into their reconstruction models but easy for IR- and DL-based algorithms, as summarized in Table 14. Specifically, IR algorithms can easily solve the non-ideal SIR problem by incorporating it into the system matrix A [Eq. (93)], and DL algorithms can solve this problem by training a reconstruction neural network with SIR-corrected signal-image datasets.
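
    A common way to model a finite-aperture SIR, as done implicitly in such system matrices, is to subdivide the element into point detectors and average their responses; the arrival-time spread across the aperture is what blurs tangentially oriented features. Below is a minimal sketch under the simplifying assumptions of a flat 1D line element and toy geometry.

```python
import numpy as np

def finite_aperture_delay_profile(src, element_center, element_len, n_sub, c):
    """Model a flat detector element's spatial impulse response (SIR) by
    subdividing its aperture into point detectors: the element output is
    the average of point signals, each with its own time of flight.
    The 1D line-element geometry is a simplifying assumption."""
    offsets = (np.arange(n_sub) + 0.5) / n_sub - 0.5           # sub-element centers
    subs = element_center + np.outer(offsets * element_len, np.array([1.0, 0.0]))
    return np.linalg.norm(subs - src, axis=1) / c              # one delay per sub-element

# A 6 mm element 25 mm from a point source, sound speed 1500 m/s
delays = finite_aperture_delay_profile(
    src=np.array([0.0, 25e-3]), element_center=np.zeros(2),
    element_len=6e-3, n_sub=64, c=1500.0)
spread = delays.max() - delays.min()    # arrival-time spread that blurs the image
```

    Summing the sub-element contributions (rather than a single delayed sample) in the system matrix is exactly how model-based algorithms couple the SIR.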

    To understand how different image reconstruction algorithms behave in the case of non-ideal SIRs, a simulation adapted from Ref. [286] is shown in Fig. 43. In this example, the photoacoustic source consists of six point absorbers located at different distances from the origin. Photoacoustic signals were recorded by a detector with a 6-mm-diameter aperture and a center frequency of 5 MHz. The detector rotated around the photoacoustic source to record signals at different positions. Figures 43(a) and 43(b) show the images reconstructed by a model-based algorithm and DAS, respectively. The model-based algorithm uses a discrete imaging model similar to that of IR [Eq. (86)] and couples the SIR of the detector into its system matrix. The results show that the image reconstructed by DAS has significant distortions, while the image recovered by the model-based algorithm has far fewer distortions owing to the correction of the detector SIR.

    Figure 43. Performance comparison of a model-based algorithm and DAS when the detector used for signal detection has a finite aperture size. (a), (b) Images reconstructed by the model-based algorithm and DAS, respectively. The model-based algorithm uses a similar discrete imaging model [Eq. (86)] as IR but couples the detector SIR in its system matrix. Reprinted from Ref. [286] with permission.

    Although DAS, FBP, SE, and TR algorithms are generally not ready to incorporate the SIR of a practical detector, their output images can be postprocessed by deconvolution or DL to correct the non-ideal SIR. In deconvolution, the SIR of a detector is first modeled using linear acoustics[39,287,288] and then incorporated into a non-blind image deconvolution model for SIR correction. Alternatively, when the SIR is unknown, a blind deconvolution algorithm can be employed to correct it. In DL-based postprocessing, the SIR of a detector can be compensated for by training a neural network using SIR-corrected photoacoustic images. Figure 44 shows an example demonstrating that DL-based image postprocessing helps correct the non-ideal SIR of a detector and improves image quality[53]. In this example, the photoacoustic source contains four point targets. The detector used for signal detection has a finite aperture size and a center frequency of 2.25 MHz. The detector rotates around the photoacoustic source to measure signals at different positions. Figures 44(a) and 44(b) are images reconstructed using DAS without and with DL-based image postprocessing, respectively. The results show that the DAS-reconstructed image without postprocessing has significant distortions, while the image with postprocessing has far fewer distortions, indicating that DL-based image postprocessing can effectively correct non-ideal SIRs and improve image reconstruction quality.

    Figure 44. Image postprocessing helps correct the non-ideal SIR of a detector and improves image quality. In this example, a detector with a finite aperture size rotates around the four point sources for signal detection. (a), (b) Images reconstructed by DAS without and with DL-based postprocessing, respectively. Adapted from Ref. [53] with permission.

    Although here we independently discuss the EIR and SIR of a detector, they are never separable and together constitute the total impulse response (TIR) of a detector. To address the TIR of a detector, methods similar to those used for EIR and SIR correction can be used. In other words, the TIR of a detector can be experimentally measured and incorporated into IR or DL models for improved image reconstruction[289]. Alternatively, non-blind deconvolution, blind deconvolution, and DL-based image postprocessing techniques can also be adopted to correct the non-ideal TIR of a detector[290–292].

    5.1.4 Sparse-view imaging

    Ideally, the ultrasound detectors used for photoacoustic signal detection should be densely arranged in space to satisfy the spatial Nyquist sampling theorem[4]. However, the detectors in a practical detector array are often sparse due to high fabrication costs, leading to the ill-posed problem of image reconstruction from sparse projection data. The six classes of algorithms reviewed in this paper perform differently on this problem. Generally, the analytical algorithms (DAS, FBP, SE, and TR) behave similarly and require more projection data than IR algorithms, which solve the reconstruction problem in the least-squares sense, to reconstruct an image of reasonable quality; IR algorithms, in turn, require more projection data than DL-based algorithms, which are data-driven and can learn complicated signal-to-image mappings from sparse projection data. In other words, in sparse-view PACT imaging, DL-based algorithms outperform IR algorithms, which in turn outperform DAS, FBP, SE, and TR, as summarized in Table 14.
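
    For reference, the TV-regularized IR used in such comparisons can be sketched as gradient descent on a smoothed TV objective. The toy forward matrix, penalty weight, and smoothing below are illustrative assumptions; practical implementations use the actual PACT system matrix and proximal solvers such as FISTA.

```python
import numpy as np

def tv_grad(x, eps=1e-6):
    """Gradient of a smoothed total-variation penalty on a 2D image."""
    dx = np.diff(x, axis=0, append=x[-1:, :])         # forward differences
    dy = np.diff(x, axis=1, append=x[:, -1:])
    mag = np.sqrt(dx ** 2 + dy ** 2 + eps)            # smoothed gradient magnitude
    px, py = dx / mag, dy / mag
    div = (px - np.roll(px, 1, axis=0)) + (py - np.roll(py, 1, axis=1))
    return -div                                       # grad TV = -div(grad x / |grad x|)

def tv_reconstruct(A, y, shape, lam=1e-3, n_iter=2000):
    """Minimal TV-regularized IR sketch: gradient descent on
    ||Ax - y||^2 / 2 + lam * TV(x). The matrix A is a toy stand-in for
    the PACT system matrix; all parameters are assumptions."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(shape)
    for _ in range(n_iter):
        grad_data = (A.T @ (A @ x.ravel() - y)).reshape(shape)
        x -= step * (grad_data + lam * tv_grad(x))
    return x

# Toy problem: recover an 8x8 piecewise-constant image from y = A @ x
rng = np.random.default_rng(3)
x_true = np.zeros((8, 8))
x_true[2:6, 2:6] = 1.0
A = rng.standard_normal((96, 64))
y = A @ x_true.ravel()
x_rec = tv_reconstruct(A, y, (8, 8))
```

    The TV term is what suppresses the streak artifacts seen with FBP and TR when the projection data become sparse.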

    To evaluate the performance of different image reconstruction algorithms in sparse-view imaging, an example is presented in Fig. 45. The simulation settings in this example are similar to those in Fig. 41 except that the circular detector array has a varying number of elements from 32 to 512 and all detector elements have infinite bandwidth. Figure 45 shows the images reconstructed by the FBP, TR, and TV-regularized IR algorithms using 32-, 128-, and 512-channel projection data. As expected, the image reconstruction quality of each algorithm improves as the number of detectors increases. However, FBP and TR suffer from streak artifacts more significantly than TV-regularized IR in sparse-view cases (e.g., 32 and 128 views), although the three algorithms yield similar image quality when the detection views are dense (e.g., 512). Since DAS and SE typically have similar performance to FBP and are commonly adopted in PACT systems with linear and planar detector arrays, they are not compared in this example.

    Figure 45. Performance comparison of FBP, TR, and IR in sparse-view PACT imaging. First–third columns: images reconstructed by FBP, TR, and TV-regularized IR, respectively. First–third rows: images reconstructed using (a)–(c) 32-, (d)–(f) 128-, and (g)–(i) 512-channel projection data, respectively. The simulation settings can be found in the text.

    Compared with IR algorithms, DL-based algorithms may have better image reconstruction performance in sparse-view imaging[276,293]. To illustrate this, Fig. 46 shows an example comparing the performance of conventional algorithms such as TR, back projection, and TV-based IR, and two DL-based image reconstruction algorithms, namely, Pixel-DL[278] and model-based learning (MBLr)[45], for image reconstruction using sparsely sampled projection data[293]. In this example, the imaging target is mouse cerebral vasculature [Fig. 46(a)], and photoacoustic signals were received by an 8-mm-diameter full-ring array with 32 evenly distributed detectors. Figures 46(b)–46(f) are the images reconstructed by TR, back projection, TV-regularized IR, Pixel-DL, and MBLr, respectively. The results show that the TV-regularized IR algorithm has higher image reconstruction quality than the TR and back-projection algorithms, which suffer from severe artifacts due to the sparseness of the projection data, while the two DL-based reconstruction methods, especially MBLr, yield improved quality compared with all conventional algorithms in this case.

    Figure 46. Performance comparison of DL and conventional image reconstruction algorithms in sparse-view PACT imaging. (a) Mouse cerebral vasculature and local close-up image. (b)–(f) Images reconstructed by TR, back projection, TV-regularized IR, Pixel-DL, and MBLr, respectively. DL-based algorithms (MBLr and Pixel-DL) are superior to conventional algorithms in this case. The simulation settings can be found in the text. Adapted from Ref. [293] with permission.

    Although conventional image reconstruction algorithms suffer from artifacts in sparse-view imaging, they can be improved[221,294296]. For example, Sandbichler et al. achieved enhanced image reconstruction using conventional algorithms by transforming sparse projection data into dense data using compressed sensing in sparse-view imaging[297]. Meng et al. developed a principal component analysis (PCA)-based method and achieved high-quality 3D image reconstruction with sparsely sampled data without involving an iterative process[298]. Hu et al. analyzed the aliasing problem caused by spatial undersampling and proposed a temporal low-pass-filtering and spatial interpolation method for aliasing mitigation and artifact suppression[299,300]. Cai et al. analyzed the relationship between streak artifacts and sparse projection data and developed an adaptive FBP algorithm named contamination-tracing back-projection (CTBP) for image artifact suppression in sparse-view imaging[301]. Hakakzadeh et al. proposed a spatial-domain factor for sparse sampling circular-view PACT and achieved artifact suppression and resolution improvement compared with conventional algorithms[302]. Moreover, Wang et al. proposed an iterative scheme combining virtually parallel projecting and spatially adaptive filtering and achieved enhanced image reconstruction in sparse-view PACT imaging[303].
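
    The interpolation idea can be sketched in a few lines: before reconstruction, the sparse sinogram is densified across the (circular) detector dimension. Periodic linear interpolation is a simplifying assumption here, not the specific filters of the cited works.

```python
import numpy as np

def interpolate_ring_channels(sinogram, factor):
    """Densify sparse-view ring-array data by interpolating across the
    detector dimension before reconstruction, in the spirit of
    interpolation-based preprocessing. Periodic linear interpolation
    is a simplifying assumption."""
    n_det, n_t = sinogram.shape
    coarse = np.arange(n_det)
    fine = np.arange(n_det * factor) / factor
    out = np.empty((n_det * factor, n_t))
    for j in range(n_t):
        # `period` handles the wrap-around between the last and first detector
        out[:, j] = np.interp(fine, coarse, sinogram[:, j], period=n_det)
    return out

# Toy check: a smooth angular profile sampled at 32 detectors
angles = 2 * np.pi * np.arange(32) / 32
sino = np.sin(angles)[:, None] * np.ones((1, 4))    # 32 detectors, 4 time samples
dense = interpolate_ring_channels(sino, factor=4)   # -> 128 virtual detectors
```

    Interpolation helps only when the angular sampling is already close to the Nyquist limit; severely aliased data require the dedicated anti-aliasing filters discussed above.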

    5.1.5 Limited-view imaging

    Ideally, the detector array used for PACT imaging should have a full view angle (4π steradians in 3D space) with respect to the sample being imaged[4]. However, a practical detector array always has a limited view angle, leading to the ill-posed problem of image reconstruction from limited-view projection data. The six classes of algorithms reviewed in this paper perform differently on this problem. Generally, the analytical algorithms (DAS, FBP, SE, and TR) behave similarly and require a larger view angle than IR algorithms, which solve the reconstruction problem in the least-squares sense, to reconstruct an image of reasonable quality; IR algorithms, in turn, require a larger view angle than DL-based algorithms, which are data-driven and can learn complicated signal-to-image mappings from limited-view projection data. In other words, in limited-view PACT imaging, DL-based algorithms outperform IR algorithms, which in turn outperform DAS, FBP, SE, and TR, as summarized in Table 14.

    To evaluate the performance of different image reconstruction algorithms in limited-view imaging, an example is presented in Fig. 47. The simulation settings in this example are similar to those used in Fig. 41 except that the circular detector arrays used for imaging have limited view angles ranging from π/2 (quarter circle) to 2π (full circle), and all detector elements have an infinite bandwidth. Figure 47 shows the images reconstructed by the FBP, TR, and TV-regularized IR algorithms when the view angles of the detector arrays are π/2, π, and 2π. As expected, the image reconstruction quality of each algorithm improves as the view angle of the detector array increases. However, FBP and TR suffer from artifacts more significantly than does TV-regularized IR in limited-view cases (e.g., π/2 and π), although the three algorithms yield similar image quality when the view angle is 2π.

    Figure 47. Performance comparison of FBP, TR, and TV-regularized IR in limited-view PACT imaging. (a) A partial-ring detector array in 3D space and a numerical blood vessel phantom used for simulation. (b) Images reconstructed by FBP, TR, and TV-regularized IR under different imaging angles. The simulation settings can be found in the text.

    Compared with conventional algorithms, DL-based methods may have better image reconstruction performance in limited-view imaging[276]. To illustrate this, Fig. 48 shows a simulation comparing the performance of FBP and a DL-based FBP algorithm (dFBP) for image reconstruction using limited-view projection data[276]. In this example, the imaging target is a numerical zebrafish, and photoacoustic signals are collected by a circular detector array, which has a diameter of 80 mm and 512 evenly distributed detectors, as shown in Fig. 48(a). To simulate limited-view imaging scenarios, the shape of the detector array is reduced from a full circle to partial circles with central angles of π/4, π/2, 3π/4, and π. Figures 48(b) and 48(c) show the corresponding images reconstructed by FBP and dFBP under different view angles. The dFBP method can achieve better image reconstruction quality than FBP, which suffers from severe artifacts due to the incompleteness of the projection data.

    Figure 48. Performance comparison of DL and conventional image reconstruction algorithms in limited-view PACT imaging. (a) Imaging configuration. (b), (c) Images reconstructed by FBP and dFBP under different view angles. Adapted from Ref. [276] with permission.

    Although image reconstruction in limited-view imaging is challenging, the reconstruction procedure can be improved in several ways. First, limited-view raw projection data can be augmented to achieve improved reconstruction. In 2004, Patch derived consistency conditions for projection data and estimated missing data from measured data[304]. Gamelin et al. employed a single-stage Wiener optimal filter to augment measured projection data by interpolating between measurement locations[305]. Second, conventional algorithms can be modified to adapt to limited-view reconstruction. For example, Paltauf et al. proposed a modified FBP algorithm that uses a weight factor to improve image reconstruction quality in limited-view imaging[306]. Third, DL-based methods can serve as postprocessing tools for image enhancement. For example, Lu et al. proposed a GAN-based image postprocessing method and achieved high-quality image recovery from limited-view photoacoustic images[58].
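As a toy illustration of the first strategy, missing channels can be synthesized between measured ones by interpolating along the detector axis of the projection data (a minimal NumPy sketch; Ref. [305] uses a Wiener optimal filter rather than the plain linear interpolation shown here, and the function name and array layout are illustrative):

```python
import numpy as np

def upsample_channels(sinogram, factor=2):
    """Synthesize extra detector channels by linear interpolation along
    the detector axis (axis 0) of the projection data (axis 1 = time)."""
    m, k = sinogram.shape
    x_old = np.arange(m)
    x_new = np.linspace(0, m - 1, factor * (m - 1) + 1)
    out = np.empty((len(x_new), k))
    for j in range(k):  # interpolate each time sample across channels
        out[:, j] = np.interp(x_new, x_old, sinogram[:, j])
    return out
```

The augmented sinogram can then be fed to any conventional reconstruction algorithm in place of the sparse measurement.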

    5.1.6 Heterogeneous media

    Many image reconstruction algorithms in PACT, such as DAS and FBP, are derived under the assumption that the acoustic media are homogeneous, lossless, and non-dispersive. However, this is not true for biological tissues, in which strong acoustic heterogeneities, such as bones and air cavities, may be present[296,307]. To address this problem, an image reconstruction algorithm should account for the acoustic properties of the medium. The six classes of algorithms reviewed in this paper address this problem with different degrees of flexibility. DAS and FBP algorithms can employ dual SOS[308] or jointly reconstruct the initial photoacoustic pressure and SOS[309] to reduce image artifacts caused by acoustic heterogeneity. Modified SE algorithms can incorporate variable SOS into their reconstruction models[167]. In comparison, TR-, IR-, and DL-based algorithms can more easily incorporate the acoustic properties of a medium into their reconstruction models, as summarized in Table 14. Specifically, TR algorithms can handle acoustic heterogeneity by incorporating the properties of the acoustic medium, such as SOS, density, dispersion, and absorption, into their acoustic propagation models (see Sec. 3.4.1). IR algorithms can do so by incorporating these properties into the system matrix A through the solution of coupled photoacoustic wave equations (see Sec. 3.5.2). DL algorithms can address the problem by training a reconstruction network with heterogeneity-corrected signal-image datasets. Naturally, incorporating the properties of acoustic media into a reconstruction model requires prior knowledge about the media.
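To make the dual-SOS idea concrete, a straight-ray time of flight through a circular acoustic heterogeneity can be computed by splitting the pixel-to-detector path into two segments with different sound speeds (a minimal sketch under a straight-ray assumption; the geometry, speed values, and function name are illustrative, not taken from Ref. [308]):

```python
import numpy as np

def dual_sos_tof(pixel, detector, radius, c_in=1520.0, c_out=1500.0):
    """Straight-ray time of flight from a pixel inside a circular tissue
    region (radius `radius`, centered at the origin, speed c_in) to a
    detector in the surrounding medium (speed c_out).
    Assumes the pixel lies inside the circle and the detector outside."""
    p = np.asarray(pixel, float)
    d = np.asarray(detector, float)
    v = d - p
    L = np.linalg.norm(v)       # total path length
    u = v / L                   # unit direction pixel -> detector
    # Solve |p + t*u|^2 = radius^2 for the exit point of the ray.
    b = np.dot(p, u)
    disc = b * b - (np.dot(p, p) - radius**2)
    t_exit = -b + np.sqrt(disc)  # distance traveled inside the circle
    return t_exit / c_in + (L - t_exit) / c_out
```

In a dual-SOS DAS or FBP implementation, such a two-segment time of flight replaces the single-speed delay distance/SOS when selecting which signal sample to back-project for each pixel-detector pair.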

    To understand how different image reconstruction algorithms behave in the presence of acoustic heterogeneity, an example is presented in Fig. 49. The simulation settings in this example are similar to those used in Fig. 41, except that the sound speeds in the background and the ROI [the white dashed box in Fig. 49(a)] are assumed to be 1500 and 1520 m/s, respectively, and all detector elements have an infinite bandwidth. Figure 49(b) shows the image reconstructed by FBP with a constant SOS of 1505 m/s, which is the optimal SOS for reconstruction. Figures 49(c) and 49(d) show the images reconstructed by TR and TV-regularized IR with the actual SOS distribution coupled into their reconstruction models. The images reconstructed by TR and IR with the correct SOS distribution have better quality than the FBP image, which contains splitting artifacts at the end of the vessel.

    Figure 49.Performance comparison of FBP, TR, and IR in acoustically heterogeneous media. (a) A numerical blood vessel phantom with a nonuniform SOS distribution. The SOSs in the background and the white dashed box are 1500 and 1520 m/s, respectively. (b) Image reconstructed by FBP with a constant SOS of 1505 m/s. (c), (d) Images reconstructed by TR and TV-regularized IR by coupling the actual SOS distribution into their reconstruction models. Detailed simulation settings can be found in the text.

    Although algorithms such as DAS, FBP, and SE are generally not ready to incorporate the properties of acoustic media into their reconstruction models, their output images can be postprocessed for image enhancement. Figure 50 shows a simulation demonstrating that image postprocessing helps correct for acoustic heterogeneity and improves image reconstruction results[233]. The photoacoustic source in this example contains multiple elliptical and line absorbers, and the medium has three SOS regions with values of 1480, 1450, and 1575 m/s at different depths. The detector array used for imaging is a linear array located at the top of the image. Figure 50(a) shows the image reconstructed by a multi-stencil fast marching (MSFM) approach, which is regarded as the ideal result. The MSFM approach estimates the acoustic TOF based on the eikonal equation and the known SOS distribution and can achieve high-quality beamforming. Figure 50(b) shows the image reconstructed by a conventional Fourier beamformer using a constant SOS of 1540 m/s. Figure 50(c) shows the image in Fig. 50(b) processed by an automatic SOS selection method, which attempts to maximize the sharpness of the photoacoustic image with an optimal SOS (1480 m/s in this case). Figure 50(d) shows the image of Fig. 50(b) processed by a deep neural network called SegUNet. The results show that the image produced by the conventional Fourier beamformer has significant distortions, which are partially corrected by the automatic SOS selection method and fully corrected by the DL-based postprocessing method.

    Figure 50.DL-based postprocessing enhances image quality in acoustically heterogeneous media. In this example, the imaging media consist of three layers in the depth direction (z direction), and their SOSs are 1480, 1450, and 1575 m/s. (a) Image reconstructed using the MSFM method, which is regarded as the ideal result. (b) Image reconstructed by a conventional Fourier beamformer with a constant SOS of 1540 m/s. (c) Postprocessed image of (b) using an autofocus approach. (d) Postprocessed image of (b) using a DL-based method (SegUNet). Adapted from Ref. [233] with permission.

    5.1.7 Other aspects

    In addition to the qualitative reconstruction discussed above, quantitative image reconstruction is another important aspect to consider in certain imaging scenarios, such as quantitative photoacoustic imaging. Under ideal conditions, most algorithms can achieve accurate amplitude reconstruction of the initial photoacoustic pressure. However, real imaging scenarios are never ideal, which makes quantitative image reconstruction very challenging. Generally, IR[41,64] and DL[293] algorithms can achieve more accurate image reconstruction than other methods and are thus more suitable for quantitative photoacoustic imaging. Additionally, negative artifacts that have no physical meaning often occur in reconstructed photoacoustic images under non-ideal conditions[137]. In such cases, IR algorithms with non-negativity constraints can be used to reconstruct photoacoustic images free from negative components[207].

    5.2 Image Reconstruction Speed and Memory Footprint

    5.2.1 Image reconstruction speed

    In addition to image quality, image reconstruction speed and computational complexity are other critical indicators for measuring the performance of an algorithm. An ideal image reconstruction algorithm should be fast enough to allow a PACT imaging system to capture transient biodynamic processes such as heartbeat and blood flow in a living subject.

    The computational complexity of image reconstruction algorithms in PACT varies from algorithm to algorithm. Among the non-iterative methods (DAS, FBP, SE, and TR), SE is the most computationally efficient owing to its use of the FFT. Assuming that a 3D image to be reconstructed has a size of Nx×Ny×Nz (Nx=Ny=Nz=n) and the detector number M=n×n[32], SE algorithms have a computational complexity of O[n^3 log(n)] for 3D image reconstruction and O[n^2 log(n)] for 2D image reconstruction for the specific detection geometries listed in Table 5. For the same reconstruction tasks, TR algorithms have computational complexities of O[n^4 log(n)] for 3D and O[n^3 log(n)] for 2D[310], while back-projection-type algorithms, such as DAS and FBP, have computational complexities of O(n^5) for 3D and O(n^3) for 2D[62,310]. One may notice that TR algorithms have lower complexity than back-projection-type algorithms for 3D image reconstruction but seem slower in practice. This is because TR algorithms need to compute an entire acoustic field [Eqs. (69)–(73)] step by step, which is time- and memory-consuming, especially for large-scale 3D imaging models. In contrast, back-projection-type algorithms can directly reconstruct the ROI instead of the whole imaging region and can be implemented with parallel computing. Compared with non-iterative methods, IR algorithms usually have much greater computational complexity due to the repeated evaluation of the image reconstruction model in Fig. 20.
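The scalings quoted above can be compared numerically; the following sketch evaluates the leading-order operation counts (ignoring constant factors) for 3D reconstruction (function name illustrative):

```python
import math

def ops_3d(n):
    """Leading-order operation counts (constants ignored) for reconstructing
    an n x n x n volume from n x n detectors, per the scalings in the text."""
    return {
        "SE": n**3 * math.log2(n),   # O[n^3 log(n)]
        "TR": n**4 * math.log2(n),   # O[n^4 log(n)]
        "BP": float(n**5),           # DAS/FBP, O(n^5)
    }

for n in (128, 256, 512):
    o = ops_3d(n)
    print(f"n={n}: TR/SE = {o['TR'] / o['SE']:.0f}x, BP/TR = {o['BP'] / o['TR']:.1f}x")
```

The TR-over-SE ratio grows linearly with n, while the BP-over-TR gap grows only as n/log(n), which is consistent with the observation that asymptotic advantages of TR over back projection are modest at practical image sizes.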

    The computational complexity of DL-based algorithms depends on the specific method. Generally, non-iterative DL algorithms, such as data-, image-, and hybrid-domain processing algorithms and direct reconstruction algorithms, can process an image in roughly the same amount of time as conventional non-iterative algorithms, such as FBP and TR. Iterative DL algorithms, such as learned IR, are generally faster than conventional IR algorithms because they normally require fewer iterations and because the discrete forward imaging model [Eq. (86)] can be learned by neural networks. Figure 51(a) shows a qualitative comparison of the speed of different image reconstruction algorithms.

    Figure 51.Qualitative comparison of different image reconstruction algorithms in terms of (a) image reconstruction speed and (b) memory footprint.

    Image reconstruction in PACT can be greatly accelerated by GPUs[42,152,311]. For example, Wang et al. developed a comprehensive GPU-based framework for PACT image reconstruction that achieved an 871-fold speedup over a central processing unit (CPU)-based implementation when reconstructing extremely large-scale 3D images[152]. Liu et al. reported an ultrafast GPU-based FBP implementation for PACT image reconstruction[311], which requires only 0.38 ms to reconstruct a 2D image with a size of 512 pixel × 512 pixel and 6.15 ms to reconstruct a 3D image with a size of 160 voxel × 160 voxel × 160 voxel. Wang et al. proposed GPU-based parallelization strategies to accelerate the FBP algorithm and a penalized least-square and interpolation-based IR (PLS-Int-IR) algorithm[42], improving the image reconstruction speeds of FBP and PLS-Int-IR by factors of 1000 and 125, respectively. Although most image reconstruction algorithms can, in principle, be accelerated by GPUs, the speedup ratios vary. Back-projection-type algorithms, such as DAS and FBP, are inherently parallel and thus can achieve high speedup ratios. In contrast, TR and IR algorithms either need to compute an entire photoacoustic field [Eqs. (69)–(73)] or need to update the image reconstruction model (Fig. 20) step by step, and thus offer limited acceleration.
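The pixel-wise independence that makes back-projection-type algorithms easy to parallelize is visible even in a naive NumPy DAS sketch, in which every pixel's delay-and-sum is computed without reference to any other pixel (a minimal illustration with nearest-sample delays; the function name and parameters are illustrative):

```python
import numpy as np

def das_reconstruct(signals, det_xy, grid_xy, c=1500.0, fs=40e6):
    """Naive delay-and-sum: every pixel is independent, which is why
    back-projection-type algorithms map so well onto GPUs.
    signals: (M, K) channel data, det_xy: (M, 2) detector positions,
    grid_xy: (N, 2) pixel positions; positions in meters, c in m/s."""
    M, K = signals.shape
    # (N, M) pixel-to-detector distances, converted to sample indices
    dist = np.linalg.norm(grid_xy[:, None, :] - det_xy[None, :, :], axis=2)
    idx = np.clip(np.round(dist / c * fs).astype(int), 0, K - 1)
    # Gather the delayed sample from each channel and sum over detectors
    return signals[np.arange(M)[None, :], idx].sum(axis=1)
```

On a GPU, the same gather-and-sum maps each pixel to one thread; no inter-pixel communication is needed, unlike the stepwise field updates of TR or the iterative model updates of IR.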

    5.2.2 Memory footprint

    Memory footprint is also a critical indicator for measuring the performance of an algorithm. Generally, the memory footprint of an image reconstruction algorithm in PACT depends on the size of the image to be reconstructed Nx×Ny×Nz and the size of the measured photoacoustic signals M×K (M: detector number, K: signal sampling length). For DAS, FBP, SE, and TR algorithms, the memory footprint can be approximately estimated as

    Memory1 ∝ NxNyNz + MK. (125)

    Similarly, the memory footprint of IR algorithms can be approximately estimated as

    Memory2 ∝ NxNyNzMK + MK, (126)

    where NxNyNzMK is the size of the system matrix A in Eq. (86).

    According to Eqs. (125)–(126), back-projection-type algorithms, such as DAS and FBP, require the least memory, while IR algorithms require the most. Moreover, back-projection-type algorithms can reconstruct only a portion of an imaging region (i.e., the ROI) rather than the whole region, which further reduces the memory needed. IR algorithms can also directly reconstruct ROIs, but their memory footprint remains high due to the extremely large size of the system matrix A, which is sparse and can be compactly represented to reduce the memory footprint[36]. In contrast to conventional image reconstruction algorithms, the memory footprint of a DL algorithm depends on its network structure. In general, the memory footprint of non-iterative DL algorithms is greater than that of conventional non-iterative algorithms but less than that of IR algorithms. The memory footprint of learned IR algorithms is comparable to that of conventional IR algorithms and is also very high. Figure 51(b) shows a qualitative comparison of the memory footprints of different image reconstruction algorithms.
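The estimates of Eqs. (125)–(126) can be turned into a back-of-the-envelope calculator (a sketch assuming single-precision storage; the dense system-matrix figure is an upper bound, since A is sparse in practice):

```python
def memory_estimate_gb(nx, ny, nz, m, k, bytes_per=4):
    """Rough memory footprints in GB: non-iterative algorithms store the
    image plus the channel data; IR additionally stores the (here assumed
    dense) system matrix A of size (nx*ny*nz) x (m*k)."""
    to_gb = bytes_per / 1024**3
    non_iterative = (nx * ny * nz + m * k) * to_gb
    ir_dense = (nx * ny * nz * m * k + m * k) * to_gb
    return non_iterative, ir_dense
```

For example, a 256×256×256 volume measured with 512 detectors and 2048 samples per channel yields a non-iterative footprint of tens of megabytes, whereas a dense A would occupy tens of thousands of gigabytes, which is why A must be stored sparsely or applied matrix-free in practice.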

    6 Challenges and Discussion

    The six classes of algorithms reviewed in this paper, namely, DAS, FBP, SE, TR, IR, and DL, can achieve high-quality image reconstruction under ideal imaging conditions. However, they may face challenges under non-ideal imaging conditions, especially in the following scenarios: (1) the detectors used for imaging have limited bandwidths and finite aperture sizes; (2) the detector array has sparse elements and a limited view angle; and (3) the acoustic media are strongly heterogeneous.

    The bandwidth and aperture size of an individual ultrasound detector in a detector array are important for high-quality photoacoustic image reconstruction. An ideal detector should have infinite bandwidth and a point-like aperture to respond to all frequency contents of a photoacoustic signal from all directions. However, practical ultrasound detectors always have limited bandwidths and finite aperture sizes, which distort measured photoacoustic signals and blur reconstructed images. Researchers typically first measure the impulse response of a detector and then couple it into IR algorithms[37,39] or use deconvolution approaches[291,292,312] to remove the impacts of detector bandwidth and aperture size. However, these methods require solving optimization problems that are typically slow, ill-posed, and sensitive to noise. How to build more efficient models to compensate for the impact of detector bandwidth and aperture size is a subject worth studying.
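As one example of the deconvolution route, a measured channel can be compensated for the detector's electrical impulse response with a frequency-domain Wiener filter (a minimal sketch; the noise-to-signal ratio `nsr` is an assumed regularization constant, and the function name is illustrative):

```python
import numpy as np

def wiener_deconvolve(signal, eir, nsr=1e-2):
    """Frequency-domain Wiener deconvolution of a measured channel with the
    detector's electrical impulse response (EIR); `nsr` is an assumed
    noise-to-signal power ratio that regularizes the spectral division."""
    n = len(signal)
    S = np.fft.rfft(signal, n)
    H = np.fft.rfft(eir, n)
    W = np.conj(H) / (np.abs(H) ** 2 + nsr)   # Wiener filter
    return np.fft.irfft(S * W, n)
```

The regularization term illustrates the ill-posedness mentioned above: where the EIR spectrum is weak, a plain inverse filter would amplify noise unboundedly, so `nsr` trades restoration accuracy against noise sensitivity.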

    The element number and view angle of a detector array are important for high-quality photoacoustic image reconstruction. Ideally, the element number of a detector array should be large enough to satisfy the spatial Nyquist sampling criterion for complete sampling, and the view angle should be 4π steradians (full 3D view) to record complete photoacoustic signals in 3D space. However, practical detector arrays, such as linear and planar arrays, typically have a limited number of elements and a limited view angle, which results in the challenging problem of image reconstruction from sparse-view and limited-view projections. IR and DL are two classes of algorithms that are commonly used to address image reconstruction problems in these scenarios[211,276,278,313]. However, these methods typically involve intensive computation and/or require huge amounts of training data that are difficult to obtain in practice. How to develop more accurate and efficient algorithms for sparse-view and limited-view imaging is also worth studying.
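For intuition on the required element count, the spatial Nyquist criterion (element pitch no larger than half the shortest acoustic wavelength) can be evaluated for a full-ring array (a minimal sketch; the radius and maximum frequency below are illustrative values):

```python
import math

def ring_array_elements(radius_mm, f_max_mhz, c=1.5):
    """Minimum element count for a full-ring array so that the pitch along
    the ring does not exceed half the shortest acoustic wavelength
    (spatial Nyquist). c in mm/us, f_max in MHz, radius in mm."""
    lam_min = c / f_max_mhz            # shortest wavelength, mm
    pitch_max = lam_min / 2.0          # half-wavelength pitch limit
    return math.ceil(2 * math.pi * radius_mm / pitch_max)

# e.g., a 40 mm-radius ring with 5 MHz maximum frequency already
# calls for well over a thousand elements.
```

Practical arrays with a few hundred elements therefore undersample such fields, which is exactly the sparse-view regime discussed above.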

    The acoustic homogeneity of a medium is also important for high-quality photoacoustic image reconstruction. Most current image reconstruction algorithms used in PACT assume that the acoustic media are homogeneous. However, this does not hold in biological tissues, especially when strong heterogeneities, such as bones and air cavities, are present[307]. Current methods for image reconstruction in heterogeneous tissues include half-time[314] or partial-time[315] reconstruction, autofocus reconstruction by optimizing the SOS[316], joint reconstruction of optical absorption and SOS[317], and ultrasound-guided adaptive reconstruction[296]. However, these methods can only mitigate the acoustic heterogeneity problem to a certain extent and offer limited improvements in reconstruction accuracy, especially in complex imaging scenarios such as transcranial imaging. An alternative is full-waveform-based algorithms, such as TR[170] and IR[41], which are based on solutions to exact photoacoustic wave equations. These algorithms can account for the acoustic properties of the media, such as SOS, absorption, dispersion, and density, and achieve accurate image reconstruction in acoustically heterogeneous media. However, they have not been widely used thus far, especially in practical applications. The main reason is that they require prior information on the media, such as SOS maps, which can be measured using other imaging techniques such as ultrasound CT[318–320] but at the expense of introducing additional imaging facilities and computing resources. Therefore, how to better tackle the acoustic heterogeneity problem and achieve high-quality image reconstruction deserves further investigation.

    In addition to image reconstruction quality, image reconstruction speed is also important for an algorithm. Currently, many algorithms such as DAS, FBP, SE, and non-iterative DL algorithms can achieve real-time reconstruction when accelerated with GPUs[33,104,152,279,311], which, however, is a great challenge for IR-based algorithms. The IR algorithms in PACT need to compute a large-scale system matrix A and perform image reconstruction iteratively. Therefore, they require huge amounts of computational resources, which makes real-time reconstruction difficult. This is especially true for 3D IR reconstruction, in which the computational time is typically on the order of hours[42]. DL-based IR reconstruction has been reported to help reduce the number of iterations[45] and improve computational efficiency[57]. Nevertheless, developing new strategies to improve the computational speed of IR algorithms and reduce the memory footprint still has important practical significance.

    DL-based image reconstruction has superior performance over conventional image reconstruction algorithms and is an emerging technique in PACT. It has been used to solve a variety of non-ideal image reconstruction problems in PACT, such as detector bandwidth expansion and IR acceleration. Nonetheless, DL-based image reconstruction faces multiple challenges. First, the performance of DL methods heavily relies on the quality and quantity of training data. The training data are usually required to be paired, and the acquisition of large amounts of paired data is difficult in practice. Training neural networks using simulated datasets and/or small-scale experimental datasets may limit the performance of the networks in terms of reconstruction accuracy, robustness, and generalizability in practical applications. Second, reference images such as full-bandwidth photoacoustic images for network training in DL methods are often difficult to obtain. In these situations, networks can only be trained using simulated datasets. Since experimental conditions are typically much more complex than simulations, training a network using only simulated datasets leads to degraded reconstruction performance and poor robustness. Third, there are currently few publicly available datasets for the comparison of the performances of different DL methods. Using private datasets to compare different DL methods may lead to biased conclusions. Finally, DL-based image reconstruction methods in PACT lack interpretability, which is also a long-standing problem in the field of DL. 
In response to the above issues, it is necessary to research the following aspects: (1) building large-scale publicly available datasets for the construction of networks with good generalization properties and performance comparisons of different networks; (2) developing unsupervised DL-based reconstruction methods that do not need paired data for training; (3) developing more powerful simulation platforms to generate reference data closely resembling real experimental data[321]; and (4) developing physics-informed networks to improve the interpretability of DL methods.

    One important goal of image reconstruction from photoacoustic projections is disease diagnosis. The current diagnostic route from photoacoustic signals to photoacoustic images to knowledge is straightforward but can potentially lead to information loss and distortion during image reconstruction, which can decrease the accuracy of disease diagnosis. Artificial intelligence (AI) can mine knowledge from vast amounts of data and offer opportunities for disease diagnosis directly from raw data[322]. The use of raw-data-based diagnostic technology instead of image-based diagnostic technology may facilitate the development of fully automated scanning and diagnostic procedures and become an important direction in the future.

    This review mainly discusses advanced image reconstruction algorithms in PAT for high-performance imaging. In addition to advanced algorithms, emerging technologies such as metamaterials and nanomaterials can be leveraged to enhance the imaging performance of PAT. For example, metamaterials can be appropriately designed to influence the propagation of electromagnetic waves or acoustic waves in a manner not observed in bulk materials. By tailoring the amplitude, phase, or polarization of light, optical metamaterials can emulate conventional optical lenses, waveplates, or holograms to achieve multidimensional light-field modulation in PAT[323329]. Metamaterials can also be designed for photoacoustic signal sensing to address the sensitivity and bandwidth problem of conventional piezoelectric sensors[330333] and provide a sensitive method for ultrasound detection in PAT.

    Nanomaterials that often exhibit unique optical, electronic, or physicochemical properties are another emerging technology that can be used to enhance the imaging performance of PAT. Nanoparticles are a common type of nanomaterial used as molecular contrast agents in PAT[334] and can be generally divided into two categories: inorganic nanoparticles and organic nanoparticles. Inorganic nanoparticles, such as noble metal nanoparticles, iron oxide nanoparticles, semiconductor nanoparticles, magnetic nanoparticles, and carbon nanoparticles, possess versatile properties and have been investigated for various applications in biomedical imaging and therapeutics[335,336]. Among them, noble metal nanoparticles are the most commonly used contrast agents in PAT due to their high optical absorption cross section and biocompatibility[337]. Gold nanoparticles, such as gold nanospheres, nanorods, nanoshells, and nanostars, are particularly popular because of their high extinction coefficient in the near-infrared range and their high photoacoustic conversion efficiency[334,338,339]. In addition to inorganic nanoparticles, organic nanoparticles such as polymer nanoparticles and encapsulations are also increasingly being used in PAT[334,340,341]. Polymer nanoparticles usually have a molar extinction coefficient lower than that of gold nanoparticles but higher than that of small-molecule dyes and have an absorption peak in the near-infrared range. They typically possess high structural and functional flexibility, high biodegradability and biocompatibility, and have a high translational potential. Furthermore, polymer nanoparticles usually have stable surfaces that can be modified with specific targeting and therapeutic moieties for molecular imaging and targeted therapy, which enables the imaging of biological events and functionalities at multiple scales.

    7 Conclusions

    In this work, we systematically review the image formation problem in PACT over the past three decades, including the forward problem, conventional reconstruction methods, DL-based reconstruction methods, and comparisons of different reconstruction methods.

    The photoacoustic forward problem involves multiple physical processes, including photoacoustic signal generation, signal propagation, and signal detection. The generation of photoacoustic signals is based on the thermoelastic effect and should satisfy the conditions of thermal confinement and stress confinement. The propagation of photoacoustic signals is governed by the photoacoustic wave equation, which can be analytically solved by the method of Green’s functions and numerically visualized by the k-Wave toolbox. The detection of photoacoustic signals typically involves the use of ultrasound detector arrays, whose properties such as detector bandwidth, aperture size, element number, and view angle impact detected photoacoustic signals and final images. The photoacoustic forward problem can be described by the spherical Radon transform. The task of image reconstruction in PACT can be regarded as finding the inverse of the spherical Radon transform.

    Conventional reconstruction methods in PACT typically achieve image formation via analytical derivations or numerical computations and mainly include five classes of algorithms, i.e., DAS, FBP, SE, TR, and IR. The DAS-type algorithms reconstruct an image by summing the delayed photoacoustic signals of each detector and have many variants, such as native DAS, DMAS, SLSC, and MV. The DAS-type algorithms are simple in principle and fast but lack mathematical rigor. FBP-type algorithms reconstruct an image by back-projecting the filtered photoacoustic signals of each detector to the image domain and can achieve accurate image reconstruction under ideal conditions. They are rigorously derived from the photoacoustic wave equation and can be regarded as advanced versions of DAS. The SE-type algorithms reconstruct an image by representing it using a mathematical series. They are computationally efficient and extremely fast for special detection geometries such as the planar geometry. The TR-type algorithms reconstruct an image by running a forward numerical acoustic propagation model backward and can be used for any closed detection geometry. They are well suited for image reconstruction in heterogeneous media since the acoustic propagation model can incorporate acoustic properties of a medium, such as SOS, density, dispersion, and absorption. The IR-type algorithms reconstruct an image by minimizing the energy error between measured projection data and estimated projection data. They can incorporate various physical factors into reconstruction models, suppress image artifacts caused by incomplete projections, and are well suited for non-ideal imaging scenarios.

    DL achieves tomographic image reconstruction in PACT by learning the photoacoustic imaging model from big data using designed networks and is thus network-based, data-driven, and learning-oriented. DL techniques can be applied to photoacoustic image reconstruction in multiple ways, including signal preprocessing in the data domain, image postprocessing in the image domain, hybrid-domain processing, direct signal-to-image reconstruction, and learned IR reconstruction. DL-based signal preprocessing employs a network to enhance raw photoacoustic projection data in the data domain and delivers the enhanced projection data to conventional reconstruction algorithms as input. DL-based image postprocessing employs a network to suppress artifacts and noise in the photoacoustic images output by conventional reconstruction algorithms. DL-based hybrid-domain processing employs a network to make full use of the information in both the raw projection data and the reconstructed images and can yield images with enhanced quality. In contrast, DL-based direct reconstruction employs a network to form photoacoustic images directly from raw projection data and does not involve any conventional image reconstruction algorithm. Training such a direct transformation network, however, requires large-scale datasets. DL-based IR reconstruction employs a network to learn the reconstruction process of conventional IR algorithms and can also achieve direct signal-to-image reconstruction. This method typically has better robustness and generalizability but consumes more time and memory than other DL methods.

    The image reconstruction algorithms discussed in this review have distinct characteristics. In terms of image reconstruction quality, most image reconstruction algorithms work well under ideal imaging conditions and can yield high-quality images. However, they behave differently under non-ideal imaging scenarios, such as sparse-view imaging, limited-view imaging, imaging with bandwidth-limited and/or finite-aperture detectors, and imaging in the presence of acoustic heterogeneity. Generally, the DAS, FBP, and SE algorithms have relatively poor performance in these scenarios. The TR and IR algorithms can yield improved results because they can incorporate the physical models of an imaging system, such as the SOS of media, the EIR, and the SIR of a detector. The DL algorithms can produce surprisingly good results that are unattainable for conventional algorithms due to their network-based data-driven nature. In terms of image reconstruction speed and memory footprint, DAS, FBP, SE, TR, and non-iterative DL algorithms are generally faster than IR and iterative DL methods. TR, IR, and iterative DL algorithms generally require more memory than DAS, FBP, and SE methods.

    This review is expected to help general readers better understand the image formation problem in PACT, provide a self-contained reference guide for beginners as well as specialists, and facilitate further developments and applications of novel image reconstruction algorithms.

    Chao Tian, Kang Shen, Wende Dong, Fei Gao, Kun Wang, Jiao Li, Songde Liu, Ting Feng, Chengbo Liu, Changhui Li, Meng Yang, Sheng Wang, Jie Tian, "Image reconstruction from photoacoustic projections," Photon. Insights 3, R06 (2024)

    Paper Information

    Category: Review Article

    Received: Jul. 8, 2024

    Accepted: Aug. 28, 2024

    Published Online: Sep. 29, 2024

    The Author Email: Yang Meng (yangmeng_pumch@126.com), Wang Sheng (iamsheng2020@ustc.edu.cn), Tian Jie (tian@ieee.org)

    DOI:10.3788/PI.2024.R06
