Image reconstruction from photoacoustic projections

Chao Tian; Kang Shen; Wende Dong; Fei Gao; Kun Wang; Jiao Li; Songde Liu; Ting Feng; Chengbo Liu; Changhui Li; Meng Yang; Sheng Wang; Jie Tian

doi:10.3788/PI.2024.R06

1 Introduction

Photoacoustic tomography (PAT), also referred to as opto-acoustic tomography, is a cross-sectional or three-dimensional (3D) biomedical imaging technique. The physical foundation for PAT is the photoacoustic effect discovered in 1880 by Alexander Graham Bell^[1], who found that modulated light can excite sound waves in materials. Modern PAT typically uses a short-pulsed laser to illuminate biological tissues. The laser energy is absorbed by tissue chromophores and converted into heat. Under the conditions of stress confinement and thermal confinement, acoustic signals are generated due to the thermoelastic effect and are recorded by ultrasound detectors to recover the optical absorption property of tissues^[2–5]. By combining optical excitation and acoustic detection, PAT has the advantages of rich optical contrast and high ultrasonic resolution in deep tissues and has found unique applications in a range of biomedical fields^[6–9].

PAT has three major embodiments: photoacoustic computed tomography (PACT), photoacoustic microscopy (PAM), and photoacoustic endoscopy (PAE). Among them, PACT uses diffused light to illuminate biological tissues and can achieve hundred-micron-level imaging resolution at centimeter-level tissue depth^[10–13]. Analogous to X-ray computed tomography, high-quality image formation in PACT relies on sophisticated image reconstruction procedures, without which the details in an image may not be able to be properly resolved. In 1981, Bowen studied thermoacoustic imaging of soft tissues using ionizing radiation (high-energy electrons, X-ray photons, neutrons, and other charged particles) and non-ionizing radiation (radio waves, microwaves, and ultrasonic waves)^[14–16] and presented one-dimensional (1D) depth-resolved thermoacoustic signals from a human upper arm using radio-wave excitation. However, the results are only 1D depth-resolved and have no lateral resolution. In the mid-1990s, Oraevsky^[17–19] and Kruger^[20,21] independently studied laser-induced photoacoustic imaging of biological tissues and presented experimental 1D signals and two-dimensional (2D) scan images (see Fig. 1). The results are depth-resolved but poorly laterally resolved due to the lack of image reconstruction procedures. To obtain cross-sectional or 3D images resolved both axially and laterally, Kruger et al. in 1995 developed an approximate filtered back projection (FBP) image reconstruction algorithm for laser-induced PACT analogous to the FBP algorithm used in X-ray computed tomography^[22]. The results suggest that the approximate FBP algorithm helps resolve imaging targets both in the axial and lateral directions (see Fig. 1).

Figure 1.Key events in the development of PACT image reconstruction algorithms.

Download full size

View all figures

High-performance image reconstruction is critical to image formation in modern PACT imaging systems. Ever since the publication of the approximate FBP algorithm by Kruger and colleagues, several novel image reconstruction approaches have been proposed to reconstruct high-quality PACT images. According to whether the emerging deep learning technique is involved, image reconstruction approaches in PACT can be divided into two main categories: conventional image reconstruction methods and deep learning (DL)-based image reconstruction methods. Conventional image reconstruction methods do not involve deep learning and mainly include five classes of algorithms: FBP, delay and sum (DAS), series expansion (SE), time reversal (TR), and iterative reconstruction (IR).

FBP is a class of analytical image reconstruction algorithms originating from X-ray computed tomography that project filtered photoacoustic signals back to target regions according to the time of flight (TOF) of photoacoustic signals. Kruger adopted this idea and proposed an approximate version of FBP for image reconstruction in PACT in 1995, as mentioned previously^[22]. However, this algorithm can only achieve approximate image reconstruction. To reduce reconstruction error, Finch et al. developed an exact FBP algorithm for image reconstruction in a spherical detection geometry in 2004^[23], and Xu and Wang developed a universal FBP algorithm for exact image reconstruction in infinite planar, cylindrical, and spherical detection geometries in 2005^[24]. FBP algorithms are simple yet efficient and have been widely used for image reconstruction in PACT.

DAS-based reconstruction is essentially a simplified version of the FBP algorithm, which directly projects measured photoacoustic signals back to target regions without filtering. It is one of the most widely used beamforming methods in ultrasound imaging and can potentially be adapted for image reconstruction in PACT. In 1998, Hoelen et al. first used a simple weighted DAS algorithm for image reconstruction in PACT and achieved good reconstruction results^[25,26]. Several modified algorithms were subsequently developed to improve the reconstruction performance of DAS^[27–30]. DAS-based reconstruction algorithms are typically fast and easy to implement. However, they are intuitive and empirical image reconstruction techniques and can only achieve approximate image reconstruction.

SE is a class of image reconstruction algorithms that use mathematical series to represent the image to be reconstructed. In 2001, Köstli et al. derived an exact SE-based inverse formula in the Fourier domain for image reconstruction in a planar detection geometry^[31]. Kunyansky took a step further and derived SE reconstruction formulas for image reconstruction in circular, spherical, cylindrical, and cubic geometries^[32,33]. Compared with other image reconstruction algorithms in PACT, SE algorithms may be computationally efficient for certain detection geometries (e.g., planar geometry) due to the application of the fast Fourier transform (FFT) algorithm during computation^[31].

TR algorithms recover photoacoustic images by running a forward numerical acoustic propagation model backward and re-transmitting the photoacoustic signals measured by each detector in a temporally reversed order. In 2004, Xu and Wang presented an analytical TR model for image reconstruction in arbitrary closed detection geometries^[34]. Burgholzer et al. developed a numerical TR algorithm for image reconstruction in arbitrary detection geometries based on the finite difference method^[35]. To improve computational efficiency and image quality, Treeby et al. further developed a numerical k-space pseudospectral TR algorithm for image reconstruction in heterogeneous media^[34]. The TR algorithms can couple the acoustic properties of media (e.g., SOS, density, dispersion, and absorption) and can be used for image reconstruction in arbitrary closed detection geometries. Therefore, they are regarded as the “least restrictive” image reconstruction algorithms in PACT.

The last class of conventional image reconstruction methods is IR. It solves the image reconstruction problem by iteratively seeking the optimal solution that minimizes the error between measured projection data and the estimate from constructed mathematical models. In 2002, Paltauf et al. proposed the first IR method to improve image reconstruction quality under non-ideal imaging conditions and achieved 3D image reconstruction with reduced artifacts^[36]. After that, much research was conducted by different groups to improve the performance of IR, such as improving the computational accuracy of the system matrix in the model^[37,38], compensating for the response of detectors^[39,40], coupling the acoustic property of the media^[41], and accelerating the reconstruction process^[42,43]. Compared with other algorithms, IR algorithms are typically slow but can yield high-quality imaging results when only limited projection data are available. Therefore, they are more suitable for non-ideal imaging scenarios, such as limited-view imaging, sparse-view imaging, and imaging in acoustically heterogeneous media.

The second major category of image reconstruction approaches in PACT is fast-developing DL-based methods, which are inspired by the successful use of DL in a range of fields. DL-based methods typically train neural networks and use them to automatically transform input data into output photoacoustic images. Compared with conventional image reconstruction approaches, DL methods are more efficient and can handle more complicated scenarios. In 2018, Antholzer et al. first proposed a deep convolutional neural network (CNN)-based method for PACT image reconstruction under sparse-view sampling and opened new opportunities for intelligent image reconstruction in PACT^[44]. After that, a series of DL-based methods was developed for image reconstruction using both simulated and experimental datasets^[45–47]. State-of-the-art DL methods are powerful enough to achieve preprocessing in the data domain, postprocessing in the image domain, hybrid processing in both the data and image domains, learned IR, and direct image reconstruction from the data domain to the image domain^[48–50]. They have been used to address a range of image reconstruction problems in PACT, such as detector bandwidth expansion^[51,52], resolution enhancement^[53,54], artifact removal^[55], ultralow-laser-energy imaging^[56], reconstruction acceleration^[57], and reconstruction enhancement in sparse-view and limited-view imaging^{[44,46,47,58,59]}.

Over the past three decades, great achievements have been made in PACT image reconstruction. To date, there have been a few excellent reviews devoted to summarizing the research progress in this field^[60–63]. However, most review papers in the literature are from more than 10 years ago and cannot reflect the latest research advances^[60–62]. Moreover, some review papers cover only particular subjects of the field, such as IR reconstruction^[63,64] or DL-based reconstruction^{[48–50,65–68]}. There is a pressing need to prepare new work to systematically review the recent achievements in this field. For the above reasons, we prepared this review, which differs from other works in the following aspects. First, the review was prepared from a historical perspective. We surveyed the development of each type of reconstruction algorithm and put them into the context of the entire history of PACT. Second, the review is comprehensive. It not only contains the five classes of conventional reconstruction algorithms (FBP, DAS, SE, TR, and IR) but also covers the state-of-the-art DL-based reconstruction algorithms. Third, the review contains comparative studies. A dedicated part (Sec. 5) was prepared to compare the performance of each type of image reconstruction algorithm in terms of image reconstruction quality, reconstruction speed, and memory footprint. Finally, the review is expected to be beginner-friendly. The entire review contains 51 figures, 14 tables, 126 mathematical equations, and comparative studies of each algorithm and is easy for novices to understand.

The remainder of this review is organized as follows. Section 2 describes the forward problem of PACT, including photoacoustic signal generation, propagation, detection, and the mathematical foundation (i.e., the Radon transform) for image reconstruction. Section 3 reviews the basic principles of the five classes of conventional image reconstruction algorithms, namely, FBP, DAS, SE, TR, and IR. Section 4 surveys the DL-based image reconstruction methods from the perspective of preprocessing, postprocessing, hybrid processing, learned IR, and direct image reconstruction. Section 5 provides comparative studies of major image reconstruction algorithms in PACT. Section 6 highlights the major challenges and future directions for PACT image reconstruction. Finally, Sec. 7 offers concluding remarks. Figure 2, Table 1, and Table 2 list major topics, abbreviations, and symbols used in this review, respectively.

Figure 2.Major topics discussed in this review.

Download full size

View all figures

Table 1. Abbreviations Used in this Review

View table

View all Tables

Table 1. Abbreviations Used in this Review


Abbreviation	Meaning	Abbreviation	Meaning
1D	One-dimensional	2D	Two-dimensional
3D	Three-dimensional	CF	Coherence factor
CNN	Convolutional neural network	CNR	Contrast-to-noise ratio
CT	Computed tomography	DAS	Delay and sum
DMAS	Delay multiply and sum	DL	Deep learning
EIR	Electrical impulse response	FBP	Filtered back projection
FFT	Fast Fourier transform	GPU	Graphics processing unit
IR	Iterative reconstruction	MV	Minimum variance
PACT	Photoacoustic computed tomography	PAT	Photoacoustic tomography
ROI	Region of interest	SE	Series expansion
SIR	Spatial impulse response	SLSC	Short-lag spatial coherence
SNR	Signal-to-noise ratio	SOS	Speed of sound
TOF	Time of flight	TR	Time reversal

Table 2. Symbols Used in this Review

View table

View all Tables

Table 2. Symbols Used in this Review


Symbol	Meaning	Symbol	Meaning
Lowercase English letters
$b (r_{d}, t)$	Back-projection term	$b (r)$	The Kaiser–Bessel function
$d_{c}$	Characteristic size of the heated region	$d σ$	Element of a detection surface $S$
$d Ω$	Solid angle subtended by $d σ$	$f$	Frequency
$g (s, θ)$	Radon transform	$h_{EIR}$	EIR of a detector
${\tilde{h}}_{EIR}$	Fourier transform of the EIR of a detector	$h_{SIR}$	SIR of a detector
${\tilde{h}}_{SIR}$	Fourier transform of the SIR of a detector	$i$	Imaginary unit
$j_{n}$	The spherical Bessel function of the first kind of order $n$	$k_{x}$ , $k_{y}$ , $k_{z}$	Spatial wavenumbers in the $x$ , $y$ , and $z$ directions
$n$	A general variable	$p_{0} (x, y, z)$	Initial photoacoustic pressure (image to be reconstructed)
$p (r, t)$	Photoacoustic signal at position $r$ and time $t$	$p (r_{d}, t)$	Real photoacoustic signal measured by a detector
$p_{ideal} (r_{d}, t)$	Ideal photoacoustic signal measured by a detector	$s_{i} (t)$	Photoacoustic signal measured by the $i$ th detector at time $t$
$t$	Time	$u_{k} (r)$	Normalized eigenfunctions of the Dirichlet Laplacian
$v_{0}$	Sound speed
Uppercase English letters
$C_{p}$	Specific heat capacity at constant pressure	$C_{v}$	Specific heat capacity at constant volume
$F$	Optical fluence	$G (r_{d}, t; r_{s}, t_{s})$	The Green’s function
$H$	Heat deposited per unit volume	$H_{\| k \|}^{(1)}$	The Hankel function of the first kind of order $k$
$I_{n}$	The modified Bessel function of the first kind of order $n$	$J_{\| k \|}$	The Bessel function of the first kind of order $\| k \|$
$K$	Sampling length	$M$	Total number of detectors
$N$	Total number of image grid points	$N_{x}$ , $N_{y}$ , $N_{z}$	Numbers of image grids along the $x$ , $y$ , and $z$ axes
$P_{0} (k_{x}, k_{y}, k_{z})$	Spatial Fourier transform of $p_{0} (x, y, z)$	$P (r_{d}, ω)$	Temporal Fourier transform of $p (r_{d}, t)$
$P_{Ω}$	Poisson operator of harmonic extension	$R (x)$	Regularization
$S$	A detection surface	$S_{DAS}$	Image reconstructed by DAS
$T$	Temperature	$V$	Volume
$W (ω)$	Window function in the frequency domain
Lowercase Greek letters
$α_{0}$	Acoustic absorption coefficient	$α_{th}$	Thermal diffusivity
$β$	Thermal coefficient of volume expansion	$δ$	Dirac delta function
$ϕ (r_{d}, t)$	Velocity potential	$η_{th}$	Photothermal conversion efficiency
$η (r)$	Dispersion proportionality coefficient	$κ$	Isothermal compressibility
$λ_{m}^{2}$	Eigenvalues of the Dirichlet Laplacian	$μ_{a}$	Optical absorption coefficient
$ρ (r)$	Distribution of mass density	$ρ (r, t)$	Acoustic density
$ρ_{0} (r)$	Ambient density	$τ$	Laser pulse duration
$τ_{th}$	Thermal relaxation time	$τ_{s}$	Stress relaxation time
$τ (r)$	Absorption proportionality coefficient	$ω$	Angular frequency
$ψ (r)$	Expansion function
Uppercase Greek letters
$Φ_{λ_{k}}$	Free-space rotationally invariant Green’s function	$Γ$	Grüneisen parameter
$Ω$	Solid angle of a detection surface or domain defined by a detection surface	$Δ f$	Frequency sampling interval
$Δ p$	Change in pressure	$Δ t$	Temporal sampling interval
$Δ T$	Change in temperature	$Δ V$	Change in volume
Vectors or matrices
$n_{d}$	Unit normal vector of a detector surface pointing to a photoacoustic source	$p$	Photoacoustic signal in matrix form
$r = (x, y, z)$	Rectangular coordinates in space	$r = (R, φ, θ$ )	Spherical coordinates in space
$r_{d}$	Detector position	$r_{s}$	Photoacoustic source position
$u (r, t)$	Particle velocity	$x$	Photoacoustic image in matrix form
$A$	System matrix	$A^{†}$	Pseudo-inverse matrix of $A$
$A^{*}$	Adjoint matrix of $A$	$\tilde{A}$	The Fourier transform of $A$
$A^{H}$	Conjugate transpose of $A$	$A^{T}$	Transpose of $A$
$D$	Differential matrix	$\tilde{E}$	Fourier transform of the EIR of a detector
$G$	Spherical Radon transformation matrix	$I$	Identity matrix
$P$	The Fourier transform of $p$	$R$	Covariance matrix
Others symbols
$A$	Forward acoustic propagation operator	$A_{modify}^{TR}$	Modified TR operator
$F$	The Fourier transform	$F^{- 1}$	The inverse Fourier transform
$H (r, t)$	Heating function	$\nabla$	Nabla operator

It is worth pointing out that image reconstruction in PACT involves two aspects: acoustic inversion for determining the distribution of initial acoustic pressure and optical inversion for determining the map of optical absorption^[61,62,69]. This review focuses on the acoustic inversion problem, which is important for image formation in PACT. Moreover, the mathematical foundation for image reconstruction in PACT is equivalent to that in thermoacoustic tomography^[70–73]. Therefore, the image reconstruction algorithms discussed in this review can in principle be used for thermoacoustic tomography.

2 The Forward Problem

2.1 Photoacoustic Signal Generation

To achieve efficient photoacoustic signal excitation and generation, two conditions, i.e., thermal confinement and stress confinement, should be satisfied. The two conditions are related to two timescales, i.e., the thermal relaxation time $τ_{th}$ and the stress relaxation time $τ_{s}$ . The thermal relaxation time $τ_{th}$ characterizes the diffusion of heat from the region heated by laser pulses, while the stress relaxation time $τ_{s}$ describes the propagation of acoustic waves from the heated region. $τ_{th}$ is given by^[74] $τ_{th} = \frac{d_{c}^{2}}{α_{th}},$ (1)where $d_{c}$ is the characteristic dimension of the heated region and $α_{th}$ is the thermal diffusivity ( $m^{2} / s$ ). $τ_{s}$ is given by^[74] $τ_{s} = \frac{d_{c}}{v_{0}},$ (2)where $v_{0}$ is the speed of sound (SOS). If the laser pulse duration is less than the thermal relaxation time, i.e., $τ < τ_{th}$ , the thermal diffusion is negligible during laser heating, in which case the thermal confinement condition can be considered to be satisfied. Similarly, if the laser pulse duration is less than the stress relaxation time, i.e., $τ < τ_{s}$ , the pressure propagation can be ignored during pulse heating, in which case the stress confinement condition can be considered to be satisfied. In PACT, the thermal and stress confinement conditions should be satisfied simultaneously. For instance, assuming that in soft tissues, the SOS is 1500 m/s, the thermal diffusivity is $0.14 {mm}^{2} / s$ , and the dimension of the heated region is 50 µm, the thermal relaxation time $τ_{th}$ is calculated to be 18 ms, and the stress relaxation time $τ_{s}$ is 33 ns. This indicates that the laser used for imaging should have a pulse duration of less than 33 ns to achieve efficient photoacoustic signal excitation and generation.

In PACT, the local fractional volume expansion $Δ V / V$ induced by the photoacoustic effect can be described as^[74] $\frac{Δ V}{V} = β Δ T - κ Δ p,$ (3)where $Δ T$ and $Δ p$ represent the changes in temperature and pressure, respectively, $β$ is the thermal coefficient of volume expansion, and $κ$ is the isothermal compressibility, which can be written as $κ = \frac{C_{p}}{ρ v_{0}^{2} C_{v}},$ (4)where $ρ$ is the mass density and $C_{p}$ and $C_{v}$ are the specific heat capacities at constant pressure and volume, respectively.

Under the conditions of thermal and stress confinement, the fractional volume expansion in Eq. (3) is negligible (i.e., $Δ V / V = 0$ ). The initial photoacoustic pressure $p_{0} = Δ p$ and can be written as $p_{0} = \frac{β Δ T}{κ},$ (5)where the temperature rise can be calculated by^[75] $Δ T = \frac{η_{th} H}{ρ C_{v}} .$ (6)

Here $η_{th}$ is the percentage of the laser energy converted into heat, and $H$ is the heat energy deposited per unit volume, which is defined as the product of the absorption coefficient $μ_{a}$ and the optical fluence $F$ (i.e., $H = μ_{a} F$ ). The initial acoustic pressure $p_{0}$ can thus be rewritten as $p_{0} = (\frac{β v_{0}^{2}}{C_{p}}) η_{th} H = Γ η_{th} H,$ (7)where $Γ$ is the Grüneisen parameter, a dimensionless constant representing the efficiency of the conversion of heat to pressure. For water at body temperature (37°C), $Γ \approx 0.2$ , which indicates that 20% of the thermal energy deposited by a laser in water couples into acoustic energy.

The generation of photoacoustic signals can be illustrated using a numerical example. Assume that the laser employed for imaging has a fluence of $10 mJ / {cm}^{2}$ ( $F = 10 mJ / {cm}^{2}$ ), which is within the American National Standards Institute (ANSI) safety limit^[76], and the biological tissue being imaged has the following physical parameters: $Γ = 0.2$ , $μ_{a} = 0.1 {cm}^{- 1}$ , $ρ = 1.0 g / {cm}^{3}$ , and $C_{v} = 4.0 {Jg}^{- 1} K^{- 1}$ . The factor $η_{th}$ can be set to 1 because the fluorescence and nonthermal absorption of biological tissues are typically weak. In this way, the temperature rise $Δ T$ is calculated to be 0.25 mK and the initial acoustic pressure $p_{0}$ is 200 Pa, which indicates that a 1 mK temperature rise can produce an acoustic pressure rise of 800 Pa.

2.2 Photoacoustic Signal Propagation

2.2.1 The photoacoustic wave equations

The generation and propagation of photoacoustic waves can be mathematically modeled by the photoacoustic wave equation^[77]. Generally, three first-order equations, including the linearized equation of motion, the linearized equation of continuity, and the thermal elastic equation, can be used to model the properties of acoustic wave generation and propagation^[78]. If the medium is lossless and the thermal conductivity can be ignored, the three equations can be written as^[79] $\frac{\partial}{\partial t} u (r, t) = - \frac{1}{ρ (r)} \nabla p (r, t),$ (8) $\nabla \cdot u (r, t) = - \frac{1}{ρ (r) v_{0}^{2} (r)} \frac{\partial}{\partial t} p (r, t) + β \frac{\partial}{\partial t} Δ T (r, t),$ (9) $ρ (r) C_{p} \frac{\partial}{\partial t} T (r, t) = H (r, t),$ (10)where $u (r, t)$ is the particle velocity, $v_{0} (r)$ is the SOS, $ρ (r)$ is the mass density, $p (r, t)$ is the acoustic pressure at position $r$ and time $t$ , $T (r, t)$ and $Δ T (r, t)$ are the temperature and temperature rise, respectively, and $H (r, t)$ is the heating function that represents the thermal energy deposited per unit volume and per unit time. The second-order photoacoustic wave equation can be obtained by eliminating the variable $u (r, t)$ from the three equations as^[78,79] ${ρ (r) \nabla \cdot [\frac{1}{ρ (r)} \nabla] - \frac{1}{v_{0}^{2} (r)} \frac{\partial^{2}}{\partial t^{2}}} p (r, t) = - \frac{β}{C_{p}} \frac{\partial H (r, t)}{\partial t} .$ (11)

Equation (11) describes the relationship between the acoustic pressure $p (r, t)$ and the photoacoustic source associated with the heating function $H (r, t)$ . The source term [right side of Eq. (11)] is proportional to the first derivative of the heating function $H (r, t)$ , which indicates that the heating function $H (r, t)$ should be time-varying to achieve efficient photoacoustic signal generation.

When the medium is acoustically homogeneous, the photoacoustic wave equation in Eq. (11) reduces to^[78,79] $(\nabla^{2} - \frac{1}{v_{0}^{2}} \frac{\partial^{2}}{\partial t^{2}}) p (r, t) = - \frac{β}{C_{p}} \frac{\partial H (r, t)}{\partial t} .$ (12)

2.2.2 Forward Green’s function solution

The forward solution for the wave equation in a homogeneous medium in Eq. (12) can be obtained using Green’s function^[75,80] as $p (r_{d}, t) = \frac{β}{C_{p}} \int_{- \infty}^{+ \infty} \int_{V} G (r_{d}, t; r_{s}, t_{s}) \frac{\partial H (r_{s}, t_{s})}{\partial t_{s}} d r_{s} d t_{s},$ (13)where $p (r_{d}, t)$ is the acoustic pressure detected at position $r_{d}$ and time $t$ and $r_{s}$ and $t_{s}$ represent the location and time of the photoacoustic source, respectively. The Green’s function $G (r_{d}, t; r_{s}, t_{s})$ in an infinite 3D space has the following form: $G (r_{d}, t; r_{s}, t_{s}) = \frac{δ (t - t_{s} - | r_{d} - r_{s} | / v_{0})}{4 π | r_{d} - r_{s} |},$ (14)where $δ$ is the Dirac delta function.

Under the condition of stress confinement, the heating function can be decomposed as $H (r_{s}, t_{s}) = H (r_{s}) δ (t_{s})$ , where $H (r_{s})$ is the heat deposited per unit volume. In this way, Eq. (13) becomes $p (r_{d}, t) = \frac{β}{C_{p}} \int_{- \infty}^{+ \infty} \int_{V} G (r_{d}, t; r_{s}, t_{s}) H (r_{s}) δ^{'} (t_{s}) d r_{s} d t_{s},$ (15)where $δ^{'}$ is the derivative of the Dirac delta function. Using the property $\int δ^{'} (t - t_{0}) f (t) d t = - f^{'} (t_{0})$ , Eq. (15) becomes $p (r_{d}, t) = \frac{β}{C_{p}} {\int_{V} H (r_{s}) \frac{\partial G (r_{d}, t; r_{s}, t_{s})}{\partial t} d r_{s} |}_{t_{s} = 0} .$ (16)

Applying the relation in Eq. (7) ( $η_{th}$ is set to 1), $p_{0} (r_{s}) = Γ H (r_{s}) = (\frac{β v_{0}^{2}}{C_{p}}) H (r_{s}),$ (17)and substituting the Green’s function [Eq. (14)] into Eq. (16), we obtain the general forward solution of the photoacoustic wave equation, i.e., $p (r_{d}, t) = \frac{1}{4 π v_{0}^{2}} \frac{\partial}{\partial t} \int_{V} \frac{p_{0} (r_{s})}{| r_{d} - r_{s} |} δ (t - \frac{| r_{d} - r_{s} |}{v_{0}}) d r_{s} .$ (18)

The measured pressure $p (r_{d}, t)$ is associated with the velocity potential $ϕ (r_{d}, t)$ via^[81] $p (r_{d}, t) = - ρ \frac{\partial ϕ (r_{d}, t)}{\partial t} .$ (19)

The velocity potential can be written as $ϕ (r_{d}, t) = - \frac{1}{4 π v_{0}^{2} ρ} \int_{V} \frac{p_{0} (r_{s})}{| r_{d} - r_{s} |} δ (t - \frac{| r_{d} - r_{s} |}{v_{0}}) d r_{s} .$ (20)

Equations (18) and (20) indicate that the measured pressure $p (r_{d}, t)$ and velocity potential $ϕ (r_{d}, t)$ are linearly related to the initial acoustic pressure $p_{0} (r_{s})$ and are inversely proportional to the distance between the detector and the source, i.e., $| r_{d} r_{s} |$ .

2.2.3 Photoacoustic signal of a spherical absorber

The concept of acoustic pressure and velocity potential can be illustrated through a simple example^[81]. Suppose that the photoacoustic source is a uniform sphere and that the detector is a point detector. We examine the velocity potential and acoustic pressure received by the point detector in two situations. In the first situation, the point detector is located at the center of the photoacoustic source, as shown in Fig. 3(a). The velocity potential can be obtained as [Eq. (20)] $ϕ (r_{d}, t) = {\begin{matrix} - \frac{p_{0} (r_{s})}{v_{0} ρ} t, & if v_{0} t \leq r_{a}, \\ 0, & if v_{0} t > r_{a}, \end{matrix}$ (21)where $r_{a}$ is the radius of the spherical source. In the second situation, the point detector is located outside the photoacoustic source, as shown in Fig. 3(d). The velocity potential can be calculated as $ϕ (r_{d}, t) = {\begin{matrix} - \frac{p_{0} (r_{s})}{4 v_{0} ρ d} [{(d - v_{0} t)}^{2} - r_{a}^{2}], & if d - r_{a} \leq v_{0} t \leq d + r_{a}, \\ 0, & otherwise, \end{matrix}$ (22)where $d$ is the distance between the point detector and the center of the photoacoustic source. The acoustic pressure $p (r_{d}, t)$ in these two cases can be accordingly calculated using the relationship in Eq. (19).

Figure 3.Velocity potential and acoustic pressure generated from a 4-mm-diameter spherical source. The first and second rows show the results when the detector is located at the center of the source and is 10 mm away from the center of the source, respectively. (a), (d) Schematic diagrams showing the point detector and the spherical source. (b), (e) Negative velocity potentials at the point detector. (c), (f) Corresponding acoustic pressures.

Download full size

View all figures

To quantify the velocity potential and acoustic pressure measured by the point detector, we make the following assumptions: the radius of the spherical source is 4 mm ( $r_{a} = 4 mm$ ), the mass density of the medium is $1000 kg / m^{3}$ ( $ρ = 1000 kg / m^{3}$ ), the SOS is 1480 m/s ( $v_{0} = 1480 m/s$ ), the Grüneisen parameter at room temperature is 0.12 ( $Γ = 0.12$ ), the optical absorption coefficient $μ_{a}$ is $100 {cm}^{- 1}$ ( $μ_{a} = 100 {cm}^{- 1}$ ), and the optical fluence is $10 mJ / {cm}^{2}$ ( $F = 10 mJ / {cm}^{2}$ ). Therefore, the heat energy $H$ deposited at time zero is $10^{6} J / m^{3}$ , and the initial acoustic pressure $p_{0} (r_{s})$ [Eq. (7)] is $1.2 \times 10^{5} Pa$ .

Figures 3(b) and 3(c) show the velocity potential and acoustic pressure measured by the point detector in the first case (the point detector is located at the center of the source). The results show that the negative velocity potential $- ϕ (r_{d}, t)$ measured by the point detector first linearly increases as the shell of the integration [Eq. (20)] increases with time and then suddenly drops to zero due to no energy deposition outside the source. Consequently, the acoustic pressure $p (r_{d}, t)$ (i.e., the first derivative of the velocity potential) is a non-zero constant at the beginning and becomes zero with a negative impulse caused by the sudden drop of velocity potential to zero. Figures 3(e) and 3(f) show the velocity potential and acoustic pressure measured by the point detector in the second case (the point detector is located 10 mm away from the center of the source). The results show that the negative velocity potential $- ϕ (r_{d}, t)$ measured by the point detector first increases as the incremental hemispherical shell of the integration [Eq. (20)] advances to the center of the source, and then decreases with time as the shell advances to the rear of the source. Consequently, the acoustic pressure $p (r_{d}, t)$ is initially positive, passes through zero, then becomes negative, and has an N-shaped waveform.

Based on the example in Fig. 3, we can also deduce that the size of a photoacoustic source impacts the characteristics of the photoacoustic signals received by a detector in the time domain. To illustrate this, Fig. 4(a) shows three time-domain photoacoustic signals measured by a point detector 3 mm away from the centers of three uniform spherical sources with diameters of 1 mm, 200 µm, and 50 µm. The time-domain photoacoustic signals all have an N-shaped waveform as expected. However, the amplitude and duration of the photoacoustic signals are distinct. The photoacoustic signals produced by sources with a smaller size have smaller amplitudes and shorter durations, which correspond to higher frequencies and broader bandwidths in the frequency domain, as shown in Fig. 4(b). The center frequency of the photoacoustic signals $f_{c}$ generated by a spherical source can be estimated by $f_{c} = v_{0} / 3 r_{a}$ , where $r_{a}$ is the radius of the spherical source^[82]. This indicates that detectors with a high center frequency should be employed for high-frequency imaging.

Figure 4.Signals of spherical photoacoustic sources with different sizes and their Fourier spectra. (a) Time-domain N-shaped photoacoustic signals generated from three spherical sources with diameters of 1 mm, 200 µm, and 50 µm. (b) Normalized Fourier spectra of the corresponding photoacoustic signals. Reprinted from Ref. [82] with permission.

Download full size

View all figures

2.2.4 Photoacoustic field visualization: the k-Wave toolbox

Numerical simulation of the forward propagation of photoacoustic fields helps visualize the sound fields in complex media and solve the acoustic inversion problem in PACT. Numerical simulation can be implemented using the powerful open-source k-Wave toolbox^[83], which was developed by the Photoacoustic Imaging Group at University College London (UCL). In the k-Wave toolbox, the propagation of photoacoustic fields is modeled by three coupled first-order partial differential equations^[84], which are fundamentally the same as the three equations in Eqs. (8 )–(10). The propagation model considers the acoustic properties of media, such as acoustic speed, dispersion, and absorption, and can characterize the acoustic propagation problem in heterogeneous media. The k-Wave toolbox solves the propagation model via a k-space pseudospectral method^[83,85,86], which can perform fast and accurate computations with reduced memory. Figure 5 is an example showing the propagation of the photoacoustic fields of a 2D disk and a 3D sphere using the k-Wave toolbox. The results visualize the instant characteristics of the photoacoustic fields during propagation.

Figure 5.Photoacoustic field visualization using the k-Wave toolbox. (a) Propagation of photoacoustic fields generated from a 2D disk (diameter: 2 mm). Red: positive pressure; blue: negative pressure. (b) Propagation of photoacoustic fields generated from a 3D sphere (diameter: 2 mm). For observation, negative pressure is not displayed in this case.

Download full size

View all figures

2.3 Photoacoustic Signal Detection

The photoacoustic signals propagating outward from a photoacoustic source need to be captured by ultrasound detectors for image formation. Ideally, for photoacoustic signal detection, detector arrays that can perfectly record original signals in time and space should be used. However, perfect detector arrays never exist in reality. The characteristics of a practical detector array, such as detector aperture, detector bandwidth, detector number, and view angle, impact detected photoacoustic signals and final images.

Aperture and bandwidth are two fundamental characteristics of an ultrasound detector and have important impacts on measured photoacoustic signals. An ideal ultrasound detector should have a point-like aperture, in which way it has an omnidirectional response. Moreover, an ideal ultrasound detector should also have an infinite bandwidth so that it can respond to all frequency contents of a signal. Neither of the two conditions, however, is attainable in practice. A practical ultrasound detector always has a finite aperture size and a finite bandwidth. The finite aperture averages photoacoustic signals in space, resulting in a smoothed spatial impulse response (SIR), while the finite bandwidth affects the conversion of photoacoustic signals to electrical signals, leading to a degraded electrical impulse response (EIR). Taking the SIR and EIR of an ultrasound detector into account, the photoacoustic signal measured by a practical ultrasound detector can be mathematically formulated as^[4] $p (r_{d}, t) = p_{ideal} (r_{d}, t) * h_{SIR} (r_{d}, r_{s}, t) * h_{EIR} (t),$ (23)where $p_{ideal} (r_{d}, t)$ is the ideal photoacoustic signal, $*$ represents temporal convolution, and $h_{SIR} (r_{d}, r_{s}, t)$ and $h_{EIR} (t)$ represent the SIR and EIR of a detector, respectively. The non-ideal SIR and EIR of an ultrasound detector distort photoacoustic signals measured in the time domain, as illustrated in Fig. 6, which eventually degrades the image quality of photoacoustic images.

$Effects of the SIR and EIR of a detector on photoacoustic signals. (a) Schematic diagram showing a photoacoustic source and a circular detector array. The photoacoustic source is a 1-mm-diameter sphere and is 10 mm away from the center of the detector array, which has a radius of 25 mm. (b) Effect of the finite aperture size (SIR) on the photoacoustic signals. In this case, the height and width of each detector are set to 10 and 5 mm, respectively. (c) Effect of finite bandwidth on the photoacoustic signals. In this case, the center frequency and fractional bandwidth of the detector are set to 1 MHz and 100%, respectively.$

Figure 6.Effects of the SIR and EIR of a detector on photoacoustic signals. (a) Schematic diagram showing a photoacoustic source and a circular detector array. The photoacoustic source is a 1-mm-diameter sphere and is 10 mm away from the center of the detector array, which has a radius of 25 mm. (b) Effect of the finite aperture size (SIR) on the photoacoustic signals. In this case, the height and width of each detector are set to 10 and 5 mm, respectively. (c) Effect of finite bandwidth on the photoacoustic signals. In this case, the center frequency and fractional bandwidth of the detector are set to 1 MHz and 100%, respectively.

Download full size

View all figures

In addition to the aperture and bandwidth of a detector, the detector number and view angle of a detector array also play important roles in photoacoustic signal detection. Ideally, the number of detectors in a detector array used for photoacoustic signal measurement should meet the spatial Nyquist sampling theorem for complete signal acquisition^[4]. The view angle of a detector array should be $4 π$ steradian (full 3D view) to record complete photoacoustic signals in 3D space. However, these two conditions are actually unattainable in practice due to the high fabrication cost of a large number of detectors and the configuration constraints of an imaging system (e.g., a separate space is required for laser illumination). Violating these conditions leads to the problems of image reconstruction from sparse-view and limited-view projections, which are mathematically challenging and will be discussed later in this review. Figure 7 showcases commonly used 2D and 3D detection geometries in PACT^[87,88], which have limited view angles except for the closed spherical array.

Figure 7.Commonly used detection geometries in PACT. (a) Linear array. (b) Curved array. (c) Circular array. (d) Planar array. (e) Cylindrical array. (f) Hemispherical array. (g) Closed spherical array. Reprinted from Refs. [87,88] with permission.

Download full size

View all figures

2.4 Radon Transform

The forward signal propagation and detection processes in PACT can be described by the well-known Radon transform^[89], which is the mathematical foundation of computed tomography (CT).

Before discussing the Radon transform in PACT, we first introduce the linear Radon transform in X-ray CT, which is defined as the integral of a function along a straight line. Specifically, the linear Radon transform in X-ray CT is written as^[90] $g (s, θ) = \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} f (x, y) δ (x \cos θ + y \sin θ - s) d x d y,$ (24)where $f (x, y)$ is the original function, $g (s, θ)$ is the sinogram or projection data of the function $f (x, y)$ along the straight line defined in the Dirac delta function $δ$ , and $(s, θ)$ are two parameters of the normal equation of the straight line in the delta function. The inverse Radon transform reverses the forward process and recovers the original function $f (x, y)$ from measured sinograms $g (s, θ)$ . A schematic representation of the linear Radon transform is shown in Fig. 8(a).

Figure 8.Forward and inverse Radon transforms in X-ray CT and PACT. (a) Linear Radon transform and its inverse in X-ray CT. (b), (c) Circular and spherical Radon transforms and their inverses in PACT.

Download full size

View all figures

In PACT and thermoacoustic tomography, the Radon transform integrates a function along a circle (2D) or a sphere (3D) instead of a straight line. The 2D circular Radon transform can be written as^[91] $g (s, θ) = \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} f (x, y) δ [s - \sqrt{{(x - r \cos θ)}^{2} + {(y - r \sin θ)}^{2}}] d x d y,$ (25)where $(r, θ)$ represents the position of the detector in a polar coordinate system and $s$ is the radius of the integral circle. Similarly, the 3D spherical Radon transform $g (s, θ)$ can be mathematically written as^[92] $g (s, θ) = \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} f (x, y, z) δ [s - \sqrt{{(x - r \sin φ \cos θ)}^{2} + {(y - r \sin φ \sin θ)}^{2} + {(z - r \cos φ)}^{2}}] d x d y d z,$ (26)where ( $r$ , $φ$ , $θ$ ) represents the position of the detector in a spherical coordinate system and $s$ is the radius of the spherical shell to be integrated. Schematic representations of the circular and spherical Radon transforms are shown in Figs. 8(b) and 8(c).

Using the definition in Eq. (26), the spherical Radon transform in PACT can be obtained from Eq. (18) and has the following form: $g (r_{d}, t) = \int p_{0} (r_{s}) δ (t - \frac{| r_{d} - r_{s} |}{v_{0}}) d r_{s} .$ (27)

Equation (27) is in the form of the spherical Radon transform [Eq. (26)], representing the integral over a spherical shell with a radius of $v_{0} t$ centered at the detector position $r_{d}$ . The projection data $g (r_{d}, t)$ are related to the measured photoacoustic signal $p (r_{d}, t)$ via $g (r_{d}, t) = 4 π v_{0}^{2} | r_{d} - r_{s} | \int_{0}^{t} p (r_{d}, τ) d τ .$ (28)

The projection data $g (r_{d}, t)$ have a similar definition as the velocity potential in Eq. (19) and can be calculated from the measured photoacoustic signal $p (r_{d}, t)$ . The task of image reconstruction in PACT is to find the inverse spherical Radon transform and recover the original function $f (x, y)$ or $p_{0} (r_{s})$ from the measured photoacoustic signal $p (r_{d}, t)$ , which is the focus of the following sections.

3 Conventional Approaches

3.1 DAS-Type Algorithms

Delay and sum (DAS)-based beamforming is a commonly used image reconstruction technique in ultrasound imaging^[93]. Due to similar image formation processes, DAS-based beamforming is also widely used in PACT, where it reconstructs an image by summing the delayed raw photoacoustic signals of each detector. The time delay between each channel is calculated according to the acoustic TOF of the photoacoustic signals from the point of interest to each ultrasound detector. To yield high-quality images, preprocessing of raw photoacoustic signals and/or postprocessing of reconstructed images are usually needed. Here we review the five most commonly used DAS-type image reconstruction algorithms: DAS, delay multiply and sum (DMAS), short-lag spatial coherence (SLSC), minimum variance (MV), and coherence factor (CF), which adopt different preprocessing and/or postprocessing strategies. The basic workflows of these methods are first illustrated in Fig. 9, and detailed descriptions are presented below.

Figure 9.Workflows of different DAS-based image reconstruction algorithms in PACT. (a) DAS. (b) DMAS. (c) SLSC ( $M = 3$ , $L = 1$ , $l = 1$ , $t_{1} = t_{2} = t$ ). (d) MV. (e) CF-DAS; sign denotes the signum function and sqrt denotes the square root.

Download full size

View all figures

3.1.1 Delay and sum

DAS is the most basic beamforming algorithm in ultrasound imaging due to its simplicity, speed, and robustness^[94–96]. Hoelen et al. introduced a DAS method for 3D PACT imaging of blood vessels based on a planar detection geometry in 1998^[26,97]. Feng et al. applied a DAS method in linear-scanning thermoacoustic tomography in 2001^[98]. The DAS method can be mathematically formulated as $S_{DAS} (x, z) = \sum_{i = 1}^{M} s_{i} (t),$ (29)where $S_{DAS}$ is the reconstructed image, $s_{i} (t)$ is the photoacoustic signal measured by the $i$ th detector at time $t$ , $M$ is the total number of detectors, and $(x, z)$ represents the position in a coordinate system. The variable $t$ in $s_{i} (t)$ denotes the TOF of the photoacoustic signals from position $(x, z)$ to the $i$ th detector. The workflow and principle of the DAS algorithm are illustrated in Figs. 9(a) and 10, respectively.

Figure 10.Principle of DAS-based image reconstruction.

Download full size

View all figures

The performance of the DAS algorithm was evaluated using a designed numerical phantom with multiple point sources equally distributed along the longitudinal centerline [Fig. 11(a)]. In the evaluation, a linear detector array with a width of 200 mm and 128 elements was placed at the top of the phantom to receive the photoacoustic signals generated from the sources. Figure 11(b) shows the image reconstructed by the DAS method. The results show that DAS can successfully reconstruct the structural information of photoacoustic sources but fails to reproduce the correct amplitude. The reconstructed image has both positive and negative components and is bipolar in magnitude. A unipolar image can be obtained by finding the envelope of the bipolar image, as shown in Fig. 11(c). Log transformation can be further used to improve the contrast of the reconstructed image, as shown in Fig. 11(d). Since DAS treats the delayed photoacoustic signals $s_{i} (t)$ from different detectors equally and is non-adaptive, significant side lobes are present in the reconstructed image in Fig. 11(d), which degrades the spatial resolution of the image.

Figure 11.An example showing DAS-based image reconstruction in PACT. (a) Ground truth. (b) Image reconstructed by DAS. (c) Envelope of (b). (d) Log transform of (c). The detector array is at the top of the image.

Download full size

View all figures

3.1.2 Delay multiply and sum

Delay multiply and sum (DMAS) is a variant of the DAS algorithm that can provide improved contrast, signal-to-noise ratio (SNR), and resolution. Similar to DAS, DMAS also sums photoacoustic signals measured by different detectors according to calculated time delays. However, the measured photoacoustic signals in DMAS need to be combinatorically coupled and multiplied before summation. Therefore, DMAS is essentially a nonlinear algorithm.

The DMAS algorithm was first proposed by Lim et al. for confocal microwave detection of breast cancer in 2008 and showed improved identification of embedded malignant tumors in a variety of numerical breast phantoms compared with DAS^[99]. In 2015, Matrone et al. modified this method for B-mode ultrasound imaging and demonstrated that DMAS can provide higher contrast resolution than DAS^[100]. Inspired by the success of DMAS in confocal microwave imaging and ultrasound imaging, several research groups worldwide have conducted in-depth studies of DMAS-based photoacoustic image reconstruction. For example, Alshaya et al. introduced DMAS to the field of PACT in 2016 and proposed a filter DMAS to improve the SNR of reconstructed images^[101]. To further improve the performance of DMAS, in 2018, Mozaffarzadeh et al. developed a double-stage DMAS algorithm that can produce images with improved quality compared with DAS and DMAS but at the expense of greater computational cost^[27]. In the same year, Kirchner et al. proposed a signed DMAS algorithm that can provide linear image reconstruction with increased image quality^[102]. In 2022, Mulani et al. presented a high-order DMAS method, in which multi-term (e.g., three, four, or five terms) multiplication is used to replace two-term multiplication in the original DMAS algorithm^[103].

Generally, the original DMAS algorithm can be mathematically written as^[100] $S_{DMAS} (x, z) = \sum_{i = 1}^{M - 1} \sum_{j = i + 1}^{M} x_{i j} (t),$ (30)where $S_{DMAS}$ is the reconstructed image, $M$ is the total number of detectors, and $x_{i j} (t)$ is given by $x_{i j} (t) = sign [s_{i} (t_{i}) s_{j} (t_{j})] \sqrt{| s_{i} (t_{i}) s_{j} (t_{j}) |},$ (31)where $sign (\cdot)$ denotes the signum function. The workflow of the DMAS algorithm is illustrated in Fig. 9(b). If the center frequency of the photoacoustic signals $s_{i} (t)$ and $s_{j} (t)$ is $f_{0}$ , the multiplication operation in Eq. (31) shifts the center frequency of the resultant signal to 0 and $2 f_{0}$ . As a result, the output signal $S_{DMAS}$ needs to be filtered by a bandpass filter centered at $2 f_{0}$ to extract the second harmonic component $2 f_{0}$ while removing the direct current (DC) component. Therefore, DMAS is also called filtered-DMAS. Compared with DAS, DMAS uses both the amplitude and spatial correlation of delayed photoacoustic signals from different detectors for image reconstruction. For this reason, it can partially overcome the drawbacks of DAS and reconstruct images with improved spatial resolution and reduced side lobes^[100].

A downside of DMAS is that the combinational multiplication in the algorithm increases the computational complexity. To solve this problem, in 2019, Jeon et al. proposed a DMAS algorithm with a modified CF, which avoids combinatorial multiplication in DMAS and significantly reduces the total number of multiplication operations^[104]. In 2022, Paul et al. proposed a simplified-delay-multiply-and-standard-deviation (SDMASD) method^[105], which is based on the measurement of the standard deviation of delayed and multiplied signals instead of normal delayed signals. Compared with native DAS and DMAS algorithms, the SDMASD algorithm can achieve real-time imaging using graphics processing units (GPUs) and produce improved image quality.

3.1.3 Short-lag spatial coherence

Short-lag spatial coherence (SLSC) is a beamforming technique that was initially developed by Lediju et al. for ultrasound imaging in 2011^[106]. SLSC reconstructs an ultrasound image by calculating the spatial coherence of measured signals, and the reconstructed image is thus independent of the amplitude of the signals. As a result, SLSC can eliminate adverse effects caused by the different strengths of scatterers in ultrasound imaging. It has been demonstrated that compared with the conventional DAS beamforming algorithm, SLSC can achieve image reconstruction with considerable improvements in terms of resolution, contrast-to-noise ratio (CNR), and SNR.

In PACT, Muyinatu Bell et al. applied the SLSC algorithm to achieve imaging of prostate brachytherapy seeds^[28,107] and demonstrated that the SLSC algorithm can enhance photoacoustic image quality compared with DAS, especially when the intensity of laser illumination is insufficient. Graham and Muyinatu Bell later developed a spatial coherence theory based on the van Cittert–Zernike theorem, a classical theorem in statistical optics, to explore the strengths and limitations of the SLSC algorithm^[108,109]. In 2021, Mora et al. combined SLSC with DMAS and proposed a generalized spatial coherence algorithm for PACT, which can preserve relative signal magnitudes and improve the CNR and SNR of a reconstructed image^[110].

Similar to DMAS, the photoacoustic signals in SLSC are also first delayed according to the TOF from the point of interest to each ultrasound detector and then combinatorically coupled and multiplied before summation. The SLSC algorithm can be formulated as^[106] ${\hat{R}}_{p} (l) = \frac{1}{M - l} \sum_{i = 1}^{M - l} \frac{\sum_{t = t_{1}}^{t_{2}} s_{i} (t) s_{i + l} (t)}{\sqrt{\sum_{t = t_{1}}^{t_{2}} s_{i}^{2} (t) \sum_{t = t_{1}}^{t_{2}} s_{i + l}^{2} (t)}},$ (32)where ${\hat{R}}_{p} (l)$ is the normalized spatial coherence of the signals measured by a detector, $M$ is the total number of detectors, and $l$ represents the number of intervals between two detectors used to calculate the spatial coherence and is called the lag. The final SLSC image is obtained by summing all ${\hat{R}}_{p} (l)$ terms, i.e., $S_{SLSC} (x, z) = \sum_{l = 1}^{L} {\hat{R}}_{p} (l),$ (33)where $L$ is the total number of lags. A large $L$ helps improve the lateral resolution but decreases the CNR and SNR. Therefore, the value of $L$ needs to be elaborately selected to achieve the best tradeoff between key image quality metrics, such as lateral spatial resolution, CNR, and SNR. The workflow of the SLSC algorithm is illustrated in Fig. 9(c).

3.1.4 Minimum variance

Minimum variance (MV) is a weighted DAS method that was originally devised by Capon in 1969 for narrowband signal processing applications such as sonar, radar, and wireless communication^[111]. In 2002, Mann and Walker used a constrained adaptive beamformer (the MV method) for medical ultrasound imaging^[112] and demonstrated its effectiveness in spatial resolution and contrast enhancement. Due to the remarkable improvement in spatial resolution, a large number of MV-based methods have been developed for ultrasound imaging^[113–116].

MV-based image reconstruction has also been studied in PACT in recent years^{[30,117–120]}. The MV reconstruction formula can be written as^[114,116] $S_{MV} (x, z) = \sum_{i = 1}^{M} w_{i} (t) s_{i} (t) = w {(t)}^{H} s (t),$ (34)where $M$ is the total number of detectors, $w_{i} (t)$ is the optimal weight for the photoacoustic signal $s_{i} (t)$ measured by the $i$ th detector, $s (t)$ is a vector containing delayed photoacoustic signals from all detectors, $w (t)$ is a vector containing optimal weights for the delayed photoacoustic signals in $s (t)$ , and the superscript $H$ denotes the conjugate transpose. The workflow of the MV algorithm is illustrated in Fig. 9(d). The weight $w (t)$ can be adaptively calculated by minimizing the variance of the output $S_{MV}$ while maintaining the unit signal gain at the focal imaging point. The optimization problem can be mathematically written as^[114] $\min_{w} w^{H} R w, subject to w^{H} a = 1,$ (35)where $R = E [s s^{H}]$ is the spatial covariance matrix of the delayed photoacoustic signals $s (t)$ , $E$ denotes the expectation, and $a$ is the equivalent of the steering vector in narrowband applications^[114,116]. When the photoacoustic signals of each detector are delayed, $a$ is a simple all-one vector. The optimization problem in Eq. (35) can be solved by the method of Lagrange multipliers^[121], which gives $w = \frac{R^{- 1} a}{a^{H} R^{- 1} a} .$ (36)

By substituting Eq. (36) into Eq. (34), MV-based image reconstruction can be achieved.

To improve the robustness of the MV method, the covariance matrix $R$ can be calculated based on a spatial smoothing strategy, where a detector array is divided into a group of overlapping subarrays, and the covariance matrices of all subarrays are calculated and averaged to form the final covariance matrix^[114]. In this way, the covariance matrix $R$ is given as^[114] $R = \frac{1}{N_{d} - L + 1} \sum_{l = 1}^{N_{d} - L + 1} s_{l} (t) s_{l} {(t)}^{H},$ (37)where $L$ is the number of detectors in a subarray and $l$ is the index of the detector in the current subarray. To estimate the covariance matrix more accurately and enhance the contrast of MV, temporal averaging of multiple samples can be used together with spatial smoothing^[116]. Once the covariance matrix $R$ is estimated, the optimal weights $w$ can be obtained by Eq. (36). The final image reconstructed by the MV algorithm can be written as^[114] $S_{MV} (x, z) = \frac{1}{N_{d} - L + 1} \sum_{l = 1}^{N_{d} - L + 1} w {(t)}^{H} s_{l} (t) .$ (38)

Although the MV method can produce narrower main lobes (i.e., the lobe located at the target) and higher spatial resolution than algorithms such as DAS and DMAS, its performance in terms of side lobe (i.e., the lobes adjacent to the main lobe) suppression and image contrast enhancement is limited. Therefore, many studies have been devoted to the development of MV-based hybrid beamforming algorithms. In 2008, Park et al. imposed additional CF weights on MV and achieved enhanced spatial resolution, contrast, and side lobe suppression^[117]. In 2018, Mozaffarzadeh et al. developed an MV-based DMAS method^[118] and an eigenspace-based MV method combined with DMAS for resolution improvement and side lobe reduction^[119]. In 2021, Paul et al. proposed an adaptive-weighting-based MV to address the side lobe issue in MV beamformed images. It was demonstrated that the weighted MV approach can improve SNR while reducing main lobe width and side lobe intensity and has the potential for use in PACT imaging systems with a limited number of ultrasound detectors^[30].

3.1.5 Coherence factor

The coherence factor (CF) is a pixel-level adaptive weighting factor that can improve the performance of DAS-based beamforming methods in side lobe suppression and spatial resolution improvement. CF was originally proposed by Mallart and Fink in 1994 for the evaluation of phase aberration correction techniques in scattering media^[122] and was later used as an image quality metric for ultrasound imaging by Hollman et al.^[123]. In 2003, Li et al. presented a generalized CF for ultrasound beamforming in heterogeneous media and showed that the combination of generalized CF and DAS-based beamformers could yield improved image quality^[124].

In PACT, Liao et al. incorporated CF into DAS in 2004 and demonstrated the superiority of the CF-weighted DAS method in improving spatial resolution and SNR compared with DAS^[29]. Wang and Li considered the local SNR in the formulation of CF and developed an SNR-dependent CF for ultrasound and photoacoustic imaging in 2014^[125]. Wang et al. integrated CF with a focal-line-based image formation method to improve the contrast and elevational resolution of 3D photoacoustic imaging in 2016^[126]. Mozaffarzadeh et al. proposed a high-resolution CF weighting technique and achieved improved resolution, SNR, and side lobe suppression in 2019^[127]. Paul et al. considered the noise level variations of raw ultrasound data in the formulation of CF and achieved improvements in image resolution, side lobe reduction, SNR, and contrast in 2021^[128]. In the same year, Mukaddim and Varghese extended CF from the spatial domain to the spatiotemporal domain^[129]. This extension helps cancel out signals with low spatial and temporal coherence and results in higher background noise cancellation while preserving the main features of interest in reconstructed images.

Mathematically, the CF is defined as the ratio of the coherent sum of photoacoustic signals across detectors to the incoherent sum and can be formulated as^[123] $CF (x, z) = \frac{{| \sum_{i = 1}^{M} s_{i} (t) |}^{2}}{M \sum_{i = 1}^{M} {| s_{i} (t) |}^{2}},$ (39)where $s_{i} (t)$ is the photoacoustic signal measured by the $i$ th detector at time $t$ and $M$ is the total number of detectors. The value of the CF ranges from zero to one. A large value means that the signals at that point are highly focused and that the point can be reconstructed with high quality. In contrast, a small CF value indicates that the signals are weakly focused and will result in lower image quality. The CF-weighted DAS method can be expressed as $S_{CF-DAS} (x, z) = CF (x, z) S_{DAS} (x, z),$ (40)where $S_{CF - DAS} (x, z)$ is the reconstructed image. The term $S_{DAS} (x, z)$ can be replaced by the outputs of other beamforming methods, such as DMAS, SLSC, and MV. The workflow of the CF algorithm is illustrated in Fig. 9(e).

Each beamforming method mentioned above has advantages and disadvantages. To improve the performance of image reconstruction in PACT, they can be combined to yield hybrid beamforming methods such as DMASD plus DAS/DMAS^[105], SLSC plus filter DMAS^[110], MV plus DMAS^[118,119], CF plus DMAS^[104,130], and CF plus MV^{[117,127,131]}.

Representative work on DAS-based image reconstruction in PACT is summarized in Table 3. For completeness, Table 3 also includes relevant work in ultrasound or microwave imaging.

Table 3. Representative DAS-Type Algorithms Used for Image Reconstruction in PACT

View table

View all Tables

Table 3. Representative DAS-Type Algorithms Used for Image Reconstruction in PACT


Method	Author	Year	Variant	Source
DAS	Ma et al.	2020	Multiple DAS with enveloping	[132]
Hoelen et al.	2000	Modified DAS	[97]
Hoelen et al.	1998	Modified DAS	[25,26]
DMAS	Mulani et al.	2022	High-order DMAS	[103]
Jeon et al.	2019	CF-weighted DMAS	[104]
Mozaffarzadeh et al.	2018	Double-stage DMAS	[27]
Kirchner et al.	2018	Signed DMAS	[102]
Alshaya et al.	2016	Filter DMAS	[101]
Lim et al.	2008	DMAS (microwave imaging)	[99]
SLSC	Graham et al.	2020	Photoacoustic spatial coherence theory for SLSC	[109]
Bell et al.	2013	SLSC (for PACT)	[28]
Lediju et al.	2011	SLSC (for ultrasound)	[106]
MV	Asl & Mahloojifar	2009	Modified MV (for ultrasound)	[116]
Synnevag et al.	2007	MV (for ultrasound)	[114]
Mann & Walker	2002	Constrained adaptive beamformer	[112]
CF	Mao et al.	2022	Spatial coherence + polarity coherence	[133]
Mukaddim et al.	2021	Spatiotemporal CF	[129]
Paul et al.	2021	Variational CF	[128]
Wang et al.	2014	SNR-dependent CF	[125]
Liao et al.	2004	CF-weighted DAS	[29]
Li et al.	2003	Generalized CF	[124]
Mallart & Fink	1994	CF	[122]
Hybrid	Paul et al.	2022	SDMASD + DAS/DMAS	[105]
Mora et al.	2021	SLSC + Filter DMAS	[110]
Mozaffarzadeh et al.	2019	CF + MV	[127,131]
Mozaffarzadeh et al.	2018	CF + DMAS	[104,130]
Mozaffarzadeh et al.	2018	MV + DMAS	[118,119]

3.2 Filtered Back Projection

DAS-type algorithms can achieve approximate photoacoustic image reconstruction and are inexact reconstruction techniques. To achieve exact image reconstruction, advanced algorithms are needed. Filtered back projection (FBP) is a class of algorithms that are based on the Radon transform (see Sec. 2.4). It achieves image reconstruction by first filtering measured photoacoustic signals and then back-projecting the filtered signals into the image domain. The back-projection operation in FBP is similar to the reconstruction procedure in DAS-type algorithms. Therefore, the native DAS algorithm can be regarded as a simplified version of FBP, which achieves image reconstruction by directly back-projecting original photoacoustic signals into the image domain.

3.2.1 Approximate filtered back projection

Early FBP algorithms were developed based on the condition of far-field approximation^[35], which states that if the distance between a detector and a photoacoustic source is much greater than the size of the photoacoustic source itself (i.e., $| r_{s} - r_{d} | ≫ d$ , $d$ : source size), the approximation $| r_{d} - r_{s} | \approx | r_{d} - r_{s} \cdot (r_{d} / | r_{d} |) |$ holds (see Fig. 12). Under the condition of far-field approximation, the photoacoustic integral over a spherical shell can be approximated by the integral over its tangential plane. Consequently, image reconstruction in PACT can be achieved by inverting the linear Radon transform. Kruger et al. proposed an approximate FBP algorithm in 1995^[22], which is the first formal image reconstruction method in PACT. Xu and Wang developed approximate FBP algorithms from a more rigorous perspective in 2002^[134,135] and 2003^[136]. They deduced exact solutions to the image reconstruction problem and proposed approximate time-domain FBP algorithms for circular^[134], spherical^[135], planar, and cylindrical detection geometries^[136].

Figure 12.Schematic diagram showing the signal detection and image reconstruction geometry in FBP. The forward problem and the image reconstruction problem in PACT correspond to the spherical Radon transform and its inverse, respectively. Under the condition of the far-field approximation, the integral over a spherical shell can be approximated by the integral over its tangential plane.

Download full size

View all figures

The preceding approximate FBP algorithms were generally developed for specific detection geometries. In 2007, Burgholzer and Matt extended the approximate FBP algorithms for image reconstruction in arbitrarily closed detection geometries under the assumption of the far-field approximation. The extended FBP reconstruction formula can be written as^[35] $p_{0} (r_{s}) ≅ \frac{1}{4 π} \int_{S} b (r_{d}, t) δ (t - \frac{| r_{s} - r_{d} |}{v_{0}}) d Ω,$ (41)where $p_{0} (r_{s})$ is the reconstructed photoacoustic image, $δ$ is the Dirac delta function, $b (r_{d}, t)$ is the back-projection term given by $b (r_{d}, t) = - 2 t \frac{\partial p (r_{d}, t)}{\partial t},$ (42)and $d Ω$ is the solid angle subtended by the element $d σ$ of a detection surface and can be calculated by $d Ω = \frac{d σ}{{| r_{s} - r_{d} |}^{2}} (n_{d} \cdot \frac{r_{s} - r_{d}}{| r_{s} - r_{d} |}) .$ (43)Here $n_{d}$ is the unit normal vector of the detector surface pointing to the region of interest (ROI).

The time-domain first derivative in Eq. (42) can be interpreted in the frequency domain as $\frac{\partial p (r_{d}, t)}{\partial t} = F^{- 1} {i ω F {p (r_{d}, t)}},$ (44)where $F$ and $F^{- 1}$ denote the forward and inverse Fourier transforms, respectively; $ω$ is a ramp filter, which suppresses the low-frequency contents of the measured photoacoustic signal $p (r_{d}, t)$ and amplifies high-frequency information. Since the value of $ω$ extends from $- \infty$ to $\infty$ , the ramp filter is not integrable, and the inverse Fourier transform in Eq. (44) is undefined. To solve this problem, the ramp filter can be band-limited by a window function and Eq. (44) becomes $\frac{\partial p_{0} (r_{d}, t)}{\partial t} = F^{- 1} {i ω W (ω) F {p_{0} (r_{d}, t)}},$ (45)where $W (ω)$ represents the window function. In practice, a smooth window such as the Hanning function is preferred over a box function because the latter may introduce undesirable ringing artifacts in the image domain.

One benefit of the far-field approximation in FBP is that it allows for simplified calculation of the solid angle $d Ω$ . In other words, under the far-field approximation, $d Ω$ in Eq. (43) reduces to $d Ω ≅ - \frac{d σ}{{| r_{d} |}^{2}} (n_{d} \cdot \frac{r_{d}}{| r_{d} |}) .$ (46)

Compared with Eq. (43), Eq. (46) involves only the detector position $r_{d}$ when calculating $d Ω$ and is independent of the source position $r_{s}$ , thereby reducing the computational complexity. For example, assuming that the size of a 3D image to be reconstructed is $N_{x} \times N_{y} \times N_{z}$ voxels ( $N_{x} = N_{y} = N_{z} = n$ ) and the number of detectors used for imaging is $M$ ( $M = n \times n$ ), the computational complexity is $O (n^{5})$ for Eq. (43) and only $O (n^{2})$ for Eq. (46). Furthermore, if the detection geometry is spherical, $d Ω$ in Eq. (46) becomes a constant ( $d Ω = d σ / | r_{d} |^{2}$ ), indicating that the computational complexity is $O (1)$ .

3.2.2 Exact filtered back projection

The approximate FBP algorithms are based on the condition of far-field approximation. This condition, however, may not be fully met in practice considering that photoacoustic signals may attenuate with propagation distance and that signal detection in space may be restricted. To solve this problem, in 2004, Finch et al. derived an exact FBP formula for image reconstruction in spherical detection geometries with odd dimensions ( $n \geq 3$ )^[23]. In 2005, Xu and Wang presented a universal FBP formula for image reconstruction in infinite planar, infinite cylindrical, and closed spherical detection geometries^[24]. The reconstruction formula is given as $p_{0} (r_{s}) = \int_{Ω} b (r_{d}, t) δ (t - \frac{| r_{s} - r_{d} |}{v_{0}}) \frac{d Ω}{Ω},$ (47)where the back-projection term $b (r_{d}, t) = 2 [p (r_{d}, t) - t \frac{\partial p (r_{d}, t)}{\partial t}] .$ (48) $Ω$ is the solid angle of the detection surface with respect to the reconstruction point and equals $2 π$ for the infinite planar geometry and $4 π$ for the spherical and cylindrical geometries. FBP is one of the most commonly used algorithms in PACT.

By comparing Eqs. (47) and (41), we find that the exact FBP algorithm is very close to the approximate FBP algorithm in the formulas. They both consist of three reconstruction steps: filtering, back projection (the $δ$ function), and summation. Their major difference is in the back-projection terms. The back-projection term $b (r_{d}, t)$ in the exact FBP algorithm has an extra term $p (r_{d}, t)$ , which is not present in the approximate FBP algorithm. The reasons are as follows. According to the forward solution of the photoacoustic wave equation [Eq. (18)], the amplitude of the photoacoustic signals received by a detector is proportional to the size of the source and inversely proportional to the detection distance, i.e., $p (r_{d}, t) \propto d / | r_{d} - r_{s} |$ . Under the condition of the far-field approximation, i.e., $| r_{d} - r_{s} | ≫ d$ , the amplitude of $p (r_{d}, t)$ is small enough compared with the derivative term $- t \partial p (r_{d}, t) / \partial t$ in Eq. (48) to be ignored without significant loss of reconstruction accuracy. Therefore, the back-projection term $b (r_{d}, t)$ in the approximate FBP algorithm does not involve the term $p (r_{d}, t)$ .

Figure 13 presents an example demonstrating the principle of FBP-based image reconstruction. In this example, the photoacoustic source is a 5-mm-diameter uniform spherical absorber, and the photoacoustic signals generated from the source are recorded by a 40-mm-diameter circular detector array [Fig. 13(a)]. Figures 13(b) and 13(c) show the representative photoacoustic signal $p (r_{d}, t)$ measured by a detector and the corresponding back-projection signal $b (r_{d}, t)$ , respectively. Figure 13(d) shows the projection images of detectors at different positions. Figures 13(e)–13(g) are the reconstruction results obtained by summing the projection images of four, 16, and 256 detectors, respectively. The results demonstrate that the FBP algorithm can effectively reconstruct the structure and amplitude information of the photoacoustic source. Note that the artifacts in the reconstructed images are not caused by the FBP algorithm but by the limited view angle of the circular detector array^[137].

Figure 13.Illustration of the principle of the FBP algorithm. (a) Schematic diagram showing a spherical photoacoustic source (diameter: 5 mm) and an array of point-like detectors uniformly distributed over a circle (diameter: 40 mm). (b) N-shaped photoacoustic signal recorded by a detector on the detection circle. (c) Back-projection signal [Eq. (48)]. (d) Projection images produced by the detectors at different positions. (e)–(g) Images reconstructed using 4, 16, and 256 detectors, respectively. Adapted from Ref. [137] with permission.

Download full size

View all figures

To evaluate the performance of the FBP algorithm for different detection geometries, a group of image reconstruction simulations was conducted. In the simulations, a multi-sphere phantom is used as the photoacoustic source. The phantom contains nine spherical absorbers with unit intensity, among which eight with diameters uniformly varying from 1 mm to 2 mm are evenly distributed on a circle with a diameter of 10 mm, while the ninth one, with a diameter of 1.6 mm, is seated at the origin. Figures 14(a)–14(c) show the relative positions of the multi-sphere phantom and a finite planar, finite cylindrical, and closed spherical detection surface. The planar detection surface is located 12 mm below the $x - y$ plane, and the cylindrical and spherical detection surfaces are centered at the origin. The three detection surfaces have the same number (32768) of evenly distributed point-like detectors and approximately equal detection areas (plane: $150 mm \times 150 mm$ ; cylinder: 70 mm base diameter, 100 mm height; sphere: 84 mm diameter). The reconstructed images of the $x - z$ and $x - y$ cross sections of the phantom are displayed in Figs. 14(d)–14(f) and 14(g)–14(i), respectively. The results show that the FBP algorithm achieves stable image reconstruction for all three detection geometries. However, the image reconstructed in the closed spherical detection surface has much fewer artifacts than those reconstructed in the planar and cylindrical detection surfaces. This is because the spherical detection surface is closer to an ideal detection geometry than the finite planar and finite cylindrical surfaces.

Figure 14.Image reconstruction by FBP in three common detection geometries. (a)–(c) Schematic diagrams showing a multi-sphere phantom and planar, cylindrical, and spherical detection surfaces. The three detection surfaces have the same number of point-like detectors (32768) and approximately equal detection areas. Please refer to the text for more simulation settings. (d)–(f) Reconstructed images in the $x - z$ plane. (g)–(i) Reconstructed images in the $x - y$ plane. Adapted from Ref. [137] with permission.

Download full size

View all figures

To extend the application scenarios of FBP, many in-depth studies have also been carried out^[138]. For example, in 2007, Kunyansky proposed a set of FBP-type formulas for image reconstruction in spherical detection geometries with arbitrary dimensions ( $n \geq 2$ )^[139]. In the same year, Finch et al. also developed a set of FBP-type inverse formulas for spherical detection geometries with even dimensions^[140]. In 2009, Nguyen derived a family of inverse formulas for thermoacoustic tomography, in which many previously known FBP inverse formulas can be obtained as special cases^[141]. In addition, the exact FBP algorithms mentioned above are primarily used for planar, cylindrical, and spherical detection geometries with point-like detectors. In 2007, Burgholzer et al. developed a two-step FBP algorithm for integrating line detectors^[142]. In 2012, Natterer proposed a novel FBP algorithm for elliptical detection geometries^[143], which was further developed by Palamodov in 2012^[144], Haltmeier in 2014^[145,146], and Salman in 2014^[147].

Moreover, the FBP algorithm is well suited for parallel computing. GPUs have been used to achieve real-time FBP image reconstruction in both 2D and 3D imaging^{[42,148–153]}. A field-programmable gate array (FPGA) was also employed to accelerate image reconstruction in low-cost PACT systems^[154].

Table 4 summarizes representative work on FBP-based image reconstruction in PACT.

Table 4. Representative Work on FBP-Based Image Reconstruction in PACT

View table

View all Tables

Table 4. Representative Work on FBP-Based Image Reconstruction in PACT


Author	Year	Method	Detection Geometry	Dimension	Source
Haltmeier	2014	FBP	Elliptical	Arbitrary	[145,146]
Salman	2014	FBP	Elliptical	2D and 3D	[147]
Natterer	2012	FBP	Elliptical	3D	[143]
Palamodov	2012	FBP	Elliptical	Arbitrary	[144]
Nguyen	2009	FBP	Spherical	Arbitrary	[141]
Burgholzer &
Matt	2007	Approximate FBP	Arbitrary	3D	[35]
Burgholzer et al.	2007	FBP	Arbitrary closed detection curve	2D or 3D	[142]
Kunyansky	2007	FBP	Spherical	Arbitrary	[139]
Finch et al.	2007	FBP	Spherical	Even	[140]
Xu & Wang	2005	FBP	Planar, cylindrical, and spherical	3D	[24]
Finch et al.	2004	FBP	Spherical	Odd	[23]
Xu & Wang	2003	Approximate FBP	Planar, cylindrical, and spherical	3D	[136]
Xu & Wang	2002	Approximate FBP	Circular	3D	[134]
Xu & Wang	2002	Approximate FBP	Spherical	3D	[135]
Xu et al.	2001	Approximate FBP	Circular	3D	[155]
Kruger et al.	1995	Approximate FBP	Circular	3D	[22]

3.3 Series Expansion

The basic principle of series expansion (SE) algorithms is to approximate the image to be reconstructed using mathematical series. Compared with other reconstruction algorithms in PACT, SE algorithms are simple and fast for specific detection geometries such as planar geometries because they can be implemented using FFT.

3.3.1 Series expansion for planar detection geometries

Norton et al. proposed exact SE methods for the reconstruction of acoustic reflectivity in circular geometries in 1980^[156] and in planar, cylindrical, and spherical geometries in 1981^[157]. Several groups have studied similar ideas for image reconstruction in PACT. Köstli et al. in 2001 proposed an exact SE algorithm for image reconstruction in planar detection geometries^[31], and Xu et al. in 2002 reported a similar algorithm^[158].

Mathematically, SE-based PACT image reconstruction for planar detection geometries can be formulated as^[31,83] $P_{0} (k_{x}, k_{y}, ω) = \frac{v_{0}^{2} k_{z}}{2 ω} F_{x, y, t} {p (x, y, t)},$ (49) $P_{0} (k_{x}, k_{y}, ω) \overset{ω^{2} = v_{0}^{2} (k_{x}^{2} + k_{y}^{2} + k_{z}^{2})}{\to} P_{0} (k_{x}, k_{y}, k_{z}),$ (50) $p_{0} (x, y, z) = F_{x, y, z}^{- 1} {P_{0} (k_{x}, k_{y}, k_{z})},$ (51)where $p (x, y, t)$ is the photoacoustic signal measured by a planar surface at position $(x, y)$ and time $t$ ; $k_{x}$ , $k_{y}$ , and $k_{z}$ are the spatial wavenumber components in each Cartesian direction; $ω$ is the temporal frequency; $v_{0}$ is the SOS; → represents the interpolation operation between the temporal and spatial frequencies $ω$ and $k_{z}$ ; $p_{0} (x, y, z)$ and $P_{0} (k_{x}, k_{y}, k_{z})$ are the initial photoacoustic pressure and its spatial Fourier transform; and $F$ and $F^{- 1}$ represent the forward and inverse Fourier transforms, respectively. The reconstruction procedure above involves one interpolation and two Fourier transforms and has a computational complexity of $O (n^{3} \log n)$ for a 3D image with a size of $n \times n \times n$ voxels by use of FFT^[159]. The numerical implementation of the SE algorithm is available in the k-Wave toolbox^[83].

Figure 15 is a numerical example showing SE-based PACT image reconstruction using the k-Wave toolbox. The photoacoustic source in the $x y$ plane contains nine spherical absorbers with unit intensity. Among them, eight absorbers with diameters uniformly varying from 1 mm to 2 mm are evenly distributed on a circle with a diameter of 10 mm, while the ninth absorber with a diameter of 1.6 mm is seated at the origin. The signal detection geometry is a planar surface 12 mm below the $x y$ plane. The planar surface has $364 \times 364$ point detectors uniformly distributed spanning a physical size of $36 mm \times 36 mm$ . Figures 15(b) and 15(c) are the $x z$ and $x y$ cross sections of the photoacoustic source, respectively, and Figs. 15(d) and 15(e) are the corresponding reconstructed images, which show that the SE algorithm can recover major structures of the source. Note that the distortion and blurring in the reconstructed images are due to the finite size of the planar detection surface.

Figure 15.Example of SE-based image reconstruction in PACT. (a) Schematic diagram of a planar detection geometry and a multi-sphere photoacoustic source. (b), (c) $x z$ and $x y$ cross sections of the source. (d), (e) $x z$ and $x y$ cross sections of the reconstructed source. Please refer to the text for the simulation settings.

Download full size

View all figures

3.3.2 Series expansions for circular, cylindrical, and spherical detection geometries

In 2002, the Wang group reported exact SE algorithms for PACT image reconstruction in cylindrical and spherical geometries^[135,160]. However, the image reconstruction procedures in these two cases involve complicated mathematical calculations, preventing their implementations using FFT and are thus time-consuming^[135,160]. Based on Norton’s pioneering work on circular geometries^[156], Haltmeier and Scherzer in 2007 proposed a 3D reconstruction algorithm for cylindrical detection geometries, where photoacoustic signals are recorded by linear integrating detectors^[161]. The proposed algorithm has a computational complexity of $O (n^{4})$ for a 3D image with a size of $n \times n \times n$ voxels, which is higher than that of the SE algorithm in planar geometries [ $O (n^{3} \log n)$ ]^[159] but lower than that of FBP [ $O (n^{5})$ ]^[24].

In 2012, Kunyansky proposed fast image reconstruction algorithms suitable for circular, cylindrical, and spherical detection geometries^[33]. In Kunyansky’s work, image reconstruction is based on the Fourier transforms of the initial photoacoustic pressure $p_{0} (r_{s})$ and the measured photoacoustic signals $p (r_{d}, t)$ , as summarized below. Suppose that the detection geometry is a circle, as shown in Fig. 7(c). By expanding the Fourier transform $P (r_{d}, ω)$ of the measured photoacoustic signal $p (r_{d}, t)$ and the initial photoacoustic pressure $p_{0} (r_{s})$ in the Fourier series in $φ$ and $θ$ , we have^[33] $P (r_{d}, ω) = \sum_{k = - \infty}^{\infty} P^{k} (ω) e^{i k φ}, r_{d} = (r_{d} \cos φ, r_{d} \sin φ),$ (52) $p_{0} (r_{s}) = \sum_{k = - \infty}^{\infty} p_{0}^{k} (r_{s}) e^{i k θ}, r_{s} = (r_{s} \cos θ, r_{s} \sin θ),$ (53)where $i$ is the imaginary unit, $ω$ is the angular frequency, $r_{d} = | r_{d} |$ is the radius of the detection circle, $r_{s} = | r_{s} |$ is the distance from the reconstruction point to the center of the circular detection geometry, and $P^{k} (ω)$ and $p_{0}^{k} (r_{s})$ are expansion coefficients given by^[33] $P^{k} (ω) = \frac{1}{2 π} \int_{0}^{2 π} P (r_{d}, ω) e^{- i k φ} d φ,$ (54) $p_{0}^{k} (r_{s}) = \frac{2}{π} \int_{0}^{\infty} \frac{P^{k} (ω)}{H_{| k |}^{(1)} (ω R)} J_{| k |} (ω r_{s}) d ω,$ (55)where $H_{| k |}^{(1)}$ is the Hankel function of the first kind of order $k$ and $J_{| k |}$ is the Bessel function of the first kind of order $| k |$ ^[162]. Equation (55) is somewhat similar to an expression in Norton’s work^[156], where the Bessel function $J_{| k |}$ rather than the Hankel function $H_{| k |}^{(1)}$ is in the denominator and a term corresponding to the real part of $P^{k} (ω)$ in the numerator.

By discretizing Eq. (55), we can obtain $p_{0}^{k} (r_{s})$ and then the initial photoacoustic pressure $p_{0} (r_{s})$ [Eq. (53)]. However, direct discretization of Eq. (55) results in a computational complexity of $O (n^{2})$ for each $p_{0}^{k} (r_{s})$ and $O (n^{3})$ for the whole reconstruction. To achieve fast reconstruction, substituting Eq. (55) into Eq. (53) yields^[33] $p_{0} (r_{s}) = \frac{1}{2 π} \int_{R^{2}} P_{0} (Λ) e^{i r_{s} \cdot Λ} d Λ, Λ = (ω \cos φ, ω \sin φ),$ (56)where $P_{0} (Λ)$ is given by $P_{0} (Λ) = {\begin{matrix} \frac{2}{π} \sum_{k = - \infty}^{\infty} \frac{{(- i)}^{k} p_{0}^{k} (ω)}{ω H_{| k |}^{(1)} (ω R)} e^{i k φ}, & Λ \neq 0, \\ \int_{0}^{\infty} \frac{2 P^{k = 0} (ω)}{π ω H_{0}^{(1)} (ω R)} R J_{1} (ω R) d ω, & Λ = 0, \end{matrix}$ (57)where $R$ is the radius of the circular detection geometry. Equation (56) indicates that the initial photoacoustic pressure $p_{0} (r_{s})$ can be obtained by calculating the 2D inverse Fourier transform of $P_{0} (Λ)$ , which has a computational complexity of $O (n^{2} \log n)$ , lower than that of the direct-discretization-based reconstruction $[O (n^{3})]$ .

For cylindrical detection geometries with linear integrating detectors, image reconstruction can be readily realized by combining the fast image reconstruction procedure for 2D circular geometries and the 3D inverse Fourier transform^[33]. This procedure can yield fast reconstruction with a computational complexity of $O (n^{3} \log n)$ . For spherical detection geometries, the derivation of the image reconstruction procedure is similar to those for circular geometries [Eqs. (52 )–(57)]. A prominent difference is that $p_{0} (r_{s})$ and $P (r_{d}, ω)$ are expanded into spherical harmonics. In addition, the fast spherical harmonics (FSH) transform is adopted to achieve fast reconstruction^[163], which results in a reconstruction complexity of $O (n^{3} \log^{2} n)$ ^[33].

Wang and Anastasio in 2012 demonstrated that a mapping relationship exists between the spatial frequency components of initial photoacoustic pressure $p_{0} (r_{s})$ and the temporal frequency components of measured photoacoustic signals $p (r_{d}, t)$ ^[164]. They thus proposed a Fourier-transform-based image reconstruction algorithm whose computational complexity is $O (n^{2} \log n)$ for 2D image reconstruction in circular geometries and $O (n^{3} \log n)$ for 3D image reconstruction in spherical geometries^[165]. The reconstruction formula does not involve series expansion of special functions or multi-dimensional interpolation operations in Fourier space, which are commonly used in previous work.

3.3.3 Series expansion for general detection geometries

Kunyansky in 2007 presented a generalized SE method for image reconstruction in arbitrarily closed detection geometry provided that the eigenfunctions of the Dirichlet Laplacian are explicitly known^[32].

Assuming that $λ_{k}^{2}$ and $u_{k} (r)$ are the eigenvalues and normalized eigenfunctions of the Dirichlet Laplacian in the interior $Ω$ of a closed detection surface $S$ , we have^[32] $\nabla^{2} u_{k} (r) + λ_{k}^{2} u_{k} (r) = 0, r \in Ω, Ω \subseteq R^{n} u_{k} (r) = 0, r \in S, {‖ u_{k} ‖}_{2}^{2} \equiv \int_{Ω} {| u_{k} (r) |}^{2} d r = 1,$ (58)where $n$ denotes the spatial dimension. Since the eigenfunctions $u_{k} (r)$ are orthogonal, the initial photoacoustic pressure $p_{0} (r)$ can be formulated in series form as $p_{0} (r) = \sum_{k = 0}^{\infty} α_{k} u_{k} (r),$ (59)where the coefficients $α_{k}$ can be obtained from the measured projection data $p (r_{d}, t)$ , and $u_{k} (r)$ is the known eigenfunctions of the Dirichlet Laplacian determined by detection geometries. $u_{k} (r)$ is the solution of the Dirichlet problem for the Helmholtz equation with zero boundary conditions and has the following Helmholtz representation^[32]: $u_{k} (r) = \int_{S} Φ_{λ_{k}} (| r - r_{d} |) \frac{\partial}{\partial n_{d}} u_{k} (r_{d}) d S, r \in Ω,$ (60)where $Φ_{λ_{k}}$ is a free-space rotationally invariant Green’s function^[166], $n_{d}$ is the unit normal vector of the detector surface pointing to the ROI, and $r_{d}$ is the position of the detector. The coefficients $α_{k}$ can be calculated by^[32] $α_{k} (r) = \int_{Ω} u_{k} (r) p_{0} (r) d r = \int_{S} I (r_{d}, λ_{k}) \frac{\partial}{\partial n_{d}} u_{k} (r_{d}) d S,$ (61)where $I (r_{d}, λ_{k})$ is given by $I (r_{d}, λ_{k}) = \int_{0}^{diam (Ω) / v_{0}} g (r_{d}, t) Φ_{λ_{k}} (t) v_{0} d t .$ (62)

Here $diam (Ω)$ denotes the diameter of $Ω$ , and $g (r_{d}, t)$ is the integral over a spherical shell [Eq. (27)] centered at the detector position $r_{d}$ , which can be calculated from the measured projection data $p (r_{d}, t)$ . With the calculated coefficients $α_{k}$ and eigenfunctions $u_{k} (r)$ , the initial photoacoustic pressure $p_{0} (r)$ can be reconstructed [Eq. (59)].

If the eigenfunctions of the Dirichlet Laplacian of a detection geometry such as a sphere, a half-sphere, a finite cylinder, or a cube are explicitly known, the eigenfunction-based SE method can provide exact reconstruction for any closed detection geometry. In particular, when the detection geometry is a cube, the reconstruction procedure can be implemented using 3D FFT, resulting in efficient computation with a complexity of $O (n^{3} \log n)$ ^[32]. Furthermore, unlike the FBP algorithm, this technique can provide theoretically exact reconstruction within the region enclosed by the detection geometry even if part of a photoacoustic source is outside the region.

The image reconstruction procedures discussed for planar, circular, cylindrical, spherical, and arbitrarily closed geometries are based on the assumption that the acoustic medium is homogeneous^[32]. For the image reconstruction problem in heterogeneous media, Agranovsky and Kuchment proposed an analytic reconstruction formula that works for any closed detection geometry and variable SOS satisfying a non-trapping condition^[167]. This algorithm can be regarded as a generalization of the eigenfunction-based SE method for arbitrary closed detection geometries [Eqs. (58 )–(62)]^[167].

Table 5 lists representative work on SE-based fast image reconstruction algorithms and Table 6 summarizes representative work on SE-based image reconstruction in PACT.

Table 5. Representative Work on SE-Based Fast Image Reconstruction in PACT

View table

View all Tables

Table 5. Representative Work on SE-Based Fast Image Reconstruction in PACT


Author	Year	Detection Geometry	Dimension	Complexity^a	Source
Kunyansky	2012	Circular	2D	$O (n^{2} \log n)$	[33]
Spherical	3D	$O (n^{3} \log^{2} n)$
Cylindrical	3D	$O (n^{3} \log n)$
Wang et al.	2012	Circular	2D	$O (n^{2} \log n)$	[165]
Spherical	3D	$O (n^{3} \log n)$
Kunyansky	2007	Cubic	3D	$O (n^{3} \log n)$	[32]
Xu et al.	2002	Planar	3D	$O (n^{3} \log n)$	[160]
Köstli et al.	2001	Planar	3D	$O (n^{3} \log n)$	[31]

Table 6. Representative Work on SE-Based Image Reconstruction in PACT

View table

View all Tables

Table 6. Representative Work on SE-Based Image Reconstruction in PACT


Author	Year	Detection Geometry	Detector	SOS	Source
Kunyansky	2012	Circular, spherical, and cylindrical	Point-like detectors for circles and spheres; linear integrating detectors for cylinder	Constant	[33]
Wang et al.	2012	Circular and spherical	Point-like detectors	Constant	[165]
Zangerl et al.	2009	Cylindrical	Circular integrating detectors	Constant	[168,169]
Haltmeier et al.	2007	Cylindrical	Linear integrating detectors	Constant	[161]
Kunyansky	2007	Arbitrary closed geometry	Point-like detectors	Constant	[32]
Agranovsky & Kuchment	2007	Arbitrary closed geometry	Point-like detectors	Constant or variable	[167]
Xu & Wang	2002	Spherical	Point-like detectors	Constant	[135]
Xu et al.	2002	Planar	Point-like detectors	Constant	[158]
Xu et al.	2002	Cylindrical	Point-like detectors	Constant	[160]
Köstli et al.	2001	Planar	Point-like detectors	Constant	[31]

3.4 Time Reversal

Time reversal (TR) is a type of algorithm that involves recovering initial photoacoustic pressure via numerically running a forward acoustic propagation model backward and re-transmitting the photoacoustic signals measured by each detector into the image domain in a temporally reversed order. TR algorithms can couple the acoustic properties of media such as SOS, density, dispersion, and absorption into the acoustic propagation model and can be used for image reconstruction in arbitrary closed detection geometry^[170]. They are thus regarded as the ‘least restrictive’ image reconstruction algorithms in PACT^[170,171].

3.4.1 Standard time reversal

In 2004, Xu and Wang proposed a TR algorithm in PACT and derived an analytical expression for image reconstruction based on the Green’s function with a homogeneous Dirichlet boundary condition^[34]. Under the condition of the far-field approximation [see Sec. 3.2.1], the derived analytical expression in^[34] is reduced to the reconstruction formula in FBP [Eq. (47)]. Subsequent research on TR-based image reconstruction primarily focused on coupling the acoustic properties of heterogeneous media into the TR model to improve reconstruction performance^{[35,170–172]}. For example, Hristova et al. studied TR-based image reconstruction in heterogeneous media, in even dimensions, and with partially closed detection surfaces^[171]. Treeby et al. considered the effect of acoustic absorption of media and proposed a TR algorithm for absorbing acoustic media^[170].

The TR algorithms are based on the Huygens principle^[171]. When the SOS of a medium is constant and the spatial dimension of an acoustic field is odd, the Huygens principle indicates that the acoustic wave field vanishes after a moment $T$ ^[171] and TR algorithms enable exact image reconstruction for a sufficiently large $T$ . When the SOS is variable or the spatial dimension is even, the Huygens principle no longer holds, and TR algorithms can only provide approximate reconstructions^[171].

In a lossy medium, the photoacoustic wave equation can be formulated as an initial value problem by three coupled acoustic equations and corresponding initial conditions^[170]. The coupled acoustic equations, including the linearized equation of motion, the linearized equation of continuity, and the adiabatic equation of state, are given as^[170] $\frac{\partial}{\partial t} u (r, t) = - \frac{1}{ρ_{0} (r)} \nabla p (r, t),$ (63) $\frac{\partial}{\partial t} ρ (r, t) = - ρ_{0} (r) \nabla \cdot u (r, t),$ (64) $p (r, t) = v_{0} {(r)}^{2} {1 + τ (r) \frac{\partial}{\partial t} {(- \nabla^{2})}^{y / 2 - 1} + η (r) {(- \nabla^{2})}^{(y + 1) / 2 - 1}} ρ (r, t),$ (65)which are subject to the following initial conditions: ${p (r, t) |}_{t = 0} = p_{0} (r), {u (r, t) |}_{t = 0} = 0.$ (66)Here $p (r, t)$ is the acoustic pressure at time $t$ and position $r$ inside the imaging region, $u (r, t)$ is the acoustic particle velocity, $ρ (r, t)$ is the acoustic density, $ρ_{0} (r)$ is the ambient density, and $τ (r)$ and $η (r)$ are the absorption and dispersion proportionality coefficients given by $τ (r) = - 2 α_{0} v_{0} {(r)}^{y - 1}, η (r) = 2 α_{0} v_{0} {(r)}^{y} \tan (π y / 2),$ (67)where $α_{0}$ is the absorption coefficient ( $dB {MHz}^{- y} {cm}^{- 1}$ ) and $y$ is a real constant representing the power law exponent.

TR-based image reconstruction is achieved by running the same photoacoustic forward model in temporally reversed order and thus can be formulated by the same set of equations [Eqs. (63 )–(67)] except that the initial conditions in Eq. (66) are replaced with ${p (r, t) |}_{t = 0} = 0, {u (r, t) |}_{t = 0} = 0, p (r_{d}, t) = p (r_{d}, T - t) .$ (68)

Figure 16 illustrates the basic principle of TR-based image reconstruction.

Figure 16.Illustration of TR-based image reconstruction. (a) Cross section of a spherical absorber and a spherical detector array. (b) Photoacoustic signal measured by a detector. (c) Temporal reversion of the measured signal in (b). (d)–(i) Acoustic wave fields in the detection region at different moments during backward propagation of the time-reversed signal.

Download full size

View all figures

Generally, there is no explicit analytical solution to the partial differential equations in Eqs. (63 )–(65). Numerical methods are required to solve these equations. Finite element and finite difference are two commonly used methods for computing numerical solutions to partial differential equations due to their flexibility, accuracy, and rigorous mathematical foundations. However, these approaches require many mesh points per wavelength and small time steps to achieve accurate calculations, which is computationally expensive, especially for high-frequency and large-scale imaging models.

To solve this problem, Cox et al. developed a k-space pseudospectral approach that can achieve accurate computations using fewer mesh points and more time steps^[173]. The k-space-pseudospectral-method-based TR algorithm has been implemented in the k-Wave toolbox^[83] and is based on the discrete form of the coupled acoustic equations [Eqs. (63 )–(65)]^[170], which can be written as $\frac{\partial}{\partial ξ} p (r, t) = F^{- 1} {i k_{ξ} κ_{k} F {p (r, t)}},$ (69) $u_{ξ} (r, t + Δ t) = u_{ξ} (r, t) - \frac{Δ t}{ρ_{0} (r)} \frac{\partial}{\partial ξ} p (r, t),$ (70) $\frac{\partial}{\partial ξ} u_{ξ} (r, t + Δ t) = F^{- 1} {i k_{ξ} κ_{k} F {u_{ξ} (r, t + Δ t)}},$ (71) $ρ_{ξ} (r, t + Δ t) = ρ_{ξ} (r, t) - Δ t ρ_{0} (r) \frac{\partial}{\partial ξ} u_{ξ} (r, t + Δ t),$ (72) $p (r, t + Δ t) = v_{0} {(r)}^{2} [\sum_{ξ} ρ_{ξ} (r, t + Δ t) + abs + disp],$ (73)where $i$ is the imaginary unit, $ξ$ denotes the Cartesian coordinates [ $ξ = x$ represents 1D space, $ξ = (x, y)$ represents 2D space, $ξ = (x, y, z)$ represents 3D space], $k_{ξ}$ is the spatial wavenumber component, $Δ t$ is the time step, $κ_{k} = sinc [v_{0} (r) k Δ t / 2]$ is a k-space adjustment to the spatial derivative^[170], and $F$ and $F^{- 1}$ represent the forward and inverse Fourier transforms, respectively. Equations (69) and (71) are equations for gradient calculation and Eqs. (70) and (72) are update equations. Equation (73) is an equation of state, including the absorption and dispersion of a medium, which is given by $abs = τ (r) F^{- 1} {{(k_{ξ} κ_{k})}^{y - 2} F {ρ_{0} (r) \sum_{ξ} \frac{\partial}{\partial ξ} u_{ξ} (r, t + Δ t)}},$ (74) $disp = η (r) F^{- 1} {{(k_{ξ} κ_{k})}^{y - 1} F {\sum_{ξ} ρ_{ξ} (r, t + Δ t)}} .$ (75)

The above equations are calculated iteratively and the entire acoustic field is updated for each time step.

One of the most prominent features of the TR algorithm is that it is well-suited for image reconstruction in acoustically heterogeneous media. To demonstrate this, a simulation is presented. Figure 17(a) shows an acoustically heterogeneous phantom consisting of a uniform background, several blood vessels, and a bone mimicking the cross section of a human finger. The background and blood vessels have a mass density of $1000 kg / m^{3}$ and an SOS of 1500 m/s. In contrast, the acoustically heterogeneous region (i.e., the bone) has a density of $1850 kg / m^{3}$ and an SOS of 3800 m/s. A 512-element full-ring detector array with a diameter of 50 mm enclosing the phantom is used for imaging. Figure 17(b) shows the image reconstructed by the TR algorithm under the assumption that the media are homogeneous. A typical SOS of 1505 m/s and a density of $1050 kg / m^{3}$ are used. In this case, extensive image artifacts are present in the reconstructed image due to the inaccurate setting of the SOS and density for the TR model. As a comparison, Fig. 17(c) presents the image reconstructed by the TR algorithm coupling the true SOS and density of the heterogeneous media. The results show that by incorporating the acoustic properties of media, the TR algorithms can achieve improved image reconstruction in PACT.

Figure 17.TR-based image reconstruction in acoustically heterogeneous media. (a) A numerical phantom consisting of multiple blood vessels and a bone mimicking the cross section of a human finger. A 512-element full-ring detector array (dashed circle) with a diameter of 50 mm enclosing the phantom is used for imaging. (b) Image reconstructed by TR using a constant SOS (1505 m/s) and a constant density ( $1050 kg / m^{3}$ ). (c) Image reconstructed by TR coupling the true SOS and density of the media. Please refer to the text for the simulation settings.

Download full size

View all figures

Another important feature of TR is that it can address the image reconstruction problem in arbitrary closed detection geometry. Figures 18(a)–18(c) show an imaging example, where a 2D multi-disk phantom is enclosed by three different closed detection geometries, including a square, an octagon, and a circle. The octagon and the circle in Figs. 18(b) and 18(c) can be circumscribed by the square in Fig. 18(a). Figures 18(d)–18(f) show the corresponding images reconstructed by the TR algorithm, which shows that TR can produce good results for all three detection geometries.

Figure 18.TR-based image reconstruction under different detection geometries. (a)–(c) Schematic diagrams showing a phantom and three different detection geometries. The square detection geometry has a side length of 50 mm, the octagonal geometry has a side length of 20.7 mm, and the circular geometry has a diameter of 50 mm. (d)–(f) Corresponding images reconstructed by TR.

Download full size

View all figures

3.4.2 Modified time reversal based on the Neumann series

The standard TR algorithms discussed above are unable to find analytical solutions for the wave equations when the SOS is variable or the spatial dimension of an acoustic field is even. To address this problem, a Neumann-series-based TR algorithm was proposed by Stefanov and Uhlmann^[174] and further developed by Qian et al.^[175]. The basic idea of the Neumann series-based TR algorithm is to first modify the measurement data at $t = T$ using a harmonic extension operator, perform TR reconstruction, and then repeat the TR process^[175]. The Neumann series-based TR algorithm can yield exact reconstruction if the detected pressure $p (r_{d}, t)$ is known on the whole boundary $Ω_{0}$ and the measurement time $T$ is greater than a stability threshold^[175].

In the Neumann-series-based TR algorithm, the initial conditions [Eq. (68)] for the coupled wave equations are modified to^[63,175] ${p (r, t) |}_{t = 0} = P_{Ω} p (r, T), {u (r, t) |}_{t = 0} = 0, p (r_{d}, t) = p (r_{d}, T - t),$ (76)where $P_{Ω}$ denotes the Poisson operator of the harmonic extension defined as $P_{Ω} h (r, T) = ϕ (r)$ , and $ϕ (r)$ is the solution of the elliptic boundary value problem $\nabla^{2} ϕ (r) = 0, {ϕ (r) |}_{r \in Ω_{0}} = {h (r, T) |}_{r \in Ω_{0}},$ (77)where $Ω$ is the domain defined by a detection surface and $Ω_{0}$ is the boundary of the domain.

By introducing a forward acoustic propagation operator $A$ and a modified TR reconstruction operator $A_{modify}^{TR}$ , the photoacoustic forward problem and the modified TR reconstruction can be expressed as $p (r_{d}, t) = A p_{0} (r),$ (78) $p_{0} (r) = A_{modify}^{TR} p (r_{d}, t) .$ (79)

The Neumann-series-based TR reconstruction can be expressed as^[175] $p_{0} (r) = \sum_{j = 0}^{\infty} {(I - A_{modify}^{TR} A)}^{j} A_{modify}^{TR} p (r_{d}, t),$ (80)where $I$ is an identity operator. Theoretically, Eq. (80) can provide an exact reconstruction if the variable $j$ varies from zero to $+ \infty$ . However, in practice, $j$ is always finite, and exact reconstruction is impossible. If $j$ is cut off at a particular value $m$ , the reconstruction formula in Eq. (80) can be written as $p_{0}^{m} (r) = \sum_{j = 0}^{m} {(I - A_{modify}^{TR} A)}^{j} A_{modify}^{TR} p (r_{d}, t) .$ (81)

Equation (81) can be further formulated in the form of iterative TR reconstruction^[176], i.e., $p_{0}^{m + 1} (r) = p_{0}^{m} (r) + A_{modify}^{TR} [p (r_{d}, t) - A p_{0}^{m} (r)] .$ (82)

The estimated initial photoacoustic pressure $p_{0}^{m} (r)$ usually contains pixels with negative intensity due to non-ideal imaging conditions, such as finite detector bandwidth and limited view angle. To improve reconstruction results, non-negative constraints can be imposed on the solutions during iteration. Moreover, the Poisson operator of harmonic extension can be neglected in some cases, e.g., when the spatial dimension of an acoustic field is even and the medium is homogeneous or when the time $T$ is large enough to guarantee that acoustic waves inside a detection region decay sufficiently. Figure 19 shows an example illustrating Neumann-series-based TR reconstruction in limited-view imaging.

Figure 19.Iterative TR-based image reconstruction for limited-view imaging. (a) Schematic diagram of a phantom and a limited-view detector array. The detector array has 455 detectors uniformly distributed over a 50-mm-diameter partial circle with a view angle of $3 / 4 π$ . (b)–(d) Images reconstructed by the Neumann-series-based TR algorithm after 3, 5, and 10 iterations.

Download full size

View all figures

Table 7 summarizes representative work on TR-based image reconstruction in PACT.

Table 7. Representative Work on TR-Based Image Reconstruction in PACT

View table

View all Tables

Table 7. Representative Work on TR-Based Image Reconstruction in PACT


Author	Year	Solution	Media	Source
Qian et al.	2011	Numerical	Heterogeneous (exact solution)	[175]
Treeby et al.	2010	Numerical	Heterogeneous, absorptive, and dispersive	[170]
Stefanov & Uhlmann	2009	Numerical	Heterogeneous (exact solution)	[174]
Hristova et al.	2008	Numerical	Heterogeneous	[171]
Burgholzer et al.	2007	Numerical	Heterogeneous	[35]
Xu & Wang	2004	Analytical	Heterogeneous	[34]

3.5 Iterative Reconstruction

The last class of conventional image reconstruction algorithms in PACT is iterative reconstruction (IR), which achieves image reconstruction through iteration. IR algorithms construct a set of linear equations based on a discrete photoacoustic imaging model and iteratively seek numerical solutions by minimizing the error between measured projection data and their estimates (see Fig. 20). Compared with the aforementioned DAS, FBP, SE, and TR algorithms, IR algorithms can yield improved photoacoustic image quality under non-ideal imaging scenarios, such as spare-view and limited-view imaging. They can also incorporate the physical models of an imaging system, such as transducer responses and acoustic heterogeneity, to achieve improved imaging. The downside of IR algorithms is that they are slow and computationally expensive due to the numerical optimization involved.

Figure 20.Principle of the IR-based image reconstruction model.

Download full size

View all figures

3.5.1 Discrete forward imaging model

The basic idea of IR algorithms is to establish a mapping relationship between the photoacoustic image to be reconstructed and projection data using a set of linear equations and then solve them iteratively. To determine the relationship, a discrete photoacoustic imaging model was established and is illustrated in Fig. 21. In the discrete model, the 2D photoacoustic image is represented by $n \times n$ evenly distributed grid points, and the value of the $j$ th grid point is $x_{j}$ . The photoacoustic signal $p (r_{d}, t)$ measured by a detector is discretely sampled, and the sampling length is $K$ . The $k$ th sampling point of the photoacoustic signal, denoted as $p_{k}$ , relates to the spherical shell integral of the photoacoustic image over the $k$ th detection shell, whose radius equals the TOF of the photoacoustic signal ( $k Δ t$ , where $Δ t$ is the temporal sampling interval) multiplied by the SOS $v_{0}$ . The goal of IR-based image reconstruction is to solve for $x_{j}$ from the projection data $p_{k}$ .

Figure 21.Discrete photoacoustic imaging model in IR. The photoacoustic image is discretely represented by $n \times n$ evenly distributed grid points. The projection data $p_{k}$ measured by a detector correspond to the spherical shell integral of the photoacoustic image over the $k$ th detection shell.

Download full size

View all figures

Based on the discrete imaging model, the relationship between the unknowns $x_{j}$ and the projection data $p_{k}$ can be described by the following set of linear equations: ${\begin{matrix} a_{1, 1} x_{1} + a_{1, 2} x_{2} + \dots + a_{1, N} x_{N} = p_{1}, \\ a_{2, 1} x_{1} + a_{2, 2} x_{2} + \dots + a_{2, N} x_{N} = p_{2}, \\ ⋮ \\ a_{K, 1} x_{1} + a_{K, 2} x_{2} + \dots + a_{K, N} x_{N} = p_{K}, \end{matrix}$ (83)which can be written as $\sum_{j = 1}^{N} a_{k j} x_{j} = p_{k}, k = 1, 2, \dots, K,$ (84)where $a_{k j}$ is a weighting factor representing the contribution of the $j$ th grid point to the $k$ th detection shell, $N = n \times n$ is the total number of grid points, and $K$ is the total number of sampling points of a detector. If we consider the projection data measured by all $M$ detectors and denote $A = [\begin{matrix} \begin{matrix} a_{1, 1} \\ a_{2, 1} \end{matrix} & \begin{matrix} a_{1, 2} \\ a_{2, 2} \end{matrix} & \dots & \begin{matrix} a_{1, N} \\ a_{2, N} \end{matrix} \\ \begin{matrix} ⋮ \\ a_{K, 1} \\ a_{K + 1, 1} \end{matrix} & \begin{matrix} ⋮ \\ a_{K, 2} \\ a_{K + 1, 2} \end{matrix} & \begin{matrix} ⋮ \\ \dots \\ \dots \end{matrix} & \begin{matrix} ⋮ \\ a_{K, N} \\ a_{K + 1, N} \end{matrix} \\ ⋮ & ⋮ & ⋮ & ⋮ \\ a_{M K, 1} & a_{M K, 2} & \dots & a_{M K, N} \end{matrix}], x = [\begin{matrix} x_{1} \\ x_{2} \\ ⋮ \\ x_{j} \\ ⋮ \\ x_{N} \end{matrix}], p = [\begin{matrix} p_{1} \\ p_{2} \\ ⋮ \\ p_{K} \\ p_{K + 1} \\ ⋮ \\ p_{M K} \end{matrix}],$ (85)Eq. (83) can be written in matrix form as $A x = p,$ (86)where $A$ is the system matrix that transforms the initial photoacoustic pressure $x$ to the measured projection data $p$ . To solve Eq. (86), the system matrix $A$ should be constructed first.

3.5.2 System matrix construction

A. System matrix construction for acoustically homogeneous media

If the imaging medium is acoustically homogeneous, the photoacoustic wave equations have an explicit analytical solution, as presented in Eq. (18). The solution describes how the measured photoacoustic signal $p (r_{d}, t)$ relates to a photoacoustic source $p_{0} (r_{s})$ . Discretizing the solution [Eq. (18)], the system matrix $A$ can be constructed as ${[p]}_{(m - 1) K + k} = {[\frac{1}{4 π v_{0}^{2}} \frac{\partial}{\partial t} \int \frac{p_{0} (r_{s})}{| r_{s} - r_{d}^{m} |} δ (t - \frac{| r_{s} - r_{d}^{m} |}{v_{0}}) d r_{s}]}_{t = k Δ t},$ (87)where $r_{d}^{m}$ denotes the position of the $m$ th detector, $Δ t$ is the temporal sampling interval, and ${[p]}_{(m - 1) K + k}$ is the photoacoustic signal measured by the $m$ th detector at time $k Δ t$ . To calculate Eq. (87), $p_{0} (r_{s})$ can be expanded using a set of basis functions as^[38] $p_{0} (r_{s}) \approx \sum_{j = 1}^{N} x (r_{s}^{j}) ψ (r_{s}^{j}),$ (88)where $r_{s}^{j}$ represents the position of the $j$ th grid point in the discrete image $x$ , and $ψ (r_{s}^{j})$ is the basis function at the $j$ th grid point. Substituting Eq. (88) into Eq. (87), a discrete imaging model can be obtained as ${[p]}_{(m - 1) K + k} = {\sum_{j = 1}^{N} [\frac{1}{4 π v_{0}^{2}} \frac{\partial}{\partial t} \int \frac{ψ (r_{s}^{j})}{| r_{s} - r_{d}^{m} |} δ (t - \frac{| r_{s} - r_{d}^{m} |}{v_{0}}) d r_{s}] x (r_{s}^{j})}_{t = k Δ t} .$ (89)

Comparing Eqs. (84) and (89), the elements of the system matrix $A$ can be obtained as ${[A]}_{(m - 1) K + k, j} = {[\frac{1}{4 π v_{0}^{2}} \frac{\partial}{\partial t} \int \frac{ψ (r_{s}^{j})}{| r_{s} - r_{d}^{m} |} δ (t - \frac{| r_{s} - r_{d}^{m} |}{v_{0}}) d r_{s}]}_{t = k Δ t},$ (90)where ${[A]}_{(m - 1) K + k, j}$ denotes the element in the $[(m - 1) K + k]$ th row and $j$ th column of the system matrix $A$ .

To construct a system matrix with sufficient accuracy, it is necessary to choose a suitable expansion function $ψ$ . Several expansion functions are available for this purpose. Among them, a spherical-voxel function^{[36,177–179]}, Kaiser–Bessel functions^[38], and linear interpolation functions^{[37,40,42,178,180]} are the most commonly used. Figure 22 shows a schematic of the three expansion-function-based discrete grid models.

Figure 22.Illustration of the discrete grid models based on different expansion functions. (a) Discrete grid model based on a spherical voxel. (b) Discrete grid model based on the Kaiser–Bessel function. (c) Discrete grid model based on the bilinear interpolation. The red dot in (c) represents the point to be interpolated.

Download full size

View all figures

The expansion function $ψ$ can help interpret the meaning of the system matrix $A$ . Suppose that a photoacoustic image includes only a source defined by an expansion function $ψ$ at grid point $j$ . The element ${[A]}_{(m - 1) K + k, j}$ of the system matrix $A$ is the response of the $m$ th detector at time $k Δ t$ , as illustrated in Fig. 23. This implies that to construct the system matrix $A$ , we can simply compute the photoacoustic signals measured by each detector and arrange the signals in a way according to Fig. 23. Since the system matrix $A$ is only associated with the discrete photoacoustic imaging model (Fig. 21) and the expansion function $ψ$ [Eq. (90)], it can be used for different experiments once constructed for a particular imaging system.

Figure 23.Schematic diagram illustrating the meaning of the elements in a system matrix. (a) Spherical-voxel-function-based discrete imaging model. The photoacoustic signals generated from the spherical voxel at the $j$ th grid point are recorded by detectors. (b) Elements of the system matrix. The $j$ th column of the system matrix corresponds to the photoacoustic signal of the $j$ th spherical voxel in (a).

Download full size

View all figures

A1. Spherical-voxel-based system matrix

A spherical voxel function is defined as^[181] $ψ (r^{j}) = {\begin{matrix} 1, & for | r - r^{j} | \leq ε / 2, \\ 0, & for | r - r^{j} | > ε / 2, \end{matrix}$ (91)where $ε$ is the spacing between two grid points, and $r^{j} = {(x^{j}, y^{j}, z^{j})}^{T}$ denotes the coordinate of the $j$ th grid point. In a spherical-voxel-based discrete imaging model, each grid point in the image to be reconstructed is regarded as a uniform spherical source^{[36,177–179]}. It is possible to directly construct the system matrix $A$ based on the definition of the spherical voxel function. However, there may be problems. This is because a spherical voxel is typically small in radius. The photoacoustic signals are short in time and require a high sampling rate, which leads to an increase in computational complexity or aliasing artifacts^[181]. To address this problem, a spherical-voxel-based system matrix can be constructed in the frequency domain^[181].

Define $P$ and $\tilde{A}$ as the discrete Fourier transforms of the measured projection data $p$ and the system matrix $A$ . The matrix form of the imaging model in Eq. (86) can be rewritten as^[38,42] $\tilde{A} x = P,$ (92)and the elements of the system matrix are given by^[38,42] ${[\tilde{A}]}_{(m - 1) K + k, j} = {P_{0} (f) {\tilde{h}}_{SIR}^{m} (r_{s}^{j}, f) |}_{f = k Δ f},$ (93)where $f$ denotes the frequency, $Δ f$ is the frequency sampling interval, ${\tilde{h}}_{SIR}^{m} (r_{s}^{j}, f)$ is the Fourier transform of the SIR of the $m$ th detector and can be written as^[38,42] ${\tilde{h}}_{SIR}^{m} (r_{s}^{j}, f) = \frac{1}{2 π | r_{s}^{j} - r_{d}^{m} |} \exp (- i 2 π f \frac{| r_{s}^{j} - r_{d}^{m} |}{v_{0}}),$ (94)and $P_{0} (f)$ is the Fourier transform of the signal generated from a photoacoustic source defined by a spherical voxel and is given as^[38,42] $P_{0} (f) = - i \frac{v_{0}}{f} [\frac{ε}{2 v_{0}} \cos (\frac{π f ε}{v_{0}}) - \frac{1}{2 π f} \sin (\frac{π f ε}{v_{0}})] .$ (95)

Combining Eqs. (93 )–(95), the spherical-voxel-based system matrix $A$ can be constructed in the frequency domain. This frequency domain approach can solve the sampling problem of high-frequency photoacoustic signals and make it easier to incorporate detector responses (see Sec. 3.5.3).

A2. Kaiser–Bessel-function-based system matrix

The Kaiser–Bessel (KB) functions are radially symmetric functions defined as^[182,183] $b (r) = {\begin{cases} {(\sqrt{1 - r^{2} / a^{2}})}^{n} \frac{I_{n} (γ \sqrt{1 - r^{2} / a^{2}})}{I_{n} (γ)}, & if 0 \leq r \leq a, \\ 0, & otherwise, \end{cases}$ (96)where $I_{n}$ represents the modified Bessel function of the first kind of order $n$ ^[184], $a$ is the radius of support, and $γ$ is a shape parameter governing the width of the radial profile. When $n = 0$ , $a = ε / 2$ , and $γ = 0$ , the KB function reduces to the spherical voxel function [Eq. (91)]. The KB-function-based expansion function can be expressed as $ψ (r^{j}) = b (| r - r^{j} |),$ (97)where $r^{j} = {(x^{j}, y^{j}, z^{j})}^{T}$ denotes the coordinate of the $j$ th grid point.

The elements of the KB-function-based system matrix have an analytical solution and can be calculated in the frequency domain^[38]. The derivation is similar to that of the spherical-voxel-based system matrix described in Eqs. (92 )–(94) except that Eq. (95) should be replaced with $P_{0} (f) = - \frac{i 4 π^{2} f a^{3}}{v_{0}^{2} I_{n} (γ)} \frac{j_{n + 1} (\sqrt{4 π^{2} a^{2} f^{2} / v_{0}^{2} - γ^{2}})}{{(4 π^{2} a^{2} f^{2} / v_{0}^{2} - γ^{2})}^{(n + 1) / 2}} .$ (98)Here $f$ denotes the frequency, $P_{0} (f)$ is the Fourier transform of the signal generated from the photoacoustic source defined by a KB function, and $j_{n + 1}$ is the spherical Bessel function of the first kind of order $n + 1$ ^[184].

In the spherical-voxel-based imaging model, the expansion function $ψ$ is a sphere with uniform intensity, which is not differentiable at the boundary. Therefore, generated photoacoustic signals have an infinite bandwidth and may induce numerical instabilities when calculating the system matrix using Eq. (90). In the KB-function-based imaging model, the expansion function $ψ$ is also a sphere but with a smoothly varying intensity. It is differentiable at any position and thus can avoid numerical instabilities encountered by the spherical-voxel-based imaging model.

The shape of the KB function can be adjusted as needed. The parameter $n$ affects the differentiability of the KB function and is typically set to a value greater than 2 ( $n \geq 2$ ) to guarantee the continuity of derivatives at the boundary. The parameter $a$ determines the effective size of a voxel and can be selected according to the desired spatial resolution. The parameter $γ$ describes the smoothness of the KB function and impacts the width of the radial profile. It typically has a value of 10.4 for image reconstruction in PACT but may vary depending on application scenarios. Figure 24 illustrates the profiles of a family of KB functions for a given set of parameters. More details about the KB functions can be found in^[185].

Figure 24.KB functions with different parameters. (a) Profiles of the KB functions with four groups of parameters. (b) 3D visualization of the KB function with $a = 1$ , $n = 2$ , and $γ = 10.4$ .

Download full size

View all figures

A3. Interpolation-function-based system matrix

In addition to the spherical-voxel- and KB-function-based approaches, the system matrix can also be constructed using linear interpolation functions. Several different interpolation functions can be used as the expansion function to describe the mapping relation between a continuous image and its discrete counterpart. Generally, a high-order interpolation function helps produce high-accuracy results, which, however, involves more calculations. Considering the trade-off between accuracy and efficiency, bilinear and trilinear interpolation functions are commonly used for 2D^[43] and 3D^[38,40,42] IR models, respectively. Specifically, when the trilinear interpolation method is employed in 3D space, the expansion function can be expressed as^[42] $ψ (r^{j}) = {\begin{cases} (1 - \frac{| x - x^{j} |}{ε}) (1 - \frac{| y - y^{j} |}{ε}) (1 - \frac{| z - z^{j} |}{ε}), & for | x - x^{j} |, | y - y^{j} |, | z - z^{j} | \leq ε, \\ 0, & otherwise, \end{cases}$ (99)where $r^{j} = {(x^{j}, y^{j}, z^{j})}^{T}$ denotes the coordinate of the $j$ th grid point, and $ε$ is the spacing between two grid points. The equation implies that the non-zero values of the expansion function $ψ$ exist only in the $ε$ -neighborhood of position $r^{j}$ . The image value at any position can be interpolated from its four neighbors in 2D space or eight neighbors in 3D space.

The interpolation-function-based system matrix $A$ can be constructed in two steps as^[42] $p = A x \equiv D G x,$ (100)where the matrix $G$ represents the spherical Radon transform in 3D [Eq. (27)], and the matrix $D$ represents the differential operation in Eq. (18). The matrix $G$ can be obtained by accumulating all image grid points intersecting with an integrating spherical shell and satisfies the following relationship^[42]: ${[G x]}_{(m - 1) K + k} = ε^{2} \sum_{j = 1}^{N} {[x]}_{j} \sum_{p = 1}^{N_{p}} \sum_{q = 1}^{N_{q}} ψ (r_{k, p, q}^{j}) \equiv {[g]}_{(m - 1) K + k},$ (101)where ${[g]}_{(m - 1) K + k} \approx g (r_{d}^{m}, t = k Δ t)$ is the approximation of the spherical Radon transform in Eq. (27), and $N_{p}$ and $N_{q}$ are the numbers of angular divisions over the polar and azimuth directions, respectively, on a local spherical coordinate system centered at $r_{d}^{m}$ ^[42]. The differential matrix $D$ can be obtained using a difference operation and satisfies the following relationship^[42]: ${[D g]}_{(m - 1) K + k} = \frac{1}{8 π v_{0}^{2} Δ t^{2}} [\frac{{[g]}_{(m - 1) K + k + 1}}{k + 1} - \frac{{[g]}_{(m - 1) K + k - 1}}{k - 1}] \equiv {[p]}_{(m - 1) K + k} .$ (102)

The preceding procedure implies that the system matrix $A$ can be constructed via the two matrices $G$ and $D$ , while the matrix elements in $A$ do not need to be explicitly calculated.

In addition to the implicit method discussed above, the interpolation-function-based system matrix can also be constructed by explicitly calculating its elements via analytical methods^{[37,40,42,43]}. The analytical methods typically have better computational stability due to more accurate calculations of derivatives.

B. System matrix construction for acoustically heterogeneous media

The system matrix discussed above is primarily constructed based on the analytical solution of the photoacoustic wave equations [Eq. (18)], where the acoustic medium is assumed to be homogeneous. When the acoustic media are heterogeneous, the photoacoustic discrete imaging model can still be described by the coupled photoacoustic wave equations [Eqs. (63 )–(67)], which, however, have no analytical solutions. Based on the k-space pseudospectral method (see Sec. 3.4.1), Huang et al. proposed a method for constructing the system matrix $A$ in acoustically heterogeneous media^[41], which is briefly reviewed below.

Define a vector containing all photoacoustic field variables at the time step $k Δ t$ as^[41] $v_{k} = {(u_{k}^{1}, u_{k}^{2}, u_{k}^{3}, ρ_{k}^{1}, ρ_{k}^{2}, ρ_{k}^{3}, p_{k})}^{T},$ (103)where $v_{k}$ is a $7 N \times 1$ vector, $N$ is the total number of grid points in an image, $u_{k}^{i}$ and $ρ_{k}^{i}$ ( $i = 1$ , 2, 3) denote the particle velocity and acoustic density in the $i$ th direction on a 3D Cartesian grid at the $k$ th time step, $p_{k}$ is the acoustic pressure at the grid points, and the superscript $T$ denotes the matrix transpose. The update of the field variables from $t = 0$ to $t = (K - 1) Δ t$ is described by ${[v_{0}, v_{1}, \dots, v_{K - 1}]}^{T} = T_{K - 1} T_{K - 2} \dots T_{1} {[v_{0}, 0_{7 N \times 1}, \dots, 0_{7 N \times 1}]}^{T},$ (104)where $T$ is a $7 N K \times 7 N K$ matrix, and $K$ is the total number of time steps. The vector $[v_{0}, 0_{7 N \times 1}, \dots, 0_{7 N \times 1}]$ represents the values of the field variables at $t = 0$ and can be calculated from the initial photoacoustic pressure $p_{0}$ as ${[v_{0}, 0_{7 N \times 1}, \dots, 0_{7 N \times 1}]}^{T} = T_{0} p_{0},$ (105)where $T_{0}$ is a $7 N K \times N$ matrix that maps the initial pressure $p_{0}$ to the initial value of the field variables at time $t = 0$ ^[41]. Since detectors generally do not coincide with image grid points, the measured photoacoustic projection data $p$ can be related to the field quantities via interpolation as^[41] $p = S {[v_{0}, v_{1}, \dots, v_{K - 1}]}^{T},$ (106)where $S$ denotes the interpolation operation, such as the Delaunay triangulation-based interpolation. Combining Eqs. (104 )–(106), we have $p = S T_{K - 1} \dots T_{1} T_{0} p_{0} .$ (107)

By comparing Eq. (107) with Eq. (86), the system matrix can be obtained as $A = S T_{K - 1} \dots T_{1} T_{0} .$ (108)

As mentioned above, the $j$ th column of system matrix $A$ corresponds to the response of a detector to the source defined by an expansion function $ψ$ at grid point $j$ . Therefore, the system matrix $A$ in heterogeneous media can be constructed by computing the response of a detector when the photoacoustic source defined by an expansion function $ψ$ traverses all image grid points, which can be calculated using the k-Wave toolbox^[186].

In a homogeneous medium, the construction of the system matrix $A$ can be simpler. The response of a detector to an expansion-function-defined source at the first (reference) image grid point is computed and serves as the first column of $A$ . The remaining columns of $A$ can be obtained by time-shifting the elements in the first column while taking into account the signal attenuation according to the relative position between the current and the reference grid points. This can greatly improve the construction speed of the system matrix $A$ .

3.5.3 Transducer response modeling

The IR model discussed above is established based on the assumption that an ultrasound detector is point-like in shape and has an infinite bandwidth. However, a real ultrasound detector always has a finite aperture size and a limited bandwidth, which makes the IR model less accurate. To address this problem, it is necessary to incorporate the SIR and EIR of an ultrasound detector (see Sec. 2.3) to make the system matrix $A$ as realistic as possible.

A. EIR modeling

When the system matrix is constructed in the time domain, the discrete photoacoustic imaging model [Eq. (86)] incorporating the EIR of a detector can be written as^[37] $F^{- 1} \tilde{E} F A x = p,$ (109)where the matrix $\tilde{E}$ denotes the Fourier transform of the EIR of a detector, and $F$ and $F^{- 1}$ represent the forward and inverse Fourier transforms, respectively. In contrast, when the system matrix is constructed in the frequency domain using the spherical voxel or the KB function, the EIR of a detector can be incorporated into a system matrix via^[39] ${[\tilde{A}]}_{(m - 1) K + k} = {P_{0} (f) {\tilde{h}}_{SIR}^{m} (r_{s}^{j}, f) {\tilde{h}}_{EIR} (f) |}_{f = k Δ f} .$ (110)Here, $P_{0}$ is the Fourier transform of the acoustic pressure generated from the photoacoustic source defined by the spherical voxel or the KB function, $f$ is the frequency, $Δ f$ is the frequency sampling interval, $k$ is the index of sampling points, and ${\tilde{h}}_{SIR}^{m}$ and ${\tilde{h}}_{EIR}$ are the Fourier transforms of the SIR and EIR of a detector, which can be modeled or measured in practice.

B. SIR modeling

B1. Far-field SIR

Several SIR models have been proposed and used for image reconstruction in PACT^[187,188]. When a detector has a flat and rectangular surface with an area of $a \times b$ [see Fig. 25(a)] and the condition of far-field approximation^[189] is met, the temporal Fourier transform of the SIR of a detector can be expressed as^[187] ${\tilde{h}}_{SIR}^{m} (r_{s}^{j}, f) = \frac{a b \exp (- i 2 π f \frac{| r_{s}^{j} - r_{d}^{m} |}{v_{0}})}{2 π | r_{s}^{j} - r_{d}^{m} |} sinc (π f \frac{a x_{\det}^{j m}}{v_{0} | r_{s}^{j} - r_{d}^{m} |}) sinc (π f \frac{b y_{\det}^{j m}}{v_{0} | r_{s}^{j} - r_{d}^{m} |}),$ (111)where $x_{\det}^{j m}$ and $y_{\det}^{j m}$ represent the coordinates of the $j$ th grid in a local coordinate system at the $m$ th detector. Other variables have the same meanings as those defined in Eq. (90). With Eq. (111), the system matrix $A$ can be constructed according to Eq. (93).

Figure 25.Detector models for building SIR. (a) Detector model under the condition of far-field approximation. (b) Detector model based on patches ( $n = 2$ ). (c) Detector model based on discrete detection elements.

Download full size

View all figures

B2. Patch-based SIR

When the far-field approximation condition does not hold, the far-field SIR model may lead to inaccurate reconstruction results. To address this problem, the surface of a detector can be divided into small patches [Fig. 25(b)]. For each small patch, the aforementioned far-field assumption is approximately met. The SIR of each small patch can be characterized by Eq. (111), and the SIR of the whole detector is obtained by summing the SIRs of all small patches. Suppose that a detector has an area of $a \times b$ and is divided into $n \times n$ patches. Denoting $a^{'}$ and $b^{'}$ as the length and width of each small patch (i.e., $a^{'} = a / n$ , $b^{'} = b / n$ ), we have ${\tilde{h}}_{SIR}^{m} (r_{s}^{j}, f) \approx \frac{1}{n^{2}} \sum_{l = 1}^{n^{2}} {\tilde{h}}_{SIR}^{m, l} (r_{s}^{j}, f) .$ (112)

Here ${\tilde{h}}_{SIR}^{m, l} (r_{s}^{j}, f)$ is the SIR of the $l$ th patch, which can be calculated using Eq. (111).

B3. Discrete-detection-element-based SIR

The far-field and patch-based SIR models are easy to incorporate into the system matrix based on the spherical voxel function or the KB function but are difficult to incorporate into the system matrix based on linear interpolation or the k-space pseudospectral method. Huang et al.^[41] and Ding et al.^[40] solved this problem by dividing the entire detector surface into a set of detection elements with equal areas, as shown in Fig. 25(c). In this case, the acoustic pressure $p (r_{d}, t)$ recorded by a detector can be approximately written as^[40] $p (r_{d}, t) \approx \sum_{l = 1}^{L} p (r_{d}^{l}, t) σ,$ (113)where $L$ is the total number of detection elements, $σ$ is the area of a detection element (dimensionless), and $r_{d}^{l}$ is the position of the $l$ th detection element. $p (r_{d}^{l}, t)$ is the acoustic pressure recorded at position $r_{d}^{l}$ and time $t$ and can be written in matrix form as [Eq. (86)] $A_{l} x = p_{l},$ (114)where $p_{l}$ is the acoustic pressure recorded by the $l$ th detection element, and $A_{l}$ is the system matrix for the $l$ th detection element. Substituting Eq. (114) into Eq. (113), we have $\sum_{l = 1}^{L} A_{l} x σ \approx p .$ (115)

The system matrix $A$ in the discrete imaging model in Eq. (86) is replaced with a weighted summation of $L$ system matrices $A_{l}$ in the detection-element-based SIR model. To achieve accurate representation, the size of the discrete detection element should be small, preferably much less than the acoustic wavelength^[41]. Therefore, the discrete-detection-element-based SIR model involves more computations and is generally more time-consuming than the far-field- and patch-based SIR models.

3.5.4 Iterative reconstruction

Once the system matrix $A$ of a PACT imaging system is constructed, the initial photoacoustic pressure $x$ can be recovered from the measured projection data $p$ by solving Eq. (86). In principle, the linear photoacoustic imaging model in Eq. (86) can be solved by the pseudo-inverse matrix method, i.e., $x = A^{†} p,$ (116)where $A = {(A^{H} A)}^{- 1} A^{H}$ is the Moore–Penrose pseudo-inverse matrix of $A$ , and the superscript $H$ denotes the conjugate transpose. However, the pseudo-inverse method is not commonly used in practice because the system matrix $A$ is generally too large to be inverted^[61]. Alternatively, the imaging model can be solved iteratively. In this way, the image reconstruction problem is equivalent to solving the least-square problem $\hat{x} = \arg \min_{x \geq 0} \frac{1}{2} {‖ p - A x ‖}_{2}^{2},$ (117)where ${‖ \cdot ‖}_{2}$ represents the $L_{2}$ -norm, and $\hat{x}$ is the approximate solution of the least-square problem. Since the initial acoustic pressure is non-negative, a non-negativity constraint is imposed on Eq. (117). Generally, the optimization problem in Eq. (117) is ill-posed under non-ideal imaging conditions. To address this problem, Eq. (117) is typically regularized by a penalty function and can be rewritten as $\hat{x} = \arg \min_{x \geq 0} \frac{1}{2} {‖ p - A x ‖}_{2}^{2} + λ R (x),$ (118)where ${‖ p - A x ‖}_{2}^{2}$ is the data fidelity term, $R (x)$ is a regularization term, and $λ$ is a regularization coefficient controlling the weight of regularization, which can be automatically optimized by methods such as the generalized cross-validation (GCV) method^[190] and the L-curve method^[191,192].

One popular regularization technique used in Eq. (118) is Tikhonov regularization, which can be expressed as^[63] $R (x) = {‖ Γ x ‖}_{2}^{2},$ (119)where $Γ$ is the Tikhonov matrix. In many cases, the Tikhonov matrix is chosen as the identity matrix ( $Γ = I$ ), giving preference to solutions with smaller norms. When the regularization term $R (x)$ is convex and differentiable, such as in Tikhonov regularization, the optimization problem in Eq. (118) can be solved by optimization methods such as stochastic gradient descent (SGD)^[193,194], least-square QR (LSQR)^[195], and conjugate gradient (CG)^[196]. When a gradient descent method^[197] is used, the optimization problem in Eq. (118) can be solved by $x^{k + 1} = x^{k} - α [A^{*} (A x^{k} - p) + λ \frac{\partial R (x^{k})}{\partial x^{k}}],$ (120)where $k$ is the iteration number, $α$ is the step size controlling the update rate, $A^{*} (A x^{k} - p)$ is the gradient of the fidelity term ${‖ A x - p ‖}_{2}^{2}$ , and $A^{*}$ is the adjoint of the matrix $A$ .

Tikhonov regularization emphasizes the smoothness of an image and tends to produce images with blurred edges. An alternative to Tikhonov regularization is sparse regularization, which is non-smooth and aims to find a solution with the minimum number of non-zero elements in certain transform domains^[198–200]. $L_{1}$ -norm-based regularization is one of the most commonly used sparse regularization methods and has the following form^[63,201]: $R (x) = {‖ Φ x ‖}_{1},$ (121)where $Φ$ is a transformation matrix that can be constructed using sparse basis functions, such as the wavelet function and finite difference operators. When a finite difference operator is employed, the $L_{1}$ -norm regularization becomes total variation (TV) regularization^[181], which can be written as $R (x) = {‖ x ‖}_{TV} = \sum_{j = 1}^{N} \sqrt{{({[x]}_{j} - {[x]}_{j_{x} - 1})}^{2} + {({[x]}_{j} - {[x]}_{j_{y} - 1})}^{2} + {({[x]}_{j} - {[x]}_{j_{z} - 1})}^{2}},$ (122)where $j_{x}$ − 1, $j_{y}$ − 1, and $j_{z}$ − 1 are the neighboring grids of the $j$ th grid along the $x$ -axis, $y$ -axis, and $z$ -axis, respectively, and $N$ is the total number of image grid points. Sparse regularization involves non-smooth cost functions and can be solved using proximal point algorithms, such as the iterative shrinkage thresholding algorithm (ISTA) and its accelerated version fast ISTA (FISTA)^[202], the iteratively reweighted least squares (IRLS)^[203], and the alternating direction method of multipliers (ADMM)^[204]. Specifically, the optimization problem in Eq. (118) can be solved by a proximal-gradient-descent scheme^[45]: $x^{k + 1} = {prox}_{R, λ} [x^{k} - α A^{*} (A x^{k} - p)],$ (123)where ${prox}_{R, λ}$ is a proximal operator related to the regularization term $R (x)$ and the parameter $λ$ . From Eqs. (120) and (123), we know that the update process of an IR algorithm is related to the current reconstructed image $x^{k}$ , the regularization term $R (x)$ , and the gradient term $A^{*} (A x^{k} - p)$ , as shown in Fig. 20.

Combining different regularization strategies may help achieve improved image quality compared with using only a single type of regularization^[205]. When dual regularization is imposed, the regularization term can be written as $R (x) = λ_{1} R_{1} (x) + λ_{2} R_{2} (x),$ (124)where $R_{1} (x)$ and $R_{2} (x)$ represent two types of regularization, and $λ_{1}$ and $λ_{2}$ are the corresponding coefficients. Based on this idea, Li et al. proposed non-local means and sparse-wavelet-based dual regularization and achieved image reconstruction with enhanced SNR and contrast^[205].

Figure 26 shows an example of IR-based image reconstruction in PACT. Figure 26(a) shows a numerical blood vessel phantom. The photoacoustic signals generated from the phantom are received by a 50-mm-diameter full-ring detector array with 64 elements enclosing the phantom. Figure 26(b) is the image reconstructed by FBP [Eq. (47)]. Severe streak artifacts are present in the reconstructed image due to the limited number of ultrasound detectors. Figure 26(c) is the image reconstructed by TV-based IR. The results show that the IR algorithm can produce images with fewer artifacts, demonstrating its superior performance under non-ideal imaging scenarios. The intensity profiles of the reconstructed images indicated by the white arrow are shown in Fig. 26(d) for comparison of the results of IR and FBP.

Figure 26.Image reconstruction in sparse-view imaging by IR. (a) Numerical blood vessel phantom. The photoacoustic signals generated from the phantom are received by a full-ring detector array with 64 elements enclosing the phantom. (b) Image reconstructed by FBP [Eq. (47)]. (c) Image reconstructed by TV-based IR. (d) Intensity profiles of the reconstructed images indicated by the arrow in (a). GT: ground truth.

Download full size

View all figures

Table 8 summarizes representative work on IR-based image reconstruction in PACT.

Table 8. Representative Work on IR-Based Image Reconstruction in PACT

View table

View all Tables

Table 8. Representative Work on IR-Based Image Reconstruction in PACT


Author	Year	System Matrix Model	Media	Detector Response	Regularization	Optimization	Dim	Source
Yalavarthy et al.	2021	k-space PSTD	Heterog	EIR	Non-local means + TV	IRLS	3D	[206]
Biton et al.	2019	Interpolation	Homog	–	Adaptive anisotropic TV	Chambolle-Pock	2D	[201]
Li et al.	2019	Interpolation	Homog	–	Non-local means + $L_{1}$ - norm	GD, FISTA	2D	[205]
Prakash et al.	2019	Interpolation	Homog	–	Second-order TV	CG	2D	[207]
Ding et al.	2017	Interpolation	Homog	SIR	–	LSQR	3D	[40]
Ding et al.	2016	Interpolation	Homog	–	–	LSQR	2D	[43]
Liu et al.	2016	Curve-driven	Homog	–	–	LSQR	2D	[208]
Javaherian& Holman	2016	k-space PSTD	Homog	–	TV	Multi-grid ISTA, FISTA	2D	[209]
Wang et al.	2014	KB function	Homog	EIR, SIR	Quadratic smoothness penalty/Tikhonov	Linear CG	3D	[38]
Mitsuhashi et al.	2014	Spherical voxel	Homog	EIR, SIR	Quadratic smoothness penalty	Fletcher-Reeves CG	3D	[189]
Huang et al.	2013	k-space PSTD	Heterog	EIR, SIR	TV	FISTA	2D	[41]
Wang et al.	2012	Spherical voxel	Homog	EIR, SIR	Quadratic smoothness Laplacian/TV	Fletcher-Reeves CG, FISTA	3D	[181]
Deán-Ben et al.	2012	Interpolation	Heterog (Weak)	–	Second-order Tikhonov	LSQR	2D	[210]
Deán-Ben et al.	2012	Interpolation	–	–	Tikhonov	LSQR	3D	[178]
Rosenthal et al.	2011	Interpolation	Homog	SIR	–	Pseudo-inverse, LSQR	2D	[211]
Wang et al.	2011	Spherical voxel	Homog	EIR, SIR	Quadratic smoothness penalty	Fletcher-Reeves CG	3D	[39]
Rosenthal et al.	2010	Interpolation	Homog	EIR	–	Pseudo-inverse, LSQR	2D	[37]
Provost & Lesage	2009	–	Homog	EIR	$L_{1}$ -norm	LASSO	2D	[198]
Paltauf et al.	2002	Spherical voxel	Homog	–	–	Algebraic iteration	3D	[36]

4 Deep Learning Approaches

The aforementioned conventional approaches can achieve high-quality image reconstruction under ideal imaging conditions. However, they may face challenges under non-ideal imaging conditions, such as limited detector bandwidth, finite detector aperture, limited view angle, sparse detector arrangement, and inhomogeneous acoustic media. Inspired by the successful applications of DL across industries such as computer vision^[212], natural language processing^[213], and biomedical engineering^[214], DL-based image reconstruction in tomography has drawn considerable attention in recent years^[215–217]. DL has been successfully used for image reconstruction in CT, magnetic resonance imaging (MRI), positron emission tomography (PET), ultrasound, and other imaging modalities^{[215,217,218]}. It can achieve tomographic image reconstruction with improved image quality and computational efficiency. In PACT, DL has been used to address a range of image reconstruction problems^[48,50], such as detector bandwidth expansion^[51,52], resolution enhancement^[53,54], low-power/cost imaging^[219,220], artifact removal^[55], reconstruction acceleration^[57], and reconstruction enhancement in sparse-view and limited-view imaging^{[44,46,47,58]}. Specifically, the applications of DL in PACT image reconstruction are mainly reflected in five aspects, including signal preprocessing in the data domain, image postprocessing in the image domain, hybrid-domain processing in both the data and the image domains, learned iterative reconstruction, and direct reconstruction.

4.1 Preprocessing in the Data Domain

Under non-ideal imaging conditions, the raw photoacoustic projection data measured by a detector array may be incomplete and contain noise. Data-domain signal preprocessing attempts to transform incomplete photoacoustic projection data to nearly complete projection data using neural networks before image reconstruction, as shown in Fig. 27.

Figure 27.DL-based projection data preprocessing in the data domain.

Download full size

View all figures

Many related studies have emerged based on this idea^[221]. Gutta et al. proposed a simple five-layer fully connected network to expand the bandwidth of measured photoacoustic projection data and achieved image reconstruction with improved contrast and quality^[51]. Awasthi et al. developed a UNet-based model for super-resolution, denoising, and bandwidth enhancement of photoacoustic projection data in sparse-view imaging and achieved improved image reconstruction, as shown in Fig. 28^[52]. Here, UNet is a specially designed U-shaped convolutional neural network, which has been widely used in various image processing tasks such as image denoising, image deconvolution, and image reconstruction. Zhang et al. designed a network based on a channel-wise attention mechanism to extract feature relations between sparse data channels and achieved full-channel projection data recovery from sparse input^[222]. Zhang et al. proposed a UNet-based network to correct photoacoustic projection data with skull-induced aberration in brain imaging and demonstrated that the aberration could be effectively removed after preprocessing^[223]. These studies show that preprocessing projection data in the data domain can help enhance image reconstruction quality.

Figure 28.DL-based preprocessing helps correct photoacoustic projection data and enhances image reconstruction quality. The photoacoustic projection data in this case were recorded by sparsely distributed detectors with finite bandwidths. (a) Architecture of the UNet used for signal preprocessing. (b) Reconstructed image of a living rat brain using the raw bandwidth-limited projection data of 100 detectors. (c) Reconstructed image using the interpolated projection data of 200 detectors. (d) Reconstructed image using the interpolated projection data denoised by automated wavelet denoising (AWD). (e) Reconstructed image using the interpolated projection data (c) processed by the super-resolution CNN (SRCNN). (f) Reconstructed image using the interpolated projection data (c) processed by the UNet in (a). Adapted from Ref. [52] with permission.

Download full size

View all figures

Table 9 lists representative work on DL-based signal preprocessing in PACT.

Table 9. Representative Work on DL-Based Signal Preprocessing in PACT

View table

View all Tables

Table 9. Representative Work on DL-Based Signal Preprocessing in PACT


Author	Year	Problem	Network	Dataset	Source
Zhang et al.	2023	Skull-induced aberration correction	UNet	Simulation	[223]
Zhang et al.	2022	Sparse-view imaging	Transformer	Simulation	[222]
Awasthi et al.	2020	Bandwidth expansion and sparse-view imaging	UNet	Simulation, phantom (test), in vivo (test)	[52]
Gutta et al.	2017	Detector bandwidth expansion	Five fully connected layers	Simulation, phantom (test)	[51]

4.2 Postprocessing in the Image Domain

In addition to signal preprocessing in the data domain, another application of DL in PACT is image postprocessing in the image domain, which indicates that deep neural networks can be used as postprocessing tools for image enhancement (see Fig. 29). This approach is especially useful for image artifacts removal in non-ideal imaging scenarios.

Figure 29.DL-based image postprocessing in the image domain.

Download full size

View all figures

To mitigate the image artifacts that often appear in sparse-view or/and limited-view PACT images, a variety of DL-based postprocessing methods have been proposed. For example, Antholzer et al. proposed a UNet-based architecture to process images reconstructed by FBP in sparse-view imaging and demonstrated the effectiveness of the model in artifact removal^[44]. Farnia et al. developed a UNet-based network to process images reconstructed by TR and achieved artifact suppression with limited projection data^[224]. Davoudi et al. proposed a UNet-based framework for image quality recovery from sparse and limited projection data and demonstrated its performance with whole-body mouse imaging in vivo, as shown in Fig. 30^[46].

Figure 30.DL-based postprocessing for image quality improvement in spare-view PACT. (a) Modified UNet for image postprocessing. (b) Reference images reconstructed using 512-channel projection data and their close-ups. (c) Images reconstructed with 32-channel projection data and their postprocessed versions by the modified UNet. (d) Images reconstructed with 128-channel projection data and their postprocessed versions by the modified UNet. Adapted from Ref. [46] with permission.

Download full size

View all figures

In addition to the classic UNet, its variants are also commonly used as postprocessing tools in PACT. Guan et al. developed a fully dense UNet (FD-UNet) for image artifact removal in sparse-view PACT imaging and demonstrated its superiority over the conventional UNet^[225]. Zhang et al. proposed a dual-domain UNet (DuDoUNet) with a specially designed information-sharing block, which takes both time-domain DAS images and k-space images as inputs, and verified its superiority with a public clinical database^[226]. Guan et al. proposed a 3D modified UNet called dense dilation UNet (DD-UNet) and demonstrated its effectiveness in improving the imaging quality in 3D sparse-view and limited-view imaging^[227]. Choi et al. reported a 3D progressive UNet for multi-parameter dynamic volume imaging and demonstrated that it could improve imaging speed and reduce image artifacts in limited-view PACT imaging, as shown in Fig. 31^[47].

Figure 31.DL-based postprocessing for imaging quality enhancement in limited-view 3D PACT. (a) Architecture of the 3D progressive UNet. (b) Reconstructed images in full-view imaging, cluster (limited) view imaging, and DL-enhanced cluster imaging. Adapted from Ref. [47] with permission.

Download full size

View all figures

In addition to UNet and its variants, generative adversarial networks (GANs) are another deep neural network commonly used for image postprocessing in PACT^[228]. GAN shows excellent image processing capabilities and has been successfully used in many applications, such as image style conversion, image enhancement, and image restoration. A GAN network consists of two sub-networks, i.e., a generator and a discriminator, which are adversarially trained until the discriminator cannot distinguish the result produced by the generator and the ground truth. Vu et al. explored the application of a Wasserstein GAN with gradient penalty (WGAN–GP) in limited-view and limited-bandwidth PACT imaging and achieved image artifact removal^[229]. Lu et al. proposed a GAN-based DL approach (LV-GAN) to recover high-quality images in limited-view imaging and achieved artifact removal and high recovery accuracy under a view angle of less than 60°^[58]. Shahid et al. developed a residual-network-based GAN (ResGAN) to improve image quality by denoising and removing image artifacts in sparse-view imaging^[230]. The deep neural networks in the above studies generally require supervision and need a large number of paired images for training, which are often difficult to obtain, especially in experiments. To address this problem, Lu et al. proposed an unsupervised cyclic GAN network (CycleGAN) that only needs unpaired training and successfully achieved artifact removal in sparse-view and limited-view PACT images^[231].

In addition to solving the image enhancement problem in sparse-view and limited-view imaging, DL-based postprocessing has also been used to address a variety of other imaging problems, such as reflection artifact removal and aberration mitigation. For example, Shan et al. incorporated a deep neural network into conventional iterative algorithms and successfully corrected reflection artifacts caused by planar echogenic structures outside imaging tissues^[232]. Jeon et al. developed a hybrid deep neural network based on SegNet and UNet and achieved SOS aberration mitigation, streak artifact removal, and temporal resolution improvement in sparse-view imaging^[233]. Gao et al. proposed a modified encoder-decoder UNet to learn the mapping relationship between speckle patterns and target images in thick porous media and solved the scattering problem in transcranial photoacoustic brain imaging^[234]. Shijo et al. proposed a shifted-window transformer to restore artifact-free images from artifact-heavy images reconstructed using a TR algorithm^[235].

Deep neural networks have also been used to improve the spatial resolution of PACT images. For example, Rajendran and Pramanik employed a fully dense U-Net termed TARES to improve the spatially variant tangential resolution in circular scanning PACT imaging systems^[53]. Zhang et al. exploited a fully dense UNet named Deep-E to improve the elevational resolution of linear-detector-array-based PACT imaging^[54]. Based on this work, the same research group proposed a new Deep-E combining a 2D Deep-E and a 3D Deep-E and demonstrated its performance in elevational resolution improvement and deep vessel recovery in linear-detector-array-based PACT imaging^[236].

In addition, DL can also be used to reduce noise in low-fluence light emitting diode (LED)-based PACT imaging. For example, Anas et al. designed a deep neural network consisting of a CNN and a recurrent neural network (RNN) to improve the SNR of noise-corrupted images in LED-based PACT imaging^[219]. Hariri et al. proposed a multi-level wavelet CNN (MWCNN) model to enhance image contrast in low-fluence PACT imaging^[220]. The above work demonstrates that DL-based postprocessing can enhance photoacoustic images in different aspects, including artifact removal, contrast boost, resolution improvement, and aberration mitigation.

In addition to the above problems, DL-based postprocessing methods have also been used to address other problems in PACT, such as quantitative imaging^[237–242], image fusion^[243], image classification and segmentation^[244–246], and band-frequency separation^[247]. Table 10 summarizes representative work on DL-based image postprocessing in PACT.

Table 10. Representative Work on DL-Based Image Postprocessing in PACT

View table

View all Tables

Table 10. Representative Work on DL-Based Image Postprocessing in PACT


Author	Year	Problem	Network	Dataset	Source
Image enhancement from incomplete projection data
Choi et al.	2022	Limited-view imaging	3D progressive UNet	In vivo	[47]
Shahid et al.	2022	Sparse-view imaging	ResGAN	In vivo (public dataset)	[230]
Shahid et al.	2021	Sparse-view imaging	3-layer CNN, UNet, ResUNet	In vivo	[248]
Zhang et al.	2021	Sparse-view imaging	DuDoUNet	Simulation	[226]
Guan et al.	2021	Sparse-view and limited-view imaging	DD-UNet	Simulation	[227]
Godefroy et al.	2021	Limited-view and finite-bandwidth imaging	UNet	Simulation, phantom	[249]
Lu et al.	2021	Limited-view imaging	LV-GAN	Simulation, phantom, in vivo (test)	[58]
Lu et al.	2021	Sparse-view imaging	CycleGAN	Simulation, phantom, in vivo	[231]
Guan et al.	2020	Sparse-view imaging	FD-UNet	Simulation	[225]
Farnia et al.	2020	Sparse-view imaging	UNet	Simulation, in vivo (test)	[224]
Vu et al.	2020	Limited-view and finite-bandwidth imaging	WGAN-GP	Simulation, phantom (test), in vivo (test)	[229]
Zhang et al.	2020	Sparse-view and limited-view imaging	RADL-Net(10-layer CNN)	Simulation, phantom (test), in vivo (test)	[250]
Antholzer et al.	2018	Sparse-view imaging	UNet	Simulation	[44]
Davoudi et al.	2019	Sparse-view and limited-view imaging	UNet	Simulation, phantom, in vivo	[46]
Inhomogeneous acoustic media
Gao et al.	2022	Thick porous media	UNet	Simulation, phantom, ex vivo	[234]
Jeon et al.	2020	SOS aberration	SegUNet	Simulation, phantom, in vivo (test)	[233]
Shan et al.	2019	Reflection artifact correction	UNet	Simulation	[232]
Resolution enhancement
Zheng et al.	2022	Elevational resolution enhancement	Deep-E (2D and 3D FD-UNet)	Simulation, phantom (test), in vivo (test)	[236]
Zhang et al.	2021	Elevational resolution enhancement	Deep-E (2D FD-UNet)	Simulation, phantom (test), in vivo (test)	[54]
Rajendran & Pramanik	2020	Tangential resolution enhancement	TARES (FD-UNet)	Simulation, phantom (test), in vivo (test)	[53]
Low-energy imaging
Hariri et al.	2020	Low-fluence imaging	Multi-level wavelet UNet	Phantom, in vivo (test)	[220]
Anas et al.	2018	LED imaging	RNN	Phantom, in vivo (test)	[219]
Image classification and segmentation
Lafci et al.	2021	Segmentation	UNet	In vivo	[246]
Chlis et al.	2020	Segmentation	Sparse UNet	In vivo	[245]
Zhang et al.	2019	Classification and segmentation	AlexNet and GoogLeNet	Simulation	[244]
Others
González et al.	2023	Band-frequency separation	FD-UNet	Simulation	[247]
Rajendran & Pramanik	2022	Imaging speed improvement	UNet	Simulation, phantom (test), in vivo (test)	[251]
Rajendran & Pramanik	2021	Radius calibration	RACOR-PAT(FD-UNet)	Simulation, phantom, in vivo	[252]
Awasthi et al.	2019	Image fusion	PA-Fuse (DeepFuse)	Simulation, phantom (test), in vivo (test)	[243]

4.3 Hybrid-Domain Processing

Data-domain preprocessing and image-domain postprocessing can only extract feature information from one domain. To achieve improved imaging results, hybrid-domain processing attempting to extract feature information from both the data domain and the image domain (Fig. 32) is also commonly used.

Figure 32.DL-based hybrid-domain processing. The neural networks 1 and 2 are used to extract feature information from the signal domain and the image domain, while the neural network 3 is used to fuse the outputs of the preceding two networks to generate the final images.

Download full size

View all figures

Based on this idea, Lan et al. proposed a hybrid deep neural network termed knowledge-infused GAN (Ki-GAN) for image enhancement in sparse-view PACT imaging^[253]. The Ki-GAN employs raw photoacoustic signals and DAS-reconstructed photoacoustic images as input and can produce images with better quality than conventional DAS-type algorithms and a UNet-based postprocessing method for both fully and sparsely sampled data. Based on this work, the authors further developed a new hybrid-domain DL model named Y-Net for image enhancement in limited-view PACT imaging^[254]. The Y-Net consists of two encoders and one decoder. The two encoders are used to separately process raw photoacoustic signals and DAS-reconstructed photoacoustic images, while the decoder is employed to fuse the outputs of the two encoders to generate final images. Davoudi et al. developed a hybrid-domain CNN model using both time-resolved photoacoustic signals and reconstructed images as input to enhance the image quality in limited-view PACT imaging^[255] and achieved excellent results as shown in Fig. 33. Moreover, Guo et al. proposed a hybrid-domain attention steered network (AS-Net) for image enhancement in sparse-view PACT imaging^[256]. The AS-Net also takes raw photoacoustic signals and DAS-reconstructed images as inputs, but the photoacoustic signals are first transformed into a 3D square matrix before being fed into the network to ensure compatibility with the input of the AS-Net. The AS-Net adopts a self-attention mechanism for semantic feature extraction and fusion and is very efficient.

Figure 33.DL-based hybrid-domain processing for enhancing the imaging quality in limited-view PACT imaging. (a) Architecture of the dual-domain CNN. (b) Reconstructed results of an in vivo human finger. Adapted from Ref. [255] with permission.

Download full size

View all figures

In addition to exploiting raw projection data directly, preprocessed projection data can also be used as the input of a hybrid-domain network. For example, Li et al. proposed a hybrid-domain CNN model called PRNet to improve the quality of sparse-view photoacoustic images^[257]. The PRNet uses not only raw photoacoustic projection data but also their derivatives as input. Lan et al. proposed a joint feature fusion framework (JEFF-Net) to enhance photoacoustic images from limited-view projection data^[258]. The JEFF-Net uses the sub-DAS images generated from each-channel raw projection data and the whole DAS images generated from all projection data as input, as shown in Fig. 34.

Figure 34.JEFF-Net for image enhancement in limited-view PACT imaging. RGC-Net: residual global context subnetwork; SCTM: space-based calibration and transition module; GT: ground truth. Reprinted from Ref. [258] with permission.

Download full size

View all figures

Table 11 lists representative work on DL-based hybrid-domain processing in PACT.

Table 11. Representative Work on DL-Based Hybrid-Domain Processing in PACT

View table

View all Tables

Table 11. Representative Work on DL-Based Hybrid-Domain Processing in PACT


Author	Year	Problem	Network	Dataset	Source
Inputs: raw data + reconstructed image
Guo et al.	2022	Sparse-view imaging	AS-Net	Simulation, in vivo	[256]
Davoudi et al.	2021	Limited-view imaging	CNN	In vivo	[255]
Lan et al.	2020	Limited-view imaging	Y-Net	Simulation, ex vivo (test), in vivo (test)	[254]
Lan et al.	2019	Sparse-view imaging	Ki-GAN	Simulation	[253]
Inputs: preprocessed raw data + reconstructed image
Lan et al.	2023	Limited-view imaging	JEFF-Net	Simulation, in vivo	[258]
Li et al.	2021	Sparse-view imaging	PR-Net	Simulation, phantom	[257]

4.4 Learned Iterative Reconstruction

Conventional DL approaches generally employ deep neural networks as independent modules for signal preprocessing, image postprocessing, or hybrid processing. In contrast, learned iterative reconstruction (IR) algorithms attempt to fuse deep neural networks into an IR model to improve the quality and efficiency of image reconstruction. Specifically, learned IR algorithms employ deep neural networks to learn the regularization term or regularizer $R (x)$ ^[259,260] or learn an entire iteration process^[45,261], as shown in Fig. 35.

Figure 35.Learned IR method in PACT.

Download full size

View all figures

A regularizer is important for IR-based image reconstruction, especially for limited projection data. Conventional regularizers such as Tikhonov and TV may not necessarily be optimal. Data-driven DL-based regularizers were thus proposed to replace conventional regularizers. Li et al.^[259] and Antholzer et al.^[260] used a neural network to learn the Tikhonov regularizer to improve the image quality in sparse-view PACT imaging. However, the Tikhonov regularizer is trained before the IR loop and is not updated during iteration. Therefore, the learned Tikhonov regularizer may not be optimal for the image reconstruction problem in PACT. Wang et al. considered the entire IR process and proposed a learned IR model based on Eq. (120)^[197], as shown in Fig. 36. The learned IR model takes the current reconstructed image $x^{k}$ and the gradient term $A^{T} (A x^{k} p)$ as inputs and updates the regularizer and the hypermeters $α$ and $λ$ in each iteration. The experimental results show that the learned IR model is effective at improving the quality of sparse-view PACT images and has a faster convergence speed than conventional IR algorithms. Moreover, Lan et al. proposed a novel untrained CNN-based compressed sensing regularizer for sparse-view PACT imaging and achieved improved image quality compared with conventional IR algorithms^[262].

Figure 36.Learning a regularizer. (a) Dual-path network for regularizer learning. (b) Cross-sectional images of a mouse reconstructed by IR with Tikhonov, TV, and learned regularizers. The learned regularizer has the highest image quality in terms of detail preservation and artifact suppression. Adapted from Ref. [197] with permission.

Download full size

View all figures

In addition to learning a regularizer, it is also possible to learn an entire iteration process [Eqs. (120) and (123)] using deep neural networks. For example, Hauptmann et al. proposed a deep gradient descent (DGD) algorithm to reconstruct high-resolution 3D images from restricted photoacoustic projection data in limited-view PACT imaging^[45]. In the DGD algorithm, a CNN model was designed to learn the proximal operator in Eq. (123), which yields high-quality images at a fast convergence speed, as shown in Fig. 37. Similarly, Yang et al. proposed using recurrent inference machines (RIMs) to learn an iteration process in IR and achieved accelerated reconstruction^[261]. Lan et al. developed a CNN network combining a variational model to learn an iteration process and achieved robust and accelerated reconstruction in limited-view PACT imaging^[263]. Shan et al. proposed a simultaneous reconstruction network (SR-Net) to update the initial pressure $x$ and the sound speed $v_{0}$ at each iteration of IR reconstruction for PACT imaging in acoustically heterogeneous media^[264]. Moreover, Boink et al. proposed a learned primal-dual (L-PD) framework to learn an iteration process in IR and achieved simultaneous high-quality image reconstruction and segmentation in limited-view and sparse-view PACT imaging^[265]. Recently, diffusion models in DL have attracted great attention in the field of biomedical image processing because of their powerful generative capabilities^[266,267]. In PACT, some studies have incorporated diffusion models into iterative optimization frameworks, which show superior performance compared with conventional IR algorithms^[268–271].

Figure 37.Learning 3D IR based on DGD. (a) Structure of the CNN representing one iteration of DGD. (b) From left to right: initialization of the network, DGD result with five iterations, TV result with 50 iterations, and reference image. The proposed learned IR has a faster convergence speed than the conventional TV-based IR algorithm. Adapted from Ref. [45] with permission.

Download full size

View all figures

Generally, compared with conventional IR algorithms, learned IR methods can improve reconstruction efficiency by reducing the number of iterations. However, they are still time-consuming because the reconstruction model [Eqs. (120) and (123)] needs to be solved repeatedly. To accelerate the convergence of learned IR algorithms, Hsu et al. proposed a fast iterative reconstruction (FIRe) algorithm that simultaneously learns the discrete forward photoacoustic imaging model in Eq. (86) and an entire iteration process to reduce the reconstruction time^[57]. Results showed that the FIRe algorithm can produce images with a quality comparable to that of learned IR and conventional IR algorithms but with reconstruction time reduced by nine and 620 times, respectively.

Table 12 lists representative work on DL-based IR in PACT.

Table 12. Representative Work on DL-Based IR in PACT

View table

View all Tables

Table 12. Representative Work on DL-Based IR in PACT


Author	Year	Problem	Network	Dataset	Source
Regularizer learning
Wang et al.	2022	Sparse-view imaging	CNN	Simulation, in vivo	[197]
Lan et al.	2021	Sparse-view imaging	Untrained CNN (deep image prior)	Simulation (test), phantom (test), in vivo (test)	[262]
Antholzer et al.	2019	Sparse-view imaging	ResUNet	Simulation, phantom (test)	[260]
Li et al.	2018	Sparse-view imaging	Encoder-decoder	Simulation	[259]
Entire IR learning
Hsu et al.	2023	Reconstruction acceleration	FIRe	Simulation	[57]
Lan et al.	2022	Limited-view imaging	CNN	Simulation, in vivo (public dataset)	[263]
Boink et al.	2020	Image reconstruction and segmentation	CNN	Simulation, phantom	[265]
Yang et al.	2019	Reconstruction acceleration	RIM	Simulation	[261]
Shan et al.	2019	Joint reconstruction	SR-Net	Simulation	[264]
Hauptmann et al.	2018	Limited view and acceleration	CNN	Simulation, phantom, in vivo	[45]
Diffusion model
Guo et al.	2024	Limited-view imaging	UNet	Simulation, in vivo	[268]
Dey et al.	2024	Limited-view and sparse-view imaging	CNN	Simulation, in vivo	[270]

4.5 Direct Reconstruction

In the aforementioned DL-based approaches, deep neural networks generally perform only certain functions, such as preprocessing, postprocessing, and regularizer learning, and cannot independently reconstruct images from raw projection data without the use of conventional algorithms. Nevertheless, deep neural networks can perform direct image reconstruction alone by learning the mapping relationship between raw photoacoustic projection data and reconstructed images (see Fig. 38).

Figure 38.DL-based direct image reconstruction in PACT.

Download full size

View all figures

In 2018, Waibel et al. used a UNet-like model to reconstruct images from limited-view photoacoustic projection data and demonstrated the feasibility of neural-network-based direct image reconstruction through numerical simulation^[272]. Lan et al. proposed a UNet-based deep neural network (DUNet) to reconstruct PACT images from multi-frequency photoacoustic projection data and demonstrated its superiority over conventional image reconstruction algorithms^[273]. Feng et al. developed a residual-block-based end-to-end UNet (Res-UNet) to reconstruct PACT images from raw photoacoustic projection data and achieved high-quality image reconstruction with sharp edges and suppressed artifacts^[274]. Lan et al. designed an encoder-decoder network to transform superimposed four-channel photoacoustic projection data into reconstructed images and achieved real-time PACT imaging with a single-channel data acquisition (DAQ) system^[275]. Recently, Shen et al. developed a physics-driven DL-based filtered back projection (dFBP) framework for direct image reconstruction from raw projection data^[276]. The dFBP network is constructed based on the physical model of the analytical FBP algorithm [Eq. (47)] and consists of a filtering module, a back-projection module, and a fusion module, as shown in Fig. 39. The proposed dFBP is robust and flexible, can achieve direct signal-to-image transformation with enhanced accuracy, and reconstruct high-quality, artifact-suppressed images from sparse-view, limited-view, and acoustic heterogeneity-contaminated projection data. Moreover, Lan et al. designed a self-supervised deep-learning framework for image reconstruction from extremely sparse projection data. The success of the network can be mainly attributed to the adoption of a transformer-based self-attention mechanism^[277].

Figure 39.Direct image reconstruction using dFBP. (a) Physical model of the analytical FBP [Eq. (47)]. (b) Architecture of the dFBP, which consists of a filtering module, a back-projection module, and a fusion module. (c), (d) Images separately reconstructed by FBP and dFBP using 128-channel photoacoustic projection data. (e), (f) Images separately reconstructed by FBP and dFBP from limited-view photoacoustic projection data (view angle: $3 π / 4$ ). Reprinted from Ref. [276] with permission.

Download full size

View all figures

To reduce the complexity of network training, raw photoacoustic projection data can be preprocessed to produce high-level features before being fed into an image reconstruction network. Guan et al. proposed UNet-based pixel-wise deep learning (Pixel-DL) that uses pixel-interpolated data as input and realized direct image reconstruction in limited-view and sparse-view PACT imaging^[278]. Similarly, Kim et al. developed a deep convolutional network (upgUNet), which takes multi-channel features extracted from raw photoacoustic projection data as input, and achieved direct image reconstruction in limited-view PACT imaging^[279], as shown in Fig. 40(a). Inspired by the principle of FBP, Tong et al. designed a feature projection network (FPNet) that takes raw photoacoustic projection data and their first-order derivative as input [Fig. 40(b)] and achieved high reconstruction quality from limited-view data with sparse measurements^[280]. Recently, Dehner et al. proposed a deep neural network named DeepMB for real-time IR image reconstruction with adjustable SOS^[281,282]. DeepMB learns conventional IR algorithms using domain-transformed projection data as input and can reconstruct high-quality images 3000 times (less than 10 ms per image) faster than conventional IR algorithms.

Figure 40.DL-based direct image reconstruction using high-level features extracted from raw photoacoustic projection data. (a) Architecture of the upgUNet proposed by Kim and coworkers. Reprinted from Ref. [279] with permission. (b) Architecture of the FPNet + UNet proposed by Tong and coworkers (top) and representative reconstruction results (bottom). Adapted from Ref. [280] with permission.

Download full size

View all figures

Table 13 lists representative work on DL-based direct image reconstruction in PACT.

Table 13. Representative Work on DL-Based Direct Signal-to-Image Reconstruction in PACT

View table

View all Tables

Table 13. Representative Work on DL-Based Direct Signal-to-Image Reconstruction in PACT


Author	Year	Problem	Network	Dataset	Source
Input: raw data
Shen et al.	2024	Sparse-view/limited-view imaging	Learned FBP	Simulation, in vivo	[276]
Lan et al.	2023	Channel data decoupling	Encoder-decoder	Simulation, in vivo	[275]
Feng et al.	2020	Direct image reconstruction	Res-UNet	Simulation, phantom (test)	[274]
Lan et al.	2019	Multi-frequency image reconstruction	DUNet	Simulation	[273]
Allman et al.	2018	Image reconstruction with source detection and reflection artifacts removal	Deep fully convolutional network + Fast R-CNN	Simulation, phantom	[283]
Waibel et al.	2018	Limited-view imaging	UNet	Simulation	[272]
Input: high-level features extracted from raw data
Dehner et al.	2022	IR algorithm acceleration	DeepMB	Simulation, in vivo (test)	[281]
Guan et al.	2020	Sparse-view and limited-view imaging	FD-UNet	Simulation	[278]
Kim et al.	2020	Limited-view imaging	upgUNet	Simulation, phantom (test), in vivo (test)	[279]
Tong et al.	2020	Sparse-view & limited-view imaging	FPNet +UNet	Simulation, phantom, in vivo	[280]

5 Performance Comparisons

Thus far, we have reviewed the principles of the six classes of image reconstruction algorithms in PACT, namely, DAS, FBP, SE, TR, IR, and DL. To choose the most appropriate algorithm in practice, it is necessary to understand the characteristics and performance of each method under different imaging scenarios. In this section, we discuss and compare the six classes of algorithms, focusing on image reconstruction quality and image reconstruction speed. Table 14 first summarizes the overall characteristics and performance of each algorithm when evaluated under different circumstances, which will be discussed in detail below.

Table 14. Comparison of Different Image Reconstruction Algorithms in PACT^a

View table

View all Tables

Table 14. Comparison of Different Image Reconstruction Algorithms in PACT^a


Circumstance	DAS	FBP	SE	TR	IR	DL
Detector EIR modeling^b	Difficult	Difficult	Difficult	Difficult	OK	OK
Detector SIR modeling^c	Difficult	Difficult	Difficult	Difficult	OK	OK
Performance in sparse-view imaging	Poor	Poor	Poor	Poor	Good	Excellent
Performance in limited-view sampling	Poor	Poor	Poor	Poor	Good	Excellent
Media property coupling	Difficult	Difficult	Difficult	OK	OK	OK
Speed	Fast	Fast	Very fast	Slow	Very slow	Fast
Memory footprint	Very low	Very low	Low	High	Very high	High or very high

5.1 Image Reconstruction Quality under Various Imaging Scenarios

5.1.1 Ideal imaging conditions

To achieve perfect image reconstruction or acoustic inversion in PACT, photoacoustic signal detection should be performed under ideal conditions: (1) the ultrasound detectors used for signal detection should have an infinite bandwidth; (2) the ultrasound detectors should have a point-like aperture; (3) the ultrasound detectors should be densely arranged in space; (4) the detector array formed by individual detectors should have a full view angle with respect to a sample; and (5) the acoustic media should be homogeneous. When these conditions are satisfied, all six classes of image reconstruction algorithms are expected to produce high-quality images.

However, practical photoacoustic signal detection is unlikely to be ideal. In the following section, we discuss and compare the performances of the image reconstruction algorithms under non-ideal imaging scenarios.

5.1.2 Limited detector bandwidth

The ultrasound detectors used for photoacoustic signal detection should ideally have an infinite bandwidth so that they can respond to all frequency contents of a signal^[4]. However, a practical detector always has a limited bandwidth and a non-ideal EIR, which distorts measured photoacoustic signals and blurs reconstructed images. To deal with this problem, a reconstruction algorithm should consider the non-ideal EIR. The six classes of algorithms reviewed in this paper have different characteristics in addressing this problem. It is generally difficult for DAS, FBP, SE, and TR algorithms to incorporate the non-ideal EIR of a detector into their reconstruction models but easy for IR- and DL-based algorithms, as summarized in Table 14. Specifically, IR algorithms can easily solve the non-ideal EIR problem by incorporating it into the system matrix $A$ [Eq. (110)], and DL algorithms can solve the problem by training a reconstruction network with EIR-corrected signal-image datasets.

To understand how different image reconstruction algorithms behave in the case of non-ideal EIR, an example is given in Fig. 41. In this example, the photoacoustic source is a numerical blood vessel phantom distributed in the $x - y$ plane [Fig. 41(a)]. The detector array used for imaging is a full ring with a diameter of 50 mm and has 512 evenly distributed elements, each of which has a point-like shape. To simulate the limited bandwidth case, the detector array is assigned a center frequency of 2 MHz and is assumed to have a Gaussian-like EIR with a fractional bandwidth of 80%. Figures 41(b)–41(d) show the images reconstructed by the FBP-, TR-, and EIR-corrected IR algorithms, respectively. Figures 41(e)–41(g) are the images of Figs. 41(b)–41(d) with negative values removed (negative values in a photoacoustic image have no physical meaning). The results show that the images reconstructed by FBP and TR have apparent artifacts, while the images reconstructed by IR have much fewer artifacts due to the correction of the detector EIR. Note that since DAS and SE typically have similar performance to FBP and are commonly adopted in PACT systems with linear and planar detector arrays, they are not compared in this example.

Figure 41.Performance comparison of FBP, TR, and IR when the detectors used for signal detection have a limited bandwidth. (a) A full-ring detector array in 3D space and a numerical blood vessel phantom used for simulation. (b)–(d) Images reconstructed by FBP, TR, and EIR-corrected IR algorithms, respectively. (e)–(g) Images in (b)–(d) with negative values removed and close-up views of the images in the red dashed box. In this example, the detector array is assumed to have a Gaussian-like EIR with a bandwidth of 80%. Other simulation settings can be found in the text.

Download full size

View all figures

Although DAS, FBP, SE, and TR algorithms are generally not ready to incorporate the EIR of a practical detector, their input signals can be preprocessed by deconvolution or DL to correct the non-ideal EIR. In deconvolution, the EIR of a detector is first experimentally measured using a point object and then incorporated into the signal detection model [Eq. (23)] for correction. In DL-based preprocessing, the EIR of a detector can be compensated for by training a neural network using EIR-corrected photoacoustic signals. Figure 42 shows an example demonstrating that signal preprocessing helps correct the non-ideal EIR of a detector and improves image reconstruction results^[51]. In this example, the photoacoustic source is a numerical blood vessel, as shown in Fig. 42(a). Photoacoustic signals were measured by a full-ring detector array consisting of 100 elements with a radius of 37.02 mm, a center frequency of 2.25 MHz, and a fractional bandwidth of 70%. Figures 42(b) and 42(c) are images reconstructed using full and limited-bandwidth photoacoustic signals, respectively. Figures 42(d) and 42(e) are images reconstructed using signals separately preprocessed by deconvolution and a deep neural network, which were both designed to compensate for the EIR effect of the detector array. Figures 42(f)–42(j) show the corresponding results for a Derenzo phantom. The simulation results show that deconvolution and DL-based signal preprocessing can effectively correct non-ideal EIRs and improve image reconstruction quality.

Figure 42.Signal preprocessing helps correct the non-ideal EIR of a detector and improves image reconstruction quality. First row: (a) numerical blood vessel used for the test. (b), (c) Images reconstructed using full and limited-bandwidth photoacoustic signals, respectively. (d), (e) Images reconstructed using signals preprocessed by deconvolution and a deep neural network, respectively. Second row: (f)–(j) a Derenzo phantom and corresponding results. Detailed simulation settings can be found in the text. Adapted from Ref. [51] with permission.

Download full size

View all figures

5.1.3 Finite detector aperture size

In addition to having an infinite bandwidth, an ideal ultrasound detector used for photoacoustic signal detection should also have a point-like shape so that it can respond to photoacoustic signals from all directions^[4]. However, a practical detector always has a finite aperture size and a non-ideal SIR, which distorts measured photoacoustic signals and blurs reconstructed images. To address this problem, a reconstruction algorithm should consider the non-ideal SIR. The six classes of algorithms reviewed in this paper have different characteristics in addressing this problem. It is generally difficult for DAS, FBP, SE, and TR algorithms to incorporate the non-ideal SIR of a detector into their reconstruction models but easy for IR- and DL-based algorithms, as summarized in Table 14. Specifically, IR algorithms can easily solve the non-ideal SIR problem by incorporating it into the system matrix $A$ [Eq. (93)], and DL algorithms can solve this problem by training a reconstruction neural network with SIR-corrected signal-image datasets.

To understand how different image reconstruction algorithms behave in the case of non-ideal SIRs, a simulation from^[286] is adapted and shown in Fig. 43. In this example, the photoacoustic source has six-point absorbers located at different distances from the origin. Photoacoustic signals were recorded by a detector with a 6-mm-diameter aperture size and a center frequency of 5 MHz. The detector rotated around the photoacoustic source to record signals at different positions. Figures 43(a) and 43(b) show the images reconstructed by a model-based algorithm and DAS, respectively. The model-based algorithm uses a similar discrete imaging model [Eq. (86)] as IR and couples the SIR of the detector in its system matrix. The results show that the image reconstructed by DAS has significant distortions while the image recovered by the model-based algorithm has much fewer distortions due to the correction of the detector SIR.

Figure 43.Performance comparison of a model-based algorithm and DAS when the detector used for signal detection has a finite aperture size. (a), (b) Images reconstructed by the model-based algorithm and DAS, respectively. The model-based algorithm uses a similar discrete imaging model [Eq. (86)] as IR but couples the detector SIR in its system matrix. Reprinted from Ref. [286] with permission.

Download full size

View all figures

Although DAS, FBP, SE, and TR algorithms are generally not ready to incorporate the SIR of a practical detector, their output images can be postprocessed by deconvolution or DL to correct the non-ideal SIR. In deconvolution, the SIR of a detector is first modeled using linear acoustics^[39,287,288] and then incorporated into a non-blind image deconvolution model for SIR correction. Alternatively, the SIR can be unknown and a blind deconvolution algorithm is employed to correct the non-ideal SIR. In DL-based postprocessing, the SIR of a detector can be compensated for by training a neural network using SIR-corrected photoacoustic images. Figure 44 shows an example demonstrating that DL-based image postprocessing helps correct the non-ideal SIR of a detector and improves image quality^[53]. In this example, the photoacoustic source contains four-point targets. The detector used for signal detection has a finite aperture size and a center frequency of 2.25 MHz. The detector rotates around the photoacoustic source to measure signals at different positions. Figures 44(a) and 44(b) are images reconstructed using DAS without and with DL-based image postprocessing, respectively. The results show that the DAS-reconstructed image without postprocessing has significant distortions, while the image with postprocessing has much fewer distortions. This indicates that DL-based image postprocessing can effectively correct non-ideal SIRs and improve image reconstruction quality.

Figure 44.Image postprocessing helps correct the non-ideal SIR of a detector and improves image quality. In this example, a detector with a finite aperture size rotates around the four-point sources for signal detection. (a), (b) Images reconstructed by DAS without and with DL-based postprocessing, respectively. Adapted from Ref. [53] with permission.

Download full size

View all figures

Although here we independently discuss the EIR and SIR of a detector, they are never separable and together constitute the total impulse response (TIR) of a detector. To address the TIR of a detector, methods similar to those used for EIR and SIR correction can be used. In other words, the TIR of a detector can be experimentally measured and incorporated into IR or DL models for improved image reconstruction^[289]. Alternatively, non-blind deconvolution, blind deconvolution, and DL-based image postprocessing techniques can also be adopted to correct the non-ideal TIR of a detector^[290–292].

5.1.4 Sparse-view imaging

Ideally, the ultrasound detectors used for photoacoustic signal detection should be densely arranged in space to satisfy the spatial Nyquist sampling theorem^[4]. However, the detectors in a practical detector array are often sparse due to high fabrication costs, leading to the ill-posed problem of image reconstruction from sparse projection data. The six classes of algorithms reviewed in this paper have different performances when dealing with this problem. Generally, analytical algorithms such as DAS, FBP, SE, and TR have similar performance and require more projection data to reconstruct an image with reasonable image quality than do IR algorithms, which solve the image reconstruction problem in the sense of least squares; IR algorithms require more projection data than DL-based algorithms, which are data-driven and can realize complicated signal-to-image mapping using sparse projection data. In other words, DL-based algorithms have better performance than IR algorithms in sparse-view PACT imaging, and IR algorithms have better performance than DAS, FBP, SE, and TR algorithms, as summarized in Table 14.

To evaluate the performance of different image reconstruction algorithms in sparse-view imaging, an example is presented in Fig. 45. The simulation settings in this example are similar to those in Fig. 41 except that the circular detector array has a varying number of elements from 32 to 512 and all detector elements have infinite bandwidth. Figure 45 shows the images reconstructed by the FBP-, TR-, and TV-regularized IR algorithms using 32-, 128-, and 512-channel projection data. As expected, the image reconstruction quality of each algorithm improves with the increase of detector number. However, FBP and TR suffer from streak artifacts more significantly than TV-regularized IR in sparse-view cases (e.g., 32 and 128 views), although the three algorithms yield similar image quality when the detection views are dense (e.g., 512). Since DAS and SE typically have similar performance to FBP and are commonly adopted in PACT systems with linear and planar detector arrays, they are not compared in this example.

Figure 45.Performance comparison of FBP, TR, and IR in sparse-view PACT imaging. First–third columns: images reconstructed by FBP, TR, and TV-regularized IR, respectively. First–third row: images reconstructed using (a)–(c) 32-, (d)–(f) 128-, and (g)–(i) 512-channel projection data, respectively. The simulation settings can be found in the text.

Download full size

View all figures

Compared with IR algorithms, DL-based algorithms may have better image reconstruction performance in sparse-view imaging^[276,293]. To illustrate this, Fig. 46 shows an example comparing the performance of conventional algorithms such as TR, back projection, and TV-based IR, and two DL-based image reconstruction algorithms, namely, Pixel-DL^[278] and model-based learning (MBLr)^[45], for image reconstruction using sparsely sampled projection data^[293]. In this example, the imaging target is mouse cerebral vasculature [Fig. 46(a)], and photoacoustic signals were received by an 8-mm-diameter full-ring array with 32 evenly distributed detectors. Figures 46(b)–46(f) are the images reconstructed by TR, back projection, TV-regularized IR, Pixel-DL, and MBLr, respectively. The results show that the TV-regularized IR algorithm has higher image reconstruction quality than the TR and back-projection algorithms, which suffer from severe artifacts due to the sparseness of the projection data, while the two DL-based reconstruction methods, especially MBLr, yield improved quality compared with all conventional algorithms in this case.

Figure 46.Performance comparison of DL and conventional image reconstruction algorithms in sparse-view PACT imaging. (a) Mouse cerebral vasculature and local close-up image. (b)–(f) Images reconstructed by TR, back projection, TV-regularized IR, Pixel-DL, and MBLr, respectively. DL-based algorithms (MBLr and Pixel-DL) are superior to conventional algorithms in this case. The simulation settings can be found in the text. Adapted from Ref. [293] with permission.

Download full size

View all figures

Although conventional image reconstruction algorithms suffer from artifacts in sparse-view imaging, they can be improved^{[221,294–296]}. For example, Sandbichler et al. achieved enhanced image reconstruction using conventional algorithms by transforming sparse projection data into dense data using compressed sensing in sparse-view imaging^[297]. Meng et al. developed a principal component analysis (PCA)-based method and achieved high-quality 3D image reconstruction with sparsely sampled data without involving an iterative process^[298]. Hu et al. analyzed the aliasing problem caused by spatial undersampling and proposed a temporal low-pass-filtering and spatial interpolation method for aliasing mitigation and artifact suppression^[299,300]. Cai et al. analyzed the relationship between streak artifacts and sparse projection data and developed an adaptive FBP algorithm named contamination-tracing back-projection (CTBP) for image artifact suppression in sparse-view imaging^[301]. Hakakzadeh et al. proposed a spatial-domain factor for sparse sampling circular-view PACT and achieved artifact suppression and resolution improvement compared with conventional algorithms^[302]. Moreover, Wang et al. proposed an iterative scheme combining virtually parallel projecting and spatially adaptive filtering and achieved enhanced image reconstruction in sparse-view PACT imaging^[303].

5.1.5 Limited-view imaging

Ideally, the detector array used for PACT imaging should have a full view angle (4π steradians in 3D space) with respect to the sample being imaged^[4]. However, a practical detector array always has a limited view angle, leading to the ill-posed problem of image reconstruction from limited-view projection data. The six classes of algorithms reviewed in this paper have different performances when dealing with this problem. Generally, analytical algorithms such as DAS, FBP, SE, and TR have similar performance and require a larger view angle to reconstruct an image with reasonable image quality than do IR algorithms, which solve the image reconstruction problem in the sense of least squares; IR algorithms require a larger view angle than DL-based algorithms, which are data-driven and can realize complicated signal-to-image mapping using limited-view projection data. In other words, DL-based algorithms have better performance than IR algorithms in limited-view PACT imaging and IR algorithms have better performance than DAS, FBP, SE, and TR algorithms, as summarized in Table 14.

To evaluate the performance of different image reconstruction algorithms in limited-view imaging, an example is presented in Fig. 47. The simulation settings in this example are similar to those used in Fig. 41 except that the circular detector arrays used for imaging have limited view angles ranging from $π / 2$ (quarter circle) to $2 π$ (full circle), and all detector elements have an infinite bandwidth. Figure 47 shows the images reconstructed by the FBP, TR, and TV-regularized IR algorithms when the view angles of the detector arrays are $π / 2$ , $π$ , and $2 π$ . As expected, the image reconstruction quality of each algorithm improves with the increase of the view angles of the detector arrays. However, FBP and TR suffer from artifacts more significantly than does TV-regularized IR in limited-view cases (e.g., $π / 2$ and $π$ ), although the three algorithms yield similar image quality when the view angle is $2 π$ .

Figure 47.Performance comparison of FBP, TR, and TV-regularized IR in limited-view PACT imaging. (a) A partial-ring detector array in 3D space and a numerical blood vessel phantom used for simulation. (b) Images reconstructed by FBP, TR, and TV-regularized IR under different imaging angles. The simulation settings can be found in the text.

Download full size

View all figures

Compared with conventional algorithms, DL-based methods may have better image reconstruction performance in limited-view imaging^[276]. To illustrate this, Fig. 48 shows a simulation comparing the performance of FBP and a DL-based FBP algorithm (dFBP) for image reconstruction using limited-view projection data^[276]. In this example, the imaging target is a numerical zebrafish, and photoacoustic signals are collected by a circular detector array, which has a diameter of 80 mm and 512 evenly distributed detectors, as shown in Fig. 48(a). To simulate limited-view imaging scenarios, the shape of the detector array is reduced from a full circle to partial circles with central angles of $π / 4$ , $π / 2$ , $3 π / 4$ , and $π$ . Figures 48(b) and 48(c) show the corresponding images reconstructed by FBP and dFBP under different view angles. The dFBP method can achieve better image reconstruction quality than FBP, which suffers from severe artifacts due to the incompleteness of the projection data.

Figure 48.Performance comparison of DL and conventional image reconstruction algorithms in limited-view PACT imaging. (a) Imaging configuration. (b), (c) Images reconstructed by FBP and dFBP under different view angles. Adapted from Ref. [276] with permission.

Download full size

View all figures

Although image reconstruction in limited-view imaging is challenging, the reconstruction procedure can be improved. First, limited-view raw projection data can potentially be augmented to achieve improved reconstruction. In 2004, Patch derived the consistency conditions for projection and estimated missing data from measured data^[304]. Gamelin et al. employed a single-stage Wiener optimal filter to augment measured projection data by interpolation between measurement locations^[305]. Second, conventional algorithms can be modified to adapt to the image reconstruction problem. For example, Paltauf et al. proposed a modified FBP algorithm that uses a weight factor to improve the image reconstruction quality in limited-view imaging^[306]. Third, DL-based methods can be used as postprocessing tools for image enhancement. For example, Lu et al. proposed a GAN-based image postprocessing method and achieved high-quality image recovery from limited-view photoacoustic images^[58].

5.1.6 Heterogeneous media

Many image reconstruction algorithms in PACT, such as DAS and FBP, are derived based on the assumption that the acoustic media are homogeneous, lossless, and non-dispersive. However, this is not true for biological tissues, in which strong acoustic heterogeneities, such as bones and air cavities, may be present^[296,307]. To address this problem, an image reconstruction algorithm should consider the properties of an acoustic medium. The six classes of algorithms reviewed in this paper have different characteristics in addressing this problem. DAS and FBP algorithms can employ dual SOS^[308] or jointly reconstruct initial photoacoustic pressure and SOS^[309] to reduce image artifacts caused by acoustic heterogeneity. Modified SE algorithms can incorporate variable SOS into reconstruction models to account for acoustic heterogeneity^[167]. In comparison, TR-, IR-, and DL-based algorithms can more easily incorporate acoustic properties of a medium into their reconstruction models, as summarized in Table 14. Specifically, TR algorithms can solve the acoustic heterogeneity problem by incorporating the properties of acoustic media such as SOS, density, dispersion, and absorption into their acoustic propagation model (see Sec. 3.4.1). IR algorithms can solve this problem by incorporating the properties of acoustic media into the system matrix $A$ by solving coupled photoacoustic wave equations (see Sec. 3.5.2). DL algorithms can solve this problem by training a reconstruction network with heterogeneity-corrected signal-image datasets. Certainly, incorporating the properties of acoustic media into a reconstruction model requires prior knowledge about the media.

To understand how different image reconstruction algorithms behave in the case of acoustic heterogeneity, an example is presented in Fig. 49. The simulation settings in this example are similar to those used in Fig. 41 except that the sound speeds in the background and the ROI [the white dashed box in Fig. 49(a)] are assumed to be 1500 and 1520 m/s, respectively, and all detector elements have an infinite bandwidth. Figure 49(b) shows the image reconstructed by FBP with a constant SOS of 1505 m/s, which is the optimal SOS for reconstruction. Figures 49(c) and 49(d) show the images reconstructed by TR- and TV-regularized IR by coupling the actual SOS distribution into their reconstruction models. The images reconstructed by TR and IR with a correct SOS distribution have better image quality than that of FBP, which contains splitting artifacts at the end of the vessel.

Figure 49.Performance comparison of FBP, TR, and IR in acoustically heterogeneous media. (a) A numerical blood vessel phantom with a nonuniform SOS distribution. The SOSs in the background and the white dashed box are 1500 and 1520 m/s, respectively. (b) Image reconstructed by FBP with a constant SOS of 1505 m/s. (c), (d) Images reconstructed by TR and TV-regularized IR by coupling the actual SOS distribution into their reconstruction models. Detailed simulation settings can be found in the text.

Download full size

View all figures

Although algorithms such as DAS, FBP, and SE are generally not ready to incorporate the properties of acoustic media into their reconstruction models, their output images can be postprocessed for image enhancement. Figure 50 shows a simulation demonstrating that image postprocessing helps correct acoustic heterogeneity and improves image reconstruction results^[233]. The photoacoustic source in this example contains multiple elliptical and line absorbers and has three SOS regions with values of 1480, 1450, and 1575 m/s at different depths. The detector array used for imaging is a linear array located at the top of the image. Figure 50(a) shows the image reconstructed by a multi-stencil fast marching (MSFM) approach, which is regarded as the ideal result. The MSFM approach estimates the acoustic TOF based on the eikonal equation and the known SOS distribution and can achieve high-quality beamforming. Figure 50(b) shows the images reconstructed by a conventional Fourier beamformer using a constant SOS of 1540 m/s. Figure 50(c) shows the image in Fig. 50(b) processed by an automatic SOS selection method, which attempts to maximize the sharpness of the photoacoustic image with an optimal SOS (1480 m/s in this case). Figure 50(d) shows the image of Fig. 50(b) processed by a deep neural network called SegUNet. The results show that the image produced by a conventional Fourier beamformer has significant distortions, which can be partially corrected by the automatic SOS selection method and fully corrected by the DL-based postprocessing method.

Figure 50.DL-based postprocessing enhances image quality in acoustically heterogeneous media. In this example, the imaging media consist of three layers in the depth direction ( $z$ direction), and their SOSs are 1480, 1450, and 1575 m/s. (a) Image reconstructed using the MSFM method, which is regarded as the ideal result. (b) Image reconstructed by a conventional Fourier beamformer with a constant SOS of 1540 m/s. (c) Postprocessed image of (b) using an autofocus approach. (d) Postprocessed image of (b) using a DL-based method (SegUNet). Adapted from Ref. [233] with permission.

Download full size

View all figures

5.1.7 Other aspects

In addition to the qualitative reconstruction discussed above, quantitative image reconstruction is another important aspect to consider in certain imaging scenarios, such as quantitative photoacoustic imaging. Under ideal conditions, most algorithms can achieve accurate amplitude reconstruction of initial photoacoustic pressure. However, real imaging scenarios are never ideal, which makes quantitative image reconstruction very challenging. Generally, IR^[41,64] and DL^[293] algorithms can achieve more accurate image reconstruction compared to other methods and are more suitable for quantitative photoacoustic imaging. Additionally, negative artifacts that have no physical meanings often occur in reconstructed photoacoustic images under non-ideal conditions^[137]. In this case, IR algorithms with non-negative constraints can be used to reconstruct photoacoustic images free from negative components^[207].

5.2 Image Reconstruction Speed and Memory Footprint

5.2.1 Image reconstruction speed

In addition to image quality, image reconstruction speed and computational complexity are other critical indicators for measuring the performance of an algorithm. An ideal image reconstruction algorithm should be fast enough to allow a PACT imaging system to capture transient biodynamic processes such as heartbeat and blood flow in a living subject.

The computational complexity of image reconstruction algorithms in PACT varies from algorithm to algorithm. For non-iterative methods, such as DAS, FBP, SE, and TR, SE is the most computationally efficient due to the application of FFT. Assuming that a 3D image to be reconstructed has a size of $N_{x} \times N_{y} \times N_{z}$ ( $N_{x} = N_{y} = N_{z} = n$ ) and the detector number $M = n \times n$ ^[32], SE algorithms have a computational complexity of $O [n^{3} \log (n)]$ for 3D image reconstruction and $O [n^{2} \log (n)]$ for 2D image reconstruction for the specific detection geometries listed in Table 5. To perform the same reconstruction tasks, TR algorithms have computational complexities of $O [n^{4} \log (n)]$ for 3D and $O [n^{3} \log (n)]$ for 2D^[310], while back-projection-type algorithms, such as DAS and FBP, have computational complexities of $O (n^{5})$ for 3D and $O (n^{3})$ for 2D^[62,310]. One may notice that TR algorithms have lower complexity than back-projection-type algorithms for 3D image reconstruction but seem to be slower in practice. This is because TR algorithms need to compute an entire acoustic field [Eqs. (69 )–(73)] step by step, which is time- and memory-consuming, especially for large-scale 3D imaging models. In contrast, back-projection-type algorithms can directly reconstruct the ROI instead of a whole imaging region and can be implemented with parallel computing. Compared with non-iterative methods, IR algorithms usually have a much greater computational complexity due to the repeated calculation of the image reconstruction model in Fig. 20.

The computational complexity of DL-based algorithms depends on the specific method. Generally, non-iterative DL algorithms, such as data-, image-, and hybrid-domain processing algorithms and direct reconstruction algorithms, can process an image in roughly the same amount of time as conventional non-iterative algorithms, such as FBP and TR. Iterative DL algorithms, such as learned IR, are generally faster than conventional IR algorithms because they normally require fewer iterations and because the discrete forward imaging model [Eq. (86)] can be learned by neural networks. Figure 51(a) shows a qualitative comparison of the speed of different image reconstruction algorithms.

Figure 51.Qualitative comparison of different image reconstruction algorithms in terms of (a) image reconstruction speed and (b) memory footprint.

Download full size

View all figures

The speed of an image reconstruction algorithm in PACT can be accelerated by GPUs^[42,152,311]. For example, Wang et al. developed a comprehensive GPU-based framework for PACT image reconstruction that achieved an $\sim 871$ -fold increase in speed when reconstructing extremely large-scale 3D images compared to a central processing unit (CPU)-based implementation^[152]. Liu et al. reported an ultrafast GPU-based FBP implementation for PACT image reconstruction^[311]. The implementation requires only 0.38 ms to reconstruct a 2D image with a size of 512 pixel × 512 pixel and 6.15 ms to reconstruct a 3D image with a size of 160 voxel × 160 voxel × 160 voxel. Wang et al. proposed GPU-based parallelization strategies to accelerate the FBP algorithm and a penalized least-square and interpolation-based IR (PLS-Int-IR) algorithm^[42] and improved the image reconstruction speeds of FBP and PLS-Int-IR by factors of 1000 and 125, respectively. Although most image reconstruction algorithms can, in principle, be accelerated by GPUs, the speedup ratios may vary. Back-projection-type algorithms, such as DAS and FBP, inherently support parallel computing and thus can achieve a high speedup ratio. In contrast, TR and IR algorithms either need to compute an entire photoacoustic field [Eqs. (69 )–(73)] or need to update the image reconstruction model (Fig. 20) step by step, thus offering limited acceleration performance.

5.2.2 Memory footprint

Memory footprint is also a critical indicator for measuring the performance of an algorithm. Generally, the memory footprint of an image reconstruction algorithm in PACT depends on the size of the image to be reconstructed $N_{x} \times N_{y} \times N_{z}$ and the size of the measured photoacoustic signals $M \times K$ ( $M$ : detector number, $K$ : signal sampling length). For DAS, FBP, SE, and TR algorithms, the memory footprint can be approximately estimated as ${Memory}_{1} \propto N_{x} N_{y} N_{z} + M K .$ (125)

Similarly, the memory footprint of IR algorithms can be approximately estimated to be ${Memory}_{2} \propto N_{x} N_{y} N_{z} M K + M K,$ (126)where $N_{x} N_{y} N_{z} M K$ is the size of the system matrix $A$ in Eq. (86).

According to Eqs. (125)–(126), back-projection-type algorithms, such as DAS and FBP, require the least amount of memory, while IR algorithms require the most memory. Moreover, back-projection-type algorithms can reconstruct only a portion of an imaging region (i.e., ROI) rather than the whole region, which can further reduce the amount of memory needed. IR algorithms can also achieve direct image reconstruction of ROIs but the memory footprint is still high due to the extremely large size of the system matrix $A$ , which is sparse and can be compactly represented to reduce the memory footprint^[36]. In contrast to that of conventional image reconstruction algorithms, the memory footprint of a DL algorithm depends on its network structure. In general, the memory footprint of non-iterative DL algorithms is greater than that of conventional non-iterative algorithms, but less than that of IR algorithms. The memory footprint of learned IR algorithms is comparable to that of conventional IR algorithms and is also very high. Figure 51(b) shows a qualitative comparison of the memory footprints of different image reconstruction algorithms.

6 Challenges and Discussion

The six classes of algorithms reviewed in this paper, namely, DAS, FBP, SE, TR, IR, and DL, can achieve high-quality image reconstruction under ideal imaging conditions. However, they may face challenges under non-ideal imaging conditions, especially in the following scenarios: (1) the bandwidths and aperture sizes of the detectors used for imaging are limited and finite; (2) the elements and view angle of the detector array used for imaging are sparse and limited; and (3) acoustic media are strongly heterogeneous.

The bandwidth and aperture size of an individual ultrasound detector in a detector array are important for high-quality photoacoustic image reconstruction. An ideal detector should have infinite bandwidth and a point-like aperture to respond to all frequency contents of a photoacoustic signal from all directions. However, practical ultrasound detectors always have limited bandwidths and finite aperture sizes, which distort measured photoacoustic signals and blur reconstructed images. Researchers typically first measure the impulse response of a detector and then couple it to IR algorithms^[37,39] or use deconvolution approaches^{[291,292,312]} to remove the impacts of detector bandwidth and aperture size. However, these methods lead to solving optimization problems, which are typically slow, ill-posed, and sensitive to noise. How to build more efficient models to compensate for the impact of detector bandwidth and aperture size is a subject worth studying.

The element number and view angle of a detector array are important for high-quality photoacoustic image reconstruction. Ideally, the element number of a detector array should be large enough to satisfy the spatial Nyquist sampling criteria for perfect sampling, and the view angle should be 4π steradian (full 3D view) to record complete photoacoustic signals in 3D space. However, practical detector arrays such as linear and planar arrays typically have a limited number of elements and a limited view angle, which eventually results in the challenging problem of image reconstruction from sparse-view and limited-view projections. IR and DL are two classes of algorithms that are commonly used to address image reconstruction problems in these scenarios^{[211,276,278,313]}. However, these methods typically involve intensive calculations and/or require huge amounts of training data that are difficult to obtain in practice. How to develop more accurate and efficient algorithms to better address the image reconstruction problem in sparse-view and limited-view imaging is also worth studying.

The acoustic homogeneity of a medium is also important for high-quality photoacoustic image reconstruction. Most current image reconstruction algorithms used in PACT assume that the acoustic media are homogeneous. However, this does not hold in biological tissues, especially when strong heterogeneities, such as bones and air cavities, are present^[307]. Current methods for image reconstruction in heterogeneous tissues include half-time^[314] or partial-time^[315] reconstruction, autofocus reconstruction by optimizing the SOS^[316], joint reconstruction of optical absorption and SOS^[317], and ultrasound-guided adaptive reconstruction^[296]. However, these methods can only mitigate the acoustic heterogeneity problem to a certain extent and have limited improvements in reconstruction accuracy, especially in complex imaging scenarios such as transcranial imaging. An alternative method is full-waveform-based algorithms, such as TR^[170] and IR^[41], which are based on solutions to exact photoacoustic wave equations. These algorithms can consider the acoustic properties of the media such as SOS, absorption, dispersion, and density, and achieve accurate image reconstruction in acoustically heterogeneous media. However, these algorithms are not widely used thus far, especially in practical applications. The main reason is that they require prior information on media such as SOS maps, which can be measured using other imaging techniques such as ultrasound CT^[318–320] but at the expense of introducing additional imaging facilities and computing resources. Therefore, how to better tackle the acoustic heterogeneity problem and achieve high-quality image reconstruction deserves further investigation.

In addition to image reconstruction quality, image reconstruction speed is also important for an algorithm. Currently, many algorithms such as DAS, FBP, SE, and non-iterative DL algorithms can achieve real-time reconstruction when accelerated with GPUs^{[33,104,152,279,311]}, which, however, is a great challenge for IR-based algorithms. The IR algorithms in PACT need to compute a large-scale system matrix $A$ and perform image reconstruction iteratively. Therefore, they require huge amounts of computational resources, which makes real-time reconstruction difficult. This is especially true for 3D IR reconstruction, in which the computational time is typically on the order of hours^[42]. DL-based IR reconstruction has been reported to help reduce the number of iterations^[45] and improve computational efficiency^[57]. Nevertheless, developing new strategies to improve the computational speed of IR algorithms and reduce the memory footprint still has important practical significance.

DL-based image reconstruction has superior performance over conventional image reconstruction algorithms and is an emerging technique in PACT. It has been used to solve a variety of non-ideal image reconstruction problems in PACT, such as detector bandwidth expansion and IR acceleration. Nonetheless, DL-based image reconstruction faces multiple challenges. First, the performance of DL methods heavily relies on the quality and quantity of training data. The training data are usually required to be paired, and the acquisition of large amounts of paired data is difficult in practice. Training neural networks using simulated datasets and/or small-scale experimental datasets may limit the performance of the networks in terms of reconstruction accuracy, robustness, and generalizability in practical applications. Second, reference images such as full-bandwidth photoacoustic images for network training in DL methods are often difficult to obtain. In these situations, networks can only be trained using simulated datasets. Since experimental conditions are typically much more complex than simulations, training a network using only simulated datasets leads to degraded reconstruction performance and poor robustness. Third, there are currently few publicly available datasets for the comparison of the performances of different DL methods. Using private datasets to compare different DL methods may lead to biased conclusions. Finally, DL-based image reconstruction methods in PACT lack interpretability, which is also a long-standing problem in the field of DL. In response to the above issues, it is necessary to research the following aspects: (1) building large-scale publicly available datasets for the construction of networks with good generalization properties and performance comparisons of different networks; (2) developing unsupervised DL-based reconstruction methods that do not need paired data for training; (3) developing more powerful simulation platforms to generate reference data closely resembling real experimental data^[321]; and (4) developing physics-informed networks to improve the interpretability of DL methods.

One important goal of image reconstruction from photoacoustic projections is disease diagnosis. The current diagnostic route from photoacoustic signals to photoacoustic images to knowledge is straightforward but can potentially lead to information loss and distortion during image reconstruction, which can decrease the accuracy of disease diagnosis. Artificial intelligence (AI) can mine knowledge from vast amounts of data and offer opportunities for disease diagnosis directly from raw data^[322]. The use of raw-data-based diagnostic technology instead of image-based diagnostic technology may facilitate the development of fully automated scanning and diagnostic procedures and become an important direction in the future.

This review mainly discusses advanced image reconstruction algorithms in PAT for high-performance imaging. In addition to advanced algorithms, emerging technologies such as metamaterials and nanomaterials can be leveraged to enhance the imaging performance of PAT. For example, metamaterials can be appropriately designed to influence the propagation of electromagnetic waves or acoustic waves in a manner not observed in bulk materials. By tailoring the amplitude, phase, or polarization of light, optical metamaterials can emulate conventional optical lenses, waveplates, or holograms to achieve multidimensional light-field modulation in PAT^[323–329]. Metamaterials can also be designed for photoacoustic signal sensing to address the sensitivity and bandwidth problem of conventional piezoelectric sensors^[330–333] and provide a sensitive method for ultrasound detection in PAT.

Nanomaterials that often exhibit unique optical, electronic, or physicochemical properties are another emerging technology that can be used to enhance the imaging performance of PAT. Nanoparticles are a common type of nanomaterial used as molecular contrast agents in PAT^[334] and can be generally divided into two categories: inorganic nanoparticles and organic nanoparticles. Inorganic nanoparticles, such as noble metal nanoparticles, iron oxide nanoparticles, semiconductor nanoparticles, magnetic nanoparticles, and carbon nanoparticles, possess versatile properties and have been investigated for various applications in biomedical imaging and therapeutics^[335,336]. Among them, noble metal nanoparticles are the most commonly used contrast agents in PAT due to their high optical absorption cross section and biocompatibility^[337]. Gold nanoparticles, such as gold nanospheres, nanorods, nanoshells, and nanostars, are particularly popular because of their high extinction coefficient in the near-infrared range and their high photoacoustic conversion efficiency^{[334,338,339]}. In addition to inorganic nanoparticles, organic nanoparticles such as polymer nanoparticles and encapsulations are also increasingly being used in PAT^{[334,340,341]}. Polymer nanoparticles usually have a molar extinction coefficient lower than that of gold nanoparticles but higher than that of small-molecule dyes and have an absorption peak in the near-infrared range. They typically possess high structural and functional flexibility, high biodegradability and biocompatibility, and have a high translational potential. Furthermore, polymer nanoparticles usually have stable surfaces that can be modified with specific targeting and therapeutic moieties for molecular imaging and targeted therapy, which enables the imaging of biological events and functionalities at multiple scales.

7 Conclusions

In this work, we systemically review the image formation problem in PACT over the past three decades, including the forward problem, conventional reconstruction methods, DL-based reconstruction methods, and the comparisons of different reconstruction methods.

The photoacoustic forward problem involves multiple physical processes, including photoacoustic signal generation, signal propagation, and signal detection. The generation of photoacoustic signals is based on the thermoelastic effect and should satisfy the conditions of thermal confinement and stress confinement. The propagation of photoacoustic signals is governed by the photoacoustic wave equation, which can be analytically solved by the method of Green’s functions and numerically visualized by the k-Wave toolbox. The detection of photoacoustic signals typically involves the use of ultrasound detector arrays, whose properties such as detector bandwidth, aperture size, element number, and view angle impact detected photoacoustic signals and final images. The photoacoustic forward problem can be described by the spherical Radon transform. The task of image reconstruction in PACT can be regarded as finding the inverse of the spherical Radon transform.

Conventional reconstruction methods in PACT typically achieve image formation via analytical derivations or numerical computations and mainly include five classes of algorithms, i.e., DAS, FBP, SE, TR, and IR. The DAS-type algorithms reconstruct an image by summing the delayed photoacoustic signals of each detector and have many variants, such as native DAS, DMAS, SLSC, and MV. The DAS-type algorithms are simple in principle and fast in speed but lack mathematical rigor. FBP-type algorithms reconstruct an image by back-projecting the filtered photoacoustic signals of each detector to the image domain and can achieve accurate image reconstruction under ideal conditions. They are rigorously deduced from the photoacoustic wave equation and can be regarded as advanced versions of DAS. The SE-type algorithms reconstruct an image by representing it using a mathematical series. They are computationally efficient and super-fast for special detection geometries such as the planar geometry. The TR-type algorithms reconstruct an image by running a forward numerical acoustic propagation model backward and can be used for any closed detection geometry. They are well suited for image reconstruction in heterogeneous media since the acoustic propagation model can couple acoustic properties of media, such as SOS, density, dispersion, and absorption of a medium. The IR-type algorithms reconstruct an image by minimizing the energy error between measured projection data and estimated projection data. They can incorporate various physical factors into reconstruction models, suppress image artifacts caused by incomplete projections, and are well-suited for non-ideal imaging scenarios.

DL achieves tomographic image reconstruction in PACT by learning the photoacoustic imaging model from big data using designed networks and thus is network-based, data-driven, and learning-oriented. DL techniques can be applied to photoacoustic image reconstruction from multiple aspects, including signal preprocessing in the data domain, image postprocessing in the image domain, hybrid-domain processing, direct signal-to-image reconstruction, and IR reconstruction learning. DL-based signal preprocessing employs a network to enhance raw photoacoustic projection data in the data domain and delivers the enhanced projection data to conventional reconstruction algorithms as input. DL-based image postprocessing employs a network to enhance photoacoustic images output from conventional reconstruction algorithms for artifacts and noise suppression. DL-based hybrid-domain processing employs a network to make full use of the information of raw projection data and reconstructed images and can yield images with enhanced quality. In contrast, DL-based direct reconstruction employs a network to form photoacoustic images directly from raw projection data and does not involve any conventional image reconstruction algorithms. Training such a direct transformation network, however, requires large-scale datasets. DL-based IR reconstruction employs a network to learn the reconstruction process of conventional IR algorithms and can also achieve direct signal-to-image reconstruction. This method typically has better robustness and generalizability but consumes more time and memory than other DL methods.

The image reconstruction algorithms discussed in this review have distinct characteristics. In terms of image reconstruction quality, most image reconstruction algorithms work well under ideal imaging conditions and can yield high-quality images. However, they behave differently under non-ideal imaging scenarios, such as sparse-view imaging, limited-view imaging, imaging with bandwidth-limited and/or finite-aperture detectors, and imaging in the presence of acoustic heterogeneity. Generally, the DAS, FBP, and SE algorithms have relatively poor performance in these scenarios. The TR and IR algorithms can yield improved results because they can incorporate the physical models of an imaging system, such as the SOS of media, the EIR, and the SIR of a detector. The DL algorithms can produce surprisingly good results that are unattainable for conventional algorithms due to their network-based data-driven nature. In terms of image reconstruction speed and memory footprint, DAS, FBP, SE, TR, and non-iterative DL algorithms are generally faster than IR and iterative DL methods. TR, IR, and iterative DL algorithms generally require more memory than DAS, FBP, and SE methods.

This review is expected to help general readers better understand the image formation problem in PACT, provide a self-contained reference guide for beginners as well as specialists, and facilitate further developments and applications of novel image reconstruction algorithms.

Category: Review Articles

Received: Jul. 8, 2024

Accepted: Aug. 28, 2024

Published Online: Sep. 29, 2024

The Author Email: Yang Meng (yangmeng_pumch@126.com), Wang Sheng (iamsheng2020@ustc.edu.cn), Tian Jie (tian@ieee.org)

DOI:10.3788/PI.2024.R06

CSTR:32396.14.PI.2024.R06

Table 1. Abbreviations Used in this Review

Table 1. Abbreviations Used in this Review

Table 2. Symbols Used in this Review

Table 2. Symbols Used in this Review

Table 3. Representative DAS-Type Algorithms Used for Image Reconstruction in PACT

Table 3. Representative DAS-Type Algorithms Used for Image Reconstruction in PACT

Table 4. Representative Work on FBP-Based Image Reconstruction in PACT

Table 4. Representative Work on FBP-Based Image Reconstruction in PACT

Table 5. Representative Work on SE-Based Fast Image Reconstruction in PACT

Table 5. Representative Work on SE-Based Fast Image Reconstruction in PACT

Table 6. Representative Work on SE-Based Image Reconstruction in PACT

Table 6. Representative Work on SE-Based Image Reconstruction in PACT

Table 7. Representative Work on TR-Based Image Reconstruction in PACT

Table 7. Representative Work on TR-Based Image Reconstruction in PACT

Table 8. Representative Work on IR-Based Image Reconstruction in PACT

Table 8. Representative Work on IR-Based Image Reconstruction in PACT

Table 9. Representative Work on DL-Based Signal Preprocessing in PACT

Table 9. Representative Work on DL-Based Signal Preprocessing in PACT

Table 10. Representative Work on DL-Based Image Postprocessing in PACT

Table 10. Representative Work on DL-Based Image Postprocessing in PACT

Table 11. Representative Work on DL-Based Hybrid-Domain Processing in PACT

Table 11. Representative Work on DL-Based Hybrid-Domain Processing in PACT

Table 12. Representative Work on DL-Based IR in PACT

Table 12. Representative Work on DL-Based IR in PACT

Table 13. Representative Work on DL-Based Direct Signal-to-Image Reconstruction in PACT

Table 13. Representative Work on DL-Based Direct Signal-to-Image Reconstruction in PACT

Table 14. Comparison of Different Image Reconstruction Algorithms in PACTa

Table 14. Comparison of Different Image Reconstruction Algorithms in PACTa

微信扫一扫：分享

Table 14. Comparison of Different Image Reconstruction Algorithms in PACT^a

Table 14. Comparison of Different Image Reconstruction Algorithms in PACT^a