Three-dimensional (3D) integral imaging (InIm) is an active research topic in underwater optical imaging and sensing. 3D InIm is a passive method for visualizing the 3D information of a scene; it is attractive for imaging in degraded environments and can benefit short- to long-range applications. In this paper, we present an overview of previously published works on underwater optical imaging and sensing systems that showcase improved visualization and detection capabilities in turbid water using 3D InIm. Images or video sequences captured by a set of image sensors are digitally reconstructed using a multidimensional InIm computational reconstruction algorithm for tasks such as visualizing the 3D information of a scene at a particular depth, detecting signals, or reducing the effects of turbidity and partial occlusion. 3D InIm sensing and dedicated computational approaches such as statistical image processing, correlation-based filtering, neural networks, and deep learning make it possible to improve the recovery of information and the detection of scenes degraded by turbidity and partial occlusion.
1. Introduction

Underwater imaging is challenging because visibility is reduced by the scattering and absorption of light by suspended particles in water. As light propagates in water, it scatters in different directions, which degrades the captured images. Light propagation in water is also wavelength-dependent, as different wavelengths experience different absorption levels, thus limiting the propagation distance and the depth at which objects can be visualized. To relate the effects of attenuation to distance, Beer's law may be used to model the attenuation of light in a scattering medium, that is, $I_z = I_0 e^{-\sigma z}$ [1], where $I_0$ is the intensity of light in free space, $I_z$ is the intensity of light after propagating a distance $z$ in the scattering medium, and $\sigma$ is Beer's coefficient. A mathematical model that may be used to compensate for the scattering and low visibility in underwater images is the Jaffe model[2]. This model summarizes the components of scattering that contribute to image degradation in a scattering medium: the direct component, the forward-scattered component, and the backscattered component. Forward scattering occurs when the light reflected from the object is scattered at a small angle by particles, whereas the backscattered component arises when light that reaches the camera sensor was not reflected from the object. The direct component occurs when light reflected from the object reaches the camera sensor without any scattering.
One popular underwater sensing technology is light detection and ranging (LiDAR), an active device that uses laser pulses to measure the distance or range of an object[3]. In underwater applications, bathymetric LiDAR is the most common LiDAR sensing technology[3–7]. Bathymetric LiDAR is usually mounted on an aircraft or a boat and uses a green laser to penetrate deep into the water. However, as the laser propagates deeper underwater, it loses energy due to scattering and absorption. Thus, active sensors can benefit from powerful light sources for longer range operations. However, active sensors such as LiDAR may become more complex and expensive as they require lasers and time-gated detectors.
One promising multi-view imaging technique is three-dimensional (3D) integral imaging (InIm)[8], which captures images from different perspectives of the scene to reconstruct the scene at a particular depth plane[9]. The technique is passive and well suited for imaging in degraded environments. 3D InIm has been used in underwater imaging applications for optical signal detection[10–16], object visualization[17–21], object classification[22], and object detection[16]. These works have implemented 3D InIm systems in various modalities, such as with and without lenses, with diffusers, and with polarimetric systems. In addition, computational algorithms such as the convolutional neural network (CNN)[23], generative adversarial network (GAN)[24], recurrent neural network (RNN)[25], statistical image processing[26], statistical optics[27], active polarization descattering[28], and correlation-based filters[29–31] were used to overcome the effects of turbidity and/or partial occlusion. These systems have potential in many underwater applications, including automated detection, classification of objects in turbidity, and marine research.
This paper presents an overview of recent advances in underwater optical imaging and sensing systems using 3D InIm in turbid conditions. The paper is organized into five different sections. Section 1 describes underwater imaging and 3D InIm in underwater imaging applications, and briefly discusses LiDAR. Section 2 describes the basics of 3D InIm and reconstruction algorithms, lensless imaging, and polarimetric InIm. Section 3 summarizes various applications of 3D InIm such as underwater object visualization[18,19,21], underwater signal detection[12–16], underwater object classification[22], and underwater object detection[16]. Section 4 summarizes the comparisons of the published works discussed in Sec. 3. Section 5 concludes the paper.
2. Methodology
2.1. Three-Dimensional Integral Imaging
3D InIm is a multidimensional imaging technique that can reconstruct a 3D scene at a user-defined depth using the angular and intensity information of the captured perspective images, called two-dimensional (2D) elemental images[8–13,16–22]. Conventional 2D imaging, which only collects the intensity of a scene, superimposes the background and the object. In 3D InIm, however, the angular information captured in the perspective images allows 3D reconstruction at different depths in the scene, giving the observer a 3D view of the scene. By reconstructing the object at the depth of interest, 3D InIm can segment the object out of the background. This technique is optimal in the maximum likelihood sense under additive Gaussian noise for read-noise-dominant images[32,33], provides a higher signal-to-noise ratio (SNR), and has depth-sectioning capabilities to segment in-focus planes from out-of-focus planes[9]. Additionally, 3D InIm can visualize through partial obscurations by using parallax to capture non-occluded perspective images and may remedy moderate scattering conditions[13,16,20–22]. During the image acquisition stage [see Fig. 1(a)], the optical rays are projected onto 2D elemental images and recorded using a camera array or a single camera on a moving platform. InIm systems can be implemented using a one-dimensional (1D) array of cameras[12,14,15] or a 2D array of cameras[10,11,13,16–22]. With a 1D array, we only have parallax in one direction (i.e., the $x$-axis or $y$-axis), whereas with a 2D array we have parallax in two directions (i.e., both the $x$-axis and $y$-axis). When using a single camera sensor or a camera array, there are tradeoffs to consider and scenarios for which each approach is appropriate. Camera array-based image acquisition is useful in both dynamic environments (e.g., turbulence, fast-moving objects, and signal detection) and static environments (i.e., stationary objects). All cameras in the array are synchronized to record all perspective images or videos simultaneously, resulting in faster data acquisition compared to a single camera on a moving platform. However, camera calibration[34] is required as a preprocessing step for 3D InIm reconstruction to correct any possible misalignment. Another drawback is that the physical size of the cameras limits the minimum possible camera pitch, that is, how closely the cameras can be placed next to each other. A single camera sensor gives the user flexibility in choosing the camera pitches and the number of perspective images. However, this approach may only be suitable in static environments.
Figure 1.3D integral imaging. (a) Pickup stage[12]. (b) Reconstruction stage[12]. (c) 3D underwater computational reconstruction[18]. Reprinted with permissions from Refs. [12] and [18].
To digitally reconstruct at a given range (depth) [see Fig. 1(b)], we backpropagate the optical rays through a virtual pinhole array. The other depth planes are out of focus when the scene is reconstructed at a particular depth. Mathematically, 3D InIm reconstruction of a scene[21] [see Eq. (1)] or of video sequences converted to image frames[13] at time frame $t$ [see Eq. (2)] is a superposition of transversally shifted elemental images depth sliced at a user-defined plane $z$:

$$I(x, y; z) = \frac{1}{O(x, y)} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} E_{m,n}\!\left(x - m\,\frac{N_x\, p_x\, f}{c_x\, z},\; y - n\,\frac{N_y\, p_y\, f}{c_y\, z}\right), \tag{1}$$

$$I(x, y; z, t) = \frac{1}{O(x, y, t)} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} E_{m,n}\!\left(x - m\,\frac{N_x\, p_x\, f}{c_x\, z},\; y - n\,\frac{N_y\, p_y\, f}{c_y\, z};\, t\right), \tag{2}$$

where $I(x, y; z)$ is the intensity of the reconstructed 3D image at pixel index $(x, y)$, and $I(x, y; z, t)$ is the reconstructed 3D image at pixel index $(x, y)$ at time frame $t$. $E_{m,n}$ is the 2D elemental image at index $(m, n)$, $O(x, y)$ and $O(x, y, t)$ are the overlapping number matrices, and $M$ and $N$ are the numbers of elemental images acquired along the $x$-axis and $y$-axis, respectively. $N_x$ and $N_y$ are the total numbers of pixels of each 2D elemental image, $p_x$ and $p_y$ are the camera pitches, and $c_x$ and $c_y$ are the horizontal and vertical image sensor sizes, respectively. $f$ is the lens focal length. In free-space imaging, $z$ is the distance to the actual object plane. According to ray optics in Fig. 1(c), when light rays reach the air–water interface, the rays refract in water, so the effective reconstruction distance is shorter than the physical distance. As a post-processing step, we modify $z$ in Eqs. (1) and (2) to account for the refractive index of water: $z = z_g + z_w/n_w$ [17], where $z_g$, a constant value, is the distance from the camera lens to the aquarium tank glass, $z_w$ is the object distance in water, and $n_w$ is the refractive index of water ($n_w \approx 1.33$).
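To make the reconstruction concrete, the Python sketch below implements Eq. (1) as a shift-and-sum operation. It assumes a regularly spaced camera grid, grayscale elemental images, and integer pixel shifts; the function and variable names are illustrative and not taken from the cited works. For underwater scenes, the depth argument would be the refraction-corrected $z = z_g + z_w/n_w$ described above.

```python
# A minimal shift-and-sum 3D InIm reconstruction sketch of Eq. (1),
# assuming a regular M x N camera grid and integer pixel shifts.
import numpy as np

def inim_reconstruct(E, z, f, px, py, cx, cy):
    """E: (M, N, Ny, Nx) stack of elemental images; z: reconstruction depth;
    f: focal length; px, py: camera pitches; cx, cy: sensor sizes."""
    M, N, Ny, Nx = E.shape
    recon = np.zeros((Ny, Nx))
    overlap = np.zeros((Ny, Nx))  # overlapping number matrix O(x, y)
    dx = Nx * px * f / (cx * z)   # per-camera pixel shift along x
    dy = Ny * py * f / (cy * z)   # per-camera pixel shift along y
    for m in range(M):
        for n in range(N):
            sx, sy = int(round(m * dx)), int(round(n * dy))
            if sx >= Nx or sy >= Ny:
                continue  # shifted image falls entirely outside the frame
            recon[sy:, sx:] += E[m, n, :Ny - sy, :Nx - sx]
            overlap[sy:, sx:] += 1
    return recon / np.maximum(overlap, 1)  # average overlapping pixels
```

Objects located at the chosen depth plane add coherently across the shifted elemental images, while out-of-plane content averages out, which is the depth-sectioning behavior described above.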
2.2. Lensless Imaging
Advances in optical instruments have given users access to better camera sensors for capturing high-quality images. However, despite these improvements, camera sensors have a limited field of view due to their sensor size and focal length. Capturing more photons would require a bigger, bulkier lens, which is expensive. An alternative solution is a lensless imaging system, which is inexpensive, less bulky, and more compact, and has a larger field of view than its lens-based counterpart[35]. In Ref. [15], a 1D lensless camera array underwater sensing system using diffusers was proposed to classify pseudorandom patterns in turbidity and partial occlusion. This approach was shown to perform well compared with its lens-based counterpart for underwater optical signal detection[14].
2.3. Underwater Polarimetric Integral Imaging
Polarimetric imaging is a technique that captures the polarimetric information of a scene. This introduces an additional degree of information, enriching the data for analysis and enabling a more comprehensive understanding of the observed scene or object. The polarimetric information of a scene contains the surface features of an object as well as the scattered light, which can enhance the contrast of an object in scattering media[36]. The polarimetric information can be captured either with a polarization camera or by placing a polarization filter in front of a camera sensor. Polarimetric imaging can also be performed using either active illumination (i.e., a light source) or passive illumination (i.e., natural light), depending on the environment. In Ref. [12], an active polarimetric signal detection system using polarization-difference imaging[37,38] was proposed to transmit a polarized optical signal in turbid water and reduce the effects of turbidity. Polarization-difference imaging was applied to the captured videos, which were then processed using 3D InIm reconstruction [see Eq. (2)]. In Ref. [20], polarization-based image recovery using active polarization descattering and 3D InIm reconstruction [see Eq. (1)] was proposed to reduce the effects of turbidity for underwater scenes with partial occlusion.
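As an illustration of the polarization-difference step applied before 3D InIm reconstruction, a minimal sketch is given below; the variable names are illustrative, not those of Ref. [12].

```python
# A minimal sketch of polarization-difference imaging: subtracting two
# co-registered frames taken behind orthogonal polarizers suppresses the
# (largely unpolarized) scattered background and keeps the polarized signal.
import numpy as np

def polarization_difference(i_parallel, i_perpendicular):
    """i_parallel, i_perpendicular: co-registered frames from the two
    orthogonally polarized channels (e.g., the two camera arrays)."""
    return i_parallel.astype(float) - i_perpendicular.astype(float)
```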
3. Applications
In Sec. 3.1, we briefly review object visualization using statistical image processing (Sec. 3.1.1), peplography (Sec. 3.1.2), and deep learning using GAN (Sec. 3.1.3). In Sec. 3.2, we briefly review signal detection using correlation-based filtering (Sec. 3.2.1) and deep learning using (1) CNN combined with RNN (Sec. 3.2.2) and (2) CNN (Secs. 3.2.3 and 3.2.4). In Sec. 3.3, we briefly review object classification using a neural network (Sec. 3.3.1) and a dual-purpose system using deep learning (Sec. 3.3.2) with (1) CNN and (2) CNN combined with RNN.
3.1. Underwater Object Visualization
Object visualization under degradation was one of the first underwater applications reported using 3D InIm[17]. Because objects are difficult to visualize under high turbidity, image restoration techniques were used along with 3D InIm to mitigate the effects of turbidity. We briefly review InIm underwater object visualization using statistical image processing[18], peplography[19], and physics-based deep learning[21]. For visualization tasks, images are recorded and 3D InIm reconstruction is performed using Eq. (1).
3.1.1. Statistical image processing
The first report on 3D InIm reconstruction of underwater objects in turbidity was presented in Ref. [18]. The multi-class underwater scene in Fig. 2(a) was degraded by turbidity, so the objects could not be visualized directly; thus, the authors used statistical image processing algorithms[26] along with 3D InIm reconstruction for improved visualization. The authors assumed that turbidity was caused by light scattering, which they statistically modeled as a Gaussian distribution. While this statistical model of turbidity may not be very precise, it allows statistical approaches to be applied to reduce the effects of turbidity. The authors used maximum likelihood estimation[39], gamma correction, histogram matching, and histogram equalization to enhance the visibility of the objects. Finally, the preprocessed grayscale perspective images were combined using 3D InIm reconstruction. Figure 2 shows an example of this approach, where one object is reconstructed in focus at a particular depth plane and the other planes are blurred out.
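To make the preprocessing concrete, the sketch below shows two of the enhancement steps named above (gamma correction and histogram equalization) applied to a normalized grayscale image; the gamma value and bin count are illustrative, not those of Ref. [18].

```python
# A minimal sketch of turbidity preprocessing: power-law (gamma) correction
# followed by histogram equalization, on a grayscale image in [0, 1].
import numpy as np

def gamma_correct(img, gamma=0.6):
    """Brighten a [0, 1] grayscale image with a power-law mapping."""
    return np.clip(img, 0.0, 1.0) ** gamma

def histogram_equalize(img, bins=256):
    """Spread the intensity histogram to improve contrast in turbid frames."""
    hist, edges = np.histogram(img.ravel(), bins=bins, range=(0.0, 1.0))
    cdf = np.cumsum(hist).astype(float)
    cdf /= cdf[-1]  # normalized cumulative distribution function
    return np.interp(img.ravel(), edges[:-1], cdf).reshape(img.shape)

def preprocess(img):
    return histogram_equalize(gamma_correct(img))
```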
Figure 2.(a) Clear water 3D scene used in the experiments. (b) The scene in turbid water. (c) Diagram of the 3D underwater imaging system in turbid water. (d)–(g) 3D integral imaging reconstructions: (d) bug, (e) treasure, (f) small fish, and (g) large fish, each focused at its respective depth. Reprinted with permission from Ref. [18].
As described in Sec. 3.1.1, the authors of Ref. [18] used statistical image processing algorithms[26], including gamma correction, histogram equalization, and histogram stretching, that are subject to estimation errors and can therefore produce artificial color pixel intensities. To alleviate this issue, the authors in Ref. [19] proposed peplography (see Fig. 3), a passive technique that relies on detecting ballistic photons (photons that travel through scattering media without significant scattering) and filtering out scattered photons using statistical algorithms[27]. This technique combines photon counting with statistical algorithms that estimate the turbid medium and extract the ballistic photons for image reconstruction. This helps to distinguish ballistic photons from scattered photons, allowing a less noisy visualization of objects. The detected photons are then used to reconstruct a 3D image of the scene using InIm, which involves capturing multiple perspectives of the scene and processing them to create a 3D representation of the object. However, the expected number of photons for ballistic photon counting is chosen manually due to fluctuations in their intensity. If the expected value is too low, the ballistic photons may not be captured to recover the 3D object information and restore the scene in scattering media. 3D InIm reconstruction also uses averaging, as shown in Eq. (1), to reduce noise.
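A simplified sketch of the peplography idea is given below: the scattering medium is approximated by a local mean (a simple stand-in for the statistical medium estimation of Ref. [19]), the residual is treated as ballistic light, and photons are drawn with a Poisson model. The parameter n_photons is the manually chosen expected photon number discussed above; all names are illustrative.

```python
# A simplified peplography-style sketch: estimate the scattering medium,
# subtract it, and photon-count the residual with a Poisson model.
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(0)

def peplography_slice(turbid_img, window=32, n_photons=10_000):
    # Local-mean estimate of the scattering medium (stand-in for the
    # Gaussian medium estimation used in peplography).
    medium = uniform_filter(turbid_img.astype(float), size=window)
    ballistic = np.clip(turbid_img - medium, 0.0, None)  # ballistic estimate
    # Poisson photon counting: brighter residual pixels emit more counts.
    rate = n_photons * ballistic / max(ballistic.sum(), 1e-12)
    return rng.poisson(rate)  # photon-counted ballistic image
```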
Figure 3.Flow chart of peplography for imaging in scattering media. Reprinted with permission from Ref. [19].
Deep learning is a computational technique that uses multilayer neural networks to learn representations of features directly from data[40]. Conventional deep learning-based approaches are purely data-driven; however, incorporating physical laws into deep learning models can improve their performance, interpretability, and reliability, as proposed in Ref. [21]. In that work, 3D InIm with a physics-informed cycle generative adversarial network (CycleGAN)[41,42] was proposed for image recovery. The network was trained on unpaired underwater datasets in turbid water with external lighting and tested at different turbidities with external lighting, with and without partial occlusion. The turbidity was varied over a range of Beer's coefficient values. The dark channel prior[43] and Bayesian optimization[44] were used to estimate the ground-truth degradation parameters, namely, the atmospheric light and the depth. The backbone of the approach is an encoder–decoder model, where the encoder takes the clean image as input, along with the estimated ground-truth parameters, and outputs a degraded image. A physical model is used to synthetically create degraded images using the same set of input parameters as the encoder, and the loss is computed between the output of the encoder and the synthetically generated images. The output of the encoder is then fed into a decoder to recover the clean images. Examples of synthetically generated degraded images of a light source and recovered clean images in turbid water, without and with occlusion, are shown in Figs. 4 and 5, respectively. By using a physical model, the deep learning algorithm is constrained to generate outputs that are physically consistent. The authors also used an unpaired training set, eliminating the need to collect large, paired datasets. Although physics-based deep learning was applied to turbidity, the method can be extended to other degraded environments (e.g., fog and smoke) using a suitable physical degradation model.
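The role of the physical model can be illustrated with the standard scattering image-formation equation used to synthesize degraded images. The sketch below assumes a per-pixel depth map and a scalar ambient light; the parameterization is illustrative and not necessarily the exact one of Ref. [21].

```python
# A minimal sketch of a physical degradation model: I = J*t + A*(1 - t),
# with transmission t = exp(-beta * d), where beta is Beer's coefficient,
# d is a per-pixel depth map, and A is the ambient (air)light.
import numpy as np

def degrade(clean, depth, beta, airlight):
    """clean: (H, W, 3) image in [0, 1]; depth: (H, W) map; returns the
    synthetically degraded image compared against the encoder output."""
    t = np.exp(-beta * depth)[..., None]  # per-pixel transmission
    return clean * t + airlight * (1.0 - t)
```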
Figure 4.(a) Sample clean image of a light source. (b), (c) Degraded images in highly turbid water. (d) Recovered clean image from (b). (e) Recovered clean image from (c). Turbidity is quantified by Beer's coefficient $\sigma$. Reprinted with permission from Ref. [21].
Figure 5.(a) Sample clean image of a light source. (b) 2D degraded central perspective image in turbidity, ambient light, and partial occlusion. (c) Recovered clean 2D image from (b). (d) Recovered clean 3D reconstructed image from (b). Turbidity is quantified by Beer's coefficient $\sigma$. Reprinted with permission from Ref. [21].
3.2. Underwater Optical Signal Detection

Another application of 3D InIm is optical underwater signal detection, which focuses on transmitting, receiving, and detecting bits in the presence of turbidity and ambient light. Thanks to improvements in computational algorithms, we can enhance the deteriorated signal to improve detection performance. In this section, we review signal detection papers covering correlation-based filters[12] and deep learning-based signal detection methods[13–15]. The optical signals are temporally encoded, so multi-perspective videos of the transmitted signal are recorded using a camera array. The video frames are then converted to image frames for 3D InIm reconstruction [see Eq. (2)][12,13,16] or fed directly into a deep learning model[14,15]. On the transmitter side, a light source such as an LED generates a binary optical signal coded with a Gold sequence[45]: a bit "1" is transmitted as the Gold code and a bit "0" as the flipped Gold code, with each chip expressed as an LED on or off state. At the receiver end, the transmitted signal is decoded by comparing the coded sequence to the received video sequence, outputting class conditional probabilities for "1," "0," and "idle" (i.e., neither "1" nor "0"). The "idle" class is included in Refs. [13–16] to better discriminate between "1" and "0."
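For illustration, a Gold code can be generated by XOR-ing a preferred pair of m-sequences produced by linear-feedback shift registers. The sketch below uses 5-stage registers (length-31 codes); the tap positions are illustrative and not necessarily those used in Refs. [13–16].

```python
# A minimal sketch of Gold-code generation from a preferred pair of
# m-sequences; bit "1" is sent as the code, bit "0" as its flipped version.
import numpy as np

def lfsr(taps, state, length):
    """Fibonacci LFSR; taps are 1-indexed register positions."""
    out = []
    for _ in range(length):
        out.append(state[-1])
        fb = 0
        for t in taps:
            fb ^= state[t - 1]
        state = [fb] + state[:-1]
    return np.array(out, dtype=np.uint8)

n = 2**5 - 1                                   # code length 31
m1 = lfsr([5, 2], [1, 0, 0, 0, 0], n)          # x^5 + x^2 + 1
m2 = lfsr([5, 4, 3, 2], [1, 0, 0, 0, 0], n)    # x^5 + x^4 + x^3 + x^2 + 1
gold = m1 ^ m2      # one code of the Gold family (shifts of m2 give others)
flipped = 1 - gold  # transmitted for bit "0"
```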
3.2.1. Active polarimetric InIm using correlation-based filtering
In Sec. 2.3, we briefly introduced underwater polarimetric InIm. An example was shown in Ref. [12], where the authors proposed an underwater single-shot polarization-difference InIm system for signal detection using four-dimensional (4D) nonlinear correlation[29,30]. The underwater scene was illuminated by an active polarized light source, and a beam splitter divided the transmitted signal into two beams. These beams were then captured by two orthogonally polarized camera arrays. The signals were processed using polarization-difference imaging and subsequently reconstructed using 3D InIm to create 4D reconstruction data $(x, y, z, t)$ over time. Lastly, a 4D nonlinear correlation filter was applied for signal detection, as depicted in Fig. 6. The system's field of view is limited by the camera array pitch, the size of the beam splitter, and the distance between the beam splitter and the camera arrays. An alternative setup would replace the beam splitter and the polarized camera arrays with a polarization camera sensor array equipped with polarization filters. Also, correlation-based filters are effective with smaller datasets, but they may not generalize as well as deep learning techniques.
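To illustrate the detection step, the sketch below implements a generic kth-law nonlinear correlation for arrays of any dimensionality; Ref. [12] applies such a correlation over the 4D $(x, y, z, t)$ reconstruction data. The nonlinearity parameter k is illustrative, with k = 1 reducing to linear matched filtering.

```python
# A minimal sketch of kth-law nonlinear correlation between a reference
# volume r and a test volume s of the same shape.
import numpy as np

def nonlinear_correlation(r, s, k=0.3):
    R, S = np.fft.fftn(r), np.fft.fftn(s)
    cross = R * np.conj(S)                      # cross-power spectrum
    nl = np.abs(cross) ** k * np.exp(1j * np.angle(cross))
    return np.abs(np.fft.ifftn(nl))             # sharp peak indicates a match
```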
Figure 6.Flow chart of the underwater optical signal detection pipeline using 4D nonlinear correlation. Reprinted with permission from Ref. [12].
3.2.2. InIm with deep learning for underwater applications
In this section, we consider a deep learning-based approach for optical signal detection. Partial occlusion was not considered in Ref. [12] but was considered in Ref. [13], in addition to turbidity. In Ref. [13], the authors proposed a deep learning-based signal detection system using a convolutional neural network with bidirectional long short-term memory (CNN-BiLSTM)[46–48]. The classifier learns the spatial information of the data using a pretrained CNN, GoogleNet[49], and learns the temporal information using the BiLSTM. The training data was recorded in clear water without occlusion and evaluated on testing data in clear and turbid water with occlusion, over a range of Beer's coefficient values. After the InIm reconstruction step, the authors performed sliding-window-based temporal segmentation to extract each possible video sequence from the transmitted data stream. After segmentation, the sequences were input to the CNN-BiLSTM model, which outputs a classification score. Last, a threshold of "0" was applied to the classification score to classify the transmitted video sequences (see Fig. 7). The authors assumed the depth information (light source range) of the object was known a priori for computational efficiency; however, an unknown source range can be accommodated with additional computation. The proposed system, which involves camera array calibration, 3D InIm reconstruction, depth estimation, and classification in separate stages, is time-consuming and may be prone to reconstruction errors that can influence the detection results. With an end-to-end deep learning architecture, none of these intermediate steps would be required; thus, the likelihood of errors would be minimized and the computational time improved.
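The sliding-window temporal segmentation can be sketched as follows, where every contiguous block of frames from the reconstructed stream becomes a candidate sequence for the classifier; the window length and stride are illustrative.

```python
# A minimal sketch of sliding-window temporal segmentation over a
# (T, H, W) stack of reconstructed frames.
def sliding_windows(frames, window_len, stride=1):
    T = frames.shape[0]
    for start in range(0, T - window_len + 1, stride):
        yield start, frames[start:start + window_len]

# Usage sketch: score every candidate sequence with a trained classifier.
# scores = [(s, classifier(seq)) for s, seq in sliding_windows(frames, 31)]
```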
Figure 7.Flow chart of CNN-BiLSTM-based signal detection using sliding window-based classification. Reprinted with permission from Ref. [13].
3.2.3. End-to-end integrated InIm signal detection

In contrast to Ref. [13], the approach in Ref. [14] is an end-to-end integrated architecture that trains on the spatial and temporal features of different camera perspectives without intermediate steps such as camera calibration, depth estimation, and 3D InIm reconstruction of the light source. Thus, several improvements were made at the receiver end to increase speed and minimize hardware complexity without compromising signal detection performance (see Fig. 8). The authors used a camera array to capture multi-perspective videos of the temporally encoded optical signal for better longitudinal depth resolution[50]. The multi-perspective videos were fed as inputs to a one-dimensional InIm convolutional neural network (1DInImCNN) deep learning model. This minimized the computational load by eliminating the intermediate depth estimation and 3D InIm reconstruction. Furthermore, the spatial and temporal features were learned simultaneously rather than in separate stages. The training data was collected in clear and turbid water without occlusion, and testing was performed in clear and turbid water with occlusion, over a range of Beer's coefficient values. The light source was placed at arbitrary locations, assuming that the light source range was unknown a priori. In summary, the computational cost, measured in floating point operations, and the detection performance, measured by the Matthews correlation coefficient (MCC)[51], show that the end-to-end integrated 1DInImCNN is substantially better than conventional 3D InIm using CNN-BiLSTM.
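For reference, the Matthews correlation coefficient used to report detection performance can be computed from the confusion counts as below (binary case shown; the cited works use a three-class setup, for which a generalized MCC applies).

```python
# Matthews correlation coefficient from binary confusion-matrix counts;
# returns a value in [-1, 1], with 0 for the degenerate (empty) cases.
import math

def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return ((tp * tn) - (fp * fn)) / denom if denom else 0.0
```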
Figure 8.Flow chart of end-to-end integrated 1DInImCNN-based signal detection model. Reprinted with permissions from Refs. [14,15].
3.2.4. Lensless InIm signal detection using diffusers

In Sec. 3.2.3, we introduced an end-to-end integrated underwater optical signal detection methodology and compared its improvements with the CNN-BiLSTM-based signal detection system presented in Sec. 3.2.2. In this section, we extend the discussion of Sec. 3.2.3 to a lensless sensing system[15] (see Sec. 2.2), replacing the camera lens with a diffuser. The experiments were performed on a lensless camera array as shown in Fig. 8. Using a 1DInImCNN, the model was trained in clear and turbid water without occlusion and tested on clear and turbid water data with occlusion, over a range of Beer's coefficient values (see Fig. 9). Unlike traditional lenses that form an image on the image sensor, the diffusers scatter light into speckle patterns across the image sensors. Compared with its lens-based counterpart in Ref. [14], the diffuser-based lensless system is computationally less demanding in terms of the number of floating point operations, because the input dimensionality could be further reduced by randomly cropping 1D rows or columns of pixels[52] without significantly compromising the signal detection performance. Thus, the authors used a dimensionality-reduced single-camera system with 1 pixel × 50 pixel and 1 pixel × 150 pixel inputs and a dimensionality-reduced camera array system with 1 pixel × 50 pixel and 1 pixel × 100 pixel inputs per camera. The authors concluded that the lensless underwater optical signal detection system can outperform its lens-based counterpart under the experimental conditions considered.
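The dimensionality reduction described above can be sketched as randomly cropping a 1-pixel-high strip from each diffuser frame before detection; the strip length and random seed are illustrative.

```python
# A minimal sketch of 1D random-crop dimensionality reduction applied to
# diffuser frames before signal detection; assumes frame width >= length.
import numpy as np

rng = np.random.default_rng(0)

def random_row_crop(frame, length=50):
    """frame: (H, W) diffuser speckle pattern; returns a (1, length) strip."""
    h, w = frame.shape
    r = rng.integers(0, h)               # random row
    c = rng.integers(0, w - length + 1)  # random starting column
    return frame[r:r + 1, c:c + length]
```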
Figure 9.(a)–(c) Sample training data without occlusion. (a) Central perspective video of the encoded optical signal in clear water. (b), (c) Sample training data in turbid water from a lensless 1D camera array with diffusers at two turbidity levels. (d)–(f) Sample testing data with occlusion. (d) Central perspective video of the encoded optical signal in clear water with partial occlusion. (e), (f) Sample testing data in turbid water with partial occlusion from a lensless 1D camera array with diffusers at two turbidity levels. Reprinted with permission from Ref. [15].
3.3. Underwater Object Classification and Object Detection
Another application we present in this review is 3D InIm underwater object classification and object detection. In object classification, given an image, we assign a label to the entire image, whereas in object detection we try to localize the objects present in an image using bounding boxes. A bounding box is only assigned to a target when the confidence score exceeds a predefined threshold and, according to non-maximum suppression[53], has the maximum score among duplicate predictions. In this section, we review underwater reports using distortion-tolerant object classification[22] and You Only Look Once Version 4 (YOLOv4) object detection[54]. However, more advanced versions of YOLO[55] and other deep learning algorithms may be used.
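The thresholding and duplicate-removal logic described above is the standard non-maximum suppression procedure, sketched below with illustrative score and IoU thresholds.

```python
# A minimal sketch of non-maximum suppression: keep the highest-scoring
# box and drop overlapping duplicates above an IoU threshold.
import numpy as np

def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-12)

def nms(boxes, scores, score_thresh=0.5, iou_thresh=0.45):
    order = np.argsort(scores)[::-1]  # highest-scoring boxes first
    keep = []
    for i in order:
        if scores[i] < score_thresh:
            continue  # below the confidence threshold
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)  # not a duplicate of a kept box
    return keep
```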
3.3.1. Object classification
Underwater object classification in turbidity using 3D InIm was investigated in Ref. [22]. This was the first report on using InIm for underwater 3D object classification and, as such, used the older neural network algorithms available at the time. A microlens array was inserted in front of a stationary single camera sensor to capture the 3D scene as an array of elemental images. The training data was collected on six rotation-variable objects in free space, and testing was performed on six rotation-variable objects in clear and turbid water, with and without partial occlusion, over a range of Beer's coefficient values. The training and testing images were preprocessed using 3D InIm reconstruction [see Eq. (1)] and computational algorithms such as the gradient map[26], normalized power transform[26], and principal component analysis (PCA)[56], then classified using a neural network. On the 3D reconstructed underwater testing data, the classification accuracy was 100% for all six object classes in low-turbidity water, both without and with occlusion. In intermediate-turbidity water, the accuracy was 99% for two of the six object classes and 100% for the other four. In intermediate-turbidity water with occlusion, the accuracy was 98.7%, 99.7%, and 100% for two object classes, one object class, and three object classes, respectively. One limitation of the proposed system is the acquisition of the perspective images using a microlens array, which limits the depth of field and resolution available for quality 3D InIm reconstruction. To avoid these limitations and capture high-resolution perspective images, the microlens array can be removed and a single camera moved laterally on a platform, or the imaging system can be implemented with a camera array.
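As an illustration of the PCA feature step, the reconstructed images can be flattened and projected onto the leading principal components before classification; the number of components is illustrative.

```python
# A minimal sketch of PCA feature extraction via SVD; X holds one
# flattened 3D-reconstructed image per row.
import numpy as np

def pca_features(X, n_components=20):
    Xc = X - X.mean(axis=0)                        # center the data
    # Right singular vectors give the principal axes of the data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                # low-dimensional features
```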
3.3.2. Object detection
Unlike object classification, which ignores target locations, object detection is a suitable approach for localizing multiple objects. An example was given in Ref. [16], where a dual-purpose underwater object detection and optical (temporal) signal detection system was explored using 3D InIm. The underwater objects were detected using YOLOv4, and the temporally encoded optical signals were classified using the CNN-BiLSTM model of Sec. 3.2.2. Both models were trained on clear and turbid water data without occlusion and tested on clear and turbid water data with occlusion, over a range of Beer's coefficient values. Examples of 3D reconstructed underwater testing scenes for object detection (two objects) and for temporal signal detection (a red LED signal) in partial occlusion are shown in Figs. 10(b), 10(c), 10(e), 10(f), 10(h), and 10(j). The proposed approach is a comprehensive underwater sensing system that can localize underwater objects and permits optical communication for signal transmission and detection. Furthermore, the system can be installed on underwater robots to navigate, acquire data, and identify objects. Therefore, the system has benefits in various applications such as coastal monitoring, underwater visualization, and underwater signal detection.
Figure 10.Sample underwater testing scenes with occlusion applying object detection and signal detection. Reprinted with permission from Ref. [16].
4. Discussion

In summary, we compare the previously published InIm-based approaches in underwater imaging scenarios. In Sec. 3.1.1, the turbid images were processed using statistical image processing algorithms, namely, histogram matching, histogram equalization, and gamma correction, to remedy the effects of turbidity. However, these algorithms generated color pixel errors. Peplography was introduced in Sec. 3.1.2 to address the artificial color pixel intensities of Sec. 3.1.1 using scattering-medium estimation and ballistic photon detection. This method works in both homogeneous and inhomogeneous media. However, the optimal number of ballistic photons is chosen manually. If this parameter is not chosen correctly, the ballistic photons may not be recovered, or saturation can occur in the reconstructed 3D image.
In Sec. 3.2.1, correlation-based filtering was integrated into an active polarimetric InIm system for underwater signal detection. However, correlation-based filters are effective only for smaller object variations and are not as effective as deep learning approaches. The earliest underwater InIm report using a neural network was highlighted in Sec. 3.3.1, which classified underwater objects. However, the system in Sec. 3.3.1 acquired the perspective images using a microlens array, which limited the depth of field and reduced the resolution, thus resulting in poor-quality reconstructed 3D images. This approach can be improved using a camera array or a single camera sensor, together with more advanced deep learning algorithms. To capture high-quality perspective images, the systems in Secs. 3.1.1–3.1.3, 3.2.1–3.2.4, and 3.3.2 do not use a microlens array. Additionally, deep learning algorithms were applied to object visualization (Sec. 3.1.3), signal detection (Secs. 3.2.2–3.2.4), and object detection (Sec. 3.3.2).
Section 3.1.3 incorporated physical laws into a deep learning model and used an unpaired training set to eliminate the need to collect large, paired datasets, whereas Secs. 3.2.2–3.2.4 and 3.3.2 discussed purely data-driven deep learning approaches. The work in Sec. 3.1.3 can also be extended to other scattering media with a suitable physical degradation model. Sections 3.2.2 and 3.3.2 implemented a 3D InIm-based system using a CNN-BiLSTM model. The reconstruction of the image frames, along with camera array calibration, depth estimation, and spatial and temporal feature extraction, increased the computational time and introduced errors, thereby influencing the detection performance. Section 3.2.3 eliminated these intermediate steps by directly feeding the captured multi-perspective images to a deep learning model (1DInImCNN) to lower the computational complexity. Section 3.2.4 replaced the lens-based imaging system of Sec. 3.2.3 with a lensless imaging system using diffusers and showed that reducing the dimensionality of the diffuser data decreased the computational time of the 1DInImCNN model while substantially improving the detection performance. Sections 3.2.1–3.2.4 and 3.3.1 reviewed signal detection-based and object classification-based systems, respectively. Section 3.3.2 reviewed a dual-purpose underwater system that can perform object detection and temporal signal detection simultaneously.
5. Conclusion
We reviewed published reports on underwater imaging and sensing using 3D InIm for object visualization[18,19,21], temporal signal detection[12–16], object classification[22], and object detection[16]. 3D InIm is shown to be effective in reducing the effects of turbidity, offering depth-sectioning capabilities, and operating in challenging conditions such as partial occlusion. Visualization of objects degraded by scattering and absorption may be remedied using statistical image processing algorithms, peplography, and physics-based deep learning. For signal detection, we reviewed correlation-based filtering using a 4D nonlinear correlation filter, deep learning using a CNN-BiLSTM model, and an end-to-end integrated 1DInImCNN model. We also reviewed YOLOv4 and a distortion-tolerant neural network for object detection and object classification, respectively.
Although LiDAR was briefly discussed, detailed descriptions and reviews of that approach are outside the scope of this manuscript; therefore, we limit our discussion to 3D InIm-based approaches. Most of our research efforts have focused on 3D InIm in turbidity without occlusion or with partial occlusion, but other degraded environments of importance in underwater optical imaging, such as turbulence and photon-sparse environments, were not considered here and could be addressed in future work. All underwater 3D InIm systems covered in this review are lens-based, except for the work in Ref. [15]. To the best of our knowledge, Ref. [15] is the earliest report implementing a lensless 3D InIm system using diffusers in underwater imaging. Lensless systems are gaining interest in the research community due to their low cost, compact size, large field of view, and low weight compared with lens-based imaging systems. However, there is limited research on this subject in underwater imaging. Polarimetric underwater imaging, which uses active illumination, was covered in signal detection using 3D InIm and polarization-difference imaging, in object visualization using 3D InIm and active polarization descattering, and in other published reports in turbid environments[57–60]. More advanced polarimetric imaging techniques in the underwater literature, such as Stokes-based imaging[61] and Mueller matrix imaging[62,63], are outside the scope of this paper and could be potential future work for underwater 3D InIm-based systems. As in any review paper of this nature, we may have omitted discussions reported in published papers or works by other research groups, for which we apologize in advance.
Alex Maric, Gokul Krishnan, Rakesh Joshi, Yinuo Huang, Kashif Usmani, Bahram Javidi, "Underwater optical imaging and sensing in turbidity using three-dimensional integral imaging: a review," Adv. Imaging 2, 012001 (2025)