Quantitative phase microscopy is capable of achieving nondestructive and label-free imaging of transparent samples, rendering it suitable for biological cell research. However, the sample refractive index and physical thickness are coupled in the phase data and cannot be presented separately. Existing decoupling methods require tedious experimental and computational processes and thus cannot meet the needs of automated real-time detection in biomedical research and applications. To address this issue, this study constructs a new semantic segmentation network based on U-Net by adding an attention mechanism following the ideas of the residual structure and dense connection module. This enables exploration of a method for decoupling the physical thickness and refractive index of uniform-medium samples from a single phase map. The model was trained on a dataset comprising polystyrene microsphere phase maps, and phase-data decoupling was achieved for mature red blood cell samples with different geometric features. The relative error of the average refractive index obtained via single-frame phase separation was 0.9%. This method requires only a single phase map of the sample in any direction and trains a neural network model using standard samples for highly specific quantitative extraction of chemical and physical information from biological cell samples. Moreover, it features convenient data collection and low computational complexity and can serve as a reference for automated quantitative analysis of phase information.
To address the decline in image quality under low-light conditions, low-light image enhancement methods aim to improve visible details such as the brightness and color richness of degraded images to produce clearer images that align more closely with human visual expectations. Although remarkable progress has been made in deep learning-based enhancement methods, traditional convolutional neural networks have limitations in feature extraction owing to their locality, rendering the effective modeling of long-distance relationships between image pixels challenging for the network. In contrast, the Transformer model utilizes the self-attention mechanism to better capture long-range dependencies between pixels. However, existing research reveals that global self-attention mechanisms can lead to a lack of spatial locality in networks, thereby deteriorating the ability of Transformer-based networks to process local feature details. Therefore, in this study, a novel low-light image enhancement network, MFF-Net, is proposed. The principle of cross-domain feature fusion is adopted to integrate the advantages of convolutional neural networks and the Transformer to obtain cross-domain feature representations containing multiscale and multidimensional information. In addition, to maintain feature semantic consistency, a feature semantic transformation module is specially designed. Experimental results on public low-light datasets show that the proposed MFF-Net achieves better enhancement effects than mainstream methods, with the generated images exhibiting better visual quality.
Herein, a V-shaped pyramid bilateral feature fusion network (VPBF-Net) is proposed to address small-scale target missing segmentation, inaccurate edge segmentation, and inefficient fusion of deep and shallow feature information in current semantic segmentation networks. In the encoding stage, a V-shaped atrous spatial pyramid pooling (VASPP) module adopts multiple-parallel-branch interactive connection structures to enhance the information exchange between the local semantic information of each branch. In addition, multibranch feature hierarchical fusion is adopted to reduce grid artifact effects. Furthermore, a coordinate attention module is used to assign weights to the extracted deep semantic information, enhancing the network's attention to the segmentation target. In the decoding stage, a bilateral attention feature aggregation module is designed to guide shallow feature fusion through multiscale deep semantic information, thereby capturing shallow feature representations at different scales and achieving more efficient fusion of deep and shallow features. Experiments conducted on the PASCAL VOC 2012 and Cityscapes datasets show that the proposed method achieves mean intersection-over-union values of 83.25% and 77.21%, respectively, indicating advanced results. Compared with other methods, the proposed method performs small-scale object segmentation more accurately, alleviating missed segmentation and misclassification.
This paper proposed an underwater optical image enhancement algorithm based on degradation characteristic indices. First, the method determined the degradation characteristics present in the original image based on these indices. Second, the image restoration process was performed according to the degradation characteristics of the original image. Finally, an image enhancement method was applied to the restored image using the bounded general logarithm ratio operation. The proposed algorithm was tested to identify degradation characteristics and enhance images in two typical underwater scenarios: one with auxiliary lighting and the other with natural lighting. Processing results showed that the degradation characteristic parameters were restored to a reasonable range, and the image enhancement effect reached an ideal level. For the middle crack image, with a mean adaptive gradient gain of 1.5054, the maximum brightness difference of the aperture layer decreased from 155 to 44, perceptual fog density reduced from 2.38 to 0.37, dynamic range ratio increased from 60.00% to 76.08%, and contrast increased from 6.15 to 107.35. For the slope image, with a mean adaptive gradient gain of 1.5678, the maximum brightness difference of the aperture layer decreased from 65 to 24, perceptual fog density decreased from 0.62 to 0.21, dynamic range ratio increased from 29.41% to 89.80%, color distortion index improved from 0.66 to 1.00, and contrast increased from 30.77 to 316.25. The proposed algorithm was compared with nine existing enhancement algorithms to evaluate its effectiveness. Results show that the proposed algorithm has advantages in terms of image restoration and enhancement.
An underwater target detection algorithm that uses a multiscale and cross-spatial information aggregation network is proposed. First, a deformable layer aggregation module is used within the backbone network to extract features, enhancing the network's positioning accuracy. Second, the Conv2former module is used to enhance the neck's global information extraction capability and reduce missing detections caused by mutual occlusion among underwater targets. Finally, a multiscale attention parallel enhancement module that uses parallel convolution blocks to extract deeper features is proposed. This module integrates an efficient multiscale attention module to filter out interference from background and image distortion and introduces multiple cross-level connections to effectively integrate low-level local features with high-level strong semantic information, thereby improving model detection accuracy. The ablation experiment is conducted on the URPC dataset. Compared with the original model, the accuracy rate, recall rate, mean average precision (mAP)@0.5, and mAP@0.5:0.95 of the improved model increase by 3.6 percentage points, 2.6 percentage points, 3.5 percentage points, and 3.3 percentage points, respectively. Tests on the RUOD dataset under different scenarios indicate that the proposed model offers notable advantages over several current mainstream models.
In response to the challenges of fuzzy images and numerous small targets in underwater target detection, which lead to missed and false detections with the YOLOv8n algorithm, we proposed an enhanced lightweight underwater target detection algorithm. Initially, within the backbone network, certain convolutions were substituted with non-strided space-to-depth convolution, and a global attention mechanism was introduced to augment global contextual information, thereby improving the network's ability to extract features from blurry and small targets. Subsequently, the conventional upsampling method was replaced with a lightweight upsampling operator, content-aware reassembly of features, to broaden the model's receptive field. Furthermore, the normalized Wasserstein distance was introduced and integrated with complete intersection over union to devise a novel localization regression loss function, aimed at increasing the accuracy of small target localization in complex underwater environments. Finally, a dynamic target detection head combined with the parameterized rectified linear unit was proposed to enhance the performance of the original detection head, thereby improving the model's proficiency in managing small underwater targets. Experimental results demonstrated that the improved YOLOv8n algorithm achieved a mean average precision of 86.62% on the RUOD dataset, marking a 3.20-percentage-point improvement over the original YOLOv8n algorithm. The total number of model parameters was 5.67 M, and the number of giga floating-point operations was 12.5, fulfilling the criteria for a lightweight model.
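For readers who want to experiment with this kind of loss, the following is a minimal sketch of one way a normalized Wasserstein distance term can be blended with a complete-intersection-over-union (CIoU) term. The constant `c`, the blending weight `alpha`, and the assumption that CIoU is computed elsewhere are illustrative choices, not details taken from the paper.

```python
import torch

def wasserstein_nwd(pred, target, c=12.8):
    """Normalized Wasserstein distance between boxes given as (cx, cy, w, h).

    Small boxes are modeled as 2D Gaussians; the squared 2nd-order Wasserstein
    distance between them has a closed form.  `c` is a scale constant (assumed).
    """
    center_dist = (pred[..., 0] - target[..., 0]) ** 2 + (pred[..., 1] - target[..., 1]) ** 2
    wh_dist = ((pred[..., 2] - target[..., 2]) ** 2 + (pred[..., 3] - target[..., 3]) ** 2) / 4
    w2 = torch.sqrt(center_dist + wh_dist)   # 2nd-order Wasserstein distance
    return torch.exp(-w2 / c)                # map distance to a (0, 1] similarity

def nwd_ciou_loss(pred, target, ciou, alpha=0.5):
    """Blend an NWD term with a precomputed CIoU value (both per box pair);
    the weight `alpha` is a hypothetical choice."""
    return alpha * (1 - wasserstein_nwd(pred, target)) + (1 - alpha) * (1 - ciou)
```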
In response to concerns about the insufficient visibility of target information and loss of details in traditional multiscale fusion methods for infrared and visible images, this paper proposed a hybrid multiscale decomposition fusion method based on anisotropic guided filtering. Initially, an adaptive image enhancement method based on texture contours was introduced to improve visible images by simultaneously enhancing brightness, contrast in dark regions, and texture details. Subsequently, the brightness layer of the source image was extracted using the edge-preserving smoothing property of anisotropic guided filtering. The difference layer was decomposed into a base layer, a small-scale detail layer, and multiple levels of large-scale detail layers via Gaussian filtering. The fusion rule for the brightness layer employed an absolute maximum value approach, and a fusion method that combined visual saliency with least squares optimization was proposed for the base layer. The small-scale detail layer adopted a fusion strategy based on modified Laplacian energy, and the large-scale detail layers employed a composite fusion strategy based on local variance and spatial frequency. Finally, the fusion image was reconstructed by combining the merged layers. Compared with nine other classic and advanced methods, the proposed method performs well in both subjective and objective analyses.
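As a small illustration of the absolute-maximum fusion rule mentioned for the brightness layer, the sketch below keeps, at each pixel, the coefficient with the larger magnitude. The function name and the assumption that both layers are aligned single-channel arrays are illustrative, not taken from the paper.

```python
import numpy as np

def fuse_abs_max(layer_ir, layer_vis):
    """Absolute-maximum fusion: at each pixel keep the coefficient with the
    larger magnitude, a common rule for brightness/base layers."""
    mask = np.abs(layer_ir) >= np.abs(layer_vis)
    return np.where(mask, layer_ir, layer_vis)
```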
Fine-grained bird recognition tasks frequently face the challenges of small interclass and large intraclass differences. In this study, we propose an incremental learning method for fine-grained bird recognition based on prompt learning. Learnable visual prompts are first introduced into the incremental learning model to alleviate catastrophic forgetting. For fine-grained bird recognition, text information of different granularities is introduced as text prompts in the incremental learning model; these are fused with the visual prompts to learn the characteristics of different birds from coarse to fine and to improve fine-grained bird recognition accuracy. Numerical experiments on the CUB-200-2011 dataset show that the proposed model achieves better image recognition accuracy than other incremental learning models. For general image recognition tasks, the proposed method exhibits higher recognition accuracy and better resistance to forgetting on CIFAR-100 and 5-datasets.
One of the difficulties in expanding light field data is to expand the viewpoint and image-plane support simultaneously while maintaining good space-angle consistency. In this paper, we propose using a neural light field network to represent rays parameterized by two planes, generate rays that do not exist in the original sub-light field data, and extend both the viewpoint and image-plane branches. To estimate the error of the generated rays in the extended part, we use the error between the generated rays and the original data in the overlapping area of the sub-light fields as a reference, which allows us to determine the proportion of data with a good generation effect in the extended part. We also analyze the influence of the size of the overlapping area of the sub-light fields on the quality of the extended light field. Experimental results on Blender simulation data show that the proposed method can simultaneously expand the sub-light field viewpoint and image-plane branches, and the epipolar plane images (EPIs) show that the extended part maintains good space-angle consistency. When the proportion of overlapping regions of the sub-light field data increases from 42.9% to 77.8%, the proportion of data with a good generation effect in the extended regions increases from 82.91% to 84.68%. This analysis provides guidance for the design of sub-light field data when expanding light field data.
The current mainstream lightweight object detection models exhibit low detection accuracy in unmanned aerial vehicle (UAV) photography scenes. This study introduces a high-precision and lightweight aerial photography image object detection model based on YOLOv8s, named LEFE-YOLOv8. First, an enhanced feature extraction convolution (EFEConv) incorporating an attention mechanism was developed. It is integrated with partial channel convolution (PConv) and 1×1 convolution to create a lightweight enhanced feature extraction module. This integration augments the model's feature extraction capabilities and reduces the number of parameters and computational complexity. Subsequently, a lightweight dynamic upsampling operator module was incorporated into the feature fusion network, effectively addressing the information loss problem during the upsampling process in high-level feature networks. Finally, a detection head with multi-scale modules was designed to enhance the network model's multi-scale detection capabilities. The final experimental results demonstrate that, compared with the benchmark model, the improved model achieves an average accuracy of 42.3% and 83.9% on the VisDrone2019 and HIT-UAV datasets, respectively, with fewer than 10×10⁶ parameters. These results establish the model's suitability for aerial image object detection tasks.
Aiming at the degradation of actual reconstructed image quality caused by the mismatch between ideal and practical optical transmission models in computer-generated holography algorithms, this paper proposes a simplified learning-based holographic optical transmission model that uses physical information. The proposed model can explicitly learn the defects of holographic displays and can be flexibly used in various hologram optimization algorithms to solve the mismatch problem in optical transmission models. On an untuned holographic display prototype, the reconstruction results of the proposed model are superior to those of the ideal holographic light transmission model. Moreover, the proposed model can obtain high-quality holographic reconstruction images without strict requirements such as the fine assembly of optical elements and good uniformity of laser light sources.
Flash photography is an important diagnostic technique for transient processes such as high-speed collisions and explosions. Conventional flash photography systems use intensified complementary metal-oxide-semiconductor (ICMOS) cameras coupled with light cones as imaging devices, which have the advantages of small size, low weight, and fast imaging speed. However, the coupling of light cones can introduce complex image distortion, thereby reducing imaging performance. In this study, based on the image distortion characteristics caused by the coupling of light cones in ICMOS cameras employed in flash photography systems, an efficient distortion correction method is proposed. In particular, variable-step field-of-view segmentation is performed on image edges based on a field-of-view segmentation model to solve the problem of poor edge correction caused by insufficient edge feature points in images. Experimental results show that compared with existing methods, the proposed method reduces the root mean square error and maximum residual error of the corrected image by 17.8% and 70.2%, respectively. In addition, the proposed variable-step field-of-view segmentation method has higher accuracy than existing methods for correcting image distortion caused by ICMOS-coupled light cones.
The cyclopean eye serves as the reference point for determining visual direction in humans. Traditional theory places this point at the midpoint between the two eyes. However, this assumption overlooks individual differences in cyclopean eye position, potentially leading to inconsistencies in viewing and interaction experiences within stereoscopic environments. Such discrepancies can notably reduce the quality of user interaction with stereoscopic display content, potentially hindering broader adoption of this technology. To address these differences, a method is developed for measuring cyclopean eye position coordinates using a stereoscopic display system. This technique incorporates a cyclopean eye testing experiment based on an improved average error method. Through subjective experimental measurements, we calculate the cyclopean eye positions of different users. The experimental results confirm that the proposed method effectively captures deviations in users' cyclopean eye positions. Our findings are expected to inform strategies for improving interactive virtual reality experiences.
Aiming at the problem that traditional autofocus methods need to collect many defocused images, which greatly increases focusing time and limits their application in visual measurement systems, an autofocus method based on deep learning is proposed. This method transforms the autofocus problem into an image defocus distance prediction problem. First, a lightweight deep regression network is constructed using ShuffleNetv2 and a multilayer perceptron (MLP). The network is subsequently trained on a target image dataset collected in the working scene. Through a reasonable focusing strategy, focusing can be completed using only two frames of images, which reduces the focusing time and circumvents the problem of large focusing errors caused by local extreme points in traditional autofocus methods. The experimental results show that the focusing time of this method is only 15%-24% of that of the traditional autofocus method, and the focusing stability is improved by about 40% compared with the traditional method. The method thus provides fast focusing speed, high focusing stability, and low model complexity, and can be well applied to visual measurement systems.
Vascular visualization is crucial for investigating vascular diseases, the development mechanisms of chronic diseases, and the related diagnosis and treatment. Laser speckle imaging technology is widely used in vascular visualization and blood-flow monitoring, although the quality of vascular imaging is degraded by the presence of various noises. To improve the quality of deep vascular imaging, this study investigates the vascular imaging capability and the feasibility of detecting the relative blood-flow rate of four existing spatial-domain contrast methods via in vivo experiments. Additionally, a quantitative assessment of the vascular-visualization capability based on the contrast-to-noise ratio is introduced. The results show that the adaptive-window-space directional contrast method offers better imaging than the other three spatial contrast methods (spatial contrast, space directional contrast, and adaptive-window contrast methods). Based on an in vivo experiment, the adaptive-window-space directional contrast method maintains high-quality and high-resolution blood-flow mapping, thus retaining more microvascular structural and functional information. Consequently, more comprehensive blood-flow distribution maps are obtained, thereby facilitating the monitoring of blood flow in deep tissues.
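A commonly used form of the contrast-to-noise ratio (CNR) for assessing vessel visibility is sketched below; the exact definition adopted in the study may differ, and the masks are assumed to be precomputed boolean arrays marking vessel and background pixels.

```python
import numpy as np

def contrast_to_noise_ratio(image, vessel_mask, background_mask):
    """Contrast-to-noise ratio between a vessel region and the background.

    A common definition: |mean_vessel - mean_background| / std_background.
    """
    vessel = image[vessel_mask]
    background = image[background_mask]
    return np.abs(vessel.mean() - background.mean()) / (background.std() + 1e-12)
```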
The temperature rise of a homing head caused by increased thermal radiation under high-energy laser irradiation severely affects the signal-to-noise ratio (SNR) and imaging quality of its optical system. To address this issue, this study simulates the optical system of a roll-pitch infrared homing head in the 3-5 μm band, focuses on the temperature effects on the optical system under high-energy laser irradiation, and explores the impact of the internal temperature rise on the infrared homing head performance. Results show that as the internal temperature rises from 293.15 K to 1073.15 K, the SNR of the optical system decreases from 8.73 to ~0, causing the homing head to completely lose its target recognition capability. Thus, this research emphasizes the need to focus on the thermal stability of optical components when designing homing head systems to ensure their imaging quality and guidance precision during operation. This study not only helps improve the current design of homing heads but also provides a theoretical foundation for the development of subsequent high-energy laser defense systems.
To address the limitations of existing research methods in comprehensively measuring vehicle dimensions and accurately identifying vehicle information, this study proposes an innovative method for vehicle size and information recognition. First, the YOLOv5s network automatically recognizes target vehicles, followed by the creation of a vehicle database containing detailed attributes such as vehicle model, size, number of axles, wheelbase, suspension, and tire specifications. Second, vehicle contours are extracted using a convolutional encoder-decoder network and optimized using a dilation-erosion algorithm. To improve processing speed and reduce computational resource requirements, a depthwise separable convolution technique is integrated into the DenseNet network. Finally, the optimized DenseNet network predicts the vehicle's specific dimensions from the contour map. The experimental results show that the proposed method achieves width and height measurement errors within ±60 mm and length measurement errors within ±85 mm. In addition, this method provides information on the number of axles, wheelbase, suspension, and tire parameters. This study demonstrates that by integrating image recognition and deep learning techniques, it is possible to effectively measure vehicle dimensions and identify relevant information, resulting in substantial improvements in accuracy for applications such as dynamic weighing systems.
To address the challenges of low detection accuracy, frequent missed detections, and false detections in steel surface defect detection for small target samples, an improved YOLOv7 algorithm is proposed. First, the Swin Transformer is introduced for feature extraction, and a dual-branch loop is employed to fuse global and local features interactively. Second, distribution shift convolution is utilized to improve the efficient layer aggregation network, thereby enhancing its local feature extraction capability. Finally, the regression loss function incorporates the weighted normalized Wasserstein distance and complete intersection over union methods to reduce the sensitivity to small target deviations. Experimental comparisons on the NEU-DET dataset demonstrate that the improved algorithm increases the mean average precision by 8.8 percentage points, reaching 83.1%. The proposed algorithm improves the accuracy of steel surface defect detection and reduces false and missed detections for small target samples. In addition, the proposed algorithm achieves a detection speed of 71 frame/s, meeting the requirements for real-time detection.
To address the inadequate detection precision and computational efficiency of common traffic sign detection methods under poor lighting conditions, for small distant targets, and in complex backgrounds, this study introduces an enhanced YOLOv5s algorithm, named BMGE-YOLOv5s. The proposed method employs BoTNet (bottleneck Transformer network) to replace the original backbone network of YOLOv5s. It also designs a lightweight network, C3GBneckv2, which integrates the GhostNetv2 bottleneck and an efficient channel attention mechanism, reducing the number of parameters while significantly enhancing the feature extraction capability for traffic signs. To further enhance the accuracy of bounding box localization, the MPDIoU loss function is utilized. Experimental results indicate that the improved network model achieves a mean average precision of 93.1% at an intersection-over-union threshold of 0.5, an improvement of 3.3 percentage points over the baseline model on the same dataset. Moreover, the proposed model demonstrates a 9.375% decrease in floating-point operations, a ~25.98% decrease in the number of parameters, and a ~67.40% increase in detection speed. The proposed algorithm effectively balances robustness and real-time performance, showing a clear performance advantage over traditional methods.
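For reference, a hedged sketch of an MPDIoU-style bounding-box loss is given below, following the commonly cited formulation that subtracts the normalized squared distances between corresponding box corners from the IoU; the tensor layout and epsilon are illustrative assumptions rather than details from the paper.

```python
import torch

def mpdiou_loss(pred, target, img_w, img_h, eps=1e-7):
    """MPDIoU-style loss for boxes in (x1, y1, x2, y2) format.

    Penalizes the squared distances between the top-left and bottom-right
    corners of predicted and ground-truth boxes, normalized by the squared
    image diagonal, in addition to the usual IoU term.
    """
    # Intersection area
    ix1 = torch.max(pred[..., 0], target[..., 0])
    iy1 = torch.max(pred[..., 1], target[..., 1])
    ix2 = torch.min(pred[..., 2], target[..., 2])
    iy2 = torch.min(pred[..., 3], target[..., 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)

    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared corner distances, normalized by the squared image diagonal
    d1 = (pred[..., 0] - target[..., 0]) ** 2 + (pred[..., 1] - target[..., 1]) ** 2
    d2 = (pred[..., 2] - target[..., 2]) ** 2 + (pred[..., 3] - target[..., 3]) ** 2
    diag = img_w ** 2 + img_h ** 2

    mpdiou = iou - d1 / diag - d2 / diag
    return 1 - mpdiou
```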
Nondestructive testing technologies, each based on different principles, exhibit inherent limitations when used in isolation. The integration of multiple testing technologies can mitigate these limitations, offering a more robust and comprehensive defect detection process. This study proposes a composite defect detection method that integrates digital shearing speckle pattern interferometry with infrared thermography. A unified excitation source is utilized to harmonize the optical paths, enabling synchronized data acquisition from these two technologies. The synthesized results are then analyzed to identify defects. Experimental findings indicate that the proposed method effectively combines the strengths of the two technologies, resulting in excellent defect detection performance in real-world scenarios.
Advances in high-sensitivity detection technology have made single-photon-level sensitivity feasible. Combined with high-precision timing technology, lidar based on time-correlated single-photon counting has emerged, offering substantial improvements in measurement resolution and accuracy. However, this high-sensitivity detection is highly vulnerable to environmental noise, generating considerable noise data during photon-counting lidar measurements. To address the influence of noise, a photon-counting lidar point cloud data denoising algorithm based on a backpropagation (BP) neural network is proposed. By selecting and normalizing the point cloud data feature values, the BP neural network is trained to accurately perform binary classification denoising. The proposed algorithm minimizes the human error associated with conventional denoising algorithms, delivers excellent denoising performance, and is strongly adaptable to various detection environments. Even under strong background noise conditions, the proposed algorithm achieves an F-score of 0.9773. In addition, the proposed algorithm exhibits good information extraction capabilities under simulated background noise conditions, and its consistency advantage is further confirmed through validation on ICESat-2 point cloud data.
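The denoising step amounts to a small fully connected (BP) network performing binary classification on normalized per-photon features. The sketch below illustrates the idea with synthetic features and scikit-learn's MLPClassifier; the feature choices, network size, and data are assumptions for demonstration only.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

# Hypothetical per-photon feature vectors (e.g. local point density, elevation
# spread, neighbor count) and labels: 1 = signal photon, 0 = noise photon.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 3))
labels = (features[:, 0] + 0.5 * features[:, 1] > 0).astype(int)

scaler = MinMaxScaler()            # normalize feature values to [0, 1]
X = scaler.fit_transform(features)

# A small fully connected (BP) network trained for binary classification
clf = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=500, random_state=0)
clf.fit(X, labels)
print(clf.score(X, labels))
```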
This paper proposes a road marking point cloud extraction method based on enhanced RandLA-Net to address the issue of inadequate accuracy in road marking extraction within high-precision maps. Road markings possess spatial characteristics such as smoothness, minimal undulation, approximate parallelism to the horizontal plane, and high echo intensity. To differentiate road markings from other features, we used total variance, flatness, perpendicularity, and echo intensity, thereby enhancing the distinction and similarity in RandLA-Net neighborhood point clouds. First, we calculated the three covariance features of the point cloud. Second, we applied the improved RandLA-Net feature fusion module to realize feature fusion and semantic segmentation. The segmentation results were then refined by Euclidean clustering to derive the final road marking point cloud. The proposed method was validated on the publicly available Toronto-3D and WHU-MLS datasets and compared with prevalent point cloud semantic segmentation methods and traditional thresholding techniques at the semantic segmentation and road marking extraction stages. The experimental results demonstrate that the proposed method provides more complete and accurate road marking point clouds.
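The covariance-based neighborhood features mentioned above (total variance, flatness, and perpendicularity) are typically derived from the eigenvalues of the local covariance matrix. The sketch below uses common eigenvalue-based definitions; the exact formulas adopted in the paper may differ, so treat the names and expressions as illustrative.

```python
import numpy as np

def covariance_features(neighbors):
    """Eigenvalue-based geometric features of a point neighborhood (N x 3).

    Commonly used definitions (eigenvalues sorted l1 >= l2 >= l3):
      omnivariance ("total variance")  = (l1 * l2 * l3) ** (1/3)
      planarity   ("flatness")         = (l2 - l3) / l1
      verticality ("perpendicularity") = 1 - |nz|, nz being the z component
                                         of the normal (smallest eigenvector)
    """
    cov = np.cov(neighbors.T)
    eigval, eigvec = np.linalg.eigh(cov)   # ascending order
    l3, l2, l1 = np.maximum(eigval, 1e-12)
    normal = eigvec[:, 0]                  # eigenvector of the smallest eigenvalue
    omnivariance = (l1 * l2 * l3) ** (1.0 / 3.0)
    planarity = (l2 - l3) / l1
    verticality = 1.0 - abs(normal[2])
    return omnivariance, planarity, verticality
```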
As a hot topic in computer vision, binocular stereo matching has broad applications in tasks such as distance perception, remote sensing, and autonomous driving. An end-to-end disparity estimation method based on an improved attention concatenation cost volume network is proposed herein to address the challenges of depth discontinuity and inaccurate disparity prediction in boundary regions observed in current methods. First, a multiscale feature fusion network is introduced to combine multiscale feature maps containing rich information from both shallow and deep layers. This approach enhances the fine-grained representation of image details and mitigates the problem of inaccurate disparity prediction in areas with depth discontinuities. Subsequently, a Sobel edge smoothing loss is designed to establish a constraint between the disparity map boundary and the scene's edge contours, alleviating inaccuracies in disparity prediction at the image's target boundaries. Experimental verification on the Sceneflow dataset shows that the proposed method achieves an end-point error (EPE) of 0.467 and a D1 value of 1.51%. On the KITTI dataset, the method achieves 1.44% in the 3-All metric and 1.61% in the D1-All metric. Compared with the attention concatenation cost volume network, the proposed method reduces the EPE and D1 scores by 3.51% and 5.63%, respectively, and the 3-All and D1-All metrics by 2.04% and 2.42%, respectively, demonstrating superior disparity estimation performance.
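One plausible form of a Sobel-based edge-aware smoothness constraint is sketched below: disparity gradients are penalized except where the image itself has strong Sobel edges. This is an illustrative reconstruction, not the paper's exact loss; tensors of shape (B, 1, H, W) are assumed.

```python
import torch
import torch.nn.functional as F

# Sobel kernels for horizontal and vertical gradients
SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
SOBEL_Y = SOBEL_X.transpose(2, 3)

def sobel_grad(x):
    """Sobel gradients of a (B, 1, H, W) tensor."""
    gx = F.conv2d(x, SOBEL_X.to(x.device), padding=1)
    gy = F.conv2d(x, SOBEL_Y.to(x.device), padding=1)
    return gx, gy

def edge_aware_smooth_loss(disparity, image_gray):
    """Penalize disparity gradients except where the image itself has edges."""
    dgx, dgy = sobel_grad(disparity)
    igx, igy = sobel_grad(image_gray)
    wx = torch.exp(-igx.abs())   # low weight where the image has strong edges
    wy = torch.exp(-igy.abs())
    return (dgx.abs() * wx + dgy.abs() * wy).mean()
```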
A voxel self-attention auxiliary (VSAA) network is proposed to address the poor detection performance of LiDAR object detection algorithms in autonomous driving scenes, which stems from a lack of deep understanding of the spatial structure owing to their reliance on convolutional neural networks (CNNs). The VSAA network can be directly applied to most voxel-based target detection algorithms to enhance their feature extraction capabilities. First, the VSAA network improves the efficiency of searching relevant voxels in subsequent self-attention calculations by constructing voxel hash tables for secondary encoding on the basis of voxel feature encoding. Second, the VSAA network applies the self-attention mechanism at the voxel level to capture comprehensive global information and profound contextual semantic information. Finally, this study proposes the VA-SECOND and VA-PVRCNN algorithms by applying the VSAA network to the benchmark algorithms SECOND and PV-RCNN, respectively. The features of the VSAA network and the CNN are fused to compensate for the small receptive field of the CNN, thus enhancing the detection ability of the algorithm and allowing it to understand an entire spatial scene. Experimental results on the KITTI dataset show that, compared with the benchmark algorithms, the VA-SECOND and VA-PVRCNN algorithms improve the average detection accuracy of all detected targets by 1.16 percentage points and 1.54 percentage points, respectively, which proves the effectiveness of the VSAA network.
To address the scale variation challenges during target handoff in dual-camera systems, this paper proposes a dual-field-of-view target handoff method based on feature association. The approach initially localizes the target in the switched field of view using a homography matrix and then employs an optimized YOLOv5 object detection network to search for candidate targets. Finally, it uses an enhanced OSNet network for feature association. To improve the accuracy of target handoff, the loss function of YOLOv5 was optimized. Additionally, the bottleneck attention module and cosine distance metric were introduced into OSNet. Experimental results on the CrowdHuman and Market-1501 datasets indicate that the optimized YOLOv5 network increases the average precision by 1.0 percentage point, achieving a precision of 38.5%. The mean average precision of the improved OSNet network increases by 5.4 percentage points, reaching 68.1%. When deployed on the Rockchip RK3399Pro embedded platform, equipped with two 60 frame/s cameras with a resolution of 1600 × 1200 and focal lengths of 35 mm and 8 mm, respectively, the approach accurately completes the target handoff within 14 frames, demonstrating the feasibility and stability of the proposed method in real-world surveillance scenarios.
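The initial localization step can be illustrated with a homography that maps the target center from one field of view into the other; the matrix values below are hypothetical and would in practice be estimated from matched feature points between the two cameras.

```python
import cv2
import numpy as np

# Hypothetical homography mapping wide-FOV pixel coordinates to the
# narrow-FOV camera, e.g. estimated offline from matched feature points.
H = np.array([[4.2, 0.01, -1800.0],
              [0.02, 4.3, -1500.0],
              [0.0, 0.0, 1.0]])

def project_target(bbox_wide, homography):
    """Map a (x, y, w, h) box center from the wide view into the narrow view;
    the projected point seeds the detector's candidate search region."""
    x, y, w, h = bbox_wide
    center = np.array([[[x + w / 2.0, y + h / 2.0]]], dtype=np.float32)
    projected = cv2.perspectiveTransform(center, homography.astype(np.float32))
    return projected[0, 0]  # (u, v) in the switched field of view

print(project_target((800, 600, 40, 60), H))
```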
To solve the problem wherein motion blur reduces the operational accuracy of visual simultaneous localization and mapping (SLAM), this study proposes an improved visual SLAM method based on blurred image evaluation and feature matching. First, following an analysis of the generation principle of image motion blur, a blur parameter based on the re-blurring theory is designed to express the blur degree of an image. Then, adaptive thresholding is used to remove blurred images. Finally, the grid-based motion statistics algorithm is improved for the feature matching process, replacing the feature matching method commonly used in SLAM. Experiments and analyses on two open source datasets are conducted under different environments. The results show that: 1) the designed blur parameter effectively represents the blur degree of an image. In predicting image quality evaluation scores on a standard image library, the root mean square error is reduced by 9.3%-12.3% compared with other algorithms. When adaptive thresholds are used for blurred image classification on the KITTI dataset, the accuracy and F1 score of the proposed method are 11.0%-17.2% and 22.9%-30.9% higher, respectively, than those of other algorithms. 2) The improved feature matching algorithm improves the quality of feature matching. Compared with the conventional feature matching algorithm on the KITTI dataset, the inlier rate and matching accuracy of the proposed method are increased by 11.6%-33.1% and 30.4%-38.9%, respectively, and the matching time is reduced by 52.8%-55.8%. 3) Overall, the proposed method reduces the negative effects of motion blur on visual SLAM positioning. Compared with conventional visual SLAM, when processing image sequences of long-distance complex routes, the proposed method reduces the average absolute error and root mean square error by 10.4%-26.0% and 10.0%-27.3%, respectively.
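The re-blurring idea behind the blur parameter can be illustrated as follows: blur the image once more and compare how much gradient energy survives, since an already-blurred image changes far less than a sharp one. This simplified measure is only an illustration of the principle, not the parameter defined in the paper.

```python
import cv2
import numpy as np

def reblur_metric(gray, ksize=9):
    """Simplified re-blurring blur measure: blur the image once more and
    compare gradient energy before and after.  Values closer to 1 suggest an
    already-blurred input (re-blurring changes little); lower values suggest
    a sharp input whose gradients are strongly attenuated by re-blurring."""
    blurred = cv2.GaussianBlur(gray, (ksize, ksize), 0)
    g_orig = np.abs(cv2.Sobel(gray, cv2.CV_64F, 1, 0)) + np.abs(cv2.Sobel(gray, cv2.CV_64F, 0, 1))
    g_blur = np.abs(cv2.Sobel(blurred, cv2.CV_64F, 1, 0)) + np.abs(cv2.Sobel(blurred, cv2.CV_64F, 0, 1))
    return g_blur.sum() / (g_orig.sum() + 1e-12)
```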
Insufficient geometric feature extraction and the loss of high-level semantic features in deep learning-based point cloud processing methods lead to unsatisfactory point cloud classification and segmentation results. Hence, this study proposes a point cloud classification and segmentation network based on topological awareness and channel attention to address these issues. The proposed method improves the precision of point cloud classification and segmentation through information amplification and feature enhancement. First, to address the weak expression of low-level geometric features caused by the disordered nature of point cloud data, the topological relationships between the point cloud data and their neighborhood points are constructed using a local minimum triangulation strategy. Then, a residual multilayer perceptron module is applied to extract more granular local geometric information. Finally, a mixed pooling strategy is exploited for the feature aggregation of local information, and a channel affinity attention mechanism is used to reduce feature loss in the network. Extensive comparative experiments are conducted on the ModelNet40, ScanObjectNN, ShapeNet Part, and S3DIS datasets. Using the proposed method, overall accuracy values of 93.6% and 85.6% are achieved in classification tasks, and mean intersection-over-union values of 85.8% and 63.7% are achieved in segmentation tasks. The experimental results demonstrate that the proposed method displays state-of-the-art performance in both point cloud classification and segmentation tasks.
Detecting irregular obstacles under complex scenarios of intelligent parking is a difficult task. Therefore, a method that employs a gridded structured light projection for the detection area is proposed in this study. Specifically, this method captures the deformation of structured light grids on obstacle surfaces, thereby enhancing the precision of obstacle feature collection. In addition, a method for generating depth maps via the training of an end-to-end network is introduced. Subsequently, the fusion of external contour features from red green blue (RGB) images with three-dimensional (3D) depth features from depth images is achieved, culminating in the proposition of a dual-feature parallel processing algorithm for RGB and depth imagery. A multi-scale feature fusion extraction model is designed, facilitating multifaceted feature extraction and in-depth fusion without escalating model complexity, which enables the transition of mesh models towards accurate 3D representations. Consequently, a multi-scale feature-informed, graph convolutional neural network-based end-to-end 3D reconstruction model is established. Experimental results in intelligent parking scenarios indicate that compared to foundational 3D reconstruction models, the model proposed herein achieves a mean reduction of 2% and 9% in chamfer distance and earth mover’s distance, respectively. Furthermore, relative to three mainstream 3D reconstruction models, the mean reduction in chamfer distance is 60%, 2%, and 78%, respectively, while the reduction in earth mover’s distance is 16%, 23%, and 91%, respectively.
In this study, a vertebral localization model based on a spatial attention mechanism is proposed to realize automatic diagnosis of the scoliosis Cobb angle and infer the type of scoliosis. First, taking a complete spinal X-ray image as the research object, the proposed model simultaneously extracts the center points of the vertebrae and the cervical vertebrae, completes the anchor-point localization of the vertebral positions, and filters out the cervical vertebrae through the anchor points to obtain the center points of the spinal column. Second, it uses angular offsets to trace four corner points from each vertebral center point to restore the vertebrae and locate the vertebral region of interest (ROI). Finally, it constructs vectors using the minimum outer rectangle method on the vertebral ROI image to compute the scoliosis Cobb angle and infer the scoliosis type. Experimental results show that the accuracy of the Cobb angle measurement of the proposed method is higher than those of regression-based and segmentation-based methods in the proximal thoracic, main thoracic, and thoracolumbar regions of the spine. A consistency analysis of the automatic diagnostic method and the manual measurements of two doctors shows correlation coefficients of 0.901 and 0.913 with corresponding average absolute deviations of 3.05° and 2.90°, demonstrating that the automatic diagnostic method has good reliability.
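Once vertebral direction vectors are available, the Cobb angle reduces to the angle between two vectors, as the following sketch shows; the example vectors are hypothetical.

```python
import numpy as np

def cobb_angle(v_upper, v_lower):
    """Cobb angle as the angle (in degrees) between two endplate direction
    vectors, e.g. taken from the most tilted upper and lower vertebrae."""
    v1 = np.asarray(v_upper, dtype=float)
    v2 = np.asarray(v_lower, dtype=float)
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Example: endplates tilted by 10 degrees and -15 degrees from the horizontal
upper = (np.cos(np.radians(10)), np.sin(np.radians(10)))
lower = (np.cos(np.radians(-15)), np.sin(np.radians(-15)))
print(round(cobb_angle(upper, lower), 1))  # 25.0
```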
Hyperspectral anomaly detection distinguishes anomalous targets from the background in spectral data using an unsupervised approach. However, the complex nature of hyperspectral background distributions and the presence of anomalous targets in the training samples challenge the model's generalization and application capabilities. To address this issue, we propose the S2FDNet detection network, which integrates sample self-learning with dual feature fusion. First, an anomaly-background category search algorithm based on measure K-means was employed to assign rough background and anomaly labels under weak supervision. A dual spectral and spatial feature extraction framework, including a global-local spectral feature extraction module and a multiscale spatial feature extraction module, was then employed to enhance the discriminative capacity for background and anomalous features across high-dimensional spaces. During weakly supervised training, the abnormal and background sample sets as well as the model parameters were updated, and during testing, anomalies were detected directly from the predicted probabilities. Evaluations on two hyperspectral datasets confirm that the S2FDNet algorithm effectively identifies anomalous targets and improves the distinction between background and anomalies.
For hyperspectral image (HSI) classification, although convolutional neural network (CNN)-based feature extraction methods have been widely applied and have achieved notable results, they still have limitations such as fixed receptive-field sizes and a tendency to overlook spatial-spectral correlations when extracting local features. In this regard, a Transformer network architecture that integrates a multigranularity CNN and spatial-spectral self-attention (SSSA) is proposed herein. This architecture optimizes the traditional CNN using a multigranularity CNN that employs three-dimensional and two-dimensional convolutions to extract spatial-spectral and deep spatial features. Meanwhile, heterogeneous convolution is employed to finely extract multigranularity features, thereby overcoming the limitation of fixed kernel size in traditional CNNs. In addition, to address the neglect of local features by the self-attention mechanism in traditional Transformers, the mechanism is improved to enable the proposed model to simultaneously construct global correlations for spatial and spectral information. Moreover, by introducing dual-channel depthwise separable convolution for spatial-spectral feature embedding, an effective connection between the multigranularity CNN and SSSA is achieved. Experimental results show that owing to the successful extraction of local and global features, the proposed model outperforms other mainstream HSI classification models on various datasets.
The operating efficiency of anomaly detection algorithms based on low-rank sparse decomposition decreases significantly when processing large amounts of hyperspectral image data. Thus, a fast hyperspectral image anomaly detection algorithm based on orthogonal projection is proposed in this study. First, the hyperspectral image is projected onto the background orthogonal subspace to improve the distinction between the background and anomalous objects so that they can be separated easily. Next, a new hyperspectral image representation model is proposed, and an automatic target generation algorithm is introduced to construct a dictionary matrix. Using this model and the orthogonality of the dictionary matrix, the high-dimensional image data matrix is mapped into a low-dimensional matrix, thereby reducing the dimensionality of the data and the computational complexity of the algorithm. The proposed algorithm is tested on three real image datasets and achieves detection accuracies of 0.9964, 0.9984, and 0.9999, respectively. In addition, the average operation time of the proposed algorithm on the three datasets is more than 90% shorter than the shortest operation time of the comparative algorithms. The experimental results on the three datasets demonstrate that the proposed algorithm achieves higher computational efficiency than the other algorithms while maintaining detection performance.
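The background-orthogonal-subspace idea can be illustrated as follows: build a projector onto the orthogonal complement of a background (dictionary) subspace and score each pixel by the norm of its projected residual. The matrix shapes and random data below are illustrative; the paper's representation model and dictionary construction are more elaborate.

```python
import numpy as np

def orthogonal_projection(pixels, background):
    """Project pixel spectra onto the orthogonal complement of a background
    subspace.  pixels: (N, B) spectra; background: (B, K) matrix whose columns
    span the background (e.g. from an automatic target generation process)."""
    U = background
    P = np.eye(U.shape[0]) - U @ np.linalg.pinv(U.T @ U) @ U.T   # projector
    residual = pixels @ P.T                                      # background suppressed
    return np.linalg.norm(residual, axis=1)                      # anomaly score per pixel

# Toy example: 5 bands, 2 background endmembers, 3 pixels
rng = np.random.default_rng(0)
scores = orthogonal_projection(rng.normal(size=(3, 5)), rng.normal(size=(5, 2)))
print(scores.shape)  # (3,)
```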
Due to their ease of access and low cost, unmanned aerial vehicle (UAV) visible remote sensing images have been widely used for the statistical analysis of agricultural resources. To obtain more representative features of UAV visible remote sensing images and achieve accurate land-cover classification, a land-cover classification algorithm based on the joint distribution of color-spatial features is proposed. First, the index of the golden rectangular patch is defined to select patches for sampling from the labeled data. Based on the golden rectangles of the selected patches, a logarithmic spiral is constructed to choose the training samples. Color-feature reference points and neighborhood pixels are then applied to calculate the difference information and extract the color-space joint feature of each sample. Subsequently, the objective function of the joint feature is constructed using Jensen's inequality and fuzzy classification maximum likelihood. Next, the multidimensional mixed Weibull distribution of each sample is solved through several iterations. Finally, a similarity measure corresponding to the multidimensional mixed Weibull distribution is defined to classify each sample under analysis. Experimental results show that the overall accuracy of the proposed algorithm reaches 98.6%, which is better than those of the local binary pattern, gray-level co-occurrence matrix, random forest, ResNet, and VGG methods, proving the effectiveness of the proposed algorithm.
Satellite remote sensing technology provides high-quality typhoon satellite cloud map data, which is a major means of determining typhoon intensity levels, and it has been widely applied in typhoon forecasting. To address the selective loss of cloud features in the forget gate and the loss of edge information caused by the blurring introduced by the original truncation operation on the physical prediction results, this study proposes a typhoon class prediction method based on physical constraints and cloud map generation (CPGANTyphoon). The proposed method uses convolutional networks to approximate the physical equations, optimizes feature extraction through prior knowledge, incorporates adversarial training to improve image quality, uses a joint loss function to reduce visual disparities, and finally predicts typhoon levels from the generated images. Experimental results show that the CPGANTyphoon model generates predicted images with a structural similarity index measure of 0.916, a peak signal-to-noise ratio (PSNR) of 30.36, a fuzzy c-means accuracy of 0.981, and an overall accuracy of 0.985 for typhoon level prediction. The model can accurately generate typhoon cloud maps and predict typhoon levels for future moments.
To address the decreased tracking accuracy caused by illumination variation in hyperspectral target tracking tasks, a hyperspectral target tracking algorithm based on deep spectral-ternary concatenated (DSTC) features is proposed. A threshold is first set to segment the target from the background using the local spectral curve of the target. The spectral curve of the target is captured via band matching, and the spectral weight curve of the target is derived by computing the structural tensor. Subsequently, the dimensionality of the hyperspectral image is reduced by computing the spectral angle distance between the target spectral curve and the hyperspectral image, from which the deep features of the target are extracted. Scale-invariant local ternary pattern (SILTP) features are extracted from the target image; the spectral weights are then allocated to the SILTP features, which are integrated with the spectral information to derive the spectral-ternary concatenated (STC) features. The dimension-reduced target deep features and the STC features are convolved by channel to obtain more discriminative and robust DSTC fusion features. Finally, the fused DSTC features are fed into a dual correlation filter. The experimental results demonstrate that the proposed tracking algorithm exhibits superior tracking performance under the challenge of illumination variation compared with current state-of-the-art algorithms.
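The spectral-angle-based dimensionality reduction step can be sketched as follows: compute the spectral angle between every pixel spectrum and the target spectral curve, yielding a single-band angle map; the array shapes are assumptions for illustration.

```python
import numpy as np

def spectral_angle(cube, reference):
    """Spectral angle between every pixel spectrum in an (H, W, B) cube and a
    reference spectrum of length B; small angles mean high spectral similarity.
    The resulting single-band angle map serves as a dimensionality-reduced image."""
    flat = cube.reshape(-1, cube.shape[-1]).astype(float)
    num = flat @ reference
    den = np.linalg.norm(flat, axis=1) * np.linalg.norm(reference) + 1e-12
    angles = np.arccos(np.clip(num / den, -1.0, 1.0))
    return angles.reshape(cube.shape[:2])
```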
In outdoor large-scene environment measurements, when laser simultaneous localization and mapping (SLAM) is used for three-dimensional (3D) map construction, the LiDAR-based odometry pose easily accumulates errors, leading to dislocation drift and even mapping failure of the 3D point cloud map, which seriously affects the accuracy and application of LiDAR SLAM 3D map construction. HDL-Graph-SLAM is a lightweight laser SLAM mapping algorithm that adds a loop-closure detection module to the mapping process but uses only distance as a constraint. In large scenes, corridors, and other environments with uniform features, the interaction between the accumulated laser odometry error and the repetitive environmental features can easily produce erroneous loop closures: distance-based association cannot find the correct correspondence between the current point cloud frame and historical point cloud frames, leading to dislocation drift in the point cloud map. To improve the calibration accuracy of large loop-closure detection, this study proposes a hybrid loop-calibration laser SLAM algorithm that fuses a spatial-position method (distance threshold) and an appearance-similarity method (bag-of-words model) to search for candidate loop frames, effectively improving the robustness of loop-closure detection. Experimental results show that compared with HDL-Graph-SLAM using only a distance threshold, the proposed hybrid loop-calibration method significantly improves the accuracy of laser odometry pose estimation in large outdoor environments, increasing the absolute trajectory estimation accuracy by 16% and thus effectively improving the 3D mapping accuracy.
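A minimal sketch of the hybrid candidate-selection idea is given below: a historical keyframe is accepted as a loop candidate only if it passes both a distance threshold and a bag-of-words similarity threshold. The dictionary keys, thresholds, and similarity measure are illustrative assumptions, not the implementation used in the study.

```python
import numpy as np

def bow_similarity(a, b):
    """Cosine similarity between two bag-of-words histograms."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def loop_candidates(current, keyframes, dist_thresh=15.0, bow_thresh=0.3):
    """Hybrid loop-closure candidate search: a historical keyframe is kept only
    if it is both spatially close (distance threshold) and visually similar
    (bag-of-words score), which suppresses false loops in self-similar scenes."""
    candidates = []
    for kf in keyframes:
        spatially_close = np.linalg.norm(current["position"] - kf["position"]) < dist_thresh
        visually_similar = bow_similarity(current["bow"], kf["bow"]) > bow_thresh
        if spatially_close and visually_similar:
            candidates.append(kf["id"])
    return candidates
```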
To obtain remote sensing images with more high-frequency information and textural detail and to address the problems of super-resolution networks such as complex structure, numerous parameters, and large model size, this paper proposes a multiscale parameter-free attention enhancement network. First, the proposed network uses convolutional layers to extract shallow features from low-resolution remote sensing images. The shallow features are then input to the proposed multiscale parameter-free attention enhancement module, which connects multiple convolutional layers with different kernel sizes in parallel to refine the extraction of multiscale features. The network also enhances feature information with a high contribution via a symmetric activation function to inhibit redundant information under the parameter-free attention mechanism. After six residual-connected multiscale parameter-free attention enhancement modules, the reconstruction module generates the final reconstructed image. Experimental results demonstrate that compared with existing representative methods, the proposed network exhibits significant reconstruction advantages in terms of performance metrics and visual effects, with its peak signal-to-noise ratio and structural similarity outperforming those of the compared methods.
High-resolution remote sensing images contain rich details and spectral information; consequently, they have important applications in land use, building detection, land cover classification, and other ground observation scenarios. This study proposed a superpixel segmentation algorithm combining grafting attention and detail perception to address the incorrect segmentation of texture regions and the loss of small targets. First, an edge-guided spatial detail module was constructed to weaken the differences when merging different levels and to compensate for the loss of spatial detail information during the sampling process. Second, a grafting attention mechanism was designed to enhance local region features and consequently improve the ability to extract the edges of small targets. Finally, a texture-aware loss was proposed to enhance the expression of texture regions through adaptive adjustment of the texture weights of the feature maps. Compared with existing mainstream superpixel segmentation algorithms, the proposed algorithm achieves segmentation error and boundary recall indicators of 0.15% and 0.87%, respectively, on remote sensing image datasets, indicating improved segmentation performance for texture and small target areas.
This paper proposes a remote sensing small target detection algorithm based on multimodal fusion to address the problems of high similarity between detection targets and background, inaccurate target localization, and difficult feature extraction. The feature extraction component utilizes multimodal fusion to extract shared and specific information across different modalities, complementing the information between modalities and enhancing the model's information extraction capabilities. In the feature fusion component, a receptive-field spatial attention convolution is implemented to accurately perceive the spatial positions within the feature map and prioritize the importance of each feature in the receptive field. For the prediction component, the Shape-IoU (shape-intersection over union) border regression loss function is used. This function considers not only the geometric relationship between the ground truth and prediction boxes but also the inherent characteristics of the bounding box, thereby enhancing the regression accuracy. Experimental evaluations on the VEDAI and NWPU datasets demonstrate that the enhanced algorithm achieves mean average precisions of 72.83% and 93.5%, respectively, surpassing the baseline model by 8.40 percentage points and 2.7 percentage points. Compared with other advanced algorithms, the proposed algorithm effectively reduces both the false detection and missed detection rates.
Perception is an important technology in the field of autonomous driving, relying primarily on radar and cameras. To perceive accurate environmental information, the extrinsic parameters of the sensors must be accurately calibrated. To solve the problem of calibration failure caused by untimely calibration updates and vehicle jolts, this study proposes a targetless extrinsic calibration method suitable for urban streets. Buildings, vehicles, and road markings were selected as feature objects, and point cloud and image feature points were extracted. Based on the initial extrinsic parameters, a random search algorithm was used to match the point cloud and image, and the optimal extrinsic parameters were obtained from the best matching result. Taking the KITTI dataset as an example, the feasibility and effectiveness of the method were verified through various experiments. The experimental results indicate that for rotational disturbances under 3°, the mean translation error remains under 0.095 m and the mean rotation error remains under 0.32°, indicating high accuracy. Compared with the CRLF method based on line features, the proposed method reduces the translation and rotation errors by 0.1 m and 0.55°, respectively, when only the rotation is perturbed. Thus, the method is applicable to most urban street autonomous driving scenarios and exhibits good accuracy.
To address the limitations of conventional methods for monitoring landslides in complex terrains and vegetation-covered areas, this paper proposes an enhanced display and recognition method for landslide hazards based on airborne LiDAR technology. In particular, a high-precision digital elevation model is constructed and combined with various terrain-visualization techniques, such as hillshading, slope analysis, red relief image maps, and sky-view factors. A support vector machine (SVM) model is used to classify the fused images and identify landslide-susceptible areas. Experimental results show that this method effectively identifies and enhances the display of landslide hazard areas, and its accuracy in identifying landslide-susceptible areas based on the SVM is 83.86%. The proposed method not only enhances the visualization of landslide hazard areas but also improves the accuracy of identifying landslide-susceptible areas, thus providing effective technical support for landslide disaster prevention and emergency response.
Aiming at the problems of large trajectory error and low efficiency of visual simultaneous localization and mapping (SLAM) algorithms in low-light environments, a visual SLAM algorithm based on oriented FAST and rotated BRIEF (ORB)-SLAM2 is proposed that fuses an image brightness enhancement module with inertial measurement unit (IMU) information. To increase the number of keyframes generated in low-light environments, a Gamma correction factor that adapts to image brightness is designed; feature points are extracted after the brightness of low-light images, selected by a brightness threshold, is adaptively adjusted. The extracted feature points are tracked using Lucas-Kanade (LK) optical flow to estimate the initial pose, which is further optimized using visual and IMU information to improve the operating efficiency and robustness of the algorithm. Experiments are conducted on public datasets and the Bingda robot operating system (ROS) platform. The results show that compared with ORB-SLAM2, the average absolute trajectory error, average relative pose error, and average tracking time per frame of the improved algorithm are reduced by 35%, 25%, and 24%, respectively, proving that the proposed algorithm achieves higher accuracy and efficiency and has good practical value for applications in low-light environments.
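One common way to make a Gamma correction factor track image brightness is to choose the exponent so that the mean intensity is pushed toward mid-gray, as sketched below; the paper's adaptive formula and brightness threshold are not reproduced here, so treat this as an illustrative assumption.

```python
import numpy as np

def adaptive_gamma(gray, target_mean=0.5):
    """Brightness-adaptive gamma correction: choose gamma so that the mean
    intensity is pushed toward a mid-gray target, then apply the power law.
    Dark images get gamma < 1 (brightening); well-exposed images stay close
    to gamma = 1."""
    norm = gray.astype(np.float32) / 255.0
    mean = max(float(norm.mean()), 1e-3)
    gamma = np.log(target_mean) / np.log(mean)
    corrected = np.power(norm, gamma)
    return (corrected * 255).astype(np.uint8), gamma
```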
Sound-field measurement technology is an important means of studying and evaluating acoustic phenomena as it clarifies the distribution and propagation laws of sound in space. By measuring various parameters such as sound pressure in the sound field, researchers can study the propagation characteristics and quality of sound as well as the interaction of sound waves in a specific environment. Compared to traditional sound-field measurement technology, digital holographic sound-field measurement technology converts the pixel points of a camera into high-precision sensors, exploits the interference principle to record the hologram created by the sound field, and digitally recovers the phase information of the light field modulated by the sound field. These processes quantitatively reconstruct an image of the sound field using acousto-optic effects. The real propagation characteristics of sound waves in the medium can be noninvasively viewed as a video with a full field of view and high spatial resolution. This review focuses on the basic principles of digital holographic sound-field measurement technology, conventional measurement methods, sound-field phase reconstruction methods, and the application prospects of digital holography technology in sound-field measurements.
Doppler wind LiDAR offers the advantages of a small blind zone, high spatiotemporal resolution, and flexible scanning methods, and it has recently been widely used for the detection of atmospheric wind fields. In the field of aviation meteorology in particular, LiDAR is applied to detect and warn of low-altitude hazardous wind fields at airports. First, a systematic review of the basic principles of Doppler LiDAR and its current research status domestically and internationally is presented. Second, the application methods and progress of LiDAR in detecting airport wind shear, turbulence, and aircraft wake are introduced. Finally, the challenges and future development directions of Doppler LiDAR applications in the aviation field are summarized.