A turbulence-degraded target restoration algorithm with a nonconvex regularization constraint is proposed to address degradation issues, such as low signal-to-noise ratio, blurring, and geometric distortion, in target images caused by atmospheric turbulence and light scattering in long-range optoelectronic detection systems. First, we utilized latent low-rank spatial decomposition (LatLRSD) to obtain the low-rank, texture, and high-frequency noise components of the target. Next, the two structural components produced by the LatLRSD model were denoised, weighted, and reconstructed in the wavelet-transform domain, and a nonconvex regularization constraint was added to the target reconstruction function to alleviate the reconstruction blur and scale sensitivity caused by the traditional lp norm (p=0, 1, 2) constraint term. The results of a target restoration experiment in long-distance turbulent imaging scenes show that, compared with traditional algorithms, the proposed algorithm can effectively remove turbulence-induced target blur and noise; the average signal-to-noise ratio of the restored target is improved by about 9 dB. Further, the proposed algorithm is suitable for both multiframe and single-frame turbulence-blurred target restoration scenes.
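As a concrete illustration of the wavelet-domain step described above, here is a minimal Python sketch of weighted reconstruction of two structural components; the wavelet ('db4'), decomposition level, and weights are illustrative assumptions rather than the paper's actual settings.

```python
import numpy as np
import pywt

def wavelet_weighted_fusion(low_rank, texture, wavelet="db4", level=3,
                            w_approx=0.6, w_detail=0.6):
    """Weighted reconstruction of two components in the wavelet domain."""
    c_lr = pywt.wavedec2(low_rank, wavelet, level=level)
    c_tx = pywt.wavedec2(texture, wavelet, level=level)
    # Approximation coefficients: weighted toward the low-rank component.
    fused = [w_approx * c_lr[0] + (1 - w_approx) * c_tx[0]]
    # Detail coefficients (LH, HL, HH per level): weighted toward texture.
    for d_lr, d_tx in zip(c_lr[1:], c_tx[1:]):
        fused.append(tuple((1 - w_detail) * a + w_detail * b
                           for a, b in zip(d_lr, d_tx)))
    return pywt.waverec2(fused, wavelet)

fused_img = wavelet_weighted_fusion(np.random.rand(128, 128),
                                    np.random.rand(128, 128))
```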
This paper proposes a multiplexed-fusion deep aggregate learning algorithm for underwater image enhancement. First, an image preprocessing algorithm is used to obtain the image attribute information of three branches (contrast, brightness, and colour). Then, an image attribute dependency module is designed to obtain the fused features of the multiplexed branches using a fusion network and to explore the potential correlations between the fused image attributes through parallel graph convolution. A self-attention deep aggregate learning module is introduced to deeply mine the interaction information between the private and public domains of the multiplexed branches using sequential self-attention and a global attribute iteration mechanism, and to effectively extract and integrate the important information between image attributes by means of aggregation bottlenecks, achieving more accurate feature representation. Finally, skip connections are introduced to further enhance the output image and improve the enhancement effect. Numerous experiments demonstrate that the proposed method can effectively remove colour bias and blurring, improve image clarity, and facilitate underwater image segmentation and key-point detection tasks. The peak signal-to-noise ratio and structural similarity metrics reach the highest values of 23.01 dB and 0.90, improvements of 5.0% and 4.7% over the suboptimal method, while the underwater colour image quality and information entropy metrics reach the highest values of 0.93 and 14.33, improvements of 2.2% and 0.5% over the suboptimal method.
A U-shaped dual-energy computed tomography (DECT) material decomposition network, called DM-Unet, which combines the selective state space model Mamba with an efficient channel attention module, is proposed in this paper. The network uses a visual state space module that introduces a channel attention mechanism to capture feature information, adjusts the weights of different levels of feature information within a block through adjustable parametric residual connections, and reduces gradient explosion and the loss of tissue details through residual connections between the encoder and decoder. Experimental results show that the root mean square error of the basis material image obtained by DM-Unet is as low as 0.041 g/cm³, the structural similarity reaches 0.9981, and the peak signal-to-noise ratio reaches 36.54 dB. Compared with traditional decomposition methods, DM-Unet shows better tissue-detail restoration, noise suppression, and edge-information recovery, and is able to fulfill the DECT decomposition task, providing accurate references for subsequent medical diagnostic work.
To improve the consistency between objective assessment metrics and human subjective evaluation of stereo image quality, inspired by the top-down mechanism of human vision, this paper proposes a stereo attention-based no-reference stereo image quality assessment method. In the proposed stereo attention module, the amplitude of the binocular response is first adaptively adjusted by the energy coefficient in the proposed binocular fusion module, and the binocular features are processed simultaneously in the spatial and channel dimensions. Second, the proposed binocular modulation module realizes top-down modulation of high-level binocular information onto low-level binocular and monocular information simultaneously. In addition, the dual-pooling strategy proposed in this paper processes the binocular fusion map and binocular difference map to obtain the critical information most conducive to quality score regression. The performance of the proposed method is validated on the publicly available LIVE 3D and WIVC 3D databases. The experimental results show that the proposed method achieves high consistency between objective assessment indices and labels.
This study develops a lightweight roadside object detection algorithm called MQ-YOLO based on multiscale sequence fusion. It addresses the low detection accuracy for small and occluded targets and the large number of model parameters in urban traffic roadside object detection tasks. We design a D-C2f module based on multi-branch feature extraction to enhance feature representation while maintaining speed. To strengthen the integration of information from multiscale sequences and enhance feature extraction for small targets, a plural-scale sequence fusion (PSF) module is designed to reconstruct the feature fusion layer. Multiple attention mechanisms are incorporated into the detection head to focus on the salient semantic information of occluded targets. To further enhance detection performance, a loss function based on the normalized Wasserstein distance is introduced. Experimental results on the DAIR-V2X-I dataset demonstrate that MQ-YOLO improves mAP@50 and mAP@(50-95) by 3.9 and 6.0 percentage points, respectively, compared with the values obtained with the baseline YOLOv8n, while using only 3.96×10⁶ parameters. Experiments on the DAIR-V2X-SPD-I dataset show that the model generalizes well. During roadside deployment, the model reaches a detection speed of 62.5 frame/s, meeting current roadside object detection requirements for edge deployment in urban traffic.
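For context, the normalized Wasserstein distance mentioned above can be computed for axis-aligned boxes modeled as 2D Gaussians, following Wang et al.'s formulation; a minimal sketch is shown below, where the constant C is dataset-dependent and the value used here is an assumption.

```python
import torch

def nwd(box1, box2, C=12.8):
    """Boxes given as (cx, cy, w, h) tensors of shape (..., 4)."""
    g1 = torch.stack([box1[..., 0], box1[..., 1],
                      box1[..., 2] / 2, box1[..., 3] / 2], dim=-1)
    g2 = torch.stack([box2[..., 0], box2[..., 1],
                      box2[..., 2] / 2, box2[..., 3] / 2], dim=-1)
    w2 = torch.linalg.norm(g1 - g2, dim=-1)  # 2nd-order Wasserstein distance
    return torch.exp(-w2 / C)                # similarity in (0, 1]

# Loss = 1 - NWD, smoother than IoU for tiny boxes with little overlap.
loss = 1.0 - nwd(torch.tensor([10., 10., 4., 4.]),
                 torch.tensor([11., 10., 4., 5.]))
```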
Aiming at the problem that the traditional cloth simulation filtering (CSF) algorithm cannot distinguish the local microtopography of pavement damage, which leads to wrong and missed detections of pothole damage, an adaptive descent-distance CSF algorithm for pavement pothole extraction is proposed. First, the proposed algorithm preprocesses and denoises the road point cloud to obtain the pavement point cloud. Second, by improving the displacement distances of the "external force drop" and "internal force pull-back" processes of the simulated cloth in the CSF algorithm, an adaptive descent distance of the simulated cloth is realized; an accurate local datum plane of the road surface is then constructed and a depth-enhanced point cloud information model is generated. Finally, depth-threshold classification and a Euclidean clustering algorithm are used to achieve precise detection of potholes and to extract their geometric attribute features. Experiments on measured road data show that the recall of pothole detection reaches 83.3%, the precision reaches 87.5%, the maximum relative error of area is 17.699%, and the maximum relative error of depth is 9.677%, demonstrating a certain degree of robustness and applicability. The proposed algorithm can provide powerful support for the automatic and precise detection of pavement potholes from large-scale three-dimensional pavement point cloud data.
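The final extraction step (depth thresholding followed by Euclidean clustering) might look like the following sketch; the thresholds, cluster parameters, and use of Open3D's DBSCAN-based clustering are illustrative assumptions.

```python
import numpy as np
import open3d as o3d

def extract_potholes(points, depth_below_datum, depth_thresh=0.015,
                     eps=0.05, min_points=30):
    """points: (N,3) pavement cloud; depth_below_datum: (N,) depths in m."""
    candidates = points[depth_below_datum > depth_thresh]
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(candidates)
    labels = np.asarray(pcd.cluster_dbscan(eps=eps, min_points=min_points))
    # Each non-negative label is one pothole candidate cluster.
    return [candidates[labels == k] for k in range(labels.max() + 1)]
```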
Marine microorganisms are fundamental to marine ecosystems. However, underwater imaging often blurs microbial contours due to water absorption and scattering. To address this, we propose a contour segmentation method for underwater microorganisms that combines an underwater imaging model with Fourier descriptors. First, the background light and water attenuation coefficients are estimated using the underwater imaging model to extract a clear, water-free feature map of the object. Next, a classification head determines the target location, while a regression head uses Fourier descriptors to represent and refine the microorganism's contour in the pixel domain. In addition, hologram reconstruction and preprocessing steps are applied, and a microbial contour segmentation dataset is generated. Experimental results demonstrate that the Fourier descriptor outperforms the star polygon method in contour representation accuracy and spatial continuity. Compared with traditional segmentation methods, the proposed algorithm achieves an F1 score of 0.8894, an intersection over union of 0.7887, and a pixel accuracy of 0.8608, with all metrics improved, indicating superior segmentation capability.
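A minimal sketch of the Fourier-descriptor contour representation referenced above: a closed contour is encoded as a complex sequence, low-frequency coefficients are retained, and the contour is reconstructed. The number of retained coefficients is an assumption.

```python
import numpy as np

def fourier_descriptor(contour, n_coeff=16):
    """contour: (N,2) ordered (x,y) boundary points of a closed curve."""
    z = contour[:, 0] + 1j * contour[:, 1]    # complex encoding of the contour
    F = np.fft.fft(z)
    keep = np.zeros_like(F)
    keep[:n_coeff // 2] = F[:n_coeff // 2]    # low positive frequencies
    keep[-n_coeff // 2:] = F[-n_coeff // 2:]  # low negative frequencies
    return keep

def reconstruct(descriptor):
    z = np.fft.ifft(descriptor)
    return np.stack([z.real, z.imag], axis=-1)

theta = np.linspace(0, 2 * np.pi, 128, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=-1)
approx = reconstruct(fourier_descriptor(circle))  # smooth low-order contour
```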
Aiming at the problems of low detection accuracy and missed detections caused by complex contour information, large shape variations, and the small size of contraband in X-ray images, an improved GELAN-YOLOv8 model based on YOLOv8 is proposed. First, the RepNCSPELAN module based on the generalized efficient layer aggregation network (GELAN) is introduced to improve the feature extraction ability for contraband. Second, the GELAN-RD module is proposed by combining deformable convolution v3 (DCNv3) with the RepNCSPELAN module to adapt to contraband with different postures and severe changes in size and angle. Third, the spatial pyramid pooling is improved so that the model pays more attention to the feature information of small-target contraband. Finally, Inner-ShapeIoU is proposed by combining inner-intersection over union (Inner-IoU) and Shape-IoU to reduce false and missed detections and to speed up model convergence. Results on the SIXray dataset show that the mAP@0.5 of the improved algorithm is 2.8 percentage points higher than that of YOLOv8n, and its performance is better than that of YOLOv8s. GELAN-YOLOv8 effectively realizes real-time detection of contraband in X-ray images.
High dynamic range (HDR) image reconstruction algorithms based on the generation of bracketed image stacks have gained popularity for their capabilities in expanding the dynamic range and adapting to complex lighting scenarios. However, existing approaches based on convolutional neural networks often suffer from local receptive fields, limiting the utilization of global information and the recovery of over- or underexposed regions. To solve this problem, this study introduces a Transformer architecture that equips the network with a global receptive field to establish long-range dependencies. In addition, a unidirectional soft mask is added to the Transformer to alleviate the effects of invalid information from over- and underexposed regions, further improving reconstruction quality. Experimental results show that the proposed algorithm improves the peak signal-to-noise ratio by 2.37 dB and 1.33 dB on the VDS and HDREye datasets, respectively, and subjective comparisons further prove its effectiveness. This study provides a novel approach for improving the information recovery capabilities of HDR image reconstruction algorithms in over- and underexposed regions.
The study of the attention in attention network (A2N) in single-image super-resolution has revealed that not all attention modules are beneficial to the network. Therefore, in the network design, input features can be divided into attention and nonattention branches, and the weights on these branches can be adaptively adjusted by dynamic attention modules based on the input features, so that the network strengthens useful features and suppresses unimportant ones. In practical applications, lightweight networks are well suited to resource-constrained devices. Based on A2N, the number of attention in attention blocks (A2B) in the original network is reduced, and lightweight receptive field modules are introduced to enhance the overall performance of the network. In addition, by replacing the L1 loss with a combination loss based on the Fourier transform, the spatial domain of the image is transformed into the frequency domain, enabling the network to learn the frequency characteristics of the image. The experimental results show that the improved A2N reduces the parameter count by about 25% and the computational complexity by about 20%, and improves the inference speed by about 15%, while enhancing performance.
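A minimal sketch of a Fourier-transform-based combination loss of the kind described, assuming an L1 spatial term plus an L1 penalty on FFT amplitudes with an illustrative weight; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def combo_loss(pred, target, lam=0.1):
    """Spatial L1 plus frequency-domain L1 on FFT amplitude spectra."""
    spatial = F.l1_loss(pred, target)
    fp, ft = torch.fft.rfft2(pred), torch.fft.rfft2(target)
    frequency = F.l1_loss(torch.abs(fp), torch.abs(ft))
    return spatial + lam * frequency
```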
As the shooting depth increases, underwater images suffer from issues such as low brightness, color distortion, and blurred details. Therefore, an underwater low-illumination image enhancement algorithm based on an encoder-decoder structure (ULCF-Net) is designed. First, a brightness enhancement module is designed based on a half-channel Fourier transform, which enhances the response in dark regions by combining frequency-domain and spatial information. Second, cross-scale connections are introduced within the encoder-decoder structure to improve the detailed expression of underwater optical images. Finally, a dual-stream multiscale color enhancement module is designed to improve the color fusion effects across different feature levels. Experimental results on publicly available underwater low-illumination image datasets demonstrate that the proposed ULCF-Net exhibits excellent enhancement in terms of brightness, color, and details.
Fisheye cameras offer lower deployment costs than traditional cameras for detecting the same scene. However, accurately detecting distorted targets in fisheye images requires increased computational complexity. To address the challenge of achieving both accuracy and inference speed in fisheye image detection, we propose an enhanced YOLOv8m-based fisheye object detection model, which we refer to as Fisheye-YOLOv8. First, we introduce the Faster-EMA module, which integrates lightweight convolution and multiscale attention to reduce delay and complexity in feature extraction. Next, we design the RFA-BiFPN structure, incorporating a parameter-sharing mechanism to enhance the detection speed and accuracy through receptive field attention and a weighted bidirectional pyramid structure. In addition, the lightweight G-LHead detection head is introduced to minimize the number of model parameters and reduce complexity. Finally, the LAMP pruning algorithm is introduced to balance improvements in recognition accuracy with inference speed. Experimental results demonstrate that Fisheye-YOLOv8 achieves mean average precision values of 60.5% and 59.7% on the Fisheye8K and WoodScape datasets, respectively, which is an increase of 2.2 and 1.2 percentage points compared to YOLOv8m. Moreover, the proposed model's parameter and computational complexity are only 20.5% and 29.7% of those of YOLOv8m, respectively, with a detection speed of 118 frames/s. The proposed model meets real-time requirements and is better suited for fisheye camera deployment than the other models.
To solve the problems of low-accuracy detection and inaccurate classification of small-target defects in solar cell panel defect detection, an improved lightweight YOLOv5s model suitable for small-target detection is proposed in this study. First, the SiLU activation function replaces the original activation function to optimize the convergence speed and enhance the generalization ability of the model. Second, the C3TR and convolutional block attention modules are used to re-optimize the backbone feature sampling structure to improve the recognition of different defect types, especially small-target defects. Third, content-aware reassembly of features is introduced into the feature extraction network to improve the detection accuracy and speed without increasing the model weight. Finally, the dynamic nonmonotonic loss function WIoUv3 is adopted to dynamically match predicted and ground-truth boxes, enhancing robustness on small-target datasets and to noise. Experimental results show that the mean average precision (mAP@0.5) of the proposed model is 95.9%, its classification accuracies for large-area cracks and star-shaped scratches reach 98.0%, and its detection speed reaches 75.133 frame/s, demonstrating that its light weight and speed meet the requirements of industrial production.
To address the scarcity of point cloud datasets for foggy weather, an optical-model-based foggy-weather point cloud rendering method is proposed. First, a mathematical relationship is established between the LiDAR impulse responses during good weather and foggy weather in the same scene. Second, an algorithm is designed based on laser attenuation in foggy weather, and the visibility of the rendered point cloud is set by modifying the attenuation coefficient, backscattering coefficient, and differential reflectance of the target to obtain the rendered foggy-weather point cloud at the set visibility. Experiments reveal that the proposed method effectively renders foggy-weather point clouds with a visibility within 50-100 m and produces stable results. Compared with real foggy-weather point clouds, the average Kullback-Leibler (KL) divergence of the rendered point clouds is less than 6, the average percentage of Hausdorff distances below 0.5 m is not less than 85%, and the average mean-square-error distance is less than 8, proving the feasibility of the proposed method. Therefore, the proposed method can render foggy-weather point clouds from good-weather data, overcoming the lack of foggy-weather point cloud datasets and visibility data.
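The attenuation step can be sketched with a two-way Beer-Lambert factor whose extinction coefficient follows from the target visibility via the Koschmieder relation; the intensity floor for dropping lost returns is an illustrative assumption.

```python
import numpy as np

def render_fog(points, intensity, visibility_m=80.0, floor=0.05):
    """points: (N,3) clear-weather cloud; intensity: (N,) return strengths."""
    alpha = 3.912 / visibility_m                  # extinction coefficient (1/m)
    r = np.linalg.norm(points, axis=1)            # range to each point
    i_fog = intensity * np.exp(-2.0 * alpha * r)  # two-way attenuation
    keep = i_fog > floor                          # too-weak returns are lost
    return points[keep], i_fog[keep]
```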
Light-sheet fluorescence microscopy imaging systems are extensively used for imaging large-volume biological samples. However, as the field of view of the optical system expands, the imaging exhibits spatially nonuniform degradation across the field of view. Conventional model-driven and deep learning approaches are spatially invariant, making it challenging to directly address this degradation. A position-dependent model-driven deconvolution network is developed by introducing positional information into the model-driven deconvolution network, achieved by randomly selecting training image pairs with different degradation patterns during training and using block-based reconstruction during image restoration. The experimental results reveal that the network facilitates rapid deconvolution of large field-of-view optical images, thereby considerably enhancing image processing efficiency, image quality, and the uniformity of image quality within the field of view.
The particle size and distribution characteristics of exhaled aerosol particle fields are related to health status. This study proposes a diagnostic method for exhaled aerosol particle fields using coaxial digital holography. A connected-domain method was employed to extract particle sizes from the reconstructed images of exhaled aerosol particle field holograms, determine particle size distributions, and calculate the statistical parameters D10, D32, D43, and DN50. The method was tested on 14 healthy volunteers and 10 infected patients to obtain the distributions and average values of the two sets of particle size statistics. The detection accuracy within the distribution range of each statistical parameter was calculated, and a benchmark point for each parameter was obtained. Using the four benchmark points and the average particle size statistics as indicators, six test volunteers were examined. Considering both the detection accuracy and the coefficient of variation, the Sauter mean diameter (D32) is identified as the optimal criterion for assessing health status, achieving a detection accuracy of over 80%. This study introduces an optical method for the preliminary diagnosis of influenza through the analysis of exhaled aerosol particle fields.
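For reference, the cited statistics have standard definitions that can be computed directly from the measured diameters:

```python
import numpy as np

def size_statistics(d):
    """d: array of particle diameters extracted from the hologram images."""
    d = np.asarray(d, dtype=float)
    return {
        "D10": d.mean(),                       # arithmetic mean diameter
        "D32": (d**3).sum() / (d**2).sum(),    # Sauter mean diameter
        "D43": (d**4).sum() / (d**3).sum(),    # De Brouckere mean diameter
        "DN50": np.median(d),                  # number median diameter
    }

print(size_statistics([1.2, 2.5, 3.1, 0.8, 4.4]))  # diameters in micrometers
```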
The traditional Poisson surface reconstruction algorithm often generates pseudo-closed surfaces along internal and external edges, biasing the reconstruction results. To address this issue, an improved method based on boundary region growing is proposed. First, based on the Poisson surface, the object boundary points are extracted using normals, and continuous boundary curves are fitted. Then, the vector cross product is used to screen the neighboring points of the boundary and extract boundary features. Finally, a double-ray method is employed to extract initial seed points on the inner boundary surface, and the Poisson surface is segmented using region growing constrained by the boundary features. Experimental results show that the proposed method can effectively eliminate pseudo-closed surfaces in the model and enhance the accuracy and integrity of reconstructed surfaces. The method adapts well and performs stably across different types of objects.
To address the challenges of difficult in situ fingerprint extraction and the risk of damaging fingerprints through improper handling, this study proposes a new method using an aggregation-induced emission (AIE) reagent (C₂₇H₁₉N₃SO) combined with a microgrid atomization technique based on time-resolved imaging principles. This study analyzes the imaging effectiveness and the influence of background colors on fingerprint fluorescence extraction. The extraction quality is evaluated by analyzing the gray values and the number of identified secondary fingerprint features. Experimental results indicate that in situ aggregation-induced activation of the AIE reagent offers remarkable advantages for latent fingerprint imaging. Unlike traditional chemical reagent development methods, the proposed method improves the contrast between fingerprints and their backgrounds, making it both efficient and nondestructive. The proposed method can reliably extract over 100 consecutive oily latent fingerprints. In addition, this study defines the applicable scope of the AIE reagent, enriches the methodological framework for latent fingerprint development, and holds practical value with promising potential for widespread application.
This paper proposes a two-mirror telescope alignment method based on on-axis coma and astigmatism in symmetric off-axis fields of view, using vector aberration theory and the aberration field characteristics of a misaligned Ritchey-Chrétien (R-C) telescope. A simulated alignment experiment is conducted on an R-C telescope with random misalignments introduced into the secondary mirror. The alignment is successfully completed after three iterations using the proposed method, demonstrating its feasibility. An actual assembly experiment further validates the proposed method, achieving wave aberration values of 0.0730λ for the on-axis field of view and 0.0808λ and 0.0834λ for the two symmetric off-axis fields of view, respectively. Experimental results indicate that the proposed method effectively aligns the R-C telescope, ensuring high imaging quality across all fields of view.
This article proposes a novel three-tap time-of-flight (ToF) modulation and demodulation scheme, which can effectively reduce the power consumption of multiphase data acquisition, transmission, and depth calculation without losing ranging accuracy. Our three-tap rotation strategy also resolves the inconsistent tap gains and large residual distance-calibration errors caused by the CMOS process. We further propose an inter-cluster pseudo-random coding design that effectively reduces mutual interference among multiple ToF cameras operating in the same scene. The proposed combination strategy can potentially advance the low-power miniaturization of ToF three-dimensional perception imaging technology.
A miniaturized, low-cost, portable cell microscopic image acquisition system with wireless transmission has been developed, using Unigraphics NX software and three-dimensional printing for the system structure to achieve low cost and light weight. The system measures 13 cm×5 cm×20 cm and weighs ~1.5 kg. An intelligent microimage acquisition software platform is built using functions from the OpenCV library. Testing results demonstrate that the proposed system successfully transmits microscopic images wirelessly via WiFi, performs cell counting and cell migration analysis, and meets the requirements of standard cell observation experiments. The resolution of the system is 1.5 μm, making it suitable for both teaching and scientific research applications.
To overcome the limitations of current multicamera 3D object detection methods, which often struggle to balance precision and computational speed, we propose an enhanced version of DETR3D. The algorithm framework is based on the encoder-decoder architecture of DETR3D. We incorporate a 3D position encoder alongside the image feature extraction branch to enhance image features. Object queries are initialized with two components, representing the object's bounding box and instance features. In the decoder stage, we introduce a multiscale adaptive attention mechanism based on Euclidean distance, allowing the algorithm to effectively capture multiscale information in 3D space, which significantly improves detection performance for complex and diverse objects in autonomous driving scenarios. During feature sampling, we integrate temporal information to align features across consecutive frames, improving detection accuracy. Additionally, multipoint sampling is employed to strengthen the robustness of the sampling process. Experiments conducted on the nuScenes dataset indicate that compared to the baseline algorithm, our proposed approach achieves a 17.1% improvement in detection accuracy and a 4.22-fold increase in computational speed. Moreover, it proves effective in detecting objects even in occluded environments.
We propose a novel method and develop a device for the three-dimensional measurement of light-field polarization vectors. The device can analyze the polarization state at each point in the light field and measure the wavefront distribution using a polarization analysis system comprising a polarizer, waveplate, and Shack-Hartmann wavefront sensor, thereby enabling the reconstruction of a three-dimensional polarization-vector distribution. Using a polarization modulation device and spatial-light modulator, we experimentally generate laser beams with varying polarization states and wavefronts, which are subsequently tested under four light-field conditions. Results show that the device performs excellently, with the root-mean-square (RMS) error of the Stokes vector below 0.05, and the RMS and peak-to-valley (PV) values of wavefront error below 0.1 μm. This method effectively overcomes the limitations of conventional two-dimensional detection. It accurately restores three-dimensional polarization information as well as provides high-spatial-resolution and precise polarization-vector measurements, thus offering an effective option for optical measurement with broad application prospects.
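For reference, a minimal sketch of recovering a Stokes vector from the six classical analyzer intensity measurements; the device's actual measurement sequence with the polarizer and waveplate may differ.

```python
import numpy as np

def stokes_from_intensities(I0, I90, I45, I135, Ircp, Ilcp):
    """Stokes vector from linear (0/90/45/135 deg) and circular analyses."""
    S0 = I0 + I90       # total intensity
    S1 = I0 - I90       # horizontal vs. vertical linear polarization
    S2 = I45 - I135     # +45 vs. -45 deg linear polarization
    S3 = Ircp - Ilcp    # right vs. left circular polarization
    return np.array([S0, S1, S2, S3])

S = stokes_from_intensities(1.0, 0.0, 0.5, 0.5, 0.5, 0.5)  # horizontal linear
```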
To address the difficulty of capturing small defects in complex backgrounds and the large number of model parameters in insulator defect detection from aerial images, we propose a UAV insulator defect detection method based on an improved GCP-YOLOv8s. First, GSConv is incorporated into the network to replace conventional convolutions, reducing the model's parameter count. Second, the Bottleneck module in C2f is replaced by the FasterNet Block module, creating a lightweight C2f-Faster module that further reduces model size. To improve the network's feature extraction capability, efficient multi-scale attention (EMA) is integrated into the C2f-Faster forward network, forming the CF-EMA lightweight feature extraction module, which effectively addresses the challenge of extracting small-defect features in complex backgrounds. Finally, to prevent the loss of minor-defect feature information, additional minor-defect detection layers are added to improve the fusion of shallow and deep feature maps, enhancing the detection accuracy for small defects. The experimental results demonstrate that GCP-YOLOv8s achieves an mAP@0.5 of 97.6%, an improvement of 1.8 percentage points over YOLOv8s, with a parameter count of only 7.2×10⁶, a 36.3% reduction compared with YOLOv8s. The proposed method achieves an effective balance between detection accuracy and model size.
To address the issue of handwriting alteration in forensic document examination, we employ the reflectance transformation imaging (RTI) technique to capture the three-dimensional characteristics of cross-strokes in handwritten characters. The proposed method allows cross-stroke sequences from the same type of ink to be examined across multiple modes. First, we create three types of samples: cross-crossing, tilt-crossing, and point-crossing, dividing each into two categories based on stroke order, "first horizontal then vertical" and "first vertical then horizontal", with 50 samples in each category. Second, the 1200 samples are imaged and analyzed using the RTI technique to evaluate the detection rates of the various samples under different RTI modes. Analysis of variance (ANOVA) is then performed on the detection rates in normal visualization mode. Preliminary experimental results show that ballpoint pens and neutral pens exhibit the best test effectiveness in cross-crossing and tilt-crossing samples, with detection rates of up to 95%. Show pens and fiber pens also perform well in these categories, with detection rates of 85% and 65%, respectively. However, their detection performance decreases in point-crossing samples, with a detection rate of only 70% for ballpoint and neutral pens and 25% for Show pens and fiber pens. ANOVA reveals notable differences in detection rates between sample types in normal visualization mode. RTI technology offers new opportunities for enhancing document inspection methods and shows promising potential for future development in this field.
Wear of the wheel flange and swinging of the sintering machine trolley may cause the trolley to derail, leading to production safety accidents. Currently, detection relies mainly on manual inspection, which suffers from poor working conditions, low detection efficiency, and the potential for missed detections and false alarms. In this study, a platform for detecting the flange thickness and swing angle of sintering machine trolley wheels was established. Two laser projection lines of the wheel flange profile are formed on the surface of the trolley wheels using a dual-line laser. The collected laser stripe lines of the wheel flange profile undergo image processing, including median filtering, morphological operations, skeleton extraction, and connected-component analysis. The system extracts the laser stripe centerline and the critical measurement points at both ends of the wheel flange. Based on the geometric relationship between the rail, wheel flange, and laser stripe lines, the system detects the wheel flange thickness and swing angle. The experimental results demonstrate that the method achieves real-time detection of the flange thickness and swing angle. The relative error of the flange thickness measurement is 0.31%, with an absolute error of approximately 1 mm; the relative error of the swing angle measurement is 0.56%, with an absolute error of approximately 1°. These values satisfy the production requirements of the sintering process and provide technical support for the safe and stable operation of sintering machine trolleys.
When images of surface defects on tubular vessels are acquired, they are prone to change due to variable environmental factors, resulting in inconsistency between the collected image features and the features of the algorithm's training images. To address the resulting degradation in detection accuracy, this study proposes an unsupervised domain-adaptive surface defect detection algorithm. First, a convolutional neural network extracts features from labeled data in the source domain and unlabeled data in the target domain. Second, adversarial training of domain classifiers is used to align image-level and instance-level features. To fully utilize the correlation of feature maps at different scales, an improved channel-attention fusion domain classifier is proposed to enhance the discriminative ability of the domain classifiers. Finally, the results of the corresponding domain classifiers are strongly matched to ensure that the network's detection results are independent of the input data source; that is, detection is conducted on the learned domain-invariant features to enhance detection accuracy. The experimental results show that the detection accuracy of the model improves from 83.1% to 93.4%, significantly reducing wrong and missed detections and making the algorithm more adaptable to variable environments in actual production.
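A standard way to realize the adversarial domain-classifier training described above is a gradient reversal layer (DANN-style); the sketch below is a generic illustration, not the paper's exact classifier or channel-attention fusion design.

```python
import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reversed, scaled gradient backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None  # push the backbone toward domain confusion

class DomainClassifier(nn.Module):
    def __init__(self, dim=256, lam=1.0):
        super().__init__()
        self.lam = lam
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(),
                                 nn.Linear(128, 2))  # source vs. target

    def forward(self, feat):
        return self.net(GradReverse.apply(feat, self.lam))
```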
To address high dynamic background interference, multi-scale target detection, and potential feature loss in sea surface target detection during maritime search and rescue, this study proposes TS-YOLOv8, an improved network built on the YOLOv8 model around the ideas of feature augmentation and adaptive feature fusion. First, a Transformer-based feature fusion (TFF) module is introduced based on the Transformer's query mechanism; this module facilitates feature augmentation across scales by enabling deep information interaction among different feature layers. Second, using learnable parameters, the network adaptively fuses features from each layer. Third, an almost parameter-free Shuffle Attention mechanism is integrated to capture intricate feature details while preserving network efficiency. Comparison experiments against a variety of mainstream detection algorithms and multiple sets of ablation experiments are carried out on the AFO dataset, where the mAP50 of the proposed method reaches 95.14%; compared with the baseline model, the mAP50 increases by 5.60 percentage points, the mAP95 increases by 7.38 percentage points, and the FPS reaches 110 frames/s. Multiple sets of ablation experiments on the SeaDronesSee dataset yield an mAP50 of 91.34%; compared with the baseline model, the mAP50 increases by 4.47 percentage points, the mAP95 increases by 5.92 percentage points, and the FPS reaches 106 frames/s. The results indicate that the proposed model can fully meet the demanding requirements of maritime search and rescue missions.
To address the problems of traditional 2D laser simultaneous localization and mapping (SLAM) graph optimization algorithms in indoor environments, namely indistinct point cloud feature extraction during localization and mapping, low front-end accuracy prone to error accumulation, and a back-end prone to erroneous loop closures, we propose an improved graph-optimization-based 2D laser SLAM algorithm for complex environments. First, covariance analysis is applied to obtain a point cloud plane change factor for adaptively extracting local neighborhood feature points. Second, inertial measurement unit (IMU) pre-integration is used in the front-end to provide initial values for scan matching. The eigenvalues of the pose covariance matrix obtained by scan matching are then analyzed to determine the robot position in degraded environments and to reduce the localization error of scan matching. Finally, a two-stage filtering method is used in the loop closure detection stage: the maximum loop-closure-compatible subset method is introduced to select correct loop closures, and an odometry check is performed to eliminate the effect of erroneous loop closures on SLAM. The algorithm is validated in real scenarios using a differential-drive automated guided vehicle (AGV). The results show that the front-end pose estimation is highly accurate compared with the Hector-SLAM and Cartographer algorithms, that loop constraints are found accurately in large scenarios requiring loop closure detection, and that the relative error is only about 0.21% with respect to the real scene. These results have theoretical and engineering significance for improving the accuracy of 2D laser SLAM map construction.
Accurate extraction of the laser stripe centerline is key to high-precision three-dimensional measurement of icing models. Aiming at the poor centerline extraction accuracy caused by severe light penetration of the laser stripe on the ice surface, this paper proposes a laser stripe centerline extraction method guided by the surface normal of the ice body. First, the contours of the laser stripe and interfering spots are obtained, the interfering spots are excluded based on contour shape, and an improved boundary threshold tracking algorithm is combined with the grayscale center-of-gravity method to coarsely locate the laser centerline. Then, principal component analysis (PCA) is used to obtain the normal direction of the coarsely located centerline. Finally, bilinear interpolation is performed along the normal direction and combined with the grayscale center-of-gravity method to obtain high-precision sub-pixel center point coordinates. The experimental results show that the measurement accuracy of the proposed extraction algorithm is 0.035 mm, with accuracy 5.487 times and extraction speed 6.578 times those of the Steger algorithm.
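A minimal sketch of the grayscale center-of-gravity method used for the coarse localization: for each image column crossed by the stripe, the sub-pixel row is the intensity-weighted centroid of pixels above a threshold (the threshold value is an assumption).

```python
import numpy as np

def gray_centroid_per_column(img, thresh=30):
    """img: (H,W) grayscale; returns sub-pixel stripe row per column."""
    rows = np.arange(img.shape[0])[:, None].astype(float)
    w = np.where(img > thresh, img.astype(float), 0.0)  # weights above threshold
    denom = w.sum(axis=0)
    center = (w * rows).sum(axis=0) / np.where(denom > 0, denom, np.nan)
    return center  # NaN where no stripe pixel exceeds the threshold
```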
A cloth simulation filtering algorithm based on an adaptive local filtering threshold was proposed to address large rejection errors in the ground and nonground point filtering results of airborne LiDAR point cloud data corresponding to suburban terrain environments using traditional cloth simulation filtering. First, the classic cloth simulation algorithm was used to extract the initial detected ground points and perform interpolation fitting to obtain a rough terrain surface. Then, combined with the adaptive filtering threshold calculation method based on local slope change rate, the filtering threshold of each point was automatically derived. This enabled determining the height difference between each point and the corresponding elevation of the fitting surface for efficient point cloud filtering. Experimental results show that the proposed algorithm can effectively improve the accuracy of ground point extraction compared with traditional cloth simulation filtering and accurately extract ground point clouds in large-scale complex environments such as suburban areas.
To enhance the accuracy of traditional local stereo matching algorithms in weak-texture regions and address the limitations of the AD-Census transform in adapting to local region features during cost fusion, an adaptive stereo matching algorithm based on local information entropy and an improved AD-Census transform is proposed. In the cost calculation stage, the local information entropy of the input image is first computed. Then, based on the entropy of the pixel neighborhood, an adaptive window size is selected to refine the Census transform. Next, an adaptive fusion weight is determined from the local information entropy to combine the improved Census and AD costs. In the cost aggregation stage, a unidirectional dynamic programming aggregation algorithm is introduced. After disparity computation and optimization, the final disparity map is produced. The algorithm is evaluated on the Middlebury platform using standard test images. Experimental results indicate that the proposed algorithm achieves an average mismatch rate of 5.94% in non-occluded areas and 8.37% across all areas, outperforming many existing algorithms in terms of matching quality and robustness to noise.
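A minimal sketch of the local information entropy used to adapt the Census window and fusion weight, computed from the gray-level histogram of a neighborhood; the window half-size and bin count are illustrative.

```python
import numpy as np

def local_entropy(img, y, x, half=7, bins=32):
    """Shannon entropy of the gray-level histogram around pixel (y, x)."""
    patch = img[max(0, y - half): y + half + 1,
                max(0, x - half): x + half + 1]
    hist, _ = np.histogram(patch, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())  # higher in textured regions
```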
To address the problems related to detecting electric tricycles in road traffic management in China and the shortcomings of current detection models in small-target detection and real-time performance, this study proposes a detection method based on an improved YOLOv5s model. The original YOLOv5s model is improved by adding a small-object detection head and introducing a Transformer structure with an efficient additive attention mechanism, and a dataset based on urban road scenes is built. Compared with the original model, the improved model increases precision, recall, and mean average precision (mAP@0.5) by 0.67%, 2.68%, and 5.78%, respectively. The model also achieves a frame rate of 92 frame/s and demonstrates good processing capability, meeting the real-time detection requirements of actual road traffic.
To address the loss of the overall contour and key features in simplified steel-billet point cloud data, a KD-tree-guided surface-curvature-driven steel-billet point cloud (KDSCP) simplification algorithm is proposed. First, a discrete topological relationship between points is constructed based on the KD-tree for k-nearest-neighbor queries. Second, the point cloud is partitioned based on curvature feature thresholds. Finally, the partitioned steel-billet point cloud data are simplified using adjustable-rate random sampling and centroid nearest-neighbor point simplification. KDSCP is compared with random sampling and an improved curvature-sampling method. The results show that, at a simplification rate of 55%, KDSCP not only better preserves the main contour of the steel-billet point cloud but also achieves 17.97% and 28.70% improvements in feature-point retention, increases of 0.4494 dB and 1.9879 dB in the peak signal-to-noise ratio, and reductions of 3.8791 mm and 2.5540 mm in the Hausdorff distance, respectively. The proposed KDSCP algorithm can significantly simplify steel-billet point cloud data while maintaining complete contour and key feature information, benefiting the real-time processing of steel-billet point clouds.
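The overall pipeline can be sketched as k-NN queries on a KD-tree, a PCA-based surface-variation estimate as the curvature proxy, and sampling only in flat regions; the thresholds and rates below are illustrative assumptions, and the paper's centroid nearest-neighbor step is not reproduced.

```python
import numpy as np
from scipy.spatial import cKDTree

def simplify(points, k=16, curv_thresh=0.02, flat_keep=0.45, seed=0):
    """points: (N,3). Keeps all high-curvature points, samples flat regions."""
    rng = np.random.default_rng(seed)
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)          # k nearest neighbors per point
    curv = np.empty(len(points))
    for i, nb in enumerate(idx):
        lam = np.linalg.eigvalsh(np.cov(points[nb].T))  # ascending eigenvalues
        curv[i] = lam[0] / lam.sum()          # surface variation in [0, 1/3]
    feature = curv >= curv_thresh             # curvature-partitioned regions
    flat = np.flatnonzero(~feature)
    keep_flat = rng.choice(flat, size=int(flat_keep * len(flat)), replace=False)
    return points[np.concatenate([np.flatnonzero(feature), keep_flat])]
```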
The traditional point pair features (PPF) algorithm lacks sufficient point cloud matching accuracy for precision industrial production and robustness to planar point clouds. To address these issues, this study proposes a novel point region features (PRF) registration method. PRF enhances matching by incorporating the feature complexity and average direction of target point pairs within their respective neighborhoods as complementary features. The algorithm uses the complexity of different point regions as a weighting criterion for feature matching and conducts a weighted voting process. Point clouds are then acquired in real working scenes for evaluation. Experimental results from common point cloud matching experiments in real-world scenarios show that the proposed PRF registration algorithm significantly improves point cloud registration accuracy and robustness with minimal impact on speed.
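For reference, the classical four-dimensional point pair feature (Drost et al.) that PRF extends is computed as follows; the neighborhood-complexity and average-direction terms added by PRF are not reproduced here.

```python
import numpy as np

def ppf(p1, n1, p2, n2):
    """Four-dimensional PPF for points p1, p2 with unit normals n1, n2."""
    d = p2 - p1
    dn = np.linalg.norm(d)
    ang = lambda a, b: np.arccos(np.clip(
        np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)), -1.0, 1.0))
    # (distance, angle(n1,d), angle(n2,d), angle(n1,n2))
    return np.array([dn, ang(n1, d), ang(n2, d), ang(n1, n2)])
```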
A high-real-time, high-quality infrared image equalization algorithm is developed to address the frequently over-bright or over-dark captured images caused by the large dynamic range of the scene and the high-speed motion of targets and cameras during target tracking with infrared cameras. The proposed algorithm adjusts the contrast between the image subject and background by optimizing the exposure parameters of the infrared camera. First, the image is partitioned by analyzing brightness variations in adjacent areas. Then, non-continuous area retrieval is performed to identify the image subject, which is weighted to calculate the image brightness. To support camera miniaturization, an interpolation method is combined with a lookup table to iteratively adjust the exposure parameters based on the image's average brightness. The algorithm ensures the clarity of the image subject and rapid exposure adjustment of the infrared camera while eliminating the need for additional storage units. Infrared camera testing demonstrates that the proposed algorithm achieves infrared image equalization at an average speed of 3.2 frames per second with an average exposure error of 5.15%, highlighting its practical application value.
To address the issues of sparse point clouds, instability, and high noise levels in four-dimensional (4D) millimeter-wave radar simultaneous localization and mapping (SLAM), which cause difficulties in point cloud matching and large cumulative errors, a 4D millimeter-wave radar SLAM algorithm based on local frame fusion is proposed on top of the 4DRadarSLAM framework. First, the ego-velocity is estimated to remove noise points, and local frame fusion is performed using the pose transformation between consecutive frames to mitigate point cloud sparsity; secondary scan matching is then applied to the single and fused frames to optimize the pose and improve odometry accuracy. Second, the intensity information of the radar point clouds is used to construct a scan context descriptor, and the similarity computed from the average relative error and the point cloud distribution error yields the loop closure constraint, which effectively reduces the cumulative error. Finally, the odometry and loop closure factors are combined into a factor graph to optimize the global pose. Testing on two public datasets from Nanyang Technological University and Shanghai Jiao Tong University shows that, compared with the 4DRadarSLAM algorithm, the proposed algorithm offers higher accuracy and environmental adaptability, providing a new solution for 4D millimeter-wave radar SLAM.
Breast tumor segmentation from ultrasound images faces problems such as low contrast between the tumor and normal tissue, blurred boundaries, complex tumor shapes and positions, and high image noise. This paper presents a hierarchical Transformer with a multiscale parallel aggregation network for breast tumor segmentation. The encoder uses MiT-B2 to establish long-range dependencies and effectively extract features at different resolutions. At the skip connection between the encoder and decoder, a cascaded module incorporating a multi-scale receptive field block and a shuffle attention (SA) mechanism is constructed. The receptive field block captures multi-scale local information of the tumor, addressing the high similarity between the lesion and the surrounding normal tissue, while the SA mechanism accurately identifies and localizes tumors and suppresses noise interference. In the decoder, an aggregation module progressively fuses features from parallel branches to enhance segmentation accuracy. Experimental results on the BUSI dataset show that, compared with TransFuse, the proposed model improves the Dice and intersection over union metrics by 3.21% and 3.19%, respectively. The model also performs excellently on two other datasets.
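For reference, the two reported metrics can be computed on binary masks as follows:

```python
import numpy as np

def dice_iou(pred, gt, eps=1e-7):
    """Dice and intersection-over-union for binary segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum() + eps)
    iou = inter / (union + eps)
    return dice, iou
```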
To enhance the accuracy and credibility of existing polyp segmentation methods, this study proposes a reliable segmentation technique that uses local channel attention. First, the improved pyramid vision transformer is employed to extract polyp region features, thereby addressing the insufficient feature extraction capabilities of traditional convolutional neural networks. In addition, a local channel attention mechanism is applied to fuse cascade features, and the edge detail information is gradually recovered to enhance the overall representational capability of the model while ensuring accurate polyp localization. Finally, a trusted polyp segmentation model is developed based on subjective logic evidence to derive the probability and uncertainty of the polyp segmentation problem, and a plausibility measure is applied to the segmentation results. Extensive experiments demonstrate that the proposed approach outperforms state-of-the-art polyp segmentation techniques in terms of accuracy, robustness, and generalization, leading to more reliable polyp segmentation results.
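A minimal sketch of the subjective-logic step described above, assuming per-class evidence from a non-negative network output: evidence is mapped to Dirichlet parameters, expected class probabilities, and an explicit uncertainty mass.

```python
import torch

def evidence_to_opinion(evidence):
    """evidence: (..., K) non-negative tensor, e.g. softplus(logits)."""
    alpha = evidence + 1.0                  # Dirichlet parameters
    S = alpha.sum(dim=-1, keepdim=True)     # Dirichlet strength
    prob = alpha / S                        # expected class probabilities
    u = evidence.shape[-1] / S              # uncertainty mass in (0, 1]
    return prob, u

# Low evidence everywhere -> high uncertainty; the plausibility measure
# can then flag unreliable pixels instead of silently committing.
prob, u = evidence_to_opinion(torch.tensor([0.2, 0.1]))
```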
Automated segmentation of retinal vessels is crucial for the auxiliary diagnosis and treatment of various ophthalmic diseases. To address vessel loss, vessel breakage, and the mis-segmentation of background as vessels in retinal vessel segmentation, we propose a method that combines multi-directional stripe convolution with pyramid dual pooling. First, a four-direction stripe convolution module is used to enhance vessel feature extraction, where the four directions are horizontal, vertical, anti-diagonal, and main diagonal. Second, a pyramid dual pooling feature fusion module extracts features via average pooling and max pooling at multiple scales; the obtained multi-scale features are then fused so that the model understands and utilizes local detail and global context more comprehensively. Finally, a channel-space dual attention module incorporated into the skip connections improves the model's focus on critical features. Experimental results on the CHASE-DB1 and DRIVE datasets demonstrate that the proposed method outperforms existing mainstream segmentation methods in terms of the area under the receiver operating characteristic curve and accuracy, indicating its potential to assist in the clinical diagnosis of relevant ophthalmic diseases.
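A minimal sketch of a four-direction depthwise stripe convolution: horizontal and vertical strips via (1,k) and (k,1) kernels, and the two diagonals by embedding a learnable 1D weight along a kernel diagonal. This construction is one plausible realization, not necessarily the paper's module.

```python
import torch
import torch.nn.functional as F
from torch import nn

class StripeConv4(nn.Module):
    def __init__(self, ch, k=9):
        super().__init__()
        self.k = k
        self.h = nn.Conv2d(ch, ch, (1, k), padding=(0, k // 2), groups=ch)
        self.v = nn.Conv2d(ch, ch, (k, 1), padding=(k // 2, 0), groups=ch)
        self.wd = nn.Parameter(torch.randn(ch, 1, k) * 0.1)  # diagonal weights

    def forward(self, x):
        main = torch.diag_embed(self.wd)       # (ch,1,k,k), main-diagonal kernel
        anti = torch.flip(main, dims=[-1])     # anti-diagonal kernel
        pad, ch = self.k // 2, x.shape[1]
        d1 = F.conv2d(x, main, padding=pad, groups=ch)
        d2 = F.conv2d(x, anti, padding=pad, groups=ch)
        return self.h(x) + self.v(x) + d1 + d2  # sum of the four directions

y = StripeConv4(ch=32)(torch.randn(1, 32, 64, 64))
```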
Fusion-based super-resolution reconstruction methods for red-green-blue (RGB) and hyperspectral images have shortcomings such as inadequate utilization of the pixel structural and spectral similarities of images, loss of intrinsic space during image scaling, and loss of spectral information and contextual relationships. To address these issues, we propose the SparseVAFormer graph convolutional model. We first utilize composite graph convolution to capture local detail features by leveraging the spatial relationships of pixels and then model the correlation between different spectra, thereby fully exploring the high-dimensional characteristics and non-Euclidean structural information of images. We then construct the VAFormer module to map the data to a low-dimensional latent space and capture the core features of images. Through self-attention mechanisms, the model considers all pixels in the image when computing the representation of each pixel, capturing complex long-distance spatial and spectral dependencies between pixels; this enables the model to simulate the spectral reflection characteristics of real hyperspectral images. Finally, we design a multi-scale mixed convolution module to strengthen the flow of differential information between levels and channels, helping the model capture features ranging from subtle textures to large-scale structures. Experimental results demonstrate that the proposed model achieves the best peak signal-to-noise ratios of 51.299 dB and 49.762 dB on the CAVE and Harvard datasets, respectively. Thus, the SparseVAFormer graph convolutional model can effectively fuse multi-spectral and hyperspectral images, outperforming advanced hyperspectral image super-resolution models such as FF-former and LGAR.
In this study, a new method is proposed by improving an existing deep-learning network, combining aerial high-resolution hyperspectral data and LiDAR data for the fine classification of tree species. First, feature extraction and fusion are performed for the different data sources. Subsequently, a classification network named CA-U-Net is constructed based on the U-Net network by adding a channel-attention module to adjust the weights of different features adaptively. Finally, we address the low identification precision for small-sample species by modifying CA-U-Net for class-imbalanced cases. The research results show that 1) the CA-U-Net network performs well, with an overall classification accuracy of 96.80%; compared with the FCN, SegNet, and U-Net networks, it improves classification accuracy by 8.56, 11.99, and 3.31 percentage points, respectively, and converges faster; 2) replacing the original loss function in CA-U-Net with a cross-entropy loss function balanced by class sample size improves the classification accuracy for tree species with fewer samples. The proposed methodology can serve as an important reference in small-scale forestry, such as orchard management, urban-forest surveys, and forest-diversity surveys.
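A minimal sketch of a class-sample-size-balanced cross-entropy loss; inverse-square-root weighting is one common choice, and the paper's exact balancing rule is not reproduced.

```python
import torch
from torch import nn

def balanced_ce(class_counts):
    """Cross-entropy with weights inversely related to per-class pixel counts."""
    counts = torch.as_tensor(class_counts, dtype=torch.float)
    w = counts.sum() / counts.sqrt()        # rarer classes get larger weights
    w = w / w.sum() * len(counts)           # normalize weights around 1
    return nn.CrossEntropyLoss(weight=w)

criterion = balanced_ce([120000, 45000, 900, 3000])  # pixel counts per species
```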
Remote-sensing images exhibit complex spatial variability and cover a wide range of scenes, and current remote-sensing scene-classification algorithms struggle to extract and utilize this information effectively. Hence, a network architecture that fuses multiscale hierarchical features (MHFNet) is proposed to solve these problems. First, FasterNet is introduced as a multilevel feature extractor to extract multiple levels of features from remote-sensing scenes. Subsequently, a multiscale interaction transformer (MSIT) is proposed to capture the abundant information at various scales hidden in each level and to model dependencies between distant pixels. Finally, an adaptive token mixer (ATM) is designed to enhance the model's understanding and analysis of remote-sensing scenes by examining the correlation between hierarchical features and fusing them. The accuracy rates of MHFNet on two public remote-sensing datasets, AID and NWPU-RESISC45, are 98.63% and 95.73%, respectively. The classification results show that MHFNet outperforms other classification methods.
Medical image segmentation can accurately and quickly extract structures of interest in images and has major application value in medical imaging diagnosis, disease analysis, surgical planning, and other fields. Traditional medical image segmentation methods typically rely on edge detection, template matching, statistical shape models, active contours, and classical machine learning techniques. However, due to blur, noise, and low contrast in images, the accuracy and robustness of traditional methods are limited. Deep learning methods gradually extract features by learning different levels of abstraction of the data; compared with traditional methods, they offer high accuracy, strong adaptability, and strong scalability. To support research on medical image segmentation for auxiliary diagnosis, this article reviews the application of convolutional neural networks, Transformers, and hybrid U-Net-Transformer structures in medical image segmentation and conducts a comprehensive comparative analysis of these models. The feasibility of these models for medical image segmentation is confirmed through visualization results and image evaluation metrics. Finally, we summarize the problems in current research and present future research directions.
Since its introduction in 1998, optical coherence elastography has advanced significantly in the detection and imaging of the biomechanical properties of soft tissues. The technology stands out for its high spatial resolution, sensitivity in measuring elastic moduli, and rapid imaging speed, making it one of the most promising optical elastography technologies for clinical application. At present, research groups worldwide are focusing on three core elements of optical coherence elastography: developing safer and more effective excitation methods to generate the vibration signals needed for elasticity evaluation, establishing new mechanical models to accurately quantify the biomechanical properties of tissues under complex boundary conditions, and developing new algorithms for the quantitative analysis of biomechanical properties. These efforts aim to accelerate the clinical application and translation of this technology. This article reviews the fundamental theories and latest advancements in optical coherence elastography, explores noncontact excitation approaches, describes mechanical wave models for various biological tissues, and outlines future directions to facilitate its clinical application.