To address the challenges of long reconstruction time and numerous model voids in large-scale scenes and weakly textured regions during 3D reconstruction of unmanned aerial vehicle (UAV) images using existing multi-view stereo reconstruction (MVS) algorithms, an improved 3D reconstruction algorithm based on PatchMatch MVS (PM-MVS), called MCP-MVS, is proposed. The algorithm employs a multi-constraint matching cost computation method to eliminate outlier points from the 3D point cloud, thereby enhancing robustness. A pyramid red-black checkerboard sampling propagation strategy is introduced to extract geometric features across different scale spaces, while graphics processing unit (GPU)-based parallel propagation is exploited to improve reconstruction efficiency. Experiments conducted on three UAV datasets demonstrate that MCP-MVS improves reconstruction efficiency by at least 16.6% compared to state-of-the-art algorithms, including PMVS, Colmap, OpenMVS, and 3DGS. Moreover, on the Cadastre dataset, the overall error is reduced by 35.7%, 20.3%, 19.5%, and 11.6% compared to PMVS, Colmap, OpenMVS, and 3DGS, respectively. The proposed algorithm also achieves the highest F-scores on the Cadastre and GDS datasets, 75.76% and 79.02%, respectively. These results demonstrate that the proposed algorithm significantly reduces model voids, validating its effectiveness and practicality.
To address the issue of low window-detection completeness and accuracy caused by the irregular distribution of windows on building facades, this study proposes a novel window-detection method that leverages hole constraints and hierarchical localization. This approach utilizes the least-squares method to fit lines to the projected point cloud data of the building facade, with distance constraints applied to obtain the primary wall point cloud data. The initial window position is determined using the hole-based detection method. Incorporating the concept of region expansion, the method employs an improved Alpha-Shape algorithm to extract boundary points around the initially identified window positions. Feature points among the boundary points are identified, and the boundary points are regularized based on these feature points, thus enabling the precise construction of window wireframe models. Experimental results show that this method significantly improves the accuracy of window detection, as evidenced by its average accuracy and completeness of 100% and 93.34%, respectively.
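As a rough illustration of the first step above (least-squares line fitting with a distance constraint), the sketch below fits a line to the facade point cloud projected onto the ground plane and keeps only points near the fitted line as the primary wall points; the array layout, threshold value, and function name are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def extract_wall_points(points_xyz, dist_thresh=0.05):
    """Fit a 2D line to facade points projected onto the XY plane (least squares)
    and keep points within dist_thresh of the line as the primary wall point cloud."""
    xy = points_xyz[:, :2]                                   # project to ground plane
    A = np.column_stack([xy[:, 0], np.ones(len(xy))])        # model y = a*x + b
    (a, b), *_ = np.linalg.lstsq(A, xy[:, 1], rcond=None)
    dist = np.abs(a * xy[:, 0] - xy[:, 1] + b) / np.hypot(a, 1.0)  # point-to-line distance
    return points_xyz[dist < dist_thresh]

# Toy example: a noisy planar wall segment with random heights
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 1000)
pts = np.column_stack([x, 0.5 * x + 1 + rng.normal(0, 0.01, 1000), rng.uniform(0, 3, 1000)])
print(extract_wall_points(pts).shape)
```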
Hyperspectral remote sensing has been widely applied in geological research due to its rich multi-band spectral information. Most studies mainly focus on the identification of soil components and clay minerals, with relatively few studies on carbonate rocks; therefore, this paper proposes a decision tree model to achieve precise classification of carbonate rocks based on hyperspectral data. A continuum-removal method is used to preprocess the data, and spectral knowledge and machine learning are then combined to extract features. Specifically, the study determines spectral intervals closely related to carbonate rocks through spectral knowledge and extracts key waveform features from the spectral curves. Subsequently, the study uses the random forest algorithm to select features with discriminative capabilities, determines the optimal classification discriminant through threshold analysis, and builds a decision tree model. Finally, the model performance is evaluated using a confusion matrix, and the classification accuracy is compared with that of five other models. Results show that the decision tree model constructed based on the order of the lowest point wavelength of the absorption valley, the right shoulder wavelength of the absorption band, and the absorption bandwidth exhibits the highest classification accuracy, with an accuracy rate of 95.57%.
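The continuum-removal preprocessing mentioned above amounts to dividing each reflectance spectrum by its upper convex hull, which emphasizes absorption features; the following minimal NumPy version (variable names, the hull sweep, and the example values are illustrative, not the authors' code) shows the idea.

```python
import numpy as np

def continuum_removed(wavelengths, reflectance):
    """Divide a reflectance spectrum by its upper convex hull (the continuum).
    Wavelengths are assumed sorted in ascending order."""
    hull = []                                   # upper hull via a monotone-chain sweep
    for p in zip(wavelengths, reflectance):
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            cross = (ax - ox) * (p[1] - oy) - (ay - oy) * (p[0] - ox)
            if cross >= 0:                      # last hull point falls below the new chord -> drop it
                hull.pop()
            else:
                break
        hull.append(p)
    hw, hr = np.array(hull).T
    continuum = np.interp(wavelengths, hw, hr)  # continuum line over all bands
    return reflectance / continuum

# Example: an absorption valley near 2340 nm stands out after removal
w = np.array([2200., 2250., 2300., 2340., 2380., 2450.])
r = np.array([0.82, 0.80, 0.70, 0.55, 0.72, 0.85])
print(continuum_removed(w, r))
```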
To address issues such as poor output consistency, information loss, and blurred boundaries caused by incomplete truth labeling in current weakly-supervised point cloud semantic segmentation methods, a weakly-supervised point cloud semantic segmentation method with an input consistency constraint and feature enhancement is proposed. An additional constraint is imposed on the input point cloud to learn the input consistency of the augmented point cloud data, so as to better capture the essential features of the data and improve the generalization ability of the model. An adaptive enhancement mechanism is introduced into the point feature extractor to enhance the model's perceptual ability, and sub-scene boundary contrastive optimization is utilized to further improve the segmentation accuracy of the boundary region. By utilizing query operations in the point feature query network, sparse training signals are fully exploited, and a channel attention mechanism module is constructed to enhance the representation of important features by strengthening channel dependencies, resulting in more effective prediction of point cloud semantic labels. Experimental results show that the proposed method achieves good segmentation performance on three public point cloud datasets, S3DIS, Semantic3D, and Toronto3D, with mean intersection over union values of 66.4%, 77.9%, and 80.5%, respectively, using 1.0% truth labels for training.
In the context of real-world observations of water purification flocculation processes, current image segmentation-based methods for detecting floc features face several challenges, including poor recognition accuracy for deep-lying flocs, high annotation costs, and difficulties in adaptively processing depth-of-field information. To address these problems, a new floc feature detection method based on an improved density map and a locally enhanced convolutional neural network (LECNN) is proposed. First, a density map construction method based on multipoint marking and average kernel smoothing is designed to address the inability of the density map to simultaneously reflect multiple floc feature parameters. Second, a scene depth adaptive structure that assigns different weights to flocs at various depths is proposed to mitigate the inaccuracies in floc parameter detection caused by parallax. Then, the proposed LECNN captures multiscale receptive fields while emphasizing local features. In comparative tests on a floc image dataset with multipoint markings, LECNN demonstrates accurate and robust density map fitting performance against recently proposed pixel-level prediction network structures, achieving a performance improvement over other floc feature detection benchmark methods.
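The density map construction described above (delta markings smoothed with an averaging kernel) can be sketched as follows; the kernel size and border handling are illustrative assumptions, not the authors' exact settings.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def build_density_map(markings, shape, kernel_size=15):
    """markings: (N, 2) array of (row, col) floc mark positions.
    Returns a density map whose integral approximates the number of markings."""
    density = np.zeros(shape, dtype=np.float32)
    for r, c in np.round(markings).astype(int):
        if 0 <= r < shape[0] and 0 <= c < shape[1]:
            density[r, c] += 1.0
    # Average (box) kernel smoothing: each marking spreads uniformly over the window.
    return uniform_filter(density, size=kernel_size, mode="constant")

dm = build_density_map(np.array([[40.2, 55.7], [41.0, 58.1], [90.5, 20.3]]), (128, 128))
print(dm.sum())   # close to the number of markings (3), up to border effects
```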
To address the problems of detail loss and color distortion in current image defogging algorithms, this paper proposes a multi-dimensional attention feature fusion image dehazing algorithm. The core of the proposed algorithm is the introduction of a union attention mechanism module, which operates simultaneously in the channel, spatial, and pixel dimensions to achieve accurate enhancement of local features, while a parallel multi-scale perceptual feature fusion module effectively captures global feature information at different scales. To achieve a more refined and accurate dehazing effect, a bi-directional gated feature fusion mechanism is added to the proposed algorithm to realize the deep fusion and complementarity of local and global features. Experimental validation on multiple datasets, such as RESIDE, I-Hazy, and O-Hazy, shows that the proposed algorithm outperforms existing state-of-the-art methods in terms of peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). Compared with the classical GCA-Net, the PSNR and SSIM of the proposed algorithm increase by 2.77 dB and 0.0046, respectively. Results of this study can provide new insights and directions for investigating image dehazing algorithms.
Because transmission-line cotter pins are small targets set against complex backgrounds, intelligent power inspection with unmanned aerial vehicles (UAVs) is prone to low detection accuracy and high missed- and false-detection rates. To address these issues, the present study proposes a target detection algorithm based on YOLOv8-DEA to better adapt to UAVs and other application scenarios. First, the backbone network's C2f module is modified, enabling the model to focus on regions of interest and enhancing its perception of local image structures. Subsequently, an efficient mamba-like linear attention (EMLLA) mechanism is used to capture distant dependencies, and the efficient multilayer perceptron (EMLP) module is applied to map the model features to a higher dimensionality, enhancing the model's expressiveness. Finally, a dynamic selection mechanism is used to improve the Neck layer. The adaptive fusion of deep and shallow features enables the effective integration of features from different levels, allowing the model to accurately capture global semantic information as well as extract rich detailed information when processing complex and diverse data. The experimental results demonstrate that the improved algorithm achieves increases of 2.33 percentage points in mean average precision (mAP@0.5) and 3.67 percentage points in recall (R@0.5) on the custom cotter pin dataset. Additionally, the algorithm achieves a precision of 95.58% and a speed of 67.84 frame/s. When compared to mainstream algorithms, the proposed method not only exhibits improved detection accuracy but also ensures real-time performance, making it more suitable for the needs of transmission-line cotter pin detection in engineering applications.
In this study, a three-dimensional (3D) facial UV texture restoration algorithm based on gated convolution is proposed to address the texture loss caused by self-occlusion when reconstructing 3D facial structures from a single unconstrained facial image captured at a large viewing angle. First, a gated convolution mechanism is designed to learn a dynamic feature selection approach for each channel and spatial position, thereby enhancing the network's ability to capture complex nonlinear features. These gated convolutions are then stacked to form an encoder-decoder network that repairs 3D facial UV texture images with irregular defects. In addition, a spectral normalization loss function is introduced to stabilize the generative adversarial network, and a segmented training approach is implemented to overcome the challenges of cost and accessibility in collecting 3D facial texture datasets. The experimental results show that the proposed algorithm outperforms mainstream algorithms in terms of the peak signal-to-noise ratio and structural similarity. The proposed algorithm effectively restores UV texture maps under large-angle occlusion, yielding more comprehensive facial texture maps with natural, coherent pixel restoration and realistic texture details.
To address issues such as low brightness, low contrast, and lack of details in images taken under challenging conditions like nighttime, backlighting, and severe weather, a nonlinear adaptive dark detail enhancement algorithm is proposed for improving low-light images. To ensure color authenticity, the original image is first converted to HSV space, and the brightness component V is extracted. To deal with the issues of poor brightness and low contrast in low-light images, an improved gamma correction algorithm is then adopted to adaptively adjust image brightness. Subsequently, a brightness adaptive contrast enhancement algorithm is introduced, combined with a low-pass filtering approach to adaptively enhance high-frequency details. This helps highlight textures and edge information in dark areas of the image. Finally, a brightness-guided adaptive image fusion algorithm is proposed to preserve edge details in highlighted areas while avoiding overexposure. Experimental results demonstrate that the proposed algorithm effectively adapts to the image characteristics of low-light environments. It not only significantly enhances the brightness and contrast of low-light images but also highlights details in darker areas while preserving color authenticity.
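A minimal sketch of the brightness-adjustment step above (convert to HSV and apply an adaptive gamma to the V channel) is given below; the specific rule for deriving gamma from the mean brightness is an illustrative assumption rather than the paper's exact formulation.

```python
import cv2
import numpy as np

def adaptive_gamma_v(bgr):
    """Brighten a low-light image by gamma-correcting the V channel of HSV.
    Gamma is chosen so that the mean brightness is pulled toward 0.5."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    v_norm = v.astype(np.float32) / 255.0
    mean_v = float(np.clip(v_norm.mean(), 1e-3, 0.999))
    gamma = np.log(0.5) / np.log(mean_v)          # darker image -> gamma < 1 -> stronger lift
    v_corr = np.clip((v_norm ** gamma) * 255.0, 0, 255).astype(np.uint8)
    return cv2.cvtColor(cv2.merge([h, s, v_corr]), cv2.COLOR_HSV2BGR)
```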
To address the problem of reduced clarity caused by detail information degradation during haze processing in complex scenes, this study presents a multi-scale feature fusion dehazing network based on U-net. In the encoder component, we employ a dynamic large kernel convolution with a dynamic weighting mechanism to enhance global information extraction. This mechanism allows for adaptive adjustment of feature weights, thereby improving the model's adaptability to complex scenes. In addition, we introduce a parallel feature attention module PA1 to capture critical details and color information in images, effectively mitigating the loss of important features during the dehazing process. To tackle the challenges posed by complex illumination changes and uneven haze conditions, we incorporate coordinate attention in the decoder's parallel feature attention module PA2. This approach integrates spatial and channel information, allowing for a more comprehensive capture of key details in feature maps. Experimental results show that the proposed network model achieves excellent dehazing effects across various datasets. The proposed network model outperforms classical dehazing networks such as FFA-Net and AOD-Net, effectively addressing detail loss while providing superior image dehazing performance.
Although Transformers excel in global feature extraction, they often have limitations in capturing local image details, leading to the loss of some local lighting information and uneven overall lighting distribution. To address this issue, this study proposes a low-light image enhancement algorithm based on light perception enhancement and dense residual denoising. The proposed algorithm leverages the advantages of Transformer and convolutional neural networks and effectively enhances the visual quality and lighting uniformity of low-light images through a detailed light perception mechanism. The network architecture features a codec design that integrates multilevel feature extraction and attention fusion modules internally. The feature extraction modules capture both global and local image features, and the attention fusion modules filter and combine these features to optimize information transmission. Additionally, to address the issue of noise amplification in low-light image enhancement, the enhanced image denoising module effectively reduces the noise of the enhanced image using dense residual connection technology. The performance of the proposed algorithm in processing low-light images is evaluated via comparative experiments. The experimental results show that the proposed algorithm can not only improve the problem of uneven lighting but also significantly reduce image noise and achieve higher quality image output.
Aiming to solve the problems of feature redundancy and blurred edge texture in images reconstructed by some existing image super-resolution reconstruction algorithms, an image super-resolution reconstruction network with spatial/high-frequency dual-domain feature saliency is proposed. First, the network constructs a feature distillation refinement module that reduces feature redundancy by introducing blueprint separable convolution, and it then designs parallel dilated convolutions to refine the extraction of multiscale contextual features so as to reduce feature loss and compensate for the loss of texture in local regions. Second, a spatial dual-domain fusion attention mechanism is designed to enhance high-frequency feature expression, fully capturing long-range dependency between different locations and channels of the feature map while facilitating the reconstruction of edge texture details. The experimental results demonstrate that, in terms of reconstructed image quality, the proposed model outperforms other comparison algorithms in both objective metrics and subjective perception on multiple datasets. At a scaling factor of 2, compared with VapSR, SMSR, and EDSR, the proposed method enhances the peak signal-to-noise ratio (PSNR) by an average of 0.14 dB, 0.36 dB, and 0.35 dB, respectively.
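Blueprint separable convolution, which the feature distillation refinement module introduces to cut redundant parameters, can be sketched as a 1×1 pointwise convolution followed by a depthwise convolution (the unconstrained BSConv variant); the PyTorch module below is a generic sketch of that published design, not the authors' exact block.

```python
import torch.nn as nn

class BSConvU(nn.Module):
    """Blueprint separable convolution (unconstrained variant):
    pointwise 1x1 conv followed by a depthwise k x k conv."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.pw = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)  # channel mixing
        self.dw = nn.Conv2d(out_ch, out_ch, kernel_size=kernel_size,
                            padding=padding, groups=out_ch, bias=True)  # per-channel spatial filter

    def forward(self, x):
        return self.dw(self.pw(x))
```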
The existing algorithms for detecting dress code violations at airports exhibit high computational complexity and weak real-time performance. Furthermore, they are prone to errors and omissions during detection in complex airport security scenarios, making it difficult to meet the requirements of real-time security detection. In response to this situation, a method called SGS-YOLO is proposed based on the YOLOv8n technology route for detecting violations of dress code by airport security personnel. First, a parameter-free SimAM attention mechanism is introduced into the backbone network of the model to enhance the perception of important features and improve the accuracy of object detection. Second, GSConv and VoV-GSCSP modules are introduced into the neck network to reduce the number of parameters, which helps achieve lightweighting of the model. Finally, a detection box regression loss function based on SIoU is adopted to reduce misjudgments in cases involving small changes between the predicted and real target boxes. The experimental results show that compared with the baseline model, SGS-YOLO improves the average accuracy by 6.3 percentage points. Further, it reduces the number of parameters and floating-point operations by 9.63% and 8.64%, respectively. The proposed approach effectively achieves a balance between model lightweighting and performance, and thus it possesses good engineering application value.
Nighttime images suffer from low visibility due to insufficient illumination and glow effects caused by artificial light sources, which severely impair image information. Most existing low-light image enhancement algorithms are designed for underexposed images. Applying these methods directly to low-light images with glows often intensifies the glow regions and further degrades image visibility. Moreover, these algorithms typically require paired or unpaired datasets for network training. To address these issues, we propose a zero-shot enhancement method for low-light images with glows, leveraging a layer decomposition strategy. The proposed network comprises two main components: layer decomposition and illumination enhancement. The layer decomposition network integrates three sub-networks, including channel attention mechanism modules. Under the guidance of a glow separation loss with an edge refinement term and a self-constraint information retention loss, the input image is decomposed into three components: glow, reflection, and illuminance images. The illuminance map is subsequently processed via the illumination enhancement network to estimate enhancement parameters. The enhanced image is reconstructed by combining the reflection map and the enhanced illuminance map, following the Retinex theory. Experimental results demonstrate that the proposed method outperforms state-of-the-art unsupervised low-light image enhancement algorithms, achieving superior visualization effects, the best NIQE and PIQE indices, and a near-optimal MUSIQ index. The method not only effectively suppresses glows but also improves the visibility of dark regions, producing more natural enhanced images.
Due to significant brightness differences in airborne flare-containing marine optical images, the image enhancement process may result in low contrast and fuzzy details. To solve these problems, an image enhancement method based on local compensation and non-subsampled contourlet transform (NSCT) is proposed. First, the image is segmented into a high-brightness area and a low-illumination area by mean filtering and the maximum interclass variance method. Then, the contrast limited adaptive histogram equalization (CLAHE) algorithm is used to balance the image brightness of the high-brightness area, and the NSCT algorithm is used to decompose the low-illumination area into a low-frequency component and several high-frequency components. Subsequently, the uneven illumination of the low-frequency component is corrected by the multi-scale Retinex algorithm, while the high-frequency components are adjusted by the Laplace operator to improve details. The processed low-frequency and high-frequency components are subjected to NSCT reconstruction, and the CLAHE algorithm is used to further improve the contrast of the image. Finally, by using an improved local compensation model, the enhanced low-illumination area is compensated, and the enhanced image is thereby obtained. Experimental results show that, compared with other methods, the image information entropy, average gradient, and contrast of the proposed algorithm are improved by 1.35%, 40.62%, and 77.15% on average. In addition, the enhanced image also performs better in terms of brightness, contrast, and texture details.
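The first two steps (region segmentation by mean filtering plus the maximum inter-class variance method, and CLAHE applied to the high-brightness area) can be sketched with OpenCV as follows; the kernel size and clip-limit values are illustrative assumptions.

```python
import cv2
import numpy as np

def segment_and_equalize(gray):
    """Split a gray image into high-brightness and low-illumination regions with
    mean filtering + Otsu thresholding, then apply CLAHE to the bright region.
    The dark region would go on to the NSCT branch described above."""
    smoothed = cv2.blur(gray, (5, 5))                                  # mean filtering
    _, bright_mask = cv2.threshold(smoothed, 0, 255,
                                   cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    equalized = clahe.apply(gray)
    balanced = np.where(bright_mask == 255, equalized, gray).astype(np.uint8)
    return balanced, bright_mask
```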
In the infrared- and visible-image alignment of power equipment, severe nonlinear radial aberrations as well as significant viewing-angle and scale differences occur between the infrared and visible images, which result in image-alignment failure. Hence, a local normalization-based algorithm for the infrared- and visible-image alignment of power equipment is proposed. First, local normalization was performed to eliminate the nonlinear distortion of the images and improve the accuracy of the curvature scale space (CSS) algorithm in extracting the feature points. Subsequently, the main direction of the feature points was calculated based on the local curvature information, and the multiscale oriented gradient histogram (MSHOG) was used as the feature descriptor. Finally, the features were matched using the proposed accurate matching method to obtain the parameters of the inter-image projective transformations. The proposed algorithm has average root mean square errors of 2.18 and 2.24 and average running times of 13.09 s and 12.07 s on the infrared- and visible-image datasets of electric power equipment, respectively. Experimental results verify the effectiveness of the method in addressing images to be aligned with obvious differences in viewpoints and proportions, as well as in realizing the high-precision alignment of infrared and visible images of electric power equipment.
Traditional attention mechanisms either limit the representational ability and detection performance of a model or incur high complexity and computational cost. To solve these problems, an innovative lightweight multi-head mixed self-attention (MMSA) mechanism is proposed, aimed at enhancing the performance of object detection networks while maintaining the model's simplicity and efficiency. The MMSA module ingeniously integrates channel information with spatial information, as well as local and global information, by introducing a multi-head attention mechanism, further augmenting the network's representational capabilities. Compared to other attention mechanisms, MMSA achieves a superior balance between model representation, performance, and complexity. To validate the effectiveness of MMSA, it is integrated into the Backbone or Neck portions of the YOLOv8n network to enhance its multi-scale feature extraction and feature fusion capabilities. Extensive comparative experiments on the CityPersons, CrowdHuman, TT100K, BDD100K, and TinyPerson public datasets show that, compared with the original algorithm, YOLOv8n with MMSA improves the mean average precision (mAP@0.5) by 0.9 percentage points, 0.9 percentage points, 2.3 percentage points, 1.0 percentage points, and 1.7 percentage points, respectively, without significantly increasing the model size. Additionally, the detection speed reaches 145 frame/s, fully meeting the requirements of real-time applications. Experimental results fully demonstrate the effectiveness of the MMSA mechanism in improving object detection outcomes, showcasing its practical value and broad applicability in real-world scenarios.
To address the problems of detail loss, low brightness, and color distortion when the dark channel prior defogging algorithm processes sea fog images, this paper proposes a sea fog image defogging algorithm based on sky region segmentation. First, accurate segmentation of the sky region is achieved through threshold segmentation and region growing. On this basis, an approach with stronger anti-interference capability is used to optimize the atmospheric light intensity: the median value of the top 0.1% of pixels in the region with the highest luminance is chosen as the atmospheric light intensity. Second, the transmittance is refined using fast guided filtering, and an adaptive correction factor is introduced to adjust and optimize the transmittance map. Finally, the obtained transmittance and atmospheric light intensity are used with an atmospheric scattering model to restore the defogged image. Experimental results demonstrate that the algorithm significantly enhances evaluation metrics such as structural similarity and peak signal-to-noise ratio, effectively improving the quality of the defogged image.
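The atmospheric light estimate described above (the median of the brightest 0.1% of pixels in the segmented sky region) can be sketched as below; taking the per-channel median of those pixels is an assumption about the exact implementation.

```python
import numpy as np

def estimate_atmospheric_light(image, sky_mask, top_ratio=0.001):
    """image: (H, W, 3) float array; sky_mask: (H, W) boolean sky segmentation.
    Returns the per-channel atmospheric light as the median of the brightest
    0.1% of sky pixels (ranked by mean luminance)."""
    sky_pixels = image[sky_mask]                         # (N, 3)
    luminance = sky_pixels.mean(axis=1)
    k = max(1, int(len(sky_pixels) * top_ratio))
    brightest = sky_pixels[np.argsort(luminance)[-k:]]   # top 0.1% brightest sky pixels
    return np.median(brightest, axis=0)
```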
In traditional regularization-model-based image restoration algorithms, the prior information encapsulated in the regularization term may not be sufficiently rich, and determining the regularization coefficients can be cumbersome or require adaptive adjustment. To address these challenges and combine the advantages of traditional and deep learning methods, this paper integrates L2-norm regularization with deep learning and proposes a deep learning network with a strict mathematical model foundation and interpretability: an interpretable deep learning image restoration algorithm with an L2-norm prior. Nonlinear transformations are employed to replace the regularization term in the traditional model, and deep learning networks are utilized to solve the regularization model. This not only optimizes the model-solving process but also enhances the interpretability of the deep learning network. Experimental results demonstrate that the proposed algorithm can effectively remove image blurring while suppressing image noise, thereby improving image quality.
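For reference, the classical L2-norm-regularized restoration model that the network builds on, min_x ||k*x − y||² + λ||x||², has a closed-form frequency-domain solution; the sketch below shows that baseline (under a circular-convolution assumption), while the proposed method replaces the regularization term with a learned nonlinear transform and solves the model with a network.

```python
import numpy as np

def l2_regularized_deconvolution(blurred, psf, lam=1e-2):
    """Closed-form Tikhonov solution X = conj(K) * Y / (|K|^2 + lam) in the
    Fourier domain, assuming circular convolution with the point spread function."""
    K = np.fft.fft2(psf, s=blurred.shape)
    Y = np.fft.fft2(blurred)
    X = np.conj(K) * Y / (np.abs(K) ** 2 + lam)
    return np.real(np.fft.ifft2(X))
```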
An improved multilayer progressive guided face-image inpainting network is proposed to solve problems such as artifacts and incongruent facial contours after face-image inpainting. The network adopts an encoding-decoding structure comprising structure-complement, texture-generation, and main branches, and gradually guides the generation of structure and texture features among different branches. A feature-extraction module is introduced to enhance the connection between the branches when features are transferred among them. Additionally, a feature-enhancing attention mechanism is designed to strengthen the semantic relationship between channel and spatial dimensions. Finally, the output features of different branches are passed on to the context aggregation module such that the inpainted images become more similar to the actual images. Experimental results show that, compared with PDGG-Net (Progressive Decoder and Gradient Guidance Network), the proposed network presents average improvements of 0.003 and 0.13 dB in terms of SSIM and PSNR, respectively, on the CelebA-HQ dataset. To prevent overfitting, multi-dataset joint training and fine-tuning are performed on the sparse profile dataset, which improves the SSIM and PSNR by 0.003 and 0.29 dB on average, respectively, compared with the results of direct training using the profile dataset.
Existing infrared and visible image fusion methods cannot effectively balance the unique and similar structures of infrared and visible images, resulting in suboptimal visual quality. To address these problems, this study proposes a cross-fusion Transformer-based fusion method. The cross-fusion Transformer block is the core of the proposed network, which applies a cross-fusion query vector to extract and fuse the complementary salient features of infrared and visible images. This cross-fusion query vector balances the global visual characteristics of infrared and visible images and effectively improves the fusion visual effect. In addition, a multi-scale feature fusion block is proposed to address the problem of information loss caused by the down-sampling operation. Experimental results on the TNO, INO, RoadScene, and MSRS public datasets show that the performance of the proposed method surpasses that of existing representative deep-learning-based methods. Specifically, compared with the suboptimal results on the TNO dataset, the proposed method obtains ~27.2%, ~29.2%, and ~9.9% improvements in terms of standardized mutual information, mutual information, and visual fidelity metrics, respectively.
To address the issues of high computational complexity and insufficient real-time performance encountered in three-dimensional object detection for complex aircraft maintenance scenes, a three-dimensional object detection method that integrates visual camera and LiDAR data and is based on prior information is proposed. First, the parameters of the camera and LiDAR are calibrated, the point cloud obtained by the LiDAR is preprocessed to obtain an effective three-dimensional point cloud, and the YOLOv7 algorithm is used to identify aircraft fuselage targets in the camera images. Next, the depth of the target object is calculated from its prior length using the Efficient Perspective-n-Point (EPnP) method. Finally, depth information and point cloud clustering methods are utilized to complete three-dimensional object detection and identify obstacles. Experimental results show that the proposed method can accurately detect targets from environmental point clouds, with a recognition accuracy of 94.70%. Furthermore, the processing time for one frame is 42.96 ms, which indicates good performance in terms of both recognition accuracy and real-time capability, thus satisfying the collision risk detection requirements during aircraft movement.
To address the challenges of low recognition accuracy and high computational complexity in underwater optical target recognition algorithms, a lightweight YOLOv8 underwater optical recognition algorithm based on automatic color equalization (ACE) image enhancement is proposed. Initially, we apply the ACE image enhancement algorithm to preprocess images. Subsequently, we improve the feature extraction capability by replacing the YOLOv8 backbone with an upgraded SENetV2 backbone network. To further reduce the computational cost, we introduce a lightweight cross-scale feature fusion module in place of the neck network. Then, we utilize DySample as a substitute for the traditional upsampler to improve image processing efficiency. We refine the DyHead detection head to better perceive targets. Finally, we enhance the accuracy of bounding box regression by replacing the loss function of YOLOv8 with InnerMPDIoU, which is based on the minimum point distance intersection over union (MPDIoU). Experimental results show that the proposed SCDDI-YOLOv8 algorithm achieves a mean average precision of 77.3% and 71.5% on the URPC2020 and UWG datasets, respectively, while reducing parameters by ~20.7%, floating-point operations by 6×10⁸, and model size by 1.2 MB compared with the original YOLOv8n. Compared with other advanced algorithms, the proposed algorithm can better meet the computational constraints of edge devices.
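The MPDIoU criterion underlying the InnerMPDIoU loss mentioned above penalizes the squared distances between matching box corners, normalized by the image diagonal; the PyTorch sketch below follows the published MPDIoU definition and is not the authors' exact InnerMPDIoU implementation.

```python
import torch

def mpdiou_loss(pred, target, img_w, img_h, eps=1e-7):
    """MPDIoU loss for boxes given as (x1, y1, x2, y2); loss = 1 - MPDIoU."""
    # intersection area
    x1 = torch.max(pred[..., 0], target[..., 0])
    y1 = torch.max(pred[..., 1], target[..., 1])
    x2 = torch.min(pred[..., 2], target[..., 2])
    y2 = torch.min(pred[..., 3], target[..., 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + eps)
    # squared distances between the matching top-left and bottom-right corners
    d1 = (pred[..., 0] - target[..., 0]) ** 2 + (pred[..., 1] - target[..., 1]) ** 2
    d2 = (pred[..., 2] - target[..., 2]) ** 2 + (pred[..., 3] - target[..., 3]) ** 2
    diag2 = img_w ** 2 + img_h ** 2               # image diagonal squared
    mpdiou = iou - d1 / diag2 - d2 / diag2
    return 1.0 - mpdiou
```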
To address the problem that existing methods have difficulty achieving a high compression ratio and low distortion when processing high-dynamic-range macaque whole-brain data, this paper proposes an end-to-end multi-scale compression network based on the U-Net framework. First, the stability of the network is increased and the high-frequency information of the image data is preserved by establishing multi-level controllable skip connections between the compression module and the reconstruction module. Then, the data output by the coding module are processed with straight-through estimator quantization to accelerate the modeling process of the probability model and improve the compression ratio. Experimental results show that the rate-distortion curves of the network on the cellular architecture dataset and the nerve fiber dataset are better than those of other mainstream deep learning methods and the traditional JPEG2000 method. Under a compression ratio of 160, the multi-scale structural similarity index is not less than 0.99.
Inspection robots have become critical tools for roller detection in belt conveyors. However, the infrared images detected by these robots often suffer from low resolution and a low signal-to-noise ratio, thereby introducing higher requirements for target detection algorithms. In this study, we propose improvements to inspection robots for roller detection tasks in belt conveyors based on the YOLOv5 network. Inspired by DenseNet, we first introduce dense connection modules into the YOLOv5 network to enhance its feature extraction capabilities. We then introduce a Wise-IoU (WIoU) loss function to evaluate the quality of anchor rectangles and in turn improve network performance and generalization capabilities. Experimental evaluations on a dataset of infrared data collected by inspection robots on belt conveyors demonstrate that, compared with the original YOLOv5, the recall rate and mean average precision are improved by 2.4 percentage points and 1.5 percentage points, respectively (with the latter reaching 98%), while a recognition speed of 80 frame/s and model size of 15 MB are maintained. The improved inspection robot features a small size, fast speed, and high efficiency.
To address the limitations of traditional image restoration techniques in accurately restoring laser interference images, this paper proposes a novel deep learning framework. This framework leverages convolutional neural networks and a multi-head attention mechanism to extract multi-scale features, thereby enhancing the understanding and restoration of image structures. Experiments are conducted on a synthetic laser interference image dataset comprising 5 scenes, each containing 5000 images. Experimental results reveal that the proposed framework visually restores images affected by laser interference and achieves high peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). In particular, the PSNR and SSIM values for the reconstructed images, across various levels of image damage, exceed 34 dB and 0.98, respectively. The proposed method holds promise for broad applications in laser interference scenarios and offers valuable support for military defense and civilian technologies.
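The PSNR and SSIM figures reported above can be reproduced for any restored/reference image pair with scikit-image, as in this small helper; the color handling and data range are assumptions about the evaluation protocol.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_restoration(restored, reference):
    """restored, reference: uint8 RGB images of identical shape.
    Returns (PSNR in dB, SSIM)."""
    psnr = peak_signal_noise_ratio(reference, restored, data_range=255)
    ssim = structural_similarity(reference, restored, data_range=255, channel_axis=-1)
    return psnr, ssim
```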
To address the issue of limited denoising effectiveness caused by the lack of ground-truth images during the training of self-supervised image denoising methods, a multistage self-supervised denoising method based on a memory unit is proposed. The memory unit modularly stores intermediate denoising results, which resemble clear images, and collaboratively supervises the network training process. This ability allows the network to learn not only from noisy images but also from the intermediate outputs during training. Additionally, a multistage training scheme is introduced to separately learn features from flat and textured areas of noisy images, while a spatial adaptive constraint balances noise removal and detail retention. Experimental results show that the proposed method achieves peak signal-to-noise ratios of 37.30 dB on the SIDD dataset and 38.52 dB on the DND dataset, with structural similarities of 0.930 and 0.941, respectively. Compared with existing self-supervised image denoising methods, the proposed method remarkably improves both visual quality and quantitative metrics.
This paper proposes a new bidirectional weighted multiscale dynamic approach, the BiEO-YOLOv8s algorithm, to enhance the detection of small targets in aerial images. It effectively addresses challenges such as complex backgrounds, large-scale variations, and dense targets. First, we design a new ODE module to replace certain C2f modules, enabling the accurate, quick, and multiangle location of target features. Then, we develop a bidirectional weighted multiscale dynamic neck network structure (BiEO-Neck) to achieve deep fusion of shallow and deep features. Second, adding a small object detection head further enhances the feature extraction capability. Finally, the generalized intersection over union boundary loss function is used to replace the original boundary loss function, thereby enhancing the regression performance of the bounding box. Experiments conducted on the VisDrone dataset demonstrate that, compared with the base model YOLOv8s, the proposed model achieves a 6.1 percentage points improvement in mean average precision, with a detection time of only 4.9 ms. This performance surpasses that of other mainstream models. The algorithm's effectiveness and adaptability are further confirmed through universality testing on the IRTarget dataset. The proposed algorithm can efficiently complete target detection tasks in unmanned aerial vehicle aerial images.
In this study, we propose a novel scheme for expanding chaotic keys for encrypting and decrypting image and video signals. The process begins with achieving chaos synchronization in semiconductor lasers driven using a common signal over a 130-km optical fiber link with a synchronization coefficient of 0.945. The resulting synchronized chaotic signal is processed through dual-threshold quantization, and error bits are removed through lower-triangle reconciliation, thereby yielding consistent keys at 1 Gbit/s. These keys are expanded to 80 Gbit/s using the Mersenne twister algorithm. Analysis shows that they can pass the NIST tests, thereby demonstrating good randomness and security. Thus, the encryption and decryption of image and video signals using these expanded keys is experimentally demonstrated.
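A rough sketch of two of the steps above, dual-threshold quantization of the synchronized chaotic waveform and key expansion with the Mersenne Twister, is given below; the threshold quantiles, the seed-derivation rule, and the use of NumPy's MT19937 generator are illustrative assumptions, and the reconciliation step is omitted.

```python
import numpy as np

def dual_threshold_quantize(samples, lower_q=0.4, upper_q=0.6):
    """Samples above the upper threshold map to 1, below the lower threshold to 0;
    samples in between are discarded (their positions are returned so both parties
    can keep only commonly retained bits)."""
    lo, hi = np.quantile(samples, [lower_q, upper_q])
    kept = (samples >= hi) | (samples <= lo)
    bits = (samples[kept] >= hi).astype(np.uint8)
    return bits, kept

def expand_key(seed_bits, n_out_bits):
    """Expand a short consistent key into a long pseudo-random bit stream using
    the Mersenne Twister (NumPy's MT19937-based RandomState)."""
    seed = int("".join(str(int(b)) for b in seed_bits[:32]), 2)   # 32-bit seed from the key
    rng = np.random.RandomState(seed)
    return rng.randint(0, 2, size=n_out_bits, dtype=np.uint8)
```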
Light-sheet fluorescence microscopy imaging systems are extensively used for imaging large-volume biological samples. However, as the field of view of the optical system expands, imaging will exhibit spatially uneven degradation throughout the entire field of view. Conventional model-driven and deep learning approaches exhibit spatial invariance, making it challenging to directly address this degradation. A position-dependent model-driven deconvolution network is developed by introducing positional information into the model-driven deconvolution network, which is achieved by randomly selecting training image pairs with different degradation patterns during training and using block-based reconstruction techniques during image restoration. The experimental results reveal that the network facilitates rapid deconvolution of large field-of-view optical images, thereby considerably enhancing image processing efficiency, image quality, and the uniformity of image quality within the field of view.
To address the scarcity of point cloud datasets in foggy weather, an optical model-based foggy weather point cloud rendering method is proposed. First, a mathematical relationship is established between the LiDAR impulse responses during good weather and foggy weather in the same scene. Second, an algorithm is designed using laser attenuation in foggy weather, and the visibility of the rendered point cloud is set by modifying the attenuation coefficient, backscattering coefficient, and differential reflectance of the target in the algorithm to obtain the rendered point cloud of the foggy weather under the set visibility. Experiments reveal that the proposed method effectively renders foggy weather point clouds with a visibility within 50–100 m and that the method shows stable results. Compared with the real foggy weather point clouds, the average Kullback-Leibler (KL) divergence of the rendered point clouds is less than 6, the average percentage of points with a Hausdorff distance of less than 0.5 m is not less than 85%, and the average mean square error distance is less than 8, proving the feasibility of the proposed method. Therefore, the proposed method can render foggy weather point clouds under good weather and overcomes the lack of foggy weather point cloud datasets and visibility data.
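The point-cloud comparison metrics quoted above can be approximated as below: a symmetric Hausdorff distance between rendered and real clouds, and the fraction of rendered points whose nearest-neighbor distance to the real cloud is under 0.5 m; the exact definitions used by the authors (including the KL-divergence binning) are not reproduced here, so this is only an illustrative sketch.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import directed_hausdorff

def hausdorff_distance(rendered, real):
    """Symmetric Hausdorff distance between two (N, 3) point clouds."""
    return max(directed_hausdorff(rendered, real)[0],
               directed_hausdorff(real, rendered)[0])

def fraction_within(rendered, real, radius=0.5):
    """Fraction of rendered points lying within `radius` meters of the real cloud."""
    dist, _ = cKDTree(real).query(rendered)
    return float(np.mean(dist < radius))
```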
To solve the problems of low-accuracy detection or inaccurate classification of small target defects in solar cell panel defect detection, an improved lightweight YOLOv5s solar cell panel defect detection model suitable for small target detection is proposed in this study. First, a SiLU activation function is used to replace the original activation function to optimize the convergence speed and enhance the generalization ability of the model. Second, the C3TR and convolution block attention modules are used to re-optimize the backbone feature sampling structure to improve the recognition ability for different defect types, especially small target defects. Third, content-aware re-assembly of features is introduced into the feature extraction network to improve the detection accuracy and detection speed without increasing the model weight. Finally, a dynamic nonmonotonic loss function, WIoUv3, is adopted to dynamically match the predicted and ground-truth boxes, enhancing robustness to small target datasets and noise. Experimental results show that the mean average precision (mAP@0.5) of the proposed model is 95.9%, its classification accuracies for large-area cracks and star-shaped scratches reach 98.0%, and its detection speed reaches 75.133 frame/s, demonstrating a lightweight design and speed that meet the requirements of industrial production.
Fisheye cameras offer lower deployment costs than traditional cameras for detecting the same scene. However, accurately detecting distorted targets in fisheye images requires increased computational complexity. To address the challenge of achieving both accuracy and inference speed in fisheye image detection, we propose an enhanced YOLOv8m-based fisheye object detection model, which we refer to as Fisheye-YOLOv8. First, we introduce the Faster-EMA module, which integrates lightweight convolution and multiscale attention to reduce delay and complexity in feature extraction. Next, we design the RFA-BiFPN structure, incorporating a parameter-sharing mechanism to enhance the detection speed and accuracy through receptive field attention and a weighted bidirectional pyramid structure. In addition, the lightweight G-LHead detection head is introduced to minimize the number of model parameters and reduce complexity. Finally, the LAMP pruning algorithm is introduced to balance improvements in recognition accuracy with inference speed. Experimental results demonstrate that Fisheye-YOLOv8 achieves mean average precision values of 60.5% and 59.7% on the Fisheye8K and WoodScape datasets, respectively, which is an increase of 2.2 and 1.2 percentage points compared to YOLOv8m. Moreover, the proposed model's parameter and computational complexity are only 20.5% and 29.7% of those of YOLOv8m, respectively, with a detection speed of 118 frames/s. The proposed model meets real-time requirements and is better suited for fisheye camera deployment than the other models.
With an increase in the shooting depth, underwater images suffer from issues such as low brightness, color distortion, and blurred details. Therefore, an underwater low-illumination image enhancement algorithm based on an encoding and decoding structure (ULCF-Net) is designed. First, a brightness enhancement module is designed based on a half-channel Fourier transform, which enhances the response in dark regions by combining the frequency domain and spatial information. Second, cross-scale connections are introduced within the encoding and decoding structure to improve the detailed expression of underwater optical images. Finally, a dual-stream multiscale color enhancement module is designed to improve the color fusion effects across different feature levels. Experimental results on publicly available underwater low-illumination image datasets demonstrate that the proposed ULCF-Net exhibits excellent enhancement in terms of brightness, color, and details.
The study of the attention in attention network (A2N) for single-image super-resolution has revealed that not all attention modules are beneficial to the network. Therefore, in the design of the network, input features can be divided into attention and non-attention branches. The weights of these branches can be adaptively adjusted using dynamic attention modules based on the input features, so that the network strengthens useful features and suppresses unimportant ones. In practical applications, lightweight networks are suitable for running on resource-constrained devices. Based on A2N, the number of attention in attention blocks (A2B) in the original network is reduced, and lightweight receptive field modules are introduced to enhance the overall performance of the network. In addition, the L1 loss is adjusted to a combination loss based on the Fourier transform, which maps the spatial domain of the image into the frequency domain and enables the network to learn the frequency characteristics of the image. The experimental results show that the improved A2N reduces the parameter count by about 25% and the computational complexity by about 20% and shortens the inference time by 15%, thereby enhancing the performance.
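The combination loss described above (a spatial L1 term plus a frequency-domain term obtained via the Fourier transform) can be sketched in PyTorch as follows; the weighting factor and the use of spectrum magnitudes are assumptions rather than the exact formulation.

```python
import torch
import torch.nn.functional as F

def combined_l1_fft_loss(pred, target, alpha=0.1):
    """Spatial L1 loss plus an L1 penalty on the 2D Fourier spectrum magnitudes,
    so the network also fits frequency-domain characteristics of the image."""
    spatial = F.l1_loss(pred, target)
    pred_fft = torch.fft.rfft2(pred, norm="ortho")
    target_fft = torch.fft.rfft2(target, norm="ortho")
    freq = F.l1_loss(torch.abs(pred_fft), torch.abs(target_fft))
    return spatial + alpha * freq
```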
High-dynamic range (HDR) image reconstruction algorithms based on the generation of bracketed image stacks have gained popularity for their capabilities in expanding the dynamic range and adapting to complex lighting scenarios. However, existing approaches based on convolutional neural networks often suffer from local receptive fields, limiting the utilization of global information and recovery of over- or underexposed regions. To solve this problem, this study introduces a Transformer architecture that equips the network with a global receptive field to establish long-range dependency. In addition, a unidirectional soft mask is added to the Transformer to alleviate the effects of invalid information from over- and underexposed regions, further improving the reconstruction quality. Experimental results show that the proposed algorithm improves the peak signal-to-noise ratio by 2.37 dB and 1.33 dB on the VDS and HDREye datasets, respectively, and subjective comparisons further prove the effectiveness of the proposed algorithm. This study provides a novel approach for improving the information recovery capabilities of HDR image reconstruction algorithms for over- and underexposed regions.
Aiming at the problems of low detection accuracy and missed detection caused by complex contour information, large shape variations, and the small size of contraband in X-ray images, an improved GELAN-YOLOv8 model based on YOLOv8 is proposed. First, the RepNCSPELAN module, based on the generalized efficient layer aggregation network (GELAN), is introduced to improve the feature extraction ability for contraband. Second, the GELAN-RD module is proposed by combining deformable convolution v3 (DCNv3) with the RepNCSPELAN module to adapt to contraband with different postures and severe changes in size and angle. Third, the spatial pyramid pooling is improved so that the model pays more attention to the feature information of small-target contraband. Finally, Inner-ShapeIoU is proposed by combining inner-intersection over union (Inner-IoU) and Shape-IoU to reduce false and missed detections and speed up the convergence of the model. Results on the SIXray dataset show that the mAP@0.5 of the improved algorithm is 2.8 percentage points higher than that of YOLOv8n, and its performance is better than that of YOLOv8s. GELAN-YOLOv8 effectively realizes the real-time detection of contraband in X-ray images.
Marine microorganisms are fundamental to marine ecosystems. However, underwater imaging often blurs microbial contours due to water absorption and scattering. To address this, we propose a contour segmentation method for underwater microorganisms that combines an underwater imaging model with Fourier descriptors. First, the background light and water attenuation coefficients are estimated using the underwater imaging model to extract a clear, water-free feature map of the object. Next, a classification head determines the target location, while a regression head uses Fourier descriptors to represent and refine the microorganism's contour in the pixel domain. In addition, hologram reconstruction and preprocessing steps are applied, and a microbial contour segmentation dataset is generated. Experimental results demonstrate that the Fourier descriptor outperforms the star polygon method in contour representation accuracy and spatial continuity. Compared to traditional segmentation methods, the proposed algorithm achieves an F1 score of 0.8894, intersection over union of 0.7887, and pixel accuracy of 0.8608, with all metrics improved, indicating superior segmentation capability.
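The Fourier-descriptor contour representation used by the regression head can be illustrated as keeping a small number of low-frequency Fourier coefficients of the complex-valued contour and reconstructing a smooth closed curve from them; the coefficient count below is an arbitrary choice, and the learned regression itself is not shown.

```python
import numpy as np

def fourier_descriptor_smooth(contour, n_coeffs=32):
    """contour: (N, 2) points of a closed contour, with N > n_coeffs.
    Returns the contour reconstructed from its n_coeffs lowest-frequency
    Fourier coefficients."""
    z = contour[:, 0] + 1j * contour[:, 1]          # complex representation
    coeffs = np.fft.fft(z)
    kept = np.zeros_like(coeffs)
    kept[:n_coeffs // 2] = coeffs[:n_coeffs // 2]   # low positive frequencies
    kept[-n_coeffs // 2:] = coeffs[-n_coeffs // 2:] # low negative frequencies
    recon = np.fft.ifft(kept)
    return np.column_stack([recon.real, recon.imag])
```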
Aiming at the problem that the traditional cloth simulation filtering (CSF) algorithm cannot distinguish the local microtopography of pavement damage, which leads to wrong detection and omission of pothole damage, an adaptive descend distance CSF algorithm for pavement pothole extraction is proposed. First, the proposed algorithm preprocesses and denoises the point cloud of the road to obtain the pavement point cloud. Second, by improving the displacement distance of the "external force drop" and "internal force pull back" processes of the simulated cloth in the CSF algorithm, an adaptive descent distance of the simulated cloth is realized; an accurate local datum plane of the road surface is then constructed, and a depth-enhanced information model of the point cloud is generated. Finally, depth threshold classification and the Euclidean clustering algorithm are used to achieve precise detection of potholes and extract their geometric attribute features. Experiments and analysis on measured road data show that the recall of potholes reaches 83.3%, the precision reaches 87.5%, the maximum relative error of area is 17.699%, and the maximum relative error of depth is 9.677%, indicating a certain degree of robustness and applicability. The proposed algorithm can provide powerful support for the automatic and precise detection of potholes from large-scale three-dimensional pavement point cloud data.
This study develops a lightweight roadside object detection algorithm called MQ-YOLO, which is based on multiscale sequence fusion. It addresses the challenges of low detection accuracy for small and occluded targets and the large number of model parameters in urban traffic roadside object detection tasks. We design a D-C2f module based on multi-branch feature extraction to enhance feature representation while maintaining speed. To strengthen the integration of information from multiscale sequences and enhance feature extraction for small targets, a plural-scale sequence fusion (PSF) module is designed to reconstruct the feature fusion layer. Multiple attention mechanisms are incorporated into the detection head for greater focus on the salient semantic information of occluded targets. To enhance the detection performance of the model, a loss function based on the normalized Wasserstein distance is introduced. Experimental results on the DAIR-V2X-I dataset demonstrate that, with 3.96 M parameters, MQ-YOLO improves mAP@50 and mAP@(50–95) by 3.9 percentage points and 6.0 percentage points compared with the values obtained with the baseline YOLOv8n. Experiments on the DAIR-V2X-SPD-I dataset show that the model has good generalizability. During roadside deployment, the model reaches a detection speed of 62.5 frame/s, meeting current roadside object detection requirements for edge deployment in urban traffic.
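The normalized Wasserstein distance used in the loss function above treats each box as a 2D Gaussian and exponentiates the negative Wasserstein-2 distance between the two Gaussians; the sketch below follows the published NWD formulation for (cx, cy, w, h) boxes, with the normalizing constant C as a dataset-dependent assumption.

```python
import torch

def normalized_wasserstein_distance(pred, target, C=12.8):
    """NWD between boxes given as (cx, cy, w, h); a small-object-friendly
    similarity in [0, 1]. A regression loss can be taken as 1 - NWD."""
    dx = pred[..., 0] - target[..., 0]
    dy = pred[..., 1] - target[..., 1]
    dw = (pred[..., 2] - target[..., 2]) / 2
    dh = (pred[..., 3] - target[..., 3]) / 2
    w2 = torch.sqrt(dx ** 2 + dy ** 2 + dw ** 2 + dh ** 2 + 1e-7)  # Wasserstein-2 distance
    return torch.exp(-w2 / C)
```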
In order to improve the consistency between objective assessment metrics and human subjective evaluation of stereo image quality, and inspired by the top-down mechanism of human vision, this paper proposes a stereo attention-based no-reference stereo image quality assessment method. In the proposed stereo attention module, the amplitude of the binocular response is first adaptively adjusted by the energy coefficient in the proposed binocular fusion module, and the binocular features are processed simultaneously in the spatial and channel dimensions. Second, the proposed binocular modulation module realizes the top-down modulation of high-level binocular information to low-level binocular and monocular information simultaneously. In addition, the dual-pooling strategy proposed in this paper processes the binocular fusion map and binocular difference map to obtain the critical information that is more conducive to quality score regression. The performance of the proposed method is validated on the publicly available LIVE 3D and WIVC 3D databases. The experimental results show that the proposed method achieves high consistency between objective assessment indices and labels.
A U-shaped dual-energy computed tomography (DECT) material decomposition network, called DM-Unet, that combines a selective state space model (Mamba) with an efficient channel attention module is proposed in this paper. The network uses a visual state space module that introduces a channel attention mechanism to capture feature information, adjusts the weights of feature information at different levels within a block through adjustable parametric residual connections, and reduces gradient explosion and the loss of organizational details through residual connections between the encoder and decoder. Experimental results show that the root mean square error of the basis material image obtained by DM-Unet is as low as 0.041 g/cm³, the structural similarity reaches 0.9981, and the peak signal-to-noise ratio reaches 36.54 dB. Compared with traditional decomposition methods, DM-Unet shows better performance in restoring organizational details, suppressing noise, and restoring edge information, and it is able to fulfill the task of DECT material decomposition, thereby providing accurate references for subsequent medical diagnosis.
This paper proposes a multiplexed fusion deep aggregate learning algorithm for underwater image enhancement. First, an image preprocessing algorithm is used to obtain the image attribute information of three branches (contrast, brightness, and colour). Then, an image attribute dependency module is designed to obtain multiplexed fusion features using a fusion network and to explore the potential correlations between the fused image attributes through parallel graph convolution. A self-attention deep aggregate learning module is introduced to deeply mine the interaction information between the private and public domains of the multiplexed features using sequential self-attention and global attribute iteration mechanisms, and to effectively extract and integrate the important information between image attributes by means of aggregation bottlenecks to achieve more accurate feature representation. Finally, skip connections are introduced to further enhance the image output and improve the enhancement effect. Numerous experiments demonstrate that the proposed method can effectively remove colour bias and blurring, improve image clarity, and facilitate underwater image segmentation and keypoint detection tasks. The peak signal-to-noise ratio and structural similarity metrics reach the highest values of 23.01 dB and 0.90, which are improvements of 5.0% and 4.7% over the suboptimal method, while the underwater colour image quality metric and information entropy metric reach the highest values of 0.93 and 14.33, which are improvements of 2.2% and 0.5% over the suboptimal method.
A turbulent fuzzy target restoration algorithm with a nonconvex regularization constraint is proposed to address degradation issues, such as low signal-to-noise ratio, blurring, and geometric distortion, in target images caused by atmospheric turbulence and light scattering in long-range optoelectronic detection systems. First, we utilized latent low-rank spatial decomposition (LatLRSD) to obtain the target low-rank components, texture components, and high-frequency noise components. Next, two structural components were obtained by denoising the LatLRSD model; these were weighted and reconstructed in the wavelet transform domain, and nonconvex regularization constraints were added to the constructed target reconstruction function to improve the reconstruction blur and scale sensitivity problems caused by the traditional lp norm (p=0,1,2) as a constraint term. The results of a target restoration experiment in long-distance turbulent imaging scenes show that compared with traditional algorithms, the proposed algorithm can effectively remove turbulent target blur and noise; the average signal-to-noise ratio of the restored target is improved by about 9 dB. Further, the proposed algorithm is suitable for multiframe or single-frame turbulent blur target restoration scenes.
The existing aluminum surface-defect detection algorithms yield low detection accuracy in practical tasks. Hence, this paper proposes an improved YOLOv8s aluminum profile surface-defect detection algorithm (CDA-YOLOv8). First, the 3×3 downsampling convolution in the network was improved using the context guided block (CG Block) module, which enhances the extraction of features from the global context of the target and aggregates local salient features with global features, thus improving the feature generalization ability. Second, the dilation-wise residual (DWR) module was introduced to improve the Bottleneck structure in C2f, thus improving the multiscale feature-extraction capability. Finally, to address the loss of feature information of microdefects on the surface of aluminum profiles, an ASFP2 detection layer was designed, which integrates the small-target detection layer and the scale sequence feature fusion (SSFF) module. The layer was integrated into the neck of YOLOv8s to extract and transfer more critical small-target feature information from small-sized defects, thereby enhancing the detection performance. Experimental results show that the CDA-YOLOv8 algorithm achieves 93.4%, 80.4%, and 88.1% in precision, recall, and mean average precision, respectively, which are 5.1 percentage points, 2.4 percentage points, and 4.4 percentage points higher than those of the original YOLOv8s algorithm. This algorithm significantly improves detection performance, particularly in detecting microdefects.
To address issues such as voids and incomplete shapes in three-dimensional (3D) point clouds obtained during the 3D reconstruction process, a multiscale hybrid feature extraction and activation query point-cloud completion network is proposed. This network adopts an encoder-decoder structure. To extract local information while considering the overall structure, a multiscale hybrid feature extraction module is proposed: the input point cloud is divided into different scales through downsampling, and the hybrid feature information of the point cloud is extracted at each scale. To maintain the high correlation of the point-cloud completion results, an activation query module that retains the feature sequences with high scores and strong correlations is proposed for the scoring operation. After the feature sequences pass through the decoder for point-cloud completion, a complete point cloud is obtained. Experiments on the public dataset PCN indicate that, in terms of both quantitative and visual results, the proposed network model achieves superior completion effects and can further enhance the quality of point-cloud completion.
The aim of infrared and visible image fusion is to merge information from both types of images to enhance scene understanding. However, the large differences between the two types make it difficult to preserve important features during fusion. To solve this problem, this paper proposes a dynamic contrast dual-branch feature decomposition network (DCFN) for image fusion. The network adds a dynamic weight contrast loss (DWCL) module to the base encoder to improve alignment accuracy by adjusting sample weights and reducing noise. The base encoder, based on the Restormer network, captures global structural information, while the detail encoder, using an invertible neural network (INN), extracts finer texture details. By combining DWCL, DCFN improves the alignment of visible and infrared image features, enhancing the fused image quality. Experimental results show that this method outperforms existing approaches, significantly improving both visual quality and fusion performance.
Lidar-scanned point cloud data often suffer from missing information, and most existing point cloud completion methods struggle to reconstruct local details because of the sparse and unordered nature of the data. To address this issue, this paper proposes an attention-enhanced multiscale dual-feature point cloud completion method. A multiscale dual-feature fusion module is designed that combines global and local features to improve completion accuracy. To enhance feature extraction, an attention mechanism is introduced to boost the network's ability to capture and represent key feature points. During the point cloud generation phase, a pyramid-like decoder structure is used to progressively generate high-resolution point clouds, preserving geometric details and reducing distortion. Finally, a generative adversarial network framework, combined with an offset-position attention discriminator, further enhances the point cloud completion quality. The experimental results show that the completion accuracy of this method on the PCN dataset improves by 11.61% compared with that of PF-Net, and its visualization results are better than those of the other compared methods, verifying the effectiveness of the proposed network.
To solve the problems of background noise interference, variable scales, and low detection accuracy caused by small-scale defects in insulator defect detection, an insulator defect detection algorithm based on guided attention and scale perception (GASPNet) is proposed. First, a guided attention module (GAM) is constructed on the backbone network to guide the attention of deep features using shallow features, which have a stronger ability to represent small targets, and combines channel and spatial bidirectional attention to reduce the interference of background noise. Second, in the neck network, a feature enhanced fusion network (FEFN) is proposed to enhance the effective fusion of semantic information and local information by cross-fusing feature information at different levels. Finally, the EIoU loss function is used to define the penalty term by combining the vector angle and position information, which improves the regression accuracy of the detection box and achieves accurate detection of small-scale targets. The experimental results show that the mean average precision (mAP@0.5) of GASPNet on the insulator defect detection dataset reaches 94.8%, and the detection speed is 95.3 frame/s, which is significantly better than other detection algorithms. Meanwhile, embedded experiments verify that GASPNet retains efficient real-time detection performance under limited computing resources, making it suitable for practical application scenarios.
In recent years, the Transformer has demonstrated remarkable performance in image super-resolution tasks, attributed to its powerful ability to capture global features using a self-attention mechanism. However, this mechanism has high computational demands and is limited in its ability to capture local features. To address these challenges, this study proposes a lightweight image super-resolution reconstruction network based on dual-stream feature enhancement. This network incorporates a dual-stream feature enhancement module designed to enhance reconstruction performance through the effective capture and fusion of both global and local image information. In addition, a lightweight feature distillation module is introduced, which employs shift operations to expand the convolutional kernel's field of view, significantly reducing network parameters. The experimental results show that the proposed method outperforms traditional convolution-based reconstruction networks in terms of both subjective visual quality and objective metrics. Furthermore, compared to Transformer-based reconstruction networks such as SwinIR-Light and NGswin, the proposed method achieves an average improvement of 0.06 dB and 0.14 dB in PSNR, respectively.
The limited generalization and dataset scarcity of existing generative facial image detection methods present significant challenges. To address these issues, this study proposes a high-quality facial image detection model based on noise variation and the diffusion model. The proposed method employs an inversion algorithm using the denoising diffusion implicit model (DDIM) to generate inverted images with text-based guidance. By comparing the noise distribution differences between real and generated images after inversion, the method optimizes a residual network to identify image authenticity, and enhances both accuracy and generalization. Additionally, a dataset of 10000 high-quality, multi-category facial images is constructed to address the shortage of available facial data. Experimental results demonstrate that the proposed algorithm achieves 98.7% accuracy in detecting generated facial images and outperforms existing methods, enabling effective detection across diverse facial images.
To address the missed detection of small prohibited items in X-ray security inspection images due to low pixel ratios and ambiguous features, this study proposes a detection algorithm based on fine-grained feature enhancement. First, we design a learnable spatial reorganization module that replaces traditional downsampling operations with dynamic pixel allocation strategies to reduce fine-grained feature loss. Second, we construct a dynamic basis vector multi-scale attention module that adaptively adjusts the number of basis vectors according to feature entropy, enabling cross-dimensional feature interaction. Finally, we introduce a 160×160 high-resolution detection head that reduces the minimum detectable target size from 8 pixel×8 pixel to 4 pixel×4 pixel. Experimental results demonstrate that on the SIXray, OPIXray, and PIDray datasets, our algorithm achieves mean average precision (mAP) values of 93.3%, 91.2%, and 86.9%, respectively, showing improvements of 1.2%–3.1% over the YOLOv8 baseline model while increasing the parameter count by only 0.2×10⁶.
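The learnable spatial reorganization module is only described at a high level in the abstract; the PyTorch sketch below illustrates the general idea of replacing strided downsampling with a lossless space-to-depth rearrangement followed by a learnable 1×1 projection. The module name, channel sizes, and scale are assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class SpatialReorgDown(nn.Module):
    """Downsample by rearranging pixels into channels instead of discarding them (illustrative)."""
    def __init__(self, in_ch, out_ch, scale=2):
        super().__init__()
        self.unshuffle = nn.PixelUnshuffle(scale)                 # H x W -> H/2 x W/2, channels x4
        self.proj = nn.Conv2d(in_ch * scale * scale, out_ch, kernel_size=1)  # learnable channel mixing

    def forward(self, x):
        return self.proj(self.unshuffle(x))

x = torch.randn(1, 32, 160, 160)
print(SpatialReorgDown(32, 64)(x).shape)  # torch.Size([1, 64, 80, 80])
```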
To address the challenge of balancing model complexity and real-time performance in unmanned aerial vehicle (UAV)-based forest fire smoke detection, this paper proposes a lightweight and efficient multi-scale detection algorithm based on an improved YOLOv8n architecture, named LEM-YOLO (Lightweight Efficient Multi-Scale-You Only Look Once). First, a lightweight multi-scale feature extraction module C2f-IStar (C2f-Inception-style StarBlock) is designed to reduce model complexity while enhancing the representation capability for images of flames and smoke that exhibit drastic scale variations. Second, a multi-scale feature weighted fusion module (EMCFM) is introduced to mitigate the information loss and background interference of densely packed small targets during the feature fusion process. Third, a lightweight shared detail-enhanced convolutional detection head (LSDECD) is constructed using shared detail-enhanced convolutions to reduce computational load and improve the model's ability to capture image details. Finally, the complete intersection over union (CIoU) loss function is replaced by the powerful intersection over union (PIoU) loss function to improve the convergence efficiency in handling non-overlapping bounding boxes. Experimental results indicate that, compared with the baseline model, the improved model achieves increases of 1.9 percentage points and 2.5 percentage points in mean average precision at intersection over union of 0.5 and 0.5 to 0.95, respectively, while reducing model parameters by 31.6% and computational cost by 27.2%, and the processing speed reaches 57.82 frame/s. The improved model achieves an effective balance between lightweight design and detection performance.
Existing methods for few-shot fine-grained image classification often suffer from feature selection bias, making it difficult to balance local and global information, which hinders the accurate localization of key discriminative regions. To address this issue, a multiscale joint distribution feature fusion metric model is proposed in this paper. First, a multiscale residual network is employed to extract image features, which are then processed by a multiscale joint distribution module. This module computes the Brownian distance covariance between the different scales, thereby integrating both local and global information to enhance the representation of important regions. Finally, an adaptive fusion module with an attention mechanism based on global average pooling and Softmax weight normalization is used to dynamically adjust feature contributions and maximize the impact of key region features on the classification results. Experimental results indicate that classification accuracies of 87.22% and 90.65% are achieved on the 5-way 1-shot tasks of the CUB-200-2011 and Stanford Cars datasets, respectively, demonstrating strong performance in few-shot fine-grained image classification tasks.
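For readers unfamiliar with Brownian distance covariance, the following NumPy sketch computes the empirical distance covariance between feature sets extracted at two scales; it shows only the statistic itself, not the full joint distribution module of the paper.

```python
import numpy as np

def distance_covariance(x, y):
    """Empirical (Brownian) distance covariance between two feature sets with n samples each."""
    def centered_dist(z):
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)   # pairwise distance matrix
        return d - d.mean(0, keepdims=True) - d.mean(1, keepdims=True) + d.mean()
    a, b = centered_dist(x), centered_dist(y)
    return np.sqrt(max((a * b).mean(), 0.0))

# toy example: features from two scales of the same 8 images (dimensions are illustrative)
coarse = np.random.rand(8, 128)
fine = np.random.rand(8, 256)
print(distance_covariance(coarse, fine))
```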
To address the issue of noise interference in ultraviolet (UV) images, a denoising method based on Facet filtering and local contrast is proposed to enhance image quality for corona detection. First, Facet filtering with a small kernel is applied to the UV image to enhance target pixels and suppress high-intensity noise. Subsequently, a three-layer sliding window traverses the UV image to calculate the local contrast at three levels, generating a saliency map. Finally, noise is removed through threshold segmentation. Experimental results show that the proposed method significantly improves UV image quality by enhancing the distinction between salient regions and the background, thereby facilitating corona detection.
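The exact Facet kernel and threshold rule are given in the paper; the NumPy/SciPy sketch below, with illustrative window sizes and threshold ratio, only demonstrates the multi-scale local-contrast saliency map and the threshold segmentation steps on a grayscale image.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_contrast_saliency(img, windows=(3, 5, 7)):
    """Multi-scale local contrast map for a UV image (simplified illustration of the idea)."""
    img = img.astype(np.float64)
    saliency = np.ones_like(img)
    for w in windows:
        center = uniform_filter(img, size=w)             # local mean at this scale
        surround = uniform_filter(img, size=3 * w)       # mean over a larger neighbourhood
        saliency *= np.clip(center - surround, 0, None)  # keep responses brighter than surroundings
    return saliency

def denoise(img, ratio=0.5):
    """Suppress pixels whose multi-scale saliency falls below a simple threshold."""
    s = local_contrast_saliency(img)
    mask = s > ratio * s.max()                           # threshold segmentation
    return img * mask
```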
Solidago canadensis L. is a priority invasive species under strict management in China. Effective detection, identification, and localization are essential for its control. To rapidly and accurately identify Solidago canadensis L. in complex natural environments, a lightweight detection model, YOLOv8-SGND, is proposed as an improvement over the YOLOv8 model to address issues of large parameter size and high computational complexity. Based on the YOLOv8 model's head, the new model designs a lightweight network structure and introduces a shared group normalization detection (SGND) head to enhance both localization and classification performance while significantly reducing the parameter count. First, batch normalization is replaced with group normalization in the convolutional blocks. Second, convolution parameters are shared between two convolutional blocks after feature aggregation to reduce the parameter volume and computational complexity. To improve the model's robustness and optimize the balance of errors in the bounding-box coordinates, the original complete intersection over union (CIoU) is replaced by wise IoU (WIoU) v3 as the bounding-box loss function. Finally, while using shared convolution, the Scale layer is applied to adjust the bounding-box predictions, thus ensuring consistency with the input image dimensions and feature-map sizes across different detection layers. Detection experiments on real-world data show that the proposed YOLOv8-SGND model achieves mAP@0.50 and mAP@0.50∶0.95 of 98.8% and 79.6% (mAP is mean average precision), respectively, which represent improvements of 2.8 and 6.0 percentage points over the original YOLOv8 model, respectively. Additionally, the model parameters and floating-point operations are reduced by approximately 21.4% and 1.6 Gbit, respectively. YOLOv8-SGND also outperforms mainstream object-detection algorithms such as YOLOX, Faster R-CNN, Cascade R-CNN, TOOD, and RTMDet in all precision evaluation metrics. The proposed method offers high detection accuracy and inference speed, and can thus provide technical support for the lightweight and intelligent recognition of invasive species.
To address the inefficiency of manual color difference classification, this study proposes a multi-strategy improved black-winged kite optimized extreme learning machine (MBKA-ELM) model for dyed fabric color difference classification. First, as the random initialization of hidden-layer weights and biases in extreme learning machine (ELM) algorithms can lead to uneven model training and algorithm instability, the black-winged kite algorithm (BKA) is employed to optimize these key parameters. Second, the incorporation of mirror reverse learning, BKA circumnavigation foraging, and longitudinal and transverse crossover strategies enhances both the convergence speed and global optimization ability of the algorithm. Finally, the MBKA-ELM model is constructed for dyed fabric color difference classification, achieving an accuracy rate of 98.8% and confirming the feasibility of using this model instead of conventional color difference calculation formulas for detection. Comparative experiments demonstrate that the MBKA-ELM model stabilizes after 10 iterations with a higher classification accuracy than comparable models. Compared with the traditional ELM and optimized models—black-winged kite optimized ELM, spotted emerald optimized ELM, Guanhao pig optimized ELM, cougar optimized ELM, and snake optimized ELM—the classification accuracy improves by 13%, 3.4%, 1.4%, 5%, 4.2%, and 3%, respectively. The proposed model demonstrates superior convergence speed and classification accuracy.
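The abstract does not restate the extreme learning machine itself; as background, a minimal ELM is sketched below so the role of the hidden-layer weights and biases that the BKA-based optimizer tunes is concrete. The sigmoid activation and hidden size are assumptions.

```python
import numpy as np

class ELM:
    """Basic extreme learning machine: random hidden layer, least-squares output weights."""
    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))      # sigmoid hidden activations

    def fit(self, X, y_onehot):
        # random hidden-layer parameters; these are what a BKA-style optimizer would tune instead
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ y_onehot                  # closed-form output weights
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)
```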
During the photovoltaic inspection of drones, it is necessary to obtain the position information of the photovoltaic panels in the image to determine the fault location of photovoltaic modules. The accuracy of the position and orientation system (POS) data transmitted by small drones is low because of their technical limitations, making it difficult to use them for photovoltaic panel positioning in small drone images. This paper proposes an automatic positioning method for photovoltaic panels in drone images to address this issue. Based on the drone images and the geographical coordinates of the photovoltaic panels within the inspection area, the actual location of the photovoltaic panels within the image is determined, providing accurate location information for troubleshooting. First, the proposed method establishes a coordinate transformation model and its unknown parameters based on the projection relationship between the object and image. It determines the parameter range based on the shooting conditions, selects multiple sets of parameters to determine the geographical coordinates of the photovoltaic panels within the image, and compares these with the mean square error of the back-projection difference of the photovoltaic panels within the area as a loss value. Next, it optimizes the parameters by comparing multiple sets of loss values and verifies them using overlapping images. Finally, based on the optimal parameters, coordinate conversion is performed to achieve automatic positioning of the photovoltaic panel image. In two experiments using more than 14,000 fault boards, the positioning accuracy of the proposed method reaches 97.78% and 98.54%, which increases by 13.06 percentage points and 12.96 percentage points, respectively, compared to the positioning method based on POS data. It also overcomes the influence of changes in the shooting conditions, thereby verifying the accuracy and robustness of the proposed method.
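A minimal sketch of the parameter-selection step is shown below, assuming a user-supplied `project` function (hypothetical, not from the paper) that maps geographic panel coordinates to image coordinates under a candidate parameter set; candidate generation and the overlap-image verification step are omitted.

```python
import numpy as np

def backprojection_mse(params, panel_geo_coords, panel_image_coords, project):
    """Mean squared back-projection error for one candidate parameter set."""
    predicted = np.array([project(p, params) for p in panel_geo_coords])
    return np.mean(np.sum((predicted - panel_image_coords) ** 2, axis=1))

def search_parameters(candidates, panel_geo_coords, panel_image_coords, project):
    """Pick the candidate transformation parameters with the smallest loss (illustrative only)."""
    losses = [backprojection_mse(c, panel_geo_coords, panel_image_coords, project)
              for c in candidates]
    best = int(np.argmin(losses))
    return candidates[best], losses[best]
```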
To address the issues of inferior image quality, uneven lighting, and blurred details in low-light environments that result in low detection accuracy, this study proposes a night-time detection model named LowLight-YOLOv8n, which is an improved version of YOLOv8n. First, a low-light image enhancement network named Retinexformer is introduced before convolutional feature extraction in the Backbone network, thus improving the visibility and contrast of low-light images. Second, conventional convolution operations are replaced with RFCAConv in both Backbone and Neck networks, where convolution kernel weights are adjusted adaptively to solve the issue of shared parameters in conventional convolutions, thus further enhancing the model's feature extraction and downsampling capabilities. Subsequently, a new C2f_UniRepLKNetBlock structure is formed by combining the large convolution kernel architecture of UniRepLKNet with the C2f module of the Neck network, thereby achieving a larger receptive field that encompasses more areas of the image with fewer convolution operations, thus allowing a broader range of contextual information to be aggregated, and more potential target information to be captured in low-light images. Finally, a new bounding-box regression loss function named Focaler-CIoU is adopted, which focuses on the detection of difficult samples. Experimental results on the ExDark dataset show that, compared with the baseline model YOLOv8n, LowLight-YOLOv8n improves the mAP@0.5 and mAP@0.5∶0.95 metrics by 6.8% and 5.8%, respectively, and reduces the number of parameters by 0.09×10⁶.
The accurate identification and localization of tilted droplets is a key preprocessing step for achieving high-precision measurement of the dynamic contact angle. To address the low detection accuracy of traditional algorithms and the excessive hardware occupancy of deep learning-based target detection algorithms, this paper proposes Light-YOLOv8OBB, a lightweight tilted droplet detection and localization model based on the improved YOLOv8 algorithm. First, a C2f-light convolutional structure is designed to lighten and improve the backbone network. Second, the Slim-Neck design paradigm is introduced into the neck network to further lighten the network model, and a convolutional attention mechanism module is added to strengthen the model's ability to detect small target objects. Experiments on a homemade droplet dataset and analysis of the results reveal that the proposed algorithm balances model performance and detection efficiency well. The mean average precision (mAP@0.5:0.95) of the proposed algorithm reaches 76.7%, an improvement of 7.5 percentage points over the base model, whereas the number of parameters and computation decrease by 38.7% and 34.9%, respectively, compared with the base model, and the inference time is only 16.1 ms on an NVIDIA GeForce MX250.
To address the problems of detection accuracy degradation in small traffic sign targets and excessive model complexity in complex scenes, this study proposes a lightweight traffic sign detection method using an improved YOLOv8n. An omni-dimensional dynamic convolution and efficient multi-scale attention (EMA) mechanism are introduced into the backbone network of YOLOv8n to accurately acquire sign features and context information. A small target detection layer of 160 pixel×160 pixel is added to effectively combine features with different scales, preserve more detailed information, and improve the precision of small target detection. GhostBottleneckv2 is introduced for lightweight processing, and the GSConv module is designed to reduce model complexity and accelerate convergence speed. The WIoU v3 loss function is used to enhance the ability of the model to locate the targets. The experimental results show that the proposed algorithm improves the mean average precision by 7.6 percentage points and 2.4 percentage points and decreases the parameter number by 7.6% and 7.9% on the TT100K and CCTSDB2021 datasets, respectively. Hence, the proposed algorithm not only maintains the lightness characteristics of the YOLOv8n model, but also exhibits better detection performance.
In nighttime-driving scenes, image quality deteriorates significantly owing to insufficient light and haze, which poses severe challenges to drivers and automated driving systems. Hence, a novel image-dehazing algorithm for nighttime-driving scenes is proposed. Instead of relying on classical a priori theory, the algorithm considers the nighttime haze image as a superposition of haze and background layers from the reconstruction perspective, and a lightweight super-resolution reconstruction dehazing network is proposed without using a physical imaging model. By introducing a haze feature-extraction network based on dilated convolution and an attention-mechanism module that uses the haze feature map as supervisory information, the dehazing network efficiently removes the haze layer while preserving image details and generating clear, high-contrast images. Comparison experiments with five state-of-the-art dehazing methods are conducted on two nighttime hazy image datasets. The experiments show that the super-resolution reconstruction dehazing network performs better than all other nighttime dehazing models. The results of ablation experiments show that the attention module based on the supervision of haze features significantly improves the dehazing capability of the network. This study provides new ideas and methods for solving the image-quality problem in nighttime-driving scenarios, thus helping to improve the driving safety and reliability of automated driving systems.
To recover more valuable image details such as texture and brightness in low-contrast images, a new color image enhancement model is proposed based on the saturation-value total variation histogram equalization (SV-TV-HE) algorithm, combined with saturation-value total generalized variation (SV-TGV) and an adaptive brightness function. A numerical algorithm is further designed for the proposed model using the alternating direction method of multipliers to improve local brightness and contrast while maintaining local details. Image enhancement experiments are carried out on synthetic and real images. Experimental results show that the proposed model can effectively avoid the staircase effect in the process of image restoration, thereby achieving a better image enhancement effect than existing methods in restoring image contrast, brightness, and other details.
This study develops a model called RA-CRPN for detecting small targets in road vehicle vision. The method addresses the challenges of low detection precision and reliability for small object detection in road scenes, where objects occupy a small number of pixels and their feature representation information is often insufficient. First, based on the Faster R-CNN framework, the RO-ResNet is integrated into the ResNet50 backbone network, which enables the output feature blocks to capture contextual information. Second, the RA-ResNet module is added after the backbone network to provide new feature information for each ResNet stage by fusing context information with object features. Then, the improved coarse-to-fine RPN (CRPN) module is utilized to enhance feature alignment and proposal box correction during the two-stage transition, providing high-quality feature information for the region proposal network (RPN) stage. Finally, the SODA-D public small object dataset is employed to validate and analyze the model by comparing it with other methods. The overall average precision (AP), average precision for extremely small objects (APes), and average precision for relatively small objects (APrs) of the proposed method are 3.9, 2.4, and 3.4 percentage points better than those of the Faster R-CNN baseline model, respectively, indicating improved overall detection precision. Additionally, road object detection tests are conducted using a custom vehicle driving dataset. The results show that the mAP@50 (mean average precision at 50% intersection over union) of the model is 5.8 percentage points higher than that of the Faster R-CNN baseline model, further verifying the precision and robustness of the proposed model.
With the widespread application of deep learning techniques in medical image processing, precise thyroid segmentation is becoming increasingly important for disease diagnosis and treatment. This study proposes a SwinTransCAD model that integrates the Swin Transformer and a multi-scale attention decoding mechanism, effectively capturing the details of the thyroid to achieve precise segmentation. The study first outlines the clinical need for thyroid disease diagnosis and the limitations of traditional segmentation methods. Then the technical features of Swin Transformer and its potential applications in medical image processing are analyzed. Finally, it provides a detailed introduction to the structure of the SwinTransCAD model and the multi-scale attention decoding mechanism. Through comparative experiments, the generalizability of the model across different datasets and its advantages in various evaluation metrics are validated. Experimental results show that the proposed method outperforms existing technologies, providing technical support for the pre-diagnosis and auxiliary treatment of thyroid diseases.
To solve the problems of insufficient feature extraction, loss of detail and texture information, and large numbers of model parameters in current infrared and visible image fusion algorithms, a cross-scale pooling embedding image fusion algorithm with long- and short-distance attention collaboration is proposed. First, depthwise separable convolution is used to design a channel attention module that enhances the expression of key channels and suppresses redundant information. Second, based on group shuffle (GS) convolution, a multi-scale dense channel enhancement module is proposed, which enhances multi-scale information interaction and reuses features by stacking small convolution kernels and introducing dense connections to prevent information loss. Then, on the basis of the cross-scale embedding layer, a cross-scale pooling fusion embedding layer is proposed, in which the features of the four stages are extracted using the fused features of pooling layers at different scales, so as to make full use of the features of each stage and reduce computational complexity. Finally, a dual-path design is used to fuse long- and short-distance attention, and a convolutional feedforward network is designed to capture long- and short-distance dependencies between features and reduce the number of network parameters. Experimental results of the proposed algorithm and seven other algorithms on the TNO and Roadscene public datasets show that the outlines in the fusion results of the proposed algorithm are clear; the entropy, average gradient, and structural content difference of the proposed algorithm are improved compared with the other algorithms, and its standard deviation is better on the Roadscene dataset. In addition, a detection performance comparison experiment on fused images from the M3FD dataset shows that the proposed algorithm performs well.
The confocal fundus camera is a high-frame-rate scanning imaging system used for capturing detailed images of the fundus. However, during image acquisition, its fast-scanning mirror causes misalignment between odd and even scan lines in the reconstructed images. This misalignment happens due to a delay between the mirror's motion signals and the scanning rate signals. To resolve this issue, the study analyzed the relationship between the scanning time and the sampling time of the actual image height scanned by the galvanometer. By using an image similarity function based on normalized squared error, along with the Levenberg-Marquardt nonlinear least squares fitting algorithm, an effective correction of the odd-even row misalignment in the reconstructed image was developed. Additionally, this study introduces a new method for quantitatively assessing the degree of odd-even line misalignment in images. Experimental results demonstrate that the proposed correction algorithm can effectively prevent calibration failures caused by inaccurate feature extraction in traditional methods. This new correction method is versatile, working not only for confocal fundus cameras but also for other high-speed imaging systems. Moreover, the proposed evaluation method can sensitively reflect the degree of odd-even line misalignment in images. Images processed with the algorithm showed a 78.7234% improvement in horizontal resolution and a noticeable enhancement in overall clarity compared to uncorrected images.
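The paper's timing model and similarity function are more detailed than the abstract; the SciPy sketch below assumes a single horizontal delay shared by all even rows and fits it with the Levenberg-Marquardt least-squares solver, using a normalized squared difference between odd rows and the shifted even rows as the residual.

```python
import numpy as np
from scipy.ndimage import shift
from scipy.optimize import least_squares

def row_residual(delta, img):
    """Normalized difference between odd rows and horizontally shifted even rows."""
    odd, even = img[1::2].astype(float), img[0::2].astype(float)
    n = min(len(odd), len(even))
    even_shifted = shift(even[:n], (0, delta[0]), order=1, mode='nearest')
    diff = odd[:n] - even_shifted
    return (diff / (np.linalg.norm(odd[:n]) + 1e-9)).ravel()

def correct_misalignment(img):
    """Estimate the odd-even row delay with Levenberg-Marquardt and resample the even rows."""
    res = least_squares(row_residual, x0=[0.0], args=(img,), method='lm')
    delta = res.x[0]
    corrected = img.astype(float).copy()
    corrected[0::2] = shift(img[0::2].astype(float), (0, delta), order=1, mode='nearest')
    return corrected, delta
```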
To solve the problems of low accuracy and slow speed in grasp detection in unstructured environments, a grasp detection algorithm, the alter-attention pyramid network (APNet), is proposed. The generative residual convolutional neural network (GR-ConvNet) was selected as the backbone network, adaptive kernel convolution was used to replace standard convolution, and the SiLU activation function was replaced with the Hardswish activation function. A lightweight feature extraction network was developed, and efficient multiscale attention was introduced to increase the focus on important grasping regions. Pyramid convolution was integrated into the residual network to effectively fuse multiscale features. The experimental results demonstrate that APNet achieves 99.3% and 95.8% detection accuracies on the Cornell and Jacquard datasets, with average single-object detection times of 9 ms and 10 ms, respectively. Compared with existing algorithms, APNet demonstrates improved detection performance. In particular, APNet achieves an average success rate of 92% on a homemade multi-target dataset in a grasping experiment implemented in a CoppeliaSim simulation environment.
To address the edge blurring and loss of edge details that occur in convolutional neural network image dehazing methods when dealing with image edge texture, this study proposes a single-image dehazing network design method guided by multilevel edge prior information. The design integrates an edge feature extraction block, an edge feature fusion block, and a dehazing feature extraction block: the edge feature extraction block performs rich edge feature extraction on foggy images and reconstructs the edge image; the edge feature fusion block efficiently fuses the edge prior information with the context information of the foggy image at multiple levels; and the dehazing feature extraction block performs multiscale deep feature extraction on the image and applies an attention mechanism to the important channels. Extensive experiments are conducted on the RESIDE dataset and compared with mainstream dehazing methods, with the peak signal-to-noise ratio and structural similarity index measure on the indoor dataset reaching 37.58 and 0.991, respectively. Additionally, the number of parameters and amount of computation are only 2.024×10⁶ and 24.84×10⁹, respectively, showing that the proposed method effectively defogs the image while reducing the number of parameters and amount of computation. Moreover, the method exhibits good performance and edge detail preservation ability.
To address the challenges of insufficient feature information for small targets in unmanned aerial vehicle (UAV) aerial images and significant variation in target scales, a small target detection algorithm based on an improved YOLOv8n is proposed. The improvements focus on the following aspects: i) a multi-scale feature fusion layer tailored for feature extraction and processing of small targets is designed in the detection layer to improve the ability to collect detailed information on small targets; ii) in the neck network, a triple-feature fusion module and a scale sequence feature fusion module are introduced to effectively fuse the detailed information from low-level feature maps with the semantic information from high-level feature maps, thereby enhancing detection capabilities across targets of varying scales; iii) in the backbone network, ConvNeXt v2 is used to replace the original backbone, thereby enhancing the localization and feature extraction capabilities for small targets against complex backgrounds; iv) to optimize deployment of the model in embedded systems, a layer-adaptive amplitude pruning algorithm is adopted to balance the computational complexity and detection accuracy of the model. The algorithm performance is tested using the VisDrone2019 dataset and compared with several mainstream models. The experimental results indicate that the improved algorithm attains a detection accuracy (mAP50) of 30.2%, which is 3.8 percentage points higher than that of the benchmark model, YOLOv8n, while reducing the parameter count by 1.63×10⁶.
To address the issue of insufficient extraction and fusion of complementary information in infrared and visible image fusion, this study proposes a pulse-coupled dual adversarial learning network. The network utilizes dual discriminators that target infrared objects and visible texture details in the fused images, with the goal of preserving and enhancing modality-specific features. We also introduce a pulse-coupled neural network featuring a combined learning mechanism to effectively extract salient features and detailed information from the images. During the fusion stage, we implement a cross-modality fusion module guided by cross-attention, which further optimizes the complementary information between modalities and minimizes redundant features. We conducted comparative qualitative and quantitative analyses against nine representative fusion methods on the TNO, M3FD, and RoadScene datasets. Results show that the proposed method demonstrates superior performance in evaluation metrics such as mutual information and sum of correlation differences. The method produces fused images with high contrast and rich detail and achieves better results in target detection tasks.
The automatic segmentation of dental images plays a crucial role in the auxiliary diagnosis of oral diseases. To address the issues of large parameter sizes in existing segmentation models and low segmentation accuracy of medical dental images, a lightweight dental image segmentation model, namely, the quadrant oblique displacement (QOD) UNeXt is proposed. First, QOD blocks are designed to displace features along four oblique directions, that is, the upper-left, upper-right, lower-left, and lower-right, to diffuse features and dynamically aggregate tokens, which thereby enhances segmentation accuracy. Second, a localized feature integration (LFI) module is incorporated into the decoder to improve the ability of the model to integrate detailed and global information. Finally, an efficient channel attention (ECA) module is introduced at the skip connections to further fuse local and global features. Experimental results on the STS-MICCAI 2023 and Tufts public datasets demonstrate that QOD-UNeXt significantly improves segmentation accuracy while maintaining a lightweight structure. Therefore, QOD-UNeXt exhibits excellent performance in dental medical image segmentation tasks.
Single-image rain removal aims to remove rain streaks from rainy images and restore rain-free images, providing support for subsequent tasks such as detection and tracking. However, current rain removal methods suffer from problems such as image blur, detail loss, and color distortion after rain removal. To address these limitations, a two-stage rain removal and residual diffusion detail recovery network is proposed. In the first stage, to strengthen local rain streak feature learning and improve the use of global information, a dual-attention block (DAB) and a dual-attention U-shaped network (DAU-Net) are proposed by combining channel attention and multi-head self-attention mechanisms, so that the model can dynamically identify and remove various rain streaks. In the second stage, the powerful generative ability of the diffusion model and its characteristic of first constructing overall semantic information and then capturing detailed information are leveraged: the rain removal results of the first stage are used as conditions to guide the diffusion model to perform reverse sampling and generate residual information, thereby addressing detail loss and image blur. Experimental results show that the proposed method performs well on both synthetic and real datasets. On the Rain100H and Rain100L test sets, peak signal-to-noise ratios (PSNR) of 31.75 dB and 39.12 dB and structural similarities (SSIM) of 0.912 and 0.981 are obtained, respectively. The two-stage rain removal network can effectively remove various complex rain streaks in various scenes and recover more image details, achieving better visual effects.
Stone cultural heritage is a precious carrier of human history and culture. However, weathering severely threatens the long-term preservation of stone cultural heritage sites. Determining the main weathering environment is crucial for developing corresponding protection measures, yet doing so in a non-destructive and contact-free manner is challenging. Therefore, this study proposes a weathering-environment evaluation method based on hyperspectral imaging technology. First, the visible-near-infrared (VNIR) and short-wave infrared (SWIR) spectral data of weathered sandstones in different environments were obtained using hyperspectral imaging technology. The spectra were preprocessed using standard normal variate (SNV) transformation and multiplicative scatter correction (MSC), the spectral features were reduced in dimension via principal component analysis, and multiple machine-learning and deep-learning algorithms were used to establish classification models for the weathering environments of stone cultural heritage. The results show that the classification models established on the SNV-preprocessed spectral data have a higher overall accuracy rate. The deep-learning model outperformed the conventional machine-learning models, attaining a maximum accuracy rate of 0.98. The proposed method enables a rapid, contact-free assessment of the weathering environments of stone cultural heritage sites and provides important support for targeted cultural-heritage protection.
The conventional pattern-recognition star-identification algorithm requires parameters to be set in advance and is slow under high limiting magnitudes. A star-identification algorithm based on the Voronoi graph is therefore proposed. The algorithm extracts stars in the star map and normalizes them to a spherical point set. Next, it calculates the Voronoi graph and the corresponding star polygon features, including the perimeter, area, and number of edges, which are combined into the star-recognition feature. Subsequently, the features are matched against the navigation catalog, and the pointing is calculated from the matched star pairs. Simulation results verify the feasibility of the algorithm and give its matching rate under different conditions. The running time of the algorithm is less than 100 ms in the optimal case, and the effects of position noise, pseudo stars, and missing stars on the matching rate of the algorithm were tested and verified. The recognition rate of the proposed algorithm under different fields of view and limiting magnitudes was obtained experimentally, and the optimal combinations were determined. The recognition rate of the algorithm does not decline under a 1‰ position error. A comparison with the star-identification algorithm using radial and cyclic features shows that the proposed algorithm offers a higher recognition rate, a shorter recognition time, and better robustness to position noise than the conventional pattern-recognition star-identification algorithm. Furthermore, the proposed algorithm requires neither parameter setting nor adjustment.
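A minimal sketch of the Voronoi feature step is given below using SciPy's spherical Voronoi diagram: per-cell edge count, area, and a chord-length perimeter approximation are computed for each normalized star direction. Catalog matching and pointing estimation are not shown, and the perimeter approximation is an assumption of this sketch rather than the paper's definition.

```python
import numpy as np
from scipy.spatial import SphericalVoronoi

def voronoi_star_features(unit_vectors):
    """Per-star Voronoi cell features: number of edges, area, approximate perimeter."""
    sv = SphericalVoronoi(unit_vectors, radius=1.0)
    sv.sort_vertices_of_regions()                     # order vertices of each cell
    areas = sv.calculate_areas()
    feats = []
    for region, area in zip(sv.regions, areas):
        verts = sv.vertices[region]
        closed = np.vstack([verts, verts[:1]])        # close the polygon
        perimeter = np.sum(np.linalg.norm(np.diff(closed, axis=0), axis=1))  # chord-length approx.
        feats.append((len(region), area, perimeter))
    return np.array(feats)
```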
Bronze inscriptions are invaluable for studying ancient politics, economy, and culture. However, minimal stylistic variations and the predominance of unlabeled data in unearthed inscriptions pose challenges for computer-aided inscription analysis. To address this issue, a bronze inscription age clustering network based on a deep unsupervised clustering model is proposed. In the first stage, a ResNet50-based feature extraction module is constructed, incorporating an improved multiscale CBAM attention mechanism. This enhancement allows the network to simultaneously capture detailed and global features, thereby overcoming the limitations of traditional feature extraction methods that struggle with incomplete feature representation for inscriptions of similar ages. In the second stage, K-means clustering is applied to the extracted features. The clustering branch results serve as pseudo-labels, which are then used to compute the cross-entropy loss against the predictions of the model's prediction branch. In the third stage, iterative training is performed using cross-entropy loss backpropagation to continuously optimize the model parameters, enhancing the accuracy of feature extraction and clustering. The experimental results demonstrate that the proposed network achieves an overall accuracy of 89.43% on the standard inscription dataset, surpassing traditional unsupervised clustering networks by more than 14%.
Problems such as uneven point cloud density distribution and the limited separability of single-point reflectance arise in the task of semantic segmentation of outcrop point clouds. To achieve efficient and accurate lithology segmentation of outcrop point clouds, this research proposes a lithology segmentation method (MGECA) that combines an efficient channel attention (ECA) mechanism with multiple eigenvalue features of outcrop voxels. First, the method voxelizes the raw point cloud and computes the spatial-spectral feature parameters of each voxel. Then, a multi-granularity convolutional neural network is used for multi-scale feature fusion. Next, the classical self-attention mechanism in the Transformer model is improved using ECA, allowing the weighted encoding of feature maps so that the model can establish global spatial and spectral correlations. Finally, a dual-channel group convolution is designed to connect the convolutional neural network and the ECA module, achieving spatial and spectral feature integration while reducing computational complexity. Experimental results show that MGECA achieves a total lithology recognition accuracy of 90.6% and a mean intersection over union of 70.4% on the Crescent Bay laser outcrop point cloud dataset, representing improvements of 31.7 percentage points and 24.7 percentage points, respectively, compared with the DGPoint model. The results indicate that the proposed method has a significant advantage in segmentation performance within outcrop point cloud scenarios compared with existing methods.
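The specific spatial-spectral voxel features (the "multiple eigenvalues" of MGECA) are defined in the paper; the NumPy sketch below shows one common choice for illustration, grouping points into voxels and computing the centroid, covariance eigenvalues, and reflectance statistics of each voxel.

```python
import numpy as np

def voxelize(points, reflectance, voxel_size=0.5):
    """Group points into voxels and compute simple per-voxel spatial/spectral features (sketch)."""
    idx = np.floor(points / voxel_size).astype(np.int64)          # integer voxel indices
    keys, inverse = np.unique(idx, axis=0, return_inverse=True)
    features = []
    for v in range(len(keys)):
        mask = inverse == v
        pts, refl = points[mask], reflectance[mask]
        cov = np.cov(pts.T) if mask.sum() > 2 else np.zeros((3, 3))
        eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]          # geometric eigenvalue features
        features.append(np.concatenate([pts.mean(0), eigvals,
                                        [refl.mean(), refl.std()]]))
    return keys, np.array(features)
```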
Infrared detection serves as a crucial tool for remote search and surveillance, and it plays a significant role in many applications. To enhance the infrared detection accuracy of small targets in complex backgrounds, a Trans-YOLO detection framework based on an improved YOLOv8 model with RT-DETR is proposed. First, to avoid the issue of non-maximum suppression (NMS) in YOLOv8 erroneously suppressing true targets, the Head component of YOLOv8 is replaced with the Decoder & Head from RT-DETR. Furthermore, to address the weak signal strength and small size of infrared small targets, an RGCSPELAN module is designed to enable the detection network to perform more fine-grained processing of the input features. Finally, to reduce the semantic disparity between deep and shallow features, a new feature fusion strategy, the CAFM-based fusion (CAFMFusion) mechanism, is designed to facilitate the flow of different types of feature information within the network, thereby enhancing the model's ability to detect targets of varying sizes. Experimental results show that the proposed Trans-YOLO model achieves 86.1% and 99.5% mean average precision at an intersection over union (IoU) threshold of 0.5 on two public datasets with complex scenarios, representing improvements of 7.7 percentage points and 3.0 percentage points over the original YOLOv8 model, respectively. Additionally, the model achieves processing speeds of 371.9 frame/s and 369.4 frame/s on the two datasets, respectively, effectively balancing accuracy and speed.
Deep learning-based object detection algorithms have matured considerably. However, detecting novel classes from a limited number of samples remains challenging, as deep learning can easily lead to feature space degradation under few-shot conditions. Most existing methods employ a holistic fine-tuning paradigm that pretrains on base classes with abundant samples and subsequently constructs feature spaces for the novel classes. However, the novel class implicitly constructs its feature space based on multiple base classes, and its structure is relatively dispersed, leading to poor separability between the base and novel classes. This study proposes a method that first associates a novel class with a similar base class and then discriminates each class for few-shot object detection. By introducing dynamic region-of-interest heads, the model improves the utilization of training samples and explicitly constructs a feature space for novel classes based on the semantic similarity between the two. Furthermore, by decoupling the classification branches of the base and novel classes, integrating channel attention modules, and implementing a boundary loss function, we substantially improve the separability between the classes. Experimental results on the standard PASCAL VOC dataset reveal that our method surpasses the nAP50 mean scores of TFA, MPSR, and DiGeo by 10.2, 5.4, and 7.8, respectively.
To solve the problems of the high missed detection rate of single-modality images and the low detection speed of existing dual-modality image fusion in pedestrian detection tasks under low-visibility scenes, a lightweight pedestrian detection network based on dual-modality relevant image fusion is proposed. The network model is designed based on YOLOv7-Tiny, and RAMFusion is embedded in the backbone network to extract and aggregate complementary features of the two modalities. The 1×1 convolution for feature extraction is replaced by coordinate convolution with spatial awareness, Soft-NMS is introduced to reduce missed detections of pedestrians in crowds, and an attention mechanism module is embedded to improve the detection accuracy of the model. Ablation experiments on the public infrared and visible pedestrian dataset LLVIP show that, compared with other fusion methods, the proposed method reduces the pedestrian missed detection rate and significantly increases detection speed. Compared with YOLOv7-Tiny, the detection accuracy of the improved model is increased by 2.4%, and the detection speed reaches 124 frame/s, which can meet the requirements of real-time pedestrian detection in low-visibility scenes.
Building outlines serve as data sources for various applications. However, accurately extracting outlines from scattered and irregular point clouds presents a challenge. To address this issue, a method utilizing the concept of the multi-level minimum bounding rectangle (MBR) is proposed for extracting precise outlines of regular buildings. Initially, the boundary points are segmented into groups using an iterative region growing technique. Subsequently, the group with the maximum boundary points is utilized to identify the initial MBR. The initial MBR is then decomposed into multi-level rectangles, ensuring that the boundary points align with rectangles of different levels. Ultimately, the outlines are generated using the multi-level MBR approach. To evaluate the effectiveness of the proposed method, experiments were conducted on regular buildings in Vaihingen. The results demonstrate that the proposed method achieves an accurate initial MBR with a slightly enhanced efficiency compared to the minimum area and the maximum overlapping methods. The root mean square errors of the extracted outline corners measure 0.71 m, surpassing the performance of four other comparison methods. In conclusion, the proposed method enables the effective extraction of outlines from regular buildings, providing a valuable contribution to subsequent three-dimensional reconstruction tasks.
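Only the initial MBR step is easy to sketch from the abstract; the snippet below uses OpenCV's minimum-area rectangle on a boundary-point group, while the multi-level rectangle decomposition and outline regularization follow the paper and are not reproduced here.

```python
import numpy as np
import cv2

def initial_mbr(boundary_points):
    """Minimum-area bounding rectangle of a group of 2D boundary points (initial MBR only)."""
    pts = np.asarray(boundary_points, dtype=np.float32)
    rect = cv2.minAreaRect(pts)          # ((cx, cy), (w, h), angle) of the rotated rectangle
    corners = cv2.boxPoints(rect)        # 4 corner coordinates of the MBR
    return corners, rect[2]

# toy usage with a noisy axis-aligned rectangle of boundary points
pts = np.random.rand(200, 2) * [10.0, 4.0]
corners, angle = initial_mbr(pts)
print(corners, angle)
```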
Traditional non-real-time image stitching methods can easily lead to global stitching interruption due to local image misalignment. In addition, microscopic images contain numerous similar microstructures, causing problems such as long feature detection times and high mismatch rates. To address these issues, a microscopic image prediction stitching algorithm based on carrier stage motion information is proposed. First, the size of the overlapping area between adjacent images is determined by controlling the XY-axis movement distance of the electric carrier stage, and the speeded-up robust features (SURF) algorithm is used to detect feature points in the overlapping area of the image. Second, the range of feature points to be matched is predicted based on the positional relationship of the images, and the feature point with the minimum Euclidean distance is selected within the predicted range for matching. Finally, matching point pairs are coarsely screened by the slope of the matching feature points, and precise matching is performed using the random sample consensus algorithm to calculate the homography matrix and complete the image stitching; an improved weighted average algorithm is used to fuse the stitched images. Experimental results show that the proposed algorithm improves the matching rate by 7.95% to 26.52% compared with the brute-force and fast library for approximate nearest neighbors matching algorithms, effectively improving the registration accuracy. Moreover, at a resolution of 1600×1200, the algorithm achieves a multi-image stitching rate of 2 frame·s⁻¹, outperforming the AutoStitch software.
To address challenges involving low accuracy in feature point matching, low matching speed, cracks at the stitching seams, and long stitching times in vehicle undercarriage threat detection imaging, an optimized image-stitching algorithm is proposed. First, the features from accelerated segment test (FAST) corner detection algorithm is used to extract image feature points, and the binary robust invariant scalable keypoints (BRISK) algorithm is then used to describe the retained feature points. Second, the fast library for approximate nearest neighbors (FLANN) algorithm is used for coarse matching. Next, the progressive sample consensus (PROSAC) algorithm is used for feature point purification. Finally, the Laplacian pyramid algorithm is used for image fusion and stitching. The experimental results show that, compared with the SIFT, SURF, and ORB algorithms, the proposed algorithm improves the image feature matching accuracy by 13.10 percentage points, 8.59 percentage points, and 11.27 percentage points, respectively, on image data of dangerous objects under the vehicle. The matching time is shortened by 76.26%, 85.36%, and 10.27%, respectively, and the image-stitching time is shortened by 63.73%, 64.21%, and 20.07%, respectively, with no evident cracks at the stitching seams. Therefore, the image-stitching algorithm based on the combination of FAST, BRISK, FLANN, PROSAC, and the Laplacian pyramid is a high-quality, fast image-stitching algorithm.
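A condensed sketch of the matching pipeline is shown below using OpenCV; the ratio-test threshold is an assumption, PROSAC is exposed as the USAC_PROSAC flag only in recent OpenCV builds (older builds fall back to RANSAC here), and the Laplacian-pyramid fusion step is omitted.

```python
import cv2
import numpy as np

def match_and_align(img1, img2):
    """FAST keypoints + BRISK descriptors + FLANN (LSH) matching + robust homography (sketch)."""
    fast, brisk = cv2.FastFeatureDetector_create(), cv2.BRISK_create()
    kp1 = fast.detect(img1, None); kp1, des1 = brisk.compute(img1, kp1)
    kp2 = fast.detect(img2, None); kp2, des2 = brisk.compute(img2, kp2)

    # LSH index parameters are required for binary descriptors such as BRISK
    flann = cv2.FlannBasedMatcher(dict(algorithm=6, table_number=6,
                                       key_size=12, multi_probe_level=1),
                                  dict(checks=50))
    matches = flann.knnMatch(des1, des2, k=2)
    good = []
    for pair in matches:
        if len(pair) == 2 and pair[0].distance < 0.7 * pair[1].distance:   # ratio-test screening
            good.append(pair[0])

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    method = getattr(cv2, "USAC_PROSAC", cv2.RANSAC)   # PROSAC if available, else RANSAC
    H, mask = cv2.findHomography(src, dst, method, 3.0)
    return H
```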
Traditional multi-scale fusion methods cannot highlight target information and often miss details and textures in fusion images. Therefore, an infrared and visible light image fusion method based on gradient domain-guided filtering and saliency detection is proposed. This method utilizes gradient domain-guided filtering to decompose the input image into basic and detail layers and uses a weighted global contrast method to decompose the basic layer into feature and difference layers. In the fusion process, phase consistency combined with weighted local energy, local entropy combined with weighted least squares optimization, and average rules are used to fuse feature layers, difference layers, and detail layers. The experimental results show that the multiple indicators of the proposed fusion method are significantly improved compared to those of other methods, resulting in a superior visual effect of the image. The proposed method is highly effective in highlighting target information, preserving contour details, and improving contrast and clarity.
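The paper uses gradient domain-guided filtering and saliency-based rules; purely to illustrate the two-scale (base/detail) decomposition and recombination, the sketch below substitutes the basic guided filter of He et al. and simple average/maximum fusion rules, assuming single-channel grayscale inputs of equal size.

```python
import cv2
import numpy as np

def guided_filter(I, p, r=8, eps=1e-3):
    """Basic guided filter (He et al.); the paper uses the gradient domain-guided variant."""
    mean = lambda x: cv2.boxFilter(x, -1, (r, r))
    mI, mp = mean(I), mean(p)
    cov_Ip = mean(I * p) - mI * mp
    var_I = mean(I * I) - mI * mI
    a = cov_Ip / (var_I + eps)
    b = mp - a * mI
    return mean(a) * I + mean(b)

def fuse(ir, vis):
    """Two-scale fusion sketch: base layers averaged, detail layers max-selected."""
    ir, vis = ir.astype(np.float32) / 255, vis.astype(np.float32) / 255
    base_ir, base_vis = guided_filter(ir, ir), guided_filter(vis, vis)
    det_ir, det_vis = ir - base_ir, vis - base_vis
    base = 0.5 * (base_ir + base_vis)                                    # simple base-layer rule
    detail = np.where(np.abs(det_ir) > np.abs(det_vis), det_ir, det_vis) # keep stronger detail
    return np.clip(base + detail, 0, 1)
```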
Currently, most point cloud semantic segmentation methods based on graph convolution overlook the critical aspect of edge construction, resulting in an incomplete representation of the features of local regions. To address this limitation, we propose AE-GCN, a novel graph convolutional network that integrates edge enhancement with an attention mechanism. First, we incorporate neighboring point features into the edges rather than solely considering feature differences between the central point and its neighboring points. Second, an attention mechanism is introduced to ensure a more comprehensive utilization of local information within the point cloud. Finally, we employ a U-shaped segmentation structure to improve the network's adaptability to semantic point cloud segmentation. Experiments on two public datasets, Toronto_3D and S3DIS, demonstrate that AE-GCN outperforms most current methods. Specifically, on the Toronto_3D dataset, AE-GCN achieves a competitive mean intersection over union of 80.3% and an overall accuracy of 97.1%; on the S3DIS dataset, it attains a mean intersection over union of 68.0% and an overall accuracy of 87.2%.
The multi-energy computed tomography (CT) technique can resolve the absorption rates of various energy X-ray photons in human tissues, representing a significant advancement in medical imaging. By addressing the challenge of swift degradation in reconstructed image quality, primarily due to non-ideal effects such as quantum noise, a dual-stream Transformer network structure is introduced. This structure utilises the shifted-window multi-head self-attention denoising approach for projection data. The shifted windows Transformer extracts the global features of the projection data, while the locally-enhanced window Transformer focuses on local features. This dual approach capitalizes on the non-local self-similarity of the projection data to maintain its inherent structure, subsequently merged by residual convolution. For model training oversight, a hybrid loss function incorporating non-local total variation is employed, which enhances the network model's sensitivity to the inner details of the projected data. Experimental results demonstrate that our method's processed projection data achieve a peak signal to noise ratio (PSNR) of 37.7301 dB, structure similarity index measurement (SSIM) of 0.9944, and feature similarity index measurement (FSIM) of 0.9961. Relative to leading denoising techniques, the proposed method excels in noise reduction while preserving more inner features, crucial for subsequent accurate diagnostics.
A depth image super-resolution reconstruction network (DF-Net) based on dual feature fusion guidance is proposed to address the issues of texture transfer and depth loss in color-image-guided depth image super-resolution reconstruction algorithms. To fully utilize the correlation between depth and intensity features, a dual channel fusion module (DCM) and a dual feature guided reconstruction module (DGM) are used to perform depth recovery and reconstruction in the network model. The multi-scale features of depth and intensity information are extracted using an input pyramid structure: DCM performs feature fusion and enhancement between channels based on a channel attention mechanism for depth and intensity features; DGM provides dual feature guidance for reconstruction by adaptively selecting and fusing depth and intensity features, increasing the guidance effect of depth features, and overcoming the issues of texture transfer and depth loss. The experimental results show that the peak signal-to-noise ratio (PSNR) and root mean square error (RMSE) of the proposed method are superior to those of methods such as RMRF, JBU, and Depth Net. Compared to the other methods, the PSNR value of the 4× super-resolution reconstruction results increased by an average of 6.79 dB, and the RMSE decreased by an average of 0.94, thus achieving good depth image super-resolution reconstruction results.
Single-target tracking algorithms based on the Siamese architecture suffer from untimely target state updates. To address this issue, a generic template update mechanism is proposed based on the dynamic fusion of templates and memory information. The mechanism uses a dual-module fusion update strategy. The short-term memory information of the search feature map is fused using a memory fusion module to capture target variations. The trusted tracking result of the previous frame is used as a dynamic template. The original and dynamic templates are fused using a weight fusion module from the correlated feature perspective to achieve more accurate target localization using the original and short-term memories during the tracking process. The template update mechanism is applied to three mainstream algorithms, SiamRPN, SiamRPN++, and RBO, and experiments are conducted on the VOT2019 public dataset. The results show that the performance of the algorithms is effectively improved after applying the mechanism. Specifically, for the SiamRPN++ algorithm, the average overlap expectation is improved by 6.67%, the accuracy is improved by 0.17%, and the robustness is enhanced by 5.39% after applying the template update mechanism. In addition, the SiamRPN++ algorithm with the mechanism has better tracking performance in complex scenarios with occlusion, deformation, and background interference.
Image segmentation is an important research direction in computer vision. Fuzzy clustering methods have been widely applied in image segmentation due to their unsupervised nature. However, traditional fuzzy clustering methods often fail to segment images with high-intensity noise and complex shapes. To solve this problem, a weighted factor is proposed based on saliency detection to construct a weighted filter and a pixel correlation model, which improves the noise resistance of the algorithm. The proposed weighted filter outperforms the optimal results of the traditional filter in terms of structural similarity by 0.1. Moreover, a kernel metric is introduced to accommodate the segmentation needs of complex images. Extensive experimental results on synthetic, natural, remote sensing and medical images demonstrate that the proposed algorithm outperforms the traditional methods in visual effects and improves the segmentation accuracy by 2% compared with the optimal results of traditional methods.
Aiming at the characteristics of small-target segmentation in pointer meter images and the limitations of existing methods, a bilateral deep learning backbone network called BiUnet is proposed for pointer meter image segmentation, which combines spatial details and semantic features. Starting from the BiSeNet V2 algorithm, the semantic branch, detail branch, and bilateral fusion layer are redesigned in this network. First, the ConvNeXt convolution block is used to adjust and optimize the detail branch to improve the feature extraction ability of the algorithm for pointer and scale line boundary details. Second, the semantic branch is redesigned based on the advantages of the U-shape encoder-decoder structure to integrate semantic information at different scales, which improves the segmentation ability of the semantic branch for small objects such as pointers and scales. Finally, a bilateral-guide splicing aggregation layer is proposed to fuse the detail branch and semantic branch features. The ablation experiments on the self-made instrument image segmentation dataset confirm the validity and feasibility of the proposed network design scheme. Comparative experiments with different backbone networks are carried out on the instrument dataset, and the results show that the mean intersection over union (mIoU) of BiUnet for instrument segmentation reaches 88.66%, which is 8.64 percentage points higher than that of the BiSeNet V2 network (80.02%). Both achieve better segmentation accuracy than common backbone networks based on Transformers and pure convolution.
We propose a multi-stage underwater image enhancement model that can simultaneously fuse spatial details and contextual information. The model is structured in three stages: the first two stages utilize encoder-decoder configurations, and the third entails a parallel attention subnet. This design enables the model to concurrently learn spatial nuances and contextual data. A supervised attention module is incorporated for enhanced feature learning. Furthermore, a cross-stage feature fusion mechanism is designed to consolidate the intermediate features from preceding and succeeding subnets. Comparative tests with other underwater enhancement models demonstrate that the proposed model outperforms most extant algorithms in subjective visual quality and objective evaluation metrics. Specifically, on the Test-1 dataset, the proposed model realizes a peak signal-to-noise ratio of 26.2962 dB and a structural similarity index of 0.8267.
Pulmonary nodule computed tomography (CT) images have diverse details and interclass similarity. To address this problem, a dual-path cross-fusion network combining the advantages of convolutional neural network (CNN) and Transformer is constructed to classify pulmonary nodules more accurately. First, based on windows multi-head self-attention and shifted windows multi-head self-attention, a global feature block is constructed to capture the morphological features of nodules; then, a local feature block is constructed based on large kernel attention, which is used to extract internal features such as the texture and density of nodules. A feature fusion block is designed to fuse local and global features of the previous stage so that each path can collect more comprehensive discriminative information. Subsequently, Kullback-Leibler (KL) divergence is introduced to increase the distribution difference between features of different scales and optimize network performance. Finally, a decision-level fusion method is used to obtain the classification results. Experiments are conducted on the LIDC-IDRI dataset, and the network achieves a classification accuracy, recall, precision, specificity, and area under curve (AUC) of 94.16%, 93.93%, 93.03%, 92.54%, and 97.02%, respectively. Experimental results show that this method can classify benign and malignant pulmonary nodules effectively.
A self-adaptive underwater image enhancement algorithm is proposed to address the issues of color distortion, decreased contrast, and blurring caused by the imaging environment in underwater images. First, based on the local and global color biases in the Lab color space, color compensation is applied to attenuated colors, and thereafter the gray-world algorithm is used to restore the color balance of underwater images. Second, automatic color scale and gamma correction methods are used to adjust the information of each channel to obtain images with high dynamic range and high illumination. Finally, high-frequency information is obtained through unsharp masking, and image details are enhanced to obtain clear underwater images. The proposed algorithm utilizes statistical information, such as the color deviation and mean square deviation of the image, to achieve adaptive processing. The experimental results show that the proposed algorithm can effectively remove color deviation from underwater images, improve image contrast and clarity, and enhance visual effects. Compared with other algorithms, it has advantages in processing efficiency and time.
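The following minimal Python sketch illustrates three of the building blocks mentioned above (gray-world color balance, gamma correction, and unsharp masking) using OpenCV and NumPy; the constants and the file name are illustrative assumptions, and the adaptive selection of parameters from image statistics described in the abstract is not reproduced.

```python
import cv2
import numpy as np

def gray_world(img):
    # Scale each channel so its mean matches the global gray mean.
    f = img.astype(np.float64)
    means = f.reshape(-1, 3).mean(axis=0)
    return np.clip(f * (means.mean() / means), 0, 255).astype(np.uint8)

def gamma_correct(img, gamma=0.8):
    # Brighten or darken via a lookup table.
    lut = np.array([(i / 255.0) ** gamma * 255 for i in range(256)], dtype=np.uint8)
    return cv2.LUT(img, lut)

def unsharp_mask(img, sigma=3.0, amount=1.0):
    # Add back high-frequency detail obtained by subtracting a blurred copy.
    blurred = cv2.GaussianBlur(img, (0, 0), sigma)
    return cv2.addWeighted(img, 1 + amount, blurred, -amount, 0)

img = cv2.imread("underwater.jpg")                     # hypothetical input
out = unsharp_mask(gamma_correct(gray_world(img)))
```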
In recent years, with the development of deep learning, feature extraction methods based on deep learning have shown promising results in hyperspectral data processing. We propose a multi-scale hyperspectral image feature extraction method with an attention mechanism, comprising two parts that extract spectral features and spatial features, respectively. We use a score fusion strategy to combine these features. In the spectral feature extraction network, the attention mechanism is used to alleviate the vanishing gradient problem caused by the high dimensionality of spectral data, and multi-scale spectral features are extracted. In the spatial feature extraction network, the attention mechanism helps the branch networks extract important information by making the network backbone focus on important parts of the neighborhood. Five spectral feature extraction methods, three spatial feature extraction methods, and three spatial-spectral joint feature extraction methods are used to perform comparative experiments on three datasets. The experimental results show that the proposed method can steadily and effectively improve the classification accuracy of hyperspectral images.
A tone mapping algorithm for high dynamic range (HDR) images based on the improved Laplacian pyramid is proposed to enhance the rendering effect of HDR images on ordinary displays. The algorithm decomposes the preprocessed image into high-frequency and low-frequency layers, which are then fed into two feature extraction sub-networks. The algorithm combines their output images having different features via a fine-tuning network and finally obtains a low dynamic range image with a superior perceptual effect. Furthermore, the algorithm designs an adaptive group convolution module to enhance the ability of the sub-network to extract local and global features. The test results show that, compared to the existing advanced algorithms, the proposed algorithm can compress the brightness of the HDR image better, retain more image details, and achieve superior objective quality and subjective perception.
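For reference, a minimal Laplacian-pyramid decomposition and reconstruction can be sketched as follows with OpenCV; the learned sub-networks, adaptive group convolution module, and fine-tuning network described above are not reproduced, and the number of levels is an illustrative assumption.

```python
import cv2
import numpy as np

def build_laplacian_pyramid(img, levels=4):
    gp = [img.astype(np.float32)]
    for _ in range(levels):
        gp.append(cv2.pyrDown(gp[-1]))
    lp = []
    for i in range(levels):
        up = cv2.pyrUp(gp[i + 1], dstsize=(gp[i].shape[1], gp[i].shape[0]))
        lp.append(gp[i] - up)            # high-frequency band at this level
    lp.append(gp[-1])                    # coarsest low-frequency residual
    return lp

def reconstruct(lp):
    img = lp[-1]
    for band in reversed(lp[:-1]):
        img = cv2.pyrUp(img, dstsize=(band.shape[1], band.shape[0])) + band
    return np.clip(img, 0, 255).astype(np.uint8)
```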
The two common degradations of underwater images are color distortion and blurred detail caused by the absorption and scattering of light in water. We propose an underwater image-enhancement algorithm model based on multi-scale attention and contrastive learning to acquire underwater images with bright colors and clear details. The model adopts the encoding-decoding structure as the basic framework. To extract more fine-grained features, a multi-scale channel pixel attention module is designed in the encoder. The module uses three parallel branches to extract features at different levels in the image. In addition, the features extracted by the three branches are fused and introduced to the subsequent encoder and the corresponding decoding layer to improve the network's feature extraction ability and enhance details. Finally, a contrastive-learning training network is introduced to improve the quality of enhanced images. Several experiments prove that the images enhanced by the proposed algorithm have vivid colors and complete detailed information. The average values of the peak signal-to-noise ratio and structural similarity index reach 25.46 dB and 0.8946, respectively, which are increases of 4.4% and 2.8%, respectively, compared with the other methods. The average values of the underwater color image quality index and information entropy are 0.5802 and 7.6668, respectively, which are increases of at least 2% compared with the other methods. The number of feature matching points is increased by 24 compared to the original images.
Accurate estimation of the point spread function (PSF) is crucial for restoring blurry images caused by motion blur. This paper proposes a method using window functions to improve the accuracy of PSF parameter estimation and eliminate the interference of the central bright line in the spectrogram with the blur-angle estimation. To achieve this, a two-dimensional discrete Fourier transform and logarithmic operation are performed on the motion-blurred image, followed by calculation of the power spectrogram. Thereafter, the Hanning window function is applied to the spectrogram, and the image is processed using median filter smoothing and binarization, in combination with morphological operations and Canny edge detection. Finally, the blur direction is obtained using the Radon transform. Based on the blur-direction result, the spectrogram of the motion-blurred image is processed by the Radon transform along the blur angle. The distance between the negative peaks is analyzed to obtain the dark fringe spacing, and the blur length is calculated according to the relation between the dark fringe spacing and the blur length. This completes the estimation of the two point spread function parameters. Comparing the proposed algorithm with existing ones, the results show an improvement in the accuracy of parameter estimation and a reduction in the ringing and artifact phenomena generated during restoration. The proposed method makes full use of image information and is easy to operate.
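A hedged sketch of the direction-estimation step is shown below: a 2D Hanning window is applied before computing the log power spectrum, the spectrum is binarized, and the angle of the strongest Radon projection is taken as the blur direction. The threshold rule and angular resolution are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np
from skimage.transform import radon

def estimate_blur_direction(blurred):
    img = blurred.astype(np.float64)
    # 2D Hanning window suppresses boundary leakage (the bright central cross).
    window = np.hanning(img.shape[0])[:, None] * np.hanning(img.shape[1])[None, :]
    spectrum = np.fft.fftshift(np.fft.fft2(img * window))
    log_power = np.log1p(np.abs(spectrum) ** 2)
    # Simple binarization keeps only the dominant stripe pattern of the spectrum.
    binary = (log_power > log_power.mean() + 2 * log_power.std()).astype(float)
    angles = np.arange(0.0, 180.0, 0.5)
    sinogram = radon(binary, theta=angles, circle=False)
    return angles[np.argmax(sinogram.max(axis=0))]   # angle of the strongest projection
```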
This paper proposes a nondestructive detection method for detecting wall disease by employing multi-spectral imaging based on convolutional neural networks. This method aims to address issues such as low detection efficiency and easy interference by subjective factors that are associated with the use of artificial survey methods in traditional wall disease detection. The minimum noise separation method is used to preprocess the multispectral imaging data of a city wall, which reduces the dimensions of the data while preserving the original data features and reducing data noise. To address the problem of low classification accuracy caused by the mixed and diverse pixels of different types of wall damage, a convolution operation is used to extract the features of wall damage, with the most important features retained and irrelevant features removed, resulting in a sparse network model. The extracted features are integrated and sorted through a fully connected layer. Two dropout layers are included to prevent overfitting. Finally, on a wall multispectral dataset, the trained convolutional neural network classification model is used to detect wall damage at the pixel level, and the predicted results are displayed visually. Experimental results show that the overall accuracy and Kappa coefficient are 93.28% and 0.91, respectively, demonstrating the effectiveness of the proposed method, which is crucial for enhancing the detection accuracy of wall disease and fully understanding its distribution.
The recognition accuracy and efficiency of coal and gangue have a great impact on coal-production capacity but the existing recognition and separation methods of these minerals still have deficiencies in terms of separation equipment, accuracy, and efficiency. Herein, a coal and gangue recognition method is presented based on two-channel pseudocolor lidar images and deep learning. Firstly, a height threshold is set to remove the interference information from the target ore based on the lidar distance channel information. Concurrently, the original point-cloud data are projected in a reduced dimension to quickly obtain the reflection intensity information and surface texture features of coal gangue. The intensity and distance channels after dimensional reduction are then fused to construct the dual-channel pseudocolor image dataset for coal and gangue. On this basis, the DenseNet-121 is optimized for the pseudocolor dataset, and the DenseNet-40 network is used for model training and testing. The results show that the recognition accuracy of coal gangue is 94.56%, which proves that the two-channel pseudo-color image acquired by lidar has scientific and engineering value in the field of ore recognition.
An integral task in self-service baggage check is the detection of whether pallets are added to the baggage. Pallets loaded with the baggage are mostly obscured; therefore, a fast detection method based on a multi-layer skeleton model registration is proposed to address this issue. A point cloud skeleton model and a point-line model are constructed using a 3D point cloud model to describe the characteristics of the pallet. During online detection, the designed banded feature description is used to capture the border point clouds. Moreover, the proposed point-line potential energy iterative algorithm is utilized to register the point-line model and border points as well as to realize pallet discrimination. An iterative nearest point registration based on random sampling consistency is used to achieve accurate registration and pose calculation as well as to obtain the accurate pose of the pallet. Experimental results show that the algorithm can maintain an accuracy of 94% even when 70% of the pallet point cloud data are missing. In addition, the speed of the proposed algorithm exceeds that of a typical algorithm by more than six times.
To address the low robustness and positioning accuracy of the traditional visual simultaneous localization and mapping (SLAM) system in a dynamic environment, this study proposes a robust visual SLAM algorithm for indoor dynamic environments based on the ORB-SLAM2 framework. First, a semantic segmentation thread uses the improved lightweight semantic segmentation network YOLOv5 to obtain the semantic mask of the dynamic object and selects the ORB feature points through the semantic mask. Simultaneously, the geometric thread detects the motion-state information of the dynamic objects using weighted geometric constraints. Then, an algorithm is proposed to assign weights to semantic static feature points, and local bundle adjustment (BA) joint optimization is performed on the camera pose and feature point weights, effectively reducing the influence of the dynamic feature points. Finally, experiments are conducted on the TUM dataset and in a real indoor dynamic environment. Compared with the original ORB-SLAM2 algorithm, the proposed algorithm effectively improves the positioning accuracy of the system on highly dynamic datasets, reducing the root mean square error (RMSE) of the absolute and relative trajectory errors by more than 96.10% and 92.06%, respectively.
Nowadays, micro-video event detection exhibits great potential for various applications. For event detection, previous studies usually ignore the importance of keyframes and mostly focus on the exploration of explicit attributes of events, neglecting latent semantic representations and their relationships. To address these problems, a deep dynamic semantic correlation method is proposed for micro-video event detection. First, a frame importance evaluation module is designed to obtain more discriminative keyframe scores, in which the joint structure of a variational autoencoder and a generative adversarial network can strengthen the importance of information to the greatest extent. Then, the intrinsic correlations between keyframes and the corresponding features are captured through a keyframe-guided self-attention mechanism. Finally, a hidden event attribute correlation module based on dynamic graph convolution is designed to learn the latent semantics and corresponding correlation patterns of events. The obtained latent semantic-aware representations are used for final micro-video event detection. Experiments performed on public datasets and the newly constructed micro-video event detection dataset demonstrate the effectiveness of the proposed method.
To address the problems of excessive loss of detail information, unclear texture, and low contrast during the fusion of infrared and visible images, this study proposes an infrared and visible image fusion method based on image enhancement and secondary nonsubsampled contourlet transform (NSCT) decomposition. First, an image enhancement algorithm based on guided filtering is used to improve the visibility of visible images. Second, the enhanced visible and infrared images are decomposed by NSCT to obtain low- and high-frequency subbands, and different fusion rules are used in different subbands to obtain the NSCT coefficients of the primary fused image. The NSCT coefficients of the primary fused image are reconstructed and decomposed into low- and high-frequency subbands, which are then fused with the low- and high-frequency subbands of the visible light image, respectively, to obtain the NSCT coefficients of the secondary fused image. Finally, the NSCT coefficients of the secondary fused image are reconstructed by inverse transformation to obtain the final fused image. Numerous experiments are conducted on public datasets, using eight evaluation indicators to compare the proposed method with eight multiscale fusion methods. Results show that the proposed method can retain more details of the source image, improve the edge contour definition and overall contrast of the fusion results, and has advantages in terms of subjective vision and objective evaluation indicators.
The current mainstream lightweight object detection models exhibit low detection accuracy in unmanned aerial vehicle (UAV) photography scenes. This study introduces a high-precision and lightweight aerial photography image object detection model based on YOLOv8s, named LEFE-YOLOv8. First, an enhanced feature extraction convolution (EFEConv) incorporating an attention mechanism is developed and integrated with partial channel convolution (PConv) and 1×1 convolution to create a lightweight enhanced feature extraction module. This integration augments the model's feature extraction capabilities and reduces the number of parameters and computational complexity. Subsequently, a lightweight dynamic upsampling operator module is incorporated into the feature fusion network, effectively addressing the information loss problem during the upsampling process in high-level feature networks. Finally, a detection head with multi-scale modules is designed to enhance the network model's multi-scale detection capabilities. The final experimental results demonstrate that, compared with the benchmark model, the improved model achieves an average accuracy of 42.3% and 83.9% on the VisDrone2019 and HIT-UAV datasets, respectively, with fewer than 10×10⁶ parameters. These results establish the model's suitability for aerial image object detection tasks.
One of the difficulties in expanding light field data is to extend the viewpoint and image plane supports simultaneously while maintaining good space-angle consistency. In this paper, we propose using a neural light field network to represent rays parameterized by two planes, generate rays that do not exist in the original sub-light field data, and extend both the viewpoint and image plane branches. To estimate the error of the generated rays in the extended part, we refer to the error between the generated rays and the original data in the overlapping area of the sub-light fields, which allows us to determine the proportion of data with a good generation effect in the extended part. We also analyze the influence of the size of the overlapping area of the sub-light fields on the quality of the extended light field. Experimental results on Blender simulation data show that the proposed method can simultaneously expand the viewpoint and image plane branches of the sub-light field, and the epipolar plane images (EPI) of the extended part maintain good space-angle consistency. When the proportion of overlapping regions of sub-light field data increases from 42.9% to 77.8%, the proportion of data with a good generation effect in the extended regions increases from 82.91% to 84.68%. This analysis provides guidance for the design of sub-light field data when expanding light field data.
Fine-grained bird recognition tasks frequently face the challenges of small interclass and large intraclass differences. In this study, we propose an incremental learning method for fine-grained bird recognition based on prompt learning. Learnable visual prompts are first introduced into the incremental learning model to alleviate catastrophic forgetting. For fine-grained bird recognition, text information of different granularities is introduced as text prompts in the incremental learning model, which are then fused with the visual prompts to learn the characteristics of different birds from coarse to fine and to improve fine-grained bird recognition accuracy. Numerical experiments on the CUB-200-2011 dataset show that the proposed model achieves better image recognition accuracy than other incremental learning models. For general image recognition tasks, the proposed method exhibits higher recognition accuracy and better anti-forgetting performance on CIFAR-100 and 5-datasets.
In response to concerns about the insufficient visibility of target information and loss of details in traditional multiscale fusion methods for infrared and visible images, this paper proposed a hybrid multiscale decomposition fusion method based on anisotropic guided filtering. Initially, an adaptive image enhancement method based on texture contours was introduced to improve visible images by simultaneously enhancing brightness, contrast in dark regions, and texture details. Subsequently, the brightness layer of the source image was extracted using the edge-preserving smoothing property of anisotropic guided filtering. The difference layer was decomposed into a base layer, a small-scale detail layer, and multiple levels of large-scale detail layers via Gaussian filtering. The fusion rule for the brightness layer employed an absolute maximum value approach, and a fusion method that combined visual saliency with least squares optimization was proposed for the base layer. The small-scale detail layer adopted a fusion strategy based on modified Laplacian energy, and the large-scale detail layers employed a composite fusion strategy based on local variance and spatial frequency. Finally, the fusion image was reconstructed by combining the merged layers. Compared with nine other classic and advanced methods, the proposed method performs well in both subjective and objective analyses.
In response to the challenges of blurred images and numerous small targets in underwater target detection, which lead to missed and false detections with the YOLOv8n algorithm, we proposed an enhanced lightweight underwater target detection algorithm. Initially, within the backbone network, certain convolutions were substituted with non-strided space-to-depth convolution, and a global attention mechanism was introduced to augment global contextual information, thereby improving the network's ability to extract features from blurry and small targets. Subsequently, the conventional upsampling method was replaced with a lightweight upsampling operator, content-aware reassembly of features, to broaden the model's receptive field. Furthermore, the normalized Wasserstein distance was introduced and integrated with complete intersection over union to devise a novel localization regression loss function, aimed at increasing the accuracy of small-target localization in complex underwater environments. Finally, a dynamic target detection head combined with a parameterized rectified linear unit was proposed to enhance the performance of the original detection head, thereby improving the model's proficiency in managing small underwater targets. Experimental results demonstrated that the improved YOLOv8n algorithm achieved a mean average precision of 86.62% on the RUOD dataset, marking a 3.20-percentage-point improvement over the original YOLOv8n algorithm. The total number of model parameters was 5.67 M, and the number of giga floating-point operations (GFLOPs) was 12.5, fulfilling the criteria for a lightweight model.
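The normalized Wasserstein distance term mentioned above can be sketched as follows, following the common formulation that models boxes as 2D Gaussians; the normalizing constant C is a dataset-dependent assumption, and the combination with complete intersection over union is only indicated in a comment.

```python
import torch

def nwd(box1, box2, C=12.8):
    """Normalized Wasserstein distance between boxes given as (N, 4) tensors
    in (cx, cy, w, h) format, each box modeled as a 2D Gaussian."""
    g1 = torch.cat([box1[:, :2], box1[:, 2:] / 2], dim=1)
    g2 = torch.cat([box2[:, :2], box2[:, 2:] / 2], dim=1)
    w2 = ((g1 - g2) ** 2).sum(dim=1)              # squared 2-Wasserstein distance
    return torch.exp(-torch.sqrt(w2) / C)          # similarity in (0, 1]

# A combined regression loss could then mix NWD with CIoU, for example:
# loss = a * (1 - nwd(pred, target)) + (1 - a) * (1 - ciou(pred, target))
```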
An underwater target detection algorithm that uses a multiscale and cross-spatial information aggregation network is proposed. First, a deformable layer aggregation module is used within the backbone network to extract features, enhancing the network's positioning accuracy. Second, the Conv2former module is used to enhance the neck's global information extraction capability and reduce missing detections caused by mutual occlusion among underwater targets. Finally, a multiscale attention parallel enhancement module that uses parallel convolution blocks to extract deeper features is proposed. This module integrates an efficient multiscale attention module to filter out interference from background and image distortion and introduces multiple cross-level connections to effectively integrate low-level local features with high-level strong semantic information, thereby improving model detection accuracy. The ablation experiment is conducted on the URPC dataset. Compared with the original model, the accuracy rate, recall rate, mean average precision (mAP)@0.5, and mAP@0.5∶0.95 of the improved model increase by 3.6 percentage points, 2.6 percentage points, 3.5 percentage points, and 3.3 percentage points, respectively. Tests on the RUOD dataset under different scenarios indicate that the proposed model offers notable advantages over several current mainstream models.
This paper proposed an underwater optical image enhancement algorithm based on degradation characteristic indices. First, the method determined the degradation characteristics present in the original image based on these indices. Second, the image restoration process was performed according to the degradation characteristics of the original image. Finally, an image enhancement method was applied to the restored image using the bounded general logarithm ratio operation. The proposed algorithm was tested to identify degradation characteristics and enhance images in two typical underwater scenarios: one with auxiliary lighting and the other with natural lighting. Processing results showed that the degradation characteristic parameters were restored to a reasonable range, and the image enhancement effect reached an ideal level. For the middle crack image, with a mean adaptive gradient gain of 1.5054, the maximum brightness difference of the aperture layer decreased from 155 to 44, perceptual fog density reduced from 2.38 to 0.37, dynamic range ratio increased from 60.00% to 76.08%, and contrast increased from 6.15 to 107.35. For the slope image, with a mean adaptive gradient gain of 1.5678, the maximum brightness difference of the aperture layer decreased from 65 to 24, perceptual fog density decreased from 0.62 to 0.21, dynamic range ratio increased from 29.41% to 89.80%, color distortion index improved from 0.66 to 1.00, and contrast increased from 30.77 to 316.25. The proposed algorithm was compared with nine existing enhancement algorithms to evaluate its effectiveness. Results show that the proposed algorithm has advantages in terms of image restoration and enhancement.
Herein, a V-shaped pyramid bilateral feature fusion network (VPBF-Net) is proposed to address small-scale target missing segmentation, inaccurate edge segmentation, and the inefficient fusion of deep and shallow feature information in current semantic segmentation networks. In the encoding stage, a V-shaped atrous spatial pyramid pooling (VASPP) module adopts multiple-parallel-branch interactive connection structures to enhance the information exchange between the local semantic information of each branch. In addition, multibranch feature hierarchical fusion is adopted to reduce grid artifact effects. Furthermore, a coordinate attention module is used to assign weights to the extracted deep semantic information, enhancing the network's attention to the segmentation target. In the decoding stage, a bilateral attention feature aggregation module is designed to guide shallow feature fusion through multiscale deep semantic information, thereby capturing shallow feature representations at different scales and achieving more efficient fusion of deep and shallow features. Experiments conducted on the PASCAL VOC 2012 and Cityscapes datasets show that the proposed method achieves mean intersection over union values of 83.25% and 77.21%, respectively, which are advanced results. Compared with other methods, the proposed method can more accurately segment small-scale objects, alleviating missed segmentation and misclassification.
To address the decline in the image quality under low-light conditions, low-light image enhancement methods aim to improve the visible details such as brightness and color richness of degraded images to produce clearer images that align more closely with human visual expectations. Although remarkable progress has been made in deep learning-based enhancement methods, traditional convolutional neural networks have limitations in terms of feature extraction due to locality, rendering the effective modeling of long-distance relationships between image pixels challenging for the network. In contrast, the Transformer model utilizes the self-attention mechanism to better capture long-range dependencies between pixels. However, existing research reveals that global self-attention mechanisms can lead to a lack of spatial locality in networks, thereby deteriorating the processing ability of transformer-based networks for local feature details. Therefore, in this study, a novel low-light image enhancement network, MFF-Net, is proposed. The principle of cross-domain feature fusion is adopted to integrate the advantages of convolutional neural network and Transformer to obtain cross-domain feature representations containing multiscale and multidimensional information. In addition, to maintain feature semantic consistency, a feature semantic transformation module is specially designed. Experimental results on public low-light datasets show that the proposed MFF-Net achieves better enhancement effects than mainstream methods, with the generated images exhibiting better visual quality.
Quantitative phase microscopy is capable of achieving nondestructive and label-free imaging of transparent samples, rendering it suitable for biological cell research. However, the refractive index and physical thickness of a sample are coupled in the phase data and cannot be presented separately. Existing decoupling methods require tedious experimental and computational processes and thus cannot meet the needs of automated real-time detection in biomedical research and applications. To address this issue, this study constructs a new semantic segmentation network based on U-Net by adding an attention mechanism under the ideas of a residual structure and a dense connection module. This enables exploration of a method for decoupling the physical thickness and refractive index of uniform-medium samples based on a single phase map. The model was trained on a dataset comprising polystyrene microsphere phase maps, and phase data decoupling was achieved for mature red blood cell samples with different geometric features. The relative error of the average refractive index obtained via single-frame phase decoupling was 0.9%. This method requires only a single phase map of the sample in any direction and trains a neural network model using standard samples for highly specific quantitative extraction of chemical and physical information from biological cell samples. Moreover, it has the characteristics of convenient data collection and low computational complexity and can serve as a reference for automated quantitative analysis of phase information.
This paper proposes a SwinT-MFPN slope scene image classification model designed to balance performance, inference speed, and convergence speed, leveraging the Swin-Transformer and feature pyramid network (FPN). The proposed model overcomes the challenges associated with rapidly increasing computational complexity and slow convergence in high-resolution images. First, the Mish activation function is introduced into the FPN to construct an MFPN structure that extracts features from the original high-resolution image, producing a feature map with reduced dimensions while eliminating redundant low-level feature information to enhance key features. The Swin-Transformer, which is known for its robust deep-level feature extraction capabilities, is then employed as the model's backbone feature extraction network. The original cross-entropy loss function of the Swin-Transformer is replaced by a weighted cross-entropy loss function to mitigate the effects of imbalanced class data on model predictions. In addition, a root mean square error evaluation index for accuracy is proposed. The proposed model's stability is verified using a self-constructed dam slope dataset. Experimental results demonstrate that the proposed model achieves a mean average precision of 95.48%, with a 3.01% improvement in time performance compared to most mainstream models, emphasizing its applicability and effectiveness.
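The weighted cross-entropy replacement mentioned above can be sketched in PyTorch as follows; the inverse-frequency weighting rule and the per-class counts are illustrative assumptions rather than the paper's exact scheme.

```python
import torch
import torch.nn as nn

# Hypothetical per-class sample counts; weights are the inverse of class frequency.
class_counts = torch.tensor([800.0, 150.0, 50.0])
weights = class_counts.sum() / (len(class_counts) * class_counts)

criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 3)                 # batch of 8 samples, 3 classes
labels = torch.randint(0, 3, (8,))
loss = criterion(logits, labels)           # rare classes contribute more to the loss
```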
To enhance the reliability of airport surveillance systems during heavy fog, this paper proposes a method for estimating atmospheric light values for airport image dehazing. First, the method estimates the atmospheric transmittance based on the dark channel prior (DCP) and applies a standard atmospheric scattering model to restore the initial dehazed image. Second, a clarity coefficient is introduced to provide feedback on the dehazed image. Based on this feedback, a rule is designed to iteratively update the atmospheric light value, adjusting it until the clarity coefficient peaks. Finally, guided by the updated atmospheric light value, the atmospheric scattering model reconstructs the optimal fog-free image. Experimental results demonstrate that, compared with the traditional DCP method, the proposed method achieves more accurate atmospheric light values, resulting in more natural dehazing results and better performance on objective evaluation indices for image dehazing.
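For context, the dark channel prior and atmospheric scattering model that the method builds on can be sketched as follows; the feedback rule that iteratively updates the atmospheric light value is not reproduced, and the patch size, omega, and the example atmospheric light vector are conventional illustrative values.

```python
import cv2
import numpy as np

def dark_channel(img, patch=15):
    # Per-pixel channel minimum followed by a minimum (erosion) filter.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (patch, patch))
    return cv2.erode(img.min(axis=2), kernel)

def dehaze(img_bgr, A, omega=0.95, t0=0.1):
    img = img_bgr.astype(np.float64) / 255.0
    t = 1.0 - omega * dark_channel(img / A)        # transmission estimate
    t = np.clip(t, t0, 1.0)[..., None]
    J = (img - A) / t + A                          # invert the scattering model
    return np.clip(J * 255, 0, 255).astype(np.uint8)

A = np.array([0.92, 0.93, 0.95])                   # example atmospheric light value
# The iterative feedback described above would repeat dehaze() while adjusting A
# until the clarity coefficient of the result peaks.
```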
Fine-grained image classification aims to accurately recognize subcategories within a given superclass; however, it faces the challenges of large intra-class differences, small inter-class differences, and limited training samples. Most current methods build on the Vision Transformer to enhance classification performance. However, the following issues remain: ignoring the complementary information of classification tokens from different layers leads to incomplete global feature extraction; the inconsistent performance of different heads in the multi-head self-attention mechanism leads to inaccurate part localization; and limited training samples are prone to overfitting. In this study, a fine-grained image classification network based on feature fusion and ensemble learning is proposed to address these issues. The network consists of three modules: the multi-level feature fusion module integrates complementary information to obtain more complete global features; the multi-expert part voting module votes for part tokens through ensemble learning to enhance the representation ability of part features; and the attention-guided mixup augmentation module alleviates the overfitting issue and improves the classification accuracy. The classification accuracy on the CUB-200-2011, Stanford Dogs, NABirds, and IP102 datasets is 91.92%, 93.10%, 90.98%, and 76.21%, respectively, representing improvements of 1.42, 1.50, 1.08, and 2.81 percentage points, respectively, compared to the original Vision Transformer model, and outperforming the other compared fine-grained image classification methods.
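For reference, a plain (unguided) mixup step can be sketched as follows in PyTorch; the attention-guided variant described above additionally weights the mixing by attention maps, which is not reproduced here, and the Beta parameter is an illustrative assumption.

```python
import torch

def mixup(images, labels, alpha=0.4):
    """Plain mixup: convex combination of a batch with a shuffled copy of itself."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))
    mixed = lam * images + (1 - lam) * images[perm]
    return mixed, labels, labels[perm], lam

# Training step: out = model(mixed)
# loss = lam * ce(out, y_a) + (1 - lam) * ce(out, y_b)
```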
To address the issue of image blurring when the object size exceeds the depth of field of the imaging system, we propose an image enhancement method based on clear region screening. First, the imaging system employs step capture to obtain multiple images, each focusing on a different clear area of the same target. Second, a combination of bilateral filtering and edge extraction algorithms is employed to weaken low-definition regions of the images while retaining strong edge features. The combination of strong edges and regions mitigates the misleading effects of blurred regions near edges. A high-resolution region screening algorithm based on image-variance features is then introduced to select clear images of non-edge regions and match them with strong region-edge maps to obtain the final target images. The experimental results show that, compared with conventional image-enhancement algorithms, the clear regions of the image obtained by the proposed method exhibit minimal change, and the blurred regions are effectively enhanced. Compared with the histogram equalization, adaptive filtering, and Retinex algorithms, the proposed algorithm increases the information entropy, signal-to-noise ratio, structural similarity, and sharpness by an average of 4.1%, 21.3%, 36.0%, and 9.53%, respectively, while decreasing the standard error by 23.3% on average.
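A minimal sketch of variance-based clear-region screening over a focal stack is given below; the block size and the use of plain per-block variance are illustrative assumptions standing in for the paper's image-variance screening rule.

```python
import numpy as np

def block_variance(gray, block=32):
    # One variance value per non-overlapping block.
    h, w = gray.shape
    h, w = h - h % block, w - w % block
    g = gray[:h, :w].reshape(h // block, block, w // block, block)
    return g.var(axis=(1, 3))

def pick_sharpest(stack, block=32):
    """stack: (N, H, W) focal stack; returns the index of the sharpest image per block."""
    scores = np.stack([block_variance(g.astype(np.float64), block) for g in stack])
    return scores.argmax(axis=0)
```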
Infrared target detection is an important means of remote search and monitoring, and the accuracy of infrared tiny target detection determines the practical application value of this method. A detection framework based on a multi-hop deep network is proposed to improve the performance of tiny target detection in complex backgrounds. First, to deal with the "weak" and "small" shape characteristics of tiny targets, an anchor-free mechanism is used to build feature pyramids as the backbone for extracting feature maps. Then, to realize progressive feature interaction and adaptive feature fusion, a multi-hop fusion block composed of multi-scale dilation convolution groups is designed at the connection level. Finally, to reduce the sensitivity to position perturbations of tiny targets, the Wasserstein distance between the real and predicted targets is used as a similarity measure. The experimental results show that compared to existing methods, the proposed method delivers better detection performance in terms of accuracy and efficiency.
We propose a local stereo matching algorithm that integrates tree segmentation with side window technology to enhance disparity estimation accuracy in edge regions, which is a common issue in existing local stereo matching algorithms. Unlike the traditional fixed window aggregation strategy, the proposed algorithm utilizes a cost aggregation strategy based on side window technology. This approach adaptively selects the optimal side window for cost aggregation, substantially improving the disparity accuracy in edge regions. Moreover, tree segmentation technology is employed during the disparity refinement stage to propagate reliable pixel points through circular searches, thereby enhancing disparity accuracy across edges and complex textured areas. Experimental results on the Middlebury dataset demonstrate that the proposed algorithm achieves high accuracy and efficiency in disparity calculation, particularly excelling in challenging areas such as image edges and complex textures.
When the light source is located on the back of the object, the captured image has a dark foreground and a bright background. This non-uniform light distribution significantly affects the overall visual quality of the image. As existing image enhancement methods have difficulty in effectively solving the non-uniform illumination problem of backlit images, this study proposes a backlight image enhancement network guided by an attention mechanism. First, a U-shaped network is used to construct an enhancement subnetwork (EM-Net), to achieve multiscale feature extraction and reconstruction. Second, a condition subnetwork (Cond-Net) is introduced to generate a backlight area attention map to guide the EM-Net to focus on the backlight area in the image. Then, using a dual-branch enhancement block (DEB), the brightness of the backlight area is fully enhanced while maintaining the contrast of the front-light area. In addition, a spatial feature transformation (SFT) layer is introduced in the backlight branch of the DEB, allowing EM-Net to focus on improving the visibility of the backlight area according to the guidance provided by the area attention map. Finally, to strengthen the correlation between the backlight and front-light areas during the enhancement process, a bilateral mutual attention module (BMAM) is proposed to further improve the reconstruction ability of the EM-Net. Experimental results show that the peak signal to noise ratio (PSNR) metric obtained by the proposed algorithm on the backlight data set (BAID) and non-uniform exposure local color distribution prior dataset (LCDP) exceeds that obtained by the latest backlight image enhancement contrastive language-image pretraining (CLIP)-LIT algorithm by 3.15 dB and 4.81 dB, respectively. Compared with other image enhancement algorithms based on deep learning, the proposed algorithm can effectively improve the visual quality of backlit images with higher computing efficiency.
Underwater images are an important carrier of marine information. High-quality and clear underwater images are an important guarantee for a series of underwater operations such as marine resource exploration and marine safety monitoring. Underwater images experience quality degradation owing to factors such as the selective absorption and scattering of light. In view of this, an underwater image enhancement network model based on deep multi-prior learning is proposed. First, four variants of an underwater image are obtained under the prior guidance of the underwater optical imaging physical model, and a separate feature processing module containing five U-Net network structures is used to learn five private feature maps; then, the up-sampling feature maps from each U-Net structure are extracted, and a public feature map is learned through a joint feature processing module; finally, a feature fusion module is used to uniformly represent the private feature maps and the public feature map to generate an enhanced underwater image. Experimental results show that compared with various underwater image enhancement network models, the proposed model is more effective in enhancing underwater image quality and achieves excellent performance on multiple quality evaluation indicators.
Hyperspectral image classification is a basic operation for understanding and applying hyperspectral images, and its accuracy is a key index for measuring the performance of the algorithm used. A novel two-branch residual network (DSSRN) is proposed that can extract robust features of hyperspectral images and is applicable to hyperspectral image classification for improving classification accuracy. First, the Laplace transform, principal component analysis (PCA), and data-amplification methods are used to preprocess the hyperspectral image data, enhance image features, remove redundant information, and increase the number of samples. Subsequently, an attention mechanism and a two-branch residual network are used, where spectral and spatial residual networks are adopted in each branch to extract spectral and spatial information and generate one-dimensional feature vectors. Finally, image-classification results are obtained using the fully connected layer. Experiments are conducted on the Indian Pines, University of Pavia, and Kennedy Space Center remote-sensing datasets. Compared with the two-branch ACSS-GCN, the classification accuracy of the proposed model improves by 1.94, 0.27, and 20.85 percentage points on the three datasets, respectively.
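The PCA preprocessing step can be sketched as follows with scikit-learn; the number of retained components and the whitening option are illustrative assumptions, and the Laplace transform and data-amplification steps are not reproduced.

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_bands(cube, n_components=30):
    """cube: (H, W, B) hyperspectral image; returns an (H, W, n_components) array."""
    h, w, b = cube.shape
    flat = cube.reshape(-1, b).astype(np.float64)
    reduced = PCA(n_components=n_components, whiten=True).fit_transform(flat)
    return reduced.reshape(h, w, n_components)
```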
To effectively extract road information from different environments for intelligent vehicles, a road information extraction method based on hypervoxel segmentation is proposed. Road information is mainly divided into road edge and lane line information. First, the non-ground point cloud is filtered according to either the point cloud elevation information or the installation location of the scanning system. Second, the point cloud is over-segmented using the voxel adaptive hypervoxel segmentation method, allowing the separate segmentation of road edge features. Third, the boundary point extraction algorithm and the driving path of the scanning system are used to complete the extraction of road edges, subsequently dividing the driving area according to the road edge information. Finally, lane lines are extracted using local adaptive threshold segmentation and spatial density filtering. Experimental results show that the extraction accuracies of the road edge height and road width are 92.6% and 98.4%, respectively, the deviation degree of the lane lines is less than 4%, and the maximum deviation distance is no more than 0.04 m.
Object detection based on active learning typically utilizes limited labeled data to enhance detection model performance. This method allows learners to select valuable samples from a large pool of unlabeled data for manual labeling and to iteratively train and optimize the model. However, existing object detection methods that use active learning often struggle to effectively balance sample uncertainty and diversity, which results in high redundancy of query samples. To address this issue, we propose an adversarial active learning method guided by uncertainty for object detection. First, we introduce a loss prediction module to evaluate the uncertainty of unlabeled samples. This uncertainty guides the adversarial network training and helps construct a query sample set that includes both uncertainty and diversity. Second, we evaluate sample diversity based on feature similarity to reduce redundancy of query samples. Finally, experimental results on the MS COCO and Pascal VOC datasets using multiple detection frameworks demonstrate that the proposed method can effectively improve object detection accuracy with fewer annotations.
In catastrophic environments such as fires, earthquakes, and explosions, images captured by drone cameras often become blurry because of strong vibrations. These vibrations severely affect the image quality and efficiency of emergency rescue operations. To address this issue, we propose an image deblurring method that uses unmanned aerial vehicle (UAV) inertial sensor data to construct the point spread function (PSF). The proposed method captures the motion information of an airborne camera using an inertial sensor and derives the PSF from this data, effectively overcoming the difficulties associated with traditional methods that consider complex textures, low contrast, or noise. The estimated PSF is combined with the total variation regularization technique to restore the images. By introducing the split Bregman iterative technique into the implemented algorithm, the complex optimization problem is effectively broken down into a series of simple sub-problems. This approach accelerates the calculation speed and yields high-precision image deblurring. Experimental and simulation results show that the proposed method effectively restores image blur caused by UAV vibrations, suppresses artifacts and ringing, and considerably improves the imaging quality of UAV cameras under vibration.
In pigment classification of mural multi-spectral images, traditional algorithms typically extract the spatial features of the image through a fixed window. Consequently, the spatial relationship between different pigments is ignored, and the classification error for pigments in halo areas is large. Furthermore, a single-scale feature extraction method cannot effectively express the differences between pigment blocks. In this study, a pigment classification method for mural multi-spectral images based on multi-scale superpixel segmentation is proposed. First, the dimensionality of the mural multi-spectral data is reduced using an adaptive band optimization method, which effectively reduces the amount of data required for superpixel segmentation. Second, the pseudo-color image synthesized from the first three bands after band optimization and dimensionality reduction is segmented based on a gradient constraint. This leads to segmentation results that are closer to the actual contours and improves the accuracy of pigment classification. Third, the selected sample pixels are mapped into the superpixels to realize spatial information and feature enhancement of the image. Finally, given that a single scale cannot be accurately applied to each pigment block, multi-scale superpixels are used to segment the false-color mural images, obtain segmentation maps at different scales, perform mean filtering within the same superpixel label region of each segmentation map, and classify the multi-scale superpixel segmentation images using a support vector machine (SVM) classifier. A fusion decision strategy based on majority voting is adopted to obtain the final classification result. The experimental results show that the proposed method achieves an overall accuracy of 98.84% and an average accuracy of 97.75% on the simulated mural multi-spectral image dataset. Hence, the proposed method provides more accurate classification results than the control group methods.
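One scale of the superpixel-plus-SVM stage can be sketched as follows using scikit-image and scikit-learn; the segment count, compactness, and SVM parameters are illustrative assumptions, and the adaptive band optimization, gradient-constrained segmentation, and multi-scale voting are not reproduced.

```python
import numpy as np
from skimage.segmentation import slic
from sklearn.svm import SVC

def superpixel_features(pseudo_color, bands, n_segments=800):
    """pseudo_color: (H, W, 3) image; bands: (H, W, B) reduced spectral data."""
    labels = slic(pseudo_color, n_segments=n_segments, compactness=10, start_label=0)
    feats = np.stack([bands[labels == i].mean(axis=0)
                      for i in range(labels.max() + 1)])   # mean spectrum per superpixel
    return labels, feats

# Training and prediction on labeled superpixels at one scale, for example:
# clf = SVC(kernel="rbf", C=10).fit(feats[train_idx], y_train)
# pred = clf.predict(feats)
```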
To address the issue of accurate detection and identification of faint targets in geosynchronous orbit against a starry sky background, a multiple-inspection object detection algorithm based on the Hough transform is proposed. This study analyzes the characteristics of space targets in geosynchronous orbit, the difficulties in their detection and identification, and the shortcomings of traditional target detection algorithms. Continuous multi-frame images are processed through denoising, threshold segmentation, centroid extraction, and star map matching to filter out the influence of most of the stars. The multi-frame images are then superimposed using the Hough transform, and multiple tests are conducted to achieve accurate target extraction, which significantly improves the applicability of the Hough transform to the detection of weak targets in space. The effectiveness of the proposed algorithm is verified through field experiments and simulation data analysis. Compared with the traditional Hough algorithm, the detection accuracy is increased by 62.5%, the false alarm rate is reduced by 74.9%, and the time consumption of the algorithm is reduced by 7.2%; moreover, the detection accuracy is greater than 98% and the false alarm rate is less than 2% when the signal-to-noise ratio is greater than or equal to 3.
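A hedged sketch of the stacking-plus-Hough step is given below: after per-frame preprocessing suppresses most stars, the residual target traces a line across the stacked frames that a probabilistic Hough transform can recover; the thresholds are illustrative assumptions rather than the paper's settings.

```python
import cv2
import numpy as np

def detect_streak(frames):
    """frames: list of preprocessed binary (uint8) frames with most stars removed."""
    stacked = np.clip(np.sum(np.stack(frames), axis=0), 0, 255).astype(np.uint8)
    lines = cv2.HoughLinesP(stacked, rho=1, theta=np.pi / 180, threshold=20,
                            minLineLength=15, maxLineGap=5)
    return stacked, lines     # each line (x1, y1, x2, y2) is a candidate target track
```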
The semantic segmentation of remote sensing images is a crucial step in the analysis of geographic-object-based remote sensing images. Combining remote sensing image data with elevation data effectively enhances feature complementarity, thereby improving pixel-level segmentation accuracy. This study proposes a dual-source remote sensing image semantic segmentation model, STAM-SegNet, that leverages the Swin Transformer backbone network to extract multiscale features. The proposed model integrates an adaptive gating attention mechanism and a multiscale residual fusion strategy. The adaptive gated attention mechanism includes gated channel attention and gated spatial attention mechanisms. Gated channel attention enhances the correlation between dual-source data features through competition/cooperation mechanisms, effectively extracting complementary features of dual-source data. In contrast, gated spatial attention uses spatial contextual information to dynamically filter out high-level semantic features and select accurate detail features. The multiscale feature residual fusion strategy captures multiscale contextual information via multiscale refinement and residual structure, thereby emphasizing detailed features, such as shadows and boundaries, and improving the model's training speed. Experiments conducted on the Vaihingen and Potsdam datasets demonstrate that the proposed model achieved an average F1-score of 89.66% and 92.75%, respectively, surpassing networks such as DeepLabV3+, UperNet, DANet, TransUNet, and Swin-UNet in terms of segmentation accuracy.
Low-light image stitching is a technique that enables the stitching of images taken from different perspectives into a large field-of-view image under insufficient lighting conditions. The low contrast and high noise of images caused by inadequate lighting compromise the robustness and quantity of feature extraction, making feature matching and image stitching challenging. In response, this study proposes a low-light image stitching method based on an improved speeded-up robust feature (SURF) algorithm. In this method, a scale space was constructed first using the integral image of low-light images and Laplacian operations were performed, followed by edge extraction and binarization of the images. Further, the edges-in-shaded-region (ESR) image was generated based on the edge-extracted and binarized images to obtain scale weights, thereby dynamically adjusting the SURF feature extraction threshold. This effectively resolves the issue of mismatch between feature point pixel thresholds and overall image brightness, enhancing the robustness of the feature extraction algorithm. Additionally, the obtained scale weights can serve as weighting coefficients for the multiscale Retinex algorithm to achieve better image enhancement effects. In this method, binary descriptors were employed to accelerate the feature description and matching process. Finally, a homography matrix was calculated based on matching relationships to perform homography transformation and stitching of the enhanced images. Experimental results demonstrate that the proposed algorithm effectively improves the speed and performance of low-light image stitching, offering better robustness and adaptability compared with the traditional SURF algorithm.
Herein, a hyperspectral image classification algorithm that integrates convolutional network and graph neural network is proposed to address several challenges, such as high spectral dimensionality, uneven data distribution, inadequate spatial-spectral feature extraction, and spectral variability. First, principal component analysis is performed to reduce the dimensionality of hyperspectral images. Subsequently, convolutional networks extract local features, including texture and shape information, highlighting differences between various objects and regions within the image. The extracted features are then embedded into the superpixel domain, where dynamic graph convolution occurs via an encoder. A dynamic adjacency matrix captures the long-term spatial context information in the hyperspectral image. These features are combined through a decoder to effectively classify different pixel categories. Experiments conducted on three commonly used hyperspectral image datasets demonstrate that this method outperforms five other classification techniques with regard to classification performance.
To address the long reconstruction times and poor coverage of indoor 3D models caused by the complexity and enclosed nature of indoor scenes, a geomagnetic-feature-assisted indoor structure from motion (SFM) method is proposed. First, ordinary smartphone sensors were used to obtain indoor images and geomagnetic data. Second, to divide the overall image set into local image sets, a clustering algorithm was applied to the geomagnetic data, and the clustering results were assigned as attributes of the corresponding images to obtain the local image sets. Subsequently, hierarchical SFM was used to construct a sparse sub-model for each local image set, and the matching points between the sparse sub-models were determined. Finally, the RANSAC generalized Procrustes analysis (RGPA) algorithm was used to register the local reconstructions and obtain a complete model. Experimental results for indoor reconstruction on the same and different floors show that the proposed method performs well in terms of reconstruction efficiency, reconstruction coverage, and point-cloud generation rate. Compared with the hierarchical SFM method, the proposed method improves reconstruction efficiency by 37% on both datasets, and its reconstruction coverage is closer to the reconstruction target, thus providing a supplementary solution for reconstructing this type of indoor environment.
In atmospheric measurement, clouds are the most uncertain factor in atmospheric models, so accurate segmentation and recognition of cloud images are indispensable. However, owing to the stochastic nature of clouds and atmospheric conditions, the precision and accuracy of cloud image segmentation remain challenging. To address this issue, we propose a novel network named CloudHS-Net based on MobileNetV2. This network incorporates a hybrid concatenation structure, dilated convolutions, a mixed dilation design, and an efficient channel attention mechanism for practical cloud image segmentation. The performance of the network is thoroughly evaluated on the SWIMSEG and HHCL-Cloud datasets through comparative tests with other advanced models, providing insights into the network's performance and the roles of its components. Experimental results demonstrate that the efficient channel attention and hybrid concatenation structures effectively enhance the segmentation performance of the model. Compared with current advanced ground-based cloud image segmentation networks, CloudHS-Net excels in sky cloud image segmentation, achieving an accuracy of 95.51% and a mean intersection over union (MIoU) of 89.86%. The model suppresses disturbances originating from the atmospheric environment, such as sunlight, and pays stronger attention to clouds, leading to more precise cloud image segmentation and a more accurate capture of cloud coverage. These results confirm the feasibility of the method.
This study proposes a two-stage three-dimensional object detection algorithm tailored to roadside scenes, aiming to address the high missed-detection rates for long-distance vehicles and high false-detection rates for pedestrians in complex scenes that arise in roadside point cloud object detection tasks. The algorithm improves upon PointPillars and Transformer. In the first stage, the PointPillars-based backbone network incorporates the SimAM attention mechanism to capture similarity information and prioritize essential features; standard convolutional blocks in the downsampling section are replaced with residual structures to improve network performance. In the second stage, a Transformer refines the candidate boxes generated in the first stage: the encoder encodes the original point features, while the decoder employs channel weighting to enhance channel information, thereby improving detection accuracy and mitigating false detections. The effectiveness of the proposed algorithm was tested on the DAIR-V2X-I roadside dataset and the KITTI vehicle-end dataset. Experimental results demonstrated substantial improvements in detection accuracy over other publicly available algorithms. Compared with the benchmark algorithm PointPillars, at moderate detection difficulty, accuracy improvements for cars, pedestrians, and cyclists on the DAIR-V2X-I dataset were 1.9, 10.5, and 2.11 percentage points, respectively, and the corresponding improvements on the KITTI dataset were 2.34, 4.73, and 8.17 percentage points.
As a classical computer vision perception task, pose estimation is commonly used in scenarios such as autonomous driving and robot grasping. The pose estimation algorithm based on template matching is advantageous in detecting new objects. However, current state-of-the-art template matching methods based on convolutional neural networks generally suffer from large memory consumption and slow speed. To solve these problems, this paper proposes a deep learning-based lightweight template matching algorithm. The method, which incorporates depth-wise convolution and the attention mechanism, drastically reduces the number of model parameters and has the capability to extract more generalized image features. Thus, the accuracy of position estimation for unseen and occluded objects is improved. In addition, this paper proposes an iterative rendering perspective sampling strategy to significantly reduce the number of templates. Experiments on open-source datasets show that the proposed lightweight model utilizes only 0.179% of the parametric quantity of the commonly used template matching model, while enhancing the average pose estimation accuracy by 3.834%.
A method for calculating the expected clarity value of the region-of-interest (ROI) central subregion image is proposed by segmenting an ROI into central, subcentral, and edge subregions. Specifically, the ROI was segmented horizontally and vertically into an odd number of subregions, and Gaussian filtering functions with different standard deviations were used to filter and denoise the different ROI subregions: the farther a subregion lies from the ROI central subregion, the larger the standard deviation applied to it, from the subcentral to the edge subregions. This preserves the clarity value of the ROI central subregion image while effectively reducing the clarity value of the ROI edge subregions, thus providing reliable data for the subsequent calculation of the expected clarity of the ROI image. Additionally, the conventional two-dimensional 3×3 Sobel operator was extended to a four-directional 5×5 Sobel operator, resulting in stronger edge responses and better clarity curves. The algorithm was then implemented using field programmable gate array (FPGA) high-speed image-processing technology, which significantly reduced the computation time. Experimental results show that the proposed method effectively eliminates the effect of noise on the expected clarity value of ROI images and significantly suppresses the details of the ROI edge-subregion images, thereby keeping the focus continuously on the ROI central subregion. Compared with software computing, the FPGA implementation offers a higher computing speed and better real-time performance, with a computing speed 130 times that of software computing.
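For intuition on how a multi-directional Sobel response can serve as a clarity (focus) measure, the sketch below computes horizontal and vertical gradients with 5×5 Sobel kernels and approximates the two diagonal responses from them, then averages the magnitudes. The diagonal approximation and the averaging rule are illustrative assumptions; the paper's exact four-directional 5×5 kernels are not reproduced here.

```python
import numpy as np
import cv2

def clarity_4dir(roi):
    """Clarity score of a grayscale ROI subregion from four gradient directions."""
    g = roi.astype(np.float32)
    gx = cv2.Sobel(g, cv2.CV_32F, 1, 0, ksize=5)    # 0 deg response
    gy = cv2.Sobel(g, cv2.CV_32F, 0, 1, ksize=5)    # 90 deg response
    g45 = (gx + gy) / np.sqrt(2.0)                   # ~45 deg (approximation)
    g135 = (gx - gy) / np.sqrt(2.0)                  # ~135 deg (approximation)
    return float(np.mean(np.abs(gx) + np.abs(gy) + np.abs(g45) + np.abs(g135)))
```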
To solve the problem of haze degrading image quality, this paper proposes a two-branch feature fusion image dehazing algorithm. First, a data-fitting branch in dense residual form increases the network depth and extracts high-frequency detail features, while a knowledge-transfer branch in U-Net form provides supplementary knowledge for the limited data. A multi-scale fusion module then adaptively fuses the features of the two branches to recover high-quality dehazed images. In addition, a brightness constraint is introduced into the combined loss function to assign higher weights to dense-haze regions. Finally, both synthetic and real-world datasets were used for testing, and the results were compared with existing dehazing algorithms such as FFA and GCANet. Experimental results show that the proposed algorithm achieves a good dehazing effect on both synthetic and real hazy images. Compared with the other algorithms, the average peak signal-to-noise ratio on four nonhomogeneous haze datasets is increased by 1.55 dB to 10.30 dB, and the average structural similarity is increased by 0.0312 to 0.2440.
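The brightness constraint can be pictured as a per-pixel weight on the reconstruction loss that grows with estimated haze density. The PyTorch sketch below uses the brightness of the hazy input as a crude density proxy; the weighting function and names are illustrative assumptions rather than the paper's exact loss.

```python
import torch

def brightness_weighted_l1(pred, gt, hazy):
    """L1 loss weighted toward brighter (denser-haze) regions of the hazy input."""
    # Mean over the channel dimension gives a per-pixel brightness map in [0, 1].
    weight = 1.0 + hazy.mean(dim=1, keepdim=True)   # weights lie in [1, 2]
    return (weight * (pred - gt).abs()).mean()
```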
An image stitching algorithm that combines an enhanced optimal seam with brightness optimization is proposed to address ghosting and inconsistent brightness in panoramic images arising from large parallax and exposure differences. First, a deformation model based on minimizing projection biases was used for image registration to accurately align the overlapping areas. Second, an enhanced optimal seam algorithm was applied between two intersections in the overlapping area to avoid information loss in the panoramic image. Finally, leveraging Poisson image fusion, an energy functional of the ideal panoramic image gradient and a nonuniform illumination fitting model were constructed to optimize the brightness and improve the brightness consistency of the panoramic image. Experimental results show that, compared with the algorithm proposed in reference [11], the proposed algorithm improves the structural similarity by 5.58% and the peak signal-to-noise ratio by 9.55% in terms of eliminating large parallax. Compared with the results before optimization, the proposed algorithm reduces the average gradient of the illumination component by 14.90% and improves the average gradient by 12.09% in terms of eliminating exposure differences. Thus, the algorithm can be used for image stitching in scenes with large disparities and exposure differences.
To overcome the difficulty of obtaining large annotated datasets, a proxy task based on a diffusion model was introduced, allowing self-supervised learning of prior knowledge from unlabeled datasets, followed by fine-tuning on a small labeled dataset. Inspired by the diffusion model, different levels of noise are weighted with the original images and used as inputs to the model. By training the model to predict the input noise, a more robust pixel-level representation of intravascular ultrasound (IVUS) images is learned. Additionally, a combined loss function of mean square error (MSE) and structural similarity index (SSIM) was introduced to improve model performance. Experimental results on 20% of the dataset demonstrate that the Jaccard coefficients of the lumen and media are increased by 0.044 and 0.101, respectively, and the Hausdorff distance coefficients are improved by 0.216 and 0.107, respectively, compared with random initialization; these results are similar to those obtained when training with 100% of the dataset. This framework applies to any structural image segmentation model and significantly reduces the reliance on ground truth while ensuring segmentation effectiveness.
Traditional camera imaging suffers from insufficient preservation of high-frequency information, inaccurate solution of coded-exposure codewords, and difficulty in estimating blur kernels. To solve these problems, this study focuses on a codeword searching method for coded exposure cameras and proposes an intelligently optimized cyclic search strategy based on a memetic algorithm framework. A mutation-crossover operator is used in differential evolution to obtain a global solution; subsequently, a tabu search is performed around the global solution to conduct a local search, iteratively searching for an optimal codeword sequence. A loss function suited to coded-exposure image restoration is designed, and an end-to-end blind deconvolution kernel generative adversarial network is used to compare the performance of different codeword acquisition methods for blurry image restoration. Experimental results show that the proposed intelligent optimization algorithm solves the codeword sequence more accurately and with better robustness than the other methods. When using the same network for blurred image restoration, the proposed algorithm yields superior restoration results compared with existing methods from both subjective and objective perspectives. Thus, the proposed method has high engineering application value for enhancing motion blur restoration.
A hierarchical matching multi-object tracking algorithm based on pseudo-depth information is proposed to address the performance limitations of traditional multi-object tracking methods that rely on intersection over union (IOU) for association under target occlusion, as well as the constraints of feature re-identification in dealing with visually similar objects. The proposed algorithm uses a stereo-geometric approach to acquire pseudo-depth information for objects in the image. Based on the magnitude of the pseudo-depth, both the detection boxes and trajectories are divided into multiple distinct subsets. When occluded objects have significantly different pseudo-depths, they are assigned to different pseudo-depth levels, thereby avoiding matching conflicts. Subsequently, a pseudo-depth cost matrix is computed from the pseudo-depth information, and IOU pseudo-depth (IOU-D) matching is performed within each pseudo-depth level to associate occluded targets located at the same level. Experimental results show that the proposed algorithm achieves higher order tracking accuracy (HOTA) scores of 65.1% and 58.5% on the MOT17 and DanceTrack test sets, respectively, improvements of 2.0% and 10.8% over the baseline model ByteTrack on the two datasets. These results indicate that effectively exploiting the potential pseudo-depth information in the image can significantly enhance the tracking accuracy of occluded targets.
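To make the level-wise association concrete, the following sketch bins detections and tracks by pseudo-depth and solves an assignment problem within each level using a cost that blends IOU with pseudo-depth similarity. The bin edges, cost weights, and dictionary field names are hypothetical choices for illustration, not the paper's settings.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def associate_by_level(dets, trks, bins=(0.0, 10.0, 20.0, np.inf), alpha=0.7, max_cost=0.9):
    """Match detections and tracks level by level with an IOU + pseudo-depth cost."""
    matches = []
    for lo, hi in zip(bins[:-1], bins[1:]):
        d_idx = [i for i, d in enumerate(dets) if lo <= d["depth"] < hi]
        t_idx = [j for j, t in enumerate(trks) if lo <= t["depth"] < hi]
        if not d_idx or not t_idx:
            continue
        cost = np.empty((len(d_idx), len(t_idx)))
        for r, i in enumerate(d_idx):
            for c, j in enumerate(t_idx):
                depth_sim = np.exp(-abs(dets[i]["depth"] - trks[j]["depth"]))
                cost[r, c] = 1.0 - (alpha * iou(dets[i]["box"], trks[j]["box"])
                                    + (1.0 - alpha) * depth_sim)
        rows, cols = linear_sum_assignment(cost)
        matches += [(d_idx[r], t_idx[c]) for r, c in zip(rows, cols) if cost[r, c] < max_cost]
    return matches
```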
Space three-dimensional (3D) reconstruction is important across various domains, including remote sensing, military, and aerospace, and light field imaging is one of the most widely utilized technologies for this purpose. Enhancing the quality of light field images is paramount for achieving more accurate 3D reconstructions. In this study, light field imaging is first integrated into a space imaging system, and a model based on wave optics is designed to streamline the imaging process and simulate the original light field image. Digital refocusing algorithms are then employed to obtain light field images at different focal planes. However, errors induced by relative motion, inaccuracies in the digital refocusing algorithms, and signal loss due to the microlens array in the optical path lead to image blurring, and current image deblurring techniques cannot meet the stringent quality requirements of light field imaging. Hence, this study introduces an algorithm to alleviate blurring in remote sensing light-field-refocused images. An energy function is constructed by leveraging the insight that image blur correlates with increased local minimum intensity values and decreased local maximum gradient values. An enhanced half-quadratic splitting method facilitates the estimation of the latent image and blur kernel, thus achieving deblurring. Experimental results demonstrate the superiority of the proposed algorithm over existing image deblurring techniques for processing light-field-refocused images.
Multi-exposure image fusion addresses the inability of image sensors to capture scenes with large dynamic ranges: multiple images of the same scene with different exposure levels are fused to obtain a large-dynamic-range image containing rich scene details. A self-adaptive weight, detail-preserving multi-exposure image fusion algorithm is proposed to address the typical issues of insufficient detail preservation and edge halos in fusion. Contrast and structural components from image-block decomposition are used to extract fused structural weights, and two-dimensional entropy is used to select brightness benchmarks for calculating exposure weights. Saturation weights are then used to better restore the brightness and color information of the scene in the fused image. Finally, double-pyramid fusion is used to fuse the source-image sequence at multiple scales, avoiding unnatural halos at boundaries and yielding a large-dynamic-range fused image that preserves more details. Seventy sets of multi-exposure images from three datasets were selected for the experiments. The results show that the average fusion structural similarity and cross-entropy of the proposed algorithm are 0.983 and 2.341, respectively. Compared with classical and recent multi-exposure fusion algorithms, the proposed algorithm maintains the brightness distribution of the scene while retaining more image information, demonstrating its effectiveness. The proposed algorithm offers excellent fusion results and good visual effects.
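A Mertens-style view of the weight construction helps make the fusion step concrete: each source image receives a per-pixel weight built from contrast, saturation, and exposure terms. In the sketch below the brightness benchmark is fixed rather than selected by two-dimensional entropy, and the block decomposition is omitted, so this is only a simplified stand-in for the paper's weights.

```python
import numpy as np
import cv2

def fusion_weights(bgr, benchmark=0.5, sigma=0.2, eps=1e-6):
    """Per-pixel weight map from contrast, saturation, and exposure terms."""
    img = bgr.astype(np.float32) / 255.0
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    contrast = np.abs(cv2.Laplacian(gray, cv2.CV_32F))              # structure response
    saturation = img.std(axis=2)                                     # color spread
    exposure = np.exp(-((gray - benchmark) ** 2) / (2 * sigma ** 2))  # well-exposedness
    return contrast * saturation * exposure + eps

# Weights of all exposures are normalized per pixel before pyramid fusion, e.g.:
# w = [fusion_weights(im) for im in sequence]; total = sum(w); w = [x / total for x in w]
```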
Honeycomb lung is a CT imaging manifestation of various advanced lung diseases, characterized by diverse cystic lesions presenting a honeycomb-like appearance. Existing computer-aided diagnosis methods struggle to address the low identification accuracy caused by the varied morphology and differing locations of honeycomb lung lesions. Therefore, a combined CNN and Transformer model guided by lesion signals is proposed for honeycomb lung CT image recognition. In this model, a multi-scale information enhancement module is first employed to enrich the spatial and channel information of the features obtained by the CNN at different scales. Simultaneously, a lesion signal generation module is used to strengthen the expression of lesion features. Subsequently, a Transformer is utilized to capture long-range dependency information, compensating for the deficiency of the CNN in extracting global information. Finally, a multi-head cross-attention mechanism is introduced to fuse the feature information and obtain the classification results. Experimental results demonstrate that the proposed model achieves accuracies of 99.67% and 97.08% on the honeycomb lung and COVID-CT datasets, respectively. It outperforms other models, providing more precise recognition results and validating the effectiveness and generalization of the model.
Current multifocus fusion algorithms use only a single feature extraction scale, leading to problems such as loss of detailed edges and local blurring. To address these problems, this paper proposes a multifocus image fusion algorithm based on a multiscale dilated U-Net. First, in the encoder part of U-Net, a multiscale dilated module is introduced to replace the traditional convolution module, making full use of receptive fields of various scales to capture local and global information more comprehensively. In addition, to further enhance image feature characterization, an RFB-s module is employed in the middle layer of U-Net to improve the localization ability of multiscale features. The proposed fusion algorithm adopts an end-to-end supervised deep learning approach divided into three modules: feature extraction, feature fusion, and image reconstruction, where the feature extraction module uses the U-Net containing the multiscale dilated modules. Experimental results show that the fused images obtained using the proposed algorithm have clear detailed texture and are free of overlapping artifacts. Among all the multifocus image fusion algorithms used for comparison, the proposed algorithm is optimal in terms of the average gradient, visual information fidelity, and mutual information evaluation metrics, and achieves results close to optimal in the edge information retention metric. Meanwhile, the ablation experiment results further verify that the proposed multiscale dilated module remarkably enhances the feature extraction capability of the network, thereby improving the quality of image fusion.
Aiming at the negative correlation between the depth of field and resolution of surgical microscopes, this study proposes an image acquisition scheme based on an ophthalmic surgical microscope system to expand the depth of field while maintaining high-resolution imaging. The two acquired binocular images provide a large depth of field and a high resolution, respectively. A binocular image-fusion algorithm was then designed according to the imaging characteristics obtained after the transformation. The focus detection results were employed as initial fusion decision maps, which were subsequently refined by combining the color and texture information of the images, and the detailed information was effectively fused through double-scale decomposition. Experimental results show that the proposed scheme and fusion algorithm highlight the details of the high-resolution image while preserving the clear range of the large-depth-of-field image. The depth-of-field enhancement achieved using the proposed algorithm is greater than 50% compared with that of the original algorithm. Overall, the proposed method is suitable for the visual observation of large-depth-of-field, high-resolution surgical microscopic images, intraoperative two- and three-dimensional displays, and postoperative image preservation and analysis.
Most traditional fully supervised person search approaches are applicable to only one data domain and have limited generalization ability on unknown data domains. Researchers have recently begun studying domain-adaptive person search, which aims to improve the generalization ability of a model on unknown target domains, where domain alignment and the generation of reliable positive and negative samples are the primary challenges. To this end, a domain-adaptive person search approach with diverse image and instance augmentation is proposed, which aims to effectively achieve domain alignment and reliable positive and negative sample generation. This approach introduces two novel modules: a source-domain image augmentation module and a negative-enhanced re-id learning module. The former improves the domain adaptation ability of the model and the detection precision on target domains by enhancing only the source-domain data diversity. The latter introduces a diverse-negative mining module to enrich the diversity of negatives and improve the discriminability of the learned re-id features. The proposed modules are used only during training and therefore do not increase inference time at test time. Experiments were performed on two widely used person search datasets, CUHK-SYSU and PRW, demonstrating the effectiveness of the proposed approach and its superiority over existing person search approaches. For instance, the proposed approach achieves a mean average precision (mAP) of 40.8% on the PRW test set, exceeding the existing domain-adaptive approach DAPS by 6.1 percentage points.
Aiming at the problem of six-degree-of-freedom pose estimation for noncooperative targets in space, this study designs a lightweight network named LSPENet based on convolutional neural networks, which realizes end-to-end pose estimation without manually designed features. Depth-separable convolution and efficient channel attention (ECA) are used to form the basic module, balancing the complexity and accuracy of the network. One branch is designed for location estimation using direct regression, and another branch is designed for orientation estimation by introducing soft-assignment coding. Experimental results on the URSO dataset show that soft-assignment-coding-based orientation estimation exhibits substantially smaller errors than direct regression-based orientation estimation. Furthermore, compared with the existing end-to-end pose estimation network, the proposed network reduces the parameter count by 76.7% and the single-image inference time by 13.3%, while improving location estimation accuracy by 54.6% and orientation estimation accuracy by 57.8%. Overall, LSPENet provides a new approach for on-board monocular visual pose estimation.
This paper proposes a feature matching method that combines an adaptive keyframe strategy with motion information to address the problem that the feature matching accuracy of the visual inertial navigation system decreases due to blurred imaging and maneuvering in dynamic environments. First, we propose an adaptive keyframe strategy to improve the quality of keyframe selection by establishing an updating criterion for keyframes based on four indicators: time, inertial motion, imaging clarity, and parallax. Second, the common viewing region among adjacent keyframes is identified through geometric transformation of the image based on inertial motion to enhance feature detectability. Next, an improved Oriented FAST and Rotated BRIEF (ORB) feature method based on the Gaussian image pyramid is used to improve the matching accuracy of feature points. Finally, the performance of the proposed method is verified using EuRoC public datasets. The results show that the proposed method has better accuracy and robustness in applications with dynamic scenes, such as illumination changes and image blur.
The rapid and accurate three-dimensional (3D) reconstruction of ocean waves holds paramount significance for marine engineering research. To address the low processing efficiency of traditional ocean wave 3D reconstruction algorithms and the loss of accuracy caused by excessive holes in the generated point clouds, this paper proposes an approach that combines a disparity mask with self-supervised learning for 3D ocean wave reconstruction. First, disparity images are obtained by training a network model based on image reconstruction, disparity smoothness, and left-right disparity consistency losses. Second, a mask decoder is added to generate disparity mask images. Finally, by leveraging prior knowledge of common disparity regions, a novel mask loss function is designed to mitigate the impact of disparity noise in non-common regions and of ocean surface occlusion. The experimental results on the Acqua Alta dataset demonstrate that the proposed method effectively reduces noise in ocean wave point clouds. With precision close to that of the traditional algorithm, the point cloud reconstruction speed reaches 0.024 seconds per frame.
In unmanned aerial vehicle aerial photography, images obtained from disparate sensors often exhibit significant parallax and resolution disparities, which can cause image registration to fail. To address this challenge, this study introduces an approach for registering infrared and visible light images using a rotation-invariant Gabor representation descriptor. The method first solves the image's weighted matrix and then applies the Harris algorithm to the weighted matrix within the context of phase congruency, thereby pinpointing the image's key features. Subsequently, the Gabor representation framework is refined to precisely ascertain the orientation of the key features, effectively mitigating the impact of substantial parallax. To further enhance the process, the nearest neighbor matching (NNM) algorithm, in tandem with fast sampling consistency (FSC), is deployed to filter out outliers and improve matching accuracy. The technique achieves average accuracies of 46%, 72%, and 62% on the CVC-15 stereo, LWIR-RGB long-wave infrared, and proprietary datasets, respectively, with corresponding average processing times of 6.886 s, 7.800 s, and 9.631 s. Experimental results demonstrate the efficacy of the proposed method, particularly in scenarios where the images to be registered present considerable parallax and resolution differences.
This paper proposes an improved adaptive two-dimensional gamma correction method based on the illumination component and target mean value to address the issue of over-enhancement in nonuniformly illuminated images. The process begins with the conversion of images to the HSV space, from which the V-channel image is extracted for processing. Utilizing the illumination-reflection model, the illumination component is estimated through a guided image filter with good edge retention. Concurrently, the V-channel image region is segmented into bright and dark regions, and a target mean function with varying adjustment coefficients is established. The illumination component and adaptive target mean value are used to act on the gamma function for two-dimensional gamma correction, and histogram equalization is subsequently performed. The final output is obtained by merging V-channel component with the H and S channels and converting it back to the RGB space. Experimental evaluations on DICM and LIME datasets reveal that in comparison to four typical enhancement algorithms, the proposed algorithm achieves an average increase of 10.6% in information entropy, 97.5% in mean gradient (MG), and 77.8% in signal-to-noise ratio (SNR), with an average processing time of 0.32 s. These enhancements significantly improve the visual quality of images, making them more suitable for machine vision research. The proposed algorithm offers advantages in terms of high real-time performance and simplicity and produces output images with more natural colors, uniform brightness, clearer details, and an overall enhanced visual effect.
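The two-dimensional gamma correction described above can be written compactly: each pixel's gamma exponent is driven by the local illumination estimate, so dim regions are lifted more than bright ones. The sketch below substitutes a Gaussian blur for the guided filter and a single global mean for the paper's adaptive target mean function, so it is only an illustrative approximation of the method.

```python
import numpy as np
import cv2

def adaptive_gamma_v(bgr, sigma=15):
    """Two-dimensional gamma correction of the V channel driven by an illumination map."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    v = hsv[..., 2].astype(np.float32)
    # Illumination estimate; a large Gaussian stands in for the guided filter here.
    illum = cv2.GaussianBlur(v, (0, 0), sigmaX=sigma)
    m = float(illum.mean())
    gamma = np.power(0.5, (m - illum) / max(m, 1e-6))     # pixel-wise gamma map
    v_corr = 255.0 * np.power(np.clip(v / 255.0, 0.0, 1.0), gamma)
    hsv[..., 2] = np.clip(v_corr, 0, 255).astype(np.uint8)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```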
In lithology classification, the feature information obtained from a single data source is limited; hence, multisource data fusion is an important means of improving classification accuracy. As typical remote sensing data sources, aerial remote sensing images and digital elevation models provide complementary spectral and elevation information. To improve the accuracy of lithology classification, a new lithology classification method for multisource remote sensing data is proposed that combines channel-spatial attention mechanisms with multiscale convolutional neural networks. The method enhances the ability of convolutional neural networks to learn deep features of aerial remote sensing images and digital elevation models by designing a multiscale dilated convolution module, which better captures the spatial relationships of features and effectively eliminates the structural differences of heterogeneous data in the original data space. By designing local and global multiscale channel-spatial attention modules, different weights are adaptively assigned to the spectral channels and spatial regions of the multisource data, enabling more targeted training of the network based on feature significance and further improving the classification performance of the model. Finally, a basin in Sichuan province is taken as the study area to validate the proposed techniques. The experimental results show that the proposed method significantly outperforms four typical machine learning methods in overall accuracy and average accuracy, which proves that the proposed multisource data fusion method can make full use of the complementary advantages of different data sources and effectively improve the discrimination accuracy of geological lithology.
Low-light object detection is a major challenge in object detection tasks. Conventional methods for object detection exhibit significant performance degradation under low-light conditions, and existing low-light object detection methods consume excessive computational resources, making them unsuitable for deployment on devices with limited computing capabilities. To address these issues, this study proposes an end-to-end lightweight object detection algorithm called low-light YOLO (LL-YOLO). To tackle the problem of unclear and difficult-to-learn features in low-light images, a low-light image generation algorithm is designed to generate low-light images for training the detector, assisting it in learning feature information in low-light environments. In addition, the network structure of the detector is adjusted to reduce the loss of feature information during computation, thereby enhancing the model's sensitivity to feature information. Furthermore, to mitigate the problem of severe noise interference on feature information in low-light images, an aggregation ELAN (A-ELAN) module for aggregating peripheral information is proposed that uses depth-wise separable convolution and attention mechanisms to capture contextual information, enhance the obtained feature information, and weaken the impact of noise. Experimental results demonstrate that the LL-YOLO algorithm achieves a mAP@0.5 of 81.1% on the low-light object detection dataset ExDark, which is an improvement of 11.9 percentage points over that of the directly trained YOLOv7-tiny algorithm. The LL-YOLO algorithm exhibits strong competitiveness against existing algorithms.
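For intuition on the low-light image generation step used to train the detector, the following sketch applies a gamma darkening curve and additive Gaussian noise to a well-lit image. The specific degradation model, parameter values, and function name are hypothetical stand-ins; the paper's generation algorithm may differ in detail.

```python
import numpy as np

def synth_low_light(img, gamma=3.0, noise_sigma=8.0, seed=None):
    """Darken a well-lit image with a gamma curve and add Gaussian sensor noise."""
    rng = np.random.default_rng(seed)
    dark = 255.0 * np.power(img.astype(np.float32) / 255.0, gamma)   # global darkening
    noisy = dark + rng.normal(0.0, noise_sigma, img.shape)            # read-out noise
    return np.clip(noisy, 0, 255).astype(np.uint8)
```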
This study presents a nonuniform dehazing method based on hierarchical weight interaction and Laplacian prior to address the issues of detail loss and residual haze in nonuniform hazy images, which often result in degraded image quality. First, a hierarchical weight interaction module is introduced in the baseline network to adaptively adjust weights and perform a weighted fusion of feature maps at different scales. Furthermore, a global receptive field aggregation module is introduced to enrich the receptive field, allowing the model to comprehensively understand the content information in the image. Then, a frequency domain information branch is introduced to decompose the image into low-frequency and high-frequency components using wavelet functions. The low-frequency component contains global structural information, whereas the high-frequency component provides detailed local information. This decomposition collectively enhances the image clarity. Finally, a Laplacian loss is incorporated to reconstruct the image, effectively restoring its fine-grained features and improving the quality of the generated images. Experimental results show that the proposed algorithm achieves superior results on the test set, with an increase in peak signal-to-noise ratio (PSNR) by 0.8 dB, 1.54 dB, 1.14 dB, and 0.23 dB compared with the original algorithm on four datasets.
Domain diversity between different datasets poses an evident challenge for adapting a person re-identification (Re-ID) model trained on one dataset to another. State-of-the-art unsupervised domain adaptation methods for person Re-ID optimize the pseudo labels created by clustering algorithms on the target domain; however, the inevitable label noise caused by the clustering procedure is ignored. Such noisy pseudo labels substantially hinder the model's ability to further improve feature representations on the target domain. To address this problem, this study proposes a mutual teaching approach for unsupervised domain adaptation of person Re-ID based on relation-aware attention (RAA) and local feature relearning (FRL). For feature extraction, we employ multi-channel attention to capture the corresponding local features of a person and use spatial-channel correspondence to relearn discriminative fine-grained details of global and local features, thereby enhancing the network's feature representation capabilities. We also use RAA to steer the two networks toward different feature regions to enhance their distinctiveness and complementarity. Extensive experiments were conducted on public datasets to validate the proposed method. The experimental results show that the proposed method performs well in multiple person Re-ID tasks.
A common challenge in microexpression recognition using convolutional neural networks is the increased complexity that accompanies higher accuracy. To address this challenge, this study introduces an enhanced lightweight dual-stream attention network, called the enhanced dual-stream MISEViT network (EDSMISEViTNet), for microexpression recognition. First, microexpression samples are preprocessed, and apex frames are extracted as spatial features. Additionally, the TV-L1 optical flow method is used to extract the temporal features between the onset frame and the apex frame of each sample. Furthermore, this study improves the MobileViT network by designing an MI module that combines Inception and SE modules and introduces an attention module for efficient feature extraction. The temporal and spatial features are separately fed into this network, and the resultant features are concatenated, fused, and subsequently classified. To enhance precision, the CASME II, SAMM, and SMIC datasets are combined into a composite dataset for experimentation. The results reveal that the proposed model requires a training parameter count of only 3.9×10⁶ and processes a single sample in just 71.8 ms. Compared with existing methods, this approach achieves excellent accuracy while maintaining a low parameter count.
To address issues such as detail loss, artifacts, and unnatural appearance associated with current low-illumination image enhancement algorithms, a multiscale low-illumination image enhancement algorithm based on brightness equalization and edge enhancement is proposed in this study. Initially, an improved Sobel operator is employed to extract edge details, yielding an image with enhanced edge details. Subsequently, the brightness component (V) of the HSV color space is enhanced using Retinex, and brightness equalization is accomplished via improved Gamma correction, yielding an image with balanced brightness. The Laplacian weight graph, significance weight graph, and saturation weight graph are computed for the edge detail-enhanced image and brightness-balanced image, culminating in the generation of a normalized weight graph. This graph is then decomposed into a Gaussian pyramid, while the edge detail-enhanced image and brightness-balanced image are decomposed into a Laplacian pyramid. Finally, a multiscale pyramid fusion strategy is employed to merge the images, resulting in the final enhanced image. Experimental results demonstrate that the proposed algorithm outperforms existing algorithms on the LOL dataset in terms of average peak signal to noise ratio, structural similarity, and naturalness image quality evaluator. This algorithm effectively enhances the contrast and clarity of low-illumination images, resulting in images with richer detail information, improved color saturation, and considerably enhanced quality.
In this study, a YOLOv5-based underwater object detection algorithm is proposed to address the challenges of mutual occlusion among underwater marine organisms, low detection accuracy for elongated objects, and the presence of numerous small objects in underwater marine biological detection tasks. To redesign the backbone network and improve its feature extraction capability, the algorithm introduces deformable convolutions, dilated convolutions, and attention mechanisms, mitigating the issues of mutual occlusion and low detection accuracy for elongated objects. Furthermore, a weighted explicit visual center feature pyramid module is proposed to address insufficient feature fusion and reduce missed detections of small objects. Moreover, the network structure of YOLOv5 is adjusted to incorporate a small-object detection layer that uses the fused attention mechanism, improving the detection performance for small objects. Experimental results reveal that the improved YOLOv5 algorithm achieves a mean average precision of 87.8% on the URPC dataset, an improvement of 5.3 percentage points over the original YOLOv5 algorithm, while retaining a detection speed of 34 frame/s. The proposed algorithm effectively improves precision and reduces missed and false detection rates in underwater object detection tasks.
Most intravascular ultrasound (IVUS) image segmentation methods fail to capture global information, and the topological relationships in their segmentation results do not conform to medical prior knowledge, which affects subsequent diagnosis and treatment. To address these issues, an image segmentation method is proposed that combines a dual-branch backbone network of convolutional neural networks (CNN) and Transformer with a topology-forcing network. The backbone network is constructed by juxtaposing a CNN branch and a Transformer branch to fuse local and global information. The modules composing the Transformer branch combine an axial self-attention mechanism with an enhanced mixed feedforward network to adapt to small datasets. In addition, connecting the topology-forcing network after the backbone network and using a bilateral filtering smoothing layer instead of a Gaussian filtering smoothing layer further improves segmentation accuracy while ensuring the topological correctness of the segmentation results. The experimental results show that the Jaccard coefficients of the lumen and media obtained by the proposed method are 0.018 and 0.016 higher than those of the baseline network, and the Hausdorff distance coefficients are improved by 0.148 and 0.288, respectively; moreover, the accuracy of the topological structure is 100%. This method provides accurate, reliable, and topologically correct segmentation results for IVUS images and performs well in terms of visualization results and various evaluation indicators.
With the gradual expansion of human activities into space, Earth's outer space, especially the geosynchronous orbit, is becoming increasingly crowded. A large amount of space debris is generated from abandoned space equipment and the waste of space activities. Scattered space debris may cause space accidents, leading to damage or derailment of space equipment; therefore, space object detection systems are of great significance for ensuring the safety of the space environment. Stellar image preprocessing can improve image quality and the target signal-to-noise ratio (SNR), which is significant for space target recognition, space target tracking, spacecraft navigation, and spacecraft attitude determination. This study focuses on image denoising, background correction, threshold processing, and centroid extraction. The existing methods and their advantages and disadvantages are summarized, and corresponding improvements are proposed. For image denoising and background correction, different algorithms are validated using a real stellar image; the processing effects are analyzed using the SNR gain and background suppression factor, and the effects for targets with different SNRs are analyzed. Consequently, a neighborhood maximum filtering method and an improved background correction method are proposed. For threshold processing, we analyze the histogram characteristics of real stellar images and propose an iterative adaptive threshold method based on them. For centroid extraction, we use Gaia data to generate a simulated stellar image based on the Gaussian point spread function; after adding white noise, we analyze the sub-pixel centroid extraction error and computation time of different algorithms. Finally, based on the study results, the urgent needs of future space target recognition are pointed out, and relevant suggestions are proposed.
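For the centroid extraction step, a common baseline is the intensity-weighted (gray-scale) centroid over a background-subtracted window, as sketched below; the window selection and background estimation are simplified assumptions here, and the study compares several such sub-pixel methods rather than prescribing this one.

```python
import numpy as np

def weighted_centroid(patch, background=0.0):
    """Sub-pixel centroid of a star/target window by intensity-weighted averaging."""
    w = np.clip(patch.astype(np.float64) - background, 0.0, None)  # remove the sky level
    ys, xs = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
    total = w.sum()
    if total <= 0:
        return None                                                # no signal in the window
    return (xs * w).sum() / total, (ys * w).sum() / total          # (x, y) in pixel units
```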
Accurate detection of foreign objects on ballastless railway track beds is crucial for ensuring train safety. Although unsupervised anomaly detection algorithms based on deep learning mitigate the effect of insufficient abnormal data on detection, the generalization ability of the encoder allows it to reconstruct anomalous instances proficiently as well, thereby degrading detection accuracy. To solve this problem, this study proposes an anomaly detection framework for ballastless track beds based on image inpainting. First, inpainting was employed to obscure and subsequently restore the image, with training on nonanomalous and incomplete image data, aiming to improve the model's contextual semantic understanding and enhance its reconstruction ability. Second, the maximum value of the average anomaly map between the test and reconstructed images, analyzed across multiple scales, was utilized as the reconstruction error to calculate the anomaly score, widening the reconstruction error boundary between abnormal and normal images. Finally, experimental results show a notable advantage of the proposed algorithm over alternative methods on public datasets, such as MNIST and CIFAR-10, as well as on the ballastless track bed dataset.
Existing deep-learning-based high dynamic range (HDR) image reconstruction methods are prone to losing detailed information and producing poor color saturation when the input image is overexposed or underexposed. To address this issue, we propose an HDR image reconstruction method based on a dual-attention network. First, the method utilizes a dual-attention module (DAM) to apply attention along the pixel and channel dimensions, respectively, to extract and fuse the features of two overexposed or underexposed source images and obtain a preliminary fusion image. Next, a feature enhancement module (FEM) is constructed to perform detail enhancement and color correction on the fused image. Finally, contrastive learning is introduced to pull the generated image closer to the reference image and push it away from the source images. After multiple rounds of training, the HDR image is generated. The experimental results show that the proposed method achieves the best evaluation results for peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and learned perceptual image patch similarity (LPIPS), and the generated HDR images exhibit good color saturation and accurate details.
Aiming at the problem that the inter-layer resolution of computed tomography (CT) sequence images is much lower than the intra-layer resolution, an inter-layer interpolation network for CT images combining a feature pyramid and deformable separable convolution is proposed. The network consists of two modules: an image generation module and an image enhancement module. The image generation module uses MultiResUNet to extract features from the input images and applies two different sets of deformable separable convolutions to the input images to generate candidate inter-layer images. The image enhancement module fuses the multi-scale features of the input images through the feature pyramid and an image synthesis network, and generates additional images focusing on contextual details to further enhance the texture details of the candidate inter-layer images. The experimental results show that the inter-layer images generated by the proposed network achieve better results in both qualitative and quantitative analyses and perform better in handling image edge contours and texture details, effectively improving the inter-layer resolution of CT sequence images.
In few-shot image classification tasks, feature extraction modules based on convolutional neural networks struggle to capture long-range semantic information, and a single measure of edge-feature similarity is limiting. Therefore, in this study, we present a few-shot image classification method using a graph neural network based on the Swin Transformer. First, the Swin Transformer is used to extract image features, which serve as node features in the graph neural network. Next, the edge-feature similarity measurement module is improved by adding an additional metric, forming a dual-measurement module that calculates the similarity between node features; the obtained similarity is used as the edge-feature input of the graph neural network. Finally, the nodes and edges of the graph neural network are alternately updated to predict image class labels. The classification accuracy of the proposed method on the 5-way 1-shot task reaches 85.21%, 91.10%, and 91.08% on the Stanford Dogs, Stanford Cars, and CUB-200-2011 datasets, respectively, achieving significant results in few-shot image classification.
A small-target traffic sign detection algorithm based on multiscale feature fusion is proposed to address the limited effectiveness of existing target detection algorithms in detecting traffic signs that are small or have inconspicuous features. First, a bidirectional adaptive feature pyramid network is designed that uses skip connections to retain detailed features and enhance multiscale feature fusion. Second, a dual-head detection structure is proposed for the scale characteristics of small targets, focusing on small-target feature information while reducing the number of model parameters. Next, the Wise-IoU v3 bounding box loss function with a dynamic nonmonotonic focusing mechanism is used, and its anchor-box gradient gain allocation strategy reduces the harmful gradients generated by low-quality examples. Finally, coordinate convolution (CoordConv) is incorporated into the feature extraction network to enhance the spatial awareness of the model by improving its focus on coordinate information. The experimental results on the Tsinghua-Tencent 100K dataset show that the improved model achieves a mean average precision (mAP) of 87.7%, a 2.2-percentage-point improvement over YOLOv5s, with a parameter count of only 6.3×10⁷, thereby achieving a detection effect with fewer parameters and higher accuracy.
To address the low hyperspectral image (HSI) pixel classification accuracy caused by the small number of labeled training samples and high-dimensional spectral data, this study proposes a self-supervised feature extraction method that extracts low-dimensional features representing the crucial information of HSI data. First, unsupervised data augmentation was used to expand the HSI training dataset. Then, a feature extraction module constructed from an external attention module was trained on the extended training dataset within a self-supervised framework. The self-affinity features between the bands of a single sample and the potential correlations between different samples were extracted under the supervision of the intrinsic co-occurrence attributes of the spectral data. Finally, the trained feature extraction module was applied to reduce the dimensionality of the raw HSI data, and the resulting low-dimensional features were input to a subsequent classifier to classify the HSI pixels. The feasibility and effectiveness of the proposed method were evaluated through quantitative evaluation of the dimensionality reduction results on the Indian Pines, Salinas, and Pavia University datasets. The experimental results show that the feature extraction module generated by the proposed method can fully extract the spatial-spectral features of the original spectra. The proposed method is insensitive to the size of the training set and is suitable for small-sized HSI data.
To resolve the problems of focus-edge blurring, artifacts, and block effects in multifocus image fusion, an algorithm based on low-rank and sparse matrix decomposition (LRSMD) and the discrete cosine transform (DCT) is designed. First, the source images were decomposed into low-rank and sparse matrices using LRSMD. Subsequently, a DCT-based method was designed to detect the focus regions in the low-rank matrix part and obtain the initial focus decision map, which was then verified using a repeated consistency verification method. Meanwhile, a fusion strategy based on morphological filtering was designed to obtain the fusion result of the sparse matrix part. Finally, the two parts were fused using a weighted reconstruction method. The experimental results show that the proposed algorithm achieves high clarity and full focus in subjective evaluations. In objective evaluations, the best results for the four metrics, namely edge information retention, peak signal-to-noise ratio, structural similarity, and correlation coefficient, are improved by 62.3%, 6.3%, 2.2%, and 6.3%, respectively, compared with the other five mainstream algorithms. These results demonstrate that the proposed algorithm effectively improves the extraction of focused information from the source images, enhances the focused edge detail information, and substantially reduces artifacts and block effects.
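To illustrate how a DCT response can score focus, the sketch below computes, for each 8×8 block of the low-rank component, the energy of the AC coefficients; comparing the maps of two source images block by block gives an initial decision map. The block size, the energy criterion, and the comparison rule are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np
import cv2

def dct_focus_map(img, block=8):
    """Block-wise focus score from the AC energy of the 2D DCT."""
    h, w = img.shape[:2]
    fmap = np.zeros((h // block, w // block), np.float32)
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            d = cv2.dct(img[i:i + block, j:j + block].astype(np.float32))
            d[0, 0] = 0.0                              # drop the DC term
            fmap[i // block, j // block] = float((d ** 2).sum())
    return fmap

# Initial decision map: per block, keep whichever source image is better focused.
# decision = (dct_focus_map(src_a) > dct_focus_map(src_b)).astype(np.uint8)
```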
The emergence and application of attention mechanisms have addressed some limitations of neural networks concerning the utilization of global information. However, common attention modules suffer from receptive fields that are too small to focus on overall information, while existing global attention modules tend to incur high computational costs. To address these challenges, a lightweight, universal attention module, termed the "global-sampling spatial-attention module", is introduced based on convolution, pooling, and comparison operations. This module relies on the comparison operations to derive spatial-attention maps for the intermediate feature maps generated during deep network inference. Moreover, the module can be directly integrated into convolutional neural networks at minimal cost and trained end to end with the networks. The module was validated using a randomly selected subset of the ImageNet-1K dataset and a proprietary low-slow-small drone dataset. Experimental results show that, compared with other modules, this module yields an improvement of approximately 1 to 3 percentage points in image classification and small object detection and recognition tasks. These findings underscore the efficacy of the proposed module and its applicability to small object detection.
To solve the problem of insufficient use of source image information by existing fusion methods, a method is proposed that uses a rolling guidance filter and anisotropic diffusion to extract the base and detail layers of an image, respectively. These layers were fused using visual saliency maps and constructed weight maps, and the fused layers were merged with appropriate weights into the final image. The proposed method was tested and verified on several scenes from an open dataset. The experimental results show that the final images obtained using the proposed method exhibit better contrast, retain richer texture features at edge details, and maintain a uniform pixel intensity distribution; furthermore, their visual effects and fusion accuracy surpass those of other existing fusion methods. The method also achieves significant gains in indicators such as average gradient, information entropy, and spatial frequency.
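A minimal sketch of the base/detail split and saliency-weighted fusion is shown below; it approximates the detail layer as the residual of the rolling guidance filter and uses local contrast as a saliency proxy, both simplifying assumptions (cv2.ximgproc requires opencv-contrib-python).

```python
import cv2
import numpy as np

def base_detail_split(img):
    """Base layer via the rolling guidance filter; detail layer as the residual."""
    base = cv2.ximgproc.rollingGuidanceFilter(img, d=-1, sigmaColor=25.0,
                                              sigmaSpace=3.0, numOfIter=4)
    return base.astype(np.float32), img.astype(np.float32) - base.astype(np.float32)

def saliency_weight(img):
    """Simple saliency proxy: deviation of the smoothed image from its mean."""
    blur = cv2.GaussianBlur(img.astype(np.float32), (5, 5), 0)
    return np.abs(blur - blur.mean())

def fuse(img_a, img_b):
    base_a, det_a = base_detail_split(img_a)
    base_b, det_b = base_detail_split(img_b)
    wa, wb = saliency_weight(img_a), saliency_weight(img_b)
    w = wa / (wa + wb + 1e-9)                                         # per-pixel weight map
    base = w * base_a + (1.0 - w) * base_b                            # saliency-weighted base layer
    detail = np.where(np.abs(det_a) >= np.abs(det_b), det_a, det_b)   # max-magnitude detail layer
    return np.clip(base + detail, 0, 255).astype(np.uint8)
```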
In this study, a low-light image enhancement algorithm based on multiscale deep curve estimation is proposed to address the poor generalization ability of existing algorithms. Low-light image enhancement is achieved by learning the mapping between normal and low-light images at different scales. The parameter estimation network comprises three encoders of different scales and a fusion module, enabling efficient, direct learning from low-light images. Each encoder consists of cascaded convolutional and pooling layers, which facilitates the reuse of feature layers and improves computational efficiency. To strengthen the constraint on image brightness, a bright channel loss function is proposed. The proposed method is compared against six state-of-the-art algorithms on the LIME, LOL, and DICM datasets. Experimental results show that the proposed method produces enhanced images with vibrant colors, moderate brightness, and rich details, outperforming the conventional algorithms in both subjective visual effects and objective quantitative evaluations.
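The bright channel loss can be sketched as follows; pushing the locally pooled bright channel of the enhanced image toward 1 with a 15×15 window is an assumed formulation, not necessarily the paper's exact definition.

```python
import torch
import torch.nn.functional as F

def bright_channel_loss(enhanced, patch=15, target=1.0):
    """enhanced: (B, 3, H, W) tensor in [0, 1]."""
    bright = enhanced.max(dim=1, keepdim=True).values                   # per-pixel channel maximum
    bright = F.max_pool2d(bright, patch, stride=1, padding=patch // 2)  # local bright channel
    return F.mse_loss(bright, torch.full_like(bright, target))          # push toward the target value
```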
To move beyond the prevailing focus on merely reducing parameter counts in current efficient super-resolution reconstruction algorithms, this study introduces an efficient global attention network that addresses the neglect of hierarchical features and the underutilization of high-dimensional image features. The core of the network is a set of cross-adaptive feature blocks that perform deep feature extraction at different image levels, compensating for the lack of high-frequency detail. To enhance the reconstruction of edge details, a nearest-neighbor pixel reconstruction block was constructed that combines spatial correlation with pixel analysis. Moreover, a multistage dynamic cosine warm-restart training strategy was introduced; it stabilizes training and refines network performance through dynamic learning-rate adjustment, mitigating overfitting. Extensive experiments on five benchmark datasets, including Set5, show that the proposed method increases the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) by an average of 0.51 dB and 0.0078, respectively, while reducing the parameter count and floating-point operations (FLOPs) by an average of 332×10³ and 70×10⁹ compared with leading networks. In conclusion, the proposed method not only reduces complexity but also excels in quantitative metrics and visual quality, achieving remarkable network efficiency.
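A staged cosine warm-restart schedule of this kind can be expressed with PyTorch's built-in scheduler, as in the sketch below; the restart periods, learning rates, and stand-in model are illustrative, not the paper's settings.

```python
import torch

model = torch.nn.Conv2d(3, 3, 3, padding=1)   # stand-in for the super-resolution network
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=50, T_mult=2, eta_min=1e-6)  # restart periods of 50, 100, 200, ... epochs

for epoch in range(350):
    # ... run one training epoch over the SR dataset here ...
    optimizer.step()        # placeholder for the real parameter update
    scheduler.step()        # cosine decay within a stage, then warm-restart the learning rate
```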
A cardiac magnetic resonance image segmentation network that combines frequency-domain prior knowledge with feature fusion enhancement is proposed to address the unclear boundaries caused by the small grayscale differences among cardiac substructures and the varying shape and size of the right ventricular region, both of which degrade segmentation accuracy. The proposed model is a D-shaped network comprising a frequency-domain prior-guided subnetwork and a feature fusion enhancement subnetwork. First, the original image is transformed from the spatial domain to the frequency domain using the Fourier transform to extract high-frequency edge features, and the low-level features of the frequency-domain prior-guided subnetwork are combined with the corresponding stages of the feature fusion enhancement subnetwork to improve edge recognition. Second, a feature fusion module with local and global attention mechanisms is introduced at the skip connections of the feature fusion enhancement subnetwork to capture contextual information and rich texture details. Finally, a Transformer module is introduced at the bottom of the network to extract long-range semantic information, enhance the representational capacity of the model, and improve segmentation accuracy. Experimental results on the ACDC dataset show that, compared with existing methods, the proposed method achieves the best objective indicators and visual results. Accurate cardiac segmentation results can serve as a reference for subsequent image analysis and clinical diagnosis.
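Extracting a high-frequency edge prior with the Fourier transform can be sketched as follows; the circular low-frequency mask and the cutoff radius are illustrative assumptions, not the paper's exact prior.

```python
import numpy as np

def high_frequency_prior(img, cutoff=0.1):
    """img: 2D grayscale array; returns the high-frequency (edge) component."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    mask = radius > cutoff * min(h, w)           # suppress low frequencies near the spectrum center
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))
```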
At present, most multi-spectral pedestrian detection algorithms focus on methods for fusing visible and infrared images, but fully fusing multi-spectral images requires a huge number of parameters, which lowers detection speed. To solve this problem, we propose a highly time-efficient multi-spectral pedestrian detection algorithm based on YOLOv5s. To preserve detection speed, the visible and infrared images are merged along the channel dimension and used as the network input, and detection accuracy is improved through targeted modifications to the baseline network. First, some standard convolutions are replaced with deformable convolutions to enhance the network's ability to extract features of irregularly shaped objects. Second, the spatial pyramid pooling module is replaced with a multi-scale residual attention module, which weakens background interference with pedestrian targets and improves detection accuracy. Finally, by changing the connection mode and adding a large-scale feature concatenation layer, the minimum detection scale of the network is increased, improving the detection of small targets. Experimental results show that the improved algorithm has a clear advantage in detection speed and improves mAP@0.5 and mAP@0.5:0.95 by 5.1 and 1.9 percentage points, respectively, over the original algorithm.
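The channel-wise merge of visible and infrared frames used as the detector input can be sketched as follows; widening the first convolution to accept four channels is an assumption about how the YOLOv5s stem would be adapted.

```python
import torch
import torch.nn as nn

rgb = torch.rand(1, 3, 640, 640)            # visible-light image
ir = torch.rand(1, 1, 640, 640)             # infrared image
x = torch.cat([rgb, ir], dim=1)             # (1, 4, 640, 640) channel-merged input

first_conv = nn.Conv2d(4, 32, kernel_size=6, stride=2, padding=2)  # widened stem convolution
print(first_conv(x).shape)                  # torch.Size([1, 32, 320, 320])
```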
Herein, an end-to-end deep neural network based on the iterative adaptive filtering principle is proposed to address the significant edge blurring caused by the optical structure of simple lenses. A pixel-level deblurring filter is designed for a single cemented lens with a large field of view; it adapts effectively to spatially varying blur and restores the blurred features of the input image. The effectiveness of the proposed method is verified through simulations and experiments on a prototype camera system.
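A minimal PyTorch sketch of a pixel-level (spatially varying) deblurring filter is given below: a small network predicts a per-pixel kernel that is applied with unfold; the kernel size and prediction network are illustrative stand-ins, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelAdaptiveFilter(nn.Module):
    def __init__(self, k=5):
        super().__init__()
        self.k = k
        self.predict = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, k * k, 3, padding=1))          # one k*k kernel per pixel

    def forward(self, x):                                   # x: (B, 3, H, W)
        b, c, h, w = x.shape
        kernels = F.softmax(self.predict(x), dim=1)         # per-pixel filter weights
        patches = F.unfold(x, self.k, padding=self.k // 2)  # (B, 3*k*k, H*W)
        patches = patches.view(b, c, self.k * self.k, h, w)
        return (patches * kernels.unsqueeze(1)).sum(dim=2)  # spatially varying filtering
```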
Recently, attention mechanisms have been widely applied to image super-resolution reconstruction, substantially improving network performance. To maximize their effectiveness, this paper proposes an image super-resolution reconstruction algorithm based on an adaptive two-branch block comprising an attention branch and a nonattention branch. An adaptive weight layer dynamically balances the weights of the two branches while eliminating redundant attributes. Subsequently, a channel shuffle coordinate attention block was designed to achieve cross-group feature interaction and focus on the correlations between features across different network layers. Furthermore, a double-layer residual aggregation block was designed to enhance the feature extraction capability of the network and the quality of the reconstructed image, and a double-layer nested residual structure was constructed to extract deep features within the residual block. Extensive experiments on standard datasets show that the proposed method achieves better reconstruction results.
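The adaptive balancing of the attention and nonattention branches can be sketched as follows; the simple channel-attention branch and the softmax-normalized branch weights are assumptions, not the paper's exact block.

```python
import torch
import torch.nn as nn

class AdaptiveTwoBranchBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.plain = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.attn = nn.Sequential(                      # simple channel-attention branch
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.weight = nn.Parameter(torch.zeros(2))      # learnable adaptive branch weights

    def forward(self, x):
        w = torch.softmax(self.weight, dim=0)
        attn_out = x * self.attn(x)                     # attention branch output
        return x + w[0] * attn_out + w[1] * self.plain(x)
```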
Existing methods for obtaining LiDAR point object primitives still face challenges such as a large computational load and ineffective segmentation of different building roof planes. To address these challenges, a point object primitive extraction method based on multiconstraint graph segmentation is proposed, adopting a graph-based segmentation strategy. First, constraints on adjacent points are used to construct the graph structure, reducing graph complexity and improving algorithm efficiency. Subsequently, the angle between the normal vectors of adjacent nodes is constrained by a threshold so that points lying on the same plane are grouped into the same object primitive. Finally, a maximum side-length constraint is applied to separate the building point cloud from adjacent vegetation points. Three public test datasets provided by the International Society for Photogrammetry and Remote Sensing (ISPRS) and two datasets from Wuhan University were used to verify the validity of the proposed method. Experimental results show that the proposed method effectively separates different roof planes of buildings. DBSCAN and spectral clustering were used for comparison, with precision, recall, and F1 score as evaluation indexes. Compared with these two methods, the proposed method achieves the best overall segmentation results on the five datasets with different building environments, with better recall and F1 score.
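The multiconstraint grouping idea can be sketched with a k-nearest-neighbor graph in which edges are kept only when the normal-angle and maximum side-length constraints are satisfied; the thresholds, the k-NN neighborhood, and the use of connected components are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

def segment_primitives(points, normals, k=10, max_edge=1.0, max_angle_deg=10.0):
    """points, normals: (N, 3) arrays; returns a primitive label per point."""
    tree = cKDTree(points)
    dists, idx = tree.query(points, k=k + 1)               # first neighbor is the point itself
    cos_thr = np.cos(np.radians(max_angle_deg))
    rows, cols = [], []
    for i in range(points.shape[0]):
        for d, j in zip(dists[i, 1:], idx[i, 1:]):
            same_plane = abs(np.dot(normals[i], normals[j])) >= cos_thr  # normal-angle constraint
            if d <= max_edge and same_plane:                              # maximum side-length constraint
                rows.append(i)
                cols.append(j)
    n = points.shape[0]
    graph = coo_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))
    _, labels = connected_components(graph, directed=False)              # one label per primitive
    return labels
```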