Laser & Optoelectronics Progress
Co-Editors-in-Chief
Dianyuan Fan
2025
Volume: 62 Issue 4
39 Article(s)
Yihe Huang, Junli Wang, Longsheng Wang, Pengfa Chang, and Anbang Wang

In this study, we propose a novel scheme for expanding chaotic keys used to encrypt and decrypt image and video signals. The process begins with achieving chaos synchronization in semiconductor lasers driven by a common signal over a 130-km optical fiber link, with a synchronization coefficient of 0.945. The resulting synchronized chaotic signal is processed through dual-threshold quantization, and error bits are removed through lower-triangle reconciliation, thereby yielding consistent keys at 1 Gbit/s. These keys are expanded to 80 Gbit/s using the Mersenne twister algorithm. Analysis shows that the expanded keys pass the NIST tests, demonstrating good randomness and security. Finally, encryption and decryption of image and video signals using these expanded keys are experimentally demonstrated.
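
Two of the post-processing steps named above are easy to illustrate: dual-threshold quantization and Mersenne-twister key expansion. The sketch below is illustrative rather than the authors' implementation: the waveform, thresholds, and expansion routine are stand-ins, and Python's random module is used only because it happens to implement the MT19937 Mersenne twister.

```python
import random
import numpy as np

rng = np.random.default_rng(0)
chaotic = rng.normal(size=10_000)        # stand-in for the synchronized chaotic waveform

# Dual-threshold quantization: discard samples in the dead zone between the
# thresholds; map samples above the upper threshold to 1, below the lower to 0.
upper, lower = 0.5, -0.5                 # illustrative thresholds
bits = [1 if x > upper else 0 for x in chaotic if x > upper or x < lower]

# Key expansion: seed a Mersenne-twister PRNG with the reconciled key and
# draw a bit stream 80 times longer, mirroring the 1 -> 80 Gbit/s expansion.
seed = int("".join(map(str, bits[:64])), 2)
mt = random.Random(seed)                 # CPython's random.Random is MT19937
expanded = mt.getrandbits(80 * len(bits))
```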

Feb. 25, 2025
  • Vol. 62 Issue 4 0437001 (2025)
  • Ruifang Zhang, Yiting Du, and Xiaohui Cheng

    This paper proposes a new bidirectional weighted multiscale dynamic approach, the BiEO-YOLOv8s algorithm, to enhance the detection of small targets in aerial images. It effectively addresses challenges such as complex backgrounds, large-scale variations, and dense targets. First, we design a new ODE module to replace certain C2f modules, enabling the accurate, quick, and multiangle location of target features. Then, we develop a bidirectional weighted multiscale dynamic neck network structure (BiEO-Neck) to achieve deep fusion of shallow and deep features. Next, adding a small object detection head further enhances feature extraction capability. Finally, the generalized intersection union ratio boundary loss function is used to replace the original boundary loss function, thereby enhancing the regression performance of the bounding box. Experiments conducted on the VisDrone dataset demonstrate that, compared with the base model YOLOv8s, the proposed model achieves an improvement of 6.1 percentage points in mean average precision, with a detection time of only 4.9 ms. This performance surpasses that of other mainstream models. The algorithm's effectiveness and adaptability are further confirmed through universality testing on the IRTarget dataset. The proposed algorithm can efficiently complete target detection tasks in unmanned aerial vehicle aerial images.
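
    The "generalized intersection union ratio" loss adopted here is commonly written GIoU, which augments IoU with a penalty for the empty area of the smallest enclosing box; the snippet below is a minimal sketch for axis-aligned boxes, not the paper's code.

```python
def giou(a, b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])      # smallest enclosing box C
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    c = (cx2 - cx1) * (cy2 - cy1)
    return inter / union - (c - union) / c           # the loss is 1 - GIoU

print(giou((0, 0, 2, 2), (1, 1, 3, 3)))              # partially overlapping boxes
```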

    Feb. 25, 2025
  • Vol. 62 Issue 4 0437002 (2025)
  • Xiaodong Zhang, Linghan Zhu, Shaoshu Gao, Xinrui Wang, and Shuo Wang

    To address the issue of limited denoising effectiveness caused by the lack of ground-truth images during the training of self-supervised image denoising methods, a multistage self-supervised denoising method based on a memory unit is proposed. The memory unit modularly stores intermediate denoising results, which resemble clear images, and collaboratively supervises the network training process. This ability allows the network to learn not only from noisy images but also from the intermediate outputs during training. Additionally, a multistage training scheme is introduced to separately learn features from flat and textured areas of noisy images, while a spatial adaptive constraint balances noise removal and detail retention. Experimental results show that the proposed method achieves peak signal-to-noise ratios of 37.30 dB on the SIDD dataset and 38.52 dB on the DND dataset, with structural similarities of 0.930 and 0.941, respectively. Compared with existing self-supervised image denoising methods, the proposed method remarkably improves both visual quality and quantitative metrics.
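
    For reference, the two reported metrics can be computed as in the sketch below (random placeholder images, assumed 8-bit): PSNR directly in NumPy, and SSIM via scikit-image's structural_similarity.

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr(ref, img, peak=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    mse = np.mean((ref.astype(np.float64) - img.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.random.randint(0, 256, (64, 64), dtype=np.uint8)   # placeholder "clean" image
img = np.clip(ref + np.random.normal(0, 5, ref.shape), 0, 255).astype(np.uint8)
print(psnr(ref, img), structural_similarity(ref, img))
```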

    Feb. 25, 2025
  • Vol. 62 Issue 4 0437003 (2025)
  • Haoqian Wang, Ju Liu, Teng Li, Zhongjie Xu, Xiang'ai Cheng, and Zhongyang Xing

    To address the limitations of traditional image restoration techniques in accurately restoring laser interference images, this paper proposes a novel deep learning framework. This framework leverages convolutional neural networks and a multi-head attention mechanism to extract multi-scale features, thereby enhancing the understanding and restoration of image structures. Experiments are conducted on a synthetic laser interference image dataset comprising 5 scenes, each containing 5000 images. Experimental results reveal that the proposed framework visually restores images affected by laser interference and achieves high peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). In particular, the PSNR and SSIM values for the reconstructed images, across various levels of image damage, exceed 34 dB and 0.98, respectively. The proposed method holds promise for broad applications in laser interference scenarios and offers valuable support for military defense and civilian technologies.

    Feb. 25, 2025
  • Vol. 62 Issue 4 0437004 (2025)
  • Zhichao Zhou, Jianping Zhou, Xiaodong Yang, Xiaojing Wan, and Binbin Gui

    Inspection robots have become critical tools for roller detection in belt conveyors. However, the infrared images detected by these robots often suffer from low resolution and a low signal-to-noise ratio, thereby introducing higher requirements for target detection algorithms. In this study, we propose improvements to inspection robots for roller detection tasks in belt conveyors based on the YOLOv5 network. Inspired by DenseNet, we first introduce dense connection modules into the YOLOv5 network to enhance its feature extraction capabilities. We then introduce a Wise-IoU (WIoU) loss function to evaluate the quality of anchor rectangles and in turn improve network performance and generalization capabilities. Experimental evaluations on a dataset of infrared data collected by inspection robots on belt conveyors demonstrate that, compared with the original YOLOv5, the recall rate and mean average precision are improved by 2.4 percentage points and 1.5 percentage points, respectively (with the latter reaching 98%), while a recognition speed of 80 frame/s and model size of 15 MB are maintained. The improved inspection robot features a small size, fast speed, and high efficiency.
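
    The dense connections borrowed from DenseNet concatenate each layer's input with all preceding feature maps; the PyTorch block below is a minimal sketch of that pattern, with illustrative channel counts rather than the paper's exact module.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer receives the concatenation of all preceding feature maps."""
    def __init__(self, in_ch=64, growth=32, n_layers=3):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch + i * growth, growth, 3, padding=1),
                nn.BatchNorm2d(growth),
                nn.SiLU(),
            )
            for i in range(n_layers)
        )

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))   # dense connection
        return torch.cat(feats, dim=1)

y = DenseBlock()(torch.randn(1, 64, 32, 32))               # -> (1, 160, 32, 32)
```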

    Feb. 25, 2025
  • Vol. 62 Issue 4 0437006 (2025)
  • Yabo Liu, Xiaoquan Yang, and Tao Jiang

    To address the difficulty existing methods face in achieving a high compression ratio and low distortion when processing high-dynamic-range macaque whole-brain data, this paper proposes an end-to-end multi-scale compression network based on the U-Net framework. First, the stability of the network is increased and high-frequency information of the image data is preserved by establishing multi-level controllable skip connections between the compression module and the reconstruction module. Then, the data output by the coding module are processed with straight-through estimation quantization to accelerate the modeling process of the probability model and improve the compression ratio. Experimental results show that the rate-distortion curves of the network on the cellular architecture dataset and the nerve fiber dataset are better than those of other mainstream deep learning methods and the traditional JPEG2000 method. Under a compression ratio of 160, the multi-scale structural similarity index is not less than 0.99.

    Feb. 25, 2025
  • Vol. 62 Issue 4 0437007 (2025)
  • Shun Cheng, Jianrong Li, Zhiqian Wang, Shaojin Liu, and Muyuan Wang

    To address the challenges of low recognition accuracy and high computational complexity in underwater optical target recognition algorithms, a lightweight YOLOv8 underwater optical recognition algorithm based on automatic color equalization (ACE) image enhancement is proposed. Initially, we apply the ACE image enhancement algorithm to preprocess images. Subsequently, we improve the feature extraction capabilities by replacing the YOLOv8 backbone with an upgraded SENetV2 backbone network. To further decrease the computational load, we introduce a lightweight cross-scale feature fusion module in place of the neck network. Then, we utilize DySample as a substitute for the traditional upsampler to improve image processing efficiency. We refine the DyHead detection head to better perceive targets. Finally, we enhance the accuracy of bounding box regression by replacing the YOLOv8 loss function with InnerMPDIoU, which is based on the minimum point distance intersection over union (MPDIoU). Experimental results show that the proposed SCDDI-YOLOv8 algorithm achieves a mean average precision of 77.3% and 71.5% on the URPC2020 and UWG datasets, respectively, while reducing parameters by ~20.7%, floating-point operations by 6×10⁸, and model size by 1.2 MB compared with the original YOLOv8n. Compared with other advanced algorithms, the proposed algorithm can better meet the computational constraints of edge devices.

    Feb. 25, 2025
  • Vol. 62 Issue 4 0437008 (2025)
  • Houde Wu, Zhenyi Liu, Hongchang Wang, Chiyao Li, Xiaoxue Gu, and Ruiqi Guo

    Owing to the absorption and scattering of water and its impurities in underwater optical imaging, the underwater images obtained by conventional optical imaging methods have disadvantages such as low image contrast and short visible distance. In this study, a laser field synchronous scanning underwater optical imaging system is proposed. First, a 520-nm-wavelength line-structured laser is employed as the light source, and a galvanometer controls the line laser beam to scan the target plane in one dimension to form a two-dimensional illumination light field. Subsequently, the camera uses a scientific complementary metal oxide semiconductor (sCMOS) image sensor with an electronic rolling shutter to compress the instantaneous field of view of the camera into a narrow strip by reducing the exposure time. Finally, the light source is controlled to synchronously illuminate the instantaneous field of view of the camera, and the overlap volume of the light path between the camera and the light source is significantly compressed, greatly reducing the influence of water backscattering on the imaging. In this study, turbidity and distance experiments are conducted in a pool. The results of the turbidity experiment reveal that when the water turbidity is 19.6 FTU, the peak contrast improves by 4.7 times, and the peak contrast signal-to-noise ratio improves by 2.1 times. In the distance experiment, when the water attenuation coefficient is 0.2 and the imaging distance is 13 m, the peak contrast improves by 5.6 times, and the peak contrast signal-to-noise ratio improves by 2.8 times. Clear imaging is achieved at the 15-m range. This method effectively reduces the impact of backscattering on image quality, facilitating further development of underwater tasks and demonstrating high applicability in the field of underwater imaging.

    Feb. 25, 2025
  • Vol. 62 Issue 4 0411001 (2025)
  • Xunsheng Ji, Yin Zhu, and Jielong Yang

    Multiview three-dimensional (3D) reconstruction technology, which is an important driving technology in fields such as medical imaging and cultural relic protection, is gaining increasing attention and research interest. Despite recent successes in 3D reconstruction methods using implicit neural representations, challenges remain in terms of accurately capturing object details and processing complex scenes. To address these issues, a 3D reconstruction method based on a multiscale S-density strategy, called MS-Neus, is proposed. The proposed method attempts to improve the reconstruction quality and fidelity by obtaining more local information and expanding the expressiveness of implicit neural representations. The integration of information across different scales results in accurate and detailed reconstruction outcomes, showcasing rich details and realism in complex scenes. The experimental results obtained on the DTU MVS dataset demonstrate that the proposed method outperforms existing techniques for high-quality surface reconstruction, more accurately reproducing geometric shapes and detailed object features, particularly excelling in the management of complex geometric structures.

    Feb. 25, 2025
  • Vol. 62 Issue 4 0411002 (2025)
  • Mengwei Qin, Bo Chen, Bingliang Li, and Jing Yang

    Traditional phase retrieval algorithms utilize only prior information such as the non-negative constraints and support constraints of the signal, which makes it difficult to effectively reconstruct the original signal under undersampling conditions. Under the theoretical framework of compressive sensing, combined with the sparsity of natural images in the gradient domain, fractional total variation is incorporated into the phase retrieval model as prior information, and the proposed nonconvex optimized phase retrieval model is solved using the alternating direction method of multipliers. The experimental results indicate that fractional-order total variation converges faster than integer-order total variation at lower sampling rates. Compared with classical phase retrieval algorithms, such as HIO and RAAR, the proposed algorithm has stronger detail reconstruction ability in amplitude information phase retrieval under low sampling rates and Gaussian noise pollution.
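
    The gradient-domain sparsity prior is easiest to see in the ordinary integer-order case: the total variation is the l1 norm of the discrete image gradient, which is small for piecewise-constant images. The fractional-order variant used in the paper replaces these first-order differences with fractional ones; the sketch below shows only the integer-order baseline.

```python
import numpy as np

def total_variation(img):
    """Isotropic TV: l1 norm of the discrete image gradient."""
    dx = np.diff(img, axis=1, append=img[:, -1:])   # horizontal differences
    dy = np.diff(img, axis=0, append=img[-1:, :])   # vertical differences
    return np.sum(np.sqrt(dx ** 2 + dy ** 2))

img = np.zeros((64, 64))
img[16:48, 16:48] = 1.0            # a flat square: gradients are sparse
print(total_variation(img))        # small relative to a noisy image
```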

    Feb. 25, 2025
  • Vol. 62 Issue 4 0411003 (2025)
  • Jie Cao, Yiqiang Peng, Likang Fan, Lingfan Mo, and Longfei Wang

    To address the issue of poor performance in detecting small objects by current 3D object detection algorithms based on the combination of point clouds and voxels, this paper proposes a 3D object detection algorithm based on a multi-attention mechanism (MA-RCNN). First, a channel attention mechanism is introduced in the PV-RCNN baseline algorithm to process the bird's-eye view features after compressing voxel features, aiming to propagate spatial information to feature channel levels. Second, a spatial attention mechanism is introduced to amplify locally important information, thereby enhancing the expressive power of the features. Then, in the refined candidate box network, a point cloud self-attention mechanism is designed to construct relationships between key points, thus enhancing the algorithm's understanding of spatial structures. Experimental results on the KITTI dataset show that compared to the baseline algorithm, MA-RCNN improves the mean average precision for small objects such as pedestrians and cyclists by 3.20 percentage points and 1.64 percentage points, respectively, demonstrating its effectiveness. Compared to current mainstream 3D object detection algorithms, MA-RCNN still achieves better detection performance, verifying its advanced nature. The MA-RCNN is deployed on the real vehicle hardware platform for online testing, and the results verify its industrial value.
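
    The channel attention applied to the compressed bird's-eye-view features can be sketched in squeeze-and-excitation style, as below; this is an illustrative stand-in, not the exact MA-RCNN module.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention over a BEV feature map."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                         # x: (N, C, H, W)
        w = x.mean(dim=(2, 3))                    # squeeze: global average pool
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                              # re-weight the channels

out = ChannelAttention(256)(torch.randn(2, 256, 200, 176))
```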

    Feb. 25, 2025
  • Vol. 62 Issue 4 0412001 (2025)
  • Yan Wang, Jian Luo, Jin Tao, Hong Peng, and Siyi Chen

    A YOLOv8-PCB (printed circuit board) defect detection model is proposed to address the challenge of identifying small, irregularly shaped surface defects on PCBs. This model incorporates the WIoUv3 loss function, which reduces penalties for low-quality anchor boxes, thereby accelerating algorithm convergence. It integrates shallow-scale and small object detection heads to capture small defect features. The ADown downsampling technique is used in the backbone network to prevent excessive loss of contextual information while reducing the feature map's size. Furthermore, dynamic upsampling in the feature pyramid further improves the feature map's resolution, enhancing the model's ability to detect PCB defect details. Experimental results show that the proposed model achieves an average accuracy of 98.37% and a recall rate of 96.39%. Compared with the benchmark model, average accuracy has increased by 3.62 percentage points, and the recall rate has risen by 5.49 percentage points. These enhancements significantly reduce missed detections and boost the model's overall detection performance.

    Feb. 25, 2025
  • Vol. 62 Issue 4 0412002 (2025)
  • Kaiyang Li, Lizhong Wang, Zhiwen Chang, and Sen Wang

    To address the problem wherein traditional measurement methods based on fixed-length segmentation of the cylindrical point cloud cannot efficiently and accurately measure the bending radius of a bent tube, a bent-tube measurement method based on adaptive segmentation of the cylindrical point cloud into bending segments is proposed. First, the main direction of the bend point cloud is calculated based on the normal vector information of the bend. Next, the main direction of the bend point cloud is globally oriented by the diffusion method. Then, based on the main direction information and with the aid of the region growing clustering method, adaptive segmentation of the bend point cloud is realized. For each segment of the cylindrical point cloud derived from the segmentation, the axes of the bends are extracted based on cylindrical fitting by the Levenberg-Marquardt (L-M) method. Finally, the bending segment nodes are obtained according to the critical curvature of the axis nodes, and the bending radius is calculated. Experimental results show that, for the bent section of a bent tube, this method achieves a measurement accuracy of no less than 0.18 mm. The proposed method is simple and efficient and requires very little manual operation, and it is proven to meet aerospace field requirements for bent tube measurements.
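
    The cylindrical fitting step can be sketched with SciPy's Levenberg-Marquardt solver: minimize the distance of each point to the cylinder surface over the axis and radius parameters. The parameterization below (axis anchored on the z=0 plane, direction given by two angles) is an assumption for illustration, not the paper's formulation.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(params, pts):
    """Signed distance of each point to the cylinder surface."""
    cx, cy, theta, phi, r = params
    c = np.array([cx, cy, 0.0])                    # point on the axis (z = 0 plane)
    d = np.array([np.sin(theta) * np.cos(phi),     # unit axis direction
                  np.sin(theta) * np.sin(phi),
                  np.cos(theta)])
    v = pts - c
    radial = v - np.outer(v @ d, d)                # component orthogonal to the axis
    return np.linalg.norm(radial, axis=1) - r

# synthetic points on a radius-5 cylinder along z, with noise
t = np.random.uniform(0, 2 * np.pi, 500)
pts = np.c_[5 * np.cos(t), 5 * np.sin(t), np.random.uniform(0, 40, 500)]
pts += np.random.normal(0, 0.05, pts.shape)

fit = least_squares(residuals, x0=[0.1, 0.1, 0.05, 0.0, 4.5],
                    args=(pts,), method="lm")      # Levenberg-Marquardt
print(fit.x[-1])                                   # recovered radius, close to 5
```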

    Feb. 25, 2025
  • Vol. 62 Issue 4 0412003 (2025)
  • Jingfa Lei, Yuanhang Miao, Miao Zhang, Yongling Li, and Ruhai Zhao

    To solve the problem of missing digital fringe information when the binocular digital grating projection method is used to measure objects with low surface reflectance such as coal, an adaptive multi-exposure image fusion binocular digital grating projection method is proposed in this paper. First, the initial images collected under the initial exposure are segmented using the K-means clustering method, and the optimal exposure time for each pixel cluster is calculated. Second, images of projected white light and fringes are acquired using the calculated optimal exposure times. Then, a multi-exposure image fusion method is employed to synthesize new fringe images. Finally, three-dimensional morphology reconstruction of the coal blocks is performed using the multi-frequency heterodyne method and phase matching. Experimental results show that the proposed method achieves a point cloud reconstruction rate of over 99% when photographing coal blocks, effectively overcoming the measurement difficulties caused by the low reflectivity of the measured object and reconstructing the surface morphology of coal blocks well.
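
    The exposure selection step, clustering pixels by brightness and assigning each cluster its own exposure time, can be sketched with scikit-learn's KMeans; the inverse-proportional exposure rule below is an assumption, since the abstract does not give the exact formula.

```python
import numpy as np
from sklearn.cluster import KMeans

gray = np.random.randint(0, 256, (480, 640)).astype(np.float64)   # stand-in image

km = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = km.fit_predict(gray.reshape(-1, 1)).reshape(gray.shape)

base_exposure, target = 10.0, 128.0     # ms, desired mean gray level (illustrative)
for k, center in enumerate(km.cluster_centers_.ravel()):
    # assumed rule: scale exposure so each cluster's mean lands on the target level
    t_k = base_exposure * target / max(center, 1.0)
    print(f"cluster {k}: mean gray {center:.1f} -> exposure {t_k:.1f} ms")
```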

    Feb. 25, 2025
  • Vol. 62 Issue 4 0412004 (2025)
  • Xiaolong Zhou, and Changjie Liu

    To address the current challenges in detecting weld porosity during vehicle production, such as difficulty of detection, low efficiency, and low accuracy, an improved model based on YOLOv5s is proposed. The model incorporates both spatial and channel attention modules in the Backbone network to enhance the feature extraction capability for weld porosity. Additionally, a bidirectional feature pyramid module is introduced in the Neck network to improve the feature fusion capability for small targets. Then, the loss function of the model is adjusted to strengthen the generalization ability for detecting small targets, specifically improving the localization accuracy of small defects. The improved model is validated on a vehicle production weld dataset; the results show that the proposed model achieves a mean average precision of 62.9% and a detection frame rate of 89.72 frame/s. Compared with current mainstream object detection algorithms, the proposed model demonstrates superior overall performance.

    Feb. 25, 2025
  • Vol. 62 Issue 4 0412005 (2025)
  • Kaiqi Huang, and Chenkang Jin

    In automotive production, stud detection suffers from a significant number of missed and false detections owing to the similarity in color between welding studs and stamped parts. This study proposes a welding-stud detection method that combines deep perception with multi-scale feature fusion. First, the efficient multi-scale attention (EMA) module is integrated into the FasterBlock to construct the FasterEMA module, which enhances the spatial feature extraction capability using PConv and EMA. Second, a cascaded group attention mechanism is embedded in the transformer encoding layer to strengthen the network's perception of detailed features in deep layers and mitigate the interference of complex background information, thereby improving the accuracy of stud localization. Furthermore, the SN-CCFM module is constructed using GSConv from SlimNeck and VoV-GSCSP, replacing the standard convolution and RepBlock modules in the CCFM module to enhance the interaction between shallow and deep feature information, achieve multi-scale feature fusion, and improve detection accuracy. Finally, the detection network is combined with depth information obtained from RGB-D cameras to accurately determine the actual position of the studs. The experimental results demonstrate a recall of 86.6%, a mean average precision of 88.2%, and a detection speed of 45.3 frame/s, meeting industrial production requirements.

    Feb. 25, 2025
  • Vol. 62 Issue 4 0412006 (2025)
  • Yanhui Li, Zhongchun Fang, and Hairong Li

    To reduce computational costs and efficiently complete lane detection tasks, this paper proposes a multiframe lane detection method using a Swin Transformer embedded with a coordinate attention mechanism for lane detection in continuous multiframe image sequences. In this approach, continuous multiframe image sequences are taken as inputs, and the Swin Transformer encoder-decoder architecture is adopted to ensure consistent input and output image sizes. The coordinate attention mechanism is embedded in the patch merging of the stage 3 fusion layer of the Swin Transformer model, enhancing the model's focus on long-distance dependencies and its ability to extract both global and local features of lane lines. Additionally, introducing spatiotemporal long short-term memory between the encoder and decoder boosts the model's ability to predict temporal sequence information, significantly improving lane line detection accuracy. Extensive experiments conducted on the CULane, Tusimple, and VIL-100 datasets demonstrate that the proposed method provides a comprehensive advantage in handling continuous multiframe image sequences, delivering superior detection performance compared with existing studies.

    Feb. 25, 2025
  • Vol. 62 Issue 4 0412007 (2025)
  • Qinghai Lü, Yang Zhao, Weiguo He, Hui Ouyang, and Zhongren Wang

    A segmentation algorithm based on the geometric features of a point cloud is proposed to solve the difficulty of detecting defects such as small weld puddles and weld tumors generated in the laser welding process of power battery covers. First, the point-cloud data acquired by the defect detection platform is filtered and denoised. Second, numerous non-weld-region point clouds are eliminated via the established weld coarse segmentation model, and the curvature threshold adaptive algorithm is used to achieve the accurate segmentation of weld seams. Subsequently, the region growth algorithm is improved by introducing Euclidean distance feature data to achieve the accurate segmentation of defects in the weld seams. Finally, the geometric dimensions of the extracted defects are calculated based on the measured model. The experimental results show that the average measurement error obtained by the proposed method is 0.041 mm, and the average measurement accuracy is improved by 77%, which meets the detection requirements and is of great significance for the intelligent detection of laser welding defects.

    Feb. 25, 2025
  • Vol. 62 Issue 4 0412008 (2025)
  • Jing Liu, Yuan Zhang, Le Zhang, Bo Li, and Xiaowen Yang

    Accurate feature extraction in point cloud registration is often hindered by noise, surface complexity, overlap, and scale differences, which limit improvements in registration accuracy. To address this issue, this study proposes a point cloud registration algorithm based on the dynamic fusion of multiscale features. First, by employing sparse convolution operations at different depths, multilevel scale feature information is extracted from the point cloud data, obtaining rich levels of detail from local and global structures. Subsequently, the multilevel scale features are concatenated to form a fused feature representation, which enhances the integrity and accuracy of the features. Additionally, the algorithm introduces a squeeze-excitation attention mechanism in the network skip connections to adaptively learn and reinforce important feature information. Concurrently, a global context module is integrated at the residual position to better capture global structural information. Finally, registration is completed by estimating the rigid transformation matrix through the random sample consensus (RANSAC) algorithm. Experimental results demonstrate that the proposed algorithm offers significant advantages in feature extraction and registration accuracy compared with mainstream methods, effectively improving the performance of point cloud registration.
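
    The final RANSAC stage amounts to repeatedly solving a rigid transform from a minimal sample of correspondences and keeping the hypothesis with the most inliers. A NumPy-only sketch follows, using the Kabsch/SVD solution and assuming correspondences are already given (the paper obtains them from learned features).

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares rigid transform (R, t) mapping src onto dst."""
    cs, cd = src.mean(0), dst.mean(0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    return R, cd - R @ cs

def ransac_register(src, dst, iters=500, thresh=0.05):
    """RANSAC over one-to-one correspondences src[i] <-> dst[i]."""
    best, best_inliers = None, 0
    rng = np.random.default_rng(0)
    for _ in range(iters):
        idx = rng.choice(len(src), 3, replace=False)   # minimal sample
        R, t = kabsch(src[idx], dst[idx])
        err = np.linalg.norm(src @ R.T + t - dst, axis=1)
        n = int((err < thresh).sum())
        if n > best_inliers:
            best, best_inliers = (R, t), n
    return best

src = np.random.rand(200, 3)
a = 0.4
R_true = np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1]])
dst = src @ R_true.T + np.array([0.1, -0.2, 0.3])
R, t = ransac_register(src, dst)                       # recovers R_true and t
```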

    Feb. 25, 2025
  • Vol. 62 Issue 4 0415001 (2025)
  • Chuanfang Zang, Jianwu Dang, and Jiu Yong

    To address the challenge of low alignment accuracy in 6D pose estimation for weakly textured objects, an adaptive multimodal-feature-fusion method for 6D object pose estimation is proposed. First, the target object is calibrated using an RGB-D image, and the point cloud obtained from the depth information is segmented using spherical neighborhoods to enhance the ability to capture detailed information during feature extraction. Second, the involved geometric attributes are strengthened by incorporating new object surface normals to obtain complementary geometric information of the target. Subsequently, the extracted color, geometric, and normal three-branch features are fused in a high-dimensional space through adaptive feature fusion, enhancing the complementary strengths of each feature. Finally, a regression function is employed to obtain the target pose parameters, with the predicted pose of high-confidence pixels used as an initial estimate. This estimate undergoes iterative optimization to obtain the final pose, realizing accurate 6D pose estimation. Tests are conducted on the LineMOD and YCB-Video datasets, and the experimental results show that the proposed method exhibits significant advantages in object pose estimation accuracy compared with similar methods.

    Feb. 25, 2025
  • Vol. 62 Issue 4 0415002 (2025)
  • Gangning Lou, Peibo Sun, Shaoyao Liang, Li Zhang, Jiaqi Liu, Gangjian Hu, Liang Shen, Yongcheng Ji, and Yupeng Guo

    In colonoscopy, automatic polyp detection and image segmentation are key technologies for reducing the incidence of colon cancer and improving patient survival. The goal of this study is to develop a new deep-learning algorithm to improve the accuracy of automatic detection and segmentation of polyp images in colonoscopy, thereby contributing to the early detection and diagnosis of colon cancer. To address the challenges of polyp image segmentation, this paper proposes a deep-learning algorithm named Gaussian error linear unit omni-dimensional dynamic convolution U-Net (GODC-U-Net). This algorithm is based on the U-Net network structure and integrates dynamic convolution and parallel multidimensional attention mechanisms to effectively learn the global and local feature information of polyp images. A hybrid loss function and a series of improvements to U-Net are introduced to further optimize model performance. Evaluation results on publicly available polyp segmentation benchmark datasets such as Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, and ETIS-LaribPolypDB show that the proposed method achieves advanced levels in terms of the Dice coefficient, intersection over union, precision, recall, and accuracy. This algorithm demonstrates excellent generalizability and high performance under limited training data in addressing polyp image segmentation problems, thus providing effective technical support for the early detection and diagnosis of colon cancer.
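
    The abstract does not spell out the hybrid loss, but the Dice coefficient it evaluates is also a standard segmentation loss term; the sketch below shows a soft Dice loss in PyTorch, which is typically combined with cross-entropy in such hybrids.

```python
import torch

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss for binary segmentation; pred holds probabilities."""
    pred, target = pred.flatten(1), target.flatten(1)
    inter = (pred * target).sum(dim=1)
    dice = (2 * inter + eps) / (pred.sum(dim=1) + target.sum(dim=1) + eps)
    return 1 - dice.mean()

pred = torch.sigmoid(torch.randn(4, 1, 128, 128))    # logits -> probabilities
mask = (torch.rand(4, 1, 128, 128) > 0.5).float()    # placeholder ground truth
loss = dice_loss(pred, mask)                         # e.g., add BCE for a hybrid
```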

    Feb. 25, 2025
  • Vol. 62 Issue 4 0415003 (2025)
  • Liping Chai, Weidong Wang, Chenyang Li, Likai Zhu, and Yue Li

    This paper proposes a laser simultaneous localization and mapping (SLAM) method to further enhance robot positioning and mapping capabilities in complex environments by improving algorithm stability and robustness. The proposed method is based on a factor graph optimization framework that optimizes the front-end feature point extraction method to identify corner, plane, and surface points. Point curve constraint equations are introduced during local map matching. In the back-end, radar odometry and inertial measurement unit pre-integration factors are added to the factor graph to optimize the global pose. The experimental results obtained on the KITTI dataset and a self-collected dataset demonstrate that, compared with lidar-inertial odometry via smoothing and mapping (LIO-SAM), the proposed method achieves an approximately 15% improvement in absolute trajectory error, a 7% increase in the number of point clouds in the feature point neighborhood, and a substantial increase in the number of feature points. These improvements effectively enhance the performance of the proposed SLAM system compared with LIO-SAM.

    Feb. 25, 2025
  • Vol. 62 Issue 4 0415004 (2025)
  • Chuanwei Zhang, and Ruiqi Zhao

    To address the limitation of single-sensor performance in positioning and mapping tasks in large-scale urban environments, particularly the influence of dynamic objects on positioning accuracy and map construction, this paper proposes a semantic simultaneous localization and mapping method combining semantic segmentation and laser inertial odometry (SLI-SLAM). First, the image semantic segmentation model employs an improved lightweight semantic segmentation network (D3p-S) to segment images and achieves point cloud semantic segmentation through spatiotemporal synchronization between sensors. Additionally, a geometric spatial-consistency check based on a facet model is designed to detect and eliminate dynamic obstacles. Second, inertial measurement unit (IMU) preintegration is used to eliminate the motion distortion of the LiDAR, while point cloud ground segmentation and denoising help reduce the computational complexity. Finally, a factor graph is used to optimize the trajectory, enabling fast and accurate positioning of unmanned vehicles in urban road environments and the construction of three-dimensional semantic maps. Experimental results show that the proposed SLI-SLAM method reduces the root mean square error (RMSE) of the absolute trajectory error by 30.33% compared with the classical laser SLAM algorithm (LIO-SAM) in highly dynamic urban road scenes.

    Feb. 25, 2025
  • Vol. 62 Issue 4 0415005 (2025)
  • Yanqing Wang, Deqiang Zhou, Hao Xu, Weifeng Sheng, Wenjuan Zuo, Qing Xi, and Quyan Chen

    An enhanced matching algorithm based on intersection, corner & end of wall (ICE) feature points is proposed to quickly handle significant pose changes in indoor 2D point cloud matching. This algorithm introduces a line feature extraction method to replace the split-merge algorithm in the ICE method and uses adaptive filtering, dimensionality elevation, and Euclidean clustering to extract relevant point sets. The extracted line features are fitted using the least squares method to find intersection points, which are then labeled and used for feature point matching based on distance and attributes. Successful matches enable determination of the point cloud transformation relationship through affine transformation. Experimental results show that the proposed method exhibits no matching failures under large-scale rotation and translation, and the matching result is used as the initial value of the point-to-line iterative closest point (PL-ICP) algorithm. When other algorithms fail, the absolute deviation of the path estimation is 0.23 m, and the average time consumption is 58.9 ms.
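
    The core geometric step, fitting wall segments with least squares and intersecting them to obtain ICE feature points, can be sketched as below; the total-least-squares (SVD) fit is an assumed variant, since the abstract does not fix the fitting details.

```python
import numpy as np

def fit_line(pts):
    """Total-least-squares 2D line fit; returns (n, c) with n . p = c."""
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]                      # direction of least variance
    return normal, normal @ centroid

def intersect(l1, l2):
    """Intersection point of two implicit lines n . p = c."""
    A = np.vstack([l1[0], l2[0]])
    return np.linalg.solve(A, np.array([l1[1], l2[1]]))

wall_a = np.c_[np.linspace(0, 2, 50), np.zeros(50)] + np.random.normal(0, 0.01, (50, 2))
wall_b = np.c_[np.full(50, 2.0), np.linspace(0, 3, 50)] + np.random.normal(0, 0.01, (50, 2))
corner = intersect(fit_line(wall_a), fit_line(wall_b))   # ICE corner, close to (2, 0)
```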

    Feb. 25, 2025
  • Vol. 62 Issue 4 0415006 (2025)
  • Jun Yang, and Jiachen Guo

    Existing three-dimensional (3D) point cloud semantic recognition and segmentation algorithms often ignore the relationship between the local feature extraction network and the number of network layers, thus failing to resolve the difficulties associated with expanding the network and capturing advanced semantic information when extracting deeper local features. To address these limitations, a semantic recognition and segmentation algorithm for 3D point clouds using a multistage hierarchical-fusion residual multilayer perceptron (MLP) is proposed. First, the point clouds are sampled in stages with layered structures to ensure that the network can fully extract feature information at various depths. The sampled points are grouped to build local neighborhoods, which enhances the ability of the network to mine local features, and each local neighborhood uses the expandable feature extraction operator of the residual MLP block to extract feature information of the point cloud. Finally, deep semantic information is integrated with shallow geometric information using interpolation and skip connections. The results reveal that the proposed algorithm achieves a recognition accuracy of 95.1% on the ModelNet40 dataset and a segmentation accuracy of 86.6% on the ShapeNet Part dataset. Thus, this algorithm can effectively extract rich point cloud feature information and offers improved capabilities for 3D point cloud semantic recognition and segmentation.

    Feb. 25, 2025
  • Vol. 62 Issue 4 0415007 (2025)
  • Wei Zhang, Zhilong Zeng, Qi Fang, Jie Song, Guan Gui, Shenghuai Wang, and Chen Wang

    Efficient and accurate extraction of spatial structure information is crucial in point cloud semantic segmentation for understanding three-dimensional scenes. To address the unstructured nature of point cloud data, we propose a point cloud segmentation method, PCANet, that effectively integrates position encoding and channel attention mechanisms to reduce redundant relationship learning and computational costs. PCANet first applies position encoding to capture the relative positional information of the point cloud data. Next, the encoded feature maps are weighted using a channel attention mechanism, which amplifies the representation of key features across different channels and expands the network's receptive field. The experimental results demonstrate that PCANet achieves strong segmentation performance on both the ShapeNet and S3DIS point cloud datasets. For ShapeNet, the instance mean intersection-over-union (mIoU) reaches 87.4%, and the category mIoU reaches 85.8%, improvements of 2.3 percentage points and 3.9 percentage points, respectively, over PointNet++. In addition, the mIoU on the S3DIS dataset reaches 73.7%, outperforming PointNet++ by 20.2 percentage points. These results demonstrate that the proposed method performs well in segmenting small components and indoor point cloud scenes.
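
    The reported mIoU figures are per-class intersection-over-union values averaged over classes; a compact confusion-matrix implementation is sketched below with placeholder labels.

```python
import numpy as np

def mean_iou(pred, gt, n_classes):
    """Mean intersection-over-union from a confusion matrix."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (gt.ravel(), pred.ravel()), 1)
    inter = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - inter
    iou = inter / np.maximum(union, 1)
    return iou[union > 0].mean()

gt = np.random.randint(0, 5, 10_000)                 # placeholder labels
pred = gt.copy()
pred[:1_000] = np.random.randint(0, 5, 1_000)        # corrupt 10% of predictions
print(mean_iou(pred, gt, 5))
```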

    Feb. 25, 2025
  • Vol. 62 Issue 4 0415008 (2025)
  • Liangliang Qian, Wen Nie, Haosheng Zhang, and Tianqiang Zhu

    Addressing the challenges faced by existing point cloud registration methods, which are often easily affected by noise, point cloud size, and initial pose, this study proposes a robust and efficient registration method to enhance accuracy. First, the original point cloud undergoes voxel gridding. Second, based on the signature of histograms of orientations (SHOT) feature descriptor, a strengthened SHOT keypoint set is extracted for rough matching using the 4-points congruent sets (4PCS) algorithm. This step achieves the initial registration of the source and target point clouds. Finally, a dual-scale, feature-constrained strategy is introduced to extend the generalized iterative closest point (ICP) algorithm to the voxel level, accurately registering two point clouds that have good initial poses. Extensive experiments are conducted on the Stanford public point cloud dataset, as well as a real-world dataset, to evaluate the applicability of the algorithm across various scenarios. Experimental results show that the point cloud registration algorithm exhibits strong reliability and robustness, effectively handling different initial positions, overlap rates, and scales of point clouds. Moreover, the registration accuracy in real-world scenarios is improved by 37.6% and 23.2% compared with the ICP and Super-4PCS algorithms, respectively.

    Feb. 25, 2025
  • Vol. 62 Issue 4 0415009 (2025)
  • Chengtong Miao, Jingjing Wu, and Qiangqiang Xu

    The nonlinear response characteristics of stripe projection of structured light systems often lead to stripe gray-level distortion, adversely affecting both phase and measurement precision. To address the nonlinearity within structured light systems and the differential phase error caused by various degrees of defocused stripes, a novel phase gradient-based regional phase error self-correction methodology is introduced. Initially, the phase is segmented into distinct regions by leveraging data modulation of the stripe image and the gradient information of the wrapped phase. A phase error function is then constructed utilizing the wrapped phase across multiple frequencies, and the phase values from each region are subsequently input into this function for iterative correction. Experimental outcomes indicate a 79.07% reduction in phase error after applying this method. In addition, this approach eliminates the need for stripe pre-encoding, directly addresses errors within the unwrapped phase map, and demonstrates impressive resilience to stripe defocus.
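
    The wrapped phase that this method corrects is typically computed with the standard N-step phase-shifting formula; a sketch is given below on synthetic four-step fringes (sign conventions vary between systems, so treat the signs as an assumption).

```python
import numpy as np

def wrapped_phase(frames):
    """N-step phase shifting for fringes I_k = A + B*cos(phi + 2*pi*k/N)."""
    n = len(frames)
    deltas = 2 * np.pi * np.arange(n) / n
    s = sum(I * np.sin(d) for I, d in zip(frames, deltas))
    c = sum(I * np.cos(d) for I, d in zip(frames, deltas))
    return np.arctan2(-s, c)                       # wrapped to (-pi, pi]

x = np.linspace(0, 8 * np.pi, 512)                 # synthetic tilted-plane phase
phi = np.tile(x, (512, 1))
frames = [128 + 100 * np.cos(phi + 2 * np.pi * k / 4) for k in range(4)]
print(np.allclose(wrapped_phase(frames), np.angle(np.exp(1j * phi))))  # True
```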

    Feb. 25, 2025
  • Vol. 62 Issue 4 0415010 (2025)
  • Guangcen Ma, Jinzhi Zhou, Haoyang He, and Saifeng Li

    This study proposes a vascular segmentation network that integrates multiscale features and a dual attention mechanism to address the low segmentation accuracy caused by unsatisfactory segmentation of small retinal vessels and poor vascular connectivity. First, a dilated residual module incorporating the dual attention mechanism is used to replace the original convolutional layers of U-Net, achieving multiscale extraction of vascular features. Second, a feature fusion module is embedded in the skip connections, reducing information loss during the encoding-decoding process and enhancing vascular connectivity through the adaptive fusion of vascular information. Finally, a hybrid loss function is introduced to assist network training, alleviating the class imbalance problem in retinal vascular images. Experimental results on the DRIVE and CHASE_DB1 datasets demonstrate that the proposed algorithm achieves accuracies of 0.9625 and 0.9696, respectively. Compared with U-Net, the sensitivity of the proposed algorithm increases by 0.0420 and 0.0552, and the F1 score increases by 0.0140 and 0.0342, demonstrating improved segmentation performance.

    Feb. 25, 2025
  • Vol. 62 Issue 4 0417001 (2025)
  • Chenglong Huang, Hanze Man, and Yabin Zhao

    Fingerprints are unique to each individual, leaving traces upon contact with objects, and remain essentially unchanged throughout a person's lifetime. These characteristics render fingerprints invaluable for personal identification in criminal investigations and court proceedings. Consequently, the visualization and compositional analysis of fingerprints have become key areas of focus in forensic science and technology. In this study, a confocal Raman imaging microscope was employed to irradiate sebaceous fingerprints on aluminum foil with a laser, thereby obtaining Raman scattering spectra. By analyzing the number, position, and intensity of the characteristic peaks in the Raman spectra, the corresponding molecular vibration groups were identified. Subsequently, Raman spectral imaging was performed using these characteristic peaks to determine the spatial distribution of the sebaceous fingerprints, effectively visualizing them. Results indicate that the sebum in the fingerprints originates from skin secretions, with triglycerides and fatty acids identified as the primary components. Both are lipid compounds, and by analyzing the intensity of the Raman characteristic peaks at 1660 cm⁻¹ and 3000 cm⁻¹, the degree of unsaturation in the lipids of the sebaceous fingerprints can be determined. Furthermore, the confocal Raman imaging microscope successfully captured clear images of the sebaceous fingerprints, revealing not only second-level features but also third-level features, such as sweat pore characteristics. Additionally, Raman spectroscopy, being a non-destructive analytical technique, requires no sample preparation and does not interfere with the subsequent processing of fingerprints by law enforcement agencies. This method offers a more convenient and efficient approach to fingerprint analysis.

    Feb. 25, 2025
  • Vol. 62 Issue 4 0418001 (2025)
  • Kaixuan Chang, Jianhua Huang, Xiyan Sun, Jian Luo, Shitao Bao, and Huansheng Huang

    To address the challenges of low detection accuracy and high false detection rates for small targets by drones, a dual-modal image fusion method is proposed. This paper outlines a framework for fusing infrared and visible images using front-end, mid-end, and back-end strategies. A dual-channel image fusion detection method is developed based on the back-end fusion strategy and the YOLOv8 object detection framework. This method constructs a dual-channel feature fusion block to integrate features from both infrared and visible images. It also incorporates a BRA (bi-level routing attention) module into the neck network layer to improve the model's ability to detect small targets. Experimental results show that the proposed method increases the mean average precision (mAP) by 14.78% and 12.99% compared with using single infrared and visible images, respectively, with YOLOv8 on the DroneVehicle dataset. Additionally, the mAP of the proposed method increases by 7.1% compared with the PSFusion fusion detection method on the same dataset.

    Feb. 25, 2025
  • Vol. 62 Issue 4 0428001 (2025)
  • Qi He, and Hao Shen

    A multiscale remote sensing image strip target detection method without anchor points is proposed to address the poor detection performance for strip targets in optical remote sensing image target detection. Accordingly, an atrous spatial pyramid pooling feature extraction network that integrates a coordinate attention module and a strip pooling module is designed in this study. Consequently, the extraction of strip target feature information is significantly improved. The proposed method was validated through effectiveness experiments on the DIOR dataset, and the optimal dilation rate was selected. On this basis, ablation experiments were conducted to verify the feasibility of the designed framework. Comparative experiments were further conducted against four classic object detection methods. Compared with the classic anchor-free method CenterNet, the mean average precision of the proposed method improves by 8.63 percentage points and 11.45 percentage points at thresholds of 0.5 and 0.75, respectively. Moreover, compared with the other three methods, the proposed method remains competitive. The experimental results show that the proposed method locates strip targets (such as bridges and airports) in optical remote sensing images more accurately and has better detection performance.
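
    The atrous spatial pyramid pooling backbone samples context with parallel dilated convolutions; a minimal PyTorch sketch follows, with illustrative dilation rates and with the coordinate attention and strip pooling modules omitted.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Parallel dilated convolutions sample context at several rates."""
    def __init__(self, in_ch=256, out_ch=256, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        )
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

y = ASPP()(torch.randn(1, 256, 64, 64))    # -> (1, 256, 64, 64)
```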

    Feb. 25, 2025
  • Vol. 62 Issue 4 0428002 (2025)
  • Junying Zeng, Senyao Deng, Chuanbo Qin, Yikui Zhai, Xudong Jia, Yajin Gu, and Jiahua Xu

    Traditional semantic segmentation networks often suffer from narrow receptive fields and single-branch network architectures, which inadequately capture the spatial relationships, boundaries, and contextual nuances of complex remote sensing scenes. These limitations can result in decreased segmentation accuracy and blurred boundaries. To address these challenges, we propose a multi-level branch cross-scale fusion network (MBCFNet) for remote sensing image semantic segmentation. First, the network employs a multi-level branch structure comprising a shallow Swin Transformer, a spatial branch, a semantic branch, and a boundary branch, where each branch specializes in extracting specific feature levels. A cross-scale fusion module is then incorporated to effectively integrate multi-scale features from each branch, thereby enhancing the model's ability to comprehensively represent remote sensing landforms. Finally, a multi-scale decoding module with an expanded receptive field is introduced to transmit feature information across scales, effectively improving the network's adaptation to complex remote sensing scenes. The proposed MBCFNet achieves mean intersection over union scores of 86.93%, 84.51%, and 74.55% on the Vaihingen, Potsdam, and Uavid datasets, respectively, outperforming advanced semantic segmentation models such as Mit-B2, ST-UNet, and GLOTS. The experimental results demonstrate the high segmentation accuracy and generalization capability of MBCFNet for remote sensing image semantic segmentation tasks.

    Feb. 25, 2025
  • Vol. 62 Issue 4 0428003 (2025)
  • Zhongwei Zhang, Furong Guo, and Shudong Liu

    To enhance detection performance for various targets in remote sensing images, an improved algorithm based on LSKNet-S is proposed. This algorithm notably advances the network's ability to detect remote sensing targets by integrating features across diverse receptive field levels. First, a multiscale feature fusion module is designed to strengthen the model's ability to extract global contextual information, with improvements to the multilayer perceptron. Concurrently, a lightweight local visual center module is introduced, enhancing the model's sensitivity to local features. The integration of these modules facilitates effective multiscale feature extraction and fusion within the model. Additionally, a scale-enhancing upsampling operation is incorporated within the detection head, which elevates the feature map resolution, allowing the model to more effectively capture detailed information on various targets within remote sensing images. Experimental results indicate that the proposed algorithm improves the mean average precision (mAP) by 3.43 percentage points on the HRSC2016 dataset and by 1.12 percentage points on the DIOR-R dataset, outperforming current mainstream algorithms. These results confirm the effectiveness of the proposed algorithm in remote sensing object detection contexts.

    Feb. 25, 2025
  • Vol. 62 Issue 4 0428004 (2025)
  • Hailong Zhang, Qiaolin Zeng, Jie Yang, Bowei Wang, and Chengfang Wang

    Most existing remote sensing object detectors use detection heads with parallel branches, which leads to noticeable misalignment between the predictions of the two branches. To address this problem, this study proposes a branch feature alignment network (BFA-Net). First, a branch alignment module (BAM) is used to enhance feature interaction between the classification and regression branches by learning branch alignment features. Simultaneously, the module dynamically adjusts classification features and sampling positions to achieve alignment between the two branches. Then, during label assignment, the model dynamically evaluates the classification and regression quality of samples via the designed alignment index, and the sorting-based assignment strategy screens positive samples from coarse to fine to ensure that objects of different scales receive sufficient supervisory information. Extensive experiments on commonly used datasets such as DOTA-V1.0, DOTA-V1.5, and DIOR-R show that BFA-Net achieves mean average precision values of 75.36%, 68.26%, and 65.50%, respectively. The proposed method has clear advantages over other advanced algorithms in terms of detection performance and detection efficiency.

    Feb. 25, 2025
  • Vol. 62 Issue 4 0428005 (2025)
  • Baolan Chen, Huawang Li, and Yinxiao Wang

    Current remote-sensing scene classification methods do not fully utilize multi-scale and contextual information, which limits scene classification performance. To address these issues, a multi-scale context feature aggregation model based on a graph convolutional network (GCN) is proposed. In the image feature extraction module, multi-layer and global features of remote sensing images are extracted using the backbone network. Next, in the contextual information enhancement module, contextual information is extracted from multi-layer features utilizing the GCN. Then, in the multi-scale feature aggregation module, a progressive cross-layer attention method is used to model the correlation between different layer features with the aim of reducing semantic differences and achieving effective feature aggregation. Finally, global and aggregated features are fused to achieve scene classification, and label smoothing loss is used to enhance model generalization. Experimental results on the AID and NWPU-RESISC45 datasets validate the effectiveness of the proposed model, which achieves competitive performance in scene classification.

    Feb. 25, 2025
  • Vol. 62 Issue 4 0428006 (2025)
  • You Ding, Liyuan Xu, Tong Liu, Zhengliang Liu, and Yuan Ren

    An optical vortex is a structured beam carrying orbital angular momentum, with an annular intensity distribution and a helical wavefront structure. Owing to its unique optical complex amplitude and orbital angular momentum, the optical vortex demonstrates tremendous potential in biomedicine, high-capacity optical communication, quantum technology, optical measurement, and imaging. In terms of rotation detection, a rotational Doppler frequency shift is generated when a vortex beam illuminates a rotating object. Rotation, precession, and other motion parameters can be efficiently and conveniently measured based on the detection of this frequency shift, which is of great significance for identifying the geometric structure, motion characteristics, and attitude information of targets. In terms of imaging, the orbital angular momentum of an optical vortex can be used for digital spiral imaging. Optical vortices of different modes form a set of orthogonal, complete basis vectors in Hilbert space and, combined with single-pixel imaging technology, can be developed into optical vortex single-pixel complex amplitude imaging, which has potential applications in astronomical observation and medical diagnosis. In this paper, we comprehensively review the research progress in rotation detection and imaging technology based on optical vortices. The theoretical principles and the latest research advances are introduced, and future development is also discussed.
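
    The rotation-detection principle reviewed here rests on a simple relation: a beam with topological charge l scattered from a surface rotating at angular velocity Ω acquires a frequency shift Δf = lΩ/(2π), and a superposition of ±l modes beats at twice that value. A worked example:

```python
import math

# Rotational Doppler shift: delta_f = l * Omega / (2 * pi);
# a superposition of +l and -l modes beats at 2 * delta_f.
l = 20                                # topological charge
omega = 2 * math.pi * 50              # 50 revolutions per second
delta_f = l * omega / (2 * math.pi)   # 1000 Hz per mode
beat = 2 * delta_f                    # 2000 Hz measured beat frequency
print(delta_f, beat)
```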

    Feb. 25, 2025
  • Vol. 62 Issue 4 0400001 (2025)
  • Zhonghong Yang, Liyuan He, Yingchao Wu, Hanwen Zhang, Yuewen Yu, Dongjie Zhao, Xinyu Li, Rong Liu, Wenliang Chen, and Chenxi Li

    At present, biomedical optics has become a research field closely related to human health. Tissue phantoms have optical and mechanical properties similar to those of tissues and play an important role in the research of bio-optical detection and imaging methods, instrument development, and system calibration. This review provides an overview of phantom preparation materials, preparation technology, and progress in applied research, discussing the optical and mechanical properties of various phantom materials as well as the advantages, disadvantages, and applicability of each type of material and process. The applications of phantoms in optical coherence tomography, photoacoustic imaging, and fluorescence imaging are summarized, and the development directions of phantoms in different applications are analyzed. This provides a reference for the selection and preparation of relevant materials in the study of biological optics.

    Feb. 25, 2025
  • Vol. 62 Issue 4 0400002 (2025)
  • Jiayu Qiu, Yasheng Zhang, Yuqiang Fang, Pengju Li, and Kaiyuan Zheng

    Event cameras are novel visual sensors inspired by biology, representing an interdisciplinary research hotspot in computational neuroscience and computer vision. Unlike traditional cameras, event cameras can asynchronously output event streams related to brightness changes, offering advantages such as high time resolution, wide dynamic range, low latency, low bandwidth, and low power consumption. They are suitable for real-time dynamic perception of high-speed moving targets and represent a new research direction in computer vision detection and tracking. This review first introduces the types, working principles, advantages, and disadvantages of event cameras, followed by an in-depth analysis of existing object detection and tracking algorithms based on event cameras. Subsequently, event datasets related to visual detection and tracking are introduced. Finally, future development trends in this field are discussed.

    Feb. 25, 2025
  • Vol. 62 Issue 4 0400004 (2025)