Laser & Optoelectronics Progress
Co-Editors-in-Chief
Dianyuan Fan
2024
Volume: 61 Issue 14
45 Article(s)
Zhi Cheng, Zaohui Deng, Liping Gao, Yin Tao, Chao Mu, and Lili Du

Atmospheric turbulence causes image degradation. For a single turbulence-degraded image, an image restoration method based on grid networks was proposed in this study. To realize local and deep multiscale feature extraction, dilated convolution was used in the backbone module to expand the model's receptive field. Additionally, a spatial attention module was added to the post-processing module, enabling the network to better handle white spots and artifacts in the restored image and improve image quality. Experimental results show that the proposed network outputs recovery results quickly, with an average restoration time of 0.29 s, and that the average peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) on simulated dynamic-scene data are improved by up to 9.44 dB and 0.1173, respectively, compared with other methods. Furthermore, the algorithm recovers atmospheric-turbulence-degraded images more effectively in real scenes.
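The role of dilated convolution in enlarging the receptive field can be illustrated with a minimal 1D sketch (a toy example, not the authors' grid network; the impulse signal and averaging kernel are made up):

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """1D dilated convolution with zero padding ('same'-style output).

    With dilation d, a kernel of size k covers a receptive field of
    (k - 1) * d + 1 samples without adding any parameters.
    """
    k = len(kernel)
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j in range(k):
            idx = i + (j - k // 2) * dilation  # tap position, spaced by d
            if 0 <= idx < len(signal):
                acc += signal[idx] * kernel[j]
        out.append(acc)
    return out

x = [0, 0, 0, 1, 0, 0, 0]                  # unit impulse
k = [1, 1, 1]                              # 3-tap kernel (unnormalized)
y1 = dilated_conv1d(x, k, dilation=1)      # receptive field: 3 samples
y2 = dilated_conv1d(x, k, dilation=2)      # receptive field: 5 samples
```

The impulse response spreads over 3 samples at dilation 1 but over 5 at dilation 2, which is the effect the backbone exploits to gather wider context at no extra parameter cost.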

Jul. 25, 2024
  • Vol. 61 Issue 14 1401001 (2024)
  • Bo Zhang, and Yufan Wu

    One of the common challenges in microexpression recognition using convolutional neural networks is the increased model complexity that accompanies higher accuracy. To address this challenge, this study introduces an enhanced lightweight dual-stream attention network, called the enhanced dual-stream MISEViT network (EDSMISEViTNet), for microexpression recognition. First, microexpression samples are preprocessed, and peak frames are extracted as spatial features. Additionally, the TV-L1 optical flow method is used to extract the temporal features between the start frame and the peak frame of each sample. Furthermore, this study improves the MobileViT network by designing an MI module that combines Inception and SE modules and introduces an attention module for efficient feature extraction. Temporal and spatial features are fed separately into this network, and the resultant features are concatenated, fused, and then classified. To enhance precision, the CASME II, SAMM, and SMIC datasets are combined into a composite dataset for experimentation. The results reveal that the proposed model requires only 3.9×10⁶ training parameters and processes a single sample in just 71.8 ms. Compared with existing methods, this approach achieves excellent accuracy while maintaining a low parameter count.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1437001 (2024)
  • Yajun Li, Min Zhang, Yangyang Deng, and Ming Xin

    Domain diversity between different datasets poses an evident challenge for adapting a person re-identification (Re-ID) model trained on one dataset to another. State-of-the-art unsupervised domain adaptation methods for person Re-ID optimize the pseudo labels created by clustering algorithms on the target domain; however, the inevitable label noise caused by the clustering procedure is ignored. Such noisy pseudo labels substantially hinder the model's ability to further improve feature representations on the target domain. To address this problem, this study proposes a mutual teaching approach for unsupervised domain adaptation of person Re-ID based on relation-aware attention (RAA) and local feature relearning (FRL). For feature extraction, we employ multi-channel attention to capture the corresponding local features of a person and use spatial-channel correspondence to relearn discriminative fine-grained details of global and local features, thereby enhancing the network's feature representation capabilities. We also use RAA to steer the two networks toward different feature regions to enhance their distinctiveness and complementarity. Extensive experiments were conducted on public datasets to validate the proposed method. The experimental results show that the proposed method performs well across multiple person Re-ID tasks.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1437002 (2024)
  • Yonghua Tang, Yanjun Meng, Sen Lin, Feifan Shi, Zhipeng Zhang, and Xingtong Liu

    This study presents a nonuniform dehazing method based on hierarchical weight interaction and a Laplacian prior to address the detail loss and residual haze that degrade the quality of nonuniformly hazy images. First, a hierarchical weight interaction module is introduced into the baseline network to adaptively adjust weights and perform a weighted fusion of feature maps at different scales. Furthermore, a global receptive field aggregation module is introduced to enrich the receptive field, allowing the model to comprehensively understand the image content. Then, a frequency-domain information branch is introduced to decompose the image into low-frequency and high-frequency components using wavelet functions: the low-frequency component carries global structural information, whereas the high-frequency component provides detailed local information, and the decomposition collectively enhances image clarity. Finally, a Laplacian loss is incorporated to reconstruct the image, effectively restoring its fine-grained features and improving the quality of the generated images. Experimental results show that the proposed algorithm achieves superior results on the test set, with increases in peak signal-to-noise ratio (PSNR) of 0.8 dB, 1.54 dB, 1.14 dB, and 0.23 dB over the original algorithm on four datasets.
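The wavelet split into low- and high-frequency bands can be sketched with a one-level 1D Haar transform (illustrative only; the paper's branch operates on 2D feature maps, and the wavelet basis used there is not specified):

```python
import math

def haar_split(signal):
    """One-level 1D Haar wavelet split.

    Pairwise sums give the low-frequency (structure) band; pairwise
    differences give the high-frequency (detail) band. Together they
    reconstruct the signal exactly.
    """
    s = 1 / math.sqrt(2)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return low, high

def haar_merge(low, high):
    """Inverse of haar_split: perfect reconstruction."""
    s = 1 / math.sqrt(2)
    out = []
    for l, h in zip(low, high):
        out.extend([(l + h) * s, (l - h) * s])
    return out

low, high = haar_split([4.0, 2.0, 6.0, 6.0])
restored = haar_merge(low, high)           # recovers the original signal
```

Note that the flat pair (6, 6) produces a zero high-frequency coefficient, which is why the high band isolates edges and fine detail.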

    Jul. 25, 2024
  • Vol. 61 Issue 14 1437003 (2024)
  • Changyu Li, and Lei Ge

    Low-light object detection is a major challenge in object detection tasks. Conventional methods for object detection exhibit significant performance degradation under low-light conditions, and existing low-light object detection methods consume excessive computational resources, making them unsuitable for deployment on devices with limited computing capabilities. To address these issues, this study proposes an end-to-end lightweight object detection algorithm called low-light YOLO (LL-YOLO). To tackle the problem of unclear and difficult-to-learn features in low-light images, a low-light image generation algorithm is designed to generate low-light images for training the detector, assisting it in learning feature information in low-light environments. In addition, the network structure of the detector is adjusted to reduce the loss of feature information during computation, thereby enhancing the model's sensitivity to feature information. Furthermore, to mitigate the problem of severe noise interference on feature information in low-light images, an aggregation ELAN (A-ELAN) module for aggregating peripheral information is proposed that uses depth-wise separable convolution and attention mechanisms to capture contextual information, enhance the obtained feature information, and weaken the impact of noise. Experimental results demonstrate that the LL-YOLO algorithm achieves a mAP@0.5 of 81.1% on the low-light object detection dataset ExDark, which is an improvement of 11.9 percentage points over that of the directly trained YOLOv7-tiny algorithm. The LL-YOLO algorithm exhibits strong competitiveness against existing algorithms.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1437004 (2024)
  • Song Dai, Ximing Sun, Jingming Zhang, Yongshan Zhu, Bin Wang, and Dongmei Song

    In the task of lithology classification, the feature information obtained from a single data source is limited; hence, multisource data fusion is an important means of improving classification accuracy. As typical remote sensing data sources, aerial remote sensing images and digital elevation models provide complementary spectral and elevation information. To improve the accuracy of lithology classification, a new lithology classification method for multisource remote sensing data is proposed that combines channel-spatial attention mechanisms with multiscale convolutional neural networks. The method enhances the network's ability to learn deep features of aerial remote sensing images and digital elevation models by designing a multiscale dilated (atrous) convolutional module that better captures the spatial relationships of features and effectively eliminates the structural differences of heterogeneous data in the original data space. By designing local and global multiscale channel-spatial attention modules, different weights are adaptively assigned to the spectral channels and spatial regions of the multisource data, allowing the network to train more selectively on salient features and further improving classification performance. Finally, a basin in Sichuan province is taken as the study area to validate the proposed techniques. The experimental results show that the proposed method significantly outperforms four typical machine learning methods in overall and average accuracy, proving that the proposed multisource data fusion method can fully exploit the complementary advantages of different data sources and effectively improve the discrimination accuracy of geological lithology.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1437005 (2024)
  • Xin Ma, Chunyu Yu, Gang Chen, Ningning Sun, and Rongheng Ma

    This paper proposes an improved adaptive two-dimensional gamma correction method based on the illumination component and a target mean value to address over-enhancement in nonuniformly illuminated images. The process begins by converting images to the HSV space and extracting the V-channel image for processing. Using the illumination-reflection model, the illumination component is estimated with a guided image filter, which preserves edges well. Concurrently, the V-channel image is segmented into bright and dark regions, and a target mean function with region-specific adjustment coefficients is established. The illumination component and adaptive target mean value drive the gamma function for two-dimensional gamma correction, after which histogram equalization is performed. The final output is obtained by merging the corrected V channel with the H and S channels and converting back to the RGB space. Experimental evaluations on the DICM and LIME datasets reveal that, compared with four typical enhancement algorithms, the proposed algorithm achieves average increases of 10.6% in information entropy, 97.5% in mean gradient (MG), and 77.8% in signal-to-noise ratio (SNR), with an average processing time of 0.32 s. These enhancements significantly improve the visual quality of images, making them more suitable for machine vision research. The proposed algorithm offers high real-time performance and simplicity, producing output images with more natural colors, uniform brightness, clearer details, and an overall enhanced visual effect.
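A common textbook formulation of illumination-driven 2D gamma correction can be sketched as below; the exponent-halving form and the sample values are assumptions for illustration, since the paper additionally uses region-specific target means rather than a single global mean:

```python
def adaptive_gamma(v, illum):
    """Per-pixel gamma correction driven by an illumination estimate.

    Where the illumination estimate falls below its mean, gamma < 1
    brightens the pixel; where it is above the mean, gamma > 1 tempers
    it, counteracting non-uniform illumination.
    """
    flat = [p for row in illum for p in row]
    m = sum(flat) / len(flat)                 # mean illumination
    out = []
    for v_row, l_row in zip(v, illum):
        row = []
        for vp, lp in zip(v_row, l_row):
            gamma = 0.5 ** ((m - lp) / m)     # dark pixel -> gamma < 1
            row.append(255.0 * (vp / 255.0) ** gamma)
        out.append(row)
    return out

v     = [[40.0, 200.0]]                       # V-channel values (toy image)
illum = [[60.0, 180.0]]                       # illumination estimate, mean 120
corrected = adaptive_gamma(v, illum)          # dark pixel raised, bright lowered
```

In the paper this correction is followed by histogram equalization on the V channel before merging back with H and S.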

    Jul. 25, 2024
  • Vol. 61 Issue 14 1437006 (2024)
  • Jing Xu, Lidong Bao, Ming Fang, and Tianjiao Du

    In unmanned aerial vehicle aerial photography, images obtained from different sensors often exhibit significant parallax and resolution disparities, which can cause image registration to fail. To address this challenge, this study introduces an approach for registering infrared and visible light images based on a rotation-invariant Gabor representation descriptor. The method first computes the image's weighted matrix and then applies the Harris algorithm to the weighted matrix in the context of phase congruency to locate the image's key features. Subsequently, the Gabor representation framework is refined to precisely determine the orientation of key features, effectively mitigating the impact of substantial parallax. Finally, the nearest neighbor matching (NNM) algorithm, combined with fast sampling consistency (FSC), is used to filter out outliers and improve matching accuracy. The technique achieves average accuracies of 46%, 72%, and 62% on the CVC-15 stereo, LWIR-RGB long-wave infrared, and proprietary datasets, respectively, with corresponding average processing times of 6.886 s, 7.800 s, and 9.631 s. Experimental results demonstrate the efficacy of the proposed method, particularly when the images to be registered present considerable parallax and resolution differences.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1437007 (2024)
  • Junjie Huang, Feng Xu, Liang Luo, and Tianbao Chen

    The rapid and accurate three-dimensional (3D) reconstruction of ocean waves holds paramount significance for marine engineering research. To address the low processing efficiency of traditional ocean wave 3D reconstruction algorithms and the loss of accuracy caused by numerous holes in the generated point clouds, this paper proposes an approach that combines a disparity mask with self-supervised learning for 3D ocean wave reconstruction. First, disparity images are obtained by training a network model with image reconstruction, disparity smoothness, and left-right disparity consistency losses. Second, a mask decoder is added to generate disparity mask images. Finally, leveraging prior knowledge of common disparity regions, a novel mask loss function is designed to mitigate the impact of disparity noise in non-common regions and of ocean surface occlusion. Experimental results on the Acqua Alta dataset demonstrate that the proposed method effectively reduces noise in ocean wave point clouds. With accuracy close to that of the traditional algorithm, point cloud reconstruction reaches 0.024 s per frame.
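The left-right consistency idea behind the disparity mask can be sketched in 1D with hypothetical disparities (the actual method learns a mask via a decoder and a mask loss, rather than thresholding as done here):

```python
def lr_consistency_mask(disp_left, disp_right, thresh=1.0):
    """Flag pixels whose left/right disparities disagree.

    A left pixel x with disparity d should correspond to right pixel
    x - d; if the right disparity there differs by more than `thresh`,
    the pixel is likely occluded or noisy and is excluded from the
    reconstructed point cloud.
    """
    mask = []
    for x, d in enumerate(disp_left):
        xr = int(round(x - d))                # matching column in right image
        ok = 0 <= xr < len(disp_right) and abs(d - disp_right[xr]) <= thresh
        mask.append(ok)
    return mask

disp_left  = [1.0, 1.0, 4.0, 1.0]             # outlier disparity at x = 2
disp_right = [1.0, 1.0, 1.0, 1.0]
mask = lr_consistency_mask(disp_left, disp_right)
```

The outlier at x = 2 fails the consistency check and would be masked out before triangulation, which is the same effect the learned mask is designed to achieve on wave surfaces.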

    Jul. 25, 2024
  • Vol. 61 Issue 14 1437008 (2024)
  • Linbin Wu, Yunfeng Cao, and Ning Ma

    This paper proposes a feature matching method that combines an adaptive keyframe strategy with motion information to address the problem that the feature matching accuracy of the visual inertial navigation system decreases due to blurred imaging and maneuvering in dynamic environments. First, we propose an adaptive keyframe strategy to improve the quality of keyframe selection by establishing an updating criterion for keyframes based on four indicators: time, inertial motion, imaging clarity, and parallax. Second, the common viewing region among adjacent keyframes is identified through geometric transformation of the image based on inertial motion to enhance feature detectability. Next, an improved Oriented FAST and Rotated BRIEF (ORB) feature method based on the Gaussian image pyramid is used to improve the matching accuracy of feature points. Finally, the performance of the proposed method is verified using EuRoC public datasets. The results show that the proposed method has better accuracy and robustness in applications with dynamic scenes, such as illumination changes and image blur.
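The multi-indicator keyframe criterion can be sketched as a simple gate (all thresholds and the and/or structure here are invented for illustration; the paper's actual updating criterion combines its four indicators differently):

```python
def should_insert_keyframe(dt, rotation, sharpness, parallax,
                           max_dt=0.5, max_rot=0.26,
                           min_sharp=100.0, min_parallax=10.0):
    """Toy multi-indicator keyframe gate.

    A new keyframe is triggered when enough time has passed or the
    platform has rotated strongly (inertial motion), but it is accepted
    only if the image is sharp enough and there is sufficient parallax
    for reliable feature matching.
    """
    triggered = dt > max_dt or rotation > max_rot
    usable = sharpness > min_sharp and parallax > min_parallax
    return triggered and usable

# A sharp, well-separated frame after a strong rotation qualifies:
accept = should_insert_keyframe(dt=0.1, rotation=0.5,
                                sharpness=150.0, parallax=12.0)
# A blurred frame is rejected even though the time trigger fires:
reject = should_insert_keyframe(dt=0.9, rotation=0.0,
                                sharpness=20.0, parallax=12.0)
```

The point of the gate is the second condition: motion alone is not enough, so blurred frames never become keyframes.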

    Jul. 25, 2024
  • Vol. 61 Issue 14 1437009 (2024)
  • Jiahui Liu, Yonghe Zhang, and Wenxiu Zhang

    To address six-degree-of-freedom pose estimation for noncooperative targets in space, this study designed a lightweight network named LSPENet based on convolutional neural networks, which realizes end-to-end pose estimation without manually designed features. We used depthwise separable convolution and efficient channel attention (ECA) to form the basic module, which balances the complexity and accuracy of the network. One branch estimates location via direct regression, and another branch estimates orientation by introducing soft-assignment coding. Experimental results on the URSO dataset show that soft-assignment-coding-based orientation estimation yields substantially smaller errors than direct regression. Further, compared with another end-to-end pose estimation network, the proposed network reduces the parameter count by 76.7% and single-image inference time by 13.3% while improving location estimation accuracy by 54.6% and orientation estimation accuracy by 57.8%. Overall, LSPENet offers a new approach to onboard monocular visual pose estimation.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1437010 (2024)
  • Zhiqiang Dong, Jiale Cao, and Aiping Yang

    Most traditional fully supervised person search approaches are applicable to only one data domain and generalize poorly to unknown domains. Researchers have recently begun studying domain-adaptive person search, which aims to improve a model's generalization to unknown target domains; here, domain alignment and the reliable generation of positive and negative samples are the primary challenges. To this end, a domain-adaptive person search approach with diverse image and instance augmentation is proposed to effectively achieve both. The approach introduces two novel modules: a source-domain image augmentation module and a negative-enhanced re-id learning module. The former improves the model's domain adaptation ability and detection precision on target domains by enhancing only the diversity of source-domain data. The latter introduces a diverse-negative mining module to enrich the diversity of negatives and improve the discriminability of the learned re-id features. The proposed modules are used only during training and therefore do not increase inference time at test time. Experiments on two widely used person search datasets, CUHK-SYSU and PRW, demonstrate the effectiveness of the proposed approach and its superiority over traditional person search approaches. For instance, the proposed approach achieves a mean average precision (mAP) of 40.8% on the PRW test set, outperforming the existing domain-adaptive approach DAPS by 6.1 percentage points.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1437011 (2024)
  • Yumeng Chen, Yi Wang, Zeyuan Liu, Wenguang Chen, Huaiyu Cai, and Xiaodong Chen

    To address the inherent trade-off between depth of field and resolution in surgical microscopes, this study proposed an image acquisition scheme based on an ophthalmic surgical microscope system that extends the depth of field while preserving high-resolution imaging. After the optical transformation, the two binocular images provide a large depth of field and high resolution, respectively. A binocular image-fusion algorithm was then designed according to these imaging characteristics. Focus detection results serve as the initial fusion decision maps, which are refined by combining the color and texture information of the images, and detailed information is effectively fused through two-scale decomposition. Experimental results show that the proposed scheme and fusion algorithm highlight the details of the high-resolution image while preserving the clear range of the large-depth-of-field image. The depth-of-field enhancement achieved by the proposed algorithm exceeds 50% compared with the original algorithm. Overall, the proposed method is suitable for visual observation of large-depth-of-field, high-resolution surgical microscopic images, intraoperative two- and three-dimensional displays, and postoperative image preservation and analysis.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1437012 (2024)
  • Fenghao Nie, Mengxia Li, Mengxiang Zhou, Yuxue Dong, Zhiliang Li, and Long Li

    Current multifocus fusion algorithms extract image features at only a single scale, leading to problems such as loss of detail edges and local blurring in the fused image. To address these problems, this paper proposes a multifocus image fusion algorithm based on a multiscale dilated U-Net. First, in the encoder part of U-Net, a multiscale dilated module was introduced to replace the traditional convolution module; it fully uses receptive fields of various scales to capture local and global information more comprehensively. In addition, to further enhance image feature characterization, an RFB-s module was employed in the middle layer of U-Net to optimize the localization ability of multiscale features. The proposed fusion algorithm adopts end-to-end supervised deep learning and is divided into three modules: feature extraction, feature fusion, and image reconstruction, where the feature extraction module uses the U-Net containing the multiscale dilated modules. Experimental results show that the fused images obtained using the proposed algorithm have clear detailed texture and are free of overlapping artifacts. Among all multifocus image fusion algorithms used for comparison, the proposed algorithm is optimal in terms of the average gradient, visual information fidelity, and mutual information evaluation metrics, and it achieves near-optimal results in the edge information retention metric. Ablation experiments further verify that the proposed multiscale dilated module remarkably enhances the feature extraction capability of the network, thereby improving the quality of image fusion.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1437013 (2024)
  • Bingqian Yang, Xiufang Feng, Yunyun Dong, and Yuanrong Zhang

    Honeycomb lung is a CT imaging manifestation of various advanced lung diseases, characterized by diverse cystic lesions presenting a honeycomb-like appearance. Existing computer-aided diagnosis methods struggle to address the low identification accuracy caused by the varied morphology and differing locations of honeycomb lung lesions. Therefore, a combined CNN and Transformer model guided by lesion signals is proposed for honeycomb lung CT image recognition. In this model, a multi-scale information enhancement module first enriches the spatial and channel information of features obtained by the CNN at different scales, while a lesion signal generation module strengthens the expression of lesion features. Subsequently, a Transformer captures long-range dependencies among features, compensating for the CNN's weakness in extracting global information. Finally, a multi-head cross-attention mechanism fuses the feature information to obtain classification results. Experimental results demonstrate that the proposed model achieves accuracies of 99.67% and 97.08% on the honeycomb lung and COVID-CT datasets, respectively. It outperforms other models, providing more precise recognition results and validating the model's effectiveness and generalization.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1437014 (2024)
  • Wenxuan Zhang, Jinying Zhang, and Jingwen Li

    To address the needs of environmental science and diverse industries for detecting and identifying aerosol particles, a particle analysis platform is devised. The platform combines inertial impact sampling with digital holographic imaging, enabling simultaneous measurement of both particle density and refractive index and thereby markedly improving classification accuracy. The workflow begins with sampling and segregating particles via an inertial impactor, followed by analysis with digital holographic imaging to obtain their mass density. Additionally, by fitting the diffraction patterns of particles to Lorenz-Mie theory, the refractive index of each particle is extracted. Empirical validation underscores the efficacy of this approach, particularly in scenarios where particle densities or refractive indices are closely matched. The introduction of an additional, independent fingerprint measurement yields a substantive boost in both particle classification and identification accuracy.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1409001 (2024)
  • Hao Chen, Hongning Li, Hai Zhao, Yaru Gao, and Xin Yang

    Traditional three-dimensional (3D) imaging based on modulation degree measurement must control sinusoidal-fringe movements and synchronized focal-length changes, which requires many shots and results in complex structure and control. This research proposes a 3D imaging method based on modulation degree measurement profilometry that uses point matrix projection for defocus estimation. A point matrix with gradually changing focal length is projected onto the object, and the brightness sequence of the measured surface is recorded to extract modulation information. The 3D information is then recovered from the correspondence between the position of the modulation sequence's maximum and depth. The theoretical process is analyzed in detail, and a corresponding experimental platform was built for verification. The experimental results show that the proposed method accurately recovers the target height and, compared with the traditional method, is structurally simpler and more convenient to control.
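The peak-position-to-depth correspondence can be sketched as a per-pixel argmax over the modulation sequence (a toy sketch with made-up numbers; a practical implementation would interpolate around the peak for sub-step depth resolution):

```python
def depth_from_modulation(modulation, z_positions):
    """Depth recovery from a focal sweep of modulation measurements.

    For each pixel, the projector focus is swept through z_positions;
    the recorded modulation peaks where the projected pattern is in
    focus on the surface, so the argmax of each sequence gives that
    pixel's depth.
    """
    depths = []
    for seq in modulation:                    # one sequence per pixel
        peak = max(range(len(seq)), key=seq.__getitem__)
        depths.append(z_positions[peak])
    return depths

z = [0.0, 1.0, 2.0, 3.0]                      # focal sweep positions (mm)
mod = [
    [0.2, 0.9, 0.4, 0.1],                     # pixel in focus near z = 1
    [0.1, 0.3, 0.5, 0.8],                     # pixel in focus near z = 3
]
depths = depth_from_modulation(mod, z)
```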

    Jul. 25, 2024
  • Vol. 61 Issue 14 1411001 (2024)
  • Cibao Zhang, Libo Zhong, and Changhui Rao

    Noncommon-path static and quasistatic aberrations are key factors limiting high-resolution, high-contrast imaging in large ground-based astronomical telescopes. These static aberrations cannot be detected by the wavefront sensors in the telescopes' adaptive optics (AO) systems and adversely affect imaging performance. Traditional methods for detecting noncommon-path static aberrations suffer from high computational complexity, slow iterative convergence, and sign ambiguity of the aberrations. Based on focal-plane distorted images, a static aberration detection model for the imaging optical path was established that accounts for the residual aberration constraints of the wavefront sensors in AO systems. Experiments were conducted in which static aberrations were detected offline using the stochastic parallel gradient descent (SPGD) iterative algorithm. The experimental results show that the proposed method detects static aberrations in the imaging optical path with high accuracy and exhibits good robustness and stability under different noise conditions. This work offers a novel approach to detecting static aberrations in the imaging optical paths of large ground-based adaptive optical telescopes and provides a theoretical basis and algorithmic guidance for static aberration compensation and correction in AO systems.
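The SPGD iteration itself is simple and can be sketched on a toy metric (the quadratic "aberration" metric, gain, and perturbation size below are made up; the paper applies SPGD to its focal-plane model, not to this function):

```python
import random

def spgd_minimize(metric, u, gain=0.5, sigma=0.1, iters=2000, seed=0):
    """Stochastic parallel gradient descent (SPGD) sketch.

    Each iteration applies a random parallel perturbation du to all
    parameters, measures the two-sided change in the metric, and steps
    against the estimated gradient: u <- u - gain * dJ * du.
    """
    rng = random.Random(seed)
    u = list(u)
    for _ in range(iters):
        du = [sigma * (1 if rng.random() < 0.5 else -1) for _ in u]
        j_plus = metric([a + b for a, b in zip(u, du)])
        j_minus = metric([a - b for a, b in zip(u, du)])
        dj = j_plus - j_minus                 # correlates with grad(J) . du
        u = [a - gain * dj * b for a, b in zip(u, du)]
    return u

# Toy "aberration" metric minimized at (1, -2):
metric = lambda v: (v[0] - 1.0) ** 2 + (v[1] + 2.0) ** 2
u_hat = spgd_minimize(metric, [0.0, 0.0])     # converges near (1, -2)
```

The appeal of SPGD in AO is that it needs only scalar metric evaluations, never an explicit gradient of the optical system.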

    Jul. 25, 2024
  • Vol. 61 Issue 14 1412002 (2024)
  • Zhengqiong Dong, Qingfeng Yang, Xianwen Huang, Jingyi Wang, Yijun Xie, Renlong Zhu, Xiangdong Zhou, and Lei Nie

    A three-dimensional microscopic measurement method based on gradient variance is proposed for focus evaluation to achieve high-precision focusing. Building upon the traditional Brenner evaluation function, the method incorporates grayscale gradient changes along the vertical and diagonal directions. By capturing richer image edge details, it enhances the sensitivity of the focus evaluation function, thereby improving the accuracy of three-dimensional microscopic measurements. Experiments on a step sample with a nominal height of 1 mm yielded a relative measurement error of 0.49% (relative to the standard value) for the proposed method, outperforming traditional focus evaluation methods, whose error was 1.22%. Furthermore, the proposed evaluation function achieves a clarity ratio of 2.4412 and a sensitivity of 318.45.
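A Brenner-style focus measure extended with vertical and diagonal differences can be sketched as follows (the classic Brenner function uses horizontal differences at a step of 2; the extra directions here illustrate the idea, and the weighting of the paper's actual function may differ):

```python
def brenner_extended(img):
    """Brenner focus measure extended with vertical/diagonal gradients.

    Sums squared grayscale differences at a step of 2 along the
    horizontal, vertical, and diagonal directions; sharper images have
    steeper edges and therefore a larger score.
    """
    h, w = len(img), len(img[0])
    score = 0.0
    for y in range(h):
        for x in range(w):
            if x + 2 < w:
                score += (img[y][x + 2] - img[y][x]) ** 2      # horizontal
            if y + 2 < h:
                score += (img[y + 2][x] - img[y][x]) ** 2      # vertical
            if x + 2 < w and y + 2 < h:
                score += (img[y + 2][x + 2] - img[y][x]) ** 2  # diagonal
    return score

sharp   = [[0, 0, 255, 255]] * 4               # hard vertical edge
blurred = [[0, 85, 170, 255]] * 4              # same edge, ramped
focused_more = brenner_extended(sharp) > brenner_extended(blurred)
```

In an autofocus sweep, the z position maximizing this score is taken as the in-focus plane for each pixel or patch.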

    Jul. 25, 2024
  • Vol. 61 Issue 14 1412003 (2024)
  • Chen Chen, Qilin Zeng, Xiaoyi Yu, Xianming Xiong, Hao Du, Jiahao Zhao, and Fengrui Shi

    Laser shearing speckle interferometry is an optical measurement technique that measures the derivative of surface displacement and is widely used in fields such as nondestructive testing and precision measurement. In laser shearing speckle interferometry, accurate acquisition of phase information is crucial for measuring the morphology and surface features of targets; however, phase information is often corrupted by noise and nonlinear distortion. To address these factors, a speckle interferometric phase unwrapping method based on UCNet is proposed. With U-Net as the framework, parallel symmetric convolutions and multiscale decoders were introduced into the network to improve the model's ability to understand and utilize feature information at different scales, and the SmoothL1Loss loss function was adopted to adapt the model to tasks at different scales. Simulated datasets were used to train and test the network model, and actually collected phase maps were used to verify its accuracy and generalization ability. Results show that the structural similarity index of UCNet is 1.05 times that of a deep learning phase unwrapping network, and the method accurately performs phase unwrapping for laser shearing speckle interferometry.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1412004 (2024)
  • Yuxiang Wang, and Yueqian Shen

    Rock mass structure mainly comprises rock discontinuities, which control rock mass stability. This study proposes a new similarity measure for rock mass point cloud data acquired with three-dimensional (3D) laser scanning, addressing the limitation of traditional similarity metrics in expressing the similarity of rock point clouds. The measure jointly represents the spatial position and orientation differences between data points, reducing the similarity between points on different discontinuities. Using this similarity measure as the clustering criterion for the DBSCAN algorithm, rock mass point clouds were clustered with dynamic clustering parameters, yielding point cloud data for individual discontinuities. The clustering results were then corrected, enabling intelligent identification of rock discontinuities. The method was applied to point cloud data from sedimentary rock outcrops, and the results were compared with those of an algorithm based on comparing discontinuity occurrence information. The findings reveal that the results of the two methods are largely consistent, with a maximum deviation in occurrence within 2°. This meets the requirements of engineering applications and provides a theoretical foundation and practical reference for similar applications.
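A position-plus-orientation dissimilarity of this kind can be sketched as below (the additive form and the weight are assumptions, not the paper's exact measure; each point carries a unit surface normal from local plane fitting):

```python
import math

def discontinuity_distance(p, q, w_angle=2.0):
    """Combined dissimilarity between two rock-surface points.

    Each point is (x, y, z, nx, ny, nz): a position plus a unit normal.
    The measure adds the Euclidean gap to the (weighted) angle between
    normals, so nearby points lying on differently oriented
    discontinuities still come out dissimilar.
    """
    spatial = math.dist(p[:3], q[:3])
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(p[3:], q[3:]))))
    angle = math.acos(abs(dot))               # orientation difference (rad)
    return spatial + w_angle * angle

# Two nearby points on the same plane (normals both +z):
same_plane = discontinuity_distance((0, 0, 0, 0, 0, 1),
                                    (0.5, 0, 0, 0, 0, 1))
# Same spatial gap, but normals 90 degrees apart:
cross_plane = discontinuity_distance((0, 0, 0, 0, 0, 1),
                                     (0.5, 0, 0, 1, 0, 0))
```

Such a callable can be plugged into a density-based clusterer (e.g., as a custom `metric` for scikit-learn's DBSCAN) so that clusters follow individual discontinuities rather than raw spatial proximity.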

    Jul. 25, 2024
  • Vol. 61 Issue 14 1412005 (2024)
  • Zheng Li, Zhizhong Deng, Pengfei Wu, and Bin Liang

    Two visual methods are mainly used for measuring surface roughness from laser speckle images: one establishes a relationship between hand-crafted speckle image feature parameters and surface roughness, and the other builds a deep learning prediction model. Both have limitations: the former involves a complex feature parameter design process, whereas the latter requires many sample images. This study proposes a method for predicting surface roughness from laser speckle images using a convolutional neural network with support vector regression (CNN-SVR). The method applies transfer learning to a pretrained CNN, feeding the deep features from the network's pooling layer into an SVR model for surface roughness prediction. This approach automates the extraction of laser speckle image features and achieves high-precision surface roughness prediction with only a few samples. Experimental results demonstrate that the established model predicts surface roughness with high accuracy, achieving mean absolute percentage errors of 3.46%, 3.20%, and 3.53% for plane grinding, horizontal milling, and vertical milling specimens, respectively.
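The error metric reported above, mean absolute percentage error, can be computed as follows (the roughness values in the example are hypothetical, not the paper's data):

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error: mean of |true - pred| / |true|, in %."""
    return 100.0 * sum(abs(t - p) / abs(t)
                       for t, p in zip(y_true, y_pred)) / len(y_true)

# Roughness Ra values (um), measured vs. predicted (hypothetical numbers):
err = mape([0.8, 1.6, 3.2], [0.82, 1.52, 3.2])
```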

    Jul. 25, 2024
  • Vol. 61 Issue 14 1412006 (2024)
  • Yaokun Wei, Yunjiang Kang, Danwei Wang, Peng Zhao, and Bin Xu

    In industrial settings, with densely arranged and distributed industrial parts, the use of horizontal box object detection often leads to issues such as incorrect selection, missing parts, and loss of boundary direction. In this study, we propose a rotating workpiece object detection algorithm based on an enhanced version of YOLOv5s. First, a parameter-free SimAM network is introduced to prioritize crucial information without increasing the number of model parameters. This improves feature extraction in complex backgrounds and mitigates noise interference. Second, the original complete intersection over union (CIoU) regression function is replaced with the SIoU function, which incorporates an angle factor and thus aligns better with rotated-box detection. Substituting the activation function with Mish further improves the model's convergence speed and accuracy. The algorithm introduces the phase-shifting coding method and an improved HardL-Tanh activation function to predict the angle and regress the cosine values of the angle, thereby overcoming the angle ambiguity and boundary problems associated with the five-parameter representation method and realizing rotated-box detection of the workpiece. Experimental results demonstrate a mean average precision of 97.4%, highlighting the proposed algorithm's advantages, including smaller weight files, higher average accuracy, and reduced prediction time. These qualities align with the real-time requirements of industrial applications.
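The paper's phase-shifting coding is not detailed in the abstract. A common alternative that likewise removes the angular boundary discontinuity is to regress trigonometric components of the orientation instead of the raw angle, sketched here:

```python
import numpy as np

def encode(theta):
    # map an orientation theta in [0, pi) to a continuous 2-vector;
    # the factor 2 makes theta and theta + pi encode identically,
    # so the pi-periodic boundary disappears from the regression target
    return np.array([np.cos(2 * theta), np.sin(2 * theta)])

def decode(v):
    return (np.arctan2(v[1], v[0]) / 2) % np.pi

a, b = np.deg2rad(1.0), np.deg2rad(179.0)       # nearly the same orientation
print(abs(a - b))                                # large raw difference (~3.11 rad)
print(np.linalg.norm(encode(a) - encode(b)))     # small encoded difference (~0.07)
print(round(float(np.rad2deg(decode(encode(a)))), 6))  # round-trips to 1.0 degree
```

A regression loss on the encoded vector therefore stays small for two boxes whose orientations differ only across the 0/180° boundary, which is exactly the failure mode of the five-parameter representation.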

    Jul. 25, 2024
  • Vol. 61 Issue 14 1415001 (2024)
  • Fangbin Wang, Kun Cao, Xue Gong, Darong Zhu, and Ping Wang

    Photovoltaic power stations have the environmental characteristics of large-scale sites, sparse structural elements, and narrow corridors formed by arrays of photovoltaic modules. In response to the inaccurate pose estimation and incomplete mapping encountered when a two-dimensional LiDAR-equipped inspection robot uses the simultaneous localization and mapping (SLAM) algorithm for localization and mapping in photovoltaic power stations, we propose an algorithm that adopts the Cartographer algorithm as a framework and incorporates a front-end optimization strategy based on factor graph optimization. Herein, we construct inertial measurement unit (IMU) factors through preintegration processing and pose factors from LiDAR scan matching. We then jointly add them as constraints to the factor graph for optimization to obtain more accurate estimated poses and embed these poses into the original algorithm for map construction. Additionally, experiments were conducted in a simulated photovoltaic power station and a simulated narrow corridor using mainstream filtering-based algorithms, the Cartographer algorithm, and the improved algorithm. Results reveal that our improved algorithm generates maps with higher dimensional accuracy and a more accurate overall description.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1415002 (2024)
  • Hanlin Chen, Xiujie Qian, Yannan Yang, and Jianyu Lan

    A targeting and power transmission system based on an improved YOLOv5 algorithm and ground-based laser technology is proposed to achieve precise targeting and tracking during the real-time remote charging of unmanned aerial vehicles (UAVs). The recognition algorithm incorporates convolutional attention mechanisms and small object detection layers that enhance the ground camera's ability to capture photovoltaic battery targets on the UAV. The tracking and targeting process utilizes centroid tracking and adaptive targeting algorithms to align the ground platform with the aerial target, enabling accurate and swift docking of the ground-to-air power transmission device. Both model training and experimental measurements demonstrate that for a photovoltaic battery array at a distance of 10 m from the laser emission end and with an area of 4 cm×4 cm, the detection rate is not less than 80 frame/s, enabling precise recognition and targeting of UAV targets with a flight speed of less than 0.5 m/s. Therefore, this system combines high-speed, high-precision targeting with simple emitter and receiver devices, making it a convenient and efficient laser wireless power transfer and targeting system for UAVs.
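The centroid-tracking step can be sketched as a greedy nearest-centroid association between consecutive frames (the box coordinates below are hypothetical, and a real tracker would also handle target births, deaths, and ambiguous assignments):

```python
import numpy as np

def box_centroid(box):
    # box given as (x1, y1, x2, y2) in pixels
    x1, y1, x2, y2 = box
    return np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])

def match_to_previous(prev_centroids, boxes):
    # greedily assign each new detection to the nearest previous centroid
    return [int(np.argmin(np.linalg.norm(prev_centroids - box_centroid(b), axis=1)))
            for b in boxes]

prev = np.array([[10.0, 10.0], [50.0, 50.0]])   # tracked targets, last frame
boxes = [(48, 47, 54, 55), (8, 9, 13, 12)]      # detections, current frame
print(match_to_previous(prev, boxes))           # [1, 0]
```

The matched centroid trajectory is what the adaptive targeting stage would then feed to the pointing mechanism to keep the laser aligned with the moving photovoltaic array.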

    Jul. 25, 2024
  • Vol. 61 Issue 14 1415003 (2024)
  • Min Wang, Mingfu Zhao, Tao Song, Weiwei Li, Yuan Tian, Cheng Li, and Yu Zhang

    In this study, a multiview stereo network reconstruction method based on a feature aggregation Transformer was proposed to address the problem of ambiguous matching in areas with weak textures or non-Lambertian surfaces, which is caused by the lack of understanding of the overall image and of the connections between images in existing multiview stereo methods. Initially, features were extracted from the input image using a feature pyramid network fused with deformable convolution, allowing the size and shape of the receptive field to be adjusted adaptively. Subsequently, a Transformer-based spatial aggregation module was introduced: an intra-image self-attention mechanism aggregates features to capture global contextual information within each view, while an inter-image cross-attention mechanism efficiently captures information interactions between views, thereby achieving reliable feature matching through a more accurate capture of scene texture features. Finally, visibility cost aggregation was employed to estimate pixel visibility information and remove noisy and mismatched pixels from cost aggregation. Experimental results on the DTU and Tanks & Temples datasets show that the proposed method achieves superior reconstruction performance compared with other methods.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1415004 (2024)
  • Zhao Tu, Jianfeng Zhong, Wei Wei, Shoujiang Chi, Dongming Liu, Guiyong Guo, and Shuncong Zhong

    We propose a method for pixel coordinate correction to address subtle camera offsets in an eye-in-hand scheme caused by external factors in a machine-vision positioning guidance system. This method involves establishing a camera-offset pixel coordinate estimation model based on the pinhole camera projection model to eliminate the influence of camera offset on the initial calibration parameters. By dividing the image region into subregions and designing the camera confirmation bit, we analyze the parameters that describe the camera-offset model in each subregion. This allows for the determination of the object feature points' pixel coordinates prior to offset using the post-offset pixel coordinates as known conditions, thereby ensuring the validity of the initial calibration parameters. Experimental results demonstrate that our method can effectively correct the pixel coordinate offset errors induced by changes in the camera's fixation condition with a resolution of 1 pixel.
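The offset model builds on the standard pinhole projection. A small sketch with hypothetical intrinsics shows how a slight camera translation shifts the pixel coordinates that the correction must undo:

```python
import numpy as np

# intrinsic matrix with hypothetical values: focal lengths fx, fy in pixels
# and principal point (cx, cy)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project(point_cam):
    # pinhole model: homogeneous pixel = K @ point, then divide by depth
    uvw = K @ point_cam
    return uvw[:2] / uvw[2]

p = np.array([0.1, -0.05, 2.0])       # a point in camera coordinates (metres)
uv0 = project(p)                       # pixel before the camera moves

# a pure 1 cm lateral camera translation: the same point, expressed in the
# shifted camera frame, projects to a different pixel
offset = np.array([0.01, 0.0, 0.0])
uv1 = project(p - offset)
print(uv0, uv1 - uv0)                  # [360. 220.], shift of [-4. 0.] pixels
```

The paper's estimation model works in the opposite direction: given the post-offset pixel coordinates and the per-subregion offset parameters, it recovers the pre-offset coordinates so the initial calibration parameters remain valid.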

    Jul. 25, 2024
  • Vol. 61 Issue 14 1415005 (2024)
  • Xianfeng Yang, Chen Liao, Chang Duan, Hui Shu, Mengjun Lai, and Chao Zhang

    Conventional LiDAR point-cloud compression methods often reduce both the total number of points and the coordinate accuracy of the remaining points. Addressing the limitation of existing optimization methods for point-cloud compression parameters, which frequently overlook the quality loss associated with reducing the number of points, this paper presents a novel approach for the joint optimization modeling of downsampling and quantization parameters in LiDAR point-cloud compression. This method tackles both types of losses simultaneously, thereby improving the compression efficiency of point clouds. Initially, the bitstream sizes resulting from compressing point clouds with various parameter pairs are statistically analyzed. Subsequently, an analytical model is developed to elucidate the relationship between the code rate and the pairs of downsampling and quantization parameters. This model is then employed to estimate the minimum distortion at each code rate and the corresponding parameter pairs. Finally, a joint optimization model for the downsampling and quantization parameters is formulated based on the relationship between the code rate and the parameter pairs associated with the minimum distortion. The experimental results indicate that the proposed method effectively improves the compression efficiency of point-cloud data. Compared with the baseline encoder, this method achieves a BD-rate improvement of 10.43% on the fit dataset and 16.39% on the test dataset.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1415006 (2024)
  • Zhifei Wei, Shaosheng Fan, and Mingxuan Xiong

    During LiDAR-based mapping, simultaneous localization and mapping (SLAM) algorithms with fixed alignment parameters accumulate large errors during point cloud alignment as the environment changes and poorly suppress elevation errors, which degrades global alignment accuracy and mapping and positioning performance. To address these problems, a laser-inertial SLAM algorithm based on an iterated error-state Kalman filter (IESKF) and factor graph optimization is proposed, comprising an adaptive parameter adjustment module, a point cloud preprocessing module, a front-end odometry module, and a back-end factor graph optimization module. According to the size of the point cloud, different keyframe distance parameters, alignment parameters, voxel downsampling parameters, and ground constraints are selected. K-nearest neighbors (KNN) is used to select keyframes that compose local maps, making full use of spatial information in frame-to-map matching. The IESKF fuses the point cloud residuals with the a priori pose from the IMU to obtain the filter-fusion front-end odometry. Ground constraints are added in the back-end optimization and combined with loop-closure constraints to form the factor graph optimization, which improves the global consistency of map construction. Multi-algorithm comparison experiments on the M2DGR public dataset and in real scenarios show that, in real scenarios, the proposed algorithm improves global mapping accuracy by 38% and reduces elevation error by 52% compared with the LIO-SAM algorithm, which uses only factor graph optimization, and improves global mapping accuracy by 64% and reduces elevation error by 62% compared with the FAST-LIO2 algorithm, which uses only the IESKF. The results demonstrate that the proposed algorithm has better performance in terms of environmental adaptability and elevation error suppression.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1415007 (2024)
  • Zehang Liao, Minqi He, Hao Wu, Wanyang Xia, Zhongren Wang, and Dahu Zhu

    To address the problems of oversegmentation of micro-small planar structures and undersegmentation of hole edges and smooth transition regions during point cloud segmentation of vehicle body components in existing algorithms, a density-proportional growth consistency (DPGC) segmentation algorithm is proposed in this study. This algorithm accurately segments complex surface-component point clouds by adhering to the principle of DPGC. The specific methodology involves the following steps: first, performing principal component analysis on the scanned point cloud data to calculate the standard-density point cloud, thereby establishing the algorithm's density-proportional benchmark; second, devising an adaptive point-cloud search radius function model to determine the optimal nearest neighbor search radius for each region, enhancing the segmentation accuracy across different feature regions; third, employing an adaptive radius density segmentation algorithm to preliminarily screen planar regions using a density-ratio threshold between the point cloud and the principal-projection point cloud; and finally, implementing an equal-scale adaptive radius density segmentation algorithm to compute the local-projection point cloud within the search radius. The segmentation is based on the density ratio between the local-projection point cloud and the original cloud as well as the density ratio of the equal-scale region, further refining the nonplanar regions to achieve the final segmentation result. Results of comparative tests demonstrate that the DPGC segmentation algorithm yields a higher intersection over union and surpasses mainstream algorithms such as RANSAC-LS and the improved region-growing segmentation algorithm, particularly in areas with strong features such as door frames, thereby effectively achieving accurate point cloud segmentation of body components.
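The first step, principal component analysis of a local point-cloud neighborhood, can be sketched as follows (the flat test data are illustrative):

```python
import numpy as np

def pca_normal(neighborhood):
    # PCA of a local neighborhood: the eigenvector with the smallest
    # covariance eigenvalue approximates the surface normal, and that
    # eigenvalue measures how planar the neighborhood is
    centered = neighborhood - neighborhood.mean(axis=0)
    cov = centered.T @ centered / len(neighborhood)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return eigvecs[:, 0], eigvals[0]

# points sampled on the plane z = 0: expect normal ~ (0, 0, +-1), flatness ~ 0
rng = np.random.default_rng(2)
pts = np.c_[rng.uniform(-1.0, 1.0, (50, 2)), np.zeros(50)]
normal, flatness = pca_normal(pts)
print(np.abs(normal[2]) > 0.99, flatness < 1e-9)  # True True
```

Projecting each neighborhood onto the plane defined by this normal yields the projection point clouds whose density ratios drive the DPGC screening steps.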

    Jul. 25, 2024
  • Vol. 61 Issue 14 1415008 (2024)
  • Zhibo Xu, Lü Qiujuan, Xinbin Gan, Jiamin Tan, and Yongsheng Liu

    In the process of point cloud denoising, after large-scale noise points are removed from the point cloud data, small noise points usually remain mixed around the point cloud and are difficult to remove directly. This seriously affects the smoothness of the reconstructed surface and leads to a certain degree of feature distortion in the model. Thus, for small-scale noise points, this study proposes a point-cloud-guided filtering algorithm based on optimal neighborhood feature weighting. The optimal initial neighborhood is selected based on the information entropy function, and feature points are identified by combining surface and normal variations with distance features. The neighborhoods of the feature points are adaptively grown to obtain smooth neighborhoods. The guided filtering algorithm is adjusted by surface-variation weighting to achieve anisotropic smoothing of the feature and non-feature regions of complex surface parts. As evidenced by experimental results, the proposed algorithm exhibits a more obvious smoothing effect on noisy point clouds, performs better in feature retention, and is significantly more efficient than several commonly used smoothing algorithms.
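The abstract does not specify the information-entropy function. One common choice for optimal-neighborhood selection is the eigenentropy of the local covariance, which is low for linear or planar neighborhoods and high for volumetric (noisy) ones; the test shapes below are illustrative:

```python
import numpy as np

def eigenentropy(pts):
    # Shannon entropy of the normalised covariance eigenvalues: near zero
    # for a line, ~ln 2 for a plane, ~ln 3 for an isotropic volume, so
    # minimising it over candidate radii favours structured neighbourhoods
    c = pts - pts.mean(axis=0)
    e = np.linalg.eigvalsh(c.T @ c / len(pts))
    e = np.clip(e / e.sum(), 1e-12, None)
    return float(-(e * np.log(e)).sum())

t = np.linspace(0.0, 1.0, 200)
g = np.linspace(0.0, 1.0, 20)
line  = np.c_[t, np.zeros(200), np.zeros(200)]
plane = np.c_[np.repeat(g, 20), np.tile(g, 20), np.zeros(400)]
cube  = np.random.default_rng(4).uniform(size=(400, 3))

print(eigenentropy(line) < eigenentropy(plane) < eigenentropy(cube))  # True
```

Evaluating this score over several candidate neighborhood sizes and keeping the minimum is one plausible realization of the "optimal initial neighborhood" step; the paper's exact entropy function may differ.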

    Jul. 25, 2024
  • Vol. 61 Issue 14 1415009 (2024)
  • Xiaoyu Du, Jiefu Li, Chao Zhao, Yukang Shu, Hang Zhao, Xiaofeng Shi, and Jun Ma

    Growth and development of zebrafish are valuable areas of research as these fish serve as an excellent animal model in biomedicine. This study used optical coherence tomography (OCT) to continuously monitor the three-dimensional imaging of zebrafish development, creating a comprehensive model for their growth and development. Furthermore, the study evaluated the morphology of key organs, such as the eyes, brain, and heart, through quantitative analysis. The findings revealed that zebrafish undergo transition from inner to outer ovary development primarily during the embryonic phase, with the most significant changes in shape and internal composition occurring during the larval phase. Basic organ development is completed during the juvenile stage, whereas the gonads reach maturity during the adult stage. Although the brain does not reach full maturity until adulthood, eye development is primarily functional during the embryonic stage. This growth pattern is significantly different from that observed in other species. Moreover, during the developmental stage of zebrafish, there is an approximately 60-fold increase in the ventricular area. This study successfully captured in vivo images of the entire zebrafish developmental process and examined the growth status of its internal organs. These results provide crucial information for future studies on growth and development using zebrafish as a model.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1417001 (2024)
  • Gui Ru, Ling Qin, Fengying Wang, Xiaoli Hu, and Yanhong Xu

    This study proposes a visible light positioning system based on a simple recurrent unit (SRU) neural network to enhance the accuracy of underground positioning in coal mines and simplify the positioning system. The system comprises a single LED light and four photodetectors, where the four photodetectors are positioned at the front, back, left, and right of a safety helmet, with the point to be measured located at the top center of the helmet. The SRU neural network predicts the position information of the measured point. Simulation results show that within a positioning area of 3.6 m × 3.6 m × 3 m, the proposed system achieves a positioning accuracy of 1.42 cm and an average positioning time of 0.59 s, with 97% of positioning errors within 2.3 cm. Compared with other positioning algorithms, the proposed system demonstrates substantially enhanced positioning accuracy. To further validate the system's performance, the entire positioning system was implemented in an actual environment. The experimental results reveal an average positioning error of 10.21 cm, which meets the requirements for underground positioning in coal mines.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1428001 (2024)
  • Xiaoying He, Weiming Xu, Kaixiang Pan, Juan Wang, and Ziwei Li

    Existing deep learning-based remote sensing intelligent interpretation methods struggle to directly obtain global information, resulting in blurred object edges and low classification accuracy between similar classes. This study proposes a semantic segmentation model called SRAU-Net based on the Swin Transformer and a convolutional neural network. SRAU-Net adopts a Swin Transformer encoder-decoder framework with a U-Net shape and introduces several improvements to address the limitations of previous methods. First, the Swin Transformer and a convolutional neural network are used to construct a dual-branch encoder, which effectively captures spatial details at different scales and complements the context features, resulting in higher classification accuracy and sharper object edges. Second, a feature fusion module is designed as a bridge for the dual-branch encoder. This module efficiently fuses global and local features in the channel and spatial dimensions, improving the segmentation accuracy for small target objects. Moreover, the proposed SRAU-Net model incorporates a feature enhancement module that utilizes attention mechanisms to adaptively fuse features from the encoder and decoder and enhances the aggregation of spatial and semantic features, further improving the ability of the model to extract features from remote sensing images. The effectiveness of the proposed SRAU-Net model is demonstrated using the ISPRS Vaihingen dataset for land cover classification. The results show that SRAU-Net outperforms other models in terms of overall accuracy and F1 score, achieving 92.06% and 86.90%, respectively. Notably, the SRAU-Net model excels in extracting object edge information and accurately classifying small-scale regions, with an improvement of 2.57 percentage points in overall classification accuracy compared with the original model.
Furthermore, it effectively distinguishes remote sensing objects with similar characteristics, such as trees and low vegetation.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1428002 (2024)
  • Keran Li, Ligang Li, Zehao He, Hongbing Xu, and Yongshou Dai

    Due to the weak ability of the water surface to reflect laser light and the interference caused by water surface fluctuation, traditional laser simultaneous localization and mapping (SLAM) methods suffer from low positioning accuracy and poor robustness for unmanned boats in nearshore water scenarios. To solve this issue, a laser SLAM method based on embankment feature extraction (EF-SLAM) has been proposed in this study. First, EF-SLAM introduced stable water edge points, which are reflected from the shoreline and are distributed within a consistent range of water surface elevations, for matching; this reduces the elevation estimation errors of the lidar odometer in nearshore water scenarios. Then, a water-edge-point extraction method based on point cloud forward projection was developed. Subsequently, a feature association and matching approach between frames and local maps was employed to match the water edge points. Additionally, a point-to-point residual distance between water edge points was constructed to estimate the relative pose changes between radar frames. Finally, experiments were conducted using the USVinland public dataset and real-world data from the Qingdao Guzhenkou dataset. Results demonstrate that EF-SLAM effectively mitigates pose drift in odometer readings. Moreover, it exhibits higher positioning accuracy and improved robustness than mainstream laser-based SLAM algorithms.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1428003 (2024)
  • Shanxue Chen, and Shaohua Xu

    Non-negative matrix factorization (NMF) based on the Euclidean distance criterion is prone to unmixing failure on hyperspectral images contaminated by noise and abnormal pixels. To suppress the influence of noise and abnormal pixels, an NMF model based on the Cauchy loss function is adopted to improve the robustness of unmixing. Because suppressing outliers may destroy the intrinsic abundance structure of hyperspectral images, a graph Laplacian constraint is introduced into the model to ensure that the internal structure of the original hyperspectral data is preserved. Meanwhile, to improve the sparsity of the abundance matrix and the unmixing performance, a reweighted sparse constraint term is introduced, yielding a Cauchy NMF algorithm based on graph Laplacian regularization (CNMF-GLR). Considering the neighborhood-selection requirement of the Laplacian constraint, a local neighborhood weighting method with a rectangular window structure is used to determine local neighborhoods. Comparisons with other classical algorithms under the same initialization conditions on simulated and real datasets show that the proposed algorithm has better robustness and unmixing performance.
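The graph Laplacian constraint starts from a weighted neighborhood graph over pixels. A minimal construction might look like the following (heat-kernel weights over k nearest spectral neighbors; all parameters are illustrative, and the paper itself determines neighborhoods with a rectangular spatial window rather than spectral k-NN):

```python
import numpy as np

def graph_laplacian(X, sigma=1.0, k=3):
    # heat-kernel weights between each pixel's spectrum and its k nearest
    # neighbours; L = D - W is the (unnormalised) graph Laplacian whose
    # quadratic form tr(S L S^T) penalises abundance differences between
    # neighbouring pixels
    n = X.shape[1]                                   # X: bands x pixels
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]            # skip the pixel itself
        W[i, nbrs] = np.exp(-d2[i, nbrs] / sigma)
    W = np.maximum(W, W.T)                           # symmetrise
    return np.diag(W.sum(axis=1)) - W

X = np.random.default_rng(3).random((5, 10))         # 5 bands, 10 pixels
L = graph_laplacian(X)
print(np.allclose(L.sum(axis=1), 0))                 # Laplacian rows sum to zero
```

Adding tr(S L Sᵀ) for the abundance matrix S to the Cauchy-loss objective is what ties each pixel's abundances to those of its neighbors, counteracting the structural damage from outlier suppression.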

    Jul. 25, 2024
  • Vol. 61 Issue 14 1428004 (2024)
  • Guanyuan Feng, Jian Zhang, Yu Miao, Zhengang Jiang, Weili Shi, and Xin Jin

    To address the inaccuracies of indoor map construction due to insufficient information derived from single data sources and errors in multi-sensor calibration, an indoor floor plan construction algorithm is proposed based on multi-source data fusion. The RGB-D sensor was used to obtain 3D structural features in indoor scenes, and the features were then integrated into LiDAR point clouds in the algorithm. Accordingly, an indoor floor plan with 3D structural features was constructed. In the algorithm, images collected by a depth camera in the RGB-D sensor were first converted into pseudo-LiDAR point clouds. Next, a filter based on polynomial function fitting was used to stratify and calibrate the pseudo-LiDAR point clouds, and then the LiDAR and pseudo-LiDAR point clouds were fused. Finally, the fused point cloud data were used to create the indoor 2D floor plan. Experimental results show that the proposed point-cloud hierarchical filtering and calibration method effectively fuses the LiDAR and pseudo-LiDAR point clouds, and the accuracy of the indoor 2D floor plan constructed by the fusion point clouds is significantly improved.
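The depth-image-to-pseudo-LiDAR conversion is the standard pinhole back-projection; a minimal sketch with hypothetical intrinsics:

```python
import numpy as np

# hypothetical intrinsics for the depth camera (focal lengths and principal point)
fx, fy, cx, cy = 525.0, 525.0, 319.5, 239.5

def depth_to_pseudo_lidar(depth):
    # back-project every valid depth pixel (u, v) to a 3D point in camera
    # coordinates: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]            # drop invalid (zero-depth) pixels

depth = np.full((480, 640), 2.0)         # a flat wall 2 m away
cloud = depth_to_pseudo_lidar(depth)
print(cloud.shape)                       # (307200, 3)
```

In the described algorithm, this pseudo-LiDAR cloud is then stratified, calibrated against the LiDAR cloud with a polynomial-fitting filter, and fused before projecting the result down to the 2D floor plan.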

    Jul. 25, 2024
  • Vol. 61 Issue 14 1428005 (2024)
  • Xu Chen, and Mingchang Shi

    In the process of extracting buildings from high-resolution remote sensing imagery, convolutional neural networks (CNNs) struggle to balance global information, edge details, and underlying texture information, leading to blurred edges and incomplete results. To address this, we propose a dual-branch remote sensing building extraction network based on texture enhancement, named texture enhancement and Outlook attention U-shaped network (TEOA-UNet), which combines shallow CNNs with Transformer networks. First, to learn global contextual information while maintaining focus on local buildings, we introduce the Outlook attention mechanism. Then, to improve the model's perception of building edges, we utilize an edge aware module (EAM) to encourage the learning of edge information. Finally, to enhance the model's sensitivity to low-level textures, we put forward a texture enhancement branch based on a shallow convolutional network to strengthen the model's learning capability of low-level features. Experimental results on the Massachusetts, WHU, and Inria building datasets demonstrate that TEOA-UNet performs well in extracting buildings from remote sensing imagery with different resolutions and scenes, effectively improving the precision and completeness of building edge segmentation. The F1 score reached 88.54%, 95.22%, and 90.94% on the aforementioned datasets, respectively, a respective increase of 1.72, 0.49, and 0.23 percentage points over the baseline model SDSC-UNet. These results indicate that TEOA-UNet possesses high extraction accuracy.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1428006 (2024)
  • Zhaofei Xu, Jian Liao, Hongcheng Wang, Chong Kang, Wei He, and Wuyue Wang

    In advanced autonomous driving, the fusion of an infrared camera and LiDAR enables the vehicle to accurately perceive its complex surroundings all day and in all weather. Accurate calibration is an indispensable step in realizing multisensor fusion and must be completed before fusion. Because infrared cameras capture no color information, traditional calibration methods for an infrared camera and LiDAR can only be performed using a special heated target. However, such methods are costly, cumbersome to operate, limited in calibration distance, and low in accuracy. Therefore, this study proposes an infrared-LiDAR calibration algorithm for road scenes. The proposed algorithm requires no special target and achieves accurate calibration at different distances using the characteristics of vehicles, pedestrians, lane lines, and road poles in different directions. Experimental results show that the proposed scheme can improve the robustness and security of automatic driving systems in harsh environments such as night, rain, fog, sand, and backlight. The scheme achieves more than 90% calibration accuracy for pedestrians and cars within longitudinal distances of 0 to 100 m and lateral distances of -10 m to 10 m.

    Aug. 25, 2024
  • Vol. 61 Issue 14 1428007 (2024)
  • Qi Liu, Lin Cao, Shu Tian, Kangning Du, Peiran Song, and Yanan Guo

    In recent years, convolutional neural networks (CNNs), with their powerful feature representation capabilities, have made remarkable achievements in the change detection of remote sensing images. However, CNNs have shortcomings in modeling the long-range dependencies of dual-temporal images, resulting in poor recognition of structural information. In contrast, Transformers can effectively capture the long-distance dependencies between input pixels, thereby helping in perceiving and reasoning about structural information in images. To solve the problem that existing change detection methods cannot jointly consider global and local feature information, a multiscale cascaded CNN-Transformer hybrid network was proposed in this study. This algorithm fully uses the global and local semantic information of the hybrid network and improves the ability of the model to perceive changes in object structures and semantic information. The cascade network enhances the interaction between scales, making it easier for the model to understand the differences and connections between features of different granularities. In addition, feature weights were adjusted at various scales to improve the ability of the model to use multiscale information. The F1-score of the proposed method reaches 97.8% and 87.1% on the CDD and GZ-CD datasets, respectively. Experimental results on the two standard datasets show that this method can effectively use feature information at various scales to improve the change detection accuracy of the model.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1428008 (2024)
  • Yao Zhou, and Peng Fei

    The restricted optical aperture and limited measurement bandwidth of microscopy impose constraints on information acquisition, particularly during the observation of dynamic processes within fine subcellular structures and ultrafast, transient biological events in vivo, as well as during the efficient three-dimensional imaging of mesoscopic ex vivo tissues. This limitation represents a formidable hurdle in multidisciplinary biomedical research. Traditional constraints associated with fluorescence microscopy have prompted studies on innovative principles and methodologies. By integrating artificial intelligence, efforts have been directed toward enhancing the speed and precision of fluorescence microscopy imaging, thereby augmenting information throughput. In this study, the problems posed by throughput limitations in cell biology, developmental biology, and tumor medicine are analyzed in detail, and traditional constraints on fluorescence microscopy throughput are surmounted through the integration of artificial intelligence. This approach paves the way for the advancement of physical optics and image processing and greatly contributes to the evolution of biomedical research. This study offers comprehensive insights into intricate phenomena within the realms of life and health, not only holding paramount importance for biomedical exploration but also unveiling promising avenues for future studies and applications.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1400001 (2024)
  • Mingyuan Li, and Fengzhou Fang

    Abnormal intraocular pressure is one of the main manifestations of glaucoma. In the early stages of the disease, patients often do not experience significant discomfort, making timely awareness of the condition difficult; if the condition is not treated promptly, it may lead to complete blindness. Early diagnosis of glaucoma can effectively prevent permanent vision loss. Clinical manual examinations are a viable solution, but they are time-consuming and labor-intensive and require doctors to possess specialized knowledge and experience. Existing research results indicate that integrating artificial intelligence technology into imaging for the prevention and detection of glaucoma is efficient and accurate. This article systematically introduces the latest developments in artificial intelligence-based auxiliary diagnosis of glaucoma, discusses various published algorithm models, summarizes the challenges in such research, and outlines possible future research directions. It provides a comprehensive and in-depth review of the current research status and future development trends in intelligent glaucoma detection technology based on multi-modal assessment.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1400002 (2024)
  • Dan Wang, Qiong Ding, Runyuan Zhang, and Yuwei An

    LiDAR can obtain the three-dimensional space coordinates of a target while receiving the echo radiation intensity reflected from it, enabling the integrated acquisition of geometric characteristics and spectral information. However, owing to the influence of various factors, such as distance, incident angle, and atmospheric attenuation, the quality of the original intensity signal cannot meet the requirements of subsequent applications, thereby hindering the promotion of integrated spatial-spectral applications. In this paper, starting from the generation of LiDAR intensity, its principles, characteristics, and applications are discussed. Then, the research status of intensity correction in China and abroad is reviewed and summarized. Finally, possible future research directions are outlined. With the increasing demand for the integrated acquisition of spatial-spectral data by remote sensing technology, the advantages of spatial-spectral integrated LiDAR technology will become increasingly prominent, and the technology will be widely used.

    Jul. 25, 2024
  • Vol. 61 Issue 14 1400003 (2024)
  • Yifei Wu, Rui Yang, Lü Qishen, Yuting Tang, Chengmin Zhang, and Shuaihui Liu

    Image fusion aims to integrate complementary information from diverse source images, generating a composite image with higher quality, richer information content, and enhanced clarity. Infrared and visible light image fusion (IVIF) is a focal point of the image fusion field. This paper employs the systematic review method to analyze publication trends over the last two decades in three major engineering online literature databases related to IVIF, with an in-depth examination of deep-learning-based IVIF algorithms published up to August 2023. Additionally, a systematic analysis of performance evaluation methods in the IVIF domain is provided, including a categorized comparison of the evaluation formulas and their components. Finally, the paper concludes with a summary and outlook on future technological trends in IVIF, offering valuable insights for prospective research in this field.
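    Many of the no-reference evaluation formulas compared in such surveys are simple statistics of the fused image itself. One widely used example is information entropy (EN), where a higher value suggests the fusion preserved more information from the sources; a minimal sketch for an 8-bit grayscale image (this is one standard metric, not necessarily the specific formulas reviewed in the paper):

    ```python
    import numpy as np

    def entropy(img, bins=256):
        """Shannon entropy (EN, in bits) of an 8-bit grayscale image,
        a common no-reference metric for scoring fused images."""
        hist, _ = np.histogram(img, bins=bins, range=(0, bins))
        p = hist / hist.sum()
        p = p[p > 0]  # drop empty bins to avoid log(0)
        return -np.sum(p * np.log2(p))
    ```

    In practice EN is reported alongside complementary metrics (e.g., mutual information with each source, SSIM, spatial frequency), since a single statistic cannot capture both information content and structural fidelity.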

    Jul. 25, 2024
  • Vol. 61 Issue 14 1400004 (2024)
  • Shuchen Lin, Dejian Wei, Shuai Zhang, Hui Cao, and Yuzheng Du

    Knee osteoarthritis is a common traumatic and degenerative bone and joint disease that can induce various pathological changes through injuries to different knee structures. Magnetic resonance imaging plays a crucial role in its clinical diagnosis. Currently, using deep learning models to extract deep features from knee joint images and to segment and recognize lesions in the various knee structures has become a research hotspot in the computer-aided diagnosis of knee joint diseases. First, this study discussed the advantages and disadvantages of various knee imaging techniques, focusing on magnetic resonance multisequence imaging. It then highlighted the current status of deep learning models for diagnosing lesions of the knee cartilage, meniscus, and other tissue structures. Furthermore, it addressed the limitations of existing recognition models and introduced two model optimization technologies: knowledge distillation and federated learning. Finally, the study concluded by outlining future research directions.
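    Knowledge distillation, one of the two optimization technologies named above, trains a compact student network against the temperature-softened output distribution of a larger teacher in addition to the hard labels. A minimal NumPy sketch of the classic distillation loss (the temperature T and weight alpha are illustrative hyperparameters, not values from the paper):

    ```python
    import numpy as np

    def softmax(z, T=1.0):
        """Temperature-scaled softmax; higher T softens the distribution."""
        z = np.asarray(z, dtype=float) / T
        z = z - z.max()  # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    def distillation_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.7):
        """Weighted sum of (a) KL divergence between temperature-softened
        teacher and student distributions, scaled by T^2, and
        (b) ordinary cross-entropy on the hard label."""
        p_t = softmax(teacher_logits, T)
        p_s = softmax(student_logits, T)
        soft = np.sum(p_t * (np.log(p_t) - np.log(p_s))) * T * T  # KL(teacher || student)
        hard = -np.log(softmax(student_logits)[label])            # cross-entropy
        return alpha * soft + (1 - alpha) * hard
    ```

    The appeal for medical imaging is that the distilled student retains much of the teacher's accuracy at a fraction of the parameter count, easing deployment in clinical settings; federated learning, the other technology mentioned, instead addresses training across hospitals without sharing patient images.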

    Jul. 25, 2024
  • Vol. 61 Issue 14 1400005 (2024)