To address the challenges of uneven inter-class distribution and difficult lesion-area recognition in retinal fundus image datasets, this paper proposes a dual-attention retinal disease grading algorithm that fuses PVTv2 and DenseNet121. First, retinal images are processed by a dual-branch network of PVTv2 and DenseNet121 to extract global and local information. Next, spatial-channel synergistic attention modules and multi-frequency multi-scale attention modules are applied to PVTv2 and DenseNet121, respectively. These modules refine local feature details, highlight subtle lesion features, and enhance the model's sensitivity to complex micro-lesions and its spatial perception of lesion areas. Subsequently, a neuron cross-fusion module is designed to establish long-range dependencies between the macroscopic layout and the microscopic texture information of lesion areas, thereby improving the accuracy of retinal disease grading. Finally, a hybrid loss function is employed to mitigate the imbalance in model attention across grades caused by uneven sample distribution. Experimental validation on the IDRID and APTOS 2019 datasets yields quadratic weighted kappa scores of 90.68% and 90.35%, respectively. The accuracy on the IDRID dataset reaches 80.58%, and the area under the ROC curve on the APTOS 2019 dataset reaches 93.22%. The experimental results demonstrate that the proposed algorithm holds significant potential for application in retinal disease grading.
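The quadratic weighted kappa reported above can be illustrated with a minimal sketch. It assumes integer grades 0..num_classes-1 and uses the standard definition of the metric; the function name and the toy labels are hypothetical and are not taken from the paper.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """Cohen's kappa with quadratic weights for ordinal grades 0..num_classes-1."""
    n = len(y_true)
    # Observed confusion matrix: rows are true grades, columns are predictions.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(num_classes)) for j in range(num_classes)]
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            num += weight * observed[i][j]
            den += weight * row[i] * col[j] / n
    # Note: den is zero if every label falls in a single grade (not handled here).
    return 1.0 - num / den


# Toy example (made-up labels): perfect agreement and chance-level agreement.
perfect = quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4], 5)
chance = quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1], 2)
```

The function returns a value in [-1, 1]; multiplying by 100 gives the percentage form used in the abstract.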
To address the missed and false detections caused by heavy sample overlap and occlusion, difficult key-feature extraction, and strong background noise in X-ray security images, an adaptive panoramic focusing algorithm for contraband detection in X-ray images is proposed. First, a foreground feature awareness module is designed to distinguish contraband from background noise by enhancing the edge structure and texture details of foreground targets, improving the accuracy and completeness of the feature representation. Then, a multi-path two-dimensional information integration module is constructed by combining a multi-branch structure with a dual cross-attention mechanism; it optimizes feature interaction and fusion in the channel and spatial dimensions, strengthens the extraction of key features, and suppresses background interference. Finally, a panoramic dynamic focus detection head is constructed, which dynamically adjusts the receptive field through frequency-adaptive dilated convolutions to accommodate the feature frequency distribution of small contraband targets, thereby enhancing the model's ability to recognize small targets. Trained and tested on the public SIXray and OPIXray datasets, the model reaches mAP@0.5 of 93.3% and 92.5%, respectively, outperforming the other compared algorithms. The experimental results show that the proposed model significantly reduces missed and false detections of contraband in X-ray images while maintaining high accuracy and robustness.
To address the temperature sensitivity of infrared cameras that are assembled at room temperature but defocus at low temperature, and the low accuracy of presetting the camera focal plane at room temperature, a low-temperature defocus test technique for infrared optical systems based on opto-mechanical co-simulation, interferometry, and linear displacement measurement is proposed. The main factors causing low-temperature defocus are analyzed, and the defocus is calculated by opto-mechanical simulation. A low-temperature interferometric test optical path is established by exploiting the sensitivity of the power term to the focus position, and the low-temperature defocus of the infrared optical system is measured in combination with high-precision displacement measurement. The technique is used to analyze and test the defocus of a compact, lightweight infrared camera with a 181 mm aperture from room temperature to low temperature. The deviation between the test results and the simulation is less than half of the system's depth of focus. On this basis, the camera focal plane is preset at room temperature, and the experiment shows that the preset focal plane is accurate, which demonstrates the feasibility and accuracy of the low-temperature defocus measurement method. The test method can be used for room-temperature focal-plane presetting and focusing of compact, lightweight, temperature-sensitive optical cameras.
Disturbance suppression, especially high-frequency disturbance suppression beyond the closed-loop bandwidth, is the core of high-precision stability control for tip-tilt correction systems. Repetitive control performs well in periodic trajectory tracking and disturbance suppression and is applied to the stability control of high-precision systems. This paper analyzes the high-frequency disturbance suppression problem of the tip-tilt correction system and summarizes the performance of high-frequency disturbance suppression based on repetitive control. To solve the problems of natural frequency drift and waterbed amplification in traditional repetitive controllers, a comb-like repetitive controller based on Youla parameterization is designed to suppress high-frequency disturbances beyond the closed-loop bandwidth. Because integer-order repetitive control is effective only at specific frequency points, and in most high-frequency regions the controller fails under disturbance fluctuations and uncertainty, an all-pass fractional-order delay filter is optimized to suppress high-frequency disturbances at any frequency point up to the Nyquist frequency in the tip-tilt correction system. Finally, a parallel repetitive control scheme is designed to suppress hard-to-suppress aperiodic structural vibration, and its robust stability and effectiveness are discussed.
To address the issue of complex backgrounds in dim scenes, which blur object edges and obscure small objects, leading to false and missed detections, an improved YOLOv8-GAIS algorithm is proposed. The FAMFF (four-head adaptive multi-dimensional feature fusion) strategy is designed to achieve spatial filtering of conflicting information. A small object detection head is incorporated to address the large variation in object scale in aerial views. The SEAM (spatially enhanced attention mechanism) is introduced to enhance the network's attention to, and capture of, occluded parts under low illumination. The InnerSIoU loss function is adopted to emphasize the core regions, thereby improving the detection performance for occluded objects. Field scenes are collected to expand the VisDrone2021 dataset, and the Gamma and SAHI (slicing aided hyper inference) algorithms are applied for preprocessing. This helps balance the distribution of different object types in low-illumination scenarios, improving the model's generalization ability and detection accuracy. Comparative experiments show that, compared with the baseline model, the improved model reduces the number of parameters by 1.53 MB and increases mAP50 by 6.9% and mAP50-95 by 5.6%, while model computation increases by 7.2 GFLOPs. In addition, field experiments were conducted on Dagu South Road, Jinnan District, Tianjin City, China, to determine the optimal altitude for image acquisition by UAVs. The results show that, at a flight altitude of 60 m, the model achieves a detection accuracy of 77.8% mAP50.
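The Gamma preprocessing step mentioned above can be sketched as a standard 8-bit gamma correction. This is only an illustration under assumptions: the gamma value of 0.5, the function names, and the toy pixel row are hypothetical, and the abstract does not specify the exact variant used.

```python
def gamma_lut(gamma):
    """Build a 256-entry lookup table for 8-bit gamma correction."""
    return [round(255 * (v / 255) ** gamma) for v in range(256)]


def apply_gamma(image, gamma):
    """Apply gamma correction to a grayscale image given as rows of 0..255 ints."""
    lut = gamma_lut(gamma)
    return [[lut[pixel] for pixel in row] for row in image]


# Assumed gamma of 0.5: values below 1 brighten dark regions.
dark_row = [[0, 16, 64, 255]]
brightened = apply_gamma(dark_row, 0.5)
```

With a gamma below 1, mid-range and dark pixels are lifted while pure black and pure white stay fixed, which is the usual motivation for this step in low-illumination imagery.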
With the rapid development of convolutional neural networks (CNNs) and Transformer models, significant progress has been made in remote sensing image super-resolution (RSSR) reconstruction tasks. However, existing methods have limitations in effectively handling multi-scale object features and fail to fully explore the implicit correlations between channel and spatial dimensions, thus restricting further improvements in reconstruction performance. To address these issues, this paper proposes an adaptive dual-domain attention network (ADAN). The network integrates self-attention information from both channel and spatial domains to enhance feature extraction capabilities. A multi-scale feed-forward network (MSFFN) is designed to capture rich multi-scale features. At the same time, an innovative gated convolutional module is introduced to further enhance the representation of local features. The network adopts a U-shaped backbone structure, enabling efficient multi-level feature fusion. Experimental results on multiple publicly available remote sensing datasets show that the proposed ADAN method significantly outperforms state-of-the-art approaches in terms of quantitative metrics (e.g., PSNR and SSIM) and visual quality. These results validate the effectiveness and superiority of ADAN, providing novel insights and technical approaches for remote sensing image super-resolution reconstruction.
This paper proposes a multi-task attention mechanism-based no-reference quality assessment algorithm for screen content images (MTA-SCI). MTA-SCI first uses a self-attention mechanism to extract global features from screen content images, enhancing the representation of overall image information. It then applies an integrated local attention mechanism to extract local features, focusing on attention-grabbing details within the image. Finally, a dual-channel feature mapping module predicts the quality score of the screen content image. On the SCID and SIQAD datasets, MTA-SCI achieves Spearman's rank-order correlation coefficients (SROCC) of 0.9602 and 0.9233, and Pearson linear correlation coefficients (PLCC) of 0.9609 and 0.9294, respectively. The experimental results show that MTA-SCI achieves high accuracy in predicting screen content image quality.
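The two correlation metrics named above can be computed as in the following minimal sketch, which uses the textbook definitions of PLCC and SROCC (the latter as the Pearson correlation of tie-averaged ranks). The function names and the toy scores are hypothetical; any score mapping applied before PLCC in the paper's evaluation is not reproduced here.

```python
from math import sqrt


def plcc(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def srocc(x, y):
    """Spearman rank-order correlation: PLCC computed on average ranks."""
    return plcc(average_ranks(x), average_ranks(y))


# Toy subjective scores and predictions (made up): monotonic but nonlinear.
subjective = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.0, 4.0, 9.0, 16.0, 25.0]
rank_corr = srocc(subjective, predicted)
linear_corr = plcc(subjective, predicted)
```

In the toy example the rank correlation is 1 because the prediction is monotonic, while the linear correlation is slightly lower, which shows why the two metrics are reported together.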
With the wide application of point clouds in virtual reality, computer vision, robotics, and other fields, the assessment of distortions resulting from point cloud acquisition and processing is becoming an important research topic. Considering that the three-dimensional information of point clouds is sensitive to geometric distortion and that the two-dimensional projection of point clouds contains rich texture and semantic information, a no-reference point cloud quality assessment method based on the fusion of three-dimensional and two-dimensional features is proposed to combine both kinds of feature information and improve the accuracy of point cloud quality assessment. For 3D feature extraction, farthest point sampling is first applied to the point cloud, and non-overlapping point cloud sub-models centered on the selected points are then generated to cover as much of the point cloud model as possible; a multi-scale 3D feature extraction network extracts voxel and point features from these sub-models. For 2D feature extraction, the point cloud is first projected with orthogonal hexahedron projection, and the texture and semantic information are then extracted by a multi-scale 2D feature extraction network. Finally, considering the segmentation and interweaving fusion that occurs when the human visual system processes different types of information, a symmetric cross-modal attention module is designed to integrate the 3D and 2D features. The experimental results on five public point cloud quality assessment datasets show that the Pearson's linear correlation coefficient (PLCC) of the proposed method reaches 0.9203, 0.9463, 0.9125, 0.9164, and 0.9209, respectively, indicating that the proposed method performs competitively against existing representative point cloud quality assessment methods.
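The farthest point sampling step described above can be sketched as follows. This is the generic greedy form of the algorithm, not the paper's implementation; the function name, the fixed starting index, and the toy points are assumptions made for illustration.

```python
from math import dist


def farthest_point_sampling(points, k, start=0):
    """Greedily pick k point indices, each farthest from those already chosen.

    Assumes 1 <= k <= len(points); `start` is the (arbitrary) first index.
    """
    selected = [start]
    # Distance from every point to its nearest selected point so far.
    min_dist = [dist(p, points[start]) for p in points]
    while len(selected) < k:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(points):
            d = dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy point cloud (made up): four points along the x axis.
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
centers = farthest_point_sampling(cloud, 3)
```

The selected indices would serve as the centers of the sub-models; spreading them out is what lets a small number of sub-models cover most of the point cloud.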
To address the limitations of current neck pulse monitoring devices, such as poor portability and complex signal processing, a fiber Bragg grating (FBG) based neck pulse monitoring device was designed. The device monitored two volunteers in three states (resting, exercise, and vigorous exercise) while sitting and lying down for 10 s each. The Fourier transform was applied to process the data, and the frequency error between the neck pulse device, the wristband, and the pulse oximeter was found to be less than 10%. Pearson correlation analysis was conducted on the periods of different states, with the correlation coefficient exceeding 0.9. Random forest was used for predictive analysis, and the results showed good prediction performance. The analysis indicates that the neck pulse monitoring device is capable of effectively monitoring the pulse in the neck region of the human body.
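The Fourier-based frequency comparison described above can be illustrated with a minimal sketch that finds the dominant frequency of a 10 s recording and compares it with a reference reading. The 50 Hz sampling rate, the synthetic 1.2 Hz signal, the reference value, and the function names are all assumptions; the device's actual sampling rate and processing chain are not given in the abstract.

```python
import cmath
import math


def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the largest non-DC bin of a plain DFT."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the DC offset
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


def relative_error(measured, reference):
    """Relative deviation of a measured frequency from a reference device."""
    return abs(measured - reference) / abs(reference)


# Synthetic 10 s pulse at 1.2 Hz (72 beats per minute), assumed 50 Hz sampling.
sample_rate = 50.0
pulse = [math.sin(2 * math.pi * 1.2 * t / sample_rate) for t in range(500)]
estimate = dominant_frequency(pulse, sample_rate)
error = relative_error(estimate, 1.25)  # 1.25 Hz is a made-up reference reading
```

A 10 s window gives a frequency resolution of 0.1 Hz, so short recordings of this kind can only resolve the pulse rate to within about 6 beats per minute.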
To address the challenge of balancing computational accuracy and efficiency in adaptive finite element meshing, this study proposes a GTF-Net model based on an attention fusion mechanism. The model combines a graph attention network with the Transformer architecture, dynamically couples local geometric features with the global physical field through a multi-head cross-attention module, and enhances the representation of singular fields and complex boundaries. Validation on two case studies, waveguide transmission and the Bessel equation, shows that, compared with the traditional Scikit-FEM (skFem) method, GTF-Net improves computational efficiency while reducing the standard deviation of the gradient error by 85.9% and 23.8%, respectively. The results show that the model significantly improves the fit between the mesh distribution and physical field changes through nonlinear feature mapping, providing a novel deep learning solution for adaptive mesh optimization in engineering calculations.