Computer Engineering
Co-Editors-in-Chief
2025
Volume: 51 Issue 8
36 Article(s)

  • ZHAO Kai, HU Yuhuan, YAN Junqiao, BI Xuehua, and ZHANG Linlin

    Blockchain, as a distributed and trusted database, has gained significant attention in academic and industrial circles for its effective application in the domain of digital copyright protection. Traditional digital copyright protection technologies suffer from issues such as difficulties in tracking infringements, complexities in copyright transactions, and inadequate protection of legitimate rights, which severely hamper the development of digital copyright protection efforts. The immutability, traceability, and decentralization inherent in blockchain technology provide a highly reliable, transparent, and secure solution that mitigates the risks associated with digital copyright infringement. This overview starts with an introduction to the fundamental principles of blockchain technology. It then discusses the latest research findings on integrating blockchain with traditional copyright protection technologies to address the problems inherent in traditional copyright protection schemes. Further, it evaluates the practical applications and potential of blockchain, emphasizing its positive impact on the copyright protection ecosystem. Finally, this overview delves into the challenges and future trends of blockchain-based copyright protection, ultimately aiming to establish a more robust and sustainable blockchain copyright protection system.

    Aug. 26, 2025
  • Vol. 51 Issue 8 1 (2025)
  • Mayilamu Musideke, GAO Yuxin, ZHANG Situo, FENG Ke, Abudukelimu Abulizi, and Halidanmu Abudukelimu

    With the rapid advancement of general artificial intelligence technology, the application of foundational models across various fields has gained increasing attention. In image segmentation, the Segment Anything Model (SAM), as a foundational model, demonstrates notable advantages in enhancing image comprehension and processing efficiency. While SAM achieves state-of-the-art performance in image segmentation, further optimization in power consumption, computational efficiency, and cross-domain adaptability is required. This review provides an in-depth exploration of the potential improvements to SAM across several crucial dimensions, such as enhancing speed and computational efficiency, improving model accuracy and robustness, increasing adaptability and generalization, optimizing prompt engineering, and boosting data utilization and transfer learning capabilities. With these enhancements, SAM is expected to sustain high efficiency in highly complex tasks and better meet the requirements of various fields and application contexts. In addition, this review summarizes the practical applications of SAM in various fields, including medical imaging, remote sensing, and the mechanical industry, and demonstrates the suitability and challenges of the model in different scenarios. Moreover, this review provides a detailed overview of commonly used datasets and evaluation metrics in the field of image segmentation. Through experimental comparative analyses, the impact of Vision Transformer (ViT) variants on the performance of SAM is assessed, along with performance evaluations of enhanced models, such as EfficientSAM, EfficientViT-SAM, MobileSAM, and RobustSAM. The challenges faced by SAM and its improved models in real-world applications are also discussed, and future research directions are proposed. This review aims to provide researchers with a comprehensive understanding of the advancements and applications of SAM and its variants, offering insights that may inform the development of new models.

    Aug. 26, 2025
  • Vol. 51 Issue 8 16 (2025)
  • WANG Qun, LI Fujuan, and MA Zhuo

    Autonomous Systems (ASes) that constitute the Border Gateway Protocol (BGP) have different interests and route policies. When actual route announcements exceed expected boundaries, route leakages can occur, leading to network security incidents caused by route redirection. In the propagation of BGP route information, ASes unconditionally trust and accept the routes declared by neighboring ASes. Additionally, each AS independently configures its own local policies and keeps this information secret, which complicates the verification of these route policies. This has been a persistent and unresolved challenge in the field of BGP security. Blockchain technology, with its inherent characteristics of decentralization, traceability, immutability, and transparency, offers a promising infrastructure for digital resource authentication and trust among ASes, potentially serving as a key technology for addressing the threat of route leakages. This study first clearly defines the relationships between neighboring ASes, as well as the relationship between the GR (Gao-Rexford) model and BGP route policies, elucidating the root causes of route leakages and the challenges in their prevention. Additionally, it reviews research on traditional solutions to route leakages, focusing on their strengths, weaknesses, and unresolved issues. Subsequently, it presents the advantages and technical approaches of using blockchain technology to defend against BGP route leakages and explores the principles and application characteristics of typical solutions. Finally, it discusses the existing challenges and outlines future research directions.

    Aug. 26, 2025
  • Vol. 51 Issue 8 39 (2025)
  • GAO Jia, and XU Yun

    With advancements in sequencing technology, human genome analysis has shifted from individual analysis to population analysis. To better represent the genetic variation information among different samples within a population, the pan-genome graph model has replaced the traditional linear multi-sequence reference genome model, and sequence-to-graph alignment has become a key issue in biological sequence analyses. Existing alignment algorithms employ seed-and-extend strategies. However, owing to the numerous paths formed by graph combinations, the localization and verification phases become time-consuming, necessitating further optimization and improvement of single-seed selection methods. To address this issue, this paper proposes a sequence alignment algorithm based on combined minimizer seeds. In the localization phase, the algorithm enhances the coverage range of a single seed through the combined hashing of minimizer seeds. Simultaneously, seeds are located through both sequence and relative position information, which significantly reduces the number of false-positive matching positions, thus lowering the workload of the subsequent filtering and verification processes. Experimental results demonstrate that the proposed algorithm reduces candidate positions by approximately 80%, improves time performance by a factor of one to three, and achieves index memory usage and alignment precision comparable to mainstream alignment algorithms.
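    As a rough illustration of the seeding idea (not the paper's implementation), the sketch below extracts (w, k)-minimizers and hashes adjacent minimizers together with their relative offset, so that one combined seed covers a wider region and matches fewer spurious positions; all names and parameter values are illustrative assumptions.

        # Sketch of (w, k)-minimizer extraction and combined seeding (Python).
        def minimizers(seq: str, k: int = 15, w: int = 10):
            """Yield (position, kmer) seeds: in every window of w consecutive
            k-mers, keep the lexicographically smallest one."""
            seeds = set()
            for i in range(len(seq) - k - w + 2):
                window = [(seq[j:j + k], j) for j in range(i, i + w)]
                kmer, pos = min(window)        # smallest k-mer in the window
                seeds.add((pos, kmer))
            return sorted(seeds)

        def combined_seeds(seq: str, k: int = 15, w: int = 10):
            """Hash two adjacent minimizers plus their relative offset, so a
            seed carries both sequence and relative position information."""
            ms = minimizers(seq, k, w)
            return [(p1, hash((m1, m2, p2 - p1)))   # illustrative hash only
                    for (p1, m1), (p2, m2) in zip(ms, ms[1:])]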

    Aug. 26, 2025
  • Vol. 51 Issue 8 53 (2025)
  • LIU Genhao, ZHANG Neng, and ZHENG Zibin

    Application Programming Interface (API) usage constraints are the conditions or restrictions that developers must follow when invoking APIs to ensure correct usage and prevent misuse. API documentation is an important source for extracting these constraints. Existing Natural Language Processing (NLP)-based methods for extracting API usage constraints often rely on syntactic patterns, but they struggle to handle complex coordinated sentences and impose strict requirements on syntactic structures. To address these issues, this paper proposes an API usage constraint knowledge extraction method based on a Large Language Model (LLM), referred to as AUCK. AUCK first preprocesses Java API documentation and extracts sentences containing API usage constraints. It then summarizes the syntactic patterns of coordinated sentences and designs corresponding cases to guide an LLM to decompose coordinated sentences into simple sentences. Finally, it summarizes the syntactic patterns of triplets and designs cases to guide the LLM in extracting API usage constraint triplets. Experimental results on Java API documentation show that AUCK achieves an accuracy of 92.23% and a recall of 93.14%, significantly outperforming existing methods, including DRONE (accuracy: 80.61%, recall: 86.81%), the mainstream triplet extraction tool OpenIE (accuracy: 76.92%, recall: 52.63%), and the large language model ChatGPT-3.5 (accuracy: 82.23%, recall: 67.71%). In addition, the application of AUCK to Android and Python API documentation verifies its good transferability.

    Aug. 26, 2025
  • Vol. 51 Issue 8 74 (2025)
  • LIU Ye, LIU Xixiang, and XU Hao

    In response to the limited robustness caused by the narrow field-of-view of traditional cameras and the constraints imposed by image distortion in existing panoramic Simultaneous Localization and Mapping (SLAM) algorithms, this study proposes a novel panoramic visual SLAM technology based on spherical mapping. By constructing a spherical grid using Goldberg polyhedra, the mapping relationship between pixels in panoramic two-dimensional images and spherical pixels is established, enabling feature extraction and matching on the spherical grid. Pose estimation is constrained by the epipolar equation, which facilitates the determination of three-dimensional point coordinates. Moreover, the Jacobian matrix of the optimization variables is derived to realize panoramic SLAM. This method fully exploits the geometric characteristics of panoramic cameras, effectively extracts information from panoramic images, and mitigates the effects of distortion. Experimental results demonstrate that the proposed algorithm enhances the capability of extracting information from panoramic image features, increases the quantity and accuracy of feature matches, and ensures the SLAM algorithm's trajectory accuracy. Compared to the existing OpenVSLAM algorithm, the proposed method achieves higher localization precision and stability.
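    For orientation, a minimal sketch of the panorama-to-sphere mapping that underlies such spherical processing follows: an equirectangular pixel maps to a unit ray, and two matched spherical rays p1 and p2 then satisfy the epipolar constraint p2.T @ E @ p1 = 0 for the essential matrix E. The Goldberg-polyhedron grid itself is not reproduced; names are illustrative.

        import numpy as np

        def pixel_to_sphere(u, v, width, height):
            """Map an equirectangular pixel (u, v) to a unit-norm ray."""
            lon = (u / width) * 2.0 * np.pi - np.pi    # longitude in [-pi, pi)
            lat = np.pi / 2.0 - (v / height) * np.pi   # latitude in [-pi/2, pi/2]
            return np.array([np.cos(lat) * np.cos(lon),
                             np.cos(lat) * np.sin(lon),
                             np.sin(lat)])

        # The center pixel of a 2048x1024 panorama looks along the +x axis.
        print(pixel_to_sphere(1024, 512, 2048, 1024))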

    Aug. 26, 2025
  • Vol. 51 Issue 8 86 (2025)
  • XIAO Yilong, DENG Yiqin, and CHEN Zhigang

    This study proposes a novel acceleration method for the Neural Radiance Field (NeRF) in dynamic 3D human reconstruction to address the challenges of low training efficiency and high computational complexity in volume rendering. To improve the ability of the NeRF to represent detailed local features, multiresolution hash encoding is used as positional encoding, which increases the NeRF's convergence speed. In addition, a shallow network is designed to estimate the volume density of the NeRF. An opacity loss function is proposed to optimize the network using the human alpha map output obtained by PP-Matting. The proposed density estimation network is used to compute the transmittance distribution along the camera rays during volume rendering. The importance sampling strategy for volume rendering is then implemented by inversely sampling the transmittance distribution, which reduces the number of unnecessary sampling points and improves the computational efficiency of volume rendering. Furthermore, precise human foreground masks are generated by binarizing human alpha maps, which enhances the quality of the reconstructed datasets. Extensive experiments demonstrate that the combination of multiresolution hash encoding and the importance sampling strategy improves the reconstruction speed on the ZJU-MoCap and SHTU-MoCap datasets by 17.7%, 9.5%, and 37.5% compared to Neural Body, HumanNeRF, and MonoHuman, respectively, while also achieving higher reconstruction accuracy. The use of binarized PP-Matting increases the accuracy of human masks to over 96%.
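    A hedged sketch of the importance-sampling step, with the shallow density network replaced by a given array of densities: the transmittance-based weights along a ray are accumulated into a CDF that is then inverted, concentrating samples where the rendering weight is large. Function names and shapes are assumptions, not the paper's code.

        import numpy as np

        def importance_sample(t_coarse, sigma, n_samples):
            """t_coarse: (N,) sorted ray depths; sigma: (N,) volume densities.
            Returns n_samples depths drawn where w(t) = T(t) * alpha(t) is large."""
            delta = np.diff(t_coarse, append=t_coarse[-1])   # segment lengths
            alpha = 1.0 - np.exp(-sigma * delta)             # per-segment opacity
            T = np.cumprod(np.concatenate(([1.0], 1.0 - alpha)))[:-1]
            w = T * alpha                                    # rendering weights
            cdf = np.cumsum(w) / (w.sum() + 1e-8)            # normalized CDF
            idx = np.searchsorted(cdf, np.random.rand(n_samples))
            return np.sort(t_coarse[np.clip(idx, 0, len(t_coarse) - 1)])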

    Aug. 26, 2025
  • Vol. 51 Issue 8 95 (2025)
  • WU Donghui, WANG Jinfeng, QIU Sen, and LIU Guozhi

    Sign language recognition has received widespread attention in recent years. However, existing sign language recognition models face challenges such as long training times and high computational costs. To address this issue, this study proposes a hybrid deep learning model, EWBiLSTM-ATT, which integrates an attention mechanism with an Expanded Wide-kernel Deep Convolutional Neural Network (EWDCNN) and a Bidirectional Long Short-Term Memory (BiLSTM) network, based on data obtained from a wearable data glove. First, widening the first convolutional layer reduces the model parameter count and enhances computational speed, while deepening the EWDCNN convolutional layers improves the model's ability to automatically extract sign language features. Second, BiLSTM is introduced as a temporal model to capture the dynamic temporal information of sign language sequential data, effectively handling temporal relationships in the sensor data. Finally, the attention mechanism learns a parameter matrix that assigns different weights to the hidden states of the BiLSTM and maps them to a weighted sum, allowing the model to automatically select key time segments related to gesture actions by calculating the attention weights for each time step. This study uses the STM32F103 as the main control module and builds a data glove sign language acquisition platform with MPU6050 and Flex Sensor 4.5 sensors as the core components. Sixteen dynamic sign language actions are selected to construct the GR-Dataset for model training. Under the same experimental conditions, compared to the CLT-net, CNN-GRU, CLA-net, and CNN-GRU-ATT models, the EWBiLSTM-ATT model achieves a recognition rate of 99.40%, an improvement of 10.36, 8.41, 3.87, and 3.05 percentage points, respectively. Further, its total training time is reduced to 57%, 61%, 55%, and 56% of those of the comparison models, respectively.
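    The attention pooling described above follows a common pattern: a learned scoring layer weights each BiLSTM hidden state and the weighted sum feeds the classifier. A minimal PyTorch sketch is given below; the 12 glove channels and 16 sign classes are illustrative assumptions, not the paper's exact configuration.

        import torch
        import torch.nn as nn

        class BiLSTMAttention(nn.Module):
            def __init__(self, feat_dim=12, hidden=64, n_classes=16):
                super().__init__()
                self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                                      bidirectional=True)
                self.score = nn.Linear(2 * hidden, 1, bias=False)  # learnable weights
                self.fc = nn.Linear(2 * hidden, n_classes)

            def forward(self, x):                 # x: (batch, time, feat_dim)
                h, _ = self.bilstm(x)             # h: (batch, time, 2*hidden)
                a = torch.softmax(self.score(h), dim=1)  # weight per time step
                return self.fc((a * h).sum(dim=1))       # weighted sum over time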

    Aug. 26, 2025
  • Vol. 51 Issue 8 107 (2025)
  • ZHANG Zhaoli, LI Jiahao, LIU Hai, SHI Fobo, and HE Jiawen

    Traditional Knowledge Tracing (KT) models struggle to model changes in learners' knowledge states over long interaction sequences. Attention-based models, represented by the Transformer, can capture latent information in learners' long interaction sequences and exhibit good performance. However, when modeling the learning process, existing models often ignore differences in learners' abilities, focus mainly on the accumulation of knowledge mastery states, and fail to fully model learners' forgetting behavior. In this study, a Knowledge Tracing Method based on Personalized Forgetting Modeling (PFKT) is proposed, which models learners' answering ability by introducing additional characteristic information and further explores learners' differentiated memory and forgetting abilities. Specifically, the method starts from learners' historical interaction sequences and comprehensively considers the acquisition and forgetting of knowledge points to capture learners' real knowledge mastery states. Simultaneously, combined with the additional characteristic information, personalized forgetting is modeled more accurately. Experimental results demonstrate that the proposed PFKT model achieves better performance than existing models on the ASSISTments2017 and Algebra 2005-2006 datasets.

    Aug. 26, 2025
  • Vol. 51 Issue 8 120 (2025)
  • XIA Niming, and ZHANG Jie

    Deep Neural Networks (DNNs) are extremely vulnerable to adversarial examples, where subtle perturbations to legitimate inputs may cause a model to yield erroneous outputs. Exploring adversarial attacks can improve the robustness of deep learning models and advance the interpretability of DNNs. Existing methods for generating Chinese adversarial examples typically employ simple transformation strategies, emphasizing isolated Chinese linguistic features without considering the contextual effect of attacks. Hence, a heuristic algorithm known as BSCA is proposed in this study. By comprehensively analyzing linguistic variations and incorporating prior knowledge of Chinese character formation, pronunciation, and form, a strategy for accurately assessing Chinese character deviations is designed. The adversarial search space is constructed based on this deviation strategy, and an improved beam search algorithm is used to optimize the generation of Chinese adversarial examples in black-box attacks. Under strict constraints on perturbation and semantic deviation, BSCA can automatically adapt to different scenario requirements. Experimental evaluations conducted on TextCNN, TextRNN, and Bidirectional Encoder Representations from Transformers (BERT) for two Natural Language Processing (NLP) tasks indicate that BSCA can reduce classification accuracy by at least 63.84 percentage points while incurring lower attack costs than baseline methods.

    Aug. 26, 2025
  • Vol. 51 Issue 8 131 (2025)
  • Lin Hechuan, Xu Huiying, Zhu Xinzhong, Huang Xiao, and Liu Ziyang

    With the continuous progress of information technology, increasingly diverse and complex ways of describing things accurately have led to the emergence of multi-view data. Clustering multi-view data is a fundamental and important topic in data mining, machine learning, pattern recognition, and other fields. In this era of information explosion, data dimensionality keeps growing, and efficiently clustering such data remains a huge challenge. To address the limited ability of current k-means methods to handle high-dimensional multi-view data, this paper proposes a new multi-view clustering framework called the Self-weighted Multi-view K-means algorithm (SwMKM). First, by adopting the least absolute deviation criterion to improve robustness, the method successfully reduces the effect of outliers on clustering results. Then, the iteratively reweighted least squares method is used to solve the least absolute residual problem, and the view weights are adjusted adaptively to achieve reweighting control. Finally, by introducing a projection matrix with an L2,1-norm penalty term, the high-dimensional feature space of the original dataset is transformed into a statistically uncorrelated low-dimensional subspace for feature selection and noise suppression. Experimental results show that the performance of SwMKM on the Handwritten Numerals, MSRCv1, Outdoor Scene, and other datasets is significantly better than that of other multi-view k-means methods, demonstrating the superiority of the algorithm.
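    To make the robustness mechanism concrete, the sketch below applies the generic iteratively reweighted least squares treatment to a least-absolute-deviation cluster center (a Weiszfeld-style update): each point is reweighted by the inverse of its residual, so outliers are progressively down-weighted. SwMKM's view weighting and projection steps are omitted; this is an illustration of the principle only.

        import numpy as np

        def irls_l1_center(X, iters=20, eps=1e-6):
            """X: (n, d) points of one cluster; returns an L1-robust center."""
            c = X.mean(axis=0)                      # ordinary L2 start
            for _ in range(iters):
                r = np.linalg.norm(X - c, axis=1)   # residual per point
                w = 1.0 / np.maximum(r, eps)        # outliers get small weight
                c = (w[:, None] * X).sum(axis=0) / w.sum()
            return c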

    May. 10, 2024
  • Vol. 51 Issue 8 141 (2025)
  • FENG Yali, WEN Wen, HAO Zhifeng, and CAI Ruichu

    Sequential recommendation achieves personalized, dynamic recommendations by modeling the sequential behavior of users. However, in the real world, user behavior data often exhibit high sparsity, while the transition relations between items in a behavior sequence change with item characteristics. Therefore, capturing the collaborative relations between users and items while also capturing the transition patterns between items becomes a crucial problem in sequential recommendation. To address this problem, this paper proposes a collective matrix factorization method that incorporates transition relation regularization. The method jointly decomposes the user-item interaction matrix and the Markov transition matrix between items, sharing item representation factors during decomposition to capture both collaborative and transition relationships. This alleviates the sparsity problem in user behavior data, thereby achieving effective sequential recommendation. Experimental comparison and analysis on five real-world datasets covering POIs, e-commerce user behavior sequences, and movie and music ratings demonstrate that the proposed method outperforms existing state-of-the-art sequential recommendation algorithms.
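    A minimal stochastic gradient sketch of the joint factorization idea follows, assuming dense NumPy matrices and illustrative symbols (U, V, W, learning rate, regularizer): the item factor V is shared between the interaction objective and the transition objective, which is the coupling the abstract describes.

        import numpy as np

        def collective_mf(R, M, k=16, lr=0.01, reg=0.05, alpha=0.5, epochs=50):
            """R: (users, items) interaction matrix; M: (items, items) Markov
            transition matrix; returns factors with V shared across both."""
            rng = np.random.default_rng(0)
            U = rng.normal(scale=0.1, size=(R.shape[0], k))  # user factors
            V = rng.normal(scale=0.1, size=(R.shape[1], k))  # shared item factors
            W = rng.normal(scale=0.1, size=(R.shape[1], k))  # next-item factors
            for _ in range(epochs):
                for u, i in zip(*R.nonzero()):               # collaborative term
                    e = R[u, i] - U[u] @ V[i]
                    U[u] += lr * (e * V[i] - reg * U[u])
                    V[i] += lr * (e * U[u] - reg * V[i])
                for i, j in zip(*M.nonzero()):               # transition term
                    e = alpha * (M[i, j] - V[i] @ W[j])
                    V[i] += lr * (e * W[j] - reg * V[i])
                    W[j] += lr * (e * V[i] - reg * W[j])
            return U, V, W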

    Jul. 29, 2024
  • Vol. 51 Issue 8 151 (2025)
  • He Zhilei, Gao Shengxiang, Zhu Enchang, and Yu Zhengtao

    Cross-language summarization (CLS) aims to summarize the core content of a text in a source language (e.g., Burmese) using a text in a target language (e.g., Chinese). CLS is essentially a joint task of machine translation and monolingual summarization, requiring the model to possess both capabilities. In low-resource language scenarios such as Vietnamese and Burmese, cross-language summarization training data are scarce; moreover, Chinese, Burmese, and Vietnamese belong to different language families with large linguistic differences, so current cross-language summarization methods generalize poorly. On this basis, taking Burmese-Chinese and Vietnamese-Chinese as research objects, a cross-language summarization method enhanced by language relationships is proposed. The method first converts the input sequence into continuous word pairs and then calculates the relationships between these continuous word pairs in the source and target languages. Finally, a joint training method of machine translation and monolingual summarization is introduced to effectively capture the relationship between the target and source languages, improving the model's generalization and its ability to process continuous text. Extensive experiments were conducted on self-built datasets. Compared with other baseline models, the proposed method improves the ROUGE-1, ROUGE-2, and ROUGE-L evaluation indicators by 5%, 1%, and 4%, respectively.

    Jun. 18, 2024
  • Vol. 51 Issue 8 160 (2025)
  • SUN Rongneng, LIU Lin, and KANG Yuanzhao

    Long non-coding RNAs (lncRNAs) play important roles in many cellular life processes, and the subcellular localization of lncRNAs provides key information for their functional identification. In response to the complex procedures, difficulty of replication, and high cost of identifying the subcellular localization of lncRNAs through traditional biochemical experiments, an attentional Bi-directional Long Short-Term Memory (BiLSTM) and prototype network approach for predicting lncRNA subcellular localization, termed BP-lncLoc, is proposed. First, K-mer initial features are obtained from the original sequence data and balanced. Second, attentional BiLSTM is combined to effectively extract the deep implicit features of lncRNA sequences and to optimize the neural network against the gradient vanishing problem that may occur with high-dimensional data. Third, a prototype network prediction framework that does not rely on large-scale training samples is constructed to suit the small-sample nature of lncRNA subcellular localization data. Finally, because existing computational models lack interpretability, that is, it is not known how a model makes decisions based on the input data, which is becoming increasingly important with the rapid development of artificial intelligence and machine learning, this paper achieves the interpretability of the predictive model by quantifying the importance of input features to output decisions. Compared with state-of-the-art methods, BP-lncLoc achieves the best result of 98.89% accuracy on a public dataset, providing a new idea for lncRNA subcellular localization prediction applications.
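    For reference, the K-mer feature step can be sketched as follows; k = 3 and the RNA alphabet ACGU are illustrative assumptions rather than the paper's exact settings.

        from itertools import product

        def kmer_features(seq: str, k: int = 3):
            """Encode an lncRNA sequence as normalized frequencies of all
            4^k nucleotide k-mers (a 64-dimensional vector for k = 3)."""
            kmers = [''.join(p) for p in product('ACGU', repeat=k)]
            index = {km: i for i, km in enumerate(kmers)}
            counts = [0] * len(kmers)
            for i in range(len(seq) - k + 1):
                j = index.get(seq[i:i + k].upper())
                if j is not None:            # skip k-mers with N or other symbols
                    counts[j] += 1
            total = max(sum(counts), 1)
            return [c / total for c in counts]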

    Jul. 24, 2024
  • Vol. 51 Issue 8 168 (2025)
  • MA Manfu, CHEN Jiahao, LI Yong, and ZHANG Cong

    Conventional graph neural network models have limited processing power when handling large-scale graphs and are unable to represent intricate interactions between nodes. They struggle to effectively extract representative subgraphs from such massive graphs, which lowers their precision in both inference and training. This paper proposes a rumor detection model, the Multi-Feature Fusion Rumor Detection Model (MFLAN), which is built on an improved graph attention network. First, MFLAN uses a feature fusion approach with an attention mechanism, assigning different weights to each feature before performing a weighted sum operation on the original features to produce a fused feature vector. Second, positional encoding is added so that the model can obtain a representation of positional information. Then, a learnable parameter matrix is introduced, which allows the model to automatically learn and optimize parameter values during training. Finally, attention scores are sparsified, with certain irrelevant nodes in the large-scale graph receiving zero attention, resulting in the MFLAN model's attention sparsity. The experimental results show that the MFLAN model achieves accuracy rates of 97.71% on Ma-Weibo and 97.10% on Weibo23, improvements of 1.07% and 1.12%, respectively, over the Dir-GNN model. Furthermore, the MFLAN model outperforms other rumor detection algorithms across a variety of measures.

    Jul. 22, 2024
  • Vol. 51 Issue 8 181 (2025)
  • WANG Shuai, and SHI Yancui

    The sequence recommendation algorithm dynamically models the user's historical behavior to predict content they may be interested in. This study focuses on the application of contrastive Self-Supervised Learning (SSL) in sequence recommendation, enhancing the model's representation ability in sparse data scenarios by designing effective self-supervised signals. First, a personalized data augmentation method incorporating user preferences is proposed to address the noise introduced by random data augmentation. This method guides the augmentation process based on user ratings and combines different augmentation methods for short and long sequences to generate augmented sequences that align with user preferences. Second, a mixed-augmentation training approach is designed to address imbalanced feature learning during training. In the early stages of training, augmentation sequences are generated using randomly selected methods to enhance model performance and generalization. In the later stages, augmentation sequences with high similarity to the original sequences are selected to enable the model to comprehensively learn the actual preferences and behavior patterns of users. Finally, traditional sequence prediction objectives are combined with SSL objectives to infer user representations. Experimental verification is performed using the Beauty, Toys, and Sports datasets. Compared with the best result in the baseline models, the HR@5 indicator of the proposed method increases by 6.61%, 3.11%, and 3.76%, and the NDCG@5 indicator increases by 11.40%, 3.50%, and 2.16%, respectively, for the aforementioned datasets. These experimental results confirm the rationality and validity of the proposed method.
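    A hedged sketch of the rating-guided augmentation idea: highly rated items are protected from masking, and short sequences receive only a gentle crop. The thresholds, mask token, and operator choices are illustrative assumptions.

        import random

        def augment(seq, ratings, short_thresh=10, mask_prob=0.2):
            """seq: item IDs; ratings: per-item user ratings (same length)."""
            items = list(seq)
            if len(items) <= short_thresh:                # short: gentle crop only
                start = random.randint(0, len(items) // 4)
                return items[start:]
            out = []
            for item, r in zip(items, ratings):           # long: preference-aware mask
                if r >= 4 or random.random() > mask_prob: # protect liked items
                    out.append(item)
                else:
                    out.append(0)                         # 0 = [mask] placeholder
            return out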

    Aug. 26, 2025
  • Vol. 51 Issue 8 190 (2025)
  • WEN Minchu, LIANG Wei, and ZHANG Jialin

    The open nature of wireless media poses a challenge for information security. The Time Division Multiple Access (TDMA) protocol is a predominant protocol tailored for time-sensitive industrial applications. Considering the time-slot scheduling characteristics of TDMA-based wireless sensor networks, this study proposes two types of masquerade attack models: an idle time-slot attack model and a retransmission time-slot attack model. In response to these two attack models, and starting from the inherent transmission features of TDMA wireless sensor networks, namely their periodic transmission pattern and a single time slot as the fundamental transmission unit, a high-precision intrusion detection method based on fine-grained temporal feature extraction is proposed. First, fine-grained temporal features are extracted in the time dimension by leveraging information such as packet reception time and superframe start time to calculate the positional information of the transmission time slot. Subsequently, the positional information is fed into the Isolation Forest (IF), an unsupervised learning model, for training and learning. Finally, a legitimacy assessment is conducted on two data packets received from the same node within one superframe cycle that have identical sequence numbers. The experimental results demonstrate that the two proposed masquerade attacks can evade existing intrusion detection methods and that the proposed intrusion detection approach can effectively detect these two masquerade attacks. Compared to traditional methods, this approach achieves a 14.5% increase in the detection success rate when the packet loss rate is 30%.
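    The detection core can be sketched with scikit-learn's IsolationForest, where the feature is simply the transmission-slot index recovered from packet reception time and superframe start time; slot length, frame size, and the timing values are illustrative assumptions.

        import numpy as np
        from sklearn.ensemble import IsolationForest

        SLOT_MS = 10.0            # assumed slot length

        def slot_features(recv_times_ms, superframe_start_ms, slots_per_frame=100):
            """Map packet reception times to per-frame slot indices."""
            period = SLOT_MS * slots_per_frame
            offset = (np.asarray(recv_times_ms) - superframe_start_ms) % period
            return (offset // SLOT_MS).reshape(-1, 1)

        # Train on traffic known to be legitimate, then score new packets.
        train = slot_features([12.0, 1012.0, 2012.0, 3013.0], 0.0)
        model = IsolationForest(contamination=0.01, random_state=0).fit(train)
        test = slot_features([4012.0, 4551.0], 0.0)   # second packet: unexpected slot
        print(model.predict(test))                    # -1 marks an anomaly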

    Aug. 26, 2025
  • Vol. 51 Issue 8 203 (2025)
  • LI Jiasong, CUI Yunhe, SHEN Guowei, GUO Chun, CHEN Yi, and JIANG Chaohui

    The separation of the control and data planes in Software Defined Network (SDN) enables its widespread application in large-scale network scenarios such as data centers, the Internet of Things (IoT), and cloud networks. However, this decoupled network architecture exposes the network to saturation attacks. Detecting saturation attacks based on Graph Neural Networks (GNNs) is a popular research topic in SDN. Nevertheless, the commonly used k-Nearest Neighbors (k-NN) graph in GNNs overlooks short-term flow features, failing to effectively aggregate node information and preventing the model from fully leveraging the temporal characteristics of flows. To enhance the accuracy of saturation attack detection by utilizing both long- and short-term flow features, this study proposes a saturation attack detection method called HGNM, based on long-short-term flow graphs and a hybrid GNN. This method collects long- and short-term flow features by setting two sampling times. Additionally, this study designs a long-short-term flow graph generation method, named LSGH, based on the gray relational coefficient to construct long-short-term flow graphs, ensuring that the flow graphs encompass all features of the flows. The study also devises a hybrid GNN model, GU-GCN, by paralleling a Gated Recurrent Unit (GRU) and a Graph Convolutional Network (GCN) to capture both the temporal and spatial features of the flows, thereby improving the model's accuracy in detecting saturation attacks. Experimental results demonstrate that, on the generated graphs, the LSGH method outperforms the k-NN and CRAM algorithms in effectively enhancing the detection accuracy of the model. Moreover, compared to the other models, the GU-GCN model exhibits performance improvements in terms of accuracy, precision, recall, and F1-score, as well as in the ROC curve, PR curve, and confusion matrix.
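    As a reference point for the LSGH construction, the standard gray relational coefficient between two normalized flow-feature vectors can be computed as below and averaged into an edge-weight candidate; rho = 0.5 is the customary resolution coefficient, and taking the min/max over a single pair is a simplifying assumption of this sketch.

        import numpy as np

        def gray_relational_coeff(ref, cmp_, rho=0.5):
            """ref, cmp_: (d,) normalized feature vectors; per-feature GRC."""
            diff = np.abs(np.asarray(ref) - np.asarray(cmp_))
            d_min, d_max = diff.min(), diff.max()
            return (d_min + rho * d_max) / (diff + rho * d_max + 1e-12)

        def gray_relational_grade(ref, cmp_):
            return gray_relational_coeff(ref, cmp_).mean()  # candidate edge weight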

    Aug. 26, 2025
  • Vol. 51 Issue 8 215 (2025)
  • WANG Guangming, LI Dongqing, and JIANG Congfeng

    As an important infrastructure of the information age, data centers provide all types of key information services. Currently, data centers face high levels of network attacks and are among their primary targets. To improve network security, this study focuses on an anomaly detection method for data center network traffic, covering feature selection, dataset distribution balancing, and abnormal traffic detection. First, a classification method for imbalanced datasets is proposed, and the classification performance is improved using feature engineering and a mixed sampling algorithm. Second, traffic anomaly detection methods based on Random Forest (RF) and Light Gradient Boosting Machine (LightGBM) are introduced to fully utilize their advantages in processing imbalanced data and their noise resistance. The experiments use the CSE-CIC-IDS2018 public dataset for verification. The results show that the proposed algorithm achieves high precision and recall: among the 15 traffic types, the classification precision of 9 types exceeds 90%, and that of 13 types exceeds 74%. This study is significant for improving data center security, service quality, and network traffic anomaly detection. It not only provides an effective means to address escalating network threats but also contributes positively to the stable operation of data centers and the reliability of information services.

    Aug. 26, 2025
  • Vol. 51 Issue 8 227 (2025)
  • XIAO Ke, LIU Ying, HE Yunhua, XU Gang, and WANG Chao

    With the energy industry's digital transformation, energy blockchains play an important role in the storage and retrieval of energy data. However, energy data are diverse and informative and involve trade secrets and sensitive information of market participants. Thus, it is challenging for large-scale blockchain systems to handle the growing unit storage load while protecting data privacy. In future applications, the storage model and secure retrieval of energy data will be the main issues limiting the development of energy blockchains. Therefore, an on-chain and off-chain secure retrieval scheme based on an energy blockchain is proposed in this paper, which utilizes on-chain and off-chain collaborative storage technology to reduce the storage overhead of on-chain data and a multi-chain collaborative privacy-preserving architecture to achieve interoperability and sharing of different energy data. The scheme designs encrypted lookup tables as the internal data storage structure of the energy blockchain and sets flag bits to achieve the retrieval of energy data from different sources. A one-to-n lookup table is designed as the storage structure of encrypted data in the cloud, which breaks the one-to-one relationship between the query index and the query object in the traditional retrieval process, thereby further protecting data privacy and security and achieving the secure retrieval of energy data information. Experimental results show that the proposed scheme is feasible, reliable, and efficient.

    Aug. 26, 2025
  • Vol. 51 Issue 8 238 (2025)
  • FANG Yonghao, YAO Zhongyuan, LI Min, and SI Xueming

    Distributed power trading is emerging as a future trend in power-energy transactions. Blockchain, by leveraging its technological characteristics, provides a solution to the lack of regulatory mechanisms, high transaction costs, and unclear information rules in distributed power trading. However, as the scale of distributed power trading gradually increases, the throughput of blockchain systems decreases, indirectly limiting the transaction speed of distributed power trading. To address this issue, this paper proposes an efficient and secure blockchain consensus algorithm tailored to distributed power trading. The algorithm is based on the historical transaction characteristics of nodes in the distributed power trading network, using clustering algorithms to organize the consensus network into a dual-layer network structure with multiple consensus sets and employing a dual-layer consensus process to enhance consensus parallelism. Simultaneously, an efficient leader-node election strategy within a single consensus set is designed, allowing the rapid selection of high-performance leaders. Finally, an authentication method combining zero-knowledge proofs and key sharing is introduced to further reduce the likelihood of malicious nodes participating in the consensus. The experimental results show that the proposed consensus algorithm can tolerate enough Byzantine nodes to resist various blockchain attacks such as double-spending attacks, significantly reduces consensus communication overhead and latency, and effectively improves system throughput.

    Aug. 26, 2025
  • Vol. 51 Issue 8 250 (2025)
  • JI Lixia, ZHOU Hongxin, XIAO Shijie, CHEN Yunfeng, and ZHANG Han

    Generative diffusion models can learn to generate data. They progressively denoise and generate new data samples based on input Gaussian noise; therefore, they are widely applied in the field of image generation. Recently, the inductive bias provided by the U-Net backbone used in diffusion models has been revealed to be non-critical, and the Transformer can be adopted as the backbone network to inherit the latest advancements from other domains. However, introducing the Transformer increases the model size and slows the training. To address the issues of slow training and inadequate image detail associated with diffusion models utilizing the Transformer backbone, this paper introduces a diffusion model based on a neighborhood attention architecture. This model incorporates a Transformer backbone network with neighborhood attention, utilizes the sparse global attention pattern of the neighborhood attention mechanism, which exponentially expands the model's perception range of images, and focuses on global information at a lower cost. By employing progressive expansion in the attention expansion layer, more visual information is captured during model training, resulting in images with better global aspects. Experimental results demonstrate that this design provides better global consistency, yields superior global details in the generated images, and outperforms current State-Of-The-Art (SOTA) models.

    Aug. 26, 2025
  • Vol. 51 Issue 8 262 (2025)
  • HAO Hongda, and LUO Jianxu

    Deep learning has been widely applied to medical imaging. A medical image segmentation model based on an attention mechanism is one of the main methods used in current research. For the multi-organ segmentation task, most existing 2D segmentation models mainly focus on the overall segmentation effect of slices, while ignoring the loss or under-segmentation of small-object feature information in slices, which limits the model's segmentation performance. To solve this problem, this study proposes a multi-organ semantic segmentation model, DASC-Net, based on multi-scale feature fusion and an improved attention mechanism. The overall framework of DASC-Net is based on an encoder-decoder architecture. The encoder uses the ResNet 50 network and sets a skip connection with the decoder. The attention mechanism is realized using the parallel structure of a Dual Attention Module (DAM) and a Small Object Capture (SOC) module to perform multi-scale regional feature fusion. DASC-Net not only perceives the feature information of larger objects but also retains the feature information of small objects through attention weight reconstruction, which effectively addresses the limitations of the attention module and further improves the segmentation performance of the model. The experimental results on the CHAOS dataset show that DASC-Net achieves 83.72%, 75.79%, 87.75%, 85.63%, and 77.60% in terms of Sensitivity, Jaccard similarity coefficient, Positive Predictive Value (PPV), Dice similarity coefficient, and mean Intersection over Union (mIoU), respectively; the Dice similarity coefficient and 95% Hausdorff Distance (HD95) values on the Synapse dataset are 82.44% and 21.25 mm, respectively. DASC-Net performs better than the other segmentation networks on both datasets, demonstrating its reliable and accurate segmentation performance.

    Aug. 26, 2025
  • Vol. 51 Issue 8 270 (2025)
  • NI Yuansong, HAN Jun, ZOU Xiaoyan, HU Guangyi, and WANG Wenshuai

    In power systems, the stability and reliability of transmission lines are crucial. Bolts, as key components for connecting and fixing the main body of the lines, play a decisive role in maintaining the stability of a power system. However, during the inspection of transmission lines, detecting defects in these bolts using vision-based methods is particularly difficult because of the small proportion, uneven distribution, and indistinct features of bolts in inspection images. To address these issues, a two-stage adaptive block detection method is designed. In the first stage, an improved target density distribution map generation network is employed to predict a target density distribution map containing the approximate size and distribution information of the targets. This network is composed of RepODconv convolution blocks based on parameter reconstruction and multidimensional dynamic convolution technology, which effectively controls the model's parameter count while enhancing the network's attention to small targets. Subsequently, a clustering block algorithm is designed to obtain fixed-size, unscaled block-area images based on this target density distribution map. In the second stage, the YOLOX model combined with self-attention modules is employed to detect these images, enhancing the network's ability to discriminate between defect categories. Experimental results on a dataset of transmission line bolt inspections by unmanned aerial vehicles show that the recall and precision for majority-class defects reach 70%. Compared to current advanced detection networks, the mean Average Precision at an Intersection over Union (IoU) threshold of 0.5 (mAP@0.5) is improved by approximately 30%, the mean Average Precision (mAP) by approximately 70%, and the mAP for small targets approximately twofold.

    Aug. 26, 2025
  • Vol. 51 Issue 8 281 (2025)
  • MIAO Ru, LI Yi, ZHOU Ke, ZHANG Yanna, CHANG Ranran, and MENG Geng

    The complex backgrounds, diverse target types, and significant scale variations in remote sensing images lead to target omission and false detection. To address these issues, this study proposes an improved Faster R-CNN multi-object detection model. First, the ResNet 50 backbone network is replaced with the Swin Transformer to enhance the model's feature extraction capability. Second, a Balanced Feature Pyramid (BFP) module is introduced to fuse shallow and deep semantic information, further strengthening the feature fusion effect. Finally, in the classification and regression branches, a dynamic weighting mechanism is incorporated to encourage the network to focus more on high-quality candidate boxes during training, thereby improving the precision of target localization and classification. The experimental results on the RSOD dataset show that the proposed model significantly reduces the number of Floating-Point Operations (FLOPs) compared to the Faster R-CNN model. The proposed model achieves a 10.7 percentage point improvement in mAP@0.5:0.95 and a 10.6 percentage point increase in Average Recall (AR). Compared to other mainstream detection models, the proposed model achieves higher accuracy while reducing the false detection rate. These results indicate that the proposed model significantly enhances detection accuracy in remote sensing images with complex backgrounds.

    Aug. 26, 2025
  • Vol. 51 Issue 8 292 (2025)
  • WANG Hao, AI Kecheng, and ZHANG Quanyi

    In weak-texture environments, current monocular visual-inertial Simultaneous Localization and Mapping (SLAM) methods suffer from visual degradation and error drift, leading to decreased accuracy in pose estimation. To address this issue, a monocular visual-inertial SLAM method based on feature collaboration is proposed. Initially, the Inertial Measurement Unit (IMU) data are pre-integrated, and a loosely coupled initialization with visual information is performed to obtain prior information and scale information for the system. Subsequently, a line feature extraction algorithm is introduced to optimize the extracted line features, thereby reducing computational overhead. Based on the positional relationships and geometric characteristics of point and line features, a feature collaborative association algorithm is employed to establish stable association constraints between point and line features, thereby enhancing the reliability of point feature tracking. Finally, a joint cost function optimization method based on multi-source information fusion is introduced to optimize point feature reprojection errors, line feature reprojection errors, and IMU residuals, resulting in improved pose estimation accuracy. Experimental results on the EuRoC and TUM VI public datasets, as well as in real environments, demonstrate that compared to mainstream visual-inertial SLAM methods, the proposed method reduces the average time consumption of line feature detection and tracking by 26.5%. Additionally, the root mean square error of pose estimation is reduced by an average of 38.6% and 43%, respectively. These findings validate that the proposed method achieves superior pose estimation accuracy in weak-texture environments.

    Aug. 26, 2025
  • Vol. 51 Issue 8 305 (2025)
  • TIAN Shuangyan, CHEN Yanli, and ZHOU Yonghui

    With the widespread use of short video sharing platforms and editing software, video copyright protection has become an urgent issue. To address this problem, robust video watermarking methods have been proposed. However, under geometric attacks, their robustness decreases significantly because some watermarks are destroyed; therefore, finding a feature that is stable against geometric attacks is a challenge. This study proposes a robust video watermarking algorithm based on the Non-Subsampled Contourlet Transform (NSCT) and Polar Harmonic Fourier Moments (PHFMs), utilizing the insensitivity of the NSCT to noise and the rotational invariance of PHFMs. Using the proposed Harris-Laplace feature point closed-disk Region Of Interest (ROI) selection method, the ROI of the video keyframes is extracted as the watermark embedding domain. Further, the NSCT is applied to the closed-disk ROI, and the PHFMs of the low-frequency sub-bands of the NSCT domain are calculated to embed the watermarks with quantization index modulation. The experimental results show that the algorithm can effectively resist different types of attacks, such as Joint Photographic Experts Group (JPEG) compression, MPEG-2 compression, salt-and-pepper noise, Gaussian noise, cropping, rotation, and flipping, and that the carrier suffers only small distortion while the watermark's invisibility is ensured.
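    Quantization index modulation itself is standard; a minimal sketch over an array of moment magnitudes follows, with the quantization step as an illustrative assumption: bit 0 snaps a coefficient to the lattice of multiples of step, and bit 1 to that lattice shifted by step/2, so extraction only asks which lattice is nearer.

        import numpy as np

        def qim_embed(coeffs, bits, step=0.05):
            """Quantize each coefficient onto the lattice selected by its bit."""
            c, b = np.asarray(coeffs, dtype=float), np.asarray(bits)
            q = np.round((c - b * step / 2.0) / step)
            return q * step + b * step / 2.0

        def qim_extract(coeffs, step=0.05):
            """Recover bits as the index of the nearer lattice."""
            c = np.asarray(coeffs, dtype=float)
            d0 = np.abs(c - np.round(c / step) * step)
            d1 = np.abs(c - (np.round((c - step / 2) / step) * step + step / 2))
            return (d1 < d0).astype(int)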

    Aug. 26, 2025
  • Vol. 51 Issue 8 317 (2025)
  • CHEN Xiaolei, and WANG Rong

    This paper proposes a point cloud completion network based on multi-branch multi-scale feature fusion, because existing networks cannot simultaneously extract high-quality global and local features of point clouds and lose point cloud detail and coordinate information. The novelty of this network lies in its hierarchical progressive feature extraction and fusion mechanism. In the encoding stage, the proposed network first uses the Joint Feature Extraction Module (JFEM) to perform multi-scale feature learning on the input point cloud data at three different resolutions and successively extracts global features containing rich semantic information and fine local features to maximize the retention of key information. Subsequently, the Detail-Preserving Pooling (DP-Pool) module is used to reduce the dimensionality of the features and avoid the loss of detail caused by traditional pooling operations. The multi-branch encoding structure is combined to achieve efficient fusion of global and local features, ensuring that features of different scales complement each other. In the decoding stage, the network gradually restores the geometric structure of the point cloud via the Point Cloud Reconstruction (PCR) module, uses the multi-branch decoding structure to finely upsample the features at different levels, and generates a high-fidelity, high-density completed point cloud. Experimental results show that the proposed network outperforms the top 10 advanced point cloud completion networks and can further improve the quality of point cloud completion.

    Aug. 26, 2025
  • Vol. 51 Issue 8 330 (2025)
  • LIU Juan, HU Xuelian, ZHU Meirong, FENG Menglan, and WEN Suyue

    The rapid development of Artificial Intelligence-Generated Content (AIGC) has opened new avenues for the development of virtual teachers. Exploring how humans and computers can co-create virtual teachers in teaching animations supported by AIGC technology, and what effects such animations have in application, can provide theoretical explanations and behavioral evidence for teachers and researchers developing intelligently generated virtual teachers and applying them to promote learning from animations, with the aim of providing references for AIGC-enabled teaching. This study reviews the theoretical basis of and related research on virtual teachers and learning-promoting teaching animations, constructs a human-computer co-creation model of virtual teachers in teaching animations supported by AIGC, and uses an AIGC platform to create teaching animations with realistic, hyper-realistic, and cartoon virtual teachers. Then, with college students as subjects and teaching animations featuring real human teachers as the control group, a single-factor, four-level between-subjects design is adopted to conduct a comparative experiment on the learning effects of the three types of virtual-teacher animations, measured through behavior, subjective reports, and eye tracking. The results show that AIGC technology can improve the efficiency of human-computer co-creation of virtual teachers and provide efficient technical support for perfecting the personality development of virtual teachers. The cartoon virtual teacher achieves the best learning effect owing to its simplification and consistency with the animation's teaching style. Although the study did not observe differences in learning performance between the groups, different virtual teachers can influence the learning effect by affecting learners' motivation, cognitive load, and attention allocation during animation learning.

    Aug. 26, 2025
  • Vol. 51 Issue 8 341 (2025)
  • GAO Ang, WANG Yinshan, YAN Wen, SONG Changcheng, WANG Long, and YAO Erlin

    The HPL-MxP benchmark program is widely used for measuring the computational power of supercomputers in mixed-precision computing. Owing to the program's parallel implementation algorithm, the selection of the matrix Numerical Block (NB) value, that is, the matrix block size, is a tradeoff between matrix multiplication efficiency and load balancing. To solve this problem, this paper presents an optimization study on the Kunpeng 920 system and proposes a multi-level lookahead optimization strategy: small NB values are used for matrix blocking to achieve better load balancing, and the equivalent NB value is enlarged by merging multiple rounds of matrix multiplication updates, achieving both load balancing and high matrix multiplication efficiency. To realize the multi-level lookahead optimization scheme, this study restructures the Panel storage mode, designs a fine-grained computing and communication pipeline, and extends the HPL-MxP source program interface. A mixed single-/double-precision test on the Kunpeng 920 multi-node platform shows that HPL-MxP with multi-level lookahead optimization can effectively resolve the NB value tradeoff without incurring significant additional overhead compared with the single-level lookahead strategy.

    Aug. 26, 2025
  • Vol. 51 Issue 8 354 (2025)
  • LIN Fan, and LI Jianhua

    In the field of Optical Chemical Structure Recognition (OCSR), current deep-learning-based models predominantly utilize Convolutional Neural Networks (CNNs) or Vision Transformers for visual feature extraction and Transformers for sequence decoding. Although these models are effective, they are still limited by their image feature extraction capability and by the accuracy of positional encoding during decoding, which affects recognition performance. In response to these limitations, this study adopts an encoder-decoder architecture composed of a Multi-order gated aggregation Network (MogaNet) and a Transformer, introduces relative positional encoding to the OCSR field, and proposes an optical chemical structure recognition model based on MogaNet. First, during image feature extraction, the model captures multiscale features and reduces feature redundancy using the MogaNet spatial aggregation module, and improves channel-dimension diversity using the MogaNet channel aggregation module. Second, during sequence decoding, a Transformer with relative positional encoding is used as the decoder to accurately capture the relative positional relationships between words. To train and validate this model, a chemical structure dataset containing 400 000 molecular structures is constructed, including both Markush and non-Markush structures. Experimental results demonstrate that the model achieves an accuracy of 92.36%, outperforming other models.

    Aug. 26, 2025
  • Vol. 51 Issue 8 364 (2025)
  • LÜ, HU Lang, LIANG Weinan, LI Guangli, and ZHANG Hongbin

    COVID-19 is an illness caused by a strain of the novel coronavirus. Existing COVID-19 imaging diagnostic models face challenges such as a lack of high-quality samples and insufficient exploration of inter-sample relationships. This paper proposes a novel model called Attention Distillation Contrastive Mutual Learning (ADCML) for COVID-19 diagnosis to address these two issues. First, a progressive data augmentation strategy is constructed, which includes AutoAugment and sample filtering, and the lack of quality samples is proactively addressed by expanding the number of images while ensuring their quality. Second, the ADCML framework is built, which employs attention distillation to motivate two heterogeneous networks to learn from each other the pathological knowledge highlighted by their attention. The implicit contrastive relationships among the diverse samples are then fully mined to improve the discriminative ability of the extracted features. Finally, a new adaptive model-fusion module is designed to fully exploit the complementarity between the heterogeneous networks and complete the COVID-19 image diagnosis. The proposed model is validated on three publicly available datasets, comprising Computed Tomography (CT) and X-ray images, with accuracies of 89.69%, 98.16%, and 98.91%; F1 values of 88.62%, 97.58%, and 98.47%; and Area Under the Curve (AUC) values of 88.95%, 97.77%, and 98.90%, respectively. These results show that the ADCML model outperforms the mainstream baselines and has strong robustness, and that progressive data augmentation, attention distillation, and contrastive mutual learning jointly promote the final classification performance.

    Aug. 26, 2025
  • Vol. 51 Issue 8 373 (2025)
  • WANG Yue, XIE Mangang, and WANG Yaping

    Age of Information (AoI) and Energy Cost (EC) are two significant performance metrics for real-time status updates in Internet of Things systems, characterizing information freshness and energy efficiency, respectively. This study considers a vehicular network system for short-packet transmission, utilizing free-ride codes to achieve synchronized transmission of payload and extra data. The extra data, encoded with random codes, are superimposed on the Low-Density Parity-Check (LDPC)-coded payload data to achieve one-step transmission of heterogeneous data, consuming neither extra transmission power nor extra bandwidth resources. In this scheme, the average AoI and EC expressions are derived for the preemptive and blocking schemes under a truncated automatic repeat-request protocol. Numerical simulations show that free-ride codes can not only reduce the average AoI of extra data but also have a negligible impact on the AoI of payload data, all while requiring no additional EC. Moreover, a comparison with the blocking scheme suggests that the preemptive scheme obtains a smaller average AoI but a larger average EC.
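    For readers new to the metric, the sketch below computes average AoI from generation and delivery instants by integrating the sawtooth age curve (age grows linearly and drops at each delivery to the age of the just-delivered update); the timing values in the example are illustrative.

        def average_aoi(gen, recv, horizon, initial_age=0.0):
            """gen[i]/recv[i]: generation/delivery instants of the i-th
            successfully delivered update, both in increasing order."""
            t, u, area = 0.0, -initial_age, 0.0   # u: gen time of freshest update
            for g, r in zip(gen, recv):
                area += (r - t) * (t - u) + 0.5 * (r - t) ** 2
                t, u = r, max(u, g)
            area += (horizon - t) * (t - u) + 0.5 * (horizon - t) ** 2
            return area / horizon

        # Updates generated at 1 s and 3 s, delivered at 2 s and 4 s: average AoI = 1.5 s.
        print(average_aoi([1.0, 3.0], [2.0, 4.0], horizon=5.0))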

    Aug. 26, 2025
  • Vol. 51 Issue 8 383 (2025)
  • GAO Qingxin, LIU Cong, ZHANG Zaigui, GUO Na, SU Xuan, and ZENG Qingtian

    As a pivotal technology driving organizational digital transformation, Robotic Process Automation (RPA) has garnered significant attention from both the academic and industrial sectors in recent years. However, current deployment strategies suffer from a lack of process analysis, leading to misguided deployment of RPA robots and wasted resources. Furthermore, existing RPA robot deployment methods based on process mining depend overly on domain-specific expertise, limiting their generality. To address these challenges, this study integrates process mining with RPA robots and presents a deployment method for RPA robots based on process mining. The method first mines a global process model from event logs and extracts a Time Petri net containing temporal information. Subsequently, critical process paths are identified using a method designed to recognize key process paths. Finally, an optimization deployment strategy for RPA robots is introduced, which determines the optimal deployment node set under time and cost constraints. The proposed method is implemented using ProM, an open-source process mining tool platform, and is compared with four deployment methods in experiments focused on improving time efficiency. The experimental results indicate that, compared with the other deployment methods, this approach improves time efficiency by 22% to 41% and achieves a deployment accuracy of 1 without relying on domain-specific expert knowledge, validating its generality and accuracy.

    Aug. 26, 2025
  • Vol. 51 Issue 8 396 (2025)
  • YAN Jianhong, LIU Zhiyan, and WANG Zhen

    Vehicle trajectory prediction is a crucial component of autonomous driving systems, and improving its reliability and accuracy greatly enhances the safety of autonomous driving. Considering the influence of traffic conditions on vehicle movement, this study focuses on traffic environmental factors such as neighboring vehicle motion and relative spatial positions. Building on the Long Short-Term Memory (LSTM) network encoder-decoder model, a spatiotemporal attention mechanism is introduced. Temporal-level attention focuses on the historical trajectories of the target and neighboring vehicles, whereas spatial attention focuses on the relative spatial positions of the vehicles. Additionally, to enhance feature extraction and achieve a more comprehensive feature fusion, multi-scale convolutional social pooling is utilized to increase the receptive field and integrate multi-scale features. By combining these two aspects, this study proposes a vehicle trajectory prediction model called MCS-STA-LSTM, which incorporates the LSTM encoder-decoder architecture, multi-scale convolutional social pooling, and a spatiotemporal attention mechanism. This model learns the interdependencies of vehicle movements to obtain multi-modal prediction distributions of future trajectories for a target vehicle based on maneuver categories. The model is trained, validated, and tested on the publicly available NGSIM dataset. Several comparative experiments demonstrate that the MCS-STA-LSTM model achieves an average Root Mean Square Error (RMSE) reduction of 9.35% within 3 s and 5.53% within 5 s when compared to other trajectory prediction models. These results indicate an improved trajectory prediction accuracy, highlighting the model's advantage in medium- and short-term predictions.

    Aug. 26, 2025
  • Vol. 51 Issue 8 406 (2025)