Computer Engineering, Vol. 51, Issue 8, 262 (2025)
A Research on Training Method for Diffusion Model Based on Neighborhood Attention
Generative diffusion models learn to generate data by progressively denoising input Gaussian noise into new samples, and are therefore widely applied in image generation. Recent work has shown that the inductive bias provided by the U-Net backbone commonly used in diffusion models is not critical, so a Transformer can be adopted as the backbone network instead, inheriting the latest advances from other domains. However, a Transformer backbone increases model size and slows training. To address the slow training and inadequate image detail of diffusion models with Transformer backbones, this paper proposes a diffusion model based on a neighborhood attention architecture. The model equips the Transformer backbone with neighborhood attention, whose sparse attention pattern exponentially expands the model′s receptive field over the image and attends to global information at lower cost. By progressively enlarging the dilation across successive attention layers, the model captures more visual information during training and generates images with better global structure. Experimental results demonstrate that this design improves global consistency, yields richer global detail in the generated images, and outperforms current State-Of-The-Art (SOTA) models.
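To make the mechanism in the abstract concrete, the following is a minimal, illustrative sketch (not the authors′ implementation) of neighborhood attention on a 1-D sequence: each query attends only to keys within a fixed radius of its own position, and a `dilation` parameter spaces the neighborhood out, which is how stacking layers with growing dilation expands the receptive field exponentially. The function name and shapes are assumptions made for this example.

```python
import numpy as np

def neighborhood_attention_1d(q, k, v, radius=1, dilation=1):
    """Toy 1-D neighborhood attention.

    Each query i attends only to keys at positions
    i + j * dilation for j in [-radius, radius], clipped to the
    sequence bounds. q, k, v have shape (seq_len, dim).
    """
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        # Dilated neighborhood indices around position i.
        idx = [i + j * dilation for j in range(-radius, radius + 1)]
        idx = [j for j in idx if 0 <= j < n]
        # Scaled dot-product scores restricted to the neighborhood.
        scores = q[i] @ k[idx].T / np.sqrt(d)
        w = np.exp(scores - scores.max())
        w /= w.sum()                # softmax over the window only
        out[i] = w @ v[idx]
    return out
```

With `dilation=1` and a radius covering the whole sequence, this reduces to ordinary full self-attention; with a small radius, the per-query cost drops from O(n) to O(radius), which is the cost saving the abstract refers to.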
JI Lixia, ZHOU Hongxin, XIAO Shijie, CHEN Yunfeng, ZHANG Han. A Research on Training Method for Diffusion Model Based on Neighborhood Attention[J]. Computer Engineering, 2025, 51(8): 262
Received: Nov. 8, 2023
Accepted: Aug. 26, 2025
Published Online: Aug. 26, 2025
The Author Email: ZHANG Han (zhang_han@gs.zzu.edu.cn)