Computer Engineering, Vol. 51, Issue 8, 262 (2025)
A Research on Training Method for Diffusion Model Based on Neighborhood Attention
Generative diffusion models learn to generate data by progressively denoising input Gaussian noise into new samples, and are therefore widely applied in image generation. Recent work has shown that the inductive bias provided by the U-Net backbone commonly used in diffusion models is not critical, so a Transformer can be adopted as the backbone network instead, inheriting the latest advances from other domains. However, a Transformer backbone increases model size and slows training. To address the slow training and inadequate image detail of diffusion models with Transformer backbones, this paper proposes a diffusion model based on a neighborhood attention architecture. The model equips the Transformer backbone with neighborhood attention, whose sparse attention pattern exponentially expands the model′s receptive field over the image and attends to global information at lower cost. By progressively enlarging the dilation across successive attention layers, the model captures more visual information during training and generates images with better global structure. Experimental results demonstrate that this design improves global consistency, yields richer global detail in the generated images, and outperforms current State-Of-The-Art (SOTA) models.
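To make the mechanism in the abstract concrete, the following is a minimal, illustrative sketch (not the authors′ implementation) of neighborhood attention on a 1-D sequence: each query attends only to keys within a fixed radius of its own position, and a `dilation` parameter spaces the neighborhood out, which is how stacking layers with growing dilation expands the receptive field exponentially. The function name and shapes are assumptions made for this example.

```python
import numpy as np

def neighborhood_attention_1d(q, k, v, radius=1, dilation=1):
    """Toy 1-D neighborhood attention.

    Each query i attends only to keys at positions
    i + j * dilation for j in [-radius, radius], clipped to the
    sequence bounds. q, k, v have shape (seq_len, dim).
    """
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        # Dilated neighborhood indices around position i.
        idx = [i + j * dilation for j in range(-radius, radius + 1)]
        idx = [j for j in idx if 0 <= j < n]
        # Scaled dot-product scores restricted to the neighborhood.
        scores = q[i] @ k[idx].T / np.sqrt(d)
        w = np.exp(scores - scores.max())
        w /= w.sum()                # softmax over the window only
        out[i] = w @ v[idx]
    return out
```

With `dilation=1` and a radius covering the whole sequence, this reduces to ordinary full self-attention; with a small radius, the per-query cost drops from O(n) to O(radius), which is the cost saving the abstract refers to.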
JI Lixia, ZHOU Hongxin, XIAO Shijie, CHEN Yunfeng, ZHANG Han. A Research on Training Method for Diffusion Model Based on Neighborhood Attention[J]. Computer Engineering, 2025, 51(8): 262
Received: Nov. 8, 2023
Accepted: Aug. 26, 2025
Published Online: Aug. 26, 2025
The Author Email: ZHANG Han (zhang_han@gs.zzu.edu.cn)