Laser & Optoelectronics Progress, Vol. 62, Issue 16, 1615001 (2025)
Super-Resolution Reconstruction and Denoising Tasks for Public Safety Scene Images Using the EnSwinIR Model
Super-resolution reconstruction and denoising models facilitate the processing of low-resolution and noisy images in public safety scenarios. Although the SwinIR model has made notable progress in preserving image details and modeling features in complex scenes, it remains limited in gradient optimization stability, local feature extraction, and computational cost. To address these challenges, this study proposes EnSwinIR, an image super-resolution reconstruction model based on a hybrid residual attention module. For training, a perceptual loss function, NormRMSE, is designed to reduce the sensitivity of the original mean square error to the absolute magnitude of pixel values; by applying normalization and a square root, the function improves training stability and learning efficiency. In the local feature extraction module, a four-directional shift convolution is introduced, with up, down, left, and right shifts. This approach reconstructs feature channels through displacement operations, thereby capturing multidirectional contextual information. In addition, a skip connection combined with a residual module mitigates the vanishing gradient problem often encountered in deep feature extraction. A grouped multi-scale self-attention method is incorporated in the later stages: input features are split evenly into groups along the channel dimension, and multi-scale sliding windows are applied to strike a balance between performance, parameter count, and computational complexity. The experimental results indicate that EnSwinIR significantly outperforms existing approaches in both performance metrics and visual quality on super-resolution reconstruction tasks. For 2× and 4× super-resolution reconstruction across multisource test scenarios, the model achieves average increases in peak signal-to-noise ratio of 1.9 dB and 2.9 dB, respectively.
Furthermore, the average structural similarity index improves by 0.029 and 0.050, respectively. The model is also less complex: the number of parameters decreases by 27.52% and 27.26% for the 2× and 4× tasks, respectively, and the number of floating-point operations (FLOPs) decreases by 30.72% and 35.65%, respectively. For denoising at noise levels of 15, 25, and 50, the model also shows notable improvements across multisource test scenarios, with the average peak signal-to-noise ratio increasing by 3.77, 3.24, and 3.81 dB, respectively. Thus, images processed by EnSwinIR exhibit a more realistic visual appearance and better preservation of local details, demonstrating the model's potential for application in public safety scenarios.
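As an illustration of two ideas mentioned in the abstract, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that the four-directional shift splits the feature channels into four equal groups and displaces each group by one pixel (up, down, left, right) with zero fill, and it uses the standard definition of peak signal-to-noise ratio (PSNR). The function names `four_direction_shift` and `psnr`, the `step` and `max_val` parameters, and the choice of zero fill are assumptions made for this example only.

```python
import numpy as np


def four_direction_shift(x, step=1):
    """Shift four channel groups of a (C, H, W) feature map up/down/left/right.

    Assumption for illustration: channels are split into four equal groups,
    vacated pixels are zero-filled, and leftover channels (C % 4) pass through.
    Requires 1 <= step < min(H, W).
    """
    c = x.shape[0]
    g = c // 4  # channels per direction group
    out = np.zeros_like(x)
    # Group 0: content moves up (toward row 0).
    out[0 * g:1 * g, :-step, :] = x[0 * g:1 * g, step:, :]
    # Group 1: content moves down.
    out[1 * g:2 * g, step:, :] = x[1 * g:2 * g, :-step, :]
    # Group 2: content moves left (toward column 0).
    out[2 * g:3 * g, :, :-step] = x[2 * g:3 * g, :, step:]
    # Group 3: content moves right.
    out[3 * g:4 * g, :, step:] = x[3 * g:4 * g, :, :-step]
    # Any remaining channels are copied unchanged.
    out[4 * g:] = x[4 * g:]
    return out


def psnr(reference, test, max_val=255.0):
    """Standard PSNR in dB between two images of the same shape."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)


if __name__ == "__main__":
    feats = np.arange(4 * 3 * 3, dtype=np.float64).reshape(4, 3, 3)
    shifted = four_direction_shift(feats)
    print(shifted[0])  # first channel, shifted up by one row
    print(psnr(np.zeros((4, 4)), np.full((4, 4), 25.5)))
```

In a network, the shifted feature map would typically be followed by a pointwise (1×1) convolution so that each output pixel mixes information from its four neighbours; that step is omitted here. Because PSNR is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error.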
Qixiang Meng, Fanliang Bu, Qiqi Kou. Super-Resolution Reconstruction and Denoising Tasks for Public Safety Scene Images Using the EnSwinIR Model[J]. Laser & Optoelectronics Progress, 2025, 62(16): 1615001
Category: Machine Vision
Received: Dec. 5, 2024
Accepted: Feb. 7, 2025
Published Online: Aug. 18, 2025
Author Email: Fanliang Bu (20051257@ppsuc.edu.cn)
CSTR:32186.14.LOP242377