Journal of Beijing Normal University, Vol. 61, Issue 3, 285 (2025)
Uncertainty-awared generative network for Chinese scene text editing
GAO Yutong, ZHANG Ying, LIU Xianggan, LIU Yidian, JIANG Shan, GUO Ziyi, SONG Feifan. Uncertainty-awared generative network for Chinese scene text editing[J]. Journal of Beijing Normal University, 2025, 61(3): 285
Received: Apr. 9, 2025
Accepted: Aug. 21, 2025
Published Online: Aug. 21, 2025
Author email: LIU Xianggan (liuxg@hust.edu.cn)