Journal of Beijing Normal University, Volume 61, Issue 3, 285 (2025)

Uncertainty-aware generative network for Chinese scene text editing

GAO Yutong1,2, ZHANG Ying1, LIU Xianggan3,4,*, LIU Yidian5, JIANG Shan5, GUO Ziyi5, and SONG Feifan6
Author Affiliations
  • 1Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance, Ministry of Education, Minzu University of China, Beijing, China
  • 2Key Laboratory of Big Data and Artificial Intelligence in Transportation, Ministry of Education, Beijing Jiaotong University, Beijing, China
  • 3Natural Language Processing and Knowledge Graph Laboratory, Huazhong University of Science and Technology, Wuhan, Hubei, China
  • 4Hainan Lingshui Li'an International Education Innovation Pilot Zone, Lingshui, Hainan, China
  • 5State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
  • 6Research Center for Rural Financial Reform, Changchun, Jilin, China
    References (20)

    [1] CHEN J Y, HUANG Y P, LYU T C, et al. Textdiffuser: diffusion models as text painters[EB/OL]. [2025-03-29]. https://proceedings.neurips.cc/paper_files/paper/2023/file/1df4afb0b4ebf492a41218ce16b6d8df-Paper-Conference.pdf

    [2] GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks[J]. Communications of the ACM, 2020, 63(11): 139

    [3] DHARIWAL P, NICHOL A. Diffusion models beat GANs on image synthesis[EB/OL]. (2021-06-01) [2025-03-29]. https://arxiv.org/abs/2105.05233v4

    [6] LEE J, KIM Y, KIM S, et al. RewriteNet: realistic scene text image generation via editing text in real-world image[EB/OL]. [2025-03-29]. https://www.academia.edu/download/91842572/2107.11041v1.pdf

    [7] GU J X, MENG X J, LU G S, et al. Wukong: a 100 million large-scale Chinese cross-modal pre-training benchmark[EB/OL]. [2025-03-29]. https://arxiv.org/abs/2202.06767v4

    [8] GAL R, ALALUF Y, ATZMON Y, et al. An image is worth one word: personalizing text-to-image generation using textual inversion[EB/OL]. (2022-08-02) [2025-03-29]. https://arxiv.org/abs/2208.01618v1

    [9] JI J B, ZHANG G H, WANG Z W, et al. Improving diffusion models for scene text editing with dual encoders[EB/OL]. (2023-04-12) [2025-03-29]. https://arxiv.org/abs/2304.05568v1

    [10] HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models[EB/OL]. [2025-03-29]. https://proceedings.neurips.cc/paper/2020/file/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf

    [11] ZHANG L M, RAO A Y, AGRAWALA M. Adding conditional control to text-to-image diffusion models[C]//2023 IEEE/CVF International Conference on Computer Vision (ICCV). October 1-6, 2023. Paris: IEEE, 2023: 3813

    [12] ROMBACH R, BLATTMANN A, LORENZ D, et al. High-resolution image synthesis with latent diffusion models[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 18-24, 2022. New Orleans: IEEE, 2022: 10674

    [13] YANG Y K, GUI D N, YUAN Y H, et al. Glyphcontrol: glyph conditional control for visual text generation[EB/OL]. (2023-11-11) [2025-03-29]. https://arxiv.org/abs/2305.18259v2

    [14] BALAJI Y, NAH S J, HUANG X, et al. Ediff-I: text-to-image diffusion models with an ensemble of expert denoisers[EB/OL]. (2023-05-14) [2025-03-29]. https://arxiv.org/pdf/2211.01324

    [15] CHEN H X, XU Z E, GU Z X, et al. Diffute: universal text editing diffusion model[EB/OL]. [2025-03-29]. https://proceedings.neurips.cc/paper_files/paper/2023/file/c7138635035501eb71b0adf6ddc319d6-Paper-Conference.pdf

    [16] ZENG W C, SHU Y, LI Z H, et al. Textctrl: diffusion-based scene text editing with prior guidance control[EB/OL]. (2024-10-14) [2025-03-29]. https://arxiv.org/abs/2410.10133v1

    [17] WANG T, QU X C, LIU T. Textmastero: mastering high-quality scene text editing in diverse languages and styles[EB/OL]. (2024-08-20) [2025-03-29]. https://arxiv.org/abs/2408.10623v1

    [18] CHANG H W, ZHANG H, BARBER J, et al. Muse: text-to-image generation via masked generative transformers[EB/OL]. (2023-01-02) [2025-03-29]. https://arxiv.org/abs/2301.00704v1

    [19] MA J, ZHAO M J, CHEN C, et al. Glyphdraw: seamlessly rendering text with intricate spatial structures in text-to-image generation[EB/OL]. (2023-05-23) [2025-03-29]. https://arxiv.org/abs/2303.17870v2

    [20] RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[M]//Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015. Cham: Springer International Publishing, 2015: 234

    [21] TUO Y X, XIANG W M, HE J Y, et al. Anytext: multilingual visual text generation and editing[EB/OL]. (2024-02-21) [2025-03-29]. https://arxiv.org/abs/2311.03054v5

    [22] SCHUHMANN C, VENCU R, BEAUMONT R, et al. LAION-400M: open dataset of CLIP-filtered 400 million image-text pairs[EB/OL]. (2021-11-03) [2025-03-29]. https://arxiv.org/abs/2111.02114v1

    GAO Yutong, ZHANG Ying, LIU Xianggan, LIU Yidian, JIANG Shan, GUO Ziyi, SONG Feifan. Uncertainty-aware generative network for Chinese scene text editing[J]. Journal of Beijing Normal University, 2025, 61(3): 285

    Paper Information

    Received: Apr. 9, 2025

    Accepted: Aug. 21, 2025

    Published Online: Aug. 21, 2025

    Corresponding author email: LIU Xianggan (liuxg@hust.edu.cn)

    DOI: 10.12202/j.0476-0301.2025056
