Journal of Beijing Normal University, Vol. 61, Issue 3, 277 (2025)
A code-switching-based approach for low-resource language visual question answering
Citation: LIU Zheng, DONG Jun, JIALE Dongzhu, CHAOMU Rilige, LIU Xuan, WENG Yu. A code-switching-based approach for low-resource language visual question answering[J]. Journal of Beijing Normal University, 2025, 61(3): 277
Received: Apr. 9, 2025
Accepted: Aug. 21, 2025
Published Online: Aug. 21, 2025
Author email: WENG Yu (wengyu@muc.edu.cn)