Journal of Beijing Normal University, Vol. 61, Issue 3, 277 (2025)
A code-switching-based approach for low-resource language visual question answering
Citation: LIU Zheng, DONG Jun, JIALE Dongzhu, CHAOMU Rilige, LIU Xuan, WENG Yu. A code-switching-based approach for low-resource language visual question answering[J]. Journal of Beijing Normal University, 2025, 61(3): 277
Received: Apr. 9, 2025
Accepted: Aug. 21, 2025
Published Online: Aug. 21, 2025
Author email: WENG Yu (wengyu@muc.edu.cn)