Journal of Beijing Normal University, Vol. 61, Issue 3, 307 (2025)
Multimodal recommendation with semantic graph enhancement and adaptive feature completion
[1] QIU R H, WANG S, CHEN Z, et al. CausalRec: causal inference for visual debiasing in visually-aware recommendation[C]//Proceedings of the 29th ACM International Conference on Multimedia. New York: ACM, 2021: 3844
[2] WU L, HE X N, WANG X, et al. A survey on accuracy-oriented neural recommendation: from collaborative filtering to information-rich recommendation[J]. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(5): 4425
[3] ZHOU H Y, ZHOU X, ZENG Z W, et al. A comprehensive survey on multimodal recommender systems: taxonomy, evaluation, and future directions[EB/OL]. (2023-02-09) [2025-03-15]. https://arxiv.org/abs/2302.04473v1
[4] BU J J, TAN S L, CHEN C, et al. Music recommendation by unified hypergraph: combining social media information and music content[C]//Proceedings of the 18th ACM International Conference on Multimedia. Firenze: ACM, 2010: 391
[5] LIU K, XUE F, LI S Y, et al. Multimodal hierarchical graph collaborative filtering for multimedia-based recommendation[J]. IEEE Transactions on Computational Social Systems, 2024, 11(1): 216
[6] SHI C, HU B B, ZHAO W X, et al. Heterogeneous information network embedding for recommendation[J]. IEEE Transactions on Knowledge and Data Engineering, 2019, 31(2): 357
[7] CHEN J Y, ZHANG H W, HE X N, et al. Attentive collaborative filtering: multimedia recommendation with item- and component-level attention[C]//Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. Tokyo: ACM, 2017: 335
[8] JU W, FANG Z, GU Y Y, et al. A comprehensive survey on deep graph representation learning[J]. Neural Networks, 2024, 173: 106207
[9] ZHANG X K, XU B, MA F L, et al. Beyond co-occurrence: multi-modal session-based recommendation[J]. IEEE Transactions on Knowledge and Data Engineering, 2024, 36(4): 1450
[10] ZHU X F, YIN Y B, WANG L. MIMNet: multi-interest meta network with multi-granularity target-guided attention for cross-domain recommendation[J]. Neurocomputing, 2025, 620: 129208
[11] LIN Z H, TIAN C X, HOU Y P, et al. Improving graph collaborative filtering with neighborhood-enriched contrastive learning[C]//Proceedings of the ACM Web Conference 2022. Lyon: ACM, 2022: 2320
[12] CHEN X, CHEN H X, XU H T, et al. Personalized fashion recommendation with visual explanations based on multimodal attention network: towards visually explainable recommendation[C]//Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. Paris: ACM, 2019: 765
[13] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2015-04-10) [2025-03-15]. https://arxiv.org/abs/1409.1556v6
[14] LIU F, CHENG Z Y, SUN C C, et al. User diverse preference modeling by multimodal attentive metric learning[C]//Proceedings of the 27th ACM International Conference on Multimedia. Nice: ACM, 2019: 1526
[15] LIU F, CHEN H L, CHENG Z Y, et al. Disentangled multimodal representation learning for recommendation[J]. IEEE Transactions on Multimedia, 2022, 25: 7149
[16] ONG R K, KHONG A W H. Spectrum-based modality representation fusion graph convolutional network for multimodal recommendation[C]//Proceedings of the 18th ACM International Conference on Web Search and Data Mining. Hannover: ACM, 2025: 9
[19] TAO Z L, LIU X H, XIA Y W, et al. Self-supervised learning for multimedia recommendation[J]. IEEE Transactions on Multimedia, 2022, 25: 5107
[20] WEI Z H, WANG K, LI F X, et al. M3KGR: a momentum contrastive multi-modal knowledge graph learning framework for recommendation[J]. Information Sciences, 2024, 676: 120812
[21] ZHOU X, ZHOU H Y, LIU Y, et al. Bootstrap latent representations for multi-modal recommendation[EB/OL]. (2023-05-01) [2025-03-15]. https://arxiv.org/abs/2207.05969v3
[22] ZHANG J H, ZHU Y Q, LIU Q, et al. Latent structure mining with contrastive modality fusion for multimedia recommendation[J]. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(9): 9154
[23] GUO F P, WANG Z F, WANG X P, et al. Dual-view multi-modal contrastive learning for graph-based recommender systems[J]. Computers and Electrical Engineering, 2024, 116: 109213
[24] WANG M H, LIN Y J, LIN G L, et al. M2GRL: a multi-task multi-view graph representation learning framework for web-scale recommender systems[C]//Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. San Diego: ACM, 2020: 2349
[25] HJELM R D, FEDOROV A, LAVOIE-MARCHILDON S, et al. Learning deep representations by mutual information estimation and maximization[EB/OL]. (2019-02-22) [2025-03-15]. https://arxiv.org/abs/1808.06670v5
[26] KEMERTAS M, PISHDAD L, DERPANIS K G, et al. RankMI: a mutual information maximizing ranking loss[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle: IEEE, 2020: 14362
[27] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[EB/OL]. (2023-08-02) [2025-03-15]. https://arxiv.org/abs/1706.03762
[28] WANG Y N, WU J M, HOASHI K. Multi-attention fusion network for video-based emotion recognition[C]//Proceedings of the 2019 International Conference on Multimodal Interaction. Suzhou: ACM, 2019: 595
[29] SUN J N, ZHANG Y X, GUO W, et al. Neighbor interaction aware graph convolution networks for recommendation[C]//Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. Xi'an: ACM, 2020: 639
[30] RENDLE S, FREUDENTHALER C, GANTNER Z, et al. BPR: Bayesian personalized ranking from implicit feedback[EB/OL]. (2012-05-09) [2025-03-15]. https://arxiv.org/abs/1205.2618
[31] KOREN Y, BELL R, VOLINSKY C. Matrix factorization techniques for recommender systems[J]. Computer, 2009, 42(8): 30
CHAOMU Rilige, HE Mingxin, MA Liyan. Multimodal recommendation with semantic graph enhancement and adaptive feature completion[J]. Journal of Beijing Normal University, 2025, 61(3): 307
Received: Apr. 9, 2025
Accepted: Aug. 21, 2025
Published Online: Aug. 21, 2025
Corresponding author email: MA Liyan (liyanma@shu.edu.cn)