Optics and Precision Engineering, Vol. 32, Issue 21, 3244 (2024)
3DRes-ViT knee osteoarthritis classification model based on multimodal fusion
To address the low accuracy of multi-class grading of knee osteoarthritis (KOA) and the insufficient extraction of features from knee joint images, this paper proposes 3DRes-ViT, a network model based on multimodal fusion. First, a 3D convolutional neural network (3D CNN) extracts shallow 3D features separately from two magnetic resonance imaging (MRI) sequences: dual echo steady state (DESS) and turbo spin echo (TSE, also called fast spin echo). The study finds that the information in the two sequences is complementary, so the two sets of features are fused. Second, an Efficient Channel Attention (ECA) module captures the dependencies among the fused feature channels, and its output is fed into Vision Transformer (ViT) encoders; this combines the strengths of the 3D CNN and ViT to aggregate the local and global features of the two modalities efficiently. Finally, the ViT output is fused with X-ray image features extracted by a 2D convolutional neural network (2D CNN) to further improve classification performance. Experimental results on the four-class KOA task show an average classification accuracy of 91.2%, an average precision of 91.6%, an F1 score of 0.914, and a mean absolute error reduced to 8.8%. The proposed model outperforms current mainstream methods and significantly improves multi-class classification accuracy for knee osteoarthritis.
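The channel-attention step in the abstract can be illustrated with a minimal sketch of ECA-style gating, written in plain Python so that the arithmetic is visible. It follows the general ECA recipe (a per-channel global average, a small 1D convolution across neighbouring channels, a sigmoid gate, then channel-wise rescaling). It is not the authors' implementation: the function names, the toy feature values, the kernel weights, and the use of channel concatenation to fuse the DESS and TSE features are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def eca_channel_gates(channel_means, kernel):
    """Return one sigmoid gate per channel.

    A 1D convolution with zero padding slides over the channel means, so each
    gate depends only on a channel and its nearest neighbours.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    pad = k // 2
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    gates = []
    for i in range(len(channel_means)):
        s = sum(kernel[j] * padded[i + j] for j in range(k))
        gates.append(1.0 / (1.0 + math.exp(-s)))  # sigmoid
    return gates


def eca(features, kernel):
    """Rescale each channel by its gate.

    features: list of channels, each a flat list of voxel values.
    kernel:   hypothetical 1D convolution weights (learned in a real model).
    """
    means = [sum(ch) / len(ch) for ch in features]  # global average pooling
    gates = eca_channel_gates(means, kernel)
    return [[g * v for v in ch] for g, ch in zip(gates, features)]


# Toy usage: two channels per MRI sequence, flattened to four voxels each.
dess_feats = [[0.2, 0.4, 0.1, 0.3], [1.0, 0.8, 1.2, 1.0]]  # made-up values
tse_feats = [[0.5, 0.5, 0.6, 0.4], [0.0, 0.1, 0.0, 0.1]]   # made-up values
fused = dess_feats + tse_feats           # assumed fusion: channel concatenation
attended = eca(fused, kernel=[0.3, 1.0, 0.3])  # hypothetical weights
print(len(attended), len(attended[0]))
```

The output keeps the shape of the fused features; only the relative weight of each channel changes. In a full model these reweighted features would then be split into tokens for the ViT encoders.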
Yu SONG, Rui XU, Xiaodong CAI, Xin WANG. 3DRes-ViT knee osteoarthritis classification model based on multimodal fusion[J]. Optics and Precision Engineering, 2024, 32(21): 3244
Received: Jul. 12, 2024
Accepted: --
Published Online: Jan. 24, 2025
Author email: Xin WANG (wangxin315@ccut.edu.cn)