Chinese Journal of Liquid Crystals and Displays, Volume 40, Issue 4, 598 (2025)
Pose estimation network based on attention feature fusion of multimodal data
6D pose estimation that balances accuracy and applicability remains an active and difficult research topic. To address this, a 6D pose estimation network based on attentional feature fusion of multimodal data is proposed. First, a deeper squeeze-and-excitation module is introduced; by adjusting the weight of each channel, it strengthens inter-channel dependencies and expands the receptive field, improving the processing of RGB image features. Second, for multimodal data, an iterative attentional feature fusion module is deployed in the feature fusion stage. Through multiple iterative fusion operations, it resolves the scale inconsistency problem in global feature fusion and captures and integrates multimodal data more accurately, which significantly improves pose regression. Finally, to quantitatively assess the robustness and applicability of the model in complex environments, an invisibility percentage metric is introduced, which measures model performance on partially occluded objects and complex backgrounds. Pose prediction experiments on a public dataset verify that the improved model predicts poses accurately on the validation dataset and is more applicable in complex environments than the DenseFusion model.
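The channel reweighting and iterative fusion described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' network: it shows a standard single-stage squeeze-and-excitation block rather than the deeper variant the paper proposes, and a simplified two-round attention fusion in which the `attention` callable is a hypothetical stand-in for a learned attention module. All function names, weights, and shapes are chosen for the example only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def squeeze(feature_map):
    """Global average pooling: reduce each [H][W] channel to one scalar."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def dense(vec, weights, bias):
    """Fully connected layer; `weights` has shape [out][in]."""
    return [
        sum(w * v for w, v in zip(row, vec)) + b
        for row, b in zip(weights, bias)
    ]


def se_block(feature_map, w1, b1, w2, b2):
    """Standard squeeze-and-excitation over a [C][H][W] feature map.

    Returns the rescaled feature map and the per-channel gates.
    The weights here are illustrative; in a real network they are learned.
    """
    z = squeeze(feature_map)                           # squeeze: [C]
    hidden = [max(0.0, h) for h in dense(z, w1, b1)]   # bottleneck + ReLU
    gates = [sigmoid(g) for g in dense(hidden, w2, b2)]  # excitation: [C]
    scaled = [
        [[gates[c] * v for v in row] for row in channel]
        for c, channel in enumerate(feature_map)
    ]
    return scaled, gates


def iterative_attention_fusion(x, y, attention, rounds=2):
    """Simplified iterative attentional fusion of two feature vectors.

    `attention` is a hypothetical callable mapping a fused vector to
    per-element weights in [0, 1]. Each round recomputes the weights from
    the previous fusion result and re-blends the original inputs.
    """
    fused = [a + b for a, b in zip(x, y)]  # initial integration
    for _ in range(rounds):
        w = attention(fused)
        fused = [wi * a + (1.0 - wi) * b for wi, a, b in zip(w, x, y)]
    return fused


if __name__ == "__main__":
    # Toy 2-channel, 2x2 feature map standing in for RGB features.
    rgb_features = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
    _, gates = se_block(rgb_features, [[1.0, 1.0]], [0.0], [[1.0], [-1.0]], [0.0, 0.0])
    print("channel gates:", gates)

    # Toy per-modality vectors (e.g. color and geometry embeddings).
    color, geometry = [1.0, 0.0], [0.0, 1.0]
    print(iterative_attention_fusion(color, geometry, lambda f: [sigmoid(v) for v in f]))
```

Because each fusion round is a convex combination of the two inputs, every fused element stays between the corresponding elements of `x` and `y`; the iterations change only how the weight is split between the modalities.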
Yuntao ZHAO, Xinhui DENG. Pose estimation network based on attention feature fusion of multimodal data[J]. Chinese Journal of Liquid Crystals and Displays, 2025, 40(4): 598
Received: Jul. 28, 2024
Accepted: --
Published Online: May 21, 2025
The Author Email: Xinhui DENG (2211241803@qq.com)