Optoelectronics Letters, Volume 18, Issue 5, 313 (2022)
Video-based body geometric aware network for 3D human pose estimation
LI Chaonan, LIU Sheng, YAO Lu, ZOU Siyu. Video-based body geometric aware network for 3D human pose estimation[J]. Optoelectronics Letters, 2022, 18(5): 313
Received: Feb. 3, 2022
Accepted: Mar. 10, 2022
Published Online: Jan. 20, 2023
Author Email: Sheng LIU (edliu@zjut.edu.cn)