Optoelectronics Letters, Volume 21, Issue 9, 547 (2025)

Point-voxel dual transformer for LiDAR 3D object detection

Jigang TONG, Fanhang YANG, Sen YANG, and Shengzhi DU

In this paper, a two-stage light detection and ranging (LiDAR) three-dimensional (3D) object detection framework, namely the point-voxel dual transformer (PV-DT3D), is presented as a transformer-based method. In the proposed PV-DT3D, point-voxel fusion features are used for proposal refinement. Specifically, keypoints are sampled from the entire point cloud scene and used to encode representative scene features via a proposal-aware voxel set abstraction module. Subsequently, after proposals are generated by the region proposal network (RPN), the encoded keypoints within each proposal are fed into a dual transformer encoder-decoder architecture. For 3D object detection, the proposed PV-DT3D takes advantage of both point-wise and channel-wise transformer architectures to capture contextual information along the spatial and channel dimensions. Experiments on the highly competitive KITTI 3D car detection leaderboard show that PV-DT3D achieves superior detection accuracy among state-of-the-art point-voxel-based methods.
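The dual attention idea described in the abstract can be illustrated with a minimal sketch: point-wise self-attention treats the sampled keypoints as tokens to gather spatial context, while channel-wise attention computes similarities between feature channels. The PyTorch module below is an illustrative assumption, not the authors' implementation; the class name DualTransformerBlock, the parameters d_model and n_heads, and the unprojected channel-attention branch are all assumptions made for clarity.

import torch
import torch.nn as nn


class DualTransformerBlock(nn.Module):
    """Illustrative dual attention: point-wise attention across keypoints,
    followed by channel-wise attention across feature channels."""

    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        # Point-wise branch: keypoints are tokens, channels are features.
        self.point_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 2 * d_model), nn.ReLU(), nn.Linear(2 * d_model, d_model)
        )
        self.norm3 = nn.LayerNorm(d_model)

    @staticmethod
    def channel_attention(x):
        # x: (B, N, C). Similarity between channels gives a (B, C, C) map.
        xt = x.transpose(1, 2)                                     # (B, C, N)
        attn = torch.softmax(xt @ x / x.shape[1] ** 0.5, dim=-1)   # (B, C, C)
        return (attn @ xt).transpose(1, 2)                         # (B, N, C)

    def forward(self, x):
        # x: (B, N, C) keypoint features gathered inside one proposal.
        p, _ = self.point_attn(x, x, x)       # spatial (point-wise) context
        x = self.norm1(x + p)
        c = self.channel_attention(x)         # channel-wise context
        x = self.norm2(x + c)
        return self.norm3(x + self.ffn(x))    # feed-forward refinement


# Example: 2 proposals, 256 sampled keypoints each, 128-dim fused features.
feats = torch.randn(2, 256, 128)
refined = DualTransformerBlock()(feats)
print(refined.shape)                          # torch.Size([2, 256, 128])

A full PV-DT3D refinement head would additionally include the encoder-decoder structure and box regression layers described in the paper; this block only sketches how the two attention directions can be combined over proposal keypoints.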

Paper Information

Category: Image and Information Processing

Received: Jul. 17, 2023

Accepted: Sep. 15, 2025

Published Online: Sep. 15, 2025


DOI:10.1007/s11801-025-3134-9
