Acta Optica Sinica, Volume. 45, Issue 8, 0815003(2025)
Contour Feature-Enhanced 3D Perception Method for Atypical Semantic Object
In autonomous driving, accurate perception of the environment is critical to the safety and efficiency of self-driving vehicles. One of the most significant challenges is accurate depth estimation for atypical semantic objects, such as damaged vehicles, trees, or debris that may appear unexpectedly on the road. These objects often have unclear semantics and irregular shapes, which makes it difficult for conventional perception systems to detect them and model their 3D occupancy. Traditional 3D perception methods rely on multimodal inputs such as images and point clouds to estimate depth and spatial distribution, but they struggle with the complications that atypical objects introduce, such as sparse depth maps and the need for precise temporal alignment across frames. To address these issues, we propose a novel occupancy network, the semantics contour enhanced-forward and backward projection occupancy (SCE-FBOcc) network, which enhances depth estimation by incorporating contour-aware depth features. By using contour features to clarify the boundaries of atypical semantic objects, we aim to improve occupancy prediction and depth estimation where semantics are unclear, offering a more robust solution for autonomous driving systems.
To address the challenges of perceiving atypical semantic objects and estimating their depth, we propose a novel occupancy model, SCE-FBOcc, which integrates several components that use contour-based features to improve depth prediction. First, the contour-aware depth network (CADN) extracts image contour features, which help the model distinguish object boundaries that are unclear or irregular. These features are critical where traditional depth estimation fails to define boundaries precisely, particularly for atypical objects such as damaged vehicles or road debris. By integrating them into depth estimation, CADN refines the model’s understanding of object contours and enhances depth perception. Next, the contour feature attention module (CFAM) establishes the relationship between the semantic contour features and high-resolution geometric features. This attention-based module dynamically weights the contour information so that the model prioritizes the boundaries and shapes most relevant to accurate 3D occupancy estimation, which is particularly beneficial when irregular shapes or occlusions obscure the full geometry of objects. Finally, the semantic contour assisted learning module (SCALM) models the associations between semantic categories and geometric contour features. SCALM helps the model suppress irrelevant contour features, so that only contours related to the semantic categories are extracted and the important geometric features are easier to recognize. By learning the relationships between semantic information and geometric contours, SCALM keeps the model aligned with the semantic understanding of objects and improves its robustness in complex scenarios involving atypical or occluded objects, across diverse environmental conditions and object types. Together, these components improve occupancy prediction in challenging scenarios where conventional models struggle because of unclear semantics or ambiguous boundaries. By incorporating contour features and specialized modules that emphasize both semantic and geometric information, SCE-FBOcc achieves better depth estimation and 3D occupancy mapping, particularly in complex and uncertain environments.
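The idea of letting geometric features attend to contour features can be illustrated with a minimal sketch. This is not the paper's CFAM: it swaps in plain scaled dot-product attention with no learned projections, whereas the results discussion mentions deformable attention. The function name `contour_attention`, the feature sizes, and the random inputs are all hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def contour_attention(geom, contour):
    """Fuse contour features into geometric features (illustrative only).

    geom:    N geometric feature vectors, used as queries.
    contour: M contour feature vectors, used as keys and values.
    Both share the same channel dimension C.
    """
    scale = math.sqrt(len(geom[0]))
    fused, weights = [], []
    for q in geom:
        # Attention weights of this geometric feature over all contour features.
        w = softmax([dot(q, k) / scale for k in contour])
        # Weighted sum of contour features, one value per channel.
        ctx = [sum(wi * k[c] for wi, k in zip(w, contour)) for c in range(len(q))]
        # Residual connection keeps the original geometric information.
        fused.append([qc + cc for qc, cc in zip(q, ctx)])
        weights.append(w)
    return fused, weights


# Hypothetical toy inputs: 6 geometric and 4 contour features with 8 channels.
random.seed(0)
geom = [[random.gauss(0, 1) for _ in range(8)] for _ in range(6)]
contour = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
fused, weights = contour_attention(geom, contour)
```

Each row of `weights` sums to one, so every geometric feature receives a convex combination of contour features. A real module would add learned query, key, and value projections and operate on image feature maps rather than short lists.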
SCE-FBOcc outperforms the baseline by 1.27 percentage points in average accuracy, with notable gains of 1.98, 2.23, and 1.91 percentage points for obstacles, construction vehicles, and traffic cones, respectively, which highlights its superior handling of atypical semantic objects (Table 1). Qualitative comparisons illustrate the effect of contour features: the baseline model struggles with trees because they resemble the background, whereas SCE-FBOcc effectively separates the foreground from the background and improves depth estimation (Fig. 8). In the construction vehicle case, FBOcc misclassifies the vehicle’s tail, while SCE-FBOcc predicts it correctly, which shows the effectiveness of CFAM and SCALM in enhancing semantic understanding (Fig. 9). Ablation results show that performance drops significantly when only CFAM is used, particularly for artificial structures, which suggests that without SCALM’s geometric supervision, contour information extraction is insufficient. Incorporating SCALM improves performance across all categories (Table 2). Regarding inference latency, FPN+CM DepthNet has an inference latency of 5.66 ms, while CADN has a higher latency of 12.22 ms. Using compute unified device architecture (CUDA) operators to reduce the computational overhead of deformable attention lowers CADN’s latency to 9.12 ms. Overall, the inference time of SCE-FBOcc is only 3.4% higher than that of the baseline model, which demonstrates the efficiency of the proposed model despite its added complexity (Table 3).
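The latency figures quoted above can be put side by side with a short calculation. The three timings come from the text; the variable names are illustrative, and the derived percentages are simple arithmetic rather than values reported by the authors.

```python
# Depth-network latencies quoted in the text, in milliseconds.
fpn_cm_ms = 5.66      # FPN+CM DepthNet
cadn_ms = 12.22       # CADN before CUDA operators
cadn_cuda_ms = 9.12   # CADN with CUDA operators for deformable attention

# Saving obtained from the CUDA operators, absolute and relative to CADN.
saved_ms = cadn_ms - cadn_cuda_ms
saved_pct = 100 * saved_ms / cadn_ms

# Remaining gap between the optimized CADN and FPN+CM DepthNet.
extra_ms = cadn_cuda_ms - fpn_cm_ms

print(f"CUDA operators save {saved_ms:.2f} ms ({saved_pct:.1f}% of CADN latency)")
print(f"Optimized CADN is still {extra_ms:.2f} ms slower than FPN+CM DepthNet")
```

Run as written, this prints a saving of 3.10 ms (25.4% of the CADN latency) and a remaining gap of 3.46 ms.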
We present a 3D perception method that enhances the detection of atypical semantic objects through semantic contour features. The proposed approach improves recognition accuracy for these objects by addressing issues such as foreground occlusion and incomplete semantic information. Experimental results demonstrate the effectiveness of SCALM and CADN, which improve detection accuracy for atypical categories by 0.36% and 1.07%, respectively. Overall, the model achieves a 1.43% increase in accuracy for atypical semantic categories, which validates the importance of contour features in depth estimation and occupancy tasks. The results also show that the capability to extract contour information plays a crucial role in model performance. Future work can focus on improving the semantic understanding of contour extraction, particularly the elimination of irrelevant contours, to further enhance the model’s accuracy and robustness.
Farong Kou, Kan Wang, Yajun Zhao, Tianxiang Yang, Gengyi Lü. Contour Feature-Enhanced 3D Perception Method for Atypical Semantic Object[J]. Acta Optica Sinica, 2025, 45(8): 0815003
Category: Machine Vision
Received: Dec. 20, 2024
Accepted: Feb. 28, 2025
Published Online: Apr. 27, 2025
The Author Email: Farong Kou (koufarong@xust.edu.cn)
CSTR:32393.14.AOS241916