Height-Adaptive Deformable Multi-Modal Fusion for 3D Object Detection
LiDAR-camera fusion has demonstrated remarkable potential in 3D object detection for autonomous vehicles by leveraging complementary information from the two modalities. Recent state-of-the-art approaches primarily rely on projection matrices to achieve cross-modal data alignment. However, these methods often degrade sharply under sensor misalignment or calibration errors, resulting in suboptimal fusion quality and limited robustness. In this paper, we propose a novel framework for 3D object detection, called Height-Adaptive Deformable Multi-Modal Fusion, which leverages deformable attention to enhance the fusion process. Specifically, we introduce a deformable Cross-Modal Spatial Attention module that dynamically fuses image features through learnable offsets, allowing more flexible and precise alignment between the LiDAR and camera modalities.
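To make the offset-based sampling concrete, here is a minimal PyTorch sketch of deformable cross-modal attention. It is an illustration of the general technique, not the paper's exact implementation: the class name, the single-scale image feature map, the number of sampling points, and the offset scale are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableCrossModalAttention(nn.Module):
    """Sketch: each LiDAR query samples image features at learnable
    offsets around its projected 2D reference point, then aggregates
    the samples with predicted attention weights."""

    def __init__(self, dim: int, num_points: int = 4):
        super().__init__()
        self.num_points = num_points
        # Predict a 2D offset and a scalar weight per sampling point.
        self.offset_head = nn.Linear(dim, num_points * 2)
        self.weight_head = nn.Linear(dim, num_points)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, queries, img_feat, ref_points):
        """
        queries:    (B, N, C)    per-point LiDAR features
        img_feat:   (B, C, H, W) camera feature map
        ref_points: (B, N, 2)    projected reference points in [-1, 1]
        """
        B, N, _ = queries.shape
        # Small learnable offsets let each query correct for projection
        # error; the 0.1 scale is an illustrative hyperparameter.
        offsets = self.offset_head(queries).tanh().view(B, N, self.num_points, 2) * 0.1
        weights = self.weight_head(queries).softmax(dim=-1)           # (B, N, P)
        # Sampling locations = reference point + predicted offsets.
        locs = (ref_points.unsqueeze(2) + offsets).clamp(-1, 1)       # (B, N, P, 2)
        # Bilinear sampling of image features at each location.
        sampled = F.grid_sample(img_feat, locs, align_corners=False)  # (B, C, N, P)
        sampled = sampled.permute(0, 2, 3, 1)                         # (B, N, P, C)
        fused = (weights.unsqueeze(-1) * sampled).sum(dim=2)          # (B, N, C)
        return self.out_proj(fused)
```

Because the offsets are learned rather than fixed by the calibration, the sampling locations can drift away from the raw projection, which is what gives this style of fusion its tolerance to miscalibrated extrinsics.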
To further improve fusion quality, we design a Height-Adaptive Aggregation strategy that mitigates the risk of incorrect fusion from background points while emphasizing the aggregation of foreground object features. In addition, we inject projection noise during training to simulate misalignment scenarios, and add an extra supervision loss to handle them. Extensive experiments on the nuScenes benchmark demonstrate the effectiveness and robustness of the proposed framework. Specifically, our method significantly outperforms the LiDAR-only baseline and exhibits less precision degradation under sensor misalignment than other fusion-based approaches.
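One plausible form of such height-adaptive aggregation is a learned gate on point height, sketched below. The MLP gate and the residual fusion rule are our assumptions for illustration; the paper's actual design may differ.

```python
import torch
import torch.nn as nn

class HeightAdaptiveAggregation(nn.Module):
    """Sketch: a small MLP maps each point's height to a gate in (0, 1),
    so near-ground (likely background) points contribute less image
    feature to the fused result, while elevated foreground points are
    emphasized."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(1, dim), nn.ReLU(),
            nn.Linear(dim, 1), nn.Sigmoid(),
        )

    def forward(self, lidar_feat, img_feat, point_z):
        """
        lidar_feat, img_feat: (B, N, C) per-point features per modality
        point_z:              (B, N, 1) point height above ground (m)
        """
        g = self.gate(point_z)           # (B, N, 1) foreground gate
        return lidar_feat + g * img_feat # suppress fusion for background
```

The intuition is that ground-level points are the ones most likely to pick up wrong image features when the projection is off, so down-weighting their image contribution limits the damage from residual misalignment.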
Our results validate the potential of the proposed framework to improve 3D object detection accuracy, particularly in real-world, imperfect sensor environments.