A Comprehensive Overview of Rotation-Translation Decoupling Optimization for Online Dense Reconstruction

RGB-D dense reconstruction is a core research topic in computer vision, computer graphics, and robotics, with applications in augmented reality, virtual reality, robotic navigation, and 3D mapping. Camera pose estimation, which determines the rotation and translation of the camera relative to the scene, traditionally optimizes these two components as a single coupled system. This coupling causes two problems: rotation and translation updates interfere with each other, and the two quantities are expressed in different physical units. To address these issues, the paper introduces RTDOFusion, a dense reconstruction algorithm based on rotation-translation decoupling optimization.

Introduction

Camera pose estimation is a fundamental step in dense reconstruction, directly influencing the quality of the reconstructed 3D model. Traditional iterative optimization methods, including those based on the Iterative Closest Point (ICP) algorithm and stochastic optimization techniques, typically optimize rotation and translation jointly. While these methods have been widely adopted, they suffer from two key limitations:

  1. Mutual Interference – When the camera motion is purely rotational or purely translational, errors in one component leak into the estimate of the other, leading to suboptimal results.
  2. Dimensional Mismatch – Rotation and translation are expressed in different physical units (angles versus lengths), making it difficult to balance their contributions during optimization.

RTDOFusion addresses these challenges by decoupling the optimization of rotation and translation into separate subspaces. This approach allows each component to be optimized independently, reducing interference and improving accuracy.

Core Methodology

The proposed algorithm operates in an iterative framework, where each iteration involves:

  1. Search Neighborhood Definition – For the current estimates of rotation and translation, the algorithm defines a search neighborhood within their respective subspaces.
  2. Candidate Solution Sampling – Within these neighborhoods, multiple candidate solutions for rotation and translation are sampled.
  3. Evaluation and Selection – Each candidate solution is evaluated based on surface alignment quality, and the best-performing candidates are selected to update the current estimates.

This process repeats until convergence criteria are met.
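
The three steps above can be sketched as a generic sample-evaluate-select loop. This is a minimal illustration, not the paper's implementation: the function names, the 2D quadratic cost, the box-shaped neighborhood, and the stopping threshold are all assumptions chosen to keep the example self-contained.

```python
import random


def iterative_pose_search(estimate, cost, sample_neighborhood,
                          max_iters=50, target_cost=1e-6):
    """Sample candidates around the estimate, keep the best, and repeat."""
    best_cost = cost(estimate)
    for _ in range(max_iters):
        # Steps 1 and 2: define a neighborhood and sample candidates in it.
        candidates = sample_neighborhood(estimate)
        # Step 3: evaluate every candidate; accept the best one if it improves.
        challenger = min(candidates, key=cost)
        challenger_cost = cost(challenger)
        if challenger_cost < best_cost:
            estimate, best_cost = challenger, challenger_cost
        # Hypothetical convergence criterion: stop once the cost is small.
        if best_cost <= target_cost:
            break
    return estimate, best_cost


# Toy stand-ins (assumptions): a 2D quadratic cost and a box neighborhood.
rng = random.Random(0)
TARGET = (1.0, 1.0)


def toy_cost(p):
    return (p[0] - TARGET[0]) ** 2 + (p[1] - TARGET[1]) ** 2


def toy_neighborhood(p, radius=0.5, count=16):
    return [(p[0] + rng.uniform(-radius, radius),
             p[1] + rng.uniform(-radius, radius)) for _ in range(count)]


start = (3.0, -2.0)
best, best_cost = iterative_pose_search(start, toy_cost, toy_neighborhood)
```

In a real system the cost would be the surface alignment error of a candidate pose rather than a quadratic, and the neighborhood would live in the rotation or translation subspace.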

Rotation-Translation Decoupling

Unlike traditional methods that treat pose estimation as a six-dimensional problem, RTDOFusion splits it into two independent optimizations:

• Rotation Optimization – Performed in a unit quaternion space, where candidate rotations are sampled from a predefined set and evaluated.

• Translation Optimization – Conducted in Euclidean space, with candidates sampled along three axes at fixed intervals.

By independently optimizing each component, the algorithm minimizes error propagation between rotation and translation.
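
A minimal sketch of how candidates might be generated separately in each subspace. The specific candidate sets here (one small rotation about each coordinate axis in both directions, and one step along each axis in both directions) are assumptions made for illustration; the summary does not specify the sets RTDOFusion actually uses.

```python
import math

AXES = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def quat_multiply(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
            w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
            w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
            w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2)


def normalize(q):
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)


def rotation_candidates(q, angle):
    """Current rotation plus small rotations of +/- angle about each axis."""
    candidates = [q]
    half = angle / 2.0
    for axis in AXES:
        for sign in (1.0, -1.0):
            s = sign * math.sin(half)
            delta = (math.cos(half), s * axis[0], s * axis[1], s * axis[2])
            # Re-normalize so every candidate stays a unit quaternion.
            candidates.append(normalize(quat_multiply(delta, q)))
    return candidates


def translation_candidates(t, step):
    """Current translation plus steps of +/- step along each axis."""
    candidates = [t]
    for i in range(3):
        for sign in (1.0, -1.0):
            moved = list(t)
            moved[i] += sign * step
            candidates.append(tuple(moved))
    return candidates


rots = rotation_candidates((1.0, 0.0, 0.0, 0.0), 0.1)   # angle in radians
trans = translation_candidates((0.0, 0.0, 0.0), 0.1)    # step in meters
```

Because the two generators never mix angles and lengths, each search range can be tuned in its own unit.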

Implicit Surface Alignment for Pose Evaluation

To assess the quality of candidate poses, the algorithm employs an implicit surface alignment strategy. The global scene is represented using a Truncated Signed Distance Function (TSDF), which stores at each voxel the signed distance from the voxel center to the nearest surface, truncated to a narrow band around that surface. For a given candidate pose, the algorithm measures the alignment between the current frame’s surface and the global model by comparing their TSDF values. The pose that minimizes the discrepancy is selected as the optimal solution.
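
The scoring idea can be sketched as follows, under simplifying assumptions: the global model is an analytic unit sphere standing in for a voxel grid, only the translation is varied, and the discrepancy is taken as the mean absolute TSDF value at the transformed surface points of the frame (which is zero when the surfaces coincide). The truncation band, point set, and function names are hypothetical.

```python
import math

TRUNCATION = 0.1  # hypothetical truncation band, in meters


def global_tsdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Truncated, normalized signed distance to a sphere (stand-in model)."""
    dist = math.sqrt(sum((pi - ci) ** 2 for pi, ci in zip(p, center)))
    return max(-1.0, min(1.0, (dist - radius) / TRUNCATION))


def alignment_error(frame_points, translation):
    """Mean |TSDF| of the frame's surface points after applying a candidate."""
    total = 0.0
    for p in frame_points:
        moved = tuple(pi + ti for pi, ti in zip(p, translation))
        total += abs(global_tsdf(moved))
    return total / len(frame_points)


# Surface points of the model, observed in a frame that is offset from it.
TRUE_SHIFT = (0.03, -0.02, 0.0)
model_points = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
                (0.0, -1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
frame_points = [tuple(m - s for m, s in zip(p, TRUE_SHIFT))
                for p in model_points]

candidates = [(0.0, 0.0, 0.0), (0.03, 0.0, 0.0), TRUE_SHIFT, (0.0, -0.02, 0.0)]
best = min(candidates, key=lambda t: alignment_error(frame_points, t))
```

A rotation candidate would be scored the same way, by rotating the frame points before the lookup.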

Coarse-to-Fine Hierarchical Search Strategy

To balance efficiency and precision, RTDOFusion employs a multi-level search strategy:

  1. Coarse Search – Early iterations use large search neighborhoods to quickly cover a broad solution space.
  2. Fine Search – Later iterations narrow the search range to refine the solution.

For rotation, this involves adjusting the radius of the sampling sphere in quaternion space. For translation, step sizes along each axis are progressively reduced. This hierarchical approach ensures both rapid convergence and high accuracy.
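
The shrinking search range can be illustrated with a one-dimensional stand-in for a single translation axis. The halving factor, the number of levels, and the number of samples per side are assumptions for the example, not values reported for RTDOFusion.

```python
def coarse_to_fine_schedule(initial, shrink, levels):
    """Search ranges from coarse to fine: initial, initial*shrink, ..."""
    return [initial * shrink ** k for k in range(levels)]


def refine_1d(cost, x0, initial_step=1.0, shrink=0.5, levels=12,
              samples_per_side=2):
    """Grid search around the estimate, shrinking the step at every level."""
    x = x0
    for step in coarse_to_fine_schedule(initial_step, shrink, levels):
        candidates = [x + k * step
                      for k in range(-samples_per_side, samples_per_side + 1)]
        # Keep the candidate with the lowest cost as the new estimate.
        x = min(candidates, key=cost)
    return x


steps = coarse_to_fine_schedule(1.0, 0.5, 4)
estimate = refine_1d(lambda x: (x - 0.3) ** 2, x0=0.0)
```

Early levels cover a wide range with few samples, and later levels spend the same number of samples on a much smaller interval, which is where the precision comes from.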

Experimental Validation

The algorithm was evaluated on multiple datasets, including FastCaMo-Synth and ICL-NUIM, and is reported to track accurately under both fast and slow camera motion.

Ablation Study on Optimization Interference

A key experiment compared RTDOFusion with ICP and ROSEFusion (a stochastic optimization method) by initializing the translation component to its ground truth value and only optimizing rotation. The results showed that traditional methods exhibited significant translation drift due to interference, while RTDOFusion maintained stable translation estimates, confirming the effectiveness of decoupling.

Tracking Accuracy and Reconstruction Quality

On FastCaMo-Synth, RTDOFusion achieved lower absolute trajectory errors (ATE) than competing methods, particularly in scenes with rapid motion. In ICL-NUIM, it performed comparably to state-of-the-art techniques despite relying solely on depth data (without color or loop closure mechanisms).
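
For reference, ATE is commonly reported as the root-mean-square of the per-frame position error between the estimated and ground-truth trajectories. The sketch below assumes both trajectories are already expressed in the same coordinate frame and are matched frame by frame; standard evaluation tools usually perform a rigid alignment first, which is omitted here.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of camera position error over matched frames (no alignment)."""
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and equally long")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(pos_est, pos_gt))
        for pos_est, pos_gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Two-frame toy trajectories; the positions are made up for illustration.
perfect = ate_rmse([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 0)])
drifted = ate_rmse([(0, 0, 0), (3, 4, 0)], [(0, 0, 0), (0, 0, 0)])
```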

Visual comparisons further demonstrated that RTDOFusion produces more consistent reconstructions, with smoother surfaces and fewer alignment errors than ROSEFusion.

Computational Efficiency

Implemented in C++ with CUDA acceleration, RTDOFusion runs at near real-time speeds (approximately 30 frames per second) on consumer-grade hardware, making it practical for interactive applications.

Limitations and Future Work

While effective, RTDOFusion has some limitations:

  1. Dependence on Geometric Features – Performance may degrade in scenes with little geometric structure, such as large planar surfaces, where depth data alone does not sufficiently constrain the pose.
  2. Resolution Constraints – The TSDF representation’s voxel resolution affects pose estimation accuracy, particularly for fine details.

Future improvements could incorporate color information or adaptive voxel sizing to enhance robustness.

Conclusion

RTDOFusion decouples rotation and translation optimization for dense reconstruction. Through independent subspace searches, hierarchical refinement, and implicit surface alignment, the algorithm reduces mutual interference and improves pose estimation accuracy. The reported experiments show lower trajectory error on FastCaMo-Synth, especially under rapid motion, and accuracy comparable to state-of-the-art methods on ICL-NUIM, which makes it a promising approach for real-time 3D reconstruction.

doi.org/10.19734/j.issn.1001-3695.2024.01.0111
