Two-Stage Depth Correction Algorithm Based on Error Component Modeling for Consumer-Grade RGB-D Cameras

Introduction

Consumer-grade RGB-D cameras have become increasingly popular in various applications, including robotics, augmented reality, and human-computer interaction. However, the depth data obtained from these cameras often suffer from significant noise, low accuracy, and the presence of outliers. While factory calibration parameters may suffice for gaming and virtual reality applications, they fall short of meeting the high-precision localization requirements essential for robotic applications. This paper addresses the challenge of improving depth accuracy in consumer-grade RGB-D cameras, with a particular focus on the Intel® RealSense™ D455 camera used in assistive robotics, such as bathing robots.

Existing depth correction methods primarily target discontinued devices like the Kinect or Kinect v2, leaving a gap in research for modern sensors like the Intel RealSense series. Previous approaches include error modeling based on parameter estimation, calibration using specialized boards, and distortion compensation techniques. However, these methods often have limitations, such as being tailored to specific depth-sensing technologies (e.g., time-of-flight or structured light) or requiring complex setups. This paper proposes a novel two-stage depth correction algorithm that leverages an error component model to address both local and global depth errors efficiently.

Error Component Model

Depth Quality Evaluation Metrics

To assess and improve depth accuracy, three key metrics are used: Z-accuracy, fill rate, and root mean square error (RMSE). Z-accuracy measures the deviation between the measured depth values and ground truth (GT), providing an indication of the average depth error. The fill rate evaluates the percentage of valid depth pixels in an image, while RMSE quantifies spatial noise by calculating the standard deviation of depth values from a best-fit plane. In this study, Z-accuracy and RMSE serve as the primary metrics for evaluating depth correction performance.
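The three metrics can be computed directly from a depth image of a flat wall. The sketch below is a minimal illustration (function name and the additive plane model are our own, not from the paper): Z-accuracy as the mean absolute deviation from the rangefinder GT, fill rate as the fraction of valid (nonzero) pixels, and RMSE as the residual spread around a least-squares plane.

```python
import numpy as np

def depth_quality_metrics(depth, gt_distance):
    """Z-accuracy, fill rate, and plane-fit RMSE for a depth image of a flat wall.

    depth: 2-D array of depth values in meters (0 = invalid pixel).
    gt_distance: ground-truth camera-to-wall distance from the laser rangefinder.
    """
    valid = depth > 0
    fill_rate = valid.mean()                       # fraction of valid pixels
    z_accuracy = np.abs(depth[valid] - gt_distance).mean()

    # Least-squares plane z = a*x + b*y + c over the valid pixels
    ys, xs = np.nonzero(valid)
    A = np.column_stack([xs, ys, np.ones(xs.size)])
    coeffs, *_ = np.linalg.lstsq(A, depth[valid], rcond=None)
    residuals = depth[valid] - A @ coeffs
    rmse = np.sqrt(np.mean(residuals ** 2))        # spatial noise around best-fit plane
    return z_accuracy, fill_rate, rmse
```

Note that Z-accuracy needs external GT, whereas RMSE is self-referential (deviation from the camera's own best-fit plane), which is why the paper uses them to separate global from local errors.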

Modeling Depth Errors

Depth errors in RGB-D cameras can be categorized into two components: global errors and local errors. Global errors represent systematic biases in the average depth measurements, while local errors account for pixel-specific deviations. To model these errors, high-precision laser rangefinders (with 1–2 mm accuracy) were used to establish GT values. The Intel RealSense D455 was positioned parallel to a flat wall at varying distances, and point clouds were captured to analyze depth quality.

Global errors were quantified using Z-accuracy, calculated as the average absolute difference between measured depth values and GT. Local errors were assessed using RMSE, which measures the dispersion of individual depth points around a fitted plane. Experimental results showed that both global and local errors increase nonlinearly with distance, with global errors being consistently larger than local errors at the same distance. This observation led to the development of an error component model that separately addresses these two types of errors.

Two-Stage Depth Correction Algorithm

The proposed algorithm corrects depth errors in two stages: first by addressing local errors and then by refining global errors. The approach leverages the fact that errors grow with distance, employing an iterative calculation strategy that starts from short distances and progressively extends to longer ranges.

Stage 1: Local Correction Function

The first stage focuses on correcting pixel-specific depth deviations. A local correction function is estimated for each pixel (or a subset of pixels, to reduce computational complexity) to adjust depth values such that all points in a flat surface converge to a single plane. The process involves the following steps:

  1. Data Collection: A checkerboard pattern is placed on a flat wall, and RGB images, depth images, and point clouds are captured at multiple distances.
  2. Initial Correction: An initial local correction function is applied to the raw point cloud to reduce local errors.
  3. Plane Fitting: A plane is fitted to the corrected point cloud using RANSAC (Random Sample Consensus), which robustly estimates the plane parameters even in the presence of outliers.
  4. Function Update: The local correction function is iteratively refined by comparing corrected depth values with the fitted plane.
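Steps 3 and 4 can be sketched as follows, assuming a simple additive per-point offset as the local correction (the paper's actual correction-function form and iteration schedule may differ; names here are illustrative):

```python
import numpy as np

def ransac_plane(points, iters=200, thresh=0.005, seed=None):
    """RANSAC fit of a plane z = a*x + b*y + c to an (N, 3) point cloud."""
    rng = np.random.default_rng(seed)
    best_abc, best_inliers = None, 0
    for _ in range(iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        A = np.column_stack([sample[:, 0], sample[:, 1], np.ones(3)])
        try:
            abc = np.linalg.solve(A, sample[:, 2])
        except np.linalg.LinAlgError:
            continue  # degenerate (collinear) sample
        resid = np.abs(points[:, 2] - (points[:, 0] * abc[0]
                                       + points[:, 1] * abc[1] + abc[2]))
        n = int((resid < thresh).sum())
        if n > best_inliers:
            best_inliers, best_abc = n, abc
    # Refine with least squares on the inliers of the best hypothesis
    resid = np.abs(points[:, 2] - (points[:, 0] * best_abc[0]
                                   + points[:, 1] * best_abc[1] + best_abc[2]))
    inl = points[resid < thresh]
    A = np.column_stack([inl[:, 0], inl[:, 1], np.ones(len(inl))])
    best_abc, *_ = np.linalg.lstsq(A, inl[:, 2], rcond=None)
    return best_abc

def update_local_correction(points, offsets, n_iter=3):
    """Iteratively refine per-point depth offsets so corrected points converge
    to the RANSAC-fitted plane (additive offset model, for illustration)."""
    for _ in range(n_iter):
        corrected = points.copy()
        corrected[:, 2] += offsets
        a, b, c = ransac_plane(corrected)
        plane_z = corrected[:, 0] * a + corrected[:, 1] * b + c
        offsets += plane_z - corrected[:, 2]   # pull each point toward the plane
    return offsets
```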

To optimize computational efficiency, pixel discretization is introduced. Instead of computing a correction function for every pixel, a sparse grid of pixels is selected, and the correction functions for intermediate pixels are interpolated from neighboring grid points. This approach significantly reduces the number of functions that need to be estimated while maintaining correction accuracy.
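The interpolation step might look like the following bilinear upsampling of a sparse grid of correction offsets to full resolution (a simplified stand-in: the paper interpolates correction *functions*, not single offsets, and the grid spacing is a free parameter here):

```python
import numpy as np

def interpolate_corrections(grid_offsets, block=8):
    """Bilinearly upsample a sparse (gh, gw) grid of correction offsets,
    spaced `block` pixels apart, to the full image resolution."""
    gh, gw = grid_offsets.shape
    h, w = (gh - 1) * block + 1, (gw - 1) * block + 1
    ys = np.arange(h) / block          # fractional grid coordinates
    xs = np.arange(w) / block
    y0 = np.clip(ys.astype(int), 0, gh - 2)
    x0 = np.clip(xs.astype(int), 0, gw - 2)
    fy = (ys - y0)[:, None]            # fractional part along rows
    fx = (xs - x0)[None, :]            # fractional part along columns
    g = grid_offsets
    top = g[y0][:, x0] * (1 - fx) + g[y0][:, x0 + 1] * fx
    bot = g[y0 + 1][:, x0] * (1 - fx) + g[y0 + 1][:, x0 + 1] * fx
    return top * (1 - fy) + bot * fy
```

With an 8-pixel spacing on a 640×480 image, roughly 81×61 grid functions replace 307,200 per-pixel ones, a ~60× reduction in estimation work.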

Stage 2: Global Correction Function

The second stage corrects systematic biases in the average depth measurements. The global correction function is estimated by aligning the corrected point cloud (from Stage 1) with a reference plane derived from the checkerboard pattern in the RGB image. The process involves:

  1. Coordinate System Alignment: The transformation between the RGB camera and depth sensor coordinate systems is estimated using checkerboard corners detected in both modalities.
  2. Joint Optimization: The global correction function and the coordinate transformation are optimized simultaneously to minimize alignment errors between the corrected point cloud and the reference plane.
  3. Pixel Discretization: Similar to Stage 1, pixel discretization is applied to reduce computational load. Only a few key pixels (e.g., image corners) are used to estimate the global correction function, with other pixels interpolated from these reference points.
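For step 1, once checkerboard corners are available as 3-D points in both coordinate systems, the rigid transformation between them can be recovered in closed form. A common choice (not necessarily the paper's exact estimator) is the Kabsch/Umeyama SVD solution:

```python
import numpy as np

def estimate_rigid_transform(src, dst):
    """Least-squares rigid transform (Kabsch): find R, t with dst ≈ src @ R.T + t.

    src, dst: (N, 3) arrays of corresponding 3-D points (e.g. checkerboard
    corners in the depth and RGB coordinate systems).
    """
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)          # cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T)) # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cd - R @ cs
    return R, t
```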

The global correction function is modeled as a quadratic polynomial, capturing the nonlinear relationship between measured and true depth values.
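Fitting such a quadratic reduces to ordinary polynomial least squares over (measured, GT) depth pairs collected at the calibration distances. A minimal sketch (function names are illustrative; the paper additionally optimizes this jointly with the coordinate transform):

```python
import numpy as np

def fit_global_correction(measured, true_depth):
    """Fit d_true ≈ a*d^2 + b*d + c by least squares; returns [a, b, c]."""
    return np.polyfit(measured, true_depth, deg=2)

def apply_global_correction(coeffs, depth):
    """Map measured depth to corrected depth with the fitted quadratic."""
    return np.polyval(coeffs, depth)
```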

Experimental Results

Data Collection and Setup

Experiments were conducted using the Intel RealSense D455 camera mounted on a setup that ensured parallel alignment with a flat wall. High-precision laser rangefinders provided GT distance measurements. Data were collected at distances ranging from 0.586 m to 1.421 m, covering typical working ranges for robotic applications.

Depth Correction Performance

The algorithm was evaluated using different pixel discretization configurations, ranging from coarse (10×10 grid) to fine (full 640×480 resolution). Key findings include:

  1. Global Error Reduction: After applying the local correction function, global errors (Z-accuracy) were significantly reduced. Further refinement with the global correction function yielded additional improvements.
  2. Local Error Reduction: RMSE values decreased after both correction stages, indicating better consistency in depth measurements.
  3. Effect of Pixel Discretization: Discretization with 8×8 pixel blocks achieved correction performance close to that of the full per-pixel computation while substantially reducing cost, whereas much coarser grids led to noticeable degradation in accuracy.

Application in Assistive Robotics

The algorithm was tested in a bathing robot scenario, where the D455 was used to localize a human model’s back region. Depth correction reduced positioning errors from an average of 18.329 mm to 1.111 mm, demonstrating practical utility in real-world applications.

Comparison with Existing Methods

The proposed method was compared with a state-of-the-art depth correction technique based on checkerboard calibration. While both methods performed similarly at close distances, the proposed algorithm exhibited superior performance at longer ranges, likely due to its iterative error estimation approach.

Conclusion

This paper presented a two-stage depth correction algorithm designed to enhance the accuracy of consumer-grade RGB-D cameras. By modeling depth errors as separate global and local components, the algorithm effectively reduces both systematic biases and pixel-specific deviations. Key contributions include:

  1. Error Component Model: A novel framework for categorizing and addressing depth errors.
  2. Iterative Correction Strategy: A computationally efficient approach that progressively refines depth estimates from short to long distances.
  3. Pixel Discretization: A practical method for reducing computational load without sacrificing correction accuracy.

Experimental results validated the algorithm's effectiveness, showing significant improvements in both controlled wall tests and real-world scenarios. The method is particularly advantageous for robotic applications requiring high precision, such as assistive bathing robots. Future work may explore adaptive correction functions for different camera models and dynamic environments.

DOI: 10.19734/j.issn.1001-3695.2024.06.0189
