An Improved Accelerated Deep Image Prior-Based Denoising Model
Introduction
Image denoising is a fundamental task in computer vision, aiming to restore high-quality images from their noisy counterparts. Traditional denoising methods rely on filtering techniques and model optimization, while recent advances leverage deep learning to achieve superior performance. Among these, supervised deep learning models require large datasets of clean-noisy image pairs for training, which introduces data bias and limits generalization. In contrast, unsupervised models, such as the Deep Image Prior (DIP), eliminate the need for paired training data by using the noisy image itself as the learning target. However, DIP suffers from slow convergence due to its iterative training process and lacks an efficient early stopping mechanism.
To address these limitations, this paper introduces an Improved Accelerated Deep Image Prior-based Denoising Model (IADIP). The proposed model enhances DIP by incorporating multiple preprocessed images as network inputs and targets, simplifying the network architecture, and introducing an automatic early stopping strategy. These improvements significantly boost computational efficiency while maintaining or even surpassing the denoising performance of existing methods.
Background and Related Work
Supervised vs. Unsupervised Denoising
Supervised denoising models, such as DnCNN, DRUNet, SwinIR, and Restormer, rely on large datasets of clean-noisy image pairs. While effective, these models suffer from data bias—performance degrades when applied to images with noise characteristics different from the training set.
Unsupervised models, such as Noise2Noise, Noise2Void, and Neighbor2Neighbor, reduce dependency on paired data but still require multiple noisy images for training. The Deep Image Prior (DIP) stands out by using only the noisy image itself, leveraging the network structure as an implicit regularizer. However, DIP’s slow convergence and lack of an adaptive stopping mechanism hinder its practical application.
Challenges in DIP-Based Denoising
- Slow Training Convergence – DIP requires thousands of iterations to converge, making it computationally expensive.
- Fixed Early Stopping – Using a predefined iteration count often leads to suboptimal denoising due to overfitting or underfitting.
- Complex Network Architecture – The default 4-layer U-Net backbone increases computational overhead without always improving performance.
Methodology
Overview of IADIP
IADIP improves DIP in three key aspects:
- Multi-Preprocessed Image Input – Multiple complementary denoised images are used as network inputs and auxiliary targets.
- Simplified Network Architecture – The 4-layer U-Net is reduced to a 1-layer structure, accelerating training.
- Automatic Early Stopping – A pseudo-reference quality metric dynamically determines the optimal stopping point.
Multi-Preprocessed Image Input
Instead of using random noise as input, IADIP employs multiple preprocessed images obtained from existing denoising algorithms (e.g., BM3D, DnCNN, Restormer). These images are concatenated and fed into the network, reducing the mapping difficulty between input and output. Additionally, both the preprocessed images and the original noisy image serve as targets, enhancing the loss function’s guidance.
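As a rough illustration of this input/target construction, the sketch below concatenates several preprocessed images along the channel axis and combines them with the noisy image in the loss. The channel layout, the averaged-MSE form, and the weighting `alpha` are illustrative assumptions, not the paper's exact formulation:

```python
import torch

# Hypothetical outputs from existing denoisers (e.g. BM3D, DnCNN,
# Restormer), each shaped (1, 1, H, W) for a grayscale image.
H, W = 64, 64
preprocessed = [torch.rand(1, 1, H, W) for _ in range(3)]
noisy = torch.rand(1, 1, H, W)

# Concatenate along the channel axis to form the network input,
# replacing DIP's random-noise input.
net_input = torch.cat(preprocessed, dim=1)  # shape (1, 3, H, W)

def multi_target_loss(output, targets, noisy_img, alpha=0.5):
    """Both the preprocessed images and the noisy image serve as targets;
    `alpha` (an assumed hyperparameter) balances the two terms."""
    aux = torch.stack([torch.mean((output - t) ** 2) for t in targets]).mean()
    fidelity = torch.mean((output - noisy_img) ** 2)
    return alpha * aux + (1 - alpha) * fidelity
```

Using high-quality preprocessed inputs this way narrows the gap the network must bridge, which is what enables the architecture simplification described next.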
Simplified Network Architecture
Since the preprocessed images are already of high quality, the mapping task becomes simpler. Thus, the original 4-layer U-Net is replaced with a 1-layer U-Net, significantly reducing computational cost while maintaining denoising performance.
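A single-level U-Net of the kind described might look like the following PyTorch sketch. The channel counts and kernel sizes are assumptions for illustration; only the one-level encoder/decoder structure with a skip connection reflects the text:

```python
import torch
import torch.nn as nn

class OneLevelUNet(nn.Module):
    """Minimal 1-layer U-Net: one encode/downsample stage, one
    upsample/decode stage, joined by a single skip connection."""
    def __init__(self, in_ch=3, base_ch=32, out_ch=1):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch, base_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.down = nn.Conv2d(base_ch, base_ch, 3, stride=2, padding=1)
        self.up = nn.ConvTranspose2d(base_ch, base_ch, 2, stride=2)
        self.dec = nn.Sequential(
            nn.Conv2d(base_ch * 2, base_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(base_ch, out_ch, 3, padding=1))

    def forward(self, x):
        e = self.enc(x)             # encoder features at full resolution
        d = self.up(self.down(e))   # one downsample/upsample round trip
        return self.dec(torch.cat([e, d], dim=1))  # skip connection
```

With only one resolution level, both the parameter count and the per-iteration cost drop sharply relative to a 4-level backbone.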
Automatic Early Stopping via Pseudo-Reference Metric
A major drawback of DIP is the lack of an adaptive stopping criterion. IADIP introduces an Early Stopping Metric (ESM) based on downsampling:
- The network output is downsampled using two different kernels, producing two sub-images.
- The Mean Squared Error (MSE) between these sub-images estimates the residual noise: because noise is largely uncorrelated between neighboring pixels while image content is shared, a lower MSE indicates better denoising.
- Training stops when the ESM stabilizes for 50 consecutive iterations.
This approach avoids the computational overhead of traditional no-reference quality metrics while ensuring optimal denoising performance.
Experimental Results
Efficiency Improvements
• Reduced Training Time – The 1-layer U-Net cuts training time by over 50% compared to the original 4-layer version.
• Faster Early Stopping – ESM executes in milliseconds, making it practical for real-time applications.
Denoising Performance
Experiments on benchmark datasets (Set12, BSD68, Urban100) demonstrate that IADIP outperforms state-of-the-art methods:
• Higher PSNR – IADIP achieves up to 35.64 dB on Set12 (σ=15), surpassing Restormer (33.36 dB) and DCDIP (33.76 dB).
• Better Visual Quality – Fine details (e.g., facial features, textures) are preserved more accurately than in competing methods.
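For reference, PSNR (the metric reported above) can be computed as follows, assuming 8-bit images with a peak value of 255:

```python
import numpy as np

def psnr(clean, denoised, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB between a clean reference
    image and a denoised estimate."""
    diff = clean.astype(np.float64) - denoised.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Higher values indicate closer agreement with the clean reference; gains of 1 to 2 dB, as reported here, are typically visible as sharper textures and fewer residual artifacts.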
Comparison with Other Metrics
ESM matches the effectiveness of traditional no-reference metrics (e.g., BRISQUE, NIQE) but is 1000x faster, making it ideal for iterative denoising.
Conclusion
IADIP addresses the key limitations of DIP by:
- Accelerating convergence through multi-preprocessed inputs and a simplified network.
- Enhancing denoising quality via a hybrid loss function and adaptive early stopping.
- Maintaining generalization without requiring paired training data.
Future work will explore applications in super-resolution and deblurring.
For further details, refer to the original paper: https://doi.org/10.19734/j.issn.1001-3695.2024.05.0225