Tile Image Inpainting Based on High-Order Texture and Structural Feature Interaction
Introduction
The preservation and restoration of cultural heritage artifacts have gained increasing attention in recent years, particularly for historical architectural components that embody significant artistic and cultural value. Among these components, Chinese eave tiles, known as wadang, serve both functional and decorative purposes in traditional architecture. These tiles protect wooden eaves from weathering while displaying intricate patterns, images, or inscriptions that reflect historical and cultural significance. However, prolonged exposure to natural elements often leads to surface degradation, including cracks, peeling, and missing fragments. Traditional restoration methods rely heavily on manual expertise and case-based knowledge transfer, which are time-consuming and costly.
Digital image inpainting techniques offer a promising alternative by leveraging computational methods to reconstruct missing or damaged regions in images. While conventional inpainting approaches, such as diffusion-based and patch-based methods, have shown effectiveness in simple texture reconstruction, they struggle with complex structural details and large missing areas. Deep learning-based methods have demonstrated superior performance in handling intricate textures and semantic consistency, yet they often overlook the interaction between high-order and low-order features, which is crucial for capturing fine details in wadang images.
This paper introduces a novel generative adversarial network (GAN)-based approach for wadang image inpainting, emphasizing high-order texture and structural feature interaction. The proposed method, termed Recursive Partial Convolutional GAN (RPConv-GAN), enhances the model’s ability to reconstruct both coarse and fine details by integrating recursive partial convolution layers and feature fusion mechanisms. Additionally, a dedicated dataset of wadang images is constructed to validate the method’s effectiveness.
Challenges in Wadang Image Inpainting
Wadang images present unique challenges that distinguish them from natural images or other cultural heritage artifacts. Two primary issues complicate the inpainting process:
- Blurred and Lost Edge Structures: Due to the uniform color distribution in wadang tiles, the boundaries between textures and structural elements are often subtle. Gradient variations at edges are less pronounced, making it difficult for conventional methods to accurately restore missing structural details. For example, in Figure 1(a), the inpainted region maintains overall texture consistency but fails to recover critical edge structures.
- Texture and Structural Disorder: Complex patterns, such as intertwined floral motifs or calligraphic strokes, are prone to incorrect reconstructions in which edges intersect unnaturally or appear blurred. Figure 1(b) illustrates a case where the inpainted result exhibits noticeable structural misalignment and texture irregularities.
These challenges necessitate a method that not only captures high-level semantic features but also preserves fine-grained structural and textural details.
Proposed Method: RPConv-GAN
The RPConv-GAN framework consists of four key components: a generator, recursive partial convolution (RPConv) modules, a feature fusion layer, and a discriminator. The generator follows an encoder-decoder architecture, processing both the damaged wadang image and its edge structure map to separately encode and decode texture and structural features.
Generator Architecture
The generator employs partial convolutional layers (PConv) to minimize the influence of invalid pixels in the missing regions. The encoder uses stacked PConv layers with varying kernel sizes (7×7, 5×5, and 3×3) to progressively aggregate high-level features. The decoder mirrors this structure to reconstruct texture and edge details.
Partial Convolutional Layers (PConv)
PConv selectively processes valid pixels by masking out missing regions during convolution. This prevents color discrepancies and artifacts caused by invalid pixel interference. The mask is dynamically updated as the network fills in missing regions, ensuring that only relevant features contribute to the reconstruction.
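The masking-and-rescaling rule described above can be sketched for a single channel as follows. This is a minimal illustrative implementation, not the paper's code: the kernel, padding scheme, and bias handling are assumptions for clarity.

```python
import numpy as np

def partial_conv2d(x, mask, kernel, bias=0.0):
    """Single-channel partial convolution (valid padding), a simplified
    sketch of the PConv rule: convolve only over valid pixels, rescale by
    the ratio of kernel area to valid-pixel count, and update the mask."""
    kh, kw = kernel.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    new_mask = np.zeros_like(out)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            xw = x[i:i + kh, j:j + kw]
            mw = mask[i:i + kh, j:j + kw]
            valid = mw.sum()
            if valid > 0:
                # rescale so output magnitude is independent of hole size
                out[i, j] = (xw * mw * kernel).sum() * (kh * kw / valid) + bias
                new_mask[i, j] = 1.0  # hole shrinks: window saw a valid pixel
    return out, new_mask
```

Because the mask is re-derived at every layer, holes shrink progressively as features propagate through the stacked 7×7, 5×5, and 3×3 PConv layers.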
Recursive Partial Convolution (RPConv)
To enhance the model’s ability to capture subtle details, RPConv modules are embedded in both the encoder and decoder. These modules perform two critical functions:
- High-Order Feature Extraction: RPConv extracts higher-order statistical features from same-scale feature maps, capturing fine textures and edge variations that standard convolutions might miss.
- Feature Interaction: By recursively combining low-order (global structure) and high-order (local detail) features, RPConv strengthens the model's representation of complex patterns. This interaction ensures that reconstructed edges remain sharp and textures appear natural.
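One plausible reading of the recursive interaction is sketched below. The paper does not specify the exact operator, so the element-wise modulation, the smoothing stand-in, and the number of recursion orders are all hypothetical choices made for illustration.

```python
import numpy as np

def rpconv_interaction(feat, orders=3):
    """Hypothetical sketch of recursive high-order interaction: at each
    step the running high-order map is modulated (element-wise) by a
    smoothed copy of the same-scale input, so the order-k result mixes
    k-th-order statistics of the feature map."""
    def smooth(f):  # stand-in for a 3x3 partial convolution
        p = np.pad(f, 1, mode="edge")
        h, w = f.shape
        return sum(p[i:i + h, j:j + w]
                   for i in range(3) for j in range(3)) / 9.0

    high = feat.copy()
    for _ in range(orders - 1):
        high = high * smooth(feat)  # raise the interaction order by one
    return feat + high              # fuse low-order (identity) and high-order paths
```

The residual-style sum at the end preserves the low-order global structure while the multiplicative path contributes the fine local detail.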
Feature Fusion Layer
After the generator produces preliminary texture and structure reconstructions, a feature fusion layer integrates these outputs. The fusion process involves:
- Channel-wise Concatenation: Texture and structure feature maps are concatenated to preserve their distinct characteristics.
- Cross-Channel Interaction: A 1×1 convolutional layer adaptively combines features across channels, reducing dimensionality while maintaining detail fidelity.
- Non-Linear Enhancement: A sigmoid activation refines the fused features, ensuring smooth transitions between inpainted and original regions.
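The three fusion steps compose into a small pipeline, sketched here with numpy. The weight matrix `w` stands in for the learned 1×1 convolution kernel; its shape and the bias are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fuse_features(tex, struct, w, b=0.0):
    """Sketch of the fusion layer: channel-wise concatenation of texture
    and structure maps, a 1x1 convolution (a per-pixel linear mix across
    channels), then a sigmoid gate.
    tex, struct: (C, H, W); w: (C_out, 2C)."""
    cat = np.concatenate([tex, struct], axis=0)        # (2C, H, W)
    mixed = np.tensordot(w, cat, axes=([1], [0])) + b  # (C_out, H, W)
    return sigmoid(mixed)
```

Because a 1×1 convolution acts independently at each spatial location, it can halve the channel count of the concatenated maps without disturbing spatial detail.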
Loss Functions
The model optimizes three loss functions to ensure realistic and consistent reconstructions:
- Reconstruction Loss (L1): Measures pixel-wise differences between the inpainted and original images.
- Adversarial Loss: Encourages the generator to produce visually plausible results that the discriminator cannot distinguish from genuine wadang images.
- Style Loss: Computes Gram-matrix differences between feature maps to preserve stylistic coherence between inpainted and original regions.
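The reconstruction and style terms can be written down concisely. The Gram-matrix normalization below is a common convention; the paper's exact normalization and loss weights are not given here, so treat them as assumptions.

```python
import numpy as np

def gram_matrix(feat):
    """Gram matrix of a (C, H, W) feature map, normalized by C*H*W."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def style_loss(feat_pred, feat_true):
    """Mean absolute difference between Gram matrices: penalizes
    mismatched channel-correlation statistics (i.e., style)."""
    return np.abs(gram_matrix(feat_pred) - gram_matrix(feat_true)).mean()

def l1_loss(pred, true):
    """Pixel-wise reconstruction loss."""
    return np.abs(pred - true).mean()
```

In practice the total objective is a weighted sum of these terms plus the adversarial loss supplied by the discriminator.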
Experimental Validation
Dataset Construction
A dedicated wadang image dataset was curated, comprising 6,000 high-resolution images (256×256 pixels) categorized into three groups:
- Image-Type Wadang: Depicts intricate scenes, such as animals or mythological figures.
- Pattern-Type Wadang: Features geometric or floral designs.
- Text-Type Wadang: Contains inscriptions or calligraphic elements.
Additionally, 12,000 irregular masks (with 1%–60% hole ratios) were used to simulate varying damage conditions.
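For reproducibility, damage masks with a target hole ratio can be synthesized in many ways. The generator below (random rectangles until the ratio is reached) is purely illustrative; the paper's 12,000 irregular masks were presumably drawn with free-form strokes, which this sketch does not attempt to replicate.

```python
import numpy as np

def random_irregular_mask(size=256, target_ratio=0.3, rng=None):
    """Illustrative mask generator (not the paper's): punch random
    rectangular holes until the hole ratio reaches `target_ratio`.
    Returns a binary mask where 0 marks missing pixels."""
    rng = rng or np.random.default_rng(0)
    mask = np.ones((size, size))
    while 1.0 - mask.mean() < target_ratio:
        h, w = rng.integers(size // 8, size // 3, size=2)
        y = rng.integers(0, size - h)
        x = rng.integers(0, size - w)
        mask[y:y + h, x:x + w] = 0.0
    return mask
```

Sweeping `target_ratio` from 0.01 to 0.60 reproduces the range of damage conditions used in the experiments.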
Comparative Experiments
RPConv-GAN was evaluated against state-of-the-art inpainting methods, including PConv, DeepFillv2, RFR, and AOT-GAN. Both qualitative and quantitative assessments were conducted.
Qualitative Results
• Image-Type Wadang: RPConv-GAN restored fine details (e.g., animal fur, facial features) more accurately than competitors, which often produced blurred or semantically inconsistent textures.
• Pattern/Text-Type Wadang: The proposed method preserved structural continuity in geometric patterns and stroke integrity in calligraphy, whereas other approaches introduced artifacts or broken edges.
Quantitative Metrics
• PSNR and SSIM: RPConv-GAN achieved superior scores, indicating better image quality and structural fidelity.
• FID and L1 Distance: Lower FID values confirmed that inpainted images were stylistically closer to the originals, while reduced L1 distances reflected fewer pixel-level errors.
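Of the metrics above, PSNR is the simplest to state exactly; the sketch below shows the standard definition (higher is better, infinite for a perfect reconstruction). SSIM and FID require windowed statistics and a pretrained Inception network respectively, so they are omitted here.

```python
import numpy as np

def psnr(pred, true, max_val=255.0):
    """Peak signal-to-noise ratio in dB between an inpainted image and
    its ground truth; higher means a closer pixel-wise match."""
    mse = np.mean((pred.astype(float) - true.astype(float)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```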
Ablation Study
Removing RPConv modules led to noticeable declines in edge clarity and texture coherence, underscoring their importance in feature interaction. The full model outperformed the ablated version by significant margins in all metrics.
Conclusion
This paper presented RPConv-GAN, a deep learning-based framework for high-fidelity wadang image inpainting. By leveraging recursive partial convolutions and feature fusion, the method effectively addresses challenges related to texture disorder and edge degradation. Experimental results demonstrated its superiority over existing techniques in both perceptual quality and objective metrics. Future work will expand the dataset to include more diverse wadang types and explore few-shot learning strategies to enhance generalization across different cultural heritage artifacts.
doi.org/10.19734/j.issn.1001-3695.2024.03.0086