Detecting Co-occurring Android-Specific Code Smells Using Static Program Analysis and Ensemble Learning

Introduction

Code smells are indicators of structural issues in software systems that can lead to increased complexity, reduced maintainability, and higher technical debt. While traditional object-oriented (OO) code smells have been extensively studied, Android-specific code smells present unique challenges due to the platform’s hardware constraints, event-driven architecture, and specialized APIs. Among these, the co-occurrence of multiple code smells in Android applications exacerbates maintenance difficulties and raises the likelihood of bugs.

This paper addresses the detection of co-occurring Android-specific code smells, focusing on two prevalent and harmful smells: Member-Ignoring Method (MIM) and No Low Memory Resolver (NLMR). MIM occurs when a method within a class does not utilize any of the class’s member variables, leading to increased energy consumption and reduced maintainability. NLMR arises when an Activity class fails to implement the onLowMemory() method, risking abrupt termination under low-memory conditions. The coexistence of these smells can compound their negative effects, making their detection crucial for improving software quality.
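To make the two smells concrete, their checks can be caricatured in a few lines of Python. This is a toy string-level heuristic for illustration only, not the AST-based detection described later in the paper; the sample class, field list, and simplified method regex are all hypothetical:

```python
import re

def has_nlmr(activity_source: str) -> bool:
    """NLMR heuristic: an Activity that never overrides onLowMemory()."""
    return "onLowMemory" not in activity_source

def find_mim_methods(class_source: str, fields: list) -> list:
    """MIM heuristic: methods whose bodies reference no class field.
    Very rough: matches 'returnType name(args) { body }' and scans the
    body for any field name (a real detector would walk the AST)."""
    methods = re.findall(r"\w+\s+(\w+)\s*\([^)]*\)\s*\{([^}]*)\}", class_source)
    return [name for name, body in methods
            if not any(f in body for f in fields)]

src = """
class MainActivity extends Activity {
    int counter;
    void onCreate(Bundle b) { counter = 0; }
    int square(int x) { return x * x; }
}
"""
print(has_nlmr(src))                      # True: no onLowMemory override
print(find_mim_methods(src, ["counter"])) # ['square'] ignores all members
```

A real detector must also confirm the class extends Activity and that the flagged method is not a lifecycle callback, which is why the paper relies on full syntax trees rather than pattern matching.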

Existing tools like DAAP and aDoctor detect individual Android code smells but struggle with identifying co-occurrences. Traditional static analysis methods rely on heuristic rules and thresholds, which introduce subjectivity and inconsistency. Machine learning (ML) offers a promising alternative, but single ML models often lack generalization capabilities. To overcome these limitations, this work proposes a hybrid approach combining static program analysis with ensemble learning to detect MIM and NLMR co-occurrences effectively.

Background and Related Work

Android-Specific Code Smells

Android applications exhibit unique code smells due to platform-specific constraints. Reimann et al. first identified these smells, noting their impact on performance, stability, and energy efficiency. Unlike OO smells, Android smells often involve interactions with system APIs, lifecycle management, and resource handling. For instance, MIM and NLMR are particularly problematic because they affect memory management and energy consumption—critical concerns in mobile environments.

Code Smell Co-occurrence

Research shows that co-occurring smells are more detrimental than isolated ones. Studies by Palomba et al. reveal that classes affected by multiple smells are significantly more prone to faults and require more frequent modifications. In Android applications, three types of co-occurrences are prevalent:

  1. Android-specific smell co-occurrences (e.g., MIM and NLMR).
  2. Hybrid co-occurrences (Android and OO smells).
  3. OO smell co-occurrences.

Among these, MIM and NLMR exhibit strong statistical associations, making them ideal candidates for studying co-occurrence detection.

Detection Methods

Existing detection approaches fall into two categories:

  1. Static Program Analysis: Tools such as DAAP and aDoctor use rule-based techniques to identify smells. DAAP detects individual smells accurately but cannot handle co-occurrences; aDoctor supports co-occurrence detection but suffers from low accuracy and is not available as open source.
  2. Machine Learning: Prior work has applied ML to OO smells, but Android-specific smells remain underexplored. Traditional ML models (e.g., Random Forests, Decision Trees) and deep learning models (e.g., CNNs, RNNs) have shown promise, but no single model performs optimally across all smell types.

Methodology

The proposed method integrates static analysis with ensemble learning to detect MIM and NLMR co-occurrences. The workflow consists of four stages:

  1. Static Analysis for Co-occurrence Detection

A tool named ASSD (Android-Specific Smell Detector) extends DAAP’s capabilities to detect co-occurrences. ASSD parses Java source code into abstract syntax trees (ASTs) and applies detection rules to identify MIM and NLMR instances. Key innovations include:

  • Nested Hash Tables: A two-level hash structure maps classes to their detected smells, enabling efficient co-occurrence tracking.
  • Automated Sample Generation: ASSD separates code segments into positive (co-occurring smells) and negative (non-smelly) samples, facilitating supervised learning.
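A minimal sketch of the two-level hash idea, assuming the detector reports (class, smell, location) triples; the helper names and sample data are illustrative, not ASSD's actual API:

```python
from collections import defaultdict

# Two-level hash structure: class name -> smell name -> detected locations.
smell_table = defaultdict(lambda: defaultdict(list))

def record_smell(cls: str, smell: str, location: str) -> None:
    """Called by the detection rules whenever a smell instance is found."""
    smell_table[cls][smell].append(location)

def co_occurring_classes(smell_a: str = "MIM", smell_b: str = "NLMR") -> list:
    """Classes in which both smells were detected (positive samples)."""
    return [cls for cls, smells in smell_table.items()
            if smell_a in smells and smell_b in smells]

record_smell("MainActivity", "MIM", "square()")
record_smell("MainActivity", "NLMR", "class body")
record_smell("SettingsActivity", "NLMR", "class body")
print(co_occurring_classes())  # ['MainActivity']
```

With this layout, checking whether a class exhibits both smells is two hash lookups, so co-occurrence tracking stays linear in the number of detected smell instances.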
  2. Dataset Preparation

The dataset comprises 70 open-source Android apps from GitHub, selected based on size, popularity (star ratings), and recent activity. Tokenization converts source code into integer vectors, which are then trimmed to 88 features per sample to standardize input dimensions. To address class imbalance, undersampling balances positive and negative samples for traditional ML models, while oversampling augments data for deep learning.
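The preprocessing steps above can be sketched with stdlib Python; the tokenizer, the 0-padding convention, and the undersampling helper are assumptions for illustration, not the paper's exact pipeline:

```python
import random
import re

VOCAB = {}
SEQ_LEN = 88  # fixed input dimension used in the paper

def tokenize(code: str) -> list:
    """Map each lexical token to a stable integer id, then trim or pad
    the sequence to SEQ_LEN (id 0 is reserved here for padding)."""
    tokens = re.findall(r"\w+|[^\s\w]", code)
    ids = [VOCAB.setdefault(t, len(VOCAB) + 1) for t in tokens]
    return (ids + [0] * SEQ_LEN)[:SEQ_LEN]

def undersample(pos: list, neg: list, seed: int = 42):
    """Balance classes by discarding surplus majority-class samples."""
    rng = random.Random(seed)
    n = min(len(pos), len(neg))
    return rng.sample(pos, n), rng.sample(neg, n)

vec = tokenize("void square(int x) { return x * x; }")
print(len(vec))  # 88
```

Oversampling for the deep models would instead duplicate (or synthesize) minority-class vectors until the two classes match in size.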

  3. Ensemble Learning Models

Two ensemble strategies are explored:

  1. Traditional ML Ensemble: A soft-voting classifier combines predictions from Random Forest (RF), Extremely Randomized Trees (ERT), and Histogram-Based Gradient Boosting (HBGB). Each model’s vote is weighted by its validation performance.
  2. Deep Learning Ensemble: A soft-voting classifier integrates improved CNN and RNN architectures. The CNN includes embedding, convolutional, and pooling layers, while the RNN uses LSTM units to capture sequential dependencies.
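The weighted soft-voting step common to both strategies can be sketched in pure Python, assuming each base model outputs a class-probability vector; the probabilities and validation-derived weights below are hypothetical:

```python
def soft_vote(probas: list, weights: list):
    """Weighted soft voting: average the models' class-probability
    vectors, weighting each model by its validation performance,
    then predict the argmax class."""
    total = sum(weights)
    n_classes = len(probas[0])
    avg = [sum(w * p[c] for p, w in zip(probas, weights)) / total
           for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg

# Hypothetical per-model outputs for one sample: [P(clean), P(co-occurring)]
rf, ert, hbgb = [0.20, 0.80], [0.40, 0.60], [0.10, 0.90]
label, avg = soft_vote([rf, ert, hbgb], weights=[0.93, 0.92, 0.91])
print(label)  # 1 -> co-occurring MIM+NLMR predicted
```

Soft voting lets a confident minority overrule an uncertain majority, which is why it typically generalizes better than hard (majority-label) voting when the base models are well calibrated.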
  4. Evaluation Metrics

Performance is assessed using precision, recall, and F1-score. The models are tested on 13 apps not included in training, with human validation ensuring ground-truth labels for comparison.
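For reference, the three metrics reduce to simple ratios over the confusion-matrix counts; the counts below are illustrative, not the paper's results:

```python
def prf1(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from raw confusion-matrix counts."""
    precision = tp / (tp + fp)            # of flagged classes, how many truly smelly
    recall = tp / (tp + fn)               # of truly smelly classes, how many flagged
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts only (not the paper's confusion matrix).
p, r, f1 = prf1(tp=88, fp=2, fn=2)
print(round(f1, 3))  # 0.978
```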

Experimental Results

RQ1: Effectiveness of Ensemble Learning

The soft-voting ensemble outperformed individual ML models, achieving an F1-score of 97.7%. RF, the best single model, scored 93.3%, demonstrating the ensemble’s superior generalization.

RQ2: Comparison with Static Analysis

The ensemble surpassed aDoctor’s F1-score (71.6%) by 26.1 percentage points, highlighting ML’s advantage in handling complex co-occurrences.

RQ3: Traditional ML vs. Deep Learning

Traditional ML models (F1: 97.7%) outperformed deep learning (CNN: 82.5%, RNN: 84.9%), likely due to the latter’s higher data requirements and overfitting tendencies.

RQ4: Ensemble Strategies

The traditional ML ensemble (F1: 97.7%) outperformed the deep learning ensemble (F1: 91.6%), reinforcing the suitability of simpler models for this task.

RQ5: Computational Efficiency

Static analysis consumed ~308 minutes for dataset generation, while model training and testing were efficient (traditional ML: 13.7s training, 0.17s testing; deep learning: 407.7s training, 2.01s testing).

Discussion

Practical Implications

Detecting MIM and NLMR co-occurrences helps prioritize refactoring efforts, as these smells significantly impact memory and energy efficiency. The proposed method’s high accuracy makes it viable for integration into development pipelines.

Generalizability

While the model excels for MIM and NLMR, its applicability to other smells requires further study. Future work will explore additional smell pairs and hybrid co-occurrences.

Limitations

  • The dataset is limited to Java-based Android apps.
  • Deep learning underperformed due to data scarcity, suggesting a need for larger datasets or transfer learning.

Conclusion

This paper presented a hybrid approach for detecting co-occurring Android-specific code smells, combining static analysis with ensemble learning. The method demonstrated superior accuracy over existing tools and single ML models, with traditional ML ensembles proving particularly effective. Future directions include expanding the range of detectable smells and optimizing deep learning performance. By automating co-occurrence detection, this work contributes to improving Android software quality and reducing technical debt.

DOI: 10.19734/j.issn.1001-3695.2024.09.0331
