A Comprehensive Review of Neural Architecture Search Technology

Neural Architecture Search (NAS) has emerged as a groundbreaking approach in automated machine learning, aiming to reduce the heavy reliance on expert knowledge and manual effort in designing neural network architectures. This technology automatically discovers optimal architectures for specific tasks through systematic optimization processes, potentially outperforming human-designed models while significantly reducing design time and resource consumption.

Introduction to Neural Architecture Search

At its core, NAS addresses the fundamental challenge of designing neural network architectures, which traditionally requires extensive expertise and trial-and-error experimentation. The architecture of a neural network profoundly impacts its performance, encompassing critical decisions about network depth (number of layers), width (number of channels), connection patterns between layers (such as skip connections or dense connections), and the types of operations used in each layer (convolution, pooling, etc.).

Classical models like GoogLeNet, ResNet, MobileNet, and GhostNet were all painstakingly designed by human experts through this manual process. However, manual design has several limitations: it is time-consuming, potentially constrained by designers' cognitive biases, and often requires redesign when applied to new tasks. NAS offers a solution by framing architecture design as an optimization problem that can be systematically solved using various search algorithms.

The typical NAS framework consists of three main components: search space, search strategy, and evaluation strategy. The search space defines the range of possible architectures that can be generated; the search strategy determines how architectures are explored within this space; and the evaluation strategy assesses the performance of discovered architectures to guide the search process. These components work in an iterative loop until satisfactory architectures are found.
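As an illustration, the iterative loop formed by these three components can be sketched in a few lines of Python. Everything here is a hypothetical stand-in, not any specific NAS system: the search space is a toy dictionary, the search strategy is plain random sampling, and the scoring function replaces actual training.

```python
import random

# Toy search space: each architectural decision has a few options.
SEARCH_SPACE = {
    "depth": [4, 8, 16],
    "width": [32, 64, 128],
    "op": ["conv3x3", "conv5x5", "maxpool"],
}

def sample_architecture(space):
    """Search strategy (here: random search) proposes a candidate."""
    return {k: random.choice(v) for k, v in space.items()}

def evaluate(arch):
    """Evaluation strategy: a toy score standing in for
    'train the network and measure validation accuracy'."""
    return arch["depth"] * 0.01 + arch["width"] * 0.001

def nas_loop(n_iters=100):
    """Iterate propose -> evaluate -> keep best, as in the three-component loop."""
    best_arch, best_score = None, float("-inf")
    for _ in range(n_iters):
        arch = sample_architecture(SEARCH_SPACE)
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score
```

Real systems differ in each slot: the search strategy may be reinforcement learning, evolution, or gradient descent, and the evaluation may involve full or approximate training, but the loop structure is the same.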

Early NAS breakthroughs, such as Google’s NAS-RL in 2016, demonstrated that automatically discovered architectures could surpass human-designed ones on benchmarks like CIFAR-10. However, these pioneering methods came with exorbitant computational costs – NAS-RL required 22,400 GPU days to complete its search. This highlighted the central challenge in NAS development: reducing the enormous time and resource requirements while maintaining or improving search quality.

Core Challenges in Neural Architecture Search

The computational intensity of early NAS approaches stems from three fundamental issues that correspond to the three main components of NAS systems:

  1. Excessive Search Space: Early methods searched for complete end-to-end architectures where each layer’s parameters (kernel size, stride, channel count, etc.) were independently selectable. This created a combinatorial explosion of possibilities as network depth increased. For instance, a simple 10-layer network with just 5 options per layer would have 5^10 possible configurations.

  2. Lengthy Architecture Generation Process: Reinforcement learning-based approaches relied on trial-and-error exploration, while evolutionary algorithms required maintaining and evolving large populations of candidate architectures. Both strategies involved substantial overhead in generating new architectures to evaluate.

  3. Prolonged Evaluation Time: Each candidate architecture typically required full training from scratch to convergence before its performance could be assessed, making the evaluation phase extremely time-consuming.

These challenges have guided subsequent NAS research, with most advancements focusing on addressing one or more of these bottlenecks while maintaining search quality.
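The combinatorial explosion behind the first challenge is easy to make concrete: with k independently selectable options per layer and d layers, the space contains k to the power d configurations.

```python
# Size of a layer-wise search space: k independent options per layer,
# d layers -> k**d possible configurations.
def search_space_size(options_per_layer: int, depth: int) -> int:
    return options_per_layer ** depth

# The 10-layer, 5-options-per-layer case from the text:
print(search_space_size(5, 10))  # 9765625
```

Nearly ten million configurations for a toy 10-layer network, each one potentially requiring full training to evaluate, is what made early end-to-end search so expensive.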

Reducing the Search Space

A significant breakthrough in NAS came with the realization that effective architectures often consist of repeated modular building blocks. This insight led to methods that search for optimal cell or block structures rather than complete architectures, dramatically shrinking the search space.

The NASNet algorithm pioneered this approach by searching for two types of cells: normal cells that preserve feature map dimensions and reduction cells that perform downsampling. These discovered cells could then be stacked to form complete networks. This strategy not only reduced search complexity but also enabled architecture transferability – cells discovered on smaller datasets like CIFAR-10 could be effectively applied to larger datasets like ImageNet.
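The resulting macro structure can be sketched as follows; the cell bodies below are placeholders (identity and naive subsampling on a list), since the whole point of NAS is that their internals are what gets searched.

```python
# Hypothetical sketch of a NASNet-style skeleton: searched "normal" and
# "reduction" cells are stacked into a full network. Cell internals are
# placeholders, not real convolutions.

def normal_cell(x):
    return x  # searched cell that preserves feature-map dimensions

def reduction_cell(x):
    return x[::2]  # stand-in for downsampling (stride-2 ops in practice)

def build_network(x, n_normal=2, n_stages=3):
    """Stack n_normal normal cells per stage, with a reduction cell
    between stages, mirroring the fixed outer skeleton."""
    for stage in range(n_stages):
        for _ in range(n_normal):
            x = normal_cell(x)
        if stage < n_stages - 1:
            x = reduction_cell(x)
    return x
```

Because only the two cell types are searched while this skeleton stays fixed, the same cells can be re-stacked deeper or wider when transferring from CIFAR-10 to ImageNet.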

Following this, BlockQNN adapted earlier reinforcement learning approaches to search for block structures, while ENAS introduced parameter sharing between architectures through a directed acyclic graph representation. These innovations achieved remarkable efficiency gains – ENAS completed searches in just 0.45 GPU days compared to thousands for earlier methods.

The cell-based approach has several advantages:

• It confines the search to smaller, more manageable components
• Discovered cells often generalize well across different tasks and datasets
• Searching on smaller proxy datasets reduces overall computational costs

However, this strategy also imposes limitations by fixing the overall network structure and repeating identical blocks, potentially constraining architectural diversity. Some recent approaches like NSGA-Net and AE-CNN have attempted to overcome this by incorporating more flexible block compositions or combining different block types during search.

Accelerating Architecture Generation

Beyond shrinking the search space, significant progress has been made in developing more efficient search strategies. These approaches generally fall into two categories: improved evolutionary methods and gradient-based techniques.

Enhanced Evolutionary Approaches

Evolutionary algorithms offer several advantages for NAS, including insensitivity to local minima and not requiring gradient information. Recent advancements have focused on incorporating computational constraints directly into the search objectives and refining evolutionary operations.

LEMONADE introduced a multi-objective approach that considers both model performance and resource consumption, using cheap-to-compute metrics like parameter count and FLOPs to pre-screen candidates. NSGA-Net similarly optimized for both accuracy and computational complexity while imposing practical constraints on channel counts and operation positions.

Improvements to core evolutionary operations have also boosted performance. AmoebaNet’s modified tournament selection preserves top-performing architectures while favoring younger candidates to maintain diversity. AG-ENAS adapts genetic algorithm parameters based on population diversity and uses historical information to guide mutations, combining this with age-based selection to prevent premature convergence.
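The aging-style tournament selection described above can be sketched in miniature. This is a hedged illustration, not AmoebaNet's actual implementation: the architecture encoding, `mutate`, and `fitness` are all toy stand-ins.

```python
import collections
import random

def mutate(arch):
    """Toy mutation: perturb one architectural choice by +/-1."""
    child = dict(arch)
    key = random.choice(list(child))
    child[key] += random.choice([-1, 1])
    return child

def fitness(arch):
    """Toy objective standing in for validation accuracy."""
    return -abs(arch["depth"] - 12)

def aging_evolution(pop_size=20, cycles=200, sample_size=5):
    """Tournament selection with aging: the best of a random sample
    becomes the parent, and the oldest individual is removed each cycle,
    which favors younger candidates and preserves diversity."""
    population = collections.deque(
        {"depth": random.randint(1, 30)} for _ in range(pop_size)
    )
    history = list(population)
    for _ in range(cycles):
        sample = random.sample(list(population), sample_size)
        parent = max(sample, key=fitness)   # tournament selection
        child = mutate(parent)
        population.append(child)            # youngest joins
        population.popleft()                # oldest dies ("aging")
        history.append(child)
    return max(history, key=fitness)
```

Removing by age rather than by worst fitness is the key design choice: even a strong architecture eventually dies out unless its descendants keep winning tournaments, which discourages premature convergence.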

Gradient-Based Search Methods

A transformative development in NAS was the introduction of differentiable architecture search, which reformulated the discrete search space as continuous and optimizable via gradient descent.

DARTS (Differentiable Architecture Search) was the first major gradient-based approach, relaxing the discrete choice of operations between nodes to continuous mixtures that could be optimized jointly with network weights. This allowed efficient search using standard backpropagation, completing searches in just 1.5 GPU days.
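The core of this relaxation fits in a few lines. In the sketch below the candidate operations are toy scalar functions rather than real convolution or pooling layers, and only the softmax mixture and final discretization steps are shown; in DARTS the architecture parameters (alpha) and network weights are optimized jointly by gradient descent.

```python
import math

# Toy candidate operations on one edge of the cell graph.
OPS = {
    "identity": lambda x: x,
    "double":   lambda x: 2 * x,
    "zero":     lambda x: 0.0,
}

def softmax(alphas):
    exps = [math.exp(a) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x, alphas):
    """Continuous relaxation: instead of picking one operation, apply
    a softmax-weighted mixture of all candidates. Because the output is
    differentiable in `alphas`, they can be learned by gradient descent."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, OPS.values()))

def discretize(alphas):
    """After search, keep only the operation with the largest alpha."""
    names = list(OPS)
    return names[max(range(len(alphas)), key=lambda i: alphas[i])]
```

The gap between the continuous mixture used during search and the single discretized operation used at evaluation time is also the source of some of the ranking-stability issues discussed below.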

Subsequent innovations built on this foundation:

• SNAS used stochastic relaxation to enable end-to-end training
• ProxylessNAS employed binarization to enable direct search on target tasks
• PC-DARTS reduced memory usage through partial channel sampling
• SGAS introduced more stable architecture ranking criteria

These methods collectively addressed several limitations:

• Memory efficiency through techniques like partial connections
• Architecture stability via operation importance metrics
• Depth consistency between search and evaluation phases
• Hardware-aware optimization for deployment constraints

While gradient-based methods generally outperform evolutionary approaches in both speed and final architecture quality, they come with their own challenges:

• High memory requirements from simultaneous architecture and weight optimization
• Potential instability in operation selection due to initialization effects
• Possible bias in architecture rankings

Evolutionary methods retain advantages in certain scenarios:

• Simpler implementation and easier extension
• Applicability to non-differentiable or gradient-free contexts
• Natural support for multi-objective optimization

Streamlining Architecture Evaluation

The third major direction in NAS improvement focuses on reducing the substantial time required to evaluate candidate architectures. Approaches here fall into two categories: incomplete training methods and model-based estimation.

Incomplete Training Methods

These techniques aim to approximate final architecture performance without full training:

• Low-fidelity methods use proxies like reduced datasets, lower-resolution images, or shallower networks to accelerate individual evaluations
• Early stopping terminates training based on intermediate performance or learning-curve predictions

While effective at reducing computation, these approximations can introduce ranking errors if the proxy metrics don’t correlate well with final performance. Some approaches like EF-ENAS attempt to correct for this by incorporating multiple performance indicators.
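A minimal early-stopping rule of the kind used here can be sketched as follows; the patience-based criterion below is one common choice among many (learning-curve extrapolation being another), not a specific method from the literature.

```python
def early_stopped_score(val_curve, patience=3, min_delta=1e-3):
    """Cut a candidate's training short once `patience` consecutive
    epochs fail to improve the best validation score by `min_delta`.
    `val_curve` stands in for per-epoch validation scores observed
    during real training. Returns (best score seen, epochs spent)."""
    best, stale, epoch = float("-inf"), 0, -1
    for epoch, score in enumerate(val_curve):
        if score > best + min_delta:
            best, stale = score, 0
        else:
            stale += 1
            if stale >= patience:
                break  # terminate this candidate's evaluation early
    return best, epoch + 1
```

The ranking-error risk mentioned above shows up directly here: an architecture whose curve plateaus early but recovers late would be scored too low by such a rule.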

Model-Based Estimation

More sophisticated approaches attempt to predict architecture performance without direct training:

  1. One-shot methods train a single supernetwork encompassing all possible operations and architectures, with sub-networks inheriting shared weights for rapid evaluation. While efficient, these methods can suffer from operation coupling and ranking inaccuracies.

  2. Performance predictors learn to estimate architecture quality from structural features, treating this as a regression problem. Challenges include designing effective architecture encodings and obtaining sufficient training data. Recent innovations include:

     • Peephole's LSTM-based encoder for sequential architectures
     • E2EPP's random forest approach requiring less training data
     • HAAP's data augmentation through operation reordering
     • Graph-based architecture generation for broader applicability

Performance prediction remains an active area with open questions about how to maximize accuracy with limited training examples and develop more comprehensive architecture representations.
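The regression framing can be illustrated in miniature. Both ingredients below are assumptions chosen for brevity: a hand-crafted fixed-length encoding (real predictors use learned LSTM or graph encoders) and a 1-nearest-neighbour regressor standing in for the random forests or neural predictors used in practice.

```python
def encode(arch):
    """Toy fixed-length encoding of an architecture's structural features."""
    return [arch["depth"], arch["width"], float(arch["op"] == "conv3x3")]

def predict(train_pairs, arch):
    """1-nearest-neighbour regression: score a new architecture by the
    accuracy of the most structurally similar trained architecture.
    `train_pairs` is a list of (architecture, measured accuracy)."""
    x = encode(arch)

    def dist(pair):
        e = encode(pair[0])
        return sum((a - b) ** 2 for a, b in zip(e, x))

    return min(train_pairs, key=dist)[1]
```

This also makes the two open questions concrete: the quality of `encode` determines whether structural similarity tracks performance similarity, and the size of `train_pairs` (each entry costing one full training run) limits how accurate any predictor can be.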

Future Directions in NAS Research

While NAS has made remarkable progress, several important challenges and opportunities remain:

  1. Computational Efficiency: Despite improvements, search costs remain prohibitive for many applications. Further optimizations in memory usage, parallelization, and algorithmic efficiency could broaden accessibility.

  2. Application Expansion: Most NAS research focuses on image classification. Extending these techniques to other domains – including generative models, recurrent networks, multi-task learning, and transformers – represents a significant opportunity. Particularly promising is applying NAS to time-series data like physiological signals for healthcare applications.

  3. Hyperparameter Integration: Current NAS typically focuses on architectural parameters while ignoring other critical design choices like learning rates and regularization. Developing unified optimization frameworks could yield further improvements.

  4. Search Space Design: While cell-based approaches improved efficiency, they may limit architectural creativity. Developing more flexible yet tractable search spaces remains an important challenge, potentially through dynamic block composition or hierarchical search strategies.

  5. Multi-Objective Optimization: Practical deployment often requires balancing accuracy with factors like model size, latency, and energy consumption. Enhancing NAS to natively handle these trade-offs could better serve real-world needs.

  6. Robustness Considerations: As neural networks deploy in safety-critical applications, ensuring their reliability against adversarial attacks and input variations becomes crucial. Incorporating robustness metrics into NAS could produce more dependable models.

  7. Integration with Large Models: The rise of foundation models presents both challenges and opportunities for NAS. Techniques that can efficiently search architecture components at scale or optimize pre-trained models for specific tasks could significantly impact the field.

Conclusion

Neural Architecture Search has evolved from a computationally prohibitive curiosity to a practical approach for automated model design. The field’s progress has been driven by systematic efforts to address three fundamental challenges: reducing the search space through modular design, accelerating architecture generation via gradient-based and evolutionary methods, and streamlining evaluation through approximation and prediction.

The most effective current approaches combine insights from these directions – using cell-based search spaces optimized with differentiable techniques and evaluated through shared weights or performance predictors. However, no single method dominates all scenarios, with different approaches offering distinct trade-offs in terms of flexibility, efficiency, and final architecture quality.

Looking ahead, NAS stands poised to transform how neural networks are designed, potentially making state-of-the-art models accessible to non-experts and discovering novel architectures beyond human imagination. Realizing this potential will require continued innovation to make searches more efficient, broaden their applicability, and integrate practical deployment constraints. As these challenges are addressed, NAS may well become a standard tool in the machine learning workflow, much like automated hyperparameter tuning is today.

doi.org/10.19734/j.issn.1001-3695.2024.05.0172
