Efficient Verifiable Aggregation Scheme Based on Linear Homomorphic Hash and Secret Sharing
Introduction
Federated learning has emerged as a prominent distributed machine learning framework, enabling multiple participants to collaboratively train a shared model without exposing their raw data. This approach significantly reduces privacy risks compared to centralized data collection methods. However, despite its advantages, federated learning faces critical challenges related to privacy protection and the integrity of aggregation results.
Existing research has demonstrated that model parameters, though not directly revealing raw training data, can still expose sensitive information through gradient inversion attacks or membership inference attacks. To mitigate these risks, various privacy-preserving aggregation techniques have been proposed, including differential privacy, secure multi-party computation, and homomorphic encryption. However, most of these methods assume that the aggregation server will honestly compute and return correct aggregation results. In practice, cloud-based aggregation servers may not always be trustworthy. They might intentionally return incorrect results to save computational resources or, worse, manipulate aggregation outcomes to extract private information from participants.
To address these concerns, verifiable federated learning schemes have been introduced, allowing participants to verify the correctness of aggregation results. However, existing solutions suffer from high communication overhead, inability to handle participant dropouts, or reduced verification efficiency when users exit during training. These limitations make them unsuitable for real-world scenarios where participants are resource-constrained devices like smartphones or IoT devices, which may frequently disconnect due to network or power issues.
This paper presents an Efficient Verifiable Aggregation scheme based on Linear Homomorphic Hash and Secret Sharing (EVA-LHHSS). The proposed solution ensures privacy-preserving verification with constant communication overhead, independent of model dimensions, while robustly handling participant dropouts without compromising verification efficiency.
Background and Related Work
Privacy Risks in Federated Learning
While federated learning prevents direct data sharing, studies have shown that model parameters can leak sensitive information. Gradient inversion attacks can reconstruct training data from gradients, and membership inference attacks can determine whether specific data points were used in training. These vulnerabilities necessitate robust privacy-preserving aggregation methods.
Privacy-Preserving Aggregation Techniques
Several approaches have been proposed to protect model privacy during aggregation:
- Secure aggregation protocols using cryptographic techniques
- Homomorphic encryption for encrypted model aggregation
- Differential privacy to prevent inference attacks
However, these methods typically assume honest aggregation servers, which may not hold in practice. Malicious or economically motivated servers could return incorrect results, undermining the entire federated learning process.
Verifiable Federated Learning
Existing verifiable schemes face several challenges:
- High Communication Overhead: Some schemes require verification information proportional to model dimensions, becoming impractical for large models.
- Dropout Intolerance: Many solutions cannot efficiently handle participant dropouts, requiring complete recomputation when users disconnect.
- Security Vulnerabilities: Certain schemes expose hash values that could enable brute-force attacks on sparse model parameters.
Recent improvements include schemes using linear homomorphic hashing for constant-size verification information and dual-aggregation approaches. However, these still struggle with dropout scenarios or require additional infrastructure like trusted execution environments or auxiliary servers.
System Design
Overview
EVA-LHHSS consists of three main entities:
- Trusted Authority (TA): Initializes cryptographic parameters and distributes keys before going offline.
- Participants: Resource-constrained devices that train local models and participate in verification.
- Aggregation Server: Coordinates model aggregation and assists in verification without accessing raw data.
The scheme operates in two phases: proof generation and aggregation verification, designed to maintain privacy while enabling efficient verification even with participant dropouts.
Core Cryptographic Components
- Linear Homomorphic Hash: Enables efficient verification with constant-size outputs regardless of model dimensions. The hash function preserves additive properties, allowing verification of aggregated models through hash comparisons.
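The additive property can be illustrated with a toy construction over a multiplicative group. This is only a sketch: the paper's scheme is instantiated over an elliptic-curve group (NIST P-256), and the modulus, generators, and function name below are illustrative choices, not the paper's parameters.

```python
# Toy linear homomorphic hash over Z_p*: H(x) = prod_i g_i^{x_i} mod p.
# Key property: H(x + y) = H(x) * H(y) mod p, so the hash of an aggregated
# model equals the product of the individual hashes, and its size is one
# group element regardless of the model's dimension.

P = 2**127 - 1          # a Mersenne prime, small enough for a demo
GENERATORS = [3, 5, 7]  # one public generator per model coordinate (toy values)

def lh_hash(vec):
    """Hash a vector of integers to a single group element."""
    h = 1
    for g, x in zip(GENERATORS, vec):
        h = (h * pow(g, x, P)) % P
    return h

a = [1, 2, 3]
b = [4, 5, 6]
total = [x + y for x, y in zip(a, b)]

# Additive homomorphism: hash of the sum equals product of the hashes.
assert lh_hash(total) == (lh_hash(a) * lh_hash(b)) % P
```

Because the output is a single group element, the verification material a participant sends stays constant-size even as the model grows.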
- Secret Sharing: Based on Shamir’s threshold scheme, this allows participants to securely distribute verification materials. The scheme’s homomorphic properties enable efficient reconstruction even when some shares are missing due to dropouts.
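A minimal (t, n) Shamir sketch shows where the dropout tolerance comes from: any t of the n shares reconstruct the secret, so up to n - t participants may disconnect without blocking verification. The field modulus and function names here are toy choices for illustration, not the paper's parameters.

```python
# Minimal Shamir (t, n) secret sharing over a prime field.
import random

PRIME = 2**61 - 1  # toy field modulus (a Mersenne prime)

def split(secret, t, n):
    """Return n shares (x, f(x)) of a random degree-(t-1) polynomial with f(0) = secret."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = sum(c * pow(x, k, PRIME) for k, c in enumerate(coeffs)) % PRIME
        shares.append((x, y))
    return shares

def reconstruct(shares):
    """Lagrange interpolation at x = 0 from any t shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

shares = split(12345, t=3, n=5)
# Any 3 of the 5 shares suffice, so up to 2 participants may drop out.
assert reconstruct(shares[:3]) == 12345
assert reconstruct([shares[0], shares[2], shares[4]]) == 12345
```

Shares are also additively homomorphic (point-wise sums of shares are shares of the summed secret), which is what lets aggregated blinding factors be reconstructed directly.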
- Pedersen Commitments: Provide verifiable hiding of hash values, preventing servers from forging verification proofs. The commitments’ additive homomorphism supports efficient batch verification.
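The additive homomorphism that enables batch verification can be sketched in a toy group. Again this is illustrative only: the paper works over NIST P-256, and the generators below are toy values whose discrete-log relation is simply assumed unknown.

```python
# Toy Pedersen commitment over Z_p*: C(m, r) = g^m * h^r mod p.
# Homomorphism: C(m1, r1) * C(m2, r2) = C(m1 + m2, r1 + r2), so a batch of
# commitments can be checked with a single opening of their product.

P = 2**127 - 1
G, H = 3, 7  # two generators; assumed independent for this sketch

def commit(m, r):
    """Commit to message m with blinding factor r."""
    return (pow(G, m, P) * pow(H, r, P)) % P

c1 = commit(10, 111)
c2 = commit(32, 222)
# Product of commitments opens to the sum of messages and blinders.
assert (c1 * c2) % P == commit(42, 333)
```

The hiding comes from the random blinder r, and binding from the hardness of finding two openings, which is why a server cannot substitute a forged hash value after committing.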
Proof Generation Phase
Each participant performs the following steps:
- Computes a linear homomorphic hash of their local model.
- Blinds the hash value using a random secret to prevent privacy leaks.
- Creates a Pedersen commitment to the hash value.
- Splits the blinding factors using secret sharing and distributes shares to other participants through the server.
This phase ensures that verification materials are prepared without exposing sensitive information, while the secret sharing mechanism provides dropout resilience.
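The four steps above can be sketched for a single participant. All names (`generate_proof`, `shamir_split`, the generators and moduli) are hypothetical toy choices; the paper's scheme runs over NIST P-256 and defines its own parameters.

```python
# One participant's proof-generation pipeline (toy sketch).
import random

P = 2**127 - 1      # toy hash/commitment group modulus
GENS = [3, 5, 7]    # public hash generators, one per model coordinate
G, H = 11, 13       # toy commitment generators
Q = 2**61 - 1       # prime field for secret sharing

def lh_hash(vec):
    h = 1
    for g, x in zip(GENS, vec):
        h = (h * pow(g, x, P)) % P
    return h

def shamir_split(secret, t, n):
    coeffs = [secret] + [random.randrange(Q) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, k, Q) for k, c in enumerate(coeffs)) % Q)
            for x in range(1, n + 1)]

def generate_proof(model, t, n):
    h = lh_hash(model)                   # step 1: hash the local model
    r = random.randrange(Q)              # step 2: random blinding secret...
    blinded = (h * pow(H, r, P)) % P     #         ...blinds the hash value
    s = random.randrange(Q)
    commitment = (pow(G, h, P) * pow(H, s, P)) % P  # step 3: Pedersen commit
    shares = shamir_split(r, t, n)       # step 4: share the blinder
    return blinded, commitment, shares

blinded, commitment, shares = generate_proof([1, 2, 3], t=3, n=5)
assert len(shares) == 5                  # one share per participant
```

Only the blinded hash, the commitment, and the shares leave the device; neither the raw model nor its unblinded hash is ever revealed to the server.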
Aggregation Verification Phase
After receiving the aggregated global model, participants:
- Combine received secret shares to help reconstruct aggregated blinding factors.
- Verify the server’s proof using the reconstructed values and commitments.
- Check consistency between the provided proof and their computed hash of the global model.
The scheme’s design ensures that verification remains efficient regardless of how many participants drop out, as long as a threshold number remain active.
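An end-to-end toy run ties the two phases together. For brevity the aggregated blinder is summed directly rather than reconstructed from Shamir shares, and the group, generators, and variable names are illustrative assumptions rather than the paper's instantiation.

```python
# Two participants blind their hashes; the server aggregates; the
# participants unblind the aggregate proof and check it against the
# hash of the returned global model.

P = 2**127 - 1
GENS = [3, 5, 7]   # public hash generators, one per model coordinate
HGEN = 11          # blinding generator

def lh_hash(vec):
    h = 1
    for g, x in zip(GENS, vec):
        h = (h * pow(g, x, P)) % P
    return h

def blind(h, r):
    """Hide hash value h from the server using secret blinder r."""
    return (h * pow(HGEN, r, P)) % P

# Proof generation: each participant hashes and blinds its local model.
models = {"u1": [1, 2, 3], "u2": [4, 5, 6]}
blinders = {"u1": 17, "u2": 23}
blinded = {u: blind(lh_hash(m), blinders[u]) for u, m in models.items()}

# Server side: aggregate the models and the blinded hashes.
global_model = [sum(col) for col in zip(*models.values())]
agg_proof = 1
for b in blinded.values():
    agg_proof = (agg_proof * b) % P

# Verification: reconstruct the aggregated blinder (via secret sharing in
# the real scheme; summed directly here) and unblind the server's proof.
agg_r = sum(blinders.values())
inv = pow(pow(HGEN, agg_r, P), P - 2, P)       # modular inverse of HGEN^agg_r
unblinded = (agg_proof * inv) % P
assert unblinded == lh_hash(global_model)      # honest aggregation passes

tampered = [g + 1 for g in global_model]
assert unblinded != lh_hash(tampered)          # manipulation is detected
```

Because the check compares two constant-size group elements, its cost is independent of how many participants dropped out, matching the dropout-tolerance claim above.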
Security Analysis
Correctness
The scheme guarantees that correctly aggregated models will pass verification. The homomorphic properties of the cryptographic components ensure that valid computations produce consistent verification proofs. Participants can detect any server-side manipulation of aggregation results through the verification process.
Reliability
A malicious server cannot forge acceptable verification proofs for incorrect aggregation results. The combination of Pedersen commitments and secret sharing ensures that:
- Servers cannot create valid proofs without proper participation from honest users.
- Any attempt to manipulate proofs will be detected during verification.
This property holds even if the server colludes with some participants, provided the number of colluding parties remains below the security threshold.
Privacy Protection
The scheme protects participant privacy against both curious servers and other participants:
- Blinded hash values prevent direct inference of model parameters.
- Secret sharing ensures that individual participants’ secrets cannot be reconstructed without sufficient shares.
- Commitments hide the actual hash values while allowing verification.
Even if some participants collude with the server, they cannot recover sensitive information about non-colluding participants’ models.
Performance Evaluation
Experimental Setup
The prototype implementation uses standard cryptographic primitives:
- NIST P-256 elliptic curve for commitments and hashing
- AES-128 for secure communication
- Shamir’s secret sharing with the threshold set at half the participant count
Tests measured computation time and communication overhead under varying conditions of participant count, dropout rates, and model sizes.
Communication Efficiency
Key findings:
- Communication overhead remains constant with respect to model dimensions, making the scheme scalable to large models.
- Participant dropouts do not increase communication costs, unlike prior schemes where dropouts triggered additional message rounds.
- Per-participant communication grows linearly with the number of participants but remains significantly lower than comparable schemes.
Computation Efficiency
Notable results:
- Verification time scales linearly with model size due to hash computations but remains efficient for practical model dimensions.
- Server-side computation remains stable regardless of dropout rates, requiring only two secret reconstructions in all cases.
- The scheme outperforms alternatives like VeriFL, especially in high-dropout scenarios where traditional schemes require extensive recomputation.
Dropout Tolerance
The secret sharing mechanism provides robust dropout handling:
- Verification succeeds as long as a threshold number of participants remain active.
- Dropouts do not increase computation or communication overhead for remaining participants.
- The server’s workload remains constant regardless of dropout rates, preventing verification bottlenecks.
This makes the scheme particularly suitable for mobile and IoT environments where intermittent connectivity is common.
Conclusion
The EVA-LHHSS scheme addresses critical limitations in verifiable federated learning by combining linear homomorphic hashing with secret sharing techniques. Key advantages include:
- Constant communication overhead independent of model dimensions
- Robust privacy protection through cryptographic commitments and secret sharing
- Efficient verification that remains stable despite participant dropouts
- Strong security guarantees against malicious servers and curious participants
Experimental results demonstrate significant improvements over existing approaches, particularly in scenarios with frequent participant disconnections. The scheme’s efficiency and dropout tolerance make it particularly suitable for real-world deployments involving resource-constrained devices.
Future work could explore extensions to detect and mitigate poisoning attacks from malicious participants while maintaining the scheme’s verification properties. Additional optimizations may further reduce computation overhead for extremely large models or massive participant counts.
doi.org/10.19734/j.issn.1001-3695.2024.03.0213