Multiparty Probabilistic Threshold Private Set Intersection Against Malicious Adversaries

Multiparty Probabilistic Threshold Private Set Intersection Against Malicious Adversaries

Introduction

Privacy-preserving computation has become increasingly important in modern applications where multiple parties need to collaborate without revealing sensitive data. Private Set Intersection (PSI) is a fundamental cryptographic primitive that allows parties to compute the intersection of their private sets without exposing any additional information. While traditional PSI protocols reveal the exact intersection, Threshold Private Set Intersection (TPSI) introduces a conditional output—only revealing the intersection if its size exceeds a predefined threshold. This is particularly useful in scenarios like federated learning, ride-sharing, and medical research, where collaboration is only meaningful when sufficient common data exists.

However, deterministic TPSI protocols may be overly restrictive in real-world applications where approximate thresholds are acceptable. Probabilistic Threshold Private Set Intersection (pTPSI) addresses this limitation by introducing a probabilistic variant of TPSI, where the intersection is revealed with a probability proportional to the intersection size. This approach offers significant efficiency advantages, especially for large datasets, while maintaining strong privacy guarantees.

Despite recent advances in pTPSI, existing solutions primarily focus on semi-honest or two-party settings. Real-world applications often involve multiple parties and require security against malicious adversaries who may deviate arbitrarily from the protocol. This paper presents two novel multiparty pTPSI protocols that achieve security against malicious adversaries, addressing key challenges in efficiency, collusion resistance, and practical deployment.

Background and Related Work

Private Set Intersection and Threshold Variants

Standard PSI protocols enable two or more parties to compute the intersection of their private sets without revealing non-intersecting elements. Early work by Freedman et al. introduced the concept of TPSI, where the intersection is only revealed if its size exceeds a threshold. Subsequent research explored various cryptographic techniques for TPSI, including homomorphic encryption, polynomial interpolation, and oblivious transfer.

Ghosh and Simkin established communication complexity bounds for TPSI, while Badrinarayanan et al. extended these results to multiparty settings. Branco et al. proposed a multiparty TPSI protocol using threshold homomorphic encryption, but their solution remains theoretical without practical implementations.

Probabilistic TPSI

Liu et al. introduced pTPSI as a probabilistic relaxation of TPSI, where the intersection is revealed with a probability dependent on its size. This approach is particularly advantageous in large-scale applications like federated learning, where exact thresholds may be unnecessarily strict. Their work presented efficient two-party pTPSI protocols secure against malicious adversaries but left multiparty settings unexplored.

Security Models

Security against malicious adversaries is crucial for real-world deployments, where parties may behave arbitrarily. The Universal Composability (UC) framework provides a rigorous way to model security in such settings. Prior work on malicious-secure PSI includes solutions based on oblivious pseudorandom functions (OPRF) and zero-knowledge proofs, but extending these to multiparty pTPSI introduces new challenges in efficiency and collusion resistance.

Technical Preliminaries

Oblivious Key-Value Stores (OKVS)

OKVS is a cryptographic primitive that allows encoding key-value pairs into a data structure where keys can be queried without revealing the associated values. The correctness property ensures that decoding a key returns its correct value if present, while the obliviousness property guarantees that the structure reveals no information about absent keys. OKVS supports homomorphic operations, enabling efficient computations on encoded data.

Oblivious Programmable Pseudorandom Functions (OPPRF)

OPPRF extends oblivious pseudorandom functions (OPRF) by allowing the sender to program specific outputs for a set of points. This functionality is essential for constructing malicious-secure protocols where parties can enforce correctness conditions without revealing their inputs. OPPRF enables secure computation of set intersections while preventing malicious behavior.

Pairwise Independent Hashing

Pairwise independent hash functions ensure that the hash values of any two distinct inputs are statistically independent. This property is crucial for probabilistic set size testing, where the goal is to estimate the intersection size without revealing individual elements. The Goldwasser-Sipser (GS) protocol leverages pairwise independence to achieve efficient set size estimation with bounded error probabilities.

Zero-Sharing and ZeroXOR

Zero-sharing schemes distribute secret shares of zero among parties, enabling secure aggregation of private inputs. The ZeroXOR functionality allows parties to compute the XOR of their values conditioned on shared keys, which is useful for multiparty set operations. These primitives form the basis for collusion-resistant protocols where subsets of parties may attempt to learn unauthorized information.

Multiparty pTPSI Protocol Constructions

Protocol 1: Non-Colluding Multiparty pTPSI

The first protocol addresses scenarios where no collusion exists among participants. It leverages symmetric-key primitives for efficiency and consists of the following steps:

  1. Setup and Key Distribution: A designated party generates PRF keys and distributes them to other participants.
  2. OKVS Encoding: Each party encodes their set elements into an OKVS structure using the shared keys.
  3. Set Reduction: Parties reduce their sets by applying pairwise independent hashing, following the GS protocol.
  4. Threshold Testing: A two-party pTPSI subprotocol checks whether the estimated intersection size meets the threshold.
  5. Intersection Computation: If the threshold is satisfied, parties compute and reveal the intersection.

This protocol achieves linear communication complexity in the number of parties and set sizes. Security against malicious adversaries is ensured by the use of OKVS and OPRF, which prevent parties from deviating undetectably.

Protocol 2: Collusion-Resistant Multiparty pTPSI

The second protocol enhances security by resisting collusion among up to N-1 parties, provided two specific parties are not simultaneously corrupted. It builds on the first protocol with additional safeguards:

  1. Role Assignment: Parties are divided into clients, a hub, and servers to limit information exposure.
  2. ZeroXOR Integration: The ZeroXOR functionality ensures that only authorized parties can reconstruct intermediate values.
  3. OPPRF-Based Validation: Servers validate their inputs using OPPRF to detect and prevent malicious behavior.
  4. Threshold Testing with Zero-Sharing: The GS protocol is augmented with zero-sharing to ensure correctness even if some parties collude.

This protocol maintains practical efficiency while providing stronger security guarantees. The use of ZeroXOR and OPPRF ensures that colluding parties cannot learn unauthorized information unless they corrupt both the hub and a specific server.

Security Analysis

Both protocols are proven secure in the UC framework against static, malicious adversaries. The proofs involve constructing simulators that extract adversarial inputs while maintaining indistinguishability between real and ideal executions. Key security properties include:

• Input Privacy: No party learns anything beyond the intended output (the intersection or a threshold-based result).

• Correctness: The protocol correctly computes the intersection if and only if the threshold condition is met.

• Collusion Resistance: The second protocol withstands collusion among subsets of parties, provided specific conditions hold.

The security of Protocol 1 relies on the unforgeability of PRF outputs and the obliviousness of OKVS. Protocol 2 additionally leverages the secrecy of zero-sharing and the programmability of OPPRF to resist collusion.

Performance Evaluation

Experimental evaluations demonstrate the practicality of both protocols. Implementations in C++ using the MP-SPDZ library show:

• Protocol 1: With 8 parties and set sizes of 2^20, the protocol completes in approximately 24.59 seconds when the threshold is set to 0.5n.

• Protocol 2: Under the same conditions but with N/2 colluding parties, the runtime increases to around 40.00 seconds due to additional cryptographic checks.

Comparisons with prior work highlight significant efficiency gains. For example, the non-colluding protocol runs 220x faster than existing two-party TPSI solutions for set sizes of 2^11. The collusion-resistant variant remains competitive, offering 15x speedups over multiparty TPSI protocols for large datasets.

Applications

The proposed protocols enable privacy-preserving collaborations in numerous domains:

  1. Federated Learning: Multiple institutions can identify common users for joint model training without revealing non-overlapping data.
  2. Ride-Sharing: Users can find matching routes with strangers only if sufficient overlaps exist, preserving location privacy otherwise.
  3. Healthcare: Hospitals can collaboratively study patient cohorts while ensuring anonymity for non-shared records.
  4. Advertising: Ad networks can target users present in multiple datasets without exposing individual identities.

The probabilistic nature of pTPSI is particularly beneficial in large-scale settings where exact thresholds may be impractical or unnecessary.

Conclusion

This paper presents the first malicious-secure multiparty pTPSI protocols, addressing a critical gap in privacy-preserving set operations. The non-colluding variant offers exceptional efficiency for large datasets, while the collusion-resistant version provides stronger security guarantees. Both protocols leverage advanced cryptographic primitives like OKVS, OPPRF, and ZeroXOR to achieve practical performance without compromising security.

Future work could explore adaptive security, where adversaries dynamically corrupt parties during execution, and further optimizations for extremely large-scale deployments. Extending these techniques to other privacy-preserving computations, such as threshold private joins or secure aggregation, is another promising direction.

doi.org/10.19734/j.issn.1001-3695.2024.01.0109

Was this helpful?

0 / 0