Adaptive Secure Fuzzy Multi-Keyword Searchable Encryption Scheme Based on Blockchain
Introduction
With the rapid development of global informatization and data sharing, cloud computing has become increasingly popular due to its cost-effectiveness and flexibility in data outsourcing. However, cloud service providers, often semi-honest third parties, pose significant risks to the privacy of outsourced data. While encrypting data enhances privacy, it simultaneously hinders data searchability and sharing. Symmetric Searchable Encryption (SSE) provides an effective solution for keyword-based ciphertext searches. Traditional SSE schemes, however, face challenges in handling practical scenarios where users may input misspelled or similar keywords. Existing solutions for fuzzy single or multi-keyword searches often return irrelevant results or suffer from linear search complexity growth with increasing file volumes. Moreover, many current schemes are vulnerable to known-plaintext attacks (KPA) and cannot resist adaptively chosen-keyword attacks (IND-CKA2).
This paper presents an adaptive secure blockchain-based fuzzy multi-keyword searchable encryption scheme that addresses these limitations. The proposed solution combines local sensitive hashing with twin Bloom filters to achieve fuzzy keyword processing and constructs an index tree for sub-linear search efficiency. By integrating Merkle hash trees with adaptive multi-set accumulators, the scheme ensures the correctness and completeness of search results. Furthermore, the use of blockchain technology enables trusted third-party searches while reducing storage overhead through innovative optimization mechanisms.
Background and Related Work
Traditional SSE schemes have evolved to support fuzzy keyword searches to accommodate user input errors. Zhang et al. implemented fuzzy single-keyword search on encrypted data but often returned numerous irrelevant results. Wu et al. utilized Bloom filters and local sensitive hashing for fuzzy matching, while Wei et al. incorporated TF-IDF for result ranking. Liu et al. developed a wildcard-based fuzzy multi-keyword search supporting AND/OR semantics, though with linear search complexity growth. Chen and Zhong proposed two-stage index structures and balanced binary trees respectively to improve efficiency, yet security concerns remained.
Security vulnerabilities persist in existing solutions, particularly those using secure K-nearest neighbor (KNN) technology, which cannot resist KPA or IND-CKA2 attacks. Attribute-based encryption combined with searchable encryption has shown promise in resisting KPA but lacks support for fuzzy searches and predicate privacy. Liu et al. proposed an IND-CKA2 secure scheme limited to single-keyword searches. The challenge lies in designing searchable encryption that simultaneously achieves IND-CKA2 resistance, fuzzy multi-keyword support, and improved search efficiency.
Beyond security, ensuring search result correctness and completeness remains crucial. Semi-honest cloud providers might return false or incomplete results due to economic incentives, hardware errors, or cyberattacks. While RSA accumulators and message authentication codes (MAC) have been used for verification, they often incur significant computational overhead or only support plaintext retrieval. Shao et al. implemented efficient verifiable fuzzy multi-keyword searches using keyed-hash MAC and radix trees, but these still fall short of IND-CKA2 security.
Blockchain technology offers potential solutions through its distributed ledger properties, consensus mechanisms, and smart contract capabilities. Existing blockchain-based searchable encryption schemes either focus on specific domains like healthcare or use KNN technology vulnerable to KPA. Common limitations include reliance on data owners for continuous online availability, insecure key distribution channels, and significant storage burdens from blockchain’s immutable nature.
System Design
System Model and Entities
The proposed system comprises five key entities: Authorization Nodes (AN), Data Owners (DO), Data Requesters (DR), Consortium Blockchain (CB), and Cloud Servers (CS). ANs are elected through consensus to initialize system parameters and manage key distribution. DOs register with ANs, encrypt files for CS storage, and construct verified index trees for CB. DRs generate search trapdoors through smart contracts. CB stores index trees and executes search operations, while CS maintains encrypted files and returns results based on CB instructions.
The workflow begins with AN election and system initialization. Users register with ANs to obtain keys. DOs encrypt files for CS storage and build verified index trees for CB. DRs generate trapdoors for search requests deployed via smart contracts. CB executes searches, with successful results triggering CS to return corresponding ciphertexts. DRs verify result correctness and completeness before decryption.
Key Components and Algorithms
The scheme employs several innovative cryptographic constructs:
Twin Bloom Filter (TBF): An extension of traditional Bloom filters, TBF uses N twin bit arrays where each twin stores opposite bits. This structure simultaneously stores and masks keyword information through four algorithms: Setup initializes cryptographic hash functions, Init creates the bit array structure, Insert maps keywords to filter positions, and Check verifies keyword presence while maintaining privacy.
Adaptive Multi-Set Accumulator (MSA): This cryptographic primitive provides concise proofs for disjoint sets using bilinear pairings. KeyGen generates public-private key pairs, Accumulator creates set commitments, PDisjoint generates non-membership proofs, and VDisjoint verifies these proofs.
Index Tree Construction: The scheme builds a balanced binary index tree with file TBFs as leaf nodes. Non-leaf nodes aggregate child node information through union operations. The tree supports efficient sub-linear searches while maintaining security properties.
Graph-based Keyword Partitioning (GKP): This optimization algorithm minimizes shared keywords between subtrees by modeling files as graph vertices weighted by shared keyword counts. Through strategic merging and redistribution, GKP reduces search width and improves efficiency.
Detailed Scheme Description
Key Generation: ANs initialize system parameters including MSA key pairs, digital signature keys, and cryptographic hash functions. Public parameters and system keys are distributed to authorized users.
File Encryption: DOs encrypt files using AES and generate hash-based labels for ciphertext identification. The encrypted files with labels form the outsourced ciphertext set stored on CS.
Index Generation: The process involves multiple steps:
- Keyword fuzzy processing using LSH to map similar spellings to the same buckets
- Adaptive encryption with TBFs storing keyword bucket strings while hiding actual values
- Balanced binary index tree construction with verification mechanisms
- MSA and Merkle hash tree integration for result verification
Trapdoor Generation: DRs transform query keywords into bucket strings, encrypt them into trapdoors, and generate verification tags. These are deployed via smart contracts for search requests.
Search Execution: CB nodes verify request validity before searching the index tree. The search algorithm recursively traverses the tree, checking TBF positions for keyword matches. Successful searches trigger CS to return ciphertexts, with verification information provided for DR validation.
Result Verification: DRs use MSA proofs to verify non-returned files and reconstruct authentication paths to validate result integrity and correctness.
Security Model
The scheme achieves IND-CKA2 security against adaptive chosen-keyword attacks through careful cryptographic design. The security proof demonstrates computational indistinguishability between real and simulated system views. Additional properties include:
- Unforgeability: The combination of MSA and Merkle hash trees prevents forgery of search results and proofs
- Fairness: Blockchain consensus and smart contract execution ensure honest behavior from all parties
- Privacy: TBFs and cryptographic constructs hide actual keyword information while supporting fuzzy searches
Performance Analysis
Functional Comparison
Compared to existing solutions, the proposed scheme offers several advantages: • Supports fuzzy multi-keyword searches through LSH and TBF
• Achieves sub-linear search complexity via balanced index trees
• Provides result correctness and completeness verification
• Resists IND-CKA2 attacks through adaptive security design
• Integrates blockchain for trusted execution and key management
Storage Optimization
The scheme separates indexes from ciphertext data, storing only verified index trees on-chain. Each add operation stores (2m-1)×n/4 bytes for m files with n-bit TBFs. Search operations store q×(2k×HMAC_size + H1_size) bytes for q keywords. The global time-based optimization mechanism further reduces storage by pruning expired transactions from active verification.
Experimental comparisons show the optimized storage costs comparable to existing schemes while providing additional security features. For large-scale data, the solution proves most suitable for lightweight ciphertext searches where security outweighs storage concerns.
Experimental Evaluation
Implementation on Hyperledger Fabric with NSF research abstracts demonstrates: • Index construction time grows with file counts but remains practical
• Trapdoor generation scales linearly with keyword counts
• Search times show sub-linear complexity with effective GKP optimization
• Verification times increase moderately with file volumes
• Smart contract response times remain acceptable for security-sensitive applications
The GKP algorithm significantly reduces false-positive proofs, particularly for larger file sets. While verification adds overhead, the millisecond-level costs prove reasonable for applications prioritizing security over real-time performance.
Conclusion
This paper presents an adaptive secure blockchain-based fuzzy multi-keyword searchable encryption scheme that addresses critical limitations in existing solutions. By combining LSH, TBFs, and optimized index structures, the solution achieves efficient fuzzy searches with verification capabilities. Blockchain integration enables trusted execution while innovative storage mechanisms reduce overhead.
The scheme proves particularly suitable for security-sensitive domains like healthcare and government systems where real-time performance is secondary to data protection. Future work will focus on further enhancing search efficiency to broaden applicability while maintaining the robust security properties demonstrated in this design.
doi.org/10.19734/j.issn.1001-3695.2024.03.0107
Was this helpful?
0 / 0