In big data analytics, Jaccard similarity is a widely used building block for scalable similarity computation. It is broadly applied in Internet of Things (IoT) applications such as credit systems, social networking, and epidemic tracking. However, with growing privacy concerns over users' sensitive IoT data, it is both desirable and necessary to investigate privacy-preserving Jaccard similarity computation over two users' datasets. To boost efficiency and enhance security, we propose two methods for measuring Jaccard similarity over the private sets of two users with the assistance of an untrusted cloud server. Concretely, by leveraging an effective Min-Hash algorithm on encrypted datasets, our protocols output an approximate similarity that is very close to the exact value, without leaking any additional private information to the cloud. Our first solution assumes a semihonest cloud server, and our enhanced solution introduces a consistency-check mechanism to achieve verifiability in the malicious model. In terms of efficiency, the first solution needs only about 6 minutes for billion-element sets. Furthermore, as far as we know, this is the first time a consistency-check mechanism has been proposed to achieve effective verifiable approximate similarity computation.
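To illustrate the approximation that Min-Hash provides, the following is a minimal plaintext sketch, not the protocols proposed in the paper. It omits encryption, the cloud server, and the consistency check entirely. The function names, the salted BLAKE2b hash family, and the default of 256 hash functions are assumptions made for illustration only.

```python
import hashlib


def minhash_signature(items, num_hashes=256):
    """Return one minimum hash value per salted hash function.

    Assumes `items` is a non-empty set of strings. The hash family
    (BLAKE2b with a per-function salt) is an illustrative choice.
    """
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "big")  # distinct salt per hash function
        signature.append(
            min(
                int.from_bytes(
                    hashlib.blake2b(
                        item.encode("utf-8"), digest_size=8, salt=salt
                    ).digest(),
                    "big",
                )
                for item in items
            )
        )
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching positions."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B| for comparison."""
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    a = {str(i) for i in range(0, 1000)}
    b = {str(i) for i in range(500, 1500)}
    print("exact:   ", exact_jaccard(a, b))
    print("estimate:", estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

For each hash function, the probability that two sets share the same minimum equals their Jaccard similarity, so the fraction of matching signature positions is an unbiased estimate. Its error shrinks as the number of hash functions grows, while the signature size stays independent of the set sizes.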
               