In big data systems, large amounts of data are hosted away from the users, which makes information security a major challenge. From a customer's perspective, one of the biggest risks in adopting such a system is having to trust that the provider who designs and owns the infrastructure will not access user data. Yet the literature offers little on the detection of insider attacks. In this work, we propose a new system architecture in which insider attacks can be detected by exploiting the replication of data on various nodes in the system. The proposed system uses a two-step attack detection algorithm and a secure communication protocol to analyze processes executing in the system. The first step constructs control instruction sequences for each process in the system. The second step matches these instruction sequences among the replica nodes. Initial experiments on real-world Hadoop and Spark tests show that the proposed system needs to consider only 20 percent of the code to analyze a program and incurs 3.28 percent time overhead. Because its workflow is extrinsic, the proposed security system can be implemented and built for any big data system.
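The two-step idea described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the set of control mnemonics, the use of a SHA-256 digest to summarize a sequence, the majority-vote rule for deciding which replica disagrees, and all function and node names are assumptions introduced here for illustration. The secure communication protocol is not modeled.

```python
import hashlib
from collections import Counter

# Assumption: a hypothetical set of control-flow mnemonics. The abstract does
# not say which instructions count as "control instructions".
CONTROL_MNEMONICS = {"call", "ret", "jmp", "je", "jne", "jg", "jl"}


def control_sequence(instructions):
    """Step 1: keep only the control instructions of a process, in order."""
    kept = []
    for ins in instructions:
        parts = ins.split()
        if parts and parts[0] in CONTROL_MNEMONICS:
            kept.append(ins)
    return kept


def sequence_digest(sequence):
    """Summarize a control sequence as a digest (an assumption of this sketch)."""
    return hashlib.sha256("\n".join(sequence).encode("utf-8")).hexdigest()


def find_suspect_nodes(digests_by_node):
    """Step 2: match digests among replicas and report nodes that disagree.

    Assumption: a strict majority of replicas is treated as the reference.
    Without a strict majority, every node is reported.
    """
    counts = Counter(digests_by_node.values())
    majority_digest, votes = counts.most_common(1)[0]
    if votes <= len(digests_by_node) / 2:
        return sorted(digests_by_node)
    return sorted(
        node for node, digest in digests_by_node.items()
        if digest != majority_digest
    )


# Hypothetical traces of the same process on three replica nodes.
clean = ["mov eax, 1", "call read_block", "cmp eax, 0", "jne retry", "ret"]
tampered = ["mov eax, 1", "call read_block", "call send_copy",
            "cmp eax, 0", "jne retry", "ret"]
traces = {"node-a": clean, "node-b": clean, "node-c": tampered}

digests = {
    node: sequence_digest(control_sequence(trace))
    for node, trace in traces.items()
}
suspects = find_suspect_nodes(digests)
print(suspects)
```

In this toy run only two of the five instructions in the clean trace differ in kind from the control set, so the comparison works on a reduced view of the program, and the replica whose trace contains the extra `call` is the one reported.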
               