Abstract With the rapid development of computer and Internet technologies, the amount of data in all walks of life has increased sharply, and large volumes of high-dimensional big data have accumulated, such as network transaction data, user review data, and multimedia data. High-dimensional big data combines the typical features of both high-dimensional data and big data, which brings new problems and great challenges for processing and optimization. The storage structure of high-dimensional big data is therefore a critical factor that fundamentally affects processing performance. However, because of the very large number of dimensions, existing data storage techniques such as row-store and column-store are not well suited to high-dimensional, large-scale data. In this paper, we present an efficient storage structure for high-dimensional big data based on US-ELM, the High-dimensional Big Data File (HB-File). We then propose a fuzzy clustering algorithm based on US-ELM that distinguishes the key dimensions from the non-key dimensions of high-dimensional big data and also yields the clusters of the key dimensions. After that, we describe the execution and API of HB-File on Hadoop, the open-source implementation of MapReduce. Extensive experiments show the effectiveness of HB-File for storing high-dimensional big data.
               