Abstract Structured data from all sensors and actuators are a requirement for decisions about control strategies and efficiency optimization in Building Automation. In practice, the analysis of these data is a challenging and time-consuming task. Previous work has demonstrated that classification algorithms can reach high classification accuracies when applied to building data. However, supervised algorithms require labelled training data sets and predefined classes, and depend highly on the selection of input features. In this paper, we investigate how unsupervised machine learning techniques can be used to tackle both the problem of classifying time series and the problem of feature selection. We present a selection of the most promising algorithms and apply them to data extracted from the E.ON Energy Research Center. We then compare unsupervised feature extraction with the statistical features used in previous literature by comparing the classification results on different data sets. Our investigations show that the unsupervised methods we apply do not find data clusters that represent the predefined class labels. They are, however, able to find groups of similar data points, showing that clustering is in general possible and that the time series have distinguishable properties. We also see a more robust performance of the classification algorithms when unsupervised feature extraction is used. The results of this paper show that unsupervised machine learning algorithms cannot generally mitigate the issue of missing training data. However, they can improve supervised classification by providing a more robust set of features compared to manual selection. From the clusters that were found, we can derive insights about the properties of the time series that allow us to better assess which information can be extracted using data-driven algorithms.
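As an illustration of the kind of comparison the abstract describes, the following minimal sketch clusters time series by hand-picked statistical features and then checks how well the clusters line up with predefined class labels. Everything in it is an assumption made for illustration: the synthetic "temperature" and "valve_position" series, the three features, the use of k-means, and cluster purity as the agreement measure are not taken from the paper, whose algorithms and data are not specified in the abstract.

```python
import math
import random
from collections import Counter


def statistical_features(series):
    """Hand-picked statistical features: mean, standard deviation, value range."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    return (mean, std, max(series) - min(series))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    assignments = [0] * len(points)
    for _ in range(iterations):
        assignments = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centre if a cluster ends up empty
                centers[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments


def purity(assignments, labels):
    """Fraction of points whose cluster's majority label matches their own label."""
    total = 0
    for cluster in set(assignments):
        member_labels = [l for a, l in zip(assignments, labels) if a == cluster]
        total += Counter(member_labels).most_common(1)[0][1]
    return total / len(labels)


# Synthetic stand-in data (hypothetical, not the E.ON ERC data set).
rng = random.Random(0)
series, labels = [], []
for _ in range(10):
    series.append([21 + rng.gauss(0, 0.5) for _ in range(48)])
    labels.append("temperature")
    series.append([rng.uniform(0, 100) for _ in range(48)])
    labels.append("valve_position")

features = [statistical_features(s) for s in series]
assignments = kmeans(features, k=2)
score = purity(assignments, labels)
print(f"cluster purity against the predefined labels: {score:.2f}")
```

The synthetic classes here are deliberately easy to separate. On real building data, the abstract reports that clusters did not generally coincide with the predefined labels, so a low agreement score would be the expected outcome of such a check rather than a sign of a faulty implementation.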
               