Estimating the number of distinct elements is required in many computing applications. For example, we may want to estimate the number of unique users accessing a service, or the number of different words in a text. This problem is known as count distinct, or cardinality estimation, and has been widely studied. A naïve but exact approach is to keep a list of distinct elements and, for each incoming element, check the list and add the element if it is not present. However, this is not practical for large cardinalities, because it requires storing a large number of elements. Over the years, many algorithms have been proposed that provide cardinality estimates without storing the elements, using far less memory.

One of the state-of-the-art algorithms for cardinality estimation is HyperLogLog (HLL); it provides a good estimate over a large range of cardinality values using a small array of counters. As HLL is implemented in computing systems, it is exposed to soft errors that can corrupt bits stored in memories or registers. To avoid data corruption, memories are commonly protected with Error Correction Codes (ECCs). ECCs, however, incur significant overhead: protection requires additional memory cells per word to store the parity-check bits, as well as additional computation to check them. An alternative, potentially low-overhead approach is to exploit features of the implemented algorithm to detect and correct errors. In other scenarios, the algorithm must run on a system that is not protected with parity or ECCs, so protection must be implemented at the algorithmic level.

In this article, we first study the impact of soft errors on the HLL algorithm through error-injection simulations. The results show that the algorithm is quite robust and filters out most errors. However, for large cardinalities, some errors can cause a large discrepancy in the HLL estimate. This occurs when the uppermost bit set to one in a counter flips to zero. Based on the analysis of the experimental results and of the HLL algorithm, a protection technique is proposed that effectively mitigates the impact of soft errors with small overhead. The proposed Remove Minimum (RM) scheme has been validated by error-injection experiments, which show that it effectively protects the HLL counters against soft errors.
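To make the failure mode concrete, here is a minimal, self-contained HyperLogLog sketch in Python together with an injection of the worst-case error described in the abstract (the uppermost one-bit of a counter flipping to zero). The precision p = 10, the SHA-1-based hash, and the test cardinality are illustrative assumptions, not values taken from the article. Because the HLL estimate is a normalized harmonic mean of 2^(-register), a single artificially small counter inflates the sum and drags the whole estimate down.

```python
import hashlib

# Minimal HyperLogLog sketch (illustrative only; the precision p and
# the hash function are assumptions, not taken from the article).
class HyperLogLog:
    def __init__(self, p=10):
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of counters (registers)
        self.registers = [0] * self.m

    def _hash(self, item):
        # 64-bit hash derived from SHA-1; any well-mixed hash works
        digest = hashlib.sha1(str(item).encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item):
        h = self._hash(item)
        idx = h >> (64 - self.p)                  # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # rank = position of the leftmost 1-bit among the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        # Normalized harmonic mean of 2**(-register); alpha is the
        # standard bias-correction constant for m >= 128
        alpha = 0.7213 / (1 + 1.079 / self.m)
        return alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)


def flip_top_set_bit(value):
    # Worst-case soft error from the abstract: the uppermost 1-bit of a
    # counter flips to 0, making the counter look much smaller than it was
    if value == 0:
        return value
    return value & ~(1 << (value.bit_length() - 1))


hll = HyperLogLog(p=10)
for i in range(1_000_000):           # large cardinality, as in the abstract
    hll.add(i)
print("clean estimate:", round(hll.estimate()))

# Inject a single top-bit flip into one counter and re-estimate
hll.registers[0] = flip_top_set_bit(hll.registers[0])
print("after top-bit flip:", round(hll.estimate()))
```

In this setup, a single top-bit flip in one of the 1,024 counters can already shift the estimate substantially at large cardinality, which is exactly the failure mode the abstract highlights.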
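The abstract names the proposed protection Remove Minimum (RM) but does not describe its internals. Purely as a hypothetical illustration of the idea the name suggests (a counter corrupted by a top-bit flip shows up as an abnormally small value, so discounting the minimum limits its influence), the sketch below, continuing the example above, computes the estimate with the single minimum-valued register excluded. The article's actual RM scheme may well differ from this guess.

```python
def estimate_remove_min(hll):
    # Hypothetical "remove minimum" estimate: drop the single smallest
    # register value before taking the harmonic mean, limiting the damage
    # an abnormally small (possibly corrupted) counter can do. This is a
    # guess based on the scheme's name, not the article's actual method.
    regs = sorted(hll.registers)[1:]          # exclude one minimum value
    m = len(regs)
    alpha = 0.7213 / (1 + 1.079 / m)
    return alpha * m * m / sum(2.0 ** -r for r in regs)

# Applied to the corrupted sketch above, the abnormally small counter is
# likely to be the minimum and thus gets excluded from the estimate
print("RM-style estimate:", round(estimate_remove_min(hll)))
```

For error-free large cardinalities, excluding one of many similar-valued counters perturbs the estimate only slightly, which is why this style of algorithm-level mitigation can be cheap compared with full ECC protection.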