
Technical perspective: Evaluating sampled metrics is challenging



dure multiple times. Moreover, the paper shows that relatively large samples are needed to obtain consistent results from sampled metrics. In fact, the paper shows that when 1/3 of the whole catalogue is used as a sample, sampled metrics are consistent with the exact metrics. Unfortunately, in this case the speedup from sampling is limited. What is the source of the inconsistency and bias in the sampled metrics? As the authors show, they stem from a simple fact: when only a sample of the irrelevant items is used, the rank of a relevant item underestimates its exact rank, which is obtained when all the irrelevant items are considered. Since the error in the estimate can be quantified, it can then be corrected, and another main result of the paper is that even a simple correction resolves most of the mistakes of the uncorrected sampled metrics. Therefore, while sampling-based approaches should, as the authors suggest, be avoided in evaluations whenever possible, they can still be employed together with a properly designed correction. One of the most important takeaways from the paper is clear: when sampling is used to estimate a quantity, understanding and analyzing the impact of the sampling procedure is crucial. This is a more general message than it may seem at first sight. In many applications one can rarely assume that the data at hand represent the whole system, population, or process under study; most commonly the data are only a sample of it. Understanding the impact of sampling procedures on the results of algorithms, and how to properly account for them in the computation, is of paramount importance for drawing reliable and robust answers from data.
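
The rank underestimate described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's method: irrelevant items are sampled uniformly without replacement, and the function names (`sampled_rank`, `rescaled_rank`) and the linear rescaling used as a correction are hypothetical choices made only for illustration. The corrections actually studied in the paper may differ.

```python
import random


def sampled_rank(exact_rank, n_items, n_samples, rng):
    """Rank of one relevant item among itself plus a sample of irrelevant items.

    Assumption: exactly (exact_rank - 1) of the (n_items - 1) irrelevant
    items outrank the relevant item, and the sample is drawn uniformly
    without replacement.
    """
    n_irrelevant = n_items - 1
    # Label irrelevant items 0..n_irrelevant-1; labels below
    # (exact_rank - 1) stand for the items that outrank the relevant one.
    sample = rng.sample(range(n_irrelevant), n_samples)
    outranking = sum(1 for item in sample if item < exact_rank - 1)
    return 1 + outranking


def rescaled_rank(s_rank, n_items, n_samples):
    """Hypothetical simple correction: scale the sampled count of
    outranking items back up to the size of the full catalogue."""
    return 1 + (s_rank - 1) * (n_items - 1) / n_samples


if __name__ == "__main__":
    rng = random.Random(0)
    n_items, exact = 10_000, 500
    for n_samples in (100, 1_000, 3_333):
        ranks = [sampled_rank(exact, n_items, n_samples, rng) for _ in range(200)]
        mean_sampled = sum(ranks) / len(ranks)
        mean_rescaled = sum(
            rescaled_rank(r, n_items, n_samples) for r in ranks
        ) / len(ranks)
        print(n_samples, round(mean_sampled, 1), round(mean_rescaled, 1))
```

In this sketch the sampled rank can never exceed the exact rank, because only a subset of the outranking items is ever seen. The rescaled value is correct on average under the uniform-sampling assumption, but any single estimate remains noisy when the sample is small.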

Keywords: perspective evaluating; metrics challenging; technical perspective; sampled metrics; paper; evaluating sampled

Journal Title: Communications of the ACM
Year Published: 2022


