This paper introduces a convenient strategy for coding and predicting sequences of independent, identically distributed random variables generated from a large alphabet of size $m$. In particular, the size of the sample is allowed to be variable. Using a Poisson model together with a tilting method simplifies the implementation and analysis through independence. The resulting strategy is optimal within the class of distributions satisfying a moment condition, and it is close to optimal for the class of all i.i.d. distributions on strings of a given length. The method can also be used to code and predict strings with a condition on the tail of the ordered counts, and it can be applied to distributions in an envelope class. Moreover, we show that our model permits exact computation of the minimax optimal code, for all alphabet sizes, when conditioning on the size of the sample.
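The independence gained from a Poisson sample size can be illustrated with the standard Poissonization identity. This is a sketch in assumed notation, not taken from the abstract: $N$ is the random sample size with mean $\lambda$, $p_j$ is the probability of symbol $j$, and $N_j$ is the count of symbol $j$, with $n = n_1 + \cdots + n_m$. If $N \sim \mathrm{Poisson}(\lambda)$ and, given $N = n$, the counts are multinomial with parameters $(n; p_1, \dots, p_m)$, then

$$
P(N_1 = n_1, \dots, N_m = n_m)
= \frac{e^{-\lambda} \lambda^{n}}{n!} \cdot \frac{n!}{\prod_{j=1}^{m} n_j!} \prod_{j=1}^{m} p_j^{n_j}
= \prod_{j=1}^{m} \frac{e^{-\lambda p_j} (\lambda p_j)^{n_j}}{n_j!},
$$

where the last step uses $\sum_{j} p_j = 1$ and $\lambda^{n} = \prod_{j} \lambda^{n_j}$. The counts are therefore independent with $N_j \sim \mathrm{Poisson}(\lambda p_j)$, so the joint distribution factors over symbols.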
               