
Sufficiently Informative and Relevant Features: An Information-Theoretic and Fourier-Based Characterization

A fundamental challenge in learning is the presence of nonlinear redundancies and dependencies in the data. To address this, we propose a Fourier-based approach to characterize feature redundancies in unsupervised learning and feature-label dependencies in the supervised variant of the problem. We first develop a novel Fourier expansion for functions (more generally, stochastic mappings) of correlated binary random variables. This is a generalization of the standard Fourier expansion on the Boolean cube beyond product probability spaces. As an important application of this analysis, we investigate learning with feature subset selection. In the unsupervised variant of this problem, we characterize feature redundancies via the Shannon entropy and group the features into sufficiently informative and redundant subsets. We then connect this grouping to the proposed Fourier expansion and derive an upper bound on the joint entropy. Based on that bound, we propose a measure quantifying feature redundancies and present an unsupervised learning algorithm. We test our method on various real-world and synthetic datasets and demonstrate improvements over conventional unsupervised feature selection techniques.

Next, we investigate supervised feature subset selection and reformulate it in the Fourier domain. By bridging the Bayes error rate with the Fourier coefficients, we demonstrate that the Fourier expansion provides a powerful tool for characterizing nonlinear feature-label dependencies. Further, we introduce a computationally efficient measure for selecting relevant features and show, via a theoretical analysis, that it finds provably asymptotically optimal feature subsets. Lastly, we present an algorithm based on this measure and demonstrate, through numerical experiments, its improvements over various supervised feature selection algorithms.
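The classical object the paper generalizes is the Fourier (Walsh) expansion of a function on the Boolean cube under the uniform product measure, where the coefficient on a subset S is f_hat(S) = E_x[f(x) · ∏_{i∈S} x_i]. A minimal sketch of this baseline expansion (function and variable names are illustrative, not taken from the paper):

```python
import itertools
import numpy as np

def fourier_coefficients(f, n):
    """Fourier coefficients of f: {-1, +1}^n -> R under the uniform
    product measure: f_hat(S) = E_x[f(x) * prod_{i in S} x_i].
    The paper extends this expansion to correlated binary variables;
    the uniform case here is the classical starting point."""
    points = list(itertools.product([-1, 1], repeat=n))
    subsets = itertools.chain.from_iterable(
        itertools.combinations(range(n), k) for k in range(n + 1))
    coeffs = {}
    for S in subsets:
        # Character chi_S(x) = prod_{i in S} x_i (equals 1 for S = ()).
        total = sum(f(x) * int(np.prod([x[i] for i in S])) for x in points)
        coeffs[S] = total / len(points)
    return coeffs

# XOR of two +/-1 variables is exactly the degree-2 character x0 * x1,
# so all of its Fourier weight sits on the subset (0, 1).
c = fourier_coefficients(lambda x: x[0] * x[1], 2)
```

Under a non-product (correlated) distribution the characters chi_S are no longer orthonormal, which is precisely the gap the paper's generalized expansion addresses.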
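For the unsupervised part, one way to build intuition for the entropy-based grouping is a greedy heuristic that keeps the features whose joint Shannon entropy grows fastest and treats the rest as redundant. This is only an illustrative proxy computed on the empirical distribution, not the paper's Fourier-based redundancy measure (all names are hypothetical):

```python
import numpy as np
from collections import Counter

def empirical_entropy(X):
    """Empirical Shannon entropy (in bits) of the joint distribution of
    the binary columns of X; rows of X are treated as i.i.d. samples."""
    counts = Counter(map(tuple, X))
    p = np.array(list(counts.values()), dtype=float) / len(X)
    return float(-(p * np.log2(p)).sum())

def greedy_informative_subset(X, k):
    """Greedily select k features maximizing joint entropy; unselected
    features are treated as (relatively) redundant given the selection."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining,
                   key=lambda j: empirical_entropy(X[:, selected + [j]]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Column 1 duplicates column 0, so an informative pair avoids the copy.
X = np.array([[0, 0, 0],
              [1, 1, 0],
              [0, 0, 1],
              [1, 1, 1]])
sel = greedy_informative_subset(X, 2)  # -> [0, 2]
```

Since the duplicated column adds no entropy, the greedy step skips it in favor of the independent column, mirroring the informative/redundant split described in the abstract.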

Keywords: Fourier-based; sufficiently informative; Fourier; feature; Fourier expansion; relevant features

Journal Title: IEEE Transactions on Information Theory
Year Published: 2022
