Abstract Urban sound classification depends heavily on feature extraction. In this paper, we present a compact and effective representation that characterizes different urban sounds by combining deep and handcrafted features. To this end, we propose a CNN model with a small parameter space to extract deep features, which are combined with handcrafted features extracted from the audio signals. We then apply a feature selection step to reduce feature dimensionality and to identify the handcrafted features that enrich the deep features and better discriminate between urban sounds. The feature selection experiments indicate that associating perceptual, static, and physical features with deep features improves classification performance and allows a dimensionality reduction of up to 62.32% for the combined descriptors. The proposed descriptors achieve a classification accuracy of 86.2% on the ESC (urban noises) dataset and 96.16% on the UrbanSound8K dataset, outperforming most state-of-the-art CNN models for urban sound classification.
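The combine-then-select pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not name the selection criterion, so a simple Fisher-style score (between-class over within-class variance) is used here purely as a stand-in. The function names, the toy feature values, and the choice of k are all hypothetical and do not come from the paper.

```python
from statistics import mean, pvariance


def combine(deep, handcrafted):
    """Concatenate per-clip deep and handcrafted feature vectors."""
    return [d + h for d, h in zip(deep, handcrafted)]


def fisher_scores(X, y):
    """Score each feature column: between-class over within-class variance.

    Assumption: this criterion is a stand-in, not the paper's method.
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        overall = mean(col)
        between = 0.0
        within = 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            between += len(vals) * (mean(vals) - overall) ** 2
            within += len(vals) * pvariance(vals)
        scores.append(between / within if within > 0 else 0.0)
    return scores


def select_top_k(X, y, k):
    """Keep the k highest-scoring columns, preserving column order."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in X]


def reduction_percent(original_dim, selected_dim):
    """Percentage of feature dimensions removed by selection."""
    return 100.0 * (original_dim - selected_dim) / original_dim


# Toy data (invented for illustration): 4 clips, 2 deep + 2 handcrafted features.
deep = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
handcrafted = [[5.0, 1.0], [5.1, 3.0], [5.0, 1.1], [5.1, 2.9]]
labels = [0, 0, 1, 1]

combined = combine(deep, handcrafted)
keep, selected = select_top_k(combined, labels, k=2)
print(keep, reduction_percent(len(combined[0]), len(selected[0])))
```

In this toy setup the two uninformative columns are dropped. In practice the selected descriptors would then be passed to a classifier, and the reduction percentage corresponds to the kind of figure the abstract reports for the combined descriptors.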
               