For environmental sound classification, CNNs have become the most successful architecture. By regarding the CNN features as a collection of nodes arranged on a 2D time-frequency grid, typical CNN layers process nodes within a limited local region. However, the rich relation information between nodes, especially the non-local relations, is mostly ignored. For environmental sound, these inter-node relations carry rich information about the existence of repetitive sound event patterns and the complex interactions between different sound events in acoustic scenes, which are valuable for categorizing environmental sound. In this letter, we propose a relation module, named the R-Block, to explore the relation information in an explicit and comprehensive way. The R-Block is designed not only to capture and utilize the inter-node relations, but also to explore the structure of the learned relations, which leads to a more expressive representation. Experimental results reveal that, by augmenting a powerful ResNeXt backbone with the R-Block, our model achieves competitive performance on the ESC-50 and US8K sound event classification datasets and a state-of-the-art result on the DCASE2018 acoustic scene classification dataset.
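The exact design of the R-Block is given in the paper; purely as an illustration of the underlying idea of modeling non-local inter-node relations on a 2D time-frequency feature map, a minimal non-local (self-attention-style) block could be sketched as follows. The function and weight names here are hypothetical, not the authors' implementation:

```python
import numpy as np

def relation_block(x, w_theta, w_phi, w_g):
    """Non-local relation sketch over a (C, T, F) time-frequency feature map.

    Each of the T*F grid positions is treated as a node; pairwise
    relations between all nodes are computed and used to aggregate
    node features, then added back via a residual connection.
    w_theta, w_phi: (C, d) query/key projections; w_g: (C, C) value projection.
    """
    C, T, F = x.shape
    nodes = x.reshape(C, T * F).T            # (N, C), N = T*F nodes
    theta = nodes @ w_theta                  # (N, d) query embedding
    phi = nodes @ w_phi                      # (N, d) key embedding
    g = nodes @ w_g                          # (N, C) value embedding
    rel = theta @ phi.T                      # (N, N) pairwise relation scores
    rel = np.exp(rel - rel.max(axis=1, keepdims=True))
    rel /= rel.sum(axis=1, keepdims=True)    # softmax over all nodes (non-local)
    out = rel @ g                            # relation-weighted aggregation
    return x + out.T.reshape(C, T, F)        # residual connection

# Example: relations span the whole grid, not a local neighborhood
rng = np.random.default_rng(0)
C, T, F, d = 4, 3, 5, 2
x = rng.normal(size=(C, T, F))
y = relation_block(x, rng.normal(size=(C, d)),
                   rng.normal(size=(C, d)), rng.normal(size=(C, C)))
```

Unlike a convolution with a small kernel, every output node here depends on all T*F nodes, which is what lets such a module pick up repetitive event patterns far apart in time or frequency.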