Rainstorms, insect swarms, and galloping horses produce “sound textures”: natural sounds that arise from the superposition of many similar acoustic events. Amid the steady stream of advances in generative modeling, deep convolutional neural networks (CNNs) have proven tremendously successful for image and sound synthesis. However, existing state-of-the-art sound texture generative models simply treat sound texture signals as one-dimensional images, ignoring the differences between the human visual and auditory systems. This paper considers mel-frequency statistical features, which are designed according to the human auditory system and are widely regarded as the dominant features for sound identification. We first construct a CNN structure, termed the mel-frequency CNN (MF-CNN), that extracts mel-frequency features from sounds losslessly. We then propose a novel sound texture generative model that incorporates the MF-CNN into a convolutional generative network composed of cascading upsampling groups. A jointly alternating back-propagation algorithm is proposed to train the overall network: feedback from the MF-CNN guides the gradients in both the inferential and the learning back-propagation passes, driving the mel-frequency features of the synthesized sounds closer to those of natural ones. Moreover, the proposed generative model can be extended to other sound synthesis tasks.
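
As a rough illustration of the MF-CNN idea, the sketch below expresses mel-frequency feature extraction entirely with fixed convolutional operations: the STFT becomes a 1-D convolution with DFT-basis kernels and the mel filterbank a fixed linear map, so the extractor is differentiable and can sit inside a larger generative network. The class name, layer sizes, and hyperparameters are illustrative assumptions, not the authors' exact architecture; the mel filterbank is taken from torchaudio.

```python
import torch
import torch.nn as nn
import torchaudio

class MelFrequencyCNN(nn.Module):
    """Fixed-weight CNN mapping a waveform to log-mel features.

    The STFT is written as a 1-D convolution with windowed DFT-basis
    kernels, and the mel filterbank as a fixed linear map over the
    frequency bins, so the whole extractor is differentiable.
    """
    def __init__(self, n_fft=512, hop=128, n_mels=64, sr=16000):
        super().__init__()
        # DFT basis (real and imaginary parts) as convolution kernels.
        k = torch.arange(n_fft, dtype=torch.float32)
        freqs = torch.arange(n_fft // 2 + 1, dtype=torch.float32)
        angles = 2 * torch.pi * freqs[:, None] * k[None, :] / n_fft
        window = torch.hann_window(n_fft)
        self.register_buffer("real", (torch.cos(angles) * window).unsqueeze(1))
        self.register_buffer("imag", (-torch.sin(angles) * window).unsqueeze(1))
        # Mel filterbank as a fixed (n_freqs, n_mels) matrix.
        mel = torchaudio.functional.melscale_fbanks(
            n_freqs=n_fft // 2 + 1, f_min=0.0, f_max=sr / 2,
            n_mels=n_mels, sample_rate=sr)
        self.register_buffer("mel", mel)
        self.hop = hop

    def forward(self, wav):                       # wav: (B, 1, T)
        re = nn.functional.conv1d(wav, self.real, stride=self.hop)
        im = nn.functional.conv1d(wav, self.imag, stride=self.hop)
        power = re ** 2 + im ** 2                 # (B, n_freqs, frames)
        mel = torch.einsum("bft,fm->bmt", power, self.mel)
        return torch.log(mel + 1e-6)              # log-mel features
```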
               
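The jointly alternating back-propagation scheme can likewise be sketched only under stated assumptions: a toy generator stands in for the cascading-upsampling network, plain gradient steps stand in for whatever inference dynamics the paper actually uses, and random tensors stand in for natural recordings. Only the alternation itself, latent inference followed by weight learning, with both passes steered by MF-CNN feature feedback, is taken from the abstract. The sketch reuses the MelFrequencyCNN class defined above.

```python
import torch
import torch.nn as nn

class ToyGenerator(nn.Module):
    """Illustrative stand-in for the cascading-upsampling generator."""
    def __init__(self, z_dim=64, out_len=16000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, out_len), nn.Tanh())
    def forward(self, z):
        return self.net(z).unsqueeze(1)           # (B, 1, T) waveform

gen = ToyGenerator()
mf_cnn = MelFrequencyCNN()                        # fixed extractor from above
opt = torch.optim.Adam(gen.parameters(), lr=1e-4)

real = torch.randn(4, 1, 16000)                   # placeholder "natural" sounds
z = torch.randn(4, 64, requires_grad=True)        # latent codes to be inferred

def mel_loss(fake, ref):
    # MF-CNN feedback: match mel-frequency features of fake and real sounds.
    return ((mf_cnn(fake) - mf_cnn(ref)) ** 2).mean()

for step in range(100):
    # Inferential back-propagation: refine latents z, generator frozen.
    for _ in range(5):
        grad_z, = torch.autograd.grad(mel_loss(gen(z), real), z)
        z = (z - 0.1 * grad_z).detach().requires_grad_(True)
    # Learning back-propagation: update generator weights given inferred z.
    opt.zero_grad()
    mel_loss(gen(z), real).backward()
    opt.step()
```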