In this paper, we propose a dilated convolutional model for music melody extraction. Taking variable-Q transforms (VQTs) as input, the model first applies consecutive convolution layers to capture local time-frequency patterns, followed by a single dilated convolution layer to capture global frequency patterns contributed by the pitches and harmonics of active notes. Compared with a baseline model without dilation, the proposed model substantially reduces the computational cost without compromising performance. Its advantages over existing models are twofold. First, it performs best on most datasets, for both general and vocal melody extraction. Second, it achieves the best performance with the least training data.
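The sketch below illustrates the kind of architecture the abstract describes: plain convolutions for local time-frequency patterns, then a single dilated convolution along the frequency axis whose enlarged receptive field can relate a pitch to its harmonics. It is a minimal PyTorch sketch only; the layer counts, channel widths, kernel sizes, and dilation rate are hypothetical assumptions, not the paper's actual configuration.

```python
# Illustrative sketch of the described architecture (assumed hyperparameters).
import torch
import torch.nn as nn


class DilatedMelodyNet(nn.Module):
    def __init__(self, n_channels=32, dilation=48):
        super().__init__()
        # Consecutive plain convolutions capture local
        # time-frequency patterns in the VQT input.
        self.local = nn.Sequential(
            nn.Conv2d(1, n_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(n_channels, n_channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # A single dilated convolution along the frequency axis relates
        # widely separated bins, so pitches and their harmonics can fall
        # inside one receptive field. The dilation rate here is a guess.
        self.global_freq = nn.Conv2d(
            n_channels, n_channels,
            kernel_size=(3, 1),      # (frequency, time)
            dilation=(dilation, 1),
            padding=(dilation, 0),   # preserves the frequency dimension
        )
        # Per-frame pitch-salience map over the frequency bins.
        self.head = nn.Conv2d(n_channels, 1, kernel_size=1)

    def forward(self, vqt):
        # vqt: (batch, 1, n_bins, n_frames)
        x = self.local(vqt)
        x = torch.relu(self.global_freq(x))
        return self.head(x)  # (batch, 1, n_bins, n_frames)


# Example: a batch of 2 VQT spectrograms with 360 bins and 128 frames.
model = DilatedMelodyNet()
salience = model(torch.randn(2, 1, 360, 128))
print(salience.shape)  # torch.Size([2, 1, 360, 128])
```

The computational saving the abstract mentions comes from this design choice: one dilated layer spans a wide frequency range with few parameters, where an undilated model would need many stacked layers (or very large kernels) to reach the same receptive field.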