This letter extends our prior work on context-dependent piano transcription to estimate the length of the notes in addition to their pitch and onset. This approach employs convolutional sparse coding… Click to show full abstract
This letter extends our prior work on context-dependent piano transcription to estimate the length of the notes in addition to their pitch and onset. This approach employs convolutional sparse coding along with lateral inhibition constraints to approximate a musical signal as the sum of piano note waveforms (dictionary elements) convolved with their temporal activations. The waveforms are pre-recorded for the specific piano to be transcribed in the specific environment. A dictionary containing multiple waveforms per pitch is generated by truncating a long waveform for each pitch to different lengths. During transcription, the dictionary elements are fixed and their temporal activations are estimated and postprocessed to obtain the pitch, onset, and note length estimation. A sparsity penalty promotes globally sparse activations of the dictionary elements, and a lateral inhibition term penalizes concurrent activations of different waveforms corresponding to the same pitch within a temporal neighborhood, to achieve note length estimation. Experiments on the MIDI aligned piano sounds dataset show that the proposed approach significantly outperforms a state-of-the-art music transcription method trained in the same context-dependent setting in transcription accuracy.
               
Click one of the above tabs to view related content.