Design of MELPe-Based Variable-Bit-Rate Speech Coding with Mel Scale Approach Using Low-Order Linear Prediction Filter and Representing Excitation Signal Using Glottal Closure Instants

In this paper, we propose a variable-bit-rate speech codec based on mixed excitation linear prediction enhanced (MELPe) with an average bit rate of 2 kbps and a better representation of the excitation signal. The order of the prediction filter in the MELPe coding architecture is reduced from 10 to 7 without affecting the perceptual quality of the decoded speech by using the psychoacoustic Mel scale. An efficient two-split vector quantization with a weighted Euclidean distance measure is developed for Mel scale-based linear predictive coding (Mel-LPC), and it requires only 18 bits/frame. The instantaneous pitch, or epoch, which is vital for many speech processing applications, is preserved in this codec by including it in the excitation signal used for reconstructing the voiced speech. The quantization scheme developed for glottal closure instants (GCIs) increases the bit requirement for voiced frames by 4–25 bits, depending on the position of the GCIs. To compensate for that, the Mel-LPC order for both silence and unvoiced frames has been brought down to 4 without compromising the perceptual quality of the reconstructed speech. The lowered bit budget for unvoiced frames is 41 bits/frame, and for silence frames, it is 31 bits/frame. Further reduction of 10 bits for silence frames is obtained by reducing the number of transmitted parameters and by tuning the quantization bit requirement for each. For categorizing the speech frames at the entry of the encoder, a neural network-based voiced/unvoiced/silence classification algorithm using a five-dimensional feature set is developed. The experimental results show that the proposed coding scheme operates at an average bit rate of 2 kbps, which is less than the bit rate of MELPe (2.4 kbps), but with a better perceptual score. In addition, the incorporation of Mel-LPC gives better performance in the estimation of formants and GCIs.
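
The two-split vector quantization mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives only the filter order (7), the distance measure (weighted Euclidean), and the total budget (18 bits/frame); everything else here is an assumption made for illustration: the 3 + 4 split point, the 9 + 9 bit allocation, the uniform weights, the randomly generated codebooks, and all function names. A real codec would train its codebooks on speech data and derive perceptual weights per frame.

```python
import random

def weighted_sq_dist(x, c, w):
    """Weighted squared Euclidean distance between vectors x and c."""
    return sum(wi * (xi - ci) ** 2 for xi, ci, wi in zip(x, c, w))

def nearest_index(x, codebook, w):
    """Index of the codeword closest to x under the weighted distance."""
    return min(range(len(codebook)),
               key=lambda i: weighted_sq_dist(x, codebook[i], w))

def split_vq_encode(vec, weights, cb_low, cb_high, split):
    """Quantize vec[:split] and vec[split:] independently; return two indices."""
    i_low = nearest_index(vec[:split], cb_low, weights[:split])
    i_high = nearest_index(vec[split:], cb_high, weights[split:])
    return i_low, i_high

def split_vq_decode(i_low, i_high, cb_low, cb_high):
    """Rebuild the full vector by concatenating the two selected codewords."""
    return cb_low[i_low] + cb_high[i_high]

# Toy setup. Hypothetical values: split point, bit allocation, codebooks, weights.
ORDER, SPLIT = 7, 3
BITS_LOW, BITS_HIGH = 9, 9
rng = random.Random(0)
cb_low = [[rng.uniform(0, 1) for _ in range(SPLIT)]
          for _ in range(2 ** BITS_LOW)]
cb_high = [[rng.uniform(0, 1) for _ in range(ORDER - SPLIT)]
           for _ in range(2 ** BITS_HIGH)]
weights = [1.0] * ORDER  # placeholder; not the paper's weighting
frame = [rng.uniform(0, 1) for _ in range(ORDER)]

i_low, i_high = split_vq_encode(frame, weights, cb_low, cb_high, SPLIT)
decoded = split_vq_decode(i_low, i_high, cb_low, cb_high)
bits_per_frame = BITS_LOW + BITS_HIGH  # two indices are transmitted per frame
```

Splitting the vector keeps the search tractable: two codebooks of 512 entries each are searched instead of a single codebook of 2^18 entries, at the cost of ignoring correlation between the two halves.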

Keywords: speech; excitation signal; bit rate; bit

Journal Title: Arabian Journal for Science and Engineering
Year Published: 2019
