This paper presents the derivation of a new perceptual model that represents speech and audio signals by a sum of exponentially damped sinusoids. Compared to a traditional sinusoidal model, the exponential sinusoidal model (ESM) is better suited to modeling the transient segments that occur frequently in audio signals.
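To make the model concrete, the sketch below synthesizes one real-valued ESM frame x[n] = sum_k a_k exp(-d_k n) cos(2 pi f_k n + phi_k). It assumes frequencies normalized to cycles per sample and dampings given as per-sample decay rates (d_k >= 0). The function name `esm_synthesize` is hypothetical and is not taken from the paper.

```python
import numpy as np

def esm_synthesize(amps, phases, freqs, dampings, n_samples):
    """Sum of exponentially damped sinusoids (one real-valued ESM frame).

    freqs: normalized frequencies (cycles per sample), assumed.
    dampings: per-sample decay rates d >= 0, assumed.
    With all dampings equal to zero this reduces to a plain sinusoidal model.
    """
    n = np.arange(n_samples)
    x = np.zeros(n_samples)
    for a, phi, f, d in zip(amps, phases, freqs, dampings):
        # Each component decays as exp(-d * n) while oscillating at frequency f
        x += a * np.exp(-d * n) * np.cos(2 * np.pi * f * n + phi)
    return x
```

A nonzero d_k lets a single component die out quickly after an onset. An undamped sinusoid cannot do this, which is why the ESM handles transients with fewer components.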
Total least squares (TLS) algorithms are applied for the automatic extraction of the modeling parameters in the ESM, i.e. the amplitude, phase, frequency and damping factors of a user-defined number of damped sinusoids. In order to turn the SNR optimization criterion of these TLS algorithms into a perceptual modeling strategy, we use the psychoacoustic model of MPEG-1 Layer 1 in a subband TLS-ESM scheme. This allows us to model each subband signal in accordance with its perceptual relevance, thereby lowering the number of required modeling components for a given modeling quality. Simulations and listening tests confirm that perceptual ESM achieves the same perceived quality as plain ESM while using substantially fewer components, and they support applying the new model in the fields of parametric audio processing and coding.
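The abstract does not say which TLS algorithm is used. A widely known member of this family is HTLS (Hankel total least squares), and the sketch below follows that approach as an assumption, not as the paper's exact procedure. It builds a Hankel matrix from the frame and keeps its dominant left singular vectors as the signal subspace. It then solves the shift-invariance equation A Z ≈ B in the TLS sense through the SVD of [A B]. The eigenvalues z_k = exp(-d_k + j 2 pi f_k) of Z give the frequencies and dampings, and a linear least-squares fit on the Vandermonde matrix gives the complex amplitudes. The function name `htls` and the choice of L = N // 2 Hankel rows are hypothetical choices for illustration.

```python
import numpy as np

def htls(x, n_exp):
    """HTLS-style estimate of x[n] ~ sum_k c_k * z_k**n (sketch, assumed variant).

    Returns the poles z_k and the complex amplitudes c_k.
    """
    x = np.asarray(x, dtype=complex)
    N = len(x)
    L = N // 2                      # number of Hankel rows (assumed choice)
    M = N - L + 1
    # Hankel matrix: H[i, j] = x[i + j]
    H = np.array([x[i:i + M] for i in range(L)])
    U, _, _ = np.linalg.svd(H, full_matrices=False)
    Uk = U[:, :n_exp]               # dominant left singular vectors = signal subspace
    A, B = Uk[:-1], Uk[1:]          # shift invariance: A @ Z ~= B
    # TLS solution from the right singular vectors of the compound matrix [A B]
    _, _, Vh = np.linalg.svd(np.hstack([A, B]))
    V = Vh.conj().T
    V12 = V[:n_exp, n_exp:]
    V22 = V[n_exp:, n_exp:]
    Z = -V12 @ np.linalg.inv(V22)
    z = np.linalg.eigvals(Z)        # poles exp(-d_k + 1j * 2 * pi * f_k)
    # Complex amplitudes by linear least squares on the Vandermonde matrix
    n = np.arange(N)
    vdm = z[np.newaxis, :] ** n[:, np.newaxis]
    c, *_ = np.linalg.lstsq(vdm, x, rcond=None)
    return z, c
```

For a real signal, each real damped sinusoid appears as a conjugate pole pair. It therefore needs two complex exponentials, so n_exp is twice the number of sinusoids. Taking the positive-frequency pole of each pair, the ESM parameters follow as f_k = angle(z_k) / (2 pi), d_k = -log|z_k|, a_k = 2|c_k| and phi_k = angle(c_k).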
Kris Hermus, Werner Verhelst, Philippe Lemmerling, Patrick Wambacq and Sabine Van Huffel, "Perceptual Audio Modeling with Exponentially Damped Sinusoids", Signal Processing, volume 85, No. 1, pages 163-176, January 2005.