European Archives of Oto-Rhino-Laryngology, vol. 261, issue 10, pp. 541-547
This paper describes our first attempts to develop a method for objectively assessing the quality of substitution voices. The objective analysis deals with acoustic parameters characterising short voice and speech samples, such as a sequence of isolated vowels, a sequence of VCV and CVCVCV syllables, or a short sentence. A database was collected comprising 113 recordings from 68 patients (53 total laryngectomy patients with tracheo-esophageal speech, 14 total laryngectomy patients with esophageal speech and 5 patients with partial frontolateral laryngectomy) and 6 recordings from healthy control speakers. Each recording consisted of seven speech utterances and was subjected to both an acoustic analysis and a perceptual evaluation, the latter involving eight parameters such as "overall impression" and "tonicity". Since the goal of our work is to find the acoustic measurement that best supports the perceptual evaluation and makes it more precise, it seemed logical to aim for a perceptually based acoustic analysis. We therefore performed the analysis with a peripheral auditory model that includes a built-in fundamental frequency (pitch) extractor. From the analyser's frame-level outputs (frames of 10 ms), global objective parameters were computed for each speech utterance, including (1) the percentage of voiced frames, (2) the average voicing evidence, (3) the voicing length distribution and (4) the fundamental frequency jitter. To reduce parameter variability arising from the nature of the speech utterances (e.g. pauses in the signal or pitch extraction errors), the objective parameters were computed with non-standard averaging schemes involving energy weighting and frame selection.
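The abstract does not spell out the averaging schemes, so the following Python sketch shows only one plausible way to turn frame-level analyser outputs into the four global parameters. The frame fields (`energy`, `evidence`, `f0`), the function name `global_parameters`, the 0.5 voicing threshold, the -40 dB energy floor used for frame selection and the frame-to-frame definition of jitter are all hypothetical choices, not the authors' method.

```python
import math
from dataclasses import dataclass
from itertools import groupby

FRAME_MS = 10  # frame step stated in the abstract


@dataclass
class Frame:
    energy: float    # linear frame energy (hypothetical analyser output)
    evidence: float  # voicing evidence in [0, 1] (hypothetical scale)
    f0: float        # F0 estimate in Hz; 0.0 when no pitch is found


def global_parameters(frames, voiced_threshold=0.5, floor_db=-40.0):
    """Sketch: frame-level outputs -> global parameters (assumed definitions).

    Assumes at least one frame has positive energy.
    """
    # Frame selection (hypothetical): drop low-energy frames such as pauses,
    # i.e. frames more than |floor_db| dB below the loudest frame.
    peak = max(f.energy for f in frames)
    floor = peak * 10 ** (floor_db / 10)
    keep = [f.energy >= floor for f in frames]
    # A frame counts as voiced if it is selected, its voicing evidence
    # reaches the threshold and the pitch extractor returned an F0.
    voiced = [k and f.evidence >= voiced_threshold and f.f0 > 0
              for k, f in zip(keep, frames)]

    selected = [f for f, k in zip(frames, keep) if k]
    total_energy = sum(f.energy for f in selected)

    # (1) Percentage of voiced frames among the selected frames.
    percent_voiced = 100.0 * sum(voiced) / len(selected)
    # (2) Energy-weighted average voicing evidence over the selected frames.
    avg_evidence = sum(f.energy * f.evidence for f in selected) / total_energy
    # (3) Voicing length distribution: durations (ms) of consecutive voiced runs.
    lengths = [FRAME_MS * len(list(run)) for is_v, run in groupby(voiced) if is_v]
    # (4) Frame-to-frame F0 jitter: mean absolute period change between
    # adjacent voiced frames, relative to the mean period, in percent.
    periods, diffs = [], []
    for i, f in enumerate(frames):
        if voiced[i]:
            periods.append(1.0 / f.f0)
            if i > 0 and voiced[i - 1]:
                diffs.append(abs(1.0 / f.f0 - 1.0 / frames[i - 1].f0))
    jitter = (100.0 * (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))
              if diffs else float("nan"))

    return {"percent_voiced": percent_voiced, "avg_evidence": avg_evidence,
            "voicing_lengths_ms": lengths, "jitter_percent": jitter}
```

Because frames removed by selection are also marked unvoiced, a pause splits a voiced run instead of bridging it, which keeps silent gaps from inflating the voicing lengths, while energy weighting lets loud, reliable frames dominate the average voicing evidence.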
A statistical analysis of the objective parameters confirms that the quality of tracheo-esophageal speech is superior to that of esophageal speech, but inferior to that of normal speech and of speech produced with one vocal fold preserved. Correlations between the objective parameters and the perceptual parameters are moderate.
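The abstract does not state which correlation coefficient was used. As an illustration only, the sketch below computes Pearson's r between one objective parameter and one perceptual rating across recordings; the function name `pearson_r` is hypothetical, and because perceptual scores are ordinal, applying the same computation to ranks (Spearman's rank correlation) may be the more appropriate choice.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between paired per-recording values (sketch)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance and variance terms (the common 1/n factors cancel out).
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

The square of r gives the fraction of the variance in the perceptual rating that is linearly explained by the objective parameter.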