S522, Kewley-Port
Spring, 2008

FINAL Assignment (30 points)

The final assignment is due Wed. 4/30 at noon in DKP's mailbox. SELECT ONE of the following projects.


Several final projects are offered below. However, if you prefer, you may do a different final assignment that uses your DSP skills. You must obtain the instructor's consent by Wednesday, 4/16, to do a different assignment.


Accuracy of Measuring Formants, 2005

Formant frequencies measured from spectrograms are known to have errors of ±60 Hz in the F2 region [Monsen RB, Engebretson AM. The accuracy of formant frequency measurements: a comparison of spectrographic analysis and linear prediction. J Speech Hear Res 1983 Mar;26(1):89-97]. The purpose of this lab is to determine how accurately F2 can be measured for known steady-state and linearly changing formants. The stimuli were generated with the Klatt synthesizer (Fs = 10000 Hz) and are available on Oncourse->Resources->ASSFIN. In mpg200.wav, F2 is flat at 2272 Hz; in mpg203.wav, F2 rises linearly over 165 ms from 2272 Hz to 2287 Hz. The exact Klatt parameters for F2 and the amplitude of voicing (AV) in mpg203.wav are as follows:

Time (ms)   F2 (Hz)   AV (dB)
0           2272      0
15          2272      42
30                    60
180         2287      48
200                   0
240         2287      0
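To compare a measurement taken at a given time with the synthesis value, it helps to know what F2 the synthesizer was asked to produce at that instant. The Python sketch below assumes F2 changes linearly between the times where it is specified in the table (0, 15, 180 and 240 ms), which matches the description of mpg203.wav; the function name klatt_f2 is hypothetical.

```python
# F2 breakpoints for mpg203.wav from the Klatt table: (time in ms, F2 in Hz).
# Rows with a blank F2 cell are skipped; F2 is assumed to change linearly
# between the remaining breakpoints.
F2_BREAKPOINTS = [(0.0, 2272.0), (15.0, 2272.0), (180.0, 2287.0), (240.0, 2287.0)]


def klatt_f2(t_ms, breakpoints=F2_BREAKPOINTS):
    """Return the synthesis F2 (Hz) at time t_ms by piecewise-linear interpolation."""
    if t_ms <= breakpoints[0][0]:
        return breakpoints[0][1]  # hold the first value before the first breakpoint
    for (t0, f0), (t1, f1) in zip(breakpoints, breakpoints[1:]):
        if t0 <= t_ms <= t1:
            return f0 + (f1 - f0) * (t_ms - t0) / (t1 - t0)
    return breakpoints[-1][1]  # hold the last value after the final breakpoint


if __name__ == "__main__":
    # During the rise, F2 changes by 15 Hz over 165 ms (about 0.09 Hz per ms).
    for t in (15, 30, 97.5, 180, 200):
        print(f"{t:6.1f} ms -> {klatt_f2(t):8.2f} Hz")
```

For mpg200.wav, the same function with every breakpoint set to 2272 Hz returns the flat value. Note that the whole rise in mpg203.wav is only 15 Hz, about 1.5 FFT bins at a spacing of 10000/1024 Hz.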

a) For this project, use Colea to measure the formant frequencies of the two vowels near their onsets and offsets. Adjust your LPC parameters until you find a single set that gives the best measurements of both the flat and the rising F2.
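Colea's internal formant-picking code is not described here, but LPC formant estimates in general come from the angles and radii of the complex poles of the LPC filter. The sketch below shows that conversion, plus a common rule-of-thumb starting order of about Fs/1000 + 2 coefficients. Both are general LPC practice rather than documented Colea behavior, and the function names are hypothetical.

```python
import cmath
import math


def lpc_order_rule_of_thumb(fs_hz):
    """Heuristic starting LPC order: one pole pair per kHz of bandwidth, plus 2."""
    return int(round(fs_hz / 1000.0)) + 2


def pole_to_formant(z, fs_hz):
    """Convert one complex LPC pole z to (center frequency Hz, approximate bandwidth Hz)."""
    freq_hz = abs(cmath.phase(z)) * fs_hz / (2.0 * math.pi)
    bandwidth_hz = -math.log(abs(z)) * fs_hz / math.pi
    return freq_hz, bandwidth_hz


if __name__ == "__main__":
    fs = 10000.0
    print("starting order:", lpc_order_rule_of_thumb(fs))
    # Build a pole for a 2272 Hz resonance with an 80 Hz bandwidth, then recover it.
    r = math.exp(-math.pi * 80.0 / fs)
    z = cmath.rect(r, 2.0 * math.pi * 2272.0 / fs)
    print(pole_to_formant(z, fs))
```

Each conjugate pole pair gives one candidate formant. Raising the order adds poles, which can split a peak or introduce spurious ones, so the best single setting still has to be found by trial.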

b) Use STRAIGHT to calculate a smoothed spectrum for each of the two vowels. Store the n3sgram [save n3sgram.txt n3sgram -ASCII]. Then use mesh and the colon operator to find the F2 peaks at the onsets and offsets of the vowels [try pkpick.m in DSPFIRST]. Calculate the frequencies of the peaks, given that STRAIGHT uses a 1024-point FFT and Fs is 10000 Hz.
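The frequency of an n3sgram row follows from the FFT bin spacing, Fs/N = 10000/1024 ≈ 9.77 Hz. Assuming row 1 holds 0 Hz (MATLAB indexes from 1), row r corresponds to (r − 1)·Fs/N. The Python sketch below does that conversion and a simple local-maximum search in an assumed F2 band of 1800 to 2800 Hz. It is not pkpick.m, whose interface is not reproduced here. The parabolic refinement is an extra, commonly used technique for estimating a peak that falls between bins, which matters here because the whole F2 change in mpg203.wav is only about 1.5 bins. All function names are hypothetical.

```python
FS_HZ = 10000.0  # sample rate of the Klatt stimuli
NFFT = 1024      # FFT length used for the smoothed spectrum


def row_to_hz(row, nfft=NFFT, fs_hz=FS_HZ):
    """Frequency (Hz) of a 1-based MATLAB row of n3sgram; row 1 is 0 Hz."""
    return (row - 1) * fs_hz / nfft


def f2_peak_row(spectrum, lo_hz=1800.0, hi_hz=2800.0, nfft=NFFT, fs_hz=FS_HZ):
    """Return the 1-based row of the largest local maximum between lo_hz and hi_hz.

    `spectrum` is one time frame (column) of n3sgram as a Python list, so
    list index i corresponds to MATLAB row i + 1. Returns None if no peak is found.
    """
    best = None
    for i in range(1, len(spectrum) - 1):
        hz = i * fs_hz / nfft
        is_peak = spectrum[i - 1] < spectrum[i] >= spectrum[i + 1]
        if lo_hz <= hz <= hi_hz and is_peak:
            if best is None or spectrum[i] > spectrum[best]:
                best = i
    return None if best is None else best + 1


def refine_peak_hz(spectrum, row, nfft=NFFT, fs_hz=FS_HZ):
    """Fit a parabola through the peak row and its two neighbors; return its vertex in Hz."""
    i = row - 1
    a, b, c = spectrum[i - 1], spectrum[i], spectrum[i + 1]
    denom = a - 2.0 * b + c
    offset = 0.0 if denom == 0 else 0.5 * (a - c) / denom
    return (i + offset) * fs_hz / nfft
```

For example, a peak found in row 234 lies at 233 × 10000/1024 ≈ 2275.4 Hz before refinement.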

c) Compare the LPC and smoothed-spectrum measurements of F2 to the original Klatt F2 values used in synthesis. Comment on the procedures, and on which one you think would be more accurate for measuring diphthongs in real speech.


Morphing Speech Sounds, 2008 version

Classic experiments in speech perception have used continua between pairs of speech sounds to index categorical perception or auditory discrimination abilities. The goal of this project is for you to create your own speech continuum from natural speech samples and to compare how well the resynthesized endpoints match the original recordings. You will use the STRAIGHT morphing algorithms as well as the tutorial documents on morphing found on the STRAIGHT website. Also download and read Hisami Matsui and Hideki Kawahara, "Investigation of Emotionally Morphed Speech Perception and its Structure Using a High Quality Speech Manipulation System," Proc. Eurospeech'03, pp. 2113-2116, 2003.

For this project you will need to download the following:
1) morphing_continuum.m and two endpoint stimuli (from Oncourse->Resources->ASSFIN->morphing.zip)
2) STRAIGHT V.40_006 (from STRAIGHT website)
3) STRAIGHT morphing package V.20_007b (from STRAIGHT website)
4) Colea (from Colea website)

There appears to be a bug in Kawahara's morphing code: the frequency axis of the smoothed-spectrum display is labeled incorrectly. The directions below work around this problem so that you can still do the project.

1) Run morphing_continuum.m, referring to the STRAIGHT documentation on morphing. When the smoothed spectrum for ex_baed.wav appears, you will see a frequency axis from 0 to 7000 Hz that is NOT correct. Roughly, the true range is 0 to 5000 Hz, possibly on a log scale. Because of this, the lowest "formant" in the display is actually F0. So when you mark the anchor points for the formants, skip the lowest peak and start marking at the one near 1000 Hz. Mark 3 or 4 formants at 5 time points in both ex_baed.wav and ex_daed.wav. These files have an unusual sample rate, Fs = 24414 Hz, which may be part of the problem.
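Because the unusual sample rate may be behind the axis problem, it can be worth confirming Fs from each file header before choosing analysis settings. This sketch uses Python's standard wave module, which reads uncompressed PCM .wav files only; the helper name wav_info is hypothetical. At 24414 Hz the Nyquist frequency is 12207 Hz.

```python
import wave


def wav_info(path):
    """Return (sample rate Hz, Nyquist frequency Hz, duration s) from a PCM .wav header."""
    with wave.open(path, "rb") as w:
        fs = w.getframerate()
        duration_s = w.getnframes() / fs
    return fs, fs / 2.0, duration_s
```

For example, wav_info("ex_baed.wav")[0] should report 24414 if the file matches the rate stated above.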

2) Generate an 11-step morphed continuum using morphing_continuum.m. Using the save button on the smoothed-spectrum windows, save the anchor-marked spectra for the original and resynthesized endpoints, and for 'morph_baed-daed_6.wav', as .jpg files.

3) To verify whether the morphing worked, first use your ears and comment on how closely the resynthesized tokens match the originals. Then use Colea to measure the first 3 formant frequencies at your first time anchor only (you can read its time from your saved .jpg files). For these measurements I recommend LPC settings of 24 coefficients (because of the high sample rate) and a 20 ms window. You can also try formant tracks, but they failed for me on some files. Measure formants at this same first time point for the morphed continuum endpoints ('morph_baed-daed_1.wav' and 'morph_baed-daed_11.wav'). Compare these measurements and your perceptual impressions to the original recordings, and use the measurements to describe how well the endpoints were resynthesized. Also measure the formant frequencies at the first time point for the midpoint file, 'morph_baed-daed_6.wav'. Are the measurements for this midpoint what you would predict from the recordings?
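One way to form a prediction for the midpoint is to interpolate between the endpoint formant values measured at the first anchor. The sketch below assumes the 11 files are equally spaced in morphing rate, with file 1 at rate 0 (ex_baed) and file 11 at rate 1 (ex_daed), so that file 6 sits at 0.5. It offers both linear and logarithmic (geometric) interpolation of frequency, because this handout does not say which scale STRAIGHT's morphing uses. The function names are hypothetical, and the endpoint values in the example are placeholders, not measurements.

```python
def morph_rate(step, n_steps=11):
    """Morphing rate of morph_baed-daed_<step>.wav, assuming equal spacing from 0 to 1."""
    if not 1 <= step <= n_steps:
        raise ValueError("step must be between 1 and n_steps")
    return (step - 1) / (n_steps - 1)


def predicted_formant(f_start_hz, f_end_hz, step, n_steps=11, log_scale=False):
    """Interpolate one formant between the two endpoints at the given continuum step."""
    r = morph_rate(step, n_steps)
    if log_scale:
        return f_start_hz ** (1.0 - r) * f_end_hz ** r
    return (1.0 - r) * f_start_hz + r * f_end_hz


if __name__ == "__main__":
    # Placeholder endpoint values (Hz) for F1 to F3; replace with your Colea measurements.
    baed = [1000.0, 1500.0, 2500.0]
    daed = [1000.0, 2000.0, 3000.0]
    for k, (fa, fb) in enumerate(zip(baed, daed), start=1):
        lin = predicted_formant(fa, fb, 6)
        geo = predicted_formant(fa, fb, 6, log_scale=True)
        print(f"F{k}: linear {lin:.0f} Hz, log-scale {geo:.0f} Hz")
```

If the measured midpoint falls between the two predictions, or close to one of them, that is some evidence about how the morph interpolates frequency.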

4) Explain the 4 basic steps that the morphing code morphing_continuum.m performs.

5) For your project, prepare a brief written report describing your overall opinion of and experience with morphing, and how successfully the endpoints were resynthesized. Include printouts of your .jpg files from step 2. Upload your morph_baed-daed_1.wav, morph_baed-daed_6.wav, and morph_baed-daed_11.wav files to your dropbox in Oncourse.

D. Fogerty Last update 4/26/08


Cochlear Implant Project, 2008 version

The assignment is to implement a speech processing demo for the Med-El cochlear implant processor based on the description in the Loizou article, section 5.3 (from the weekly schedule and handout). Med-ElDemo.m should be patterned closely after the MELCOCH.M MATLAB demo from Herrick & Ueda. Like that demo, Med-ElDemo.m should produce six (or more) similar figures and play the two sound files. You may modify the melcoch.m demo, carefully editing the comments to reflect your processing, or start from scratch. You may use the 'dilemma' sound in DILMBIN.MAT, or make Med-ElDemo.m more general, as you choose. The files MELCOCH.M and DILMBIN.MAT are in Oncourse->Resources->ASSFIN.
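In CIS-type processing, each bandpass channel is reduced to a slowly varying envelope before compression. Loizou's section 5.3 should decide the exact method for the Med-El processor. The sketch below uses a generic full-wave rectifier followed by a one-pole low-pass filter, with a 200 Hz cutoff chosen only for illustration. The function name is hypothetical, and the code is Python rather than MATLAB, so treat it as a description of the signal flow rather than a drop-in piece of the demo.

```python
import math


def envelope(x, fs_hz, cutoff_hz=200.0):
    """Full-wave rectify x, then smooth it with a one-pole low-pass filter.

    y[n] = (1 - a) * |x[n]| + a * y[n - 1], with a = exp(-2*pi*cutoff/fs),
    which has unity gain at 0 Hz.
    """
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs_hz)
    y, prev = [], 0.0
    for sample in x:
        prev = (1.0 - a) * abs(sample) + a * prev
        y.append(prev)
    return y


if __name__ == "__main__":
    fs = 10000.0
    # A 100 Hz tone switched on after 0.1 s; its envelope should average about 2/pi.
    tone = [0.0] * 1000 + [math.sin(2 * math.pi * 100 * n / fs) for n in range(9000)]
    env = envelope(tone, fs)
    print(f"before onset: {env[999]:.3f}, steady-state mean: {sum(env[-1000:]) / 1000:.3f}")
```

A lower cutoff gives a smoother envelope but blurs rapid amplitude changes; the value should follow the article rather than this example.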

Include a few comments in a brief written description of your project. First, explain the criteria you used to select the eight logarithmically spaced bandpass filters, in relation to the Baskent & Shannon article. For compression, implement the function described in section 4.2.1 (CIS parameters, Compression), which refers to Fig. 19. Set MCL (most comfortable loudness level) to 2^15 and set THR (threshold) 5 or 6 dB lower; the range between THR and MCL is your compression range. THR and MCL should be the same for all eight channels.
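Three pieces of arithmetic in this paragraph can be sketched in Python. The band edges are spaced logarithmically, each edge a constant ratio above the previous one; the 300 to 5000 Hz overall range in the example is a placeholder for the range you justify from Baskent & Shannon. THR is computed as MCL lowered by a given number of dB, treating both as amplitudes (20·log10). The compression map assumes the logarithmic form Y = A·log10(x) + B, with A and B chosen so that an input envelope range [x_min, x_max] maps onto [THR, MCL]; check this form, and how x_min and x_max should be chosen, against section 4.2.1 and Fig. 19 before using it. All function names are hypothetical.

```python
import math

MCL = 2.0 ** 15  # most comfortable loudness level, as set in the assignment


def log_band_edges(lo_hz, hi_hz, n_bands=8):
    """Return n_bands + 1 edge frequencies with a constant ratio between neighbors."""
    ratio = (hi_hz / lo_hz) ** (1.0 / n_bands)
    return [lo_hz * ratio ** k for k in range(n_bands + 1)]


def thr_from_mcl(mcl, db_below):
    """Amplitude that lies db_below dB under mcl (20*log10 convention)."""
    return mcl * 10.0 ** (-db_below / 20.0)


def log_compress(x, x_min, x_max, thr, mcl):
    """Map envelope amplitude x in [x_min, x_max] to [thr, mcl] with Y = A*log10(x) + B.

    Inputs outside [x_min, x_max] are clipped first so the output stays in range.
    """
    x = min(max(x, x_min), x_max)
    a = (mcl - thr) / (math.log10(x_max) - math.log10(x_min))
    b = thr - a * math.log10(x_min)
    return a * math.log10(x) + b


if __name__ == "__main__":
    edges = log_band_edges(300.0, 5000.0)  # placeholder overall range
    print("band edges (Hz):", [round(e) for e in edges])
    thr = thr_from_mcl(MCL, 6.0)
    print(f"THR = {thr:.0f} for MCL = {MCL:.0f}")
    # Placeholder 40 dB envelope range (0.01 to 1.0).
    for x in (0.01, 0.1, 1.0):
        print(x, round(log_compress(x, 0.01, 1.0, thr, MCL)))
```

Because THR and MCL are shared by all eight channels, the same (THR, MCL) pair is passed to every channel's compression call; only the envelope differs.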

By the Wednesday due date, e-mail me your Med-ElDemo.m file (and any other .m files you develop), with your initials in the .m file names, so I can listen to your output and see the six figures, along with your brief comments.

{Updated on 4/26/08}