CZECH TECHNICAL UNIVERSITY IN PRAGUE
STUDY PLANS
2023/2024

Speech Processing

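Code: BE2M31ZRE
Completion: Z,ZK (assessment and examination)
Credits: 6
Scope: 2P+2C (2 hours of lectures + 2 hours of exercises per week)
Language of instruction: English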

Enrollment in BE2M31ZRE requires that the student has enrolled, no later than in the same semester, in the required number of courses from the group BEZBM.

Course guarantor:
Petr Pollák
Lecturer:
Petr Pollák
Instructor:
Petr Pollák
Department:
Department of Circuit Theory
Annotation:

The course covers the fundamentals of speech processing and is intended for students of the master's programme. The speech technologies discussed are currently applied in many systems across different fields (e.g. information dialogue systems, voice-controlled devices, dictation systems, transcription of audio-video recordings, support for language teaching). Students will learn basic algorithms for speech analysis (spectral analysis, LPC, cepstral analysis, pitch, formants, etc.), the principles of speech recognition (GMM-HMM and ANN-HMM systems, small- and large-vocabulary recognizers) and speaker recognition (based on VQ and GMM), as well as speech synthesis and speech enhancement. Further information can be found at http://noel.feld.cvut.cz/vyu/be2m31zre. Enrolled students will find detailed information on the Moodle FEL teaching portal (https://moodle.fel.cvut.cz).
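As an illustration of the LPC analysis mentioned above, here is a minimal Python sketch of the autocorrelation method solved with the Levinson-Durbin recursion. It is not the course material (the exercises use MATLAB); the function names are hypothetical, and it assumes the frame has already been windowed and has nonzero energy (r[0] > 0).

```python
def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of one (already windowed) frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Levinson-Durbin recursion for the LPC normal equations.

    Returns (a, err, refl): predictor coefficients a[0..p-1] such that
    x[n] ~ sum_{j=1}^{p} a[j-1] * x[n-j], the final prediction-error energy,
    and the reflection (PARCOR) coefficients k_1..k_p.
    Assumes r[0] > 0 (a silent frame would cause division by zero).
    """
    a = [0.0] * (order + 1)  # a[0] unused; a[j] is the j-th predictor coefficient
    err = r[0]
    refl = []
    for i in range(1, order + 1):
        # Reflection coefficient for the step from order i-1 to order i
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        refl.append(k)
        # Update the predictor coefficients to order i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # prediction-error energy shrinks at each order
    return a[1:], err, refl


if __name__ == "__main__":
    # Autocorrelation of a first-order AR process with coefficient 0.5
    a, err, refl = levinson_durbin([1.0, 0.5, 0.25], order=2)
    print(a, err, refl)  # [0.5, 0.0] 0.75 [0.5, 0.0]
```

For a first-order AR autocorrelation the recursion recovers the single coefficient 0.5, and the second-order term is zero, as expected. The LPC spectrum is then the magnitude response of the all-pole filter built from these coefficients.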

Requirements:

Basic knowledge of digital signal processing is assumed.

Lecture outline:

1. Introduction: models of speech production and perception, basic phonetic and articulatory characteristics

2. Spectral characteristics of speech signal (DFT and LPC spectrum)

3. Cepstral representation of speech. Recognition features. Voice activity detection.

4. Speech enhancement (additive and convolutional noise, single-channel and multi-channel systems)

5. Basic classification approaches and techniques (GMM, HMM, VQ, ANN, DNN)

6. Speaker verification and identification. Language recognition.

7. Small- and large-vocabulary speech recognition (DTW, GMM-HMM, LVCSR, HTK and Kaldi tools).

8. Modern LVCSR systems (DNN-HMM). Adaptation techniques. Advanced speech features.

9. Speech synthesis: basic principles (concatenative and formant synthesis, PSOLA)

10. Audio-visual speech recognition

11. Hearing aids and cochlear implants (anatomy and model of hearing, speech processing)

12. Speech coding.

13. Multimedia systems with voice input (dialogue systems, speech therapy, language teaching)

14. Databases for speech technology systems. Reserve.
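Lecture 3 (cepstral representation and voice activity detection) can be illustrated with a short Python sketch: the real cepstrum of a frame, a truncated cepstral distance, and a naive VAD that marks frames far from an average noise cepstrum as speech. This is an assumption-laden illustration, not the course's method: it assumes framing and windowing are done elsewhere, that some leading frames contain noise only, and the threshold and `n_coeffs` are hypothetical values to be tuned.

```python
import numpy as np


def real_cepstrum(frame, n_fft=None):
    """Real cepstrum: inverse DFT of the log magnitude spectrum of one frame."""
    frame = np.asarray(frame, dtype=float)
    n_fft = n_fft or len(frame)
    log_mag = np.log(np.abs(np.fft.rfft(frame, n=n_fft)) + 1e-12)  # floor avoids log(0)
    return np.fft.irfft(log_mag, n=n_fft)


def cepstral_distance(c1, c2, n_coeffs=12):
    """Truncated cepstral distance using c0 and coefficients 1..n_coeffs."""
    d0 = (c1[0] - c2[0]) ** 2
    dk = np.sum((c1[1:n_coeffs + 1] - c2[1:n_coeffs + 1]) ** 2)
    return float(np.sqrt(d0 + 2.0 * dk))


def cepstral_vad(frames, noise_frames, threshold, n_coeffs=12):
    """Naive VAD (hypothetical): a frame is 'speech' when its cepstral distance
    from the mean noise cepstrum exceeds the threshold."""
    noise_ref = np.mean([real_cepstrum(f) for f in noise_frames], axis=0)
    return [cepstral_distance(real_cepstrum(f), noise_ref, n_coeffs) > threshold
            for f in frames]
```

Because c0 carries the log energy, a louder frame moves away from the noise reference even when its spectral shape is the same; the higher coefficients add sensitivity to changes in spectral envelope.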

Exercise outline:

1. Introduction: speech signal, tools for analysis, sources of speech signals

2. Basic time-domain and spectral characteristics

3. Fundamental frequency (pitch) estimation

4. LPC spectrum and formant estimation

5. Cepstrum and cepstral distance: voice activity detection.

6. Basic classification techniques (GMM, VQ, HMM): vowel classification

7. Speaker verification based on VQ

8. Speaker identification based on GMM

9. DTW-based recognition: a simple isolated-word recognizer

10. HMM-based recognition: basic tasks and demonstration of HMM modelling

11. Suppression of additive noise in speech signal

12. Suppression of convolutional noise

13. Speech synthesis: implementation of formant synthesis, demonstration of available tools

14. Reserve. Course credit (assessment)
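Exercise 9 builds a simple DTW-based recognizer. The Python sketch below assumes each word is already represented as a sequence of feature vectors (e.g. cepstral coefficients per frame) with one reference template per word; the function names, the Euclidean local distance and the unnormalized accumulated cost are illustrative choices, not the course's MATLAB code.

```python
import math


def euclidean(x, y):
    """Local distance between two feature vectors (assumed Euclidean)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def dtw_distance(a, b, dist=euclidean):
    """Accumulated DTW cost between feature-vector sequences a and b,
    using the basic step set (diagonal match, horizontal, vertical)."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def recognize(test, templates):
    """Return the word whose reference template has the lowest DTW cost.
    `templates` is a hypothetical dict: word -> list of feature vectors."""
    return min(templates, key=lambda word: dtw_distance(test, templates[word]))


if __name__ == "__main__":
    templates = {"yes": [(0,), (1,), (2,)], "no": [(5,), (5,), (5,)]}
    print(recognize([(0,), (0,), (1,), (2,)], templates))  # yes
```

The accumulated cost is not normalized by path length, so templates of very different durations are not compared on equal terms; a practical recognizer would typically add length normalization and slope constraints.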

Study objectives:

The goal of the course is to introduce the speech technologies used in the most important multimedia applications. Students should master topics such as the basic characteristics of the speech signal, speech enhancement, speech recognition, speech synthesis and audio-visual speech processing. Students will practise basic speech processing tasks in the MATLAB environment, and other publicly available tools for speech analysis will also be used.

Study materials:

[1] Rabiner, L., Schafer, R. W.: Introduction to Digital Speech Processing (Foundations and Trends in Signal Processing). Now Publishers Inc., 2007.

[2] Huang, X., Acero, A., Hon, H.-W.: Spoken Language Processing. Prentice Hall, 2001.

[3] Deller Jr., J. R., Hansen, J. H. L., Proakis, J. G.: Discrete-time Processing of Speech Signals. Wiley, 2000.

[4] McLoughlin, I.: Applied Speech and Audio Processing: With Matlab Examples. Cambridge University Press, 2009.

[5] Jelinek, F.: Statistical Methods for Speech Recognition (Language, Speech, and Communication). The MIT Press, 1998.

[6] ITU-T Recommendations. http://www.itu.int/ITU-T

Further information:
https://moodle.fel.cvut.cz/courses/BE2M31ZRE
Timetable for the winter semester 2023/2024:
The timetable is not available yet.
Timetable for the summer semester 2023/2024:
Tuesday:
09:15–10:45, room T2:C3-337 (Dejvice), Pollák P., lecture parallel 1
11:00–12:30, room T2:C4-362 (Dejvice, Laboratory K362), Pollák P., lecture parallel 1, parallel 101

The course is part of the following study plans:
Data valid as of 17 April 2024
Updates to the information above can be found at https://bilakniha.cvut.cz/cs/predmet4846106.html