Title Acoustic modelling of Lithuanian speech recognition
Translation of Title Lietuvių šnekos atpažinimo akustinis modeliavimas.
Authors Laurinčiukaitė, Sigita
Pages 25
Keywords [eng] Speech recognition ; acoustic modelling ; hidden Markov model ; syllable-based speech recognition ; phoneme-based speech recognition
Abstract [eng] This work addresses acoustic modelling for Lithuanian speech recognition. Word-, syllable-, contextual syllable-, phoneme- and contextual phoneme-based speech recognition were investigated, for both isolated words and continuous speech. The most popular sub-word units in Lithuanian speech recognition are phonemes and contextual phonemes, and research on other sub-word units is lacking. This work aims to compare the capacity of linguistic sub-word units to model speech and to demonstrate that such an investigation suggests alternatives to the phoneme and the contextual phoneme. The dissertation proposes a new methodology for acoustic modelling of syllables and phonemes; a new sub-word unit, the pseudo-syllable; and technologies for acoustic modelling of individual sub-word units, including schemes, tools and recommendations. A speech corpus of isolated words was prepared, and two versions of the continuous speech corpus LRN were developed for experimental research. Investigation of isolated-word recognition and of the construction of acoustic models for words showed that the size of the training set and its composition with regard to the number of speakers influence speech recognition accuracy. Recommendations for word-based acoustic modelling are given. Investigation of isolated-word recognition and of the construction of acoustic models for words, syllables and phonemes showed that the best recognition accuracy (98 ± 1.8 %) is achieved with the syllable as the sub-word unit. Because syllable-based acoustic modelling is complex, the word is recommended as the unit for acoustic modelling. After investigation of phoneme-based and contextual phoneme-based recognition of continuous speech, the two phoneme sets with the best speech recognition accuracy (62 ± 1.5 % and 62 ± 1.5 %) were selected.
Phoneme sets without (or with) softness of consonants, accent and splitting of diphthongs are recommended for acoustic modelling in phoneme- and contextual phoneme-based recognition of continuous speech. The contextual phoneme is recommended when speech recognition accuracy is the priority, and the phoneme when simplicity of acoustic modelling is the priority. Investigation of continuous speech recognition according to the proposed methodology showed that the new sub-word unit (the pseudo-syllable) increases speech recognition accuracy (57 ± 0.3 %) in comparison to phoneme models (52 ± 0.3 %). Investigating the individual blocks of the methodology made it possible to increase speech recognition accuracy to 67 ± 1.4 %. Contextual syllable-phonemes increase speech recognition accuracy to 72 ± 1.4 %, but are inferior to contextual phonemes (76 ± 1.3 %).
Type Summaries of doctoral thesis
Language English
Publication date 2008