Collecting and Evaluating Speech Recognition Corpora for Nine Southern Bantu Languages

Publication Type: Conference Paper
Year of Publication: 2009
Authors: Badenhorst, J., Van Heerden, C., Davel, M. H., and Barnard, Etienne
Booktitle: Proceedings of the First Workshop on Language Technologies for African Languages (AfLaT 2009)
Date: March
Publisher: Association for Computational Linguistics
Location: Athens, Greece
Editor: De Pauw, Guy, de Schryver, Gilles-Maurice, and Levin, Lori
Abstract

We describe the Lwazi corpus for automatic speech recognition (ASR), a new telephone speech corpus which includes data from nine Southern Bantu languages. Because of practical constraints, the amount of speech per language is relatively small compared to major corpora in world languages, and we report on our investigation of the stability of the ASR models derived from the corpus. We also report on phoneme distance measures across languages, and describe initial phone recognisers that were developed using this data.

URL: http://www.aclweb.org/anthology/W09-0701
Attachment: W09-0701.pdf (605.84 KB)