Containing overgeneration in Zulu computational morphology

Title: Containing overgeneration in Zulu computational morphology
Publication Type: Conference Paper
Year of Publication: 2007
Authors: Pretorius, Laurette, and Bosch, Sonja E.
Book Title: Human Language Technologies as a Challenge for Computer Science and Linguistics, Proceedings of 3rd Language and Technology Conference
Date: October 2007
Publisher: Wydawnictwo Poznańskie Sp. z o.o.
Editor: Vetulani, Z.
ISBN Number: 978-83-7177-407-2

The development of a large-coverage computational morphological analyser for Zulu requires modelling not only the regular phenomena often associated with word formation, but also the idiosyncratic behaviour that may occur in Zulu morphology. This paper discusses the application of an existing rule-based finite-state morphological analyser prototype, ZulMorph, to semi-automate the mining of available Zulu language corpora for idiosyncratic behaviour. The semi-automated procedure provides for bootstrapping the morphological analyser with information newly extracted from the corpora. Of particular interest is the central role played by the machine-readable lexicon. The procedure is applied to a Zulu development corpus of 30 000 types, and the results are presented and discussed.