Unsupervised Induction of Dholuo Word Classes using Maximum Entropy Learning
Submitted by Guy on Wed, 2007-01-17 18:04

| Field | Value |
|---|---|
| Title | Unsupervised Induction of Dholuo Word Classes using Maximum Entropy Learning |
| Publication Type | Conference Paper |
| Year of Publication | 2007 |
| Authors | De Pauw, Guy; Wagacha, Peter W.; and Abade, Dorothy A. |
| Booktitle | Proceedings of the First International Computer Science and ICT Conference (COSCIT 2007) |
| Publisher | University of Nairobi |
| Location | Nairobi, Kenya |
| Abstract | This paper describes a proof-of-principle experiment in which maximum entropy learning is used for the automatic induction of word classes for the Western Nilotic language of Dholuo. The proposed approach extracts shallow morphological and contextual features for each word of a 300k text corpus of Dholuo. These features provide a layer of linguistic abstraction that enables the extraction of general word classes. We provide a preliminary evaluation of the proposed method in terms of language model perplexity and through a simple case study of the paradigm of the verb stem "somo". |

| Attachment | Size |
|---|---|
| coscit.depauw.pdf | 234.56 KB |
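
The abstract states that the method extracts shallow morphological and contextual features for each word, but this record does not list the exact feature set. As a minimal sketch, assuming that the morphological features are word-initial and word-final character strings and that the contextual features are the immediately preceding and following words, extraction for a single token might look like this. The function name `extract_features`, the `max_affix` parameter, the `<s>`/`</s>` boundary markers and the toy tokens built around the stem "somo" are hypothetical and not taken from the paper.

```python
from typing import Dict, List


def extract_features(tokens: List[str], i: int, max_affix: int = 3) -> Dict[str, int]:
    """Return a sparse binary feature dict for tokens[i].

    Assumed, illustrative feature set: prefixes and suffixes of up to
    `max_affix` characters, plus the previous and next word.
    """
    word = tokens[i].lower()
    feats: Dict[str, int] = {}
    # Shallow morphological features: leading and trailing character strings.
    for n in range(1, min(max_affix, len(word)) + 1):
        feats[f"prefix={word[:n]}"] = 1
        feats[f"suffix={word[-n:]}"] = 1
    # Contextual features: neighbouring words, with sentence-boundary markers.
    prev_word = tokens[i - 1].lower() if i > 0 else "<s>"
    next_word = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    feats[f"prev={prev_word}"] = 1
    feats[f"next={next_word}"] = 1
    return feats
```

For example, `extract_features(["asomo", "buk"], 0)` yields the prefixes `a`, `as`, `aso`, the suffixes `o`, `mo`, `omo`, and the context features `prev=<s>` and `next=buk`. Sparse binary feature dictionaries like this are a usual input representation for maximum entropy models; how the paper turns such features into induced word classes is not described in this record.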
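
The abstract evaluates the induced classes in terms of language model perplexity. Over N predicted tokens with model probabilities p_1 … p_N, perplexity is exp(-(1/N) * (log p_1 + … + log p_N)), so lower is better. The record does not say which language model the paper used; the sketch below assumes a class-based bigram model, p(w_i | w_{i-1}) = p(c_i | c_{i-1}) * p(w_i | c_i), which is a common way to score a hard word clustering. The names `perplexity`, `class_bigram_probs` and `word2class` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def perplexity(probs: Sequence[float]) -> float:
    """Perplexity = exp(-(1/N) * sum(log p_i)) over N predicted tokens."""
    if not probs:
        raise ValueError("need at least one probability")
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))


def class_bigram_probs(tokens: List[str], word2class: Dict[str, str]) -> List[float]:
    """Per-token probabilities under an assumed class bigram model,
    p(w_i | w_{i-1}) = p(c_i | c_{i-1}) * p(w_i | c_i),
    with maximum-likelihood estimates taken from `tokens` itself.
    Each word maps to exactly one class (hard clustering), so
    count(w, c) == count(w) for c = word2class[w]."""
    classes = [word2class[w] for w in tokens]
    class_count = Counter(classes)
    word_count = Counter(tokens)
    transitions = Counter(zip(classes, classes[1:]))
    prev_count = Counter(classes[:-1])
    probs: List[float] = []
    for i in range(1, len(tokens)):
        c_prev, c, w = classes[i - 1], classes[i], tokens[i]
        p_trans = transitions[(c_prev, c)] / prev_count[c_prev]
        p_emit = word_count[w] / class_count[c]
        probs.append(p_trans * p_emit)
    return probs
```

Because the probabilities here are estimated on the same tokens they score, this toy version never assigns zero probability; a real evaluation would estimate the model on training text, score held-out text, and smooth unseen class transitions.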