AfLaT.org - machine learning https://aflat.org/taxonomy/term/21/0 en Statistical unicodification of African languages https://aflat.org/content/statistical-unicodification-african-languages <span class="biblio-title"><a href="/content/statistical-unicodification-african-languages">Statistical unicodification of African languages</a></span>, <span class="biblio-authors"><a href="/biblio/author/204">Scannell, Kevin P.</a></span>, Language Resources and Evaluation, 09/2011, Volume 45, Issue 3, p.375-386, (2011) <span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.atitle=Statistical+unicodification+of+African+languages&amp;rft.title=Language+Resources+and+Evaluation&amp;rft.issn=1574-020X&amp;rft.date=2011&amp;rft.volume=45&amp;rft.issue=3&amp;rft.spage=375&amp;rft.epage=386&amp;rft.aulast=Scannell&amp;rft.aufirst=Kevin&amp;rft_id=info%3Adoi%2F10.1007%2Fs10579-011-9150-3"></span> https://aflat.org/content/statistical-unicodification-african-languages#comments Diacritic Restoration machine learning Unicodification Tue, 20 Sep 2011 09:36:36 +0000 Guy 521 at https://aflat.org Automatic Diacritic Restoration for African Languages https://aflat.org/diacriticrestoration The orthography of many African languages includes diacritically marked characters. Because they fall outside the scope of the standard Latin encoding, these characters are often represented in digital language resources by their unmarked equivalents. This complicates corpus compilation, since these languages typically lack the large electronic dictionaries that could otherwise be used to restore the diacritics. <P> This is a demonstration system for a method that automatically restores diacritics on the basis of local graphemic context. It is based on memory-based learning, a machine learning method.
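Restoring diacritics from local graphemic context can be framed as classification: each unmarked letter, together with a few letters on either side, is compared with stored examples taken from correctly marked text, and the marked form of the nearest examples is copied. The Python sketch below is a simplified stand-in under stated assumptions: an unweighted overlap distance, a window of k letters per side and majority voting among the nearest neighbours. It is not the authors' actual configuration (memory-based learning software such as TiMBL adds feature weighting and other options), and every name in it (graphemes, base, windows, DiacriticRestorer, PAD) is hypothetical.

```python
import unicodedata
from collections import Counter, defaultdict

PAD = "_"  # hypothetical padding symbol for positions beyond the word edges


def graphemes(word: str) -> list[str]:
    """Split a word into clusters of one base letter plus its combining marks (NFC)."""
    clusters: list[str] = []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and clusters:
            clusters[-1] += ch  # attach the diacritic to the preceding base letter
        else:
            clusters.append(ch)
    return [unicodedata.normalize("NFC", c) for c in clusters]


def base(cluster: str) -> str:
    """Unmarked equivalent of one cluster, e.g. 'ĩ' -> 'i'."""
    return unicodedata.normalize("NFD", cluster)[0]


def windows(letters: list[str], k: int) -> list[tuple[str, ...]]:
    """Local graphemic context: the focus letter with k letters on each side."""
    padded = [PAD] * k + letters + [PAD] * k
    return [tuple(padded[i:i + 2 * k + 1]) for i in range(len(letters))]


class DiacriticRestorer:
    """Nearest-neighbour classifier with an unweighted overlap metric (simplified IB1)."""

    def __init__(self, k: int = 2) -> None:
        self.k = k
        # stored instances, grouped by their unmarked focus letter
        self.memory: dict[str, list[tuple[tuple[str, ...], str]]] = defaultdict(list)

    def train(self, words: list[str]) -> None:
        """Store one instance per letter of each correctly marked training word."""
        for word in words:
            marked = graphemes(word)
            unmarked = [base(c) for c in marked]
            for feats, label in zip(windows(unmarked, self.k), marked):
                self.memory[feats[self.k]].append((feats, label))

    def restore(self, word: str) -> str:
        """Predict the marked form of each letter from its nearest stored neighbours."""
        unmarked = [base(c) for c in graphemes(word)]
        out: list[str] = []
        for feats in windows(unmarked, self.k):
            focus = feats[self.k]
            candidates = self.memory.get(focus)
            if not candidates:
                out.append(focus)  # letter never seen in training: leave it unmarked
                continue
            # overlap distance: number of window positions that differ
            dists = [(sum(a != b for a, b in zip(feats, f)), label) for f, label in candidates]
            best = min(d for d, _ in dists)
            votes = Counter(label for d, label in dists if d == best)
            out.append(votes.most_common(1)[0][0])  # majority vote among the nearest
        return unicodedata.normalize("NFC", "".join(out))
```

For example, after training on two toy words (chosen only for illustration, not a real corpus), `DiacriticRestorer(k=2)` with `train(["mũndũ", "tuma"])` restores `"mundu"` to `"mũndũ"` while leaving `"tuma"` unchanged, because the context around each u decides its label. Splitting words into base-plus-marks clusters keeps the marked and unmarked sequences aligned even when a letter carries several combining diacritics.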
We have applied the method to the African languages of Cilubà, Gĩkũyũ, Kĩkamba, Maa, Sesotho sa Leboa, Tshivenḓa and Yoruba. <P> You can find more information on this system in <A HREF="?q=node/182">this paper</A>. <P> <form action="?q=node/185" method="post" color=red> <TABLE> <TR> <TH COLSPAN="2">Select a language and enter the word or sentence for which you want to restore diacritics. <TR> <TH><INPUT type="radio" name="lingo" value="Cilubà" >Cilubà (e.g. mutekete)<BR> <TH><INPUT type="radio" name="lingo" value="Gĩkũyũ">Gĩkũyũ (e.g. nituronire)<BR> <TR> <TH><INPUT type="radio" name="lingo" value="Kĩkamba">Kĩkamba (e.g. ningulilikana)<BR> <TH><INPUT type="radio" name="lingo" value="Maasai" >Maasai (e.g. oltunani)<BR> <TR> <TH><INPUT type="radio" name="lingo" value="Sesotho sa Leboa">Sesotho sa Leboa (Northern Sotho) (e.g. swanetse)<BR> <TH><INPUT type="radio" name="lingo" value="Tshivenḓa">Tshivenḓa (e.g. tshiswitulo)<BR> <TR> <TH><INPUT type="radio" name="lingo" value="Yoruba">Yoruba (e.g. isinku) <BR> <TH>&nbsp; </TABLE> <textarea name="text" rows=6 cols=70></TEXTAREA> <p><input type="submit" /> [Processing the text might take a while] </form> <H5>Authors:</H5> <B>Guy De Pauw</B>: CNTS - Language Technology Group, University of Antwerp, Antwerp, Belgium, <span class="spamspan"><span class="u">guy [dot] depauw</span> [at] <span class="d">ua [dot] ac [dot] be</span></span><BR> <B>Gilles-Maurice de Schryver</B>: African Languages and Cultures, Ghent University, Ghent, Belgium, <span class="spamspan"><span class="u">gillesmaurice [dot] deschryver</span> [at] <span class="d">ugent [dot] be</span></span><BR> <B>Peter Waiganjo Wagacha</B>: School of Computing and Informatics, University of Nairobi, Nairobi, Kenya, <span class="spamspan"><span class="u">waiganjo</span> [at] <span class="d">uonbi [dot] ac [dot] ke</span></span><BR> https://aflat.org/diacriticrestoration#comments Central Africa Eastern Africa Southern Africa Western Africa Tool / Application Cilubà 
diacritics encoding Gĩkũyũ Kikamba machine learning Northern Sotho Sesotho sa Leboa Tshivenda Venda Yoruba Tue, 23 Oct 2007 11:16:13 +0000 Guy 184 at https://aflat.org Northern Sotho Part-of-Speech Tagger (V2) - Demo https://aflat.org/sothotag <p>This demo showcases a part-of-speech tagger for Northern Sotho. It retrieves the morpho-syntactic categories for words in a sentence. It uses <A HREF="https://ilk.uvt.nl/mbt/">MBT</A>, the memory-based tagger, trained on a relatively small annotated corpus. </p> <p><B>Version 1:</B> October 10, 2007 (training set of 20k tokens)<br /> <B>Version 2:</B> December 8, 2007 (training set of 35k tokens)</p> <p><HR></p> <p><B>Type in the text you want to tag (2,500 character limit)</B><SMALL><BR>Example: <i>Motho ge a sa tseba o swanetše go dumela seo gore bao ba tsebago ba mmotše.</i></SMALL></p> <form action="?q=node/178" method="post" color=red> <textarea name="word" rows=13 cols=60></TEXTAREA></p> <p><input type="submit" /> [Tagging the text might take a while]<br /> </form> <p><H5>Authors:</H5><br /> <B>Guy De Pauw</B>: CNTS - Language Technology Group, University of Antwerp, Antwerp, Belgium, <span class="spamspan"><span class="u">guy [dot] depauw</span> [at] <span class="d">ua [dot] ac [dot] be</span></span><br /> <B>Gilles-Maurice de Schryver</B>: African Languages and Cultures, Ghent University, Ghent, Belgium, <span class="spamspan"><span class="u">gillesmaurice [dot] deschryver</span> [at] <span class="d">ugent [dot] be</span></span><br /> </p> <p><H5> <A HREF="?q=node/179">Paper</A></H5></p> https://aflat.org/sothotag#comments Southern Africa Tool / Application machine learning Northern Sotho Sesotho sa Leboa tagger Thu, 11 Oct 2007 12:05:37 +0000 Guy 177 at https://aflat.org CNTS - Language Technology Group https://aflat.org/node/15 <div class="field field-type-text field-field-description"> <div class="field-label">Description:&nbsp;</div> <div class="field-items"> <div
class="field-item odd"> <p>CNTS is a research center of the Department of Linguistics of the <A href="https://www.ua.ac.be" target="_blank">University of Antwerp (UA)</A> in Antwerp, Belgium, engaged in research in computational linguistics and psycholinguistics. The CNTS - Language Technology Group has a strong tradition of applying machine learning techniques to natural language processing. Recently, CNTS has also started investigating the applicability of unsupervised learning methods and knowledge transfer techniques to the annotation and linguistic description of African languages, particularly Kiswahili and the local languages of Kenya.</p> </div> </div> </div> <div class="field field-type-link field-field-url"> <div class="field-label">URL:&nbsp;</div> <div class="field-items"> <div class="field-item odd"> <a href="https://www.cnts.ua.ac.be" target="_blank">https://www.cnts.ua.ac.be</a> </div> </div> </div> <div class="field field-type-userreference field-field-aflat-users"> <div class="field-label">AfLaT users:&nbsp;</div> <div class="field-items"> <div class="field-item odd"> <a href="/users/guy" title="View user profile.">Guy</a> </div> </div> </div> https://aflat.org/node/15#comments Corpus Tool / Application Kiswahili machine learning Swahili Tue, 12 Dec 2006 14:41:22 +0000 Guy 15 at https://aflat.org Kiswahili Part-of-Speech Tagger - Demo https://aflat.org/swatag <p>This demo showcases a broad-coverage part-of-speech tagger for Kiswahili. It retrieves the morpho-syntactic categories for words in a sentence. This system uses the <A HREF=https://ilk.uvt.nl/mbt/ target=blank>Memory-Based Tagger</A>, trained on the Helsinki Corpus of Swahili. 
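A memory-based tagger treats part-of-speech tagging as classification of each word in its context, using stored examples from an annotated corpus. The Python sketch below illustrates that idea only, assuming a greedy left-to-right pass, an unweighted overlap metric and a feature set chosen for illustration (the two preceding tags, the word, its last three letters and the next word). MBT's own feature settings and its handling of unknown words are not reproduced, and the names (features, MemoryBasedTagger, PAD) are hypothetical.

```python
from collections import Counter

Tagged = list[tuple[str, str]]  # one sentence as (word, tag) pairs
PAD = "_"  # hypothetical boundary symbol


def features(words: list[str], tags: list[str], i: int) -> tuple[str, ...]:
    """Context of word i: two preceding tags, the word, its last 3 letters, the next word."""
    prev2 = tags[i - 2] if i >= 2 else PAD
    prev1 = tags[i - 1] if i >= 1 else PAD
    word = words[i].lower()
    nxt = words[i + 1].lower() if i + 1 < len(words) else PAD
    return (prev2, prev1, word, word[-3:], nxt)


class MemoryBasedTagger:
    """Greedy left-to-right tagger: nearest neighbours under an unweighted overlap metric."""

    def __init__(self) -> None:
        self.memory: list[tuple[tuple[str, ...], str]] = []

    def train(self, corpus: list[Tagged]) -> None:
        for sentence in corpus:
            words = [w for w, _ in sentence]
            gold = [t for _, t in sentence]
            for i in range(len(sentence)):
                # during training, the correct tags of preceding words serve as context
                self.memory.append((features(words, gold, i), gold[i]))

    def tag(self, words: list[str]) -> list[str]:
        if not self.memory:
            raise ValueError("tagger has not been trained")
        tags: list[str] = []
        for i in range(len(words)):
            feats = features(words, tags, i)  # left context = tags predicted so far
            dists = [(sum(a != b for a, b in zip(feats, f)), t) for f, t in self.memory]
            best = min(d for d, _ in dists)
            # majority vote among the stored instances at minimal distance
            tags.append(Counter(t for d, t in dists if d == best).most_common(1)[0][0])
        return tags
```

As a toy usage example with invented N/V/ADJ labels (not the Helsinki Corpus tagset), training on `[("mtoto", "N"), ("anasoma", "V"), ("kitabu", "N")]` and `[("kitabu", "N"), ("kizuri", "ADJ")]` lets the tagger label the unseen sentence `mtoto anasoma kitabu kizuri` ("the child is reading a good book") as `N V N ADJ`. Keeping every training instance and deferring all generalisation to lookup time is what distinguishes memory-based learning from methods that build an explicit model.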
</p> <p><B>Type in the text you want to tag</B><SMALL><BR>Example: <i>Hapo ni kwa nini Sahara halina maji na kwa nini simba na shungi.</i></SMALL></p> <form action="?q=node/11" method="post" color=red> <textarea name="word" rows=13 cols=60></TEXTAREA></p> <p><input type="submit" /> [Tagging the text might take a while]<br /> </form> <p><H5>Authors:</H5><br /> <B>Guy De Pauw</B>: CNTS - Language Technology Group, University of Antwerp, Antwerp, Belgium, <span class="spamspan"><span class="u">guy [dot] depauw</span> [at] <span class="d">ua [dot] ac [dot] be</span></span><br /> <B>Gilles-Maurice de Schryver</B>: African Languages and Cultures, Ghent University, Ghent, Belgium, <span class="spamspan"><span class="u">gillesmaurice [dot] deschryver</span> [at] <span class="d">ugent [dot] be</span></span><br /> <B>Peter Waiganjo Wagacha</B>: School of Computing and Informatics, University of Nairobi, Nairobi, Kenya, <span class="spamspan"><span class="u">waiganjo</span> [at] <span class="d">uonbi [dot] ac [dot] ke</span></span><br /> </p> <p><H5><A HREF="?q=node/6">Paper</A></H5></p> https://aflat.org/swatag#comments Eastern Africa Tool / Application Kiswahili machine learning Swahili Tue, 12 Dec 2006 13:58:27 +0000 Guy 10 at https://aflat.org Gĩkũyũ Diacritic Placement - Demo https://aflat.org/node/8 <p>The orthography of Gĩkũyũ includes accented characters (namely ĩ and ũ) that are needed to represent its full vowel system.
Because these characters are not available on standard computer keyboards, they are usually typed as the nearest available characters (i and u).</p> https://aflat.org/node/8#comments Eastern Africa Tool / Application Gĩkũyũ Kĩkũyũ machine learning Tue, 12 Dec 2006 13:52:59 +0000 Guy 8 at https://aflat.org A grapheme-based approach to accent restoration in Gĩkũyũ https://aflat.org/node/5 <span class="biblio-title"><a href="/biblio/view/5">A grapheme-based approach to accent restoration in Gĩkũyũ</a></span>, <span class="biblio-authors"><a href="/biblio/author/133">Wagacha, Peter W.</a>, <a href="/biblio/author/333">De Pauw, Guy</a>, and <a href="/biblio/author/3">Githinji, P. W.</a></span>, Proceedings of the Fifth International Conference on Language Resources and Evaluation, May 2006, Genoa, Italy, p.1937-1940, (2006) <span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Fwww.aflat.org&amp;rft.title=A+grapheme-based+approach+to+accent+restoration+in+G%C4%A9k%C5%A9y%C5%A9&amp;rft.date=2006&amp;rft.spage=1937&amp;rft.epage=1940&amp;rft.aulast=Wagacha&amp;rft.aufirst=P+W&amp;rft.au=Pauw%2C+De&amp;rft.au=Githinji%2C+P+W&amp;rft.pub=ELRA&amp;rft.place=Genoa%2C+Italy"></span> Eastern Africa Tool / Application Gĩkũyũ Kĩkũyũ machine learning Tue, 12 Dec 2006 13:41:29 +0000 Guy 5 at https://aflat.org Data-driven part-of-speech tagging of Kiswahili https://aflat.org/node/6 <span class="biblio-title"><a href="/biblio/view/6">Data-driven part-of-speech tagging of Kiswahili</a></span>, <span class="biblio-authors"><a href="/biblio/author/333">De Pauw, Guy</a>, <a href="/biblio/author/194" class="biblio-local-author">de Schryver, Gilles-Maurice</a>, and <a href="/biblio/author/133">Wagacha, Peter W.</a></span>, Proceedings of Text, Speech and Dialogue, 9th International Conference, Volume 4188/2006, Berlin, Germany, p.197-204, (2006) <span class="Z3988" 
title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rft.title=Data-driven+part-of-speech+tagging+of+Kiswahili&amp;rft.series=Lecture+Notes+in+Computer+Science&amp;rft.isbn=978-3-540-39090-9&amp;rft.date=2006&amp;rft.volume=4188%2F2006&amp;rft.spage=197&amp;rft.epage=204&amp;rft.aulast=Pauw&amp;rft.aufirst=Guy+De&amp;rft.pub=Springer+Verlag&amp;rft.place=Berlin%2C+Germany"></span> Eastern Africa Tool / Application Kiswahili machine learning Swahili Tue, 12 Dec 2006 13:41:29 +0000 Guy 6 at https://aflat.org Development of a corpus for Gĩkũyũ using machine learning techniques https://aflat.org/node/7 <span class="biblio-title"><a href="/biblio/view/7">Development of a corpus for Gĩkũyũ using machine learning techniques</a></span>, <span class="biblio-authors"><a href="/biblio/author/133">Wagacha, Peter W.</a>, <a href="/biblio/author/333">De Pauw, Guy</a>, and <a href="/biblio/author/9">Getao, K.</a></span>, Proceedings of LREC workshop - Networking the development of language resources for African languages, Genoa, Italy, (2006) <span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Fwww.aflat.org&amp;rft.title=Development+of+a+corpus+for+G%C4%A9k%C5%A9y%C5%A9+using+machine+learning+techniques&amp;rft.date=2006&amp;rft.aulast=Wagacha&amp;rft.aufirst=P+W&amp;rft.au=Pauw%2C+De&amp;rft.au=Getao%2C+K&amp;rft.place=Genoa%2C+Italy"></span> Eastern Africa Corpus Gĩkũyũ Kĩkũyũ machine learning Tue, 12 Dec 2006 13:41:29 +0000 Guy 7 at https://aflat.org