AfLaT.org - encoding https://aflat.org/taxonomy/term/41/0 en Automatic Diacritic Restoration for African Languages https://aflat.org/diacriticrestoration <!--paging_filter-->The orthography of many African languages includes diacritically marked characters. Falling outside the scope of the standard Latin encoding, these characters are often represented in digital language resources as their unmarked equivalents. This renders corpus compilation more difficult, as these languages typically do not have the benefit of large electronic dictionaries to perform diacritic restoration. <P> This is a demonstration system for a diacritic restoration method that automatically restores diacritics on the basis of local graphemic context. It is based on the machine learning method of memory-based learning. We have applied the method to the African languages Cilubà, Gĩkũyũ, Kĩkamba, Maa, Sesotho sa Leboa, Tshivenḓa and Yoruba. <P> You can find more information on this system in <A HREF="?q=node/182">this paper</A>. <P> <form action="?q=node/185" method="post" color=red> <TABLE> <TR> <TH COLSPAN="2">Select a language and enter the word or sentence for which you want to restore diacritics. <TR> <TH><INPUT type="radio" name="lingo" value="Cilubà" >Cilubà (e.g. mutekete)<BR> <TH><INPUT type="radio" name="lingo" value="Gĩkũyũ">Gĩkũyũ (e.g. nituronire)<BR> <TR> <TH><INPUT type="radio" name="lingo" value="Kĩkamba">Kĩkamba (e.g. ningulilikana)<BR> <TH><INPUT type="radio" name="lingo" value="Maasai" >Maasai (e.g. oltunani)<BR> <TR> <TH><INPUT type="radio" name="lingo" value="Sesotho sa Leboa">Sesotho sa Leboa (Northern Sotho) (e.g. swanetse)<BR> <TH><INPUT type="radio" name="lingo" value="Tshivenḓa">Tshivenḓa (e.g. tshiswitulo)<BR> <TR> <TH><INPUT type="radio" name="lingo" value="Yoruba">Yoruba (e.g.
isinku) <BR> <TH>&nbsp; </TABLE> <textarea name="text" rows=6 cols=70></TEXTAREA> <p><input type="submit" /> [Processing the text might take a while] </form> <H5>Authors:</H5> <B>Guy De Pauw</B>: CNTS - Language Technology Group, University of Antwerp, Antwerp, Belgium, <span class="spamspan"><span class="u">guy [dot] depauw</span> [at] <span class="d">ua [dot] ac [dot] be</span></span><BR> <B>Gilles-Maurice de Schryver</B>: African Languages and Cultures, Ghent University, Ghent, Belgium, <span class="spamspan"><span class="u">gillesmaurice [dot] deschryver</span> [at] <span class="d">ugent [dot] be</span></span><BR> <B>Peter Waiganjo Wagacha</B>: School of Computing and Informatics, University of Nairobi, Nairobi, Kenya, <span class="spamspan"><span class="u">waiganjo</span> [at] <span class="d">uonbi [dot] ac [dot] ke</span></span><BR> https://aflat.org/diacriticrestoration#comments Central Africa Eastern Africa Southern Africa Western Africa Tool / Application Cilubà diacritics encoding Gĩkũyũ Kikamba machine learning Northern Sotho Sesotho sa Leboa Tshivenda Venda Yoruba Tue, 23 Oct 2007 11:16:13 +0000 Guy 184 at https://aflat.org Good UTF-8 editor for Windows https://aflat.org/node/89 <!--paging_filter--><p>I usually edit my UTF-8 encoded files in Linux, where you have the wonderful "gedit" program that is fully compliant with UTF-8. Is there a text-only editor like that for Windows?
Crimson Editor claims to be UTF-8 compatible, but I beg to differ.</p> https://aflat.org/node/89#comments African Language Technology encoding Thu, 18 Jan 2007 12:51:50 +0000 Guy 89 at https://aflat.org Typesetting African languages https://aflat.org/node/78 <!--paging_filter--><div class="field field-type-link field-field-url"> <div class="field-label">URL:&nbsp;</div> <div class="field-items"> <div class="field-item odd"> <a href="https://www.ideography.co.uk/library/afrolingua.html" target="_blank">https://www.ideography.co.uk/library/afrolingua.html</a> </div> </div> </div> <div class="field field-type-text field-field-description"> <div class="field-label">Description:&nbsp;</div> <div class="field-items"> <div class="field-item odd"> <!--paging_filter--><p>This Web page provides a description of the 54-page document &quot;Typesetting African languages&quot;.</p> <p><em>Most African languages with a writing system use a modification of the Roman alphabet; the systems were often the invention of Christian missionaries, though some have been devised by government commissions since decolonisation.</em></p> <p><em>The &quot;authors&quot; of these new writing systems usually aimed to make spellings logical and consistent by providing a written sign for each consonant or vowel sound in the language, and this often led to the adoption of newly created letterforms that are easy to write by hand, but are not available in standard fonts for typesetting.</em></p> <p><em>In writing up this report, I have aimed it at readers who do not know much about how typesetting is handled today in PC-based &quot;desktop publishing&quot; and word processing systems. I have aimed to explain the issues, and some solutions, in simple language and with a wealth of illustration.
</em></p> </div> </div> </div> https://aflat.org/node/78#comments encoding spelling Wed, 20 Dec 2006 14:18:29 +0000 Guy 78 at https://aflat.org Tshivenḓa (Venda) characters https://aflat.org/node/57 <!--paging_filter--><div class="field field-type-link field-field-url"> <div class="field-label">URL:&nbsp;</div> <div class="field-items"> <div class="field-item odd"> <a href="https://africanlanguages.com/venda/#dia" target="_blank">https://africanlanguages.com/venda/#dia</a> </div> </div> </div> <div class="field field-type-text field-field-description"> <div class="field-label">Description:&nbsp;</div> <div class="field-items"> <div class="field-item odd"> <!--paging_filter--><p>Information on displaying Tshivenḓa special characters in word processors, on web pages, ... Includes links to specialized font sets.</p> </div> </div> </div> https://aflat.org/node/57#comments Southern Africa Tool / Application encoding Tshivenda Venda Fri, 15 Dec 2006 14:45:21 +0000 Guy 57 at https://aflat.org A12N gateway https://aflat.org/node/53 <!--paging_filter--><div class="field field-type-link field-field-url"> <div class="field-label">URL:&nbsp;</div> <div class="field-items"> <div class="field-item odd"> <a href="https://www.bisharat.net/A12N/" target="_blank">https://www.bisharat.net/A12N/</a> </div> </div> </div> <div class="field field-type-text field-field-description"> <div class="field-label">Description:&nbsp;</div> <div class="field-items"> <div class="field-item odd"> <!--paging_filter--><p>Links to fonts, resources, encoding tools, ... for African languages. Provided by the <A href="?q=node/21">Bisharat</A> project.</p> </div> </div> </div> https://aflat.org/node/53#comments Project / Organisation encoding Thu, 14 Dec 2006 01:31:10 +0000 Guy 53 at https://aflat.org