Hi
I am wanting to develop a Pitjantjatjara spell checker for
LibreOffice. Pitjantjatjara is one of a number of indigenous
languages from Central Australia. I'm hoping that what we learn from
doing this will enable us to develop spell check dictionaries for
other Australian indigenous languages.
I have a list of words in a Unicode text file where each word is on a
new line. There are about 2300 lines/words.
I have no idea what to do next and I would appreciate clues for the next steps.
Thanks
Peter
Hi,
Have you read the Hunspell manual?
http://sourceforge.net/projects/hunspell/files/Hunspell/Documentation/
You need a dictionary in Hunspell format.
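As a minimal sketch (Python is my choice here; nothing in this thread prescribes a tool), the word list can be turned into a Hunspell dictionary pair. The .dic file starts with the approximate number of entries, followed by one word per line. The .aff file, for now, only declares SET UTF-8. The sketch assumes the word list is UTF-8, with or without a byte-order mark. A file saved as UTF-16 would need a different encoding argument. Normalizing to NFC is a design choice: it makes sure the same letter typed two ways (base letter plus combining mark, or one precomposed character) ends up as a single entry. The function names and output file names are hypothetical.

```python
import unicodedata
from pathlib import Path


def make_dictionary(words):
    """Return (dic_text, aff_text) for a minimal Hunspell dictionary built
    from an iterable of words. No affix rules yet: every form is listed."""
    entries = set()
    for raw in words:
        word = raw.strip()
        if not word:
            continue  # skip blank lines
        # NFC: a letter typed as base + combining mark and the same letter
        # typed as one precomposed character become the same string.
        entries.add(unicodedata.normalize("NFC", word))
    ordered = sorted(entries)
    # .dic format: first line is the (approximate) entry count,
    # then one entry per line.
    dic_text = "\n".join([str(len(ordered))] + ordered) + "\n"
    # .aff file: only declares the character encoding for now.
    aff_text = "SET UTF-8\n"
    return dic_text, aff_text


def build_from_file(wordlist_path, out_stem):
    """Read a one-word-per-line list and write <out_stem>.dic and .aff."""
    # "utf-8-sig" also drops a byte-order mark if the editor added one.
    lines = Path(wordlist_path).read_text(encoding="utf-8-sig").splitlines()
    dic_text, aff_text = make_dictionary(lines)
    Path(out_stem + ".dic").write_text(dic_text, encoding="utf-8")
    Path(out_stem + ".aff").write_text(aff_text, encoding="utf-8")
```

For example, build_from_file("words.txt", "pjt_AU") would write pjt_AU.dic and pjt_AU.aff. Both names are placeholders.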
There is more to be done in order to use this dictionary in LibreOffice. See
http://wiki.services.openoffice.org/wiki/Adding_a_new_language_or_locale for
a "how to" covering both the bare minimum needed to add spell-checking support
for a language and the full-blown locale data needed to make it a properly
supported locale. See also bug https://bugs.freedesktop.org/show_bug.cgi?id=30773
for the example of Kabyle.
Best regards,
Andras
Peter Ruwoldt wrote:
Hi
I am wanting to develop a Pitjantjatjara spell checker for
LibreOffice. Pitjantjatjara is one of a number of indigenous
languages from Central Australia. I'm hoping that what we learn from
doing this will enable us to develop spell check dictionaries for
other Australian indigenous languages.
I have a list of words in a Unicode text file where each word is on a
new line. There are about 2300 lines/words.
I have no idea what to do next and I would appreciate clues for the next steps.
Thanks
Peter
This should give you a first start:
http://www.suares.an/index.php?page_id=25&news_id=233#news-top
Ruud
Hi Peter,
I am wanting to develop a Pitjantjatjara spell checker for
LibreOffice.
Well, I know nothing about that language, and I'm not sure whether you
really intended to write what you did.
I don't think it is necessary to develop a new spell-checker, but
instead it is enough to create a corresponding dictionary for the
existing spellchecker, namely Hunspell.
[...] I have a list of words in a Unicode text file where each word is on a
new line. There are about 2300 lines/words.
This is a rather short list, very likely not enough for automatic
affix creation. I guess that this list doesn't include inflected forms
of the words anyway (e.g. past and future forms, genitives, etc.).
I have no idea what to do next and I would appreciate clues for the next steps.
You need to have a good understanding of how words are formed in the language.
For example, if the plural of a word is (almost) always formed by appending
an "s" to the word, then you should create an affix rule for that, etc.
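In Hunspell terms, such a rule goes in the .aff file as a suffix class (SFX). It is attached to stems in the .dic file by a flag written after a slash. Below is a minimal Python sketch that uses the plural-"s" example above rather than any real Pitjantjatjara suffix. The flag letter and helper names are hypothetical. Only the simplest case is covered: the suffix is appended, nothing is stripped, and the condition "." matches any word ending.

```python
def suffix_rule(flag, suffixes):
    """.aff lines for a Hunspell suffix class: each suffix is simply
    appended (strip nothing = 0, condition '.' = any word ending)."""
    lines = [f"SFX {flag} Y {len(suffixes)}"]  # Y: may combine with prefixes
    for suffix in suffixes:
        lines.append(f"SFX {flag} 0 {suffix} .")
    return "\n".join(lines) + "\n"


def dic_entry(word, flags):
    """A .dic line attaching affix flags to a stem, e.g. 'cat/S'."""
    return f"{word}/{flags}"


def expand(stem, suffixes):
    """Preview of the forms accepted for one stem. Mimics only the simple
    'append' case above, not Hunspell's full affix matching."""
    return [stem] + [stem + suffix for suffix in suffixes]
```

For example, "SET UTF-8\n" + suffix_rule("S", ["s"]) gives an .aff file containing the lines SFX S Y 1 and SFX S 0 s . as its rule. Adding the .dic entry cat/S then lets Hunspell accept both cat and cats. Which suffixes really exist, and under which conditions they apply, has to come from someone who knows the language.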
But without knowing the language specifics, it is hard to give a
concrete path. Then again, 2300 words is really short. With a list
that size, you can simply save it as a dictionary without any
affix/transformation rules.
But to develop a dictionary, you need not only a list of correct words
but also a (possibly automatically generated) list of misspelled
words, so that you can run quality checks on your modifications.
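One simple way to produce such a list automatically is sketched below in Python. It assumes that single-character slips are a reasonable first approximation of real typos. For each correct word, it generates one-edit variants (a letter dropped, doubled, or swapped with its neighbour) and keeps only those that are not themselves valid words. The evaluate helper accepts any word-to-bool callable. That could be a wrapper around whatever spell-checking call you use, but the wrapper itself is hypothetical and not shown. Words are assumed to be NFC-normalized, so that wherever a precomposed form exists, a letter is a single code point.

```python
def misspellings(word, valid):
    """One-edit variants of `word` that are not themselves valid words:
    a letter dropped, a letter doubled, or two neighbours swapped."""
    variants = set()
    for i in range(len(word)):
        variants.add(word[:i] + word[i + 1:])        # drop letter i
        variants.add(word[:i] + word[i] + word[i:])  # double letter i
        if i + 1 < len(word):                        # swap letters i, i+1
            variants.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    variants.discard("")
    variants.discard(word)
    return sorted(variants - set(valid))


def evaluate(accepts, good_words, bad_words):
    """`accepts` is any callable word -> bool (e.g. a spell-checker wrapper).
    Returns (correct words it rejects, misspellings it lets through)."""
    false_rejects = [w for w in good_words if not accepts(w)]
    false_accepts = [w for w in bad_words if accepts(w)]
    return false_rejects, false_accepts
```

A full negative list could then be built as sorted({m for w in good for m in misspellings(w, good)}). Re-running evaluate after each change to the affix rules shows whether the change let misspellings through or started rejecting correct words.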
ciao
Christian
Dear Peter,
I've helped create spell checking dictionaries for a number of
minority and endangered languages and would be happy to help you with
this.
As others have already indicated, you will probably want to
combine your word list with an affix file that encodes any common
prefixes and suffixes in the language. I can help write that but
I'll need lots of input from you to do so. I also have scripts that
can package your word list up so that it is usable as an extension in
either Firefox or OpenOffice/LibreOffice.
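Kevin's scripts are not part of this thread. For illustration only, here is a minimal Python sketch of what a LibreOffice/OpenOffice dictionary extension (.oxt) contains. It is a zip archive in which META-INF/manifest.xml registers dictionaries.xcu, and dictionaries.xcu in turn points the spell checker at the .aff/.dic pair for a locale. The locale tag pjt-AU (ISO 639-3 "pjt" for Pitjantjatjara) and the file stem are assumptions. As Andras notes, LibreOffice may also need the locale itself to be added before the dictionary can be selected. A real extension normally also carries a description.xml (identifier, version, display name), which is left out here to keep the sketch short. Firefox packaging is different and is not covered.

```python
import io
import zipfile

# Registers dictionaries.xcu as configuration data for the extension.
MANIFEST = """<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="http://openoffice.org/2001/manifest">
  <manifest:file-entry manifest:full-path="dictionaries.xcu"
    manifest:media-type="application/vnd.sun.star.configuration-data"/>
</manifest:manifest>
"""

# Tells the linguistic service where the Hunspell files are and which
# locale they serve. %origin% is the extension's install folder.
XCU_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<oor:component-data xmlns:oor="http://openoffice.org/2001/registry"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    oor:name="Linguistic" oor:package="org.openoffice.Office">
  <node oor:name="ServiceManager">
    <node oor:name="Dictionaries">
      <node oor:name="HunSpellDic_{stem}" oor:op="fuse">
        <prop oor:name="Locations" oor:type="oor:string-list">
          <value>%origin%/{stem}.aff %origin%/{stem}.dic</value>
        </prop>
        <prop oor:name="Format" oor:type="xs:string">
          <value>DICT_SPELL</value>
        </prop>
        <prop oor:name="Locales" oor:type="oor:string-list">
          <value>{locale}</value>
        </prop>
      </node>
    </node>
  </node>
</oor:component-data>
"""


def oxt_bytes(stem, locale, dic_text, aff_text):
    """Return the bytes of a minimal .oxt holding one Hunspell dictionary."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as oxt:
        oxt.writestr("META-INF/manifest.xml", MANIFEST)
        oxt.writestr("dictionaries.xcu",
                     XCU_TEMPLATE.format(stem=stem, locale=locale))
        oxt.writestr(stem + ".dic", dic_text)  # str is stored as UTF-8
        oxt.writestr(stem + ".aff", aff_text)
    return buffer.getvalue()
```

Usage: with open("pjt_AU.oxt", "wb") as f: f.write(oxt_bytes("pjt_AU", "pjt-AU", dic_text, aff_text)). Here dic_text and aff_text are the contents of whatever .dic and .aff files you have built.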
2300 words is a great start - in Irish we say "Tús maith leath na
hoibre" - a good start is half the work.
Kevin
Thanks for that, Andras. Getting the locale part sorted looks like a
hurdle I will have to clear.
Peter
Thanks Ruud
Thanks Christian
I've checked out the Hunspell project and I do not want to create a
new checker, just the dictionary. I figured that a plain word list
with no affix rules will allow a start to be made. I have got a linguist
working with me and I figured that from here we could evolve the
project over time.
Peter
We build spell checkers for 11 languages in South Africa. Our framework might help you build checkers for a number of languages, for both Firefox and LibreOffice.