Question on Thesaurus

Dear all:

I wonder if any of you have experience creating a thesaurus for your
language ... particularly in UTF-8.

This is my problem. I'm working to create a Tamil thesaurus -- to go along
with the Tamil Spell-checker (already created). I've followed the guidelines
found in http://lingucomponent.openoffice.org/thesaurus.html and have
created test .dat and .idx files required for the purpose. (See attached.)
However, despite trying many times, I'm not able to get the output to work in
LibreOffice. I don't get the anticipated synonyms; it just says (None).
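
For reference, here is a minimal sketch of how such a pair could be generated. The layout is assumed from the existing English and Hungarian thesauri, not taken from the guideline page: each file starts with the encoding name on its own line; the .idx then has the entry count and one "word|byte offset" line per headword; the .dat has "word|number of meanings" followed by one "(part of speech)|synonym|synonym..." line per meaning. The function name build_thesaurus and the Tamil sample entries are hypothetical. The main point is that offsets are counted in encoded bytes, since each Tamil letter takes 3 bytes in UTF-8.

```python
# Minimal sketch of a MyThes-style .dat/.idx writer. The layout is ASSUMED
# from the existing English/Hungarian thesauri, not taken from the spec page.


def build_thesaurus(entries, encoding_name="UTF-8"):
    """Return (dat_bytes, idx_bytes) for `entries`.

    `entries` maps a headword to a list of (part_of_speech, [synonyms]).
    """
    codec = "utf-8" if encoding_name.upper() == "UTF-8" else "latin-1"
    # Sort by encoded bytes, ASSUMING lookups do a bytewise binary search.
    words = sorted(entries, key=lambda w: w.encode(codec))

    # .dat header: the encoding name on its own line, with no byte-order mark.
    dat = bytearray((encoding_name + "\n").encode("ascii"))
    offsets = []
    for word in words:
        meanings = entries[word]
        # Record the BYTE offset of the "word|count" line, measured on the
        # encoded output (a Tamil letter is 3 bytes in UTF-8, not 1).
        offsets.append((word, len(dat)))
        dat += f"{word}|{len(meanings)}\n".encode(codec)
        for pos, synonyms in meanings:
            dat += "|".join([f"({pos})"] + list(synonyms)).encode(codec) + b"\n"

    # .idx: encoding name, number of entries, then one "word|offset" per line.
    idx_lines = [encoding_name, str(len(offsets))]
    idx_lines += [f"{word}|{offset}" for word, offset in offsets]
    idx = ("\n".join(idx_lines) + "\n").encode(codec)
    return bytes(dat), idx


# Hypothetical sample entries, chosen only to exercise multi-byte UTF-8.
sample = {
    "அழகு": [("noun", ["எழில்", "வனப்பு"])],
    "வீடு": [("noun", ["இல்லம்", "மனை"])],
}
dat, idx = build_thesaurus(sample)
print(idx.decode("utf-8"))
```

To produce real files, the two byte strings can be written out with open(path, "wb"), so that no editor or encoding layer adds a byte-order mark or changes the line endings afterwards.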

Could anyone here help me?

Many thanks.
-e.

Hi,

This mailing list does not allow attachments. Have you looked at other
thesauri that work? Does yours look the same?

Best regards,
Andras

Hi Andras,

Most other thesauri are created in ISO8859-1 encoding, except the Hungarian
one, which uses UTF-8. The .dat and .idx files I created look pretty much
like the Hungarian files to me, so I'm not sure why it's not working... -e.
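
Since the files look alike by eye, a byte-level check may catch what the eye misses. Below is a minimal sketch of such a check, assuming the layout (encoding line, entry count, then "word|byte offset" lines) and assuming, as I understand the MyThes library behind "OpenOffice.org New Thesaurus", that lookups do a bytewise binary search over the index, so the index must be sorted by bytes. The function check_thesaurus and the sample bytes are hypothetical.

```python
BOM = b"\xef\xbb\xbf"  # UTF-8 byte-order mark


def check_thesaurus(dat, idx):
    """Return a list of problems in a .dat/.idx pair given as bytes."""
    problems = []
    for name, data in (("dat", dat), ("idx", idx)):
        if data.startswith(BOM):
            problems.append(f"{name}: starts with a UTF-8 byte-order mark")
        if b"\r" in data:
            problems.append(f"{name}: contains CR bytes (Windows line endings?)")

    lines = idx.rstrip(b"\r\n").split(b"\n")
    if len(lines) < 2:
        return problems + ["idx: missing encoding and/or count line"]
    # Both files should declare the same encoding on their first line.
    if lines[0].strip() != dat.split(b"\n", 1)[0].strip():
        problems.append("idx and dat declare different encodings")

    entries = lines[2:]
    count = lines[1].strip()
    if not count.isdigit() or int(count) != len(entries):
        problems.append(f"idx: count line {count!r} but {len(entries)} entries")

    words = []
    for line in entries:
        word, sep, offset = line.strip().rpartition(b"|")
        if not sep or not offset.isdigit():
            problems.append(f"idx: malformed line {line!r}")
            continue
        words.append(word)
        # The offset must land exactly on this word's "word|count" line.
        if not dat[int(offset):].startswith(word + b"|"):
            problems.append(f"idx: offset {int(offset)} for {word!r} misses its entry")
    if words != sorted(words):
        problems.append("idx: entries are not in bytewise sorted order")
    return problems


good_dat = "UTF-8\nஅழகு|1\n(noun)|எழில்|வனப்பு\n".encode("utf-8")
good_idx = "UTF-8\n1\nஅழகு|6\n".encode("utf-8")
print(check_thesaurus(good_dat, good_idx))        # → []
print(check_thesaurus(BOM + good_dat, good_idx))  # reports the BOM and the mismatches it causes
```

For the real files, the bytes can be read with open(path, "rb").read() on both the .dat and the .idx and passed in as they are.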

Did you register it with LibreOffice? I mean via dictionaries.xcu. Do
you see your thesaurus in Tools - Options - Language Settings -
Writing Aids?

Yes. I see OpenOffice.org New Thesaurus, below Hunspell SpellChecker. Both are
ticked. -e.

Andras,

I found one oddity, though. In Hungarian, which uses UTF-8, the byte offset
of the first data entry is 6, which makes sense: the "UTF-8" header line plus
its newline is 6 bytes. For English, which uses ISO8859-1, the byte offset is
10 ("ISO8859-1" plus newline). Again fine. However, in the file I generated,
the byte offset is 9. I wonder if something is wrong there...
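
The arithmetic can be checked directly. Assuming the first entry starts right after the header line, its offset is the header's byte length plus one for the newline. A 3-byte UTF-8 byte-order mark in front of "UTF-8\n" would give exactly 9. That is only one arithmetic possibility, not something confirmed in the thread; a different header string would also change the number. The helper names below are hypothetical.

```python
BOM = b"\xef\xbb\xbf"  # UTF-8 byte-order mark (3 bytes)


def first_entry_offset(dat_start):
    """Byte offset of the first entry: the header line plus its newline."""
    return dat_start.index(b"\n") + 1


def strip_bom(data):
    """Drop a leading UTF-8 byte-order mark, if present."""
    return data[len(BOM):] if data.startswith(BOM) else data


# Prints 6, 10, 9 and 7 respectively.
for label, header in [
    ("UTF-8 (Hungarian)", b"UTF-8\n"),
    ("ISO8859-1 (English)", b"ISO8859-1\n"),
    ("UTF-8 with a BOM", BOM + b"UTF-8\n"),
    ("UTF-8 with CRLF", b"UTF-8\r\n"),
]:
    print(f"{label:20} -> {first_entry_offset(header)}")
```

If a BOM does turn out to be the cause, stripping it from the .dat also shifts every entry 3 bytes earlier, so the .idx offsets would need to be regenerated rather than kept.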

-e.