Understanding Hunspell

Hi,

I'm not sure if I'm doing it right by sending this mail to l10n, but I'm hoping someone will point me in the right direction. I even tried to subscribe to the Hunspell mailing list, but with no success.

I've read the Hunspell help/documentation PDF from the internet but still have a few questions. From what that document says, I can't be certain about some specific things which are actually very important when we talk about the morphology of a highly inflected language like Croatian.

Here we go:

Let's say that the .aff file contains:

FLAG long
SFX Y A1 1
SFX Y 0 a .

FLAG long
SFX Y A2 1
SFX Y A2 0 e .

Now, A1 and A2 need to be applied to the word "jezik" (Eng. language). So if the .dic file contains "jezik/A1", I should expect the forms "jezik" and "jezika". If the .dic file contains an entry like "jezik/A2", the expected forms are "jezik" and "jezike". But if the entry is

     jezik/A1A2

I can expect the forms "jezik", "jezika" and "jezike", but will Hunspell try to combine A1 and A2 together, giving "jezikae" or "jezikea"? The forms "jezikae" and "jezikea" are not valid in Croatian. I know this can be prevented by doing

FLAG long
SFX Y A2 1
SFX Y A2 0 e [^a]

but the question is: will Hunspell combine (agglutinate) suffixes if the suffix statements don't prevent it explicitly (and so allow it implicitly)? I wasn't able to answer this by reading the manual from http://hunspell.sourceforge.net or by looking into other .aff files from the git repository.

Also, if the .dic file contains only the form "PDF", will the word "pdf" in LO be underlined as misspelled? Can a .dic file contain two words on one line?

Furthermore, I have trouble understanding the REP section in the .aff file. Can I do:

REP 1
Plitvička_Jezera Plitvička_jezera

Will that automatically correct the capital letter in the word "Jezera"?

I hope someone will give me a hand here...

Kruno

Hi Kruno,

Hi,

I'm not sure if I'm doing it right by sending this mail to l10n, but I'm hoping
someone will point me in the right direction. I even tried to subscribe to the
Hunspell mailing list, but with no success.

This is the right place for your questions.

I've read the Hunspell help/documentation PDF from the internet but
still have a few questions. From what that document says, I can't be
certain about some specific things which are actually very important when we
talk about the morphology of a highly inflected language like Croatian.

Here we go:

Let's say that the .aff file contains:

FLAG long
SFX Y A1 1
SFX Y 0 a .

FLAG long
SFX Y A2 1
SFX Y A2 0 e .

Now, A1 and A2 need to be applied to the word "jezik" (Eng. language). So if
the .dic file contains "jezik/A1", I should expect the forms "jezik" and
"jezika". If the .dic file contains an entry like "jezik/A2", the expected
forms are "jezik" and "jezike". But if the entry is

    jezik/A1A2

I can expect the forms "jezik", "jezika" and "jezike", but will Hunspell try
to combine A1 and A2 together, giving "jezikae" or "jezikea"? The forms
"jezikae" and "jezikea" are not valid in Croatian. I know this can be
prevented by doing

FLAG long
SFX Y A2 1
SFX Y A2 0 e [^a]

but the question is: will Hunspell combine (agglutinate) suffixes if the
suffix statements don't prevent it explicitly (and so allow it implicitly)?
I wasn't able to answer this by reading the manual from
http://hunspell.sourceforge.net or by looking into other .aff files from
the git repository.

The Y/N field in the header of a suffix class only says whether the
suffixes may be combined with prefixes (from a prefix class that allows
the same prefix-suffix combination). It does not make two suffix classes
listed on the same word combine with each other, so your first example
will work well (with syntax corrections):

FLAG long
SFX A1 Y 1
SFX A1 0 a .

SFX A2 Y 1
SFX A2 0 e .
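
To make the point concrete, here is a toy Python model of how the two classes above expand "jezik/A1A2". It is only a sketch of one-level suffix application under the assumption that each flag is applied to the stem independently; the names SUFFIX_CLASSES, split_long_flags and expand are made up for illustration and are not part of Hunspell.

```python
import re

# Toy copy of the corrected .aff classes: flag -> list of
# (strip, add, condition) rules. Not Hunspell's real data structures.
SUFFIX_CLASSES = {
    "A1": [("", "a", r".$")],  # SFX A1 0 a .
    "A2": [("", "e", r".$")],  # SFX A2 0 e .
}


def split_long_flags(flags):
    """With FLAG long, every flag is exactly two characters."""
    return [flags[i:i + 2] for i in range(0, len(flags), 2)]


def expand(dic_entry):
    """Return the forms a 'stem/FLAGS' entry accepts in this toy model."""
    stem, _, flags = dic_entry.partition("/")
    forms = [stem]
    for flag in split_long_flags(flags):
        for strip, add, condition in SUFFIX_CLASSES[flag]:
            # Every rule is tried against the bare stem, never against
            # a form that another suffix class has already produced.
            if re.search(condition, stem) and stem.endswith(strip):
                base = stem[:len(stem) - len(strip)]
                forms.append(base + add)
    return forms


print(expand("jezik/A1A2"))
```

In this model "jezikae" and "jezikea" never appear, because no rule is ever applied on top of an already suffixed form.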

Also, if the .dic file contains only the form "PDF", will the word "pdf" in
LO be underlined as misspelled?

Yes, "pdf" won't be accepted, only "PDF". (By the way, it's possible to
set the same strictness for lower-case words too, e.g. names of units of
measurement or currencies have to stay lower case even in capitalized text.)
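
A rough Python sketch of these case rules follows. It is a simplified model and not Hunspell's code: the function accepts and its keepcase parameter are hypothetical names, and the parameter is only meant to mimic what I believe is the KEEPCASE flag from the Hunspell manual.

```python
def accepts(dic_word, typed, keepcase=False):
    """Simplified model of which casings of a dictionary word pass."""
    if typed == dic_word:
        return True
    if keepcase:
        # Assumed KEEPCASE-like behaviour: only the exact form is valid.
        return False
    if typed == dic_word.upper():
        # An all-capitals form of a dictionary word is accepted.
        return True
    if dic_word.islower() and typed == dic_word.capitalize():
        # A lower-case dictionary word may start with a capital letter.
        return True
    return False


for dic_word, typed in [("PDF", "pdf"), ("PDF", "PDF"),
                        ("jezik", "Jezik"), ("jezik", "JEZIK")]:
    print(dic_word, typed, accepts(dic_word, typed))
print("km", "KM", accepts("km", "KM", keepcase=True))
```

The first pair is the "PDF" versus "pdf" case from the question, and the last line is the measurement-unit case.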

Can a .dic file contain two words on one line?

Only in space-separated form, and in LibreOffice this is used only for
better suggestions.

Furthermore, I have trouble understanding the REP section in the .aff file.
Can I do:

REP 1
Plitvička_Jezera Plitvička_jezera

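Will that automatically correct the capital letter in the word "Jezera"?

No, REP only affects the suggestions offered for a misspelled word.
Automatic correction comes from the autocorrect list, in the file
DocumentList.xml inside the acor_hr-HR.dat ZIP archive.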
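
If you want to look at that list without unpacking the archive by hand, something like the following Python sketch may help. It assumes DocumentList.xml uses block elements with abbreviated-name and name attributes in the http://openoffice.org/2001/block-list namespace, so please check that against your own file. The function name read_autocorrect is made up for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumed XML namespace of DocumentList.xml; verify against a real file.
NS = "http://openoffice.org/2001/block-list"


def read_autocorrect(dat_file):
    """Return {typed text: replacement} from an acor_*.dat ZIP archive.

    dat_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(dat_file) as archive:
        root = ET.fromstring(archive.read("DocumentList.xml"))
    pairs = {}
    for block in root.iter(f"{{{NS}}}block"):
        typed = block.get(f"{{{NS}}}abbreviated-name")
        replacement = block.get(f"{{{NS}}}name")
        if typed is not None and replacement is not None:
            pairs[typed] = replacement
    return pairs


# Example use (the path is whatever your profile or install uses):
# pairs = read_autocorrect("acor_hr-HR.dat")
# print(len(pairs), "autocorrect entries")
```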

Best regards,
László