Help with Lightproof localisation for LibreOffice

Hi there,

I am the owner of the Tamil localisation effort, and am creating the
grammar checker for LibreOffice using Lightproof. I am having trouble
matching the diacritic marks that are so common in Tamil. For example --

\b(த[ா-ௌ]*\S*)\b

will match தாலம but not தாலம்

I would like to match the whole word, including the diacritic mark; but I'm
not sure how to trap it.

I would appreciate hearing from anyone who has faced a similar problem
for their language and has solved it.

Cheers,
Elanjelian

Hi,

AFAIK it's a known bug in Python 2, which doesn't fully support Unicode. You would need to switch to Python 3 to process your language without this kind of problem.

I'm not sure when Lightproof will get Python 3 support; László can probably tell you more.
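
In the meantime, here is a possible workaround, offered only as a sketch. It assumes the cause is that Python's re module does not count the pulli (U+0BCD, general category Mn) as a word character, so \b finds a "boundary" between ம and ் and the match stops there. The variant below drops \b and delimits the word with whitespace lookarounds instead. The names WITH_B, WITH_LOOKAROUND and first_word are made up for illustration, and I have not checked whether Lightproof's rule syntax passes lookbehind through unchanged.

```python
import re
import unicodedata

# The pulli (U+0BCD) is a nonspacing mark, not a letter.
PULLI_CATEGORY = unicodedata.category("\u0bcd")

# Pattern from the original question, delimited with \b.
WITH_B = re.compile(r"\b(த[ா-ௌ]*\S*)\b")

# Variant (an assumption, not Lightproof's own method): a word starts
# where the previous character is not non-whitespace, and ends where
# the next character is not non-whitespace.
WITH_LOOKAROUND = re.compile(r"(?<!\S)(த[ா-ௌ]*\S*)(?!\S)")


def first_word(pattern, text):
    """Return the first captured word in text, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = "இது தாலம் ஆகும்"
    print(PULLI_CATEGORY)
    print(first_word(WITH_B, sample))
    print(first_word(WITH_LOOKAROUND, sample))
```

One caveat: because the pattern still uses \S*, punctuation attached to the word (a trailing comma or full stop) ends up inside the match, so a real rule may need an explicit character class for Tamil letters and signs instead.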