gettext and translations

Hi all,

I invite all of you to read this message from Caolán. It's a
discussion, so please share your concerns, your solutions, your views :slight_smile:

https://lists.freedesktop.org/archives/libreoffice/2017-May/077818.html

For those who translate outside of Pootle: msgctxt is to be removed. If you
need it, just tell me how you use it.

Thanks a lot in advance,
Cheers
Sophie

Hi, Sophie,

Regarding 16 and 18, I think these are important, or even the most
important, ones.

18 is the crucial one and should be tried out in a separate environment, with
an extra instance of Pootle, to test what really happens. It should not be
introduced without testing it on real-world examples.

As for 16, I do think it might be important, if I understand correctly how
the gettext functions work (and the scripts using these commands, like
pomigrate2 etc.). In Slovenian, and probably in other languages, the same
English string is sometimes translated differently. Some languages use cases,
so a word that has one form in English has several forms in other
languages. Also, some English words have different meanings/translations
in other languages depending on context. With the change in 16, I fear that
the translation systems will offer, or even automatically use, the first
available translation of the English string instead of offering a fuzzy string
for translators to check whether the translation is right (i.e. they will no
longer distinguish between different translations of the same English string,
a distinction that is now defined by this part of the po file).

As I use my own translation system, built on basic Translate Toolkit
commands and scripts, I am ready to try your test po sets on my
translation memory to see what happens.

Also, if this goes through, the LO version for it should be decided in
advance; 5.4 is certainly not the one for this.

Lp, m.
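
Martin's worry about 16 can be illustrated with a minimal Python sketch. It is not how gettext, Pootle or pomigrate2 are implemented; it only shows what happens to a lookup table when the context is dropped from the key. The context labels ("noun", "menu item") are made up for illustration, and the two Bulgarian forms are borrowed from Mihail's example further down this thread.

```python
# Hypothetical entries: (context, English source, translation).
ENTRIES = [
    ("noun", "Number", "Номер"),
    ("menu item", "Number", "Номериране"),
]


def build_catalog(entries, use_context):
    """Build a lookup table keyed with or without the context."""
    catalog = {}
    for context, source, translation in entries:
        key = (context, source) if use_context else source
        # setdefault keeps the first translation seen for a key,
        # mimicking "use the first available translation".
        catalog.setdefault(key, translation)
    return catalog


with_context = build_catalog(ENTRIES, use_context=True)
without_context = build_catalog(ENTRIES, use_context=False)

print(len(with_context))          # both entries stay independent
print(len(without_context))       # the two entries collapse into one
print(without_context["Number"])  # only the first translation survives
```

With the context in the key, both translations remain reachable; without it, the second one is silently shadowed by the first.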

Hi Sophie, *,

I read Caolán's mail, and while removing clutter is good, I'm really
worried by this proposed (impending?) removal of msgctxt. Namely, does
removing it mean that e.g. "Left", "Edit", "Number", "Print", "None" etc.
could no longer be translated differently (depending on the string's
precise context) within the same module? How "big" would a module be in the
proposed new system anyway? Would all strings in the same
toolbar/panel/dialog/menu/etc. be one module, or all strings in the same
component, like the whole of Writer?

If the answer to my first question is yes, then it would affect the more
inflected languages quite badly - and very, very badly if the answer to my
second question is anything that encompasses more than one single toolbar,
or one single panel, or dialog, or menu. The exact maximum size of one
module will vary by language.

Also, would removing msgctxt really only affect those of us who translate
offline? Doesn't Pootle distinguish strings based on that as well?

In any case, as has been said time and time again: especially for short
strings (one or two words), each and every appearance of the string HAS TO
BE independently translatable, no matter if the string is the same in English.
Consider these, for example:
"Number" - is it a noun or a verb?
"None" - which gender, number, case, etc. does it have?
"Open/Save/Print" - is it a dialog title or a button?

These are just a few of the most basic examples off the top of my head. There are
many, many more such strings out there in the Estonian translation alone -
and Estonian doesn't even have the grammatical gender distinction that e.g.
the Slavic languages have. Ask translators of e.g. Slovenian, Bulgarian,
Polish or Russian how many times they have had to fight in various
projects for splitting strings that are identical in English, but need e.g.
different gender forms in translation.

Anyway, since you asked especially for offline use cases, here's mine: when
updating the translation after several thousand words go fuzzy after
each major release, I use msgctxt to quickly identify the (basic) context
of the string - is it a dialog title, label, menu item with/without context,
etc. - and match it to a suitable TM suggestion. If the location of the
string remains unclear from msgctxt, or if it's a new feature, I go find
the actual location based on the Key-ID. If I had to find every last one of
the fuzzies via Key-ID, translating LibreOffice would take a lot more time
and effort, and cause a lot more frustration.

Best regards,
Mihkel
Estonian team
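
The offline workflow Mihkel describes (sorting fuzzy entries by their msgctxt before matching them to TM suggestions) could look roughly like the Python sketch below. It is only an illustration under several assumptions: the PO reader is a deliberately tiny one written for this example (no escapes, plurals or obsolete entries), the context labels are hypothetical and do not reflect LibreOffice's actual msgctxt layout, and the msgstr values are placeholders.

```python
import re
from collections import defaultdict

# Hypothetical sample; contexts and translations are placeholders.
SAMPLE_PO = '''\
#, fuzzy
msgctxt "dialog title"
msgid "Print"
msgstr "(old translation A)"

#, fuzzy
msgctxt "button"
msgid "Print"
msgstr "(old translation B)"

msgctxt "button"
msgid "Save"
msgstr "(translation C)"
'''


def parse_po(text):
    """Tiny PO reader: returns dicts with flags, msgctxt, msgid, msgstr."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        entry = {"flags": set(), "msgctxt": None, "msgid": "", "msgstr": ""}
        field = None
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("#,"):
                entry["flags"].update(f.strip() for f in line[2:].split(","))
            elif line.startswith("#") or not line:
                continue
            elif line.startswith('"') and field:
                entry[field] += line[1:-1]  # continuation line
            else:
                keyword, _, rest = line.partition(" ")
                if keyword in ("msgctxt", "msgid", "msgstr"):
                    field = keyword
                    entry[field] = rest.strip()[1:-1]
                else:
                    field = None
        entries.append(entry)
    return entries


def fuzzy_by_context(entries):
    """Group the msgids of fuzzy entries by their msgctxt."""
    groups = defaultdict(list)
    for entry in entries:
        if "fuzzy" in entry["flags"]:
            groups[entry["msgctxt"]].append(entry["msgid"])
    return dict(groups)


groups = fuzzy_by_context(parse_po(SAMPLE_PO))
for context, msgids in groups.items():
    print(context, msgids)
```

Without msgctxt, both fuzzy "Print" entries would land in a single undifferentiated group, and each would have to be located via its Key-ID instead.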

Big +1 for what Martin and Mihkel have said about matching existing translations and not losing the ability to translate to different targets depending on the context.

If there is going to be wholesale re-organisation as to how the strings are presented for translation, two other considerations also spring to mind which I didn't see on the list:
1) Plurals? I think LO still doesn't do plurals properly (but I may be confusing projects here, apologies if it does). This would also tie in with the 1 source string » multiple target strings issue.
2) Turning en-US into a to-be-translated locale and making the source strings just en or some fake locale? That way, en-US can change case, correct en-US typos and other such stuff which is English specific to its heart's content without hitting all the other locales at the same time.

Michael

I've used the .PO-based workflow from the beginning of my OOO/LO L10N stint, and yes, you'd get those problems in such an environment.

You'd just have to keep the IDs for the strings' translation variants, exceptions, etc. separately.

That was how I dealt with the problem, anyway. Last time I looked, there was no easy way to save this in .PO files created from the POT sets published by the OOO/LO teams. I can't rightly remember, but it seems the extra info was lost in migration from POT set to POT set.

-Yury

Another big +1 for what Martin and Mihkel have said about matching
existing translations and not losing the ability to translate to
different targets depending on the context.

It would be another bad thing if we lost the qtz (KeyIDs) pseudo-locale,
which lets us tell one entry from another.

Hello,

Yes, Bulgarian is also one of those inflected languages that require different translations of the same English phrase depending on the context. If we lose msgctxt, we must have another way to separate different instances of the same English phrase. Years ago at my workplace, we had this problem with an in-house l10n system and AFAIR we circumvented it by including the context in the original translatable string where necessary and then removing it in a small English ‘translation’, something like this:

Original: Number|noun
English translation: Number
Bulgarian translation: Номер

Original: Number|menu_item
English translation: Number
Bulgarian translation: Номериране

Original: Settings
English translation: <same as the original, so not included in the l10n file>
Bulgarian translation: Настройки

So this could be a fallback if the new system does not support a context/instance ID separate from the translatable text itself. The UI code could also automatically strip the part after the escaping character, thus eliminating the need for an English ‘translation’.

I rarely download POs to translate offline, but even in Pootle, msgctxt is often useful to me to decide how to translate short ambiguous phrases instead of searching for them visually in the UI.

Cheers,
Mihail
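
Mihail's fallback, where the UI code strips the part after the escaping character, might be sketched in Python as follows. This is an assumption-laden illustration, not the in-house system he mentions: the "|" separator is taken from his "Number|noun" example, and the function names are hypothetical.

```python
SEPARATOR = "|"  # assumed escaping character, as in "Number|noun"


def strip_context(source):
    """Return the visible text, dropping any '|context' suffix."""
    text, _, _context = source.partition(SEPARATOR)
    return text


def translate(source, catalog):
    """Look up the full source string (context included).

    If there is no translation, fall back to the English text with the
    context stripped, so no English 'translation' file is needed.
    """
    return catalog.get(source, strip_context(source))


# The three entries from the example above.
BG = {
    "Number|noun": "Номер",
    "Number|menu_item": "Номериране",
    "Settings": "Настройки",
}

print(translate("Number|noun", BG))
print(translate("Number|menu_item", BG))
print(translate("Number|menu_item", {}))  # untranslated: plain English
```

One limitation of this sketch: a source string that legitimately contains "|" would be cut at the first separator, so a real implementation would need an escaping rule.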

Hi,

While I appreciate the ingenuity of this approach, it would be utter
stupidity to replace a well-working implementation with msgctxt with such a
system in a l10n project the size of LibO :slight_smile:

Best, Mihkel

If I read Caolán's post correctly, it would not be a problem to preserve
msgctxt in the new situation.

ciao,
Cor

Hi all,

Thanks to all for your feedback. We are well aware of the need for
context; as Cor said, msgctxt could be preserved. Also, before going live
with it, several tests will be run to make sure the situation is
OK for us and doesn't mean more work on our side.

Cheers
Sophie

Hi,
one thing I miss in the current system is the ability to
translate plurals correctly. gettext supports that. It might be
necessary to change something in the code, but I think it is worth it.
Please think about this.
best
Milos
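
To make the plural point concrete, here is a small Python sketch of how a gettext-style plural rule selects among several target forms. Milos does not say which language he translates into; Czech is used here purely as an example, with the three-form rule that the gettext manual lists for Czech and Slovak. The `FORMS` list stands in for the `msgstr[0]`, `msgstr[1]`, `msgstr[2]` lines of a PO entry, and the function names are hypothetical.

```python
def czech_plural_index(n):
    """Plural-Forms: nplurals=3; plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2"""
    if n == 1:
        return 0
    if 2 <= n <= 4:
        return 1
    return 2


# Stand-in for one PO entry:
#   msgid "%d file" / msgid_plural "%d files"
#   msgstr[0], msgstr[1], msgstr[2]
FORMS = ["%d soubor", "%d soubory", "%d souborů"]


def ngettext_sketch(n):
    """Pick the plural form for n and fill in the number."""
    return FORMS[czech_plural_index(n)] % n


for count in (1, 3, 5):
    print(ngettext_sketch(count))
```

English needs only two forms here, which is why a source-side singular/plural pair is not enough for such languages.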

Hi All

I agree with Mihkel's point: msgctxt often gives useful hints to translators, especially to those who translate offline.
Of course, if the LibreOffice developers don't use it, it would be of no use to us.

What I hope for is the following; look at this example taken from KDE:

#. +> trunk5
#: plugins/generic/skg_advice/skgadviceboardwidget.cpp:40
#, kde-format
msgctxt "Dashboard widget title"
msgid "Advices"
msgstr "Suggerimenti"

#. +> trunk5
#: plugins/generic/skg_advice/skgadviceboardwidget.cpp:57
#, kde-format
msgctxt "Noun, a user action"
msgid "Activate all advice"
msgstr "Attiva tutti i suggerimenti"

This is very useful for us :slight_smile:

Ciao
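
On the code side, the KDE-style entries quoted above correspond to lookups that pass the context together with the message. The Python sketch below mimics this with a toy catalog; it is an assumption for illustration only and says nothing about how LibreOffice or KDE load their catalogs. `DictTranslations` is a hypothetical class; `pgettext(context, message)` is the method name Python's standard `gettext` module uses for context-aware lookups (available since Python 3.8).

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Toy catalog keyed by (msgctxt, msgid); not a real .mo loader."""

    def __init__(self, entries):
        super().__init__()
        self._entries = entries

    def pgettext(self, context, message):
        # Fall back to the untranslated message when nothing matches.
        return self._entries.get((context, message), message)


# The two entries from the KDE example above.
IT = DictTranslations({
    ("Dashboard widget title", "Advices"): "Suggerimenti",
    ("Noun, a user action", "Activate all advice"):
        "Attiva tutti i suggerimenti",
})

print(IT.pgettext("Dashboard widget title", "Advices"))
print(IT.pgettext("Unknown context", "Advices"))  # falls back to English
```

Because the context is part of the key, the same msgid could carry a different translation under another msgctxt without any clash.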