Quality in l10n

Hello

As a translator for LibreOffice since 2003, I often see less than optimal
English strings to translate, and I believe other translators also have
the perception that the original English can be improved (beyond typos).

Although the English strings originate from the developers, who are
focused on producing code and fixing bugs more than on writing
beautiful English, it may be necessary that we, the community, start
reviewing the English wording as a continuous process.

See for example the use of buttons that have an ellipsis (...) as their
string. What is the meaning of the ellipsis? It may vary a lot depending
on the context: it can open a file, open a new dialog, expand the currently
active dialog, etc. For the layman, the ellipsis can go unnoticed quite often.

So, for example, my plea is to replace the ellipsis (...) with a more
meaningful string such as "More...", "Browse...", "Open...", etc. Many more
strings should be reviewed, because the English is often extremely
concise, word-saving and ambiguous, which leaves translators struggling
to find the exact meaning of the feature.

That may be an easy hack, or even a task for a skilled non-programmer
linguist, a sort of translating en-US to en-Intelligible...

Kind regards

Note: I am often challenged by the translation of features done in Office,
and for my personal evil satisfaction, I am happy to see they also miss
the target completely. :slight_smile:

--
Olivier Hallot
Founder, Board of Directors Member - The Document Foundation
The Document Foundation, Kurfürstendamm 188, 10707 - Berlin, Germany
Gemeinnützige rechtsfähige Stiftung des bürgerlichen Rechts
Legal details: http://www.documentfoundation.org/imprint
LibreOffice translation leader for Brazilian Portuguese
+55-21-8822-8812

Hi :slight_smile:
+1
It certainly seems that way. However it also seems that the number of
troublesome strings is remarkably small. People do sometimes post
such strings to this list and various people are here ready to discuss
what they think it means. Although some of it might look quite rough,
it seems to be sufficiently obvious what is intended in most strings,
so the number of posts to this list is remarkably small.

The documentation team would probably find it too tough to help with
anything like this. It's too technical and they can't handle any type
of coding such as tags or anything. Recently they began to be able to
edit wikis but that is mostly thanks to the efforts of a couple of
people from the translators team joining their team and showing that
it's really not as tough as they imagined. On the other hand a couple
of native English speakers have joined this L10n mailing list and have
been learning bits&bobs. Still, we are not very good at handling
strings with coding in them but are becoming more familiar with it as
we see it more often. So, please be gentle with us and until we
become more familiar with all this please post strings or, even
better, entire paragraphs and we can try to let you know what we
think.

There was such a thread a few days ago but i got a bit muddled and
other people were giving better answers anyway. Feel free to post
more!

Btw i am never sure about what people mean by "Office" these days.
When i say it, i mean LibreOffice. If someone uses it to mean MS
Office then i usually have to ask them which version because there are
so many inconsistencies between the different versions. With
LibreOffice it seldom matters which version.

Regards from
Tom :slight_smile:

Hello

As a translator for LibreOffice since 2003, I often see less than
optimal English strings to translate, and I believe other
translators also have the perception that the original English can
be improved (beyond typos).

Although the English strings originate from the developers, who
are focused on producing code and fixing bugs more than on
writing beautiful English, it may be necessary that we, the
community, start reviewing the English wording as a continuous
process.

Yes, and not only for localization's sake, but also for the quality of
the en_US version. There are several menus/dialogs where camel case is
not used, where the '&' character is used, where ellipses are not
useful, where Tab/Pane/Deck are used for the same thing, etc.
I thought that Glade would bring more quality, but what is missing the
most is maybe a reference glossary for developers (or they have one
that maybe should be updated).

See for example the use of buttons that have an ellipsis (...) as
their string. What is the meaning of the ellipsis? It may vary a lot
depending on the context: it can open a file, open a new dialog,
expand the currently active dialog, etc. For the layman, the
ellipsis can go unnoticed quite often.

So, for example, my plea is to replace the ellipsis (...) with a more
meaningful string such as "More...", "Browse...", "Open...", etc. Many
more strings should be reviewed, because the English is often
extremely concise, word-saving and ambiguous, which leaves
translators struggling to find the exact meaning of the feature.

That may be an easy hack, or even a task for a skilled
non-programmer linguist, a sort of translating en-US to
en-Intelligible...

Yes, you're right, and Glade should ease the task here because all the
dialogs are in the same place and easy to review one by one. But for
me the first task would be to set up a reference glossary, or update it
if it already exists.

Cheers
Sophie

Hi Sophie,

I thought that Glade would bring more quality, but what is missing the
most is maybe a reference glossary for developers (or they have one
that maybe should be updated).

Although a good idea in theory, I don't think developers will spend time
checking the glossary. But yes, the glossary should be reviewed for
linguistic harmonization.

I think this is a job for a skilled linguistic community individual
(with ideally a very good knowledge of technology, engineering,
statistics, printing industry, DBA, mathematics, book editing, digital
design, image processing...).

Kind regards
--
Olivier Hallot

Three months ago I asked a similar question here. Precisely: what to
do with an English string that could be improved[0]? I got a few other
suggestions on how to improve that string, but nothing was done.

Also, the Polish translation is periodically changed without our knowledge
and consent. The last time, as I remember, someone replaced all "Liczba"
with "Ilość" (the first one is for countable nouns, the second for
uncountable; it's like changing "many" to "much" in English). There were
some releases that embarrassed the Polish translation team before we
learned about this change.

Why do I say that? Because these are the reasons that make
me not believe in the success of this initiative. Translating LibreOffice
was thankless in the past, and I think that this time it will not only
be an enormous task, but also no one will say "thank you", and in a few
months someone will carelessly waste all that has been accomplished.

I don't mean to discourage anyone from helping, but I do know that
there are people here that feel the same way.

Nevertheless, I wish you the best of luck and I would love to be proved wrong.

[0]
http://nabble.documentfoundation.org/libreoffice-l10n-quot-Selection-from-quot-in-Pivot-table-propose-of-changing-English-string-td4072596.html

Hi Mirosław,

Although the English strings originate from the developers, who
are focused on producing code and fixing bugs more than on
writing beautiful English, it may be necessary that we, the
community, start reviewing the English wording as a continuous
process.

Three months ago I asked a similar question here. Precisely: what to
do with an English string that could be improved[0]? I got a few other
suggestions on how to improve that string, but nothing was done.

That's actually the issue. Andras is helping a lot here, but he is not
the one responsible for the en_US version. And I don't think anybody is,
in fact.

Also, the Polish translation is periodically changed without our knowledge
and consent. The last time, as I remember, someone replaced all "Liczba"
with "Ilość" (the first one is for countable nouns, the second for
uncountable; it's like changing "many" to "much" in English). There were
some releases that embarrassed the Polish translation team before we
learned about this change.

I believe you, but I don't know how that can happen. I understand it's
really frustrating, and that should never happen again. Please tell us
here if you see some strings changed again; we should seriously
investigate this.

Why do I say that? Because these are the reasons that make
me not believe in the success of this initiative. Translating LibreOffice
was thankless in the past, and I think that this time it will not only
be an enormous task, but also no one will say "thank you", and in a few
months someone will carelessly waste all that has been accomplished.

There is something I don't agree with: nobody will do that carelessly.
Don't think that if a mistake was made, it was on purpose or through
carelessness. Also, even if you don't hear a thank you, be sure that
I'm personally thankful to the l10n team for being so patient and for
working so hard; that makes me especially proud to be part of this
team. And if this can be of help, I'm alone in translating the FR version
and get very, very few 'thank you's from the FR community, but never
mind, I do this work for the project itself, not for one community :wink:

I don't mean to discourage anyone from helping, but I do know that
there are people here that feel the same way.

What is important is to try to solve what can be frustrating for us and
not to stay with this frustration. A situation can always be improved, but
for that we have to know about it and take action on it. This is Olivier's
proposal and what we should work on. The same goes for your issue about
changes to the strings in Pootle: if it happens again, we have to
investigate and correct what went wrong.

Nevertheless, I wish you the best of luck and I would love to be proved wrong.

Thanks a lot, same for you and thank you for your participation here :slight_smile:

Cheers
Sophie

Hmm. Sophie - should we consider spinning-up a community for en_US?
I'm not sure I have much time to spend on it right now, but I could
perhaps help out a little here and there...

Best,
--R

Robinson Tryon wrote:

the one responsible for the en_US version. And I don't think anybody
is, in fact.

Hmm. Sophie - should we consider spinning-up a community for en_US?
I'm not sure I have much time to spend on it right now, but I could

a) Can somebody lay out precisely what an en_US l10n team/group/individual would do?

b) Is there an existing en_## L10N group that could assume the responsibilities, duties, etc of an en_US L10N group?

jonathon

Hi Jonathon,

Robinson Tryon wrote:

the one responsible for the en_US version. And I don't think
anybody is, in fact.

Hmm. Sophie - should we consider spinning-up a community for
en_US? I'm not sure I have much time to spend on it right now, but
I could

a) Can somebody lay out precisely what an en_US l10n
team/group/individual would do?

As I said, I think the first thing is to have an up-to-date glossary,
then check for consistency in the menus/dialogs/tabs, check the
camel case use, check that the right terms are used for the right
functions, actions, etc. and are consistent with the terms already used.
Check that the help buttons lead to help files, check that the help
files are up to date, etc. I have more but don't want to frighten you
with the tasks :wink:

b) Is there an existing en_## L10N group that could assume the
responsibilities, duties, etc of an en_US L10N group?

There is an en_GB group, but we are speaking about en_US, which is the
source for all languages. There is no en_US l10n group because this is
in fact the developer team, but few of them are native en_US speakers.
Note that in the OOo days, the linguist (Liz Matthis) was German, but
she was a good linguist.

So to answer Robinson and you, yes, we need somebody able to check the
en_US version. I'm ready to help at each step.

Cheers
Sophie

a) Can somebody lay out precisely what an en_US l10n
team/group/individual would do?

As I said, I think the first thing is to have an up-to-date glossary,
then check for consistency in the menus/dialogs/tabs, check the
camel case use, check that the right terms are used for the right
functions, actions, etc. and are consistent with the terms already used.
Check that the help buttons lead to help files, check that the help
files are up to date, etc. I have more but don't want to frighten you
with the tasks :wink:

oh, is that all?

:stuck_out_tongue:

So to answer Robinson and you, yes, we need somebody able to check the
en_US version. I'm ready to help at each step.

Awesome.

Sophie: please point us at whatever docs/intro info you have and we
can get started stubbing-in wiki pages, creating a list of TODO items,
etc. Feel free to poke us every week or so if you need something more
from us.

Thanks,
--R

Sophie wrote:

a) Can somebody lay out precisely what an en_US l10n team/group/individual would do?

As I said, I think the first thing is to have an up-to-date glossary, then check for consistency in the menus/dialogs/tabs, check the camel case use, check that the right terms are used for the right functions, actions, etc. and are consistent with the terms already used.

Seems to me that a lot of that could be checked by a script. Some of it should be flagged when testing using the various screen readers. (I am making some very broad assumptions about the extent of a11y testing. Starting with using a box that literally has no monitor, keyboard, or mouse hooked up to it, when checking to see what functionality broke this time around.)
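
As a rough illustration of the kind of script meant here, the following Python sketch flags three of the issues raised in this thread: labels that consist only of an ellipsis, strings containing '&', and terms that a glossary says should be replaced. The glossary contents, the function names and the string ids are all made up for the example; in particular, treating "Deck" as the preferred term is an arbitrary assumption, and a word like "Tab" will produce false positives that a human still has to review.

```python
import re

# Hypothetical glossary: preferred term -> variants to flag.
# The thread only says Tab/Pane/Deck are used for the same thing;
# choosing "Deck" as the preferred term is an assumption for this sketch.
GLOSSARY = {"Deck": ["Pane", "Tab"]}

# A label that is nothing but "..." or the single-character ellipsis.
BARE_ELLIPSIS = re.compile(r"^\s*(\.\.\.|…)\s*$")


def check_string(text):
    """Return a list of human-readable warnings for one UI string."""
    warnings = []
    if BARE_ELLIPSIS.match(text):
        warnings.append("label is only an ellipsis")
    if "&" in text:
        warnings.append("contains '&'")
    for preferred, variants in GLOSSARY.items():
        for variant in variants:
            # Whole-word match only, so "Table" does not trigger "Tab".
            if re.search(r"\b%s\b" % re.escape(variant), text):
                warnings.append(
                    "uses '%s'; glossary prefers '%s'" % (variant, preferred)
                )
    return warnings


def check_all(strings):
    """Map string id -> warnings, keeping only strings with warnings."""
    report = {}
    for string_id, text in strings.items():
        found = check_string(text)
        if found:
            report[string_id] = found
    return report
```

Feeding it a dictionary of string ids and en_US texts (however those are extracted) would give a list of candidates for a native speaker to look at, not a verdict.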

Check that the help buttons lead to help files, check that the help files are up to date, etc.

IOW, a lot of grunt work that probably could have been automated years ago, but wasn't.

b) Is there an existing en_## L10N group that could assume the responsibilities, duties, etc of an en_US L10N group?

There is an en_GB group, but we are speaking about en_US, which is the source for all languages. There is no en_US l10n group because this is in fact the developer team,

My thinking was that the en_## L10N group could slipstream this into their localization work.

good linguist.

You don't want a linguist here. You want somebody who suffers from an acute case of monolingualism.

jonathon

That's probably Pootle. It can grandiosely screw up on merges/updates,
especially when strings with alternative translations are involved. This,
and the fact that it "forgets" translations because it has pretty much
nonexistent fuzzy matching, is the main reason I keep a local SVN repository
of all the Pootle projects I work on. Diffing PO files is a pain, but still
better than having broken translations and having to repeatedly retranslate
something I already translated.
It's still possible this wasn't the case, of course. Newer versions of
Pootle include a timeline function that shows who changed the string and
how. If it doesn't contain a corresponding entry, Pootle is to blame;
if it does, well, you can lynch the guilty individual to your heart's
content.
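
For what it's worth, comparing two snapshots of a PO file does not have to be a line-by-line diff. The Python sketch below, which is only an illustration and not anything Pootle provides, parses two versions of a PO file and lists the entries whose translation changed. It assumes simple entries only: plural forms, msgctxt and obsolete entries are ignored, and the function names are invented for the example.

```python
import ast


def _unquote(quoted):
    # PO strings use C-style escapes; ast.literal_eval copes with the
    # common ones (\n, \t, \", \\), which is enough for this sketch.
    return ast.literal_eval(quoted)


def parse_po(text):
    """Parse PO text into {msgid: msgstr}, skipping the header entry."""
    entries = {}
    fields = {"msgid": None, "msgstr": None}
    current = None  # which field a continuation line belongs to

    def flush():
        if fields["msgid"]:  # header has an empty msgid, so it is skipped
            entries[fields["msgid"]] = fields["msgstr"] or ""

    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("msgid "):
            flush()
            fields["msgid"], fields["msgstr"] = _unquote(line[6:]), None
            current = "msgid"
        elif line.startswith("msgstr "):
            fields["msgstr"] = _unquote(line[7:])
            current = "msgstr"
        elif line.startswith('"') and current:
            fields[current] += _unquote(line)  # multi-line string
        else:
            current = None  # comment, blank line or unsupported keyword
    flush()
    return entries


def changed_translations(old_text, new_text):
    """Return {msgid: (old_msgstr, new_msgstr)} where the translation differs."""
    old, new = parse_po(old_text), parse_po(new_text)
    return {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}
```

Run against the copy in a local repository and a fresh download, this would surface a mass replacement such as "Liczba" to "Ilość" as a short list of changed entries.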

Really Sophie? So how do you feel about the fact that we are translating
the Sidebar a second time? Michael Meeks committed the Sidebar code to the
4.1 branch at the last moment, and on the mailing list he wrote: "I
anticipate all manner of problems with it." (source:
http://nabble.documentfoundation.org/Libreoffice-qa-what-to-do-with-AOO-Sidebar-experimental-feature-in-libreoffice-4-1-master-and-the-4-h-td4057181.html).
This piece of functionality was hidden from users, so developers gave
us an extra job to do, though it was unneeded. Half a year later they
converted the Sidebar to Widget Layout and now we must do our work again.
When did we lose our common sense?

Perhaps no one did it on purpose, but this style of developing software
is not right, I think.

And the en-US team needs a glossary, surely. I see many inconsistent phrases
like "*Please consider restart LibreOffice* to set new features".

P.S. LibreOffice is just another tool. The better way is to work for
people, not for things.

Soooo..what's the best way for me to identify and squash bugs in
strings like this? Should I just keep an eye on core.git? Perhaps
look at the compiled list of strings?

Thanks,
--R

During alpha and beta testing, and during the translation process, dozens
of people read the en-US strings. Some people report typo bugs either in
Bugzilla or by e-mail, and those bugs are fixed within hours. I don't
think we need to set up processes, e.g. formal review, UI committee,
approval, etc. (we had these in the old OOo era); we just need to
exercise the "many eyeballs" principle better. It is better to report
a typo multiple times than never.

Cheers,
Andras

Hi Andras,

And the en-US team needs a glossary, surely. I see many inconsistent phrases
like "*Please consider restart LibreOffice* to set new features".

Soooo..what's the best way for me to identify and squash bugs in
strings like this? Should I just keep an eye on core.git? Perhaps
look at the compiled list of strings?

During alpha and beta testing, and during the translation process, dozens
of people read the en-US strings. Some people report typo bugs either in
Bugzilla or by e-mail, and those bugs are fixed within hours. I don't
think we need to set up processes, e.g. formal review, UI committee,
approval, etc. (we had these in the old OOo era); we just need to
exercise the "many eyeballs" principle better. It is better to report
a typo multiple times than never.

I agree about no need for formal review, processes etc.
However there is a need for a check of the quality of the en_US version,
per the original request from Olivier.
And it is really time consuming and error prone for localizers when
they cannot rely on a clear and understandable explanation, or when
several strings are used for the same dialog, button, action, etc.
Of course, for the typo etc, during the translation time, no problem to
rely on the l10n team to correct them (and thank you for fixing so
fast), but that does not solve the problem of the overall quality of the
en_US version.
And I know that I don't propose any satisfying solution for developers
or localizers :slight_smile: /me still thinking about it.

Cheers
Sophie

Hi,

I agree with Sophie on this, some kind of overall check.

Since there are tools like LanguageTool, maybe there is a way to automate
this:
1) set up a local LT server
2) send all LO strings, with identifiers for later reference and without
the XML and other tags, to the LT server and log all errors reported
(including spell-checking ones)
3) manually go through the errors reported and extract only those
that represent a true error
4) native speakers suggest changes
5) if necessary, the proposed changes are checked by a team of non-techie
native speakers
6) fix the errors as proposed in 4) and as confirmed in 5)
I see this as a process that could be finished in a release cycle, i.e. for
the next release (be it 4.3 or whatever).

I am not sure, but maybe this is feasible: can we automate the LT check
via its server for any strings changed in the code when checking them in
(at least so that the errors found are part of the check-in log, so they
can later be parsed and checked en masse by native speakers)? This way all
string check-ins/changes after the full cleanup (steps 1-6) would be
monitored.
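
Steps 1 and 2 above could look roughly like the following Python sketch. It assumes a LanguageTool HTTP server is already running locally and reachable at the URL below (the host and port are assumptions), strips XML-style tags with a simple regular expression, and posts each string to LanguageTool's /v2/check endpoint. The helper names and the log format are invented for this example, and real LO strings would need more careful tag and placeholder handling.

```python
import json
import re
import urllib.parse
import urllib.request

# Assumption: a LanguageTool server is listening locally on this port.
LT_URL = "http://localhost:8081/v2/check"

TAG = re.compile(r"<[^>]+>")


def strip_markup(text):
    """Remove XML-style tags so LanguageTool only sees the prose."""
    return TAG.sub("", text)


def build_request(text, language="en-US"):
    """Build the POST request for the /v2/check endpoint."""
    data = urllib.parse.urlencode(
        {"text": strip_markup(text), "language": language}
    )
    return urllib.request.Request(LT_URL, data=data.encode("utf-8"))


def summarize(string_id, response):
    """Turn a decoded /v2/check response into log lines tagged with the id."""
    return [
        "%s: %s (offset %d, rule %s)"
        % (string_id, match["message"], match["offset"], match["rule"]["id"])
        for match in response.get("matches", [])
    ]


def check_strings(strings):
    """Send each string to the server and collect log lines.

    Needs a running LanguageTool server; nothing is sent until called.
    """
    log = []
    for string_id, text in strings.items():
        with urllib.request.urlopen(build_request(text)) as reply:
            log.extend(summarize(string_id, json.load(reply)))
    return log
```

The resulting log is what steps 3 to 5 would then work through by hand; the same check_strings call could in principle be run on only the strings touched by a check-in.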

Probably this is science fiction.

Regards, m.