LibreOffice in Chinese (China)

Hi,

I would like to help coordinate the Chinese (China) translation of LO,
if that is OK. I currently maintain GNOME's Chinese (China) translation
as a committer, but I have never contributed to OOo before.

As far as I know, the Chinese translations of OOo are managed by the Sun
G11n group, so community involvement is somewhat difficult. OOo in
Chinese (China) uses a translation style/guideline quite different from
that of other mainstream projects like GNOME/KDE/TP. I would like to
change this by revising the old translations and settling new
translations in the same style as the projects mentioned above.

I need some help finding the translation files. There is only a wiki
page[1] about the additional strings we'd translate for LO. Will we
maintain our own version of all translatable strings, or just the
strings from the additional LO features?

[1]http://www.freedesktop.org/wiki/Software/LibreOffice/i18n/

Hi Aron,

2010.11.11 08:38, Aron Xu wrote:

I would like to help coordinate the Chinese (China) translation of LO,
if that is OK. I currently maintain GNOME's Chinese (China) translation
as a committer, but I have never contributed to OOo before.

Welcome onboard!

As far as I know, the Chinese translations of OOo are managed by the Sun
G11n group, so community involvement is somewhat difficult. OOo in
Chinese (China) uses a translation style/guideline quite different from
that of other mainstream projects like GNOME/KDE/TP. I would like to
change this by revising the old translations and settling new
translations in the same style as the projects mentioned above.

I need some help finding the translation
files.

You can translate online using Pootle @ http://pootle.documentfoundation.org/ or download .po files from there, translate offline, and upload them back (or send them to this list, I think).
After you register, you'll most probably want me or Andre to give you the manager rights on zh_CN.

There is only a wiki page[1] about the additional strings we'd
translate for LO. Will we maintain our own version of all
translatable strings, or just the strings from the additional LO
features?

[1]http://www.freedesktop.org/wiki/Software/LibreOffice/i18n/

For 3.3, it's these additional features only. After that, the plans are to maintain our whole localization independently. So, for 3.3 you should probably not change the terminology, but for 3.4, I think you're welcome to.

Note: these are my personal views, and I may be wrong about some aspects, in which case I hope someone with authority will correct me. :wink:

Regards,
Rimas

Thanks for your kind reply. I've registered an account, "happyaron";
please add me as the supervisor of the Chinese (China) translations.

2010.11.11 10:37, Aron Xu wrote:

Thanks for your kind reply. I've registered an account, "happyaron";
please add me as the supervisor of the Chinese (China) translations.

This is now done.

Best regards,
Rimas

Thank you very much.

Another question: the file lo-build-zh_CN.po shown on Pootle is not
in sync with the freedesktop.org git repository.

2010.11.11 11:24, Aron Xu wrote:

--
E-mail to l10n+help@libreoffice.org for instructions on how to unsubscribe
List archives are available at http://www.libreoffice.org/lists/l10n/
All messages you send to this list will be publicly archived and cannot be
deleted

Thank you very much.

Another question: the file lo-build-zh_CN.po shown on Pootle is not
in sync with the freedesktop.org git repository.

Just follow the "LibreOffice 3.3 string freeze, Pootle update" thread. Even if it's not in sync now, I suppose it will be tomorrow.

By the way, please add yourself to the wiki page @ http://wiki.documentfoundation.org/Language_Teams .

Also note that you don't have to quote full message text (esp. the footer) when replying. Just delete everything that's irrelevant to your particular reply. :wink:

Rimas

2010.11.11 12:46, Rimas Kudelis wrote:

By the way, please add yourself to the wiki page @ http://wiki.documentfoundation.org/Language_Teams .

Oops, apparently, there's already a Chinese Simplified team (I just had to sort the list alphabetically). Aron, please coordinate your effort with Dean Lee <xslidian+lo@gmail.com>.

I wonder why Dean doesn't have a Pootle account, or at least proper manager rights there.

Regards,
Rimas

CC'ing Dean Lee.

2010.11.11 12:46, Rimas Kudelis wrote:

By the way, please add yourself to the wiki page @
http://wiki.documentfoundation.org/Language_Teams .

Oops, apparently, there's already a Chinese Simplified team (I just had to
sort the list alphabetically). Aron, please coordinate your effort with Dean
Lee <xslidian+lo@gmail.com>.

I wonder why Dean doesn't have a Pootle account, or at least proper manager
rights there.

Regards,
Rimas

Yes, there is already a zh-hans item on that page. I'm not sure whether
we should change it to zh_CN, because in glibc our language code is
zh_CN, and Mozilla uses zh-CN. "zh-hans" is the non-official form for
expressing the combination of zh_CN and zh_SG; similarly, "zh-trans" is
for zh_TW and zh_HK.

I've talked with Dean Lee today, and we would like him to share his
opinion here. :slight_smile:

@Lee:
What's your opinion about Chinese (China) translation of LO (language
code and coordinator role)?

2010.11.11 13:21, Aron Xu wrote:

Yes, there is already a zh-hans item on that page. I'm not sure whether
we should change it to zh_CN, because in glibc our language code is
zh_CN, and Mozilla uses zh-CN. "zh-hans" is the non-official form for
expressing the combination of zh_CN and zh_SG; similarly, "zh-trans" is
for zh_TW and zh_HK.

Dunno where you got that info from. zh-Hans (meaning Chinese in the Simplified variant of the Han script) is an official code from BCP 47, which should be preferred to zh-CN (Chinese in China).

Similarly, zh-Hant (Chinese in the Traditional variant of the Han script) is preferred to zh-TW and zh-HK.

Not all environments use these new codes yet, but in general, the direction of movement is towards them, not away from them. For example, Apple and Microsoft have recently introduced them in their products.

Regardless of the above, I'm quite positive that we should take OS expectations into account, and if zh-Han* locale codes aren't recognised by Linux, our Linux packages should probably use the older, recognised locale codes.
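
The fallback idea above could be sketched roughly as follows in Python. This is only an illustration: the `LEGACY_LOCALES` table and the `legacy_locale` function are hypothetical names, and the pairings (Simplified to zh_CN, Traditional to zh_TW) are an assumed convention, not something taken from the LO or OOo code.

```python
# Hypothetical table: script-based tags mapped to the territory-based
# locale names that older environments recognise. The pairings are an
# assumption made for this sketch, not LibreOffice behaviour.
LEGACY_LOCALES = {
    "zh-hans": "zh_CN",
    "zh-hant": "zh_TW",
}


def legacy_locale(tag, supported):
    """Return `tag` if the platform supports it, else a legacy equivalent.

    `supported` is the set of locale codes the platform recognises.
    Returns None when neither the tag nor its legacy code is supported.
    """
    if tag in supported:
        return tag
    fallback = LEGACY_LOCALES.get(tag.lower())
    if fallback in supported:
        return fallback
    return None
```

A platform that only knows zh_CN, zh_HK and zh_TW would get zh_CN for zh-Hans, while a platform that already recognises zh-Hans keeps the newer code.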

Rimas

Dear all,

I'm sorry that I didn't pay as much attention to the mailing list as to the
wiki.

In terms of experience, I recommend Aron Xu, who has contributed a lot to
the open-source community, as the maintainer of the Simplified Chinese team
and its mailing lists.
I myself would prefer to be a 'helper', who can spend more time on his
favorite parts. :smiley:
(I just registered on Pootle after Aron Xu told me about it during our
talk. Thanks, Aron!)

On the language code issue, I see 'zh-hans/t' (Simplified /
Traditional Han characters) used more and more often now. (It has been
included in several W3C-recommended documents since 2003, to prevent
confusion or dispute.)
The only (and deadly) flaw I can see is its incompatibility with
environments that only accept 'zh-CN/TW'.
So I insist on using 'zh-hans/t' at least in documentation.
(As I recall, Drupal, TDF's future CMS, uses this tag.)

(References: RFC 4646; ISO 15924; W3C i18n QA-CSS-lang)

Best wishes,
Dean (via my Android)

I'm not so familiar with BCP 47, but I heard it was designed for
Internet applications (such as HTML). So I agree to use it in our
wiki and other web documentation.

But for the software, I think we are using ISO 639-1/2 in most
cases. Lists of languages in these two ISO standards are [1] and [2].
I think LO is a general desktop application suite and should follow a
standard that is well accepted. There won't be a better choice than
ISO 639-1/2, which is compatible (or almost compatible) with most
platforms (Windows[3], Mac[4], Linux[5] and other *nix variants).

BCP 47 is used in OOo (correct me if I'm wrong!), but I don't think
it's a wise choice, because we first have to map ISO 639 codes to
BCP 47, since we use gettext for i18n support, and then we have to map
BCP 47 back to Unix locales (which are almost ISO 639 codes) on most
*nix platforms. Such a process is complicated.

Language code usage in software is a mess. For Chinese on Windows, for
example, we can find:
* zh_CN (ISO 639-1) for Windows itself (as in [3]);
* zh-Hans (BCP 47) in Vista and .NET 2.0;
* zh-CHS, which should be replaced by zh-Hans but is still widely
used even today because of Windows XP;
* zho (ISO 639-2), which is used in some Microsoft documents as well.

There is no doubt that we should follow ISO 639-1 because all
standards mentioned above are based on it. For languages that do not
have an ISO 639-1 code, I suggest we use ISO 639-2 so we can easily
use gettext and support *nix systems, and map the codes to respective
platforms (Windows, Mac, etc) if necessary.

[1]http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
[2]http://en.wikipedia.org/wiki/List_of_ISO_639-2_codes
[3]http://msdn.microsoft.com/en-us/library/ms533052
[4]http://support.apple.com/kb/TA26811?viewlocale=en_US
[5]http://sourceware.org/git/?p=glibc.git;a=tree;f=localedata/locales

2010.11.11 16:08, Aron Xu wrote:

I'm not so familiar with BCP 47, but I heard it was designed for
Internet applications (such as HTML). So I agree to use it in our
wiki and other web documentation.

Well, I haven't read the whole standard either, but I don't think Internet is its only application. Similarly, MIME also contains a letter for Internet, but it's being used way more widely. :wink:

But for the software, I think we are using the ISO 639-1/2 in most
cases. Lists of languages in these two ISO standards are [1] and [2].

ISO 639 defines languages, not locales or scripts. Basically, BCP 47 combines its codes from those defined in ISO 639, ISO 15924, and ISO 3166.
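
To make that composition concrete, here is a small Python sketch that splits a simple tag into its language, script and region parts by shape alone. It is a simplification made for this thread, not a full BCP 47 parser: variants, extensions and private-use subtags are ignored, and the names `_TAG` and `split_tag` are hypothetical.

```python
import re

# Simplified shape of a tag: a 2-3 letter language code (ISO 639),
# an optional 4-letter script code (ISO 15924), and an optional region
# (2-letter ISO 3166 code or 3-digit numeric area code).
# Both "-" and "_" are accepted as separators.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:[-_](?P<script>[A-Za-z]{4}))?"
    r"(?:[-_](?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def split_tag(tag):
    """Split a simple language tag into language, script and region."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unsupported tag: {tag!r}")
    script = match.group("script")
    region = match.group("region")
    return {
        "language": match.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
    }
```

With this sketch, zh-Hans yields a language and a script but no region, while zh-TW yields a language and a region but no script. A legacy tag such as zh-CHS fits neither shape and is rejected.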

I think LO is a general desktop application suite and should follow a
standard that is well accepted. There won't be a better choice than
ISO 639-1/2, which is compatible (or almost compatible) with most
platforms (Windows[3], Mac[4], Linux[5] and other *nix variants).

I don't see zh-CN or zh-TW defined in ISO 639. :wink: Code zh is defined, but the second part – CN and TW – comes from a totally different standard (ISO 3166).
Similarly, Hans and Hant are defined in ISO 15924.

Also, check out [6] for Mac.

BCP 47 is used in OOo (correct me if I'm wrong!), but I don't think
it's a wise choice, because we first have to map ISO 639 codes to
BCP 47, since we use gettext for i18n support, and then we have to map
BCP 47 back to Unix locales (which are almost ISO 639 codes) on most
*nix platforms. Such a process is complicated.

I don't think we're using gettext (yet).

Language code usage in software is a mess. For Chinese on Windows, for
example, we can find:
  * zh_CN (ISO 639-1) for Windows itself (as in [3]);

Well, [3] lists codes for Chinese in different territories, not for Chinese written in different scripts.

  * zh-Hans (BCP 47) in Vista and .NET 2.0;
  * zh-CHS, which should be replaced by zh-Hans but is still widely
used even today because of Windows XP;

From what I've read, it's going to be obsoleted in future versions of .NET.

  * zho (ISO 639-2), which is used in some Microsoft documents as well.

There is no doubt that we should follow ISO 639-1 because all
standards mentioned above are based on it. For languages that do not
have an ISO 639-1 code, I suggest we use ISO 639-2 so we can easily
use gettext and support *nix systems, and map the codes to respective
platforms (Windows, Mac, etc) if necessary.

That's exactly what BCP 47 does. The "zh" part of zh-Hans is exactly the code assigned for Chinese by ISO 639-1. :wink:

[1]http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
[2]http://en.wikipedia.org/wiki/List_of_ISO_639-2_codes
[3]http://msdn.microsoft.com/en-us/library/ms533052
[4]http://support.apple.com/kb/TA26811?viewlocale=en_US
[5]http://sourceware.org/git/?p=glibc.git;a=tree;f=localedata/locales

[6] http://developer.apple.com/library/mac/#documentation/MacOSX/Conceptual/BPInternational/Articles/LanguageDesignations.html

Rimas

2010.11.11 16:08, Aron Xu wrote:

I'm not so familiar with BCP 47, but I heard it was designed for
Internet applications (such as HTML). So I agree to use it in our
wiki and other web documentation.

Well, I haven't read the whole standard either, but I don't think Internet
is its only application. Similarly, MIME also contains a letter for
Internet, but it's being used way more widely. :wink:

But for the software, I think we are using the ISO 639-1/2 in most
cases. Lists of languages in these two ISO standards are [1] and [2].

ISO 639 defines languages, not locales or scripts. Basically, BCP 47
combines its codes from those defined in ISO 639, ISO 15924, and ISO 3166.

Ah, yes.

I think LO is a general desktop application suite and should follow a
standard that is well accepted. There won't be a better choice than
ISO 639-1/2, which is compatible (or almost compatible) with most
platforms (Windows[3], Mac[4], Linux[5] and other *nix variants).

I don't see zh-CN or zh-TW defined in ISO 639. :wink: Code zh is defined, but
the second part – CN and TW – comes from a totally different standard (ISO
3166).
Similarly, Hans and Hant are defined in ISO 15924.

Also, check out [6] for Mac.

But keep in mind that Hans and Hant cannot cover the different kinds of
Chinese. For example, there are differences between HK and TW, but they
are combined into a single Hant, which might not be acceptable to the
people who use them. :slight_smile:

BCP 47 is used in OOo (correct me if I'm wrong!), but I don't think
it's a wise choice, because we first have to map ISO 639 codes to
BCP 47, since we use gettext for i18n support, and then we have to map
BCP 47 back to Unix locales (which are almost ISO 639 codes) on most
*nix platforms. Such a process is complicated.

I don't think we're using gettext (yet).

I'm not sure about this point. Things should not be so difficult if we
do not use it.

Language code usage in software is a mess. For Chinese on Windows, for
example, we can find:
 * zh_CN (ISO 639-1) for Windows itself (as in [3]);

Well, [3] lists codes for Chinese in different territories, not for Chinese
written in different scripts.

 * zh-Hans (BCP 47) in Vista and .NET 2.0;
 * zh-CHS, which should be replaced by zh-Hans but is still widely
used even today because of Windows XP;

From what I've read, it's going to be obsoleted in future versions of .NET.

Yes, I wrote that it is to be replaced, but remember that Windows XP is
still running on many people's desktops and laptops, and I'm not sure
this situation can change very rapidly in the near future. Some other
applications still use zh-CHS as a tag in their release documents to
tell people that they are in Simplified Chinese. Whether or not they
have to use zh-Hans in the new .NET, they show this information to end
users, and people get confused.

 * zho (ISO 639-2), which is used in some Microsoft documents as well.

There is no doubt that we should follow ISO 639-1 because all
standards mentioned above are based on it. For languages that do not
have an ISO 639-1 code, I suggest we use ISO 639-2 so we can easily
use gettext and support *nix systems, and map the codes to respective
platforms (Windows, Mac, etc) if necessary.

That's exactly what BCP 47 does. The "zh" part of zh-Hans is exactly the
code assigned for Chinese by ISO 639-1. :wink:

Well, I don't mind BCP 47 if it works well, but as I mentioned before,
it cannot cover all the different variants; it's just a loosely
defined standard. To be more precise for end users, we might have to
place two sets of translations, zh_TW and zh_HK, into the zh-Hant
package. They are not the same, so we need to maintain them separately,
even if we ship them to end users as a single package.

Listing language variants by region is a good way to resolve conflicts
in our development. On the other hand, using a loosely defined name for
our release language pack (which contains everything that fits into the
category) is probably good for users.

2010.11.11 17:12, Aron Xu wrote:

But keep in mind that Hans and Hant cannot cover the different kinds of
Chinese. For example, there are differences between HK and TW, but they
are combined into a single Hant, which might not be acceptable to the
people who use them. :slight_smile:

Of course. The question, though, is whether there will ever be two different Simplified Chinese localizations of LO. Are those differences really noticeable?

Quoting the Apple document I linked to before:

The new standard defines new tags for the traditional Chinese (Hant) and simplified Chinese (Hans) scripts. Thus, traditional Chinese spoken in any country uses the code zh-Hant. Traditional Chinese, as it is spoken in Taiwan, now uses the locale code zh-Hant_TW.

IMHO, you can think of zh-Hans as zh@hans in glibc terms. Script is simply a different kind of locale modifier than country. Since glibc has four Chinese locales, they probably don't need the modifier. But if they needed it, the final locale codes would probably look like zh_CN@hans.
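
As a rough Python illustration of that analogy, the sketch below assembles a glibc-style name of the form language[_TERRITORY][@modifier] from separate parts. The function name `posix_locale` is hypothetical, and treating a script as an `@hans` / `@hant` modifier is the assumption made in the paragraph above, not a statement about which locales glibc actually ships.

```python
def posix_locale(language, region=None, script=None):
    """Build a glibc-style locale name from its parts.

    The script, if given, is appended as an "@modifier". The codeset
    part (for example ".UTF-8") is left out to keep the sketch short.
    """
    name = language.lower()
    if region:
        name += "_" + region.upper()
    if script:
        name += "@" + script.lower()
    return name
```

Under these assumptions, zh-Hans on its own becomes zh@hans, and adding the country gives zh_CN@hans. With a country and no script the result is the familiar zh_TW form.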

From what I've read, it's going to be obsoleted in future versions of .NET.

Yes, I wrote that it is to be replaced, but remember that Windows XP is
still running on many people's desktops and laptops, and I'm not sure
this situation can change very rapidly in the near future.

XP will be EOL'd in 2014 anyway. It won't stay forever, after all...
Regardless of that, I think our goal is for LO to work as expected in your locale. If it does, I don't see the exact language code as a problem.

Some other
applications still use zh-CHS as a tag in their release documents to
tell people that they are in Simplified Chinese. Whether or not they
have to use zh-Hans in the new .NET, they show this information to end
users, and people get confused.

Hm, IMO, locale code is a technicality that the end users should not care about at all.

Well, I don't mind BCP 47 if it works well, but as I mentioned before,
it cannot cover all the different variants; it's just a loosely
defined standard. To be more precise for end users, we might have to
place two sets of translations, zh_TW and zh_HK, into the zh-Hant
package. They are not the same, so we need to maintain them separately,
even if we ship them to end users as a single package.

Not necessarily, as stated above. zh_TW@hant and zh_HK@hant could be used, and then again, specifying the country code would probably make the script code unnecessary... Ha! :slight_smile:

Listing language variants by region is a good way to resolve conflicts
in our development. On the other hand, using a loosely defined name for
our release language pack (which contains everything that fits into the
category) is probably good for users.

I think in the end it's about Chinese (Simplified) vs. Chinese (China). Until we have two country versions of Simplified or Traditional, we can just skip country codes, I think.
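
Skipping country codes until they are needed amounts to a fallback lookup: try the full tag first, then drop trailing subtags until something available matches. Below is a minimal Python sketch of that idea. It is a simplified take on truncation-style lookup that ignores extension subtags, and the function name `lookup` is hypothetical.

```python
def lookup(requested, available):
    """Return the best available tag for `requested`, or None.

    The requested tag is tried in full, then with trailing subtags
    dropped one at a time (zh-Hant-TW, then zh-Hant, then zh).
    Matching ignores case and treats "_" like "-".
    """
    subtags = requested.replace("_", "-").split("-")
    by_lower = {t.replace("_", "-").lower(): t for t in available}
    while subtags:
        candidate = "-".join(subtags).lower()
        if candidate in by_lower:
            return by_lower[candidate]
        subtags.pop()
    return None
```

With a single Traditional localization published as zh-Hant, a request for zh-Hant-TW or zh-Hant-HK both resolve to it. If a separate zh-Hant-HK pack is added later, Hong Kong users get that one without any change for Taiwan users.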

Rimas

2010.11.11 17:35, Rimas Kudelis wrote:

2010.11.11 17:12, Aron Xu wrote:

Listing language variants by region is a good way to resolve conflicts
in our development. On the other hand, using a loosely defined name for
our release language pack (which contains everything that fits into the
category) is probably good for users.

I think in the end it's about Chinese (Simplified) vs. Chinese (China). Until we have two country versions of Simplified or Traditional, we can just skip country codes, I think.

Interestingly enough, relevant locales I see in Pootle are: zh_CN, zh_HK, zh_TW.

Which I guess means that these are the codes that will be used at least for 3.3. :wink:

Rimas

Well, zh_CN and zh_SG should fall into the Hans category, but I've never
seen anyone translate software into zh_SG; they just use zh_CN
directly.

zh_HK and zh_TW fall into the Hant category, which should have a solution.
If we define it like zh_HK@hant, the modifier @hant can be omitted,
because nothing else could confuse people if the modifier is absent.
(In fact, @hant could be used to help put the correct files into the
correct language pack. :smiley: )

2010.11.11 17:47, Aron Xu wrote:

Interestingly enough, relevant locales I see in Pootle are: zh_CN, zh_HK,
zh_TW.

Which I guess means that these are the codes that will be used at least for
3.3. :wink:

Well, zh_CN and zh_SG should fall into the Hans category, but I've never
seen anyone translate software into zh_SG; they just use zh_CN
directly.

Which is exactly why zh_Hans is more correct. :wink:

zh_HK and zh_TW fall into the Hant category, which should have a solution.
If we define it like zh_HK@hant, the modifier @hant can be omitted,
because nothing else could confuse people if the modifier is absent.
(In fact, @hant could be used to help put the correct files into the
correct language pack. :smiley: )

I didn't say there *has* to be only one Chinese Traditional language pack. :wink: It would make sense to have a "general" Hant language pack if we only had one localization, but if we have two, we can have two language packs too.

I suggest we stop this discussion here, or go off-list in order not to bother people too much... :wink:

Rimas