Update of the localization files

On 2010.12.30 at 11:54, Martin Srebotnjak wrote:

So even if we don't want to use pootle we will have to?

No, that was just a piece of info that you seemed not to know. I was not implying that you will or will not have to use Pootle, that's not up to me to decide.

Rimas

On 2010.12.30 at 10:36, Martin Srebotnjak wrote:

The Slovenian team will be working with sdf files (we have our own
localization system that works with sdf, splits to po-s and finalizes back
to sdf), so -1 from me.

Regards, m.

You can use whatever tool you want. The English sdf file will be provided
separately. At the end you'll have a localized sdf. You can convert it
to a bunch of po files and upload them to Pootle. That's all.

This is one more step on your side, but it simplifies the process on the
other side. More than 100 languages need to be managed. It would be
desirable if everybody used the same process. Even having two processes
in the OOo project (one is Pootle, the other is sdf via issuezilla) is
sometimes confusing.

Andras

OOo does not use issuezilla anymore; instead, language teams provide an
ftp/http location from which the sdf is taken automatically prior to a build.

So it is as automatic as taking files from Pootle, just that two scripts
must be run, not just one.

Regards, m.

On 2010.12.30 at 10:54, Martin Srebotnjak wrote:

So even if we don't want to use pootle we will have to?

My proposal was to *store* translations in Pootle. It is up to you how you
translate. One option is Pootle, but you may choose another.

Even OOo is going into a better direction - they have now automated the
build process of localized builds and they automatically grab Slovenian sdf
from our ftp address, when they need it.

Getting rid of the l10n repository is also the goal. It is huge,
currently 2.4 GB, and is getting bigger with every commit. It takes ages
to download, it takes minutes to commit & push something, the cgit web
interface times out when I query this repository, etc. git is not good at
handling large text files. AFAIK OOo has the same problem and I have not
seen their solution.

Andras

Andras,
could you make an automated script for me, so that I do not have to log
into Pootle, select the right menu, select the file from my disk, then wait
for the upload, then check that it is in, etc.?

If the LO team can make an automated process that we upload the zip via ftp
or something, maybe that is a solution.

Thanks, m.

You still have to log into something (for example, an ftp account),
and it will cost more effort for the people who maintain it.

If you really want to do it automatically, you can still write a
script (e.g. with Perl's WWW::Mechanize, or even using curl) to log into your
Pootle account and submit files.

Hi Martin, Andras,

On 2010.12.30 at 10:54, Martin Srebotnjak wrote:

So even if we don't want to use pootle we will have to?

My proposal was to *store* translations in Pootle. It is up to you how you
translate. One option is Pootle, but you may choose another.

I agree with Andras here. I myself use Pootle only as a repository, never for translation, and sometimes to run some checks.

So this is not an obligation; we are trying to see how to simplify the process for the teams taking care of the files at the different steps of the localization process.
As Kendy explained, having .sdf files in git (or any SCM) is a mess because of the file format. So if it is possible to remove this .sdf step from our process, that will save time for several of us. Of course, if it's a burden for your team, we will find a way to make you happy, no problem.

Even OOo is going into a better direction - they have now automated the
build process of localized builds and they automatically grab Slovenian sdf
from our ftp address, when they need it.

Getting rid of the l10n repository is also the goal. It is huge,
currently 2.4 GB, and is getting bigger with every commit. It takes ages
to download, it takes minutes to commit & push something, the cgit web
interface times out when I query this repository, etc. git is not good at
handling large text files. AFAIK OOo has the same problem and I have not
seen their solution.

Yes, and tackling the issue where it is, is imho more efficient than adding ways to work around it. I will rely on Andras and Rimas here, as I know nothing about Pootle admin and Git.

But again, Martin, rest assured that we won't put something in place that won't let you contribute in the way you are used to. Let's just see how we can make things a little bit better for everybody, as far as possible.

Kind regards
Sophie

Hi Kendy,

Hi Sophie,

Thank you for all the explanations! :)

It has now been two or three months since we last touched the files, because we
do not have them in the LO Pootle repository and we are no longer working
on the OOo Pootle repository. So some teams now have a fair number
of issues to fix in their files, which will take time and resources, and
we need to have the complete set of files in the LO Pootle repository
for that.

I see, OK. What is at the moment blocking the import of the content of
the OOo Pootle into the LO Pootle, please? Just some missing tooling,
or the decision of what is the source for the translations & how to
organize them?

I think we kept the files "LO only" for 3.3 because we didn't have any process and we were not sure who would admin, take care of everything, or even participate ;).
Several members of our team have since been able to take care of the Pootle server, update the files and push them into the sources (and BTW thanks a lot for their work, I'm proud to belong to this family :)). Also, we now have a large number of teams participating in LO l10n.
So now that the 3.3 translation is almost over, I think we need to go further and have a complete process in place again. Hence my questions here ;)

Ah, maybe I understand now ;) So of course, it is up to you to define
if you want to have the translations merged from the OOo tree to the LO
tree for 3.4, or not. As I understand it, you'd prefer not to, i.e. to keep the
l10n repo (containing the localize.sdf's) untouched by the merges from
OOo, right?

That was what I was not sure about: all the new features and bug fixes
for OOo will be merged to the LO tree for 3.4.

Most probably we won't be merging everything, which might cause trouble
when merging the localizations as a whole :(

Yes, that might cause problems.

In that case yes, we want the l10n repo merged and containing all the
strings for new features or fixes from OOo. And the sooner the better,
whatever the number of strings :)
So that means that we can extract the strings from the latest OOoDEV and
merge them with our LO file to have the complete (UI+HC2) set of strings
up to date as of now?

Based on what you wrote, I think for LO master (towards-3.4), the best
would be to extract all the strings from the current git repositories
(ie. from the LO master branch, not from OOoDEV) to have the complete
set (so that it would look similar to what is in the OOo Pootle now, but
based on LO sources), and msgmerge the translations from OOo and from
lo-build.po. That way, it would be easy to merge updated translations
from OOo later (should there be any), while still having the LO strings
as the base. Or are there reasons not to do that?
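
The merge proposed above can be illustrated with a minimal sketch. It is not msgmerge itself (which also does fuzzy matching and works on real .po files), only a simplified dictionary-based stand-in. The function name, the sample strings, and the priority order between lo-build.po and OOo translations are assumptions made for illustration; the thread does not specify which source should win.

```python
def merge_translations(template_ids, *sources):
    """Fill a template extracted from LO sources with existing translations.

    template_ids: msgids extracted from the LO master branch (the base).
    sources: dicts mapping msgid -> translation, in priority order
             (assumption: the first source with a non-empty entry wins).
    Strings that exist only in a source but not in the template are dropped,
    which keeps the LO strings as the base.
    """
    merged = {}
    for msgid in template_ids:
        merged[msgid] = ""  # untranslated until some source provides text
        for source in sources:
            text = source.get(msgid, "")
            if text:
                merged[msgid] = text
                break
    return merged


# Hypothetical sample data (Hungarian used only as an example language).
lo_template = ["Open", "Save", "Print"]
ooo_translations = {"Open": "Megnyitás", "Save": "Mentés", "Obsolete string": "x"}
lo_build_translations = {"Save": "Mentés (LO)"}

merged = merge_translations(lo_template, lo_build_translations, ooo_translations)
```

Because the template drives the loop, updated OOo translations can be merged again later without changing which strings are considered part of LO.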

I don't see any issue; proceeding from our branch may be the best way, you're right. Others, do you see any issue with your process if we proceed as Kendy proposed?

BTW - would it help you if we got rid of the sdf files, and instead we
had .po files in the l10n git repository? [For sure it would help us
who work with the git repos, because the sdf file format is just
something incredibly terrible for version control.] Would you be able
to merge directly from the OOo Pootle, or from .po files produced by
that, or do you still need .sdf for part of your workflow?

Provided we answer Andras's points and Martin's question, it's OK for me. Also, we need to make sure that the teams working with xliff files are happy too.
L10n teams, if you see something missing or wrong for you, please do not hesitate to raise your voice as Martin did. We are discussing, and having all the issues on the table at the beginning is always a better way to go ;)

Kind regards
Sophie

I am an active translator for KDE as well. The only translatable files KDE
offers are .po files, generated from .pot files. The total size of the .pot files
in trunk is ~100 MB. With almost 100% translated, the total size of the .po files
for one language in trunk is ~200 MB. This should be multiplied by the total
number of languages, so I assume it is far more than 2.4 GB. Besides trunk,
the whole set of branches is still available. For all these files KDE
still uses subversion, although the rest of KDE is now using git.
I do not experience any problem with committing a single .po file; it is
almost instantaneous, and committing a bunch takes around 10 seconds.
Once every 24 hours a script runs to synchronise the .pot files with the source
they come from, and the .po files with these .pot files.
I assume that in due time subversion will be replaced by git, but the
workflow will not change much.

I download/update my language .po files using svn (with a very simple script)
and use Lokalize with a Translation Memory database, which is now 125 MB, for
translation, and a glossary of words of 0.5 MB. Uploading is done with svn commit.
Compared to using Pootle this is far easier. Using svn or git for download
and upload uses less bandwidth, because only the differences are
transferred.

So my suggestion is to look at that method as well.

Hi Freek,
[...]

I download/update my language .po files using svn (with a very simple script)
and use Lokalize with a Translation Memory database, which is now 125 MB, for
translation, and a glossary of words of 0.5 MB. Uploading is done with svn commit.
Compared to using Pootle this is far easier. Using svn or git for download
and upload uses less bandwidth, because only the differences are
transferred.

Thank you very much for your feedback on this. It's great if we can share others' experience on this.

The only thing I'm afraid of is the technical skills needed for the steps to download or upload the files. Also, what will the process be under Windows? Is it easy to commit to git or svn or whatever using an OS other than Linux? (I remember at the very beginning of the OOo FR site, I was under Windows 98 and commits to the cvs repository were not exactly what I call fun ;))

Kind regards
Sophie

Google showed me http://tortoisesvn.tigris.org/ as the site for a subversion
client on Windows. It talks about being easy to use from Windows Explorer. So I hope
a Windows user is able to come up with a HOWTO or Cookbook for the whole
process. The latest version is from November 2010, so it is not a dead end.
Using subversion is very easy: just one command to set it up and two commands
for regular use, an update command and a commit command. If anything goes
wrong I just delete the whole directory and its subdirectories and run the setup
command again.
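
As a rough sketch of the "one setup command, two regular commands" workflow described above, the following helper builds the three svn invocations. The repository URL, the working directory name and the commit message are hypothetical placeholders, and the helper names are made up for illustration; only the standard `svn checkout`, `svn update` and `svn commit -m` subcommands are used.

```python
import subprocess


def svn_commands(repo_url, workdir, message):
    """Return the argument lists for the setup, update and commit steps."""
    return {
        # One-time setup: fetch a working copy of the language directory.
        "setup": ["svn", "checkout", repo_url, workdir],
        # Regular use, step 1: pull in changes made by others.
        "update": ["svn", "update", workdir],
        # Regular use, step 2: send local changes back (only differences travel).
        "commit": ["svn", "commit", workdir, "-m", message],
    }


def run(command):
    """Run one of the commands; raises CalledProcessError if svn fails."""
    subprocess.run(command, check=True)


# Hypothetical example values; nothing is executed until run() is called.
commands = svn_commands(
    "https://svn.example.org/l10n/nl", "nl-messages", "Update Dutch translation"
)
```

Calling `run(commands["setup"])` once, and then `run(commands["update"])` and `run(commands["commit"])` in day-to-day work, mirrors the cycle from the message; recovering from a broken working copy amounts to deleting the directory and running the setup step again.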

You did not raise a concern for a translation utility on Windows with spell
checking, a glossary of word translation and a translation memory, so I assume
that is not a problem. Otherwise, someone familiar with such a utility should
cover that part.

Managing .po files with git is quite fast. As I am a GNOME committer
and manage translations in this way, I totally agree this would be an
ideal solution to get rid of the current horrible situation.

I've tried xliff2po to convert a documentation file to po, and it
produces a po file with no header and many duplicate msgids. We are
lacking tools like msgmerge and msgfmt for xliff/sdf translation files,
which makes managing them significantly harder.
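
To make the duplicate-msgid problem concrete, here is a minimal sketch of a checker for converted .po output. It is a simplified line-based parser written for illustration only: it ignores msgctxt (real gettext tools treat entries with different contexts as distinct), does not unescape strings, and the function name and sample data are assumptions.

```python
from collections import Counter


def find_duplicate_msgids(po_text):
    """Return the sorted list of non-empty msgids that occur more than once."""
    msgids = []
    current = None  # quoted parts of the msgid being collected, or None

    def flush():
        nonlocal current
        if current is not None:
            # Strip the surrounding quotes of each part and join them.
            msgids.append("".join(part[1:-1] for part in current))
            current = None

    for raw in po_text.splitlines():
        line = raw.strip()
        if line.startswith("msgid "):
            flush()
            current = [line[len("msgid "):]]
        elif line.startswith('"') and current is not None:
            current.append(line)  # continuation line of a multi-line msgid
        else:
            flush()  # msgstr, comment or blank line ends the msgid
    flush()

    counts = Counter(msgids)
    # The empty msgid is the header entry, so it is not reported.
    return sorted(m for m, n in counts.items() if n > 1 and m != "")


sample_po = '''msgid "Open"
msgstr "A"

msgid "Save"
msgstr "B"

msgid "Open"
msgstr "C"
'''
duplicates = find_duplicate_msgids(sample_po)
```

A file that passes this check may still be rejected by msgfmt for other reasons; the sketch only covers the duplicate-msgid symptom mentioned above.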

Hi Martin,

The Slovenian team will be working with sdf files (we have our own
localization system that works with sdf, splits to po-s and finalizes
back to sdf), so -1 from me.

What exactly does your system that works with sdf do? Is it available
somewhere? [Public git/svn/hg/cvs?] From what you described above, you
work with po's in the end, so I am somehow missing the point of
insisting on sdf ;)

Thank you,
Kendy

OK, Kendy, I can upload po's in a zip, if that is fine for you. But I do not
want to check in stuff etc. I would like to either give you a download link,
like for OOo where they download from predefined addresses for dev, 3.3,
3.2.1 etc., or I could ftp automagically to some site; it does not need to be
delivered as sdf. But I need an sdf of the English strings, please.

If I give you the address you might press some buttons and make a mess, so
it is better that I do not post it publicly here. This site is not very
well protected. :)

Regards, m.

There is one good thing about sdf, with gsicheck one can check if there are
errors in the tags etc. But I can test that with my sdf and upload a zip of
po-s, no problem.

Regards, m.

Hi Andras,

> BTW - would it help you if we got rid of the sdf files, and instead we
> had .po files in the l10n git repository? [For sure it would help us
> who work with the git repos, because the sdf file format is just
> something incredibly terrible for version control.] Would you be able
> to merge directly from the OOo Pootle, or from .po files produced by
> that, or do you still need .sdf for part of your workflow?

Assumption: translate-toolkit can convert translatable content back and
forth without loss of information.

Yes, I assume the same thing :)

I believe this assumption is true. Translate-toolkit has been used for a
long time by many teams. My suggestion is that all l10n teams should use
Pootle to submit their translations. This does not mean that they must
use Pootle to translate. They can use Pootle, offline PO editing tools,
xliff, or edit the sdf file directly - it does not matter. However, at the
end translations must be uploaded to Pootle in .po format. Pootle - with
a git back-end - will contain the "master" copy of translations.

Sounds great to me.

An English sdf file should be produced regularly for Pootle updates. The l10n
repository will be obsolete. The build should take .po files from git
(the Pootle back-end) and generate localized sdf files at build time.

Problems:

1. How to import existing LibreOffice translations to Pootle?

The l10n repository contains monolingual (and sometimes outdated) sdf files.
We can export up-to-date bilingual (en-US + translated) sdf files from
the source, but we cannot distinguish between untranslated strings
and strings that are intentionally the same as en-US (URLs, code, function
names, language names etc.). Sun stored translations in a database (not
public) and they kept track of this information - this cannot be
extracted from the source.

I think that with a simple heuristic, we might get quite good results:

- if there exists a language that has a translation => mark the string
as not translated
- if there is no translation in any language, mark as fuzzy; it probably is
a URL or something

We can play a bit with the % of languages that have the translation for
the fuzzy / not translated at all split; I hope it might work reasonably
well.
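
A minimal sketch of this heuristic, under stated assumptions: the data is modelled as plain dictionaries rather than real sdf files, the function name and state labels are made up for illustration, and the `threshold` parameter stands in for the "% of languages" knob mentioned above (0.0 reproduces the simple "any language has a translation" rule).

```python
def classify_strings(source, translations, threshold=0.0):
    """Classify every string of every language.

    source: dict key -> en-US text.
    translations: dict lang -> dict key -> localized text
                  (a missing key is treated as identical to en-US).
    threshold: share of languages that must differ from en-US before an
               identical string counts as "untranslated" instead of "fuzzy".
    Returns dict lang -> dict key -> "translated" | "untranslated" | "fuzzy".
    """
    langs = list(translations)
    result = {lang: {} for lang in langs}
    for key, en_text in source.items():
        # Share of languages whose text differs from en-US for this key.
        differing = sum(
            1 for lang in langs if translations[lang].get(key, en_text) != en_text
        )
        share = differing / len(langs) if langs else 0.0
        for lang in langs:
            text = translations[lang].get(key, en_text)
            if text != en_text:
                state = "translated"
            elif share > threshold:
                state = "untranslated"  # others translated it, this one did not
            else:
                state = "fuzzy"  # probably a URL, code or a name
            result[lang][key] = state
    return result


# Hypothetical sample data.
source = {"menu.open": "Open", "url.home": "http://www.example.org"}
translations = {
    "hu": {"menu.open": "Megnyitás", "url.home": "http://www.example.org"},
    "sl": {"menu.open": "Open", "url.home": "http://www.example.org"},
}
states = classify_strings(source, translations)
```

Raising `threshold` makes the heuristic more conservative: a string identical to en-US is only flagged as untranslated when a larger share of the other languages has actually translated it.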

2. How to merge translations from OpenOffice.org?

I think it should be decided individually for each language team.
Automatic merge should happen for only those languages that do not have
LibreOffice translators. Of course technical support should be provided
for all. Translators don't need to understand the technical details. I
think members of this list have the knowledge, we can put together a
good process.

Sounds good to me.

Thank you,
Kendy

Hi Martin,

> Ok, Kendy, I can upload po's in a zip, if that is fine for you. But I do
> not want to check in stuff etc. I would like to either give you a download
> link like for OOo where they download from predefined addresses for dev,
> 3.3, 3.2.1 etc. or I could ftp automagically to some site; it does not need
> to be sdf delivered. But I need a sdf of English strings, please.
>
> There is one good thing about sdf, with gsicheck one can check if there are
> errors in the tags etc. But I can test that with my sdf and upload a zip of
> po-s, no problem.

OK, great, thanks!

Regards,
Kendy

+1 for .po and svn (or git, whatever version management tool is fine with me)

subversion is a bit easier to use than git, but I think most translators
will leave the review/commit cycles to the translation leads :)

svn/git is a must; I need them as a tool for tracking the changes.

Hi Sophie, all,

> I download/update my language .po files using svn (with a very simple script)
> and use Lokalize with a Translation Memory database, which is now 125 MB, for
> translation, and a glossary of words of 0.5 MB. Uploading is done with svn commit.
> Compared to using Pootle this is far easier. Using svn or git for download
> and upload uses less bandwidth, because only the differences are
> transferred.

Thank you very much for your feedback on this. It's great if we can
share others' experience on this.

The only thing I'm afraid of is the technical skills needed for the
steps to download or upload the files. Also, what will the process be
under Windows? Is it easy to commit to git or svn or whatever using
an OS other than Linux? (I remember at the very beginning of the OOo FR
site, I was under Windows 98 and commits to the cvs repository were not
exactly what I call fun ;))

I've just found

  http://translate.sourceforge.net/wiki/pootle/version_control

that describes how to connect Pootle with a version control system.
From what I understand, this is built into Pootle.

Basically, it suggests what Freek says - having git as the authoritative
source, while providing all the strengths of Pootle as Andras explained
(xliff, downloading/uploading of tarballs, etc.)

An admin with the appropriate rights gets an [Update] button that
transparently updates the files from git (should there be changes in git
by the translators that commit their changes directly). The page says
that it is also possible to trigger a commit from the Pootle server,
though there are some troubles there; I'll look into it more.

If this works, it seems to me that this would be the best of all
worlds:

- for Freek, or others who prefer to work with a version control system
directly, the method known from GNOME and KDE would be available, i.e.
work directly with the .po files in git

- for Martin, or others who prefer uploading tarballs somewhere, they
could decide whether to work directly with git (a change of their
workflow), or to upload the .zips to Pootle, and let an admin do the git
updates

- for people who do not want to, or are not used to, working with a version
control system directly, or who want to use xliff as the source, they could
work with Pootle only, and leave the updates and commits to an admin

How does that sound?

Regards,
Kendy

Hi Sophie, all,

> > I download/update my language .po files using svn (with a very simple script)
> > and use Lokalize with a Translation Memory database, which is now 125 MB, for
> > translation, and a glossary of words of 0.5 MB. Uploading is done with svn commit.
> > Compared to using Pootle this is far easier. Using svn or git for download
> > and upload uses less bandwidth, because only the differences are
> > transferred.
>
> Thank you very much for your feedback on this. It's great if we can
> share others' experience on this.
>
> The only thing I'm afraid of is the technical skills needed for the
> steps to download or upload the files. Also, what will the process be
> under Windows? Is it easy to commit to git or svn or whatever using
> an OS other than Linux? (I remember at the very beginning of the OOo FR
> site, I was under Windows 98 and commits to the cvs repository were not
> exactly what I call fun ;))

I've just found

http://translate.sourceforge.net/wiki/pootle/version_control

that describes how to connect Pootle with a version control system.
From what I understand, this is built into Pootle.

Mozilla has been using this feature for a while. It works (from the
translators' point of view).

How does that sound?

Sounds great - what to do next? Who has the right to set up a module in git
(Kendy?), and who can configure Pootle to use it (Rimas?)? I think we should
test a selected set of languages first (e.g. hu), then I will collect all
translations from the different sources and upload them.

Thanks,
Andras