autofix for helpIDs in libo_help in progress (was: Automatic translations and no suggestions in Pootle)

Hi *,

[...]

The second issue was already mentioned by Andras: there is a huge amount of
strings in help where only meaningless (in terms of translation) identifiers
were changed. [...]

Please focus on UI for now; Andras sent me the script he used
previously to reduce this noise.

Andras' script would have had to be applied before doing the initial
update, so I had to take a detour and write a bigger script.

I can try to merge back the strings from the 4.1 translations. But no promises.

So what I did was first to compare the templates in 4.1 and 4.2 to get
a list of old IDs and new IDs, and the affected files (to limit
processing time). This was done with a combination of script and
manual review (as it's hard to map changes when there are completely
removed and added strings as well)
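
The comparison script itself is not included in this mail. As a rough sketch of the idea only (not the script that was actually used): mask the identifiers in both sets of strings and pair up the strings that are identical once masked. This assumes the IDs show up as hid="..." attribute values in the unescaped msgid text; the helper name build_id_mapping and the sample help text are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not the script used on the server.
# Assumes help IDs appear as hid="..." attribute values in the msgid text.
sub build_id_mapping {
    my ($old_msgids, $new_msgids) = @_;

    # Index the new strings by their text with every ID masked out.
    my %new_by_masked;
    foreach my $msgid (@$new_msgids) {
        (my $masked = $msgid) =~ s/hid="[^"]*"/hid=""/g;
        push @{ $new_by_masked{$masked} }, $msgid;
    }

    my %mapping;
    foreach my $msgid (@$old_msgids) {
        (my $masked = $msgid) =~ s/hid="[^"]*"/hid=""/g;
        my $candidates = $new_by_masked{$masked} or next;
        # Ambiguous matches are left for manual review.
        next unless @$candidates == 1;
        my @old_ids = $msgid =~ /hid="([^"]*)"/g;
        my @new_ids = $candidates->[0] =~ /hid="([^"]*)"/g;
        for my $i (0 .. $#old_ids) {
            $mapping{ $old_ids[$i] } = $new_ids[$i]
                if $old_ids[$i] ne $new_ids[$i];
        }
    }
    return %mapping;
}

# Made-up sample strings; only the ID pair is taken from the mapping below.
my %mapping = build_id_mapping(
    ['<ahelp hid="53251">Resets the settings.</ahelp>'],
    ['<ahelp hid="modules/swriter/ui/optcompatpage/default">Resets the settings.</ahelp>'],
);
print "$_ => $mapping{$_}\n" for sort keys %mapping;
```

Strings that were removed or newly added never find a partner this way, which is why the manual review step is still needed.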

The output of that script was used as basis for the other one that

* checks libo41x_help for a translation of the string
* applies that to libo_help if the translation for that string is empty.

For future reference (as gmail's search is much better than my memory
:-))) - here's the hacky script that is now run. It should save you
about 12000 words.

#!/usr/bin/perl

use strict;
use warnings;
use utf8;

binmode STDIN, ':utf8';
binmode STDOUT, ':utf8';

my $language = shift;

# update files on-disk
system("python src/manage.py sync_stores --project=libo41x_help --language=$language");
system("python src/manage.py sync_stores --project=libo_help --language=$language");

my @files = qw(sbasic/shared/01.pot scalc/01.pot schart/01.pot
shared/01.pot shared/02.pot shared/autopi.pot
shared/explorer/database.pot shared/optionen.pot simpress/01.pot
smath/01.pot swriter/01.pot);

my %mapping = ();
$mapping{'53251'} = 'modules/swriter/ui/optcompatpage/default';
$mapping{'878711313'} = 'modules/swriter/ui/flddbpage/browse';
$mapping{'878871556'} =
'modules/simpress/ui/customanimationcreatetab/auto_preview';
$mapping{'878874627'} =
'modules/simpress/ui/customanimationcreatetab/effect_speed_list';
$mapping{'879350288'} = 'modules/swriter/ui/optcaptionpage/category';
$mapping{'DBACCESS_LISTBOX_DLG_DIRECTSQL_LB_HISTORY'} =
'dbaccess/ui/directsqldialog/sqlhistory';
# lots of other mappings stripped from this post

my %translations = ();
my $orig = "";
my $translation = "";
foreach my $file (@files) {
# file.pot → file.po
chop($file);
print "Datei: $file\n";
# get rid of those nasty multiline wraps that break grepping
system("msgcat --no-wrap translations/libo41x_help/$language/$file > /tmp/cloph-fix-helpids.$language.tmp");

while( my ($old, $new) = each %mapping) {
#brute-force, just grep the same file ~900 times. Not nice, but works
open(GREP, "grep $old /tmp/cloph-fix-helpids.$language.tmp | ") or die
"Cannot read temporary input file ($!)\n";
binmode GREP, ":utf8";
while(<GREP>) {
if (/^msgid/) {
s/$old/$new/;
$orig = $_;
$translation = <GREP>;
next unless $translation;
if ($translation =~ m/^msgstr/) {
$translation =~ s/$old/$new/;
$translations{$orig} = $translation;
} else {
# either there is no translation or it matches a commented-out entry
print "translation doesn't start with msgstr! ($translation)";
}
}
}
close GREP;
}
unlink "/tmp/cloph-fix-helpids.$language.tmp";
}

foreach my $file (@files) {
print "Datei: $file\n";
open(ORIG, "msgcat --no-wrap translations/libo_help/$language/$file |") or die "Cannot read input file $file ($!)\n";

binmode ORIG, ":utf8";
open(MOD, ">", "translations/libo_help/$language/$file.mod") or die
"Cannot write output file $file.mod ($!)\n";
binmode MOD, ":utf8";
my $discard = "";
while(<ORIG>) {
if ($translations{$_}) {
print MOD;
$discard = <ORIG>;
# only update if translation is empty
if ($discard =~ m/^msgstr ""/) {
print MOD $translations{$_};
} else {
print MOD $discard;
}
} else {
print MOD;
}
}
close ORIG;
close MOD;
rename "translations/libo_help/$language/$file.mod",
"translations/libo_help/$language/$file";
}
# update database from file on-disk and rerun update_against_templates
# (just to be safe)
system("python src/manage.py update_stores --project=libo_help --language=$language");
system("python src/manage.py update_against_templates --project=libo_help --language=$language");

Hello Christian, *,
<snip>

Andras' script would have had to be applied before doing the
initial update, so I had to take a detour and write a bigger
script.

"G"

I can try to merge back the strings from the 4.1 translations.
But no promises.

So what I did was first to compare the templates in 4.1 and 4.2 to
get a list of old IDs and new IDs, and the affected files (to
limit processing time). This was done with a combination of script
and manual review (as it's hard to map changes when there are
completely removed and added strings as well)

The output of that script was used as basis for the other one that

* checks libo41x_help for a translation of the string
* applies that to libo_help if the translation for that string is
empty.

Wow, I am impressed :slight_smile:

For future reference (as gmail's search is much better than my
memory

At your age ;?

:-))) - here's the hacky script that is now run. It should save you
about 12000 words.

Wow :slight_smile:

<big snip>

/tmp/cloph-fix-helpids.$language.tmp");

Would it not be better to use some mechanism to ask which user is
using this script ;? Something like "/tmp/`whoami`..." or something
like that? And maybe the same for all other occurrences of "cloph"
in the script?
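
Perl's core File::Temp module would sidestep the naming question altogether, since it generates a unique name on every run. A minimal sketch, not taken from the script: the template name and the hard-coded 'de' are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $language = 'de';   # placeholder; the script reads this from the command line

# tempfile() creates a uniquely named file in the system temp directory;
# UNLINK => 1 removes it again when the program exits.
my ($fh, $tmpfile) = tempfile(
    "fix-helpids.$language.XXXXXX",
    TMPDIR => 1,
    UNLINK => 1,
);
close $fh;   # only the name is needed if msgcat writes to it via the shell

print "temporary file: $tmpfile\n";
```

The resulting $tmpfile could then take the place of the fixed /tmp/cloph-... path in the msgcat, grep and unlink calls.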

<snip>

foreach my $file (@files) {
print "Datei: $file\n";

"G" ... The whole text in English, and then "Datei" ;?

Thank you for your work (though I have to nag a bit ... :wink: ) :slight_smile:
I hope I will have time to test your scripts on my system in the next
few days ... :wink:
Thomas.

Hi Thomas, *,

Hello Christian, *,
<snip>

:-))) - here's the hacky script that is now run. It should save you
about 12000 words.

Wow :slight_smile:

<big snip>

/tmp/cloph-fix-helpids.$language.tmp");

Would it not be better to use some mechanism to ask which user is
using this script ;?

well, as it is only used to update the files on the pootle server/not
meant to be run by translators, I don't care about that - just want a
name that is not used by something else :-))

<snip>

foreach my $file (@files) {
print "Datei: $file\n";

"G" ... The whole text in English, and then "Datei" ;?

Yeah :slight_smile:

Thank you for your work (though I have to nag a bit ... :wink: ) :slight_smile:
I hope I will have time to test your scripts on my system in the next
few days ... :wink:

See above - there should be no need for you to run the scripts - they
are run on the pootle server. German has the luck of being further up
in the alphabet (well, at least "de" is :-), so you should already see
progress there.

ciao
Christian

PS: fixed part of the script, otherwise it would miss hits when the
pattern shifts because of matches in commented out parts

                while(<GREP>) {
                        if (/^msgid/) {
                                s/$old/$new/;
                                $orig = $_;
                                $_ = <GREP>;
                                last unless $_;
                                if (/^msgstr/) {
                                        s/$old/$new/;
                                        $translations{$orig} = $_;
                                } elsif (/^msgid/) {
                                        redo;
                                } else {
                                        # either there is no translation or it matches a commented-out entry
                                        print "$language - translation doesn't start with msgstr! ($_)";
                                }
                        }
                }
                close GREP;

Also forgot about ca-valencia (formerly ca_XV), which is not caught by
the script because of the rename - but thankfully noticed myself :slight_smile:

ciao
Christian

So, for those translating offline: should we wait until you've run the
script before downloading our files, and avoid working offline in the meantime?

Cheers
Sophie

Hi Sophie, *,

[...]
So, for those translating offline: should we wait until you've run the
script before downloading our files, and avoid working offline in the meantime?

Didn't think about offline translation...
Depends on whether you're feeling comfortable enough to

a) run a slightly modified script on your system or
b) trust your offline tools to be able to merge the restored strings
to your offline copy (i.e. wait for the script to run, download the
po-zips, then merge translations)

b) probably is easier when you have the option to not touch existing
translations, but only touch strings with no translation at all.

The script is currently running (at langs et, es and eu right now; I have
it process 3 at a time). It was at nb on the first run before I noticed
the bug in the script that would miss entries. So it won't be long until
French is finished updating.

So I'd wait a little, then pull the files from pootle and merge them
offline with the setting to only add translations, never modify.
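
For tools without such a setting, the "only add, never modify" rule can be sketched as follows. This is only an illustration, not part of the server-side script: it assumes both files have already been parsed into hashes of msgid => msgstr, and the helper name merge_missing and the sample strings are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: both arguments are hash references mapping
# msgid => msgstr. Fills only the empty entries of the offline copy
# and returns how many entries were filled.
sub merge_missing {
    my ($offline, $server) = @_;
    my $filled = 0;
    foreach my $msgid (keys %$offline) {
        # never touch existing offline work
        next if length($offline->{$msgid} // '');
        # nothing was restored on the server for this string
        next unless length($server->{$msgid} // '');
        $offline->{$msgid} = $server->{$msgid};
        $filled++;
    }
    return $filled;
}

# Made-up sample data.
my %offline = (
    'Open'  => 'Ouvrir',   # translated offline: stays untouched
    'Close' => '',         # empty offline: may be filled
    'Print' => '',         # empty on both sides: stays empty
);
my %server = (
    'Open'  => 'Ouvre',
    'Close' => 'Fermer',
    'Print' => '',
);

my $filled = merge_missing(\%offline, \%server);
print "filled $filled entries\n";
```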

ciao
Christian

Good morning Christian, *,
<snip>

<big snip>

/tmp/cloph-fix-helpids.$language.tmp");

Would it not be better to use some mechanism to ask which user
is using this script ;?

well, as it is only used to update the files on the pootle
server/not meant to be run by translators, I don't care about that
- just want a name that is not used by something else :-))

I did not have translators in mind, but maybe other Pootle admins
... :wink: Or is there only one user that all admins use ;?

<snip>

foreach my $file (@files) {
print "Datei: $file\n";

"G" ... The whole text in English, and then "Datei" ;?

Yeah :slight_smile:

:slight_smile:

Thank you for your work (though I have to nag a bit ... :wink: ) :slight_smile:
I hope I will have time to test your scripts on my system in the
next few days ... :wink:

See above - there should be no need for you to run the scripts -
they are run on the pootle server. German has the luck of being
further up in the alphabet (well, at least "de" is :-), so you
should already see progress there.

O.K.

Thank you for your answer
Thomas.
<Rest snipped>