Many fuzzy strings due to XML tags moved in 5.2 help files

sophi · March 11, 2016, 2:03pm

Hi all,

As you may have read in the minutes of the ESC call, Olivier reported an
issue with the help files and translation for 5.2:
+ Many strings have the same content but with XML tags placed differently
            + same visual result
            + issues for translators ahead; must find support for a script to fix this
               => need a script to undo this, or to un-fuzzy them.

And I reacted to that:
* l10n (Sophie)
    + will ask on the l10n list wrt. scripting
        + developer support appreciated.
        + how urgent is it?
            + since 5.2 is due in August, there is plenty of time.
        + is it only po files, or UI files too? (JanI)
            + only help files; for pt_BR: 57k new words to review.
            + can compare 5.1 vs. master dbs (JanI)
AI: + unwind / script changes here (Christian)

So to make it more explicit: is there anybody in our group able to
write a script that will remove the fuzzy flag from strings when the only
change is that an XML tag has been moved?
Thanks in advance
Cheers
Sophie

Christian_Lohmaier · March 14, 2016, 5:35pm

Hi *,

> Hi all,

> + Many strings are the same content but with xml tags placed differently

Most fuzzy strings are not caused by XML tags being placed differently
(where the XML tags differ, the change was made to fix validation errors);
most fuzzy strings are due to changes in the message context.

For example:
-<paragraph role="bascode" id="par_id3154685" xml-lang="en-US" l10n="U" oldref="4">ChDrive Text As String</paragraph>
+<paragraph id="par_id3154685" role="bascode" xml-lang="en-US">ChDrive Text As String</paragraph>

→ The effective change is the removal of the obsolete attributes l10n and
oldref. What matters here is oldref, as it was part of the PO
files:

-#. Hew7C
+#. rkzEY
#: 03020402.xhp
msgctxt ""
"03020402.xhp\n"
"par_id3154685\n"
-"4\n"
"help.text"
msgid "ChDrive Text As String"
msgstr ""

→ The oldref is removed from the message context, which makes it a new
entity, i.e. a new string. As there is an identical source string, Pootle
reuses the translation and marks it fuzzy.
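To make the mechanism concrete, here is a minimal sketch. It assumes the message context is simply the .xhp file name, the paragraph id, the optional oldref number and the fixed "help.text" suffix joined by newlines, as in the PO excerpt above. The helper `build_msgctxt` is hypothetical and not part of the LibreOffice or Pootle tooling. Since a gettext message is identified by its msgctxt together with its msgid, dropping the oldref line yields a different key even though the msgid is unchanged.

```python
from typing import Optional


def build_msgctxt(xhp_file: str, para_id: str, oldref: Optional[str] = None) -> str:
    """Hypothetical helper: assemble a help-string message context.

    Mirrors the PO excerpt: file name, paragraph id, an optional oldref
    number and the fixed "help.text" suffix, separated by newlines.
    """
    parts = [xhp_file, para_id]
    if oldref is not None:
        parts.append(oldref)
    parts.append("help.text")
    return "\n".join(parts)


old_ctx = build_msgctxt("03020402.xhp", "par_id3154685", oldref="4")
new_ctx = build_msgctxt("03020402.xhp", "par_id3154685")

# Same msgid, different context: the entry counts as a new unit.
print(old_ctx == new_ctx)  # False
```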

> + only help files; for pt_BR: 57k new words to review.
> + can compare 5.1 vs. master dbs (JanI)
> AI: + unwind / script changes here (Christian)

> So to make it more explicit, is there among our group somebody able to
> write a script that will remove fuzzy strings when it's only and xml tag
> that has been moved?

See above: it is not about XML tags, but about changes to the message context.
Writing a script to un-fuzzy these strings automatically is possible, but not
entirely trivial.

For each fuzzy string, look among the translations marked obsolete for an
entry whose context is the same plus one additional line (consisting only
of a number) and whose msgid and translation string are exactly the same;
if such an entry exists, remove the fuzzy marker.
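A minimal Python sketch of that matching rule follows. It assumes the PO file has already been parsed into simple records; the `Entry` type and the `unfuzzy` and `contexts_without_one_number` functions are hypothetical, and a real script would use a proper PO parser instead. An obsolete entry counts as a twin when deleting one all-digit line from its context gives the fuzzy entry's context, and msgid and msgstr both match. If the translation was changed in the meantime, msgstr differs and the entry stays fuzzy.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Entry:
    """Hypothetical, simplified PO entry; a real script would use a PO parser."""
    msgctxt: str
    msgid: str
    msgstr: str
    fuzzy: bool = False
    obsolete: bool = False


def contexts_without_one_number(ctx: str) -> Set[str]:
    """Every context obtainable by deleting exactly one all-digit line from ctx."""
    lines = ctx.split("\n")
    return {
        "\n".join(lines[:i] + lines[i + 1:])
        for i, line in enumerate(lines)
        if line.isdigit()
    }


def unfuzzy(entries: List[Entry]) -> int:
    """Clear the fuzzy flag on entries that have a matching obsolete twin.

    A twin matches when its context equals the fuzzy entry's context plus one
    numeric line (the old oldref) and msgid and msgstr are identical.
    Returns the number of entries whose fuzzy flag was cleared.
    """
    # Index obsolete entries by (context without the number, msgid, msgstr).
    known: Set[Tuple[str, str, str]] = set()
    for e in entries:
        if e.obsolete:
            for reduced in contexts_without_one_number(e.msgctxt):
                known.add((reduced, e.msgid, e.msgstr))

    cleared = 0
    for e in entries:
        if e.fuzzy and not e.obsolete and (e.msgctxt, e.msgid, e.msgstr) in known:
            e.fuzzy = False
            cleared += 1
    return cleared


if __name__ == "__main__":
    # Example modelled on the ChDrive entry; "TRANSLATION" is a placeholder.
    po = [
        Entry("03020402.xhp\npar_id3154685\nhelp.text",
              "ChDrive Text As String", "TRANSLATION", fuzzy=True),
        Entry("03020402.xhp\npar_id3154685\n4\nhelp.text",
              "ChDrive Text As String", "TRANSLATION", obsolete=True),
    ]
    print(unfuzzy(po), po[0].fuzzy)  # prints: 1 False
```

Indexing the obsolete entries in a set keeps the lookup linear in the size of the file, which matters for help projects with tens of thousands of strings.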

But as there is time until August, it is not urgent (it doesn't need to be
done this week or next, IMHO). Remember: if we didn't have master projects,
you wouldn't know about this right now :-))

Just avoid basic/shared (that's where nearly all changes of this kind
were made) and you won't be bothered by those fuzzy strings. Of
course, the reason the change was made is that the files in basic/shared
had lots of syntax/validation errors, so even after the fuzzy ones are
gone there will be some strings to translate, but far
fewer.

ciao
Christian

stanislav.horacek · March 19, 2016, 5:13pm

Hi,

On 14.3.2016 at 18:35, Christian Lohmaier wrote:

> Hi *,

> > Hi all,

> > + Many strings are the same content but with xml tags placed differently

> Most fuzzy strings are not because xml tags being placed differently
> (where xml tags are different, it was to fix validation errors), but
> most fuzzy strings are due to changes in message context.

There are also hundreds of strings with differently placed tags, and for them it is not enough to just accept the fuzzy translations; typically <switchinline><caseinline><emph></emph></caseinline></switchinline> has been changed to <emph><switchinline><caseinline></caseinline></switchinline></emph>.
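One possible heuristic for spotting such strings (an assumption, not something proposed in this thread) is sketched below: two versions of a source string differ only in tag placement if the text left after stripping all tags is identical and both contain exactly the same tags. The function `only_tags_moved` and the regular expression are hypothetical and use only the Python standard library. This only flags candidates. It cannot tell whether a tag now wraps a different word, and it does not rewrite the translations themselves.

```python
import re
from collections import Counter

# Matches opening, closing and self-closing tags such as <emph>,
# </emph> or <switchinline select="sys">.
TAG_RE = re.compile(r"<[^<>]+>")


def only_tags_moved(old: str, new: str) -> bool:
    """Heuristic: True if the strings differ, yet have identical text once tags
    are stripped and contain exactly the same tags (in any order)."""
    if old == new:
        return False
    same_text = TAG_RE.sub("", old) == TAG_RE.sub("", new)
    same_tags = Counter(TAG_RE.findall(old)) == Counter(TAG_RE.findall(new))
    return same_text and same_tags


old = ('<switchinline select="sys"><caseinline select="MAC">'
       '<emph>Command</emph></caseinline></switchinline>')
new = ('<emph><switchinline select="sys"><caseinline select="MAC">'
       'Command</caseinline></switchinline></emph>')
print(only_tags_moved(old, new))  # True
```

Comparing tags as a multiset (`Counter`) rather than as a sequence is what lets reordered but otherwise identical markup match; differences in whitespace or attribute order would still make two strings count as different.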

Are these strings planned to be translated automatically as well?

> But as it is time til August, not urgent (doesn't need to be done this
> or next week IMHO). Remember: if we wouldn't have master projects, you
> wouldn't know about this right now :-))

This is not entirely true: the issue of the "l10n" and "oldref" attributes changing the context also affected 5.1 strings; it is just more visible now because of the massive automated changes.

Best regards,
Stanislav

filmsi · March 19, 2016, 6:02pm

It's an old story. Developers of OOo/LO think localizers enjoy this
kind of manual work, as if it were a mandala or knitting.
So they rework tags every now and then, without caring about our feelings.

But we do not enjoy it, and the latest thousands of tag changes
must be applied to the translated strings with a script. That is not our work.

So we have enough time to do what we do - localize. Amen.

Regards, m.

valtermura · March 19, 2016, 6:48pm

Hi All

I definitely agree with Martin.

Ciao
Valter

Yury_Tarasievich · March 20, 2016, 6:01am

By my estimates (I'm looking at the KBabel stats, which aren't perfect), the last three years (mid-2013 to end of 2015) brought about 100% overall "change" (untranslatedness) in the UI strings corpus (up to 30K units). Of course, this includes strings going fuzzy without any real change in content, but confirming fuzzy units is still real work.

JFYI.

-Yury

> It's an old story. Developers of OOo/LO think localizers are enjoying this
> kind of manual work. As if it was a mandala or knitting.
> So they rework tags every now and then, without caring about our feelings.

...

Christian_Lohmaier · May 12, 2016, 1:09pm

Hi Martin, *,

> Hi, Christian,

> I have several questions about organizing the work of the Slovenian l10n team for
> the 5.2 release:
> - When will the changes end, so that you can run the magic script on the l10n PO
> files to turn these affected fuzzy strings into fully localized ones?

I ran the script at the end of last week, so the strings where the obsolete
attribute was removed from the message context had their fuzzy flag removed
(if they had the same translation in 5.1; they stay fuzzy if the translation
was changed in the meantime).

> For example, will that milestone be the alpha1 or the beta1 release? Any later
> date makes it impossible for l10n teams to finish localizing the other help files,
> as we do not know which files to localize now and which not, because
> they will be localized by magic scripts;

Remember that in the old scheme you wouldn't have a 5.2 translation
process at all yet. Only because we have master projects can you translate
things already...
If you are asking when string freeze is: that is with RC1 (for 5.2, a
total of 4 RCs are scheduled).
Feature freeze is with beta1 (the week after next).

> - Could you please run these scripts on the l10n projects that are not
> using Pootle for translation?

Only if absolutely necessary.
helpfuzz.yml is a file that lists the files that were changed with the
template updates. You don't need to restrict yourself to those, but touching
every file would have increased the time needed for the processing
(not so much for changing the PO files, but for updating the database
in Pootle afterwards). Not all of those files have the described
obsolete-attribute removal, so only a subset is affected.
The content of that file:

http://pastie.org/10834259

The script that processes the PO files:

http://pastie.org/10834263

Leave out the syncing between disk and Pootle. The script expects the
translations in translations/libo_help/<language> and
translations/libo51_help/<language> respectively.
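For teams running this outside Pootle, the directory layout described above can be paired up as in the hypothetical sketch below. It assumes that each PO file sits under the same relative path in both trees, and it only lists master/5.1 file pairs; the actual processing is done by the linked script, whose interface is not reproduced here. The function name `paired_po_files` and the example file names are illustrative only.

```python
import tempfile
from pathlib import Path
from typing import Iterator, Tuple


def paired_po_files(root: Path, language: str) -> Iterator[Tuple[Path, Path]]:
    """Yield (master, 5.1) PO file pairs present in both translation trees.

    Assumes translations/libo_help/<language> holds the master files and
    translations/libo51_help/<language> the 5.1 files, with matching
    relative paths below each language directory.
    """
    master_dir = root / "translations" / "libo_help" / language
    old_dir = root / "translations" / "libo51_help" / language
    for master_po in sorted(master_dir.rglob("*.po")):
        old_po = old_dir / master_po.relative_to(master_dir)
        if old_po.is_file():
            yield master_po, old_po


if __name__ == "__main__":
    # Build a throwaway tree to show the pairing; "sl" and the file names are examples.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for tree in ("libo_help", "libo51_help"):
            po = root / "translations" / tree / "sl" / "sbasic" / "shared.po"
            po.parent.mkdir(parents=True)
            po.write_text("")
        # Exists only in master, so it has no 5.1 counterpart and is skipped.
        (root / "translations" / "libo_help" / "sl" / "new.po").write_text("")
        for master_po, old_po in paired_po_files(root, "sl"):
            print(master_po.relative_to(root), "<->", old_po.relative_to(root))
```

Files that exist only in master are skipped, since there is no 5.1 translation to compare against.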