LO 4.2.1.1 .docx file: partial text and images.

From time to time I receive .docx email attachments from local groups, such
as colleges or community organizations, which may display with varying
amounts of success. Recently, I received such a .docx as a mail attachment
which only displays partial text.

My goal is to gather sufficient information to file a meaningful bug report
in the hopes of corrective action being taken to help make LibreOffice more
robust by properly displaying documents such as this. If there is an
unreleased fix or existing unresolved bug report, please inform me.

The .docx file in question has 2 visible pictures (on left side of page),
each in some sort of box of their own, and 3 other colored boxes (2 colored
boxes in right side of page and one full width box across bottom third of
page). One of the colored boxes (topmost rightmost) has text in it that I
can see. The other 2 boxes (right-middle, and bottom) have no visible text.
This is the problem area.

I used 7-Zip to extract the .docx to files, and skimmed through the
convoluted .xml soup -- using NotePad++ -- and found that the missing text
was locations, times and topics for some guest speaker series. There may
also be some embedded or inline images, as I recognized some base64 content
mixed in as well. Perhaps bullet points or backgrounds. I did not explore
further.
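
For anyone wanting to repeat that inspection without unpacking by hand, here is a minimal sketch that lists the text stored inside text boxes. It rests on assumptions not stated in the post: that the main part lives at word/document.xml, that the file uses the transitional WordprocessingML namespace, and that the missing text sits in w:txbxContent elements. The function name textbox_texts is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Transitional WordprocessingML namespace (assumed; a "strict" file uses another URI).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def textbox_texts(docx, part="word/document.xml"):
    """Return one string per text box found in the given part of a .docx.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read(part))

    boxes = []
    for box in root.iter(f"{{{W_NS}}}txbxContent"):
        paragraphs = []
        for para in box.iter(f"{{{W_NS}}}p"):
            # Join the runs of one paragraph; empty runs contribute nothing.
            runs = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
            paragraphs.append("".join(runs))
        boxes.append("\n".join(paragraphs))
    return boxes
```

If the same text shows up twice, the file may hold both a newer and a fallback copy of each box. Comparing this output with what LibreOffice displays could help narrow down which boxes are being dropped.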

I would share the offending document, but it is not completely anonymized,
and I have no software with which to open it and anonymize and retain the
same bug. Perhaps I could manually edit the .xml text areas. Then there
are the images, which I must decode and save as external files, to
anonymize, re-encode, and then adjust any byte counts.
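
One way the manual text edits might be automated is sketched below. It masks the contents of every text run while leaving the surrounding markup byte-for-byte alone, so the structure that triggers the bug has a better chance of surviving. It assumes the runs use the usual "w:" prefix; scrub_runs is a hypothetical name, and images are not handled here.

```python
import re

# Text of one <w:t> run. Assumes the "w:" prefix; will not match <w:tab/> or <w:tbl>.
W_T = re.compile(r"(<w:t(?:\s[^>]*)?>)([^<]*)(</w:t>)")
# Either an XML entity such as &amp; (kept intact) or one letter, digit or underscore.
TOKEN = re.compile(r"&[^;]+;|\w")


def _mask(match):
    token = match.group(0)
    if token.startswith("&"):
        return token  # leave entities alone so the XML stays well-formed
    return "9" if token.isdigit() else "x"


def scrub_runs(xml_text):
    """Replace letters with 'x' and digits with '9' inside every <w:t> element."""
    return W_T.sub(
        lambda m: m.group(1) + TOKEN.sub(_mask, m.group(2)) + m.group(3),
        xml_text,
    )
```

Keeping the character count of each run unchanged is a deliberate choice: line breaks and box sizes should stay close to the original, which matters if the bug depends on layout.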

Finally, not sure if 7-Zip can create the exact same zip container, or if
that matters? I'd have to get the 7-Zip compression settings right, usually
reduce dictionary and word size options to minimums (has worked for .xpi and
.jar files in the past, why not .docx). Assuming I could anonymize and
retain the errant behavior, would it then be helpful to determine the source
of the bug?
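
As an alternative to tuning 7-Zip settings, the container could be rebuilt with Python's zipfile module. This is a rough sketch, not a guarantee of an identical archive. It copies every entry in its original order with its original compression method and swaps in edited bytes where supplied. The name repack_docx and the replacements mapping are illustrative only.

```python
import zipfile


def repack_docx(src, dst, replacements):
    """Copy a .docx container, substituting edited parts.

    `replacements` maps entry names (e.g. "word/document.xml") to new bytes.
    Entry order, timestamps and per-entry compression method are preserved.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            data = replacements.get(info.filename)
            if data is None:
                data = zin.read(info.filename)  # unchanged part, copied as-is
            new_info = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            new_info.compress_type = info.compress_type
            new_info.external_attr = info.external_attr
            zout.writestr(new_info, data)
```

The zip-level sizes and checksums are recomputed on write, so those byte counts need no manual adjustment. Any lengths recorded inside the XML itself would still have to be checked by hand.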

Can anyone think of any other troubleshooting I might try, either as
LO/Writer options, or raw .xml edits, or both?

Also note and please respect that within the context of this thread, I (and
likely many others) have no interest to debate the wheretos and whyfors of
assigning blame (to someone or something else), or denial that it is a
compatibility problem (belonging to LO), or other evasive tactics.

This is not a mere formatting problem to be ignored, simply because some
text is fonted wrongly, image positioned oddly, and so on. Text here is not
displaying at all. If I have to unarchive and muck about in .xml files,
that speaks to me of an issue of compatibility which must be resolved or
worked around in some way with a more graceful degradation of document
quality than to drop things on the floor and walk away with fingers in ears
singing, "la la la can't hear you".

In other similar threads over the years, I have noticed a peculiar tendency
of some people to insist on inserting such defensive and evasive commentary
any time an issue of compatibility is raised. Please don't waste time or
energy on such responses, nor responding to such responses. It is not
helpful to this specific issue at hand.

Should a prolonged discussion of such things be so strongly desired by
anyone, feel free to start a new thread on that specific topic, and leave
this one alone. Perhaps I will then say a few words on the topic in that
thread, sufficient to communicate most of these sentiments. But I ask
please that such dialog not be included here, for the sake of a streamlined
discussion.

Off-topic digression follows.

Imagine a schoolkid getting beaten up every day by a bully

How do you convince people of the truth? What happens if all the kids
gang up together?

In this analogy, who is the bully and who is the kid out of luck? I
understand, you intend the bully to be the corporation and possibly the
end-users, and the kid to be the free software project.

All too often with free software projects (more so in the past decade than
the one previous), it seems like there are multiple groups of antagonists.

One group is usually a corporation, and the other is a group of bullies who
make up the consensus of the developers and/or community support list. Both
sides give the impression that they want nothing to change, nor do they care
about common use cases which reveal problems.

The end user who simply encounters a problem and would merely like to do
whatever they are able to help troubleshoot something and improve the
situation gets hostility from all sides.

The end-user becomes conditioned to not want to provide any feedback
whatsoever. The community then arrives at the delusion that there is no
problem, and becomes further entrenched and hostile towards the next
end-user.

Meanwhile, end-users encounter this situation multiple times, year after
year, with many other projects, and eventually develop an aversion to any
interaction with free software projects. At the very least with the
community portion, anyways.

On the other hand, if a project became too popular too fast, it could
collapse from having too many users without adequate momentum. So I suppose
actively discouraging people from using the software by bullying anyone who
mentions any problem keeps the project alive at a certain stage.

Quite a perverse dichotomy. Most unfortunate. Slightly amusing.

Regardless -- and sometimes against my better judgement -- I still ping
random projects from time to time when I encounter a problem, have an idea,
or even a question, in the hopes that I get a friendly response and am
welcomed to the opportunity to help make a difference. :slight_smile: Although
increasingly rare these days, it does happen, and those are the moments when
I feel empowered, inspired and elevated, if only for a brief time, until I
contribute a useful bug report or patch. That was the experience which
initially drew me to free software, back when few people knew that free
software existed.

Hi :slight_smile:
Actually the kid WAS all the end-users. This mailing-list is almost
entirely end-users. We are here to help each other and learn new
tricks. So are almost all of the devs (also end users) and everyone
else in the project. The whole reason for this entire project is
that we DO want change.

There most definitely IS a problem and it's one we experience almost
every day, whether or not new end-users write in to the mailing list
about it.

Almost all of us can remember being in exactly the same situation (or
very similar) to the one you are currently in. The same one that many
people face when using MS Office.

So our devs do put in an enormous amount of work into finding
problematic DocX files, then reverse engineering them to find out what
quirk got added into the spec without being documented and then
rewriting the filters to deal with the problem.

The problem then is that MS can easily just sprinkle some new little
quirks with almost no effort at all. So all non-MS programs have to
put tons more effort into trying to find those new ones and reverse
engineering and then fixing. Often these projects share the results
so that a lot of them become able to read the quirky files.

We have already seen the result of what happens when MS format is able
to be used by everyone and finally becomes fully interoperable as they
keep promising it will be. When that happened a few years ago (with
Doc, Xls, Ppt etc) they seemed to choose that moment to change to a
completely different format.

While you might feel that people are being hostile against you it is
more likely that you have just hit a raw nerve and that they are
ranting against the world in general and feel like you might be a
sympathetic ear because of what you have only just found. So it's not
really against you, it's more WITH you.

Things ARE changing though. People are starting to use uneditable
formats (Pdf) and moving to expensive editors to edit them. Less
insane is the increasing momentum towards using ODF (Odt, Ods, Odp
etc) and often to the free programs and suites that use them natively.

Governments, organisations and corporations all seem to be moving
towards ODF, or at least the ideals of it.

Sometimes they get tricked into thinking that OOXML or Rtf delivers on
its promises but then they find that no-one else can use those
formats except MS Office. It's at that point that things become
interesting. Some blame ALL the other programs and suites, which is
what i did initially and also what you did in your first post. It
takes a while to realise that if ALL other programs are having a
problem with the MS format then maybe it's a problem with the format.
After all why would they ALL deliberately make it difficult for
themselves to gain market-share? Maybe it is a problem with the
format.

Then researching into the format itself, rather than just blindly
accepting MS's excuses, does start to raise interesting questions.
Why are there 3 different "transitional" formats? Why can't they make
a format that DOES do what they promise? Surely it's not out of
stupidity or incompetence on the part of MS's devs!? Why do they have
so much trouble implementing ODF when everyone else finds it so easy?

Regards from
Tom :slight_smile:

Hi :slight_smile:
Imagine a schoolkid getting beaten up every day by a bully (that the
parents all seem to think is a good kid) demanding his tuck-shop
money. The kid goes home hungry. At home his parents don't give him
supper because they have already paid for his lunch. He eventually
grumbles about the bully and gets smacked for telling a lie about the
wonderful bully.

How do you convince people of the truth? What happens if all the kids
gang up together? It's now the word of the popular bully against the
word of all the rest.
Regards from
Tom :slight_smile:

Hi :slight_smile:
Please file a bug-report with MS Office for failing to implement their
own format as per their ISO specifications.

The file(s) that you have trouble opening are likely to also be
difficult or even impossible to open in many versions of MS Office.
You really need to be using the same version of MS Office as the
creator of the document. Try it! You might be amazed.

Microsoft themselves state that the DocX version in 2007 is a
"transitional" version (=not the same as the ISO spec), as is the one
in 2010 and the default ones in 2013 and 365. In MS Office 2013 it is
possible to use "Save As" to use their current "strict" version of
DocX which is supposedly a lot closer to their ISO version.
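
Which flavour a given file claims to be can be checked from its namespace. The sketch below looks for the two WordprocessingML namespace URIs in the main part. It assumes that part is at word/document.xml and is stored as UTF-8 or another ASCII-compatible encoding; docx_flavour is a hypothetical helper name.

```python
import zipfile

# Namespace URIs for the two conformance classes of WordprocessingML.
TRANSITIONAL = b"http://schemas.openxmlformats.org/wordprocessingml/2006/main"
STRICT = b"http://purl.oclc.org/ooxml/wordprocessingml/main"


def docx_flavour(docx, part="word/document.xml"):
    """Return "strict", "transitional" or "unknown" for a .docx file."""
    with zipfile.ZipFile(docx) as zf:
        data = zf.read(part)
    if STRICT in data:
        return "strict"
    if TRANSITIONAL in data:
        return "transitional"
    return "unknown"
```

This only reports which namespace the file declares. It says nothing about whether the content actually follows the corresponding specification.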

Such files often rely on someone using a non-MS program to open, make
a few changes (such as anonymising information) and then saving into a
more standards-compliant version of DocX or even better into a format
that everyone can use.

Microsoft's various different implementations appear to have large
proprietary components which are apparently copyright protected,
preventing non-MS implementations from being able to read their files
properly. By attempting to create filters to properly read the
various implementations of DocX non-MS companies put themselves at
risk of court action!

This is NOT an excuse and NOT anyone being defensive about it and it
is NOT an evasive tactic. It's one of the reasons why various
companies, governments and organisations are moving (or already have
moved) away from MS formats (which appear to fail any of their
promises of interoperability, just as Rtf did (see relevant court
case)) and towards a truly Open Document Format (which has been
successfully implemented by many programs, on many platforms for years
and is not reliant on the whims of any single vendor or company or
commercial interest (unlike DocX)).

The best bet for exchanging files is to refuse to accept DocX and ask
for the file to be sent in a format which everyone can use, either Doc
(the older MS format which MS dropped just after everyone else was
able to implement it) or Odt. Note that in 10-20 years files in DocX
format are likely to be extremely difficult to open because the
various implementations are not properly written up anywhere. Each
ODF format IS properly drawn up and the specification is available for
free from OASIS or, for a charge, through the ISO governing body. The
DocX specification available, for a charge, through the ISO
organisation is not the same as any implementation except the
non-default one in MS Office 2013.

So, your options for opening the file properly are to
1. Find out which version of MS Office was used to create the file
and on which version of Windows. Then buy that version of Windows
and that version of MS Office.
2. Try out various programs until you stumble onto the one that can
open that version of DocX (and cross your fingers that MS doesn't take
them to court because of it)
3. Ask the person sending the unreadable files to upgrade to a more
recent version of MS Office (this is the advice you would get from the
rest of the MS world)
4. Ask the person to please send in a more compatible format, such as
Doc or Odt; this is the option the non-MS world generally advises.

Many people are now sending Pdfs instead of editable versions of
documents specifically because this whole formats issue has become
such a nightmare. If you ask the originator to send as a PDF then you
will probably find that many other people have made the same request
of them.

Just to be clear i am just a volunteer here and do NOT represent any
official viewpoint of TDF or anyone else associated with LibreOffice
or ODF. On many of the mailing lists here i am on moderation and/or
have often been threatened with being removed from the mailing lists
because my views are unpopular outside of the normal users' peer-led
support mailing list. It's similar for quite a few of us here.
Regards from
Tom :slight_smile: