Roundtrip Conversion Problems (was Re: Should LibreOffice ... secret formats?)

Spencer sent me reproducible test cases for the two problems he has raised here.

I have performed a conforming forensic analysis (without having looked inside the format at all).

Here is the situation for the case of dashed lines in presentations.

CONCLUSIONS

This is a situation that has been seen in analysis on this list before. Both products have difficulty with round-tripping into the other format and back.

1. In the specific case that Spencer reported, it appears that LibreOffice produces an unbroken line when saving the dashed line into an Office 97-2000 PowerPoint .PPT format.

2. In the specific case that Spencer reported, it also appears that LibreOffice presents an unbroken line when receiving an actual dashed line from an Office 97-2000 PowerPoint .PPT format.

3. Although PowerPoint 2010 will recognize the correct dashed line when opening the ODP directly (not as a PPT), dashed lines produced in ODP format from PowerPoint 2010 are not read correctly (as ODP format) by either PowerPoint 2010 or LibreOffice Impress.

This is based on simple observation, without attempting any analysis to isolate the problems more specifically. It appears to be enough for 4 bug reports though. PPTX was not tested. That may lead to more bug reports all-around.

- Dennis

DETAILED PROCEDURE

A. Document A from Spencer: Original ODP

This is a single-slide .ODP where the only figure is a diagonal dashed line. The dashes are relatively long and the space between the three dashes is about the same width as a dash. This document opens just fine in LibreOffice 3.3.2, the one I use for production on my desktop system.
  This document also opens correctly (as an .ODP) in PowerPoint 2010. I needed to do a document-repair click-through because PowerPoint 2010 expects ODF 1.1 and the ODF 1.2 package from LO has unexpected XML content not defined in ODF 1.1. But the slide opens without problems. The dashed line is correct.
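One way to see which ODF version a package declares, before handing it to PowerPoint 2010, is to read the office:version attribute on the root element of content.xml. The Python sketch below uses only the standard library and the ODF office namespace. The function names are hypothetical, and the sketch only reports what the file claims, not whether its content actually conforms to that version.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the ODF "office" vocabulary (unchanged from ODF 1.0 through 1.2).
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

def office_version_from_xml(xml_bytes):
    """Return the office:version attribute of an ODF XML root element, or None if absent."""
    root = ET.fromstring(xml_bytes)
    return root.get(f"{{{OFFICE_NS}}}version")

def odf_package_version(path):
    """Read content.xml from an ODF package (.odp, .odt, ...) and report its declared version."""
    with zipfile.ZipFile(path) as package:
        return office_version_from_xml(package.read("content.xml"))

if __name__ == "__main__":
    # Usage: python odf_version.py DocumentA.odp DocumentZ.odp
    for name in sys.argv[1:]:
        print(name, odf_package_version(name))
```

A file written by LibreOffice in its default mode would be expected to report "1.2", which would be consistent with the repair prompt in PowerPoint 2010.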

X. Document X from Dennis. (PPT from the original ODP using PowerPoint 2010)

I also saved this opened ODP from PowerPoint 2010 as an Office 97-2000 PowerPoint .PPT file. It re-opens just fine in PowerPoint 2010.

[Side Note: There is an interesting difference in the presentation of the dashed line in PowerPoint 2010 in comparison with LibreOffice. If I zoom the slide larger, the sizes of the dashes and spaces between them do not change. Instead, the number of dashes and spaces increases or decreases as the zoom makes the line longer or shorter. In LibreOffice Impress, the line retains 3 dashes, but their length and that of the intervening space changes as the slide is viewed at different zoom magnifications. I am certain that the ODF Specification does not say anything about the visual presentation of the dashed line. I don't know if [MSO-PPT] does or not. I doubt that the OOXML specification does either, but I should check that before I perpetuate another myth. This is a finer-grained interoperability issue than the problem Spencer reports. It appears to be within the allowed discretion for implementations.]
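The two behaviours in the side note can be modelled roughly as two dash-layout policies. This is a hypothetical sketch of what is observed on screen, not a description of how either product is actually implemented. Lengths are in arbitrary screen units, and a zoom change is modelled simply as a change in line_len.

```python
import math

def fixed_size_dashes(line_len, dash_len, gap_len):
    """Policy resembling what PowerPoint 2010 shows: dash and gap sizes stay fixed,
    so the number of dashes changes as the line gets longer or shorter on screen."""
    period = dash_len + gap_len
    # n dashes need n*dash_len + (n-1)*gap_len <= line_len, i.e. n <= (line_len + gap_len) / period
    return math.floor((line_len + gap_len) / period)

def fixed_count_dashes(line_len, count, gap_ratio=1.0):
    """Policy resembling what Impress shows: the dash count stays fixed,
    so dash and gap lengths scale with the on-screen line length."""
    # count dashes plus (count - 1) gaps, each gap being gap_ratio times a dash
    dash = line_len / (count + (count - 1) * gap_ratio)
    return dash, dash * gap_ratio

if __name__ == "__main__":
    for line_len in (5.0, 10.0):  # e.g. the same line at 100% and 200% zoom
        print(line_len, fixed_size_dashes(line_len, 1.0, 1.0), fixed_count_dashes(line_len, 3))
```

With 1-unit dashes and gaps, the fixed-size policy fits 3 dashes on a 5-unit line and 5 on a 10-unit line, while the fixed-count policy keeps 3 dashes and doubles their length.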

B. Document B from Spencer. (PPT from the original ODF using LibreOffice Impress)

This is a .PPT that Spencer made by Save As from LibreOffice Impress (just as Document X was made by Save As from PowerPoint 2010).

Document B, when opened by PowerPoint 2010, shows a single solid line.

C. Document C from Spencer. (ODP made after opening Document B in LibreOffice)

This document is provided as confirmation that when Document B is re-opened in LibreOffice, it also shows a single solid line.

Y. Document Y from Dennis. (ODP made after opening Document X in LibreOffice)

In LibreOffice Impress, Document X opens the same as Document B, losing the dashes. The dashed line is known to be there from the PowerPoint 2010 side, but it turns into a solid line on input by LibreOffice Impress. Document Y captures the ODP of that result.

Z. Document Z from Dennis (ODP made from Document Y using PowerPoint 2010)

LibreOffice Impress opens this document and retains a dashed line, but the dashes are much smaller and there are many of them. At 100% these view as intermittent long and short dashes.

PowerPoint 2010 opens this document (which it produced) and the dashed line has turned into a solid line.
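Because an .odp is a zipped XML package, one way to start isolating the difference between Document A and Document Z would be to compare the dash definitions each file carries. In ODF these are draw:stroke-dash elements (with attributes such as draw:dots1-length and draw:distance). This Python sketch simply lists them from styles.xml and content.xml; the function names are hypothetical, and it does not follow which graphic styles actually reference each dash.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"

def stroke_dashes_from_xml(xml_bytes):
    """List every draw:stroke-dash definition as a dict of local attribute names to values."""
    root = ET.fromstring(xml_bytes)
    dashes = []
    for el in root.iter(f"{{{DRAW_NS}}}stroke-dash"):
        # Drop the "{namespace}" prefix so keys read like ODF names (name, dots1-length, ...).
        dashes.append({key.split("}")[-1]: value for key, value in el.attrib.items()})
    return dashes

def stroke_dashes(path):
    """Collect dash definitions from styles.xml and content.xml of an ODF package."""
    found = []
    with zipfile.ZipFile(path) as package:
        for member in ("styles.xml", "content.xml"):
            if member in package.namelist():
                found.extend(stroke_dashes_from_xml(package.read(member)))
    return found

if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(name, stroke_dashes(name))
```

Comparing the output for Document A and Document Z could show whether PowerPoint 2010 rewrote the dash geometry when it saved the ODP.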

I repeated tests similar to those NoOp also performed to see how the variations that I made with the dashed-line slide image show up here.

CONCLUSION

The round trip from Document A to B back to C is definitely broken in LibreOffice in the manner described by Spencer.

The opening of either Document A or Document B in Word 2010 produces a terrible result where the 4 columns are longer and flow at their bottoms onto a second page.

This is so bad I despair of doing any further isolation.

- Dennis

ANALYSIS DETAILS

A. Document A - The ODT from Spencer. In LO 3.3.2 I see 4 columns, each 1.5" wide, with about 0.5" between. The Format | Columns dialog reports 1.42" with 0.70" spacing and AutoWidth is selected.

B. Document B - The Word 97-2000 format DOC from Document A via LibreOffice, by Spencer

C. Document C - The ODT file that reflects what is seen when Document B is opened in LibreOffice.

When I open Document A in Word 2010, I see the problem that NoOp reported, concerning a blank column showing up. There is also an error message about "Drawn Objects and Text Boxes 1." I also see that there is a second page having 4 more columns (the 4th column is empty). It appears that the columns stretch vertically down onto the second page. That is, there are only 4 columns but each column is two pages long, and the top of the second column is all blank, so its content only appears on the second page. There are also columns whose content image flows off the bottom of the page and is chopped off.

When I open Document B in Word 2010, what I see is almost the same as when opening Document A in Word, but the B view has a duplicate title over one of the figures of the Financial Industry Profits graph in column 2.

Document C in LibreOffice is now 2 pages because the column sizes are screwed up, leading to 4 columns on the first page and a fifth column on the second page.

There's no point in making a Document X because I have no means to obtain a correct version in Word to try saving back.

Hi, Dennis:

       Thanks very much. Should I do something to file bug reports on these items?

       Spencer

I would go ahead and file a bug report. The address is
https://bugs.freedesktop.org. You will need to set up a user account to
file the bug.

[snip]

It is tough to figure out what bug to report in the multi-column text-flow problem.

In the dashed line problem, it is easy to report two bugs, one for dashed lines to .ppt and one for dashed lines from .ppt.

In this multi-column flow case, LibreOffice can round trip, and the bug is in the change to column and spacing widths, which leaves the material unable to fit and flow properly.

So there is a bug in LibreOffice not properly consuming what it produces.

THE SERIOUS INTEROP QUESTION

The other problem, which I don't know how to deal with, is whether that is a proper .doc for what is in the .odt at all. I *think* the problem you are seeing is that the frame on one of the images in column 2 is actually too wide. Or maybe column 1, and it forced the kind of adjustment you are seeing. But the consequences in Word are particularly awful.

What is even more amazing is that what Word does with that specific .doc has not changed since Office 97!! NoOp gets near-identical results from the .doc in Word 97 that I get from it in Word 2010. I bet if the second page is examined, the column 2 content will be seen to have flowed down to the second column there. (The only difference that I see in Word 2010 compared with the Word 97 screen shot is that 2010 has a double title over the graph in column 3 and consequently more text flows to the top of column 4. I hadn't noticed that additional title doubling in my earlier report.)

On the other hand, what Word 2010 does with the original ODT is strangely close to what it does with the .DOC, and that is *really* inexplicable.

So there's not enough here for an isolated bug.

MORE DETAIL: I forgot to check this before. When the .doc is opened in Word 2010, the columns are set as four across, with column 1 1.6", 2-3 at 1.25" apiece, and column 4 at 1.6". The spacing is 0.7". The equal column width box is not checked. The margins are 0.35" top, left, right, and bottom, with no gutter. The page is US Letter.

If I check "equal column width" I get 1.43" columns all the way across and 0.7" spacing. The duplicated titles I mention disappear, but there are other duplications in the columns.
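As a quick arithmetic check, assuming the 1.42" figure from the LibreOffice dialog is a rounded value, the column settings reported by both programs add up to the text width of a US Letter page (8.5" wide) with 0.35" left and right margins. That suggests the column geometry itself is consistent, and the trouble lies elsewhere (for example in the too-wide frame suspected above).

```latex
\begin{aligned}
\text{Text width:}\quad 8.5 - 2(0.35) &= 7.8\ \text{in} \\
\text{Word 2010, unequal columns:}\quad 1.6 + 1.25 + 1.25 + 1.6 + 3(0.7) &= 5.7 + 2.1 = 7.8\ \text{in} \\
\text{Equal column width:}\quad w = \frac{7.8 - 3(0.7)}{4} &= 1.425 \approx 1.43\ \text{in} \\
\text{LibreOffice dialog:}\quad 4(1.42) + 3(0.70) &= 7.78 \approx 7.8\ \text{in}
\end{aligned}
```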

(sigh)

FURTHER ANALYSIS POSSIBILITIES

Although it introduces more variables that can't be controlled, I think there are three avenues of further exploration:

1. Make a .doc that seems as correct as is possible. See what LibreOffice does with that. Then make an .odt from that .doc from Office 2010 to see how that round-tripping works. This might localize *something*.

2. Do the same thing with .docx in both directions. If experience is any guide, this will be worse, but because .docx is an XML format it might be possible to find more clues by inspecting the XML that travels in various directions.

3. Make a Microsoft Word XML file too. This is a rarely-used variation that *might* provide more clues. There are filters for reading those into LibreOffice also, although I have no clue concerning their quality. (This can be round-tripped out of LibreOffice too, I believe.)
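For avenue 2, the section and column settings in a .docx can be read directly from word/document.xml, where w:sectPr/w:cols carries w:num, w:space and w:equalWidth, with one w:col (w:w, w:space) per column, all measured in twentieths of a point (1440 per inch). This Python sketch uses only the standard library; the function names are hypothetical, and it reports only the raw column settings, not how Word lays text out with them.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TWIPS_PER_INCH = 1440  # w:cols and w:col measurements are in twentieths of a point

def _w(name):
    """Qualify a WordprocessingML local name for ElementTree lookups."""
    return f"{{{W_NS}}}{name}"

def _inches(value):
    return None if value is None else int(value) / TWIPS_PER_INCH

def column_settings_from_xml(xml_bytes):
    """Return one dict per w:sectPr that has a w:cols element, with measurements in inches."""
    root = ET.fromstring(xml_bytes)
    sections = []
    for sect in root.iter(_w("sectPr")):
        cols = sect.find(_w("cols"))
        if cols is None:
            continue
        num = cols.get(_w("num"))
        sections.append({
            "num": None if num is None else int(num),
            "space_in": _inches(cols.get(_w("space"))),
            "equal_width": cols.get(_w("equalWidth")),  # raw attribute value, None if omitted
            # (width, spacing after) for each explicit w:col, in inches
            "columns": [(_inches(c.get(_w("w"))), _inches(c.get(_w("space"))))
                        for c in cols.findall(_w("col"))],
        })
    return sections

def column_settings(docx_path):
    """Read word/document.xml from a .docx package and report its column settings."""
    with zipfile.ZipFile(docx_path) as package:
        return column_settings_from_xml(package.read("word/document.xml"))

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, column_settings(path))
```

If Word saved the layout described above faithfully, the widths would be expected to appear as 2304, 1800, 1800 and 2304 twips with 1008-twip (0.7") spacing.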

There is a project, Apache POI, that has Java tools for manipulating and converting Microsoft Office format documents. That might help to examine the .doc files to see where the discrepancies arise. That's a lot of work to invest for this particular file. I think starting with variations of simple cases may work better.

Once I stopped vibrating over my disappointment with the multi-column text-flow problem, I remembered that Microsoft had provided some side-by-side OOXML and ODF documents (and some .doc versions) for use in feature- and conversion-checking <http://lists.oasis-open.org/archives/oic/201102/msg00006.html>.

I did not find one that dealt with multi-column pages, but I did find interesting ones that illustrated text frames with text flow among them.

I also found out how easy it is to make a multi-column page, so I will see if I can duplicate Spencer's test case in Word 2011 directly and then see how it goes back and forth with LibreOffice. Perhaps that will reduce the degree of bafflement about what the disconnects are.

I will report back ...

- Dennis

3. Although PowerPoint 2010 will recognize the correct dashed line when
opening the ODP directly (not as a PPT), dashed lines produced in ODP format
from PowerPoint 2010 are not read correctly (as ODP format) by either
PowerPoint 2010 or LibreOffice Impress.

Personally, whether m$ is capable of opening the odp file is secondary
to a more interesting question: does this observation occur when the
odp is created _by LO_ to odf12 or the other options odf10/11 and
odf11? If my memory is correct you stated that m$o claims conformance
to odf10; in this case we would want to see LO create the odp file in
version odf10 and then m$o opens this file successfully and without
distortion (analogous to opening a w3 compliant html file in opera and
firefox).

If the m$ user receives the aforementioned file with distortion, two
expected causes would be either LO failing to create the odp file
correctly according to the specification, or m$ failing to conform to
the odf specification. The former could be discounted if the recipient
opens the file in his/her LO and receives no distortion (assuming both
LO installations are odf compliant!).

If LO creates the odp file in an odf specification that m$ does not
understand, the m$ user should expect to receive a distorted file.

Z. Document Z from Dennis (ODP made from Document Y using PowerPoint 2010)

LibreOffice Impress opens this document and retains a dashed line, but the
dashes are much smaller and there are many of them. At 100% these view as
intermittent long and short dashes.

Do you have to change the odf specification setting of LO?

PowerPoint 2010 opens this document (which it produced) and the dashed line
has turned into a solid line.

To clarify, this occurs when m$ shows the odp file on screen as
expected (i.e. dashed line) but after closing m$ and reopening, a
solid line is visible?

PS: I must point out that the primary marketing thrust of OpenOffice.org was
and is that it offers (unqualified) support for key Microsoft Office
formats, it is free, and it runs on more than Windows. I don't know how
LibreOffice is positioned, but it would be interesting to see what would
happen if "support for Microsoft Office formats" were to be removed from all
promotional statements concerning LibreOffice.

It is interesting that so many are fearful of the "nightmare scenario"
of LO not supporting m$, but that is _not_ the objective of the
original opinion. As consistently stated, the issue is that the time
spent reporting the behaviour of LO with m$ formats would be spent more
profitably, long term, on odf instead. On the basis of those
comments posted, the unfortunate conclusion is that for m$ users, odf
is of lesser importance than m$ which makes LO a mere m$ clone.

I'm going to top post on purpose this time (shock & awe)...

Dennis, sorry but I've a hard time following your posts. You top post
without any word wrap & here is how your post appears in my standard
email client.

I well appreciate your participation on this list, however you've
already read, and commented[1], on "Top Posting... Can we have an LO
Mailing List Guidelines Page?" thread. How does top posting and lack of
word wrap make the following readable at all?

Re: word wrap:
<http://permalink.gmane.org/gmane.comp.documentfoundation.libreoffice.user/10816>
Is it that difficult?

Were you to initially receive the following in your email client
(X-Mailer: Microsoft Outlook 14.0) would you be able to follow and
understand just what the heck you are talking about?

I think your contributions to this list are sincere, well thought out,
and valuable. /Please/ reconsider your top posting and lack of word wrap
in future responses.

Gary

[1]
<http://permalink.gmane.org/gmane.comp.documentfoundation.libreoffice.user/10798>

Gary,

I see word-wrap just fine on the web page you linked to.
It appears that the list provided automatic word wrap. It
won't reflow if I make the window narrower than where it
did auto-breaking, so I couldn't read it comfortably with
my phone's browser, unless the forced wrap is something
like 30 characters (with landscape viewing). But the auto-
breaking is there, with a line width of around 110
characters, it seems.

I also see word-wrap just fine in the e-mail you sent to me.
So the answer is yes, I would see it all just fine.

The conflict is with clients that do know how to word-wrap the text,
which is kept in paragraph-level streams so the client can do
correct word-wrap with whatever the displayed line size is.

I know some list archives *prevent* word-wrapping by using
<pre> instead of <p> elements when plaintext is presented
via HTML. The GMANE page you linked to uses <pre> but then
does automatic word wrap to keep line width at around 110
characters. Works fine on my monitor [;<).

When email is word-wrapped with hard line breaks, it is
then ugly in situations when the client also does automatic
word-wrapping. And when there are ">" reply-nesting markers,
it gets worse.

Catch-22.

So if I sent HTML-formatted mail, would that actually work
better for you?

- Dennis

PS: This message manually word-wrapped for your pleasure.

PPS: There is an SMTP IETF RFC that explains how to auto-
matically handle word-wrapping and tell when not to word-
wrap a line. It appears that knowledge of that is not
uniformly distributed. I think the architectural principle
is that the recipient would know what its wrapping needs
are and the sender has no way to know what works, hence
no pre-wrapping inside paragraph text.
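Assuming the RFC meant here is RFC 3676 (the format=flowed parameter of text/plain), the receiving side works roughly like this: a line ending in a space is a soft break that the reader may rejoin and rewrap to its own width, a line without a trailing space is a hard break, and a single leading space added by the sender ("space-stuffing") is removed. This is a simplified Python sketch with a hypothetical function name; it tracks quote depth by counting leading ">" characters but ignores the DelSp parameter and other details of the RFC.

```python
def unflow(text):
    """Rejoin format=flowed soft line breaks; return (quote_depth, paragraph) pairs."""
    paragraphs = []
    pending = None  # (depth, text) of a paragraph whose last line was flowed
    for line in text.split("\n"):
        line = line.rstrip("\r")
        depth = len(line) - len(line.lstrip(">"))  # number of leading quote markers
        body = line[depth:]
        if body.startswith(" "):
            body = body[1:]  # undo space-stuffing
        # A trailing space marks a soft break, except on the "-- " signature separator.
        flowed = body.endswith(" ") and body != "-- "
        if pending is not None and pending[0] == depth:
            pending = (depth, pending[1] + body)
        else:
            if pending is not None:
                paragraphs.append(pending)  # quote depth changed: close the old paragraph
            pending = (depth, body)
        if not flowed:
            paragraphs.append(pending)
            pending = None
    if pending is not None:
        paragraphs.append(pending)
    return paragraphs
```

Each returned paragraph can then be wrapped to whatever width the reader's window has, which is the division of labour described above.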

Although this is OT for this thread, one more thing.

I don't read the list by opening all of the posts.
I read the lists in my preview pane as I quickly
scan the unread ones (chronologically). If there
is a forced word-wrap, the rewrapping in the
preview pane is rather, um, distracting, as it is
on these manually hard-broken lines when I see them
on the list.

Of course, rapid processing of the preview pane also
favors top-posting.

I guess there is tolerance or there isn't.

- Dennis.

Much better - thanks. Any further responses will not be top posted.

Clients typically are set to view to screen width (yours does). However
as you can see, replying to such creates an issue; hence the issue of
what you see below. Unwrapped posts place the onus on person replying
to your posts to fix your 110 char format and rewrap to a common
72/80(max) character text format. I can do this in my client by Ctrl-R,
however I shouldn't have to. You can easily see the difference with your
current reply vs your other below.

So, let's go back to top posting; were you to receive /only/ this email,
would it make sense to you to read this first and then have to sort down
through the rest to understand what we are currently talking about? Try
it; print, set aside, and then start from the top.

I realise that you may have issue with your Outlook client, but even
with that it's not difficult to interleave/bottom post. Give it a try...
If it doesn't work out for you then fine, but then print a list post
that you last participated in, set aside for at least one day, read
later, and consider what you'd prefer afterwards.

And thanks for replying and trying, it's appreciated :)

I knew from the top post in my preview pane that there was a question
for me to answer.

When I get a bottom-posted reply, or worse, one interspersed without
stood-off comments, I have to hunt for them, even when it is a thread I
am interested in.

Because subject lines aren't always the truth, I look at the top of
messages for clues but no further. I rarely open the full message
except to see if there is something I want to reply to.

So, I will rarely be bottom or mid-posting. I'm not changing that.
Since you are using a newsreader (you referred me to a GMANE archive),
why do you need it?

Also, I didn't do the 110-column line breaks, the GMANE code that
made the HTML did that. If all it takes is a Ctrl-R for you to see
the text flow at your window width, why is this conversation necessary?

- Dennis

Aren't the two of us committing a greater sin by having this exchange
under the wrong subject line? I will not be continuing on this subject.