Gary,
Thanks for the links and analysis.
I haven't attempted the obvious test, which is to save the
RTF that fails in LO back to RTF and then compare to see what
got left by the side of the road.
My superficial reading of 96.rtf and the RTF specification suggests
that there are extensive provisions in the specification (and,
specifically, in how 96.rtf is coded) to accommodate all manner of
up-/down-level adjustments between different versions and
capabilities of software. I wonder whether this is not being handled
well in the currently-implemented import-export features, but I did
not dig deep enough to determine that one way or the other.
- Dennis
DEEPER ANALYSIS
I did waste a fun evening becoming re-acquainted with RTF though.
My first romance with the format was around 1989, when I tricked
Borland Paradox (MS-DOS version) into producing text files of reports
that were actually RTF documents, which I could import into a Xerox
workstation desktop publisher to make nice, paginated documents.
(I was compiling a glossary built in a database.)
In examining the actual RTF of the 96.rtf example, it was very
interesting to see how little of the RTF file is actually needed
to accomplish the result. (There is a ton of overhead material.)
I also started looking through the RTF specifications, using RTF
Specification 1.9.1 (Office 2007 level). It reminded me what a
fascinating format the underlying RTF structure is. And there's
sample code in the specification, although I am tempted to see
how fast I could make a processor of my own to serve as the basis
for an RTF forensic analysis and validation tool.
The number of control words (sort of like XML element tags) and
their details is immense, of course, but very little is needed to
make a simple document.
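
To illustrate how little is needed, here is a minimal sketch in
Python. The helper name minimal_rtf is hypothetical, the output is
not taken from 96.rtf, and it assumes plain ASCII text with no line
breaks.

```python
def minimal_rtf(text: str) -> str:
    """Wrap ASCII text in a very small RTF envelope (illustrative only)."""
    # Backslash and braces are the characters RTF requires escaping.
    escaped = (text.replace("\\", "\\\\")
                   .replace("{", "\\{")
                   .replace("}", "\\}"))
    # \rtf1 = format version, \ansi = character set,
    # \fonttbl declares font 0, \f0 selects it, \par ends the paragraph.
    return ("{\\rtf1\\ansi{\\fonttbl{\\f0 Times New Roman;}}"
            "\\f0 " + escaped + "\\par}")


doc = minimal_rtf("Hello, {world}")
print(doc)
```

Everything else a word processor writes on top of this is the
overhead material mentioned above.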
One important feature: Since the *1987* specification the "\*"
prefix feature has been used to identify control words whose data
should be completely ignored if the control word is not recognized
or supported. For a control word that is not recognized and that
lacks the prefix, the content is presumably to be kept in-line
assuming it is in a place where content is being expressed. Drawing
objects tend to be introduced by {\*\do ...}, for example, and those
are used in important ways in 96.rtf.
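
The "\*" rule can be sketched as a toy text extractor. This is my
own simplification, not the specification's sample code: the KNOWN
set is a hypothetical list of supported control words, the input is
assumed to be well-formed, and control symbols other than \\, \{
and \} (such as \'hh) are not handled.

```python
import re

# Control word: letters, optional signed number, optional one-space delimiter.
CONTROL_WORD = re.compile(r"\\([a-zA-Z]+)(-?\d+)? ?")

KNOWN = {"rtf", "ansi", "par", "b"}  # hypothetical: what this toy reader supports


def visible_text(rtf: str) -> str:
    out = []
    skip_stack = [False]  # one flag per open group: discard its content?
    starred = False       # True after "\*" until the next control word
    i = 0
    while i < len(rtf):
        ch = rtf[i]
        if ch == "{":
            skip_stack.append(skip_stack[-1])  # nested groups inherit skipping
            i += 1
        elif ch == "}":
            skip_stack.pop()
            i += 1
        elif rtf.startswith("\\*", i):
            starred = True
            i += 2
        elif ch == "\\":
            m = CONTROL_WORD.match(rtf, i)
            if m:
                word = m.group(1)
                if starred and word not in KNOWN:
                    skip_stack[-1] = True      # unknown starred word: drop the group
                elif word == "par" and not skip_stack[-1]:
                    out.append("\n")
                # Unknown unstarred words are ignored; their text stays in-line.
                starred = False
                i = m.end()
            else:
                # Control symbol such as \\ \{ \}: keep the escaped character.
                if not skip_stack[-1]:
                    out.append(rtf[i + 1:i + 2])
                i += 2
        elif ch in "\r\n":
            i += 1                             # raw line breaks are not content
        else:
            if not skip_stack[-1]:
                out.append(ch)
            i += 1
    return "".join(out)


sample = r"{\rtf1\ansi Before {\*\do drawing data}{\foo kept} after\par}"
print(repr(visible_text(sample)))
```

In the sample, the group introduced by \*\do disappears entirely,
while the text after the unknown but unstarred \foo survives.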
There are other interesting facets in the specification. For
example, an RTF can have non-Unicode and Unicode-based alternatives
of the same content, for selective use depending on the capabilities
of the processor that is consuming the RTF.
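
One form this takes is the \uN control word, which is followed by
fallback text for readers that do not understand Unicode. A minimal
sketch, assuming the default of one fallback character per \uN and
that the fallback is a plain character (the function name render is
hypothetical):

```python
import re

# \uN, optional one-space delimiter, then exactly one fallback character.
UNICODE_ESCAPE = re.compile(r"\\u(-?\d+) ?(.)")


def render(fragment: str, unicode_aware: bool) -> str:
    def pick(match):
        code = int(match.group(1))
        if code < 0:
            code += 65536  # \uN carries a signed 16-bit value
        # A Unicode-aware reader takes the code point and skips the
        # fallback; an older reader ignores \uN and shows the fallback.
        return chr(code) if unicode_aware else match.group(2)

    return UNICODE_ESCAPE.sub(pick, fragment)


fragment = r"Price: 5 \u8364?"
print(render(fragment, unicode_aware=True))
print(render(fragment, unicode_aware=False))
```

The same file therefore reads sensibly in both kinds of consumer,
which is the sort of soft adjustment the specification provides for.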
In addition, there are Word97-2007 shape objects in {\shp ...} and
those also figure significantly in 96.rtf. These also permit an
optional Word 6.0/95 alternative {\shprslt ...} and I see that
those are present as well in 96.rtf.
Finally, although some of OOXML is mapped into RTF (for example,
Office Math markup), other parts of OOXML that are newer than the binary formats
are included as XML and the OOXML specification applies. (The way
XML is embedded in RTF is a bit gnarly - the XML is coded in hex
streams so the RTF parser is not confused.) This may be one reason
it has not been necessary to update the RTF specification for Office
2010.
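
For a forensic tool, recovering such a payload is mostly a matter of
undoing the hex coding. A minimal sketch, assuming the hex digits
have already been pulled out of their RTF group and that they decode
directly to UTF-8 XML (real payloads may carry extra wrapping that
this ignores; decode_hex_stream is a hypothetical name):

```python
def decode_hex_stream(hex_text: str) -> str:
    """Turn a hex-coded stream back into text (illustrative only)."""
    # RTF writers break long hex runs across lines, so drop whitespace first.
    compact = "".join(hex_text.split())
    return bytes.fromhex(compact).decode("utf-8")


embedded = """3c613e68
693c2f613e"""
print(decode_hex_stream(embedded))
```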
There are many other provisions for up- and down-level compatibility
and soft adjustment to the capabilities of a given RTF consumer. It
is rather remarkable, though it depends on the quality of the producer
to include such material and of the consumer to exploit it.