Odt size difference between MS Office 11 and LO 3.3.2

Dear LO users!
I created a new document containing only the text "Test_MSO", saved it as Test_MSO.odt in MS Office 2011, then opened it in LO 3.3.2 on the same PC and saved it as Test_LO.odt. The sizes of the documents are:
Test_MSO.odt 4.22 KB
Test_LO.odt 8.96 KB
Why is the difference so big?
I can imagine that this is because of supplementary info that LO saves. But I've tried the same with bigger documents and the difference is still ~1.5 times or more.
The files are here: http://ifolder.ru/23727937 - Test_MSO
http://ifolder.ru/23727957 - Test_LO

With best regards, Victor

Sent: Tue 24 May 2011, 09:30:34

- - -

Hello Victor, all,
I can't download Test_MSO.odt (apart from the fact that I don't speak Russian... there's no captcha!)
Would you please upload it again?
Thank you!
Namastè! cico :)

Here is the link to an archive with both files: http://www.mediafire.com/?5bpa0ebdqsuc7ce
The old links don't work now because of the amount of non-Russian traffic :)
Do you think this should be posted as a suggestion or as a bug in Bugzilla?
With best regards,
Victor.

The odt files are basically zip files, so you can unzip them to see
the contents.

The LO file has a lot more files in it than the MSO one.

.:
total 28
drwxr-xr-x 5 root root 4096 May 24 21:06 LO
drwxr-xr-x 3 root root 4096 May 24 21:06 MSO
-rwxrwxrwx 1 root root 9183 May 24 20:34 Test_LO.odt
-rwxrwxrwx 1 root root 4331 May 24 20:33 Test_MSO.odt

./LO:
total 52
drwxr-xr-x 11 root root 4096 May 24 21:06 Configurations2
-rw-r--r-- 1 root root 3426 May 24 07:17 content.xml
-rw-r--r-- 1 root root 532 May 24 07:17 manifest.rdf
drwxr-xr-x 2 root root 4096 May 24 21:06 META-INF
-rw-r--r-- 1 root root 1188 May 24 07:17 meta.xml
-rw-r--r-- 1 root root 39 May 24 07:17 mimetype
-rw-r--r-- 1 root root 8589 May 24 07:17 settings.xml
-rw-r--r-- 1 root root 11394 May 24 07:17 styles.xml
drwxr-xr-x 2 root root 4096 May 24 21:06 Thumbnails

./LO/Configurations2:
total 36
drwxr-xr-x 2 root root 4096 May 24 21:06 accelerator
drwxr-xr-x 2 root root 4096 May 24 07:17 floater
drwxr-xr-x 3 root root 4096 May 24 21:06 images
drwxr-xr-x 2 root root 4096 May 24 07:17 menubar
drwxr-xr-x 2 root root 4096 May 24 07:17 popupmenu
drwxr-xr-x 2 root root 4096 May 24 07:17 progressbar
drwxr-xr-x 2 root root 4096 May 24 07:17 statusbar
drwxr-xr-x 2 root root 4096 May 24 07:17 toolbar
drwxr-xr-x 2 root root 4096 May 24 07:17 toolpanel

./LO/Configurations2/accelerator:
total 0
-rw-r--r-- 1 root root 0 May 24 07:17 current.xml

./LO/Configurations2/floater:
total 0

./LO/Configurations2/images:
total 4
drwxr-xr-x 2 root root 4096 May 24 07:17 Bitmaps

./LO/Configurations2/images/Bitmaps:
total 0

./LO/Configurations2/menubar:
total 0

./LO/Configurations2/popupmenu:
total 0

./LO/Configurations2/progressbar:
total 0

./LO/Configurations2/statusbar:
total 0

./LO/Configurations2/toolbar:
total 0

./LO/Configurations2/toolpanel:
total 0

./LO/META-INF:
total 4
-rw-r--r-- 1 root root 2093 May 24 07:17 manifest.xml

./LO/Thumbnails:
total 4
-rw-r--r-- 1 root root 829 May 24 07:17 thumbnail.png

./MSO:
total 28
-rw-r--r-- 1 root root 2939 Jan 1 1980 content.xml
drwxr-xr-x 2 root root 4096 May 24 21:06 META-INF
-rw-r--r-- 1 root root 935 Jan 1 1980 meta.xml
-rw-r--r-- 1 root root 39 Jan 1 1980 mimetype
-rw-r--r-- 1 root root 1385 Jan 1 1980 settings.xml
-rw-r--r-- 1 root root 5765 Jan 1 1980 styles.xml

./MSO/META-INF:
total 4
-rw-r--r-- 1 root root 793 Jan 1 1980 manifest.xml
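Because an .odt is a zip archive, the same inventory can be taken without unzipping anything. Below is a minimal Python sketch using only the standard-library zipfile module; the script name odt_ls.py is hypothetical. It prints each member's uncompressed and compressed size separately, which matters here: the .odt on disk holds the compressed bytes, so the ls sizes of the extracted files add up to much more than the 9183 and 4331 bytes of the two archives.

```python
import sys
import zipfile


def list_entries(source):
    """Return (name, uncompressed_bytes, compressed_bytes) for each member
    of a zip-based package such as an .odt, in archive order."""
    with zipfile.ZipFile(source) as zf:
        return [(info.filename, info.file_size, info.compress_size)
                for info in zf.infolist()]


def print_listing(source):
    rows = list_entries(source)
    for name, size, csize in rows:
        print(f"{size:>8} {csize:>8}  {name}")
    total = sum(csize for _, _, csize in rows)
    print(f"{'':>8} {total:>8}  (sum of compressed member sizes)")


if __name__ == "__main__":
    # e.g. python odt_ls.py Test_MSO.odt Test_LO.odt
    for path in sys.argv[1:]:
        print(path)
        print_listing(path)
```

Directory entries (such as the empty Configurations2 subfolders) show up as members of size 0.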

________________________________
From: Vit <vitruss@gmail.com>
To: users@libreoffice.org
Sent: Tue 24 May 2011, 12:37:45
Subject: Re: [libreoffice-users] Odt size difference between MS Office 11 and LO 3.3.2

- - - -

1 - See the mail by M. D. Setzer II.
2 - The difference is mainly in the settings.xml file (bigger in the LO
file) and in the Thumbnails folder, which doesn't exist in the MSO file.
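To see exactly which members account for the difference, one could compare the two packages member by member. A minimal sketch, again with Python's standard zipfile module; the function names and the script name odt_diff.py are made up for illustration. It ranks members by how many compressed bytes the second package spends on them beyond the first, counting a member absent from one package (such as Thumbnails/thumbnail.png in the MSO file) as 0 bytes there.

```python
import sys
import zipfile


def compressed_sizes(source):
    """Map member name -> compressed size in bytes."""
    with zipfile.ZipFile(source) as zf:
        return {info.filename: info.compress_size for info in zf.infolist()}


def diff_packages(a, b):
    """Rows of (name, size_in_a, size_in_b), largest growth from a to b first.

    A member missing from one package counts as 0 bytes there."""
    sa, sb = compressed_sizes(a), compressed_sizes(b)
    rows = [(name, sa.get(name, 0), sb.get(name, 0))
            for name in sa.keys() | sb.keys()]
    rows.sort(key=lambda row: row[2] - row[1], reverse=True)
    return rows


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python odt_diff.py Test_MSO.odt Test_LO.odt
    for name, in_a, in_b in diff_packages(sys.argv[1], sys.argv[2]):
        print(f"{in_b - in_a:>+8}  {name}")
```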
Namastè! :)

With that small/simple a file, I suppose overhead must be responsible for LO
being the larger file. IMO the proper test would have been to type the
text in both apps rather than using LO to open a Word-created document. It's
probably that process that caused the large difference in file sizes. Also,
the text wasn't identical between the two files.

With normal file sizes, though, LO produces much smaller files than
Word's equivalent. Here, I named the files My Test.doc and My Test.odt:

The Word .doc file is 28k in size.
The LO .odt file is 9k.
That's a pretty substantial difference, in the opposite direction to your
test, and it is what I expected to find, unlike your numbers.
Opening the .doc file with LO and saving it results in an .odt file of 11k,
yet another difference, and one which carries the overhead of the
conversion.

Each filename was the same: "My Test".
Each file's content was the same, "My Test", placed at the very beginning of
each document (28k .doc and 9k .odt).
The .doc opened in LO and saved as an .odt gave yet another number
(11k, less compressibility).
I have to wonder if you didn't reverse the numbers you reported and/or
don't have Word and LO set up the same. You also didn't mention any
versions: I used Word 2k2 and LO 3.3.2 on XP Pro +.

LO, for one thing, saves files in a compressed format, so they would be
expected to be smaller than a .doc file, which is not compressed. They're
just plain .zip files in reality. You can open them with WinZip, in fact.

HTH,

Twayne

The simple explanation is that the MSO file is missing many ODF features ;)

Interestingly, both your files fail ODF validation on this site:

http://tools.odftoolkit.org/odfvalidator/

Does it make any sense that the file is valid only when I create an
identical file under OOo 3.4 Beta?

Should LibreOffice worry about this?

Is the validator an accurate tool? Shouldn't OASIS be providing such a
tool???
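The validator at odftoolkit.org checks documents against the ODF schema, which is far beyond a short script. As a much smaller sketch, a few package-level conventions from the ODF packaging specification can be checked with Python's zipfile: the mimetype member should come first, be stored uncompressed, and hold the document's media type, and META-INF/manifest.xml should be present. The function name is hypothetical, and passing these checks says nothing about whether content.xml or styles.xml are schema-valid.

```python
import zipfile

# Media type an .odt's mimetype member is expected to contain (39 bytes,
# matching the size of both mimetype files in the listing above).
ODT_MIMETYPE = b"application/vnd.oasis.opendocument.text"


def package_problems(source, expected=ODT_MIMETYPE):
    """Return a list of packaging problems found (empty if none).

    Only checks mimetype order/compression/content and manifest presence;
    this is NOT schema validation of the XML inside the package."""
    problems = []
    with zipfile.ZipFile(source) as zf:
        infos = zf.infolist()
        names = {info.filename for info in infos}
        if "mimetype" not in names:
            problems.append("no 'mimetype' member")
        else:
            if infos[0].filename != "mimetype":
                problems.append("'mimetype' is not the first member")
            if zf.getinfo("mimetype").compress_type != zipfile.ZIP_STORED:
                problems.append("'mimetype' is compressed")
            if zf.read("mimetype") != expected:
                problems.append("unexpected mimetype content")
        if "META-INF/manifest.xml" not in names:
            problems.append("no META-INF/manifest.xml")
    return problems


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        print(path, package_problems(path) or "no packaging problems found")
```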

Hi :)
I generally find that the original format of files like this is the smallest.
If you rewrite the document from scratch using exactly the same formatting in
both, then I would guess that LibreOffice generates the best and possibly the
smallest .odt. There are obvious exceptions, such as compressed formats, and
cases like saving a JPEG as a GIF, or re-saving it as another JPEG with an
even higher compression rate.

MSO's ODF is apparently inherently broken (as mentioned fairly often by a
variety of people), so it's not really a fair contest. A jug broken into
millions of pieces can take up a lot less space than a full jug, but it
probably can't hold water. It was a good idea to test it out, though :)
Regards from
Tom :)

Hi Tom ;)

If you re-write the document from scratch using exactly the same formatting
in both, then I would guess that LibreOffice generates the best and possibly
smallest .odt.

I did. It is not :) Since the MS ODF is so incomplete, it manages to be the
smallest. The OOo file is the second smallest (and, in addition, valid!)

My only concern here is that TDF makes sure that the ODF files created with
LibreOffice are valid. It doesn't make sense otherwise.

And since Oracle seems to be dropping the ball on OpenOffice, it would make
sense to have a validation tool on the TDF or LibreOffice site (I wouldn't
wait for OASIS...)

MSO's odf is inherently broken apparently (as mentioned fairly often by a
variety of people) so it's not really a fair contest. A broken jug in
millions of pieces can take up a lot less space than a full jug but it
probably can't hold water.

TBH I'm glad that MS Office 2007/2010 even has ODF support. Of course they
aren't going to make it that good... After all, it is in their best interest
that you use their proprietary file formats :)

Cheers!

...
Back in 2009 I was doing some comparisons between OOo Impress and MS
PowerPoint. The MS PowerPoint .pps was a slideshow with a WAV sound file
& was/is (I still have it) 4.4MB including the 622.5KB WAV file,
buttons, backgrounds, etc. Saving that file as an .odp resulted in a
15.7MB .odp. So I extracted the .pps files[1] and compared the image
files from the .pps with the image files in the .odp.

Comparing one slide in the .pps against the .odt:

.pps:
slide0026_background.jpg = 19.4KB
slide0026_image007.jpg = 41.7KB
Total = 61.1KB

.odp (same slide)
10000000000002E4000002276EE6BF31.png = 467KB
(I assume this to be the 'background' file, as the next file contains
the same image with the text written to the slide)
img3.png = 263.5KB
Total = 730.5KB

Note that OOo converted the 41.7KB JPEG from the .pps to PNG. An ~11.9:1
increase in slide image size is what caused the bulk of the .odp size:
altogether 15.6MB for the .odp PNGs plus one WAV file of 363 bytes.
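A quick way to see where the bytes go in a package like this .odp is to total its members by file extension. A minimal Python sketch follows, assuming a zip-based package (true of .odp; the old binary .pps is not a zip file, so it does not apply there). The function name and the file name slideshow.odp are illustrative only. The sketch also redoes the per-slide arithmetic from the figures above: 730.5KB against 61.1KB is about 11.96:1, in line with the ~11.9:1 quoted.

```python
import os
import sys
import zipfile
from collections import Counter


def bytes_by_extension(source):
    """Total uncompressed member size per lowercase file extension."""
    totals = Counter()
    with zipfile.ZipFile(source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            ext = os.path.splitext(info.filename)[1].lower() or "(none)"
            totals[ext] += info.file_size
    return totals


# Per-slide image totals quoted above, in KB.
pps_kb = 19.4 + 41.7   # two JPEGs extracted from the .pps
odp_kb = 467 + 263.5   # two PNGs in the .odp
print(f"slide image growth: {odp_kb / pps_kb:.2f}:1")

if __name__ == "__main__":
    for path in sys.argv[1:]:  # e.g. the hypothetical slideshow.odp
        for ext, size in bytes_by_extension(path).most_common():
            print(f"{size:>12}  {ext}")
```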

I've not bothered to build a later OOo or LO presentation from those
files, so the information may no longer be valid. But it is clear that
OOo/LO could use some 'fine tuning' in the file size/image
compression/conversion area, at least with respect to Impress. If I get
time, and if it is of interest to anyone, I can build a one-slide Impress
presentation using the extracted .ppt JPEGs and compare. I only have MS
Office 2003 on a virtual machine, so I wouldn't be able to test with
anything later in that department.

[1] http://www.pptfaq.com/FAQ00778.htm

I don't mind doing a comparison if you post a link to the PPS file.

But this is only useful if LO developers consider optimizing the ODF
files a worthwhile task.

Otherwise it's an academic waste of time ;)

...
I don't have a link for it - M. Henri Day sent it to me directly. But
I'll be happy to email it to you if you'd like. Note: I had to do
considerable timing adjustments to get it to sync with the external .wav
file in order to get it to work in OOo with gstreamer.

Gary

Hi Gary

I meant: upload it somewhere and send me the link :)
But email is fine.

I'm a Windows user, so I hope it will be considerably easier for me to
handle the pps ;)

Cheers,
Pedro

I'd rather not put it up on a link, so I'm sending it to you directly. That
said, it plays just fine in PowerPoint, so no issues there at all. What I
meant by getting it to play in OOo w/gstreamer is that I had to sync the
extracted wav to OOo (and to LO, as I've converted to that as well).
OOo/LO doesn't embed the wav file, so you need to have an external wav
file and then spend time syncing the slide captions & transitions to the
wav as each slide progresses... a real PITA. You need to do the same on
Windows; the only difference is that Linux uses gstreamer (what a godsend
compared to the Sun "solution") and Windows uses Media Player or such.

You can take the pps and extract it to view the JPEGs in the
presentation, per the instructions from the original post. Were LO to use
the original JPEG files from the pps (or convert them to PNGs properly),
I suspect that the .odp would be close in size to the original pps.

G.

Correction: the .pps is only 2.8MB; the files extracted from the .pps
total 4.4MB.
...