[Calc, LibO 4.4] Bug on .ODS to .XLSX file conversion?

Hi:

I've experienced the following issue:

The attached file is an .ODS file (1,976 KB). When saved as an .XLSX file it
expands to more than 15 MB.

If the new .XLSX file is opened with MSOffice 2013, it asks to recover
information within the file. If we then save the recovered file to the same
file type (.XLSX), the file shrinks to 2,393 KB.

Why?

OBRA_CIVIL_TERMINACION_ESTACION_Y_MODULO_ENTRADA_EST_21_LINEA_2B_abril_2015.ods
<http://nabble.documentfoundation.org/file/n4147038/OBRA_CIVIL_TERMINACION_ESTACION_Y_MODULO_ENTRADA_EST_21_LINEA_2B_abril_2015.ods>

Hi :slight_smile:
The XlsX, and other OOXML formats are notoriously unreliable. Each
different version of MS Office uses slightly different versions. There are
at least 3 different "transitional" versions of the format and they are not
always compatible with each other.

If you take the XlsX that was saved by MS Office 2013 and try opening it in
another version of MS Office, say 2010 or 2007, or even 365 then you may
well find it's been opened in "Compatibility Mode" or even that it doesn't
work at all in extreme cases. Chances are that some things will have moved
around or vanished, especially if you have anything in a frame (such as
images).

It makes a bit of sense that a file created with newer software might not
fare too well on older versions of the same program but with XlsX and the
other OOXML formats it even seems to happen the other way around. Save a
file in MS Office 2007 and you get the same problems trying to open that
file in more recent versions of MS Office.

MS keep the specs for their formats secret. There is an ISO definition of
OOXML but that doesn't seem to work in any version of MS Office. MSO 2013
was the first that allowed people to save in "strict" but the resultant
files don't seem to be able to be opened in any version of MS Office.

So pretty much all programs struggle with trying to read or write XlsX or
other OOXML files in any reliable way. Which version of MS Office should
they aim for?

Actually we had a few times where the LibreOffice user in an office has
been the only person able to open files from colleagues using different
versions of MS Office and has then become a kinda stepping stone for
sharing files between those colleagues. Similarly with OpenOffice.

The best advice seems to be to use either:
1. the older MS Office format, Xls (without the X on the end), for MS
Office Xp/2000 and prior or
2. stick with Ods.

Microsoft, in their 2010 version, refused to use the ODF 1.2 format that
everyone else was using at the time and stuck with a mangled version of the
older 1.0 (even ignoring the 1.1) and somehow that led to Ods appearing to
open just fine but all the formulas got replaced by fixed values. Their
apparent failure somehow led to people saying that everyone else made bad
spreadsheet programs and everyone should stick with MS.

So it is well worth avoiding XlsX and it is probably best to use their
older Xls format instead - at least if you want to open the spreadsheet on
any machine other than the one with MS Office 2013 on.

Regards from
Tom :slight_smile:

Hi:

Thanks for your attention.

The problem with this file is that it can't be supported by the .XLS format, and my
coworkers don't work with LibO, so saving as .XLSX is the only way I can
"keep in touch" with them and their work. It may be a pain,
but while it works I'll keep using this method to share my work with them!

I use LibO for every document I create, and I even update to every new version of
LibO as a module for my Linux distro: Porteus.

On the other hand, I can understand the situation with MSOffice and how they
refuse the OpenDocument formats, but I still worry about the big file
size for those files not supported by the "classical" file type.

As you said, this can erroneously lead people to believe that everyone
else makes bad spreadsheet programs and everyone should stick with MS.

Thanks.

*Ramón E. Tavárez B.*

Those files are probably write only. :wink:

Hi :slight_smile:
Yeah, I don't know a good way around that :frowning: other than the way you already
use! :))

People used to be quite happy with the idea of downloading Adobe Acrobat to
read Pdfs with. It used to be that almost any website that shared any kind
of document tended to only have them as Pdfs and also then 'had to' have a
button to help people download and install an Adobe Pdf Reader. We hardly
see any of that these days but that is quite a recent change.

There were objections from people working in the council, or other
government offices or in big companies that they couldn't install anything
and thus couldn't use the format that everyone else seemed to be using so
much. Now they seem to use Pdf more than everyone else, and even demand
that other people use it!

Regards from
Tom :slight_smile:

Discussion about open formats aside, I've looked at the file and it looks
like each sheet grows huge when saved as xlsx (some go to ~20 MB before
compression).

Could you also post the xlsx file obtained when saving with MSO, for
comparison? My guess is that either LO saves *a lot* of empty/dummy cells,
or that it saves in an older xlsx format that is less compact.

btw this is only a "by curiosity" request, but it could help the devs
pinpoint an issue if my first guess is correct.

Check the post, I've uploaded the requested file.

Thanks for your interest.

Regards

*Ramón E. Tavárez B.*

That's interesting. Examining the beginning of one of the "big" sheets in
both cases, we see that LibreOffice is *very* verbose in what it saves. If
I understand it correctly, LO saves information about *every* cell,
including empty ones. This is very odd, since it does not do so for
*every* sheet.

This does not explain why MSO has to "repair" the files, though. Maybe the
devs would be interested in this test case.
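
One way to put a number on that verbosity is to count, for a single worksheet part, how many cell elements carry content and how many are empty placeholders. This is a rough Python sketch, assuming the "transitional" SpreadsheetML namespace (a file saved as "strict" uses a different one). The sample XML is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: transitional SpreadsheetML namespace.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def count_cells(sheet_xml):
    """Return (cells with content, empty cells) for one worksheet XML stream."""
    filled = empty = 0
    for _event, elem in ET.iterparse(sheet_xml, events=("end",)):
        if elem.tag == NS + "c":
            # Content shows up as <v> (value), <f> (formula) or <is> (inline string).
            has_content = any(elem.find(NS + tag) is not None for tag in ("v", "f", "is"))
            if has_content:
                filled += 1
            else:
                empty += 1
        elif elem.tag == NS + "row":
            elem.clear()  # release the row's cells; big sheets can be tens of MB
    return filled, empty


SAMPLE = b"""<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<sheetData>
<row r="1"><c r="A1"><v>1</v></c><c r="B1" s="2"/><c r="C1" s="2"/></row>
<row r="2"><c r="A2" t="inlineStr"><is><t>x</t></is></c><c r="B2" s="2"/></row>
</sheetData></worksheet>"""

if __name__ == "__main__":
    filled, empty = count_cells(io.BytesIO(SAMPLE))
    print(f"{filled} cells with content, {empty} empty cells")
```

A sheet where the second number dwarfs the first is a likely candidate for the bloat described above.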

Hi :slight_smile:
Good work there. Thanks for taking it to the next level Cley! Nicely done
:slight_smile:

Posting a bug-report might be a good idea? Ramon can you do that?
Regards from
Tom :slight_smile:

Cley Faye wrote:

> That's interesting. Examining the beginning of one of the "big" sheets in
> both case, we see that LibreOffice is *very* verbose with what it save. If
> I understand it correctly, LO save informations about *every* cells,
> including empty ones. Which is very odd, since it does not do it with
> *every* sheets.

Yes, I also looked at it. I inspected sheet 7 (and it brought my computer to a grinding halt).
It contains all the cells up to column IW, which is 257 columns!
As this sheet has 10423 rows, that is a lot of data.

So the question is: why did it not optimize away the empty cells?
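
To get a feel for the scale: column letters are a base-26 numbering with A=1 through Z=26, which makes IW column 257 (9 × 26 + 23). A small Python sketch of that arithmetic follows. The function name is made up for illustration, and the row count is the one quoted above.

```python
def column_number(letters):
    """Convert spreadsheet column letters (A, Z, AA, IW, ...) to a 1-based index."""
    number = 0
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"not a column letter: {ch!r}")
        # Bijective base 26: A=1 ... Z=26, no zero digit.
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


if __name__ == "__main__":
    columns = column_number("IW")
    rows = 10423  # row count reported for sheet 7
    print(columns)         # 257
    print(columns * rows)  # 2678711 cell entries if every cell is written out
```

Roughly 2.7 million cell elements in one sheet would go a long way towards explaining both the file size and the slow editor.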

I think Piet has pointed to the source of the problem.

It seems that in several sheets there are formatted cells outside the data
range. Sheet 'Análisis Pisos' seems empty but has a lot of empty comments,
so it is better to delete that sheet.

For a first cleaning, to do in every sheet:

- Ctrl+End goes to the bottom-right corner of the range with data in a sheet.
- Use Ctrl+M to clear any direct formatting in the columns to the right of it
and the rows below it.

Miguel Ángel.

Hi Tom:

I wanted to report it as a bug, but first I decided to ask the
community, in case I'm the origin of the "failure".

Now I want to ask you: how exactly can I report it so that it is treated as
a bug by the dev community?

Thanks.

(long live LibO!)

Should "Delete empty rows/columns" be a new command on a per-sheet basis? Per-sheet meaning you see the results of the command before saving and can "undo" it.
I'm not sure it would be a good default unless the command checked formulas to make sure the empty cells are not referenced.

m.a.riosv wrote:

> I think Piet have pointed to the source of the problem.
>
> Seems in several sheet there are cells formatted out data range. Sheet
> 'Análisis Pisos' seems empty but has a lot of empty comments, better delete
> the sheet.
>
> For a first cleaning, to do in every sheet:
>
> - Ctrl+End goes to bottom-right corner of range with data in a sheet.
> - Use Ctrl-M to clear any direct format for the columns right and rows down
> of it.
>
> Miguel Ángel.

I did some more experimenting with Sheet 7 (ANALISIS PISOS IMPORTADOS).

First I selected all columns from M to the end and deleted all cells (everything) in it. That helped.

But then I saw that several rows (3723, 3897, 3972-3984, 4040-4046, 4090, 8375, 8378, 10415-10417, 10238) still had all their cells listed individually. It appears that these complete rows have the font set to "Times New Roman 10pt", whereas the default font is Arial. So I also did a Ctrl-M on these rows and that dramatically decreased the size of Sheet 7.

The file saved by MS Excel has special commands to set the format/style of a row, so that it does not have to specify each individual cell. So that is an optimisation that the exporter misses for some reason. I also tried this with a small file and there it did optimize it.
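
To illustrate why that matters for size, here is a hedged Python sketch comparing the two ways of writing one fully formatted but empty row. SpreadsheetML does let a `row` element carry a style index (`s` together with `customFormat="1"`), but the exact markup either program produced for this file is an assumption here, and the helper names and style index are made up.

```python
def col_name(n):
    """Convert a 1-based column index to letters (1 -> A, 257 -> IW)."""
    name = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        name = chr(ord("A") + rem) + name
    return name


def per_cell_row(row, last_col, style):
    """One <c> element per cell, each repeating the same style index."""
    cells = "".join(
        f'<c r="{col_name(c)}{row}" s="{style}"/>' for c in range(1, last_col + 1)
    )
    return f'<row r="{row}">{cells}</row>'


def row_level(row, style):
    """The style attached once to the row itself, with no cell elements."""
    return f'<row r="{row}" s="{style}" customFormat="1"/>'


if __name__ == "__main__":
    verbose = per_cell_row(3723, 257, 5)  # hypothetical style index 5
    compact = row_level(3723, 5)
    print(len(verbose), "bytes per cell vs", len(compact), "bytes per row")
```

Multiplied over a few dozen such rows the difference is modest, but over thousands of formatted rows it adds up to megabytes.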

I guess if you do this for all the sheets the file may become much smaller.
That still leaves improvement possibilities in the exporter.

One possibility comes to mind as to why "empty" cells are saved. If formatting has been applied to the empty cells, then they must be saved even if no values are currently stored in those cells. As a trivial example, consider the four-cell range A1:B2, which contains no values but whose four cells are formatted to display numbers as currency. How could that formatting be saved without saving those "empty" cells?