Large spreadsheet does not open in LibreOffice.

Good afternoon!

I have a spreadsheet in '.xlsx' format, around 150 MB in size and containing roughly
400,000 rows.
When I try to open it in LibreOffice, the program shows a gray screen with
the progress bar; the bar reaches a certain point, stops advancing, and
the program simply crashes.

Is there any way to open this spreadsheet in LibreOffice?
LibreOffice version: 5.0.3.2

Thank you!

I have never seen a 150 MB functional spreadsheet before. I'd like to see
the file, because I'm relatively sure that a database is what is
needed (especially if there are a ton of functions in it and nested IFs).

I would recommend trying to open the file and leaving the computer running
overnight. The issue is that LibreOffice has to use its xlsx import filters
to open the file, and this is going to be slow when you're talking about
150 MB. Open the file overnight, save it as OpenDocument Format (.ods), and
I suspect it will be quite a bit faster.
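If the GUI keeps crashing, the same xlsx-to-ods conversion can also be attempted from the command line, which skips rendering the sheet on screen entirely (the file path and output directory below are placeholders, adjust them to your system):

```shell
# Convert without opening the GUI; writes bigfile.ods into /tmp.
soffice --headless --convert-to ods --outdir /tmp /path/to/bigfile.xlsx
```

Note that LibreOffice must not already be running when you invoke soffice with --headless, or the command may silently do nothing.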

But, again, I bet that file needs to be a database and not a spreadsheet...

Best,
Joel

Hi :slight_smile:
It does seem like an unusually huge file size. 150 KB would be more normal!

There is a "memory" setting that might be useful to increase quite
dramatically, from the "Tools" menu:

Tools - Options - Memory

You can see the settings are mostly all very low so that people can
easily use LibreOffice on very low-spec machines. Modern desktops and
laptops have plenty of RAM these days, and I'm guessing you are not
using a tablet or phone, so you can radically increase most of those
values. I tried bumping the "used by LibreOffice" one up from around
10-20 MB to about 200 MB. I think the maximum value is 256 MB, because
when I tried higher values they got reduced to 256 MB. I usually
reduce the number of undo steps, because 100 usually takes me to
way before my last backup.

If the file really is in MB rather than just KB, is that because it contains
lots of images and art?
Regards from
Tom :slight_smile:

I have never seen a 150 mb functional spreadsheet before.

That size is not uncommon in environments where Excel is the only office
tool that the employees know how to use.

I'd like to see the file because I'm relatively sure that a database

is what is needed (especially if there are a ton of functions in it and
nested ifs)

People have been abusing spreadsheets as databases since at least 1992
when the first database program that masqueraded as a spreadsheet was
commercially released.

On second thought, there was a spreadsheet program for the Apple ][
that was nothing more than a specialized database program. AFAIK, it
was only released as source code (Apple Pascal).

jonathon

Trust me...I know. I worked somewhere where we had a 750 meg excel file
that crashed multiple times a day and was entirely useless. When I
suggested that someone (meaning I) take the time to do it right....IT
pushed back heavily and was afraid that I'd break something (as if
things weren't already horribly broken....).

I recommend that most people who see their spreadsheets creeping into the
tens to hundreds of megabytes start looking at how to structure databases and
do things right, instead of horrible hacks that tend to have lots of
issues - despite many people trying, a spreadsheet is not a database.

Best,
Joel

Trust me...I know. I worked somewhere where we had a 750 meg excel file
that crashed multiple times a day and was entirely useless. When I
suggested that someone (meaning I) take the time to do it right....IT
pushed back heavily and was afraid that I'd break something (as if
things weren't already horribly broken....).

I recommend that most people who see their spreadsheets creeping into the
tens to hundreds of megabytes start looking at how to structure databases and
do things right, instead of horrible hacks that tend to have lots of
issues - despite many people trying, a spreadsheet is not a database.

[Joel, although I agree that my 7 MB spreadsheet has gotten unwieldy, I have no clue how to design and implement a database to replace the mess. I suspect that most users who have these oversized, abused spreadsheets have the same problem. Businesses should be able to hire a database designer, but as a retired individual I don't have the financial resources to even get training in database design. So I appear to be stuck. I won't go into any of my spreadsheet's details here. Suffice it to say I recognize that it has become a kludge.

At 7,145 KB after about 45 weeks accumulation of data I can see the possibility that the .ods file could expand past 8MB for the full year. Then it will start over and I'll need to figure out how to carry over some values computed from this year's data to the new spreadsheet. -- jl]

Best,
Joel

Can you change that for all users?

Ta

Mal

Sometimes people select the whole sheet to apply direct formatting. Clearing
direct formatting, at least outside the data range, could help a lot, because
it seems LibreOffice doesn't like it very much.
Miguel Ángel.

Hi :slight_smile:
I think so. I have no idea how to do that though.

Which OS are you using? Is it a version of Windows? XP, Win7 or something?
Regards from
tom :slight_smile:

I have no clue how to design and implement a database to replace the mess.

One of the good things about Microsoft Access was that it made it easy for
people who knew nothing about databases to create their own database.
I'll grant that, from a theoretical point of view, the overwhelming
majority of those databases are non-optimal. For the typical
SOHO/individual user database, that is completely irrelevant.

Where LibO falls down is that database creation is neither intuitive
nor well explained.

Your basic relational database record has four fields:
* Record number;
* A value;
* What that value represents;
* What that value is associated with;

The record number is to ensure that every record has at least one unique
field. This way, data can be edited, deleted, or added, as required. (If
the database software one uses does not mandate/require that at least
one field in each record be unique, then the database software is trash,
and its usage should be permanently barred, ASAP.)

The value is the piece of information in the record that is important:
the client name, or the client address, for example.

What the value represents tells the user what that piece of information
is. It isn't much use having a "value" of 48.6,122.25 if one
has no idea what that value is supposed to represent.

What the value is associated with indicates which other records share
common parameters with this specific record. For example, "48.6,
122.25" is part of my address. Other records would contain the street
number, street name, city name, postal code, and country.

What makes database construction awkward, is in knowing which records
belong together. To make that task easier, instead of having a
four-field record, one has a record with umpteen fields.

I.e., an address database record:
* Record number;
* Name;
* Street Number;
* Street Name;
* Street Extension;
* City;
* Postal Code;
* Country;
instead of seven records, each with only one part of the address, which is
what relational database theory suggests is optimal.
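As a minimal sketch of that multi-field address record, here is what it might look like using Python's built-in sqlite3 module (the table and column names are my own illustration, not anything from the thread):

```python
import sqlite3

# In-memory database for illustration; pass a filename to persist it.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE address (
        record_number INTEGER PRIMARY KEY,  -- guarantees one unique field per record
        name          TEXT,
        street_number TEXT,
        street_name   TEXT,
        city          TEXT,
        postal_code   TEXT,
        country       TEXT
    )
""")
conn.execute(
    "INSERT INTO address (name, street_number, street_name, city, postal_code, country) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("A. Example", "48", "Main Street", "Springfield", "12345", "US"),
)
# Because record_number is unique, any row can be edited or
# deleted unambiguously, which is the point made above.
row = conn.execute("SELECT record_number, city FROM address").fetchone()
print(row)  # (1, 'Springfield')
```

The INTEGER PRIMARY KEY column plays the role of the mandatory unique "record number" field described above; SQLite assigns it automatically if you don't supply one.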

I suspect that most users who have these oversized, abused,

spreadsheets have the same problem.

Yes, and no.

_Lotus 1-2-3_ literally revolutionized how data could be understood and
analysed. Things that formerly took months, if not years, to determine
could now be done in weeks. The major hold-up was being able to use the
corporate database. What the more daring users did was simply add the
data that would have gone into the corporate database onto a quick-and-
dirty spreadsheet. These spreadsheets were then spread around the
office, and quickly became mission-critical monstrosities, entirely
unsupported by IT.

Circa 2000, one organization that specialized in corporate software
development estimated that it would take five years for the typical
corporate mission-critical spreadsheet that had become a monstrosity, to
be properly developed by the corporate IT department. For most
organizations, that was an unacceptable length of time to wait for a
replacement, especially since the odds were that the spreadsheet had
originated less than five years earlier.

Slowly, spreadsheet developers incorporated quasi-database functionality
into their spreadsheets, to the point that spreadsheets that lacked
database functionality, were considered to be "crippled" by
"power-users".

The end result is that one can currently find books extolling the
usage of spreadsheet software as databases, and, worse luck, teaching
users how to abuse spreadsheet software as a database program.
(I've forgotten the name of the book on using Excel, but it was a fairly
detailed explanation of how to use Excel as a database. Needless to
say, it would have been much easier, and safer (^1), for the user to
have done everything described in that book, by using R, or even Python,
and SQLite, MariaDB, or PostgreSQL.)

All that said, some database developers use spreadsheets to determine
what fields are needed, and what the database flow should allow and
what it should disallow. IOW, the spreadsheet becomes a spec sheet for
the database, and attendant items, that they deliver.

So I appear to be stuck. I won't go into any of my spreadsheet's

details here.

This is where a dozen or more database templates, with comprehensive
instructions on how to use them, and modify them, would be extremely useful.

Then it will start over and I'll need to figure out how to carry over

some values computed from this year's data to the new spreadsheet.

In an ideal world, one of those database templates would work as a
replacement for your spreadsheet.

^1: More than one researcher has argued that The Human Genome Project
was a complete waste of money, time, and effort, because of the numerous
_known_ errors inflicted upon the project by the use of Excel, and its
silent translation of data to something other than what the user wrote.
FWIW, Calc does the same type of silent translation. Had that project
used a real database program, those errors would not have happened.
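For instance (my own illustration, not from the thread): gene symbols such as SEPT2 or MARCH1, which spreadsheets are known to silently reinterpret as dates, come back from a real database exactly as they were stored. A quick check with Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (symbol TEXT)")
# SEPT2 and MARCH1 are gene symbols that spreadsheet auto-typing
# tends to silently convert to dates (Sept-2, March-1).
conn.executemany("INSERT INTO genes VALUES (?)", [("SEPT2",), ("MARCH1",)])
symbols = [r[0] for r in conn.execute("SELECT symbol FROM genes")]
print(symbols)  # ['SEPT2', 'MARCH1']
```

A TEXT column has no "helpful" conversion step, so whatever string goes in is the string that comes out.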

jonathon

^1: More than one researcher has argued that The Human Genome Project
was a complete waste of money, time, and effort, because of the numerous
_known_ errors inflicted upon the project by the use of Excel, and its
silent translation of data to something other than what the user wrote.

It would be interesting to have some references for this claim

Cheers
Harvey

^1: More than one researcher has argued that The Human Genome Project

was a complete waste of money, time, and effort,

It would be interesting to have some references for this claim

Which claim:
* That researchers called the Human Genome Project a waste of money,
time, etc;
* That Excel destroys data by silent transformation;
* Something else;

Regardless, here are some pages that discuss known errors in using
spreadsheets:
* http://panko.shidler.hawaii.edu/SSR/Mypapers/whatknow.htm;
* http://www.eusprig.org/horror-stories.htm;
* http://www.eusprig.org/stories.htm

Sorry for my imprecision :-(. I meant references for the negative
impact of Excel on the Human Genome Project. Just because Excel was
used, it may well have affected the project negatively, but
there should have been data quality checks in place as well, which
might have compensated for that.

Cheers
Harvey

Sorry for my imprecision :-(. I meant references for the negative
impact of Excel on the Human Genome Project. Just because Excel had been

Neither DDG nor Google are providing me with appropriate URLs for the
research articles that showed where that project was negatively
impacted. (The same search terms might show somebody else the appropriate
articles.)

There were discussions about this on /., K5, and Reddit, but those
discussions aren't coming up, when I search those sites. :frowning:

The copies of the papers I archived were destroyed, when a friend of my
housemate borrowed (without permission) my archival hard drive, and
allowed Windows to "fix the errors", which it (Windows) did, by
reformatting the hard drive.

there should have been data quality checks in place as well, which

It was the discrepancies between the various quality checks that first
indicated there might be problems. It took a very long time for the
primary researchers to admit that the amount of destroyed data was
unknown, and unknowable.

jonathon

I have tested various memory configurations, yet it still does
not work.

Adriano Moreira

Att,
Adriano Martins Moreira
Cel.: 82 8177-4885 (VIVO)
Cel.: 82 9945-8803 (TIM)
e-mail: ammorei@gmail.com

Unfortunately there is no way for me to share the spreadsheet, as it contains
confidential information.
I have also already tried creating a Base database, and that still did not work.

Adriano Moreira


Without the spreadsheet there is nothing we can do for you. Sorry. I
suggest converting it to open standard (ods) and seeing if that helps.
Otherwise, there's really not much more we can do without seeing the sheet.

Best,
Joel

Hi.
A couple of suggestions to try:
Can you save it to .ods (as Joel suggested) from Office?
Can you save it as .xls instead of .xlsx and see if it opens?
Can you save it to Google Docs and then download it again as .ods?
Steve