Intentionally crashing LibreOffice when frozen/LibreOffice will not start.

I am working with a data set that keeps causing my LibreOffice to freeze. I
am pretty sure that this is only because it is big. It is a pipe-separated
text from the US Economic Census imported into Calc, about 30 columns and
around a million rows. (The actual data set is bigger, but Calc quits at a
million-odd. The complete file is about 0.8 gig.) I suspect but can not
prove that this is related to file handling somehow, e.g. breaking down
during auto-saving. The first time I saved the data as a Calc file it took
nearly an hour with the "soffice.bin *32" process running at 25 percent of
CPU time and using about 825 meg of memory the entire time. (Not sure why
this is showing up as a 32-bit version). And when Calc freezes, all the
LibreOffice programs freeze. So I can't just switch to another file and
noodle away while waiting.

So of course, what I really want is a spreadsheet that reads more rows and
doesn't crash. But I assume others are already asking for that, and I do not
know how to add anything helpful to that effort. So instead I am asking a
more modest question: What is the best way to crash a frozen LibreOffice if
I want to minimize the data and files lost?

Note that this is the question I started with. Now I have a new and more
urgent question: Why won't LibreOffice start at all? See below.

When Calc froze, I closed it in the usual way with the usual upper-right
close window button. This results in the message:

"The file '$(ARG1)' is corrupt and therefore cannot be opened. LibreOffice
can try to repair the file.

LO can try to repair the file. The corruption could be the result of
document manipulation or of structural document damage due to data
transmission. We recommend that you do not trust the content of the
repaired document. Execution of macros is disabled for this document. Should
LO repair the file?"

I chose <Yes> and got the message:

"The file '$(ARG1)' could not be repaired and therefore cannot be opened."

The document recovery window said "recovery of your documents was finished"
while the big spreadsheet still said "recovery in process" and a Writer file
said "not recovered yet". Three Writer files said "recovery successful".
Because no LibreOffice process seemed to be using much CPU time, I assumed
that it was again frozen rather than still working (though "soffice.bin *32"
still had 1.14 gig in memory), and hit the "Finish" button.

I got a message, "Fatal Error - Bad Allocation" and nothing was recovered.

I tried opening Writer, and the document recovery box popped up again, but
with one fewer Writer file listed. This time it started on the spreadsheet,
never finished (though the window again said "recovery of your documents was
finished"), and never got to the remaining files, though I let it run all
night. In the morning I hit "Finish", and the recovery window disappeared
with nothing taking its place.

I tried to start Writer again. Nothing happened - no error, no window,
nothing. I tried opening a Writer document by clicking on it in Windows
Explorer. Nothing. I tried opening a (different) Calc file. Still nothing.

Windows Task Manager shows five instances of soffice.bin *32 running, five
instances of soffice.exe *32, and four instances of swriter.exe *32.

I am running LibreOffice 4.2 on Windows 7 64-bit with SP1, on a Dell i5
machine with 24 gig of RAM. I do not recall there being a separate 64-bit
version, but the installer put it in my 64-bit programs directory.

Advice and suggestions most welcome.

andrewH wrote

I am working with a data set that keeps causing my LibreOffice to freeze.
I am pretty sure that this is only because it is big. It is a
pipe-separated text from the US Economic Census imported into Calc, about
30 columns and around a million rows. (The actual data set is bigger, but
Calc quits at a million-odd. The complete file is about 0.8 gig.) I
suspect but can not prove that this is related to file handling somehow,
e.g. breaking down during auto-saving.

[...]

Because no LibreOffice process seemed to be using much CPU time, I assumed
that it was again frozen rather than still working (though "soffice.bin
*32" still had 1.14 gig in memory) and hit the "finish" button.

[...]

I am running LibreOffice 4.2 on Windows 7 64-bit with SP1, on a Dell i5
machine with 24 gig of RAM. I do not recall there being a separate 64-bit
version, but the installer put it in my 64-bit programs directory.

There is only a 32-bit version of LO for Windows, so that is not an issue.
It sounds to me as though the input filter is choking on the amount of data
to be read. The RAM usage figures are smaller than I would expect for that
amount of file data, although if the file is uncompressed then it is less
surprising; cf. my testing in this
<http://ask.libreoffice.org/en/question/29766/what-is-the-filesize-limit-on-a-calc-file/?answer=29841#post-id-29841>
AskLO answer. I have pushed Calc v4.2 even higher on occasion. If it is a
public dataset, can you post a link for others to test?
Best wishes, Owen.

It looks to me as though Calc cannot handle more than 1048576 rows of data. Do you have more rows than that? If so, then I think that you cannot open the whole file.

If you have fewer than that, and you think that you are simply running out of memory... if you can figure out how to get the data to me, I can run a test on a 64-bit version (running on Linux). My machine has 32 GB of RAM.
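
If it helps to check before fighting the import filter any further, a rough line count will tell you. This is only a sketch in base R, assuming the file is plain text; "census.txt" is a placeholder for your real path:

    ## Count the lines of a big text file in chunks, without loading it all.
    con <- file("census.txt", open = "r")
    n <- 0L
    repeat {
      chunk <- readLines(con, n = 100000L)
      if (length(chunk) == 0L) break
      n <- n + length(chunk)
    }
    close(con)
    n             # total lines, including any header row
    n > 1048576   # TRUE means it cannot all fit in one Calc sheet

Anything much over that limit and only a piece of the file can ever appear in Calc.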

Hi :)
Try Gnumeric:
http://www.gnumeric.org/download.html

It's a dedicated spreadsheet program with a tiny footprint that uses
minimal resources, so it's faster, lighter and more robust than Excel
or Calc. Many people find Gnumeric to be better than Calc or Excel
for serious or hefty spreadsheets and/or for handling many more
spreadsheets in a shorter time-frame.

It can be installed alongside LibreOffice and/or MS Office. It uses
the same format as LibreOffice natively so most spreadsheets can be
bounced between the 2 programs quite happily.

Part of the advantage of LibreOffice is that it fits well into a wider
eco-system and co-operates well with a wider range of programs and
suites allowing you to tailor individual machines to specific
use-cases and yet still retain the ability to share files between
different machines and different people using different OSes and
programs.

Regards from
Tom :)

Hi Helen,

If you are using Linux it should be in the repos, so install from there.

If Windows: I went to gnumeric.org. Underneath "Welcome" it shows Gnumeric 1.12.13, and I clicked on the "get it from here" link.

On the next page I selected the link under Windows, saved it to my computer, etc.

Not sure what to do if you are a Mac user :(

HTH

Tim

Hi :)
Yeah, from this page
http://www.gnumeric.org/download.html
click the blue link in the paragraph about the Windows version. That
should start downloading the exe file.

If you use Linux or BSD then the best way is to use one of your "package
managers" (such as the Synaptic package manager) to install the version in
your repositories.
Regards from
Tom :)

Thanks for the offer, Andrew!

The original file is definitely bigger than the allowable limit. However,
when I originally tried to import it, Calc opened up the first third of the
file and allowed me to save it and reopen it. But when I tried to save it a
second time, it crashed and the file was corrupted.

I am pretty sure the problem is not that I am running out of memory. I have
24 gig on a 64-bit machine. I think the program is just buggy when you try
to run it at its absolute outside capacity limit.

So I have figured out how to open my file in R and save subsets small
enough to open more reliably.

I wish the program checked the file size and offered a sensible list of
options when the file is too big, like opening the first N rows, opening n
rows after skipping m rows, or taking a p percent sample of rows. But
perhaps that is too much to hope for in a spreadsheet.
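
In case it is useful to anyone else hitting the same wall, here is a rough sketch of that kind of subsetting in base R. The file name, row counts, and percentage are placeholders, and the quote/na settings may need adjusting for the real Census export:

    ## Read the header once so the column names can be reused below.
    hdr <- read.table("census.txt", sep = "|", header = TRUE, nrows = 1,
                      quote = "", comment.char = "")

    ## First N rows (here 1,000,000 -- just under Calc's row limit).
    first_n <- read.table("census.txt", sep = "|", header = TRUE,
                          nrows = 1000000, quote = "", comment.char = "")

    ## n rows after skipping m data rows (skip also has to jump the header line).
    middle <- read.table("census.txt", sep = "|", header = FALSE,
                         col.names = names(hdr), skip = 1 + 1000000,
                         nrows = 500000, quote = "", comment.char = "")

    ## Roughly a p percent sample of a chunk already in memory.
    p <- 5
    sampled <- first_n[sample(nrow(first_n),
                              size = floor(nrow(first_n) * p / 100)), ]

    ## Write a slice back out as pipe-separated text small enough for Calc.
    write.table(sampled, "census_sample.txt", sep = "|",
                row.names = FALSE, quote = FALSE)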

And thanks again for your thought and attention.

Warmest regards, Andrew

Thanks Tom!

Gnumeric is a great product and I have used it before. I was hoping to use
Calc in this case because I am trying to learn to use the LO database, Base,
as a stand-alone or as a front end for PostgreSQL. But I find it very hard
to define, use, or even just import a file into a database if I cannot first
look at it and determine how missing variables are coded, which fields are
character and which are numeric, etc. Some census products are really good
at giving users this kind of metadata, others not so much. (The Economic
Census metadata is hard enough to read and understand (would you put
material intended to explain something to the public in pipe-delimited
text?) that I have written to them asking for meta-metadata.) So I wanted to
use Calc for exploration and Base to do the heavy lifting, in the hope that
things might be easier if I stayed within one document family. Now I am
thinking that some other approach will be easier.
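
For what it is worth, the kind of quick look I have in mind is easy to sketch in R (the path is a placeholder, and the blank-string check is only a guess at how missing values might be coded):

    ## Peek at a small slice to see how fields are typed and how missings look.
    peek <- read.table("census.txt", sep = "|", header = TRUE, nrows = 5000,
                       quote = "", comment.char = "", stringsAsFactors = FALSE)
    str(peek)                                           # column names and inferred types
    sapply(peek, function(x) sum(is.na(x) | x == ""))   # crude count of blanks/NAs per column

A few thousand rows is usually enough to see which fields are character and which are numeric, though it proves nothing about the millions of rows not read.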

Warmest regards, Andrew

Hi :)
Ahh, Base seems to be better when the data is held externally. PostgreSQL
should be an excellent choice for holding the data, and then you 'just' get
Base to connect to it.

I would have thought Calc would then be really good at displaying the
data?! So I'm not sure what the problem is now. Hopefully someone better
with databases might be more helpful! Errr, most people on this mailing
list seem to be better with databases than me!
Good luck and regards from
Tom :)

Dear Tom -- Thanks for your attention to my problem.
I believe the problem with using Calc as a viewer is that I have 3.6
million lines of data.
But I am getting more comfortable with using R itself to answer these
questions. I think part of my problem is that it is the nature of legal
training to try to anticipate every possible problem. But when the number
of possible problems exceeds a certain level, my brain runs out of working
memory even if my computer still has RAM. Also, I set ambitious goals for
allowing people to run my software on ordinary laptops a year or three old.
I have to say that since I have scaled those back and got 24 gig of RAM
instead, many problems seem more manageable, even without a database.

Peace, Andrew

Hi :)
I think it's the same with programmer training. Inevitably there are too
many variables or unknown factors that rarely occur.

One methodology to handle it is to "release early and release often" to get
"more eyeballs on the code". Even if people can't see the code itself or
don't bother to look it still helps to have a real-world usable tool, even
if it can only work under limited conditions. It being in use helps
prioritise issues that really need to be dealt with. It also helps attract
attention from other people and some of those might be able to help. Some
might only point out problems that hadn't been noticed before and that
might lead to a better understanding of some other problem.

Doing the first draft or building the first prototype is really tough, but
once it's done it really helps you figure out what actually needed to be
done. Trying to be too perfect the first time is rarely useful.
Regards from
Tom :)

"But I am getting more comfortable with using R itself to answer these..."

Excuse my ignorance - What is R?

This particular thread has provided amazing details on the limitations of
LibreOffice Calc, tips, hints, and workarounds, except that I do not know
what R is. I apologize in advance for not offering a solution. Jackie

Jacqueline Tarleton wrote

"But I am getting more comfortable with using R itself to answer these..."

Excuse my ignorance - What is R?

This particular thread has provided amazing details on the limitations of
LibreOffice Calc, tips, hints, and workarounds, except that I do not know
what R is.

You'll find what you seek here...

http://en.wikipedia.org/wiki/R_(programming_language)
http://www.r-project.org/

Stuart

Hi :)
I assumed it is a 3rd-party database program that contains the tables of
data. Base is usually excellent at linking to such external databases. It
seems to be the best way of using Base and the default way of using it.

The idea is that Base then performs various functions through SQL code: to
sort, filter, concatenate, link and all the rest. All this is done kinda
invisibly, or with a view that most normal users couldn't really cope
with. Like watching a play from back-stage.

Then Calc or Writer get used to produce reports or forms or letters in a
user-friendly way. The reports don't contain the data, and neither does
Base; they are just like looking through different windows into the database
tables that are held externally.

So, I don't know what R is either. I get the impression that if it is a
3rd-party database program it's something hideously mangled, as if produced
by a micro-managing, non-expert committee.

Regards from
Tom :)

Is there a keyboard shortcut that will let me move from cell to cell in a table in Writer? I can not find any mention of one in the help section, and so far, the only way I know to move the cursor to the next cell is to use the mouse.
Ruth Ann, Cincinnati, OH USA

With portable-LO, 4.0.6.2, /Curtains7/, 64b, the four arrow keys move one cell up, down, right, or left, at least for me.
Seems absurdly simple, so maybe there's more to it.

trj

Do the (four) keyboard arrow keys not work for you? You should also find that Tab moves forward in the table and Shift+Tab similarly backwards. For this reason, if you need to insert a Tab character in a table cell, you need to use Ctrl+Tab instead of simple Tab.

I trust this helps.

Brian Barker

My apologies. I had asked about tables, and indeed, your information was very useful. Thank you.

However, what I should have asked about was how to navigate on a page of labels.
Sorry, I think when I look at the page, it looks like a table to me, so I think of it that way. But in reality, it is a page of labels that I am trying to navigate.
The arrow keys only move me around within each label, and the tab key just inserts a tab into the text of the label.
Ruth Ann

http://en.wikipedia.org/wiki/R_(programming_language)

R is a free software programming language and software environment for statistical computing and graphics.

Calc has a limit of one million rows (well, 1048576, i.e. 2^20, if I remember correctly).

I expect that you will not have this problem if you manipulate the data using R. I don't remember off hand how large my data sets were when I last used R, but I never had a problem apart from figuring out how to accomplish what I wanted.
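
As a sketch only, assuming the data.table package is installed and "census.txt" stands in for the real file, reading the whole pipe-separated file in R could look like this:

    ## Read every row of the pipe-separated file; Calc's 1048576-row cap does not apply.
    library(data.table)
    dt <- fread("census.txt", sep = "|")
    dim(dt)   # rows and columns actually read

From there you can filter, aggregate, or write out smaller pieces for Calc or for loading into PostgreSQL.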

If you have trouble when Calc performs an auto-save, then disable auto-save (in LibreOffice that is the "Save AutoRecovery information" setting under Tools > Options > Load/Save > General).

Gnumeric used to have a row limit of 65536 rows, but that was removed. I have no idea if they simply set a new maximum or if they removed the limit in some other way.