Overly aggressive autofomatting issues in LibreOffice

​This *Wapo* article (
https://www.washingtonpost.com/news/wonk/wp/2016/08/26/an-alarming-number-of-scientific-papers-contain-excel-errors/)​
- while dealing more specifically with Microsoft's Excel, should, as noted
in the next-to-last paragraph, give even LibreOffice developers pause. If
the claim that LibreOffice behaves like Microsoft Office in this particular
respect is indeed true, then I submit that some attention should be devoted
to making turning off autoformatting in Calc easier for users....

Henri

You have to register to read the article.

I didn't have to.

Nor did I.

I tried - and a popup appers to write down my mailadress and register.
Couldn't read the text in background.

Regards

Robert

I tried - and a popup appers to write down my mailadress and register.
Couldn't read the text in background.

Regards

Robert

This is what I saw as well.

Surely some of the first things to understand about spreadsheets are that
o Each cell is formatted in some way,
o Data entered into cells may automatically be edited, and
o In the absence of explicit cell formatting by the user, entering data may also (invisibly) set the cell format.

If you drive your car into a lamppost or a brick wall, do you call this a vehicle manufacturing error or admit that not having learned to use the brake pedal puts the blame on you? Any spreadsheet user entering "MARCH1" and seeing it instantly converted to 01/03/01 (or perhaps 03/01/01) who doesn't immediately wake up to the problem with their own skills doesn't deserve the label "researcher". (Incidentally, I think LibreOffice will not interfere with "SEPT2" but would with "SEP2".)

Anyone creating a spreadsheet containing gene (or other) names needs to format the appropriate columns as Text before entering data. If they don't carry this out, they should expect confusing results.

It is certainly true that spreadsheets are a particularly fragile method of handing data and need to be used with care. Their apparent ease of use is deceptive: users do indeed need to be aware of proper techniques before being let loose on an application. They should also be aware of the consequences of published spreadsheets being moved between locales and to system with different settings - for example, date origins. It's not clear to me whether by "supplemental files" the author means spreadsheet documents themselves or just material originally entered into spreadsheets. Doesn't the fragility mean that spreadsheet documents are unsuitable for publication in this fashion and that researchers should know to fossilize material, perhaps as PDF, for publication? Any reader who wants to develop the material can contact the author for a copy of the original spreadsheet document; it is then up to them to be aware of the portability issues.

I took my new umbrella out with me the other day but it failed to protect me from the rain. I complained about this failing to the manufacturer, who suggested that just taking it with me was not sufficient and that I needed to unfurl it and hold it above my head. Worse than that, they had the effrontery to suggest that I should have known this!

Brian Barker

​The point, as I understood it, Brian, was that a means to automatically
turn off the autoformatting feature in the spreadsheets in MS Office,
LibreOffice, and Apache OpenOffice, which could be useful to many users,
particularly those writing scientific papers in which the confusion
described in the article, is lacking. Note also that Google Sheets seems to
be able to provide such a means....

To imply, as you seem to do above, that users are simply careless in their
use of these tools is, to my mind, both false and ungenerous...

Henri​

As a research scientist of over 28 years, I do NOT believe the problem is with the spread sheet software but merely due to poor proof reading of the author and (presumably papers are sent for independent review before publication) by the independent reviewer. If this problem has been known for 10 years then even more shame on the authors and reviewers. Spreadsheets are meant for mathematical use (hence Open Office Calc). If you use a program for something other than it primary purpose the you need to take steps to ensure the appropriate steps to protect your data integrity.

Regards

Peter

Please, have a look here :
https://bugs.documentfoundation.org/show_bug.cgi?id=101696

Best regards.
JBF

--
Registered Linux User: #480675
Registered Linux Machine: #408606
Linux since June 2005

On Fri, Aug 26, 2016 at 10:58 AM, M Henri Day <mhenriday@gmail.com> wrote:

2016-08-26 17:31 GMT+02:00 Wade Smart <wadesmart@gmail.com>:

> I tried - and a popup appers to write down my mailadress and register.
> Couldn't read the text in background.
>
> Regards
>
> Robert

This is what I saw as well.

We seem to be discussing an entirely different issue from that which I
originally proposed. I hope Cristopher Ingraham will regard it as fair use
if I reproduce the content of the article below
:

«A surprisingly high number of scientific papers in the field of genetics
contain errors introduced by Microsoft Excel, according to an analysis
recently published in the journal Genome Biology.

A team of Australian researchers analyzed nearly 3,600 genetics papers
published in a number of leading scientific journals — like Nature, Science
and PLoS One. As is common practice in the field, these papers all came with
supplementary files containing lists of genes used in the research.

The Australian researchers found that roughly 1 in 5 of these papers
included errors in their gene lists that were due to Excel automatically
converting gene names to things like calendar dates or random numbers.

You see, genes are often referred to in scientific literature by symbols —
essentially shortened versions of full gene names. The gene "Septin 2" is
typically shortened as SEPT2. "Membrane-Associated Ring Finger (C3HC4) 1, E3
Ubiquitin Protein Ligase" gets mercifully shortened to MARCH1

Even worse, there's no easy way to undo this automatic formatting once it
has happened. Edit -> Undo simply deletes everything in the cell. You can
try to convert the formatting from "General," the default, to "Text," which
you might expect to change it back to the original characters you enter. But
instead, changing the formatting to "Text" makes the cell contents appear as
42615 — Excel's internal numeric code referring to the date 9/2/2016.

Even more troubling, the researchers note that there's no way to permanently
disable automatic date formatting within Excel. Researchers still have to
remember to manually format columns to "Text" before you type anything in
new Excel sheets — every. single. time.
But even the genetics researchers among us are only human, and they
sometimes forget to do this. Hence, you end up with 20 percent of these
genetics papers containing preventable errors introduced by Excel.

The Australian researchers note that this problem was first identified in a
paper published more than a decade ago. "Nevertheless, we find that these
errors continue to pervade supplementary files in the scientific
literature," they write.

Genetics isn't the only field where a life's work can potentially be
undermined by a spreadsheet error. Harvard economists Carmen Reinhart and
Kenneth Rogoff famously made an Excel goof — omitting a few rows of data
from a calculation — that caused them to drastically overstate the negative
GDP impact of high debt burdens. Researchers in other fields occasionally
have to issue retractions after finding Excel errors as well.

The Australian researchers note that Excel isn't the only spreadsheet
program with overly aggressive autoformatting issues — the same errors crop
up in open-source programs like LibreOffice Calc and Apache OpenOffice Calc
too.

They do note, however, that one perfectly free spreadsheet program did not
have any issues storing the gene names as typed — Google Sheets.»

Perhaps now we can get back to discussing the issue of overly aggressive
autoformatting in LibreOffice ?...

Henri

Looks like a user problem to me.
Is no one proof reading this papers before submission?

Wade

- while dealing more specifically with Microsoft's Excel, should, as noted
in the next-to-last paragraph, give even LibreOffice developers pause. If
the claim that LibreOffice behaves like Microsoft Office in this particular

a)The actual research paper is:
Gene name errors are widespread in the scientific literature
    Mark Ziemann,
    Yotam Eren and
    Assam El-OstaEmail author
Genome Biology201617:177
DOI: 10.1186/s13059-016-1044-7
© The Author(s). 2016
Published: 23 August 2016

Downloadable from
http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7

b) Issues with gene researchers using Excel were noted as long as 1999,
with the Human Gene Project. (Researchers are still correcting errors
made by that project, because of their over-reliance on Excel, instead
of using a real database.)

c) Ignoring the issue of using the wrong tool for the job, the majority
of errors are a direct result of the researchers not knowing how to
correctly use the tool that they were using. What doesn't help matters,
is that the researchers don't realise they are having problems, until
after the fact.

some attention should be devoted to making turning off autoformatting

in Calc easier for users

Should people be encouraged to use Calc for a task for which it is
neither designed, nor suitable for, rather than using Base, which is
designed to be a database?

Maybe write a user guide: _Abusing Calc: How to do things without
totally destroying your data, when using Calc for that for which it is
neither designed to do, nor is suitable for_:
* Two chapters on using Calc as a text editor;
* Three chapters on using Calc as a database;
* Two chapters on using Calc as a drawing program;

jonathon

+1 :wink:

And don't forget calc as a photo album.

Someone I knew used Excel to create a road map!

Unless things have changed a lot since I did research, the data in the spreadsheet was not typed by the researcher. It was probably typed by an undergraduate work-study student or, at best, a harried departmental secretary.

Brian, love the umbrella metaphor.

-bill

Which illustrates one of the ongoing characteristics of office suites. They offer many so different methods of accomplishing a task that a user just wanting to get work done has to first invest what seems to be an unreasonable effort into learning how to most effectively use his/her tool. And, heaven forbid if the gene researcher and department secretary mix and match their respective methods into the same document/spreadsheet.

I can understand why a gene researcher, or a department secretary for a gene researcher, might prefer to focus their energy on gene research rather than office suite research.

Virgil