cannot convert text to number in spreadsheet from csv file

Greetings,

I have the problem below with LO 3.5.7.2 on a Fedora box.

I have a shell script that generates and save to a text file called synthesis.csv many lines like this:

2013-02-15|Payment A|-100.25|008|fae|
2013-03-15|Payment B|-50.25|008|fae|

Calc loads that file without problems, when tell it to use | as field separator.
Lets assume the payments imports (-100.25 and -50.25) end up in cells F1 and F2

if I write in a third cell the formula =sum(F1:F2) I get zero, because F1 and F2 are seen as text.

OK, so I select those cells and do Format->Cells->Numbers->Number = -1234.12

But I still get zero, because the operation above adds to the content
of \both F1 and F2 a single quote prefix:

F1 = '-100.25
F2 = '-50.25

which, right now, I have to remove manually to make the formula work.

What is happening, and how can I fix it? Even better: how to make LO
automatically format and use strings in the format -1234.12 as numbers
when it opens the file?

Thanks,
      Marco

I have the problem below with LO 3.5.7.2 on a Fedora box. I have a shell script that generates and save to a text file called synthesis.csv many lines like this:

2013-02-15|Payment A|-100.25|008|fae|
2013-03-15|Payment B|-50.25|008|fae|

Calc loads that file without problems, when tell it to use | as field separator. Lets assume the payments imports (-100.25 and -50.25) end up in cells F1 and F2. if I write in a third cell the formula =sum(F1:F2) I get zero, because F1 and F2 are seen as text.

Your problem is there, of course. I cannot reproduce this if I "load that file": in that case, the values are correctly interpreted as numbers. Are you perhaps instead pasting the material into an existing sheet? In that case, the existing cell format of the sheet comes into play. If column F is formatted as text, that's how your values will be interpreted.

OK, so I select those cells and do Format->Cells->Numbers->Number = -1234.12. But I still get zero, because the operation above adds to the content of \both F1 and F2 a single quote prefix:

F1 = '-100.25
F2 = '-50.25

which, right now, I have to remove manually to make the formula work.

It's important to realise that changing cell formatting changes only the way data is displayed: it doesn't change the data itself. So your values are still text. In order to make this clear, an apostrophe is helpfully prepended in the Input Line to values that look like numbers. When you remove those manually, you are asking Calc to reinterpret the values. Now that the cells are formatted as numbers, this will - as you need - create numbers from your text values.

What is happening, and how can I fix it?

If you cannot solve the problem at source, there is a simpler workaround:
o In an unused column, insert the formula =VALUE(F1) in row 1 and fill it down the column.
o Copy the column and paste it back over the source column, but using Edit | Paste Special... (or right-click | Paste Special... or Ctrl+Shift+V) instead of ordinary Paste. In the Paste Special dialogue, remove the tick from "Paste all" and ensure that Numbers is ticked but Formulas *not* ticked.

Even better: how to make LO automatically format and use strings in the format -1234.12 as numbers when it opens the file?

You seem to be writing from a country where the normal fractional separator would be a comma, not a point. Is this a locale issue?

I trust this helps.

Brian Barker

What is happening,

Judging by your name, I suppose you are Italian. I assume that you are
using LibreOffice in your native language, and in Italy, as in most of
Europe, you are using coma as decimal number separator.

And here is answer to your question: your data uses dot as decimal
number separator, but your software expects coma. Since data does not
match expected format, LibreOffice falls back to treating data as text.

and how can I fix it?

Select all columns that have misinterpreted, open "Search and Replace"
dialog. Insert dot (.) in "Search" and coma (,) in "Replace".
Make sure that "Only selected" is checked on, but "Regular expression"
is checked off.
Click "Replace all".

Even better: how to make LO
automatically format and use strings in the format -1234.12 as numbers
when it opens the file?

In text file import dialog, you can see preview of data to be imported.
You can click on any column to select it. Then, using "Column type"
drop-down above preview, select "English (USA)".

You can use the same procedure to tell LibreOffice what format your
dates are, if different from your native one.

Did you click on the column and specify 'standard' as format for this column?

What decimal separator do you use?

When importing decimal '.', make sure the Language option is set to something that works with decimal dots (i.e. English USA)

Greetings,

I have the problem below with LO 3.5.7.2 on a Fedora box.

I have a shell script that generates and save to a text file called synthesis.csv many lines like this:

thanks to all who quickly answered yesterday. Here are more details,
based on your initial comments and suggestions.

I am indeed Italian and in Italy, but this is happening on a Fedora
Linux computer with US English as system language. The shell
environment has LANG=en_US.UTF-8 and "English (USA)" is the default
language in Libreoffice. In the text import form of Calc, charset is
western europe(ascii/us), language is english(usa) and all columns
have standard format header (but changing the charset to utf8 and/or
the format of the interested column to text, or "US english" doesn't
make any difference

Besides, I do not paste anything into a new spreadsheet, I directly
open the result of the script with Calc at the shell prompt, and then
(need to) work on that file in calc, savint it as .ods:

#> cat sources.*txt | ./analyzer.pl > analysis_result.csv ; oocalc analysis_result.csv

the two sample lines in my original email

2013-02-15|Payment A|-100.25|008|fae|
2013-03-15|Payment B|-50.25|008|fae|

where actual lines from analysis_result.csv (which has ASCII encoding,
according to the "file" command), which you may save as test.csv file
and load in calc to see what happens (of course, the result would
depend on YOUR locale, default language etc, but that IS the actual
text I am working on).

Summarizing, it seems I need some efficient way to tell Calc, when it
opens that file, that all the cells of a certain column, except the
header, are NUMBERS, not text, in the -1234.12 format.

The reason / end goal is that I and the other final users of that csv
file must be able to quickly add formulas to that spreadsheet and have
them work as expected without manual reformatting/rewriting of those
columns.

thanks,
  Marco

M. Fioretti wrote

Summarizing, it seems I need some efficient way to tell Calc, when it
opens that file, that all the cells of a certain column, except the
header, are NUMBERS, not text, in the -1234.12 format.

What Mirosław Zalewski has suggested upthread is essentially correct. The
data formats must match the language used (usually the locale) on import.
The solutions are to either: (a) edit the data to contain data formats used
in the expected language; (b) change the expected language on the Text
Import dialogue to match the used data formats. For a numeric format of
-100.25 try selecting a Language of "English (US)" or equivalent on the Text
Import dialog.
Best wishes, Owen.

as I wrote in the email you replied to, this is exactly what I have been
doing so far. And it doesn't work, or isn't enough.

      Marco

Hi, don't know if this helps but, I have seen the same thing in LO and
other programs too. Zotero, under windows 8.1.1. English (Ireland), where
LO was in English, but the files coming from Italian versons kept the
definitions as they would be Italian instead of Irish (as the locale was
Irish English). For Zotero the program "decided" the locale based on the IP
of the machine (don't know why) and needed an config change. For LO files
from Italian source they needed to be saved as "new name" and the locale
changed.

Could it be that for some reason programs also look at your IP address and
when seeing that it is in X-location (western) assume that your system
locale is based on that IP zone. I have seen that when moving my PC from
Italy to Finland e.g. Why this happens I have no idea.

The above might be a long shoot but since it happens for some programs
might be a worth to check out.

Greetings,

I have the problem below with LO 3.5.7.2 on a Fedora box.

I have a shell script that generates and save to a text file called synthesis.csv many lines like this:

>>>2013-02-15|Payment A|-100.25|008|fae|
>>>2013-03-15|Payment B|-50.25|008|fae|

I had a flash (some would call it a desperate shoot in the dark...),
and modified the script to enclose in double quotes the numbers above:

>>>2013-02-15|Payment A|"-100.25"|008|fae|
>>>2013-03-15|Payment B|"-50.25"|008|fae|

now everything works as expected. When I open the CSV file with calc,
I can just write in another cell, WITHOUT any manual cell reformatting,

=sum(F1:F2)

and I get -150.50 displayed in that cell. Yay!

Now, if someone could explain **why** this works, or even better: why
this was necessary on my system, but not for other people who wrote to
me privately to help debugging the problem, it would be even better...

   Marco

I just ran some tests on my system, and it seems like all I need to do
is choose a "Language" that uses the correct decimal specifier, and the
numbers are imported as numbers.

Then I experimented a bit with the "Column type" for the number column.
If I use a comma as decimal separator in the file, and choose a
"Language" that uses commas, then the numbers are imported as numbers,
both when I leave the "Column type" as "Standard" and when I choose "US
English". If, however, I use a full stop as decimal separator in the
file, and choose a "Language" that uses commas during import, then
leaving the "Column type" as "Standard" means the numbers are imported
as text, but changing the "Column type" to "US English" means the
numbers are now imported correctly as numbers.

One of my first thoughts for solving this problem was simply to choose
the column type on import, and it seems like this will solve the
problem, although just choosing a correct "Language" should have as
well, and apparently it didn't. But I am still surprised to see so
little choice in the column types. The "Standard", "Text" and "Hide"
options make perfect sense, but I'm not sure why a "US English" is
needed. Maybe because it's the most common, so you might need the
overall document to be one "Language", and specific columns to be
another? But then why only the choice of "US English", and not any
language? Date types also make sense, although some testing shows that
you don't have any choice for the separator. And why isn't there any
number types?

Maybe a feature request needs to be added to expand the column types?
Maybe with options to specify the date format more completely, to
specify a number format, and to choose any language? Sounds right, but
will it be easy enough to do? And if it is, maybe even more power,
maybe the option to use functions on input columns, so that you could
transform data in more ways? That's probably getting beyond the scope
of imports, though, and is best done in the document itself after
import. But at least the ability to import numbers and dates with a
little more flexibility?

Just my thoughts, as I see nobody else has mentioned the column types
yet.

Paul

Hi :slight_smile:
Feature requests are often a good plan. The QA team and then the devs can
then work out how to move things forwards.
https://wiki.documentfoundation.org/QA/BugReport
CSV is notoriously non-standardised so differences between different
people's experiences is not uncommon. It is an extremely efficient format
for transmission of data though so it's used quite widely and that is
unlikely to change.
Regards from
Tom :slight_smile:

I import English web data.
I installed the Extension CT2N, Convert Text To Numbers, and this puts an
icon in the Tool Bar.
A click on CT2N and it is done.
I had a problem with currency prefixes as they did not convert, but Cor Nous
sorted that for me.

Tink.