Convert CSV to xlsx from the command line: UTF-8 encoding problem

Hi

I run the following from the command line:

/usr/bin/soffice --headless --convert-to xlsx test.csv

This works nicely with plain ASCII. However, if the CSV contains UTF-8
characters, the encoding is broken in the output. The same is true for
the ods format. What can I do?

I am not sure whether the attachments pass the SPAM filter so here is
the mini csv

The help lists filter options for both --infilter and --convert-to.
I don't know whether you would need one or both. It does list UTF8, but I
don't think the help shows all the options.

LibreOffice 5.3.3.2 3d9a8b4b4e538a85e0782bd6c2d430bafe583448

Usage: soffice [options] [documents...]

Options:
--minimized keep startup bitmap minimized.
--invisible no startup screen, no default document and no UI.
--norestore suppress restart/restore after fatal errors.
--quickstart starts the quickstart service
--safe-mode starts the safe mode
--nologo don't show startup screen.
--nolockcheck don't check for remote instances using the installation
--nodefault don't start with an empty document
--headless like invisible but no user interaction at all.
--help/-h/-? show this message and exit.
--version display the version information.
--writer create new text document.
--calc create new spreadsheet document.
--draw create new drawing.
--impress create new presentation.
--base create new database.
--math create new formula.
--global create new global document.
--web create new HTML document.
-o open documents regardless whether they are templates or not.
-n always open documents as new files (use as template).

--display <display>
      Specify X-Display to use in Unix/X11 versions.
-p <documents...>
      print the specified documents on the default printer.
--pt <printer> <documents...>
      print the specified documents on the specified printer.
--view <documents...>
      open the specified documents in viewer-(readonly-)mode.
--show <presentation>
      open the specified presentation and start it immediately
--language=<language_tag>
      Override the UI language with the given locale
      Eg. --language=fr
--accept=<accept-string>
      Specify an UNO connect-string to create an UNO acceptor through which
      other programs can connect to access the API
--unaccept=<accept-string>
      Close an acceptor that was created with --accept=<accept-string>
      Use --unaccept=all to close all open acceptors
--infilter=<filter>[:filter_options]
      Force an input filter type if possible
      Eg. --infilter="Calc Office Open XML"
          --infilter="Text (encoded):UTF8,LF,"
--convert-to output_file_extension[:output_filter_name[:output_filter_options]] [--outdir output_dir] files
      Batch convert files (implies --headless).
      If --outdir is not specified then current working dir is used as output_dir.
      Eg. --convert-to pdf *.doc
          --convert-to pdf:writer_pdf_Export --outdir /home/user *.doc
          --convert-to "html:XHTML Writer File:UTF8" *.doc
          --convert-to "txt:Text (encoded):UTF8" *.doc
--print-to-file [-printer-name printer_name] [--outdir output_dir] files
      Batch print files to file.
      If --outdir is not specified then current working dir is used as output_dir.
      Eg. --print-to-file *.doc
          --print-to-file --printer-name nasty_lowres_printer --outdir /home/user *.doc
--cat files
      Dump text content of the files to console
      Eg. --cat *.odt
--pidfile=file
      Store soffice.bin pid to file.
-env:<VAR>[=<VALUE>]
      Set a bootstrap variable.
      Eg. -env:UserInstallation=file:///tmp/test to set a non-default user profile path.

Remaining arguments will be treated as filenames or URLs of documents to
open.

"Michael" == Michael D Setzer <msetzerii@gmail.com> writes:

   > In the help it list options for infilter and filter options for
   > convert-to. Don't know if you would need one or both, and it does
   > list UTF8, but don't think the help shows all options.

Thanks

I tried

/usr/bin/soffice --headless --utf-8 --convert-to xlsx test.csv

in various orders, but got:
Unknown option: --utf-8

The help shows no --utf-8 option; I think UTF8 is a filter option rather than a command-line option.

I think this is what it would look like, though I am not sure about the LF part:

/usr/bin/soffice --headless --infilter="Text (encoded):UTF8,LF," --convert-to xlsx test.csv

"Michael" == Michael D Setzer <msetzerii@gmail.com> writes:

   > The help shows no --utf-8 option, it is a filter not an option?
   > I think this is what it would look like? Not sure of the LF, part of it?

   > /usr/bin/soffice --headless --infilter="Text (encoded):UTF8,LF," --convert-to xlsx test.csv

Thanks, I tried it out in several variations, for example:

/usr/bin/soffice --headless --infilter="Text(encoded):UTF8" --convert-to xlsx test.csv

But the encoding is still broken.
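
One thing that may be worth trying: "Text (encoded)" is the Writer plain-text filter, whereas a .csv file is normally imported by Calc through the filter named "Text - txt - csv (StarCalc)". The sketch below passes that filter with an option string made of field separator (44, comma), text delimiter (34, double quote), character set (76, UTF-8) and first line to import (1). The helper name csv_infilter is made up for illustration, and the numeric codes are assumptions that should be checked against the CSV filter documentation for the installed version. It has not been tested against 5.3.3.2.

```shell
#!/bin/sh
# Hypothetical helper (not part of LibreOffice): builds the --infilter value
# for Calc's CSV import filter.
# Arguments (all optional): field separator, text delimiter, character set,
# first line. Assumed defaults: 44 = comma, 34 = double quote, 76 = UTF-8, 1.
csv_infilter() {
    printf 'Text - txt - csv (StarCalc):%s,%s,%s,%s' \
        "${1:-44}" "${2:-34}" "${3:-76}" "${4:-1}"
}

# Only attempt the conversion when soffice and the input file are present.
if command -v soffice >/dev/null 2>&1 && [ -f test.csv ]; then
    soffice --headless --infilter="$(csv_infilter)" --convert-to xlsx test.csv
fi
```

For a semicolon-separated file, csv_infilter 59 would produce the same string with 59 as the field separator.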

On 22/06/2017 10:44, Uwe Brauer wrote:

    > Thanks I tried it out in various variations

    > /usr/bin/soffice --headless --infilter="Text(encoded):UTF8" --convert-to xlsx test.csv

    > But the encoding is still broken.

I recently had an encoding problem (with a PHP .htaccess file), because my favourite text editor, "gedit", handles only UTF-8.
I was able to understand and solve it using "medit", a text editor available through Synaptic on my Ubuntu 17.04. It allowed me to change the encoding of the file, which I switched to "occidental", and that made all the UTF-8 characters (or parts of them) visible.

iconv -f utf-8 -t ascii//TRANSLIT|

This may allow you to pipe/redirect into your soffice command.
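
As far as I know, soffice --convert-to takes file names rather than standard input, so writing the iconv output to an intermediate file is probably easier than piping. A minimal sketch, assuming a glibc-style iconv that supports //TRANSLIT; the helper name to_ascii and the file name test-ascii.csv are placeholders. Note that this is only a workaround: accented characters are replaced by ASCII approximations (or by "?"), not preserved.

```shell
#!/bin/sh
# Hypothetical helper: transliterate a UTF-8 file ($1) to ASCII and write
# the result to a second file ($2).
to_ascii() {
    iconv -f UTF-8 -t ASCII//TRANSLIT "$1" > "$2"
}

# Run only when the sample input exists; test-ascii.csv is a placeholder name.
if [ -f test.csv ]; then
    to_ascii test.csv test-ascii.csv
    # The ASCII copy could then be converted as in the original command:
    # soffice --headless --convert-to xlsx test-ascii.csv
fi
```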

There is also

konwert utf8-ascii|

at https://sourceforge.net/projects/konwert/

However, it is unclear to me how that command is used with a file name.

I have not tried either of them, but they may work for you.

Hope this helps

http://www.unix.com/man-page/debian/1/konwert/