--convert-to txt:Text *.odp

Hello Everyone,

I am struggling with something. I need to dump the text out of
presentation files (Impress, PowerPoint, anything that LibreOffice
Impress can read).

The problem is that I can convert to PDF, but not to text. Is there any
reason why? After all, grabbing text strings should be easy. However, I get
the following:

$ libreoffice --headless --convert-to txt:Text FilterPresentation.odp
convert /tmp/FilterPresentation.odp -> /tmp/FilterPresentation.txt using Text
Overwriting: /tmp/FilterPresentation.txt
Error: Please reverify input parameters...

Also, why does this not work at all while any other instance of
LibreOffice is running? (I am testing with open windows; will it work if
everything runs with --headless? I need to be able to do batch jobs that
may overlap.)

Thank you for any help anyone may have.

Trever
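
On the overlapping-jobs question: a commonly used workaround is to give each conversion its own user profile through the -env:UserInstallation bootstrap variable, so that a new process does not hand its request to an instance that is already running. The sketch below is a minimal illustration under that assumption. The helper names build_command and convert and the lo_profile_ prefix are hypothetical, and the launcher name libreoffice is simply taken from the command above.

```python
import subprocess
import tempfile
from pathlib import Path


def build_command(source, target_format, outdir, profile_dir):
    """Build a headless conversion command that uses a private profile.

    profile_dir is a directory owned by this job only (an assumption of
    this sketch); it is passed to LibreOffice as a file:// URI.
    """
    profile_uri = Path(profile_dir).resolve().as_uri()
    return [
        "libreoffice",
        f"-env:UserInstallation={profile_uri}",
        "--headless",
        "--convert-to", target_format,
        "--outdir", str(outdir),
        str(source),
    ]


def convert(source, target_format, outdir):
    """Run one conversion in a throwaway profile and return the exit code."""
    with tempfile.TemporaryDirectory(prefix="lo_profile_") as profile_dir:
        cmd = build_command(source, target_format, outdir, profile_dir)
        return subprocess.run(cmd, check=False).returncode
```

Creating a fresh profile for every file adds startup cost, so a heavily used system might prefer one long-lived profile directory per worker instead of one per conversion.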

Hi Trever,

Why do you need text? Because a typical slide has text not only in the presentation objects but also in graphic objects, it seems useless to me. I do not see an export filter to text in the UI. Where did you find information about such a filter?

Have you considered using the file format "OpenDocument Presentation (Flat XML)"? You get a human-readable XML file, which you can then process, for example with an XSLT filter, or store with a control system. This format is specified in ODF as well.

Kind regards
Regina
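
As a rough illustration of the Flat XML idea, the sketch below pulls the paragraph text out of such a file, slide by slide, using only the Python standard library. It assumes the presentation has already been saved or converted to flat XML (for example as a .fodp file) and that slides appear as draw:page elements containing text:p paragraphs, as the ODF namespaces suggest. The function name slides_to_text is hypothetical. Text inside speaker notes is included because the notes are nested inside each page.

```python
import xml.etree.ElementTree as ET

# ODF namespace URIs for drawing and text elements.
DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def slides_to_text(flat_xml: bytes) -> str:
    """Return one line per paragraph, with a blank line between slides."""
    root = ET.fromstring(flat_xml)
    slides = []
    for page in root.iter(f"{{{DRAW_NS}}}page"):
        paragraphs = []
        for para in page.iter(f"{{{TEXT_NS}}}p"):
            # itertext() also collects text inside nested spans.
            text = "".join(para.itertext()).strip()
            if text:
                paragraphs.append(text)
        slides.append("\n".join(paragraphs))
    return "\n\n".join(slides)
```

Typical use would be to read the file as bytes and pass it in, for example slides_to_text(open("FilterPresentation.fodp", "rb").read()), where the file name is only an example.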

One suggestion: convert to .pdf. From there you can view everything and do
whatever you need, for example 'select all' text --> delete; the images
would remain, yet the text would be gone. Then you could 'save as' in
whatever form you desire.

Hello All,

Unfortunately, as stated, this is for batch jobs. Converting to PDF is
an option (only because of other filters such as pdftotext). The problem
with that is that this system will be used heavily. Every extra
conversion step will slow things down and possibly cause loss of
context of the text (where such context exists).

I believe it is a bug that Impress and Calc won't do text export. Yes,
Calc does .csv, but the commas are a problem. A text export should use
spaces (if this can be done on the command line, then Calc works). Impress
needs a text export.

Why? It is text, so let me get it out easily. Why not through <insert
convoluted conversion chain>? Because the text, as I understand it, is
stored in UTF-8 in all ODT/ODS/ODP, etc. formats. Is this true for
imported files (Word, Excel, etc.)? I don't know. If I just needed the Open
Document formats, I would use odt2txt, ods2txt, odp2txt, etc. (I believe
I have seen all of them). However, I need fast, one-stop conversion to
UTF-8 text for all office formats that LibreOffice can read.

My feature request: Please implement conversion to text for Impress and
Calc as well. As stated, just dump the text out of ODP (use a newline or
other whitespace character between text objects and slides, otherwise
just straight text). For ODS, CSV export works if the comma can be
replaced by a space.

If a bug report belongs elsewhere, I am sorry. I was unable to find a
bugzilla or similar and this seemed the best list for it.

Thank you,
Trever Adams

Unfortunately, this will never get looked at unless you report it to freedesktop.

https://bugs.freedesktop.org/

Make sure to be very concise in what you are requesting.

Regards,
Joel

Joel,

Thank you very much for letting me know where to report this. I will file it now.

Trever

Hello Regina,

Thank you for this idea. This may work.

Why? Think ingress/egress filters for email and web. If you can get
text, you can make sure that content is being filtered appropriately (no
matter what the rules and intents might be).

Thank you again,
Trever

<snip>

I believe it is a bug that Impress and Calc won't do text export. Yes,
Calc does .csv, but the commas are a problem. A text export should use
spaces (if this can be done on the command line, then Calc works). Impress
needs a text export.

In Calc:

1. Save As...
2. Choose to save as csv; tick Edit Filter Settings
3. Accept Use Text CSV Format
4. Tick Fixed Column Width
5. Save file

Saved file is pure text, fixed column width, no commas, no quotes.
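
For batch use, the CSV filter options chosen in that dialog can, as far as I know, also be passed on the command line after the filter name "Text - txt - csv (StarCalc)", as comma-separated tokens: field separator and text delimiter as ASCII codes, then a character set number (76 is commonly documented as UTF-8), then the first row to export. The sketch below only builds such a command for the space-separated variant requested earlier in the thread, not the fixed-width one. The function name csv_convert_command and its defaults are hypothetical, and the token meanings should be checked against the filter options documentation for the installed version.

```python
def csv_convert_command(source, field_sep=" ", text_delim='"',
                        charset_id=76, outdir="."):
    """Build a headless Calc-to-CSV command with custom filter options.

    Tokens (assumed order): field separator code, text delimiter code,
    character set number, first row to export.
    """
    options = f"{ord(field_sep)},{ord(text_delim)},{charset_id},1"
    return [
        "libreoffice",
        "--headless",
        "--convert-to", f"csv:Text - txt - csv (StarCalc):{options}",
        "--outdir", outdir,
        source,
    ]
```

Passing the list to subprocess.run avoids shell quoting problems with the spaces and parentheses in the filter name.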

<snip>

My feature request: Please, implement conversion to text for Impress and
Calc as well. As stated, just dump the text out of ODP (use a newline or
other whitespace character between text objects and slides, otherwise
just straight text). For ODS, csv export works if the comma can be
replaced by a space.

As far as Impress is concerned, a presentation is essentially a set of
drawings with added bells & whistles. A text export would be so rarely used
that I doubt you could get any developer to be interested in adding such a
feature.

John

Especially since you can just take the ODP file, unzip it, and strip all XML
tags from the content.xml file inside.
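
A minimal sketch of that unzip-and-strip approach, using only the Python standard library, might look like the following. It assumes the file is a zipped OpenDocument package with a content.xml member and keeps one line per text:p or text:h element rather than stripping tags blindly, which keeps paragraphs apart. The function name odf_text is hypothetical. It does not help with Microsoft formats, which would first have to be converted.

```python
import zipfile
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
# Paragraph and heading elements carry the visible text in ODF.
BLOCK_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}


def odf_text(path_or_file) -> str:
    """Extract paragraph text from content.xml of an ODF package."""
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    lines = []
    for elem in root.iter():
        if elem.tag in BLOCK_TAGS:
            # Join text from the element and any nested spans.
            text = "".join(elem.itertext()).strip()
            if text:
                lines.append(text)
    return "\n".join(lines)
```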

Of course, the result will be barely readable, since most people have no
idea what a "presentation in TXT" should look like.

I cannot do that with Microsoft formats, etc. that LibreOffice supports.

Trever

That will not work for massive batch/automatic conversions.

Trever

I'm sorry. I did not understand your requirement. LO provides a programming
interface that can provide such special case functionality. I doubt that it
would ever be provided in LO itself.

John

Hi,
Once you change the settings for converting to text, I think the setting stays the same until you reset it. So from then on, when you use headless mode to save as text files, it should avoid all the commas.

You know that ".csv" stands for Comma-Separated Values, and it looks like it has been a standard way of exchanging data in text format for a couple of decades. There is also TSV, but that seems to be a lot rarer. Isn't fixed-width a really unusual format?

Regards from
Tom