Using LibreOffice on the command line to batch convert .htm files to .html files

Hello all,

This is my first post.

I am working on migrating a website. I am trying to convert many files
written in an old version of MS Word, which were then saved as old
Microsoft 2002/2003 XML files. The files were saved using an .htm
extension, and they are filled with Microsoft XML crud. (I will just
refer to them as .htm files for the rest of this e-mail.)

I found a simple solution: open the file in LibreOffice
Writer and re-save it in HTML Document (Writer) (.html) format.
Now the files work great.

I don't want to do this one file at a time obviously, as there are hundreds
of these .htm files. I am trying to figure out a way to do this for
multiple files in a folder...I think the term is "batch processing".

In other words, have a script that will:
1. iterate through each .htm file in a folder
2. open the file in LibreOffice Writer
3. save the .htm file in HTML Document (Writer) (.html) format
4. close the file
5. iterate over all the remaining files in the folder until all files have
had their formats changed

Is there a way to do this via a command-line script, or by creating a
batch file?

I'm sorry, I'm a bit of a novice when it comes to the command line or batch
files. I know how to open LibreOffice Writer.exe from the command line
with one argument, which will open that document, but that's about it.

I have some experience in other scripting languages, like Python, Perl,
etc., but not Windows scripting. I am having a very difficult time getting
this to work in Python, so I thought I would come here and try to ask for
guidance.

I could attach a copy of one of the .htm files that I am converting if that
would help, but don't want to attach a file in my very first e-mail.

thank you,
Joe
paperbag76@gmail.com

Hi :-)
Ahh, just spotted the give-away ".exe", so it sounds like you are using
Windows. It is still worth trying the "--help" tag to see if you do get a
quick-help cheat-sheet.

Let us know either way! :-)
Regards from
Tom :-)

Hi :-)
We call it "headless mode". Errr, which OS are you using? Is it a Windows
or a Gnu&Linux or Mac?

Headless mode can be scripted and there might even be a thread in the
archives that shows a decent script worth copying. I think the better way
is to try using LibreOffice on the command-line and get it doing more and
more until you've figured it out. For example does
soffice
or
lowriter
work from the command-line? On my Gnu&Linux both work but some OSes might
be limited to using just 1 of those. Then try, for example
lowriter --help
to get a quick cheat-sheet of options.

Hopefully people on this list can help but there might also be
documentation at
https://wiki.documentfoundation.org/Documentation/Other_Documentation_and_Resources#Programmers
or scroll up a bit to see what is in the "Corporate Users" section of the
page.

Attachments don't get to the mailing-list anyway! You can use Nabble to
upload them to a central place so that people can choose to look if they
want.

I would try to keep the original documents in MS format so that if there is
any problem with some tiny subset of all the ones being converted then you
can focus on those and do them with a bit more finesse. However, converting
from Doc, Xls, etc. to Odt, Ods, etc. should work reasonably well.

It's the DocX, XlsX, etc. that are a bit more unpredictable, thanks to MS's
constant changing of that format (currently on at least 3 different
"transitional" versions and at least 1 "strict" one, none of which seem to
fully comply with their ISO promise). Even with those I think a
batch process using a scripted headless mode is the best plan, and then deal
with individual oddities later.

Regards from
Tom :-)

Hi Tom,

a little warning (GNU/Linux):
Don't use headless soffice while soffice is running in a visible window. It
might screw up your window dimensions when opening the next window, e.g. in
Writer. I had soffice headless run by a script in the background for faxes
and it drove me crazy. Now I am using an older version of LO in parallel for
headless fax spooling.

Yours
Walther

Hello Tom,

Thanks for being so nice and offering your help.

Yes, I am running Windows 7 64 bit.

My cmd.exe does not recognize "lowriter", but it does recognize "soffice".
When I typed "soffice --help", a window popped up called "Help Message". It
lists the LibreOffice version I am using. Then it says:
"Usage: soffice [options] [documents...]"
Then there is a list of option flags. But the list is quite short...only
a page. There's nothing in it about batch processing. I really didn't
glean anything from it.

I followed the link you gave me to
https://wiki.documentfoundation.org/Documentation/Other_Documentation_and_Resources#Programmers
and ...well, it's pretty overwhelming. I then followed the link for
"Andrew Pitonyak's macro page", hoping that he might have some pre-baked code to
run as a macro in LibreOffice to batch convert files between different
formats. I downloaded his "Useful Macro Information" file, but it's 518
pages! I did look through the contents to try to find something about
batch processing or file format changing, but there is nothing.

So...I'm stumped on what to do next.

Secondly, I should clarify what files I'm starting with. I believe
they were written in Microsoft Word from...whatever version existed in the
90s. Then at some later point, the files were somehow converted to this
crazy Microsoft XML format, but saved with an ".htm" file extension. The
files are full of bizarre Microsoft Server-specific instructions that just
totally break the webpage. I'm using Apache as my server, not Microsoft
Server, and Apache can't understand all those weird Microsoft
Server-specific commands like:
xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"

I understand that file attachments are not allowed. If it would help to
get a look at the source code for one of these files, just so you have some
inkling of what I'm talking about, I can try to post it somewhere for you
to take a gander at.

Thanks,
Joe
paperbag76@gmail.com

Hi Joe,

I'm also running Win 7 64 bit, and my LO (4.1.1.2) executable sits under

c:\Program Files (x86)\LibreOffice 4\program\soffice.exe

Running that with the "--help" parameter gives a dialog window with a
bunch of options, too many for the dialog window, which doesn't have a
scroll bar, so I can't see them all. It even writes over the close
button. There seems to be a bug...
Also, the close button cannot actually be clicked...

Luckily, at the bottom just before it gets cut off I see the
following, which, as I can't actually copy any of the text, is typed
out and may contain a spelling error or two, plus is badly formatted:

--convert-to output_file_extension[:output_filter_name] [--outdir
output_dir] files
    Batch convert files.
    If --outdir is not specified then current working dir is used as
    output_dir.
    Eg. --convert-to pdf *.doc
    --convert-to pdf:writer_pdf_Export --outdir /home/user *.doc

This seems to be what you need. You should be able to put all the files
in one directory and run LO with the parameters "--convert-to html
*.htm", possibly giving another directory with --outdir, and possibly with
the --headless parameter.
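Since Joe mentioned some Python experience, here is a minimal sketch that only assembles the --convert-to command line described in the help text above, as a list of arguments, without running anything. The function name convert_to_args, its defaults, and the assumption that soffice is reachable via the PATH are illustrative choices, not part of LibreOffice.

```python
from typing import List, Optional


def convert_to_args(
    files: List[str],
    ext: str = "html",
    filter_name: Optional[str] = None,
    outdir: Optional[str] = None,
    soffice: str = "soffice",  # assumption: soffice is found via PATH
    headless: bool = True,
) -> List[str]:
    """Build (but do not run) an argument list following the help text:
    --convert-to output_file_extension[:output_filter_name] [--outdir output_dir] files
    """
    args = [soffice]
    if headless:
        args.append("--headless")
    # The optional filter name is attached to the extension with a colon.
    target = ext if filter_name is None else f"{ext}:{filter_name}"
    args += ["--convert-to", target]
    if outdir is not None:
        # Without --outdir, the current working directory receives the output.
        args += ["--outdir", outdir]
    args += files
    return args


if __name__ == "__main__":
    print(convert_to_args(["page1.htm", "page2.htm"], outdir=r"C:\tmp"))
```

The resulting list can be handed to subprocess.run; passing a list rather than one long string avoids having to quote paths that contain spaces.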

Hope this helps.

Paul

Hi Tom,

> Hi :-)
> I think you can use a pipe such as "| more" to get just 1 screenful
> at a time.

That pipes the output from one command to another, which won't help in
my case, as the executable doesn't produce any console output; it
opens a dialog window.

I actually had LO open at the time, and running the executable just
opened another dialog window, so maybe this is different if LO isn't
already running, in which case your suggestion might work (although it
wouldn't be strictly necessary, as the Windows command line has a
scrollable buffer). I haven't tried.

> the "--web" tag looks interesting too.
>
> --web create new HTML document.

In this case I don't see how the "--web" parameter would be of any
help; it just opens a new blank web document, AIUI.

Paul

Paul,

This was amazingly helpful. I think finally I'm on the right path.

I too found that same problem with the help file. The dialog window is cut
off by the bottom of the screen. There is no scroll bar. The dialog
window got cut off right at the command I needed. I also saw the "close"
button that was just floating in the bottom middle of the dialog window
that cannot be clicked. This bug desperately needs to be fixed.

Do you know, is there a way to see the full dialog window, with all its
commands, somewhere on the web? Is it archived somewhere in the
LibreOffice help?

I have found more detailed help on using this specific "convert-to" command
at
http://ask.libreoffice.org/en/question/2641/convert-to-command-line-parameter/

But I cannot find the full list of commands.

Also, and most importantly, it appears to me that this command cannot run
in batch mode? In the example given at the website I linked above, the
format of the command only takes one file at a time. How would it process
multiple files at a time, if the final parameter is a file name? Wouldn't
it need to be a directory name?

thank you so much
joe

Hi Joe,

Glad this got you started in the right direction. See inline for more...

> This bug desperately needs to be fixed.

Yes, I agree. Should someone open a bug report, or is there one
already? Anybody know more about this? I haven't actually gotten as far
as registering an account for the bugtrack system before...

> Also, and most importantly, it appears to me that this command cannot
> run in batch mode? In the example given at the website I linked
> above, the format of the command only takes one file at a time. How
> would it process multiple files at a time, if the final parameter is
> a file name? Wouldn't it need to be a directory name?

Well, no. It does take a filename, but according to the help you can use
wildcard characters, like in my example in the previous mail. So, for
example, to convert all the ".htm" files in a directory to HTML, you
would use "*.htm" as the file name, like so:

soffice.exe --convert-to html *.htm

And, as commented before, you might want to give an output
directory with "--outdir", and might need to specify "--headless".

Note that according to the URL you gave, not using --headless will open
a blank LO window and not finish until you close that, and using
--headless when LO is already open will *silently do nothing*.

Also, according to the URL you gave, wildcard characters don't
actually work in the filename, so you have to use a batch script. An
example is given in the second last answer. Note that the example given
is supposed to be run straight from the command line, but could also be
placed in a batch file, although you may have to make the change
recommended in the answer, and you will *have* to change the "(*.odg)"
to "(*.htm)" in your case. To make a batch file, simply copy and
paste the commands into a plain text file, save it, then rename
it to change the ".txt" extension to ".bat". Then you should be able to
run the file (either by double clicking or from the command line), and
it will process all the .htm files in the same directory as the batch
file.

You could also write the same batch file in python, as I seem to recall
you stating you were at least a little familiar with python.
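To give a rough idea of what that could look like, here is a minimal Python sketch of the same loop: it finds the *.htm files in a given folder and runs one soffice conversion per file, like the for loop in the batch example. The soffice path is the one mentioned in this thread and is an assumption about your install; convert_folder, run_checked and the run parameter (which lets you substitute a dummy runner for testing) are hypothetical names.

```python
import glob
import os
import subprocess
from typing import Callable, List

# Assumption: adjust this to wherever soffice.exe lives on your machine.
SOFFICE = r"C:\Program Files (x86)\LibreOffice 4\program\soffice.exe"


def run_checked(cmd: List[str]) -> None:
    # check=True raises CalledProcessError if soffice exits with an error status.
    subprocess.run(cmd, check=True)


def convert_folder(
    folder: str,
    outdir: str,
    soffice: str = SOFFICE,
    run: Callable[[List[str]], object] = run_checked,
) -> List[List[str]]:
    """Convert every *.htm file in `folder` to .html in `outdir`, one soffice
    call per file. Returns the commands that were run."""
    commands = []
    # The pattern includes the folder, so the current directory doesn't matter.
    for path in sorted(glob.glob(os.path.join(folder, "*.htm"))):
        cmd = [soffice, "--headless", "--convert-to", "html", "--outdir", outdir, path]
        run(cmd)  # a list, not a string, so spaces in paths need no quoting
        commands.append(cmd)
    return commands
```

For example, convert_folder(r"C:\tmp\htm", r"C:\tmp\html"), with hypothetical folders. As noted above, --headless may silently do nothing while another LibreOffice window is open, so close LibreOffice first.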

You'll have to experiment to see if you can use wildcard characters or
not, as they would be by far the easiest solution. And if they don't
work, a bug should *definitely* be filed.

Hope this gets you all the way to a solution.

Paul

Paul,

I feel like I'm almost there.

This may be outside the scope of this listserv, so I hope it is ok if I ask
this question.

The example given on
http://ask.libreoffice.org/en/question/2641/convert-to-command-line-parameter/
is as follows:

set path=%path%;C:\Program Files (x86)\LibreOffice 4\program
for %f in (*.odg) do (
    soffice.exe --headless --convert-to pdf --outdir "C:\tmp" %f)

I do not know what language this is. I do have some programming
experience, mostly in Perl, years ago (I forgot most of it...I'm rusty. I
took C in college too but can't remember that either). I've also done some
programming tutorials on Codecademy.com in Javascript, Python, jQuery,
HTML, CSS.

Here is my understanding of the code. The writer first adds the
LibreOffice directory to the "path" environment variable. Path is a
Windows operating system environment variable containing special
directories. These directories tell Windows where to look for executable
files. Thus, any executable file that is in a folder, that is in the "path"
environment variable, can be run at the command prompt by simply typing its
name, without having to specify exactly where it is. For example, typing
"soffice" (just the executable file's name) at the command prompt, instead
of "C:\Program Files (x86)\LibreOffice 4\program\soffice.exe", will open soffice.exe.
It makes using the command line simpler and quicker.
In that way, the writer can simply start his code in the for loop with
"soffice".

What I don't understand is, how does the batch file know where to look for
the input files? All that the batch file is given is an iterator variable
%f. This iterator variable ideally takes on the values, one by one, of the
file names that end with .odg (in this case). But how does the command
line know where to look to find those .odg files to convert in the first
place?

Thanks again, and sorry for so many questions. I just feel so close, and
I've been working on this problem for days.
joe

Hi Joe,

> set path=%path%;C:\Program Files (x86)\LibreOffice 4\program
> for %f in (*.odg) do (
>     soffice.exe --headless --convert-to pdf --outdir "C:\tmp" %f)

Let me format that a bit better, tweak it slightly, and change it to
match your requirements (although you must still change "C:\tmp" to
your preferred location):

1) set path=%path%;C:\Program Files (x86)\LibreOffice 4\program
2) for %f in (*.htm) do soffice.exe --headless --convert-to html --outdir "C:\tmp" "%f"
Sorry if this wordwraps. It's meant to be only two lines, and I have
numbered each line, which is not supposed to be part of the line.

> I do not know what language this is.

This is what used to be DOS, and is now cmd.exe under Windows, which
provides a command line with largely the same syntax as DOS. On *nix
systems this is called a shell script; in the Windows world it's a batch
file, sometimes also referred to as shell scripting. See here for an
intro:

http://www.computerhope.com/batch.htm

This is therefore not a language per se, but rather commands that the
command line interpreter understands.

So this could either be typed into the command line one line at a time,
pressing Enter after each line, or it could all be put in a file,
called a batch file and given the ".bat" extension, and run as one
single item. Either way is almost exactly the same. A batch file is
simply a plain text file of DOS commands (I'm going to call them that
for simplicity's sake) that cmd.exe executes by running each
line as if it were typed into the terminal directly. The only
difference in this instance is that if the commands are run line by line
from the command line, using "%f" as above is fine, but if they are in
a batch file instead, you need "%%f", so the second line becomes:

2) for %%f in (*.htm) do soffice.exe --headless --convert-to html --outdir "C:\tmp" "%%f"

> Here is my understanding of the code. The writer first adds the
> LibreOffice directory to the "path" environment variable. Path is a
> Windows operating system environment variable containing special
> directories. These directories tell Windows where to look for
> executable files. Thus, any executable file that is in a folder, that
> is in the "path" environment variable, can be run at the command
> prompt by simply typing its name, without having to specify exactly
> where it is. For example, typing "soffice" (just the executable
> file's name) at the command prompt, instead of "C:\Program Files
> (x86)\LibreOffice 4\program\soffice.exe", will open soffice.exe. It
> makes using the command line simpler and quicker. In that way, the
> writer can simply start his code in the for loop with "soffice".

Exactly correct.
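The same lookup can be tried from Python: shutil.which searches the directories of a PATH-style string for a program, much as cmd.exe does when you type "soffice". This is a sketch only; find_with_extra_dir is a hypothetical helper that appends one directory the way `set path=%path%;...` does. Note that PATH entries are directories, not the .exe files themselves.

```python
import os
import shutil
from typing import Optional


def find_with_extra_dir(program: str, extra_dir: str, path: Optional[str] = None) -> Optional[str]:
    """Append extra_dir to a PATH-style string, like `set path=%path%;extra_dir`,
    and return the full path of `program` if it is found in any listed directory."""
    if path is None:
        path = os.environ.get("PATH", "")
    # Entries are directories, separated by ';' on Windows and ':' on *nix (os.pathsep).
    new_path = path + os.pathsep + extra_dir if path else extra_dir
    return shutil.which(program, path=new_path)


if __name__ == "__main__":
    print(find_with_extra_dir("soffice", r"C:\Program Files (x86)\LibreOffice 4\program"))
```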

> What I don't understand is, how does the batch file know where to
> look for the input files? All that the batch file is given is an
> iterator variable %f.

The batch file would be the whole thing, so the "batch file" as you put
it isn't given anything, it's just run. Line two can be broken up into
the following fragments:

1) for %f in (*.htm) do
2) soffice.exe --headless --convert-to html --outdir "C:\tmp" "%f"

The first fragment is a "for" command, which is a built-in DOS command,
i.e. cmd.exe knows what to do with it without needing to run a program.
Almost anything you type at the command line is either a builtin
command or the name of a program to run. See here for more info on the
"for" command:
http://www.robvanderwoude.com/for.php

The for command iterates over a list and executes a command for each
item in the list.

The for command consists of the keyword "for", a variable to hold each
iteration of the list, the "in" keyword, a list to iterate over in
brackets, and the "do" keyword, and then a command to execute for each
item in the list (the second fragment). Here the variable is "f", and to
tell DOS that it is a variable, we need to precede it with a percent
sign, or two if it is in a batch file.

The for loop will execute the second fragment for each item in the
list, and each time the "%f" will hold the next item in the list. In
this case the items in the list are all the files with a ".htm"
extension, so each time the second fragment is run, the "%f" will hold
the name of the next file with an ".htm" extension.

The list can be given as, say, a simple list, like so

"(file1.htm file2.htm file3.htm)"

but that would mean typing out all the filenames. By using a wildcard
character (see: http://www.ahuka.com/?page_id=31) DOS knows that this
means the list consists of all the files in the current directory that
have the ".htm" extension.

The second fragment, therefore, is run multiple times, each time with a
different filename, and does exactly what you would expect.

> This iterator variable ideally takes on the
> values, one by one, of the file names that end with .odg (in this
> case). But how does the command line know where to look to find
> those .odg files to convert in the first place?

From the above I hope it is clear that it comes from the for loop,
specifically from the "(*.htm)".

Similarly, the "dir" command is another builtin: it takes a list of
files and displays each one with its size and other attributes. So you
can either give "dir" a single file to list (or multiple files
separated by spaces), or give it a filename with wildcards to match
more than one file, in which case the pattern is replaced by the list
of files that match. In the same way, the "for" command takes the
wildcard pattern in its list and translates it into a list of matching
files before running its command on each one.

This is called globbing: a filename with wildcards is translated into a
list of filenames that match. On *nix the shell itself does this before
calling the builtin or program ("shell globbing"). Under Windows,
cmd.exe leaves it to the command: builtins such as "for" and "dir"
expand the wildcards themselves, whereas an external program such as
soffice.exe receives the pattern unchanged and only handles it if it
does its own expansion. So it doesn't work everywhere: in some places
the wildcard is never expanded, and in others, even though it is, the
program or builtin doesn't accept more than one single file and so
won't work anyway. But in most places where you need to give a filename
you can use wildcards, and any program or builtin that works with a
list of files will be happy.

http://en.wikipedia.org/wiki/Glob_(programming)
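To see the pattern matching on its own, Python's fnmatch module applies the same kind of shell-style wildcard to a list of names. In this sketch expand is a hypothetical helper and the filenames are made up; Python uses its own matching rules rather than cmd.exe's, so unusual cases may not behave identically.

```python
import fnmatch
from typing import List


def expand(pattern: str, names: List[str]) -> List[str]:
    """Return the names that match a wildcard pattern such as *.htm,
    in their original order, the way a glob turns a pattern into a list."""
    # fnmatch.filter is case-insensitive on Windows and case-sensitive on *nix.
    return fnmatch.filter(names, pattern)


if __name__ == "__main__":
    names = ["index.htm", "about.htm", "notes.txt", "old.html"]
    print(expand("*.htm", names))
```

Note that "*.htm" does not match "old.html" here: the pattern has to match the whole name.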

> Thanks again, and sorry for so many questions. I just feel so close,
> and I've been working on this problem for days.

No worries. Hope this makes it all clear to you.

I'm heading for bed now, so if there is anything else, please feel free
to ask, but expect a little longer delay in my response this time :-)

Paul

Paul, your answers are extremely helpful and polite.
I know you won't see this for a while. That's ok. I can't believe you're
still up anyway, seeing as I think you're in South Africa from your e-mail
address.

I think I sort of understated my knowledge of computer science to you. I
do understand for-loops very well. What I am asking has more to do with
paths and filesystems, I think.
When I ask, "How does the .bat file's for-loop know where to find the .htm
files to iterate over?", I don't mean in the actual for-loop code. I
understand that the for-loop code looks at the "conditional" part of the
for-loop. What I mean is, where is this list of files in the computer's
filesystem?

I think this quote of yours might get to the heart of the issue:

"but that would mean typing out all the filenames. By using a wildcard
character (see: http://www.ahuka.com/?page_id=31) DOS knows that this
means the list consists of all the files in the current directory that
have the ".htm" extension."

When you say "the list consists of all the files in the *current* directory
that have the .htm extension", that's what I'm getting at when I ask,
"Where does the .bat file get its input .htm files from?": the filesystem
directory.

In sum: Does this mean that my .bat file would have to go into the same
windows directory as my input .htm files, in order to know which ".htm"
files it should use to create the glob?

And thus, generally speaking, does this mean that .bat files must always go
into the same directory as the files that they take as input?

I'm really sorry if these are basic, stupid questions. I learned some
high-level languages, but never DOS.

thank you,
joe

Hi :-)
I think you can use a pipe such as "| more" to get just 1 screenful at a
time. On UK keyboards the | is between the Shift button on the left and
the Z key. On US keyboards and laptops it's often somewhere around the
"Enter" key. On the key it looks like the vertical line is broken in the
middle but it doesn't show the gap onscreen.

soffice --help | more

However, that end bit looks roughly the same as mine, so I can just
copy&paste what I get into the email. I've copied the whole thing because
it's quite short really. The bit about converting is at the end but the
"--web" tag looks interesting too.

"LibreOffice 3.5

Usage: soffice [options] [documents...]

Options:
--minimized keep startup bitmap minimized.
--invisible no startup screen, no default document and no UI.
--norestore suppress restart/restore after fatal errors.
--quickstart starts the quickstart service
--nologo don't show startup screen.
--nolockcheck don't check for remote instances using the installation
--nodefault don't start with an empty document
--headless like invisible but no userinteraction at all.
--help/-h/-? show this message and exit.
--version display the version information.
--writer create new text document.
--calc create new spreadsheet document.
--draw create new drawing.
--impress create new presentation.
--base create new database.
--math create new formula.
--global create new global document.
--web create new HTML document.
-o open documents regardless whether they are templates or not.
-n always open documents as new files (use as template).

--display <display>
      Specify X-Display to use in Unix/X11 versions.
-p <documents...>
      print the specified documents on the default printer.
--pt <printer> <documents...>
      print the specified documents on the specified printer.
--view <documents...>
      open the specified documents in viewer-(readonly-)mode.
--show <presentation>
      open the specified presentation and start it immediately
--accept=<accept-string>
      Specify an UNO connect-string to create an UNO acceptor through which
      other programs can connect to access the API
--unaccept=<accept-string>
      Close an acceptor that was created with --accept=<accept-string>
      Use --unnaccept=all to close all open acceptors
--infilter=<filter>
      Force an input filter type if possible
      Eg. --infilter="Calc Office Open XML"
--convert-to output_file_extension[:output_filter_name] [--outdir output_dir] files
      Batch convert files.
      If --outdir is not specified then current working dir is used as output_dir.
      Eg. --convert-to pdf *.doc
          --convert-to pdf:writer_pdf_Export --outdir /home/user *.doc
--print-to-file [-printer-name printer_name] [--outdir output_dir] files
      Batch print files to file.
      If --outdir is not specified then current working dir is used as output_dir.
      Eg. --print-to-file *.doc
          --print-to-file --printer-name nasty_lowres_printer --outdir /home/user *.doc

Remaining arguments will be treated as filenames or URLs of documents to
open.
"

Walther, I take it you mean other existing instances of LibreOffice that
are already open suddenly leap to a new position on the screen and/or
resize? I'm not sure it would be a problem for this case but it's good to
know! Thanks! :-)
Regards from
Tom :-)

Hi :-)
So on the command-line you would do something like

cd c:/Documents and Settings/Joe B/My Documents/html files

to get into the right folder and then run the batch file from there. I'm
not sure if you can use

cd c:

tbh, I think you have to skip the c: bit and plunge straight into the rest
of it. Also there are likely to be problems with the spaces so you might
have to use "escape characters" making it more like

cd "Documents and Settings"/"Joe B"/blah..

or

cd Documents\ and\ Settings/Joe\ B/My\ Documents/html\ files

or some other strangeness. It's why I prefer to avoid using spaces in file
and folder names. CamelCase is so much easier.
Regards from
Tom :-)

Hi Joe,

> Paul, your answers are extremely helpful and polite.

Glad I can help!

> I know you won't see this for a while. That's ok. I can't believe
> you're still up anyway, seeing as I think you're in South Africa
> from your e-mail address.

Yeah, I'm a bit of a night owl, although last night was probably later
than normal for me.

> I think I sort of understated my knowledge of computer science to
> you. I do understand for-loops very well. What I am asking has more
> to do with paths and filesystems, I think.

Ok, sure, sorry, you had said you weren't much of a programmer and I
knew you were missing something so I figured I'd cover it completely.

> When I ask, "How does the .bat file's for-loop know where to find
> the .htm files to iterate over?", I don't mean in the actual for-loop
> code. I understand that the for-loop code looks at the "conditional"
> part of the for-loop. What I mean is, where is this list of files in
> the computer's filesystem?

If you are working at the command line, most commands work only in the
current directory. For example, the "dir" command only lists files in
the current directory. It lists subdirectories, but not the files in
them. Shell globbing (translating the *.htm to a list of files ending
in .htm) also works only in the current directory. The shell interprets
the wildcard pattern, and looks for all files in the current directory
that match that pattern, and gives that list of files to the command
instead of the wildcard pattern you typed.

This is the same behaviour on *nix. On *nix you can use "pwd" to get
the "present working directory", on Windows it seems that just typing
"cd" gives you the current directory, but it should also be part of the
prompt. I'm sure you know that you can use the "cd" command to "change
directory", and that's on both windows and *nix.

So when typing the script out at the command line instead of running it
from a batch file, it will always work in whatever directory you're
currently in. If you put the script in a batch file, it still only
works in the current directory. If you run the batch file from the
command line, the current directory is whichever directory you are in
when you run the script, not the directory in which the script is. So,
for example, if the script is in "c:\utils" but you are in "c:\docs",
you will need to either add "c:\utils" to the path, or run the script as
"c:\utils\script.bat". In either case, that just tells the shell where
to find the script, it doesn't change the current directory, so the
script resides in "c:\utils", but is run in and acts on the files in
"c:\docs".

If you run the script by double clicking it from windows explorer,
there is still a current directory that the script runs in, it's just
behind the scenes, and windows sets the current directory of the
executing script to the directory it is saved in. So if the script is in
"c:\utils" and you double click it from within windows explorer, it
will run in "c:\utils".

This can be changed, of course. You need to create a shortcut to the
script, and then right click the shortcut to set properties. You will
see that the "target" field refers to the script itself, but there is
also a "Start in" field, which by default will be the same directory as
the script is in. If you change that, it will change the
current directory that the script executes in.

Alternatively, you can put a "cd" command in the script itself to
change to the desired directory before the rest of the instructions.
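The same rule applies to a relative wildcard in Python, which makes it easy to try out. In this sketch (htm_files is a hypothetical helper) a bare "*.htm" is resolved against the current working directory, not against the folder the script is saved in; giving an explicit folder, or changing directory first with os.chdir, plays the role of the "cd" mentioned above.

```python
import glob
import os
import sys
from typing import List, Optional


def htm_files(folder: Optional[str] = None) -> List[str]:
    """List .htm files. With no folder the pattern is relative, so it is
    looked up in the current working directory, like *.htm in a batch file."""
    pattern = "*.htm" if folder is None else os.path.join(folder, "*.htm")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Optional folder from the command line; defaults to the current directory.
    print(htm_files(sys.argv[1] if len(sys.argv) > 1 else None))
```

Taking the folder from the command line, as in the `__main__` part, means the script itself can live anywhere.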

> In sum: Does this mean that my .bat file would have to go into the
> same windows directory as my input .htm files, in order to know which
> ".htm" files it should use to create the glob?

Yes, exactly.

> And thus, generally speaking, does this mean that .bat files must
> always go into the same directory as the files that they take as
> input?

Yes, unless you put "cd" commands in the batch file, or use a shortcut
to change the working directory.

> I'm really sorry if these are basic, stupid questions. I learned some
> high-level languages, but never DOS.

Heh, I started in the days before windows, so DOS was the only way to
do things.

The fundamentals are the same on DOS and *nix, although the *nix command
line has a lot more power. I'd recommend reading up a bit on this.
Windows GUIs are a fancy way of doing essentially the same stuff, and
often this sort of thing is going on under the hood, so understanding
it sometimes comes in handy.

Hope you're all squared away now.

Paul

Hi Tom,

> The tricky bit is knowing where to put the batch file so that Windows
> can run it from anywhere. Maybe the root directory, C: ? It's not
> very elegant and is probably bad practice security-wise but I don't
> know a better answer.

This is a matter of preference, largely, but normally I would say the
best would be to create some sort of "utils" folder with all your
batch files, and then add that to the path, so that the batch files can
be called from anywhere.

Paul

> Hi :-)
> So on the command-line you would do something like
>
> cd c:/Documents and Settings/Joe B/My Documents/html files
>
> to get into the right folder and then run the batch file from there.
> I'm not sure if you can use
>
> cd c:

Yes, you can use that to change directory. Note that this won't work to
change drive, though. Each drive has its own current directory, and you
use just a drive letter and colon to change drives. So if for example
you are on the G drive, and you use "cd c:\users\tom\documents", it
will change the current directory for the C drive, but you will still
be working on the G drive. If you then type "c:" you will switch to the
C drive, and would then already be in the documents folder that you
previously changed to.

Using the drive letter, or at least the beginning slash, is called an
absolute path, and it will work from anywhere in the filesystem. Not
using it, and just starting with a directory name, is called a relative
path, and only looks in the current folder for subdirectories with that
name. So for example if you are in "c:\utils", and you type "cd
c:\docs\paul", you will go to the "paul" folder under "docs", which is
off the root of the C drive, but if you just typed "cd docs\paul", it
would try to go to the "paul" folder under the "docs" folder under the
"utils" folder, which you are currently in.

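Python's ntpath module implements these Windows path rules on any operating system, so the example above can be checked directly. resolve is a hypothetical helper that works out where a cd would lead; it assumes everything stays on one drive and ignores the per-drive current directory described earlier.

```python
import ntpath  # Windows path rules, importable on any operating system


def resolve(current_dir: str, target: str) -> str:
    """Where `cd target` would lead from current_dir (same drive assumed):
    an absolute target replaces current_dir, a relative one is appended."""
    return ntpath.normpath(ntpath.join(current_dir, target))


if __name__ == "__main__":
    print(resolve("c:\\utils", "c:\\docs\\paul"))  # absolute: c:\docs\paul
    print(resolve("c:\\utils", "docs\\paul"))      # relative: c:\utils\docs\paul
```
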
> tbh, I think you have to skip the c: bit and plunge straight into the
> rest of it. Also there are likely to be problems with the spaces so
> you might have to use "escape characters" making it more like
>
> cd "Documents and Settings"/"Joe B"/blah..

Actually, you can surround the whole path in a single set of quotes,
like so:

cd "Documents and Settings/Joe B/blah.."

Also, note that on Windows the directory separator is "\", but on *nix
it's "/", although "/" does actually (usually) work on Windows too.

> or
>
> cd Documents\ and\ Settings/Joe\ B/My\ Documents/html\ files

I just tried this, and it doesn't seem to work on Windows, although it
does on *nix.
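A small Python sketch of both tips, assuming you just want to print a ready-to-paste command: cd_command is a hypothetical helper that uses ntpath.normpath (Windows path rules, available on any OS) to turn "/" into "\" and then puts a single pair of quotes around the whole path if it contains spaces.

```python
import ntpath  # Windows path rules, importable on any operating system


def cd_command(path: str) -> str:
    """Return a cmd.exe `cd` line for `path`."""
    # normpath turns "/" separators into the Windows "\" separator.
    win_path = ntpath.normpath(path)
    # One pair of quotes around the whole path copes with embedded spaces.
    if " " in win_path:
        win_path = f'"{win_path}"'
    return f"cd {win_path}"


if __name__ == "__main__":
    print(cd_command("Documents and Settings/Joe B/My Documents/html files"))
    # prints: cd "Documents and Settings\Joe B\My Documents\html files"
```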

Paul

I wanted to say a big thank you to everyone on the list, especially Paul,
Tom and Brian.

I finally got the batch file to work.

My final code is:

for %%f in (*.htm) do (
    "C:\Program Files (x86)\LibreOffice 4\program\soffice.exe" --headless ^
        --convert-to html:"HTML" ^
        --outdir "C:\Users\Joe\Clare\MCSLT\OutputHtmlFiles" "%%f"
)

I put the batch file in the directory that contains the .htm files I wanted
to convert, and it worked. Eureka. The batch file converted from the
crappy "Microsoft Word 2002 files saved as xml files with an .htm
extension" to normal .html files.

In my batch file above, I had to use the absolute path of soffice.exe to
get the batch file to work. I tried putting "C:\Program Files
(x86)\LibreOffice 4\program\soffice.exe" in my %path% environment variable,
so that I could simply call soffice.exe in the batch file, but this caused
Firefox to crash on startup! What a bizarre problem.
I confirmed this issue by removing "C:\Program Files (x86)\LibreOffice
4\program\soffice.exe" from the %path% variable, and rebooting my
computer. Voila, Firefox worked again.
To reproduce the problem, I then re-added "C:\Program Files
(x86)\LibreOffice 4\program\soffice.exe" to the %path%, and rebooted my
computer. Again Firefox would crash at startup.
I then removed "C:\Program Files (x86)\LibreOffice 4\program\soffice.exe"
once again from %path% and rebooted. This again fixed the problem so that
Firefox could startup without crashing.
I believe I added the entry correctly to %path%, using a semi-colon after
the last entry in %path% to separate my new entry from the last entry. I
have successfully modified the %path% variable before, by adding
"C:\Python34" and had no problems.

Anyway...maybe someone is interested in this apparent conflict between
soffice.exe in %path% and Firefox?

The important thing is, the batch script works thanks to all your generous
time and help.

joe

Hi :-)
I've just got up and not even had my first cuppa tea yet! Paul has been
amazing! Good work there! I used to enjoy creating simple batch files but
Paul really takes it several plateaus higher.

I think the batch file just acts on whichever folder you happen to run it
from. You can "cd" into a new directory on the command-line and then
re-run the batch file. There are codes you can use to look for input from
the command-line and then use that as the pathname but by default it'll
just act in the folder you are currently in. It might be worth adding that
question to the Ask LO thread.

The tricky bit is knowing where to put the batch file so that Windows can
run it from anywhere. Maybe the root directory, C: ? It's not very
elegant and is probably bad practice security-wise but I don't know a
better answer.

Regards from
Tom :-)

Er, the answer to this - as you had previously explained - is actually "No".

Brian Barker