Importing HTML in LO 4.2

Has anyone experienced this new problem (or is it a feature?) in 4.2?

In previous versions of LO, opening an HTML file would display it as it would appear in a web browser.

In 4.2, the actual HTML code is displayed instead. This happens both when opening any HTML document and when inserting an HTML file into an open LO document.

Does anyone know of a way around this?

I use Moneydance to track finances, and it allows reports to be saved as HTML. Producing printed reports used to be a matter of simply opening the Income and Expenses report in LO, inserting the Account Balances report at the end of it, and then editing titles, styles, etc. at will.

I have discovered that I can open the HTML files in Firefox and then copy and paste into LO, but that seems unnecessarily complicated when it used to just work in a single step.

Hello,
     In LibreOffice 4.2, there is an option under the View menu in LibreOffice Writer/Web labeled "HTML Source." Be sure that that item is unchecked. Also, if that doesn't fix the problem, try going into LibreOffice, selecting File>New>HTML Document, and once in Writer/Web, go to File>Open and select the HTML file. For me, HTML files load in LibreOffice 4.2.0.4 correctly. Did this fix the issue?

Regards,
xmlhttprequest.open@gmail.com

Hi,

No, I made sure that the "HTML source" item was not selected. I've tried a few different files generated by different sources. Generally they behave as I described, although I did find one that opened correctly. I have no idea what the difference is between those that work and those that don't.

Opening them from a blank HTML document makes no difference. In fact, the title bar just says LO Writer, not Writer/Web.

If I start with a blank HTML document and use Insert > File, it shows up as the source.

I'm using the Linux version of LO. Are you using Windows or Linux?
Keith

Hello,

> Hi,
>
> No, I made sure that the "HTML source" item was not selected. I've tried a few different files generated by different sources. Generally they behave as I described, although I did find one that opened correctly.

What kind of files did you try that were generated by different programs? Which ones worked? Do all the files have the .html extension, or are some .xhtml, etc.? Do you know if there is a certain pattern in the source code of each file that causes LibreOffice to display the source? When I tried loading an HTML file into LibreOffice 4.2, it worked correctly. If the files do not contain any sensitive, personal, or other information that you do not want to publish, you can send the list and me the HTML files you tried. However, *E-Mail attachments are not accepted by the LibreOffice mailing list server*. If you use the mailing list via Nabble, I think you can upload a file to the server. If you use an e-mail client to access the mailing list, you can use something like Sendspace (http://www.sendspace.com/).

> I have no idea what the difference is between those that work and those that don't.

> Opening them from a blank html document makes no difference. In fact the title bar just says LO Writer not Writer/Web.

That isn't right! Maybe another thing you can try is a clean install of LibreOffice (with all of the settings reset). If you urgently need a temporary solution, and nothing works, you can download LibreOffice 4.1.4 from: http://mirror.nexcess.net/tdf/libreoffice/stable/4.1.4/ Just select the type you want (I don't know if you prefer the RPM or the Debian package).

> If I start with a blank html document and use Insert-File it shows up as the source.
>
> I'm using the linux version of LO. Are you using Windows or Linux?

I am using Windows Vista, and sometimes Windows 8 (mostly Vista). If you are able to send the HTML files (at least one that works and one that doesn't) to the list and me (as long as they don't contain any personal data), I might be able to examine the files' source and find the root of the bug. Then I can file an official bug report that explains the issue. I don't know whether the issue is operating-system-specific.

> Keith

I hope this helps!

Regards,
xmlhttprequest.open@gmail.com

I've produced a document that doesn't work with LO. It is at https://www.dropbox.com/s/mbylcso4r2qm2r2/test.html

Strangely, it does open correctly in AbiWord.

Earlier documents produced by Moneydance, as recently as last month, open fine in LO. I have not upgraded Moneydance in that time.

Having quickly opened about 20 HTML files, I found that the problem is limited to those produced recently by Moneydance.

I haven't had time to look closely at the HTML source. I suppose it's possible that LO is now stricter in its interpretation.

In the short term, the workaround might be to use AbiWord.

I use 4.1.4 on Ubuntu.
I only had View / Print Layout and Web Layout when viewing an HTML file.

Then I played around with the LO Writer and Writer/Web options, plus the Internet options.
After I checked the optional "browser plug-in" check-box, the page then had View/HTML_Source.
I did not save the HTML file, since I did not want any changes to the web page.
Now that HTML file opens and always shows the View/HTML_Source option. But when I then opened a second HTML file alongside the first one, after unchecking everything I thought I had changed, it opened without the HTML_Source option. Two files are open at the same time, and one has View/HTML_Source while the other does not.

SO, why is one now showing that HTML source option and the other not? The HTML_Source one did not originally have that option until I did some check-box adding. So something changed that lets me see the HTML source code in Writer for the original file but not for the next one.

YES, it is weird.

Well, now I just opened the first file and it is no longer showing the HTML_Source view option. Weirder yet.

It might have something to do with the "Field Code" check-box in Writer/Web, and the Internet / Browser Plug-in / Display documents in browser check-box. Right now, I have the field code check-box checked in Writer, but not in Writer/Web.

So do you have the Field Code check-boxes checked in one or both Writer/Web view and Writer view options, and the browser plug-in checked?

Why you see the view source option for one file and not for another is a mystery. Was some "script" tagged to the original file for a while, so that it showed the HTML source view while the other file did not? And did it then get "untagged" from the first file after I played around with the check-boxes again while viewing it, shut LO down, and opened the first page again after restarting LO?

SO, this source viewing and non source viewing of the HTML files/pages is in Ubuntu's 4.1.4 [64-bit] version and not just in 4.2.0.4.

ALSO, it seems to open differently depending on how the page is opened. Close all your LO documents. This will display the screen with the Text Document, Spreadsheet, Presentation, etc. options. It also shows the "Open" option. I just found out that if I press the "Text Document" option to open the HTML file, it does not show the HTML Source viewing option. Yet if I open the same file using the "Open" option instead of the "Text Document" option, I do see the HTML Source viewing option.

So how are you opening your files? With Text Document? Or just the "Open" option? It does seem to make a difference in 4.1.4.2. It may have the same effect in 4.2.0.4.

[hopefully you are not confused by my descriptions, since it is about 3:30 in the morning here. Got up to see the snow storm that hit here a few hours ago and now predicted to be in full force.]

Hi :)
Sorry! 2 waggle-the-wires 'answers'.

Have you tried getting to the LibreOffice splash screen (i.e. with no
documents open) and then dragging the HTML file into the grey area
where a document would normally be showing?

Also, have you tried using
File > Open
and then navigating to the HTML file to see if that opens it?

Regards from
Tom :)

I would say yes to that.
Since LO seems to be used as a web editor (i.e. the LibreOffice Writer/Web options), we should not need to remove the DOCTYPE tag from HTML files to get them to work.

Uploading the document to the W3C validator (http://validator.w3.org/) indicates that the DOCTYPE is not valid. It should be:
-//W3C//DTD HTML 4.0 Transitional//EN
(the file is missing "//EN" from the end).

The original document doesn't fail with the version of LO I have, so I cannot check, but try correcting the DOCTYPE. If that allows the file to open as expected, the problem appears to be with Moneydance producing invalid output rather than with LO.
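A DOCTYPE can also be checked along the lines Mark describes without uploading it to the validator. The following is a minimal Python sketch, not part of the original thread: `check_doctype` is a hypothetical helper, and its regular expression only recognises the `<!DOCTYPE HTML PUBLIC "..." "...">` form, so it is no substitute for the W3C validator.

```python
import re

# Simplified pattern for <!DOCTYPE HTML PUBLIC "public-id" ["system-id"]>,
# matched case-insensitively. Not a full SGML parser.
PUBLIC_DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+HTML\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?\s*>',
    re.IGNORECASE,
)


def check_doctype(html: str) -> list[str]:
    """Return findings about an HTML 4 style DOCTYPE (empty list if none)."""
    match = PUBLIC_DOCTYPE_RE.search(html)
    if match is None:
        return ["no HTML PUBLIC DOCTYPE found"]
    public_id, system_id = match.group(1), match.group(2)
    findings = []
    # W3C public identifiers look like: -//W3C//DTD <description>//<language>
    if not public_id.startswith("-//W3C//DTD "):
        findings.append(f"unexpected owner/class in public id: {public_id!r}")
    if not public_id.endswith("//EN"):
        findings.append(f"public id is missing the //EN suffix: {public_id!r}")
    # The system identifier (DTD URL) is optional, but worth reporting.
    if system_id is None:
        findings.append("no system identifier (DTD URL) given")
    return findings
```

For example, a DOCTYPE whose public identifier is `-//W3C//DTD HTML 4.0 Transitional` (no `//EN`, as Mark describes the Moneydance file) produces two findings: the missing `//EN` suffix and the absent DTD URL.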

Arbitrarily changing the DOCTYPE to indicate HTML5 may well also fail, as the rest of the document is not valid HTML5.

Mark.

To edit HTML files, I tend to use KompoZer, a WYSIWYG HTML editor, or just a plain text editor.

Hello,

> Uploading the document to the W3C validator (http://validator.w3.org/) it appears the DOCTYPE is not valid. It should be:
> -//W3C//DTD HTML 4.0 Transitional//EN
> (the file is missing "//EN" from the end).

Not only that; there should be a reference to http://www.w3.org/TR/html4/loose.dtd at the end of the DOCTYPE section! The HTML DOCTYPE is messed up.

> The original document doesn't fail with the version of LO I have, so cannot check, but try correcting the DOCTYPE. If that allows the file to open as expected, it appears the problem is with moneydance producing invalid output, rather than with LO.

Unfortunately, the corrected syntax doesn't fix anything. I think the LibreOffice HTML parser might not handle DOCTYPEs correctly, since it worked in previous versions. If I put a comment like <!-- Hello world --> or something else into the HTML file (with the DOCTYPE removed), the parser handles it. Even inserting <!THISISNOTADOCTYPE> works. It is only when there is a DOCTYPE, even with correct syntax, that the LibreOffice HTML parser is tripped.
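Going by the observation that the parser copes once the DOCTYPE is removed, one stop-gap (with the caveat raised earlier in the thread that users should not have to do this) is to strip the declaration before opening the file in LO. This is a minimal Python sketch, not something from the thread: `strip_doctype` and `fix_file` are hypothetical helpers, and UTF-8 encoding of the Moneydance output is an assumption.

```python
import re
from pathlib import Path

# Matches the first <!DOCTYPE ...> declaration (case-insensitive) plus any
# whitespace after it. Simplified: assumes no '>' inside the quoted parts.
DOCTYPE_AND_WS_RE = re.compile(r"<!DOCTYPE[^>]*>\s*", re.IGNORECASE)


def strip_doctype(html: str) -> str:
    """Drop leading whitespace and the first DOCTYPE declaration."""
    return DOCTYPE_AND_WS_RE.sub("", html.lstrip(), count=1)


def fix_file(src: str, dst: str, encoding: str = "utf-8") -> None:
    """Write a DOCTYPE-free copy of src to dst, leaving src untouched.

    The encoding is an assumption; change it if the report is not UTF-8.
    """
    text = Path(src).read_text(encoding=encoding)
    Path(dst).write_text(strip_doctype(text), encoding=encoding)
```

For example, `fix_file("test.html", "test-nodoctype.html")` writes a separate copy to open in Writer/Web, so the original report stays as Moneydance produced it.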

> Arbitrarily changing the DOCTYPE to indicate HTML5 may well also fail, as the rest of the document is not valid HTML5.
>
> Mark.

Regards,
xmlhttprequest.open@gmail.com

Hello,
I just discovered something. If the DOCTYPE of the HTML file is the topmost line (line 1), LibreOffice renders the file fine (instead of showing the source). However, if there is a blank line before the DOCTYPE (as in the HTML file Keith uploaded), the HTML parser is tripped and shows the source. So maybe LibreOffice expects the DOCTYPE to be on line 1 and, if it is not, panics and doesn't render the HTML. Still, LibreOffice shouldn't have trouble rendering the file, even if the person who wrote the HTML (or the generator) didn't follow every strict standard. Web browsers know how to adapt to that kind of thing.
So, the file that Keith uploaded renders successfully in LibreOffice 4.2 if the <!DOCTYPE ...> syntax is corrected to:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/1998/REC-html40-19980424/loose.dtd">
and the DOCTYPE is on line 1. Is anyone able to confirm this?
There is also an unclosed <P> (paragraph) tag in the HTML file (luckily, that didn't seem to influence whether Writer/Web rendered the page correctly).
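The two changes described above (the corrected DOCTYPE, placed on line 1 with no blank lines before it) can be applied mechanically. This is a minimal Python sketch, not something from the thread itself: `normalize_doctype` is a hypothetical helper, and `FIXED_DOCTYPE` is simply the declaration quoted above. Keith reports later in the thread that these changes alone did not help on his system, so treat it as something to experiment with rather than a guaranteed fix.

```python
import re

# The corrected declaration quoted in the thread for Keith's test file.
FIXED_DOCTYPE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" '
    '"http://www.w3.org/TR/1998/REC-html40-19980424/loose.dtd">'
)

# Matches any <!DOCTYPE ...> declaration, case-insensitively.
ANY_DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>]*>", re.IGNORECASE)


def normalize_doctype(html: str) -> str:
    """Put the corrected DOCTYPE on line 1, dropping leading blank lines.

    The first existing DOCTYPE is removed before the fixed one is
    prepended, so the result contains exactly one, at the very start.
    """
    body = ANY_DOCTYPE_RE.sub("", html, count=1).lstrip()
    return FIXED_DOCTYPE + "\n" + body
```

Because the existing declaration is removed before the fixed one is prepended, running the function on its own output returns the same text.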

Regards,
xmlhttprequest.open@gmail.com

Hi,

I've just tried all of those changes and none of them work.

Additionally, I've tried opening from a blank Writer/Web document, from the file manager, and from the opening screen, all with the same result.

Just for reference, as we seem to be getting very different results, my version of LO reports as Version: 4.2.0.4 Build ID: 420m0(Build:4)

Keith

Hello,

> Hi,
>
> I've just tried all of those changes and none of them work.

Does this file work?: https://doc-0g-8s-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/8ka37k9nojan5h3m348ambggvqd7vnq8/1391637600000/12418654871184392135/*/0B-6nu1NEycJ0bTMwQWppVUtaZEU?h=16653014193614665626&e=download
It is a revised version of the Moneydance file you sent me. It renders fine in LibreOffice. The reason I am asking you to try so many things is so that I can attempt to gather enough information to file a bug.

> Additionally, I've tried opening from a blank Writer/Web document, from file manager, from the opening screen- all with the same result.
>
> Just for reference, as we seem to be getting very different results my version of LO reports as Version: 4.2.0.4 Build ID: 420m0(Build:4)

Mine reports as:
Version: 4.2.0.4
Build ID: 05dceb5d363845f2cf968344d7adab8dcfb2ba71

> Keith

Regards,
xmlhttprequest.open@gmail.com

Yes!! It works!

I see that, as well as the other changes relating to the DOCTYPE line, you've added an <html> tag.

Keith

Hello,

> Yes!! It works!
>
> I see that as well as the other changes relating to the DOCTYPE line you've added a <html> tag.

Well, actually, there was an HTML tag in the file before. You may not have seen it because the source was crammed into one line. I think I have enough information to file a bug!

> Keith

Congrats and regards from
xmlhttprequest.open@gmail.com

> Hello,
>
>> Yes!! It works!
>>
>> I see that as well as the other changes relating to the DOCTYPE line you've added a <html> tag.
>
> Well, actually, there was an HTML tag in the file before. You may not have seen it because the source was crammed into one line. I think I have enough information to file a bug!

Thank you for helping with this. Feel free to use the file as an example if needed.

I hope that the devs can fix this soon,

Keith

Hello,

>> Hello,
>>
>>> Yes!! It works!
>>>
>>> I see that as well as the other changes relating to the DOCTYPE line you've added a <html> tag.
>>
>> Well, actually, there was an HTML tag in the file before. You may not have seen it because the source was crammed into one line. I think I have enough information to file a bug!
>
> Thank you for helping with this. Feel free to use the file as an example if needed.

Here is the official bug report: https://www.libreoffice.org/bugzilla/show_bug.cgi?id=74595

> I hope that the devs can fix this soon,

Thanks for reporting the issue. Now that it has been reported, it might be fixed.

> Keith

Regards,
xmlhttprequest.open@gmail.com