diff of 2 docx files

Hello.

Is it possible to get a diff between 2 .docx files in LibreOffice? Or with any other tool, I do not really mind...
I am not fond of office suites, but at work they sent me an outdated document describing what I have to do, and when I asked about some "details" (errors, in fact) in it, they sent me a different version of the same document (still with errors...).
With simple text files, I could have run a diff (there are lots of tools for that), but I have no idea how to do it with docx files. I have obviously tried to unzip them and run diff and meld on the resulting files, but those files are so messy that I thought that maybe they are encrypted or whatever!
Now that I think about it, I could diff text files made by copying and pasting the text... but that would be quite dirty.

Thanks for any suggestions.

PS: I am not subscribed to this list, so please add me in CC.

If you write the files out as .fodt (Flat XML) files from within LO, you will have straight XML files to compare.

Beyond that, you could get the tika-app.jar from the Apache Tika project, which will let you extract plain text from the .fodt files and directly from the .odt files.

The problem is that I am not the author of those files. But you are right, I could simply copy the text into a file in a better format... I should have thought of that myself, by the way... so obvious.

Thanks.

I am not sure whether it works with docx, but did you try the "Compare Document" option under "Edit" (Edit -> Compare Document)?

regards,

som

As has been suggested, try the Compare Document facility:
o Open the newer document.
o Go to Edit | Compare Document... .
o Browse to and insert the older document.

You can accept or reject the identified changes, or cancel the dialogue if you wish merely to view them.

I trust this helps.

Brian Barker

Similarly, you could use the command terminal:

libreoffice --convert-to fodt /path/to/file.docx

Then apply your diff tool. In addition, you could then use xslt to
extract the elements/text of the fodt files.
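As an alternative to XSLT, here is a minimal Python sketch of the same idea: it pulls the paragraph and heading text out of a .fodt file and writes one line per paragraph, so that diff or meld can then be run on the result. The function names are hypothetical, and it assumes the standard ODF text namespace; text in footnotes or text boxes nested inside a paragraph may appear twice.

```python
import xml.etree.ElementTree as ET

# Text namespace used by OpenDocument files.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
PARAGRAPH_TAGS = (f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h")


def paragraphs_from_root(root):
    """Return the text of every paragraph/heading element under root."""
    lines = []
    for elem in root.iter():
        if elem.tag in PARAGRAPH_TAGS:
            # itertext() also collects text from nested spans, links, etc.
            lines.append("".join(elem.itertext()))
    return lines


def fodt_to_text(fodt_path, txt_path):
    """Write one line of plain text per paragraph of a .fodt file."""
    root = ET.parse(fodt_path).getroot()
    with open(txt_path, "w", encoding="utf-8") as out:
        for line in paragraphs_from_root(root):
            out.write(line + "\n")
```

After calling fodt_to_text() on both converted files, an ordinary text diff of the two .txt files shows only the wording changes, not the markup noise.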

A correction to my earlier message about Apache Tika: make that ".docx files." The tika-app.jar will let you extract plain text from the .fodt files and directly from the .docx files.
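If installing Tika is not an option, the text can also be read straight out of the .docx archive. The following is only a rough sketch, not what Tika does internally: it assumes the main text lives in word/document.xml under the usual WordprocessingML namespace, and it ignores headers, footers, footnotes and tracked changes. The function names are made up for the example.

```python
import difflib
import xml.etree.ElementTree as ET
import zipfile

# Main WordprocessingML namespace used in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(path):
    """Return one string per paragraph of the main document part."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Visible text is stored in w:t elements, often split across runs.
        runs = para.iter(f"{{{W_NS}}}t")
        paragraphs.append("".join(run.text or "" for run in runs))
    return paragraphs


def docx_diff(old_path, new_path):
    """Return a unified diff of the paragraph text of two .docx files."""
    diff = difflib.unified_diff(
        docx_paragraphs(old_path),
        docx_paragraphs(new_path),
        fromfile=old_path,
        tofile=new_path,
        lineterm="",
    )
    return "\n".join(diff)
```

Calling print(docx_diff("v2.docx", "v3.docx")) (file names are placeholders) gives output in the familiar diff -u style.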

Hello,
     For some reason, on my Windows PC, the "libreoffice" command is not present. So, if you choose to use the command terminal, and "libreoffice" is not recognized, the following command should work:
soffice --convert-to fodt path\to\file.docx
path\to\file.docx would be the path to the docx file. The path may use / instead of \ depending on the operating system.
The command "libreoffice" might work for you, though. I guess it depends on the installation, operating system, PATH variable configuration, or some other factor.

Regards,
xmlhttprequest.open@gmail.com

I did it. But... well, when I speak about a comparison, I also mean a good way to immediately notice the differences. Take, for example, the command-line tool diff, or the graphical meld and WinMerge (useful when I have to use Windows...).
Even a one-char change is obvious.
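To illustrate what "obvious" can look like for a one-character change, here is a small sketch using Python's difflib.SequenceMatcher to bracket the characters that differ between two lines. The function name and the bracket markers are arbitrary choices for the example, not the output format of diff or meld.

```python
import difflib


def mark_changes(old, new):
    """Return (old, new) with the differing character runs bracketed."""
    old_out, new_out = [], []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            old_out.append(old[i1:i2])
            new_out.append(new[j1:j2])
            continue
        # "replace", "delete" and "insert" all land here.
        if i2 > i1:
            old_out.append(f"[-{old[i1:i2]}-]")
        if j2 > j1:
            new_out.append(f"{{+{new[j1:j2]}+}}")
    return "".join(old_out), "".join(new_out)


if __name__ == "__main__":
    before, after = mark_changes("Version 2.0", "Version 3.0")
    print(before)  # Version [-2-].0
    print(after)   # Version {+3+}.0
```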

I just retried it, to be sure. The changes between the 2 documents were... say, almost nonexistent, and so minor (they named the documents v2 and v3... those people just changed the version number... pfff... or maybe it is "yet another error", I guess. No comment.), which is probably why I did not notice them when selecting the changes the first time. But now at least I know how LibreOffice shows comparisons.
It may be the most effective tool for that kind of work (working on formatted text), I do not know, but I think that this dialog box is not very explicit*. But maybe it is only a question of habit, from the point of view of a programmer who likes old tools like terminals and ncurses applications a lot.
For example, it could have a short text explicitly explaining what the lines are, or a help button, or tooltips, I do not know what. But something that even dumb people would be able to notice. After this failure, I searched the web and the documentation but was not able to find anything new (probably a bad choice of keywords). I ended up decompressing the files to compare those damned XML things by hand...

But thanks for the help anyway. Problem solved.

*: in my situation, there was a very minor difference ("Version 2.0" became "Version 3.0") in the documents, on the 1st page, so there was no automatic jump to the change, which would have given me a hint about where to look. And since I was using the dialog for the 1st time, I did not understand at all what those 2 lines, "insertion" and "removal", were.

Sounds like an interesting solution, I'll think about it next time those guys send me their junk.

To be honest, I did find that feature without asking here. But the changes were so minor* that I did not understand that this tool was showing them, since the results were not obvious at all and the visual changes were almost imperceptible to a user who is not used to GUIs.
Maybe one could make the hint more obvious, for example by replacing the selection background color with one that catches the eye more, or by drawing a circle or something bigger around the change when it is too small? The dialog box itself could be improved by being more explicit (for now there are no tooltips, no help button and no description text).
Those are just some ideas to improve your tool. I do not mind much about this, because I have to admit that I really do not like using office suites, and my work rarely requires me to edit office documents.

Problem solved anyway, thanks for taking the time to help.

*: at the bottom of the first page (so the cursor does not move, and the highlight is in an area that does not catch the eye) the author changed a '2' into a '3' (a version change... yes... sounds a bit like the new Firefox version system :) ). Note that the quantity of changes in the XML files was quite impressive, the kind of dirty result which makes me more and more convinced that XML is overused. And it becomes even worse when Microsoft tries to use it. Can you believe that with such a minor change, the compressed files differ in size by 66 bytes? One char => 66 bytes compressed! So inefficient... but not your fault, MS is responsible for that.
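For anyone who ends up comparing the unzipped XML by hand as described above, a small Python sketch like the following can at least narrow down which parts of the archive differ before any file is opened. It only compares the CRC and uncompressed size recorded in each zip directory entry; the function name is hypothetical.

```python
import zipfile


def changed_members(old_path, new_path):
    """Map each archive member that differs to 'added', 'removed' or 'changed'."""

    def index(path):
        # CRC and size come from the zip directory; nothing is decompressed.
        with zipfile.ZipFile(path) as archive:
            return {
                info.filename: (info.CRC, info.file_size)
                for info in archive.infolist()
            }

    old, new = index(old_path), index(new_path)
    report = {}
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            report[name] = "removed"
        elif name not in old:
            report[name] = "added"
        elif old[name] != new[name]:
            report[name] = "changed"
    return report
```

Members that do not appear in the result have identical content in both files, so only the reported ones need a closer look.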