Hello,
This PDF file
<https://www.legifrance.gouv.fr/download_code_pdf.do?cidTexte=LEGITEXT000006074228&dlType=pdf>
has no Table of Contents, and I was wondering if LO could grab all the
headers and build a TOC.
Thank you.
Hello Gilles,
In order to create a PDF with a TOC/index, you'll have to apply heading styles to the appropriate paragraphs.
Opening the PDF with LibO won't get you anywhere, as the tool that opens it is Draw, which can't apply text-processor styles.
I can't see a quick way to do that, I'm afraid: copy/pasting from the PDF into Writer is possible, but you'll have to fix a lot of things (e.g. useless carriage returns) and apply heading styles by hand. On a 400+ page document this is a big PITA.
Hopefully someone else will come up with brighter ideas.
Best regards,
You want brighter ideas? Say no more!
So... hmm... I'm afraid there won't be many fully-automated tools that can
build a TOC for you. A PDF basically contains a lot of individual elements,
that are arranged to look like something coherent.
From the document you linked, it could theoretically be possible to write a
tool that splits out every page, grabs the raw text, uses a regex to find the
actual titles, builds a TOC, and injects it into the PDF. This would assume:
- Text extraction works correctly (it's not always the case with PDF)
- Titles always follow the same format
But on this kind of document, you could definitely get some acceptable
results. I experimented a bit. The output is here:
http://www.cjoint.com/c/GGjw0OtPkGc
And for the curious, the "script" I used is here:
https://pastebin.com/icQSZxQr
As you'll see, it is VERY specific to this document, but it is possible to
do something.
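
For anyone who wants to try the same idea, here is a minimal sketch of the
"regex over the raw text" step. It is not the script linked above. It assumes
the text of each page has already been extracted into a list of strings, and
it assumes headings start a line with "Livre", "Titre", "Chapitre" or
"Section" followed by a number; the real document may use other patterns. The
names LEVELS, HEADING_RE, build_toc and format_toc are made up for this
example. Extracting the text and injecting bookmarks back into the PDF would
need a PDF library and are not shown.

```python
import re

# Assumed heading keywords, outermost level first (hypothetical; adjust to the document).
LEVELS = ["Livre", "Titre", "Chapitre", "Section"]

# A heading is assumed to be: keyword, whitespace, an Arabic or Roman number
# (optionally "Ier"), then an optional ": title" part.
HEADING_RE = re.compile(
    r"^(?P<kind>" + "|".join(LEVELS) + r")\s+"
    r"(?P<num>[0-9IVXLC]+(?:er)?)\b\s*:?\s*(?P<title>.*)$"
)


def build_toc(pages):
    """Return a list of (level, heading text, page number) tuples.

    `pages` is a list of strings, one per page, in page order.
    """
    toc = []
    for page_no, text in enumerate(pages, start=1):
        for raw_line in text.splitlines():
            line = raw_line.strip()
            match = HEADING_RE.match(line)
            if match:
                level = LEVELS.index(match.group("kind")) + 1
                toc.append((level, line, page_no))
    return toc


def format_toc(toc):
    """Render the TOC as indented plain text, one entry per line."""
    return "\n".join(
        f"{'  ' * (level - 1)}{title} ... {page}" for level, title, page in toc
    )


if __name__ == "__main__":
    sample_pages = [
        "Livre Ier : Généralités\nTexte quelconque.\nTitre II : Organisation",
        "Chapitre 3 : Divers\nLa section 2 s'applique.",
    ]
    print(format_toc(build_toc(sample_pages)))
```

A body line that happens to start with one of the keywords followed by a
number would be picked up as a heading too, so the output still needs a
manual check.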
There is a round-about way of doing this using Nuance's PDF Converter, but
I have not used it since I abandoned Windows® several years ago. With the
PDF Converter, one can make a Word file which could be read by LO, then
use LO's Insert ToC tool and export the result back to PDF.
Gordon
Tauranga N.Z.
Thanks very much, everyone. I naively thought it could simply be done by
converting the PDF into text in LO and running a few regexes to build a TOC :-/