Creating a dictionary with LibreOffice from a simple TXT file

Constantine · December 25, 2014, 4:35pm

Hi J.A. de Vries,

thank you for your comments.
You are of course right. I have been working with and on Linux for about
15-20 years.
I agree, editors are a matter of taste. That is not my problem though; there
are so many for Linux to choose from.
My problem was, and is, with regular expressions. I couldn't get into them
again. The last time I worked intensively with them was so long ago that I
have forgotten everything about them. And now that I needed them, I couldn't
get back into them.
Especially after working for over 20 hours non-stop. Actually, I finished
creating these glossaries this morning at 10:00, which means 30 hours
non-stop.
I did try to learn and understand them again, but I just couldn't get a
feel for them again.
I have been a widower for four years and I don't like celebrating holidays
anymore. At least for the moment; hopefully that will change in the near
future.
So I decided to start working on the glossaries, something I had intended to
do for years. My colleagues have kept asking about them ever since I shared
my thoughts with them about what we could do with the material we gathered
over the years.
Now the work is done. I tested the glossaries with OmegaT and they work
perfectly. In total I've got more than 50,000 entries, and in fact even more
definitions.

To krackedpress,
I am not sure I understand your question/comment:

"WoW
where do you get an "easy" documentation file that has this type of
search parameters and "coding"?"

But as I already mentioned, we gathered them over the years in various
forms: notes in text files, scanned typewritten documents converted with
OCR software, even handwritten notes, and many open-source sources.
So yes, I did everything manually and I am very happy and relieved that it
is done now.

I must thank Brian again for his help at exactly the right moment. He is
right, he practically did the job for me and it was like a Christmas present.
I would probably (or better said, theoretically) have come to the same (or
a similar) solution, but only after reading manuals for at least 2-3
days and experimenting for much, much longer.
As I said, I was too exhausted and too focused on the content of
the material to be able to think clearly about the correct usage of
regular expressions and get a feel for them again. Besides, I repeat, in this
task the most important thing was to assure the quality and correctness of
the content.

Constantine · December 25, 2014, 4:55pm

After a lot of responses on how to do this in Writer,
a short note on how to do this in Calc.....

Open the text file; when the 'Text Import' wizard is shown, do:
1) Select character set 'Unicode (UTF-8)'
2) Separator options: 'Separated by', check 'Tab' and 'Space'; other
options should not be checked.
3) At 'Text delimiter' type a space
4) Click 'OK'

5) Insert a column B, and fill it with a semicolon ';'

6) Click Save As, type a name, and check 'Edit filter settings'
7) The 'Export Text File' wizard should be shown.
8) Character set: 'Unicode (UTF-8)'
9) Field delimiter: space ' '
10) Text delimiter: <empty> ''
11) Checkboxes: only leave 'Save cell content as shown' checked.....
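
For anyone who would rather script it, the net effect of the Calc steps above can be approximated outside LibreOffice. This is only a rough Python sketch, not something posted in the thread: the function name is made up, and unlike Calc it collapses runs of spaces into one separator.

```python
def semicolon_after_first_word(line: str) -> str:
    """Mimic the Calc round trip: split on whitespace, put ';' in 'column B'."""
    fields = line.split()          # Tab and Space both act as separators
    if not fields:
        return ""                  # keep empty lines empty
    # Column A is the first word, column B the semicolon, the rest follow.
    return " ".join([fields[0], ";"] + fields[1:])


# Sample line taken from later in this thread.
print(semicolon_after_first_word("a.D. (außer Dienst) εκτός υπηρεσίας, εν αποστρατεία"))
```

Note that the semicolon always lands after the first word, whatever the length of the term.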

Hi Luuk,

I am afraid this doesn't work. I thought of it myself and also tried it at
the beginning of my work.
As I said, terms consist of 2-5 words, so when using space as the separator
there is no way to insert a column (especially B) for the semicolon. Besides,
definitions are sometimes so long, with so many spaces, that Calc reports not
being able to create enough columns for the whole content.
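
To make the objection concrete, here is a small Python sketch that counts how many cells a line would occupy if every space started a new column. The names are hypothetical, and `MAX_COLUMNS` is only an assumed placeholder for the column limit of whatever Calc version is in use.

```python
MAX_COLUMNS = 1024  # assumption: placeholder for the sheet's column limit


def columns_needed(line: str) -> int:
    """Number of cells a line occupies when every space starts a new column."""
    return len(line.split())


# Sample line quoted later in this thread.
sample = "a.D. (außer Dienst) εκτός υπηρεσίας, εν αποστρατεία"
print(columns_needed(sample))                 # one cell per word
print(columns_needed(sample) > MAX_COLUMNS)   # True would mean the import overflows
```

The count depends on the number of words in each line, so the boundary between term and definition falls in a different column from one line to the next.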

The correct and professional way is what Brian suggested and what I was
looking for.
Now I can use these expressions in the future too, because the need for them
comes up very often in my kind of work.

Felmon_Davis · December 25, 2014, 6:43pm

did I misread the thread? I thought the solutions Brian produced _did_ use regular expressions only within the context of LO.

not a big reg exp man myself but I probably would have attempted it with 'sed' which is also an 'editor' I guess.

subsequently Constantine points out he was exhausted and I very much know what it means to be stuck barking up the wrong tree! after a bit they all look the same.

F.

Luuk · December 26, 2014, 10:11am

Hi Luuk,

I am afraid this doesn't work. I thought of it myself and also tried it at
the beginning of my work.
As I said, terms consist of 2-5 words, so when using space as the separator
there is no way to insert a column (especially B) for the semicolon. Besides,
definitions are sometimes so long, with so many spaces, that Calc reports not
being able to create enough columns for the whole content.

The fact that terms consist of 2-5 words is not in your first post....

I just found it in your post from Wed, 24 Dec 2014 16:11:38 -0700 (MST), where you say that this is 'not important'....

Indeed there were terms with two words, or something like this:

a.D. (außer Dienst) εκτός υπηρεσίας, εν αποστρατεία

where "(außer Dienst)" belongs to "a.D."
Which is not bad, because "(außer Dienst)" in the second field is more
useful to me (us).

For the fewer cases with two or more German words at the beginning, well, I
think we will survive that and be able to correct it manually.
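
If the multi-word terms should stay together instead, one possible approach is to split at the script boundary rather than at the first space. The Python sketch below is not one of Brian's expressions (those are not reproduced here). It assumes the definition starts at the first Greek letter on the line, and the function name and the ';' separator are illustrative choices.

```python
import re

# Assumption: the definition starts at the first Greek letter on the line.
# U+0370-U+03FF is the Greek block, U+1F00-U+1FFF is Greek Extended.
FIRST_GREEK = re.compile(r"\s+(?=[\u0370-\u03FF\u1F00-\u1FFF])")


def split_entry(line: str) -> str:
    """Return 'term;definition', or the stripped line if no Greek text follows."""
    parts = FIRST_GREEK.split(line.strip(), maxsplit=1)
    if len(parts) != 2:
        return line.strip()
    term, definition = parts
    return f"{term};{definition}"


print(split_entry("a.D. (außer Dienst) εκτός υπηρεσίας, εν αποστρατεία"))
```

With this rule "(außer Dienst)" stays in the first field, so it only fits if that is what you want. Lines whose German part itself contains Greek letters would still need manual correction.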

The correct and professional way is what Brian suggested and what I was
looking for.
Now I can use these expressions in the future too, because the need for them
comes up very often in my kind of work.

The correct and professional way is certainly NOT to store these things in a DOC; a database would be far better suited.
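
As a rough idea of what the database route could look like, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are hypothetical, and nothing in this thread says how the glossaries are actually stored. It assumes one row per term/definition pair, so a term can carry several definitions.

```python
import sqlite3

# Hypothetical schema: one row per term/definition pair.
conn = sqlite3.connect(":memory:")   # use a file path for a persistent glossary
conn.execute("CREATE TABLE glossary (term TEXT NOT NULL, definition TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_term ON glossary (term)")


def add_entry(term: str, definition: str) -> None:
    """Store one definition; a term may have several rows."""
    with conn:                        # commits on success, rolls back on error
        conn.execute("INSERT INTO glossary VALUES (?, ?)", (term, definition))


def lookup(term: str) -> list[str]:
    """Return all definitions stored for a term, in insertion order."""
    rows = conn.execute(
        "SELECT definition FROM glossary WHERE term = ? ORDER BY rowid", (term,)
    )
    return [row[0] for row in rows]


add_entry("a.D.", "εκτός υπηρεσίας")
add_entry("a.D.", "εν αποστρατεία")
print(lookup("a.D."))
```

A plain-text export for a translation tool could then be generated from such a table whenever needed.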

A slight variation of my approach will also work in Calc. Because you do not seem interested in this solution, I will not share it here.