Spell Check Dictionary

Mark_LaPierre · May 20, 2014, 11:55pm

Hey All,

I've noticed that LibreOffice does not remember the words that I add to
my dictionary when I open a new document that contains the same words I
just added in the previous document. Every new document fails to
recognize the spelling of words that I have added when editing earlier
documents. It behaves as though it is adding the words to a dictionary
contained within the individual document instead of adding them
to a dictionary located in my home directory and referenced by all
documents. I know that cannot be the case, though, because the document
is in plain text format, which I can open with any ordinary text editor
such as gedit on the desktop or vim in a shell.

If I open the same document again, the next day for example, the words
that I added to the dictionary previously when editing that same
document are not marked as misspelled, but if I open a new document the
words are not recognized there.

I've spent a couple of hours reading the LibreOffice Writer manual and
searching with Google to no avail.

Is there a setting somewhere that I've missed that will allow the use of
a local dictionary? Is this a known bug?

CentOS release 6.5 (Final)

Linux mushroom.patch 2.6.32-431.17.1.el6.i686 #1 SMP Wed May 7 20:52:21
UTC 2014 i686 i686 i386 GNU/Linux

[mlapier@mushroom ~]$ rpm -qa | grep libreoffice
libreoffice-base-4.0.4.2-9.el6.i686
libreoffice-math-4.0.4.2-9.el6.i686
libreoffice-pdfimport-4.0.4.2-9.el6.i686
libreoffice-langpack-en-4.0.4.2-9.el6.i686
libreoffice-ure-4.0.4.2-9.el6.i686
libreoffice-graphicfilter-4.0.4.2-9.el6.i686
libreoffice-calc-4.0.4.2-9.el6.i686
libreoffice-impress-4.0.4.2-9.el6.i686
libreoffice-emailmerge-4.0.4.2-9.el6.i686
libreoffice-xsltfilter-4.0.4.2-9.el6.i686
libreoffice-core-4.0.4.2-9.el6.i686
libreoffice-draw-4.0.4.2-9.el6.i686
libreoffice-4.0.4.2-9.el6.i686
libreoffice-opensymbol-fonts-4.0.4.2-9.el6.noarch
libreoffice-report-builder-4.0.4.2-9.el6.i686
libreoffice-writer-4.0.4.2-9.el6.i686
libreoffice-pyuno-4.0.4.2-9.el6.i686
[mlapier@mushroom ~]$

jfn · May 21, 2014, 4:28am

Mark,

you didn't acknowledge my answer to your previous similar question in
this mailing list (05/18).

Is there a setting somewhere that I've missed that will allow the use of
a local dictionary? Is this a known bug?

Here's what I suggested:
8< ---------------------------------------
Have you ticked the following checkbox: Tools > Options, Language
Settings > Writing Aids, User-defined dictionaries, Standard [All]?

This done, LibO should behave.
--------------------------------------- >8
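
To check whether added words really end up in a shared user dictionary, one can inspect the dictionary file directly. The following is only a minimal sketch: it assumes the user dictionary is a plain-text file whose short header ends with a line of three dashes, followed by one word per line, and the path shown is an assumption about a typical Linux profile that may differ on your installation. The function names are made up for illustration.

```python
from pathlib import Path

# Assumed location of the user dictionary on a typical Linux profile;
# adjust it to match your own installation.
ASSUMED_DIC = Path.home() / ".config/libreoffice/4/user/wordbook/standard.dic"


def read_user_words(path):
    """Return the set of words stored after the '---' header separator."""
    words = set()
    in_body = False
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            entry = line.strip()
            if not in_body:
                # Assumed header layout: everything up to and including
                # the '---' line is metadata, not a word.
                in_body = entry == "---"
                continue
            if entry:
                words.add(entry)
    return words


def is_remembered(word, path=ASSUMED_DIC):
    """True if the file exists and the word is stored in it."""
    return Path(path).exists() and word in read_user_words(path)
```

Calling is_remembered("someword") after adding a word in one document shows whether it reached the shared file; if it did, but a new document still flags the word, the dictionary is probably not enabled for that document's language.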

HTH,

Urmas · May 21, 2014, 5:08pm

"Kracked_P_P---webmaster":

I might suggest he try the en_US dictionary that contains over 797 thousand words in its list,

That dictionary contains just 476898 words actually.

Tom_Davies1 · May 21, 2014, 7:21pm

Hi
It's interesting that I believed it until I saw who posted it. Now I have
no idea, but I think it's unlikely. I could believe the US trying to dumb
things down or be less confusing by removing words so that people have fewer to
choose from.
Regards from
Tom

Mark_LaPierre · May 21, 2014, 11:51pm

DONE

Mark_LaPierre · May 22, 2014, 12:37am

English sucks as a language anyway. It's a conglomeration of words
grafted on from many other real languages that mostly still adhere to
the rules of the original language. The result is that English has no
consistent rules without the ever present, "Except", word. This
paragraph contains one of the prime examples. In almost all cases adding
apostrophe "s" on the end of a word denotes ownership, i.e. Tom's car,
but to indicate ownership with the word it the 's' is added without the
apostrophe. Of course its could also indicate multiple quantities of its.

Then there are words like disgruntled. Has anyone ever been gruntled?

Then too as in also, two as in one more than one, and to as in where you
are going. There's lead as in the heavy metal, lead as in being shown
the way, lead as in showing the way.

Keith_Bates · May 22, 2014, 1:19am

An anti-English troll - that's a new one for this list.

I can't say that I've studied every language in the world, but I did study French, New Testament Greek and Ancient Hebrew. Guess what? They ALL have weird rules, exceptions and strange words.

This would be due to the fact that languages are mostly used by humans who can be a little bit creative.

I studied some rigidly conformist languages but they were rather dull. As far as I know there is no equivalent for "I love you" in BASIC, FORTRAN or C++.

Keith - whose name disproves the i before e rule

Brian_Barker · May 22, 2014, 1:33am

In almost all cases adding apostrophe "s" on the end of a word denotes ownership, i.e. Tom's car, ..

With nouns and proper nouns, yes. (Actually grammatical possession, not ownership: Tom may own Tom's car but Tom does not own Tom's home town!)

... but to indicate ownership with the word it the 's' is added without the apostrophe.

That's no exception: "it" is not a noun but a pronoun. You would no more put an apostrophe in the corresponding possessive pronoun "its" than you would write m'y or you'r or hi's or he'r or ou'r or thei'r!

Of course its could also indicate multiple quantities of its.

No: two its are a them.

Then there are words like disgruntled. Has anyone ever been gruntled?

No, but they have gruntled - that is, made little grunts. And dis- here is an intensifier, not a negator.

Then too as in also, two as in one more than one, and to as in where you are going.

Since when have homophones been a problem?

There's lead as in the heavy metal, lead as in being shown the way, lead as in showing the way.

Since when have homographs been a problem? (Oh, and that middle example should be "led" anyway!)

Brian Barker

Virgil_Arrington · May 22, 2014, 11:05am

I'm reminded of the sentence, "Write a letter to Mrs. Wright, right now."

Virgil

Urmas · May 22, 2014, 4:09pm

"Kracked_P_P---webmaster":

There are 797866 lines in the .dic file with the top one the number of words.

Due to the author's error, it is shipped unmunched. In the proper form it contains 476898 entries, probably even less if some wordforms are missing. That is close to 70% misrepresentation.
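
The "close to 70%" figure can be checked against the two numbers quoted above. The post does not say which of the two counts was used as the base, so this sketch computes both readings; the function names are illustrative only.

```python
LISTED_LINES = 797866     # lines in the .dic file, as quoted in the thread
MUNCHED_ENTRIES = 476898  # entry count claimed for the munched form


def overstatement_percent(listed, munched):
    """Percentage by which listed exceeds munched, relative to munched."""
    return (listed - munched) / munched * 100.0


def share_removed_percent(listed, munched):
    """Percentage of the listed lines that munching would fold away."""
    return (listed - munched) / listed * 100.0


print(round(overstatement_percent(LISTED_LINES, MUNCHED_ENTRIES), 1))  # 67.3
print(round(share_removed_percent(LISTED_LINES, MUNCHED_ENTRIES), 1))  # 40.2
```

Relative to the munched count, the listed count is about 67% higher, which matches "close to 70%"; relative to the listed count, about 40% of the lines would be folded away.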

krackedpress · May 23, 2014, 2:52pm

What do you mean by the term "unmunched"? Never heard of that term in relation to a .dic file.

I explained before that each form of a word is truly a word of its own, so the figure is correct.

Mark_Bourne · May 23, 2014, 7:24pm

"Kracked_P_P---webmaster":

There are 797866 lines in the .dic file with the top one the number
of words.

Due to the author's error, it is shipped unmunched. In the proper form
it contains 476898 entries, probably even less if some wordforms are
missing. That is close to 70% misrepresentation.

I don't know how spell-check dictionaries are usually compared but, to me, it would make sense to count each form as a separate word. It may be more efficient in use to compress the dictionary into a smaller number of entries, but if there's a single entry encoding 4 forms of the same root word, I'd count that as 4 words. Otherwise, a dictionary containing 100000 words but only the root word of each would seem just as good as a dictionary containing the same 100000 root words plus all the variations encoded into each entry.
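
The two ways of counting can be illustrated with a toy example. The suffix flags and rules below are invented for this sketch and are far simpler than a real affix file; they only show how one compressed ("munched") entry can stand for several word forms.

```python
# Hypothetical suffix flags, invented for this sketch; a real affix
# file has conditions and stripping rules that are ignored here.
TOY_SUFFIXES = {"S": "s", "D": "ed", "G": "ing"}


def unmunch(entry, suffixes=TOY_SUFFIXES):
    """Expand 'root/FLAGS' into the root plus one form per flag."""
    root, _, flags = entry.partition("/")
    forms = [root]
    for flag in flags:
        forms.append(root + suffixes[flag])
    return forms


def count_words(entries):
    """Return (number of entries, number of expanded word forms)."""
    expanded = [form for entry in entries for form in unmunch(entry)]
    return len(entries), len(expanded)


munched = ["walk/SDG", "talk/SDG", "gedit"]
print(count_words(munched))  # (3, 9)
```

Here three entries stand for nine word forms, so the same list can honestly be described by either number depending on which form is being counted.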

Kracked_P_P---webmaster wrote:

What do you mean by the term "unmunched"?

munch
/mʌntʃ/
verb (used with object)
1. to chew with steady or vigorous working of the jaws, often audibly.
...
Related forms
un·munched, adjective

(http://dictionary.reference.com/browse/unmunched - I didn't swallow the dictionary, munched or otherwise)

Never heard of that term in relation to a .dic file.

Since a .dic file doesn't strike me as being particularly tasty, nor useful after chewing, perhaps we should be glad that it is unmunched.

(FWIW, neither LibreOffice nor SeaMonkey recognises 'unmunched'...)

Mark.

krackedpress · May 24, 2014, 12:19am

"Kracked_P_P---webmaster":

There are 797866 lines in the .dic file with the top one the number
of words.

Due to the author's error, it is shipped unmunched. In the proper form
it contains 476898 entries, probably even less if some wordforms are
missing. That is close to 70% misrepresentation.

I don't know how spell-check dictionaries are usually compared but, to me, it would make sense to count each form as a separate word. It may be more efficient in use to compress the dictionary into a smaller number of entries, but if there's a single entry encoding 4 forms of the same root word, I'd count that as 4 words. Otherwise, a dictionary containing 100000 words but only the root word of each would seem just as good as a dictionary containing the same 100000 root words plus all the variations encoded into each entry.

Kracked_P_P---webmaster wrote:

What do you mean by the term "unmunched"?

munch
/mʌntʃ/
verb (used with object)
1. to chew with steady or vigorous working of the jaws, often audibly.
...
Related forms
un·munched, adjective

YES, I have heard of the term, but not used in relation to a dictionary file.

(http://dictionary.reference.com/browse/unmunched - I didn't swallow the dictionary, munched or otherwise)

Never heard of that term in relation to a .dic file.

Since a .dic file doesn't strike me as being particularly tasty, nor useful after chewing, perhaps we should be glad that it is unmunched.

(FWIW, neither LibreOffice nor SeaMonkey recognises 'unmunched'...)

That must mean that mine is "munched", since it seems to work just fine for me.
With my 797K dictionary enabled, it has checked itself just fine. So the words that are in it but not in the default en_US dictionary work.

Mark.

Here are the file sizes of several versions of the en_US .ext file[s].

File size (spell-checking words, thesaurus, and hyphenation) by "unmunched" word count:

98,000 words - 5.5 MB
217,000 words - 5.8 MB
390,000 words - 6.2 MB
797,000 words - 6.8 MB
3 million words - about 11 MB (experimental, not published)

The file sizes are as shown in the Linux Mint "Caja" file manager.
I am not the only one who produced a 5-7 MB .ext file for the spell checking, etc., system.
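
As a rough reading of the figures above, one can work out how many extra bytes each additional word appears to cost between consecutive versions. This is only a sketch of the arithmetic: it assumes 1 MB means 1,000,000 bytes (the file manager may count differently), it leaves out the approximate 3 million word entry, and the names are illustrative.

```python
# (word count, package size in MB) as posted above; the 3 million
# word entry is left out because its size is only approximate.
POSTED_SIZES = [(98_000, 5.5), (217_000, 5.8), (390_000, 6.2), (797_000, 6.8)]

BYTES_PER_MB = 1_000_000  # assumption: the file manager may use 1,048,576


def marginal_bytes_per_word(sizes, bytes_per_mb=BYTES_PER_MB):
    """Extra bytes per extra word between consecutive list entries."""
    rates = []
    for (words_a, mb_a), (words_b, mb_b) in zip(sizes, sizes[1:]):
        rates.append((mb_b - mb_a) * bytes_per_mb / (words_b - words_a))
    return rates


for rate in marginal_bytes_per_word(POSTED_SIZES):
    print(f"{rate:.1f} bytes per additional word")
```

On the posted figures this prints roughly 2.5, 2.3 and 1.5 bytes per additional word.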