New version of Spanish dictionaries

Hi Ricardo,

Hi,

My name is Ricardo Palomares and I'm part of the currently active team
of RLA-ES [1], the project to maintain a free spelling dictionary of
Spanish in different variants. It has been a long time since our
last release, 0.6, in April 2012 [2].

We have the new 0.7 version available for download [3], and we were
told around a year ago that we should email these addresses in
order to get LibreOffice Spanish builds to ship the dictionaries. I
hope our information is still good. :slight_smile:

I'll push your update to the LibreOffice repo, but there is a problem.
The character encoding of th_es_ES_v2.dat is corrupted: it is not
ISO-8859-1, as the header states, nor is it UTF-8. For example:
al㠼1
-|Dios|Todopoderoso|Jehová|Altísimo

The problem is that the first byte of each UTF-8 sequence has been
changed from 'Ă' to 'ă' in many places. Maybe I can fix this, maybe
not, but it would be better if you fixed it and released a 0.7.1 version.
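
For anyone who wants to reproduce this diagnosis, here is a minimal
Python sketch (not part of any RLA-ES or LibreOffice tooling). It
assumes the damaged lead byte is 0xC3 turned into 0xE3, which is how
'Ă' and 'ă' are encoded in ISO-8859-2; both function names are made up
for illustration.

```python
def diagnose(data: bytes) -> str:
    """Classify raw thesaurus bytes as 'ascii', 'utf-8' or 'not utf-8'."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"
    return "ascii" if text.isascii() else "utf-8"


def count_lowered_lead_bytes(data: bytes) -> int:
    """Count 0xE3 bytes directly followed by a UTF-8 continuation byte.

    Assumption: such a pair is a two-byte sequence whose lead byte 0xC3
    was changed to 0xE3. A genuine three-byte sequence starting with
    0xE3 would be counted too, so treat the result as a hint only.
    """
    return sum(
        1
        for i in range(len(data) - 1)
        if data[i] == 0xE3 and 0x80 <= data[i + 1] <= 0xBF
    )
```

A file that reports "not utf-8" and also has a high count from the
second function is a likely candidate for this kind of damage.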

Best regards,
Andras

Hi :slight_smile:
So a search&replace of

.  ă

to be replaced by

.  Ă

or is it in other places too?
Regards from
Tom :slight_smile:
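
A blind search and replace would also touch any legitimate 0xE3 byte,
so a slightly more careful variant rewrites 0xE3 only when a UTF-8
continuation byte (0x80 to 0xBF) follows it. A minimal Python sketch,
assuming the damaged byte is 0xE3 and the intended byte is 0xC3;
`restore_lead_bytes` is a hypothetical helper, not an existing tool:

```python
import re

# 0xE3 followed by a continuation byte; the lookahead leaves that byte in place.
_LOWERED_LEAD = re.compile(rb"\xe3(?=[\x80-\xbf])")


def restore_lead_bytes(data: bytes) -> bytes:
    """Turn 0xE3 back into 0xC3 wherever a continuation byte follows it."""
    return _LOWERED_LEAD.sub(b"\xc3", data)
```

Working on bytes rather than on decoded text avoids any dependence on
the locale or on the encoding the editor assumes for the file.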

If you can wait before uploading, I'll fix it on our side.

Thanks for reporting.

This should be fixed now. In case other teams are using the same
thesaurus source as we are, the encoding problem was already present
in the file downloaded from:

http://openthes-es.berlios.de/

(it has equivalent URLs for English and German). As the file came
wrong, and I'm new to these matters, I thought it was normal. It
should be fixed now (I have a PHP script that fixes this and other
problems with the built files from that site).

The new files have been uploaded again to:

http://forja.rediris.es/frs/?group_id=341

Thank you for reporting this.

Hi Ricardo,

Sorry, I found something else. After the letter 'í' there is always a
soft hyphen character (0xAD). I think it is a mistake.
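
One plausible explanation (an assumption, not confirmed in this
thread): 'í' is 0xC3 0xAD in UTF-8, and 0xAD on its own is the soft
hyphen in ISO-8859-1, so a conversion that emits 'í' (0xED) but leaves
the second byte behind would produce exactly this pattern. A minimal
Python sketch of a cleanup, with a hypothetical function name:

```python
SOFT_HYPHEN = b"\xad"  # U+00AD in ISO-8859-1
I_ACUTE = b"\xed"      # 'í' in ISO-8859-1


def strip_stray_soft_hyphens(latin1: bytes) -> bytes:
    """Drop a 0xAD byte only where it directly follows 'í' (0xED)."""
    return latin1.replace(I_ACUTE + SOFT_HYPHEN, I_ACUTE)
```

Soft hyphens elsewhere in the file are left alone, since they may be
intentional.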

Best regards,
Andras

No problem. I'm really sorry to cause you so much trouble, and I'm very
grateful for your patience and diagnostics. It is a combination of
encoding problems: the original file from openthes-es.berlios.de is
wrongly encoded and also carries syntax errors, as you let us know
with version 0.6. I wrote a PHP script to fix the syntax errors, but
PHP handles string literals based only on the encoding of the PHP
script file itself. Finally, it seems that some characters are
displayed differently on Linux than in the regular ANSI set I found on
this page:

http://www.alanwood.net/demos/ansi.html

This time I've revised the PHP script, so instead of using something like:

$text = str_replace("³", "ó", $text);

I've used:

$text = str_replace(chr(0xC3) . chr(0xB3), chr(0xF3), $text);

for every byte sequence I've been able to find. I must admit that I
still can't get synonym suggestions for words that I know for certain
are included in the thesaurus, but I didn't get any with version
0.6 either.
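
The same idea can be sketched without typing any byte values by hand,
by deriving each pair from the character itself. This is an
illustrative Python sketch, not the PHP script mentioned above; the
character list and both function names are assumptions.

```python
# Characters expected in a Spanish thesaurus (assumed list, extend as needed).
SPANISH = "áéíóúüñÁÉÍÓÚÜÑ¿¡"


def utf8_to_latin1_table(chars: str) -> dict:
    """Map each character's UTF-8 byte sequence to its ISO-8859-1 byte."""
    return {c.encode("utf-8"): c.encode("latin-1") for c in chars}


def convert(data: bytes, table: dict) -> bytes:
    """Apply every replacement in the table to the raw bytes."""
    for src, dst in table.items():
        data = data.replace(src, dst)
    return data
```

For 'ó' the table yields 0xC3 0xB3 -> 0xF3, the same pair as in the
str_replace call above. When the input is known to be clean UTF-8,
`data.decode("utf-8").encode("latin-1")` does the whole conversion in
one step.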

Anyway, the files are again uploaded at the usual place:

http://forja.rediris.es/frs/?group_id=341

Thanks again. I hope to have got it right this time. :slight_smile:

Hi Ricardo,

Almost! :wink:
Now the letter 'ú' has become 'ó' in words, e.g. adúltero -> adóltero,
baúl -> baól, túnel -> tónel, etc.
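
A substitution like this can be caught automatically by comparing the
converted file with the source, token by token. A minimal Python
sketch, assuming a source that decodes cleanly as UTF-8 and an output
meant to be ISO-8859-1; `changed_words` is a hypothetical helper:

```python
import re

# Tokens are separated by '|' or whitespace, as in the thesaurus lines.
_SEPARATORS = re.compile(r"[|\s]+")


def changed_words(utf8_source: bytes, latin1_output: bytes) -> list:
    """Return (expected, actual) pairs for tokens that differ.

    Tokens are compared position by position, so this only makes sense
    when both files contain the same number of tokens.
    """
    expected = _SEPARATORS.split(utf8_source.decode("utf-8"))
    actual = _SEPARATORS.split(latin1_output.decode("latin-1"))
    return [(e, a) for e, a in zip(expected, actual) if e != a]
```

An empty list means the conversion changed the encoding but not the
text.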

Regards,
Andras

This is embarrassing! :frowning: Thanks for noticing.

In the meantime, as I kept getting no results for synonyms, I've been
debugging the PHP script with a sample dat file and I've caught some
more errors, which are fixed now. LibreOffice does offer synonyms at
last!! (Of course, it was not LibreOffice's fault, but that of errors
in the dat and index files.)
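
Structural errors in a dat file can be checked mechanically. The
Python sketch below assumes the layout suggested by the sample earlier
in this thread: a first line naming the encoding, then entries of the
form "word|N" followed by N meaning lines that each contain '|'. The
function name and messages are illustrative, and the index file is not
checked.

```python
def check_dat(lines: list) -> list:
    """Return a list of problems found in a thesaurus .dat file."""
    if not lines:
        return ["file is empty"]
    problems = []
    i = 1  # line 0 is assumed to be the encoding header
    while i < len(lines):
        head = lines[i]
        word, sep, count = head.rpartition("|")
        if not sep or not word or not count.isdigit():
            problems.append(f"line {i + 1}: expected 'word|count', got {head!r}")
            i += 1
            continue
        n = int(count)
        meanings = lines[i + 1 : i + 1 + n]
        if len(meanings) < n:
            problems.append(
                f"line {i + 1}: {word!r} announces {n} meaning line(s), "
                f"found {len(meanings)}"
            )
        for offset, meaning in enumerate(meanings, start=1):
            if "|" not in meaning:
                problems.append(
                    f"line {i + 1 + offset}: meaning line without '|': {meaning!r}"
                )
        i += 1 + n  # skip the entry line and its meaning lines
    return problems
```

A wrong count on one entry usually shows up as a "expected
'word|count'" problem a few lines later, because the parser loses its
place.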

Hopefully this time will be the definitive one. Here are the files again:

http://forja.rediris.es/frs/?group_id=341

Thank you for your support and patience. This is my first time
building and testing LibreOffice/OO.org dictionary extensions (I came
to RLA-ES through my involvement in Mozilla) and I've hit almost every
rough edge in the release process. I sincerely think that forthcoming
dictionary versions will be a lot smoother. :slight_smile:

Pushed, thanks.

Andras

Yet another question: I visited this URL some days ago (last Friday,
IIRC):

http://extensions.libreoffice.org/@@hosting-your-extension

asking for a credential so I could be added as a member of the Spanish
dictionaries extension:

http://extensions.libreoffice.org/extension-center/spanish-dictionaries

but I haven't got any answer so far. Is there any alternative
procedure to get the credentials when I'm not trying to publish a new
extension?

TIA

PS.: BTW, we've found a duplicate project for Spanish dictionaries:

http://extensions.libreoffice.org/extension-center/diccionario-espanol