Sort Order in Calc

Dear All,

Our language, Sidama, uses the Latin script. However, there are additional
consonants that are written as two-letter combinations. E.g. ph is
considered one consonant, and so are sh, ch, zh, ts, ny. Each of these
consonants is ordered after the first letter used to write it. Our alphabet reads:
a, b, c, ch, d, dh, e, f, g, h, i, j, k, l, m, n, ny, o, p, etc.

As a result, we want to sort strings using this order. E.g. the words
cala, chala, cola should be sorted as cala, cola, chala, because ch has to
come after c.

So, how can we include this behavior in LO?

Hello,

You need to add the rules to the LC_INDEX section of your locale
(i18npool/source/localedata/data/sid_ET.xml).

See a working solution (Hungarian) at:
http://opengrok.libreoffice.org/xref/core/i18npool/source/localedata/data/hu_HU.xml#205

If you can't submit a patch, please file a bug.

Best regards,
Andras

*The Hungarian LC_INDEX*

  <LC_INDEX>
    <IndexKey phonetic="false" default="true" unoid="charset">A(A, Á)
B C {Cs} D {DZ} {DZS} E(E, É) F G {Gy} H I(I, Í) J-L {Ly} -N {Ny}
O(O, Ó) Ő(Ö, Ő) P-S {Sz} T {Ty} U(U, Ú) Ű(Ü, Ű) V-Z {Zs}</IndexKey>
    <UnicodeScript>0</UnicodeScript>
    <UnicodeScript>1</UnicodeScript>
    <UnicodeScript>2</UnicodeScript>
    <UnicodeScript>3</UnicodeScript>
    <FollowPageWord>p.</FollowPageWord>
    <FollowPageWord>pp.</FollowPageWord>
  </LC_INDEX>

*LC_INDEX for Sidama*

  <LC_INDEX>
    <IndexKey phonetic="false" default="true" unoid="alphanumeric">A-Z</IndexKey>
    <UnicodeScript>0</UnicodeScript>
    <UnicodeScript>1</UnicodeScript>
    <FollowPageWord>STP</FollowPageWord>
    <FollowPageWord>StO</FollowPageWord>
  </LC_INDEX>

I couldn't find any documentation, but I'm guessing you should first change
the unoid value to charset. What is Sidama's status regarding Unicode?

Yaron Shahrabani

<Hebrew translator>

Ignore the Unicode question, you've already answered it :relaxed:

Yaron Shahrabani

<Hebrew translator>

Quoting Yaron Shahrabani <sh.yaron@gmail.com>,
Fri, 14 Jun 2013 10:06:49 +0300:

> Ignore the Unicode question, you've already answered it :relaxed:

To my mind, it would be much better if you submitted the locale data through
http://www.it46.se/localegen/ because then there is a chance your locale data will spread a lot wider.

Of course, your language must have a language code in, for example, ISO 639-3. Is it Sidamo (http://www.ethnologue.com/language/sid) or Sidama?

Janis

Hi,

> Our language, Sidama, uses the Latin script. However, there are additional
> consonants that are written as two-letter combinations. E.g. ph is
> considered one consonant, and so are sh, ch, zh, ts, ny. Each of these
> consonants is ordered after the first letter used to write it. Our alphabet reads:
> a, b, c, ch, d, dh, e, f, g, h, i, j, k, l, m, n, ny, o, p, etc.
>
> As a result, we want to sort strings using this order. E.g. the words
> cala, chala, cola should be sorted as cala, cola, chala, because ch has to
> come after c.
>
> So, how can we include this behavior in LO?

> You need to add the rules to LC_INDEX section of your locale.
> (i18npool/source/localedata/data/sid_ET.xml)

Well, yes, but the IndexKey element is only used for Writer's index
table feature. General sorting uses collation, defaulting to ICU's
Unicode collation rules. If Unicode did not define these exceptions for
'sid', or ICU has not implemented them yet, then we would have to add a
language-specific rule under i18npool/source/collator/data/.
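
To make the target ordering concrete, here is a minimal Python sketch of the
comparison a Sidama collation rule would need to produce. It is not
LibreOffice or ICU code. The digraph list is taken from the original post
(ph, sh, ch, zh, ts, ny, plus dh from the alphabet listing); the function
names are hypothetical, and placing every digraph directly after its first
letter is an assumption derived from the rule stated in the post, since the
full alphabet was not given.

```python
import string

# Digraphs named in the original post; treating each as a single letter.
DIGRAPHS = ["ch", "dh", "ny", "ph", "sh", "ts", "zh"]


def build_alphabet(digraphs):
    """Return a-z with each digraph placed right after its first letter.

    Assumption: this follows the rule stated in the post; the real Sidama
    alphabet may differ in details that were not given.
    """
    alphabet = []
    for letter in string.ascii_lowercase:
        alphabet.append(letter)
        alphabet.extend(sorted(d for d in digraphs if d[0] == letter))
    return alphabet


ALPHABET = build_alphabet(DIGRAPHS)
RANK = {unit: index for index, unit in enumerate(ALPHABET)}


def sort_key(word):
    """Split a word into alphabet units (digraphs first) and rank them."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in RANK:
            # Greedy match: two letters that form a known digraph count as one.
            key.append(RANK[pair])
            i += 2
        elif word[i] in RANK:
            key.append(RANK[word[i]])
            i += 1
        else:
            # Characters outside the alphabet sort after all letters.
            key.append(len(ALPHABET) + ord(word[i]))
            i += 1
    return tuple(key)


words = ["chala", "cola", "cala"]
print(sorted(words, key=sort_key))  # ['cala', 'cola', 'chala']
```

The greedy matching treats every "c" followed by "h" as the single letter ch,
which is also how a contraction in a collation rule would behave; words where
the two letters merely happen to be adjacent would need separate handling.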

> See a working solution (Hungarian) at:
> http://opengrok.libreoffice.org/xref/core/i18npool/source/localedata/data/hu_HU.xml#205

Btw, why does that opengrok instance not understand UTF-8?

  Eike

Hello Andras,

I have filed a bug at [1] with a modified locale for the sort order.

[1] https://bugs.freedesktop.org/show_bug.cgi?id=65809

Regards,