Sort Order in Calc

Tadele_Assefa · June 14, 2013, 5:53am

Dear All,

Our language, Sidama, uses the Latin script. However, there are additional
consonants which are formed from two-letter combinations. E.g. ph is
considered one consonant, and so are sh, ch, zh, ts, ny. These consonants are
ordered after the first consonant used to create them. "Our alphabet" reads:
a, b, c, ch, d, dh, e, f, g, h, i, j, k, l, m, n, ny, o, p, etc.

As a result, we want to sort strings using this order. E.g. the words
cala, chala, cola should be sorted cala, cola, chala: ch has to come after
c.

So, how can we include this behavior in LO?
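
For illustration only, the ordering described above can be sketched outside LibreOffice in Python. This is a minimal sketch, not how LO sorts: the name `sidama_key` is hypothetical, the digraph list contains only the combinations named in this message (the full alphabet is not given here), and it assumes every occurrence of such a pair is a single consonant.

```python
# Digraphs named in the post; each one sorts directly after its first letter.
# Assumption: the list is incomplete, since the alphabet above ends in "etc".
DIGRAPHS = {"ch", "dh", "ny", "ph", "sh", "ts", "zh"}


def sidama_key(word):
    """Return a sort key that treats each digraph as one letter."""
    w = word.lower()
    key = []
    i = 0
    while i < len(w):
        pair = w[i:i + 2]
        if pair in DIGRAPHS:
            # (letter, 1) sorts after (letter, 0) but before the next letter.
            key.append((pair[0], 1))
            i += 2
        else:
            key.append((w[i], 0))
            i += 1
    return key


words = ["chala", "cola", "cala"]
print(sorted(words, key=sidama_key))  # ['cala', 'cola', 'chala']
```

The key maps every letter to a pair, so "ch" becomes ("c", 1) and lands between plain "c" and "d" without needing a full alphabet table.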

timar · June 14, 2013, 6:31am

Hello,

> Dear All,
>
> Our language, Sidama, uses the Latin script. However, there are additional
> consonants which are formed from two-letter combinations. E.g. ph is
> considered one consonant, and so are sh, ch, zh, ts, ny. These consonants are
> ordered after the first consonant used to create them. "Our alphabet" reads:
> a, b, c, ch, d, dh, e, f, g, h, i, j, k, l, m, n, ny, o, p, etc.
>
> As a result, we want to sort strings using this order. E.g. the words
> cala, chala, cola should be sorted cala, cola, chala: ch has to come after
> c.
>
> So, how can we include this behavior in LO?

You need to add the rules to the LC_INDEX section of your locale
(i18npool/source/localedata/data/sid_ET.xml).

See a working solution (Hungarian) at:
http://opengrok.libreoffice.org/xref/core/i18npool/source/localedata/data/hu_HU.xml#205

If you can't submit a patch, please file a bug.

Best regards,
Andras

yaron · June 14, 2013, 7:04am

*The Hungarian LC_INDEX*

  <LC_INDEX>
    <IndexKey phonetic="false" default="true" unoid="charset">A(A, Á)
B C {Cs} D {DZ} {DZS} E(E, É) F G {Gy} H I(I, Í) J-L {Ly} -N {Ny}
O(O, Ó) Ő(Ö, Ő) P-S {Sz} T {Ty} U(U, Ú) Ű(Ü, Ű) V-Z {Zs}</IndexKey>
    <UnicodeScript>0</UnicodeScript>
    <UnicodeScript>1</UnicodeScript>
    <UnicodeScript>2</UnicodeScript>
    <UnicodeScript>3</UnicodeScript>
    <FollowPageWord>p.</FollowPageWord>
    <FollowPageWord>pp.</FollowPageWord>
  </LC_INDEX>

*LC_INDEX for Sidama*

I couldn't find any documentation, but I'm guessing you should first change
the unoid value to charset. What is Sidama's status regarding Unicode?

Yaron Shahrabani

yaron · June 14, 2013, 7:07am

Ignore the Unicode question, you've already answered it

Yaron Shahrabani

Janis · June 14, 2013, 10:38am

Quoting Yaron Shahrabani <sh.yaron@gmail.com>,
Fri, 14 Jun 2013 10:06:49 +0300:

> Ignore the Unicode question, you've already answered it

To my mind, it would be much better if you submitted the locale data through
http://www.it46.se/localegen/ - then there is a chance your locale data will spread a lot wider.

Of course, your language must have its own language code in, for example, ISO 639-3. Is it Sidamo (http://www.ethnologue.com/language/sid) or Sidama?

Janis

erAck · June 14, 2013, 12:31pm

Hi,

> Our language, Sidama, uses the Latin script. However, there are additional
> consonants which are formed from two-letter combinations. E.g. ph is
> considered one consonant, and so are sh, ch, zh, ts, ny. These consonants are
> ordered after the first consonant used to create them. "Our alphabet" reads:
> a, b, c, ch, d, dh, e, f, g, h, i, j, k, l, m, n, ny, o, p, etc.
>
> As a result, we want to sort strings using this order. E.g. the words
> cala, chala, cola should be sorted cala, cola, chala: ch has to come after
> c.
>
> So, how can we include this behavior in LO?

> You need to add the rules to LC_INDEX section of your locale.
> (i18npool/source/localedata/data/sid_ET.xml)

Well, yes, but the IndexKey element is only used for Writer's index
table feature. General sorting uses collation, defaulting to ICU's
Unicode collation rules. If Unicode did not define these exceptions for
'sid', or ICU didn't implement them yet, then we'd have to add a
language-specific rule to i18npool/source/collator/data/
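
To give an idea of what such a language-specific rule might contain, here is a rough Python sketch that prints ICU-style tailoring lines for the digraphs named in the original question. It rests on assumptions: the rule file is taken to use ICU tailoring syntax (`&` to anchor, `<` for a primary difference, `<<<` for case variants), `tailoring_rule` is a hypothetical helper, and the digraph list is only what was mentioned in this thread, not a verified Sidama alphabet.

```python
# Digraphs mentioned in the thread; assumption: not the complete alphabet.
DIGRAPHS = ["ch", "dh", "ny", "ph", "sh", "ts", "zh"]


def tailoring_rule(digraph):
    """Build one ICU-style rule placing the digraph right after its first letter."""
    base = digraph[0]
    # Lowercase, titlecase and uppercase forms differ only at the tertiary level.
    variants = [digraph, digraph.capitalize(), digraph.upper()]
    return "&" + base + " < " + " <<< ".join(variants)


rules = "\n".join(tailoring_rule(d) for d in DIGRAPHS)
print(rules)
```

The first line printed is `&c < ch <<< Ch <<< CH`, which asks the collator to treat "ch" as a separate letter sorting after every form of "c". Whether such rules are needed at all depends on what ICU already ships for 'sid'.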

> See a working solution (Hungarian) at:
> http://opengrok.libreoffice.org/xref/core/i18npool/source/localedata/data/hu_HU.xml#205

Btw, why does that opengrok instance not understand UTF-8?

Eike

Tadele_Assefa · June 17, 2013, 5:14am

Hello Andras,

I have filed a bug at [1] with a modified locale for the sort order.

[1] https://bugs.freedesktop.org/show_bug.cgi?id=65809

Regards,