Font versus Character Substitution

Hi:

I’ve been fighting with multiple language documents in LibreOffice for some
time, and am determined to understand what’s going on. To that end I have a
few questions and some thoughts. My intent is either to have someone point
out that I’m doing it all wrong, or to start some sort of conversation about
how to improve things. So here goes …

According to the Document Foundation Wiki’s Localization Guide for
LibreOffice, versions of LibreOffice (and I’m particularly concerned with
Writer here) running on Unix/Linux rely on the fontconfig utility to suggest
a font replacement when a requested font is not available. Apparently
LibreOffice uses its own routines under Windows.

First Question: Is this still true in current versions?

According to the documentation (man pages and such) for FontConfig, that
utility matches fonts using a very specific set of font characteristics that
are weighted in the following order: foundry, charset, family, lang, etc.
With the advent of “free” fonts over the past decade, I’ve found that many –
even most – such fonts have no foundry listed. Since that is the most
heavily weighted attribute in the matching process, and since null is never
equal to null, the matches returned are sometimes inappropriate.

Second Question: Is FontConfig called by LibreOffice only when a complete
font is missing, or if the font in question just doesn’t have some
particular character(s)?

This is particularly of interest with fonts having Complex Text Layouts.
Although it is possible to set a CTL replacement for fonts in Writer (e.g.
Tools | Options | LibreOffice Writer | Basic Fonts (CTL), Format | Character
and so forth), this is always limited to a single font, which isn’t always
realistic.

As an example, let’s say we go to Tools | Options | Language Settings |
Languages, check off “Complex Text Layout” and specify the language as Thai
(again, this can only be done for one language at a time). If we use a font
such as FreeSerif that has all the characters needed for both English and
Thai (as well as most other languages you’re likely to run across),
everything works just fine.

On the other hand, if we use a free Google font such as Droid Sans (which
has no foundry property), we get into trouble. Apparently in order to keep
fonts to a reasonable size, Google offers additional unicode pages as
separate font files: Droid Sans Armenian, Droid Sans Ethiopic, Droid Sans
Thai, etc. If the current font is Droid Sans and some Thai characters are
typed, fontconfig does NOT return Droid Sans Thai, but rather (seems to)
return the first font in which it locates the requested language.
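
One way to see what fontconfig itself suggests in this situation is to ask it directly with the fc-match command-line tool, which accepts a pattern of the form "family:property=value". The sketch below only builds such a pattern and runs fc-match if it happens to be installed. The helper names fc_pattern and best_match are made up for illustration, and nothing here confirms that LibreOffice issues this exact query internally.

```python
import shutil
import subprocess


def fc_pattern(family, lang=None):
    """Build a fontconfig pattern such as 'Droid Sans:lang=th'."""
    # Hypothetical helper; only the 'family:lang=xx' pattern syntax is fontconfig's.
    # Family names containing '-', ':' or ',' would need escaping, not handled here.
    return family if lang is None else f"{family}:lang={lang}"


def best_match(family, lang=None):
    """Return fc-match's one-line answer, or None if fc-match is not installed."""
    exe = shutil.which("fc-match")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, fc_pattern(family, lang)],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip() or None
```

Calling best_match("Droid Sans", "th") on the affected machine should show which file and family fontconfig picks for Thai text when Droid Sans is requested, which can then be compared with what Writer actually displays.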

Of course, this can be corrected for ONE font with Tools | Options |
LibreOffice Writer | Basic Fonts (CTL), but if more than one font is used
and neither is a so-called “pan-unicode” font, you must go through some
annoying machinations to get things working correctly (this can be done with
character styles, of course, but that can’t be considered anything other
than a work-around). Character styles were really not meant for this –
that’s what selecting a font is supposed to do.

All of this tends to make creation of bi-lingual documents, particularly
those using CTL, less flexible than normal single language document
creation – sprinkling a document with multiple fonts is of course
generally considered a bit tacky, but there are legitimate reasons for doing
so. Creation of documents that use more than two languages (and yes, I’ve
done that) is even more tedious; if more than one CTL language is used (e.g.
mixing Arabic and Hebrew, or Thai and Arabic) it’s a very difficult, if not
impossible task. Could this perhaps be the real underlying reason a
middle-east peace agreement can’t be reached? Is it just because there’s no
way to write it down? (Sorry – I know this isn’t the place for politics, but
the problem seems real enough to mention).

Even officially sanctioned free fonts can be problematic. The government of
Thailand, for instance, in an effort to bring some standardization to
official documents and avoid font piracy, has specified a set of thirteen
customized free fonts – SIPA.zip contains these – but none of these has any
foundry property defined. They are quite suitable for documents mixing Thai
and English, but don’t help if some third language is introduced into a
Writer document as well, particularly if the third language also requires
CTL. That isn’t actually far-fetched, as they have neighbors that use CTL
scripts.

Has anyone given much thought to permitting finer-grained control of CTL
font substitution – e.g. making the CTL font substitutions page similar to
the Options | LibreOffice | Fonts page used for general font substitution of
missing fonts. In other words, permitting multiple fonts to have
substitutions defined? And then also reversing the order of the Complex Text
Layout section of Tools | Options | Language Settings | Languages from
[CTL=On, CTL Language=Whatever] to [Language=Whatever and then for each
language CTL=On]?

I apologize if this sounds like a rant; it's not intended to be - it's just
a stream of consciousness thing caused by frustration at trying to
understand this whole subject ...

Thanks in advance for any comments, suggestions, and so forth.

Are you using the language block at the bottom of Writer to change language when writing a new paragraph in a different language?

It should be displaying your default language until you change it.

Hope this helps.

Also you can try to create your own styles for each language you are using.

    Selecting a language for a Paragraph Style

1. Place the cursor in the paragraph whose paragraph style you want
    to edit.
2. Open the context menu and select *Edit Paragraph Style*. This
    opens the *Paragraph Style* dialog.
3. Select the *Font* tab.
4. Select the *Language* and click *OK*.
    All paragraphs formatted with the current paragraph style will
    have the selected language.

https://help.libreoffice.org/Writer/Creating_New_Styles_From_Selections

Hope this helps.

What was described happens even when language-specific styles are
used, but the font(s) associated with the style omits the required glyphs.

jonathon

Paul:

I appreciate the comments, but such a solution would, in effect, prohibit
mixing languages within a single paragraph, sentence, or even line,
something that isn't all that uncommon even in a mostly single language
document. For example "The sense of the word XYZ (in some other language
with a different character set) isn't quite the same as its definition would
lead you to believe." It's this sort of thing that is a problem with the way
languages, fonts, and characters are handled.

There is no doubt whatsoever that things have improved dramatically in this
regard since the early days of the PC, but at some point we need to finish
the process begun with code pages, refined by Unicode and other such
improvements. What better project to make these advances than LibreOffice?
(Of course, some serious Writer bugs need to be fixed as well, but I'm
greedy!!)

Are there any L10n folks who want to chime in? Should this thread be
reposted or copied to that area? Are there any contributing developers who
are fluent in any of the CTL languages? Developers who aren't familiar with
at least some multi-lingual issues probably have a huge hurdle to overcome
before diving in.

I would be happy if we could come up with a decent document showing what
LibreOffice support should be for Complex Text Languages.

Frank (แฟรงค์)

Paul:

/snip/

Frank (แฟรงค์)

Just out of curiosity, what is that? Thai? Pretty sure it's not Korean.

--doug

First Question: Is this still true in current versions?

I _think_ so.

Since that is the most heavily weighted attribute in the matching process,
and since null is never equal to null, the matches returned are sometimes
inappropriate.

Sometimes? I'd suggest that the returns are always inappropriate.

On the other hand, if we use a free Google font such as Droid Sans (which has no foundry property), we get into trouble.

The scenario you describe is precisely why I have said since at least
2001, that the only way to ensure that the glyphs are right, is to use
language specific styles. You can not get away with setting English,
Arabic, and Korean in the same style, and have things show up correctly.
That works if, and only if one uses a pan-Unicode font for all three
languages and writing systems.
If you want Arabic to display properly, then all three fonts in the
style have to be the Arabic font. If you want Korean to display
properly, then all three have to be set to the same font.
This applies regardless of language & writing system combination you are
using.

return the first font in which it locates the requested language.

The order of precedence is Writing System, then specific glyphs, with a
slight preference for the precomposed glyph, even if you originally did
not use a precomposed glyph.

FWIW, you need to turn both CJKV & CTL on, regardless of what writing
system is being used --- even if writing English. ( "æ", and similar
letter constructions require both CTL & CJKV features.)

Character styles were really not meant for this – that’s what selecting a font is supposed to do.

Character styles fill in the holes that paragraph styles create.
Also useful for one or two words in a different language, or the same
language, but a different writing system.

Has anyone given much thought to permitting finer-grained control of CTL font substitution

IMNSHO, the optimal solution would be to have styles be both language
and writing system dependent.

– e.g. making the CTL font substitutions page similar to the Options |
LibreOffice | Fonts page used for general font substitution of missing
fonts. In other words, permitting multiple fonts to have substitutions
defined?

The "bad" workaround is to create a set of character styles, that have
the correct font for each set of glyph(s), and then write a macro that
basically changes each glyph in the document to that of the character
style for the glyph.
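
The core of such a macro is deciding which character style each stretch of text should get. A rough sketch of that step follows, assuming only three scripts and invented style names ("CS Thai" and so on). The code point ranges are the basic Unicode blocks for those scripts and are deliberately incomplete. Applying the resulting styles inside Writer is left out entirely.

```python
# Basic Unicode blocks for a few scripts; deliberately incomplete, illustration only.
SCRIPT_RANGES = {
    "Hebrew": (0x0590, 0x05FF),
    "Arabic": (0x0600, 0x06FF),
    "Thai": (0x0E00, 0x0E7F),
}

# Hypothetical character style names; use whatever the document defines.
STYLE_FOR_SCRIPT = {
    "Hebrew": "CS Hebrew",
    "Arabic": "CS Arabic",
    "Thai": "CS Thai",
}
DEFAULT_STYLE = "CS Western"


def style_for_char(ch):
    """Pick a character style name from the code point of a single character."""
    cp = ord(ch)
    for script, (lo, hi) in SCRIPT_RANGES.items():
        if lo <= cp <= hi:
            return STYLE_FOR_SCRIPT[script]
    return DEFAULT_STYLE


def style_runs(text):
    """Split text into (style, run) pairs; whitespace stays with the current run."""
    runs = []
    for ch in text:
        if runs and ch.isspace():
            style = runs[-1][0]
        else:
            style = style_for_char(ch)
        if runs and runs[-1][0] == style:
            runs[-1] = (style, runs[-1][1] + ch)
        else:
            runs.append((style, ch))
    return runs
```

For a string that mixes Latin and Thai text, style_runs returns alternating "CS Western" and "CS Thai" runs, and a macro would then apply each style to the matching text range.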

I apologize if this sounds like a rant; it's not intended to be

But it is bringing up some non-optimal things that have been around since
LibreOffice 0.x was released.

jonathon

You are correct. It is Thai, or at least a Thai would read it as "Frank" so I
guess it's more correctly "Frank" transliterated into Thai characters.

Well, you are correct; I should have said that it isn't possible to
[conveniently] mix the two. See my answer to the message below.

Hi again, Jonathon:

Re: Sometimes? I'd suggest that the returns are always inappropriate.
-- I was just trying to be polite and accept that I'm possibly more picky
than many.

Re: The order of precedence is Writing System, then specific glyphs, with a
slight preference for the precomposed glyph, even if you originally did
not use a precomposed glyph. FWIW, you need to turn both CJKV & CTL on,
regardless of what writing
system is being used --- even if writing English. ( "æ", and similar
letter constructions require both CTL & CJKV features.)
-- Where did you get that order of precedence? I got what I recorded from
the man page. Or are you talking about LibreOffice's internal order of
precedence. That's why I wanted to confirm that LO ignored its own
precedence on Linux systems. If that's the case, it's another path for
inconsistencies in documents to appear.

Re: IMNSHO, the optimal solution would be to have styles be both language
and writing system dependent.
-- Well, I sort of agree, but my opinion is that you have to have a good set
of formatting primitives (can't think of a better word for this) before you
can build styles --- Styles in that view being a way to insure consistent
use of an identical grouping of primitives.

Re: Character styles fill in the holes that paragraph styles create. Also
useful for one or two words in a different language, or the same language,
but a different writing system.
-- Your use of the term "fill in the holes" suggests to me that we are
perhaps in agreement that some refinements need to be made to the underlying
approach. I suspect we both want the fonts, characters, and so forth to be
"styled" the way we want regardless of how many times we might switch
languages or writing systems in the same sentence (unusual, but it happens
often enough that it got me riled up enough to begin whining).

Re: You can not get away with setting English, Arabic, and Korean in the
same style ... etc.
-- You're right, you can't RIGHT NOW. But I think we should redesign the
interfaces to make that more transparent. We both know it CAN BE DONE
although we accomplish it with different hacks - it's just that I think we
shouldn't need to go through that.

Re: FWIW, you need to turn both CJKV & CTL on, regardless of what writing
system is being used --- even if writing English. ( "æ", and similar letter
constructions require both CTL & CJKV features.)
-- If someone had asked me about digraphs like that I don't think I would
have known the answer, so I've learned something at least. It makes sense; I
just never used these before.

Frank

-- Where did you get that order of precedence?

Experience.
I first stumbled across it, when either « ʼn » or « ŋ » was coming up wrong.
Took some examination of the fonts I had, to figure out what was happening.

One day, when I was feeling extremely venturous, I looked, to no avail,
through the source code, trying to determine where it looked for the
writing system.

That's why I wanted to confirm that LO ignored its own precedence on Linux systems.

FWIW, On extremely rare occasions, it does the same thing. I think it
might be related to glyphs in the private part of Unicode.

If that's the case, it's another path for inconsistencies in documents to
appear.

+1

Well, I sort of agree, but my opinion is that you have to have a good set
of formatting primitives (can't think of a better word for this) before you
can build styles --- Styles in that view being a way to insure consistent
use of an identical grouping of primitives.

+1

Also need to add a couple of more formatting primitives. (Rotate base
line 90 degrees, Rotate base line 270 degrees, Rotate base line 180
degrees, Boustrophedon, Reverse Boustrophedon. I've forgotten the rest.)

But my working assumption is that issues with writing systems will be
fixed "real soon now".

Re: You can not get away with setting English, Arabic, and Korean in the
same style ... etc.
-- You're right, you can't RIGHT NOW. But I think we should redesign the
interfaces to make that more transparent. We both know it CAN BE DONE
although we accomplish it with different hacks

First thing that is needed, is for LibO to provide an indication that
the glyph is not found in the font used by the style, and a substitute
is being used.
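
The check itself is simple once the font's coverage is known. Here is a minimal sketch, assuming the set of code points a font covers has already been obtained somehow (in practice it would come from the font's character map). The LATIN_ONLY set is an invented stand-in for an imaginary Latin-only font, not data from any real one.

```python
def missing_glyphs(text, covered):
    """Return the characters in text, in first-seen order, not in the covered set."""
    missing = []
    for ch in text:
        if ch.isspace():
            continue  # whitespace rarely needs a visible glyph
        if ord(ch) not in covered and ch not in missing:
            missing.append(ch)
    return missing


# Assumed coverage for an imaginary Latin-only font: printable ASCII only.
LATIN_ONLY = set(range(0x20, 0x7F))
```

Any non-empty result for a run of text means a substitute font is being used for at least those characters, which is the point at which a warning could be shown.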

Just realized that that could be done as an extension. It would need to
check the list, character, paragraph, and cell styles.
The same issues crop up in multilingual spreadsheets, as in multilingual
text.
I don't know if it is an issue when using Base.
Using Draw to create text documents has its own set of problems.
(Text documents using Reverse Boustrophedon writing systems can only be
created by using Draw.
It is a toss up whether Writer or Draw produces better looking text
documents, for Boustrophedon writing systems. Either way, they are
painful to create.)

it's just that I think we shouldn't need to go through that.

Because of the way the LibO code is written, language has to be declared
in the style. Basically, this is so that grammar checking,
spell-checking and related functions work as expected.

For styles to consistently work as expected, I think that they have to
be writing system dependent. At a minimum, that helps with the missing
glyphs from the font issue.

jonathon

If it's really not possible to mix languages in the same paragraph, then it's
strange that Character Styles have the option to select font languages for
Asian and CTL.

Sorry if this makes no sense; I have never used Asian or CTL languages.

Miguel Ángel

Hi, thanks for this. I hadn't noticed it before (or, at least, I hadn't realised that the input language could be changed here). However, I'm using Windows 7 and I have Russian and Greek Polytonic keyboards enabled, as well as some custom variants of English to use for transcriptions from ancient language scripts. All of these are available from the Windows toolbar, but they don't seem to be accessible via the language block at the bottom of Writer. Should they be? Do I have to enable them somehow in LO? Or is this for something else?
Thanks,
/Gary

Gary:

Keep in mind that Languages, Scripts, and Keyboards, although obviously
having some relationship, are fairly independent. I'm still trying to sort
everything out, particularly in Writer (which seems to tie these together in
ways and/or for reasons that I don't understand).

Unfortunately I haven't used Windows for quite some time, and I've been told
that some of these relationships are different in Linux and Windows versions
(particularly detecting which fonts cover which scripts and so forth). So I
can't offer any specific help for your situation.

However - I'm attempting (in spite of significant time constraints) to
gather all I can find in one place, as I'm pretty sure some of the Writer
documentation is incorrect or at least misleading (likely because it hasn't
been touched for many years) so please post back any further discoveries,
observations, or solutions you find ...

Frank

On April 16, 2015 4:44:11 AM PDT, Gary Collins wrote:

transcriptions from ancient language scripts.

If those languages are not listed in >Settings >Fonts >CTL, or >Settings >Fonts >Western, as appropriate, LibO will not handle them correctly. It will randomly substitute your correct glyph from your correct font, with some strange glyph from who knows where.
(It even does that when writing medieval English using thorn, and the other obsolete letters of the English alphabet.)

I think Ancient CJKV work fine, provided you have the appropriate font, _and_ the glyphs are in the Unicode Base Plane. (I don't remember if LibO supports non base plane glyphs.)
(I don't remember if the mu wang manuscript I was reading in LibO used images, or a font.)

I'd suggest filing an RFE for all ancient languages you use, if there is an ISO code for them, and also if they aren't yet supported in settings.

You'll need complete locale data for each language.
I'm assuming that there is an ISO 639 code for the language.

jonathon

I can get the characters I need from the fonts I have, no problem (usually - there have been glitches). What I was saying is this:

When I change input language, I change it using the keyboard/language selector on the Windows taskbar. In the post I was responding to, it was suggested that the language selector button (or whatever it's called) at the bottom of the Writer window could be used to change input language.

However, when I try to use that button, it doesn't present the range of choices that I have from the Windows taskbar.

From the Windows taskbar language icon, I can choose from Russian, Greek (polytonic) or English. From the adjacent keyboard icon (with English language selected) I can choose from United Kingdom, Akkadian United Kingdom Extended or United Kingdom Extended - Latin.

From the language button at the bottom of the Writer window, the only choice I have is between English (UK) and English (USA). I don't know if it should offer me the same languages as the Windows taskbar, or if its purpose is more limited.

/Gary

These are very different things, I think. You need both.

o The keyboard choices you have enabled in Windows (and which can be selected in the Windows taskbar) can indeed be described as "input languages", since they govern the relationship between the keys you press and the corresponding characters that are transmitted to whatever application you are using (or to Windows itself). Changing this choice potentially modifies the character that you will see when you press any key.

o Within a text document (e.g. in LibreOffice Writer), you may want to use a spelling checker and a thesaurus and to have automatic hyphenation. To complete any of these tasks, the application needs to know which language you mean the text to be in - and so how to treat it. You may also wish to set the language of some text as "[None]" in order to disable these processes. You can set the language in Writer in various ways, since language is a character property, a character style property, and a paragraph style property (but not a paragraph property, although an entire paragraph can be given a language setting using the character property, of course). The indication in the Status Bar is of the effect of all of these settings on the current selection or at the cursor position.

If you, say, wish to start typing in Russian in an otherwise English document, you will need to change keyboard setting in Windows and will also wish to change the language setting in Writer. You can do this through a context menu from the Status Bar indication; you may well see only a couple of languages there, but the full set is available via the More... item - leading to the Font tab of the Character dialogue. You will certainly find Greek and Russian there. Note, though, that you may well prefer to set the language property using either character styles or paragraph styles, rather than using the direct character formatting provided through the Status Bar facility.

I trust this helps.

Brian Barker

Thanks, Brian, that makes perfect sense. I just wanted to be sure that I wasn't doing something "wrong," or, at least, inefficient. I don't tend to use spell check very often, so I can just stick with changing the input on the Windows taskbar.
/G.