A toolbar option that makes it easier to type using Unicode

We need a "Unicode editor" to be implemented in LibreOffice.

It will allow you to select a certain Unicode range, such as a specific
language or script, and then choose from a number of fonts that support
that range, or give the opportunity to download or buy a font that supports
that range.

LibreOffice apps can offer a choice between "type mode" and "Unicode mode."

LibreOffice should come with all free Unicode fonts.

There should be an option on the toolbar that asks you to select a mode, and
you can select from "type" and "Unicode."

After you've selected a mode, there should be a box that asks you to select
a Unicode range, and then another box should ask you to select from a number
of Unicode fonts.

The boxes should say "Select Unicode range" and "Select Unicode font."
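(As an illustrative sketch only, not part of the request: one way a "Select
Unicode range" / "Select Unicode font" flow could be backed is to scan each
candidate font's character map for the requested range. The fontTools library
and the font paths below are assumptions for the sake of the example, not
anything LibreOffice currently ships.)

    # Sketch only: backing logic for a "Select Unicode range" -> "Select
    # Unicode font" flow. Assumes fontTools and some hypothetical font paths.
    import unicodedata
    from fontTools.ttLib import TTFont

    def fonts_covering_range(font_paths, first, last):
        """Return the fonts whose character map covers every assigned
        codepoint in the inclusive range [first, last]."""
        assigned = {cp for cp in range(first, last + 1)
                    if unicodedata.name(chr(cp), None) is not None}
        covering = []
        for path in font_paths:
            cmap = TTFont(path).getBestCmap()   # codepoint -> glyph name
            if assigned.issubset(cmap):
                covering.append(path)
        return covering

    # Hypothetical example: which of these fonts cover the Thai block,
    # U+0E00..U+0E7F?
    candidates = ["/usr/share/fonts/truetype/noto/NotoSansThai-Regular.ttf",
                  "/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf"]
    print(fonts_covering_range(candidates, 0x0E00, 0x0E7F))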

We need a "Unicode editor" to be implemented in LibreOffice.

Cf. Bug 34882
( https://bugs.documentfoundation.org/show_bug.cgi?id=34882 )

It will allow you to select a certain Unicode range, such as a specific
language or script.

The virtual keyboard extension rears its head once more.

For roughly 3 to 500 characters, virtual keyboards are OK. More than that,
and it is simpler to remap the physical keyboard.

The boxes should say "Select Unicode range" and "Select Unicode font."

Select the font then the range.

jonathon

Samiur wrote:

Right now, there are countless Unicode fonts, but only Arial Unicode MS
that's publicly available.

I have no idea what you mean by "only Arial Unicode Microsoft that is
publicly available".
As a Pan-Unicode font, it is still playing catchup with Code2000 --
which is stuck at Unicode 5.2, due to no maintenance since at least 2011.

In terms of ubiquitousness, Google's Noto font family looks better, and
provides far more extensive coverage of Unicode than anything from
Microsoft.
The first major problem with Noto is that it is stuck on Unicode 6.1.
Unicode 9.0 is slated for Phase 3, whenever that begins.
The second, and potentially more serious, issue with Noto is that it is
neither correct nor comprehensive in its support of CJKV. Furthermore,
correct, comprehensive CJKV support is not planned.

(Using roughly 1,000 glyphs in the private plane, an individual or
organization could construct a private IME/font combination that
correctly constructs and displays any glyph in CJKV, regardless of
whether it is formally included in the Unicode specification.)
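(A toy sketch of that parenthetical idea, purely to make it concrete: a
made-up IME table that maps composition sequences, built here from Unicode's
Ideographic Description Characters, to Private Use codepoints that a paired
font would then render. Every mapping below is invented for illustration.)

    # Toy illustration only: a made-up IME table mapping composition sequences
    # (using Ideographic Description Characters, U+2FF0..U+2FFB) to Private
    # Use codepoints in plane 15; a paired font would draw the results.
    PUA_A = 0xF0000   # start of Supplementary Private Use Area-A

    composition_table = {
        ("\u2FF0", "木", "木"): PUA_A + 0,   # invented: left-right composition
        ("\u2FF1", "山", "石"): PUA_A + 1,   # invented: above-below composition
    }

    def compose(keys):
        """Return the private-use character for a composition sequence, if defined."""
        cp = composition_table.get(tuple(keys))
        return chr(cp) if cp is not None else None

    ch = compose(["\u2FF0", "木", "木"])
    print(f"U+{ord(ch):05X}")   # -> U+F0000, rendered only by the paired font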

Users of different scripts and languages need to install extra fonts.

Regardless of provided fonts, users will install extra fonts, either for
utilitarian reasons
(fonts don't contain required glyphs), or aesthetic reasons (the
provided fonts, at best, are uglier than hell.)

Putting all free Unicode fonts in the package will make it easier to type non-Western European characters.

IMEs have far more to do with the ease of using non-Western writing
systems than fonts, and what they contain, or, more commonly, fail to
contain.

You don't need to buy extra fonts

For some parts of Unicode 9.0, there are no FLOSS fonts.

and have the ability to select from multiple unicode fonts that support a certain range.

This would be a useful tool.

jonathon

and have the ability to select from multiple unicode fonts that support a
certain range.

This would be a useful tool.

My idea is mostly about that. I envision allowing a "type mode" (what we now
type with, using typefaces) and "unicode mode" (the new idea) in all
LibreOffice applications.

I, for one, am totally opposed to any perpetuation of this absurd distinction
between "typing" and "unicode" as it only perpetuates the silly idea that
somehow western character sets are "normal" and other scripts are "complex
text". Are Arabic, Japanese, Hebrew, Hindi, Korean, Laotian and
Thai-speaking users not "typing" simply because they use a different
alphabet?

It is, of course, necessary that any/all computing devices support the lower
"ASCII" characters, since this is what operating systems, programming
languages, compilers, et al. understand. That, in itself, is an artifact of
history and not likely to change in any of our lifetimes. But, there is no
reason whatsoever that a modern computer should not be able to easily handle
ANY character one wishes to type - acknowledging of course that the desired
characters must be available in whatever font is desired.

As for "universal" Unicode fonts, that is highly impractical for both size
(such a font would be HUGE) and aesthetic reasons (what the heck does
sans-serif mean in Hangul script anyway? and how does one match a Thai
style to an Arabic one?), but for a user (like me, for instance) who
types using three distinct scripts (not including musical symbols, which
I also use) on a regular basis, there are many completely free fonts
available that cover most arbitrary sets of Unicode planes desired.

Unfortunately, the major suppliers of such fonts are quite lax in providing
information on which scripts are included in any given font they offer. But
it isn't all that difficult to locate a variety.

There is little support for entering non-Latin scripts in many applications
(including MS-Word and LibreOffice Writer - and I say this in spite of all
the "Complex Text Layout" obfuscation in the menus of these and other
examples) - but - using an appropriate "Input Method" (I use iBus for
instance) one can easily switch scripts (language support is an entirely
different matter). To make matters worse, Writer, as those who regularly use
it for multi-lingual writing know, often substitutes fonts that don't
require substitution (at least on Linux - I left the Windows world some
years back), limits the user to ONE additional "complex text language" and
so forth - certainly a legacy of its Star days in spite of the addition of
CTL. There are multi-lingual examples that I can type all on one line in a
terminal or text editor on my machine that Writer chokes on - even when its
CTL is set up. (As a note: for those who use more than one script in a
single document, the CTL facility is best avoided: certain passages that
Writer butchers can be entered in a text editor and pasted (paste special if
regular paste doesn't work) into the document if necessary.)

Having said all that, Writer certainly could use some UI elements that could
help: an indication as to what font is actually being used at any given time
(and no: the displayed font is NOT always the one in use even when single
non-Latin glyphs are used); an indication as to what Unicode plane a
character is in, and so forth. Some of the poster's suggestions would
certainly be welcome. The ability to pick a font based on the unicode ranges
desired would be wonderful - but it strikes me that this really isn't the
responsibility of an application ... Being able to select a font and be
able to determine which planes are implemented might be more practical,
although there are many fonts which "support" a given plane without
including all of its defined glyphs. It's a tough call by any measure.
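(For what it's worth, the "which Unicode plane is this character in"
indicator is cheap to compute; a throwaway sketch, with the plane names taken
from the Unicode standard and the helper itself purely hypothetical:)

    # Throwaway sketch of a "which Unicode plane is this character in" readout.
    # The plane names come from the Unicode standard; the helper is hypothetical.
    PLANE_NAMES = {
        0: "Basic Multilingual Plane (BMP)",
        1: "Supplementary Multilingual Plane (SMP)",
        2: "Supplementary Ideographic Plane (SIP)",
        14: "Supplementary Special-purpose Plane (SSP)",
        15: "Supplementary Private Use Area-A",
        16: "Supplementary Private Use Area-B",
    }

    def plane_of(char):
        plane = ord(char) // 0x10000
        return plane, PLANE_NAMES.get(plane, "other plane")

    print(plane_of("ก"))           # Thai letter: plane 0, the BMP
    print(plane_of("\U0001F600"))  # emoji: plane 1, the SMP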

Sorry for the rant - but Latin characters written left-to-right are not
NORMAL to a large majority of the world; at some point, some bright
developer will generalize this much better than is currently done. Right
now, the operating systems are ahead of the applications as far as I can
see.

Frank, *

CVAlkan wrote:

I, for one, am totally opposed to any perpetuation of this absurd
distinction between "typing" and "unicode" as it only perpetuates the
silly idea that somehow western character sets are "normal" and other
scripts are "complex text". Are Arabic, Japanese, Hebrew, Hindi, Korean,
Laotian and Thai-speaking users not "typing" simply because they use a
different alphabet?

Thank you, well put!

...

Having said all that, Writer certainly could use some UI elements that
could help: an indication as to what font is actually being used at any
given time (and no: the displayed font is NOT always the one in use even
when single non-Latin glyphs are used); an indication as to what Unicode
plane a character is in, and so forth. Some of the poster's suggestions
would certainly be welcome. The ability to pick a font based on the
unicode ranges desired would be wonderful - but it strikes me that this
really isn't the responsibility of an application ... Being able to
select a font and be able to determine which planes are implemented might
be more practical, although there are many fonts which "support" a given
plane without including all of its defined glyphs. It's a tough call by
any measure.
...

Which raises the question, where in the LibreOffice GUI would this belong?
IMEs (like iBUS) are implemented in the OS, but at the application level
within LibreOffice more remains to be done.

As you've noted on multiple occasions the CJK and CTL "modes" in LibreOffice
still need considerable attention ( tdf#96255
<https://bugs.documentfoundation.org/show_bug.cgi?id=92655> ).

As LibreOffice has now implemented the :Emoji: auto-correct glyph
replacements, and the <Alt>+x Unicode toggle, we have now exposed a lot of
users to glyphs contained in the Unicode SMP--which are often not covered by
the default typeface for the script in use, nor by an often broken
fallback--resulting in some interesting visual glitches during editing.
Also, integral to LibreOffice, we need to examine the Unicode page
maps--searching by codepoint names, or even by the graphic composition of
glyphs, requires efficient extraction of coverage.
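(To make those two mechanisms concrete, a small sketch of what they do
conceptually, using plain Python standard-library calls rather than anything
in LibreOffice:)

    import unicodedata

    # <Alt>+x: a typed hex codepoint becomes the character...
    print(chr(int("1F600", 16)))                    # grinning-face emoji, in the SMP
    # ...and toggling back shows the codepoint of the character.
    print(format(ord("\N{GRINNING FACE}"), "04X"))  # 1F600

    # Searching by codepoint name, as suggested for the Special Character dialog.
    print(unicodedata.name("€"))          # EURO SIGN
    print(unicodedata.lookup("BULLET"))   # the character U+2022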

All of which lands us back on LibreOffice's GUI for Special Characters: a UI
element already consistent across the script in use, but which is lacking
some essential features for better composition of multi-script
text--specifically, for managing the tables of Unicode codepoints, and for
searching and displaying typeface coverage of those tables.

A number of suggested enhancements to the Special Character dialog have
already been implemented

https://bugs.documentfoundation.org/show_bug.cgi?id=34882
http://user-prompt.com/libreoffice-design-session-special-character/

But there is much more that could be done to enhance the Special Character
dialog as an application component of LibreOffice for multi-script Unicode,
beyond the core handling of "Western", CJK and CTL scripts, which admittedly
needs developer attention.

But, frankly this discussion does not belong on the User list--rather with
the Design and UX-Advise channel.

Stuart

Stuart:

Thanks for the thoughts.

You state: "A number of suggested enhancements to the Special Character
dialog have already been implemented." I looked at the references you
provided, and find them rather well thought out from a UI perspective, and
look forward to seeing them. As of Version: 5.2.0.0.beta2; Build ID:
ae12e6f168ba39f137fc110174a37c482ce68fa4; CPU Threads: 4; OS Version: Linux
3.19; UI Render: default; Locale: en-US (en_US.UTF-8), which I have at the
moment, they are not present, so I'm wondering when the new "Insert
Character" dialogs will be available so I can try them out.

As for posting to anything other than the User channel, both the on- and
off-line responses to the tdf#96255 bug report that you mentioned indicate
to me that perhaps NIH is a bit too prevalent for my taste (I'm long
retired, and old enough to avoid such nonsense). Although I have done
software development in the past, I'm not at all used to FOSS development,
and getting acclimated to dealing with amateur wannabes (who regularly break
as many things as they fix and don't do the most obvious testing) as well as
pro developers (with whom I feel comfortable - perhaps the middle ground is
just silent?) is too difficult (or at least not worth the aggravation) at my
age.

As for "where in the LibreOffice GUI would this belong?" I suppose that,
architecturally speaking, I would say nowhere - the operating system should
supply the characters, the overlay information (แฟรงค์ โอเบอลี - note the
*separate* characters (not accents or tone marks) above the last letter in
each word) and so forth, while any app should do its own thing with that
(formatting, font choices and so forth). But - that will be a long migration
I expect.
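(A quick way to see the point about those separate characters is to dump the
codepoints of the example text; the general-category value "Mn" marks the
non-spacing marks that sit above their base letters. Plain Python, offered
only as illustration:)

    import unicodedata

    # Each visible "letter" cell below may be several codepoints; "Mn" entries
    # are the separate non-spacing marks Frank is pointing at.
    for word in "แฟรงค์ โอเบอลี".split():
        for ch in word:
            print(f"U+{ord(ch):04X}  {unicodedata.category(ch):2}  "
                  f"{unicodedata.name(ch, 'unknown')}")
        print()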

Specifically though, an indication of "(substitute) font in use" could be
something printed in dark gray (or maybe in red as a warning) just above or
below the Font and Size selectors in the sidebar and that would be eminently
useful. If the substitute were shown in its own selector, that would be
handy for me personally, but I suspect that would cause no end of confusion
for single script users. The reality is that the usefulness of such an
enhancement would be severely limited by the random (?) font substitutions
that now occur.

The Alt+x thing was a fabulous improvement by the way; I wasn't even aware
of it until reading about it earlier last month.

Take care ...

years back), limits the user to ONE additional "complex text language" and

Technically, that is one CTL per style.

IMNSHO, what should be done is to eliminate the CTL, CJKV, Western
Script differentiation, in favour of one language and one writing system
per style. (Japanese, with five different writing systems, would be an
obvious exception. Even then, there would be eight listings for Japanese
--- Braille, Combined, Katakana, Hiragana, Kanji, Romaji, Emoji, and,
arguably, hentaigana.)

single document, the CTL facility is best avoided:

How good or bad the CTL facility is, literally depends upon the writing
system that one is using.

although there are many fonts which "support" a given plane without
including all of its defined glyphs. It's a tough call by any measure.

If you are defining plane the same way that the Unicode Consortium does,
then less than a dozen fonts contain most of the glyphs in one plane.

OTOH, if by "plane", you mean "Unicode Range", then the number of fonts
that support most of the glyphs within that range increases
significantly. The caveat is that CJKV fonts are language orientated,
not writing system orientated. (Rephrasing, a font for Japanese won't
work for Chinese, and neither will work for Vietnamese.)
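(Continuing the earlier fontTools sketch, under the same assumptions: the
"supports a given range without including all of its defined glyphs" caveat
can be quantified by comparing a font's character map against the assigned
codepoints in the range. The font path below is hypothetical.)

    import unicodedata
    from fontTools.ttLib import TTFont

    def range_coverage(font_path, first, last):
        """Fraction of the assigned codepoints in [first, last] that the font maps."""
        assigned = {cp for cp in range(first, last + 1)
                    if unicodedata.name(chr(cp), None) is not None}
        mapped = assigned & set(TTFont(font_path).getBestCmap())
        return len(mapped) / len(assigned)

    # CJK Unified Ideographs block, U+4E00..U+9FFF
    print(f"{range_coverage('/usr/share/fonts/SomeCJKFont.ttf', 0x4E00, 0x9FFF):.1%}")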

Latin characters written left-to-right are not NORMAL to a large
majority of the world;

More languages use the Latin Writing System than any other writing
system. By that criterion, it is normal.

developer will generalize this much better than is currently done.

It probably is too late to get the Unicode Consortium to implement CJKV
correctly. Ditto for Indus Valley writing systems.

With roughly 1,000 glyphs, any ideograph in CJKV could be constructed
correctly, according to the dictates of the language it was being used
for, regardless of how archaic or rare said ideograph is. The downside is
that people would have to know how to correctly write the ideograph.

jonathon

Hi Jonathon:

I agree with almost everything you say, but am not sure about "More
languages use the Latin Writing System than any other writing system. By
that criterion, it is normal." First, I wasn't really intending to use the
word "normal" in that context; it was more intended to reflect "the assumed
default" or something similar. Latin/English needs to be present, of course,
in any computer system for obvious reasons. But wouldn't the phrase "more
languages" be more suitably replaced by "more people?" Not disagreeing with
you - just food for thought. In any case, I guess my real point was that
oodles (technical term) of folks don't use Latin script in their day-to-day
writings and any hurdles they need to jump through annoy me just from an
architectural/design point. I realize that not everyone mixes scripts in the
same document, but it just seems to me that if I can do that in GEdit or
Kate with no yelling and screaming, I feel annoyed when Writer (and most
other word processors) seems to get in the way.

And you are certainly correct that I should have used the term "Unicode
Range."

Regarding Vietnamese, I had the impression that use of ideographs pretty
much died out a few centuries ago - being replaced by the awful quốc ngữ
quasi-Latin script (Cyril and Methodius did a much better job adapting Greek
script to serve for their Russian missionary work - a script now called
Cyrillic - one can only assume Methodius wasn't the main driving force).

I am a big believer in styles, and the idea of having separate styles for
separate attributes is fine, but having to multiply that number by the
number of scripts in use within a document just smacks of Rube Goldberg -
there's got to be a better way. I'm just sorry I'm old enough that I might
never see such a solution. I guess that's why I simply don't like the whole
CTL idea. Your own example of the five systems for Japanese seems to me to
acknowledge at least the idea that we're still seeking the common
denominator.

But, keep up your efforts ...

Frank