Feature Request - Categories for special characters

Steve_Gruspier · November 13, 2013, 9:28pm

Hello:

I was wondering if this was the place to request a feature. I was thinking the "Special Character" section is very cluttered. My feature request is a setting that would narrow down special characters to ones that are used in specific fields such as "Engineering" or "Physics". Something along those lines that could help people become more efficient using LibreOffice. I am constantly using special characters such as the ohm (uppercase omega) symbol for resistance, particularly when I use LibreOffice to generate tests.

e-letter · November 14, 2013, 10:12am

Hello:

> I was wondering if this was the place to request a feature. I was

Perhaps raise it here for discussion first, then submit it via Bugzilla.

> thinking the "Special Character" section is very cluttered. My feature
> request is a setting that would narrow down special characters to ones
> that are used in specific fields such as "Engineering" or "Physics".

Not a good idea. Suppose 'ε' has different meanings in different
disciplines: the dialogue window would then need a duplicate of each
"special character" for every field, because users would navigate to
their own field of interest and ignore the other fields of knowledge.

Paul16 · November 14, 2013, 1:09pm

Actually, I like the idea. The current Special Characters dialog allows
you to choose a font and a subset. I'm not entirely sure how the subsets
are derived (I'm really not that clued up on all the Unicode
complexities), but they seem to be just a quick way to navigate the
large, complete list of characters.

It might be more useful to have another dropdown that lets you choose a
custom subset of characters, and only show that subset. Default subsets
could be things like "Engineering", "Maths", etc, and you could design
your own. Each subset would just have a list of which characters to
display.

Sounds simple enough (and useful) to me, and I'm not sure I agree
with e-letter's objection above, but as I said, I don't understand the
complexities of things like how changing the fonts might affect this,
etc, so perhaps there are technical hurdles to this.

Would someone with more in-depth knowledge care to comment on the
feasibility of this?

Paul

krackedpress · November 14, 2013, 2:54pm

Who gets to decide which font glyphs get removed? Remember "Special
Characters" is really a list of all of the glyphs that the font being
used has defined.

If I use Arial Unicode, I get all of those glyphs that are actual
letters and such of non-Latin character-based languages, like Asian
languages.

How about the fonts that contain no letters but only images and other
glyphs, like arrows, dingbats, wingbats, and other images that the user
may need?

I use many fonts that are "image only" types in my document creations,
from time to time. There are a vast number of "specialty fonts" that
are designed to give users images instead of letter-style glyphs.

Now the big question - how do you define a new "Special Character"
option like "these are for Engineering and these are for Physics or
Mathematics" that will "know" whether a given font does, or does not,
have those categories of glyphs, for ALL of the 200,000 and more fonts
out there?

That is not something I would attempt. My collection of 200,000+ font
files has such a variety of glyphs, and of placements of those glyphs,
that there is no way to do what you ask unless you require users to use
only a preset set of fonts, and no others, in the "Special Character"
options.

Regina · November 14, 2013, 3:55pm

Hi Steve,

Steve Gruspier wrote:

> Hello:
>
> I was wondering if this was the place to request a feature. I was
> thinking the "Special Character" section is very cluttered. My feature
> request is a setting that would narrow down special characters to ones
> that are used in specific fields such as "Engineering" or "Physics".
> Something along those lines that could help people become more efficient
> using LibreOffice. I am constantly using special characters such as the
> ohm (uppercase omega) symbol for resistance, particularly when I use
> LibreOffice to generate tests.

I do not like the idea of removing characters or grouping them in another way. The Unicode groups are well defined and easy to handle.

But I would like the idea of a user-defined "favorite characters" list or similar, or even more than one, each for a particular topic.

For your problem I can think of these methods:
* Write an AutoText entry or a document that contains all your favorite characters. Open it beside your actual text and use copy and paste to insert the characters.
* Use your OS to insert the character by typing its number; you need a list of numbers beside your keyboard.
* Use a macro to insert a special character. You can connect the macro to a button; I use the character itself as the "name" of the button, so it is shown on the button. That way you can build your own toolbar with your favorite characters. For Writer, such a macro in Basic could look like this (I hope the line ends survive the mail transport):

' Inserts sChar at the current view cursor position of a Writer document.
Sub lcl_InsertCharacter_Writer(ByVal sChar As String)
    Dim oDoc As Variant: oDoc = ThisComponent
    Dim oCurrentController As Variant: oCurrentController = oDoc.getCurrentController()
    ' Stop unless the active document is a Writer text document.
    If Not (oCurrentController.supportsService("com.sun.star.text.TextDocumentView")) Then
        MsgBox("Only for Writer")
        Exit Sub
    End If
    Dim oTextViewCursor As Variant: oTextViewCursor = oCurrentController.getViewCursor()
    Dim oText As Variant
    ' Inside a table the cursor belongs to the cell's text, not the body text.
    If IsEmpty(oTextViewCursor.Cell) Then
        oText = oTextViewCursor.Text
    Else
        oText = oTextViewCursor.Cell.Text
    End If
    oText.insertString(oTextViewCursor, sChar, False)
End Sub

That is the general method, and for each single character you add a small wrapper:

Sub OE_Lower_Ligature
    lcl_InsertCharacter_Writer(Chr(CLng("&H153")))
End Sub

Here &H153 is the number of the character œ: &H is the prefix for a hexadecimal number and 153 is the number itself, as can be seen in the Special Characters dialog.
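
As a cross-check outside Basic, the same hexadecimal number can be looked up with a minimal Python sketch. Python is used purely for illustration here and is not part of the macro; the variable names are arbitrary.

```python
import unicodedata

# Basic's &H153 and Python's 0x153 are the same number written in hexadecimal.
code_point = 0x153
char = chr(code_point)

print(code_point)              # the same number in decimal
print(char)                    # the character itself
print(unicodedata.name(char))  # its official Unicode name
print(f"U+{ord(char):04X}")    # the U+ notation used in Unicode charts
```

The hexadecimal digits are the ones the Special Characters dialog shows for the selected character.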

Kind regards
Regina

doug11 · November 14, 2013, 5:02pm

/snip/
The first suggestion is almost what you can do in WordPerfect. In WP, you can type Ctrl+W and a set of 10 windows opens, each with about 40 special
characters. Then you can highlight one and paste it. You probably only need one "window", so the proposed solution looks very reasonable.
--doug

toki · November 15, 2013, 6:06pm

> Sounds simple enough (and useful) to me, and I'm not sure I agree with e-letter's objection above,

Instead of the current theoretical maximum of 2000 pages to search for a
rarely used glyph, whose position is known, you'd have to search through
25000 pages for a glyph whose position is both unknown and unknowable
to all except the creator of the font palette.

> Someone with more indepth knowledge care to comment on the feasibility of this?

My suggestion is that an extension be made, either by "forking" the
Thunderbird extension _abcTajpu_, or as one that requires the user to
add their 20-40 most-used glyphs.

jonaθon

Paul16 · November 15, 2013, 6:19pm

I really have no idea what you are talking about here...
How does 2000 or 25000 come into it at all? We're simply talking about
being able to filter the list by custom selections, be that their 20-40
most used, "Engineering" symbols, or whatever.

krackedpress · November 15, 2013, 9:34pm

They are talking about Unicode fonts. They could have 2 to 10 thousand
glyphs, depending on which language glyphs are supported.

What you are asking for may be in the "basic" special character sets in
Basic Latin, Latin-1, Latin Extended A and B, among other glyph sets in
a "well rounded" font. There may be 100 to 500 glyphs in those sets in
your "popular" fonts that are used. The sets do have names that are
defined by the "font standards", but I never remember the names or what
goes where.

Paul16 · November 15, 2013, 10:18pm

That still doesn't make any sense. What is this theoretical 2000
page maximum?

And why would the glyph's position be known? That's assuming you know
where the glyph is. Most cases you would only know what it looks like,
but not where it is in the list, hence why you would want some sort of
filter to make it easier to find.

And why would a filter on the special characters mean that you suddenly
need to search through 25000 pages? You would need to search through
*less* characters, not more, because you have filtered the list to only
show a subset.

As I see it, the major problem with this is that changing the font
changes the available special characters. So any subset that was
defined might not have all the characters available for the selected
font, but surely that could be shown quite simply?

Or would certain fonts have certain special characters at different
Unicode locations, i.e. would different fonts have different symbols
for the same Unicode point (or whatever it is called)?

And where does the current list of subsets come from anyway? Is that
defined within the font?

David_Gast · November 15, 2013, 10:08pm

I have two ideas.

1. Highlight the categories, so it is easy to tell where the category starts and ends.
2. Allow some input box so you could type some substring of the characters' names and get
all matching characters. For example, if you typed "equal", all characters with "equal"
in the name would be listed. (I do not know whether the names are i18n or not.)
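
The name search in idea 2 can be sketched with Python's standard `unicodedata` module, which carries the English character names from the Unicode Character Database. This is only an illustration of the idea: the helper name `find_by_name` and the scanned range are assumptions, not anything LibreOffice provides.

```python
import unicodedata

def find_by_name(fragment, start=0x20, stop=0x3000):
    """Return (code point, character, name) for names containing fragment."""
    needle = fragment.upper()  # Unicode character names are upper case
    matches = []
    for cp in range(start, stop):
        name = unicodedata.name(chr(cp), "")  # "" for unnamed code points
        if needle in name:
            matches.append((cp, chr(cp), name))
    return matches

# Show the first few hits for the example query from the post.
for cp, char, name in find_by_name("equal")[:5]:
    print(f"U+{cp:04X} {char} {name}")
```

A real dialog would additionally have to drop the matches that the selected font does not contain.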

I also have a related question. Is there some way to sort using LC_COLLATE=C,
that is, the ASCII character set, rather than en_US.ISO8859-15 or something
similar?

Best regards,

David Gast

krackedpress · November 16, 2013, 3:37pm

Here is a PDF file with two fonts and their named glyph sets shown in
screen clips of the "Insert Special Character" option.

Liberation Serif has 29 named sets within the range of glyph positions.
The second font is Arial Unicode MS, which has 79 named sets within the
same range [plus one more position].

http://LibreOffice-NA.US/special-characters-1.pdf

All glyphs have positions in the list of glyphs. The "space" character
is "U+0020", and the "?" is "U+003F".

Liberation Serif contains various glyphs from "U+0020" to "U+FFFC",
while Arial Unicode MS goes to "U+FFFD".

Arial Unicode MS includes a large collection of glyphs from many
different languages, while Liberation Serif skips most of them.

To be honest, the basic Latin glyphs that use the letters that English,
Spanish, French, etc., use for their languages, reside in just a few
glyph sets. Most fonts have these sets plus some of the glyphs used in
Mathematics and other specialized usage in those languages, as well as
some others. For the fonts that have more glyphs than your "standard"
fonts, they could contain glyphs for non-Latin-based languages and other
special glyphs needed by the user. But then there are those fonts that
do not use the standard of "this glyph goes here" and use their own
specialized glyph sets. Many Calligraphy fonts have additional fonts
that contain combinations of letters that you might see in the "art" of
hand Calligraphy. Also there are those special fonts that are in the
"dingbat", "wingbat", "webdings", and other names of fonts that are
composed of special images in each glyph position. The number 2 could
be an arrow pointing down and slightly to the left, or it could be a
snowflake, or a pumpkin. These fonts will not adhere to the glyph name
set "standards".

. . . . And you thought fonts were easy to understand . . . .

Fonts are easy to use, but the internal information stored within them
(the glyphs, the set information, and a whole bunch more) is something
most people will never know about unless they use font creation software.

The whole point of this posting is that there may be a lot of "things"
that would need to be known and done for a "special character" sorting
or filtering routine that would work for the majority of the fonts
out there. Then there are the pesky ones that will make the routine
fail badly.

To be honest, I am not an expert on fonts. I have a very large
collection of fonts - over 214,000 files in 15.2 GB of drive space
[according to the "properties" info on the folders that contain the font
collections]. I spent years collecting free fonts and looking into
their differences. Once in a while I even work on sorting some more of
them in the folders that are listed by name [for serif and sans serif]
and by font type [calligraphy, dingbats, non-English languages, and many
others].

If you want my opinion, as a person who works with a lot of different
fonts and has over 400 fonts installed on his desktop, the Insert
Special Character option is good as it is. The only thing that might
help a person, who needs to use the "basic sets" of special characters,
would be a printout of the font glyphs. There are many font viewer
packages that will allow you to do this. That way, if you want to use a
special letter with a wavy line above it, you will know where to look
for that character in the glyph list for the font. I do this from time
to time myself.

toki · November 16, 2013, 5:35pm

> That still doesn't make any sense. What is this theoretical 2000 page maximum?

Unicode allows for 1,114,112 different glyphs, excluding variants.
With variants, you are looking at roughly 1,750,000 glyphs.
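
For reference, 1,114,112 is the size of the Unicode code space, counted in code points: 17 planes of 65,536 positions each, most of which are not assigned to any character. A minimal Python check of that arithmetic follows (the constant names are illustrative only):

```python
import sys

PLANES = 17                 # plane 0 (the BMP) plus 16 supplementary planes
POINTS_PER_PLANE = 0x10000  # 65,536 code points per plane

total = PLANES * POINTS_PER_PLANE
print(total)                        # size of the whole code space
print(total == sys.maxunicode + 1)  # Python's highest code point is U+10FFFF
```

How many of those positions a given font covers is a separate question, which is why the dialog contents change with the font.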

> And why would the glyph's position be known?

Taking thorn, for example: with the current setup, one knows to look in
the Runic range. When the sub-range is at the whim of a programmer, it
could be literally anywhere.

> Most cases you would only know what it looks like, but not where it is in the list, hence why you would want some sort of filter to make it easier to find.

Under what circumstances would one be using glyphs they know not the
name of?

> And why would a filter on the special characters mean that you suddenly
> need to search through 25000 pages? You would need to search through
> *less* characters, not more, because you have filtered the list to only
> show a subset.

The claim is that the current filters are inadequate. Thus, the need to
dummy it down, so that it is less efficient, more time consuming, and
awkward to use. But because less glyphs are displayed, Joe Sixpack
thinks it is easier to use.

> As I see it, the major problem with this is that changing the font changes the available special characters.

Obviously that is going to happen. That is the desired and expected
behaviour. To do otherwise would constitute a show-stopping bug of the
highest possible priority to fix.

> Or would certain fonts have certain special characters at different
> Unicode locations, i.e. would different fonts have different symbols
> for the same Unicode point (or whatever it is called)?

Those are variants, and are part of the Unicode specification.
If font creators correctly implemented the full sub-range, those variants
would be included, but since, for various reasons, they do not implement
the full sub-range, the variants are omitted.

> And where does the current list of subsets come from anyway? Is that defined within the font?

The Unicode Specification.

jonathon

Brian_Barker · November 16, 2013, 7:36pm

Many people do not know the name "ampersand". No-one knows what "@" is called. Some people think "~" is a tilde. (Hint: it's a swung dash.)

;^)

Brian Barker

Paul16 · November 16, 2013, 7:55pm

Actually, it *is* a tilde (or at least looks identical to one).

See the section "Keyboards" here: http://en.wikipedia.org/wiki/Tilde

My keyboard only has one such symbol on it that I am aware of, so it
has to be the tilde.

Also see the section "Computing", which talks of it being an
indicator of the home directory on Unix-like OSen. The aforementioned
key on my keyboard is used for this, ergo it must be a tilde.

Beyond that (using Unicode characters, say), I'm not sure I could tell
the difference between them offhand. I'd probably use the tilde for all
the various uses.

Just saying

And hence I'm in complete agreement that people often need a symbol
that they know the look of (or maybe only sort of remember the look
of), but don't know what it is called, or where to look for it.

Paul

Paul16 · November 16, 2013, 8:17pm

>> That still doesn't make any sense. What is this theoretical 2000
>> page maximum?

> Unicode allows for 1,114,112 different glyphs, excluding variants.
> with variants, you are looking at roughly 1,750,000 glyphs.

And so... how does this relate to 2000 pages? Or are you saying this
would all fit on about 2000 pages? And if so, what's the 25000 pages
about?

>> And why would the glyph's position be known?

> Taking, thorn, for example, with the current setup, one knows to look
> in the Runic range.

You might, but that doesn't mean everybody does. Especially when it is
some half-remembered symbol from university physics you are looking
for. Or one seen in some article that you are trying to duplicate.

> when the sub-range is at the whim of a programmer, it could be
> literally anywhere.

Sure, but the assumption is that it is easier to find a glyph by usage
than by name. Say for example U+2126 ("Ω"). Looks a lot like an ohm to
me. But I found it under "Letterlike Symbols". How on earth would I
know to look under "Letterlike Symbols" if I was writing a quick
document about some wiring, and needed to note the wire's resistance.
Surely it would make more sense to look under "Electrical Symbols"?
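
As a side note on this example, Unicode treats the ohm sign as canonically equivalent to the Greek capital omega, which sits in the "Greek and Coptic" range, so a glyph of the same shape can be reached from two different places in the list. A minimal Python sketch using the standard `unicodedata` module shows the relationship (the variable names are illustrative only):

```python
import unicodedata

ohm = "\u2126"  # the character found under "Letterlike Symbols"
print(f"U+{ord(ohm):04X}", unicodedata.name(ohm))

# Canonical normalization replaces the ohm sign with its equivalent letter.
omega = unicodedata.normalize("NFC", ohm)
print(f"U+{ord(omega):04X}", unicodedata.name(omega))
```

A usage-based collection could list either code point, depending on which one the chosen fonts actually contain.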

>> And why would a filter on the special characters mean that you
>> suddenly need to search through 25000 pages? You would need to
>> search through *less* characters, not more, because you have
>> filtered the list to only show a subset.

> The claim is that the current filters are inadequate. Thus, the need
> to dummy it down, so that it is less efficient, more time consuming,
> and awkward to use.

Why on earth would I want to make it *more* complex if it is too
difficult as it is? I can see you clearly don't understand the
suggestion.

> But because less glyphs are displayed, Joe Sixpack thinks it is
> easier to use.

Exactly. So useful for Joe Sixpack, if not for you.

See, it's like this. Currently, you can choose a font, and once you've
done that, you get a list of subsections. The full list of characters
is displayed, but the subsections dropdown allows you to move around in
that quickly *if* you know which subsection your glyph is in. If you
don't, you have a large list to scroll through.

What I'm proposing is another dropdown, let's call it "filters", that
would allow you to display only the glyphs that belong in that filter.
Someone could design a set of filters, let's say an XML file for
each filter, giving the filter name and the list of Unicode characters
that belong in that filter. Somewhere LO looks in a folder for all
these XML files, and populates the filter dropdown with the names of
all the filters. By default "All" is selected, so you see all the
characters in the Special Characters dialog, just as now. If you don't
want to use filters, don't change anything, and it will work just the
same.

If you know, however, that you need the ohm symbol, but don't
know where to find it, you can change the filter dropdown to
"Electrical Symbols", and the subsection dropdown will go blank, and
the list of characters will only show those characters that are defined
in the XML file as belonging to electrical symbols, making ohm easier to
find.

If the font you have chosen doesn't have the ohm sign, it
just won't be in the list of characters in the dialog, at least not
until you change font. Or better yet it will have a red cross in that
box to show the font doesn't include that symbol. And if your font is
webdings or whatever, and the character for ohm doesn't look like an
ohm, then you will get a pumpkin, or whatever, instead of an ohm.

This way there is minimal change to the dialog, no difference in usage
if you don't want a difference in usage, and an easy to use filter
system if you want it. And filters are easy to add. Just drop a new
XML file into the correct folder under the LO installation. LO could by
default come with some common ones, and anybody could make their own
and share them. If the LO ones do what you need, no worries, if not, go
look for some custom ones that are more complete than the LO ones, or
cover other categories, or roll your own.
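
To make the proposal concrete, here is a minimal Python sketch of one such file and of how a dialog might apply it. Everything in it is hypothetical: the `collection` and `char` element names, the `code` attribute, the function names, and the `font_coverage` set standing in for whatever the selected font provides. LibreOffice has no such file format.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection file; the element and attribute names are invented.
ELECTRICAL_XML = """
<collection name="Electrical Symbols">
  <char code="2126"/>  <!-- OHM SIGN -->
  <char code="00B5"/>  <!-- MICRO SIGN -->
  <char code="2127"/>  <!-- INVERTED OHM SIGN -->
</collection>
"""

def load_collection(xml_text):
    """Parse one collection file into (name, list of code points)."""
    root = ET.fromstring(xml_text)
    codes = [int(node.get("code"), 16) for node in root.iter("char")]
    return root.get("name"), codes

def apply_collection(codes, font_coverage):
    """Pair each character with True if the font has it, else False."""
    return [(chr(cp), cp in font_coverage) for cp in codes]

name, codes = load_collection(ELECTRICAL_XML)
# Stand-in for the set of code points the selected font provides.
font_coverage = {0x2126, 0x00B5}
for char, available in apply_collection(codes, font_coverage):
    print(name, char, "ok" if available else "missing from font")
```

Keeping the "missing" entries in the result, rather than dropping them, corresponds to the red-cross variant described above.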

Now does that really sound like it would be *more* complex? Would it
make the number of pages go from 2000 to 25000? Would it leave you at
the whim of the programmer?

I don't think so.

Paul

Regina · November 16, 2013, 8:27pm

Hi,

Brian Barker wrote:

>> Under what circumstances would one be using glyphs they know not the
>> name of?

> Many people do not know the name "ampersand". No-one knows what "@" is
> called.

Unicode has charts where you can look up the characters: http://www.unicode.org/charts/ to search by number, or http://www.unicode.org/charts/charindex.html to search by name.

If you only know what it looks like, try http://shapecatcher.com/

> Some people think "~" is a tilde.

My editor tells me that in the mail as it arrived it is U+007E, and that is "TILDE".
http://www.unicode.org/charts/PDF/U0000.pdf
Perhaps you should send the mail in UTF-8?

> (Hint: it's a swung dash.)

A 'swung dash' is U+2053. Perhaps your font for emails has it; otherwise you will see a placeholder: SWUNG DASH ⁓
"DejaVu Sans" has it.
http://www.unicode.org/charts/PDF/U2000.pdf
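
The official names of several similar-looking characters can also be listed without opening the charts, for example with this minimal Python sketch (the list name `TILDE_LIKE` is just an illustration):

```python
import unicodedata

# The keyboard character, two other tildes, and the swung dash.
TILDE_LIKE = ["~", "\u02dc", "\u0303", "\u2053"]

for char in TILDE_LIKE:
    print(f"U+{ord(char):04X}", unicodedata.name(char))
```

Whether a mail client can display each of them still depends on the font in use.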

Kind regards
Regina

Brian_Barker · November 16, 2013, 8:35pm

>> Some people think "~" is a tilde. (Hint: it's a swung dash.)

> Actually, it *is* a tilde (or at least looks identical to).

The character I typed is mid-height and wouldn't fit over even a lower-case "n", as a tilde needs to.

> See the section "Keyboards" here: http://en.wikipedia.org/wiki/Tilde

Actually, if you read that page generally, it talks of swung dashes and appears to make the distinction I do. But yes: it does think that what you get directly from the keyboard is a tilde.

> My keyboard only has one such symbol on it that I am aware of, so it has to be the tilde.
> The aforementioned key on my keyboard is used for this, ergo it must be a tilde.

I think these just tell you that some people call the swung dash a tilde: I know that, of course, and I don't expect everyone (or indeed anyone) to agree with me. And I have to confess to a bit of cheek in adding a reference to this character to my serious point about "&" and "@". I shall not press my case!

Brian Barker

Paul16 · November 16, 2013, 10:36pm

>>> Taking, thorn, for example, with the current setup, one knows to
>>> look in the Runic range.
>> You might, but that doesn't mean everybody does.

> Only if one has paid absolutely no attention to how glyphs in the font
> are organized. Even a thirty second scan shows that it is ordered by
> writing system.

That assumes, among other things, that the user knows what thorn is. As
an example, let's say the user wants to write a small section about
mathematical sets, and needs the intersection symbol. I just tried to
look that up in the special characters dialog, and had absolutely no
idea where it was. And the list was simply too long to search through
by brute force. Luckily, one of the subsets is helpfully named
"Mathematical Operators". Makes it very easy to find. But why are there
no "Engineering Operators" or "Electrical Operators"? Now what do I do
if I need one of those?

Yes, I realise why there are no such categories defined within Unicode,
that's not my point. My point is if that's what I'm looking for, it
would be handy to have such a subset.

>>> when the sub-range is at the whim of a programmer, it could be
>>> literally anywhere.
>> Sure, but the assumption is that it is easier to find a glyph by
>> usage than by name

> The issue you fail to recognize is that the same glyph can be used in
> any number of different fields, to represent very different concepts
> and meanings.

You are mistaken. I don't fail to recognise that. I am fully aware of
that. And there is nothing stopping the same Unicode character being
included in multiple filters.

Perhaps you don't fully understand how the proposed system works.

>>> The claim is that the current filters are inadequate. Thus, the
>>> need to dummy it down, so that it is less efficient, more time
>>> consuming, and awkward to use.
>> Why on earth would I want to make it *more* complex if it is too
>> difficult as it is? I can see you clearly don't understand the
>> suggestion.

> Take a look at the problems created by the various types of indexing
> methods used for Chinese dictionaries, and why each of those indexing
> solutions is touted as being the best, and thus only system that
> should be used.

Not knowing anything about this, I won't comment. But you tell me, why
exactly is my system "more complex"?

>>> But because less glyphs are displayed, Joe Sixpack thinks it is
>>> easier to use.
>> Exactly. So useful for Joe Sixpack, if not for you.

> Less glyphs displayed means more pages have to be viewed to find the
> appropriate glyph. Which means that in the long run, it will be even
> more awkward for Joe Sixpack.

Only if it's not in the filter he thinks it is in. Chances are he has
at least some idea of what purpose it serves, and so will be able to
find it in an aptly named filter (or think of them as a collection). If
he truly has no idea where it might be, then there is probably no help
for him, short of some sort of sketch-and-search, which would be one
smashing great idea, but is probably technically unfeasible.

> Do that as a user-installable extension.

Sure. No reason not to. That would be a first step. Although I don't
see why it couldn't be part of the core LO, but no, it wouldn't have to
be.

> What happens when the ohm symbol is not in the set of "Electrical
> Symbols"? Joe Sixpack is even more lost than under the current setup.

Well, then he either needs to do an exhaustive manual search, or
download a more complete filter/collection. Or amend it himself.

>> And if your font is webdings or whatever, and the character for ohm
>> doesn't look like an ohm, then you will get a pumpkin, or whatever,

> Then you are back with the mess that fonts were, before most software
> incorporated, and could utilize Unicode.

That has nothing to do with this idea, that problem exists all on its
own. You try webdings in the current Special Characters dialog and tell
me that the problem would be purely in my extension.

> What happens when the programmer omits glyphs because s/he thinks that
> they are so rare/obscure that they will not be used?

As I stated, download a more complete filter/collection, or make one
yourself. Chances are that any official ones would be fairly complete to
start with, but the whole point of my system (which is just how I
envision the OP's enhancement idea) is that it would be extensible.

I really should have called it a collection, rather than a filter, that
might have avoided some confusion.

Paul

Mark_Bourne · November 21, 2013, 6:59pm

David Gast wrote:

> I have two ideas.

Interestingly, Windows Vista's "Character Map" utility (and probably also Windows 7's?) has similar ideas...

> 1. Highlight the categories, so it is easy to tell where the category starts and ends.

Vista's character map has an option to group by Unicode subrange, where only the characters from the selected subrange are shown - as opposed to LibreOffice's current behaviour of jumping to the first character in the range, but giving no easy indication where the range ends. As you suggest, highlighting the range would be similarly helpful.
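
The "show only the selected subrange" behaviour can be sketched in a few lines of Python. The table below lists just a handful of block ranges copied from the Unicode charts, and `BLOCKS`, `block_of`, `chars_in_block` and `font_coverage` are illustrative names, not part of LibreOffice or of Vista's Character Map.

```python
# A few block ranges taken from the Unicode charts; not a complete table.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x16A0, 0x16FF, "Runic"),
    (0x2100, 0x214F, "Letterlike Symbols"),
    (0x2200, 0x22FF, "Mathematical Operators"),
]

def block_of(cp):
    """Name of the listed block containing code point cp, or None."""
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return name
    return None

def chars_in_block(block_name, font_coverage):
    """Only the characters of one block that the font actually provides."""
    for first, last, name in BLOCKS:
        if name == block_name:
            return [chr(cp) for cp in range(first, last + 1)
                    if cp in font_coverage]
    return []

print(block_of(0x2126))  # block of the ohm sign
print(chars_in_block("Mathematical Operators", {0x2229, 0x2126}))
```

Because each block has a known first and last position, the same table would also be enough to highlight where a range starts and ends instead of hiding everything else.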

> 2. Allow some input box so you could type some substring of the characters' names and get
> all matching characters. For example, if you typed equal, all characters with equal
> in the name would be listed. (I do not know if the names are i18n or not.).

Vista's character map does pretty much exactly this. I think the character names are defined in the Unicode standard. Not sure if they're internationalised though. The thing that keeps catching me out with Vista is that after searching, the "Search" button changes to "Reset" - so to do a new search you have to first reset, then type the query string, then search; you can't just type a new query and search for it.

Mark.