For the Unicode comparison page of vendor emoji, Unicode prefers
72x72 images for every supported emoji, with no reliance on
aliasing. This tool generates these from the directory of cleaned
images produced by the emoji build, using the aliases defined in
emoji_aliases.txt.
Previously we didn't put the binary into the repo itself, since it's
built using the tooling. But people who fetch fonts from the get/noto
website want to know more about the version history of the fonts they
find there. Checking the binary into the noto-emoji site will facilitate
this.
This was built locally from the public images using the standard makefile
and zopflipng.
When relying on aliasing, a number of single character emoji can be
replaced by sequence emoji (in particular, gendered variants). If
these images aren't present, the current code that displays a sequence
'visually' fails to find an image for one of the parts, so it bails
and those sequences get no visual presentation.
To fix this, we first canonicalize the part we're looking for, and try
to find an image for that, and if we fail we check for an alias and
try to find an image for that.
Forgot to canonicalize the aliases, so most of them wouldn't get used
because the keys against which they're compared are canonical. Fixed
that.
Also report unused aliases.
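
A rough sketch of the fix, not the actual code: both sides of each
alias are canonicalized so they can match canonical keys, and aliases
whose keys were never consulted are reported. The function names are
hypothetical, and canonicalization is assumed to mean dropping U+FE0F.

```python
EMOJI_VS = 0xFE0F  # emoji variation selector


def canonical_key(seq):
    """Assumed canonical form: the sequence with U+FE0F removed."""
    return tuple(cp for cp in seq if cp != EMOJI_VS)


def canonicalize_aliases(raw_aliases):
    """Canonicalize both the alias key and its target."""
    return {
        canonical_key(src): canonical_key(dst)
        for src, dst in raw_aliases.items()
    }


def unused_aliases(aliases, used_keys):
    """Return alias keys that were never looked up, in sorted order."""
    return sorted(set(aliases) - set(used_keys))
```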
- Support --ignore_missing flag to skip missing data on output.
When all_images is set, this skips sequences for which we have
no image files. When all_images is not set, this skips sequences
for which we have image files but which are not in the canonical
sequence list (e.g. older sequences for which we included skin
tone variants that later versions of Unicode decided should not
exist).
- Use alias information to add alias sequences when not using
all_images and we have an image for the target sequence.
- Use alias information to mark missing images with '-alias-' when
we expect an alias (note: not only when we actually have one).
- Embed tool name, date, and arguments in a comment in the generated
html.
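
The --ignore_missing behavior in the first item can be read as the
filter below. This is a sketch of that reading only; the function name
and parameters are hypothetical and the real code may structure the
check differently.

```python
def sequences_to_output(candidates, have_image, canonical,
                        all_images, ignore_missing):
    """Filter candidate sequences according to --ignore_missing.

    have_image is the set of sequences with image files; canonical is
    the set of sequences in the canonical sequence list.
    """
    if not ignore_missing:
        return list(candidates)
    if all_images:
        # Skip sequences for which we have no image files.
        return [seq for seq in candidates if seq in have_image]
    # Skip sequences that have images but are not canonical.
    return [seq for seq in candidates if seq in canonical]
```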
We currently name the mixed-gender 'kiss' and 'couple with heart'
images after the single-codepoint sequences. But aliasing maps
the single codepoint sequence to the gendered sequence, not the
reverse. As a result the build doesn't create ligatures for the
gendered sequences, since it thinks the image doesn't exist.
Fix this to use the gendered-sequence-names for these images, and
let aliasing work as intended. This follows the convention we've
adopted of letting the name more completely describe the image
contents, and defining how to represent less-specific sequences
using aliasing rather than baking these decisions into each image
name.
We've been inconsistent about use of the variation selector in image names,
and it's cleaner if we just consistently drop it. We use the unicode data
for the full unicode strings for these names now so we don't need it in
the image data.
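
As an illustration of the renaming, here is a small sketch that drops
the variation selector from an image name. It assumes names of the
form emoji_u<hex>_<hex>....png; that pattern and the function name are
assumptions made for the example.

```python
import os


def drop_variation_selector(filename, prefix="emoji_u"):
    """Return filename with any 'fe0f' component removed.

    Names that do not start with the assumed prefix are returned
    unchanged.
    """
    stem, ext = os.path.splitext(filename)
    if not stem.startswith(prefix):
        return filename
    parts = [p for p in stem[len(prefix):].split("_") if p.lower() != "fe0f"]
    return prefix + "_".join(parts) + ext
```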
Formerly the annotations file created a set of sequences that would
cause the name field to display with a special background color. This
change lets you choose one of three colors by defining the 'type' of
annotation in the file. The file format was enhanced and the code using
it now takes the type of annotation into account.
This also adds a sample annotation file with annotations for a number of
situations we currently expect to encounter: missing images that we expect
to be supported by aliases to other images, flags that we expect to not
support, and new unicode 10 emoji that we might not yet have image data
for.
By default, the list of emoji sequences is based on the union of
the sequences encoded in the image file names for all the directories
(or the first directory if --limit was set). The --all_emoji option
uses the emoji sequences from nototools/unicode_data instead.
By default, the list of emoji sequences is in unicode codepoint order.
The --emoji_sort option uses the emoji sequence sort order from
nototools/unicode_data instead.
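
The two orderings can be sketched as below. This is an illustration
under assumptions: sequences are tuples of codepoints, the optional
sort_order list stands in for whatever order nototools/unicode_data
supplies, and ordered_sequences is a hypothetical name.

```python
def ordered_sequences(seqs, sort_order=None):
    """Order emoji sequences for output.

    With no sort_order, tuples compare element by element, which gives
    unicode codepoint order. Otherwise sequences follow sort_order, and
    any sequence missing from it is placed last in codepoint order.
    """
    if sort_order is None:
        return sorted(seqs)
    rank = {seq: i for i, seq in enumerate(sort_order)}
    return sorted(seqs, key=lambda seq: (rank.get(seq, len(rank)), seq))
```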
Along with this, the ordered list of sequences becomes an argument to
write_html_page, which it should have been all along.
It's a bit cleaner to canonicalize the keys when we read the file names.
This means we can just use the one canonical key, instead of using
the original to get the file and the canonical one to render text and
show the decoding.
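
A minimal sketch of canonicalizing at read time, so that one key
serves both for finding the file and for rendering. The file name
pattern (emoji_u<hex>_<hex>....png), the function name, and the
assumption that canonicalizing means dropping U+FE0F are all
illustrative.

```python
import os

EMOJI_VS = 0xFE0F  # emoji variation selector


def read_image_keys(filenames, prefix="emoji_u"):
    """Map canonical codepoint sequences to the files that supply them."""
    result = {}
    for name in filenames:
        stem, _ = os.path.splitext(os.path.basename(name))
        if not stem.startswith(prefix):
            continue  # not an emoji image under the assumed naming
        cps = tuple(int(part, 16) for part in stem[len(prefix):].split("_"))
        key = tuple(cp for cp in cps if cp != EMOJI_VS)
        result[key] = name
    return result
```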
This is a rewrite of add_glyphs in third_party/color_emoji. The
primary motivation was to move special aliasing rules out of that
code and use an external aliases file instead. This new version
is a bit more thorough about aliasing, and hopefully a little
easier to read.
The new add_glyphs takes its parameters using keywords, so
the invocation in the Makefile changed (as well as the path to
the tool).
emoji_aliases.txt was extended to add the flag aliases that were
formerly defined in the old add_glyphs code.
add_aliases was modified so the name of the alias file could be
passed in as a parameter to the main utility function that reads
the alias mapping from the file.
The new code expects all glyphs used by the template GSUB tables
to be named in the GlyphOrder table, but doesn't require the cmap
and hmtx table to be fleshed out. The new code fleshes these out
when it processes the sequences to add. As a result the cmap and
hmtx tables in the template were truncated.
The new code also sorts the GlyphOrder table when it extends/rebuilds
it.
Since subregion flag sequences consist of characters with bidi classes
BN and ON, they can be impacted by bidi. Once again we have the problem
that these are processed in visual order, so we need GSUB rules that
handle them in either direction. All subregion flag sequences
contain U+E007F, so we use that as a trigger for adding the
reversed sequence.
We also need to handle emitting the missing flag glyph for the
reversed sequences.
And we also want to strip out tag glyphs when the context is reversed.
This means the chaining context should include 'E007F' as well.
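
The trigger for adding reversed sequences can be sketched like this.
It only illustrates the U+E007F check on sequences represented as
tuples of codepoints; the function name is hypothetical and building
the actual GSUB rules is not shown.

```python
CANCEL_TAG = 0xE007F  # present in every subregion flag sequence


def with_bidi_reversals(seqs):
    """Return seqs plus the reversed form of each sequence with U+E007F."""
    originals = list(seqs)
    result = list(originals)
    seen = set(originals)
    for seq in originals:
        if CANCEL_TAG in seq:
            reversed_seq = tuple(reversed(seq))
            if reversed_seq not in seen:
                result.append(reversed_seq)
                seen.add(reversed_seq)
    return result
```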
Instead of writing code to build the additional lookups needed for
subregion missing flag handling, this adds a GSUB table to the
template and lets add_glyphs do its normal thing to the first
GSUB lookup.
Main changes are:
- Uses correct path to the font when a font is used. With standalone, also
copies the font to a location under the destination directory.
- Canonical sequences are used in text rendered by a font. Chrome handles
these better (though still not perfectly).
- The description column is now renamed 'Sequence' and shows the
(canonical) codepoint sequence for all emoji. It also continues to show
the component images for sequences.
- The name column now always shows the sequence name using the unicode
data. Single character emoji that are not default emoji presentation
now have their names prefixed by '(emoji)'. Names for the unknown flag
PUA char and for the combining enclosing keycap char (not technically
an emoji but an emoji component) are special-cased, since they are not in
the emoji sequence name data built by unicode_data.