It's a bit cleaner to canonicalize the keys when we read the file names.
This means we can just use the one canonical key, instead of using
the original to get the file and the canonical one to render text and
show the decoding.
This is a rewrite of add_glyphs in third_party/color_emoji. The
primary motivation was to move special aliasing rules out of that
code and use an external aliases file instead. This new version
is a bit more thorough about aliasing, and hopefully a little
easier to read.
The new add_glyphs takes its parameters using keywords, so
the invocation in the Makefile changed (as well as the path to
the tool).
emoji_aliases.txt was extended to add the flag aliases that were
formerly defined in the old add_glyphs code.
add_aliases was modified so the name of the alias file could be
passed in as a parameter to the main utility function that reads
the alias mapping from the file.
The new code expects all glyphs used by the template GSUB tables
to be named in the GlyphOrder table, but doesn't require the cmap
and hmtx tables to be fleshed out. The new code fleshes these out
when it processes the sequences to add. As a result the cmap and
hmtx tables in the template were truncated.
The new code also sorts the GlyphOrder table when it extends/rebuilds
it.
Since subregion flag sequences consist of characters with bidi classes
BN and ON, they can be impacted by bidi. Once again we have the problem
that these are processed in visual order, so we need GSUB rules that
handle them in either direction. All subregion flag sequences
contain U+E007F, so we use that as a trigger for adding the
reversed sequence.
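
The trigger described above can be sketched as follows. This is only an
illustration, not the code in add_glyphs: the function name and the
representation of sequences as tuples of codepoints are assumptions.

```python
TAG_TERMINATOR = 0xE007F  # CANCEL TAG, present in every subregion flag sequence


def with_reversed_tag_sequences(sequences):
    """Return the input sequences plus a reversed copy of each tag sequence.

    `sequences` is assumed to be an iterable of tuples of codepoints.
    """
    result = list(sequences)
    seen = set(result)
    for seq in list(result):
        if TAG_TERMINATOR in seq:
            rev = tuple(reversed(seq))
            if rev not in seen:
                seen.add(rev)
                result.append(rev)
    return result


# England flag: black flag, tag letters g b e n g, cancel tag.
england = (0x1F3F4, 0xE0067, 0xE0062, 0xE0065, 0xE006E, 0xE0067, 0xE007F)
# A non-tag sequence, which should be left alone.
couple = (0x1F468, 0x200D, 0x1F469)
expanded = with_reversed_tag_sequences([england, couple])
```

Only the tag sequence gains a reversed twin, so `expanded` holds three
entries and the last one starts with U+E007F.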
We also need to handle emitting the missing flag glyph for the
reversed sequences.
And we also want to strip out tag glyphs when the context is reversed.
This means the chaining context should include 'E007F' as well.
Instead of writing code to build the additional lookups needed for
subregion missing flag handling, this adds a GSUB table to the
template and lets add_glyphs do its normal thing to the first
GSUB lookup.
Main changes are:
- Uses the correct path to the font when a font is used. With standalone,
also copies the font to a location under the destination directory.
- Canonical sequences are used in text rendered by a font. Chrome handles
these better (though still not perfectly).
- The description column is now named 'Sequence' and shows the
(canonical) codepoint sequence for all emoji. It also continues to show
the component images for sequences.
- The name column now always shows the sequence name using the Unicode
data. Single-character emoji that do not default to emoji presentation
now have their names prefixed by '(emoji)'. Names for the unknown flag
PUA char and for the combining enclosing keycaps char (not technically
an emoji but an emoji component) are special-cased, since they are not
in the emoji sequence name data built by unicode_data.
- update Makefile to include approved GB subregion flags by default
- update flag_glyph_name to generate sequence names for these
- fix bug where the GlyphOrder table wasn't getting updated with
components, which was causing ttx to fail when compiling the
ttx file to ttf in a later phase.
With --skip-if-larger, pngquant writes nothing and returns an error if
it would generate a larger file than the original. Formerly we did not
copy the file in this case, so later operations expecting the file would
fail. No image triggered this, though, so the issue went unnoticed. We
want the smaller of the two files. It's unclear whether later compression
using optipng would do better with the larger quantized file than with
the original unquantized file, but we need to have a file.
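
A minimal sketch of the fallback, assuming the quantizer has already
been run and may or may not have written its output. The function name
and file names are hypothetical; the real fix lives in the build rules.

```python
import os
import shutil
import tempfile


def keep_smaller(original, quantized, dest):
    """Copy the smaller of original/quantized to dest; return the source used.

    If the quantized file was never written (the quantizer skipped it),
    fall back to the original so later steps always find a file.
    """
    if (os.path.exists(quantized)
            and os.path.getsize(quantized) < os.path.getsize(original)):
        source = quantized
    else:
        source = original
    shutil.copyfile(source, dest)
    return source


# Demo: the quantized file is missing, so the original must be copied.
with tempfile.TemporaryDirectory() as tmp:
    orig = os.path.join(tmp, "orig.png")
    quant = os.path.join(tmp, "quant.png")  # deliberately never created
    out = os.path.join(tmp, "out.png")
    with open(orig, "wb") as f:
        f.write(b"x" * 10)
    chosen = keep_smaller(orig, quant, out)
    out_size = os.path.getsize(out)
```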
The tool now optionally takes a '-c' flag to set background colors for
the last emoji set. Colors are 6-digit RGB hex values (as written in CSS
after the '#'). When more than one color is specified, the last emoji set
is repeated across columns, one for each color. '-c' with no arguments
defaults to a set of 11 colors. Omitting the flag uses the standard
background color.
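
The three cases (flag omitted, flag with no arguments, flag with colors)
map naturally onto argparse's nargs='*'. The sketch below is an
assumption about how this could be wired up, not the tool's actual code:
the long option name '--colors', the function name, and the placeholder
palette of 11 greys are all hypothetical.

```python
import argparse
import re

# Hypothetical placeholder palette: 11 greys. The real defaults are not
# given in this message.
DEFAULT_COLORS = ["%02x%02x%02x" % (v, v, v) for v in range(0, 256, 25)]


def parse_colors(argv):
    """Return None (flag omitted), the default palette, or the given colors."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-c", "--colors", nargs="*", metavar="hex",
                        default=None, help="6-digit rgb hex background colors")
    args = parser.parse_args(argv)
    if args.colors is None:
        return None  # flag omitted: caller uses the standard background
    if not args.colors:
        return list(DEFAULT_COLORS)  # bare '-c': use the default set
    for color in args.colors:
        if not re.fullmatch(r"[0-9a-fA-F]{6}", color):
            parser.error("not a 6-digit hex color: %r" % color)
    return args.colors
```

With this shape, `parse_colors([])` returns None, `parse_colors(["-c"])`
returns the 11 placeholder colors, and
`parse_colors(["-c", "ff0000", "00ff00"])` returns the two colors given.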
- includes aliases
- checks coverage of sequences (assumes full coverage of all unicode
emoji and sequences for now)
- reports sequence names
(Some of this code needs to be shuffled into other places; sequence name
lookup and emoji_vs stripping don't belong here since these operations
are more generally useful. That will come.)
Emoji image files we get from upstream sometimes use the 'canonical'
sequences from the unicode data, which can contain emoji variation
selectors in the sequence. For our image data we wish to ignore
variation selectors. This tool renames files in a directory to the
corresponding sequence without emoji variation selectors, so that
other tooling doesn't need to account for them.
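
The renaming rule can be sketched as below. It assumes file names of the
form 'emoji_u' followed by underscore-separated hex codepoints; that
naming convention, the function names, and the choice to skip a rename
when the target already exists are assumptions for illustration.

```python
import os

EMOJI_VS = "fe0f"  # U+FE0F, the emoji variation selector


def strip_vs_name(filename, prefix="emoji_u"):
    """Return filename with fe0f removed from its codepoint sequence.

    Names that do not start with the prefix are returned unchanged.
    """
    stem, ext = os.path.splitext(filename)
    if not stem.startswith(prefix):
        return filename
    parts = stem[len(prefix):].split("_")
    kept = [p for p in parts if p.lower() != EMOJI_VS]
    return prefix + "_".join(kept) + ext


def rename_in_dir(dirname):
    """Rename files in dirname to their variation-selector-free names."""
    for name in sorted(os.listdir(dirname)):
        new_name = strip_vs_name(name)
        if new_name == name:
            continue
        target = os.path.join(dirname, new_name)
        if os.path.exists(target):
            continue  # do not clobber an existing file
        os.rename(os.path.join(dirname, name), target)
```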
This maps a sequence of codepoints to another sequence, where the
first sequence should be an alias to the second.
Initial data contains emoji sequences where we expect an image named
according to the second sequence, but want to support the first
sequence with the same image.
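
A reader for such a mapping might look like the sketch below. The line
format used here ('alias;target' with underscore-separated hex
codepoints and '#' comments) is an assumption, as are the function name
and the sample line; check the alias file itself for the real syntax.

```python
def parse_aliases(lines):
    """Parse 'alias;target' lines into a dict of codepoint tuples."""
    aliases = {}
    for lineno, line in enumerate(lines, 1):
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        src, sep, dst = line.partition(";")
        if not sep:
            raise ValueError("line %d: expected 'alias;target'" % lineno)
        src_seq = tuple(int(cp, 16) for cp in src.strip().split("_"))
        dst_seq = tuple(int(cp, 16) for cp in dst.strip().split("_"))
        if src_seq in aliases:
            raise ValueError("line %d: duplicate alias" % lineno)
        aliases[src_seq] = dst_seq
    return aliases


# Illustrative input only: a family ZWJ sequence aliased to U+1F46A.
sample = [
    "# alias;target",
    "1f468_200d_1f469_200d_1f466;1f46a  # man, woman, boy -> family",
    "",
]
aliases = parse_aliases(sample)
```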
- supports checking files with extensions other than .png
- checks all files under a root directory and not just the
files directly in a directory
- checks for duplicate files in multiple directories under a root
- reports the directory containing a file when there are problems
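
The recursive duplicate check in the list above could be approached as
in this sketch, which walks a root directory and reports any file name
that appears in more than one subdirectory. The function name and the
demo layout are hypothetical.

```python
import os
import tempfile
from collections import defaultdict


def find_duplicates(root, extensions=(".png",)):
    """Map each duplicated file name to the sorted dirs (relative to root)."""
    by_name = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in extensions:
                by_name[name].append(os.path.relpath(dirpath, root))
    return {name: sorted(dirs)
            for name, dirs in by_name.items() if len(dirs) > 1}


# Demo: x.png exists in two directories, y.svg in only one.
with tempfile.TemporaryDirectory() as root:
    for sub, name in (("a", "x.png"), ("b", "x.png"), ("b", "y.svg")):
        os.makedirs(os.path.join(root, sub), exist_ok=True)
        open(os.path.join(root, sub, name), "wb").close()
    dups = find_duplicates(root, extensions=(".png", ".svg"))
```

Keeping the directory alongside each name is what makes it possible to
report where a problem file lives.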
The generated html references images in multiple directories that
might not be in any defined location relative to the html file.
For sharing the results it's convenient if we have the images
and html file under the same parent directory. This option computes
the necessary images and copies them to directories under the directory
into which the html file is written, and makes the html file reference
the files in these new locations.
In addition, this removes some clutter from the generated table by
using the nth-last-of-type pseudo-class selector instead of tagging
all the cells in a column with a class name.
Previously our copy of waveflag took just an input and output filename.
Upstream takes a prefix and one or more input filenames, and prepends
the prefix to each input filename to form the output filename.
The makefile is changed to pass a prefix and the input filename, instead
of the input filename and the output filename as it formerly did.
Unfortunately for us, our inputs have a directory prefix since they're
not in the current directory, and we don't want this prefix in the output
file path. So we tweak our copy of waveflag.c to call basename on the
input file path before we append it to the prefix.
We also make the tool a little less noisy by putting more printfs
under the debug flag.
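
The output-path rule for waveflag described above (take the basename of
the input, then prepend the prefix) is implemented in C in waveflag.c.
The same logic is shown here as a Python sketch for illustration only;
the function name and the paths are hypothetical.

```python
import os


def output_path(prefix, input_path):
    """Prepend prefix to the input's basename, dropping its directory part."""
    return prefix + os.path.basename(input_path)


# Without the basename step this would yield
# "build/waved-third_party/flags/GB.png", which is not what we want.
result = output_path("build/waved-", "third_party/flags/GB.png")
```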