UNUM 3.5: Updated to Unicode 15.0.0

Version 3.5 of UNUM is now available for downloading. It incorporates the Unicode 15.0.0 standard, released on September 13th, 2022. Each UNUM release is identified by both its own version number and the version of Unicode it incorporates, hence “3.5-15.0.0”. The update to Unicode adds support for two new scripts (Kawi and Nag Mundari), additional characters for several existing scripts, including 4,193 CJK (Chinese, Japanese, and Korean) ideographs, and 20 new emoji. There are a total of 149,186 characters in 15.0.0, of which 4,489 are new since 14.0.0. (UNUM also supports an additional 65 ASCII control characters, which are not assigned graphic code points in the Unicode database.)
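As a rough illustration of the naming scheme, the Python sketch below splits a release identifier such as “3.5-15.0.0” into its two parts, and works back from the totals quoted above to the number of characters in Unicode 14.0.0. The function name `split_release` is hypothetical and is not part of UNUM, which is written in Perl.

```python
def split_release(identifier):
    """Split an identifier of the form 'UNUM-UNICODE' into its two versions."""
    # The UNUM version comes first; everything after the first hyphen
    # is taken to be the Unicode version.
    unum_version, unicode_version = identifier.split("-", 1)
    return unum_version, unicode_version


unum_version, unicode_version = split_release("3.5-15.0.0")

# Figures quoted in the announcement.
total_in_15 = 149_186
new_since_14 = 4_489
total_in_14 = total_in_15 - new_since_14

print(unum_version, unicode_version, total_in_14)  # 3.5 15.0.0 144697
```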

New or changed character blocks are:

    0x10EC0, 0x10EFF => 'Arabic Extended-C'
    0x11B00, 0x11B5F => 'Devanagari Extended-A'
    0x11F00, 0x11F5F => 'Kawi'
    0x13430, 0x1345F => 'Egyptian Hieroglyph Format Controls'
    0x1D2C0, 0x1D2DF => 'Kaktovik Numerals'
    0x1E030, 0x1E08F => 'Cyrillic Extended-D'
    0x1E4D0, 0x1E4FF => 'Nag Mundari'
    0x31350, 0x323AF => 'CJK Unified Ideographs Extension H'
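The ranges above are inclusive, so finding the block for a code point is a simple range check. The following Python sketch shows one way to do that lookup over just these eight blocks; the names `BLOCKS`, `block_of`, and `block_size` are made up for this example, and it does not reproduce how UNUM itself stores or searches its block table.

```python
# Block ranges copied from the list above: (first, last, name), inclusive.
BLOCKS = [
    (0x10EC0, 0x10EFF, "Arabic Extended-C"),
    (0x11B00, 0x11B5F, "Devanagari Extended-A"),
    (0x11F00, 0x11F5F, "Kawi"),
    (0x13430, 0x1345F, "Egyptian Hieroglyph Format Controls"),
    (0x1D2C0, 0x1D2DF, "Kaktovik Numerals"),
    (0x1E030, 0x1E08F, "Cyrillic Extended-D"),
    (0x1E4D0, 0x1E4FF, "Nag Mundari"),
    (0x31350, 0x323AF, "CJK Unified Ideographs Extension H"),
]


def block_of(code_point):
    """Return the name of the listed block containing code_point, or None."""
    for first, last, name in BLOCKS:
        if first <= code_point <= last:
            return name
    return None


def block_size(name):
    """Return the number of code points reserved for the named block, or None."""
    for first, last, block_name in BLOCKS:
        if block_name == name:
            return last - first + 1
    return None


print(block_of(0x11F00))    # Kawi
print(block_of(0x1D2C5))    # Kaktovik Numerals
print(block_of(0x0041))     # None (Basic Latin is not in this partial table)
print(block_size("Kawi"))   # 96
```

Note that `block_size` counts the code points reserved for a block, which is not necessarily the number of characters actually assigned within it.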

The twenty new emoji added this time are a relatively bland and uncontroversial set including hearts in various colours, pushing hands, a moose, a black bird, and a jellyfish. The Lords of Unicode apparently wanted to avoid the scornado that descended upon them with the introduction of the pregnant man emoji in 14.0.0.

UNUM also contains a database of HTML named character references (the sequences like “&lt;” you use in HTML source code when you need to represent a character which has a syntactic meaning in HTML or which can’t be directly included in a file with the character encoding you’re using to write it). There have been no changes to this standard since UNUM 2.2 was released in September 2017, so UNUM 3.5 will behave identically when querying these references except, of course, that numerical references to the new Unicode characters will be interpreted correctly.
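To illustrate the two kinds of reference mentioned above, here is a short Python sketch that uses the standard library's `html` module rather than UNUM itself. It is only meant to show the difference between a named and a numeric reference; UNUM's own lookup is done in Perl and its interface is not shown here.

```python
import html
from html.entities import name2codepoint

# A named reference: "&lt;" stands for the less-than sign.
print(html.unescape("&lt;"))        # <
print(hex(name2codepoint["lt"]))    # 0x3c

# A numeric reference to the first code point of the Kawi block,
# which is new in Unicode 15.0.0.
kawi = html.unescape("&#x11F00;")
print(f"U+{ord(kawi):04X}")         # U+11F00
```

Decoding a numeric reference only turns a number into a code point. Whether a tool can then report the character's name depends on which version of the Unicode database it carries.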

You can download UNUM from the UNUM Documentation and Download Page. Source code is maintained on and may be downloaded from UNUM’s GitHub repository, https://github.com/Fourmilab/unum.

UNUM is written entirely in standard Perl and will run on any system on which Perl is installed. No Perl add-on modules are required.
