Fourmilab UNUM 3.6-15.1.0—Unicode Character Database Updated to 15.1.0


Version 3.6-15.1.0 of UNUM is now available for downloading. Version 3.6 incorporates the Unicode 15.1.0 standard, released on 2023-09-12. Each UNUM release is identified by both its own version number and the version of Unicode it incorporates, hence “3.6-15.1.0”. This is a relatively minor update to last year’s 15.0.0 standard: 15.1.0 contains a total of 149,813 characters, of which 627 are new since 15.0.0. (UNUM also supports an additional 65 ASCII control characters, which are not assigned graphic code points in the Unicode database.)
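
As a small illustration of the combined version string, here is a minimal Python sketch that splits it into its two parts and works out the 15.0.0 character count implied by the figures above. The function name `split_unum_version` is hypothetical and is not part of UNUM, which is itself written in Perl.

```python
def split_unum_version(tag: str) -> tuple[str, str]:
    """Split a combined tag such as '3.6-15.1.0' into
    (UNUM version, Unicode version). Hypothetical helper, not UNUM code."""
    # Split on the first hyphen only; the Unicode version follows it.
    unum_version, unicode_version = tag.split("-", 1)
    return unum_version, unicode_version


unum_version, unicode_version = split_unum_version("3.6-15.1.0")

# Character counts quoted in the announcement.
TOTAL_15_1 = 149_813
NEW_IN_15_1 = 627

# Characters implied for Unicode 15.0.0 by the two figures above.
total_15_0 = TOTAL_15_1 - NEW_IN_15_1

print(unum_version, unicode_version, total_15_0)
```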

There is just one new character block:

    0x2EBF0, 0x2EE5F => 'CJK Unified Ideographs Extension I'
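
To show how a block entry like this can be used, the following Python sketch looks up the block containing a code point. It assumes the two numbers are the inclusive first and last code points of the block; the `BLOCKS` table and the `block_of` function are illustrative names only, and UNUM's own Perl lookup may be organised differently.

```python
# Illustrative table holding only the new block from this release,
# as (first code point, last code point, block name).
BLOCKS = [
    (0x2EBF0, 0x2EE5F, "CJK Unified Ideographs Extension I"),
]


def block_of(code_point):
    """Return the name of the block containing code_point, or None."""
    for start, end, name in BLOCKS:
        # Assumption: both ends of the range are inclusive.
        if start <= code_point <= end:
            return name
    return None


# Number of code points the range spans (not all need be assigned).
block_size = 0x2EE5F - 0x2EBF0 + 1

print(block_of(0x2EBF0), block_size)
```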

The new characters are almost entirely new CJK (Chinese-Japanese-Korean) ideographs added to accommodate planned additions to the Chinese national coded character set standard (GB 18030). “The remaining additions to the repertoire extend the set of ideographic description characters, to better enable description of unusual CJK ideographs.”

UNUM also contains a database of HTML named character references (the sequences like “&amp;lt;” you use in HTML source code when you need to represent a character which has a syntactic meaning in HTML or which can’t be directly included in a file with the character encoding you’re using to write it). There have been no changes to this standard since UNUM 2.2 was released in September 2017, so UNUM 3.6 will behave identically when querying these references except, of course, that numerical references to the new Unicode characters will be interpreted correctly.
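
For readers who want to see named and numeric character references side by side, here is a short Python sketch using the standard library's `html` module. It relies on Python's own entity table, not UNUM's database, so it only illustrates the concept and says nothing about how UNUM stores or queries these references.

```python
import html
from html.entities import name2codepoint

# Named reference: "&lt;" stands for the less-than sign.
less_than = html.unescape("&lt;")

# The same name looked up directly in Python's entity table.
lt_code_point = name2codepoint["lt"]

# Numeric reference to the first code point of the new
# CJK Unified Ideographs Extension I block.
new_ideograph = html.unescape("&#x2EBF0;")

print(less_than, hex(lt_code_point), hex(ord(new_ideograph)))
```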

You can download UNUM from the UNUM Documentation and Download Page. Source code is maintained on and may be downloaded from UNUM’s GitHub repository, https://github.com/Fourmilab/unum.

UNUM is written entirely in standard Perl and will run on any system on which Perl is installed. No Perl add-on modules are required.
