
Mapping of Unicode character planes

Article Id: WHEBN0027370523

Title: Mapping of Unicode character planes  
Author: World Heritage Encyclopedia
Language: English
Subject: UTF-8, Comparison of Unicode encodings
Publisher: World Heritage Encyclopedia

In the Unicode standard, planes are groups of numerical values (code points) that point to specific characters. Unicode code points are logically divided into 17 planes, each with 65,536 (= 2^16) code points. Planes are identified by the numbers 0 to 16 (decimal), which correspond to the possible values 00–10 (hexadecimal) of the first two positions in the six-position format (hhhhhh). As of version 6.1, six of these planes have assigned code points (characters), and are named.
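
As a minimal sketch of this division, the plane number of a code point can be obtained by discarding its low 16 bits, which is the same as reading the first two digits of the six-position hexadecimal form. The function name `plane_of` is chosen here for illustration and is not part of any standard library.

```python
def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) of a Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # Each plane holds 2**16 code points, so the plane number is
    # whatever remains after dropping the low 16 bits.
    return code_point >> 16


print(plane_of(ord("A")))   # 0  (U+000041, Basic Multilingual Plane)
print(plane_of(0x1F600))    # 1  (U+01F600)
print(plane_of(0x2A6DF))    # 2  (U+02A6DF)
print(plane_of(0x10FFFF))   # 16 (last code point of the last plane)
```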

Currently, about ten percent of the potential space is used. Furthermore, ranges of characters have been tentatively mapped out for every current and ancient writing system (script) the Unicode Consortium has been able to identify.[1] While Unicode may eventually need to use another of the spare 11 planes for ideographic characters, other planes remain. Even if previously unknown scripts with tens of thousands of characters are discovered, the limit of 1,114,112 code points is unlikely to be reached in the near future. The Unicode Consortium has stated that the limit will never be changed.[2]

The odd-looking code point limit (it is not a power of 2) is due to the design of UTF-16. In UTF-16, a "surrogate pair" of two 16-bit code units is used to encode the 2^20 code points in planes 1 to 16, in addition to the use of a single code unit to encode plane 0.[3] The limit is not due to UTF-8, which was designed with a limit of 2^31 code points (32,768 planes) and can encode 2^21 code points (32 planes) even if limited to 4 bytes.[4]
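
The arithmetic behind these figures can be checked with a short sketch. The constant names are illustrative only. The UTF-8 bit counts assume the usual byte layout: a lead byte followed by continuation bytes that each carry 6 payload bits, with 3 payload bits in the lead byte of a 4-byte sequence and 1 payload bit in the lead byte of the original 6-byte form.

```python
PLANE_SIZE = 2**16

# UTF-16: plane 0 uses a single 16-bit code unit; planes 1 to 16 use a
# pair made of one high surrogate and one low surrogate.
HIGH_SURROGATES = 0xDBFF - 0xD800 + 1          # 1,024 values
LOW_SURROGATES = 0xDFFF - 0xDC00 + 1           # 1,024 values
PAIR_CODE_POINTS = HIGH_SURROGATES * LOW_SURROGATES
TOTAL_CODE_POINTS = PLANE_SIZE + PAIR_CODE_POINTS

# UTF-8 payload bits, under the byte layout assumed above.
UTF8_BITS_4_BYTES = 3 + 3 * 6                  # 21 bits
UTF8_BITS_6_BYTES = 1 + 5 * 6                  # 31 bits

print(PAIR_CODE_POINTS == 2**20)               # True
print(TOTAL_CODE_POINTS)                       # 1114112
print(TOTAL_CODE_POINTS // PLANE_SIZE)         # 17
print(2**UTF8_BITS_4_BYTES // PLANE_SIZE)      # 32
print(2**UTF8_BITS_6_BYTES // PLANE_SIZE)      # 32768
```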

Sometimes, the terms “astral plane” and “astral characters” are used informally to refer to the planes above the Basic Multilingual Plane (planes 1–16) and their characters.[5]
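
A rough illustration of what sets these characters apart in practice: a character outside the BMP has a code point above U+FFFF and occupies two 16-bit code units in UTF-16. The helper names below are hypothetical.

```python
def is_astral(ch: str) -> bool:
    """True if the single character ch lies in planes 1 to 16."""
    return ord(ch) > 0xFFFF


def utf16_code_units(ch: str) -> int:
    """Number of 16-bit code units needed to encode ch in UTF-16."""
    # "utf-16-le" writes no byte order mark, so the byte length is
    # exactly two bytes per code unit.
    return len(ch.encode("utf-16-le")) // 2


for ch in ("A", "\u3042", "\U0001F600"):
    print(f"U+{ord(ch):04X}", is_astral(ch), utf16_code_units(ch))
# U+0041 False 1
# U+3042 False 1
# U+1F600 True 2
```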


Basic Multilingual Plane

The first plane, plane 0, the Basic Multilingual Plane (BMP), is where most characters have been assigned so far. The BMP contains characters for almost all modern languages, and a large number of special characters. A primary objective for the BMP is to support the unification of prior character sets as well as characters for writing. Most of the allocated code points in the BMP are used to encode Chinese, Japanese, and Korean (CJK) characters.

The High Surrogate (U+D800–U+DBFF) and Low Surrogate (U+DC00–U+DFFF) code points are reserved for encoding non-BMP characters in UTF-16 by using a pair of 16-bit codes: one High Surrogate and one Low Surrogate. A surrogate code point will never be assigned a character on its own.
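
The pairing can be sketched as follows. This uses the standard UTF-16 arithmetic (subtract 0x10000, then split the remaining 20 bits into two 10-bit halves). The function names are made up for this example.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a non-BMP code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only planes 1 to 16 use surrogate pairs")
    offset = code_point - 0x10000           # a 20-bit value
    high = 0xD800 + (offset >> 10)          # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)         # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")                # D83D DE00
print(hex(from_surrogate_pair(high, low)))    # 0x1f600
```

The result agrees with Python's own encoder: `"\U0001F600".encode("utf-16-be")` yields the bytes D8 3D DE 00.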

As of Unicode 6.2, the blocks of the BMP include the following:

  • Devanagari (0900–097F)
  • Bengali (0980–09FF)
  • Gurmukhi (0A00–0A7F)
  • Gujarati (0A80–0AFF)
  • Oriya (0B00–0B7F)
  • Tamil (0B80–0BFF)
  • Telugu (0C00–0C7F)
  • Kannada (0C80–0CFF)
  • Malayalam (0D00–0D7F)
  • Sinhala (0D80–0DFF)
  • Tagalog (1700–171F)
  • Hanunoo (1720–173F)
  • Buhid (1740–175F)
  • Tagbanwa (1760–177F)
  • CJK Radicals Supplement (2E80–2EFF)
  • Kangxi Radicals (2F00–2FDF)
  • Ideographic Description Characters (2FF0–2FFF)
  • CJK Symbols and Punctuation (3000–303F)
  • Hiragana (3040–309F)
  • High Surrogates (D800–DB7F)
  • High Private Use Surrogates (DB80–DBFF)
  • Low Surrogates (DC00–DFFF)
  • Private Use Area (E000–F8FF)
  • CJK Compatibility Ideographs (F900–FAFF)
  • Alphabetic Presentation Forms (FB00–FB4F)
  • Arabic Presentation Forms-A (FB50–FDFF)
  • Variation Selectors (FE00–FE0F)
  • Vertical Forms (FE10–FE1F)
  • Combining Half Marks (FE20–FE2F)
  • CJK Compatibility Forms (FE30–FE4F)
  • Small Form Variants (FE50–FE6F)
  • Arabic Presentation Forms-B (FE70–FEFF)
  • Halfwidth and Fullwidth Forms (FF00–FFEF)
  • Specials (FFF0–FFFF)
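
Because blocks are contiguous, non-overlapping ranges, finding the block of a code point reduces to a search over sorted start values. The sketch below assumes a small hand-copied subset of the ranges listed above; the names `BLOCKS` and `block_of` are hypothetical, and a real table would hold every block.

```python
from bisect import bisect_right

# (start, end, name), sorted by start. Only a few blocks from the list
# above are included for illustration.
BLOCKS = [
    (0x0900, 0x097F, "Devanagari"),
    (0x0B80, 0x0BFF, "Tamil"),
    (0x3040, 0x309F, "Hiragana"),
    (0xD800, 0xDB7F, "High Surrogates"),
    (0xE000, 0xF8FF, "Private Use Area"),
    (0xFF00, 0xFFEF, "Halfwidth and Fullwidth Forms"),
]
STARTS = [start for start, _, _ in BLOCKS]


def block_of(code_point: int):
    """Return the block name for code_point, or None if not in the table."""
    # Find the last block whose start is <= code_point.
    i = bisect_right(STARTS, code_point) - 1
    if i >= 0:
        _, end, name = BLOCKS[i]
        if code_point <= end:
            return name
    return None


print(block_of(0x3042))   # Hiragana
print(block_of(0xE000))   # Private Use Area
print(block_of(0x0041))   # None (Basic Latin is not in this subset)
```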

Supplementary Multilingual Plane

Plane 1, the Supplementary Multilingual Plane (SMP), contains historic scripts such as Linear B, Egyptian hieroglyphs, and cuneiform scripts; historic and modern musical notation; mathematical alphanumerics; Emoji and other pictographic sets; reform orthographies like Shavian and Deseret; and game symbols for playing cards, Mah Jongg, and dominoes.

As of Unicode 6.2, the blocks of the SMP include the following:

  • Bamum Supplement (16800–16A3F)
  • Miao (16F00–16F9F)
  • Kana Supplement (1B000–1B0FF)
  • Byzantine Musical Symbols (1D000–1D0FF)
  • Musical Symbols (1D100–1D1FF)
  • Ancient Greek Musical Notation (1D200–1D24F)
  • Tai Xuan Jing Symbols (1D300–1D35F)
  • Counting Rod Numerals (1D360–1D37F)
  • Mathematical Alphanumeric Symbols (1D400–1D7FF)
  • Arabic Mathematical Alphabetic Symbols (1EE00–1EEFF)
  • Mahjong Tiles (1F000–1F02F)
  • Domino Tiles (1F030–1F09F)
  • Playing Cards (1F0A0–1F0FF)
  • Enclosed Alphanumeric Supplement (1F100–1F1FF)
  • Enclosed Ideographic Supplement (1F200–1F2FF)
  • Miscellaneous Symbols and Pictographs (1F300–1F5FF)
  • Emoticons (1F600–1F64F)
  • Transport and Map Symbols (1F680–1F6FF)
  • Alchemical Symbols (1F700–1F77F)

Supplementary Ideographic Plane

Plane 2, the Supplementary Ideographic Plane (SIP), is used for CJK Ideographs, mostly CJK Unified Ideographs, that were not included in earlier character encoding standards.

As of Unicode 6.1, the SIP comprises the following blocks:

  • CJK Unified Ideographs Extension B (20000–2A6DF)
  • CJK Unified Ideographs Extension C (2A700–2B73F)
  • CJK Unified Ideographs Extension D (2B740–2B81F)
  • CJK Compatibility Ideographs Supplement (2F800–2FA1F); not Unified

Unassigned planes

No characters have yet been assigned to planes 3 through 13. Plane 3 is tentatively named the Tertiary Ideographic Plane, but as of version 6.1 no characters are assigned to it. It is reserved for Oracle Bone script, Bronze Script, Small Seal Script, additional CJK unified ideographs, and other historic ideographic scripts.[6]

It is not anticipated that all these planes will be used in the foreseeable future, given the total sizes of the known writing systems left to be encoded. The number of possible symbol characters that could arise outside of the context of writing systems is potentially huge. At the moment, these 11 planes out of 17 are unused.

Supplementary Special-purpose Plane

Plane 14 (E in hexadecimal), the Supplementary Special-purpose Plane (SSP), currently contains non-graphical characters. The first block is for deprecated language tag characters for use when language cannot be indicated through other protocols (such as the xml:lang attribute in XML). The other block contains glyph variation selectors to indicate an alternate glyph for a character that cannot be determined by context.

As of Unicode 6.1, the SSP comprises the following blocks:

  • Tags (E0000–E007F)
  • Variation Selectors Supplement (E0100–E01EF)

Private Use Area planes

The two planes 15 and 16, called Supplementary Private Use Area-A and -B (or simply Private Use Area (PUA)) are available for character assignment by parties outside the ISO and the Unicode Consortium. They are used by fonts internally to refer to auxiliary glyphs, for example, ligatures and building blocks for other glyphs. Such characters will have limited interoperability. Software and fonts that support Unicode will not necessarily support character assignments by other parties.
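
A minimal sketch of a private-use test follows. It assumes the standard ranges: the BMP Private Use Area (E000–F8FF) plus planes 15 and 16, excluding the last two code points of each plane, which are noncharacters rather than private-use code points. `PRIVATE_USE_RANGES` and `is_private_use` are illustrative names; the cross-check uses the general category "Co" (private use) reported by Python's `unicodedata` module.

```python
import unicodedata

PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),       # Private Use Area in the BMP
    (0xF0000, 0xFFFFD),     # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),   # Supplementary Private Use Area-B (plane 16)
]


def is_private_use(code_point: int) -> bool:
    """True if code_point falls in one of the private-use ranges."""
    return any(lo <= code_point <= hi for lo, hi in PRIVATE_USE_RANGES)


for cp in (0x0041, 0xE000, 0xF0000, 0x10FFFD, 0x10FFFF):
    # The general category should be "Co" exactly when the range check holds.
    print(f"U+{cp:06X}", is_private_use(cp), unicodedata.category(chr(cp)))
```

A positive result only says that the code point has no standard meaning. What it displays as still depends on the font or agreement in use.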

