|
Category: Definition
Surrogate pair
A surrogate pair is the representation of a Unicode character that does not fit into 16 bits
in UTF-16, whose 16-bit code values cannot directly hold characters with a hexadecimal value
larger than 0xFFFF. It is thus a replacement (surrogate) representation for Unicode characters that cannot otherwise be represented.
The first half of a surrogate pair (the "high surrogate" or
"upper surrogate") is a 16-bit code value in the range from
U+D800 to U+DBFF. The second half (the "low surrogate") is a
16-bit code value in the range from U+DC00 to U+DFFF.
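The two ranges can be checked with simple comparisons. The following is a minimal sketch in Python; the function names are illustrative and not taken from any particular library.

```python
def is_high_surrogate(unit: int) -> bool:
    """Return True if the 16-bit code value lies in U+D800..U+DBFF."""
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit: int) -> bool:
    """Return True if the 16-bit code value lies in U+DC00..U+DFFF."""
    return 0xDC00 <= unit <= 0xDFFF
```

A well-formed surrogate pair is a value for which is_high_surrogate holds, followed directly by one for which is_low_surrogate holds.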
Surrogate pairs came into being when the Unicode Consortium realised that 16 bits
are not sufficient to represent all characters in the world. The already existing
UCS-2 16-bit code had a range not yet allocated between 0xD800 and 0xDFFF, which was split
into two ranges. The first range, from U+D800 to U+DBFF, represents
the "upper half" or "high half" of a character, whilst the range from
U+DC00 to U+DFFF represents the "lower half" of a character.
This made it possible to encode another 1024 * 1024 = 1048576 characters (U+10000 to U+10FFFF), more than have been needed so far.
These "half-characters" are called surrogates; a high surrogate followed by a low surrogate forms a surrogate pair.
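The number of additional characters follows directly from the sizes of the two ranges. A quick check in Python, with constant names chosen only for illustration:

```python
# Each range holds 1024 (0x400) code values.
HIGH_COUNT = 0xDBFF - 0xD800 + 1
LOW_COUNT = 0xDFFF - 0xDC00 + 1

# Every high value can be combined with every low value.
PAIR_COUNT = HIGH_COUNT * LOW_COUNT

# The pairs cover the characters starting at 0x10000.
LAST_CODE_POINT = 0x10000 + PAIR_COUNT - 1

print(PAIR_COUNT, hex(LAST_CODE_POINT))
```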
Example for computing the surrogate pair of a character
1. First subtract 0x10000 from the value of the Unicode character.
The value of a character can, for example, be retrieved from the database of the
Unicode Home Page.
0x2A6D6 - 0x10000 = 0x1A6D6
2. Next, the hexadecimal result of the subtraction is converted into a binary value.
0x1A6D6 (hex) = 11010011011010110
  3. The binary value should be 20 digits long. If this is not the case, it must be padded with leading "0" digits.
11010011011010110 = 00011010011011010110
4. Split this 20-digit binary value into two halves.
00011010011011010110 = 0001101001 1011010110
  5. To obtain the two surrogate code values, the two binary halves are inserted into the following templates in place of the "x" digits.
110110xxxxxxxxxx 110111xxxxxxxxxx
0001101001 1011010110
=
1101100001101001 1101111011010110
  6. Converted back to hexadecimal, this makes up the surrogate pair 0xD869 0xDED6.
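The steps above can be sketched in Python using shifts and masks instead of binary strings. This is a minimal illustration, not a reference implementation; the function names to_surrogate_pair and from_surrogate_pair are made up for this example.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a character value above 0xFFFF into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("value is outside the range covered by surrogate pairs")
    offset = code_point - 0x10000        # step 1: subtract 0x10000 (20-bit result)
    high = 0xD800 | (offset >> 10)       # template 110110 + upper 10 bits
    low = 0xDC00 | (offset & 0x3FF)      # template 110111 + lower 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Reverse the computation: rebuild the character value from a pair."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x2A6D6)
print(f"{high:04X} {low:04X}")
```

For the example value 0x2A6D6 this agrees with Python's built-in encoder: chr(0x2A6D6).encode("utf-16-be") produces the bytes D8 69 DE D6.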
|