Category: Definition

Surrogate pair

A surrogate pair is the UTF-16 representation of a Unicode character whose code point is larger than 0xFFFF. UTF-16 works with 16-bit code units and was originally not designed to represent such characters, so a pair of 16-bit code units serves as a replacement (surrogate) representation for them. The first code unit of the pair (the "high surrogate" or "upper surrogate") is a 16-bit code value in the range U+D800 to U+DBFF. The second code unit (the "low surrogate") is a 16-bit code value in the range U+DC00 to U+DFFF.

Surrogate pairs came into being when the Unicode Consortium realised that 16 bits are not sufficient to represent all characters in the world. The already existing 16-bit UCS-2 code had a range between 0xD800 and 0xDFFF that was not yet allocated, and this range was split into two. The first range, U+D800 to U+DBFF, supplies the "upper half" or "high half" of a character, whilst the second range, U+DC00 to U+DFFF, supplies the "lower half". Each range holds 1024 values, so it became possible to represent another 1024 * 1024 = 1048576 characters, which is more than enough. Each of these "half-characters" is called a surrogate; a high surrogate followed by a low surrogate forms a surrogate pair.
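The two ranges and the resulting number of combinations can be checked with a few lines of Python. This is only a minimal sketch; the constant and function names (HIGH_START, is_high_surrogate and so on) are chosen here for illustration and do not come from any library.

```python
# Range boundaries as given in the text above.
HIGH_START, HIGH_END = 0xD800, 0xDBFF   # high (upper) surrogates
LOW_START, LOW_END = 0xDC00, 0xDFFF     # low (lower) surrogates


def is_high_surrogate(unit: int) -> bool:
    """Return True if the 16-bit code unit lies in the high surrogate range."""
    return HIGH_START <= unit <= HIGH_END


def is_low_surrogate(unit: int) -> bool:
    """Return True if the 16-bit code unit lies in the low surrogate range."""
    return LOW_START <= unit <= LOW_END


# Each range contains 1024 values, so the number of possible pairs is 1024 * 1024.
high_count = HIGH_END - HIGH_START + 1
low_count = LOW_END - LOW_START + 1
pair_count = high_count * low_count

print(high_count, low_count, pair_count)
```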

Example for converting a character into a surrogate pair

1. First subtract 0x10000 from the code point of the Unicode character, in this example U+2A6D6. The code point of a character can, for example, be looked up in the character database on the Unicode home page.

    0x2A6D6 - 0x10000 = 0x1A6D6
    

2. Next, the hexadecimal result of the subtraction is converted into a binary value.

    0x1A6D6 (hex) = 11010011011010110
    

3. The binary value must be 20 digits long. If it is shorter, pad it with leading zeros.

    11010011011010110 = 00011010011011010110
    

4. Split this 20-digit binary value into two halves.

    00011010011011010110 = 0001101001 1011010110
    

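5. To obtain the two 16-bit code units, the two 10-digit halves are inserted into so-called templates, replacing the x placeholders. The first template starts with the fixed bits 110110 (high surrogate), the second with 110111 (low surrogate).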

    110110xxxxxxxxxx 110111xxxxxxxxxx
          0001101001       1011010110
    =
    1101100001101001 1101111011010110

6. Converted back to hexadecimal, these two values make up the surrogate pair:

    0xD869 0xDED6
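The six steps above can be sketched in Python using bit operations instead of binary strings. This is an illustrative sketch, not a reference implementation; the function names to_surrogate_pair and from_surrogate_pair are made up for this example. The reverse function shows how a pair is turned back into the original code point.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Convert a code point above 0xFFFF into its (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point must be in the range 0x10000..0x10FFFF")
    offset = code_point - 0x10000      # step 1: subtract 0x10000 (20-bit result)
    upper_half = offset >> 10          # steps 2-4: upper 10 bits
    lower_half = offset & 0x3FF        # steps 2-4: lower 10 bits
    high = 0xD800 | upper_half         # step 5: template 110110xxxxxxxxxx
    low = 0xDC00 | lower_half          # step 5: template 110111xxxxxxxxxx
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Convert a (high, low) surrogate pair back into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x2A6D6)
print(f"0x{high:04X} 0x{low:04X}")    # prints "0xD869 0xDED6"
```

Python's built-in UTF-16 codec gives the same result: chr(0x2A6D6).encode("utf-16-be").hex() returns "d869ded6".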