Serialization Data Format
7-bit Encoded Unsigned Integer
7-bit encoding is a variable-length encoding that reduces the number of bytes needed to store an unsigned integer, since small values occupy fewer bytes than the fixed-width representation.
The encoding operates on octets (8-bit bytes). In each octet, the most significant bit (MSB), which would be the sign bit of a signed byte, is reserved to indicate whether another octet follows. Such an octet is called a variable-length quantity (VLQ) octet.
VLQ octet:
7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
---|---|---|---|---|---|---|---|
2^7 | 2^6 | 2^5 | 2^4 | 2^3 | 2^2 | 2^1 | 2^0 |
A | B6 | B5 | B4 | B3 | B2 | B1 | B0 |
If A is 0, then this is the last VLQ octet of the integer. If A is 1, then another octet follows. This continues through the 8th octet at most: eight VLQ octets carry 56 bits, so if an unsigned 64-bit integer needs more than 8 octets, a 9th octet follows that holds the 8 least significant bits of the value, using all 8 bits as data.
B0 through B6 form a 7-bit number [0x00, 0x7F], where B0 denotes the least significant bit (LSB) and B6 denotes the most significant bit (MSB). The VLQ octets are arranged most significant first in a stream.
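The rules above can be sketched in C++ as follows. This is a minimal illustration, not a reference implementation: the names `encode_vlq` and `decode_vlq` are hypothetical, and the sketch assumes that values of 2^56 or more always use the 9-octet form, with eight continuation octets carrying bits 63 through 8 and a final raw octet carrying bits 7 through 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: encode an unsigned 64-bit value, most significant octet first.
std::vector<std::uint8_t> encode_vlq(std::uint64_t value) {
    std::vector<std::uint8_t> out;
    if ((value >> 56) != 0) {
        // Assumed 9-octet form: eight octets with A = 1 carry bits 63..8,
        // then one raw octet carries the 8 least significant bits.
        for (int shift = 57; shift >= 8; shift -= 7) {
            out.push_back(static_cast<std::uint8_t>(0x80 | ((value >> shift) & 0x7F)));
        }
        out.push_back(static_cast<std::uint8_t>(value & 0xFF));
        return out;
    }
    // Count the 7-bit groups needed (at least one, at most eight here).
    int groups = 1;
    while ((value >> (7 * groups)) != 0) {
        ++groups;
    }
    for (int i = groups - 1; i >= 0; --i) {
        std::uint8_t octet = static_cast<std::uint8_t>((value >> (7 * i)) & 0x7F);
        if (i > 0) {
            octet |= 0x80;  // A = 1: another octet follows
        }
        out.push_back(octet);
    }
    return out;
}

// Hypothetical helper: decode one value from data[0..size).
// Returns the number of octets consumed, or 0 if the input is truncated.
std::size_t decode_vlq(const std::uint8_t* data, std::size_t size, std::uint64_t& value) {
    assert(data != nullptr || size == 0);
    std::uint64_t result = 0;
    for (std::size_t i = 0; i < size && i < 8; ++i) {
        result = (result << 7) | (data[i] & 0x7F);
        if ((data[i] & 0x80) == 0) {  // A = 0: last octet
            value = result;
            return i + 1;
        }
    }
    if (size < 9) {
        return 0;  // a continuation bit promised more octets than are available
    }
    value = (result << 8) | data[8];  // 9th octet: all 8 bits are data
    return 9;
}
```

Under these assumptions, 127 encodes as the single octet 0x7F, 128 as 0x81 0x00, and 300 as 0x82 0x2C.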
7-bit Encoded Signed Integer
Signed integers are encoded using the least significant bit for the sign.
This can be done using the following transformation for fixed k-bit integers, where n is treated as an unsigned k-bit value so that the right shift is logical:
(n << 1) | (n >> (k - 1))
The result is then encoded as an unsigned integer.
Most C++ compilers will recognize this as a left rotate operation that can be performed using a single CPU instruction.
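As a rough illustration of the transformation for k = 64, the sketch below rotates the two's complement bit pattern left by one bit and back again. The function names `sign_to_lsb` and `sign_from_lsb` are hypothetical, and the cast to an unsigned type is an assumption made here so that the right shift is logical rather than arithmetic.

```cpp
#include <cstdint>

// Hypothetical helper: move the sign bit into the least significant bit
// by rotating the 64-bit pattern left by one.
constexpr std::uint64_t sign_to_lsb(std::int64_t n) {
    const std::uint64_t u = static_cast<std::uint64_t>(n);  // shift on unsigned bits only
    return (u << 1) | (u >> 63);
}

// Hypothetical inverse: rotate right by one to restore the original pattern.
// The cast back to a signed type is modular since C++20 and
// implementation-defined before that.
constexpr std::int64_t sign_from_lsb(std::uint64_t u) {
    return static_cast<std::int64_t>((u >> 1) | (u << 63));
}

static_assert(sign_to_lsb(1) == 2u, "positive values are doubled");
static_assert(sign_to_lsb(INT64_MIN) == 1u, "only the sign bit set maps to 1");
static_assert(sign_from_lsb(sign_to_lsb(-5)) == -5, "round trip restores the value");
```

Note that a plain rotation leaves -1 as an all-ones pattern, so under this sketch small negative values still encode at maximum length.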