2 回答

TA貢獻1803條經驗 獲得超3個贊
編碼/二進制/ varint.go
package binary
// This file implements "varint" encoding of 64-bit integers.
// The encoding is:
// - unsigned integers are serialized 7 bits at a time, starting with the
// least significant bits
// - the most significant bit (msb) in each output byte indicates if there
// is a continuation byte (msb = 1)
// - signed integers are mapped to unsigned integers using "zig-zag"
// encoding: Positive values x are written as 2*x + 0, negative values
// are written as 2*(^x) + 1; that is, negative numbers are complemented
// and whether to complement is encoded in bit 0.
//
// Design note:
// At most 10 bytes are needed for 64-bit values. The encoding could
// be more dense: a full 64-bit value needs an extra byte just to hold bit 63.
// Instead, the msb of the previous byte could be used to hold bit 63 since we
// know there can't be more than 64 bits. This is a trivial improvement and
// would reduce the maximum encoding length to 9 bytes. However, it breaks the
// invariant that the msb is always the "continuation bit" and thus makes the
// format incompatible with a varint encoding for larger numbers (say 128-bit).
一種用于無損數據壓縮的經典技術是霍夫曼編碼,其中較常見的符號通常比不太常見的符號使用更少的位來表示。在實踐中,較小的數字最常出現。因此,如果我們可以有效地表示小數,即使表示大數的效率較低,我們也將獲得無損數據壓縮。
Uvarints 是一種使用一個或多個字節序列化無符號整數的方法。較小的數字占用較少的字節數。uvarint 中的每個字節,除了最后一個字節,都設置了最高有效位 (msb)——這表明還有更多的字節要來。每個字節的低 7 位用于以 7 位為一組存儲數字,最低有效組在前。我們可以對任意位數的無符號整數執行此操作。
例如,
uint bits : uvarint bytes
7 7f : 1 7f
14 3fff : 2 ff7f
21 1fffff : 3 ffff7f
28 fffffff : 4 ffffff7f
35 7ffffffff : 5 ffffffff7f
42 3ffffffffff : 6 ffffffffff7f
49 1ffffffffffff : 7 ffffffffffff7f
56 ffffffffffffff : 8 ffffffffffffff7f
63 7fffffffffffffff : 9 ffffffffffffffff7f
64 ffffffffffffffff : 10 ffffffffffffffffff01
依此類推,對于我們想要的盡可能多的 uint 位。
如果我們有很多數字表示為 1 到 49 位的無符號 64 位整數序列化為 1 到 7 個字節的 uvarints,我們不會關心是否有一些數字表示為 57 到 64 位序列化為 9 到 10 個字節的 uvarint 的無符號 64 位整數。
- 2 回答
- 0 關注
- 287 瀏覽
添加回答
舉報