📅 Original date posted:2016-05-17
📝 Original message:Implemented a few of your suggestions.
Also opened a formal pull request for the BIP at
https://github.com/bitcoin/bips/pull/389 and the code at
https://github.com/bitcoin/bitcoin/pull/8068.
On 05/09/16 17:06, Pieter Wuille via bitcoin-dev wrote:
> On 05/03/2016 12:13 AM, lf-lists at mattcorallo.com (Matt Corallo) wrote:
>> Hi all,
>>
>> The following is a BIP-formatted design spec for compact block relay
>> designed to limit on wire bytes during block relay. You can find the
>> latest version of this document at
>> https://github.com/TheBlueMatt/bips/blob/master/bip-TODO.mediawiki.
>
> Hi Matt,
>
> thank you for working on this!
>
>> ===New data structures===
>> Several new data structures are added to the P2P network to relay
>> compact blocks: PrefilledTransaction, HeaderAndShortIDs,
>> BlockTransactionsRequest, and BlockTransactions. Additionally, we
>> introduce a new variable-length integer encoding for use in these data
>> structures.
>>
>> For the purposes of this section, CompactSize refers to the
>> variable-length integer encoding used across the existing P2P protocol
>> to encode array lengths, among other things, in 1, 3, 5 or 9 bytes.
>
> This is a not, but I think it's a bit strange to have two separate
> variable length integers in the same specification. I understand is one
> is already the default for variable-length integers currently, and there
> are reasons to use the other one for efficiency reasons in some places,
> but perhaps we should aim to get everything using the latter?
Fixed, the whole thing now uses New Varints.
>> ====New VarInt====
>> Variable-length integers: bytes are a MSB base-128 encoding of the number.
>> The high bit in each byte signifies whether another digit follows. To make
>> sure the encoding is one-to-one, one is subtracted from all but the last
>> digit.
>
> Maybe it's worth mentioning that it is based on ASN.1 BER's compressed
> integer format (see
> https://www.itu.int/ITU-T/studygroups/com17/languages/X.690-0207.pdf
> section 8.1.3.5), though with a small modification to make every integer
> have a single unique encoding.
>
>> ====HeaderAndShortIDs====
>> A HeaderAndShortIDs structure is used to relay a block header, the short
>> transactions IDs used for matching already-available transactions, and a
>> select few transactions which we expect a peer may be missing.
>>
>> |shortids||List of uint64_ts||8*shortids_length bytes||Little
>> Endian||The short transaction IDs calculated from the transactions which
>> were not provided explicitly in prefilledtxn
>
> I tried to derive what length of short ids is actually necessary (some
> write-up is on
> https://gist.github.com/sipa/b2eb2e486156b5509ac711edd16153ed but it's
> incomplete).
>
> For any reasonable numbers I can come up with (in a very wide range),
> the number of bits needed is very well approximated by:
>
> log2(#receiver_mempool_txn * #block_txn_not_in_receiver_mempool /
> acceptable_per_block_failure_rate)
>
> For example, with 20000 mempool transactions, 2500 transactions in a
> block, 95% hitrate, and a chance of 1 in 10000 blocks to fail to
> reconstruct, needed_bits = log2(20000 * 2500 * (1 - 0.95) / 0.0001) =
> 34.54, or 5 byte txids would suffice.
>
> Note that 1 in 10000 failures may sound like a lot, but this is for each
> individual connection, and since every transmission uses separately
> salted identifiers, occasional failures should not affect global
> propagation. Given that transmission failures due to timeouts, network
> connectivity, ... already occur much more frequently than once every few
> gigabytes (what 10000 blocks corresponds to), that's probably already
> more than enough.
>
> In short: I believe 5 or 6 byte txids should be enough, but perhaps it
> makes sense to allow the sender to choose (so he can weigh trying
> multiple nonces against increasing the short txid length).
I switched to 6-byte short txids.
>> ====Short transaction IDs====
>> Short transaction IDs are used to represent a transaction without
>> sending a full 256-bit hash. They are calculated by:
>> # single-SHA256 hashing the block header with the nonce appended (in
>> little-endian)
>> # XORing each 8-byte chunk of the double-SHA256 transaction hash with
>> each corresponding 8-byte chunk of the hash from the previous step
>> # Adding each of the XORed 8-byte chunks together (in little-endian)
>> iteratively to find the short transaction ID
>
> An alternative would be using SipHash-1-3 (a form of SipHash with
> reduced iteration counts; the default is SipHash-2-4). SipHash was
> designed as a Message Authentication Code, where the security
> requirements are much stronger than in our case (in particular, we don't
> care about observers being able to finding the key, as the key is just
> public knowledge here). One of the designers of SipHash has commented
> that SipHash-1-3 for collision resistance in hash tables may be enough:
> https://github.com/rust-lang/rust/issues/29754#issuecomment-156073946
>
> Using SipHash-1-3 on modern hardware would take ~32 CPU cycles per txid.
Switched to SipHash2-4.
>> ===Implementation Notes===
>
> There are a few more heuristics that MAY be used to improve performance:
>
> * Receivers should treat short txids in blocks that match multiple
> mempool transactions as non-matches, and request the transactions. This
> significantly reduces the failure to reconstruct.
Done.
> * When constructing a compact block to send, the sender can verify it
> against its own mempool to check for collisions, and if so, choose to
> either try another nonce, or increase the short txid length.
Additionally we should compare to the orphan pool (which apparently
helps a lot).