Tomas [ARCHIVE] on Nostr:
📅 Original date posted:2017-04-08
📝 Original message:Thank you for your elaborate response Eric,
On Sun, Apr 9, 2017, at 00:37, Eric Voskuil wrote:
> My point was that "Using a storage engine without UTXO-index" has been
> done, and may be a useful reference, not that implementation details
> are the same.
I haven't dived into libbitcoin V2/V3 enough to fully grasp it, and
though your comments help, I still don't fully understand it. I will
answer below what relates to Bitcrust itself.
My post wasn't meant to claim innovation; I merely try to explain how
Bitcrust works and why it performs well.
> First, I remain confused on your comments pertaining to UTXO growth
> and network protocol. I followed your conversation with Greg and it
> remains unclear to me. From what I understand you have isolated order
> (double spend) from script validation. I think we all understand that
> script validation requires inputs and outputs while double spend
> detection requires correlation of inputs. What I do not understand is
> your choice of optimization axis.
>
> Detection of double spend is not useful in isolation. One must also
> validate scripts, which requires outputs. I can see that there is an
> opportunity to reject blocks (within the same branch) faster by
> validating for double spends before validating script. But unconfirmed
> transactions do not exist in a branch, and are therefore not truly
> conflicting, until they are mined. And even after they are mined
> conflicting txs remain potentially valid in other branches. So
> rejecting txs due to conflict comes down to a denial of service
> policy, which ultimately must be based on fee increment (e.g. RBF).
> But fees are based on the amount of the output value that remains
> unspent in the transaction. So this in turn requires the retrieval of
> outputs.
>
> And yet the remaining scenario of fast rejection of invalid blocks is
> not a meaningful optimization. Optimizing for the case where a block
> has valid and sufficient PoW and yet is invalid (for double spend) is
> counterproductive. And even so, the txs within the invalid block may
> be entirely valid independent of the block, so you are back to looking
> up their outputs to obtain fees in the case of a double spend or to
> validate script otherwise. In all cases you need to get the outputs.
>
> > Bitcrust simply scans the tree. Although earlier designs used a
> > skip-list, it turns out that accompanied by a spent-index lagging a
> > few blocks behind, raw scanning is faster than anything. Even though
> > it needs to scan ~5 blocks times ~4000 inputs before reaching the
> > first spent-index, the actual scan is highly cache efficient and
> > little more than a "REP SCASQ", reaching sub-microsecond per input
> > on each core *including* the lookup in the spend index.
>
> I realize that you see the implementation of the ordering validation
> as an interesting detail, but I find it hard to justify contemplating
> the implementation in isolation from the output lookup requirement. And
> if one must look up both outputs and spends for each validation, it
> makes more sense to co-locate that data.
>
> Recovering in one step all data necessary to validate a tx has real
> advantages over either interleaving queries and validation or
> splitting input vs. output validation queries into two steps. It is a
> significantly more test-friendly approach, has better performance
> characteristics, and simplifies code. I cannot see any reason to
> perform the data read for double spend validation in isolation of that
> for script validation.
You seem to ignore the difference between base load and peak load here.
If Compact Blocks/XThin with further optimizations can presync nearly
100% of the transactions, and nodes do as much work as possible when a
transaction comes in, then the time spent when a block comes in can be
minimized and many more transactions can be handled with the same
resources.
The reason for "splitting" is that for an incoming transaction the
spent-state of the outputs being spent isn't particularly relevant, as
you seem to acknowledge. When the block comes in, the actual output data
isn't relevant.
The *only* thing that needs to be checked when a block comes in is the
order, and the spend-tree approach removes the need to access outputs
here.
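To make the idea concrete, here is a minimal sketch in Rust (made-up
names, ignoring branches; it illustrates the principle, not Bitcrust's
actual data layout): double-spend detection only compares outpoints,
scanning the recent spend records linearly until the lagging spent-index
takes over, and never touches output data.

// Illustration only: hypothetical types; the real spend-tree records and
// branch handling differ.
use std::collections::HashSet;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Outpoint { tx: u64, index: u32 } // simplified: tx hash as u64

fn is_double_spend(
    outpoint: Outpoint,
    recent_spends: &[Outpoint],      // spends newer than the spent-index (a few blocks)
    spent_index: &HashSet<Outpoint>, // lagging index covering everything older
) -> bool {
    // Raw linear scan over the recent records: cache-friendly, essentially
    // comparing each entry against one value (the "REP SCASQ" analogy).
    if recent_spends.iter().any(|s| *s == outpoint) {
        return true;
    }
    // Older history is answered by the lagging spent-index in one lookup.
    spent_index.contains(&outpoint)
}

fn main() {
    let mut spent_index = HashSet::new();
    spent_index.insert(Outpoint { tx: 1, index: 0 });   // spent long ago
    let recent = vec![Outpoint { tx: 7, index: 1 }];    // spent in a recent block

    assert!(is_double_spend(Outpoint { tx: 7, index: 1 }, &recent, &spent_index));
    assert!(is_double_spend(Outpoint { tx: 1, index: 0 }, &recent, &spent_index));
    assert!(!is_double_spend(Outpoint { tx: 9, index: 0 }, &recent, &spent_index));
    println!("ok");
}

Note that nothing in this check needs the output scripts or values; only
the spend records themselves are read.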
As the spend-tree approach also obviates the need for reorgs, this
greatly simplifies the design. I am not sure why you say that a one-step
approach is more "test-friendly", as this seems unrelated.
>
> If by results you are referring to performance numbers, it's very hard
> to draw any conclusions without a full benchmark. It's great if you
> are able to boost Core, but from my perspective the numbers aren't
> especially compelling.
>
I fully agree, and I don't pretend to hide that my numbers are
premature without a full implementation. I just think they are promising
enough to convince at least myself to move forward with this model.
> Despite the site's explanation I cannot think of any reason to ever
> validate two blocks at the same time. You would always prioritize the
> block with the greatest PoW. Doing otherwise just slows down the net
> validation in all but the pathological case where a miner has produced
> an *invalid* block with *more* PoW than another valid block which
> arrived at the node within the same second. Rejecting a *valid* block
> with more PoW in favor of one with *less* "processing" is a hard fork,
> so you probably wouldn't want to do that either. But with compact
> block validation times approaching 25ms it's hard to justify stopping
> a block validation for any reason.
I don't get what you are saying. Why pick the greatest PoW of two
competing blocks? If two blocks come in, an implementation is free to
choose whichever block to build on. Choosing so is not a "hard fork".
Parallel validation simply makes it easier to make an optimal choice:
if two blocks come in, the one that is validated fastest can be built
upon without the risk of validationless mining.
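As an illustration only (a minimal sketch in Rust with stand-in names
and a fake validation function, not how Bitcrust schedules work): two
candidate blocks are validated concurrently, and the node simply builds
on whichever one completes validation first.

// Sketch: race two block validations; names and timings are made up.
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn validate(name: &'static str, work_ms: u64) -> Option<&'static str> {
    thread::sleep(Duration::from_millis(work_ms)); // stand-in for real validation
    Some(name)                                     // None would mean "invalid"
}

fn main() {
    let (tx, rx) = mpsc::channel();
    for (name, work_ms) in [("block A", 40), ("block B", 25)] {
        let tx = tx.clone();
        thread::spawn(move || {
            if let Some(valid) = validate(name, work_ms) {
                let _ = tx.send(valid);
            }
        });
    }
    drop(tx);
    // Build on the first block that completes validation; the slower one
    // can keep running (or be abandoned) without blocking the choice.
    if let Some(winner) = rx.iter().next() {
        println!("building on {winner}");
    }
}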
>
> That's not to say parallel block validation is difficult to do. If you
> can validate one block's full set of inputs in parallel (which is not
> novel) doing the same with additional blocks has trivial additional
> complexity.
I am not trying to claim novelty here.
> I am also interested in your previous comments about soft forks. These
> are material considerations that Greg touched on but it doesn't sound
> like you fully appreciate them just yet. When a tx is pre-validated the
> rules applied must be the same rules as those of some future block.
> Yet a tx can be included in more than one block (different branches).
> Across branches and even in one branch, validation rules change, and
> can change back. The changes are based on accumulated branch history.
> Pre-validation can later become invalidated, and differently in
> different branches. And maintaining proper context requires either
> storing state that you are apparently not storing, or invalidating
> optimizations. Based on your comments you do not seem to be accounting
> for this in your storage assumptions or in your results. A recent post
> by Greg highlights the complexity and consensus criticality of these
> considerations.
Frankly, I think this is a bit of an exaggeration. Soft forks can be
counted on one hand, and I don't think there are many - if any -
transactions in the current chain that have changed compliance based on
height. This makes it a compliance issue rather than a performance
issue, and the solution I have explained, adding height-based compliance
as metadata of validation, seems to be adequate and safe.
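Roughly what I have in mind (a hypothetical sketch in Rust, not
Bitcrust's actual schema): the stored validation result carries the rule
set it was checked under, and it is only reused for a block whose active
rules match; otherwise the transaction is simply re-validated.

// Illustration only: hypothetical rule flags and record shape.
#[derive(Clone, Copy, PartialEq, Eq)]
struct RuleSet {
    strict_der: bool, // e.g. a BIP66-style rule
    cltv: bool,       // e.g. a BIP65-style rule
}

struct ValidationRecord {
    valid: bool,
    rules: RuleSet, // rules in force when the script check was done
}

fn usable_for_block(record: &ValidationRecord, block_rules: RuleSet) -> bool {
    // Reuse the cached result only if it was produced under the same rules;
    // otherwise the transaction must be re-checked for this block.
    record.valid && record.rules == block_rules
}

fn main() {
    let pre = ValidationRecord {
        valid: true,
        rules: RuleSet { strict_der: true, cltv: false },
    };
    let later_block = RuleSet { strict_der: true, cltv: true };
    assert!(!usable_for_block(&pre, later_block)); // rules changed: re-validate
    println!("re-validate under the new rules");
}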
> The hash table store that I described can fully navigate the block
> tree and transaction DAG, since the stored tx, parent and point hashes
> are also natural keys and each link is navigable in constant time. It
> is also lock-free, can concurrently write any number of blocks during
> initial block download and supports read/write concurrency. It has
> successfully indexed and stored the entire blockchain from the P2P
> network in 16 minutes (locally). It also stores both confirmed and
> unconfirmed transactions in the same store, so there is nothing to
> write when a block is confirmed except for the block header/hashes and
> updates to spender heights for any output spent by the new block's
> txs. It is similarly capable of storage in the block table of weak
> chain blocks...
>
I think I get the gist of your approach; it sounds very interesting and
I will definitely dive in deeper.
It also seems sufficiently different from Bitcrust to merit competing on
(eventual) results instead of on the complicated theory alone.
Best,
Tomas