Christian Decker [ARCHIVE] on Nostr:
📅 Original date posted: 2019-01-08
📝 Original message:
Rusty Russell <rusty at rustcorp.com.au> writes:
>> But only 18 000 pairs of channel updates carry actual fee and/or HTLC
>> value change. 85% of the time, we just queried information that we
>> already had!
>
> Note that this can happen in two legitimate cases:
> 1. The weekly refresh of channel_update.
> 2. A node updated too fast (A->B->A) and the ->A update caught up with the
> ->B update.
>
> Fortunately, this seems fairly easy to handle: discard the newer
> duplicate (unless > 1 week old). For future more advanced
> reconstruction schemes (eg. INV or minisketch), we could remember the
> latest timestamp of the duplicate, so we can avoid requesting it again.
Unfortunately this assumes that you have a single update partner; it
still results in flaps, and might even result in a stuck state for some
channels.
Assume that we have a network in which a node D receives the updates
from a node A through two or more separate paths:
A --- B --- D
 \--- C ---/
And let's assume that some channel of A (c_A) is flapping (not the ones
to B and C). A will send out two updates, one disabling and one
re-enabling c_A; otherwise they are identical (timestamp and signature
differ as well, of course). B's flush interval is long enough to see
both updates before flushing, hence both updates get dropped and
nothing apparently changed (D doesn't get told anything by B). C's
flush interval triggers between the two updates, so D gets the
disabling update first, and only gets the enabling update once C's
flush interval triggers again. Worse, if the connection A-C gets
severed between the updates, C and D have now learned that the channel
is disabled and will never get the re-enabling update, since B has
dropped it altogether. If B now gets told by D about the disable, it'll
also go "ok, I'll disable it as well", leaving the entire network
believing that the channel is disabled.
This is really hard to debug, since A has sent a re-enabling
channel_update, but everybody is stuck in the old state.
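To make the failure mode concrete, here is a rough Python sketch of the
scenario. The Relay/ChannelUpdate names, the fields, and the "drop the
flap if we're back at the last broadcast state" rule are only my
illustration of the deduplication behaviour, not how any implementation
actually does it:

from dataclasses import dataclass

@dataclass(frozen=True)
class ChannelUpdate:
    short_channel_id: str
    timestamp: int
    disabled: bool
    fee_base_msat: int = 1000

    def content(self):
        # Everything except timestamp (and, in reality, the signature).
        return (self.short_channel_id, self.disabled, self.fee_base_msat)

class Relay:
    def __init__(self, name):
        self.name = name
        self.pending = {}    # scid -> update queued for the next flush
        self.broadcast = {}  # scid -> last update we actually forwarded

    def receive(self, upd):
        last = self.broadcast.get(upd.short_channel_id)
        if last and upd.content() == last.content():
            # Back to the state we already announced: the flap cancels out.
            self.pending.pop(upd.short_channel_id, None)
        else:
            self.pending[upd.short_channel_id] = upd

    def flush(self, peers):
        for scid, upd in self.pending.items():
            self.broadcast[scid] = upd
            for peer in peers:
                peer.receive(upd)
        self.pending.clear()

enabled  = ChannelUpdate("c_A", timestamp=90,  disabled=False)
disable  = ChannelUpdate("c_A", timestamp=100, disabled=True)
reenable = ChannelUpdate("c_A", timestamp=110, disabled=False)

B, C, D = Relay("B"), Relay("C"), Relay("D")
for r in (B, C, D):
    r.broadcast["c_A"] = enabled   # everyone starts out knowing c_A as enabled

# B sees both updates within one flush interval: the flap cancels out, nothing is sent.
B.receive(disable); B.receive(reenable); B.flush([D])

# C flushes between the two updates; then A-C is severed, so the re-enable
# never reaches C and D is left believing c_A is disabled.
C.receive(disable); C.flush([D])

print(D.pending["c_A"].disabled)   # True -- D is stuck in the old state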
Locally updating the timestamp and signature for otherwise-identical
updates, and not re-broadcasting them if those were the only changes,
would at least prevent the last issue of overriding a dropped state
with an earlier one, but it'd still leave C and D in an inconsistent
state until we have some sort of passive sync that compares routing
tables and fixes these issues.
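Roughly something like the following, as a sketch only; the dict-based
store and the field names are illustrative (the real channel_update
fields are defined in BOLT 7):

def absorb_update(store, upd):
    """Return True if `upd` should be queued for rebroadcast, False if it is
    only absorbed locally. `store` maps short_channel_id -> last known update
    (a dict with 'timestamp', 'signature' and the policy fields)."""
    old = store.get(upd["scid"])
    if old is None:
        store[upd["scid"]] = dict(upd)
        return True                       # first update for this channel
    if upd["timestamp"] <= old["timestamp"]:
        return False                      # stale or duplicate: ignore
    same_content = all(
        upd[k] == old[k]
        for k in ("disabled", "fee_base_msat", "fee_proportional_millionths",
                  "htlc_minimum_msat", "cltv_expiry_delta"))
    # Always remember the freshest timestamp and signature, so an older copy
    # of the same state relayed by another peer can no longer override us.
    old["timestamp"], old["signature"] = upd["timestamp"], upd["signature"]
    if same_content:
        return False                      # only timestamp/signature changed
    old.update(upd)
    return True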
>> Adding a basic checksum (4 bytes for example) that covers fees and
>> HTLC min/max value to our channel range queries would be a significant
>> improvement and I will add this to the open BOLT 1.1 proposal to extend
>> queries with timestamps.
>>
>> I also think that such a checksum could also be used
>> - in “inventory” based gossip messages
>> - in set reconciliation schemes: we could reconcile [channel id |
>> timestamp | checksum] first
>
> I think this is overkill?
I think all the bolted-on things are pretty much overkill at this
point; it is unlikely that we will get any consistency in our views of
the routing table, but that's actually not needed to route, and we
should consider this a best-effort gossip protocol anyway. If the
routing protocol is too chatty, we should work towards local policies
at the senders of the updates to reduce the number of flapping updates,
not build in-network deduplication. Maybe something like
"eager-disable" and "lazy-enable" is what we should go for, in which
disables are sent right away and enables are put on an exponential
backoff timeout (after all, what use are flappy nodes for routing?).
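Something along these lines as a sender-side policy (again just a
sketch; the class, the delay constants and the flap counter are made up
for illustration):

import time

class LazyEnablePolicy:
    BASE_DELAY = 60            # seconds; illustrative, not a spec value
    MAX_DELAY = 6 * 3600

    def __init__(self):
        self.flaps = {}        # scid -> number of disables we have seen
        self.held = {}         # scid -> (enable_update, release_time)

    def on_local_change(self, scid, update, disabled, now=None):
        """Return the updates to queue for the next flush."""
        now = time.time() if now is None else now
        if disabled:
            # Eager-disable: drop any pending enable and send right away.
            self.held.pop(scid, None)
            self.flaps[scid] = self.flaps.get(scid, 0) + 1
            return [update]
        # Lazy-enable: hold the update back, doubling the delay per flap.
        delay = min(self.BASE_DELAY * 2 ** self.flaps.get(scid, 0),
                    self.MAX_DELAY)
        self.held[scid] = (update, now + delay)
        return []

    def due(self, now=None):
        """Enables whose backoff expired; broadcast these at the next flush."""
        now = time.time() if now is None else now
        ready = [u for u, t in self.held.values() if t <= now]
        self.held = {s: (u, t) for s, (u, t) in self.held.items() if t > now}
        return ready

That way a channel that keeps flapping mostly shows up as disabled,
which seems like the right bias for routing.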
Cheers,
Christian