📅 Original date posted:2022-06-16
📝 Original message:
# DataStruct -- Data fragmentation over Lightning
## Introduction
Greetings once again,
This mail proposes a spec for data fragmentation over custom records,
allowing for transmission of data exceeding the maximum size allowed
in a single HTLC.
As in the case of DataSig, we seek feedback as we want to improve
and tweak this spec before submitting a BLIP version of it.
## DataStruct
The purpose of this spec is to define a structure that describes
fragmented data, allowing for transmission over separate HTLCs
and assisting reassembly on the receiving end.
The proposed fragmentation structure also allows out-of-order
reception of fragments.
Since these fragments are assumed to be transmitted over Lightning
HTLCs, we want to use a compact encoding mechanism, thus we describe
their structure with protobuf:
```protobuf
message DataStruct {
uint32 version = 1;
bytes payload = 2;
optional FragmentInfo fragment = 3;
}
message FragmentInfo {
uint64 fragset_id = 1;
uint32 total_size = 2;
uint32 offset = 3;
}
```
* `version`: The version of the DataStruct spec used.
* `payload`: The data carried by this fragment.
* `fragment`: Fragmentation information, in case of fragmented data.
The `FragmentInfo` fields describe:
* `fragset_id`: Identifier indicating a fragment set, common to all
fragments of the same data.
* `total_size`: The total data size this fragment is part of.
* `offset`: Starting byte offset of this fragment's `payload`
in the total data.
If the total data can be transmitted over a single HTLC, then the
`fragment` field should be omitted.
If the `fragment` field is set on a received DataStruct instance, the
receiving node should wait for the full fragment set to arrive before
reconstruction. For each received fragment of a fragment set
(as indicated by `fragset_id`), the receiving node should assemble
the data by inserting each `payload` at the position indicated by the
`fragment`'s `offset` field. Once the whole data range has been
received, the node can safely assume the data has been received in
full.
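The completeness check described above can be sketched in Python (illustrative only; the spec does not mandate an implementation). Since fragments may carry overlapping byte ranges, it suffices to track the received `(offset, offset + len(payload))` ranges and check that their union covers the whole data:

```python
def merge_ranges(ranges):
    """Merge possibly overlapping (start, end) byte ranges (end exclusive)."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def is_complete(ranges, total_size):
    """True once received fragments cover the full [0, total_size) range."""
    return merge_ranges(ranges) == [(0, total_size)]
```

Note that overlapping fragments (as allowed by this spec) are handled naturally: `is_complete([(0, 512), (256, 768)], 768)` is true even though the two ranges overlap.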
### Sending
In this section we will walk through the procedure of utilizing
DataStruct in order to transmit some data `D` that has a size of
42 KB.
It is also important to note that we don't describe an algorithm that
efficiently and dynamically splits the byte array `D` into an
optimal set of fragments. A fragment's transmission may fail for
various reasons (including uncertain channel liquidity, stale routing
data or route lengths that prohibit meaningful data injection).
It is the responsibility of the sender to fragment the data and
transmit the fragments towards the destination. The receiver simply
receives fragments that will (ideally) completely cover `D`, allowing
its reconstruction.
In this example, we will assume that the sender settles for
splitting the data `D` into 84 fragments of 512 B each.
This is not optimal, as it will likely result in increased
transmission costs, depending on route length.
A sender intending to transmit the data `D` to another node should:
1. Split the bytes of `D` into 84 fragments of 512B each.
2. Generate an identifier for this data transmission, `Di`.
3. For each fragment `f`, a `DataStruct` instance should be created:
    1. Populate `version` with the spec version followed.
    2. Populate `payload` with `f`.
    3. Populate `fragment` as follows:
        1. Populate `fragset_id` with `Di`.
        2. Populate `total_size` with `len(D)`.
        3. Populate `offset` with the fragment's starting byte index.
    4. Encode the created DataStruct instance, resulting in a byte
       array `DS`.
    5. Transmit `DS` over the custom records of an HTLC.
    6. In case of failure, transmission can be retried over a
       different route.
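The fragmentation steps above can be sketched as follows. This is an illustrative Python sketch: plain dataclasses stand in for the actual protobuf encoding, `FRAGMENT_SIZE` is the 512 B chosen in this example, and the random `fragset_id` generation is one possible way to produce the identifier `Di`:

```python
import os
from dataclasses import dataclass
from typing import Optional

SPEC_VERSION = 1
FRAGMENT_SIZE = 512  # bytes per fragment, chosen by the sender

@dataclass
class FragmentInfo:
    fragset_id: int
    total_size: int
    offset: int

@dataclass
class DataStruct:
    version: int
    payload: bytes
    fragment: Optional[FragmentInfo] = None

def fragment_data(data: bytes) -> list:
    """Split `data` into DataStruct instances, one per FRAGMENT_SIZE chunk."""
    if len(data) <= FRAGMENT_SIZE:
        # Fits in a single HTLC: the `fragment` field is omitted.
        return [DataStruct(version=SPEC_VERSION, payload=data)]
    fragset_id = int.from_bytes(os.urandom(8), "big")  # the identifier Di
    return [
        DataStruct(
            version=SPEC_VERSION,
            payload=data[off:off + FRAGMENT_SIZE],
            fragment=FragmentInfo(fragset_id, len(data), off),
        )
        for off in range(0, len(data), FRAGMENT_SIZE)
    ]
```

Each resulting instance would then be protobuf-encoded and placed in a custom TLV record of an HTLC; for the 42 KB example, `fragment_data` yields the expected 84 fragments.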
### Receiving
Continuing the last example, the receiving node can execute the
following steps for each received fragment `DS` in order to assemble
the data `D`:
1. Decode `DS` according to DataStruct definition.
2. Check `version` field, and decide whether to proceed or ignore
the fragment.
3. If the received DataStruct instance contains a `fragment` field:
1. Retrieve the reconstruction buffer identified by `fragset_id`,
creating it with size `total_size` if it does not exist.
    2. Insert `payload` at `offset` into the reconstruction buffer.
    3. Check whether the reconstruction buffer is complete. Once every
       byte of the reconstruction buffer has been filled, the buffer
       contains the total data `D`.
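The receiving steps can be sketched as a small reassembler. This is an illustrative Python sketch that identifies buffers by `fragset_id` alone (as in this example; a real application might key on the sender's address as well, as noted below) and takes already-decoded DataStruct fields as arguments:

```python
SPEC_VERSION = 1

class Reassembler:
    """Rebuilds fragmented data from out-of-order DataStruct fragments."""

    def __init__(self):
        self.buffers = {}  # fragset_id -> bytearray of size total_size
        self.filled = {}   # fragset_id -> set of byte indices received

    def on_fragment(self, version, payload, fragset_id, total_size, offset):
        """Returns the complete data once all bytes are covered, else None."""
        if version != SPEC_VERSION:
            return None  # unknown spec version: ignore the fragment
        buf = self.buffers.setdefault(fragset_id, bytearray(total_size))
        seen = self.filled.setdefault(fragset_id, set())
        buf[offset:offset + len(payload)] = payload
        seen.update(range(offset, offset + len(payload)))
        if len(seen) == total_size:
            self.filled.pop(fragset_id)
            return bytes(self.buffers.pop(fragset_id))
        return None
```

Because each `payload` is written at its own `offset`, fragments may arrive in any order, and overlapping ranges simply overwrite the same bytes.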
### Notes / Remarks
* We mention that the encoded DataStruct is placed inside a custom
TLV record, but do not specify the exact TLV key. This is a spec
regarding data fragment transmission, and as such should not define
specific TLV keys to be used.
* Interoperability could be achieved by different applications
utilizing the same TLV key and the same data encoding for
transmission.
* A node can send and receive payments that carry data in different
TLV keys. It is the responsibility of the application to send and
listen for data over specific TLV keys.
* It is the responsibility of the sender to transmit fragments that
allow for full data reconstruction on the receiving end.
* Fragments could carry byte ranges that overlap (e.g. two
fragments covering `0-511` and `256-767` overlap in the range
`256-511`).
* A DataSig could accompany a transmitted DataStruct, allowing the
receiving node to verify the data source and destination.
* If DataSig is also included with each fragment, the receiver could
identify reconstruction buffers based not only on `fragset_id` but
the sender's address as well. This means that a node could
simultaneously be receiving two different fragment sets with the
same `fragset_id`, as long as they are originating from different
nodes.
* It is the responsibility of the sender to properly coordinate
simultaneous transmissions to a destination node by using different
`fragset_id` values for each fragment set.
* If the sender uses the HTLCs of an AMP payment to carry the
different fragments, it is not strictly necessary to declare the
`total_size` of the data: the success of the AMP payment itself can
signal reconstruction completion. This shortcut does not apply if the
sender utilizes both AMP and single-path payments for the same data
transmission (i.e. transmits over multiple payments, possibly with
multiple HTLCs on each payment).
* There is a lot of room for optimisations, like signing larger
chunks of data instead of each transmitted fragment. This way fewer
DataSig instances are transmitted, leaving more available space for
the fragment data.
* A working proof of concept that utilizes DataSig and DataStruct
over single path payments can be found here:
https://github.com/GeorgeTsagk/satoshi-read-write
--
George Tsagkarelis | @GeorgeTsag | c13n.io