2025-05-15 02:19:00

I humbly submit a draft strategy that relays could use to protect themselves, which is hopefully better than the throttling currently used, and should increase the overall health of the Nostr network if adopted. Your feedback would be much appreciated! 🙏

Bottom-up Load-balancing for Nostr Relays

Goal: protect your relay by distributing clients more evenly across all relays without top-down coordination.

The problem

Nostr relays are a public good with the costs borne by benevolent volunteers running them. So far this system has been robust, supporting tens of thousands of active users per month. There is some low-key evidence of strain on these servers (see below), and overload could become a bigger problem as the network grows and attracts spammers, griefers, and other bad actors. It could become expensive to run the most popular relays even under ideal conditions with conscientious users.

The main strategies relays currently employ to mitigate overload are a) throttling requests, b) pre-set PoW, and c) authentication of some form, with throttling being common on popular servers. Throttling is an opaque strategy: client apps only receive the kill signal once they have already been throttled. Pre-set PoW excludes the many clients that don’t implement it, and authentication is centralizing.

The current strategy clients employ for selecting relays is to ship a set of default relays and let users modify those defaults manually in the UI. This results in natural aggregation of clients on the more popular relays, which concentrates load on a smaller number of relays and is centralizing.

A solution

A solution that helps both clients and relays is to make the relay overload signal legible.

Our proposal is simple:

  1. Relay software measures common utilization metrics (CPU, memory, disk) as load percentages.
  2. Overloaded relays publish their current load expressed as a proof-of-work requirement (NIP-11).
  3. Clients, upon seeing the proof-of-work requirement, can do the PoW or switch relays (NIP-13, NIP-01).

This sets up a basic PoW market where “prices” help “consumers” decide where to allocate their “capital” (events with proof-of-work tokens). The likely effect is that clients will automatically re-distribute their traffic away from “expensive” overloaded relays towards “cheap” underloaded relays, making the entire network healthier.

The new thing here is tying PoW to load. Existing NIPs cover most of the PoW part of the solution: NIP-11 covers publishing of PoW requirements by relays, while NIP-13 and NIP-01 cover clients performing PoW and relays reacting to it.

A pleasant side-effect of using PoW as the load signal is that would-be spammers are forced to pay an actual cost in the form of energy expenditure and hardware.

How to participate

Participation is opt-in. Any relay or client can participate in this scheme if they want to try it out, or ignore it if they don’t. It doesn’t require large scale changes to the protocol.

Relay implementation

To participate in this protocol relays would make these changes:

  1. Collect load metrics.
  2. Publish PoW requirements (NIP-11).
  3. Check PoW on incoming client events (NIP-13, NIP-01).

Collecting load metrics is straightforward. Relays can collect CPU loadavg, memory usage (averaged), and disk utilization as percentages. As a starting point, we suggest the final load percentage be max(cpu%, mem%, disk%). The reason for using max() is that problems arise if any single resource is exhausted, so we should treat the most exhausted resource as the actual load value.
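
For illustration, a minimal sketch of that calculation in Python, assuming the relay host exposes its metrics through the psutil library (any metrics source would do; the function name is a placeholder):

    # Minimal sketch: reduce host metrics to a single load percentage.
    # psutil is an assumption here; any metrics source works.
    import psutil

    def current_load_percent() -> float:
        cpu = psutil.cpu_percent(interval=1)    # CPU utilization sampled over 1 second
        mem = psutil.virtual_memory().percent   # memory in use
        disk = psutil.disk_usage("/").percent   # disk utilization of the data volume
        # Rely on the most exhausted resource: trouble starts as soon as any one runs out.
        return max(cpu, mem, disk)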

The required PoW can be calculated (see the section below) from a sensible starting point for tolerable load, the zero-PoW-load point, plus a max-bits difficulty specified for the 100% load point. The zero-PoW-load value is the load percentage up to which the relay is comfortable offering connections “free”, without requiring any PoW. This is the same situation as today, with clients able to freely connect to most relays. Once the zero-PoW-load is reached, the server starts publishing the number of bits of PoW required using NIP-11, and the required PoW scales up with the load experienced. Of course, zero-PoW-load can be set low to facilitate early PoW market signals.
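
As a sketch of the publishing side, the relay could simply refresh its NIP-11 relay information document whenever the computed requirement changes. The limitation.min_pow_difficulty field is the NIP-11 field for this; the other values below are placeholders:

    # Sketch: build a NIP-11 relay information document carrying the current
    # PoW requirement. Only limitation.min_pow_difficulty matters here.
    import json

    def relay_info_document(pow_bits: int) -> str:
        info = {
            "name": "example-relay",            # placeholder metadata
            "description": "PoW scales with server load",
            "supported_nips": [1, 11, 13],
            "limitation": {
                # Clients read this before publishing and either mine the PoW
                # (NIP-13) or move to a cheaper relay.
                "min_pow_difficulty": pow_bits,
            },
        }
        return json.dumps(info)

Per NIP-11, relays serve this document over HTTP on the same address as the websocket to requests carrying an Accept: application/nostr+json header, so clients can check it before deciding where to publish.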

New metrics that relays care about can be added to the calculation in future without clients needing to care about them. Other metrics could be used, such as the percentage of available TCP/websocket connection slots in use, or actual server rental costs versus the maximum cost a volunteer will bear.

Validating PoW on incoming client events is something that has already been discussed and implemented in relays. See NIPs 11, 13, and 01.
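
A sketch of the check itself, assuming events arrive as parsed JSON dictionaries; per NIP-13 the difficulty is the number of leading zero bits in the event id:

    # Sketch: NIP-13 difficulty check on an incoming event.
    def leading_zero_bits(event_id_hex: str) -> int:
        # The event id is the sha256 of the serialized event: 256 bits, hex-encoded.
        return 256 - int(event_id_hex, 16).bit_length()

    def accept_event(event: dict, pow_bits_required: int) -> bool:
        # Reject events that do not carry enough proof of work for the current load.
        # Stricter relays can also check the target committed in the "nonce" tag
        # (NIP-13) to rule out accidental PoW.
        return leading_zero_bits(event["id"]) >= pow_bits_required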

PoW Required Calculation

  • zero_pow_load is a setting for the minimum load percentage where PoW should kick in with 1 bit of PoW. 80% could be a sensible starting point. Operators of popular relays can use their own historical data to determine this.
  • max_bits is set to the highest PoW clients will have to do as server load approaches 100%. This can be calibrated to some value like 1 hour of PoW on an average modern device. Clients are not expected to actually perform this much work, but it sets the scale, and it should prevent highly resourced spammers from driving any relay to 100% utilization.
  • current_load is the computed maximum of all of the load metric percentages gathered.

Then the formula for calculating the required PoW bits is:

pow_bits_required = max_bits * max(0, (current_load - zero_pow_load) / (100 - zero_pow_load))

Since PoW cost grows exponentially with the number of bits (each bit is twice as hard as the last), a linear mapping from load to bits may need some scaling function to smooth it out.
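
A sketch of this mapping, with max_bits calibrated from an assumed single-device hash rate (the ~20 MH/s figure below is illustrative, not a benchmark):

    # Sketch: the load-to-PoW mapping described above.
    import math

    def calibrate_max_bits(hashes_per_second: float = 2e7, target_seconds: float = 3600) -> int:
        # n bits of difficulty costs about 2**n hashes on average, so one hour at
        # an assumed ~20 MH/s is ~7.2e10 hashes, i.e. roughly 36 bits.
        return round(math.log2(hashes_per_second * target_seconds))

    def pow_bits_required(current_load: float, zero_pow_load: float = 80.0, max_bits: int = 36) -> int:
        if current_load <= zero_pow_load:
            return 0
        fraction = min(1.0, (current_load - zero_pow_load) / (100 - zero_pow_load))
        # A linear ramp in bits is exponential in work; a gentler curve (e.g. the
        # square root of the fraction) could be substituted here to smooth it.
        return round(max_bits * fraction)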

Client implementation

These are the changes clients would make to participate in this protocol.

  1. Select relays based on PoW (favour PoW-cheap + known reliable relays).
  2. Do PoW if connecting to overloaded servers (NIP-11 and NIP-13).

Selecting relays for the user then might mean keeping a larger list of potential default relays and selecting a subset at setup time based on PoW requirements. It might also mean actively monitoring for high-PoW relays and switching away if a relay is frequently expensive.
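
One way a client could rank candidates, assuming it has already fetched each relay’s NIP-11 requirement and keeps its own reliability flag (both pow_bits and reliable are assumed inputs, not part of any NIP):

    # Sketch: choose a publish set favouring known-reliable, PoW-cheap relays.
    def select_relays(candidates: dict, wanted: int = 4) -> list:
        # candidates maps relay URL -> {"pow_bits": int, "reliable": bool},
        # gathered elsewhere by the client.
        ranked = sorted(
            candidates.items(),
            key=lambda item: (not item[1]["reliable"], item[1]["pow_bits"]),
        )
        return [url for url, _ in ranked[:wanted]]

With this ordering a known-reliable relay asking for a few bits still outranks an unknown relay asking for none; swapping the key terms would instead prioritize cheapness over reliability.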

It is in a client’s best interest to select a low-PoW, underloaded set of relays to publish to, whilst still favouring known-reliable relays. If the whole network becomes loaded, then PoW acts as a deterrent for non-critical use and spammers. Only the most committed clients and users will participate. It can also be a transparent signal to the community that more relays are required.
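
For clients that decide to pay the cost rather than switch, NIP-13 mining amounts to bumping a nonce tag until the event id (the sha256 of the NIP-01 serialization) has enough leading zero bits. A rough sketch over an unsigned event dictionary:

    # Sketch: NIP-13 mining of an unsigned event (sign and publish afterwards).
    import hashlib, json

    def mine_event(event: dict, target_bits: int) -> dict:
        nonce = 0
        while True:
            tags = [t for t in event["tags"] if t[0] != "nonce"]
            tags.append(["nonce", str(nonce), str(target_bits)])
            serialized = json.dumps(
                [0, event["pubkey"], event["created_at"], event["kind"], tags, event["content"]],
                separators=(",", ":"), ensure_ascii=False,
            )
            event_id = hashlib.sha256(serialized.encode()).hexdigest()
            if 256 - int(event_id, 16).bit_length() >= target_bits:
                event["tags"], event["id"] = tags, event_id
                return event
            nonce += 1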

Addenda

So that’s the main specification of the scheme. The following are related addenda.

Evidence of strain

Some random low-key evidence of strain on relays, and worries about the costs of running them.

note1kp5…7hkk

note19cp…9dm4

note16vm…u5wv

TODO: sample some nostr events/users randomly from the firehose and get stats on relay centralization.

More weird PoW ideas

One thing not covered in NIPs is PoW-on-connect which could help in future if there are malicious clients camping on websockets.

Another idea is building PoW into npubs, similar to vanitygen, which would put some skin-in-the-game onto users when creating keys. Some relays may choose only to service high PoW npubs that have proven their commitment to the network and protocol.

On resource rationing

Some ways to ration unexpectedly demanded goods in an emergency:

(1) market prices (“price gouging”)
(2) waiting in line
(3) centrally planned rationing
(4) don’t ration: just let the resource run out

Nick Szabo on Twitter

Market prices are a least-worst business-as-usual option.


https://njump.me/naddr1qvzqqqr4gupzpk5nry54wj2lkk0kauwwr8n5j3as0y4ka2smzd9qzhqrymjqjlg6qq5kymm5w3hk6tt4wqkkcmmpvskkyctvv9hxx6twvukkvmmj94hx7um5wgkhyetvv9uhxrarfd7
Author Public Key
npub1m2f3j22hf90mt8mw788pne6fg7c8j2mw4gd3xjsptspjdeqf05dqhr54wn