"Gossip", "Outbox", "Inbox", "Blastr", "Small relays vs Big relays". You've probably ...

This is a long-form article; you can read it at https://habla.news/a/naddr1qqxnzde3xy6rvwpnx56rvdpkqgspwwwexlwgcrrnwz4zwkze8rq3ncjug8mvgsd96dxx6wzs8ccndmcrqsqqqa28jnw7un

“Gossip”, “Outbox”, “Inbox”, “Blastr”, “Small relays vs Big relays”. You’ve probably seen most of these topics come up in conversations and memes recently. You might have even read hodlbod (nprofile…u0w6)’s article (naddr1qq…a9ny), or Mike Dilger ☑️ (nprofile…442w)’s very technical article (naddr1qq…6usv), or maybe even fiatjaf (nprofile…cy6q)’s one with the clickbaity title (naddr1qv…cqx3).

These are all great, and you should go and read them too. But one thing that each of them is guilty of is assuming that the audience has enough context to jump into the middle of a very nuanced and technical discussion. I’ve yet to see a clear description of what Gossip or Outbox really are and certainly none that are written in a way that is approachable for a non-technical audience. I hope this article can give you that context and serve as a high-level overview of the discussion and the technologies involved so that you can go forth to create better, more informed memes (is there anything more important, really?) and join the discussion in a productive way.

The problems

Centralization is the default

First off, why are we even talking about relays? Most of us know that nostr is a protocol which is designed to be decentralized from the ground up. That decentralization is due in large part to the fact that users read and write data to multiple relays by default. So instead of all your data going to a centralized database (usually owned and operated by a single company), with nostr you have the ability to send your data to as many relays as you’d like (relays are just databases, after all). Likewise, you can read other people’s data from as many relays as you’d like to. Decentralized design = decentralized system, right!? Well, turns out, no.

The problem with a design like this is that what can be done and what will be done are often very different things. Imagine the following scenario: You follow 1000 people; 700 of them post their notes to the Primal relay, the other 300 post their notes to the Damus relay. If you don’t also write your content to those two relays, the people that you care about won’t see your content, they won’t see your replies to their notes, they won’t even know you exist. So while, in practice, it’s easy to read & write to many different relays, users and their content will tend to centralize because it’s the path of least resistance to a good user experience. Network effects and economies of scale always apply; they just don’t always lead to the outcomes that you want.

Before you start to panic, this sort of centralization isn’t a huge issue just yet. We are still a nascent network and there are still hundreds of active relays out there. Almost all nostr clients make relay selection a first class citizen and don’t default new users to a single relay. The fact that we’re even having this conversation in a serious way at this stage is a great sign that the community cares enough to be proactive about maintaining (and improving) decentralization over time.

That said, this is not an issue that we can take lightly either. The top 5-10 relays do already have an outsized percentage of users and we have many examples of these centralizing tendencies across dozens of protocols and networks over the last 30 years, so the threat is real.

The status quo is wasteful

The other major issue is that currently most nostr clients are extremely wasteful in how they fetch data. The default is to simply get everything from all the relays a user wants to connect to. Because most of us are writing the same data to many relays, this leads to huge amounts of duplicated data being downloaded, having its signatures verified, and then (for the most part) thrown away. For those of us with latest generation smartphones, unlimited data, and a working power outlet nearby, this is fine. For everyone else, this is a major roadblock to adopting nostr.
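To make the waste concrete, here is a minimal sketch that counts how many downloaded events turn out to be duplicates when the same notes come back from several relays. The relay URLs and event ids are made up for illustration, and the function name `dedupe_events` is hypothetical; a real client would also verify each event's signature, which is skipped here.

```python
def dedupe_events(responses):
    """Merge events returned by several relays, keyed by event id.

    `responses` maps a relay URL to the list of events it returned.
    Returns (unique_events, wasted), where `wasted` is the number of
    downloaded events that were duplicates of one already seen.
    """
    seen = {}
    downloaded = 0
    for relay_url, events in responses.items():
        for event in events:
            downloaded += 1
            # Keep only the first copy of each event id.
            seen.setdefault(event["id"], event)
    return list(seen.values()), downloaded - len(seen)


# Hypothetical responses: three relays, mostly holding the same notes.
responses = {
    "wss://relay-a.example": [{"id": "a"}, {"id": "b"}],
    "wss://relay-b.example": [{"id": "a"}, {"id": "b"}],
    "wss://relay-c.example": [{"id": "a"}, {"id": "c"}],
}
unique, wasted = dedupe_events(responses)
print(len(unique), "unique events,", wasted, "duplicates thrown away")
```

In this toy case half of what was downloaded is discarded, and the ratio gets worse as more relays hold the same notes.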

A lightly technical aside

There are a few important features of nostr that make any sort of intelligent data fetching possible. To understand how any of the potential solutions to the aforementioned problems would actually work, it’s important to have a baseline understanding of these technical details. I promise, I’ll keep it high level.

Relay lists

Recently the concept of Relay List Metadata has been introduced to the spec in NIP-65 (there are also other types of relay lists documented in NIP-51). This is a nostr list event where users publish their preferred relays, each with a marker that says whether the relay is read-only, write-only, or read & write. This vastly simplifies the process of finding a user’s preferred relays for clients, and I imagine that this will become the de facto standard place to look for them.
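As a rough illustration, a client could split a NIP-65 relay list into read and write relays like this. NIP-65 uses a kind 10002 event with "r" tags, where a missing marker means the relay is used for both reading and writing. The relay URLs below and the helper name `parse_relay_list` are assumptions for the example, not part of any particular client.

```python
def parse_relay_list(event):
    """Split a NIP-65 relay list event into read and write relay URLs."""
    if event.get("kind") != 10002:
        raise ValueError("not a NIP-65 relay list event")
    read, write = [], []
    for tag in event.get("tags", []):
        if len(tag) < 2 or tag[0] != "r":
            continue  # ignore anything that is not a relay tag
        url = tag[1]
        marker = tag[2] if len(tag) > 2 else None
        # No marker means the relay is used for both reading and writing.
        if marker in (None, "read"):
            read.append(url)
        if marker in (None, "write"):
            write.append(url)
    return {"read": read, "write": write}


# Hypothetical relay list; the URLs are placeholders.
relay_list = {
    "kind": 10002,
    "tags": [
        ["r", "wss://both.example"],
        ["r", "wss://inbox.example", "read"],
        ["r", "wss://outbox.example", "write"],
    ],
    "content": "",
}
print(parse_relay_list(relay_list))
```

The "write" list is where a client would look for this user's notes; the "read" list is where the user expects to find notes addressed to them.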

NIP-05

The NIP-05 spec also documents a way for users to signal their preferred relays. However, unlike the NIP-65 relay list, this is a simple list of relays without any details on whether the user reads or writes to those relays.
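For comparison, here is a small sketch of reading relays out of a NIP-05 document. NIP-05 serves a JSON file at `/.well-known/nostr.json` containing a "names" map and an optional "relays" map keyed by pubkey. The name, pubkey, and URLs below are placeholders, and `relays_for_name` is a hypothetical helper; no network request is made here.

```python
def relays_for_name(nostr_json, name):
    """Look up a name in a NIP-05 document and return (pubkey, relays).

    The "relays" key is optional, so an empty list is returned when the
    document gives no relays for that pubkey.
    """
    pubkey = nostr_json.get("names", {}).get(name)
    if pubkey is None:
        return None, []
    return pubkey, list(nostr_json.get("relays", {}).get(pubkey, []))


# Hypothetical document as it might be served for the name "bob".
doc = {
    "names": {"bob": "b0b0...placeholder"},
    "relays": {
        "b0b0...placeholder": ["wss://relay.example.com", "wss://relay2.example.com"]
    },
}
print(relays_for_name(doc, "bob"))
```

Note that the result is a flat list: nothing in it says whether Bob reads from or writes to each relay.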

Tag markers

Tag markers are positional elements in tags that give the client publishing the event the ability to leave a hint as to which relay other clients can expect to find a given user or note that is being referenced.

For example, in a user’s contact list (a kind: 3 event) you have many "p" tags to reference all the users that are followed. These tags look like this:

{
  "kind": 3,
  "tags": [
    ["p", "91cf9..4e5ca", "wss://alicerelay.com/", "alice"],
    ["p", "14aeb..8dad4", "wss://bobrelay.com/nostr"],
    ["p", "612ae..e610f"]
  ],
  "content": "",
  ...other fields
}

All three of these "p" tags are valid (only the "p" and the user’s pubkey are required). In the first and second, you can see that the third item is a relay where the user’s events can be found.

These types of tag markers are repeated all throughout nostr. Here are a few more example references, first to an event and then to a long-form article:

["e", "b3e392b11f5d4f28321cedd09303a748acfd0487aea5a7450b3481c60b6e4f87", "wss://relay.example.com"],

["a", "30023:a695f6b60119d9521934a691347d9f78e8770b56da16bb255ee286ddf9fda919:ipsum", "wss://relay.nostr.org"]

As you can imagine, these hints can be very helpful but only if clients actually attempt to fetch the content from the referenced relay.
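A minimal sketch of how a client might collect these hints, assuming the tag layout shown above (the reference in the second position, the relay in the third). The function name `collect_relay_hints` is hypothetical, and the tags are the abbreviated ones from the contact list example.

```python
def collect_relay_hints(tags):
    """Map each referenced pubkey, event id, or address to its relay hint.

    Only "p", "e", and "a" tags are considered, and tags without a
    (non-empty) third element are skipped because they carry no hint.
    """
    hints = {}
    for tag in tags:
        if len(tag) >= 3 and tag[0] in ("p", "e", "a") and tag[2]:
            hints[tag[1]] = tag[2]
    return hints


tags = [
    ["p", "91cf9..4e5ca", "wss://alicerelay.com/", "alice"],
    ["p", "14aeb..8dad4", "wss://bobrelay.com/nostr"],
    ["p", "612ae..e610f"],
]
for reference, relay in collect_relay_hints(tags).items():
    print(reference, "->", relay)
```

The third pubkey has no hint, so a client would have to fall back on another source, such as that user's relay list.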

The solutions?

Now that you understand the problem space a bit better let’s define those terms we started with.

Blastr

Blastr was created by Deleted Account (nprofile…wkeh) and benthecarman (nprofile…puwc) from Mutiny and isn’t a relay. Instead, Blastr is a proxy (i.e. it looks like a relay to clients) that ingests any event sent to it and, in turn, sends those events to EVERY online relay on the network. It’s a mass re-broadcaster for events that can be helpful to get your data pushed out to as many relays (and thus users) as possible. The drawback, of course, is that this is tremendously wasteful from a bandwidth and storage perspective.

Gossip (in 3 flavors)

This is by far the most confusing part for most people when watching the memes of the last few weeks fly by and I’ve seen a lot of confused takes out there. Most of the confusion stems from the multiplicity of definitions of what the “gossip model” actually is. Let’s go through the options.

Gossip protocols: This is a general concept more than a specific implementation. Gossip protocols are protocols that attempt to spread information around a network in a uniform way. For example, Bitcoin nodes use a variation of the gossip protocol to make sure that transactions end up in as many mempools as possible. This is important in computing when you want to reach consensus or when all nodes in a network need to have the same information in order to operate the network. Since nostr doesn’t have any consensus rules or shared compute, it’s somewhat pointless to try and make sure all events are propagated to all relays (hence the limited usefulness of Blastr).
The Gossip client from Mike Dilger ☑️ (nprofile…442w): This is a nostr client that was built from the ground up to use relays and relay hints in events to the fullest, keeping things as decentralized as possible while being efficient in how much data it fetches. Mike has a great (slightly outdated and very technical) video that talks about his motivation behind building Gossip in the way he did. It’s worth a watch. video link
Gossip model: This is what people are usually referring to when they are talking about relays on nostr. The Gossip model is a loose, catch-all term used to refer to all the ways in which clients attempt to understand which relays they should read & write to for a given user. Again, this isn’t really a specific spec or implementation but encompasses many different strategies. This vagueness inherent in the term makes discussions about the “gossip model” pretty imprecise and prone to misunderstanding.

Don’t gossip

To be clear: You really should not be thinking about or talking about any of this as the “gossip model” since that definition is so abstract as to be unusable. Which brings us finally to the real topic being discussed at the moment among devs: the Outbox model.

Outbox/Inbox model

This is the real topic of conversation right now: How should client developers build relay discovery and selection features into their apps? As we already talked about, if left alone, it’s likely that we’d unintentionally centralize most of nostr onto a few huge relays. So it’s really critical that we encourage client developers (and build sensible defaults to help them) to treat relay discovery and selection properly.

Right now, the discussion centers around one main approach, called the “Outbox model”. There is also an “Inbox model”, which is still just a high-level idea that I’ll mention below, but it’s not being implemented yet (as of late March 2024).

The “Outbox model”: This strategy looks at what relays users are using to publish their events (from relay lists and tag markers) and then uses an algorithm to decide how to fetch all the needed events from the array of relays. Different implementations can use different algorithms to select relays. For example, one implementation might optimize fetching events from the smallest number of relays (favoring large relays), while another might optimize for fetching from the relays that have the smallest user overlap (favoring small relays).
The “Inbox model”: As you can imagine, this strategy flips the outbox model on its head. It’s so far just an idea (proposed by Alex Gleason (nprofile…aufq)) and a draft NIP but the idea is that when your client posts on your behalf, it will loop over your entire follow list, making a list of the relays that each of your followers uses to read events. Then the client will publish your event to all of those relays. If all clients followed this paradigm, then each of us would only have to read from a single relay. To quote Alex Gleason (nprofile…aufq)’s original post, “This doesn’t take away from the outbox approach, and maybe should even be used together instead of as a replacement. But my point is that clients should be trying harder to deliver posts instead of just fetch posts. Because it benefits users when their posts can be seen.”
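To give a feel for what an Outbox-style algorithm could look like, here is a minimal sketch of one possible strategy: greedily picking the relay that covers the most not-yet-covered authors, which favors fetching from the smallest number of relays. This is only one illustrative choice, not the algorithm used by any specific client; the function name `plan_fetches`, the user names, and the relay URLs are all assumptions.

```python
def plan_fetches(write_relays):
    """Greedy relay selection for an Outbox-style fetch plan.

    `write_relays` maps each followed author to the set of relays they
    publish to (for example, taken from relay lists and tag markers).
    Returns a dict of relay URL -> set of authors to request from it.
    Authors with no known relays are left out of the plan.
    """
    by_relay = {}
    for author, relays in write_relays.items():
        for relay in relays:
            by_relay.setdefault(relay, set()).add(author)

    uncovered = {author for author, relays in write_relays.items() if relays}
    plan = {}
    while uncovered:
        # Pick the relay covering the most uncovered authors;
        # sorting first makes ties resolve alphabetically.
        best = max(sorted(by_relay), key=lambda r: len(by_relay[r] & uncovered))
        covered = by_relay[best] & uncovered
        plan[best] = covered
        uncovered -= covered
    return plan


# Hypothetical follow list with each author's write relays.
follows = {
    "alice": {"wss://big.example", "wss://alice.example"},
    "bob": {"wss://big.example"},
    "carol": {"wss://small.example"},
}
for relay, authors in plan_fetches(follows).items():
    print(relay, "->", sorted(authors))
```

Here two connections are enough to reach all three authors, and Carol is still found even though she only publishes to a small relay. A client that wanted to favor small relays instead could swap the selection rule, for example by preferring relays with the least overlap.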

Why the Outbox model has broad support

To understand why implementing an Outbox model is so powerful at solving the problems laid out at the beginning of this article, you can do two quick thought experiments:

A user banned from all other relays

Imagine a user whose content is banned from all public relays for some reason. Instead, they have to run their own relay and publish their events there. With a simple “follow the major relays” model (or even with Blastr attempting to copy and paste their events to every other relay) this user is completely invisible to the network. Users would have to know about the banned user’s private relay and select that relay in each client they use. What’s more, if that relay ever had to change URL (likely for a user that is banned so broadly), all users would need to know what the new URL is in order to change relays and fetch events from the new location.

With Outbox, however, clients will see that their users follow this user, will then look up this user’s relay list, and will know where they need to go to fetch their events. It becomes much more difficult for relays to censor or block users with an Outbox model in place.

Duplicate event fetching

We talked earlier about how many nostr clients often fetch the same data many times, just to throw that data away. Even using the simplest algorithms with an Outbox model, you can significantly reduce the amount of duplicate data you’re fetching from relays. We, as users, would all also need to specify far fewer relays in our relay lists but could still be quite sure our clients would be able to find all the content we want to see.

Wrapping up

Hopefully this has given you a better overall understanding of what folks are talking about when they refer to Gossip (remember: don’t refer to it this way) or Outbox (outbox, yay!) and why we need to be proactive about maintaining the decentralization of nostr.

JeffG on Nostr: "Gossip", "Outbox", "Inbox", "Blastr", "Small relays vs Big relays". You've probably ...