Steve [ARCHIVE] on Nostr: 📅 Original date posted:2011-09-06 🗒️ Summary of this message: A developer is ...
📅 Original date posted:2011-09-06
🗒️ Summary of this message: A developer is building a node crawler to map out the Bitcoin network and provide useful statistics, seeking input on useful information and connection strategies.
📝 Original message:
Hi All,
I started messing around today with building a node crawler to try and
map out the bitcoin network and hopefully provide some useful
statistics. It's very basic so far, using a mutilated bitcoinj to
connect (due to me being a Java developer and not having a clue about C/C++).
If it's worthwhile I'll hack bitcoinj some more to run on top of Netty to
take advantage of its NIO architecture (Netty has been shown to handle
half a million concurrent connections, so it would be ideal for the purpose).
Hoping to get a bit of input on what would be useful, as well as on a
strategy for getting the maximum possible connections without distorting the data. I
seem to recall Gavin talking about the need for some kind of network
health monitoring so I assume there's a need for something like this...
First, at the moment I'm just storing the version message and the
results of getaddr for each node that I can connect to. Is there any
other useful info that can be extracted from a node that's worth collecting?
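To make concrete what "storing the version message" could capture, here is a minimal Python sketch that decodes a raw `version` payload. It assumes the field layout used by protocol versions 209 and later (version, services, timestamp, two 26-byte network addresses, nonce, user agent string, start height). The names `VersionInfo`, `read_varint` and `parse_version` are made up for illustration and are not bitcoinj APIs.

```python
import struct
from typing import NamedTuple, Tuple


class VersionInfo(NamedTuple):
    protocol_version: int
    services: int
    timestamp: int
    user_agent: str
    start_height: int


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a Bitcoin variable-length integer; return (value, new_pos)."""
    first = buf[pos]
    if first < 0xFD:
        return first, pos + 1
    size, fmt = {0xFD: (2, "<H"), 0xFE: (4, "<I"), 0xFF: (8, "<Q")}[first]
    return struct.unpack_from(fmt, buf, pos + 1)[0], pos + 1 + size


def parse_version(payload: bytes) -> VersionInfo:
    """Extract the fields a crawler might want to store (assumed layout)."""
    version, services, timestamp = struct.unpack_from("<iQq", payload, 0)
    pos = 20        # 4 + 8 + 8 bytes consumed so far
    pos += 26 + 26  # skip addr_recv and addr_from
    pos += 8        # skip the random nonce
    ua_len, pos = read_varint(payload, pos)
    user_agent = payload[pos:pos + ua_len].decode("ascii", errors="replace")
    pos += ua_len
    (start_height,) = struct.unpack_from("<i", payload, pos)
    return VersionInfo(version, services, timestamp, user_agent, start_height)
```

The protocol version, user agent and start height would already give per-node statistics on client versions and on how far each node has synced.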
Second, and the main issue, is how to connect. From my first very basic
probing it seems the vast majority of nodes don't accept incoming
connections, no doubt due to lack of UPnP. So it seems the active crawl
approach is not really ideal for the purpose. Even if it were used, the
resulting data would be hopelessly distorted.
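As a rough illustration of the active crawl approach and why it skews the picture, here is a small Python sketch of a breadth-first crawl. The `probe` callback is hypothetical: it stands in for "connect, send getaddr, return the advertised addresses", and returns `None` when the node does not accept the connection.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional, Set, Tuple

Addr = Tuple[str, int]  # (ip, port)


def crawl(
    seeds: Iterable[Addr],
    probe: Callable[[Addr], Optional[List[Addr]]],
) -> Tuple[Set[Addr], Set[Addr]]:
    """Breadth-first crawl; returns (reachable, unreachable) address sets."""
    queue = deque(dict.fromkeys(seeds))  # de-duplicate seeds, keep order
    seen: Set[Addr] = set(queue)
    reachable: Set[Addr] = set()
    unreachable: Set[Addr] = set()
    while queue:
        addr = queue.popleft()
        advertised = probe(addr)
        if advertised is None:
            # Connection refused or timed out: known only second-hand.
            unreachable.add(addr)
            continue
        reachable.add(addr)
        for peer in advertised:
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return reachable, unreachable
```

Only nodes in `reachable` ever contribute data of their own. Everything in `unreachable` is known purely from what other nodes advertise, and a non-listening node that nobody advertises never shows up at all, which is the distortion described above.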
A honeypot approach would probably be better if there was some way to
make a node 'attractive' to other nodes to connect to. That way it
could capture non-listening nodes as well. If there is some way to
influence other nodes to connect to the crawler node, that solves the
problem. If there isn't, which I suspect is the case, then perhaps
another approach is to build an easy-to-deploy crawler node that many
volunteers could run and that could then upload collected data to a
central repository.
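One way the central repository might combine uploads from many volunteer crawlers is sketched below in Python. Everything here is an assumption made for illustration: the `Observation` record, its `reporter` field and the rule "keep the most recent sighting per (ip, port)" are not from any existing tool.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Observation(NamedTuple):
    ip: str
    port: int
    last_seen: int  # Unix time the reporting crawler last saw the node
    reporter: str   # hypothetical ID of the volunteer crawler


def merge_reports(
    reports: Iterable[Iterable[Observation]],
) -> Dict[Tuple[str, int], Observation]:
    """Combine per-volunteer reports, keeping the freshest sighting per node."""
    merged: Dict[Tuple[str, int], Observation] = {}
    for report in reports:
        for obs in report:
            key = (obs.ip, obs.port)
            current = merged.get(key)
            if current is None or obs.last_seen > current.last_seen:
                merged[key] = obs
    return merged
```

Because each volunteer sees a different slice of the network, the merged view should cover more nodes than any single crawler could, at the cost of having to trust the uploaded data.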
While I'm asking questions I'll add one more regarding the getaddr
message. It seems most nodes return about 1000 addresses in response to
this message. Obviously most of these nodes haven't actually talked to
all 1000 on the list, so where does this list come from? Is it a mixture
of addresses obtained from other nodes, somehow sorted by timestamp?
Does it include some nodes discovered by IRC/DNS? Or are those only used
to find the first nodes to connect to?
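To look into the timestamp question empirically, the `addr` reply to a getaddr can be decoded and sorted. The Python sketch below assumes the entry layout that includes a per-address timestamp (4-byte timestamp, 8-byte services, 16-byte IP, 2-byte big-endian port), preceded by a variable-length count. The function and type names are made up for illustration.

```python
import ipaddress
import struct
from typing import List, NamedTuple, Tuple


class AddrEntry(NamedTuple):
    timestamp: int
    services: int
    ip: str
    port: int


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a Bitcoin variable-length integer; return (value, new_pos)."""
    first = buf[pos]
    if first < 0xFD:
        return first, pos + 1
    size, fmt = {0xFD: (2, "<H"), 0xFE: (4, "<I"), 0xFF: (8, "<Q")}[first]
    return struct.unpack_from(fmt, buf, pos + 1)[0], pos + 1 + size


def decode_ip(raw: bytes) -> str:
    """Render 16 raw bytes as IPv4 when IPv4-mapped, otherwise as IPv6."""
    ip6 = ipaddress.IPv6Address(raw)
    return str(ip6.ipv4_mapped or ip6)


def parse_addr(payload: bytes) -> List[AddrEntry]:
    """Decode an addr payload into a list of entries (assumed 30-byte layout)."""
    count, pos = read_varint(payload, 0)
    entries = []
    for _ in range(count):
        timestamp, services = struct.unpack_from("<IQ", payload, pos)
        raw_ip = payload[pos + 12:pos + 28]
        (port,) = struct.unpack_from(">H", payload, pos + 28)  # port is big-endian
        entries.append(AddrEntry(timestamp, services, decode_ip(raw_ip), port))
        pos += 30
    return entries


def newest_first(entries: List[AddrEntry]) -> List[AddrEntry]:
    """Sort entries so the most recently seen addresses come first."""
    return sorted(entries, key=lambda e: e.timestamp, reverse=True)
```

Comparing the order of entries as received against `newest_first(entries)`, and the timestamps against the time of the request, would show whether the list is sorted and how stale the advertised addresses are.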
Thanks for any input... Hopefully I can build something that's useful
for the network...
Published at
2023-06-07 02:22:04