Utxo set analysis time!
I have found a "toolchain" to extract taproot utxo pubkeys that's at least *reasonably* efficient - more on that at the end, for the engineers.
But here's an analysis of a snapshot of the whole 167M utxo set as of 16th March 2024.
Of the 167M utxos, a full 39M are taproot (in other words, about 1 in 4 of all the individual "bits of bitcoin" that exist in our global consensus, are taproot - but not 1 in 4 *bitcoins*, i.e. not by value)!
Of that 39M, 33M are *sub 1000 sats*, i.e. basically dust or near dust. Pretty obviously, these will be "data carrying" type (probably ordinals stuff? sorry I don't know the details). Here's a rough breakdown of the taproot outputs in the utxo set by value:
Amount in sats          Number of utxos (taproot only)
> 5 million             51674
> 2.5 million           81512
> 1 million             154130
> 500k                  238060
> 250k                  352235
> 100k                  800843
> 50k                   1043038
> 25k                   1333547
> 10k                   2853756
> 1000                  6084116
> 100 (i.e. ~all)       39034007
This will not be news to most. IMO taproot *economic* usage only picks up when Lightning implementations start using it; there is only fairly limited other incentive, for now.
For my taproot based 'anonymous usage project' (see recent posts), a filter of about 500k sats makes sense to me: that leaves an anon set of roughly 250k utxos (238060 in the table above), which is pretty decent, though as we've seen, we can definitely support much larger sets.
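Since the table counts are cumulative ("more than X sats"), it can help to see them split into bands. Here is a rough Python sketch that does that arithmetic on the numbers above; the function name `per_band` is just made up for illustration, and the only data used is what is already in the table.

```python
# Cumulative counts copied from the table: threshold in sats -> taproot utxos above it.
CUMULATIVE = {
    5_000_000: 51674,
    2_500_000: 81512,
    1_000_000: 154130,
    500_000: 238060,
    250_000: 352235,
    100_000: 800843,
    50_000: 1043038,
    25_000: 1333547,
    10_000: 2853756,
    1_000: 6084116,
    100: 39034007,
}


def per_band(cumulative):
    """Turn 'more than X' counts into counts per band between adjacent thresholds."""
    bands = {}
    previous = 0
    # Walk from the highest threshold down, subtracting what was already counted.
    for threshold in sorted(cumulative, reverse=True):
        bands[threshold] = cumulative[threshold] - previous
        previous = cumulative[threshold]
    return bands


bands = per_band(CUMULATIVE)
total = CUMULATIVE[100]

# The lowest band (above 100 but not above 1000 sats) is the dust-like group.
print("dust-like taproot utxos:", bands[100])
print("share of all taproot utxos: %.1f%%" % (100 * bands[100] / total))

# Anon set size for a 500k sat filter is just the cumulative row.
print("anon set at 500k sats:", CUMULATIVE[500_000])
```

The lowest band is where the "33M sub 1000 sats" figure comes from, and the 500k row is the anon set size for the filter discussed above.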
About the "toolchain".
Step 1 is to run the dumptxoutset RPC call against Core. This currently writes out an 11GB file covering the 167 million coins mentioned above, so be aware if your setup is size constrained.
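For anyone scripting Step 1, a minimal Python sketch of the call might look like this. It assumes `bitcoin-cli` is on your PATH and can already talk to a synced node; the helper names and the output filename are made up for illustration. Newer Core releases may expect an extra snapshot-type argument, so check `bitcoin-cli help dumptxoutset` for your version first.

```python
import subprocess


def dumptxoutset_command(output_path, cli="bitcoin-cli"):
    """Build the argv list for the dumptxoutset RPC call (path only; an assumption)."""
    return [cli, "dumptxoutset", output_path]


def run_dump(output_path):
    """Run the dump against a local node; blocks until Core has written the file."""
    result = subprocess.run(
        dumptxoutset_command(output_path),
        check=True,           # raise if bitcoin-cli reports an error
        capture_output=True,
        text=True,
    )
    return result.stdout      # JSON summary printed by bitcoin-cli


# Usage on a machine with a running node (writes a file of roughly 11GB):
# print(run_dump("utxos.dat"))
```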
Step 2 is to parse the custom format of this data set. (I originally assumed it was Level DB, but as far as I can tell it's Core's own compact serialization of the coins; Level DB is what the node's chainstate directory uses.) I found the easiest way was to run this useful tool: https://github.com/theStack/utxo_dump_tools/ against the file created by Step 1. This creates a sqlite database with intelligible columns in the 'utxos' table (like 'value', 'scriptpubkey').
Step 3: I wrote a primitive Python script to do a SELECT scriptpubkey FROM utxos WHERE value >= ? AND scriptpubkey LIKE '5120%' .. something along those lines.
I can directly take the output of Step 3 as input to the aut-ct tool I've been talking about recently to create tokens.
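Here is a minimal sketch of what that filtering script could look like; it is not the exact script I ran. It assumes the table is called 'utxos' and that 'scriptpubkey' is stored as hex text. The extra length check (68 hex chars = '5120' plus a 32-byte key) is an addition on top of the LIKE filter, to drop anything that merely starts with those bytes. The demo rows are invented so the example runs without the real database.

```python
import sqlite3

TAPROOT_PREFIX = "5120"      # OP_1 followed by a 32-byte push
TAPROOT_SPK_HEX_LEN = 68     # 2 prefix bytes + 32-byte x-only key, hex-encoded


def taproot_pubkeys(conn, min_value_sats):
    """Return x-only pubkeys (hex) of taproot utxos worth at least min_value_sats."""
    query = (
        "SELECT scriptpubkey FROM utxos "
        "WHERE value >= ? AND scriptpubkey LIKE ? AND length(scriptpubkey) = ?"
    )
    params = (min_value_sats, TAPROOT_PREFIX + "%", TAPROOT_SPK_HEX_LEN)
    # Strip the '5120' prefix so only the 32-byte key remains.
    return [spk[len(TAPROOT_PREFIX):] for (spk,) in conn.execute(query, params)]


def demo_connection():
    """Build a tiny in-memory stand-in for the real database (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE utxos (value INTEGER, scriptpubkey TEXT)")
    conn.executemany("INSERT INTO utxos VALUES (?, ?)", [
        (600_000, "5120" + "ab" * 32),   # taproot, above a 500k filter
        (546, "5120" + "cd" * 32),       # taproot, dust-sized
        (900_000, "0014" + "ef" * 20),   # P2WPKH, not taproot
    ])
    return conn


if __name__ == "__main__":
    # Against the real data you would use sqlite3.connect("<path to the sqlite file>").
    for key in taproot_pubkeys(demo_connection(), 500_000):
        print(key)
```

Printing one key per line keeps the output easy to feed into whatever consumes it next; adjust to the input format your tool expects.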