Why Nostr? What is Njump?
2023-06-07 18:28:21
in reply to

Luke Kenneth Casson Leighton [ARCHIVE] on Nostr: šŸ“… Original date posted:2021-02-03 šŸ“ Original message:(hi folks do cc me, i am ...

šŸ“… Original date posted:2021-02-03
šŸ“ Original message:(hi folks do cc me, i am subscribed digest, thank you for doing that,
ZmnSCPxj)

On Wednesday, February 3, 2021, ZmnSCPxj <ZmnSCPxj at protonmail.com> wrote:
> Good morning Luke,
>
> I happen to have experience designing digital ASICs, mostly pipelined
data processing.
> However my experience is limited to larger geometries and in
SystemVerilog.

larger geometries for a hardware wallet ASIC is ok (as long as it is not
retail based and trying to run e.g. RSA, taking so long to complete that
the retail customer walks out)

> On the technical side, as I understand it (I have been out of that
industry for 4 years now, so my knowledge may be obsolete)

not at all! still very valuable

> as you approach lower geometries, you also start approaching analog
design.

yyeah i could intuitively tell/guess there might be something like this
which would throw a spanner in the works, it is why the grant request i put
in specifically excluded data-dependent constant time analysis and also
power analysis.


> In our case we were already manually laying out gates and flip-flops (or
replacing flip-flops with level-triggered latches and being extra careful
with clocks) to squeeze performance (and area) ...

ya-howw :)


> Many of the formal correctness proofs were really about the formal
equivalence of the netlist to the RTL; the correctness of the RTL was
"proved" by simulation testing.

thanks to Symbiyosys we are using formal proofs much more extensively, as
effectively a 100% coverage replacement for unit tests.

an example is popcount. we did two versions. one is a recursive tree
algorithm, almost impossible to read and understand what the hell it does.

the other is a total braindead 1-liner "x = x + input[i]", rubbish
performance though.

running a formal proof on these gave us 100% confidence that the complex
optimised version does the damn job.


yes we still do unit tests, these are more "demo code".

now, the caveat is that you have to have a model of the "dut" (device under
test) against which to compare, and if the dut is ridiculously complex then
the formal model variant, which has to do the same job, ends up equally as
complex (or effectively a duplicate of the dut) and the exercise is a bit
of a waste of time...

...*unless*... there happens to be other implementations out there. then
the proof can be run against those and everybody wins through collaboration.



now, here's why i put in the NLnet Grant request to explore going back to
the mathematics of crypto-primitives.

many ISAs e.g. intel AVX2 have added GFMULT8 etc etc because that does
S-Boxes for Rijndael. they have gone mad by analysing algorithms trying to
fit them to standard ISAs.

nobody does Rijndael S-Boxes any way other than 256-entry lookup tables
because no standard ISA has general-purpose Galois Field Multiply.

consequently implementations in assembler get completely divorced from the
original mathematics on which the cryptographic algorithm was based.

the approach i would like to take is, "hang on a minute: how far would you
get if you actually added *general-purpose* instructions that *directly*
provided the underlying mathematical principles, and then wrapped a
Vector-Matrix Engine around them?".

would this drastically simplify algorithms to the point where *READABLE* c
code compiles directly to opcodes that run screamingly fast, outperforming
hand-optimised SIMD code using standard ISAs?

then, given the Formal Correctness approach above, can we verify that the
mathematically-related opcodes do the job?


> (to be fair, there were tools to force you to improve coverage by
injecting faults to your RTL, e.g. it would virtually flip an `&&` to an
`||` and if none of your tests signaled an error it would complain that
your test coverage sucked.)

nice!

> Things might have changed.

nah. this is such a complex area, run by few incumbent players, that
innovation is rare. not least, innovation is different and cannot be
trusted by the Foundries!


> A good RTL would embed SystemVerilog Assertions or PSL Assertions as well.
> Some formal verification tools can understand a subset of SystemVerilog
Assertions / PSL assertions and validate that your RTL conformed to the
assertions, which would probably help cut down on the need for RTL
simulation.

interesting.

> Overall, my understanding is that smaller geometries are needed only if
you want to target a really high performance / unit cost and performance /
energy consumption ratios.
> That is, you would target smaller geometries for mining.

yes.

> If you need a secure tr\*sted computing module that does not need to be
fast or cheap, just very accurate to the required specification, the larger
geometries should be fine and you would be able to live almost entirely in
RTL-land without diving into netlist and layout specifications.

hardware wallet ASICs.

i concur.

> A wrinkle here is that licenses for tools from tr\*sted vendors like
Synopsys or Cadence are ***expensive***.

yes they are :) we are currently working with Sorbonne University LIP6.fr
and Staf Verhaegen from Chips4Makers, trying a different approach:
coriolis2.

this will do fine up to 130nm (skywater). beyond that, mmm, we need a few
more years.

> What is more, you should really buy two sets of licenses, e.g. do logic
synthesis with Synopsys and then formal verification with Cadence, because
you do not want to fully tr\*st just one vendor.

interesting, good advice.

> Synthesis in particular is a black box and each vendor keeps their
particular implementations and tricks secret.

sigh. i think that's partly because they have to insert diodes, and
buffers, and generally mess with the netlist.

i was stunned to learn that in a 28nm ASIC, 50% of it is repeater-buffers!

plus, they make an awful lot of money, it is good business.

> Pointing some funding at the open-source Icarus Verilog might also fit,
as it lost its ability to do synthesis more than a decade ago due to
inability to maintain.

ah i didn't know it could do synthesis at all! i thought it was simulation
only.

> Note as well that I heard (at the time when I was in the industry) that
some foundries will not even accept a netlist unless it was created by a
synthesis tool from one of the major vendors (Synopsys, Cadence, Mentor
Graphics, maybe more I have forgotten since).

yes i heard this too, they don't want their time wasted: after all they
only make money by selling wafers, and if they can't sell any they have to
run empty wafers to keep the equipment at operating temperature.

if you book a slot 18 months in advance and the RTL doesn't work during
testing 3 months before the deadline they may not be able to find someone
else in time.

anything to reduce the risk there is good, so i totally get why.

thank you for the insights and the discussion, really appreciated.

l.


--
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/attachments/20210203/855e015c/attachment.html>;
Author Public Key
npub10j439tucjvu6dz9qgqkls46k6gzf4d2ewv2lk7uuc40uue2aalqq973d8z