Asahi Lina (朝日リナ)

npub14w…zuequ

2025-04-16 10:33:55

in reply to nevent1q…sjq2

nprofile1qy2hwumn8ghj7un9d3shjtnddaehgu3wwp6kyqpq00pce86why7sp4zf237kplxk2g8zterc6wmfw4fywhjs953ggazq54hrac Do CPUs desperately need more ILP (instruction-level parallelism)? I thought modern CPUs were already crazy good at extracting it (Apple's cores are a good example).
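One way to make "how much ILP is there" concrete: a minimal sketch, assuming a toy dataflow model in which every add takes one cycle and the core has unlimited execution units, so the only limit is the longest dependency chain. Available ILP is then the instruction count divided by that chain length. `critical_path`, `one_chain`, and `four_chains` are hypothetical helpers, and the model ignores loads, issue width, and branches.

```python
def critical_path(prog):
    """prog maps name -> (latency, [names it depends on]), in program order.

    Returns the cycle at which the last result is ready, assuming unlimited
    execution units (only true data dependencies delay an instruction).
    """
    finish = {}
    for name, (latency, deps) in prog.items():
        start = max((finish[d] for d in deps), default=0)
        finish[name] = start + latency
    return max(finish.values())


def ilp(prog):
    # Average instructions in flight per cycle under the model above.
    return len(prog) / critical_path(prog)


def one_chain(n):
    # s += a[i]: every add needs the previous sum, so it is one serial chain.
    return {f"add{i}": (1, [f"add{i - 1}"] if i else []) for i in range(n)}


def four_chains(n):
    # Four independent accumulators, then a two-level reduction tree.
    assert n % 4 == 0, "keep the sketch simple: n must be a multiple of 4"
    prog = {}
    for i in range(n):
        k, step = i % 4, i // 4
        prog[f"s{k}_{step}"] = (1, [f"s{k}_{step - 1}"] if step else [])
    last = n // 4 - 1
    prog["r01"] = (1, [f"s0_{last}", f"s1_{last}"])
    prog["r23"] = (1, [f"s2_{last}", f"s3_{last}"])
    prog["total"] = (1, ["r01", "r23"])
    return prog


for label, prog in [("one chain", one_chain(8)), ("four chains", four_chains(8))]:
    print(f"{label}: {len(prog)} adds, critical path {critical_path(prog)}, ILP {ilp(prog):.2f}")
```

Summing eight values with one accumulator gives an ILP of 1.0 no matter how wide the core is, while splitting the same adds across four accumulators (plus a three-add reduction) exposes 2.75. An out-of-order core can only exploit the parallelism that the dependency graph actually contains.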

I think the issue is that most software isn't easily parallelizable, and sequential code is much easier to reason about... that's why VLIW never really caught on. Nobody wants to schedule instructions by hand, and compilers aren't any better at it than the dynamic scheduling hardware in superscalar out-of-order cores.
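The static scheduling that VLIW pushes onto the compiler can be sketched as greedy list scheduling: each cycle, pack as many instructions whose inputs are ready into a fixed-width bundle, and pad the rest with NOPs. This is a toy under stated assumptions (fixed, known latencies, here a hypothetical 3-cycle multiply and 1-cycle adds, and no resource limits other than bundle width). `schedule_vliw` and the instruction names are hypothetical, and real VLIW compilers use far more elaborate techniques such as software pipelining.

```python
def schedule_vliw(prog, width):
    """Greedy list scheduling into fixed-width VLIW bundles.

    prog maps name -> (latency, [names it depends on]), in program order,
    with every latency >= 1. Returns one bundle per cycle; each bundle has
    exactly `width` slots, and slots with nothing to issue hold "nop".
    """
    ready_at = {}         # name -> first cycle its result can be used
    pending = list(prog)  # not yet scheduled, in program order
    bundles = []
    cycle = 0
    while pending:
        bundle = []
        for name in list(pending):
            if len(bundle) == width:
                break
            _, deps = prog[name]
            # Issue only if every input is already available this cycle.
            if all(ready_at.get(d, float("inf")) <= cycle for d in deps):
                bundle.append(name)
                pending.remove(name)
        for name in bundle:
            ready_at[name] = cycle + prog[name][0]
        bundles.append(bundle + ["nop"] * (width - len(bundle)))
        cycle += 1
    return bundles


# Hypothetical latencies: 3-cycle multiply, 1-cycle adds.
example = {
    "mul_x": (3, []),                    # x = a * b
    "add_y": (1, []),                    # y = c + d
    "add_w": (1, []),                    # w = e + f
    "add_z": (1, ["mul_x", "add_y"]),    # z = x + y
    "add_out": (1, ["add_z", "add_w"]),  # out = z + w
}

for cycle, bundle in enumerate(schedule_vliw(example, width=2)):
    print(cycle, bundle)
```

With 2-wide bundles this hypothetical program takes 5 cycles and half of the 10 slots are NOPs, mostly spent waiting on the multiply. The compiler has to bake that wait into the binary; if a latency varied at run time (a cache miss on a load, for example), a static schedule like this has no way to fill the extra cycles, whereas an out-of-order core can look for independent work in the meantime.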

That's how we ended up with the CPU/GPU dichotomy. Sequential code that can be parallelized at the batch level ("SPMD", single program, multiple data) runs on multicore CPUs. The subset of that code which can process multiple data elements at once uses SIMD (hand-written or autovectorized). Embarrassingly parallel code with little branching, which doesn't need low-latency dispatch, ends up as SIMT on GPUs, yet it is still programmed mostly as sequential code working on a single element; the parallelization is handled by the hardware and drivers.
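The three programming styles above can be sketched with one toy operation, SAXPY (y = a*x + y), written each way. This models only the structure of the code, under loud assumptions: the 4-lane width and 32-thread warp are illustrative constants, CPython threads won't actually speed this up because of the GIL, and `launch` is a hypothetical stand-in for what GPU hardware and drivers do.

```python
from concurrent.futures import ThreadPoolExecutor

LANES = 4  # hypothetical SIMD width, e.g. four 32-bit floats per 128-bit register
WARP = 32  # threads issued together; NVIDIA uses 32, other GPUs use 32 or 64


def saxpy_range(a, x, y, lo, hi):
    # Plain sequential loop over one slice: the "single program" in SPMD.
    for i in range(lo, hi):
        y[i] = a * x[i] + y[i]


def saxpy_spmd(a, x, y, workers=4):
    # SPMD: every worker (core) runs the same code on its own slice of the batch.
    n = len(x)
    slices = [(w * n // workers, (w + 1) * n // workers) for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda s: saxpy_range(a, x, y, *s), slices))


def saxpy_simd(a, x, y):
    # SIMD: one "instruction" applies the same op to LANES elements at once.
    n, i = len(x), 0
    while i + LANES <= n:
        y[i:i + LANES] = [a * xv + yv for xv, yv in zip(x[i:i + LANES], y[i:i + LANES])]
        i += LANES
    saxpy_range(a, x, y, i, n)  # scalar tail for the leftover elements


def saxpy_kernel(tid, a, x, y):
    # SIMT: scalar code for ONE element; parallelism comes from the launch.
    if tid < len(x):  # bounds check, since the launch rounds up to whole warps
        y[tid] = a * x[tid] + y[tid]


def launch(kernel, n_threads, *args):
    # Stand-in for GPU hardware/drivers: run the kernel once per thread id,
    # grouped into warps (sequentially here; a GPU runs warp lanes in lockstep).
    n_warps = (n_threads + WARP - 1) // WARP
    for warp in range(n_warps):
        for lane in range(WARP):
            kernel(warp * WARP + lane, *args)


x = [float(i) for i in range(10)]
y_spmd, y_simd, y_simt = [1.0] * 10, [1.0] * 10, [1.0] * 10
saxpy_spmd(2.0, x, y_spmd)
saxpy_simd(2.0, x, y_simd)
launch(saxpy_kernel, len(x), 2.0, x, y_simt)
print(y_spmd == y_simd == y_simt)
```

The SIMT kernel is the only version with no loop of its own: the per-element code reads as sequential, and the bounds check exists because the launch rounds the thread count up to a whole number of warps.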

At the end of the day there will always be underutilized silicon... any time you have different types of resources (ALUs, etc.), their ratio will rarely match the mix that real-world workloads actually use.
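As a back-of-the-envelope illustration of that mismatch, here is a minimal sketch assuming a simple bottleneck model: a hypothetical core with 4 integer ALUs, 2 FP units and 2 load/store ports, running a hypothetical instruction mix. `utilization` is a made-up helper, not a real profiling tool, and the numbers are chosen only for illustration.

```python
def utilization(units, mix):
    """Bottleneck model of a core's execution resources.

    units: number of execution units per class (each does 1 op/cycle).
    mix:   fraction of the workload's instructions in each class (sums to 1).
    Ignores front-end width, dependencies and latency: sustained IPC is capped
    by whichever class is most oversubscribed, and the rest sit partly idle.
    """
    ipc = min(units[c] / f for c, f in mix.items() if f > 0)
    busy = {c: ipc * mix.get(c, 0.0) / units[c] for c in units}
    return ipc, busy


# Hypothetical core and workload, chosen only for illustration.
core = {"int_alu": 4, "fp": 2, "load_store": 2}
workload = {"int_alu": 0.5, "fp": 0.1, "load_store": 0.4}

ipc, busy = utilization(core, workload)
print(f"sustained IPC: {ipc:.2f}")
for unit_class, fraction in busy.items():
    print(f"{unit_class}: {fraction:.1%} busy")
```

With these made-up numbers the load/store ports saturate at an IPC of 5, while the integer ALUs are busy 62.5% of the time and the FP units only 25%. A different mix moves the bottleneck, but no fixed allocation of units fits every workload.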

Author Public Key

npub14w78207els8vs5fxduhhval0r9zgujpf2khcqyfuhmkt2tlyvcyq2zuequ

Kind type

1 Short Text Note

Event JSON

{
  "id": "2ba93267b427cfbdc42f635aa74cee90d5924a82848afc2b92c9775de8cb5991",
  "pubkey": "abbc753fd9fc0ec851266f2f7677ef19448e482955af80113cbeecb52fe46608",
  "created_at": 1744799635,
  "kind": 1,
  "tags": [
    ["p", "7bc38c9f4eb93d00d449547d60fcd6520e25e478d3b697552475e502d2284744", "wss://relay.mostr.pub"],
    ["p", "b2542012f4cd3dfbb8badbc0ce455effbabf3f346ac76dc6986a47d12a271cb1", "wss://relay.mostr.pub"],
    ["e", "4450064280479e26143b8130622029b4f2cbc0af1ed85f74b45b18f42d7ce35a", "wss://relay.mostr.pub", "reply"],
    ["proxy", "https://vt.social/users/lina/statuses/114347188933901523", "activitypub"],
    ["client", "Mostr", "31990:6be38f8c63df7dbf84db7ec4a6e6fbbd8d19dca3b980efad18585c46f04b26f9:mostr", "wss://relay.mostr.pub"]
  ],
  "content": "nostr:nprofile1qy2hwumn8ghj7un9d3shjtnddaehgu3wwp6kyqpq00pce86why7sp4zf237kplxk2g8zterc6wmfw4fywhjs953ggazq54hrac Do CPUs desperately need more ILP? I thought modern CPUs were quite crazy good at that (Apple is a good example).\n\nI think the issue is most software isn't parallelizable and sequential code is much easier to reason about... that's why VLIW didn't really catch on. Nobody wants to schedule instructions manually, and compilers aren't any better at it than superscalar CPU frontends.\n\nThat's how we ended up with the CPU/GPU dichotomy. Sequential code that can be parallelized at the batch level (\"SPMD\") runs on multicore CPUs, the subset of that which can process multiple data units at once uses SIMD (manually or autovectorized), and embarrassingly parallel code (low branching) that doesn't need low latency for dispatch ends up as SIMT on GPUs, while still being programmed mostly as sequential code working on a single unit (the parallelization is implemented in hardware/drivers).\n\nAt the end of the day there will always be underutilized silicon... any time you have different types of resources (ALUs, etc.) their distribution will never match the usage of real world workloads most of the time.",
  "sig": "3359ac85f8a3f6d25c1d6c860c4ae281e6e1fadb424796ad19426428f266bf5e741f229a433537be29aa514de45298ad81086774747142c2613c28e66b9ff916"
}