Asahi Lina (朝日リナ) on Nostr: nprofile1q…4hrac Do CPUs desperately need more ILP? I thought modern CPUs were ...
nprofile1qy2hwumn8ghj7un9d3shjtnddaehgu3wwp6kyqpq00pce86why7sp4zf237kplxk2g8zterc6wmfw4fywhjs953ggazq54hrac (nprofile…hrac) Do CPUs desperately need more ILP? I thought modern CPUs were quite crazy good at that (Apple is a good example).
I think the issue is most software isn't parallelizable and sequential code is much easier to reason about... that's why VLIW didn't really catch on. Nobody wants to schedule instructions manually, and compilers aren't any better at it than superscalar CPU frontends.
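A rough way to see why sequential code limits ILP no matter who does the scheduling: the toy model below greedily issues instructions on a hypothetical machine with a given issue width. The function `cycles_needed`, the one-cycle latency, and the two example programs are all assumptions made up for illustration, not a model of any real CPU or compiler.

```python
# Toy model: greedy list scheduling on a hypothetical machine that can
# issue up to `width` instructions per cycle, each with 1-cycle latency.
def cycles_needed(deps, width):
    """deps[i] lists the instructions that instruction i must wait for.

    Instructions are assumed to be in program order, so every
    dependency points at an earlier index.
    """
    done_at = {}                      # instruction index -> completion cycle
    pending = list(range(len(deps)))
    cycle = 0
    while pending:
        cycle += 1
        # An instruction is ready once all its inputs finished earlier.
        ready = [i for i in pending
                 if all(d in done_at and done_at[d] < cycle for d in deps[i])]
        for i in ready[:width]:       # issue at most `width` per cycle
            done_at[i] = cycle
            pending.remove(i)
    return cycle

# A dependency chain: each instruction needs the previous result.
chain = [[]] + [[i - 1] for i in range(1, 8)]
# Eight instructions with no dependencies between them.
independent = [[] for _ in range(8)]

print(cycles_needed(chain, 4))        # 8: extra width does not help
print(cycles_needed(independent, 4))  # 2: four instructions per cycle
```

In this model the chain takes 8 cycles at any width, so a wider machine (or a smarter scheduler) only pays off when the code itself has independent work.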
That's how we ended up with the CPU/GPU dichotomy. Sequential code that can be parallelized at the batch level ("SPMD") runs on multicore CPUs, the subset of that which can process multiple data units at once uses SIMD (manually or autovectorized), and embarrassingly parallel code (low branching) that doesn't need low latency for dispatch ends up as SIMT on GPUs, while still being programmed mostly as sequential code working on a single unit (the parallelization is implemented in hardware/drivers).
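As a small sketch of "sequential code working on a single unit" that gets parallelized from the outside: the per-element `kernel` below knows nothing about parallelism, and a separate driver splits the data into batches. The names `kernel`, `run_sequential`, and `run_batched` are hypothetical. This uses Python threads only to show the structure; in CPython they will not actually speed up CPU-bound work because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor

def kernel(x):
    """Sequential code for ONE data unit; unaware of any parallelism."""
    return x * x + 1

def run_sequential(data):
    """Apply the kernel to each element in order."""
    return [kernel(x) for x in data]

def run_batched(data, workers=4):
    """Split the data into chunks and run the same code on each chunk."""
    chunk = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so results line up with the input.
        parts = list(pool.map(run_sequential, chunks))
    return [y for part in parts for y in part]

data = list(range(10))
print(run_batched(data) == run_sequential(data))  # True
```

The point is that `kernel` stays the same whichever driver runs it; SIMD and SIMT push the same idea further by running many copies of that per-element code in lockstep in hardware.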
At the end of the day there will always be underutilized silicon... any time you have different types of resources (ALUs, etc.), their distribution will rarely match what real-world workloads actually use.
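To put rough numbers on that, here is a back-of-the-envelope sketch. The unit counts and the workload mix are invented for illustration, and the model assumes every operation takes one cycle on one unit of its type, with the busiest unit type setting the run time.

```python
import math

def utilization(units, work):
    """Fraction of each unit type's capacity that the workload uses.

    units: unit type -> number of units of that type
    work:  unit type -> number of one-cycle operations of that type
    Assumes at least one operation, so the cycle count is nonzero.
    """
    # The most oversubscribed unit type determines the total cycles.
    cycles = max(math.ceil(work.get(t, 0) / n) for t, n in units.items())
    return {t: work.get(t, 0) / (n * cycles) for t, n in units.items()}

# Hypothetical core and workload mix (made-up numbers).
units = {"int_alu": 4, "fp": 2, "load_store": 2}
work = {"int_alu": 400, "fp": 20, "load_store": 300}

for name, frac in utilization(units, work).items():
    print(f"{name}: {frac:.0%}")
```

With these numbers the load/store units are saturated, the integer ALUs sit at about 67%, and the FP units at about 7%. Rebalancing the unit counts for this mix would just leave something else idle on a different workload.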
Published at
2025-04-16 10:33:55
Event JSON
{
"id": "2ba93267b427cfbdc42f635aa74cee90d5924a82848afc2b92c9775de8cb5991",
"pubkey": "abbc753fd9fc0ec851266f2f7677ef19448e482955af80113cbeecb52fe46608",
"created_at": 1744799635,
"kind": 1,
"tags": [
[
"p",
"7bc38c9f4eb93d00d449547d60fcd6520e25e478d3b697552475e502d2284744",
"wss://relay.mostr.pub"
],
[
"p",
"b2542012f4cd3dfbb8badbc0ce455effbabf3f346ac76dc6986a47d12a271cb1",
"wss://relay.mostr.pub"
],
[
"e",
"4450064280479e26143b8130622029b4f2cbc0af1ed85f74b45b18f42d7ce35a",
"wss://relay.mostr.pub",
"reply"
],
[
"proxy",
"https://vt.social/users/lina/statuses/114347188933901523",
"activitypub"
],
[
"client",
"Mostr",
"31990:6be38f8c63df7dbf84db7ec4a6e6fbbd8d19dca3b980efad18585c46f04b26f9:mostr",
"wss://relay.mostr.pub"
]
],
"content": "nostr:nprofile1qy2hwumn8ghj7un9d3shjtnddaehgu3wwp6kyqpq00pce86why7sp4zf237kplxk2g8zterc6wmfw4fywhjs953ggazq54hrac Do CPUs desperately need more ILP? I thought modern CPUs were quite crazy good at that (Apple is a good example).\n\nI think the issue is most software isn't parallelizable and sequential code is much easier to reason about... that's why VLIW didn't really catch on. Nobody wants to schedule instructions manually, and compilers aren't any better at it than superscalar CPU frontends.\n\nThat's how we ended up with the CPU/GPU dichotomy. Sequential code that can be parallelized at the batch level (\"SPMD\") runs on multicore CPUs, the subset of that which can process multiple data units at once uses SIMD (manually or autovectorized), and embarrassingly parallel code (low branching) that doesn't need low latency for dispatch ends up as SIMT on GPUs, while still being programmed mostly as sequential code working on a single unit (the parallelization is implemented in hardware/drivers).\n\nAt the end of the day there will always be underutilized silicon... any time you have different types of resources (ALUs, etc.) their distribution will never match the usage of real world workloads most of the time.",
"sig": "3359ac85f8a3f6d25c1d6c860c4ae281e6e1fadb424796ad19426428f266bf5e741f229a433537be29aa514de45298ad81086774747142c2613c28e66b9ff916"
}