on Nostr: its unreal how long Intel and AMD took to provide the user with a proper unified ...
it's unreal how long Intel and AMD took to give users a proper unified memory copy/fill instruction (REP MOVS/REP STOS)
with the Pentium, it was slow as hell
then they improved it with "Fast Strings" for data larger than 256 bytes, but that still had a 36-cycle startup cost
then they finally implemented "Fast Short REP MOVS" and "Fast Short REP STOS" in Ice Lake (2019), but it still loses out to nontemporal MOVNTDQ+PREFETCHNTA loops, which only just manage to saturate the memory bus
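For readers who want to try the comparison, here is a minimal C sketch (x86-64, GCC or Clang) of the three approaches the post mentions. It covers a REP MOVSB copy, a REP STOSB fill, and a nontemporal loop that uses PREFETCHNTA on the source and MOVNTDQ streaming stores. It also includes a CPUID check for the related feature flags. All function names, the struct, and `NT_PREFETCH_DISTANCE` are hypothetical. The 512-byte prefetch distance is an arbitrary assumption, not a tuned value. To keep the nontemporal loop short, it assumes 16-byte-aligned buffers and a length that is a multiple of 64.

```c
/* x86-64 only; GCC or Clang (inline asm + <cpuid.h>). Illustrative sketch. */
#include <assert.h>
#include <cpuid.h>       /* __get_cpuid_count */
#include <emmintrin.h>   /* SSE2: _mm_load_si128, _mm_stream_si128 (MOVNTDQ) */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>   /* _mm_prefetch + _MM_HINT_NTA (PREFETCHNTA), _mm_sfence */

/* Hypothetical tuning knob: how far ahead (in bytes) to prefetch the source. */
#define NT_PREFETCH_DISTANCE 512

/* memcpy via REP MOVSB: RDI = dst, RSI = src, RCX = byte count.
   The x86-64 calling conventions require the direction flag to be clear on entry. */
static void copy_rep_movsb(void *dst, const void *src, size_t n) {
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
}

/* memset via REP STOSB: AL = fill byte, RDI = dst, RCX = byte count. */
static void fill_rep_stosb(void *dst, unsigned char value, size_t n) {
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(n)
                     : "a"(value)
                     : "memory");
}

/* Nontemporal copy: PREFETCHNTA on the source, MOVNTDQ stores that bypass
   the cache, then SFENCE so the weakly ordered streaming stores are ordered
   before any later stores. Sketch restriction: 16-byte-aligned buffers,
   n a multiple of 64. */
static void copy_nt_sse2(void *dst, const void *src, size_t n) {
    assert((uintptr_t)dst % 16 == 0 && (uintptr_t)src % 16 == 0 && n % 64 == 0);
    const char *s = (const char *)src;
    char *d = (char *)dst;
    for (size_t off = 0; off < n; off += 64) {
        if (off + NT_PREFETCH_DISTANCE < n)  /* only prefetch inside the source */
            _mm_prefetch(s + off + NT_PREFETCH_DISTANCE, _MM_HINT_NTA);
        __m128i x0 = _mm_load_si128((const __m128i *)(s + off));
        __m128i x1 = _mm_load_si128((const __m128i *)(s + off + 16));
        __m128i x2 = _mm_load_si128((const __m128i *)(s + off + 32));
        __m128i x3 = _mm_load_si128((const __m128i *)(s + off + 48));
        _mm_stream_si128((__m128i *)(d + off), x0);
        _mm_stream_si128((__m128i *)(d + off + 16), x1);
        _mm_stream_si128((__m128i *)(d + off + 32), x2);
        _mm_stream_si128((__m128i *)(d + off + 48), x3);
    }
    _mm_sfence();
}

/* CPUID feature bits relevant to fast REP MOVS/STOS. */
struct string_op_features {
    bool erms;  /* CPUID.(EAX=7,ECX=0):EBX[9]  Enhanced REP MOVSB/STOSB */
    bool fsrm;  /* CPUID.(EAX=7,ECX=0):EDX[4]  Fast Short REP MOV        */
    bool fsrs;  /* CPUID.(EAX=7,ECX=1):EAX[11] Fast Short REP STOSB      */
};

static struct string_op_features detect_string_op_features(void) {
    struct string_op_features f = {false, false, false};
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        unsigned int max_subleaf = eax;  /* leaf 7 subleaf 0 reports it in EAX */
        f.erms = (ebx >> 9) & 1u;
        f.fsrm = (edx >> 4) & 1u;
        if (max_subleaf >= 1 && __get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx))
            f.fsrs = (eax >> 11) & 1u;
    }
    return f;
}
```

To compare the copy paths, time each function on buffers much larger than the last-level cache, then separately on very short lengths. Results will vary by CPU. The "fast short" flags mainly matter for small sizes, while the nontemporal loop is aimed at large copies whose destination will not be read again soon.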
Published at 2023-08-24 12:38:45
Event JSON
{
"id": "b23fd6efa6abde2bbeede8eac54c6019d8609392078d4863c1c9a5c4c97807bf",
"pubkey": "0c0de16ff59e79a63e29ce820c3435fb309ad6bb4c1c4a8fe9128f03c9cb1496",
"created_at": 1692880725,
"kind": 1,
"tags": [
[
"emoji",
"niggainsane",
"https://ryona.agency/emoji/cyberianclaims/niggainsane.png"
],
[
"proxy",
"https://ryona.agency/objects/fbe6de03-41fd-4200-a2f3-437af320bb4f",
"activitypub"
]
],
"content": "its unreal how long Intel and AMD took to provide the user with a proper unified memory copy/fill instruction(rep movs/rep stos)\nwith the Pentium it was slow as hell\nthen they improved it with \"Fast Strings\" for data larger than 256 bytes, still with a 36-cycle startup cost\nthen they finally implemented \"Fast Short REP MOVS\" and \"Fast Short REP STOS\" in Ice Lake(2019) but it still loses out to nontemporal MOVNTDQ+PREFETCHNTA loops which are barely able to overwhelm the memory bus speed\n:niggainsane:",
"sig": "dab296bc919d158257b403f2dbccf4112a558bc8a931fb1d2a50a30d9ab0b012833dcc2b2212232e0d4257a7deab0de00a5eabeb18fd2698232f805a8003300f"
}