NOTimothyLottes on Nostr
Something I've never really understood about AMD's compiler. The hardware has some amazing and 'free' buffer addressing logic (image below), and yet the compiler seems to just ignore it.
For example,
int a=...;
int d=ssbo.array[(0xDEAD0000>>2)+a];
Generates
v_lshlrev_b32 v0, 2, v0
v_add_nc_u32 v1,lit(0xdead0000),v0
buffer_load_dword v1, v1, s[0:3], 0 offen
There is NO need for those VALU ops: the HW buffer addressing can add 0xdead0000 from an SGPR and scale 'a' by 4 (via const_stride).
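To make the arithmetic concrete, here is a rough Python model of the two paths. It is a sketch under assumptions, not a description of any particular chip: it assumes the linear (non-swizzled) form of buffer addressing, base + SGPR offset + instruction offset + VGPR offset + const_stride * index, ignores range checking, and assumes 'a' is non-negative and small enough that nothing wraps. The function and parameter names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # VALU results are 32-bit

def compiler_path(base, a):
    """Model of the quoted ISA: two VALU ops feed a byte offset to an 'offen' load."""
    v0 = (a << 2) & MASK32           # v_lshlrev_b32 v0, 2, v0
    v1 = (0xDEAD0000 + v0) & MASK32  # v_add_nc_u32 v1, lit(0xdead0000), v0
    return base + v1                 # buffer_load_dword ... offen

def hardware_path(base, a, const_stride=4, soffset=0xDEAD0000, inst_offset=0):
    """Assumed linear buffer addressing: 'a' goes in as the index, unmodified.

    soffset stands in for the SGPR offset and const_stride for the
    descriptor's stride field; both names are illustrative.
    """
    return base + soffset + inst_offset + const_stride * a

# Both paths land on the same byte address for small indices.
base = 0x1000  # arbitrary example base address
for a in (0, 1, 7, 1023):
    assert compiler_path(base, a) == hardware_path(base, a)
```

In this model the shift and the add both disappear into fields the address unit already consumes, which is the point of the complaint above.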
Published at 2024-03-12 04:43:22
Event JSON
{
"id": "4961696d9e8dfb9e19e22cdb2f66d0b2178de18b47293c40fc8507371da3db0f",
"pubkey": "236199054982d0c356cbb0df14a2a9bea38b25d27a96190fc51cd67558bac16b",
"created_at": 1710218602,
"kind": 1,
"tags": [
[
"proxy",
"https://mastodon.gamedev.place/users/NOTimothyLottes/statuses/112080886327279380",
"activitypub"
]
],
"content": "Something I've never really understood about AMD's compiler. The hardware has some amazing and 'free' buffer addressing logic (image below), and yet the compiler seems to just ignore it.\n\nFor example, \nint a=...;\nint d=ssbo.array[(0xDEAD0000\u003e\u003e2)+a];\n\nGenerates\nv_lshlrev_b32 v0, 2, v0\nv_add_nc_u32 v1,lit(0xdead0000),v0\nbuffer_load_dword v1, v1, s[0:3], 0 offen\n\nThere is NO need for those VALU ops, the HW buffer addressing can add 0xdead0000 from an SGPR and scale 'a' by 4 (via const_stride)\n\nhttps://cdn.masto.host/mastodongamedevplace/media_attachments/files/112/080/876/680/809/102/original/f7f3bd9fb3785ecd.png",
"sig": "4f92374f0dba8dd63fb66dd47916ac7306dc16b295fb411e5678b60653373992cc757ba054d7ec0eaedd90dba35da930a63b5bd20e75911ef96f24e2267a43d1"
}