Faster Integer Division with Floating Point
Multiplication on a common microcontroller is easy. But division is much more difficult. Even with hardware assistance, a 32-bit division on a modern 64-bit x86 CPU can run between 9 and 15 cycles. SIMD (single instruction, multiple data) instruction sets used for array processing, like AVX or NEON, often don’t offer division at all (although the RISC-V vector extensions do). However, many processors support floating point division. Does it make sense to use floating point division to replace integer division? According to [Wojciech Mula] in a recent post, the answer is yes.
The plan is simple: cast the 8-bit numbers into 32-bit integers and then to floating point numbers. These can be divided in bulk via the SIMD instructions and then converted in reverse to the 8-bit result. You can find several code examples on GitHub.
Since modern processors have several SIMD instruction sets, the post takes the time to benchmark many different variations of a program dividing in a loop. The basic program is the reference and, thus, has a “speed factor” of 1. Unrolling the loop, a common loop optimization technique, doesn’t help much and, on some CPUs, can make the loop slower.
Converting to floating point and using AVX2 sped the program up by a factor of 8 to 11, depending on the CPU. Some of the processors supported AVX-512, which also offered considerable speed-ups.
This is one of those examples of why profiling is so important. If you’d asked us whether converting integer division to floating point might make a program run faster, we’d have bet the answer was no, but we’d have been wrong.
As CPUs get more complex, optimizing gets a lot less intuitive. If you are interested in things like AVX-512, we’ve got you covered.
hackaday.com/2024/12/22/faster…
Published at 2024-12-23 06:00:50