- The death of hardware store optimization.
- Examining the extent of AVX-related downclocking on Intel's Ice Lake CPU.
- Concurrent operations can be grouped relatively neatly into categories based on their cost.
- Taking a second look at the newly introduced mask registers, this time with the benefit of a SKX die shot from Fritzchens Fritz.
- We look at the zero store optimization as it applies to Intel's newest micro-architecture.
- Probing a previously undocumented zero-related optimization on Intel CPUs.
- Adding static comments to a static blog using staticman. Static.
- Unexpected performance deviations depending on how you spell zero.
- Investigating some details of SIMD-related frequency transitions on Intel CPUs.
- Some mostly too-low-level-to-care-about hardware details of the mask registers introduced in AVX-512.
- Can using clang-format make your code slower? Kind of.
- vector<T> for various T may not perform as you'd expect.
- Trying to determine exactly where asynchronous interrupts are delivered on Intel CPUs.
- A laundry list of speed limits that your code can't exceed.
- Building sort functions faster than what the C and C++ standard libraries offer.
- CPU microcode updates can cause silent and dramatic performance changes.
If there’s one thing the internet needs, it’s another blog. So after messing around with Jekyll and GitHub Pages for way longer than is reasonable, here we are.