MESIng with cache

(Please excuse the terrible pun; I couldn’t help myself.) As we all know, the CPU cache is a touchy beast: seemingly small modifications to the code can result in major performance changes. I’ve been playing with performance monitoring counters recently (using Agner Fog’s library, which I mentioned before). I was mostly interested in testing how the cmpxchg instruction behaves under the hood, but wanted to share some other tidbits as well. Let’s assume we’re working with a simple spinlock.