AMD’s Zen architecture on their Ryzen CPUs. Look at all that complexity!!

In a course about high performance computer architecture, it’s no surprise that most of the time is spent discussing how to speed up computers using their architecture. It’s almost as though the name of the course tells you exactly what to expect.

This week in CS6290 at Georgia Tech, we’ve moved on to caches, which play a key role in speeding up the retrieval of information. The processor’s job is to crunch data, which is held either in main memory (RAM) or on disk (an SSD or HDD). To get that data, the processor issues requests for memory addresses and retrieves the data from whichever memory storage unit holds that information.

On a modern CPU, there are a number of memory caches. <brag alert!> My new AMD Ryzen 7 3700X has three such caches (following the typical naming convention of L1, L2, L3), with the size of the cache increasing the further away the cache gets from the registers and ALU (512 KB, 4 MB, and 32 MB).</brag alert!>  The goal of these caches is to reduce the cost of going to main memory or to the disk. Generally, we can access the L1 cache in a cycle or two and the larger L2 and L3 caches in tens of cycles, while going off to main memory takes hundreds of cycles, and the disk is slower still, varying wildly in performance based on the storage technology. We can quantify this relationship using the following equation:

Average Memory Access Time (AMAT) = Hit Time + Miss Rate * Miss Penalty

Where:

  • Hit time is the number of cycles it takes to access the cache and find the memory address in question
  • Miss rate is the percentage of cache accesses where we fail to find the memory address in the cache
  • Miss penalty is the number of cycles required to go off-chip (or to another cache) to find that memory address

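To make the equation concrete, here’s a minimal sketch in C that plugs the definitions above into the formula. The cycle counts and miss rates are purely illustrative assumptions (not measurements from my Ryzen or any real CPU), and the two configurations are hypothetical: one small and fast, one larger but slower.

```c
/* Minimal sketch of the AMAT equation, using made-up numbers
 * for two hypothetical L1 configurations. */
#include <stdio.h>

/* AMAT = hit time + miss rate * miss penalty */
static double amat(double hit_time, double miss_rate, double miss_penalty)
{
    return hit_time + miss_rate * miss_penalty;
}

int main(void)
{
    /* Hypothetical config A: small, fast cache (1-cycle hit, 10% miss rate). */
    double a = amat(1.0, 0.10, 30.0);

    /* Hypothetical config B: bigger, slower cache (2-cycle hit, 4% miss rate). */
    double b = amat(2.0, 0.04, 30.0);

    printf("Config A AMAT: %.2f cycles\n", a); /* 1 + 0.10 * 30 = 4.00 */
    printf("Config B AMAT: %.2f cycles\n", b); /* 2 + 0.04 * 30 = 3.20 */
    return 0;
}
```

Note that config B “loses” on hit time but still comes out ahead overall because of its lower miss rate. That tension is exactly what drives the balancing act below.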
As with all things in engineering disciplines, the game is to balance the physical realities and implementation details with the math. Here, we want the hit time to be low, which means we want a cache that is small (so that we don’t spend too much time reviewing each entry looking for a particular memory address), but we also want the miss rate to be low, which means we want a cache that is larger and more intelligently organized (invariably increasing the time it takes to get a hit). It is this balancing game that drives most of the conversation around caches.

There are a number of different cache organization schemes, coherence and replacement policies, and hardware techniques that are leveraged to make this balancing act less cumbersome. They are, however, all responses to the same basic notion: complex technical systems have points of friction where the desire to accomplish one thing ultimately leads to a trade-off in another area.

These friction points are where innovation occurs and where you see the beauty of computer science. Heuristics that piggyback off of common-sense approaches (such as kicking the least recently used cache line out of a full cache, sketched below) are developed and shown to be amazingly effective against more complex and technical designs. I’ve said it before, but this is one of the reasons I’m glad much of my formal education has focused on computing systems. No other area of study in the field blends the theoretical with the practical quite like this, all while influencing every other area of computer science.
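
To ground that LRU heuristic, here’s a toy sketch of least-recently-used replacement for a single 4-way cache set. The tags, the access stream, and the age-counter bookkeeping are all made up for illustration; real hardware tracks recency per set with dedicated state rather than loops over the ways.

```c
/* Toy sketch of LRU replacement within one 4-way cache set.
 * Ages: 0 = most recently used; larger = older. */
#include <stdio.h>

#define WAYS 4

typedef struct {
    int valid;
    unsigned long tag;
    int age;
} line_t;

static void access_set(line_t set[WAYS], unsigned long tag)
{
    int victim = 0;

    /* Hit: make this line the most recently used. */
    for (int i = 0; i < WAYS; i++) {
        if (set[i].valid && set[i].tag == tag) {
            for (int j = 0; j < WAYS; j++)
                if (set[j].valid && set[j].age < set[i].age)
                    set[j].age++;
            set[i].age = 0;
            printf("hit  tag %lu\n", tag);
            return;
        }
    }

    /* Miss: fill an empty way if one exists, otherwise evict
     * the oldest (least recently used) line. */
    for (int i = 0; i < WAYS; i++) {
        if (!set[i].valid) { victim = i; break; }
        if (set[i].age > set[victim].age) victim = i;
    }
    for (int j = 0; j < WAYS; j++)
        if (set[j].valid) set[j].age++;
    set[victim].valid = 1;
    set[victim].tag = tag;
    set[victim].age = 0;
    printf("miss tag %lu (filled way %d)\n", tag, victim);
}

int main(void)
{
    line_t set[WAYS] = {{0}};
    /* Made-up access stream: the first four tags fill the set,
     * re-touching tag 2 refreshes its recency, and tag 5 then
     * evicts tag 1, the line unused for the longest time. */
    unsigned long stream[] = {1, 2, 3, 4, 2, 5};
    for (unsigned i = 0; i < sizeof(stream) / sizeof(stream[0]); i++)
        access_set(set, stream[i]);
    return 0;
}
```

Simple as it is, this kind of “keep what you used recently” policy captures enough of real programs’ locality that it (and cheaper approximations of it) shows up everywhere from CPU caches to OS page replacement.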