During Apple’s “Scary Fast” event, one feature caught my attention above everything else: dynamic caching. Like most people who watched the presentation, I had one reaction: “How does memory allocation increase performance?”
Apple introduced the new M3 chip built around a “cornerstone” feature it calls Dynamic Caching for its GPUs. Apple’s simplified explanation doesn’t make it clear what dynamic caching actually does, much less how it improves GPU performance on the M3.
I dug deeper into specific GPU architectures and sent some direct questions to find out what dynamic caching actually is. Here’s my best take on what is undoubtedly the most technically dense feature Apple has ever put a brand name on.
What exactly is dynamic caching?
Dynamic caching is a feature that allows M3 chips to use only the exact amount of memory that a particular task requires. Here’s how Apple describes it in the official press release: “Dynamic caching, unlike traditional GPUs, allocates local memory usage in hardware in real-time. With dynamic caching, only the exact amount of memory needed for each task is used. This is an industry first, transparent to developers and the cornerstone of the new GPU architecture. This dramatically increases the average GPU utilization, significantly increasing the performance of the most demanding pro apps and games.”
In typical Apple fashion, many of the technical details are deliberately obscured in favor of the result: just enough to get the gist without giving away secrets or confusing the audience with jargon. The general takeaway seems to be that dynamic caching allows more efficient memory allocation on the GPU. Simple enough, right? Well, it’s still not clear exactly how memory allocation “increases average utilization” or “significantly increases performance.”
To try to understand dynamic caching, we need to step back and examine how GPUs work. Unlike a CPU, a GPU excels at handling heavy workloads in parallel. These workloads come in the form of shaders, which are the programs the GPU executes. To use the GPU effectively, programs need to execute a ton of shaders in one go. You want to keep as many of the available cores busy as possible.
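To make that concrete, here’s a toy Python sketch of what a shader is: a small function run once per element across a huge dataset. The function name and numbers are mine, purely for illustration; a real GPU would run these millions of invocations in parallel across thread blocks rather than in one vectorized call.

```python
import numpy as np

# A "shader" is just a small program that runs once per element.
# A GPU executes millions of these invocations in parallel across
# thread blocks; here we mimic that with a single vectorized call.
def brightness_shader(pixels, gain):
    # Hypothetical fragment shader: scale each pixel, then clamp to [0, 1].
    return np.clip(pixels * gain, 0.0, 1.0)

pixels = np.random.rand(1_000_000)      # one invocation per pixel
result = brightness_shader(pixels, 1.5)
print(result.shape)                     # one output value per invocation
```

The point is the shape of the work: a huge number of tiny, independent invocations, which is exactly what a GPU is built to chew through at once.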
This creates an effect that Nvidia calls the “tail.” A load of shaders executes at once, and then utilization drops off while more shaders are dispatched to execute on threads (or more accurately, thread blocks) on the GPU. Apple illustrated this effect in its presentation when explaining dynamic caching, showing GPU utilization spiking before falling off.
How does this play out in memory? Functions on your GPU read instructions from memory and write their output back to memory, and many tasks also need to access memory multiple times while they execute. Unlike CPUs, where memory latency through RAM and caches is extremely important due to the low degree of parallelism, memory latency on GPUs is easy to hide. These are highly parallel processors, so if some functions are waiting on memory, others can be executing.
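A toy simulation shows why that works. In the sketch below (my own model, not how any real GPU scheduler is implemented), each “warp” of shader invocations occasionally stalls on a memory access for a fixed number of cycles. With a single warp resident, the core sits idle during stalls; with many warps, the scheduler can almost always find one that is ready, so the memory latency disappears from the core’s point of view.

```python
import random

random.seed(0)

# Toy model of latency hiding. Each cycle the core issues work from one
# ready warp; issuing has a chance of triggering a 20-cycle memory stall
# for that warp. All numbers are made up for illustration.
def busy_fraction(num_warps, steps=10_000, stall_chance=0.5):
    waits = [0] * num_warps      # remaining stall cycles per warp (0 = ready)
    busy = 0
    for _ in range(steps):
        ready = [i for i, w in enumerate(waits) if w == 0]
        if ready:
            busy += 1                        # the core did useful work
            i = random.choice(ready)
            if random.random() < stall_chance:
                waits[i] = 20                # warp stalls on a memory access
        waits = [max(0, w - 1) for w in waits]
    return busy / steps

print(busy_fraction(1))    # one warp: the core idles through every stall
print(busy_fraction(32))   # many warps: there is always ready work
</n>```

With one warp the core is busy only a small fraction of the time; with 32 warps it stays fully busy, because at most a handful of warps can be stalled at any moment.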
This works when all shaders are cheap to execute, but demanding workloads have very complex shaders. When these shaders are scheduled, they are allocated all the memory they might need to execute, even if some of it is never used. The GPU ends up devoting a lot of its resources to a single complex task, even if those resources are wasted. Dynamic caching appears to be Apple’s attempt to use the GPU’s resources more effectively, ensuring that these complex tasks take only what they actually need.
In theory, this should increase the average utilization of the GPU by allowing more tasks to be executed at once, rather than having a small set of demanding tasks gobble up all the resources available to the GPU. Apple’s explanation focuses first on memory, making it appear that memory allocation alone increases performance. From my understanding, it seems that efficient allocation allows more shaders to execute simultaneously, which will increase utilization and performance.
Used vs. allocated
One aspect that is key to my attempt at explaining dynamic caching is how shaders branch. Programs executed by your GPU are not always static. They can change depending on different conditions, which is especially true for the large, complex shaders required for things like ray tracing. These conditional shaders need to allocate resources for the worst possible scenario, which means some resources may be wasted.
Here’s how Unity explains dynamic branching shaders in its documentation: “For any type of dynamic branching, the GPU must allocate register space for the worst case. If one branch is much more expensive than the other, it means the GPU wastes register space. This can lead to fewer invocations of the shader program in parallel, reducing performance.”
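Unity’s point reduces to simple arithmetic. In the sketch below (the register counts and register file size are hypothetical numbers I chose for illustration, not Apple’s or anyone’s real hardware parameters), a core’s fixed register file limits how many shader invocations can be resident at once, so reserving registers for an expensive branch that rarely runs slashes occupancy.

```python
# Toy occupancy model: a GPU core has a fixed register file, and the
# number of shader invocations resident at once is limited by how many
# registers each invocation is allocated. Illustrative numbers only.
REGISTER_FILE = 65_536            # registers per core (assumed)

def occupancy(registers_per_invocation, max_resident=2_048):
    # Invocations that fit in the register file, capped by a hardware limit.
    return min(max_resident, REGISTER_FILE // registers_per_invocation)

# A branching shader: the cheap path needs 32 registers, the expensive
# ray-tracing path needs 128. Worst-case allocation reserves 128 for
# every invocation, even though most only ever use 32.
print(occupancy(128))   # 512 invocations resident (worst-case allocation)
print(occupancy(32))    # 2048 invocations resident (what most actually need)
```

In this toy model, allocating for the actual need instead of the worst case lets four times as many invocations run in parallel, which is precisely the kind of utilization gain Apple is claiming.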
Apple appears to be targeting this type of branching with dynamic caching, allowing the GPU to use only the resources it actually needs rather than wasting them. It’s possible the feature has an impact elsewhere, too, but it’s not clear where else it applies or at what point dynamic caching kicks in as the GPU performs its tasks.
Still a black box
Of course, I need to note that this is all just my understanding, based on how GPUs traditionally function and what Apple has officially said. Apple may release more details about how it all works eventually, but ultimately, if Apple is truly able to improve GPU utilization and performance, the technical nuances of dynamic caching won’t matter.
At the end of the day, dynamic caching is a marketable term for a feature that lives deep within the GPU’s architecture. Trying to understand it without access to the people who designed the GPU will inevitably lead to misconceptions and incomplete explanations. In theory, Apple could have scrapped the branding and let the architecture speak for itself.
If you were looking for a deeper look at what dynamic caching might be doing in the M3’s GPU, you now have a possible explanation. What really matters, though, is how the final product performs, and we’ll have to wait until Apple’s first M3 devices reach the public to find out. But based on the performance claims and demos we’ve seen so far, it certainly looks promising.