Is Static Random-Access Memory Reaching the Limit of Scaling?

The main memory of most present-day computers is dynamic random-access memory (DRAM), called "dynamic" because each bit is stored simply as the charge on a tiny capacitor, accessed through a single transistor. Left alone, the charge leaks away and the data are lost, so dynamic memory must be "refreshed" regularly by reading and rewriting every bit; for many chips that means every 64 milliseconds, or around 16 times a second. Static random-access memory (SRAM), by contrast, uses what amounts to a multi-transistor flip-flop for every bit it stores. SRAM is much faster than DRAM and needs no refresh, with its attendant power consumption and overhead, and it is typically used for processor registers and for on-chip caches holding frequently accessed data, so that data can be manipulated without a trip to the slower main memory.
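To put those refresh numbers in perspective, here is a back-of-the-envelope sketch. The 64 ms window comes from the text above; the 8192-rows-per-bank figure is an assumed, illustrative value typical of DDR-era parts, not a claim about any particular chip:

```python
# Back-of-the-envelope DRAM refresh arithmetic.
# REFRESH_WINDOW_MS is from the text; ROWS_PER_BANK is an assumed,
# illustrative value (typical of DDR-era devices).
REFRESH_WINDOW_MS = 64.0
ROWS_PER_BANK = 8192

# How often the whole array must be swept, and how closely spaced
# the individual row refreshes are if spread evenly over the window.
refreshes_per_second = 1000.0 / REFRESH_WINDOW_MS
row_interval_us = REFRESH_WINDOW_MS * 1000.0 / ROWS_PER_BANK

print(f"{refreshes_per_second:.1f} full refreshes per second")  # 15.6
print(f"one row refreshed every {row_interval_us:.2f} us")      # 7.81
```

Spread evenly, that is one row refresh roughly every 7.8 microseconds, which is why refresh shows up as a small but permanent tax on DRAM availability and power.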

High-performance processors derive much of their speed from efficient use of on-chip SRAM caches for instructions and data, so designers have powerful incentives to fit as much SRAM, running as fast as possible, onto their processor chips. But because an SRAM bit cell is much more complicated (commonly six transistors, versus a single transistor plus capacitor for DRAM), shrinking it is harder and runs into problems such as quantum tunnelling, leakage power, and sensitivity to fabrication tolerances. Are we reaching the end of the road in scaling down SRAM and, if so, what approaches may allow SRAM performance to continue to improve?


There are a couple of considerations that may bear on this in the not-too-distant future:

  1. The long history of Moore's Law has crippled computer science by throwing hardware at software problems rather than letting long-abandoned but superior directions in software emerge. This is related to The Hardware Lottery problem. I don't know what adjective to use to describe people who say that lack of interest in the Hutter Prize is due to the resource limit Marcus is sticking with. Even worse are those who don't understand why I suggested that Ray Ozzie, upon taking over from Gates, hold an internal competition to reduce the MS software suite to the smallest install-size binary: at least those guys don't have "limited hardware" to complain about. And what in the hell is up with academia? Oh, well, never mind…

  2. Although on-chip DRAM latency is higher than SRAM latency, it is still a LOT lower than going off chip, even to off-chip SRAM. There are enormous strides to be made here, especially since bandwidth is the same for DRAM and SRAM once latency is amortized. That's the point of my dream* of stuffing the SoC with interleaved DRAM banks, even if that means limiting SRAM real estate to something like the Cray-1's registers.

* I fully recognize the high risk of that dream, given that mixed-signal circuitry is central to its scheme for mutual exclusion on DRAM banks shared between different CPUs. It's such a wild idea that, in some respects, it may be up there with overcoming the meta-Hardware Lottery and transitioning the industry to GaAs, so I can't get any mixed-signal experts to even offer an opinion on it. However, there may be light at the end of the "tunnel," so to speak: the same fellow who drew my attention to a formula that resolves the proton radius puzzle happens to be a mixed-signal expert.


Time to lay this idea on cs.stackexchange, since no one else has come up with it in the two decades since I suggested it, when it might have prevented human suffering at the hands of M$ "quality".