Moore's Law is dead

When I was trying to get funding for Audiallo (I know how to make very good mixed-signal hearing aids), I would start my presentation by saying "Moore's Law is dead." The purpose of this statement was to set the stage for the end of feature-size reduction, and to make the audience uncomfortable enough to consider a mixed-signal approach for hearing aids. Having a Ph.D. related to semiconductors, one would think that I would have a bit of credibility on this statement, but I found that years of marketing have made Moore's Law something of a mainstay in the vocabulary of business. This made me spend a bit of time considering why that is, and I have written this page to briefly explain power-constrained computing, i.e., getting the most out of the battery.

Disclaimer: I will be glossing over a lot of device physics and leaving out many details.

What is "Moore's Law"?

Gordon Moore was one of the "Traitorous Eight" who started Fairchild Semiconductor. He then went on to co-found Intel. The initial vision of Intel was to make memory ICs, and he noticed that about every 2 years, the number of transistors that fit on an IC would double. He then wrote a "marketing piece" called "Cramming More Components onto Integrated Circuits". If you are in the business of selling memory, you basically want something like this paper because it tells your customers that they'll have to purchase a new, better IC every few years. If you read the document, it reads like a marketing piece, and I draw your attention to Figure 2.
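To put the observation in concrete terms, here is a minimal sketch of what "doubling every 2 years" implies. The baseline of roughly 2,300 transistors for the 1971 Intel 4004 is my own illustrative starting point, not a number from Moore's paper:

    # Moore's observation as compound growth: N(t) = N0 * 2^((t - t0) / 2).
    # Baseline: ~2,300 transistors for the Intel 4004 in 1971 (illustrative).
    def transistor_count(year, n0=2300, t0=1971):
        return n0 * 2 ** ((year - t0) / 2)

    for year in (1971, 1981, 1991, 2001):
        print(year, round(transistor_count(year)))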

So, why is it a "law"? It's not a law in any physical sense, but Carver Mead coined the term "Moore's Law", which is probably the greatest gift any company on the planet ever received, in this case Intel. Moore's Law is a trend, and it held up very well over time thanks to scaling and a bunch of great engineers. What does "scaling" actually mean? It means that you can just make a device smaller and keep the same ratio of all of its dimensions. Mead and Conway famously described scalable CMOS in their book Introduction to VLSI Systems, and as long as scaling held, it was a boon for the IC makers. So what happened that scaling doesn't help us much today? Physics happened. Some of the nastier parts of electron transport showed up. The behaviors were always there, but they were higher-order effects that used to have very little influence on the devices.
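For what scaling promised, here is a rough sketch of the first-order constant-field ("Dennard") scaling relations; take it as a back-of-the-envelope summary, not a device model:

    # First-order constant-field scaling: shrink dimensions and supply
    # voltage by a factor k. The classic consequences are:
    #   delay            ~ 1/k    (circuits get faster)
    #   power/transistor ~ 1/k^2  (each device gets cheaper to run)
    #   density          ~ k^2    (more transistors per area)
    # so power per unit area stays constant -- while the physics holds.
    def scale(k):
        return {
            "dimension": 1 / k,
            "voltage": 1 / k,
            "delay": 1 / k,
            "power_per_transistor": 1 / k ** 2,
            "density": k ** 2,
            "power_per_area": (1 / k ** 2) * (k ** 2),  # stays 1.0
        }

    print(scale(2))  # one full shrink step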

But a transistor is just a switch, right?

Well, for digital people, a transistor is just a switch that is "on" or "off". For analog people and physicists (I'm in this group), the transistor is an amplifier, and there is no "off" state because it always has some conductance. The "switch" point between ON and OFF is the threshold, which is a voltage. Below this threshold voltage is a region of operation called "subthreshold", and above it is "above threshold". For scaling to hold, the physics needs to hold as well, meaning the ratios of electrical behavior stay the same. This means that, regardless of the process, the percentage of the operating range spent in each region is the same. The following figure shows changes in current drive (Isat) and threshold voltage (Vthn) range for different processes.

If scaling were holding, the "sub-Vt%" would be holding as well.
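To see why there is no true "off" state, here is a minimal sketch of the standard exponential subthreshold current model; the parameter values are illustrative assumptions, not numbers from any real process:

    import math

    # Subthreshold drain current: I = I0 * exp((Vgs - Vth) / (n * Ut)).
    # Below threshold the current falls exponentially with Vgs, but it
    # never reaches zero -- the "switch" always leaks. Values are made up.
    def subthreshold_current(vgs, i0=1e-7, vth=0.5, n=1.5, ut=0.0259):
        return i0 * math.exp((vgs - vth) / (n * ut))

    for vgs in (0.0, 0.2, 0.4):  # all nominally "off" for Vth = 0.5 V
        print(f"Vgs = {vgs:.1f} V -> I = {subthreshold_current(vgs):.3e} A")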

There is a group called the International Technology Roadmap for Semiconductors (ITRS) that makes predictions about the future of scaling in CMOS. I do not understand the internal workings of the group because their predictions never lined up with the physics of the devices I tested. I give the ITRS numbers as a comparison with the published IBM numbers (IBM has a wonderful fab! The physics is always better than at the other houses I've used), so the IBM numbers are probably the "best case" scenario. The graph clearly shows that the transistors are becoming worse amplifiers and that subthreshold is becoming a larger region of operation. (My thesis suggests that scaling for CPUs will end at 22nm because velocity saturation is hit before you get out of subthreshold, so that's my guess for the lowest node you'll see for CPUs. Memory can get to about 12nm due to the nature of the application.)

What does Moore's Law have to do with processing power?

So, what does Moore's Law have to do with processing power? The answer is: nothing, actually. There's really no correlation between processing power and the number of transistors on silicon. The actual processing power is application specific and depends on the paradigm of the analysis. I design things in the subthreshold region of operation, which has the best power performance but the worst relative speed, so I get good Op/Joule performance. However, most people just look for pure throughput. I have news for the throughput people: you're hosed in a portable application. In fact, it's much worse than you think due to batteries, but I'll get to that later. Dr. Marr (our papers should be floating around the web; look for marr and degnan) and I were looking at the ExaScale challenges, and through a survey of processors on the market, we found that an asymptote in processing efficiency exists. You might call this a "Power Wall". (All I need now is for Carver to coin a name...) This means that beyond a point, no matter how much power you throw at a CPU, you'll never see an improvement in actual performance. If you actually want to process data and are power conscious, you use a metric called MMAC/mW: million multiply-accumulates per milliwatt.
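As a sketch of how this metric works in practice (the processor numbers below are invented for illustration, not measurements):

    # MMAC/mW: million multiply-accumulates per second, per milliwatt.
    # Note that 1 MMAC/mW works out to 1 MAC per nanojoule -- it is an
    # Ops/Joule metric, independent of how fast the chip is clocked.
    def mmac_per_mw(macs_per_second, watts):
        return (macs_per_second / 1e6) / (watts * 1e3)

    # Hypothetical comparison: a fast, hungry DSP vs. a slow subthreshold core.
    print(mmac_per_mw(2e9, 1.0))    # 2 GMAC/s at 1 W   -> 2.0 MMAC/mW
    print(mmac_per_mw(1e6, 20e-6))  # 1 MMAC/s at 20 uW -> 50.0 MMAC/mW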

What these graphs show is that processors are hitting an energy efficiency limit, even with scaling. This means that you can no longer throw more transistors and more power at a problem. (I'm not addressing clusters of CPUs here; I'm interested in processors that need only a trickle of power and can run on a battery.) So, scaling isn't helping with power and processing performance, but what about batteries? Aren't those getting better?

It's even worse on a battery

Battery performance is not improving at the same rate as power consumption. First, let's look at energy density plotted against scaling. In the ideal case, the weight of the battery is fixed, so that if you need more power, the battery does not need to get larger. Under ideal scaling, a transistor uses half the power at half the size, so even if you double the transistor count, you can still use the same battery. In reality, things look like this:

It would be nice if the slopes were the same, but they are not. If the slopes were the same, it would mean parity between power needs and battery performance.
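Here is a toy model of why the slope mismatch hurts; both growth rates are made-up assumptions purely to show the shape of the problem:

    # If chip power demand compounds faster than battery energy density,
    # runtime at a fixed battery weight shrinks every generation.
    # Both growth rates below are illustrative assumptions.
    def runtime_hours(year, base_year=2000,
                      battery_wh=5.0, battery_growth=1.05,  # ~5%/year
                      chip_watts=0.5, demand_growth=1.20):  # ~20%/year
        years = year - base_year
        energy = battery_wh * battery_growth ** years   # watt-hours
        power = chip_watts * demand_growth ** years     # watts
        return energy / power

    for year in (2000, 2005, 2010):
        print(year, f"{runtime_hours(year):.1f} h")  # 10.0, 5.1, 2.6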

So what do you do?

Well, there are a lot of things. From the CS perspective, you can write smaller code so that the "switches" flip less often. Basically, code should be functional instead of "feel good", i.e., write code that makes sense at the CPU level instead of a pile of lambda functions that are fun to think about (see the sketch below). From the hardware perspective, you can be smarter about how you design ICs. I like using an asynchronous core with a synchronous interface, and most issues can be overcome with some smart engineering.
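A trivial sketch of the "functional, not feel good" point, in Python just to show the flavor (nothing here is CPU-specific; the point is the shape of the code):

    # "Feel good": a stack of lambdas and intermediate iterators.
    def rms_clever(samples):
        squared = map(lambda x: x * x, samples)
        return (sum(squared) / len(samples)) ** 0.5

    # Functional in the sense I mean: a plain loop that reads like what
    # the CPU actually has to do -- one pass, one accumulator.
    def rms_plain(samples):
        total = 0.0
        for x in samples:
            total += x * x
        return (total / len(samples)) ** 0.5

    print(rms_plain([0.0, 1.0, -1.0, 2.0]))  # 1.224744871391589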