Limits of Computing (1997)
[Author's note 11th Mar 2011 - In the IT business writing predictive articles like this one is simply asking to be humiliated, through massively underestimating the pace of change in our industry. The reason I'm uploading the following Byte article from 1997 is that I'm quite proud of the fact it defies such humiliation. My predictions of limiting transistor feature sizes and clock speeds by 2010 lie within the same order of magnitude as today's actual parameters - roughly speaking 35nm (0.035micron) feature-size and 1.5 GHz clock speed - which is as close to a bullseye as one can hope for in this game!]
An edited version of this material originally appeared in the March, 1998 issue of Byte magazine, a McGraw-Hill publication.
THE LIMITS OF COMPUTING
We've been living with Moore's Law for 34 years now, so the idea that one day soon it might cease to apply is rather hard to accept. It was back in 1964, four years before he helped to co-found Intel, that the promising young engineer Gordon Moore noticed that the number of transistors you could squeeze onto a single chip was doubling every year. Those early chips contained a few thousand transistors, but once the semiconductor business got into its stride the pace slackened a little, doubling the transistor count only every 18 months, and it's been doing that ever since. Now the Pentium Pro contains 5.5 million transistors, and the next generation Merced will have closer to 10 million. What's more, as the number of transistors doubles, the next generation chip doesn't cost much more per unit to manufacture, so the cost per transistor actually halves every 18 months.
This happens because chips are fabricated using lithography - in effect a kind of printing process - so the economics work rather like print on paper. It would cost almost exactly the same for Byte to print twice as many words, at half the point-size, on this page, though you might destroy your eyesight trying to read them. That problem doesn't arise with transistors, because a half-size transistor is as just good as its predecessor. Actually it's better, because smaller transistors are faster than big ones, at least in the CMOS (Complementary Metal Oxide Semiconductor) technology from which all modern chips are built. As you shrink CMOS transistors you can switch them proportionally faster by raising the clock frequency, while the power consumed by each transistor remains the same. The fortunes of Intel and the other semiconductor giants have rested for nearly 20 years on this virtuous circle of physics. It's the reason that the 166MHz Pentium machine I'm writing this article on is hundreds of times more powerful than the single mainframe that served the whole of London University during my student days in the 1960s.
Every few years someone predicts the end of this exponential growth of chip density, but every time so far they've been wrong. Now the signs are though that a slackening of the pace is in sight, not next year but within the next decade, and the reasons may be as much financial as technical. Going back to that print-on-paper analogy, you may have already spotted a weakness in the argument - printing twice the words on each page might not raise the incremental cost of printing the page, but Byte's overhead would double because the writers would need paying twice as much for all the extra words. The analogy with semiconductors holds good; as chips get denser, the cost of designing them and the cost of fabrication equipment goes up too, not quite so fast perhaps but following a similarly exponential growth curve. An increasing number of industry commentators are coming to believe that it will be the cost of building new fab plants, rather than technological obstacles, that causes Moore's curve to flatten off into a plateau.
May The Physics Be With You
The Pentium Pro and PowerPC microprocessors that power our current generation of desktop machines are built in CMOS technology with a 0.35 micron feature size (roughly speaking the size of each transistor and the width of the metal tracks that connect them) while the next versions of these chips, currently being sampled, will shrink to a 0.25 micron process. As Tom Thompson explained in this spot last year ["Limits of Computing", Byte April 1996] these sizes are already pushing up against one kind of technical barrier, in the lithography process itself. The laws of optics require you to use light of wavelength no greater than the smallest feature to accurately copy the masks that describe a chip's layout onto silicon. Current litho machines use mercury arc light of 0.365 micron wavelength, and next generation machines will move to exotic krypton/fluorine UV lasers which emit at 0.248 microns. As chip fabricators try to reduce the feature size further, the illumination sources they need get ever more exotic, more expensive and less tractable. UV lasers, plus optical tricks to enhance resolution are good down to about 0.1 micron, but below that you need X-rays and then things get really tricky. No method exists for focussing X-rays, so you can't exploit optical reduction; masks must now operate like 'contact prints' meaning they need to be the same size as the chip itself, and making them terribly difficult to manufacture. It's also hard to find mask materials that are sufficiently opaque to X-rays. Despite decades of research, no commercial chip has yet been made using X-ray lithography.
Lithography is only the first obstacle. At around 0.1 micron feature sizes, problems with transistor threshold voltages, capacitance and resistance in the metal interconnects also hit crisis point. The attraction of CMOS is that - in contrast to competing technologies - transistors made in CMOS consume almost no current when they're turned off, the benign characteristic that enables immensely complex chips like the Pentium to keep reasonable power consumption. As you shrink chip designs each individual transistor can run faster while consuming the same amount of power, but of course the total number of transistors per unit area is increased, and so the power consumed (and hence heat dissipated) per unit area rises by a square law - halve the feature size, four times the watts/squ cm. Another benign property of CMOS technology comes to the rescue - the transistors' voltage and power consumption are also related by a square law, so that lowering the supply voltage can compensate for this increased power consumption. Dropping from 5 volt to 2 volt operation brings a 6-fold power saving (52/22=25/4) and dropping to 1 volt would reduce power consumption 25-fold. That's why chip designers have been reducing CPU core voltages over the last decade as feature sizes get smaller, by steps from 5v, through 3.3v and 2.8v, down to 2.5 and even 1.8 volts for next-generation chips. As a result we keep getting double the computing power for almost the same power consumption.
But in semiconductor design the trade-offs just keep on coming at you. Reducing the supply voltage reduces the maximum clock rate for the chip by increasing the gate delay (ie. the transistors switch slower), unless you offset this by also reducing the transistor's threshold voltage (the voltage at which the transistors switch on.) Lower the threshold voltage too far and the transistors will not work at all, but even before that point they begin to leak current even when turned off. Steve Furber, Professor of Computing at Manchester University, is an authority on low-power chip architectures (he designed the original ARM chip) who describes the situation like this: "in standard bulk CMOS, for every 60 millivolts you lower the threshold, leakage goes up by an order of magnitude. At the moment the industry is lowering the supply voltage only as fast as it has to, to avoid transistor breakdown." The bottom line of this mind-boggling series of trade-offs is that just under 1 volt operation looks like a fundamental limit for CMOS technology, and that's exactly where sub-0.1 micron chips are headed.
Cross-talk is another gremlin that will make 0.1 micron hard to achieve. Today's chips are essentially 2-dimensional objects, so as you shrink them you have to cram everything closer together. The tracks that join the transistors must get narrower, but to avoid increasing their resistance too much they have to be left relatively thick. Eventually they look more like walls than roads (see fig. 1), and neighboring pairs of tracks start to behave as long parallel plate capacitors, which both slows down signal transmission and also allows the signal on one wire to leak into its neighbour (cross-talk).
As process geometries shrink, capacitive effects between the more closely spaced conductors lead to signal leakage or 'cross-talk'.
Chip designers will be fighting an uphill battle against cross-talk. For example the latest generation of CAD tools, when asked to lay out a long straight track run will automatically space the tracks further apart than the minimum width, but as Furber observes "of course that's very good for reducing cross-talk, but it starts to negate the advantage of the small feature size". One solution that promises to delay the inevitable is employing a better conductor, copper, for the interconnect so that tracks can be made of lower profile to reduce intertrack capacitance (see Text Box "Copper-bottomed chips"). Keeping up exponential growth beyond 0.1 micron will mean moving beyond the current universe of 2-dimensional CMOS fabrication. There are hundreds of different university research projects into 3-D chip structures and molecular level transistors built from exotic organic polymers, but none of these technologies yet looks like a viable basis for such a huge industry.
IBM has solved the problem of bonding copper to silicon, allowing it to create tracks with superior conductance.
In September 1997 IBM announced to the industry that it has cracked the problems of copper metallization for its new CMOS 7S process. Manufacturers have long known that copper, a superior conductor, would be better than the aluminium currently used to make the tracks, as it permits thinner tracks for the same resistance. However aluminium is much easier to deposit by evaporation, and it sticks better to the silicon substrate, which is why it has been used for the last 30 years. IBM's CMOS 7S process will lead the market in several parameters (for a while until Intel catches up). Using six metal layers and a 0.2 micron feature size, it can pack 150 to 200 million transistors onto a die and operate at 1.8 volts. IBM will be introducing ASICs built in CMOS 7S this year, and will migrate PowerPC to the process soon.
It's a combination of all these difficulties that may cause Moore's exponential curve to level off, at somewhere just below 0.1 micron feature size, with over 100 million transistors on a chip, a 1 Gigahertz clock rate and 1 volt operation, sometime during the next decade. Steve Furber thinks "we've been blessed so far, in that the sort of things ECL [Emitter Coupled Logic as used in 1970s mainframes] board designers used to worry about - transmission line effects, ringing and so on - have not existed in CMOS chips to any significant extent, but they're coming". The SIA's (Semiconductor Industry Association) National Technology Roadmap for Semiconductors, a collection of best-guess predictions from experts across the industry, agrees, hinting at a slackening of the increase in clock speed around the 1 GHz level by year 2010. Outside of CPU design, there are other obstacles. In DRAM fabrication, progress is slowing as cell sizes fall below 0.2 micron, and a move to expensive Silicon-on-Insulator technologies may be needed to get past the 1 Gbit level. Bus speeds currently lag 5 times behind CPU speeds (eg. 300 MHz Pentium on 66 MHz bus), putting tremendous pressure on cache technologies to absorb the difference. A 1 GHz CPU will demand at least a 100 MHz bus - Intel already has motherboards that fast running in its labs - but such boards are hard to design and expensive to make, which may upset the economics of the commodity PC market. However in a sense none of these technical obstacles are the real problem – instead it may be the business model that is in danger of topping out (see Text Box: The Cost of Fab).
A Quantum Leap?
What also makes the 0.1 threshold hard to cross is that at those sizes quantum effects start to become important - the number of electrons in the body of each transistor is only a few thousand, rather than billions. Ohm's Law and other rules that electrical engineers work by only describe the statistical average behaviour of huge numbers of electrons - once you get electrons on their own they follow the very different laws of quantum mechanics. As Steve Furber puts it "the probability that a few hundred electrons will spontaneously migrate to the next transistor may be only zero-point-many-noughts percent, but given billions of transistors, in a chip that’s expected to perform billions of operations per second, without error for years, you'd eventually start to see that happen."
But what about the possibility that quantum effects might be put to use to create a new kind of computer? Though many researchers are pursuing that goal, there aren't any working quantum computers yet. Present experiments are still at the level of demonstrating one or two working quantum gates. However the excitement that drives the search for a quantum computer is that it promises to be able to solve classes of problem that are for ever denied to conventional computers.
Rather than operating on the bulk properties of electrons like a 'classical' digital computer, quantum computers operate at the level of single quantum particles - atoms, ions, single electrons or photons. Tom Thompson described some of the current experimental systems in his 1996 article - promising systems include beryllium or caesium atoms trapped inside resonant cavities, and quantum 'dots' which are single electrons trapped in the energy wells at the junction of two layers of doped silicon. Many of the experiments on quantum computing use lasers and mirrors, because quantum optics is a well understood field, rather than because that's the way to practical devices. Which physical implementation eventually triumphs is not too important at this stage though, as the principle underlying them all is the same, in a very profound sense (see text box "The Square Root of Not").
Quantum computers operate on basic entities called qbits, analogous to the bits in a classical computer but more powerful. Qubits can be in one of three states: 0 and 1 are the same as for a classical bit, but the third state is a superposition of the two, both 0 and 1 at the same time. Were you to build a 4 qubit-wide register and put all 4 qubits into the superposed state, that register would contain all the numbers between 0 and 7 at the same time. Any arithmetic operations you perform on this register will be applied to all these numbers simultaneously, so quantum computers are intrinsically highly parallel machines, and the implications of this are startling (see box "Taming Tough Algorithms").
In order to exploit quantum parallelism you need the right kind of problem and algorithm. For example adding two superpositions of the numbers 0..7 would produce a superposition of 0..15, but reading the register would collapse it to one random number from that range, probably not the correct result - addition is not a quantum operation. Quantum computers have a different range of usefulness from classical computers, and in the opinion of David Deutsch of Oxford University "they will probably never be used to add up accounts in banks". Deutsch wrote the seminal paper on quantum computing in 1985, and is leading figure in the quantum computing community. He went on to explain "the way that all quantum computers will be used is as a module that you plug into a classical computer, which will program the lasers to first set up the quantum bits in their initial state, then to do the desired computation on them, and finally to read out the answer to be incorporated into some classical computation."
THE SQUARE ROOT OF NOT
To create a quantum superposition you'll need a quantum particle which has at least two states (a ground state and an excited state) and some means of forcing it to switch between the states. These states could be the spin of a single electron, electronic orbitals in an ion, or the magnetic dipole of an atomic nucleus, and corresponding switching agents could be pulses of laser light or microwaves. Designate one of the states (say the ground state) as representing 0 and the other as 1, so the particle encodes the values of a classical digital bit. As shown in fig 1, applying a switching pulse of duration p turns a 0 into a 1 (or vice versa) and so the particle is behaving as a boolean NOT gate.
What makes a quantum gate different is what happens when you hit the particle with a pulse of duration exactly p/2. The particle's state now becomes a 'coherent superposition' of both states, that is it represents both 0 and 1 at the same time (see fig. 2). If you apply the p/2 pulse again, this superposed state collapses into the opposite of the original state. David Deutsch christened this device the √NOT gate because applying it twice in succession is equivalent to a single NOT operation (√NOT * √NOT = NOT). He also demonstrated that this is a universal operation (as are almost all quantum interactions), so you can perform any quantum computation at all by using a combination of them - just as you can use NAND gates to create any classical computation. Deutsch states this rather more dramatically by "almost anything is a quantum computer if you shine the right kind of light on it".
A pair of particles can also enter a superposed state in which all their combinations (00,01,10,11) are contained at once - such particles are said to become 'entangled' and quantum mechanics confers on them the remarkable property of 'remembering' their entanglement even after they have moved far apart, which raises the possibility of quantum encrypted communications. A quantum computer would carry out its operations by allowing pairs (or threes, fours, etc.) of qubits to become entangled in a defined sequence, to evolve the system towards a desired answer. The final act of reading the result collapses any remaining superpositions, so the quantum computer would not produce multiple answers.
The big problem for quantum computers is 'decoherence', the collapse of the superposed state back into 0 or 1. A superposition normally only persists for a short period - measured in microseconds - but since the switching pulses are measured in nanoseconds there's time to perform some thousands of operations before it collapses. However any unwanted interactions with the particle's environment will cause the superposition to collapse prematurely. The challenge in designing any sort of quantum computation is to find sequences of quantum operations that will produce an answer before decoherence sets in.
Deutsch doubts that quantum computers will ever be fabricated with the sort of large scale integration we see in CMOS devices. You can't transmit superpositions down wires, and the problem of decoherence would forbid that sort of approach. Instead he sees the first practical quantum computers as consisting of a single ion-trap cavity, containing a few tens of ions. These would be kept close to 0o Kelvin by laser cooling, using the same tunable laser that performs the computations. The laser would excite the ions by pulsing at different frequencies, causing different pairs to interact, rather like the way that the metal balls in a Newton's Cradle toy interact. The exponentially parallel nature of the quantum computer means you could still solve huge problems with only a few qubits. As Deutsch says "for classical computers the difficulty has been to make them smaller and faster, but for a quantum computer the difficulty will be to put more bits into the interactive region". Exciting as quantum computers are, they are not in direct line of descent from MOS technologies, and the same economics are unlikely to apply to their fabrication - they are likely to have their own, different equivalent to Moore's Law.
Taming Tough Algorithms
Perhaps the hardest to bear of the limits on computing has nothing to do with physics or materials, but with algorithms. Computer scientists classify algorithms according to how fast their execution time grows with the problem size n. The set P (for Polynomial) is the set of all algorithms whose running time grows only as a constant power of the problem size n (eg. as n2 or n3). P contains many well-known polynomial algorithms for sorting, searching and so on that form the basis for most of our computer software.
On the dark side is NP, the set of all algorithms whose time grows exponentially (or worse) with problem size (eg. as 2n). The running time of NP problems can soon overwhelm any known computer for even quite moderate problem sizes - and there are many economically important problems for which only NP algorithms are known. For example Public Key Cryptography is secure precisely because factorizing large numbers is an NP problem. These problems are not all so exotic either - even humble tasks like creating an optimum school timetable, or planning the best routes between several cities are NP problems if you need an exact solution. An NP routing problem that takes 17 minutes (210 secs) to solve for 10 cities would take 40 billion trillion years (2100 secs) for 100 cities. The real buzz about the quantum computer is that it promises to cut NP problems down to size. That's because quantum parallelism is itself exponential in nature - a register made up of n superposed qubits operates on 2n numbers in a single 'cycle'. That means that a quantum computer could in theory tackle what are at present NP problems in polynomial time, making them just another mundane computing task.
If and when a working quantum computer can be built, its first "killer app" will be the factorising algorithm discovered by Peter Shor of Bell Labs in 1994. Shor's algorithm works by filling a register with n qubits and setting them to the superposed state, then raising a random number x to the power of the register contents (ie. all the integers up to 2n), dividing by n and storing the remainders in another register. The remainders form a repeating sequence, whose repeat frequency f is revealed by allowing the qubits in the second register to interfere with one another, and from f you can calculate a factor of n. The method sometimes fails, but trying again with a different x will succeed, and since the whole process takes only three quantum steps, you can afford to keep trying until it does. No doubt government agencies (and businesses) that depend on Public Key encryption are watching these developments with interest. Fortunately it seems likely that quantum encryption algorithms can be found which are wholly uncrackable.
THE COST OF FAB
It may be business models rather than any technical obstacles that will halt the exponential growth in chip density. G. Dan Hutcheson is the man who, back in 1981, developed the non-linear ROI (Return on Investment) model that big semiconductor companies still use to guide their investment in plant and equipment. He points out that it's price-based business decisions, rather than technical obstacles that ultimately control chip lifecycles: "the economic consequences of approaching technical barriers will be felt before the barriers themselves are reached. For example, the costs of achieving higher levels of chip performance rise very rapidly as the limits of a manufacturing technology are approached... Increasing costs may drive prices beyond what buyers are willing to pay, causing the market to stagnate before the actual barriers are encountered." In Fig. 3 the solid green line depicts the overall cost/performance history of a hypothetical chip family; the dotted curves show the cost/performance curves for successive generations. E1, E2, and E3 show the points at which the economics force abandonment of the old technology, long before the technical "brick wall" region (T1,T2 and T3).
Each new generation of a chip design has its own cost/performance curve.
Each peak in this cycle requires a new fabrication plant, and the cost of such fabs is rising at about half as fast as Moore's Law itself, doubling around every three years. Intel spent $1.3 billion on its new fab in Chandler (Ariz) while both Samsung and Siemens have paid $1.5 for new fab. You need to sell a lot more chips to recoup such investments and eventually that will not be possible - then you have to raise prices. Hutcheson says "the semiconductor industry is not likely to come screeching to a halt anytime soon. When the cost per bit begins to rise permanently, the most likely result will be an industrial phase change that significantly alters business models." Other industries went through similar phase changes long ago.
For example the aeroplane business went from the Wright brother's biplane to the Flying Fortress in just 40 years. A key parameter - cost per passenger-mile travelled - decreased monotonically for decades until in the 1960's it hit the price buffers. Concorde was the (uneconomical) high-water mark of speed, and the Boeing 747 Jumbo was the (still unsurpassed) high mark of capacity. With exponential growth finished the industry had to diversify, eg. by making some smaller and slower planes for the short-haul operators. Research effort shifted from speed and size to greater fuel efficiency, quietness, and passenger comfort. The equivalent in the semiconductor business would be for big-iron chips like Intel's Merced to remain expensive, and get revised less often. Semiconductor firms would have to branch out horizontally, using spare silicon real estate to add more functions to a chip: DSPs, graphics accelerators, eventually whole computer systems on one die. There are signs this is starting to happen ; Intel is rapidly diversifying into graphics, telephony, networking and video; has just reorganized itself into four divisions; and has finally conceded it needs a cheap Pentium II for sub-$1000 PCs.
It’s certainly a scenario endorsed by Robin Saxby, Chairman and CEO of ARM Ltd. The firm has no fabrication facilities of its own, using commercial foundries like TSMC instead, and Saxby sees the company's future in designing ASICs based around the ARM core and eventually whole systems on a chip: "if you look at the motherboard of a today's PC, you'll find devices from around 15 different silicon vendors - CPU from Intel, graphics from C & T [Chips and Technologies], modem from Cirrus Logic, sound chip from Yamaha. The future PC 'motherboard' will be a single chip but just one silicon vendor will supply that chip and the biggest challenge is getting the Intellectual Property for all those circuits to work and connect together." ARM is one of 128 firms, others including IBM, Alcatel, Fujitsu, Sony, and Toshiba, that formed the Virtual Socket Interface Alliance (VSIA; www.vsi.org) to create standards for integrating such system-on-a-chip products by sharing and licensing the circuit components.