The Internet is best understood as a tall, multi-layered cake, each layer depending entirely on the foundation below it. At the bottom are physical cables, then communication protocols, then addressing systems. On top of these sits the hosting layer with all the cloud infrastructure, and finally, at the summit, are the service applications we use daily — websites, mobile apps, social platforms and, increasingly, AI/ML services — all entirely dependent on the layers below.
Every time a new layer emerged or matured, markets treated it as a gold rush—and a bubble followed.
The internet's layered architecture (loosely inspired by the OSI model)
1. Physical Layer - The Actual Cables: At the base are undersea links and terrestrial conduits carrying fibre-optic cables—thin glass strands that shoot light pulses, carrying data in packets across oceans and continents—the physical highways of global data. This wiring layer fuelled the late-1990s–2001 fibre/telecom bubble: massive overinvestment in fibre infrastructure led to overcapacity, and companies like WorldCom collapsed when demand failed to materialise as quickly as projected.
2. Internet Protocol and Language Layer - This is where TCP/IP lives: Above the cables sit open standards—TCP/IP, routing mechanisms and their open-source implementations—that let devices “converse in the same language”. These are the fundamental rules for how data packets move around. Next comes the addressing tier: systems such as DNS, IP addresses and URLs that translate human-friendly names into numerical network addresses so packets can find their way through the network. This layer was built mostly by unpaid developers, academics and global standards bodies rather than corporations. The Linux Foundation maintains much of the open-source software that implements these protocols, but no single entity owns them. This layer never produced a traditional financial bubble, because it consists of open standards freely available to all. It did, however, enable every bubble that followed.
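The addressing tier can be illustrated with a toy sketch of hierarchical name resolution: a hypothetical zone tree, walked the way a real resolver queries root, TLD and authoritative servers in turn. The names and the address below are illustrative, not real DNS records.

```python
# Toy zone hierarchy: root knows the TLDs, the TLD knows the domains,
# the authoritative zone knows the hosts. Entries are invented.
ZONES = {
    ".": {"com": "tld-server"},
    "com": {"example": "authoritative-server"},
    "example": {"www": "93.184.216.34"},
}

def resolve(name: str) -> str:
    """Resolve a name by walking the zone hierarchy right-to-left."""
    labels = name.rstrip(".").split(".")   # 'www.example.com' -> ['www','example','com']
    zone = "."
    answer = None
    for label in reversed(labels):         # com -> example -> www
        answer = ZONES[zone][label]        # each zone delegates to the next
        zone = label
    return answer                          # the final answer is the address

print(resolve("www.example.com"))          # → 93.184.216.34
```

Real resolvers add caching, timeouts and many record types, but the delegation walk is the essential shape.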
3. Security and Identity Layer - Between protocols and applications sits a critical but often invisible layer: the trust infrastructure that makes secure communication possible. It includes authentication systems (OAuth, SAML) that verify who you are; certificate authorities that issue the SSL/TLS certificates behind the padlock icon in your browser; encryption protocols that scramble data in transit; and identity-management systems that control access to resources. Without this layer, online banking, e-commerce and private communication would be impossible. It is the silent guardian that prevents your credit card from being stolen every time you shop online.
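One ingredient of this trust layer, message integrity, can be sketched with a keyed hash: a shared secret plus an HMAC tag proves a message was not tampered with in transit. Real TLS layers this with certificates and asymmetric key exchange; this is only a minimal sketch of the integrity check, with an invented key and messages.

```python
import hashlib
import hmac

SECRET = b"shared-session-key"   # in real TLS, agreed during the handshake

def sign(message: bytes) -> bytes:
    """Produce an authentication tag for the message."""
    return hmac.new(SECRET, message, hashlib.sha256).digest()

def verify(message: bytes, tag: bytes) -> bool:
    """Check the tag; compare_digest avoids timing side channels."""
    return hmac.compare_digest(sign(message), tag)

msg = b"transfer $10 to Alice"
tag = sign(msg)
print(verify(msg, tag))                          # untampered → True
print(verify(b"transfer $10 to Mallory", tag))   # tampered   → False
```

Flip a single byte of the message and the tag no longer matches, which is exactly the guarantee the padlock icon quietly relies on.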
4. Transport and Application Protocol Layer: This layer defines how web browsers, email clients and apps communicate, and it is the base on which websites and services are built. The 1995–2000 dot-com bubble was driven by new web businesses built on HTTP.
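What "built on HTTP" means at the wire level is surprisingly plain: a request is just structured text. This sketch builds an HTTP/1.1 GET by hand and parses a canned response, with no network involved.

```python
# An HTTP/1.1 request is plain text: a request line, headers,
# then a blank line marking the end of the headers.
request = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Connection: close\r\n"
    "\r\n"
)

# A canned response, as a server might send it back.
canned_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body>Hello</body></html>"
)

# Headers and body are separated by the first blank line.
head, body = canned_response.split("\r\n\r\n", 1)
status_line = head.split("\r\n")[0]
print(status_line)   # → HTTP/1.1 200 OK
print(body)          # → <html><body>Hello</body></html>
```

Every dot-com era website ultimately reduced to exchanges of this shape.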
5. Server (Hosting) and Compute Abstraction Layer: This is where cloud providers sit. Cloud platforms such as AWS, Google Cloud and Azure run enormous data centres full of servers, renting out the compute, storage, databases and virtual machines that power websites, apps and services.
At the base of this layer sits Data Storage and Management. Before applications can run, they need somewhere to store and retrieve data. Relational databases like PostgreSQL and MySQL organise data in structured tables. NoSQL databases like MongoDB and Cassandra handle unstructured data at scale. Data warehouses like Snowflake and BigQuery store massive datasets for analysis. Vector databases like Pinecone and Weaviate enable AI applications to search by semantic meaning rather than exact matches. These storage systems are as foundational as compute itself—every application depends on them.
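The core trick behind vector databases can be reduced to a toy: documents are stored as embedding vectors, and a query returns the nearest one by cosine similarity. Real systems such as Pinecone and Weaviate use learned embeddings and approximate indexes; the 3-dimensional vectors below are made up purely for illustration.

```python
import math

# Invented "embeddings": nearby vectors mean similar topics.
DOCS = {
    "cat care":   [0.9, 0.1, 0.0],
    "dog care":   [0.8, 0.3, 0.0],
    "tax filing": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec):
    """Return the document whose vector points most like the query."""
    return max(DOCS, key=lambda doc: cosine(DOCS[doc], query_vec))

print(nearest([0.9, 0.15, 0.0]))   # a pet-flavoured query → cat care
```

Exact-match search cannot do this; the query never contains the word "cat", yet semantic proximity still finds the right document.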
Within this layer, next comes container orchestration and deployment. These technologies manage how applications run at scale: Docker packages applications into portable, standardised, self-contained 'containers', and Kubernetes orchestrates those containers across clusters of cloud servers—handling scaling, health checks, traffic routing and failover. These tools do not execute the application's logic; they simply ensure that applications run consistently and are deployed reliably, no matter how many servers or users are involved.
Load balancers sit between incoming traffic and application servers, distributing requests evenly so no single server gets overwhelmed. Message queues (Kafka, RabbitMQ) enable different parts of applications to communicate asynchronously, decoupling services so they can scale independently.
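The two patterns above can be sketched in a few lines: a round-robin load balancer spreading requests across servers, and a queue decoupling a producer from a consumer. Kafka and RabbitMQ add durability and distribution on top, but the shapes are the same; server names and messages here are invented.

```python
from collections import deque
from itertools import cycle

# Round-robin: each request goes to the next server in rotation.
servers = cycle(["server-a", "server-b", "server-c"])

def route(request_id: int) -> str:
    return f"req-{request_id} -> {next(servers)}"

for i in range(4):
    print(route(i))   # the fourth request wraps back to server-a

# Message queue: the producer publishes and moves on; the consumer
# drains at its own pace. deque stands in for a real broker.
queue = deque()
queue.append("order-placed")
queue.append("email-receipt")
while queue:
    print("worker handled:", queue.popleft())
```

The decoupling matters: the producer never waits for the consumer, so the two sides can be scaled, restarted or replaced independently.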
Content Delivery and Protection: Sitting in front of these cloud data centres are specialist networks such as Cloudflare, which speed up websites through CDNs by caching content closer to users, block attacks through DDoS protection, and provide DNS services that help users reach websites quickly and safely.
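The caching idea behind a CDN reduces to a sketch like this: an edge node answers from its local store when it can (a "hit") and fetches from the origin only on a "miss". The paths and contents below are invented for illustration.

```python
# Stand-in for the origin server's content.
ORIGIN = {"/logo.png": "<bytes of logo>", "/app.js": "<bytes of js>"}

class EdgeCache:
    """A toy CDN edge node: serve locally when possible."""

    def __init__(self):
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> str:
        if path in self.store:
            self.hits += 1              # fast: answered near the user
            return self.store[path]
        self.misses += 1                # slow: a round trip to the origin
        self.store[path] = ORIGIN[path]
        return self.store[path]

edge = EdgeCache()
edge.get("/logo.png")
edge.get("/logo.png")                   # second request is a hit
edge.get("/app.js")
print(edge.hits, edge.misses)           # → 1 2
```

Real CDNs add expiry, invalidation and thousands of edge locations, but the hit/miss economics above are why cached sites feel fast worldwide.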
A simple analogy helps. The cloud is a massive rented warehouse that stores corporate data and runs applications. Docker is the system that breaks large applications into many small, neat packages called containers, keeping each part organised so it can run independently without loading the whole system every time. Kubernetes is the automated system that arranges and delivers these containers across cloud servers so the app stays fast even when millions of users are online. Once a container is running, the runtime engine inside it executes the actual application logic—fetching the right data, applying algorithms and showing users what they need. Cloudflare is the security guard and delivery network standing in front of the warehouse, protecting the building and speeding up access for users.

Almost all large social-media and e-commerce platforms use Docker and Kubernetes because their systems are too large, too complex and too traffic-heavy to run on single servers. They break their platforms into hundreds or thousands of small services—login, notifications, payments, product search, recommendations, chat, checkout and so on—and need millions of small components working together at all times. Docker packages each service into standardised containers so the code runs the same way across global data centres. Kubernetes manages those containers at massive scale, automatically starting new ones when traffic spikes, shutting them down when traffic drops, restarting them when they crash, and spreading them across thousands of machines to prevent any single point of failure. AWS controls roughly a third of the global cloud-infrastructure market, while Cloudflare handles traffic for about a fifth of all websites.
When AWS or Cloudflare experiences outages—or when Kubernetes clusters within cloud providers misconfigure deployments—large chunks of the internet can become unreachable. This cloud layer powered a decade of hyperscaler valuations: the 2010s Cloud Boom, as companies raced to “move to the cloud,” inflated infrastructure providers’ valuations long before profitability models were proven. The late 2010s then saw an infra-tooling boom around Docker and Kubernetes (not an internet layer but a deployment abstraction that packages code, schedules it across clusters and scales it automatically), as DevOps, microservices and enterprise spending surged.
6. Application Layers (Runtime Execution Layer): This is where actual websites and services run. Everything familiar about the Internet lives here: social media, e-commerce, streaming services, and now AI applications.
Runtime Execution Engines: Within the application layer, runtime engines (runtime execution environments) such as V8 (2009), the JVM, CPython and Node.js, together with browser engines like WebKit, Blink and Gecko and with WebAssembly (WASM), execute the 'logic' of web pages, web apps and server-side applications. They enable JavaScript to run both in web browsers for interactive front-ends and on servers for backend applications and APIs. Much as an operating system mediates between software and the hardware it runs on, runtime engines translate high-level programming languages into machine code: they are the intermediary layer between human-readable code and CPU instructions that makes applications actually function.
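The translation step described above is directly observable in CPython: source code is first compiled into bytecode, which the virtual machine then executes. The exact opcode names vary between Python versions, but the pipeline is the same.

```python
import dis

def add(a, b):
    return a + b

# dis exposes the bytecode CPython compiled from the source above:
# load the two arguments, apply the binary operation, return the result.
bytecode = dis.Bytecode(add)
print([instr.opname for instr in bytecode])
```

On one interpreter this prints something like `['RESUME', 'LOAD_FAST', 'LOAD_FAST', 'BINARY_OP', 'RETURN_VALUE']`; V8 and the JVM perform an analogous translation for JavaScript and Java, typically with just-in-time compilation down to machine code.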
Languages, Formats, and Communication: Programming languages like C, Java, Python, and JavaScript are used to write code that executes logic, makes decisions, and processes data. Markup languages like HTML, XML, and Markdown describe the structure and presentation of content. Data formats like JSON, XML, CSV, and YAML store and exchange data between systems. C doesn't need a runtime engine because it compiles directly to standalone executables that run natively on the CPU. In contrast, Java Virtual Machine (JVM), CPython, and Node.js exist because Java, Python, and JavaScript need something to translate and execute their code at runtime. The flow works like this: Code written in a programming language (JavaScript, Python) makes an API request → API defines how systems communicate → JSON is the data format exchanged → Runtime engine (Node.js, JVM, CPython) executes the code → HTML displays the result to users. These technologies revolutionised and supercharged the application layer by enabling fast, rich applications that run in web browsers and server-side environments.
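The flow in the paragraph above can be simulated end to end without a network: code issues a request, a pretend API answers with JSON, the runtime parses it, and HTML is produced for display. The endpoint and fields are invented for illustration.

```python
import json

def fake_api(endpoint: str) -> str:
    """Stand-in for a real service: returns canned JSON for the endpoint."""
    return json.dumps({"endpoint": endpoint, "user": "ada", "unread": 3})

raw = fake_api("/api/inbox")    # 1. code makes the API request
data = json.loads(raw)          # 2. JSON is the exchanged data format,
                                #    parsed by the runtime
html = f"<p>{data['user']} has {data['unread']} unread messages</p>"
print(html)                     # 3. HTML displays the result to the user
# → <p>ada has 3 unread messages</p>
```

Swap the canned string for a real HTTP call and this is, structurally, how most web front-ends talk to their back-ends.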
Microservices Architecture: Modern large-scale applications are typically built as microservices—breaking monolithic applications into hundreds or thousands of small, independent services. Each service handles one specific function: login service, notifications service, payments service, product search, recommendations, chat, checkout, etc. This architectural pattern is what made Docker and Kubernetes necessary. Instead of one giant application, you have many tiny ones that need to be packaged, deployed, scaled, and managed independently.
The rise of SaaS and rich front-ends fuelled the 2008–2018 mobile and web-app boom, and with it demand for the infrastructure that converts code into machine instructions and runs application logic: browsers run JavaScript engines, while servers run the JVM, Node.js or Python.
The application layer (apps, platforms, AI models, e-commerce, social media) has sparked multiple investment frenzies:
Social media boom (2010s): Facebook, YouTube, Twitter, and Instagram created closed "walled garden" ecosystems with massive valuations despite initially limited revenue models.
Mobile app mania (2010–2018): Every company felt pressure to build dedicated mobile apps, triggering a commercial land-grab that was really just a new interface to existing services, not a fundamental technical innovation.
AI/ML (2023–Present): The current AI frenzy is not a separate foundational layer of the Internet stack. It is simply the latest tier within the application layer, similar to search engines, recommendation systems or payment gateways. AI does introduce infrastructure primitives that previous applications didn't require: GPU clusters optimised for parallel computation rather than general-purpose processing; vector databases for semantic search and retrieval; model registries that version and manage trained AI models; inference engines that optimise model serving at scale; and a training/inference split that creates fundamentally different compute patterns from traditional request-response applications. What makes AI different is its appetite: AI workloads are far more compute-intensive than previous applications, which amplifies dependence on the cloud layer rather than creating a new one. This is why hyperscalers like AWS, Azure and Google Cloud, alongside GPU chip makers, have become the primary beneficiaries of the AI boom; they are selling the pickaxes in this gold rush. Every company now adds "AI-powered" to its offerings, valuations skyrocket, and massive infrastructure investment flows despite unclear paths to profitability for many players. In that sense, today's AI frenzy is not new at all. It is simply the highest floor built on a tower that has bubbled, crashed and rebuilt itself many times before.
The Internet has always grown in layers, and every time a new layer appeared or evolved, markets treated it as the next transformative opportunity. The technology was real. The infrastructure got built. But the valuations often ran ahead of the revenue, creating cycles of boom and correction that have defined the digital age.
Edge Computing and Decentralisation: The Emerging Shift. Not everything flows through centralised cloud servers. Two parallel trends are reshaping the hosting layer. Edge computing pushes computation closer to users—into cell towers, local data centres, or even IoT devices—reducing latency for real-time applications like autonomous vehicles, AR/VR and industrial automation. It blurs the line between the hosting layer and the application layer, creating distributed compute that is neither purely cloud nor purely local. Peer-to-peer architectures—BitTorrent, blockchain networks, IPFS—distribute data and computation across many nodes without central servers. While not dominant, these represent an architectural alternative to the cloud-centric model, particularly for applications prioritising censorship resistance or decentralisation over efficiency.
The Consulting Industrial Complex: Professional consulting firms have long ridden each technological wave by selling “digitisation,” “transformation,” and “enablement” programmes to corporations, often packaging familiar process-mapping and IT integration work as the next frontier of innovation; yet the outcomes have frequently fallen short of the grand promises, with clients discovering that expensive roadmaps do not necessarily translate into real change. This cycle mirrors the broader pattern: technological storytelling becomes a monetizable asset in itself, independent of whether the underlying transformation actually succeeds.
This cycle mirrors a broader pattern: major technological booms and busts are predominantly Western phenomena, particularly centred in the United States. Emerging markets experience technology adoption, but rarely the speculative frenzy that characterizes Silicon Valley and Wall Street. The reason is straightforward: real innovation happens in the West, and it takes years to trickle down to other economies. Western markets—specifically the US—host the venture capital ecosystem, research universities, and risk-tolerant capital markets that fund experimental technologies before they're profitable or even fully formed. This creates the boom: investors pour money into nascent technologies, valuations soar based on potential rather than revenue, and eventually reality intervenes. By the time these technologies reach emerging markets, they're mature, proven, and commoditised. Emerging economies adopt cloud services after the cloud wars have been fought. They build mobile apps after the mobile bubble has deflated. They'll deploy AI after the current frenzy has settled into practical applications. They skip the speculation and inherit the infrastructure. This isn't a disadvantage—it's often an advantage. Emerging markets avoid the costly mistakes, the overinvestment, and the wreckage of failed startups. They adopt what works. But they also don't capture the enormous wealth creation that happens during the bubble phase, when early investors and employees of successful companies extract generational fortunes. The West tolerates boom-bust cycles because the booms create real infrastructure and occasional trillion-dollar companies, even if most participants lose money. The busts are painful, but they're seen as the price of being first. Emerging markets get stability and lower risk, but they're always building on foundations laid—and paid for—elsewhere. 
In that sense, the fibre bubble, the cloud boom, the app frenzy, and today’s AI mania all follow the same script: a Western market inflates a vision of the future, global consultants monetise the storytelling under the banner of transformation, and the rest of the world eventually adopts the technology only once the dust has settled.
An outlier has been China. It didn't just "adopt after the dust settled"—it built parallel ecosystems (WeChat, Alipay, ByteDance) that often surpassed Western models in features and scale. WeChat integrated social messaging, payments, ride-hailing, and e-commerce years before Western "super apps" attempted similar integration. Mobile payment infrastructure was arguably pioneered at scale in India (UPI) and China before the West caught up. These weren't adoptions of Western technology—they were indigenous innovations solving local problems. E-commerce models like Alibaba's Taobao created merchant ecosystems distinct from Amazon's approach, and platforms like Pinduoduo pioneered social commerce at massive scale. The "West innovates, rest adopts" story worked better in the 1990s–2000s than today. What remains true is that speculative boom-bust cycles are still concentrated in US capital markets, where risk tolerance for unproven technologies remains highest.
The fibre bubble, the cloud boom, the app frenzy, and today's AI mania all follow the same script: markets inflate a vision of the future, infrastructure gets built (sometimes excessively), and eventually the technology matures into practical utility. Each layer was real. The infrastructure remained. But the valuations often ran ahead of the revenue. What changes is where value accrues and who captures it. Sometimes it's the infrastructure providers. Sometimes it's the application builders. And increasingly, it's not just Western companies capturing that value—it's whoever builds the best solution for their market, regardless of geography.
Next up, though, Agents Assemble: The next version of the web will be built for machines, not humans. AI will surf, shop and act on your behalf.
In 1999, a decade after inventing the world wide web, Sir Tim Berners-Lee, a British computer scientist, imagined an intelligent version of his creation. In that vision, much of daily life—finding information, making plans, handling mundane tasks—would be done not by people, but by “intelligent agents”: machines able to read, interpret and act. The web has evolved dramatically since its invention but the experience has remained manual—users still type, click and browse before they buy, read or watch. Artificial intelligence (AI) may now bring Sir Tim’s dream within reach. Today’s large language models (LLMs) can summarise documents, answer questions and reason. What they cannot do for the moment is act. That, however, is changing with “agents”: software that gives LLMs tools which let them perform tasks, not just generate text. The shift started in 2022 with the launch of ChatGPT. Many users began asking questions of chatbots, rather than putting keywords into search engines, to assimilate information that might be spread around the web. Such “answer engines” barely scratch the surface of the potential, however. Kevin Scott, chief technology officer of Microsoft, a software giant, reckons agents able to handle more complex tasks “are not that far away”. But for them to take over more of the work, the web’s plumbing must change. A central obstacle is language: giving agents a way to talk to online services and each other. A website or online service normally talks to the outside world through an application programming interface (API), which tells visitors what it can do, such as booking a doctor’s appointment or supplying a map location. APIs, however, are written for humans, and each has its own quirks and documentation. This is a tough environment for AI agents, because they reason in natural language. Dealing with each new API requires learning its dialect. To act independently on the web, therefore, agents will need a standardised way to communicate. 
This is the aim of the Model Context Protocol (MCP), developed by Anthropic, an AI lab. Mike Krieger, its chief product officer, says the idea came while linking Claude, its chatbot, to services like Gmail, an email platform, and GitHub, a repository of code. Instead of integrating each application with Claude on a case-by-case basis, the firm wanted a shared set of rules to help agents directly access a user’s emails or files. Rather than study technical guides, an agent can ask an MCP server what a system does—book a flight, cancel a subscription, issue a refund and so on—and then take an action on behalf of the user, without bespoke code. Say you want to book a trip from London to New York. You start by giving your travel plans to a trip agent, which then subdivides the task between specialised agents that can look for flights, hotels and cars. These agents contact the MCP servers of airlines, hotels and car-hire firms, gather information, compare possibilities and create a list of potential itineraries. Once you pick an option, the trip agent would book the whole lot. This type of co-ordination requires rules for how individual agents identify, talk to and trust each other. Google’s proposed solution is the A2A (agent-to-agent) protocol for this purpose. Agents can advertise their abilities to each other through this and negotiate which agent does what. Laurie Voss of Arize AI, a startup, says companies are in a “landrush” to define the dominant standards for the agentic web. The most widely adopted protocol will let its backers’ tools do more, sooner and better. On December 9th, 2025, Anthropic, OpenAI, Google, Microsoft and others announced the Agentic AI Foundation, which will develop open-source standards for AI agents. Anthropic’s MCP will be part of this, signalling its wider adoption as an industry standard for agentic communication. Still, most of the web that these agents will surf is made for human eyes. 
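The discovery-then-invoke idea behind MCP-style communication can be sketched as a toy, which is emphatically not the real protocol: a server advertises its capabilities in machine-readable form, and an agent picks and invokes one by name instead of learning a bespoke API. All names and capabilities below are hypothetical.

```python
# Hypothetical capabilities a travel service might expose.
def book_flight(origin, destination):
    return f"booked flight {origin} -> {destination}"

def book_hotel(city, nights):
    return f"booked {nights} nights in {city}"

# What a server might advertise: capability names, their parameters,
# and the callable behind each one.
SERVER = {
    "book_flight": {"params": ["origin", "destination"], "fn": book_flight},
    "book_hotel":  {"params": ["city", "nights"], "fn": book_hotel},
}

def agent_call(capability: str, **kwargs):
    """The agent discovers a tool by name and invokes it, no bespoke glue."""
    tool = SERVER[capability]
    return tool["fn"](**kwargs)

print(list(SERVER))   # step 1: the agent asks what the system can do
print(agent_call("book_flight", origin="LHR", destination="JFK"))
```

The point of a shared standard is that the same `agent_call` shape works against any compliant server, which is what removes the per-API "dialect" problem the previous paragraph describes.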
Finding a product still means clicking through menus. To let language models access sites more easily, Microsoft has built Natural Language Web (NLWeb), which lets users “chat” to any web page in natural language. Users could ask the NLWeb interface of a travel website, for example, for tips on where to go on holiday with three children; or what the best wine shops are in a particular place. Whereas traditional search might require clicking through filters for location, occasion and cuisine across several menus, NLWeb is able to capture the full intent of a question in a single natural sentence, and respond accordingly. Each NLWeb site can also act as an MCP server, exposing its content to agents. Thus NLWeb bridges the modern visual internet and one that agents can use. As agents grow more capable, a new platform contest is taking shape, this time over the agents themselves. It echoes the browser wars of the 1990s, when firms fought to control access to the web. Now, browsers are being reimagined with agents at their core. OpenAI and Perplexity, a generative-AI startup, have launched agent-powered browsers that can track flights, fetch documents and manage email. Their ambitions go further. In September OpenAI enabled direct purchases from select websites inside ChatGPT. It has also integrated with services like Spotify and Figma, letting users play music or edit designs without switching apps. Such moves worry incumbents. In November Amazon, a shopping site, sued Perplexity, alleging the startup was violating its terms of service by failing to disclose that its browser was shopping instead of a real person. Airbnb, a short-term-rentals app, chose not to integrate with ChatGPT, saying the feature was not “quite ready”. Advertising, too, will have to adapt. Today’s web runs on monetising human attention, through search ads and social feeds. 
Alphabet and Meta, among the biggest tech firms, are expected to earn nearly half a trillion dollars a year this way, accounting for more than 80% of their revenues. Dawn Song, a computer scientist at the University of California, Berkeley, says marketers may need to pitch not to people, but to “agent attention”. Travel sites, for instance, will not persuade the traveller, but their digital proxy. The tactics may stay the same, optimising rankings, targeting preferences, paying for placement, but the audience will be algorithms. Despite the risks, software developers are optimistic. They foresee a shift from a “pull” internet, where people initiate actions, to a “push” model, where agents act unprompted—setting up meetings, flagging research or handling small tasks. It could be the foundation of a new and very different version of the web.
Beneath the visible Internet stack lies a deeper industrial and geopolitical foundation: chip fabs (TSMC, Intel), the ultra-sophisticated tools that make fabrication possible (ASML, Lam, Tokyo Electron), the design software that architects chips (Synopsys, Cadence), the packaging/test ecosystem, and beneath all of that the global capital markets—anchored by the US dollar—and the Western security architecture that protects supply chains. These layers form the real substructure of modern computing: a mix of physics, capital, and geopolitics on which cloud platforms, AI models, applications, runtimes like V8, and the entire Web ultimately depend.
Semiconductor Fabrication Layer: This layer represents the companies that physically manufacture chips—the CPUs, GPUs, and memory that computers and data centres rely on. These firms take circuit designs and manufacture them using advanced lithography, chemicals, and wafer-processing technologies. Everything above it—cloud servers, phones, AI accelerators, laptops—exists only because these fabs can produce chips at unimaginably small scales (3nm, 2nm). If they stop, the entire digital economy collapses. TSMC manufactures ~90% of the world’s leading-edge chips, making it the single most important industrial node on Earth. This is why the chip supply chain is a geopolitical flashpoint.
Semiconductor Manufacturing Equipment (SME) Layer: Companies like ASML, Tokyo Electron, Applied Materials, Lam Research and KLA build the machines used by chip fabs. ASML makes EUV lithography machines, among the most complex machines ever built (~$150M each). Lam Research supplies etching and deposition tools; Tokyo Electron provides coating, developing and deposition equipment; KLA dominates metrology and inspection. Fabrication cannot exist at all without the machinery these firms produce. ASML’s EUV tools are so advanced and so few that ASML alone practically decides which countries can make cutting-edge chips. This is the industrial bedrock beneath the semiconductor layer.
EDA (Electronic Design Automation) Software Layer: Synopsys, Cadence, Siemens EDA and Ansys make the software used to design chips, while ARM licenses the processor designs built with such tools. These tools turn human circuit designs into layouts that fabs can manufacture. Chip design today is impossible without EDA tools. These companies—mostly American—hold oligopolistic control over chip design software. They are the intellectual infrastructure beneath the physical infrastructure.
Outsourced Assembly & Testing (OSAT) Layer: ASE and Amkor represent the companies that package chips, attach them to substrates and test them for defects. This is the layer after fabrication but before chips are usable. Without this step, raw silicon dies cannot be inserted into laptops, servers or phones. OSAT is dominated by Taiwan and South Korea.
The meme below humorously depicts the modern web infrastructure as a towering, unstable stack of technologies—from AI and cloud services like AWS to open-source contributions—resting on the foundational work of C developers implementing dynamic arrays manually, a core concept absent from C's standard library.
Why Modern Processors Represent Humanity's Greatest Productivity Arbitrage
A modern processor is an affront to intuition. A sliver of silicon, no larger than a fingernail and thinner than cling film, serenely renders 3D worlds, listens for satellites orbiting thousands of kilometres above Earth, deciphers voices from across oceans and still has time left over to check your messages. To make sense of this, scale it up two million times and drop it into Manhattan. The result is not a hulking monolith but a strangely elegant city: a glassy slab scarcely ten storeys tall, resting on a brute pedestal built purely to stop it snapping. Inside runs a dense copper metropolis—layer upon layer of microscopic highways entombed in glass—funnelling signals downwards to street level, where the real business of thinking takes place. There, packed more tightly than any crowd New York has ever seen, stand billions of switches. Each transistor is no cleverer than a wall-mounted light switch. It knows only “on” or “off”. And yet, from this binary boredom, astonishing complexity arises.
The processor’s genius is not intelligence but obedience. Transistors are chained into logic gates—AND, OR, NOT, NOR—which are themselves chained into instructions. Memory is merely geography: addresses in this city where lights flick briefly on or off, like sticky notes slapped on doors and torn down moments later. Each bit of fast on-chip memory needs a small cluster of transistors (a typical SRAM cell uses six), arranged into tiny, perfectly ordered cells. Numbers are patterns of light. Images too. Even a monochrome Mona Lisa is nothing more than a neighbourhood of illuminated houses. Elsewhere sit the arithmetic logic units, asking relentlessly dull questions at terrifying speed: if this is true, do that; if not, do something else. Computers can count only to one. They just do it flawlessly, without fatigue or error, billions of times over. With enough instructions, monotony becomes motion: wind rippling through digital trees, shadows sliding across virtual flags, cinematic battles fought not with steel but with trigonometry. And they do not work alone. Modern chips enlist entire divisions—the CPU, GPU and NPU—processing in parallel, so that even when slowed to human pace they could perform hundreds, even tens of thousands, of simple operations every few minutes.
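The chaining described above can be shown in miniature: gates are built from a single universal primitive, adders from gates, and arithmetic from adders. This is a logical sketch in software, of course, not silicon; the gates below are all derived from NAND, the classic universal gate.

```python
# Every gate below is built from NAND alone.
def NAND(a, b): return 0 if (a and b) else 1
def NOT(a):     return NAND(a, a)
def AND(a, b):  return NOT(NAND(a, b))
def OR(a, b):   return NAND(NOT(a), NOT(b))
def XOR(a, b):  return AND(OR(a, b), NAND(a, b))

def full_adder(a, b, carry_in):
    """One column of binary addition: sum bit plus carry out."""
    s = XOR(XOR(a, b), carry_in)
    carry_out = OR(AND(a, b), AND(carry_in, XOR(a, b)))
    return s, carry_out

def add4(x, y):
    """Add two 4-bit numbers the way an ALU does: one column at a time."""
    carry, total = 0, 0
    for i in range(4):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= bit << i
    return total   # the final carry simply overflows, as in real hardware

print(add4(5, 3))   # → 8: "counting only to one", column by column
```

Nothing in the chain is cleverer than on/off, yet chaining a handful of such adders side by side is literally how an ALU does arithmetic.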
The final absurdity reveals itself only when time is dragged to a crawl. A processor does not tick once a second, like a human thought, but billions of times per second: its clock speed, measured in gigahertz. To make it “think” at our pace, time itself must be stretched grotesquely: one second in the real world becomes thousands of years inside the chip. In this frozen universe, electrical signals creep through silicon at millimetres per second; light itself ambles along at barely twice that pace. And still the processor outpaces us. A 4GHz chip ticks four billion times a second and performs more calculations in that fleeting moment than a human savant could manage in four thousand years of uninterrupted effort. This is why modern life runs on processors—not because they are clever, but because we have built near-atomic megastructures that perform simple acts, in parallel, at speeds that bend time out of shape. They are silicon time machines in our pockets, quietly remaking the world while the rest of us experience it one second at a time.
A traditional computer CPU is a specialist. It is designed to be the fastest possible general-purpose thinker, assuming that memory, graphics, networking and storage live elsewhere on the motherboard. It excels at complex, sequential tasks, heavy branching logic and high peak performance, but it depends on external components—RAM chips, a discrete GPU, chipset controllers, Wi-Fi cards—each with their own power budgets and communication delays. This modular design favours flexibility and upgradeability, but it costs energy and physical space. Performance comes first; efficiency is negotiated later.
A smartphone SoC, by contrast, is an entire computer compressed into a single piece of silicon. Alongside CPU cores sit the GPU, neural processing unit, image signal processor, video encoders, memory controllers, security enclaves and cellular radios, all sharing power and memory on the same die. This radical integration minimises distance: data moves microns instead of centimetres, saving both time and energy. The trade-off is that each component is narrower in scope and ruthlessly optimised for its task. Rather than one very fast brain, an SoC uses many smaller, specialised minds working in parallel, coordinated by software that decides which unit should wake, work and sleep. The biggest difference, then, is economic rather than technical. A PC CPU is built to maximise performance per second; an SoC is built to maximise performance per joule. Laptops and desktops can afford fans, large batteries and wall sockets. Phones cannot. SoCs therefore sacrifice peak speed for integration, specialisation and efficiency—allowing a device that fits in a pocket to render graphics, recognise faces, process photos and maintain a global radio link all day on a single charge. One is a powerful engine waiting for a vehicle; the other is an entire city engineered to run on a sip of fuel.
A modern smartphone system-on-a-chip (SoC) is an affront not just to intuition, but to scale itself. What looks like a single black rectangle buried beneath glass is, in truth, an entire civilisation compressed into a few square millimetres. If you enlarged a phone SoC two million times and dropped it into a city, you would not see one building but many: a dense federation of specialised districts sharing power, memory and time. A CPU quarter handles general reasoning, a GPU renders images and motion, an NPU recognises faces and voices, an ISP turns raw photons into photographs, and radios whisper constantly to cell towers and satellites. What once required entire rooms of dedicated machines now cohabits inside something thinner than a fingernail clipping, drawing power from a battery smaller than a deck of cards.
The SoC’s genius lies in orchestration rather than brute force. Each block is ruthlessly specialised, built to do one narrow job with maximal efficiency. The CPU remains the bureaucrat, good at handling instructions and exceptions; the GPU is the industrial workforce, moving vast quantities of simple mathematics in parallel; the NPU is a pattern savant, trained to recognise faces, speech and intent without understanding any of it. Between them sits a shared memory fabric, shuttling data at speeds that make geography almost irrelevant. Information flows not because any part “knows” what it is doing, but because the architecture ensures it arrives at exactly the right place, at exactly the right moment. Intelligence, such as it is, emerges from scheduling, arbitration and bandwidth—an economy of silicon where wasted motion is the cardinal sin.
Slow time enough and the absurdity becomes clearer still. Each district pulses to its own clock, ticking hundreds of millions or billions of times per second, coordinated so tightly that a missed beat would mean a dropped call, a blurred photo, a stuttering game. To make an SoC operate at human tempo, time must be dilated until a single real-world second stretches into millennia. In that frozen landscape, radios negotiate with orbiting satellites, neural units classify faces, and graphics engines paint entire worlds—simultaneously. This is why phones feel alive. Not because they think, but because their silicon societies operate in a realm where time is cheap, error is forbidden, and parallel effort is essentially free. A smartphone SoC is not a chip so much as a city-state: compact, specialised, and astonishingly productive, quietly running civilisation from the palm of your hand.
Beware malware: how to protect the internet from malicious attacks
Few inventions in history have been as important for human civilisation and as poorly understood as the internet. It developed not as a centrally planned system, but as a patchwork of devices and networks connected by makeshift interfaces. Decentralisation makes it possible to run such a complex system. But every so often comes a chilling reminder that the whole edifice is uncomfortably precarious. On March 29th a lone security researcher announced that he had discovered, largely by chance, a secret backdoor in XZ Utils.
This obscure but vital piece of software is incorporated into the Linux operating systems that control the world’s internet servers. Had the backdoor not been spotted in time, everything from critical national infrastructure to the website hosting your cat pictures would have been vulnerable. The backdoor was implanted by an anonymous contributor who had won the trust of other coders by making helpful contributions for over two years. That patience and diligence bear the fingerprints of a state intelligence agency. Such large-scale “supply chain” attacks—which target not individual devices or networks, but the underlying software and hardware that they rely on—are becoming more frequent. In 2019-20 the SVR, Russia’s foreign-intelligence agency, penetrated American-government networks by compromising a network-management platform called SolarWinds Orion. More recently Chinese state hackers modified the firmware of Cisco routers to gain access to economic, commercial and military targets in America and Japan.
The internet is inherently vulnerable to schemes like the XZ Utils backdoor. Like so much else that it relies on, this program is open-source—which means that its code is publicly available; rather like Wikipedia, changes to it can be suggested by anyone. The people who maintain open-source code often do so in their spare time. A headline from 2014, after the uncovering of a catastrophic vulnerability in OpenSSL—a tool widely used for secure communication, maintained on a budget of just $2,000—captured the absurdity of the situation: “The Internet Is Being Protected By Two Guys Named Steve.” It is tempting to assume that the solution lies in establishing central control, either by states or companies.
In fact, history suggests that closed-source software is no more secure than is the open-source type. Only this week America’s Cyber Safety Review Board, a federal body, rebuked Microsoft for woeful security standards that allowed Russia to steal a signing key—“the cryptographic equivalent of crown jewels for any cloud service provider”. This gave it sweeping access to data. By comparison, open-source software holds many advantages because it allows for collective scrutiny and accountability.
The way forward therefore is to make the most of open-source, while easing the huge burden it places on a small number of unpaid, often harried individuals. Technology can help, too. Let’s Encrypt, a non-profit, has made the internet safer over the past decade by using clever software to make it simple to encrypt users’ connections to websites. More advanced artificial intelligence might eventually be able to spot anomalies in millions of lines of code at a stroke. Other fixes are regulatory. America’s cyber strategy, published last year, makes clear that the responsibility for failures should lie not with open-source developers but with “the stakeholders most capable of taking action to prevent bad outcomes”.
In practice that means governments and tech giants, both of which benefit enormously from free software libraries. Both should expand funding for and co-operation with non-profit institutions, like the Open Source Initiative and the Linux Foundation, which support the open-source ecosystem. The New Responsibility Foundation, a German think-tank, suggests that governments might, for example, allow employees to contribute to open-source software in their spare time and ease laws that criminalise “white hat” or ethical hacking. They should act quickly. The XZ Utils backdoor is thought to be the first publicly discovered supply-chain attack against a crucial piece of open-source software. But that does not mean it was the first attempt. Nor is it likely to be the last.■
Users of the internet can ignore its physical underpinnings but for technologies like artificial intelligence and the metaverse to work, others need to pay attention
In 1973 Bob Metcalfe, a researcher for Xerox at its Palo Alto Research Center, helped think up a way for the company’s computers to send information to each other via co-axial cables. He called this concept Ethernet after the medium by which, in 19th-century physics, electromagnetic forces were thought to be transmitted. Ethernet would become a cornerstone of the internet.
Despite his role in its foundations, Dr Metcalfe later doubted the hardiness of the internet as it became a global phenomenon. In late 1995 he noticed that a quarter of internet traffic was getting lost on its way, and that the system did not seem to be responding well to that volume of loss. He predicted that the whole shebang would “go spectacularly supernova and, in 1996, catastrophically collapse”. The collapse never happened, and Dr Metcalfe literally ate his words. At a conference in California, he produced a print-out of his prediction, pureed it in a blender and slurped it up with a spoon. “I learned my lesson,” Dr Metcalfe says now. “The internet is more robust than I had estimated.”
In its more than 40 years the internet as a whole has never completely stopped working. Parts of it break all the time, but resilience was built into the internet from day one. It is a decentralised, distributed network of billions of computers and billions of routers, connected to each other by perhaps billions of kilometres of cables. The network works seamlessly for end-users because of layers of software above this hardware that manage how the computers communicate, building in multiple redundancies and leaving no single point of failure. This power of abstraction—the ability to create, transmit and consume digital artefacts without needing to think about the physical realities behind them—is the secret sauce of the internet. And, indeed, of all computer science.
Abstraction is also the key to why Dr Metcalfe’s prediction ended up proving wrong. To see why, one has to grasp the internet’s layered structure. Some engineers think of the internet as having five layers (though others say there are four or seven depending on whether certain functions get layers of their own). At the bottom is the most physical of layers, where photons and electrical signals whizz from one server to another via routers and cables. Just above the cables are local-network protocols like Ethernet, Dr Metcalfe’s contribution, which allow computers and other devices near each other to interpret this traffic as groups of ones and zeros.
Above the cables and local-network protocols are two communications layers, “transmission control protocol” and “internet protocol” (TCP/IP), which enable computers to interpret messages as “packets”: short strings of data with a tag at one end which describes their destination. TCP/IP interacts with Ethernet but need not know about the cables at the very bottom. Sitting above TCP/IP is the application layer of software and language that users will begin to find more familiar, like “HTTP” (as seen on the world wide web). That allows webby stuff to interact with TCP/IP without worrying about Ethernet, cables and the like.
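The nesting of layers can be sketched as a toy example of encapsulation: each layer wraps the payload in its own header and knows nothing about the layers beneath it. The headers here are drastically simplified inventions of ours; real TCP, IP and Ethernet headers carry many more fields.

```python
# Toy encapsulation: application -> transport -> internet -> link.

def http_layer(body: bytes) -> bytes:            # application (HTTP)
    return b"GET / HTTP/1.1\r\n\r\n" + body

def tcp_layer(segment: bytes, port: int) -> bytes:   # transport (TCP-ish)
    return port.to_bytes(2, "big") + segment

def ip_layer(packet: bytes, dst: str) -> bytes:      # internet (IP-ish)
    return bytes(int(x) for x in dst.split(".")) + packet

def ethernet_layer(frame: bytes, mac: bytes) -> bytes:  # link (Ethernet-ish)
    return mac + frame

# Each call only knows about the layer directly below it:
wire = ethernet_layer(
    ip_layer(
        tcp_layer(http_layer(b""), port=80),
        dst="93.184.216.34"),
    mac=bytes(6))

# At the receiver, each layer strips only its own header, in reverse.
assert wire[6:10] == bytes([93, 184, 216, 34])  # IP sees only its own header
```

The HTTP function never mentions ports, addresses or cables; swap Ethernet for Wi-Fi, or copper for fibre, and nothing above the link layer needs to change.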
These levels of abstraction made the internet flexible and allowed it to scale beyond what many—including Dr Metcalfe—imagined. Each intermediate layer is designed to manage disruptions below and to present a clean image above. A well-designed layered system like the internet dampens chaos caused by errors, rather than spiralling out of control with them. And it didn’t hurt that, all the while, the physical foundation itself was strengthening. Optical fibre became increasingly available throughout the 1990s, which increased bandwidth to send more packets faster, losing fewer of them. The problem Dr Metcalfe was worried about got resolved without the rest of the internet really noticing. And as applications became more data-intensive, the plumbing below continued to hold up admirably.
The internet’s seemingly limitless adaptability has been enabled by layers of abstraction
To take an example, originally the internet was designed to carry text—a restricted set of 128 different characters—at a rate of 50 kilobits per second. Now video makes up more than 65% of traffic, travelling at hundreds of megabits per second, without gumming up the pipes. Changing web protocols from HTTP to the more secure HTTPS did not affect lower layers. As copper wire is upgraded to fibre-optic cable, applications do not have to change. The internet’s seemingly limitless adaptability has been enabled by those layers of abstraction between the user and the cables.
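That restricted set of 128 characters is ASCII, which fits in seven bits. A back-of-the-envelope comparison makes the scale of the change concrete; the video bit-rate below is an assumed ballpark, not a figure from the text.

```python
# ASCII: 7 bits are enough for 128 characters.
assert 2 ** 7 == 128

text_link_bps = 50_000            # the original 50 kbit/s text link
chars_per_sec = text_link_bps // 7
print(chars_per_sec)              # about 7,000 characters per second

video_bps = 300_000_000           # assumed: a few hundred Mbit/s of video
print(video_bps / text_link_bps)  # thousands of times the original capacity
```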
But Dr Metcalfe was not entirely wrong. The benefits of abstraction are still ultimately limited by infrastructure. In its early days Google was able to beat its competitors in part because it kept things simple. Others tried loading huge pages with lots of adverts. But they misjudged how much modems could handle at a reasonable speed. Since no one wants to wait for a web page to load, you now “google” things rather than “AltaVista-ing” them.
AltaVista learned the hard way that abstraction comes at a cost: it can obscure the frailties of hardware. Tech visionaries of today should take notice. Their most ambitious schemes will not work without the appropriate infrastructure to deliver them. From autonomous cars to augmented reality, from artificial intelligence (AI) to the metaverse, decisions at the physical layer constrain or expand what is digitally possible. Underneath all the layers of abstraction, the physical infrastructure of the internet is the foundation of the digital future. Without it, the internet is just an idea.
This special report will demystify the physical building blocks of the internet in order to explain how they constrain what is possible in the abstractions which sit on top of them. It will explore what about the physical layer must change for the internet to remain sustainable—in the physical sense, but also environmentally—as the internet’s uses multiply far beyond its original remit.
A good place to start would be to explain how this article reached your screen. Each digital article starts somewhere in the “cloud”. To users this is the infinite attic where they toss their digital stuff—articles, photos, videos. But the cloud is actually composed of tens of millions of computers connected to the internet.
Your click on a mouse or tap on a screen created packets that were turned into signals which travelled tens or even thousands of kilometres through metal, glass and air to a machine in a data centre.
Depending on where you are in the world, the data centre that your article will have come from will be different. This is because The Economist, along with most content providers on the internet, gets to users via something called a content-delivery network (CDN). This stores ready-to-read articles in data centres across the world, rather than having our main servers in northern Virginia put all the components together every time. This spreads out the load so that the main servers do not get overwhelmed. And it helps an article get to your screen faster because memory devices with the data needed are physically located much closer to you.
This means that when your correspondent just clicked on an Economist headline while on her laptop, it came from a data centre in London, made a short trip through fibre-optic cable and then, for the “last mile”, perhaps by way of old-fashioned copper wiring until arriving at a cable box and Wi-Fi router in her flat. An instant later, packets of data reassembled on her laptop in front of her eyes, a digital article rendered on a digital screen.
If your correspondent had been the very first person in a region to ask for the article, the trip would have been slower, as if over the primordial internet of decades ago, because a cached copy would not yet have been available at a data centre nearby. Instead her request would have travelled through thin strands of glass that lie at the bottom of the Atlantic Ocean, to a data centre in northern Virginia, and back again. These fibre-optic cables form the backbone of the physical internet. It is through them that nearly all intercontinental internet traffic flows.
The internet relies on these cables, but not on any single cable; it relies on data centres, but not any single one. Its distributed nature and its abstractions make the internet difficult to pin down. But not so for the tech giants. They are vertically integrating the internet: laying cables, building data centres, providing cloud services and ai. As the internet becomes more powerful, it is becoming crucial to grasp both its physical and corporate composition. Only by peeling back the layers of abstraction can one lay bare the internet’s foundations and understand its future.■
Feb 3rd 2024
Advances in physical storage and retrieval made the cloud possible but more progress is needed to sustain it
On September 14th 1956 IBM announced the first commercial computer to use a magnetic hard disk for storage. Weighing in at about 1,000 kilograms, the 305 RAMAC (Random Access Method of Accounting and Control) was the world’s most expensive jukebox. It stored 4.4 megabytes on 50 double-sided disks, each one measuring two feet in diameter and spinning 1,200 times a minute. Two access arms located and retrieved information in an average time of six-tenths of a second. Companies could lease the machine for $3,200 per month—roughly equivalent to paying $100m annually for a gigabyte of storage today.
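A quick check of that comparison. The inflation multiplier from 1956 dollars to today's is a rough assumption of ours; the lease price and capacity come from the figures above.

```python
# Cost per gigabyte-year of the 305 RAMAC, then adjusted for inflation.

LEASE_PER_MONTH = 3_200          # dollars per month, 1956
CAPACITY_GB = 4.4 / 1024         # 4.4 megabytes expressed in gigabytes

cost_per_gb_year_1956 = LEASE_PER_MONTH * 12 / CAPACITY_GB
print(f"${cost_per_gb_year_1956:,.0f} per GB-year in 1956 dollars")

INFLATION = 11                   # assumed multiplier to today's dollars
print(f"~${cost_per_gb_year_1956 * INFLATION / 1e6:,.0f}m per GB-year today")
```

About $9m per gigabyte-year in 1956 dollars, which inflation-adjusts to roughly the $100m in the text.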
Almost 70 years later, a gigabyte of storage costs pennies. Businesses and consumers can retrieve information much faster, from anywhere in the world, than they could have from a 305 RAMAC in the same room. What is more, they can work with this stored data where it is stored, rather than having to schlep it around. That is because their bytes are stored not in one jukebox, but in a great many of them: sliced up, replicated and distributed over a vast collection of computers and storage devices in massive data centres scattered across the world. In a word, the cloud.
The cloud is an abstraction of everything one could do on a 305 RAMAC and more. It endeavours to separate the actions of storing, retrieving and computing on data from the physical constraints of doing so. The concept intentionally obfuscates (clouds, one might say) the user’s ability to see the existence of hardware. To users, the cloud is a big virtual drawer or backpack into which you can put your digital stuff for safe-keeping, and later retrieve it to work on (or play with) anywhere at any time. It does not matter to you where or how—or indeed in how many pieces divided among various hardware devices strewn across the planet—your data is kept; you pay to not have to worry about it.
But to cloud providers the cloud is profoundly physical. They must build and maintain the physical components of the cloud and the illusion that goes with it, keeping up as the world produces more data that needs storing, sorting and crunching. The quantities of data being created are ever growing too. In 2023 the world generated around 123 zettabytes (that is, 123 trillion gigabytes) of data, according to International Data Corporation, a market-research firm. Picture a tower of DVDs growing more than 1km higher every second until, after a year, it reaches more than halfway to Mars. This data must be stored in different ways for different purposes, from spreadsheets that need to be available instantly, as on a bookshelf, to archival material that can be put in an attic. How is it possible to do all this in an orderly, easily retrievable way?
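The tower image checks out under standard assumptions about the discs (single-layer DVDs of 4.7GB capacity and 1.2mm thickness, our figures rather than the text's):

```python
# Sanity-checking the DVD-tower comparison for 123 zettabytes.

DATA_BYTES = 123e21              # data generated in 2023
DVD_BYTES = 4.7e9                # assumed: single-layer DVD capacity
DVD_THICKNESS_M = 1.2e-3         # assumed: disc thickness in metres
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

discs = DATA_BYTES / DVD_BYTES
height_km = discs * DVD_THICKNESS_M / 1000
print(f"tower height: {height_km / 1e6:.0f} million km")

growth_m_per_s = discs * DVD_THICKNESS_M / SECONDS_PER_YEAR
print(f"growth: about {growth_m_per_s:,.0f} m per second")  # about 1km/s
```

Roughly 31 million kilometres tall, against a closest Earth-Mars approach of some 55 million: a little over halfway, growing just about a kilometre each second.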
For a start, it helps to recognise the technical leaps in storage that have made the cloud possible. For each type of data and computational task there are different kinds of physical storage with trade-offs between cost, durability and speed of access. Much like the layers of the internet, the cloud needs these multiple layers of storage to be flexible enough to adapt to any kind of future use.
Inside an unassuming building in Didcot, England, in the Scientific Computing Department (SCD) at the Rutherford Appleton Laboratory, one of Britain’s national scientific-research labs, sit Asterix and Obelix, two stewards of massive quantities of data. They are robotically managed tape libraries—respectively the largest and second-largest in Europe. Together Asterix and Obelix store and keep organised the deluge of scientific data that comes in from particle-physics experiments at the Large Hadron Collider, at CERN, along with various other sorts of climate and astronomy research. The data produced by all this research has scaled up by orders of magnitude, says Tom Griffin, SCD’s director, which means they have had to switch from scientists coming in with laptops and USB sticks to creating a cloud of their own.
Asterix and Obelix form a sizeable chunk of the lab’s self-contained cloud (its computing power is conveniently located in the same room). Together the two can store 440,000 terabytes of data—equivalent to a million copies of the three “Lord of the Rings” films, extended edition, in 4K resolution. Each is made up of a row of cabinets packed with tape cartridges; if all the cartridges were unspooled, the tape would stretch from Athens to Sydney. When a scientist requests data from an experiment, one of several robots zooms horizontally on a set of rails to find the right cabinet, and vertically on another set of rails to find the right tape. It then removes the tape and scans through the reel in order to find the requested information. The whole process can take up to a minute.
Magnetic tape, similar to that used in old audio cassette tapes, might seem like an odd choice for storing advanced scientific research. But modern tape is incredibly cheap and dense (its data density has increased by an average of 34% annually for decades). This has been made possible by reducing the size of the magnetic particles—called “grains”—in which information is stored and by packing them more closely together. A single cartridge, maybe the size of two side-by-side audio cassettes, can hold 40 terabytes of data. That equates to almost 1m 305 RAMACs. Plus, it is durable and requires little energy to maintain. These qualities make tape the storage medium of choice not only for this scientific data, but also for big chunks of the cloud at Amazon, Google and Microsoft.
But if you are not a scientist at Didcot—or if you are, but you are taking a break to scroll your recent group chats and Instagram posts on your phone—you will want your data from the cloud much more quickly than you can get it from tape. Flash memory, in common use on laptops and phones, is best for when data needs to be frequently looked up or modified, like recent photos. Solid-state drives save data by trapping or releasing electrons in a grid of flash-memory cells. Retrieving the data is as simple as checking for the presence of electrons in each cell, and involves no moving mechanical parts; it takes about one-tenth of a millisecond, though if it is in the cloud instead of on your phone, add a few dozen milliseconds for delivery from the data centre. The data remains even when the power is turned off, though memory will eventually degrade as electrons leak out of the cells.
As new photos you take go to a data centre, your older ones get demoted from flash to old-fashioned hard-disk drives spread across multiple data centres, most likely including some in the country, or at least on the continent, where you live. These read and write data mechanically onto a spinning magnetic disk, not dissimilar from the 305 RAMAC, and are more than five times cheaper per gigabyte of storage than flash (though that gap is closing). Retrieval takes a sloth-like 5-10 milliseconds. Then years-old stuff that you forgot about might get further relegated, from disk drives to magnetic tape like they have at the Didcot lab.
Even on the side of the cloud provider, the exact physical device on which data is stored is abstracted away. One way that this is often done is called RAID (redundant array of independent disks). This takes a bunch of storage hardware devices and treats them as one virtualised storage shed. Some versions of RAID split up a photo into multiple parts so that no single piece of hardware has all of it, but rather several storage devices have slightly overlapping fragments. Even if two pieces of hardware break (and hardware failures happen all the time), the photo is still recoverable.
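A minimal sketch in the spirit of parity-based RAID (RAID 5, say): data is striped across several disks with one XOR parity block, so any single failed disk can be rebuilt from the survivors. This is an illustration of the principle, not any vendor's implementation, and it tolerates only one failure; schemes that survive two, as the text describes, use a second, differently computed parity block.

```python
# Striping with XOR parity: lose any one "disk" and rebuild it.

def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def stripe(data: bytes, n_data_disks: int):
    """Split data into equal blocks and append one XOR parity block."""
    size = -(-len(data) // n_data_disks)            # ceiling division
    data = data.ljust(size * n_data_disks, b"\0")   # pad the final block
    blocks = [data[i*size:(i+1)*size] for i in range(n_data_disks)]
    parity = blocks[0]
    for b in blocks[1:]:
        parity = xor_blocks(parity, b)
    return blocks + [parity]

def rebuild(blocks, lost: int):
    """Reconstruct one missing block by XOR-ing all the survivors."""
    survivors = [b for i, b in enumerate(blocks) if i != lost]
    out = survivors[0]
    for b in survivors[1:]:
        out = xor_blocks(out, b)
    return out

disks = stripe(b"a holiday photo", 3)
assert rebuild(disks, lost=1) == disks[1]  # disk 1 dies; the photo survives
```

Because XOR is its own inverse, the parity block lets any single missing piece be recomputed from the rest, which is why no one device needs to hold the whole photo.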
The cloud is also redundant in another way. Each piece of data will be stored in at least three separate locations. This means that were a hurricane or tornado or wildfire to destroy one of the data centres that had a copy of your photo, it would have two copies left to fall back on. This redundancy helps make cloud storage reliable. It also means that most of the time, millions of hard-disk drives are spinning on standby, just in case.
Still, companies are working on making the infrastructure of the cloud more robust. Tape, in particular, has its disadvantages as a long-term storage medium. It must be kept within a certain range of temperatures and humidities, and away from strong magnetic fields, which could erase the information. And it requires replacing every decade or two. So the hunt is on for something that takes up less room, lasts longer and requires less maintenance.
One promising medium is glass. A fast and precise laser etches tiny dots in multiple layers within platters of glass 75mm square and 2mm thick. Information is stored in the length, width, depth, size and orientation of each dot. Encoding information in glass in this way is the modern equivalent of etching in stone, says Peter Kazansky, one of the inventors of the technology, based at the University of Southampton in Britain. If you fry, boil, scratch or even microwave glass slides, you can still read the data.
Researchers at Microsoft are harnessing this tech to build a cloud out of glass. They increased capacity so that each slide can hold just over 75 gigabytes, and used machine-learning to improve reading speed. They claim their slides will last for 10,000 years. Microsoft has developed a system (much like the tape robots) that can handle thousands, or even millions, of these slides.
Achieving this kind of scale, without the need to supply power to storage shelves or to replace the storage devices themselves, is necessary to build a truly durable foundation for the cloud. Necessary, but not sufficient. For the cloud is not just a storage shed. Its users are demanding that it do a lot more computing work, and more quickly, than ever before.■