Beware Malware: This is how to protect the internet from malicious attacks
Few inventions in history have been as important for human civilisation and as poorly understood as the internet. It developed not as a centrally planned system, but as a patchwork of devices and networks connected by makeshift interfaces. Decentralisation makes it possible to run such a complex system. But every so often comes a chilling reminder that the whole edifice is uncomfortably precarious. On March 29th a lone security researcher announced that he had discovered, largely by chance, a secret backdoor in XZ Utils.
This obscure but vital piece of software is incorporated into the Linux operating systems that control the world’s internet servers. Had the backdoor not been spotted in time, everything from critical national infrastructure to the website hosting your cat pictures would have been vulnerable. The backdoor was implanted by an anonymous contributor who had won the trust of other coders by making helpful contributions for over two years. That patience and diligence bears the fingerprints of a state intelligence agency. Such large-scale “supply chain” attacks—which target not individual devices or networks, but the underlying software and hardware that they rely on—are becoming more frequent. In 2019-20 the SVR, Russia’s foreign-intelligence agency, penetrated American-government networks by compromising a network-management platform called SolarWinds Orion. More recently Chinese state hackers modified the firmware of Cisco routers to gain access to economic, commercial and military targets in America and Japan.
The internet is inherently vulnerable to schemes like the XZ Utils backdoor. Like so much else that it relies on, this program is open-source, meaning its code is publicly available and, rather like Wikipedia, changes to it can be suggested by anyone. The people who maintain open-source code often do so in their spare time. A headline from 2014, written after the discovery of a catastrophic vulnerability in OpenSSL, a widely used secure-communication tool that ran on a budget of just $2,000, captured the absurdity of the situation: “The Internet Is Being Protected By Two Guys Named Steve.” It is tempting to assume that the solution lies in establishing central control, either by states or companies.
In fact, history suggests that closed-source software is no more secure than is the open-source type. Only this week America’s Cyber Safety Review Board, a federal body, rebuked Microsoft for woeful security standards that allowed Russia to steal a signing key—“the cryptographic equivalent of crown jewels for any cloud service provider”. This gave it sweeping access to data. By comparison, open-source software holds many advantages because it allows for collective scrutiny and accountability.
The way forward therefore is to make the most of open-source, while easing the huge burden it places on a small number of unpaid, often harried individuals. Technology can help, too. Let’s Encrypt, a non-profit, has made the internet safer over the past decade by using clever software to make it simple to encrypt users’ connections to websites. More advanced artificial intelligence might eventually be able to spot anomalies in millions of lines of code at a stroke. Other fixes are regulatory. America’s cyber strategy, published last year, makes clear that the responsibility for failures should lie not with open-source developers but “the stakeholders most capable of taking action to prevent bad outcomes”.
In practice that means governments and tech giants, both of which benefit enormously from free software libraries. Both should expand funding for and co-operation with non-profit institutions, like the Open Source Initiative and the Linux Foundation, which support the open-source ecosystem. The New Responsibility Foundation, a German think-tank, suggests that governments might, for example, allow employees to contribute to open-source software in their spare time and ease laws that criminalise “white hat” or ethical hacking. They should act quickly. The XZ Utils backdoor is thought to be the first publicly discovered supply-chain attack against a crucial piece of open-source software. But that does not mean it was the first attempt. Nor is it likely to be the last.■
Users of the internet can ignore its physical underpinnings but for technologies like artificial intelligence and the metaverse to work, others need to pay attention
In 1973 Bob Metcalfe, a researcher at Xerox’s Palo Alto Research Center, helped think up a way for the company’s computers to send information to each other via co-axial cables. He called this concept Ethernet, after the medium by which, in 19th-century physics, electromagnetic forces were thought to be transmitted. Ethernet would become a cornerstone of the internet.
Despite his role in its foundations, Dr Metcalfe later doubted the hardiness of the internet as it became a global phenomenon. In late 1995 he noticed that a quarter of internet traffic was getting lost on its way, and that the system did not seem to be responding well to that volume of loss. He predicted that the whole shebang would “go spectacularly supernova and, in 1996, catastrophically collapse”. The collapse never happened, and Dr Metcalfe literally ate his words. At a conference in California, he produced a print-out of his prediction, pureed it in a blender and slurped it up with a spoon. “I learned my lesson,” Dr Metcalfe says now. “The internet is more robust than I had estimated.”
In its more than 40 years the internet as a whole has never completely stopped working. Parts of it break all the time, but resilience was built into the internet from day one. It is a decentralised, distributed network of billions of computers and billions of routers, connected to each other by perhaps billions of kilometres of cables. The network works seamlessly for end-users because of layers of software above this hardware that manage how the computers communicate, building in multiple redundancies and leaving no single point of failure. This power of abstraction—the ability to create, transmit and consume digital artefacts without needing to think about the physical realities behind them—is the secret sauce of the internet. And, indeed, of all computer science.
Abstraction is also the key to why Dr Metcalfe’s prediction ended up proving wrong. To see why, one has to grasp the internet’s layered structure. Some engineers think of the internet as having five layers (though others say there are four or seven depending on whether certain functions get layers of their own). At the bottom is the most physical of layers, where photons and electrical signals whizz from one server to another via routers and cables. Just above the cables are local-network protocols like Ethernet, Dr Metcalfe’s contribution, which allow computers and other devices near each other to interpret this traffic as groups of ones and zeros.
Above the cables and local-network protocols are two communications layers, “transmission control protocol” and “internet protocol” (TCP/IP), which enable computers to interpret messages as “packets”: short strings of data with a tag at one end which describes their destination. TCP/IP interacts with Ethernet but need not know about the cables at the very bottom. Sitting above TCP/IP is the application layer of software and language that users will begin to find more familiar, like “HTTP” (as seen on the world wide web). That allows webby stuff to interact with TCP/IP without worrying about Ethernet, cables and the like.
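For readers who want to see the layering in action, the short Python sketch below (the language, and the placeholder host example.com, are choices made here purely for illustration) asks the operating system for a TCP connection and speaks HTTP over it. Nothing in it mentions Ethernet, routers or cables, because those layers are invisible from above.

```python
# A minimal illustration of layering: the code speaks HTTP (the application
# layer) over a TCP connection supplied by the operating system (the TCP/IP
# layers). Which cables or radio waves carry the packets is not its concern.
# "example.com" is a placeholder host used purely for demonstration.
import socket

HOST = "example.com"   # the application needs only a name...
PORT = 80              # ...and a port; routing happens further down

with socket.create_connection((HOST, PORT)) as conn:
    request = (
        "GET / HTTP/1.1\r\n"
        f"Host: {HOST}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    conn.sendall(request.encode("ascii"))

    # Read the reply until the server closes the connection.
    response = b""
    while chunk := conn.recv(4096):
        response += chunk

print(response.split(b"\r\n", 1)[0].decode())  # e.g. "HTTP/1.1 200 OK"
```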
These levels of abstraction made the internet flexible and allowed it to scale beyond what many—including Dr Metcalfe—imagined. Each intermediate layer is designed to manage disruptions below and to present a clean image above. A well-designed layered system like the internet dampens chaos caused by errors, rather than spiralling out of control with them. And it did not hurt that, all the while, the physical foundation itself was strengthening. Optical fibre became increasingly available throughout the 1990s, which increased bandwidth to send more packets faster, losing fewer of them. The problem Dr Metcalfe was worried about got resolved without the rest of the internet really noticing. And as applications became more data-intensive, the plumbing below continued to hold up admirably.
To take an example, originally the internet was designed to carry text—a restricted set of 128 different characters—at a rate of 50 kilobits per second. Now video makes up more than 65% of traffic, travelling at hundreds of megabits per second, without gumming up the pipes. Changing web protocols from HTTP to the more secure HTTPS did not affect lower layers. As copper wire is upgraded to fibre-optic cable, applications do not have to change. The internet’s seemingly limitless adaptability has been enabled by those layers of abstraction between the user and the cables.
But Dr Metcalfe was not entirely wrong. The benefits of abstraction are still ultimately limited by infrastructure. In its early days Google was able to beat its competitors in part because it kept things simple. Others tried loading huge pages with lots of adverts, but misjudged how much modems could handle at a reasonable speed. Since no one wants to wait for a web page to load, you now “google” things rather than “AltaVista” them.
AltaVista learned the hard way that abstraction comes at a cost: it can obscure the frailties of hardware. Tech visionaries of today should take notice. Their most ambitious schemes will not work without the appropriate infrastructure to deliver them. From autonomous cars to augmented reality, from artificial intelligence (AI) to the metaverse, decisions at the physical layer constrain or expand what is digitally possible. Underneath all the layers of abstraction, the physical infrastructure of the internet is the foundation of the digital future. Without it, the internet is just an idea.
This special report will demystify the physical building blocks of the internet in order to explain how they constrain what is possible in the abstractions which sit on top of them. It will explore what about the physical layer must change for the internet to remain sustainable—in the physical sense, but also environmentally—as the internet’s uses multiply far beyond its original remit.
A good place to start would be to explain how this article reached your screen. Each digital article starts somewhere in the “cloud”. To users this is the infinite attic where they toss their digital stuff—articles, photos, videos. But the cloud is actually composed of tens of millions of computers connected to the internet.
Your click on a mouse or tap on a screen created packets that were turned into signals which travelled tens or thousands of kilometres through metal, glass and air to a machine in a data centre.
Depending on where you are in the world, the data centre that your article will have come from will be different. This is because The Economist, along with most content providers on the internet, gets to users via something called a content-delivery network (CDN). This stores ready-to-read articles in data centres across the world, rather than having our main servers in northern Virginia put all the components together every time. This spreads out the load so that the main servers do not get overwhelmed. And it helps an article get to your screen faster because memory devices with the data needed are physically located much closer to you.
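The logic of a CDN can be caricatured in a few lines of Python. The server names, paths and article text below are invented for illustration, not a description of The Economist’s actual infrastructure.

```python
# A toy sketch of the content-delivery idea: an edge data centre serves what
# it already holds and fetches the rest from the origin, keeping a copy for
# the next reader nearby. All names and content here are invented.
ORIGIN_SERVERS = {"/special-report/internet": "<html>...article...</html>"}

class EdgeCache:
    def __init__(self, city):
        self.city = city
        self.cache = {}                     # articles held at this edge

    def get(self, path):
        if path in self.cache:              # hit: a short hop to the reader
            return self.cache[path], f"served from {self.city} edge cache"
        # Miss: cross the backbone to the origin, then keep a local copy.
        article = ORIGIN_SERVERS[path]
        self.cache[path] = article
        return article, f"fetched from origin, now cached in {self.city}"

london = EdgeCache("London")
print(london.get("/special-report/internet")[1])  # first reader: slow path
print(london.get("/special-report/internet")[1])  # later readers: fast path
```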
This means that when your correspondent clicked on an Economist headline on her laptop, the article came from a data centre in London, made a short trip through fibre-optic cable and then travelled the “last mile”, perhaps by way of old-fashioned copper wiring, before arriving at a cable box and Wi-Fi router in her flat. An instant later, packets of data reassembled on her laptop in front of her eyes, a digital article rendered on a digital screen.
If your correspondent had been the very first person in a region to ask for the article, the trip would have been slower, as if over the primordial internet of decades ago, because a cached copy would not yet have been available at a data centre nearby. Instead her request would have travelled through thin strands of glass that lie at the bottom of the Atlantic Ocean, to a data centre in northern Virginia, and back again. These fibre-optic cables form the backbone of the physical internet. It is through them that nearly all intercontinental internet traffic flows.
The internet relies on these cables, but not on any single cable; it relies on data centres, but not any single one. Its distributed nature and its abstractions make the internet difficult to pin down. But not so for the tech giants. They are vertically integrating the internet: laying cables, building data centres, providing cloud services and AI. As the internet becomes more powerful, it is becoming crucial to grasp both its physical and corporate composition. Only by peeling back the layers of abstraction can one lay bare the internet’s foundations and understand its future.■
Advances in physical storage and retrieval made the cloud possible but more progress is needed to sustain it
On September 14th 1956 IBM announced the first commercial computer to use a magnetic hard disk for storage. Weighing in at about 1,000 kilograms, the 305 RAMAC (Random Access Method of Accounting and Control) was the world’s most expensive jukebox. It stored 4.4 megabytes on 50 double-sided disks, each one measuring two feet in diameter and spinning 1,200 times a minute. Two access arms located and retrieved information in an average time of six-tenths of a second. Companies could lease the machine for $3,200 per month—in today’s money, roughly equivalent to paying $100m a year for a gigabyte of storage.
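That comparison can be roughly checked on the back of an envelope. The sketch below assumes an inflation multiple of about 11 between 1956 and the mid-2020s, a figure that is not in the article and depends on the price index chosen; the lease price and capacity are those quoted above.

```python
# A rough check of the cost comparison. The inflation multiple is an assumed
# figure (about 11x between 1956 and the mid-2020s, depending on the index);
# the lease price and 4.4-megabyte capacity come from the text above.
lease_per_month_1956 = 3_200          # dollars
capacity_gb = 4.4 / 1_000             # 4.4 megabytes, in gigabytes
inflation_multiple = 11               # assumption, not from the article

annual_cost_today = lease_per_month_1956 * 12 * inflation_multiple
per_gb_per_year = annual_cost_today / capacity_gb
print(f"about ${per_gb_per_year / 1e6:.0f}m per gigabyte per year")  # ~$96m
```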
Almost 70 years later, a gigabyte of storage costs pennies. Businesses and consumers can retrieve information much faster, from anywhere in the world, than they could have from a 305 RAMAC in the same room. What is more, they can work with their data where it is stored, rather than having to schlep it around. That is because their bytes are stored not in one jukebox, but in a great many of them: sliced up, replicated and distributed over a vast collection of computers and storage devices in massive data centres scattered across the world. In a word, the cloud.
The cloud is an abstraction of everything one could do on a 305 RAMAC and more. It endeavours to separate the actions of storing, retrieving and computing on data from the physical constraints of doing so. The concept intentionally obscures (clouds, one might say) the hardware from the user’s view. To users, the cloud is a big virtual drawer or backpack into which you can put your digital stuff for safe-keeping, and later retrieve it to work on (or play with) anywhere at any time. It does not matter to you where or how—or indeed in how many pieces divided among various hardware devices strewn across the planet—your data is kept; you pay not to have to worry about it.
But to cloud providers the cloud is profoundly physical. They must build and maintain its physical components, and the illusion that goes with them, keeping up as the world produces ever more data that needs storing, sorting and crunching. In 2023 the world generated around 123 zettabytes (that is, 123 trillion gigabytes) of data, according to International Data Corporation, a market-research firm. Picture a tower of DVDs growing more than 1km higher every second until, after a year, it reaches more than halfway to Mars. This data must be stored in different ways for different purposes, from spreadsheets that need to be available instantly, as on a bookshelf, to archival material that can be put in an attic. How is it possible to do all this in an orderly, easily retrievable way?
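The image is easy to verify with a rough calculation. The sketch below assumes a standard single-layer DVD (4.7 gigabytes of capacity, 1.2mm thick), figures that do not appear in the article itself.

```python
# A back-of-the-envelope check of the DVD tower. The 123-zettabyte total is
# from the article; the capacity and thickness of a DVD (4.7 gigabytes,
# 1.2mm) are assumed standard figures.
data_gb = 123e21 / 1e9                 # 123 zettabytes, in gigabytes
dvds = data_gb / 4.7                   # discs needed to hold one year's data
tower_km = dvds * 1.2e-3 / 1_000       # height of the stack, in kilometres

seconds_per_year = 365 * 24 * 3600
print(f"{tower_km / seconds_per_year:.1f} km of growth per second")  # ~1.0
print(f"{tower_km / 1e6:.0f}m km tall after a year")  # ~31m km, more than
# half the distance to Mars at its closest approach (about 55m km)
```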
For a start, it helps to recognise the technical leaps in storage that have made the cloud possible. For each type of data and computational task there are different kinds of physical storage with trade-offs between cost, durability and speed of access. Much like the layers of the internet, the cloud needs these multiple layers of storage to be flexible enough to adapt to any kind of future use.
Inside an unassuming building in Didcot, England, in the Scientific Computing Department (SCD) at the Rutherford Appleton Laboratory, one of Britain’s national scientific-research labs, sit Asterix and Obelix, two stewards of massive quantities of data. They are robotically managed tape libraries—respectively the largest and second-largest in Europe. Together Asterix and Obelix store and keep organised the deluge of scientific data that comes in from particle-physics experiments at the Large Hadron Collider at CERN, along with various other sorts of climate and astronomy research. The data produced by all this research has scaled up by orders of magnitude, says Tom Griffin, SCD’s director, which means the lab has had to switch from scientists coming in with laptops and USB sticks to creating a cloud of its own.
Asterix and Obelix form a sizeable chunk of the lab’s self-contained cloud (its computing power is conveniently located in the same room). Together the two can store 440,000 terabytes of data—equivalent to a million copies of the three “Lord of the Rings” films, extended edition, in 4K resolution. Each is made up of a row of cabinets packed with tape cartridges; if all the cartridges were unspooled, the tape would stretch from Athens to Sydney. When a scientist requests data from an experiment, one of several robots zooms horizontally on a set of rails to find the right cabinet, and vertically on another set of rails to find the right tape. It then removes the tape and scans through the reel in order to find the requested information. The whole process can take up to a minute.
Magnetic tape, similar to that used in old audio cassettes, might seem like an odd choice for storing advanced scientific research. But modern tape is incredibly cheap and dense (its data density has increased by an average of 34% annually for decades). This has been made possible by reducing the size of the magnetic particles—called “grains”—in which information is stored and by packing them more closely together. A single cartridge, maybe the size of two side-by-side audio cassettes, can hold 40 terabytes of data. That equates to roughly 9m 305 RAMACs. Plus, it is durable and requires little energy to maintain. These qualities make tape the storage medium of choice not only for this scientific data, but also for big chunks of the cloud at Amazon, Google and Microsoft.
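Those figures compound into striking comparisons, as the quick sums below show; they use only numbers quoted in this article.

```python
import math

# How many 305 RAMACs fit on one modern tape cartridge, using the figures
# quoted above (40 terabytes per cartridge, 4.4 megabytes per RAMAC)?
cartridge_mb = 40 * 1_000_000                  # 40 terabytes, in megabytes
print(f"{cartridge_mb / 4.4:,.0f} RAMACs per cartridge")   # ~9.1 million

# And 34% annual growth in tape density implies a doubling roughly every...
print(f"{math.log(2) / math.log(1.34):.1f} years")          # ~2.4 years
```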
But if you are not a scientist at Didcot—or if you are, but you are taking a break to scroll your recent group chats and Instagram posts on your phone—you will want your data from the cloud much more quickly than you can get it from tape. Flash memory, in common use on laptops and phones, is best for when data needs to be frequently looked up or modified, like recent photos. Solid-state drives save data by trapping or releasing electrons in a grid of flash-memory cells. Retrieving the data is as simple as checking for the presence of electrons in each cell, and involves no moving mechanical parts; it takes about one-tenth of a millisecond, though if it is in the cloud instead of on your phone, add a few dozen milliseconds for delivery from the data centre. The data remains even when the power is turned off, though memory will eventually degrade as electrons leak out of the cells.
As new photos you take go to a data centre, your older ones get demoted from flash to old-fashioned hard-disk drives spread across multiple data centres, most likely including some in the country, or at least on the continent, where you live. These read and write data mechanically onto a spinning magnetic disk, not unlike the 305 RAMAC, and are more than five times cheaper per gigabyte of storage than flash (though that gap is closing). Retrieval takes a sloth-like 5-10 milliseconds. Then years-old stuff that you forgot about might get further relegated, from disk drives to magnetic tape like that at the Didcot lab.
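The demotion logic can be caricatured in a few lines of Python. The age thresholds below are invented for illustration and describe no cloud provider’s actual policy; the retrieval times echo those given in this article.

```python
# A simplified sketch of storage tiering: hot data on flash, cooler data on
# disk, cold data on tape. The age thresholds are invented; the retrieval
# times echo those given in the article.
RETRIEVAL_TIME = {
    "flash": "about 0.1 milliseconds",
    "disk":  "5-10 milliseconds",
    "tape":  "up to a minute",
}

def choose_tier(days_since_last_access):
    if days_since_last_access < 30:      # recent photos, live documents
        return "flash"
    if days_since_last_access < 365:     # older but occasionally revisited
        return "disk"
    return "tape"                        # the attic

for name, age_days in [("last week's photos", 7),
                       ("a holiday album", 200),
                       ("a forgotten backup", 2_000)]:
    tier = choose_tier(age_days)
    print(f"{name}: {tier} ({RETRIEVAL_TIME[tier]})")
```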
Even on the side of the cloud provider, the exact physical device on which data is stored is abstracted away. One way that this is often done is called RAID (redundant array of independent disks). This takes a bunch of storage hardware devices and treats them as one virtualised storage shed. Some versions of RAID split up a photo into multiple parts so that no single piece of hardware has all of it, but rather several storage devices have slightly overlapping fragments. Even if two pieces of hardware break (and hardware failures happen all the time), the photo is still recoverable.
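The underlying trick can be illustrated with the simplest version of the idea: a RAID-5-style parity stripe built with XOR, sketched below, which survives the loss of any one device. The schemes described above go further, adding extra parity or erasure coding so that even two failures are survivable; the code is illustrative rather than a description of any particular product.

```python
# A minimal illustration of parity, the idea behind RAID. This simple
# XOR-based (RAID-5-style) scheme survives the loss of any one device; the
# versions described above add extra parity or erasure coding to survive
# two failures. Purely illustrative, not any vendor's implementation.
from functools import reduce

def stripe_with_parity(data, n_disks):
    """Split data across n_disks stripes and append one XOR-parity stripe."""
    size = -(-len(data) // n_disks)                   # ceiling division
    data = data.ljust(size * n_disks, b"\0")          # pad to whole stripes
    stripes = [data[i * size:(i + 1) * size] for i in range(n_disks)]
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*stripes))
    return stripes + [parity]

def rebuild(stripes):
    """Recompute a single missing stripe by XOR-ing the survivors."""
    missing = stripes.index(None)
    survivors = [s for s in stripes if s is not None]
    stripes[missing] = bytes(reduce(lambda a, b: a ^ b, col)
                             for col in zip(*survivors))
    return stripes

disks = stripe_with_parity(b"a photo, in overlapping pieces", 3)
disks[1] = None                                       # one drive fails...
restored = rebuild(disks)
print(b"".join(restored[:3]).rstrip(b"\0"))           # ...the photo survives
```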
The cloud is also redundant in another way. Each piece of data will be stored in at least three separate locations. This means that were a hurricane, tornado or wildfire to destroy one of the data centres holding a copy of your photo, there would still be two copies to fall back on. This redundancy helps make cloud storage reliable. It also means that most of the time, millions of hard-disk drives are spinning on standby, just in case.
Still, companies are working on making the infrastructure of the cloud more robust. Tape, in particular, has its disadvantages as a long-term storage medium. It must be kept within a certain range of temperatures and humidities, and away from strong magnetic fields, which could erase the information. And it requires replacing every decade or two. So the hunt is on for something that takes up less room, lasts longer and requires less maintenance.
One promising medium is glass. A fast and precise laser etches tiny dots in multiple layers within platters of glass 75mm square and 2mm thick. Information is stored in the length, width, depth, size and orientation of each dot. Encoding information in glass in this way is the modern equivalent of etching in stone, says Peter Kazansky, one of the inventors of the technology, based at the University of Southampton in Britain. If you fry, boil, scratch or even microwave glass slides, you can still read the data.
Researchers at Microsoft are harnessing this technique to build a cloud out of glass. They have increased capacity so that each slide can hold just over 75 gigabytes, and used machine learning to improve reading speed. They claim their slides will last for 10,000 years. Microsoft has developed a system (much like the tape robots) that can handle thousands, or even millions, of these slides.
Achieving this kind of scale, without the need to supply power to storage shelves or to replace the storage devices themselves, is necessary to build a truly durable foundation for the cloud. Necessary, but not sufficient. For the cloud is not just a storage shed. Its users are demanding that it do a lot more computing work, and more quickly, than ever before.■