Chapter 1-7
Hacker Crackdown

To the average citizen, the idea of the telephone is represented by,
well, a TELEPHONE:  a device that you talk into.  To a telco
professional, however, the telephone itself is known, in lordly
fashion, as a "subset."  The "subset" in your house is a mere adjunct,
a distant nerve ending, of the central switching stations,
which are ranked in levels of hierarchy, up to the long-distance electronic
switching stations, which are some of the largest computers on earth.

Let us imagine that it is, say, 1925, before the
introduction of computers, when the phone system was
simpler and somewhat easier to grasp.  Let's further
imagine that you are Miss Leticia Luthor, a fictional
operator for Ma Bell in the New York City of the 1920s.

Basically, you, Miss Luthor, ARE the "switching system."
You are sitting in front of a large vertical switchboard,
known as a "cordboard," made of shiny wooden panels,
with ten thousand metal-rimmed holes punched in them,
known as jacks.  The engineers would have put more
holes into your switchboard, but ten thousand is
as many as you can reach without actually having
to get up out of your chair.

Each of these ten thousand holes has its own little electric lightbulb,
known as a "lamp," and its own neatly printed number code.

With the ease of long habit, you are scanning your board for lit-up bulbs.
This is what you do most of the time, so you are used to it.

A lamp lights up.  This means that the phone
at the end of that line has been taken off the hook.
Whenever a handset is taken off the hook, that closes a circuit
inside the phone which then signals the local office, i.e. you,
automatically.  There might be somebody calling, or then
again the phone might be simply off the hook, but this
does not matter to you yet.  The first thing you do
is record that number in your logbook, in your fine American
public-school handwriting.  This comes first, naturally,
since it is done for billing purposes.

You now take the plug of your answering cord, which goes
directly to your headset, and plug it into the lit-up hole.
"Operator," you announce.

In operator's classes, before taking this job, you have
been issued a large pamphlet full of canned operator's
responses for all kinds of contingencies, which you had
to memorize.  You have also been trained in a proper
non-regional, non-ethnic pronunciation and tone of voice.
You rarely have the occasion to make any spontaneous
remark to a customer, and in fact this is frowned upon
(except out on the rural lines where people have time
on their hands and get up to all kinds of mischief).

A tough-sounding user's voice at the end of the line
gives you a number.  Immediately, you write that number
down in your logbook, next to the caller's number,
which you just wrote earlier.  You then look and see if
the number this guy wants is in fact on your switchboard,
which it usually is, since most calls are local ones.
Long distance costs so much that people use it sparingly.

Only then do you pick up a calling-cord from a shelf
at the base of the switchboard.  This is a long elastic cord
mounted on a kind of reel so that it will zip back in when
you unplug it.  There are a lot of cords down there,
and when a bunch of them are out at once they look like
a nest of snakes.  Some of the girls think there are bugs
living in those cable-holes.  They're called "cable mites"
and are supposed to bite your hands and give you rashes.
You don't believe this, yourself.

Gripping the head of your calling-cord, you slip the tip
of it deftly into the sleeve of the jack for the called person.
Not all the way in, though.  You just touch it.  If you hear
a clicking sound, that means the line is busy and you can't
put the call through.  If the line is busy, you have to stick
the calling-cord into a "busy-tone jack," which will give
the guy a busy-tone.  This way you don't have to talk to him
yourself and absorb his natural human frustration.

But the line isn't busy.  So you pop the cord all the way in.
Relay circuits in your board make the distant phone ring,
and if somebody picks it up off the hook, then a phone
conversation starts.  You can hear this conversation
on your answering cord, until you unplug it.  In fact
you could listen to the whole conversation if you wanted,
but this is sternly frowned upon by management, and frankly,
when you've overheard one, you've pretty much heard 'em all.

You can tell how long the conversation lasts by the glow
of the calling-cord's lamp, down on the calling-cord's shelf.
When it's over, you unplug and the calling-cord zips back into place.

Having done this stuff a few hundred thousand times,
you become quite good at it.  In fact you're plugging,
and connecting, and disconnecting, ten, twenty, forty cords
at a time.  It's a manual handicraft, really, quite satisfying
in a way, rather like weaving on an upright loom.

Should a long-distance call come up, it would be different,
but not all that different.  Instead of connecting the call
through your own local switchboard, you have to go up the hierarchy,
onto the long-distance lines, known as "trunklines."
Depending on how far the call goes, it may have to work
its way through a whole series of operators, which can
take quite a while.  The caller doesn't wait on the line
while this complex process is negotiated across the country
by the gaggle of operators.  Instead, the caller hangs up,
and you call him back yourself when the call has finally
worked its way through.

After four or five years of this work, you get married,
and you have to quit your job, this being the natural order
of womanhood in the American 1920s.  The phone company
has to train somebody else--maybe two people, since
the phone system has grown somewhat in the meantime.
And this costs money.

In fact, to use any kind of human being as a switching
system is a very expensive proposition.  Eight thousand
Leticia Luthors would be bad enough, but a quarter of a
million of them is a military-scale proposition and makes
drastic measures in automation financially worthwhile.

Although the phone system continues to grow today,
the number of human beings employed by telcos has
been dropping steadily for years.  Phone "operators"
now deal with nothing but unusual contingencies,
all routine operations having been shrugged off onto machines.
Consequently, telephone operators are considerably less
machine-like nowadays, and have been known to have accents
and actual character in their voices.  When you do reach
a human operator today, that operator is rather more
"human" than operators were in Leticia's day--but on the other hand,
human beings in the phone system are much harder to reach
in the first place.

Over the first half of the twentieth century,
"electromechanical" switching systems of growing
complexity were cautiously introduced into the phone system.
In certain backwaters, some of these hybrid systems are still
in use.  But after 1965, the phone system began to go completely
electronic, and this is by far the dominant mode today.
Electromechanical systems have "crossbars," and "brushes,"
and other large moving mechanical parts, which, while faster
and cheaper than Leticia, are still slow, and tend to wear out
fairly quickly.

But fully electronic systems are inscribed on silicon chips,
and are lightning-fast, very cheap, and quite durable.
They are much cheaper to maintain than even the best
electromechanical systems, and they fit into half the space.
And with every year, the silicon chip grows smaller, faster,
and cheaper yet.  Best of all, automated electronics work
around the clock and don't have salaries or health insurance.

There are, however, quite serious drawbacks to the
use of computer-chips.  When they do break down, it is
a daunting challenge to figure out what the heck has gone
wrong with them.  A broken cordboard generally had
a problem in it big enough to see.  A broken chip has
invisible, microscopic faults.  And the faults in bad
software can be so subtle as to be practically theological.

If you want a mechanical system to do something new,
then you must travel to where it is, and pull pieces out of it,
and wire in new pieces.  This costs money.  However, if you want
a chip to do something new, all you have to do is change its software,
which is easy, fast and dirt-cheap.  You don't even have to see the chip
to change its program.  Even if you did see the chip, it wouldn't look
like much.  A chip with program X doesn't look one whit different from
a chip with program Y.

With the proper codes and sequences, and access to specialized phone-lines,
you can change electronic switching systems all over America from anywhere
you please.

And so can other people.  If they know how, and if they want to,
they can sneak into a microchip via the special phonelines and diddle with it,
leaving no physical trace at all.  If they broke into the operator's station
and held Leticia at gunpoint, that would be very obvious.  If they broke into
a telco building and went after an electromechanical switch with a toolbelt,
that would at least leave many traces.  But people can do all manner of amazing
things to computer switches just by typing on a keyboard, and keyboards are
everywhere today.  The extent of this vulnerability is deep, dark, broad,
almost mind-boggling, and yet this is a basic, primal fact of life about
any computer on a network.

Security experts over the past twenty years have insisted,
with growing urgency, that this basic vulnerability of computers
represents an entirely new level of risk, of unknown but obviously
dire potential to society.  And they are right.

An electronic switching station does pretty much
everything Leticia did, except in nanoseconds and
on a much larger scale.  Compared to Miss Luthor's
ten thousand jacks, even a primitive 1ESS switching computer,
of 1960s vintage, has 128,000 lines.  And the current AT&T
system of choice is the monstrous fifth-generation 5ESS.

An Electronic Switching Station can scan every line on its "board"
in a tenth of a second, and it does this over and over, tirelessly,
around the clock.  Instead of eyes, it uses "ferrod scanners"
to check the condition of local lines and trunks.  Instead of hands,
it has "signal distributors," "central pulse distributors,"
"magnetic latching relays," and "reed switches," which complete
and break the calls.  Instead of a brain, it has a "central processor."
Instead of an instruction manual, it has a program.  Instead of
a handwritten logbook for recording and billing calls,
it has magnetic tapes. And it never has to talk to anybody.
Everything a customer might want to say to it is said by punching
the direct-dial tone buttons on the subset.
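The scanning described above can be illustrated with a small C sketch. It is a hypothetical model, not 1ESS code: it assumes the off-hook state of each line is available as a single bit (1 for off the hook), packed 64 lines to a word, and the function name `scan_for_new_offhook` is invented here. Each pass compares the current bits with those from the previous pass, much as Leticia watched for lamps that had just lit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: one bit per line, 1 = off hook, 64 lines per word.
 * Compares this scan with the previous one and records the index of every
 * line that has just gone off hook (a newly "lit lamp").
 * Writes at most `cap` indices into `out`; returns the total number found,
 * which may exceed `cap`. */
size_t scan_for_new_offhook(const uint64_t *prev, const uint64_t *curr,
                            size_t nwords, size_t *out, size_t cap)
{
    size_t found = 0;
    for (size_t w = 0; w < nwords; w++) {
        /* Bits set now that were clear on the last pass. */
        uint64_t newly = curr[w] & ~prev[w];
        for (unsigned b = 0; b < 64 && newly != 0; b++) {
            uint64_t mask = (uint64_t)1 << b;
            if (newly & mask) {
                if (found < cap)
                    out[found] = w * 64 + b;
                found++;
                newly &= ~mask; /* clear it so the loop can stop early */
            }
        }
    }
    return found;
}
```

A real switch would repeat such a pass continuously, roughly every tenth of a second by the figure given above, and hand each new line index on to call-setup logic; that step is omitted from this sketch. Lines that have gone back on hook are simply ignored here.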

Although an Electronic Switching Station can't talk,
it does need an interface, some way to relate to its, er,
employers.  This interface is known as the "master control
center."  (This interface might be better known simply as
"the interface," since it doesn't actually "control" phone
calls directly.  However, a term like "Master Control
Center" is just the kind of rhetoric that telco maintenance
engineers--and hackers--find particularly satisfying.)

Using the master control center, a phone engineer can test
local and trunk lines for malfunctions.  He (rarely she)
can check various alarm displays, measure traffic on the lines,
examine the records of telephone usage and the charges for those calls,
and change the programming.

And, of course, anybody else who gets into the master control center
by remote control can also do these things, if he (rarely she)
has managed to figure them out, or, more likely, has somehow swiped
the knowledge from people who already know.

In 1989 and 1990, one RBOC, BellSouth,
which felt particularly troubled, spent a purported $1.2
million on computer security.  Some think it spent as
much as two million, if you count all the associated costs.
Two million dollars is still very little compared to the
great cost-saving utility of telephonic computer systems.

Unfortunately, computers are also stupid.
Unlike human beings, computers possess the truly
profound stupidity of the inanimate.

In the 1960s, in the first shocks of spreading computerization,
there was much easy talk about the stupidity of computers--
how they could "only follow the program" and were rigidly required
to do "only what they were told."  There has been rather less talk
about the stupidity of computers since they began to achieve
grandmaster status in chess tournaments, and to manifest
many other impressive forms of apparent cleverness.

Nevertheless, computers STILL are profoundly brittle and stupid;
they are simply vastly more subtle in their stupidity and brittleness.
The computers of the 1990s are much more reliable in their components
than earlier computer systems, but they are also called upon to do
far more complex things, under far more challenging conditions.

On a basic mathematical level, every single line of
a software program offers a chance for some possible screwup.
Software does not sit still when it works; it "runs,"
it interacts with itself and with its own inputs and outputs.
By analogy, it stretches like putty into millions of possible
shapes and conditions, so many shapes that they can never
all be successfully tested, not even in the lifespan of the universe.
Sometimes the putty snaps.

The stuff we call "software" is not like anything that human society
is used to thinking about.  Software is something like a machine,
and something like mathematics, and something like language, and
something like thought, and art, and information. . . .  But software
is not in fact any of those other things.  The protean quality
of software is one of the great sources of its fascination.
It also makes software very powerful, very subtle,
very unpredictable, and very risky.

Some software is bad and buggy.  Some is "robust,"
even "bulletproof."  The best software is that which has
been tested by thousands of users under thousands of
different conditions, over years.  It is then known as
"stable."  This does NOT mean that the software is
now flawless, free of bugs.  It generally means that there
are plenty of bugs in it, but the bugs are well-identified
and fairly well understood.

There is simply no way to assure that software is free
of flaws.  Though software is mathematical in nature,
it cannot be "proven" like a mathematical theorem;
software is more like language, with inherent ambiguities,
with different definitions, different assumptions,
different levels of meaning that can conflict.

Human beings can manage, more or less, with
human language because we can catch the gist of it.

Computers, despite years of effort in "artificial intelligence,"
have proven spectacularly bad in "catching the gist" of anything at all.
The tiniest bit of semantic grit may still bring the mightiest computer
tumbling down.  One of the most hazardous things you can do to a
computer program is try to improve it--to try to make it safer.
Software "patches" represent new, untried, un-"stable" software,
which is by definition riskier.

The modern telephone system has come to depend,
utterly and irretrievably, upon software.  And the
System Crash of January 15, 1990, was caused by an
IMPROVEMENT in software.  Or rather, an ATTEMPTED
improvement.

As it happened, the problem itself--the problem per se--took this form.
A piece of telco software had been written in the C language, a standard
language of the telco field.  Within the C software was a
long "do. . .while" construct.  The "do. . .while" construct
contained a "switch" statement.  The "switch" statement contained
an "if" clause.  The "if" clause contained a "break."  The "break"
was SUPPOSED to "break" the "if" clause.  Instead, the "break"
broke the "switch" statement.  (In C, a "break" never exits an "if";
it always jumps out of the innermost enclosing loop or "switch.")
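The control flow just described can be reproduced in a few lines of C. This is a hypothetical reconstruction of the pattern only, not AT&T's code: the message codes, the `busy` flag, and both function names are invented for illustration, and the status-map "bookkeeping" is reduced to a counter. In the buggy version the statement after the "if" is silently skipped whenever the "break" fires; the repaired version lets control fall out of the "if" as intended.

```c
#include <stddef.h>

/* Hypothetical message codes, invented for this sketch. */
enum msg { MSG_CALL, MSG_OK };

/* Buggy pattern: do...while containing a switch containing an if
 * containing a break. Returns how many "OK" messages reached the
 * status-map bookkeeping step. */
int status_updates_buggy(const enum msg *msgs, size_t n, int busy)
{
    int updates = 0;
    size_t i = 0;
    if (n == 0)
        return 0;
    do {
        switch (msgs[i]) {
        case MSG_OK:
            if (busy) {
                /* ...busy-case handling would go here... */
                break;   /* meant to leave the if; actually leaves the switch */
            }
            updates++;   /* bookkeeping: skipped whenever busy is set */
            break;
        case MSG_CALL:
            break;       /* ordinary calls are ignored in this sketch */
        }
        i++;
    } while (i < n);
    return updates;
}

/* Repaired pattern: no break inside the if, so control falls through
 * to the bookkeeping step as the programmer intended. */
int status_updates_fixed(const enum msg *msgs, size_t n, int busy)
{
    int updates = 0;
    size_t i = 0;
    if (n == 0)
        return 0;
    do {
        switch (msgs[i]) {
        case MSG_OK:
            if (busy) {
                /* ...busy-case handling would go here... */
            }
            updates++;
            break;
        case MSG_CALL:
            break;
        }
        i++;
    } while (i < n);
    return updates;
}
```

With `busy` set, the buggy version records no bookkeeping at all for a stream of "OK" messages, while the repaired one records each of them. Both versions are perfectly legal C, which helps explain how such a slip can go unnoticed.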

That was the problem, the actual reason why people picking up phones
on January 15, 1990, could not talk to one another.

Or at least, that was the subtle, abstract, cyberspatial
seed of the problem.  This is how the problem manifested itself
from the realm of programming into the realm of real life.

The System 7 software for AT&T's 4ESS switching station,
the "Generic 44E14 Central Office Switch Software,"
had been extensively tested, and was considered very stable.
By the end of 1989, eighty of AT&T's switching systems
nationwide had been programmed with the new software.  Cautiously,
thirty-four stations were left to run the slower, less-capable
System 6, because AT&T suspected there might be shakedown problems
with the new and unprecedentedly sophisticated System 7 network.

The stations with System 7 were programmed to switch over to a backup net
in case of any problems.  In mid-December 1989, however, a new high-velocity,
high-security software patch was distributed to each of the 4ESS switches
that would enable them to switch over even more quickly, making the System 7
network that much more secure.

Unfortunately, every one of these 4ESS switches was now in possession
of a small but deadly flaw.

In order to maintain the network, switches must monitor
the condition of other switches--whether they are up and running,
whether they have temporarily shut down, whether they are overloaded
and in need of assistance, and so forth.  The new software helped
control this bookkeeping function by monitoring the status calls
from other switches.

It only takes four to six seconds for a troubled 4ESS switch
to rid itself of all its calls, drop everything temporarily,
and re-boot its software from scratch.  Starting over from scratch
will generally rid the switch of any software problems that may have
developed in the course of running the system.  Bugs that arise will
be simply wiped out by this process.  It is a clever idea.  This process
of automatically re-booting from scratch is known as the "normal fault
recovery routine."  Since AT&T's software is in fact exceptionally stable,
systems rarely have to go into "fault recovery" in the first place;
but AT&T has always boasted of its "real world" reliability, and this
tactic is a belt-and-suspenders routine.

The 4ESS switch used its new software to monitor its fellow switches
as they recovered from faults.  As other switches came back on line
after recovery, they would send their "OK" signals to the switch.
The switch would make a little note to that effect in its "status map,"
recognizing that the fellow switch was back and ready to go,
and should be sent some calls and put back to regular work.

Unfortunately, while it was busy bookkeeping with the status map,
the tiny flaw in the brand-new software came into play.
The flaw caused the 4ESS switch to interact, subtly but drastically,
with incoming telephone calls from human users.  If--and only if--
two incoming phone-calls happened to hit the switch within a hundredth
of a second, then a small patch of data would be garbled by the flaw.

But the switch had been programmed to monitor itself
constantly for any possible damage to its data.
When the switch perceived that its data had been somehow garbled,
then it too would go down, for swift repairs to its software.
It would signal its fellow switches not to send any more work.
It would go into the fault-recovery mode for four to six seconds.
And then the switch would be fine again, and would send out its "OK,
ready for work" signal.

However, the "OK, ready for work" signal was the VERY THING THAT
HAD CAUSED THE SWITCH TO GO DOWN IN THE FIRST PLACE.  And ALL the
System 7 switches had the same flaw in their status-map software.
As soon as they stopped to make the bookkeeping note that their fellow
switch was "OK," then they too would become vulnerable to the slight
chance that two phone-calls would hit them within a hundredth of a second.

At approximately 2:25 P.M. EST on Monday, January 15,
one of AT&T's 4ESS toll switching systems in New York City
had an actual, legitimate, minor problem.  It went into fault
recovery routines, announced "I'm going down," then announced,
"I'm back, I'm OK."  And this cheery message then blasted
throughout the network to many of its fellow 4ESS switches.

Many of the switches, at first, completely escaped trouble.
These lucky switches were not hit by the coincidence of
two phone calls within a hundredth of a second.
Their software did not fail--at first.  But three switches--
in Atlanta, St. Louis, and Detroit--were unlucky,
and were caught with their hands full.  And they went down.
And they came back up, almost immediately.  And they too began
to broadcast the lethal message that they, too, were "OK" again,
activating the lurking software bug in yet other switches.

As more and more switches did have that bit of bad luck
and collapsed, the call-traffic became more and more densely
packed in the remaining switches, which were groaning
to keep up with the load.  And of course, as the calls
became more densely packed, the switches were MUCH MORE LIKELY
to be hit twice within a hundredth of a second.
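Why denser traffic made the flaw so much more dangerous can be seen with a toy calculation. It assumes, purely for illustration, that the hundredth-of-a-second window is divided into ten 1 ms slots and that each slot independently carries a new call with probability q; neither this slot model nor the function name `p_two_or_more` comes from AT&T. The chance of at least two calls landing in the window is then 1 - (1-q)^10 - 10q(1-q)^9, which for small q roughly quadruples whenever q doubles.

```c
/* Toy model (an assumption, not AT&T's analysis): a window of `slots`
 * time slots, each independently carrying a new call with probability q.
 * Returns the probability that two or more calls land in the window:
 *   1 - (1-q)^slots - slots * q * (1-q)^(slots-1)                     */
double p_two_or_more(double q, unsigned slots)
{
    if (slots < 2)
        return 0.0;                /* two calls cannot fit in one slot */
    double miss_pow = 1.0;         /* will hold (1-q)^(slots-1) */
    for (unsigned k = 0; k + 1 < slots; k++)
        miss_pow *= (1.0 - q);
    double p_zero = miss_pow * (1.0 - q);  /* no call at all */
    double p_one = slots * q * miss_pow;   /* exactly one call */
    return 1.0 - p_zero - p_one;
}
```

Under this toy model, raising the per-slot chance from 0.01 to 0.02 takes the risk from about 0.43% to about 1.6% per window: the compounding effect that set in as switches dropped out and their traffic piled onto the survivors.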

It took only four to six seconds for a switch to get well.
There was no PHYSICAL damage of any kind to the switches,
after all.  Physically, they were working perfectly.
This situation was "only" a software problem.

But the 4ESS switches were leaping up and down every
four to six seconds, in a virulent spreading wave all over America,
in utter, manic, mechanical stupidity.  They kept KNOCKING
one another down with their contagious "OK" messages.

It took about ten minutes for the chain reaction to cripple the network.
Even then, switches would periodically luck-out and manage to resume
their normal work.  Many calls--millions of them--were managing
to get through.  But millions weren't.

The switching stations that used System 6 were not directly affected.
Thanks to these old-fashioned switches, AT&T's national system avoided
complete collapse.  This fact also made it clear to engineers that
System 7 was at fault.

Bell Labs engineers, working feverishly in New Jersey, Illinois,
and Ohio, first tried their entire repertoire of standard network
remedies on the malfunctioning System 7.  None of the remedies worked,
of course, because nothing like this had ever happened to any
phone system before.

By cutting out the backup safety network entirely,
they were able to reduce the frenzy of "OK" messages
by about half.  The system then began to recover, as the
chain reaction slowed.  By 11:30 P.M. on Monday, January
15, sweating engineers on the midnight shift breathed a
sigh of relief as the last switch cleared up.

By Tuesday they were pulling all the brand-new 4ESS software
and replacing it with an earlier version of System 7.

If these had been human operators, rather than
computers at work, someone would simply have
eventually stopped screaming.  It would have been
OBVIOUS that the situation was not "OK," and common
sense would have kicked in.  Humans possess common sense--
at least to some extent.  Computers simply don't.

On the other hand, computers can handle hundreds
of calls per second.  Humans simply can't.  If every single
human being in America worked for the phone company,
we couldn't match the performance of digital switches:
direct-dialling, three-way calling, speed-calling, call-
waiting, Caller ID, all the rest of the cornucopia
of digital bounty.  Replacing computers with operators
is simply not an option any more.

And yet we still, anachronistically, expect humans to
be running our phone system.  It is hard for us
to understand that we have sacrificed huge amounts
of initiative and control to senseless yet powerful machines.
When the phones fail, we want somebody to be responsible.
We want somebody to blame.

When the Crash of January 15 happened, the American populace
was simply not prepared to understand that enormous landslides
in cyberspace, like the Crash itself, can happen,
and can be nobody's fault in particular.  It was easier to believe,
maybe even in some odd way more reassuring to believe,
that some evil person, or evil group, had done this to us.
"Hackers" had done it.  With a virus.  A trojan horse.
A software bomb.  A dirty plot of some kind.  People believed this,
responsible people.  In 1990, they were looking hard for evidence
to confirm their heartfelt suspicions.

And they would look in a lot of places.

Come 1991, however, the outlines of an apparent new reality
would begin to emerge from the fog.

On July 1 and 2, 1991, computer-software collapses
in telephone switching stations disrupted service in
Washington DC, Pittsburgh, Los Angeles and San Francisco.
Once again, seemingly minor maintenance problems had
crippled the digital System 7.  About twelve million
people were affected in the Crash of July 1, 1991.

Said the New York Times Service:  "Telephone company executives
and federal regulators said they were not ruling out the possibility
of sabotage by computer hackers, but most seemed to think the problems
stemmed from some unknown defect in the software running the networks."

And sure enough, within the week, a red-faced software company,
DSC Communications Corporation of Plano, Texas, owned up
to "glitches" in the "signal transfer point" software that
DSC had designed for Bell Atlantic and Pacific Bell.
The immediate cause of the July 1 Crash was a single
mistyped character:  one tiny typographical flaw
in one single line of the software.  One mistyped letter,
in one single line, had deprived the nation's capital of phone service.
It was not particularly surprising that this tiny flaw had escaped attention:
a typical System 7 station requires TEN MILLION lines of code.

On Tuesday, September 17, 1991, came the most spectacular outage yet.
This case had nothing to do with software failures--at least, not directly.
Instead, a group of AT&T's switching stations in New York City had simply
run out of electrical power and shut down cold.  Their back-up batteries
had failed.  Automatic warning systems were supposed to warn of the loss
of battery power, but those automatic systems had failed as well.

This time, Kennedy, La Guardia, and Newark airports
all had their voice and data communications cut.
This horrifying event was particularly ironic, as attacks
on airport computers by hackers had long been a standard
nightmare scenario, much trumpeted by computer-security
experts who feared the computer underground.  There had even
been a Hollywood thriller about sinister hackers ruining
airport computers--DIE HARD II.

Now AT&T itself had crippled airports with computer malfunctions--
not just one airport, but three at once, some of the busiest in the world.

Air traffic came to a standstill throughout the Greater New York area,
causing more than 500 flights to be cancelled, in a spreading wave
all over America and even into Europe.  Another 500 or so flights
were delayed, affecting, all in all, about 85,000 passengers.
(One of these passengers was the chairman of the Federal
Communications Commission.)

Stranded passengers in New York and New Jersey were further
infuriated to discover that they could not even manage to
make a long distance phone call, to explain their delay
to loved ones or business associates.  Thanks to the crash,
about four and a half million domestic calls, and half a million
international calls, failed to get through.

The September 17 NYC Crash, unlike the previous ones,
involved not a whisper of "hacker" misdeeds.  On the contrary,
by 1991, AT&T itself was suffering much of the vilification
that had formerly been directed at hackers.  Congressmen were grumbling.
So were state and federal regulators.  And so was the press.

For their part, ancient rival MCI took out snide full-page
newspaper ads in New York, offering their own long-distance
services for the "next time that AT&T goes down."

"You wouldn't find a classy company like AT&T using such advertising,"
protested AT&T Chairman Robert Allen, unconvincingly.  Once again,
out came the full-page AT&T apologies in newspapers, apologies for
"an inexcusable culmination of both human and mechanical failure."
(This time, however, AT&T offered no discount on later calls.
Unkind critics suggested that AT&T was worried about setting any precedent
for refunding the financial losses caused by telephone crashes.)

Industry journals asked publicly if AT&T was "asleep at the switch."
The telephone network, America's purported marvel of high-tech reliability,
had gone down three times in 18 months.  Fortune magazine listed the
Crash of September 17 among the "Biggest Business Goofs of 1991,"
cruelly parodying AT&T's ad campaign in an article entitled
"AT&T Wants You Back (Safely On the Ground, God Willing)."

Why had those New York switching systems simply run out of power?
Because no human being had attended to the alarm system.
Why did the alarm systems blare automatically,
without any human being noticing?  Because the three
telco technicians who SHOULD have been listening
were absent from their stations in the power-room,
on another floor of the building--attending a training class.
A training class about the alarm systems for the power room!

"Crashing the System" was no longer "unprecedented" by late 1991.
On the contrary, it no longer even seemed an oddity.  By 1991,
it was clear that all the policemen in the world could no longer
"protect" the phone system from crashes.  By far the worst crashes
the system had ever had, had been inflicted, by the system,
upon ITSELF.  And this time nobody was making cocksure statements
that this was an anomaly, something that would never happen again.
By 1991 the System's defenders had met their nebulous Enemy,
and the Enemy was--the System.