InternetWeek article featuring me! :-)

http://www.benaustin.com/press/articles/internetweek/internetweek020501.html 

(Note: I was Shannon Brown throughout my career, as seen below, and am now married and happily Shannon Gavin!)

 


February 5, 2001

No Risk, No Reward
Site concocts ultrareliable network design on shoestring budget

By CHRISTINE ZIMMERMAN

A novel but potentially risky network design is letting one commercial site deliver Yahoo-like reliability at a bargain-basement price.

Comet Systems Inc. (www.cometsystems.com), a New York company that delivers real-time definitions of terms on Web pages using customized cursors, has tinkered with Cisco's policy-routing software to provide a site-reliability level normally achieved with redundant hardware.

The approach bends key TCP/IP rules, which raises the possibility of dropped packets. Moreover, it relies on a single primary router at each of two locations, and because Comet's budget rules out backups, the failure of one router would leave the other as a single point of failure.

Despite those caveats, the company has used this unique implementation since late fall to achieve eye-popping reliability. It's also no performance slouch.

Comet was hot even before last week's launch of its Smart Cursors definition service. The site serves about 6,000 users daily and up to 61 million unique visitors annually. Web performance tracker Keynote Systems said Comet is performing on a par with Yahoo in terms of availability and download times.

Comet's network design demonstrates the value IT can derive from thinking outside the box. "It's fun and interesting trying something new," said Keith Pajonas, Comet's director of information systems. "This isn't something you'll find in the Cisco books."

Pajonas added his own code to Cisco's policy-based routing, a proprietary element of the vendor's Internetwork Operating System (IOS), to exert certain controls over the industry-standard Border Gateway Protocol (BGP), which governs communications between routers. Pajonas's code directs Cisco routers at Comet and at GlobalCenter, its backup host, to send packets to both routers simultaneously.

How It Works

Skirting the specifications of TCP/IP, the two Cisco routers share the same IP address. Though packets flow to both routers, only the one Pajonas chooses handles the traffic.

Pajonas said he sets up "blocks" that indicate which router is primary and which is the backup. The policies, in effect, override the routing table that normally governs the operation of the router.
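The idea that a policy can take precedence over the routing table can be illustrated with a toy model. The sketch below is not Cisco configuration and not Comet's actual setup; the prefixes, next-hop names, and the choice to match on source address are all assumptions made for illustration. It shows only the order of evaluation: a matching policy wins, and the routing table's longest-prefix match is used otherwise.

```python
import ipaddress

# Toy model only. Prefixes and next-hop names are hypothetical,
# not taken from Comet's or Cisco's configuration.
ROUTING_TABLE = {
    "0.0.0.0/0": "isp-uplink",
    "10.1.0.0/16": "local-lan",
}

# Policies are consulted before the routing table.
# Matching on source address is an assumption for this sketch.
POLICIES = [
    {"match_src": "192.0.2.0/24", "next_hop": "backup-site-link"},
]


def longest_prefix_next_hop(dst):
    """Ordinary destination-based lookup: the most specific prefix wins."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, hop in ROUTING_TABLE.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None


def next_hop(src, dst):
    """Return the policy's next hop if a policy matches, else use the table."""
    src_addr = ipaddress.ip_address(src)
    for policy in POLICIES:
        if src_addr in ipaddress.ip_network(policy["match_src"]):
            return policy["next_hop"]
    return longest_prefix_next_hop(dst)
```

In this model, a packet from 192.0.2.7 is sent to the policy's next hop even though the routing table alone would have chosen a different one for its destination.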

If at any time during the user's connection something happens to the network--whether there's a problem with the ISP, the phone company cuts a cable or a server goes down--Comet's policy routing implementation ensures that traffic goes to the other router.
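The failover behavior described above can be sketched as a simple primary/backup selection. This is a minimal illustration, assuming each site exposes a single healthy/unhealthy flag; the article does not say how Comet's routers detect a failure, so the `Site` class and `choose_site` function are hypothetical names, not part of any Cisco software.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A hosting location with one router. 'healthy' is an assumed flag."""
    name: str
    healthy: bool = True


def choose_site(primary, backup):
    """Prefer the primary site; fall back to the backup; None if both are down."""
    if primary.healthy:
        return primary
    if backup.healthy:
        return backup
    return None


comet = Site("Comet New York")
globalcenter = Site("GlobalCenter")
```

With both sites healthy, `choose_site(comet, globalcenter)` returns the Comet site. Setting `comet.healthy = False` shifts the choice to GlobalCenter, and if both are down the function returns `None`, which is the two-router exposure the article goes on to discuss.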

That all sounds great in theory, but tinkering with core routing technology and protocols is not something to be taken lightly.

"Asking a router to ignore its routing table is not normally done," said Shannon Brown, a networking engineer for Cisco who worked closely with Pajonas. "There is the possibility of lost packets because the router could be confused by a single IP address in more than one location."

Officially, Cisco doesn't recommend this configuration to customers. In fact, Cisco asked InternetWeek not to include its name in this article.

Cisco's reticence may be understandable. "There are rules, and you don't go against those rules," said Frank Dzubeck, president of consulting firm Communications Network Architects.

Policy Routing

Policy routing is in its infant stages, Dzubeck said. "Everybody uses BGP to connect networks to each other," he noted. "That's not unique. Adding in policy-based routing offers some optimization, but then management and scalability are in question."

Pajonas acknowledged that requiring a router to apply such policies drains some router CPU cycles. As traffic grows, the problem worsens.

Comet can, however, offer tangible proof that its approach is working. On Oct. 31, Verizon cut all Internet connections into Comet's New York facility. In that instance, BGP and policy routing automatically redirected traffic from Comet's router to the GlobalCenter router over the T1 link between the facilities, and customers didn't experience any service interruptions. The Comet server at GlobalCenter's facility is mirrored so the latest updates are available to customers even when the primary site is down. Comet's main ISP, Savvis Communications, was up by that evening, so Pajonas reset the policy to direct all traffic back to Comet through Savvis.

GlobalCenter had an outage later that day, a coincidence that illustrates an important pitfall of Pajonas's approach: Having only two routers leaves Comet open to trouble. If one router fails, the network is reduced to a single point of failure, at least until the primary router is restored.

Why It's Good

But for such a cost-conscious customer as Comet, the solution Pajonas developed works.

The company spent about $75,000 on its two Cisco routers and two RadWare Web Server Director Pro load balancers. Without BGP and policy routing, Pajonas estimated, Comet would have needed to double that investment to buy redundant hardware.

The flexibility of controlling packet flow through policy routing enables Pajonas to minimize use of GlobalCenter.

Comet pays $20,000 for 20 Mbps and $1,500 for each Mbps beyond that. The new system will enable the company to control costs by switching from GlobalCenter to Savvis when Comet nears the 20-Mbps limit.
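The pricing quoted above works out as a flat fee plus a per-Mbps overage. The sketch below is illustrative arithmetic only. The article does not state the billing period or how a fraction of an Mbps is charged, so rounding any partial Mbps up to a whole one is an assumption, and `bandwidth_cost` is a hypothetical helper.

```python
import math

BASE_FEE = 20_000         # dollars, covers the first 20 Mbps (from the article)
INCLUDED_MBPS = 20        # from the article
OVERAGE_PER_MBPS = 1_500  # dollars per Mbps beyond 20 (from the article)


def bandwidth_cost(used_mbps):
    """Cost in dollars for a given usage level.

    Assumption: a partial Mbps over the limit is billed as a whole Mbps.
    """
    extra = max(0, math.ceil(used_mbps - INCLUDED_MBPS))
    return BASE_FEE + extra * OVERAGE_PER_MBPS
```

Under these assumptions, 20 Mbps costs $20,000 and 25 Mbps costs $27,500, which is why steering traffic to the other provider before crossing the 20-Mbps line saves money.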

The company's savings have gone right into product development, said Ben Austin, Comet's director of marketing.

Keeping Up With Traffic

Pajonas said he is confident that the network can handle the traffic that is now starting to build as a result of the Smart Cursors announcement.

What may be most impressive is that Comet has built this redundant architecture while performing with the best sites on the Web, delivering an average of 21,600 bytes per second, according to Keynote Systems.

Keynote took measurements specifically for this story and found that it took less than a second to download the first Comet page, while the page's availability was 99.85 percent.
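To put the availability figure in perspective, a percentage can be converted into expected downtime over a chosen period. The article does not give the length of Keynote's measurement window, so extending 99.85 percent to a full day or year is purely illustrative, and `downtime_hours` is a hypothetical helper.

```python
def downtime_hours(availability_percent, period_hours):
    """Hours of unavailability implied by an availability percentage."""
    return (1 - availability_percent / 100) * period_hours


per_day_minutes = downtime_hours(99.85, 24) * 60
per_year_hours = downtime_hours(99.85, 24 * 365)
```

At 99.85 percent, that is roughly two minutes per day, or about 13 hours over a 365-day year, if the same rate held throughout.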

"That's pretty impressive," said Dan Todd, Keynote's chief technologist of public services. "They're right there with the leaders like Yahoo."

Just don't try Comet's approach at home.