Cloud Computing

Cloud computing, such as Amazon's EC2, offers many value propositions: "infinite" computation capacity, on demand, pay as you go, pay for what you use. Many applications from Accenture's enterprise clients can benefit from these value propositions, but we believe one of the most attractive uses is outsourcing large seasonal computation demands. Through a couple of client examples, we have shown that outsourcing to the cloud can reduce infrastructure cost by two orders of magnitude. Unfortunately, mapping an enterprise application to the cloud is not easy because of the differences in infrastructure architecture. Inspired by Google's MapReduce, we developed GridBatch, a system that makes it easy to port large-scale, data-intensive enterprise batch applications to the cloud. While MapReduce is designed for web applications (such as word count and computing reverse links), GridBatch is designed for enterprise data analysis applications. We intend to build more capabilities into GridBatch to free programmers from reimplementing common primitives. The capabilities in 1.0 are described in CCGRID'08[PDF] and those in 2.0 in Cluster'09[PDF].
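For comparison, the MapReduce programming model mentioned above can be sketched in a few lines. This is an illustrative single-machine word count in the map/reduce style, not GridBatch's API or Google's implementation:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["the cloud", "the grid and the cloud"]
print(reduce_phase(map_phase(docs)))
# {'the': 3, 'cloud': 2, 'grid': 1, 'and': 1}
```

In a real cluster the map and reduce phases each run in parallel across many machines, with a shuffle step grouping pairs by key in between; GridBatch adds further primitives on top of this style of decomposition.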

Another common use of the cloud is to host a web presence. Since the traffic demand is rarely known up front, the cloud's ability to dynamically grow and shrink the infrastructure in response to traffic fluctuation is especially valuable. The WebScalar project conducts performance evaluations of the cloud and designs an optimal web server farm architecture in the cloud. More details are in CloudCom'09[PDF] and ACM SAC'10[PDF].
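The grow-and-shrink idea reduces, in its simplest form, to sizing the farm against current load. The sketch below is a minimal illustration under assumed numbers (the function name and the per-server capacity are hypothetical, not WebScalar's model):

```python
import math

def servers_needed(req_per_sec, per_server_rps, min_servers=1):
    """Size the farm to the current load. per_server_rps is an assumed
    capacity of one server; real systems measure this empirically."""
    return max(min_servers, math.ceil(req_per_sec / per_server_rps))

# As traffic fluctuates, the farm grows and shrinks with it.
for load in [50, 900, 4200]:
    print(load, "req/s ->", servers_needed(load, per_server_rps=500), "servers")
```

A production autoscaler layers smoothing and hysteresis on top of this so that short traffic spikes do not cause the farm to thrash.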

Similar to a compute cloud, we have also built a network cloud for internal use. It shares many attributes with a compute cloud, such as being on-demand, pay-per-use, and fully programmable. Full details are described in SIGCOMM WREN'09[PDF].

Tradeoffs between optical and electronic switching

Unlike in the 70's, switching at a node, not the link capacity, is the bottleneck in today's Internet. To overcome the switching bottleneck, the Internet backbone transport network will likely employ a combination of electronic and optical switching in order to reduce the total equipment cost. The cost of electronic switching increases as more traffic is switched; the cost of optical switching, on the other hand, is independent of the amount of traffic carried in a wavelength. As a result, optical switching should be used when there is a large amount of traffic to send, to amortize the high cost of establishing all-optical connections, and electronic switching should be used when the traffic is small. We, for the first time, derive the quantitative relationship between optical and electronic switching. We show that their relationship can be described by a power law, and we quantify the power-law exponents for many different types of topologies. Understanding the tradeoff not only allows network designers to choose the best combination of optical and electronic switching, but also allows them to compare many alternative fiber topologies in terms of switching cost. (Infocom'06[PDF])
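The core cost intuition can be made concrete with a toy model (illustrative only, not the paper's formulation): electronic cost grows linearly with the traffic switched, while optical switching pays a flat cost per lightpath, so there is a crossover traffic volume above which optical wins.

```python
def cheaper_switching(traffic, elec_cost_per_unit, optical_fixed_cost):
    """Toy cost model: electronic cost scales with traffic switched;
    optical cost is flat per lightpath regardless of traffic carried."""
    electronic = elec_cost_per_unit * traffic
    return "optical" if optical_fixed_cost < electronic else "electronic"

# With these assumed costs, the crossover sits at 100 / 2 = 50 traffic units.
print(cheaper_switching(10, 2, 100))  # electronic
print(cheaper_switching(80, 2, 100))  # optical
```

The paper's contribution is characterizing how this crossover behaves across whole topologies, where the tradeoff follows a power law rather than a single linear threshold.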

Grooming heterogeneous traffic in WDM/SONET rings

SONET remains the dominant transport network in the metropolitan area. A natural migration path to support traffic growth is to use WDM technology, where each wavelength serves as a separate SONET ring. In general, the traffic demand between a pair of nodes is small compared to the wavelength capacity, so if we could intelligently "groom" the traffic, large savings in electronic equipment could be achieved. For simplicity, most prior work assumes only one line speed is available: you have to install a 10G box even if there is only 1 bit/s to send, simply because it is the only available box. We show that large savings in electronic equipment cost can be achieved by mixing line speeds. A higher line speed should be used when traffic is large, to enjoy the economy of scale, and a lower line speed should be used when traffic is small, to reduce equipment cost. Given the high cost of deploying a WDM/SONET network, it is very desirable to solve the traffic grooming problem optimally. This is possible even though the traffic grooming problem is clearly NP-complete, because the SONET standard limits the network to at most 16 nodes. We propose several computation techniques that exploit the problem structure, allowing the traffic grooming problem to be quickly solved to optimality. (JSAC'07[PDF], Infocom'05[PDF], Globecom'05[PDF], ICC'04[PDF])
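The economy-of-scale argument for mixing line speeds can be illustrated with a toy cost minimization. The prices below are hypothetical (chosen only so that one 10G card is much cheaper than four 2.5G cards); the real problem also involves ring assignment and is far harder than this per-demand calculation:

```python
import math

# Hypothetical line-card prices exhibiting economy of scale.
CARDS = {2.5: 1.0, 10.0: 2.5}

def min_cost(demand_gbps):
    """Brute-force the cheapest mix of the two line speeds covering
    the demand."""
    best = float("inf")
    for tens in range(int(demand_gbps // 10) + 2):
        rest = max(0.0, demand_gbps - 10.0 * tens)
        small = math.ceil(rest / 2.5)
        best = min(best, tens * CARDS[10.0] + small * CARDS[2.5])
    return best

print(min_cost(1.0))   # small demand: one 2.5G card, cost 1.0
print(min_cost(20.0))  # large demand: two 10G cards, cost 5.0
```

With a single mandated line speed, the 1 Gb/s demand would be forced onto a 10G card at cost 2.5; the mix cuts that by more than half, which is the effect the paper exploits network-wide.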

Designing load-balanced backbone network with performance guarantee

Designing an Internet backbone network is hard, mostly because the traffic matrix is difficult to predict at design time and changes over time. Today's networks are designed in an ad-hoc fashion, and there is no guarantee that the network can support a traffic matrix that deviates from the original estimate. The Valiant Load-Balancing architecture is a promising solution, as it can support any traffic matrix at the small cost of over-provisioning and higher delay. Even though it is hard to forecast the traffic matrix at design time, it is quite easy to estimate at operation time. Armed with the current traffic matrix, we can use "direct routing" to overcome one of the shortcomings of the Valiant Load-Balancing architecture---the longer delay. We show how to maximize directly routed traffic, hence minimizing delay. We prove that direct routing is not always feasible in the current Valiant Load-Balancing architecture, but a slight increase in link capacity is sufficient to guarantee feasibility. (Globecom'05[PDF])
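The key property of Valiant Load Balancing---that a modestly provisioned full mesh supports any admissible traffic matrix---can be checked numerically. The sketch below models only the classic two-phase scheme, not the paper's direct-routing optimization:

```python
import itertools

def vlb_link_load(traffic, n):
    """Two-phase Valiant Load Balancing on an n-node full mesh:
    phase 1 spreads each flow evenly over all n intermediate nodes,
    phase 2 delivers it from the intermediate to the destination."""
    load = {}
    for s, d in itertools.product(range(n), repeat=2):
        t = traffic[s][d]
        if t == 0:
            continue
        for mid in range(n):
            for a, b in ((s, mid), (mid, d)):
                if a != b:  # hops that stay at the same node cost nothing
                    load[(a, b)] = load.get((a, b), 0.0) + t / n
    return load

n, r = 4, 1.0
# Any matrix whose row and column sums stay within r is supportable;
# here every node sends its full rate r to the next node around.
traffic = [[r if d == (s + 1) % n else 0.0 for d in range(n)] for s in range(n)]
worst = max(vlb_link_load(traffic, n).values())
print(worst)  # no link carries more than 2r/n = 0.5
```

The price of this robustness is that most traffic takes two hops instead of one; direct routing, as described above, recovers the one-hop path for as much traffic as the current matrix allows.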

Exploiting dense parallelism in commodity hardware

To achieve high performance, instead of implementing an algorithm in software, one can design it into a custom ASIC. There are several drawbacks to this approach: 1) it is costly, not only because of the custom design but also because the typically low volume cannot effectively amortize the high fixed cost; 2) it is not flexible enough to handle requirement changes; 3) the increase in speed is limited because hardware parallelism is not fully exploited. Another approach is to use commodity hardware, such as Ternary Content Addressable Memory (TCAM). There are several advantages to using TCAM: 1) the hardware design is simple and can be easily optimized---TCAM is made up of small cells (e.g., a 6-transistor cell) that can be hand-optimized for speed, power, and area, then replicated to create a larger chip; 2) being a commodity, the cost can be very low; 3) it offers very high parallelism, where all bits work in parallel in each cycle. Naturally, there are also disadvantages to using TCAM. For example, its rigid structure means that many applications cannot be mapped to the hardware directly. This problem can be overcome by designing innovative algorithms. We showed how to use TCAM more efficiently (accommodating more routes) to perform route lookup (HOTi'01[PDF] or IEEE Micro'02[PDF]), and we also showed how to use TCAM to perform packet classification, in particular, how to handle range classifiers (HOTi'02[PDF]).
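A TCAM's behavior is simple to model in software: every entry is a pattern of bits and don't-cares, all entries compare against the key in parallel, and the highest-priority match wins. The sketch below models longest-prefix route lookup by storing longer prefixes at higher priority (an illustrative model, not the papers' optimized encodings):

```python
def tcam_lookup(table, addr_bits):
    """Software model of a TCAM: entries are (pattern, action) pairs
    where 'x' is a don't-care bit. In hardware every entry matches in
    parallel; the first (highest-priority) matching entry wins."""
    for pattern, action in table:
        if all(p in ("x", b) for p, b in zip(pattern, addr_bits)):
            return action
    return "no-match"

# Route table ordered by prefix length so the longest prefix has priority.
routes = [
    ("1010xxxx", "port-3"),
    ("10xxxxxx", "port-1"),
    ("xxxxxxxx", "default"),
]
print(tcam_lookup(routes, "10101111"))  # port-3
print(tcam_lookup(routes, "10011111"))  # port-1
print(tcam_lookup(routes, "01011111"))  # default
```

The efficiency problems arise from this same rigidity: priority ordering constrains where routes can be stored, and ranges (e.g., port 1024-65535) do not map to a single ternary pattern, which is exactly what the route-lookup and range-classifier work addresses.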

Caching routes in Internet routers

Even though caches are widely used in computer systems, they are hardly ever used in Internet routers and network processors, partly because the hit rate is so low that the additional cost is not warranted. Caching aggregates instead of individual routes could greatly increase the hit rate; however, erroneous lookups could result. Exploiting the structure of routing tables, we show how route aggregates can be correctly cached, and we show that the hit rate is indeed high. We also show that the aggregate route cache can be cheaply built, allowing it to be used in future routers. (Globecom'02[PDF], ICCCN'01[PDF])
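The correctness hazard and one safe way around it can be sketched as follows. Caching an aggregate prefix is wrong if a more-specific route sits beneath it in the full table, because later addresses matching the cached aggregate might really belong to the longer prefix. A simplified safe rule (illustrative only, not the papers' design): cache a prefix only when it is a leaf of the table, and fall back to caching the exact address otherwise.

```python
class AggregateRouteCache:
    """Sketch: cache route aggregates (prefixes over address bits)
    instead of full addresses, but only when doing so cannot shadow a
    more-specific route in the full table."""

    def __init__(self, table):
        self.table = table  # {bit-string prefix: next_hop}
        self.cache = {}
        self.hits = self.misses = 0

    def _longest_match(self, addr):
        for plen in range(len(addr), -1, -1):
            if addr[:plen] in self.table:
                return addr[:plen]
        return None

    def _is_leaf(self, prefix):
        # A leaf prefix has no longer prefix beneath it in the table.
        return not any(q != prefix and q.startswith(prefix)
                       for q in self.table)

    def lookup(self, addr):
        for plen in range(len(addr), -1, -1):
            if addr[:plen] in self.cache:
                self.hits += 1
                return self.cache[addr[:plen]]
        self.misses += 1
        p = self._longest_match(addr)
        hop = self.table[p]
        # Cache the aggregate only when it cannot hide a longer prefix.
        self.cache[p if self._is_leaf(p) else addr] = hop
        return hop

table = {"10": "A", "1010": "B", "0": "C"}
c = AggregateRouteCache(table)
print(c.lookup("10111111"))  # miss; "10" is not a leaf, cache exact addr
print(c.lookup("10101111"))  # miss; caches the leaf aggregate "1010"
print(c.lookup("10100000"))  # hit on the cached aggregate "1010"
```

Because one cached leaf prefix covers many addresses, the hit rate rises well above what per-address caching achieves, which is the effect the measurement papers quantify.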

Understanding network traffic

Network processor architects have a considerably easier job than general-purpose CPU architects because, supposedly, there is an embarrassing amount of parallelism in network traffic. Aside from the constraint that all packets within a flow must stay in order, most packets are independent of each other. Therefore, to achieve high throughput, network processor architects can simply glue together hundreds of processing units, each handling a separate flow. We show that the amount of flow-level parallelism is actually not that large, so blindly increasing the number of processing units adds only marginal value. Instead, we should exploit parallelism at the packet level or even at the instruction level. (ICC'02[PDF])
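Why flow-level parallelism is limited can be seen from a simple measurement: within any short window of consecutive packets, only packets from distinct flows can proceed concurrently (in-order delivery serializes each flow), and real traces are dominated by a few heavy flows. The sketch below runs this count on a synthetic trace; the window size and trace are illustrative, not the paper's methodology:

```python
def flow_parallelism(packets, window):
    """Estimate flow-level parallelism: in each window of consecutive
    packets, count distinct flows, since only packets from different
    flows can be processed concurrently."""
    counts = [len(set(packets[i:i + window]))
              for i in range(len(packets) - window + 1)]
    return max(counts), sum(counts) / len(counts)

# Synthetic trace dominated by one heavy flow, as real traffic often is.
trace = ["f1", "f1", "f2", "f1", "f1", "f3", "f1", "f2", "f1", "f1"]
peak, avg = flow_parallelism(trace, window=4)
print(peak, avg)  # a 4-packet window never holds 4 distinct flows here
```

If the peak distinct-flow count stays small, adding more per-flow processing units past that point leaves them idle, which motivates looking for parallelism inside a packet's processing instead.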