It is not always possible to know what one has learned, or when the dawning will arrive. You will continue to shift, sift, to shake out and to double back. The synthesis that finally occurs can be in the most unexpected place and the most unexpected time. My charge ... is to be alert to the dawnings.


Position: Research Assistant, Department of Electrical and Computer Engineering, University of British Columbia

Research Group: Computer Systems Group

Motive/Direction: Working on computer hardware and systems design with the Computer and Software Engineering (COSE) Group. My current research focuses on the virtualization of multi-core architectures and on performance evaluation.

Previous Position: Research Trainee at Waran Research Foundation (WARFT), with Prof. N. Venkateswaran, Founder & Director, WARFT.

Research Group: High Performance Architecture Group, VISHWAKARMA

Motive/Direction: Grand-challenge applications of the future will require processing power with speedups orders of magnitude greater than those offered by current supercomputers. Current approaches that integrate off-the-shelf components into massively parallel computers have intrinsic drawbacks, some of which stem from the von Neumann bottleneck, which will become starker at this scale. IBM’s BlueGene is the first supercomputing initiative to have made use of processor-in-memory (PIM) technology. Memory In Processor (MIP) is a novel concept conceived at WARFT that eliminates the bottlenecks typical of von Neumann architectures and their many parallel derivatives. This architectural paradigm is suitable for a wide class of large-scale applications.

Research Projects:

October 2008 – Present: Block I/O Scheduling on Virtual Machines for Performance and QoS

October 2008 – April 2010: Virtualization and Cache Architecture

Multi-core processors are shared-memory architectures, and as they evolve into massively parallel machines, the effects of cache coherence become visible at the application level. Analyzing how multi-threaded applications affect a shared cache architecture is therefore very important. First, the cache is modelled by allocating a portion of it to each application, and the bottleneck that arises when simultaneous threads access the same portion of the cache is predicted. Dependencies among the multiple applications are then resolved, and a virtual machine monitor (VMM) maps the applications onto the architecture as modified by this cache modelling technique.
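The partitioning idea can be made concrete with a small model. The following is a minimal sketch, assuming a set-associative LRU cache whose ways in every set are statically divided among applications (way partitioning); the class `PartitionedCache`, its parameters, and the 64-byte line size are hypothetical illustrations, not the cache model actually used in this project.

```python
from collections import OrderedDict


class PartitionedCache:
    """Hypothetical way-partitioned, set-associative LRU cache.

    Each application owns a fixed number of ways in every set, so one
    application's misses can only evict its own lines. Threads of the same
    application still compete for that application's partition.
    """

    def __init__(self, num_sets, ways_per_app, line_size=64):
        self.num_sets = num_sets
        self.line_size = line_size
        self.ways_per_app = dict(ways_per_app)  # application id -> ways it owns
        # One LRU-ordered collection of tags per (application, set index).
        self.partitions = {
            app: [OrderedDict() for _ in range(num_sets)] for app in self.ways_per_app
        }
        self.hits = {app: 0 for app in self.ways_per_app}
        self.misses = {app: 0 for app in self.ways_per_app}

    def access(self, app, address):
        """Simulate one access by `app`; return True on a hit, False on a miss."""
        line = address // self.line_size
        index, tag = line % self.num_sets, line // self.num_sets
        lru = self.partitions[app][index]
        if tag in lru:
            lru.move_to_end(tag)  # now the most recently used line
            self.hits[app] += 1
            return True
        self.misses[app] += 1
        if len(lru) >= self.ways_per_app[app]:
            lru.popitem(last=False)  # evict the least recently used line of this partition
        lru[tag] = None
        return False
```

For example, with `ways_per_app={"A": 2, "B": 2}`, a burst of misses from application B in one set cannot evict A's lines in that set, whereas threads of A that touch three distinct lines mapping to the same set keep evicting each other; that intra-partition contention is the kind of bottleneck the model is meant to predict.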

September 2008 – December 2008: MPI Library for Agitated Particle Motion Simulation (APMS)

The Agitated Particle Motion Simulator is a generalized MPI library for analyzing particle motion when the state of a medium changes. At each time instant, the current position and velocity of every particle are compared with those at the previous instant, in response to disturbances in mass, size, velocity, the opposing force of the medium, attraction or repulsion between particles, pressure, and so on. An MPI abstraction of a master-manager-slave hierarchy forks processes in a map-reduce fashion, according to how the workload is concentrated across the medium.
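The workload-dependent decomposition can be sketched in a single process, without MPI. This is a minimal sketch, assuming the medium is split into regions whose load is proportional to their particle count, that the master assigns regions to workers with the greedy longest-processing-time (LPT) heuristic, and that each worker advances its particles with explicit Euler integration under a linear drag force. All names (`Particle`, `assign_regions`, `euler_step`) and the physics model are hypothetical; a real MPI version would distribute the assignment with the library's own scatter/gather operations.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Particle:
    x: float     # position (1-D for brevity)
    v: float     # velocity
    mass: float


def assign_regions(region_loads, num_workers):
    """Greedy LPT balancing: take regions from heaviest to lightest and give
    each to the currently least-loaded worker. Returns {worker: [regions]}."""
    heap = [(0.0, w) for w in range(num_workers)]  # (assigned load, worker id)
    assignment = {w: [] for w in range(num_workers)}
    for region, load in sorted(region_loads.items(), key=lambda kv: -kv[1]):
        total, worker = heapq.heappop(heap)
        assignment[worker].append(region)
        heapq.heappush(heap, (total + load, worker))
    return assignment


def euler_step(p, external_force, drag, dt):
    """Advance one particle by one explicit-Euler step. The net force is an
    external (e.g. inter-particle or pressure) force minus a linear drag
    from the medium: F = external_force - drag * v."""
    a = (external_force - drag * p.v) / p.mass
    return Particle(x=p.x + p.v * dt, v=p.v + a * dt, mass=p.mass)
```

In this model the master would recompute `region_loads` from particle counts at each step, so regions where particles concentrate get spread across workers; LPT is simply an easy-to-read heuristic chosen for illustration, not necessarily the policy APMS used.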

September 2008 – December 2008: Wait-Free Synchronization of Fault-Tolerant Data Structures

When numerous threads continuously update and access a fusible data structure at the same time, it becomes highly sensitive to locking, and threads can starve. With conventional locking, concurrent readers and writers contending for the same data can deadlock, and improperly ordered read-after-write accesses can corrupt data. To avoid this, wait-free synchronization algorithms are used to mediate between these threads and eliminate the locking inconsistencies. In evaluations on lock servers, database servers, and the Apache Tomcat web server, wait-free synchronization improved performance by 25%.
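To illustrate what wait-free progress means, here is a sketch of a sharded counter, a standard wait-free pattern swapped in purely for illustration (it is not the algorithm used in this project). Each thread writes only its own slot, so an increment completes in one step regardless of what other threads do, and a read completes after a bounded scan of all slots. The names `WaitFreeCounter` and `run_workers` are hypothetical; in CPython the GIL serializes bytecode anyway, so this shows the structure of the technique rather than its performance benefit.

```python
import threading


class WaitFreeCounter:
    """Sharded counter with one single-writer slot per thread.

    increment(): one write to the caller's own slot; no lock, no retry loop.
    read(): a bounded scan of all slots. With non-negative increments, a read
    that overlaps concurrent updates returns a value between the totals at the
    start and at the end of the scan; once writers stop, it is exact.
    """

    def __init__(self, num_threads):
        self._slots = [0] * num_threads

    def increment(self, thread_id, amount=1):
        # No lock needed: thread `thread_id` is the only writer of this slot.
        self._slots[thread_id] += amount

    def read(self):
        return sum(self._slots)


def run_workers(counter, num_threads, increments_per_thread):
    """Start `num_threads` threads that each increment their own slot, then join them."""
    def worker(tid):
        for _ in range(increments_per_thread):
            counter.increment(tid)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

The design trades memory (one slot per thread) for progress guarantees: no thread can block another, which is the property that removes the lock-induced starvation and deadlock described above.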

October 2007 – July 2008: On Node Network Architecture Simulator – ONNET SIMULATOR

The ONNET Simulator is intended to capture the intricacies of the communication network within the MIP SCOC. The objective is to track and optimize the performance of the network-on-chip (NoC) by simulating and analyzing its functionality while accounting for the power, performance, and reliability of the system. I worked on developing the simulator tool to model the ONNET and extract its performance characteristics. Together with MIPSIM, this tool is intended to provide the effective computational and communication complexity of the MIP node architecture.
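The ONNET topology is not stated here, so the sketch below assumes a 2-D mesh NoC with dimension-order (XY) routing, a purely illustrative choice. It shows two quantities a NoC simulator typically tracks: per-packet latency and per-link load, which exposes hotspots. The functions and the delay parameters `router_cycles` and `link_cycles` are hypothetical, and the latency is the simple zero-load wormhole model (no contention).

```python
from collections import Counter


def xy_route(src, dst):
    """Dimension-order (XY) routing on a 2-D mesh: travel along X first, then Y.
    Returns the routers visited, including src and dst, as (x, y) tuples."""
    (x, y), (dst_x, dst_y) = src, dst
    path = [(x, y)]
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


def packet_latency(src, dst, flits, router_cycles=3, link_cycles=1):
    """Zero-load latency in cycles: the head flit pays router + link delay per hop,
    and the remaining flits follow one per cycle (wormhole switching, no contention)."""
    hops = len(xy_route(src, dst)) - 1
    return hops * (router_cycles + link_cycles) + (flits - 1)


def link_loads(traffic):
    """Count how many packets cross each directed link for a list of (src, dst) pairs."""
    loads = Counter()
    for src, dst in traffic:
        path = xy_route(src, dst)
        loads.update(zip(path, path[1:]))
    return loads
```

For instance, a 4-flit packet from router (0, 0) to (2, 1) takes 3 hops and, with the default delays, 3 × 4 + 3 = 15 cycles; links whose count in `link_loads` is much higher than average are candidates for rerouting or extra bandwidth.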

February 2007 – October 2007: Design and Development of Higher Level Functional Units – ALFU Design

This project on algorithm-level functional units (ALFUs) for the MIP SCOC node architecture aimed to design pipelinable, scalable modules at the register-transfer level (RTL). The design accounts for the latency and clock distribution associated with these modules, and it is also at this level that memory and processor are actually integrated. Together with my peers, I coded and tested the functional units in an HDL and introduced pipeline stages of execution to regulate data and memory transfers among the units.
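The benefit of introducing pipeline stages can be quantified with the standard idealized pipeline model, assuming k stages of equal delay τ, n independent operations, no stalls, and negligible pipeline-register overhead; the numbers below are illustrative, not measurements of the ALFU designs.

```latex
\begin{aligned}
T_{\text{unpipelined}}(n) &= n\,k\,\tau, \qquad
T_{\text{pipelined}}(n) = (k + n - 1)\,\tau,\\
S(n) &= \frac{T_{\text{unpipelined}}(n)}{T_{\text{pipelined}}(n)}
      = \frac{n\,k}{k + n - 1} \;\longrightarrow\; k \quad (n \to \infty).
\end{aligned}
```

For example, a 4-stage unit processing 100 operations gives S = 400/103 ≈ 3.88, close to the ideal factor of 4, while the first result still appears only after kτ; this is why latency and clock distribution have to be considered alongside throughput.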

March 2007 – August 2008: Memory In Processor Simulator – MIPSIM

This project developed a functional-level simulator of the MIP SCOC node architecture. Its goal was to extract the performance characteristics of various combinations of algorithms mapped onto the MIP SCOC. My role was to analyze how different algorithms utilize the constituent functional units and to establish their interactions with memory and processor. I also worked jointly on optimizing the simulator by incorporating hardware compilers, and on simulating the architecture for detailed data extraction.
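Per-algorithm utilization of functional units can be computed directly from a simulation trace. The sketch below assumes a hypothetical trace format of `(algorithm, unit, busy_cycles)` records, a known total cycle count, and units that execute at most one operation at a time (otherwise fractions could exceed 1); the function names and unit labels are illustrative only, not MIPSIM's actual interface.

```python
from collections import defaultdict


def utilization_by_algorithm(trace, total_cycles):
    """Return {algorithm: {unit: busy fraction}} from (algorithm, unit, busy_cycles) records."""
    busy = defaultdict(lambda: defaultdict(int))
    for algorithm, unit, cycles in trace:
        busy[algorithm][unit] += cycles
    return {
        alg: {unit: cycles / total_cycles for unit, cycles in units.items()}
        for alg, units in busy.items()
    }


def bottleneck_units(utilization):
    """For each algorithm, the functional unit with the highest utilization."""
    return {alg: max(units, key=units.get) for alg, units in utilization.items()}
```

With a trace such as `[("matmul", "FPU", 600), ("matmul", "MEM", 300), ("matmul", "FPU", 200)]` over 1000 cycles, the floating-point unit is busy 80% of the time and would be reported as the bottleneck for that algorithm.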

October 2006 – August 2008: Synthetic Benchmark for High Performance Clusters – BENSIM

The purpose of BENSIM is to create user-specified benchmarks for supercomputing clusters. It can be used to measure a cluster's performance across parameters such as power, FLOPS, and the cluster's ability to handle a particular class of problems. The synthetic application is assumed to consist of interdependent algorithms (numeric, semi-numeric, and non-numeric). The user can customize both the computational and the communication complexity of the synthetic application. I worked jointly on the design and development of some of the algorithms and analyzed their results.
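A synthetic application built from interdependent algorithms can be modelled as a dependency graph whose nodes carry user-chosen computational and communication costs. The following is a minimal sketch, assuming each kernel's time is its FLOP count divided by the cluster's sustained FLOPS plus its communication volume divided by bandwidth (no overlap), that independent kernels can run in parallel, and that runtime is therefore the longest dependency chain. `Kernel`, `critical_path_time`, and this cost model are hypothetical, not BENSIM's actual design; the code uses `graphlib`, available in Python 3.9 and later.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Kernel:
    name: str
    kind: str          # "numeric", "semi-numeric" or "non-numeric"
    flops: float       # user-specified computational complexity
    comm_bytes: float  # user-specified communication volume
    deps: list = field(default_factory=list)  # names of kernels this one depends on


def critical_path_time(kernels, flops_per_s, bytes_per_s):
    """Estimated runtime in seconds: the longest dependency chain, where each kernel
    costs compute time plus communication time and starts when all its deps finish."""
    by_name = {k.name: k for k in kernels}
    order = TopologicalSorter({k.name: k.deps for k in kernels}).static_order()
    finish = {}
    for name in order:  # dependencies are always visited before dependents
        k = by_name[name]
        cost = k.flops / flops_per_s + k.comm_bytes / bytes_per_s
        start = max((finish[d] for d in k.deps), default=0.0)
        finish[name] = start + cost
    return max(finish.values())
```

In this model, raising a kernel's `flops` or `comm_bytes` is how a user would shift the benchmark's balance between computation and communication, and comparing the estimate with measured runtimes on a cluster indicates how far the cluster is from the idealized behaviour.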