Faraz Ahmad
Software Engineer (Teradata)
2055 Laurelwood Rd, Suite 100
Santa Clara, CA 95054
Phone: (765) 491-0424
Email: faraz.ahmad@gmail.com
I received my Ph.D. from the Department of Electrical and Computer Engineering at Purdue University, where I worked in the Computer Systems Architecture group under my advisor T. N. Vijaykumar. I received my Bachelor's degree from the University of Engineering and Technology, Lahore, Pakistan.
My research interests include cloud computing, data mining, statistical analytics, data center architectures, big-data and energy-aware computing, distributed systems and computer architecture. My current research focuses on analytical models for big-data analytics.
My projects during graduate school include ShuffleWatcher, shuffle-aware scheduling in multi-tenant MapReduce clusters (USENIX ATC 2014); Tarazu, which optimizes MapReduce on heterogeneous clusters (ASPLOS 2012); PowerTrade, a joint optimization of idle power and cooling power that reduces overall data center power (ASPLOS 2010); and MaRCO, a runtime performance optimization for MapReduce, the well-known programming model for large-volume data analysis in data centers (Tech Report 2007). During this work, I also developed a benchmark suite for MapReduce (details below). I have also worked on architecture support for debugging multithreaded programs on multicores (TimeTraveler, ISCA 2010).
Publications
ShuffleWatcher: Shuffle-aware scheduling in multi-tenant MapReduce clusters
Faraz Ahmad, Srimat T. Chakradhar, Anand Raghunathan, T.N. Vijaykumar.
In Proceedings of USENIX ATC 2014, Philadelphia, PA, June 19-20, 2014. Acceptance Rate: 44/245 (18%).
Tarazu: Optimizing MapReduce on Heterogeneous Clusters
Faraz Ahmad, Srimat T. Chakradhar, Anand Raghunathan, T.N. Vijaykumar.
In Proceedings of ASPLOS 2012, London, UK, March 3-7, 2012. Acceptance Rate: 37/172 (21%).
MapReduce with Communication Overlap (MaRCO)
Faraz Ahmad, Seyong Lee, Mithuna Thottethodi, T. N. Vijaykumar.
In the Journal of Parallel and Distributed Computing (JPDC), 2012. Also available as Purdue ECE Technical Report TR-ECE-07-11.
Joint Optimization of Idle and Cooling Power in Data Centers while Maintaining Response Time
Faraz Ahmad, T.N. Vijaykumar.
In Proceedings of ASPLOS 2010, Pittsburgh, PA, March 13-17, 2010. Acceptance Rate: 32/181 (18%).
Timetraveler: Exploiting Acyclic Races for Optimizing Memory Race Recording
Gwendolyn Voskuilen, Faraz Ahmad, T.N. Vijaykumar.
In Proceedings of ISCA 2010, Saint-Malo, France, June 19-23, 2010. Acceptance Rate: 44/245 (18%).
PUMA: Purdue MapReduce Benchmarks Suite
Faraz Ahmad, Seyong Lee, Mithuna Thottethodi, T. N. Vijaykumar.
Purdue ECE Technical Report TR-ECE-12-11.
PUMA: MapReduce Benchmarks
MapReduce is a well-known programming model, developed at Google, for processing large amounts of raw data such as crawled documents or web request logs. This data is usually so large that it must be distributed across thousands of machines to be processed in a reasonable time. Ease of programming, automatic data management, and transparent fault tolerance have made MapReduce a popular choice for batch processing in large-scale data centers. Map, written by a user of the MapReduce library, takes an input key/value pair and produces a set of intermediate key/value pairs. The library groups together all intermediate values associated with the same intermediate key and passes them to the Reduce function through an all-map-to-all-reduce communication phase called Shuffle. Reduce, also written by the user, receives an intermediate key along with the set of values for that key and merges these values to produce the final output.
Hadoop is an open-source implementation of MapReduce that is maintained by the Apache Software Foundation and is continually developed and improved by software developers and researchers. Despite the substantial effort devoted to developing Hadoop MapReduce, comparatively little rigorous work has been done on benchmarking it.
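To make the Map, Shuffle, and Reduce dataflow concrete, here is a minimal single-process Python sketch of word count. It is not Hadoop's API: map_fn, reduce_fn, run_mapreduce, and num_reducers are hypothetical names chosen for illustration, and routing keys to reducers by hashing is an assumption (similar in spirit to Hadoop's default hash partitioner) rather than a description of any PUMA benchmark.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


# Hypothetical user-written Map: emits (word, 1) for every word in one input line.
def map_fn(key: int, value: str) -> Iterator[Tuple[str, int]]:
    for word in value.split():
        yield word.lower(), 1


# Hypothetical user-written Reduce: merges all counts for one intermediate key.
def reduce_fn(key: str, values: List[int]) -> Tuple[str, int]:
    return key, sum(values)


def run_mapreduce(records: Iterable[Tuple[int, str]], num_reducers: int = 2) -> Dict[str, int]:
    # One partition per reducer; each partition groups values by intermediate key.
    partitions: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]

    # Map phase plus Shuffle: every intermediate pair from every map call is
    # routed to the reducer chosen by hashing its key (all-map-to-all-reduce).
    for k, v in records:
        for ik, iv in map_fn(k, v):
            partitions[hash(ik) % num_reducers][ik].append(iv)

    # Reduce phase: each reducer processes its keys in sorted order.
    output: Dict[str, int] = {}
    for part in partitions:
        for ik in sorted(part):
            rk, rv = reduce_fn(ik, part[ik])
            output[rk] = rv
    return output


if __name__ == "__main__":
    lines = [(0, "the quick brown fox"), (1, "the lazy dog"), (2, "The fox")]
    counts = run_mapreduce(lines, num_reducers=3)
    print(counts["the"], counts["fox"])  # 3 2
```

In a real cluster, the Shuffle step moves data over the network between machines, which is why benchmarks with high shuffle volume stress a system differently from compute-heavy ones.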
During our work on MapReduce, we developed a benchmark suite that represents a broad range of MapReduce applications, spanning high and low computation and high and low shuffle volumes. Details of the applications, their code (compatible with Hadoop-0.20 and Hadoop-1.0.0), and details about the input datasets can be found below.
Courses
ECE 565: Computer Architecture
ECE 666: Advanced Computer Systems
CS 503: Operating Systems
CS 525: Parallel Computing
ECE 573: Compiler and Translator Writing Systems
ECE 608: Computational Models and Methods
ECE 600: Random Variables and Signals
ECE 568: Embedded Systems Design
ECE 559: MOS VLSI Design
ECE 629: Introduction to Neural Networks
MA 528: Complex Variables and Vector Calculus
MA 532: Stochastic Processes
Awards
Magoon Teaching Award for excellence in teaching by the College of Engineering, Purdue University, 2008-2009.
Higher Education Scholarship for Overseas Education by the National R&D Fund, Pakistan.
All University Merit Scholarship for being among the top 3% of students by the University of Engineering and Technology, Lahore, Pakistan.